Re: [HACKERS] plperlu problem with utf8 [REVIEW]

2011-01-16 Thread Andy Colson
On 01/16/2011 07:14 PM, Alex Hunsaker wrote: On Sat, Jan 15, 2011 at 14:20, Andy Colson wrote: This is a review of "plperl encoding issues" https://commitfest.postgresql.org/action/patch_view?id=452 Thanks for taking the time to review! [...] The Patch: == Applies clean to git h

Re: [HACKERS] plperlu problem with utf8 [REVIEW]

2011-01-16 Thread Alex Hunsaker
On Sat, Jan 15, 2011 at 14:20, Andy Colson wrote: > > This is a review of  "plperl encoding issues" > > https://commitfest.postgresql.org/action/patch_view?id=452 Thanks for taking the time to review! [...] > > The Patch: > == > Applies clean to git head as of January 15 2011.  PG built

[HACKERS] plperlu problem with utf8 [REVIEW]

2011-01-15 Thread Andy Colson
This is a review of "plperl encoding issues" https://commitfest.postgresql.org/action/patch_view?id=452 Purpose: Your database uses one encoding, and passes data to perl in the same encoding, which perl is not prepared for (it assumes UTF-8). This patch makes sure data is encoded i

Re: [HACKERS] plperlu problem with utf8

2010-12-22 Thread Alex Hunsaker
On Wed, Dec 22, 2010 at 11:24, David E. Wheeler wrote: > On Dec 21, 2010, at 8:19 PM, Alex Hunsaker wrote: > >> And here is v3, [ ...] > Awesome. Would you add it to > https://commitfest.postgresql.org/action/commitfest_view?id=9 please? Nah, I was willing to spend a couple of hours playing wit

Re: [HACKERS] plperlu problem with utf8

2010-12-22 Thread David E. Wheeler
On Dec 21, 2010, at 8:19 PM, Alex Hunsaker wrote: > And here is v3, fixes the above and also makes sure to properly > encode/decode SPI arguments. Tested on a latin1 database with latin1 > columns and utf8 with utf8 columns. Also passes make installcheck (of > course) and changes one or two thin

Re: [HACKERS] plperlu problem with utf8

2010-12-21 Thread Alex Hunsaker
On Mon, Dec 20, 2010 at 00:39, Alex Hunsaker wrote: > In further review over caffeine this morning I noticed there are a few > places I missed: plperl_build_tuple_result(), plperl_modify_tuple() > and Util.XS. And here is v3, fixes the above and also makes sure to properly encode/decode SPI argu

Re: [HACKERS] plperlu problem with utf8

2010-12-20 Thread Robert Haas
On Sun, Dec 19, 2010 at 7:56 PM, David E. Wheeler wrote: > +1 Awesome. Should this go into the next commitfest? Or might it be > considered a bug fix? CommitFest or no CommitFest, patches get applied when a committer acquires enough round tuits. Putting them into the next CommitFest just provid

Re: [HACKERS] plperlu problem with utf8

2010-12-19 Thread Alex Hunsaker
On Sun, Dec 19, 2010 at 21:00, David Christensen wrote: > > On Dec 19, 2010, at 2:20 AM, Alex Hunsaker wrote: > >> With the attached we: >> - for function arguments, convert (using pg_do_encoding_conversion) to >> utf8 from the current database encoding. > > How does this deal with input records

Re: [HACKERS] plperlu problem with utf8

2010-12-19 Thread David Christensen
On Dec 19, 2010, at 2:20 AM, Alex Hunsaker wrote: > On Sat, Dec 18, 2010 at 20:29, David E. Wheeler wrote: >> ... >> I would argue that it should output the same as the first example. That is, >> PL/Perl should have decoded the latin-1 before passing the text to the Perl >> function. > > Yeah

Re: [HACKERS] plperlu problem with utf8

2010-12-19 Thread David E. Wheeler
On Dec 19, 2010, at 12:20 AM, Alex Hunsaker wrote: >> I would argue that it should output the same as the first example. That is, >> PL/Perl should have decoded the latin-1 before passing the text to the Perl >> function. > > Yeah, I don't think you will find anyone who disagrees :) PL/TCL and

Re: [HACKERS] plperlu problem with utf8

2010-12-19 Thread Alex Hunsaker
On Sat, Dec 18, 2010 at 20:29, David E. Wheeler wrote: > ... > I would argue that it should output the same as the first example. That is, > PL/Perl should have decoded the latin-1 before passing the text to the Perl > function. Yeah, I don't think you will find anyone who disagrees :) PL/TCL

Re: [HACKERS] plperlu problem with utf8

2010-12-18 Thread Alex Hunsaker
On Sat, Dec 18, 2010 at 20:29, David E. Wheeler wrote: > On Dec 17, 2010, at 9:32 PM, David Christensen wrote: >    latin=# SELECT * FROM perlgets('“hello”'); >     length │ is_utf8 >    ┼─ >         11 │ f > > (Yes I used Latin-1 curly quotes in that last example). Erm, latin1 do

Re: [HACKERS] plperlu problem with utf8

2010-12-18 Thread David E. Wheeler
On Dec 17, 2010, at 9:32 PM, David Christensen wrote: > +1 on the original sentiment, but only for the case that we're dealing with > data that is passed in/out as arguments. In the case that the > server_encoding is UTF-8, this is as trivial as a few macros on the > underlying SVs for text-li

Re: [HACKERS] plperlu problem with utf8

2010-12-18 Thread David E. Wheeler
On Dec 17, 2010, at 10:46 PM, Alex Hunsaker wrote: >> But that's a separate issue from the, erm, inconsistency with which PL/Perl >> treats encoding and decoding of its inputs and outputs. > > Yay! So I think we can finally agree that for Oleg's original test > case postgres was getting it right.

Re: [HACKERS] plperlu problem with utf8

2010-12-17 Thread Alex Hunsaker
On Fri, Dec 17, 2010 at 18:04, David E. Wheeler wrote: > On Dec 16, 2010, at 8:39 PM, Alex Hunsaker wrote: > Yeah. So I just wrote and tested this function on 9.0 with Perl 5.12.2: > >    CREATE OR REPLACE FUNCTION perlgets( >        TEXT >    ) RETURNS TABLE(length INT, is_utf8 BOOL) LANGUAGE pl

Re: [HACKERS] plperlu problem with utf8

2010-12-17 Thread Alex Hunsaker
On Fri, Dec 17, 2010 at 18:22, David E. Wheeler wrote: > On Dec 17, 2010, at 5:04 PM, David E. Wheeler wrote: > >>> see? Either uri_unescape() should be decoding that utf8() or you need >>> to do it *after* you call uri_unescape().  Hence the maybe it could be >>> considered a bug in uri_unescape(

Re: [HACKERS] plperlu problem with utf8

2010-12-17 Thread Alex Hunsaker
On Fri, Dec 17, 2010 at 22:32, David Christensen wrote: > > On Dec 17, 2010, at 7:04 PM, David E. Wheeler wrote: > >> On Dec 16, 2010, at 8:39 PM, Alex Hunsaker wrote: >> No, URI::Escape is fine. The issue is that if you don't decode text to Perl's internal form, it assumes that it's La

Re: [HACKERS] plperlu problem with utf8

2010-12-17 Thread David Christensen
On Dec 17, 2010, at 7:04 PM, David E. Wheeler wrote: > On Dec 16, 2010, at 8:39 PM, Alex Hunsaker wrote: > >>> No, URI::Escape is fine. The issue is that if you don't decode text to >>> Perl's internal form, it assumes that it's Latin-1. >> >> So... you are saying "\xc3\xa9" eq "\xe9" or chr(2

Re: [HACKERS] plperlu problem with utf8

2010-12-17 Thread David E. Wheeler
On Dec 17, 2010, at 5:04 PM, David E. Wheeler wrote: >> see? Either uri_unescape() should be decoding that utf8() or you need >> to do it *after* you call uri_unescape(). Hence the maybe it could be >> considered a bug in uri_unescape(). > > Agreed. On second thought, no. You can in fact encode

Re: [HACKERS] plperlu problem with utf8

2010-12-17 Thread David E. Wheeler
On Dec 16, 2010, at 8:39 PM, Alex Hunsaker wrote: >> No, URI::Escape is fine. The issue is that if you don't decode text to >> Perl's internal form, it assumes that it's Latin-1. > > So... you are saying "\xc3\xa9" eq "\xe9" or chr(233) ? Not knowing what those mean, I'm not saying either one,

Re: [HACKERS] plperlu problem with utf8

2010-12-17 Thread Alex Hunsaker
On Fri, Dec 17, 2010 at 11:51, Alex Hunsaker wrote: > Also note this is just a simple test case, perl *could* elect to store > completely ascii strings internally as utf8.  In those cases we still Erm... not ascii I mean bytes >127

Re: [HACKERS] plperlu problem with utf8

2010-12-17 Thread Alex Hunsaker
On Fri, Dec 17, 2010 at 08:30, David Fetter wrote: > On Thu, Dec 16, 2010 at 07:24:46PM -0800, David Wheeler wrote: >> On Dec 16, 2010, at 6:39 PM, Alex Hunsaker wrote: >> >> > Grr that should error out with "Invalid server encoding", or worst >> > case should return a length of 3 (it utf8 encoded

Re: [HACKERS] plperlu problem with utf8

2010-12-17 Thread David Fetter
On Thu, Dec 16, 2010 at 07:24:46PM -0800, David Wheeler wrote: > On Dec 16, 2010, at 6:39 PM, Alex Hunsaker wrote: > > > Grr that should error out with "Invalid server encoding", or worst > > case should return a length of 3 (it utf8 encoded 128 into 2 bytes > > instead of leaving it as 1). In th

Re: [HACKERS] plperlu problem with utf8

2010-12-16 Thread Alex Hunsaker
On Thu, Dec 16, 2010 at 20:24, David E. Wheeler wrote: > On Dec 16, 2010, at 6:39 PM, Alex Hunsaker wrote: > >> You might argue this is a bug with URI::Escape as I *think* all uri's >> will be utf8 encoded.  Anyway, I think postgres is doing the right >> thing here. > > No, URI::Escape is fine. Th

Re: [HACKERS] plperlu problem with utf8

2010-12-16 Thread David E. Wheeler
On Dec 16, 2010, at 6:39 PM, Alex Hunsaker wrote: > You might argue this is a bug with URI::Escape as I *think* all uri's > will be utf8 encoded. Anyway, I think postgres is doing the right > thing here. No, URI::Escape is fine. The issue is that if you don't decode text to Perl's internal form

Re: [HACKERS] plperlu problem with utf8

2010-12-16 Thread Alex Hunsaker
On Wed, Dec 8, 2010 at 14:15, Oleg Bartunov wrote: > On Wed, 8 Dec 2010, David E. Wheeler wrote: > >> On Dec 8, 2010, at 8:13 AM, Oleg Bartunov wrote: >> >>> adding utf8::decode($_[0]) solves the problem: >> Hrm. Ideally all strings passed to PL/Perl functions would be decoded. > > yes, this is wh

Re: [HACKERS] plperlu problem with utf8

2010-12-08 Thread Oleg Bartunov
On Wed, 8 Dec 2010, David E. Wheeler wrote: On Dec 8, 2010, at 8:13 AM, Oleg Bartunov wrote: adding utf8::decode($_[0]) solves the problem: knn=# CREATE OR REPLACE FUNCTION url_decode(Vkw varchar) RETURNS varchar AS $$ use strict; use URI::Escape; utf8::decode($_[0]); return uri_

Re: [HACKERS] plperlu problem with utf8

2010-12-08 Thread David E. Wheeler
On Dec 8, 2010, at 8:13 AM, Oleg Bartunov wrote: > adding utf8::decode($_[0]) solves the problem: > > knn=# CREATE OR REPLACE FUNCTION url_decode(Vkw varchar) RETURNS varchar AS > $$ >use strict; >use URI::Escape; >utf8::decode($_[0]); >return uri_unescape($_[0]); $$ LANGUAGE pl

Re: [HACKERS] plperlu problem with utf8

2010-12-08 Thread Oleg Bartunov
adding utf8::decode($_[0]) solves the problem: knn=# CREATE OR REPLACE FUNCTION url_decode(Vkw varchar) RETURNS varchar AS $$ use strict; use URI::Escape; utf8::decode($_[0]); return uri_unescape($_[0]); $$ LANGUAGE plperlu; Oleg On Wed, 8 Dec 2010, Andrew Dunstan wrote: O

Re: [HACKERS] plperlu problem with utf8

2010-12-08 Thread Andrew Dunstan
On 12/08/2010 10:13 AM, Oleg Bartunov wrote: Hi there, below is the problem, which I don't have when running in shell. The database is in UTF-8 encoding. CREATE OR REPLACE FUNCTION url_decode(Vkw varchar) RETURNS varchar AS $$ use strict; use URI::Escape; return uri_unescape($_

[HACKERS] plperlu problem with utf8

2010-12-08 Thread Oleg Bartunov
Hi there, below is the problem, which I don't have when running in shell. The database is in UTF-8 encoding. CREATE OR REPLACE FUNCTION url_decode(Vkw varchar) RETURNS varchar AS $$ use strict; use URI::Escape; return uri_unescape($_[0]); $$ LANGUAGE plperlu; CREATE FUNCTION Tim
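For readers skimming the thread, the bytes-versus-characters distinction it keeps returning to ("\xc3\xa9" versus "\xe9" or chr(233)) can be reproduced in plain Perl, outside PostgreSQL. This is only a minimal sketch, not code from any message or from the patch: percent_decode is a hypothetical stand-in for URI::Escape's uri_unescape, written out so the example needs no non-core module, and the input '%C3%A9' is an assumed test value.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): a bare-bones stand-in for
# URI::Escape's uri_unescape that turns each %XX into the byte it names.
sub percent_decode {
    my ($s) = @_;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# '%C3%A9' is the percent-encoded UTF-8 form of U+00E9 (e with acute).
my $bytes = percent_decode('%C3%A9');    # two bytes: \xC3 \xA9

# Decode a copy in place from UTF-8 bytes to Perl characters.
my $chars = $bytes;
utf8::decode($chars) or die "input was not valid UTF-8";

printf "bytes: length=%d is_utf8=%d\n",
    length($bytes), utf8::is_utf8($bytes) ? 1 : 0;
printf "chars: length=%d equals chr(233)=%d\n",
    length($chars), $chars eq chr(233) ? 1 : 0;
# prints:
#   bytes: length=2 is_utf8=0
#   chars: length=1 equals chr(233)=1
```

Until something decodes it, Perl treats the unescaped value as two separate characters (the Latin-1 reading discussed in the thread), which is why the decode has to happen after unescaping if the result is to be one character.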