Re: [HACKERS] plperlu problem with utf8 [REVIEW]

2011-01-16 Thread Alex Hunsaker
On Sat, Jan 15, 2011 at 14:20, Andy Colson a...@squeakycode.net wrote: This is a review of  plperl encoding issues https://commitfest.postgresql.org/action/patch_view?id=452 Thanks for taking the time to review! [...] The Patch: == Applies clean to git head as of January 15 2011.

Re: [HACKERS] plperlu problem with utf8 [REVIEW]

2011-01-16 Thread Andy Colson
On 01/16/2011 07:14 PM, Alex Hunsaker wrote: On Sat, Jan 15, 2011 at 14:20, Andy Colson a...@squeakycode.net wrote: This is a review of plperl encoding issues https://commitfest.postgresql.org/action/patch_view?id=452 Thanks for taking the time to review! [...] The Patch: ==

[HACKERS] plperlu problem with utf8 [REVIEW]

2011-01-15 Thread Andy Colson
This is a review of plperl encoding issues https://commitfest.postgresql.org/action/patch_view?id=452 Purpose: Your database uses one encoding, and passes data to perl in the same encoding, which perl is not prepared for (it assumes UTF-8). This patch makes sure data is encoded

Re: [HACKERS] plperlu problem with utf8

2010-12-22 Thread David E. Wheeler
On Dec 21, 2010, at 8:19 PM, Alex Hunsaker wrote: And here is v3, fixes the above and also makes sure to properly encode/decode SPI arguments. Tested on a latin1 database with latin1 columns and utf8 with utf8 columns. Also passes make installcheck (of course) and changes one or two things

Re: [HACKERS] plperlu problem with utf8

2010-12-22 Thread Alex Hunsaker
On Wed, Dec 22, 2010 at 11:24, David E. Wheeler da...@kineticode.com wrote: On Dec 21, 2010, at 8:19 PM, Alex Hunsaker wrote: And here is v3, [ ...] Awesome. Would you add it to https://commitfest.postgresql.org/action/commitfest_view?id=9 please? Nah, I was willing to spend a couple of

Re: [HACKERS] plperlu problem with utf8

2010-12-21 Thread Alex Hunsaker
On Mon, Dec 20, 2010 at 00:39, Alex Hunsaker bada...@gmail.com wrote: In further review over caffeine this morning I noticed there are a few places I missed: plperl_build_tuple_result(), plperl_modify_tuple() and Util.XS. And here is v3, fixes the above and also makes sure to properly

Re: [HACKERS] plperlu problem with utf8

2010-12-20 Thread Robert Haas
On Sun, Dec 19, 2010 at 7:56 PM, David E. Wheeler da...@kineticode.com wrote: +1 Awesome. Should this go into the next commitfest? Or might it be considered a bug fix? CommitFest or no CommitFest, patches get applied when a committer acquires enough round tuits. Putting them into the next

Re: [HACKERS] plperlu problem with utf8

2010-12-19 Thread Alex Hunsaker
On Sat, Dec 18, 2010 at 20:29, David E. Wheeler da...@kineticode.com wrote: ... I would argue that it should output the same as the first example. That is, PL/Perl should have decoded the latin-1 before passing the text to the Perl function. Yeah, I don't think you will find anyone who

Re: [HACKERS] plperlu problem with utf8

2010-12-19 Thread David E. Wheeler
On Dec 19, 2010, at 12:20 AM, Alex Hunsaker wrote: I would argue that it should output the same as the first example. That is, PL/Perl should have decoded the latin-1 before passing the text to the Perl function. Yeah, I don't think you will find anyone who disagrees :) PL/TCL and

Re: [HACKERS] plperlu problem with utf8

2010-12-19 Thread David Christensen
On Dec 19, 2010, at 2:20 AM, Alex Hunsaker wrote: On Sat, Dec 18, 2010 at 20:29, David E. Wheeler da...@kineticode.com wrote: ... I would argue that it should output the same as the first example. That is, PL/Perl should have decoded the latin-1 before passing the text to the Perl

Re: [HACKERS] plperlu problem with utf8

2010-12-19 Thread Alex Hunsaker
On Sun, Dec 19, 2010 at 21:00, David Christensen da...@endpoint.com wrote: On Dec 19, 2010, at 2:20 AM, Alex Hunsaker wrote: With the attached we: - for function arguments, convert (using pg_do_encoding_conversion) to utf8 from the current database encoding. How does this deal with input
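The argument conversion described in the message above (database encoding to utf8) can be pictured with a small standalone Perl sketch. This is only an analogy: the patch itself does the conversion on the C side with pg_do_encoding_conversion, whereas the sketch below uses Perl's core Encode module. The variable names and the sample byte are assumptions made for illustration, not anything taken from the patch.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: "é" as the single byte a latin1 database would hand over.
my $latin1_bytes = "\xe9";

# Step 1: interpret the byte in the source (database) encoding.
my $chars = decode('iso-8859-1', $latin1_bytes);

# Step 2: re-encode the resulting character as UTF-8 bytes.
my $utf8_bytes = encode('UTF-8', $chars);

# Render the UTF-8 bytes as hex so the change in representation is visible.
my $hex = join ' ', map { sprintf '%02x', ord($_) } split //, $utf8_bytes;

print "latin1 byte count: ", length($latin1_bytes), "\n";
print "utf8 byte count:   ", length($utf8_bytes), " ($hex)\n";
```

The same character takes one byte in latin1 and two in utf8, which is why the text has to be converted rather than passed through unchanged.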

Re: [HACKERS] plperlu problem with utf8

2010-12-18 Thread David E. Wheeler
On Dec 17, 2010, at 10:46 PM, Alex Hunsaker wrote: But that's a separate issue from the, erm, inconsistency with which PL/Perl treats encoding and decoding of its inputs and outputs. Yay! So I think we can finally agree that for Oleg's original test case postgres was getting right. I hope

Re: [HACKERS] plperlu problem with utf8

2010-12-18 Thread David E. Wheeler
On Dec 17, 2010, at 9:32 PM, David Christensen wrote: +1 on the original sentiment, but only for the case that we're dealing with data that is passed in/out as arguments. In the case that the server_encoding is UTF-8, this is as trivial as a few macros on the underlying SVs for text-like

Re: [HACKERS] plperlu problem with utf8

2010-12-18 Thread Alex Hunsaker
On Sat, Dec 18, 2010 at 20:29, David E. Wheeler da...@kineticode.com wrote: On Dec 17, 2010, at 9:32 PM, David Christensen wrote:    latin=# SELECT * FROM perlgets('“hello”');     length │ is_utf8    ┼─         11 │ f (Yes I used Latin-1 curly quotes in that last example).

Re: [HACKERS] plperlu problem with utf8

2010-12-17 Thread David Fetter
On Thu, Dec 16, 2010 at 07:24:46PM -0800, David Wheeler wrote: On Dec 16, 2010, at 6:39 PM, Alex Hunsaker wrote: Grr that should error out with Invalid server encoding, or worst case should return a length of 3 (it utf8 encoded 128 into 2 bytes instead of leaving it as 1). In this case

Re: [HACKERS] plperlu problem with utf8

2010-12-17 Thread Alex Hunsaker
On Fri, Dec 17, 2010 at 08:30, David Fetter da...@fetter.org wrote: On Thu, Dec 16, 2010 at 07:24:46PM -0800, David Wheeler wrote: On Dec 16, 2010, at 6:39 PM, Alex Hunsaker wrote: Grr that should error out with Invalid server encoding, or worst case should return a length of 3 (it utf8

Re: [HACKERS] plperlu problem with utf8

2010-12-17 Thread Alex Hunsaker
On Fri, Dec 17, 2010 at 11:51, Alex Hunsaker bada...@gmail.com wrote: Also note this is just a simple test case, perl *could* elect to store completely ascii strings internally as utf8. In those cases we still Erm... not ascii I mean bytes 127

Re: [HACKERS] plperlu problem with utf8

2010-12-17 Thread David E. Wheeler
On Dec 16, 2010, at 8:39 PM, Alex Hunsaker wrote: No, URI::Escape is fine. The issue is that if you don't decode text to Perl's internal form, it assumes that it's Latin-1. So... you are saying \xc3\xa9 eq \xe9 or chr(233) ? Not knowing what those mean, I'm not saying either one, to my
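The question in the message above (is \xc3\xa9 the same as \xe9 or chr(233)?) can be checked outside PostgreSQL with a minimal Perl sketch. It assumes nothing beyond core Perl; the variable names are illustrative only. Undecoded, the two bytes are treated as two Latin-1 characters; after utf8::decode they become the single character chr(233).

```perl
use strict;
use warnings;

# "\xc3\xa9" is the two-byte UTF-8 encoding of "é" (U+00E9).
my $bytes = "\xc3\xa9";

# Undecoded, Perl sees two separate characters (one per byte).
my $undecoded_length = length($bytes);

# utf8::decode converts the UTF-8 byte sequence in place into characters.
my $chars = $bytes;
utf8::decode($chars);
my $decoded_length = length($chars);

# Compare both forms against chr(233), the character "é".
my $bytes_match = ($bytes eq chr(233)) ? 1 : 0;
my $chars_match = ($chars eq chr(233)) ? 1 : 0;

print "undecoded length: $undecoded_length, equals chr(233): $bytes_match\n";
print "decoded length:   $decoded_length, equals chr(233): $chars_match\n";
```

So the raw bytes and the character are not equal until the bytes have been decoded, which is the distinction the thread keeps returning to.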

Re: [HACKERS] plperlu problem with utf8

2010-12-17 Thread David E. Wheeler
On Dec 17, 2010, at 5:04 PM, David E. Wheeler wrote: see? Either uri_unescape() should be decoding that utf8() or you need to do it *after* you call uri_unescape(). Hence the maybe it could be considered a bug in uri_unescape(). Agreed. On second thought, no. You can in fact encode

Re: [HACKERS] plperlu problem with utf8

2010-12-17 Thread David Christensen
On Dec 17, 2010, at 7:04 PM, David E. Wheeler wrote: On Dec 16, 2010, at 8:39 PM, Alex Hunsaker wrote: No, URI::Escape is fine. The issue is that if you don't decode text to Perl's internal form, it assumes that it's Latin-1. So... you are saying \xc3\xa9 eq \xe9 or chr(233) ? Not

Re: [HACKERS] plperlu problem with utf8

2010-12-17 Thread Alex Hunsaker
On Fri, Dec 17, 2010 at 18:22, David E. Wheeler da...@kineticode.com wrote: On Dec 17, 2010, at 5:04 PM, David E. Wheeler wrote: see? Either uri_unescape() should be decoding that utf8() or you need to do it *after* you call uri_unescape().  Hence the maybe it could be considered a bug in

Re: [HACKERS] plperlu problem with utf8

2010-12-17 Thread Alex Hunsaker
On Fri, Dec 17, 2010 at 18:04, David E. Wheeler da...@kineticode.com wrote: On Dec 16, 2010, at 8:39 PM, Alex Hunsaker wrote: Yeah. So I just wrote and tested this function on 9.0 with Perl 5.12.2:    CREATE OR REPLACE FUNCTION perlgets(        TEXT    ) RETURNS TABLE(length INT, is_utf8

Re: [HACKERS] plperlu problem with utf8

2010-12-16 Thread Alex Hunsaker
On Wed, Dec 8, 2010 at 14:15, Oleg Bartunov o...@sai.msu.su wrote: On Wed, 8 Dec 2010, David E. Wheeler wrote: On Dec 8, 2010, at 8:13 AM, Oleg Bartunov wrote: adding utf8::decode($_[0]) solves the problem: Hrm. Ideally all strings passed to PL/Perl functions would be decoded. yes, this is

Re: [HACKERS] plperlu problem with utf8

2010-12-16 Thread David E. Wheeler
On Dec 16, 2010, at 6:39 PM, Alex Hunsaker wrote: You might argue this is a bug with URI::Escape as I *think* all uri's will be utf8 encoded. Anyway, I think postgres is doing the right thing here. No, URI::Escape is fine. The issue is that if you don't decode text to Perl's internal form,

Re: [HACKERS] plperlu problem with utf8

2010-12-16 Thread Alex Hunsaker
On Thu, Dec 16, 2010 at 20:24, David E. Wheeler da...@kineticode.com wrote: On Dec 16, 2010, at 6:39 PM, Alex Hunsaker wrote: You might argue this is a bug with URI::Escape as I *think* all uri's will be utf8 encoded.  Anyway, I think postgres is doing the right thing here. No, URI::Escape

[HACKERS] plperlu problem with utf8

2010-12-08 Thread Oleg Bartunov
Hi there, below is the problem, which I don't have when running in shell. The database is in UTF-8 encoding. CREATE OR REPLACE FUNCTION url_decode(Vkw varchar) RETURNS varchar AS $$ use strict; use URI::Escape; return uri_unescape($_[0]); $$ LANGUAGE plperlu; CREATE FUNCTION

Re: [HACKERS] plperlu problem with utf8

2010-12-08 Thread Andrew Dunstan
On 12/08/2010 10:13 AM, Oleg Bartunov wrote: Hi there, below is the problem, which I don't have when running in shell. The database is in UTF-8 encoding. CREATE OR REPLACE FUNCTION url_decode(Vkw varchar) RETURNS varchar AS $$ use strict; use URI::Escape; return

Re: [HACKERS] plperlu problem with utf8

2010-12-08 Thread Oleg Bartunov
adding utf8::decode($_[0]) solves the problem: knn=# CREATE OR REPLACE FUNCTION url_decode(Vkw varchar) RETURNS varchar AS $$ use strict; use URI::Escape; utf8::decode($_[0]); return uri_unescape($_[0]); $$ LANGUAGE plperlu; Oleg On Wed, 8 Dec 2010, Andrew Dunstan wrote:

Re: [HACKERS] plperlu problem with utf8

2010-12-08 Thread David E. Wheeler
On Dec 8, 2010, at 8:13 AM, Oleg Bartunov wrote: adding utf8::decode($_[0]) solves the problem: knn=# CREATE OR REPLACE FUNCTION url_decode(Vkw varchar) RETURNS varchar AS $$ use strict; use URI::Escape; utf8::decode($_[0]); return uri_unescape($_[0]); $$ LANGUAGE plperlu;

Re: [HACKERS] plperlu problem with utf8

2010-12-08 Thread Oleg Bartunov
On Wed, 8 Dec 2010, David E. Wheeler wrote: On Dec 8, 2010, at 8:13 AM, Oleg Bartunov wrote: adding utf8::decode($_[0]) solves the problem: knn=# CREATE OR REPLACE FUNCTION url_decode(Vkw varchar) RETURNS varchar AS $$ use strict; use URI::Escape; utf8::decode($_[0]); return