Re: [SPAM] [MessageLimit][lowlimit] Re: [HACKERS] pl/perl and utf-8 in sql_ascii databases

2012-07-11 Thread Alvaro Herrera

Excerpts from Alvaro Herrera's message of mar jul 10 16:23:57 -0400 2012:
 Excerpts from Kyotaro HORIGUCHI's message of mar jul 03 04:59:38 -0400 2012:
  Hello, here are regression test runs on PostgreSQL builds that were
  also made with cygwin-gcc and VC++.
  
  The attached patches are as follows:
  
  - plperl_sql_ascii-4.patch : fix for pl/perl utf8 vs sql_ascii
  - plperl_sql_ascii_regress-1.patch : regression test for this patch.
   I added some tests on encoding to this.
  
  I will mark this patch as 'ready for committer' after this.
 
 I have pushed these changes to HEAD, 9.2 and 9.1.  Instead of the games
 with plperl_lc_*.out being copied around, I just used the ASCII version
 as plperl_lc_1.out and the UTF8 one as plperl_lc.out.

... and this story hasn't ended yet, because one of the new tests is
failing.  See here:

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=magpie&dt=2012-07-11%2010%3A00%3A04

The interesting part of the diff is:

***************
*** 34,41 ****
      return ($str ne $match ? $code."DIFFER" : $code."ab\x{5ddd}cd");
  $$ LANGUAGE plperl;
  SELECT encode(perl_utf_inout(E'ab\xe5\xb1\xb1cd')::bytea, 'escape')
!           encode
! --------------------------
!  NotUTF8:ab\345\267\235cd
! (1 row)
! 
--- 34,38 ----
      return ($str ne $match ? $code."DIFFER" : $code."ab\x{5ddd}cd");
  $$ LANGUAGE plperl;
  SELECT encode(perl_utf_inout(E'ab\xe5\xb1\xb1cd')::bytea, 'escape')
! ERROR:  character with byte sequence 0xe5 0xb7 0x9d in encoding UTF8 has no equivalent in encoding LATIN1
! CONTEXT:  PL/Perl function perl_utf_inout


I am not sure what we can do here other than remove this function and
query from the test.
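
For reference, the failure can be reproduced outside PostgreSQL. The
following is a minimal Python sketch, used purely for illustration (it is
not part of the patch or of the regression suite, and the helper name
fits_latin1 is made up here). It shows that the three bytes in the error
message are the UTF-8 form of U+5DDD, that they correspond to the octal
escapes in the expected output, and that the character cannot be
represented in LATIN1:

```python
# Illustration only; not part of the PostgreSQL regression suite.
ch = "\u5ddd"  # the character the test function places between "ab" and "cd"

# UTF-8 encoding of the character, as reported in the ERROR line.
utf8_bytes = ch.encode("utf-8")
print(" ".join(f"{b:02x}" for b in utf8_bytes))

# The bytea 'escape' format shows non-ASCII bytes as three-digit octal escapes.
octal = "".join(f"\\{b:03o}" for b in utf8_bytes)
print(octal)


def fits_latin1(s):
    """Return True if every character of s can be encoded as LATIN1."""
    try:
        s.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


print(fits_latin1(ch))
```

Any server encoding that lacks this character would be expected to hit
the same conversion error, which is why the test only passes where
U+5DDD is representable.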

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [SPAM] [MessageLimit][lowlimit] Re: [HACKERS] pl/perl and utf-8 in sql_ascii databases

2012-07-11 Thread Alex Hunsaker
On Wed, Jul 11, 2012 at 1:42 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:

 I have pushed these changes to HEAD, 9.2 and 9.1.  Instead of the games
 with plperl_lc_*.out being copied around, I just used the ASCII version
 as plperl_lc_1.out and the UTF8 one as plperl_lc.out.

 ... and this story hasn't ended yet, because one of the new tests is
 failing.  See here:

 http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=magpie&dt=2012-07-11%2010%3A00%3A04

 [...]
   SELECT encode(perl_utf_inout(E'ab\xe5\xb1\xb1cd')::bytea, 'escape')
 ! ERROR:  character with byte sequence 0xe5 0xb7 0x9d in encoding UTF8 has 
 no equivalent in encoding LATIN1
 ! CONTEXT:  PL/Perl function perl_utf_inout


 I am not sure what we can do here other than remove this function and
 query from the test.

Hrm, me neither. I say drop em.



Re: [SPAM] [MessageLimit][lowlimit] Re: [HACKERS] pl/perl and utf-8 in sql_ascii databases

2012-07-11 Thread Kyotaro HORIGUCHI
Hmm... Sorry for the immature patch.

 ... and this story hasn't ended yet, because one of the new tests is
 failing.  See here:
 
 http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=magpie&dt=2012-07-11%2010%3A00%3A04
 
 The interesting part of the diff is:
...
   SELECT encode(perl_utf_inout(E'ab\xe5\xb1\xb1cd')::bytea, 'escape')
 ! ERROR:  character with byte sequence 0xe5 0xb7 0x9d in encoding UTF8 has 
 no equivalent in encoding LATIN1
 ! CONTEXT:  PL/Perl function perl_utf_inout
 
 
 I am not sure what we can do here other than remove this function and
 query from the test.

I ran the regression tests only in environments capable of handling
the character U+5DDD (a Japanese character meaning "river")...

Which byte sequences can be decoded, and which byte sequence results
from encoding a given Unicode character, both vary among encodings.
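
To make that concrete, here is a small Python sketch. It is an
illustration only: Python codec names stand in for PostgreSQL server
encodings, and the helper encode_or_none is hypothetical. It encodes
U+5DDD under several encodings and decodes the test's input bytes under
two of them:

```python
# Illustration only: Python codecs stand in for PostgreSQL server encodings.
ch = "\u5ddd"


def encode_or_none(s, codec):
    """Encode s with codec, or return None if it has no representation."""
    try:
        return s.encode(codec)
    except UnicodeEncodeError:
        return None


# The same character maps to different bytes, or to nothing at all.
results = {
    codec: encode_or_none(ch, codec)
    for codec in ("utf-8", "euc_jp", "shift_jis", "latin-1")
}
for codec, raw in results.items():
    print(codec, raw.hex() if raw is not None else "no equivalent")

# The same bytes also decode differently: the test's input e5 b1 b1 is one
# character (U+5C71) in UTF-8 but three separate characters in LATIN1.
raw_input = b"\xe5\xb1\xb1"
as_utf8 = raw_input.decode("utf-8")
as_latin1 = raw_input.decode("latin-1")
print(len(as_utf8), len(as_latin1))
```

So a single expected-output file can only be correct for encodings that
agree on both directions of the conversion.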

The problem this thread set out to fix could be covered without the
additional test, which checks whether encoding/decoding is done as
expected when the language handler is called. I suppose that testing
the two existing cases plus one additional case that goes through
pg_do_encoding_conversion(), say latin1, would be enough to confirm
that encoding/decoding is done properly, since the concrete conversion
scheme is not significant in this case.

So I recommend that we add a test for latin1 and omit the test for
encodings other than sql_ascii, utf8 and latin1. This might be
achieved by creating empty plperl_lc.sql and plperl_lc.out files for
those encodings.

What do you think about that?


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

== My e-mail address has been changed since Apr. 1, 2012.



Re: [SPAM] [MessageLimit][lowlimit] Re: [HACKERS] pl/perl and utf-8 in sql_ascii databases

2012-07-10 Thread Alvaro Herrera

Excerpts from Kyotaro HORIGUCHI's message of mar jul 03 04:59:38 -0400 2012:
 Hello, here are regression test runs on PostgreSQL builds that were
 also made with cygwin-gcc and VC++.
 
 The attached patches are as follows:
 
 - plperl_sql_ascii-4.patch : fix for pl/perl utf8 vs sql_ascii
 - plperl_sql_ascii_regress-1.patch : regression test for this patch.
  I added some tests on encoding to this.
 
 I will mark this patch as 'ready for committer' after this.

I have pushed these changes to HEAD, 9.2 and 9.1.  Instead of the games
with plperl_lc_*.out being copied around, I just used the ASCII version
as plperl_lc_1.out and the UTF8 one as plperl_lc.out.
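
The pairing works because the regression driver accepts a test when its
output matches any of the variant expected files for that test. A rough
Python sketch of the idea follows. It is illustrative only: the function
name and the shortened file contents are hypothetical, and the real logic
lives in pg_regress and differs in detail.

```python
# Illustration only; the real comparison is done by pg_regress.
from typing import Dict, Optional


def first_matching_variant(actual: str, variants: Dict[str, str]) -> Optional[str]:
    """Return the name of the first expected file whose text equals actual."""
    for name, expected in variants.items():
        if actual == expected:
            return name
    return None


# Hypothetical, heavily shortened stand-ins for the two expected files.
variants = {
    "plperl_lc.out": "UTF8:ab\\345\\267\\235cd\n",
    "plperl_lc_1.out": "NotUTF8:ab\\345\\267\\235cd\n",
}

# Output from a SQL_ASCII database matches the second variant.
print(first_matching_variant("NotUTF8:ab\\345\\267\\235cd\n", variants))

# Output that is an encoding-conversion error matches neither variant,
# so the test is reported as failed.
print(first_matching_variant("ERROR:  character with byte sequence ...\n", variants))
```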

I chose to backpatch the whole thing instead of cherry-picking parts of
it; that was turning into a tedious and pointless exercise.  We'll see
how the buildfarm likes the whole thing -- including on MSVC, which
I did not test at all.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
