Re: [HACKERS] pg_conversion seems rather strangely defined

2016-01-11 Thread Tatsuo Ishii
>> I used to had a customer who needs to have different client and
>> database encoding than the default.  That is, a slightly different
>> mapping between Shift-JIS and other database encoding.  Due to
>> unfortunate historical reasons, there are several Shift-JIS variants
>> (in addition to the standard defined by government, there are IBM, NEC
>> and Microsoft versions).  This is the reason why I wanted to have the
>> functionality at that time.  I'm not sure the customer still uses the
>> functionality, but it is possible that there are other users who have
>> similar use cases, since the Shift-JIS variants are still used.
> 
> Hm.  Odd that we've not heard complaints about the removal of
> CONVERT(... USING ...), then.

Yeah, I'm not sure the customer updated to the newer version of
PostgreSQL.

> I think it would be a good idea at least to put back some equivalent
> of CONVERT(... USING ...), if for no other reason than that it would
> ease testing.  As I understood it, the argument to remove it was not
> that the functionality was bad, but that we were using a SQL-standard
> syntax for what we concluded was not SQL-standard functionality.
> I'd propose putting it back with a syntax of, say,
> 
>   convert_with(input bytea, conversion_name text) returns bytea
> 
> As for the client encoding conversion case, I still say a
> search-path-based lookup is a horrible idea, and furthermore there
> seems no very good reason why it has to be restricted to default
> conversions.  Aside from other arguments, that tends to push people
> to mark *every* conversion as default, which is outright silly if
> you have several competing ones.
> 
> As a sketch of an alternative, consider inventing a GUC named
> preferred_conversions or some such, which is a list of
> possibly-schema-qualified conversion names.  When establishing an
> original or new value of client_encoding, we look through the list
> for the first entry that exists and performs the desired encoding
> conversion (whether or not it is default).  If there is no match,
> look up the (unique) default conversion for the encoding pair, and
> use that.  (Obviously this has to be done twice, once for each
> direction, when setting up client encoding conversions.)  We could
> apply the same rules for identifying which specific conversion to use
> in convert() and siblings.

Sounds good to me.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_conversion seems rather strangely defined

2016-01-07 Thread Tom Lane
Tatsuo Ishii  writes:
>> It would only be important to be able to do it like that if different
>> users of the same database had conflicting ideas about what was the
>> correct conversion between client and database encodings.  I submit
>> that that's somewhere around epsilon probability, considering we have
>> not even heard of anyone replacing the system conversions at all.

> I used to had a customer who needs to have different client and
> database encoding than the default.  That is, a slightly different
> mapping between Shift-JIS and other database encoding.  Due to
> unfortunate historical reasons, there are several Shift-JIS variants
> (in addition to the standard defined by government, there are IBM, NEC
> and Microsoft versions).  This is the reason why I wanted to have the
> functionality at that time.  I'm not sure the customer still uses the
> functionality, but it is possible that there are other users who have
> similar use cases, since the Shift-JIS variants are still used.

Hm.  Odd that we've not heard complaints about the removal of
CONVERT(... USING ...), then.

I think it would be a good idea at least to put back some equivalent
of CONVERT(... USING ...), if for no other reason than that it would
ease testing.  As I understood it, the argument to remove it was not
that the functionality was bad, but that we were using a SQL-standard
syntax for what we concluded was not SQL-standard functionality.
I'd propose putting it back with a syntax of, say,

convert_with(input bytea, conversion_name text) returns bytea

As for the client encoding conversion case, I still say a
search-path-based lookup is a horrible idea, and furthermore there
seems no very good reason why it has to be restricted to default
conversions.  Aside from other arguments, that tends to push people
to mark *every* conversion as default, which is outright silly if
you have several competing ones.

As a sketch of an alternative, consider inventing a GUC named
preferred_conversions or some such, which is a list of
possibly-schema-qualified conversion names.  When establishing an
original or new value of client_encoding, we look through the list
for the first entry that exists and performs the desired encoding
conversion (whether or not it is default).  If there is no match,
look up the (unique) default conversion for the encoding pair, and
use that.  (Obviously this has to be done twice, once for each
direction, when setting up client encoding conversions.)  We could
apply the same rules for identifying which specific conversion to use
in convert() and siblings.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_conversion seems rather strangely defined

2016-01-07 Thread Tatsuo Ishii
> It would only be important to be able to do it like that if different
> users of the same database had conflicting ideas about what was the
> correct conversion between client and database encodings.  I submit
> that that's somewhere around epsilon probability, considering we have
> not even heard of anyone replacing the system conversions at all.

I used to had a customer who needs to have different client and
database encoding than the default.  That is, a slightly different
mapping between Shift-JIS and other database encoding.  Due to
unfortunate historical reasons, there are several Shift-JIS variants
(in addition to the standard defined by government, there are IBM, NEC
and Microsoft versions).  This is the reason why I wanted to have the
functionality at that time.  I'm not sure the customer still uses the
functionality, but it is possible that there are other users who have
similar use cases, since the Shift-JIS variants are still used.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_conversion seems rather strangely defined

2016-01-07 Thread Tom Lane
Noah Misch  writes:
> On Wed, Jan 06, 2016 at 11:56:14PM -0500, Tom Lane wrote:
>> Moreover, we have precedent both for this approach being a bad idea
>> and for us changing it without many complaints.  We used to have
>> search-path-dependent lookup of default index operator classes.
>> We found out that that was a bad idea and got rid of it, cf commit
>> 3ac1ac58c.  This situation looks much the same to me.

> Per the 3ac1ac58c log message, "CREATE OPERATOR CLASS only allows one default
> opclass per datatype regardless of schemas."  That had been true since day one
> for CREATE OPERATOR CLASS.  It doesn't hold for conversions today, and that's
> a crucial difference between that commit and this proposal.

Well, the state of affairs is slightly different, but that doesn't mean
it's not equally broken.  What I took away from the default-opclass fiasco
is that search-path-based lookup is a good idea only when you are looking
up something *by name*, so that the user can resolve any ambiguity by
schema-qualifying that name.  Searches that aren't based on a
user-specified name shouldn't depend on search_path.  This is important
because there are multiple conflicting concerns when you choose a
search_path setting, and you may not want to allow any particular search
to force your hand on what you put where in your search path.  If, for
example, you don't really want to put "public" on the front of your search
path, the current code gives you no way to select a conversion that's in
"public".

If we need to cater for alternative conversions, my preference would be a
GUC or some other kind of setting that allows selecting a client I/O
conversion by name, whether it is "default" or not.  But given the lack
of apparent demand, I don't really want to design and implement such a
feature right now.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_conversion seems rather strangely defined

2016-01-07 Thread Noah Misch
On Wed, Jan 06, 2016 at 11:56:14PM -0500, Tom Lane wrote:
> Noah Misch  writes:
> > On Tue, Jan 05, 2016 at 01:46:51PM -0500, Tom Lane wrote:
> >> I do not see a lot of point in the namespacing of encoding conversions
> >> either.  Does anyone really need or use search-path-dependent lookup of
> >> conversions?
> 
> > I have not issued CREATE CONVERSION except to experiment, and I have never
> > worked in a database in which someone else had created one.  Among PGXN
> > distributions, CREATE CONVERSION appears only in the pyrseas test suite.  It
> > could be hard to track down testimony on real-world usage patterns, but I
> > envision two credible patterns.  First, you could change the default search
> > path to "corrected_conversions, pg_catalog, $user, public" and inject fixed
> > versions of the system conversions.  One could use that to backport commit
> > 8d3e090.  Second, you could add conversions we omit entirely, like UTF8 ->
> > MULE_INTERNAL.  Dropping search-path-dependent lookup would remove the
> > supported way to fix system conversions.
> 
> The built-in conversions are very intentionally not pinned.  So to my
> mind, the supported way to replace one is to drop it and create your own.

I just learned something.  Interesting.

> The way you describe works only if an appropriate search path is installed
> at the time a new session activates its client encoding conversions.  TBH,
> I have no idea whether we apply (for example) "ALTER ROLE SET search_path"
> before that happens; but even if we do, there's no real guarantee that it
> wouldn't change.  We publish no documentation about the order of startup
> actions.  Moreover past attempts at defining dependencies between GUC
> settings have been spectacular failures, so I really don't want to go
> there in this context.
> 
> It would only be important to be able to do it like that if different
> users of the same database had conflicting ideas about what was the
> correct conversion between client and database encodings.  I submit
> that that's somewhere around epsilon probability, considering we have
> not even heard of anyone replacing the system conversions at all.
> 
> (Adding conversions we don't supply is, of course, orthogonal to this.)

Agreed on all those points.  Even so, I don't find that the need to set GUCs
in a particular order makes a good case for removing this ancient capability.
I _would_ send a new feature back for redesign on the strength of such a
defect, but that is different.

Independent from that dearth of positive cause to restrict this, users taking
the "DROP CONVERSION pg_catalog.foo" route get dump/restore problems.  pg_dump
doesn't notice that one dropped a pg_catalog conversion; the user would
manually repeat each drop before each restore.  That's especially awkward for
pg_upgrade.  I guess the user could drop each conversion in the new cluster's
template0, run pg_upgrade, and then recreate conversions in databases that had
not overridden them in the original cluster.  That's a mess.

> Moreover, we have precedent both for this approach being a bad idea
> and for us changing it without many complaints.  We used to have
> search-path-dependent lookup of default index operator classes.
> We found out that that was a bad idea and got rid of it, cf commit
> 3ac1ac58c.  This situation looks much the same to me.

Per the 3ac1ac58c log message, "CREATE OPERATOR CLASS only allows one default
opclass per datatype regardless of schemas."  That had been true since day one
for CREATE OPERATOR CLASS.  It doesn't hold for conversions today, and that's
a crucial difference between that commit and this proposal.

Thanks,
nm


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_conversion seems rather strangely defined

2016-01-06 Thread Tom Lane
Noah Misch  writes:
> On Tue, Jan 05, 2016 at 01:46:51PM -0500, Tom Lane wrote:
>> I do not see a lot of point in the namespacing of encoding conversions
>> either.  Does anyone really need or use search-path-dependent lookup of
>> conversions?

> I have not issued CREATE CONVERSION except to experiment, and I have never
> worked in a database in which someone else had created one.  Among PGXN
> distributions, CREATE CONVERSION appears only in the pyrseas test suite.  It
> could be hard to track down testimony on real-world usage patterns, but I
> envision two credible patterns.  First, you could change the default search
> path to "corrected_conversions, pg_catalog, $user, public" and inject fixed
> versions of the system conversions.  One could use that to backport commit
> 8d3e090.  Second, you could add conversions we omit entirely, like UTF8 ->
> MULE_INTERNAL.  Dropping search-path-dependent lookup would remove the
> supported way to fix system conversions.

The built-in conversions are very intentionally not pinned.  So to my
mind, the supported way to replace one is to drop it and create your own.
The way you describe works only if an appropriate search path is installed
at the time a new session activates its client encoding conversions.  TBH,
I have no idea whether we apply (for example) "ALTER ROLE SET search_path"
before that happens; but even if we do, there's no real guarantee that it
wouldn't change.  We publish no documentation about the order of startup
actions.  Moreover past attempts at defining dependencies between GUC
settings have been spectacular failures, so I really don't want to go
there in this context.

It would only be important to be able to do it like that if different
users of the same database had conflicting ideas about what was the
correct conversion between client and database encodings.  I submit
that that's somewhere around epsilon probability, considering we have
not even heard of anyone replacing the system conversions at all.

(Adding conversions we don't supply is, of course, orthogonal to this.)

Moreover, we have precedent both for this approach being a bad idea
and for us changing it without many complaints.  We used to have
search-path-dependent lookup of default index operator classes.
We found out that that was a bad idea and got rid of it, cf commit
3ac1ac58c.  This situation looks much the same to me.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_conversion seems rather strangely defined

2016-01-06 Thread Noah Misch
On Tue, Jan 05, 2016 at 01:46:51PM -0500, Tom Lane wrote:
> I do not see a lot of point in the namespacing of encoding conversions
> either.  Does anyone really need or use search-path-dependent lookup of
> conversions?

I have not issued CREATE CONVERSION except to experiment, and I have never
worked in a database in which someone else had created one.  Among PGXN
distributions, CREATE CONVERSION appears only in the pyrseas test suite.  It
could be hard to track down testimony on real-world usage patterns, but I
envision two credible patterns.  First, you could change the default search
path to "corrected_conversions, pg_catalog, $user, public" and inject fixed
versions of the system conversions.  One could use that to backport commit
8d3e090.  Second, you could add conversions we omit entirely, like UTF8 ->
MULE_INTERNAL.  Dropping search-path-dependent lookup would remove the
supported way to fix system conversions.

> (If they do, it's probably broken anyway, since for example
> we do not trouble to re-identify the client encoding conversion functions
> when search_path changes.)

That's bad in principle, but I'll guess it's tolerable in practice.  Switching
among implementations of a particular conversion might happen with O(weeks) or
longer period, like updating your system's iconv() conversion tables.  I can't
easily envision one application switching between implementations over the
course of a session.  (An application doing that today probably works around
the problem, perhaps with extra "SET client_encoding" calls.)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_conversion seems rather strangely defined

2016-01-05 Thread Tom Lane
I wrote:
> I still think however that search-path-based lookup of default encoding
> conversions is a Bad Idea, and that we only ought to allow one default
> conversion to exist for any pair of encodings.

> I realized that we could implement that without too much trouble by
> redefining pg_conversion.condefault as being true for default conversions
> and NULL (not false) for non-default ones.  With this definition, a
> unique index on pg_conversion (conforencoding, contoencoding, condefault)
> would have the behavior we want --- sort of a poor man's partial unique
> index, except that it would work correctly on a system catalog whereas
> a true partial index wouldn't.

Turns out that indeed that works just fine.  See attached draft patch.

regards, tom lane

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 97ef618..dcb8b8e 100644
*** a/doc/src/sgml/catalogs.sgml
--- b/doc/src/sgml/catalogs.sgml
***
*** 2538,2544 
condefault
bool

!   True if this is the default conversion
   
  
  
--- 2538,2545 
condefault
bool

!   True if this is the default conversion for these two encodings,
!else NULL
   
  
  
diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index f8c7ac3..eb39a9f 100644
*** a/doc/src/sgml/charset.sgml
--- b/doc/src/sgml/charset.sgml
*** $ psql -l
*** 1102,1108 
   pg_conversion system catalog.  PostgreSQL
   comes with some predefined conversions, as shown in . You can create a new
!  conversion using the SQL command CREATE CONVERSION.
  
  
   
--- 1102,1108 
   pg_conversion system catalog.  PostgreSQL
   comes with some predefined conversions, as shown in . You can create a new
!  conversion using the SQL command .
  
  
   
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index a0b42c2..f57d891 100644
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***
*** 1488,1494 
  original encoding is specified by
  src_encoding. The
  string must be valid in this encoding.
! Conversions can be defined by CREATE CONVERSION.
  Also there are some predefined conversions. See  for available conversions.
 
--- 1488,1494 
  original encoding is specified by
  src_encoding. The
  string must be valid in this encoding.
! Conversions can be defined with .
  Also there are some predefined conversions. See  for available conversions.
 
***
*** 1525,1534 
 
 bytea
 
! Convert string to dest_encoding.
 
 convert_to('some text', 'UTF8')
!some text represented in the UTF8 encoding

  

--- 1525,1535 
 
 bytea
 
! Convert string from the database encoding
! to dest_encoding.
 
 convert_to('some text', 'UTF8')
!some text represented in UTF8 encoding

  

diff --git a/doc/src/sgml/ref/create_conversion.sgml b/doc/src/sgml/ref/create_conversion.sgml
index d2e2c01..1933107 100644
*** a/doc/src/sgml/ref/create_conversion.sgml
--- b/doc/src/sgml/ref/create_conversion.sgml
*** CREATE [ DEFAULT ] CONVERSION 
 CREATE CONVERSION defines a new conversion between
!character set encodings.  Also, conversions that
 are marked DEFAULT can be used for automatic encoding
 conversion between
 client and server. For this purpose, two conversions, from encoding A to
--- 28,34 
  

 CREATE CONVERSION defines a new conversion between
!character set encodings.  Conversions that
 are marked DEFAULT can be used for automatic encoding
 conversion between
 client and server. For this purpose, two conversions, from encoding A to
*** CREATE [ DEFAULT ] CONVERSION 
 The DEFAULT clause indicates that this conversion
 is the default for this particular source to destination
!encoding. There should be only one default encoding in a schema
 for the encoding pair.

   
--- 53,59 

 The DEFAULT clause indicates that this conversion
 is the default for this particular source to destination
!encoding. There can be only one default conversion in a database
 for the encoding pair.

   
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 8b105fe..98977f6 100644
*** a/src/backend/catalog/namespace.c
--- b/src/backend/catalog/namespace.c
***
*** 28,34 
  #include "catalog/pg_authid.h"
  #include "catalog/pg_collation.h"
  #include "catalog/pg_conversion.h"
- #include "catalog/pg_conversion_fn.h"
  #include "catalog/pg_namespace.h"
  #include "catalog/pg_opclass.h"
  #include "catalog/pg_operator.h"
--- 28,33 

Re: [HACKERS] pg_conversion seems rather strangely defined

2016-01-05 Thread Tom Lane
I wrote:
> What is the point of pg_conversion.condefault (the flag that says whether
> a conversion is "default")?  AFAICS, there is absolutely no way to invoke
> a conversion that is not default, which means we might as well eliminate
> the concept.

> I do not see a lot of point in the namespacing of encoding conversions
> either.  Does anyone really need or use search-path-dependent lookup of
> conversions?  (If they do, it's probably broken anyway, since for example
> we do not trouble to re-identify the client encoding conversion functions
> when search_path changes.)

> While actually removing pg_conversion.connamespace might not be worth
> the trouble, it's mighty tempting to have just a single unique index on
> (conforencoding, contoencoding), thereby enforcing that There Can Be Only
> One conversion between any given pair of encodings, and then we can just
> use that index to find the right entry without any fooling with search
> path.

> But in any case we should get rid of the concept of defaultness, because
> it's pointless; all entries should be equally "default".

After further poking around, I realized that there *used* to be a way to
get at non-default encoding conversions, but it was removed here:

commit 02138357ffc8c41a3d646035368712a5394f1f5f
Author: Andrew Dunstan 
Date:   Mon Sep 24 01:29:30 2007 +

Remove "convert 'blah' using conversion_name" facility, because if it
produces text it is an encoding hole and if not it's incompatible
with the spec, whatever the spec means (which we're not sure about anyway).

Now, the big problem there was that the function could produce values of
type "text" that violate the current database encoding.  We could have
retained the facility by making the function take and return bytea, as
convert() does today, but apparently nobody spoke up for the need then
or since.

Still, somebody might want it someday, and it would be pretty messy
to remove the concept of a default encoding only to have to put it
back later.

I still think however that search-path-based lookup of default encoding
conversions is a Bad Idea, and that we only ought to allow one default
conversion to exist for any pair of encodings.

I realized that we could implement that without too much trouble by
redefining pg_conversion.condefault as being true for default conversions
and NULL (not false) for non-default ones.  With this definition, a
unique index on pg_conversion (conforencoding, contoencoding, condefault)
would have the behavior we want --- sort of a poor man's partial unique
index, except that it would work correctly on a system catalog whereas
a true partial index wouldn't.

Thoughts?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_conversion seems rather strangely defined

2016-01-05 Thread Tom Lane
What is the point of pg_conversion.condefault (the flag that says whether
a conversion is "default")?  AFAICS, there is absolutely no way to invoke
a conversion that is not default, which means we might as well eliminate
the concept.

I do not see a lot of point in the namespacing of encoding conversions
either.  Does anyone really need or use search-path-dependent lookup of
conversions?  (If they do, it's probably broken anyway, since for example
we do not trouble to re-identify the client encoding conversion functions
when search_path changes.)

While actually removing pg_conversion.connamespace might not be worth
the trouble, it's mighty tempting to have just a single unique index on
(conforencoding, contoencoding), thereby enforcing that There Can Be Only
One conversion between any given pair of encodings, and then we can just
use that index to find the right entry without any fooling with search
path.

But in any case we should get rid of the concept of defaultness, because
it's pointless; all entries should be equally "default".

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers