On 27 August 2013 20:06, Heikki Linnakangas <hlinnakan...@vmware.com> wrote:
> On 26.08.2013 18:59, Tom Lane wrote: > >> Heikki Linnakangas<hlinnakangas@**vmware.com <hlinnakan...@vmware.com>> >> writes: >> >> The pg_dump -E option just sets client_encoding, but I think it would be >>> better for -E to only set the encoding used in the dump, and >>> PGCLIENTENCODING env variable (if set) was used to determine the >>> encoding of the command-line arguments. Opinions? >>> >> >> I think this is going to be a lot easier said than done, but feel free >> to see if you can make it work. (As you point out, we don't have >> any client-side encoding conversion infrastructure, but I don't see >> how you're going to make this work without it.) >> > > First set client_encoding to PGCLIENTENCODING (ie. let libpq do its > default thing), and run the queries that fetch the OIDs of any matching > tables. Only after doing that, set client_encoding to that specified by -E. > That's actually pretty easy to do, as pg_dump already issues all the > queries that include user-given strings first. > > There's one small wrinkle in that: if the dump fails because the server > sends an error, the error would come from the server in the -E encoding, > because that's used as the client_encoding after the initial queries. > Logically, the errors should be printed in PGCLIENTENCODING. But I think we > can live with that. > > The pg_restore part is more difficult, as pg_restore needs to work without > a database connection at all. There the conversion has to be done > client-side. > > > A second issue is whether we should divorce -E and PGCLIENTENCODING like >> that, when they have always meant the same thing. You mentioned the >> alternative of looking at pg_dump's locale environment to determine the >> command line encoding --- would that be better? >> > > Hmm. I wasn't thinking of looking at the locale environment as an > alternative to the divorce, just as a way to determine the default for the > client encoding if PGCLIENTENCODING is not set. > > It would be good for pg_dump to be consistent with other tools (reindexdb > etc.). If we say that LC_CTYPE determines the encoding of command-line > options, regardless of PGCLIENTENCODING, we should do the same in other > tools, too. And that would be weird for the other tools. So no, I don't > think we should do that. > > My plan is to assume that the command-line options are in the "client > encoding", and the client encoding is determined by the following things, > in this order: > > 1. client_encoding setting given explicitly in the command line, as part > of the connection string. > 2. PGCLIENTENCODING > 3. LC_CTYPE > 4. same as server_encoding > > The encoding to be used in the pg_dump output is a different concept, and > there are reasons to *not* want the dump to be in the client encoding. > Hence using server_encoding for that would be a better default than the > current client encoding. This isn't so visible today, as client_encoding > defaults to server_encoding anyway, but it will be if we set > client_encoding='auto' (ie. look at LC_CTYPE) by default. > > There are certainly cases where you'd want to use the client encoding as > the dump output encoding. For example if you pipe the pg_dump output to > some custom script that mangles it. Or if you just want to look at the > output, ie. if the output goes to a terminal. But for the usual case of > taking a backup, using the server encoding makes more sense. One option is > to check if the output is a tty (like psql does), and output the dump in > client or server encoding depending on that (if -E is not given). > > So yeah, I think we should divorce -E and PGCLIENTENCODING. It feels that > using PGCLIENTENCODING to specify the output encoding was more accidental > than on purpose. Looking at the history, the implementation has been that > way forever, but the docs were adjusted by commit b524cb36 in 2005 to > document that fact. > > Here is a set of three patches that I've been working on: > > 0001-Divorce-pg_dump-E-option-**from-PGCLIENTENCODING.patch > > Separates pg_dump -E from PGCLIENTENCODING. > Since AH->encoding is now used only to represent dump encoding, we should rename it as AH->dump_encoding. ----- The standard_conforming_strings parameter is currently set in setup_connection. You have moved it in setup_dump_encoding(). Is it in any way related to encoding ? If not, I think we should keep it in setup_connection(). ----- If a user supplies pg_dump --role=ROLENAME, we do the SET-ROLE in setup_connection. The role name is in client encoding. In case of parallel pg_dump, the setup_connection() in the parent process does it right, but in the worker processes, the client_encoding is already set to the dump_encoding when -E is given (parent has already updated it's client_encoding to dump_encoding and I think the child inherits client_encoding from parent), and then in the child when SET-ROLE is called through setup_connection, it does not work because role name is in client encoding, not dump encoding. $ pg_dump -t "pöö" -E LATIN1 --role=uöö postgres # Above succeeds $ `which pg_dump` -t "pöö" -E LATIN1 -j 5 -f dumpdir -Fd postgres --role=uöö pg_dump: [archiver (db)] query failed: ERROR: role "uöö" does not exist pg_dump: [parallel archiver] query was: SET ROLE "uöö" pg_dump: [archiver (db)] query failed: ERROR: role "uöö" does not exist pg_dump: [archiver (db)] query failed: ERROR: role "uöö" does not exist pg_dump: [archiver (db)] query failed: ERROR: role "uöö" does not exist pg_dump: [archiver (db)] query failed: ERROR: role "uöö" does not exist > > 0002-Set-client_encoding-auto-**in-all-client-utilities.patch > Set client_encoding='auto' in all the client utilities, like we already > did in psql. This fixes e.g "reindexdb -t" with a table name with non-ASCII > chars > > This patch looks good, I don't have any issues with this. > 0003-Convert-object-names-to-**archive-encoding-before-matc.**patch > > Use iconv(3) in pg_restore to do encoding conversion in the client. This > involves a lot of autoconf changes that I'm not 100% sure about, other than > that it's pretty straightforward. I haven't looked into this one yet. > > > - Heikki > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > >