[HACKERS] [PATCH] Remove trailing spaces

2017-03-29 Thread Alexander Law

Hello,

Please consider committing the attached patches to remove trailing 
spaces in strings in the source code.
One patch is for localizable messages, and the other is just for 
consistency (less important).
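
A check along the following lines could flag such strings before they reach translators. It is only a sketch: `has_space_before_newline` is a hypothetical helper for illustration, not part of the attached patches or of the PostgreSQL tree.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return 1 if the string contains a space or tab
 * immediately before a newline, 0 otherwise.
 */
int
has_space_before_newline(const char *s)
{
	size_t		i;

	assert(s != NULL);
	for (i = 0; s[i] != '\0'; i++)
	{
		if ((s[i] == ' ' || s[i] == '\t') && s[i + 1] == '\n')
			return 1;
	}
	return 0;
}
```

A string that merely ends in a space without a following newline (such as the first half of a split literal) is deliberately not reported.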


--
Alexander Lakhin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index bfe44b8..3010723 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -683,7 +683,7 @@ usage(void)
 	printf(_("%s decodes and displays PostgreSQL transaction logs for debugging.\n\n"),
 		   progname);
 	printf(_("Usage:\n"));
-	printf(_("  %s [OPTION]... [STARTSEG [ENDSEG]] \n"), progname);
+	printf(_("  %s [OPTION]... [STARTSEG [ENDSEG]]\n"), progname);
 	printf(_("\nOptions:\n"));
 	printf(_("  -b, --bkp-details  output detailed information about backup blocks\n"));
 	printf(_("  -e, --end=RECPTR   stop reading at log position RECPTR\n"));
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 27155f8..1e59fc8 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -3082,7 +3082,7 @@ keep_going:		/* We will come back to here until there is
 restoreErrorMessage(conn, &savedMessage);
 appendPQExpBuffer(&conn->errorMessage,
   libpq_gettext("test \"show transaction_read_only\" failed "
-" on \"%s:%s\" \n"),
+" on \"%s:%s\"\n"),
   conn->connhost[conn->whichhost].host,
   conn->connhost[conn->whichhost].port);
 conn->status = CONNECTION_OK;
diff --git a/contrib/oid2name/oid2name.c b/contrib/oid2name/oid2name.c
index 778e8ba..ec93e4b 100644
--- a/contrib/oid2name/oid2name.c
+++ b/contrib/oid2name/oid2name.c
@@ -506,20 +506,20 @@ sql_exec_searchtables(PGconn *conn, struct options * opts)
 	/* now build the query */
 	todo = psprintf(
 	"SELECT pg_catalog.pg_relation_filenode(c.oid) as \"Filenode\", relname as \"Table Name\" %s\n"
-	"FROM pg_catalog.pg_class c \n"
-		"	LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace \n"
+	"FROM pg_catalog.pg_class c\n"
+		"	LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace\n"
 	"	LEFT JOIN pg_catalog.pg_database d ON d.datname = pg_catalog.current_database(),\n"
-	"	pg_catalog.pg_tablespace t \n"
+	"	pg_catalog.pg_tablespace t\n"
 	"WHERE relkind IN (" CppAsString2(RELKIND_RELATION) ","
 	CppAsString2(RELKIND_MATVIEW) ","
 	CppAsString2(RELKIND_INDEX) ","
 	CppAsString2(RELKIND_SEQUENCE) ","
-	CppAsString2(RELKIND_TOASTVALUE) ") AND \n"
+	CppAsString2(RELKIND_TOASTVALUE) ") AND\n"
 	"		t.oid = CASE\n"
 			"			WHEN reltablespace <> 0 THEN reltablespace\n"
 	"			ELSE dattablespace\n"
-	"		END AND \n"
-	"  (%s) \n"
+	"		END AND\n"
+	"  (%s)\n"
 	"ORDER BY relname\n",
 	opts->extended ? addfields : "",
 	qualifiers);
diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 62784af..7c0725c 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -587,7 +587,7 @@ ExecMergeTupleDump(MergeJoinState *mergestate)
 	ExecMergeTupleDumpInner(mergestate);
 	ExecMergeTupleDumpMarked(mergestate);
 
-	printf(" \n");
+	printf("\n");
 }
 #endif
 
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 262f553..91319a8 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -13900,12 +13900,12 @@ dumpTSConfig(Archive *fout, TSConfigInfo *cfginfo)
 
 	resetPQExpBuffer(query);
 	appendPQExpBuffer(query,
-	  "SELECT \n"
-	  "  ( SELECT alias FROM pg_catalog.ts_token_type('%u'::pg_catalog.oid) AS t \n"
-	  "WHERE t.tokid = m.maptokentype ) AS tokenname, \n"
-	  "  m.mapdict::pg_catalog.regdictionary AS dictname \n"
-	  "FROM pg_catalog.pg_ts_config_map AS m \n"
-	  "WHERE m.mapcfg = '%u' \n"
+	  "SELECT\n"
+	  "  ( SELECT alias FROM pg_catalog.ts_token_type('%u'::pg_catalog.oid) AS t\n"
+	  "WHERE t.tokid = m.maptokentype ) AS tokenname,\n"
+	  "  m.mapdict::pg_catalog.regdictionary AS dictname\n"
+	  "FROM pg_catalog.pg_ts_config_map AS m\n"
+	  "WHERE m.mapcfg = '%u'\n"
 	  "ORDER BY m.mapcfg, m.maptokentype, m.mapseqno",
 	  cfginfo->cfgparser, cfginfo->dobj.catId.oid);
 
diff --git a/src/bin/pg_rewind/libpq_fetch.c b/src/bin/pg_rewind/libpq_fetch.c
index d86ecd3..eb74d2f 100644
--- a/src/bin/pg_rewind/libpq_fetch.c
+++ b/src/bin/pg_rewind/libpq_fetch.c
@@ -489,7 +489,7 @@ libpq_executeFileMap(filemap_t *map)
 	 * temporary table. Now, actually fetch all of those ranges.
 	 */
 	sql =
-		"SELECT path, begin, \n"
+		"SELECT path, begin,\n"
 		"  pg_read_binary_file(path, begin, len, true) AS chunk\n"
 		"FROM fetchchunks\n";
 
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index b0f3e5e..a623630 100644
--- a/src/bin/psql/describe.c
+++ 

[HACKERS] Re: [COMMITTERS] pgsql: Build HTML documentation using XSLT stylesheets by default

2016-12-15 Thread Alexander Law

Hello Alvaro,

It's caused by the condition
...
in the simple.xlink template 
(docbook/stylesheet/docbook-xsl/xhtml/inline.xsl). (This test is executed 
for each xlink (~ 9 times).)

Yes, it's inefficient, but it doesn't affect build time (for me).
You can try applying the attached patch and measuring the time with it.
So if the performance is acceptable now, I'd continue the switch to 
XML and get back to the performance issues after the switch.
(epub generation is much slower, and I have developed a patch to 
speed it up too.)


Best regards,
Alexander

01.12.2016 19:49, Alvaro Herrera wrote:

Pavel Stehule wrote:


It does much more intensive work with IO - I have feeling like there are
intensive fsync.

You could prove that, by running "make html" under "strace -f -e
trace=fsync" etc.  I just tried that, and I don't see any fsync.  I
guess you could try other syscalls, or simply "-e trace=file".  Doing
the latter I noticed an absolutely stupid number of attempts to open
file
/usr/lib/libxslt-plugins/nwalsh_com_xslt_ext_com_nwalsh_saxon_UnwrapLinks.so
which deserves a WTF.



diff --git a/doc/src/sgml/stylesheet-speedup-xhtml.xsl b/doc/src/sgml/stylesheet-speedup-xhtml.xsl
index ff08bef..60c9fd4 100644
--- a/doc/src/sgml/stylesheet-speedup-xhtml.xsl
+++ b/doc/src/sgml/stylesheet-speedup-xhtml.xsl
@@ -1,5 +1,6 @@
 
 <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
+xmlns:xlink="http://www.w3.org/1999/xlink"
 xmlns="http://www.w3.org/1999/xhtml"
 version='1.0'>
 
@@ -292,4 +293,189 @@
   
 
 
+[Body of the added template: the XML markup was lost in the list archive; only the text nodes "_blank", "_top", "1", "0" and "XLink to nonexistent id: " survive.]
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BUG #7493: Postmaster messages unreadable in a Windows console

2013-06-24 Thread Alexander Law

Hello Noah,

Thanks for your work, your patch is definitely better. I agree that this 
approach is much more generic.

23.06.2013 20:53, Noah Misch wrote:

The attached revision fixes all above points.  Would you look it over?  The
area was painfully light on comments, so I added some.  I renamed
pgwin32_toUTF16(), which ceases to be a good name now that it converts from
message encoding, not database encoding.  GetPlatformEncoding() became unused,
so I removed it.  (If we have cause to reintroduce the exact same concept later,
GetTTYEncoding() would name it more accurately.)

Yes, the patch works for me. I have just one small question about 
pgwin32_message_to_UTF16. Do we need to convert SQL_ASCII through UTF8, 
or should SQL_ASCII be mapped to code page 20127 (US-ASCII, 7-bit)?
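
The two alternatives in this question can be sketched as a lookup that either yields a Windows code page for direct conversion or signals that the text has to go through UTF-8 first. The enum, the table and `demo_choose_codepage` are illustrative assumptions and not the real `pg_enc2name_tbl`; 20127, 65001 and 1251 are the Windows code page identifiers for US-ASCII, UTF-8 and Windows-1251.

```c
#include <assert.h>

/* Illustrative subset of encodings; not PostgreSQL's real encoding enum. */
typedef enum
{
	DEMO_SQL_ASCII,
	DEMO_UTF8,
	DEMO_WIN1251,
	DEMO_MULE_INTERNAL,
	DEMO_ENC_COUNT
} demo_enc;

/* 0 means "no Windows code page known; convert through UTF-8 first". */
static const unsigned demo_codepage[DEMO_ENC_COUNT] = {
	0,							/* SQL_ASCII */
	65001,						/* UTF8 */
	1251,						/* WIN1251 */
	0							/* MULE_INTERNAL */
};

/*
 * Hypothetical helper: pick the code page for a message encoding.
 * With map_sql_ascii_to_us_ascii set, SQL_ASCII is treated as 7-bit
 * US-ASCII (20127) instead of taking the UTF-8 detour.
 */
unsigned
demo_choose_codepage(demo_enc enc, int map_sql_ascii_to_us_ascii)
{
	assert((unsigned) enc < DEMO_ENC_COUNT);
	if (enc == DEMO_SQL_ASCII && map_sql_ascii_to_us_ascii)
		return 20127;
	return demo_codepage[enc];
}
```

Whether the 20127 mapping is appropriate depends on how bytes above 0x7F in an SQL_ASCII message should be shown, which the sketch does not address.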

What should we do for the back branches, if anything?  Fixes for garbled
display on consoles and event logs are fair to back-patch, but users might be
accustomed to the present situation for SQL_ASCII databases.  Given the low
incidence of complaints and the workaround of using logging_collector, I am
inclined to put the whole thing in master only.

I thought that the change could be a first step toward normalizing the 
PostgreSQL log encoding. Today the log may contain messages in 
different encodings (we had a long discussion a year ago: 
http://www.postgresql.org/message-id/5007c399.6000...@gmail.com).
Now the new function GetMessageEncoding makes it possible to convert all 
messages consistently. If a future log encoding fix is considered 
important enough to be backported, then this patch should be 
backported too.


Thanks again!

Best regards,
Alexander





Re: [HACKERS] BUG #7493: Postmaster messages unreadable in a Windows console

2013-02-20 Thread Alexander Law

Hello,

15.02.2013 02:59, Noah Misch wrote:

With your proposed change, the problem will resurface in an actual SQL_ASCII
database.  At the problem's root is write_console()'s assumption that messages
are in the database encoding.  pg_bind_textdomain_codeset() tries to make that
so, but it only works for encodings with a pg_enc2gettext_tbl entry.  That
excludes SQL_ASCII, MULE_INTERNAL, and others.  write_console() needs to
behave differently in such cases.

Thank you for the notice. So it seems that the DatabaseEncoding variable
alone can't represent both the database encoding (for communication with a
client) and the encoding of the current process's messages (for logging) at
once. There should be another variable, something like
CurrentProcessEncoding, that is set to the OS encoding at start and can
be changed to the encoding of a connected database (if
bind_textdomain_codeset succeeded).

I'd call it MessageEncoding unless it corresponds with similar rigor to a
broader concept.

Please look at the next version of the patch.

Thanks,
Alexander
From 5bce21326d48761c6f86be8797432a69b2533dcd Mon Sep 17 00:00:00 2001
From: Alexander Lakhin exclus...@gmail.com
Date: Wed, 20 Feb 2013 15:34:05 +0400
Subject: Fix postmaster messages encoding

---
 src/backend/main/main.c|2 ++
 src/backend/utils/error/elog.c |4 ++--
 src/backend/utils/mb/mbutils.c |   24 ++--
 3 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index 1173bda..ed4067e 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -100,6 +100,8 @@ main(int argc, char *argv[])
 
 	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN(postgres));
 
+	SetMessageEncoding(GetPlatformEncoding());
+
 #ifdef WIN32
 
 	/*
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 3a211bf..40f20f3 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -1868,7 +1868,7 @@ write_eventlog(int level, const char *line, int len)
 	 * Also verify that we are not on our way into error recursion trouble due
 	 * to error messages thrown deep inside pgwin32_toUTF16().
 	 */
-	if (GetDatabaseEncoding() != GetPlatformEncoding() &&
+	if (GetMessageEncoding() != GetPlatformEncoding() &&
 		!in_error_recursion_trouble())
 	{
 		utf16 = pgwin32_toUTF16(line, len, NULL);
@@ -1915,7 +1915,7 @@ write_console(const char *line, int len)
 	 * through to writing unconverted if we have not yet set up
 	 * CurrentMemoryContext.
 	 */
-	if (GetDatabaseEncoding() != GetPlatformEncoding() &&
+	if (GetMessageEncoding() != GetPlatformEncoding() &&
 		!in_error_recursion_trouble() &&
 		!redirection_done &&
 		CurrentMemoryContext != NULL)
diff --git a/src/backend/utils/mb/mbutils.c b/src/backend/utils/mb/mbutils.c
index 287ff80..8b51b78 100644
--- a/src/backend/utils/mb/mbutils.c
+++ b/src/backend/utils/mb/mbutils.c
@@ -57,6 +57,7 @@ static FmgrInfo *ToClientConvProc = NULL;
  */
 static pg_enc2name *ClientEncoding = &pg_enc2name_tbl[PG_SQL_ASCII];
 static pg_enc2name *DatabaseEncoding = &pg_enc2name_tbl[PG_SQL_ASCII];
+static pg_enc2name *MessageEncoding = &pg_enc2name_tbl[PG_SQL_ASCII];
 static pg_enc2name *PlatformEncoding = NULL;
 
 /*
@@ -881,6 +882,16 @@ SetDatabaseEncoding(int encoding)
 	Assert(DatabaseEncoding->encoding == encoding);
 }
 
+void
+SetMessageEncoding(int encoding)
+{
+	if (!PG_VALID_BE_ENCODING(encoding))
+		elog(ERROR, "invalid message encoding: %d", encoding);
+
+	MessageEncoding = &pg_enc2name_tbl[encoding];
+	Assert(MessageEncoding->encoding == encoding);
+}
+
 /*
  * Bind gettext to the codeset equivalent with the database encoding.
  */
@@ -915,6 +926,8 @@ pg_bind_textdomain_codeset(const char *domainname)
 			if (bind_textdomain_codeset(domainname,
 		pg_enc2gettext_tbl[i].name) == NULL)
 				elog(LOG, "bind_textdomain_codeset failed");
+			else
+				SetMessageEncoding(encoding);
 			break;
 		}
 	}
@@ -964,6 +977,13 @@ GetPlatformEncoding(void)
 	return PlatformEncoding->encoding;
 }
 
+int
+GetMessageEncoding(void)
+{
+	Assert(MessageEncoding);
+	return MessageEncoding->encoding;
+}
+
 #ifdef WIN32
 
 /*
@@ -977,7 +997,7 @@ pgwin32_toUTF16(const char *str, int len, int *utf16len)
 	int			dstlen;
 	UINT		codepage;
 
-	codepage = pg_enc2name_tbl[GetDatabaseEncoding()].codepage;
+	codepage = pg_enc2name_tbl[GetMessageEncoding()].codepage;
 
 	/*
 	 * Use MultiByteToWideChar directly if there is a corresponding codepage,
@@ -994,7 +1014,7 @@ pgwin32_toUTF16(const char *str, int len, int *utf16len)
 		char	   *utf8;
 
 		utf8 = (char *) pg_do_encoding_conversion((unsigned char *) str,
-		len, GetDatabaseEncoding(), PG_UTF8);
+		len, GetMessageEncoding(), PG_UTF8);
 		if (utf8 != str)
 			len = strlen(utf8);
 
-- 
1.7.10.4




Re: [HACKERS] BUG #7493: Postmaster messages unreadable in a Windows console

2013-02-13 Thread Alexander Law

Hello,

Alexander Law exclus...@gmail.com writes:

Please look at the following l10n bug:
http://www.postgresql.org/message-id/502a26f1.6010...@gmail.com
and the proposed patch.

With your proposed change, the problem will resurface in an actual SQL_ASCII
database.  At the problem's root is write_console()'s assumption that messages
are in the database encoding.  pg_bind_textdomain_codeset() tries to make that
so, but it only works for encodings with a pg_enc2gettext_tbl entry.  That
excludes SQL_ASCII, MULE_INTERNAL, and others.  write_console() needs to
behave differently in such cases.

Thank you for the notice. So it seems that the DatabaseEncoding variable 
alone can't represent both the database encoding (for communication with a 
client) and the encoding of the current process's messages (for logging) at 
once. There should be another variable, something like 
CurrentProcessEncoding, that is set to the OS encoding at start and can 
be changed to the encoding of a connected database (if 
bind_textdomain_codeset succeeded).
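
The idea can be modelled in a few lines of standalone C. This is a minimal sketch under stated assumptions: every `demo_` name is hypothetical, the codeset table is a made-up subset standing in for `pg_enc2gettext_tbl`, and the place where a real implementation would call `bind_textdomain_codeset()` is only marked by a comment.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative subset of encodings; not PostgreSQL's real encoding enum. */
typedef enum
{
	DEMO_SQL_ASCII,
	DEMO_UTF8,
	DEMO_WIN1251,
	DEMO_MULE_INTERNAL
} demo_enc;

/* Encodings for which gettext has a codeset name (made-up subset). */
static const struct
{
	demo_enc	enc;
	const char *codeset;
}			demo_gettext_tbl[] = {
	{DEMO_UTF8, "UTF-8"},
	{DEMO_WIN1251, "CP1251"}
};

/* Encoding in which the current process emits its messages. */
static demo_enc demo_msg_encoding = DEMO_SQL_ASCII;

/* At process start, messages are in the OS encoding. */
void
demo_process_start(demo_enc os_encoding)
{
	demo_msg_encoding = os_encoding;
}

/*
 * On connecting to a database, switch the message encoding only if the
 * database encoding has a gettext codeset. Returns 1 if it switched.
 */
int
demo_connect_database(demo_enc db_encoding)
{
	size_t		i;

	for (i = 0; i < sizeof(demo_gettext_tbl) / sizeof(demo_gettext_tbl[0]); i++)
	{
		if (demo_gettext_tbl[i].enc == db_encoding)
		{
			assert(demo_gettext_tbl[i].codeset != NULL);
			/* A real version would call bind_textdomain_codeset() here
			 * and switch only if that call succeeds. */
			demo_msg_encoding = db_encoding;
			return 1;
		}
	}
	return 0;
}

demo_enc
demo_message_encoding(void)
{
	return demo_msg_encoding;
}
```

With this split, connecting to an SQL_ASCII database leaves the message encoding at the OS encoding, which is the case that the single DatabaseEncoding variable could not express.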



On Tue, Feb 12, 2013 at 03:22:17AM +, Greg Stark wrote:

But that said I'm not sure saying the whole file is in an encoding is
the right approach. Paths are actually binary strings. any encoding is
purely for display purposes anyways.

For Unix, yes.  On Windows, they're ultimately UTF16 strings; some system APIs
accept paths in the Windows ANSI code page and convert to UTF16 internally.
Nonetheless, good point.

Yes, and if postgresql.conf is not going to be UTF16-encoded, it seems 
natural to use the ANSI code page on Windows to write such paths in it.
So the paths should be written in the OS encoding, which is accepted by OS 
functions such as fopen. (This is what we have now.)
And it seems too complicated to have different encodings in one file. Or 
maybe path parameters should be separated from the others, for which the OS 
encoding is undesirable.

If we knew that postgresql.conf was stored in, say, UTF8, then it would
probably be possible to perform encoding conversion to get string
variables into the database encoding.  Perhaps we should allow some
magic syntax to tell us the encoding of a config file?

 file_encoding = 'utf8'  # must precede any non-ASCII in the file

If we're going to do that we might as well use the Emacs standard
-*-coding: latin-1;-*-


Explicit encoding specifications such as these (or even <?xml 
version="1.0" encoding="utf-8"?>) can be useful, but what encoding should 
be assumed without one? For XML (without a BOM) it's UTF-8; for Emacs it 
depends on its language environment.
If postgresql.conf doesn't have to be portable (as XML is), then IMO the OS 
encoding is the right choice for it.
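
The fallback rule argued for here (use an explicit declaration if there is one, otherwise assume the OS encoding) could look roughly like this. It is a sketch only: `detect_file_encoding` is a hypothetical helper, it recognizes just the Emacs-style cookie quoted above, and it expects the encoding name to be followed by `;`, whitespace or the end of the line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: if `line` carries an Emacs-style
 * "-*- coding: NAME; -*-" cookie, copy NAME into `out` and return 1.
 * Otherwise copy `os_encoding` into `out` and return 0.
 */
int
detect_file_encoding(const char *line, const char *os_encoding,
					 char *out, size_t outsize)
{
	const char *p;
	size_t		n = 0;

	assert(line != NULL && os_encoding != NULL && out != NULL && outsize > 0);

	p = strstr(line, "coding:");
	if (p != NULL && strstr(line, "-*-") != NULL)
	{
		p += strlen("coding:");
		while (*p == ' ' || *p == '\t')
			p++;
		/* The name ends at ';', whitespace, or end of string. */
		while (p[n] != '\0' && p[n] != ';' && p[n] != ' ' &&
			   p[n] != '\t' && p[n] != '\n' && n + 1 < outsize)
			n++;
		if (n > 0)
		{
			memcpy(out, p, n);
			out[n] = '\0';
			return 1;
		}
	}

	/* No usable declaration: fall back to the OS encoding. */
	strncpy(out, os_encoding, outsize - 1);
	out[outsize - 1] = '\0';
	return 0;
}
```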



Best regards,
Alexander


Re: [HACKERS] BUG #7493: Postmaster messages unreadable in a Windows console

2013-01-29 Thread Alexander Law

30.01.2013 05:51, Noah Misch wrote:

On Tue, Jan 29, 2013 at 09:54:04AM -0500, Tom Lane wrote:

Alexander Law exclus...@gmail.com writes:

Please look at the following l10n bug:
http://www.postgresql.org/message-id/502a26f1.6010...@gmail.com
and the proposed patch.

That patch looks entirely unsafe to me.  Neither of those functions
should be expected to be able to run when none of our standard
infrastructure (palloc, elog) is up yet.

Possibly it would be safe to do this somewhere around where we do
GUC initialization.


Looking at elog.c:write_console, bootstrap.c:AuxiliaryProcessMain, and 
mcxt.c:MemoryContextInit, I would place this call 
(SetDatabaseEncoding(GetPlatformEncoding())) at MemoryContextInit.
(The pgwin32_toUTF16 conversion branch is not executed while 
CurrentMemoryContext is NULL.)


But I see some calls to ereport before MemoryContextInit. Is that OK, or 
should the MemoryContext initialization be done earlier?

For example, main.c:main -> pgwin32_signal_initialize -> ereport

And there is another issue with elog.c:write_stderr:
if pgwin32_is_service() is true, the process writes the message to the Windows 
event log (write_eventlog), trying to convert it to UTF16. But it doesn't 
check the MemoryContext before the call to pgwin32_toUTF16 (as write_console 
does), and we can get a crash in the following way:
main.c:check_root -> if (pgwin32_is_admin()) write_stderr -> if 
(pgwin32_is_service()) write_eventlog -> if (GetDatabaseEncoding() 
!= GetPlatformEncoding()) pgwin32_toUTF16 -> crash


So placing SetDatabaseEncoding(GetPlatformEncoding()) before 
check_root can be a solution for the issue.
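
For comparison, the kind of guard that write_console applies and write_eventlog lacks can be written as a standalone predicate. This is an illustration of the check only, not the fix proposed above: `demo_state` and `demo_can_convert_to_utf16` are hypothetical stand-ins for the backend's globals and are not PostgreSQL code.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins for backend state; not the real globals. */
typedef struct
{
	int			message_encoding;
	int			platform_encoding;
	int			in_error_recursion;
	void	   *current_memory_context;	/* NULL until memory contexts exist */
} demo_state;

/*
 * Return 1 only when a UTF-16 conversion is both needed (the encodings
 * differ) and safe (no error recursion, memory contexts are set up).
 */
int
demo_can_convert_to_utf16(const demo_state *st)
{
	assert(st != NULL);
	return st->message_encoding != st->platform_encoding &&
		!st->in_error_recursion &&
		st->current_memory_context != NULL;
}
```

When the predicate is false, the caller would write the message unconverted instead of calling the conversion routine.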



Even then, I wouldn't be surprised to find problematic consequences beyond
error display.  What if all the databases are EUC_JP, the platform encoding is
KOI8, and some postgresql.conf settings contain EUC_JP characters?  Does the
postmaster not rely on its use of SQL_ASCII to allow those values?

I would look at fixing this by making the error output machinery smarter in
this area before changing the postmaster's notion of server_encoding.

Maybe I am still missing something, but I thought that 
postinit.c/CheckMyDatabase would switch the encoding of messages to EUC_JP 
via pg_bind_textdomain_codeset, so there would be no issue with it. 
But until then KOI8 should be used.
Regarding postgresql.conf, as it has no explicit encoding specification, 
it should be interpreted as having the platform encoding. So in your 
example it should contain KOI8, not EUC_JP, characters.


Thanks,
Alexander




Re: [HACKERS] BUG #7493: Postmaster messages unreadable in a Windows console

2013-01-28 Thread Alexander Law

Hello,
Thanks for fixing bug #6510!
Please look at the following l10n bug:
http://www.postgresql.org/message-id/502a26f1.6010...@gmail.com
and the proposed patch.

Best regards,
Alexander

From 1e2d5f712744d4731b665724703c0da4971ea41e Mon Sep 17 00:00:00 2001
From: Alexander Lakhin exclus...@gmail.com
Date: Mon, 28 Jan 2013 08:19:34 +0400
Subject: Fix postmaster messages encoding

---
 src/backend/main/main.c |6 ++
 1 file changed, 6 insertions(+)
 mode change 100644 => 100755 src/backend/main/main.c

diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index 1173bda..b79a483
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -89,6 +89,12 @@ main(int argc, char *argv[])
 	pgwin32_install_crashdump_handler();
 #endif
 
+	/*
+	 * Use the platform encoding until the process connects to a database
+	 * and sets the appropriate encoding.
+	 */
+	SetDatabaseEncoding(GetPlatformEncoding());
+
 	/*
 	 * Set up locale information from environment.	Note that LC_CTYPE and
 	 * LC_COLLATE will be overridden later from pg_control if we are in an
-- 
1.7.10.4






Re: [HACKERS] BUG #6510: A simple prompt is displayed using wrong charset

2013-01-23 Thread Alexander Law

Hello,
Please let me know if I can do something to get the bug fix 
(https://commitfest.postgresql.org/action/patch_view?id=902) committed.
I would like to fix other bugs related to postgres localization, but I 
am not sure yet how to do it.


Thanks in advance,
Alexander

18.10.2012 19:46, Alvaro Herrera wrote:

Noah Misch escribió:


Following an off-list ack from Alexander, here is that version.  No functional
differences from Alexander's latest version, and I have verified that it still
fixes the original test case.  I'm marking this Ready for Committer.

This seems good to me, but I'm not comfortable committing Windows stuff.
Andrew, Magnus, are you able to handle this?









[HACKERS] Re: BUG #6742: pg_dump doesn't convert encoding of DB object names to OS encoding

2012-07-25 Thread Alexander Law

Hello,
I would like to fix this bug, but it looks like it would not be a one-line 
patch.
Looking at the pg_dump code, I see that the object names pass through the 
following chain:
1. pg_dump executes 'SELECT c.tableoid, c.oid, c.relname, ... ' and gets 
the object name in the encoding chosen for the db connection/dump.

2. It invokes the write_msg function or a similar one:
write_msg(NULL, "finding the columns and types of table \"%s\"\n", 
tbinfo->dobj.name);

3. vwrite_msg localizes the message text, but not the argument(s):
vfprintf(stderr, _(fmt), ap);
Here gettext (_) internally translates fmt to the OS encoding (if it's 
different from UTF-8, the encoding of the localized strings).


And I can see only a few solutions to the problem:
1. Convert the object name at the back-end, i.e. modify all the 
similar SELECTs as:
'SELECT c.tableoid, c.oid, c.relname, convert_to(c.relname, 
'OS_ENCODING') AS locrelname, ...'
and then do write_msg(NULL, "finding the columns and types of table 
\"%s\"\n", tbinfo->dobj.local_name);
The downside of this approach is that it requires rewriting all the 
SELECTs for all the objects. And it doesn't help us write out any 
other text from the backend, such as a localized backend error.


2. Set up another connection to the backend with the OS encoding, and 
get all the object names through it. It looks insane too. And we have 
the same problem with the localized backend errors coming over the main 
connection.


3. Make a convert_to_os_encoding(text, encoding) function for the 
frontend utilities. Unfortunately the frontend can't use the internal 
PostgreSQL conversion functions, and modifying them for use through libpq 
looks unfeasible.
So the only way to implement such a function is to use another encoding 
conversion framework (library).
And my question is: is it possible to include libiconv (add this 
dependency) in the frontend utilities' code?


4. Force users to use the OS encoding as the database encoding. Or not 
use non-ASCII characters in db object names and disable NLS on 
Windows completely. It doesn't look like a solution at all.
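
Option 3 above could be sketched with the POSIX iconv(3) interface, which libiconv also provides. This is a minimal sketch, not a proposed patch: the function name follows the one suggested in the list, but the three-argument signature, the buffer sizing and the error handling are assumptions, and which encoding names are accepted depends on the iconv implementation in use.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: convert a NUL-terminated string from from_enc to
 * to_enc. Returns a malloc'ed string the caller must free, or NULL if
 * the conversion is unavailable or fails.
 */
char *
convert_to_os_encoding(const char *text, const char *from_enc,
					   const char *to_enc)
{
	iconv_t		cd;
	size_t		inleft;
	size_t		outsize;
	size_t		outleft;
	char	   *result;
	char	   *inptr;
	char	   *outptr;

	assert(text != NULL && from_enc != NULL && to_enc != NULL);

	cd = iconv_open(to_enc, from_enc);
	if (cd == (iconv_t) -1)
		return NULL;

	inleft = strlen(text);
	outsize = inleft * 4 + 1;	/* generous room for multibyte output */
	outleft = outsize - 1;
	result = malloc(outsize);
	if (result == NULL)
	{
		iconv_close(cd);
		return NULL;
	}

	inptr = (char *) text;		/* iconv() does not modify the input */
	outptr = result;
	if (iconv(cd, &inptr, &inleft, &outptr, &outleft) == (size_t) -1)
	{
		free(result);
		iconv_close(cd);
		return NULL;
	}
	*outptr = '\0';
	iconv_close(cd);
	return result;
}
```

A frontend would call it on each object name before passing the name to write_msg, for example converting from the dump encoding to the console encoding.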


BTW, it's not the only instance of the issue. For example, when I 
try to use vacuumdb, I get completely unreadable messages:

http://oi48.tinypic.com/1c8j9.jpg
(blue marks what is in Russian or English, all the other text is gibberish).

Best regards,
Alexander


18.07.2012 12:51, Alexander Law wrote:

Hello,

The dump file itself is correct. The issue is only with the non-ASCII 
object names in pg_dump messages.
The message text (which is non-ASCII too) is displayed consistently in the 
right encoding (i.e. the OS encoding, thanks to libintl/gettext), but the 
encoding of db object names depends on the dump encoding, and thus 
they become unreadable when a different encoding is used.
The same can be reproduced on Linux (where the console encoding is UTF-8) 
when doing a dump with Windows-1251 or Latin1 (for Western European 
languages).


Thanks,
Alexander


The following bug has been logged on the website:

Bug reference:  6742
Logged by:  Alexander LAW
Email address:  exclusion(at)gmail(dot)com
PostgreSQL version: 9.1.4
Operating system:   Windows
Description:

When I try to dump a database with UTF-8 encoding on Windows, I get unreadable
object names.
Please look at the screenshot (http://oi50.tinypic.com/2lw6ipf.jpg). In the
left window all the pg_dump messages are displayed correctly (except for the
password prompt (bug #6510)), but the non-ASCII object name is gibberish. In
the right window (where the dump is done with the Windows-1251 encoding (the OS
encoding for the Russian locale)) everything is right.

Did you check the dump file using an editor that can handle UTF-8?
The Windows console is not known for properly handling that encoding.

Thomas