Re: [HACKERS] warning handling in Perl scripts

2012-06-25 Thread Peter Eisentraut
On sön, 2012-06-24 at 16:05 -0400, Robert Haas wrote:
  diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
  index ebc4825..7d66da9 100644
  --- a/src/backend/catalog/genbki.pl
  +++ b/src/backend/catalog/genbki.pl
  @@ -19,6 +19,8 @@
   use strict;
   use warnings;
 
  +local $SIG{__WARN__} = sub { die $_[0] };
  +
   my @input_files;
   our @include_path;
   my $output_path = '';
 
  would address that.
 
  Could that cause any other problems?  Should it be added to all Perl
  scripts?
 
 This seems like a band-aid.

I'd think of it as a safety net.

 How about if we instead add whatever error-handling the script is
 missing, so that it produces an appropriate, human-readable error
 message?

That could also be worthwhile, but I think given the audience of that
script and the complexity of the possible failures, it could be a very
large and aimless project.  But there should still be a catch-all for
uncaught failures.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_tablespace.spclocation column removed in 9.2

2012-06-25 Thread Guillaume Lelarge
Hi Pavel,

On Mon, 2012-06-25 at 08:26 +0300, Pavel Golub wrote:
 Hello, Pgsql-bugs.
 
 According to the Moving tablespaces thread started by Bruce
 http://archives.postgresql.org/pgsql-docs/2011-12/msg3.php
 pg_tablespace.spclocation column is removed in the 9.2beta. However
 this breaks backward compatibility for a bunch of products, e.g.
 pgAdmin, phpPgAdmin, PgMDD etc.
 
 I'm not sure this is the best choice, because each application with
 tablespace support will now need an additional check to determine which
 way to use for obtaining the tablespace location:
 pg_get_tablespace_location(oid) or pg_tablespace.spclocation
 
 I'm aware of the problems caused by this hard-coded column. My proposal is
 to convert pg_tablespace to a system view, maybe?
 

I don't see why it causes you so much trouble. You should already have
many locations in your code where you need to check the version to be
compatible with the latest major releases. I know pgAdmin does. So I
guess that one more is not a big deal.
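
For illustration only, here is a minimal client-side sketch of the kind of
version switch being discussed, using libpq's PQserverVersion(); note that
the function 9.2 actually ships is pg_tablespace_location(oid) (the
pg_get_tablespace_location spelling above appears to be a slip), and the
rest of the snippet is hypothetical rather than taken from pgAdmin or any
other tool.

#include <libpq-fe.h>

/*
 * Sketch: pick the tablespace-location query based on the server version.
 * PQserverVersion() returns e.g. 90200 for a 9.2 server.  Illustrative only.
 */
static const char *
tablespace_location_query(const PGconn *conn)
{
	if (PQserverVersion(conn) >= 90200)
		return "SELECT spcname, pg_tablespace_location(oid) AS location "
			   "FROM pg_catalog.pg_tablespace";
	else
		return "SELECT spcname, spclocation AS location "
			   "FROM pg_catalog.pg_tablespace";
}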

And this change in PostgreSQL greatly helps DBAs who want to move
tablespaces (not a really common task AFAIK, I agree).


-- 
Guillaume
http://blog.guillaume.lelarge.info
http://www.dalibo.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_tablespace.spclocation column removed in 9.2

2012-06-25 Thread Pavel Golub
Hello, Guillaume.

You wrote:

GL Hi Pavel,

GL On Mon, 2012-06-25 at 08:26 +0300, Pavel Golub wrote:
 Hello, Pgsql-bugs.
 
 According to the Moving tablespaces thread started by Bruce
 http://archives.postgresql.org/pgsql-docs/2011-12/msg3.php
 pg_tablespace.spclocation column is removed in the 9.2beta. However
 this breaks backward compatibility for a bunch of products, e.g.
 pgAdmin, phpPgAdmin, PgMDD etc.
 
 I'm not sure this is the best choice, because each application with
 tablespace support will now need an additional check to determine which
 way to use for obtaining the tablespace location:
 pg_get_tablespace_location(oid) or pg_tablespace.spclocation
 
 I'm aware of the problems caused by this hard-coded column. My proposal is
 to convert pg_tablespace to a system view, maybe?
 

GL I don't see why it causes you so much trouble.

Not so much. However.

GL You should already have
GL many locations in your code where you need to check the version to be
GL compatible with the latest major releases.

This is wholly true.

GL I know pgAdmin does. So I
GL guess that one more is not a big deal.

GL And this change in PostgreSQL greatly helps DBAs who want to move
GL tablespaces (not a really common task AFAIK, I agree).

I know. I just followed the advice of Josh Berkus and added this as a
bug.

-- 
With best wishes,
 Pavel  mailto:pa...@gf.microolap.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] temporal support patch

2012-06-25 Thread Vlad Arkhipov

On 05/31/2012 11:52 AM, Jeff Davis wrote:

On Wed, 2012-05-16 at 23:14 +0200, Miroslav Šimulčík wrote:

Hi all,


as part of my master's thesis I have created a temporal support patch
for PostgreSQL. It enables the creation of special temporal tables
with entry versioning. Modifying operations (UPDATE, DELETE,
TRUNCATE) on these tables don't cause permanent changes to entries,
but create new versions of them. Thus the user can easily get to past
states of the table.


I would be very interested to see this, thank you for working on it.

There are quite a few aspects to a temporal database system, and you are
working on a system-maintained transaction-time historical table, right?
Or are there other aspects to your proposal?

Some general comments:

* I'd very much like to see you make use of Range Types from 9.2; in
particular, TSTZRANGE would be much better than holding two timestamps.
If a standard requires you to display two timestamps in certain
situations, perhaps you could use ranges internally and display the
boundaries as timestamps when needed.
It's not sufficient to store only a period of validity for a row. If two
transactions that started at the same time change the same record, you have a
problem with the TSTZRANGE type, because it's normalized to an empty interval.
The other issue is how to handle multiple changes of the same record
within a transaction. Should they be stored or not?
Also it's necessary to store some kind of operation type that was 
applied to the record (insert/update/delete). For example, there is a 
table with one record with validity period [0, ) and value 'A'.


First way
1. Delete this record in time 1, now there is [0, 1), A in the history 
table.
2. Insert a new record in time 1, now there is [0, 1), A in the history 
table and [1, ), B record in the current data table.


Second way
1. Update this record in time 1, now there is [0, 1), A in the history 
table and [1, ), B record in the current data table.


So you have the same data in the tables, but the actions that led to this
configuration were different, and part of the history has been lost.



* There is other useful information that could be recorded, such as the
user who inserted/updated/deleted the record.
I'm not sure that the database user is the proper thing to be stored in
the history table. Many applications connect to a database using
some virtual user and have their own users/roles tables to handle
privileges. There should be some way to substitute the stored user in
the history table with the application's one. It's also helpful to store
the transaction id that inserted/updated/deleted the record.



* For some purposes, it's very useful to keep track of the columns that
changed. For instance, a query like "show me any time a salary was
changed over the last month" (or some other rare event) would be very
slow to run if there was not some explicit annotation on the historical
records (e.g. a "columns changed" bitmap or something).
It's a great proposal, but it seems impossible to implement with a
triggers-only solution, doesn't it? Is there any kind of hook on ALTER
TABLE ... in PostgreSQL to update the changed-columns bitmaps when the
table structure changes?

* In general, I'm not fond of adorning queries with TRANSACTION TIME AS
OF... kinds of things. Those constructs are redundant with a WHERE
clause (on a range type, you'd use the contains operator). If a
standard requires that, maybe it would be OK to allow such things as
syntactic sugar.
In SQL:2011 there is only one table with all the data, historical and
current. So it's not very convenient to specify a WHERE condition on
system time everywhere and for all tables in the query. By default only
the current data is selected by a query like SELECT * FROM table.

* As Jim mentioned, it might make sense to use something resembling
inheritance so that selecting from the historical table includes the
current data (but with no upper bound for the range).
We have had good experience with inheritance in our trigger-based
solution. It's completely transparent to the existing applications and
does not have any impact on performance.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] compiler warnings on mingw

2012-06-25 Thread Peter Eisentraut
I've tried to cross-compile PostgreSQL from Linux to Windows, following
the ideas of Andrew Dunstan [0].  This works quite well.  I see two
compiler warnings altogether, which might be worth getting rid of:

#1

mingwcompat.c:60:1: warning: ‘RegisterWaitForSingleObject’ redeclared without 
dllimport attribute: previous dllimport ignored [-Wattributes]

This can apparently go away with this:

diff --git a/src/backend/port/win32/mingwcompat.c 
b/src/backend/port/win32/mingwcompat.c
index 0978e8c..b1a5ca5 100644
--- a/src/backend/port/win32/mingwcompat.c
+++ b/src/backend/port/win32/mingwcompat.c
@@ -56,6 +56,7 @@
(PHANDLE, HANDLE, WAITORTIMERCALLBACK, PVOID, ULONG, ULONG);
 static __RegisterWaitForSingleObject _RegisterWaitForSingleObject = NULL;
 
+__attribute__((dllimport))
 BOOL   WINAPI
 RegisterWaitForSingleObject(PHANDLE phNewWaitObject,
HANDLE hObject,

Oddly, the mingw buildfarm member[1] complains about

mingwcompat.c:66: warning: no previous prototype for 
'RegisterWaitForSingleObject'

instead.  So there might be some divergent header files around.

Anyone know details about this?

#2

pg_stat_statements.c: In function ‘pgss_ProcessUtility’:
pg_stat_statements.c:840:4: warning: unknown conversion type character ‘l’ in 
format [-Wformat]
pg_stat_statements.c:840:4: warning: too many arguments for format 
[-Wformat-extra-args]

We use a replacement snprintf and set the int64 format to %lld and %llu
based on that.  But pg_stat_statements.c uses sscanf, for which we have
no replacement.  The configure check comments

# MinGW uses '%I64d', though gcc throws an warning with -Wall,
# while '%lld' doesn't generate a warning, but doesn't work.

So assuming that sscanf in the mingw C library works consistently with
snprintf, that might mean that pg_stat_statements is broken on that
platform.  (The claim that %lld doesn't generate a warning is also
questionable here.)
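
To make the asymmetry concrete, here is a hedged sketch of the pattern in
question (the function, variable names and the COPY tag are illustrative,
not the actual pg_stat_statements code): the int64 format macro is chosen
to suit PostgreSQL's replacement snprintf, while sscanf still comes from
the MinGW C runtime, which may expect %I64d instead.

#include "postgres.h"		/* int64 and INT64_FORMAT */
#include <stdio.h>

/*
 * Illustrative sketch of the snprintf/sscanf mismatch: INT64_FORMAT matches
 * what PostgreSQL's replacement snprintf understands, but the sscanf call
 * below resolves to the platform C library, which on MinGW may not accept
 * the same conversion, so the parse can fail even though the corresponding
 * snprintf works.
 */
static int64
parse_copy_rowcount(const char *completionTag)
{
	int64		rows = 0;

	if (sscanf(completionTag, "COPY " INT64_FORMAT, &rows) != 1)
		rows = 0;				/* parse failed; wrong format for this libc? */
	return rows;
}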


[0]: 
http://people.planetpostgresql.org/andrew/index.php?/archives/264-Cross-compiling-PostgreSQL-for-WIndows.html
[1]: 
http://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=narwhal&dt=2012-06-22%2004%3A00%3A05&stg=make

PS: Instructions for Debian:

apt-get install gcc-mingw-w64

./configure --build=$(config/config.guess) --host=i686-w64-mingw32 
--without-zlib --without-readline ZIC=/usr/sbin/zic
make world



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] compiler warnings on mingw

2012-06-25 Thread Magnus Hagander
On Mon, Jun 25, 2012 at 11:42 AM, Peter Eisentraut pete...@gmx.net wrote:
 I've tried to cross-compile PostgreSQL from Linux to Windows, following
 the ideas of Andrew Dunstan [0].  This works quite well.  I see two
 compiler warnings altogether, which might be worth getting rid of:

 #1

 mingwcompat.c:60:1: warning: ‘RegisterWaitForSingleObject’ redeclared without 
 dllimport attribute: previous dllimport ignored [-Wattributes]

 This can apparently go away with this:

 diff --git a/src/backend/port/win32/mingwcompat.c 
 b/src/backend/port/win32/mingwcompat.c
 index 0978e8c..b1a5ca5 100644
 --- a/src/backend/port/win32/mingwcompat.c
 +++ b/src/backend/port/win32/mingwcompat.c
 @@ -56,6 +56,7 @@
            (PHANDLE, HANDLE, WAITORTIMERCALLBACK, PVOID, ULONG, ULONG);
  static __RegisterWaitForSingleObject _RegisterWaitForSingleObject = NULL;

 +__attribute__((dllimport))
  BOOL       WINAPI
  RegisterWaitForSingleObject(PHANDLE phNewWaitObject,
                            HANDLE hObject,

Seems like a proper fix to me - but could be verified by checking what
the actual mingw header looks like. Or maybe that's what you did
already..


 Oddly, the mingw buildfarm member[1] complains about

 mingwcompat.c:66: warning: no previous prototype for 
 'RegisterWaitForSingleObject'

I think that one is just laziness - in the case when we're injecting
that API function into mingw we should declare it in our own headers.
It was likely just left out because the proper API headers already
carry it, it was just missing in mingw. So it should be added to the
port header, under an #ifdef.
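
A minimal sketch of what such a guarded declaration in the port header could
look like; the HAVE_REGISTERWAITFORSINGLEOBJECT symbol is an assumed
configure result invented for the example, not an existing macro, and the
snippet presumes <windows.h> has already been included for the Windows
typedefs.

/*
 * Hypothetical addition to the win32 port header: declare the function that
 * mingwcompat.c injects, but only for MinGW builds whose headers lack it.
 * HAVE_REGISTERWAITFORSINGLEOBJECT is an assumption for illustration.
 */
#if defined(__MINGW32__) && !defined(HAVE_REGISTERWAITFORSINGLEOBJECT)
extern BOOL WINAPI RegisterWaitForSingleObject(PHANDLE phNewWaitObject,
											   HANDLE hObject,
											   WAITORTIMERCALLBACK Callback,
											   PVOID Context,
											   ULONG dwMilliseconds,
											   ULONG dwFlags);
#endif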


 instead.  So there might be some divergent header files around.

 Anyone know details about this?

Perhaps mingw has added it to their api *properly* this time, and the
whole function should go away from mingwcompat.c? In that case it'd
obviously require a configure test, since it doesn't exist in previous
releases.


 #2

 pg_stat_statements.c: In function ‘pgss_ProcessUtility’:
 pg_stat_statements.c:840:4: warning: unknown conversion type character ‘l’ in 
 format [-Wformat]
 pg_stat_statements.c:840:4: warning: too many arguments for format 
 [-Wformat-extra-args]

 We use a replacement snprintf and set the int64 format to %lld and %llu
 based on that.  But pg_stat_statements.c uses sscanf, for which we have
 no replacement.  The configure check comments

 # MinGW uses '%I64d', though gcc throws an warning with -Wall,
 # while '%lld' doesn't generate a warning, but doesn't work.

 So assuming that sscanf in the mingw C library works consistently with
 snprintf, that might mean that pg_stat_statements is broken on that
 platform.  (The claim that %lld doesn't generate a warning is also
 questionable here.)

Can't comment on that part without more investigation.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Allow breaking out of hung connection attempts

2012-06-25 Thread Ryan Kelly
Shigeru:

Thank you very much for your review. Comments are inline below, and a
new patch is attached.

On Fri, Jun 22, 2012 at 10:06:53AM +0900, Shigeru HANADA wrote:
 (2012/05/01 0:30), Ryan Kelly wrote:
 On Mon, Apr 30, 2012 at 09:02:33AM -0400, Alvaro Herrera wrote:
 Well, do *you* want it?
 Of course. That way I can stop patching my psql and go back to using the
 one that came with my release :)
 
 Here is result of my review of v4 patch.
 
 Submission
 --
 - The patch is in context diff format
 - It needs rebase for HEAD, but patch command takes care automatically.
 - Make finishes cleanly, and all regression tests pass
 
 Functionality
 -
 After applying this patch, I could cancel a connection attempt at
 start-up by pressing Ctrl+C on the terminal just after invoking the psql
 command with an unused IP address.  Without this patch, such an attempt
 ends up with an error such as "No route to host" after a few
 uninterruptible seconds (the duration until the error probably depends
 on the environment).
 
 A connection attempt via the \connect command could also be canceled by
 pressing Ctrl+C at the psql prompt.
 
 In addition, I tried setting PGCONNECT_TIMEOUT to 0 (infinite), but
 psql gave up after a few seconds, for both start-up and re-connect.
 Is this intentional behavior?
A timeout of 0 (infinite) means to keep trying until we succeed or fail,
not keep trying forever. As you mentioned above, your connection
attempts error out after a few seconds. This is what is happening. In my
environment no such error occurs and as a result psql continues to
attempt to connect for as long as I let it.
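
For readers following along, a much-simplified sketch of the interruptible
connect loop this approach is built around, using only documented libpq
calls (PQconnectStart, PQconnectPoll, PQsocket); the real do_connect changes
additionally honour PQconnectTimeout and psql's cancel handling, so this is
illustrative rather than the patch itself.

#include <sys/select.h>
#include <libpq-fe.h>

/*
 * Simplified sketch of an interruptible connection attempt using libpq's
 * asynchronous API.  Cancel-key handling, timeout accounting and most
 * error paths are omitted; this is illustrative, not psql's code.
 */
static PGconn *
connect_interruptibly(const char *conninfo)
{
	PGconn	   *conn = PQconnectStart(conninfo);
	PostgresPollingStatusType st = PGRES_POLLING_WRITING;

	if (conn == NULL || PQstatus(conn) == CONNECTION_BAD)
		return conn;			/* out of memory, or failed immediately */

	while (st != PGRES_POLLING_OK && st != PGRES_POLLING_FAILED)
	{
		fd_set		fds;
		struct timeval tv = {1, 0};		/* wake up once a second */
		int			sock = PQsocket(conn);

		FD_ZERO(&fds);
		FD_SET(sock, &fds);
		/* wait for the direction PQconnectPoll asked for, with a timeout */
		if (st == PGRES_POLLING_READING)
			select(sock + 1, &fds, NULL, NULL, &tv);
		else
			select(sock + 1, NULL, &fds, NULL, &tv);

		/* a real client would check for Ctrl+C / cancel requests here */
		st = PQconnectPoll(conn);
	}
	return conn;				/* caller checks PQstatus()/PQerrorMessage() */
}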

 Detail of changes
 -
 Basically this patch consists of three changes:
 
 1) defer setup of the cancel handler until the start-up connection has been established
 2) new libpq API PQconnectTimeout which returns the connect_timeout
 value of the current session
 3) use the asynchronous connection API for psql's \connect command; this
 requires the API added by 2)
 
 These seem reasonable for achieving cancellation of connection attempts at
 start-up and \connect, but, as Ryan says, the code added to do_connect
 might need more simplification.
 
 I have some random comments.
 
 - Checking status by calling PQstatus just after
 PQconnectStartParams is necessary.
Yes, I agree.

 - Copying only the select() part of pqSocketPoll seems not enough;
 should we use poll(2) if it is supported?
I did not think the additional complexity was worth it in this case.
Unless you see some reason to use poll(2) that I do not.

 - Don't we need to clear the error message stored in PGconn after
 PQconnectPoll returns OK status, like connectDBComplete does?
I do not believe there is a client interface for clearing the error
message. Additionally, the documentation states that PQerrorMessage
"returns the error message most recently generated by an operation on
the connection". This seems to indicate that the error message should be
cleared, as this behavior is part of the contract of PQerrorMessage.

 
 Regards,
 -- 
 Shigeru HANADA

-Ryan Kelly
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 5c5dd68..79f4bc0 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -1496,6 +1496,24 @@ char *PQoptions(const PGconn *conn);
   </para>
  </listitem>
 </varlistentry>
+
+<varlistentry id="libpq-pqconnecttimeout">
+ <term>
+  <function>PQconnectTimeout</function>
+  <indexterm>
+   <primary>PQconnectTimeout</primary>
+  </indexterm>
+ </term>
+
+ <listitem>
+  <para>
+   Returns the connect_timeout property as given to libpq.
+<synopsis>
+char *PQconnectTimeout(const PGconn *conn);
+</synopsis>
+  </para>
+ </listitem>
+</varlistentry>
 </variablelist>
   </para>
 
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 0926786..23b0106 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -1506,7 +1506,7 @@ static bool
 do_connect(char *dbname, char *user, char *host, char *port)
 {
 	PGconn	   *o_conn = pset.db,
-			   *n_conn;
+			   *n_conn = NULL;
 	char	   *password = NULL;
 
 	if (!dbname)
@@ -1539,7 +1539,7 @@ do_connect(char *dbname, char *user, char *host, char *port)
 
 	while (true)
 	{
-#define PARAMS_ARRAY_SIZE	8
+#define PARAMS_ARRAY_SIZE	9
 		const char **keywords = pg_malloc(PARAMS_ARRAY_SIZE * sizeof(*keywords));
 		const char **values = pg_malloc(PARAMS_ARRAY_SIZE * sizeof(*values));
 
@@ -1557,30 +1557,140 @@ do_connect(char *dbname, char *user, char *host, char *port)
 		values[5] = pset.progname;
		keywords[6] = "client_encoding";
		values[6] = (pset.notty || getenv("PGCLIENTENCODING")) ? NULL : "auto";
-		keywords[7] = NULL;
-		values[7] = NULL;
+		keywords[7] = "connect_timeout";
+		values[7] = PQconnectTimeout(o_conn);
+		keywords[8] = NULL;
+		values[8] = NULL;
 
-		n_conn = PQconnectdbParams(keywords, values, true);
-
-		free(keywords);
-		free(values);
-
-		/* We can immediately discard the password -- no longer needed */
-		if (password)
-			free(password);
-
-		if (PQstatus(n_conn) == 

[HACKERS] pg_upgrade broken by xlog numbering

2012-06-25 Thread Kevin Grittner
On HEAD at the moment, `make check-world` is failing on a 32-bit Linux
build:

+ pg_upgrade -d
/home/kevin/pg/master/contrib/pg_upgrade/tmp_check/data.old -D
/home/kevin/pg/master/contrib/pg_upgrade/tmp_check/data -b
/home/kevin/pg/master/contrib/pg_upgrade/tmp_check/install//home/kevin/pg/master/Debug/bin
-B
/home/kevin/pg/master/contrib/pg_upgrade/tmp_check/install//home/kevin/pg/master/Debug/bin
Performing Consistency Checks
-
Checking current, bin, and data directories ok
Checking cluster versions   ok
Some required control information is missing;  cannot find:
  first log file ID after reset
  first log file segment after reset

Cannot continue without required control information, terminating
Failure, exiting



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq compression

2012-06-25 Thread Magnus Hagander
On Mon, Jun 25, 2012 at 4:04 AM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Jun 22, 2012 at 12:38 PM, Euler Taveira eu...@timbira.com wrote:
 On 20-06-2012 17:40, Marko Kreen wrote:
 On Wed, Jun 20, 2012 at 10:05 PM, Florian Pflug f...@phlo.org wrote:
 I'm starting to think that relying on SSL/TLS for compression of
 unencrypted connections might not be such a good idea after all. We'd
 be using the protocol in a way it quite clearly never was intended to
 be used...

 Maybe, but what is the argument that we should avoid
 on encryption+compression at the same time?

 AES is quite lightweight compared to compression, so should
 be no problem in situations where you care about compression.

 If we could solve the compression problem without AES, that would make
 things easier. Compression-only via encryption is a weird way to solve
 the problem from the user's POV.

 RSA is noticeable, but only for short connections.
 Thus easily solvable with connection pooling.

 RSA overhead is not the main problem. SSL/TLS setup is.

 And for really special compression needs you can always
 create a UDF that does custom compression for you.

 You have to own the code to modify it; it is not always an option.

 So what exactly is the situation we need to solve
 with postgres-specific protocol compression?

 Compression only support. Why do I need to set up SSL/TLS just for 
 compression?

 IMHO SSL/TLS use is no different from relying on another library to handle
 compression for the protocol, and what's more, it is compression-specific.
 That way, we could implement other algorithms in such a library without
 needing to modify libpq code. Using SSL/TLS you are bound by what SSL/TLS
 software products decide to use as compression algorithms. I'll be happy to
 maintain the code if it is postgres-specific or even as close as possible to core.

 I guess my feeling on this is that, so far as I can see, supporting
 compression via OpenSSL involves work and trade-offs, and supporting
 it without depending on OpenSSL also involves work, and trade-offs.

Nice summary :)

 So it's not real evident to me that we should prefer one to the other
 on general principle.  It seems to me that a lot might come down to
 performance.  If someone can demonstrate that using an external
 library involves gets significantly better compression, chews up
 significantly less CPU time, and/or is significantly less code than
 supporting this via OpenSSL, then maybe we ought to consider it.

I think we should, yes. But as you say, we need to know first. It's
also a question of whether one of these compression schemes is trivial
enough that we could embed the code rather than rely on it externally
- I have no idea if that's even remotely possible, but that would move
the goalposts a bit too.

 OpenSSL is kind of an ugly piece of code, and all things being equal
 depending on it more heavily would not be my first choice.

Indeed.

But we should really stop saying "rely on openssl" and start saying
"rely on the SSL library". There are other SSL libraries which are not
quite so ugly, which we should eventually support.

That said, it's *still* a bit ugly to rely on the SSL library for this, IMO.


 On the other hand, this does not seem to me to be a situation where we
 should accept a patch to use an external library just because someone
 takes the time to write one, because there is also a non-trivial cost
 for the entire project to depending on more things; or if the
 compression code gets put into the backend, then there's a cost to us
 to maintain that code inside our source base.  So I think we really
 need someone to try this both ways and compare.  Right now it seems
 like we're mostly speculating on how well either approach would work,
 which is not as good as having some experimental results.  If, for
 example, someone can demonstrate that an awesomebsdlz compresses 10x
 as fast as OpenSSL...  that'd be pretty compelling.

Or that it takes less code/generates cleaner code...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch: Selective binary conversion of CSV file foreign tables

2012-06-25 Thread Kohei KaiGai
Fujita-san,

The revised patch looks almost good to me.
The only point I noticed is the check for redundant or duplicated options
I pointed out in my previous post.
So, if you agree with the attached patch, I'd like to hand this patch
over to the committers.

All I changed is:
 --- a/src/backend/commands/copy.c
 +++ b/src/backend/commands/copy.c
 @@ -122,6 +122,11 @@ typedef struct CopyStateData
@@ -217,7 +221,7 @@ index 98bcb2f..0143d81 100644
}
 +			else if (strcmp(defel->defname, "convert_binary") == 0)
 +			{
-+				if (cstate->convert_binary)
++				if (cstate->convert_selectively)
 +					ereport(ERROR,
 +							(errcode(ERRCODE_SYNTAX_ERROR),
 +							 errmsg("conflicting or redundant options")));

Thanks,

2012/6/20 Etsuro Fujita fujita.ets...@lab.ntt.co.jp:
 Hi KaiGai-san,

 Thank you for the review.

 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org
 [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kohei KaiGai
 Sent: Wednesday, June 20, 2012 1:26 AM
 To: Etsuro Fujita
 Cc: Robert Haas; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV file
 foreign tables

 Hi Fujita-san,

 Could you rebase this patch against the latest tree?
 It was not possible to apply the patch cleanly.

 Sorry, I updated the patch.  Please find attached an updated version of the
 patch.

 I looked over the patch, then noticed a few points.

 At ProcessCopyOptions(), defel->arg can be NIL, can't it?
 If so, cstate->convert_binary is not a suitable flag to check for a redundant
 option. It seems to me cstate->convert_selectively is a more suitable flag
 to check it with.

 +       else if (strcmp(defel->defname, "convert_binary") == 0)
 +       {
 +           if (cstate->convert_binary)
 +               ereport(ERROR,
 +                       (errcode(ERRCODE_SYNTAX_ERROR),
 +                        errmsg("conflicting or redundant options")));
 +           cstate->convert_selectively = true;
 +           if (defel->arg == NULL || IsA(defel->arg, List))
 +               cstate->convert_binary = (List *) defel->arg;
 +           else
 +               ereport(ERROR,
 +                       (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 +                        errmsg("argument to option \"%s\" must be a list of column names",
 +                               defel->defname)));
 +       }

 Yes, defel->arg can be NIL.  defel->arg is a List structure listing all the
 columns that need to be converted to binary representation, which is NIL for
 the case where no columns need to be converted.  For example,
 defel->arg is NIL for SELECT COUNT(*).  In this case, while
 cstate->convert_selectively is set to true, no columns are converted at
 NextCopyFrom().  Most efficient case!  In short, cstate->convert_selectively
 represents whether to do selective binary conversion at NextCopyFrom(), and
 cstate->convert_binary represents all the columns to be converted at
 NextCopyFrom(), which can be NIL.

 At NextCopyFrom(), this routine computes default values if configured.
 In cases where these values are not referenced, it might be possible to skip
 the unnecessary calculations.
 Would it be possible to add logic to avoid constructing cstate->defmap for
 unreferenced columns at BeginCopyFrom()?

 I think that we don't need to add the above logic because file_fdw does
 BeginCopyFrom() with attnamelist = NIL, in which case BeginCopyFrom()
 doesn't construct cstate->defmap at all.

 I fixed a bug and made some minor optimizations in check_binary_conversion(),
 which is renamed to check_selective_binary_conversion() in the updated
 version, and now file_fdw gives up on selective binary conversion in the
 following cases:

  a) BINARY format
  b) CSV/TEXT format and whole row reference
  c) CSV/TEXT format and all the user attributes needed


 Best regards,
 Etsuro Fujita

 Thanks,

 2012/5/11 Etsuro Fujita fujita.ets...@lab.ntt.co.jp:
  -Original Message-
  From: Robert Haas [mailto:robertmh...@gmail.com]
  Sent: Friday, May 11, 2012 1:36 AM
  To: Etsuro Fujita
  Cc: pgsql-hackers@postgresql.org
  Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV
  file foreign tables
 
  On Tue, May 8, 2012 at 7:26 AM, Etsuro Fujita
  fujita.ets...@lab.ntt.co.jp
  wrote:
   I would like to propose to improve parsing efficiency of
   contrib/file_fdw by selective parsing proposed by Alagiannis et
   al.[1], which means that for a CSV/TEXT file foreign table,
   file_fdw performs binary conversion only for the columns needed for
   query processing.  Attached is a WIP patch implementing the feature.
 
  Can you add this to the next CommitFest?  Looks interesting.
 
  Done.
 
  Best regards,
  Etsuro Fujita
 
  --
  Robert Haas
  EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL
  Company
 
 
 
 
  --
  Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To
  make changes to your subscription:
  

Re: [HACKERS] libpq compression

2012-06-25 Thread Florian Pflug
On Jun25, 2012, at 04:04 , Robert Haas wrote:
 If, for
 example, someone can demonstrate that an awesomebsdlz compresses 10x
 as fast as OpenSSL...  that'd be pretty compelling.

That, actually, is demonstrably the case for at least Google's snappy.
(and LZO, but that's not an option since its license is GPL) They state in
their documentation that

  In our tests, Snappy usually is faster than algorithms in the same class
  (e.g. LZO, LZF, FastLZ, QuickLZ, etc.) while achieving comparable
  compression ratios.

The only widely supported compression method for SSL seems to be DEFLATE,
which is also what gzip/zlib uses. I've benchmarked LZO against gzip/zlib
a few months ago, and LZO outperformed zlib in fast mode (i.e. gzip -1) by
an order of magnitude.

The compression ratio achieved by DEFLATE/gzip/zlib is much better, though.
The snappy documentation states

  Typical compression ratios (based on the benchmark suite) are about
  1.5-1.7x for plain text, about 2-4x for HTML, and of course 1.0x for
  JPEGs, PNGs and other already-compressed data. Similar numbers for zlib
  in its fastest mode are 2.6-2.8x, 3-7x and 1.0x, respectively.

Here are a few numbers for LZO vs. gzip. Snappy should be comparable to
LZO - I tested LZO because I still had the command-line compressor lzop
lying around on my machine, whereas I'd have needed to download and compile
snappy first.

$ dd if=/dev/random of=data bs=1m count=128
$ time gzip -1 < data > data.gz
real0m6.189s
user0m5.947s
sys 0m0.224s
$ time lzop < data > data.lzo
real0m2.697s
user0m0.295s
sys 0m0.224s
$ ls -lh data*
-rw-r--r--  1 fgp  staff   128M Jun 25 14:43 data
-rw-r--r--  1 fgp  staff   128M Jun 25 14:44 data.gz
-rw-r--r--  1 fgp  staff   128M Jun 25 14:44 data.lzo

$ dd if=/dev/zero of=zeros bs=1m count=128
$ time gzip -1 < zeros > zeros.gz
real0m1.083s
user0m1.019s
sys 0m0.052s
$ time lzop < zeros > zeros.lzo
real0m0.186s
user0m0.123s
sys 0m0.053s
$ ls -lh zeros*
-rw-r--r--  1 fgp  staff   128M Jun 25 14:47 zeros
-rw-r--r--  1 fgp  staff   572K Jun 25 14:47 zeros.gz
-rw-r--r--  1 fgp  staff   598K Jun 25 14:47 zeros.lzo

To summarize, on my 2.66 Ghz Core2 Duo Macbook Pro, LZO compresses about
350MB/s if the data is purely random, and about 800MB/s if the data
compresses extremely well. (Numbers based on user time since that indicates
the CPU time used, and ignores the IO overhead, which is substantial)

IMHO, the only compelling argument (and a very compelling one) to use
SSL compression was that it requires very little code on our side. We've
since discovered that it's not actually that simple, at least if we want
to support compression without authentication or encryption, and don't
want to restrict ourselves to using OpenSSL forever. So unless we give
up at least one of those requirements, the arguments for using
SSL-compression are rather thin, I think.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq compression

2012-06-25 Thread k...@rice.edu
On Mon, Jun 25, 2012 at 03:12:46PM +0200, Florian Pflug wrote:
 On Jun25, 2012, at 04:04 , Robert Haas wrote:
  If, for
  example, someone can demonstrate that an awesomebsdlz compresses 10x
  as fast as OpenSSL...  that'd be pretty compelling.
 
 That, actually, is demonstrably the case for at least Google's snappy.
 (and LZO, but that's not an option since its license is GPL) They state in
 their documentation that
 
   In our tests, Snappy usually is faster than algorithms in the same class
   (e.g. LZO, LZF, FastLZ, QuickLZ, etc.) while achieving comparable
   compression ratios.
 
 The only widely supported compression method for SSL seems to be DEFLATE,
 which is also what gzip/zlib uses. I've benchmarked LZO against gzip/zlib
 a few months ago, and LZO outperformed zlib in fast mode (i.e. gzip -1) by
 an order of magnitude.
 
 The compression ratio achieved by DEFLATE/gzip/zlib is much better, though.
 The snappy documentation states
 
   Typical compression ratios (based on the benchmark suite) are about
   1.5-1.7x for plain text, about 2-4x for HTML, and of course 1.0x for
   JPEGs, PNGs and other already-compressed data. Similar numbers for zlib
   in its fastest mode are 2.6-2.8x, 3-7x and 1.0x, respectively.
 
 Here are a few numbers for LZO vs. gzip. Snappy should be comparable to
 LZO - I tested LZO because I still had the command-line compressor lzop
 lying around on my machine, whereas I'd have needed to download and compile
 snappy first.
 
 $ dd if=/dev/random of=data bs=1m count=128
 $ time gzip -1 < data > data.gz
 real  0m6.189s
 user  0m5.947s
 sys   0m0.224s
 $ time lzop < data > data.lzo
 real  0m2.697s
 user  0m0.295s
 sys   0m0.224s
 $ ls -lh data*
 -rw-r--r--  1 fgp  staff   128M Jun 25 14:43 data
 -rw-r--r--  1 fgp  staff   128M Jun 25 14:44 data.gz
 -rw-r--r--  1 fgp  staff   128M Jun 25 14:44 data.lzo
 
 $ dd if=/dev/zero of=zeros bs=1m count=128
 $ time gzip -1 < zeros > zeros.gz
 real  0m1.083s
 user  0m1.019s
 sys   0m0.052s
 $ time lzop < zeros > zeros.lzo
 real  0m0.186s
 user  0m0.123s
 sys   0m0.053s
 $ ls -lh zeros*
 -rw-r--r--  1 fgp  staff   128M Jun 25 14:47 zeros
 -rw-r--r--  1 fgp  staff   572K Jun 25 14:47 zeros.gz
 -rw-r--r--  1 fgp  staff   598K Jun 25 14:47 zeros.lzo
 
 To summarize, on my 2.66 Ghz Core2 Duo Macbook Pro, LZO compresses about
 350MB/s if the data is purely random, and about 800MB/s if the data
 compresses extremely well. (Numbers based on user time since that indicates
 the CPU time used, and ignores the IO overhead, which is substantial)
 
 IMHO, the only compelling argument (and a very compelling one) to use
 SSL compression was that it requires very little code on our side. We've
 since discovered that it's not actually that simple, at least if we want
 to support compression without authentication or encryption, and don't
 want to restrict ourselves to using OpenSSL forever. So unless we give
 up at least one of those requirements, the arguments for using
 SSL-compression are rather thin, I think.
 
 best regards,
 Florian Pflug
 
+1 for http://code.google.com/p/lz4/ support. It has a BSD license too.
Using SSL libraries gives all the complexity without any real benefit.
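
As a rough sense of how little caller-side code such a library needs, here
is a hedged sketch using the present-day liblz4 C API (which postdates this
thread); buffer management is simplified and nothing here reflects an actual
libpq design.

#include <stdlib.h>
#include <string.h>
#include <lz4.h>

/* Round-trip a buffer through LZ4; returns 0 on success, -1 otherwise. */
static int
lz4_roundtrip(const char *src, int srclen)
{
	int			bound = LZ4_compressBound(srclen);
	char	   *compressed = malloc(bound);
	char	   *restored = malloc(srclen);
	int			result = -1;

	if (compressed != NULL && restored != NULL)
	{
		int			clen = LZ4_compress_default(src, compressed, srclen, bound);

		if (clen > 0 &&
			LZ4_decompress_safe(compressed, restored, clen, srclen) == srclen &&
			memcmp(src, restored, srclen) == 0)
			result = 0;
	}
	free(compressed);
	free(restored);
	return result;
}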

Regards,
Ken

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] warning handling in Perl scripts

2012-06-25 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On sön, 2012-06-24 at 16:05 -0400, Robert Haas wrote:
 +local $SIG{__WARN__} = sub { die $_[0] };

 This seems like a band-aid.

 I'd think of it as a safety net.

+1 for the concept of turning warnings into errors, but is that really
the cleanest, most idiomatic way to do so in Perl?  Sheesh.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal

2012-06-25 Thread Andres Freund
On Monday, June 25, 2012 03:08:51 AM Robert Haas wrote:
 On Sun, Jun 24, 2012 at 5:11 PM, Andres Freund and...@2ndquadrant.com 
wrote:
  There are some interesting problems related to locking and snapshots
  here. Not sure if they are resolvable:
  
  We need to restrict SnapshotNow to represent the view it had back when
  the WAL record we're currently decoding was created. Otherwise we would
  possibly get wrong column types and similar. As we're working in the past,
  locking doesn't protect us against much here. I have that (mostly and
  inefficiently).

  One interesting problem is table rewrites (TRUNCATE, CLUSTER, some ALTER
  TABLEs) and dropping tables. Because we nudge SnapshotNow to the past
  view it had back when the WAL record was created, we get the old
  relfilenode, which might have been dropped as part of the transaction
  cleanup...
  With most types that's not a problem. Even things like records and arrays
  aren't problematic. More interesting cases include VACUUM FULL $systable
  (e.g. pg_enum) and vacuum full'ing a table which is used in the *_out
  function of a type (like a user-level pg_enum implementation).

  The only theoretical way I see around that problem would be to postpone
  all relation unlinks until everything that could possibly read them has
  finished. Doesn't seem too alluring, although it would be needed if we
  ever move more things off SnapshotNow.
  
  Input/Ideas/Opinions?
 
 Yeah, this is slightly nasty.  I'm not sure whether or not there's a
 way to make it work.
Postponing all non-rollback unlinks to the next logical checkpoint is the 
only thing I can think of...

 I had another idea.  Suppose decoding happens directly on the primary,
 because I'm still hoping there's a way to swing that.  Suppose further
 that we handle DDL by insisting that (1) any backend which wants to
 add columns or change the types of existing columns must first wait
 for logical replication to catch up and (2) if a backend which has
 added columns or changed the types of existing columns then writes to
 the modified table, decoding of those writes will be postponed until
 transaction commit.  I think that's enough to guarantee that the
 decoding process can just use the catalogs as they stand, with plain
 old SnapshotNow.
I don't think it's that easy. If you e.g. have multiple ALTERs in the same
transaction interspersed with inserted rows, they will all have different
TupleDescs.
I don't see how that's resolvable without either replicating DDL to the target
system or changing what SnapshotNow does...

 The downside of this approach is that it makes certain kinds of DDL
 suck worse if logical replication is in use and behind.  But I don't
 necessarily see that as prohibitive because (1) logical replication
 being behind is likely to suck for a lot of other reasons too and (2)
 adding or retyping columns isn't a terribly frequent operation and
 people already expect a hit when they do it.  Also, I suspect that we
 could find ways to loosen those restrictions at least in common cases
 in some future version; meanwhile, less work now.
Agreed.

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] pg_tablespace.spclocation column removed in 9.2

2012-06-25 Thread Tom Lane
Pavel Golub pa...@microolap.com writes:
 I'm aware of the problems caused by this hard-coded column. My proposal is
 to convert pg_tablespace to a system view, maybe?

It's not that easy to make a 100% backwards compatible view for a system
catalog.  The main sticking point is the OID column; see recent problems
with pg_roles' OID column for an example.  On the whole I don't think
renaming pg_tablespace and inserting a mostly-compatible view would be
a net advance.

More generally, I don't see that this particular incompatibility in
the system catalogs is worse than many others that we've perpetrated.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] pg_tablespace.spclocation column removed in 9.2

2012-06-25 Thread Magnus Hagander
On Mon, Jun 25, 2012 at 3:46 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Pavel Golub pa...@microolap.com writes:
 I'm aware of the problems caused by this hard-coded column. My proposal is
 to convert pg_tablespace to a system view, maybe?

 It's not that easy to make a 100% backwards compatible view for a system
 catalog.  The main sticking point is the OID column; see recent problems
 with pg_roles' OID column for an example.  On the whole I don't think
 renaming pg_tablespace and inserting a mostly-compatible view would be
 a net advance.

 More generally, I don't see that this particular incompatibility in
 the system catalogs is worse than many others that we've perpetrated.

I'd say it's a lot less bad than some others. At least for this one
you can reasonably connect and figure it out. There were the changes
for database config, I think, which made at least pgadmin break even
before it managed to connect properly. (It got the connection in the
libpq sense, but not in the pgadmin sense).

Bottom line is, any admin tool will *always* have to know
about the specific versions and have code to deal with being able to
run different queries on different versions anyway. And this one is
trivial to reimplement with a different query, compared to most
others.

Yes, if we had a set of those stable-system-views that people keep
asking for every now and then, this is one of the few changes that
they *would* actually help with. But there would still be changes that
even those couldn't deal with, because they simply can't know the
future...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade broken by xlog numbering

2012-06-25 Thread Robert Haas
On Mon, Jun 25, 2012 at 8:11 AM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 On HEAD at the moment, `make check-world` is failing on a 32-bit Linux
 build:

 + pg_upgrade -d
 /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/data.old -D
 /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/data -b
 /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/install//home/kevin/pg/master/Debug/bin
 -B
 /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/install//home/kevin/pg/master/Debug/bin
 Performing Consistency Checks
 -
 Checking current, bin, and data directories                 ok
 Checking cluster versions                                   ok
 Some required control information is missing;  cannot find:
  first log file ID after reset
  first log file segment after reset

 Cannot continue without required control information, terminating
 Failure, exiting

On MacOS X, on latest sources, initdb fails:

creating directory /Users/rhaas/pgsql/src/test/regress/./tmp_check/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 32MB
creating configuration files ... ok
creating template1 database in
/Users/rhaas/pgsql/src/test/regress/./tmp_check/data/base/1 ... ok
initializing pg_authid ... ok
initializing dependencies ... ok
creating system views ... ok
loading system objects' descriptions ... ok
creating collations ... ok
creating conversions ... ok
creating dictionaries ... FATAL:  control file contains invalid data
child process exited with exit code 1
initdb: data directory
/Users/rhaas/pgsql/src/test/regress/./tmp_check/data not removed at
user's request

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)

2012-06-25 Thread Tom Lane
Peter Geoghegan pe...@2ndquadrant.com writes:
 On 22 June 2012 01:04, Tom Lane t...@sss.pgh.pa.us wrote:
 This is nonsense.  There are at least three buildfarm machines running
 compilers that do not pretend to be gcc (at least, configure
 recognizes them as not gcc) and are not MSVC either.

 So those three don't have medium to high degrees of compatibility with GCC?

Uh, they all compile C, so perforce they have reasonable degrees of
compatibility with gcc.  That doesn't mean they implement gcc's
nonstandard extensions.

 We ought to have more IMO, because software monocultures are
 dangerous.  Of those three, two pass the quiet inline test and one ---
 the newest of the three if I guess correctly --- does not.  So it is
 not the case that !USE_INLINE is dead code, even if you adopt the
 position that we don't care about any compiler not represented in the
 buildfarm.

 I note that you said that it doesn't pass the quiet inline test, and
 not that it doesn't support inline functions.

What's your point?  If the compiler isn't implementing inline the same
way gcc does, we can't use the same inlining arrangements.  I will be
the first to agree that C99's definition of inline sucks, but that
doesn't mean we can assume that gcc's version is implemented everywhere.
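
For readers not following the patch series, a rough sketch of the
arrangement being argued about; the type and function names are illustrative
(loosely modelled on the proposed list code), and only USE_INLINE
corresponds to an existing symbol.

#include <stdbool.h>
#include <stddef.h>

/* Illustrative doubly-linked list head, loosely following the patch series. */
typedef struct ilist_node
{
	struct ilist_node *prev;
	struct ilist_node *next;
} ilist_node;

typedef struct ilist_head
{
	ilist_node	head;
} ilist_head;

#ifdef USE_INLINE
/*
 * Compilers that pass the "quiet inline" configure test (no warnings about
 * unreferenced static inline functions in headers) get a header-only
 * definition ...
 */
static inline bool
ilist_is_empty(const ilist_head *h)
{
	return h->head.next == NULL || h->head.next == &h->head;
}
#else
/* ... everyone else falls back to an out-of-line definition in a .c file. */
extern bool ilist_is_empty(const ilist_head *h);
#endif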

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)

2012-06-25 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On Friday, June 22, 2012 02:04:02 AM Tom Lane wrote:
 This is nonsense.  There are at least three buildfarm machines running
 compilers that do not pretend to be gcc (at least, configure
 recognizes them as not gcc) and are not MSVC either.

 Should there be no other trick - I think there is though - we could just 
 specify -W2177 as an alternative parameter to test in the 'quiet static 
 inline' test.

What is that, an MSVC switch?  If so it's rather irrelevant to non-MSVC
compilers.

 I definitely do not want to bar any sensible compiler from compiling postgres
 but the keyword here is 'sensible'. If it requires some modest force/trickery
 to behave sensibly, that's ok, but if we need to ship around huge unreadable
 crufty macros just to support them I don't find it ok.

So you propose to define any compiler that strictly implements C99 as
not sensible and not one that will be able to compile Postgres?  I do
not think that's acceptable.  I have no problem with producing better
code on gcc than elsewhere (as we already do), but being flat out broken
for compilers that don't match gcc's interpretation of inline is not
good enough.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade broken by xlog numbering

2012-06-25 Thread Thom Brown
On 25 June 2012 13:11, Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 On HEAD at the moment, `make check-world` is failing on a 32-bit Linux
 build:

 + pg_upgrade -d
 /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/data.old -D
 /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/data -b
 /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/install//home/kevin/pg/master/Debug/bin
 -B
 /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/install//home/kevin/pg/master/Debug/bin
 Performing Consistency Checks
 -
 Checking current, bin, and data directories                 ok
 Checking cluster versions                                   ok
 Some required control information is missing;  cannot find:
  first log file ID after reset
  first log file segment after reset

 Cannot continue without required control information, terminating
 Failure, exiting

I get precisely the same on 64-bit Linux.

-- 
Thom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] warning handling in Perl scripts

2012-06-25 Thread David E. Wheeler
On Jun 25, 2012, at 3:35 PM, Tom Lane wrote:

 +1 for the concept of turning warnings into errors, but is that really
 the cleanest, most idiomatic way to do so in Perl?  Sheesh.

It’s the most backward-compatible, but the most idiomatic way to do it 
lexically is:

use warnings 'FATAL';

However, that works only for the current lexical scope. If there are warnings 
in the code you are calling from the current scope, the use of `local 
$SIG{__WARN__}` is required.

HTH,

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal

2012-06-25 Thread Robert Haas
On Mon, Jun 25, 2012 at 9:43 AM, Andres Freund and...@2ndquadrant.com wrote:
  The only theoretical way I see around that problem would be to postpone
  all relation unlinks until everything that could possibly read them has
  finished. Doesn't seem too alluring, although it would be needed if we
  ever move more things off SnapshotNow.
 
  Input/Ideas/Opinions?

 Yeah, this is slightly nasty.  I'm not sure whether or not there's a
 way to make it work.
 Postponing all non-rollback unlinks to the next logical checkpoint is the
 only thing I can think of...

There are a number of cool things we could do if we postponed unlinks.
 Like, why can't we allow concurrent read-only queries while a CLUSTER
operation is in progress?  Well, two reasons.  The first is that we
currently can't do ANY DDL with less than a full table lock because of
SnapshotNow-related race conditions.  The second is that people might
still need to look at the old heap after the CLUSTER transaction
commits.  Some kind of delayed unlink facility where we
garbage-collect relation backing files when their refcount falls to
zero would solve the second problem - not that that's any help by
itself without a solution to the first one, but hey.

 I had another idea.  Suppose decoding happens directly on the primary,
 because I'm still hoping there's a way to swing that.  Suppose further
 that we handle DDL by insisting that (1) any backend which wants to
 add columns or change the types of existing columns must first wait
 for logical replication to catch up and (2) if a backend which has
 added columns or changed the types of existing columns then writes to
 the modified table, decoding of those writes will be postponed until
 transaction commit.  I think that's enough to guarantee that the
 decoding process can just use the catalogs as they stand, with plain
 old SnapshotNow.
  I don't think it's that easy. If you e.g. have multiple ALTERs in the same
  transaction interspersed with inserted rows, they will all have different
  TupleDescs.

If new columns were added, then tuples created with those older
tuple-descriptors can still be interpreted with the latest
tuple-descriptor.

Columns that are dropped or retyped are a little trickier, but
honestly... how much do we care about those cases?  How practical is
it to suppose we're going to be able to handle them sanely anyway?
Suppose that the user defines a type which works just like int4 except
that the output functions writes out each number in pig latin (and the
input function parses pig latin).  The user defines the types as
binary coercible to each other and then does ALTER TABLE on a large
table with an int4 column, transforming it into an int4piglatin
column.  Due to Noah Misch's fine work, we will conclude that no table
rewrite is needed.  But if logical replication is in use, then in
theory we should scan the whole table and generate an LCR for each row
saying the row with primary key X was updated, and column Y, which
used to contain 42, now contains ourty-two-fay.  Otherwise, if we're
doing heterogenous replication into a system that just stores that
column as text, it'll end up with the wrong contents.  On the other
hand, if we're trying to ship data to another PostgreSQL instance
where the column hasn't yet been updated, then all of those LCRs are
just going to error out when we try to apply them.

A more realistic scenario where you have the same problem is with
something like ALTER TABLE .. ADD COLUMN .. DEFAULT.   If you add a
column with a default in a single step (as opposed to first adding the
column and then setting its default), we rewrite the table and set
every row to the default value.  Should that generate LCRs showing
every row being updated to add that new value, or should we generate
no LCRs and assume that the DBA will independently do the same
operation on the remote side?  Either answer could be correct,
depending on how the LCRs are being used.  If you're just rewriting
with a constant default, then perhaps the sensible thing is to
generate no LCRs, since it will be more efficient to mimic the
operation on the remote side than to replay the changes row-by-row.
But what if the default isn't a constant, like maybe it's
nextval('new_synthetic_pkey_seq') or even something like now().  In
those cases, it seems quite likely that if you don't generate LCRs,
manual user intervention will be required to get things back on track.
 On the other hand, if you do generate LCRs, the remote side will
become horribly bloated on replay, unless the LCRs also instruct the
far side that they should be applied via a full-table rewrite.

Can we just agree to punt all this complexity for version 1 (and maybe
versions 2, 3, and 4)?  I'm not sure what Slony does in situations
like this but I bet for a lot of replication systems, the answer is
do a full resync.  In other words, we either forbid the operation
outright when the table is enabled for logical replication, or else we
emit an LCR that 

Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)

2012-06-25 Thread Andres Freund
On Monday, June 25, 2012 05:15:43 PM Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On Friday, June 22, 2012 02:04:02 AM Tom Lane wrote:
  This is nonsense.  There are at least three buildfarm machines running
  compilers that do not pretend to be gcc (at least, configure
  recognizes them as not gcc) and are not MSVC either.
  
  Should there be no other trick - I think there is though - we could just
  specify -W2177 as an alternative parameter to test in the 'quiet static
  inline' test.
 What is that, an MSVC switch?  If so it's rather irrelevant to non-MSVC
 compilers.
HP-UX/aCC, the only compiler in the buildfarm I found that seems to fall short 
in the quiet inline test.

MSVC seems to work fine in supported versions; USE_INLINE is defined
these days.

  I definitely do not want to bar any sensible compiler from compiling
  postgres but the keyword here is 'sensible'. If it requires some modest
  force/trickery to behave sensibly, that's ok, but if we need to ship
  around huge unreadable crufty macros just to support them I don't find
  it ok.
 So you propose to define any compiler that strictly implements C99 as
 not sensible and not one that will be able to compile Postgres?  I do
 not think that's acceptable.  I have no problem with producing better
 code on gcc than elsewhere (as we already do), but being flat out broken
 for compilers that don't match gcc's interpretation of inline is not
 good enough.
I propose to treat any compiler which has no way to get to equivalent 
behaviour as not sensible. Yes. I don't think there really are many of those 
around. As you pointed out there is only one compiler in the buildfarm with 
problems and I think those can be worked around (can't test it yet though, the 
only HP-UX I could get my hands on quickly is at 11.11...).

Greetings,

Andres

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq compression

2012-06-25 Thread Euler Taveira
On 24-06-2012 23:04, Robert Haas wrote:
 So I think we really
 need someone to try this both ways and compare.  Right now it seems
 like we're mostly speculating on how well either approach would work,
 which is not as good as having some experimental results.
 
Not a problem. That's what I'm thinking too but I would like to make sure that
others don't object to general idea. Let me give it a try in both ideas...


-- 
   Euler Taveira de Oliveira - Timbira   http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade broken by xlog numbering

2012-06-25 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On MacOS X, on latest sources, initdb fails:

 creating directory /Users/rhaas/pgsql/src/test/regress/./tmp_check/data ... ok
 creating subdirectories ... ok
 selecting default max_connections ... 100
 selecting default shared_buffers ... 32MB
 creating configuration files ... ok
 creating template1 database in
 /Users/rhaas/pgsql/src/test/regress/./tmp_check/data/base/1 ... ok
 initializing pg_authid ... ok
 initializing dependencies ... ok
 creating system views ... ok
 loading system objects' descriptions ... ok
 creating collations ... ok
 creating conversions ... ok
 creating dictionaries ... FATAL:  control file contains invalid data
 child process exited with exit code 1

Same for me.  It's crashing here:

    if (ControlFile->state < DB_SHUTDOWNED ||
        ControlFile->state > DB_IN_PRODUCTION ||
        !XRecOffIsValid(ControlFile->checkPoint))
        ereport(FATAL,
                (errmsg("control file contains invalid data")));

state == DB_SHUTDOWNED, so the problem is with the XRecOffIsValid test.
ControlFile->checkPoint == 19972072 (0x130BFE8), what's wrong with that?

(I suppose the reason this is only failing on some machines is
platform-specific variations in xlog entry size, but it's still a bit
distressing that this got committed in such a broken state.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] warning handling in Perl scripts

2012-06-25 Thread Alvaro Herrera

Excerpts from David E. Wheeler's message of lun jun 25 11:23:34 -0400 2012:
 On Jun 25, 2012, at 3:35 PM, Tom Lane wrote:
 
  +1 for the concept of turning warnings into errors, but is that really
  the cleanest, most idiomatic way to do so in Perl?  Sheesh.
 
 It’s the most backward-compatible, but the most idiomatic way to do it 
 lexically is:
 
 use warnings 'FATAL';
 
 However, that works only for the current lexical scope. If there are warnings 
 in the code you are calling from the current scope, the use of `local 
 $SIG{__WARN__}` is required.

So let's add 'FATAL' to the already existing use warnings lines in
Catalog.pm and genbki.pl.

I think the other files we should add this to are generate-errcodes.pl,
generate-plerrorcodes.pl, generate-spiexceptions.pl, Gen_fmgrtab.pl.
Maybe psql/create_help.pl too.

We have a bunch of files in ECPG and MSVC areas and others in src/tools;
not sure about those.

We also have gen_qsort_tuple.pl which amusingly does not even
use warnings.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] new --maintenance-db options

2012-06-25 Thread Robert Haas
On Sat, Jun 23, 2012 at 6:26 PM, Peter Eisentraut pete...@gmx.net wrote:
 About the new --maintenance-db options:

 Why was this option not added to createuser and dropuser?  In the
 original discussion[0] they were mentioned, but it apparently never made
 it into the code.

Oops.  That was an oversight.

 I find the name to be unfortunate.  For example, I think of running
 vacuum as maintenance.  So running vacuumdb --maintenance-db=X would
 imply that the vacuum maintenance is done on X.  In fact, the whole
 point of this option is to find out where the maintenance is to be run,
 not to run the maintenance.  Maybe something like --initial-db would be
 better?

As Dave says, I picked this because pgAdmin has long used that terminology.

 What is the purpose of these options?  The initial discussion was
 unclear on this.  The documentation contains no explanation of why they
 should be used.  If we want to really support the case where both
 postgres and template1 are removed, an environment variable might be
 more useful than requiring this to be typed out for every command.

 [0]: 
 http://archives.postgresql.org/message-id/ca+tgmoacjwsis9nnqjgaaml1vg6c8b6o1ndgqnuca2gm00d...@mail.gmail.com

Well, I would be opposed to having ONLY an environment variable,
because I think that anything that can be controlled via an
environment variable should be able to be overridden on the command
line.  It might be OK to have both an environment variable AND a
command-line option, but I tend to think it's too marginal to justify
that.

In retrospect, it seems as though it might have been a good idea to
make the postgres database read-only and undroppable, so that all
client utilities could count on being able to connect to it and get a
list of databases in the cluster without the need for all this
complexity.  Or else having some other way for a client to
authenticate and list out all the available databases.  In the absence
of such a mechanism, I don't think we can turn around and say that not
having a postgres database is an unsupported configuration, and
therefore we need some way to cope with it when it happens.

I think the original report that prompted this change was a complaint
that pg_upgrade failed when the postgres database had been dropped.
Now, admittedly, pg_upgrade fails for all kinds of crazy stupid
reasons and the chances of fixing that problem completely any time in
the next 5 years do not seem good, but that's not a reason not to keep
plugging the holes we can.  Anyhow, the same commit that introduced
--maintenance-db fixed that problem by arranging to try both
postgres and template1 before giving up...  but having two hard-coded
database names, either of which can be dropped or renamed, seems only
marginally better than having one, hence the switch.  Really, I think
pg_upgrade needs this option too, unless we're going to kill the
problem at its root by providing a reliable way to enumerate database
names without first knowing the name of one that you can connect to.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)

2012-06-25 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On Monday, June 25, 2012 05:15:43 PM Tom Lane wrote:
 So you propose to define any compiler that strictly implements C99 as
 not sensible and not one that will be able to compile Postgres?

 I propose to treat any compiler which has no way to get to equivalent 
 behaviour as not sensible. Yes.

Well, my response is no.  I could see saying that we require (some) C99
features at this point, but not features that are in no standard, no
matter how popular gcc might be.

 I don't think there really are many of those 
 around. As you pointed out there is only one compiler in the buildfarm with 
 problems

This just means we don't have a wide enough collection of non-mainstream
machines in the buildfarm.  Deciding to break any platform with a
non-gcc-equivalent compiler isn't going to improve that.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] warning handling in Perl scripts

2012-06-25 Thread David E. Wheeler
On Jun 25, 2012, at 5:51 PM, Alvaro Herrera wrote:

 However, that works only for the current lexical scope. If there are 
 warnings in the code you are calling from the current scope, the use of 
 `local $SIG{__WARN__}` is required.
 
 So lets add 'FATAL' to the already existing use warnings lines in
 Catalog.pm and genbki.pl.
 
 I think the other files we should add this to  are generate-errcodes.pl,
 generate-plerrorcodes.pl, generate-spiexceptions.pl, Gen_fmgrtab.pl.
 Maybe psql/create_help.pl too.
 
 We have a bunch of files in ECPG and MSVC areas and others in src/tools;
 not sure about those.
 
 We also have gen_qsort_tuple.pl which amusingly does not even
 use warnings.

Hrm, I think that `use warnings 'FATAL';` might only work for core warnings. 
Which is annoying. I missed what was warning up-thread, but the most foolproof 
way to make all warnings fatal is the originally suggested

  local $SIG{__WARN__} = sub { die shift };

A *bit* cleaner is to use Carp::croak:

use Carp;
local $SIG{__WARN__} = \&croak;

Or if you wanted to get a stack trace out of it, use Carp::confess:

use Carp;
local $SIG{__WARN__} = \&confess;

Exception-handling in Perl is one of the few places that annoy me regularly.

Best,

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade broken by xlog numbering

2012-06-25 Thread Robert Haas
On Mon, Jun 25, 2012 at 11:50 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On MacOS X, on latest sources, initdb fails:

 creating directory /Users/rhaas/pgsql/src/test/regress/./tmp_check/data ... 
 ok
 creating subdirectories ... ok
 selecting default max_connections ... 100
 selecting default shared_buffers ... 32MB
 creating configuration files ... ok
 creating template1 database in
 /Users/rhaas/pgsql/src/test/regress/./tmp_check/data/base/1 ... ok
 initializing pg_authid ... ok
 initializing dependencies ... ok
 creating system views ... ok
 loading system objects' descriptions ... ok
 creating collations ... ok
 creating conversions ... ok
 creating dictionaries ... FATAL:  control file contains invalid data
 child process exited with exit code 1

 Same for me.  It's crashing here:

    if (ControlFile->state < DB_SHUTDOWNED ||
        ControlFile->state > DB_IN_PRODUCTION ||
        !XRecOffIsValid(ControlFile->checkPoint))
        ereport(FATAL,
                (errmsg("control file contains invalid data")));

 state == DB_SHUTDOWNED, so the problem is with the XRecOffIsValid test.
 ControlFile->checkPoint == 19972072 (0x130BFE8), what's wrong with that?

 (I suppose the reason this is only failing on some machines is
 platform-specific variations in xlog entry size, but it's still a bit
 distressing that this got committed in such a broken state.)

I'm guessing that the problem is as follows: in the old code, the
XLogRecord header could not be split, so any offset that was closer to
the end of the page than SizeOfXLogRecord was a sure sign of trouble.
But commit 061e7efb1b4c5b8a5d02122b7780531b8d5bf23d relaxed that
restriction, so now it IS legal for the checkpoint record to be where
it is.  But it seems that XRecOffIsValid() didn't get the memo.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] warning handling in Perl scripts

2012-06-25 Thread Tom Lane
David E. Wheeler da...@justatheory.com writes:
 Hrm, I think that `use warnings 'FATAL';` might only work for core warnings. 
 Which is annoying. I missed what was warning up-thread, but the most 
 foolproof way to make all warnings fatal is the originally suggested

   local $SIG{__WARN__} = sub { die shift };

Sigh, let's do it that way then.

 A *bit* cleaner is to use Carp::croak:

 use Carp;
  local $SIG{__WARN__} = \&croak;

Just as soon not add a new module dependency if we don't have to.
In this case, since we're not really expecting the warnings to get
thrown, it seems like there'd be little value added by doing so.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] warning handling in Perl scripts

2012-06-25 Thread Ryan Kelly
On Mon, Jun 25, 2012 at 12:07:55PM -0400, Tom Lane wrote:
 David E. Wheeler da...@justatheory.com writes:
  Hrm, I think that `use warnings 'FATAL';` might only work for core 
  warnings. Which is annoying. I missed what was warning up-thread, but the 
  most foolproof way to make all warnings fatal is the originally suggested
 
local $SIG{__WARN__} = sub { die shift };
 
 Sigh, let's do it that way then.
 
  A *bit* cleaner is to use Carp::croak:
 
  use Carp;
   local $SIG{__WARN__} = \&croak;
 
 Just as soon not add a new module dependency if we don't have to.
Carp is core since Perl 5 [1994-10-17].

 In this case, since we're not really expecting the warnings to get
 thrown, it seems like there'd be little value added by doing so.
 
   regards, tom lane
 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers

-Ryan Kelly

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Remove sanity test in XRecOffIsValid.

2012-06-25 Thread Tom Lane
Robert Haas rh...@postgresql.org writes:
 Remove sanity test in XRecOffIsValid.

 Commit 061e7efb1b4c5b8a5d02122b7780531b8d5bf23d changed the rules
 for splitting xlog records across pages, but neglected to update this
 test.  It's possible that there's some better action here than just
 removing the test completely, but this at least appears to get some
 of the things that are currently broken (like initdb on MacOS X)
 working again.

Offhand, I'm wondering why this macro doesn't include a MAXALIGN test.
If it did, I don't think that the upper-limit test would have any
use anymore.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal

2012-06-25 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 
 I bet for a lot of replication systems, the answer is do a full
 resync.  In other words, we either forbid the operation outright
 when the table is enabled for logical replication, or else we emit
 an LCR that says, in effect, transaction 12345 monkeyed with the
 table, please resync.  It strikes me that it's really the job of
 some higher-level control logic to decide what the correct
 behavior is in these cases; the decoding process doesn't really
 have enough information about what the user is trying to do to
 make a sensible decision anyway.
 
This is clearly going to depend on the topology.  You would
definitely want to try to replicate the DDL for the case on which
Simon is focused (which seems to me to be essentially physical
replication of catalogs with logical replication of data changes
from any machine to all others).  What you do about transactions in
flight is the hard part.  You could try to suppress concurrent DML
of the same objects or have some complex matrix of rules for trying
to resolve the transactions in flight.  I don't see how the latter
could ever be 100% accurate.
 
In our shop it is much easier.  We always have one database which is
the only valid source for any tuple, although rows from many such
databases can be in one table, and one row might replicate to many
databases.  Thus, we don't want automatic replication of DDL.
 
 - When a column is going to be added to the source machines, we
   first add it to the targets, with either a default or as
   NULL-capable.
 
 - When a column is going to be deleted from the source machines, we
   make sure it is NULL-capable or has a default on the replicas. 
   We drop it from all replicas after it is gone from all sources.
 
 - If a column is changing name or is changing to a fundamentally
   different type we need to give the new column a new name, have
   triggers to convert old to new (and vice versa) on the replicas,
   and drop the old after all sources are updated.
 
 - If a column is changing in a minor way, like its precision, we
   make sure the replicas can accept either format until all sources
   have been converted.  We update the replicas to match the sources
   after all sources are converted.
 
We most particularly *don't* want DDL to replicate automatically,
because the schema changes are deployed along with related software
changes, and we like to pilot any changes for at least a few days. 
Depending on the release, the rollout may take a couple months, or
we may slam it in everywhere a few days after the first pilot
deployment.
 
So you could certainly punt all of this for any release as far as
Wisconsin Courts are concerned.  We need to know table and column
names, before and after images, and some application-supplied
metadata.
 
I don't know that what we're looking for is any easier (although I
doubt that it's any harder), but I'm starting to wonder how much
mechanism they can really share.  The 2Q code is geared toward page
format OIDs and data values for automated DDL distribution and
faster replication, while we're looking for something which works
between releases, architectures, and OSes.  We keep coming back to
the idea of one mechanism because both WAL and a logical transaction
stream would have after tuples, although they need them in
different formats.
 
I think the need for truly logical replication is obvious, since so
many different people have developed trigger-based versions of that.
And it sure seems like 2Q has clients who are willing to pay for the
other.
 
Perhaps the first question is: Is there enough in common between
logical replication (and all the topologies that might be created
with that) and the proposal on the table (which seems to be based
around one particular topology with a vague notion of bolting
logical replication on to it after the fact) to try to resolve the
differences in one feature?  Or should the identical schema with
multiple identical copies case be allowed to move forward more or
less in isolation, with logical replication having its own design if
and when someone wants to take it on?  Two non-compromised features
might be cleaner -- I'm starting to feel like we're trying to design
a toaster which can also water your garden.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal

2012-06-25 Thread Robert Haas
On Mon, Jun 25, 2012 at 12:42 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Perhaps the first question is: Is there enough in common between
 logical replication (and all the topologies that might be created
 with that) and the proposal on the table (which seems to be based
 around one particular topology with a vague notion of bolting
 logical replication on to it after the fact) to try to resolve the
 differences in one feature?  Or should the identical schema with
 multiple identical copies case be allowed to move forward more or
 less in isolation, with logical replication having its own design if
 and when someone wants to take it on?  Two non-compromised features
 might be cleaner -- I'm starting to feel like we're trying to design
 a toaster which can also water your garden.

I think there are a number of shared pieces.  Being able to read WAL
and do something with it is a general need that both solutions share;
I think actually that might be the piece that we should try to get
committed first.  I suspect that there are a number of applications
for just that and nothing more - for example, it might allow a contrib
module that reads WAL as it's generated and prints out a debug trace,
which I can imagine being useful.

Also, I think that even for MMR there will be a need for control
logic, resynchronization, and similar mechanisms.  I mean, suppose you
have four servers in an MMR configuration.  Now, you want to deploy a
schema change that adds a new column and which, as it so happens,
requires a table rewrite to add the default.  It is very possible that
you do NOT want that to automatically replicate around the cluster.
Instead, you likely want to redirect load to the remaining three
servers, do the change on the fourth, put it back into the ring and
take out a different one, do the change on that one, and so on.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Remove sanity test in XRecOffIsValid.

2012-06-25 Thread Robert Haas
On Mon, Jun 25, 2012 at 12:35 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas rh...@postgresql.org writes:
 Remove sanity test in XRecOffIsValid.

 Commit 061e7efb1b4c5b8a5d02122b7780531b8d5bf23d changed the rules
 for splitting xlog records across pages, but neglected to update this
 test.  It's possible that there's some better action here than just
 removing the test completely, but this at least appears to get some
 of the things that are currently broken (like initdb on MacOS X)
 working again.

 Offhand, I'm wondering why this macro doesn't include a MAXALIGN test.
 If it did, I don't think that the upper-limit test would have any
 use anymore.

Yeah, I wondered that, too, but wasn't sure enough about what the real
alignment requirements were to do it myself.
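
For context, here is a standalone sketch of the kind of alignment test being
discussed: WAL records always begin at MAXALIGN'd byte positions, so a
validity macro could test alignment instead of an upper page-offset limit.
The names and the alignment constant below are illustrative only, not the
committed definition.

    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative stand-ins; the real values come from pg_config.h et al. */
    #define DEMO_MAXIMUM_ALIGNOF 8
    #define DemoRecPtrIsAligned(recptr) \
        ((recptr) != 0 && ((recptr) % DEMO_MAXIMUM_ALIGNOF) == 0)

    int
    main(void)
    {
        uint64_t checkPoint = UINT64_C(19972072);   /* 0x130BFE8, from the report */

        /* prints "yes": the rejected checkpoint location is in fact aligned */
        printf("MAXALIGN'd: %s\n", DemoRecPtrIsAligned(checkPoint) ? "yes" : "no");
        return 0;
    }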

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade broken by xlog numbering

2012-06-25 Thread Robert Haas
On Mon, Jun 25, 2012 at 8:11 AM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 On HEAD at the moment, `make check-world` is failing on a 32-bit Linux
 build:

This appears to be because of the following hunk from commit
dfda6ebaec6763090fb78b458a979b558c50b39b:

@@ -558,10 +536,10 @@ PrintControlValues(bool guessed)
snprintf(sysident_str, sizeof(sysident_str), UINT64_FORMAT,
 ControlFile.system_identifier);

-   printf(_("First log file ID after reset:%u\n"),
-  newXlogId);
-   printf(_("First log file segment after reset:   %u\n"),
-  newXlogSeg);
+   XLogFileName(fname, ControlFile.checkPointCopy.ThisTimeLineID, newXlogSe
+
+   printf(_("First log segment after reset:%s\n"),
+  fname);
printf(_("pg_control version number:%u\n"),
   ControlFile.pg_control_version);
printf(_("Catalog version number:   %u\n"),

Evidently, Heikki failed to realize that pg_upgrade gets the control
data information by parsing the output of pg_controldata.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)

2012-06-25 Thread Andres Freund
On Monday, June 25, 2012 05:57:51 PM Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On Monday, June 25, 2012 05:15:43 PM Tom Lane wrote:
  So you propose to define any compiler that strictly implements C99 as
  not sensible and not one that will be able to compile Postgres?
  
  I propose to treat any compiler which has no way to get to equivalent
  behaviour as not sensible. Yes.

 Well, my response is no.  I could see saying that we require (some) C99
 features at this point, but not features that are in no standard, no
 matter how popular gcc might be.
I fail to see how gcc is the relevant point here given that there are 
equivalent definitions available from multiple compiler vendors.

Also, 'static inline' *is* C99 conforming as far as I can see? The problem 
with it is that some compilers may warn if the function isn't used in the same 
translation unit. That doesn't make not using a function non-standard-
conforming though.

  I don't think there really are many of those
  around. As you pointed out there is only one compiler in the buildfarm
  with problems
 This just means we don't have a wide enough collection of non-mainstream
 machines in the buildfarm.  Deciding to break any platform with a
 non-gcc-equivalent compiler isn't going to improve that.
No, it won't improve that. But neither will the contrary.

Greetings,

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq compression

2012-06-25 Thread Greg Jaskiewicz
Wasn't this more of an issue in de-coupling compression from encryption ?

 
On 25 Jun 2012, at 16:36, Euler Taveira wrote:

 On 24-06-2012 23:04, Robert Haas wrote:
 So I think we really
 need someone to try this both ways and compare.  Right now it seems
 like we're mostly speculating on how well either approach would work,
 which is not as good as having some experimental results.
 
 Not a problem. That's what I'm thinking too but I would like to make sure that
 others don't object to general idea. Let me give it a try in both ideas...
 
 
 -- 
   Euler Taveira de Oliveira - Timbira   http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-25 Thread Fujii Masao
On Mon, Jun 25, 2012 at 1:24 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Ok, committed all the WAL format changes now.

This breaks pg_resetxlog -l. When I ran pg_resetxlog -l 0x01,0x01,0x01
in the HEAD, I got the following error message though the same command
successfully completed in 9.1.

pg_resetxlog: invalid argument for option -l
Try pg_resetxlog --help for more information.

I think the attached patch needs to be applied.

Regards,

-- 
Fujii Masao


resetxlog_bugfix_v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal

2012-06-25 Thread Andres Freund
On Monday, June 25, 2012 05:34:13 PM Robert Haas wrote:
 On Mon, Jun 25, 2012 at 9:43 AM, Andres Freund and...@2ndquadrant.com 
wrote:
   The only theoretical way I see against that problem would be to
   postpone all relation unlinks until everything that could possibly
   read them has finished. Doesn't seem too alluring although it would be
   needed if we ever move more things off SnapshotNow.
   
   Input/Ideas/Opinions?
  
  Yeah, this is slightly nasty.  I'm not sure whether or not there's a
  way to make it work.
  
  Postponing all non-rollback unlinks to the next logical checkpoint is
  the only thing I can think of...
 There are a number of cool things we could do if we postponed unlinks.
  Like, why can't we allow concurrent read-only queries while a CLUSTER
 operation is in progress?  Well, two reasons.  The first is that we
 currently can't do ANY DDL with less than a full table lock because of
 SnapshotNow-related race conditions.  The second is that people might
 still need to look at the old heap after the CLUSTER transaction
 commits.  Some kind of delayed unlink facility where we
 garbage-collect relation backing files when their refcount falls to
 zero would solve the second problem - not that that's any help by
 itself without a solution to the first one, but hey.
It's an argument why related infrastructure would be interesting to more than 
that patch and that's not bad.
If the garbage collecting is done in a very simplistic manner it doesn't sound 
too hard... The biggest problem is probably crash-recovery of that knowledge 
and how to hook knowledge into it that logical rep needs that data...

  I had another idea.  Suppose decoding happens directly on the primary,
  because I'm still hoping there's a way to swing that.  Suppose further
  that we handle DDL by insisting that (1) any backend which wants to
  add columns or change the types of existing columns must first wait
  for logical replication to catch up and (2) if a backend which has
  added columns or changed the types of existing columns then writes to
  the modified table, decoding of those writes will be postponed until
  transaction commit.  I think that's enough to guarantee that the
  decoding process can just use the catalogs as they stand, with plain
  old SnapshotNow.
  
  I don't think its that easy. If you e.g. have multiple ALTER's in the
  same transaction interspersed with inserted rows they will all have
  different TupleDesc's.
 
 If new columns were added, then tuples created with those older
 tuple-descriptors can still be interpreted with the latest
 tuple-descriptor.
But you need to figure that out. If you have just the before-after images of 
the tupledescs you don't know what happened in there... That would mean either 
doing special things on catalog changes or reassembling the meaning from the 
changed pg_* rows. Neither seems enticing.

 Columns that are dropped or retyped are a little trickier, but
 honestly... how much do we care about those cases?  How practical is
 it to suppose we're going to be able to handle them sanely anyway?
 Suppose that the user defines a type which works just like int4 except
 that the output function writes out each number in pig latin (and the
 input function parses pig latin).  The user defines the types as
 binary coercible to each other and then does ALTER TABLE on a large
 table with an int4 column, transforming it into an int4piglatin
 column.  Due to Noah Misch's fine work, we will conclude that no table
 rewrite is needed.  But if logical replication is in use, then in
 theory we should scan the whole table and generate an LCR for each row
 saying the row with primary key X was updated, and column Y, which
 used to contain 42, now contains ourty-two-fay.  Otherwise, if we're
 doing heterogenous replication into a system that just stores that
 column as text, it'll end up with the wrong contents.  On the other
 hand, if we're trying to ship data to another PostgreSQL instance
 where the column hasn't yet been updated, then all of those LCRs are
 just going to error out when we try to apply them.

 A more realistic scenario where you have the same problem is with
 something like ALTER TABLE .. ADD COLUMN .. DEFAULT.   If you add a
 column with a default in a single step (as opposed to first adding the
 column and then setting its default), we rewrite the table and set
 every row to the default value.  Should that generate LCRs showing
 every row being updated to add that new value, or should we generate
 no LCRs and assume that the DBA will independently do the same
 operation on the remote side?  Either answer could be correct,
 depending on how the LCRs are being used.  If you're just rewriting
 with a constant default, then perhaps the sensible thing is to
 generate no LCRs, since it will be more efficient to mimic the
 operation on the remote side than to replay the changes row-by-row.
 But what if the default isn't a constant, like maybe it's
 

Re: [HACKERS] WAL format changes

2012-06-25 Thread Fujii Masao
On Mon, Jun 25, 2012 at 1:24 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Ok, committed all the WAL format changes now.

I found the typo.

In walsender.c
-reply.write.xlogid, reply.write.xrecoff,
-reply.flush.xlogid, reply.flush.xrecoff,
-reply.apply.xlogid, reply.apply.xrecoff);
+(uint32) (reply.write << 32), (uint32) reply.write,
+(uint32) (reply.flush << 32), (uint32) reply.flush,
+(uint32) (reply.apply << 32), (uint32) reply.apply);

<< should be >>. The attached patch fixes this typo.
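
For illustration, a standalone sketch of why the shift direction matters when
printing a 64-bit position as two 32-bit halves (the value and names below are
made up, not the walsender.c code):

    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
        uint64_t write_pos = UINT64_C(0x0000000245E67D30);

        /* Correct: the high word is the value shifted right by 32 bits. */
        printf("right shift: %X/%X\n",
               (uint32_t) (write_pos >> 32), (uint32_t) write_pos);

        /* The typo: a left shift makes the high word come out as zero. */
        printf("left shift:  %X/%X\n",
               (uint32_t) (write_pos << 32), (uint32_t) write_pos);
        return 0;
    }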

Regards,

-- 
Fujii Masao

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-25 Thread Fujii Masao
On Tue, Jun 26, 2012 at 2:53 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Mon, Jun 25, 2012 at 1:24 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 Ok, committed all the WAL format changes now.

 I found the typo.

 In walsender.c
 -                reply.write.xlogid, reply.write.xrecoff,
 -                reply.flush.xlogid, reply.flush.xrecoff,
 -                reply.apply.xlogid, reply.apply.xrecoff);
 +                (uint32) (reply.write << 32), (uint32) reply.write,
 +                (uint32) (reply.flush << 32), (uint32) reply.flush,
 +                (uint32) (reply.apply << 32), (uint32) reply.apply);

  << should be >>. The attached patch fixes this typo.

Oh, I forgot to attach the patch.. Here is the patch.

Regards,

-- 
Fujii Masao


walsender_typo_v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-25 Thread Robert Haas
On Mon, Jun 25, 2012 at 1:57 PM, Fujii Masao masao.fu...@gmail.com wrote:
  << should be >>. The attached patch fixes this typo.

 Oh, I forgot to attach the patch.. Here is the patch.

I committed both of the patches you posted to this thread.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal

2012-06-25 Thread Robert Haas
On Mon, Jun 25, 2012 at 1:50 PM, Andres Freund and...@2ndquadrant.com wrote:
  It's an argument why related infrastructure would be interesting to more than
  that patch and that's not bad.
 If the garbage collecting is done in a very simplistic manner it doesn't sound
 too hard... The biggest problem is probably crash-recovery of that knowledge
 and how to hook knowledge into it that logical rep needs that data...

I suppose the main reason we haven't done it already is that it
increases the period of time during which we're using 2X the disk
space.

  I don't think its that easy. If you e.g. have multiple ALTER's in the
  same transaction interspersed with inserted rows they will all have
  different TupleDesc's.

 If new columns were added, then tuples created with those older
 tuple-descriptors can still be interpreted with the latest
 tuple-descriptor.
 But you need to figure that out. If you have just the before-after images of
 the tupledescs you don't know what happened in there... That would mean either
 doing special things on catalog changes or reassembling the meaning from the
 changed pg_* rows. Neither seems enticing.

I think there is absolutely nothing wrong with doing extra things in
ALTER TABLE when logical replication is enabled.  We've got code
that's conditional on Hot Standby being enabled in many places in the
system; why should logical replication be any different?  If we set
the bar for logical replication at the system can't do anything
differently when logical replication is enabled then I cheerfully
submit that we are doomed.  You've already made WAL format changes to
support logging the pre-image of the tuple, which is a hundred times
more likely to cause a performance problem than any monkeying around
we might want to do in ALTER TABLE.

 Can we just agree to punt all this complexity for version 1 (and maybe
 versions 2, 3, and 4)?  I'm not sure what Slony does in situations
 like this but I bet for a lot of replication systems, the answer is
 do a full resync.  In other words, we either forbid the operation
 outright when the table is enabled for logical replication, or else we
 emit an LCR that says, in effect, transaction 12345 monkeyed with the
 table, please resync.  It strikes me that it's really the job of some
 higher-level control logic to decide what the correct behavior is in
 these cases; the decoding process doesn't really have enough
 information about what the user is trying to do to make a sensible
 decision anyway.  It would be nice to be able to support some simple
 cases like adding a column that has no default or dropping a
 column without punting, but going much further than that seems like
 it will require embedding policy decisions that should really be
 happening at a higher level.
 I am totally fine with saying that we do not support everything from the
 start. But we need to choose an architecture where its possible to add that
 support gradually and I don't think without looking inside transaction makes
 that possible.

I am deeply skeptical that we need to look inside of transactions that
do full-table rewrites.  But even if we do, I don't see that what I'm
proposing precludes it.  For example, I think we could have ALTER
TABLE emit WAL records specifically for logical replication that allow
us to disentangle which tuple descriptor to use at which point in the
transaction.  I don't see that that would even be very difficult to
set up.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal

2012-06-25 Thread Andres Freund
Hi,

(munching the mail from Robert and Kevin together)

On Monday, June 25, 2012 06:42:41 PM Kevin Grittner wrote:
 Robert Haas robertmh...@gmail.com wrote:
  I bet for a lot of replication systems, the answer is do a full
  resync.  In other words, we either forbid the operation outright
  when the table is enabled for logical replication, or else we emit
  an LCR that says, in effect, transaction 12345 monkeyed with the
  table, please resync.  It strikes me that it's really the job of
  some higher-level control logic to decide what the correct
  behavior is in these cases; the decoding process doesn't really
  have enough information about what the user is trying to do to
  make a sensible decision anyway.
 
 This is clearly going to depend on the topology.  You would
 definitely want to try to replicate the DDL for the case on which
 Simon is focused (which seems to me to be essentially physical
 replication of catalogs with logical replication of data changes
 from any machine to all others).  What you do about transactions in
 flight is the hard part.  You could try to suppress concurrent DML
 of the same objects or have some complex matrix of rules for trying
 to resolve the transactions in flight.  I don't see how the latter
 could ever be 100% accurate.
Yes. That's why I dislike that proposal. I don't think that's going to be 
understandable and robust enough.

If we really look inside transactions (3b and 1) that shouldn't be a problem 
though. So I think it really has to be one of those.


 In our shop it is much easier.  We always have one database which is
 the only valid source for any tuple, although rows from many such
 databases can be in one table, and one row might replicate to many
 databases.  Thus, we don't want automatic replication of DDL.
 
  - When a column is going to be added to the source machines, we
first add it to the targets, with either a default or as
NULL-capable.
 
  - When a column is going to be deleted from the source machines, we
make sure it is NULL-capable or has a default on the replicas.
We drop it from all replicas after it is gone from all sources.
 
  - If a column is changing name or is changing to a fundamentally
different type we need to give the new column a new name, have
triggers to convert old to new (and vice versa) on the replicas,
and drop the old after all sources are updated.
 
  - If a column is changing in a minor way, like its precision, we
make sure the replicas can accept either format until all sources
have been converted.  We update the replicas to match the sources
after all sources are converted.

 We most particularly *don't* want DDL to replicate automatically,
 because the schema changes are deployed along with related software
 changes, and we like to pilot any changes for at least a few days.
 Depending on the release, the rollout may take a couple months, or
we may slam it in everywhere a few days after the first pilot
 deployment.
That's sensible for your use-case - but I do not think that's the 
appropriate behaviour for anything which is somewhat out-of-the-box...

 So you could certainly punt all of this for any release as far as
 Wisconsin Courts are concerned.  We need to know table and column
 names, before and after images, and some application-supplied
 metadata.
I am not sure we're going to get all that into 9.3. More on that below.

On Monday, June 25, 2012 07:09:38 PM Robert Haas wrote:
 On Mon, Jun 25, 2012 at 12:42 PM, Kevin Grittner wrote:
  I don't know that what we're looking for is any easier (although I
  doubt that it's any harder), but I'm starting to wonder how much
  mechanism they can really share.  The 2Q code is geared toward page
  format OIDs and data values for automated DDL distribution and
  faster replication, while we're looking for something which works
  between releases, architectures, and OSes.  We keep coming back to
  the idea of one mechanism because both WAL and a logical transaction
  stream would have after tuples, although they need them in
  different formats.
  
  I think the need for truly logical replication is obvious, since so
  many different people have developed trigger-based versions of that.
  And it sure seems like 2Q has clients who are willing to pay for the
  other.
 
  Perhaps the first question is: Is there enough in common between
  logical replication (and all the topologies that might be created
  with that) and the proposal on the table (which seems to be based
  around one particular topology with a vague notion of bolting
  logical replication on to it after the fact) to try to resolve the
  differences in one feature?  Or should the identical schema with
  multiple identical copies case be allowed to move forward more or
  less in isolation, with logical replication having its own design if
  and when someone wants to take it on?  Two non-compromised features
  might be cleaner -- I'm starting to feel like we're trying 

Re: [HACKERS] new --maintenance-db options

2012-06-25 Thread Alvaro Herrera

Excerpts from Robert Haas's message of lun jun 25 11:57:36 -0400 2012:

 Really, I think
 pg_upgrade needs this option too, unless we're going to kill the
 problem at its root by providing a reliable way to enumerate database
 names without first knowing the name of one that you can connect to.

I think pg_upgrade could do this one task by using a standalone backend
instead of a full-blown postmaster.  It should be easy enough ...

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal

2012-06-25 Thread Kevin Grittner
Andres Freund and...@2ndquadrant.com wrote:
 
 We most particularly *don't* want DDL to replicate automatically,
 because the schema changes are deployed along with related
 software changes, and we like to pilot any changes for at least a
 few days.  Depending on the release, the rollout may take a
 couple months, or we may slam in out everywhere a few days after
 the first pilot deployment.
 Thats a sensible for your use-case - but I do not think its thats
 the appropriate behaviour for anything which is somewhat
 out-of-the box...
 
Right.  We currently consider the issues involved in a change and
script it by hand.  I think we want to continue to do that.  The
point was that, given the variety of timings and techniques for
distributing schema changes, maybe is was only worth doing that
automatically for the most constrained and controlled cases.
 
 So you could certainly punt all of this for any release as far as
 Wisconsin Courts are concerned.  We need to know table and column
 names, before and after images, and some application-supplied
 metadata.
 I am not sure were going to get all that into 9.3.
 
Sure, that was more related to why I was questioning how much these
use cases even *could* integrate -- whether it even paid to
*consider* these use cases at this point.  Responses from Robert and
you have pretty much convinced me that I was being overly
pessimistic on that.
 
One fine point regarding before and after images -- if a value
doesn't change in an UPDATE, there's no reason to include it in both
the BEFORE and AFTER tuple images, as long as we have the null
column bitmaps -- or some other way of distinguishing unchanged from
NULL.  (This could be especially important when the unchanged column
was a 50 MB bytea.)  It doesn't matter to me which way this is
optimized -- in our trigger-based system we chose to keep the full
BEFORE tuple and just show AFTER values which were distinct from the
corresponding BEFORE values, but it would be trivial to adapt the
code to the other way.
 
Sorry for that bout of pessimism.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] new --maintenance-db options

2012-06-25 Thread Robert Haas
On Mon, Jun 25, 2012 at 2:49 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Excerpts from Robert Haas's message of lun jun 25 11:57:36 -0400 2012:
 Really, I think
 pg_upgrade needs this option too, unless we're going to kill the
 problem at its root by providing a reliable way to enumerate database
  names without first knowing the name of one that you can connect to.

 I think pg_upgrade could do this one task by using a standalone backend
 instead of a full-blown postmaster.  It should be easy enough ...

Maybe, but it seems like baking even more hackery into a tool that's
already got too much hackery.  It's also hard for pg_upgrade to know
things like - whether pg_hba.conf prohibits access to certain
users/databases/etc. or just requires the use of authentication
methods that happen to fail.  From pg_upgrade's perspective, it would
be nice to have a flag that starts the server in some mode where
nobody but pg_upgrade can connect to it and all connections are
automatically allowed, but it's not exactly clear how to implement
nobody but pg_upgrade can connect to it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] new --maintenance-db options

2012-06-25 Thread Alvaro Herrera

Excerpts from Robert Haas's message of lun jun 25 14:58:25 -0400 2012:
 
 On Mon, Jun 25, 2012 at 2:49 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
  Excerpts from Robert Haas's message of lun jun 25 11:57:36 -0400 2012:
  Really, I think
  pg_upgrade needs this option too, unless we're going to kill the
  problem at its root by providing a reliable way to enumerate database
   names without first knowing the name of one that you can connect to.
 
  I think pg_upgrade could do this one task by using a standalone backend
  instead of a full-blown postmaster.  It should be easy enough ...
 
 Maybe, but it seems like baking even more hackery into a tool that's
 already got too much hackery.  It's also hard for pg_upgrade to know
 things like - whether pg_hba.conf prohibits access to certain
 users/databases/etc. or just requires the use of authentication
 methods that happen to fail.  From pg_upgrade's perspective, it would
 be nice to have a flag that starts the server in some mode where
 nobody but pg_upgrade can connect to it and all connections are
 automatically allowed, but it's not exactly clear how to implement
 nobody but pg_upgrade can connect to it.

Well, have it specify a private socket directory, listen only on that
(not TCP), and bypass all pg_hba rules.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal

2012-06-25 Thread Andres Freund
Hi,

On Monday, June 25, 2012 08:13:54 PM Robert Haas wrote:
 On Mon, Jun 25, 2012 at 1:50 PM, Andres Freund and...@2ndquadrant.com 
wrote:
  Its an argument why related infrastructure would be interesting to more
  than that patch and thats not bad.
  If the garbage collecting is done in a very simplistic manner it doesn't
  sound too hard... The biggest problem is probably crash-recovery of that
  knowledge and how to hook knowledge into it that logical rep needs that
  data...
 I suppose the main reason we haven't done it already is that it
 increases the period of time during which we're using 2X the disk
 space.
I find that an acceptable price if its optional. Making it such doesn't seem 
to be a problem for me.

   I don't think its that easy. If you e.g. have multiple ALTER's in the
   same transaction interspersed with inserted rows they will all have
   different TupleDesc's.
  
  If new columns were added, then tuples created with those older
  tuple-descriptors can still be interpreted with the latest
  tuple-descriptor.
  
  But you need to figure that out. If you have just the before-after images
  of the tupledescs you don't know what happened in there... That would
  mean either doing special things on catalog changes or reassembling the
  meaning from the changed pg_* rows. Neither seems enticing.
 
 I think there is absolutely nothing wrong with doing extra things in
 ALTER TABLE when logical replication is enabled.  We've got code
 that's conditional on Hot Standby being enabled in many places in the
 system; why should logical replication be any different?  If we set
 the bar for logical replication at the system can't do anything
 differently when logical replication is enabled then I cheerfully
 submit that we are doomed.  You've already made WAL format changes to
 support logging the pre-image of the tuple, which is a hundred times
 more likely to cause a performance problem than any monkeying around
 we might want to do in ALTER TABLE.

 I am deeply skeptical that we need to look inside of transactions that
 do full-table rewrites.  But even if we do, I don't see that what I'm
 proposing precludes it.  For example, I think we could have ALTER
 TABLE emit WAL records specifically for logical replication that allow
 us to disentangle which tuple descriptor to use at which point in the
 transaction.  I don't see that that would even be very difficult to
 set up.
Sorry, I was imprecise above: I have no problem doing some changes during 
ALTER TABLE if logical rep is enabled. I am worried though that to make that 
robust you would need loads of places that emit additional information:
* ALTER TABLE
* ALTER FUNCTION
* ALTER OPERATOR
* ALTER/CREATE CAST
* TRUNCATE
* CLUSTER
* ...

I have the feeling that, should we want to do that, the full amount of required 
information would be rather high and end up being essentially the catalog. 
Just having an intermediate tupledesc doesn't help that much if you have e.g. 
record_out doing type lookups of its own.

There also is the issue you have talked about before, that a user-type might 
depend on values in other tables. Unless we're ready to break at least 
transactional behaviour for those (for now...) I don't see how decoding outside 
of the transaction is ever going to be valid? I wouldn't have a big problem 
declaring that as broken for now...

Greetings,

Andres

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq compression

2012-06-25 Thread Dimitri Fontaine
Magnus Hagander mag...@hagander.net writes:
 Or that it takes less code/generates cleaner code...

So we're talking about some LZO things such as snappy from google, and
that would be another run time dependency IIUC.

I think it's time to talk about fastlz:

  http://fastlz.org/
  http://code.google.com/p/fastlz/source/browse/trunk/fastlz.c

  551 lines of C code under MIT license, works also under windows

I guess it would be easy (and safe) enough to embed in our tree should
we decide to go this way.
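
For a feel of the API, a minimal round-trip sketch against the fastlz.h header
from that tree (the sample string and buffer sizes are mine; fastlz asks for an
output buffer somewhat larger than the input, so 256 bytes is generous here):

    #include <stdio.h>
    #include <string.h>
    #include "fastlz.h"

    int
    main(void)
    {
        const char *src = "SELECT relname FROM pg_class WHERE relkind = 'r' ORDER BY 1;";
        int         srclen = (int) strlen(src);
        char        packed[256];
        char        restored[256];
        int         packedlen, restoredlen;

        /* level 1 favours speed, level 2 trades some speed for ratio */
        packedlen = fastlz_compress_level(1, src, srclen, packed);
        restoredlen = fastlz_decompress(packed, packedlen, restored, sizeof(restored));

        printf("%d bytes -> %d bytes, round-trip %s\n", srclen, packedlen,
               (restoredlen == srclen && memcmp(src, restored, srclen) == 0)
               ? "ok" : "FAILED");
        return 0;
    }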

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq compression

2012-06-25 Thread Florian Pflug
On Jun25, 2012, at 21:21 , Dimitri Fontaine wrote:
 Magnus Hagander mag...@hagander.net writes:
 Or that it takes less code/generates cleaner code...
 
 So we're talking about some LZO things such as snappy from google, and
 that would be another run time dependency IIUC.
 
 I think it's time to talk about fastlz:
 
  http://fastlz.org/
  http://code.google.com/p/fastlz/source/browse/trunk/fastlz.c
 
  551 lines of C code under MIT license, works also under windows
 
 I guess it would be easy (and safe) enough to embed in our tree should
 we decide going this way.

Agreed. If we extend the protocol to support compression, and not rely
on SSL, then let's pick one of these LZ77-style compressors, and let's
integrate it into our tree.

We should then also make it possible to enable compression only for
the server -> client direction. Since those types of compressions are
usually pretty easy to decompress, that reduces the amount of work
non-libpq clients have to put in to take advantage of compression.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq compression

2012-06-25 Thread Phil Sorber
On Mon, Jun 25, 2012 at 3:45 PM, Florian Pflug f...@phlo.org wrote:
 On Jun25, 2012, at 21:21 , Dimitri Fontaine wrote:
 Magnus Hagander mag...@hagander.net writes:
 Or that it takes less code/generates cleaner code...

 So we're talking about some LZO things such as snappy from google, and
 that would be another run time dependency IIUC.

 I think it's time to talk about fastlz:

  http://fastlz.org/
  http://code.google.com/p/fastlz/source/browse/trunk/fastlz.c

  551 lines of C code under MIT license, works also under windows

 I guess it would be easy (and safe) enough to embed in our tree should
 we decide going this way.

 Agreed. If we extend the protocol to support compression, and not rely
 on SSL, then let's pick one of these LZ77-style compressors, and let's
 integrate it into our tree.

 We should then also make it possible to enable compression only for
 the server - client direction. Since those types of compressions are
 usually pretty easy to decompress, that reduces the amount to work
 non-libpq clients have to put in to take advantage of compression.

+1


 best regards,
 Florian Pflug


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)

2012-06-25 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On Monday, June 25, 2012 05:57:51 PM Tom Lane wrote:
 Well, my response is no.  I could see saying that we require (some) C99
 features at this point, but not features that are in no standard, no
 matter how popular gcc might be.

 Also, 'static inline' *is* C99 conforming as far as I can see?

Hmm.  I went back and re-read the C99 spec, and it looks like most of
the headaches we had in the past with C99 inline are specific to the
case where you want an extern declaration to be available.  For a
function that exists *only* as a static it might be all right.  So maybe
I'm misremembering how well this would work.  We'd have to be sure we
don't need any extern declarations, though.

Having said that, I'm still of the opinion that it's not so hard to deal
with that we should just blow off compilers where inline doesn't work
well.  I have no sympathy at all for the "we'd need two copies"
argument.  First off, if the code is at any risk whatsoever of changing
intra-major-release, it is not acceptable to inline it (there would be
inline copies in third-party modules where we couldn't ensure
recompilation).  So that's going to force us to use this only in cases
where the code is small and stable enough that two copies aren't such
a big problem.  Second, it's not that hard to set things up so there's
only one source-code copy, as was noted upthread.
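
For illustration only (none of these file or macro names exist in the tree;
they just show the shape of the single-copy arrangement): the function bodies
live in one header, exposed as static inline where that works, and compiled
exactly once as ordinary externs elsewhere.

/* ilist.h */
#ifndef ILIST_H
#define ILIST_H

typedef struct ilist_s_node
{
	struct ilist_s_node *next;
} ilist_s_node;

#ifdef USE_INLINE				/* assumed configure-detected symbol */
#define ILIST_FUNC static inline
#include "ilist_impl.h"
#else
extern void ilist_s_push(ilist_s_node *head, ilist_s_node *node);
#endif

#endif   /* ILIST_H */

/* ilist_impl.h -- the only copy of the function bodies */
ILIST_FUNC void
ilist_s_push(ilist_s_node *head, ilist_s_node *node)
{
	node->next = head->next;
	head->next = node;
}

/* ilist.c -- built only when USE_INLINE is not defined */
#include "ilist.h"
#define ILIST_FUNC
#include "ilist_impl.h"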

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal

2012-06-25 Thread Alvaro Herrera

Excerpts from Kevin Grittner's message of lun jun 25 14:50:54 -0400 2012:

 One fine point regarding before and after images -- if a value
 doesn't change in an UPDATE, there's no reason to include it in both
 the BEFORE and AFTER tuple images, as long as we have the null
 column bitmaps -- or some other way of distinguishing unchanged from
 NULL.  (This could be especially important when the unchanged column
 was a 50 MB bytea.)

Yeah, probably the best is to have the whole thing in BEFORE, and just
send AFTER values for those columns that changed, and include the
'replace' bool array (probably packed as a bitmap), so that the update
can be trivially constructed at the other end just like in
heap_modify_tuple.
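
To make that concrete, a toy sketch of the receiving side (plain C over string
columns instead of real Datums; the column values are invented): keep the full
BEFORE image, receive only the changed columns plus the replace array, and
merge them the same way heap_modify_tuple does.

#include <stdbool.h>
#include <stdio.h>

#define NCOLS 4

int
main(void)
{
	const char *before[NCOLS]  = {"42", "alice", "old@example.org", "active"};
	const char *changed[NCOLS] = {NULL, NULL, "new@example.org", NULL};
	bool		replace[NCOLS] = {false, false, true, false};
	const char *after[NCOLS];
	int			i;

	/* take the new value where replace[] says so, otherwise keep BEFORE */
	for (i = 0; i < NCOLS; i++)
		after[i] = replace[i] ? changed[i] : before[i];

	for (i = 0; i < NCOLS; i++)
		printf("col %d: %s\n", i, after[i]);
	return 0;
}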

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] new --maintenance-db options

2012-06-25 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 From pg_upgrade's perspective, it would
 be nice to have a flag that starts the server in some mode where
 nobody but pg_upgrade can connect to it and all connections are
 automatically allowed, but it's not exactly clear how to implement
 nobody but pg_upgrade can connect to it.

The implementation I've wanted to see for some time is that you can
start a standalone backend, but it speaks FE/BE protocol to its caller
(preferably over pipes, so that there is no issue whatsoever of where
you can securely put a socket or anything like that).  Making that
happen might be a bit too much work if pg_upgrade were the only use
case, but there are a lot of people who would like to use PG as an
embedded database, and this might be close enough for such use-cases.
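
For what it's worth, the plumbing for the "over pipes" part could be as simple
as the sketch below (the child binary name is a placeholder -- no such
standalone FE/BE mode exists today, which is exactly the point): hand the
backend one end of a socketpair and drive the protocol over the other end,
with no socket file anywhere.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/types.h>

int
main(void)
{
	int		sv[2];
	pid_t	pid;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
	{
		perror("socketpair");
		return 1;
	}

	pid = fork();
	if (pid == 0)
	{
		/* child: would become the standalone backend, talking FE/BE on sv[1] */
		close(sv[0]);
		dup2(sv[1], STDIN_FILENO);
		dup2(sv[1], STDOUT_FILENO);
		execl("./standalone-backend-stub", "standalone-backend-stub", (char *) NULL);
		perror("execl");
		_exit(1);
	}

	/* parent: libpq (or pg_upgrade) would drive the FE/BE protocol over sv[0] */
	close(sv[1]);
	printf("talking to backend child %d over fd %d\n", (int) pid, sv[0]);
	return 0;
}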

However, that has got little to do with whether --maintenance-db is a
worthwhile thing or not, because that's about external client-side
tools, not pg_upgrade.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)

2012-06-25 Thread Peter Geoghegan
On 25 June 2012 20:59, Tom Lane t...@sss.pgh.pa.us wrote:
 Andres Freund and...@2ndquadrant.com writes:
 Also, 'static inline' *is* C99 conforming as far as I can see?

 Hmm.  I went back and re-read the C99 spec, and it looks like most of
 the headaches we had in the past with C99 inline are specific to the
 case where you want an extern declaration to be available.  For a
 function that exists *only* as a static it might be all right.  So maybe
 I'm misremembering how well this would work.  We'd have to be sure we
 don't need any extern declarations, though.

Yeah, the extern inline functions sound at least superficially
similar to what happened with extern templates in C++ - exactly one
compiler vendor implemented them to the letter of the standard (they
remained completely unimplemented elsewhere), and subsequently went
bust, before they were eventually removed from the standard last year.

Note that when you build Postgres with Clang, it's implicitly and
automatically building C code as C99. There is an excellent analysis
of the situation here, under C99 inline functions:

http://clang.llvm.org/compatibility.html

 Having said that, I'm still of the opinion that it's not so hard to deal
 with that we should just blow off compilers where inline doesn't work
 well.

Fair enough.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq compression

2012-06-25 Thread k...@rice.edu
On Mon, Jun 25, 2012 at 09:45:26PM +0200, Florian Pflug wrote:
 On Jun25, 2012, at 21:21 , Dimitri Fontaine wrote:
  Magnus Hagander mag...@hagander.net writes:
  Or that it takes less code/generates cleaner code...
  
  So we're talking about some LZO things such as snappy from google, and
  that would be another run time dependency IIUC.
  
  I think it's time to talk about fastlz:
  
   http://fastlz.org/
   http://code.google.com/p/fastlz/source/browse/trunk/fastlz.c
  
   551 lines of C code under MIT license, works also under windows
  
  I guess it would be easy (and safe) enough to embed in our tree should
  we decide going this way.
 
 Agreed. If we extend the protocol to support compression, and not rely
 on SSL, then let's pick one of these LZ77-style compressors, and let's
 integrate it into our tree.
 
 We should then also make it possible to enable compression only for
 the server - client direction. Since those types of compressions are
 usually pretty easy to decompress, that reduces the amount to work
 non-libpq clients have to put in to take advantage of compression.
 
 best regards,
 Florian Pflug
 

Here is the benchmark list from the Google lz4 page:

Name            Ratio   C.speed D.speed
LZ4 (r59)       2.084   330     915
LZO 2.05 1x_1   2.038   311     480
QuickLZ 1.5 -1  2.233   257     277
Snappy 1.0.5    2.024   227     729
LZF             2.076   197     465
FastLZ          2.030   190     420
zlib 1.2.5 -1   2.728   39      195
LZ4 HC (r66)    2.712   18      1020
zlib 1.2.5 -6   3.095   14      210

lz4 absolutely dominates on compression/decompression speed. While fastlz
is faster than zlib(-1) on compression, lz4 is almost 2X faster still.

Regards,
Ken

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_stat_lwlocks view - lwlocks statistics

2012-06-25 Thread Satoshi Nagayasu
Hi all,

I've been working on a new system view, pg_stat_lwlocks, to observe
LWLocks, and just completed my 'proof-of-concept' code that can work
with version 9.1.

Now, I'd like to know the possibility of this feature for future
release.

With this patch, DBA can easily determine a bottleneck around lwlocks.
--
postgres=# SELECT * FROM pg_stat_lwlocks ORDER BY time_ms DESC LIMIT 10;
 lwlockid | calls  | waits | time_ms
----------+--------+-------+--------
       49 | 193326 | 32096 |   23688
        8 |   3305 |   133 |    1335
        2 |     21 |     0 |       0
        4 | 135188 |     0 |       0
        5 |  57935 |     0 |       0
        6 |    141 |     0 |       0
        7 |  24580 |     1 |       0
        3 |   3282 |     0 |       0
        1 |     41 |     0 |       0
        9 |      3 |     0 |       0
(10 rows)

postgres=#
--

In this view,
  'lwlockid' column represents LWLockId used in the backends.
  'calls' represents how many times LWLockAcquire() was called.
  'waits' represents how many times LWLockAcquire() needed to wait
  within it before lock acquisition.
  'time_ms' represents how long LWLockAcquire() totally waited on
  a lwlock.

And lwlocks that use a LWLockId range, such as BufMappingLock or
LockMgrLock, would be grouped and summed up in a single record.
For example, lwlockid 49 in the above view represents LockMgrLock
statistics.

Now, I know there are some considerations.

(1) Performance

  I've measured LWLock performance both with and without the patch,
  and confirmed that this patch does not affect the LWLock performance
  at all.

  pgbench scores with the patch:
tps = 900.906658 (excluding connections establishing)
tps = 908.528422 (excluding connections establishing)
tps = 903.900977 (excluding connections establishing)
tps = 910.470595 (excluding connections establishing)
tps = 909.685396 (excluding connections establishing)

  pgbench scores without the patch:
tps = 909.096785 (excluding connections establishing)
tps = 894.868712 (excluding connections establishing)
tps = 910.074669 (excluding connections establishing)
tps = 904.022770 (excluding connections establishing)
tps = 895.673830 (excluding connections establishing)

  Of course, this experiment was not I/O bound, and the cache hit ratio
  was 99.9%.

(2) Memory space

  In this patch, I added three new members to LWLock structure
  as uint64 to collect statistics.

  It means that those members must be held in the shared memory,
  but I'm not sure whether it's appropriate.

  I think another possible option is holding those statistics
  values in local (backend) process memory, and send them through
  the stat collector process (like other statistics values).
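
  For illustration, the bookkeeping behind those three members boils down to
  something like this standalone sketch (member and function names are mine,
  not necessarily the patch's); it also shows why the gettimeofday()
  implementation matters, since a pair of calls brackets every wait.

#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/time.h>

typedef struct LWLockCounters
{
	uint64_t	calls;			/* LWLockAcquire() invocations */
	uint64_t	waits;			/* how many of those had to sleep */
	uint64_t	time_ms;		/* total time spent sleeping */
} LWLockCounters;

static void
account_wait(LWLockCounters *c)
{
	struct timeval start, end;
	long		ms;

	c->waits++;
	gettimeofday(&start, NULL);
	usleep(20 * 1000);			/* stands in for sleeping on the semaphore */
	gettimeofday(&end, NULL);
	ms = (end.tv_sec - start.tv_sec) * 1000 +
		 (end.tv_usec - start.tv_usec) / 1000;
	c->time_ms += (uint64_t) ms;
}

int
main(void)
{
	LWLockCounters lock = {0, 0, 0};

	lock.calls++;				/* an acquisition that did not block */
	lock.calls++;				/* and one that did */
	account_wait(&lock);

	printf("calls=%llu waits=%llu time_ms=%llu\n",
		   (unsigned long long) lock.calls,
		   (unsigned long long) lock.waits,
		   (unsigned long long) lock.time_ms);
	return 0;
}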

(3) LWLock names (or labels)

  Now, the pg_stat_lwlocks view shows the LWLockId itself. But an LWLockId does
  not make it easy for a DBA to determine the actual lock type.

  So, I want to show LWLock names (or labels), like 'WALWriteLock'
  or 'LockMgrLock', but how should I implement it?

Any comments?

Regards,
-- 
Satoshi Nagayasu sn...@uptime.jp
Uptime Technologies, LLC. http://www.uptime.jp
diff -rc postgresql-9.1.2.orig/src/backend/catalog/postgres.bki 
postgresql-9.1.2/src/backend/catalog/postgres.bki
*** postgresql-9.1.2.orig/src/backend/catalog/postgres.bki  2012-06-20 
03:32:46.0 +0900
--- postgresql-9.1.2/src/backend/catalog/postgres.bki   2012-06-26 
01:51:52.0 +0900
***
*** 1553,1558 
--- 1553,1559 
  insert OID = 3071 ( pg_xlog_replay_pause 11 10 12 1 0 0 f f f t f v 0 0 2278 
 _null_ _null_ _null_ _null_ pg_xlog_replay_pause _null_ _null_ _null_ )
  insert OID = 3072 ( pg_xlog_replay_resume 11 10 12 1 0 0 f f f t f v 0 0 2278 
 _null_ _null_ _null_ _null_ pg_xlog_replay_resume _null_ _null_ _null_ )
  insert OID = 3073 ( pg_is_xlog_replay_paused 11 10 12 1 0 0 f f f t f v 0 0 
16  _null_ _null_ _null_ _null_ pg_is_xlog_replay_paused _null_ _null_ _null_ 
)
+ insert OID = 3764 ( pg_stat_get_lwlocks 11 10 12 1 100 0 f f f f t s 0 0 2249 
 {20,20,20,20} {o,o,o,o} {lwlockid,calls,waits,time_ms} _null_ 
pg_stat_get_lwlocks _null_ _null_ _null_ )
  insert OID = 2621 ( pg_reload_conf 11 10 12 1 0 0 f f f t f v 0 0 16  
_null_ _null_ _null_ _null_ pg_reload_conf _null_ _null_ _null_ )
  insert OID = 2622 ( pg_rotate_logfile 11 10 12 1 0 0 f f f t f v 0 0 16  
_null_ _null_ _null_ _null_ pg_rotate_logfile _null_ _null_ _null_ )
  insert OID = 2623 ( pg_stat_file 11 10 12 1 0 0 f f f t f v 1 0 2249 25 
{25,20,1184,1184,1184,1184,16} {i,o,o,o,o,o,o} 
{filename,size,access,modification,change,creation,isdir} _null_ pg_stat_file 
_null_ _null_ _null_ )
diff -rc postgresql-9.1.2.orig/src/backend/catalog/postgres.description 
postgresql-9.1.2/src/backend/catalog/postgres.description
*** postgresql-9.1.2.orig/src/backend/catalog/postgres.description  
2012-06-20 03:32:46.0 +0900
--- 

Re: [HACKERS] libpq compression

2012-06-25 Thread Euler Taveira
On 25-06-2012 16:45, Florian Pflug wrote:
 Agreed. If we extend the protocol to support compression, and not rely
 on SSL, then let's pick one of these LZ77-style compressors, and let's
 integrate it into our tree.
 
If we have an option to have it out of our tree, good; if not, let's integrate
it. I, particularly, don't see a compelling reason to integrate compression
code into our tree; I mean, if we want to support more than one algorithm, it is
clear that the overhead of maintaining the compression code is too high (a lot
of my-new-compression-algorithms).

 We should then also make it possible to enable compression only for
 the server - client direction. Since those types of compressions are
 usually pretty easy to decompress, that reduces the amount to work
 non-libpq clients have to put in to take advantage of compression.
 
I don't buy this idea. My use case (data load) will not be covered if we only
enable server -> client compression. I figure that there are use cases for
server -> client compression (replication, for example) but also there are
important use cases for client -> server (data load, for example). If you
implement decompression, why not code compression too?


-- 
   Euler Taveira de Oliveira - Timbira   http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq compression

2012-06-25 Thread Euler Taveira
On 25-06-2012 14:30, Greg Jaskiewicz wrote:
 Wasn't this more of an issue in de-coupling compression from encryption ?
 
Let me try to draw a conclusion. If we decide on independent
compression code instead of an SSL-based one, here is a possibility to be
tested: SSL + SSL-based compression vs. SSL + our compression code. If the
latter is faster then we could discuss the possibility of disabling compression
in our SSL code and putting in the documentation that it is recommended to enable
compression in SSL connections.


-- 
   Euler Taveira de Oliveira - Timbira   http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq compression

2012-06-25 Thread Merlin Moncure
On Mon, Jun 25, 2012 at 3:38 PM, Euler Taveira eu...@timbira.com wrote:
 On 25-06-2012 16:45, Florian Pflug wrote:
 Agreed. If we extend the protocol to support compression, and not rely
 on SSL, then let's pick one of these LZ77-style compressors, and let's
 integrate it into our tree.

 If we have an option to have it out of our tree, good; if not, let's integrate
 it. I, particularly, don't see a compelling reason to integrate compression
 code in our tree, I mean, if we want to support more than one algorithm, it is
 clear that the overhead for maintain the compression code is too high (a lot
 of my-new-compression-algorithms).

 We should then also make it possible to enable compression only for
 the server - client direction. Since those types of compressions are
 usually pretty easy to decompress, that reduces the amount to work
 non-libpq clients have to put in to take advantage of compression.

 I don't buy this idea. My use case (data load) will not be covered if we only
 enable server - client compression. I figure that there are use cases for
 server - client compression (replication, for example) but also there are
 important use cases for client - server (data load, for example). If you
 implement decompression, why not code compression code too?

+1.  lz4, which is looking like a strong candidate, has C#, Java,
etc. implementations, which are the main languages that are going to lag behind in
terms of protocol support.  I don't think you're saving a lot by going
only one way (although you could make a case for the client to signal
interest in compression separately from decompression?)

another point:
It's been obvious for years now that zlib is somewhat of a dog in
terms of cpu usage for what it gives you.  However, raw performance numbers
are not the whole story -- it would be interesting to compress real
world protocol messages to/from the server in various scenarios to see
how compression would work, in particular with OLTP type queries --
for example pgbench runs, etc. That would speak a lot more to the
benefits than canned benchmarks.
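
As a starting point, even something as small as the sketch below (zlib only,
over a made-up pgbench-style statement -- a capture of real FE/BE traffic
would be the interesting input) gives a first idea of what levels 1 and 6 do
to short OLTP-sized payloads:

#include <stdio.h>
#include <string.h>
#include <zlib.h>

int
main(void)
{
	/* stand-in for a short OLTP query message; a real capture would be better */
	const char *msg =
		"UPDATE pgbench_accounts SET abalance = abalance + 42 WHERE aid = 100249;";
	uLong		srclen = (uLong) strlen(msg);
	Bytef		dest[1024];
	int			levels[] = {1, 6};
	int			i;

	for (i = 0; i < 2; i++)
	{
		uLongf		destlen = sizeof(dest);

		if (compress2(dest, &destlen, (const Bytef *) msg, srclen,
					  levels[i]) != Z_OK)
		{
			fprintf(stderr, "compress2 failed\n");
			return 1;
		}
		printf("level %d: %lu -> %lu bytes\n",
			   levels[i], (unsigned long) srclen, (unsigned long) destlen);
	}
	return 0;
}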

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_lwlocks view - lwlocks statistics

2012-06-25 Thread Josh Berkus
On 6/25/12 1:29 PM, Satoshi Nagayasu wrote:
 (1) Performance
 
   I've measured LWLock performance both with and without the patch,
   and confirmed that this patch does not affect the LWLock perfomance
   at all.

This would be my main concern with this patch; it's hard for me to
imagine that it has no performance impact *at all*, since trace_lwlocks
has quite a noticeable one in my experience.  However, the answer to that
is to submit the patch and let people test.

I will remark that it would be far more useful to me if we could also
track lwlocks per session.  Overall counts are somewhat useful, but more
granular counts are even more useful.  What period of time does the
table cover?  Since last reset?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-25 Thread Heikki Linnakangas

On 25.06.2012 21:01, Robert Haas wrote:

On Mon, Jun 25, 2012 at 1:57 PM, Fujii Masao masao.fu...@gmail.com wrote:

 should be. The attached patch fixes this typo.


Oh, I forgot to attach the patch.. Here is the patch.


I committed both of the patches you posted to this thread.


Thanks Robert. I was thinking that pg_resetxlog -l would accept a WAL 
file name, instead of comma-separated tli, xlogid, segno arguments. The 
latter is a bit meaningless now that we don't use the xlogid+segno 
combination anywhere else. Alvaro pointed out that pg_upgrade was broken 
by the change in pg_resetxlog -n output - I changed that too to print 
the First log segment after reset information as a WAL file name, 
instead of logid+segno. Another option would be to print the 64-bit 
segment number, but I think that's worse, because the 64-bit segment 
number is harder to associate with a physical WAL file.


So I think we should change pg_resetxlog -l option to take a WAL file 
name as argument, and fix pg_upgrade accordingly.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 08/16] Introduce the ApplyCache module which can reassemble transactions from a stream of interspersed changes

2012-06-25 Thread Steve Singer

On 12-06-21 04:37 AM, Andres Freund wrote:

Hi Steve,
Thanks!



Attached is a detailed review of the patch.

Very good analysis, thanks!

Another reasons why we cannot easily do 1) is that subtransactions aren't
discernible from top-level transactions before the top-level commit happens,
we can only properly merge in the right order (by sorting via lsn) once we
have seen the commit record which includes a list of all committed
subtransactions.



Based on that and your comments further down in your reply (and that no 
one spoke up and disagreed with you), it sounds like doing (1) isn't 
going to be practical.



I also don't think 1) would be particularly welcome by people trying to
replicate into foreign systems.



They could still sort the changes into transaction groups before 
applying to the foreign system.



I planned to have some cutoff 'max_changes_in_memory_per_txn' value. If it has
been reached for one transaction all existing changes are spilled to disk. New
changes again can be kept in memory till it's reached again.

Do you want max_changes_in_memory_per_txn or do you want to put a limit 
on the total amount of memory that the cache is able to use? How are you 
going to tell a DBA to tune max_changes_in_memory_per_txn? They know how 
much memory their system has and how much of it they can devote to the apply cache 
versus other things; asking them to estimate how many 
open transactions they might have at a point in time and how many 
WAL change records each transaction generates seems like a step 
backwards from the progress we've been making in getting PostgreSQL to 
be easier to tune.  The maximum number of transactions that could be 
opened at a time is governed by max_connections on the master at the 
time the WAL was generated, so I don't even see how the machine 
processing the WAL records could autotune/guess that.
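
For reference, the cutoff quoted above boils down to something like this
sketch (all names here are invented); the tuning question is whether that
per-transaction constant, a total memory budget, or both should be the knob
exposed to the DBA.

#include <stdio.h>

#define MAX_CHANGES_IN_MEMORY_PER_TXN 4096	/* the cutoff under discussion */

typedef struct TxnBuffer
{
	unsigned int xid;
	int			nchanges_in_memory;
	long		nchanges_spilled;
} TxnBuffer;

static void
spill_to_disk(TxnBuffer *txn)
{
	/* serialize the buffered changes for this xid and release the memory */
	txn->nchanges_spilled += txn->nchanges_in_memory;
	txn->nchanges_in_memory = 0;
}

static void
add_change(TxnBuffer *txn)
{
	txn->nchanges_in_memory++;
	if (txn->nchanges_in_memory >= MAX_CHANGES_IN_MEMORY_PER_TXN)
		spill_to_disk(txn);		/* memory fills again until the next spill */
}

int
main(void)
{
	TxnBuffer	txn = {1234, 0, 0};
	int			i;

	for (i = 0; i < 10000; i++)
		add_change(&txn);

	printf("xid %u: %ld spilled, %d still in memory\n",
		   txn.xid, txn.nchanges_spilled, txn.nchanges_in_memory);
	return 0;
}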





We need to support serializing the cache for crash recovery + shutdown of the
receiving side as well. Depending on how we do the wal decoding we will need
it more frequently...



Have you described your thoughts on crash recovery on another thread?

I am thinking that this module would have to serialize some state 
every time it calls cache->commit() to ensure that consumers don't get 
invoked twice on the same transaction.


If the apply module is making changes to the same backend that the apply 
cache serializes to, then both the state for the apply cache and the 
changes that committed transactions make will be persisted (or 
not persisted) together.   What if I am replicating from x86 to x86_64 
via an apply module that does textout conversions?


x86                    Proxy                           x86_64
WAL  ------------>  apply
                    cache
                      |      (proxy catalog)
                      |
                    apply module
                      textout  ---------------------->  SQL statements


How do we ensure that the commits are all visible (or not visible) on 
the catalog of the proxy instance used for decoding WAL, on the destination 
database, and in the state + spill files of the apply cache -- that they all stay consistent 
in the event of a crash of either the proxy or the target?
I don't think you can (unless we consider two-phase commit, and I'd 
rather we didn't).  Can we come up with a way of avoiding the need for 
them to be consistent with each other?


I think apply modules will need to be able to be passed the same 
transaction twice (once before the crash and again after) and come up 
with a way of deciding if that transaction has a) been applied to the 
translation/proxy catalog and b) been applied on the replica instance.   
How is the walreceiver going to decide which WAL segments it needs to 
re-process after a crash?  I would want to see more of these details 
worked out before we finalize the interface between the apply cache and 
the apply modules and how the serialization works.



Code Review
=

applycache.h
---
+typedef struct ApplyCacheTupleBuf
+{
+	/* position in preallocated list */
+	ilist_s_node node;
+
+	HeapTupleData tuple;
+	HeapTupleHeaderData header;
+	char data[MaxHeapTupleSize];
+} ApplyCacheTupleBuf;

Each ApplyCacheTupleBuf will be about 8k (BLKSZ) big no matter how big 
the data in the transaction is? Wouldn't workloads with inserts of lots 
of small rows in a transaction eat up lots of memory that is allocated 
but storing nothing?  The only alternative I can think of is dynamically 
allocating these and I don't know what the cost/benefit of that overhead 
will be versus spilling to disk sooner.


+ * FIXME: better name
+ */
+ApplyCacheChange*
+ApplyCacheGetChange(ApplyCache*);

How about:

ApplyCacheReserveChangeStruct(..)
ApplyCacheReserveChange(...)
ApplyCacheAllocateChange(...)

as ideas?
+/*
+ * Return an unused ApplyCacheChange struct
+ */
+void
+ApplyCacheReturnChange(ApplyCache*, ApplyCacheChange*);


Re: [HACKERS] WAL format changes

2012-06-25 Thread Alvaro Herrera

Excerpts from Heikki Linnakangas's message of lun jun 25 20:09:34 -0400 2012:
 On 25.06.2012 21:01, Robert Haas wrote:
  On Mon, Jun 25, 2012 at 1:57 PM, Fujii Masao masao.fu...@gmail.com wrote:
   should be. The attached patch fixes this typo.
 
  Oh, I forgot to attach the patch.. Here is the patch.
 
  I committed both of the patches you posted to this thread.
 
 Thanks Robert. I was thinking that pg_resetxlog -l would accept a WAL 
 file name, instead of comma-separated tli, xlogid, segno arguments. The 
 latter is a bit meaningless now that we don't use the xlogid+segno 
 combination anywhere else. Alvaro pointed out that pg_upgrade was broken 
 by the change in pg_resetxlog -n output - I changed that too to print 
 the First log segment after reset information as a WAL file name, 
 instead of logid+segno. Another option would be to print the 64-bit 
 segment number, but I think that's worse, because the 64-bit segment 
 number is harder to associate with a physical WAL file.
 
 So I think we should change pg_resetxlog -l option to take a WAL file 
 name as argument, and fix pg_upgrade accordingly.

The only thing pg_upgrade does with the tli/logid/segno combo, AFAICT,
is pass it back to pg_resetxlog -l, so this plan seems reasonable.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump and dependencies and --section ... it's a mess

2012-06-25 Thread Tom Lane
I wrote:
 Barring objections or better ideas, I'll push forward with applying
 this patch and the dependency-fixup patch.

Applied and back-patched.

BTW, I had complained at the end of the pgsql-bugs thread about bug #6699
that it seemed like FK constraints would get prevented from being
restored in parallel fashion if they depended on a constraint-style
unique index, because the dependency for the FK constraint would
reference the index's dump ID, which is nowhere to be seen in the dump.
But in fact they are restored in parallel, and the reason is that
pg_restore silently ignores any references to unknown dump IDs (a hack
put in specifically because of the bogus dependency data, no doubt).
So that raises the opposite question: how come pg_restore doesn't fall
over from trying to create the FK constraint before the unique index it
depends on is created?  And the answer to that is dumb luck, more or
less.  The locking dependencies hack in pg_restore knows that creation
of an FK constraint requires exclusive lock on both tables, so it won't
try to restore the FK constraint before creation of the constraint-style
index is done.  So getting the dependency information fixed is a
necessary prerequisite for any attempt to reduce the locking
requirements there.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-25 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 So I think we should change pg_resetxlog -l option to take a WAL file 
 name as argument, and fix pg_upgrade accordingly.

Seems reasonable I guess.  It's really specifying a starting WAL
location, but only to file granularity, so treating the argument as a
file name is sort of a type cheat but seems convenient.

If we do it that way, we'd better validate that the argument is a legal
WAL file name, so as to catch any cases where somebody tries to do it
old-style.
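
Validation could be as simple as the sketch below (illustrative only; whether
to also range-check the last component the way the old code did is the patch
author's call): insist on exactly 24 hex digits and let tli/log/seg fall out
of them.

#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 and fills tli/log/seg if fname looks like a WAL segment name. */
static int
parse_wal_file_name(const char *fname,
					unsigned int *tli, unsigned int *log, unsigned int *seg)
{
	int			i;

	if (strlen(fname) != 24)
		return 0;
	for (i = 0; i < 24; i++)
		if (!isxdigit((unsigned char) fname[i]))
			return 0;
	return sscanf(fname, "%08X%08X%08X", tli, log, seg) == 3;
}

int
main(int argc, char **argv)
{
	unsigned int tli, log, seg;

	if (argc != 2 || !parse_wal_file_name(argv[1], &tli, &log, &seg))
	{
		fprintf(stderr, "expected a WAL file name such as 000000010000000A00000042\n");
		return 1;
	}
	printf("tli %u, log %u, seg %u\n", tli, log, seg);
	return 0;
}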

BTW, does pg_resetxlog's logic for setting the default -l value (from
scanning pg_xlog to find the largest existing file name) still work?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_lwlocks view - lwlocks statistics

2012-06-25 Thread Satoshi Nagayasu

2012/06/26 6:44, Josh Berkus wrote:

On 6/25/12 1:29 PM, Satoshi Nagayasu wrote:

(1) Performance

   I've measured LWLock performance both with and without the patch,
   and confirmed that this patch does not affect the LWLock perfomance
   at all.


This would be my main concern with this patch; it's hard for me to
imagine that it has no performance impact *at all*, since trace_lwlocks
has quite a noticable one in my experience.  However, the answer to that
is to submit the patch and let people test.


Thanks. I will submit the patch to the CommitFest page with some fixes
to be able to work with the latest PostgreSQL on Git.


I will remark that it would be far more useful to me if we could also
track lwlocks per session.  Overall counts are somewhat useful, but more
granular counts are even more useful.  What period of time does the
table cover?  Since last reset?


Yes, it has not been implemented yet since this code is just a PoC,
but it is another design issue which needs to be discussed.

To implement it, a new array can be added in local process memory
to hold lwlock statistics, and the counters can be updated both in shared
memory and in local process memory at once. Then, the session can
retrieve 'per-session' statistics from the local process memory
via some dedicated function.

Does it make sense? Any comments?

Regards,
--
Satoshi Nagayasu sn...@uptime.jp
Uptime Technologies, LLC. http://www.uptime.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: avoid heavyweight locking on hash metapage

2012-06-25 Thread Jeff Janes
On Mon, Jun 18, 2012 at 5:42 PM, Robert Haas robertmh...@gmail.com wrote:

 Hmm.  That was actually a gloss I added on existing code to try to
 convince myself that it was safe; I don't think that the changes I
 made make that any more or less safe than it was before.

Right, sorry.  I thought there was some strength reduction going on
there as well.

Thanks for the various explanations, they address my concerns.  I see
that v2 applies over v1.

I've verified performance improvements using 8 cores with my proposed
pgbench -P benchmark, with a scale that fits in shared_buffers.
It brings it most of the way, but not quite, up to the btree performance.


I've marked this as ready for committer.

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_lwlocks view - lwlocks statistics

2012-06-25 Thread Satoshi Nagayasu

2012/06/26 7:04, Kevin Grittner wrote:

Josh Berkusj...@agliodbs.com  wrote:

On 6/25/12 1:29 PM, Satoshi Nagayasu wrote:

(1) Performance

   I've measured LWLock performance both with and without the
   patch, and confirmed that this patch does not affect the LWLock
   perfomance at all.


This would be my main concern with this patch; it's hard for me to
imagine that it has no performance impact *at all*, since
trace_lwlocks has quite a noticable one in my experience.
However, the answer to that is to submit the patch and let people
test.


I think overhead is going to depend quite a bit on the
gettimeofday() implementation, since that is called twice per lock
wait.


Yes.
It's one of my concerns, and what I actually want hackers to test.


It looked to me like there was nothing to prevent concurrent updates
of the counts while gathering the accumulated values for display.
Won't this be a problem on 32-bit builds?


Actually, I'd like to know how I can improve my code in a 32bit box.

However, unfortunately I don't have any 32bit (physical) box now,
so I want someone to test it if it needs to be tested.


Please add this to the Open CommitFest for a proper review:

https://commitfest.postgresql.org/action/commitfest_view/open


Will submit soon. Thanks.



-Kevin



Regards,
--
Satoshi Nagayasu sn...@uptime.jp
Uptime Technologies, LLC. http://www.uptime.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-25 Thread Amit Kapila
From: pgsql-hackers-ow...@postgresql.org
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
  So I think we should change pg_resetxlog -l option to take a WAL file 
  name as argument, and fix pg_upgrade accordingly.

 Seems reasonable I guess.  It's really specifying a starting WAL
 location, but only to file granularity, so treating the argument as a
 file name is sort of a type cheat but seems convenient.

 If we do it that way, we'd better validate that the argument is a legal
 WAL file name, so as to catch any cases where somebody tries to do it
 old-style.

 BTW, does pg_resetxlog's logic for setting the default -l value (from
 scanning pg_xlog to find the largest existing file name) still work?
  

It finds the segment number for the largest existing file name in pg_xlog and
then compares it with the input provided by the
user for the -l option; if the input is greater, it will use the input to set the value in the
control file.

With Regards,
Amit Kapila.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] lock_timeout and common SIGALRM framework

2012-06-25 Thread Alvaro Herrera
I cleaned up the framework patch a bit.  My version's attached.  Mainly,
returning false for failure in some code paths that are only going to
have the caller elog(FATAL) is rather pointless -- it seems much better
to just have the code itself do the elog(FATAL).  In any case, I
searched for reports of those error messages being reported in the wild
and saw none.

There are other things but they are in the nitpick department.  (A
reference to ->check_timeout remains that needs to be fixed too).

There is one thing that still bothers me on this whole framework patch,
but I'm not sure it's easily fixable.  Basically the API to timeout.c is
the whole list of timeouts and their whole behaviors.  If you want to
add a new one, you can't just call into the entry points, you have to
modify timeout.c itself (as well as timeout.h as well as whatever code
it is that you want to add timeouts to).  This may be good enough, but I
don't like it.  I think it boils down to proctimeout.h is cheating.

The interface I would actually like to have is something that lets the
modules trying to get a timeout register the timeout-checking function
as a callback.  API-wise, it would be much cleaner.  However, I'm not
really sure that code-wise it would be any cleaner at all.  In fact I
think it'd be rather cumbersome.  However, if you could give that idea
some thought, it'd be swell.
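
To illustrate the shape I mean (every name below is invented, nothing from the
patch): callers hand timeout.c a handler and get an id back, and timeout.c
keeps only a table of (deadline, handler) entries without knowing what any
particular timeout is for.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef void (*timeout_handler) (void);

typedef struct
{
	bool			active;
	long			deadline_ms;	/* absolute, on whatever clock the caller uses */
	timeout_handler handler;
} Timeout;

#define MAX_TIMEOUTS 16
static Timeout timeouts[MAX_TIMEOUTS];

static int
register_timeout(timeout_handler handler)
{
	int			i;

	for (i = 0; i < MAX_TIMEOUTS; i++)
		if (timeouts[i].handler == NULL)
		{
			timeouts[i].handler = handler;
			return i;
		}
	return -1;					/* table full */
}

static void
enable_timeout(int id, long deadline_ms)
{
	timeouts[id].deadline_ms = deadline_ms;
	timeouts[id].active = true;
}

/* what the SIGALRM interrupt service would end up calling */
static void
fire_expired(long now_ms)
{
	int			i;

	for (i = 0; i < MAX_TIMEOUTS; i++)
		if (timeouts[i].active && now_ms >= timeouts[i].deadline_ms)
		{
			timeouts[i].active = false;
			timeouts[i].handler();
		}
}

static void
deadlock_check(void)
{
	printf("deadlock check would run here\n");
}

int
main(void)
{
	int			id = register_timeout(deadlock_check);

	enable_timeout(id, 1000);
	fire_expired(1500);			/* pretend the alarm went off at t=1500ms */
	return 0;
}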

I haven't looked at the second patch at all yet.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


1-timeout-framework-v9.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] proof concept - access to session variables on client side

2012-06-25 Thread Pavel Stehule
Hello

I worked on a simple patch that enables access from the server side to
client-side data. It adds two new hooks to libpq - one for returning the
local context, and a second for setting the local context.

The motivation is to integrate the possibilities of the psql console
with a stronger language - plpgsql. A second goal is to make it
possible to save the result of some server-side processing in psql. It
improves the vars feature in psql.

pavel ~/src/postgresql/src $ cat test.sql
\echo value of external parameter is :myvar

do $$
begin
  -- we can take any session variable on client side
  -- it is safe against SQL injection
  raise notice 'external parameter accessed from plpgsql is %',
hgetvar('myvar');

  -- we can change this session variable and finish transaction
  perform hsetvar('myvar', 'Hello, World');
end;
$$ language plpgsql;

\echo new value of session variable is :myvar

cat test.sql | psql postgres -v myvar=Hello
value of external parameter is Hello
NOTICE:  external parameter accessed from plpgsql is Hello
DO
new value of session variable is Hello, World

This is just a proof of concept - there should be better integration with
PL languages, using a cache for reads on the server side, ...

Notices?

Regards

Pavel Stehule


client-session-vars.diff
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers