[HACKERS] Allowing join removals for more join types

2014-05-17 Thread David Rowley
I'm currently in the early stages of looking into expanding join removals.

Currently left outer joins can be removed if none of the columns of the
table are required for anything and the table being joined is a base table
that contains a unique index on all columns in the join clause.

The case I would like to work on is to allow subqueries where the subquery
is grouped by, or distinct on, all of the join columns.

Take the following as an example:

CREATE TABLE products (productid integer NOT NULL, code character
varying(32) NOT NULL);
CREATE TABLE sales (saleid integer NOT NULL, productid integer NOT NULL,
qty integer NOT NULL);

CREATE VIEW product_sales AS
 SELECT p.productid,
p.code,
s.qty
   FROM (products p
 LEFT JOIN ( SELECT sales.productid,
sum(sales.qty) AS qty
   FROM sales
  GROUP BY sales.productid) s ON ((p.productid = s.productid)));

If a user does:
SELECT productid,code FROM product_sales;
Then, if I'm correct, the join on sales can be removed.
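To see why, the semantics can be sketched outside the planner: the GROUP BY guarantees at most one subquery row per productid, so the left join can neither duplicate nor filter rows of products, and if none of its outputs are referenced it can be dropped. A small Python illustration of that reasoning (toy data, not PostgreSQL code):

```python
# Sketch (not PostgreSQL internals): the grouped subquery yields at most one
# row per join key, so the LEFT JOIN preserves the outer rows exactly.
from collections import defaultdict

products = [(1, "A"), (2, "B")]          # (productid, code)
sales = [(1, 1, 10), (2, 1, 20)]         # (saleid, productid, qty)

# The aggregate subquery: GROUP BY productid -> unique on the join column.
agg = defaultdict(int)
for _, pid, qty in sales:
    agg[pid] += qty

# The full view: products LEFT JOIN the aggregate.
view = [(pid, code, agg.get(pid)) for pid, code in products]

# A query referencing only products' columns gives the same rows with or
# without the join, so the join is removable.
with_join = [(pid, code) for pid, code, _ in view]
without_join = [(pid, code) for pid, code in products]
assert with_join == without_join
print(with_join)
```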

As I said above, I'm in the early stages of looking at this and I'm
currently a bit confused. Basically, I've put a breakpoint at the top of
the join_is_removable function and I can see that innerrel->rtekind
is RTE_SUBQUERY for my query, so the function is going to return false. So
what I need to do is get access to innerrel->subroot->parse so that I can
look at groupClause and distinctClause. The thing is, innerrel->subroot is
NULL in this case, but I see a comment for subroot saying "PlannerInfo
for subquery (NULL if it's not a subquery)", so I guess this does not
also mean "NOT NULL if it is a subquery"?

Has anyone got any pointers to where I might be able to get the Query
details for the subquery? These structures are quite new to me.

Regards

David Rowley


Re: [HACKERS] Allowing join removals for more join types

2014-05-17 Thread David Rowley
On Sat, May 17, 2014 at 8:57 PM, David Rowley dgrowle...@gmail.com wrote:

 I'm currently in the early stages of looking into expanding join removals.

 As I said above, I'm in the early stages of looking at this and I'm
 currently a bit confused. Basically, I've put a breakpoint at the top of
 the join_is_removable function and I can see that innerrel->rtekind
 is RTE_SUBQUERY for my query, so the function is going to return false. So
 what I need to do is get access to innerrel->subroot->parse so that I can
 look at groupClause and distinctClause. The thing is, innerrel->subroot is
 NULL in this case, but I see a comment for subroot saying "PlannerInfo
 for subquery (NULL if it's not a subquery)", so I guess this does not
 also mean "NOT NULL if it is a subquery"?

 Has anyone got any pointers to where I might be able to get the Query
 details for the subquery? These structures are quite new to me.


I think I've managed to answer my own question here. But please someone
correct me if this sounds wrong.
It looks like the existing join removals are done quite early in the
planning, and redundant joins are removed before any subqueries of that
query are planned, so innerrel->subroot->parse has not been set up yet.
It seems to be done later, in query_planner(), when make_one_rel() is called.

The best I can come up with on how to implement this is to have 2 stages of
join removals. Stage 1 would be the existing stage that attempts to remove
joins from non-subqueries. Stage 2 would happen just after make_one_rel()
is called from query_planner(); this would attempt to remove any
subqueries that are not needed, and if it managed to remove any it would
force a second call to make_one_rel().

Does this sound reasonable or does it sound like complete nonsense?



 Regards

 David Rowley




Re: [HACKERS] 9.4 checksum error in recovery with btree index

2014-05-17 Thread Heikki Linnakangas

On 05/17/2014 12:28 AM, Jeff Janes wrote:

More fun with my torn page injection test program on 9.4.

24171  2014-05-16 14:00:44.934 PDT:WARNING:  01000: page verification
failed, calculated checksum 21100 but expected 3356
24171  2014-05-16 14:00:44.934 PDT:CONTEXT:  xlog redo split_l: rel
1663/16384/16405 left 35191, right 35652, next 34666, level 0, firstright
192
24171  2014-05-16 14:00:44.934 PDT:LOCATION:  PageIsVerified, bufpage.c:145
24171  2014-05-16 14:00:44.934 PDT:FATAL:  XX001: invalid page in block
34666 of relation base/16384/16405
24171  2014-05-16 14:00:44.934 PDT:CONTEXT:  xlog redo split_l: rel
1663/16384/16405 left 35191, right 35652, next 34666, level 0, firstright
192
24171  2014-05-16 14:00:44.934 PDT:LOCATION:  ReadBuffer_common,
bufmgr.c:483


I've seen this twice now, the checksum failure was both times for the block
labelled next in the redo record.  Is this another case where the block
needs to be reinitialized upon replay?


Hmm, it looks like I fumbled the numbering of the backup blocks in the
b-tree split WAL record (in 9.4). I blame the comments: the comments
where the record is generated number the backup blocks starting from 1,
but XLR_BKP_BLOCK(x) and RestoreBackupBlock(...), used in replay, number
them starting from 0.

Attached is a patch that I think fixes them. In addition to the rnext
reference, clearing the incomplete-split flag in the child page had a
similar numbering mishap.
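The shape of the bug can be sketched abstractly (a hypothetical bit-per-block flag, not the real XLR_BKP_BLOCK macro or WAL format): if the record registers a block under its 0-based index but replay, written against 1-based comments, queries the next index, the presence check simply misses the block.

```python
# Hypothetical sketch of the off-by-one; the real macro and record layout
# differ, but the indexing mismatch is the same.
def bkp_block_flag(idx):
    """Info bit for the 0-based backup block idx (illustrative only)."""
    return 1 << idx

# The record registers the right sibling as the second backup block on a
# leaf split, i.e. 0-based index 1...
info = bkp_block_flag(1)

# ...replay code following 1-based comments asks about index 2 and misses:
assert not (info & bkp_block_flag(2))   # wrong check: page never restored
assert info & bkp_block_flag(1)         # corrected 0-based check succeeds
```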


- Heikki
diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index d64cbd9..ecee5ac 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -1299,7 +1299,7 @@ _bt_split(Relation rel, Buffer buf, Buffer cbuf, OffsetNumber firstright,
 
 			lastrdata->data = (char *) newitem;
 			lastrdata->len = MAXALIGN(newitemsz);
-			lastrdata->buffer = buf;	/* backup block 1 */
+			lastrdata->buffer = buf;	/* backup block 0 */
 			lastrdata->buffer_std = true;
 		}
 
@@ -1320,7 +1320,7 @@ _bt_split(Relation rel, Buffer buf, Buffer cbuf, OffsetNumber firstright,
 			item = (IndexTuple) PageGetItem(origpage, itemid);
 			lastrdata->data = (char *) item;
 			lastrdata->len = MAXALIGN(IndexTupleSize(item));
-			lastrdata->buffer = buf;	/* backup block 1 */
+			lastrdata->buffer = buf;	/* backup block 0 */
 			lastrdata->buffer_std = true;
 		}
 
@@ -1333,11 +1333,11 @@ _bt_split(Relation rel, Buffer buf, Buffer cbuf, OffsetNumber firstright,
 			 * Although we don't need to WAL-log anything on the left page, we
 			 * still need XLogInsert to consider storing a full-page image of
 			 * the left page, so make an empty entry referencing that buffer.
-			 * This also ensures that the left page is always backup block 1.
+			 * This also ensures that the left page is always backup block 0.
 			 */
 			lastrdata->data = NULL;
 			lastrdata->len = 0;
-			lastrdata->buffer = buf;	/* backup block 1 */
+			lastrdata->buffer = buf;	/* backup block 0 */
 			lastrdata->buffer_std = true;
 		}
 
@@ -1353,7 +1353,7 @@ _bt_split(Relation rel, Buffer buf, Buffer cbuf, OffsetNumber firstright,
 			cblkno = BufferGetBlockNumber(cbuf);
 			lastrdata->data = (char *) cblkno;
 			lastrdata->len = sizeof(BlockNumber);
-			lastrdata->buffer = cbuf;	/* backup block 2 */
+			lastrdata->buffer = cbuf;	/* backup block 1 */
 			lastrdata->buffer_std = true;
 		}
 
@@ -1386,7 +1386,7 @@ _bt_split(Relation rel, Buffer buf, Buffer cbuf, OffsetNumber firstright,
 
 			lastrdata->data = NULL;
 			lastrdata->len = 0;
-			lastrdata->buffer = sbuf;	/* bkp block 2 (leaf) or 3 (non-leaf) */
+			lastrdata->buffer = sbuf;	/* bkp block 1 (leaf) or 2 (non-leaf) */
 			lastrdata->buffer_std = true;
 		}
 
diff --git a/src/backend/access/nbtree/nbtxlog.c b/src/backend/access/nbtree/nbtxlog.c
index 640639c..5f9fc49 100644
--- a/src/backend/access/nbtree/nbtxlog.c
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -300,8 +300,8 @@ btree_xlog_split(bool onleft, bool isroot,
 	 */
 	if (!isleaf)
 	{
-		if (record->xl_info & XLR_BKP_BLOCK(2))
-			(void) RestoreBackupBlock(lsn, record, 2, false, false);
+		if (record->xl_info & XLR_BKP_BLOCK(1))
+			(void) RestoreBackupBlock(lsn, record, 1, false, false);
 		else
 			_bt_clear_incomplete_split(lsn, record, xlrec-node, cblkno);
 	}
@@ -439,10 +439,10 @@ btree_xlog_split(bool onleft, bool isroot,
 	if (xlrec->rnext != P_NONE)
 	{
 		/*
-		 * the backup block containing right sibling is 2 or 3, depending
+		 * the backup block containing right sibling is 1 or 2, depending
 		 * whether this was a leaf or internal page.
 		 */
-		int			rnext_index = isleaf ? 2 : 3;
+		int			rnext_index = isleaf ? 1 : 2;
 
 		if (record->xl_info & XLR_BKP_BLOCK(rnext_index))
 			(void) RestoreBackupBlock(lsn, record, rnext_index, false, false);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allowing join removals for more join types

2014-05-17 Thread Tom Lane
David Rowley dgrowle...@gmail.com writes:
 It looks like the existing join removals are done quite early in the
 planning and redundant joins are removed before any subqueries from that
 query are planned. So this innerrel->subroot->parse has not been done yet.
 It seems to be done later in query_planner() when make_one_rel() is called.

It's true that we don't plan the subquery till later, but I don't see why
that's important here.  Everything you want to know is available from the
subquery parsetree; so just look at the RTE, don't worry about how much
of the RelOptInfo has been filled in.

 The best I can come up with on how to implement this is to have 2 stages of
 join removals. Stage 1 would be the existing stage that attempts to remove
 joins from non subqueries. Stage 2 would happen just after make_one_rel()
 is called from query_planner(), this would be to attempt to remove any
 subqueries that are not needed, and if it managed to remove any it would
 force a 2nd call to make_one_rel().

That sounds like a seriously bad idea.  For one thing, it blows the
opportunity to not plan the subquery in the first place.  For another,
most of these steps happen in a carefully chosen order because there
are interdependencies.  You can't just go back and re-run some earlier
processing step.  A large fraction of the complexity of analyzejoins.c
right now arises from the fact that it has to undo some earlier
processing; that would get enormously worse if you delayed it further.

BTW, just taking one step back ... this seems like a pretty specialized
requirement.  Are you sure it wouldn't be easier to fix your app to
not generate such silly queries?

regards, tom lane




Re: [HACKERS] wrapping in extended mode doesn't work well with default pager

2014-05-17 Thread Greg Stark
I'm trying to review all the combinations of the options exhaustively
but in the process I noticed a few pre-existing psql oddities. Both of
these are present in 9.3:

Can anyone explain this? It's linestyle=old-style, border=1,
expanded=off, format=aligned. It looks like it's using new-style ascii
indicators in the header but old-style in the data cells:

  a   | a
+ |+b
+ b   |+
--+
 xx   | yy
  | 
 xx   : yy
  : 
 xx   : yy
  : 
 xx   : yy
  : 
 xx   : yy
  :
(2 rows)

Also the line-ending white-space is very odd here. It's
linestyle=old-ascii, border=0, expanded=off, format=aligned. There's
an extra space on the header and the first line of the data, but not
on the subsequent lines of the data:

 a   a
+b
 b  +
 --
xx   yy
 
xx   yy
 
xx   yy
 
xx   yy
 
xx   yy

(2 rows)




Re: [HACKERS] %d in log_line_prefix doesn't work for bg/autovacuum workers

2014-05-17 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-05-17 00:17:46 +0200, Andres Freund wrote:
 On 2014-05-16 17:48:28 -0400, Tom Lane wrote:
 This is basically reintroducing a variable that used to exist and was
 intentionally gotten rid of, on the grounds that it'd fail to track
 any renaming of the session's current database (cf commit b256f24264).

 I did look whether there's a race making MyDatabaseName out of date. I
 didn't see any.

Hm.  Actually, it looks like commit b256f24264 was a bit schizophrenic:
although it removed this particular way in which a database rename could
cause trouble, it nonetheless *added* checks that there were no active
sessions in the database.  As you say, the checks were race-prone at the
time and have since been made bulletproof (though I could not find the
commit you mentioned?).  But the intent was there.

I think the key issue comes down to this comment in RenameDatabase:

 * XXX Client applications probably store the current database somewhere,
 * so renaming it could cause confusion.  On the other hand, there may not
 * be an actual problem besides a little confusion, so think about this
 * and decide.

If we're willing to decide that we will never support renaming an active
database, then the approach you're proposing makes sense.  And it seems
to me that this issue of potentially confusing client apps is much the
strongest argument for making such a decision.  But is it strong enough?

Given that there haven't been complaints in the past ten years about how
you can't rename an active database, I'm OK personally with locking this
down forever.  But I wonder if anyone wants to argue the contrary.

 I'd briefly changed elog.c to only use MyDatabaseName, thinking that
 there seems to be little reason to continue using database_name in
 elog.c once the variable is introduced. But that's not a good idea:
 MyProcPort->database_name exists earlier.

If we do this at all, I think actually that is a good idea.
MyProcPort->database_name is a *requested* db name, but arguably the
%d log line field should represent the *actual* connected db name,
which means it shouldn't start being reported until we have some
confidence that such a DB exists.  I'd personally fill MyDatabaseName
a bit sooner than you have here, probably about where MyProc->databaseId
gets set.  But I don't think elog.c should be looking at more than one
source of truth for %d.

regards, tom lane




Re: [HACKERS] wrapping in extended mode doesn't work well with default pager

2014-05-17 Thread Greg Stark
Sorry, a couple things still look to not be quite right.

1) The width of the table when linestyle=old-ascii and border=0 or
border=1 (and expanded=on and format=wrapped) seems to off by one.

2) The hyphens following the RECORD NN are short by one

I'm surprised the last patch was so big since it sounded like a simple
off-by-one bug. It looks like you've removed the leading space on the
border=0 expanded case. I guess that makes sense but we should
probably stop making significant changes now and just focus on fixing
the off by one bugs.




[HACKERS] buildfarm: strange OOM failures on markhor (running CLOBBER_CACHE_RECURSIVELY)

2014-05-17 Thread Tomas Vondra
Hi all,

a few days ago I set up a buildfarm animal, markhor, running the tests
with CLOBBER_CACHE_RECURSIVELY. As the tests run for a very long time,
reporting the results back to the server fails because of a safeguard
limit in the buildfarm server. Anyway, that's being discussed in a
different thread - here it's mentioned merely as a 'don't bother looking
for addax on the buildfarm website' warning.

I've been checking the progress of the recursive tests today, and I
found it actually failed in the 'make check' step. The logs are
available here:

buildfarm logs: http://www.fuzzy.cz/tmp/buildfarm/recursive-oom.tgz
kernel logs: http://www.fuzzy.cz/tmp/buildfarm/messages

The tests are running within a LXC container (operated through libvirt),
so whenever I say 'VM' I actually mean a LXC container. It might be some
VM/LXC misconfiguration, but as this happens only to a single VM (the
one running tests with recursive clobber), I find it unlikely.

== An example of the failure ==

parallel group (20 tests):  pg_lsn regproc oid name char money float4
txid text int2 varchar int4 float8 boolean int8 uuid rangetypes bit
numeric enum
 ...
 float4  ... ok
 float8  ... ok
 bit ... FAILED (test process exited with exit code 2)
 numeric ... FAILED (test process exited with exit code 2)
 txid... ok
 ...
===

and then of course the usual 'terminating connection because of crash of
another server process' warning. Apparently, it's getting killed by the
OOM killer, because it exhausts all the memory assigned to that VM (2GB).

May 15 19:44:53 postgres invoked oom-killer: gfp_mask=0xd0, order=0,
oom_adj=0, oom_score_adj=0
May 15 19:44:53 cspug kernel: postgres cpuset=recursive-builds
mems_allowed=0
May 15 19:44:53 cspug kernel: Pid: 17159, comm: postgres Not tainted
2.6.32-431.17.1.el6.centos.plus.x86_64 #1

AFAIK 2GB is more than enough for a buildfarm machine (after all,
chipmunk has just 512MB). Also, this only happens on this VM
(cpuset=recursive-builds), the other two VMs, with exactly the same
limits, running other buildfarm animals (regular or with
CLOBBER_CACHE_ALWAYS) are perfectly happy. See magpie or markhor for
example. And I don't see any reason why a build with recursive clobber
should require more memory than a regular build. So this seems like a
memory leak somewhere in the cache invalidation code.

I thought it might be fixed by commit b23b0f5588 (Code review for recent
changes in relcache.c), but mite is currently working on
7894ac5004 and yet it already failed on OOM.

The failures apparently happen within a few hours of the test start. For
example on addax (gcc), the build started on 02:50 and the first OOM
failure happened on 05:19, on mite (clang), it's 03:20 vs. 06:50. So
it's like ~3-4 hours after the tests start.

regards
Tomas




Re: [HACKERS] buildfarm: strange OOM failures on markhor (running CLOBBER_CACHE_RECURSIVELY)

2014-05-17 Thread Tom Lane
Tomas Vondra t...@fuzzy.cz writes:
 ... then of course the usual 'terminating connection because of crash of
 another server process' warning. Apparently, it's getting killed by the
 OOM killer, because it exhausts all the memory assigned to that VM (2GB).

Can you fix things so it runs into its process ulimit before the OOM killer
triggers?  Then we'd get a memory map dumped to stderr, which would be
helpful in localizing the problem.

 ... So this seems like a
 memory leak somewhere in the cache invalidation code.

Smells that way to me too, but let's get some more evidence.

regards, tom lane




Re: [HACKERS] buildfarm animals and 'snapshot too old'

2014-05-17 Thread Andrew Dunstan


On 05/15/2014 07:47 PM, Tomas Vondra wrote:

On 15.5.2014 22:07, Andrew Dunstan wrote:

Yes, I've seen that. Frankly, a test that takes something like 500
hours is a bit crazy.

Maybe. It certainly is not a test people will use during development.
But if it can detect some hard-to-find errors in the code, that might
possibly lead to serious problems, then +1 from me to run them at least
on one animal. 500 hours is ~3 weeks, which is not that bad IMHO.

Also, once you know where it fails the developer can run just that
single test (which might take minutes/hours, but not days).




I have made a change that omits the snapshot sanity check for 
CLOBBER_CACHE_RECURSIVELY cases, but keeps it for all others. See 
https://github.com/PGBuildFarm/server-code/commit/abd946918279b7683056a4fc3156415ef31a4675


cheers

andrew






Re: [HACKERS] buildfarm: strange OOM failures on markhor (running CLOBBER_CACHE_RECURSIVELY)

2014-05-17 Thread Tomas Vondra
On 17.5.2014 19:55, Tom Lane wrote:
 Tomas Vondra t...@fuzzy.cz writes:
 ... then of course the usual 'terminating connection because of crash of
 another server process' warning. Apparently, it's getting killed by the
 OOM killer, because it exhausts all the memory assigned to that VM (2GB).
 
 Can you fix things so it runs into its process ulimit before the OOM killer
 triggers?  Then we'd get a memory map dumped to stderr, which would be
 helpful in localizing the problem.

I did this in /etc/security/limits.d/80-pgbuild.conf:

pgbuild hard as 1835008

so the user the buildfarm runs under will have up to ~1.75GB of RAM (of
the 2GB available to the container).
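The same cap can also be applied from inside a process; a minimal Python sketch of the idea (Unix-only, using the address-space rlimit so that oversized allocations fail with an error the process can report, rather than tripping the kernel OOM killer):

```python
# Sketch: set RLIMIT_AS to the same ~1.75GB value as the limits.conf entry,
# so allocations beyond it fail with MemoryError/ENOMEM instead of the
# process being OOM-killed. (PostgreSQL would then dump its memory-context
# statistics to stderr on the resulting out-of-memory error.)
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_AS)
limit = 1835008 * 1024          # limits.conf value is in KB; rlimits in bytes
resource.setrlimit(resource.RLIMIT_AS, (limit, hard))
assert resource.getrlimit(resource.RLIMIT_AS)[0] == limit
```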

 
 ... So this seems like a
 memory leak somewhere in the cache invalidation code.
 
 Smells that way to me too, but let's get some more evidence.

The tests are already running, and there are a few postgres processes:

  PID   VIRT  RES %CPUTIME+  COMMAND
11478   449m 240m 100.0 112:53.57 postgres: pgbuild regression [local]
CREATE VIEW
11423   219m  19m  0.0   0:00.17 postgres: checkpointer process
11424   219m 2880  0.0   0:00.05 postgres: writer process
11425   219m 5920  0.0   0:00.12 postgres: wal writer process
11426   219m 2708  0.0   0:00.05 postgres: autovacuum launcher process
11427  79544 1836  0.0   0:00.17 postgres: stats collector process
11479  1198m 1.0g  0.0  91:09.99 postgres: pgbuild regression [local]
CREATE INDEX waiting

Attached is 'pmap -x' output for the two interesting processes (11478,
11479).

Tomas
11478:   postgres: pgbuild regression [local] SELECT   
Address   Kbytes RSS   Dirty Mode   Mapping
004058403348   0 r-x--  postgres
00bb4000  48  48  48 rw---  postgres
00bc 308 188 188 rw---[ anon ]
015b3000 244 244 244 rw---[ anon ]
015f 956 956 956 rw---[ anon ]
7f642d696000  48   0   0 r-x--  libnss_files-2.12.so
7f642d6a20002048   0   0 -  libnss_files-2.12.so
7f642d8a2000   4   4   4 r  libnss_files-2.12.so
7f642d8a3000   4   4   4 rw---  libnss_files-2.12.so
7f642d8a4000  14492055525552 rw-s-  zero (deleted)
7f643662a000 452   0   0 r-x--  libfreebl3.so
7f643669b0002044   0   0 -  libfreebl3.so
7f643689a000   8   8   8 r  libfreebl3.so
7f643689c000   4   4   4 rw---  libfreebl3.so
7f643689d000  16   0   0 rw---[ anon ]
7f64368a1000  28   0   0 r-x--  libcrypt-2.12.so
7f64368a80002048   0   0 -  libcrypt-2.12.so
7f6436aa8000   4   4   4 r  libcrypt-2.12.so
7f6436aa9000   4   4   4 rw---  libcrypt-2.12.so
7f6436aaa000 184   0   0 rw---[ anon ]
7f6436ad8000 116   0   0 r-x--  libselinux.so.1
7f6436af50002044   0   0 -  libselinux.so.1
7f6436cf4000   4   4   4 r  libselinux.so.1
7f6436cf5000   4   4   4 rw---  libselinux.so.1
7f6436cf6000   4   4   4 rw---[ anon ]
7f6436cf7000 100   0   0 r-x--  libsasl2.so.2.0.23
7f6436d12044   0   0 -  libsasl2.so.2.0.23
7f6436f0f000   4   4   4 r  libsasl2.so.2.0.23
7f6436f1   4   4   4 rw---  libsasl2.so.2.0.23
7f6436f11000 228   0   0 r-x--  libnspr4.so
7f6436f4a0002044   0   0 -  libnspr4.so
7f6437149000   4   4   4 r  libnspr4.so
7f643714a000   8   8   8 rw---  libnspr4.so
7f643714c000   8   0   0 rw---[ anon ]
7f643714e000  16   0   0 r-x--  libplc4.so
7f64371520002044   0   0 -  libplc4.so
7f6437351000   4   4   4 r  libplc4.so
7f6437352000   4   4   4 rw---  libplc4.so
7f6437353000  12   0   0 r-x--  libplds4.so
7f64373560002044   0   0 -  libplds4.so
7f6437555000   4   4   4 r  libplds4.so
7f6437556000   4   4   4 rw---  libplds4.so
7f6437557000 148   0   0 r-x--  libnssutil3.so
7f643757c0002048   0   0 -  libnssutil3.so
7f643777c000  24  24  24 r  libnssutil3.so
7f6437782000   4   4   4 rw---  libnssutil3.so
7f64377830001236   0   0 r-x--  libnss3.so
7f64378b80002048   0   0 -  libnss3.so
7f6437ab8000  20  20  20 r  libnss3.so
7f6437abd000   8   8   8 rw---  libnss3.so
7f6437abf000   8   0   0 rw---[ anon ]
7f6437ac1000 160  

Re: [HACKERS] %d in log_line_prefix doesn't work for bg/autovacuum workers

2014-05-17 Thread Andres Freund
Hi,

On 2014-05-17 12:23:51 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-05-17 00:17:46 +0200, Andres Freund wrote:
  On 2014-05-16 17:48:28 -0400, Tom Lane wrote:
  This is basically reintroducing a variable that used to exist and was
  intentionally gotten rid of, on the grounds that it'd fail to track
  any renaming of the session's current database (cf commit b256f24264).
 
  I did look whether there's a race making MyDatabaseName out of date. I
  didn't see any.
 
 Hm.  Actually, it looks like commit b256f24264 was a bit schizophrenic:
 although it removed this particular way in which a database rename could
 cause trouble, it nonetheless *added* checks that there were no active
 sessions in the database.

I think it was just about making it harder to hit issues related to
renames. But since there wasn't enough infrastructure to make it
bulletproof yet...

  As you say, the checks were race-prone at the
 time and have since been made bulletproof (though I could not find the
 commit you mentioned?).  But the intent was there.

It's 52667d56a3b489e5645f069522631824b7ffc520. I've lost the initial 5
when copying it into emacs.

 I think the key issue comes down to this comment in RenameDatabase:
 
  * XXX Client applications probably store the current database somewhere,
  * so renaming it could cause confusion.  On the other hand, there may not
  * be an actual problem besides a little confusion, so think about this
  * and decide.
 
 If we're willing to decide that we will never support renaming an active
 database, then the approach you're proposing makes sense.  And it seems
 to me that this issue of potentially confusing client apps is much the
 strongest argument for making such a decision.  But is it strong enough?
 
 Given that there haven't been complaints in the past ten years about how
 you can't rename an active database, I'm OK personally with locking this
 down forever.  But I wonder if anyone wants to argue the contrary.

I don't see much need to allow it. But even if we're interested in
allowing rename properly there'll definitely be the need to update the
database name used in log messages. This doesn't seem to make it
harder. My guess would be that we'd need some kind of invalidation
message for it...

  I'd briefly changed elog.c to only use MyDatabaseName, thinking that
  there seems to be little reason to continue using database_name in
  elog.c once the variable is introduced. But that's not a good idea:
  MyProcPort->database_name exists earlier.
 
 If we do this at all, I think actually that is a good idea.
 MyProcPort->database_name is a *requested* db name, but arguably the
 %d log line field should represent the *actual* connected db name,
 which means it shouldn't start being reported until we have some
 confidence that such a DB exists.

Hm. I tentatively think that it's still reasonable to report it
earlier. There's a bunch of things (option processing, authentication)
that can error out before the database name is finally validated. It
seems somewhat helpful to give context there.
It's easy enough to get data from connecting clients into the log
anyway, so there doesn't seem to be a security argument either.

 I'd personally fill MyDatabaseName
 a bit sooner than you have here, probably about where MyProc->databaseId
 gets set.

The reason I placed it where I did is that it's the first location where
we can be sure the database connected to is the correct one, because now
the lock on the database is held and we've rechecked the name.

 But I don't think elog.c should be looking at more than one source of
 truth for %d.

Ok.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] buildfarm: strange OOM failures on markhor (running CLOBBER_CACHE_RECURSIVELY)

2014-05-17 Thread Andres Freund
On 2014-05-17 20:41:37 +0200, Tomas Vondra wrote:
 On 17.5.2014 19:55, Tom Lane wrote:
  Tomas Vondra t...@fuzzy.cz writes:
 The tests are already running, and there are a few postgres processes:
 
   PID   VIRT  RES %CPUTIME+  COMMAND
 11478   449m 240m 100.0 112:53.57 postgres: pgbuild regression [local]
 CREATE VIEW
 11423   219m  19m  0.0   0:00.17 postgres: checkpointer process
 11424   219m 2880  0.0   0:00.05 postgres: writer process
 11425   219m 5920  0.0   0:00.12 postgres: wal writer process
 11426   219m 2708  0.0   0:00.05 postgres: autovacuum launcher process
 11427  79544 1836  0.0   0:00.17 postgres: stats collector process
 11479  1198m 1.0g  0.0  91:09.99 postgres: pgbuild regression [local]
 CREATE INDEX waiting
 
 Attached is 'pmap -x' output for the two interesting processes (11478,
 11479).

Could you gdb -p 11479 into the process and issue 'p
MemoryContextStats(TopMemoryContext)'. That should print information
about the server's allocation to its stderr.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] [9.4] Minor SSL/ECDH related doc fixes

2014-05-17 Thread Marko Kreen
- Clarify ECDH decription in release notes.
- Fix default value - it's 'prime256v1'.
- List curves with good cross-platform support explicitly
  (NIST P-256 / P-384 / P-521).

The -list_curves output is full of garbage; it's hard to know which
ones make sense to use.  Only those three curves are supported
cross-platform - OpenSSL/Java/Windows - so list them explicitly.

The only reason to tune this value is to change the overall security
level up/down, and now this can be done safely and quickly.

Only upwards though.  We could also list here NIST P-192/P-224
(prime192v1, secp224r1), but those are not supported by Windows.
And prime256v1 is quite fast already.

In the future it might make philosophical sense to list
also Brainpool curves (RFC7027), or some new curves from
http://safecurves.cr.yp.to/ when they are brought to TLS.
But currently only the NIST/NSA curves are a working option,
so let's keep it simple for users.
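As an aside, a curve name can be sanity-checked against the local OpenSSL without starting a server; Python's ssl module (which wraps OpenSSL) accepts the same short names and rejects unknown ones. This is just an illustration, not part of the patch:

```python
# Illustration: ssl.SSLContext.set_ecdh_curve() takes the OpenSSL short
# names discussed above and raises ValueError for names the local OpenSSL
# build does not recognize.
import ssl

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
for name in ("prime256v1", "secp384r1", "secp521r1"):
    ctx.set_ecdh_curve(name)    # the three cross-platform curves

try:
    ctx.set_ecdh_curve("not-a-curve")
    raised = False
except ValueError:
    raised = True
assert raised                   # unknown names are rejected
```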

-- 
marko

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d9e5985..4a666d0 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1020,13 +1020,23 @@ include 'filename'
   </term>
   <listitem>
    <para>
-Specifies the name of the curve to use in ECDH key exchanges.  The
-default is <literal>prime256p1</>.
+Specifies the name of the curve to use in ECDH key exchange.
+It needs to be supported by all clients that connect.
+It does not need to be same curve as used by server's
+Elliptic Curve key.  The default is <literal>prime256v1</>.
    </para>
 
    <para>
-The list of available curves can be shown with the command
-<literal>openssl ecparam -list_curves</literal>.
+OpenSSL names for most common curves:
+<literal>prime256v1</> (NIST P-256),
+<literal>secp384r1</> (NIST P-384),
+<literal>secp521r1</> (NIST P-521).
+   </para>
+
+   <para>
+The full list of available curves can be shown with the command
+<literal>openssl ecparam -list_curves</literal>.  Not all of them
+are usable in TLS though.
    </para>
   </listitem>
  </varlistentry>
diff --git a/doc/src/sgml/release-9.4.sgml b/doc/src/sgml/release-9.4.sgml
index 3070d0b..7c6fb8f 100644
--- a/doc/src/sgml/release-9.4.sgml
+++ b/doc/src/sgml/release-9.4.sgml
@@ -603,17 +603,23 @@
</para>
 
<para>
-Such keys are faster and have improved security over previous
-options. The new configuration
-parameter <link linkend="guc-ssl-ecdh-curve"><varname>ssl_ecdh_curve</></link>
-controls which curve is used.
+This allows use of Elliptic Curve keys for server authentication.
+Such keys are faster and have improved security over RSA keys.
+Also it makes RSA keys perform slightly faster, as ECDHE-RSA key
+exchange will be used over DHE-RSA if both sides support it.
+   </para>
+
+   <para>
+The new configuration parameter
+<link linkend="guc-ssl-ecdh-curve"><varname>ssl_ecdh_curve</></link>
+controls which curve is used for ECDH.
</para>
   </listitem>
 
   <listitem>
<para>
 Improve the default <link
-linkend="guc-ssl-ciphers"><varname>ssl_ciphers</></link> ciphers
+linkend="guc-ssl-ciphers"><varname>ssl_ciphers</></link> value
 (Marko Kreen)
</para>
   </listitem>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] btree_gist macaddr valgrind woes

2014-05-17 Thread Tom Lane
I wrote:
 BTW, the *real* problem with all this stuff is that the gbtreekeyNN types
 are declared as having int alignment, even though some of the opclasses
 store double-aligned types in them.  I imagine it's possible to provoke
 bus errors on machines that are picky about alignment.  The first column
 of an index is safe enough because index tuples will be double-aligned
 anyway, but it seems like there's a hazard for lower-order columns.

And indeed there is:

regression=# create extension btree_gist ;
CREATE EXTENSION
regression=# create table tt (f1 int2, f2 float8);
CREATE TABLE
regression=# create index on tt using gist(f1,f2);
CREATE INDEX
regression=# insert into tt values(1,2);
INSERT 0 1
regression=# insert into tt values(2,4);
INSERT 0 1
regression=# set enable_seqscan TO 0;
SET
regression=# explain select * from tt where f1=2::int2 and f2=4;
   QUERY PLAN   

 Index Scan using tt_f1_f2_idx on tt  (cost=0.14..8.16 rows=1 width=10)
   Index Cond: ((f1 = 2::smallint) AND (f2 = 4::double precision))
 Planning time: 1.043 ms
(3 rows)

regression=# select * from tt where f1=2::int2 and f2=4;
 bus error 

 This is something we cannot fix compatibly :-(

AFAICS, what we have to do is mark the wider gbtreekeyNN types as
requiring double alignment.  This will break pg_upgrade'ing any index in
which they're used as non-first columns, unless perhaps all the preceding
columns have at least double size/alignment.  I guess pg_upgrade can
check for that, but it'll be kind of a pain.
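For illustration only (not PostgreSQL code): Python's struct module follows the same native C alignment rules, which shows exactly how much padding a double needs after a smaller leading column:

```python
import struct

# An int2 followed by a float8, like the (f1 int2, f2 float8) index above.
# "@" uses native C alignment: the double gets padded out to an 8-byte
# boundary.  "=" packs with no padding, which is what storing the double
# at a merely int-aligned offset amounts to.
aligned = struct.calcsize("@hd")    # h = int16, d = float64
packed = struct.calcsize("=hd")

# On a typical 64-bit platform the aligned layout inserts 6 pad bytes;
# reading the double out of the packed layout is a misaligned access,
# which is exactly what trips the bus error on strict platforms.
print(aligned, packed)
```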

Another issue is what the heck btree_gist's extension upgrade script ought
to do about this.  It can't just go and modify the type declarations.

Actually, on further thought, this isn't an issue for pg_upgrade at all,
just for the extension upgrade script.  Maybe we just have to make it
poke through the catalogs looking for at-risk indexes, and refuse to
complete the extension upgrade if there are any?

regards, tom lane




Re: [HACKERS] %d in log_line_prefix doesn't work for bg/autovacuum workers

2014-05-17 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-05-17 12:23:51 -0400, Tom Lane wrote:
 I think the key issue comes down to this comment in RenameDatabase:
 
 * XXX Client applications probably store the current database somewhere,
 * so renaming it could cause confusion.  On the other hand, there may not
 * be an actual problem besides a little confusion, so think about this
 * and decide.
 
 If we're willing to decide that we will never support renaming an active
 database, then the approach you're proposing makes sense.  And it seems
 to me that this issue of potentially confusing client apps is much the
 strongest argument for making such a decision.  But is it strong enough?
 
 Given that there haven't been complaints in the past ten years about how
 you can't rename an active database, I'm OK personally with locking this
 down forever.  But I wonder if anyone wants to argue the contrary.

 I don't see much need to allow it. But even if we're interested in
 allowing rename properly there'll definitely be the need to update the
 database name used in log messages.

I think the client-side issues are far worse.  For example, if we allow
renaming active databases then the subprocesses in a parallel pg_dump or
pg_restore could connect to the wrong database, ie not the one the leader
process is connected to.  The very best-case outcome of that is confusing
error messages, and worst-case could be pretty nasty (perhaps even
security issues?).  We could no doubt teach pg_dump and pg_restore to
cross-check database OIDs after reconnecting, but lots of other
applications could have comparable problems.

 If we do this at all, I think actually that is a good idea.
 MyProcPort->database_name is a *requested* db name, but arguably the
 %d log line field should represent the *actual* connected db name,
 which means it shouldn't start being reported until we have some
 confidence that such a DB exists.

 Hm. I tentatively think that it's still reasonable to report it
 earlier. There's a bunch of things (option processing, authentication)
 that can error out before the database name is finally validated. It
 seems somewhat helpful to give context there.

What context?  By definition, no such processing can depend on which
database you're trying to connect to.

 The reason I placed it where I did is that it's the first location where
 we can be sure the database connected to is the correct one because now
 the lock on the database is held and we've rechecked the name.

No, if we've completed the previous lookup then the DB exists.  The
recheck is to catch the possibility that a rename or drop is completing
concurrently --- but I don't think it's unreasonable to show the
database's (old) name as soon as we've seen evidence that it does/did
exist.

regards, tom lane




Re: [HACKERS] %d in log_line_prefix doesn't work for bg/autovacuum workers

2014-05-17 Thread Andres Freund
On 2014-05-17 16:23:26 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-05-17 12:23:51 -0400, Tom Lane wrote:
  Given that there haven't been complaints in the past ten years about how
  you can't rename an active database, I'm OK personally with locking this
  down forever.  But I wonder if anyone wants to argue the contrary.
 
  I don't see much need to allow it. But even if we're interested in
  allowing rename properly there'll definitely be the need to update the
  database name used in log messages.
 
 I think the client-side issues are far worse.

No disagreements from me there. Those just aren't particularly related to
MyDatabaseName existing...

 For example, if we allow
 renaming active databases then the subprocesses in a parallel pg_dump or
 pg_restore could connect to the wrong database, ie not the one the leader
 process is connected to.  The very best-case outcome of that is confusing
 error messages, and worst-case could be pretty nasty (perhaps even
 security issues?).

Yea, I am pretty sure there's some nastiness that way. Don't really want
to go there for a feature that's not even been requested.

  We could no doubt teach pg_dump and pg_restore to
 cross-check database OIDs after reconnecting, but lots of other
 applications could have comparable problems.

I guess pg_dump would have to lock the database before dumping

  If we do this at all, I think actually that is a good idea.
  MyProcPort->database_name is a *requested* db name, but arguably the
  %d log line field should represent the *actual* connected db name,
  which means it shouldn't start being reported until we have some
  confidence that such a DB exists.
 
  Hm. I tentatively think that it's still reasonable to report it
  earlier. There's a bunch of things (option processing, authentication)
  that can error out before the database name is finally validated. It
  seems somewhat helpful to give context there.
 
 What context?  By definition, no such processing can depend on which
 database you're trying to connect to.

With context I mean printing a value for log_line_prefix's '%d'. If you
have an application constantly connecting with the wrong username, it
might be easier to diagnose if you know the database it's trying to
connect to.

  The reason I placed it where I did is that it's the first location where
  we can be sure the database connected to is the correct one because now
  the lock on the database is held and we've rechecked the name.
 
 No, if we've completed the previous lookup then the DB exists.  The
 recheck is to catch the possibility that a rename or drop is completing
 concurrently --- but I don't think it's unreasonable to show the
 database's (old) name as soon as we've seen evidence that it does/did
 exist.

Fair enough. It's imo a minor judgement call with reasons for going
either way.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] buildfarm animals and 'snapshot too old'

2014-05-17 Thread Tomas Vondra
On 17.5.2014 19:58, Andrew Dunstan wrote:
 
 On 05/15/2014 07:47 PM, Tomas Vondra wrote:
 On 15.5.2014 22:07, Andrew Dunstan wrote:
 Yes, I've seen that. Frankly, a test that takes something like 500
 hours is a bit crazy.
 Maybe. It certainly is not a test people will use during development.
 But if it can detect some hard-to-find errors in the code, that might
 possibly lead to serious problems, then +1 from me to run them at least
 on one animal. 500 hours is ~3 weeks, which is not that bad IMHO.

 Also, once you know where it fails the developer can run just that
 single test (which might take minutes/hours, but not days).
 
 
 
 I have made a change that omits the snapshot sanity check for
 CLOBBER_CACHE_RECURSIVELY cases, but keeps it for all others. See
 https://github.com/PGBuildFarm/server-code/commit/abd946918279b7683056a4fc3156415ef31a4675

OK, thanks. Seems reasonable.

Tomas





Re: [HACKERS] btree_gist macaddr valgrind woes

2014-05-17 Thread Heikki Linnakangas

On 05/17/2014 11:12 PM, Tom Lane wrote:

I wrote:

BTW, the *real* problem with all this stuff is that the gbtreekeyNN types
are declared as having int alignment, even though some of the opclasses
store double-aligned types in them.  I imagine it's possible to provoke
bus errors on machines that are picky about alignment.  The first column
of an index is safe enough because index tuples will be double-aligned
anyway, but it seems like there's a hazard for lower-order columns.


And indeed there is:

regression=# create extension btree_gist ;
CREATE EXTENSION
regression=# create table tt (f1 int2, f2 float8);
CREATE TABLE
regression=# create index on tt using gist(f1,f2);
CREATE INDEX
regression=# insert into tt values(1,2);
INSERT 0 1
regression=# insert into tt values(2,4);
INSERT 0 1
regression=# set enable_seqscan TO 0;
SET
regression=# explain select * from tt where f1=2::int2 and f2=4;
QUERY PLAN

  Index Scan using tt_f1_f2_idx on tt  (cost=0.14..8.16 rows=1 width=10)
Index Cond: ((f1 = 2::smallint) AND (f2 = 4::double precision))
  Planning time: 1.043 ms
(3 rows)

regression=# select * from tt where f1=2::int2 and f2=4;
 bus error 


This is something we cannot fix compatibly :-(


AFAICS, what we have to do is mark the wider gbtreekeyNN types as
requiring double alignment.  This will break pg_upgrade'ing any index in
which they're used as non-first columns, unless perhaps all the preceding
columns have at least double size/alignment.  I guess pg_upgrade can
check for that, but it'll be kind of a pain.

Another issue is what the heck btree_gist's extension upgrade script ought
to do about this.  It can't just go and modify the type declarations.

Actually, on further thought, this isn't an issue for pg_upgrade at all,
just for the extension upgrade script.  Maybe we just have to make it
poke through the catalogs looking for at-risk indexes, and refuse to
complete the extension upgrade if there are any?


I think it would be best to just not allow pg_upgrade if there are any 
indexes using the ill-defined types. The upgrade script could then 
simply drop the types and re-create them. The DBA would need to drop the 
affected indexes before upgrade and re-create them afterwards, but I 
think that would be acceptable. I doubt there are many people using 
btree_gist on int8 or float8 columns.


Another way to attack this would be to change the code to memcpy() the 
values before accessing them. That would be ugly, but it would be 
back-patchable. In HEAD, I'd rather bite the bullet and get the catalogs 
fixed, though.


- Heikki




Re: [HACKERS] buildfarm: strange OOM failures on markhor (running CLOBBER_CACHE_RECURSIVELY)

2014-05-17 Thread Tomas Vondra
On 17.5.2014 22:33, Tomas Vondra wrote:
 On 17.5.2014 21:31, Andres Freund wrote:
 On 2014-05-17 20:41:37 +0200, Tomas Vondra wrote:
 On 17.5.2014 19:55, Tom Lane wrote:
 Tomas Vondra t...@fuzzy.cz writes:
 The tests are already running, and there are a few postgres processes:

   PID   VIRT  RES %CPUTIME+  COMMAND
 11478   449m 240m 100.0 112:53.57 postgres: pgbuild regression [local]
 CREATE VIEW
 11423   219m  19m  0.0   0:00.17 postgres: checkpointer process
 11424   219m 2880  0.0   0:00.05 postgres: writer process
 11425   219m 5920  0.0   0:00.12 postgres: wal writer process
 11426   219m 2708  0.0   0:00.05 postgres: autovacuum launcher process
 11427  79544 1836  0.0   0:00.17 postgres: stats collector process
 11479  1198m 1.0g  0.0  91:09.99 postgres: pgbuild regression [local]
 CREATE INDEX waiting

 Attached is 'pmap -x' output for the two interesting processes (11478,
 11479).

 Could you gdb -p 11479 into the process and issue 'p
 MemoryContextStats(TopMemoryContext)'. That should print information
 about the server's allocation to its stderr.

And another memory context stats dump for a session executing CREATE INDEX
while it had a large amount of memory allocated. The interesting thing is
there are ~11k lines that look exactly like this:

pg_namespace_oid_index: 1024 total in 1 blocks; 88 free (0 chunks); 936 used

Seems a bit strange to me.
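Duplication like that is easy to tally mechanically from a MemoryContextStats dump; a quick sketch (the sample lines here are hypothetical, not taken from the attached log):

```python
from collections import Counter

# Count how often each memory context name appears in a
# MemoryContextStats dump; relcache index contexts should normally
# show up once each, not thousands of times.
dump = """\
pg_namespace_oid_index: 1024 total in 1 blocks; 88 free (0 chunks); 936 used
pg_namespace_oid_index: 1024 total in 1 blocks; 88 free (0 chunks); 936 used
pg_class_oid_index: 1024 total in 1 blocks; 88 free (0 chunks); 936 used
""".splitlines()

counts = Counter(line.split(":", 1)[0].strip() for line in dump)
for name, n in counts.most_common():
    if n > 1:
        print(name, n)   # flags the duplicated context
```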

Tomas


create-index.log.gz
Description: application/gzip



Re: [HACKERS] btree_gist macaddr valgrind woes

2014-05-17 Thread Andres Freund
On 2014-05-17 23:46:39 +0300, Heikki Linnakangas wrote:
 On 05/17/2014 11:12 PM, Tom Lane wrote:
 I wrote:
 BTW, the *real* problem with all this stuff is that the gbtreekeyNN types
 are declared as having int alignment, even though some of the opclasses
 store double-aligned types in them.  I imagine it's possible to provoke
 bus errors on machines that are picky about alignment.  The first column
 of an index is safe enough because index tuples will be double-aligned
 anyway, but it seems like there's a hazard for lower-order columns.
 
 And indeed there is:
 
 regression=# create extension btree_gist ;
 CREATE EXTENSION
 regression=# create table tt (f1 int2, f2 float8);
 CREATE TABLE
 regression=# create index on tt using gist(f1,f2);
 CREATE INDEX
 regression=# insert into tt values(1,2);
 INSERT 0 1
 regression=# insert into tt values(2,4);
 INSERT 0 1
 regression=# set enable_seqscan TO 0;
 SET
 regression=# explain select * from tt where f1=2::int2 and f2=4;
 QUERY PLAN
 
   Index Scan using tt_f1_f2_idx on tt  (cost=0.14..8.16 rows=1 width=10)
 Index Cond: ((f1 = 2::smallint) AND (f2 = 4::double precision))
   Planning time: 1.043 ms
 (3 rows)
 
 regression=# select * from tt where f1=2::int2 and f2=4;
  bus error 
 
 This is something we cannot fix compatibly :-(

Too bad.

 Another issue is what the heck btree_gist's extension upgrade script ought
 to do about this.  It can't just go and modify the type declarations.
 
 Actually, on further thought, this isn't an issue for pg_upgrade at all,
 just for the extension upgrade script.  Maybe we just have to make it
 poke through the catalogs looking for at-risk indexes, and refuse to
 complete the extension upgrade if there are any?

Hm. I guess it could just add a C function to do that job. I think
pg_upgrade already has some.

 I think it would be best to just not allow pg_upgrade if there are any
 indexes using the ill-defined types. The upgrade script could then simply
 drop the types and re-create them. The DBA would need to drop the affected
 indexes before upgrade and re-create them afterwards, but I think that would
 be acceptable.
 I doubt there are many people using btree_gist on int8 or float8
 columns.

I don't think it's that unlikely. It can make a fair amount of sense to
have multicolumn indexes where one column is over an int8 and the other
over the datatype requiring the gist index to be used. I've certainly
done that.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] buildfarm: strange OOM failures on markhor (running CLOBBER_CACHE_RECURSIVELY)

2014-05-17 Thread Andres Freund
Hi,

On 2014-05-17 22:33:31 +0200, Tomas Vondra wrote:
 Anyway, the main difference between the analyze snapshots seems to be this:
 
   init:  CacheMemoryContext: 67100672 total in 17 blocks; ...
   350MB: CacheMemoryContext: 134209536 total in 25 blocks; ...
   400MB: CacheMemoryContext: 192929792 total in 32 blocks; ...
   500MB: CacheMemoryContext: 293593088 total in 44 blocks; ...
   600MB: CacheMemoryContext: 411033600 total in 58 blocks; ...

Hm, so something is definitely leaking memory inside CacheMemoryContext
itself. Is that happening constantly or just with individual tests?

 Not sure if there's something wrong with the SELECT memory context. It
 has ~1500 nested nodes like these:
 
SQL function data: 24576 total in 2 blocks; ...
   ExecutorState: 24576 total in 2 blocks; ...
 SQL function data: 24576 total in 2 blocks; ...
 ExprContext: 8192 total in 1 blocks; ...
 
 But maybe it's expected / OK.

I'd guess that's a recursive function call. Any chance you know what's
been executing at that point? I'd bet it's been the 'errors' check. That
has:
-- Check that stack depth detection mechanism works and
-- max_stack_depth is not set too high
create function infinite_recurse() returns int as
'select infinite_recurse()' language sql;
\set VERBOSITY terse
select infinite_recurse();
ERROR:  stack depth limit exceeded

which'd pretty much produce a tree of executions like yours.
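That call tree is easy to mimic; a Python sketch of the same stack-depth guard (Python raises RecursionError where PostgreSQL reports "stack depth limit exceeded"):

```python
import sys

def infinite_recurse():
    # Mirrors the SQL function: each invocation immediately calls itself
    # again until the interpreter's stack-depth check fires.
    return infinite_recurse()

sys.setrecursionlimit(500)   # analogous to a small max_stack_depth
try:
    infinite_recurse()
except RecursionError:
    error = "stack depth limit exceeded"

print(error)
```

Each frame left on the stack when the limit is hit corresponds to one level of the nested "SQL function data" / ExecutorState contexts seen in the dump.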

Greetings,

Andres Freund




Re: [HACKERS] %d in log_line_prefix doesn't work for bg/autovacuum workers

2014-05-17 Thread Christoph Berg
Re: Tom Lane 2014-05-17 22961.1400343...@sss.pgh.pa.us
 I think the key issue comes down to this comment in RenameDatabase:
 
  * XXX Client applications probably store the current database somewhere,
  * so renaming it could cause confusion.  On the other hand, there may not
  * be an actual problem besides a little confusion, so think about this
  * and decide.
 
 If we're willing to decide that we will never support renaming an active
 database, then the approach you're proposing makes sense.  And it seems
 to me that this issue of potentially confusing client apps is much the
 strongest argument for making such a decision.  But is it strong enough?
 
 Given that there haven't been complaints in the past ten years about how
 you can't rename an active database, I'm OK personally with locking this
 down forever.  But I wonder if anyone wants to argue the contrary.

Fwiw, we ran into exactly this question yesterday during a customer
training session. Their plan is to provide a database with updated
content every few hours, so their idea was to populate db_new, rename
db to db_old, rename db_new to db, and eventually drop db_old from
cron. This fails when sessions are connected to db. Surprisingly, it
actually works if you do the db renaming on a master server, but let
the (anyway readonly) client connect to a streaming slave. (On 9.2,
didn't test 9.4.)

We also looked into the confused-client problem, but
current_database() reports the new name correctly, and hence figured
there shouldn't be any problem with this approach, despite it
obviously being slightly out of spec because of the dependency on
running on a SR slave. I even jokingly noted that this will probably
get fixed in one or the other direction once I mention it on the
mailing list ;)

So here's your complaint/request that renaming active databases should
be possible...

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/




Re: [HACKERS] %d in log_line_prefix doesn't work for bg/autovacuum workers

2014-05-17 Thread Christoph Berg
Re: Andres Freund 2014-05-17 20140517203404.gb4...@awork2.anarazel.de
  For example, if we allow
  renaming active databases then the subprocesses in a parallel pg_dump or
  pg_restore could connect to the wrong database, ie not the one the leader
  process is connected to.  The very best-case outcome of that is confusing
  error messages, and worst-case could be pretty nasty (perhaps even
  security issues?).
 
 Yea, I am pretty sure there's some nastiness that way. Don't really want
 to go there for a feature that's not even been requested.

Does pg_dumpall protect against connecting to the wrong database if
renames are in progress? To me, this sounds like the same problem as
with pg_dump -j.

   We could no doubt teach pg_dump and pg_restore to
  cross-check database OIDs after reconnecting, but lots of other
  applications could have comparable problems.
 
 I guess pg_dump would have to lock the database before dumping

Sounds like a sane idea for both.

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/




Re: [HACKERS] %d in log_line_prefix doesn't work for bg/autovacuum workers

2014-05-17 Thread Andres Freund
On 2014-05-17 23:10:42 +0200, Christoph Berg wrote:
  Given that there haven't been complaints in the past ten years about how
  you can't rename an active database, I'm OK personally with locking this
  down forever.  But I wonder if anyone wants to argue the contrary.
 
 Fwiw, we ran into exactly this question yesterday during a customer
 training session. Their plan is to provide a database with updated
 content every few hours, so their idea was to populate db_new, rename
 db to db_old, rename db_new to db, and eventually drop db_old from
 cron. This fails when sessions are connected to db. Surprisingly, it
 actually works if you do the db renaming on a master server, but let
 the (anyway readonly) client connect to a streaming slave. (On 9.2,
 didn't test 9.4.)

Ick. That's a bug imo. It should cause a recovery conflict disconnecting
active sessions with force...
I'd very much discourage your users from relying on this. Independent of
my complaint about %d this isn't something that's supported and it won't
change unless somebody puts non-negligible resources into it.

 We also looked into the confused-client problem, but
 current_database() reports the new name correctly, and hence figured
 there shouldn't be any problem with this approach, despite it
 obviously being slightly out of spec because of the dependency on
 running on a SR slave.

At the very least log_line_prefix's %d will log the wrong thing.

 I even jokingly noted that this will probably
 get fixed in one or the other direction once I mention it on the
 mailing list ;)

Heh.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] btree_gist macaddr valgrind woes

2014-05-17 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-05-17 23:46:39 +0300, Heikki Linnakangas wrote:
 I doubt there are many people using btree_gist on int8 or float8
 columns.

 I don't think it's that unlikely. It can make a fair amount of sense to
  have multicolumn indexes where one column is over an int8 and the other
 over the datatype requiring the gist index to be used. I've certainly
 done that.

Yeah; the whole point of having these opclasses at all is to allow a
GIST index to index a scalar column along with some GIST-only
functionality.

However, given the lack of field complaints, it seems likely that nobody
is trying to do that on non-Intel architectures; or at least, not on a
non-Intel architecture where the kernel is unwilling to mask the problem
via trap emulation.  So my feeling about it is we don't need to try to
back-patch a fix.  I'd like to see it fixed in HEAD, though.

A larger issue is that we evidently have no buildfarm animals that are
picky about alignment, or at least none that are running a modern-enough
buildfarm script to be running the contrib/logical_decoding test.
That seems like a significant gap.  I don't want to volunteer to run
a critter on my HPPA box: it's old enough, and eats enough electricity,
that I no longer want to leave it on 24x7.  Plus a lot of the time its
response to a bus error is to lock up in a tight loop rather than report
an error, so a failure wouldn't get reported usefully by the buildfarm
anyway.  Does anyone have an ARM or PPC box where they can configure
the kernel not to mask misaligned fetches?

regards, tom lane




[HACKERS] uuid-ossp (Re: [pgsql-packagers] Postgresapp 9.4 beta build ready)

2014-05-17 Thread Christoph Berg
[redirecting to -hackers]

Re: Tom Lane 2014-05-15 31008.1400180...@sss.pgh.pa.us
 Sandeep Thakkar sandeep.thak...@enterprisedb.com writes:
  Yes, Jakob is right. On 9.4, we had to patch configure script along with
  uuid-ossp.c to resolve the uuid issue.
 
 I think we need to discuss this on the open pgsql-hackers list.
 It seems like we have to either 
 
 (1) hack both configure and uuid-ossp.c to work around the issue
 
 (2) do something more radical, like adopt Palle's patch to use the
 BSD uuid functions.
 
 Now, (2) sounds a bit scary for a post-beta1 change, but on the other hand
 we know that ossp-uuid is a train wreck in progress.  Maybe it's time
 to do something about it.  But that's got to be debated on -hackers not
 here.

Fwiw, libossp-uuid is the only non-trivial dependency of
postgresql*.{deb,rpm} - I've been to trainings at customer sites more
than once where we just had a preinstalled Linux machine with no
internet and a set of PostgreSQL packages that needed dpkg
--force-depends or rpm --nodeps to install just because the -contrib
package needs that lib.

So if we got rid of that dependency, life might become a bit easier
also in that setting.

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/




Re: [HACKERS] %d in log_line_prefix doesn't work for bg/autovacuum workers

2014-05-17 Thread Christoph Berg
Re: Andres Freund 2014-05-17 20140517211930.ga10...@awork2.anarazel.de
 On 2014-05-17 23:10:42 +0200, Christoph Berg wrote:
   Given that there haven't been complaints in the past ten years about how
   you can't rename an active database, I'm OK personally with locking this
   down forever.  But I wonder if anyone wants to argue the contrary.
  
  Fwiw, we ran into exactly this question yesterday during a customer
  training session. Their plan is to provide a database with updated
  content every few hours, so their idea was to populate db_new, rename
  db to db_old, rename db_new to db, and eventually drop db_old from
  cron. This fails when sessions are connected to db. Surprisingly, it
  actually works if you do the db renaming on a master server, but let
  the (anyway readonly) client connect to a streaming slave. (On 9.2,
  didn't test 9.4.)
 
 Ick. That's a bug imo. It should cause a recovery conflict disconnecting
 active sessions with force...
 I'd very much discourage your users from relying on this. Independent of
 my complaint about %d this isn't something that's supported and it won't
  change unless somebody puts non-negligible resources into it.

They want to pg_terminate_backend() the sessions on db_old anyway, but
to get to that point, you need to be able to rename the db. The
alternative would be to disallow connections to the db (for which
there is no real nice way), kill existing connections, do the
renaming, and then reenable connections. That's much more effort,
though.

Fwiw, this wasn't the first time I've heard of that idea, it also
doesn't sound too far-fetched for me. I guess people usually go "damn,
I can't rename active dbs, let's try something else" instead of
complaining on the mailing lists in that case.

  We also looked into the confused-client problem, but
  current_database() reports the new name correctly, and hence figured
  there shouldn't be any problem with this approach, despite it
  obviously being slightly out of spec because of the dependency on
  running on a SR slave.
 
 At the very least log_line_prefix's %d will log the wrong thing.

I guess if you are doing database renames, you can live with that,
especially if you are doing the renames only for the purpose of
putting an updated db in place, and then throwing away the old db
quickly.

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/




Re: [HACKERS] buildfarm: strange OOM failures on markhor (running CLOBBER_CACHE_RECURSIVELY)

2014-05-17 Thread Andres Freund
On 2014-05-17 22:55:14 +0200, Tomas Vondra wrote:
 And another memory context stats dump for a session executing CREATE INDEX
 while it had a large amount of memory allocated. The interesting thing is
 there are ~11k lines that look exactly like this:
 
 pg_namespace_oid_index: 1024 total in 1 blocks; 88 free (0 chunks); 936 used

Heh. That certainly doesn't look right. I bet it's the contexts from
RelationInitIndexAccessInfo().

Looks like there generally are recursion 'troubles' with system
indexes. RelationClearRelation() will mark them as invalid, causing any
further lookup to do RelationBuildDesc() calls. Which will again rebuild
the same relation if it's used somewhere inside RelationBuildDesc() or
RelationInitIndexAccessInfo(). From a quick look it looks like it should
resolve itself after some time, even freeing the superfluous memory
contexts. But I am not sure if there are scenarios where that won't
happen...


Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] %d in log_line_prefix doesn't work for bg/autovacuum workers

2014-05-17 Thread Andres Freund
On 2014-05-17 23:35:43 +0200, Christoph Berg wrote:
 They want to pg_terminate_backend() the sessions on db_old anyway, but
 to get that point, you need to be able to rename the db. The
 alternative would be to disallow connections to the db (for which
 there is no real nice way), kill existing connections, do the
 renaming, and then reenable connections. That's much more effort,
 though.

There should really be a way to set datallowcon... But anyway, you can
set the connection limit to 0, that should mostly work?

 Fwiw, this wasn't the first time I've heard of that idea, it also
 doesn't sound too far-fetched for me. I guess people usually go damn,
 I can't rename active dbs, let's try something else instead of
 complaining on the mailing lists in that case.

Hm.

   We also looked into the confused-client problem, but
   current_database() reports the new name correctly, and hence figured
   there shouldn't be any problem with this approach, despite it
   obviously being slightly out of spec because of the dependency on
   running on a SR slave.
  
  At the very least log_line_prefix's %d will log the wrong thing.
 
 I guess if you are doing database renames, you can live with that,
 especially if you are doing the renames only for the purpose of
 putting an updated db in place, and then throwing away the old db
 quickly.

I don't know. If you switch databases around, for example, and sessions
on both report the original database name, things really get confusing.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Allowing join removals for more join types

2014-05-17 Thread David Rowley
On Sun, May 18, 2014 at 2:55 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 David Rowley dgrowle...@gmail.com writes:
  It looks like the existing join removals are done quite early in the
  planning and redundant joins are removed before any subqueries from
  that query are planned. So this innerrel->subroot->parse has not been
  done yet. It seems to be done later in query_planner() when
  make_one_rel() is called.

 It's true that we don't plan the subquery till later, but I don't see why
 that's important here.  Everything you want to know is available from the
 subquery parsetree; so just look at the RTE, don't worry about how much
 of the RelOptInfo has been filled in.


Thanks. I think I've found what you're talking about in
PlannerInfo->simple_rte_array.
That's got the ball rolling.



  The best I can come up with on how to implement this is to have 2
  stages of join removals. Stage 1 would be the existing stage that
  attempts to remove joins from non-subqueries. Stage 2 would happen
  just after make_one_rel() is called from query_planner(); this would
  attempt to remove any subqueries that are not needed, and if it
  managed to remove any it would force a 2nd call to make_one_rel().

 That sounds like a seriously bad idea.  For one thing, it blows the
 opportunity to not plan the subquery in the first place.  For another,
 most of these steps happen in a carefully chosen order because there
 are interdependencies.  You can't just go back and re-run some earlier
 processing step.  A large fraction of the complexity of analyzejoins.c
 right now arises from the fact that it has to undo some earlier
 processing; that would get enormously worse if you delayed it further.


Agreed, but at the time I didn't know that the subquery information was
available elsewhere.



 BTW, just taking one step back ... this seems like a pretty specialized
 requirement.  Are you sure it wouldn't be easier to fix your app to
 not generate such silly queries?


Well, couldn't you say the same about any join removals? Of course the
query could be rewritten to not reference that relation, but there are
cases where that would be sillier than removing the redundant join: for
example, a fairly complex view may exist and many use cases for the view
don't require all of the columns. It might be a bit of a pain to maintain
a whole series of views with each required subset of columns instead of
just maintaining a single view and allowing callers to use what they
need from it.

I came across the need for this at work this week, where we have a grid
in a UI in which the users can select the columns that they need to see.
The data in each grid is supplied by a single view which contains all the
possible columns that they might need; if the user is only using a narrow
subset of those columns then it seems quite wasteful to do more than
is required.

Regards

David Rowley


Re: [HACKERS] 9.4 beta1 crash on Debian sid/i386

2014-05-17 Thread Christoph Berg
Re: Tom Lane 2014-05-14 1357.1400028...@sss.pgh.pa.us
 Christoph Berg c...@df7cb.de writes:
  Building 9.4 beta1 on Debian sid/i386 fails during the regression
  tests. amd64 works fine, as does i386 on the released distributions.
 
 It would appear that something is wrong with check_stack_depth(),
 and/or getrlimit(RLIMIT_STACK) is lying to us about the available stack.
 None of that logic has changed in awhile.  You might try checking what
 the max_stack_depth GUC gets set to by default in each build, and how that
 compares to what ulimit -s has to say.  If it looks sane, try tracing
 through check_stack_depth.

ulimit -s is 8192 (kB); max_stack_depth is 2MB.

check_stack_depth looks right, max_stack_depth_bytes there is 2097152
and I can see stack_base_ptr - stack_top_loc grow over repeated
invocations of the function (stack_depth itself is optimized out).
Still, it never enters "if (stack_depth > max_stack_depth_bytes ...)".

Using "b check_stack_depth if (stack_base_ptr - stack_top_loc) >
100", I could see the stack size at 1000264, though when I then
tried with "> 190", it caught SIGBUS again.

In the meantime, the problem has manifested itself also on other
architectures: armel, armhf, and mipsel (the build logs are at [1],
though they don't contain anything except a "FATAL:  the database
system is in recovery mode").

[1] https://buildd.debian.org/status/logs.php?pkg=postgresql-9.4&ver=9.4~beta1-1

Interestingly, the Debian buildd managed to run the testsuite for
i386, while I could reproduce the problem on the pgapt build machine
and on my notebook, so there must be some system difference. Possibly
the reason is these two machines are running a 64bit kernel and I'm
building in a 32bit chroot, though that hasn't been a problem before.

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/




[HACKERS] pgbench is broken on strict-C89 compilers

2014-05-17 Thread Tom Lane
I got around to trying to build PG with the HP-supplied compiler on my
ancient HPUX box, something I do about once a release cycle to see if
we've finally broken that trailing-edge toolchain.  Things still seem
to work except for this:

cc: pgbench.c, line 1579: error 1521: Incorrect initialization.
cc: pgbench.c, line 1591: error 1521: Incorrect initialization.

What it's complaining about is these nonconstant initializers:

struct ddlinfo DDLs[] = {
	{
		"pgbench_history",
		scale >= SCALE_32BIT_THRESHOLD
		? "tid int,bid int,aid bigint,delta int,mtime timestamp,filler char(22)"
		: "tid int,bid int,aid    int,delta int,mtime timestamp,filler char(22)",
		0
	},

which were apparently added in commit 89d00cbe.  It appears to me that
the compiler is within its rights to refuse a nonconstant expression
for an inner initializer according to C89, though I don't see any such
restriction in C99.

We shipped this code in 9.3, and nobody's complained yet, so maybe
it's time to forget about C89 compliance.  On the other hand, minor
notational convenience seems like a pretty poor reason to move the
goalposts for C language compliance, so I'm inclined to fix this.

I'm not entirely sure what the least ugly substitute code would be.
One idea is to replace the bigint/int column types with %s and fill
in the correct type with a snprintf operation, but that's not real
attractive because it would only work for a single such column,
and the compiler could not catch any format-spec-mismatch problems.

Thoughts?

regards, tom lane




Re: [HACKERS] pgbench is broken on strict-C89 compilers

2014-05-17 Thread Andres Freund
Hi,

On 2014-05-17 19:15:15 -0400, Tom Lane wrote:
 I got around to trying to build PG with the HP-supplied compiler on my
 ancient HPUX box, something I do about once a release cycle to see if
 we've finally broken that trailing-edge toolchain.  Things still seem
 to work except for this:
 
 cc: pgbench.c, line 1579: error 1521: Incorrect initialization.
 cc: pgbench.c, line 1591: error 1521: Incorrect initialization.
 
 What it's complaining about is these nonconstant initializers:
 
 struct ddlinfo DDLs[] = {
 {
 "pgbench_history",
 scale >= SCALE_32BIT_THRESHOLD
 ? "tid int,bid int,aid bigint,delta int,mtime timestamp,filler char(22)"
 : "tid int,bid int,aid    int,delta int,mtime timestamp,filler char(22)",
 0
 },
 
 which were apparently added in commit 89d00cbe.  It appears to me that
 the compiler is within its rights to refuse a nonconstant expression
 for an inner initializer according to C89, though I don't see any such
 restriction in C99.

Yea, I've complained about it in
http://www.postgresql.org/message-id/20140403152834.gg17...@awork2.anarazel.de

That piece of code is also confused about using static vs. const. For a lot
longer though...

I'd planned to put up a buildfarm animal compiling pg with
-Wc11-extensions and -Wc99-extensions once the configure bits are
committed in some form.

 We shipped this code in 9.3, and nobody's complained yet, so maybe
 it's time to forget about C89 compliance.  On the other hand, minor
 notational convenience seems like a pretty poor reason to move the
 goalposts for C language compliance, so I'm inclined to fix this.

I think it's time to move past pure C89 in the near future. But I also
think if we do so we should do it at the beginning of the release
cycle. And selectively, using only the features we want and that are
widely available. E.g. msvc still doesn't fully support C99.
So +1 for fixing this.

 I'm not entirely sure what the least ugly substitute code would be.
 One idea is to replace the bigint/int column types with %s and fill
 in the correct type with a snprintf operation, but that's not real
 attractive because it would only work for a single such column,
 and the compiler could not catch any format-spec-mismatch problems.

I'd just duplicated the ddl structs. Seemed to be the least ugly thing I
could come up with. Far from pretty though.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] 9.4 checksum error in recovery with btree index

2014-05-17 Thread Jeff Janes
On Saturday, May 17, 2014, Heikki Linnakangas hlinnakan...@vmware.com
wrote:

 On 05/17/2014 12:28 AM, Jeff Janes wrote:

 More fun with my torn page injection test program on 9.4.

 24171  2014-05-16 14:00:44.934 PDT:WARNING:  01000: page verification
 failed, calculated checksum 21100 but expected 3356
 24171  2014-05-16 14:00:44.934 PDT:CONTEXT:  xlog redo split_l: rel
 1663/16384/16405 left 35191, right 35652, next 34666, level 0, firstright
 192
 24171  2014-05-16 14:00:44.934 PDT:LOCATION:  PageIsVerified,
 bufpage.c:145
 24171  2014-05-16 14:00:44.934 PDT:FATAL:  XX001: invalid page in block
 34666 of relation base/16384/16405
 24171  2014-05-16 14:00:44.934 PDT:CONTEXT:  xlog redo split_l: rel
 1663/16384/16405 left 35191, right 35652, next 34666, level 0, firstright
 192
 24171  2014-05-16 14:00:44.934 PDT:LOCATION:  ReadBuffer_common,
 bufmgr.c:483


 I've seen this twice now; the checksum failure was both times for the
 block labelled "next" in the redo record.  Is this another case where
 the block needs to be reinitialized upon replay?


 Hmm, it looks like I fumbled the numbering of the backup blocks in the
 b-tree split WAL record (in 9.4). I blame the comments; the comments where
 the record is generated number the backup blocks starting from 1, but
 XLR_BKP_BLOCK(x) and RestoreBackupBlock(...) used in replay number them
 starting from 0.

 Attached is a patch that I think fixes them. In addition to the
 rnext reference, clearing the incomplete-split flag in the child page had
 a similar numbering mishap.


That seems to have fixed it.

Thanks,

Jeff


Re: [HACKERS] 9.4 beta1 crash on Debian sid/i386

2014-05-17 Thread Tom Lane
Christoph Berg c...@df7cb.de writes:
 Re: Tom Lane 2014-05-14 1357.1400028...@sss.pgh.pa.us
 It would appear that something is wrong with check_stack_depth(),
 and/or getrlimit(RLIMIT_STACK) is lying to us about the available stack.

 ulimit -s is 8192 (kB); max_stack_depth is 2MB.

 check_stack_depth looks right, max_stack_depth_bytes there is 2097152
 and I can see stack_base_ptr - stack_top_loc grow over repeated
 invocations of the function (stack_depth itself is optimized out).
 Still, it never enters "if (stack_depth > max_stack_depth_bytes ...)".

Hm.  Did you check that stack_base_ptr is non-NULL?  If it were somehow
not getting set, that would disable the error report.  But on most
architectures that would also result in silly values for the pointer
difference, so I doubt this is the issue.

 Interestingly, the Debian buildd managed to run the testsuite for
 i386, while I could reproduce the problem on the pgapt build machine
 and on my notebook, so there must be some system difference. Possibly
 the reason is these two machines are running a 64bit kernel and I'm
 building in a 32bit chroot, though that hasn't been a problem before.

I'm suspicious that something has changed in your build environment,
because that stack-checking logic hasn't changed since these commits:

Author: Heikki Linnakangas heikki.linnakan...@iki.fi
Branch: master Release: REL9_2_BR [ef3883d13] 2012-04-08 19:07:55 +0300
Branch: REL9_1_STABLE Release: REL9_1_4 [ef29bb1f7] 2012-04-08 19:08:13 +0300
Branch: REL9_0_STABLE Release: REL9_0_8 [77dc2b0a4] 2012-04-08 19:09:12 +0300
Branch: REL8_4_STABLE Release: REL8_4_12 [89da5dc6d] 2012-04-08 19:09:26 +0300
Branch: REL8_3_STABLE Release: REL8_3_19 [ddeac5dec] 2012-04-08 19:09:37 +0300

Do stack-depth checking in all postmaster children.

We used to only initialize the stack base pointer when starting up a regular
backend, not in other processes. In particular, autovacuum workers can run
arbitrary user code, and without stack-depth checking, infinite recursion
in e.g an index expression will bring down the whole cluster.

The lack of reports from the buildfarm or other users is also evidence
against there being a widespread issue here.

A different thought: I have heard of environments in which the available
stack depth is much less than what ulimit would suggest because the ulimit
space gets split up for multiple per-thread stacks.  That should not be
happening in a Postgres backend, since we don't do threading, but I'm
running out of ideas to investigate ...

regards, tom lane




Re: [HACKERS] uuid-ossp (Re: [pgsql-packagers] Postgresapp 9.4 beta build ready)

2014-05-17 Thread Tom Lane
Christoph Berg c...@df7cb.de writes:
 [redirecting to -hackers]
 Re: Tom Lane 2014-05-15 31008.1400180...@sss.pgh.pa.us
 Sandeep Thakkar sandeep.thak...@enterprisedb.com writes:
 Yes, Jakob is right. On 9.4, we had to patch configure script along with
 uuid-ossp.c to resolve the uuid issue.

 I think we need to discuss this on the open pgsql-hackers list.
 It seems like we have to either 
 
 (1) hack both configure and uuid-ossp.c to work around the issue
 
 (2) do something more radical, like adopt Palle's patch to use the
 BSD uuid functions.
 
 Now, (2) sounds a bit scary for a post-beta1 change, but on the other hand
 we know that ossp-uuid is a train wreck in progress.  Maybe it's time
 to do something about it.  But that's got to be debated on -hackers not
 here.

 Fwiw, libossp-uuid is the only non-trivial dependency of
 postgresql*.{deb,rpm} - I've been to trainings at customer sites more
 than once where we just had a preinstalled Linux machine with no
 internet and a set of PostgreSQL packages that needed dpkg
 --force-depends or rpm --nodeps to install just because the -contrib
 package needs that lib.
 So if we got rid of that dependency, life might become a bit easier
 also in that setting.

I was intending to draft something more self-contained to present to
-hackers, but since this is where we're at ... the quoted material
omits a couple of important points:

(1) People had been working around uuid-ossp's issues on OS X by
patching a #define _XOPEN_SOURCE into contrib/uuid-ossp/uuid-ossp.c.
As of 9.4 this is no longer sufficient because we upgraded to autoconf
2.69, which uses stricter testing for include-file validity than older
versions did.  The test to see if uuid.h is available therefore fails,
unless that #define is also patched into the configure script.  In any
case #define _XOPEN_SOURCE has enough potential implications that
this doesn't seem like a very recommendable solution.

(2) Palle's patch mentioned above can be seen here:
http://svnweb.freebsd.org/ports/head/databases/postgresql93-server/files/patch-contrib-uuid?revision=348732&view=markup
Basically it jacks up contrib/uuid-ossp and rolls the BSD uuid functions
underneath.  It looks like this is based on work by Andrew Gierth.

So, having seen that proof-of-concept, I'm wondering if we shouldn't make
an effort to support contrib/uuid-ossp with a choice of UUID libraries
underneath it.  There is a non-OSSP set of UUID library functions
available on Linux (libuuid from util-linux-ng).  I don't know whether
that's at all compatible with the BSD functions, but even if it's not,
presumably a shim for it wouldn't be much larger than the BSD patch.

A bigger question is whether there are any user-visible functional
incompatibilities introduced by depending on either of those libraries
instead of OSSP uuid.  Some research would be needed.

It's not exactly appetizing to consider fooling around with such changes
post-beta1.  But on the other hand, OSSP uuid *is* a train wreck in
progress, and remaining dependent on it for another release cycle is not
exactly appetizing either.

Comments?  Is anybody feeling motivated to do the legwork on this?

regards, tom lane




Re: [HACKERS] pgbench is broken on strict-C89 compilers

2014-05-17 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-05-17 19:15:15 -0400, Tom Lane wrote:
 ... It appears to me that
 the compiler is within its rights to refuse a nonconstant expression
 for an inner initializer according to C89, though I don't see any such
 restriction in C99.

 Yea, I've complained about it in
 http://www.postgresql.org/message-id/20140403152834.gg17...@awork2.anarazel.de

Ah.  I'd sort of ignored that patch because it didn't seem too relevant
to the issues we were discussing at the time.

 That piece of code is also confused about using static vs. const. For a lot
 longer though...

Well, static is also a good thing here because it eliminates the need
for runtime initialization of a function-local array variable.  But yeah,
the code is way under-const-ified as well.

 I'd just duplicated the ddl structs. Seemed to be the least ugly thing I
 could come up with. Far from pretty though.

It works, anyway.  If I don't think of something better, I'll do a bit
more polishing and commit that tomorrow.

regards, tom lane




Re: [HACKERS] Possible fix for occasional failures on castoroides etc

2014-05-17 Thread Tom Lane
Dave Page dp...@pgadmin.org writes:
 On Sat, May 3, 2014 at 8:29 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2012-09-17 08:23:01 -0400, Dave Page wrote:
 I've added MAX_CONNECTIONS=5 to both Castoroides and Protosciurus.

 I've just noticed (while checking whether backporting 4c8aa8b5aea caused
 problems) that this doesn't seem to have fixed the issue. One further
 thing to try would be to try whether tcp connections don't have the same
 problem.

 I've added:
 EXTRA_REGRESS_OPTS = '--host=localhost',
 to the build_env setting for both animals.

According to
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=protosciurus&dt=2014-05-16%2014%3A27%3A58
this did not fix the problem; however, the failure is

! psql: could not connect to server: Connection refused
!   Is the server running locally and accepting
!   connections on Unix domain socket /tmp/.s.PGSQL.57345?

which shows that this configuration change did not actually have the
desired effect of forcing the regression tests to be run across TCP.
I'm too tired to check into what *would* force that.

regards, tom lane




Re: [HACKERS] Possible fix for occasional failures on castoroides etc

2014-05-17 Thread Andres Freund
On 2014-05-18 01:35:04 -0400, Tom Lane wrote:
 Dave Page dp...@pgadmin.org writes:
  On Sat, May 3, 2014 at 8:29 PM, Andres Freund and...@2ndquadrant.com 
  wrote:
  On 2012-09-17 08:23:01 -0400, Dave Page wrote:
  I've added MAX_CONNECTIONS=5 to both Castoroides and Protosciurus.
 
  I've just noticed (while checking whether backporting 4c8aa8b5aea caused
  problems) that this doesn't seem to have fixed the issue. One further
  thing to try would be to try whether tcp connections don't have the same
  problem.
 
  I've added:
  EXTRA_REGRESS_OPTS = '--host=localhost',
  to the build_env setting for both animals.
 
 According to
 http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=protosciurus&dt=2014-05-16%2014%3A27%3A58
 this did not fix the problem; however, the failure is
 
 ! psql: could not connect to server: Connection refused
 ! Is the server running locally and accepting
 ! connections on Unix domain socket /tmp/.s.PGSQL.57345?
 
 which shows that this configuration change did not actually have the
 desired effect of forcing the regression tests to be run across TCP.
 I'm too tired to check into what *would* force that.

I think that's just because EXTRA_REGRESS_OPTS is fairly new
(19fa6161dd6ba85b6c88b3476d165745dd5192d9). No idea if there's a nice
way to pass options to the pg_regress invocations of buildfarm animals.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

