Re: [HACKERS] Securing make check (CVE-2014-0067)

2014-12-25 Thread David Rowley
On 25 December 2014 at 18:27, Noah Misch n...@leadboat.com wrote:

 On Thu, Dec 25, 2014 at 03:55:02PM +1300, David Rowley wrote:
  f6dc6dd seems to have broken vcregress check for me:

  FATAL:  no pg_hba.conf entry for host ::1, user David, database
  postgres
  ...
  FATAL:  no pg_hba.conf entry for host ::1, user David, database
  postgres

 Thanks.  I bet this is the reason buildfarm members hamerkop, jacana and
 bowerbird have not been reporting in.

  @@ -1085,6 +1085,8 @@ config_sspi_auth(const char *pgdata)
  	CW(fputs("# Configuration written by config_sspi_auth()\n", hba) >= 0);
  	CW(fputs("host all all 127.0.0.1/32  sspi include_realm=1 map=regress\n",
  			 hba) >= 0);
  +	CW(fputs("host all all ::1/128  sspi include_realm=1 map=regress\n",
  +			 hba) >= 0);

 This needs to be conditional on whether the platform supports IPv6, like
 we do
 in setup_config().  The attached patch works on these configurations:

 64-bit Windows Server 2003, 32-bit VS2010
 64-bit Windows Server 2003, MinGW (always 32-bit)
 64-bit Windows Server 2008, 64-bit VS2012
 64-bit Windows Server 2008, 64-bit MinGW-w64

 If the patch looks reasonable, I will commit it.


I'm just looking at initdb.c I see that there's this:

#ifdef HAVE_IPV6

/*
 * Probe to see if there is really any platform support for IPv6, and
 * comment out the relevant pg_hba line if not.  This avoids runtime
 * warnings if getaddrinfo doesn't actually cope with IPv6.  Particularly
 * useful on Windows, where executables built on a machine with IPv6 may
 * have to run on a machine without.
 */
The comment does seem to indicate that getaddrinfo might give a warning on
an IPv4-only machine when given an IPv6 address to resolve. I think we
likely want that probe here too, though I don't have an IPv4-only machine to test on.
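
For reference, the probe that follows that comment in setup_config() looks
roughly like this (condensed from initdb.c; the real code also does
WSAStartup() on Windows before calling getaddrinfo()). Presumably
config_sspi_auth() wants the same kind of runtime check before emitting
the ::1 line:

#ifdef HAVE_IPV6
	{
		struct addrinfo *gai_result;
		struct addrinfo hints;

		/* for best results, this should match what parse_hba() does */
		memset(&hints, 0, sizeof(hints));
		hints.ai_flags = AI_NUMERICHOST;
		hints.ai_family = AF_UNSPEC;

		/* if "::1" does not resolve, comment out the IPv6 pg_hba.conf line */
		if (getaddrinfo("::1", NULL, &hints, &gai_result) != 0)
			conflines = replace_token(conflines,
									  "host    all             all             ::1",
									  "#host    all             all             ::1");
	}
#endif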

I'll test the patch with IPv6 disabled and see if I get a warning...

OK, it seems to still write the IPv6 entry into pg_hba.conf with IPv6
disabled, so perhaps disabling IPv6 is not sufficient; maybe it needs to be
tested on a machine that does not support IPv6 at all.

Regards

David Rowley


Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2014-12-25 Thread Abhijit Menon-Sen
At 2014-12-22 08:05:57 -0500, robertmh...@gmail.com wrote:

 On Tue, Dec 16, 2014 at 1:28 PM, Stephen Frost sfr...@snowman.net wrote:
  … ok, does the audit role have any permissions here? and, if the
  result is yes, then the command is audited. …
 
 This is a little weird because you're effectively granting an
 anti-permission.

Yes, it's a very clever solution, but also pretty weird. I think that's
why I didn't understand it. I've been looking into writing the code, but
I haven't quite gotten over the weirdness yet.

 I'm not sure whether that ought to be regarded as a serious problem,
 but it's a little surprising.

I'm not sure either.

Stephen likes the idea, obviously; Simon also said he liked it, but I
now wonder if he may have liked the part I implemented (which allows a
hot standby to have a different auditing configuration than the primary)
but not have fully taken in the remainder of the proposal.

Before I go much further, how do others feel about it?

To summarise for people who haven't followed the thread in detail, the
idea is that you would do:

grant select on foo to audit;

…and the server would audit-log any select … from foo … queries (by
any user). One immediate consequence is that only things you could grant
permissions for could be audited (by this mechanism), but I guess that's
a problem only in the short term. Another consequence is that you can't
audit selects on foo only by role x and selects on bar only by role y.
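
On the implementation side the test itself is cheap; a minimal sketch of
what it could look like in C, using the existing ACL primitives
(should_audit_select() and audit_role_oid are invented names for
illustration):

/*
 * Hypothetical sketch: decide whether a SELECT on "relid" should be
 * audit-logged, by testing whether the designated audit role holds
 * SELECT privilege on the relation.  The audit role itself is an
 * ordinary role; the "anti-permission" is just a regular grant.
 */
static bool
should_audit_select(Oid relid, Oid audit_role_oid)
{
	if (!OidIsValid(audit_role_oid))
		return false;			/* no audit role configured */

	return pg_class_aclcheck(relid, audit_role_oid, ACL_SELECT) == ACLCHECK_OK;
}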

 Also, what makes the audit role magical?

I think it's because it exists only to receive these negative grants,
there's no other magic involved. Stephen also said «Note that this role,
from core PG's perspective, wouldn't be special at all».

-- Abhijit




Re: [HACKERS] Moving RestoreBlockImage from xlogreader.c to xlogutils.c

2014-12-25 Thread Andres Freund
On 2014-12-25 08:52:05 +0900, Michael Paquier wrote:
 On Wed, Dec 24, 2014 at 10:51 PM, Fujii Masao masao.fu...@gmail.com wrote:
  Fair enough. Anyway, I'll wait to apply the patch which moves
  pg_lzcompress.c until we have reached a consensus about this.
 Just to be clear (after sleeping on it), we still need the pglz stuff in
 src/common to offer frontends the possibility of uncompressing block
 data. My point is simply that the xlog reader facility should only
 provide enough data to operate on the blocks, not APIs that operate on
 them directly. So ISTM that you could still push the patch that puts
 pglz in the common library to clear the way, and let's use this thread
 to discuss whether we want the API to rebuild blocks in the reader
 facility or not.

I think it's a bad idea to move it away - the current placement provides
an API that allows getting at the image data without having to deal with
the low-level details, e.g. the in_use, has_image and hole
logic. *Especially* once we add compression that's quite a useful
abstraction. Having it in place allows us to restructure internals
without disturbing clients - and I don't see any price in this case.
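
To illustrate: with the current placement a client that wants the block
contents needs nothing more than something like the following
(RestoreBlockImage() and XLogRecHasBlockImage() are the existing
xlogreader entry points; the error message is just illustrative):

	char		page[BLCKSZ];

	/*
	 * The caller never touches in_use/has_image/hole_offset directly;
	 * RestoreBlockImage() reassembles the full page, filling the "hole"
	 * with zeros - and, once compression lands, decompressing as needed.
	 */
	if (XLogRecHasBlockImage(record, block_id))
	{
		if (!RestoreBlockImage(record, block_id, page))
			elog(ERROR, "failed to restore block image");
	}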

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: hash_create API changes (was Re: [HACKERS] speedup tidbitmap patch: hash BlockNumber)

2014-12-25 Thread Andres Freund
On 2014-12-24 13:58:41 -0600, Jim Nasby wrote:
 This surprises me given that SMHasher shows XXH to be 50% faster than Spooky 
 for 20 byte keys.

Note that there are quite some differences in how smhasher and postgres
use the hash functions. The former almost exclusively exercises the hash
function, while postgres also executes a lot of other code, which means
that the instruction cache and branch predictor are much less likely to
contain relevant entries. That can change the performance
characteristics noticeably. Additionally, there are differences in how
much pipelining can be employed - the likelihood that the CPU can do so
in tight loops is much higher.

Also note that in many workloads where you see tag_hash/hash_any, a good
part of the cost won't actually be the computation of the hashes
themselves but the cache misses triggered by the hash computation
accessing the data.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Moving RestoreBlockImage from xlogreader.c to xlogutils.c

2014-12-25 Thread Michael Paquier
On Thu, Dec 25, 2014 at 7:48 PM, Andres Freund and...@2ndquadrant.com wrote:
 I think it's a bad idea to move it away - the current placement provides
 an API that allows getting at the image data without having to deal with
 the low-level details, e.g. the in_use, has_image and hole
 logic. *Especially* once we add compression that's quite a useful
 abstraction. Having it in place allows us to restructure internals
 without disturbing clients - and I don't see any price in this case.

There is no price as long as we keep the compression algorithm fixed
in core, but I do foresee one regarding the pluggability of the
decompression API, given that RestoreBlockImage is the natural place
where block decompression should occur and that it is now shared
between frontend and backend layers. I am digressing here, but what I
had in mind was to add a hook exactly there to allow users to plug in
a custom compression algorithm, something that could be useful for us
and for our users in terms of flexibility for WAL compression,
particularly for algorithms that are not compatible with the Postgres
license.
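
To be concrete, what I have in mind is a purely hypothetical sketch along
the lines of the usual backend hook convention (no such hook exists today;
all names below are invented):

/* Hypothetical pluggable WAL block (de)compression hooks. */
typedef int32 (*wal_compress_hook_type) (const char *source, int32 slen,
										 char *dest, int32 destlen);
typedef int32 (*wal_decompress_hook_type) (const char *source, int32 slen,
										   char *dest, int32 rawsize);

extern PGDLLIMPORT wal_compress_hook_type wal_compress_hook;
extern PGDLLIMPORT wal_decompress_hook_type wal_decompress_hook;

A loadable module would set these from its _PG_init(), which is where the
frontend/backend sharing becomes awkward: frontend tools like pg_xlogdump
cannot load backend modules.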
-- 
Michael




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-25 Thread Michael Paquier
On Wed, Dec 24, 2014 at 9:03 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 Returning only a boolean is fine for me (that's what my first patch
 did), especially if we add at some point hooks for compression and
 decompression calls.
Here is a patch rebased on current HEAD (60838df) for the core feature
with the APIs of pglz using booleans as return values.
-- 
Michael
From c8891e6086682d4e5bc197ef3047068b3a3875c5 Mon Sep 17 00:00:00 2001
From: Michael Paquier michael@otacoo.com
Date: Tue, 25 Nov 2014 14:24:26 +0900
Subject: [PATCH] Support compression for full-page writes in WAL

Compression is controlled with a new parameter called wal_compression.
This parameter can be changed at session level to control WAL compression.
---
 contrib/pg_xlogdump/pg_xlogdump.c |  20 ++--
 doc/src/sgml/config.sgml  |  29 +
 src/backend/access/transam/xlog.c |   1 +
 src/backend/access/transam/xloginsert.c   | 155 ++
 src/backend/access/transam/xlogreader.c   |  77 ++---
 src/backend/utils/misc/guc.c  |   9 ++
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/access/xlog.h |   1 +
 src/include/access/xlogreader.h   |   7 +-
 src/include/access/xlogrecord.h   |  36 --
 src/include/pg_config.h.in|   4 +-
 11 files changed, 283 insertions(+), 57 deletions(-)

diff --git a/contrib/pg_xlogdump/pg_xlogdump.c b/contrib/pg_xlogdump/pg_xlogdump.c
index 9f05e25..3ba1d8f 100644
--- a/contrib/pg_xlogdump/pg_xlogdump.c
+++ b/contrib/pg_xlogdump/pg_xlogdump.c
@@ -363,15 +363,12 @@ XLogDumpCountRecord(XLogDumpConfig *config, XLogDumpStats *stats,
 	 * takes up BLCKSZ bytes, minus the hole length.
 	 *
 	 * XXX: We peek into xlogreader's private decoded backup blocks for the
-	 * hole_length. It doesn't seem worth it to add an accessor macro for
+	 * length of block. It doesn't seem worth it to add an accessor macro for
 	 * this.
 	 */
 	fpi_len = 0;
 	for (block_id = 0; block_id <= record->max_block_id; block_id++)
-	{
-		if (XLogRecHasBlockImage(record, block_id))
-			fpi_len += BLCKSZ - record->blocks[block_id].hole_length;
-	}
+		fpi_len += record->blocks[block_id].bkp_len;
 
 	/* Update per-rmgr statistics */
 
@@ -465,9 +462,16 @@ XLogDumpDisplayRecord(XLogDumpConfig *config, XLogReaderState *record)
 					   blk);
 			if (XLogRecHasBlockImage(record, block_id))
 			{
-				printf(" (FPW); hole: offset: %u, length: %u\n",
-					   record->blocks[block_id].hole_offset,
-					   record->blocks[block_id].hole_length);
+				if (record->blocks[block_id].is_compressed)
+					printf(" (FPW compressed); hole offset: %u, "
+						   "compressed length: %u, original length: %u\n",
+						   record->blocks[block_id].hole_offset,
+						   record->blocks[block_id].bkp_len,
+						   record->blocks[block_id].bkp_uncompress_len);
+				else
+					printf(" (FPW); hole offset: %u, length: %u\n",
+						   record->blocks[block_id].hole_offset,
+						   record->blocks[block_id].bkp_len);
 			}
 			putchar('\n');
 		}
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 6bcb106..acbbd20 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2282,6 +2282,35 @@ include_dir 'conf.d'
   </listitem>
  </varlistentry>
 
+ <varlistentry id="guc-wal-compression" xreflabel="wal_compression">
+  <term><varname>wal_compression</varname> (<type>boolean</type>)
+  <indexterm>
+   <primary><varname>wal_compression</> configuration parameter</primary>
+  </indexterm>
+  </term>
+  <listitem>
+   <para>
+    When this parameter is <literal>on</>, the <productname>PostgreSQL</>
+    server compresses the content of full-page writes when necessary and
+    inserts in WAL records with smaller sizes, reducing the amount of
+    WAL stored on disk.
+   </para>
+
+   <para>
+    Compression has the advantage of reducing the amount of disk I/O when
+    doing WAL-logging, at the cost of some extra CPU to perform the
+    compression of a block image.  At WAL replay, compressed block images
+    need extra CPU cycles to perform the decompression of each block
+    image, but it can reduce as well replay time in I/O bounded
+    environments.
+   </para>
+
+   <para>
+    The default value is <literal>off</>.
+   </para>
+  </listitem>
+ </varlistentry>
+
  <varlistentry id="guc-wal-buffers" xreflabel="wal_buffers">
   <term><varname>wal_buffers</varname> (<type>integer</type>)
   <indexterm>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e54..d68d9e3 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -88,6 +88,7 @@ char	   *XLogArchiveCommand = NULL;
 bool		EnableHotStandby = false;
 bool		fullPageWrites = true;
 bool		wal_log_hints = false;
+bool		wal_compression = false;
 bool		log_checkpoints = false;
 int			sync_method = 

Re: [HACKERS] Moving RestoreBlockImage from xlogreader.c to xlogutils.c

2014-12-25 Thread Andres Freund
On 2014-12-25 21:12:54 +0900, Michael Paquier wrote:
 On Thu, Dec 25, 2014 at 7:48 PM, Andres Freund and...@2ndquadrant.com wrote:
  I think it's a bad idea to move it away - the current placement provides
  an API that allows getting at the image data without having to deal with
  the low-level details, e.g. the in_use, has_image and hole
  logic. *Especially* once we add compression that's quite a useful
  abstraction. Having it in place allows us to restructure internals
  without disturbing clients - and I don't see any price in this case.
 
 There is no price as long as we keep the compression algorithm fixed
 in core, but I do foresee one regarding the pluggability of the
 decompression API, given that RestoreBlockImage is the natural place
 where block decompression should occur and that it is now shared
 between frontend and backend layers.
 I am digressing here, but what I had in mind was to add a hook exactly
 there to allow users to plug in a custom compression algorithm,
 something that could be useful for us and for our users in terms of
 flexibility for WAL compression, particularly for algorithms that
 are not compatible with the Postgres license.  -- Michael

I personally think that's a bad idea and we shouldn't provide
functionality for that. But even if we added it, something that provides
the decompression for frontend code seems critical.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2014-12-25 Thread Jaime Casanova
On Thu, Dec 25, 2014 at 5:42 AM, Abhijit Menon-Sen a...@2ndquadrant.com wrote:

 To summarise for people who haven't followed the thread in detail, the
 idea is that you would do:

 grant select on foo to audit;

 ...and the server would audit-log any select ... from foo ... queries (by
 any user).

What if I want to audit different things for different users?

-- 
Jaime Casanova www.2ndQuadrant.com
Professional PostgreSQL: 24x7 support and training
Phone: +593 4 5107566 Cell: +593 987171157




[HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2014-12-25 Thread Andres Freund
Hi,

We quite regularly have buildfarm failures that are caused by 'WARNING:
pgstat wait timeout' at random points during the build. Especially on
some of the ARM buildfarm animals those are really frequent, to the
point that it's hard to know the actual state of the buildfarm without
looking at several animals. The ARM ones are probably affected more
frequently because they'll often run on sd cards and such.

So I think a better way to deal with that warning would be a good
idea. Besides somehow fixing the mechanism itself, there are two ways to
attack this that I can think of, neither of them awe-inspiring:

1) Make that WARNING a LOG message instead. Since those don't get sent
to the client with default settings...
2) Increase PGSTAT_MAX_WAIT_TIME even further than what 99b545 increased
it to.

Does anybody have an actually good idea?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2014-12-25 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 So I think a better way to deal with that warning would be a good
 idea. Besides somehow fixing the mechanism itself, there are two ways to
 attack this that I can think of, neither of them awe-inspiring:

 1) Make that WARNING a LOG message instead. Since those don't get sent
 to the client with default settings...
 2) Increase PGSTAT_MAX_WAIT_TIME even further than what 99b545 increased
 it to.

Yeah, I've been getting more annoyed by that too lately.  I keep wondering
though whether there's an actual bug underneath that behavior that we're
failing to see.  PGSTAT_MAX_WAIT_TIME is already 10 seconds; it's hard to
credit that increasing it still further would be fixing anything.
The other change would also mainly just sweep the issue under the rug,
if there is any issue and it's not just that we're overloading
underpowered buildfarm machines.  (Maybe a better fix would be to reduce
MAX_CONNECTIONS for the tests on these machines?)

I wonder whether when multiple processes are demanding statsfile updates,
there's some misbehavior that causes them to suck CPU away from the stats
collector and/or convince it that it doesn't need to write anything.
There are odd things in the logs in some of these events.  For example in
today's failure on hamster,
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster&dt=2014-12-25%2016%3A00%3A07
there are two client-visible wait-timeout warnings, one each in the
gist and spgist tests.  But in the postmaster log we find these in
fairly close succession:

[549c38ba.724d:2] WARNING:  pgstat wait timeout
[549c39b1.73e7:10] WARNING:  pgstat wait timeout
[549c38ba.724d:3] WARNING:  pgstat wait timeout

Correlating these with other log entries shows that the first and third
are from the autovacuum launcher while the second is from the gist test
session.  So the spgist failure failed to get logged, and in any case the
big picture is that we had four timeout warnings occurring in a pretty
short span of time, in a parallel test set that's not all that demanding
(12 parallel tests, well below our max).  Not sure what to make of that.

BTW, I notice that in the current state of pgstat.c, all the logic for
keeping track of request arrival times is dead code, because nothing is
actually looking at DBWriteRequest.request_time.  This makes me think that
somebody simplified away some logic we maybe should have kept.  But if
we're going to leave it like this, we could replace the DBWriteRequest
data structure with a simple OID list and save a fair amount of code.
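
For reference, the structure in question looks roughly like this in
pgstat.c (9.4-era); with request_time never read, only databaseid is
doing any work:

typedef struct DBWriteRequest
{
	Oid			databaseid;		/* OID of the database to write */
	TimestampTz request_time;	/* timestamp of the last write request */
	slist_node	next;			/* linked-list link */
} DBWriteRequest;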

regards, tom lane




Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2014-12-25 Thread Andres Freund
On 2014-12-25 14:36:42 -0500, Tom Lane wrote:
 I wonder whether when multiple processes are demanding statsfile updates,
 there's some misbehavior that causes them to suck CPU away from the stats
 collector and/or convince it that it doesn't need to write anything.
 There are odd things in the logs in some of these events.  For example in
 today's failure on hamster,
 http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster&dt=2014-12-25%2016%3A00%3A07
 there are two client-visible wait-timeout warnings, one each in the
 gist and spgist tests.  But in the postmaster log we find these in
 fairly close succession:
 
 [549c38ba.724d:2] WARNING:  pgstat wait timeout
 [549c39b1.73e7:10] WARNING:  pgstat wait timeout
 [549c38ba.724d:3] WARNING:  pgstat wait timeout
 
 Correlating these with other log entries shows that the first and third
 are from the autovacuum launcher while the second is from the gist test
 session.  So the spgist failure failed to get logged, and in any case the
 big picture is that we had four timeout warnings occurring in a pretty
 short span of time, in a parallel test set that's not all that demanding
 (12 parallel tests, well below our max).  Not sure what to make of that.

My guess is that a checkpoint happened at that time. Maybe it'd be a
good idea to make pg_regress start postgres with log_checkpoints
enabled? My guess is that we'd find horrendous 'sync' times.

Michael: Could you perhaps turn log_checkpoints on in the config?

 BTW, I notice that in the current state of pgstat.c, all the logic for
 keeping track of request arrival times is dead code, because nothing is
 actually looking at DBWriteRequest.request_time.  This makes me think that
 somebody simplified away some logic we maybe should have kept.  But if
 we're going to leave it like this, we could replace the DBWriteRequest
 data structure with a simple OID list and save a fair amount of code.

That's indeed odd. Seems to have been lost when the statsfile was split
into multiple files. Alvaro, Tomas?

I wondered for a second whether the split could be responsible somehow,
but there's reports of that in older backbranches as well:
http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=mereswine&dt=2014-12-23%2019%3A17%3A41

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Some other odd buildfarm failures

2014-12-25 Thread Tom Lane
These two recent failures look suspiciously similar:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jaguarundi&dt=2014-12-24%2021%3A03%3A05
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kouprey&dt=2014-12-25%2018%3A43%3A17

to wit:

*** /usr/home/pgbf/buildenv/HEAD/pgsql.build/src/test/regress/expected/brin.out 
Wed Dec 24 22:03:07 2014
--- /usr/home/pgbf/buildenv/HEAD/pgsql.build/src/test/regress/results/brin.out  
Wed Dec 24 22:54:26 2014
***
*** 146,151 
--- 146,154 
  end loop;
  end;
  $x$;
+ ERROR:  cache lookup failed for function 30281
+ CONTEXT:  SQL statement "create temp table qry_2_ss (tid tid) ON COMMIT DROP"
+ PL/pgSQL function inline_code_block line 26 at EXECUTE statement
  INSERT INTO brintest SELECT
repeat(stringu1, 42)::bytea,
substr(stringu1, 1, 1)::char,


*** 
/home/markwkm/buildroot/HEAD/pgsql.24814/src/test/regress/expected/matview.out  
Thu Dec 25 18:43:31 2014
--- 
/home/markwkm/buildroot/HEAD/pgsql.24814/src/test/regress/results/matview.out   
Thu Dec 25 18:45:25 2014
***
*** 90,97 
--- 90,102 
  
  CREATE MATERIALIZED VIEW tvvm AS SELECT * FROM tvv;
  CREATE VIEW tvvmv AS SELECT * FROM tvvm;
+ ERROR:  cache lookup failed for function 30311
  CREATE MATERIALIZED VIEW bb AS SELECT * FROM tvvmv;


When I saw jaguarundi's failure yesterday, I figured it was something to do
with CLOBBER_CACHE_ALWAYS ... but kouprey isn't using that option AFAICS,
so that idea fails to hold water.

I don't believe that the referenced function OIDs would ever actually
exist in the database in the current regression tests; and the two failing
statements have no reason to be accessing any user-defined functions
anyway.  So those OIDs are probably bogus.  It seems likely that something
is clobbering storage that is later expected to hold an OID.  Whatever's
going on, it's likely that this is a recently-introduced bug, because
I don't recall seeing reports like these before.

regards, tom lane




Re: [HACKERS] Some other odd buildfarm failures

2014-12-25 Thread Tom Lane
I wrote:
 These two recent failures look suspiciously similar:
 http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jaguarundi&dt=2014-12-24%2021%3A03%3A05
 http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kouprey&dt=2014-12-25%2018%3A43%3A17

And I'd barely finished posting that before this one arrived:

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=shearwater&dt=2014-12-25%2020%3A24%3A32

*** 
/home/buildfarm/build_root/HEAD/pgsql.32096/src/test/regress/expected/privileges.out
Fri Dec 26 00:24:38 2014
--- 
/home/buildfarm/build_root/HEAD/pgsql.32096/src/test/regress/results/privileges.out
 Fri Dec 26 00:25:54 2014
***
*** 197,202 
--- 197,203 
  CREATE VIEW atestv1 AS SELECT * FROM atest1; -- ok
  /* The next *should* fail, but it's not implemented that way yet. */
  CREATE VIEW atestv2 AS SELECT * FROM atest2;
+ ERROR:  cache lookup failed for function 30274
  CREATE VIEW atestv3 AS SELECT * FROM atest3; -- ok
  /* Empty view is a corner case that failed in 9.2. */
  CREATE VIEW atestv0 AS SELECT 0 as x WHERE false; -- ok

I find it suspicious that all three examples are in the same group of
parallel tests.  A possible theory is that one of these tests:

test: brin gin gist spgist privileges security_label collate matview lock 
replica_identity rowsecurity object_address

is doing something that has bad side-effects on concurrent sessions.

In any case, it now seems dead certain that this is a recently introduced
bug.  Andres is fortunate that the first instance occurred before his
recent batch of commits, or I'd be on him to revert them.  As is, though,
I'm wondering if 37de8de9e33606a043e98fee64b5595aedaa8254 could possibly
be related.

regards, tom lane




Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2014-12-25 Thread Tomas Vondra
On 25.12.2014 20:36, Tom Lane wrote:

 Yeah, I've been getting more annoyed by that too lately. I keep
 wondering though whether there's an actual bug underneath that
 behavior that we're failing to see. PGSTAT_MAX_WAIT_TIME is already
 10 seconds; it's hard to credit that increasing it still further
 would be fixing anything. The other change would also mainly just
 sweep the issue under the rug, if there is any issue and it's not
 just that we're overloading underpowered buildfarm machines. (Maybe a
 better fix would be to reduce MAX_CONNECTIONS for the tests on these
 machines?)

I agree that increasing the limit further is not a good solution.

 BTW, I notice that in the current state of pgstat.c, all the logic
 for keeping track of request arrival times is dead code, because
 nothing is actually looking at DBWriteRequest.request_time. This
 makes me think that somebody simplified away some logic we maybe
 should have kept. But if we're going to leave it like this, we could
 replace the DBWriteRequest data structure with a simple OID list and
 save a fair amount of code.

Really? Which part of the code is dead? I see pgstat_recv_inquiry() is
updating the request_time after receiving the inquiry, and
pgstat_db_requested() is looking at it when writing the files.

If we can simplify the code by keeping just OIDs, let's do that. I think
the main reason why we haven't done that in 187492b6 was to keep as much
of the existing logic (and maybe change it in a separate patch).

regards
Tomas




Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2014-12-25 Thread Tom Lane
Tomas Vondra t...@fuzzy.cz writes:
 On 25.12.2014 20:36, Tom Lane wrote:
 BTW, I notice that in the current state of pgstat.c, all the logic
 for keeping track of request arrival times is dead code, because
 nothing is actually looking at DBWriteRequest.request_time.

 Really? Which part of the code is dead? I see pgstat_recv_inquiry() is
 updating the request_time after receiving the inquiry, and
 pgstat_db_requested() is looking at it when writing the files.

Where is pgstat_db_requested() looking at request_time?

 If we can simplify the code by keeping just OIDs, let's do that. I think
 the main reason why we haven't done that in 187492b6 was to keep as much
 of the existing logic (and maybe change it in a separate patch).

The real point here is that I think that commit *already* changed the
existing logic, because the time-of-receipt used to matter.  In
particular, there used to be a throttle on how often the stats file
could get written, which seems to have vanished.  I seriously doubt
that that was a good change, especially on write-bandwidth-challenged
platforms.

regards, tom lane




[HACKERS] POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2014-12-25 Thread Alex Shulgin
Trent Shipley trent_ship...@qwest.net writes:

 On Friday 2007-12-14 16:22, Tom Lane wrote:
 Neil Conway ne...@samurai.com writes:
  By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY
  to drop (and log) rows that contain malformed data. That is, rows with
  too many or too few columns, rows that result in constraint violations,
  and rows containing columns where the data type's input function raises
  an error. The last case is the only thing that would be a bit tricky to
  implement, I think: you could use PG_TRY() around the InputFunctionCall,
  but I guess you'd need a subtransaction to ensure that you reset your
  state correctly after catching an error.

 Yeah.  It's the subtransaction per row that's daunting --- not only the
 cycles spent for that, but the ensuing limitation to 4G rows imported
 per COPY.

 You could extend the COPY FROM syntax with a COMMIT EVERY n clause.  This 
 would help with the 4G subtransaction limit.  The cost to the ETL process is 
 that a simple rollback would not be guaranteed to send the process back to its
 initial state.  There are easy ways to deal with the rollback issue though.

 A {NO} RETRY {USING algorithm} clause might be useful.   If the NO RETRY 
 option is selected then the COPY FROM can run without subtransactions and in 
 excess of the 4G per transaction limit.  NO RETRY should be the default since 
 it preserves the legacy behavior of COPY FROM.

 You could have an EXCEPTIONS TO {filename|STDERR} clause. I would not give 
 the 
 option of sending exceptions to a table since they are presumably malformed, 
 otherwise they would not be exceptions.  (Users should re-process exception 
 files if they want an "if good then table a, else exception to table b" ...)

 EXCEPTIONS TO and NO RETRY would be mutually exclusive.


 If we could somehow only do a subtransaction per failure, things would
 be much better, but I don't see how.

Hello,

Attached is a proof of concept patch for this TODO item.  There are no
docs yet; I just wanted to know if the approach is sane.

The added syntax is like the following:

  COPY [table] FROM [file/program/stdin] EXCEPTIONS TO [file or stdout]

The way it's done is by abusing Copy Both mode, and from my limited
testing that seems to just work.  The error trapping itself is done
using PG_TRY/PG_CATCH and can only catch formatting or before-insert
trigger errors; no attempt is made to recover from a failed unique
constraint, etc.
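
The trapping pattern at its core is roughly the following (a condensed
sketch rather than the patch verbatim; the real CopyFrom loop has more
state to reset):

	MemoryContext oldcontext = CurrentMemoryContext;
	bool		ok;

	PG_TRY();
	{
		/* run the column type's input function on the raw field */
		values[fieldno] = InputFunctionCall(&in_functions[fieldno], string,
											typioparams[fieldno], typmod);
		ok = true;
	}
	PG_CATCH();
	{
		ErrorData  *edata;

		/* capture the error, then reset state so the loop can continue */
		MemoryContextSwitchTo(oldcontext);
		edata = CopyErrorData();
		FlushErrorState();

		ereport(NOTICE, (errmsg("%s", edata->message)));
		FreeErrorData(edata);
		ok = false;			/* caller sends the raw line to EXCEPTIONS TO */
	}
	PG_END_TRY();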

Example in action:

postgres=# \d test_copy2
  Table "public.test_copy2"
 Column |  Type   | Modifiers 
--------+---------+-----------
 id     | integer | 
 val    | integer | 

postgres=# copy test_copy2 from program 'seq 3' exceptions to stdout;
1
NOTICE:  missing data for column "val"
CONTEXT:  COPY test_copy2, line 1: "1"
2
NOTICE:  missing data for column "val"
CONTEXT:  COPY test_copy2, line 2: "2"
3
NOTICE:  missing data for column "val"
CONTEXT:  COPY test_copy2, line 3: "3"
NOTICE:  total exceptions ignored: 3

postgres=# \d test_copy1
  Table "public.test_copy1"
 Column |  Type   | Modifiers 
--------+---------+-----------
 id     | integer | not null

postgres=# set client_min_messages to warning;
SET
postgres=# copy test_copy1 from program 'ls /proc' exceptions to stdout;
...
vmstat
zoneinfo
postgres=# 

Limited performance testing shows no significant difference between the
error-catching and plain code paths.  For example, timing

  copy test_copy1 from program 'seq 100' [exceptions to stdout]

shows similar numbers with or without the added exceptions to clause.

Now that I'm sending this I wonder if the original comment about the
need for a subtransaction around every loaded line still holds.  Any
example of what would not be properly rolled back by just PG_TRY?

Happy hacking!
--
Alex

From 50f7ab0a503a0d61776add8a138abf2622fc6c35 Mon Sep 17 00:00:00 2001
From: Alex Shulgin a...@commandprompt.com
Date: Fri, 19 Dec 2014 18:21:31 +0300
Subject: [PATCH] POC: COPY FROM ... EXCEPTIONS TO

---
 contrib/file_fdw/file_fdw.c |   4 +-
 src/backend/commands/copy.c | 251 +---
 src/backend/parser/gram.y   |  26 +++-
 src/bin/psql/common.c   |  14 +-
 src/bin/psql/copy.c | 119 ++-
 src/bin/psql/settings.h |   1 +
 src/bin/psql/startup.c  |   1 +
 src/bin/psql/tab-complete.c |  12 +-
 src/include/commands/copy.h |   3 +-
 src/include/nodes/parsenodes.h  |   1 +
 src/include/parser/kwlist.h |   1 +
 src/interfaces/ecpg/preproc/ecpg.addons |   2 +-
 12 files changed, 396 insertions(+), 39 deletions(-)

diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
new file mode 100644
index 5a4d5aa..0df02f7
*** a/contrib/file_fdw/file_fdw.c
--- b/contrib/file_fdw/file_fdw.c
*** fileBeginForeignScan(ForeignScanState *n
*** 624,629 
--- 624,630 
  	cstate = 

Re: [HACKERS] Proposal: two new role attributes and/or capabilities?

2014-12-25 Thread Robert Haas
On Tue, Dec 23, 2014 at 11:20 AM, José Luis Tallón
jltal...@adv-solutions.net wrote:
 I've found myself needing two role "capabilities" lately, when
 thinking about restricting some roles to the bare minimum
 permissions needed to perform their duties ... as opposed to having a
 superuser role devoted to these tasks.

 The capabilities would be:
 * MAINTENANCE --- Ability to run
 VACUUM [ANALYZE | FREEZE] (but not VACUUM FULL),
 ANALYZE (including SET LOCAL statistics_target TO 1),
 REINDEX CONCURRENTLY  (but not the blocking, regular, one)
 REFRESH MATERIALIZED VIEW CONCURRENTLY (but not the blocking one)
 COPY ???

 Rationale: delegate the routine maintenance tasks to a low-privilege
 role, which can't do harm (apart from some performance degradation) ---
 hence the requirement for no exclusive locking operations.
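
 A role attribute along these lines would presumably be checked the way
 existing ones are; a hypothetical sketch modeled on has_rolreplication()
 (rolmaintenance is an invented pg_authid column):

 /*
  * Hypothetical: report whether "roleid" has the proposed MAINTENANCE
  * attribute, mirroring how has_rolreplication() reads pg_authid.
  */
 static bool
 has_rolmaintenance(Oid roleid)
 {
 	bool		result = false;
 	HeapTuple	utup;

 	utup = SearchSysCache1(AUTHOID, ObjectIdGetDatum(roleid));
 	if (HeapTupleIsValid(utup))
 	{
 		result = ((Form_pg_authid) GETSTRUCT(utup))->rolmaintenance;
 		ReleaseSysCache(utup);
 	}
 	return result;
 }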

I think the problem here is that, while almost everybody would
probably agree that something like this is useful, three hackers in a
room will have four or five different opinions on how to set the
boundaries around it.  I for example wouldn't feel too bad about
grouping VACUUM and ANALYZE under the same umbrella, but certainly
would be surprised to see all of the other stuff included.  But you've
got a different idea that is clearly valid, and somebody else might
want yet another thing.  We can avoid those problems by making the
capabilities finer-grained, but of course then you end up with lots
and lots of them, which is annoying too.

 * IMPERSONATE --- Ability to do SET AUTHORIZATION TO some_role; and RESET
 AUTHORIZATION
 This might be further refined to provide a way to say "this role is
 authorized to impersonate role1 but no other".

I can't see this providing any meaningful security improvement.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2014-12-25 Thread Tomas Vondra
On 25.12.2014 21:14, Andres Freund wrote:
 On 2014-12-25 14:36:42 -0500, Tom Lane wrote:

 My guess is that a checkpoint happened at that time. Maybe it'd be a 
 good idea to make pg_regress start postgres with log_checkpoints 
 enabled? My guess is that we'd find horrendous 'sync' times.
 
 Michael: Could you perhaps turn log_checkpoints on in the config?

Logging timestamps (using log_line_prefix) would also be helpful.

 
 BTW, I notice that in the current state of pgstat.c, all the logic
 for keeping track of request arrival times is dead code, because
 nothing is actually looking at DBWriteRequest.request_time. This
 makes me think that somebody simplified away some logic we maybe
 should have kept. But if we're going to leave it like this, we
 could replace the DBWriteRequest data structure with a simple OID
 list and save a fair amount of code.
 
 That's indeed odd. Seems to have been lost when the statsfile was
 split into multiple files. Alvaro, Tomas?

The goal was to keep the logic as close to the original as possible.
IIRC there were pgstat wait timeout issues before, and in most cases
the conclusion was that it's probably because of overloaded I/O.

But maybe there actually was another bug, and it's entirely possible
that the split introduced a new one, and that's what we're seeing now.
The strange thing is that the split happened ~2 years ago, which is
inconsistent with the sudden increase in this kind of issue. So maybe
something changed on that particular animal (a failing SD card causing
I/O stalls, perhaps)?

Anyway, I happen to have a spare Raspberry PI, so I'll try to reproduce
and analyze the issue locally. But that won't happen until January.

 I wondered for a second whether the split could be responsible
 somehow, but there's reports of that in older backbranches as well:
 http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=mereswine&dt=2014-12-23%2019%3A17%3A41


Tomas




Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2014-12-25 Thread Tom Lane
Tomas Vondra t...@fuzzy.cz writes:
 The strange thing is that the split happened ~2 years ago, which is
 inconsistent with the sudden increase of this kind of issues. So maybe
 something changed on that particular animal (a failing SD card causing
 I/O stalls, perhaps)?

I think that hamster has basically got a tin can and string for an I/O
subsystem.  It's not real clear to me whether there's actually been an
increase in wait timeout failures recently; somebody would have to
go through and count them before I'd have much faith in that thesis.
However, I notice that at least the last few occurrences on hamster
all seem to have been in this parallel block:

test: brin gin gist spgist privileges security_label collate matview lock 
replica_identity rowsecurity object_address

which as recently as 9.4 contained just these tests:

test: privileges security_label collate matview lock replica_identity

I'm fairly sure that the index-related tests in this batch are I/O
intensive, and since they were not there at all six months ago, it's not
hard to believe that this block of tests has far greater I/O demands than
used to exist.  Draw your own conclusions ...

regards, tom lane




Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2014-12-25 Thread Tomas Vondra
On 25.12.2014 22:16, Tom Lane wrote:
 Tomas Vondra t...@fuzzy.cz writes:
 On 25.12.2014 20:36, Tom Lane wrote:
 BTW, I notice that in the current state of pgstat.c, all the logic
 for keeping track of request arrival times is dead code, because
 nothing is actually looking at DBWriteRequest.request_time.
 
 Really? Which part of the code is dead? I see pgstat_recv_inquiry() is
 updating the request_time after receiving the inquiry, and
 pgstat_db_requested() is looking at it when writing the files.
 
 Where is pgstat_db_requested() looking at request_time?

Oh, right. pgstat_db_requested() is not looking at the timestamp - it
only checks the OID of the database. But pgstat_recv_inquiry() is
checking it, comparing it to cutoff_time etc.

ISTM the main change related to this is that this:

if (last_statwrite < last_statrequest)
    pgstat_write_statsfile(false);

got replaced by this:

if (pgstat_write_statsfile_needed())
pgstat_write_statsfiles(false, false);

where pgstat_write_statsfile_needed() only checks if the list is empty
(and ignores the request/write timestamps).

If I understand that correctly, this would lead to writing the files
more often, and we're dealing with a timeout.
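
If the lost throttle is the issue, re-adding one would be cheap. A
hypothetical sketch of the kind of guard that could sit in front of the
write (TimestampDifferenceExceeds() and PGSTAT_STAT_INTERVAL exist today;
last_statwrite would have to be resurrected):

	/* skip this round unless the last write is older than the interval */
	if (pgstat_write_statsfile_needed() &&
		TimestampDifferenceExceeds(last_statwrite, GetCurrentTimestamp(),
								   PGSTAT_STAT_INTERVAL))
		pgstat_write_statsfiles(false, false);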

 If we can simplify the code by keeping just OIDs, let's do that. I think
 the main reason why we haven't done that in 187492b6 was to keep as much
 of the existing logic (and maybe change it in a separate patch).
 
 The real point here is that I think that commit *already* changed
 the existing logic, because the time-of-receipt used to matter. In 
 particular, there used to be a throttle on how often the stats file 
 could get written, which seems to have vanished. I seriously doubt 
 that that was a good change, especially on
 write-bandwidth-challenged platforms.

Yes - if that change causes writing the files being written more
frequently, it's not a good change.

But I think the time-of-receipt still matters - the pgstat_recv_inquiry
logic remained the same, just applied per database. ISTM the only thing
that disappeared is the (last_statwrite < last_statrequest) check.

I have to think about this a bit more, I haven't seen this code since
the split patch.

Tomas




Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2014-12-25 Thread Tomas Vondra
On 25.12.2014 22:40, Tom Lane wrote:
 Tomas Vondra t...@fuzzy.cz writes:
 The strange thing is that the split happened ~2 years ago, which is
 inconsistent with the sudden increase of this kind of issues. So maybe
 something changed on that particular animal (a failing SD card causing
 I/O stalls, perhaps)?
 
 I think that hamster has basically got a tin can and string for an I/O
 subsystem.  It's not real clear to me whether there's actually been an

Yes. It's called an SD card.

 increase in wait timeout failures recently; somebody would have to
 go through and count them before I'd have much faith in that thesis.

That's what I did. On hamster I see this (in the HEAD):

2014-12-25 16:00:07 yes
2014-12-24 16:00:07 yes
2014-12-23 16:00:07 yes
2014-12-22 16:00:07 yes
2014-12-19 16:00:07 yes
2014-12-15 16:00:11 no
2014-10-25 16:00:06 no
2014-10-24 16:00:06 no
2014-10-23 16:00:06 no
2014-10-22 16:00:06 no
2014-10-21 16:00:07 no
2014-10-19 16:00:06 no
2014-09-28 16:00:06 no
2014-09-26 16:00:07 no
2014-08-28 16:00:06 no
2014-08-12 16:00:06 no
2014-08-05 22:04:48 no
2014-07-19 01:53:30 no
2014-07-06 16:00:06 no
2014-07-04 16:00:06 no
2014-06-29 16:00:06 no
2014-05-09 16:00:04 no
2014-05-07 16:00:04 no
2014-05-04 16:00:04 no
2014-04-28 16:00:04 no
2014-04-18 16:00:04 no
2014-04-04 16:00:04 no

(where "yes" means "pgstat wait timeout" is in the logs). On chipmunk,
the trend is much less convincing (but there are far fewer failures, and
only 3 of them failed because of the pgstat wait timeout).

However, it's worth mentioning that all the pgstat failures happened at
16:00:07 and most of the older failures are before that. So it may be
that it was failing because of something else, and the pgstat timeout
could not happen because of that. OTOH, there's a fair amount of
successful runs.


 However, I notice that at least the last few occurrences on hamster
 all seem to have been in this parallel block:
 
 test: brin gin gist spgist privileges security_label collate matview
 lock replica_identity rowsecurity object_address
 
 which as recently as 9.4 contained just these tests:
 
 test: privileges security_label collate matview lock replica_identity
 
 I'm fairly sure that the index-related tests in this batch are I/O 
 intensive, and since they were not there at all six months ago, it's
 not hard to believe that this block of tests has far greater I/O
 demands than used to exist. Draw your own conclusions ...

Yes, that might be the culprit here. Would be interesting to know what's
happening on the machine while the tests are running to confirm this
hypothesis.

regards
Tomas




Re: [HACKERS] Securing make check (CVE-2014-0067)

2014-12-25 Thread Noah Misch
On Thu, Dec 25, 2014 at 11:35:31PM +1300, David Rowley wrote:
 On 25 December 2014 at 18:27, Noah Misch n...@leadboat.com wrote:
  This needs to be conditional on whether the platform supports IPv6, like
  we do
  in setup_config().  The attached patch works on these configurations:
 
  64-bit Windows Server 2003, 32-bit VS2010
  64-bit Windows Server 2003, MinGW (always 32-bit)
  64-bit Windows Server 2008, 64-bit VS2012
  64-bit Windows Server 2008, 64-bit MinGW-w64
 
  If the patch looks reasonable, I will commit it.
 
 
 I'm just looking at initdb.c I see that there's this:
 
 #ifdef HAVE_IPV6
 
 /*
  * Probe to see if there is really any platform support for IPv6, and
  * comment out the relevant pg_hba line if not.  This avoids runtime
  * warnings if getaddrinfo doesn't actually cope with IPv6.  Particularly
  * useful on Windows, where executables built on a machine with IPv6 may
  * have to run on a machine without.
  */
 The comment does seem to indicate that getaddrinfo might give a warning on
 an IPv4-only machine when given an IPv6 address to resolve. I think we
 likely want that probe here too, though I don't have an IPv4-only machine to test on.

A default installation of Windows Server 2003 is IPv4-only.  Putting a ::1/128
line in pg_hba.conf makes the postmaster fail to start there, reporting the
error "specifying both host name and CIDR mask is invalid".

 I'll test the patch with IPv6 disabled and see if I get a warning...
 
 OK, it seems to still write the IPv6 entry into pg_hba.conf with IPv6
 disabled, so perhaps disabling IPv6 is not sufficient; maybe it needs to be
 tested on a machine that does not support IPv6 at all.

It is fine to emit the IPv6 entry on any system where it does not impede
postmaster start, even systems that won't actually use IPv6 to connect.

I went ahead and committed this.  Andrew, would you unstick buildfarm members
jacana and bowerbird on all branches?  SRA, would you do the same for
hamerkop?  Thanks.




Re: [HACKERS] Some other odd buildfarm failures

2014-12-25 Thread Andres Freund
Hi,

On 2014-12-25 16:01:47 -0500, Tom Lane wrote:
 In any case, it now seems dead certain that this is a recently introduced
 bug.  Andres is fortunate that the first instance occurred before his
 recent batch of commits, or I'd be on him to revert them.

Yes, Phew. These look rather odd.

 As is, though,
 I'm wondering if 37de8de9e33606a043e98fee64b5595aedaa8254 could possibly
 be related.

I really can't imagine how. If an additional barrier in that place could
cause such problems, we'd surely have more reports from an accidental wait
in the right place than these.

My guess is that it's related to d7ee82e50f. It seems realistic that the
event trigger added by it to the object_address test can cause errors at
varying times.

I wonder if it'd not be a good idea if the event trigger code installed
an error context callback? Since they can be called in situations we
don't routinely expect, that'd make diagnosis easier in many cases.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Some other odd buildfarm failures

2014-12-25 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 I wonder if it'd not be a good idea if the event trigger code installed
 an error context callback? Since they can be called in situations we
 don't routinely expect, that'd make diagnosis easier in many cases.

+1 ... even if that's not related to the immediate issue, it seems like
a good idea.

regards, tom lane




Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2014-12-25 Thread Tom Lane
Tomas Vondra t...@fuzzy.cz writes:
 On 25.12.2014 22:40, Tom Lane wrote:
 I think that hamster has basically got a tin can and string for an I/O
 subsystem.  It's not real clear to me whether there's actually been an
 increase in wait timeout failures recently; somebody would have to
 go through and count them before I'd have much faith in that thesis.

 That's what I did. On hamster I see this (in the HEAD):

 2014-12-25 16:00:07 yes
 2014-12-24 16:00:07 yes
 2014-12-23 16:00:07 yes
 2014-12-22 16:00:07 yes
 2014-12-19 16:00:07 yes
 2014-12-15 16:00:11 no
 2014-10-25 16:00:06 no
 2014-10-24 16:00:06 no
 2014-10-23 16:00:06 no
 2014-10-22 16:00:06 no
 2014-10-21 16:00:07 no
 2014-10-19 16:00:06 no
 2014-09-28 16:00:06 no
 2014-09-26 16:00:07 no
 2014-08-28 16:00:06 no
 2014-08-12 16:00:06 no
 2014-08-05 22:04:48 no
 2014-07-19 01:53:30 no
 2014-07-06 16:00:06 no
 2014-07-04 16:00:06 no
 2014-06-29 16:00:06 no
 2014-05-09 16:00:04 no
 2014-05-07 16:00:04 no
 2014-05-04 16:00:04 no
 2014-04-28 16:00:04 no
 2014-04-18 16:00:04 no
 2014-04-04 16:00:04 no

 (where yes means pgstat wait timeout is in the logs). On chipmunk,
 the trend is much less convincing (but there's much less failures, and
 only 3 of them failed because of the pgstat wait timeout).

mereswine's history is also pretty interesting in this context.  That
series makes it look like the probability of pgstat wait timeout took
a big jump around the beginning of December, especially if you make the
unproven-but-not-unreasonable assumption that the two pg_upgradecheck
failures since then were also wait timeout failures.  That's close enough
after commit 88fc71926392115c (Nov 19) to make me suspect that that was
what put us over the edge: that added a bunch more I/O *and* a bunch more
statistics demands to this one block of parallel tests.

But even if we are vastly overstressing the I/O subsystem on these boxes,
why is it manifesting like this?  pgstat never fsyncs the stats temp file,
so it should not have to wait for physical I/O I'd think.  Or perhaps the
file rename() operations get fsync'd behind the scenes by the filesystem?
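
For context, the collector's write path is the classic
write-temp-then-rename dance, roughly (condensed from
pgstat_write_statsfiles(); note there is indeed no fsync anywhere in it):

	FILE	   *fpout;

	fpout = AllocateFile(tmpfile, PG_BINARY_W);		/* pgstat.tmp */
	/* ... fwrite() the global stats and the per-database entries ... */
	FreeFile(fpout);

	/*
	 * Atomically install the new file.  Whether this blocks on physical
	 * I/O is up to the filesystem; some effectively flush on rename.
	 */
	if (rename(tmpfile, statfile) < 0)				/* pgstat.stat */
		ereport(LOG,
				(errcode_for_file_access(),
				 errmsg("could not rename temporary statistics file \"%s\" to \"%s\": %m",
						tmpfile, statfile)));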

regards, tom lane




Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2014-12-25 Thread Tomas Vondra
On 26.12.2014 02:59, Tom Lane wrote:
 Tomas Vondra t...@fuzzy.cz writes:
 On 25.12.2014 22:40, Tom Lane wrote:
 I think that hamster has basically got a tin can and string for an I/O
 subsystem.  It's not real clear to me whether there's actually been an
 increase in wait timeout failures recently; somebody would have to
 go through and count them before I'd have much faith in that thesis.
 
 That's what I did. On hamster I see this (in the HEAD):
 
 2014-12-25 16:00:07 yes
 2014-12-24 16:00:07 yes
 2014-12-23 16:00:07 yes
 2014-12-22 16:00:07 yes
 2014-12-19 16:00:07 yes
 2014-12-15 16:00:11 no
 2014-10-25 16:00:06 no
 2014-10-24 16:00:06 no
 2014-10-23 16:00:06 no
 2014-10-22 16:00:06 no
 2014-10-21 16:00:07 no
 2014-10-19 16:00:06 no
 2014-09-28 16:00:06 no
 2014-09-26 16:00:07 no
 2014-08-28 16:00:06 no
 2014-08-12 16:00:06 no
 2014-08-05 22:04:48 no
 2014-07-19 01:53:30 no
 2014-07-06 16:00:06 no
 2014-07-04 16:00:06 no
 2014-06-29 16:00:06 no
 2014-05-09 16:00:04 no
 2014-05-07 16:00:04 no
 2014-05-04 16:00:04 no
 2014-04-28 16:00:04 no
 2014-04-18 16:00:04 no
 2014-04-04 16:00:04 no
 
 (where yes means pgstat wait timeout is in the logs). On
 chipmunk, the trend is much less convincing (but there's much less
 failures, and only 3 of them failed because of the pgstat wait
 timeout).
 
 mereswine's history is also pretty interesting in this context. That 
 series makes it look like the probability of pgstat wait timeout
 took a big jump around the beginning of December, especially if you
 make the unproven-but-not-unreasonable assumption that the two
 pg_upgradecheck failures since then were also wait timeout failures.
 That's close enough after commit 88fc71926392115c (Nov 19) to make me
 suspect that that was what put us over the edge: that added a bunch
 more I/O *and* a bunch more statistics demands to this one block of
 parallel tests.

Interesting. But even if this commit tipped us over the edge, it's not a
proof that the split patch was perfectly correct.

 But even if we are vastly overstressing the I/O subsystem on these
 boxes, why is it manifesting like this? pgstat never fsyncs the stats
 temp file, so it should not have to wait for physical I/O I'd think.
 Or perhaps the file rename() operations get fsync'd behind the scenes
 by the filesystem?

My guess is that the amount of dirty data in the page cache reaches
dirty_ratio/dirty_bytes, effectively forcing the writes to go directly
to the disks. Those ARM machines have rather low amounts of RAM
(typically 256-512MB), and the default value for dirty_ratio is ~20%
IIRC. So that's ~50-100MB.

Tomas




Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2014-12-25 Thread Michael Paquier
On Fri, Dec 26, 2014 at 6:28 AM, Tomas Vondra t...@fuzzy.cz wrote:
 On 25.12.2014 21:14, Andres Freund wrote:
 On 2014-12-25 14:36:42 -0500, Tom Lane wrote:

 My guess is that a checkpoint happened at that time. Maybe it'd be a
 good idea to make pg_regress start postgres with log_checkpoints
 enabled? My guess is that we'd find horrendous 'sync' times.

 Michael: Could you perhaps turn log_checkpoints on in the config?

 Logging timestamps (using log_line_prefux) would be also helpful.
Done. I have been wondering about those failures as well but didn't
get the time to look around. FYI, hamster is running with an 8GB class
10 SD card able to do 40MB/s in read; I changed the card 2 weeks ago,
btw, after the former 4GB card began to show I/O errors, RIP to it. So
maybe this is what is triggering the wait timeouts more frequently... But I
have no real explanation why REL9_4_STABLE shows no signs of failures.

For the time being, what about putting pg_stats_tmp on a RAM disk to
relieve the I/O? Whatever we come up with, I imagine that I/O
will still be tight on hamster.
-- 
Michael




Re: [HACKERS] Some other odd buildfarm failures

2014-12-25 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 My guess is that it's related to d7ee82e50f. It seems realistic that the
 event trigger added by it to the object_address test can cause errors at
 varying times.

[ squint... ]  Event triggers are global across the whole database, aren't
they?  Isn't it frickin insane to run a test like this in parallel with
others?

Not but what it seems to be exposing some bugs.  Still, I don't think this
is a reasonable test design.  We have absolutely no idea what behaviors
are being triggered in the other tests, except that they are unrelated to
what those tests think they are testing.

regards, tom lane




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-25 Thread Michael Paquier
On Thu, Dec 25, 2014 at 10:10 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Wed, Dec 24, 2014 at 9:03 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 Returning only a boolean is fine for me (that's what my first patch
 did), especially if we add at some point hooks for compression and
 decompression calls.
 Here is a patch rebased on current HEAD (60838df) for the core feature
 with the APIs of pglz using booleans as return values.
After the revert of the 1st patch moving pglz to src/common, I have
reworked both patches, resulting in the attached.

For pglz, the dependency on varlena has been removed to make the code
able to run independently on both the frontend and backend sides. In
order to do that, the APIs of pglz_compress and pglz_decompress have
been changed a bit:
- pglz_compress returns the number of bytes compressed.
- pglz_decompress takes as additional argument the compressed length
of the buffer, and returns the number of bytes decompressed instead of
a simple boolean for consistency with the compression API.
PGLZ_Header is not modified to keep the on-disk format intact.
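
Concretely, the revised prototypes look roughly like this (a sketch
inferred from the description above; the exact parameter names may differ
in the attached patch, and the negative-result failure convention is my
assumption):

    /* returns the number of bytes written to dest, or a negative
     * value if the input could not usefully be compressed */
    extern int32 pglz_compress(const char *source, int32 slen, char *dest,
                               const PGLZ_Strategy *strategy);

    /* slen is the compressed length of source; returns the number of
     * bytes written to dest, or a negative value on corrupt input */
    extern int32 pglz_decompress(const char *source, int32 slen,
                                 char *dest, int32 rawsize);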

The WAL compression patch is realigned based on those changes.
Regards,
-- 
Michael


20141226_fpw_compression_v11.tar.gz
Description: GNU Zip compressed data



Re: [HACKERS] Some other odd buildfarm failures

2014-12-25 Thread Alvaro Herrera
Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  My guess is that it's related to d7ee82e50f. It seems realistic that the
  event trigger added by it to the object_address test can cause errors at
  varying times.
 
 [ squint... ]  Event triggers are global across the whole database, aren't
 they?  Isn't it frickin insane to run a test like this in parallel with
 others?

Well, the event trigger function is BEGIN END; so I don't think it
should affect anything.

 Not but what it seems to be exposing some bugs.

That seems to me a good thing ... a bit inconvenient of course, but it
beats having users get strange behavior which they can't possibly debug.

 Still, I don't think this is a reasonable test design.  We have
 absolutely no idea what behaviors are being triggered in the other
 tests, except that they are unrelated to what those tests think they
 are testing.

I can of course move it to a separate parallel test, but I don't think
that should really be necessary.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2014-12-25 Thread Alvaro Herrera
Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  So I think a better way to deal with that warning would be a good
  idea. Besides somehow fixing the mechanism itself, there are two ways
  to attack this that I can think of, neither of them awe inspiring:
 
  1) Make that WARNING a LOG message instead. Since those don't get sent
  to the client with default settings...
  2) Increase PGSTAT_MAX_WAIT_TIME even further than what 99b545 increased
  it to.
 
 Yeah, I've been getting more annoyed by that too lately.  I keep wondering
 though whether there's an actual bug underneath that behavior that we're
 failing to see.

I think the first thing to do is reconsider usage of PGSTAT_RETRY_DELAY
instead of PGSTAT_STAT_INTERVAL in autovacuum workers.  That decreases
the wait time 50-fold, if I recall this correctly, and causes large
amounts of extra I/O traffic.
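
For context, the two constants being compared are (values as defined in
pgstat.c):

    #define PGSTAT_STAT_INTERVAL    500   /* ms: staleness a normal
                                           * backend will accept */
    #define PGSTAT_RETRY_DELAY      10    /* ms: staleness bound currently
                                           * used for autovacuum workers */

which is where the 50-fold figure comes from.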

 BTW, I notice that in the current state of pgstat.c, all the logic for
 keeping track of request arrival times is dead code, because nothing is
 actually looking at DBWriteRequest.request_time.  This makes me think that
 somebody simplified away some logic we maybe should have kept.

I will have a look.  I remember being confused about this at some point
when reviewing that patch.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-12-25 Thread Noah Misch
On Thu, Nov 20, 2014 at 09:15:51PM +0100, Andres Freund wrote:
 On 2014-11-20 13:57:25 -0500, Robert Haas wrote:
  On Thu, Nov 20, 2014 at 12:19 PM, Andres Freund and...@2ndquadrant.com 
  wrote:
  If the
  deadlock detector would kill the processes anyway, it doesn't matter,
  because CheckTableNotInUse() should do it first, so that we get a
  better error and without waiting for deadlock_timeout.  So any
  scenario that's predicated on the assumption that CheckTableNotInUse()
  will succeed in a parallel context is 100% unconvincing to me as an
  argument for anything.
 
 Meh, there's plenty cases where CheckTableNotInUse() isn't used where we
 still rely on locks actually working. And it's not just postgres code,
 this *will* break user code.

Today, holding a relation AccessExclusiveLock ensures no other process is
using the table.  That is one of two prerequisites for making an alteration
that threatens concurrent users of the table.  The other prerequisite is
verifying that no other subsystem of the current process is using the table,
hence CheckTableNotInUse().  Robert's designs do soften AccessExclusiveLock
for relations; holding it will not preclude use of the table in workers of the
same parallel group.  Pessimizing CheckTableNotInUse() compensates.  In every
scenario where AccessExclusiveLock is softer than before, CheckTableNotInUse()
will fail unconditionally.  Code that, today, performs the correct checks
before making a threatening table alteration will remain correct.
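
In code form, the two prerequisites are the familiar pair (a sketch;
error handling and the surrounding command elided):

    /* Prerequisite 1: no other process is using the table. */
    rel = heap_open(relid, AccessExclusiveLock);

    /* Prerequisite 2: no other subsystem of this same process is using
     * it.  Under the proposal, this call also fails unconditionally
     * whenever the lock is shared with workers of the same parallel
     * group, compensating for the softer AccessExclusiveLock. */
    CheckTableNotInUse(rel, "ALTER TABLE");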

ANALYZE and lazy VACUUM are the only core operations that take self-exclusive
relation locks and skip CheckTableNotInUse().  PreventTransactionChain() will
block naive concurrent VACUUM.  (If that weren't the case, VACUUM would need
CheckTableNotInUse() even without parallelism in the picture.  I wouldn't
oppose adding CheckTableNotInUse() to VACUUM as defense in depth.)  ANALYZE is
special.  It's compatible with most concurrent use of the table, so
CheckTableNotInUse() is unnecessary.  Blocking ANALYZE during parallelism,
even if we someday allow other database writes, is a wart I can live with.

Given a choice between no parallelism and parallelism when the initiating
transaction holds no self-exclusive locks, I'd pick the latter.  But you have
overstated the significance of holding a lock and the degree to which Robert's
proposals change the same.

  Your argument essentially is that we don't actually need to lock objects
  if the other side only reads. Yes, you add a condition or two on top
  (e.g. CheckTableNotInUse() fails), but that's not that much. If we want
  to do this we really need to forbid *any* modification to catalog/user
  tables while parallel operations are ongoing. In the primary and worker
  backends.  I think that's going to end up being much more
  problematic/restrictive than not allowing non-cooperative parallelism
  after >= ShareUpdateExclusive. If we ever want to parallelize writing
  operations we'll regret this quite a bit.

https://wiki.postgresql.org/wiki/Parallel_Sort already contemplates blocking
every kind of database write, due to challenges around combo CIDs (affects
UPDATE and DELETE) and invalidation messages (mostly affects DDL).  I doubt
we'll ever care to allow a worker to start an independent DDL operation.  We
might want to allow INSERT, UPDATE, and DELETE.  I don't see that allowing
multiple workers to hold AccessExclusiveLock will make that more difficult.
Difficulties around XactLockTableWait() are more noteworthy, and they stand
regardless of what we do with heavyweight locking.

 The 'lets grant all the current locks at start of parallel query'
 approach seems quite a bit safer than always doing that during parallel
 query, don't get me wrong.

Agreed.

 Btw, what are your thoughts about SERIALIZABLE and parallel query?
 Afaics that'd be pretty hard to support?

It didn't look obviously awful, because most of the relevant data structures
live in shared memory.  We already have the concept of one backend adjusting
the predicate locks pertaining to another backend's transaction.  (That's not
to say the number of SERIALIZABLE-specific changes will be zero, either.)

Thanks,
nm




Re: [HACKERS] Join push-down support for foreign tables

2014-12-25 Thread Shigeru Hanada
2014-12-16 0:45 GMT+09:00 Tom Lane t...@sss.pgh.pa.us:
 Shigeru Hanada shigeru.han...@gmail.com writes:
 I'm working on $SUBJECT and would like to get comments about the
 design.  Attached patch is for the design below.  Note that the patch
 requires Kaigai-san's custom foreign join patch[1]

 For the record, I'm not particularly on-board with custom scan, and
 even less so with custom join.  I don't want FDW features like this
 depending on those kluges, especially not when you're still in need
 of core-code changes (which really points up the inadequacy of those
 concepts).

This design derived from the discussion about custom scan/join.  In that
discussion Robert suggested common infrastructure for replacing a Join
path with a Scan node.  Here I agree to use such common infrastructure.
One concern is that introducing a hook function I/F seems to break the
FdwRoutine I/F boundary...
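
For concreteness, the hook in question would add an entry point along
these lines (a hypothetical signature, sketched only to frame the I/F
question; not a committed API):

    typedef void (*GetForeignJoinPaths_function) (PlannerInfo *root,
                                                  RelOptInfo *joinrel,
                                                  RelOptInfo *outerrel,
                                                  RelOptInfo *innerrel,
                                                  JoinType jointype);

The open question is whether such a callback belongs in FdwRoutine itself
or in a separate planner-level hook.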


 Also, please don't redefine struct NestPath like that.  That adds a
 whole bunch of notational churn (and backpatch risk) for no value
 that I can see.  It might've been better to do it like that in a
 green field, but you're about twenty years too late to do it now.

Ok, will revert it.

-- 
Shigeru HANADA




Re: [HACKERS] Join push-down support for foreign tables

2014-12-25 Thread Shigeru Hanada
2014-12-16 1:22 GMT+09:00 Robert Haas robertmh...@gmail.com:
 On Mon, Dec 15, 2014 at 3:40 AM, Shigeru Hanada
 shigeru.han...@gmail.com wrote:
 I'm working on $SUBJECT and would like to get comments about the
 design.  Attached patch is for the design below.

 I'm glad you are working on this.

 1. Join source relations
  As described above, postgres_fdw (and most SQL-based FDWs) needs to
  check that 1) all foreign tables in the join belong to the same server,
  and 2) all foreign tables have the same checkAsUser.
  In addition to that, I add an extra limitation that both inner/outer
  should be plain foreign tables, not a result of a foreign join.  This
  limitation keeps the SQL generator simple.  Fundamentally it's possible
  to join even join relations, so N-way join is listed as an enhancement
  item below.

 It seems pretty important to me that we have a way to push the entire
 join nest down.  Being able to push down a 2-way join but not more
 seems like quite a severe limitation.

Hmm, I agree that supporting N-way joins is very useful.  Postgres-XC's
SQL generator seems to give us a hint for such cases; I'll check it out
again.

-- 
Shigeru HANADA




Re: [HACKERS] inherit support for foreign tables

2014-12-25 Thread Etsuro Fujita
On 2014/12/23 0:36, Tom Lane wrote:
 Yeah, we need to do something about the PlanRowMark data structure.
 Aside from the pre-existing issue in postgres_fdw, we need to fix it
 to support inheritance trees in which more than one rowmark method
 is being used.  That rte.hasForeignChildren thing is a crock, and
 would still be a crock even if it were correctly documented as being
 a planner temporary variable (rather than the current implication that
 it's always valid).  RangeTblEntry is no place for planner temporaries.

Agreed.

 The idea I'd had about that was to convert the markType field into a
 bitmask, so that a parent node's markType could represent the logical
 OR of the rowmark methods being used by all its children.  I've not
 attempted to code this up though, and probably won't get to it until
 after Christmas.  One thing that's not clear is what should happen
 with ExecRowMark.

That seems like a good idea, as parent PlanRowMarks are ignored at runtime.
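
For illustration, the bitmask conversion might look like this (a sketch;
the field and macro names here are assumptions, not from the mail above):

    /* one bit per RowMarkType, so a parent can carry the OR of its children */
    #define ROW_MARK_BIT(markType)  (1 << (markType))

    parentrc->allMarkTypes |= ROW_MARK_BIT(childrc->markType);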

Aside from the above, I noticed that the patch has a bug in handling
ExecRowMarks/ExecAuxRowMarks for foreign tables in inheritance trees
during EPQ processing. :-(  Attached is an updated version of the patch
to fix that, created on top of [1], as said before.

Thanks,

[1] http://www.postgresql.org/message-id/5497bf4c.6080...@lab.ntt.co.jp

Best regards,
Etsuro Fujita
*** a/contrib/file_fdw/input/file_fdw.source
--- b/contrib/file_fdw/input/file_fdw.source
***************
*** 148,153 ****
--- 148,167 ----
  EXPLAIN (VERBOSE, COSTS FALSE) SELECT * FROM agg_csv WHERE a < 0;
  SELECT * FROM agg_csv WHERE a < 0;
  RESET constraint_exclusion;
  
+ -- table inheritance tests
+ CREATE TABLE agg (a int2, b float4);
+ ALTER FOREIGN TABLE agg_csv INHERIT agg;
+ SELECT tableoid::regclass, * FROM agg;
+ SELECT tableoid::regclass, * FROM agg_csv;
+ SELECT tableoid::regclass, * FROM ONLY agg;
+ -- updates aren't supported
+ UPDATE agg SET a = 1;
+ DELETE FROM agg WHERE a = 100;
+ -- but this should be ignored
+ SELECT tableoid::regclass, * FROM agg FOR UPDATE;
+ ALTER FOREIGN TABLE agg_csv NO INHERIT agg;
+ DROP TABLE agg;
+ 
  -- privilege tests
  SET ROLE file_fdw_superuser;
  SELECT * FROM agg_text ORDER BY a;
*** a/contrib/file_fdw/output/file_fdw.source
--- b/contrib/file_fdw/output/file_fdw.source
***************
*** 249,254 ****
--- 249,294 ----
  SELECT * FROM agg_csv WHERE a < 0;
  (0 rows)
  
  RESET constraint_exclusion;
+ -- table inheritance tests
+ CREATE TABLE agg (a int2, b float4);
+ ALTER FOREIGN TABLE agg_csv INHERIT agg;
+ SELECT tableoid::regclass, * FROM agg;
+  tableoid |  a  |    b
+ ----------+-----+---------
+  agg_csv  | 100 |  99.097
+  agg_csv  |   0 | 0.09561
+  agg_csv  |  42 |  324.78
+ (3 rows)
+ 
+ SELECT tableoid::regclass, * FROM agg_csv;
+  tableoid |  a  |    b
+ ----------+-----+---------
+  agg_csv  | 100 |  99.097
+  agg_csv  |   0 | 0.09561
+  agg_csv  |  42 |  324.78
+ (3 rows)
+ 
+ SELECT tableoid::regclass, * FROM ONLY agg;
+  tableoid | a | b 
+ ----------+---+---
+ (0 rows)
+ 
+ -- updates aren't supported
+ UPDATE agg SET a = 1;
+ ERROR:  cannot update foreign table "agg_csv"
+ DELETE FROM agg WHERE a = 100;
+ ERROR:  cannot delete from foreign table "agg_csv"
+ -- but this should be ignored
+ SELECT tableoid::regclass, * FROM agg FOR UPDATE;
+  tableoid |  a  |    b
+ ----------+-----+---------
+  agg_csv  | 100 |  99.097
+  agg_csv  |   0 | 0.09561
+  agg_csv  |  42 |  324.78
+ (3 rows)
+ 
+ ALTER FOREIGN TABLE agg_csv NO INHERIT agg;
+ DROP TABLE agg;
  -- privilege tests
  SET ROLE file_fdw_superuser;
  SELECT * FROM agg_text ORDER BY a;
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***************
*** 3027,3032 ****
--- 3027,3544 ----
  NOTICE:  NEW: (13,"test triggered !")
  (1 row)
  
  -- ===================================================================
+ -- test inheritance features
+ -- ===================================================================
+ CREATE TABLE a (aa TEXT);
+ CREATE TABLE loct (aa TEXT, bb TEXT);
+ CREATE FOREIGN TABLE b (bb TEXT) INHERITS (a)
+   SERVER loopback OPTIONS (table_name 'loct');
+ INSERT INTO a(aa) VALUES('aaa');
+ INSERT INTO a(aa) VALUES('aaaa');
+ INSERT INTO a(aa) VALUES('aaaaa');
+ INSERT INTO a(aa) VALUES('aaaaaa');
+ INSERT INTO a(aa) VALUES('aaaaaaa');
+ INSERT INTO a(aa) VALUES('aaaaaaaa');
+ INSERT INTO b(aa) VALUES('bbb');
+ INSERT INTO b(aa) VALUES('bbbb');
+ INSERT INTO b(aa) VALUES('bbbbb');
+ INSERT INTO b(aa) VALUES('bbbbbb');
+ INSERT INTO b(aa) VALUES('bbbbbbb');
+ INSERT INTO b(aa) VALUES('bbbbbbbb');
+ SELECT relname, a.* FROM a, pg_class where a.tableoid = pg_class.oid;
+  relname |    aa
+ ---------+----------
+  a       | aaa
+  a       | aaaa
+  a       | aaaaa
+  a       | aaaaaa
+  a       | aaaaaaa
+  a       | aaaaaaaa
+  b       | bbb
+  b       | bbbb
+  b       | bbbbb
+  b       | bbbbbb
+  b       | bbbbbbb
+  b       | bbbbbbbb
+ (12 rows)
+ 
+ 

Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-25 Thread Fujii Masao
On Fri, Dec 26, 2014 at 12:31 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Thu, Dec 25, 2014 at 10:10 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 On Wed, Dec 24, 2014 at 9:03 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 Returning only a boolean is fine for me (that's what my first patch
 did), especially if we add at some point hooks for compression and
 decompression calls.
 Here is a patch rebased on current HEAD (60838df) for the core feature
 with the APIs of pglz using booleans as return values.
 After the revert of 1st patch moving pglz to src/common, I have
 reworked both patches, resulting in the attached.

 For pglz, the dependency on varlena has been removed to make the code
 able to run independently on both the frontend and backend sides. In
 order to do that, the APIs of pglz_compress and pglz_decompress have
 been changed a bit:
 - pglz_compress returns the number of bytes compressed.
 - pglz_decompress takes as additional argument the compressed length
 of the buffer, and returns the number of bytes decompressed instead of
 a simple boolean for consistency with the compression API.
 PGLZ_Header is not modified to keep the on-disk format intact.

pglz_compress() and pglz_decompress() still use PGLZ_Header, so a frontend
which uses those functions needs to handle PGLZ_Header. But it basically
should be handled via the varlena macros. That is, the frontend still seems
to need to understand the varlena datatype. I think we should avoid that.
Thoughts?
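
For reference, the struct in question is itself laid out as a varlena
(from pg_lzcompress.h):

    typedef struct PGLZ_Header
    {
        int32    vl_len_;   /* varlena header (do not touch directly!) */
        int32    rawsize;   /* length of the uncompressed data */
    } PGLZ_Header;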

Regards,

-- 
Fujii Masao




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-25 Thread Michael Paquier
On Fri, Dec 26, 2014 at 3:24 PM, Fujii Masao masao.fu...@gmail.com wrote:
 pglz_compress() and pglz_decompress() still use PGLZ_Header, so a frontend
 which uses those functions needs to handle PGLZ_Header. But it basically
 should be handled via the varlena macros. That is, the frontend still
 seems to need to understand the varlena datatype. I think we should avoid
 that. Thoughts?
Hm, yes, it may be wiser to remove it and make the data passed to pglz
for varlena 8 bytes shorter...
-- 
Michael




[HACKERS] The return value of allocate_recordbuf()

2014-12-25 Thread Fujii Masao
Hi,

While reviewing the FPW compression patch, I found that allocate_recordbuf()
always returns TRUE though its source code comment says that FALSE is
returned if out of memory. Its return value is checked in two places, but
those checks are clearly useless.

allocate_recordbuf() was introduced by 7fcbf6a, and then changed by
2c03216 so that palloc() is used instead of malloc() and FALSE is never
returned even if out of memory. So this seems to be an oversight in
2c03216. Maybe we should change it so that it checks whether the requested
memory can be allocated before actually allocating it?
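
For instance, something like the following (a sketch only: palloc_noerror
is a hypothetical no-throw allocator that does not exist today, and the
sizing logic is copied from the current function):

    static bool
    allocate_recordbuf(XLogReaderState *state, uint32 reclength)
    {
        uint32    newSize = reclength;

        newSize += XLOG_BLCKSZ - (newSize % XLOG_BLCKSZ);
        newSize = Max(newSize, 5 * Max(BLCKSZ, XLOG_BLCKSZ));

        if (state->readRecordBuf)
            pfree(state->readRecordBuf);
        /* hypothetical no-throw variant: returns NULL instead of ERROR */
        state->readRecordBuf = (char *) palloc_noerror(newSize);
        if (state->readRecordBuf == NULL)
            return false;   /* caller can now report out of memory */
        state->readRecordBufSize = newSize;
        return true;
    }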

Regards,

-- 
Fujii Masao




Re: [HACKERS] a fast bloat measurement tool (was Re: Measuring relation free space)

2014-12-25 Thread Abhijit Menon-Sen
At 2014-09-25 15:40:11 +0530, a...@2ndquadrant.com wrote:

 All right, then I'll post a version that addresses Amit's other
 points, adds a new file/function to pgstattuple, acquires content
 locks, and uses HeapTupleSatisfiesVacuum, hint-bit setting and all.

Sorry, I forgot to post this patch. It does what I listed above, and
seems to work fine (it's still quite a lot faster than pgstattuple
in many cases).

A couple of remaining questions:

1. I didn't change the handling of LP_DEAD items, because the way it is
   now matches what pgstattuple counts. I'm open to changing it, though.
   Maybe it makes sense to count LP_DEAD items iff lp_len != 0? Or just
   leave it as it is? I think it doesn't matter much.

2. I changed the code to acquire the content locks on the buffer, as
   discussed, but it still uses HeapTupleSatisfiesVacuum. Amit suggested
   using HeapTupleSatisfiesVisibility, but it's not clear to me why that
   would be better. I welcome advice in this matter.

   (If anything, I should use HeapTupleIsSurelyDead, which doesn't set
   any hint bits, but which I earlier rejected as too conservative in
   its dead/not-dead decisions for this purpose.)
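
For clarity, the per-tuple classification under discussion looks roughly
like this (a simplified sketch; buffer pinning and the line-pointer loop
are elided):

    LockBuffer(buf, BUFFER_LOCK_SHARE);

    tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
    tuple.t_len = ItemIdGetLength(itemid);

    switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
    {
        case HEAPTUPLE_DEAD:
        case HEAPTUPLE_RECENTLY_DEAD:   /* may still be visible to old snapshots */
            stat.dead_tuple_count++;
            stat.dead_tuple_len += tuple.t_len;
            break;
        default:                        /* live enough for bloat estimation */
            stat.tuple_count++;
            stat.tuple_len += tuple.t_len;
            break;
    }

    LockBuffer(buf, BUFFER_LOCK_UNLOCK);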

(I've actually patched the pgstattuple.sgml docs as well, but I want to
re-read that to make sure it's up to date, and didn't want to wait to
post the code changes.)

I also didn't change the name. I figure it's easy enough to change it
everywhere once all the remaining pieces are in place.

Comments welcome.

-- Abhijit
diff --git a/contrib/pgstattuple/Makefile b/contrib/pgstattuple/Makefile
index 862585c..bae80c9 100644
--- a/contrib/pgstattuple/Makefile
+++ b/contrib/pgstattuple/Makefile
@@ -1,10 +1,10 @@
 # contrib/pgstattuple/Makefile
 
 MODULE_big	= pgstattuple
-OBJS		= pgstattuple.o pgstatindex.o $(WIN32RES)
+OBJS		= pgstattuple.o pgstatindex.o fastbloat.o $(WIN32RES)
 
 EXTENSION = pgstattuple
-DATA = pgstattuple--1.2.sql pgstattuple--1.1--1.2.sql pgstattuple--1.0--1.1.sql pgstattuple--unpackaged--1.0.sql
+DATA = pgstattuple--1.3.sql pgstattuple--1.2--1.3.sql pgstattuple--1.1--1.2.sql pgstattuple--1.0--1.1.sql pgstattuple--unpackaged--1.0.sql
 PGFILEDESC = pgstattuple - tuple-level statistics
 
 REGRESS = pgstattuple
diff --git a/contrib/pgstattuple/fastbloat.c b/contrib/pgstattuple/fastbloat.c
new file mode 100644
index 0000000..b33db14
--- /dev/null
+++ b/contrib/pgstattuple/fastbloat.c
@@ -0,0 +1,367 @@
+/*
+ * contrib/pgstattuple/fastbloat.c
+ *
+ * Abhijit Menon-Sen a...@2ndquadrant.com
+ * Portions Copyright (c) 2001,2002	Tatsuo Ishii (from pg_stattuple)
+ *
+ * Permission to use, copy, modify, and distribute this software and
+ * its documentation for any purpose, without fee, and without a
+ * written agreement is hereby granted, provided that the above
+ * copyright notice and this paragraph and the following two
+ * paragraphs appear in all copies.
+ *
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE TO ANY PARTY FOR DIRECT,
+ * INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
+ * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
+ * DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED
+ * OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * THE AUTHOR SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE.  THE SOFTWARE PROVIDED HEREUNDER IS ON AN AS
+ * IS BASIS, AND THE AUTHOR HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE,
+ * SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
+ */
+
+#include "postgres.h"
+
+#include "access/visibilitymap.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "access/multixact.h"
+#include "access/htup_details.h"
+#include "catalog/namespace.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "storage/bufmgr.h"
+#include "storage/freespace.h"
+#include "storage/procarray.h"
+#include "storage/lmgr.h"
+#include "utils/builtins.h"
+#include "utils/tqual.h"
+#include "commands/vacuum.h"
+
+PG_FUNCTION_INFO_V1(fastbloat);
+PG_FUNCTION_INFO_V1(fastbloatbyid);
+
+/*
+ * tuple_percent, dead_tuple_percent and free_percent are computable,
+ * so not defined here.
+ */
+typedef struct fastbloat_output_type
+{
+	uint64		table_len;
+	uint64		tuple_count;
+	uint64		tuple_len;
+	uint64		dead_tuple_count;
+	uint64		dead_tuple_len;
+	uint64		free_space;
+	uint64		total_pages;
+	uint64		scanned_pages;
+} fastbloat_output_type;
+
+static Datum build_output_type(fastbloat_output_type *stat,
+			   FunctionCallInfo fcinfo);
+static Datum fbstat_relation(Relation rel, FunctionCallInfo fcinfo);
+static Datum fbstat_heap(Relation rel, FunctionCallInfo fcinfo);
+
+/*
+ * build a fastbloat_output_type tuple
+ */
+static Datum
+build_output_type(fastbloat_output_type *stat, FunctionCallInfo fcinfo)
+{
+#define NCOLUMNS	10
+#define NCHARS		32
+
+	HeapTuple	tuple;
+	char	   *values[NCOLUMNS];
+	char		values_buf[NCOLUMNS][NCHARS];
+	int			i;
+	double		tuple_percent;
+	double		dead_tuple_percent;
+	double