date:20150216

Hi,

In a lot of places in the code we have many structures doing
declarations of the type foo[1] to emulate variable length arrays.
Attached are a set of patches aimed at replacing that with
FLEXIBLE_ARRAY_MEMBER to prevent potential failures that could be
caused by compiler optimizations and improve report of static code
analyzers of the type Coverity (idea by Tom, patches from me):
- 0001 uses FLEXIBLE_ARRAY_MEMBER in a bunch of trivial places (at
least check-world does not complain)
- 0002 does the same for catalog tables
- 0003 changes varlena in c.h. This is done as a separate patch
because this needs some other modifications as variable-length things
need to be placed at the end of structures, because of:
-- rolpassword that should be placed as the last field of pg_authid
(and shouldn't there be CATALOG_VARLEN here??)
-- inv_api.c uses bytea in an internal structure without putting it at
the end of the structure. For clarity I think that we should just use
a bytea pointer and do a sufficient allocation for the duration of the
lobj manipulation.
-- Similarly, tuptoaster.c needed to be patched for toast_save_datum

There are as well couple of things that are not changed on purpose:
- in namespace.h for FuncCandidateList. I tried manipulating it but it
resulted in allocation overflow in PortalHeapMemory
- I don't think that the t_bits fields in htup_details.h should be
updated either.
Regards,
-- 
Michael
From f3389a945811e1db19eb5debc9e41b84fed0a9f5 Mon Sep 17 00:00:00 2001
From: Michael Paquier mich...@otacoo.com
Date: Mon, 16 Feb 2015 02:44:10 +0900
Subject: [PATCH 1/3] First cut with FLEXIBLE_ARRAY_MEMBER

This is targetted to prevent false positive errors that static code
analyzers may do by assuming that some structures have an unvarying
size.
---
 contrib/cube/cubedata.h |  7 +++
 contrib/intarray/_int.h |  4 ++--
 contrib/ltree/ltree.h   | 14 +++---
 contrib/pg_trgm/trgm.h  |  2 +-
 src/bin/pg_dump/dumputils.h |  2 +-
 src/include/access/gin_private.h|  4 ++--
 src/include/access/gist_private.h   |  5 +++--
 src/include/access/heapam_xlog.h|  2 +-
 src/include/access/htup_details.h   |  4 ++--
 src/include/access/spgist_private.h | 10 +-
 src/include/access/xact.h   |  8 
 src/include/c.h |  4 ++--
 src/include/commands/dbcommands.h   |  4 ++--
 src/include/commands/tablespace.h   |  2 +-
 src/include/executor/hashjoin.h |  2 +-
 src/include/nodes/bitmapset.h   |  2 +-
 src/include/nodes/params.h  |  2 +-
 src/include/nodes/tidbitmap.h   |  2 +-
 src/include/postgres.h  |  9 +
 src/include/postmaster/syslogger.h  |  2 +-
 src/include/replication/walsender_private.h |  2 +-
 src/include/storage/bufpage.h   |  3 ++-
 src/include/storage/fsm_internals.h |  2 +-
 src/include/storage/standby.h   |  4 ++--
 src/include/tsearch/dicts/regis.h   |  2 +-
 src/include/tsearch/dicts/spell.h   |  6 +++---
 src/include/tsearch/ts_type.h   |  6 +++---
 src/include/utils/catcache.h|  2 +-
 src/include/utils/datetime.h|  5 +++--
 src/include/utils/geo_decls.h   |  3 ++-
 src/include/utils/jsonb.h   |  2 +-
 src/include/utils/relmapper.h   |  2 +-
 src/include/utils/varbit.h  |  3 ++-
 33 files changed, 69 insertions(+), 64 deletions(-)

diff --git a/contrib/cube/cubedata.h b/contrib/cube/cubedata.h
index 5d44e11..3535847 100644
--- a/contrib/cube/cubedata.h
+++ b/contrib/cube/cubedata.h
@@ -23,11 +23,10 @@ typedef struct NDBOX
 	unsigned int header;
 
 	/*
-	 * Variable length array. The lower left coordinates for each dimension
-	 * come first, followed by upper right coordinates unless the point flag
-	 * is set.
+	 * The lower left coordinates for each dimension come first, followed
+	 * by upper right coordinates unless the point flag is set.
 	 */
-	double		x[1];
+	double		x[FLEXIBLE_ARRAY_MEMBER];
 } NDBOX;
 
 #define POINT_BIT			0x8000
diff --git a/contrib/intarray/_int.h b/contrib/intarray/_int.h
index 7f93206..d524f0f 100644
--- a/contrib/intarray/_int.h
+++ b/contrib/intarray/_int.h
@@ -73,7 +73,7 @@ typedef struct
 {
 	int32		vl_len_;		/* varlena header (do not touch directly!) */
 	int32		flag;
-	char		data[1];
+	char		data[FLEXIBLE_ARRAY_MEMBER];
 } GISTTYPE;
 
 #define ALLISTRUE		0x04
@@ -133,7 +133,7 @@ typedef struct QUERYTYPE
 {
 	int32		vl_len_;		/* varlena header (do not touch directly!) */
 	int32		size;			/* number of ITEMs */
-	ITEM		items[1];		/* variable length array */
+	ITEM		items[FLEXIBLE_ARRAY_MEMBER];
 } QUERYTYPE;
 
 #define HDRSIZEQT	offsetof(QUERYTYPE, items)
diff --git a/contrib/ltree/ltree.h

Re: [HACKERS] GSoC 2015 - mentors, students and admins.

2015-02-16 Thread Ilia Ivanicki

I'm interested in participating in gsoc 2015 as student.

Sincerely,
Ivanitskiy Ilya.

Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-16 Thread Bruce Momjian

On Sun, Feb 15, 2015 at 11:36:50AM -0500, Tom Lane wrote:
 I wonder if we couldn't achieve largely the same positive effects through
 adding a simple transaction-level timeout option.  That would be far
 easier for users to understand and manage, it would be trivial to allow
 specific high-value transactions to run with a laxer setting, it does not
 create any implementation-detail-dependent behavior that we'd be having to
 explain to users forevermore, and (not incidentally) it would be a lot
 simpler and more trustworthy to implement.  There's no well-defined
 correlation between your setting and the net effect on database bloat,
 but that's true with the snapshot too old approach as well.

While we have prided ourselves on not generating query cancel messages
due to snapshot-too-old, there is the downside of bloat.  I understand
the need to allow users to make the trade-off between bloat and query
cancellation.

It seems we already have a mechanism in place that allows tuning of
query cancel on standbys vs. preventing standby queries from seeing old
data, specifically
max_standby_streaming_delay/max_standby_archive_delay.  We obsessed
about how users were going to react to these odd variables, but there
has been little negative feedback.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-16 Thread Jim Nasby


On 2/16/15 2:41 AM, Andres Freund wrote:

Since you actually don't want cancellations for the long running
reporting queries it very much might be sensible to switch to a more
complicated version of HeapTupleSatisfiesVacuum() if there's longrunning
queries. One that can recognize if rows are actually visible to any
backend, or if there are just snapshots that see older rows. I've
previously wondered how hard this would be, but I don't think it's
*that*  hard. And it'd be a much more general solution.


Josh Berkus mentioned on IRC that they're working on improving vacuum, 
and I think it was something along these lines.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Issue installing doc tools on OSX

2015-02-16 Thread Peter Eisentraut

On 2/15/15 9:23 PM, David Steele wrote:
 That seems a bit incredible, since port should be able to resolve the
 dependencies by itself.  I suggest that this should be reported as a bug
 to MacPorts.
 
 Sure, that has been my experience, but the error message was very clear.
  Unfortunately I did not capture the error before I changed the order
 and the log file was removed on the next run.

How about uninstalling and reinstalling?



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Issue installing doc tools on OSX

2015-02-16 Thread Florian Pflug

On Feb16, 2015, at 23:18 , Peter Eisentraut pete...@gmx.net wrote:
 On 2/15/15 9:23 PM, David Steele wrote:
 That seems a bit incredible, since port should be able to resolve the
 dependencies by itself.  I suggest that this should be reported as a bug
 to MacPorts.
 
 Sure, that has been my experience, but the error message was very clear.
 Unfortunately I did not capture the error before I changed the order
 and the log file was removed on the next run.

I just tried this on OS X 10.9.5 running MacPorts 2.3.3 and a ports tree
as of a few minutes ago. The command

After uninstalling docbook, opensp and openjade, re-installing them with

  sudo port install docbook-dsssl docbook-sgml-4.2 docbook-xml-4.2 docbook-xsl
  libxslt openjade opensp

completed without a hitch. It even logged

  ---  Computing dependencies for openjade
  ---  Dependencies to be installed: opensp

BTW, why does the list of suggested packages include docbook-xml? I was
under the impression that postgres used only the SGML version of docbook.
And I previously only has the SGML version installed, and I'm pretty sure
that I was able to build the documentation successfully.

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Commit fest 2015-12 enters money time

2015-02-16 Thread Kouhei Kaigai

  I couldn't find operation to add new entry on the current CF.
 
 Go to the bottom of this CF:
 https://commitfest.postgresql.org/4/
 Then click on New patch.

I couldn't find the New patch link on both of IE and Chrome,
even though I logged in. Did you do something special?

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com


 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org
 [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Michael Paquier
 Sent: Tuesday, February 17, 2015 8:34 AM
 To: Kohei KaiGai
 Cc: PostgreSQL mailing lists
 Subject: Re: [HACKERS] Commit fest 2015-12 enters money time
 
 On Tue, Feb 17, 2015 at 7:39 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
  2015-02-16 12:25 GMT+09:00 Michael Paquier michael.paqu...@gmail.com:
 
 
  On Fri, Feb 13, 2015 at 10:06 AM, Michael Paquier
  michael.paqu...@gmail.com wrote:
 
  In order to move on to the next CF, I am going to go through all the
  remaining patches and update their status accordingly. And sorry for
  slacking a bit regarding that. If you wish to complain, of course
  feel free to post messages on this thread or on the thread related
  to the patch itself.
 
 
  As all the entries have been either moved or updated, CF 2014-12 is
  closed, and CF 2015-02 has been changed to In Progress.
 
  I tried to add my name on the CF entry for the Join pushdown support
  for foreign tables patch, however, it was unintentionally marked as
  rejected on the last CF, even though it should be returned with feedback.
 
  https://commitfest.postgresql.org/3/20/
 
  Is it available to move it to the current CF?
 
 I don't recall you can...
 
  I couldn't find operation to add new entry on the current CF.
 
 Go to the bottom of this CF:
 https://commitfest.postgresql.org/4/
 Then click on New patch.
 --
 Michael
 
 
 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]

On 2015-02-17 05:51:22 +0900, Michael Paquier wrote:
 - 0002 does the same for catalog tables
 - 0003 changes varlena in c.h. This is done as a separate patch
 because this needs some other modifications as variable-length things
 need to be placed at the end of structures, because of:
 -- rolpassword that should be placed as the last field of pg_authid
 (and shouldn't there be CATALOG_VARLEN here??)

Yes, there should.

 -- inv_api.c uses bytea in an internal structure without putting it at
 the end of the structure. For clarity I think that we should just use
 a bytea pointer and do a sufficient allocation for the duration of the
 lobj manipulation.

Hm. I don't really see the problem tbh.

There's actually is a reason the bytea is at the beginning - the other
fields are *are* the data part of the bytea (which is just the varlena
header).

 -- Similarly, tuptoaster.c needed to be patched for toast_save_datum

I'm not a big fan of these two changes. This adds some not that small
memory allocations to a somewhat hot path. Without a big win in
clarity. And it doesn't seem to have anything to do with with
FLEXIBLE_ARRAY_MEMBER.

 There are as well couple of things that are not changed on purpose:
 - in namespace.h for FuncCandidateList. I tried manipulating it but it
 resulted in allocation overflow in PortalHeapMemory

Did you change the allocation in FuncnameGetCandidates()? Note the -
sizeof(Oid) there.

 - I don't think that the t_bits fields in htup_details.h should be
 updated either.

Why not? Any not broken code should already use MinHeapTupleSize and
similar macros.

Generally it's worthwhile to check the code you changed for
sizeof(changed_struct) and similar things. Because this very well could
add bugs. And not all of them will result in a outright visibile error
like the FuncnameGetCandidates() one.

 --- a/src/include/access/htup_details.h
 +++ b/src/include/access/htup_details.h
 @@ -150,7 +150,7 @@ struct HeapTupleHeaderData
  
   /* ^ - 23 bytes - ^ */
  
 - bits8   t_bits[1];  /* bitmap of NULLs -- VARIABLE 
 LENGTH */
 + bits8   t_bits[1];  /* bitmap of NULLs */
  
   /* MORE DATA FOLLOWS AT END OF STRUCT */
  };
 @@ -579,7 +579,7 @@ struct MinimalTupleData
  
   /* ^ - 23 bytes - ^ */
  
 - bits8   t_bits[1];  /* bitmap of NULLs -- VARIABLE 
 LENGTH */
 + bits8   t_bits[1];  /* bitmap of NULLs */
  
   /* MORE DATA FOLLOWS AT END OF STRUCT */
  };

This sees like overeager search/replace.

 diff --git a/src/include/nodes/params.h b/src/include/nodes/params.h
 index 5b096c5..0ad7ef0 100644
 --- a/src/include/nodes/params.h
 +++ b/src/include/nodes/params.h
 @@ -71,7 +71,7 @@ typedef struct ParamListInfoData
   ParserSetupHook parserSetup;/* parser setup hook */
   void   *parserSetupArg;
   int numParams;  /* number of 
 ParamExternDatas following */
 - ParamExternData params[1];  /* VARIABLE LENGTH ARRAY */
 + ParamExternData params[1];
  }ParamListInfoData;


Eh?

 diff --git a/src/include/catalog/pg_attribute.h 
 b/src/include/catalog/pg_attribute.h
 index 87a3462..73bcefe 100644
 --- a/src/include/catalog/pg_attribute.h
 +++ b/src/include/catalog/pg_attribute.h
 @@ -157,13 +157,13 @@ CATALOG(pg_attribute,1249) BKI_BOOTSTRAP 
 BKI_WITHOUT_OIDS BKI_ROWTYPE_OID(75) BK
   /* NOTE: The following fields are not present in tuple descriptors. */
  
   /* Column-level access permissions */
 - aclitem attacl[1];
 + aclitem attacl[FLEXIBLE_ARRAY_MEMBER];
  
   /* Column-level options */
 - textattoptions[1];
 + textattoptions[FLEXIBLE_ARRAY_MEMBER];
  
   /* Column-level FDW options */
 - textattfdwoptions[1];
 + textattfdwoptions[FLEXIBLE_ARRAY_MEMBER];
  #endif
  } FormData_pg_attribute;

Ugh. This really isn't C at all anymore - these structs wouldn't compile
if you removed the #ifdef. Maybe we should instead a 'textarray' type
for genbki's benefit?

Generally the catalog changes are much less clearly a benefit imo.

 From ad8f54cdb5776146f17d1038bb295b5f13b549f1 Mon Sep 17 00:00:00 2001
 From: Michael Paquier mich...@otacoo.com
 Date: Mon, 16 Feb 2015 03:53:38 +0900
 Subject: [PATCH 3/3] Update varlena in c.h to use FLEXIBLE_ARRAY_MEMBER
 
 Places using a variable-length variable not at the end of a structure
 are changed with workaround, without impacting what those features do.

I vote for rejecting most of this, except a (corrected version) of the
pg_authid change. Which doesn't really belong to the rest of the patch
anyway ;)x

  #define VARHDRSZ ((int32) sizeof(int32))
 diff --git a/src/include/catalog/pg_authid.h b/src/include/catalog/pg_authid.h
 index e01e6aa..d8789a5 100644
 --- a/src/include/catalog/pg_authid.h
 +++ b/src/include/catalog/pg_authid.h
 @@ -56,8 +56,10 @@

Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-02-16 Thread Peter Geoghegan

On Mon, Feb 16, 2015 at 12:00 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 So INSERT ON CONFLICT IGNORE on a table with an exclusion constraint might
 fail. I don't like that. The point of having the command in the first place
 is to deal with concurrency issues. If it sometimes doesn't work, it's
 broken.

I don't like it either, although I think it wouldn't come up very
often with exclusion constraints - basically, it would rarely be
noticed due to the different use cases. To be honest, in suggesting
this idea I was hedging against us not figuring out a solution to that
problem in time. You didn't like my suggestion of dropping IGNORE
entirely, either. I'll do my best to come up with something, but I'm
uncomfortable that at this late stage, ON CONFLICT IGNORE support for
exclusion constraints seems like a risk to the entire project.

I ask that if push comes to shove you show some flexibility here. I'll
try my best to ensure that you don't have to even consider committing
something with a notable omission. You don't have to give me an answer
to this now.

 The idea of comparing the TIDs of the tuples as a tie-breaker seems most
 promising to me. If the conflicting tuple's TID is smaller than the inserted
 tuple's, super-delete and wait. Otherwise, wait without super-deletion.
 That's really simple. Do you see a problem with that?

No. I'll work on that, and see how it stands up to stress testing.
Come to think of it, that does seem most promising.

 BTW, it would good to explain somewhere in comments or a README the term
 unprincipled deadlock. It's been a useful concept, and hard to grasp. If
 you defined it once, with examples and everything, then you could just have
 See .../README in other places that need to refer it.

Agreed. I have described those in the revised executor README, in case
you missed that. Do you think they ought to have their own section?
Note that hash indexes have unprincipled deadlock issues, but no one
has bothered to fix them.

 Whatever works, really. I can't say that the performance implications
 of acquiring that hwlock are at the forefront of my mind. I never
 found that to be a big problem on an 8 core box, relative to vanilla
 INSERTs, FWIW - lock contention is not normal, and may be where any
 heavweight lock costs would really be encountered.

 Oh, cool. I guess the fast-path in lmgr.c kicks ass, then :-).

Seems that way. But even if that wasn't true, it wouldn't matter,
since I don't see that we have a choice.

 I don't see how storing the promise token in heap tuples buys us not
 having to involve heavyweight locking of tokens. (I think you may have
 made a thinko in suggesting otherwise)

 It wouldn't get rid of heavyweight locks, but it would allow getting rid of
 the procarray changes. The inserter could generate a token, then acquire the
 hw-lock on the token, and lastly insert the heap tuple with the correct
 token.

Do you really think that's worth the disk overhead? I generally agree
with the zero overhead principle, which is that anyone not using the
feature shouldn't pay no price for it (or vanishingly close to no
price). Can't say that I have a good sense of the added distributed
cost (if any) to be paid by adding new fields to the PGPROC struct
(since the PGXACT struct was added in 9.2). Is your only concern that
the PGPROC array will now have more fields, making it more
complicated? Surely that's a price worth paying to avoid making these
heap tuples artificially somewhat larger. You're probably left with
tuples that are at least 8 bytes larger, once alignment is taken into
consideration. That doesn't seem great.

 That couldn't work without further HeapTupleSatisfiesDirty() logic.

 Why not?

Just meant that it wasn't sufficient to check xmin == xmax, while
allowing that something like that could work with extra work (e.g. the
use of infomask bits)...

 Ok, so we can't unconditionally always ignore tuples with xmin==xmax. We
 would need an infomask bit to indicate super-deletion, and only ignore it if
 the bit is set.

...which is what you say here.

 I'm starting to think that we should bite the bullet and consume an infomask
 bit for this. The infomask bits are a scarce resource, but we should use
 them when it makes sense. It would be good for forensic purposes too, to
 leave a trace that a super-deletion happened.

 I too think the tqual.c changes are ugly. I can't see a way around
 using a token lock, though - I would only consider (and only consider)
 refining the interface by which a waiter becomes aware that it must
 wait on the outcome of the inserting xact's speculative
 insertion/value lock (and in particular, that is should not wait on
 xact outcome). We clearly need something to wait on that isn't an
 XID...heavyweight locking of a token value is the obvious way of doing
 that.


 Yeah.

 Jim Nasby said something about setting the HEAP_XMIN_INVALID hint bit.
 Maybe he is right...if that can be made to be reliable (always

Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-02-16 Thread Andrew Dunstan



On 02/16/2015 08:48 PM, Petr Jelinek wrote:

On 17/02/15 01:57, Peter Geoghegan wrote:
On Mon, Feb 16, 2015 at 4:44 PM, Petr Jelinek p...@2ndquadrant.com 
wrote:

We definitely want this feature, I wished to have this info many times.


I would still like to see a benchmark.




Average of 3 runs of read-only pgbench on my system all with 
pg_stat_statement activated:

HEAD:  20631
SQRT:  20533
SQRTD: 20592





So using sqrtd the cost is 0.18%. I think that's acceptable.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Commit fest 2015-12 enters money time

On Tue, Feb 17, 2015 at 8:50 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:
  I couldn't find operation to add new entry on the current CF.

 Go to the bottom of this CF:
 https://commitfest.postgresql.org/4/
 Then click on New patch.

 I couldn't find the New patch link on both of IE and Chrome,
 even though I logged in. Did you do something special?

Err... No. Perhaps I can do it only thanks to my superpowers :)
Well, no worries I just added it here and it has the same data:
https://commitfest.postgresql.org/4/167/
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Add min and max execute statement time in pg_stat_statement


On 21/01/15 17:32, Andrew Dunstan wrote:


On 01/21/2015 11:21 AM, Arne Scheffer wrote:




Why is it a bad thing to call the column stddev_samp analog to the
aggregate function or make a note in the documentation, that the
sample stddev is used to compute the solution?


...

But I will add a note to the documentation, that seems reasonable.



I agree this is worth mentioning in the doc.

In any case here some review from me:

We definitely want this feature, I wished to have this info many times.

The patch has couple of issues though:

The 1.2 to 1.3 upgrade file is named pg_stat_statements--1.2-1.3.sql and 
should be renamed to pg_stat_statements--1.2--1.3.sql (two dashes).


There is new sqrtd function for fast sqrt calculation, which I think is 
a good idea as it will reduce the overhead of the additional calculation 
and this is not something where little loss of precision would matter 
too much. The problem is that the new function is actually not used 
anywhere in the code, I only see use of plain sqrt.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-02-16 Thread Peter Geoghegan

On Mon, Feb 16, 2015 at 4:44 PM, Petr Jelinek p...@2ndquadrant.com wrote:
 We definitely want this feature, I wished to have this info many times.

I would still like to see a benchmark.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Issue installing doc tools on OSX

2015-02-16 Thread David Steele

On 2/16/15 6:10 PM, Florian Pflug wrote:
 On Feb16, 2015, at 23:18 , Peter Eisentraut pete...@gmx.net wrote:
 On 2/15/15 9:23 PM, David Steele wrote:
 That seems a bit incredible, since port should be able to resolve the
 dependencies by itself.  I suggest that this should be reported as a bug
 to MacPorts.

 Sure, that has been my experience, but the error message was very clear.
 Unfortunately I did not capture the error before I changed the order
 and the log file was removed on the next run.
 
 I just tried this on OS X 10.9.5 running MacPorts 2.3.3 and a ports tree
 as of a few minutes ago. The command
 
 After uninstalling docbook, opensp and openjade, re-installing them with
 
   sudo port install docbook-dsssl docbook-sgml-4.2 docbook-xml-4.2 docbook-xsl
   libxslt openjade opensp
 
 completed without a hitch. It even logged
 
   ---  Computing dependencies for openjade
   ---  Dependencies to be installed: opensp

Yeah, it worked when I did the uninstall/reinstall.

-- 
- David Steele
da...@pgmasters.net



signature.asc
Description: OpenPGP digital signature

Re: [HACKERS] Commit fest 2015-12 enters money time

On Tue, Feb 17, 2015 at 7:39 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 2015-02-16 12:25 GMT+09:00 Michael Paquier michael.paqu...@gmail.com:


 On Fri, Feb 13, 2015 at 10:06 AM, Michael Paquier
 michael.paqu...@gmail.com wrote:

 In order to move on to the next CF, I am going to go through all the
 remaining patches and update their status accordingly. And sorry for
 slacking a bit regarding that. If you wish to complain, of course feel
 free to post messages on this thread or on the thread related to the
 patch itself.


 As all the entries have been either moved or updated, CF 2014-12 is closed,
 and CF 2015-02 has been changed to In Progress.

 I tried to add my name on the CF entry for the Join pushdown support
 for foreign tables patch, however, it was unintentionally marked as
 rejected on the last CF, even though it should be returned with feedback.

 https://commitfest.postgresql.org/3/20/

 Is it available to move it to the current CF?

I don't recall you can...

 I couldn't find operation to add new entry on the current CF.

Go to the bottom of this CF:
https://commitfest.postgresql.org/4/
Then click on New patch.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Commit fest 2015-12 enters money time

2015-02-16 Thread Kouhei Kaigai

 On Tue, Feb 17, 2015 at 8:50 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:
   I couldn't find operation to add new entry on the current CF.
 
  Go to the bottom of this CF:
  https://commitfest.postgresql.org/4/
  Then click on New patch.
 
  I couldn't find the New patch link on both of IE and Chrome, even
  though I logged in. Did you do something special?
 
 Err... No. Perhaps I can do it only thanks to my superpowers :) Well, no
 worries I just added it here and it has the same data:
 https://commitfest.postgresql.org/4/167/

That's exactly superpower. Thanks for your help.
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Issue installing doc tools on OSX

2015-02-16 Thread Peter Eisentraut

On 2/16/15 6:10 PM, Florian Pflug wrote:
 BTW, why does the list of suggested packages include docbook-xml?

That is used for building the man pages.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Allow snapshot too old error, to prevent bloat

On 2015-02-16 16:35:46 -0500, Bruce Momjian wrote:
 It seems we already have a mechanism in place that allows tuning of
 query cancel on standbys vs. preventing standby queries from seeing old
 data, specifically
 max_standby_streaming_delay/max_standby_archive_delay.  We obsessed
 about how users were going to react to these odd variables, but there
 has been little negative feedback.

FWIW, I think that's a somewhat skewed perception. I think it was right to
introduce those, because we didn't really have any alternatives.

But max_standby_streaming_delay, max_standby_archive_delay and
hot_standby_feedback are among the most frequent triggers for questions
and complaints that I/we see.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] NULL checks of deferenced pointers in picksplit method of intarray

On Tue, Feb 17, 2015 at 6:49 AM, Kevin Grittner kgri...@ymail.com wrote:
 Michael Paquier michael.paqu...@gmail.com wrote:

 Coverity is pointing out that _int_split.c has unnecessary checks
 for deferenced pointers in 5 places.

 Attached is a patch to adjust those things.

 Pushed.  Thanks!

Thanks.

 Also, as far as I understood from this code, no elements
 manipulated are NULL, perhaps this is worth an assertion?

 I'm not clear where you were thinking of, but anyway that seemed
 like a separate patch if we're going to do it, so I went ahead with
 pushing the issued Coverity flagged.  The arguments to the function
 don't need such a check because the function is exposed to SQL with
 the STRICT option (but you probably already knew that).  While
 reviewing the safety of this patch the only place that I ran across
 that I felt maybe deserved an assertion was that n = 0 near the
 top of copy_intArrayType(), but that seems marginal.

Yeah, we don't do that for the other STRICT functions, let's not do it then.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] We do not need pg_promote_v4_to_v6_addr/mask

We have some code in the server that attempts to match IPv4 address
entries in pg_hba.conf to incoming connections that are in IPv6 protocol
but have addresses in the range :::: (the IPv4-in-IPv6
subrange).  As revealed by today's bug report from Hugo Osvaldo Barrera,
this code has been broken since commit f3aec2c7f51904e7 (shipped in 9.0),
as a result of sloppiness with a memcpy() source address.  How is it that
nobody noticed?

Experimentation with a RHEL6 box says that at least on Linux kernels,
such cases never arise because the kernel will not report such an address
to the postmaster.  You can ask to connect with an address like that,
viz psql -h :::7f00:0001 ... but what the kernel will tell the
postmaster is the client address is IPv4-style 127.0.0.1.

It's certainly possible that on other platforms there is a visible
distinction between :::7f00:0001 and 127.0.0.1, but the lack of field
complaints about this bug suggests that there isn't.

Moreover: suppose that on hypothetical platform X the kernel does report
such client addresses differently.  Anybody on such a platform who's used
both types of client addresses with any 9.x release must have ended up
making different pg_hba.conf entries for the two cases.  It's not
impossible that they chose to make the entries meaningfully different;
in which case they will not thank us for fixing the code so that the
distinction vanishes again.  If your platform thinks these are different
addresses then you probably do too.

In short, then, I'm having second thoughts about the quick-hack patch
I committed earlier to fix the memcpy thinko.  On platforms where it has
any effect at all, that effect will be to make a subtle and perhaps
security-relevant change to the interpretation of pg_hba.conf entries,
changing what the postmaster has treated them as meaning since 9.0.0.
That doesn't sound like the kind of thing we want to do in minor releases.

Therefore, I now think what we ought to do, both in HEAD and in the back
branches, is rip out the #ifdef HAVE_IPV6 stanza in check_ip(), delete
pg_promote_v4_to_v6_addr and pg_promote_v4_to_v6_mask which will thereby
become unused, and remove the single sentence in the manual that claims
that IPv4-in-IPv6 connections will match IPv4 pg_hba.conf entries.  This
amounts to accepting our behavior since 9.0 as being correct.  If there
was ever a live need for the older behavior, its time has evidently
passed.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Join push-down support for foreign tables

2015-02-16 Thread Kouhei Kaigai

Let me put some comments in addition to where you're checking now.

[design issues]
* Cost estimation
Estimation and evaluation of cost for remote join query is not an
obvious issue. In principle, local side cannot determine the cost
to run remote join without remote EXPLAIN, because local side has
no information about JOIN logic applied on the remote side.
Probably, we have to put an assumption for remote join algorithm,
because local planner has no idea about remote planner's choice
unless foreign-join don't take use_remote_estimate.
I think, it is reasonable assumption (even if it is incorrect) to
calculate remote join cost based on local hash-join algorithm.
If user wants more correct estimation, remote EXPLAIN will make
more reliable cost estimation.

It also needs a consensus whether cost for remote CPU execution is
equivalent to local CPU. If we think local CPU is rare resource
than remote one, a discount rate will make planner more preferable
to choose remote join than local one.

Once we assume a join algorithm for remote join, unit cost for
remote CPU, we can calculate a cost for foreign join based on
the local join logic plus cost for network translation (maybe
fdw_tuple_cost?).


* FDW options
Unlike table scan, FDW options we should refer is unclear.
Table level FDW options are associated with a foreign table as
literal. I think we have two options here:
1. Foreign-join refers FDW options for foreign-server, but ones
   for foreign-tables are ignored.
2. Foreign-join is prohibited when both of relations don't have
   identical FDW options.
My preference is 2. Even though N-way foreign join, it ensures
all the tables involved with (N-1)-way foreign join has identical
FDW options, thus it leads we can make N-way foreign join with
all identical FDW options.
One exception is updatable flag of postgres_fdw. It does not
make sense on remote join, so I think mixture of updatable and
non-updatable foreign tables should be admitted, however, it is
a decision by FDW driver.

Probably, above points need to take time for getting consensus.
I'd like to see your opinion prior to editing your patch.

[implementation issues]
The interface does not intend to add new Path/Plan type for each scan
that replaces foreign joins. What postgres_fdw should do is, adding
ForeignPath towards a particular joinrel, then it populates ForeignScan
with remote join query once it got chosen by the planner.

A few functions added in src/backend/foreign/foreign.c are not
called by anywhere, at this moment.

create_plan_recurse() is reverted to static. It is needed for custom-
join enhancement, if no other infrastructure can support.

I'll check the code to construct remote query later.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com


 -Original Message-
 From: Shigeru Hanada [mailto:shigeru.han...@gmail.com]
 Sent: Monday, February 16, 2015 1:54 PM
 To: Kaigai Kouhei(海外 浩平)
 Cc: Robert Haas; PostgreSQL-development
 Subject: ##freemail## Re: [HACKERS] Join push-down support for foreign
 tables
 
 Kaigai-san,
 
 Oops.  I rebased the patch onto your v4 custom/foreign join patch.
 But as you mentioned off-list, I found a flaw about inappropriate change
 about NestPath still remains in the patch...  I might have made my dev branch
 into unexpected state.  I'll check it soon.
 
 2015-02-16 13:13 GMT+09:00 Kouhei Kaigai kai...@ak.jp.nec.com:
  Hanada-san,
 
  Your patch mixtures enhancement of custom-/foreign-scan interface and
  enhancement of contrib/postgres_fdw... Probably, it is a careless mis-
  operation.
  Please make your patch as differences from my infrastructure portion.
 
 
  Also, I noticed this Join pushdown support for foreign tables patch
  is unintentionally rejected in the last commit fest.
https://commitfest.postgresql.org/3/20/
  I couldn't register myself as reviewer. How do I operate it on the new
  commitfest application?
 
  Thanks,
  --
  NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
  kai...@ak.jp.nec.com
 
 
  -Original Message-
  From: pgsql-hackers-ow...@postgresql.org
  [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Shigeru
  Hanada
  Sent: Monday, February 16, 2015 1:03 PM
  To: Robert Haas
  Cc: PostgreSQL-development
  Subject: Re: [HACKERS] Join push-down support for foreign tables
 
  Hi
 
  I've revised the patch based on Kaigai-san's custom/foreign join
  patch posted in the thread below.
 
 
 http://www.postgresql.org/message-id/9A28C8860F777E439AA12E8AEA7694F8
  0
  108c...@bpxm15gp.gisp.nec.co.jp
 
  Basically not changed from the version in the last CF, but as Robert
  commented before, N-way (not only 2-way) joins should be supported in
  the first version by construct SELECT SQL by containing source query
  in FROM clause as inline views (a.k.a. from clause subquery).
 
  2014-12-26 13:48 GMT+09:00 Shigeru Hanada shigeru.han...@gmail.com:
   2014-12-16 1:22 GMT+09:00 Robert Haas robertmh...@gmail.com:
   On

Re: [HACKERS] Add min and max execute statement time in pg_stat_statement


On 17/02/15 01:57, Peter Geoghegan wrote:

On Mon, Feb 16, 2015 at 4:44 PM, Petr Jelinek p...@2ndquadrant.com wrote:

We definitely want this feature, I wished to have this info many times.


I would still like to see a benchmark.




Average of 3 runs of read-only pgbench on my system all with 
pg_stat_statement activated:

HEAD:  20631
SQRT:  20533
SQRTD: 20592


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] We do not need pg_promote_v4_to_v6_addr/mask

2015-02-16 Thread Andrew Dunstan



On 02/16/2015 09:07 PM, Tom Lane wrote:

I wrote:

We have some code in the server that attempts to match IPv4 address
entries in pg_hba.conf to incoming connections that are in IPv6 protocol
but have addresses in the range :::: (the IPv4-in-IPv6
subrange).  As revealed by today's bug report from Hugo Osvaldo Barrera,
this code has been broken since commit f3aec2c7f51904e7 (shipped in 9.0),
as a result of sloppiness with a memcpy() source address.  How is it that
nobody noticed?

BTW, a bit of digging in the git logs and mail archives says that the code
in question was originally added in 7.4 (commit 3c9bb8886df7d56a), in
response to this discussion:
http://www.postgresql.org/message-id/flat/200309012156.05874.t.maekit...@epgmbh.de

So back in 2003 there were Linux boxes that actively transformed IPv4
connection addresses to :::: format.  Current Linux behavior
is the exact opposite: even if you try to say :::: in a
connection request, IPv4 is what comes out the other end.  I find the same
on current OS X btw.  So I'm definitely now of the opinion that this is a
workaround for a long-deceased Linux kernel bug, and not something we need
to continue^X^X^Xresume supporting.




Wow, talk about a walk down memory lane. Apparently that thread 
triggered my rewrite of initdb - I'd totally forgotten that.


The proposed course sounds reasonable enough.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]

Andres Freund and...@2ndquadrant.com writes:
 On 2015-02-17 05:51:22 +0900, Michael Paquier wrote:
 diff --git a/src/include/catalog/pg_authid.h 
 b/src/include/catalog/pg_authid.h
 index e01e6aa..d8789a5 100644
 --- a/src/include/catalog/pg_authid.h
 +++ b/src/include/catalog/pg_authid.h
 @@ -56,8 +56,10 @@ CATALOG(pg_authid,1260) BKI_SHARED_RELATION 
 BKI_ROWTYPE_OID(2842) BKI_SCHEMA_MAC
 int32rolconnlimit;   /* max connections allowed (-1=no 
 limit) */
 
 /* remaining fields may be null; use heap_getattr to read them! */
 -textrolpassword;/* password, if any */
 timestamptz rolvaliduntil;   /* password expiration time, if any */
 +#ifdef CATALOG_VARLEN
 +textrolpassword;/* password, if any */
 +#endif
 } FormData_pg_authid;

 That change IIRC is wrong, because it'll make rolvaliduntil until NOT
 NULL (any column that's fixed width and has only fixed with columns
 before it is marked as such).

You were muttering about a BKI_FORCE_NOT_NULL option ... for symmetry,
maybe we could add BKI_FORCE_NULL as well, and then use that for cases
like this?  Also, if we want to insist that these fields be accessed
through heap_getattr, I'd be inclined to put them inside the #ifdef
CATALOG_VARLEN to enforce that.

I'm generally -1 on reordering any catalog columns as part of this patch.
There should be zero user-visible change from it IMO.  However, if we
stick both those columns inside the ifdef, we don't need to reorder.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Manipulating complex types as non-contiguous structures in-memory

Attached is an 0.3 version, rebased over today's HEAD changes (applies to
commit 9e3ad1aac52454569393a947c06be0d301749362 or later), and with some
better logic for transferring expanded array values into and out of plpgsql
functions.  Using this example:

create or replace function arraysetnum(n int) returns numeric[] as $$
declare res numeric[] := '{}';
begin
  for i in 1 .. n loop
res[i] := i;
  end loop;
  return res;
end
$$ language plpgsql strict;

create or replace function arraysumnum(arr numeric[]) returns numeric as $$
declare res numeric := 0;
begin
  for i in array_lower(arr, 1) .. array_upper(arr, 1) loop
res := res + arr[i];
  end loop;
  return res;
end
$$ language plpgsql strict;

create or replace function arraytimenum(n int) returns numeric as $$
declare tmp numeric[];
begin
  tmp := arraysetnum(n);
  return arraysumnum(tmp);
end
$$ language plpgsql strict;

either of the test cases

select arraysumnum(arraysetnum(10));
select arraytimenum(10);

involve exactly one coercion from flat to expanded array (during the
initial assignment of the '{}' constant to the res variable), no
coercions from expanded to flat, and no bulk copy operations.

So I'm starting to feel fairly good about this.  Obviously there's a
nontrivial amount of work to do with integrating the array-code changes
and teaching the rest of the array functions about expanded arrays (or
at least as many of them as seem performance-critical).  But that looks
like just a few days of basically-mechanical effort.  A larger question
is what we ought to do about extending the array-favoring hacks in plpgsql
to support this type of optimization for non-built-in types.

Realize that what this patch is able to improve are basically two types
of cases:

* nests of function calls that take and return the same complex datatype,
think foo(bar(baz(x))), where x is stored in some flat format but foo()
bar() and baz() all agree on an expanded format that's easier to process.

* plpgsql variables stored in an expanded format that's easier to process
for most functions that might work with their values.

The first case can be implemented by mutual agreement among the functions
of the datatype; it does not need any additional help beyond what's in
this patch.  But the second case does not work very well unless plpgsql
takes some proactive step to force variable values into the expanded
format.  Otherwise you get a win only if the last assignment to the
variable happened to come from a source that supplied a read-write
expanded value.  You can make that happen with appropriate coding in
the plpgsql function, of course, but it's klugy to have to do that.

I would not be ashamed to ship this in 9.5 as just an array optimization
and leave the larger question for next time ... but it does feel a bit
unfinished like this.  OTOH, I'm not sure whether the PostGIS folk care
all that much about the intermediate-values-in-plpgsql-variables
scenario.  They didn't bring it up in the discussion a year or so back
about their requirements.  We do know very well that plpgsql array
variables are a performance pain point, so maybe fixing that is enough
of a goal for 9.5.

(BTW, the nested-function-calls case sure seems like it's dead center
of the wheelhouse for JSONB.  Just sayin'.  I do not myself have time
to think about applying this technology to JSONB right now, but does
anyone else want to step up?)

regards, tom lane

diff --git a/src/backend/access/common/heaptuple.c b/src/backend/access/common/heaptuple.c
index 867035d..860ad78 100644
*** a/src/backend/access/common/heaptuple.c
--- b/src/backend/access/common/heaptuple.c
***
*** 60,65 
--- 60,66 
  #include access/sysattr.h
  #include access/tuptoaster.h
  #include executor/tuptable.h
+ #include utils/expandeddatum.h
  
  
  /* Does att's datatype allow packing into the 1-byte-header varlena format? */
*** heap_compute_data_size(TupleDesc tupleDe
*** 93,105 
  	for (i = 0; i  numberOfAttributes; i++)
  	{
  		Datum		val;
  
  		if (isnull[i])
  			continue;
  
  		val = values[i];
  
! 		if (ATT_IS_PACKABLE(att[i]) 
  			VARATT_CAN_MAKE_SHORT(DatumGetPointer(val)))
  		{
  			/*
--- 94,108 
  	for (i = 0; i  numberOfAttributes; i++)
  	{
  		Datum		val;
+ 		Form_pg_attribute atti;
  
  		if (isnull[i])
  			continue;
  
  		val = values[i];
+ 		atti = att[i];
  
! 		if (ATT_IS_PACKABLE(atti) 
  			VARATT_CAN_MAKE_SHORT(DatumGetPointer(val)))
  		{
  			/*
*** heap_compute_data_size(TupleDesc tupleDe
*** 108,118 
  			 */
  			data_length += VARATT_CONVERTED_SHORT_SIZE(DatumGetPointer(val));
  		}
  		else
  		{
! 			data_length = att_align_datum(data_length, att[i]-attalign,
! 		  att[i]-attlen, val);
! 			data_length = att_addlength_datum(data_length, att[i]-attlen,
  			  val);
  		}
  	}
--- 111,131 
  			 */
  			data_length += VARATT_CONVERTED_SHORT_SIZE(DatumGetPointer(val));
  		}
+

Re: [HACKERS] Add min and max execute statement time in pg_stat_statement


On 17/02/15 03:03, Andrew Dunstan wrote:


On 02/16/2015 08:57 PM, Andrew Dunstan wrote:


On 02/16/2015 08:48 PM, Petr Jelinek wrote:

On 17/02/15 01:57, Peter Geoghegan wrote:

On Mon, Feb 16, 2015 at 4:44 PM, Petr Jelinek p...@2ndquadrant.com
wrote:

We definitely want this feature, I wished to have this info many
times.


I would still like to see a benchmark.




Average of 3 runs of read-only pgbench on my system all with
pg_stat_statement activated:
HEAD:  20631
SQRT:  20533
SQRTD: 20592





So using sqrtd the cost is 0.18%. I think that's acceptable.



Actually, sqrt/sqrtd is not called in accumulating the stats, only in
the reporting function. So it looks like the difference here should be
noise. Maybe we need some longer runs.



Yes there are variations between individual runs so it might be really 
just that, I can leave it running for much longer time tomorrow.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] We do not need pg_promote_v4_to_v6_addr/mask

I wrote:
 We have some code in the server that attempts to match IPv4 address
 entries in pg_hba.conf to incoming connections that are in IPv6 protocol
 but have addresses in the range :::: (the IPv4-in-IPv6
 subrange).  As revealed by today's bug report from Hugo Osvaldo Barrera,
 this code has been broken since commit f3aec2c7f51904e7 (shipped in 9.0),
 as a result of sloppiness with a memcpy() source address.  How is it that
 nobody noticed?

BTW, a bit of digging in the git logs and mail archives says that the code
in question was originally added in 7.4 (commit 3c9bb8886df7d56a), in
response to this discussion:
http://www.postgresql.org/message-id/flat/200309012156.05874.t.maekit...@epgmbh.de

So back in 2003 there were Linux boxes that actively transformed IPv4
connection addresses to :::: format.  Current Linux behavior
is the exact opposite: even if you try to say :::: in a
connection request, IPv4 is what comes out the other end.  I find the same
on current OS X btw.  So I'm definitely now of the opinion that this is a
workaround for a long-deceased Linux kernel bug, and not something we need
to continue^X^X^Xresume supporting.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-02-16 Thread Peter Geoghegan

On Mon, Feb 16, 2015 at 4:11 PM, Peter Geoghegan p...@heroku.com wrote:
 Jim Nasby said something about setting the HEAP_XMIN_INVALID hint bit.
 Maybe he is right...if that can be made to be reliable (always
 WAL-logged), it could be marginally better than setting xmin to
 invalidTransactionId.


 I'm not a big fan of that. The xmin-invalid bit is currently always just a
 hint.

 Well, Andres makes the point that that isn't quite so.

Hmm. So the tqual.c routines actually check if
(HeapTupleHeaderXminInvalid(tuple)). Which is:

#define HeapTupleHeaderXminInvalid(tup) \
( \
((tup)-t_infomask  (HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID)) == \
HEAP_XMIN_INVALID \
)

What appears to happen if I try to only use the HEAP_XMIN_INVALID bit
(and not explicitly set xmin to InvalidTransactionId, and not
explicitly check that xmin isn't InvalidTransactionId in each
HeapTupleSatisfies* routine) is that after a little while, Jeff Janes'
tool shows up spurious unique violations, as if some tuple has become
visible when it shouldn't have. I guess this is because the
HEAP_XMIN_COMMITTED hint bit can still be set, which in effect
invalidates the HEAP_XMIN_INVALID hint bit.

It takes about 2 minutes before this happens for the first time when
fsync = off, following a fresh initdb, which is unacceptable. I'm not
sure if it's worth trying to figure out how HEAP_XMIN_COMMITTED gets
set. Not that I'm 100% sure that that's what this really is; it just
seems very likely.

Attached broken patch (broken_visible.patch) shows the changes made to
reveal this problem. Only the changes to tqual.c and heap_delete()
should matter here, since I did not test recovery.

I also thought about unifying the check for if
(TransactionIdEquals(HeapTupleHeaderGetRawXmin(tuple),
InvalidTransactionId)) with the conventional
HeapTupleHeaderXminInvalid() macro, and leaving everything else as-is.
This is no good either, and for similar reasons - control often won't
reach the macro, which is behind a check of if
(!HeapTupleHeaderXminCommitted(tuple)).

The best I think we can hope for is having a dedicated if
(TransactionIdEquals(HeapTupleHeaderGetRawXmin(tuple),
InvalidTransactionId)) macro HeapTupleHeaderSuperDeleted() to do the
work each time, which does not need to be checked so often. A second
attached patch (compact_tqual_works.patch - which is non-broken,
AFAICT) shows how this is possible, while also moving the check
further down within each tqual.c routine (which seems more in keeping
with the fact that super deletion is a relatively obscure concept). I
haven't been able to break this variant using my existing test suite,
so this seems promising as a way of reducing tqual.c clutter. However,
as I said the other day, this is basically just polish.

-- 
Peter Geoghegan
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 0aa3e57..b777c56 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2899,7 +2899,7 @@ l1:
 	}
 	else
 	{
-		HeapTupleHeaderSetXmin(tp.t_data, InvalidTransactionId);
+		HeapTupleHeaderSetXminInvalid(tp.t_data);
 	}
 
 	/* Make sure there is no forward chain link in t_ctid */
@@ -7382,12 +7382,12 @@ heap_xlog_delete(XLogReaderState *record)
 		htup-t_infomask = ~(HEAP_XMAX_BITS | HEAP_MOVED);
 		htup-t_infomask2 = ~HEAP_KEYS_UPDATED;
 		HeapTupleHeaderClearHotUpdated(htup);
+		if (xlrec-flags  XLOG_HEAP_KILLED_SPECULATIVE_TUPLE)
+			HeapTupleHeaderSetXminInvalid(htup);
+
 		fix_infomask_from_infobits(xlrec-infobits_set,
    htup-t_infomask, htup-t_infomask2);
-		if (!(xlrec-flags  XLOG_HEAP_KILLED_SPECULATIVE_TUPLE))
-			HeapTupleHeaderSetXmax(htup, xlrec-xmax);
-		else
-			HeapTupleHeaderSetXmin(htup, InvalidTransactionId);
+		HeapTupleHeaderSetXmax(htup, xlrec-xmax);
 		HeapTupleHeaderSetCmax(htup, FirstCommandId, false);
 
 		/* Mark the page as a candidate for pruning */
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index 99bb417..fd857b1 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -170,13 +170,6 @@ HeapTupleSatisfiesSelf(HeapTuple htup, Snapshot snapshot, Buffer buffer)
 	Assert(ItemPointerIsValid(htup-t_self));
 	Assert(htup-t_tableOid != InvalidOid);
 
-	/*
-	 * Never return super-deleted tuples
-	 */
-	if (TransactionIdEquals(HeapTupleHeaderGetRawXmin(tuple),
-			InvalidTransactionId))
-		return false;
-
 	if (!HeapTupleHeaderXminCommitted(tuple))
 	{
 		if (HeapTupleHeaderXminInvalid(tuple))
@@ -735,15 +728,6 @@ HeapTupleSatisfiesDirty(HeapTuple htup, Snapshot snapshot,
 	snapshot-xmin = snapshot-xmax = InvalidTransactionId;
 	snapshot-speculativeToken = 0;
 
-	/*
-	 * Never return super-deleted tuples
-	 *
-	 * XXX:  Comment this code out and you'll get conflicts within
-	 * ExecLockUpdateTuple(), which result in an infinite loop.
-	 */
-	if (TransactionIdEquals(HeapTupleHeaderGetRawXmin(tuple),
-			InvalidTransactionId))
-		return false;
 
 	if

Re: [HACKERS] We do not need pg_promote_v4_to_v6_addr/mask

Andrew Dunstan and...@dunslane.net writes:
 On 02/16/2015 09:07 PM, Tom Lane wrote:
 BTW, a bit of digging in the git logs and mail archives says that the code
 in question was originally added in 7.4 (commit 3c9bb8886df7d56a), in
 response to this discussion:
 http://www.postgresql.org/message-id/flat/200309012156.05874.t.maekit...@epgmbh.de

 Wow, talk about a walk down memory lane.

Tell me about it ;-).  I was entirely astonished to discover that the
original author of the promote_v4_to_v6 code was moi.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_basebackup -x/X doesn't play well with archive_mode wal_keep_segments

2015-02-16 Thread Fujii Masao

On Fri, Feb 13, 2015 at 9:18 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2015-02-12 11:44:05 -0800, Sergey Konoplev wrote:
 On Thu, Feb 12, 2015 at 11:40 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  This obviously should not be the case. I'll have a look in a couple of 
  hours. Until then you can likely  just work around the problem by creating 
  the archive_status directory.

 Thank you. Just let me know if you need some extra info or debugging.

 No need for debugging. It's plain and simply a (cherry-pick) conflict I
 resolved wrongly during backpatching. 9.3, 9.4 and master do not have
 that problem. That whole fix was quite painful because every single
 release had significantly different code :(. pg_basebackup/ is pretty
 messy.
 I'm not sure why my testsuite didn't trigger that problem. Possibly
 because a retry makes things work :(

 Somewhat uckily it's 9.2 only (9.3, 9.4 and master look correct, earlier
 releases don't have pg_receivexlog)

Are you planning to back-patch the fix to 9.2?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-02-16 Thread Andrew Dunstan



On 02/16/2015 08:57 PM, Andrew Dunstan wrote:


On 02/16/2015 08:48 PM, Petr Jelinek wrote:

On 17/02/15 01:57, Peter Geoghegan wrote:
On Mon, Feb 16, 2015 at 4:44 PM, Petr Jelinek p...@2ndquadrant.com 
wrote:
We definitely want this feature, I wished to have this info many 
times.


I would still like to see a benchmark.




Average of 3 runs of read-only pgbench on my system all with 
pg_stat_statement activated:

HEAD:  20631
SQRT:  20533
SQRTD: 20592





So using sqrtd the cost is 0.18%. I think that's acceptable.



Actually, sqrt/sqrtd is not called in accumulating the stats, only in 
the reporting function. So it looks like the difference here should be 
noise. Maybe we need some longer runs.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Add min and max execute statement time in pg_stat_statement


On 17/02/15 02:57, Andrew Dunstan wrote:


On 02/16/2015 08:48 PM, Petr Jelinek wrote:

On 17/02/15 01:57, Peter Geoghegan wrote:

On Mon, Feb 16, 2015 at 4:44 PM, Petr Jelinek p...@2ndquadrant.com
wrote:

We definitely want this feature, I wished to have this info many times.


I would still like to see a benchmark.




Average of 3 runs of read-only pgbench on my system all with
pg_stat_statement activated:
HEAD:  20631
SQRT:  20533
SQRTD: 20592





So using sqrtd the cost is 0.18%. I think that's acceptable.



I think so too.

I found one more issue with the 1.2--1.3 upgrade script, the DROP 
FUNCTION pg_stat_statements(); should be DROP FUNCTION 
pg_stat_statements(bool); since in 1.2 the function identity has changed.



--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-02-16 Thread Kyotaro HORIGUCHI

Hello, I had a look on gram.y and found other syntaxes using WITH
option clause.

At Wed, 11 Feb 2015 14:34:17 -0600, Jim Nasby jim.na...@bluetreble.com wrote 
in 54dbbcc9.1020...@bluetreble.com
 I suspect at least some of this stems from how command line programs
 tend to process options before arguments. I tend to agree with you
 Tom, but I think what's more important is that we're consistent. COPY
 is already a bit of an oddball because it uses WITH, but both EXPLAIN
 and VACUUM use parenthesis immediately after the first
 verb. Introducing a parenthesis version that goes at the end instead
 of the beginning is just going to make this worse.
 
 If we're going to take a stand on this, we need to do it NOW, before
 we have even more commands that use ().
 
 I know you were worried about accepting options anywhere because it
 leads to reserved words, but perhaps we could support it just for
 EXPLAIN and VACUUM, and then switch to trailing options if people
 think that would be better.

I agree with the direction, but I see two issues here; how many
syntax variants we are allowed to have for one command at a time,
and how far we should/may extend the unified options syntax on
other commands.


Let me put the issues aside for now, VACUUM can have trailing
options naturally but it seems to change, but, IMHO, EXPLAIN
should have the target statement at the tail end. Are you
thinking of syntaxes like following?

  VACUUM [FULL] [FREEZE] ... [ANALYZE] [tname [(cname, ...)]
| VACUUM [({FULL [bool]|FREEZE [bool]|...}[,...])] [tname [(cname, ...)]
| VACUUM [tname [(cname, ...)] [[WITH ]({FULL [bool]|FREEZE [bool]|...})]

REINDEX [{INDEX|TABLE|...}] name [[WITH] (VERBOSE [bool]|...)]

EXPLAIN [[WITH] ({ANALYZE [bool]|VERBOSE [bool]|... [,...]})] statement

For concrete examples, the lines prefixed by asterisk are in new
syntax.

 VACUUM FULL table1;
 VACUUM ANALYZE table1 (col1);
 VACUUM (ANALYZE, VERBOSE) table1 (col1);
*VACUUM table1 WITH (FREEZE on)
*VACUUM table1 (cola) WITH (ANALYZE)
*VACUUM table1 WITH (ANALYZE)
*VACUUM table1 (FREEZE on)

The fifth example looks quite odd.


 REINDEX INDEX index1 FORCE;
*REINDEX TABLE table1 WITH (VERBOSE on);
*REINDEX TABLE table1 (VERBOSE on, FORCE on);
 
 EXPLAIN (ANALYZE) SELECT 1;
*EXPLAIN WITH (ANALYZE) SELECT 1;

The WITH looks a bit uneasy..


 COPY table1 FROM 'file.txt' WITH (FORMAT csv);


Returning to the second issue, the following statements have
option list (or single option) preceded (or not preceded) by the
word WITH. The prefixing dollar sign indicates that the syntax is
of SQL standard according to the PostgreSQL
Documentation. Asterisk indicates that the line shows the syntax
if the new policy is applied. Other few statements like DECLARE
looks using WITH as a part of an idiom.


 CREATE MATERIALIZED VIEW mv1 WITH (storageparam, ...) AS qry WITH DATA;

This is similar to EXPLAIN in the sense that a query follows
it, but this syntax can have the second WITH follows by DATA.


 CREATE EXTENSION ext1 WITH SCHEMA s1 VERSION v1 FROM over;
*CREATE EXTENSION ext1 WITH (SCHEMA s1, VERSION v1, FROM over);

This seems to fit the unification.


 CREATE ROLE role WITH LOGIN;
 CREATE ROLE role SUPERUSER, LOGIN;
$CREATE ROLE role WITH ADMIN admin;
*CREATE ROLE role WITH (SUPERUSER, LOGIN);
*CREATE ROLE role (SUPERUSER, LOGIN);

This seems meaninglessly too complecated.


 GRANT  WITH GRANT OPTION;
*GRANT  WITH (GRANT on);

Mmm. Seems no reasoning...


 CREATE VIEW v1 AS qry WITH CASCADED CHECK OPTION;
*CREATE VIEW v1 AS qry WITH (CASCADED_CHECK);

Wired syntax?


 ALTER DATABASE db1 WITH CONNECTION LIMIT 50;
*ALTER DATABASE db1 WITH (CONNECTION_LIMIT 50);

Hardly looks reasonable..


$DECLARE c1 INSENSITIVE SCROLL CURSOR WITH HOLD;

This cannot have another style.



Mmm... I'm at a loss what is desirable..

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GSoC 2015 - mentors, students and admins.

2015-02-16 Thread Thom Brown

On 13 February 2015 at 15:40, Thom Brown t...@linux.com wrote:


 On 12 February 2015 at 14:55, Alexander Korotkov aekorot...@gmail.com
 wrote:

 Hi!

 On Mon, Feb 9, 2015 at 11:52 PM, Thom Brown t...@linux.com wrote:

 Google Summer of Code 2015 is approaching.  I'm intending on registering
 PostgreSQL again this year.

 Before I do that, I'd like to have an idea of how many people are
 interested in being either a student or a mentor.


 I'm ready to participate as mentor again!


 Thanks for the interest from mentors and students so far.

 Could I interest more to volunteer as mentors this year?  I'll be
 registering the project with Google soon so I'll want an idea of how many
 mentors we'll have available.


Any more mentors wish to step forward?  Please let me know A.S.A.P.

Thanks

Thom

Re: [HACKERS] Commit fest 2015-12 enters money time

2015-02-16 Thread Kohei KaiGai

2015-02-16 12:25 GMT+09:00 Michael Paquier michael.paqu...@gmail.com:


 On Fri, Feb 13, 2015 at 10:06 AM, Michael Paquier
 michael.paqu...@gmail.com wrote:

 In order to move on to the next CF, I am going to go through all the
 remaining patches and update their status accordingly. And sorry for
 slacking a bit regarding that. If you wish to complain, of course feel
 free to post messages on this thread or on the thread related to the
 patch itself.


 As all the entries have been either moved or updated, CF 2014-12 is closed,
 and CF 2015-02 has been changed to In Progress.

I tried to add my name on the CF entry for the Join pushdown support
for foreign tables patch, however, it was unintentionally marked as
rejected on the last CF, even though it should be returned with feedback.

https://commitfest.postgresql.org/3/20/

Is it available to move it to the current CF?
I couldn't find operation to add new entry on the current CF.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-16 Thread Magnus Hagander

On Feb 17, 2015 12:26 AM, Andres Freund and...@2ndquadrant.com wrote:

 On 2015-02-16 16:35:46 -0500, Bruce Momjian wrote:
  It seems we already have a mechanism in place that allows tuning of
  query cancel on standbys vs. preventing standby queries from seeing old
  data, specifically
  max_standby_streaming_delay/max_standby_archive_delay.  We obsessed
  about how users were going to react to these odd variables, but there
  has been little negative feedback.

 FWIW, I think that's a somewhat skewed perception. I think it was right to
 introduce those, because we didn't really have any alternatives.

 But max_standby_streaming_delay, max_standby_archive_delay and
 hot_standby_feedback are among the most frequent triggers for questions
 and complaints that I/we see.


Agreed.

And a really bad one used to be vacuum_defer_cleanup_age, because of
confusing units amongst other things. Which in terms seems fairly close to
Kevins suggestions, unfortunately.

/Magnus

Re: [HACKERS] Really bad blowups with hash outer join and nulls

2015-02-16 Thread Tomas Vondra

On 16.2.2015 03:38, Andrew Gierth wrote:
 Tomas == Tomas Vondra tomas.von...@2ndquadrant.com
 writes:

 Tomas Improving the estimates is always good, but it's not going
 to Tomas fix the case of non-NULL values (it shouldn't be all
 that Tomas difficult to create such examples with a value whose
 hash starts Tomas with a bunch of zeroes).

 Right now, I can't get it to plan such an example, because (a) if
 there are no stats to work from then the planner makes fairly
 pessimistic assumptions about hash bucket filling, and (b) if
 there _are_ stats to work from, then a frequently-occurring
 non-null value shows up as an MCV and the planner takes that into
 account to calculate bucketsize.

 The problem could only be demonstrated for NULLs because the
 planner was ignoring NULL for the purposes of estimating
 bucketsize, which is correct for all join types except RIGHT and
 FULL (which, iirc, are more recent additions to the hashjoin
 repertoire).

Oh, right, the estimate fix is probably sufficient then.


 If you want to try testing it, you may find this useful:

 select i, hashint8(i) from unnest(array[1474049294, -1779024306,
-1329041947]) u(i);
 i  | hashint8 -+-- 1474049294 |0
 -1779024306 |0 -1329041947 |0 (3 rows)

 (those are the only three int4 values that hash to exactly 0)

 It's probably possible to construct pathological cases by finding
 a lot of different values with zeros in the high bits of the hash,
 but that's something that wouldn't be likely to happen by chance.

Yeah, it's probably possible, but it's admittedly considerably harder
than I initially thought. For example it could be possible to create
the table with no MCV values but sorted so that all the initial values
have hashvalue=0, triggering (growEnabled=false). But that's rather
unlikely to happen in practice I guess.

A more likely failure scenario is a hash join higher up the plan,
processing results of other joins etc. In that case the estimates will
be tricky, although the planner chooses quite pessimistic defaults in
those cases.

-- 
Tomas Vondrahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GSoC 2015 - mentors, students and admins.

2015-02-16 Thread Dmitry Dolgov

I would like to participate as student.

On 10 February 2015 at 03:52, Thom Brown t...@linux.com wrote:

 Hi all,

 Google Summer of Code 2015 is approaching.  I'm intending on registering
 PostgreSQL again this year.

 Before I do that, I'd like to have an idea of how many people are
 interested in being either a student or a mentor.

 I've volunteered to be admin again, but if anyone else has a strong
 interest of seeing the projects through this year, please let yourself be
 known.

 Who would be willing to mentor projects this year?

 Project ideas are welcome, and I've copied many from last year's
 submissions into this year's wiki page.  Please feel free to add more (or
 delete any that stand no chance or are redundant):
 https://wiki.postgresql.org/wiki/GSoC_2015

 Students can find more information about GSoC here:
 http://www.postgresql.org/developer/summerofcode

 Thanks

 Thom

Re: [HACKERS] Allow snapshot too old error, to prevent bloat

On 2015-02-16 05:19:11 +, Kevin Grittner wrote:
 Tom Lane t...@sss.pgh.pa.us wrote:
  Jim Nasby jim.na...@bluetreble.com writes:
  On 2/15/15 10:36 AM, Tom Lane wrote:
  I wonder if we couldn't achieve largely the same positive effects through
  adding a simple transaction-level timeout option.
 
 We suggested this to our customer and got out of the meeting with
 it looking like it *might* fly.  In the next meeting, however, they
 said they had run it by others and reviewed the code and it was
 completely unacceptable -- they would not consider pg with this as
 the solution.
 
  A common use-case is long-running reports hitting relatively stable data
  in a database that also has tables with a high churn rate (ie: queue
  tables). In those scenarios your only options right now are to suffer
  huge amounts of bloat in the high-churn or not do your reporting. A
  simple transaction timeout only solves this by denying you reporting
  queries.
 
  Agreed, but Kevin's proposal has exactly the same problem only worse,
  because (a) the reporting transactions might or might not fail (per
  Murphy, they'll fail consistently only when you're on deadline), and
  (b) if they do fail, there's nothing you can do short of increasing the
  slack db-wide.
 
 These they were comfortable with, and did *not* consider to be
 unpredictable or something they could not do something about.
 I really don't feel I can say more than that, though, without
 disclosing more than I should.

I agree that we need to do something about the dangers of long
snapshots, I'm not sure this is it. I'm not sure of the contrary either.

What I'm definitely not a fan of though, is this implementation. Having
to add checks to a large number of places is a recipe for silently wrong
query results.


One thing I was wondering about recently was introducing an optimization
for repeatedly updated rows into the vacuum code: A row that has xmin =
xmax where these have committed can be removed, even if the xid is above
the xmin horizon - no other transaction is ever going to see it. While
there's some hairy-ness implementing that, it doesn't seem too hard. And
there's a fair number of cases where that'd allow less bloat to
accumulate.  Obviously it'd be better if we had logic to do that for
other patterns as well (where the updates aren't in the same xact), but
it seems like a good start.



  There might be something in that, but again it's not much like this patch.
  The key point I think we're both making is that nondeterministic failures
  are bad, especially when you're talking about long-running, expensive-to-
  retry transactions.
 
 What the customer most doesn't want to be nondeterministic is
 whether the error is generated only when the snapshot has been used
 to read a page which has been modified since the snapshot was
 taken.  If tables or materialized views are set up before a query
 and then not subsequently modified during the execution of the
 query, that query must not be canceled even if it runs for days,
 but it must not cause unconstrained bloat during the run.  So far I
 don't see any way to avoid including the LSN with the snapshot or
 modifying the index AMs.  Let's be clear on the footprint for that;
 for the btree implementation it is:

IIUC you never would want cancellations when accessing the the tables
these longrunning backends actually access. The errors about too old
snapshots are just a stopgap because we can't compute the xmin per table
in a meaningful way atm. Right?

Is the bloat caused by rows these transactions actually can see or are
the not-removed rows newer than the transaction's xmax?

Since you actually don't want cancellations for the long running
reporting queries it very much might be sensible to switch to a more
complicated version of HeapTupleSatisfiesVacuum() if there's longrunning
queries. One that can recognize if rows are actually visible to any
backend, or if there are just snapshots that see older rows. I've
previously wondered how hard this would be, but I don't think it's
*that* hard. And it'd be a much more general solution.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication identifiers, take 4

On 2015-02-16 11:07:09 +0200, Heikki Linnakangas wrote:
 On 02/16/2015 02:21 AM, Andres Freund wrote:
 Furthermore the fact that the origin of records is recorded allows to
 avoid decoding them in logical decoding. That has both efficiency
 advantages (we can do so before they are stored in memory/disk) and
 functionality advantages. Imagine using a logical replication solution
 to replicate inserts to a single table between two databases where
 inserts are allowed on both - unless you prevent the replicated inserts
 from being replicated again you obviously have a loop. This
 infrastructure lets you avoid that.
 
 That makes sense.
 
 How does this work if you have multiple replication systems in use in the
 same cluster? You might use Slony to replication one table to one system,
 and BDR to replication another table with another system. Or the same
 replication software, but different hosts.

It should just work. Replication identifiers are identified by a free
form text, each replication solution can add the
information/distinguising data they need in there.

Bdr uses something like
#define BDR_NODE_ID_FORMAT bdr_UINT64_FORMAT_%u_%u_%u_%s
with
remote_sysid, remote_tlid, remote_dboid, MyDatabaseId, configurable_name
as parameters as a replication identifier name.

I've been wondering whether the bdr_ part in the above should be split
of into a separate field, similar to how the security label stuff does
it. But I don't think it'd really buy us much, especially as we did
not do that for logical slot names.

Each of the used replication solution would probably ask their output
plugin to only stream locally generated (i.e. origin_id =
InvalidRepNodeId) changes, and possibly from a defined list of other
known hosts in the cascading case.

Does that answer your question?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

On 2015-02-16 10:00:24 +0200, Heikki Linnakangas wrote:
 On 02/16/2015 02:44 AM, Peter Geoghegan wrote:
 Are we happy with acquiring the SpeculativeInsertLock heavy-weight lock for
 every insertion? That seems bad for performance reasons. Also, are we happy
 with adding the new fields to the proc array? Another alternative that was
 discussed was storing the speculative insertion token on the heap tuple
 itself. (See
 http://www.postgresql.org/message-id/52d00d2d.6030...@vmware.com)
 
 Whatever works, really. I can't say that the performance implications
 of acquiring that hwlock are at the forefront of my mind. I never
 found that to be a big problem on an 8 core box, relative to vanilla
 INSERTs, FWIW - lock contention is not normal, and may be where any
 heavweight lock costs would really be encountered.
 
 Oh, cool. I guess the fast-path in lmgr.c kicks ass, then :-).

I don't think it actually uses the fast path, does it? IIRC that's
restricted to LOCKTAG_RELATION when done via LockAcquireExtended + open
coded for the VirtualXactLock table.

I'm not super worried atm either. Worth checking, but probably not worth
obsessing over.

 Besides, why should one transaction be entitled to insert a
 conflicting value tuple just because a still running transaction
 deleted it (having also inserted the tuple itself)? Didn't one
 prototype version of value locking #2 have that as a bug [1]? In fact,
 originally, didn't the xmin set to invalid thing come immediately
 from realizing that that wasn't workable?
 
 Ah, right. So the problem was that some code might assume that if you insert
 a row, delete it in the same transaction, and then insert the same value
 again, the 2nd insertion will always succeed because the previous insertion
 effectively acquired a value-lock on the key.
 
 Ok, so we can't unconditionally always ignore tuples with xmin==xmax. We
 would need an infomask bit to indicate super-deletion, and only ignore it if
 the bit is set.
 
 I'm starting to think that we should bite the bullet and consume an infomask
 bit for this. The infomask bits are a scarce resource, but we should use
 them when it makes sense. It would be good for forensic purposes too, to
 leave a trace that a super-deletion happened.

Well, we IIRC don't have any left right now. We could reuse
MOVED_IN|MOVED_OUT, as that never was set in the past. But it'd
essentially use two infomask bits forever...

 Jim Nasby said something about setting the HEAP_XMIN_INVALID hint bit.
 Maybe he is right...if that can be made to be reliable (always
 WAL-logged), it could be marginally better than setting xmin to
 invalidTransactionId.
 
 I'm not a big fan of that. The xmin-invalid bit is currently always just a
 hint.

We already rely on XMIN_INVALID being set correctly for
freezing. C.f. HeapTupleHeaderXminFrozen, HeapTupleHeaderXminInvalid, et
al. So it'd not necessarily end up being that bad. And the super
deletion could easily just set it in the course of it's WAL logging.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication identifiers, take 4

2015-02-16 Thread Heikki Linnakangas


On 02/16/2015 02:21 AM, Andres Freund wrote:

Furthermore the fact that the origin of records is recorded allows to
avoid decoding them in logical decoding. That has both efficiency
advantages (we can do so before they are stored in memory/disk) and
functionality advantages. Imagine using a logical replication solution
to replicate inserts to a single table between two databases where
inserts are allowed on both - unless you prevent the replicated inserts
from being replicated again you obviously have a loop. This
infrastructure lets you avoid that.


That makes sense.

How does this work if you have multiple replication systems in use in 
the same cluster? You might use Slony to replication one table to one 
system, and BDR to replication another table with another system. Or the 
same replication software, but different hosts.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-02-16 Thread Heikki Linnakangas

On 02/16/2015 02:44 AM, Peter Geoghegan wrote:

On Sat, Feb 14, 2015 at 2:06 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

I did not solve the potential for livelocks (discussed at
http://www.postgresql.org/message-id/cam3swztftt_fehet3tu3ykcpcypynnauquz3q+naasnh-60...@mail.gmail.com).
The patch always super-deletes the already-inserted tuple on conflict, and
then waits for the other inserter. That would be nice to fix, using one of
the ideas from that thread, or something else.

How about we don't super-delete at all with exclusion constraints? I'd
be willing to accept unprincipled deadlocks for exclusion constraints,
because they already exist today for UPSERT/NOSERT type use cases, and
with idiomatic usage seem far less likely for the IGNORE variant
(especially with exclusion constraints).

So INSERT ON CONFLICT IGNORE on a table with an exclusion constraint
might fail. I don't like that. The point of having the command in the
first place is to deal with concurrency issues. If it sometimes doesn't
work, it's broken.

I can see people using ON
CONFLICT UPDATE where they'd almost or actually be better off using a
plain UPDATE - that's quite a different situation. I find livelocks to
be a *very* scary prospect, and I don't think the remediations that
were mentioned are going to fly. It's just too messy, and too much of
a niche use case. TBH I am skeptical of the idea that we can fix
exclusion constraints properly in this way at all, at least not
without the exponential backoff thing, which just seems horrible.

The idea of comparing the TIDs of the tuples as a tie-breaker seems most
promising to me. If the conflicting tuple's TID is smaller than the
inserted tuple's, super-delete and wait. Otherwise, wait without
super-deletion. That's really simple. Do you see a problem with that?

Although you understood what I was on about when I first talked about
unprincipled deadlocks, I think that acceptance of that idea came
much later from other people, because it's damn complicated.

BTW, it would good to explain somewhere in comments or a README the term
unprincipled deadlock. It's been a useful concept, and hard to grasp.
If you defined it once, with examples and everything, then you could
just have See .../README in other places that need to refer it.

It's not worth it to add
some weird Dining philosophers exponential backoff thing to make
sure that the IGNORE variant when used with exclusion constraints can
never deadlock in an unprincipled way, but if it is worth it then it
seems unreasonable to suppose that this patch needs to solve that
pre-existing problem. No?

The point of solving the existing problem is that it allows us to split
the patch, for easier review.

Are we happy with acquiring the SpeculativeInsertLock heavy-weight lock for
every insertion? That seems bad for performance reasons. Also, are we happy
with adding the new fields to the proc array? Another alternative that was
discussed was storing the speculative insertion token on the heap tuple
itself. (See
http://www.postgresql.org/message-id/52d00d2d.6030...@vmware.com)

Whatever works, really. I can't say that the performance implications
of acquiring that hwlock are at the forefront of my mind. I never
found that to be a big problem on an 8 core box, relative to vanilla
INSERTs, FWIW - lock contention is not normal, and may be where any
heavweight lock costs would really be encountered.

Oh, cool. I guess the fast-path in lmgr.c kicks ass, then :-).

I don't see how storing the promise token in heap tuples buys us not
having to involve heavyweight locking of tokens. (I think you may have
made a thinko in suggesting otherwise)

It wouldn't get rid of heavyweight locks, but it would allow getting rid
of the procarray changes. The inserter could generate a token, then
acquire the hw-lock on the token, and lastly insert the heap tuple with
the correct token.

Are we happy with the way super-deletion works? Currently, the xmin field is
set to invalid to mark the tuple as super-deleted. That required checks in
HeapTupleSatisfies* functions. One alternative would be to set xmax equal to
xmin, and teach the code currently calls XactLockTableWait() to check if
xmax=xmin, and not consider the tuple as conflicting.

That couldn't work without further HeapTupleSatisfiesDirty() logic.

Why not?

Besides, why should one transaction be entitled to insert a
conflicting value tuple just because a still running transaction
deleted it (having also inserted the tuple itself)? Didn't one
prototype version of value locking #2 have that as a bug [1]? In fact,
originally, didn't the xmin set to invalid thing come immediately
from realizing that that wasn't workable?

Ah, right. So the problem was that some code might assume that if you
insert a row, delete it in the same transaction, and then insert the
same value again, the 2nd insertion will always succeed because the
previous insertion effectively acquired a

Re: [HACKERS] Replication identifiers, take 4

2015-02-16 Thread Heikki Linnakangas


On 02/16/2015 11:18 AM, Andres Freund wrote:

On 2015-02-16 11:07:09 +0200, Heikki Linnakangas wrote:

How does this work if you have multiple replication systems in use in the
same cluster? You might use Slony to replication one table to one system,
and BDR to replication another table with another system. Or the same
replication software, but different hosts.


It should just work. Replication identifiers are identified by a free
form text, each replication solution can add the
information/distinguising data they need in there.

Bdr uses something like
#define BDR_NODE_ID_FORMAT bdr_UINT64_FORMAT_%u_%u_%u_%s
with
remote_sysid, remote_tlid, remote_dboid, MyDatabaseId, configurable_name
as parameters as a replication identifier name.

I've been wondering whether the bdr_ part in the above should be split
of into a separate field, similar to how the security label stuff does
it. But I don't think it'd really buy us much, especially as we did
not do that for logical slot names.

Each of the used replication solution would probably ask their output
plugin to only stream locally generated (i.e. origin_id =
InvalidRepNodeId) changes, and possibly from a defined list of other
known hosts in the cascading case.

Does that answer your question?


Yes, thanks. Note to self and everyone else looking at this: It's 
important to keep in mind is that the replication IDs are completely 
internal to the local cluster. They are *not* sent over the wire.


That's not completely true if you also use physical replication, though. 
The replication IDs will be included in the WAL stream. Can you have 
logical decoding running in a hot standby server? And how does the 
progress report stuff that's checkpointed in pg_logical/checkpoints work 
in a hot standby? (Sorry if I'm not making sense, I haven't really 
thought hard about this myself)


At a quick glance, this basic design seems workable. I would suggest 
expanding the replication IDs to regular 4 byte oids. Two extra bytes is 
a small price to pay, to make it work more like everything else in the 
system.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Replication identifiers, take 4

On 2015-02-16 11:34:10 +0200, Heikki Linnakangas wrote:
 Yes, thanks. Note to self and everyone else looking at this: It's important
 to keep in mind is that the replication IDs are completely internal to the
 local cluster. They are *not* sent over the wire.

Well, if you *want* to, you could send the external (free form text)
replication identifiers over the wire in the output plugin. There are
scenarios where that might make sense.

 That's not completely true if you also use physical replication, though. The
 replication IDs will be included in the WAL stream.

Right. But then a physical rep server isn't realy a different server.

 Can you have logical decoding running in a hot standby server?

Not at the moment, there's some minor stuff missing (following
timelines, enforcing tuple visibility on the primary).

 And how does the progress report stuff that's checkpointed in
 pg_logical/checkpoints work in a hot standby?  (Sorry if I'm not
 making sense, I haven't really thought hard about this myself)

It doesn't work that greatly atm, they'd need to be WAL logged for that
- which would not be hard. It'd be better to include the information in
the checkpoint, but that unfortunately doesn't really work, because we
store the checkpoint in the control file. And that has to be =
512 bytes.

What, I guess, we could do is log it in the checkpoint, after
determining the redo pointer, and store the LSN for it in the checkpoint
record proper. That'd mean we'd read one more record at startup, but
that isn't particularly bad.

 At a quick glance, this basic design seems workable. I would suggest
 expanding the replication IDs to regular 4 byte oids. Two extra bytes is a
 small price to pay, to make it work more like everything else in the system.

I don't know. Growing from 3 to 5 byte overhead per relevant record (or
even 0 to 5 in case the padding is reused) is rather noticeable. If we
later find it to be a limit (I seriously doubt that), we can still
increase it in a major release without anybody really noticing.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-02-16 Thread Syed, Rahila

Hello,

Thank you for reviewing and testing the patch.

+   /* leave if data cannot be compressed */
+   if (compressed_len == 0)
+   return false;
This should be  0, pglz_compress returns -1 when compression fails.

+   if (pglz_decompress(block_image, bkpb-bkp_len, 
record-uncompressBuf,
+   
bkpb-bkp_uncompress_len) == 0)
Similarly, this should be  0.

These have been corrected in the attached.

Regarding the sanity checks that have been added recently. I think that they 
are useful but I am suspecting as well that only a check on the record CRC is 
done because that's reliable enough and not doing those checks accelerates a 
bit replay. So I am thinking that we should simply replace them by assertions.
Removing the checks makes sense as CRC ensures correctness . Moreover ,as error 
message for invalid length of record is present in the code , messages for 
invalid block length can be redundant.
Checks have been replaced by assertions in the attached patch.

Following if  condition in XLogCompressBackupBlock has been modified as follows

Previous
/*
+  * We recheck the actual size even if pglz_compress() reports success 
and
+  * see if at least 2 bytes of length have been saved, as this 
corresponds
+  * to the additional amount of data stored in WAL record for a 
compressed
+  * block via raw_length when block contains hole..
+  */
+  *len = (uint16) compressed_len;
+  if (*len = orig_len - SizeOfXLogRecordBlockImageCompressionInfo)
+  return false;
+  return true;


Current
if ((hole_length != 0) 
+  (*len = orig_len - 
SizeOfXLogRecordBlockImageCompressionInfo))
+  return false;
+return true

This is because the extra information raw_length is included only if compressed 
block has hole in it.

Once we get those small issues fixes, I think that it is with having a 
committer look at this patch, presumably Fujii-san
Agree. I will mark this patch as ready for committer

Thank you,
Rahila Syed

__
Disclaimer: This email and any attachments are sent in strictest confidence
for the sole use of the addressee and may contain legally privileged,
confidential, and proprietary data. If you are not the intended recipient,
please advise the sender by replying promptly to this email and then delete
and destroy this email and any attachments without any further use, copying
or forwarding.

Support-compression-for-full-page-writes-in-WAL_v19.patch
Description: Support-compression-for-full-page-writes-in-WAL_v19.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

On 2015-02-16 11:30:20 +, Syed, Rahila wrote:
 - * As a trivial form of data compression, the XLOG code is aware that
 - * PG data pages usually contain an unused hole in the middle, which
 - * contains only zero bytes.  If hole_length  0 then we have removed
 - * such a hole from the stored data (and it's not counted in the
 - * XLOG record's CRC, either).  Hence, the amount of block data actually
 - * present is BLCKSZ - hole_length bytes.
 + * Block images are able to do several types of compression:
 + * - When wal_compression is off, as a trivial form of compression, the
 + * XLOG code is aware that PG data pages usually contain an unused hole
 + * in the middle, which contains only zero bytes.  If length  BLCKSZ
 + * then we have removed such a hole from the stored data (and it is
 + * not counted in the XLOG record's CRC, either).  Hence, the amount
 + * of block data actually present is length bytes.  The hole offset
 + * on page is defined using hole_offset.
 + * - When wal_compression is on, block images are compressed using a
 + * compression algorithm without their hole to improve compression
 + * process of the page. length corresponds in this case to the length
 + * of the compressed block. hole_offset is the hole offset of the page,
 + * and the length of the uncompressed block is defined by raw_length,
 + * whose data is included in the record only when compression is enabled
 + * and with_hole is set to true, see below.
 + *
 + * is_compressed is used to identify if a given block image is compressed
 + * or not. Maximum page size allowed on the system being 32k, the hole
 + * offset cannot be more than 15-bit long so the last free bit is used to
 + * store the compression state of block image. If the maximum page size
 + * allowed is increased to a value higher than that, we should consider
 + * increasing this structure size as well, but this would increase the
 + * length of block header in WAL records with alignment.
 + *
 + * with_hole is used to identify the presence of a hole in a block image.
 + * As the length of a block cannot be more than 15-bit long, the extra bit in
 + * the length field is used for this identification purpose. If the block 
 image
 + * has no hole, it is ensured that the raw size of a compressed block image 
 is
 + * equal to BLCKSZ, hence the contents of XLogRecordBlockImageCompressionInfo
 + * are not necessary.
   */
  typedef struct XLogRecordBlockImageHeader
  {
 - uint16  hole_offset;/* number of bytes before hole */
 - uint16  hole_length;/* number of bytes in hole */
 + uint16  length:15,  /* length of block data in 
 record */
 + with_hole:1;/* status of hole in the block 
 */
 +
 + uint16  hole_offset:15, /* number of bytes before hole */
 + is_compressed:1;/* compression status of image */
 +
 + /* Followed by the data related to compression if block is compressed */
  } XLogRecordBlockImageHeader;

Yikes, this is ugly.

I think we should change the xlog format so that the block_id (which
currently is XLR_BLOCK_ID_DATA_SHORT/LONG or a actual block id) isn't
the block id but something like XLR_CHUNK_ID. Which is used as is for
XLR_CHUNK_ID_DATA_SHORT/LONG, but for backup blocks can be set to to
XLR_CHUNK_BKP_WITH_HOLE, XLR_CHUNK_BKP_COMPRESSED,
XLR_CHUNK_BKP_REFERENCE... The BKP blocks will then follow, storing the
block id following the chunk id.

Yes, that'll increase the amount of data for a backup block by 1 byte,
but I think that's worth it. I'm pretty sure we will be happy about the
added extensibility pretty soon.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

On Mon, Feb 16, 2015 at 8:30 PM, Syed, Rahila rahila.s...@nttdata.com
wrote:


 Regarding the sanity checks that have been added recently. I think that
 they are useful but I am suspecting as well that only a check on the record
 CRC is done because that's reliable enough and not doing those checks
 accelerates a bit replay. So I am thinking that we should simply replace
 them by assertions.

 Removing the checks makes sense as CRC ensures correctness . Moreover ,as
 error message for invalid length of record is present in the code ,
 messages for invalid block length can be redundant.

 Checks have been replaced by assertions in the attached patch.


After more thinking, we may as well simply remove them, an error with CRC
having high chances to complain before reaching this point...



 Current

 if ((hole_length != 0) 

 +  (*len = orig_len -
 SizeOfXLogRecordBlockImageCompressionInfo))

 +  return false;

 +return true


This makes sense.

Nitpicking 1:
+Assert(!(blk-with_hole == 1   blk-hole_offset = 0));
Double-space here.

Nitpicking 2:
char * page
This should be rewritten as char *page, the * being assigned with the
variable name.
-- 
Michael

Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes