Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]

2015-02-18 Thread Michael Paquier
On Tue, Feb 17, 2015 at 9:02 AM, Andres Freund wrote:
 On 2015-02-17 05:51:22 +0900, Michael Paquier wrote:
 -- inv_api.c uses bytea in an internal structure without putting it at
 the end of the structure. For clarity I think that we should just use
 a bytea pointer and do a sufficient allocation for the duration of the
 lobj manipulation.

 Hm. I don't really see the problem tbh.

 There actually is a reason the bytea is at the beginning - the other
 fields *are* the data part of the bytea (which is just the varlena
 header).
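 
 To illustrate, the structure in question is roughly of this shape (a
 sketch from memory, not the actual inv_api.c code):
 
     struct
     {
         bytea       hdr;                /* supplies just the varlena header */
         char        data[LOBLKSIZE];    /* the bytes that header describes */
     } buf;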

 -- Similarly, tuptoaster.c needed to be patched for toast_save_datum

 I'm not a big fan of these two changes. This adds some not-that-small
 memory allocations to a somewhat hot path, without a big win in
 clarity. And it doesn't seem to have anything to do with
 FLEXIBLE_ARRAY_MEMBER.

We could replace those palloc calls with a simple buffer that has a
predefined size, but this somewhat reduces the readability of the
code.

 There are also a couple of things that are not changed on purpose:
 - in namespace.h for FuncCandidateList. I tried manipulating it, but it
 resulted in an allocation overflow in PortalHeapMemory

 Did you change the allocation in FuncnameGetCandidates()? Note the -
 sizeof(Oid) there.

Yeah. Missed it.

 - I don't think that the t_bits fields in htup_details.h should be
 updated either.

 Why not? Any not broken code should already use MinHeapTupleSize and
 similar macros.

Changing t_bits impacts HeapTupleHeaderData, ReorderBufferTupleBuf and
similarly a couple of redo routines in heapam.c using
HeapTupleHeaderData in a couple of structures not placing it at the
end (compiler complains). We could use for each of them a buffer that
has enough room with sizeof(HeapTupleHeaderData) + MaxHeapTupleSize,
but wouldn't it reduce the readability of the current code? Opinions
welcome.
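
For what it's worth, a minimal sketch of that kind of fixed-size buffer
(mine, not code from the patch; it assumes the usual htup_details.h
definitions) could look like:

    union
    {
        HeapTupleHeaderData hdr;
        char                data[MaxHeapTupleSize]; /* room for any tuple */
    } tbuf;

    HeapTupleHeader htup = &tbuf.hdr;   /* aligned, fixed-size scratch area */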

 diff --git a/src/include/catalog/pg_attribute.h 
 b/src/include/catalog/pg_attribute.h
 [...]
 Generally the catalog changes are much less clearly a benefit imo.

OK, let's drop them then.

 From ad8f54cdb5776146f17d1038bb295b5f13b549f1 Mon Sep 17 00:00:00 2001
 From: Michael Paquier mich...@otacoo.com
 Date: Mon, 16 Feb 2015 03:53:38 +0900
 Subject: [PATCH 3/3] Update varlena in c.h to use FLEXIBLE_ARRAY_MEMBER

 Places using a variable-length variable not at the end of a structure
 are changed with a workaround, without impacting what those features do.

 I vote for rejecting most of this, except a (corrected) version of the
 pg_authid change, which doesn't really belong to the rest of the patch
 anyway ;)

Changing bytea to use FLEXIBLE_ARRAY_MEMBER requires those changes, or
at least some changes in this area as something with
FLEXIBLE_ARRAY_MEMBER can only be placed at the end of a structure.
But I guess that we can do fine as well by replacing those structures
with some buffers with a pre-defined size. I'll draft an additional
patch on top of 0001 with all those less-trivial changes implemented.
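
As a tiny illustration of that restriction (hypothetical struct names, not
from the patch):

    typedef struct VarThing
    {
        int32   vl_len_;                        /* varlena header */
        char    vl_dat[FLEXIBLE_ARRAY_MEMBER];  /* OK: last member */
    } VarThing;

    typedef struct Enclosing
    {
        VarThing    v;      /* flexible-array struct not at the end ... */
        int32       other;  /* ... so this placement can draw complaints */
    } Enclosing;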

  #define VARHDRSZ ((int32) sizeof(int32))
 diff --git a/src/include/catalog/pg_authid.h b/src/include/catalog/pg_authid.h
 index e01e6aa..d8789a5 100644
 --- a/src/include/catalog/pg_authid.h
 +++ b/src/include/catalog/pg_authid.h
 @@ -56,8 +56,10 @@ CATALOG(pg_authid,1260) BKI_SHARED_RELATION BKI_ROWTYPE_OID(2842) BKI_SCHEMA_MAC
      int32       rolconnlimit;   /* max connections allowed (-1=no limit) */
 
      /* remaining fields may be null; use heap_getattr to read them! */
 -    text        rolpassword;    /* password, if any */
      timestamptz rolvaliduntil;  /* password expiration time, if any */
 +#ifdef CATALOG_VARLEN
 +    text        rolpassword;    /* password, if any */
 +#endif
  } FormData_pg_authid;

 That change IIRC is wrong, because it'll make rolvaliduntil NOT
 NULL (any column that's fixed width and has only fixed-width columns
 before it is marked as such).

This sounds better as a separate patch...
-- 
Michael




Re: [HACKERS] pg_basebackup may fail to send feedbacks.

2015-02-18 Thread Kyotaro HORIGUCHI
Hello, this is the last patch for pg_basebackup/pg_receivexlog on
master (9.5). Prior versions don't have this issue.

4. basebackup_reply_fix_mst_v2.patch
  receivelog.c patch applicable to master.

This is based on the same design as
walrcv_reply_fix_91_v2.patch with respect to gettimeofday().

regards,

At Tue, 17 Feb 2015 19:45:19 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
horiguchi.kyot...@lab.ntt.co.jp wrote in 
20150217.194519.58137941.horiguchi.kyot...@lab.ntt.co.jp
 Thank you for the comment. Three new patches are attached. I
 forgot to give a revision number on the previous patch, but I
 think this is the 2nd version.
 
 1. walrcv_reply_fix_94_v2.patch
   Walreceiver patch applicable to master/REL9_4_STABLE/REL9_3_STABLE
 
 2. walrcv_reply_fix_92_v2.patch
   Walreceiver patch applicable to REL9_2_STABLE
 
 3. walrcv_reply_fix_91_v2.patch
   Walreceiver patch applicable to REL9_1_STABLE
 
 
 At Sat, 14 Feb 2015 03:01:22 +0900, Fujii Masao masao.fu...@gmail.com wrote 
 in cahgqgwhz_4ywyoka+5ks9s_698adjh8+0hamcsc9wyfh37g...@mail.gmail.com
  On Tue, Feb 10, 2015 at 7:48 PM, Kyotaro HORIGUCHI
  horiguchi.kyot...@lab.ntt.co.jp wrote:
   Introduce the maximum number of records that we can receive and
   process between feedbacks? For example, we can change
   XLogWalRcvSendHSFeedback() so that it's called per at least
   8 records. I'm not sure what number is good, though...
  
   At the beginning of the while (len > 0) { if (len > 0) { block,
   last_recv_timestamp is always kept up to date (by using
   gettimeofday()). So we can use that variable instead of
   gettimeofday() in my previous proposal.
  
  Yes, but something like last_recv_timestamp doesn't exist in
  REL9_1_STABLE. So you don't think that your proposed fix
  should be back-patched to all supported versions. Right?
 
 Versions back to 9.3 have the loop with the same structure. 9.2 is in a
 bit different shape but looks to have the same issue. 9.1 and 9.0
 have the same steps as 9.2, but 9.0 doesn't have a wal sender
 timeout and 9.1 does not have a variable holding the current time.
 
 9.3 and later: the previous patch works, but revised per your
   comment.
 
 9.2: The time of the last reply is stored in the file-scope
   variable reply_message, and the current time is stored in
   walrcv->lastMsgReceiptTime. The timeout is determined using
   these variables.
 
 9.1: walrcv doesn't have lastMsgReceiptTime. The current time
   should be acquired explicitly in the innermost loop. I tried
   calling gettimeofday() once per several loops. The skip count
   is determined by the anticipated worst-case duration to process a WAL
   record and wal_receiver_status_interval. However, I doubt the
   necessity of doing so; the value of the worst-case duration is simply
   a random guess.
 
 9.0: No point to do this kind of fix.
 
 
   The start time of the timeout could be last_recv_timestamp at the
   beginning of the while loop, since it is a bit earlier than the
   actual time at the point.
  
  Sounds strange to me. I think that last_recv_timestamp should be
  compared with the time when the last feedback message was sent,
  e.g., maybe you can expose sendTime in XLogWalRcvSendReply()
  as a global variable, and use it to compare with last_recv_timestamp.
 
 You're right that that would be exactly the right thing, but, as you
 mentioned, exposing sendTime is required to do so. The solution I
 proposed is the second best, assuming that wal_sender_timeout is
 recommended to be sufficiently longer (several times longer) than
 wal_receiver_status_interval. If no one minds exposing sendTime,
 that is the best solution. The attached patch does so.
 
 # The added comment in the patch was wrong, though.
 
  When we get out of the WAL receive loop due to the timeout check
  which your patch added, XLogWalRcvFlush() is always executed.
  I don't think that current WAL file needs to be flushed at that time.
 
 Thank you for pointing it out. The patch now runs XLogWalRcvSendReply(force =
 true) immediately instead of breaking out of the loop.
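 
 To be concrete, the check I have in mind is roughly the following (a rough
 sketch of the intent, not the patch text; sendTime here stands for the
 exposed time-of-last-reply variable discussed above):
 
     /* inside the receive loop; last_recv_timestamp is already current */
     if (wal_receiver_status_interval > 0 &&
         last_recv_timestamp >=
             TimestampTzPlusMilliseconds(sendTime,
                                         wal_receiver_status_interval * 1000L))
         XLogWalRcvSendReply(true, false);   /* force a reply now */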
 
   Another solution would be using
   RegisterTimeout/enable_timeout_after, and it seemed to work for
   me. It could throw out the time-counting stuff in the loop we are
   talking about and that of XLogWalRcvSendHSFeedback and
   XLogWalRcvSendReply, but it might be a bit too large a change for the
   gain.
  
  Yes, sounds overkill.
 
 However, gettimeofday() for each recv is also overkill for most
 cases. I'll post the patches for receivelog.c based on the patch
 for the 9.1 walreceiver.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From cfe01eadd4d8567f4410bccb8334c52fc897c002 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi horiguchi.kyot...@lab.ntt.co.jp
Date: Wed, 18 Feb 2015 12:30:58 +0900
Subject: [PATCH] Make pg_basebackup and pg_receivexlog keep sending WAL
 receive feedback regularly under heavy load, v2.

pg_basebackup and pg_receivexlog fail to send the receiver reply message
while they are receiving a continuous WAL stream caused by heavy load or
something else. This patch makes 

Re: [HACKERS] How about to have relnamespace and relrole?

2015-02-18 Thread Kyotaro HORIGUCHI
Sorry, I sent the previous mail without patches by accident. The
patches are attached to this mail.


Hello, this is the patchset v2 of this feature.

0001-Add-regrole.patch
  Adding regrole as the name says.

0002-Add-regnamespace.patch
  Adding regnamespace. This depends on 0001 patch.

0003-Check-new-reg-types-are-not-used-as-default-values.patch
  Inhibiting the new OID alias types from being used as constants
  in default values, which make dependencies on the other
  (existing) OID alias types.

0004-Documentation-for-new-OID-alias-types.patch
  Documentation patch for these new types.

regards,

At Tue, 17 Feb 2015 20:19:11 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
horiguchi.kyot...@lab.ntt.co.jp wrote in 
20150217.201911.239516619.horiguchi.kyot...@lab.ntt.co.jp
 Hello, thank you for the comment.
 
 The current patch lacks changes to documentation and the dependency
 stuff. The current framework doesn't consider changing pg_shdepend
 from column default expressions, so the possible measures are the
 following, I think.
 
 1. Make pg_shdepend have refobjsubid and add some of pg_depend's
    deptypes; specifically, DEPENDENCY_NORMAL is needed. Then
    change the dependency stuff.
 
 2. Simply inhibit specifying them in default
    expressions. Integer or OID can act as the replacement.
 
=# create table t1 (a int, b regrole default 'horiguti'::regrole);
ERROR:  Type 'regrole' cannot have a default value
 
 1 is quite overkill and hardly seems usable. So I will go
 with 2 and post an additional patch.
 
 
 At Thu, 12 Feb 2015 17:12:24 -0600, Jim Nasby jim.na...@bluetreble.com 
 wrote in 54dd3358.9030...@bluetreble.com
  On 2/12/15 5:28 AM, Kyotaro HORIGUCHI wrote:
   Hello, I changed the subject.
 
 # Ouch! the subject remains wrong..
 
   1. Expected frequency of use
  ...
   For that reason, although the current policy of deciding whether
   to have reg* types seems to be whether they have schema-qualified
   names, I think regrole and regnamespace are valuable to have.
  
  Perhaps schema qualification was the original intent, but I think at
  this point everyone uses them for convenience. Why would I want to
  type out JOIN pg_namespace n ON n.oid = blah.???namespace when I could
  simply do ???namespace::regnamespace?
 
 Yes, that is the most important point of this patch.
 
   2. Anticipated un-optimizability
  
   Tom pointed out that these reg* types prevent the planner from
   optimizing the query, so we should refrain from having such
   machinery. It would seem to be a long-standing issue, but reg*s
   seem to be rather faster than joining the corresponding catalogs
   for a moderate number of result rows, though this raises another,
   more annoying issue.
  
  
   3. Potential breakage of MVCC
 
   The other issue Tom pointed out is the potential breakage of MVCC by
   using these reg* types. The syscache is invalidated on updates, so
   this doesn't seem to be a problem in READ COMMITTED mode, but it
   breaks SERIALIZABLE mode. But IMHO it is not so serious a problem
   as long as such inconsistency occurs only on the system catalogs and
   it is explicitly documented together with the un-optimizability
   issue. I'll add a patch for this later.
  
  The way I look at it, all these arguments went out the window when
  regclass was introduced.
  
  The reality is that reg* is *way* more convenient than manual
  joins. The arguments about optimization and MVCC presumably apply to
  all reg* casts, which means that ship has long since sailed.
 
 I agree basically, but I think some caveat is needed. I'll
 include that in the coming documentation patch.
 
  If we offered views that made it easier to get at this data then maybe
  this wouldn't matter so much. But DBA's don't want to join 3 catalogs
  together to try and answer a simple question.
 
 It has been annoying me these days.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 2a6f689afdc8197c2fe2fc235a4819ce1a5e9928 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi horiguchi.kyot...@lab.ntt.co.jp
Date: Wed, 18 Feb 2015 14:38:32 +0900
Subject: [PATCH 1/4] Add regrole

Add the new reg type regrole. For development reasons, the system OIDs
used for the new procs, types, and casts leave a gap from the existing
OIDs.
---
 src/backend/bootstrap/bootstrap.c |   2 +
 src/backend/utils/adt/regproc.c   | 101 ++
 src/backend/utils/adt/selfuncs.c  |   2 +
 src/backend/utils/cache/catcache.c|   1 +
 src/include/catalog/pg_cast.h |   7 +++
 src/include/catalog/pg_proc.h |  10 
 src/include/catalog/pg_type.h |   5 ++
 src/include/utils/builtins.h  |   5 ++
 src/test/regress/expected/regproc.out |  26 -
 src/test/regress/sql/regproc.sql  |   7 +++
 10 files changed, 165 insertions(+), 1 deletion(-)

diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index bc66eac..11e40ee 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ 


Re: [HACKERS] Combining Aggregates

2015-02-18 Thread Kouhei Kaigai
Hi Rowley,

Let me put some comments on this patch.

This patch itself looks good as infrastructure towards
the big picture; however, we still haven't reached consensus on
how combine functions are to be used instead of the usual transition
functions.

An aggregate function usually consumes one or more values extracted
from a tuple, then accumulates its internal state according
to the arguments. An existing transition function updates
its internal state on the assumption of one function call per record.
On the other hand, the new combine function updates its
internal state from partially aggregated values produced
by a preprocessing node.
An aggregate function is represented by an Aggref node in the plan tree,
however, we have no certain way to determine which function shall
be called to update the internal state of the aggregate.

For example, avg(float) has an internal state of type float[3]
holding the number of rows, sum of X, and sum of X^2. If the combine
function updates that internal state from partially aggregated values, its
argument should be float[3]. That is obviously incompatible with
float8_accum(float), the transition function of avg(float).
I think we need a new flag on the Aggref node to inform the executor
which function shall be called to update the internal state of the
aggregate. The executor cannot decide it without this hint.
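
To make the mismatch concrete, a combine step for avg(float) would have to
look something like the following hand-written sketch (illustration only,
not the patch's code), merging two {N, sum(X), sum(X^2)} states instead of
consuming one input value per row as float8_accum() does:

    /* merge the partial state "partial" into "state" */
    static void
    combine_float8_avg_states(double state[3], const double partial[3])
    {
        state[0] += partial[0];     /* N        */
        state[1] += partial[1];     /* sum(X)   */
        state[2] += partial[2];     /* sum(X^2) */
    }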

Also, do you have any ideas about pushing down aggregate functions across
joins? Even though it is somewhat old research, I found
a systematic approach to pushing aggregates down across joins:
https://cs.uwaterloo.ca/research/tr/1993/46/file.pdf

I think it would be great if the core functionality supported this
query-rewriting feature based on cost estimation, without external
modules.

What are your opinions?

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com


 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org
 [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of David Rowley
 Sent: Friday, December 19, 2014 8:39 PM
 To: Simon Riggs
 Cc: Kaigai Kouhei(海外 浩平); PostgreSQL-development; Amit Kapila
 Subject: Re: [HACKERS] Combining Aggregates
 
 On 18 December 2014 at 02:48, Simon Riggs si...@2ndquadrant.com wrote:
 
 
   David, if you can update your patch with some docs to explain the
   behaviour, it looks complete enough to think about committing it in
   early January, to allow other patches that depend upon it to stand a
   chance of getting into 9.5. (It is not yet ready, but I see it could
   be).
 
 
 
 
 Sure, I've more or less added the docs from your patch. I still need to
 trawl through and see if there's anywhere else that needs some additions.
 
 
   The above example is probably the best description of the need, since
   user defined aggregates must also understand this.

   Could we please call these combine functions or other? MERGE is an
   SQL Standard statement type that we will add later, so it will be
   confusing if we use the merge word in this context.
 
 
 
 
 Ok, changed.
 
 
   David, your patch avoids creating any mergefuncs for existing
   aggregates. We would need to supply working examples for at least a
   few of the builtin aggregates, so we can test the infrastructure. We
   can add examples for all cases later.
 
 
 
 
  In addition to MIN(), MAX(), BIT_AND(), BIT_OR() and SUM() for floating
  point types, cash and interval, I've now added combine functions for
  count(*) and count(col). It seems that int8pl() is suitable for this.
 
 
 Do you think it's worth adding any new functions at this stage, or is it
 best to wait until there's a patch which can use these?
 
 Regards
 
 David Rowley



Re: [HACKERS] Odd behavior of updatable security barrier views on foreign tables

2015-02-18 Thread Etsuro Fujita

On 2015/02/18 7:44, Stephen Frost wrote:

* Etsuro Fujita (fujita.ets...@lab.ntt.co.jp) wrote:

On 2015/02/11 4:06, Stephen Frost wrote:

I had been trying to work out an FDW-specific way to address this, but I
think Dean's right that this should be addressed in
expand_security_qual(), which means it'll apply to all cases and not
just these FDW calls.  I don't think that's actually an issue though and
it'll match up to how SELECT FOR UPDATE is handled today.


Sorry, my explanation was not accurate, but I also agree with Dean's
idea.  In the above, I just wanted to make it clear that such a lock
request done by expand_security_qual() should be limited to the case
where the relation that is a former result relation is a foreign
table.


Attached is a patch which should address this.  Would love your (or
anyone else's) feedback on it.  It appears to address the issue which
you raised and the regression test changes are all in-line with
inserting a LockRows into the subquery, as anticipated.


I've looked into the patch.

* The patch applies to the latest head, 'make' passes successfully, but 
'make check' fails in the rowsecurity test.


* I found one place in expand_security_qual that I'm concerned about:

+   if (targetRelation)
+       applyLockingClause(subquery, 1, LCS_FORUPDATE,
+                          false, false);

ISTM that it'd be better to use LockWaitBlock as the fourth argument of 
applyLockingClause.
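
In other words, the call would presumably become (my reading of the
suggestion, not the updated patch):

    if (targetRelation)
        applyLockingClause(subquery, 1, LCS_FORUPDATE,
                           LockWaitBlock, false);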


Other than that, the patch looks good to me.

Thanks for the work!

Best regards,
Etsuro Fujita




Re: [HACKERS] GSoC 2015 - mentors, students and admins.

2015-02-18 Thread Thom Brown
On 9 February 2015 at 20:52, Thom Brown t...@linux.com wrote:


 Hi all,

 Google Summer of Code 2015 is approaching.  I'm intending on registering
 PostgreSQL again this year.

 Before I do that, I'd like to have an idea of how many people are
 interested in being either a student or a mentor.

 I've volunteered to be admin again, but if anyone else has a strong
 interest of seeing the projects through this year, please let yourself be
 known.

 Who would be willing to mentor projects this year?

 Project ideas are welcome, and I've copied many from last year's
 submissions into this year's wiki page.  Please feel free to add more (or
 delete any that stand no chance or are redundant):
 https://wiki.postgresql.org/wiki/GSoC_2015

 Students can find more information about GSoC here:
 http://www.postgresql.org/developer/summerofcode


I'd also like to ask whether anyone would be willing to be a backup admin
for this year's GSoC.  Please message me directly if you're interested.  I
need to know ASAP, though.

Thanks

Thom


Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2015-02-18 Thread Fujii Masao
On Wed, Feb 18, 2015 at 1:26 AM, David Steele da...@pgmasters.net wrote:
 On 2/17/15 10:23 AM, Simon Riggs wrote:
 I vote to include pgaudit in 9.5, albeit with any changes. In
 particular, David may have some changes to recommend, but I haven't
 seen a spec or a patch, just a new version of code (which isn't how we
 do things...).

 I submitted the new patch in my name under a separate thread Auditing
 extension for PostgreSQL (Take 2) (54e005cc.1060...@pgmasters.net)

I played with the patch version of pg_audit a bit and have some basic
comments about its spec.

pg_audit doesn't log BIND parameter values when a prepared statement is used.
This seems to be an oversight in the patch. Or is it intentional?

pg_audit cannot log a statement like SELECT 1 which doesn't access any
database object. Is this intentional? I think that there are many users who
want to audit even such statements.

Imagine the case where you call a user-defined function which executes
many nested statements. In this case, pg_audit logs only the top-level
statement (i.e., the one issued directly by the client) every time a nested
statement is executed. In effect, one call of such a UDF can cause lots of
*identical* log messages. I think this is problematic.

Regards,

-- 
Fujii Masao




Re: [HACKERS] ExplainModifyTarget doesn't work as expected

2015-02-18 Thread Etsuro Fujita
On 2015/02/18 8:13, Tom Lane wrote:
 So I went back to your v1 patch and have now committed that with some
 cosmetic modifications.  Sorry for making you put time into a dead end.

I don't mind it.  Thanks for committing the patch!

Best regards,
Etsuro Fujita




Re: [HACKERS] Parallel Seq Scan

2015-02-18 Thread Amit Kapila
On Tue, Feb 17, 2015 at 9:52 PM, Andres Freund and...@2ndquadrant.com
wrote:

 On 2015-02-11 15:49:17 -0500, Robert Haas wrote:

 A query whose runtime is dominated by a sequential scan (+ attached
 filter) is certainly going to require a bigger prefetch size than one
 that does other expensive stuff.

 Imagine parallelizing
 SELECT * FROM largetable WHERE col = low_cardinality_value;
 and
 SELECT *
 FROM largetable JOIN gigantic_table ON (index_nestloop_condition)
 WHERE col = high_cardinality_value;

 The first query will be a simple sequential scan, and disk reads on
 largetable will be the major cost of executing it.  In contrast, the second query
 might very well sensibly be planned as a parallel sequential scan with
 the nested loop executing in the same worker. But the cost of the
 sequential scan itself will likely be completely drowned out by the
 nestloop execution - index probes are expensive/unpredictable.


I think the work/task given to each worker should be as granular
as possible to make it more predictable.
I think the better way to parallelize such work (a join query) is for the
first worker to do the sequential scan and filtering on the large table and
then pass the result to the next worker to do the join with gigantic_table.

 
  I think it makes sense to think of a set of tasks in which workers can
  assist.  So you have a query tree which is just one query tree, with no
  copies of the nodes, and then there are certain places in that query
  tree where a worker can jump in and assist that node.  To do that, it
  will have a copy of the node, but that doesn't mean that all of the
  stuff inside the node becomes shared data at the code level, because
  that would be stupid.

 My only problem with that description is that I think workers will
 have to work on more than one node - it'll be entire subtrees of the
 executor tree.


There could be some cases where it would be beneficial for a worker
to process a sub-tree, but I think there will be more cases where
it will just work on part of a node and send the result back to either
the master backend or another worker for further processing.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] deparsing utility commands

2015-02-18 Thread Alvaro Herrera
Andres Freund wrote:
 Hi,
 
 On 2015-02-15 01:48:15 -0300, Alvaro Herrera wrote:
  This is a repost of the patch to add CREATE command deparsing support to
  event triggers.  It now supports not only CREATE but also ALTER and
  other commands such as GRANT/REVOKE, COMMENT ON and SECURITY LABEL.
  This patch series is broken up in a rather large number of patches, but
  my intention would be to commit separately only the first 6 patches; the
  remaining parts are split up only so that it's easy to see how deparsing
  each patch is implemented, and would be committed as a single changeset
  (but before that I need a bunch of extra fixes and additions to other
  parts.)
 
 I think you should just go ahead and commit 0001, 0002, 0006. Those have
 previously been discussed and seem like a good idea and uncontroversial
 even if the rest of the series doesn't go in.

I have pushed 0001 (changed pg_identify_object for opclasses and
opfamilies to use USING instead of FOR) to master only, and backpatched
a fix for pg_conversion object identities, which were missing
schema-qualification.

Patch 0002 I think is good to go as well, AFAICT (have the various
RENAME commands return the OID and attnum of affected objects).

On 0006 (which is about having ALTER TABLE return the OID/attnum of the
affected object on each subcommand), I have a problem about the ALTER
TABLE ALTER COLUMN SET DATA TYPE USING subcommand.  The problem with
that is that in order to fully deparse that subcommand we need to
deparse the expression of the USING clause.  But in the parse node we
only have info about the untransformed expression, so it is not possible
to pass it through ruleutils directly; it needs to be run by
transformExpr() first.  Doing that in the deparse code seems the wrong
solution, so I am toying with the idea of adding a Node * output
parameter for some ATExec* routines; the Prep step would insert the
transformed expression there, so that it is available at the deparse
stage.  As far as I know, only the SET DATA TYPE USING form has this
issue: for other subcommands, the current code is simply grabbing the
expression from catalogs.  Maybe it would be good to use that Node in
those cases as well and avoid catalog lookups, not sure.


 I think the GRANT/REVOKE, COMMENT ON and SECURITY LABEL changes are good
 independently as well, but there previously have been raised some
 concerns about shared objects. I think the answer in the patches which
 is to raise events when affecting database local objects makes sense,
 but others might disagree.

Yes, I will push these unless somebody objects soon, as they seem
perfectly reasonable to me.  The only troubling thing is that there is
no regression test for this kind of thing in event triggers (i.e. verify
which command tags get support and which don't), which seems odd to me.
Not these patches' fault, though, so I'm not considering adding any ATM.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] deparsing utility commands

2015-02-18 Thread Alvaro Herrera
Stephen Frost wrote:

 I've started taking a look at this as the pgaudit bits depend on it and
 noticed that the patch set implies there's 42 patches, but there were
 only 37 attached..?

Ah, I didn't realize when I posted that the subject was counting those
extra patches.  They are later patches that add the testing framework,
but since the tests don't pass currently, they are not usable yet.
Mostly they are about running the deparse_init.sql file that I posted
separately.  I will post a real patch for that stuff later today to make
it clear what it is that we're talking about.


FWIW, one of Robert's main objections is that future hackers will forget
to add deparse support for new commands as they are added.  In an
attempt to get this sorted out, I have modified the stuff in
ProcessUtilitySlow() so that instead of having one
EventTriggerStashCommand call for each node type, there is only one call
at the end of the function.  That way, new cases in the big switch there
will automatically get something added to the stash, which should make
the test fail appropriately.  (Things like adding a new clause to
existing commands will be tested by running pg_dump on the databases and
comparing.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] deparsing utility commands

2015-02-18 Thread Stephen Frost
Alvaro,

* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
 This is a repost of the patch to add CREATE command deparsing support to
 event triggers.  It now supports not only CREATE but also ALTER and
 other commands such as GRANT/REVOKE, COMMENT ON and SECURITY LABEL.
 This patch series is broken up in a rather large number of patches, but
 my intention would be to commit separately only the first 6 patches; the
 remaining parts are split up only so that it's easy to see how deparsing
 each patch is implemented, and would be committed as a single changeset
 (but before that I need a bunch of extra fixes and additions to other
 parts.)

I've started taking a look at this as the pgaudit bits depend on it and
noticed that the patch set implies there's 42 patches, but there were
only 37 attached..?

Thanks!

Stephen




Re: [HACKERS] deparsing utility commands

2015-02-18 Thread Stephen Frost
Alvaro,

* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
 Stephen Frost wrote:
  I've started taking a look at this as the pgaudit bits depend on it and
  noticed that the patch set implies there's 42 patches, but there were
  only 37 attached..?
 
 Ah, I didn't realize when I posted that the subject was counting those
 extra patches.  They are later patches that add the testing framework,
 but since the tests don't pass currently, they are not usable yet.
 Mostly they are about running the deparse_init.sql file that I posted
 separately.  I will post a real patch for that stuff later today to make
 it clear what it is that we're talking about.

Oh, ok, no problem, just wanted to make sure things weren't missing.

 FWIW, one of Robert's main objections is that future hackers will forget
 to add deparse support for new commands as they are added.  In an
 attempt to get this sorted out, I have modified the stuff in
 ProcessUtilitySlow() so that instead of having one
 EventTriggerStashCommand call for each node type, there is only one call
 at the end of the function.  That way, new cases in the big switch there
 will automatically get something added to the stash, which should make
 the test fail appropriately.  (Things like adding a new clause to
 existing commands will be tested by running pg_dump on the databases and
 comparing.)

Right, I've been following the discussion and fully agree with Robert
that we really need a way to make sure that future work doesn't forget
to address deparsing (or the various other bits, object classes, etc,
really).  The approach you've outlined sounds interesting but I haven't
yet gotten to that bit of the code.

Thanks!

Stephen




Re: [HACKERS] deparsing utility commands

2015-02-18 Thread Alvaro Herrera
Stephen,

Stephen Frost wrote:

 * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:

  Patch 0002 I think is good to go as well, AFAICT (have the various
  RENAME commands return the OID and attnum of affected objects).
 
 It's not a huge complaint, but it feels a bit awkward to me that
 ExecRenameStmt is now returning one item and using an out variable for
 the other when the two really go together (Oid and Object Sub ID, that
 is).  Further, the comment above ExecRenameStmt should make it clear
 that it's safe to pass NULL into objsubid if you don't care about it.
 
 The same probably goes for the COMMENT bits.

Hmm, while I agree that it's a relatively minor point, it seems a fair
one.  I think we could handle this by returning ObjectAddress rather
than Oid in ExecRenameStmt() and CommentObject(); then you have all the
bits you need in a single place.  Furthermore, the function
EventTriggerStashCommand() in another patch, instead of getting (ObjType,
objectId, objectSubId) separately, could take a single argument of type
ObjectAddress.
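
For reference, the struct in question (as I recall it from
src/include/catalog/objectaddress.h) already carries exactly those pieces:

    typedef struct ObjectAddress
    {
        Oid         classId;        /* catalog the object lives in */
        Oid         objectId;       /* OID of the object itself */
        int32       objectSubId;    /* subobject (e.g. column number), or 0 */
    } ObjectAddress;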

Now, we probably don't want to hack *all* the utility commands to return
ObjectAddress instead of OID, because in many cases that's just not
going to be convenient (not to speak of the code churn); so I think for
most objtypes the ProcessUtilitySlow stanza would look like this:

    case T_AlterTSConfigurationStmt:
        objectId = AlterTSConfiguration((AlterTSConfigurationStmt *) parsetree);
        objectType = OBJECT_TSCONFIGURATION;
        break;

For ExecRenameStmt and CommentObject (and probably other cases such as
security labels) the stanza in ProcessUtilitySlow would be simpler:

    case T_CommentStmt:
        address = CommentObject((CommentStmt *) parsetree);
        break;

and at the bottom of the loop we would transform the objid/type into
address for the cases that need it:

    if (!commandStashed)
    {
        if (objectId != InvalidOid)
        {
            address.classId = get_objtype_catalog_oid(objectType);
            address.objectId = objectId;
            address.objectSubId = 0;
        }
        EventTriggerStashCommand(address, secondaryOid, parsetree);
    }

  On 0006 (which is about having ALTER TABLE return the OID/attnum of the
  affected object on each subcommand), I have a problem about the ALTER
  TABLE ALTER COLUMN SET DATA TYPE USING subcommand.  The problem with
  that is that in order to fully deparse that subcommand we need to
  deparse the expression of the USING clause.  But in the parse node we
  only have info about the untransformed expression, so it is not possible
  to pass it through ruleutils directly; it needs to be run by
  transformExpr() first.
 
 I agree- I'm pretty sure we definitely don't want to run through
 transformExpr again in the deparse code.  I'm not a huge fan of adding a
 Node* output parameter, but I haven't got any other great ideas about how
 to address that.

Yeah, my thoughts exactly :-(

   I think the GRANT/REVOKE, COMMENT ON and SECURITY LABEL changes are good
   independently as well, but there previously have been raised some
   concerns about shared objects. I think the answer in the patches which
   is to raise events when affecting database local objects makes sense,
   but others might disagree.
  
  Yes, I will push these unless somebody objects soon, as they seem
  perfectly reasonable to me.  The only troubling thing is that there is
  no regression test for this kind of thing in event triggers (i.e. verify
  which command tags get support and which don't), which seems odd to me.
  Not these patches's fault, though, so I'm not considering adding any ATM.
 
 Ugh.  I dislike that when we say an event trigger will fire before
 'GRANT' what we really mean is GRANT when it's operating against a
 local object.  At the minimum we absolutely need to be very clear in
 the documentation about that limitation.  Perhaps there is something
 already which reflects that, but I don't see anything in the patch
 being added about that.

Hmm, good point, will give this some thought.  I'm thinking perhaps we
can add a table of which object types are supported for generic commands
such as GRANT, COMMENT and SECURITY LABEL.

 Still looking at the rest of the patches.

Great, thanks -- and thanks for the review so far.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] deparsing utility commands

2015-02-18 Thread Stephen Frost
Alvaro,

* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
 Andres Freund wrote:
  On 2015-02-15 01:48:15 -0300, Alvaro Herrera wrote:
  I think you should just go ahead and commit 0001, 0002, 0006. Those have
  previously been discussed and seem like a good idea and uncontroversial
  even if the rest of the series doesn't go in.
 
 I have pushed 0001 (changed pg_identify_object for opclasses and
 opfamilies to use USING instead of FOR) to master only, and backpatched
 a fix for pg_conversion object identities, which were missing
 schema-qualification.

That looked fine to me.

 Patch 0002 I think is good to go as well, AFAICT (have the various
 RENAME commands return the OID and attnum of affected objects).

It's not a huge complaint, but it feels a bit awkward to me that
ExecRenameStmt is now returning one item and using an out variable for
the other when the two really go together (Oid and Object Sub ID, that
is).  Further, the comment above ExecRenameStmt should make it clear
that it's safe to pass NULL into objsubid if you don't care about it.

The same probably goes for the COMMENT bits.

 On 0006 (which is about having ALTER TABLE return the OID/attnum of the
 affected object on each subcommand), I have a problem about the ALTER
 TABLE ALTER COLUMN SET DATA TYPE USING subcommand.  The problem with
 that is that in order to fully deparse that subcommand we need to
 deparse the expression of the USING clause.  But in the parse node we
 only have info about the untransformed expression, so it is not possible
 to pass it through ruleutils directly; it needs to be run by
 transformExpr() first.  Doing that in the deparse code seems the wrong
 solution, so I am toying with the idea of adding a Node * output
 parameter for some ATExec* routines; the Prep step would insert the
 transformed expression there, so that it is available at the deparse
 stage.  As far as I know, only the SET DATA TYPE USING form has this
 issue: for other subcommands, the current code is simply grabbing the
 expression from catalogs.  Maybe it would be good to use that Node in
 those cases as well and avoid catalog lookups, not sure.

I agree- I'm pretty sure we definitely don't want to run through
transformExpr again in the deparse code.  I'm not a huge fan of adding a
Node* output parameter, but I haven't got any other great ideas about how
to address that.

  I think the GRANT/REVOKE, COMMENT ON and SECURITY LABEL changes are good
  independently as well, but there previously have been raised some
  concerns about shared objects. I think the answer in the patches which
  is to raise events when affecting database local objects makes sense,
  but others might disagree.
 
 Yes, I will push these unless somebody objects soon, as they seem
 perfectly reasonable to me.  The only troubling thing is that there is
 no regression test for this kind of thing in event triggers (i.e. verify
 which command tags get support and which don't), which seems odd to me.
 Not these patches's fault, though, so I'm not considering adding any ATM.

Ugh.  I dislike that when we say an event trigger will fire before
'GRANT' what we really mean is GRANT when it's operating against a
local object.  At the minimum we absolutely need to be very clear in
the documentation about that limitation.  Perhaps there is something
already which reflects that, but I don't see anything in the patch
being added about that.

Still looking at the rest of the patches.

Thanks!

Stephen




Re: [HACKERS] deparsing utility commands

2015-02-18 Thread Stephen Frost
Alvaro,

* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
 Stephen Frost wrote:
  * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
   Patch 0002 I think is good to go as well, AFAICT (have the various
   RENAME commands return the OID and attnum of affected objects).
  
  It's not a huge complaint, but it feels a bit awkward to me that
  ExecRenameStmt is now returning one item and using an out variable for
  the other when the two really go together (Oid and Object Sub ID, that
  is).  Further, the comment above ExecRenameStmt should make it clear
  that it's safe to pass NULL into objsubid if you don't care about it.
  
  The same probably goes for the COMMENT bits.
 
 Hmm, while I agree that it's a relatively minor point, it seems a fair
 one.  I think we could handle this by returning ObjectAddress rather
 than Oid in ExecRenameStmt() and CommentObject(); then you have all the
 bits you need in a single place.  Furthermore, the function
 EventTriggerStashCommand() in another patch, instead of getting (ObjType,
 objectId, objectSubId) separately, could take a single argument of type
 ObjectAddress.

Agreed, that thought occurred to me as well while I was looking through
the other deparse patches and I agree that it makes sense.

 Now, we probably don't want to hack *all* the utility commands to return
 ObjectAddress instead of OID, because in many cases that's just not
 going to be convenient (not to speak of the code churn); so I think for
 most objtypes the ProcessUtilitySlow stanza would look like this:
 
   case T_AlterTSConfigurationStmt:
       objectId = AlterTSConfiguration((AlterTSConfigurationStmt *) parsetree);
       objectType = OBJECT_TSCONFIGURATION;
       break;
 
 For ExecRenameStmt and CommentObject (and probably other cases such as
 security labels) the stanza in ProcessUtilitySlow would be simpler:
 
   case T_CommentStmt:
       address = CommentObject((CommentStmt *) parsetree);
       break;
 
 and at the bottom of the loop we would transform the objid/type into
 address for the cases that need it:
 
   if (!commandStashed)
   {
       if (objectId != InvalidOid)
       {
           address.classId = get_objtype_catalog_oid(objectType);
           address.objectId = objectId;
           address.objectSubId = 0;
       }
       EventTriggerStashCommand(address, secondaryOid, parsetree);
   }

That'd be fine with me, though for my 2c, I wouldn't object to changing
them all to return ObjectAddress either.  I agree that it'd cause a fair
bit of code churn to do so, but there's a fair bit of code churn
happening here anyway (looking at what 0008 does to ProcessUtilitySlow
anyway).

   Yes, I will push these unless somebody objects soon, as they seem
   perfectly reasonable to me.  The only troubling thing is that there is
   no regression test for this kind of thing in event triggers (i.e. verify
   which command tags get support and which don't), which seems odd to me.
   Not these patches's fault, though, so I'm not considering adding any ATM.
  
  Ugh.  I dislike that when we say an event trigger will fire before
  'GRANT' what we really mean is GRANT when it's operating against a
  local object.  At the minimum we absolutely need to be very clear in
  the documentation about that limitation.  Perhaps there is something
  already which reflects that, but I don't see anything in the patch
  being added about that.
 
 Hmm, good point, will give this some thought.  I'm thinking perhaps we
 can add a table of which object types are supported for generic commands
 such as GRANT, COMMENT and SECURITY LABEL.

That sounds like a good idea.

Thanks!

Stephen




Re: [HACKERS] deparsing utility commands

2015-02-18 Thread Stephen Frost
* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
   Now, we probably don't want to hack *all* the utility commands to return
   ObjectAddress instead of OID, because in many cases that's just not
   going to be convenient (not to speak of the code churn); so I think for
   most objtypes the ProcessUtilitySlow stanza would look like this:
 
  That'd be fine with me, though for my 2c, I wouldn't object to changing
  them all to return ObjectAddress either.  I agree that it'd cause a fair
  bit of code churn to do so, but there's a fair bit of code churn
  happening here anyway (looking at what 0008 does to ProcessUtilitySlow
  anyway).
 
 Well, that would make my life easier I think (even if it's a bit more
 work), so unless there are objections I will do things this way.  It's a
 bit of a pity that Robert and Dimitri went to huge lengths to have these
 functions return OID and we're now changing it all to ObjAddress
 instead, but oh well.

Not sure that I see it as that huge a deal..  They're still returning an
Oid, it's just embedded in the ObjAddress to provide a complete
statement of what the object is.

btw, the hunk in 0026 which adds a 'break;' into standard_ProcessUtility
caught me by surprise.  Looks like that 'break;' was missing from 0003
(for T_GrantStmt).

Thanks,

Stephen




Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-18 Thread Kevin Grittner
Magnus Hagander mag...@hagander.net wrote:
 On Feb 17, 2015 12:26 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2015-02-16 16:35:46 -0500, Bruce Momjian wrote:

 It seems we already have a mechanism in place that allows
 tuning of query cancel on standbys vs. preventing standby
 queries from seeing old data, specifically
 max_standby_streaming_delay/max_standby_archive_delay.  We
 obsessed about how users were going to react to these odd
 variables, but there has been little negative feedback.

 FWIW, I think that's a somewhat skewed perception. I think it
 was right to introduce those, because we didn't really have any
 alternatives.

As far as I can see, the alternatives suggested so far are all
about causing heap bloat to progress more slowly, but still without
limit.  I suggest, based on a lot of user feedback (from the
customer I've talked about at some length on this thread, as well
as numerous others), that unlimited bloat based on the activity of
one connection is a serious deficiency in the product; and that
there is no real alternative to something like a snapshot too old
error if we want to fix that deficiency.  Enhancements to associate
a snapshot with a database and using a separate vacuum xmin per
database, not limiting vacuum of a particular object by snapshots
that cannot see that object, etc., are all Very Good Things and I
hope those changes are made, but none of that fixes a very
fundamental flaw.

 But max_standby_streaming_delay, max_standby_archive_delay and
 hot_standby_feedback are among the most frequent triggers for
 questions and complaints that I/we see.

 Agreed.
 And a really bad one used to be vacuum_defer_cleanup_age, because
 of confusing units amongst other things. Which in turn seems
 fairly close to Kevin's suggestions, unfortunately.

Particularly my initial suggestion, which was to base snapshot
age on the number of transaction IDs assigned.  Does this look
any better to you if it is something that can be set to '20min' or
'1h'?  Just to restate, that would not automatically cancel the
snapshots past that age; it would allow vacuum of any tuples which
became dead that long ago, and would cause a snapshot too old
message for any read of a page modified more than that long ago
using a snapshot which was older than that.

Unless this idea gets some +1 votes Real Soon Now, I will mark the
patch as Rejected and move on.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] deparsing utility commands

2015-02-18 Thread Alvaro Herrera
  Now, we probably don't want to hack *all* the utility commands to return
  ObjectAddress instead of OID, because it many cases that's just not
  going to be convenient (not to speak of the code churn); so I think for
  most objtypes the ProcessUtilitySlow stanza would look like this:

 That'd be fine with me, though for my 2c, I wouldn't object to changing
 them all to return ObjectAddress either.  I agree that it'd cause a fair
 bit of code churn to do so, but there's a fair bit of code churn
 happening here anyway (looking at what 0008 does to ProcessUtilitySlow
 anyway).

Well, that would make my life easier I think (even if it's a bit more
work), so unless there are objections I will do things this way.  It's a
bit of a pity that Robert and Dimitri went to huge lengths to have these
functions return OID and we're now changing it all to ObjAddress
instead, but oh well.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]

2015-02-18 Thread Andres Freund
On 2015-02-19 07:10:19 +0900, Michael Paquier wrote:
 On Wed, Feb 18, 2015 at 10:09 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
  On 2015-02-18 17:15:18 +0900, Michael Paquier wrote:
   - I don't think that the t_bits fields in htup_details.h should be
   updated either.
  
   Why not? Any not broken code should already use MinHeapTupleSize and
   similar macros.
 
  Changing t_bits impacts HeapTupleHeaderData, ReorderBufferTupleBuf and
  similarly a couple of redo routines in heapam.c using
  HeapTupleHeaderData in a couple of structures not placing it at the
  end (compiler complains).
 
  The compiler will complain if you use a FLEXIBLE_ARRAY_MEMBER in the
  middle of a struct, but not when you embed a struct that uses it
  into the middle of another struct. At least gcc doesn't, and I think it'd be
  utterly broken if another compiler did that. If there's a compiler that
  does so, we need to make it define FLEXIBLE_ARRAY_MEMBER to 1.
 
 clang does complain on my OSX laptop regarding that ;)

I think that then puts the idea of doing it for those structs pretty
much to bed.

Greetings,

Andres Freund

-- 
 Andres Freund                    http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]

2015-02-18 Thread Michael Paquier
On Wed, Feb 18, 2015 at 10:09 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2015-02-18 17:15:18 +0900, Michael Paquier wrote:
  - I don't think that the t_bits fields in htup_details.h should be
  updated either.
 
  Why not? Any not broken code should already use MinHeapTupleSize and
  similar macros.

 Changing t_bits impacts HeapTupleHeaderData, ReorderBufferTupleBuf and
 similarly a couple of redo routines in heapam.c using
 HeapTupleHeaderData in a couple of structures not placing it at the
 end (compiler complains).

 The compiler will complain if you use a FLEXIBLE_ARRAY_MEMBER in the
 middle of a struct, but not when you embed a struct that uses it
 into the middle of another struct. At least gcc doesn't, and I think it'd be
 utterly broken if another compiler did that. If there's a compiler that
 does so, we need to make it define FLEXIBLE_ARRAY_MEMBER to 1.

clang does complain on my OSX laptop regarding that ;)
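
For instance, something of this shape (hypothetical names) draws a warning
from clang while gcc has historically accepted it as an extension:

    struct inner
    {
        int     n;
        char    data[FLEXIBLE_ARRAY_MEMBER];
    };

    struct outer
    {
        struct inner    i;      /* flexible-array struct in the middle */
        int             more;
    };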
-- 
Michael




Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]

2015-02-18 Thread Tom Lane
Michael Paquier michael.paqu...@gmail.com writes:
 On Wed, Feb 18, 2015 at 10:09 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
 The compiler will complain if you use a FLEXIBLE_ARRAY_MEMBER in the
 middle of a struct, but not when you embed a struct that uses it
 into the middle of another struct. At least gcc doesn't, and I think it'd be
 utterly broken if another compiler did that. If there's a compiler that
 does so, we need to make it define FLEXIBLE_ARRAY_MEMBER to 1.

 clang does complain on my OSX laptop regarding that ;)

I'm a bit astonished that gcc doesn't consider this an error.  Sure seems
like it should.  (Has anyone tried it on recent gcc?)  I am entirely
opposed to Andres' claim that we ought to consider compilers that do warn
to be broken; if anything it's the other way around.

Moreover, if we have any code that is assuming such cases are okay, it
probably needs a second look.  Isn't this situation effectively assuming
that a variable-length array is fixed-length?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-18 Thread Stephen Frost
Kevin,

* Kevin Grittner (kgri...@ymail.com) wrote:
 Magnus Hagander mag...@hagander.net wrote:
  On Feb 17, 2015 12:26 AM, Andres Freund and...@2ndquadrant.com wrote:
  On 2015-02-16 16:35:46 -0500, Bruce Momjian wrote:
 
  It seems we already have a mechanism in place that allows
  tuning of query cancel on standbys vs. preventing standby
  queries from seeing old data, specifically
  max_standby_streaming_delay/max_standby_archive_delay.  We
  obsessed about how users were going to react to these odd
  variables, but there has been little negative feedback.
 
  FWIW, I think that's a somewhat skewed perception. I think it
  was right to introduce those, because we didn't really have any
  alternatives.
 
 As far as I can see, the alternatives suggested so far are all
 about causing heap bloat to progress more slowly, but still without
 limit.  I suggest, based on a lot of user feedback (from the
 customer I've talked about at some length on this thread, as well
 as numerous others), that unlimited bloat based on the activity of
 one connection is a serious deficiency in the product; and that
 there is no real alternative to something like a snapshot too old
 error if we want to fix that deficiency.  Enhancements to associate
 a snapshot with a database and using a separate vacuum xmin per
 database, not limiting vacuum of a particular object by snapshots
 that cannot see that snapshot, etc., are all Very Good Things and I
 hope those changes are made, but none of that fixes a very
 fundamental flaw.

I also agree with the general idea that it makes sense to provide a way
to control bloat, but I think you've missed what Andres was getting at
with his suggestion (if I understand correctly, apologies if I don't).

The problem is that we're only looking at the overall xmin / xmax
horizon when it comes to deciding if a given tuple is dead.  That's
not quite right- the tuple might actually be dead to all *current*
transactions: it can be newer than the oldest transaction's snapshot (so
that transaction can never see it) and already deleted from the
perspective of every later transaction.  Basically, there exist gaps
between our cluster-wide xmin / xmax where we might find actually dead
rows.  Marking those
rows dead and reusable *would* stop the bloat, not just slow it down.

In the end, with a single long-running transaction, the worst bloat you
would have is double the size of the system at the time the long-running
transaction started.  Another way of thinking about it is with this
timeline:

  |------------------------------------------------------------------|
  /\                   /\                /\                /\
  Long running         row created       row deleted       vacuum
  transaction starts

Now, if all transactions currently running started either before the row
was created or after the row was deleted, then the row is dead and
vacuum can reclaim it.  Finding those gaps in the transaction space for
all of the currently running processes might not be cheap, but for this
kind of a use-case, it might be worth the effort.  On a high-churn
system where the set of rows being constantly rewritten isn't very
large, you could end up with some amount of duplication of rows- those
to satisfy the ancient transaction and the current working set- but
there wouldn't be any further bloat and it'd almost certainly be a
much better situation than what is being seen now.
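
To make that concrete, here is a rough sketch of the check being
described (illustrative only: it ignores xid wraparound, in-progress
transactions and commit status, and is not PostgreSQL's actual
visibility code):

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

typedef struct Snapshot
{
	TransactionId xmin;		/* oldest xid considered still running */
	TransactionId xmax;		/* first xid considered "in the future" */
} Snapshot;

/* The snapshot can never see the tuple: created too late or deleted too early. */
static bool
tuple_invisible_to(TransactionId t_xmin, TransactionId t_xmax, const Snapshot *s)
{
	if (t_xmin >= s->xmax)					/* created after snapshot was taken */
		return true;
	if (t_xmax != 0 && t_xmax < s->xmin)	/* deleted before snapshot was taken */
		return true;
	return false;
}

/* Dead to everyone: reclaimable even though it is newer than the global xmin. */
static bool
tuple_dead_to_all(TransactionId t_xmin, TransactionId t_xmax,
				  const Snapshot *active, int nactive)
{
	for (int i = 0; i < nactive; i++)
		if (!tuple_invisible_to(t_xmin, t_xmax, &active[i]))
			return false;
	return t_xmax != 0;		/* only deleted tuples are candidates */
}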

 Particularly my initial suggestion, which was to base snapshot
 age on the number of transaction IDs assigned.  Does this look
 any better to you if it is something that can be set to '20min' or
 '1h'?  Just to restate, that would not automatically cancel the
 snapshots past that age; it would allow vacuum of any tuples which
 became dead that long ago, and would cause a snapshot too old
 message for any read of a page modified more than that long ago
 using a snapshot which was older than that.
 
 Unless this idea gets some +1 votes Real Soon Now, I will mark the
 patch as Rejected and move on.

I'm not against having a knob like this, which is defaulted to off,
but I do think we'd be a lot better off with a system that could
realize when rows are not visible to any currently running transaction
and clean them up.  If this knob's default is off then I don't think
we'd be likely to get the complaints which are being discussed (or, if
we did, we could point the individual at the admin who set the knob...).

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-02-18 Thread Michael Paquier
On Mon, Feb 16, 2015 at 8:55 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2015-02-16 11:30:20 +, Syed, Rahila wrote:
 - * As a trivial form of data compression, the XLOG code is aware that
 - * PG data pages usually contain an unused hole in the middle, which
 - * contains only zero bytes.  If hole_length > 0 then we have removed
 - * such a hole from the stored data (and it's not counted in the
 - * XLOG record's CRC, either).  Hence, the amount of block data actually
 - * present is BLCKSZ - hole_length bytes.
 + * Block images are able to do several types of compression:
 + * - When wal_compression is off, as a trivial form of compression, the
 + * XLOG code is aware that PG data pages usually contain an unused hole
 + * in the middle, which contains only zero bytes.  If length < BLCKSZ
 + * then we have removed such a hole from the stored data (and it is
 + * not counted in the XLOG record's CRC, either).  Hence, the amount
 + * of block data actually present is length bytes.  The hole offset
 + * on page is defined using hole_offset.
 + * - When wal_compression is on, block images are compressed using a
 + * compression algorithm without their hole to improve compression
 + * process of the page. length corresponds in this case to the length
 + * of the compressed block. hole_offset is the hole offset of the page,
 + * and the length of the uncompressed block is defined by raw_length,
 + * whose data is included in the record only when compression is enabled
 + * and with_hole is set to true, see below.
 + *
 + * is_compressed is used to identify if a given block image is compressed
 + * or not. Maximum page size allowed on the system being 32k, the hole
 + * offset cannot be more than 15-bit long so the last free bit is used to
 + * store the compression state of block image. If the maximum page size
 + * allowed is increased to a value higher than that, we should consider
 + * increasing this structure size as well, but this would increase the
 + * length of block header in WAL records with alignment.
 + *
 + * with_hole is used to identify the presence of a hole in a block image.
 + * As the length of a block cannot be more than 15-bit long, the extra bit 
 in
 + * the length field is used for this identification purpose. If the block 
 image
 + * has no hole, it is ensured that the raw size of a compressed block image 
 is
 + * equal to BLCKSZ, hence the contents of 
 XLogRecordBlockImageCompressionInfo
 + * are not necessary.
   */
  typedef struct XLogRecordBlockImageHeader
  {
 - uint16  hole_offset;/* number of bytes before hole */
 - uint16  hole_length;/* number of bytes in hole */
 + uint16  length:15,  /* length of block data in 
 record */
 + with_hole:1;/* status of hole in the block 
 */
 +
 + uint16  hole_offset:15, /* number of bytes before hole */
 + is_compressed:1;/* compression status of image */
 +
 + /* Followed by the data related to compression if block is compressed 
 */
  } XLogRecordBlockImageHeader;

 Yikes, this is ugly.

 I think we should change the xlog format so that the block_id (which
 currently is XLR_BLOCK_ID_DATA_SHORT/LONG or an actual block id) isn't
 the block id but something like XLR_CHUNK_ID. Which is used as is for
 XLR_CHUNK_ID_DATA_SHORT/LONG, but for backup blocks can be set to
 XLR_CHUNK_BKP_WITH_HOLE, XLR_CHUNK_BKP_COMPRESSED,
 XLR_CHUNK_BKP_REFERENCE... The BKP blocks will then follow, storing the
 block id following the chunk id.
 Yes, that'll increase the amount of data for a backup block by 1 byte,
 but I think that's worth it. I'm pretty sure we will be happy about the
 added extensibility pretty soon.

Yeah, that would help readability and does not cost much compared
to BLCKSZ. Still, could you explain what kind of extensibility you have
in mind besides code readability? It is hard to get a clear picture
with only paper and pencil, and the current patch approach
has been taken to minimize the record length, particularly for users
who do not care about WAL compression.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Join push-down support for foreign tables

2015-02-18 Thread Shigeru Hanada
2015-02-17 10:39 GMT+09:00 Kouhei Kaigai kai...@ak.jp.nec.com:
 Let me put some comments in addition to where you're checking now.

 [design issues]
 * Cost estimation
 Estimation and evaluation of cost for remote join query is not an
 obvious issue. In principle, local side cannot determine the cost
 to run remote join without remote EXPLAIN, because local side has
 no information about JOIN logic applied on the remote side.
 Probably, we have to make an assumption about the remote join algorithm,
 because the local planner has no idea about the remote planner's choice
 unless the foreign join uses use_remote_estimate.
 I think it is a reasonable assumption (even if it is incorrect) to
 calculate the remote join cost based on the local hash-join algorithm.
 If user wants more correct estimation, remote EXPLAIN will make
 more reliable cost estimation.

Hm, I guess that you chose hash-join as least-costed join.  In the
pgbench model, most combination between two tables generate hash join
as cheapest path.  Remote EXPLAIN is very expensive in the context of
planning, so it would easily make the plan optimization meaningless.
But giving an option to users is good, I agree.


 It also needs a consensus on whether the cost for remote CPU execution is
 equivalent to local CPU. If we think local CPU is a scarcer resource
 than remote CPU, a discount rate will make the planner more likely
 to choose a remote join over a local one

Something like cpu_cost_ratio as a new server-level FDW option?


 Once we assume a join algorithm for remote join, unit cost for
 remote CPU, we can calculate a cost for foreign join based on
 the local join logic plus cost for network translation (maybe
 fdw_tuple_cost?).

Yes, sum of these costs is the total cost of a remote join.
o fdw_startup_cost
o hash-join cost, estimated as a local join
o fdw_tuple_cost * rows * width
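
As a rough sketch of that sum (the function and its local_hashjoin_cost
input are hypothetical, just restating the list above, not postgres_fdw
code):

static double
estimate_foreign_join_cost(double fdw_startup_cost,
						   double local_hashjoin_cost,	/* local hash-join estimate */
						   double fdw_tuple_cost,
						   double rows,
						   double width)
{
	return fdw_startup_cost
		 + local_hashjoin_cost
		 + fdw_tuple_cost * rows * width;
}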

 * FDW options
 Unlike table scan, FDW options we should refer is unclear.
 Table level FDW options are associated with a foreign table as
 literal. I think we have two options here:
 1. Foreign-join refers FDW options for foreign-server, but ones
for foreign-tables are ignored.
 2. Foreign-join is prohibited when both of relations don't have
identical FDW options.
 My preference is 2. Even though N-way foreign join, it ensures
 all the tables involved with (N-1)-way foreign join has identical
 FDW options, thus it leads we can make N-way foreign join with
 all identical FDW options.
 One exception is updatable flag of postgres_fdw. It does not
 make sense on remote join, so I think mixture of updatable and
 non-updatable foreign tables should be admitted, however, it is
 a decision by FDW driver.

 Probably, above points need to take time for getting consensus.
 I'd like to see your opinion prior to editing your patch.

postgres_fdw can't push down a join which contains foreign tables on
multiple servers, so use_remote_estimate and fdw_startup_cost are the
only FDW options to consider.  So we have choices for each option.

1-a. If all foreign tables in the join have identical
use_remote_estimate, allow pushing down.
1-b. If any foreign table in the join has use_remote_estimate set to
true, use the remote estimate.

2-a. If all foreign tables in the join have identical fdw_startup_cost,
allow pushing down.
2-b. Always use the max value in the join. (cost would be more expensive)
2-c. Always use the min value in the join.  (cost would be cheaper)

I prefer 1-a and 2-b, so more joins avoid remote EXPLAIN while still
getting a reasonable startup cost.

I agree about updatable option.


 [implementation issues]
 The interface does not intend to add new Path/Plan type for each scan
 that replaces foreign joins. What postgres_fdw should do is, adding
 ForeignPath towards a particular joinrel, then it populates ForeignScan
 with remote join query once it got chosen by the planner.

That idea is interesting, and makes many things simpler.  Please let me consider it.


 A few functions added in src/backend/foreign/foreign.c are not
 called by anywhere, at this moment.

 create_plan_recurse() is reverted to static. It is needed for custom-
 join enhancement, if no other infrastructure can support.

I made it back to static because I thought that create_plan_recurse
can be called by core before giving control to FDWs.  But I'm not sure
it can be applied to custom scans.  I'll recheck that part.


-- 
Shigeru HANADA


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]

2015-02-18 Thread Michael Paquier
On Thu, Feb 19, 2015 at 3:57 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Thu, Feb 19, 2015 at 2:58 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Michael Paquier michael.paqu...@gmail.com writes:
 1-2) sizeof(ParamListInfoData) is present in a couple of places,
 assuming that sizeof(ParamListInfoData) has the equivalent of 1
 parameter, like prepare.c, functions.c, spi.c and postgres.c:
 -   /* sizeof(ParamListInfoData) includes the first array element */
 paramLI = (ParamListInfo)
 palloc(sizeof(ParamListInfoData) +
 -  (num_params - 1) * sizeof(ParamExternData));
 +  num_params * sizeof(ParamExternData));
 1-3) FuncCandidateList in namespace.c (thanks Andres!):
 newResult = (FuncCandidateList)
 -   palloc(sizeof(struct _FuncCandidateList) - 
 sizeof(Oid)
 -  + effective_nargs * sizeof(Oid));
 +   palloc(sizeof(struct _FuncCandidateList) +
 +  effective_nargs * sizeof(Oid));
 I imagine that we do not want those palloc calls to use #ifdef
 FLEXIBLE_ARRAY_MEMBER to save some memory when the compiler does not
 support flexible-array members, for the sake of code readability, right?

 These are just wrong.  As a general rule, we do not want to *ever* take
 sizeof() a struct that contains a flexible array: the results will not
 be consistent across platforms.  The right thing is to use offsetof()
 instead.  See the helpful comment autoconf provides:

 [...]

 And I had this one in front of my eyes a couple of hours ago... Thanks.

 This point is actually the main reason we've not done this change long
 since.  People did not feel like running around to make sure there were
 no overlooked uses of sizeof().

 Thanks for the clarifications and the review. Attached is a new set.

Grr. Completely forgot to use offsetof in dumputils.c as well. Patch
that can be applied on top of 0001 is attached.
-- 
Michael
diff --git a/src/bin/pg_dump/dumputils.c b/src/bin/pg_dump/dumputils.c
index 892d3fa..3459adc 100644
--- a/src/bin/pg_dump/dumputils.c
+++ b/src/bin/pg_dump/dumputils.c
@@ -1217,7 +1217,7 @@ simple_string_list_append(SimpleStringList *list, const char *val)
 	SimpleStringListCell *cell;
 
 	/* this calculation correctly accounts for the null trailing byte */
-	cell = (SimpleStringListCell *) pg_malloc(sizeof(SimpleStringListCell));
+	cell = (SimpleStringListCell *) pg_malloc(offsetof(SimpleStringListCell, val));
 
 	cell->next = NULL;
 	strcpy(cell->val, val);
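
More generally, the recommended allocation pattern looks roughly like
this (a minimal sketch, not the actual dumputils.c code; note that the
hunk above appears truncated and the real call also needs room for the
string itself):

#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Minimal stand-in for the list cell shape under discussion. */
typedef struct StringCell
{
	struct StringCell *next;
	char		val[];			/* FLEXIBLE_ARRAY_MEMBER */
} StringCell;

/*
 * Allocate with offsetof() rather than sizeof(): sizeof() of a struct
 * containing a flexible array member may include trailing padding and is
 * not consistent across platforms.  The + 1 leaves room for the
 * terminating NUL.
 */
static StringCell *
make_cell(const char *val)
{
	StringCell *cell = malloc(offsetof(StringCell, val) + strlen(val) + 1);

	if (cell == NULL)
		return NULL;
	cell->next = NULL;
	strcpy(cell->val, val);
	return cell;
}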

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Odd behavior of updatable security barrier views on foreign tables

2015-02-18 Thread Stephen Frost
Etsuro,

* Etsuro Fujita (fujita.ets...@lab.ntt.co.jp) wrote:
 On 2015/02/18 7:44, Stephen Frost wrote:
 * The patch applies to the latest head, 'make' passes successfully,
 but 'make check' fails in the rowsecurity test.

Here's the patch against master.  I'm still fiddling with the comment
wording and the commit message a bit, but barring objections these
patches are what I'm planning to move forward with.

Thanks!

Stephen
From ea3713a8b648459d3024d331ef0374f6c9622247 Mon Sep 17 00:00:00 2001
From: Stephen Frost sfr...@snowman.net
Date: Tue, 17 Feb 2015 15:43:33 -0500
Subject: [PATCH] Add locking clause for SB views for update/delete

In expand_security_qual(), we were handling locking correctly when a
PlanRowMark existed, but not when we were working with the target
relation (which doesn't have any PlanRowMarks, but the subquery created
for the security barrier quals still needs to lock the rows under it).

Noted by Etsuro Fujita when working with the Postgres FDW, which wasn't
properly issuing a SELECT ... FOR UPDATE to the remote side under a
DELETE.

Back-patch to 9.4 where updatable security barrier views were
introduced.
---
 src/backend/optimizer/prep/prepsecurity.c |  24 ++-
 src/test/regress/expected/rowsecurity.out |  64 ---
 src/test/regress/expected/updatable_views.out | 264 ++
 3 files changed, 199 insertions(+), 153 deletions(-)

diff --git a/src/backend/optimizer/prep/prepsecurity.c b/src/backend/optimizer/prep/prepsecurity.c
index af3ee61..ce7b203 100644
--- a/src/backend/optimizer/prep/prepsecurity.c
+++ b/src/backend/optimizer/prep/prepsecurity.c
@@ -37,7 +37,7 @@ typedef struct
 } security_barrier_replace_vars_context;
 
 static void expand_security_qual(PlannerInfo *root, List *tlist, int rt_index,
-	 RangeTblEntry *rte, Node *qual);
+	 RangeTblEntry *rte, Node *qual, bool targetRelation);
 
 static void security_barrier_replace_vars(Node *node,
 			  security_barrier_replace_vars_context *context);
@@ -63,6 +63,7 @@ expand_security_quals(PlannerInfo *root, List *tlist)
 	Query	   *parse = root->parse;
 	int			rt_index;
 	ListCell   *cell;
+	bool		targetRelation = false;
 
 	/*
 	 * Process each RTE in the rtable list.
@@ -98,6 +99,12 @@ expand_security_quals(PlannerInfo *root, List *tlist)
 		{
 			RangeTblEntry *newrte = copyObject(rte);
 
+			/*
+			 * We need to let expand_security_qual know if this is the target
+			 * relation, as it has additional work to do in that case.
+			 */
+			targetRelation = true;
+
 			parse->rtable = lappend(parse->rtable, newrte);
 			parse->resultRelation = list_length(parse->rtable);
 
@@ -147,7 +154,8 @@ expand_security_quals(PlannerInfo *root, List *tlist)
 			rte->securityQuals = list_delete_first(rte->securityQuals);
 
 			ChangeVarNodes(qual, rt_index, 1, 0);
-			expand_security_qual(root, tlist, rt_index, rte, qual);
+			expand_security_qual(root, tlist, rt_index, rte, qual,
+ targetRelation);
 		}
 	}
 }
@@ -160,7 +168,7 @@ expand_security_quals(PlannerInfo *root, List *tlist)
  */
 static void
 expand_security_qual(PlannerInfo *root, List *tlist, int rt_index,
-	 RangeTblEntry *rte, Node *qual)
+	 RangeTblEntry *rte, Node *qual, bool targetRelation)
 {
 	Query	   *parse = root->parse;
 	Oid			relid = rte->relid;
@@ -256,6 +264,16 @@ expand_security_qual(PlannerInfo *root, List *tlist, int rt_index,
 			}
 
 			/*
+			 * We need to handle the case where this is the target relation
+			 * explicitly since it won't have any row marks, because we still
+			 * need to lock the records coming back from the with-security-quals
+			 * subquery.  This might not appear obvious, but it matches what
+			 * we're doing above and keeps FDWs happy too.
+			 */
+			if (targetRelation)
+				applyLockingClause(subquery, 1, LCS_FORUPDATE,
+								   LockWaitBlock, false);
+			/*
 			 * Replace any variables in the outer query that refer to the
 			 * original relation RTE with references to columns that we will
 			 * expose in the new subquery, building the subquery's targetlist
diff --git a/src/test/regress/expected/rowsecurity.out b/src/test/regress/expected/rowsecurity.out
index 21817d8..f41bef1 100644
--- a/src/test/regress/expected/rowsecurity.out
+++ b/src/test/regress/expected/rowsecurity.out
@@ -1034,22 +1034,25 @@ EXPLAIN (COSTS OFF) EXECUTE p2(2);
 --
 SET SESSION AUTHORIZATION rls_regress_user1;
 EXPLAIN (COSTS OFF) UPDATE t1 SET b = b || b WHERE f_leak(b);
- QUERY PLAN  
--
+QUERY PLAN 
+---
  Update on t1 t1_3
-  Subquery Scan on t1
  Filter: f_leak(t1.b)
- -  Seq Scan on t1 t1_4
-   Filter: ((a % 2) = 0)
+ -  LockRows
+   -  Seq Scan on t1 t1_4
+ Filter: ((a % 2) = 0)
-  Subquery Scan on t1_1
  Filter: f_leak(t1_1.b)
- -  Seq Scan on t2
-  

Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2015-02-18 Thread Neil Tiffin
Stephen,

I meant it to go to the list, but hit the wrong button.

 On Feb 17, 2015, at 7:01 PM, Stephen Frost sfr...@snowman.net wrote:
 
 Neil,
 
 I noticed that you email'd me directly on this reply.  Not sure if you
 intended to or not, but I'm fine with my response going to the list.
 
 * Neil Tiffin (ne...@neiltiffin.com) wrote:
 On Feb 17, 2015, at 1:10 PM, Stephen Frost sfr...@snowman.net wrote:
 If the DB account isn't a superuser then everything changes.  There's no
 point fighting with the OS user- they can run some other PG binary which
 they've copied over locally and run SQL with that, or copy all the files
 over to another server which they have complete access to.  The fact
 that they can also connect to the DB and run SQL isn’t really an issue.
 
 That's the point. If this environment matters then the db superuser would not 
 be an authorized os superuser (with root or even close privileges). And no, 
 they could not be running some other binary.
 
 One way to do pg superuser auditing is to utilize a file (outside of the pg 
 data directory, which probably violates something in pg) like 
 /etc/pg_su_audit that has os root rw and r for all others (or the equivalent 
 for other os’s) containing a URL.  If this file is present, send all db 
 superuser usage to the URL.  If this file is present with the wrong 
 privileges, then don’t start pg. Done.  No configuration in pg config files, 
 no GUCs, nothing for the pg superuser to mess around with, no tables, no 
 ability for any pg superuser to configure or control.  
 
 This approach doesn't violate anything in PG and can be used with any of
 the pgaudit approaches being discussed.  The fact that it's
 postgresql.conf, which represents GUCs, doesn't change anything
 regarding what you’re getting at.
 

It changes everything, pg superusers have complete control of files in the pg 
data directory.  If set up correctly pg superusers have no control in /etc. 

 If they can copy or install a PG binary, then the OS auditing and security 
 failed. PG code need not be concerned.
 
 Sure someone could still break access to the URL, but again, not PG’s 
 concern.  Other security and auditing would have responsibility to pick that 
 up.
 
 It is a really simple use case, record everything any db superuser does and 
 send it to the audit system.  Done.  Then a designated role can control what 
 gets audited in production.  As long as everything the db superuser does can 
 be written to an audit log, then it no longer matters technically if the db 
 superuser can change the rest of the auditing.  If they do and it violates 
 policy, then when the audit picks it up, they lose their job plus depending 
 on the environment. 
 
 The issue is really around what we claim to provide with this auditing.
 We can't claim to provide *full* superuser auditing with any of these
 approaches since the superuser can disable auditing.  We can, however,
 provide auditing up until that happens, which is likely to be sufficient
 in many environments.  For those environments where full superuser
 auditing is required, an independent system must be used.
 
 Of course, it's important that any auditing mechanism which is used to
 audit superusers be impossible for the superuser to modify after the
 fact, meaning that syslog or similar needs to be used.
 

I’m still confused since you do not differentiate between db superuser and 
os superuser and what you mean by an independent system?

With the scheme I described above, how does the db superuser disable auditing 
without os root privileges?  If they can, then pg security is fundamentally 
broken, and I don’t believe it is.

How can an independent system monitor what commands are issued inside the 
database?

I understand my comments do not cover what is being proposed or worked on and 
that is fine.  But saying it does not have value because the superuser could 
disable any system in pg is wrong IMO.  Being able to reliably log db 
superusers without their ability to change the logging would be a fundamentally 
good security tool as companies become more serious about internal security.  
This is, and will become more, important since a lot of people consider insider 
breaches the biggest security challenge today.

Neil






-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]

2015-02-18 Thread Andres Freund
On 2015-02-18 17:15:18 +0900, Michael Paquier wrote:
  - I don't think that the t_bits fields in htup_details.h should be
  updated either.
 
  Why not? Any not broken code should already use MinHeapTupleSize and
  similar macros.
 
 Changing t_bits impacts HeapTupleHeaderData, ReorderBufferTupleBuf and
 similarly a couple of redo routines in heapam.c using
 HeapTupleHeaderData in a couple of structures not placing it at the
 end (compiler complains).

The compiler will complain if you use a FLEXIBLE_ARRAY_MEMBER in the
middle of a struct, but not when you embed a struct that uses it
into the middle of another struct. At least gcc doesn't, and I think it'd be
utterly broken if another compiler did that. If there's a compiler that
does so, we need to make it define FLEXIBLE_ARRAY_MEMBER to 1.

Code embedding structs using *either* [FLEXIBLE_ARRAY_MEMBER] or [1] for
variable length obviously has to take care where the variable length
part goes. And that's what the structs you removed were doing - that's
where the t_bits et al go:
{
...
	HeapTupleHeaderData header;
	char		data[MaxHeapTupleSize];
...
}
the 'data' bit is just the t_bits + data together. And similar in
-   struct
-   {
-   struct varlena hdr;
-   	char		data[TOAST_MAX_CHUNK_SIZE];	/* make struct big enough */
-   	int32		align_it;	/* ensure struct is aligned well enough */
-   }   chunk_data;

The 'data' is where the varlena's vl_dat stuff is stored.
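
A minimal sketch of that layout (illustrative names, not the actual
heapam.c declarations; it relies on the compiler accepting a
flexible-array struct embedded mid-struct, which is exactly the gcc
extension / clang complaint discussed above):

#include <assert.h>
#include <string.h>

#define MAX_TUPLE_SIZE 8192			/* stand-in for MaxHeapTupleSize */

typedef struct HeaderLike
{
	int		t_len;
	char	t_bits[];				/* FLEXIBLE_ARRAY_MEMBER */
} HeaderLike;

typedef struct ScratchTuple
{
	HeaderLike	header;					/* fixed-size part of the tuple */
	char		data[MAX_TUPLE_SIZE];	/* where t_bits + payload land */
} ScratchTuple;

static void
load_tuple(ScratchTuple *scratch, const char *src, size_t len)
{
	assert(len <= sizeof(HeaderLike) + MAX_TUPLE_SIZE);
	memcpy(&scratch->header, src, len);	/* header and payload land contiguously */
}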
  Places using a variable-length variable not at the end of a structure
  are changed with workaround, without impacting what those features do.
 
  I vote for rejecting most of this, except a (corrected version) of the
  pg_authid change. Which doesn't really belong to the rest of the patch
  anyway ;)x
 
 Changing bytea to use FLEXIBLE_ARRAY_MEMBER requires those changes, or
 at least some changes in this area as something with
 FLEXIBLE_ARRAY_MEMBER can only be placed at the end of a structure.

Again, I think you're confusing things that may not be done in a
single struct, and structs that are embedded in other places.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw foreign keys with default sequence

2015-02-18 Thread Kevin Grittner
Tim Kane tim.k...@gmail.com wrote:

 CREATE MATERIALISED VIEW local.devices;

 CREATE test_table (device_id bigint FOREIGN KEY (device_id) REFERENCES 
 local.devices (device_id) );

 ERROR:  referenced relation devices is not a table

In the future, please show code that you have actually run.  In
this case it's pretty easy to know how to answer, but in others it
may really draw out the process of helping you.

At this time materialized views do not support constraints, and may
not be referenced in foreign key definitions.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Odd behavior of updatable security barrier views on foreign tables

2015-02-18 Thread Stephen Frost
Etsuro,

* Etsuro Fujita (fujita.ets...@lab.ntt.co.jp) wrote:
 On 2015/02/18 7:44, Stephen Frost wrote:
 Attached is a patch which should address this.  Would love your (or
 anyone else's) feedback on it.  It appears to address the issue which
 you raised and the regression test changes are all in-line with
 inserting a LockRows into the subquery, as anticipated.
 
 I've looked into the patch.
 
 * The patch applies to the latest head, 'make' passes successfully,
 but 'make check' fails in the rowsecurity test.

Apologies for not being clear- the patch was against 9.4, where it
passes all the regression tests (at least for me- if you see
differently, please let me know!).

 * I found one place in expand_security_qual that I'm concerned about:
 
 + if (targetRelation)
 + applyLockingClause(subquery, 1, LCS_FORUPDATE,
 +false, 
 false);
 
 ISTM that it'd be better to use LockWaitBlock as the fourth argument
 of applyLockingClause.

LockWaitBlock isn't in 9.4. :)  Otherwise, I'd agree, and it's what I
plan to do for master.

 Other than that, the patch looks good to me.

Great, thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2015-02-18 Thread Stephen Frost
Neil,

* Neil Tiffin (ne...@neiltiffin.com) wrote:
 I meant it to go to the list, but hit the wrong button.

No problem.

  On Feb 17, 2015, at 7:01 PM, Stephen Frost sfr...@snowman.net wrote:
  I noticed that you email'd me directly on this reply.  Not sure if you
  intended to or not, but I'm fine with my response going to the list.
  
  This approach doesn't violate anything in PG and can be used with any of
  the pgaudit approaches being discussed.  The fact that it's
  postgresql.conf, which represents GUCs, doesn't change anything
  regarding what you’re getting at.
  
 
 It changes everything, pg superusers have complete control of files in the pg 
 data directory.  If set up correctly pg superusers have no control in /etc. 

If set up correctly, postgresql.conf is in /etc. :)  That's distribution
specific though.  However, postgresql.auto.conf ends up causing problems
since it can't be disabled and it's forced to live in the PG data
directory.  Even so though, your argument was that, as long as the
system is set up to be auditing initially and whatever action the
superuser takes to disable auditing will be audited, and disabling
auditing is against policy, then it's sufficient to meet the
requirement.  That's true in either case.

  The issue is really around what we claim to provide with this auditing.
  We can't claim to provide *full* superuser auditing with any of these
  approaches since the superuser can disable auditing.  We can, however,
  provide auditing up until that happens, which is likely to be sufficient
  in many environments.  For those environments where full superuser
  auditing is required, an independent system must be used.
  
  Of course, it's important that any auditing mechanism which is used to
  audit superusers be impossible for the superuser to modify after the
  fact, meaning that syslog or similar needs to be used.
 
 I’m still confused since you do not differentiate between db superuser and 
 os superuser and what you mean by an independent system?

When I'm talking about 'superuser' here, it's the PG superuser, not the
OS superuser.  What I mean by independent system is a process which is
not owned by the same user that the database is running as, and isn't
owned by the user who is connecting, that facilitates the connection
from the user to PG, but which logs everything that happens on that
connection.

 With the scheme I described above, how does the db superuser disable auditing 
 without os root privileges?  If they can, then pg security is fundamentally 
 broken, and I don’t believe it is.

The DB superuser can modify the running process through mechanisms
ranging from changing GUCs to creating functions in untrusted languages
(including C) and then using them to change or break more-or-less
anything that's happening in the backend.

 How can an independent system monitor what commands are issued inside the 
 database?

It can log everything which happens on the connection between the user
and the database because it's in the middle.

 I understand my comments do not cover what is being proposed or worked on and 
 that is fine.  But saying it does not have value because the superuser could 
 disable any system in pg, is wrong IMO.  Being able to reliability log db 
 superusers without their ability to change the logging would be a 
 fundamentally good security tool as companies become more serious about 
 internal security.  This is, and will be more, important since a lot of 
 people consider insider breaches the biggest security challenge today.

It'd be a great tool, certainly, but it's not actually possible to do
completely inside the DB process given the capabilities the PG superuser
has.  Being able to create and run functions in untrusted languages
means that the superuser can completely hijack the process, that's why
those languages are untrusted.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-02-18 Thread Syed, Rahila
Hello,

I think we should change the xlog format so that the block_id (which currently 
is XLR_BLOCK_ID_DATA_SHORT/LONG or an actual block id) isn't the block id but 
something like XLR_CHUNK_ID. Which is used as is for 
XLR_CHUNK_ID_DATA_SHORT/LONG, but for backup blocks can be set to 
XLR_CHUNK_BKP_WITH_HOLE, XLR_CHUNK_BKP_COMPRESSED, XLR_CHUNK_BKP_REFERENCE... 
The BKP blocks will then follow, storing the block id following the chunk id.

Yes, that'll increase the amount of data for a backup block by 1 byte, but I 
think that's worth it. I'm pretty sure we will be happy about the added 
extensibility pretty soon.

To clarify my understanding of the above change,

Instead of a block id to reference different fragments of an xlog record, a
single-byte field chunk_id should be used.  chunk_id will be the same as
XLR_BLOCK_ID_DATA_SHORT/LONG for main data fragments.
But for block references, it will store the following values in order to
identify information about the backup blocks.
#define XLR_CHUNK_BKP_COMPRESSED  0x01  
#define XLR_CHUNK_BKP_WITH_HOLE 0x02
...

The new xlog format should look like follows,

Fixed-size header (XLogRecord struct)
Chunk_id (a new field added before the id field in the XLogRecordBlockHeader struct)
XLogRecordBlockHeader 
Chunk_id
 XLogRecordBlockHeader 
...
...
Chunk_id (the renamed id field of the XLogRecordDataHeader struct)
XLogRecordDataHeader[Short|Long] 
 block data
 block data
 ...
 main data

I will post a patch based on this.
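
For concreteness, a hypothetical sketch of what the per-chunk header
could look like under this scheme (the names and the DATA_SHORT/LONG
values are illustrative only, PostgreSQL's uint8 typedef is assumed, and
this is not a committed WAL format):

#define XLR_CHUNK_ID_DATA_SHORT		0xFF
#define XLR_CHUNK_ID_DATA_LONG		0xFE
#define XLR_CHUNK_BKP_COMPRESSED	0x01
#define XLR_CHUNK_BKP_WITH_HOLE		0x02

typedef struct XLogRecordChunkHeader
{
	uint8		chunk_id;		/* one of the XLR_CHUNK_* values above */
	uint8		block_id;		/* for backup-block chunks: the block that follows */
} XLogRecordChunkHeader;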

Thank you,
Rahila Syed

-Original Message-
From: Andres Freund [mailto:and...@2ndquadrant.com] 
Sent: Monday, February 16, 2015 5:26 PM
To: Syed, Rahila
Cc: Michael Paquier; Fujii Masao; PostgreSQL mailing lists
Subject: Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

On 2015-02-16 11:30:20 +, Syed, Rahila wrote:
 - * As a trivial form of data compression, the XLOG code is aware that
 - * PG data pages usually contain an unused hole in the middle, 
 which
 - * contains only zero bytes.  If hole_length > 0 then we have removed
 - * such a hole from the stored data (and it's not counted in the
 - * XLOG record's CRC, either).  Hence, the amount of block data 
 actually
 - * present is BLCKSZ - hole_length bytes.
 + * Block images are able to do several types of compression:
 + * - When wal_compression is off, as a trivial form of compression, 
 + the
 + * XLOG code is aware that PG data pages usually contain an unused hole
 + * in the middle, which contains only zero bytes.  If length < BLCKSZ
 + * then we have removed such a hole from the stored data (and it is
 + * not counted in the XLOG record's CRC, either).  Hence, the amount
 + * of block data actually present is length bytes.  The hole offset
 + * on page is defined using hole_offset.
 + * - When wal_compression is on, block images are compressed using a
 + * compression algorithm without their hole to improve compression
 + * process of the page. length corresponds in this case to the 
 + length
 + * of the compressed block. hole_offset is the hole offset of the 
 + page,
 + * and the length of the uncompressed block is defined by 
 + raw_length,
 + * whose data is included in the record only when compression is 
 + enabled
 + * and with_hole is set to true, see below.
 + *
 + * is_compressed is used to identify if a given block image is 
 + compressed
 + * or not. Maximum page size allowed on the system being 32k, the 
 + hole
 + * offset cannot be more than 15-bit long so the last free bit is 
 + used to
 + * store the compression state of block image. If the maximum page 
 + size
 + * allowed is increased to a value higher than that, we should 
 + consider
 + * increasing this structure size as well, but this would increase 
 + the
 + * length of block header in WAL records with alignment.
 + *
 + * with_hole is used to identify the presence of a hole in a block image.
 + * As the length of a block cannot be more than 15-bit long, the 
 + extra bit in
 + * the length field is used for this identification purpose. If the 
 + block image
 + * has no hole, it is ensured that the raw size of a compressed block 
 + image is
 + * equal to BLCKSZ, hence the contents of 
 + XLogRecordBlockImageCompressionInfo
 + * are not necessary.
   */
  typedef struct XLogRecordBlockImageHeader  {
 - uint16  hole_offset;/* number of bytes before hole */
 - uint16  hole_length;/* number of bytes in hole */
 + uint16  length:15,  /* length of block data in 
 record */
 + with_hole:1;/* status of hole in the block 
 */
 +
 + uint16  hole_offset:15, /* number of bytes before hole */
 + is_compressed:1;/* compression status of image */
 +
 + /* Followed by the data related to compression if block is 
 +compressed */
  } XLogRecordBlockImageHeader;

Yikes, this is ugly.

I think we should change the xlog format so that the block_id 

Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2015-02-18 Thread Alvaro Herrera
Stephen Frost wrote:
 Abhijit,
 
 * Abhijit Menon-Sen (a...@2ndquadrant.com) wrote:

  I'm confused and unhappy about your characterisation of the state of
  this patch. You make it seem as though there was broad consensus about
  the changes that were needed, and that I left the patch sitting in the
  archives for a long time without addressing important issues.
 
 The specific issue which I was referring to there was the #ifdef's for
 the deparse branch.  I thought it was pretty clear, based on the
 feedback from multiple people, that all of that really needed to be
 taken out as we don't commit code to the main repo which has such
 external dependencies.

We can remove the #ifdef lines as soon as DDL deparse gets committed, of
course.  There is no external dependency here, only a dependency on a
patch that's being submitted in parallel.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]

2015-02-18 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2015-02-16 21:34:57 -0500, Tom Lane wrote:
 Also, if we want to insist that these fields be accessed
 through heap_getattr, I'd be inclined to put them inside the #ifdef
 CATALOG_VARLEN to enforce that.

 That we definitely should do. It's imo just a small bug that it was
 omitted here. I'll fix it, but not backpatch unless you prefer?

Agreed, that's really independent of FLEXIBLE_ARRAY_MEMBER, so fix
it as its own patch.  I also agree that back-patching is probably
not appropriate.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-02-18 Thread Simon Riggs
On 15 February 2015 at 02:34, David Steele da...@pgmasters.net wrote:

 I've posted a couple of messages over the last few weeks about the work
 I've been doing on the pg_audit extension.  The lack of response could
 be due to either universal acclaim or complete apathy, but in any case I
 think this is a very important topic so I want to give it another try.

You mentioned you had been following the thread for some time and yet
had not contributed to it. Did that indicate your acclaim for the
earlier patch, or was that apathy? I think neither.

People have been working on this feature for 9 months now, so you
having to wait 9 days for a response is neither universal acclaim, nor
apathy. I've waited much longer than that for Stephen to give the
review he promised, but have not bad-mouthed him for that wait, nor do
I do so now. In your first post you had removed the authors' email
addresses, so they were likely unaware of your post. I certainly was.

 I've extensively reworked the code that was originally submitted by
 2ndQuandrant.  This is not an indictment of their work, but rather an
 attempt to redress concerns that were expressed by members of the
 community.  I've used core functions to determine how audit events
 should be classified and simplified and tightened the code wherever
 possible.  I've removed deparse and event triggers and opted for methods
 that rely only on existing hooks.  In my last message I provided
 numerous examples of configuration, usage, and output which I hoped
 would alleviate concerns of complexity.  I've also written a ton of unit
 tests to make sure that the code works as expected.

Some people that have contributed ideas to this patch are from
2ndQuadrant, some are not. The main point is that we work together on
things, rather than writing a slightly altered version and then
claiming credit.

If you want to help, please do. We give credit where its due, not to
whoever touched the code last in some kind of bidding war. If we let
this happen, we'd generate a flood of confusing patch versions and
little would ever get committed.

Let's keep to one thread and work to include everybody's ideas then
we'll get something useful committed.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]

2015-02-18 Thread Andres Freund
On 2015-02-16 21:34:57 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2015-02-17 05:51:22 +0900, Michael Paquier wrote:
  diff --git a/src/include/catalog/pg_authid.h 
  b/src/include/catalog/pg_authid.h
  index e01e6aa..d8789a5 100644
  --- a/src/include/catalog/pg_authid.h
  +++ b/src/include/catalog/pg_authid.h
  @@ -56,8 +56,10 @@ CATALOG(pg_authid,1260) BKI_SHARED_RELATION 
  BKI_ROWTYPE_OID(2842) BKI_SCHEMA_MAC
  int32  rolconnlimit;   /* max connections allowed (-1=no 
  limit) */
  
  /* remaining fields may be null; use heap_getattr to read them! */
  -  textrolpassword;/* password, if any */
  timestamptz rolvaliduntil; /* password expiration time, if any */
  +#ifdef CATALOG_VARLEN
  +  textrolpassword;/* password, if any */
  +#endif
  } FormData_pg_authid;
 
  That change IIRC is wrong, because it'll make rolvaliduntil NOT
  NULL (any column that's fixed width and has only fixed-width columns
  before it is marked as such).
 
 You were muttering about a BKI_FORCE_NOT_NULL option ... for symmetry,
 maybe we could add BKI_FORCE_NULL as well, and then use that for cases
 like this?

Yea, I guess it'd not be too hard.

 Also, if we want to insist that these fields be accessed
 through heap_getattr, I'd be inclined to put them inside the #ifdef
 CATALOG_VARLEN to enforce that.

That we definitely should do. It's imo just a small bug that it was
omitted here. I'll fix it, but not backpatch unless you prefer?

 I'm generally -1 on reordering any catalog columns as part of this patch.
 There should be zero user-visible change from it IMO.  However, if we
 stick both those columns inside the ifdef, we don't need to reorder.

Agreed.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2015-02-18 Thread Stephen Frost
Abhijit,

* Abhijit Menon-Sen (a...@2ndquadrant.com) wrote:
 At 2015-02-17 13:01:46 -0500, sfr...@snowman.net wrote:
  I have to admit that I'm confused by this.  Patches don't stabilise
  through sitting in the archives, they stabilise when the comments are
  being addressed, the patch updated, and further comments are
  addressing less important issues.  The issues which Robert and I had
  both commented on didn't appear to be getting addressed.
 
 I'm confused and unhappy about your characterisation of the state of
 this patch. You make it seem as though there was broad consensus about
 the changes that were needed, and that I left the patch sitting in the
 archives for a long time without addressing important issues.

The specific issue which I was referring to there was the #ifdef's for
the deparse branch.  I thought it was pretty clear, based on the
feedback from multiple people, that all of that really needed to be
taken out as we don't commit code to the main repo which has such
external dependencies.  That wasn't meant as a characterization of the
patch itself but rather a comment on the state of the process and that
I, at least, had been hoping for a new version which addressed those
bits.

 You revised and refined your GRANT proposal in stages, and I tried to
 adapt the code to suit. I'm sorry that my efforts were not fast enough 
 or responsive enough to make you feel that progress was being made. But
 nobody else commented in detail on the GRANT changes except to express
 general misgivings, and nobody else even disagreed when I inferred, in
 light of the lack of consensus that Robert pointed out, that the code
 was unlikely to make it into 9.5.

Progress was being made and I certainly appreciate your efforts.  I
don't think anyone would want to stand up and say it's going to be in
9.5 or not be in 9.5 which is likely why you didn't get any reaction
from your comment about it being unlikely.  I'm certainly hopeful that
something gets in along these lines as it's a pretty big capability that
we're currently missing.

 Given that I've maintained the code over the past year despite its being
 rejected in an earlier CF, and given the circumstances outlined above, I
 do not think it's reasonable to conclude after a couple of weeks without
 a new version that it was abandoned. As I had mentioned earlier, there
 are people who already use pgaudit as-is, and complain if I break it.

For my part, at least, I didn't intend to characterize it as abandoned
but rather that it simply didn't seem to be moving forward, perhaps due
to a lack of time to work on it or other priorities; I didn't mean to
imply that you wouldn't be continuing to maintain it.  As for the
comment about people depending on it, I'm not sure what that's getting
at, but I don't think that's really a consideration for us as it relates
to having a contrib module to provide this capability.  We certainly
want to consider what capabilities users are looking for and make sure
we cover those cases, if possible, but this is at a development stage
with regard to core and therefore things may break for existing users.

If that's an issue for your users then it would probably be good to have
a clean distinction between the stable code you're providing to them for
auditing and what's being submitted for inclusion in core, with an
expectation that users will need to make some changes when they switch
to the version which is included in core.

 Anyway, I guess there is no such thing as a constructive history
 discussion, so I'll drop it.

It's not my intent to continue this as I certainly agree that it seems
unlikely to be a constructive use of time, but I wanted to reply and
clarify that it wasn't my intent to characterize pgaudit as abandoned
and to apologize for my comments coming across that way.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Commit fest 2015-12 enters money time

2015-02-18 Thread Magnus Hagander
On Wed, Feb 18, 2015 at 12:34 AM, Michael Paquier michael.paqu...@gmail.com
 wrote:

 On Wed, Feb 18, 2015 at 5:55 AM, Corey Huinker corey.huin...@gmail.com
 wrote:
  What is required to get the New Patch superpower? I'm also in need of it.

 CCing Magnus... I am not sure myself, but I would imagine that the
 commit fest needs to be Open and not In Process to be able to add
 patches.


Sorry, I was out on a long travel day with lots of airplane time. I had
this thread on my watch list but hadn't time to fix it.

So the basic reason is that there was no Open commitfest available. Only
Commitfest Managers are allowed to break the rules and add patches to the
wrong commitfest - this is why Michael was able to do that, and pretty much
nobody else.

There's supposed to always be one open commitfest. Since none was added
when 2015-02 was switched to In Progress (probably because I didn't
actually tell Michael about it), there was nowhere to post new patches.

I have fixed this now by creating 2015-06, which is open for submissions.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Commit fest 2015-12 enters money time

2015-02-18 Thread Magnus Hagander
On Tue, Feb 17, 2015 at 12:34 AM, Michael Paquier michael.paqu...@gmail.com
 wrote:

 On Tue, Feb 17, 2015 at 7:39 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
  2015-02-16 12:25 GMT+09:00 Michael Paquier michael.paqu...@gmail.com:
 
 
  On Fri, Feb 13, 2015 at 10:06 AM, Michael Paquier
  michael.paqu...@gmail.com wrote:
 
  In order to move on to the next CF, I am going to go through all the
  remaining patches and update their status accordingly. And sorry for
  slacking a bit regarding that. If you wish to complain, of course feel
  free to post messages on this thread or on the thread related to the
  patch itself.
 
 
  As all the entries have been either moved or updated, CF 2014-12 is
 closed,
  and CF 2015-02 has been changed to In Progress.
 
  I tried to add my name on the CF entry for the Join pushdown support
  for foreign tables patch, however, it was unintentionally marked as
  rejected on the last CF, even though it should be returned with
 feedback.
 
  https://commitfest.postgresql.org/3/20/
 
  Is it available to move it to the current CF?

 I don't recall you can...


You're correct, you can't. It would probably make sense for at least a CF
manager to be able to do that - added to my TODO.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] deparsing utility commands

2015-02-18 Thread Andres Freund
Hi,

On 2015-02-15 01:48:15 -0300, Alvaro Herrera wrote:
 This is a repost of the patch to add CREATE command deparsing support to
 event triggers.  It now supports not only CREATE but also ALTER and
 other commands such as GRANT/REVOKE, COMMENT ON and SECURITY LABEL.
 This patch series is broken up in a rather large number of patches, but
 my intention would be to commit separately only the first 6 patches; the
 remaining parts are split up only so that it's easy to see how deparsing
 each patch is implemented, and would be committed as a single changeset
 (but before that I need a bunch of extra fixes and additions to other
 parts.)

I think you should just go ahead and commit 0001, 0002, 0006. Those have
previously been discussed and seem like a good idea and uncontroversial
even if the rest of the series doesn't go in.

I think the GRANT/REVOKE, COMMENT ON and SECURITY LABEL changes are good
independently as well, but there previously have been raised some
concerns about shared objects. I think the answer in the patches which
is to raise events when affecting database local objects makes sense,
but others might disagree.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Seq Scan

2015-02-18 Thread Andres Freund
On 2015-02-18 16:59:26 +0530, Amit Kapila wrote:
 On Tue, Feb 17, 2015 at 9:52 PM, Andres Freund and...@2ndquadrant.com
 wrote:
  A query whose runtime is dominated by a sequential scan (+ attached
  filter) is certainly going to require a bigger prefetch size than one
  that does other expensive stuff.
 
  Imagine parallelizing
  SELECT * FROM largetable WHERE col = low_cardinality_value;
  and
  SELECT *
  FROM largetable JOIN gigantic_table ON (index_nestloop_condition)
  WHERE col = high_cardinality_value;
 
  The first query will be a simple sequential and disk reads on largetable
  will be the major cost of executing it.  In contrast the second query
  might very well sensibly be planned as a parallel sequential scan with
  the nested loop executing in the same worker. But the cost of the
  sequential scan itself will likely be completely drowned out by the
  nestloop execution - index probes are expensive/unpredictable.

 I think the work/task given to each worker should be as granular
 as possible to make it more predictable.
 I think the better way to parallelize such a work (Join query) is that
 first worker does sequential scan and filtering on large table and
 then pass it to next worker for doing join with gigantic_table.

I'm pretty sure that'll result in rather horrible performance. IPC is
rather expensive, you want to do as little of it as possible.

  
   I think it makes sense to think of a set of tasks in which workers can
   assist.  So you a query tree which is just one query tree, with no
   copies of the nodes, and then there are certain places in that query
   tree where a worker can jump in and assist that node.  To do that, it
   will have a copy of the node, but that doesn't mean that all of the
   stuff inside the node becomes shared data at the code level, because
   that would be stupid.
 
  My only problem with that description is that I think workers will
  have to work on more than one node - it'll be entire subtrees of the
  executor tree.

 There could be some cases where it could be beneficial for a worker
 to process a sub-tree, but I think there will be more cases where
 it will just work on part of a node and send the result back to either
 the master backend or another worker for further processing.

I think many parallelism projects start out that way, and then notice
that it doesn't parallelize very efficiently.

The most extreme, but common, example is aggregation over large amounts
of data - unless you want to ship huge amounts of data between processes
to parallelize it, you have to do the sequential scan and the
pre-aggregate step (which e.g. computes count() and sum() to implement an
avg over all the workers) inside one worker.
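
To make that concrete, here is a rough SQL sketch of the pre-aggregation
idea (table and column names are invented; a real implementation would do
this with per-worker executor nodes rather than UNION ALL):

    -- each branch plays the role of one worker shipping two numbers
    -- (a partial sum and a partial count) instead of its share of rows
    SELECT sum(part_sum)::numeric / sum(part_count) AS avg_val
    FROM (
        SELECT sum(x) AS part_sum, count(x) AS part_count
        FROM largetable WHERE x % 2 = 0      -- "worker 1"
        UNION ALL
        SELECT sum(x), count(x)
        FROM largetable WHERE x % 2 = 1      -- "worker 2"
    ) AS partials;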

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-02-18 Thread Andrew Dunstan


On 02/17/2015 08:21 PM, Peter Eisentraut wrote:

On 1/20/15 6:32 PM, David G Johnston wrote:

In fact, as far as
the database knows, the values provided to this function do represent an
entire population and such a correction would be unnecessary.  I guess it
boils down to whether future queries are considered part of the population
or whether the population changes upon each query being run and thus we are
calculating the ever-changing population variance.

I think we should be calculating the population variance.  We are
clearly taking the population to be all past queries (from the last
reset point).  Otherwise calculating min and max wouldn't make sense.




The difference is likely to be very small in any case where you actually 
want to examine the standard deviation, so I feel we're rather arguing 
about how many angels can fit on the head of a pin, but if this is the 
consensus I'll change it.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-02-18 Thread Sawada Masahiko
   VACUUM [FULL] [FREEZE] ... [ANALYZE] [tname [(cname, ...)]
 | VACUUM [({FULL [bool]|FREEZE [bool]|...}[,...])] [tname [(cname, ...)]
 | VACUUM [tname [(cname, ...)] [[WITH ]({FULL [bool]|FREEZE [bool]|...})]

 REINDEX [{INDEX|TABLE|...}] name [[WITH] (VERBOSE [bool]|...)]

 EXPLAIN [[WITH] ({ANALYZE [bool]|VERBOSE [bool]|... [,...]})] statement


I don't think the (OptionName [bool]) style, like (VERBOSE on, FORCE
on), is needed for the REINDEX command.
The EXPLAIN command has that option style because its FORMAT option can
take values other than ON/TRUE or OFF/FALSE (e.g., TEXT, XML).
But a REINDEX command option can only be ON or OFF, so
I think the bare option name is good enough.

Next, regarding the location of such options: several maintenance
commands like CLUSTER and VACUUM take their options immediately
after the command name.
From a consistency perspective, I tend to agree with Robert that the
option should go immediately after the command name, as follows:
REINDEX [(VERBOSE | FORCE)] {INDEX | ...} name;
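
For illustration (the REINDEX form below is only the syntax proposed in
this thread, not something that exists yet):

    -- EXPLAIN's parenthesized options already accept non-boolean values:
    EXPLAIN (ANALYZE on, FORMAT json) SELECT 1;

    -- a REINDEX option can only be on or off, so the bare name is enough:
    REINDEX (VERBOSE) TABLE my_table;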

Btw, how long will the FORCE option remain available?

Regards,

---
Sawada Masahiko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-02-18 Thread David Steele
On 2/18/15 8:25 AM, Simon Riggs wrote:
 On 15 February 2015 at 02:34, David Steele da...@pgmasters.net wrote:
 
 I've posted a couple of messages over the last few weeks about the work
 I've been doing on the pg_audit extension.  The lack of response could
 be due to either universal acclaim or complete apathy, but in any case I
 think this is a very important topic so I want to give it another try.
 
 You mentioned you had been following the thread for some time and yet
 had not contributed to it. Did that indicate your acclaim for the
 earlier patch, or was that apathy? I think neither.

In my case it actually was acclaim.  I was happy with the direction
things were going and had nothing in particular to add - and I didn't
think a +1 from me was going to carry any weight with the community.

I can see now that everyone's opinion matters here, so I'll be more
active about weighing in when I think something is valuable.

 
 People have been working on this feature for 9 months now, so you
 having to wait 9 days for a response is neither universal acclaim, nor
 apathy. I've waited much longer than that for Stephen to give the
 review he promised, but have not bad mouthed him for that wait, nor do
 I do so now. In your first post you had removed the author's email
 addresses, so they were likely unaware of your post. I certainly was.

I understand that, but with the CF closing I felt like I had to act.
Abhijit's last comment on the thread was that he was no longer going to
work on it in relation to 9.5.  I felt that it was an important feature
(and one that I have a lot of interest in), so that's when I got involved.

I posted two messages, but I only addressed one of them directly to
Abhijit.  As you said, I'm new here and I'm still getting used to the
way things are done.

 I've extensively reworked the code that was originally submitted by
 2ndQuandrant.  This is not an indictment of their work, but rather an
 attempt to redress concerns that were expressed by members of the
 community.  I've used core functions to determine how audit events
 should be classified and simplified and tightened the code wherever
 possible.  I've removed deparse and event triggers and opted for methods
 that rely only on existing hooks.  In my last message I provided
 numerous examples of configuration, usage, and output which I hoped
 would alleviate concerns of complexity.  I've also written a ton of unit
 tests to make sure that the code works as expected.
 
 Some people that have contributed ideas to this patch are from
 2ndQuadrant, some are not. The main point is that we work together on
 things, rather than writing a slightly altered version and then
 claiming credit.
 
 If you want to help, please do. We give credit where its due, not to
 whoever touched the code last in some kind of bidding war. If we let
 this happen, we'd generate a flood of confusing patch versions and
 little would ever get committed.

Agreed, and I apologize if I came off that way.  It certainly wasn't my
intention.  I was hesitant because I had made so many changes and I
wasn't sure how the authors would feel about it.  I wrote to them
privately to get their take on the situation.

 Let's keep to one thread and work to include everybody's ideas then
 we'll get something useful committed.

I'm a little confused about how to proceed here.  I created a new thread
because the other patch had already been rejected.  How should I handle
that?

-- 
- David Steele
da...@pgmasters.net





Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2015-02-18 Thread David Steele
Hi Fujii,

Thanks for taking a look at the patch.  Comments below:

On 2/18/15 6:11 AM, Fujii Masao wrote:
 On Wed, Feb 18, 2015 at 1:26 AM, David Steele da...@pgmasters.net wrote:
 On 2/17/15 10:23 AM, Simon Riggs wrote:
 I vote to include pgaudit in 9.5, albeit with any changes. In
 particular, David may have some changes to recommend, but I haven't
 seen a spec or a patch, just a new version of code (which isn't how we
 do things...).

 I submitted the new patch in my name under a separate thread Auditing
 extension for PostgreSQL (Take 2) (54e005cc.1060...@pgmasters.net)
 
 I played the patch version of pg_audit a bit and have basic comments about
 its spec.
 
 The pg_audit doesn't log BIND parameter values when a prepared statement is
 used.
 Seems this is an oversight of the patch. Or is this intentional?

It's actually intentional - following the model I talked about in my
earlier emails, the idea is to log statements only.  This also follows
2ndQuadrant's implementation.

Logging values is interesting, but I'm sure the user would want to
specify which columns to log, which I felt was beyond the scope of the
patch.

 The pg_audit cannot log a statement like SELECT 1 which doesn't access
 any database object. Is this intentional? I think that there are many users
 who want to audit even such statements.

I think I see how to make this work.  I'll work on it for the next
version of the patch.

 
 Imagine the case where you call a user-defined function which executes
 many nested statements. In this case, pg_audit logs only the top-level statement
 (i.e., the one issued directly by the client) every time a nested statement is
 executed. In fact, one call of such a UDF can cause lots of *identical* log
 messages. I think this is problematic.

I agree - not sure how to go about addressing it, though.  I've tried to
cut down on the verbosity of the logging in general, but of course it
can still be a problem.

Using security definer and a different logging GUC for the defining role
might work.  I'll add that to my unit tests and see what happens.

I appreciate your feedback!

-- 
- David Steele
da...@pgmasters.net





[HACKERS] array_push is just stupid ...

2015-02-18 Thread Tom Lane
While hacking around with expanded arrays, I realized that array_push()
is just a horribly badly designed function.  There is no reason at all
for the same C function to underlie both array_append and array_prepend:
the function spends significant time and effort reverse-engineering which
way it was called, and there's very little logic-in-common that would
justify merging the two code paths.  If we split it apart into separate
array_append and array_prepend functions, I think we'd actually end up
with less code.  And it would certainly be noticeably faster because
we'd not need to be doing get_fn_expr_argtype and get_element_type (the
latter of which is a syscache lookup, hence not so cheap) on *both*
arguments every time through.  (Admittedly, some of that could be bought
back with a bit more effort on locally caching the result, but it would be
better to know which input is the array a priori.)
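
For reference, the two SQL-level functions backed by array_push take their
arguments in mirrored order, which is exactly why the shared C function has
to work out which argument is the array:

    SELECT array_append(ARRAY[1,2,3], 4);    -- {1,2,3,4}  (array, element)
    SELECT array_prepend(0, ARRAY[1,2,3]);   -- {0,1,2,3}  (element, array)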

Barring objections, I'm going to go fix this independently of the
expanded-array changes.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]

2015-02-18 Thread Michael Paquier
On Thu, Feb 19, 2015 at 7:29 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Michael Paquier michael.paqu...@gmail.com writes:
 On Wed, Feb 18, 2015 at 10:09 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
 The compiler will complain if you use a FLEXIBLE_ARRAY_MEMBER in the
 middle of a struct but not when when you embed a struct that uses it
 into the middle another struct. At least gcc doesn't and I think it'd be
 utterly broken if another compiler did that. If there's a compiler that
 does so, we need to make it define FLEXIBLE_ARRAY_MEMBER to 1.

 clang does complain on my OSX laptop regarding that ;)

 I'm a bit astonished that gcc doesn't consider this an error.  Sure seems
 like it should.  (Has anyone tried it on recent gcc?)

Just tried with gcc 4.9.2 on an ArchLinux box and it does not complain
after switching the varlena declaration from [1] to
FLEXIBLE_ARRAY_MEMBER in c.h on HEAD. But clang 3.5.1 does complain on
the same box.

 I am entirely
 opposed to Andres' claim that we ought to consider compilers that do warn
 to be broken; if anything it's the other way around.

I'm on board with that.

 Moreover, if we have any code that is assuming such cases are okay, it
 probably needs a second look.  Isn't this situation effectively assuming
 that a variable-length array is fixed-length?

AFAIK, switching a bunch of things to use FLEXIBLE_ARRAY_MEMBER has
brought to light a couple of things that could be revisited:
1) tuptoaster.c, with this declaration of varlena:
struct
{
    struct varlena hdr;
    char        data[TOAST_MAX_CHUNK_SIZE];    /* make struct big enough */
    int32       align_it;       /* ensure struct is aligned well enough */
}           chunk_data;
2) inv_api.c with this thing:
struct
{
    bytea       hdr;
    char        data[LOBLKSIZE];    /* make struct big enough */
    int32       align_it;       /* ensure struct is aligned well enough */
}           workbuf;
3) heapam.c in three places with HeapTupleHeaderData:
struct
{
    HeapTupleHeaderData hdr;
    char        data[MaxHeapTupleSize];
}           tbuf;
4) pg_authid.h with its use of relpasswd.
5) reorderbuffer.h with its use of HeapTupleHeaderData:
typedef struct ReorderBufferTupleBuf
{
    /* position in preallocated list */
    slist_node  node;

    /* tuple, stored sequentially */
    HeapTupleData tuple;
    HeapTupleHeaderData header;
    char        data[MaxHeapTupleSize];
} ReorderBufferTupleBuf;

Those issues can be grouped depending on where foo[1] is switched to
FLEXIBLE_ARRAY_MEMBER, so I will try to get a set of patches depending
on that.

Regards,
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication identifiers, take 4

2015-02-18 Thread Petr Jelinek

On 16/02/15 10:46, Andres Freund wrote:

On 2015-02-16 11:34:10 +0200, Heikki Linnakangas wrote:


At a quick glance, this basic design seems workable. I would suggest
expanding the replication IDs to regular 4 byte oids. Two extra bytes is a
small price to pay, to make it work more like everything else in the system.


I don't know. Growing from 3 to 5 byte overhead per relevant record (or
even 0 to 5 in case the padding is reused) is rather noticeable. If we
later find it to be a limit (I seriously doubt that), we can still
increase it in a major release without anybody really noticing.



I agree that limiting the overhead is important.

But I have a related thought about this - I wonder if it's worth trying to
map this more directly to the CommitTsNodeId. I mean, it is currently
used for the actual storage, but I am thinking of having the
Replication Identifiers be the public API and the CommitTsNodeId stuff
be just a hidden implementation detail. This thought is based on the fact
that CommitTsNodeId provides a somewhat meaningless number while
Replication Identifiers give us a nice name for that number. And if we do
that, then the type should perhaps be the same for both?


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the stats snapshot timestamp to SQL

2015-02-18 Thread Tomas Vondra
Hi there,

On 30.1.2015 06:27, Matt Kelly wrote:
 Actually, I just did one more code review of myself, and somehow missed
 that I submitted the version with the wrong oid.  The oid used in the
 first version is wrong (1) and was from before I read the docs on
 properly picking one.
 
 Attached is the fixed version. (hopefully with the right mime-type and
 wrong extension.  Alas, gmail doesn't let you set mime-types; time to
 find a new email client...)

I do have a question regarding the patch, although I see the patch is
already marked as 'ready for committer' (sorry, somehow managed to miss
the patch until now).

I see the patch only works with the top-level snapshot timestamp, stored
in globalStats, but since 9.3 (when the stats were split into per-db
files) we track per-database timestamps too.

Shouldn't we make those timestamps accessible too? It's not necessary
for detecting an unresponsive statistics collector (if it's stuck it's not
writing anything, so the global timestamp will be old too), but when
querying database-level stats it seems more appropriate to query the
database-level timestamp too.

But maybe that's not necessary, because to query database stats you have
to be connected to that particular database and that should write fresh
stats, so the timestamps should not be very different.
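
As a usage sketch (I'm assuming here that the function added by the patch
ends up being called pg_stat_get_snapshot_timestamp(); adjust for whatever
the final name is), a monitoring query could then do something like:

    SELECT now() - pg_stat_get_snapshot_timestamp() AS stats_snapshot_age;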

regards

-- 
Tomas Vondra   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-18 Thread Magnus Hagander
On Wed, Feb 18, 2015 at 10:57 PM, Kevin Grittner kgri...@ymail.com wrote:

 Magnus Hagander mag...@hagander.net wrote:
  On Feb 17, 2015 12:26 AM, Andres Freund and...@2ndquadrant.com
 wrote:
  On 2015-02-16 16:35:46 -0500, Bruce Momjian wrote:
  But max_standby_streaming_delay, max_standby_archive_delay and
  hot_standby_feedback are among the most frequent triggers for
  questions and complaints that I/we see.
 
  Agreed.
  And a really bad one used to be vacuum_defer_cleanup_age, because
  of confusing units amongst other things. Which in terms seems
  fairly close to Kevins suggestions, unfortunately.

  Particularly my initial suggestion, which was to base snapshot
  age on the number of transaction IDs assigned.  Does this look
 any better to you if it is something that can be set to '20min' or
 '1h'?  Just to restate, that would not automatically cancel the
 snapshots past that age; it would allow vacuum of any tuples which
 became dead that long ago, and would cause a snapshot too old
 message for any read of a page modified more than that long ago
 using a snapshot which was older than that.


Yes, it would definitely look much better. My point above was
exactly that - having a setting whose unit is a number of xids confused a lot
of users and made it really hard to tune. Having something in time units is
a lot easier to understand and tune for most people.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-02-18 Thread Andrew Dunstan


On 02/18/2015 08:34 PM, David Fetter wrote:

On Tue, Feb 17, 2015 at 08:21:32PM -0500, Peter Eisentraut wrote:

On 1/20/15 6:32 PM, David G Johnston wrote:

In fact, as far as the database knows, the values provided to this
function do represent an entire population and such a correction
would be unnecessary.  I guess it boils down to whether future
queries are considered part of the population or whether the
population changes upon each query being run and thus we are
calculating the ever-changing population variance.

I think we should be calculating the population variance.

Why population variance and not sample variance?  In distributions
where the second moment about the mean exists, it's an unbiased
estimator of the variance.  In this, it's different from the
population variance.




Because we're actually measuring the whole population, and not a sample?
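
For anyone following along, the two differ only in the divisor (n for the
population variance, n - 1 for the sample variance), which SQL already
exposes directly:

    SELECT var_pop(t), var_samp(t)
    FROM unnest(ARRAY[2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]) AS t;
    -- var_pop = 4.0 (divide by n = 8), var_samp ~ 4.57 (divide by n - 1 = 7)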

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] anyarray

2015-02-18 Thread Tomas Vondra
On 13.2.2015 16:20, Teodor Sigaev wrote:
 Some users of the intarray contrib module wish to use its features
 with other kinds of arrays, not only the int4 type. The suggested
 module generalizes intarray over other (though not all) pgsql types.
 
 Anyarray also provides a calculation of the similarity of two
 one-dimensional arrays, similar to the smlar module. The anyarray module
 doesn't provide anything similar to the query_int feature of intarray,
 because that feature is very hard to implement (it requires a new
 pseudo-type, anyquery), it is close to impossible to have operation
 extensibility, and its complicated queries are hidden from pgsql's
 optimizer (as with jsquery). As far as I know, query_int isn't very
 popular these days.

Hi Teodor,

I started reviewing this patch today - first of all, kudos for having
944kB of regression tests in a 1MB patch ;-)

I've noticed two unrelated files

../src/test/modules/dummy_seclabel/expected/dummy_seclabel.out
../src/test/modules/dummy_seclabel/sql/dummy_seclabel.sql

I suppose that's not intentional, right?

-- 
Tomas Vondra   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]

2015-02-18 Thread Gavin Flower

On 19/02/15 15:00, Tom Lane wrote:

Michael Paquier michael.paqu...@gmail.com writes:

On Thu, Feb 19, 2015 at 7:29 AM, Tom Lane t...@sss.pgh.pa.us wrote:

Moreover, if we have any code that is assuming such cases are okay, it
probably needs a second look.  Isn't this situation effectively assuming
that a variable-length array is fixed-length?

AFAIK, switching a bunch of things to use FLEXIBLE_ARRAY_MEMBER has
brought to light a couple of things that could be revisited:
1) tuptoaster.c, with this declaration of varlena:
 struct
 {
     struct varlena hdr;
     char        data[TOAST_MAX_CHUNK_SIZE];    /* make struct big enough */
     int32       align_it;       /* ensure struct is aligned well enough */
 }           chunk_data;

I'm pretty sure that thing ought to be a union, not a struct.


2) inv_api.c with this thing:
 struct
 {
     bytea       hdr;
     char        data[LOBLKSIZE];    /* make struct big enough */
     int32       align_it;       /* ensure struct is aligned well enough */
 }           workbuf;

And probably this too.


3) heapam.c in three places with HeapTupleHeaderData:
 struct
 {
     HeapTupleHeaderData hdr;
     char        data[MaxHeapTupleSize];
 }           tbuf;

And this, though I'm not sure if we'd have to change the size of the
padding data[] member.


5) reorderbuffer.h with its use of HeapTupleHeaderData:

Hmm.  Andres will have to answer for that one ;-)

regards, tom lane



Curious, has this problem been raised with the gcc maintainers?

Is this still a problem with gcc 5.0 (which is due to be released soon)?


Cheers,
Gavin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] INSERT ... ON CONFLICT UPDATE and logical decoding

2015-02-18 Thread Peter Geoghegan
Andres pointed out that the INSERT ... ON CONFLICT UPDATE patch
doesn't work well with logical decoding. Basically, as things stand,
there is no means of determining that an INSERT had an ON CONFLICT
UPDATE clause from logical decoding. It isn't obvious that this
matters - after all, the relevant WAL records are generally consistent
with there being a vanilla INSERT (or vanilla UPDATE). If there was no
possibility of concurrent conflicts, that might actually be the end of
it, but of course there is.
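
(For concreteness, the kind of statement at issue looks roughly like the
sketch below - table and columns are invented, and the exact grammar is
whatever the in-progress patch settles on:

    INSERT INTO counters (key, val)
    VALUES ('hits', 1)
    ON CONFLICT UPDATE SET val = counters.val + 1;

From the WAL alone that is indistinguishable from a vanilla INSERT or a
vanilla UPDATE.)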

I guess that the best way of fixing this is exposing to output plugins
that a super deletion is a
REORDER_BUFFER_CHANGE_INTERNAL_SUPERDELETE.  This is kind of an
internal ReorderBufferChangeType constant, because it means that the
output plugin should probably just omit the tuple just inserted and
now deleted. I tend to think that it would be best if the fact that a
speculative insertion was a speculative insertion is not exposed. We
just formalize that a REORDER_BUFFER_CHANGE_INTERNAL_SUPERDELETE is a
variant of REORDER_BUFFER_CHANGE_DELETE...it's implicit that things
like triggers are not fired here (which may or may not matter,
depending on the use case, I suppose).

Would that be sufficiently flexible for the various logical
replication use cases? I guess we are somewhat forced to push that
kind of thing into output plugins, because doing so lets them decide
how to handle this. It's a little unfortunate that this implementation
detail is exposed to output plugins, though, which is why I'd be
willing to believe that it'd be better to have transaction reassembly
normalize the records such that a super deleted tuple was never
exposed to output plugins in the first place...they'd only see a
REORDER_BUFFER_CHANGE_INSERT when that was the definitive outcome of
an UPSERT, or alternatively a REORDER_BUFFER_CHANGE_UPDATE when that
was the definitive outcome. No need for output plugins to consider
REORDER_BUFFER_CHANGE_INTERNAL_SUPERDELETE at all.

Thoughts? Can't say that I've given conflict resolution for
multi-master systems a great deal of thought before now, so I might be
quite off the mark here.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] anyarray

2015-02-18 Thread Tomas Vondra
On 19.2.2015 03:14, Tomas Vondra wrote:

 I've noticed two unrelated files

Meh, that should have been "I noticed the patch removes two unrelated
files" ...

 
 ../src/test/modules/dummy_seclabel/expected/dummy_seclabel.out
 ../src/test/modules/dummy_seclabel/sql/dummy_seclabel.sql
 
 I suppose that's not intentional, right?
 



-- 
Tomas Vondra   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-02-18 Thread David Fetter
On Tue, Feb 17, 2015 at 08:21:32PM -0500, Peter Eisentraut wrote:
 On 1/20/15 6:32 PM, David G Johnston wrote:
  In fact, as far as the database knows, the values provided to this
  function do represent an entire population and such a correction
  would be unnecessary.  I guess it boils down to whether future
  queries are considered part of the population or whether the
  population changes upon each query being run and thus we are
  calculating the ever-changing population variance.
 
 I think we should be calculating the population variance.

Why population variance and not sample variance?  In distributions
where the second moment about the mean exists, it's an unbiased
estimator of the variance.  In this, it's different from the
population variance.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]

2015-02-18 Thread Tom Lane
Michael Paquier michael.paqu...@gmail.com writes:
 On Thu, Feb 19, 2015 at 7:29 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Moreover, if we have any code that is assuming such cases are okay, it
 probably needs a second look.  Isn't this situation effectively assuming
 that a variable-length array is fixed-length?

 AFAIK, switching a bunch of things to use FLEXIBLE_ARRAY_MEMBER has
 brought to light a couple of things that could be revisited:
 1) tuptoaster.c, with this declaration of varlena:
 struct
 {
     struct varlena hdr;
     char        data[TOAST_MAX_CHUNK_SIZE];    /* make struct big enough */
     int32       align_it;       /* ensure struct is aligned well enough */
 }           chunk_data;

I'm pretty sure that thing ought to be a union, not a struct.

 2) inv_api.c with this thing:
 struct
 {
     bytea       hdr;
     char        data[LOBLKSIZE];    /* make struct big enough */
     int32       align_it;       /* ensure struct is aligned well enough */
 }           workbuf;

And probably this too.

 3) heapam.c in three places with HeapTupleHeaderData:
 struct
 {
     HeapTupleHeaderData hdr;
     char        data[MaxHeapTupleSize];
 }           tbuf;

And this, though I'm not sure if we'd have to change the size of the
padding data[] member.

 5) reorderbuffer.h with its use of HeapTupleHeaderData:

Hmm.  Andres will have to answer for that one ;-)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Odd behavior of updatable security barrier views on foreign tables

2015-02-18 Thread Etsuro Fujita

On 2015/02/18 21:44, Stephen Frost wrote:

* Etsuro Fujita (fujita.ets...@lab.ntt.co.jp) wrote:

On 2015/02/18 7:44, Stephen Frost wrote:

Attached is a patch which should address this.  Would love your (or
anyone else's) feedback on it.  It appears to address the issue which
you raised and the regression test changes are all in-line with
inserting a LockRows into the subquery, as anticipated.


I've looked into the patch.

* The patch applies to the latest head, 'make' passes successfully,
but 'make check' fails in the rowsecurity test.


Apologies for not being clear- the patch was against 9.4, where it
passes all the regression tests (at least for me- if you see
differently, please let me know!).


Sorry, I assumed that the patch was against HEAD.  I confirmed that the
patch against 9.4 passes all the regression tests!


Best regards,
Etsuro Fujita


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-18 Thread Kevin Grittner
Stephen Frost sfr...@snowman.net wrote:

 I also agree with the general idea that it makes sense to provide a way
 to control bloat, but I think you've missed what Andres was getting at
 with his suggestion (if I understand correctly, apologies if I don't).

 The problem is that we're only looking at the overall xmin / xmax
 horizon when it comes to deciding if a given tuple is dead.  That's
 not quite right- the tuple might actually be dead to all *current*
 transactions by being newer than the oldest transaction but dead for all
 later transactions.  Basically, there exist gaps between our cluster
 wide xmin / xmax where we might find actually dead rows.  Marking those
 rows dead and reusable *would* stop the bloat, not just slow it down.

 In the end, with a single long-running transaction, the worst bloat you
 would have is double the size of the system at the time the long-running
 transaction started.

I agree that limiting bloat to one dead tuple for every live one
for each old snapshot is a limit that has value, and it was unfair
of me to characterize that as not being a limit.  Sorry for that.

This possible solution was discussed with the user whose feedback
caused me to write this patch, but there were several reasons they
dismissed that as a viable total solution for them, two of which I
can share:

(1)  They have a pool of connections each of which can have several
long-running cursors, so the limit from that isn't just doubling
the size of their database, it is limiting it to some two-or-three
digit multiple of the necessary size.

(2)  They are already prepared to deal with snapshot too old
errors on queries that run more than about 20 minutes and which
access tables which are modified.  They would rather do that than
suffer the bloat beyond that point.

IMO all of these changes people are working are very valuable, and
complement each other.  This particular approach is likely to be
especially appealing to those moving from Oracle because it is a
familiar mechanism, and one which some of them have written their
software to be aware of and deal with gracefully.

 I'm not against having a knob like this, which is defaulted to off,

Thanks!  I'm not sure that amounts to a +1, but at least it doesn't
sound like a -1.  :-)

 but I do think we'd be a lot better off with a system that could
 realize when rows are not visible to any currently running transaction
 and clean them up.

+1

But they are not mutually exclusive; I see them as complementary.

 If this knob's default is off then I don't think
 we'd be likely to get the complaints which are being discussed (or, if
 we did, we could point the individual at the admin who set the knob...).

That's how I see it, too.  I would not suggest making the default
anything other than off, but there are situations where it would
be a nice tool to have in the toolbox.

Thanks for the feedback!

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-18 Thread Greg Stark
On Sun, Feb 15, 2015 at 8:27 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 There might be something in that, but again it's not much like this patch.
 The key point I think we're both making is that nondeterministic failures
 are bad, especially when you're talking about long-running, expensive-to-
 retry transactions.


But I think Kevin would agree with you there. That's why he's interested in
having the errors not occur if you don't read from the volatile tables. Ie,
your reporting query reading from read-only tables would be guaranteed to
succeed even if some other table had had some rows vacuumed away.

I'm not sure that's really going to work because of things like hint bit
updates or hot pruning. That is, say your table is read-only now but last
week some of the records were updated. You might reasonably expect those
rows to be safe and indeed the rows themselves will still be in your
report. But the earlier versions of the rows could still be sitting on some
pages invisible to every query but not vacuumable quite yet and then
suddenly they get vacuumed away and the LSN on the page gets bumped.

Fwiw, I would strongly suggest specifying a time difference instead of a
number of transactions. I haven't been following the
commit timestamp thread but I'm hoping that patch has the infrastructure
needed to look up an lsn and find out the timestamp and vice versa. So when
you take a snapshot you could look up the effective cut-off LSN based on
the time difference specified in the GUC. As a DBA I would find it
infinitely more comforting specifying old_snapshot_threshold=4h than
specifying some arbitrary large number of transactions and hoping I
calculated the transaction rate reasonably accurately.
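
Something along these lines, say (the GUC shown is just the hypothetical
name I used above; nothing with that name exists yet):

    -- in postgresql.conf:  old_snapshot_threshold = '4h'
    -- or, equivalently, once such a GUC existed:
    ALTER SYSTEM SET old_snapshot_threshold = '4h';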


-- 
greg


Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-18 Thread Stephen Frost
* Kevin Grittner (kgri...@ymail.com) wrote:
 Stephen Frost sfr...@snowman.net wrote:
  I also agree with the general idea that it makes sense to provide a way
  to control bloat, but I think you've missed what Andres was getting at
  with his suggestion (if I understand correctly, apologies if I don't).
 
  The problem is that we're only looking at the overall xmin / xmax
  horizon when it comes to deciding if a given tuple is dead.  That's
  not quite right- the tuple might actually be dead to all *current*
  transactions by being newer than the oldest transaction but dead for all
  later transactions.  Basically, there exist gaps between our cluster
  wide xmin / xmax where we might find actually dead rows.  Marking those
  rows dead and reusable *would* stop the bloat, not just slow it down.
 
  In the end, with a single long-running transaction, the worst bloat you
  would have is double the size of the system at the time the long-running
  transaction started.
 
 I agree that limiting bloat to one dead tuple for every live one
 for each old snapshot is a limit that has value, and it was unfair
 of me to characterize that as not being a limit.  Sorry for that.
 
 This possible solution was discussed with the user whose feedback
 caused me to write this patch, but there were several reasons they
 dismissed that as a viable total solution for them, two of which I
 can share:
 
 (1)  They have a pool of connections each of which can have several
 long-running cursors, so the limit from that isn't just doubling
 the size of their database, it is limiting it to some two-or-three
 digit multiple of the necessary size.

This strikes me as a bit off-the-cuff; was an analysis done which
determined that would be the result?  If there is overlap between the
long-running cursors then there would be less bloat, and most systems
which I'm familiar with don't turn the entire database over in 20
minutes, 20 hours, or even 20 days except in pretty specific cases.
Perhaps this is one of those, and if so then I'm all wet, but the
feeling I get is that this is a way to dismiss this solution because
it's not what's wanted, which is what Oracle did.

 (2)  They are already prepared to deal with snapshot too old
 errors on queries that run more than about 20 minutes and which
 access tables which are modified.  They would rather do that than
 suffer the bloat beyond that point.

That, really, is the crux here- they've already got support for dealing
with it the way Oracle did and they'd like PG to do that too.
Unfortunately, that, by itself, isn't a good reason for a particular
capability (we certainly aren't going to be trying to duplicate PL/SQL
in PG any time soon).  That said, there are capabilities in other
RDBMS's which are valuable and which we *do* want, so the fact that
Oracle does this also isn't a reason to not include it.

 IMO all of these changes people are working are very valuable, and
 complement each other.  This particular approach is likely to be
 especially appealing to those moving from Oracle because it is a
 familiar mechanism, and one which some of them have written their
 software to be aware of and deal with gracefully.

For my 2c, I'd much rather provide them with a system where they don't
have to deal with broken snapshots than give them a way to have them the
way Oracle provided them. :)  That said, even the approach Andres
outlined will cause bloat and it may be beyond what's acceptable in some
environments, and it's certainly more complicated and unlikely to get
done in the short term.

  I'm not against having a knob like this, which is defaulted to off,
 
 Thanks!  I'm not sure that amounts to a +1, but at least it doesn't
 sound like a -1.  :-)

So, at the time I wrote that, I wasn't sure if it was a +1 or not
myself.  I've been thinking about it since then, however, and I'm
leaning more towards having the capability than not, so perhaps that's a
+1, but it doesn't remove the need to come up with an implementation
that everyone can be happy with, and what you've come up with so far
doesn't have a lot of appeal, based on the feedback (I've only glanced
through it myself, but I agree with Andres and Tom that it's a larger
footprint than we'd want for this and the number of places having to be
touched is concerning as it could lead to future bugs).

A lot of that would go away if there was a way to avoid having to mess
with the index AMs, I'd think, but I wonder if we'd actually need more
done there- it's not immediately obvious to me how an index-only scan is
safe with this.  Whenever an actual page is visited, we can check the
LSN, and the relation can't be truncated by vacuum since the transaction
will still have a lock on the table which prevents it, but does the
visibility-map update check make sure to never mark pages all-visible
when one of these old transactions is running around?  On what basis?

  but I do think we'd be a lot better off with a system that could
  realize when rows