Re: [HACKERS] Rename max_parallel_degree?

2016-06-02 Thread Albe Laurenz
Robert Haas wrote:
> On Wed, Jun 1, 2016 at 5:29 PM, Tom Lane  wrote:
> > Robert Haas  writes:
> >> I've largely given up hope of coming up with an alternative that can
> >> attract more than one vote and that is also at least mildly accurate,
> >> but one idea is max_parallel_workers_per_gather_node.  That will be
> >> totally clear.
> >
> > Given the reference to Gather nodes, couldn't you drop the word
> > "parallel"?  "node" might not be necessary either.
> 
> Well, I think we could drop node, if you like.  I think parallel
> wouldn't be good to drop, though, because it sounds like we want a
> global limit on parallel workers also, and that can't be just
> max_workers.  So I think we should keep parallel in there for all of
> them, and have max_parallel_workers and
> max_parallel_workers_per_gather(_node).  The reloption and the Path
> struct field can be parallel_workers rather than parallel_degree.

I believe that it will be impossible to find a name that makes
the meaning clear to everybody.  Those who do not read the documentation
will always find a way to misunderstand it.

These suggestions have the added disadvantage that it is hard
to remember them.  I see myself going, "I have to change the maximum
for parallel workers, what was the name again?", and having to resort
to the manual (or SHOW ALL) each time.

I suggest to follow the precedent of "work_mem", stick with
something simple like "max_parallel_workers" and accept the risk
of not being totally self-explanatory.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #14155: bloom index error with unlogged table

2016-06-02 Thread Michael Paquier
On Fri, Jun 3, 2016 at 3:11 PM, Michael Paquier wrote:
> On Fri, Jun 3, 2016 at 1:03 PM, Tom Lane  wrote:
>> I wrote:
>>> Jeff Janes  writes:
>>>> My biggest gripe with it at the moment is that the signature size should be
>>>> expressed in bits, and then internally rounded up to a multiple of 16,
>>>> rather than having it be expressed in 'uint16'.
>>>> If that were done it would be easier to fix the documentation to be more
>>>> understandable.
>>
>>> +1 ... that sort of definition seems much more future-proof, too.
>>> IMO it's not too late to change this.  (We probably don't want to change
>>> the on-disk representation of the reloptions, but we could convert from
>>> bits to words in bloptions().)
>>
>> There were no objections to this, but also no action.  Attached is a draft
>> patch ... any complaints?
>
> None. This looks rather sane to me.

Actually, the docs could be more polished. "Limitation" should be
changed to "Limitations", and "Opclass interface" to "Operator Class
Interface". The current wording "Seqscan is slow" is not clear; it
should mention a sequential scan instead. And it is not that slow,
just slower than a bloom index scan.
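
The bits-to-words conversion discussed upthread is simple enough to sketch.
Assuming 16-bit signature words as in the draft patch, rounding a
user-supplied bit count up to whole words looks like this (an illustrative
sketch, not the committed code):

```python
SIGNWORDBITS = 16  # bits per uint16 signature word

def bloom_length_words(length_in_bits: int) -> int:
    # Round the user-specified signature length in bits up to a whole
    # number of 16-bit words, as the reloption conversion would do.
    return (length_in_bits + SIGNWORDBITS - 1) // SIGNWORDBITS

# 80 bits fit exactly in 5 words; 81 bits need a 6th word.
```
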
-- 
Michael






Re: [HACKERS] Perf Benchmarking and regression.

2016-06-02 Thread Andres Freund
On 2016-06-03 01:57:33 -0400, Noah Misch wrote:
> > Which means that transactional workloads that are bigger than the OS
> > memory, or which have a non-uniform distribution leading to some
> > locality, are likely to be faster. In practice those are *hugely* more
> > likely than the uniform distribution that pgbench has.
> 
> That is formally true; non-benchmark workloads rarely issue uniform writes.
> However, enough non-benchmark workloads have too little locality to benefit
> from caches.  Those will struggle against *_flush_after like uniform writes
> do, so discounting uniform writes wouldn't simplify this project.

But such workloads will rarely hit the point of constantly re-dirtying
already-dirty pages in kernel memory within 30s.


> Today's defaults for *_flush_after greatly smooth and accelerate performance
> for one class of plausible workloads while greatly slowing a different class
> of plausible workloads.

I don't think checkpoint_flush_after is in that class, due to the
fsync()s we already emit at the end of checkpoints.

Greetings,

Andres Freund




Re: [HACKERS] Perf Benchmarking and regression.

2016-06-02 Thread Noah Misch
On Wed, Jun 01, 2016 at 03:33:18PM -0700, Andres Freund wrote:
> On 2016-05-31 16:03:46 -0400, Robert Haas wrote:
> > On Fri, May 27, 2016 at 12:37 AM, Andres Freund  wrote:
> > > I don't think the situation is quite that simple. By *disabling* backend 
> > > flushing it's also easy to see massive performance regressions.  In 
> > > situations where shared buffers was configured appropriately for the 
> > > workload (not the case here IIRC).
> > 
> > On what kind of workload does setting backend_flush_after=0 represent
> > a large regression vs. the default settings?
> > 
> > I think we have to consider that pgbench and parallel copy are pretty
> > common things to want to do, and a non-zero default setting hurts
> > those workloads a LOT.
> 
> I don't think pgbench's workload has much to do with reality. Even less
> so in the setup presented here.
> 
> The slowdown comes from the fact that default pgbench randomly, but
> uniformly, updates a large table. Which is slower with
> backend_flush_after if the workload is considerably bigger than
> shared_buffers, but, and that's a very important restriction, the
> workload at the same time largely fits in to less than
> /proc/sys/vm/dirty_ratio / 20% (probably even 10% /
> /proc/sys/vm/dirty_background_ratio) of the free os memory.

Looking at some of the top hits for 'postgresql shared_buffers':

https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
https://www.postgresql.org/docs/current/static/runtime-config-resource.html
http://rhaas.blogspot.com/2012/03/tuning-sharedbuffers-and-walbuffers.html
https://www.keithf4.com/a-large-database-does-not-mean-large-shared_buffers/
http://www.cybertec.at/2014/02/postgresql-9-3-shared-buffers-performance-1/

Choices mentioned (some in comments on a main post):

1. .25 * RAM
2. min(8GB, .25 * RAM)
3. Sizing procedure that arrived at 4GB for 900GB of data
4. Equal to data size

Thus, it is not outlandish to have the write portion of a working set exceed
shared_buffers while remaining under 10-20% of system RAM.  Choice (4) won't
achieve that, but (2) and (3) may achieve it given a mere 64 GiB of RAM.
Choice (1) can go either way; if read-mostly data occupies half of
shared_buffers, then writes passing through the other 12.5% of system RAM may
exhibit the property you describe.
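
The sizing window described above can be sketched as a quick check
(the helper name is mine, and "free OS memory" is approximated by total
RAM here, which overstates the dirty limit somewhat):

```python
def flush_after_sensitive(ram_gb, shared_buffers_gb, write_set_gb,
                          dirty_ratio=0.20):
    # Heuristic from the discussion: uniform-ish writes tend to regress
    # under *_flush_after when the write working set exceeds
    # shared_buffers yet still fits under the kernel dirty limit
    # (dirty_ratio times OS memory).
    return shared_buffers_gb < write_set_gb < dirty_ratio * ram_gb

# Choice (2), min(8GB, .25 * RAM), on a 64 GiB box gives shared_buffers = 8GB.
# A 10 GB write set then falls inside the regression window; a 20 GB one
# exceeds the ~12.8 GB dirty limit and a 4 GB one fits in shared_buffers.
```
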

Incidentally, a typical reason for a site to use low shared_buffers is to
avoid the latency spikes that *_flush_after combat:
https://www.postgresql.org/message-id/flat/4DDE270502250003DD4F%40gw.wicourts.gov

> > I have a really hard time believing that the benefits on other
> > workloads are large enough to compensate for the slowdowns we're
> > seeing here.
> 
> As a random example, without looking for good parameters, on my laptop:
> pgbench -i -q -s 1000
> 
> Cpu: i7-6820HQ
> Ram: 24GB of memory
> Storage: Samsung SSD 850 PRO 1TB, encrypted
> postgres -c shared_buffers=6GB -c backend_flush_after=128 -c 
> max_wal_size=100GB -c fsync=on -c synchronous_commit=off
> pgbench -M prepared -c 16 -j 16 -T 520 -P 1 -n -N
> (note the -N)
> disabled:
> latency average = 2.774 ms
> latency stddev = 10.388 ms
> tps = 5761.883323 (including connections establishing)
> tps = 5762.027278 (excluding connections establishing)
> 
> 128:
> latency average = 2.543 ms
> latency stddev = 3.554 ms
> tps = 6284.069846 (including connections establishing)
> tps = 6284.184570 (excluding connections establishing)
> 
> Note the latency dev which is 3x better. And the improved throughput.

That is an improvement.  The workload is no less realistic than the ones
having shown regressions.

> Which means that transactional workloads that are bigger than the OS
> memory, or which have a non-uniform distribution leading to some
> locality, are likely to be faster. In practice those are *hugely* more
> likely than the uniform distribution that pgbench has.

That is formally true; non-benchmark workloads rarely issue uniform writes.
However, enough non-benchmark workloads have too little locality to benefit
from caches.  Those will struggle against *_flush_after like uniform writes
do, so discounting uniform writes wouldn't simplify this project.


Today's defaults for *_flush_after greatly smooth and accelerate performance
for one class of plausible workloads while greatly slowing a different class
of plausible workloads.

nm




Re: [HACKERS] Let file_fdw access COPY FROM PROGRAM

2016-06-02 Thread Craig Ringer
On 3 June 2016 at 04:48, Corey Huinker  wrote:

> A while back, there was a push to make COPY gzip-aware. That didn't
> happen, but COPY FROM PROGRAM did, and it scratches the same itch.
>


> - writing unwanted columns to a temp/work table via COPY, and then
> immediately re-reading them
>

Without wanting to take away from the value of letting file_fdw access FROM
PROGRAM, I think this really merits a solution that applies to COPY as
well. Variants on "how do I COPY just some columns from a CSV" are a real
FAQ, and it doesn't seem like it'd be excessively hard to support. We'd
just need some way to pass a list of column ordinals or header names.
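
The column filtering being asked for amounts to something like the
following sketch (hypothetical Python, just to pin down the semantics; in
practice today one reaches for cut or awk via COPY ... FROM PROGRAM to get
the same effect):

```python
import csv
import io

def select_columns(csv_text: str, wanted: list[int]) -> str:
    # Keep only the columns at the given ordinals, preserving CSV
    # quoting rules -- the filtering COPY would do natively if it
    # accepted a list of column ordinals or header names.
    out = io.StringIO()
    writer = csv.writer(out)
    for row in csv.reader(io.StringIO(csv_text)):
        writer.writerow([row[i] for i in wanted])
    return out.getvalue()
```
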

Not asking you to do that work, just pointing out that this particular
issue applies to COPY itself as well.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Rename max_parallel_degree?

2016-06-02 Thread Peter Eisentraut

On 6/3/16 12:21 AM, Petr Jelinek wrote:

On 01/06/16 17:55, David G. Johnston wrote:

On Wed, Jun 1, 2016 at 11:45 AM, Petr Jelinek <p...@2ndquadrant.com> wrote:

That GUC also controls worker processes that are started by
extensions, not just ones that parallel query starts. This is btw
one thing I don't like at all about how the current limits work, the
parallel query will fight for workers with extensions because they
share the same limit.


Given that this models reality, the GUC is doing its job.  Now, maybe we
need additional knobs to give the end-user the ability to influence how
those fights will turn out.


Agreed, my point is that I think we do need an additional knob.


We need one knob to control how many process slots to create at server 
start, and then a bunch of sliders to control how to allocate those 
between regular connections, superuser connections, replication, 
autovacuum, parallel workers, background workers (by tag/label/group), 
and so on.
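
Peter's "one pool plus sliders" idea could be sketched as a proportional
split of a fixed slot pool (class names and the weighting scheme below are
hypothetical, just to make the shape concrete):

```python
def allocate_slots(total_slots: int, weights: dict[str, int]) -> dict[str, int]:
    # Divide one pool of process slots, sized at server start, among
    # consumer classes in proportion to per-class weights.
    total_weight = sum(weights.values())
    alloc = {name: total_slots * w // total_weight
             for name, w in weights.items()}
    # Hand any rounding leftovers to regular connections.
    alloc["connections"] += total_slots - sum(alloc.values())
    return alloc

slots = allocate_slots(100, {"connections": 5, "autovacuum": 1,
                             "parallel_workers": 2, "replication": 2})
```
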


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Statement timeout

2016-06-02 Thread Craig Ringer
On 3 June 2016 at 09:45, Tatsuo Ishii  wrote:

> > Well, multiple parse/bind/execute messages before a sync are definitely
> > used by PgJDBC and nPgSQL for batching,
>
> Yes, I realized that in JDBC.
>
> > and I just posted a patch for it
> > for libpq.
>
> I didn't notice it. Could you give me the message id or URL?
>
>
https://commitfest.postgresql.org/10/634/

https://www.postgresql.org/message-id/flat/CAMsr+YFUjJytRyV4J-16bEoiZyH=4nj+sQ7JP9ajwz=b4dm...@mail.gmail.com#CAMsr+YFUjJytRyV4J-16bEoiZyH=4nj+sQ7JP9ajwz=b4dm...@mail.gmail.com



> > I am very surprised to find out that statement_timeout tracks the total
> > time and isn't reset by a new statement, but I guess it makes sense -
> what,
> > exactly, delimits a "query" in extended query mode? statement_timeout in
> > simple-query mode starts at parse time and runs until the end of execute.
> > In e.q.p. there might be only one parse, then a series of Bind and
> Execute
> > messages, or there may be repeated Parse messages.
>
> Another issue is inconsistency with log_duration, which shows the
> elapsed time for each execute message. I think statement timeout
> should be consistent with statement duration. Otherwise users will be
> confused.
>

While I agree that's confusing, I think that's actually a problem with
log_duration.

log_duration is really more of an internal trace parameter that should be
named debug_log_duration or something, IMO. It also fails to print the
message type, which makes it even more confusing, since for a typical
extended-protocol query it usually logs three durations with no indication
of which is which.

Users should be using log_min_duration_statement. You know, the confusingly
named one. Or is it log_duration_min_statement or
log_statement_min_duration or ...?

Yeah, log_duration is confusing to the point I think it needs a comment
like "To record query run-time you probably want
log_min_duration_statement, not log_duration".

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [BUGS] BUG #14155: bloom index error with unlogged table

2016-06-02 Thread Tom Lane
I wrote:
> Jeff Janes  writes:
>> My biggest gripe with it at the moment is that the signature size should be
>> expressed in bits, and then internally rounded up to a multiple of 16,
>> rather than having it be expressed in 'uint16'.
>> If that were done it would be easier to fix the documentation to be more
>> understandable.

> +1 ... that sort of definition seems much more future-proof, too.
> IMO it's not too late to change this.  (We probably don't want to change
> the on-disk representation of the reloptions, but we could convert from
> bits to words in bloptions().)

There were no objections to this, but also no action.  Attached is a draft
patch ... any complaints?

regards, tom lane

diff --git a/contrib/bloom/bloom.h b/contrib/bloom/bloom.h
index c21eebf..a9daf63 100644
*** a/contrib/bloom/bloom.h
--- b/contrib/bloom/bloom.h
*** typedef BloomPageOpaqueData *BloomPageOp
*** 79,96 
  #define BLOOM_HEAD_BLKNO		(1)		/* first data page */
  
  /*
!  * Maximum of bloom signature length in uint16. Actual value
!  * is 512 bytes
   */
! #define MAX_BLOOM_LENGTH		(256)
  
  /* Bloom index options */
  typedef struct BloomOptions
  {
  	int32		vl_len_;		/* varlena header (do not touch directly!) */
! 	int			bloomLength;	/* length of signature in uint16 */
! 	int			bitSize[INDEX_MAX_KEYS];		/* signature bits per index
!  * key */
  }	BloomOptions;
  
  /*
--- 79,108 
  #define BLOOM_HEAD_BLKNO		(1)		/* first data page */
  
  /*
!  * We store Bloom signatures as arrays of uint16 words.
   */
! typedef uint16 BloomSignatureWord;
! 
! #define SIGNWORDBITS ((int) (BITS_PER_BYTE * sizeof(BloomSignatureWord)))
! 
! /*
!  * Default and maximum Bloom signature length in bits.
!  */
! #define DEFAULT_BLOOM_LENGTH	(5 * SIGNWORDBITS)
! #define MAX_BLOOM_LENGTH		(256 * SIGNWORDBITS)
! 
! /*
!  * Default and maximum signature bits generated per index key.
!  */
! #define DEFAULT_BLOOM_BITS		2
! #define MAX_BLOOM_BITS			(MAX_BLOOM_LENGTH - 1)
  
  /* Bloom index options */
  typedef struct BloomOptions
  {
  	int32		vl_len_;		/* varlena header (do not touch directly!) */
! 	int			bloomLength;	/* length of signature in words (not bits!) */
! 	int			bitSize[INDEX_MAX_KEYS];	/* signature bits per index key */
  }	BloomOptions;
  
  /*
*** typedef struct BloomState
*** 143,154 
  /*
   * Tuples are very different from all other relations
   */
- typedef uint16 SignType;
- 
  typedef struct BloomTuple
  {
  	ItemPointerData heapPtr;
! 	SignType	sign[FLEXIBLE_ARRAY_MEMBER];
  }	BloomTuple;
  
  #define BLOOMTUPLEHDRSZ offsetof(BloomTuple, sign)
--- 155,164 
  /*
   * Tuples are very different from all other relations
   */
  typedef struct BloomTuple
  {
  	ItemPointerData heapPtr;
! 	BloomSignatureWord	sign[FLEXIBLE_ARRAY_MEMBER];
  }	BloomTuple;
  
  #define BLOOMTUPLEHDRSZ offsetof(BloomTuple, sign)
*** typedef struct BloomTuple
*** 156,162 
  /* Opaque data structure for bloom index scan */
  typedef struct BloomScanOpaqueData
  {
! 	SignType   *sign;			/* Scan signature */
  	BloomState	state;
  }	BloomScanOpaqueData;
  
--- 166,172 
  /* Opaque data structure for bloom index scan */
  typedef struct BloomScanOpaqueData
  {
! 	BloomSignatureWord   *sign;			/* Scan signature */
  	BloomState	state;
  }	BloomScanOpaqueData;
  
*** extern void BloomFillMetapage(Relation i
*** 170,176 
  extern void BloomInitMetapage(Relation index);
  extern void BloomInitPage(Page page, uint16 flags);
  extern Buffer BloomNewBuffer(Relation index);
! extern void signValue(BloomState * state, SignType * sign, Datum value, int attno);
  extern BloomTuple *BloomFormTuple(BloomState * state, ItemPointer iptr, Datum *values, bool *isnull);
  extern bool BloomPageAddItem(BloomState * state, Page page, BloomTuple * tuple);
  
--- 180,186 
  extern void BloomInitMetapage(Relation index);
  extern void BloomInitPage(Page page, uint16 flags);
  extern Buffer BloomNewBuffer(Relation index);
! extern void signValue(BloomState * state, BloomSignatureWord * sign, Datum value, int attno);
  extern BloomTuple *BloomFormTuple(BloomState * state, ItemPointer iptr, Datum *values, bool *isnull);
  extern bool BloomPageAddItem(BloomState * state, Page page, BloomTuple * tuple);
  
diff --git a/contrib/bloom/blscan.c b/contrib/bloom/blscan.c
index aebf32a..0c954dc 100644
*** a/contrib/bloom/blscan.c
--- b/contrib/bloom/blscan.c
*** blgetbitmap(IndexScanDesc scan, TIDBitma
*** 93,99 
  		/* New search: have to calculate search signature */
  		ScanKey		skey = scan->keyData;
  
! 		so->sign = palloc0(sizeof(SignType) * so->state.opts.bloomLength);
  
  		for (i = 0; i < scan->numberOfKeys; i++)
  		{
--- 93,99 
  		/* New search: have to calculate search signature */
  		ScanKey		skey = scan->keyData;
  
! 		so->sign = palloc0(sizeof(BloomSignatureWord) * so->state.opts.bloomLength);
  
  		for (i = 0; i < scan->numberOfKeys; i+

Re: [HACKERS] Prepared statements and generic plans

2016-06-02 Thread David G. Johnston
On Thu, Jun 2, 2016 at 9:56 PM, Bruce Momjian  wrote:

> In Postgres 9.2 we improved the logic of when generic plans are used by
> EXECUTE.  We weren't sure how well it would work, and the docs included
> a vague description on when generic plans are chosen.
>
> I have gotten a few questions lately about how prepared statements are
> handled with multiple executions so I have updated the PREPARE manual
> page with the attached patch to more clearly explain generic plans and
> when they are chosen.
>
> I would like to apply this to the 9.6 docs.
>

While putting the proposed patch in context I came across this.

​"""
​
Prepared statements have the largest performance advantage when a single
session is being used to execute a large number of similar statements. The
performance difference will be particularly significant if the statements
are complex to plan or rewrite, for example, if the query involves a join
of many tables or requires the application of several rules. If the
statement is relatively simple to plan and rewrite but relatively expensive
to execute, the performance advantage of prepared statements will be less
noticeable.
​"""

Until and unless the generic plan is used, the "...if statements are
complex to plan..." bit doesn't make sense; and nowhere in the description
itself do we introduce the generic plan concept.  This is inconsistent, but
I'm not addressing it below, though it's worth considering before a patch to
this area is committed.

As to the patch...

Isn't this backwards?  [change]

"""
otherwise it  occurs only after five or more
executions produce [execution /strike plans] plus planning costs that are,
on average, [roughly equal to /strike cheaper] than the generic plan.
"""

I'd maybe go with something like this:

All executions of a prepared statement having zero parameters will use the
same plan so the planning time taken during the first execution will be
spread across all subsequent executions.  For statements having parameters
the first five executions will result in value-specific plans as previously
described.  However, on the sixth execution a generic plan will also be
computed and if the average planning + execution cost of all previous
value-specific plans is about equal to the execution cost of the generic
plan the generic plan will be chosen for that and all subsequent executions.
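
The selection rule being described (five value-specific executions, then a
comparison against the generic plan) can be sketched as follows. This is a
deliberate simplification with hypothetical names; the real logic lives in
the server's plan cache and its exact fudge factors differ:

```python
def choose_generic(custom_costs: list[float],
                   custom_plan_cost: float,
                   generic_cost: float) -> bool:
    # After five custom (value-specific) plans, stick with the generic
    # plan if it is not more expensive than the average custom plan
    # plus the cost of re-planning on every execution.
    if len(custom_costs) < 5:
        return False  # still building up value-specific history
    avg_custom = sum(custom_costs) / len(custom_costs)
    return generic_cost <= avg_custom + custom_plan_cost
```

Note the direction of the comparison: the generic plan wins by avoiding
repeated planning, not by being cheaper to execute than the custom plans.
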



If we are getting generic plans significantly cheaper than the
value-specific plans I suspect there is a problem...so any comparison that
indicates "less-than" is prone to cause confusion.  The original is worded
well on this point: "...generic plan appears to be not much more
expensive..." but lacks detail elsewhere.

This part:

!A generic plan assumes each value supplied to
EXECUTE
!is one of the column's distinct values and that column values are
!uniformly distributed.  For example, if statistics records three
!distinct column values, a generic plan assumes a column equality
!comparison will match 33% of processed rows.  Column statistics
!also allows generic plans to accurately compute the selectivity of
!unique columns.  Comparisons on non-uniformly-distributed columns and
!specification of non-existent values affects the average plan cost,
!and hence if and when a generic plan is chosen.  [elided the last
sentence, placed in the first paragraph]

I'm not sure of the specific goal here but this level detail seems a bit
out-of-place in the SQL Command documentation.  So, do we want this
user-facing and if so do we want it here?

So while looking to find a home for this detail I came across the fact that
Chapter 66 doesn't have a table of contents... quite possibly because it is
presently a single-section chapter...

https://www.postgresql.org/docs/devel/static/planner-stats-details.html

It seems like this content would find a nice home there as well (though we
should fix any TOC bug before adding the new section...)

IMO there are now three possibilities for the existing paragraph:

Status-quo
First portion of Bruce's patch
Mine, above

I think the second part of Bruce's should be, possibly with additional
detail and maybe some concrete examples, added to Chapter 66 instead.  A
pointer to that section in the SQL Command docs could be made - though I'm
not sure how much pointing into internals we do from the user-facing areas
of the docs.  In any case the material would be available to those who wish
to see it.

The pertinent points, 5 value-specific executions and comparing the average
of all previous executions to the generic plan generated on the 6th
execution, both do seem worth upsetting the status-quo for.  Beyond that it
is style - except I think Bruce got the comparison direction and magnitude
wrong as noted above.

This leaves Bruce's second alteration: which probably should follow the
rest over to chapter 66.  The point of the existing sentence is to give the
casual user the means to detect the current type of plan and I think that
is all th

Re: [HACKERS] Prepared statements and generic plans

2016-06-02 Thread Bruce Momjian
On Thu, Jun  2, 2016 at 09:56:48PM -0400, Bruce Momjian wrote:
> In Postgres 9.2 we improved the logic of when generic plans are used by
> EXECUTE.  We weren't sure how well it would work, and the docs included
> a vague description on when generic plans are chosen.
> 
> I have gotten a few questions lately about how prepared statements are
> handled with multiple executions so I have updated the PREPARE manual
> page with the attached patch to more clearly explain generic plans and
> when they are chosen.
> 
> I would like to apply this to the 9.6 docs.

FYI, I used this set of queries for testing:

DROP TABLE IF EXISTS test;
CREATE TABLE test (x INT, y INT);
INSERT INTO test SELECT 0, y FROM generate_series(1, 10000) AS z(y);
INSERT INTO test SELECT 1, y FROM generate_series(10001, 20000) AS z(y);
-- INSERT INTO test SELECT 2, 20001;
-- INSERT INTO test SELECT 3, 20002;
CREATE INDEX i_test_x ON test(x);
CREATE INDEX i_test_y ON test(y);
ANALYZE test;
PREPARE prep_x AS SELECT * FROM test WHERE x = $1;
PREPARE prep_y AS SELECT * FROM test WHERE y = $1;

Doing execute 5+ times on the two prepared statements with the constants
'0' and '2' show the documented behavior, e.g.:

EXPLAIN EXECUTE prep_x(2);
EXPLAIN EXECUTE prep_y(2);

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +




[HACKERS] Prepared statements and generic plans

2016-06-02 Thread Bruce Momjian
In Postgres 9.2 we improved the logic of when generic plans are used by
EXECUTE.  We weren't sure how well it would work, and the docs included
a vague description on when generic plans are chosen.

I have gotten a few questions lately about how prepared statements are
handled with multiple executions so I have updated the PREPARE manual
page with the attached patch to more clearly explain generic plans and
when they are chosen.

I would like to apply this to the 9.6 docs.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
diff --git a/doc/src/sgml/ref/prepare.sgml b/doc/src/sgml/ref/prepare.sgml
new file mode 100644
index dbce8f2..fce6e96
*** a/doc/src/sgml/ref/prepare.sgml
--- b/doc/src/sgml/ref/prepare.sgml
*** PREPARE n
*** 127,140 
Notes
  

!If a prepared statement is executed enough times, the server may eventually
!decide to save and re-use a generic plan rather than re-planning each time.
!This will occur immediately if the prepared statement has no parameters;
!otherwise it occurs only if the generic plan appears to be not much more
!expensive than a plan that depends on specific parameter values.
!Typically, a generic plan will be selected only if the query's performance
!is estimated to be fairly insensitive to the specific parameter values
!supplied.

  

--- 127,147 
Notes
  

!Prepared statements can optionally use generic plans rather than
!re-planning with each set of supplied EXECUTE values.
!This occurs immediately for prepared statements with no parameters;
!otherwise it occurs only after five or more executions produce plans
!plus planning costs that are, on average, cheaper than the generic plan.
!A generic plan assumes each value supplied to EXECUTE
!is one of the column's distinct values and that column values are
!uniformly distributed.  For example, if statistics records three
!distinct column values, a generic plan assumes a column equality
!comparison will match 33% of processed rows.  Column statistics
!also allows generic plans to accurately compute the selectivity of
!unique columns.  Comparisons on non-uniformly-distributed columns and
!specification of non-existent values affects the average plan cost,
!and hence if and when a generic plan is chosen.  Once a generic plan is
!chosen, it is used for the remaining lifetime of the prepared statement.

  

*** PREPARE n
*** 142,148 
 for a prepared statement, use .
 If a generic plan is in use, it will contain parameter symbols
 $n, while a custom plan will have the
!current actual parameter values substituted into it.

  

--- 149,157 
 for a prepared statement, use .
 If a generic plan is in use, it will contain parameter symbols
 $n, while a custom plan will have the
!supplied parameter values substituted into it.
!The row estimates in the generic plan reflect the selectivity
!computed for the parameters.

  




Re: [HACKERS] Statement timeout

2016-06-02 Thread Tatsuo Ishii
> Well, multiple parse/bind/execute messages before a sync are definitely
> used by PgJDBC and nPgSQL for batching,

Yes, I realized that in JDBC.

> and I just posted a patch for it
> for libpq.

I didn't notice it. Could you give me the message id or URL?

> I wouldn't have considered it to simulate multi-statement
> simple-protocol queries, but I guess there are some parallels.
> 
> I am very surprised to find out that statement_timeout tracks the total
> time and isn't reset by a new statement, but I guess it makes sense - what,
> exactly, delimits a "query" in extended query mode? statement_timeout in
> simple-query mode starts at parse time and runs until the end of execute.
> In e.q.p. there might be only one parse, then a series of Bind and Execute
> messages, or there may be repeated Parse messages.

Another issue is inconsistency with log_duration, which shows the
elapsed time for each execute message. I think statement timeout
should be consistent with statement duration. Otherwise users will be
confused.

> Personally I'd be fairly happy with statement-timeout applying per-message
> in the extended query protocol. That would mean that it behaves slightly
> differently, and a query with a long slow parse and bind phase followed by
> quick execution might fail to time out in the extended query protocol where
> it would time out as a simple query.
> It'd behave as if the query was
> PREPAREd then separately EXECUTEd in simple-query protocol. I'm not hugely
> bothered by that, but if it's really a concern I'd ideally like to add a
> new protocol message that resets the statement timeout counter, so the
> client can define what delimits a statement. Not practical in the near term.
> 
> For now I'd be OK with documenting this as a quirk/limitation of
> statement_timeout, that it applies to a whole extended-query-protocol
> batch.

Probably we should apply the code change in 10.0, if at all. (Of course
this will not be applied if many users are angry with the current
behavior of statement timeout.)

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] Rename max_parallel_degree?

2016-06-02 Thread Josh berkus
On 06/02/2016 01:42 PM, David G. Johnston wrote:
> Are you referring to right now or if we move the goal posts to making
> this a per-statement reservation?
> 
> I was assuming that we would have *both* per-operation and per-statement
> limits.  I can see reasons for having both, I can see why power users
> would want both, but it's going to be overwhelming to casual users.
> 
> 
> Got that.  The only problem on that front with the current setup is
> that right now we are saying: "at most use 3 of the 8 available
> processes": i.e., we tie ourselves to a fixed number when I think a
> better algorithm would be: "on/off/auto - default auto" and we detect at
> runtime whatever values we feel are most appropriate based upon the
> machine we are running on.  If the user doesn't like our choices they
> can specify their own values.  But the only specified values in the
> configurations would be those placed there automatically by the user. 
> If value isn't specified but is required it gets set at startup and can
> be seen in pg_settings.
> 

Well, the hard part here is that the appropriate value is based on the
level of concurrency in the database as a whole.  For example, if it's a
website database with 200 active connections on a 32-core machine
already, you want zero parallelism.  Whereas if it's the only current
statement on a 16-core VM, you want like 8.

That sounds like a heuristic, except that the number of concurrent
statements can change in milliseconds.  So we'd really want to base this
on some sort of moving average, set conservatively.  This will be some
interesting code, and will probably need to be revised several times
before we get it right.  Particularly since this would involve scanning
some of the global catalogs we've been trying to move activity off of.

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] kqueue

2016-06-02 Thread Thomas Munro
On Fri, Jun 3, 2016 at 4:02 AM, Alvaro Herrera  wrote:
> Tom Lane wrote:
>> Andres Freund  writes:
>> >> pg_strtoi?
>>
>> > I think that's what Thomas did upthread. Are you taking this one then?
>>
>> I'd go with just "strtoint".  We have "strtoint64" elsewhere.
>
> For closure of this subthread: this rename was committed by Tom as
> 0ab3595e5bb5.

Thanks.  And here is a new version of the kqueue patch.  The previous
version doesn't apply on top of recent commit
a3b30763cc8686f5b4cd121ef0bf510c1533ac22, which sprinkled some
MAXALIGN macros nearby.  I've now done the same thing with the kevent
struct because it's cheap, uniform with the other cases and could
matter on some platforms for the same reason.

It's in the September commitfest here: https://commitfest.postgresql.org/10/597/

-- 
Thomas Munro
http://www.enterprisedb.com


kqueue-v4.patch
Description: Binary data



Re: [HACKERS] array of domain types

2016-06-02 Thread Rod Taylor
On Thu, Jun 2, 2016 at 10:42 AM, Konstantin Knizhnik <
k.knizh...@postgrespro.ru> wrote:

> On 02.06.2016 17:22, Tom Lane wrote:
>
>> konstantin knizhnik  writes:
>>
>>> Attached please find patch for DefineDomain function.
>>>
>> You didn't attach the patch,
>>
>
> Sorry, but I did attach the patch - I see the attachment in my mail
> received from the group.
> Multidimensional arrays work fine:
>
> knizhnik=# SELECT '{{14},{20}}'::teenager[][];
> ERROR:  value for domain teenager violates check constraint
> "teenager_check"
> LINE 1: SELECT '{{14},{20}}'::teenager[][];
>^
> knizhnik=# SELECT '{{14},{19}}'::teenager[][];
>   teenager
> -
>  {{14},{19}}
> (1 row)
>
> knizhnik=# SELECT ('{{14},{19}}'::teenager[][])[1][1];
>  teenager
> --
>14
> (1 row)
>
>
> Domain of array of domain also works:
>
>
I applied the domain.patch from above on HEAD, and all I get is cache
lookup failures. The type_sanity regression test fails too.

postgres=# CREATE DOMAIN teenager AS int CHECK (VALUE BETWEEN 13 AND 20);
CREATE DOMAIN
postgres=# CREATE DOMAIN teenager_groups AS teenager[];
CREATE DOMAIN
postgres=# CREATE TABLE x (col teenager_groups);
ERROR:  cache lookup failed for type 0


Anyway, if that had worked for me, I would have done this, which I expect
will succeed when it shouldn't.

INSERT INTO x VALUES (ARRAY[13,14,20]);
ALTER DOMAIN teenager DROP CONSTRAINT teenager_check;
ALTER DOMAIN teenager ADD CHECK (VALUE BETWEEN 13 AND 19);


Re: [HACKERS] IPv6 link-local addresses and init data type

2016-06-02 Thread Tom Lane
Markus Wanner  writes:
> Given that a zone_id is a) highly system dependent and b) only ever
> meaningful for non-global addresses, I'm wondering what the use case for
> storing them is.

> I'm even wondering if 'fe80::1%1'::inet = 'fe80::1%2'::inet shouldn't
> simply yield true. After all, it's the same (non-global) address.

Surely not?  If the zone_ids didn't mean anything, why would the concept
even exist?  ISTM that what you've got there is two different addresses
neither of which is reachable from outside the given machine --- but
within that machine, they are different.

regards, tom lane




Re: [HACKERS] IPv6 link-local addresses and init data type

2016-06-02 Thread Markus Wanner
On 31.05.2016 12:40, Andreas Karlsson wrote:
> On 05/31/2016 04:06 AM, Tom Lane wrote:
>> Unless there's a semantic difference between fe80::1%2/64 and
>> fe80::1/64%2, this doesn't seem like a big deal to me.
> 
> As far as I can tell, only fe80::1%2/64 is valid, but I am not 100% sure.

RFC 4007, Section 11.7, states: "In this combination, it is
important to place the zone index portion before the prefix length when
we consider parsing the format by a name-to-address library function
[11].  That is, we can first separate the address with the zone index
from the prefix length, and just pass the former to the library function."

However, in the sense of being liberal in what you accept, 'fe80::/64%2'
should probably work as well.

Given that a zone_id is a) highly system dependent and b) only ever
meaningful for non-global addresses, I'm wondering what the use case for
storing them is.

I'm even wondering if 'fe80::1%1'::inet = 'fe80::1%2'::inet shouldn't
simply yield true. After all, it's the same (non-global) address.

Kind Regards

Markus Wanner





[HACKERS] Let file_fdw access COPY FROM PROGRAM

2016-06-02 Thread Corey Huinker
A while back, there was a push to make COPY gzip-aware. That didn't happen,
but COPY FROM PROGRAM did, and it scratches the same itch.

I have a similar need, but with file_fdw foreign tables. I have .csv.gz
files downloaded to the server, but those CSVs have 100+ columns in them,
and in this case I only really care about a half dozen of those columns.
I'd like to avoid:
- the overhead of writing the uncompressed file to disk and then
immediately re-reading it
- writing unwanted columns to a temp/work table via COPY, and then
immediately re-reading them
- multicorn fdw because it ends up making a python string out of all data
cells
- a csv parsing tool like csvtool or mlr, because they output another CSV
which must be reparsed from scratch

Since file_fdw leverages COPY, it seemed like it would be easy to add the
FROM PROGRAM feature to file_fdw. I began asking questions on #postgresql
IRC, only to discover that Adam Gomaa ( akgo...@gmail.com ) had already
written such a thing, but hadn't submitted it. Attached is a small rework
of his patch, along with documentation.

NOTE: The regression test includes unix commands in the program option. I
figured that wouldn't work for win32 systems, so I checked to see what the
regression tests do to test COPY FROM PROGRAM...and I couldn't find any. So
I guess the test exists as a proof of concept that will get excised before
final commit.
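With the patch applied, intended usage would look roughly like this (the
server name, paths, column list, and shell pipeline are illustrative, not
taken from the patch or its regression test):

```sql
CREATE EXTENSION file_fdw;
CREATE SERVER csv_files FOREIGN DATA WRAPPER file_fdw;

-- Decompress on the fly and keep only the interesting columns,
-- without ever writing the uncompressed file to disk:
CREATE FOREIGN TABLE feed (
    id      bigint,
    ts      timestamptz,
    amount  numeric
) SERVER csv_files
  OPTIONS (program 'gzip -dc /data/feed.csv.gz | cut -d, -f1,3,7',
           format 'csv');
```

As with the filename option, the program string is superuser-only, since it
runs an arbitrary command as the server user.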
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index bc4d2d7..97df270 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -59,6 +59,7 @@ struct FileFdwOption
 static const struct FileFdwOption valid_options[] = {
/* File options */
{"filename", ForeignTableRelationId},
+   {"program", ForeignTableRelationId},
 
/* Format options */
/* oids option is not supported */
@@ -86,9 +87,10 @@ static const struct FileFdwOption valid_options[] = {
 typedef struct FileFdwPlanState
 {
char   *filename;   /* file to read */
-   List   *options;/* merged COPY options, excluding 
filename */
-   BlockNumber pages;  /* estimate of file's physical 
size */
-   double  ntuples;/* estimate of number of rows 
in file */
+   char   *program;/* program to read output from */
+   List   *options;/* merged COPY options, excluding 
filename and program */
+   BlockNumber pages;  /* estimate of file or program 
output's physical size */
+   double  ntuples;/* estimate of number of rows 
in file or program output */
 } FileFdwPlanState;
 
 /*
@@ -97,8 +99,9 @@ typedef struct FileFdwPlanState
 typedef struct FileFdwExecutionState
 {
char   *filename;   /* file to read */
-   List   *options;/* merged COPY options, excluding 
filename */
-   CopyState   cstate; /* state of reading file */
+   char   *program;/* program to read output from */
+   List   *options;/* merged COPY options, excluding 
filename and program */
+   CopyState   cstate; /* state of reading file or 
program */
 } FileFdwExecutionState;
 
 /*
@@ -139,7 +142,9 @@ static bool fileIsForeignScanParallelSafe(PlannerInfo 
*root, RelOptInfo *rel,
  */
 static bool is_valid_option(const char *option, Oid context);
 static void fileGetOptions(Oid foreigntableid,
-  char **filename, List **other_options);
+  char **filename,
+  char **program,
+  List **other_options);
 static List *get_file_fdw_attribute_options(Oid relid);
 static bool check_selective_binary_conversion(RelOptInfo *baserel,
  Oid 
foreigntableid,
@@ -189,6 +194,7 @@ file_fdw_validator(PG_FUNCTION_ARGS)
List   *options_list = untransformRelOptions(PG_GETARG_DATUM(0));
Oid catalog = PG_GETARG_OID(1);
char   *filename = NULL;
+   char   *program = NULL;
DefElem*force_not_null = NULL;
DefElem*force_null = NULL;
List   *other_options = NIL;
@@ -196,16 +202,17 @@ file_fdw_validator(PG_FUNCTION_ARGS)
 
/*
 * Only superusers are allowed to set options of a file_fdw foreign 
table.
-* This is because the filename is one of those options, and we don't 
want
-* non-superusers to be able to determine which file gets read.
+* This is because the filename or program string are two of those
+* options, and we don't want non-superusers to be able to determine 
which
+* file gets read or what command is run.
 *
 * Putting this sort of permissions check in a validator 

Re: [HACKERS] Rename max_parallel_degree?

2016-06-02 Thread David G. Johnston
On Thu, Jun 2, 2016 at 4:35 PM, Josh berkus  wrote:

> On 06/02/2016 01:08 PM, David G. Johnston wrote:
> > On Thu, Jun 2, 2016 at 3:52 PM, Josh berkus  > >wrote:
> >
> > On 06/02/2016 08:53 AM, Tom Lane wrote:
> > > Josh berkus mailto:j...@agliodbs.com>> writes:
> > >> On 06/02/2016 04:58 AM, Robert Haas wrote:
> > >>> Well, I think we could drop node, if you like.  I think parallel
> > >>> wouldn't be good to drop, though, because it sounds like we want
> a
> > >>> global limit on parallel workers also, and that can't be just
> > >>> max_workers.  So I think we should keep parallel in there for
> all of
> > >>> them, and have max_parallel_workers and
> > >>> max_parallel_workers_per_gather(_node).  The reloption and the
> Path
> > >>> struct field can be parallel_workers rather than parallel_degree.
> > >
> > >> So does that mean we'll rename it if you manage to implement a
> parameter
> > >> which controls the number of workers for the whole statement?
> > >
> > > That would fit in as something like
> max_parallel_workers_per_statement.
> >
> > ETOOMANYKNOBS
> >
> > I'm trying to think of some way we can reasonably automate this for
> > users ...
> >
> >
> > ​Are you referring to right now or if we move the goal posts to making
> > this a per-statement reservation?​
>
> I was assuming that we would have *both* per-operation and per-statement
> limits.  I can see reasons for having both, I can see why power users
> would want both, but it's going to be overwhelming to casual users.
>
>
​Got that.  The only problem on that front with the current setup is that
right now we are saying: "at most use 3 of the 8 available processes":
i.e., we tie ourselves to a fixed number when I think a better algorithm
would be: "on/off/auto - default auto" and we detect at runtime whatever
values we feel are most appropriate based upon the machine we are running
on.  If the user doesn't like our choices they can specify their own
values.  But the only specified values in the configurations would be those
placed there automatically by the user.  If value isn't specified but is
required it gets set at startup and can be seen in pg_settings.

David J.


Re: [HACKERS] Rename max_parallel_degree?

2016-06-02 Thread Josh berkus
On 06/02/2016 01:08 PM, David G. Johnston wrote:
> On Thu, Jun 2, 2016 at 3:52 PM, Josh berkus  >wrote:
> 
> On 06/02/2016 08:53 AM, Tom Lane wrote:
> > Josh berkus mailto:j...@agliodbs.com>> writes:
> >> On 06/02/2016 04:58 AM, Robert Haas wrote:
> >>> Well, I think we could drop node, if you like.  I think parallel
> >>> wouldn't be good to drop, though, because it sounds like we want a
> >>> global limit on parallel workers also, and that can't be just
> >>> max_workers.  So I think we should keep parallel in there for all of
> >>> them, and have max_parallel_workers and
> >>> max_parallel_workers_per_gather(_node).  The reloption and the Path
> >>> struct field can be parallel_workers rather than parallel_degree.
> >
> >> So does that mean we'll rename it if you manage to implement a 
> parameter
> >> which controls the number of workers for the whole statement?
> >
> > That would fit in as something like max_parallel_workers_per_statement.
> 
> ETOOMANYKNOBS
> 
> I'm trying to think of some way we can reasonably automate this for
> users ...
> 
> 
> ​Are you referring to right now or if we move the goal posts to making
> this a per-statement reservation?​

I was assuming that we would have *both* per-operation and per-statement
limits.  I can see reasons for having both, I can see why power users
would want both, but it's going to be overwhelming to casual users.


-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] Rename max_parallel_degree?

2016-06-02 Thread David G. Johnston
On Thu, Jun 2, 2016 at 3:52 PM, Josh berkus  wrote:

> On 06/02/2016 08:53 AM, Tom Lane wrote:
> > Josh berkus  writes:
> >> On 06/02/2016 04:58 AM, Robert Haas wrote:
> >>> Well, I think we could drop node, if you like.  I think parallel
> >>> wouldn't be good to drop, though, because it sounds like we want a
> >>> global limit on parallel workers also, and that can't be just
> >>> max_workers.  So I think we should keep parallel in there for all of
> >>> them, and have max_parallel_workers and
> >>> max_parallel_workers_per_gather(_node).  The reloption and the Path
> >>> struct field can be parallel_workers rather than parallel_degree.
> >
> >> So does that mean we'll rename it if you manage to implement a parameter
> >> which controls the number of workers for the whole statement?
> >
> > That would fit in as something like max_parallel_workers_per_statement.
>
> ETOOMANYKNOBS
>
> I'm trying to think of some way we can reasonably automate this for
> users ...
>

​Are you referring to right now or if we move the goal posts to making this
a per-statement reservation?​

Oh, and how does one measure 0.718... of a knob?

David J.


[HACKERS] XTM & parallel search

2016-06-02 Thread Konstantin Knizhnik

We have to add three more functions to eXtensible Transaction Manager API (XTM):

/*
 * Calculate transaction state size. This method is invoked by 
EstimateTransactionStateSpace to copy transaction
 * state to parallel workers
 */
size_t  (*GetTransactionStateSize)(void);

/*
 * Serialize transaction state
 */
void(*SerializeTransactionState)(void* ctx);

/*
 * Deserialize transaction state
 */
void(*DeserializeTransactionState)(void* ctx);

The reason is that we found out that our multimaster does not work correctly
when max_parallel_workers > 0, because the multimaster transaction context is
not properly shared between workers. Unfortunately, right now
serialization/deserialization of transaction state is hardcoded in xact.c and
IMHO is done in quite an ugly way:

XactIsoLevel = (int) tstate[0];
XactDeferrable = (bool) tstate[1];
XactTopTransactionId = tstate[2];
CurrentTransactionState->transactionId = tstate[3];
currentCommandId = tstate[4];
nParallelCurrentXids = (int) tstate[5];
ParallelCurrentXids = &tstate[6];

- there is not even a declared structure for the fixed part of the saved context.
I wonder whether DTM is the only thing interested in sharing some common state
between workers; should we provide some way of replicating user-defined context
between workers? From my point of view, XTM seems to be a good place for it...



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Rename max_parallel_degree?

2016-06-02 Thread Josh berkus
On 06/02/2016 08:53 AM, Tom Lane wrote:
> Josh berkus  writes:
>> On 06/02/2016 04:58 AM, Robert Haas wrote:
>>> Well, I think we could drop node, if you like.  I think parallel
>>> wouldn't be good to drop, though, because it sounds like we want a
>>> global limit on parallel workers also, and that can't be just
>>> max_workers.  So I think we should keep parallel in there for all of
>>> them, and have max_parallel_workers and
>>> max_parallel_workers_per_gather(_node).  The reloption and the Path
>>> struct field can be parallel_workers rather than parallel_degree.
> 
>> So does that mean we'll rename it if you manage to implement a parameter
>> which controls the number of workers for the whole statement?
> 
> That would fit in as something like max_parallel_workers_per_statement.

ETOOMANYKNOBS

I'm trying to think of some way we can reasonably automate this for
users ...

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] Misdesigned command/status APIs for parallel dump/restore

2016-06-02 Thread Tom Lane
I wrote:
> In the attached patch for this, I took a middle ground of separating out
> the command and status string building and parsing functions.  There isn't
> presently any provision for letting archive format modules override these,
> but that could easily be added if we ever need it.  Meanwhile, this saves
> about 100 lines of rather messy code.

Attached are two versions of this patch.  The first is against current
HEAD (i.e., rebased over e652273e073566b6) and the second is additionally
over the patch I proposed at <6814.1464893...@sss.pgh.pa.us>, resolving
the merge conflicts between them.

Since this is only refactoring, and doesn't (I think) fix any bugs,
I'm setting it aside till the next commitfest.

regards, tom lane

diff --git a/src/bin/pg_dump/parallel.c b/src/bin/pg_dump/parallel.c
index 7647271..9080102 100644
*** a/src/bin/pg_dump/parallel.c
--- b/src/bin/pg_dump/parallel.c
***
*** 20,49 
   * the desired number of worker processes, which each enter WaitForCommands().
   *
   * The master process dispatches an individual work item to one of the worker
!  * processes in DispatchJobForTocEntry().  That calls
!  * AH->MasterStartParallelItemPtr, a routine of the output format.  This
!  * function's arguments are the parents archive handle AH (containing the full
!  * catalog information), the TocEntry that the worker should work on and a
!  * T_Action value indicating whether this is a backup or a restore task.  The
!  * function simply converts the TocEntry assignment into a command string that
!  * is then sent over to the worker process. In the simplest case that would be
!  * something like "DUMP 1234", with 1234 being the TocEntry id.
!  *
   * The worker process receives and decodes the command and passes it to the
   * routine pointed to by AH->WorkerJobDumpPtr or AH->WorkerJobRestorePtr,
   * which are routines of the current archive format.  That routine performs
!  * the required action (dump or restore) and returns a malloc'd status string.
!  * The status string is passed back to the master where it is interpreted by
!  * AH->MasterEndParallelItemPtr, another format-specific routine.  That
!  * function can update state or catalog information on the master's side,
!  * depending on the reply from the worker process.  In the end it returns a
!  * status code, which is 0 for successful execution.
   *
!  * Remember that we have forked off the workers only after we have read in
!  * the catalog.  That's why our worker processes can also access the catalog
!  * information.  (In the Windows case, the workers are threads in the same
!  * process.  To avoid problems, they work with cloned copies of the Archive
!  * data structure; see RunWorker().)
   *
   * In the master process, the workerStatus field for each worker has one of
   * the following values:
--- 20,41 
   * the desired number of worker processes, which each enter WaitForCommands().
   *
   * The master process dispatches an individual work item to one of the worker
!  * processes in DispatchJobForTocEntry().  We send a command string such as
!  * "DUMP 1234" or "RESTORE 1234", where 1234 is the TocEntry ID.
   * The worker process receives and decodes the command and passes it to the
   * routine pointed to by AH->WorkerJobDumpPtr or AH->WorkerJobRestorePtr,
   * which are routines of the current archive format.  That routine performs
!  * the required action (dump or restore) and returns an integer status code.
!  * This is passed back to the master which updates its state accordingly.
   *
!  * In principle additional archive-format-specific information might be needed
!  * in commands or worker status responses, but so far that hasn't proved
!  * necessary, since workers have full copies of the ArchiveHandle/TocEntry
!  * data structures.  Remember that we have forked off the workers only after
!  * we have read in the catalog.  That's why our worker processes can also
!  * access the catalog information.  (In the Windows case, the workers are
!  * threads in the same process.  To avoid problems, they work with cloned
!  * copies of the Archive data structure; see RunWorker().)
   *
   * In the master process, the workerStatus field for each worker has one of
   * the following values:
*** ParallelBackupEnd(ArchiveHandle *AH, Par
*** 1057,1062 
--- 1049,1158 
  }
  
  /*
+  * These next four functions handle construction and parsing of the command
+  * strings and response strings for parallel workers.
+  *
+  * Currently, these can be the same regardless of which archive format we are
+  * processing.  In future, we might want to let format modules override these
+  * functions to add format-specific data to a command or response.
+  */
+ 
+ /*
+  * buildWorkerCommand: format a command string to send to a worker.
+  *
+  * The string is built in the caller-supplied buffer of size buflen.
+  */
+ static void
+ buildWorkerCommand(ArchiveHandle *AH, TocEn

Re: [HACKERS] Redesigning parallel dump/restore's wait-for-workers logic

2016-06-02 Thread Tom Lane
I wrote:
> One of the things I do not like about the current coding of parallel
> pg_dump/pg_restore is its baroque logic for handling worker completion
> reports, specifically the ListenToWorkers/ReapWorkerStatus APIs.

Here's a version of this patch rebased over e652273e073566b6.

Since this is only refactoring, and doesn't (I think) fix any bugs,
I'm setting it aside till the next commitfest.

regards, tom lane

diff --git a/src/bin/pg_dump/parallel.c b/src/bin/pg_dump/parallel.c
index 7647271..a9671ca 100644
*** a/src/bin/pg_dump/parallel.c
--- b/src/bin/pg_dump/parallel.c
***
*** 35,43 
   * the required action (dump or restore) and returns a malloc'd status string.
   * The status string is passed back to the master where it is interpreted by
   * AH->MasterEndParallelItemPtr, another format-specific routine.  That
!  * function can update state or catalog information on the master's side,
   * depending on the reply from the worker process.  In the end it returns a
!  * status code, which is 0 for successful execution.
   *
   * Remember that we have forked off the workers only after we have read in
   * the catalog.  That's why our worker processes can also access the catalog
--- 35,45 
   * the required action (dump or restore) and returns a malloc'd status string.
   * The status string is passed back to the master where it is interpreted by
   * AH->MasterEndParallelItemPtr, another format-specific routine.  That
!  * function can update format-specific information on the master's side,
   * depending on the reply from the worker process.  In the end it returns a
!  * status code, which we pass to the ParallelCompletionPtr callback function
!  * that was passed to DispatchJobForTocEntry().  The callback function does
!  * state updating for the master control logic in pg_backup_archiver.c.
   *
   * Remember that we have forked off the workers only after we have read in
   * the catalog.  That's why our worker processes can also access the catalog
***
*** 48,60 
   * In the master process, the workerStatus field for each worker has one of
   * the following values:
   *		WRKR_IDLE: it's waiting for a command
!  *		WRKR_WORKING: it's been sent a command
!  *		WRKR_FINISHED: it's returned a result
   *		WRKR_TERMINATED: process ended
-  * The FINISHED state indicates that the worker is idle, but we've not yet
-  * dealt with the status code it returned from the prior command.
-  * ReapWorkerStatus() extracts the unhandled command status value and sets
-  * the workerStatus back to WRKR_IDLE.
   */
  
  #include "postgres_fe.h"
--- 50,57 
   * In the master process, the workerStatus field for each worker has one of
   * the following values:
   *		WRKR_IDLE: it's waiting for a command
!  *		WRKR_WORKING: it's working on a command
   *		WRKR_TERMINATED: process ended
   */
  
  #include "postgres_fe.h"
***
*** 75,80 
--- 72,79 
  #define PIPE_READ			0
  #define PIPE_WRITE			1
  
+ #define NO_SLOT (-1)			/* Failure result for GetIdleWorker() */
+ 
  #ifdef WIN32
  
  /*
*** static void setup_cancel_handler(void);
*** 161,169 
--- 160,171 
  static void set_cancel_pstate(ParallelState *pstate);
  static void set_cancel_slot_archive(ParallelSlot *slot, ArchiveHandle *AH);
  static void RunWorker(ArchiveHandle *AH, ParallelSlot *slot);
+ static int	GetIdleWorker(ParallelState *pstate);
  static bool HasEveryWorkerTerminated(ParallelState *pstate);
  static void lockTableForWorker(ArchiveHandle *AH, TocEntry *te);
  static void WaitForCommands(ArchiveHandle *AH, int pipefd[2]);
+ static bool ListenToWorkers(ArchiveHandle *AH, ParallelState *pstate,
+ bool do_wait);
  static char *getMessageFromMaster(int pipefd[2]);
  static void sendMessageToMaster(int pipefd[2], const char *str);
  static int	select_loop(int maxFd, fd_set *workerset);
*** archive_close_connection(int code, void 
*** 335,342 
  			 * fail to detect it because there would be no EOF condition on
  			 * the other end of the pipe.)
  			 */
! 			if (slot->args->AH)
! DisconnectDatabase(&(slot->args->AH->public));
  
  #ifdef WIN32
  			closesocket(slot->pipeRevRead);
--- 337,344 
  			 * fail to detect it because there would be no EOF condition on
  			 * the other end of the pipe.)
  			 */
! 			if (slot->AH)
! DisconnectDatabase(&(slot->AH->public));
  
  #ifdef WIN32
  			closesocket(slot->pipeRevRead);
*** ShutdownWorkersHard(ParallelState *pstat
*** 393,399 
  	EnterCriticalSection(&signal_info_lock);
  	for (i = 0; i < pstate->numWorkers; i++)
  	{
! 		ArchiveHandle *AH = pstate->parallelSlot[i].args->AH;
  		char		errbuf[1];
  
  		if (AH != NULL && AH->connCancel != NULL)
--- 395,401 
  	EnterCriticalSection(&signal_info_lock);
  	for (i = 0; i < pstate->numWorkers; i++)
  	{
! 		ArchiveHandle *AH = pstate->parallelSlot[i].AH;
  		char		errbuf[1];
  
  		if (AH != NULL &

Re: [HACKERS] pg9.6 segfault using simple query (related to use fk for join estimates)

2016-06-02 Thread Tom Lane
Robert Haas  writes:
> FYI, I spoke to Tom Lane about this at PGCon and suggested that he
> look at the proposed patch as I requested in
> https://www.postgresql.org/message-id/CA+TgmobPqrAVXOBMHTcpDq8hX7gCzcVhoUvC8s9V=d09+bt...@mail.gmail.com
> and see whether that would address his concerns, and he said that he
> would do that.  It may, however, have slipped his mind.

Not entirely, though I've been wrapped up in pg_dump fixes for the last
little while.  I'll get to this soon.

regards, tom lane




Re: [HACKERS] epoll_wait returning EFAULT on Linux 3.2.78

2016-06-02 Thread Tom Lane
Greg Stark  writes:
> Ok. I added a comment and also fixed a small typo.

Looks sane to me, though I might reduce the comment to something like
"MAXALIGN all the subsidiary arrays, to avoid interdependencies of the
alignment requirements of their component types."

regards, tom lane




Re: [HACKERS] epoll_wait returning EFAULT on Linux 3.2.78

2016-06-02 Thread Andres Freund
Hi,

On 2016-06-02 19:15:07 +0100, Greg Stark wrote:
> Ok. I added a comment and also fixed a small typo.

> diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
> index 3fbe0e5..a96fb7b 100644
> --- a/src/backend/storage/ipc/latch.c
> +++ b/src/backend/storage/ipc/latch.c
> @@ -485,35 +485,40 @@ CreateWaitEventSet(MemoryContext context, int nevents)
>   char   *data;
>   Sizesz = 0;
>  
> - sz += sizeof(WaitEventSet);
> - sz += sizeof(WaitEvent) * nevents;
> + /* 
> +  * struct epoll_event contains a uint64_t which on some architectures
> +  * requires 8-byte alignment and just to be safe be prepared for future
> +  * additions of other such elements later in the WaitEventSet structure
> +  */

I'd rather phrase this roughly like:

Use MAXALIGN size/alignment to guarantee that later uses of memory are
aligned correctly. E.g. epoll_event might need 8 byte alignment on some
platforms, but earlier allocations like WaitEventSet and WaitEvent might
not be sized to guarantee that when purely using sizeof().

> + sz += MAXALIGN(sizeof(WaitEventSet));
> + sz += MAXALIGN(sizeof(WaitEvent) * nevents);
>  
>  #if defined(WAIT_USE_EPOLL)
> - sz += sizeof(struct epoll_event) * nevents;
> + sz += MAXALIGN(sizeof(struct epoll_event) * nevents);
>  #elif defined(WAIT_USE_POLL)
> - sz += sizeof(struct pollfd) * nevents;
> + sz += MAXALIGN(sizeof(struct pollfd) * nevents);
>  #elif defined(WAIT_USE_WIN32)
>   /* need space for the pgwin32_signal_event */
> - sz += sizeof(HANDLE) * (nevents + 1);
> + sz += MAXALIGN(sizeof(HANDLE) * (nevents + 1));
>  #endif
>  
>   data = (char *) MemoryContextAllocZero(context, sz);
>  
>   set = (WaitEventSet *) data;
> - data += sizeof(WaitEventSet);
> + data += MAXALIGN(sizeof(WaitEventSet));
>  
>   set->events = (WaitEvent *) data;
> - data += sizeof(WaitEvent) * nevents;
> + data += MAXALIGN(sizeof(WaitEvent) * nevents);
>  
>  #if defined(WAIT_USE_EPOLL)
>   set->epoll_ret_events = (struct epoll_event *) data;
> - data += sizeof(struct epoll_event) * nevents;
> + data += MAXALIGN(sizeof(struct epoll_event) * nevents);
>  #elif defined(WAIT_USE_POLL)
>   set->pollfds = (struct pollfd *) data;
> - data += sizeof(struct pollfd) * nevents;
> + data += MAXALIGN(sizeof(struct pollfd) * nevents);
>  #elif defined(WAIT_USE_WIN32)
>   set->handles = (HANDLE) data;
> - data += sizeof(HANDLE) * nevents;
> + data += MAXALIGN(sizeof(HANDLE) * nevents);
>  #endif

You can't easily test WIN32, but I'd suggest running make check with
WAIT_USE_POLL and WAIT_USE_SELECT at the top.

Looks sane otherwise.

Greetings,

Andres Freund




Re: [HACKERS] epoll_wait returning EFAULT on Linux 3.2.78

2016-06-02 Thread Greg Stark
Ok. I added a comment and also fixed a small typo.
diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index 3fbe0e5..a96fb7b 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -485,35 +485,40 @@ CreateWaitEventSet(MemoryContext context, int nevents)
 	char	   *data;
 	Size		sz = 0;
 
-	sz += sizeof(WaitEventSet);
-	sz += sizeof(WaitEvent) * nevents;
+	/* 
+	 * struct epoll_event contains a uint64_t which on some architectures
+	 * requires 8-byte alignment and just to be safe be prepared for future
+	 * additions of other such elements later in the WaitEventSet structure
+	 */
+	sz += MAXALIGN(sizeof(WaitEventSet));
+	sz += MAXALIGN(sizeof(WaitEvent) * nevents);
 
 #if defined(WAIT_USE_EPOLL)
-	sz += sizeof(struct epoll_event) * nevents;
+	sz += MAXALIGN(sizeof(struct epoll_event) * nevents);
 #elif defined(WAIT_USE_POLL)
-	sz += sizeof(struct pollfd) * nevents;
+	sz += MAXALIGN(sizeof(struct pollfd) * nevents);
 #elif defined(WAIT_USE_WIN32)
 	/* need space for the pgwin32_signal_event */
-	sz += sizeof(HANDLE) * (nevents + 1);
+	sz += MAXALIGN(sizeof(HANDLE) * (nevents + 1));
 #endif
 
 	data = (char *) MemoryContextAllocZero(context, sz);
 
 	set = (WaitEventSet *) data;
-	data += sizeof(WaitEventSet);
+	data += MAXALIGN(sizeof(WaitEventSet));
 
 	set->events = (WaitEvent *) data;
-	data += sizeof(WaitEvent) * nevents;
+	data += MAXALIGN(sizeof(WaitEvent) * nevents);
 
 #if defined(WAIT_USE_EPOLL)
 	set->epoll_ret_events = (struct epoll_event *) data;
-	data += sizeof(struct epoll_event) * nevents;
+	data += MAXALIGN(sizeof(struct epoll_event) * nevents);
 #elif defined(WAIT_USE_POLL)
 	set->pollfds = (struct pollfd *) data;
-	data += sizeof(struct pollfd) * nevents;
+	data += MAXALIGN(sizeof(struct pollfd) * nevents);
 #elif defined(WAIT_USE_WIN32)
 	set->handles = (HANDLE) data;
-	data += sizeof(HANDLE) * nevents;
+	data += MAXALIGN(sizeof(HANDLE) * nevents);
 #endif
 
 	set->latch = NULL;
@@ -616,7 +621,7 @@ AddWaitEventToSet(WaitEventSet *set, uint32 events, pgsocket fd, Latch *latch,
 		if (set->latch)
 			elog(ERROR, "cannot wait on more than one latch");
 		if ((events & WL_LATCH_SET) != WL_LATCH_SET)
-			elog(ERROR, "latch events only spuport being set");
+			elog(ERROR, "latch events only support being set");
 	}
 	else
 	{



Re: [HACKERS] epoll_wait returning EFAULT on Linux 3.2.78

2016-06-02 Thread Tom Lane
Andres Freund  writes:
> On 2016-06-02 18:41:00 +0100, Greg Stark wrote:
>> Well there's not *nothing* we can do. I thought we were going to
>> have to go back and do manual offset calculations to get that right.

> The kernel accesses the elements as an array. If the array stride (by
> virtue of sizeof) were wrong, we couldn't fix that.

Right.  The general rule in C is that sizeof(anything) is always a
multiple of the something's alignment requirement, so that if you
have a correctly aligned initial element of an array then later
elements are also correctly aligned.  The problem in our existing
code is that sizeof(WaitEventSet) might not be a multiple of the
alignment requirement of WaitEvent, and either of those two might
not be a multiple of the alignment requirement of struct epoll_event,
etc.  So we should make the code look like

sz += MAXALIGN(sizeof(WaitEventSet));
sz += MAXALIGN(sizeof(WaitEvent) * nevents);

#if defined(WAIT_USE_EPOLL)
sz += MAXALIGN(sizeof(struct epoll_event) * nevents);

etc, so that each of the subsidiary arrays starts on a MAXALIGN boundary.
Where the later array elements fall is taken care of given that.

regards, tom lane




Re: [HACKERS] Typo in comment in nbtree.h

2016-06-02 Thread Kevin Grittner
On Wed, Jun 1, 2016 at 4:08 PM, Thomas Munro
 wrote:

> Maybe it should be like this?
>
> --- a/src/include/access/nbtree.h
> +++ b/src/include/access/nbtree.h
> @@ -522,7 +522,7 @@ typedef struct BTScanPosData
> Buffer  buf;/* if valid, the buffer is pinned */
>
> XLogRecPtr  lsn;/* pos in the WAL stream when page was read */
> -   BlockNumber currPage;   /* page we've referencd by items array */
> +   BlockNumber currPage;   /* page referenced by items array */
> BlockNumber nextPage;   /* page's right link when we scanned it */
>
> /*

I agree.  Pushed.

Thanks!

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] epoll_wait returning EFAULT on Linux 3.2.78

2016-06-02 Thread Andres Freund
On 2016-06-02 18:41:00 +0100, Greg Stark wrote:
> On Thu, Jun 2, 2016 at 6:35 PM, Andres Freund  wrote:
> > Want me to polish that up and push, or do you want to go back and forth
> > and push yourself?
> 
> I'm happy to check if my bits still work if it's not too much hassle
> to go back and forth.

No problem.


> > They should be *after* the multiplications. If arrays of structs are
> > internally misaligned there's nothing we can do.
> 
> Well there's not *nothing* we can do. I thought we were going to
> have to go back and do manual offset calculations to get that right.

The kernel accesses the elements as an array. If the array stride (by
virtue of sizeof) were wrong, we couldn't fix that.


> But with the u64_t element that's probably not necessary. It's a bit
> scary having that be the only thing protecting it but then again it's
> probably exactly why it's there.

I guess the problem occurs due to the kernel checking that the
parameter is correctly aligned to contain an epoll_event array. Which
needs its alignment due to that uint64_t.  I rather doubt that the
syscall requires 8 byte alignment because of the 32 vs. 64 bit split;
that problem exists in a *lot* of places...

> Should the maxaligns be there for the non-epoll cases? I tossed them
> in without paying much attention. I really have no idea if they're
> needed on Windows or not for example.

They might not absolutely be needed in some cases today, but I think it's
better to be safe. Who knows what we'll add after the current elements
at some later point.

Greetings,

Andres Freund




Re: [HACKERS] epoll_wait returning EFAULT on Linux 3.2.78

2016-06-02 Thread Greg Stark
On Thu, Jun 2, 2016 at 6:35 PM, Andres Freund  wrote:
> Want me to polish that up and push, or do you want to go back and forth
> and push yourself?

I'm happy to check if my bits still work if it's not too much hassle
to go back and forth.


> They should be *after* the multiplications. If arrays of structs are
> internally misaligned there's nothing we can do.

Well there's not *nothing* we can do. I thought we were going to
have to go back and do manual offset calculations to get that right.
But with the u64_t element that's probably not necessary. It's a bit
scary having that be the only thing protecting it but then again it's
probably exactly why it's there. Given how much of a pain doing manual
offset calculations is, I'm happy to go along and say it's not worth
bothering.

Should the maxaligns be there for the non-epoll cases? I tossed them
in without paying much attention. I really have no idea if they're
needed on Windows or not for example.


-- 
greg




Re: [HACKERS] epoll_wait returning EFAULT on Linux 3.2.78

2016-06-02 Thread Andres Freund
On 2016-06-02 18:15:54 +0100, Greg Stark wrote:
> I just threw maxalign everywhere but I was going to comment that we
> might need to put a double in for the subsequent struct elements to
> end up aligned as well.

Hm. Shouldn't be needed if you MAXALIGN the sz computation, because
that'll mean each struct will start at the proper offset.


> But I guess that's not true due to the u64
> element.  I'm not sure I've completely grokked all the bits and pieces
> here so I may have added a few too many maxaligns. I put them on all
> the size and offset calculations before the multiplications.

They should be *after* the multiplications. If arrays of structs are
internally misaligned there's nothing we can do.


> diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
> index 3fbe0e5..5e69a5a 100644
> --- a/src/backend/storage/ipc/latch.c
> +++ b/src/backend/storage/ipc/latch.c
> @@ -485,35 +485,35 @@ CreateWaitEventSet(MemoryContext context, int nevents)
>   char   *data;
>   Sizesz = 0;
>  
> - sz += sizeof(WaitEventSet);
> - sz += sizeof(WaitEvent) * nevents;
> + sz += MAXALIGN(sizeof(WaitEventSet));
> + sz += MAXALIGN(sizeof(WaitEvent)) * nevents;
>  
>  #if defined(WAIT_USE_EPOLL)
> - sz += sizeof(struct epoll_event) * nevents;
> + sz += MAXALIGN(sizeof(struct epoll_event)) * nevents;

So this needs to include the '* nevents' inside the MAXALIGN.


>  #elif defined(WAIT_USE_POLL)
> - sz += sizeof(struct pollfd) * nevents;
> + sz += MAXALIGN(sizeof(struct pollfd)) * nevents;

Same.


>  #elif defined(WAIT_USE_WIN32)
>   /* need space for the pgwin32_signal_event */
> - sz += sizeof(HANDLE) * (nevents + 1);
> + sz += MAXALIGN(sizeof(HANDLE)) * (nevents + 1);
>  #endif

same.


>  
>  #if defined(WAIT_USE_EPOLL)
>   set->epoll_ret_events = (struct epoll_event *) data;
> - data += sizeof(struct epoll_event) * nevents;
> + data += MAXALIGN(sizeof(struct epoll_event)) * nevents;
>  #elif defined(WAIT_USE_POLL)
>   set->pollfds = (struct pollfd *) data;
> - data += sizeof(struct pollfd) * nevents;
> + data += MAXALIGN(sizeof(struct pollfd)) * nevents;
>  #elif defined(WAIT_USE_WIN32)
>   set->handles = (HANDLE) data;
> - data += sizeof(HANDLE) * nevents;
> + data += MAXALIGN(sizeof(HANDLE)) * nevents;
>  #endif

same.


Want me to polish that up and push, or do you want to go back and forth
and push yourself?


Greetings,

Andres Freund




Re: [HACKERS] epoll_wait returning EFAULT on Linux 3.2.78

2016-06-02 Thread Andres Freund
On 2016-06-02 13:24:38 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > That makes sense independent of 32- vs. 64-bit systems. Part of the relevant
> > struct probably requires 8 byte alignment:
> 
> >typedef union epoll_data {
> >void*ptr;
> >int  fd;
> >uint32_t u32;
> >uint64_t u64;
> >} epoll_data_t;
> 
> On a 32-bit machine it's entirely possible that that would only require
> 32-bit alignment.

Not on 32bit sparc afaics:
> checking alignment of long long int... 8
> checking alignment of double... 8

> But based on what we know so far, using our regular
> MAXALIGN macro should be good enough.

Yea.

- Andres




Re: [HACKERS] epoll_wait returning EFAULT on Linux 3.2.78

2016-06-02 Thread Tom Lane
Andres Freund  writes:
> That makes sense independent of 32- vs. 64-bit systems. Part of the relevant
> struct probably requires 8 byte alignment:

>typedef union epoll_data {
>void*ptr;
>int  fd;
>uint32_t u32;
>uint64_t u64;
>} epoll_data_t;

On a 32-bit machine it's entirely possible that that would only require
32-bit alignment.  But based on what we know so far, using our regular
MAXALIGN macro should be good enough.

regards, tom lane




Re: [HACKERS] epoll_wait returning EFAULT on Linux 3.2.78

2016-06-02 Thread Greg Stark
On Thu, Jun 2, 2016 at 6:04 PM, Andres Freund  wrote:
> Greg, are you writing & testing a patch? Or should I write something for
> you to test?

I'm running the regression tests now. They look like they're passing.

I just threw maxalign everywhere but I was going to comment that we
might need to put a double in for the subsequent struct elements to
end up aligned as well. But I guess that's not true due to the u64
element.  I'm not sure I've completely grokked all the bits and pieces
here so I may have added a few too many maxaligns. I put them on all
the size and offset calculations before the multiplications.



-- 
greg
diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index 3fbe0e5..5e69a5a 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -485,35 +485,35 @@ CreateWaitEventSet(MemoryContext context, int nevents)
 	char	   *data;
 	Size		sz = 0;
 
-	sz += sizeof(WaitEventSet);
-	sz += sizeof(WaitEvent) * nevents;
+	sz += MAXALIGN(sizeof(WaitEventSet));
+	sz += MAXALIGN(sizeof(WaitEvent)) * nevents;
 
 #if defined(WAIT_USE_EPOLL)
-	sz += sizeof(struct epoll_event) * nevents;
+	sz += MAXALIGN(sizeof(struct epoll_event)) * nevents;
 #elif defined(WAIT_USE_POLL)
-	sz += sizeof(struct pollfd) * nevents;
+	sz += MAXALIGN(sizeof(struct pollfd)) * nevents;
 #elif defined(WAIT_USE_WIN32)
 	/* need space for the pgwin32_signal_event */
-	sz += sizeof(HANDLE) * (nevents + 1);
+	sz += MAXALIGN(sizeof(HANDLE)) * (nevents + 1);
 #endif
 
 	data = (char *) MemoryContextAllocZero(context, sz);
 
 	set = (WaitEventSet *) data;
-	data += sizeof(WaitEventSet);
+	data += MAXALIGN(sizeof(WaitEventSet));
 
 	set->events = (WaitEvent *) data;
-	data += sizeof(WaitEvent) * nevents;
+	data += MAXALIGN(sizeof(WaitEvent)) * nevents;
 
 #if defined(WAIT_USE_EPOLL)
 	set->epoll_ret_events = (struct epoll_event *) data;
-	data += sizeof(struct epoll_event) * nevents;
+	data += MAXALIGN(sizeof(struct epoll_event)) * nevents;
 #elif defined(WAIT_USE_POLL)
 	set->pollfds = (struct pollfd *) data;
-	data += sizeof(struct pollfd) * nevents;
+	data += MAXALIGN(sizeof(struct pollfd)) * nevents;
 #elif defined(WAIT_USE_WIN32)
 	set->handles = (HANDLE) data;
-	data += sizeof(HANDLE) * nevents;
+	data += MAXALIGN(sizeof(HANDLE)) * nevents;
 #endif
 
 	set->latch = NULL;



Re: [HACKERS] epoll_wait returning EFAULT on Linux 3.2.78

2016-06-02 Thread Andres Freund
Hi,

On 2016-06-02 17:50:20 +0100, Greg Stark wrote:
> So FYI, it does look like Postgres built in 32-bit mode, at least
> pointers are 32 bits. But I think maxalign might still be enough due
> to doubles being 64 bits.

That makes sense independent of 32- vs. 64-bit systems. Part of the relevant
struct probably requires 8 byte alignment:

   typedef union epoll_data {
   void*ptr;
   int  fd;
   uint32_t u32;
   uint64_t u64;
   } epoll_data_t;

   struct epoll_event {
   uint32_t events;/* Epoll events */
   epoll_data_t data;  /* User data variable */
   };

So using long long alignment seems the best bet here.

Greg, are you writing & testing a patch? Or should I write something for
you to test?

Regards,

Andres




Re: [HACKERS] epoll_wait returning EFAULT on Linux 3.2.78

2016-06-02 Thread Tom Lane
Greg Stark  writes:
> So FYI, it does look like Postgres built in 32-bit mode, at least
> pointers are 32 bits. But I think maxalign might still be enough due
> to doubles being 64 bits.

OK, let's just maxalign all the components of the WaitEventSet struct
and call it good, at least till we find out differently.  Andres,
do you want to do that?

regards, tom lane




[HACKERS] Re: pg9.6 segfault using simple query (related to use fk for join estimates)

2016-06-02 Thread Tomas Vondra

Hi,

On 06/02/2016 04:18 PM, Robert Haas wrote:

> My opinion is that something needs to be done about this patch. It
> needs to be improved or reverted. Improved would be ideal in my
> mind, but reverted is an outcome I can live with.


FWIW I'm ready to put my time into fixing the issues, but only if 
there's a consensus it's not doomed anyway (for 9.6).


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] epoll_wait returning EFAULT on Linux 3.2.78

2016-06-02 Thread Greg Stark
So FYI, it does look like Postgres built in 32-bit mode, at least
pointers are 32 bits. But I think maxalign might still be enough due
to doubles being 64 bits.

checking whether long int is 64 bits... no
checking whether long long int is 64 bits... yes
checking snprintf length modifier for long long int... ll
checking whether snprintf supports the %z modifier... yes
checking size of void *... 4
checking size of size_t... 4
checking size of long... 4
checking whether to build with float4 passed by value... yes
checking whether to build with float8 passed by value... no
checking alignment of short... 2
checking alignment of int... 4
checking alignment of long... 4
checking alignment of long long int... 8
checking alignment of double... 8
checking for int8... no
checking for uint8... no
checking for int64... no
checking for uint64... no
checking for __int128... no

I wrote a test program based on latch.c and tested various alignments.
It does indeed get EFAULT for anything other than 8-byte alignment
(the second argument is the number of bytes to offset the events
structure):

$ echo | ./a.out 1 0
epoll_wait(epfd=5, events=0x22018, maxevents=1, timeout=-1)
success
$ echo | ./a.out 1 1
epoll_wait(epfd=5, events=0x22019, maxevents=1, timeout=-1)
epoll_wait: Bad address
$ echo | ./a.out 1 2
epoll_wait(epfd=5, events=0x2201a, maxevents=1, timeout=-1)
epoll_wait: Bad address
$ echo | ./a.out 1 4
epoll_wait(epfd=5, events=0x2201c, maxevents=1, timeout=-1)
epoll_wait: Bad address

The same program succeeds for all offsets on x86.
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

/* typedef in latch.h */
struct WaitEventSet
{
  int nevents;
  int nevents_space;
  int epoll_fd;
  struct epoll_event *epoll_ret_events;
};


int main(int argc, char *argv[], char *envp[]) {

  int rc;
  int nevents = 0;
  int align = 0;
  int size;
  struct WaitEventSet *set;

  if (argc >= 3)
align = atoi(argv[2]);
  if (argc >= 2)
nevents = atoi(argv[1]);

  size = sizeof(struct WaitEventSet);
  size += sizeof(struct epoll_event) * nevents;
  size += align;
  
  set = malloc(size);
  set->nevents = nevents;
  set->nevents_space = nevents;
  set->epoll_fd = rc = epoll_create(nevents);
  if (rc < 0) {
perror("epoll_create");
exit(1);
  }
  set->epoll_ret_events = (struct epoll_event*)(((char*)&set->epoll_ret_events) + sizeof(char *) + align);
  
  struct epoll_event event;
  event.events = EPOLLERR | EPOLLHUP | EPOLLIN;
  event.data.u64 = 0;
  rc = epoll_ctl(set->epoll_fd, EPOLL_CTL_ADD, 0, &event);
  if (rc < 0) {
perror("epoll_ctl(epoll_ctl_add)");
exit(1);
  }

  printf("epoll_wait(epfd=%d, events=%p, maxevents=%d, timeout=%d)\n",
	 set->epoll_fd, set->epoll_ret_events, nevents, -1);

  rc = epoll_wait(set->epoll_fd, set->epoll_ret_events, nevents, -1);
  if (rc < 0) {
perror("epoll_wait");
exit(1);
  }
  printf("success\n");
  exit(0);
}



[HACKERS] TESTING in src/bin/pg_upgrade has incorrect documentation

2016-06-02 Thread Andreas 'ads' Scherbaum


Hi,

the TESTING file in src/bin/pg_upgrade talks about a "check.sh script", 
but this seems to be a binary (check) now.



Regards,

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project




Re: [HACKERS] Parallel pg_dump's error reporting doesn't work worth squat

2016-06-02 Thread Alvaro Herrera
Tom Lane wrote:
> Kyotaro HORIGUCHI  writes:
> > Apart from the invalid snapshot problem, I looked the patch
> > previously mentioned mainly for Windows.
> 
> Thanks for looking!
> 
> > Even though the threads started by beginthread cannot be
> > terminated cleanly from outside, the whole process will soon
> > terminate anyway, so we could use TerminateThread. This seems
> > to work. (Attached patch)
> 
> Seems reasonable to me; I was unhappy about the lack of any direct
> equivalent to the child SIGTERMs that the Unix code does.

Given this testing, it's clear that the timeout on select() is useless;
we could get rid of it in vacuumdb.c too.  I'll post a patch later.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] kqueue

2016-06-02 Thread Alvaro Herrera
Tom Lane wrote:
> Andres Freund  writes:
> >> pg_strtoi?
> 
> > I think that's what Thomas did upthread. Are you taking this one then?
> 
> I'd go with just "strtoint".  We have "strtoint64" elsewhere.

For closure of this subthread: this rename was committed by Tom as
0ab3595e5bb5.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Rename max_parallel_degree?

2016-06-02 Thread Tom Lane
Josh berkus  writes:
> On 06/02/2016 04:58 AM, Robert Haas wrote:
>> Well, I think we could drop node, if you like.  I think parallel
>> wouldn't be good to drop, though, because it sounds like we want a
>> global limit on parallel workers also, and that can't be just
>> max_workers.  So I think we should keep parallel in there for all of
>> them, and have max_parallel_workers and
>> max_parallel_workers_per_gather(_node).  The reloption and the Path
>> struct field can be parallel_workers rather than parallel_degree.

> So does that mean we'll rename it if you manage to implement a parameter
> which controls the number of workers for the whole statement?

That would fit in as something like max_parallel_workers_per_statement.

regards, tom lane




Re: [HACKERS] Rename max_parallel_degree?

2016-06-02 Thread Josh berkus
On 06/02/2016 04:58 AM, Robert Haas wrote:

> Well, I think we could drop node, if you like.  I think parallel
> wouldn't be good to drop, though, because it sounds like we want a
> global limit on parallel workers also, and that can't be just
> max_workers.  So I think we should keep parallel in there for all of
> them, and have max_parallel_workers and
> max_parallel_workers_per_gather(_node).  The reloption and the Path
> struct field can be parallel_workers rather than parallel_degree.

So does that mean we'll rename it if you manage to implement a parameter
which controls the number of workers for the whole statement?


-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] epoll_wait returning EFAULT on Linux 3.2.78

2016-06-02 Thread Tom Lane
Greg Stark  writes:
> On Thu, Jun 2, 2016 at 3:50 PM, Tom Lane  wrote:
>> I notice that CreateWaitEventSet() is being remarkably cavalier
>> about alignment requirements; maybe it ought to make some effort
>> to ensure that epoll_ret_events is maxaligned.

> That makes a certain amount of sense, as this is Debian Sparc64, which
> as I understand it has the kernel compiled in 64-bit mode but not most
> of userland. I bet if I compile Postgres in 64-bit mode it'll turn up
> as an alignment issue in userland before it even reaches the kernel.
> Or maybe not, maybe it's just that __put_user is stricter about
> alignment than really necessary.

Hm.  If you're doing a 32-bit build then we'll likely think that
4-byte alignment *is* maxaligned.  But it's not very hard to believe
that the kernel might be insisting on 8-byte alignment anyway.  That
would be a bug, but one that would easily escape detection for such
a seldom-used kernel call.

We might have to do something to force the pointer to be 8-byte-aligned
even in 32-bit builds.  Ick, but better than falling back to poll().

Can you do a test and see if 8-byte aligning the pointer makes the
EFAULT go away?

regards, tom lane




Re: [HACKERS] Reviewing freeze map code

2016-06-02 Thread Masahiko Sawada
On Sat, May 7, 2016 at 5:40 AM, Robert Haas  wrote:
> On Wed, May 4, 2016 at 8:08 PM, Andres Freund  wrote:
>> On 2016-05-02 14:48:18 -0700, Andres Freund wrote:
>>> 77a1d1e Department of second thoughts: remove PD_ALL_FROZEN.
>>
>> Nothing to say here.
>>
>>> fd31cd2 Don't vacuum all-frozen pages.
>>
>> Hm. I do wonder if it's going to bite us that we don't have a way to
>> actually force vacuuming of the whole table (besides manually rm'ing the
>> VM). I've more than once seen VACUUM used to try to do some integrity
>> checking of the database.  How are we actually going to test that the
>> feature works correctly? They'd have to write checks on top of
>> pg_visibility to see whether things are borked.
>
> Let's add VACUUM (FORCE) or something like that.
>
>> /*
>>  * Compute whether we actually scanned the whole relation. If we did, we
>>  * can adjust relfrozenxid and relminmxid.
>>  *
>>  * NB: We need to check this before truncating the relation, because that
>>  * will change ->rel_pages.
>>  */
>>
>> Comment is out-of-date now.
>
> OK.

Fixed.

>> -   if (blkno == next_not_all_visible_block)
>> +   if (blkno == next_unskippable_block)
>> {
>> -   /* Time to advance next_not_all_visible_block */
>> -   for (next_not_all_visible_block++;
>> -next_not_all_visible_block < nblocks;
>> -next_not_all_visible_block++)
>> +   /* Time to advance next_unskippable_block */
>> +   for (next_unskippable_block++;
>> +next_unskippable_block < nblocks;
>> +next_unskippable_block++)
>>
>> Hm. So we continue with the course of re-processing pages, even if
>> they're marked all-frozen. For all-visible there at least can be a
>> benefit by freezing earlier, but for all-frozen pages there's really no
>> point.  I don't really buy the arguments for the skipping logic. But
>> even disregarding that, maybe we should skip processing a block if it's
>> all-frozen (without preventing the page from being read?); as there's no
>> possible benefit?  Acquring the exclusive/content lock and stuff is far
>> from free.
>
> I wanted to tinker with this logic as little as possible in the
> interest of ending up with something that worked.  I would not have
> written it this way.
>
>> Not really related to this patch, but the FORCE_CHECK_PAGE is rather
>> ugly.
>
> +1.
>> +			/*
>> +			 * The current block is potentially skippable; if we've seen a
>> +			 * long enough run of skippable blocks to justify skipping it, and
>> +			 * we're not forced to check it, then go ahead and skip.
>> +			 * Otherwise, the page must be at least all-visible if not
>> +			 * all-frozen, so we can set all_visible_according_to_vm = true.
>> +			 */
>> +			if (skipping_blocks && !FORCE_CHECK_PAGE())
>> +			{
>> +				/*
>> +				 * Tricky, tricky.  If this is in aggressive vacuum, the page
>> +				 * must have been all-frozen at the time we checked whether it
>> +				 * was skippable, but it might not be any more.  We must be
>> +				 * careful to count it as a skipped all-frozen page in that
>> +				 * case, or else we'll think we can't update relfrozenxid and
>> +				 * relminmxid.  If it's not an aggressive vacuum, we don't
>> +				 * know whether it was all-frozen, so we have to recheck; but
>> +				 * in this case an approximate answer is OK.
>> +				 */
>> +				if (aggressive || VM_ALL_FROZEN(onerel, blkno, &vmbuffer))
>> +					vacrelstats->frozenskipped_pages++;
>> 				continue;
>> +			}
>>
>> Hm. This indeed seems a bit tricky.  Not sure how to make it easier
>> though without just ripping out the SKIP_PAGES_THRESHOLD stuff.
>
> Yep, I had the same problem.
>> Hm. This also doubles the number of VM accesses. While I guess that's
>> not noticeable most of the time, it's still not nice; especially when a
>> large relation is entirely frozen, because it'll mean we'll sequentially
>> go through the visibility map twice.
>
> Compared to what we're saving, that's obviously a trivial cost.
> That's not to say that we might not want to improve it, but it's
> hardly a disaster.
>
> In short: wah, wah, wah.
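For readers following along, the run-skipping decision under discussion can be sketched as a simplified model (this is not the actual lazy_scan_heap code; VMBits, count_skipped, and the counters are stand-ins, with SKIP_PAGES_THRESHOLD taken from vacuumlazy.c):

```c
#include <assert.h>
#include <stdbool.h>

#define SKIP_PAGES_THRESHOLD 32		/* as in vacuumlazy.c */

typedef struct
{
	bool		all_visible;
	bool		all_frozen;
} VMBits;

/*
 * Simplified model: a page may be skipped only if it begins a run of at
 * least SKIP_PAGES_THRESHOLD consecutive skippable pages (all-visible for a
 * normal vacuum, all-frozen for an aggressive one).  Returns the number of
 * pages skipped; *frozenskipped_pages counts skipped pages known frozen.
 */
static int
count_skipped(const VMBits *vm, int nblocks, bool aggressive,
			  int *frozenskipped_pages)
{
	int			skipped = 0;
	int			blkno = 0;

	*frozenskipped_pages = 0;
	while (blkno < nblocks)
	{
		int			run = 0;

		/* measure the run of skippable pages starting here */
		while (blkno + run < nblocks &&
			   (aggressive ? vm[blkno + run].all_frozen
						   : vm[blkno + run].all_visible))
			run++;

		if (run >= SKIP_PAGES_THRESHOLD)
		{
			int			i;

			for (i = 0; i < run; i++)
			{
				/* the recheck discussed above: approximate if !aggressive */
				if (aggressive || vm[blkno + i].all_frozen)
					(*frozenskipped_pages)++;
			}
			skipped += run;
		}
		blkno += run + 1;		/* the page ending the run must be scanned */
	}
	return skipped;
}
```

The point of the threshold is that skipping isolated pages defeats OS readahead; only long runs are worth skipping.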
>

Attached patch optimises skipping pages logic so

Re: [HACKERS] epoll_wait returning EFAULT on Linux 3.2.78

2016-06-02 Thread Greg Stark
On Thu, Jun 2, 2016 at 3:50 PM, Tom Lane  wrote:
> The man page for epoll_wait suggests that that implies a bad pointer
> value for the events array.  You're showing
> epoll_ret_events = 0x9c0ad4
> which is not obviously bad, but it's also only 4-byte aligned.
> I notice that CreateWaitEventSet() is being remarkably cavalier
> about alignment requirements; maybe it ought to make some effort
> to ensure that epoll_ret_events is maxaligned.


That makes a certain amount of sense, as this is Debian Sparc64, which
as I understand it has the kernel compiled in 64-bit mode but not most
of userland. I bet if I compile Postgres in 64-bit mode it'll turn up
as an alignment issue in userland before it even reaches the kernel.
Or maybe not, maybe it's just that __put_user is stricter about
alignment than really necessary.

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel pg_dump's error reporting doesn't work worth squat

2016-06-02 Thread Tom Lane
Kyotaro HORIGUCHI  writes:
> Apart from the invalid snapshot problem, I looked at the patch
> previously mentioned, mainly for Windows.

Thanks for looking!

> The threads started by beginthread cannot be terminated cleanly
> from outside, but the whole process will soon terminate anyway, so
> we could use TerminateThread. This seems to work. (Attached patch)

Seems reasonable to me; I was unhappy about the lack of any direct
equivalent to the child SIGTERMs that the Unix code does.

> Is there any reason for the name "ourAH" not to be "myAH"?

Don't much care, I'll change it.

> setup_cancel_handler looks somewhat bizarre. It eventually works
> only for the main process/thread and does nothing for workers. It
> is enough to be run once before forking in ParallelBackupStart and
> that makes handler_set unnecessary.

No, because we also want it to work in non-parallel cases.  As coded,
we'll establish the handler whenever set_archive_cancel_info is first
called, which will be in the first ConnectDatabase() call.  It would
be possible to do that someplace else maybe, but it would require
more code changes than just attaching it to set_archive_cancel_info.

> In EndDBCopyMode, the result of PQgetResult is abandoned. This
> can leak memory and such usage is not seen elsewhere in the
> source tree. Shouldn't we hold the result and PQclear it? (Mainly
> as a convention, not for benefit.)

There should never be any non-null result, so that seems like a waste
of code to me.

regards, tom lane




Re: [HACKERS] epoll_wait returning EFAULT on Linux 3.2.78

2016-06-02 Thread Tom Lane
Greg Stark  writes:
> What I'm seeing is that every call to epoll_wait() raises EFAULT.

The man page for epoll_wait suggests that that implies a bad pointer
value for the events array.  You're showing
epoll_ret_events = 0x9c0ad4
which is not obviously bad, but it's also only 4-byte aligned.
I notice that CreateWaitEventSet() is being remarkably cavalier
about alignment requirements; maybe it ought to make some effort
to ensure that epoll_ret_events is maxaligned.  Personally I'd
make all the subsidiary arrays maxaligned, just to be sure.
The existing coding is taking it on faith that the alignment
requirements of each later component of the WaitEventSet are no
worse than those of any earlier component; which even if true today
is an assumption that will bite us on the ass someday.
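A minimal illustration of the maxaligned layout being suggested (MAXALIGN here is a stand-in for the server's macro, and the Fake* structs stand in for WaitEvent and struct epoll_event; the real CreateWaitEventSet carves everything out of a single palloc'd chunk):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAXIMUM_ALIGNOF 8
#define MAXALIGN(LEN) \
	(((uintptr_t) (LEN) + (MAXIMUM_ALIGNOF - 1)) & \
	 ~((uintptr_t) (MAXIMUM_ALIGNOF - 1)))

typedef struct { int fd; int events; } FakeWaitEvent;		/* stand-in */
typedef struct { int events; uint64_t data; } FakeEpollEvent;	/* stand-in */

/*
 * Allocate one chunk and carve out each subsidiary array at a maxaligned
 * offset, so no component inherits a weaker alignment from the component
 * laid out before it.
 */
static char *
layout(int nevents, FakeWaitEvent **events, FakeEpollEvent **ret_events)
{
	size_t		sz = 0;
	size_t		ev_off;
	size_t		ret_off;
	char	   *base;

	sz += MAXALIGN(sizeof(int));	/* header stand-in */
	ev_off = sz;
	sz += MAXALIGN(sizeof(FakeWaitEvent) * nevents);
	ret_off = sz;
	sz += MAXALIGN(sizeof(FakeEpollEvent) * nevents);

	base = malloc(sz);				/* malloc is at least maxaligned */
	*events = (FakeWaitEvent *) (base + ev_off);
	*ret_events = (FakeEpollEvent *) (base + ret_off);
	return base;
}
```

Because every offset is rounded up with MAXALIGN, each array is maxaligned regardless of the sizes of the components before it.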

regards, tom lane




Re: [HACKERS] epoll_wait returning EFAULT on Linux 3.2.78

2016-06-02 Thread Andres Freund
Hi,

On 2016-06-02 15:31:15 +0100, Greg Stark wrote:
> What I'm seeing is that every call to epoll_wait() raises EFAULT. I
> don't see anything wrong with the arguments to epoll_wait so unless
> there was some earlier bogus argument to epoll_ctl or something this
> looks like some kind of kernel or glibc problem and it looks like
> we'll need to check for EFAULT and fall back to poll (a configure test
> wouldn't help since this is run-time behaviour).

I'd really like knowing what's going on before doing that. epoll was
definitely supported back then, and just falling back to poll seems too
likely to be just papering over bug or portability issue on our side.

Any chance you could provide a strace of a problematic sequence?


- Andres




Re: [HACKERS] array of domain types

2016-06-02 Thread Konstantin Knizhnik

On 02.06.2016 17:22, Tom Lane wrote:

konstantin knizhnik  writes:

Attached please find patch for DefineDomain function.

You didn't attach the patch,


Sorry, but I did attach the patch - I see the attachment in my mail 
received from the group.

Multidimensional arrays work fine:

knizhnik=# SELECT '{{14},{20}}'::teenager[][];
ERROR:  value for domain teenager violates check constraint "teenager_check"
LINE 1: SELECT '{{14},{20}}'::teenager[][];
   ^
knizhnik=# SELECT '{{14},{19}}'::teenager[][];
  teenager
-------------
 {{14},{19}}
(1 row)

knizhnik=# SELECT ('{{14},{19}}'::teenager[][])[1][1];
 teenager
----------
   14
(1 row)


Domain of array of domain also works:


knizhnik=# create domain teenager_groups as teenager[];
CREATE DOMAIN
knizhnik=# SELECT '{{14},{19}}'::teenager_groups;
 teenager_groups
-----------------
 {{14},{19}}
(1 row)

knizhnik=# SELECT '{{14},{20}}'::teenager_groups;
ERROR:  value for domain teenager violates check constraint "teenager_check"
LINE 1: SELECT '{{14},{20}}'::teenager_groups;



  but in any case, I would be astonished
if there is no work required beyond creating the matching array type.
The reverse case (domains over arrays) has half a dozen special cases
required to make it work smoothly.  Probably the considerations on this
side are totally different, but it's hard to believe there aren't any.

One case that seems likely to be pretty squishy is an array of a domain
over an array type.  One would wish to be able to do foo[2][4] to
extract an element of the contained array.  That won't work as-is
because the notation will be taken as a multi-dimensional subscript,
but I would expect that (foo[2])[4] should work.  Does it?  Does
ruleutils.c always include the necessary parens when reverse-listing
such a construct?  Is it possible to assign to such a sub-element,
and if so, do the domain constraints get checked properly?

Domain over an array that is of a domain type might be another fun
case.

regards, tom lane


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





[HACKERS] epoll_wait returning EFAULT on Linux 3.2.78

2016-06-02 Thread Greg Stark
I was just trying out a new (well, new to me...) machine here. It
happens that the version of Linux installed is 3.2.78. Apparently the
current version on this branch is 3.2.80 released a couple months ago
and I don't see any relevant changes in .79 or .80 so I think this is
actually likely to be the behaviour in a "supported" version of Linux.

What I'm seeing is that every call to epoll_wait() raises EFAULT. I
don't see anything wrong with the arguments to epoll_wait so unless
there was some earlier bogus argument to epoll_ctl or something this
looks like some kind of kernel or glibc problem and it looks like
we'll need to check for EFAULT and fall back to poll (a configure test
wouldn't help since this is run-time behaviour).

Breakpoint 1, WaitEventSetWaitBlock (set=0x9c0a88, cur_timeout=-1,
occurred_events=0xff88f4f8, nevents=1) at latch.c:975
975 int returned_events = 0;

(gdb) n
981 rc = epoll_wait(set->epoll_fd, set->epoll_ret_events,

(gdb) p *set
$9 = {nevents = 3, nevents_space = 3, events = 0x9c0aa4, latch = 0xf75a9808,
  latch_pos = 1, epoll_fd = 3, epoll_ret_events = 0x9c0ad4}

(gdb) p *set->epoll_ret_events
$10 = {events = 0, data = {ptr = 0x0, fd = 0, u32 = 0, u64 = 0}}

(gdb) p nevents
$11 = 1

(gdb) p cur_timeout
$12 = -1

(gdb) n
Breakpoint 3, WaitEventSetWaitBlock (set=0x9c0a88, cur_timeout=-1,
occurred_events=0xff88f4f8, nevents=1) at latch.c:985
985 if (rc < 0)

(gdb) p rc
$13 = -1

(gdb) p errno
$14 = 14
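For anyone wanting to reproduce this outside the server, a minimal standalone epoll_wait() call looks like the following (Linux-only; wait_one is a hypothetical helper, and the event buffer here is a stack struct with natural alignment, so on a healthy kernel this should not raise EFAULT):

```c
#include <assert.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Wait for readiness of a pipe read end; returns epoll_wait's result. */
static int
wait_one(void)
{
	int			pipefd[2];
	struct epoll_event ev;
	struct epoll_event ret;
	int			rc;
	int			epfd = epoll_create1(0);

	assert(epfd >= 0);
	assert(pipe(pipefd) == 0);

	memset(&ev, 0, sizeof(ev));
	ev.events = EPOLLIN;
	ev.data.fd = pipefd[0];
	assert(epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) == 0);

	/* make the read end ready so epoll_wait returns immediately */
	assert(write(pipefd[1], "x", 1) == 1);

	rc = epoll_wait(epfd, &ret, 1, 1000);

	close(pipefd[0]);
	close(pipefd[1]);
	close(epfd);
	return rc;
}
```

If this succeeds on the affected box while the server's call fails, that would point at the alignment of epoll_ret_events rather than at the kernel.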


-- 
greg




Re: [HACKERS] array of domain types

2016-06-02 Thread Tom Lane
konstantin knizhnik  writes:
> Attached please find patch for DefineDomain function.

You didn't attach the patch, but in any case, I would be astonished
if there is no work required beyond creating the matching array type.
The reverse case (domains over arrays) has half a dozen special cases
required to make it work smoothly.  Probably the considerations on this
side are totally different, but it's hard to believe there aren't any.

One case that seems likely to be pretty squishy is an array of a domain
over an array type.  One would wish to be able to do foo[2][4] to
extract an element of the contained array.  That won't work as-is
because the notation will be taken as a multi-dimensional subscript,
but I would expect that (foo[2])[4] should work.  Does it?  Does
ruleutils.c always include the necessary parens when reverse-listing
such a construct?  Is it possible to assign to such a sub-element,
and if so, do the domain constraints get checked properly?

Domain over an array that is of a domain type might be another fun
case.

regards, tom lane




Re: [HACKERS] pg9.6 segfault using simple query (related to use fk for join estimates)

2016-06-02 Thread Robert Haas
On Wed, Jun 1, 2016 at 9:29 PM, Noah Misch  wrote:
> This PostgreSQL 9.6 open item is past due for your status update.  Kindly send
> a status update within 24 hours, and include a date for your subsequent status
> update.  Refer to the policy on open item ownership:
> http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com

FYI, I spoke to Tom Lane about this at PGCon and suggested that he
look at the proposed patch as I requested in
https://www.postgresql.org/message-id/CA+TgmobPqrAVXOBMHTcpDq8hX7gCzcVhoUvC8s9V=d09+bt...@mail.gmail.com
and see whether that would address his concerns, and he said that he
would do that.  It may, however, have slipped his mind.

My opinion is that something needs to be done about this patch.  It
needs to be improved or reverted.  Improved would be ideal in my mind,
but reverted is an outcome I can live with.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Rename max_parallel_degree?

2016-06-02 Thread David G. Johnston
On Thu, Jun 2, 2016 at 9:59 AM, Tom Lane  wrote:

> Robert Haas  writes:
> > Well, I think we could drop node, if you like.  I think parallel
> > wouldn't be good to drop, though, because it sounds like we want a
> > global limit on parallel workers also, and that can't be just
> > max_workers.  So I think we should keep parallel in there for all of
> > them, and have max_parallel_workers and
> > max_parallel_workers_per_gather(_node).  The reloption and the Path
> > struct field can be parallel_workers rather than parallel_degree.
>
> WFM.
>

​+1​


Re: [HACKERS] Rename max_parallel_degree?

2016-06-02 Thread Tom Lane
Robert Haas  writes:
> Well, I think we could drop node, if you like.  I think parallel
> wouldn't be good to drop, though, because it sounds like we want a
> global limit on parallel workers also, and that can't be just
> max_workers.  So I think we should keep parallel in there for all of
> them, and have max_parallel_workers and
> max_parallel_workers_per_gather(_node).  The reloption and the Path
> struct field can be parallel_workers rather than parallel_degree.

WFM.

regards, tom lane




Re: [HACKERS] Rename max_parallel_degree?

2016-06-02 Thread Robert Haas
On Wed, Jun 1, 2016 at 5:29 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> I've largely given up hope of coming up with an alternative that can
>> attract more than one vote and that is also at least mildly accurate,
>> but one idea is max_parallel_workers_per_gather_node.  That will be
>> totally clear.
>
> Given the reference to Gather nodes, couldn't you drop the word
> "parallel"?  "node" might not be necessary either.

Well, I think we could drop node, if you like.  I think parallel
wouldn't be good to drop, though, because it sounds like we want a
global limit on parallel workers also, and that can't be just
max_workers.  So I think we should keep parallel in there for all of
them, and have max_parallel_workers and
max_parallel_workers_per_gather(_node).  The reloption and the Path
struct field can be parallel_workers rather than parallel_degree.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] array of domain types

2016-06-02 Thread konstantin knizhnik
On Jun 2, 2016, at 12:29 PM, Thom Brown wrote:

> On 2 June 2016 at 10:13, konstantin knizhnik  wrote:
>
> Yes, it doesn't work:
>
> # CREATE DOMAIN teenager AS int CHECK (VALUE BETWEEN 13 AND 19);
> CREATE DOMAIN
>
> # SELECT 14::teenager;
>  teenager
> ----------
>    14
> (1 row)
>
> # SELECT 20::teenager;
> ERROR:  value for domain teenager violates check constraint "teenager_check"
>
> # SELECT '{14,20}'::teenager[];
>  teenager
> ----------
>  {14,20}
> (1 row)
>
> That last one should fail.

Yes, I see.  This approach was wrong.
Attached please find patch for DefineDomain function.

domain.patch
Description: Binary data
Thom



Re: [HACKERS] array of domain types

2016-06-02 Thread Thom Brown
On 2 June 2016 at 10:13, konstantin knizhnik 
wrote:

>
> On Jun 1, 2016, at 4:37 PM, Thom Brown wrote:
>
> On 1 June 2016 at 14:20, Konstantin Knizhnik 
> wrote:
>
>> I wonder why domain types can not be used for specification of array
>> element:
>>
>> create domain objref as bigint;
>> create table foo(x objref[]);
>> ERROR:  type "objref[]" does not exist
>> create table foo(x bigint[]);
>> CREATE TABLE
>>
>> Is there some principle problem here or it is just not implemented?
>
>
> It's not implemented, but patches welcome.
>
> Thom
>
>
> The patch is trivial:  just use typbasetype in get_array_type if typtype
> is TYPTYPE_DOMAIN:
>
> diff --git a/src/backend/utils/cache/lsyscache.c
> b/src/backend/utils/cache/lsyscache.c
> index cb26d79..ecfbb20 100644
> --- a/src/backend/utils/cache/lsyscache.c
> +++ b/src/backend/utils/cache/lsyscache.c
> @@ -2486,7 +2486,18 @@ get_array_type(Oid typid)
> if (HeapTupleIsValid(tp))
> {
> result = ((Form_pg_type) GETSTRUCT(tp))->typarray;
> -   ReleaseSysCache(tp);
> +   if (result == InvalidOid && ((Form_pg_type)
> GETSTRUCT(tp))->typtype == TYPTYPE_DOMAIN) {
> +   typid = ((Form_pg_type)
> GETSTRUCT(tp))->typbasetype;
> +   ReleaseSysCache(tp);
> +   tp = SearchSysCache1(TYPEOID,
> ObjectIdGetDatum(typid));
> +   if (HeapTupleIsValid(tp))
> +   {
> +   result = ((Form_pg_type)
> GETSTRUCT(tp))->typarray;
> +   ReleaseSysCache(tp);
> +   }
> +   } else {
> +   ReleaseSysCache(tp);
> +   }
> }
> return result;
>  }
>
>
> Any problems with it?
>
>
Yes, it doesn't work:

# CREATE DOMAIN teenager AS int CHECK (VALUE BETWEEN 13 AND 19);
CREATE DOMAIN

# SELECT 14::teenager;
 teenager
----------
   14
(1 row)

# SELECT 20::teenager;
ERROR:  value for domain teenager violates check constraint "teenager_check"

# SELECT '{14,20}'::teenager[];
 teenager
----------
 {14,20}
(1 row)

That last one should fail.

Thom


Re: [HACKERS] array of domain types

2016-06-02 Thread konstantin knizhnik

On Jun 1, 2016, at 4:37 PM, Thom Brown wrote:

> On 1 June 2016 at 14:20, Konstantin Knizhnik  
> wrote:
> I wonder why domain types can not be used for specification of array element:
> 
> create domain objref as bigint;
> create table foo(x objref[]);
> ERROR:  type "objref[]" does not exist
> create table foo(x bigint[]);
> CREATE TABLE
> 
> Is there some principle problem here or it is just not implemented?
> 
> It's not implemented, but patches welcome.
>  
> Thom

The patch is trivial:  just use typbasetype in get_array_type if typtype is 
TYPTYPE_DOMAIN:

diff --git a/src/backend/utils/cache/lsyscache.c 
b/src/backend/utils/cache/lsyscache.c
index cb26d79..ecfbb20 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -2486,7 +2486,18 @@ get_array_type(Oid typid)
if (HeapTupleIsValid(tp))
{
result = ((Form_pg_type) GETSTRUCT(tp))->typarray;
-   ReleaseSysCache(tp);
+   if (result == InvalidOid && ((Form_pg_type) 
GETSTRUCT(tp))->typtype == TYPTYPE_DOMAIN) {
+   typid = ((Form_pg_type) GETSTRUCT(tp))->typbasetype;
+   ReleaseSysCache(tp);
+   tp = SearchSysCache1(TYPEOID, ObjectIdGetDatum(typid));
+   if (HeapTupleIsValid(tp))
+   {
+   result = ((Form_pg_type) 
GETSTRUCT(tp))->typarray;
+   ReleaseSysCache(tp);
+   }
+   } else { 
+   ReleaseSysCache(tp);
+   }
}
return result;
 }


Any problems with it?



[HACKERS] An extra error for client disconnection on Windows

2016-06-02 Thread Kyotaro HORIGUCHI
Hello.

When a client process terminates without calling PQfinish(), the
server emits the following log message, which is not seen on Linux boxes.

> LOG:  could not receive data from client: An existing connection was forcibly 
> closed by the remote host.

This is because pgwin32_recv returns an ECONNRESET error in this
situation instead of returning a non-error EOF as recv(2) does.

This patch translates WSAECONNRESET from WSARecv to an EOF so that
pgwin32_recv behaves the same way as on Linux.

The attached patch does this.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 7106c56c6606af25ce65b0f5ece8dae095e7c756 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Thu, 2 Jun 2016 09:53:56 +0900
Subject: [PATCH] Avoid unnecessary error message on Windows

When a client process terminates on Windows, the server emits an error
message that is not seen on Linux boxes. This is caused by a difference
in how recv() handles the situation. This patch translates
WSAECONNRESET from WSARecv to just an EOF so that pgwin32_recv()
behaves the same as recv() on Linux.
---
 src/backend/port/win32/socket.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/src/backend/port/win32/socket.c b/src/backend/port/win32/socket.c
index 5d8fb7f..9d4ac6d 100644
--- a/src/backend/port/win32/socket.c
+++ b/src/backend/port/win32/socket.c
@@ -372,6 +372,7 @@ pgwin32_recv(SOCKET s, char *buf, int len, int f)
 	DWORD		b;
 	DWORD		flags = f;
 	int			n;
+	int			lasterror;
 
 	if (pgwin32_poll_signals())
 		return -1;
@@ -383,7 +384,13 @@ pgwin32_recv(SOCKET s, char *buf, int len, int f)
 	if (r != SOCKET_ERROR)
 		return b;/* success */
 
-	if (WSAGetLastError() != WSAEWOULDBLOCK)
+	lasterror = WSAGetLastError();
+
+	/* Unix's recv treats a connection reset by peer as just an EOF */
+	if (lasterror == WSAECONNRESET)
+		return 0;
+
+	if (lasterror != WSAEWOULDBLOCK)
 	{
 		TranslateSocketError();
 		return -1;
@@ -410,7 +417,14 @@ pgwin32_recv(SOCKET s, char *buf, int len, int f)
 		r = WSARecv(s, &wbuf, 1, &b, &flags, NULL, NULL);
 		if (r != SOCKET_ERROR)
 			return b;			/* success */
-		if (WSAGetLastError() != WSAEWOULDBLOCK)
+
+		lasterror = WSAGetLastError();
+
+		/* Unix's recv treats a connection reset by peer as just an EOF */
+		if (lasterror == WSAECONNRESET)
+			return 0;
+
+		if (lasterror != WSAEWOULDBLOCK)
 		{
 			TranslateSocketError();
 			return -1;
-- 
1.8.3.1




Re: [HACKERS] Parallel pg_dump's error reporting doesn't work worth squat

2016-06-02 Thread Kyotaro HORIGUCHI
Apart from the invalid snapshot problem, I looked at the patch
previously mentioned, mainly for Windows.

At Tue, 31 May 2016 12:29:50 -0400, Tom Lane  wrote in 
<7445.1464712...@sss.pgh.pa.us>
> In the patch I posted yesterday, I reversed the order of those two
> steps, which should fix this problem in most scenarios:
> https://www.postgresql.org/message-id/7005.1464657...@sss.pgh.pa.us

After entering Ctrl-C while parallel pg_dump is running, I saw the
following error message repeated on my console; it is not seen on
Linux thanks to the forcible termination of worker processes in
sigTermHandler().

> pg_dump: Dumping the contents of table "t" failed: PQgetResult() failed.
> pg_dump: Error message from server: ERROR:  canceling statement due to user 
> request
> pg_dump: The command was: COPY public.t (a) TO stdout;

We could provide a global flag (wantAbort?) that signals the
canceling of queries, but that would require adding checks for it
everywhere.

The threads started by beginthread cannot be terminated cleanly
from outside, but the whole process will soon terminate anyway, so
we could use TerminateThread. This seems to work. (Attached patch)
We might be allowed to do exit() in consoleHandler().


Other questions follow.

Is there any reason for the name "ourAH" not to be "myAH"?

setup_cancel_handler looks somewhat bizarre. It eventually works
only for the main process/thread and does nothing for workers. It
is enough to be run once before forking in ParallelBackupStart and
that makes handler_set unnecessary.

In EndDBCopyMode, the result of PQgetResult is abandoned. This
can leak memory and such usage is not seen elsewhere in the
source tree. Shouldn't we hold the result and PQclear it? (Mainly
as a convention, not for benefit.)

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/bin/pg_dump/parallel.c b/src/bin/pg_dump/parallel.c
index cf97d9c..fb48a02 100644
--- a/src/bin/pg_dump/parallel.c
+++ b/src/bin/pg_dump/parallel.c
@@ -618,6 +618,12 @@ consoleHandler(DWORD dwCtrlType)
 			{
 ArchiveHandle *AH = signal_info.pstate->parallelSlot[i].args->AH;
 
+/*
+ * Using TerminateThread here may leave some resources leaked
+ * but this whole process will soon die.
+ */
+TerminateThread(signal_info.pstate->parallelSlot[i].hThread, 0);
+
 if (AH != NULL && AH->connCancel != NULL)
 	(void) PQcancel(AH->connCancel, errbuf, sizeof(errbuf));
 			}
@@ -632,6 +638,8 @@ consoleHandler(DWORD dwCtrlType)
 			errbuf, sizeof(errbuf));
 
 		LeaveCriticalSection(&signal_info_lock);
+
+		write_msg(NULL, "terminated by user\n");
 	}
 
 	/* Always return FALSE to allow signal handling to continue */



Re: [HACKERS] COMMENT ON, psql and access methods

2016-06-02 Thread Michael Paquier
On Thu, Jun 2, 2016 at 3:42 PM, Michael Paquier
 wrote:
> On Thu, Jun 2, 2016 at 1:00 PM, Michael Paquier
>  wrote:
>> I have added an open item for 9.6 regarding this patch, that would be
>> good to complete this work in this release for consistency with the
>> other objects.
>
> Doh. I forgot to update psql --help.

And Query_for_list_of_access_methods was defined more or less twice,
the one of my patch having an error...
-- 
Michael
From 220cb52ecde2943ce909a3a418c3b6e1e5171863 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 2 Jun 2016 15:59:31 +0900
Subject: [PATCH] Add support for COMMENT ON and psql meta-command for access 
 methods

473b932 has visibly forgotten that access methods, being database objects,
could have a description associated with them. A psql-level meta-command
that lists all the access methods available on the server was missing
as well.

I have also noticed that psql's tab completion could be more verbose
regarding access methods; that is fixed at the same time.
---
 contrib/bloom/bloom--1.0.sql   |  1 +
 doc/src/sgml/ref/comment.sgml  |  1 +
 doc/src/sgml/ref/psql-ref.sgml | 13 +
 src/backend/parser/gram.y  |  5 ++--
 src/bin/psql/command.c |  3 +++
 src/bin/psql/describe.c| 61 ++
 src/bin/psql/describe.h|  3 +++
 src/bin/psql/help.c|  1 +
 src/bin/psql/tab-complete.c| 25 -
 9 files changed, 104 insertions(+), 9 deletions(-)

diff --git a/contrib/bloom/bloom--1.0.sql b/contrib/bloom/bloom--1.0.sql
index 7fa7513..87b5442 100644
--- a/contrib/bloom/bloom--1.0.sql
+++ b/contrib/bloom/bloom--1.0.sql
@@ -5,6 +5,7 @@ LANGUAGE C;
 
 -- Access method
 CREATE ACCESS METHOD bloom TYPE INDEX HANDLER blhandler;
+COMMENT ON ACCESS METHOD bloom IS 'bloom index access method';
 
 -- Opclasses
 
diff --git a/doc/src/sgml/ref/comment.sgml b/doc/src/sgml/ref/comment.sgml
index 3321d4b..9582de8 100644
--- a/doc/src/sgml/ref/comment.sgml
+++ b/doc/src/sgml/ref/comment.sgml
@@ -23,6 +23,7 @@ PostgreSQL documentation
 
 COMMENT ON
 {
+  ACCESS METHOD object_name |
   AGGREGATE aggregate_name ( aggregate_signature ) |
   CAST (source_type AS target_type) |
   COLLATION object_name |
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index df79a37..4079ac6 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1130,6 +1130,19 @@ testdb=>
 
   
 
+  
+\dA[+] [ pattern ]
+
+
+
+Lists access methods. If pattern is specified, only access
+methods whose names match the pattern are shown. If
++ is appended to the command name, each access
+method is listed with its associated handler function and description.
+
+
+  
 
   
 \db[+] [ pattern ]
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 20384db..5d646e7 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -5693,8 +5693,8 @@ opt_restart_seqs:
  *	The COMMENT ON statement can take different forms based upon the type of
  *	the object associated with the comment. The form of the statement is:
  *
- *	COMMENT ON [ [ CONVERSION | COLLATION | DATABASE | DOMAIN |
- * EXTENSION | EVENT TRIGGER | FOREIGN DATA WRAPPER |
+ *	COMMENT ON [ [ ACCESS METHOD | CONVERSION | COLLATION | DATABASE |
+ * DOMAIN | EXTENSION | EVENT TRIGGER | FOREIGN DATA WRAPPER |
  * FOREIGN TABLE | INDEX | [PROCEDURAL] LANGUAGE |
  * MATERIALIZED VIEW | POLICY | ROLE | SCHEMA | SEQUENCE |
  * SERVER | TABLE | TABLESPACE |
@@ -5896,6 +5896,7 @@ comment_type:
 			| TABLE{ $$ = OBJECT_TABLE; }
 			| VIEW{ $$ = OBJECT_VIEW; }
 			| MATERIALIZED VIEW	{ $$ = OBJECT_MATVIEW; }
+			| ACCESS METHOD		{ $$ = OBJECT_ACCESS_METHOD; }
 			| COLLATION			{ $$ = OBJECT_COLLATION; }
 			| CONVERSION_P		{ $$ = OBJECT_CONVERSION; }
 			| TABLESPACE		{ $$ = OBJECT_TABLESPACE; }
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 693b531..543fe5f 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -402,6 +402,9 @@ exec_command(const char *cmd,
 	/* standard listing of interesting things */
 	success = listTables("tvmsE", NULL, show_verbose, show_system);
 break;
+			case 'A':
+success = describeAccessMethods(pattern, show_verbose);
+break;
 			case 'a':
 success = describeAggregates(pattern, show_verbose, show_system);
 break;
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 0a771bd..4e9e6ac 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -129,6 +129,67 @@ describeAggregates(const char *pattern, bool verbose, bool showSystem)
 	return true;
 }
 
+/* \dA
+ * Takes an optional regexp to select particular access methods
+ */
+bool
+describeAccessM