[HACKERS] max_connections and standby server
Today I encountered an interesting situation.

1) A streaming replication primary server and a standby server are running. At this point max_connections = 100 on both servers.
2) Shut down both servers.
3) Change max_connections to 1100 on both servers and restart them.
4) The primary server happily started, but the standby would not start for lack of resources.
5) Shut down both servers.
6) Restore max_connections to 100 on both servers and restart them.
7) The primary server happily started, but the standby would not start, for the reason below.

32695 2015-08-11 13:46:22 JST FATAL: hot standby is not possible because max_connections = 100 is a lower setting than on the master server (its value was 1100)
32695 2015-08-11 13:46:22 JST CONTEXT: xlog redo parameter change: max_connections=1100 max_worker_processes=8 max_prepared_xacts=10 max_locks_per_xact=64 wal_level=hot_standby wal_log_hints=off
32693 2015-08-11 13:46:22 JST LOG: startup process (PID 32695) exited with exit code 1
32693 2015-08-11 13:46:22 JST LOG: terminating any other active server processes

I think this is because pg_control on the standby remembers that the previous primary server's max_connections was 1100, even though the standby server failed to start. Shouldn't we update the pg_control file only when the standby starts successfully?

Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] linux sparc compile issue
Hi Tom, Tom Lane wrote: Waldemar Brodkorb w...@openadk.org writes: while doing regular builds via buildroot autobuilders, a compile problem for sparc 32-bit v8 was found. It seems the defines for Linux are different from those for Solaris. The following patch fixes it for buildroot. The gcc predefines for Linux are __sparc_v8__/__sparc_v7__.

I've applied your suggested patch for this, but I'm a bit curious what version of gcc you are using; our code's been like that for a very long time and nobody complained before.

Thanks. The cross-compiler we use is gcc 4.9.3. But the native gcc on my Sun Voyager running Debian 4.0 has it, too:

platin:~# gcc -dM -E - < /dev/null | grep sparc
#define sparc 1
#define __sparc__ 1
#define __sparc 1
#define __sparc_v8__ 1
platin:~# gcc -v
Using built-in specs.
Target: sparc-linux-gnu
Configured with: ../src/configure -v --enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --program-suffix=-4.1 --enable-__cxa_atexit --enable-clocale=gnu --enable-libstdcxx-debug --enable-mpfr --with-cpu=v8 --enable-checking=release sparc-linux-gnu
Thread model: posix
gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)
platin:~# cat /etc/debian_version
4.0

The last supported Debian is delivering Postgresql 7.5.22. I think this version did not contain the code:

platin:~/postgresql-7.5.22# find . -name \*lock.h
platin:~/postgresql-7.5.22# grep -r sparc *

So maybe buildroot is one of the few projects supporting sparcv8 for 32-bit sparc machines. best regards Waldemar
Re: [HACKERS] max_connections and standby server
Tatsuo Ishii is...@postgresql.org writes: I think this is because pg_control on the standby remembers that the previous primary server's max_connections = 1100 even if the standby server fails to start. Shouldn't we update pg_control file only when standby succeeds to start?

Somebody refresh my memory as to why we have this restriction (that is, slave's max_connections >= master's max_connections) in the first place? It seems like it should not be a necessary requirement, and working towards getting rid of it would be far better than any other answer. regards, tom lane
Re: [HACKERS] max_connections and standby server
Somebody refresh my memory as to why we have this restriction (that is, slave's max_connections >= master's max_connections) in the first place? Seems like it should not be a necessary requirement, and working towards getting rid of it would be far better than any other answer.

If you care about max_connections, you might want to care about the settings below as well (from xlog.c):

RecoveryRequiresIntParameter("max_worker_processes", max_worker_processes, ControlFile->max_worker_processes);
RecoveryRequiresIntParameter("max_prepared_transactions", max_prepared_xacts, ControlFile->max_prepared_xacts);
RecoveryRequiresIntParameter("max_locks_per_transaction",

Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.
On Tue, Aug 11, 2015 at 1:50 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Masahiko Sawada wrote: This topic may have been already discussed but, why don't we use just total scanned pages and total pages?

Because those numbers don't extrapolate nicely. If the density of dead tuples is irregular across the table, such absolute numbers might be completely meaningless: you could scan 90% of the table without seeing any index scan, and then at the final 10% be hit by many index scans cleaning dead tuples. Thus you would see progress go up to 90% very quickly and then take hours to go to 91%. (Additionally, and a comparatively minor point: since you don't know how many index scans are going to happen, there's no way to know the total number of blocks scanned, unless you don't count index blocks at all, and then the numbers become a lie.) If you instead track the number of heap pages separately from index pages, and indicate how many index scans have taken place, you have a chance of actually figuring out how many heap pages are left to scan and how many more index scans will occur.

Thank you for your explanation! I understand this now. VACUUM has 3 phases now, but since phases 2 and 3 repeat, you can have an unbounded number of phases. But that assumes that we don't count truncation as a 4th phase of VACUUM... In the case of vacuum, I think we need to track at least the number of scanned heap pages; the information about index scans is additional information. Another idea for displaying progress is to have two kinds of information: essential information and additional information. Essential information has one numeric datum, which essentially stores information about the command's processing. Additional information has two data: text and numeric. These are free-style data which each backend stores as it likes. All three are output at the same time.
For example, in the case of vacuum, the essential information is the total number of scanned heap pages.
* When lazy_scan_heap starts, the two additional data are NULL.
* When lazy_vacuum_index starts, the backend sets the additional data as follows: "Index vacuuming" into the text datum, describing what we are actually doing now, and e.g. 50 into the numeric datum, describing how many index pages we have scanned.
* When lazy_vacuum_index is done, the backend sets the additional data to NULL again.

Regards, -- Masahiko Sawada
Re: [HACKERS] max_connections and standby server
On Tue, Aug 11, 2015 at 2:42 PM, Tom Lane t...@sss.pgh.pa.us wrote: Tatsuo Ishii is...@postgresql.org writes: I think this is because pg_control on the standby remembers that the previous primary server's max_connections = 1100 even if the standby server fails to start. Shouldn't we update pg_control file only when standby succeeds to start?

Somebody refresh my memory as to why we have this restriction (that is, slave's max_connections >= master's max_connections) in the first place? Seems like it should not be a necessary requirement, and working towards getting rid of it would be far better than any other answer.

If I recall correctly, that's because KnownAssignedXids and the lock table need to be large enough on the standby for the largest snapshot possible (procarray.c). -- Michael
Re: [HACKERS] PL/pgSQL, RAISE and error context
2015-08-10 9:11 GMT+02:00 Heikki Linnakangas hlinn...@iki.fi: On 07/26/2015 08:34 AM, Pavel Stehule wrote: Hi, here is the complete patch, which introduces context filtering on the client side. The core of this patch is trivial and small; almost all of its size is trivial changes in regression tests, removing useless context. Documentation, check-world.

Looks good to me at first glance. I'll mark this as Ready for Committer.

Is this acceptable to everyone? I have no problem with this approach. Regards Pavel - Heikki
Re: [HACKERS] WIP: Rework access method interface
Alexander Korotkov a.korot...@postgrespro.ru writes: On Mon, Aug 10, 2015 at 1:12 PM, Petr Jelinek p...@2ndquadrant.com wrote: I don't understand this, there is already AmRoutine in RelationData, why the need for additional field for just amsupport? We need amsupport in load_relcache_init_file() which reads pg_internal.init. I'm not sure this is correct place to call am_handler. It should work in the case of built-in AM. But if AM is defined in the extension then we wouldn't be able to do catalog lookup for am_handler on this stage of initialization. This is an issue we'll have to face before there's much hope of having index AMs as extensions: how would you locate any extension function without catalog access? Storing raw function pointers in pg_internal.init is not an answer in an ASLR world. I think we can dodge the issue so far as pg_internal.init is concerned by decreeing that system catalogs can only have indexes with built-in AMs. Calling a built-in function doesn't require catalog access, so there should be no problem with re-calling the handler function by OID during load_relcache_init_file(). We could also have problems with WAL replay, though I think the consensus there is that extension AMs have to use generic WAL records that don't require any index-specific replay code. regards, tom lane
Re: [HACKERS] WIP: Rework access method interface
On Mon, Aug 10, 2015 at 6:36 PM, Petr Jelinek p...@2ndquadrant.com wrote: On 2015-08-10 16:58, Alexander Korotkov wrote: That should work, thanks! Also we can have SQL-visible functions to get amsupport and amstrategies and use them in the regression tests.

SQL-visible functions would be preferable to storing it in pg_am, as keeping the params in pg_am would limit the extensibility of pg_am itself.

I actually meant just two particular functions (not one per AM), both of which take the AM OID as an argument. They should call amhandler and return amsupport and amstrategies respectively. These functions can be used in regression tests to check the correctness of opclasses and opfamilies. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP: Rework access method interface
Tom Lane wrote: There are a couple of other pg_am columns, such as amstorage and amcanorderbyop, which similarly bear on what's legal to appear in related catalogs such as pg_opclass. I'd be sort of inclined to leave those in the catalog as well. I do not see that exposing a SQL function is better than exposing a catalog column; either way, that property is SQL-visible.

If we do that, it doesn't seem reasonable to use the same catalog for other things such as sequence AMs, right? IMO it'd be better to keep the catalog agnostic about exactly what each row is going to be an AM for. -- Álvaro Herrera    http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] cache invalidation skip logic
On Sun, Aug 9, 2015 at 8:24 AM, Robert Haas robertmh...@gmail.com wrote: In step 1, AcceptInvalidationMessages() should process all pending invalidation messages. So if step 2 did AcceptInvalidationMessages() again it would be a no-op, because no messages should remain at that point.

That's what I thought at first. I will try to see if I can manually repro a case. Thanks, Qingqing
Re: [HACKERS] WIP: Rework access method interface
On 2015-08-10 18:16, Alvaro Herrera wrote: Tom Lane wrote: There are a couple of other pg_am columns, such as amstorage and amcanorderbyop, which similarly bear on what's legal to appear in related catalogs such as pg_opclass. I'd be sort of inclined to leave those in the catalog as well. I do not see that exposing a SQL function is better than exposing a catalog column; either way, that property is SQL-visible. If we do that, it doesn't seem reasonable to use the same catalog for other things such as sequence AM, right? IMO it'd be better to keep the catalog agnostic for exactly what each row is going to be an AM for.

Yeah, I said the same; the question is whether we should have pg_am agnostic, or just assume that it's an index AM and let other AM types create separate catalogs. Tom seems to prefer the latter. I don't see a problem with that, except that I would really hate to add more AM-related columns to pg_class. I would not mind having relam point to a different AM catalog for different relkinds, but I don't know if that's OK for others (it's not really a normalized design). We could also keep pg_am agnostic and add pg_index_am for additional info about index AMs, but that would make this patch more invasive. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] WIP: Rework access method interface
On Mon, Aug 10, 2015 at 5:48 PM, Tom Lane t...@sss.pgh.pa.us wrote: Alexander Korotkov a.korot...@postgrespro.ru writes: On Mon, Aug 10, 2015 at 1:12 PM, Petr Jelinek p...@2ndquadrant.com wrote: I don't understand this, there is already AmRoutine in RelationData, why the need for additional field for just amsupport? We need amsupport in load_relcache_init_file() which reads pg_internal.init. I'm not sure this is correct place to call am_handler. It should work in the case of built-in AM. But if AM is defined in the extension then we wouldn't be able to do catalog lookup for am_handler on this stage of initialization. This is an issue we'll have to face before there's much hope of having index AMs as extensions: how would you locate any extension function without catalog access? Storing raw function pointers in pg_internal.init is not an answer in an ASLR world. I think we can dodge the issue so far as pg_internal.init is concerned by decreeing that system catalogs can only have indexes with built-in AMs. Calling a built-in function doesn't require catalog access, so there should be no problem with re-calling the handler function by OID during load_relcache_init_file().

That should work, thanks! Also we can have SQL-visible functions to get amsupport and amstrategies and use them in the regression tests.

We could also have problems with WAL replay, though I think the consensus there is that extension AMs have to use generic WAL records that don't require any index-specific replay code.

Yes, I've already shown one version of generic WAL records. The main objection to them was that it's hard to ensure that a composed WAL record is correct. Now I'm working on a new version which will be much easier and safer to use. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Moving SS_finalize_plan processing to the end of planning
David Rowley david.row...@2ndquadrant.com writes: On 10 August 2015 at 07:50, Tom Lane t...@sss.pgh.pa.us wrote: I've started to work on path-ification of the upper planner (finally).

I was digging around grouping_planner() last week with the intention of making some changes there to allow GROUP BY before join, but in the end decided to stay well clear of this area until this path-ification is done. So far I've managed to keep my changes away from the upper planner, and I've added GroupingPath types. From what I can predict of the changes you'll be making, I think you'll also need to have grouping_planner() return a few variations of GroupingPath to allow the path list to be passed up to subquery_planner() and on up to functions like recurse_set_operations(), so that they have the option of choosing GroupAggregate / MergeAppend to implement UNION. If I'm right on this, then maybe there are a few things you can copy and paste from the patch I posted here: http://www.postgresql.org/message-id/CAKJS1f-sEcm=gtfs-dqjsocsz-vlhrp_hsrntjjq-s7egn8...@mail.gmail.com specifically around create_grouping_path()?

Yeah, I saw your patch, but have not yet had time to think about what parts of it I could borrow. regards, tom lane
Re: [HACKERS] WIP: Rework access method interface
On 2015-08-10 17:47, Tom Lane wrote: Petr Jelinek p...@2ndquadrant.com writes: On 2015-08-10 16:58, Alexander Korotkov wrote: That should work, thanks! Also we can have SQL-visible functions to get amsupport and amstrategies and use them in the regression tests. SQL-visible functions would be preferable to storing it in pg_am as keeping the params in pg_am would limit the extensibility of pg_am itself. I don't see any particularly good reason to remove amsupport and amstrategies from pg_am. Those are closely tied to the other catalog infrastructure for indexes (pg_amproc, pg_amop) which I don't think are candidates for getting changed by this patch. Ok, in that case it seems unlikely that we'll be able to use pg_am for any other access methods besides indexes in the future. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] WIP: Rework access method interface
On 2015-08-10 18:08, Tom Lane wrote: Alexander Korotkov a.korot...@postgrespro.ru writes: On Mon, Aug 10, 2015 at 6:47 PM, Tom Lane t...@sss.pgh.pa.us wrote: There are a couple of other pg_am columns, such as amstorage and amcanorderbyop, which similarly bear on what's legal to appear in related catalogs such as pg_opclass. I'd be sort of inclined to leave those in the catalog as well. I do not see that exposing a SQL function is better than exposing a catalog column; either way, that property is SQL-visible. That answers my question, thanks!

BTW, just to clarify: we can still have the desirable property that CREATE INDEX ACCESS METHOD needs no parameters other than the AM handler function name. The AM code would still expose all of its properties through the struct returned by the handler. What is at issue here is how information in that struct that needs to be available to SQL code gets exposed. We can do that either by exposing SQL functions to get it, or by having CREATE INDEX ACCESS METHOD call the handler and then copy some fields into the new pg_am row. I'm suggesting that we should do the latter for any fields that determine validity of pg_opclass, pg_amop, etc entries associated with the AM type. The AM could not reasonably change its mind about such properties (within a major release at least) so there is no harm in making a long-lived copy in pg_am. And we might as well not break SQL code unnecessarily by removing fields that used to be there.

That's definitely the case for built-in AMs, but 3rd-party ones won't necessarily follow the PG release schedule/versioning, and I can imagine, for example, amcanorderbyop changing between AM releases as the AM matures. This could probably be solved by some kind of invalidation mechanism (even if it's some additional SQL). However, I am not sure that using a catalog as a sort of cache for function output is a good idea in general.
IMHO it would be better to just have those options as part of CREATE and ALTER DDL for INDEX ACCESS METHODS if we store them in pg_am. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Asynchronous execution on FDW
On Mon, Aug 10, 2015 at 3:23 AM, Heikki Linnakangas hlinn...@iki.fi wrote: I've marked this as rejected in the commitfest, because others are working on a more general solution with parallel workers. That's still work-in-progress, and it's not certain if it's going to make it into 9.6, but if it does it will largely render this obsolete. We can revisit this patch later in the release cycle, if the parallel scan patch hasn't solved the same use case by then. I think the really important issue for this patch is the one discussed here: http://www.postgresql.org/message-id/ca+tgmoaijk1svzw_gkfu+zssxcijkfelqu2aomvuphpsfw4...@mail.gmail.com You raised an important issue there but never really expressed an opinion on the points I raised, here or on the other thread. And neither did anyone else except the patch author who, perhaps unsurprisingly, thinks it's OK. I wish we could get more discussion about that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] How to compare different datums within from a tuple?
Hello, I am trying to write my first patch. It is too early to tell more about it, but I am just fiddling around with some prototypes. Can someone tell me how I can compare two datum fields when I do not know the data type in advance inside an executor function? For example, x < y, where x and y are of various types that form intervals. I have found the method ExecTuplesMatch, but it is only for equality comparison, I think. Another one is ApplySortComparator... maybe that's the correct way to go? Some code to make things clearer...

Datum x = heap_getattr(out->tts_tuple, node->xpos, out->tts_tupleDescriptor, &isNull1);
Datum y = slot_getattr(curr, node->ypos, &isNull2);
if (compareDatumWithCorrectMethod(x, y) < 0)
{
    /* do something */
}

Thx, Peter
Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.
Masahiko Sawada wrote: This topic may have been already discussed but, why don't we use just total scanned pages and total pages?

Because those numbers don't extrapolate nicely. If the density of dead tuples is irregular across the table, such absolute numbers might be completely meaningless: you could scan 90% of the table without seeing any index scan, and then at the final 10% be hit by many index scans cleaning dead tuples. Thus you would see progress go up to 90% very quickly and then take hours to have it go to 91%. (Additionally, and a comparatively minor point: since you don't know how many index scans are going to happen, there's no way to know the total number of blocks scanned, unless you don't count index blocks at all, and then the numbers become a lie.) If you instead track number of heap pages separately from index pages, and indicate how many index scans have taken place, you have a chance of actually figuring out how many heap pages are left to scan and how many more index scans will occur.

The mechanism of VACUUM is complicated a bit today,

Understatement of the week ... -- Álvaro Herrera    http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] tap tests remove working directories
On 08/10/2015 09:55 AM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: On 08/09/2015 08:58 PM, Michael Paquier wrote: Sure. Attached is what I have in mind. Contrary to your version we keep around temp paths should a run succeed after one that has failed when running make check multiple times in a row. Perhaps it does not matter much in practice as log files get removed at each new run but I think that this allows a finer control in case of failure. Feel free to have a look. Actually, I don't think this is a very good idea. You could end up with a whole series of opaquely named directories from a series of failing runs. If you want to keep the directory after a failure, and want to do another run, then rename the directory. That's what you have to do with the main regression tests, too. My version also has the benefit of changing exactly 3 lines in the source :-) FWIW, I think we should keep the behavior of the TAP tests as close as possible to the established behavior of the main regression tests. It's certainly possible that there's room for improvement in that benchmark behavior. But let's first converge the test behaviors, and then have a discussion about whether we want changes, and if so change all the tests at the same time. Yeah. To do that we should probably stop using File::Temp to make the directory, and just use a hardcoded name given to File::Path::mkpath. If the directory exists, we'd just remove it first. That won't be a very big change - probably just a handful of lines. cheers andrew
Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.
Hello,

When we're in Phase 2 or 3, don't we need to report the number of total pages scanned, or the percentage of table pages scanned, as well?

The total heap pages scanned needs to be reported with phase 2 or 3. A complete progress report needs to have numbers from each phase when applicable.

Phase 1. Report 2 integer counters: heap pages scanned and total heap pages; 1 float counter: percentage_complete; and a progress message.
Phase 2. Report 2 integer counters: index pages scanned and total index pages (across all indexes), and a progress message.
Phase 3. 1 integer counter: heap pages vacuumed.

Sorry for being unclear here. What I meant to say is that each phase of a command will correspond to a slot in COMMAND_NUM_SLOTS. Each phase will be a separate array element and will comprise n integers, n floats, and a string. So, in the view reporting progress, the VACUUM command can have 3 entries, one for each phase.

Thank you, Rahila Syed
Re: [HACKERS] WIP: Rework access method interface
On Mon, Aug 10, 2015 at 6:47 PM, Tom Lane t...@sss.pgh.pa.us wrote: Petr Jelinek p...@2ndquadrant.com writes: On 2015-08-10 16:58, Alexander Korotkov wrote: That should work, thanks! Also we can have SQL-visible functions to get amsupport and amstrategies and use them in the regression tests. SQL-visible functions would be preferable to storing it in pg_am as keeping the params in pg_am would limit the extensibility of pg_am itself. I don't see any particularly good reason to remove amsupport and amstrategies from pg_am. Those are closely tied to the other catalog infrastructure for indexes (pg_amproc, pg_amop) which I don't think are candidates for getting changed by this patch. There are a couple of other pg_am columns, such as amstorage and amcanorderbyop, which similarly bear on what's legal to appear in related catalogs such as pg_opclass. I'd be sort of inclined to leave those in the catalog as well. I do not see that exposing a SQL function is better than exposing a catalog column; either way, that property is SQL-visible. That answers my question, thanks! -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP: Rework access method interface
Alvaro Herrera alvhe...@2ndquadrant.com writes: Tom Lane wrote: I think that's likely for the best anyway; there are too many catalogs that think a pg_am OID identifies an index AM. Better to create other catalogs for other types of AMs. That means we won't be able to reuse pg_class.relam as a pointer to the AM-of-the-other-kind either. Hm. So one way or the other we're going to end up violating relational theory somewhere. OK, I yield: let's say that pg_am has amname, amkind, amhandler, and nothing else. Then we will need SQL functions to expose whatever information we think needs to be available to SQL code. regards, tom lane
Re: [HACKERS] checkpointer continuous flushing
Hello Andres, thanks for your comments. Some answers and new patches included.

+ /* + * Array of buffer ids of all buffers to checkpoint. + */ +static int *CheckpointBufferIds = NULL;

Should be at the beginning of the file. There's a bunch more cases of that.

Done.

+/* Compare checkpoint buffers + */ +static int bufcmp(const int * pa, const int * pb) +{ + BufferDesc + *a = GetBufferDescriptor(*pa), + *b = GetBufferDescriptor(*pb);

This definitely needs comments about ignoring the normal buffer header locking.

Added.

Why are we ignoring the database directory? I doubt it'll make a huge difference, but grouping metadata-affecting operations by directory helps.

I wanted to do the minimal comparisons needed to order buffers per file, so I skipped everything else. My idea of a checkpoint is a lot of data in a few files (at least compared to the data...), so I do not think that it is worth it. I may be proven wrong!

+static void +AllocateCheckpointBufferIds(void) +{ + /* Safe worst case allocation, all buffers belong to the checkpoint... +* that is pretty unlikely. +*/ + CheckpointBufferIds = (int *) palloc(sizeof(int) * NBuffers); +}

(wrong comment style...)

Fixed.

Heikki, you were concerned about the size of the allocation of this, right? I don't think it's relevant - we used to allocate an array of that size for the backend's private buffer pin array until 9.5, so in theory we should be safe against that. NBuffers is limited to INT_MAX/2 in guc.c, which ought to be sufficient?

I think that there is no issue with the current shared_buffers limit. I could allocate and use 4 GB on my laptop without problem. I added a cast to ensure that unsigned ints are used for the size computation.

+ /* + * Lazy allocation: this function is called through the checkpointer, + * but also by initdb. Maybe the allocation could be moved to the callers. + */ + if (CheckpointBufferIds == NULL) + AllocateCheckpointBufferIds(); +

I don't think it's a good idea to allocate this on every round.
That just means a lot of page table entries have to be built and torn down regularly. It's not like checkpoints only run for 1% of the time or such. Sure. It is not allocated on every round, it is allocated once on the first checkpoint, the variable tested is static. There is no free. Maybe the allocation could be moved to the callers, though. FWIW, I still think it's a much better idea to allocate the memory once in shared buffers. Hmmm. The memory does not need to be shared with other processes? It's not like that makes us need more memory overall, and it'll be huge page allocations if configured. I also think that sooner rather than later we're going to need more than one process flushing buffers, and then it'll need to be moved there. That is an argument. I think that it could wait for the need to actually arise. + /* +* Sort buffer ids to help find sequential writes. +* +* Note: buffers are not locked in anyway, but that does not matter, +* this sorting is really advisory, if some buffer changes status during +* this pass it will be filtered out later. The only necessary property +* is that marked buffers do not move elsewhere. +*/ That reasoning makes it impossible to move the fsyncing of files into the loop (whenever we move to a new file). That's not nice. I do not see why. Moving fsync ahead is definitely an idea that you already pointed out, I have given it some thought, and it would require a careful implementation and some restructuring. For instance, you do not want to issue fsync right after having done writes, you want to wait a little bit so that the system has time to write the buffers to disk. The formulation with necessary property doesn't seem very clear to me? Removed. How about: /* * Note: Buffers are not locked in any way during sorting, but that's ok: * A change in the buffer header is only relevant when it changes the * buffer's identity.
If the identity has changed it'll have been * written out by BufferAlloc(), so there's no need for checkpointer to * write it out anymore. The buffer might also get written out by a * backend or bgwriter, but that's equally harmless. */ This new version included. Also, qsort implementation +* should be resilient to occasional contradictions (cmp(a,b) != -cmp(b,a)) +* because of these possible concurrent changes. Hm. Is that actually the case for our qsort implementation? I think that it is hard to write a qsort which would fail that. That would mean that it would compare the same items twice, which would be inefficient. If the pivot element changes its identity won't the result be pretty much random? That would be a very unlikely event, given the short time spent in qsort. Anyway, this is not a problem, and is the beauty of the advisory sort:
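The grouping that this advisory sort is aiming for is easy to visualize. Below is a toy sketch in Python (field names are illustrative, not PostgreSQL's actual buffer-tag fields): sorting dirty buffers by (tablespace, relation, fork, block) puts neighboring blocks of the same relation file next to each other, which is all the ordering needs to achieve for the writes to become mostly sequential.

```python
# Toy model of the advisory checkpoint sort discussed above. Each dirty
# buffer is identified by (tablespace, relation, fork, block); the field
# names here are illustrative only.
buffers = [
    {"ts": 1663, "rel": 16384, "fork": 0, "block": 7},
    {"ts": 1663, "rel": 16385, "fork": 0, "block": 0},
    {"ts": 1663, "rel": 16384, "fork": 0, "block": 2},
    {"ts": 1663, "rel": 16384, "fork": 0, "block": 3},
]

def ckpt_key(buf):
    # The ordering is only advisory: it just needs to put blocks of the
    # same relation file next to each other, in ascending block order.
    return (buf["ts"], buf["rel"], buf["fork"], buf["block"])

ordered = sorted(buffers, key=ckpt_key)
blocks = [(b["rel"], b["block"]) for b in ordered]
print(blocks)  # [(16384, 2), (16384, 3), (16384, 7), (16385, 0)]
```

Even if a few entries end up misplaced because a buffer changed identity mid-sort, the result is still "mostly sequential", which is the only property the performance argument relies on.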
Re: [HACKERS] Summary of plans to avoid the annoyance of Freezing
On 10 August 2015 at 18:02, Josh Berkus j...@agliodbs.com wrote: There's a lesser version of this item which remains relevant unless we implement (5). That is, currently the same autovacuum_vacuum_delay (AVVD) applies to regular autovacuums as does to anti-wraparound autovacuums. If the user has set AV with a high delay, this means that anti-wraparound AV may never complete. For that reason, we ought to have a separate parameter for AVVD, which defaults to a lower number (like 5ms), or even to zero. Of course, if we implement (5), that's not necessary, since AV will never trigger an anti-wraparound freeze. Good idea. Having a freeze map would be wholly unnecessary if we don't ever need to freeze whole tables again. Freezing would still be needed on individual blocks where an old row has been updated or deleted; a freeze map would not help there either. So there is no conflict, but options 2) and 3) are completely redundant if we go for 5). After investigation, I now think 5) is achievable in 9.6, but if I am wrong for whatever reason, we have 2) as a backstop. It's not redundant. Users may still want to freeze for two reasons: 1. to shrink the clog and multixact logs 2. to support INDEX-ONLY SCAN Freezing is not a necessary pre-condition for either of those things, I am happy to say. There is confusion here because for ( 1 ) the shrink was performed after freezing, but when you have access to the epoch there is no need for exhaustive freezing - only in special cases, as noted. If we are lucky those special cases will mean a massive reduction in I/O. For ( 2 ) a normal VACUUM is sufficient and as Robert observes, maybe just HOT is enough. In the new world, the clog can be shrunk when everything has been hinted. Given that is possible with just a normal VACUUM, I think the new anti-freeze design (hey, we need a name, right?) will mean the clog actually stays smaller in most cases than it does now.
In both of those cases, having a freeze map would speed up the manual vacuum freeze considerably. Otherwise, we're just punting on the problem, and making it worse for users who wait too long. There would be no further need for the VACUUM FREEZE command. It would do nothing desirable. Now, it might still be the case that the *overhead* of a freeze map is a bad tradeoff if we don't have to worry about forced wraparound. But that's a different argument. I myself was in favour of the freeze map solution for some time, but I'm not anymore. Thanks to discussions at PgCon slowly working their way into my brain. BTW, has it occurred to anyone that implementing XID epoch headers is going to mean messing with multixact logs again? Just thought I'd open a papercut and pour some lemon juice on it. I doubt we have seen the last of that pain, but it's not my fingers on the chopping board, so squeeze all you want. ;-) -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] Commitfest remaining Needs Review items
* extends pgbench expressions with functions Robert signed up as reviewer for this a long time ago. Ping, do you still plan to review this? I seem to recall that I nominated Robert when adding the patch, because this is an extension of his expression patch. So the sign up is really just a hint. * checkpoint continuous flushing Per discussion on that thread, let's just do the sorting part now. Needs some cleanup, but it's almost there. Given the performance improvements (yes some tps, but also latency), I think that the full patch deserves consideration. -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] checkpointer continuous flushing
Ok ok, I stop resisting... I'll have a look. Here is a v7 ab version which uses shared memory instead of palloc. -- Fabien.diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index e900dcc..1cec243 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -2454,6 +2454,28 @@ include_dir 'conf.d' /listitem /varlistentry + varlistentry id=guc-checkpoint-sort xreflabel=checkpoint_sort + termvarnamecheckpoint_sort/varname (typebool/type) + indexterm + primaryvarnamecheckpoint_sort/ configuration parameter/primary + /indexterm + /term + listitem + para +Whether to sort buffers before writting them out to disk on checkpoint. +For a HDD storage, this setting allows to group together +neighboring pages written to disk, thus improving performance by +reducing random write activity. +This sorting should have limited performance effects on SSD backends +as such storages have good random write performance, but it may +help with wear-leveling so be worth keeping anyway. +The default is literalon/. +This parameter can only be set in the filenamepostgresql.conf/ +file or on the server command line. + /para + /listitem + /varlistentry + varlistentry id=guc-checkpoint-warning xreflabel=checkpoint_warning termvarnamecheckpoint_warning/varname (typeinteger/type) indexterm diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml index e3941c9..f538698 100644 --- a/doc/src/sgml/wal.sgml +++ b/doc/src/sgml/wal.sgml @@ -546,6 +546,18 @@ /para para + When hard-disk drives (HDD) are used for terminal data storage + xref linkend=guc-checkpoint-sort allows to sort pages + so that neighboring pages on disk will be flushed together by + chekpoints, reducing the random write load and improving performance. + If solid-state drives (SSD) are used, sorting pages induces no benefit + as their random write I/O performance is good: this feature could then + be disabled by setting varnamecheckpoint_sort/ to valueoff/. 
+ It is possible that sorting may help with SSD wear leveling, so it may + be kept on that account. + /para + + para The number of WAL segment files in filenamepg_xlog/ directory depends on varnamemin_wal_size/, varnamemax_wal_size/ and the amount of WAL generated in previous checkpoint cycles. When old log diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 68e33eb..bee38ab 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -7995,11 +7995,13 @@ LogCheckpointEnd(bool restartpoint) sync_secs, total_secs, longest_secs, +sort_secs, average_secs; int write_usecs, sync_usecs, total_usecs, longest_usecs, +sort_usecs, average_usecs; uint64 average_sync_time; @@ -8030,6 +8032,10 @@ LogCheckpointEnd(bool restartpoint) CheckpointStats.ckpt_end_t, total_secs, total_usecs); + TimestampDifference(CheckpointStats.ckpt_sort_t, + CheckpointStats.ckpt_sort_end_t, + sort_secs, sort_usecs); + /* * Timing values returned from CheckpointStats are in microseconds. * Convert to the second plus microsecond form that TimestampDifference @@ -8048,8 +8054,8 @@ LogCheckpointEnd(bool restartpoint) elog(LOG, %s complete: wrote %d buffers (%.1f%%); %d transaction log file(s) added, %d removed, %d recycled; - write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; - sync files=%d, longest=%ld.%03d s, average=%ld.%03d s; + sort=%ld.%03d s, write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; + sync files=%d, longest=%ld.%03d s, average=%ld.%03d s; distance=%d kB, estimate=%d kB, restartpoint ? 
restartpoint : checkpoint, CheckpointStats.ckpt_bufs_written, @@ -8057,6 +8063,7 @@ LogCheckpointEnd(bool restartpoint) CheckpointStats.ckpt_segs_added, CheckpointStats.ckpt_segs_removed, CheckpointStats.ckpt_segs_recycled, + sort_secs, sort_usecs / 1000, write_secs, write_usecs / 1000, sync_secs, sync_usecs / 1000, total_secs, total_usecs / 1000, diff --git a/src/backend/storage/buffer/buf_init.c b/src/backend/storage/buffer/buf_init.c index 3ae2848..ec2436f 100644 --- a/src/backend/storage/buffer/buf_init.c +++ b/src/backend/storage/buffer/buf_init.c @@ -65,7 +65,8 @@ void InitBufferPool(void) { bool foundBufs, -foundDescs; +foundDescs, +foundCpid; /* Align descriptors to a cacheline boundary. */ BufferDescriptors = (BufferDescPadded *) CACHELINEALIGN( @@ -77,10 +78,14 @@ InitBufferPool(void) ShmemInitStruct(Buffer Blocks, NBuffers * (Size) BLCKSZ, foundBufs); - if (foundDescs || foundBufs) + CheckpointBufferIds = (int *) + ShmemInitStruct(Checkpoint BufferIds, + NBuffers * sizeof(int), foundCpid); + + if (foundDescs || foundBufs || foundCpid) { - /* both should be
Re: [HACKERS] checkpointer continuous flushing
On 2015-08-10 19:07:12 +0200, Fabien COELHO wrote: I think that there is no issue with the current shared_buffers limit. I could allocate and use 4 GB on my laptop without problem. I added a cast to ensure that unsigned int are used for the size computation. You can't allocate 4GB with palloc(), it has a builtin limit against allocating more than 1GB. + /* + * Lazy allocation: this function is called through the checkpointer, + * but also by initdb. Maybe the allocation could be moved to the callers. + */ + if (CheckpointBufferIds == NULL) + AllocateCheckpointBufferIds(); + I don't think it's a good idea to allocate this on every round. That just means a lot of page table entries have to be built and torn down regularly. It's not like checkpoints only run for 1% of the time or such. Sure. It is not allocated on every round, it is allocated once on the first checkpoint, the variable tested is static. There is no free. Maybe the allocation could be moved to the callers, though. Well, then every time the checkpointer is restarted. FWIW, I still think it's a much better idea to allocate the memory once in shared buffers. Hmmm. The memory does not need to be shared with other processes? The point is that it's done at postmaster startup, and we're pretty much guaranteed that the memory will be available. It's not like that makes us need more memory overall, and it'll be huge page allocations if configured. I also think that sooner rather than later we're going to need more than one process flushing buffers, and then it'll need to be moved there. That is an argument. I think that it could wait for the need to actually arise. Huge pages are used today. + /* +* Sort buffer ids to help find sequential writes. +* +* Note: buffers are not locked in anyway, but that does not matter, +* this sorting is really advisory, if some buffer changes status during +* this pass it will be filtered out later. The only necessary property +* is that marked buffers do not move elsewhere.
+*/ That reasoning makes it impossible to move the fsyncing of files into the loop (whenever we move to a new file). That's not nice. I do not see why. Because it means that the sorting isn't necessarily correct. I.e. we can't rely on it to determine whether a file has already been fsynced. Also, qsort implementation +* should be resilient to occasional contradictions (cmp(a,b) != -cmp(b,a)) +* because of these possible concurrent changes. Hm. Is that actually the case for our qsort implementation? I think that it is hard to write a qsort which would fail that. That would mean that it would compare the same items twice, which would be inefficient. What? The same two elements aren't frequently compared pairwise with each other, but of course an individual element is frequently compared with other elements. Consider what happens when the chosen pivot element changes its identity after already dividing half the array. The two partitions will not be divided in any meaningful way anymore. I don't see how this will result in a meaningful sort. If the pivot element changes its identity won't the result be pretty much random? That would be a very unlikely event, given the short time spent in qsort. Meh, we don't want to rely on likeliness on such things. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
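Andres's 1 GB point from earlier in this message can be checked with quick arithmetic: palloc() rejects requests above MaxAllocSize (0x3fffffff, one byte under 1 GiB; value taken from PostgreSQL's memutils.h, the rest is arithmetic). A 4-bytes-per-buffer sort array therefore hits that ceiling once shared_buffers approaches roughly 2 TB at the default 8 kB block size. A sketch in Python:

```python
# Why a lazily palloc'd int-per-buffer array runs into trouble at large
# shared_buffers: palloc() refuses any single request above MaxAllocSize.
MAX_ALLOC_SIZE = 0x3FFFFFFF   # palloc's ceiling, just under 1 GiB
BLCKSZ = 8192                 # default PostgreSQL block size
ENTRY = 4                     # sizeof(int): one array slot per buffer

def fits_in_palloc(nbuffers):
    # Would an int array with one entry per buffer be palloc-able?
    return nbuffers * ENTRY <= MAX_ALLOC_SIZE

small = (4 * 1024**3) // BLCKSZ   # 4 GiB shared_buffers -> 2 MiB array
huge = (4 * 1024**4) // BLCKSZ    # 4 TiB shared_buffers -> 2 GiB array
print(fits_in_palloc(small), fits_in_palloc(huge))  # True False
```

So the 4 GB array Fabien tested would correspond to a shared-memory allocation, not a palloc'd one, which is part of why the shared-buffers placement keeps coming up.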
Re: [HACKERS] checkpointer continuous flushing
Hello Andres, You can't allocate 4GB with palloc(), it has a builtin limit against allocating more than 1GB. Argh, too bad, I assumed very naively that palloc was malloc in disguise. [...] Well, then every time the checkpointer is restarted. Hm... The point is that it's done at postmaster startup, and we're pretty much guaranteed that the memory will be available. Ok ok, I stop resisting... I'll have a look. Would it also fix the 1 GB palloc limit on the same go? I guess so... That reasoning makes it impossible to move the fsyncing of files into the loop (whenever we move to a new file). That's not nice. I do not see why. Because it means that the sorting isn't necessarily correct. I.e. we can't rely on it to determine whether a file has already been fsynced. Ok, I understand your point. Then the file would be fsynced twice: if the fsync is done properly (data have already been flushed to disk) then it would not cost much, and doing it sometimes twice on some file would not be a big issue. The code could also detect such an event and log a warning, which would give a hint about how often it occurs in practice. Hm. Is that actually the case for our qsort implementation? I think that it is hard to write a qsort which would fail that. That would mean that it would compare the same items twice, which would be inefficient. What? The same two elements aren't frequently compared pairwise with each other, but of course an individual element is frequently compared with other elements. Sure. Consider what happens when the chosen pivot element changes its identity after already dividing half the array. The two partitions will not be divided in any meaningful way anymore. I don't see how this will result in a meaningful sort. It would be partly meaningful, which is enough for performance, and does not matter for correctness: currently buffers are not sorted at all and it works, even if it does not work well. If the pivot element changes its identity won't the result be pretty much random?
That would be a very unlikely event, given the short time spent in qsort. Meh, we don't want to rely on likeliness on such things. My main argument is that even if it occurs, and the qsort result is partly wrong, it does not change correctness, it just means that the actual writes will be less in order than wished. If it occurs, one pivot separation would be quite strange, but then the others would be right, so the buffers would be partly sorted. Another issue I see is that even if buffers are locked within cmp, the status may change between two cmp... I do not think that locking all buffers for sorting them is an option. So on the whole, I think that locking buffers for sorting is probably not possible with the simple (and efficient) lightweight approach used in the patch. The good news, as I argued before, is that the order is only advisory to help with performance, but the correctness is really that all checkpoint buffers are written and fsync is called in the end, and does not depend on the buffer order. That is how it currently works anyway. If you block on this then I'll go with a heavyweight approach, but that would be a waste of memory in my opinion, hence my argumentation for the lightweight approach. -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
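For what it's worth, one way to sidestep the comparator-stability question entirely (a sketch of an alternative, not what the posted patch does) is to snapshot each buffer's identity into the to-be-sorted array before calling qsort, so the comparator only ever reads immutable copies and cmp(a,b) == -cmp(b,a) holds throughout. Illustrated in Python with hypothetical field layouts:

```python
# Sketch of a "snapshot before sort" variant: copy each buffer's identity
# into the sort array once, so concurrent changes to the live buffer
# headers can no longer affect comparisons mid-sort. Names are hypothetical.
live_buffers = {
    0: (16384, 0, 9),   # buf_id -> (relation, fork, block)
    1: (16384, 0, 1),
    2: (16385, 0, 4),
}

# Take the snapshot once, before sorting begins.
snapshot = [(key, buf_id) for buf_id, key in live_buffers.items()]

# A concurrent backend now recycles buffer 0 for a different page...
live_buffers[0] = (99999, 0, 0)

# ...but the sort is unaffected: it orders by the snapshotted keys.
snapshot.sort()
order = [buf_id for _, buf_id in snapshot]
print(order)  # [1, 0, 2]
```

The cost is the extra per-entry copy (a few more bytes per buffer in the sort array), which is the memory trade-off Fabien alludes to above.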
[HACKERS] linux sparc compile issue
Hi, while doing regular builds via buildroot autobuilders a compile problem for sparc 32bit v8 was found. It seems the defines for Linux are other than for Solaris. Following patch fixes it for buildroot: The gcc predefines for Linux are __sparc_v8__/__sparc_v7__ Signed-off-by: Waldemar Brodkorb w...@openadk.org diff -Nur postgresql-9.4.4.orig/src/include/storage/s_lock.h postgresql-9.4.4/src/include/storage/s_lock.h --- postgresql-9.4.4.orig/src/include/storage/s_lock.h 2015-06-09 21:29:38.0 +0200 +++ postgresql-9.4.4/src/include/storage/s_lock.h 2015-08-09 19:57:06.0 +0200 @@ -420,12 +420,12 @@ : =r(_res), +m(*lock) : r(lock) : memory); -#if defined(__sparcv7) +#if defined(__sparcv7) || defined(__sparc_v7__) /* * No stbar or membar available, luckily no actually produced hardware * requires a barrier. */ -#elif defined(__sparcv8) +#elif defined(__sparcv8) || defined(__sparc_v8__) /* stbar is available (and required for both PSO, RMO), membar isn't */ __asm__ __volatile__ (stbar \n:::memory); #else @@ -438,13 +438,13 @@ return (int) _res; } -#if defined(__sparcv7) +#if defined(__sparcv7) || defined(__sparc_v7__) /* * No stbar or membar available, luckily no actually produced hardware * requires a barrier. */ #define S_UNLOCK(lock) (*((volatile slock_t *) (lock)) = 0) -#elif defined(__sparcv8) +#elif defined(__sparcv8) || defined(__sparc_v8__) /* stbar is available (and required for both PSO, RMO), membar isn't */ #define S_UNLOCK(lock) \ do \ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Summary of plans to avoid the annoyance of Freezing
Simon, Thank you for this summary! I was losing track, myself. On 08/09/2015 11:03 PM, Simon Riggs wrote: Freezing is painful for VLDBs and high transaction rate systems. We have a number of proposals to improve things... 3. Speed up autovacuums when they are triggered to avoid wraparounds (Simon) Idea is to do a VACUUM scan which only freezes tuples. If we dirty a page from freezing we then also prune it, but don't attempt to scan indexes to remove the now-truncated dead tuples. This looks very straightforward, no technical issues. Might even be able to backpatch it. [patch investigated but not finished yet] There's a lesser version of this item which remains relevant unless we implement (5). That is, currently the same autovacuum_vacuum_delay (AVVD) applies to regular autovacuums as does to anti-wraparound autovacuums. If the user has set AV with a high delay, this means that anti-wraparound AV may never complete. For that reason, we ought to have a separate parameter for AVVD, which defaults to a lower number (like 5ms), or even to zero. Of course, if we implement (5), that's not necessary, since AV will never trigger an anti-wraparound freeze. Having a freeze map would be wholly unnecessary if we don't ever need to freeze whole tables again. Freezing would still be needed on individual blocks where an old row has been updated or deleted; a freeze map would not help there either. So there is no conflict, but options 2) and 3) are completely redundant if we go for 5). After investigation, I now think 5) is achievable in 9.6, but if I am wrong for whatever reason, we have 2) as a backstop. It's not redundant. Users may still want to freeze for two reasons: 1. to shrink the clog and multixact logs 2. to support INDEX-ONLY SCAN In both of those cases, having a freeze map would speed up the manual vacuum freeze considerably. Otherwise, we're just punting on the problem, and making it worse for users who wait too long.
Now, it might still be the case that the *overhead* of a freeze map is a bad tradeoff if we don't have to worry about forced wraparound. But that's a different argument. BTW, has it occurred to anyone that implementing XID epoch headers is going to mean messing with multixact logs again? Just thought I'd open a papercut and pour some lemon juice on it. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [patch] A \pivot command for psql
Tom Lane wrote: I'm not sure how pushing it out to psql makes that better. There is no way to do further processing on something that psql has printed, so you've punted on solving that issue just as much if not more. It's the same spirit as \x : the only thing it achieves is better readability in certain cases, I don't expect more from it. But I admit that \pivot would be much more of a niche use case than \x. Besides which, psql is not all that great for looking at very wide tables, so I'm not sure that this would be very useful for dynamic column sets anyway. I use \x all the time with wide tables, and \x and \pivot happen to play out well together. (In fact I was pleasantly surprised by this.) Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_dump and search_path
The earliest reference I found to this issue is here http://postgresql.nabble.com/set-search-path-in-dump-output-considered-harmful-td1947594.html and refers to the search_path being arbitrarily set in the file created by pg_dump. This is apparently still the case in 9.4. I found this issue because I use SERIAL/BIGSERIAL columns and when I created schema-specific tables in a schema other than the first listed in search_path the nextval() sequence references were schema-qualified. When I created a backup file with pg_dump and then restored using psql, the nextval() sequence references were no longer schema-qualified because the backup file set my table schema as the first schema in search_path. I saw the same result with pg_restore. While the result of \d testschema.testtable shows the schema-qualified sequence name in nextval():

\d testschema.testtable;
                          Table "testschema.testtable"
 Column |  Type   |                             Modifiers
--------+---------+-------------------------------------------------------------------
 id     | integer | not null default nextval('testschema.testtable_id_seq'::regclass)

The actual default read from pg_attrdef does not:

SELECT a.attnum, n.nspname, c.relname, d.adsrc AS default_value
FROM pg_attribute AS a
JOIN pg_class AS c ON a.attrelid = c.oid
JOIN pg_namespace AS n ON c.relnamespace = n.oid
LEFT JOIN pg_attrdef AS d ON d.adrelid = c.oid AND d.adnum = a.attnum
WHERE a.attnum > 0 AND n.nspname = 'testschema' AND c.relname = 'testtable';

 attnum |  nspname   |  relname  |             default_value
--------+------------+-----------+---------------------------------------
      1 | testschema | testtable | nextval('testtable_id_seq'::regclass)
      2 | testschema | testtable |

This inconsistency is described here http://dba.stackexchange.com/questions/21150/default-value-of-serial-fields-changes-after-restore . This is not a documented behavior; at least I couldn't find it, and I searched quite a bit. There was no indication to me that when I run pg_dump it will do something more than I asked it to do and it took me a while to figure out why.
I solved the problem by setting the search_path as pg_dump does when creating the database so now the restore does not create a different database than I did. Certainly it would seem a bug that \d and a direct read from pg_attrdef give different results even though pg_dump determining on its own what the search_path should be is no doubt an intended behavior. But it seems to me this should be an option. I expected pg_dump to do what I asked it to do and when it did something other than that it was quite a headache. What's more, I like schema-qualified references. Schemas are an effective database organization tool and I teach my people to use them and not depend on the search path as doing so leads to sloppy and inconsistent thinking as well as coding. Please consider making the arbitrary determination of search_path by pg_dump an optional behavior. Or better yet, just have it generate a backup that accurately reflects the database it is backing up. BTW, I am a huge fan of PostgreSQL. Cheers!
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On Thu, Oct 30, 2014 at 5:30 AM, Fujii Masao masao.fu...@gmail.com wrote: On Thu, Oct 30, 2014 at 7:30 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: + { + {pending_list_cleanup_size, PGC_USERSET, CLIENT_CONN_STATEMENT, + gettext_noop(Sets the maximum size of the pending list for GIN index.), +NULL, + GUC_UNIT_KB + }, + pending_list_cleanup_size, + 4096, 0, MAX_KILOBYTES, + NULL, NULL, NULL + }, ISTM it'd be better to use RESOURCES_MEM, not CLIENT_CONN_STATEMENT. No? Yes if the pending list always exists in the memory. But not, IIUC. Thought? Also why not set min to 64, not to 0, in accordance with that of work_mem? I'm OK to use 64. But I just chose 0 because I could not think of any reasonable reason why 64k is suitable as the minimum size of the pending list. IOW, I have no idea about whether it's reasonable to use the min value of work_mem as the min size of the pending list. I know I am late to the party here, but would like to have the minimum be 0, not 64. As long as by zero, it means it doesn't insert anything into the pending list, rather than inserting and promptly cleaning it up. The reason for this is that if I am trying to decide what pending_list_cleanup_size I want to set for the index in the indexes storage parameters, the way to find that out is try a bunch of different ones through the guc while the index is still at the default of no overriding storage parameter. It would be nice to try the fastupdate=off alternative (i.e. the same as pending_list_cleanup_size=0) without having to change the index itself and change the syntax used in the testing. Cheers, Jeff
Re: [HACKERS] [patch] A \pivot command for psql
David Fetter wrote: I'm working up a proposal to add (UN)PIVOT support to the back-end. I was under the impression that a server-side PIVOT *with dynamic columns* was just unworkable as an SQL query, because it couldn't be prepared if it existed. Am I wrong on that? I feel like you guys are all telling me that \pivot should happen on the server, but the point that it would not be realistic in the first place is not being considered. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] pg_upgrade fails when postgres/template1 isn't in default tablespace
On Fri, Jun 19, 2015 at 07:10:40PM +0300, Marti Raudsepp wrote: Hi list Sorry I am just getting this report. Thanks to the people who stalled for me. One of my databases failed to upgrade successfully and produced this error in the copying phase: error while copying relation pg_catalog.pg_largeobject (/srv/ssd/PG_9.3_201306121/1/12023 to /PG_9.4_201409291/1/12130): No such file or directory As with all good bug reports, there are multiple bugs here. ;-) Notice the second path has an invalid prefix: /PG_9.4_201409291/1/12130 --- obviously something is seriously wrong. The failure of pg_dumpall to move 'postgres' and 'template1' databases to the new tablespace cannot explain that, i.e. I could understand pg_upgrade looking for pg_largeobject in the default data directory, or in the new one, but not in a path that doesn't even exist. pg_upgrade tracks old and new tablespaces separately, so there is obviously something wrong. And I found it, patch attached. The pg_upgrade code was testing for the tablespace of the _old_ object, then assigning the old and _new_ tablespaces based on that. The first patch fixes that, and should be backpatched to all supported pg_upgrade versions. Turns out this happens when either the postgres or template1 databases have been moved to a non-default tablespace. pg_dumpall does not dump attributes (such as tablespace) for these databases; pg_upgrade queries the new cluster about the tablespace for these relations and builds a broken destination path for the copy/link operation. The least bad solution seems to be generating ALTER DATABASE SET TABLESPACE commands for these from pg_dumpall. Previously a --globals-only dump didn't generate psql \connect commands, but you can't run SET TABLESPACE from within the same database you're connected to. So to move postgres, it needs to connect to template1 and vice versa. That seems fine for the purpose of pg_upgrade which can assume a freshly created cluster with both databases intact.
Yes, seems like a good solution. I have implemented a proof of concept patch for this. Currently I'm only tackling the binary upgrade failure and not general pg_dumpall. Alternatively, we could allow SET TABLESPACE in the current database, which seems less ugly to me. A code comment says Obviously can't move the tables of my own database, but it's not obvious to me why. If I'm the only connected backend, it seems that any caches and open files could be invalidated. But I don't know how big of an undertaking that would be. Your submitted patch, attached, also looks good, and should be backpatched. My only question is whether this should be for all runs of pg_dumpall, not just in binary upgrade mode. Comments? Once we agree on this, I will apply these to all back branches and run tests by moving template1 before the upgrade to make sure it works for all PG versions. pg_upgrade) misses: * Nothing at all is dumped for the template0 database, although ACLs, settings and the tablespace can be changed by the user Uh, we _assume_ no one is connecting to template0, but you are right people can change things _about_ template0. It would be nice to preserve those. * template1 postgres databases retain ACLs and settings, but not attributes like TABLESPACE or CONNECTION LIMIT. Other attributes like LC_COLLATE and LC_CTYPE can't be changed without recreating the DB, but those don't matter for the pg_upgrade case anyway. It seems those would be good material for another patch? Agreed. I am not sure how we have gotten this far with so few complaints about this problem. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god.
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
new file mode 100644
index e158c9f..390fc62
*** a/src/bin/pg_upgrade/info.c
--- b/src/bin/pg_upgrade/info.c
*** create_rel_filename_map(const char *old_
*** 140,145 ****
--- 140,146 ----
  	const RelInfo *old_rel, const RelInfo *new_rel,
  	FileNameMap *map)
  {
+ 	/* Someday the old/new tablespaces might not match, so handle it. */
  	if (strlen(old_rel->tablespace) == 0)
  	{
  		/*
*** create_rel_filename_map(const char *old_
*** 147,162 ****
  		 * exist in the data directories.
  		 */
  		map->old_tablespace = old_data;
- 		map->new_tablespace = new_data;
  		map->old_tablespace_suffix = "/base";
- 		map->new_tablespace_suffix = "/base";
  	}
  	else
  	{
  		/* relation belongs to a tablespace, so use the tablespace location */
  		map->old_tablespace = old_rel->tablespace;
- 		map->new_tablespace = new_rel->tablespace;
  		map->old_tablespace_suffix = old_cluster.tablespace_suffix;
  		map->new_tablespace_suffix = new_cluster.tablespace_suffix;
  	}
--- 148,171 ----
  		 * exist in the data directories.
  		 */
  		map->old_tablespace = old_data;
  		map->old_tablespace_suffix = "/base";
Re: [HACKERS] Summary of plans to avoid the annoyance of Freezing
On 08/10/2015 10:31 AM, Simon Riggs wrote: Freezing is not a necessary pre-condition for either of those things, I am happy to say. There is confusion here because for ( 1 ) the shrink was performed after freezing, but when you have access to the epoch there is no need for exhaustive freezing - only in special cases, as noted. If we are lucky those special cases will mean a massive reduction in I/O. For ( 2 ) a normal VACUUM is sufficient and as Robert observes, maybe just HOT is enough. Yeah, saw your explanation on this on the other thread. Good point. Question: does regular vacuum update the visibility map in the same way vacuum freeze does? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Rework access method interface
On 2015-08-10 16:58, Alexander Korotkov wrote: On Mon, Aug 10, 2015 at 5:48 PM, Tom Lane t...@sss.pgh.pa.us wrote: Alexander Korotkov a.korot...@postgrespro.ru writes: On Mon, Aug 10, 2015 at 1:12 PM, Petr Jelinek p...@2ndquadrant.com wrote: I don't understand this, there is already AmRoutine in RelationData, why the need for an additional field for just amsupport? We need amsupport in load_relcache_init_file() which reads pg_internal.init. I'm not sure this is the correct place to call am_handler. It should work in the case of a built-in AM. But if the AM is defined in an extension then we wouldn't be able to do a catalog lookup for am_handler at this stage of initialization. This is an issue we'll have to face before there's much hope of having index AMs as extensions: how would you locate any extension function without catalog access? Storing raw function pointers in pg_internal.init is not an answer in an ASLR world. I think we can dodge the issue so far as pg_internal.init is concerned by decreeing that system catalogs can only have indexes with built-in AMs. Calling a built-in function doesn't require catalog access, so there should be no problem with re-calling the handler function by OID during load_relcache_init_file(). That should work, thanks! Also we can have SQL-visible functions to get amsupport and amstrategies and use them in the regression tests. SQL-visible functions would be preferable to storing it in pg_am, as keeping the params in pg_am would limit the extensibility of pg_am itself. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.
On Tue, Aug 11, 2015 at 12:20 AM, Simon Riggs si...@2ndquadrant.com wrote: On 10 August 2015 at 15:59, Syed, Rahila rahila.s...@nttdata.com wrote: Hello, When we're in Phase 2 or 3, don't we need to report the number of total pages scanned, or the percentage of table pages scanned, as well? The total heap pages scanned needs to be reported with phase 2 or 3. A complete progress report needs to have numbers from each phase when applicable. Phase 1. Report 2 integer counters: heap pages scanned and total heap pages, 1 float counter: percentage_complete, and a progress message. Phase 2. Report 2 integer counters: index pages scanned and total index pages (across all indexes), and a progress message. Phase 3. 1 integer counter: heap pages vacuumed. Sorry for being unclear here. What I meant to say is, each phase of a command will correspond to a slot in COMMAND_NUM_SLOTS. Each phase will be a separate array element and will comprise n integers, n floats, and a string. So, in the view reporting progress, the VACUUM command can have 3 entries, one for each phase. VACUUM has 3 phases now, but since phases 2 and 3 repeat, you can have an unbounded number of phases. But that assumes that we don't count truncation as a 4th phase of VACUUM... Yeah. This topic may have already been discussed, but why don't we use just total scanned pages and total pages? The mechanism of VACUUM is a bit complicated today, and other maintenance commands are as well. It would be tough to trace this processing, and it might change in the future. But basically, we can trace the total scanned pages of the target relation easily, and such information would be enough in many cases. Regards, -- Masahiko Sawada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN pageinspect functions
On Fri, Nov 21, 2014 at 2:04 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 11/20/2014 05:52 AM, Michael Paquier wrote: On Wed, Nov 19, 2014 at 7:01 AM, Peter Geoghegan p...@heroku.com wrote: On Tue, Nov 4, 2014 at 7:26 AM, Amit Kapila amit.kapil...@gmail.com wrote: 1. Documentation seems to be missing; other APIs exposed via pageinspect are documented at: http://www.postgresql.org/docs/devel/static/pageinspect.html Done. 2.
+CREATE FUNCTION gin_metapage(IN page bytea,
+    OUT pending_head bigint,
+    OUT pending_tail bigint,
+    OUT version int4)
+AS 'MODULE_PATHNAME', 'gin_metapage'
+LANGUAGE C STRICT;
a. Isn't it better to name the function as gin_metap(..), similar to the existing function bt_metap(..)? I actually liked gin_metapage_info more, a name similar to the newly-introduced BRIN indexes. b. Can this function have a similar signature as bt_metap(), which means it should take input as relname? That's mostly a matter of taste, but I think we should definitely pass a raw page to it as it is now. This has the advantage of adding an extra check of whether the page passed is really a meta page or not, something useful for development. 3. Can the gin_dataleafpage() API have a similar name and signature as the API bt_page_items() exposed for btree? What about gin_leafpage_items then? The signature of bt_page_items() isn't a good example to follow. It existed before the get_raw_page() function, and the other functions that are designed to work with that, was added. The gin_leafpage_items() name seems fine to me. When I call gin_leafpage_items on a leaf page, I get the ERROR:
ERROR:  input page is not a compressed GIN data leaf page
DETAIL:  Flags 0002, expected 0083
I don't know why it won't work on an uncompressed leaf page (or for that matter, why my index pages are not compressed), but the docs should probably note the restriction. 
Cheers, Jeff pageinspect_gin_leaf.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CREATE POLICY and RETURNING
Zhaomo, * Zhaomo Yang (zmp...@gmail.com) wrote: This thread has a pretty thorough discussion of the pros and cons of applying the SELECT policy to other commands. Besides what has been mentioned, I think there is another potential pro: we can enable references to the pseudorelations OLD and NEW in predicates. Now, for UPDATE, the references to the target table in the USING clause are actually references to OLD and the references in the WITH CHECK clause are references to NEW. Logically now USING and WITH CHECK are combined by AND, so we cannot have predicates like For my part, I find that the simplicity of having USING only ever refer to existing records and WITH CHECK only ever refer to records being added to be good and I'm concerned that this approach would be confusing. If no NEW or OLD is used, what happens? Or would you have to always specify OLD/NEW for UPDATE, and then what about for the other policies, and the FOR ALL policies? foo(NEW) > 1 OR bar(OLD) > 1 (combine predicates referencing OLD and NEW by an operation other than AND) Your statement that this can't be done with the existing policy approach is incorrect, or I've misunderstood what you mean above. This could be accomplished with USING (bar > 1) and WITH CHECK (foo > 1), no? Your sentence above that USING and WITH CHECK are combined by AND isn't correct either - they're independent and are therefore really OR'd. If they were AND'd then the new record would have to pass both the USING and WITH CHECK policies. NEW.id <> OLD.id (reference both in the same expression) Here you're correct that this isn't something the existing approach to UPDATE policies can support. I don't intend to simply punt on this, but I will point out that this particular requirement can be handled in a trigger, if desired. Further, I'm not sure that I see how this would work in a case where you have the SELECT policy (which clearly could only refer to OLD) applied first, as you suggest? 
If we apply SELECT policy to other commands, we only need one predicate for INSERT/UPDATE/DELETE. That predicate can reference to OLD and NEW, just like predicates for triggers and rules. For UPDATE and DELETE, the predicate of SELECT will be applied first (when the target table is scanned) to ensure no information leakage and their own predicate will be applied later. This doesn't change much for INSERT and DELETE, but it gives users more flexibility when they set predicates for UPDATE. OLD and NEW are only applicable for the UPDATE case and requiring those to be used for the other policies is adding complexity for a pretty narrow use-case, as is combining the SELECT USING policy with the USING (or any) policy for the other commands. Further, it clearly reduces the range of options for UPDATE as it means that you can't do blind updates or deletes. Perhaps that's sensible, but we could do that by simply AND'ing the SELECT USING policy with the other command USING policy, but Dean was against that idea up-thread, as am I, because it adds complexity, and it would further mean that neither of your suggested predicates above would be supported, as I explained above. Also, as far as I see, none of this really amounts to anything different with regard to RETURNING than the previous proposal of simply applying the SELECT USING policy to the records first, and the records returned could still not be allowed by the SELECT policy as they would be the results of the update, which could still pass the UPDATE policy for new records. Applying the SELECT USING policy against the RETURNING records strikes me, more and more, as just complexity which will cause more confusion than it helps anyone. We definitely need to explain how the USING clause works for the commands and how that impacts RETURNING, as I've mentioned in the 9.5 open items list, and I'm planning to tackle that this week. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] checkpointer continuous flushing
Hi, On 2015-08-08 20:49:03 +0300, Heikki Linnakangas wrote: I ripped out the flushing part, keeping only the sorting. I refactored the logic in BufferSync() a bit. There's now a separate function, nextCheckpointBuffer(), that returns the next buffer ID from the sorted list. The tablespace-parallelization behaviour is encapsulated there, keeping the code in BufferSync() much simpler. See attached. Needs some minor cleanup and commenting still before committing, and I haven't done any testing besides a simple make check. Thought it'd be useful to review the current version as well. Some of what I'm commenting on you'll probably already have thought of under the label of minor cleanup.
+/*
+ * Array of buffer ids of all buffers to checkpoint.
+ */
+static int *CheckpointBufferIds = NULL;
+
+/* Compare checkpoint buffers
+ */
Should be at the beginning of the file. There's a bunch more cases of that.
+/* Compare checkpoint buffers
+ */
+static int bufcmp(const int * pa, const int * pb)
+{
+	BufferDesc
+		*a = GetBufferDescriptor(*pa),
+		*b = GetBufferDescriptor(*pb);
+
+	/* tag: rnode, forkNum (different files), blockNum
+	 * rnode: { spcNode (ignore: not really needed),
+	 *          dbNode (ignore: this is a directory), relNode }
+	 * spcNode: table space oid, note that there are at least two
+	 * (pg_global and pg_default).
+	 */
+	/* compare relation */
+	if (a->tag.rnode.spcNode < b->tag.rnode.spcNode)
+		return -1;
+	else if (a->tag.rnode.spcNode > b->tag.rnode.spcNode)
+		return 1;
+	if (a->tag.rnode.relNode < b->tag.rnode.relNode)
+		return -1;
+	else if (a->tag.rnode.relNode > b->tag.rnode.relNode)
+		return 1;
+	/* same relation, compare fork */
+	else if (a->tag.forkNum < b->tag.forkNum)
+		return -1;
+	else if (a->tag.forkNum > b->tag.forkNum)
+		return 1;
+	/* same relation/fork, so same segmented file, compare block number
+	 * which are mapped on different segments depending on the number.
+	 */
+	else if (a->tag.blockNum < b->tag.blockNum)
+		return -1;
+	else /* should not be the same block anyway...
	 */
+		return 1;
+}
This definitely needs comments about ignoring the normal buffer header locking. Why are we ignoring the database directory? I doubt it'll make a huge difference, but grouping metadata-affecting operations by directory helps.
+
+static void
+AllocateCheckpointBufferIds(void)
+{
+	/* Safe worst case allocation, all buffers belong to the checkpoint...
+	 * that is pretty unlikely.
+	 */
+	CheckpointBufferIds = (int *) palloc(sizeof(int) * NBuffers);
+}
(wrong comment style...) Heikki, you were concerned about the size of the allocation of this, right? I don't think it's relevant - we used to allocate an array of that size for the backend's private buffer pin array until 9.5, so in theory we should be safe against that. NBuffers is limited to INT_MAX/2 in guc.c, which ought to be sufficient?
+	/*
+	 * Lazy allocation: this function is called through the checkpointer,
+	 * but also by initdb. Maybe the allocation could be moved to the callers.
+	 */
+	if (CheckpointBufferIds == NULL)
+		AllocateCheckpointBufferIds();
+
I don't think it's a good idea to allocate this on every round. That just means a lot of page table entries have to be built and torn down regularly. It's not like checkpoints only run for 1% of the time or such. FWIW, I still think it's a much better idea to allocate the memory once in shared buffers. It's not like that makes us need more memory overall, and it'll be huge page allocations if configured. I also think that sooner rather than later we're going to need more than one process flushing buffers, and then it'll need to be moved there.
+	/*
+	 * Sort buffer ids to help find sequential writes.
+	 *
+	 * Note: buffers are not locked in any way, but that does not matter,
+	 * this sorting is really advisory, if some buffer changes status during
+	 * this pass it will be filtered out later. The only necessary property
+	 * is that marked buffers do not move elsewhere.
+	 */
That reasoning makes it impossible to move the fsyncing of files into the loop (whenever we move to a new file). That's not nice. The formulation with "necessary property" doesn't seem very clear to me? How about:
/*
 * Note: Buffers are not locked in any way during sorting, but that's ok:
 * A change in the buffer header is only relevant when it changes the
 * buffer's identity. If the identity has changed it'll have been
 * written out by BufferAlloc(), so there's no need for checkpointer to
 * write it out anymore. The buffer might also get written out by a
 * backend or bgwriter, but that's equally harmless.
 */
Also, qsort implementation
+ *
Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.
On 10 August 2015 at 15:59, Syed, Rahila rahila.s...@nttdata.com wrote: Hello, When we're in Phase 2 or 3, don't we need to report the number of total pages scanned, or the percentage of table pages scanned, as well? The total heap pages scanned needs to be reported with phase 2 or 3. A complete progress report needs to have numbers from each phase when applicable. Phase 1. Report 2 integer counters: heap pages scanned and total heap pages, 1 float counter: percentage_complete, and a progress message. Phase 2. Report 2 integer counters: index pages scanned and total index pages (across all indexes), and a progress message. Phase 3. 1 integer counter: heap pages vacuumed. Sorry for being unclear here. What I meant to say is, each phase of a command will correspond to a slot in COMMAND_NUM_SLOTS. Each phase will be a separate array element and will comprise n integers, n floats, and a string. So, in the view reporting progress, the VACUUM command can have 3 entries, one for each phase. VACUUM has 3 phases now, but since phases 2 and 3 repeat, you can have an unbounded number of phases. But that assumes that we don't count truncation as a 4th phase of VACUUM... SELECT statements also have a variable number of phases: hash, materialize and sort nodes all act as blocking nodes where you cannot progress to the next phase until it is complete, and you don't know for certain how much data will come in later phases. I think the best you'll do is an array of pairs of values [(current blocks, total blocks), ...] Knowing how many phases there are is a tough problem. I think the only way forwards is to admit that we will publish our best initial estimate of total workload size and then later we may realise it was wrong and publish a better number (do until complete). It's not wonderful, but la vida es loca. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] WIP: Rework access method interface
Petr Jelinek p...@2ndquadrant.com writes: On 2015-08-10 16:58, Alexander Korotkov wrote: That should work, thanks! Also we can have SQL-visible functions to get amsupport and amstrategies and use them in the regression tests. SQL-visible functions would be preferable to storing it in pg_am as keeping the params in pg_am would limit the extensibility of pg_am itself. I don't see any particularly good reason to remove amsupport and amstrategies from pg_am. Those are closely tied to the other catalog infrastructure for indexes (pg_amproc, pg_amop) which I don't think are candidates for getting changed by this patch. There are a couple of other pg_am columns, such as amstorage and amcanorderbyop, which similarly bear on what's legal to appear in related catalogs such as pg_opclass. I'd be sort of inclined to leave those in the catalog as well. I do not see that exposing a SQL function is better than exposing a catalog column; either way, that property is SQL-visible. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Rework access method interface
Petr Jelinek p...@2ndquadrant.com writes: On 2015-08-10 17:47, Tom Lane wrote: I don't see any particularly good reason to remove amsupport and amstrategies from pg_am. Those are closely tied to the other catalog infrastructure for indexes (pg_amproc, pg_amop) which I don't think are candidates for getting changed by this patch. Ok, in that case it seems unlikely that we'll be able to use pg_am for any other access methods besides indexes in the future. I think that's likely for the best anyway; there are too many catalogs that think a pg_am OID identifies an index AM. Better to create other catalogs for other types of AMs. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Rework access method interface
Alexander Korotkov a.korot...@postgrespro.ru writes: On Mon, Aug 10, 2015 at 6:47 PM, Tom Lane t...@sss.pgh.pa.us wrote: There are a couple of other pg_am columns, such as amstorage and amcanorderbyop, which similarly bear on what's legal to appear in related catalogs such as pg_opclass. I'd be sort of inclined to leave those in the catalog as well. I do not see that exposing a SQL function is better than exposing a catalog column; either way, that property is SQL-visible. That answers my question, thanks! BTW, just to clarify: we can still have the desirable property that CREATE INDEX ACCESS METHOD needs no parameters other than the AM handler function name. The AM code would still expose all of its properties through the struct returned by the handler. What is at issue here is how information in that struct that needs to be available to SQL code gets exposed. We can do that either by exposing SQL functions to get it, or by having CREATE INDEX ACCESS METHOD call the handler and then copy some fields into the new pg_am row. I'm suggesting that we should do the latter for any fields that determine validity of pg_opclass, pg_amop, etc entries associated with the AM type. The AM could not reasonably change its mind about such properties (within a major release at least) so there is no harm in making a long-lived copy in pg_am. And we might as well not break SQL code unnecessarily by removing fields that used to be there. There may be some other AM properties that would be better to expose through SQL functions because they could potentially change without invalidating information stored in the other catalogs. I'm not sure whether there is any such data that needs to be accessible at SQL level, though. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Rework access method interface
Tom Lane wrote: Petr Jelinek p...@2ndquadrant.com writes: On 2015-08-10 17:47, Tom Lane wrote: I don't see any particularly good reason to remove amsupport and amstrategies from pg_am. Those are closely tied to the other catalog infrastructure for indexes (pg_amproc, pg_amop) which I don't think are candidates for getting changed by this patch. Ok, in that case it seems unlikely that we'll be able to use pg_am for any other access methods besides indexes in the future. I think that's likely for the best anyway; there are too many catalogs that think a pg_am OID identifies an index AM. Better to create other catalogs for other types of AMs. That means we won't be able to reuse pg_class.relam as a pointer to the AM-of-the-other-kind either. I don't think this is the best course of action. We have the sequence AM patch that already reuses pg_class.relam to point to pg_seqam.oid, but you objected to that on relational theory grounds, which seemed reasonable to me. The other option is to create yet another pg_class column with an OID of some other AM catalog, but this seems a waste. FWIW the column store patch we're working on also wants to have its own AM-like catalog. In our current code we have a separate catalog pg_colstore_am, but are eagerly waiting for the discussion on this to settle so that we can just use pg_am and pg_class.relam instead. (The patch itself is not public yet since it's nowhere near usable, and this detail is a pretty minor issue, but I thought reasonable to bring it up here. We're also waiting on upper-planner path-ification since it seems likely that some code will collide there, anyway.) -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Precedence of standard comparison operators
On 10 August 2015 at 16:31, Tom Lane t...@sss.pgh.pa.us wrote: Pavel Stehule pavel.steh...@gmail.com writes: On Sun, Aug 09, 2015 at 08:06:11PM -0400, Tom Lane wrote: So yeah, I do think that getting a syntax error if you don't use parentheses is the preferable behavior here. If we raise a syntax error, then there should be a very informative message, Yeah, it would sure be nice to throw something better than "syntax error". But I've looked at bison's facilities for that, and they're pretty bad. I'm not sure there's much we can do short of moving to some other parser generator tool, which would be a Large Undertaking ... and I don't know of any arguably-better tool anyway. I would say that anyone who's tricksy enough to be using boolean logic in the way you describe would be expected to have enough nous about them to either a) know what the precedences are or b) know that their lack of knowledge means they should sprinkle their code with brackets. Returning a syntax error for something that isn't actually an error in syntax is a poor showing. Are we to start expecting syntax errors when people use comparison operators on NULLable columns without a COALESCE, because the comparison might do something they don't expect if they haven't tested their code with NULL values? IMO using a = b < c as described is unnecessarily* horrid, *whichever* way you mean it, and will only produce the sort of unreadability that generates errors in the long run anyway (even if you understand it, chances are the next poor sap reading your code won't) and deserves code that breaks, especially if you're the sort of person who uses it without fully understanding it. (*Unnecessarily because there are clearer constructs - CASE springs to mind - that do the same thing without the associated unreadability and/or ambiguity) Geoff
Re: [HACKERS] checkpointer continuous flushing
On August 10, 2015 8:24:21 PM GMT+02:00, Fabien COELHO coe...@cri.ensmp.fr wrote: Hello Andres, You can't allocate 4GB with palloc(), it has a builtin limit against allocating more than 1GB. Argh, too bad, I assumed very naively that palloc was malloc in disguise. It is, but there's some layering (memory pools/contexts) on top. You can get huge allocations with the huge-allocation variants (e.g. MemoryContextAllocHuge). Then the file would be fsynced twice: if the fsync is done properly (data have already been flushed to disk) then it would not cost much, and doing it sometimes twice on some file would not be a big issue. The code could also detect such an event and log a warning, which would give a hint about how often it occurs in practice. Right. At the cost of keeping track of all files... If the pivot element changes its identity won't the result be pretty much random? That would be a very unlikely event, given the short time spent in qsort. Meh, we don't want to rely on likelihood for such things. My main argument is that even if it occurs, and the qsort result is partly wrong, it does not change correctness, it just means that the actual writes will be less in order than wished. If it occurs, one pivot separation would be quite strange, but then others would be right, so the buffers would be partly sorted. It doesn't matter for correctness today, correct. But it makes it impossible to rely on the ordering, too. Another issue I see is that even if buffers are locked within cmp, the status may change between two cmps... Sure. That's not what I'm suggesting. Earlier versions of the patch kept an array of buffer headers exactly because of that. I do not think that locking all buffers for sorting them is an option. So on the whole, I think that locking buffers for sorting is probably not possible with the simple (and efficient) lightweight approach used in the patch. Yes, the other version has a higher space overhead. I'm not convinced that's meaningful in comparison to the size of shared buffers. 
And I rather doubt it's a loss performance-wise in a loaded server. All the buffer headers are touched on other cores, and doing the sort with indirection will greatly increase bus traffic. The good news, as I argued before, is that the order is only advisory to help with performance; the correctness requirement is really that all checkpoint buffers are written and fsync is called in the end, and that does not depend on the buffer order. That is how it currently works anyway. It's not particularly desirable to have a performance feature that works less well if the server is heavily and concurrently loaded. The likelihood of bogus sort results will increase with the churn rate in shared buffers. Andres --- Please excuse brevity and formatting - I am writing this on my mobile phone. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Fix pg_dump to dump shell types.
On 8/9/15 6:23 PM, Tom Lane wrote: It looks to me like the reason for this is that pg_dump forces the typacl of a type to be '{=U}' when reading the schema data for a pre-9.2 type, rather than reading it as NULL (ie default permissions) which would result in not printing any grant/revoke commands for the object. I do not see a good reason for that; quite aside from this problem, it means there is one more place that knows the default permissions for a type than there needs to be. Peter, what was the rationale? This was probably just copied from how proacl and lanacl are handled, which predate typacl by quite a bit. Maybe there was a reason in those days. It might also have something to do with how owner privileges are handled. An explicit '{=U}' doesn't create owner privileges, unlike a null value in that field. Maybe this is necessary if you dump and restore between databases with different user names. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] fix oversight converting buf_id to Buffer
Attached patch fixes oversights converting buf_id to Buffer in PrintBufferDescs() and InvalidateBuffer(). Especially for the latter, the reason we haven't seen any reports of the issue might be that it needs certain concurrent conditions to be true. Along the line, it also changes all direct maths against buf_id to use BufferDescriptorGetBuffer() instead. Regards, Qingqing
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
new file mode 100644
index e4b2558..2e9a7c7
*** a/src/backend/storage/buffer/bufmgr.c
--- b/src/backend/storage/buffer/bufmgr.c
*** retry:
*** 1273,1279 ****
  		UnlockBufHdr(buf);
  		LWLockRelease(oldPartitionLock);
  		/* safety check: should definitely not be our *own* pin */
! 		if (GetPrivateRefCount(buf->buf_id) > 0)
  			elog(ERROR, "buffer is pinned in InvalidateBuffer");
  		WaitIO(buf);
  		goto retry;
--- 1273,1279 ----
  		UnlockBufHdr(buf);
  		LWLockRelease(oldPartitionLock);
  		/* safety check: should definitely not be our *own* pin */
! 		if (GetPrivateRefCount(BufferDescriptorGetBuffer(buf)) > 0)
  			elog(ERROR, "buffer is pinned in InvalidateBuffer");
  		WaitIO(buf);
  		goto retry;
*** ReleaseAndReadBuffer(Buffer buffer,
*** 1426,1441 ****
  static bool
  PinBuffer(volatile BufferDesc *buf, BufferAccessStrategy strategy)
  {
! 	int		b = buf->buf_id;
  	bool	result;
  	PrivateRefCountEntry *ref;

! 	ref = GetPrivateRefCountEntry(b + 1, true);
  	if (ref == NULL)
  	{
  		ReservePrivateRefCountEntry();
! 		ref = NewPrivateRefCountEntry(b + 1);

  		LockBufHdr(buf);
  		buf->refcount++;
--- 1426,1441 ----
  static bool
  PinBuffer(volatile BufferDesc *buf, BufferAccessStrategy strategy)
  {
! 	Buffer	b = BufferDescriptorGetBuffer(buf);
  	bool	result;
  	PrivateRefCountEntry *ref;

! 	ref = GetPrivateRefCountEntry(b, true);
  	if (ref == NULL)
  	{
  		ReservePrivateRefCountEntry();
! 		ref = NewPrivateRefCountEntry(b);

  		LockBufHdr(buf);
  		buf->refcount++;
*** PinBuffer(volatile BufferDesc *buf, Buff
*** 1460,1467 ****
  	ref->refcount++;
  	Assert(ref->refcount > 0);
! 	ResourceOwnerRememberBuffer(CurrentResourceOwner,
! 								BufferDescriptorGetBuffer(buf));
  	return result;
  }
--- 1460,1466 ----
  	ref->refcount++;
  	Assert(ref->refcount > 0);
! 	ResourceOwnerRememberBuffer(CurrentResourceOwner, b);
  	return result;
  }
*** PinBuffer(volatile BufferDesc *buf, Buff
*** 1489,1511 ****
  static void
  PinBuffer_Locked(volatile BufferDesc *buf)
  {
! 	int		b = buf->buf_id;
  	PrivateRefCountEntry *ref;

  	/*
  	 * As explained, We don't expect any preexisting pins. That allows us to
  	 * manipulate the PrivateRefCount after releasing the spinlock
  	 */
! 	Assert(GetPrivateRefCountEntry(b + 1, false) == NULL);

  	buf->refcount++;
  	UnlockBufHdr(buf);

! 	ref = NewPrivateRefCountEntry(b + 1);
  	ref->refcount++;
! 	ResourceOwnerRememberBuffer(CurrentResourceOwner,
! 								BufferDescriptorGetBuffer(buf));
  }

  /*
--- 1488,1509 ----
  static void
  PinBuffer_Locked(volatile BufferDesc *buf)
  {
! 	Buffer	b = BufferDescriptorGetBuffer(buf);
  	PrivateRefCountEntry *ref;

  	/*
  	 * As explained, We don't expect any preexisting pins. That allows us to
  	 * manipulate the PrivateRefCount after releasing the spinlock
  	 */
! 	Assert(GetPrivateRefCountEntry(b, false) == NULL);

  	buf->refcount++;
  	UnlockBufHdr(buf);

! 	ref = NewPrivateRefCountEntry(b);
  	ref->refcount++;
! 	ResourceOwnerRememberBuffer(CurrentResourceOwner, b);
  }

  /*
*** static void
*** 1520,1533 ****
  UnpinBuffer(volatile BufferDesc *buf, bool fixOwner)
  {
  	PrivateRefCountEntry *ref;
--- 1518,1531 ----
  UnpinBuffer(volatile BufferDesc *buf, bool fixOwner)
  {
  	PrivateRefCountEntry *ref;
+ 	Buffer	b = BufferDescriptorGetBuffer(buf);

  	/* not
Re: [HACKERS] WIP: SCRAM authentication
On 08/09/2015 07:19 PM, Michael Paquier wrote: ... during my exchange with Michael, I was thinking about the bug potential of taking the password field and multiplexing it in some way, which is significant. There is a definite risk of making this too complicated and we'll need to contrast that against ease-of-migration, because complicated mechanisms tend to be less secure due to user error. Sure. That's why I am all in for adding a compatibility GUC or similar that enforces the removal of old verifier types after marking those as deprecated for a couple of years as there's surely a significant risk to keep old passwords around or bad pg_hba entries. Still we need IMO a way for a user to save multiple verifiers generated from a client to manage carefully the password verifier aging, deprecations and support removal. That still falls under the heading of possibly too complicated though. As I see it, there's two potential migration paths: 1. Allow multiple verifiers for each login role (Heikki's plan). 2. Set verifier type per login role. (2) has the advantage of not requiring a bunch of new scaffolding (although it will require a little) and thus being less likely to introduce new bugs. It also doesn't require adding new catalog structures which are really only needed for the migration period, and after which will become a wart (i.e. having multiple verifiers per login role). In real migration terms, though, (2) has some major drawbacks in terms of making migration much harder. a) it won't work for Heroku and other 1-login-per-database hosting. b) moving to multiple roles from single roles per app is a painful process currently. For (a), one could argue that these are good candidates for all at once migrations, and that moving to SCRAM will depend on the host supporting it. Someone from Heroku could speak up here. 
For (b), there are a lot of things we could do to make migrating to a multiple-role infra much easier for users, which would continue to be useful even after the migration to SCRAM is history: * remove the role requirement for ALTER DEFAULT PRIVILEGES, and ... * add ALTER ROLE ___ ALTER DEFAULT OWNER, a command which sets the default owner of newly created objects by that login role to a different role of which they are a member. Alternatively, add a way to make a default SET ROLE whenever a login role logs in. These two changes, or changes like them that serve the same purpose, would allow us to prescribe the following migration path for most users: 1. add a new login role which is a member of the old login role and uses SCRAM; 2. set the defaults for that role so that its objects and permissions belong to the parent role; 3. move all applications to using SCRAM and the new role; 4. disable logins on the old, parent role. It's currently (2) which is painfully difficult, and could be made less so via the two features recommended above. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: SCRAM authentication
* Robert Haas (robertmh...@gmail.com) wrote: On Sat, Aug 8, 2015 at 1:23 PM, Heikki Linnakangas hlinn...@iki.fi wrote: On 08/08/2015 04:27 PM, Robert Haas wrote: I don't see that there's any good reason to allow the same password to be stored in the catalog encrypted more than one way, Sure there is. If you want to be able to authenticate using different mechanism, you need the same password encrypted in different ways. SCRAM uses verifier that's derived from the password in one way, MD5 authentication needs an MD5 hash, and yet other protocols have other requirements. Why do we need to be able to authenticate using more than one mechanism? If you have some clients that can't support SCRAM yet, you might as well continue using MD5 across the board until that changes. You're not going to get much real security out of using MD5 for some authentication attempts and SCRAM for other ones, and the amount of infrastructure we're proposing to introduce to support that is pretty substantial. I agree with Josh that requiring a hard switch simply won't be acceptable to our user community and if decide there can only be one running at a time then we'll never get people to move off md5 and therefore we wouldn't ever make real progress here. Apologies to Josh if I've misconstrued anything in his excellent response. and I don't think there's any good reason to introduce the PASSWORD VERIFIER terminology. I think we should store (1) your password, either encrypted or unencrypted; and (2) the method used to encrypt it. And that's it. Like Joe and Stephen, I actually find it highly confusing that we call the MD5 hash an encrypted password. The term password verifier is fairly common in the specifications of authentication mechanisms. I think we should adopt it. OK, but it sure looked from Michael's syntax description like you wrote PASSWORD VERIFIER (md5 'the_actual_password'). Or at least that was my impression from reading it, maybe I got it wrong. 
If you want to introduce ALTER USER ... PASSWORD VERIFIER as alternative syntax for what we now call ALTER USER ... ENCRYPTED PASSWORD, that works for me. But a plaintext password shouldn't be called a password verifier under the terminology you are using here, IIUC. Apologies for my incomplete up-thread definition of a password verifier, it was a bit off-the-cuff while I was out of the country and referred to the specific password verifier in SCRAM, which isn't a cleartext or encrypted password or a simple hash of it, or a salted hash. To flip it around as a positive definition instead of the negative it's not X, Y or Z which I provided up-thread, a password verifier is a general term for anything that can be used to verify that the client knows the right password, so it technically could be the cleartext version of the password, or a simple hash of the password, or an encrypted version of the password, or pretty much anything else that works for the protocol. The reason it came about was because of this very issue that there wasn't a general term for the value: it was referred to as a salted hash, or encrypted, or just hashed, etc, and those terms don't apply to everything that can be used today as a password verifier. As such, it's actually correct usage in this case as it's a general term, rather than the specific and incorrect term encrypted which we have today. Thanks! Stephen
Re: [HACKERS] Asynchronous execution on FDW
* Robert Haas (robertmh...@gmail.com) wrote: On Mon, Aug 10, 2015 at 3:23 AM, Heikki Linnakangas hlinn...@iki.fi wrote: I've marked this as rejected in the commitfest, because others are working on a more general solution with parallel workers. That's still work-in-progress, and it's not certain if it's going to make it into 9.6, but if it does it will largely render this obsolete. We can revisit this patch later in the release cycle, if the parallel scan patch hasn't solved the same use case by then. I think the really important issue for this patch is the one discussed here: http://www.postgresql.org/message-id/ca+tgmoaijk1svzw_gkfu+zssxcijkfelqu2aomvuphpsfw4...@mail.gmail.com I agree that it'd be great to figure out the answer to #2, but I'm also of the opinion that we can either let the user tell us through the use of the GUCs proposed in the patch or simply not worry about the potential for time wastage associated with starting them all at once, as you suggested there. You raised an important issue there but never really expressed an opinion on the points I raised, here or on the other thread. And neither did anyone else except the patch author who, perhaps unsurprisingly, thinks it's OK. I wish we could get more discussion about that. When I read the proposal, I had the same reaction that it didn't seem like quite the right place and it further bothered me that it was specific to FDWs. Perhaps not surprisingly, as I authored it, but I'm still a fan of my proposal #1 here: http://www.postgresql.org/message-id/20131104032604.gb2...@tamriel.snowman.net More generally, I completely agree with the position (I believe yours, but I might be misremembering) that we want to have this async capability independently and in addition to parallel scan. I don't believe one obviates the advantages of the other. Thanks! Stephen
Re: [HACKERS] FSM versus GIN pending list bloat
On Tue, Aug 4, 2015 at 12:38 PM, Jeff Janes jeff.ja...@gmail.com wrote: On Tue, Aug 4, 2015 at 1:39 AM, Simon Riggs si...@2ndquadrant.com wrote: On 4 August 2015 at 06:03, Jeff Janes jeff.ja...@gmail.com wrote: The attached proof of concept patch greatly improves the bloat for both the insert and the update cases. You need to turn on both features: adding the pages to fsm, and vacuuming the fsm, to get the benefit (so JJ_GIN=3). The first of those two things could probably be adopted for real, but the second probably is not acceptable. What is the right way to do this? Could a variant of RecordFreeIndexPage bubble the free space up the map immediately rather than waiting for a vacuum? It would only have to move up until it found a page with freespace already recorded in it, which the vast majority of the time would mean observing up one level and then not writing to it, assuming the pending list pages remain well clustered. You make a good case for action here since insert-only tables with GIN indexes on text are a common use case for GIN. Why would vacuuming the FSM be unacceptable? With a large gin_pending_list_limit it makes sense. But with a smallish gin_pending_list_limit (like the default 4MB) this could be called a lot (multiple times a second during some spurts), and would read the entire fsm each time. If it is unacceptable, perhaps we can avoid calling it every time, or simply have FreeSpaceMapVacuum() terminate more quickly on some kind of 80/20 heuristic for this case. Or maybe it could be passed a range of blocks which need vacuuming, so it concentrated on that range. But from the README file, it sounds like it is already supposed to be bubbling up. I'll have to see just what's going on there when I get a chance. Before making changes to the FSM code to make immediate summarization possible, I decided to quantify the effect of vacuuming the entire fsm.
Going up to 5 GB of index size, the time taken to vacuum the entire FSM one time for each GIN_NDELETE_AT_ONCE was undetectable. Based on that, I made this patch which vacuums it one time per completed ginInsertCleanup, which should be far less than once per GIN_NDELETE_AT_ONCE. I would be interested in hearing what people with very large GIN indexes think of it. It does seem like at some point the time needed must become large, but from what I can tell that point is way beyond what someone is likely to have for an index on an unpartitioned table. I have a simple test case that inserts an array of 101 md5 digests into each row. With 10_000 of these rows inserted into an already indexed table, I get 40MB for the table and 80MB for the index unpatched. With the patch, I get 7.3 MB for the index. Cheers, Jeff gin_fast_freespace_v001.patch Description: Binary data gin_freespace2.pl Description: Binary data
Re: [HACKERS] Summary of plans to avoid the annoyance of Freezing
Having a freeze map would be wholly unnecessary if we don't ever need to freeze whole tables again. Freezing would still be needed on individual blocks where an old row has been updated or deleted; a freeze map would not help there either. So there is no conflict, but options 2) and 3) are completely redundant if we go for 5). After investigation, I now think 5) is achievable in 9.6, but if I am wrong for whatever reason, we have 2) as a backstop.
Re: [HACKERS] [COMMITTERS] pgsql: Fix pg_dump to dump shell types.
I wrote: But now that you mention it, isn't that completely broken? What pg_dump actually prints given this made-up data is REVOKE ALL ON TYPE myshell FROM PUBLIC; REVOKE ALL ON TYPE myshell FROM postgres; GRANT ALL ON TYPE myshell TO PUBLIC; which seems like a completely insane interpretation. There is no way that dumping a type from a pre-typacl database and restoring it into a newer one should end up with the type's owner having no privileges on it. I'm astonished that we've not gotten bug reports about that. ... of course, the reason for no field reports is that the owner still has privileges, as she inherits them from PUBLIC. Doesn't make this any less broken though. regards, tom lane
Re: [HACKERS] linux sparc compile issue
On 2015-08-10 17:36:57 -0400, Tom Lane wrote: Waldemar Brodkorb w...@openadk.org writes: while doing regular builds via buildroot autobuilders a compile problem for sparc 32bit v8 was found. It seems the defines for Linux are other than for Solaris. Following patch fixes it for buildroot: The gcc predefines for Linux are __sparc_v8__/__sparc_v7__ I've applied your suggested patch for this, but I'm a bit curious what version of gcc you are using; our code's been like that for a very long time and nobody complained before. Hm. Could this be caused by using a new solaris studio compiler? It apparently uses gcc as a frontend and thus might actually reach that bit of code. Greetings, Andres Freund
Re: [HACKERS] tap tests remove working directories
On 08/10/2015 10:32 AM, Andrew Dunstan wrote: On 08/10/2015 09:55 AM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: On 08/09/2015 08:58 PM, Michael Paquier wrote: Sure. Attached is what I have in mind. Contrary to your version we keep around temp paths should a run succeed after one that has failed when running make check multiple times in a row. Perhaps it does not matter much in practice as log files get removed at each new run but I think that this allows a finer control in case of failure. Feel free to have a look. Actually, I don't think this is a very good idea. You could end up with a whole series of opaquely named directories from a series of failing runs. If you want to keep the directory after a failure, and want to do another run, then rename the directory. That's what you have to do with the main regression tests, too. My version also has the benefit of changing exactly 3 lines in the source :-) FWIW, I think we should keep the behavior of the TAP tests as close as possible to the established behavior of the main regression tests. It's certainly possible that there's room for improvement in that benchmark behavior. But let's first converge the test behaviors, and then have a discussion about whether we want changes, and if so change all the tests at the same time. Yeah. To do that we should probably stop using File::Temp to make the directory, and just use a hardcoded name given to File::Path::mkpath. If the directory exists, we'd just remove it first. That won't be a very big change - probably just a handful of lines. Unfortunately, it's rather more complicated. These tests are extremely profligate with space and time, creating not less than 29 data directories in 19 temporary directories, including 14 temp directories in scripts alone, and not counting those in pg_rewind which puts two more data directories in a place which is different from all the others. 
And of course, in those directories (scripts and pg_ctl) that have multiple temp directories, we have no idea which one goes with which set of tests. It's no wonder that these tests take an inordinate amount of time to run - on crake, where it takes 23 seconds for make installcheck to run, it takes 176 seconds for make -C src/bin installcheck to run. My bet is all those initdb calls account for a substantial amount of that time. I think this needs a significant bit of reworking. cheers andrew
Re: [HACKERS] [sqlsmith] subplan variable reference / unassigned NestLoopParams
Andreas Seltenreich seltenre...@gmx.de writes: Tom Lane writes: Andreas Seltenreich seltenre...@gmx.de writes: Tom Lane writes: Well, I certainly think all of these represent bugs: 3 | ERROR: plan should not reference subplan's variable 2 | ERROR: failed to assign all NestLoopParams to plan nodes These appear to be related. The following query produces the former, but if you replace the very last reference of provider with the literal 'bar', it raises the latter error. Fixed that, thanks for the test case! I haven't seen the former since your commit, but there recently were some instances of the latter. The attached queries all throw the error against master at 8752bbb. Turns out my patch only addressed some cases of the underlying issue. I've gone back for a second swipe at it; thanks for the continued testing! BTW, the commit hashes you've been mentioning don't seem to correspond to anything in the master PG repo ... are you working with modified sources? regards, tom lane
Re: [HACKERS] linux sparc compile issue
Waldemar Brodkorb w...@openadk.org writes: while doing regular builds via buildroot autobuilders a compile problem for sparc 32bit v8 was found. It seems the defines for Linux are other than for Solaris. Following patch fixes it for buildroot: The gcc predefines for Linux are __sparc_v8__/__sparc_v7__ I've applied your suggested patch for this, but I'm a bit curious what version of gcc you are using; our code's been like that for a very long time and nobody complained before. regards, tom lane
Re: [HACKERS] fix oversight converting buf_id to Buffer
Hi, That's a very nice catch! Did you trigger the error or just find it when reading the code? On 2015-08-10 12:12:01 -0700, Qingqing Zhou wrote: Attached patch fixes oversights converting buf_id to Buffer in PrintBufferDescs() and InvalidateBuffer(). Especially for the latter, the reason we haven't seen any reports of the issue might be that it needs certain concurrent conditions to be true. PrintBufferDescs() is dead code, so that one is not surprising. I'm also not surprised that the wrong buffer in InvalidateBuffer()'s check doesn't trigger. You'd have to have a cursor open in the current backend that currently pins the off-by-one buffer. I'm too tired right now to look at this, but it generally looked sane. Regards, Andres
Re: [HACKERS] [COMMITTERS] pgsql: Fix pg_dump to dump shell types.
I wrote: Peter Eisentraut pete...@gmx.net writes: This was probably just copied from how proacl and lanacl are handled, which predate typacl by quite a bit. Maybe there was a reason in those days. Hm ... I wonder whether those are well-thought-out either. They're not. Testing with ancient servers shows that we dump very silly grant/revoke state for functions and languages as well, if the source server is too old to have proacl or lanacl (ie, pre-7.3). As with typacl, the silliness is accidentally masked as long as the owner doesn't do something like revoke the privileges granted to PUBLIC. Things work far more sanely with the attached patch, to wit we just leave all object privileges as default if dumping from a version too old to have privileges on that type of object. I think we should back-patch this into all supported branches; it's considerably more likely that older branches would be used to dump from ancient servers. regards, tom lane

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 51b6d3a..87dadbf 100644
*** a/src/bin/pg_dump/pg_dump.c
--- b/src/bin/pg_dump/pg_dump.c
*** getTypes(Archive *fout, int *numTypes)
*** 3513,3519 ****
  	else if (fout->remoteVersion >= 80300)
  	{
  		appendPQExpBuffer(query, "SELECT tableoid, oid, typname, "
! 						  "typnamespace, '{=U}' AS typacl, "
  						  "(%s typowner) AS rolname, "
  						  "typinput::oid AS typinput, "
  						  "typoutput::oid AS typoutput, typelem, typrelid, "
--- 3513,3519 ----
  	else if (fout->remoteVersion >= 80300)
  	{
  		appendPQExpBuffer(query, "SELECT tableoid, oid, typname, "
! 						  "typnamespace, NULL AS typacl, "
  						  "(%s typowner) AS rolname, "
  						  "typinput::oid AS typinput, "
  						  "typoutput::oid AS typoutput, typelem, typrelid, "
*** getTypes(Archive *fout, int *numTypes)
*** 3528,3534 ****
  	else if (fout->remoteVersion >= 70300)
  	{
  		appendPQExpBuffer(query, "SELECT tableoid, oid, typname, "
! 						  "typnamespace, '{=U}' AS typacl, "
  						  "(%s typowner) AS rolname, "
  						  "typinput::oid AS typinput, "
  						  "typoutput::oid AS typoutput, typelem, typrelid, "
--- 3528,3534 ----
  	else if (fout->remoteVersion >= 70300)
  	{
  		appendPQExpBuffer(query, "SELECT tableoid, oid, typname, "
! 						  "typnamespace, NULL AS typacl, "
  						  "(%s typowner) AS rolname, "
  						  "typinput::oid AS typinput, "
  						  "typoutput::oid AS typoutput, typelem, typrelid, "
*** getTypes(Archive *fout, int *numTypes)
*** 3542,3548 ****
  	else if (fout->remoteVersion >= 70100)
  	{
  		appendPQExpBuffer(query, "SELECT tableoid, oid, typname, "
! 						  "0::oid AS typnamespace, '{=U}' AS typacl, "
  						  "(%s typowner) AS rolname, "
  						  "typinput::oid AS typinput, "
  						  "typoutput::oid AS typoutput, typelem, typrelid, "
--- 3542,3548 ----
  	else if (fout->remoteVersion >= 70100)
  	{
  		appendPQExpBuffer(query, "SELECT tableoid, oid, typname, "
! 						  "0::oid AS typnamespace, NULL AS typacl, "
  						  "(%s typowner) AS rolname, "
  						  "typinput::oid AS typinput, "
  						  "typoutput::oid AS typoutput, typelem, typrelid, "
*** getTypes(Archive *fout, int *numTypes)
*** 3558,3564 ****
  		appendPQExpBuffer(query,
  						  "SELECT (SELECT oid FROM pg_class WHERE relname = 'pg_type') AS tableoid, "
  						  "oid, typname, "
! 						  "0::oid AS typnamespace, '{=U}' AS typacl, "
  						  "(%s typowner) AS rolname, "
  						  "typinput::oid AS typinput, "
  						  "typoutput::oid AS typoutput, typelem, typrelid, "
--- 3558,3564 ----
  		appendPQExpBuffer(query,
  						  "SELECT (SELECT oid FROM pg_class WHERE relname = 'pg_type') AS tableoid, "
  						  "oid, typname, "
! 						  "0::oid AS typnamespace, NULL AS typacl, "
  						  "(%s typowner) AS rolname, "
  						  "typinput::oid AS typinput, "
  						  "typoutput::oid AS typoutput, typelem, typrelid, "
*** getAggregates(Archive *fout, DumpOptions
*** 4249,4255 ****
  						  "CASE WHEN aggbasetype = 0 THEN 0 ELSE 1 END AS pronargs, "
  						  "aggbasetype AS proargtypes, "
  						  "(%s aggowner) AS rolname, "
! 						  "'{=X}' AS aggacl "
  						  "FROM pg_aggregate "
  						  "where oid > '%u'::oid",
  						  username_subquery,
--- 4249,4255 ----
  						  "CASE WHEN aggbasetype = 0 THEN 0 ELSE 1 END AS pronargs, "
  						  "aggbasetype AS proargtypes, "
  						  "(%s aggowner) AS rolname, "
! 						  "NULL AS aggacl "
  						  "FROM pg_aggregate "
  						  "where oid > '%u'::oid",
  						  username_subquery,
*** getAggregates(Archive *fout, DumpOptions
*** 4264,4270 ****
  						  "CASE WHEN aggbasetype = 0 THEN 0 ELSE 1 END AS pronargs, "
  						  "aggbasetype AS proargtypes, "
  						  "(%s aggowner) AS rolname, "
! 						  "'{=X}' AS aggacl "
  						  "FROM pg_aggregate "
  						  "where oid > '%u'::oid",
  						  username_subquery,
--- 4264,4270 ----
  						  "CASE WHEN aggbasetype = 0 THEN 0 ELSE 1 END AS pronargs, "
  						  "aggbasetype AS proargtypes, "
  						  "(%s aggowner) AS rolname, "
!
Re: [HACKERS] [patch] A \pivot command for psql
On Mon, Aug 10, 2015 at 07:10:41PM +0200, Daniel Verite wrote: David Fetter wrote: I'm working up a proposal to add (UN)PIVOT support to the back-end. I was under the impression that a server-side PIVOT *with dynamic columns* was just unworkable as an SQL query, because it couldn't be prepared if it existed. That depends on what you mean by dynamic columns. The approach taken in the tablefunc extension is to use functions which return SETOF RECORD, which in turn need to be cast at runtime. At least one other implementation takes two separate approaches, both interesting at least from my point of view. The first is to spell out all the columns in the query, which is not dynamic columns in any reasonable sense, as hand-crafting one piece of SQL whose purpose is to generate the actual pivot SQL is not a reasonable level of dynamic. The second, more on point, is to specify a serialization for the rows in the dynamic columns case. Their syntax is PIVOT XML, but I would rather do something more like PIVOT (SERIALIZATION XML). A third idea I have only roughed out feels a bit like cheating: creating a function which returns two distinct rowtypes via REFCURSORs or similar. The first result, used to create a CAST, spells out the row type of the second. Am I wrong on that? I feel like you guys are all telling me that \pivot should happen on the server, but the point that it would not be realistic to begin with is not considered. I think that starting the conversation again, especially at this stage of the 9.6 cycle, is a very good thing, whatever its outcome. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] ON CONFLICT DO UPDATE using EXCLUDED.column gives an error about mismatched types
On Wed, Aug 5, 2015 at 12:58 AM, Amit Langote langote_amit...@lab.ntt.co.jp wrote: I forgot to mention one thing later yesterday. The way exclRelTlist is initialized, all the way in the beginning (transformOnConflictClause), is most probably to blame. It uses expandRelAttrs() for other valid reasons; but within it, expandRTE() is called with 'false' for include_dropped (columns). I applied the attached (perhaps ugly) fix, and it seemed to fix the issue. But, I guess you'll be able to come up with some better fix. Thanks for working on this. I think you have this right, but I also don't think we should discuss this fix outside of the context of a wider discussion about the excluded targetlist. What I'm going to do is roll this into my own pending patch to fix the issue with wholerow vars, which is also down to a problem with the excluded targetlist initially generated by calling expandRelAttrs(). Andres might want to take an alternative approach with that, so a consolidation makes sense. -- Peter Geoghegan
Re: [HACKERS] [COMMITTERS] pgsql: Fix pg_dump to dump shell types.
On 08/10/2015 07:29 PM, Tom Lane wrote: I wrote: Peter Eisentraut pete...@gmx.net writes: This was probably just copied from how proacl and lanacl are handled, which predate typacl by quite a bit. Maybe there was a reason in those days. Hm ... I wonder whether those are well-thought-out either. They're not. Testing with ancient servers shows that we dump very silly grant/revoke state for functions and languages as well, if the source server is too old to have proacl or lanacl (ie, pre-7.3). As with typacl, the silliness is accidentally masked as long as the owner doesn't do something like revoke the privileges granted to PUBLIC. Things work far more sanely with the attached patch, to wit we just leave all object privileges as default if dumping from a version too old to have privileges on that type of object. I think we should back-patch this into all supported branches; it's considerably more likely that older branches would be used to dump from ancient servers. FYI this has fixed the problem that was encountered in cross-version upgrade testing. cheers andrew
Re: [HACKERS] [COMMITTERS] pgsql: Fix pg_dump to dump shell types.
Peter Eisentraut pete...@gmx.net writes: On 8/9/15 6:23 PM, Tom Lane wrote: It looks to me like the reason for this is that pg_dump forces the typacl of a type to be '{=U}' when reading the schema data for a pre-9.2 type, rather than reading it as NULL (ie default permissions) which would result in not printing any grant/revoke commands for the object. I do not see a good reason for that; quite aside from this problem, it means there is one more place that knows the default permissions for a type than there needs to be. Peter, what was the rationale? This was probably just copied from how proacl and lanacl are handled, which predate typacl by quite a bit. Maybe there was a reason in those days. Hm ... I wonder whether those are well-thought-out either. It might also have something to do with how owner privileges are handled. An explicit '{=U}' doesn't create owner privileges, unlike a null value in that field. Maybe this is necessary if you dump and restore between databases with different user names. But now that you mention it, isn't that completely broken? What pg_dump actually prints given this made-up data is REVOKE ALL ON TYPE myshell FROM PUBLIC; REVOKE ALL ON TYPE myshell FROM postgres; GRANT ALL ON TYPE myshell TO PUBLIC; which seems like a completely insane interpretation. There is no way that dumping a type from a pre-typacl database and restoring it into a newer one should end up with the type's owner having no privileges on it. I'm astonished that we've not gotten bug reports about that. regards, tom lane
Re: [HACKERS] checkpointer continuous flushing
On Tue, Aug 11, 2015 at 4:28 AM, Andres Freund wrote: On August 10, 2015 8:24:21 PM GMT+02:00, Fabien COELHO wrote: You can't allocate 4GB with palloc(), it has a builtin limit against allocating more than 1GB. Argh, too bad, I assumed very naively that palloc was malloc in disguise. It is, but there's some layering (memory pools/contexts) on top. You can get huge allocations with palloc_huge. palloc_huge does not exist yet ;) There is either repalloc_huge or palloc_extended now, though implementing one would be trivial. -- Michael
Re: [HACKERS] fix oversight converting buf_id to Buffer
On Mon, Aug 10, 2015 at 4:15 PM, Andres Freund and...@anarazel.de wrote: That's a very nice catch! Did you trigger the error or just find it when reading the code? My colleagues hit the issue during some stress testing: I am not clear on the exact repro, but from the failed assertion the cause is fairly clear. Regards, Qingqing
Re: [HACKERS] [patch] A \pivot command for psql
On 08/10/2015 04:03 PM, David Fetter wrote: Am I wrong on that? I feel like you guys are all telling me that \pivot should happen on the server, but the point that it would not be realistic to begin with is not considered. I think that starting the conversation again, especially at this stage of the 9.6 cycle, is a very good thing, whatever its outcome. Talk to Atri Sharma -- he recently indicated to me that he might be working on built-in pivot in the near-term future (if not already started). -- Joe Conway Crunchy Data Enterprise PostgreSQL http://crunchydata.com
Re: [HACKERS] Test code is worth the space
On Mon, Aug 10, 2015 at 2:02 AM, Simon Riggs si...@2ndquadrant.com wrote: Almost every patch I review has either zero or insufficient tests. If we care about robustness, then we must discuss tests. Here are my two recent experiences: I agree we could do with x10 as many tests, but that doesn't mean all tests make sense, so there needs to be some limiting principles to avoid adding pointless test cases. I recently objected to adding a test case to Isolation tester for the latest ALTER TABLE patch because in that case there is no concurrent behavior to test. There is already a regression test that tests lock levels for those statements, so in my view it is more sensible to modify the existing test than to add a whole new isolation test that does nothing more than demonstrate the lock levels work as described, which we already knew. On another review I suggested we add a function to core to allow it to be used in regression tests. A long debate ensued, deciding that we must be consistent and put diagnostic functions in contrib. My understanding is that we are not allowed to write regression tests that use contrib modules, yet the consistent place to put diagnostic functions is contrib - therefore, we are never allowed to write tests utilizing diagnostic functions. We are allowed to put modules for testing in another directory, but these are not distributed to users so cannot be used for production diagnosis. Systemic fail, advice needed. There are actually two ways to do this. One model is the dummy_seclabel module. The build system arranges for that to be available when running the core regression tests, so they can use it. And the dummy_seclabel test does. There are actually a couple of other loadable modules that are used by the core regression tests in this kind of way (refint and autoinc, from contrib/spi). The other model is what I'll call the test_decoding model. Since the core code can't be tested without a module, we have a module. 
But then the tests are in the module's directory (contrib/test_decoding/{sql,expected}) not the core regression tests. In general, I have a mild preference for the second model. It seems more scalable, and keeps the core tests quick to run, which is appropriate for more obscure tests that are unlikely to break very often. But the first model can also be done, as shown by the fact that we have in fact done it several times. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Commitfest remaining Needs Review items
On 08/10/2015 10:46 AM, Atri Sharma wrote: * Unique Joins Still needs to be reviewed. Any volunteers? Can take this one up, if it's within my limits. Thanks! I've added you as reviewer to that. - Heikki
Re: [HACKERS] [patch] A \pivot command for psql
2015-08-10 6:04 GMT+02:00 David Fetter da...@fetter.org: On Sun, Aug 09, 2015 at 07:29:40PM +0200, Daniel Verite wrote: Hi, I want to suggest a client-side \pivot command in psql, implemented in the attached patch. \pivot takes the current query in the buffer, executes it and displays it pivoted by interpreting the result as: column1 = row in pivoted output column2 = column in pivoted output column3 = value at (row,column) in pivoted output This is really neat work, and thanks for the patch. The issue in this one, as in the crosstab extension, is that it's available only if you are using some particular piece of extra software, in this case the psql client. I'm working up a proposal to add (UN)PIVOT support to the back-end. Would you like to join in on that? PIVOT and UNPIVOT should be the preferred solution. I can look at implementations in other databases. Pavel Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] Commitfest remaining Needs Review items
On Mon, Aug 10, 2015 at 1:04 PM, Heikki Linnakangas hlinn...@iki.fi wrote: Hello Hackers, There are a few Needs Review items remaining in the July commitfest. Reviewers, please take action - you are holding up the commitfest. In addition to these items, there are a bunch of items in Ready for Committer state. Committers: please help with those.

* extends pgbench expressions with functions: Robert signed up as reviewer for this a long time ago. Ping, do you still plan to review this?
* self-defined policy for sepgsql regression test: Stephen, would you have the time to handle this, please?
* Allocate restart_lsn at the time of creating physical replication slot: Andres, can you have a look at this, please?
* Parallel Seq scan: Jeff, Haribabu, you're signed up as reviewers. How's it going?
* Join pushdown support for foreign tables: Ashutosh, KaiGai, Etsuro: You signed up as reviewers in spring, but there hasn't been any activity during this commitfest. What's the status of this patch? Do you plan to review this now?

There are three patches involved here. I have reviewed one of them. I am planning to look at them within a few days.

* replace pg_stat_activity.waiting with something more descriptive: This is not quite ready yet, but has had its fair share of review, so I'm going to move it to the next commitfest. Feel free to continue the discussion and review, though; I just don't want to drag the commitfest because of this.
* Unique Joins: Still needs to be reviewed. Any volunteers?
* checkpoint continuous flushing: Per discussion on that thread, let's just do the sorting part now. Needs some cleanup, but it's almost there.

- Heikki -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] Summary of plans to avoid the annoyance of Freezing
On 2015-08-10 07:26:29 +0100, Simon Riggs wrote: On 10 August 2015 at 07:14, Peter Geoghegan p...@heroku.com wrote: On Sun, Aug 9, 2015 at 11:03 PM, Simon Riggs si...@2ndquadrant.com wrote: If 5) fails to bring a workable solution by the Jan 2016 CF then we commit 2) instead. Is there actually a conflict there? I didn't think so. I didn't explain myself fully, thank you for asking. Having a freeze map would be wholly unnecessary if we don't ever need to freeze whole tables again. Freezing would still be needed on individual blocks where an old row has been updated or deleted; a freeze map would not help there either. So there is no conflict, but options 2) and 3) are completely redundant if we go for 5). After investigation, I now think 5) is achievable in 9.6, but if I am wrong for whatever reason, we have 2) as a backstop. I don't think that's true. You can't ever delete the clog without freezing. There's no need for anti-wraparound scans anymore, but you still need to freeze once. Andres
Re: [HACKERS] Precedence of standard comparison operators
2015-08-10 5:37 GMT+02:00 Noah Misch n...@leadboat.com: On Sun, Aug 09, 2015 at 08:06:11PM -0400, Tom Lane wrote: Noah Misch n...@leadboat.com writes: In SQL:2008 and SQL:2011 at least, =, <, and BETWEEN are all in the same boat. They have no precedence relationships to each other; SQL sidesteps the question by requiring parentheses. They share a set of precedence relationships to other constructs. SQL does not imply whether to put them in one %nonassoc precedence group or in a few, but we can contemplate whether users prefer an error or prefer the 9.4 behavior for affected queries. Part of my thinking was that the 9.4 behavior fails the principle of least astonishment, because I seriously doubt that people expect '=' to be either right-associative or lower priority than '<'. Here's one example:

regression=# select false = true < false;
 ?column?
----------
 t
(1 row)

So yeah, I do think that getting a syntax error if you don't use parentheses is the preferable behavior here. If we raise a syntax error, then there should be a very informative message, because the pattern true = 2 > 1 is probably relatively common and it is hard to spot a syntax error in such a trivial expression. Regards Pavel I agree.
Re: [HACKERS] checkpointer continuous flushing
On 2015-08-08 20:49:03 +0300, Heikki Linnakangas wrote: * I think we should drop the flush part of this for now. It's not as clearly beneficial as the sorting part, and adds a great deal more code complexity. And it's orthogonal to the sorting patch, so we can deal with it separately. I don't agree. For one, I've seen it cause rather big latency improvements, and we're horrible at that. But more importantly, I think the requirements of the flush logic influence how exactly the sorting is done. Splitting them will just make it harder to do the flushing in a not-too-big patch. * Is it really necessary to parallelize the I/O among tablespaces? I can see the point, but I wonder if it makes any difference in practice. Today it's somewhat common to have databases that are bottlenecked on write IO, with all those writes being done by the checkpointer. If we suddenly do the writes to individual tablespaces separately and sequentially, we'll be bottlenecked on the peak IO of a single tablespace. * Is there ever any harm in sorting the buffers? The GUC is useful for benchmarking, but could we leave it out of the final patch? Agreed. * Do we need to worry about exceeding the 1 GB allocation limit in AllocateCheckpointBufferIds? It's enough for 2 TB of shared_buffers. That's a lot, but it's not totally crazy these days that someone might do that. At the very least, we need to lower the maximum of shared_buffers so that you can't hit that limit. We can just use the _huge variant? Greetings, Andres Freund
Re: [HACKERS] Summary of plans to avoid the annoyance of Freezing
On 08/10/2015 11:17 AM, Andres Freund wrote: On 2015-08-10 07:26:29 +0100, Simon Riggs wrote: On 10 August 2015 at 07:14, Peter Geoghegan p...@heroku.com wrote: On Sun, Aug 9, 2015 at 11:03 PM, Simon Riggs si...@2ndquadrant.com wrote: If 5) fails to bring a workable solution by the Jan 2016 CF then we commit 2) instead. Is there actually a conflict there? I didn't think so. I didn't explain myself fully, thank you for asking. Having a freeze map would be wholly unnecessary if we don't ever need to freeze whole tables again. Freezing would still be needed on individual blocks where an old row has been updated or deleted; a freeze map would not help there either. So there is no conflict, but options 2) and 3) are completely redundant if we go for 5). After investigation, I now think 5) is achievable in 9.6, but if I am wrong for whatever reason, we have 2) as a backstop. I don't think that's true. You can't ever delete the clog without freezing. There's no need for anti-wraparound scans anymore, but you still need to freeze once. What's your definition of freezing? As long as you remove all dead tuples, you can just leave the rest in place with their original XID+epoch in place, and assume that everything old enough is committed. - Heikki
Re: [HACKERS] CREATE POLICY and RETURNING
Hi, This thread has a pretty thorough discussion of the pros and cons of applying the SELECT policy to other commands. Besides what has been mentioned, I think there is another potential pro: we can enable references to the pseudorelations OLD and NEW in predicates. Now, for UPDATE, the references to the target table in the USING clause are actually references to OLD and the references in the WITH CHECK clause are references to NEW. Logically now USING and WITH CHECK are combined by AND, so we cannot have predicates like

foo(NEW) > 1 OR bar(OLD) > 1   (combine predicates referencing OLD and NEW by an operation other than AND)
NEW.id > OLD.id   (reference both in the same expression)

If we apply the SELECT policy to other commands, we only need one predicate for INSERT/UPDATE/DELETE. That predicate can reference OLD and NEW, just like predicates for triggers and rules. For UPDATE and DELETE, the predicate of SELECT will be applied first (when the target table is scanned) to ensure no information leakage, and their own predicate will be applied later. This doesn't change much for INSERT and DELETE, but it gives users more flexibility when they set predicates for UPDATE. Thanks, Zhaomo
Re: [HACKERS] CREATE POLICY and RETURNING
In case you missed the link to the previous discussion at the bottom, http://www.postgresql.org/message-id/CAHGQGwEqWD=ynqe+zojbpoxywt3xlk52-v_q9s+xofckjd5...@mail.gmail.com
Re: [HACKERS] statement_timeout affects query results fetching?
Thanks for the explanation Robert, that makes total sense. However, it seems like the utility of PG's statement_timeout is much more limited than I thought. In case you're interested, I dug a little further and it seems that Microsoft's client for SQL Server implements the following timeout (source https://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlcommand.commandtimeout%28v=vs.110%29.aspx?f=255MSPPError=-2147217396 ): cumulative time-out (for all network packets that are read during the invocation of a method) for all network reads during command execution or processing of the results. A time-out can still occur after the first row is returned, and does not include user processing time, only network read time. Since it doesn't seem possible to have a clean query-processing-only timeout at the backend, we may be better off doing something similar to the above and enforce timeouts on the client only. Any further thoughts on this would be appreciated. On Sun, Aug 9, 2015 at 5:21 PM, Robert Haas robertmh...@gmail.com wrote: On Sat, Aug 8, 2015 at 11:30 AM, Shay Rojansky r...@roji.org wrote: the entire row in memory (imagine rows with megabyte-sized columns). This makes sense to me; Tom, doesn't the libpq behavior you describe of absorbing the result set as fast as possible mean that a lot of memory is wasted on the client side? It sure does. I can definitely appreciate the complexity of changing this behavior. I think that some usage cases (such as mine) would benefit from a timeout on the time until the first row is sent, this would allow to put an upper cap on stuff like query complexity, for example. Unfortunately, it would not do any such thing. It's possible for the first row to be returned really really fast and then for an arbitrary amount of time to pass and computation to happen before all the rows are returned. A plan can have a startup cost of zero and a run cost of a billion (or a trillion). 
This kind of scenario isn't even particularly uncommon. You just need a plan that looks like this:

Nested Loop
  -> Nested Loop
     -> Nested Loop
        -> Seq Scan
        -> Index Scan
     -> Index Scan
  -> Index Scan

You can just keep pushing more nested loop/index scans on there and the first row will still pop out quite fast. But if the seq-scanned table is large, generating the whole result set can take a long, long time. Even worse, you could have a case like this: SELECT somefunc(a) FROM foo; Now suppose somefunc is normally very quick, but if a = 42 then it does pg_sleep() in a loop until the world ends. You're going to have however many rows of foo have a != 42 pop out near-instantaneously, and then it will go into the tank and not come out until the meaning of life, the universe, and everything is finally revealed. That second case is a little extreme, and a little artificial, but the point is this: just as you don't want the client to have to buffer the results in memory, the server doesn't either. It's not the case that the server computes the rows and sends them to you. Each one is sent to you as it is computed, and in the normal case, at the time the first row is sent, only a small percentage of the total work of the query has been performed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Moving SS_finalize_plan processing to the end of planning
On 10 August 2015 at 07:50, Tom Lane t...@sss.pgh.pa.us wrote: I've started to work on path-ification of the upper planner (finally), and since that's going to be a large patch in any case, I've been looking for pieces that could be bitten off and done separately. One such piece is the fact that SS_finalize_plan (computation of extParams/allParams) currently gets run at the end of subquery_planner; as long as that's true, we cannot have subquery_planner returning paths rather than concrete plans. The attached patch moves that processing to occur just before set_plan_references is run. Basically what this patch does is to divide what had been done in SS_finalize_plan into three steps:

* SS_identify_outer_params determines which outer-query-level Params will be available for the current query level to use, and saves that aside in a new PlannerInfo field root->outer_params. This processing turns out to be the only reason that SS_finalize_plan had to be called in subquery_planner: we have to capture this data before exiting subquery_planner because the upper levels' plan_params lists may change as they plan other subplans.
* SS_attach_initplans does the work of attaching initplans to the parent plan node and adjusting the parent's cost estimate accordingly.
* SS_finalize_plan now *only* does extParam/allParam calculation.

I had to change things around a bit in planagg.c (which was already a hack, and the law of conservation of cruft applies). Otherwise it's pretty straightforward and removes some existing hacks. One notable point is that there's no longer an assumption within SS_finalize_plan that initPlans can only appear in the top plan node. Any objections? Great! I'm very interested in this work, specifically around the grouping_planner() changes too. I've looked over the patch and from what I understand it seems like a good solid step in the right direction.
I was digging around the grouping_planner() last week with the intention of making some changes there to allow GROUP BY before join, but in the end decided to stay well clear of this area until this pathification is done. So far I've managed to keep my changes away from the upper planner and I've added GroupingPath types, which from what I can predict of what you'll be making changes to, I think you'll also need to have grouping_planner() return a few variations of GroupingPath to allow the paths list to be passed up to subquery_planner() and on up to functions like recurse_set_operations() so that they have the option of choosing GroupAggregate / MergeAppend to implement UNION. If I'm right on this, then maybe there's a few things you can copy and paste from the patch I posted here: http://www.postgresql.org/message-id/CAKJS1f-sEcm=gtfs-dqjsocsz-vlhrp_hsrntjjq-s7egn8...@mail.gmail.com specifically around create_grouping_path()? Kind Regards David Rowley -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Test code is worth the space
On 8 August 2015 at 17:47, Noah Misch n...@leadboat.com wrote: We've too often criticized patches for carrying many lines/bytes of test case additions. Let's continue to demand debuggable, secure tests that fail only when something is wrong, but let's stop pushing for test minimalism. Such objections may improve the individual patch, but that doesn't make up for the chilling effect on test contributions. I remember clearly the first time I submitted thorough test coverage with a feature. Multiple committers attacked it in the name of brevity. PostgreSQL would be better off with 10x its current test bulk, even if the average line of test code were considerably less important than we expect today. Almost every patch I review has either zero or insufficient tests. If we care about robustness, then we must discuss tests. Here are my two recent experiences: I agree we could do with x10 as many tests, but that doesn't mean all tests make sense, so there needs to be some limiting principles to avoid adding pointless test cases. I recently objected to adding a test case to Isolation tester for the latest ALTER TABLE patch because in that case there is no concurrent behavior to test. There is already a regression test that tests lock levels for those statements, so in my view it is more sensible to modify the existing test than to add a whole new isolation test that does nothing more than demonstrate the lock levels work as described, which we already knew. On another review I suggested we add a function to core to allow it to be used in regression tests. A long debate ensued, deciding that we must be consistent and put diagnostic functions in contrib. My understanding is that we are not allowed to write regression tests that use contrib modules, yet the consistent place to put diagnostic functions is contrib - therefore, we are never allowed to write tests utilizing diagnostic functions. 
We are allowed to put modules for testing in another directory, but these are not distributed to users so cannot be used for production diagnosis. Systemic fail, advice needed. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] Summary of plans to avoid the annoyance of Freezing
On Sun, Aug 9, 2015 at 11:03 PM, Simon Riggs si...@2ndquadrant.com wrote: If 5) fails to bring a workable solution by the Jan 2016 CF then we commit 2) instead. Is there actually a conflict there? I didn't think so. -- Peter Geoghegan
Re: [HACKERS] Summary of plans to avoid the annoyance of Freezing
On 10 August 2015 at 07:14, Peter Geoghegan p...@heroku.com wrote: On Sun, Aug 9, 2015 at 11:03 PM, Simon Riggs si...@2ndquadrant.com wrote: If 5) fails to bring a workable solution by the Jan 2016 CF then we commit 2) instead. Is there actually a conflict there? I didn't think so. I didn't explain myself fully, thank you for asking. Having a freeze map would be wholly unnecessary if we don't ever need to freeze whole tables again. Freezing would still be needed on individual blocks where an old row has been updated or deleted; a freeze map would not help there either. So there is no conflict, but options 2) and 3) are completely redundant if we go for 5). After investigation, I now think 5) is achievable in 9.6, but if I am wrong for whatever reason, we have 2) as a backstop. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
[HACKERS] Commitfest remaining Needs Review items
Hello Hackers, There are a few Needs Review items remaining in the July commitfest. Reviewers, please take action - you are holding up the commitfest. In addition to these items, there are a bunch of items in Ready for Committer state. Committers: please help with those.

* extends pgbench expressions with functions: Robert signed up as reviewer for this a long time ago. Ping, do you still plan to review this?
* self-defined policy for sepgsql regression test: Stephen, would you have the time to handle this, please?
* Allocate restart_lsn at the time of creating physical replication slot: Andres, can you have a look at this, please?
* Parallel Seq scan: Jeff, Haribabu, you're signed up as reviewers. How's it going?
* Join pushdown support for foreign tables: Ashutosh, KaiGai, Etsuro: You signed up as reviewers in spring, but there hasn't been any activity during this commitfest. What's the status of this patch? Do you plan to review this now?
* replace pg_stat_activity.waiting with something more descriptive: This is not quite ready yet, but has had its fair share of review, so I'm going to move it to the next commitfest. Feel free to continue the discussion and review, though; I just don't want to drag the commitfest because of this.
* Unique Joins: Still needs to be reviewed. Any volunteers?
* checkpoint continuous flushing: Per discussion on that thread, let's just do the sorting part now. Needs some cleanup, but it's almost there.

- Heikki
Re: [HACKERS] Commitfest remaining Needs Review items
* Unique Joins Still needs to be reviewed. Any volunteers? Can take this one up, if its within my limits.
[HACKERS] Summary of plans to avoid the annoyance of Freezing
Freezing is painful for VLDBs and high transaction rate systems. We have a number of proposals to improve things...

1. Allow parallel cores to be used with vacuumdb (Dilip) Idea is if we have to get the job done, let's do it as fast as we can using brute force. Reasonable thinking, but there are more efficient approaches.

2. Freeze Map (Sawada) This works and we have a mostly-ready patch. I'm down to do final review on this, which is why I'm producing this summary and working out what's the best next action for Postgres.

3. Speed up autovacuums when they are triggered to avoid wraparounds (Simon) Idea is to do a VACUUM scan which only freezes tuples. If we dirty a page from freezing we then also prune it, but don't attempt to scan indexes to remove the now-truncated dead tuples. This looks very straightforward, no technical issues. Might even be able to backpatch it. [patch investigated but not finished yet]

4. 64-bit Xids (Alexander) Proposal rejected

5. 64-bit Xid Visibility, with Xid Epoch in page header (Heikki)
* Epoch is stored on new pages, using a new page format. Epoch of existing pages is stored in control file - code changes to bufpage.c look isolated
* All transaction age tests are made using both xid and epoch - the code changes to tqual.c look isolated
* We won't need to issue anti-wraparound vacuums again, ever
* We will still need to issue clog-trimming vacuums, so we'll need a new parameter to control when to start scanning to avoid clog growing too large
* relfrozenxid will be set to InvalidTransactionId once all existing pages have been made visible
* relhintxid and relhintepoch will be required so we have a backstop to indicate where to truncate clog (and same for multixacts)
* There is no need to FREEZE except when UPDATEing/DELETEing pages from previous epochs
* VACUUM FREEZE will only freeze old pages; on a new cluster it will work the same as VACUUM
* VACUUMs that touch every non-all-visible page will be able to set relhintxid to keep clog trimmed, so never need to scan all blocks in table
* Code changes seem fairly isolated
* No action required at pg_upgrade
* Additional benefit: we can move to 32-bit CRC checksums on data pages at the same time as doing this (seamlessly).
* 8 bytes additional space required per page (~0.1% growth in database size)
* (Any other changes to page headers can be made or reserved at this time)

To me 3) would have been useful if we'd done it earlier. Now that we have 2) and 5), I don't see much point pursuing 3). The main options are 2) or 5). Freeze Map (2) makes freezing more efficient for larger tables, but doesn't avoid the need altogether. 5) is a deeper treatment of the underlying problem and is a better architecture for the future of Postgres, IMHO. I was previously a proponent of (2) as a practical way forwards, but my proposal here today is that we don't do anything further on 2) yet, and seek to make progress on 5) instead. If 5) fails to bring a workable solution by the Jan 2016 CF then we commit 2) instead. If Heikki wishes to work on (5), that's good. Otherwise, I think it's something I can understand and deliver by 1 Jan, though likely for 1 Nov CF. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] Asynchronous execution on FDW
I've marked this as rejected in the commitfest, because others are working on a more general solution with parallel workers. That's still work-in-progress, and it's not certain if it's going to make it into 9.6, but if it does it will largely render this obsolete. We can revisit this patch later in the release cycle, if the parallel scan patch hasn't solved the same use case by then. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PL/pgSQL, RAISE and error context
On 07/26/2015 08:34 AM, Pavel Stehule wrote: Hi, here is the complete patch that introduces context filtering on the client side. The core of this patch is trivial and small - almost all of its size is trivial changes in the regression tests, removing the now-useless context. Documentation included; check-world passes. Looks good to me at first glance. I'll mark this as Ready for Committer. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Commitfest remaining Needs Review items
On Mon, Aug 10, 2015 at 1:04 PM, Heikki Linnakangas hlinn...@iki.fi wrote: * Parallel Seq scan Jeff, Haribabu, you're signed up as reviewers. How's it going? The current state of the patch is that there are a few comments by Jeff and others which I need to take care of; however, it would have been better if there were more comments after a detailed review. In the absence of a more detailed review, I will address the existing comments and send an updated patch. Feel free to move it to the next CF, or I will do the same if there are no more comments before you want to close this CF. Many thanks for managing this commitfest, giving each patch its fair share of time for review and discussion, and reviewing as many patches as possible yourself. Your contribution to the systematic progress of the CF is really valuable. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Commitfest remaining Needs Review items
On 2015/08/10 17:10, Ashutosh Bapat wrote: On Mon, Aug 10, 2015 at 1:04 PM, Heikki Linnakangas hlinn...@iki.fi wrote: * Join pushdown support for foreign tables Ashutosh, KaiGai, Etsuro: You signed up as reviewers in spring, but there hasn't been any activity during this commitfest. What's the status of this patch? Do you plan to review it now? There are three patches involved here. I have reviewed one of them. I am planning to look at the others within a few days. Sorry for not having taken any action. I think we should address the issue discussed in [1] first, because I think that would affect the design of the join pushdown API in the core. I'll submit a patch for that within a few days. I'd like to review this in detail after addressing that, if that's OK. Best regards, Etsuro Fujita [1] http://www.postgresql.org/message-id/558a18b3.9050...@lab.ntt.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Rework access method interface
On 2015-08-09 23:56, Alexander Korotkov wrote: Hackers, some time ago I proposed patches implementing CREATE ACCESS METHOD. http://www.postgresql.org/message-id/capphfdsxwzmojm6dx+tjnpyk27kt4o7ri6x_4oswcbyu1rm...@mail.gmail.com As I gather from the comments on my patches, and also from Tom's comment about the AM interface in the tablesample thread, the AM interface needs reworking. And AFAICS the AM interface rework is essential to having a CREATE ACCESS METHOD command. http://www.postgresql.org/message-id/5438.1436740...@sss.pgh.pa.us Cool, I was planning to take a stab at this myself when I have time, so I am glad you posted this. That is why I'd like to show a WIP patch implementing the AM interface rework. The patch is still quite dirty, but I think it illustrates the idea quite clearly. An AM now has a single parameter – a handler function. This handler returns a pointer to an AmRoutine, which holds the same data as pg_am did before. Thus, the API is organized much like the FDW one. I wonder if it would make sense to call it IndexAmRoutine instead, in case we add other AMs (like the sequence AM, or maybe a column store if that's done as an AM) in the future. The patch should probably add an am_handler type, which should be the return type of the AM handler function (see the fdw_handler and tsm_handler types). I also think that the CHECK_PROCEDUREs should be in the place of the original GET_REL_PROCEDUREs (just after the relation check). If the interface must exist, we may as well check for it at the beginning, and not after we've done other work which is useless without the interface. However, this patch appears to take more work than I expected. It has to make many changes spread across many files. Yeah, you need to change the definition and I/O handling of every AM function, but that's to be expected. Also, it appears not so easy to hide amsupport in AmRoutine, because it's needed by the relcache. As a temporary solution it's duplicated in RelationData.
I don't understand this - there is already an AmRoutine in RelationData, so why the need for an additional field just for amsupport? -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Using contrib modules in check (Re: pgsql: Fix BRIN to use SnapshotAny during summarization)
Re: Tom Lane 2015-08-07 928.1438900...@sss.pgh.pa.us Alvaro Herrera alvhe...@alvh.no-ip.org writes: Fix BRIN to use SnapshotAny during summarization This patch added an isolation test that fails unless contrib/pageinspect has been built and installed. I do not find that acceptable. It causes make check-world to fail ... and no, installing the extension during make check-world isn't going to make me happier. I don't really think we need this isolation test at all, but if we do, please fix it to not rely on any extensions. Perhaps looking at pg_relation_size or some such would do? Or you could just issue a query that should use the index, and see if it finds the rows it ought to. Hi, this breaks the Debian package builds as well because we run check-world as a build step. Any chance for a fix/workaround so the nightly master/head builds will succeed again? Christoph -- c...@df7cb.de | http://www.df7cb.de/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tap tests remove working directories
On 08/09/2015 08:58 PM, Michael Paquier wrote: On Sun, Aug 9, 2015 at 11:19 PM, Andrew Dunstan and...@dunslane.net wrote: On 08/09/2015 08:41 AM, Michael Paquier wrote: On Sun, Aug 9, 2015 at 1:40 AM, Andrew Dunstan and...@dunslane.net wrote: On 08/08/2015 09:31 AM, Robert Haas wrote: On Fri, Aug 7, 2015 at 7:17 PM, Andrew Dunstan and...@dunslane.net wrote: That certainly isn't what happens, and given the way this is done in TestLib.pm, using the CLEANUP parameter of File::Temp's tempdir() function, it's not clear how we could do that easily. shot-in-the-dark Set cleanup to false and manually remove the directory later in the code, so that stuff runs only if we haven't died sooner? /shot-in-the-dark Yeah, maybe. I'm thinking of trying to do it more globally, like in src/Makefile.global.in. That way we wouldn't have to add new code to every test file. If we rely on END to clean up the temporary data folder, there is no need to touch each test file; just the test functions called in TestLib.pm could switch a flag to skip the cleanup at the end of the run should an unexpected result be found. Now I am also curious to see what you have in mind for manipulating Makefile.global. See attached. If you have a better idea please post your patch. Sure. Attached is what I have in mind. Unlike your version, mine keeps the temp paths of a failed run around even if a later run succeeds when running make check multiple times in a row. Perhaps it does not matter much in practice, as log files get removed at each new run, but I think this allows finer control in case of failure. Feel free to have a look. Actually, I don't think this is a very good idea. You could end up with a whole series of opaquely named directories from a series of failing runs. If you want to keep the directory after a failure, and want to do another run, then rename the directory. That's what you have to do with the main regression tests, too.
My version also has the benefit of changing exactly 3 lines in the source :-) cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.
On Mon, Aug 10, 2015 at 1:36 PM, Rahila Syed rahilasye...@gmail.com wrote: Hello, Say, 6 bigint counters, 6 float8 counters, and 3 strings up to 80 characters each. So we have a fixed-size chunk of shared memory per backend, and each backend that wants to expose progress information can fill in those fields however it likes, and we expose the results. This would be sorta like the way pg_statistic works: the same columns can be used for different purposes depending on what estimator will be used to access them. After thinking more on this suggestion, I came up with the following generic structure, which can be used to store the progress of any command per backend in shared memory:

struct PgBackendProgress
{
    int32  *counter[COMMAND_NUM_SLOTS];
    float8 *counter_float[COMMAND_NUM_SLOTS];
    char   *progress_message[COMMAND_NUM_SLOTS];
};

COMMAND_NUM_SLOTS will define the maximum number of slots (phases) for any command. The progress of a command will be measured using the progress of each of its phases. For some commands the number of phases can be one, and the rest of the slots will be NULL. Each phase will report n integer counters, n float counters, and a progress message. For some phases, any of the above fields can be NULL. For VACUUM, there can be 3 phases, as discussed in the earlier mails. Phase 1: report 2 integer counters (heap pages scanned and total heap pages), 1 float counter (percentage complete), and a progress message. Phase 2: report 2 integer counters (index pages scanned and total index pages, across all indexes) and a progress message. Phase 3: report 1 integer counter (heap pages vacuumed). This structure can be accessed by the statistics collector to display progress via a new view. I have one question about this: when we're in phase 2 or 3, don't we also need to report the total number of pages scanned, or the percentage of table pages scanned?
As Robert said, phase 2 (meaning lazy_vacuum_index here) and phase 3 (meaning lazy_vacuum_heap here) could be called whenever lazy_scan_heap fills up maintenance_work_mem. And phase 3 could be called at the end of scanning a single page if the table doesn't have an index. So if the vacuum progress checker reports only the current phase information, we would not be able to know where we are now. Regards, -- Masahiko Sawada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Test code is worth the space
On 10 August 2015 at 13:55, Robert Haas robertmh...@gmail.com wrote: On Mon, Aug 10, 2015 at 2:02 AM, Simon Riggs si...@2ndquadrant.com wrote: On another review I suggested we add a function to core to allow it to be used in regression tests. A long debate ensued, deciding that we must be consistent and put diagnostic functions in contrib. My understanding is that we are not allowed to write regression tests that use contrib modules, yet the consistent place to put diagnostic functions is contrib - therefore, we are never allowed to write tests utilizing diagnostic functions. We are allowed to put modules for testing in another directory, but these are not distributed to users, so they cannot be used for production diagnosis. Systemic fail, advice needed. There are actually two ways to do this. One model is the dummy_seclabel module. The build system arranges for that to be available when running the core regression tests, so they can use it. And the dummy_seclabel test does. There are actually a couple of other loadable modules that are used by the core regression tests in this kind of way (refint and autoinc, from contrib/spi). The other model is what I'll call the test_decoding model. Since the core code can't be tested without a module, we have a module. But then the tests are in the module's directory (contrib/test_decoding/{sql,expected}), not the core regression tests. In general, I have a mild preference for the second model. It seems more scalable, and keeps the core tests quick to run, which is appropriate for more obscure tests that are unlikely to break very often. But the first model can also be done, as shown by the fact that we have in fact done it several times. Neither of those uses a diagnostic function that also has value in production environments - for logical decoding we added something to core specifically to allow this: pg_recvlogical.
Anyway, resolving this isn't important anymore because I wish to pursue a different mechanism for freezing, but it's possible I'll hit the same issue. -- Simon Riggs  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] tap tests remove working directories
Andrew Dunstan and...@dunslane.net writes: On 08/09/2015 08:58 PM, Michael Paquier wrote: Sure. Attached is what I have in mind. Contrary to your version we keep around temp paths should a run succeed after one that has failed when running make check multiple times in a row. Perhaps it does not matter much in practice as log files get removed at each new run but I think that this allows a finer control in case of failure. Feel free to have a look. Actually, I don't think this is a very good idea. You could end up with a whole series of opaquely named directories from a series of failing runs. If you want to keep the directory after a failure, and want to do another run, then rename the directory. That's what you have to do with the main regression tests, too. My version also has the benefit of changing exactly 3 lines in the source :-) FWIW, I think we should keep the behavior of the TAP tests as close as possible to the established behavior of the main regression tests. It's certainly possible that there's room for improvement in that benchmark behavior. But let's first converge the test behaviors, and then have a discussion about whether we want changes, and if so change all the tests at the same time. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Don't start streaming after creating a slot in pg_receivexlog
On 2015-07-30 17:14:16 +0900, Michael Paquier wrote: So, perhaps the attached is more convincing then? It just changes --create-slot to leave immediately after creation to address the complain of this thread. -- Michael Pushed that. Thanks! Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] expose confirmed_flush for replication slots
On 2015-07-08 15:01:15 +0300, Marko Tiikkaja wrote:

+ if (confirmed_flush_lsn != InvalidTransactionId)
+     values[i++] = LSNGetDatum(confirmed_flush_lsn);
+ else
+     nulls[i++] = true;

Hm. That comparison is using the wrong datatype, but it appears you only copied my earlier mistake... Fixed back to 9.4 and in your patch. Other notes: * you missed updating test_decoding's regression test output files. * None of the docs were touched. catalogs.sgml definitely needs docs about the new columns, and I see no reason to leave the examples elsewhere stale. Fixed those and committed it. Thanks for the patch! - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers