Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-16 Thread Robert Haas
On Wed, Jul 15, 2015 at 5:03 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 Group labels are essential.

 OK, so this is leading us to the following points:
 - Use a JSON object to define the quorum/priority groups for the sync state.
 - Store it as a GUC, and use the check hook to validate its format,
 which is what we have now with s_s_names
 - Rely on SIGHUP to maintain an in-memory image of the quorum/priority
 sync state
 - Have the possibility to define group labels in this JSON blob, and
 be able to use those labels in a quorum or priority sync definition.
 - For backward-compatibility, use for example s_s_names = 'json' to
 switch to the new system.

Personally, I think we're going to find that using JSON for this
rather than a custom syntax makes the configuration strings two or
three times as long for no discernable benefit.
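
For a purely hypothetical illustration (neither spelling below was agreed in
this thread, and the standby names are made up), the same "any 2 of these
standbys" rule might look like:

    # hypothetical compact syntax
    synchronous_standby_names = '2 (london, nyc, tokyo)'

    # hypothetical JSON spelling of the same rule
    synchronous_standby_names = '{"sync_standbys": {"quorum": 2, "nodes": ["london", "nyc", "tokyo"]}}'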

But I just work here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-16 Thread Simon Riggs
On 16 July 2015 at 18:27, Robert Haas robertmh...@gmail.com wrote:

 On Wed, Jul 15, 2015 at 5:03 AM, Michael Paquier
 michael.paqu...@gmail.com wrote:
  Group labels are essential.
 
  OK, so this is leading us to the following points:
  - Use a JSON object to define the quorum/priority groups for the sync
 state.
  - Store it as a GUC, and use the check hook to validate its format,
  which is what we have now with s_s_names
  - Rely on SIGHUP to maintain an in-memory image of the quorum/priority
  sync state
  - Have the possibility to define group labels in this JSON blob, and
  be able to use those labels in a quorum or priority sync definition.
  - For backward-compatibility, use for example s_s_names = 'json' to
  switch to the new system.

 Personally, I think we're going to find that using JSON for this
 rather than a custom syntax makes the configuration strings two or
 three times as long for


They may well be 2-3 times as long. Why is that a negative?


 no discernable benefit.


Benefits:
* More readable
* Easy to validate
* No additional code required in the server to support this syntax (so no
bugs)
* Developers will immediately understand the format
* Easy to programmatically manipulate in a range of languages

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-16 Thread Robert Haas
On Thu, Jul 16, 2015 at 1:32 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Personally, I think we're going to find that using JSON for this
 rather than a custom syntax makes the configuration strings two or
 three times as long for

 They may well be 2-3 times as long. Why is that a negative?

In my opinion, brevity makes things easier to read and understand.  We
also don't support multi-line GUCs, so if your configuration takes 140
characters, you're going to have a very long line in your
postgresql.conf (and in your pg_settings output, etc.)

 * No additional code required in the server to support this syntax (so no
 bugs)

I think you'll find that this is far from true.  Presumably not any
arbitrary JSON object will be acceptable.  You'll have to parse it as
JSON, and then validate that it is of the expected form.  It may not
be MORE code than implementing a mini-language from scratch, but I
wouldn't expect to save much.

 * Developers will immediately understand the format

I doubt it.  I think any format that we pick will have to be carefully
documented.  People may know what JSON looks like in general, but they
will not immediately know what bells and whistles are available in
this context.

 * Easy to programmatically manipulate in a range of languages

I agree that JSON has that advantage, but I doubt that it is important
here.  I would expect that people might need to generate a new config
string and dump it into postgresql.conf, but that should be easy with
any reasonable format.  I think it will be rare to need to parse the
postgresql.conf string, manipulate it programmatically, and then put it
back.  As we've already said, most configurations are simple and
shouldn't change frequently.  If they're not, or they do, that's a
problem in itself.

However, I'm not trying to ram my idea through; I'm just telling you my opinion.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] Generalized JSON output functions

2015-07-16 Thread Robert Haas
On Wed, Jul 15, 2015 at 12:58 PM, Andrew Dunstan and...@dunslane.net wrote:
 The approach taken was both invasive and broken.

Well, then let's not do it that way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] Generalized JSON output functions

2015-07-16 Thread Robert Haas
On Wed, Jul 15, 2015 at 1:10 PM, Ryan Pedela rped...@datalanche.com wrote:
 Like I said previously, the
 situation with Javascript will hopefully be remedied in a few years with ES7
 anyway.

I don't understand these issues in great technical depth, but if
somebody is arguing that it's OK for PostgreSQL to be difficult to use
for a certain category of user for several years until the next
language rev becomes mainstream, then I disagree.  The fact that
somebody wrote a patch to try to solve a problem means that the thing
in question is a problem for at least that one user.  If he's the only
one, maybe we don't need to care all that much.  If his needs are
representative of a significant user community, we should not turn our
backs on that community, regardless of whether we like the patch he
wrote, and regardless of how well we are meeting the needs of other
communities (like node.js users).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] Generalized JSON output functions

2015-07-16 Thread Pavel Stehule
2015-07-16 19:51 GMT+02:00 Robert Haas robertmh...@gmail.com:

 On Wed, Jul 15, 2015 at 1:10 PM, Ryan Pedela rped...@datalanche.com
 wrote:
  Like I said previously, the
  situation with Javascript will hopefully be remedied in a few years with
 ES7
  anyway.

 I don't understand these issues in great technical depth, but if
 somebody is arguing that it's OK for PostgreSQL to be difficult to use
 for a certain category of user for several years until the next
 language rev becomes mainstream, then I disagree.  The fact that
 somebody wrote a patch to try to solve a problem means that the thing
 in question is a problem for at least that one user.  If he's the only
 one, maybe we don't need to care all that much.  If his needs are
 representative of a significant user community, we should not turn our
 backs on that community, regardless of whether we like the patch he
 wrote, and regardless of how well we are meeting the needs of other
 communities (like node.js users).


I don't think this issue is too hot. How long have we supported XML? The
output format is static - the date format is fixed. How many issues were
there? Was there any issue that was not solvable by casting?

If somebody needs different quoting, then it can be solved by an explicit
cast in the SQL query, and not by hacking our output routines.

Regards

Pavel



 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company



[HACKERS] Bugs in our qsort implementation

2015-07-16 Thread Tom Lane
I've been trying to figure out the crash in qsort reported here:
http://www.postgresql.org/message-id/flat/cal8hzunr2fr1owzhwg-p64gjtnfbbmpx1y2oxmj_xuq3p8y...@mail.gmail.com

I first noticed that our qsort code uses an int to hold some transient
values representing numbers of elements.  Since the passed array size
argument is a size_t, this is broken; the integer could overflow.
I do not think this is a live bug so far as our core code is concerned,
because tuplesort.c still limits itself to no more than INT_MAX items
to be sorted, and nothing else in the core would even approach that.
However, it's in principle a hazard for third-party modules that might try
to sort more than that; and in any case it's a bug waiting to bite us on
the rear whenever somebody decides they have enough RAM that they should
be able to sort more than INT_MAX items.

However, Yiqing reported the crash as occurring here:

Program terminated with signal 11, Segmentation fault.
#0  0x00785180 in med3_tuple (a=0x7f31613f1028, b=0x7f31613f1040, 
c=0x3ffd, cmp_tuple=0x7f43613f1010, state=0x1) at qsort_tuple.c:66
66  {

which is a bit curious because that function does not itself access any
of the data --- it just calls the cmp_tuple function, so even if we'd
somehow computed a bad address, the crash should occur inside the
comparator function, not here.

After awhile a theory occurred to me: the qsort functions recurse without
bothering with a stack depth check, so maybe the SEGV actually represents
running out of stack space.  And after a bit of research, that theory
seems pretty plausible.  It turns out that qsort is guaranteed to recurse
no deeper than log(N) levels, but *only if you take care to recurse on the
smaller partition and iterate on the larger one*.  And we're not doing
that, we just blindly recurse on the left partition.  So given a fairly
huge amount of data and some bad luck in partition-picking, it seems
possible that stack overflow explains this report.

I propose to
(1) fix the code to use a size_t variable rather than int where
appropriate;
(2) teach it to recurse on the smaller partition.
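
For illustration, a minimal sketch of point (2) in generic C (this is not the
actual qsort_tuple.c code; the names qsort_sketch, store, left_n, right_n are
invented here, and a simple last-element Lomuto partition stands in for the
real median-of-three logic):

#include <stddef.h>

/*
 * Sketch only: element counts are size_t, recursion happens on the smaller
 * partition, and the larger one is handled by iterating, which bounds the
 * stack depth at O(log N).
 */
static void
qsort_sketch(int *a, size_t n)
{
	while (n > 1)
	{
		int		pivot = a[n - 1];
		size_t	store = 0;
		size_t	i;
		size_t	left_n;
		size_t	right_n;

		/* Partition: move everything smaller than the pivot to the front. */
		for (i = 0; i < n - 1; i++)
		{
			if (a[i] < pivot)
			{
				int		tmp = a[i];

				a[i] = a[store];
				a[store] = tmp;
				store++;
			}
		}
		/* Put the pivot into its final position. */
		{
			int		tmp = a[store];

			a[store] = a[n - 1];
			a[n - 1] = tmp;
		}

		left_n = store;				/* elements before the pivot */
		right_n = n - store - 1;	/* elements after the pivot */

		if (left_n < right_n)
		{
			qsort_sketch(a, left_n);			/* recurse on smaller part */
			a += store + 1;						/* iterate on larger part */
			n = right_n;
		}
		else
		{
			qsort_sketch(a + store + 1, right_n);
			n = left_n;
		}
	}
}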

It's possible that this issue can only manifest on 9.4 and up where
we have the ability for tuplesort to allocate work arrays approaching
INT_MAX elements.  But I don't have a lot of faith in that; I think the
worst-case stack depth for the way we have it now could be as bad as O(N),
so in principle a crash could be possible with significantly smaller input
arrays.  I think we'd better back-patch this all the way.

regards, tom lane




Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-16 Thread Petr Jelinek

On 2015-07-13 00:36, Tom Lane wrote:


We have a far better model to follow already, namely the foreign data
wrapper API.  In that, there's a single SQL-visible function that returns
a dummy datatype indicating that it's an FDW handler, and when called,
it hands back a struct containing pointers to all the other functions
that the particular wrapper needs to supply (and, if necessary, the struct
could have non-function-pointer fields containing other info the core
system might need to know about the handler).  We could similarly invent a
pseudotype tablesample_handler and represent each tablesample method by
a single SQL function returning tablesample_handler.  Any future changes
in the API for tablesample handlers would then appear as changes in the C
definition of the struct returned by the handler, which requires no
SQL-visible thrashing, hence creates no headaches for pg_upgrade, and it
is pretty easy to design it so that failure to update an external module
that implements the API results in C compiler errors and/or sane fallback
behavior.



(back from vacation, going over this thread)

Yes, this sounds very sane (and we use something similar for logical
decoding plugins, not just FDWs). I wish this had occurred to me before;
I would not have had to spend time on the DDL support, which didn't even get in.



Once we've done that, I think we don't even need a special catalog
representing tablesample methods.  Given FROM foo TABLESAMPLE
bernoulli(...), the parser could just look for a function bernoulli()
returning tablesample_handler, and it's done.  The querytree would have an
ordinary function dependency on that function, which would be sufficient


It does indeed seem possible that we might not need a catalog.

This would also simplify the parser part which currently contains 
specialized function search code as we could most likely just reuse the 
generic code.



to handle DROP dependency behaviors properly.  (On reflection, maybe
better if it's bernoulli(internal) returns tablesample_handler,
so as to guarantee that such a function doesn't create a conflict with
any user-defined function of the same name.)



The probability of conflict with system() seems high, so yeah, we'd
need some kind of differentiator.



PS: now that I've written this rant, I wonder why we don't redesign the
index AM API along the same lines.  It probably doesn't matter much at
the moment, but if we ever get serious about supporting index AM
extensions, I think we ought to consider doing that.


+1

I think this is very relevant to the proposed sequence am patch as well.

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-07-16 Thread Syed, Rahila
Hello,

Naming the GUC pgstat* seems a little inconsistent.
Sorry, there is a typo in the mail. The GUC name is 'track_activity_progress'.

Also, adding the new GUC to src/backend/utils/misc/postgresql.conf.sample
might be helpful 
Yes.  I will update.

Thank you,
Rahila Syed




Re: [HACKERS] LWLock deadlock and gdb advice

2015-07-16 Thread Jeff Janes
On Wed, Jul 15, 2015 at 8:44 AM, Heikki Linnakangas hlinn...@iki.fi wrote:


 Both. Here's the patch.

 Previously, LWLockAcquireWithVar set the variable associated with the lock
 atomically with acquiring it. Before the lwlock-scalability changes, that
 was straightforward because you held the spinlock anyway, but it's a lot
 harder/expensive now. So I changed the way acquiring a lock with a variable
 works. There is now a separate flag, LW_FLAG_VAR_SET, which indicates that
 the current lock holder has updated the variable. The LWLockAcquireWithVar
 function is gone - you now just use LWLockAcquire(), which always clears
 the LW_FLAG_VAR_SET flag, and you can call LWLockUpdateVar() after that if
 you want to set the variable immediately. LWLockWaitForVar() always waits
 if the flag is not set, i.e. it will not return regardless of the
 variable's value, if the current lock-holder has not updated it yet.


I ran this for a while without casserts and it seems to work.  But with
casserts, I get failures in the autovac process on the GIN index.

I don't see how this is related to the LWLock issue, but I didn't see it
without your patch.  Perhaps the system just didn't survive long enough to
uncover it without the patch (although it shows up pretty quickly).  It
could just be an overzealous Assert, since the casserts off didn't show
problems.

bt and bt full are shown below.

Cheers,

Jeff

#0  0x003dcb632625 in raise () from /lib64/libc.so.6
#1  0x003dcb633e05 in abort () from /lib64/libc.so.6
#2  0x00930b7a in ExceptionalCondition (
conditionName=0x9a1440 "!(((PageHeader) (page))->pd_special >=
(__builtin_offsetof (PageHeaderData, pd_linp)))", errorType=0x9a12bc
"FailedAssertion",
fileName=0x9a12b0 "ginvacuum.c", lineNumber=713) at assert.c:54
#3  0x004947cf in ginvacuumcleanup (fcinfo=0x7fffee073a90) at
ginvacuum.c:713
#4  0x0093b53a in FunctionCall2Coll (flinfo=0x7fffee073e60,
collation=0, arg1=140737186840448, arg2=0) at fmgr.c:1323
#5  0x004d5c7a in index_vacuum_cleanup (info=0x7fffee073f80,
stats=0x0) at indexam.c:717
#6  0x00664f69 in lazy_cleanup_index (indrel=0x7faafbcace20,
stats=0x0, vacrelstats=0x28809c8) at vacuumlazy.c:1400
#7  0x00664142 in lazy_scan_heap (onerel=0x7faafbcab6d0,
vacrelstats=0x28809c8, Irel=0x2881090, nindexes=2, scan_all=1 '\001') at
vacuumlazy.c:
#8  0x00662905 in lazy_vacuum_rel (onerel=0x7faafbcab6d0,
options=65, params=0x286ea00, bstrategy=0x286ea90) at vacuumlazy.c:247
#9  0x006623e4 in vacuum_rel (relid=16409, relation=0x7fffee074550,
options=65, params=0x286ea00) at vacuum.c:1377
#10 0x00660bea in vacuum (options=65, relation=0x7fffee074550,
relid=16409, params=0x286ea00, va_cols=0x0, bstrategy=0x286ea90,
isTopLevel=1 '\001')
at vacuum.c:295
#11 0x0075caa9 in autovacuum_do_vac_analyze (tab=0x286e9f8,
bstrategy=0x286ea90) at autovacuum.c:2811
#12 0x0075be67 in do_autovacuum () at autovacuum.c:2331
#13 0x0075ac1c in AutoVacWorkerMain (argc=0, argv=0x0) at
autovacuum.c:1648
#14 0x0075a84d in StartAutoVacWorker () at autovacuum.c:1455
#15 0x0076f745 in StartAutovacuumWorker () at postmaster.c:5261
#16 0x0076effd in sigusr1_handler (postgres_signal_arg=10) at
postmaster.c:4918
#17 signal handler called
#18 0x003dcb6e1353 in __select_nocancel () from /lib64/libc.so.6
#19 0x0076a8f0 in ServerLoop () at postmaster.c:1582
#20 0x0076a141 in PostmasterMain (argc=4, argv=0x27b2cf0) at
postmaster.c:1263
#21 0x006c1e6e in main (argc=4, argv=0x27b2cf0) at main.c:223


#0  0x003dcb632625 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x003dcb633e05 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x00930b7a in ExceptionalCondition (
conditionName=0x9a1440 "!(((PageHeader) (page))->pd_special >=
(__builtin_offsetof (PageHeaderData, pd_linp)))", errorType=0x9a12bc
"FailedAssertion",
fileName=0x9a12b0 "ginvacuum.c", lineNumber=713) at assert.c:54
No locals.
#3  0x004947cf in ginvacuumcleanup (fcinfo=0x7fffee073a90) at
ginvacuum.c:713
buffer = 186
page = 0x7faafc565b20 
info = 0x7fffee073f80
stats = 0x2880858
index = 0x7faafbcace20
needLock = 1 '\001'
npages = 22569
blkno = 12025
totFreePages = 11502
ginstate = {index = 0x7faafbcace20, oneCol = 1 '\001', origTupdesc
= 0x7faafbcad150, tupdesc = {0x7faafbcad150, 0x0 <repeats 31 times>},
  compareFn = {{fn_addr = 0x90224d <bttextcmp>, fn_oid = 360,
fn_nargs = 2, fn_strict = 1 '\001', fn_retset = 0 '\000', fn_stats = 2
'\002',
  fn_extra = 0x0, fn_mcxt = 0x285e3c0, fn_expr = 0x0}, {fn_addr
= 0, fn_oid = 0, fn_nargs = 0, fn_strict = 0 '\000', fn_retset = 0 '\000',
  fn_stats = 0 '\000', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr =
0x0} <repeats 31 times>}, extractValueFn = {{
  fn_addr = 0x494adc 

Re: [HACKERS] Re: [COMMITTERS] pgsql: Map basebackup tablespaces using a tablespace_map file

2015-07-16 Thread Alvaro Herrera
Amit Kapila wrote:

 This can be tracked either in 9.5 Open Items or for next CF,
 any opinions?
 
 If nobody else has any opinion on this, I will add it to 9.5 Open Items
 list.

I think this belongs in the open items list, yeah.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Patch to fix spelling mistake in error message

2015-07-16 Thread Magnus Hagander
On Thu, Jul 16, 2015 at 2:23 AM, David Rowley david.row...@2ndquadrant.com
wrote:

 Attached is a small patch to fix a spelling mistake in an error message


Applied, thanks.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] [PATCH] Comment fix for miscinit.c

2015-07-16 Thread Magnus Hagander
On Wed, Jul 15, 2015 at 11:18 PM, David Christensen da...@endpoint.com
wrote:

 Quick comment fix for edit issue.


Applied, thanks.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-16 Thread Alvaro Herrera
Petr Jelinek wrote:
 On 2015-07-13 00:36, Tom Lane wrote:

 PS: now that I've written this rant, I wonder why we don't redesign the
 index AM API along the same lines.  It probably doesn't matter much at
 the moment, but if we ever get serious about supporting index AM
 extensions, I think we ought to consider doing that.
 
 +1
 
 I think this is very relevant to the proposed sequence am patch as well.

Hmm, how would this work?  Would we have index AM implementations run
some function that registers their support methods somehow at startup?
Hopefully we're not going to have the index AMs become shared libraries.

In any case, if index AMs and sequence AMs go this route, that probably
means the column store AM we're working on will have to go the same
route too.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-16 Thread Petr Jelinek

On 2015-07-13 15:39, Tom Lane wrote:


Datum
tsm_system_rows_init(PG_FUNCTION_ARGS)
{
 TableSampleDesc *tsdesc = (TableSampleDesc *) PG_GETARG_POINTER(0);
 uint32  seed = PG_GETARG_UINT32(1);
 int32   ntuples = PG_ARGISNULL(2) ? -1 : PG_GETARG_INT32(2);

This is rather curious coding.  Why should this function check only
one of its arguments for nullness?  If it needs to defend against
any of them being null, it really needs to check all.  But in point of
fact, this function is declared STRICT, which means it's a violation of
the function call protocol if the core code ever passes a null to it,
and so this test ought to be dead code.

A similar pattern of ARGISNULL checks in declared-strict functions exists
in all the tablesample modules, not just this one, showing that this is an
overall design error not just a thinko here.  My inclination would be to
make the core code enforce non-nullness of all tablesample arguments so as
to make it unnecessary to check strictness of the tsm*init functions, but
in any case it is Not Okay to just pass nulls willy-nilly to strict C
functions.



The reason for this ugliness came from having to balance between
modularity and following the SQL standard error codes for BERNOULLI and
SYSTEM, which came up as an issue during reviews (the original code did
the checks before calling the sampling method's functions but produced
just a generic error code about a wrong parameter). I considered it
okayish mainly because those functions are not SQL-callable and the code
which calls them knows how to handle it correctly, but I understand why
you don't.


I guess if we did what you proposed in another email in this thread -
not expose the API at the SQL level - we could send the additional
parameters as a List * and leave the validation completely to the
function. (And maybe not allow NULLs at all.)



Also, I find this coding pretty sloppy even without the strictness angle,
because the net effect of this way of dealing with nulls is that an
argument-must-not-be-null complaint is reported as out of range,
which is both confusing and the wrong ERRCODE.

 if (ntuples < 1)
 ereport(ERROR,
 (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
  errmsg("invalid sample size"),
  errhint("Sample size must be positive integer value.")));

I don't find this to be good error message style.  The secondary comment
is not a hint, it's an ironclad statement of what you did wrong, so if
we wanted to phrase it like this it should be an errdetail not errhint.
But the whole thing is overly cute anyway because there is no reason at
all not to just say what we mean in the primary error message, eg
 ereport(ERROR,
 (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
  errmsg("sample size must be greater than zero")));



Same as above, although now that I re-read the standard I am sure I
misunderstood it the first time - it says:
If S is the null value or if S < 0 (zero) or if S > 100, then an
exception condition is raised: data exception — invalid sample size.


I took it as a literal error message originally, but they just mean the
status code by it.



While we're on the subject, what's the reason for disallowing a sample
size of zero?  Seems like a reasonable edge case.


Yes that's a bug.



 /* All blocks have been read, we're done */
 if (sampler->doneblocks > sampler->nblocks ||
 sampler->donetuples >= sampler->ntuples)
 PG_RETURN_UINT32(InvalidBlockNumber);

Okay, I lied, I *am* going to complain about this comment.  Comments that
do not accurately describe the code they're attached to are worse than
useless.



That's copy-pasto from the tsm_system_time.


In short, I'd do something more like

 uint32  r;

 /* Safety check to avoid infinite loop or zero result for small n. */
 if (n <= 1)
 return 1;

 /*
  * This should only take 2 or 3 iterations as the probability of 2 numbers
  * being relatively prime is ~61%; but just in case, we'll include a
  * CHECK_FOR_INTERRUPTS in the loop.
  */
 do {
 CHECK_FOR_INTERRUPTS();
 r = (uint32) (sampler_random_fract(randstate) * n);
 } while (r == 0 || gcd(r, n) > 1);

Note however that this coding would result in an unpredictable number
of iterations of the RNG, which might not be such a good thing if we're
trying to achieve repeatability.  It doesn't matter in the context of
this module since the RNG is not used after initialization, but it would
matter if we then went on to do Bernoulli-style sampling.  (Possibly that
could be dodged by reinitializing the RNG after the initialization steps.)



Bernoulli-style sampling does not need this kind of code so it's not 
really an issue. That is unless you'd like to combine the linear probing 
and bernoulli of course, but I don't see any benefit in doing that.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 

Re: [HACKERS] [DESIGN] Incremental checksums

2015-07-16 Thread David Christensen
  - pg_disable_checksums(void) = turn checksums off for a cluster.  
Sets the state to disabled, which means bg_worker will not do 
anything.
   
  - pg_request_checksum_cycle(void) = if checksums are enabled, 
increment the data_checksum_cycle counter and set the state to 
enabling.
   
  
   If the cluster is already enabled for checksums, then what is
   the need for any other action?
 
  You are assuming this is a one-way action.  
 
 
 No, I was confused by the state (enabling) this function will set.

Okay.

  Requesting an explicit checksum cycle would be desirable in the case where 
  you want to proactively verify there is no cluster corruption to be found.
 
 
 Sure, but I think that is different from setting the state to enabling.
 In your proposal above, in enabling state cluster needs to write
 checksums, where for such a feature you only need read validation.

With “revalidating”, since your database is still actively making changes, you
need to validate writes too (think new tables, etc.).  “Enabling” leaves reads
unvalidated because you’re starting from an unknown state (i.e., not already
checksummed).

Thanks,

David
--
David Christensen
PostgreSQL Team Manager
End Point Corporation
da...@endpoint.com
785-727-1171









Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-16 Thread Petr Jelinek

On 2015-07-13 18:00, Tom Lane wrote:


And here's that.  I do *not* claim that this is a complete list of design
issues with the patch, it's just things I happened to notice in the amount
of time I've spent so far (which is already way more than I wanted to
spend on TABLESAMPLE right now).


I'm not sure that we need an extensible sampling-method capability at all,
let alone that an API for that needs to be the primary focus of a patch.
Certainly the offered examples of extension modules aren't inspiring:
tsm_system_time is broken by design (more about that below) and nobody
would miss tsm_system_rows if it weren't there.  What is far more
important is to get sampling capability into indexscans, and designing
an API ahead of time seems like mostly a distraction from that.

I'd think seriously about tossing the entire executor-stage part of the
patch, creating a stateless (history-independent) sampling filter that
just accepts or rejects TIDs, and sticking calls to that into all the
table scan node types.  Once you had something like that working well
it might be worth considering whether to try to expose an API to
generalize it.  But even then it's not clear that we really need any
more options than true-Bernoulli and block-level sampling.


I think this is not really an API issue as much as an opposite approach on
what to implement first. In the first iteration I prioritized true block
sampling support, as that's what I've been asked for the most.


But my plan was to add, at some later stage (9.6+), the ability to do
subquery scans etc. - basically a new SamplingFilter executor node which
would pass the tuples to examinetuple(), which would then decide what to
do with them. The selection between using nextblock/nexttuple and
examinetuple was supposed to be an extension of the API where the sampling
method would say whether it supports examinetuple, nextblock/nexttuple, or
both. And eventually I wanted to rewrite bernoulli to just use
examinetuple() on top of whatever scan, once this additional support was
in. I think I explained this during the review stage.




The IBM paper I linked to in the other thread mentions that their
optimizer will sometimes choose to do Bernoulli sampling even if SYSTEM
was requested.  Probably that happens when it decides to do a simple
indexscan, because then there's no advantage to trying to cluster the


Yeah, it happens when there is an index which is used in the WHERE clause
and has bigger selectivity than the percentage specified in the TABLESAMPLE
clause. This of course breaks one of the common use-cases though:

SELECT count(*) * 100 FROM table TABLESAMPLE SYSTEM(1) WHERE foo = bar;


sampled rows.  But in the case of a bitmap scan, you could very easily do
either true Bernoulli or block-level sampling simply by adjusting the
rules about which bits you keep or clear in the bitmap (given that you
apply the filter between collection of the index bitmap and accessing the
heap, which seems natural).  The only case where a special scan type
really seems to be needed is if you want block-level sampling, the query
would otherwise use a seqscan, *and* the sampling percentage is pretty low
--- if you'd be reading every second or third block anyway, you're likely
better off with a plain seqscan so that the kernel sees sequential access
and does prefetching.  The current API doesn't seem to make it possible to
switch between seqscan and read-only-selected-blocks based on the sampling
percentage, but I think that could be an interesting optimization.


Well, you can do that if you write your own sampling method. We don't do
that in the optimizer, and that's a design choice, because you can't really
do it at that high a level if you want to keep extensibility. And given
the number of people who asked if they can do their own sampling when I
talked to them about this during the design stage, I consider the
extensibility more important. Especially since extensibility gives you the
option to do the switching anyway, albeit at a lower level and not out of
the box.



(Another bet that's been missed is having the samplescan logic request
prefetching when it is doing selective block reads.  The current API can't
support that AFAICS, since there's no expectation that nextblock calls
could be done asynchronously from nexttuple calls.)


Not sure I follow; wouldn't it be possible to achieve this by setting
tsmseqscan to true (it's a misnomer then, I know)?




Another issue with the API as designed is the examinetuple() callback.
Allowing sample methods to see invisible tuples is quite possibly the
worst idea in the whole patch.  They can *not* do anything with such
tuples, or they'll totally break reproducibility: if the tuple is
invisible to your scan, it might well be or soon become invisible to
everybody, whereupon it would be subject to garbage collection at the
drop of a hat.  So if an invisible tuple affects the sample method's
behavior at all, repeated scans in the same query would 

Re: [HACKERS] Freeze avoidance of very large table.

2015-07-16 Thread Sawada Masahiko
On Wed, Jul 15, 2015 at 3:07 AM, Sawada Masahiko sawada.m...@gmail.com wrote:
 On Wed, Jul 15, 2015 at 12:55 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 10 July 2015 at 15:11, Sawada Masahiko sawada.m...@gmail.com wrote:


 Oops, I had forgotten to add new file heapfuncs.c.
 Latest patch is attached.


 I think we've established the approach is desirable and defined the way
 forwards for this, so this is looking good.

 If we want to move stuff like pg_stattuple, pg_freespacemap into core,
 we could move them into heapfuncs.c.

 Some of my requests haven't been actioned yet, so I personally would not
 commit this yet. I am happy to continue as reviewer/committer unless others
 wish to take over.
 The main missing item is pg_upgrade support, which won't happen by end of
 CF1, so I am marking this as Returned With Feedback. Hopefully we can review
 this again before CF2.

 I appreciate your reviewing.
 Yeah, the pg_upgrade support and regression test for the VFM patch are
 almost done now; I will submit the patch this week after testing it.

Attached patch is latest v9 patch.

I added:
- regression test for visibility map (visibilitymap.sql and
visibilitymap.out files)
- pg_upgrade support (rewriting vm file to vfm file)
- regression test for pg_upgrade

Please review it.

Regards,

--
Masahiko Sawada
diff --git a/contrib/pgstattuple/pgstatapprox.c b/contrib/pgstattuple/pgstatapprox.c
index 22c5f7a..b1b6a06 100644
--- a/contrib/pgstattuple/pgstatapprox.c
+++ b/contrib/pgstattuple/pgstatapprox.c
@@ -87,7 +87,7 @@ statapprox_heap(Relation rel, output_type *stat)
 		 * If the page has only visible tuples, then we can find out the free
 		 * space from the FSM and move on.
 		 */
-		if (visibilitymap_test(rel, blkno, &vmbuffer))
+		if (visibilitymap_test(rel, blkno, &vmbuffer, VISIBILITYMAP_ALL_VISIBLE))
 		{
 			freespace = GetRecordedFreeSpace(rel, blkno);
 			stat->tuple_len += BLCKSZ - freespace;
diff --git a/src/backend/access/heap/Makefile b/src/backend/access/heap/Makefile
index b83d496..806ce27 100644
--- a/src/backend/access/heap/Makefile
+++ b/src/backend/access/heap/Makefile
@@ -12,6 +12,7 @@ subdir = src/backend/access/heap
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = heapam.o hio.o pruneheap.o rewriteheap.o syncscan.o tuptoaster.o visibilitymap.o
+OBJS = heapam.o hio.o pruneheap.o rewriteheap.o syncscan.o tuptoaster.o visibilitymap.o \
+	heapfuncs.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 86a2e6b..796b76f 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2131,8 +2131,9 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	CheckForSerializableConflictIn(relation, NULL, InvalidBuffer);
 
 	/*
-	 * Find buffer to insert this tuple into.  If the page is all visible,
-	 * this will also pin the requisite visibility map page.
+	 * Find buffer to insert this tuple into.  If the page is all visible
+	 * or all frozen, this will also pin the requisite visibility map and
+	 * frozen map page.
 	 */
 	buffer = RelationGetBufferForTuple(relation, heaptup->t_len,
 	   InvalidBuffer, options, bistate,
@@ -2147,7 +2148,11 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	if (PageIsAllVisible(BufferGetPage(buffer)))
 	{
 		all_visible_cleared = true;
+
+		/* all-frozen information is also cleared at the same time */
 		PageClearAllVisible(BufferGetPage(buffer));
+		PageClearAllFrozen(BufferGetPage(buffer));
+
 		visibilitymap_clear(relation,
 			ItemPointerGetBlockNumber(&(heaptup->t_self)),
 			vmbuffer);
@@ -2448,7 +2453,11 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 		if (PageIsAllVisible(page))
 		{
 			all_visible_cleared = true;
+
+			/* all-frozen information is also cleared at the same time */
 			PageClearAllVisible(page);
+			PageClearAllFrozen(page);
+
 			visibilitymap_clear(relation,
 BufferGetBlockNumber(buffer),
 vmbuffer);
@@ -2731,9 +2740,9 @@ heap_delete(Relation relation, ItemPointer tid,
 
 	/*
 	 * If we didn't pin the visibility map page and the page has become all
-	 * visible while we were busy locking the buffer, we'll have to unlock and
-	 * re-lock, to avoid holding the buffer lock across an I/O.  That's a bit
-	 * unfortunate, but hopefully shouldn't happen often.
+	 * visible or all frozen while we were busy locking the buffer, we'll
+	 * have to unlock and re-lock, to avoid holding the buffer lock across an
+	 * I/O.  That's a bit unfortunate, but hopefully shouldn't happen often.
 	 */
 	if (vmbuffer == InvalidBuffer && PageIsAllVisible(page))
 	{
@@ -2925,10 +2934,15 @@ l1:
 	 */
 	PageSetPrunable(page, xid);
 
+	/* clear PD_ALL_VISIBLE and PD_ALL_FROZEN flags */
 	if (PageIsAllVisible(page))
 	{
 		all_visible_cleared = true;
+
+		/* all-frozen information is also cleared at the same time */
 		PageClearAllVisible(page);
+		

Re: [HACKERS] optimizing vacuum truncation scans

2015-07-16 Thread Amit Kapila
On Thu, Jul 16, 2015 at 11:21 AM, Haribabu Kommi kommi.harib...@gmail.com
wrote:

 On Wed, Jul 15, 2015 at 11:19 PM, Amit Kapila amit.kapil...@gmail.com
wrote:
 
  One case where this patch can behave differently than current
  code is, when before taking Exclusive lock on relation for truncation,
  if some backend clears the vm bit and then inserts-deletes-prune that
  page.  I think with patch it will not consider to truncate pages whereas
  current code will allow to truncate it as it re-verifies each page.  I
think
  such a case would be rare and we might not want to bother about it,
  but still worth to think about it once.

 Thanks for your review.

 Corrected the code: instead of returning the blkno after a visibility map
 check failure, the code just continues to verify the contents as in the
 earlier approach.


Okay, I think this should work.


 Here I attached updated patches,
 1) without prefetch logic.
 2) with combined vm and prefetch logic.


I think it is better to just get the first patch in, as that in itself is a
clear win, and then we can evaluate the second one (prefetching during
truncation) separately.  I think after the first patch the use case for
prefetching seems less beneficial, and it could hurt by loading unneeded
pages into the OS buffer cache (for example, when it has prefetched 32 pages
and after inspecting the first page it decides to quit the loop because that
page is non-empty, or when it has prefetched a page just because its
visibility map bit was cleared while the bits for the others are set).

Having said that, I am not against prefetching in this scenario, as it
can help in more cases than it could hurt, but I think it will be better
to evaluate that separately.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-16 Thread Alvaro Herrera
Petr Jelinek wrote:
 On 2015-07-13 15:39, Tom Lane wrote:

 I don't find this to be good error message style.  The secondary comment
 is not a hint, it's an ironclad statement of what you did wrong, so if
 we wanted to phrase it like this it should be an errdetail not errhint.
 But the whole thing is overly cute anyway because there is no reason at
 all not to just say what we mean in the primary error message, eg
  ereport(ERROR,
  (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
   errmsg("sample size must be greater than zero")));
 
 Same as above, although now that I re-read the standard I am sure I
 misunderstood it the first time - it says:
 If S is the null value or if S < 0 (zero) or if S > 100, then an exception
 condition is raised: data exception — invalid sample size.
 I took it as a literal error message originally but they just mean the
 status code by it.

Yes, must use a new errcode ERRCODE_INVALID_SAMPLE_SIZE defined to 2202H
here.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: [COMMITTERS] pgsql: Map basebackup tablespaces using a tablespace_map file

2015-07-16 Thread Fujii Masao
On Fri, Jul 3, 2015 at 12:15 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Thu, Jul 2, 2015 at 7:44 PM, Alvaro Herrera alvhe...@2ndquadrant.com
 wrote:

 Amit Kapila wrote:
 
  Added the above log messages in attached patch with small change
  such that in message, file names will be displayed with quotes as most
  of other usages of rename (failure) in that file uses quotes to display
  filenames.

 Why emit two messages?

 not necessary.

  Can we reduce that to a single one?  Maybe the
 first one could be errdetail or something.


 I think it is better the other way (basically have the second one as errdetail).
 We already have one similar message in xlog.c that way.
 ereport(LOG,
 (errmsg("online backup mode canceled"),
 errdetail("\"%s\" was renamed to \"%s\".",
   BACKUP_LABEL_FILE, BACKUP_LABEL_OLD)));

 Attached patch consolidates the errmsg into one message.

Here are some minor comments:

+ereport(LOG,
+(errmsg("ignoring \"%s\" file because no \"%s\" file exists",
+TABLESPACE_MAP, BACKUP_LABEL_FILE),
+ errdetail("could not rename file \"%s\" to \"%s\": %m",
+   TABLESPACE_MAP, TABLESPACE_MAP_OLD)));

WARNING is better than LOG here because it indicates a problematic case?

In detail message, the first word of sentence needs to be capitalized.

+ errdetail("renamed file \"%s\" to \"%s\"",
+   TABLESPACE_MAP, TABLESPACE_MAP_OLD)));

In a detail message, basically we should use a complete sentence.
So like other similar detail messages in xlog.c, I think that it's better
to use "\"%s\" was renamed to \"%s\"." as the detail message here.

Regards,

-- 
Fujii Masao




[HACKERS] Volatility of pg_xact_commit_timestamp() and pg_last_committed_xact()

2015-07-16 Thread Fujii Masao
Hi,

The volatilities of pg_xact_commit_timestamp() and pg_last_committed_xact()
are now STABLE. But ISTM that those functions can return different results
even within a single statement. So should we change their volatility
to VOLATILE?

Regards,

-- 
Fujii Masao




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-07-16 Thread Ildus Kurbangaliev

 On Jul 14, 2015, at 5:25 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 
 Robert Haas robertmh...@gmail.com writes:
 I really think we should do the simple thing first.  If we make this
 complicated and add lots of bells and whistles, it is going to be much
 harder to get anything committed, because there will be more things
 for somebody to object to.  If we start with something simple, we can
 always improve it later, if we are confident that the design for
 improving it is good.  The hardest thing about a proposal like this is
 going to be getting down the overhead to a level that is acceptable,
 and every expansion of the basic design that has been proposed -
 gathering more than one byte of information, or gathering times, or
 having the backend update a tracking hash - adds *significant*
 overhead to the design I proposed.
 
 FWIW, I entirely share Robert's opinion that adding gettimeofday()
 overhead in routinely-taken paths is likely not to be acceptable.
 But there's no need to base this solely on opinions.  I suggest somebody
 try instrumenting just one hotspot in the various ways that are being
 proposed, and then we can actually measure what it costs, instead
 of guessing about that.
 
   regards, tom lane
 

I made benchmark of gettimeofday(). I believe it is certainly usable for 
monitoring.
Testing configuration:
24 cores,  Intel Xeon CPU X5675@3.07Ghz
RAM 24 GB

54179703 - microseconds total
2147483647 - (INT_MAX), the number of gettimeofday() calls

 54179703 / 2147483647.0
0.025229390256679331

Here we have the average duration of one gettimeofday in microseconds.

Now we get the count of all waits in one minute (pgbench -i -s 500, 
2 GB shared buffers, started with -c 96 -j 4, it is almost 100% cpu load).

b1=# select sum(wait_count) from pg_stat_wait_profile;
   sum
---------
 2113608

So we can estimate the time spent in gettimeofday() in one minute
(multiplied by two, because we are going to call it twice for each wait):

 2113608 * 0.025229390256679331 * 2
106650.08216327897  

This is the time in microseconds that we spend on gettimeofday() in one minute.
Calculation of overhead (in percent), dividing by the 60,000,000 microseconds
in a minute:

 106650 / 60000000.0 * 100
0.17775

[Attachment: gtd_test.c]
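
For reference, a minimal timing loop along these lines (a sketch only -- the
attached gtd_test.c itself is not reproduced in this digest, so the structure
below is an assumption rather than the actual benchmark code):

#include <stdio.h>
#include <limits.h>
#include <sys/time.h>

int
main(void)
{
	struct timeval start;
	struct timeval stop;
	struct timeval dummy;
	long long	total_usec;
	int			i;

	gettimeofday(&start, NULL);
	for (i = 0; i < INT_MAX; i++)
		gettimeofday(&dummy, NULL);		/* the call being measured */
	gettimeofday(&stop, NULL);

	total_usec = (long long) (stop.tv_sec - start.tv_sec) * 1000000 +
		(stop.tv_usec - start.tv_usec);
	printf("%lld microseconds total for %d calls\n", total_usec, INT_MAX);
	return 0;
}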





Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-16 Thread Petr Jelinek

On 2015-07-16 15:59, Tom Lane wrote:

Alvaro Herrera alvhe...@2ndquadrant.com writes:

Petr Jelinek wrote:

On 2015-07-13 00:36, Tom Lane wrote:

PS: now that I've written this rant, I wonder why we don't redesign the
index AM API along the same lines.  It probably doesn't matter much at
the moment, but if we ever get serious about supporting index AM
extensions, I think we ought to consider doing that.



I think this is very relevant to the proposed sequence am patch as well.



Hmm, how would this work?  Would we have index AM implementations run
some function that registers their support methods somehow at startup?


Well, registration of new index AMs is an unsolved question ATM anyhow.
But what I'm imagining is that pg_am would reduce to about two columns,
amname and a handler function OID, and everything else that constitutes
the API for AMs would get moved down to the C level.  We have to keep that
catalog because we still need index AMs to have OIDs that will represent
them in pg_opclass etc; but we don't need to nail the exact set of AM
interface functions into the catalog.  (I'm not sure whether we'd want
to remove all the bool columns from pg_am.  At the C level it would be
about as convenient to have them in a struct returned by the handler
function.  But it's occasionally useful to have those properties
visible to SQL queries.)


This is along the lines of how I was thinking as well (when I read your
previous email). I think the properties of the index will have to be
decided on an individual basis once somebody actually starts working on
this. But functions can clearly go into a C struct if they are called only
from C anyway.



I'm not clear on whether sequence AMs would need explicit catalog
representation, or could be folded down to just a single SQL function
with special signature as I suggested for tablesample handlers.
Is there any need for a sequence AM to have additional catalog
infrastructure like index AMs need?



That depends on the route we choose to take with the storage there.
If we allow custom columns for sequence AMs (which is what both Heikki
and I seem to be inclined to do) then I think it will still need a
catalog, plus it's also easier to just reuse the relam behavior than to
come up with something completely new IMHO.



In any case, if index AMs and sequence AMs go this route, that probably
means the column store AM we're working on will have to go the same
route too.


It's worth considering anyway.  The FDW API has clearly been far more
successful than the index AM API in terms of being practically usable
by extensions.



Yep, I now consider it to be a clear mistake that I modeled both 
sequence am and tablesample after indexes given that I targeted both at 
extensibility.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-16 Thread Amit Langote
On Thu, Jul 16, 2015 at 10:33 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Petr Jelinek wrote:
 On 2015-07-13 00:36, Tom Lane wrote:

 PS: now that I've written this rant, I wonder why we don't redesign the
 index AM API along the same lines.  It probably doesn't matter much at
 the moment, but if we ever get serious about supporting index AM
 extensions, I think we ought to consider doing that.

 +1

 I think this is very relevant to the proposed sequence am patch as well.

 Hmm, how would this work?  Would we have index AM implementations run
 some function that registers their support methods somehow at startup?
 Hopefully we're not going to have the index AMs become shared libraries.


I recall a proposal by Alexander Korotkov about extensible access
methods, although his proposal also included a CREATE AM command that
would add a pg_am row, so it perhaps differs from what Tom seems to
allude to here.

http://www.postgresql.org/message-id/capphfdsxwzmojm6dx+tjnpyk27kt4o7ri6x_4oswcbyu1rm...@mail.gmail.com

Thanks,
Amit




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-07-16 Thread Tom Lane
Ildus Kurbangaliev i.kurbangal...@postgrespro.ru writes:
 I made benchmark of gettimeofday(). I believe it is certainly usable for 
 monitoring.
 Testing configuration:
 24 cores,  Intel Xeon CPU X5675@3.07Ghz
 RAM 24 GB

 54179703 - microseconds total
 2147483647 - (INT_MAX), the number of gettimeofday() calls

54179703 / 2147483647.0
   0.025229390256679331

 Here we have the average duration of one gettimeofday in microseconds.

25 nsec per gettimeofday() is in the same ballpark as what I measured
on a new-ish machine last year:
http://www.postgresql.org/message-id/flat/31856.1400021...@sss.pgh.pa.us

The problem is that (a) on modern hardware that is not a small number,
it's the equivalent of 100 or more instructions; and (b) the results
look very much worse on less-modern hardware, particularly machines
where gettimeofday requires a kernel call.

regards, tom lane




Re: [HACKERS] [PATCH] postgres_fdw extension support

2015-07-16 Thread Amit Langote
On Thu, Jul 16, 2015 at 11:06 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Amit Langote langote_amit...@lab.ntt.co.jp writes:
 On 2015-07-16 PM 12:43, Tom Lane wrote:
 The basic issue here is how can a user control which functions/operators
 can be sent for remote execution?.  While it's certainly true that
 sometimes you might want function-by-function control of that, Paul's
 point was that extension-level granularity would be extremely convenient
 for PostGIS, and probably for other extensions.

 Perhaps just paranoid, but is the extension version number of any significance?

 In any scenario for user control of sending functions to the far end, it's
 on the user's head to make sure that he's telling us the truth about which
 functions are compatible between local and remote servers.  That would
 extend to checking cross-version compatibility if he's running different
 versions, too.  We already have risks of that kind with built-in
 functions, really, and I've not heard complaints about it.


Yeah, that's true.

Thanks,
Amit




Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-16 Thread Tom Lane
Amit Langote amitlangot...@gmail.com writes:
 On Thu, Jul 16, 2015 at 10:33 PM, Alvaro Herrera
 alvhe...@2ndquadrant.com wrote:
 Hmm, how would this work?  Would we have index AM implementation run
 some function that register their support methods somehow at startup?

 I recall a proposal by Alexander Korotkov about extensible access
 methods although his proposal also included a CREATE AM command that
 would add a pg_am row so that perhaps differs from what Tom seems to
 allude to here.

I think we'd still need to invent CREATE AM if we wanted to allow index
AMs to be created as extensions; we'd still have to have the pg_am
catalog, and extensions still couldn't write rows directly into that,
for the same reasons I pointed out with respect to tablesample methods.

However, if the contents of pg_am could be boiled down to just a name and
a handler function, then that would represent a simple and completely
stable definition for CREATE AM's arguments, which would be a large
improvement over trying to reflect the current contents of pg_am directly
in a SQL statement.  We add new columns to pg_am all the time, and that
would create huge backward-compatibility headaches if we had to modify
the behavior of a CREATE AM statement every time.

regards, tom lane




Re: [HACKERS] [PATCH] postgres_fdw extension support

2015-07-16 Thread Tom Lane
Amit Langote langote_amit...@lab.ntt.co.jp writes:
 On 2015-07-16 PM 12:43, Tom Lane wrote:
 The basic issue here is how can a user control which functions/operators
 can be sent for remote execution?.  While it's certainly true that
 sometimes you might want function-by-function control of that, Paul's
 point was that extension-level granularity would be extremely convenient
 for PostGIS, and probably for other extensions.

 Perhaps just paranoid, but is the extension version number of any significance?

In any scenario for user control of sending functions to the far end, it's
on the user's head to make sure that he's telling us the truth about which
functions are compatible between local and remote servers.  That would
extend to checking cross-version compatibility if he's running different
versions, too.  We already have risks of that kind with built-in
functions, really, and I've not heard complaints about it.

regards, tom lane




Re: [HACKERS] TABLESAMPLE doesn't actually satisfy the SQL spec, does it?

2015-07-16 Thread Tom Lane
Petr Jelinek p...@2ndquadrant.com writes:
 On 2015-07-12 18:02, Tom Lane wrote:
 A possible way around this problem is to redefine the sampling rule so
 that it is not history-dependent but depends only on the tuple TIDs.
 For instance, one could hash the TID of a candidate tuple, xor that with
 a hash of the seed being used for the current query, and then select the
 tuple if (hash/MAXINT) < P.

 That would work for bernoulli for physical tuples, yes. Only thing that 
 worries me is future extensibility for data sources that only provide 
 virtual tuples.

Well, repeatability of a TABLESAMPLE attached to a join seems like an
unsolved and possibly unsolvable problem anyway.  I don't think we should
assume that the API we define today will cope with that.

But that is another reason why the current API is inadequate: there's no
provision for specifying whether or how a tablesample method can be
applied to non-base-table RTEs.  (I re-read the thread and noted that
Peter E. complained about that some time ago, but nothing was done about
it.  I'm fine with not supporting the case right now, but nonetheless
it's another reason why we'd better make the API more easily extensible.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-16 Thread Tom Lane
Petr Jelinek p...@2ndquadrant.com writes:
 On 2015-07-16 15:59, Tom Lane wrote:
 I'm not clear on whether sequence AMs would need explicit catalog
 representation, or could be folded down to just a single SQL function
 with special signature as I suggested for tablesample handlers.
 Is there any need for a sequence AM to have additional catalog
 infrastructure like index AMs need?

 That depends on the route we will choose to take with the storage there. 
 If we allow custom columns for sequence AMs (which is what both Heikki 
 and I seem to be inclined to do) then I think it will still need a 
 catalog, plus it's also easier to just reuse the relam behavior than 
 coming up with something completely new IMHO.

Hm.  I've not been following the sequence AM stuff carefully, but if you
want to use relam to point at a sequence's AM then really sequence AMs
have to be represented in pg_am.  (It would be quite broken from a
relational theory standpoint if relam could point to either of two
catalogs; not to mention that we have no way to guarantee OID uniqueness
across multiple catalogs.)

As things stand today, putting both kinds of AM into one catalog would be
pretty horrible, but it seems not hard to make it work if we did this sort
of refactoring first.  We'd end up with three columns in pg_am: amname,
amkind (index or sequence), and amhandler.  The type and contents of the
struct returned by amhandler could be different depending on amkind, which
means that the APIs could be as different as we need, despite sharing just
one catalog.  The only real restriction is that index and sequence AMs
could not have the same names, which doesn't seem like much of a problem
from here.
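
In other words, the catalog row might boil down to something like the following
sketch; the column names and the amkind encoding are illustrative only, nothing
here is settled in this thread.

#include "postgres.h"

typedef struct FormData_pg_am_sketch
{
	NameData	amname;		/* access method name */
	char		amkind;		/* e.g. 'i' = index AM, 's' = sequence AM */
	regproc		amhandler;	/* handler function returning the AM's API struct */
} FormData_pg_am_sketch;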

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE doesn't actually satisfy the SQL spec, does it?

2015-07-16 Thread Petr Jelinek

On 2015-07-16 16:22, Tom Lane wrote:

Petr Jelinek p...@2ndquadrant.com writes:

On 2015-07-12 18:02, Tom Lane wrote:

A possible way around this problem is to redefine the sampling rule so
that it is not history-dependent but depends only on the tuple TIDs.
For instance, one could hash the TID of a candidate tuple, xor that with
a hash of the seed being used for the current query, and then select the
tuple if (hash/MAXINT) < P.



That would work for bernoulli for physical tuples, yes. Only thing that
worries me is future extensibility for data sources that only provide
virtual tuples.


Well, repeatability of a TABLESAMPLE attached to a join seems like an
unsolved and possibly unsolvable problem anyway.  I don't think we should
assume that the API we define today will cope with that.



OK, it's true that the implementations I've seen in other databases so 
far only concern themselves with sampling physical relations and ignore 
the rest.



But that is another reason why the current API is inadequate: there's no
provision for specifying whether or how a tablesample method can be
applied to non-base-table RTEs.  (I re-read the thread and noted that
Peter E. complained about that some time ago, but nothing was done about
it.  I'm fine with not supporting the case right now, but nonetheless
it's another reason why we'd better make the API more easily extensible.)


Nothing in terms of implementation, yes, but I did write up my idea of how 
this could be done by extending the current API in the future. I won't 
pretend to be absolutely sure that the API won't need some breaking change 
to do that, though.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-07-16 Thread Robert Haas
On Thu, Jul 16, 2015 at 10:54 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Ildus Kurbangaliev i.kurbangal...@postgrespro.ru writes:
 I made benchmark of gettimeofday(). I believe it is certainly usable for 
 monitoring.
 Testing configuration:
 24 cores,  Intel Xeon CPU X5675@3.07Ghz
 RAM 24 GB

 54179703 - microseconds total
 2147483647 - (INT_MAX), the number of gettimeofday() calls

54179703 / 2147483647.0
   0.025229390256679331

 Here we have the average duration of one gettimeofday in microseconds.
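
For reference, a standalone C program along the lines of that benchmark could be
written as below; the iteration count is arbitrary rather than the INT_MAX used
above.

#include <stdio.h>
#include <sys/time.h>

int
main(void)
{
	const long	iters = 100 * 1000 * 1000;	/* arbitrary, not INT_MAX */
	struct timeval start, end, tv;
	long		i;
	double		elapsed_us;

	gettimeofday(&start, NULL);
	for (i = 0; i < iters; i++)
		gettimeofday(&tv, NULL);
	gettimeofday(&end, NULL);

	elapsed_us = (end.tv_sec - start.tv_sec) * 1e6 +
		(end.tv_usec - start.tv_usec);
	printf("%.1f ns per gettimeofday() call\n", 1000.0 * elapsed_us / iters);
	return 0;
}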

 25 nsec per gettimeofday() is in the same ballpark as what I measured
 on a new-ish machine last year:
 http://www.postgresql.org/message-id/flat/31856.1400021...@sss.pgh.pa.us

 The problem is that (a) on modern hardware that is not a small number,
 it's the equivalent of 100 or more instructions; and (b) the results
 look very much worse on less-modern hardware, particularly machines
 where gettimeofday requires a kernel call.

Yes, we've been through this many times before.  All you have to do is
look at how much slower a query gets when you run EXPLAIN ANALYZE vs.
when you run it without EXPLAIN ANALYZE.  The slowdown there is
platform-dependent, but I think it's significant even on platforms
where gettimeofday is fast, like modern Linux machines.  That overhead
is precisely the reason why we added EXPLAIN (ANALYZE, TIMING OFF) -
so that if you want to, you can see the row-count estimates without
incurring the timing overhead.  There is *plenty* of evidence that
using gettimeofday in contexts where it may be called many times per
query measurably hurts performance.

It is possible that we can have an *optional feature* where timing can
be turned on, but it is dead certain that turning it on
unconditionally will be unacceptable.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Retrieve the snapshot's LSN

2015-07-16 Thread Robert Haas
On Thu, Jul 16, 2015 at 12:54 PM, Andres Freund and...@anarazel.de wrote:
 Well, in combination with logical decoding it kinda has one: It should
 allow you to take a dump of the database with a certain snapshot and
 replay all transactions with a commit lsn bigger than the snapshot's
 lsn and end up with a continually consistent database.

 Visibility for HS actually works precisely in commit LSN order, even if
 that is possibly different than on the primary...

That makes sense, and hopefully answers Florent's question about why
this is only exposed through the slot mechanism.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Retrieve the snapshot's LSN

2015-07-16 Thread Andres Freund
On 2015-07-16 12:40:07 -0400, Robert Haas wrote:
 On Wed, Jul 15, 2015 at 12:51 PM, Florent Guiliani flor...@guiliani.fr 
 wrote:
  During slot creation, the snapshot building and exporting code seems
  highly coupled with the logical decoding stuff. It doesn't seems much
  reusable to retrieve the snapshot's LSN outside of logical decoding.

 I don't think the snapshot's LSN has a well-defined meaning in
 general.  The obvious meaning would be the LSN such that all commits
 prior to that LSN are visible and all later commits are invisible,
 but such an LSN need not exist.  Suppose A writes a commit record at
 LSN 0/1, and then B writes a commit record at 0/10100, and then B
 calls ProcArrayEndTransaction().  At this point, B is visible and A is
 not visible, even though A's commit record precedes that of B.

Well, in combination with logical decoding it kinda has one: It should
allow you to take a dump of the database with a certain snapshot and
replay all transactions with a commit lsn bigger than the snapshot's
lsn and end up with a continually consistent database.

Visibility for HS actually works precisely in commit LSN order, even if
that is possibly different than on the primary...


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Map basebackup tablespaces using a tablespace_map file

2015-07-16 Thread Robert Haas
On Thu, Jul 16, 2015 at 9:41 AM, Fujii Masao masao.fu...@gmail.com wrote:
 Here are some minor comments:

 +ereport(LOG,
 +(errmsg("ignoring \"%s\" file because no
 \"%s\" file exists",
 +TABLESPACE_MAP, BACKUP_LABEL_FILE),
 + errdetail("could not rename file \"%s\" to
 \"%s\": %m",
 +   TABLESPACE_MAP, TABLESPACE_MAP_OLD)));

 WARNING is better than LOG here because it indicates a problematic case?

No, that's not the right distinction.  Remember that, when sending
messages to the client, WARNING > LOG, and when sending messages to
the log, LOG > WARNING.  So messages that a user is more likely to
care about than the administrator should be logged at WARNING; those
that the administrator is more likely to care about should be LOG.  I
think LOG is clearly the appropriate thing here.

 In detail message, the first word of sentence needs to be capitalized.

 + errdetail("renamed file \"%s\" to \"%s\"",
 +   TABLESPACE_MAP, TABLESPACE_MAP_OLD)));

 In detail message, basically we should use a complete sentence.
 So like other similar detail messages in xlog.c, I think that it's better
 to use "\"%s\" was renamed to \"%s\"." as the detail message here.

Right, that's what the style guidelines say.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Retrieve the snapshot's LSN

2015-07-16 Thread Robert Haas
On Wed, Jul 15, 2015 at 12:51 PM, Florent Guiliani flor...@guiliani.fr wrote:
 During slot creation, the snapshot building and exporting code seems
 highly coupled with the logical decoding stuff. It doesn't seems much
 reusable to retrieve the snapshot's LSN outside of logical decoding.

I don't think the snapshot's LSN has a well-defined meaning in
general.  The obvious meaning would be the LSN such that all commits
prior to that LSN are visible and all later commits are invisible,
but such an LSN need not exist.  Suppose A writes a commit record at
LSN 0/1, and then B writes a commit record at 0/10100, and then B
calls ProcArrayEndTransaction().  At this point, B is visible and A is
not visible, even though A's commit record precedes that of B.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Map basebackup tablespaces using a tablespace_map file

2015-07-16 Thread Alvaro Herrera
Robert Haas wrote:
 On Thu, Jul 16, 2015 at 9:41 AM, Fujii Masao masao.fu...@gmail.com wrote:
  Here are some minor comments:
 
  +ereport(LOG,
  +(errmsg("ignoring \"%s\" file because no
  \"%s\" file exists",
  +TABLESPACE_MAP, BACKUP_LABEL_FILE),
  + errdetail("could not rename file \"%s\" to
  \"%s\": %m",
  +   TABLESPACE_MAP, TABLESPACE_MAP_OLD)));
 
  WARNING is better than LOG here because it indicates a problematic case?
 
 No, that's not the right distinction.  Remember that, when sending
 messages to the client, WARNING > LOG, and when sending messages to
 the log, LOG > WARNING.  So messages that a user is more likely to
 care about than the administrator should be logged at WARNING; those
 that the administrator is more likely to care about should be LOG.  I
 think LOG is clearly the appropriate thing here.

Right.  Now that we have errors marked with criticality, it will be
pretty obvious what message is indicating a system problem rather than
just a problem that can be ignored without taking any action.

... oh, actually we don't have the criticality marking yet, do we.
I think we need to fix that at some point.  (Some guys have said to grep
the log for certain SQLSTATE codes, but, uh, really?)

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] postgres_fdw extension support

2015-07-16 Thread Paul Ramsey
Michael, thanks so much for the review!

On July 15, 2015 at 7:35:11 PM, Michael Paquier (michael.paqu...@gmail.com) 
wrote:
This patch includes some diff noise, it would be better to remove that. 
Done.

- if (!is_builtin(fe->funcid)) 
+ if (!is_builtin(fe->funcid) && 
(!is_in_extension(fe->funcid, fpinfo))) 
Extra parenthesis are not needed. 
OK, will remove.

+ if ( (!is_builtin(oe->opno)) && 
(!is_in_extension(oe->opno, fpinfo)) ) 
... And this does not respect the project code format. See here for 
more details for example: 
http://www.postgresql.org/docs/devel/static/source.html 
I’m sorry, that link doesn’t clarify for me what’s stylistically wrong here 
(it’s almost all about error messages), could you help (is it the padding 
around the conditionals? removed that)

+ /* PostGIS metadata */ 
+ List *extensions; 
+ bool use_postgis; 
+ Oid postgis_oid; 
This addition in PgFdwRelationInfo is surprising. What's the point of 
keeping use_postgis and postgis_oid when they are actually used nowhere? 
Whoops, a couple old pieces from my proof-of-concept survived the conversion to 
a generic features. Removed.

appendStringInfo(buf, "::%s", 
- format_type_with_typemod(node->consttype, 
- node->consttypmod)); 
+ format_type_be_qualified(node->consttype)); 
What's the reason for this change? 
Good question. As I recall, the very sparse search path the FDW connection 
makes can sometimes leave remote functions failing to find other functions they 
need, so we want to force the calls to be schema-qualified. Unfortunately 
there’s no perfect public call for that, ideally it would be return 
format_type_internal(type_oid, typemod, true, false, true), but there’s no 
function like that, so I settled for format_type_be_qualified(), which forces 
qualification at the expense of ignoring the typmod.

Thinking a bit wider, why is this just limited to extensions? You may 
have as well other objects defined locally and remotely like functions 
or operators that are not defined in extensions, but as normal 
objects. Hence I think that what you are looking for is not this 
option, but a boolean option enforcing the behavior of code paths 
using is_builtin() in foreign_expr_walker such that the sanity checks on 
existing objects are not limited to FirstBootstrapObjectId but to 
other objects in the catalogs. 
Well, as I see it there’s three broad categories of behavior available:

1- Forward nothing non-built-in (current behavior)

2- Use options to forward only specified non-built-in things (either in 
function chunks (extensions, as in this patch) or one-by-one (mark your desired 
functions / ops)

3- Forward everything if a “forward everything” option is set

I hadn’t actually considered the possibility of option 3, but for my purposes 
it would work just as well, with the added efficiency bonus of not having to 
check whether particular funcs/ops are inside declared extensions. Both the 
current state of FDW and the patch I’m providing expect a *bit* of smarts on 
the part of users, to make sure their remote/local environments are in sync (in 
particular versions of pgsql and of extensions). Option 3 just ups the ante on 
that requirement. I’d be fine w/ this, makes the patch very simple and fast.

For option 2, marking things one at a time really isn’t practical for a package 
like PostGIS, with several hundred functions and operators. Using the larger 
block of “extension” makes more sense. I think in general, expecting users to 
itemize every func/op they want to forward, particular if they just want an 
extension to “work” over FDW is too big an expectation. That’s not to minimize 
the utility of being able to mark individual functions/ops for forwarding, but 
I think that’s a separate use case that doesn’t eliminate the need for 
extension-level forwarding.
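
For what it's worth, an extension-membership test of the kind is_in_extension()
implies could be sketched server-side roughly as follows; the helper name and
the extension_oids list are illustrative, not necessarily how the posted patch
structures it.

#include "postgres.h"
#include "catalog/dependency.h"
#include "catalog/pg_proc.h"
#include "nodes/pg_list.h"

/*
 * Does the function belong to one of the extensions the user listed for
 * this foreign server?  "extension_oids" is an illustrative list of
 * extension OIDs resolved earlier from the server option.
 */
static bool
function_is_in_listed_extension(Oid funcid, List *extension_oids)
{
	Oid			extoid = getExtensionOfObject(ProcedureRelationId, funcid);

	return OidIsValid(extoid) && list_member_oid(extension_oids, extoid);
}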

Thanks again for the review!

P.

 

That's a risky option because it could 
lead to inconsistencies among objects, so obviously the default is 
false but by documenting correctly the risks of using this option we 
may be able to get something integrated (aka be sure that each object 
is defined consistently across the servers queried or you'll have 
surprising results!). In short, it seems to me that you are taking the 
wrong approach. 
-- 



fdw-extension-support-2.diff
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: multiple psql option -c

2015-07-16 Thread Fabrízio de Royes Mello
On Thu, Jul 16, 2015 at 4:42 PM, Pavel Stehule pavel.steh...@gmail.com
wrote:

 Hi

 can we support multiple -c option?

 Why? Because some statements like VACUUM cannot be used together with any
other statements with single -c option. The current solution is using echo
and pipe op, but it is a complication in some complex scripts - higher
complication when you run psql via multiple sudo statement.

 Example:

 psql -c "select pg_stat_reset()" -c "vacuum full analyze" dbname

 or on all db

 psql -At -c "select datname from pg_database" postgres | \
 xargs -n 1 -P 3 psql -c ... -c ...

 Ideas, notes, comments?


Why do you want it if we already have the -f option that covers this use case?

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog: http://fabriziomello.github.io
 Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello
 Github: http://github.com/fabriziomello


[HACKERS] proposal: multiple psql option -c

2015-07-16 Thread Pavel Stehule
Hi

can we support multiple -c option?

Why? Because some statements like VACUUM cannot be used together with any
other statements with single -c option. The current solution is using echo
and pipe op, but it is a complication in some complex scripts - higher
complication when you run psql via multiple sudo statement.

Example:

psql -c "select pg_stat_reset()" -c "vacuum full analyze" dbname

or on all db

psql -At -c "select datname from pg_database" postgres | \
xargs -n 1 -P 3 psql -c ... -c ...

Ideas, notes, comments?

Regards

Pavel


Re: [HACKERS] proposal: multiple psql option -c

2015-07-16 Thread Pavel Stehule
2015-07-16 23:10 GMT+02:00 Rosser Schwarz rosser.schw...@gmail.com:

 On Thu, Jul 16, 2015 at 1:44 PM, Pavel Stehule pavel.steh...@gmail.com
 wrote:

 2015-07-16 22:07 GMT+02:00 Fabrízio de Royes Mello 
 fabriziome...@gmail.com:

 Why do you want it if we already have the -f option that covers this use
 case?

 It doesn't help me - we want to run a script or remote script (via ssh)
 without the necessity to create (and later drop) files on production servers.


 Does piping a series of commands into psql work in your scenario? You can
 even say things like:

 cat $local_file | ssh $production_server 'psql $database'


probably not - the first remote command is sudo su -, due to security reasons




 --
 :wq



Re: [HACKERS] proposal: multiple psql option -c

2015-07-16 Thread Pavel Stehule
2015-07-16 22:07 GMT+02:00 Fabrízio de Royes Mello fabriziome...@gmail.com
:


 On Thu, Jul 16, 2015 at 4:42 PM, Pavel Stehule pavel.steh...@gmail.com
 wrote:
 
  Hi
 
  can we support multiple -c option?
 
  Why? Because some statements like VACUUM cannot be used together with
 any other statements with single -c option. The current solution is using
 echo and pipe op, but it is a complication in some complex scripts - higher
 complication when you run psql via multiple sudo statement.
 
  Example:
 
  psql -c "select pg_stat_reset()" -c "vacuum full analyze" dbname
 
  or on all db
 
  psql -At -c "select datname from pg_database" postgres | \
  xargs -n 1 -P 3 psql -c ... -c ...
 
  Ideas, notes, comments?
 

 Why do you want it if we already have the -f option that covers this use case?


It doesn't help me - we want to run a script or remote script (via ssh)
without the necessity to create (and later drop) files on production servers.

Remote execution of scripts is much simpler if you don't need to create
any script files.

Regards

Pavel


 Regards,

 --
 Fabrízio de Royes Mello
 Consultoria/Coaching PostgreSQL
  Timbira: http://www.timbira.com.br
  Blog: http://fabriziomello.github.io
  Linkedin: http://br.linkedin.com/in/fabriziomello
  Twitter: http://twitter.com/fabriziomello
  Github: http://github.com/fabriziomello



Re: [HACKERS] Memory prefetching while sequentially fetching from SortTuple array, tuplestore

2015-07-16 Thread Tom Lane
Peter Geoghegan p...@heroku.com writes:
 Attached patch adds a portable Postgres wrapper on the GCC intrinsic.

Meh.  I don't like the assumption that non-GCC compilers will be smart
enough to optimize away the useless-to-them if() tests this adds.
Please refactor that so that there is exactly 0 new code when the
intrinsic doesn't exist.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-07-16 Thread Peter Geoghegan
On Tue, Jul 14, 2015 at 7:25 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 FWIW, I entirely share Robert's opinion that adding gettimeofday()
 overhead in routinely-taken paths is likely not to be acceptable.

I think that it can depend on many factors. For example, the
availability of vDSO support on Linux/glibc.

I've heard that clock_gettime() with CLOCK_REALTIME_COARSE, or with
CLOCK_MONOTONIC_COARSE can have significantly lower overhead than
gettimeofday().
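
One way to see the other side of that tradeoff on a given box is to ask the
kernel for each clock's resolution; CLOCK_MONOTONIC_COARSE is Linux-specific,
so the sketch below guards it.

#include <stdio.h>
#include <time.h>

static void
show_res(const char *name, clockid_t id)
{
	struct timespec res;

	if (clock_getres(id, &res) == 0)
		printf("%-24s %ld ns resolution\n", name,
			   (long) (res.tv_sec * 1000000000L + res.tv_nsec));
}

int
main(void)
{
	show_res("CLOCK_MONOTONIC", CLOCK_MONOTONIC);
#ifdef CLOCK_MONOTONIC_COARSE
	show_res("CLOCK_MONOTONIC_COARSE", CLOCK_MONOTONIC_COARSE);
#endif
	return 0;
}

(On older glibc, linking this may additionally require -lrt.)
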
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Map basebackup tablespaces using a tablespace_map file

2015-07-16 Thread Amit Kapila
On Thu, Jul 16, 2015 at 12:30 PM, Alvaro Herrera alvhe...@2ndquadrant.com
wrote:

 Amit Kapila wrote:

  This can be tracked either in 9.5 Open Items or for next CF,
  any opinions?
 
  If nobody else has any opinion on this, I will add it to 9.5 Open Items
  list.

 I think this belongs in the open items list, yeah.


Okay, added to the list of 9.5 Open Items.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Re: [COMMITTERS] pgsql: Map basebackup tablespaces using a tablespace_map file

2015-07-16 Thread Fujii Masao
On Fri, Jul 17, 2015 at 1:28 AM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Jul 16, 2015 at 9:41 AM, Fujii Masao masao.fu...@gmail.com wrote:
 Here are some minor comments:

 +ereport(LOG,
 +(errmsg("ignoring \"%s\" file because no
 \"%s\" file exists",
 +TABLESPACE_MAP, BACKUP_LABEL_FILE),
 + errdetail("could not rename file \"%s\" to
 \"%s\": %m",
 +   TABLESPACE_MAP, TABLESPACE_MAP_OLD)));

 WARNING is better than LOG here because it indicates a problematic case?

 No, that's not the right distinction.  Remember that, when sending
 messages to the client, WARNING > LOG, and when sending messages to
 the log, LOG > WARNING.  So messages that a user is more likely to
 care about than the administrator should be logged at WARNING; those
 that the administrator is more likely to care about should be LOG.  I
 think LOG is clearly the appropriate thing here.

Isn't this rule confusing the administrators? ISTM that the administrators
would intuitively and literally pay more attention to WARNING than LOG.
Also there are already some warning messages with WARNING level that
the administrators rather than the clients should care about. For example,
the warning message which is output when archive_command fails.

ereport(WARNING,
(errmsg("archiving transaction log file
\"%s\" failed too many times, will try again later",
xlog)));

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Memory Accounting v11

2015-07-16 Thread David Rowley
On 16 July 2015 at 05:07, Jeff Davis pg...@j-davis.com wrote:

 On Wed, 2015-07-15 at 12:59 +0530, Atri Sharma wrote:

 
  I think a heuristic would be more suited here and ignoring memory
  consumption for internal types means that we are not making the memory
  accounting useful for a set of usecases.
 
 OK, I will drop this patch and proceed with the HashAgg patch, with a
 heuristic for internal types.


Should we mark the patch as returned with feedback in the commitfest app
then?

--
 David Rowley   http://www.2ndQuadrant.com/
http://www.2ndquadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Add CINE for ALTER TABLE ... ADD COLUMN

2015-07-16 Thread Michael Paquier
On Fri, Jun 26, 2015 at 12:41 AM, Fabrízio de Royes Mello
fabriziome...@gmail.com wrote:


 On Wed, Jun 24, 2015 at 3:36 PM, Alvaro Herrera alvhe...@2ndquadrant.com
 wrote:

 Fabrízio de Royes Mello wrote:

  Another rebased version.

 There are a number of unrelated whitespace changes in this patch; also
 please update the comment on top of check_for_column_name_collision.


 Sorry, bad merging after a pgindent run. Comments on top of
 check_for_column_name_collision also updated.

I had a look at this patch, and here are some minor comments:
1) In alter_table.sgml, you need a space here:
[ IF NOT EXISTS ]<replaceable>
2)
+   check_for_column_name_collision(targetrelation, newattname, false);
(void) needs to be added in front of check_for_column_name_collision
where its return value is not checked or static code analyzers are
surely going to complain.
3) Something minor: some lines of code exceed 80 characters, see
declaration of check_for_column_name_collision for example...
4) This comment needs more precision?
/* new name should not already exist */
-   check_for_column_name_collision(rel, colDef->colname);
+   if (!check_for_column_name_collision(rel, colDef->colname,
if_not_exists))
The new name can actually exist if if_not_exists is true.

Except that the implementation looks sane to me.
Regards,
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Map basebackup tablespaces using a tablespace_map file

2015-07-16 Thread Robert Haas
On Thu, Jul 16, 2015 at 9:54 PM, Fujii Masao masao.fu...@gmail.com wrote:
 Isn't this rule confusing the administrators?

I'd like to think not, but yeah, it probably is.  It is not like it
isn't documented.  There are even comments in postgresql.conf
explaining it.  But the fact that we have committers who are confused
probably isn't a great sign.

I really rather like the system we have and find it quite intuitive,
but it's obvious that a lot of other people don't.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-07-16 Thread Tom Lane
Peter Geoghegan p...@heroku.com writes:
 I've heard that clock_gettime() with CLOCK_REALTIME_COARSE, or with
 CLOCK_MONOTONIC_COARSE can have significantly lower overhead than
 gettimeofday().

It can, but it also has *much* lower precision, typically 1ms or so.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rewind failure by file deletion in source server

2015-07-16 Thread Michael Paquier
On Wed, Jul 1, 2015 at 9:31 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Wed, Jul 1, 2015 at 2:21 AM, Heikki Linnakangas hlinn...@iki.fi wrote:
 On 06/29/2015 09:44 AM, Michael Paquier wrote:

 On Mon, Jun 29, 2015 at 4:55 AM, Heikki Linnakangas wrote:

 But we'll still need to handle the pg_xlog symlink case somehow. Perhaps
 it
 would be enough to special-case pg_xlog for now.


 Well, sure, pg_rewind does not copy the soft links either way. Now it
 would be nice to have an option to be able to recreate the soft link
 of at least pg_xlog even if it can be scripted as well after a run.


 Hmm. I'm starting to think that pg_rewind should ignore pg_xlog entirely. In
 any non-trivial scenarios, just copying all the files from pg_xlog isn't
 enough anyway, and you need to set up a recovery.conf after running
 pg_rewind that contains a restore_command or primary_conninfo, to fetch the
 WAL. So you can argue that by not copying pg_xlog automatically, we're
 actually doing a favour to the DBA, by forcing him to set up the
 recovery.conf file correctly. Because if you just test simple scenarios
 where not much time has passed between the failover and running pg_rewind,
 it might be enough to just copy all the WAL currently in pg_xlog, but it
 would not be enough if more time had passed and not all the required WAL is
 present in pg_xlog anymore.  And by not copying the WAL, we can avoid some
 copying, as restore_command or streaming replication will only copy what's
 needed, while pg_rewind would copy all WAL it can find the target's data
 directory.

 pg_basebackup also doesn't include any WAL, unless you pass the --xlog
 option. It would be nice to also add an optional --xlog option to pg_rewind,
 but with pg_rewind it's possible that all the required WAL isn't present in
 the pg_xlog directory anymore, so you wouldn't always achieve the same
 effect of making the backup self-contained.

 So, I propose the attached. It makes pg_rewind ignore the pg_xlog directory
 in both the source and the target.

 If pg_xlog is simply ignored, some old WAL files may remain in target server.
 Don't these old files cause the subsequent startup of target server as new
 standby to fail? That is, it's the case where the WAL file with the same name
 but different content exists both in target and source. If that's harmful,
 pg_rewind should also remove the files in pg_xlog of the target server.

This would reduce usability. The rewound node will replay WAL from the
previous checkpoint where WAL forked up to the minimum recovery point
of the source node where pg_rewind has been run. Hence, if we completely
removed the contents of pg_xlog we'd lose a portion of the logs that
need to be replayed until the timeline is switched on the rewound node
when recovering it (while streaming from the promoted standby, or
whatever). I don't really see why recycled segments would be a problem,
which is perhaps what you are referring to, but perhaps I am missing
something.

Attached is a rebased version of the previous patch to ignore the
contents of pg_xlog/ when rewinding.
-- 
Michael


20150717_pgrewind_ignore_xlog.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: multiple psql option -c

2015-07-16 Thread Joshua D. Drake





it is one possible solution too

multiple -c option has advantage of simple evaluation of backslash
statements .. -c \l -c \dt - but this advantage is not highly important.


Or just properly understand the ; ?

-c "select * from foo; update bar set baz = 'bing'; vacuum bar;"

JD



Pavel



Best Regards,
Dinesh
manojadinesh.blogspot.com http://manojadinesh.blogspot.com

Regards

Pavel






--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing I'm offended is basically telling the world you can't
control your own emotions, so everyone else should do it for you.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: multiple psql option -c

2015-07-16 Thread Pavel Stehule
2015-07-17 6:26 GMT+02:00 Joshua D. Drake j...@commandprompt.com:




 it is one possible solution too

 multiple -c option has advantage of simple evaluation of backslash
 statements .. -c \l -c \dt - but this advantage is not highly important.


 Or just properly understand the ; ?

 -c "select * from foo; update bar set baz = 'bing'; vacuum bar;"


there is a risk of compatibility issues - all statements run under one
transaction implicitly

Pavel



 JD


 Pavel



 Best Regards,
 Dinesh
 manojadinesh.blogspot.com http://manojadinesh.blogspot.com

 Regards

 Pavel





 --
 Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
 PostgreSQL Centered full stack support, consulting and development.
 Announcing I'm offended is basically telling the world you can't
 control your own emotions, so everyone else should do it for you.



Re: [HACKERS] Bugs in our qsort implementation

2015-07-16 Thread Tom Lane
Peter Geoghegan p...@heroku.com writes:
 On Thu, Jul 16, 2015 at 5:05 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 It's possible that this issue can only manifest on 9.4 and up where
 we have the ability for tuplesort to allocate work arrays approaching
 INT_MAX elements.  But I don't have a lot of faith in that; I think the
 worst-case stack depth for the way we have it now could be as bad as O(N),
 so in principle a crash could be possible with significantly smaller input
 arrays.  I think we'd better back-patch this all the way.

 +1.

 If you want to generate a worst case, McIlroy wrote a program that
 will generate one [1]. AFAIR, it will generate a series of
 self-consistent comparisons in the gas comparator that produce a
 worst-case outcome (as opposed to producing a simple worst-case input,
 which would be more convenient in this kind of scenario). This is
 known specifically to affect the Bentley & McIlroy implementation, as
 the paper goes into.
 [1] http://www.cs.dartmouth.edu/~doug/mdmspe.pdf

Wow, interesting article.  I stuck that into a test program with our
qsort, and found out that

(1) our presort check totally beats that oracle, because it causes
the oracle to freeze the gas in perfectly sequential order, so that
the presort run gets all the way through and decides that the input
is sorted.  Hence, N-1 comparisons and no recursion.

(2) if you dike out the presort check, you do see quadratic comparison
behavior but the max recursion depth is only 1.  Apparently, the oracle
effectively always makes the right-hand partition as large as possible,
so that recursing on the left partition is the optimal policy as far
as stack depth goes.

However, you can trivially modify this oracle to break our code: just
negate its comparison result, ie

-   return val[x] - val[y]; /* only the sign matters */
+   return -(val[x] - val[y]);  /* only the sign matters */

This stops the presort check immediately, since now the data looks
to be reverse-sorted instead of correctly ordered; and now it makes
the left-hand partition as large as possible, instead of the right.

With our unmodified code and the tweaked oracle, I see maximum recursion
depth in the vicinity of N/4.  So it's definitely possible to get O(N)
stack growth without a fix, though it may take very unusual input.  With
the correction to recurse to the smaller partition, we get max recursion
depth of just 1, since that always chooses a very small partition to
recurse to.
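
For illustration, a minimal standalone sketch of that fix -- recurse only into
the smaller partition and iterate on the larger one -- using a plain Lomuto
partition rather than the Bentley & McIlroy code:

#include <stddef.h>

static void
swap_int(int *a, int *b)
{
	int		t = *a;

	*a = *b;
	*b = t;
}

/*
 * Recurse into the smaller partition, loop on the larger one, so the
 * recursion depth stays O(log N) even for adversarial inputs.
 */
static void
qsort_sketch(int *a, size_t n)
{
	while (n > 1)
	{
		int		pivot = a[n / 2];
		size_t	i = 0;
		size_t	j;

		swap_int(&a[n / 2], &a[n - 1]);
		for (j = 0; j < n - 1; j++)
		{
			if (a[j] < pivot)
				swap_int(&a[j], &a[i++]);
		}
		swap_int(&a[i], &a[n - 1]);		/* pivot now sits at index i */

		if (i < n - i - 1)
		{
			qsort_sketch(a, i);			/* smaller left side: recurse */
			a += i + 1;					/* larger right side: iterate */
			n -= i + 1;
		}
		else
		{
			qsort_sketch(a + i + 1, n - i - 1);	/* smaller right side: recurse */
			n = i;						/* larger left side: iterate */
		}
	}
}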

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: multiple psql option -c

2015-07-16 Thread Pavel Stehule
2015-07-17 0:03 GMT+02:00 dinesh kumar dineshkuma...@gmail.com:

 On Thu, Jul 16, 2015 at 12:42 PM, Pavel Stehule pavel.steh...@gmail.com
 wrote:

 Hi

 can we support multiple -c option?

 Why? Because some statements like VACUUM cannot be used together with any
 other statements with single -c option. The current solution is using echo
 and pipe op, but it is a complication in some complex scripts - higher
 complication when you run psql via multiple sudo statement.

 Example:

 psql -c "select pg_stat_reset()" -c "vacuum full analyze" dbname

 or on all db

 psql -At -c "select datname from pg_database" postgres | \
 xargs -n 1 -P 3 psql -c ... -c ...

 Ideas, notes, comments?


 IMO, rather than having multiple -c args, it would be good to have another flag
 like -C which accepts and executes multiple SQL statements in sequence.


it is one possible solution too

multiple -c option has advantage of simple evaluation of backslash
statements .. -c \l -c \dt - but this advantage is not highly important.

Pavel




 Best Regards,
 Dinesh
 manojadinesh.blogspot.com


 Regards

 Pavel





Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-16 Thread Amit Kapila
On Thu, Jul 16, 2015 at 11:10 PM, Robert Haas robertmh...@gmail.com wrote:

 On Thu, Jul 16, 2015 at 1:32 PM, Simon Riggs si...@2ndquadrant.com
wrote:

  * Developers will immediately understand the format

 I doubt it.  I think any format that we pick will have to be carefully
 documented.  People may know what JSON looks like in general, but they
 will not immediately know what bells and whistles are available in
 this context.


I also think any format where the user has to carefully remember how the
values must be provided is not user-friendly. Why is an SQL-based syntax
not preferable in this case? With that we could even achieve consistency of
this parameter across all servers, which I think is not of utmost importance
for this feature, but I still think it would make users happy.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] auto_explain sample rate

2015-07-16 Thread Craig Ringer
On 7 July 2015 at 21:37, Julien Rouhaud julien.rouh...@dalibo.com wrote:

 Well, I obviously missed that pg_srand48() is only used if the system
 lacks random/srandom, sorry for the noise.  So yes, random() must be
 used instead of pg_lrand48().

 I'm attaching a new version of the patch fixing this issue just in case.

Thanks for picking this up. I've been trying to find time to come back
to it but been swamped in priority work.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: Memory prefetching while sequentially fetching from SortTuple array, tuplestore

2015-07-16 Thread Peter Geoghegan
On Thu, Jul 16, 2015 at 4:01 PM, Peter Geoghegan p...@heroku.com wrote:
 Attached patch adds a portable Postgres wrapper on the GCC intrinsic.
 It also adds a client within tuplesort.c -- a common function that
 seems like a good generic choke point. I can get a speed-up of 6% - 9%
 for all of these cases by prefetching a few SortTuples ahead, for the
 tuple proper. (In-memory sorts only.)

I added a silly bug during last minute clean-up. I attach a V2.

-- 
Peter Geoghegan
From 5d3093e9c0d10f4140072da8d18d0dddf6b84b1e Mon Sep 17 00:00:00 2001
From: Peter Geoghegan peter.geoghega...@gmail.com
Date: Sun, 12 Jul 2015 13:14:01 -0700
Subject: [PATCH] Prefetch from memtuples array in tuplesort

Testing shows that prefetching the tuple proper of a slightly later
SortTuple in the memtuples array during each of many sequential,
in-logical-order SortTuple fetches speeds up various sorting intense
operations considerably.  For example, B-Tree index builds are
accelerated as leaf pages are created from the memtuples array.
(i.e.  The operation following actually performing the sort, but
before a tuplesort_end() call is made as a B-Tree spool is destroyed.)

Similarly, ordered set aggregates (all cases except the datumsort case
with a pass-by-value type), and regular heap tuplesorts benefit to about
the same degree.  The optimization is only used when sorts fit in
memory, though.

Also, prefetch a few places ahead within the analogous fetching point
in tuplestore.c.  This appears to offer similar benefits in certain
cases.  For example, queries involving large common table expressions
significantly benefit.
---
 config/c-compiler.m4| 17 +
 configure   | 30 ++
 configure.in|  1 +
 src/backend/utils/sort/tuplesort.c  | 18 ++
 src/backend/utils/sort/tuplestore.c | 11 +++
 src/include/c.h | 19 +++
 src/include/pg_config.h.in  |  3 +++
 src/include/pg_config.h.win32   |  3 +++
 8 files changed, 102 insertions(+)

diff --git a/config/c-compiler.m4 b/config/c-compiler.m4
index 050bfa5..5776201 100644
--- a/config/c-compiler.m4
+++ b/config/c-compiler.m4
@@ -287,6 +287,23 @@ fi])# PGAC_C_BUILTIN_UNREACHABLE
 
 
 
+# PGAC_C_BUILTIN_PREFETCH
+# -
+# Check if the C compiler understands __builtin_prefetch(),
+# and define HAVE__BUILTIN_PREFETCH if so.
+AC_DEFUN([PGAC_C_BUILTIN_PREFETCH],
+[AC_CACHE_CHECK(for __builtin_prefetch, pgac_cv__builtin_prefetch,
+[AC_COMPILE_IFELSE([AC_LANG_PROGRAM([],
+[int i = 0;__builtin_prefetch(&i, 0, 3);])],
+[pgac_cv__builtin_prefetch=yes],
+[pgac_cv__builtin_prefetch=no])])
+if test x$pgac_cv__builtin_prefetch = xyes ; then
+AC_DEFINE(HAVE__BUILTIN_PREFETCH, 1,
+  [Define to 1 if your compiler understands __builtin_prefetch.])
+fi])# PGAC_C_BUILTIN_PREFETCH
+
+
+
 # PGAC_C_VA_ARGS
 # --
 # Check if the C compiler understands C99-style variadic macros,
diff --git a/configure b/configure
index 77048e8..f14e65d 100755
--- a/configure
+++ b/configure
@@ -11248,6 +11248,36 @@ if test x$pgac_cv__builtin_unreachable = xyes ; then
 $as_echo #define HAVE__BUILTIN_UNREACHABLE 1 confdefs.h
 
 fi
+{ $as_echo $as_me:${as_lineno-$LINENO}: checking for __builtin_prefetch 5
+$as_echo_n checking for __builtin_prefetch...  6; }
+if ${pgac_cv__builtin_prefetch+:} false; then :
+  $as_echo_n (cached)  6
+else
+  cat confdefs.h - _ACEOF conftest.$ac_ext
+/* end confdefs.h.  */
+
+int
+main ()
+{
+int i = 0;__builtin_prefetch(&i, 0, 3);
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile $LINENO; then :
+  pgac_cv__builtin_prefetch=yes
+else
+  pgac_cv__builtin_prefetch=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+fi
+{ $as_echo $as_me:${as_lineno-$LINENO}: result: $pgac_cv__builtin_prefetch 5
+$as_echo $pgac_cv__builtin_prefetch 6; }
+if test x$pgac_cv__builtin_prefetch = xyes ; then
+
+$as_echo #define HAVE__BUILTIN_PREFETCH 1 confdefs.h
+
+fi
 { $as_echo $as_me:${as_lineno-$LINENO}: checking for __VA_ARGS__ 5
 $as_echo_n checking for __VA_ARGS__...  6; }
 if ${pgac_cv__va_args+:} false; then :
diff --git a/configure.in b/configure.in
index 5724a4d..2132d69 100644
--- a/configure.in
+++ b/configure.in
@@ -1318,6 +1318,7 @@ PGAC_C_TYPES_COMPATIBLE
 PGAC_C_BUILTIN_BSWAP32
 PGAC_C_BUILTIN_CONSTANT_P
 PGAC_C_BUILTIN_UNREACHABLE
+PGAC_C_BUILTIN_PREFETCH
 PGAC_C_VA_ARGS
 PGAC_STRUCT_TIMEZONE
 PGAC_UNION_SEMUN
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 435041a..70b53d1 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -1663,6 +1663,24 @@ tuplesort_gettuple_common(Tuplesortstate *state, bool forward,
 if (state->current < state->memtupcount)
 {
 	*stup = state->memtuples[state->current++];
+
+	/*
+	 * Perform memory prefetch of tuple proper of the
+	 * SortTuple that's 

Re: [HACKERS] [PATCH] postgres_fdw extension support

2015-07-16 Thread Michael Paquier
On Fri, Jul 17, 2015 at 1:08 AM, Paul Ramsey wrote:
 + if ( (!is_builtin(oe->opno)) &&
 (!is_in_extension(oe->opno, fpinfo)) )
 ... And this does not respect the project code format. See here for
 more details for example:
 http://www.postgresql.org/docs/devel/static/source.html

 I’m sorry, that link doesn’t clarify for me what’s stylistically wrong here
 (it’s almost all about error messages), could you help (is it the padding
 around the conditionals? removed that)

Two things:
1) Spaces inside the parentheses at the beginning and the end of the expression.
2) The parentheses around is_in_extension are not really necessary.


 appendStringInfo(buf, "::%s",
 - format_type_with_typemod(node->consttype,
 - node->consttypmod));
 + format_type_be_qualified(node->consttype));
 What's the reason for this change?

 Good question. As I recall, the very sparse search path the FDW connection
 makes can sometimes leave remote functions failing to find other functions
 they need, so we want to force the calls to be schema-qualified.
 Unfortunately there’s no perfect public call for that, ideally it would be
 return format_type_internal(type_oid, typemod, true, false, true), but
 there’s no function like that, so I settled for format_type_be_qualified(),
 which forces qualification at the expense of ignoring the typmod.

Hm. I don't have my mind wrapped around that now, but this needs to be
checked with data types like char or varchar. It may matter.

 Thinking a bit wider, why is this just limited to extensions? You may
 have as well other objects defined locally and remotely like functions
 or operators that are not defined in extensions, but as normal
 objects. Hence I think that what you are looking for is not this
 option, but a boolean option enforcing the behavior of code paths
 using is_builtin() in foreign_expr_walker such that the sanity checks on
 existing objects are not limited to FirstBootstrapObjectId but to
 other objects in the catalogs.


 Well, as I see it there’s three broad categories of behavior available:

 1- Forward nothing non-built-in (current behavior)
 2- Use options to forward only specified non-built-in things (either in
 function chunks (extensions, as in this patch) or one-by-one (mark your
 desired functions / ops)
 3- Forward everything if a “forward everything” option is set

Then what you are describing here is a parameter able to do a switch
among this selection:
- nothing, which is to check on built-in stuff
- extension list.
- all.

 I hadn’t actually considered the possibility of option 3, but for my
 purposes it would work just as well, with the added efficiency bonus of not
 having to check whether particular funcs/ops are inside declared extensions.
 Both the current state of FDW and the patch I’m providing expect a *bit* of
 smarts on the part of users, to make sure their remote/local environments
 are in sync (in particular versions of pgsql and of extensions). Option 3
 just ups the ante on that requirement. I’d be fine w/ this, makes the patch
 very simple and fast.

Yeah, perhaps too simple though :) You may want something that is more
advanced. For example when using a set of objects you can decide which
want you want to pushdown and which one you want to keep as evaluated
locally.

 For option 2, marking things one at a time really isn’t practical for a
 package like PostGIS, with several hundred functions and operators. Using
 the larger block of “extension” makes more sense. I think in general,
 expecting users to itemize every func/op they want to forward, particular if
 they just want an extension to “work” over FDW is too big an expectation.
 That’s not to minimize the utility of being able to mark individual
 functions/ops for forwarding, but I think that’s a separate use case that
 doesn’t eliminate the need for extension-level forwarding.

OK, that's good to know. Perhaps then using a parameter with an
extension list is a good balance. Now would there be practical cases
where it is actually useful to have an extension list? For example,
let's say that there are two extensions installed: foo1 and foo2. Do
you think (or know) that some users would be willing to include only the
objects of foo1, but not those of foo2? Also, could it be possible
to consider an exclude list? Like ignoring all the objects in the
given extension list and allowing the rest.
I am just trying to consider all the possibilities here.

 Thanks again for the review!

Good to see that you added in the CF app:
https://commitfest.postgresql.org/6/304/

Regards,
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] optimizing vacuum truncation scans

2015-07-16 Thread Haribabu Kommi
On Thu, Jul 16, 2015 at 10:17 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Thu, Jul 16, 2015 at 11:21 AM, Haribabu Kommi kommi.harib...@gmail.com
 wrote:


 Here I attached updated patches,
 1) without prefetch logic.
 2) with combined vm and prefetch logic.


 I think it is better to just get the first patch in as that in itself is a
 clear win and then we can evaluate the second one (prefetching during
 truncation) separately.  I think after first patch, the use case for doing
 prefetch seems to be less beneficial and I think it could hurt by loading
 not required pages (assume you have prefetched 32 pages and after
 inspecting first page, it decides to quit the loop as that is non-empty page
 and other is when it has prefetched just because for page the visibility map
 bit was cleared, but for others it is set) in OS buffer cache.

Yes, in the above cases, the prefetch is an overhead. I am not sure
how frequently such scenarios occur in real-world workloads.

 Having said that, I am not against prefetching in this scenario as that
 can help in more cases then it could hurt, but I think it will be better
 to evaluate that separately.

Yes, the prefetch patch works better in cases where the first vacuum mostly
skips the truncation because of lock waiters. If such cases are common in
real-world scenarios, then the prefetch patch is needed.

Regards,
Hari Babu
Fujitsu Australia


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Memory prefetching while sequentially fetching from SortTuple array, tuplestore

2015-07-16 Thread Peter Geoghegan
I've always thought that GCC intrinsics for software-based memory
prefetching are a very sharp tool. While it's really hard to use GCC's
__builtin_prefetch() effectively (I've tried enough times to know), I
always thought that there'd probably end up being a handful of cases
where using it presented a clear and worthwhile win. Hash joins seemed
to me to be a likely candidate, because memory latency is a real
killer there. However, I stumbled upon another case where prefetching
pays clear dividends while actually being quite simple: sequentially
reading an already-sorted memtuples array (an array of SortTuple
structs), during some kind of post-processing.

This comes up for many common and important cases, including regular
sort nodes (heaptuple sorts), datumsorts with pass-by-reference types
(e.g. ordered set aggregates), and B-Tree index builds, where the
sequential fetching occurs as actual B-Tree leaf pages are built from
the array.

Patch
=

Attached patch adds a portable Postgres wrapper on the GCC intrinsic.
It also adds a client within tuplesort.c -- a common function that
seems like a good generic choke point. I can get a speed-up of 6% - 9%
for all of these cases by prefetching a few SortTuples ahead, for the
tuple proper. (In-memory sorts only.)
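
As a rough illustration only (the macro name and its placement are invented
here, not taken from the patch), such a wrapper could look like this, expanding
to nothing on compilers without the intrinsic:

/* Rough illustration only; names are invented, not from the patch. */
#ifdef HAVE__BUILTIN_PREFETCH
#define pg_read_prefetch(addr)	__builtin_prefetch((addr), 0, 3)
#else
#define pg_read_prefetch(addr)	((void) 0)
#endif

	/* while returning sorted tuples sequentially, hint a few slots ahead */
	if (state->current + 3 < state->memtupcount)
		pg_read_prefetch(state->memtuples[state->current + 3].tuple);

The empty fallback also lines up with Tom's request elsewhere in the thread
that there be exactly zero new code when the intrinsic doesn't exist.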

ISTM that this alone makes the patch worthwhile, but we should still
consider the break-down of costs in each of these operations, and that
the cost of the sequential reading and processing of SortTuples is all
that has been touched by the patch. There are only about 3 real lines
of code added to tuplesort. Obviously, the dominant costs within each
of these queries/utility commands are the sort proper, and to a lesser
extent the heap scan preceding it. I am not targeting those at all --
I'm only targeting the tertiary cost of sequentially scanning the
memtuples array by hiding memory latency, and yet I end up with a
notable overall speed-up.

It's only really fair to look at this final sequential fetching cost
in isolation. The reduction in that cost in isolation seems to be
huge. I tried out the proprietary Intel vTune Amplifier utility with
this case, and it indicated that CPU time spent in calls to
_bt_buildadd was less than 1/8 the previous time in a comparable test
of the master branch, with no other changes that I can attribute to
this patch (i.e. there is a bit of noise). You can easily get some
hint of the size of the improvement by reviewing trace_sort = on log
output, and the reduction of time spent after performsort is done
but before the sort is shut down.

Tuplestores


tuplestore.c has a similar optimization added, on the theory that it
should match tuplesort.c wherever possible, and because it seems to
help a lot there too. For example, queries that perform a big CTE Scan
seem to benefit to about the same degree (although, again, only when
the array is fully in memory).

Portability concerns
===

I started writing this patch with an intuition that this would help. I
arrived at the fixed-up version posted here through frobbing the code
based on the feedback of Postgres running on my laptop. I iteratively
tweaked the number of tuples ahead to fetch from, and the
__builtin_prefetch() temporal locality argument's value, which helped
a little for a number of test cases. If that doesn't inspire much
confidence in this patch, then good -- that's the idea. Obviously if
we are going to commit this, there needs to be testing of a reasonably
representative selection of platforms.

On the other hand, I couldn't find a case that was regressed, and I am
encouraged by the fact that this helps the post-processing of
SortTuples so effectively. I imagine that some platform would have
very different performance characteristics in order for this to be the
wrong thing. However, it seems possible that there is a case that has
been regressed, since the prefetch is added at a very generic point
within tuplesort.c and tuplestore.c.
-- 
Peter Geoghegan
From 3c44ec9a9c68da3408716e06d0e822acfc68a746 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan peter.geoghega...@gmail.com
Date: Sun, 12 Jul 2015 13:14:01 -0700
Subject: [PATCH] Prefetch from memtuples array in tuplesort

Testing shows that prefetching the tuple proper of a slightly later
SortTuple in the memtuples array during each of many sequential,
in-logical-order SortTuple fetches speeds up various sorting intense
operations considerably.  For example, B-Tree index builds are
accelerated as leaf pages are created from the memtuples array.
(i.e.  The operation following actually performing the sort, but
before a tuplesort_end() call is made as a B-Tree spool is destroyed.)

Similarly, ordered set aggregates (all cases except the datumsort case
with a pass-by-value type), and regular heap tuplesorts benefit to about
the same degree.  The optimization is only used when sorts fit in
memory, though.

Also, prefetch a few places ahead within the analogous fetching point
in tuplestore.c. 

Re: [HACKERS] Bugs in our qsort implementation

2015-07-16 Thread Peter Geoghegan
On Thu, Jul 16, 2015 at 5:05 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 It's possible that this issue can only manifest on 9.4 and up where
 we have the ability for tuplesort to allocate work arrays approaching
 INT_MAX elements.  But I don't have a lot of faith in that; I think the
 worst-case stack depth for the way we have it now could be as bad as O(N),
 so in principle a crash could be possible with significantly smaller input
 arrays.  I think we'd better back-patch this all the way.

+1.

If you want to generate a worst case, McIlroy wrote a program that
will generate one [1]. AFAIR, it will generate a series of
self-consistent comparisons in the gas comparator that produce a
worst-case outcome (as opposed to producing a simple worst-case input,
which would be more convenient in this kind of scenario). This is
known specifically to affect the Bentley & McIlroy implementation, as
the paper goes into.

[1] http://www.cs.dartmouth.edu/~doug/mdmspe.pdf
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Size of TOAST pointer

2015-07-16 Thread Tom Lane
Vignesh Raghunathan vignesh.pg...@gmail.com writes:
 I was looking at the documentation for TOAST (
 http://www.postgresql.org/docs/devel/static/storage-toast.html) and it's
 specified that the toast pointer occupies 18 bytes. However, the struct
 representing the toast pointer is defined as follows
 typedef struct varatt_external

The data that actually ends up on disk is a varattrib_1b_e wrapping a
varatt_external, and that header is where the extra 2 bytes come from.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers