[HACKERS] close_ps, NULLs, and DirectFunctionCall

2017-09-20 Thread Andrew Gierth
(From IRC discussion with Andreas about some sqlsmith errors)

Commit 278148907a9 changed close_ps in geo_ops.c to return SQL NULL in
the event that a valid result point was not found (rather than crashing
or failing an assert).

But close_ps is called with DirectFunctionCall from other close_*
functions, and indirectly from dist_* functions. So while (point ## lseg)
itself returns a NULL with no error in this case, for other data types, or
for certain dist_* calls, rather than a null result you get this:

postgres=# select point(1,2) ## '((0,0),(NaN,NaN))'::box;
ERROR:  function 0x9c5de0 returned NULL

postgres=# select point(1,2) <-> '((0,0),(NaN,NaN))'::box;
ERROR:  function 0x9c5de0 returned NULL

This seems ... unhelpful (though it's at least better than crashing) and
inconsistent.

Does this need fixing, and if so how? The most consistent fix would seem
to be to make all the affected functions return NULL, which probably
requires factoring out a bunch of close_*_internal functions and
replacing the DirectFunctionCall usage.
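
As a rough sketch of what that factoring-out might look like (illustrative
only; close_ps_internal is a hypothetical name, and the body elides the
existing closest-point computation):

#include "postgres.h"
#include "fmgr.h"
#include "utils/geo_decls.h"

/* hypothetical internal worker: reports failure by returning NULL */
static Point *
close_ps_internal(Point *pt, LSEG *lseg)
{
    Point   *result = NULL;

    /*
     * ... existing closest-point computation goes here, leaving result
     * NULL if no valid point can be determined ...
     */

    return result;
}

/* SQL-callable wrapper turns the internal NULL into an SQL NULL */
Datum
close_ps(PG_FUNCTION_ARGS)
{
    Point   *pt = PG_GETARG_POINT_P(0);
    LSEG    *lseg = PG_GETARG_LSEG_P(1);
    Point   *result = close_ps_internal(pt, lseg);

    if (result == NULL)
        PG_RETURN_NULL();

    PG_RETURN_POINT_P(result);
}

The other close_* and dist_* functions would then call close_ps_internal()
directly and decide for themselves what a NULL result should mean, instead
of tripping over DirectFunctionCall's refusal to accept a NULL return.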

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Inadequate infrastructure for NextValueExpr

2017-08-17 Thread Andrew Gierth
> "Thomas" == Thomas Munro  writes:

 >> [...]
 >> T_NamedTuplestoreScan can be produced by outfuncs.c with tagname
 >> NAMEDTUPLESTORESCAN but that tagname is not recognized by readfuncs.c
 >> [...]
 >> 
 >> That revealed a defect in commit
 >> 18ce3a4ab22d2984f8540ab480979c851dae5338 which I think should be
 >> corrected with something like the attached, though I'm not sure if
 >> it's possible to reach it.

 Thomas> Adding Kevin and Andrew G.  Thoughts on whether this is a
 Thomas> defect that should be corrected with something like
 Thomas> read-namedtuplestorescan.patch?

It's a defect that should probably be corrected for consistency, though
at present it looks like it's not actually possible to reach the code.
The patch looks good to me, though I've not tested it.

Kevin, you want to take it? Or shall I deal with it?

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] COPY vs. transition tables

2017-07-10 Thread Andrew Gierth
>>>>> "Andrew" == Andrew Gierth <and...@tao11.riddles.org.uk> writes:
>>>>> "Thomas" == Thomas Munro <thomas.mu...@enterprisedb.com> writes:

 Thomas> Here it is.  Added to open items.

 Andrew> On it.

Committed.

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] COPY vs. transition tables

2017-07-10 Thread Andrew Gierth
> "Thomas" == Thomas Munro  writes:

 Thomas> Here it is.  Added to open items.

On it.

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] transition table behavior with inheritance appears broken

2017-06-28 Thread Andrew Gierth
Commits pushed.

Unless I broke the buildfarm again (which I'll check up on later), or
some new issue arises with the fixes, this should close all 3 related
items for transition tables.

-- 
Andrew.




Re: [HACKERS] transition table behavior with inheritance appears broken

2017-06-27 Thread Andrew Gierth
> "Noah" == Noah Misch  writes:

 Noah> IMMEDIATE ATTENTION REQUIRED.  This PostgreSQL 10 open item is
 Noah> long past due for your status update.  Please reacquaint yourself
 Noah> with the policy on open item ownership[1] and then reply
 Noah> immediately.  If I do not hear from you by 2017-06-28 06:00 UTC,
 Noah> I will transfer this item to release management team ownership
 Noah> without further notice.

Sorry for the lack of updates. I need to sleep now, but I will send a
proper status update by 1800 UTC (1900 BST) today (28th).

-- 
Andrew.




Re: [HACKERS] transition table behavior with inheritance appears broken

2017-06-23 Thread Andrew Gierth
> "Noah" == Noah Misch  writes:

 Noah> This PostgreSQL 10 open item is past due for your status update.
 Noah> Kindly send a status update within 24 hours,

oops, sorry! I forgot to include a date in the last one, and in fact a
personal matter delayed things anyway. I expect to have this wrapped up
by 23:59 BST on the 24th.

-- 
Andrew.




Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table

2017-06-18 Thread Andrew Gierth
> "Thomas" == Thomas Munro  writes:

 Thomas> Thanks both for the review.  New version of patch #2 attached.

I'm looking to commit this soon; if anyone has any further comment now
would be a good time to speak up.

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Transition tables vs ON CONFLICT

2017-06-18 Thread Andrew Gierth
> "Thomas" == Thomas Munro  writes:

 Thomas> That accidentally removed a comment that I wanted to keep.
 Thomas> Here is a better version.

I plan to commit this soon; if anyone has any comment to make, now would
be a good time.

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] transition table behavior with inheritance appears broken

2017-06-18 Thread Andrew Gierth
>>>>> "Andrew" == Andrew Gierth <and...@tao11.riddles.org.uk> writes:

 Andrew> Unfortunately I've been delayed over the past couple of days,
 Andrew> but I have Thomas' latest patchset in hand and will be working
 Andrew> on it over the rest of the week. Status update by 23:59 BST on
 Andrew> Sun 18th, by which time I hope to have everything finalized
 Andrew> (all three issues, not just the inheritance one).

I have, I believe, completed my review of the patchset. My conclusion is
that the fix appears to be sound and I haven't been able to find any
further issues with it; so I think Thomas's patches should be committed
as-is. Unless anyone objects I will do this within the next few days.

(Any preferences for whether it should be one commit or 3 separate ones?)

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] transition table behavior with inheritance appears broken

2017-06-14 Thread Andrew Gierth
>>>>> "Andrew" == Andrew Gierth <and...@tao11.riddles.org.uk> writes:

 Andrew> I will post a further status update before 23:59 BST on 14th
 Andrew> Jun.

Unfortunately I've been delayed over the past couple of days, but I have
Thomas' latest patchset in hand and will be working on it over the rest
of the week. Status update by 23:59 BST on Sun 18th, by which time I
hope to have everything finalized (all three issues, not just the
inheritance one).

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] transition table behavior with inheritance appears broken

2017-06-11 Thread Andrew Gierth
>>>>> "Andrew" == Andrew Gierth <and...@tao11.riddles.org.uk> writes:

 Andrew> I have it; I will post a status update before 23:59 BST on 11
 Andrew> Jun.

This is that status update.  I am still studying Thomas' latest patch
set; as I mentioned in another message, I've confirmed a memory leak,
and I expect further work may be needed in some other areas as well, but
I think we're still making progress towards fixing it and I will work
with Thomas on it.

I will post a further status update before 23:59 BST on 14th Jun.

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table

2017-06-11 Thread Andrew Gierth
> "Robert" == Robert Haas  writes:

 Robert> I don't see a reason why MakeTransitionCaptureState needs to
 Robert> force the tuplestores into TopTransactionContext or make them
 Robert> owned by TopTransactionResourceOwner.

Nor do I, and I'm pretty sure it's leaking memory wholesale within a
transaction if you have aborted subxacts.

e.g. a few iterations of this, on a table with an appropriate trigger:

savepoint s1;
insert into foo
  select i, case when i < 10 then repeat('a',100)
                 else repeat('a',1/(i-10)) end
    from generate_series(1,10) i;
rollback to s1;

This is a bit contrived of course but it shows that there's missing
cleanup somewhere, either in the form of poor choice of context or
missing code in the subxact cleanup.

 Robert> I mean, it was like that before, but afterTriggers is a global
 Robert> variable and, potentially, there could still be a pointer
 Robert> accessible through it to a tuplestore linked from it even after
 Robert> the corresponding subtransaction aborted, possibly causing some
 Robert> cleanup code to trip and fall over.  But that can't be used to
 Robert> justify this case, because the TransitionCaptureState is only
 Robert> reached through the PlanState tree; if that goes away, how is
 Robert> anybody going to accidentally follow a pointer to the
 Robert> now-absent tuplestore?

For per-row triggers with transition tables, a pointer to the transition
capture state ends up in the shared-data record in the event queue?

-- 
Andrew.




Re: [HACKERS] transition table behavior with inheritance appears broken

2017-06-09 Thread Andrew Gierth
> "Robert" == Robert Haas  writes:

 Robert> So, Andrew, are you running with this, or should I keep looking
 Robert> into it?

I have it; I will post a status update before 23:59 BST on 11 Jun.

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table

2017-06-08 Thread Andrew Gierth
> "Robert" == Robert Haas  writes:

 Robert> unless some other committer volunteers.  (Of course, anyone
 Robert> could step in to do the work, as Thomas already has to a
 Robert> considerable degree, but without a committer involved it
 Robert> doesn't fix the problem.)

I can probably take this on if needed.

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table

2017-06-08 Thread Andrew Gierth
> "Thomas" == Thomas Munro  writes:

 Thomas> So, afterTriggers.query_stack is used to handle the reentrancy
 Thomas> that results from triggers running further statements that
 Thomas> might fire triggers.  It isn't used for dealing with extra
 Thomas> ModifyTable nodes that can appear in a plan because of wCTEs.
 Thomas> Could it also be used for that purpose?  I think that would
 Thomas> only work if it is the case that each ModifyTable node begin
 Thomas> and then runs to completion (ie no interleaving of wCTE
 Thomas> execution) and then its AS trigger fires, which I'm not sure
 Thomas> about.

There's a problem with this which I didn't see anyone mention (though
I've not read the whole history); existing users of wCTEs rely on the
fact that no AFTER triggers are run until _all_ modifications are
completed.  If you change that, then you can't use wCTEs to insert into
tables with FK checks without knowing the order in which the individual
wCTEs are executed, which is currently a bit vague and non-obvious
(which doesn't cause issues at the moment only because nothing actually
depends on the order).

for example:
create table foo (id integer primary key);
create table foo_bar (foo_id integer references foo, bar_id integer);
with i1 as (insert into foo values (1))
  insert into foo_bar values (1,2),(1,3);

This would fail the FK check if each insert did its own trigger firing,
since the foo_bar insert is actually run _first_.

Firing triggers earlier than they currently are would thus break
existing working queries.

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Assertion failure in REL9_5_STABLE

2017-04-18 Thread Andrew Gierth
> "Pavan" == Pavan Deolasee  writes:

 Pavan> I am attaching a patch that throws a similar ERROR during
 Pavan> planning even for 9.5. AFAICS in presence of grouping sets, we
 Pavan> always decide to use sort-based implementation for grouping, but
 Pavan> do not check if the columns support ordering or not.

Hmmm. This is obviously my fault somewhere... the patch looks good, I'll
deal with it shortly.

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] index-only count(*) for indexes supporting bitmap scans

2017-04-12 Thread Andrew Gierth
> "Alexander" == Alexander Kuzmenkov  writes:

 Alexander> Structurally, the patch consists of two major parts: a
 Alexander> specialized executor node

Why?

It strikes me that the significant fact here is not that we're doing
count(*), but that we don't need any columns from the bitmap heap scan
result.  Rather than creating a whole new node, can't the existing
bitmap heapscan be taught to skip fetching the actual table page in
cases where it's all-visible, not lossy, and no columns are needed?

(this would also have the advantage of getting parallelism for free)

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Ongoing issues with representation of empty arrays

2017-04-10 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> First is contrib/intarray, _AGAIN_ (see past bugs such as #7730):
 >> ...
 >> I plan to fix this one properly, unless anyone has any objections.

 Tom> Just to clarify, what do you think is "properly"?

I would say, that any time an intarray function returns an empty result
it should be the standard 0-dimensional representation that every other
array operation uses.  The intarray functions all seem already able to
take such values as inputs.  Also there should be regression tests for
this (none of intarray's existing tests have any empty arrays at all).
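
For concreteness, the backend already has a helper that builds that standard
representation; here's a sketch of what the empty-result case would return
(the helper name is hypothetical, this is not actual contrib/intarray code):

#include "postgres.h"
#include "catalog/pg_type.h"
#include "utils/array.h"

/*
 * Sketch only: whenever an intarray operation ends up with zero elements,
 * hand back the standard zero-dimensional empty array rather than a
 * 1-dimensional array of length 0.
 */
static ArrayType *
empty_int4_array(void)
{
    return construct_empty_array(INT4OID);
}

With that, array_dims() of an empty result is NULL and comparisons against
'{}' behave the same way as for every other array operation.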

 >> Second is aclitem[], past bug #8395 which was not really resolved; empty
 >> ACLs are actually 1-dim arrays of length 0, and all the ACL functions
 >> insist on that, which means that you can't call aclexplode('{}') for
 >> example:
 >> It's much less clear what to do about this one. Thoughts? 

 Tom> My instinct is to be conservative in what you send and liberal in
 Tom> what you accept.  In this context that would probably mean fixing
 Tom> aclitem-related functions to accept both 0-dimensional and
 Tom> 1-dimensional-0-length inputs.
 
 Tom> (Actually, is there a reason to be picky about the input
 Tom> dimensionality at all, rather than just iterating over whatever
 Tom> the array contents are?)

Currently there is this:

#define ACL_NUM(ACL)        (ARR_DIMS(ACL)[0])

which is obviously wrong for dimensions other than exactly 1.  I don't
_think_ there's any great obstacle to fixing this; the only code that
would care about number of dimensions would be allocacl, and since
there'd be no obvious reason to preserve the shape of an aclitem[]
anywhere, that could just do 0 or 1 dimensions.
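
For illustration, a dimension-tolerant version could be as simple as this
(sketch only, under a hypothetical name):

#include "utils/array.h"

/*
 * Sketch: count aclitem entries while accepting the standard
 * zero-dimensional empty array as well as the 1-dimensional form,
 * instead of assuming exactly one dimension.
 */
#define ACL_NUM_SAFE(ACL) \
    (ARR_NDIM(ACL) == 0 ? 0 : ARR_DIMS(ACL)[0])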

-- 
Andrew (irc:RhodiumToad)




[HACKERS] Ongoing issues with representation of empty arrays

2017-04-10 Thread Andrew Gierth
The distinction between the standard representation of '{}' as an array
with zero dimensions and nonstandard representations as a 1-dimensional
array with zero elements has come up in a couple of contexts on the IRC
channel recently.

First is contrib/intarray, _AGAIN_ (see past bugs such as #7730):

select array_dims(('{1,2}'::integer[] & '{3}'));
 array_dims 
------------
 [1:0]
(1 row)

regression=# select ('{1,2}'::integer[] & '{3}') = '{}';
 ?column? 
----------
 f
(1 row)

Worse, the fact that the fix for #7730 (commit c155f654) only did a
very partial job means that it's now inconsistent:

regression=# select (a - b), (a & c), (a - b) = (a & c)
 from (values (array[1,2],array[1,2],array[3])) v(a,b,c);
 ?column? | ?column? | ?column? 
----------+----------+----------
 {}       | {}       | f
(1 row)

I plan to fix this one properly, unless anyone has any objections.


Second is aclitem[], past bug #8395 which was not really resolved; empty
ACLs are actually 1-dim arrays of length 0, and all the ACL functions
insist on that, which means that you can't call aclexplode('{}') for
example:

https://www.postgresql.org/message-id/flat/CA%2BTgmoZdDpTJDUVsgzRhoCctidUqLDyO8bdYwgLD5p8DwHtMcQ%40mail.gmail.com

It's much less clear what to do about this one. Thoughts? 

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] [sqlsmith] Planner crash on foreign table join

2017-04-09 Thread Andrew Gierth
> "Thomas" == Thomas Munro  writes:

 >> SomeType *x = (SomeType *) lfirst(l);
 >> 
 >> (in my code I tend to omit the (SomeType *), which I dislike because
 >> it adds no real protection)

 Thomas> Just BTW, without that cast it's not compilable as C++, so I'm
 Thomas> guessing that Peter E will finish up putting it back in
 Thomas> wherever you leave it out...

There are north of 150 other examples (just grep for '= lfirst' in the
source). Some were even committed by Peter E :-)

In the discussion with Andres the same point came up for palloc, for
which I suggested we add something along the lines of:

#define palloc_object(_type_) (_type_ *) palloc(sizeof(_type_))
#define palloc_array(_type_, n) (_type_ *) palloc((n) * sizeof(_type_))

palloc() without a cast is even more common than lfirst() without one,
and something like half of those (and 80%+ of the pallocs that do have a
cast) are palloc(sizeof(...)) or palloc(something * sizeof(...)).
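
To make the intent concrete, call sites would read like this (sketch only;
Foo is a stand-in type, and the macro definitions are just the ones proposed
above):

#include "postgres.h"

#define palloc_object(_type_) (_type_ *) palloc(sizeof(_type_))
#define palloc_array(_type_, n) (_type_ *) palloc((n) * sizeof(_type_))

typedef struct Foo { int a; int b; } Foo;   /* stand-in for illustration */

static void
example_allocations(int n)
{
    /* current idiom: the type is written twice and kept in sync by hand */
    Foo    *f1 = (Foo *) palloc(sizeof(Foo));
    int    *a1 = (int *) palloc(n * sizeof(int));

    /* with the proposed macros */
    Foo    *f2 = palloc_object(Foo);
    int    *a2 = palloc_array(int, n);

    (void) f1; (void) a1; (void) f2; (void) a2;
}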

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Malformed Array Literal in PL/pgSQL Exception Block

2017-04-09 Thread Andrew Gierth
> "David" == David E Wheeler  writes:

 >> If you change this to EXCEPTION WHEN division_by_zero THEN, the
 >> reported error becomes:
 >> 
 >> ERROR:  malformed array literal: "foo"
 >> LINE 1: SELECT things || 'foo'

 David> So the issue stands, yes?

Tom's response has the explanation of why it fails (everywhere, not just
in the exception block): parse analysis prefers to match the (array ||
array) form of the operator when given input of (array || unknown). Just
cast the 'foo' to the array element type.

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Malformed Array Literal in PL/pgSQL Exception Block

2017-04-09 Thread Andrew Gierth
> "David" == David E Wheeler  writes:

 David> And it works great, including in PL/pgSQL functions, except in
 David> an exception block. When I run this:

 David> BEGIN;

 David> CREATE OR REPLACE FUNCTION foo(
 David> ) RETURNS BOOLEAN IMMUTABLE LANGUAGE PLPGSQL AS $$
 David> DECLARE
 David> things TEXT[] := '{}';
 David> BEGIN
 David> things := things || 'foo';
 David> RAISE division_by_zero;

This "raise" statement is not reached, because the previous line raises
the "malformed array literal" error.

 David> EXCEPTION WHEN OTHERS THEN

If you change this to  EXCEPTION WHEN division_by_zero THEN, the
reported error becomes:

ERROR:  malformed array literal: "foo"
LINE 1: SELECT things || 'foo'

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] [sqlsmith] Planner crash on foreign table join

2017-04-08 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> Experimentation shows that actually, the standard regression tests
 Tom> provide dozens of opportunities for find_relation_from_clauses to
 Tom> fail on non-RestrictInfo input.  However, it lacks any IsA check,

In a discussion with Andres on the hash grouping sets review thread, I
proposed that we should have something of the form

#define lfirst_node(_type_, l) (castNode(_type_,lfirst(l)))

to replace the current idiom of

foreach(l, blah)
{
SomeType *x = (SomeType *) lfirst(l);

(in my code I tend to omit the (SomeType *), which I dislike because it
adds no real protection)

by

foreach(l, blah)
{
SomeType *x = lfirst_node(SomeType, l);

in order to get that IsA check in there in a convenient way.

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] [sqlsmith] Planner crash on foreign table join

2017-04-08 Thread Andrew Gierth
> "Andreas" == Andreas Seltenreich  writes:

 Andreas> Hi,
 
 Andreas> testing master at f0e44021df with a loopback postgres_fdw
 Andreas> installed, I see lots of crashes on queries joining foreign
 Andreas> tables with various expressions.  Below is a reduced recipe
 Andreas> for the regression database and a backtrace.

Commit ac2b095088 assumes that clauselist_selectivity is being passed a
list of RestrictInfo, but postgres_fdw is passing it a list of bare
clauses.  One of them is wrong :-)

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Instead of DROP function use UPDATE pg_proc in an upgrade extension script

2017-04-03 Thread Andrew Gierth
> "Vicky" == Vicky Vergara  writes:

 Vicky> UPDATE pg_proc SET [...]

 Vicky> So, I want to know how "safe" can you consider the second
 Vicky> method, and what kind of other objects do I need to test besides
 Vicky> views.

Speaking from personal experience (I did this in the upgrade script for
ip4r, in a much simpler case than yours, and broke it badly), it's not
at all safe.

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Unable to build doc on latest head

2017-04-03 Thread Andrew Gierth
> "Peter" == Peter Eisentraut  writes:

 > On 4/3/17 02:44, Ashutosh Bapat wrote:
 >> [1] says that id.attribute is supported in stylesheets version
 >> 1.77.1.  Do I need to update stylesheets version? How do I do it?
 >> Any help will be appreciated.

 Peter> The oldest version among the ones listed at
 Peter> http://docbook.sourceforge.net/release/xsl/ that works for me is
 Peter> 1.77.0.  Which one do you have?

BTW, the version in FreeBSD ports is 1.76.1, and the download stats on
docbook.sourceforge.net suggest that this one is still pretty widely
used despite its age. I filed a PR on the port, but who knows when
anything will happen.

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Hash support for grouping sets

2017-03-23 Thread Andrew Gierth
> "Andres" == Andres Freund  writes:

 Andres> a) cast result of lfirst/lnext/whatnot.

Again, what we need here is something like

#define lfirst_node(_type_, l) (castNode(_type_, lfirst(l)))

etc.

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Hash support for grouping sets

2017-03-23 Thread Andrew Gierth
> "Andres" == Andres Freund  writes:

 Andres> We usually cast the result of palloc.

 >> Rough count in the backend has ~400 without casts to ~1350 with, so
 >> this doesn't seem to have been consistently enforced.

 Andres> Yea, but we're still trying.

Well, a lot of the uncasted ones are in fairly new code, from quite a
number of different committers.

So if this is a big deal, why don't we already have

#define palloc_array(etype, ecount) ((etype *) palloc((ecount) * sizeof(etype)))
#define palloc_object(otype) ((otype *) palloc(sizeof(otype)))

or something of that ilk?

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Hash support for grouping sets

2017-03-23 Thread Andrew Gierth
> "Mark" == Mark Dilger  writes:

 Mark> Is there a performance test case where this patch should shine
 Mark> brightest?  I'd like to load a schema with lots of data, and run
 Mark> a grouping sets query, both before and after applying the patch,
 Mark> to see what the performance advantage is.

The area which I think is most important for performance is the handling
of small cubes; without this patch, a 2d cube needs 2 full sorts, a 3d
one needs 3, and a 4d one needs 6. In many real-world data sets these
would all hash entirely in memory.

So here's a very simple example (deliberately using integers for
grouping to minimize the advantage; grouping by text columns in a non-C
locale would show a much greater speedup for the patch):

create table sales (
  id serial,
  product_id integer,
  store_id integer,
  customer_id integer,
  qty integer);

-- random integer function
create function d(integer) returns integer language sql
 as $f$ select floor(random()*$1)::integer + 1; $f$;

-- 10 million random rows
insert into sales (product_id,store_id,customer_id,qty)
  select d(20), d(6), d(10), d(100) from generate_series(1,10000000);

-- example 2d cube:
select product_id, store_id, count(*), sum(qty)
  from sales
 group by cube(product_id, store_id);

-- example 3d cube:
select product_id, store_id, customer_id, count(*), sum(qty)
  from sales
 group by cube(product_id, store_id, customer_id);

-- example 4d cube with a computed column:
select product_id, store_id, customer_id, (qty / 10), count(*), sum(qty)
  from sales
 group by cube(product_id, store_id, customer_id, (qty / 10));

On my machine, the 2d cube is about 3.6 seconds with the patch, and
about 8 seconds without it; the 4d is about 18 seconds with the patch
and about 32 seconds without it (all with work_mem=1GB, compiled with
-O2 and assertions off).

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Hash support for grouping sets

2017-03-23 Thread Andrew Gierth
> "Mark" == Mark Dilger  writes:

 Mark> You define DiscreteKnapsack to take integer weights and double
 Mark> values, and perform the usual Dynamic Programming algorithm to
 Mark> solve.  But the only place you call this, you pass in NULL for
 Mark> the values, indicating that all the values are equal.  I'm
 Mark> confused why in this degenerate case you bother with the DP at
 Mark> all.  Can't you just load the knapsack from lightest to heaviest
 Mark> after sorting, reducing the runtime complexity to O(nlogn)?  It's
 Mark> been a long day.  Sorry if I am missing something obvious.

That's actually a very good question.

(Though in passing I should point out that the runtime cost is going to
be negligible in all practical cases. Even an 8-way cube (256 grouping
sets) has only 70 rollups, and a 4-way cube has only 6; the limit of
4096 grouping sets was only chosen because it was a nice round number
that was larger than comparable limits in other database products. Other
stuff in the grouping sets code has worse time bounds; the
bipartite-match code used to minimize the number of rollups is
potentially O(n^2.5) in the number of grouping sets.)

Thinking about it, it's not at all obvious what we should be optimizing
for here in the absence of accurate sort costs. Right now what matters
is saving as many sort steps as possible, since that's a saving likely
to be many orders of magnitude greater than the differences between two
sorts. The one heuristic that might be useful is to prefer the smaller
estimated size if other factors are equal, just on memory usage grounds,
but even that seems a bit tenuous.

I'm inclined to leave things as they are in the current patch, and maybe
revisit it during beta if we get any relevant feedback from people
actually using grouping sets?

 Mark> I'm guessing you intend to extend the code at some later date to
 Mark> have different values for items.  Is that right?

Well, as I mentioned in a reply to Andres, we probably should eventually
optimize for greatest reduction in cost, and the code as it stands would
allow that to be added easily, but that would require having more
accurate cost info than we have now. cost_sort doesn't take into
consideration the number or types of sort columns or the costs of their
comparison functions, so all sorts of the same input data are costed
equally.

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Hash support for grouping sets

2017-03-22 Thread Andrew Gierth
> "Andres" == Andres Freund  writes:

 >> -   Assert(newphase == 0 || newphase == aggstate->current_phase + 1);
 >> +   Assert(newphase <= 1 || newphase == aggstate->current_phase + 1);

 Andres> I think this somewhere in the file header needs an expanded
 Andres> explanation about what these "special" phases mean.

I could move most of the block comment for ExecInitAgg up into the file
header; would that be better?

 Andres> I've not yet read up on the thread, or the whole patch, but an
 Andres> explanation of the memory management for all those tables would
 Andres> be good somewhere (maybe I've just not read that far).

I can expand on that.

 Andres> "mixed types" sounds a bit weird.

changing to "if there are both types present"

 >> +   values = palloc((1 + max_weight) * sizeof(double));
 >> +   sets = palloc((1 + max_weight) * sizeof(Bitmapset *));

 Andres> We usually cast the result of palloc.

Rough count in the backend has ~400 without casts to ~1350 with, so this
doesn't seem to have been consistently enforced.

 >> +static void
 >> +_outGroupingSetData(StringInfo str, const GroupingSetData *node)
 >> +{
 >> +   WRITE_NODE_TYPE("GSDATA");
 >> +
 >> +   WRITE_NODE_FIELD(set);
 >> +   WRITE_FLOAT_FIELD(numGroups, "%.0f");
 >> +}

 Andres> .0f?

Copied from every other node that uses "%.0f" to write out a numGroups
value.

 >> +static grouping_sets_data *
 >> +preprocess_grouping_sets(PlannerInfo *root)
 >> +{

 Andres> It'd not hurt to add a comment as to what this function is
 Andres> actually doing.

Sure.

 Andres> I hope there's tests for both these branches.

A number of tests for both unsortable and unhashable cases are in the
regression test.

 >> +   /*
 >> +* Is it hashable? We pretend empty sets are hashable even though we
 >> +* actually force them not to be hashed later. But don't bother if
 >> +* there's nothing but empty sets.
 >> +*/

 Andres> Why?

If there's no non-empty grouping set then there's nothing to hash (the
hash code assumes at least one column). I'll put that in the comment.

 >> @@ -3214,6 +3407,11 @@ get_number_of_groups(PlannerInfo *root,
 >> * estimate_hashagg_tablesize
 >> * estimate the number of bytes that a hash aggregate hashtable will
 >> * require based on the agg_costs, path width and dNumGroups.
 >> + *
 >> + * XXX this may be over-estimating the size now that hashagg knows to omit
 >> + * unneeded columns from the hashtable. Also for mixed-mode grouping sets,
 >> + * grouping columns not in the hashed set are counted here even though
 >> + * hashagg won't store them. Is this a problem?
 >> */

 Andres> Hm. This seems like it could move us to use sorting, even if
 Andres> not actually warranted.

This is largely a pre-existing issue that this patch doesn't try and
fix.  That would be an issue for a separate patch, I think.  I merely
added the comment to point it out.

 Andres> What's the newline after the comment about?

Old style habits die hard.

 >> +   RelOptInfo *grouped_rel,
 >> +   Path *path,
 >> +   bool is_sorted,
 >> +   bool can_hash,
 >> +   PathTarget *target,
 >> +   grouping_sets_data *gd,
 >> +   const AggClauseCosts *agg_costs,
 >> +   double dNumGroups)
 >> +{
 >> +   Query  *parse = root->parse;
 >> +
 >> +   /*
 >> +* If we're not being offered sorted input, then only consider plans that
 >> +* can be done entirely by hashing.
 >> +*
 >> +* We can hash everything if it looks like it'll fit in work_mem. But if
 >> +* the input is actually sorted despite not being advertised as such, we
 >> +* prefer to make use of that in order to use less memory.
 >> +* If none of the grouping sets are sortable, then ignore the work_mem
 >> +* limit and generate a path anyway, since otherwise we'll just fail.
 >> +*/
 >> +   if (!is_sorted)
 >> +   {
 >> +   List   *new_rollups = NIL;
 >> +   RollupData *unhashable_rollup = NULL;
 >> +   List   *sets_data;
 >> +   List   *empty_sets_data = NIL;
 >> +   List   *empty_sets = NIL;
 >> +   ListCell   *lc;
 >> +   ListCell   *l_start = list_head(gd->rollups);
 >> +   AggStrategy strat = AGG_HASHED;
 >> +   Sizehashsize;
 >> +   double  exclude_groups = 0.0;
 >> +
 >> +   Assert(can_hash);
 >> +
 >> +   if (pathkeys_contained_in(root->group_pathkeys, path->pathkeys))
 >> +   {
 >> +   unhashable_rollup = lfirst(l_start);

 

Re: [HACKERS] Hash support for grouping sets

2017-03-22 Thread Andrew Gierth
> "Andres" == Andres Freund  writes:

 Andres> Changes to advance_aggregates() are, in my experience, quite
 Andres> likely to have performance effects.  This needs some
 Andres> performance tests.
 [...]
 Andres> Looks like it could all be noise, but it seems worthwhile to
 Andres> look into some.

Trying to sort out signal from noise when dealing with performance
impacts of no more than a few percent is _extremely hard_ these days.

Remember this, from a couple of years back?  http://tinyurl.com/op9qg8a

That's a graph of performance results from tests where the only change
being made was adding padding bytes to a function that was never called
during the test. Performance differences on the order of 3% were being
introduced by doing nothing more than changing the actual memory
locations of functions.

My latest round of tests on this patch shows a similar effect. Here's
one representative test timing (a select count(*) with no grouping sets
involved):

 master: 5727ms
 gsets_hash: 5949ms
 gsets_hash + 500 padding bytes: 5680ms

Since the effect of padding is going to vary over time as other,
unrelated, patches happen, I think I can safely claim that the
performance of the patch at least overlaps the noise range of the
performance of current code.  To get a more definitive result, it would
be necessary to run at least some dozens of tests, with different
padding sizes, and determine whether the average changes detectably
between the two versions.  I will go ahead and do this, out of sheer
curiosity if nothing else, but the preliminary results suggest there's
probably nothing worth holding up the patch for.

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Hash support for grouping sets

2017-03-22 Thread Andrew Gierth
[snip]

This thread seems to have gone quiet - is it time for me to just go
ahead and commit the thing anyway? Anyone else want to weigh in?

-- 
Andrew (irc:RhodiumToad)




[HACKERS] Composite IS NULL again, this time with plpgsql

2017-03-18 Thread Andrew Gierth
This came up recently on irc:

create type t1 as (a integer, b integer);
create type t2 as (p t1, q t1);
create function null_t2() returns t2 language sql
  as $f$ select null::t2; $f$;

Now consider the following plpgsql:

declare
  v t2;
begin
  v := null_t2();
  raise info 'v is null = %', v is null;  -- shows 'f'
end;

The value of the plpgsql variable v always behaves as if it had been
assigned row(row(null,null),row(null,null)) -- which, as we thrashed out
in detail sometime last year, isn't the same as row(null,null) or just
null and isn't considered null by IS NULL.

So there's no non-ugly way to preserve the nullity or otherwise of the
function result. The best I could come up with in answer to the original
problem (how to detect in plpgsql whether a composite value returned by
a function was actually null) was this:

declare
  v t2;
  v_null boolean;
begin
  select x into v from null_t2() as x where not (x is null);
  v_null := not found;
  raise info 'v is null = %', v_null;
end;

which lacks a certain obviousness (and you have to remember to use not
(x is null) rather than x is not null).

This obviously happens because plpgsql is storing the variable as an
expanded list of fields rather than as a single composite datum, and
rebuilding the datum value when it needs to be passed to SQL for
evaluation.  Without an "isnull" flag for each composite subvalue, this
can't regenerate the original datum closely enough to give the same
value on an isnull test.

What to do about this?

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Hash support for grouping sets

2017-03-08 Thread Andrew Gierth
> "Mark" == Mark Dilger  writes:

 Mark> Hi Andrew,

 Mark> Reviewing the patch a bit more, I find it hard to understand the
 Mark> comment about passing -1 as a flag for finalize_aggregates.  Any
 Mark> chance you can spend a bit more time word-smithing that code
 Mark> comment?

Actually, ignore that prior response, I'll refactor it a bit to remove
the need for the comment at all.

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Hash support for grouping sets

2017-03-08 Thread Andrew Gierth
> "Mark" == Mark Dilger  writes:

 Mark> Hi Andrew,

 Mark> Reviewing the patch a bit more, I find it hard to understand the
 Mark> comment about passing -1 as a flag for finalize_aggregates.  Any
 Mark> chance you can spend a bit more time word-smithing that code
 Mark> comment?

Sure.

How does this look for wording (I'll incorporate it into an updated
patch later):

/*
 * Compute the final value of all aggregates for one group.
 *
 * This function handles only one grouping set at a time.  But in the hash
 * case, it's the caller's responsibility to have selected the set already, and
 * we pass in -1 as currentSet here to flag that; this also changes how we
 * handle the indexing into AggStatePerGroup as explained below.
 *
 * Results are stored in the output econtext aggvalues/aggnulls.
 */
static void
finalize_aggregates(AggState *aggstate,
AggStatePerAgg peraggs,
AggStatePerGroup pergroup,
int currentSet)
{
ExprContext *econtext = aggstate->ss.ps.ps_ExprContext;
Datum  *aggvalues = econtext->ecxt_aggvalues;
bool   *aggnulls = econtext->ecxt_aggnulls;
int aggno;
int transno;

    /*
     * If currentSet >= 0, then we're doing sorted grouping, and pergroup is
     * an array of size numTrans*numSets which we have to index into using
     * currentSet in addition to transno. The caller may not have selected
     * the set, so we do that.
     *
     * If currentSet < 0, then we're doing hashed grouping, and pergroup is
     * an array of only numTrans items (since for hashed grouping, each
     * grouping set is in a separate hashtable).  We rely on the caller
     * having done select_current_set, and we fudge currentSet to 0 in order
     * to make the same indexing calculations work as for the grouping case.
     */
if (currentSet >= 0)
select_current_set(aggstate, currentSet, false);
else
currentSet = 0;


-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Hash support for grouping sets

2017-03-08 Thread Andrew Gierth
> "Mark" == Mark Dilger  writes:

 Mark> On linux/gcc the patch generates a warning in nodeAgg.c that is
 Mark> fairly easy to fix.  Using -Werror to make catching the error
 Mark> easier, I get:

what gcc version is this exactly?

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Hash support for grouping sets

2017-02-24 Thread Andrew Gierth
> "Thom" == Thom Brown  writes:

 Thom> This doesn't apply cleanly to latest master.  Could you please
 Thom> post a rebased patch?

Sure.

-- 
Andrew (irc:RhodiumToad)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c9e0a3e..480a07e 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -996,6 +996,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		pname = "HashAggregate";
 		strategy = "Hashed";
 		break;
+	case AGG_MIXED:
+		pname = "MixedAggregate";
+		strategy = "Mixed";
+		break;
 	default:
 		pname = "Aggregate ???";
 		strategy = "???";
@@ -1924,6 +1928,19 @@ show_grouping_set_keys(PlanState *planstate,
 	ListCell   *lc;
 	List	   *gsets = aggnode->groupingSets;
 	AttrNumber *keycols = aggnode->grpColIdx;
+	const char *keyname;
+	const char *keysetname;
+
+	if (aggnode->aggstrategy == AGG_HASHED || aggnode->aggstrategy == AGG_MIXED)
+	{
+		keyname = "Hash Key";
+		keysetname = "Hash Keys";
+	}
+	else
+	{
+		keyname = "Group Key";
+		keysetname = "Group Keys";
+	}
 
 	ExplainOpenGroup("Grouping Set", NULL, true, es);
 
@@ -1938,7 +1955,7 @@ show_grouping_set_keys(PlanState *planstate,
 			es->indent++;
 	}
 
-	ExplainOpenGroup("Group Keys", "Group Keys", false, es);
+	ExplainOpenGroup(keysetname, keysetname, false, es);
 
 	foreach(lc, gsets)
 	{
@@ -1962,12 +1979,12 @@ show_grouping_set_keys(PlanState *planstate,
 		}
 
 		if (!result && es->format == EXPLAIN_FORMAT_TEXT)
-			ExplainPropertyText("Group Key", "()", es);
+			ExplainPropertyText(keyname, "()", es);
 		else
-			ExplainPropertyListNested("Group Key", result, es);
+			ExplainPropertyListNested(keyname, result, es);
 	}
 
-	ExplainCloseGroup("Group Keys", "Group Keys", false, es);
+	ExplainCloseGroup(keysetname, keysetname, false, es);
 
 	if (sortnode && es->format == EXPLAIN_FORMAT_TEXT)
 		es->indent--;
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index aa08152..f059560 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -122,12 +122,19 @@
  *	  specific).
  *
  *	  Where more complex grouping sets are used, we break them down into
- *	  "phases", where each phase has a different sort order.  During each
- *	  phase but the last, the input tuples are additionally stored in a
- *	  tuplesort which is keyed to the next phase's sort order; during each
- *	  phase but the first, the input tuples are drawn from the previously
- *	  sorted data.  (The sorting of the data for the first phase is handled by
- *	  the planner, as it might be satisfied by underlying nodes.)
+ *	  "phases", where each phase has a different sort order (except phase 0
+ *	  which is reserved for hashing).  During each phase but the last, the
+ *	  input tuples are additionally stored in a tuplesort which is keyed to the
+ *	  next phase's sort order; during each phase but the first, the input
+ *	  tuples are drawn from the previously sorted data.  (The sorting of the
+ *	  data for the first phase is handled by the planner, as it might be
+ *	  satisfied by underlying nodes.)
+ *
+ *	  Hashing can be mixed with sorted grouping.  To do this, we have an
+ *	  AGG_MIXED strategy that populates the hashtables during the first sorted
+ *	  phase, and switches to reading them out after completing all sort phases.
+ *	  We can also support AGG_HASHED with multiple hash tables and no sorting
+ *	  at all.
  *
  *	  From the perspective of aggregate transition and final functions, the
  *	  only issue regarding grouping sets is this: a single call site (flinfo)
@@ -139,8 +146,6 @@
  *	  sensitive to the grouping set for which the aggregate function is
  *	  currently being called.
  *
- *	  TODO: AGG_HASHED doesn't support multiple grouping sets yet.
- *
  * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
@@ -432,6 +437,7 @@ typedef struct AggStatePerGroupData
  */
 typedef struct AggStatePerPhaseData
 {
+	AggStrategy aggstrategy;	/* strategy for this phase */
 	int			numsets;		/* number of grouping sets (or 0) */
 	int		   *gset_lengths;	/* lengths of grouping sets */
 	Bitmapset **grouped_cols;	/* column groupings for rollup */
@@ -440,7 +446,31 @@ typedef struct AggStatePerPhaseData
 	Sort	   *sortnode;		/* Sort node for input ordering for phase */
 }	AggStatePerPhaseData;
 
+/*
+ * AggStatePerHashData - per-hashtable state
+ *
+ * When doing grouping sets with hashing, we have one of these for each
+ * grouping set. (When doing hashing without grouping sets, we have just one of
+ * them.)
+ */
+
+typedef struct AggStatePerHashData
+{
+	TupleHashTable hashtable;	/* hash table with one entry per group */
+	TupleHashIterator hashiter; /* for iterating through hash table */
+	TupleTableSlot *hashslot;	/* slot for loading hash table */
+	FmgrInfo   *hashfunctions;	/* 

Re: [HACKERS] smallint out of range EXECUTEing prepared statement

2017-01-18 Thread Andrew Gierth
> "Justin" == Justin Pryzby  writes:

 Justin> Is this expected behavior ?

 Justin> ts=# SELECT * FROM t WHERE site_id=32768 LIMIT 1;
 Justin> (0 rows)

 Justin> ts=# PREPARE x AS SELECT * FROM t WHERE site_id=$1 LIMIT 1;
 Justin> PREPARE
 Justin> ts=# EXECUTE x(32768);
 Justin> ERROR:  smallint out of range

If column "site_id" is of type smallint, then parse analysis will deduce
a type of smallint for $1, which is otherwise of unknown type. So the
prepared statement "x" then has one parameter of type smallint.

Passing 32768 for that parameter therefore fails with the expected error.

 Justin> ts=# PREPARE y AS SELECT * FROM t WHERE site_id::int=$1 LIMIT 1;
 Justin> PREPARE

Now $1 is of type integer, not smallint, because parse analysis sees
(integer = unknown) and deduces the type from that.

(a better way would be WHERE site_id = $1::integer, which would allow
index usage on site_id, unlike your example)

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Couple of issues with prepared FETCH commands

2017-01-11 Thread Andrew Gierth
> "Robert" == Robert Haas  writes:

 >> But the problem that actually came up is this: if you do the
 >> PQprepare before the named cursor has actually been opened, then
 >> everything works _up until_ the first event, such as a change to
 >> search_path, that forces a revalidation; and at that point it fails
 >> with the "must not change result type" error _even if_ the cursor
 >> always has exactly the same result type.  This happens because the
 >> initial prepare actually stored NULL for plansource->resultDesc,
 >> since the cursor name wasn't found, while on the revalidate, when
 >> the cursor obviously does exist, it gets the actual result type.
 >> 
 >> It seems a bit of a "gotcha" to have it fail in this case when the
 >> result type isn't actually being checked in other cases.

 Robert> To me, that sounds like a bug.

So what's the appropriate fix? My suggestion would be to suppress the
result type check entirely for utility statements; EXPLAIN and SHOW
always return the same thing anyway, and both FETCH and EXECUTE are
subject to the issue described. This would mean conceding that the
result descriptor of a prepared FETCH or EXECUTE might change (i.e. a
Describe of the statement might not be useful, though a Describe of an
opened portal would be ok). I think this would result in the most
obviously correct behavior from the client point of view.

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] sequence data type

2017-01-10 Thread Andrew Gierth
> "Daniel" == Daniel Verite  writes:

 Daniel> Consider the case of a table with a SERIAL column which later
 Daniel> has to become a BIGINT due to growth.  Currently a user would
 Daniel> just alter the column's type and does need to do anything with
 Daniel> the sequence.

 Daniel> With the patch, it becomes a problem because

 Daniel> - ALTER SEQUENCE seqname MAXVALUE new_value
 Daniel> will fail because new_value is beyond the range of INT4.

 Daniel> - ALTER SEQUENCE seqname TYPE BIGINT
 Daniel> does not exist (yet?)

 Daniel> - DROP SEQUENCE seqname  (with the idea of recreating the
 Daniel> sequence immediately after) will be rejected because the table
 Daniel> depends on the sequence.

 Daniel> What should a user do to upgrade the SERIAL column?

Something along the lines of:

begin;
alter table tablename alter column id drop default;
alter sequence tablename_id_seq owned by none;
create sequence tablename_id_seq2 as bigint owned by tablename.id;
select setval('tablename_id_seq2', last_value, is_called) from tablename_id_seq;
drop sequence tablename_id_seq;
alter table tablename alter column id type bigint;
alter table tablename alter column id set default nextval('tablename_id_seq2');
commit;

Not impossible, but not at all obvious and quite involved. (And -1 for
this feature unless this issue is addressed.)

-- 
Andrew (irc:RhodiumToad)




[HACKERS] Couple of issues with prepared FETCH commands

2017-01-10 Thread Andrew Gierth
(This came up on IRC, but I'm not sure to what extent it should be
considered a "bug")

If you do  PQprepare(conn, "myfetch", "FETCH ALL FROM mycursor", ...);
then the results are unpredictable in two ways:

Firstly, nothing causes the plancache entry to be revalidated just
because "mycursor" got opened with a different query, so the result type
can change between uses.  This could be considered a "caveat user" case,
though, and I can't find anything that actually breaks.

But the problem that actually came up is this: if you do the PQprepare
before the named cursor has actually been opened, then everything works
_up until_ the first event, such as a change to search_path, that forces
a revalidation; and at that point it fails with the "must not change
result type" error _even if_ the cursor always has exactly the same
result type.  This happens because the initial prepare actually stored
NULL for plansource->resultDesc, since the cursor name wasn't found,
while on the revalidate, when the cursor obviously does exist, it gets
the actual result type.

It seems a bit of a "gotcha" to have it fail in this case when the
result type isn't actually being checked in other cases.

(In the reported case, search_path was actually changing due to the
creation of a temp table, so there was a certain amount of
spooky-action-at-a-distance to figure out in order to locate the
problem.)

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Teaching query_planner to handle multiple sort orders?

2017-01-07 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> Of course there is one good solution, which is to have query_planner
 >> take a set of acceptable output sort orders rather than just a
 >> single one.

 >> How wild an idea is this?

 Tom> It's been on my to-do list for years, see e.g.
 Tom> https://postgr.es/m/5702.1480896...@sss.pgh.pa.us

 Tom> The idea wouldn't have been very helpful before the upper planner
 Tom> got path-ified, but now the only stumbling block is a shortage of
 Tom> round tuits.

I'll take a shot at it, then.

-- 
Andrew (irc:RhodiumToad)




[HACKERS] Teaching query_planner to handle multiple sort orders?

2017-01-07 Thread Andrew Gierth
So in the grouping sets patch post, I said:

 >> There is one current weakness which I don't see a good solution for:
 >> the planner code still has to pick a single value for group_pathkeys
 >> before planning the input path.  This means that we sometimes can't
 >> choose a minimal set of sorts, because we may have chosen a sort
 >> order for a grouping set that should have been hashed,

Of course there is one good solution, which is to have query_planner
take a set of acceptable output sort orders rather than just a single
one.

How wild an idea is this? I guess a possible hazard is an increase in
the number of possible mergejoin paths to consider. What other possible
issues are there?

-- 
Andrew (irc:RhodiumToad)




[HACKERS] Hash support for grouping sets

2017-01-05 Thread Andrew Gierth
Herewith a patch for doing grouping sets via hashing or mixed hashing
and sorting.

The principal objective is to pick whatever combination of grouping sets
has an estimated size that fits in work_mem, and minimizes the number of
sorting passes we need to do over the data, and hash those.  (Yes, this
is a knapsack problem.)

This is achieved by adding another strategy to the Agg node; AGG_MIXED
means that both hashing and (sorted or plain) grouping happens.  In
addition, support for multiple hash tables in AGG_HASHED mode is added.

Sample plans look like this:

explain select two,ten from onek group by cube(two,ten);
                          QUERY PLAN                          
--------------------------------------------------------------
 MixedAggregate  (cost=0.00..89.33 rows=33 width=8)
   Hash Key: two, ten
   Hash Key: two
   Hash Key: ten
   Group Key: ()
   ->  Seq Scan on onek  (cost=0.00..79.00 rows=1000 width=8)

explain select two,ten from onek group by two, cube(ten,twenty);
                           QUERY PLAN                           
----------------------------------------------------------------
 HashAggregate  (cost=86.50..100.62 rows=162 width=12)
   Hash Key: two, ten, twenty
   Hash Key: two, ten
   Hash Key: two
   Hash Key: two, twenty
   ->  Seq Scan on onek  (cost=0.00..79.00 rows=1000 width=12)

set work_mem='64kB';
explain select count(*) from tenk1
  group by grouping sets (unique1,thousand,hundred,ten,two);
   QUERY PLAN   

 MixedAggregate  (cost=1535.39..3023.89 rows=2 width=28)
   Hash Key: two
   Hash Key: ten
   Hash Key: hundred
   Group Key: unique1
   Sort Key: thousand
 Group Key: thousand
   ->  Sort  (cost=1535.39..1560.39 rows=1 width=20)
 Sort Key: unique1
 ->  Seq Scan on tenk1  (cost=0.00..458.00 rows=1 width=20)
(10 rows)

Columns with unhashable or unsortable data types are handled
appropriately, though obviously every individual grouping set must end
up containing compatible column types.

There is one current weakness which I don't see a good solution for: the
planner code still has to pick a single value for group_pathkeys before
planning the input path.  This means that we sometimes can't choose a
minimal set of sorts, because we may have chosen a sort order for a
grouping set that should have been hashed, for example:

explain select count(*) from tenk1
  group by grouping sets (two,unique1,thousand,hundred,ten);
   QUERY PLAN   

 MixedAggregate  (cost=1535.39..4126.28 rows=2 width=28)
   Hash Key: ten
   Hash Key: hundred
   Group Key: two
   Sort Key: unique1
 Group Key: unique1
   Sort Key: thousand
 Group Key: thousand
   ->  Sort  (cost=1535.39..1560.39 rows=1 width=20)
 Sort Key: two
 ->  Seq Scan on tenk1  (cost=0.00..458.00 rows=1 width=20)

(3 sorts, vs. 2 in the previous example that listed the same grouping
sets in different order)

Current status of the work: probably still needs cleanup, maybe some
better regression tests, and Andres has expressed some concern over the
performance impact of additional complexity in advance_aggregates; it
would be useful if this could be tested.  But it should be reviewable
as-is.

-- 
Andrew (irc:RhodiumToad)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c762fb0..6c787ea 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -972,6 +972,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		pname = "HashAggregate";
 		strategy = "Hashed";
 		break;
+	case AGG_MIXED:
+		pname = "MixedAggregate";
+		strategy = "Mixed";
+		break;
 	default:
 		pname = "Aggregate ???";
 		strategy = "???";
@@ -1900,6 +1904,19 @@ show_grouping_set_keys(PlanState *planstate,
 	ListCell   *lc;
 	List	   *gsets = aggnode->groupingSets;
 	AttrNumber *keycols = aggnode->grpColIdx;
+	const char *keyname;
+	const char *keysetname;
+
+	if (aggnode->aggstrategy == AGG_HASHED || aggnode->aggstrategy == AGG_MIXED)
+	{
+		keyname = "Hash Key";
+		keysetname = "Hash Keys";
+	}
+	else
+	{
+		keyname = "Group Key";
+		keysetname = "Group Keys";
+	}
 
 	ExplainOpenGroup("Grouping Set", NULL, true, es);
 
@@ -1914,7 +1931,7 @@ show_grouping_set_keys(PlanState *planstate,
 			es->indent++;
 	}
 
-	ExplainOpenGroup("Group Keys", "Group Keys", false, es);
+	ExplainOpenGroup(keysetname, keysetname, false, es);
 
 	foreach(lc, gsets)
 	{
@@ -1938,12 +1955,12 @@ show_grouping_set_keys(PlanState *planstate,
 		}
 
 		if (!result && es->format == EXPLAIN_FORMAT_TEXT)
-			ExplainPropertyText("Group Key", "()", es);
+			ExplainPropertyText(keyname, "()", es);
 		else
-			ExplainPropertyListNested("Group Key", result, es);
+			

Re: [HACKERS] Strange result with LATERAL query

2016-08-24 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> Hm, I was just working on inserting something of the sort into
 Tom> ExecInitAgg.  But I guess we could do it in the planner too.  Will
 Tom> run with your approach.

 Tom> I think it's a bit too stupid as-is, though.  We don't need to
 Tom> recalculate for Params in aggdirectargs, do we?

In theory we would need to. But in practice we don't, because we don't
allow ordered aggs in AGG_HASHED mode anyway.

We could skip filling in aggParam at all if not in AGG_HASHED mode I
guess.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Strange result with LATERAL query

2016-08-24 Thread Andrew Gierth
> "Pavel" == Pavel Stehule  writes:

 Pavel> The result should not depend on GUC - hashagg on/off changing
 Pavel> output - it is error.

I don't think anyone's suggesting leaving it unfixed, just whether the
fix should introduce unnecessary rescans of the aggregate input.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Strange result with LATERAL query

2016-08-24 Thread Andrew Gierth
>>>>> "Andrew" == Andrew Gierth <and...@tao11.riddles.org.uk> writes:
>>>>> "Tom" == Tom Lane <t...@sss.pgh.pa.us> writes:

 Tom> I'm not sure if it's worth trying to distinguish whether the Param
 Tom> is inside any aggregate calls or not.

How about:

-- 
Andrew (irc:RhodiumToad)

diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 1ec2515..4a9fefb 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -3426,10 +3426,11 @@ ExecReScanAgg(AggState *node)
 
 		/*
 		 * If we do have the hash table and the subplan does not have any
-		 * parameter changes, then we can just rescan the existing hash table;
-		 * no need to build it again.
+		 * parameter changes, we might still need to rebuild the hash if any of
+		 * the parameter changes affect the input to aggregate functions.
 		 */
-		if (outerPlan->chgParam == NULL)
+		if (outerPlan->chgParam == NULL
+			&& !bms_overlap(node->ss.ps.chgParam, aggnode->aggParam))
 		{
 			ResetTupleHashIterator(node->hashtable, &node->hashiter);
 			return;
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index c7a0644..7b5eb86 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -871,6 +871,7 @@ _copyAgg(const Agg *from)
 	COPY_SCALAR_FIELD(aggstrategy);
 	COPY_SCALAR_FIELD(aggsplit);
 	COPY_SCALAR_FIELD(numCols);
+	COPY_BITMAPSET_FIELD(aggParam);
 	if (from->numCols > 0)
 	{
 		COPY_POINTER_FIELD(grpColIdx, from->numCols * sizeof(AttrNumber));
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 1fab807..5e48edd 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -706,6 +706,7 @@ _outAgg(StringInfo str, const Agg *node)
 	WRITE_ENUM_FIELD(aggstrategy, AggStrategy);
 	WRITE_ENUM_FIELD(aggsplit, AggSplit);
 	WRITE_INT_FIELD(numCols);
+	WRITE_BITMAPSET_FIELD(aggParam);
 
 	appendStringInfoString(str, " :grpColIdx");
 	for (i = 0; i < node->numCols; i++)
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index c83063e..67dcf17 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -2004,6 +2004,7 @@ _readAgg(void)
 	READ_ENUM_FIELD(aggstrategy, AggStrategy);
 	READ_ENUM_FIELD(aggsplit, AggSplit);
 	READ_INT_FIELD(numCols);
+	READ_BITMAPSET_FIELD(aggParam);
 	READ_ATTRNUMBER_ARRAY(grpColIdx, local_node->numCols);
 	READ_OID_ARRAY(grpOperators, local_node->numCols);
 	READ_LONG_FIELD(numGroups);
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index a46cc10..2e56484 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -82,6 +82,7 @@ static Bitmapset *finalize_plan(PlannerInfo *root,
 			  Bitmapset *valid_params,
 			  Bitmapset *scan_params);
 static bool finalize_primnode(Node *node, finalize_primnode_context *context);
+static bool finalize_agg_primnode(Node *node, finalize_primnode_context *context);
 
 
 /*
@@ -2659,8 +2660,30 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			  );
 			break;
 
-		case T_Hash:
 		case T_Agg:
+			{
+Agg		   *agg = (Agg *) plan;
+finalize_primnode_context aggcontext;
+
+/*
+ * We need to know which params are referenced in inputs to
+ * aggregate calls, so that we know whether we need to rebuild
+ * the hashtable in the AGG_HASHED case. Rescan the tlist and
+ * qual for this purpose.
+ */
+
+aggcontext = context;
+aggcontext.paramids = NULL;
+
+finalize_agg_primnode((Node *) agg->plan.targetlist, &aggcontext);
+finalize_agg_primnode((Node *) agg->plan.qual, &aggcontext);
+
+/* remember results for execution */
+agg->aggParam = aggcontext.paramids;
+			}
+			break;
+
+		case T_Hash:
 		case T_Material:
 		case T_Sort:
 		case T_Unique:
@@ -2812,6 +2835,24 @@ finalize_primnode(Node *node, finalize_primnode_context *context)
 }
 
 /*
+ * finalize_agg_primnode: add IDs of all PARAM_EXEC params appearing inside
+ * Aggref nodes in the given expression tree to the result set.
+ */
+static bool
+finalize_agg_primnode(Node *node, finalize_primnode_context *context)
+{
+	if (node == NULL)
+		return false;
+	if (IsA(node, Aggref))
+	{
+		finalize_primnode(node, context);
+		return false;			/* no more to do here */
+	}
+	return expression_tree_walker(node, finalize_agg_primnode,
+  (void *) context);
+}
+
+/*
  * SS_make_initplan_output_param - make a Param for an initPlan's output
  *
  * The plan is expected to return a scalar value of the given type/collation.
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 369179f..3d5e4df 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -712,6 +712,7 @@ typedef struct Agg
 	AggStrategy aggstrategy;	/* basic strategy, see nodes.h */
 	AggSplit	

Re: [HACKERS] Strange result with LATERAL query

2016-08-24 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> I'm not sure if it's worth trying to distinguish whether the Param
 Tom> is inside any aggregate calls or not.  The existing code gets the
 Tom> right answer for

 Tom> select array(select x+sum(y) from generate_series(1,3) y group by y) 
 Tom>   from generate_series(1,3) x;

 Tom> and we'd be losing some efficiency for cases like that if we fix
 Tom> it as above.  But is it worth the trouble?

The loss of efficiency could be significant, since it's forcing a rescan
of what could be an expensive plan.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Strange result with LATERAL query

2016-08-24 Thread Andrew Gierth
>>>>> "Andrew" == Andrew Gierth <and...@tao11.riddles.org.uk> writes:
>>>>> "Jeevan" == Jeevan Chalke <jeevan.cha...@enterprisedb.com> writes:

 Jeevan> Hi,
 Jeevan> While playing with LATERAL along with some aggregates in
 Jeevan> sub-query, I have observed somewhat unusual behavior.

 Andrew> Simpler example not needing LATERAL:

 Andrew> select array(select sum(x+y) from generate_series(1,3) y group by y)
 Andrew>   from generate_series(1,3) x;

Oh, and another way to verify that it's a bug and not expected behavior
is that it goes away with set enable_hashagg=false;

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Strange result with LATERAL query

2016-08-24 Thread Andrew Gierth
> "Jeevan" == Jeevan Chalke  writes:

 Jeevan> Hi,
 Jeevan> While playing with LATERAL along with some aggregates in
 Jeevan> sub-query, I have observed somewhat unusual behavior.

Simpler example not needing LATERAL:

select array(select sum(x+y) from generate_series(1,3) y group by y)
  from generate_series(1,3) x;
 ?column? 
--
 {2,4,3}
 {2,4,3}
 {2,4,3}
(3 rows)

This is broken all the way back, it's not a recent bug.
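
(For reference, the expected output, which you do get with
enable_hashagg off, is a different array for each outer row: {2,3,4},
{3,4,5}, {4,5,6}, give or take the order of elements within each
array.)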

Something is wrong with the way chgParam is being handled in Agg nodes.
The code in ExecReScanAgg seems to assume that if the lefttree doesn't
have any parameter changes then it suffices to re-project the data from
the existing hashtable; but of course this is nonsense if the parameter
is in an input to an aggregate function.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SP-GiST support for inet datatypes

2016-08-21 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 > Emre Hasegeli  writes:
 >> Attached patches add SP-GiST support to the inet datatypes.

 Tom> I started to look at this patch.  The reported speedup is pretty
 Tom> nice, but ...

The builtin gist support for inet seems quite surprisingly slow; ip4r
beats it into the ground without difficulty. (I'd be curious how well
this sp-gist version stacks up against ip4r.)

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] C++ port of Postgres

2016-08-17 Thread Andrew Gierth
> "Robert" == Robert Haas  writes:

 Robert> Hmm, so sizeof() has different semantics in C vs. C++?

No. '1' has different semantics in C vs C++. (In C, '1' is an int,
whereas in C++ it's a char. It so happens that (sizeof '1') is the only
case which is valid in both C and C++ where this makes a difference.)

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] EXLCUDE constraints and Hash indexes

2016-08-17 Thread Andrew Gierth
> "Jeff" == Jeff Janes  writes:

 Jeff> From: https://www.postgresql.org/docs/9.4/static/sql-createtable.html

 Jeff> "The access method must support amgettuple (see Chapter 55); at
 Jeff> present this means GIN cannot be used. Although it's allowed, there is
 Jeff> little point in using B-tree or hash indexes with an exclusion
 Jeff> constraint, because this does nothing that an ordinary unique
 Jeff> constraint doesn't do better. So in practice the access method will
 Jeff> always be GiST or SP-GiST."

I also recently found a case where using btree exclusion constraints was
useful: a unique index on an expression can't be marked deferrable, but
the equivalent exclusion constraint can be.
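
A minimal sketch of the kind of thing I mean (table and constraint
names made up):

  create table t (s text);

  -- a unique index on lower(s) couldn't be marked deferrable, but:
  alter table t
    add constraint t_lower_s_excl
    exclude using btree (lower(s) with =)
    deferrable initially deferred;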

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No longer possible to query catalogs for index capabilities?

2016-08-13 Thread Andrew Gierth
Latest patch.

Names and scopes are as per discussion. New files for code and
regression test. Docs included.

-- 
Andrew (irc:RhodiumToad)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7830334..4552a74 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -16290,6 +16290,18 @@ SELECT pg_type_is_visible('myschema.widget'::regtype);

 

+pg_indexam_has_property
+   
+
+   
+pg_index_column_has_property
+   
+
+   
+pg_index_has_property
+   
+
+   
 pg_options_to_table

 
@@ -16477,6 +16489,21 @@ SELECT pg_type_is_visible('myschema.widget'::regtype);
   number of columns, pretty-printing is implied
   
   
+   pg_indexam_has_property(am_oid, prop_name)
+   boolean
+   Test whether an index access method has a specified property
+  
+  
+   pg_index_column_has_property(index_oid, column_no, prop_name)
+   boolean
+   Test whether an index column has a specified property
+  
+  
+   pg_index_has_property(index_oid, prop_name)
+   boolean
+   Test whether the access method for the specified index has a specified property
+  
+  
pg_options_to_table(reloptions)
setof record
get the set of storage option name/value pairs
@@ -16620,6 +16647,141 @@ SELECT pg_type_is_visible('myschema.widget'::regtype);
   
 
   
+   pg_indexam_has_property,
+   pg_index_has_property, and
+   pg_index_column_has_property return whether the
+   specified access method, index, or index column possesses the named
+   property. NULL is returned if the property name is not
+   known; true if the property is present,
+   false if it is not. Refer to
+for column properties,
+for index properties, and
+for access method properties.
+  
+
+  
+   Index Column Properties
+   
+
+ NameDescription
+
+
+ 
+  asc
+  Does the column sort in ascending order on a forward scan?
+  
+ 
+ 
+  desc
+  Does the column sort in descending order on a forward scan?
+  
+ 
+ 
+  nulls_first
+  Does the column sort with nulls first on a forward scan?
+  
+ 
+ 
+  nulls_last
+  Does the column sort with nulls last on a forward scan?
+  
+ 
+ 
+  orderable
+  Does the column possess any ordering properties such
+  as ASC or DESC
+ 
+ 
+  distance_orderable
+  Can the column be returned in order by a "distance" operator,
+  for example ORDER BY col <-> constant
+ 
+ 
+  returnable
+  Can the column value be returned in an index-only scan?
+  
+ 
+ 
+  search_array
+  Does the column support array queries with ANY
+  natively in the index AM?
+ 
+ 
+  search_nulls
+  Does the column support IS NULL or
+  IS NOT NULL conditions in the index?
+  
+ 
+
+   
+  
+
+  
+   Index Properties
+   
+
+ NameDescription
+
+
+ 
+  clusterable
+  Can this index be used in a CLUSTER operation?
+  
+ 
+ 
+  backward_scan
+  Can this index be scanned in the reverse direction?
+  
+ 
+ 
+  index_scan
+  Does this index support plain (non-bitmap) scans?
+  
+ 
+ 
+  bitmap_scan
+  Does this index support bitmap scans?
+  
+ 
+
+   
+  
+
+  
+   Index Access Method Properties
+   
+
+ NameDescription
+
+
+ 
+  can_order
+  Does this access method support ASC,
+  DESC and related keywords on columns in
+  CREATE INDEX?
+  
+ 
+ 
+  can_unique
+  Does this access method support
+  CREATE UNIQUE INDEX?
+  
+ 
+ 
+  can_multi_col
+  Does this access method support multiple columns?
+  
+ 
+ 
+  can_exclude
+  Does this access method support exclusion constraints?
+  
+ 
+
+   
+  
+
+  
pg_options_to_table returns the set of storage
option name/value pairs
(option_name/option_value) when passed
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index e8034b9..fece954 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -16,11 +16,14 @@
 
 #include "access/gist_private.h"
 #include "access/gistscan.h"
+#include "access/htup_details.h"
 #include "catalog/pg_collation.h"
+#include "catalog/pg_opclass.h"
 #include "miscadmin.h"
 #include "utils/index_selfuncs.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
+#include "utils/syscache.h"
 
 
 /* non-export function prototypes */
@@ -87,6 +90,7 @@ gisthandler(PG_FUNCTION_ARGS)
 	amroutine->amendscan = gistendscan;
 	amroutine->ammarkpos = NULL;
 	amroutine->amrestrpos = NULL;
+	amroutine->amproperty = gistproperty;
 
 	PG_RETURN_POINTER(amroutine);
 }
@@ -1490,6 +1494,98 @@ freeGISTstate(GISTSTATE *giststate)
 	MemoryContextDelete(giststate->scanCxt);
 }
 
+
+/*
+ *	gistproperty() 

Re: [HACKERS] Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)

2016-08-12 Thread Andrew Gierth
> "Greg" == Greg Stark  writes:

 >> No, because as the pages split, they fill more slowly (because there
 >> are now more pages). So on average in a large randomly filled index,
 >> pages spend more time nearer 50% full than 100% full. This is easy
 >> to demonstrate by creating a table with an indexed float8 column and
 >> adding batches of random() values to it, checking with pgstatindex
 >> at intervals - the average leaf density will rarely exceed 70%.
 >> 
 >> However, worst case conditions can give lower leaf densities;
 >> obviously the worst case is if the data is loaded in an order and
 >> quantity that just happens to leave every leaf page recently split.

 Greg> btree pages don't split 50/50 either. They split biased to assume
 Greg> the greater side of the split will receive more inserts -- iirc
 Greg> 70/30.

Hmm? The code in _bt_findsplitloc and _bt_checksplitloc doesn't seem to
agree with this.

(Inserting on the high leaf page is a special case, which is where the
fillfactor logic kicks in; that's why sequentially filled indexes are
(by default) 90% full rather than 100%. But other pages split into
roughly equal halves.)

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No longer possible to query catalogs for index capabilities?

2016-08-12 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> But we need to be clear in the documentation about what this
 Tom> property actually means.  My objection to having it answer at the
 Tom> index or column level is basically that that encourages confusion
 Tom> as to what it means.

OK. Here's new output with the scope changes, and some of the names
changed in an attempt to make them clearer:

cap | AM | Index | Column 
++---+
 asc||   | t
 desc   ||   | f
 nulls_first||   | f
 nulls_last ||   | t
 orderable  ||   | t
 distance_orderable ||   | f
 returnable ||   | t
 search_array   ||   | t
 search_nulls   ||   | t
 clusterable|| t | 
 backward_scan  || t | 
 index_scan || t | 
 bitmap_scan|| t | 
 can_order  | t  |   | 
 can_unique | t  |   | 
 can_multi_col  | t  |   | 
 can_exclude| t  |   | 
(17 rows)

(The can_* names are reserved for the AM level where we're asking about
the abstract capabilities of the AM.)

Better? Worse? Any more improvements to the names?

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No longer possible to query catalogs for index capabilities?

2016-08-12 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> This table shows what properties are exposed at the AM-wide level,
 >> the per-index level and the per-column level.

 Tom> +1 mostly, but I'm a bit bemused by can_order and can_backward
 Tom> having different scopes --- how come?

That's where they were in the previous list, a couple of messages up in
the thread...

Currently can_backward is always the same for all indexes in the same
AM, but I guess there's no guarantee that won't change. In the previous
message you suggested it might have to be per-column, even, but I think
that makes no sense (either the whole index is scannable in both
directions or it is not, it can't be different per column).

 Tom> Also, not sure about allowing things like can_multi_col as index
 Tom> or column properties.  That seems a bit silly: whether or not the
 Tom> AM can do multi columns, a specific index is what it is.  I'd be a
 Tom> bit inclined to have those return null except at the AM level.

I thought it would be cleaner to be able to query all properties at the
most specific level.

 Tom> In particular, I'm not sure what you intend to mean by applying
 Tom> can_unique at the index or column level.  Is that supposed to mean
 Tom> that the index or column *is* unique?

No. (We could add properties like is_unique, is_exclusion which clients
currently query in pg_index; should we?)

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No longer possible to query catalogs for index capabilities?

2016-08-12 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> distance_orderable now returns true/false depending on the opclass,
 >> not just on the amcanorderbyop field. In order to do this, I've
 >> added an optional amproperty function to the AM api, which if it
 >> exists, gets first dibs on all property calls so it can override the
 >> result as it sees fit.

 Tom> Hmm, seems like for that case, it'd be easier to look into pg_amop
 Tom> and see if the opclass has any suitably-marked operators.

I thought about that, but it seemed like it could get painful. The
planner is working forwards from a known operator and matching it
against the index column, whereas we'd have to work backwards from the
opfamily, and there's no good existing index for this; in the presence
of binary-compatible types, I don't think even amoplefttype can be
assumed (e.g. a varchar column can be ordered by pg_trgm's <-> operator
which is declared for text). So it'd have to be the equivalent of:

  get index column's opclass oid
  look it up in pg_opclass to get opfamily
  for r in select * from pg_amop where amopfamily=? and amoppurpose='o'
if r.amoplefttype is binary-coercible from the index column's type
  then return true

As opposed to what I have now, which is:

  get index column's opclass oid
  look it up in pg_opclass to get opfamily/opcintype
  result = SearchSysCacheExists4(AMPROCNUM, ...)  

(in theory this could produce a false positive if there's a distance
function but no actual operators to reference it, but I think that's the
opclass author's issue)

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)

2016-08-12 Thread Andrew Gierth
> "Jeff" == Jeff Janes  writes:

 Jeff> But shouldn't that still leave us with a 75% full index, rather
 Jeff> than slightly over 50% full?

Average is usually about 67%-70%. (For capacity estimation I always
assume 66% for a non-sequentially-filled btree.)

 Jeff> The leaf pages start at 50%, grow to 100%, then split back to
 Jeff> 50%, then grow back to 100%.  So the average should be about 75%.

No, because as the pages split, they fill more slowly (because there are
now more pages). So on average in a large randomly filled index, pages
spend more time nearer 50% full than 100% full. This is easy to
demonstrate by creating a table with an indexed float8 column and adding
batches of random() values to it, checking with pgstatindex at intervals -
the average leaf density will rarely exceed 70%.
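
Something like this shows it (a sketch; needs the pgstattuple
extension, table and index names made up):

  create extension pgstattuple;
  create table t (x float8);
  create index t_x_idx on t (x);
  -- repeat the insert in batches, checking the density after each one:
  insert into t select random() from generate_series(1, 100000);
  select avg_leaf_density from pgstatindex('t_x_idx');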

However, worst case conditions can give lower leaf densities; obviously
the worst case is if the data is loaded in an order and quantity that
just happens to leave every leaf page recently split.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No longer possible to query catalogs for index capabilities?

2016-08-12 Thread Andrew Gierth
So I'm tidying up and doing docs for the next version of this patch, but
here for comment is the current functionality:

select cap,
   pg_indexam_has_property(a.oid, cap) as "AM",
   pg_index_has_property('onek_hundred'::regclass, cap) as "Index",
   pg_index_column_has_property('onek_hundred'::regclass, 1, cap) as 
"Column"
  from pg_am a,
   unnest(array['asc', 'desc', 'nulls_first', 'nulls_last',
'orderable', 'distance_orderable', 'can_order',
'can_unique', 'can_multi_col', 'can_exclude',
'can_backward', 'can_cluster', 'index_scan',
'bitmap_scan', 'can_return', 'search_array',
'search_nulls'])
 with ordinality as u(cap,ord)
 where a.amname='btree'
 order by ord;
cap | AM | Index | Column 
++---+
 asc||   | t
 desc   ||   | f
 nulls_first||   | f
 nulls_last ||   | t
 orderable  ||   | t
 distance_orderable ||   | f
 can_order  | t  | t | t
 can_unique | t  | t | t
 can_multi_col  | t  | t | t
 can_exclude| t  | t | t
 can_backward   || t | t
 can_cluster|| t | t
 index_scan || t | t
 bitmap_scan|| t | t
 can_return ||   | t
 search_array   ||   | t
 search_nulls   ||   | t
(17 rows)

This table shows what properties are exposed at the AM-wide level, the
per-index level and the per-column level.

distance_orderable now returns true/false depending on the opclass, not
just on the amcanorderbyop field. In order to do this, I've added an
optional amproperty function to the AM api, which if it exists, gets
first dibs on all property calls so it can override the result as it
sees fit.
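
So, for example (a sketch with made-up names, assuming a point column
pt), a GiST index column using the default point_ops now reports true,
where a btree column would report false:

  create index t_pt_gist on t using gist (pt);
  select pg_index_column_has_property('t_pt_gist'::regclass, 1,
                                      'distance_orderable');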

can_return likewise reflects the result of index_can_return.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Why is box <-> point missing, and box <-> box not indexable?

2016-08-12 Thread Andrew Gierth
point <-> point, circle <-> point and polygon <-> point all exist as
orderable-by-operator operators (in fact they are the only ones by
default). But there's no box <-> point operator at all, and no index
support for box <-> box.
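
That is, unless I'm misreading the catalogs, this fails outright:

  select box '((0,0),(1,1))' <-> point '(2,2)';
  -- ERROR:  operator does not exist: box <-> point

while box <-> box parses but can't use an index.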

Was this intentional, or just a strange oversight?

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No longer possible to query catalogs for index capabilities?

2016-08-10 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> - this still has everything in amapi.c rather than creating any new
 >> files. Also, the regression tests are in create_index.sql for lack
 >> of any obviously better place.

 Tom> This more than doubles the size of amapi.c, so it has a definite
 Tom> feel of tail-wags-dog for me, even if that seemed like an
 Tom> appropriate place otherwise which it doesn't really.  I think a
 Tom> new .c file under adt/ is the way to go, with extern declarations
 Tom> in builtins.h.

Yeah, I'm fine with that.  adt/amutils.c unless I see some better
suggestion.

 Tom> Maybe we need a new regression test case file too.  I don't much
 Tom> like adding this to create_index because (a) it doesn't
 Tom> particularly seem to match that file's purpose of setting up
 Tom> indexes on the standard regression test tables, and (b) that file
 Tom> is a bottleneck in parallel regression runs because it can't run
 Tom> in parallel with much else.

Good point. I looked around to see if anything was testing
pg_get_indexdef, thinking that that would be a good place, but it seems
that pg_get_indexdef is tested only in incidental ways (in collate and
rules, the latter of which tests it only with invalid input).

I'll do the next version with a new file, unless a better idea shows up.

 >> Comments?

 Tom> Why did you go with "capability" rather than "property" in the
 Tom> exposed function names?  The latter seems much closer to le mot
 Tom> juste, especially for things like asc/desc.

The first version (which dealt only with AMs) went with "capability"
because it was dealing with what the AM _could_ do rather than what was
defined on any specific index.

The second version added pg_index_column_has_property because that was
the name that had been used in discussion.

Changing them all to "property" would be more consistent I suppose.

 Tom> I'd personally cut the list of pg_am replacement properties way
 Tom> back, as I believe much of what you've got there is not actually
 Tom> of use to applications, and some of it is outright
 Tom> counterproductive.  An example is exposing amcanreturn as an
 Tom> index-AM boolean.

For AM-wide properties, it may be that they have to be considered
"lossy" when tested against the AM oid rather than on an individual
index or column - at the AM level, "false" might mean "this won't work"
while "true" would mean "this might work sometimes, not guaranteed to
work on every index".  The documentation should probably indicate this.

So these properties (I've changed all the names here, suggestions
welcome for better ones) I think should be testable on the AM, each with
an example of why:

  can_order
- if this is false, an admin tool shouldn't try and put ASC or DESC
  in a CREATE INDEX

  can_unique
- if this is false, an admin tool might, for example, want to not
  offer the user the option of CREATE UNIQUE INDEX with this AM

  can_multi_col
- if this is false, an admin tool might want to allow the user to
  select only one column

  can_exclude 
- a new property that indicates whether the AM can be used for
  exclusion constraints; at present this matches "amgettuple" but
  that implementation detail should of course be hidden

(One possible refinement here could be to invert the sense of all of
these, making them no_whatever, so that "false" and "null" could be
treated the same by clients. Or would that be too confusing?)

These could be limited to being testable only on a specified index, and
not AM-wide:

  can_backward
  clusterable
  index_scan
  bitmap_scan
  optional_key (? maybe)
  predicate_locks (? maybe)

And these for individual columns:

  can_return
  search_array  (? maybe)
  search_nulls  (? maybe)
  operator_orderable  (or distance_orderable? what's a good name?)
  orderable
  asc
  desc
  nulls_first
  nulls_last

A question: implementing can_return as a per-column property looks like
it requires actually opening the index rel, rather than just consulting
the syscache the way that most pg_get_* functions do. Should it always
open it, or only for properties that need it?

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No longer possible to query catalogs for index capabilities?

2016-08-10 Thread Andrew Gierth
Updated patch. Changes:

  - returns NULL rather than "cache lookup failed"

  - added pg_index_column_has_property (incl. docs)

  - added regression tests

Not changed / need consideration:

  - this still has everything in amapi.c rather than creating any new
files. Also, the regression tests are in create_index.sql for lack
of any obviously better place.

The list of column properties is:

  ordered  - (same as "amcanorder" AM capability)
  ordered_asc
  ordered_desc
  ordered_nulls_first
  ordered_nulls_last

If "ordered" is true then exactly one of _asc/_desc and exactly one of
_nulls_first/_last will be true; if "ordered" is false then all the
others will be false too. The intended usage is something like

  CASE WHEN pg_index_column_has_property(idx, attno, 'ordered_asc')
   THEN 'ASC'
   WHEN pg_index_column_has_property(idx, attno, 'ordered_desc')
   THEN 'DESC'
   ELSE ''  -- or NULL
  END

Comments?

-- 
Andrew (irc:RhodiumToad)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index ccb9b97..684f7b3 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -577,6 +577,89 @@

   
 
+  
+Capability information formerly stored in pg_am
+is now available via the functions
+pg_indexam_has_capability and
+pg_index_has_capability
+(see ). The following
+boolean-valued capability names are currently supported:
+  
+
+  
+   Capabilities
+
+   
+
+ 
+  Name
+  Description
+ 
+
+
+ 
+  amcanorder
+  Does the access method support ordered scans sorted by the
+   indexed column's value?
+ 
+ 
+  amcanorderbyop
+  Does the access method support ordered scans sorted by the result
+   of an operator on the indexed column?
+ 
+ 
+  amcanbackward
+  Does the access method support backward scanning?
+ 
+ 
+  amcanunique
+  Does the access method support unique indexes?
+ 
+ 
+  amcanmulticol
+  Does the access method support multicolumn indexes?
+ 
+ 
+  amoptionalkey
+  Does the access method support a scan without any constraint
+   for the first index column?
+ 
+ 
+  amsearcharray
+  Does the access method support ScalarArrayOpExpr searches?
+ 
+ 
+  amsearchnulls
+  Does the access method support IS NULL/NOT NULL searches?
+ 
+ 
+  amstorage
+  Can index storage data type differ from column data type?
+ 
+ 
+  amclusterable
+  Can an index of this type be clustered on?
+ 
+ 
+  ampredlocks
+  Does an index of this type manage fine-grained predicate locks?
+ 
+ 
+  amgettuple
+  Does the access method provide an amgettuple function?
+ 
+ 
+  amgetbitmap
+  Does the access method provide an amgetbitmap function?
+ 
+ 
+  amcanreturn
+  Does the access method support index-only scans?
+ 
+
+   
+  
+
  
 
 
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7830334..d2fe506 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -16290,6 +16290,18 @@ SELECT pg_type_is_visible('myschema.widget'::regtype);

 

+pg_indexam_has_capability
+   
+
+   
+pg_index_column_has_property
+   
+
+   
+pg_index_has_capability
+   
+
+   
 pg_options_to_table

 
@@ -16477,6 +16489,21 @@ SELECT pg_type_is_visible('myschema.widget'::regtype);
   number of columns, pretty-printing is implied
   
   
+   pg_indexam_has_capability(am_oid, cap_name)
+   boolean
+   Test whether an index access method has a specified capability
+  
+  
+   pg_index_column_has_property(index_oid, column_no, prop_name)
+   boolean
+   Test whether an index column has a specified property
+  
+  
+   pg_index_has_capability(index_oid, cap_name)
+   boolean
+   Test whether the access method for the specified index has a specified capability
+  
+  
pg_options_to_table(reloptions)
setof record
get the set of storage option name/value pairs
@@ -16620,6 +16647,73 @@ SELECT pg_type_is_visible('myschema.widget'::regtype);
   
 
   
+   pg_indexam_has_capability and
+   pg_index_has_capability return whether the specified
+   access method, or the access method for the specified index, advertises the
+   named capability. NULL is returned if the capability
+   name is not known; true if the capability is advertised,
+   false if it is not. Refer
+   to  for capability names and their meanings.
+  
+
+  
+   pg_index_column_has_property returns whether the
+   specified index column possesses the named property.
+   NULL is returned if the property name is not
+   known; true if the property is present,
+   false if it is not. Index column property names and the
+   matching clauses of CREATE INDEX are given in
+   .
+  
+
+  
+   Index Column 

Re: [HACKERS] No longer possible to query catalogs for index capabilities?

2016-08-09 Thread Andrew Gierth
> "Alvaro" == Alvaro Herrera  writes:

 >> One idea is utils/adt/misc.c.  Or we could make a new file under
 >> utils/adt/ though I'm not very sure what to name it.  amaccess.c?
 >> catutils.c?  If there's only ever likely to be one or two functions
 >> of this ilk, maybe a new file is overkill and we should just use
 >> misc.c.

 Alvaro> I like the idea of a new file; I have a hunch that it will
 Alvaro> grow, given that we're expanding in this area, and perhaps we
 Alvaro> can find some existing stuff to relocate there in the future.
 Alvaro> I don't think a small file is a problem, anyway.

 Alvaro> How about amfuncs.c?  Maybe it can live in catalog/ instead of
 Alvaro> utils/adt?

Well, the existing patch used access/index/amapi.c for the AM capability
functions. There may be some merit in keeping everything together - I
asked because it didn't seem at first glance that the index column
property function belonged there, but on second thought there's some
overlap in that in future, if indoptions ever acquires any AM-specific
flags, it may be necessary for pg_index_column_has_property to call into
an AM-specific function.

So, here are some options:

1. Put everything in access/index/amapi.c

2. Move either all of access/index/amapi.c, or just the SQL-callable
   part of it (amvalidate), to utils/adt/amfuncs.c and put new stuff in
   there

3. put pg_index[am]_has_capability in access/index/amapi.c and
   pg_index_column_has_property in utils/adt/misc.c

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No longer possible to query catalogs for index capabilities?

2016-08-09 Thread Andrew Gierth
> "Kevin" == Kevin Grittner  writes:

 >>> Building on the has-property approach Andrew suggested, I wonder if
 >>> we need something like pg_index_column_has_property(indexoid, colno,
 >>> propertyname) with properties like "sortable", "desc", "nulls first".
 >> 
 >> This seems simple enough, on the surface.  Why not run with this design?

 Kevin> Andrew's patch, plus this, covers everything I can think of.

Where'd be a good place to put that function? ruleutils? catalog/index.c ?

(ruleutils is way too big already)

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Oddity with NOT IN

2016-08-06 Thread Andrew Gierth
>>>>> "Andrew" == Andrew Gierth <and...@tao11.riddles.org.uk> writes:

 Andrew> The easy to catch case, I think, is when the targetlist of the
 Andrew> IN or NOT IN subquery contains vars of the outer query level
 Andrew> but no vars of the inner one and no volatile functions. This
 Andrew> can be checked for with a handful of lines in the parser or a
 Andrew> couple of dozen lines in a plugin module (though one would have
 Andrew> to invent an error code, none of the existing WARNING sqlstates
 Andrew> would do).

Actually thinking about this, if you did it in a module, you'd probably
want to make it an ERROR not a WARNING, because you'd want to actually
stop queries like

delete from t1 where x in (select x from table_with_no_column_x);

rather than let them run.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Oddity with NOT IN

2016-08-06 Thread Andrew Gierth
> "Pavel" == Pavel Stehule  writes:

 >> Well now I feel dumb...
 >> 
 >> It would be very useful if we had some way to warn users about stuff
 >> like this. Emitting a NOTICE comes to mind.

 Pavel> This can be valid query

It can be, but it essentially never is. The cases where you genuinely
want a correlated IN query are rare, but even then there would be
something in the targetlist that referenced the inner query.

The easy to catch case, I think, is when the targetlist of the IN or NOT
IN subquery contains vars of the outer query level but no vars of the
inner one and no volatile functions. This can be checked for with a
handful of lines in the parser or a couple of dozen lines in a plugin
module (though one would have to invent an error code, none of the
existing WARNING sqlstates would do).
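
For instance (hypothetical tables), if t2 has no column x but t1 does,
then in

  select * from t1 where x not in (select x from t2);

the subquery targetlist consists solely of the outer t1.x, so it would
be flagged; something like "not in (select t2.y + x from t2)" would not
be, since its targetlist also references the inner relation.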

Maybe David Fetter's suggested module for catching missing WHERE clauses
could be expanded into a more general SQL-'Lint' module?

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No longer possible to query catalogs for index capabilities?

2016-08-06 Thread Andrew Gierth
> "Bruce" == Bruce Momjian  writes:

 Bruce> Would it be helpful to output an array of strings representing
 Bruce> the index definition?

 >> Why would that help, if the point is to enable programmatic access
 >> to information?

 Bruce> I was thinking an array of strings would avoid problems in
 Bruce> having to re-scan the output for tokens.

OK, but that still leaves the issue of how to interpret each string,
yes?

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Crash in GetOldestSnapshot()

2016-08-06 Thread Andrew Gierth
> "Amit" == Amit Kapila  writes:

 Amit> Sure, that is the reason of crash, but even if we do that it will
 Amit> lead to an error "no known snapshots".  Here, what is going on is
 Amit> that we initialized toast snapshot when there is no active
 Amit> snapshot in the backend, so GetOldestSnapshot() won't return any
 Amit> snapshot.

Hmm.

So this happens because RETURNING queries run to completion immediately
and populate a tuplestore with the results, and the portal then fetches
from the tuplestore to send to the destination. The assumption is that
the tuplestore output can be processed without needing a snapshot, which
obviously is not true now if it contains toasted data.

In a similar case in the past involving holdable cursors, the solution
was to detoast _before_ storing in the tuplestore (see
PersistHoldablePortal). I guess the question now is, under what
circumstances is it now allowable to detoast a datum with no active
snapshot? (Wouldn't it be necessary in such a case to know what the
oldest snapshot ever used in the transaction was?)

 Amit> I think for such situations, we need to initialize the lsn and
 Amit> whenTaken of ToastSnapshot as we do in GetSnapshotData() [1].

Would that not give a too-recent LSN, resulting in possibly failing to
fetch the toast rows?

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Crash in GetOldestSnapshot()

2016-08-06 Thread Andrew Gierth
> "Andreas" == Andreas Seltenreich  writes:

418   if (OldestActiveSnapshot != NULL)
419   ActiveLSN = OldestActiveSnapshot->as_snap->lsn;
420
421   if (XLogRecPtrIsInvalid(RegisteredLSN) || RegisteredLSN > ActiveLSN)
422   return OldestActiveSnapshot->as_snap;

This second conditional should clearly be inside the first one...

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No longer possible to query catalogs for index capabilities?

2016-08-06 Thread Andrew Gierth
> "Bruce" == Bruce Momjian  writes:

 >> As far as I understood Andrew's use case, he was specifically *not*
 >> interested in a complete representation of an index definition, but
 >> rather about whether it had certain properties that would be of
 >> interest to query-constructing applications.

Well, I wouldn't limit it to query-constructing applications.

I'll give another random example that I thought of. Suppose an
administrative GUI (I have no idea if any of the existing GUIs do this)
has an option to do CLUSTER on a table; how should it know which indexes
to offer the user to cluster on, without access to amclusterable?
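
(Pre-9.6 that was one trivial catalog query; roughly this, with a
made-up table name:

  select c.relname
    from pg_index i
    join pg_class c on c.oid = i.indexrelid
    join pg_am am on am.oid = c.relam
   where i.indrelid = 'some_table'::regclass
     and am.amclusterable;

and right now there's no equivalent on 9.6.)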

 Bruce> Would it be helpful to output an array of strings representing
 Bruce> the index definition?

Why would that help, if the point is to enable programmatic access to
information?

Anyway, what I haven't seen in this thread is any implementable
counter-proposal other than the "just hardcode the name 'btree'"
response that was given in the JDBC thread, which I don't consider
acceptable in any sense. Is 9.6 going to go out like this or is action
going to be taken before rc1?

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bogus ANALYZE results for an otherwise-unique column with many nulls

2016-08-05 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> Also, the way that the value is calculated in the
 Tom> samples-not-all-distinct case corresponds to the way I have it in
 Tom> the patch.

Ahh, gotcha.  You're referring to this:

/*
 * If we estimated the number of distinct values at more than 10% of
 * the total row count (a very arbitrary limit), then assume that
 * stadistinct should scale with the row count rather than be a fixed
 * value.
 */
if (stats->stadistinct > 0.1 * totalrows)
stats->stadistinct = -(stats->stadistinct / totalrows);

where "totalrows" includes nulls obviously. So this expects negative
stadistinct to be scaled by the total table size, and the all-distinct
case should do the same.

Objection withdrawn.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bogus ANALYZE results for an otherwise-unique column with many nulls

2016-08-05 Thread Andrew Gierth
>>>>> "Andrew" == Andrew Gierth <and...@tao11.riddles.org.uk> writes:
>>>>> "Tom" == Tom Lane <t...@sss.pgh.pa.us> writes:

 Tom> What I did in the patch is to scale the formerly fixed "-1.0"
 Tom> stadistinct estimate to discount the fraction of nulls we found.

 Andrew> This seems quite dubious to me. stadistinct representing only
 Andrew> the non-null values seems to me to be substantially more useful
 Andrew> and less confusing; it should be up to consumers to take
 Andrew> stanullfrac into account (in general they already do) since in
 Andrew> many cases we explicitly do _not_ want to count nulls.

Hm. I am wrong about this, since it's the fact that consumers are taking
stanullfrac into account that makes the value wrong in the first place.
For example, if a million-row table has stanullfrac=0.9 and
stadistinct=-1, then get_variable_numdistinct is returning 1 million,
and (for example) var_eq_non_const divides 0.1 by that to give a
selectivity of 1 in 10 million, which is obviously wrong.
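
(The right figure there would be 0.1 divided by the roughly 100,000
distinct non-null values, i.e. 1e-6 rather than 1e-7; it's off by a
factor of 1/(1-stanullfrac), ten in this example.)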

But I think the fix is still wrong, because it changes the meaning of
ALTER TABLE ... ALTER col SET (n_distinct=...)  in a non-useful way; it
is no longer possible to nail down a useful negative n_distinct value if
the null fraction of the column is variable. Would it not make more
sense to do the adjustment in get_variable_numdistinct, instead?

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bogus ANALYZE results for an otherwise-unique column with many nulls

2016-08-05 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> What I did in the patch is to scale the formerly fixed "-1.0"
 Tom> stadistinct estimate to discount the fraction of nulls we found.

This seems quite dubious to me. stadistinct representing only the
non-null values seems to me to be substantially more useful and less
confusing; it should be up to consumers to take stanullfrac into account
(in general they already do) since in many cases we explicitly do _not_
want to count nulls.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Wanting to learn about pgsql design decision

2016-08-02 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> - Why to read from a table, both a usage permission on the schema
 >> and a read access permission on the table is needed?

 Tom> Because the SQL standard says so.

You'd think, but in fact it doesn't; the spec (at least 2008 and the
2011 drafts) has no concept of grantable permissions on schemas, and
ties table ownership and schema ownership together.

(See the definition of  to see that there's nothing there
for schemas, and the definition of  for the fact that
it's the schema owner who also owns the table and gets the initial
grants on it, and  and  to
confirm that only the schema owner can alter or drop the table. The
access rules for  only require permission on a table
column, no mention of schemas.)

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] copyParamList

2016-07-26 Thread Andrew Gierth
> "Robert" == Robert Haas  writes:

 Robert> So I think we instead ought to fix it as in the attached.

 Robert>if (retval->paramMask != NULL &&
 Robert> -  !bms_is_member(i, retval->paramMask))
 Robert> +  !bms_is_member(i, from->paramMask))

Need to change both references to retval->paramMask, not just one.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No longer possible to query catalogs for index capabilities?

2016-07-26 Thread Andrew Gierth
And a doc patch to go with it:

-- 
Andrew (irc:RhodiumToad)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 0689cc9..3e13e38 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -577,6 +577,89 @@

   
 
+  
+Capability information formerly stored in pg_am
+is now available via the functions
+pg_indexam_has_capability and
+pg_index_has_capability
+(see ). The following
+boolean-valued capability names are currently supported:
+  
+
+  
+   Capabilities
+
+   
+
+ 
+  Name
+  Description
+ 
+
+
+ 
+  amcanorder
+  Does the access method support ordered scans sorted by the
+   indexed column's value?
+ 
+ 
+  amcanorderbyop
+  Does the access method support ordered scans sorted by the result
+   of an operator on the indexed column?
+ 
+ 
+  amcanbackward
+  Does the access method support backward scanning?
+ 
+ 
+  amcanunique
+  Does the access method support unique indexes?
+ 
+ 
+  amcanmulticol
+  Does the access method support multicolumn indexes?
+ 
+ 
+  amoptionalkey
+  Does the access method support a scan without any constraint
+   for the first index column?
+ 
+ 
+  amsearcharray
+  Does the access method support ScalarArrayOpExpr searches?
+ 
+ 
+  amsearchnulls
+  Does the access method support IS NULL/NOT NULL searches?
+ 
+ 
+  amstorage
+  Can index storage data type differ from column data type?
+ 
+ 
+  amclusterable
+  Can an index of this type be clustered on?
+ 
+ 
+  ampredlocks
+  Does an index of this type manage fine-grained predicate locks?
+ 
+ 
+  amgettuple
+  Does the access method provide an amgettuple function?
+ 
+ 
+  amgetbitmap
+  Does the access method provide an amgetbitmap function?
+ 
+ 
+  amcanreturn
+  Does the access method support index-only scans?
+ 
+
+   
+  
+
  
 
 
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index baef80d..fd6b983 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -16193,6 +16193,14 @@ SELECT pg_type_is_visible('myschema.widget'::regtype);

 

+pg_indexam_has_capability
+   
+
+   
+pg_index_has_capability
+   
+
+   
 pg_options_to_table

 
@@ -16380,6 +16388,16 @@ SELECT pg_type_is_visible('myschema.widget'::regtype);
   number of columns, pretty-printing is implied
   
   
+   pg_indexam_has_capability(am_oid, cap_name)
+   boolean
+   Test whether an index access method has a specified capability
+  
+  
+   pg_index_has_capability(index_oid, cap_name)
+   boolean
+   Test whether the access method for the specified index has a specified capability
+  
+  
pg_options_to_table(reloptions)
setof record
get the set of storage option name/value pairs
@@ -16523,6 +16541,16 @@ SELECT pg_type_is_visible('myschema.widget'::regtype);
   
 
   
+   pg_indexam_has_capability and
+   pg_index_has_capability return whether the specified
+   access method, or the access method for the specified index, advertises the
+   named capability. NULL is returned if the capability
+   name is not known; true if the capability is advertised,
+   false if it is not. Refer
+   to  for capability names and their meanings.
+  
+
+  
pg_options_to_table returns the set of storage
option name/value pairs
(option_name/option_value) when passed

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No longer possible to query catalogs for index capabilities?

2016-07-25 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> Andrew still hasn't shown a concrete example of what he needs to
 Tom> do and why.

The issue I ran into was the exact same one as in the JDBC thread I
linked to earlier: correctly interpreting pg_index.indoption (to get the
ASC / DESC and NULLS FIRST/LAST settings), which requires knowing
whether amcanorder is true to determine whether to look at the bits at
all. The guy I was helping was using an earlier pg version, so it didn't
affect him (yet); I hit it when trying to test the query on 9.6.
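
For reference, the sort of query involved looks roughly like this (a
sketch, with a made-up index name):

  select a.attname,
         case when am.amcanorder
              then case when (i.indoption[a.attnum - 1] & 1) <> 0
                        then 'DESC' else 'ASC' end
         end as direction
    from pg_index i
    join pg_class c on c.oid = i.indexrelid
    join pg_am am on am.oid = c.relam
    join pg_attribute a on a.attrelid = i.indexrelid
   where i.indexrelid = 'some_index'::regclass;

which works on 9.5 and earlier but fails on 9.6 because
pg_am.amcanorder no longer exists.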

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No longer possible to query catalogs for index capabilities?

2016-07-25 Thread Andrew Gierth
Here is my proposed code (first cut; obviously it needs docs too).
Opinions?

-- 
Andrew (irc:RhodiumToad)

diff --git a/src/backend/access/index/amapi.c b/src/backend/access/index/amapi.c
index d347ebc..3e7e084 100644
--- a/src/backend/access/index/amapi.c
+++ b/src/backend/access/index/amapi.c
@@ -16,7 +16,9 @@
 #include "access/amapi.h"
 #include "access/htup_details.h"
 #include "catalog/pg_am.h"
+#include "catalog/pg_class.h"
 #include "catalog/pg_opclass.h"
+#include "utils/elog.h"
 #include "utils/syscache.h"
 
 
@@ -119,3 +121,95 @@ amvalidate(PG_FUNCTION_ARGS)
 
 	PG_RETURN_BOOL(result);
 }
+
+
+/*
+ * Test capabilities of an index or index AM.
+ */
+static Datum
+indexam_capability(FunctionCallInfo fcinfo,
+   Oid amoid, char *nameptr, int namelen)
+{
+	IndexAmRoutine *routine = NULL;
+	bool		ret;
+
+	if (namelen < 1 || namelen > 14)
+		PG_RETURN_NULL();
+
+	routine = GetIndexAmRoutineByAmId(amoid);
+
+	if (namelen == 10 && memcmp(nameptr, "amcanorder", 10) == 0)
+		ret = (routine->amcanorder) ? true : false;
+	else if (namelen == 14 && memcmp(nameptr, "amcanorderbyop", 14) == 0)
+		ret = (routine->amcanorderbyop) ? true : false;
+	else if (namelen == 13 && memcmp(nameptr, "amcanbackward", 13) == 0)
+		ret = (routine->amcanbackward) ? true : false;
+	else if (namelen == 11 && memcmp(nameptr, "amcanunique", 11) == 0)
+		ret = (routine->amcanunique) ? true : false;
+	else if (namelen == 13 && memcmp(nameptr, "amcanmulticol", 13) == 0)
+		ret = (routine->amcanmulticol) ? true : false;
+	else if (namelen == 13 && memcmp(nameptr, "amoptionalkey", 13) == 0)
+		ret = (routine->amoptionalkey) ? true : false;
+	else if (namelen == 13 && memcmp(nameptr, "amsearcharray", 13) == 0)
+		ret = (routine->amsearcharray) ? true : false;
+	else if (namelen == 13 && memcmp(nameptr, "amsearchnulls", 13) == 0)
+		ret = (routine->amsearchnulls) ? true : false;
+	else if (namelen == 9 && memcmp(nameptr, "amstorage", 9) == 0)
+		ret = (routine->amstorage) ? true : false;
+	else if (namelen == 13 && memcmp(nameptr, "amclusterable", 13) == 0)
+		ret = (routine->amclusterable) ? true : false;
+	else if (namelen == 11 && memcmp(nameptr, "ampredlocks", 11) == 0)
+		ret = (routine->ampredlocks) ? true : false;
+	else if (namelen == 11 && memcmp(nameptr, "amcanreturn", 11) == 0)
+		ret = (routine->amcanreturn) ? true : false;
+	else if (namelen == 10 && memcmp(nameptr, "amgettuple", 10) == 0)
+		ret = (routine->amgettuple) ? true : false;
+	else if (namelen == 11 && memcmp(nameptr, "amgetbitmap", 11) == 0)
+		ret = (routine->amgetbitmap) ? true : false;
+	else
+		PG_RETURN_NULL();
+
+	PG_RETURN_BOOL(ret);
+}
+
+/*
+ * Test capability of an AM specified by the AM Oid.
+ */
+Datum
+pg_indexam_has_capability(PG_FUNCTION_ARGS)
+{
+	Oid			amoid = PG_GETARG_OID(0);
+	text	   *name = PG_GETARG_TEXT_PP(1);
+	char	   *nameptr = VARDATA_ANY(name);
+	int			namelen = VARSIZE_ANY_EXHDR(name);
+
+	return indexam_capability(fcinfo, amoid, nameptr, namelen);
+}
+
+/*
+ * Test capability of the AM for an index specified by relation Oid.
+ */
+Datum
+pg_index_has_capability(PG_FUNCTION_ARGS)
+{
+	Oid			relid = PG_GETARG_OID(0);
+	text	   *name = PG_GETARG_TEXT_PP(1);
+	char	   *nameptr = VARDATA_ANY(name);
+	int			namelen = VARSIZE_ANY_EXHDR(name);
+	Oid			amoid;
+	HeapTuple	tuple;
+	Form_pg_class rd_rel;
+
+	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+	if (!HeapTupleIsValid(tuple))
+		elog(ERROR, "could not find tuple for relation %u", relid);
+	rd_rel = (Form_pg_class) GETSTRUCT(tuple);
+	if (rd_rel->relkind != RELKIND_INDEX)
+		ereport(ERROR,
+(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("relation %u is not an index", relid)));
+	amoid = rd_rel->relam;
+	ReleaseSysCache(tuple);
+
+	return indexam_capability(fcinfo, amoid, nameptr, namelen);
+}
diff --git a/src/include/access/amapi.h b/src/include/access/amapi.h
index 35f1061..0e2f9d7 100644
--- a/src/include/access/amapi.h
+++ b/src/include/access/amapi.h
@@ -171,4 +171,7 @@ extern IndexAmRoutine *GetIndexAmRoutineByAmId(Oid amoid);
 
 extern Datum amvalidate(PG_FUNCTION_ARGS);
 
+extern Datum pg_indexam_has_capability(PG_FUNCTION_ARGS);
+extern Datum pg_index_has_capability(PG_FUNCTION_ARGS);
+
 #endif   /* AMAPI_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 5d233e3..4e696fa 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -564,6 +564,11 @@ DESCR("spgist index access method handler");
 DATA(insert OID = 335 (  brinhandler	PGNSP PGUID 12 1 0 0 0 f f f f t f v s 1 0 325 "2281" _null_ _null_ _null_ _null_ _null_	brinhandler _null_ _null_ _null_ ));
 DESCR("brin index access method handler");
 
+DATA(insert OID = 336 (  pg_indexam_has_capability	PGNSP PGUID 12 1 0 0 0 f f f f t f s s 2 0 16 "26 25" _null_ _null_ _null_ _null_ _null_	pg_indexam_has_capability _null_ _null_ _null_ ));
+DESCR("test capability of an index access method");
+DATA(insert OID = 337 (  pg_index_has_capability	PGNSP PGUID 

Re: [HACKERS] No longer possible to query catalogs for index capabilities?

2016-07-25 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> It could easily be exposed as a function interface of the form
 >> index_has_capability(oid,name) or indexam_has_capability(oid,name)
 >> without any initdb worries.

 Tom> You missed the "compelling argument why it's needed" part.  What
 Tom> is the need for this?  I'm not going to be persuaded by "it was
 Tom> there before".

How about "it was there before, and people did use it"?

In fact I notice you participated in a discussion of this a couple of
months back on the JDBC list, in which your solution was to suggest
hardcoding the name 'btree' into the query:

https://www.postgresql.org/message-id/24504.1463237368%40sss.pgh.pa.us

Doesn't that strike you as an indication that something is wrong?

 Tom> We've gotten along fine without such inspection functions for FDWs
 Tom> and tablesample methods,

which are new and not especially interesting to code doing introspection

 Tom> so I doubt that we really need them for index AMs.

People write catalog queries for indexes a whole lot more than they do
for FDWs or tablesample methods.

This whole discussion started because I wrote a catalog query for
someone on IRC, and found I couldn't do it on 9.6 because amcanorder was
gone.

 Tom> Nobody's writing applications that make decisions about which AM
 Tom> to use based on what they see in pg_am.

That's not the issue. The issue is finding information about _existing_
indexes that is not otherwise exposed.

 Tom> Moreover, I think you are missing the point about initdb.  The
 Tom> issue there is that anytime in future that we make a change to the
 Tom> AM API, we'd need to have a debate about whether and how to expose
 Tom> such a change for SQL inspection.  Defining the exposure mechanism
 Tom> as a new function rather than a new view column changes neither
 Tom> the need for a debate, nor the need for an initdb unless we decide
 Tom> that we don't need to expose anything.

I'm not proposing a new function for each capability. I'm proposing ONE
function (or two, one starting from the index rather than the AM, for
convenience). Adding more capability names would not require an initdb.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No longer possible to query catalogs for index capabilities?

2016-07-25 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> With the gutting of pg_am in 9.6, there seems to be no longer any
 >> way for a query of the system catalogs to discover any of the index
 >> capabilities that were formerly columns in pg_am (notably
 >> amcanorder, amcanorderbyop, amclusterable, amsearcharray,
 >> amsearchnulls).

 >> Am I missing something or is this a significant oversight?

 Tom> It's absolutely not an oversight.  We asked when 65c5fcd35 went in
 Tom> whether there was still any need for that information to be
 Tom> available at the SQL level, and nobody appeared to care.

Perhaps you were asking the wrong people?

 Tom> We could in theory expose a view to show the data --- but since a
 Tom> large part of the point of that change was to not need initdb for
 Tom> AM API changes, and to not be constrained to exactly
 Tom> SQL-compatible representations within that API, I'm disinclined to
 Tom> do so without a fairly compelling argument why it's needed.

It could easily be exposed as a function interface of the form
index_has_capability(oid,name) or indexam_has_capability(oid,name)
without any initdb worries.

That would surely be better than the present condition of being
completely unable to get this information from SQL.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] No longer possible to query catalogs for index capabilities?

2016-07-25 Thread Andrew Gierth
With the gutting of pg_am in 9.6, there seems to be no longer any way
for a query of the system catalogs to discover any of the index
capabilities that were formerly columns in pg_am (notably amcanorder,
amcanorderbyop, amclusterable, amsearcharray, amsearchnulls).

Am I missing something or is this a significant oversight?

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: revert behavior of IS NULL on row types

2016-07-22 Thread Andrew Gierth
>>>>> "Andrew" == Andrew Gierth <and...@tao11.riddles.org.uk> writes:

 >>> Whole-row vars when constructed never contain the null value.

 David> ...but what does this mean in end-user terms?

 Andrew> It means for example that this query:

 Andrew>   select y from x left join y on (x.id=y.id) where y is null;

 Andrew> would always return 0 rows.

On second thoughts I'll take this one back. Specifying that whole-row
vars don't contain the null value when constructed doesn't actually
result in no rows, since the construction is logically below the join;
and hence even with that rule in place, the query would return a row for
each non-matched "x" row.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: revert behavior of IS NULL on row types

2016-07-22 Thread Andrew Gierth
> "David" == David G Johnston  writes:

 >> Prohibiting IS NOT NULL is not on the cards; it's very widely used.

 David> Yet changing how it behaves, invisibly, is?

Did you mean prohibiting it only for composite-type args? It's obviously
widely used for non-composite args.

I would expect that in >95% of cases where someone has written (x IS
NOT NULL) for x being a composite type, it's actually a bug and that
NOT (x IS NULL) was intended.
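
For illustration, with the current behaviour:

select row(1, null) is not null as is_not_null,    -- false: not every field is non-null
       not (row(1, null) is null) as not_is_null;  -- true: "is null" is likewise false here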

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: revert behavior of IS NULL on row types

2016-07-22 Thread Andrew Gierth
> "David" == David G Johnston  writes:

 >> 1. x IS NULL  is true if and only if x has the null value (isnull set).

 David> I don't have a problem conforming to "ROW(NULL, NULL) IS NULL"
 David> being true... if you somehow get a hold of something in that
 David> form, which your other points address.

This seems harmless, but I think it's not worth the pain.

I'm informed that for example on Oracle:

  select case when foo_type(null, null) is null
  then 'true' else 'false'
  end from dual;

returns 'false'.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: revert behavior of IS NULL on row types

2016-07-22 Thread Andrew Gierth
> "David" == David G Johnston  writes:

 >> 2. x IS NOT NULL  if and only if  NOT (x IS NULL)

 David> I would rather prohibit "IS NOT NULL" altogether. If one needs
 David> to test "NOT (x IS NULL)" they can write it that way.

Prohibiting IS NOT NULL is not on the cards; it's very widely used.

 >> Whole-row vars when constructed never contain the null value.

 David> ...but what does this mean in end-user terms?

It means for example that this query:

  select y from x left join y on (x.id=y.id) where y is null;

would always return 0 rows.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Proposal: revert behavior of IS NULL on row types

2016-07-22 Thread Andrew Gierth
In light of the fact that it is an endless cause of bugs both in pg and
potentially to applications, I propose that we cease attempting to
conform to the spec's definition of IS NULL in favour of the following
rules:

1. x IS NULL  is true if and only if x has the null value (isnull set).

2. x IS NOT NULL  if and only if  NOT (x IS NULL)

3. ROW() and other row constructors never return the null value.
Whole-row vars when constructed never contain the null value. 

4. Columns or variables of composite type can (if not declared NOT NULL)
contain the null value (isnull set) which is distinct from an
all-columns-null value.

5. COALESCE(x,y) continues to return y if and only if x is the null
value. (We currently violate the spec here.)

(X. Optionally, consider adding new predicates:

  x IS ALL NULL
  x IS NOT ALL NULL
  x IS ALL NOT NULL
  x IS NOT ALL NOT NULL

which would examine the fields of x non-recursively.)
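
To spell out the user-visible effect (current results shown for
comparison; my own examples):

select row(null, null) is null;   -- currently true; false under rules 1+3
select row(1, null) is not null;  -- currently false; true under rule 2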

Justification:

https://www.postgresql.org/message-id/4f6a90a0-c6e8-22eb-3b7a-727f8a60f3b1%40BlueTreble.com
https://www.postgresql.org/message-id/20160708024746.1410.57282%40wrigleys.postgresql.org

Further rationale:

https://www.postgresql.org/message-id/87zip9qti4.fsf%40news-spur.riddles.org.uk

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] \timing interval

2016-07-09 Thread Andrew Gierth
> "Gavin" == Gavin Flower  writes:

 >> How about
 >> 
 >> Time: 1234567.666 ms (20m 34.6s)

 Gavin> I like that, but I think the human form should retain the 3
 Gavin> decimal places.

Scale it.

Time: 12.345 ms (0.012345s)

Time: 1234.567 ms (1.235s)

Time: 98765.432 ms (98.8s)

Time: 123456.789 ms (2m 3.5s)

Time: 12345678.666 ms (3h 25m 46s)

 Gavin> In a few years, we may well have enormously multiprocessor
 Gavin> computers with massive very fast permanent 'RAM' where the
 Gavin> entire database is always in memory, so timing to the nearest
 Gavin> microsecond could be useful.

But the original microsecond-resolution value is still right there, so I
don't see the issue.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] \timing interval

2016-07-09 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 > Peter Eisentraut  writes:
 >> I'm not quite sure what you mean by wanting to do arithmetic on the 
 >> numbers.  My phrasing of the problem is that after a long query, you 
 >> might get output like this:
 >> Time: 1234567.666 ms
 >> which is pretty useless.

 Tom> What I mean by that is that not infrequently, I'll run the same
 Tom> query several times and then want to average the results.  That's
 Tom> easy with awk or similar scripts as long as the numbers are in
 Tom> straight decimal.

 Tom> I don't mind if we provide a way to print in Babylonian-inspired
 Tom> notation(s) as well, but I'm going to be seriously annoyed if
 Tom> that's the only way to get the output.

How about

Time: 1234567.666 ms (20m 34.6s)

?

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] reserved role names

2016-07-05 Thread Andrew Gierth
> "Joshua" == Joshua D Drake  writes:

 Joshua> Is it intentional that insert and delete are allowed and select
 Joshua> is not or is it an oversight?

Just an artifact of SELECT being fully reserved (needed for (SELECT ...)
syntax to work right) while INSERT and DELETE are unreserved.
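
For instance:

create role insert;   -- accepted: INSERT is an unreserved keyword
create role delete;   -- accepted: DELETE is unreserved too
create role select;   -- fails with a syntax error: SELECT is fully reserved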

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New design for FK-based join selectivity estimation

2016-06-29 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 > Tomas Vondra  writes:
 >> Attached is a reworked patch, mostly following the new design proposal 
 >> from this thread.

 Tom> Comments and testing appreciated.

This blows up (see bug 14219 for testcase) in
match_foreign_keys_to_quals on the find_base_rel call(s) in the
following case:

If the query was produced by rule expansion then the code that populates
fkinfo includes FK references to the OLD and NEW RTEs, but those might not
appear in the jointree (the testcase for the bug is a DELETE rule where
NEW clearly doesn't apply) and hence build_simple_rel was not called
(causing find_base_rel to fail). Not sure what the right fix is.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MultiXactId error after upgrade to 9.3.4

2016-06-17 Thread Andrew Gierth
> "Robert" == Robert Haas  writes:

 >> Why is the correct rule not "check for and ignore pre-upgrade mxids
 >> before even trying to fetch members"?

 Robert> I entirely believe that's the correct rule, but doesn't
 Robert> implementing it require a crystal balll?

Why would it? Pre-9.3 mxids are identified by the combination of flag
bits in the infomask, see Alvaro's patch.
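
For reference, my understanding of that test (a sketch using the 9.3+
bit names from htup_details.h, not a quote from the patch): a pre-9.3
multixact xmax is one whose infomask has HEAP_XMAX_IS_MULTI and
HEAP_XMAX_LOCK_ONLY set but neither of the newer lock-strength bits,
a combination that 9.3+ never writes:

    ((infomask & HEAP_XMAX_IS_MULTI) != 0 &&
     (infomask & HEAP_XMAX_LOCK_ONLY) != 0 &&
     (infomask & (HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_KEYSHR_LOCK)) == 0)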

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MultiXactId error after upgrade to 9.3.4

2016-06-17 Thread Andrew Gierth
> "Alvaro" == Alvaro Herrera  writes:

 >> (It can, AFAICT, be inside the currently valid range due to
 >> wraparound, i.e. without there being a valid pg_multixact entry for
 >> it, because AFAICT in 9.2, once the mxid is hinted dead it is never
 >> again either looked up or cleared, so it can sit in the tuple xmax
 >> forever, even through multiple wraparounds.)

 Alvaro> HeapTupleSatisfiesVacuum removes very old multixacts

It does nothing of the kind; it only marks them HEAP_XMAX_INVALID. The
actual mxid remains in the tuple xmax field.

The failing mxids in the case I analyzed on -bugs are failing _in spite
of_ being already hinted HEAP_XMAX_INVALID, because the code path in
question doesn't check that.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MultiXactId error after upgrade to 9.3.4

2016-06-16 Thread Andrew Gierth
> "Alvaro" == Alvaro Herrera  writes:

 Alvaro> I think that was a good choice in general so that
 Alvaro> possibly-data-eating bugs could be reported, but there's a
 Alvaro> problem in the specific case of tuples carried over by
 Alvaro> pg_upgrade whose Multixact is "further in the future" compared
 Alvaro> to the nextMultiXactId counter.  I think it's pretty clear that
 Alvaro> we should let that error be downgraded to DEBUG too, like the
 Alvaro> other checks.

But that leaves an obvious third issue: it's all very well to ignore the
pre-upgrade (pre-9.3) mxid if it's older than the cutoff or it's in the
future, but what if it's actually inside the currently valid range?
Looking it up as though it were a valid mxid in that case seems to be
completely wrong and could introduce more subtle errors.

(It can, AFAICT, be inside the currently valid range due to wraparound,
i.e. without there being a valid pg_multixact entry for it, because
AFAICT in 9.2, once the mxid is hinted dead it is never again either
looked up or cleared, so it can sit in the tuple xmax forever, even
through multiple wraparounds.)

Why is the correct rule not "check for and ignore pre-upgrade mxids
before even trying to fetch members"?

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] regexp_match() returning text

2016-05-30 Thread Andrew Gierth
> "Emre" == Emre Hasegeli  writes:

 Emre> Attached patch adds regexp_match() function which is a simple
 Emre> variant of regexp_matches() that doesn't return a set.

We already have a function that takes a string and a regexp and returns
a single text result: substring().

Regexp flags other than 'g' can also be embedded in the regexp:

postgres=# select substring('foo bar' from '(?i)BA+');
 substring 
---
 ba

Returning an array containing the values of all capture groups might be
more useful (substring returns the value of the first capture group if
any, otherwise the matched string).
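
For example, to make the capture-group point concrete:

postgres=# select substring('foo bar' from '(fo+) (ba+)');
 substring 
-----------
 foo

Only the first capture group comes back; the second is simply discarded.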

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls

2016-05-30 Thread Andrew Gierth
> "Dean" == Dean Rasheed  writes:

 Dean> That may be so, but we already support FILTER for all window
 Dean> functions as well as aggregates:

Not so:

"If FILTER is specified, then only the input rows for which the
filter_clause evaluates to true are fed to the window function; other
rows are discarded. Only window functions that are aggregates accept a
FILTER clause."

(Per spec, FILTER appears only in the <aggregate function> production,
which is just one of the several options for <window function type>.)

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] copyParamList

2016-05-27 Thread Andrew Gierth
copyParamList does not respect from->paramMask, in what looks to me like
an obvious oversight:

retval->paramMask = NULL;
[...]
/* Ignore parameters we don't need, to save cycles and space. */
if (retval->paramMask != NULL &&
!bms_is_member(i, retval->paramMask))

retval->paramMask is never set to anything not NULL in this function,
so surely that should either be initializing it to from->paramMask, or
checking from->paramMask in the conditional?
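
Either fix is a one-liner; e.g. the conditional would become (untested
sketch):

    /* Ignore parameters we don't need, to save cycles and space. */
    if (from->paramMask != NULL &&
        !bms_is_member(i, from->paramMask))

or alternatively the test could stay as-is with retval->paramMask
initialized to from->paramMask (or a bms_copy() of it) instead of NULL.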

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allow COPY to use parameters

2016-05-27 Thread Andrew Gierth
> "Merlin" == Merlin Moncure  writes:

 Merlin> Note, the biggest pain point I have with COPY is not being able
 Merlin> to parameterize the filename argument.

Second proof of concept attached. This goes so far as to allow
statements like:

do $$
  declare t text := 'bar'; f text := '/tmp/copytest.dat';
  begin copy (select t, now()) to (f) csv header; end;
$$;

Also "copy foo to $1" or "copy (select * from foo where x=$1) to $2" and
so on should work from PQexecParams or in a plpgsql EXECUTE.

(I haven't tried to parameterize anything other than the filename and
query. Also, it does not accept arbitrary expressions - only $n, '...'
or a columnref. $n and '...' can have parens or not, but the columnref
must have them due to conflicts with unreserved keywords PROGRAM, STDIN,
STDOUT. This could be hacked around in other ways, I guess, if the
parens are too ugly.)

-- 
Andrew (irc:RhodiumToad)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3201476..97debb7 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -37,6 +37,8 @@
 #include "optimizer/clauses.h"
 #include "optimizer/planner.h"
 #include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "parser/analyze.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/fd.h"
 #include "tcop/tcopprot.h"
@@ -279,13 +281,13 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 
 
 /* non-export function prototypes */
-static CopyState BeginCopy(bool is_from, Relation rel, Node *raw_query,
+static CopyState BeginCopy(bool is_from, Relation rel, Node *raw_query, ParamListInfo params,
 		  const char *queryString, const Oid queryRelId, List *attnamelist,
 		  List *options);
 static void EndCopy(CopyState cstate);
 static void ClosePipeToProgram(CopyState cstate);
-static CopyState BeginCopyTo(Relation rel, Node *query, const char *queryString,
-			const Oid queryRelId, const char *filename, bool is_program,
+static CopyState BeginCopyTo(Relation rel, Node *query, ParamListInfo params, const char *queryString,
+			const Oid queryRelId, Node *filename_expr, bool is_program,
 			List *attnamelist, List *options);
 static void EndCopyTo(CopyState cstate);
 static uint64 DoCopyTo(CopyState cstate);
@@ -767,6 +769,43 @@ CopyLoadRawBuf(CopyState cstate)
 }
 
 
+static char *
+CopyEvalFilename(QueryDesc *qd, Node *expr, ParamListInfo params)
+{
+	char *filename = NULL;
+
+	if (expr)
+	{
+		EState *estate = qd ? qd->estate : CreateExecutorState();
+		MemoryContext oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
+
+		Assert(exprType(expr) == CSTRINGOID);
+
+		if (qd == NULL)
+			estate->es_param_list_info = params;
+
+		{
+			ExprContext *ecxt = CreateExprContext(estate);
+			ExprState *exprstate = ExecPrepareExpr(copyObject(expr), estate);
+			bool isnull;
+			Datum result = ExecEvalExprSwitchContext(exprstate, ecxt, &isnull, NULL);
+			if (!isnull)
+filename = MemoryContextStrdup(oldcontext, DatumGetCString(result));
+			FreeExprContext(ecxt, true);
+		}
+
+		MemoryContextSwitchTo(oldcontext);
+
+		if (qd == NULL)
+			FreeExecutorState(estate);
+
+		if (!filename)
+			elog(ERROR, "COPY filename expression must not return NULL");
+	}
+
+	return filename;
+}
+
 /*
  *	 DoCopy executes the SQL COPY statement
  *
@@ -787,7 +826,7 @@ CopyLoadRawBuf(CopyState cstate)
  * the table or the specifically requested columns.
  */
 Oid
-DoCopy(const CopyStmt *stmt, const char *queryString, uint64 *processed)
+DoCopy(const CopyStmt *stmt, const char *queryString, ParamListInfo params, uint64 *processed)
 {
 	CopyState	cstate;
 	bool		is_from = stmt->is_from;
@@ -906,7 +945,7 @@ DoCopy(const CopyStmt *stmt, const char *queryString, uint64 *processed)
 			select->targetList = list_make1(target);
 			select->fromClause = list_make1(from);
 
-			query = (Node *) select;
+			query = (Node *) parse_analyze((Node *) select, queryString, NULL, 0);
 
 			/*
 			 * Close the relation for now, but keep the lock on it to prevent
@@ -929,6 +968,8 @@ DoCopy(const CopyStmt *stmt, const char *queryString, uint64 *processed)
 
 	if (is_from)
 	{
+		char *filename;
+
 		Assert(rel);
 
 		/* check read-only transaction and parallel mode */
@@ -936,15 +977,20 @@ DoCopy(const CopyStmt *stmt, const char *queryString, uint64 *processed)
 			PreventCommandIfReadOnly("COPY FROM");
 		PreventCommandIfParallelMode("COPY FROM");
 
-		cstate = BeginCopyFrom(rel, stmt->filename, stmt->is_program,
+		filename = CopyEvalFilename(NULL, stmt->filename, params);
+
+		cstate = BeginCopyFrom(rel, filename, stmt->is_program,
 			   stmt->attlist, stmt->options);
 		cstate->range_table = range_table;
 		*processed = CopyFrom(cstate);	/* copy from file to database */
 		EndCopyFrom(cstate);
+
+		if (filename)
+			pfree(filename);
 	}
 	else
 	{
-		cstate = BeginCopyTo(rel, query, queryString, relid,
+		cstate = BeginCopyTo(rel, query, params, queryString, relid,
 			 stmt->filename, stmt->is_program,
 			 
