Re: [HACKERS] Selective logical replication

2015-11-20 Thread Craig Ringer
On 20 November 2015 at 22:03, Craig Ringer  wrote:

> On 19 November 2015 at 16:48, konstantin knizhnik <
> k.knizh...@postgrespro.ru> wrote:
>
>> Hi,
>>
>> I want to use logical replication for implementing multimaster (so all
>> nodes are both sending and receiving changes).
>>
>
> Like http://bdr-project.org/ ?
>
>
>> But there is one "stupid" problem: how to prevent infinite recursion and
>> not to rereplicate replicated data.
>>
>
> You need replication origins, so you can tell where a change came from and
> in a mesh topology only send locally-originated tuples to peers.
>
> In a circular forwarding topology you instead filter out a peer's changes
> when they come back around the circle to you and forward everything else.
>
> That's exactly what the replication origins feature in 9.5 is for. It lets
> you associate a tuple with the node it came from. When doing logical
> decoding you can make decisions about whether to forward it based on your
> knowledge of the tuple's origin and the peer node(s).
>
> This is trivial to implement on top of the pglogical output plugin -
> we already have the hooks for origin filtering in place. All you have to do
> is pass information about which nodes you want to filter out or retain.
>
>
Oh, see also the submission for 9.6.


http://www.postgresql.org/message-id/CAMsr+YGc6AYKjsCj0Zfz=X4Aczonq1SfQx9C=huyun4j2pk...@mail.gmail.com


The pglogical downstream can trivially support simple multimaster
replication. So can any other client that consumes the pglogical_output
plugin's change stream because it's designed with pluggable hooks you can
use to add your own knowledge of topology, node filtering, etc.
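For illustration, the two filtering decisions described above (mesh vs. circular forwarding) can be sketched as follows. This is a hypothetical Python sketch; the node names and function names are invented, and pglogical's real hooks are C callbacks invoked during logical decoding.

```python
# Hypothetical sketch of the origin-filtering decisions discussed above.
# Node names and function names are illustrative, not pglogical's API.

LOCAL_NODE = "node_a"

def forward_in_mesh(change_origin, local_node=LOCAL_NODE):
    """Mesh topology: send only locally-originated changes to peers;
    each peer receives every other node's changes directly."""
    return change_origin == local_node

def forward_in_ring(change_origin, peer_node):
    """Circular forwarding topology: forward everything except changes
    that originated at the peer we are sending to, since those have
    already come full circle."""
    return change_origin != peer_node
```

In both cases the decision needs only the change's replication origin plus knowledge of the topology, which is exactly the information the hooks let you pass in.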

That said: multimaster is a lot more complicated than just avoiding
circular replication; that's just one of many complex issues. Dealing
with schema changes is way harder than you think because of the
queued-but-not-yet-replicated changes in an asynchronous system. That's why
we have that horrid global DDL lock in BDR (though it's being improved to
be a bit less extreme).

Conflict handling is hard, especially when foreign keys, multiple
constraints, etc get involved. There are tons and tons of hairy corner
cases.

Handling full table rewrites is a problem because the way Pg performs them
internally is currently hard for logical decoding to handle. You don't have
a good mapping from the pg_temp table being written to during the rewrite
to the final target table. It also means you lose the original replication
origin and commit timestamp info. So you need a way to make sure there are
no outstanding writes to replicate on that table when you do the full
rewrite, like a mini global checkpoint for just one table.

There are lots of things that can break an asynchronous multimaster
replication setup, often in subtle ways, that won't break standalone Pg.
Want to create an exclusion constraint? Think again. You can't do that
safely. Not without just discarding changed tuples that violate the
constraint and introducing divergence between node states, anyway.
Secondary unique indexes, FK constraints, etc, can all be challenging. A
naïve multimaster system is easy to deadlock with simple foreign key use.
There are still cases BDR doesn't handle.

I'd really, really love more involvement from others in doing multi-master
replication with PostgreSQL using logical decoding, replication origins,
etc. We've got some amazing infrastructure now, and are in a position to
deliver some really cool technology if there's more uptake and interest. So
I'd love to work with you to whatever extent possible. Let's try to
share effort rather than reinvent. Feel free to get in touch off-list if
that's better for you at the current stage of your investigations.

(BTW, I'm quite interested in what's being discussed with distributed
locking and transactions, since that'd help solve some of the issues with
DDL in logical replication. Having a good way to get a global exclusive
lock before a table rewrite, for example.)

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

2015-11-20 Thread Kouhei Kaigai
> On 2015/11/19 12:32, Robert Haas wrote:
> > On Tue, Nov 17, 2015 at 8:47 PM, Kouhei Kaigai  wrote:
> >> The attached patch is the portion cut from the previous EPQ recheck
> >> patch.
> 
> > Thanks, committed.
> 
> Thanks, Robert and KaiGai-san.
> 
> Sorry, I'm a bit late to the party.  Here are my questions:
> 
> * This patch means we can define fdw_recheck_quals even for the case of
> foreign tables with non-NIL fdw_scan_tlist.  However, we discussed in
> another thread [1] that such foreign tables might break EvalPlanQual
> tests.  Where are we on that issue?
>
In the case of late locking, RefetchForeignRow() will set a base tuple
whose layout is compatible with the base relation, not fdw_scan_tlist,
because RefetchForeignRow() has no information about the scan node.
There are two solutions: 1) don't use fdw_scan_tlist in an FDW that
uses the late locking mechanism, or 2) have the recheck callback apply
a projection to fit fdw_scan_tlist (this would not be difficult for
the core to provide as a utility function).

Even though we allow fdw_scan_tlist to be set up in simple scan cases,
it does not mean it works in all cases.
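Solution 2 amounts to a simple projection. As a rough illustration (Python pseudocode with invented names, not the C executor API), the recheck callback would map a refetched base-relation tuple onto the columns named in fdw_scan_tlist:

```python
def project_to_scan_tlist(base_tuple, fdw_scan_tlist):
    """Project a tuple in base-relation layout (modeled here as a dict
    of column name -> value) onto the ordered column list that the
    scan node emits (fdw_scan_tlist)."""
    return tuple(base_tuple[col] for col in fdw_scan_tlist)
```

For example, a refetched row {"a": 1, "b": 2, "c": 3} projected onto a scan tlist of ["c", "a"] yields (3, 1), i.e. a tuple in the layout the scan node expects rather than the base relation's.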

> * For the case of foreign joins, I think fdw_recheck_quals can be
> defined for example, the same way as for the case of foreign tables, ie,
> quals not in scan.plan.qual, or ones defined as "otherclauses"
> (rinfo->is_pushed_down=true) pushed down to the remote.  But since it's
> required that the FDW has to add to the fdw_scan_tlist the set of
> columns needed to check quals in fdw_recheck_quals in preparation for
> EvalPlanQual tests, it's likely that fdw_scan_tlist will end up being
> long, leading to an increase in a total data transfer amount from the
> remote.  So, that seems not practical to me.  Maybe I'm missing
> something, but what use cases are you thinking?
>
It is a trade-off. What alternative do you think we could have?
To avoid data transfer used only for EPQ rechecks, we could implement
the FDW driver so that it issues the remote join again on each EPQ
recheck, but that is not a wise design, is it?

It would be perfect if we could have no extra data transfer and no
remote join execution during EPQ rechecks. However, we have to weigh
both the advantages and the disadvantages when we settle on an
implementation. We usually choose the way that has more advantages
than disadvantages, but that does not mean it has no disadvantages.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Selective logical replication

2015-11-20 Thread Craig Ringer
On 19 November 2015 at 16:48, konstantin knizhnik  wrote:

> Hi,
>
> I want to use logical replication for implementing multimaster (so all
> nodes are both sending and receiving changes).
>

Like http://bdr-project.org/ ?


> But there is one "stupid" problem: how to prevent infinite recursion and
> not to rereplicate replicated data.
>

You need replication origins, so you can tell where a change came from and
in a mesh topology only send locally-originated tuples to peers.

In a circular forwarding topology you instead filter out a peer's changes
when they come back around the circle to you and forward everything else.

That's exactly what the replication origins feature in 9.5 is for. It lets
you associate a tuple with the node it came from. When doing logical
decoding you can make decisions about whether to forward it based on your
knowledge of the tuple's origin and the peer node(s).

This is trivial to implement on top of the pglogical output plugin -
we already have the hooks for origin filtering in place. All you have to do
is pass information about which nodes you want to filter out or retain.

See https://github.com/2ndQuadrant/pglogical_output and the unit tests,
especially the hook tests.


> I wonder if there is some better way to prevent some particular
> transaction from been replicated?
>

Replication origins.

http://www.postgresql.org/docs/9.5/static/replication-origins.html


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] [DESIGN] ParallelAppend

2015-11-20 Thread Robert Haas
On Fri, Nov 20, 2015 at 12:45 AM, Amit Kapila  wrote:
> Okay, but I think that's not what I am talking about.  I am talking about
> below code in cost_seqscan:
>
> - if (nworkers > 0)
> - run_cost = run_cost / (nworkers + 0.5);
> + if (path->parallel_degree > 0)
> + run_cost = run_cost / (path->parallel_degree + 0.5);
>
> It will consider 50% of master backends effort for scan of each child
> relation,
> does that look correct to you?  Wouldn't 50% of master backends effort be
> considered to scan all the child relations?

In the code you originally wrote, you were adding 1 there rather than
0.5.  That meant you were expecting the leader to do as much work as
each of its workers, which is clearly a bad estimate, because the
leader also has to do the work of gathering tuples from the workers.
0.5 might not be the right value, but it's surely better than 1.
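The arithmetic under discussion can be illustrated with a small sketch (Python, with invented names; the real logic lives in cost_seqscan in the planner's costing code):

```python
def parallel_run_cost(run_cost, parallel_degree, leader_contribution=0.5):
    """Spread the scan's run cost over the workers plus a fraction of
    the leader's effort. The committed code uses 0.5 rather than 1.0
    because the leader also spends time gathering tuples from the
    workers, so it cannot scan as much as a full worker."""
    if parallel_degree > 0:
        return run_cost / (parallel_degree + leader_contribution)
    return run_cost
```

With run_cost = 1000 and two workers this gives 1000 / 2.5 = 400; the earlier "+ 1" formula would have given 1000 / 3, i.e. it credited the leader with a full worker's worth of scanning and made the parallel scan look cheaper than it is.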

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

2015-11-20 Thread Etsuro Fujita

On 2015/11/19 12:32, Robert Haas wrote:

On Tue, Nov 17, 2015 at 8:47 PM, Kouhei Kaigai  wrote:

The attached patch is the portion cut from the previous EPQ recheck
patch.



Thanks, committed.


Thanks, Robert and KaiGai-san.

Sorry, I'm a bit late to the party.  Here are my questions:

* This patch means we can define fdw_recheck_quals even for the case of 
foreign tables with non-NIL fdw_scan_tlist.  However, we discussed in 
another thread [1] that such foreign tables might break EvalPlanQual 
tests.  Where are we on that issue?


* For the case of foreign joins, I think fdw_recheck_quals can be 
defined for example, the same way as for the case of foreign tables, ie, 
quals not in scan.plan.qual, or ones defined as "otherclauses" 
(rinfo->is_pushed_down=true) pushed down to the remote.  But since it's 
required that the FDW has to add to the fdw_scan_tlist the set of 
columns needed to check quals in fdw_recheck_quals in preparation for 
EvalPlanQual tests, it's likely that fdw_scan_tlist will end up being 
long, leading to an increase in a total data transfer amount from the 
remote.  So, that seems not practical to me.  Maybe I'm missing 
something, but what use cases are you thinking?


Best regards,
Etsuro Fujita

[1] http://www.postgresql.org/message-id/55af3c08.1070...@lab.ntt.co.jp





Re: [HACKERS] [PATCH] Refactoring of LWLock tranches

2015-11-20 Thread Ildus Kurbangaliev
On Thu, 19 Nov 2015 11:09:38 -0500
Robert Haas  wrote:

> On Thu, Nov 19, 2015 at 9:04 AM, Ildus Kurbangaliev
>  wrote:
> > Moving the base tranches to shared memory has been discussed many
> > times. The point is to use them later in pg_stat_activity and other
> > monitoring views.  
> 
> I'm not in agreement with this idea.  Actually, I'd prefer that the
> tranches live in backend-private memory, not shared memory, so that we
> could for example add backend-local counters to them if desired.  The
> SLRU patch is the first time we've put them in shared memory, but it
> would be easy to keep only the things that the tranche needs to point
> to in shared memory and put the tranche itself back in each backend,
> which I tend to think is what we should do.
>

We keep a limited number of LWLocks in base shared memory, so why not
keep their tranches in shared memory too? Other tranches can live in
local memory; we just have to record the highest id of these tranches
somewhere.

There are not so many possible builtin tranches: excluding the SLRU
tranches, there will be only seven - three for the buffer manager, and
one each for proc.c, slot.c, the lock manager and the predicate lock
manager.
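The bookkeeping being suggested (a fixed set of built-in tranche ids in shared memory, plus locally registered ones above a recorded high-water mark) might look like this toy Python sketch. The count of seven comes from the message above; everything else, including the class and method names, is invented for illustration.

```python
NUM_BUILTIN_TRANCHES = 7  # per the count above: 3 buffer manager + 4 others

class TrancheRegistry:
    """Toy model: ids below NUM_BUILTIN_TRANCHES stand for the shared,
    built-in tranches; higher ids are registered backend-locally."""

    def __init__(self):
        self.next_id = NUM_BUILTIN_TRANCHES  # first id past the built-ins
        self.local = {}                      # backend-local tranche names

    def register(self, name):
        """Hand out the next free id for a dynamically created tranche."""
        tranche_id = self.next_id
        self.next_id += 1
        self.local[tranche_id] = name
        return tranche_id

    def is_builtin(self, tranche_id):
        return tranche_id < NUM_BUILTIN_TRANCHES
```

The only piece that must be globally agreed upon is the high-water mark separating the built-in ids from the dynamic ones, which is what "save somewhere the highest id" refers to.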

-- 
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




Re: [HACKERS] Declarative partitioning

2015-11-20 Thread Alvaro Herrera
Amit Langote wrote:
> On Fri, Nov 20, 2015 at 7:20 PM, Simon Riggs  wrote:

> > Drop it?? I think he means "in this initial patch", right Amit L ?
> 
> Yes, there was some notion of multi-level partitioning in the earlier
> patch but I removed it from the version I posted on Oct 30. I do
> intend to re-introduce it though.

I'm with Simon.  In my own experience, it's crazy to try to introduce a
huge feature such as this one in a single enormous commit.  The last
patch you posted was 300 kb without any SGML changes.

The way parallel query is being introduced is a good example to follow
(or logical multi-master replication, for that matter) --- one
infrastructure piece at a time.

Let's build incrementally.  

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Declarative partitioning

2015-11-20 Thread Simon Riggs
On 20 November 2015 at 09:18, Amit Langote 
wrote:

> On 2015/11/06 1:29, Robert Haas wrote:
> > On Fri, Oct 30, 2015 at 6:08 AM, Amit Langote
> >  wrote:
> >> The DDL and catalogs part are not much different from what I had last
> >> described though I took a few steps to simplify things. I dropped the
> >> multi-level partitioning bit
> >
> > Hmm, that doesn't sound good to me.  I think multi-level partitioning
> > is a reasonably important use case.
>
> I agree. I'm in the process of reformulating this proposal from the
> syntax, catalog and DDL -centric perspective and will re-incorporate
> multi-level partitioning notion into it. It was a mistake to drop it.
>

Drop it?? I think he means "in this initial patch", right Amit L ?

I don't really understand why parallel query was pursued in small pieces,
but partitioning needs to happen all in one huge patch. Wishing too many
things is going to slow down this feature.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Declarative partitioning

2015-11-20 Thread Amit Langote
On Fri, Nov 20, 2015 at 7:20 PM, Simon Riggs  wrote:
> On 20 November 2015 at 09:18, Amit Langote 
> wrote:
>>
>> On 2015/11/06 1:29, Robert Haas wrote:
>> > On Fri, Oct 30, 2015 at 6:08 AM, Amit Langote
>> >  wrote:
>> >> The DDL and catalogs part are not much different from what I had last
>> >> described though I took a few steps to simplify things. I dropped the
>> >> multi-level partitioning bit
>> >
>> > Hmm, that doesn't sound good to me.  I think multi-level partitioning
>> > is a reasonably important use case.
>>
>> I agree. I'm in the process of reformulating this proposal from the
>> syntax, catalog and DDL -centric perspective and will re-incorporate
>> multi-level partitioning notion into it. It was a mistake to drop it.
>
>
> Drop it?? I think he means "in this initial patch", right Amit L ?

Yes, there was some notion of multi-level partitioning in the earlier
patch but I removed it from the version I posted on Oct 30. I do
intend to re-introduce it though.

Thanks,
Amit




Re: [HACKERS] [Proposal] Table partition + join pushdown

2015-11-20 Thread Taiki Kondo
Hello, KaiGai-san.

Thank you for your reply, and sorry for late response.

I created v3 patch for this feature, and v1 patch for regression tests.
Please find attached.

Reply for your comments is below.

> Overall comments
> 
> * I think the enhancement in copyfuncs.c shall be in the separate
>   patch; it is more graceful manner. At this moment, here is less
>   than 20 Path delivered type definition. It is much easier works
>   than entire Plan node support as we did recently.
>   (How about other folk's opinion?)

I would also like to wait for other folks' opinions,
so I have not split this part out of the patch yet.


> * Can you integrate the attached test cases as regression test?
>   It is more generic way, and allows people to detect problems
>   if relevant feature gets troubled in the future updates.

Ok, done. Please find attached.

> * Naming of "join pushdown" is a bit misleading because other
>   component also uses this term, but different purpose.
>   I'd like to suggest try_pullup_append_across_join.
>   Any ideas from native English speaker?

Thank you for your suggestion.

I have changed its name to "try_append_pullup_accross_join",
which matches the word order of the previous name.

However, this change is just provisional; I would also
like to wait for other folks' opinions on the naming.


> Patch review
> 
> 
> At try_join_pushdown:
> +   /* When specified outer path is not an AppendPath, nothing to do here. */
> +   if (!IsA(outer_rel->cheapest_total_path, AppendPath))
> +   {
> +   elog(DEBUG1, "Outer path is not an AppendPath. Do nothing.");
> +   return;
> +   }
> It checks whether the cheapest_total_path is AppendPath at the head
> of this function. It ought to be a loop to walk on the pathlist of
> RelOptInfo, because multiple path-nodes might be still alive
> but also might not be cheapest_total_path.


Ok, done.

> +   switch (inner_rel->cheapest_total_path->pathtype)
> +
> Also, we can construct the new Append node if one of the path-node
> within pathlist of inner_rel are at least supported.

Done.
But this change creates a nested loop over inner_rel's pathlist and
outer_rel's pathlist, which means planning time increases further.

I think it is adequate to check only cheapest_total_path, because
other parts of the code, like make_join_rel(), also check only
cheapest_total_path.

What is your (and other people's) opinion?


> +   if (list_length(inner_rel->ppilist) > 0)
> +   {
> +   elog(DEBUG1, "ParamPathInfo is already set in inner_rel. Can't 
> pushdown.");
> +   return;
> +   }
> +
> You may need to introduce why this feature uses ParamPathInfos here.
> It seems to me a good hack to attach additional qualifiers on
> the underlying inner scan node, even if it is not a direct child of
> inner relation.
> However, people may have different opinion.

Ok, added comment in source.
Please find from attached patch.

> +static List *
> +convert_parent_joinclauses_to_child(PlannerInfo *root, List *join_clauses,
> +   RelOptInfo *outer_rel) {
> +   Index   parent_relid =
> +   find_childrel_appendrelinfo(root, 
> outer_rel)->parent_relid;
> +   List*clauses_parent = get_actual_clauses(join_clauses);
> +   List*clauses_child = NIL;
> +   ListCell*lc;
> +
> +   foreach(lc, clauses_parent)
> +   {
> +   Node*one_clause_child = (Node *) copyObject(lfirst(lc));
> +
> +   ChangeVarNodes(one_clause_child, parent_relid, outer_rel->relid, 0);
> +   clauses_child = lappend(clauses_child, one_clause_child);
> +   }
> +
> +   return make_restrictinfos_from_actual_clauses(root, clauses_child); 
> +}
> 
> Is ChangeVarNodes() right routine to replace var-node of parent relation
> by relevant var-node of child relation?
> It may look sufficient, however, nobody can ensure varattno of child
> relation is identical with parent relation's one.
> For example, which attribute number shall be assigned on 'z' here?
>   CREATE TABLE tbl_parent(x int);
>   CREATE TABLE tbl_child(y int) INHERITS(tbl_parent);
>   ALTER TABLE tbl_parent ADD COLUMN z int;

You may well be right, so I agree with you.
I now use adjust_appendrel_attrs() instead of ChangeVarNodes()
for this purpose.
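The attribute-number hazard in the quoted example can be spelled out concretely. With the three DDL statements quoted above, the column layouts end up as follows (a Python illustration; mapping by column name rather than by relid/position is, in effect, what adjust_appendrel_attrs achieves and a bare ChangeVarNodes does not):

```python
# Column name -> attnum after the three quoted statements:
#   CREATE TABLE tbl_parent(x int);
#   CREATE TABLE tbl_child(y int) INHERITS(tbl_parent);
#   ALTER TABLE tbl_parent ADD COLUMN z int;
parent = {"x": 1, "z": 2}          # tbl_parent: x, then z appended later
child = {"x": 1, "y": 2, "z": 3}   # tbl_child: inherited x, own y, then z

def translate_attnum(colname):
    """Map a parent column to the child's attnum by name, not position."""
    return child[colname]
```

Column z has attnum 2 in the parent but attnum 3 in the child, so simply renumbering Var nodes from the parent relid to the child relid would point a reference to z at column y.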


> --- a/src/backend/optimizer/plan/createplan.c
> +++ b/src/backend/optimizer/plan/createplan.c
> @@ -4230,8 +4230,14 @@ prepare_sort_from_pathkeys(PlannerInfo *root, Plan 
> *lefttree, List *pathkeys,
> /*
>  * Ignore child members unless they match the rel being
>  * sorted.
> +*
> +* If this is called from make_sort_from_pathkeys(),
> +* relids may be NULL. In this case, we must not ignore child
> +* members because inner/outer plan of pushed-down merge join 
> is
> +* always child table.
>  */
> -   if (em->em_is_child &&
> + 

Re: [HACKERS] Getting sorted data from foreign server for merge join

2015-11-20 Thread Ashutosh Bapat
On Thu, Nov 19, 2015 at 7:30 AM, Robert Haas  wrote:

> On Tue, Nov 17, 2015 at 8:33 AM, Ashutosh Bapat
>  wrote:
> >> Although I'm usually on the side of marking things as extern whenever
> >> we find it convenient, I'm nervous about doing that to
> >> make_canonical_pathkey(), because it has side effects.  Searching the
> >> list of canonical pathkeys for the one we need is reasonable, but is
> >> it really right to ever think that we might create a new one at this
> >> stage?  Maybe it is, but if so I'd like to hear a good explanation as
> >> to why.
> >
> > For a foreign table no pathkeys are created before creating paths for
> > individual foreign table since there are no indexes. For a regular table,
> > the pathkeys get created for useful indexes.
>
> OK.
>
> Could some portion of the second foreach loop inside
> get_useful_pathkeys_for_relation be replaced by a call to
> eclass_useful_for_merging?  The logic looks very similar.
>

Yes. I have incorporated this change in the patch attached.


>
> More broadly, I'm wondering about the asymmetries between this code
> and pathkeys_useful_for_merging().  The latter has a
> right_merge_direction() test and a case for non-EC-derivable join
> clauses that aren't present in your code.

I wonder if we could just
> generate pathkeys speculatively and then test which ones survive
> truncate_useless_pathkeys(), or something like that.  This isn't an
> enormously complicated patch, but it duplicates more of the logic in
> pathkeys.c than I'd like.
>


pathkeys_useful_for_merging(), truncate_useless_pathkeys() and a host of
functions in that area are crafted to assess the usefulness of given
pathkeys, which for regular tables are "speculated" from indexes on the
table. Foreign tables do not have indexes, nor do we have information
about any indexes on the foreign server, so the pathkeys to be assessed
are not readily available. Hence we need some other way of "speculating"
pathkeys for foreign tables. We cannot just generate pathkeys blindly,
because there are infinitely many possible expressions on which pathkeys
can be built. So we hunt for the expressions in various places like
root->query_pathkeys, mergejoinable clauses, etc. The seeming duplication
of code arises because the places where we hunt for useful pathkeys in
the foreign-table case are the same places used to assess the usefulness
of pathkeys in the regular-table case. Thus for foreign tables we always
generate useful pathkeys, which need no further assessment, whereas for
regular tables we have a set of pathkeys which needs to be assessed.
Because of this difference, the code differs in its details.
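The "hunting" described above can be caricatured in a few lines (Python with invented names; the real code works with PathKey and EquivalenceClass nodes, not strings):

```python
def candidate_sort_exprs(query_pathkey_exprs, mergejoinable_exprs):
    """Collect candidate sort expressions for a foreign relation from
    the places named above (root->query_pathkeys and mergejoinable
    clause expressions), deduplicating while preserving order. Every
    candidate found this way is useful by construction, so no separate
    usefulness assessment is needed."""
    seen, result = set(), []
    for expr in list(query_pathkey_exprs) + list(mergejoinable_exprs):
        if expr not in seen:
            seen.add(expr)
            result.append(expr)
    return result
```

The contrast with the regular-table path is that there the candidate list comes from indexes and must then be filtered for usefulness, whereas here the sources themselves define usefulness.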

right_merge_direction() compares whether a given pathkey (readily available
or derived from an index) has the same direction as required by
root->query_pathkeys or ascending direction. For foreign tables, the
pathkeys are themselves created from root->query_pathkeys, thus doesn't
require this assessment. The patch creates pathkeys other than those from
root->query_pathkeys in the ascending order (like other places in the code
eg. select_outer_pathkeys_for_merge()), which is what
right_merge_direction() assesses. So again we don't need to call
right_merge_direction().

The non-EC-derivable join case is interesting; thanks for bringing it
up. These are mergejoinable clauses in an OUTER join, where columns
from the inner side can be NULL and thus not exactly equal. I will
incorporate that change in the patch. Again, while doing so we have to
just pick up the right or left equivalence class from a given
RestrictInfo and don't need to assess it further the way
pathkeys_useful_for_merging() does.

Added this change in the attached patch.

>
> I'm inclined to think that we shouldn't consider pathkeys that might
> be useful for merge-joining unless we're using remote estimates.  It
> seems more speculative than pushing down a toplevel sort.
>

I agree with you, but let me elaborate why. The pathkeys we are
generating are heuristic in nature and thus may not be useful in the
same sense as pathkeys for regular tables. If use_remote_estimate is ON,
the planner will spend time explaining multiple queries, but it will be
in a better position to cost their use. If use_remote_estimate is OFF,
we altogether lose the chance of a merge join, which doesn't look good
to me. But a speculative costing done in that case can result in
choosing a wrong plan, similar to the toplevel-sort case, and choosing
a wrong join has more serious implications than estimating a wrong cost
for a toplevel sort.


>
> This patch seems rather light on test cases.
>
>
Added tests. Let me know if you have any specific scenario in mind.
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 866a09b..8b05fcd 100644
--- 

Re: [HACKERS] [BUGS] BUG #13779: Inherited check constraint becomes non-inherited when related column is changed

2015-11-20 Thread Tom Lane
jan.dirk.zijls...@redwood.com writes:
> [ ALTER COLUMN TYPE leaves inherited constraints in the wrong state ]

Yeah.  After perusing this I've become convinced that ALTER TABLE's
approach to rebuilding check constraints is fundamentally misguided.
Rather than using ALTER TABLE ONLY to reconstruct a check constraint
separately for each child table, we should apply a regular ALTER TABLE
ADD CONSTRAINT once at the parent table.

Annoyingly, we already tried to fix this area once in 5ed6546c, but
that was just doubling down on the wrong basic design.  The problem
is actually visible in the test case added by that commit, if it had
occurred to us to check the inheritance status columns:

regression=# select t.relname, c.conname, c.coninhcount, c.conislocal, c.connoinherit
from pg_constraint c, pg_class t where c.conname like 'test_inh_check%'
and c.conrelid = t.oid;
       relname        |        conname         | coninhcount | conislocal | connoinherit
----------------------+------------------------+-------------+------------+--------------
 test_inh_check       | test_inh_check_a_check |           0 | t          | f
 test_inh_check_child | test_inh_check_a_check |           1 | f          | f
(2 rows)

regression=# ALTER TABLE test_inh_check ALTER COLUMN a TYPE numeric;
ALTER TABLE
regression=# select t.relname, c.conname, c.coninhcount, c.conislocal, c.connoinherit
from pg_constraint c, pg_class t where c.conname like 'test_inh_check%'
and c.conrelid = t.oid;
       relname        |        conname         | coninhcount | conislocal | connoinherit
----------------------+------------------------+-------------+------------+--------------
 test_inh_check       | test_inh_check_a_check |           0 | t          | f
 test_inh_check_child | test_inh_check_a_check |           0 | t          | f
(2 rows)

Barring objections I'll go try to fix it by removing the "ONLY"
and then suppressing generation of new work queue entries
for inherited child constraints.

regards, tom lane




Re: [HACKERS] [COMMITTERS] pgsql: Add pg_audit, an auditing extension

2015-11-20 Thread David Steele

Hi Thom,

On 11/18/15 8:54 AM, Thom Brown wrote:

On 10 June 2015 at 14:41, Noah Misch  wrote:

On Tue, Jun 09, 2015 at 03:54:59PM -0400, David Steele wrote:

I've certainly had quite the experience as a first-time contributor
working on this patch.  Perhaps I bit off more than I should have and I
definitely managed to ruffle a few feathers along the way.  I learned a
lot about how the community works, both the good and the bad.  Fear not,
though, I'm not so easily discouraged and you'll definitely be hearing
more from me.


Glad to hear it.


The stated purpose of contrib is: "include porting tools, analysis
utilities, and plug-in features that are not part of the core PostgreSQL
system, mainly because they address a limited audience or are too
experimental to be part of the main source tree. This does not preclude
their usefulness."

Perhaps we should consider modifying that language, because from my
perspective pg_audit fit the description perfectly.


"What is contrib?" attracts enduring controversy; see recent thread "RFC:
Remove contrib entirely" for the latest episode.  However that discussion
concludes, that documentation passage is not too helpful as a guide to
predicting contrib patch reception.  (Most recent contrib additions had an
obvious analogy to an existing module, sidestepping the question.)


Is pg_audit being resubmitted for 9.6 at all?


A number of people have asked me about this over the last few months.  I 
would certainly be interested in resubmitting this for 9.6.


I fixed many of the issues that caused complaints at the end of the 9.5 
cycle, but there are still two remaining items I would want to address 
before resubmitting:


1) There was concern about audit messages being sent to the clients.
I'm looking at adding an option to the ereport macro to suppress sending
messages to the clients.  I'll submit that patch soon.

2) I would like to make install-check work with modules that require 
shared_preload_libraries.  My understanding is that Andrew may have 
already fixed this, but if not I'll look into it.


--
-David
da...@pgmasters.net




Re: [HACKERS] Parallel Seq Scan

2015-11-20 Thread Robert Haas
On Thu, Nov 19, 2015 at 11:59 PM, Amit Kapila  wrote:
> Isn't it better to destroy the memory for readers array as that gets
> allocated
> even if there are no workers available for execution?
>
> Attached patch fixes the issue by just destroying readers array.

Well, then you're making ExecGatherShutdownWorkers() not a no-op any
more.  I'll go commit a combination of your two patches.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Additional role attributes && superuser review

2015-11-20 Thread Stephen Frost
* Michael Paquier (michael.paqu...@gmail.com) wrote:
> On Thu, Nov 19, 2015 at 7:10 AM, Stephen Frost wrote:
> > * Michael Paquier (michael.paqu...@gmail.com) wrote:
> >> It seems weird to not have a dedicated role for pg_switch_xlog.
> >
> > I didn't add a pg_switch_xlog default role in this patch series, but
> > would be happy to do so if that's the consensus.  It's quite easy to do.
> 
> Agreed. I am not actually sure why that's considered part of backup;
> it seems more related to archiving, the two being distinct concepts.
> But at this point I guess that's mainly a philosophical split.

As David notes, they're actually quite related.  Note that in our
documentation pg_switch_xlog() is listed in the "Backup Control
Functions" table.

I can think of a use-case for a user who can call pg_switch_xlog() but
not pg_start_backup()/pg_stop_backup(), though I have to admit that it
seems rather limited and I'm on the fence about it being a worthwhile
distinction.

Even so, in the interest of having more fine-grained permission
controls, I've gone ahead and added a pg_switch_xlog default role.
Note that this means that pg_switch_xlog() can be called by both
pg_switch_xlog roles and pg_backup roles.  I'd be very much against
removing the ability to call pg_switch_xlog from the pg_backup role as
that really is a capability which is needed by users running backups and
it'd just add unnecessary complexity to require users setting up backup
tools to grant two different roles to get the backup to work.

> > I'm a bit confused- this is a separate change.  Note that the previous
> > patch was actually a git patchset which had two patches- one to do the
> > reservation and a second to add the default roles.  The attached
> > patchset is actually three patches:
> > 1- Update to catalog documentation regarding permissions
> > 2- Reserve 'pg_' role namespace
> > 3- Add default roles
> 
> I see, that's why I got confused. I just downloaded your file and
> applied it blindly with git apply or patch (don't recall which). Other
> folks tend to publish that as a separate set of files generated by
> git-format-patch.

The file was generated using 'git format-patch', but sent to one file
instead of multiple.  'git am' understands that format and will add the
independent commits to the current branch.  Note that the git
documentation (see 'man git-format-patch' and 'man git-apply', at least
on Debian-based systems) specifically recommends using 'git am' to
create commits from patches generated by 'git format-patch'.

The workflow you're describing would require saving off each patch,
doing a 'patch' or 'git apply' run for each, then committing the changes
with your own commit message and only then would you have the entire
series.  That doesn't seem like a better approach.

> >> Shouldn't pg_current_xlog_location, pg_current_xlog_insert_location,
> >> pg_last_xlog_receive_location and pg_last_xlog_replay_location fall under a
> >> restriction category like pg_monitor? I recall of course that we discussed
> >> that some months ago and that a lot of people were reluctant to harden this
> >> area to not break any existing monitoring tool, still this patch may be the
> >> occasion to restrict a bit those functions, particularly on servers where
> >> wal_compression is enabled.
> >
> > For my 2c, I believe they should.  I've not modified them in that way in
> > this patchset, but I can certainly do so if others agree.
> 
> My vote goes into hardening them a bit this way.

I've made those changes in this patch set.  Note that these functions
can now only be called by roles which are members of pg_monitor,
pg_backup, or pg_switch_xlog (or superuser, of course).
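The resulting permission rule -- membership in any one of several default roles, or superuser, grants access -- can be sketched as follows (the role names are from the patch; the check itself is only an illustration, not the patch's C code):

```python
# Roles whose members may call the xlog-location functions under the
# patch discussed above; the set-intersection check is illustrative.
XLOG_LOCATION_ROLES = {"pg_monitor", "pg_backup", "pg_switch_xlog"}

def may_call_xlog_location(member_of, is_superuser=False):
    """True if a user may call e.g. pg_current_xlog_location()."""
    return is_superuser or not XLOG_LOCATION_ROLES.isdisjoint(member_of)
```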

> > I've added regression tests for the default roles where it was
> > relatively straight-forward to do so.  I don't see a trivial way to test
> > that the pg_signal_backend role works though, as an example.
> 
> It is at least possible to check that the error code paths are
> working. 

Perhaps, but...

> For the code paths where a backend would be actually running,
> you could use the following:
> SET client_min_messages TO 'error';
> -- This should return "1" and not an ERROR nor a WARNING if PID does not 
> exist.
> select count(pg_terminate_backend(0));
> But that's ugly depending on your taste.

I really dislike that.

> Do you think that it makes sense to add tests as well in the second
> patch to check that restrictions pg_* are in place? Except that I
> don't have much more to say about patches 1 and 2 which are rather
> straight-forward.

Ah, yes.  I've now moved those hunks to the second patch where they
really should have been.

> Regarding patch 3, the documentation of psql should mention the new
> subcommands \dgS and \dgS+. That's a one-liner.

Ah, right.  I've updated the psql SGML documentation for \duS and \dgS.
The '\h' output had already been updated.  Was there something else
which you meant above that I've missed?  Note that these fixes went into
the 

Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-11-20 Thread Robert Haas
On Thu, Nov 19, 2015 at 2:18 AM, Amit Langote
 wrote:
> As someone pointed out upthread, the final heap truncate phase can take
> arbitrarily long and is outside the scope of lazy_scan_heap() to
> instrument. Perhaps a bool, say, waiting_heap_trunc could be reported for
> the same. Note that it would have to be reported from lazy_vacuum_rel().

I don't think reporting booleans is a very good idea.  It's better to
report that some other way, like use one of the strings to report a
"phase" of processing that we're currently performing.

> IMHO, float progress parameters (st_progress_param_float[]) can be taken
> out. They are currently unused and it's unlikely that some command would
> want to report them.

If they are not used, they shouldn't be included in this patch, but we
should be open to adding them later if it proves useful.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] GIN pending list clean up exposure to SQL

2015-11-20 Thread Jaime Casanova
On 19 November 2015 at 14:57, Jaime Casanova
 wrote:
> On 19 November 2015 at 14:47, Jaime Casanova
>  wrote:
>> On 19 November 2015 at 14:18, Alvaro Herrera  
>> wrote:
>>> Alvaro Herrera wrote:
 Jeff Janes wrote:
 > I've written a function which allows users to clean up the pending list.
 > It takes the index name and returns the number of pending list pages
 > deleted.

 I just noticed that your patch uses AccessShareLock on the index.  Is
 that okay?  I would have assumed that you'd need ShareUpdateExclusive
 (same as vacuum uses), but I don't really know.  Was that a carefully
 thought-out choice?
>>>
>>> After reading ginInsertCleanup it becomes clear that there's no need
>>> for a stronger lock than what you've chosen.  Jaime Casanova just
>>> pointed this out to me.
>>>
>>
>> But it should do some checks, no?
>> - only superusers?
>> - what i received as parameter is a GIN index?
>>
>
> I just notice this:
>
> +   ginInsertCleanup(, true, );
>
> ginInsertCleanup() now has four parameters, so you should update the call
>

Btw, this is not in the commitfest, and it seems like a useful thing to have.

-- 
Jaime Casanova  www.2ndQuadrant.com
Professional PostgreSQL: 24x7 support and training




Re: [HACKERS] Using quicksort for every external sort run

2015-11-20 Thread Robert Haas
On Thu, Nov 19, 2015 at 5:42 PM, Peter Geoghegan  wrote:
> I would like more opinions on the multipass_warning message. I can
> write a patch that creates a new system view, detailing how sorts were
> completed, if there is demand.

I think a warning message is a terrible idea, and a system view is a
needless complication.  If the patch is as fast or faster than what we
have now in all cases, then we should adopt it (assuming it's also
correct and well-commented and all that other good stuff).  If it's
not, then we need to analyze the cases where it's slower and decide
whether they are significant enough to care about.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: CustomScan in a larger structure (RE: [HACKERS] CustomScan support on readfuncs.c)

2015-11-20 Thread Robert Haas
On Thu, Nov 19, 2015 at 10:11 PM, Kouhei Kaigai  wrote:
> The above two points are for the case where an extension wants to use
> variable-length fields for its private data.
> So, a nodeAlloc() callback is not a perfect answer for this use case,
> because the length of the variable-length fields is (usually) determined
> by the value of other fields (like num_inner_rels, num_gpu_devices, ...),
> so we cannot determine the correct length before reading.
>
> Let's assume a custom-scan extension that wants to have:
>
>   typedef struct {
>   CustomScan    cscan;
>   int   priv_value_1;
>   long  priv_value_2;
>   extra_info_t  extra_subplan_info[FLEXIBLE_ARRAY_MEMBER];
>   /* its length equal to number of sub-plans */
>   } ExampleScan;
>
> The "extnodename" within CustomScan allows to pull callback functions
> to handle read node from the serialized format.
> However, nodeAlloc() callback cannot determine the correct length
> (= number of sub-plans in this example) prior to read 'cscan' part.
>
> So, I'd like to suggest _readCustomScan (and other extendable nodes
> also) read tokens on local CustomScan variable once, then call
>   Node *(*nodeRead)(Node *local_node)
> to allocate entire ExampleScan node and read other private fields.
>
> The local_node is already filled in by the core backend prior to
> the callback invocation, so the extension can know how many sub-plans
> to expect, and thus how many private tokens will appear.
> It also means the extension can know the exact length of the ExampleScan
> node, so it can allocate the node as expected and then fill in the
> remaining private tokens.

On second thought, I think we should insist that nodes have to be
fixed-size.  This sounds like a mess.  If the node needs a variable
amount of storage for something, it can store a pointer to a
separately-allocated array someplace inside of it.  That's what
existing nodes do, and it works fine.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Using quicksort for every external sort run

2015-11-20 Thread Robert Haas
On Thu, Nov 19, 2015 at 5:53 PM, Peter Geoghegan  wrote:
> I'll now talk about my patch series in general -- the actual
> consequences of not avoiding a single pass merge phase when the master
> branch would have done so.

That's what I was asking about.  It seemed to me that you were saying
we could ignore those cases, which doesn't seem to me to be true.

> The latter 16MB work_mem example query/table is still faster with a
> 12MB work_mem than master, even with multiple passes. Quite a bit
> faster, in fact: about 37 seconds on master, to about 24.7 seconds
> with the patches (same for higher settings short of 16MB).

Is this because we save enough by quicksorting rather than heapsorting
to cover the cost of the additional merge phase?

If not, then why is it happening like this?

> I should point out that there is no evidence that any case has been
> regressed, let alone written off entirely or ignored. I looked. I
> probably have not been completely exhaustive, and I'd be willing to
> believe there is something that I've missed, but it's still quite
> possible that there is no downside to any of this.

If that's so, it's excellent news.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Freeze avoidance of very large table.

2015-11-20 Thread Jeff Janes
On Thu, Nov 19, 2015 at 6:44 AM, Masahiko Sawada  wrote:
> On Thu, Nov 19, 2015 at 5:54 AM, Jeff Janes  wrote:
>> On Wed, Nov 18, 2015 at 11:18 AM, Jeff Janes  wrote:
>>>
>>> I get an error when running pg_upgrade from 9.4 to 9.6:
>>>
>>> error while copying relation "mediawiki.archive"
>>> ("/tmp/data/base/16414/21043_vm" to
>>> "/tmp/data_fm/base/16400/21043_vm"): No such file or directory
>>
>> OK, so the problem seems to be that rewriteVisibilitymap can get
>> called with errno already set to a nonzero value.
>>
>> It never clears it, and then fails at the end despite that no error
>> has actually occurred.
>>
>> Just setting it to 0 at the top of the function seems to be correct
>> thing to do.  Or does it need to save the old value and restore it?
>
> Thank you for testing!
> I think that the former is better, so attached latest patch.
>
>> But now when I want to do the upgrade faster, I run into this:
>>
>> "This utility cannot upgrade from PostgreSQL version from 9.5 or
>> before to 9.6 or later with link mode."
>>
>> Is this really an acceptable tradeoff?  Surely we can arrange to
>> link everything else and rewrite just the _vm, which is a tiny portion
>> of the data directory.  I don't think that -k promises to link
>> everything it possibly can.
>
> I agree.
> I've changed the patch so that pg_upgrade creates a new _vm file and
> rewrites it even when upgrading to 9.6 with link mode.
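The errno pitfall quoted above is easy to reproduce in miniature. Here ERRNO stands in for C's global errno, and both functions are illustrative stand-ins, not the actual pg_upgrade code:

```python
ERRNO = 0  # stand-in for C's global errno

def failing_probe():
    """A preliminary call (e.g. a stat() on a missing file) that sets
    errno but whose failure is expected and handled."""
    global ERRNO
    ERRNO = 2  # ENOENT

def rewrite_vm_buggy():
    failing_probe()
    # ... page-copying work that succeeds ...
    return ERRNO == 0   # misreads the stale errno: reports failure

def rewrite_vm_fixed():
    global ERRNO
    failing_probe()
    ERRNO = 0           # clear stale errno before the checked section
    # ... page-copying work that succeeds ...
    return ERRNO == 0
```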


The rewrite code thinks that only the first page of a vm has a header
of size SizeOfPageHeaderData, and that the rest of the pages have a
zero-size header.  So the resulting _vm is corrupt.

After pg_upgrade, doing a vacuum freeze verbose gives:


WARNING:  invalid page in block 1 of relation base/16402/22430_vm;
zeroing out page
WARNING:  invalid page in block 1 of relation base/16402/22430_vm;
zeroing out page
WARNING:  invalid page in block 2 of relation base/16402/22430_vm;
zeroing out page
WARNING:  invalid page in block 2 of relation base/16402/22430_vm;
zeroing out page
WARNING:  invalid page in block 3 of relation base/16402/22430_vm;
zeroing out page
WARNING:  invalid page in block 3 of relation base/16402/22430_vm;
zeroing out page
WARNING:  invalid page in block 4 of relation base/16402/22430_vm;
zeroing out page
WARNING:  invalid page in block 4 of relation base/16402/22430_vm;
zeroing out page
WARNING:  invalid page in block 5 of relation base/16402/22430_vm;
zeroing out page
WARNING:  invalid page in block 5 of relation base/16402/22430_vm;
zeroing out page
WARNING:  invalid page in block 6 of relation base/16402/22430_vm;
zeroing out page
WARNING:  invalid page in block 6 of relation base/16402/22430_vm;
zeroing out page
WARNING:  invalid page in block 7 of relation base/16402/22430_vm;
zeroing out page
WARNING:  invalid page in block 7 of relation base/16402/22430_vm;
zeroing out page
WARNING:  invalid page in block 8 of relation base/16402/22430_vm;
zeroing out page
WARNING:  invalid page in block 8 of relation base/16402/22430_vm;
zeroing out page
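The header-accounting mistake can be shown with simple arithmetic (an 8192-byte page and a 24-byte SizeOfPageHeaderData are the usual defaults, assumed here):

```python
PAGE_SIZE = 8192
PAGE_HEADER = 24  # SizeOfPageHeaderData with default build options

def payload_start_buggy(blkno):
    # bug: assumes only block 0 carries a page header
    return blkno * PAGE_SIZE + (PAGE_HEADER if blkno == 0 else 0)

def payload_start_correct(blkno):
    # in reality every block of the map carries a page header
    return blkno * PAGE_SIZE + PAGE_HEADER

# Block 0 agrees, but every later block's payload is placed 24 bytes
# short, which is why the post-upgrade VACUUM complains from block 1 on.
```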

Cheers,

Jeff




Re: [HACKERS] COPY (INSERT/UPDATE/DELETE .. RETURNING ..)

2015-11-20 Thread Marko Tiikkaja

On 11/19/15 7:39 PM, Michael Paquier wrote:

On Thu, Nov 19, 2015 at 9:22 PM, Marko Tiikkaja  wrote:

Of course, something might break if we added a new statement type which
supported RETURNING, but I'm really not worried about that.  I'm not dead
set against adding some Assert(IsA()) calls here, but I don't see the point.


gram.y has a long comment before select_no_parens regarding why we
shouldn't do it this way, did you notice it?


It talks about not doing  '(' SelectStmt ')'  "in the expression 
grammar".  I don't see what that has to do with this patch.



.m




Re: [HACKERS] Declarative partitioning

2015-11-20 Thread Amit Langote
On 2015/11/06 1:29, Robert Haas wrote:
> On Fri, Oct 30, 2015 at 6:08 AM, Amit Langote
>  wrote:
>> The DDL and catalogs part are not much different from what I had last
>> described though I took a few steps to simplify things. I dropped the
>> multi-level partitioning bit
> 
> Hmm, that doesn't sound good to me.  I think multi-level partitioning
> is a reasonably important use case.

I agree. I'm in the process of reformulating this proposal from a
syntax-, catalog- and DDL-centric perspective and will re-incorporate
the multi-level partitioning notion into it. It was a mistake to drop it.

I am thinking of introducing an explicit notion of sub-partition key and
sub-partitions (of the top parent as far as syntactic notation is
concerned). I guess it would not be unreasonable to think that most
use-cases that multi-level partitioning is used for require at most 2
levels. It will enable us to use a more intuitive syntax and make
internals easier to manage.
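As a toy illustration of two-level routing (a dict-based list-partition lookup here is purely illustrative, not the proposed mechanism):

```python
def route(row, key, subkey=None):
    """Pick a partition for row by its key column, then optionally a
    sub-partition by the sub-partition key column."""
    part = key["map"][row[key["column"]]]
    if subkey is None:
        return (part,)
    return (part, subkey["map"][row[subkey["column"]]])

# Hypothetical partitioning scheme: by country, then by year.
key = {"column": "country", "map": {"jp": "sales_jp", "us": "sales_us"}}
subkey = {"column": "year", "map": {2014: "y2014", 2015: "y2015"}}
```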

Thanks,
Amit





Re: [HACKERS] Using quicksort for every external sort run

2015-11-20 Thread Peter Geoghegan
On Fri, Nov 20, 2015 at 12:50 PM, Robert Haas  wrote:
> On Thu, Nov 19, 2015 at 5:42 PM, Peter Geoghegan  wrote:
>> I would like more opinions on the multipass_warning message. I can
>> write a patch that creates a new system view, detailing how sorts were
>> completed, if there is demand.
>
> I think a warning message is a terrible idea, and a system view is a
> needless complication.  If the patch is as fast or faster than what we
> have now in all cases, then we should adopt it (assuming it's also
> correct and well-commented and all that other good stuff).  If it's
> not, then we need to analyze the cases where it's slower and decide
> whether they are significant enough to care about.

Maybe I was mistaken to link the idea to this patch, but I think it
(or something involving a view) is a good idea. I linked it to the
patch because the patch makes it slightly more important than before.

-- 
Peter Geoghegan




Re: CustomScan in a larger structure (RE: [HACKERS] CustomScan support on readfuncs.c)

2015-11-20 Thread Kouhei Kaigai
> On Thu, Nov 19, 2015 at 10:11 PM, Kouhei Kaigai  wrote:
> > The above two points are for the case where an extension wants to use
> > variable-length fields for its private data.
> > So, a nodeAlloc() callback is not a perfect answer for this use case,
> > because the length of the variable-length fields is (usually) determined
> > by the value of other fields (like num_inner_rels, num_gpu_devices, ...),
> > so we cannot determine the correct length before reading.
> >
> > Let's assume a custom-scan extension that wants to have:
> >
> >   typedef struct {
> >   CustomScan    cscan;
> >   int   priv_value_1;
> >   long  priv_value_2;
> >   extra_info_t  extra_subplan_info[FLEXIBLE_ARRAY_MEMBER];
> >   /* its length equal to number of sub-plans */
> >   } ExampleScan;
> >
> > The "extnodename" within CustomScan allows to pull callback functions
> > to handle read node from the serialized format.
> > However, nodeAlloc() callback cannot determine the correct length
> > (= number of sub-plans in this example) prior to read 'cscan' part.
> >
> > So, I'd like to suggest _readCustomScan (and other extendable nodes
> > also) read tokens on local CustomScan variable once, then call
> >   Node *(*nodeRead)(Node *local_node)
> > to allocate entire ExampleScan node and read other private fields.
> >
> > The local_node is already filled in by the core backend prior to
> > the callback invocation, so the extension can know how many sub-plans
> > to expect, and thus how many private tokens will appear.
> > It also means the extension can know the exact length of the ExampleScan
> > node, so it can allocate the node as expected and then fill in the
> > remaining private tokens.
> 
> On second thought, I think we should insist that nodes have to be
> fixed-size.  This sounds like a mess.  If the node needs a variable
> amount of storage for something, it can store a pointer to a
> separately-allocated array someplace inside of it.  That's what
> existing nodes do, and it works fine.
>
OK, let's have "node_size" instead of the nodeAlloc callback and the
other stuff.

Please wait for a patch.
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 




Re: [HACKERS] Using quicksort for every external sort run

2015-11-20 Thread Peter Geoghegan
On Fri, Nov 20, 2015 at 12:52 PM, Robert Haas  wrote:
> That's what I was asking about.  It seemed to me that you were saying
> we could ignore those cases, which doesn't seem to me to be true.

I've been around for long enough to know that there are very few cases
that can be ignored.  :-)

>> The latter 16MB work_mem example query/table is still faster with a
>> 12MB work_mem than master, even with multiple passes. Quite a bit
>> faster, in fact: about 37 seconds on master, to about 24.7 seconds
>> with the patches (same for higher settings short of 16MB).
>
> Is this because we save enough by quicksorting rather than heapsorting
> to cover the cost of the additional merge phase?
>
> If not, then why is it happening like this?

I think it's because of caching effects alone, but I am not 100% sure
of that. I concede that it might not be enough to make up for the
additional I/O on some systems or platforms. The fact remains,
however, that the patch was faster on the unsympathetic case I ran on
the machine I had available (which has an SSD), and that I really have
not managed to find a case that is regressed after some effort.
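The trade-off under discussion is run count versus merge passes. A back-of-envelope pass counter (not PostgreSQL's actual cost model) shows that halving run length can add a merge pass, but often does not:

```python
import math

def merge_passes(nruns, merge_order):
    """Number of merge passes needed to reduce nruns sorted runs to
    one, merging merge_order runs at a time (integer-safe, no logs)."""
    passes = 0
    while nruns > 1:
        nruns = math.ceil(nruns / merge_order)
        passes += 1
    return passes

# Replacement selection forms runs of roughly twice work_mem; quicksort
# forms runs of roughly work_mem, i.e. about twice as many runs.
assert merge_passes(6, 7) == 1                           # fits one pass
assert merge_passes(12, 7) == 2                          # doubling costs a pass here
assert merge_passes(20, 7) == merge_passes(40, 7) == 2   # ... but often not
```

When the extra pass does appear, the question Robert raises is whether the per-run quicksort savings (and cache friendliness) outweigh the additional read/write of the data.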

>> I should point out that there is no evidence that any case has been
>> regressed, let alone written off entirely or ignored. I looked. I
>> probably have not been completely exhaustive, and I'd be willing to
>> believe there is something that I've missed, but it's still quite
>> possible that there is no downside to any of this.
>
> If that's so, it's excellent news.

As I mentioned up-thread, maybe I shouldn't have brought all the
theoretical justifications for killing replacement selection into the
discussion so early. Those observations on replacement selection
(which are not my own original insights) happen to be what spurred
this work. I spent so much time talking about how irrelevant
multi-pass merging was that people imagined that that was severely
regressed, when it really was not. That just happened to be the way I
came at the problem.

The numbers speak for themselves here. I just want to be clear about
the disadvantages of what I propose, even if it's well worth it
overall in most (all?) cases.

-- 
Peter Geoghegan




Re: [HACKERS] RLS open items are vague and unactionable

2015-11-20 Thread Noah Misch
On Mon, Sep 28, 2015 at 03:03:51PM -0400, Stephen Frost wrote:
> If SELECT rights are required then apply the SELECT policies, even if
> the actual command is an UPDATE or DELETE.  This covers the RETURNING
> case which was discussed previously, so we don't need the explicit check
> for that, and further addresses the concern raised by Zhaomo about
> someone abusing the WHERE clause in an UPDATE or DELETE.
> 
> Further, if UPDATE rights are required then apply the UPDATE policies,
> even if the actual command is a SELECT.  This addresses the concern that
> a user might be able to lock rows they're not actually allowed to UPDATE
> through the UPDATE policies.
> 
> Comments welcome, of course.  Barring concerns, I'll get this pushed
> tomorrow.

The CREATE POLICY reference page continues to describe the behavior this patch
replaced, not today's behavior.




Re: [HACKERS] count_nulls(VARIADIC "any")

2015-11-20 Thread Marko Tiikkaja

On 2015-11-21 06:02, I wrote:

Here's a patch implementing this under the name num_nulls().  For
January's CF, of course.


I forgot to update some references in the documentation.  Fixed in 
v3, attached.



.m
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***
*** 182,188 

  

!Comparison Operators
  
 
  comparison
--- 182,188 

  

!Comparison Functions and Operators
  
 
  comparison
***
*** 191,200 
  
 
  The usual comparison operators are available, shown in .
 
  
!
  Comparison Operators
  
   
--- 191,200 
  
 
  The usual comparison operators are available, shown in .
 
  
!
  Comparison Operators
  
   
***
*** 437,442 
--- 437,470 
 
  -->
  
+   
+ Comparison Functions
+ 
+  
+   
+Function
+Description
+Example
+Example Result
+   
+  
+  
+   
+
+  
+   num_nulls
+  
+  num_nulls(VARIADIC "any")
+
+Returns the number of NULL input arguments
+num_nulls(1, NULL, 2)
+1
+   
+  
+ 
+
+ 
+ 

  

***
*** 10307,10313  table2-mapping


 The standard comparison operators shown in   are available for
 jsonb, but not for json. They follow the
 ordering rules for B-tree operations outlined at .
--- 10335,10341 


 The standard comparison operators shown in   are available for
 jsonb, but not for json. They follow the
 ordering rules for B-tree operations outlined at .
*** a/src/backend/utils/adt/misc.c
--- b/src/backend/utils/adt/misc.c
***
*** 45,50 
--- 45,118 
  
  
  /*
+  * num_nulls()
+  *  Count the number of NULL input arguments
+  */
+ Datum
+ pg_num_nulls(PG_FUNCTION_ARGS)
+ {
+ 	int32 count = 0;
+ 	int i;
+ 
+ 	if (get_fn_expr_variadic(fcinfo->flinfo))
+ 	{
+ 		ArrayType  *arr;
+ 		int ndims, nitems, *dims;
+ 		bits8 *bitmap;
+ 		int bitmask;
+ 
+ 		/* Should have just the one argument */
+ 		Assert(PG_NARGS() == 1);
+ 
+ 		/* num_nulls(VARIADIC NULL) is defined as NULL */
+ 		if (PG_ARGISNULL(0))
+ 			PG_RETURN_NULL();
+ 
+ 		/*
+ 		 * Non-null argument had better be an array.  We assume that any call
+ 		 * context that could let get_fn_expr_variadic return true will have
+ 		 * checked that a VARIADIC-labeled parameter actually is an array.  So
+ 		 * it should be okay to just Assert that it's an array rather than
+ 		 * doing a full-fledged error check.
+ 		 */
+ 		Assert(OidIsValid(get_base_element_type(get_fn_expr_argtype(fcinfo->flinfo, 0))));
+ 
+ 		/* OK, safe to fetch the array value */
+ 		arr = PG_GETARG_ARRAYTYPE_P(0);
+ 
+ 		ndims = ARR_NDIM(arr);
+ 		dims = ARR_DIMS(arr);
+ 		nitems = ArrayGetNItems(ndims, dims);
+ 
+ 		bitmap = ARR_NULLBITMAP(arr);
+ 		if (!bitmap)
+ 			PG_RETURN_INT32(0);
+ 		bitmask = 1;
+ 
+ 		for (i = 0; i < nitems; i++)
+ 		{
+ 			if ((*bitmap & bitmask) == 0)
+ count++;
+ 
+ 			bitmask <<= 1;
+ 			if (bitmask == 0x100)
+ 			{
+ bitmap++;
+ bitmask = 1;
+ 			}
+ 		}
+ 		PG_RETURN_INT32(count);
+ 	}
+ 
+ 	for (i = 0; i < PG_NARGS(); i++)
+ 	{
+ 		if (PG_ARGISNULL(i))
+ 			count++;
+ 	}
+ 	PG_RETURN_INT32(count);
+ }
+ 
+ /*
   * current_database()
   *	Expose the current database to the user
   */
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
***
*** 2963,2968  DESCR("adjust time with time zone precision");
--- 2963,2970 
  
  DATA(insert OID = 2003 (  textanycat	   PGNSP PGUID 14 1 0 0 0 f f f f t f s s 2 0 25 "25 2776" _null_ _null_ _null_ _null_ _null_ "select $1 || $2::pg_catalog.text" _null_ _null_ _null_ ));
  DATA(insert OID = 2004 (  anytextcat	   PGNSP PGUID 14 1 0 0 0 f f f f t f s s 2 0 25 "2776 25" _null_ _null_ _null_ _null_ _null_ "select $1::pg_catalog.text || $2" _null_ _null_ _null_ ));
+ DATA(insert OID = 4400 (  num_nulls    PGNSP PGUID 12 1 0 2276 0 f f f f f f i s 1 0 23 "2276" "{2276}" "{v}" _null_ _null_ _null_ pg_num_nulls _null_ _null_ _null_ ));
+ DESCR("count the number of NULL input arguments");
  
  DATA(insert OID = 2005 (  bytealike		   PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 16 "17 17" _null_ _null_ _null_ _null_ _null_ bytealike _null_ _null_ _null_ ));
  DATA(insert OID = 2006 (  byteanlike	   PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 16 "17 17" _null_ _null_ _null_ _null_ _null_ byteanlike _null_ _null_ _null_ ));
*** a/src/include/utils/builtins.h
--- b/src/include/utils/builtins.h
***
*** 481,486  extern Datum pg_ls_dir(PG_FUNCTION_ARGS);
--- 481,487 
  extern Datum pg_ls_dir_1arg(PG_FUNCTION_ARGS);
  
  /* misc.c */
+ extern Datum pg_num_nulls(PG_FUNCTION_ARGS);
  extern Datum current_database(PG_FUNCTION_ARGS);
  extern Datum current_query(PG_FUNCTION_ARGS);
  extern Datum pg_cancel_backend(PG_FUNCTION_ARGS);
*** 
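The bitmap walk in pg_num_nulls() above can be sketched in Python. PostgreSQL array null bitmaps store one bit per element, least-significant bit first, and a set bit means the element is present (non-NULL):

```python
def count_nulls_in_bitmap(bitmap, nitems):
    """bitmap: bytes with one bit per element, LSB first; a 0 bit
    marks a NULL element.  A missing bitmap means no NULLs at all."""
    if not bitmap:
        return 0
    count = 0
    for i in range(nitems):
        if not (bitmap[i // 8] >> (i % 8)) & 1:
            count += 1
    return count
```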

Re: [HACKERS] count_nulls(VARIADIC "any")

2015-11-20 Thread Marko Tiikkaja

On 2015-11-21 06:06, Tom Lane wrote:

Marko Tiikkaja  writes:

Here's a patch implementing this under the name num_nulls().  For
January's CF, of course.


What's this do that "count(*) - count(x)" doesn't?


This is sort of a lateral version of count(x); the input is a list of 
expressions rather than an expression executed over a bunch of input rows.
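In other words (a Python sketch of the semantics, with None standing in for SQL NULL):

```python
def num_nulls(*args):
    """Count NULL arguments in a single call -- across a row's columns,
    not down a column's rows the way count(*) - count(x) works."""
    return sum(1 for a in args if a is None)
```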



.m




Re: [HACKERS] count_nulls(VARIADIC "any")

2015-11-20 Thread Tom Lane
Marko Tiikkaja  writes:
> Here's a patch implementing this under the name num_nulls().  For 
> January's CF, of course.

What's this do that "count(*) - count(x)" doesn't?

regards, tom lane




[HACKERS] CreateFunction Statement

2015-11-20 Thread Praveen M
Hi All,

I am trying to get the schema name of the create function call from the
parse tree. When I look at the structure of the CreateFunctionStmt , I do
not see the schemaname information . Can you please help me to understand
how to extract the schema name for the function.

typedef struct CreateFunctionStmt
{
NodeTag type;
bool replace; /* T => replace if already exists */
List   *funcname; /* qualified name of function to create */
List   *parameters; /* a list of FunctionParameter */
TypeName   *returnType; /* the return type */
List   *options; /* a list of DefElem */
List   *withClause; /* a list of DefElem */
} CreateFunctionStmt;


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-11-20 Thread Jim Nasby

On 11/19/15 7:29 PM, Amit Langote wrote:

> Another option is to provide the means for the index scan routines to
> report their progress. Maybe every index AM won't use it, but it'd
> certainly be a lot better than staring at a long_running boolean.

The boolean would be a workaround for sure. I'm also slightly tempted by
the idea of instrumenting vacuum scans of the individual index AMs'
bulkdelete methods. One precedent is how vacuum_delay_point() calls are
sprinkled around in the code. Another problem to solve would be figuring
out how to pass progress parameters around - via some struct, or could
they be globals just like the VacuumCost* variables are...


It just occurred to me that we could do the instrumentation in 
lazy_tid_reaped(). It might seem bad to do in increment for every tuple 
in an index, but we're already doing a bsearch over the dead tuple list. 
Presumably that's going to be a lot more expensive than an increment 
operation.
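The idea sketched below puts a counter bump next to every membership probe (Python's bisect stands in for the C bsearch; the names are invented):

```python
import bisect

class DeadTupleList:
    """Sorted dead-TID list with a per-probe progress counter,
    mimicking instrumentation inside lazy_tid_reaped()."""
    def __init__(self, dead_tids):
        self.dead = sorted(dead_tids)
        self.probes = 0  # an O(1) bump beside an O(log n) search

    def is_dead(self, tid):
        self.probes += 1
        i = bisect.bisect_left(self.dead, tid)
        return i < len(self.dead) and self.dead[i] == tid

dtl = DeadTupleList([(0, 3), (1, 7), (2, 1)])
```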

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] count_nulls(VARIADIC "any")

2015-11-20 Thread Jim Nasby

On 11/20/15 11:12 PM, Marko Tiikkaja wrote:

On 2015-11-21 06:02, I wrote:

Here's a patch implementing this under the name num_nulls().  For
January's CF, of course.


I forgot to update some references in the documentation.  Fixed in
v3, attached.


I thought there was going to be a not-null equivalent as well? I've 
definitely wanted both variations in the past.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] GIN pending list clean up exposure to SQL

2015-11-20 Thread Jim Nasby

On 11/19/15 10:47 AM, Jaime Casanova wrote:
> - only superusers?

I would think the owner of the table (index?) should also be able to run 
this.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] CreateFunction Statement

2015-11-20 Thread Pavel Stehule
Hi

2015-11-21 7:09 GMT+01:00 Praveen M :

> Hi All,
>
> I am trying to get the schema name of the create function call from the
> parse tree. When I look at the structure of the CreateFunctionStmt , I do
> not see the schemaname information . Can you please help me to understand
> how to extract the schema name for the function.
>
> typedef struct CreateFunctionStmt
> {
> NodeTag type;
> bool replace; /* T => replace if already exists */
> List   *funcname; /* qualified name of function to create */
> List   *parameters; /* a list of FunctionParameter */
> TypeName   *returnType; /* the return type */
> List   *options; /* a list of DefElem */
> List   *withClause; /* a list of DefElem */
> } CreateFunctionStmt;
>
>
The funcname field is a list of names. Look at the makeRangeVarFromNameList
function; it is a good example.

Regards

Pavel


Re: [HACKERS] count_nulls(VARIADIC "any")

2015-11-20 Thread Marko Tiikkaja

Hello,

Here's a patch implementing this under the name num_nulls().  For 
January's CF, of course.




.m
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***
*** 182,188 

  

!Comparison Operators
  
 
  comparison
--- 182,188 

  

!Comparison Functions and Operators
  
 
  comparison
***
*** 194,200 
  linkend="functions-comparison-table">.
 
  
!
  Comparison Operators
  
   
--- 194,200 
  linkend="functions-comparison-table">.
 
  
!
  Comparison Operators
  
   
***
*** 437,442 
--- 437,470 
 
  -->
  
+   
+ Comparison Functions
+ 
+  
+   
+Function
+Description
+Example
+Example Result
+   
+  
+  
+   
+
+  
+   num_nulls
+  
+  num_nulls(VARIADIC "any")
+
+Returns the number of NULL input arguments
+num_nulls(1, NULL, 2)
+1
+   
+  
+ 
+
+ 
+ 

  

*** a/src/backend/utils/adt/misc.c
--- b/src/backend/utils/adt/misc.c
***
*** 45,50 
--- 45,118 
  
  
  /*
+  * num_nulls()
+  *  Count the number of NULL input arguments
+  */
+ Datum
+ pg_num_nulls(PG_FUNCTION_ARGS)
+ {
+ 	int32 count = 0;
+ 	int i;
+ 
+ 	if (get_fn_expr_variadic(fcinfo->flinfo))
+ 	{
+ 		ArrayType  *arr;
+ 		int ndims, nitems, *dims;
+ 		bits8 *bitmap;
+ 		int bitmask;
+ 
+ 		/* Should have just the one argument */
+ 		Assert(PG_NARGS() == 1);
+ 
+ 		/* num_nulls(VARIADIC NULL) is defined as NULL */
+ 		if (PG_ARGISNULL(0))
+ 			PG_RETURN_NULL();
+ 
+ 		/*
+ 		 * Non-null argument had better be an array.  We assume that any call
+ 		 * context that could let get_fn_expr_variadic return true will have
+ 		 * checked that a VARIADIC-labeled parameter actually is an array.  So
+ 		 * it should be okay to just Assert that it's an array rather than
+ 		 * doing a full-fledged error check.
+ 		 */
+ 		Assert(OidIsValid(get_base_element_type(get_fn_expr_argtype(fcinfo->flinfo, 0))));
+ 
+ 		/* OK, safe to fetch the array value */
+ 		arr = PG_GETARG_ARRAYTYPE_P(0);
+ 
+ 		ndims = ARR_NDIM(arr);
+ 		dims = ARR_DIMS(arr);
+ 		nitems = ArrayGetNItems(ndims, dims);
+ 
+ 		bitmap = ARR_NULLBITMAP(arr);
+ 		if (!bitmap)
+ 			PG_RETURN_INT32(0);
+ 		bitmask = 1;
+ 
+ 		for (i = 0; i < nitems; i++)
+ 		{
+ 			if ((*bitmap & bitmask) == 0)
+ count++;
+ 
+ 			bitmask <<= 1;
+ 			if (bitmask == 0x100)
+ 			{
+ bitmap++;
+ bitmask = 1;
+ 			}
+ 		}
+ 		PG_RETURN_INT32(count);
+ 	}
+ 
+ 	for (i = 0; i < PG_NARGS(); i++)
+ 	{
+ 		if (PG_ARGISNULL(i))
+ 			count++;
+ 	}
+ 	PG_RETURN_INT32(count);
+ }
+ 
+ /*
   * current_database()
   *	Expose the current database to the user
   */
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
***
*** 2963,2968  DESCR("adjust time with time zone precision");
--- 2963,2970 
  
  DATA(insert OID = 2003 (  textanycat	   PGNSP PGUID 14 1 0 0 0 f f f f t f s s 2 0 25 "25 2776" _null_ _null_ _null_ _null_ _null_ "select $1 || $2::pg_catalog.text" _null_ _null_ _null_ ));
  DATA(insert OID = 2004 (  anytextcat	   PGNSP PGUID 14 1 0 0 0 f f f f t f s s 2 0 25 "2776 25" _null_ _null_ _null_ _null_ _null_ "select $1::pg_catalog.text || $2" _null_ _null_ _null_ ));
+ DATA(insert OID = 4400 (  num_nulls		PGNSP PGUID 12 1 0 2276 0 f f f f f f i s 1 0 23 "2276" "{2276}" "{v}" _null_ _null_ _null_ pg_num_nulls _null_ _null_ _null_ ));
+ DESCR("count the number of NULL input arguments");
  
  DATA(insert OID = 2005 (  bytealike		   PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 16 "17 17" _null_ _null_ _null_ _null_ _null_ bytealike _null_ _null_ _null_ ));
  DATA(insert OID = 2006 (  byteanlike	   PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 16 "17 17" _null_ _null_ _null_ _null_ _null_ byteanlike _null_ _null_ _null_ ));
*** a/src/include/utils/builtins.h
--- b/src/include/utils/builtins.h
***
*** 481,486  extern Datum pg_ls_dir(PG_FUNCTION_ARGS);
--- 481,487 
  extern Datum pg_ls_dir_1arg(PG_FUNCTION_ARGS);
  
  /* misc.c */
+ extern Datum pg_num_nulls(PG_FUNCTION_ARGS);
  extern Datum current_database(PG_FUNCTION_ARGS);
  extern Datum current_query(PG_FUNCTION_ARGS);
  extern Datum pg_cancel_backend(PG_FUNCTION_ARGS);
*** /dev/null
--- b/src/test/regress/expected/misc_functions.out
***
*** 0 
--- 1,68 
+ --
+ -- num_nulls()
+ --
+ SELECT num_nulls();
+ ERROR:  function num_nulls() does not exist
+ LINE 1: SELECT num_nulls();
+                ^
+ HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
+ SELECT num_nulls(NULL);
+  num_nulls 
+ -----------
+          1
+ (1 row)
+ 
+ SELECT num_nulls('1');
+  num_nulls 
+ -----------
+          0
+ (1 row)
+ 
+ SELECT num_nulls(NULL::text);
+  num_nulls 
+ -----------
+          1
+ (1 row)
+ 
+ 

Re: [HACKERS] count_nulls(VARIADIC "any")

2015-11-20 Thread Marko Tiikkaja

On 2015-11-21 06:52, Jim Nasby wrote:
> On 11/20/15 11:12 PM, Marko Tiikkaja wrote:
>> On 2015-11-21 06:02, I wrote:
>>> Here's a patch implementing this under the name num_nulls().  For
>>> January's CF, of course.
>>
>> I forgot to update some of the references in the documentation.  Fixed in
>> v3, attached.
>
> I thought there was going to be a not-null equivalent as well? I've
> definitely wanted both variations in the past.


I'm not sure that's necessary.  It's quite simple to implement yourself 
using the int - int operator.



.m




[HACKERS] custom function for converting human readable sizes to bytes

2015-11-20 Thread Pavel Stehule
Hi

I have to write some filters, and filtering by size is "unfriendly" due to
the calculation in bytes.

I propose an inverse of the pg_size_pretty function - something like
pg_human_size.

Usage:

SELECT * FROM pg_class
  WHERE pg_table_size(oid) > pg_human_size('2GB');


Ideas, comments?

Regards

Pavel


Re: [HACKERS] onlyvalue aggregate (was: First Aggregate Funtion?)

2015-11-20 Thread Marko Tiikkaja

Hi Dean,

Here's v2 of the patch.  How's this look?


.m
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***
*** 12635,12640  NULL baz(3 rows)
--- 12635,12660 
   

 
+ single_value
+
+
+  single_value(expression)
+
+   
+   
+any type for which the equality operator has been defined
+   
+   
+same as argument type
+   
+   returns the single distinct value from the input values;
+   if more than one distinct value exists (while considering NULLs equal),
+   an exception is raised
+  
+ 
+  
+   
+
  string_agg
 
 
*** a/src/backend/utils/adt/misc.c
--- b/src/backend/utils/adt/misc.c
***
*** 40,45 
--- 40,46 
  #include "utils/acl.h"
  #include "utils/builtins.h"
  #include "utils/timestamp.h"
+ #include "utils/typcache.h"
  
  #define atooid(x)  ((Oid) strtoul((x), NULL, 10))
  
***
*** 598,600  pg_column_is_updatable(PG_FUNCTION_ARGS)
--- 599,702 
  
  	PG_RETURN_BOOL((events & REQ_EVENTS) == REQ_EVENTS);
  }
+ 
+ struct single_value_agg_stype
+ {
+ 	FunctionCallInfoData fcinfo;
+ 	Datum datum;
+ 	bool isnull;
+ };
+ 
+ Datum
+ single_value_agg_transfn(PG_FUNCTION_ARGS)
+ {
+ 	MemoryContext aggcontext;
+ 	struct single_value_agg_stype *state;
+ 
+ 	if (!AggCheckCallContext(fcinfo, &aggcontext))
+ 	{
+ 		/* cannot be called directly because of internal-type argument */
+ 		elog(ERROR, "single_value_agg_transfn called in non-aggregate context");
+ 	}
+ 
+ 	if (PG_ARGISNULL(0))
+ 	{
+ 		TypeCacheEntry *typentry;
+ 		Oid arg1_typeid = get_fn_expr_argtype(fcinfo->flinfo, 1);
+ 
+ 		if (arg1_typeid == InvalidOid)
+ 			ereport(ERROR,
+ 	(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 	 errmsg("could not determine input data type")));
+ 
+ 		typentry = lookup_type_cache(arg1_typeid,
+ 	 TYPECACHE_EQ_OPR_FINFO);
+ 		if (!OidIsValid(typentry->eq_opr_finfo.fn_oid))
+ 			ereport(ERROR,
+ 	(errcode(ERRCODE_UNDEFINED_FUNCTION),
+ 			errmsg("could not identify an equality operator for type %s",
+    format_type_be(arg1_typeid))));
+ 
+ 		state = (struct single_value_agg_stype *) MemoryContextAlloc(aggcontext, sizeof(struct single_value_agg_stype));
+ 
+ 		if (PG_ARGISNULL(1))
+ 		{
+ 			state->datum = (Datum) 0;
+ 			state->isnull = true;
+ 			memset(&state->fcinfo, 0, sizeof(state->fcinfo));
+ 		}
+ 		else
+ 		{
+ 			state->datum = PG_GETARG_DATUM(1);
+ 			state->isnull = false;
+ 			InitFunctionCallInfoData(state->fcinfo, &typentry->eq_opr_finfo, 2,
+ 	 InvalidOid, NULL, NULL);
+ 		}
+ 	}
+ 	else
+ 	{
+ 		bool oprresult;
+ 
+ 		state = (struct single_value_agg_stype *) PG_GETARG_POINTER(0);
+ 
+ 		if (state->isnull)
+ 			oprresult = PG_ARGISNULL(1);
+ 		else
+ 		{
+ 			state->fcinfo.argnull[0] = false;
+ 			state->fcinfo.argnull[1] = false;
+ 			state->fcinfo.arg[0] = state->datum;
+ 			state->fcinfo.arg[1] = PG_GETARG_DATUM(1);
+ 			state->fcinfo.isnull = false;
+ 			oprresult = DatumGetBool(FunctionCallInvoke(&state->fcinfo));
+ 		}
+ 		if (!oprresult)
+ 			ereport(ERROR,
+ 	(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 			errmsg("more than one distinct value passed to single_value")));
+ 	}
+ 
+ 	/*
+ 	 * The transition type for single_value() is declared to be "internal",
+ 	 * which is a pass-by-value type the same size as a pointer.  So we can
+ 	 * safely pass the pointer through nodeAgg.c's machinations.
+ 	 */
+ 	PG_RETURN_POINTER(state);
+ }
+ 
+ Datum
+ single_value_agg_finalfn(PG_FUNCTION_ARGS)
+ {
+ 	struct single_value_agg_stype *state;
+ 
+ 	if (PG_ARGISNULL(0))
+ 		PG_RETURN_NULL();
+ 
+ 	/* cannot be called directly because of internal-type argument */
+ 	Assert(AggCheckCallContext(fcinfo, NULL));
+ 
+ 	state = (struct single_value_agg_stype *) PG_GETARG_POINTER(0);
+ 	if (state->isnull)
+ 		PG_RETURN_NULL();
+ 	PG_RETURN_DATUM(state->datum);
+ }
*** a/src/include/catalog/pg_aggregate.h
--- b/src/include/catalog/pg_aggregate.h
***
*** 292,297  DATA(insert ( 3197	n 0 json_object_agg_transfn json_object_agg_finalfn --
--- 292,300 
  DATA(insert ( 3267	n 0 jsonb_agg_transfn	jsonb_agg_finalfn			---f f 0	2281	0	0		0	_null_ _null_ ));
  DATA(insert ( 3270	n 0 jsonb_object_agg_transfn jsonb_object_agg_finalfn ---f f 0	2281	0	0		0	_null_ _null_ ));
  
+ /* single_value */
+ DATA(insert ( 4202	n 0 single_value_agg_transfn	single_value_agg_finalfn	-	-	-	t f 0	2281	0	0	0	_null_ _null_ ));
+ 
  /* ordered-set and hypothetical-set aggregates */
  DATA(insert ( 3972	o 1 ordered_set_transition			percentile_disc_final	-		-		-		t f 0	2281	0	0		0	_null_ _null_ ));
  DATA(insert ( 3974	o 1 ordered_set_transition			percentile_cont_float8_final			-		-		-		f f 0	2281	0	0		0	_null_ _null_ ));
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
***
*** 945,950