Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-07 Thread Nico Williams
Ah, there is one reason not to use a mapping to CTEs to implement MERGE:
it might be faster to use a single query that is a FULL OUTER JOIN of the
source and target to drive the update/insert/delete operations.
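
For illustration, a minimal sketch of what that single driving query
could look like, assuming hypothetical tables tgt (target) and src
(source) joined on id -- the CASE merely classifies each joined row into
the action a native MERGE executor would take:

  -- Hypothetical tables src/tgt; one FULL OUTER JOIN classifies every row.
  SELECT CASE
           WHEN t.id IS NULL THEN 'insert'   -- present in source only
           WHEN s.id IS NULL THEN 'delete'   -- present in target only
           ELSE 'update'                     -- matched
         END AS action,
         s.id AS src_id, t.id AS tgt_id
  FROM src AS s
  FULL OUTER JOIN tgt AS t ON s.id = t.id;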




Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-07 Thread Nico Williams
On Tue, Nov 07, 2017 at 03:31:22PM -0800, Peter Geoghegan wrote:
> On Tue, Nov 7, 2017 at 3:29 PM, Nico Williams <n...@cryptonector.com> wrote:
> > On Thu, Nov 02, 2017 at 03:25:48PM -0700, Peter Geoghegan wrote:
> >> Nico Williams <n...@cryptonector.com> wrote:
> >> >A MERGE mapped to a DML like this:
> >
> > I needed to spend more time reading MERGE docs from other RDBMSes.
> 
> Please don't hijack this thread. It's about the basic question of
> semantics, and is already hard enough for others to follow as-is.

I'm absolutely not.  If you'd like a pithy summary devoid of detail, it
is this:

  I'm making the argument that using ON CONFLICT to implement MERGE
  cannot produce a complete implementation [you seem to agree], but
  there is at least one light-weight way to implement MERGE with
  _existing_ machinery in PG: CTEs.

  It's perfectly fine to implement an executor for MERGE, but I think
  that's a bit silly and I explain why.

Further, I explored your question regarding order of events, which you
(and I) think is a very important semantics question.  You thought order
of execution / trigger firing should be defined, whereas I think it
should not, because at least MSFT's MERGE documentation explicitly
leaves it undefined.

MSFT's MERGE says:

| For every insert, update, or delete action specified in the MERGE
| statement, SQL Server fires any corresponding AFTER triggers defined
| on the target table, but does not guarantee on which action to fire
| triggers first or last. Triggers defined for the same action honor the
| order you specify.

By implication (though it is not stated explicitly), the actual updates,
inserts, and deletes can happen in any order as well, just as the
triggers may fire in any order.

As usual in programming language design, leaving order of execution
undefined as much as possible increases the opportunities to
parallelize.  Presumably MSFT is leaving the door open to parallelizing
MERGE, if they haven't already.

By the same token, CTEs that have no dependencies on each other are also
ripe for parallelization.  This is important too, because one of my
goals is to improve CTE performance.  If implementing MERGE as a mapping
to CTEs leads to improvements in CTEs, so much the better.  But this
also *is* a simple implementation of MERGE, and simplicity seems like a
good thing.

Nico
-- 




Re: [HACKERS] [PATCH] Add ALWAYS DEFERRED option for constraints

2017-11-07 Thread Nico Williams
On Mon, Nov 06, 2017 at 05:50:21PM +1300, Thomas Munro wrote:
> On Fri, Oct 20, 2017 at 9:05 AM, Nico Williams <n...@cryptonector.com> wrote:
> > Rebased (there were conflicts in the SGML files).
> 
> Hi Nico
> 
> FYI that version has some stray absolute paths in constraints.source:
> 
> -COPY COPY_TBL FROM '@abs_srcdir@/data/constro.data';
> +COPY COPY_TBL FROM 
> '/home/nico/ws/postgres/src/test/regress/data/constro.data';
> 
> -COPY COPY_TBL FROM '@abs_srcdir@/data/constrf.data';
> +COPY COPY_TBL FROM 
> '/home/nico/ws/postgres/src/test/regress/data/constrf.data';

Oops!  Thanks for catching that!




Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-07 Thread Nico Williams
On Thu, Nov 02, 2017 at 03:25:48PM -0700, Peter Geoghegan wrote:
> Nico Williams <n...@cryptonector.com> wrote:
> >A MERGE mapped to a DML like this:

I needed to spend more time reading MERGE docs from other RDBMSes.

The best MERGE so far is MS SQL Server's, which looks like:

  MERGE INTO <target>
  USING <source>
  ON (<join condition>)

  -- optional:
  WHEN MATCHED THEN UPDATE SET ...

  -- optional:
  WHEN NOT MATCHED [ BY TARGET ] THEN INSERT ...

  -- optional:
  WHEN NOT MATCHED BY SOURCE THEN DELETE

  -- optional:
  OUTPUT ...
  ;

(The other MERGEs are harder to use because they lack a WHEN NOT MATCHED
BY SOURCE THEN DELETE, instead having a DELETE clause on the UPDATE,
which is then difficult to use.)

This is *trivial* to map to a CTE, and, in fact, my colleagues and I
have resorted to hand-coded CTEs like this precisely because PG lacks
MERGE (though we ourselves didn't know about MERGE -- it's new to us).

If <source> is a query, then we start with a CTE for that; if it's
a view or table, then we don't set up a CTE for it.  Each of the UPDATE,
INSERT, and/or DELETE can be its own CTE.  If there's an OUTPUT clause,
that can be a final SELECT that queries from the CTEs that ran the DMLs
with RETURNING.  If there's no OUTPUT then none of the DMLs need to have
RETURNING, and one of them will be the main statement, rather than a
CTE.

The pattern is:

  WITH
    -- IFF <source> is a query:
    <source> AS (<source query>),

    -- IFF there's a WHEN MATCHED THEN UPDATE
    updates AS (
      UPDATE <target> AS <target alias> SET ...
      FROM <source>
      WHERE <join condition>
      -- IFF there's an OUTPUT clause, then:
      RETURNING 'update' as "@action", ...
    ),

    inserts AS (
      INSERT INTO <target> (<column list>)
      SELECT ...
      FROM <source>
      LEFT JOIN <target> ON <join condition>
      WHERE <target> IS NOT DISTINCT FROM NULL
      -- IFF there's a CONCURRENTLY clause:
      ON CONFLICT DO NOTHING
      -- IFF there's an OUTPUT clause, then:
      RETURNING 'insert' as "@action", ...
    ),

    deletes AS (
      DELETE FROM <target>
      WHERE NOT EXISTS (SELECT * FROM <source> WHERE <join condition>)
      -- IFF there's an OUTPUT clause, then:
      RETURNING 'delete' as "@action", ...
    )

  -- IFF there's an OUTPUT clause
  SELECT * FROM updates
  UNION
  SELECT * FROM inserts
  UNION
  SELECT * FROM deletes;

If there's not an output clause then one of the DMLs has to be the main
statement:

  WITH ...
  DELETE ...; -- or UPDATE, or INSERT

Note that if the source is a view or table and there's no OUTPUT clause,
then it's one DML with up to (but not referring to) two CTEs, and in all
cases the CTEs do not refer to each other.  This means that the executor
can parallelize all of the DMLs.

If the source is a query, then that could be made a temp view to avoid
having to run the query first.  The CTE executor needs to learn to
sometimes do this anyway, so this is good.
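
To make the pattern concrete, here is a sketch instantiating it for
hypothetical tables customers (target) and staged_customers (source),
both keyed by id, with no OUTPUT clause (so the DELETE is the main
statement):

  WITH
  updates AS (
    -- WHEN MATCHED THEN UPDATE
    UPDATE customers AS c
    SET name = s.name
    FROM staged_customers AS s
    WHERE c.id = s.id
  ),
  inserts AS (
    -- WHEN NOT MATCHED [ BY TARGET ] THEN INSERT
    INSERT INTO customers (id, name)
    SELECT s.id, s.name
    FROM staged_customers AS s
    LEFT JOIN customers AS c ON c.id = s.id
    WHERE c.id IS NULL
    ON CONFLICT DO NOTHING  -- only if concurrent inserts should be tolerated
  )
  -- WHEN NOT MATCHED BY SOURCE THEN DELETE
  DELETE FROM customers AS c
  WHERE NOT EXISTS (SELECT 1 FROM staged_customers AS s WHERE s.id = c.id);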

The deletes CTE can be equivalently written without a NOT EXISTS:

    to_be_deleted AS (
      SELECT <target PK>
      FROM <target>
      LEFT JOIN <source> ON (<join condition>)
      WHERE <source> IS NOT DISTINCT FROM NULL
    ),
    deletes AS (
      DELETE FROM <target>
      USING to_be_deleted tbd
      WHERE <target>.<PK> = tbd.<PK>
    )

if that were to run faster (probably not, since PG today would first run
the to_be_deleted CTE, then the deletes CTE).  I mention it only because
it's nice to see the symmetry of LEFT JOINs for the two WHEN NOT MATCHED
cases.

(Here <target> stands for the target table, or its alias if one was
given.)

***

This mapping fires triggers as one would expect (at least FOR EACH ROW
triggers; I expect the DMLs in CTEs should also fire FOR EACH STATEMENT
triggers, and if they don't I consider that a bug).

> This is a bad idea. An implementation like this is not at all
> maintainable.

I beg to differ.  First of all, not having to add an executor for MERGE
is a win: much, much less code to maintain.  The code to map MERGE to
CTEs can easily be contained entirely in src/backend/parser/gram.y,
which is a maintainability win: any changes to how CTEs are compiled
will fail to compile if they break the MERGE mapping to CTEs.

> >can handle concurrency via ON CONFLICT DO NOTHING in the INSERT CTE.
> 
> That's not handling concurrency -- it's silently ignoring an error. Who
> is to say that the conflict that IGNORE ignored is associated with a row
> visible to the MVCC snapshot of the statement? IOW, why should the DELETE
> affect any row?

That was me misunderstanding MERGE.  The DELETE is independent of the
INSERT -- if an INSERT does nothing because of an ON CONFLICT DO
NOTHING clause, then that won't cause that row to be deleted -- the
inserts and deletes CTEs are independent in the latest mapping (see
above).

I believe adding ON CONFLICT DO NOTHING to the INSERT in this mapping is
all that's needed to support concurrency.

> There are probably a great many reasons why you need a ModifyTable
> executor node that keeps around state, and explicitly indicates that a
> MERGE is a MERGE. For example, we'll probably want statement level
> triggers to execute in a fixed order

Re: [HACKERS] Where is it documented what role executes constraint triggers?

2017-11-03 Thread Nico Williams
On Fri, Nov 03, 2017 at 02:09:00PM -0400, Chapman Flack wrote:
> From a little experimenting in 9.5, it seems that a referential
> integrity trigger is executed with the identity of the referencED
> table's owner, but I have not been able to find this covered in
> the docs. Is this a documentation oversight, or is it explained
> somewhere I didn't look (or may have skimmed right over it)?
> 
> The question came up at $work after the departure of $colleague,
> who had created some tables as himself and not changed their
> ownership. His role had the superuser bit at the time, so
> RI checks involving those tables never incurred 'permission denied'
> errors until he left. Then, his role was not dropped, only disabled
> for login and made no longer superuser, and that's when RI checks
> started incurring 'permission denied'.

Are the trigger functions SECURITY DEFINER?




Re: [HACKERS] [PATCH] Add ALWAYS DEFERRED option for constraints

2017-11-03 Thread Nico Williams
On Fri, Nov 03, 2017 at 01:41:45PM -0400, Peter Eisentraut wrote:
> On 11/2/17 16:54, Nico Williams wrote:
> > Replacing condeferred and condeferrable with a char columns also
> > occurred to me, and though I assume backwards-incompatible changes to
> > pg_catalog tables are fair game, I assumed everyone would prefer
> > avoiding such changes where possible.
> 
> I don't think there is an overriding mandate to avoid such catalog
> changes.  Consider old clients that don't know about your new column.
> They might look at the catalog entries and derive information about a
> constraint, not being aware that there is additional information in
> another column that overrides that.  So in such cases it's arguably
> better to make a break.

Makes sense.

> (In any case, it might be worth waiting for a review of the rest of the
> patch before taking on a significant rewrite of the catalog structures.)

I'll wait then :)

When you're done with that I'll make this change (replacing those three
bool columns with a single char column).

> > Hmmm, must I do anything special about _downgrades_?  Does PostgreSQL
> > support downgrades?
> 
> no

Oh good.  Thanks for clarifying that.

Nico
-- 




Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-02 Thread Nico Williams
On Thu, Nov 02, 2017 at 03:25:48PM -0700, Peter Geoghegan wrote:
> Nico Williams <n...@cryptonector.com> wrote:
> >A MERGE mapped to a DML like this:
> 
> This is a bad idea. An implementation like this is not at all
> maintainable.

Assuming the DELETE issue can be addressed, why would this not be
maintainable?

> >can handle concurrency via ON CONFLICT DO NOTHING in the INSERT CTE.
> 
> That's not handling concurrency -- it's silently ignoring an error. Who
> is to say that the conflict that IGNORE ignored is associated with a row
> visible to the MVCC snapshot of the statement? IOW, why should the DELETE
> affect any row?

Ah, yes, we'd have to make sure the DELETE does not delete rows that
could not be inserted.  There's... no way to find out what those would
have been -- RETURNING won't mention them, though it'd be a nice
addition to UPSERT to have a way to do that, and it'd make this mapping
feasible.

> There are probably a great many reasons why you need a ModifyTable
> executor node that keeps around state, and explicitly indicates that a
> MERGE is a MERGE. For example, we'll probably want statement level
> triggers to execute in a fixed order, regardless of the MERGE, RLS will
> probably require explicitly knowledge of MERGE semantics, and so on.

Wouldn't those fire anyways in a statement like the one I mentioned?

> FWIW, your example doesn't actually have a source (just a target), so it
> isn't actually like MERGE.

That can be added.  I was trying to keep it pithy.

Nico
-- 




Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-02 Thread Nico Williams
On Thu, Nov 02, 2017 at 02:05:19PM -0700, Peter Geoghegan wrote:
> Simon Riggs  wrote:
> >So in your view we should make no attempt to avoid concurrent errors,
> >even when we have the capability to do so (in some cases) and doing so
> >would be perfectly compliant with the SQLStandard.
> 
> Yes. That's what I believe. I believe this because I can't see a way to
> do this that isn't a mess, and because ON CONFLICT DO UPDATE exists and
> works well for the cases where we can do better in READ COMMITTED mode.

A MERGE mapped to a DML like this:

  WITH
  updated AS (
    UPDATE <target>
    SET ...
    WHERE <condition>
    RETURNING <target PK>
  )
  , inserted AS (
    INSERT INTO <target>
    SELECT ...
    WHERE <target PK> NOT IN (SELECT <target PK> FROM updated) AND ..
    ON CONFLICT DO NOTHING -- see below!
    RETURNING <target PK>
  )
  DELETE FROM <target>
  WHERE <target PK> NOT IN (SELECT <target PK> FROM updated) AND
        <target PK> NOT IN (SELECT <target PK> FROM inserted) AND ...;

can handle concurrency via ON CONFLICT DO NOTHING in the INSERT CTE.

Now, one could write a MERGE that produces conflicts even without
concurrency, so adding ON CONFLICT DO NOTHING by default as above...
seems not-quite-correct.  But presumably one wouldn't write MERGE
statements that produce conflicts in the absence of concurrency, so this
seems close enough to me.

Another thing is that MERGE itself could get an ON CONFLICT clause for
the INSERT portion of the MERGE, allowing one to ignore some conflicts
and not others, though there would be no need for DO UPDATE, only DO
NOTHING for conflict resolution :)  This seems better.

I do believe this mapping is correct, and could be implemented entirely
in src/backend/parser/gram.y!  Am I wrong about this?

Nico
-- 




Re: [HACKERS] [PATCH] Add ALWAYS DEFERRED option for constraints

2017-11-02 Thread Nico Williams
On Thu, Nov 02, 2017 at 04:20:19PM -0400, Peter Eisentraut wrote:
> I haven't really thought about this feature too hard; I just want to
> give you a couple of code comments.

Thanks!

> I think the catalog structure, and relatedly also the parser structures,
> could be made more compact.  We currently have condeferrable and
> condeferred to represent three valid states (NOT DEFERRABLE, DEFERRABLE
> INITIALLY IMMEDIATE, DEFERRABLE INITIALLY DEFERRED).  You are adding
> conalwaysdeferred, but you are adding only additional state (ALWAYS
> DEFERRED).  So we end up with three bool fields to represent four
> states.  I think this should all be rolled into one char field with four
> states.

I thought about this.  I couldn't see a way to make the two existing
boolean columns have a combination meaning "ALWAYS DEFERRED" that might
not break something else.

Since (condeferred AND NOT condeferrable) is an impossible combination
today, I could use it to mean ALWAYS DEFERRED.  I'm not sure how safe
that would be.  And it does seem like a weird way to express ALWAYS
DEFERRED, though it would work.

Replacing condeferred and condeferrable with a char columns also
occurred to me, and though I assume backwards-incompatible changes to
pg_catalog tables are fair game, I assumed everyone would prefer
avoiding such changes where possible.

Also, a backwards-incompatible change to the table would significantly
enlarge the patch, as more version checks would be needed, particularly
regarding upgrades (which are otherwise trivial).

I felt adding a new column was probably safest.  I'll make a backwards-
incompatible change if requested, naturally, but I guess I'd want to
get wider consensus on that, as I fear others may not agree.  That fear
may just be due to my ignorance of the community's preference as to
pg_catalog backwards-compatibility vs. cleanliness.

Hmmm, must I do anything special about _downgrades_?  Does PostgreSQL
support downgrades?

> In psql and pg_dump, if you are query new catalog fields, you need to
> have a version check to have a different query for >=PG11.  (This would
> likely apply whether you adopt my suggestion above or not.)

Ah, yes, of course.  I will add such a check.

> Maybe a test case in pg_dump would be useful.

Will do.

> Other than that, this looks like a pretty complete patch.

Thanks for the review!  It's a small-ish patch, and my very first for
PG.  It was fun writing it.  I greatly appreciate that PG source is easy
to read.

Nico
-- 




Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-02 Thread Nico Williams
On Thu, Nov 02, 2017 at 12:51:45PM -0700, Peter Geoghegan wrote:
> Nico Williams <n...@cryptonector.com> wrote:
> >If you want to ignore conflicts arising from concurrency you could
> >always add an ON CONFLICT DO NOTHING to the INSERT DML in the mapping I
> >proposed earlier.  Thus a MERGE CONCURRENTLY could just do that.
> >
> >Is there any reason not to map MERGE as I proposed?
> 
> Performance, for one. MERGE generally has a join that can be optimized
> like an UPDATE FROM join.

Ah, right, I think my mapping was pessimal.  How about this mapping
instead then:

WITH
updated AS (
    UPDATE <target>
    SET ...
    WHERE <condition>
    RETURNING <target PK>
)
, inserted AS (
    INSERT INTO <target>
    SELECT ...
    WHERE <target PK> NOT IN (SELECT <target PK> FROM updated) AND ..
    /*
     * Add ON CONFLICT DO NOTHING here to avoid conflicts in the face
     * of concurrency.
     */
    RETURNING <target PK>
)
DELETE FROM <target>
WHERE <target PK> NOT IN (SELECT <target PK> FROM updated) AND
      <target PK> NOT IN (SELECT <target PK> FROM inserted) AND ...;

?

If a MERGE has no delete clause, then the mapping would be:

WITH
updated AS (
    UPDATE <target>
    SET ...
    WHERE <condition>
    RETURNING <target PK>
)
INSERT INTO <target>
SELECT ...
WHERE <target PK> NOT IN (SELECT <target PK> FROM updated) AND ..
/*
 * Add ON CONFLICT DO NOTHING here to avoid conflicts in the face
 * of concurrency.
 */
;

> I haven't studied this question in any detail, but FWIW I think that
> using CTEs for merging is morally equivalent to a traditional MERGE
> implementation. [...]

I agree.  So why not do that initially?  Optimize later.

Such a MERGE mapping could be implemented entirely within
src/backend/parser/gram.y ...

Talk about cheap to implement, review, and maintain!

Also, this would be notionally very simple.

Any optimizations to CTE query/DML execution would be generic and
applicable to MERGE and other things besides.  If mapping MERGE to
CTE-using DMLs motivates such optimizations, all the better.

>  [...]. It may actually be possible to map from CTEs to a MERGE
> statement, but I don't think that that's a good approach to implementing
> MERGE.

Surely not every DML with CTEs can map to MERGE.  Maybe I misunderstood
your comment?

> Most of the implementation time will probably be spent doing things like
> making sure MERGE behaves appropriately with triggers, RLS, updatable
> views, and so on. That will take quite a while, but isn't particularly
> technically challenging IMV.

Note that mapping to a DML with CTEs as above gets triggers, RLS, and
updateable views right from the get-go, because DMLs with CTEs, and DMLs
as CTEs, surely do as well.

Nico
-- 




Re: [HACKERS] proposal: schema variables

2017-11-02 Thread Nico Williams
On Thu, Nov 02, 2017 at 11:48:44AM -0400, Tom Lane wrote:
> Nico Williams <n...@cryptonector.com> writes:
> > With access controls, GUCs could become schema variables, and settings
> > from postgresql.conf could move into the database itself (which I think
> > would be nice).
> 
> People re-propose some variant of that every so often, but it never works,
> because it ignores the fact that some of the GUCs' values are needed
> before you can access system catalogs at all, or in places where relying
> on system catalog access would be a bad idea.

ISTM that it should be possible to break the chicken-egg issue by having
the config variables stored in such a way that knowing only the pgdata
directory path should suffice to find them.  That's effectively the case
already in that postgresql.conf is found... there.

One could probably do this as a PoC entirely as a SQL-coded VIEW that
reads and writes (via the adminpack module's pg_catalog.pg_file_write())
postgresql.conf (without preserving comments, or with some rules
regarding comments so that they are effectively attached to params).

Nico
-- 




Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-02 Thread Nico Williams
On Thu, Nov 02, 2017 at 06:49:18PM +, Simon Riggs wrote:
> On 1 November 2017 at 18:20, Peter Geoghegan  wrote:
> > In Postgres, you can avoid duplicate violations with MERGE by using a
> > higher isolation level (these days, those are turned into a
> > serialization error at higher isolation levels when no duplicate is
> > visible to the xact's snapshot).
> 
> So if I understand you correctly, in your view MERGE should just fail
> with an ERROR if it runs concurrently with other DML?
> 
> i.e. if a race condition between the query and an INSERT runs
> concurrently with another INSERT
> 
> We have no interest in making that work?

If you map MERGE to a DML with RETURNING-DML CTEs as I suggested before,
how would that interact with concurrent DMLs?  The INSERT DML of the
mapped statement could produce conflicts that abort the whole MERGE,
correct?

If you want to ignore conflicts arising from concurrency you could
always add an ON CONFLICT DO NOTHING to the INSERT DML in the mapping I
proposed earlier.  Thus a MERGE CONCURRENTLY could just do that.

Is there any reason not to map MERGE as I proposed?

Such an implementation of MERGE wouldn't be online because CTEs are
always implemented sequentially currently.  That's probably reason
enough to eventually produce a native implementation of MERGE, ... or to
revamp the CTE machinery to allow such a mapping to be online.

Nico
-- 




Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-02 Thread Nico Williams
If nothing else, anyone needing MERGE can port their MERGE statements to
a DML with DML-containing CTEs...

The generic mapping would be something like this, I think:

WITH
rows AS (SELECT <source columns> FROM <source> WHERE <condition>)
  , updated AS (
    UPDATE <target>
    SET ...
    WHERE <target PK> IN (SELECT <PK> FROM rows) /* matched */
    RETURNING <target PK>
  )
  , inserted AS (
    INSERT INTO <target>
    SELECT ...
    WHERE <target PK> NOT IN (SELECT <PK> FROM rows) /* not matched */
    RETURNING <target PK>
  )
DELETE FROM <target>
WHERE (...) AND
      <target PK> NOT IN (SELECT <PK> FROM updated UNION
                          SELECT <PK> FROM inserted);

Nico
-- 




Re: [HACKERS] proposal: schema variables

2017-11-02 Thread Nico Williams
On Thu, Nov 02, 2017 at 06:05:54PM +0530, Robert Haas wrote:
> On Thu, Oct 26, 2017 at 12:51 PM, Pavel Stehule  
> wrote:
> > The variables can be modified by SQL command SET (this is taken from
> > standard, and it natural)
> >
> > SET varname = expression;
> 
> Overloading SET to handle both variables and GUCs seems likely to
> create problems, possibly including security problems.  For example,
> maybe a security-definer function could leave behind variables to
> trick the calling code into failing to set GUCs that it intended to
> set.  Or maybe creating a variable at the wrong time will just break
> things randomly.

That's already true of GUCs, since there are no access controls on
set_config()/current_setting().

Presumably "schema variables" would really just be GUC-like and not at
all like lexically scoped variables.  And also subject to access
controls, thus an overall improvement on set_config()/current_setting().

With access controls, GUCs could become schema variables, and settings
from postgresql.conf could move into the database itself (which I think
would be nice).

Nico
-- 




Re: [HACKERS] [PATCH] Add two-arg for of current_setting(NAME, FALLBACK)

2017-11-01 Thread Nico Williams
On Thu, Mar 19, 2015 at 05:41:02PM -0500, David Christensen wrote:
> The two-arg form of the current_setting() function will allow a
> fallback value to be returned instead of throwing an error when an
> unknown GUC is provided.  This would come in most useful when using
> custom GUCs; e.g.:

There already _is_ a two-argument form of current_setting() that yours
somewhat conflicts with:

   current_setting(setting_name [, missing_ok ])

https://www.postgresql.org/docs/current/static/functions-admin.html

I often use

  coalesce(current_setting(setting_name, true), default_value_here)

as an implementation of current_setting() with a default value.
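
For example, with a hypothetical custom GUC myapp.tenant_id (the name is
just for illustration):

  -- The second argument (true) is missing_ok, so an unset GUC yields NULL
  -- and coalesce() supplies the fallback value.
  SELECT coalesce(current_setting('myapp.tenant_id', true), '0') AS tenant_id;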

You could treat a boolean second argument as a missing_ok argument
instead of a default value, since _currently_ current_setting() only
returns TEXT.

But if ever GUCs are allowed to have values of other types, then your
two-argument current_setting() will conflict with boolean GUCs.

There are several ways to prevent such a future conflict.  Here are
some:

 - make a two-argument current_setting() for boolean GUCs treat the
   second argument as a default value (since there are no such GUCs
   today, this won't break anything)

 - use a new function name

 - use a three-argument current_setting()

The third option seems very lame to me.  So I'd argue for either of the
other two.

Nico
-- 




[HACKERS] Mapping MERGE onto CTEs (Re: MERGE SQL Statement for PG11)

2017-11-01 Thread Nico Williams
Is it possible to map MERGE onto a query with CTEs that does the
various DMLs, with all but the last RETURNING?  Here's a sketch:

WITH matched_rows AS (
    SELECT <target PK> FROM <target> t WHERE <condition>
 ),
 updated_rows AS (
    UPDATE <target> t
    SET ...
    WHERE ... AND t IN (SELECT j FROM matched_rows j)
    RETURNING t
 ),
 inserted_rows AS (
    INSERT INTO <target> t
    SELECT ...
    WHERE ... AND t NOT IN (SELECT j FROM matched_rows j)
    RETURNING t
 )
DELETE FROM <target> t
WHERE ...;

Now, one issue is that in PG CTEs are basically like temp tables, and
also like optimizer barriers, so this construction is not online, and if
matched_rows is very large, that would be a problem.

As an aside, I'd like to be able to control which CTEs are view-like and
which are table-like.  In SQLite3, for example, they are all view-like,
and the optimizer will act accordingly, whereas in PG they are all
table-like, and thus optimizer barriers.

Nico
-- 




Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-30 Thread Nico Williams
On Mon, Oct 30, 2017 at 10:59:43AM -0700, Peter Geoghegan wrote:
> On Mon, Oct 30, 2017 at 6:21 AM, Simon Riggs  wrote:
> > If a general purpose solution exists, please explain what it is.
> 
> For the umpteenth time, a general purpose solution is one that more or
> less works like an UPDATE FROM, with an outer join, whose ModifyTable
> node is capable of insert, update, or delete (and accepts quals for
> MATCHED and NOT matched cases, etc). You could still get duplicate
> violations due to concurrent activity in READ COMMITTED mode, but not
> at higher isolation levels thanks to Thomas Munro's work there. In
> this world, ON CONFLICT and MERGE are fairly distinct things.

FWIW, and as an outsider, having looked at MERGE docs from other
RDBMSes, I have to agree that the PG UPSERT (ON CONFLICT .. DO) and
MERGE are rather different beasts.

In particular, I suspect all UPSERT statements can be mapped onto
equivalent MERGE statements, but not all MERGE statements can be mapped
onto UPSERTs.

The reason is that UPSERT depends on UNIQUE constraints, whereas MERGE
uses a generic join condition that need not even refer to any INDEXes,
let alone UNIQUE ones.

Perhaps PG's UPSERT can be generalized to create a temporary UNIQUE
constraint on the column(s) specified in the conflict_target portion of
the statement, increasing the number of MERGE statements that could be
mapped onto UPSERT.  But even then, that would still be a UNIQUE
constraint, whereas MERGE does not even imply such a thing.

Now, a subset of MERGE (those using equijoins in the ON condition) can
be mapped onto UPSERT provided a suitable UNIQUE index exists (or
that PG notionally creates a temporary UNIQUE constraint for the purpose
of evaluating the UPSERT).  This approach would NOT preclude a more
complete subsequent implementation of MERGE.  But I wonder if that's
worthwhile given that a proper and complete implementation of MERGE is
probably very desirable.
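
As a sketch of that equijoin subset, assuming a hypothetical target
customers(id PRIMARY KEY, name) and source staged_customers(id, name), a
MERGE whose ON clause is an equality on the unique key maps onto today's
UPSERT:

  INSERT INTO customers (id, name)
  SELECT s.id, s.name
  FROM staged_customers AS s
  ON CONFLICT (id) DO UPDATE    -- the WHEN MATCHED THEN UPDATE part
  SET name = EXCLUDED.name;

Note that there is no equivalent of WHEN NOT MATCHED BY SOURCE THEN
DELETE here, which is exactly the gap.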

On a tangentially related note, I've long wanted to have an RDBMS-
independent SQL parser for the purpose of implementing external query-
rewriting (and external optimizers), syntax highlighting, and so on.
Having an external / plug-in method for rewriting unsupported SQL as a
way of bridging functionality gaps (like lack of MERGE support) would be
very nice.  PG does have a way to expose its AST...  It might be a good
idea to start by implementing unsupported SQL features in such a way
that they parse and can produce an AST along with a syntax/unsupported
error -- then one might rewrite the parsed AST, generate appropriate
SQL, and execute that.

Nico
-- 




Re: [HACKERS] WIP: BRIN bloom indexes

2017-10-27 Thread Nico Williams
On Fri, Oct 27, 2017 at 10:06:58PM +0200, Tomas Vondra wrote:
> > + * We use an optimisation that initially we store the uint32 values 
> > directly,
> > + * without the extra hashing step. And only later filling the bitmap space,
> > + * we switch to the regular bloom filter mode.
> > 
> > I don't think that optimization is worthwhile.  If I'm going to be using
> > a Bloom filter, it's because I expect to have a lot of rows.
> > 
> > (Granted, if I CREATE TABLE .. LIKE .. INCLUDING INDEXES, and the new
> > table just won't have lots of rows, then I might want this optimization,
> > but I can always just drop the Bloom filter index, or not include
> > indexes in the LIKE.)
> 
> I think you're confusing "rows" and "distinct values". Furthermore, it's
> rather irrelevant how many rows are in the table because BRIN indexes
> split the table into "ranges" that are 1MB by default. And you can only
> squash certain number of rows into such range.
> 
> The idea of the optimization is to efficiently support cases where each
> range contains only small number of distinct values - which is quite
> common in the cases I described in my initial message (with bursts of
> rows with the same IP / client identifier etc.).

What range?  It's just bits to set.

The point is that Bloom filters should ideally be about 50% full -- much
less than that and you are wasting space, much more than that and your
false positive rate becomes useless.
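
(For reference, the standard Bloom filter arithmetic behind that 50%
figure, not anything from the patch: with m bits, n inserted values, and
k hash functions the false positive rate is roughly

  p \approx (1 - e^{-kn/m})^k

and with the optimal k = (m/n) \ln 2 the expected fraction of set bits
is 1 - e^{-kn/m} = 1/2, i.e., a well-sized filter sits at about half
full.)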

> > Filter compression is not worthwhile.  You want to have a fairly uniform
> > hash distribution, and you want to end up with a Bloom filter that's not
> > much more than 50% full.  That means that when near 50% full the filter
> > will not be very sparse at all.  Optimizing for the not common case
> > doesn't seem worthwhile, and adds complexity.
> 
> Properly sizing the bloom filter requires knowledge of many variables,
> in particular the number of distinct values expected to be added to the
> filter. But we don't really know that number, and we also don't even
> know many values useful for estimating that (how many rows will fit into
> a range, number of distinct values in the whole table, etc.)

This is why Scalable Bloom filters exist: so you can adapt.

> So the idea was to oversize the bloom filter, and then use the sparse
> representation to reduce the size.

A space-efficient representation of sparse bitmaps is not as simple as a
Scalable Bloom filter.

And a filter that requires user intervention to size correctly, or which
requires rebuilding when it gets too full, is also not as convenient as
a Scalable Bloom filter.

> > + * XXX We can also watch the number of bits set in the bloom filter, and
> > + * then stop using it (and not store the bitmap, to save space) when the
> > + * false positive rate gets too high.
> > 
> > Ah, right, what you want is a "Scalable Bloom Filter".
> 
> That's not what I had in mind. My idea was to size the bloom filter on
> "best effort" basis, and then if one range gets a bit too inaccurate
> then just get rid of the filter. If that happens for many ranges, we
> should probably allow specifying parameters as relopts for the index.

Scalable Bloom filters are way more convenient than that.  They're
always not-too-full, and only the open filter is less-than-full-enough.

And since you should grow them somewhat exponentially (but not quite as
much as a doubling in each generation), there should never be too many
filters to search.  But you can always "vacuum" (rebuild) the filter
starting with the size of the last filter added prior to the vacuum.

> I think this is really an over-engineering, and I certainly don't plan
> to extend the patch in this direction.

Whereas I think a sparse bitmap representation is overly complex and
"over-engineering".  Scalable Bloom filters are very well understood in
the literature -- there's nothing terribly complex to them.

> I do not expect these parameters (particularly the number of distinct
> values in a range) to significantly change over time, so the easiest
> solution is to provide a reloption to specify that number in
> CREATE/ALTER index.

Doesn't this depend on the use-case though?  I think a self-tuning
system is better than one that doesn't self-tune.  Call that
over-engineering if you like, but it doesn't make it not desirable :)

> Alternatively, we could allow the summarization process to gather some
> statistics (either on the whole range, or perhaps the whole table), and
> store them somewhere for later use. For example to compute the number of
> distinct values per range (in the existing data), and use that later.

Again, Scalable Bloom filters automatically adapt without needing a
statistics gathering exercise.  All you need is a good hash function
(that's another topic).

Scalable Bloom filters are a trivial extension of Bloom filters.

> > What I'm getting at is that the query planner really needs to understand
> > that a Bloom filter is a probabilistic data structure.
> 
> It does, and we 

Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-27 Thread Nico Williams
On Fri, Oct 27, 2017 at 02:13:27PM -0700, srielau wrote:
> While the standard may not require a unique index for the ON clause I have
> never seen a MERGE statement that did not have this property.  So IMHO this
> is a reasonable restrictions.

I don't understand how one could have a conflict upon which to turn
INSERT into UPDATE without having a UNIQUE constraint violated...

The only question is whether one should have control over -or have to
specify- which constraint violations lead to UPDATE vs. which ones lead
to failure vs. which ones lead to doing nothing.

The row to update is the one that the to-be-inserted row conflicted with
-- there can only have been one if the constraint violated was a PRIMARY
KEY constraint, or if there is a PRIMARY KEY at all, but if there's no
PRIMARY KEY, then there can have been more conflicting rows because of
NULL columns in the to-be-inserted row.  If the to-be-inserted row
conflicts with multiple rows, then just fail, or don't allow MERGE on
tables that have no PK (as you know, many think it makes no sense to not
have a PK on a table in SQL).

In the common case one does not care about which UNIQUE constraint is
violated because there's only one that could have been violated, or
because if the UPDATE should itself cause some other UNIQUE constraint
to be violated, then the whole statement should fail.

PG's UPSERT is fantastic -- it allows very fine-grained control, but it
isn't as pithy as it could be when the author doesn't care to specify
all that detail.
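
For comparison, this is what the fine-grained form requires today with a
hypothetical table t(id PRIMARY KEY, a, b) -- every updated column has
to be spelled out:

  INSERT INTO t (id, a, b) VALUES (1, 'x', 'y')
  ON CONFLICT (id) DO UPDATE
  SET a = EXCLUDED.a,
      b = EXCLUDED.b;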

Also, something like SQLite3's INSERT OR REPLACE is very convenient:
pithy, INSERT syntax, upsert-like semantics[*].

I'd like to have this in PG:

  INSERT INTO ..
  ON CONFLICT DO UPDATE;  -- I.e., update all columns of the existing
  -- row to match the ones from the row that
  -- would have been inserted had there not been
  -- a conflict.
  --
  -- If an INSERTed row conflicts and then the
  -- UPDATE it devolves to also conflicts, then
  -- fail.

and

  INSERT INTO ..
  ON CONFLICT DO UPDATE   -- I.e., update all columns of the existing
  -- row to match the ones from the row that
  -- would have been inserted had there not been
  -- a conflict.
  --
  ON CONFLICT DO NOTHING; -- If an INSERTed row conflicts and then the
  -- UPDATE it devolves to also conflicts, then
  -- DO NOTHING.


[*] SQLite3's INSERT OR REPLACE is NOT an insert-or-update, but an
insert-or-delete-and-insert, and any deletions that occur in the
process do fire triggers.  INSERT OR UPDATE would be much more
useful.


Nico
-- 




Re: [HACKERS] WIP: BRIN bloom indexes

2017-10-27 Thread Nico Williams
On Thu, Oct 19, 2017 at 10:15:32PM +0200, Tomas Vondra wrote:

A bloom filter index would, indeed, be wonderful.

Comments:

+ * We use an optimisation that initially we store the uint32 values directly,
+ * without the extra hashing step. And only later filling the bitmap space,
+ * we switch to the regular bloom filter mode.

I don't think that optimization is worthwhile.  If I'm going to be using
a Bloom filter, it's because I expect to have a lot of rows.

(Granted, if I CREATE TABLE .. LIKE .. INCLUDING INDEXES, and the new
table just won't have lots of rows, then I might want this optimization,
but I can always just drop the Bloom filter index, or not include
indexes in the LIKE.)

+ * XXX Perhaps we could save a few bytes by using different data types, but
+ * considering the size of the bitmap, the difference is negligible.

A bytea is all that's needed.  See below.

+ * XXX We could also implement "sparse" bloom filters, keeping only the
+ * bytes that are not entirely 0. That might make the "sorted" phase
+ * mostly unnecessary.

Filter compression is not worthwhile.  You want to have a fairly uniform
hash distribution, and you want to end up with a Bloom filter that's not
much more than 50% full.  That means that when near 50% full the filter
will not be very sparse at all.  Optimizing for the not common case
doesn't seem worthwhile, and adds complexity.

+ * XXX We can also watch the number of bits set in the bloom filter, and
+ * then stop using it (and not store the bitmap, to save space) when the
+ * false positive rate gets too high.

Ah, right, what you want is a "Scalable Bloom Filter".

A Scalable Bloom filter is actually a series of Bloom filters where all
but the newest filter are closed to additions, and the newest filter is
where you do all the additions.  You generally want to make each new
filter bigger than the preceding one because when searching a Scalable
Bloom filter you have to search *all* of them, so you want to minimize
the number of filters.

Eventually, when you have enough sub-filters, you may want to re-write
the whole thing so that you start with a single sub-filter that is large
enough.

The format of the bytea might then be something like:

[<bitmap 1>][<bitmap 2>]...[<bitmap N>]

where the last bitmap is the filter that is open to additions.

I wonder if there are write concurrency performance considerations
here...

It might be better to have a bytea value per-sub-filter so that there is
no lock contention for the closed filters.  The closed sub-filters are
constant, thus not even shared locks are needed for them, and especially
not exclusive locks.

Writing to the filter will necessitate locking the entire open filter,
or else byte-range locking on it.  Something to think about.

> Now, what about query performance? Unlike the "minmax" indexes, the
> "bloom" filter can only handle equality conditions.

A Bloom filter has non-zero false positives for existence, but zero
false positives for non-existence.

This means that a Bloom filter index can only be used for:

a) non-existence tests (with equality predicates, as you point out),

b) as an optimization to avoid slower index checks (or heap scans) when
   the Bloom filter indicates non-existence.

(b) is really just an internal application of (a).

There might be applications where false positives might be ok in a query
like:

  SELECT a.* FROM a a JOIN b b USING (some_col);

but for most real-world queries like that, allowing false positives by
default would be very bad.  An option in the index declaration could be
used to enable existence equality predicates, but I would not include
such an option initially -- let's see if anyone actually has a use case
for it first.

Of course, for something like:

  SELECT a.*, b.* FROM a a JOIN b b USING (some_col);

a Bloom filter can only be used as an optimization to avoid using a
slower index (or heap scan) on the inner table source.

What I'm getting at is that the query planner really needs to understand
that a Bloom filter is a probabilistic data structure.

Nico
-- 




Re: [HACKERS] proposal: schema variables

2017-10-26 Thread Nico Williams
On Thu, Oct 26, 2017 at 09:21:24AM +0200, Pavel Stehule wrote:
> Comments, notes?

I like it.

I would further like to move all of postgresql.conf into the database,
as much as possible, as well as pg_ident.conf and pg_hba.conf.

Variables like current_user have a sort of nesting context
functionality: calling a SECURITY DEFINER function "pushes" a new value
onto current_user, then when the function returns the new value of
current_user is "popped" and the previous value restored.

It might be nice to be able to generalize this.

Questions that then arise:

 - can one see up the stack?
 - are there permissions issues with seeing up the stack?

I recently posted proposing a feature such that SECURITY DEFINER
functions could observe the _caller_'s current_user.

Nico
-- 




Re: [HACKERS] [PATCH] Add ALWAYS DEFERRED option for constraints

2017-10-19 Thread Nico Williams
Rebased (there were conflicts in the SGML files).

Nico
-- 
From 80d284ecefa22945d507d2822f1f1a195e2af751 Mon Sep 17 00:00:00 2001
From: Nicolas Williams 
Date: Tue, 3 Oct 2017 00:33:09 -0500
Subject: [PATCH] Add ALWAYS DEFERRED option for CONSTRAINTs

and CONSTRAINT TRIGGERs.

This is important so that one can have triggers and constraints that
must run after all of the user/client's statements in a transaction
(i.e., at COMMIT time), so that the user/client may make no further
changes (triggers, of course, still can).
---
 doc/src/sgml/catalogs.sgml | 17 -
 doc/src/sgml/ref/alter_table.sgml  |  4 +-
 doc/src/sgml/ref/create_table.sgml | 10 ++-
 doc/src/sgml/ref/create_trigger.sgml   |  2 +-
 doc/src/sgml/trigger.sgml  |  1 +
 src/backend/bootstrap/bootparse.y  |  2 +
 src/backend/catalog/heap.c |  1 +
 src/backend/catalog/index.c|  8 +++
 src/backend/catalog/information_schema.sql |  8 +++
 src/backend/catalog/pg_constraint.c|  2 +
 src/backend/catalog/toasting.c |  2 +-
 src/backend/commands/indexcmds.c   |  2 +-
 src/backend/commands/tablecmds.c   | 20 +-
 src/backend/commands/trigger.c | 28 +++--
 src/backend/commands/typecmds.c|  3 +
 src/backend/nodes/copyfuncs.c  |  3 +
 src/backend/nodes/outfuncs.c   |  4 ++
 src/backend/parser/gram.y  | 99 ++
 src/backend/parser/parse_utilcmd.c | 46 +-
 src/backend/utils/adt/ruleutils.c  |  4 ++
 src/bin/pg_dump/pg_dump.c  | 31 --
 src/bin/pg_dump/pg_dump.h  |  2 +
 src/bin/psql/describe.c| 34 +++---
 src/bin/psql/tab-complete.c|  4 +-
 src/include/catalog/index.h|  2 +
 src/include/catalog/pg_constraint.h| 42 +++--
 src/include/catalog/pg_constraint_fn.h |  1 +
 src/include/catalog/pg_trigger.h   | 16 ++---
 src/include/commands/trigger.h |  1 +
 src/include/nodes/parsenodes.h |  6 +-
 src/include/utils/reltrigger.h |  1 +
 src/test/regress/input/constraints.source  | 51 +++
 src/test/regress/output/constraints.source | 54 +++-
 33 files changed, 418 insertions(+), 93 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index ef60a58..1bc35dc 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -,6 +,13 @@ SCRAM-SHA-256$iteration count:
  
 
  
+  conalwaysdeferred
+  bool
+  
+  Is the constraint always deferred?
+ 
+
+ 
   convalidated
   bool
   
@@ -6968,6 +6975,13 @@ SCRAM-SHA-256$iteration count:
  
 
  
+  tgalwaysdeferred
+  bool
+  
+  True if constraint trigger is always deferred
+ 
+
+ 
   tgnargs
   int2
   
@@ -7029,7 +7043,8 @@ SCRAM-SHA-256$iteration count:

 When tgconstraint is nonzero,
 tgconstrrelid, tgconstrindid,
-tgdeferrable, and tginitdeferred are
+tgdeferrable, tginitdeferred, and
+tgalwaysdeferred are
 largely redundant with the referenced pg_constraint entry.
 However, it is possible for a non-deferrable trigger to be associated
 with a deferrable constraint: foreign key constraints can have some
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index b4b8dab..fe24521 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -55,7 +55,7 @@ ALTER TABLE [ IF EXISTS ] name
 ALTER [ COLUMN ] column_name SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN }
 ADD table_constraint [ NOT VALID ]
 ADD table_constraint_using_index
-ALTER CONSTRAINT constraint_name [ DEFERRABLE | NOT DEFERRABLE ] [ INITIALLY DEFERRED | INITIALLY IMMEDIATE ]
+ALTER CONSTRAINT constraint_name [ DEFERRABLE | NOT DEFERRABLE | ALWAYS DEFERRED ] [ INITIALLY DEFERRED | INITIALLY IMMEDIATE ]
 VALIDATE CONSTRAINT constraint_name
 DROP CONSTRAINT [ IF EXISTS ]  constraint_name [ RESTRICT | CASCADE ]
 DISABLE TRIGGER [ trigger_name | ALL | USER ]
@@ -89,7 +89,7 @@ ALTER TABLE [ IF EXISTS ] name
 
 [ CONSTRAINT constraint_name ]
 { UNIQUE | PRIMARY KEY } USING INDEX index_name
-[ DEFERRABLE | NOT DEFERRABLE ] [ INITIALLY DEFERRED | INITIALLY IMMEDIATE ]
+[ DEFERRABLE | NOT DEFERRABLE | ALWAYS DEFERRED ] [ INITIALLY DEFERRED | INITIALLY IMMEDIATE ]
 
  
 
diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 2db2e9f..cf1ba1c 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -67,7 +67,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
   PRIMARY KEY index_parameters |
   REFERENCES reftable [ ( refcolumn ) ] [ MATCH 

Re: [HACKERS] Interest in a SECURITY DEFINER function current_user stack access mechanism?

2017-10-19 Thread Nico Williams
A bit more about why I want this.

Suppose you have an app like PostgREST (a RESTful, Haskell-coded, HTTP
front-end for PostgreSQL).  PostgREST is basically a proxy for PG
access.
Users authenticate to the proxy.  The proxy authenticates to PG with its
own credentials, then it does something like SET ROLE or SET SESSION
AUTHORIZATION to impersonate the user to PG.

Now suppose you want to support impersonation in such a proxy.  That is,
that user "Joe" can impersonate "Jane", for example.  So what you do is
you have the proxy do this:

  -- This is the role authenticated to the proxy:
  SET SESSION AUTHORIZATION <proxy-authenticated role>;

  -- This is the requested impersonation, and succeeds only if the first
  -- role has been GRANTed the second:
  SET SESSION ROLE <impersonated role>;

Convenient!

Now, if you want to... audit what Joe does, you may want to record both,
the session_user (Joe) and the current_user (Jane) as you write the
audit trail.  Naturally, that's desirable, and it works...

...unless the audit procedures are SECURITY DEFINER.  Then they don't
see the current_user.  They see the session_user, but can't see who the
session user was impersonating.

Hence I want to be able to look back through the [security definer]
function invocation stack to find what current_user was at the
top-level, or even just the calling function's current_user.

Now, since this is a proxy, a workaround is to store the impersonated
role name in an application defined GUC.  But if ever you wanted to give
users direct PG access (one might! it should be possible), then that's
not enough because they can set that GUC.  So it really has to be that
the audit procedures can look up the stack.

(When you give out direct PG access you really want to make those audit
procedures SECURITY DEFINER, so they can do DMLs on tables that the
session_user can't.)
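
A minimal sketch of the audit setup I have in mind (the table and
function names are hypothetical), showing where the information is lost:

  CREATE TABLE audit_log (
      at         timestamptz NOT NULL DEFAULT now(),
      sess_user  text        NOT NULL,
      curr_user  text        NOT NULL,
      row_data   jsonb       NOT NULL
  );

  CREATE FUNCTION audit_row() RETURNS trigger
  LANGUAGE plpgsql SECURITY DEFINER AS $$
  BEGIN
      -- Inside a SECURITY DEFINER function current_user is the *definer*,
      -- so the role being impersonated (via SET SESSION ROLE) is not
      -- visible here; only session_user (the role the proxy set with
      -- SET SESSION AUTHORIZATION) can be recorded.
      INSERT INTO audit_log (sess_user, curr_user, row_data)
      VALUES (session_user, current_user, to_jsonb(NEW));
      RETURN NEW;
  END $$;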

This isn't urgent _for me_, but it is a real problem.

Nico
-- 




Re: [HACKERS] Interest in a SECURITY DEFINER function current_user stack access mechanism?

2017-10-18 Thread Nico Williams
On Wed, Oct 18, 2017 at 02:45:47PM -0700, David G. Johnston wrote:
> On Wed, Oct 18, 2017 at 2:30 PM, Nico Williams <n...@cryptonector.com>
> wrote:
> > On Wed, Oct 18, 2017 at 02:13:29PM -0700, David G. Johnston wrote:
> > > > More useful than this, for me, would be a way to get the top-most user.
> > >
> > > That would be "session_user"?
> >
> > It's not quite since there's a difference between SET SESSION
> > AUTHORIZATION and SET SESSION ROLE.
> >
> > But yes, it's what I'm using now.
> 
> True, though at that point the superuser who wants to cover their tracks
> could probably just edit your functions...

I don't worry about superusers.

However, I'd like for there to be a way to drop privileges permanently
for a session.  Something like SET SESSION AUTHORIZATION WITH NORESET
(ala MySQL) or SET SESSION AUTHENTICATION.

> > Hmmm, oh, I forgot about GET DIAGNOSTICS!  The stack is already exposed
> > to SQL.  Maybe we could add a CURRENT_USER item to GET STACKED
> > DIAGNOSTICS or to the PG_CONTEXT.
> 
> Ideally if implementing what you describe we'd want it accessible from any
> procedural language​, not just pl/pgsql.

Good point.  So a function.  Got it.

> I'd probably expose the stack as an array...

I agree, but that would be more expensive, since it means marshalling
all the information, even if the caller only wants one specific item.
Whereas accessing a specific frame by number is much simpler and
performant (no allocation).

It's also easier not to have to do something like parsing the
PG_CONTEXT, and instead access each of any number of attributes we might
expose from each frame.

Nico
-- 




Re: [HACKERS] Interest in a SECURITY DEFINER function current_user stack access mechanism?

2017-10-18 Thread Nico Williams
On Wed, Oct 18, 2017 at 02:13:29PM -0700, David G. Johnston wrote:
> > More useful than this, for me, would be a way to get the top-most user.
> 
> That would be "session_user"?

It's not quite since there's a difference between SET SESSION
AUTHORIZATION and SET SESSION ROLE.

But yes, it's what I'm using now.

> > Introducing the concept of a stack at the SQL level here seems, at
> > > first glance, to be over-complicating things.
> >
> > Because of the current implementation of invocation of SECURITY DEFINER
> > functions, a stack is trivial to build, since it's a list of nodes
> > allocated on the C stack in fmgr_security_definer().
> 
> Not saying its difficult (or not) to code in C; but exposing that to SQL
> seems like a big step.

Really?  Why?  I mean, there's an implicit function invocation stack
already.  Reifying some bits of the function call stack is useful.  I
can't think of how this particular reification would be dangerous or set
a bad precedent.

Hmmm, oh, I forgot about GET DIAGNOSTICS!  The stack is already exposed
to SQL.  Maybe we could add a CURRENT_USER item to GET STACKED
DIAGNOSTICS or to the PG_CONTEXT.
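
For reference, this is how the stack is already visible from PL/pgSQL
today -- PG_CONTEXT is a single text blob that would need parsing to get
at individual frames:

  DO $$
  DECLARE
      ctx text;
  BEGIN
      GET DIAGNOSTICS ctx = PG_CONTEXT;   -- the current call stack, as text
      RAISE NOTICE 'call stack: %', ctx;
  END $$;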

> If I was in position to dive deeper I wouldn't foreclose on the stack idea
> but I'd be inclined to see if something else could be made to work with
> reasonable effort.

I would think that the more general approach, if easy enough to
implement, would be better.  I can (and will) live with using
session_user instead of current_user, for now.  But I'm willing to
contribute a patch.

Nico
-- 




Re: [HACKERS] Interest in a SECURITY DEFINER function current_user stack access mechanism?

2017-10-18 Thread Nico Williams
On Wed, Oct 18, 2017 at 01:43:30PM -0700, David G. Johnston wrote:
> Regardless of the merits of the proposed feature, the function
> "session_user" is SQL-defined and should not be modified or enhanced.
> 
> I could see "calling_role()" being useful - it returns the same value
> as "current_role" normally and in security invoker functions while in
> a security definer function it would return whatever current_role
> would have returned if the function was a security invoker (i.e., the
> role that the system will put back into effect once the security
> definer function returns).

That... could be awkward where lots of SECURITY DEFINER functions may be
user-callable, but also called from each other.  But it would be
minimally useful.

More useful than this, for me, would be a way to get the top-most user.

> Introducing the concept of a stack at the SQL level here seems, at
> first glance, to be over-complicating things.

Because of the current implementation of invocation of SECURITY DEFINER
functions, a stack is trivial to build, since it's a list of nodes
allocated on the C stack in fmgr_security_definer().

Nico
-- 




Re: [HACKERS] Interest in a SECURITY DEFINER function current_user stack access mechanism?

2017-10-18 Thread Nico Williams
On Wed, Oct 18, 2017 at 10:15:01PM +0200, Pavel Stehule wrote:
> there is a function session_user() already

But it doesn't do this.  Are you saying that I should add a
session_user(int)?

Nico
-- 




Re: [HACKERS] Interest in a SECURITY DEFINER function current_user stack access mechanism?

2017-10-18 Thread Nico Williams
Alternatively, a way to get at the OuterUserId?  Or the outer-most
current_user in the function stack?

I should explain why I need this: for audit functionality where I want
the triggers' procedures to be SECURITY DEFINER so only they can write
to audit tables and such, but I want them to see the current_user of the
*caller*, rather than current_user being the DEFINER's name.
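
To make that concrete, here is roughly what I want to be able to write
(pg_current_user() is the hypothetical function proposed in my original
mail below; the rest is ordinary):

  CREATE TABLE audit_log (
    tbl     text,
    who     text,
    at      timestamptz NOT NULL DEFAULT now(),
    new_row jsonb
  );

  CREATE FUNCTION audit_ins() RETURNS trigger
  LANGUAGE plpgsql SECURITY DEFINER AS $$
  BEGIN
    -- pg_current_user(1) would be the caller's current_user, not the
    -- definer's; today this ends up recording the definer (or I settle
    -- for session_user)
    INSERT INTO audit_log (tbl, who, new_row)
    VALUES (TG_TABLE_NAME, pg_current_user(1), to_jsonb(NEW));
    RETURN NEW;
  END $$;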

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Interest in a SECURITY DEFINER function current_user stack access mechanism?

2017-10-18 Thread Nico Williams
It'd be nice if SECURITY DEFINER functions could see what user invoked
them, but current_user is the DEFINER user, naturally, since that's how
this is done in fmgr_security_definer().

I was thinking that fmgr_security_definer() could keep a global pointer
to a linked list (with automatic nodes) of the save_userid values.  Then
we could have a SQL function for accessing these, something like
pg_current_user(level int) returning text, where level 0 is
current_user, level 1 is "the previous current_user in the stack", and
so on, returning null when level is beyond the top-level.

This seems like a simple, small, easy patch, and since I [think I] need
it I suspect others probably do as well.

Thoughts?

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Add ALWAYS DEFERRED option for constraints

2017-10-11 Thread Nico Williams
FYI, I've added my patch to the commitfest.

https://commitfest.postgresql.org/15/1319/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How does postgres store the join predicate for a relation in a given query

2017-10-10 Thread Nico Williams
On Tue, Oct 10, 2017 at 07:29:24PM +0530, Gourav Kumar wrote:
> When you fire a query in postgresql, it will first parse the query and
> create the data structures for storing various aspects of the query and
> executing the query. (Like RangeTblEntry, PlannerInfo, RangeOptInfo etc.).
> 
> I want to know how does postgresql stores the join predicates of a query.
> Like which data structure is used to store the join predicates.
> 
> How can we find the join predicates applied on a relation from relid, Oid
> or RangeTblEntry ?
> 
> I want to construct a join graph for a given query, for which I need the
> join predicates between two relations.

In the usingClause or quals fields of a JoinExpr.  See
src/backend/parser/gram.y, search for join_qual.

Of course, WHERE clauses have to be inspected as well, which go into the
whereClause of a SelectStmt; search for where_clause in
src/backend/parser/gram.y.
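
That is, syntactically a join predicate can arrive in any of these
forms, and they land in different fields of the parse tree:

  -- ends up in JoinExpr.usingClause
  SELECT * FROM a JOIN b USING (id);

  -- ends up in JoinExpr.quals
  SELECT * FROM a JOIN b ON a.id = b.id;

  -- ends up in SelectStmt.whereClause
  SELECT * FROM a, b WHERE a.id = b.id;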

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Discussion on missing optimizations

2017-10-07 Thread Nico Williams
On Fri, Oct 06, 2017 at 10:19:54PM -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-10-06 21:33:16 -0400, Adam Brusselback wrote:
> >> The article in question is here:
> >> https://blog.jooq.org/2017/09/28/10-cool-sql-optimisations-that-do-not-depend-on-the-cost-model/
> 
> > That's interesting.
> 
> The impression I have in a quick scan is that probably hardly any of these
> are cases that any of the DB designers think are important in themselves.

That's true for some of those.  But some of them might become important
when you start pushing WHERE constraints from outside into inner table
sources and subqueries, as dumb-looking constraints can simply appear
from pushing non-dumb-looking constraints.

More than the operator-level simplifications discussed above, these
optimizations would make a big difference for me:

 - turning subqueries into joins

 - turning ORs into UNIONs

   It is easy enough to work around the lack of this optimization in
   many cases (a sketch of the manual rewrite appears below), but it
   does make queries more verbose.

 - pushing WHERE constraints from outer queries into the table source
   queries (_including_ VIEWs)

 - determining that some table in a query that had WHERE constraints
   pushed into it... now has a very well-filled out lookup key,
   therefore it's the one that should be the table source to start
   the plan with, i.e., that it should be first in the outermost loop
   of a nested loop join

   For me these two would be huge wins.  I have to resort to
   functions with roughly the same body as views just so that I can
   have the optimizer pick the correct plan.  This causes a lot of
   code duplication in my schemas.

 - pushing WHERE constraints from outer queries into HAVING thence WHERE
   constraints on GROUP BY queries where the outer constraints are on
   columns used to GROUP BY

   I find myself making two versions of views that do aggregation: one
   that does not, and one that does.  This allows me to use the
   non-aggregating view in contexts where I need this optimization, but
   then I have to re-code the aggregation at that layer.  Again, lots of
   duplication.

These sorts of optimizations are huge.
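
For example, the OR-to-UNION rewrite mentioned above has to be done by
hand today; the workaround looks something like this (table and column
names are made up):

  -- an OR that spans both sides of a join often defeats index use:
  SELECT a.*
    FROM a JOIN b ON a.id = b.a_id
   WHERE a.x = 42 OR b.y = 42;

  -- the manual rewrite (UNION rather than UNION ALL, to keep the
  -- de-duplication the OR form gives you):
  SELECT a.*
    FROM a JOIN b ON a.id = b.a_id
   WHERE a.x = 42
  UNION
  SELECT a.*
    FROM a JOIN b ON a.id = b.a_id
   WHERE b.y = 42;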

> Rather, they fall out of more general optimization attempts, or not,
> depending on the optimization mechanisms in use in a particular DB.
> For example, reducing "WHERE 1=1" to "WHERE TRUE" and then to nothing
> comes out of a constant-subexpression-precalculation mechanism for us,
> whereas "WHERE column=column" doesn't fall to that approach.  ISTM it
> would be really dumb to expend planner cycles looking specifically for
> that case, so I guess that DB2 et al are finding it as a side-effect of
> some more general optimization ... I wonder what that is?

If you can reduce the number of compilations / optimization passes for
statements, then spending more time in the optimizer is not a big deal.
So, when invoked via PREPARE I would say spending more cycles looking
for this sort of thing is OK, but in many other cases it's not.

Also, sometimes these cases crop up due to pushing constraints into VIEWs
and sub-queries.  In those cases, constant sub-expression elimination can
be a win.

> (edit: a few minutes later, I seem to remember that equivclass.c has
> to do something special with the X=X case, so maybe it could do
> something else special instead, with little new overhead.)

I'd expect that column = column is not trivial to turn into TRUE, not
unless those columns are NOT NULLable.

> > 9. Unneeded Self JOIN
> 
> > Can't remember discussions of this.
> 
> I can't get very excited about that one either.
> 
> In the end, what the article fails to consider is that all of these are
> tradeoffs, not unalloyed goods.  If you spend planner cycles on every
> query to look for cases that only the most unabashedly brain-dead ORMs
> ever generate, you're not really doing your users a favor on balance.

I can't get very excited about this one either, though I do believe it
can arise as the author says, "when you build complex views and JOIN
them to each other".  Maybe I'm not excited about it because I've not
needed it :)

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] A hook for session start

2017-10-06 Thread Nico Williams
On Sat, Oct 07, 2017 at 05:44:00AM +0200, Pavel Stehule wrote:
> 2017-10-06 21:36 GMT+02:00 Nico Williams <n...@cryptonector.com>:
> > But the nice thing about them is that you need only create them once, so
> > leave them in the catalog.  Stats about them should not be gathered nor
> > stored, since they could be different per-session.
> 
> Unfortunately one field from pg_class are not static - reltuples should be
> per session.

It's "only an estimate" "used by the query planner".  We could estimate
zero for global temp tables, and the query planner can get the true
value from an internal temp table.

> But it can be moved to different table

That too, if it's OK.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] A hook for session start

2017-10-06 Thread Nico Williams
On Fri, Oct 06, 2017 at 08:51:53PM +0200, Pavel Stehule wrote:
> 2017-10-06 20:39 GMT+02:00 Nico Williams <n...@cryptonector.com>:
> > On Fri, Oct 06, 2017 at 06:37:57PM +0200, Pavel Stehule wrote:
> > > When we talked about this topic, there are two issues:
> > >
> > > a) probably not too hard issue - some internal data can be in session sys
> > > cache.
> > >
> > > b) the session sys data should be visible on SQL level too (for some
> > tools
> > > and consistency) - it is hard task.
> >
> > Can you expand on this?
> 
> If global temporary tables should be effective, then you have not have
> modify system catalogue after creating. But lot of processes requires it -
> ANALYZE, query planning.

But the nice thing about them is that you need only create them once, so
leave them in the catalog.  Stats about them should not be gathered nor
stored, since they could be different per-session.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] A hook for session start

2017-10-06 Thread Nico Williams
On Fri, Oct 06, 2017 at 06:37:57PM +0200, Pavel Stehule wrote:
> 2017-10-06 6:48 GMT+02:00 Nico Williams <n...@cryptonector.com>:
> > On Fri, Oct 06, 2017 at 04:52:09AM +0200, Pavel Stehule wrote:
> > > Current TEMP tables, if you do it for any session has pretty significant
> > > overhead  - with possible risk of performance lost (system catalog
> > bloat).
> >
> > Because of the DDLs for them?
> 
> yes - pg_attribute, pg_class, pg_stats are bloating - and when these tables
> are bloated, then DDL is slow.

:(

> > No, I want GLOBAL TEMP tables.
> 
> me too :) - and lot of customer and users.

> I though about it, but I have other on my top priority. GLOBAL TEMP TABLE
> is on 90% unlogged table. But few fields should be session based instead
> shared persistent - statistics, rows in pg_class, filenode.

Unlogged tables don't provide isolation between sessions the way temp
tables do, so I don't see the connection.

But the necessary components (temp heaps and such) are all there, and I
suspect a PoC could be done fairly quickly.  But there are some
subtleties like that FKs between GLOBAL TEMP and persistent tables must
not be allowed (in either direction), so a complete implementation will
take significant work.

The work looks like:

 - add syntax (trivial; a sketch of what I have in mind follows this
   list)

 - add new kind of persistence (lots of places to touch, but it's mostly
   mechanical)

 - redirect all references to global temp table contents to temp
   heaps/indexes/whatever

 - add logic to prevent FKs between persistent and global temp tables

 - what else?
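
The syntax I have in mind is the Oracle-ish one.  PG already accepts the
GLOBAL keyword here but ignores it and creates an ordinary temp table;
the idea is to give it (or some new spelling of it) the real meaning:
catalog entries created once and shared, contents and stats per-session:

  -- sketch only; today this is just a plain temp table in PG
  CREATE GLOBAL TEMPORARY TABLE session_scratch (
    k text PRIMARY KEY,
    v jsonb
  ) ON COMMIT PRESERVE ROWS;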

> When we talked about this topic, there are two issues:
> 
> a) probably not too hard issue - some internal data can be in session sys
> cache.
> 
> b) the session sys data should be visible on SQL level too (for some tools
> and consistency) - it is hard task.

Can you expand on this?

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] A hook for session start

2017-10-05 Thread Nico Williams
On Fri, Oct 06, 2017 at 11:04:38AM +0800, Craig Ringer wrote:
> On 6 October 2017 at 10:52, Pavel Stehule  wrote:
> 
> > It is better to work on GLOBAL TEMP tables.
> >
> > Current TEMP tables, if you do it for any session has pretty significant
> > overhead  - with possible risk of performance lost (system catalog bloat).
> >
> > pretty significant performance issue of my customers are related to temp
> > tables usage (under high load)
> 
> I've seen the same thing too. Especially when combined with logical
> decoding, where IIRC we mark transactions as having catalog changes
> due to temp tables.
> 
> Sometimes the catalog bloat can be truly horrible when a user has
> hundreds of plpgsql functions that all like to make temp tables.

I agree that we should have GLOBAL TEMP tables, but also we should have
a pg_temp_catalog where all TEMP schema elements go...  (That, I'm sure,
would be a lot of work.)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] A hook for session start

2017-10-05 Thread Nico Williams
On Fri, Oct 06, 2017 at 04:52:09AM +0200, Pavel Stehule wrote:
> 2017-10-05 22:31 GMT+02:00 Nico Williams <n...@cryptonector.com>:
> > On Tue, Aug 01, 2017 at 03:36:23PM -0400, Peter Eisentraut wrote:
> > > On 7/21/17 13:14, Jim Mlodgenski wrote:
> > > > When I first saw this thread, my initial thought of a use case is to
> > > > prepare some key application queries so they are there and ready to go.
> > > > That would need to be before the ExecutorStart_hook or
> > > > ProcessUtility_hook if an app would just want to execute the prepared
> > > > statement.
> > >
> > > Isn't that what the preprepare extension does already?
> >
> > more generic facility -> more useful
> >
> > My use case is to pre-create TEMP schema elements that VIEWs, FUNCTIONs,
> > and TRIGGERs, might need.
> 
> It is better to work on GLOBAL TEMP tables.

I don't disagree.

In fact, I was scoping out what it might take to do that just yesterday.

I've two thoughts on that: either a new relpersistence kind that is very
similar to persistent, but which always uses temp heaps, or a modifier
for the persistent kind that says to use temp heaps.  Either way it
looks like it should be fairly straightforward (but then, I've only
ever written one thing for PG, earlier this week, the ALWAYS DEFERRED
thing).

> Current TEMP tables, if you do it for any session has pretty significant
> overhead  - with possible risk of performance lost (system catalog bloat).

Because of the DDLs for them?

> So often creating local temp tables is antipattern (in Postgres)
> unfortunately.

I do it plenty, but sometimes I use an UNLOGGED table with a txid column
in the PK set to txid_current(), then I clean up where I can.  It'd be
nice to have COMMIT triggers for cleaning up such rows, among other
things.  I've implemented that using DDL event triggers, but to perform
well it needs to be a native feature.
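
Roughly, that workaround looks like this (names made up; the DELETE is
the cleanup I would much rather hang off a commit trigger):

  CREATE UNLOGGED TABLE scratch (
    txid bigint NOT NULL DEFAULT txid_current(),
    k    text   NOT NULL,
    v    jsonb,
    PRIMARY KEY (txid, k)
  );

  -- opportunistic cleanup: drop rows left by transactions that can no
  -- longer be running
  DELETE FROM scratch
   WHERE txid < txid_snapshot_xmin(txid_current_snapshot());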

> I am not sure, if we should to support this case more :( Probably is
> better, so it is hard to use local TEMP tables.

No, I want GLOBAL TEMP tables.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] fork()-safety, thread-safety

2017-10-05 Thread Nico Williams
On Thu, Oct 05, 2017 at 03:34:41PM -0700, Andres Freund wrote:
> On 2017-10-05 17:31:07 -0500, Nico Williams wrote:
> > > >vfork() is widely demonized, but it's actually quite superior
> > > >(performance-wise) to fork() when all you want to do is exec-or-exit
> > > >since no page copying (COW or otherwise) needs be done when using
> > > >vfork().
> > > 
> > > Not on linux, at least not as of a year or two back.
> > 
> > glibc has it.  Other Linux C libraries might also; I've not checked them
> > all.
> 
> It has it, but it's not more efficient.

Because of signal-blocking issues?

> > > I do think it'd be good to move more towards threads, but not at all for
> > > the reasons mentioned here.
> > 
> > You don't think eliminating a large difference between handling of WIN32
> > vs. POSIX is a good reason?
> 
> I seems like you'd not really get a much reduced set of differences,
> just a *different* set of differences. After investing time.

Fair enough.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] fork()-safety, thread-safety

2017-10-05 Thread Nico Williams
On Thu, Oct 05, 2017 at 03:13:07PM -0700, Andres Freund wrote:
> On 2017-10-05 17:02:22 -0500, Nico Williams wrote:
> >A quick look at the functions called on the child side of fork()
> >makes me think that it's unlikely that the children here use
> >async-signal-safe functions only.
> 
> That's not a requirement unless you're using fork *and* threads. At
> least by my last reading of posix and common practice.

True, yes.  One still has to be careful to fflush() all open FILEs (that
might be used on both sides of fork()) and such though.

> >  - fork() is used in a number of places where execl() or execv() are
> >called immediately after (and exit() if the exec fails).
> > 
> >It would be better to use vfork() where available and _exit() instead
> >of exit().
> 
> vfork is less portable, and doesn't really win us anything on common
> platforms. On most it's pretty much the same implementation.

It's trivial to use it where available, and fork() otherwise.  Mind you,
all current versions of Solaris/Illumos, *BSD, OS X, and Linux w/glibc
(and even Windows with WSL!) have a true vfork().

> >vfork() is widely demonized, but it's actually quite superior
> >(performance-wise) to fork() when all you want to do is exec-or-exit
> >since no page copying (COW or otherwise) needs be done when using
> >vfork().
> 
> Not on linux, at least not as of a year or two back.

glibc has it.  Other Linux C libraries might also; I've not checked them
all.

> I do think it'd be good to move more towards threads, but not at all for
> the reasons mentioned here.

You don't think eliminating a large difference between handling of WIN32
vs. POSIX is a good reason?

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] fork()-safety, thread-safety

2017-10-05 Thread Nico Williams
A thread on parallelization made me wonder so I took a look:

 - src/bin/*/parallel.c uses threads on WIN32
 - src/bin/*/parallel.c uses fork() on not-WIN32

   (Ditto src/bin/pg_basebackup/pg_basebackup.c and
   src/backend/postmaster/syslogger.c.)

   A quick look at the functions called on the child side of fork()
   makes me think that it's unlikely that the children here use
   async-signal-safe functions only.

   Why not use threads on all systems where threads are available when
   we'd use threads on some such systems?  If this code is thread-safe
   on WIN32, why wouldn't it be thread-safe on POSIX?  (Well, naturally
   there may be calls to, e.g., getpwnam() and such that would not be
   thread-safe on POSIX, and which might not exist on WIN32.  But I
   mean, aside from that, if synchronization is done correctly on WIN32,
   what would stop that from being true on POSIX?)

 - fork() is used in a number of places where execl() or execv() are
   called immediately after (and exit() if the exec fails).

   It would be better to use vfork() where available and _exit() instead
   of exit().

   Alternatively posix_spawn() should be used (which generally uses
   vfork() or equivalent under the covers).

   vfork() is widely demonized, but it's actually quite superior
   (performance-wise) to fork() when all you want to do is exec-or-exit
   since no page copying (COW or otherwise) needs be done when using
   vfork().

   It's actually safer to use vfork() because POSIX limits one to
   async-signal-safe functions between fork() and exec-or-exit...  With
   fork(), where neither the parent nor the child immediately execs-or-
   exits, it's too easy to fail to make sure that the code they execute
   is fork-safe.  Whereas with vfork() the fact that the parent (just
   the one thread, incidentally, not all of them[*]) blocks until the
   child execs-or-exits means it's impossible to fail to notice a
   long-running child that does lots of fork-unsafe work.

   It's safer still to use posix_spawn(), naturally.

In Unix-land it's standard practice to ignore the async-signal-safe
requirement when using fork() early on in a daemon's life to start
worker processes.  This is fine, of course, though if we're using
CreateProcess*()/_spawn() on WIN32 anyways, it might be best to do the
equivalent on Unix and just spawn the children -- if nothing else, this
would reduce the likelihood of unintended divergence between WIN32 and
Unix.

Nico

[*] Actually, I do believe that on Solaris/Illumos vfork() stops all
threads in the parent, if I remember correctly anyways.  Linux's and
NetBSD's vfork() only stops the one thread in the parent that called
it.  I haven't checked other BSDs.  There was a patch for NetBSD to
stop all threads in the parent, but I convinced the NetBSD community
to discard that patch.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] A hook for session start

2017-10-05 Thread Nico Williams
On Tue, Aug 01, 2017 at 03:36:23PM -0400, Peter Eisentraut wrote:
> On 7/21/17 13:14, Jim Mlodgenski wrote:
> > When I first saw this thread, my initial thought of a use case is to
> > prepare some key application queries so they are there and ready to go.
> > That would need to be before the ExecutorStart_hook or
> > ProcessUtility_hook if an app would just want to execute the prepared
> > statement.
> 
> Isn't that what the preprepare extension does already?

more generic facility -> more useful

My use case is to pre-create TEMP schema elements that VIEWs, FUNCTIONs,
and TRIGGERs, might need.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] A hook for session start

2017-10-05 Thread Nico Williams
On Fri, Jul 21, 2017 at 11:10:52PM +0800, Craig Ringer wrote:
> What practical use cases are there for acting post-auth but that can't wait
> until the user tries to do something?

Creating TEMP schema that triggers and functions might need.

Doing CREATE TEMP TABLE IF NOT EXISTS in triggers slows things down.
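
That is, without a session-start hook every trigger body ends up
guarding its own setup on every invocation, something like this (a
sketch; assumes the audited tables have an id column):

  CREATE FUNCTION note_change() RETURNS trigger
  LANGUAGE plpgsql AS $$
  BEGIN
    -- this guard runs for every row of every statement, which is the
    -- slowdown complained about above
    CREATE TEMP TABLE IF NOT EXISTS pending_changes (
      tbl text   NOT NULL,
      id  bigint NOT NULL
    ) ON COMMIT DELETE ROWS;
    INSERT INTO pending_changes VALUES (TG_TABLE_NAME, NEW.id);
    RETURN NEW;
  END $$;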

It'd be super nice if PostgreSQL had some sort of persistent TEMP
schema option, where you can have schema elements that are persistent
in that they're always there, but where the data is all TEMP.  Oracle
has this and they call it GLOBAL TEMP IIRC.  There would be some
caveats, such as not being able to have FKs between these sorts of
persistent temp tables and persistent tables.

In the absence of such a feature, a session hook/trigger is a great
workaround.

> Can a user do anything remotely interesting or useful without hitting
> either ExecutorStart_hook or ProcessUtility_hook? They can parse queries I
> guess but you could just set your hook up in the parser instead. If you
> hook the parser all they can do is open an idle session and sit there...

In any other hook you'd have to check whether the session setup work you
wanted to do has been done.  That could be potentially slow.

I actually have an all SQL implementation of session/begin/commit
triggers.  The session triggers in that implementation only run on the
first DML statement, which could be too late for OP's purpose.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw super user checks

2017-10-05 Thread Nico Williams
On Thu, Sep 14, 2017 at 04:08:08PM -0400, Robert Haas wrote:
> On Thu, Sep 14, 2017 at 2:33 PM, Jeff Janes  wrote:
> > I think that foreign tables ought to behave as views do, where they run as
> > the owner rather than the invoker.  No one has talked me out of it, but no
> > one has supported me on it either.  But I think it is too late to change
> > that now.
> 
> That's an interesting point.  I think that you can imagine use cases
> for either method.  Obviously, if what you want to do is drill a hole
> through the Internet to another server and then expose it to some of
> your fellow users, having the FDW run with the owner's permissions
> (and credentials) is exactly right.  But there's another use case too,
> which is where you have something that looks like a multi-user
> sharding cluster.  You want each person's own credentials to carry
> over to everything they do remotely.

Hmmm, I don't think that's really right.

What I'd like instead is for the FDW client to tell the FDW server the
session_user/current_user on behalf of which it's acting, and let the
FDW server decide how to proceed.  This could be done by doing a SET
SESSION fdw.client.session_user... and so on.
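
Concretely, something like this (the setting name is made up, and
postgres_fdw does not do any of this today; custom settings only need a
dotted prefix):

  -- FDW client side, when establishing/acquiring the remote session:
  SET SESSION fdw_client.calling_user = 'foo@BAR';

  -- FDW server side, e.g. in views, triggers, or RLS policies:
  SELECT current_setting('fdw_client.calling_user', true);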

We use Kerberos principal names as PG user/role names, _with_ @REALM
included, so a user foo@BAR is likely to make sense to the FDW server.

Of course, if you're not using Kerberos then the local and remote user
namespaces might be completely distinct, but by letting the FDW server
know a) the FDW client's username (via authentication) and b) the true
username on the client side (via SET/set_config()), the server might
have enough information to decide whether it trusts (a) to impersonate
(b) and how to map (b) to a local user.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Add ALWAYS DEFERRED option for constraints

2017-10-05 Thread Nico Williams
I accidentally typoed when saving a file.  Here's the new patch with
that typo corrected, changes to information_schema dropped, and with the
addition of tab completion of ALWAYS DEFERRED in psql.

Nico
-- 
>From 97d3db0be9307eff5919821db7fc437da52ef7e3 Mon Sep 17 00:00:00 2001
From: Nicolas Williams 
Date: Tue, 3 Oct 2017 00:33:09 -0500
Subject: [PATCH] Add ALWAYS DEFERRED option for CONSTRAINTs

and CONSTRAINT TRIGGERs.

This is important so that one can have triggers and constraints that
must run after all of the user/client's statements in a transaction
(i.e., at COMMIT time), so that the user/client may make no further
changes (triggers, of course, still can).
---
 doc/src/sgml/catalogs.sgml | 17 -
 doc/src/sgml/ref/alter_table.sgml  |  4 +-
 doc/src/sgml/ref/create_table.sgml | 10 ++-
 doc/src/sgml/ref/create_trigger.sgml   |  2 +-
 doc/src/sgml/trigger.sgml  |  1 +
 src/backend/bootstrap/bootparse.y  |  2 +
 src/backend/catalog/heap.c |  1 +
 src/backend/catalog/index.c|  8 +++
 src/backend/catalog/information_schema.sql |  8 +++
 src/backend/catalog/pg_constraint.c|  2 +
 src/backend/catalog/toasting.c |  2 +-
 src/backend/commands/indexcmds.c   |  2 +-
 src/backend/commands/tablecmds.c   | 20 +-
 src/backend/commands/trigger.c | 28 +++--
 src/backend/commands/typecmds.c|  3 +
 src/backend/nodes/copyfuncs.c  |  3 +
 src/backend/nodes/outfuncs.c   |  4 ++
 src/backend/parser/gram.y  | 99 ++
 src/backend/parser/parse_utilcmd.c | 46 +-
 src/backend/utils/adt/ruleutils.c  |  4 ++
 src/bin/pg_dump/pg_dump.c  | 31 --
 src/bin/pg_dump/pg_dump.h  |  2 +
 src/bin/psql/describe.c| 34 +++---
 src/bin/psql/tab-complete.c|  4 +-
 src/include/catalog/index.h|  2 +
 src/include/catalog/pg_constraint.h| 42 +++--
 src/include/catalog/pg_constraint_fn.h |  1 +
 src/include/catalog/pg_trigger.h   | 16 ++---
 src/include/commands/trigger.h |  1 +
 src/include/nodes/parsenodes.h |  6 +-
 src/include/utils/reltrigger.h |  1 +
 src/test/regress/input/constraints.source  | 51 +++
 src/test/regress/output/constraints.source | 54 +++-
 33 files changed, 418 insertions(+), 93 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 9af77c1..2c3ed23 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2202,6 +2202,13 @@ SCRAM-SHA-256$iteration count:salt<
  
 
  
+  conalwaysdeferred
+  bool
+  
+  Is the constraint always deferred?
+ 
+
+ 
   convalidated
   bool
   
@@ -6948,6 +6955,13 @@ SCRAM-SHA-256$iteration count:salt<
  
 
  
+  tgalwaysdeferred
+  bool
+  
+  True if constraint trigger is always deferred
+ 
+
+ 
   tgnargs
   int2
   
@@ -7009,7 +7023,8 @@ SCRAM-SHA-256$iteration count:salt<

 When tgconstraint is nonzero,
 tgconstrrelid, tgconstrindid,
-tgdeferrable, and tginitdeferred are
+tgdeferrable, tginitdeferred, and
+tgalwaysdeferred are
 largely redundant with the referenced pg_constraint entry.
 However, it is possible for a non-deferrable trigger to be associated
 with a deferrable constraint: foreign key constraints can have some
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 0fb385e..e81d1fa 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -55,7 +55,7 @@ ALTER TABLE [ IF EXISTS ] name
 ALTER [ COLUMN ] column_name SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN }
 ADD table_constraint [ NOT VALID ]
 ADD table_constraint_using_index
-ALTER CONSTRAINT constraint_name [ DEFERRABLE | NOT DEFERRABLE ] [ INITIALLY DEFERRED | INITIALLY IMMEDIATE ]
+ALTER CONSTRAINT constraint_name [ DEFERRABLE | NOT DEFERRABLE | ALWAYS DEFERRED ] [ INITIALLY DEFERRED | INITIALLY IMMEDIATE ]
 VALIDATE CONSTRAINT constraint_name
 DROP CONSTRAINT [ IF EXISTS ]  constraint_name [ RESTRICT | CASCADE ]
 DISABLE TRIGGER [ trigger_name | ALL | USER ]
@@ -89,7 +89,7 @@ ALTER TABLE [ IF EXISTS ] name
 
 [ CONSTRAINT constraint_name ]
 { UNIQUE | PRIMARY KEY } USING INDEX index_name
-[ DEFERRABLE | NOT DEFERRABLE ] [ INITIALLY DEFERRED | INITIALLY IMMEDIATE ]
+[ DEFERRABLE | NOT DEFERRABLE | ALWAYS DEFERRED ] [ INITIALLY DEFERRED | INITIALLY IMMEDIATE ]
 
  
 
diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 1477288..38c88b8 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -67,7 +67,7 @@ 

Re: [HACKERS] Possible SSL improvements for a newcomer to tackle

2017-10-04 Thread Nico Williams
On Wed, Oct 04, 2017 at 11:47:45AM -0700, Jeff Janes wrote:
> On Mon, Oct 2, 2017 at 9:33 PM, Tom Lane  wrote:
> > It's possible that we could adopt some policy like "if the root.crt file
> > exists then default to verify" ... but that seems messy and unreliable,
> > so I'm not sure it would really add any security.
> 
> That is what we do.  If root.crt exists, we default to verify-ca.
> 
> And yes, it is messy and unreliable.  I don't know if it adds any security
> or not.
> 
> Or do you mean we could default to verify-full instead of verify-ca?

I would rather psql defaulted to verify-full and let users deal with
errors by either a) configuring appropriate trust anchors and
provisioning appropriate certificates, or b) disabling verify-full.

Users should know that they are using psql(1) insecurely -- it has to be
obvious.

Yes, this would be a backwards-incompatible change, but security tends
to justify this sort of change.

Another possibility would be to make this default change only applicable
when using postgresql-scheme URIs (which I do, almost religiously --
they are much easier to use than all alternative connection data
specifications).
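
In the meantime one can of course opt in per connection (host and paths
below are just examples), e.g.:

  postgresql://app@db.example.com/appdb?sslmode=verify-full&sslrootcert=/etc/ssl/certs/ca.pem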

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Add ALWAYS DEFERRED option for constraints

2017-10-04 Thread Nico Williams
Ah, David Fetter points out that I should also update tab completion
for psql.  I'll do that at some point.  I notice there's no tab
completion for column constraint attributes...  If it's obvious enough
I'll try to fix that too.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Add ALWAYS DEFERRED option for constraints

2017-10-04 Thread Nico Williams
Ay, NOT WIP -- I left that in the Subject: by accident.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PATCH] WIP Add ALWAYS DEFERRED option for constraints

2017-10-04 Thread Nico Williams

[make check-world passes.  Tests and docs included.  Should be ready for
code review.]

Attached are patches to add an ALWAYS DEFERRED option to CONSTRAINTs and
CONSTRAINT TRIGGERs, meaning: SET CONSTRAINTS .. IMMEDIATE will not make
immediate any constraint/trigger that is declared as ALWAYS DEFERRED.

I.e., the opposite of NOT DEFERRED.

Motivation:

 - Security.

   One may have triggers that need to always be deferred, yet one cannot
   grant direct PG access because of SET CONSTRAINTS .. IMMEDIATE.

   I have such triggers that must run at the end of the transaction
   (after the last statement prior to COMMIT sent by the client/user),
   which I make DEFERRABLE, INITIALLY DEFERRED CONSTRAINT TRIGGERs.

   I have written SQL code to detect that constraint triggers have fired
   too soon, but I'd rather not need it as it does slow things down (it
   uses DDL event triggers and per-table triggers).

   Making it easier to write secure DEFERRED CONSTRAINT TRIGGER code
   seems like a good idea to me (see the usage sketch after this list).

 - Symmetry.

   Not using NOT DEFERRABLE is not the inverse of NOT DEFERRABLE.  There
   is no inverse at this time.

   If we can have NOT DEFERRABLE constraints, why not also the inverse,
   a constraint that cannot be made IMMEDIATE with SET CONSTRAINTs?
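
To illustrate the intended usage (table and function names are
illustrative; the syntax is as added by the attached patch):

  CREATE CONSTRAINT TRIGGER audit_at_commit
    AFTER INSERT OR UPDATE OR DELETE ON audited_table
    ALWAYS DEFERRED
    FOR EACH ROW EXECUTE PROCEDURE audit_at_commit_func();

  -- with the patch, this can no longer pull the trigger forward to
  -- fire at the end of the current statement:
  SET CONSTRAINTS ALL IMMEDIATE;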


I've *not* cleaned up C style issues in surrounding code -- I'm not sure
if that's desired.  Not cleaning up makes it easier to see what I
changed.

Some questions for experienced PostgreSQL developers:

Q0: Is this sort of patch welcomed?

Q1: Should new columns for pg_catalog tables go at the end, or may they
be added in the middle?

FYI, I'm adding them in the middle, so they are next to related
columns.

Q2: Can I add new columns to information_schema tables, or are there
standards-compliance issues with that?

This is done in the second patch, and it can be dropped safely.

Q3: Perhaps I should make this NOT IMMEDIATE rather than ALWAYS DEFERRED?
Making it NOT IMMEDIATE has the benefit of not having to change the
precedence of ALWAYS to avoid a shift/reduce conflict...  It may
also be more in keeping with NOT DEFERRED.  Thoughts?

Nico
-- 
>From 1d04483511f99cd3417df571ecc0498e928ace35 Mon Sep 17 00:00:00 2001
From: Nicolas Williams 
Date: Tue, 3 Oct 2017 00:33:09 -0500
Subject: [PATCH 1/2] Add ALWAYS DEFERRED option for CONSTRAINTs

and CONSTRAINT TRIGGERs.

This is important so that one can have triggers and constraints that
must run after all of the user/client's statements in a transaction
(i.e., at COMMIT time), so that the user/client may make no further
changes (triggers, of course, still can).
---
 doc/src/sgml/catalogs.sgml | 17 -
 doc/src/sgml/ref/alter_table.sgml  |  4 +-
 doc/src/sgml/ref/create_table.sgml | 10 ++-
 doc/src/sgml/ref/create_trigger.sgml   |  2 +-
 doc/src/sgml/trigger.sgml  |  1 +
 src/backend/bootstrap/bootparse.y  |  2 +
 src/backend/catalog/heap.c |  1 +
 src/backend/catalog/index.c|  8 +++
 src/backend/catalog/information_schema.sql |  8 +++
 src/backend/catalog/pg_constraint.c|  2 +
 src/backend/catalog/toasting.c |  2 +-
 src/backend/commands/indexcmds.c   |  2 +-
 src/backend/commands/tablecmds.c   | 20 +-
 src/backend/commands/trigger.c | 28 +++--
 src/backend/commands/typecmds.c|  3 +
 src/backend/nodes/copyfuncs.c  |  3 +
 src/backend/nodes/outfuncs.c   |  4 ++
 src/backend/parser/gram.y  | 99 ++
 src/backend/parser/parse_utilcmd.c | 46 +-
 src/backend/utils/adt/ruleutils.c  |  4 ++
 src/bin/pg_dump/pg_dump.c  | 31 --
 src/bin/pg_dump/pg_dump.h  |  2 +
 src/bin/psql/describe.c| 34 +++---
 src/include/catalog/index.h|  2 +
 src/include/catalog/pg_constraint.h| 42 +++--
 src/include/catalog/pg_constraint_fn.h |  1 +
 src/include/catalog/pg_trigger.h   | 16 ++---
 src/include/commands/trigger.h |  1 +
 src/include/nodes/parsenodes.h |  6 +-
 src/include/utils/reltrigger.h |  1 +
 src/test/regress/input/constraints.source  | 51 +++
 src/test/regress/output/constraints.source | 54 +++-
 32 files changed, 416 insertions(+), 91 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 9af77c1..2c3ed23 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2202,6 +2202,13 @@ SCRAM-SHA-256$iteration count:salt<
  
 
  
+  conalwaysdeferred
+  bool
+  
+  Is the constraint always deferred?
+ 
+
+ 
   convalidated
   bool
   
@@ -6948,6 +6955,13 @@ SCRAM-SHA-256$iteration count:salt<
  
 
  
+  tgalwaysdeferred
+  bool
+  
+  True 

Re: [HACKERS] [PATCH] WIP Add ALWAYS DEFERRED option for constraints

2017-10-03 Thread Nico Williams
On Tue, Oct 03, 2017 at 02:51:30PM -0500, Nico Williams wrote:
> Anyways, this patch is NOT passing tests at the moment, and I'm not sure
> why.  I'm sure I can figure it out, but first I need to understand the
> failures.  E.g., I see this sort of difference:
> 
>\d testschema.test_index1
>Index "testschema.test_index1"
> Column |  Type  | Definition
> --------+--------+------------
> id | bigint | id
>   -btree, for table "testschema.test_default_tab"
>   +f, for table "testschema.btree", predicate (test_default_tab)
> 
> which means, I think, that I've screwed up in src/bin/psql/describe.c,
> though it's not obvious to me yet how.

Ah, I needed to adjust references to result columns.  I suspect that
something similar relates to other remaining failures.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] WIP Add ALWAYS DEFERRED option for constraints

2017-10-03 Thread Nico Williams
On Tue, Oct 03, 2017 at 10:10:59PM +0200, Andreas Joseph Krogh wrote:
> +1.
> 
> While we're in deferrable constraints land...;  I even more often need 
> deferrable conditional unique-indexes.
> In PG you now may have:
> ALTER TABLE email_folder ADD CONSTRAINT some_uk UNIQUE (owner_id, 
> folder_type, 
> name) DEFERRABLE INITIALLY DEFERRED; 
> 
> But this isn't supported:
> CREATE UNIQUE INDEX some_uk ON email_folder(owner_id, folder_type, name) 
> WHERE 
> parent_id IS NULL DEFERRABLE INITIALLY DEFERRED;
> 
> Are there any plans to support this?

Not by me, but I can take a look and, if it is trivial, do it.  At a
quick glance it does look like it should be easy enough to do it, at
least to get started on a patch.  

If I can get some help with my current patch, I'll take a look :)

But yes, I'd like to have full consistency between CREATE and ALTER.
Everything that one can do with CREATE should be possible to do with
ALTER, including IF NOT EXISTS.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PATCH] WIP Add ALWAYS DEFERRED option for constraints

2017-10-03 Thread Nico Williams
Attached are patches to add an ALWAYS DEFERRED option to CONSTRAINTs and
CONSTRAINT TRIGGERs, meaning: SET CONSTRAINTS .. IMMEDIATE will not make
immediate any constraint/trigger that is declared as ALWAYS DEFERRED.

I.e., the opposite of NOT DEFERRED.  Perhaps I should make this NOT
IMMEDIATE?  Making it NOT IMMEDIATE has the benefit of not having to
change the precedence of ALWAYS to avoid a shift/reduce conflict...  It
may also be more in keeping with NOT DEFERRED.

Motivation:

 - I have trigger procedures that must run at the end of the transaction
   (after the last statement prior to COMMIT sent by the client/user),
   which I make DEFERRABLE, INITIALLY DEFERRED CONSTRAINT TRIGGERs out
   of, but SET CONSTRAINTS can be used to foil my triggers.  I have
   written SQL code to detect that constraint triggers have fired too
   soon, but I'd rather not need it.

 - Symmetry.  If we can have NOT DEFERRABLE constraints, why not also
   NOT IMMEDIABLE?  :)  Naturally "immediable" is not a word, but you
   get the point.

 - To learn my way around PostgreSQL source code in preparation for
   other contributions.

Anyways, this patch is NOT passing tests at the moment, and I'm not sure
why.  I'm sure I can figure it out, but first I need to understand the
failures.  E.g., I see this sort of difference:

   \d testschema.test_index1
   Index "testschema.test_index1"
Column |  Type  | Definition
 --------+--------+------------
id | bigint | id
  -btree, for table "testschema.test_default_tab"
  +f, for table "testschema.btree", predicate (test_default_tab)

which means, I think, that I've screwed up in src/bin/psql/describe.c,
though it's not obvious to me yet how.

Some questions for experienced PostgreSQL developers:

Q0: Is this sort of patch welcomed?

Q1: Should new columns for pg_catalog.pg_constraint go at the end, or may
they be added in the middle?

Q2: Can I add new columns to information_schema tables, or are there
standards-compliance issues with that?

Q3: Is the C-style for PG documented somewhere?  (sorry if I missed this)

Q4: Any ideas what I'm doing wrong in this patch series?

Nico
-- 
>From 02f2765bde7e7d4fd357853c33dac55e4bdc2732 Mon Sep 17 00:00:00 2001
From: Nicolas Williams 
Date: Tue, 3 Oct 2017 00:33:09 -0500
Subject: [PATCH 1/4] WIP: Add ALWAYS DEFERRED option for CONSTRAINTs

and CONSTRAINT TRIGGERs.

This is important so that one can have triggers and constraints that
must run after all of the user/client's statements in a transaction
(i.e., at COMMIT time), so that the user/client may make no further
changes (triggers, of course, still can).
---
 WIP| 25 +
 doc/src/sgml/catalogs.sgml |  7 +++
 doc/src/sgml/ref/alter_table.sgml  |  4 +-
 doc/src/sgml/ref/create_table.sgml | 10 ++--
 doc/src/sgml/ref/create_trigger.sgml   |  2 +-
 src/backend/catalog/heap.c |  1 +
 src/backend/catalog/index.c|  8 +++
 src/backend/catalog/information_schema.sql |  8 +++
 src/backend/catalog/pg_constraint.c|  2 +
 src/backend/catalog/toasting.c |  2 +-
 src/backend/commands/indexcmds.c   |  2 +-
 src/backend/commands/tablecmds.c   | 20 +++-
 src/backend/commands/trigger.c | 25 +++--
 src/backend/commands/typecmds.c|  3 ++
 src/backend/parser/gram.y  | 81 --
 src/backend/parser/parse_utilcmd.c |  1 +
 src/backend/utils/adt/ruleutils.c  |  2 +
 src/bin/pg_dump/pg_dump.c  | 23 +++--
 src/bin/pg_dump/pg_dump.h  |  2 +
 src/bin/psql/describe.c| 15 --
 src/include/catalog/index.h|  2 +
 src/include/catalog/pg_constraint.h|  4 +-
 src/include/catalog/pg_constraint_fn.h |  1 +
 src/include/catalog/pg_trigger.h   |  4 +-
 src/include/commands/trigger.h |  1 +
 src/include/nodes/parsenodes.h |  6 ++-
 src/include/utils/reltrigger.h |  1 +
 27 files changed, 221 insertions(+), 41 deletions(-)
 create mode 100644 WIP

diff --git a/WIP b/WIP
new file mode 100644
index 000..806df83
--- /dev/null
+++ b/WIP
@@ -0,0 +1,25 @@
+WIP notes for adding ALWAYS DEFERRED option for CONSTRAINTs and CONSTRAINT TRIGGERs
+===
+
+ - add ALWAYS DEFERRED syntax in src/backend/parser/gram.y (DONE)
+
+   (the existing NOT DEFERRABLE == ALWAYS IMMEDIATE, so we don't add that)
+
+ - add alwaysdeferred field to several structs -- wherever initdeferred
+   is defined, basically (DONE)
+
+ - add conalwaysdeferred column to pg_constraints
+
+ - in src/backend/commands/trigger.c modify places where all_isdeferred
+   and all_isset are used (DONE)
+
+ - add AFTER_TRIGGER_ALWAYSDEFERRED for AfterTriggerSharedData's
+   ats_event (struct) and fill it 

Re: [HACKERS] Possible SSL improvements for a newcomer to tackle

2017-10-03 Thread Nico Williams
On Tue, Oct 03, 2017 at 11:45:24AM +0200, Adrien Nayrat wrote:
> On 10/03/2017 06:15 AM, Zeus Kronion wrote:
> > 2) I was surprised to learn the following from the docs:
> > 
> >> By default, PostgreSQL will not perform any verification of the server
> > certificate. This means that it is possible to spoof the server identity 
> > (for
> > example by modifying a DNS record or by taking over the server IP address)
> > without the client knowing. In order to prevent spoofing, SSL certificate
> > verification must be used.
> > 
> > Is there a technical reason to perform no verification by default? Wouldn't 
> > a
> > safer default be desirable?
> 
> If you want to verify server's certificate you should use DANE [1] + DNSSEC 
> [2]
> ? (I am not an SSL expert too)
> 
> If I understand correctly, you can store your certificate in a DNS record
> (TLSA). Then the client can check the certificate. You must trust your DNS
> server (protection against spoofing), that's why you have to use DNSSEC.

+1, but it's trickier than you might think.  I can connect you with
Viktor Dukhovni, who has implemented DANE for OpenSSL, and done yeoman's
work getting DANE for SMTP working.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Possible SSL improvements for a newcomer to tackle

2017-10-03 Thread Nico Williams
On Tue, Oct 03, 2017 at 09:44:01AM -0400, Tom Lane wrote:
> Magnus Hagander  writes:
> > On Tue, Oct 3, 2017 at 6:33 AM, Tom Lane  wrote:
> >> I'm not an SSL expert, so insert appropriate grain of salt, but AIUI the
> >> question is what are you going to verify against?
> 
> > One way to do it would be to default to the "system global certificate
> > store", which is what most other SSL apps do. For example on a typical
> > debian/ubuntu, that'd be the store in /etc/ssl/certs/ca-certificates.crt.
> > Exactly where to find them would be distribution-specific though, and we
> > would need to actually add support for a second certificate store. But that
> > would probably be a useful feature in itself.
> 
> Maybe.  The impression I have is that it's very common for installations
> to use a locally-run CA to generate server and client certs.  I would not
> expect them to put such certs into /etc/ssl/certs.  But I suppose there
> might be cases where you would actually pay for a universally-valid cert
> for a DB server ...

No, that is very common.  However, in non-enterprise uses it's also very
common for those to be Web PKI certificates, which would be very
inappropriate for use in PG, so I agree that PG should not use the
system trust anchor set by default.  PG should just require that a trust
anchor set be configured.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Possible SSL improvements for a newcomer to tackle

2017-10-03 Thread Nico Williams
On Tue, Oct 03, 2017 at 12:33:00AM -0400, Tom Lane wrote:
> Zeus Kronion  writes:
> > 2) I was surprised to learn the following from the docs:
> 
> >> By default, PostgreSQL will not perform any verification of the server
> >> certificate.
> 
> > Is there a technical reason to perform no verification by default? Wouldn't
> > a safer default be desirable?
> 
> I'm not an SSL expert, so insert appropriate grain of salt, but AIUI the
> question is what are you going to verify against?  You need some local
> notion of which are your trusted root certificates before you can verify
> anything.  So to default to verification would be to default to failing to
> connect at all until user has created a ~/.postgresql/root.crt file with
> valid, relevant entries.  That seems like a nonstarter.
> 
> It's possible that we could adopt some policy like "if the root.crt file
> exists then default to verify" ... but that seems messy and unreliable,
> so I'm not sure it would really add any security.

You do always need trust anchors in order to verify a peer's
certificate.

Usually there will be a system-wide trust anchor set, though it may not
be appropriate for use with PG...

Still, it would be safer to refuse to connect until the lack of trust
anchors is rectified than to connect without warning about the inability
to verify a server.  By forcing the user (admins) to take action to
remediate the problem, the problem then gets fixed, whereas plowing on
creates an invisible (for many users) security problem.

Now, the use of channel binding from authentication methods like GSS-API
helps a fair bit, though mostly it helps by leveraging some other
authentication infrastructure that the users (admins) have set up --
they could easily have setup PKI too then.  With SCRAM there is much
less infrastructure (but also SCRAM requires very good passwords; a PAKE
would be much better).

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] generated columns

2017-10-02 Thread Nico Williams
So yes, distinguishing stored vs. not stored computed columns is useful,
especially if the expression can refer to other columns of the same row,
though not only then.

Examples:

  -- useful only if stored (assuming these never get updated)
  inserted_at TIMESTAMP WITHOUT TIME ZONE AS (clock_timestamp())

  -- useful only if stored
  uuid uuid AS (uuid_generate_v4())

  -- useful only if stored
  who_done_it TEXT AS (current_user)

  -- useful especially if not stored
  user_at_host TEXT AS (user || '@' || host)

  -- useful if stored
  original_user_at_host TEXT AS (user || '@' || host)

I assume once set, a stored computed column cannot be updated, though
maybe being able to allow this would be ok.

Obviously all of this can be done with triggers and VIEWs...  The most
useful case is where a computed column is NOT stored, because it saves
you having to have a table and a view, while support for the stored case
merely saves you having to have triggers.  Of course, triggers for
computing columns are rather verbose, so not having to write those would
be convenient.

Similarly with RLS.  RLS is not strictly necessary since VIEWs and
TRIGGERs allow one to accomplish much the same results, but it's a lot
of work to get that right, while RLS makes most policies very pithy.
(RLS for *update* policies, however, still can't refer to NEW and OLD,
so one still has to resort to triggers for updates in many cases).

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] generated columns

2017-10-02 Thread Nico Williams
On Mon, Oct 02, 2017 at 02:30:38PM -0400, Tom Lane wrote:
> Nico Williams <n...@cryptonector.com> writes:
> > On Mon, Oct 02, 2017 at 12:50:14PM -0400, Adam Brusselback wrote:
> >> So for me, i'd rather default to compute on read, as long storing the
> >> pre-computed value is an option when necessary.
> 
> > Sure, I agree.  I was just wondering whether there might be any other
> > difference besides performance characteristics.  The answer to that is,
> > I think, "no".
> 
> What about non-immutable functions in the generation expression?

Aha, thanks!  Yes, that would be noticeable.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] generated columns

2017-10-02 Thread Nico Williams
On Mon, Oct 02, 2017 at 12:50:14PM -0400, Adam Brusselback wrote:
> I know that for my use-cases, having both options available would be very
> appreciated.  The vast majority of the computed columns I would use in my
> database would be okay to compute on read.  But there are for sure some
> which would be performance prohibitive to have compute on read, so i'd
> rather have those stored.
> 
> So for me, i'd rather default to compute on read, as long storing the
> pre-computed value is an option when necessary.

Sure, I agree.  I was just wondering whether there might be any other
difference besides performance characteristics.  The answer to that is,
I think, "no".


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] generated columns

2017-10-02 Thread Nico Williams
On Thu, Aug 31, 2017 at 12:16:43AM -0400, Peter Eisentraut wrote:
> In previous discussions, it has often been a source of confusion whether
> these generated columns are supposed to be computed on insert/update and
> stored, or computed when read.  The SQL standard is not explicit, but
> appears to lean toward stored.  DB2 stores.  Oracle computes on read.

Question: How would one know the difference between storing computed
  columns vs. computing them on read?

Answer?:  Performance.  If the computation is slow, then you'll really
  notice on read.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] alter server for foreign table

2017-09-30 Thread Nico Williams
On Sat, Sep 30, 2017 at 03:58:04PM -0400, Andrew Dunstan wrote:
> On 09/30/2017 05:14 AM, Derry Hamilton wrote:
> > Just to say, yes, this would be handy. I've been using a variant of
> > that hack on reporting servers, while migrating systems from
> > proprietary databases.  It behaves quite gracefully when there are
> > incompatible options, and it fixes up properly with DROPs as the first
> > options.
> 
> I assume the proposal is to allow changing to a different server using
> the same FDW. I can see all sorts of odd things happening if we allow
> changing to a server of a different FDW.

Like what, that could not happen without this feature anyway?

Suppose the foreign server becomes unreachable, changes its schema
completely, and becomes reachable again?  How would that be different
from changing the server name to one with a totally different schema?

Naturally one could shoot one's own feet off, but the proposed feature
wouldn't exactly be a footgun.  To believe otherwise would be like
arguing that DROP TABLE (especially CASCADE!) is a footgun, so better
not to have it.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] alter server for foreign table

2017-09-29 Thread Nico Williams
On Fri, Sep 29, 2017 at 10:19:03PM +0200, David Fetter wrote:
> On Fri, Sep 29, 2017 at 01:47:59PM -0400, Tom Lane wrote:
> > Konstantin Knizhnik  writes:
> > > According to Postgresql documentation it is not possible to alter server 
> > > for foreign table:
> > > https://www.postgresql.org/docs/10/static/sql-alterforeigntable.html
> > 
> > Hmm, we'd have to check if the table's options were legal for the
> > new FDW, but in principle we could support this, I suppose.
> > Not sure if it's useful enough to be worth the trouble.
> 
> It would definitely be useful if it were available.  Nodes are a good
> bit more fungible than they were even 5 years back.

It would be nice to have DDL for every ALTERation one could make that is
trivial to support.

It would also be nice to have a rule that every DDL (and every ADD/DROP
in ALTER statements) support IF EXISTS / IF NOT EXISTS, and, where
meaningful, OR REPLACE.  I work around a lot of missing IE/INE/OR by
using techniques like: conditioning on a schema query, using DROP IF
EXISTS then CREATE in a transaction (when the DROP has IE but the CREATE
does not have INE), catching exceptions, and so on.
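
For instance, for a CREATE that still lacks IF NOT EXISTS (CREATE TYPE
is one), the exception-catching version looks like:

  DO $$
  BEGIN
    CREATE TYPE rgb AS ENUM ('red', 'green', 'blue');
  EXCEPTION WHEN duplicate_object THEN
    NULL;  -- it already exists; nothing to do
  END $$;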

These are little things -- quality of life sorts of things :)

I've also grown accustomed to writing complex pg_catalog queries.  It'd
be nice to have some built-in views on the pg_catalog -- something
better than the information_schema.  I have written some that we use for
code generation based on PG schemas; we'd be glad to contribute them.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Multicolumn hash indexes

2017-09-29 Thread Nico Williams
On Fri, Sep 29, 2017 at 10:54:55AM -0400, Tom Lane wrote:
> There are few if any indexing techniques where the first column isn't
> significantly more important than the rest --- certainly that's true
> for btree, for example.  I do not think it's a showstopper if that's
> true for hash as well.

You have N>1 columns to index, and you'll be using a conjunction of
all of them together in queries, with equality predicates.  No one of
those columns is sufficiently selective.  But all the columns together
are plenty selective enough.

Obviously a [multi-column] hash index should do.  The main question is
whether the planner can be made to not consider subsets of the columns
to the exclusion of the hash index -- maybe not, or not easily enough.

This is easy enough to implement with a B-Tree index on an expression
consisting of a decent hash function application to the row_to_json() of
a row composed of the columns in question.  But it requires using the
same expression in queries, naturally, which then acts as a barrier to
the planner's propensity to drop columns from consideration for index
selection.
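
A sketch of that workaround (made-up table; here I hash a simple text
concatenation rather than row_to_json(), and wrap it in an IMMUTABLE
helper so it can be used in an index expression):

  CREATE TABLE t (a integer, b text, c text, payload text);

  CREATE FUNCTION t_multihash(a integer, b text, c text)
  RETURNS integer LANGUAGE sql IMMUTABLE AS $$
    SELECT hashtext(a::text || '|' || b || '|' || c)
  $$;

  CREATE INDEX t_multihash_idx ON t (t_multihash(a, b, c));

  -- Queries must repeat the expression, and should recheck the raw
  -- columns to weed out hash collisions:
  SELECT *
    FROM t
   WHERE t_multihash(a, b, c) = t_multihash(1, 'x', 'y')
     AND (a, b, c) = (1, 'x', 'y');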

A multi-column hash index facility shouldn't have to require anything
more than where clauses that are a conjunction of all the columns with
equality predicates.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: COMMIT TRIGGERs, take n+1

2017-09-29 Thread Nico Williams
The attached file demonstrates how to create COMMIT, BEGIN, and even
session CONNECT TRIGGERs for PostgreSQL using PlPgSQL only, via normal
and CONSTRAINT TRIGGERs.

There have been many threads on the PG mailing lists about commit
triggers, with much skepticism shown about the possible semantics of
such a thing.  Below we list use cases, and demonstrate reasonable,
useful, and desirable semantics.

The previous version of this could be defeated by using SET CONSTRAINTS
..  IMMEDIATE.  This version detects this when it would cause commit
triggers to run too soon, and causes an exception to be raised.  The
technique used to detect this could be used by anyone whose business
logic breaks when SET CONSTRAINTS ..  IMMEDIATE is used.

There is one shortcoming of this implementation: it is inefficient
because it has to use FOR EACH ROW triggers under the hood, so if you
do 1,000 inserts, then 999 of the resulting internal trigger
procedure invocations will be unnecessary.  These are FOR EACH ROW
triggers because that is the only level permitted for CONSTRAINT
triggers, which are used under the hood to trigger running at commit
time.

(It would be nice if CONSTRAINT triggers could be FOR EACH STATEMENT
too...)

Use-cases:

 - create TEMP schema before it's needed by regular triggers

   This can be useful if CREATE TEMP TABLE IF NOT EXISTS and such in
   regular triggers could slow them down.

 - cleanup internal, temporary state left by triggers from earlier
   transactions

 - perform global consistency checks (assertions, if you like)

   Note that these can be made to scale by checking only the changes
   made by the current transaction.  Transition tables, temporal
   tables, audit tables -- these can all help for the purpose of
   checking only deltas as opposed to the entire database.

   Related: there was a thread about a patch to add assertions:

   
https://www.postgresql.org/message-id/flat/20131218113911.GC5224%40alap2.anarazel.de#20131218113911.gc5...@alap2.anarazel.de

 - update materializations of views when all the relevant deltas can
   be considered together

   I use an alternative view materialization system that allows direct
   updates of the materialization table, and records updates in a
   related history table.  Being able to update such materializations
   via triggers is very convenient; being able to defer such updates as
   long as possible is a nice optimization.

 - call C functions that have external side-effects when you know the
   transaction will succeed (or previous ones that have succeeded but
   not had those functions called)

Semantics:

 - connect/begin/commit trigger procedures called exactly once per-
   transaction that had any writes (even if they changed nothing
   in the end), with one exception:

- exceptions thrown by triggers may be caught, and the triggering
  statement retried, in which case triggers will run again

 - connect/begin trigger procedures called in order of trigger
   name (ascending) before any rows are inserted/updated/deleted by
   any DML statements on non-TEMP tables in the current transaction

 - commit trigger procedures called in order of commit trigger name
   (ascending) at commit time, after the last statement sent by the
   client/user for the current transaction

 - begin and commit trigger procedures may perform additional write
   operations, and if so that will NOT cause additional invocations
   of begin/commit trigger procedures

 - commit trigger procedures may RAISE EXCEPTION, triggering a
   rollback of the transaction

Nico
-- 


commit_trigger.sql
Description: application/sql

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Multicolumn hash indexes

2017-09-27 Thread Nico Williams
On Wed, Sep 27, 2017 at 11:57:23AM -0400, Tom Lane wrote:
> Robert Haas  writes:
> > On Wed, Sep 27, 2017 at 9:56 AM, Jesper Pedersen
> >  wrote:
> >> Maybe an initial proof-of-concept could store the hash of the first column
> >> (col1) plus the hash of all columns (col1, col2, col3) in the index, and 
> >> see
> >> what requirements / design decisions would appear from that.
> 
> > I thought about that sort of thing yesterday but it's not that simple.
> > The problem is that the hash code isn't just stored; it's used to
> > assign tuples to buckets.  If you have two hash codes, you have to
> > pick one of the other to use for assigning the tuple to a bucket.  And
> > then if you want to search using the other hash code, you have to
> > search all of the buckets, which will stink.
> 
> If we follow GIST's lead that the leading column is "most important",
> the idea could be to require a search constraint on the first column,
> which produces the hash that determines the bucket assignment.  Hashes
> for additional columns would just be payload data in the index entry.
> If you have search constraint(s) on low-order column(s), you can check
> for hash matches before visiting the heap, but they don't reduce how
> much of the index you have to search.  Even btree works that way for
> many combinations of incomplete index constraints.

But it might be difficult to obtain sufficient selectivity with any
choice of first column for an otherwise reasonable schema design.

Presumably that is why the OP is interested in hash indexing multiple
columns, as otherwise a plain old b-tree would have sufficed.

It might be better to address the planner assumptions about key prefixes
instead, though I recognize that that might be difficult (or perhaps
even undesirable).  The potential for slowing down the planner might be
too high.

I note that it's not possible today to create a hash index on an
expression -- if it were then the obvious workaround would be to create
a hash index on a `row(...)` expression, then use a
`row(...) = row(...)` WHERE clause in
queries, which presumably the planner could not decompose and optimize
or pessimize.  If need be the planner could learn to treat
row(...) as a barrier to such optimizations.

A present workaround would be to denormalize the column list one cares
about into a single record-type column that one could then hash.  To use
this one would have to use a `record_column = row(...)`
WHERE clause.  Or, if record types cannot be hashed, then use
row_to_json() and hash the JSON text.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Multicolumn hash indexes

2017-09-27 Thread Nico Williams
On Wed, Sep 27, 2017 at 09:56:26AM -0400, Jesper Pedersen wrote:
> On 09/26/2017 08:11 PM, Robert Haas wrote:
> >On Tue, Sep 26, 2017 at 7:18 PM, Tom Lane  wrote:
> >>Tomasz Ostrowski  writes:
> >>>I've noticed that hash indexes can't currently (in PG10) be multicolumn.
> >>>Are they technically hard to implement or just nobody took such a feature?
> >>
> >>It's not simple, particularly not if you wish that the index would support
> >>queries specifying conditions for just a subset of the indexed columns
> >>(an assumption that's buried pretty deeply in the planner, for one thing).
> >>Then you couldn't compute the hash.
> >
> >Whoa, that seems like moving the goalposts.  Somebody could design a
> >hash index that was intended to answer such queries, but it's not the
> >one we've got.  I think we should just aim to support equality queries
> >on all columns.  That seems like a fairly straightforward
> >generalization of what we've already got.
> >
> 
> This would require that the applications that are using the database knows
> about the index structure in order to pass down the right columns.

That's hardly a problem.  We (app devs) already do this all the time.
Sometimes we screw up and get a poor query plan, we figure it out, and
add an index or rewrite the query, or make some index more covering, ...

The real problems, according to Tom Lane, are that the query planner
makes deep assumptions about index keys having prefixes, which b-tree
indexes do but hash indexes do not, and also that optimizations that are
applied before selecting indexes can make it impossible to select a hash
index intended by the author of a query.  The latter problem is not
fatal, but can lead to surprises if not fixed, while the former problem
is fatal until fixed.

> I would say that in most cases that applications doesn't know about index
> structures. So, multiple indexes would have to be created (col1), (col1,
> col2), (col1, col3), ... which isn't ideal.

Nonsense, and also irrelevant to the question of whether multi-column
hash indexes should be supported.  It's not about what Joe User knows to
do, but about what the application developer or DBA knows to do.

BTW, one could probably use an expression hash index to get a multi-
column hash index anyways, by using row values, JSON encodings of row
values, or some ad-hoc text representation of row values.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COMMIT TRIGGERs, take n, implemented with CONSTRAINT TRIGGERS

2017-09-25 Thread Nico Williams

I guess this got lost over the weekend and subsequent week (I was on
vacation).

On Fri, Sep 15, 2017 at 04:03:35PM -0500, Nico Williams wrote:
> On Fri, Sep 15, 2017 at 04:07:33PM -0400, Tom Lane wrote:
> > Nico Williams <n...@cryptonector.com> writes:
> > > On Fri, Sep 15, 2017 at 02:19:29PM -0500, Nico Williams wrote:
> > >> On Fri, Sep 15, 2017 at 11:26:08AM -0700, Andres Freund wrote:
> > >>> I think you should also explain why that's a desirable set of
> > >>> semantics.
> > 
> > > Now, why is this desirable: atomicity.
> > 
> > >The client/user can no longer add statements to the transaction,
> > >therefore now (commit-time) is a good time to run a procedure that
> > >can *atomically* examine the totatility of the changes made by the
> > >user.
> > 
> > I do not really understand this claim.  The argument for a commit trigger
> > seems to be exactly "I want to be the last thing that happens in the
> > transaction".  But as soon as you have two of them, one of them is not
> 
> The user defining the triggers is a very different thing from the user
> sending the various statements up to and including the COMMIT.  They
> need not be the same.
> 
> The DBA/developers/ops own the triggers.  The client user may not have
> any trigger creation privilege.
> 
> Being in control of the [pre-]commit triggers, I can control the order
> in which they run (by-name, lexical order).
> 
> The important thing is that all of the [pre-]commit triggers will run
> after the last statement in the TX send by the client.
> 
> Surely there's nothing strange abolut this -- it's already the case for
> [DEFERRED] CONSTRAINT TRIGGERs!

Not to belabor the point, though I'm being redundant: the important
thing is that we have something we can run after the last statement from
the _client_.  Trigger order is already what it is, and that's fine
because the dev/dba controls that, but does not control what the
_client_ sends.

I'm using PostgreSQL for an all-SQL application with:

 - [WISH] access via the PostgreSQL protocol (unconstrained)

 - [CURRENT] access via HTTP via PostgREST (fairly constrained)

This means we don't have full control over the client.  Even if we did,
the client authenticates as users, and we can't ensure that the users
aren't connecting separately and sending arbitrary commands.

Incidentally, there's also a need for a connect-time or BEGIN-time
trigger to simplify other trigger functions.  I often write trigger
functions that do CREATE TEMP TABLE IF NOT EXISTS -- a handy technique.
But it'd be nice if we could have temp tables (and indexes, and views,
and...) automatically created for each session or transaction.  If
nothing else, it would allow statement/row triggers to do less work,
thus run marginally faster.
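
The pattern I mean looks roughly like this (names made up):

  CREATE FUNCTION note_change() RETURNS trigger LANGUAGE plpgsql AS $$
  BEGIN
    -- Create per-session scratch state on demand; a connect/BEGIN
    -- trigger could do this once up front instead.
    CREATE TEMP TABLE IF NOT EXISTS tx_changes (
      tbl regclass,
      op  text
    );
    INSERT INTO tx_changes (tbl, op) VALUES (TG_RELID::regclass, TG_OP);
    RETURN NULL;  -- intended as an AFTER ... FOR EACH ROW trigger
  END
  $$;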

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COMMIT TRIGGERs, take n, implemented with CONSTRAINT TRIGGERS

2017-09-15 Thread Nico Williams
On Fri, Sep 15, 2017 at 04:07:33PM -0400, Tom Lane wrote:
> Nico Williams <n...@cryptonector.com> writes:
> > On Fri, Sep 15, 2017 at 02:19:29PM -0500, Nico Williams wrote:
> >> On Fri, Sep 15, 2017 at 11:26:08AM -0700, Andres Freund wrote:
> >>> I think you should also explain why that's a desirable set of
> >>> semantics.
> 
> > Now, why is this desirable: atomicity.
> 
> >The client/user can no longer add statements to the transaction,
> >therefore now (commit-time) is a good time to run a procedure that
> >can *atomically* examine the totatility of the changes made by the
> >user.
> 
> I do not really understand this claim.  The argument for a commit trigger
> seems to be exactly "I want to be the last thing that happens in the
> transaction".  But as soon as you have two of them, one of them is not

The user defining the triggers is a very different thing from the user
sending the various statements up to and including the COMMIT.  They
need not be the same.

The DBA/developers/ops own the triggers.  The client user may not have
any trigger creation privilege.

Being in control of the [pre-]commit triggers, I can control the order
in which they run (by-name, lexical order).

The important thing is that all of the [pre-]commit triggers will run
after the last statement in the TX sent by the client.

Surely there's nothing strange about this -- it's already the case for
[DEFERRED] CONSTRAINT TRIGGERs!

> going to be the last thing.  Maybe you could address that by requiring
> the triggers to be read-only, but then it doesn't seem like a very useful

No, you think something is a problem that isn't.  And I need these
triggers to be able to write.

> feature.  If there can be only one, maybe it's a usable feature or maybe
> not, but I'm inclined to think that CREATE TRIGGER is a pretty poor API

Multiple [pre-]commit triggers make as much (or little) sense as
multiple triggers on any table.

> for such a definition.  Triggers are generally expected to be objects
> you can create any number of.

That's what I expect of commit triggers too: that I could have as many
as I want.

> Another question is exactly how you're going to "examine the totality of
> the transaction's changes"; a trigger, per se, isn't going to know *any*
> of what the transaction did let alone all of it.  We have some features
> like transition tables, but they're not accessible after end of statement;

Currently I use an audit facility I wrote (based on any number of audit
facilities I've seen) which automatically creates for-each-row triggers
for all tables and which then records (txid, old_row, new_row) for all
updates.  So a pre-commit trigger can simply examine all such audit
rows for txid_current() for each table it cares about.
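
A stripped-down sketch of that sort of audit trigger (my real one
stores record-typed old/new columns and is generated per table, but
this conveys the idea):

  CREATE TABLE audit_log (
    txid    bigint   NOT NULL DEFAULT txid_current(),
    tbl     regclass NOT NULL,
    old_row jsonb,
    new_row jsonb
  );

  CREATE FUNCTION audit_row() RETURNS trigger LANGUAGE plpgsql AS $$
  BEGIN
    IF TG_OP = 'INSERT' THEN
      INSERT INTO audit_log (tbl, new_row)
      VALUES (TG_RELID::regclass, to_jsonb(NEW));
    ELSIF TG_OP = 'UPDATE' THEN
      INSERT INTO audit_log (tbl, old_row, new_row)
      VALUES (TG_RELID::regclass, to_jsonb(OLD), to_jsonb(NEW));
    ELSE
      INSERT INTO audit_log (tbl, old_row)
      VALUES (TG_RELID::regclass, to_jsonb(OLD));
    END IF;
    RETURN NULL;
  END
  $$;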

> and they're pretty useless in the face of DDL too.  It's going to be hard

I use event triggers to discover schema changes.

(I'm not concerned about superusers -- they can disable my
implementation.  I _am_ concerned about unprivileged users.)

> to produce a very credible use-case without a lot of work to expose that
> information somehow.

One dead-trivial use-case is to

  INSERT INTO tx_log.tx_log("txid","who","when","how_long")
  SELECT txid_current(), current_user, current_timestamp,
 clock_timestamp() - current_timestamp;

And yes, I could just as well use a DEFERRED CONSTRAINT TRIGGER on every
table that does this INSERT with ON CONFLICT ("txid") DO NOTHING.
Except that an unprivileged user could SET CONSTRAINTS ALL IMMEDIATE
(ugh), and that this would be slower (once per-row) than doing it just
once at commit time.

> (Some closely related work is being done for logical decoding, btw.
> I wonder which use-cases for this might be satisfied by having a logical
> decoding plug-in watching the WAL output.)

My use case involves generating a stream of incremental updates to
hand-materialized views (because the materialized views in PG do not
expose deltas).  That *almost* fits the logical WAL decoder concept, but
fails on account of needing to be incremental updates not of _raw_ data,
but of views on that data.

> > Commit triggers also allow one to record transaction boundaries, and
> > NOTIFY listeners less frequently than if one did a NOTIFY in normal
> > for-each-row/statement triggers.
> 
> Um, NOTIFY already collapses duplicate notifications per-transaction.

Oh, that I didn't know.  (But it changes nothing for me.)

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COMMIT TRIGGERs, take n, implemented with CONSTRAINT TRIGGERS

2017-09-15 Thread Nico Williams
On Fri, Sep 15, 2017 at 12:25:08PM -0700, Andres Freund wrote:
> On 2017-09-15 14:19:29 -0500, Nico Williams wrote:
> > Please see my post and the linked file to see why.
> 
> The discussions here are often going to be referred back to in years, so
> external links where we aren't sure about the longevity (like e.g. links
> to the mailing list archive, where we're fairly sure), aren't liked
> much.  If you want to argue for a change, it should happen on-list.

Fair enough.  I thought I had given enough detail, but here is the code.
It's just an event trigger that ensures every table has a DEFERRED
CONSTRAINT TRIGGER that runs a function that debounces invocations so
that the "commit trigger" function runs just once:

/*
 * Copyright (c) 2017 Two Sigma Open Source, LLC.
 * All Rights Reserved
 *
 * Permission to use, copy, modify, and distribute this software and its
 * documentation for any purpose, without fee, and without a written agreement
 * is hereby granted, provided that the above copyright notice and this
 * paragraph and the following two paragraphs appear in all copies.
 *
 * IN NO EVENT SHALL TWO SIGMA OPEN SOURCE, LLC BE LIABLE TO ANY PARTY FOR
 * DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
 * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION,
 * EVEN IF TWO SIGMA OPEN SOURCE, LLC HAS BEEN ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 *
 * TWO SIGMA OPEN SOURCE, LLC SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING,
 * BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
 * FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS IS"
 * BASIS, AND TWO SIGMA OPEN SOURCE, LLC HAS NO OBLIGATIONS TO PROVIDE
 * MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
 */

/*
 * This file demonstrates how to create a "COMMIT TRIGGER" for
 * PostgreSQL using CONSTRAINT TRIGGERs.
 *
 * There have been many threads on the PG mailing lists about commit
 * triggers, with much skepticism shown about the possible semantics of
 * such a thing.
 *
 * Below we demonstrate reasonable, useful, and desirable semantics, how
 * to obtain them with PG today.
 *
 * There are three shortcomings of this implementation:
 *
 * a) It is possible defeat this implementation by using
 *
 *  SET CONSTRAINTS ... IMMEDIATE;
 *
 *or otherwise disabling the triggers created under the hood herein.
 *
 *The ability to make these triggers run early can be *dangerous*,
 *depending on the application.  It is especially dangerous given
 *that no privilege is needed in order to do this, and there's no
 *way for a CONSTRAINT TRIGGER to detect when it is called _last_,
 *only when it is called _first_, in any transaction.
 *
 * b) This implementation serializes write transactions implicitly by
 *having a single row encode commit trigger state.
 *
 *(This is easily fixed though.)
 *
 * c) This implementation is inefficient because CONSTRAINT TRIGGERs
 *have to be FOR EACH ROW triggers.  Thus a transaction that does
 *1,000 inserts will cause 999 unnecessary trigger procedure calls
 *under the hood.  Also, because CONSTRAINT TRIGGERs have to be FOR
 *EACH ROW triggers, PG has to track OLD/NEW row values for all
 *affected rows, even though commit triggers obviously don't need
 *this.
 *
 * (Also, for simplicity we use SECURITY DEFINER functions here,
 * otherwise we'd have to have additional code to grant to
 * public the ability to call our functions.  We would need additional
 * code by which to ensure that users do not toggle internal state to
 * prevent commit trigger execution.)
 *
 * For example, to create a commit trigger that invokes
 * commit_trigger.example_proc() at the end of any _write_ transaction,
 * run the following in psql:
 *
 *  -- Load commit trigger functionality:
 *  \i commit_trigger.sql
 *
 *  -- CREATE COMMIT TRIGGER egt
 *  -- EXECUTE PROCEDURE commit_trigger.example_proc();
 *  INSERT INTO commit_trigger.triggers
 *  (trig_name, proc_schema, proc_name)
 *  SELECT 'egt', 'commit_trigger', 'example_proc';
 *
 * Demo:
 *
 *  db=# \i commit_trigger.sql
 *  
 *  db=# INSERT INTO commit_trigger.triggers
 *  db-# (trig_name, proc_schema, proc_name)
 *  db-# SELECT 'egt', 'commit_trigger', 'example_proc';
 *  db=#
 *  db=# CREATE SCHEMA eg;
 *  CREATE SCHEMA
 *  db=# CREATE TABLE eg.x(a text primary key);
 *  CREATE TABLE
 *  db=# BEGIN;
 *  BEGIN
 *  db=# INSERT INTO eg.x (a) VALUES('foo');
 *  INSERT 0 1
 *  db=# INSERT INTO eg.x (a) VALUES('bar');
 *  INSERT 0 1
 *  db=# COMMIT;
 *  NOTICE:  example_proc() here!  Should be just one for this TX (txid 208036)
 *  CONTEXT:  PL/pgSQL function example_proc() line 3 at
 *  RAISE
 *  COMMIT
 *  db=# INSERT INTO eg.x (a) VALUES('foobar');
 *  NOTICE:  example_proc() here!  Should be just one for

Re: [HACKERS] COMMIT TRIGGERs, take n, implemented with CONSTRAINT TRIGGERS

2017-09-15 Thread Nico Williams
On Fri, Sep 15, 2017 at 02:19:29PM -0500, Nico Williams wrote:
> On Fri, Sep 15, 2017 at 11:26:08AM -0700, Andres Freund wrote:
> > I think you should also explain why that's a desirable set of
> > semantics.

Note that DEFERRED CONSTRAINT TRIGGERs already have these semantics,
except of course that an unprivileged user can force them to run
IMMEDIATEly rather than DEFERRED.

Now, why is this desirable: atomicity.

   The client/user can no longer add statements to the transaction,
   therefore now (commit-time) is a good time to run a procedure that
   can *atomically* examine the totality of the changes made by the
   user.

This allows one to perform consistency checks across the entire DB.
One might speed them up by examining deltas recorded by for-each-row
triggers, but logically one could check the entire state of the DB.
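
For example, a commit-time assertion over just the current
transaction's deltas might look like this (table names made up;
deltas_accounts is assumed to be populated by per-row triggers on
accounts):

  CREATE FUNCTION check_no_negative_balances() RETURNS trigger
  LANGUAGE plpgsql AS $$
  BEGIN
    IF EXISTS (
        SELECT 1
          FROM deltas_accounts d
          JOIN accounts a USING (account_id)
         WHERE d.txid = txid_current()
           AND a.balance < 0)
    THEN
      RAISE EXCEPTION 'commit would leave an account with a negative balance';
    END IF;
    RETURN NULL;
  END
  $$;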

Commit triggers also allow one to record transaction boundaries, and
NOTIFY listeners less frequently than if one did a NOTIFY in normal
for-each-row/statement triggers.

These are all good things -- or at least they aren't bad things.

For NOTIFY, it would be nice to have a post-commit trigger, though such
a thing could do nothing more than NOTIFY!

(A "post-commit" trigger could certainly not write to the DB unless it
did so in a new transaction that itself could not invoke post-commit
triggers that could write more...  Such a thing would be very strange!
But I'm not asking for a post-commit trigger feature.)

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COMMIT TRIGGERs, take n, implemented with CONSTRAINT TRIGGERS

2017-09-15 Thread Nico Williams
On Fri, Sep 15, 2017 at 11:26:08AM -0700, Andres Freund wrote:
> On 2017-09-14 14:41:12 -0500, Nico Williams wrote:
> > I've read through several old threads on COMMIT TRIGGERs.  Rather than
> > write a lengthy post addressing past debates, here's an implementation
> > and demonstration of [an approximation of] COMMIT TRIGGERs with natural
> > and _desirable_ semantics:
> 
> I think you should also explain why that's a desirable set of
> semantics. E.g. explain the use case you have and potential other
> use-cases.  I think it should also be explained that these are
> *pre*-commit triggers - IIRC some people have asked for *post*-commit
> triggers in the past.

Responding out of order:

 - Yes, this is a pre-commit thing.

   It's the *same* as DEFERRED CONSTRAINT TRIGGERs.  After all, that's
   how I'm implementing this now :)

   Critically, the client/user can no longer execute additional
   statements at this point, since they've executed COMMIT.  Therefore
   these trigger procedures will see *all* of the changes made by the
   user (and all of the changes made by commit triggers that run before
   them, so, as usual, trigger invocation order matters).

 - As to use-cases, I listed a few in my post:

- update/refresh view materializations
- consistency checks
- NOTIFY
- record history (in particular, record transaction boundaries)
- and, no doubt, others

Of course all of this can [almost!] be done with CONSTRAINT TRIGGERs,
since that's what I'm demonstrating.

HOWEVER, there's a *very serious* problem with CONSTRAINT TRIGGERs:
unprivileged users can make them run IMMEDIATEly rather than deferred.

Also, using CONSTRAINT TRIGGERs for this is inefficient.  Please see my
post and the linked file to see why.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COMMIT TRIGGERs, take n, implemented with CONSTRAINT TRIGGERS

2017-09-15 Thread Nico Williams
On Fri, Sep 15, 2017 at 4:11 AM Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Thu, Sep 14, 2017 at 10:41 PM, Nico Williams <n...@cryptonector.com>
> wrote:
>
>> https://github.com/twosigma/postgresql-contrib/
>>
>> https://github.com/twosigma/postgresql-contrib/blob/master/commit_trigger.sql
>>
>> https://raw.githubusercontent.com/twosigma/postgresql-contrib/master/commit_trigger.sql
>
>
> Do I understand correctly that this is SQL implementation of COMMIT
> TRIGGER functionality which is a prototype used to demonstrate it.  And if
> this prototype is well-accepted then you're going to write a patch for
> builtin COMMIT TRIGGER functionality.  Is it right?
>

That's the idea, yes.

Nico
-- 

>


[HACKERS] COMMIT TRIGGERs, take n, implemented with CONSTRAINT TRIGGERS

2017-09-14 Thread Nico Williams
I've read through several old threads on COMMIT TRIGGERs.  Rather than
write a lengthy post addressing past debates, here's an implementation
and demonstration of [an approximation of] COMMIT TRIGGERs with natural
and _desirable_ semantics:

 - commit triggers run exactly once in any write transaction

 - commit triggers run at the _end_ of any write transaction

 - multiple commit triggers may be declared, and they run in name
   lexical order

 - commit triggers do NOT run in read-only transactions

 - commit trigger procedures can do anything any other trigger
   procedure can do: DDL, DML, NOTIFY, ...

There is just one undesirable bit of semantics in this implementation:
unprivileged users can break its semantics by executing SET CONSTRAINTS
... IMMEDIATE.  Obviously this is bad, at least for some possible uses
of commit triggers.

Also, this implementation is somewhat inefficient since under the hood
it uses deferred CONSTRAINT TRIGGERs, which have to be FOR EACH ROW
triggers...

To use this:

 - download commit_trigger.sql (reviews welcome!)
   
 - run this in psql:

  -- Load commit trigger functionality:
  \i commit_trigger.sql

 - run this in psql to demo:

  -- CREATE COMMIT TRIGGER egt
  -- EXECUTE PROCEDURE commit_trigger.example_proc();
  INSERT INTO commit_trigger.triggers
  (trig_name, proc_schema, proc_name)
  SELECT 'egt', 'commit_trigger', 'example_proc';

  CREATE SCHEMA eg;
  CREATE TABLE eg.x(a text primary key);
  BEGIN;
  INSERT INTO eg.x (a) VALUES('foo');
  INSERT INTO eg.x (a) VALUES('bar');
  COMMIT;
  INSERT INTO eg.x (a) VALUES('foobar');
  INSERT INTO eg.x (a) VALUES('baz');
  DROP TABLE eg.x CASCADE;

   There should be exactly one NOTICE for the first transaction, and
   exactly one each for the two INSERTs subsequently done in auto-commit
   mode.

I hope this will put to rest all objections to COMMIT TRIGGERS, and that
it will lead to a proper implementation.

Uses of COMMIT TRIGGERs include:

 - update/refresh view materializations
 - consistency checks
 - NOTIFY
 - record history (in particular, record transaction boundaries)
 - and, no doubt, others

https://github.com/twosigma/postgresql-contrib/
https://github.com/twosigma/postgresql-contrib/blob/master/commit_trigger.sql
https://raw.githubusercontent.com/twosigma/postgresql-contrib/master/commit_trigger.sql

Cheers,

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SQL/JSON in PostgreSQL

2017-03-13 Thread Nico Williams
On Thu, Mar 09, 2017 at 07:12:07PM +0100, Sven R. Kunze wrote:
> From my day-to-day work I can tell, the date(time) type is the only missing
> piece of JSON to make it perfect for business applications (besides, maybe,
> a "currency" type).

And a binary type.  And a chunked-string type (to avoid having to escape
strings).  And an interval type.  And...


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SQL/JSON in PostgreSQL

2017-03-13 Thread Nico Williams
On Thu, Mar 09, 2017 at 12:58:55PM -0500, Robert Haas wrote:
> On Thu, Mar 9, 2017 at 12:48 PM, Sven R. Kunze  wrote:
> > On 08.03.2017 20:48, Peter van Hardenberg wrote:
> >>
> >> Small point of order: YAML is not strictly a super-set of JSON.
> >
> > I haven't read the whole standard, but from what I can see the standard
> > considers JSON an official subset of itself:
> > http://www.yaml.org/spec/1.2/spec.html
> 
> But there's apparent sophistry, like this, in that spec:

I agree with you.  But beware, the IETF has had multiple threads with
thousands of posts in them about these sorts of issues.  If you're not
careful you'll have such a thread on this list too.  It would be very
sad not to let only a group that really cares have such threads instead.

:)

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SQL/JSON in PostgreSQL

2017-03-13 Thread Nico Williams
On Tue, Mar 07, 2017 at 10:43:16PM +0100, Sven R. Kunze wrote:
> about the datetime issue: as far as I know, JSON does not define a
> serialization format for dates and timestamps.

Use strings in ISO 8601 format, with or without fractional seconds, and
maybe with 5-digit years.
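
For example (the key name is made up):

  -- Timestamps already serialize as ISO 8601 when converted to JSON:
  SELECT to_jsonb(now());
  -- and ISO 8601 strings cast back cleanly:
  SELECT ('{"when": "2017-03-13T12:34:56.789Z"}'::jsonb ->> 'when')::timestamptz;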

> On the other hand, YAML (as a superset of JSON) already supports a
> language-independent date(time) serialization format
> (http://yaml.org/type/timestamp.html).

But YAML isn't what this is about.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Idea on how to simplify comparing two sets

2017-02-23 Thread Nico Williams
On Tue, Feb 07, 2017 at 01:03:14PM -0500, Tom Lane wrote:
> "David G. Johnston"  writes:
> 
> Actually ... now that you mention full join, I believe this works:
> 
> select * from (select ...) s1 full join (select ...) s2
>   on ((s1.*)=(s2.*)) where s1.* is distinct from s2.*;

You can drop the .*s:

  select * from (select ...) s1 full join (select ...) s2
on s1 = s2 where s1 is distinct from s2;

And even:

  select s1, s2 from (select ...) s1 natural full outer join (select ...) s2;

This makes it possible to write very generic (schema-wise) code for
comparing table sources.

As I've mentioned elsewhere, there is an issue with NULLs in columns...

I really, really would like either a full equijoin where equality treats
NULL = NULL -> true for this purpose, or a natural join where only
primary key or not-nullable columns are used, or a USING clause form
where I can specify such behavior without having to list all the columns
that should be used.
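
The workaround I know of today is a sentinel via COALESCE (made-up
relations with a single nullable integer key column k; the sentinel
has to be a value that cannot occur in the data):

  SELECT s1, s2
    FROM t1 s1
    FULL OUTER JOIN t2 s2
      ON COALESCE(s1.k, -1) = COALESCE(s2.k, -1)
   WHERE s1 IS DISTINCT FROM s2;

That's exactly the sort of thing I'd rather not have to write, and it
needs a per-column, per-type sentinel.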

I use NATURAL FULL OUTER JOIN for computing materialized view diffs in
my alternate view materialization system.  NULLs are poison for this
purpose, yielding false positive differences.  But my code also uses the
table row value form above in order to avoid having to generate column
lists for a USING clause or expressions for ON.

These requests are not for syntactic sugar, not really.  But I realize
they may be non-trivial -- I may be looking for unobtanium.

> > That said I'm not sure how much we want to go down this road on our own.
> > It'd be nice to have when its needed but its not something that gets much
> > visibility on these lists to suggest a large pent-up demand.
> 
> Yeah, if this isn't in the standard and not in other databases either,
> that would seem to suggest that it's not a big requirement.

SQLite3 famously lacks FULL joins.  It kills me because the alternative
constructions become O(N log M) instead of O(N) for a properly
implemented FULL join (assuming suitable indices anyways).

I wouldn't suggest that that's a reason not to support FULL joins in any
other RDBMS, rather, I'd suggest that SQLite3 is missing an important
feature.

Pardon the tangent.  It may not really be applicable here, as here I
think OP is looking for syntactic sugar rather than an important
optimization.  But the point is that sometimes you have to lead the
standards-setting and/or the competition.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Idea on how to simplify comparing two sets

2017-02-23 Thread Nico Williams
On Tue, Feb 07, 2017 at 10:58:41AM -0500, Tom Lane wrote:
> Joel Jacobson  writes:
> > Currently there is no simple way to check if two sets are equal.
> 
> Uh ... maybe check whether SELECT set1 EXCEPT SELECT set2
> and SELECT set2 EXCEPT SELECT set1 are both empty?

Even better, NATURAL(*) FULL OUTER JOIN the two table sources and check
that the result is empty.  If the two sources have useful indices (or if
PG constructs suitable automatic indices for them) for this then the
query should be O(N).

(*) However, if you do this then there'd better not be any NULLs in
columns, otherwise you'll get false positives for differences.  Of
course, if the two table sources have common primary key prefixes and
you only care about equality in those columns, then just FULL OUTER JOIN
USING (those key columns).
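
Concretely (made-up tables t1 and t2 sharing a primary key column k):

  -- The key sets are equal iff this returns no rows:
  SELECT s1.k AS k_in_t1, s2.k AS k_in_t2
    FROM t1 s1
    FULL OUTER JOIN t2 s2 USING (k)
   WHERE s1.k IS NULL OR s2.k IS NULL;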

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Make subquery alias optional in FROM clause

2017-02-23 Thread Nico Williams
On Thu, Feb 23, 2017 at 10:37:16AM +0100, Bernd Helmle wrote:
> Am Mittwoch, den 22.02.2017, 22:17 -0500 schrieb Tom Lane:
> > [ shrug... ]  Well, I won't resist this hard as long as it's done
> > competently, which to me means "the subquery name doesn't conflict
> > with
> > anything else".  Not "it doesn't conflict unless you're unlucky
> > enough
> > to have used the same name elsewhere".  There are a couple ways we
> > could
> > achieve that result, but the submitted patch fails to.
> 
> Right, i'm going to give it a try then. Currently i see these options:
> 
> * Validate any generated alias against a list of explicit alias names.
> 
> This means we have to collect explicit alias names in, say a hashtable,
> and validate a generated name against potential collisions and retry.
> Or better, generate the name in a way that doesn't produce a collision
> with this list.

There's another option:

 * Gensym an alias name, and if the compilation fails with that alias
   name as a conflict, try again with a new gensym'ed name.

> * Don't force/generate an alias at all.

That seems like a lot of work.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Make subquery alias optional in FROM clause

2017-02-22 Thread Nico Williams
On Wed, Feb 22, 2017 at 10:08:38AM -0500, Tom Lane wrote:
> Bernd Helmle  writes:
> >> From time to time, especially during migration projects from Oracle to
> > PostgreSQL, i'm faced with people questioning why the alias in the FROM
> > clause for subqueries in PostgreSQL is mandatory. The default answer
> > here is, the SQL standard requires it.
> 
> Indeed.  When I wrote the comment you're referring to, quite a few years
> ago now, I thought that popular demand might force us to allow omitted
> aliases.  But the demand never materialized.  At this point it seems
> clear to me that there isn't really good reason to exceed the spec here.
> It just encourages people to write unportable SQL code.

I suspect most users, like me, just roll their eyes, grumble, and put up
with it rather than complain.  It's a pain point, but tolerable enough
that no one bothers to demand a change.  Now that it's been done though,
allow me to add my voice in favor of it!

> > The patch generates an auto-alias for subqueries in the format
> > *SUBQUERY_N* for subqueries and *VALUES_N* for VALUES
> > expressions.  N is the range table index it gets during
> > transformRangeSubselect().
> 
> This is not a solution, because it does nothing to avoid conflicts with
> table names elsewhere in the FROM clause.  If we were going to relax this
> --- which, I repeat, I'm against --- we'd have to come up with something
> that would thumb through the whole query and make sure what it was
> generating didn't already appear somewhere else.  Or else not generate
> a name at all, in which case there simply wouldn't be a way to refer to
> the subquery by name; I'm not sure what that might break though.

On alias conflict... backtrack and retry with a new set of sub-query
names.  For generating the alias names all you need is a gensym-style
counter.  But yes, even this is tricky because you'd have to check that
the conflicting alias name is one of the gensym'ed ones.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PoC: Make it possible to disallow WHERE-less UPDATE and DELETE

2017-02-02 Thread Nico Williams
On Thu, Feb 02, 2017 at 12:14:10PM -0500, Tom Lane wrote:
> Also, somebody who wants a check like that isn't necessarily going to want
> "no WHERE clause" training wheels.  So you're going to need to think about
> facilities to enable or disable different checks.

WHERE-less-ness should be something that should be visible to a
statement trigger that could then reject the operation if desirable.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Contrib: pqasyncnotifier.c -- a shell command client for LISTEN

2017-01-23 Thread Nico Williams
I should also note that this is on github at
https://github.com/twosigma/postgresql-contrib

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Contrib: alternative MATERIALIZED VIEWs

2017-01-23 Thread Nico Williams
On Mon, Jan 23, 2017 at 06:05:25PM -0600, Jim Nasby wrote:
> On 1/23/17 5:38 PM, Nico Williams wrote:
> >Attached is an alternative implementation of MATERIALIZED VIEWs.
> 
> Interesting. I don't see this being accepted into core because it's plpgsql
> and it depends on the user to track what criteria to use to apply the
> update. The second item is the biggest issue.

I myself said this is not properly integrated.  I do use this in an
actual system, but ideally if any of this is to be welcomed, then it'd
have to be properly integrated.  I don't see what's wrong with the use
of plpgsql as the MV system in PG uses SQL, but in a proper integration
a lot of this would be re-written in C (the use of SQL for the delta
computation and the updates of the history table would remain, and so
too, I think, would the triggers needed to update the history table when
the MV is updated directly).

As to the second issue...  Just the other day Kevin Grittner was
concerned about automatic MV updates because some of them can take too
long.  Now, PG could timeout MV updates, roll them back, and mark an MV
as requiring a refresh, but the user might still have something to say
about this: they really, really might want some updates to always happen
synchronously, and others always asynchronously.  I think it's really
necessary to give the user this sort of control.

> That said, I think this would be useful to some people as an extension. I
> suggest you put it on github (or equivalent) and upload it to
> http://pgxn.org.

Ah, I forgot to mention that it is on github here:

https://github.com/twosigma/postgresql-contrib

> In terms of community support, the next step is to get statement-level
> support for NEW and OLD, something I think Kevin has been working on.

Well, I think there's a lot to look at here.  That's partly why a
plpgsql-coded implementation is a good first step, IMO: it helps find
issues.  For me the need for a PK for MVs is fairly important; but the need
for deltas/history table is critical, and there is no reason not to have
it considering that deltas are produced internally during REFRESH ..
CONCURRENTLY.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Contrib: pqasyncnotifier.c -- a shell command client for LISTEN

2017-01-23 Thread Nico Williams
On Tue, Jan 24, 2017 at 12:48:49AM +0100, Marko Tiikkaja wrote:
> Did you forget the attachment?

I guess I must have.  Attached this time.
/*
 * Copyright (c) 2016 Two Sigma Open Source, LLC.
 * All Rights Reserved
 *
 * Permission to use, copy, modify, and distribute this software and its
 * documentation for any purpose, without fee, and without a written agreement
 * is hereby granted, provided that the above copyright notice and this
 * paragraph and the following two paragraphs appear in all copies.
 *
 * IN NO EVENT SHALL TWO SIGMA OPEN SOURCE, LLC BE LIABLE TO ANY PARTY FOR
 * DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
 * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION,
 * EVEN IF TWO SIGMA OPEN SOURCE, LLC HAS BEEN ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 *
 * TWO SIGMA OPEN SOURCE, LLC SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING,
 * BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
 * FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS IS"
 * BASIS, AND TWO SIGMA OPEN SOURCE, LLC HAS NO OBLIGATIONS TO PROVIDE
 * MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
 */

/*
 * Based on src/test/examples/testlibpq2.c from Postgres 9.4.4
 *
 * pqasyncnotifier - LISTENs and reports notifications without polling
 *
 * Usage: pqasyncnotifier CONNINFO TABLE_NAME
 */

#include <stdio.h>
#ifdef WIN32
#include <windows.h>
#endif

#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <getopt.h>
#include <unistd.h>
#include <inttypes.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <libpq-fe.h>

static void
err(PGconn *conn, ConnStatusType rcc, ExecStatusType rce, const char *msg)
{
int e = errno;

fprintf(stderr, "Error: %s: ", msg);
if (conn == NULL || (rcc == CONNECTION_OK && rce == PGRES_COMMAND_OK))
fprintf(stderr, "%s", strerror(e));
else if (conn != NULL && (rcc != CONNECTION_OK || rce != PGRES_COMMAND_OK))
fprintf(stderr, "%s", PQerrorMessage(conn));
else
fprintf(stderr, "Unknown error");
fprintf(stderr, "\n");
if (conn)
PQfinish(conn);
exit(e ? e : 1);
}

static struct option lopts[] = {
{"help", 0, 0, 'h'},
{"verbose", 0, 0, 'v'},
{"daemonize", 0, 0, 'D'},
{"include-payload", 0, 0, 'd'},
{"include-pid", 0, 0, 'p'},
{"include-time", 0, 0, 't'},
{"include-channel-name", 0, 0, 'c'},
};

static void
usage(const char *prog, int e)
{
size_t i;

if (strrchr(prog, '/') != NULL && strrchr(prog, '/')[1] != '\0')
prog = strrchr(prog, '/') + 1;

fprintf(stderr, "Usage: %s [options] CONNINFO TABLE-NAME ...\n", prog);
fprintf(stderr, "Options:\n");
for (i = 0; i < sizeof(lopts)/sizeof(lopts[0]); i++)
fprintf(stderr, "\t-%c, --%s\n", lopts[i].val, lopts[i].name);
exit(e);
}

static
void
daemonize(int ready)
{
static int pipe_fds[2] = {-1, -1};
pid_t pid;
char code = 1;

if (ready) {
errno = 0;
if (pipe_fds[1] != -1) {
while (write(pipe_fds[1], "", sizeof("")) != 1 && errno != EINTR)
;
if (errno != 0)
err(NULL, 0, 0, "write() failed while daemonizing");
if (close(pipe_fds[1]) != 0)
err(NULL, 0, 0, "close() failed while daemonizing");
pipe_fds[1] = -1;
}
printf("READY: %jd\n", (intmax_t)getpid());
return;
}

if (pipe(pipe_fds) == -1)
err(NULL, 0, 0, "pipe() failed");
pid = fork();
if (pid == -1)
err(NULL, 0, 0, "fork() failed");
if (pid == 0) {
(void) close(pipe_fds[0]);
pipe_fds[0] = -1;
return;
}
(void) close(pipe_fds[1]);
pipe_fds[1] = -1;
    while (read(pipe_fds[0], &code, sizeof(code)) != 1 && errno != EINTR)
;
_exit(code);
}

int
main(int argc, char **argv)
{
const char  *prog = argv[0];
const char  *conninfo;
PGconn  *conn = NULL;
PGresult*res;
PGnotify*notify;
ConnStatusType  rcc = CONNECTION_OK;
ExecStatusType  rce = PGRES_COMMAND_OK;
int include_relname = 0;
int include_payload = 0;
int include_pid = 0;
int include_time = 0;
int verbose = 0;
    char c;

setlinebuf(stdout);

while ((c = getopt_long(argc, argv, "+Dcdhptv", lopts, NULL)) != -1) {
switch (c) {
case 'D':
daemonize(0);
case 'c':
include_relname = 1;
break;
case 'd':
include_payload = 1;
break;
case 'h':
usage(prog, 0);
break;
case 'p':
include_pid = 1;
break;
case 't':
include_time = 1;
break;
case 'v':
verbose = 1;
break;
default:
usage(prog, 1);
break;
}
}

argc 

[HACKERS] Contrib: pqasyncnotifier.c -- a shell command client for LISTEN

2017-01-23 Thread Nico Williams
psql(1) does not output notifications asynchronously, as it does not
check for them when idle.  This makes it difficult to script handling of
NOTIFYs.

Attached is pqasyncnotifier.c, a simple command that allows one to
handle NOTIFYs asynchronously.

Cheers,

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Contrib: alternative MATERIALIZED VIEWs

2017-01-23 Thread Nico Williams
Attached is an alternative implementation of MATERIALIZED VIEWs.

The idea is to explore possible enahncements to the PostgreSQL
MATERIALIZED VIEW features.

Features:

 - All SQL-coded.

 - Keeps history of deltas computed at each refresh.

 - Allows DMLs of the materialized view, recording the changes in the
   same way as deltas from refreshes.

   This allows one to code TRIGGERs which update materialized views
   directly.

   Where synchronous updates of an MV can be fast, a TRIGGER can do it
   by querying the source view with additional constraints derived from
   the OLD/NEW rows, and then apply DMLs to the materialization table.

 - MVs can be marked as needing a refresh.

   This is useful where a synchronous update would be too slow.  Use a
   TRIGGER to mark the MV as needing a refresh and NOTIFY a waiting
   service.

 - Refreshes have the same concurrency semantics as REFRESH MATERIALIZED
   VIEW CONCURRENTLY.

 - Allows indices and constraints to be added to the materialization
   table.

Issues:

 - NULLs in columns of the VIEW cause spurious deltas to be recorded for
   every refresh.  This is because two rows which are equal when
   considering NULLs to be equal are... not equal as far as SQL goes,
   thus such rows always appear to be deleted and inserted.

   This implementation uses a NATURAL FULL OUTER JOIN to compute the
   deltas between a before and an after materialization of a view during
   a refresh.  This avoids having to generate a USING(), which is a very
   convenient simplification, but also part of the source of this
   problem with NULLs.  (The history table has two record-type columns
   to hold entire rows.)

 - No integration.

 - Wonky column names in the history table ("awld", "noo").

Ideas:

 - CREATE MATERIALIZED VIEW should have these additional options, and
   ALTER MATERIALIZED VIEW should allow these to be specified after
   creation:

- WITH [UNLOGGED] HISTORY TABLE schema_name.table_name

- WITH PRIMARY KEY (column_list) -- probably not in ALTER MV though

  A PK on an MV does make sense when one considers the admonition in
  the PG docs to not have duplicate rows in the view...  Besides, an
  MV has a materialization table, and tables generally should have
  PKs!

- WITH CONSTRAINT  (same as in ALTER TABLE ADD
  constraint)


Also, a new type of JOIN might be useful: one that joins using only
columns that are part of the PK of both table sources.  Obviously this
would not be a generally-applicable JOIN type, as it would not work for
table sources that are subqueries or plain views...  But it would be
useful here for generating the FULL OUTER JOIN needed for computing
deltas between tables of the same form.

Nico
-- 


pseudo_mat_views.sql
Description: application/sql

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Updating MATERIALIZED VIEWs (Re: [HACKERS] delta relations in AFTER triggers)

2017-01-20 Thread Nico Williams
[Looking at your patch I see that it's not quite related to MVs, so I'm
changing the Subject.  Apologies for the noise.]

[Responding out of order.]

On Fri, Jan 20, 2017 at 03:37:20PM -0600, Kevin Grittner wrote:
> On Fri, Jan 20, 2017 at 2:08 PM, Nico Williams <n...@cryptonector.com> wrote:
> > BTW, automatic updates of certain types of MVs should be easy: add
> > constraints based on NEW/OLD rows from synthetic triggers to the
> > underlying query.
> 
> Convincing me that this is a good idea for actual MVs, versus
> pseudo-MVs using tables, would be an uphill battle.  [...]

I don't think it's necessary, and I would not insist on it.

My alternative MV implementation lets _me_ choose when to update an MV
synchronously, and when to defer refreshes, by using [hand-coded]
triggers.  This is good enough for me.

If these triggers could be automatically generated, that sure would be
nice, but some control would be needed over when to update the MV vs.
mark it as needing a refresh.

> > Our intention is to contribute this.  We're willing to sign
> > reasonable contribution agreements.
> 
> Posting a patch to these lists constitutes an assertion that you
> have authority to share the IP, and are doing so.  Referencing a
> URL is a bit iffy, since it doesn't leave an archival copy of the
> contribution under the community's control.

Fair enough.  I'll post the source file itself.  I've not done the work
of properly integrating it because I need to gauge interest first,
before dedicating a lot of effort to it.

> I am dubious, though, of the approach in general, as stated above.

I'm using this _now_.  With a caveat:

a) the trigger functions needed to either mark an MV as needing a
refresh, or else to update it directly, are hand-coded, and

b) I chose which operations yield synchronous MV updates and which defer
to a refresh.

The MV, in my scheme, is really just a table with triggers that update a
deltas table the same way that a refresh would.  A refresh locks the
table, disables those triggers, populates another table with the current
output of the underlying view, compares to the previous materialization,
and lastly generates, records, and applies deltas to the
materialization.

To give an example, adding a user to a group -> generally fast; deleting
a user (and thus all their group memberships) -> potentially very slow.

The "add a user to a group" case can then yield near real-time updates
of external caches, while the other case results in a deferred REFRESH
so as to not slow down the current transaction.  The deferred REFRESH is
not deferred too long, so the net effect is still very fast updates of
external caches.

> > However, there is a bug in the query planner that prevents this
> > from being very fast.  At some point I want to tackle that bug.
> 
> What bug is that?

I... asked for help on the IRC #postgresql channel.  I never posted here
about it.

Revisiting it now... the main problem was query _preparation time_, not
execution time.  So perhaps not so bad.  Still, it's worth looking into.

The query was something like this:

SELECT v.data->'verb_name' || '^' || (r.data->'named_acl_name') AS grant_name,
   grantee.right_entity_id AS grantee_id
FROM relationships grantee
JOIN relationships grant ON
 grantee.left_entity_id = grant.right_entity_id AND
 grantee.relationship_type_id IN (10421, 10431, 13591, 13921)
 AND grant.relationship_type_id = 10331
JOIN relationships perm_actions ON
 grantee.left_entity_id = perm_actions.right_entity_id AND
 perm_actions.relationship_type_id = 10381
JOIN relationships verb_in_vs ON
 verb_in_vs.right_entity_id = perm_actions.left_entity_id AND
 verb_in_vs.relationship_type_id = 10371
JOIN entities v ON v.id = verb_in_vs.left_entity_id
JOIN entities r ON r.id = grant.left_entity_id;

  (This query uses a bit of an EAV schema.  There's an "entities" table
  with an hstore column for storing attributes ("data") and another
  table, "relationships" that has (relationship_type_id, left_entity_id,
  right_entity_id) columns and which is indexed by both, left_entity_id
  and right_entity_id.  EAV schemas hide relevant information from the
  query planner, so there is that.)

The query plan for this is about as fast as one could hope.  After all,
it has to scan many of the rows.

Now suppose we were adding a new 'grantee' and wanted to generate the
additions that would result in the MV.  We could add this constraint to
the query:

WHERE grantee.left_entity_id = NEW.left_entity_id AND
grantee.right_entity_id = NEW.right_entity_id;

Now we've basically [almost] fully-specified the primary key for the
grantee table source.

The resulting query plan is actually pretty 

Re: [HACKERS] delta relations in AFTER triggers

2017-01-20 Thread Nico Williams
On Fri, Jan 20, 2017 at 01:37:33PM -0600, Kevin Grittner wrote:
> On Thu, Jan 19, 2017 at 4:14 PM, Nico Williams <n...@cryptonector.com> wrote:
> 
> Reviews welcome!

I will review.

> There is currently plenty of room for pseudo-MV implementations,
> and may be for a while.  It's a good indication of the need for the
> feature in core.  An implementation in the guts of core can have
> advantages that nothing else can, of course.  For example, for
> eager application of the deltas, nothing will be able to beat
> capturing tuples already in RAM and being looked at for possible
> trigger firing into a RAM-with-spill-to-disk tuplestore.

BTW, automatic updates of certain types of MVs should be easy: add
constraints based on NEW/OLD rows from synthetic triggers to the
underlying query.  However, there is a bug in the query planner that
prevents this from being very fast.  At some point I want to tackle that
bug.  Basically, the planner does not notice that a table source in a
join has a lookup key sufficiently well-specified by those additional
constraints that it should be the first table source in the outermost
loop.

> I don't have time to review what you've done right now, but will
> save that link to look at later, if you give permission to borrow
> from it (with proper attribution, of course) if there is something
> that can advance what I'm doing.  If such permission is not
> forthcoming, I will probably avoid looking at it, to avoid any
> possible copyright issues.

Our intention is to contribute this.  We're willing to sign reasonable
contribution agreements.

I'd appreciate a review, for sure.  Thanks!

> > Incidentally, it's really nice that PG has some "higher order" SQL
> > features that make this sort of thing easier.  In particular, here, row
> > values and record types, and being able to refer to a table as a column
> > of the table's record type.
> 
> Yeah, I found that quite handy in developing the REFRESH feature,
> and expect to be using it in incremental maintenance.

Indeed, I *copied* the pattern.  However, I didn't have to generate
SELECT statements that include column names, as I was able to just
compare table source row values.  There's a gotcha w.r.t. NULL columns,
but it affects the built-in REFRESH as well, IIRC.  The commentary in
our implementation discusses that in more detail.
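
To illustrate the gotcha (a minimal sketch, not taken from our
implementation): plain row-value equality goes to "unknown" as soon as a NULL
column is involved, whereas IS [NOT] DISTINCT FROM treats NULLs as comparable:

  SELECT ROW(1, NULL) = ROW(1, NULL);                     -- NULL, not true
  SELECT ROW(1, NULL) IS NOT DISTINCT FROM ROW(1, NULL);  -- true

So a diff computed with plain "=" can misclassify rows that contain NULLs.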

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] delta relations in AFTER triggers

2017-01-19 Thread Nico Williams
On Sat, Dec 17, 2016 at 08:15:49PM -0600, Kevin Grittner wrote:
> On Sun, Dec 4, 2016 at 11:35 PM, Haribabu Kommi
>  wrote:
> > Moved to next CF with "waiting on author" status.
> 
> [...]

I hope what I've done about delta relations will be mostly irrelevant
given your patch (which I've not looked at in detail), but just FYI,
I've built an alternate, all-SQL-coded materialized view system that
captures deltas between refreshes and deltas from direct DMLs of the
materialized view:

https://github.com/twosigma/postgresql-contrib/blob/master/pseudo_mat_views.sql

There are some good ideas there, IMO, even if that implementation were
useless because of your patch.

Incidentally, it's really nice that PG has some "higher order" SQL
features that make this sort of thing easier.  In particular, here, row
values and record types, and being able to refer to a table as a column
of the table's record type.
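
A tiny illustration of what I mean (hypothetical table, not from the
implementation):

  CREATE TABLE t (a int, b text);

  SELECT t FROM t;            -- each output value is a composite of type t
  SELECT (t).b FROM t;        -- pick fields back out of the composite
  SELECT * FROM t AS t1, t AS t2
  WHERE  t1 = t2;             -- whole-row comparison

Being able to treat whole rows as single values is what makes it easy to diff
a materialization table against its defining query.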

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb problematic operators

2016-12-13 Thread Nico Williams
On Tue, Dec 13, 2016 at 10:26:24AM +0900, Michael Paquier wrote:
> On Mon, Dec 12, 2016 at 10:22 PM, Greg Stark  wrote:
> > One option might be for Postgres to define duplicate operator names
> > using ¿ or something else. I think ¿ is a good choice because it's a
> > common punctuation mark in spanish so it's probably not hard to find
> > on a lot of keyboards or hard to find instructions on how to type one.
> 
> Are you sure that using a non-ASCII character is a good idea for an
> in-core operator? I would think no.

Eventually language designers will cross that Rubicon in mainstream
languages.  And why not?  It sure would be convenient... from the
designer's p.o.v.  Of course, _users_ would be annoyed, as most users
in the English-speaking world will have no idea how to type such
characters, most others also will not know how to, and there will be
users still using non-Unicode locales who will be unable to type such
characters at all.  Cut-n-paste will save the day, no doubt, though
mostly/only for users using Unicode locales.

But it is tempting.

Using non-ASCII Unicode characters for _alternatives_ seems like a
possible starting point though, since that leaves users with a
universally-available ASCII alternative.  Still, now users would then
have to recognize multiple equivalent forms... ugh.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Faster Expression Processing and Tuple Deforming (including JIT)

2016-12-06 Thread Nico Williams
On Tue, Dec 06, 2016 at 12:36:41PM -0800, Andres Freund wrote:
> On 2016-12-06 15:25:44 -0500, Tom Lane wrote:
> > I'm not entirely thrilled with the idea of this being a configure-time
> > decision, because that forces packagers to decide for their entire
> > audience whether it's okay to depend on LLVM.  That would be an untenable
> > position to put e.g. Red Hat's packagers in: either they screw the people
> > who want performance or they screw the people who want security.

There's no security issue.  The dependency is on LLVM libraries, not
LLVM front-ends (e.g., clang(1)).

I don't think there's a real issue as to distros/packagers/OS vendors.
They already have to package LLVM, and they already package LLVM
libraries separately from LLVM front-ends.

> The argument for not install a c compiler seems to be that it makes it
> less convenient to build an executable. I doubt that having a C(++)
> library for code generation is convenient enough to change the picture
> there.

The security argument goes back to the days of the Morris worm, which
depended on having developer tools (specifically in that case, ld(1),
the link-editor).  But JIT via LLVM won't give hackers a way to generate
or link arbitrary object code.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Faster Expression Processing and Tuple Deforming (including JIT)

2016-12-06 Thread Nico Williams
On Tue, Dec 06, 2016 at 12:27:51PM -0800, Andres Freund wrote:
> On 2016-12-06 14:19:21 -0600, Nico Williams wrote:
> > A bigger concern might be interface stability.  IIRC the LLVM C/C++
> > interfaces are not very stable, but bitcode is.
> 
> The C API is a lot more stable than the C++ bit, that's the primary
> reason I ended up using it, despite the C++ docs being better.

Ah.

> > > I concur with your feeling that hand-rolled JIT is right out.  But
> > 
> > Yeah, that way lies maintenance madness.
> 
> I'm not quite that sure about that. I had a lot of fun doing some
> hand-rolled x86 JITing. Not that is a ward against me being mad.  But
> more seriously: Manually doing a JIT gives you a lot faster compilation
> times, which makes JIT applicable in a lot more situations.

What I meant is that each time there are new ISA extensions, or
relevant/significant differences in how different implementations of the
same ISA handle certain instructions, and/or every time you want to
add a new architecture... someone has to do a lot of very low-level
work.

> > > I'm not sure that whatever performance gain we might get in this
> > > direction is worth the costs.
> > 
> > Byte-/bit-coding query plans then JITting them is very likely to improve
> > performance significantly.
> 
> Note that what I'm proposing is a far cry away from that - this converts
> two (peformance wise two, size wise one) significant subsystems, but far
> from all the executors to be JIT able.  I think there's some more low

Yes, I know.

> hanging fruits (particularly aggregate transition functions), but
> converting everything seems to hit the wrong spot in the
> benefit/effort/maintainability triangle.

Maybe?  At least with the infrastructure in place for it someone might
try it and see.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Faster Expression Processing and Tuple Deforming (including JIT)

2016-12-06 Thread Nico Williams
On Tue, Dec 06, 2016 at 01:56:28PM -0500, Tom Lane wrote:
> Andres Freund  writes:
> > I'm posting a quite massive series of WIP patches here, to get some
> > feedback.
> 
> I guess the $64 question that has to be addressed here is whether we're
> prepared to accept LLVM as a run-time dependency.  There are some reasons
> why we might not be:
> 
> * The sheer mass of the dependency.  What's the installed footprint of
> LLVM, versus a Postgres server?  How hard is it to install from source?

As long as it's optional, does this matter?

A bigger concern might be interface stability.  IIRC the LLVM C/C++
interfaces are not very stable, but bitcode is.

> * How will we answer people who say they can't accept having a compiler
> installed on their production boxes for security reasons?

You don't need the front-ends (e.g., clang) installed in order to JIT.

> * Are there any currently-interesting platforms that LLVM doesn't work
> for?  (I'm worried about RISC-V as much as legacy systems.)

The *BSDs support more platforms than LLVM does, that's for sure.
(NetBSD supports four more, IIRC, including ia64.) But the patches make
LLVM optional anyways, so this should be a non-issue.

> I concur with your feeling that hand-rolled JIT is right out.  But

Yeah, that way lies maintenance madness.

> I'm not sure that whatever performance gain we might get in this
> direction is worth the costs.

Byte-/bit-coding query plans then JITting them is very likely to improve
performance significantly.  Whether you want the maintenance overhead is
another story.

Sometimes byte-coding + interpretation yields a significant improvement
by reducing cache pressure on the icache and the size of the program to
be interpreted.  Having the option to JIT or not JIT might be useful.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tackling JsonPath support

2016-12-05 Thread Nico Williams
On Mon, Dec 05, 2016 at 11:52:57AM -0500, Tom Lane wrote:
> Another point here is that packagers such as Red Hat strenuously dislike
> such source-code-level wrapping of other projects, because that means that
> they have to rebuild multiple packages to fix any bugs found in the
> wrapped code.  If I were still packaging Postgres for Red Hat, and such
> a distribution landed in my inbox, the very first thing I'd be looking
> to do is rip out the borrowed code and replace it with a runtime
> shared-library dependency on the upstream project's official library.

I'm quite aware :(  I used to work at Sun on Solaris.  We too hated
duplication.  OpenSSL was a particularly problematic case...

There is a real tension between the release trains of many distinct open
source projects and those of their consumers, and those of distros/OSes.

Some projects, such as SQLite3, explicitly recommend copying their
source or statically linking them into dependents; distros/vendors never
like this.

My best advice on this (PG might benefit from it), informed by years of
experience dealing with this, is that there's no perfect answer, but
that nonetheless library developers should always follow these best
practices so as to help those who end up having to deal with multiple
versions of those libraries:

 - prefer dynamic linking (because dynamic linking semantics are
   superior to static linking semantics)

 - make libraries self-initialize and self-finalize!  (pthread_once()
   and Win32's ExecuteOnce* are your friends, as are atexit()/on_exit(),
   pthread_key_create(), and DllMain() on Windows)

   (otherwise calling your library from another library gets tricky; there
   is a small sketch of this after the list)

 - make it so that as long as you change SONAMEs you can have multiple
   versions of the library loaded in one process, specifically:

- don't use POSIX file locking (but the new non-POSIX OFD locks are
  OK) (or use them for files that wouldn't be shared across multiple
  versions in one process)

  (e.g., SQLite3 uses POSIX file locking safely, but it's not likely
  that two consumers of different SQLite3 versions in one process
  would access the same DB files, so it kinda works)

- be backwards- and forwards-compatible as to any config file
  formats and other state that will be shared by multiple versions

 - generally: mind backwards compatibility, both source and binary, so
   as to make it easy to upgrade

- this means applying good API design best practices that I won't go
  into here

 - write thread-safe code, and preferably fork-safe code too
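
As an example of the self-initialization/self-finalization point above, here
is a minimal POSIX-only sketch (all names made up) of a library that
initializes itself on first use and cleans up at exit, so callers never have
to remember a global init call:

  #include <pthread.h>
  #include <stdlib.h>

  struct my_lib_state { int ready; /* caches, locks, ... */ };

  static pthread_once_t my_lib_once = PTHREAD_ONCE_INIT;
  static struct my_lib_state *my_lib_state;

  static void my_lib_fini(void)
  {
      free(my_lib_state);                       /* self-finalize */
      my_lib_state = NULL;
  }

  static void my_lib_init(void)
  {
      my_lib_state = calloc(1, sizeof(*my_lib_state));
      if (my_lib_state != NULL)
          my_lib_state->ready = 1;
      atexit(my_lib_fini);
  }

  int my_lib_do_something(void)
  {
      (void) pthread_once(&my_lib_once, my_lib_init);  /* self-initialize */
      if (my_lib_state == NULL)
          return -1;                            /* init failed */
      /* ... use my_lib_state ... */
      return 0;
  }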

For example, I've seen OpenSSL built with different SONAMEs to support
multiple versions of OpenSSL coexisting in a single program/process.
That actually works.

> Having said that ... we have a *really bad* track record of deciding which
> outside projects we want to depend on, or maybe we've just outlived a lot
> of them.  Aside from Robert's examples, there's uuid-ossp and libxml2,
> which are external code but have caused us headaches anyway.  So I think
> there's a lot to be said for avoiding dependencies on libraries that may
> or may not still be getting actively maintained ten years from now.

I'm not at all surprised.

One codebase I help develop and maintain, Heimdal, includes SQLite3 and
libeditline, and parts of Heimdal should really be separate projects
(e.g., its ASN.1 compiler and library, and several supporting libraries
like libroken (a portability layer)) because they could be useful to
others outside Heimdal.  Finding the right balance is not trivial.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tackling JsonPath support

2016-12-05 Thread Nico Williams
On Mon, Dec 05, 2016 at 11:28:31AM -0500, Robert Haas wrote:
> The overall need is that it needs to be possible for PostgreSQL to
> throw an ERROR, and thus longjmp, without leaking resources.

As long as one can interpose jump buffers, that should be possible.
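
A sketch of what I mean by interposing (PG_TRY/PG_CATCH and the libjq calls
are real; the surrounding extension function is hypothetical and elided):

  /* inside a hypothetical C-language function of an extension ... */
  jv out;

  PG_TRY();
  {
      out = jq_next(jq);          /* libjq may allocate while running */
  }
  PG_CATCH();
  {
      jq_teardown(&jq);           /* release libjq state, then re-throw */
      PG_RE_THROW();
  }
  PG_END_TRY();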

> Sometimes those errors happen asynchronously due to query cancel or
> replication conflicts or similar, and those don't have to be processed
> at once but the delay can't be unbounded or more than some modest
> fraction of a second.  Integrating with PostgreSQL's memory-context
> stuff might make that or other resource leak problems easier, or it
> might not.  To really get a clear notion of what would be involved, I
> think you'd probably need to produce a prototype patch and submit it
> here for review to really find out what issues people see with it.

Understood.  Thanks.

> Library integrations are tricky but, since you wrote JQ and seem

Just to be clear, Stephen Dolan wrote jq.  I've added to, and maintained
jq, to be sure, but I would not want to take credit from Stephen.

> potentially willing to modify it to work with PostgreSQL better, this
> one might be a lot less painful than some.  I can't promise we'd
> accept an integration even if you came up with a patch, but I see
> looking back over this thread that there are several people cautiously
> supporting the idea of using JQ in some form, which is promising.

Fair enough.  I won't be able to work on an integration for a few more
months, so we'll see (and Stephen might well veto some such changes to
jq), and it is time for me to shut up about this for now.  Thanks for
the very useful comments, and sorry for the noise.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tackling JsonPath support

2016-12-02 Thread Nico Williams
On Fri, Dec 02, 2016 at 08:53:33AM -0500, Robert Haas wrote:
> On Tue, Nov 29, 2016 at 11:50 AM, Christian Convey
>  wrote:
> > I think I can satisfy (3) with a PG extension which provides a function that
> > approximately implements JSONPath.  My short-term plans are to submit such a
> > patch.
> 
> FWIW, I think that's a fine plan.  I don't really know whether
> JSONPath is the right standard to pick for the task of extracting bits

It's not even a standard.  Are there particular proposals that the ANSI
SQL working group is considering?

> of JSON from other bits of JSON, but I think there's some value in
> picking something is simple enough that we can implement it in our own
> code and not have to rely on a third-party library.  Of course, if
> somebody feels like adding a configure option for --with-jq and

Sure.  My main concern is that I don't want to have to parse/format JSON
around every such call.  I'd rather parsed JSON remain in an internal
form for as long as possible.

Speaking of which, you could use libjq's jv API and not support the jq
language itself.
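
For illustration, a tiny sketch of direct jv use (the calls below are from
jq's jv.h; the document literal is made up).  The point is that once parsed,
the value stays in jv form until you explicitly dump it:

  #include <stdio.h>
  #include <jv.h>

  int main(void)
  {
      jv doc = jv_parse("{\"store\":{\"book\":[{\"title\":\"T\",\"price\":5}]}}");

      if (!jv_is_valid(doc)) { jv_free(doc); return 1; }

      /* jv_object_get()/jv_array_get() consume their arguments, so copy
       * anything we still need afterwards */
      jv store = jv_object_get(jv_copy(doc), jv_string("store"));
      jv book0 = jv_array_get(jv_object_get(store, jv_string("book")), 0);
      jv title = jv_object_get(book0, jv_string("title"));

      printf("title = %s\n", jv_string_value(title));

      jv_free(title);
      jv_free(doc);
      return 0;
  }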

> appropriate interfaces to integrate with JQ, we could consider that,
> too, but that imposes a packaging requirement that a home-grown
> implementation doesn't.  I'd want to hear more than one vote for such

What we do in Heimdal, OpenAFS, and other open source projects, some
times, is include a copy / git submodule / similar of some such external
dependencies.  Naturally it's not possible to do this for all external
dependencies, but it works well enough.  The jv API part of jq is small
and simple, and could be ripped out into a library that could be
included in PostgreSQL.

> a course of action before embracing it.  If JQ is a Turing-complete
> query language, integrating it might be quite difficult -- for

Even if it weren't!  (It is.)

Consider this expression using a builtin in jq:

  [range(4503599627370496)]

That is, an array of integers from 0 to 4503599627370495, inclusive.
That will "halt" given a very, very large computer and a lot of time.

(Because jq is Turing-complete, range() can be coded in jq itself, and
some variants of range() are.)
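
For instance, a recursive re-implementation of a simple one-argument range (a
sketch; the actual builtin definitions differ):

  def my_range($n): def r: if . < $n then ., (. + 1 | r) else empty end; 0 | r;

  [my_range(5)]    # => [0,1,2,3,4]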

> example, we'd need a way to make sure that it does periodic
> CHECK_FOR_INTERRUPTS() calls, and that it doesn't leak resources or
> crash if those calls decide longjmp() away due to an ERROR -- and
> would we let people query database tables with it?  Would that be
> efficient?  I think it's fine to have more limited objectives than
> what a JQ implementation would apparently entail.

Agreed.  I think this means that we need either or both of a variant of
the C jq_next() function that takes either a timeout parameter, or a
jq_intr() function that can cause a running jq_next() to stop.

(Tolerating longjmp() is harder to do and I'd rather not.)

Other projects, like, say, nginx or similar where there is a per-client
or per-connection memory pool to limit memory footprint, might want
libjq to get an allocator hook, so that's another enhancement to
consider.  If that's something that PostgreSQL would need, please let me
know.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tackling JsonPath support

2016-11-29 Thread Nico Williams
On Tue, Nov 29, 2016 at 05:18:17PM +0100, Petr Jelinek wrote:
> Just to add to this, the SQL/JSON proposals I've seen so far, and what
> Oracle, MSSQL and Teradata chose to implement already is basically
> subset of jsonpath (some proposals/implementations also include
> lax/strict prefix keyword on top of that). I think that should give us
> some hint on what the base functionality should look like.

Yes, that'd be base functionality.  You can go above and beyond.

I agree with Pavel that jq could be used as a user-defined function, but
proper integration would be better because it would avoid the need to
format and parse JSON around calls to jq, and also because PG could
compile jq programs when preparing SQL statements.  Besides, the libjq
jv API is *very* nice.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tackling JsonPath support

2016-11-28 Thread Nico Williams
On Mon, Nov 28, 2016 at 08:00:46PM -0700, David G. Johnston wrote:
> IMO jq is considerably closer to XSLT than XPath - which leads me to figure
> that since xml has both that JSON can benefit from jq and json-path.  I'm
> not inclined to dig too deep here but I'd rather take jq in the form of
> "pl/jq" and have json-path (abstractly) as something that you can use like
> "pg_catalog.get_value(json, json-path)"

JSONPath looks a lot like a small subset of jq.  Here are some examples:

JSONPath|   jq
---

$.store.book[0].title   | .store.book[0].title
$['store']['book'][0]['title']  | .["store"]["book"][0]["title"]
$..author   | ..|.author
$.store.*   | .store[]
$.store..price  | .store|..|.price?
$..book[2]  | [..|.book?][2]
$..book[?(@.isbn)]  | ..|.book?|select(.isbn)
$..book[?(@.price<10)]  | ..|.book?|select(.price<10)
$..*| ..?

Of course, jq can do much more than this.  E.g.,

# Output [title, price] of all books with an ISBN:
..|.book?|select(.isbn)|[.title,.price]

# Output the average price of books with ISBNs appearing anywhere in
# the input document:
reduce
  (..|.book?|select(.isbn)|.price) as $price
  (
   # Initial reduction state:
   {price:0,num:0};
   # State update
   .price = (.price * .num + $price) / (.num + 1) | .num += 1) |
# Extract average price
.price

Of course one could just wrap that with a function:

def avg(pathexp; cond; v):
  reduce (pathexp | select(cond) | v) as $v
    ({v: 0, c: 0};
     .v = (.v * .c + $v) / (.c + 1) | .c += 1) | .v;

# Average price of books with ISBNs:
avg(..|.book?; .isbn; .price)

# Average price of all books:
avg(..|.book?; true; .price)

There's much, much more.

Note that jq comes with a C implementation.  It should be easy to make
bindings to it from other programming language run-times.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tackling JsonPath support

2016-11-28 Thread Nico Williams
On Mon, Nov 28, 2016 at 06:38:55PM -0800, Christian Convey wrote:
> On Mon, Nov 28, 2016 at 6:26 PM, Nico Williams <n...@cryptonector.com>
> wrote:
> >
> 
> Thanks for the explanation.  It sounds like your original point was NOT
> that json-path isn't sufficient for "${specific use X}".

The only uses of SQL w/ JSON I've seen so far in live action are to
implement EAV schemas on PostgreSQL.  Since PostgreSQL lacks an ANY
type... using the hstore or jsonb to store data that would otherwise
require an ANY type is the obvious thing to do.  Naturally this use
doesn't need deeply nested JSON data structures, so even JSONPath is
overkill for it!
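
The pattern looks roughly like this (a sketch with made-up names, not
anyone's actual schema):

  CREATE TABLE entities (
      id    bigserial PRIMARY KEY,
      kind  text NOT NULL,
      data  jsonb NOT NULL DEFAULT '{}'   -- attributes that would want an ANY type
  );

  -- attribute lookups are shallow, so even JSONPath is more than is needed:
  SELECT id FROM entities WHERE kind = 'user' AND data->>'email' = 'a@example.com';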

However, there are use cases I can imagine:

 - generating complex JSON from complex (e.g., recursive) SQL data where
   the desired JSON "schema" is not close to the SQL schema

   I've used jq a *lot* to convert schemas.  I've also use XSLT for the
   same purpose.  I've also used SQL RDBMSes and jq together a fair bit,
   either having jq consume JSON documents to output INSERT and other
   statements, or having a SQL application output JSON that I then
   convert to an appropriate schema using jq.

   Naturally I can keep using these two tools separately.  There's not
   much to gain from integrating them for this particular sort of
   use-case.

 - handling JSON documents with very loose schemata, perhaps arbitrary
   JSON documents, embedded in a SQL DB

   I've not needed to do this much, so I have no specific examples.
   But, of course, one reason I've not needed to do this is that today
   it kinda can't be done with enough expressivity.

There are many use-cases for general-purpose programming languages, and
even for very widely-applicable domain-specific programming language.

It's especially difficult to name a specific use-case for a language
that doesn't exist -- in this case that would be SQL + (jq and/or
JSONPath).

> Instead, your point was that jq seems to have many advantages over
> json-path in general, and therefore PG should offer jq instead or, or in
> addition to, json-path.
> 
> Is that what you're saying?

Roughly, yes.  The distinct advantage is that jq is much more general
and expressive, not unlike SQL itself.

> > Hmm?
> 
> Context: The reason I'm trying to work on a json-path implementation is
> that Pavel Stehule suggested it as a good first PG-hacking project for me.
> At the time, it sounded like he had a use for the feature.

I see.  I understand that.  If you've already made a significant
investment, then I don't blame you for not wanting to risk it.  On the
other hand, if melding jsonb and jq happens to be easy, then you'll get
much more bang from it for your investment.  Naturally, you do what you
prefer, and if the reality on the ground is JSONPath, then so be it.  If
I had time and felt sufficiently strongly, I'd contribute jq
integration; as it is I don't, and beggars can't be choosers.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tackling JsonPath support

2016-11-28 Thread Nico Williams
On Mon, Nov 28, 2016 at 05:50:40PM -0800, Christian Convey wrote:
> On Mon, Nov 28, 2016 at 11:23 AM, Nico Williams <n...@cryptonector.com>
> wrote:
> ...
> > JSON Path is not expressive enough (last I looked) and can be mapped
> > onto jq if need be anyways.
> 
> Hi Nico,
> 
> Could you please clarify what you mean by "not expressive enough"?

jq is a functional language that has these and other features:

 - recursion
 - generators
 - lazy evaluation (of sorts)
 - path expressions
 - math functionality (libm, basically)
 - reduction
 - functions
 - and other things

(jq does not have higher-order functions in that functions cannot return
functions and functions are not values, though it does have closures.)

jq is and feels a lot like a SQL, but for JSON.

> I ask because I've been struggling to identify clear requirements for the
> json-path functionality I'm trying to provide.  It sounds like perhaps you
> have something concrete in mind.

SQL imposes structure on data.  Recursion makes SQL structure looser in
the sense that it may not be easy or possible to express certain
desirable schema constraints in SQL terms without resorting to triggers,
say.  Storing documents in XML, JSON, or other such recursion-friendly
formats (perhaps in semantically equivalent but query-optimized forms)
is also a way to avoid strict structure (thus one needs schema
validators for XML, for example).

Less rigid schema constraints do not and should not preclude powerful
query languages.

One could convert such documents to a SQL EAV schema, if one has an
RDBMS with an ANY type (e.g., something like SQLite3's duck typing), and
then use SQL to query them.  But that may be more difficult to use than
a SQL with support for XML/JSON/... and query sub-languages for those.

SQL is very powerful.  One might like to have similarly powerful,
format-specific query languages for documents stored in XML, JSON,
etcetera, in a SQL RDBMS.  jq is such a language, for JSON documents.
Ditto XPath/XSLT, for XML.  While XPath is expressive and compact, XSLT
is rather verbose; jq is as expressive as XSLT, but with the compact
verbosity of XPath.

> Since I myself have no need currently for this functionality, I'm left
> guessing about hypothetical users of it.  My current mental model is:

That's a bit like asking what is the use for SQL :^)  The point is that
SQL is a powerful query language, and so is jq.  Each is appropriate to
its own domain; both could be used together.

> (a) Backend web developers.  AFAICT, their community has mostly settled on
> the syntax/semantics proposed by Stefan Groessner.  It would probably be
> unkind for PG's implementation to deviate from that without a good reason.

I can't speak for the community.  I wouldn't take it personally that jq
be not chosen, nor any other proposal of mine.  If it's politically
easier, then do that.

> (b) PG hackers who will eventually implement the ISO SQL standard
> operators.  In the standards-committee meeting notes I've seen, it seemed
> to me that they were planning to define some operators in terms of
> json-path expression.  So it would probably be good if whatever json-path
> function I implement turns out to comply with that standard, so that the
> PG-hackers can use it as a building block for their work.

These could still be implemented (e.g., using jq itself).

> (c) Pavel.  (I'm still somewhat unclear on what has him interested in this,
> and what his specific constraints are.)

Hmm?

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] matview incremental maintenance

2016-11-28 Thread Nico Williams
On Mon, Jun 17, 2013 at 07:41:15AM -0700, Kevin Grittner wrote:
> Since there seems to be interest in discussing incremental
> maintenance of materialized views *now*, I'm starting this thread
> to try to avoid polluting unrelated threads with the discussion.  I
> don't intend to spend a lot of time on it until the CF in progress
> completes, but at that point the work will start in earnest.  So
> I'll say where I'm at, and welcome anyone who has time to spare
> outside of the CF to comment or contribute ideas.

I have an implementation that supports updates, but it doesn't implement
automatic updates as described in the paper you cited.

> The paper at the core of the discussion can be found by searching
> for "maintaining views incrementally gupta mumick subrahmanian" --
> it's on both the ACM and CiteSeerX websites.  Of course, one
> doesn't need to understand that paper to discuss techniques for
> capturing the base deltas, but I'm hoping that's not what takes up
> most of the discussion.  I expect the most important discussions to
> be around how best to handle the "count(t)" (or count_t) column,
> what form should be use for intermediate results, how to modify or
> add execution nodes which know how to deal with the count, how to
> generate set operations to use those nodes, and how to modify the
> planner to choose the best plan for these operations.  Whether to
> pull the deltas off the WAL stream or stuff them into a tuplestore
> as they are written seems to me to be a relatively minor point.  If
> properly abstracted, the performance and complexity of alternatives
> can be compared.

Sure.  Automatically converting INSERTs/UPDATEs/DELETEs of MATERIALIZED
VIEW table sources is not trivial, though I've had some luck with a
particular multi-join query by just manually adding constraints from the
OLD.* and NEW.* rows, though unfortunately the PG query planner was
unable to find the obvious fast query plan (more on that in another
thread) and I ended up having to manually optimize the query using
WITH...

So at least for some queries it's easy enough to automatically propagate
constraints into the VIEW's query.  For some recursive queries it may
also be easy to propagate OLD.*/NEW.* into a seed.  Anyways, this is a
bit far afield though, as what I have managed so far is very useful even
without automatically producing updates based only on the VIEW's query.

Nor am I certain that automatically updating a materialized view is
always the right thing to do.

A case in point for me is an authorization system (which generally means
there's something of a transitive closure involved, which means
recursive queries).  In this system adding grants is cheap enough, but
massive revocation (e.g., deleting all of a user's entitlements, perhaps
by deleting the user and cascading the deletion) is not: it can be
faster to just refresh the view old-style than to update it dynamically!
(You noted this problem.)
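
Concretely, the transitive closure part is the usual recursive-CTE shape
(hypothetical tables, not the real schema):

  WITH RECURSIVE effective_grants(grantee, granted) AS (
      SELECT grantee, granted FROM grants              -- direct grants
    UNION
      SELECT g.grantee, e.granted                      -- grants reachable transitively
      FROM grants g
      JOIN effective_grants e ON e.grantee = g.granted
  )
  SELECT * FROM effective_grants;

Adding one grant extends the closure cheaply; deleting a user can invalidate
a large part of it, which is why revocation can be slower than a refresh.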

The queries I use for dynamically updating the materialized views are
hand-optimized as mentioned above.  They are not too unlike what an
automatic system would have generated.

Granted, the problem partly is that an ORM is involved, which adds
obnoxious overhead: one statement per grant deletion versus one
statement for deleting all of them.  But it's still possible that at
some point it's best to refresh, even if an ORM were not involved.

> At the developer meeting last month, we talked about the special
> new count column for a bit, and everyone seemed to agree that
> adding such an animal, ...

What's that?

> Long term, timings for incremental maintenance that people would
> like to see (from most eager to least eager) are:
> 
> - as part of completing each statement, so that the affect on the
> matview is immediately visible to the transaction which modifies a
> supporting table, and becomes visible at commit to other
> transactions

That can be done now with hand-coded triggers, as I do in my use case.
Though I do leave some cases to a deferred refresh as mentioned above.

> - at transaction commit time, so that other transactions see the
> changes to the base tables and the referencing matviews at the same
> point in time

See above.

> - from a FIFO queue which is processed by a background process
> whenever data is present (possibly with pacing)

I do this.  I use NOTIFY/LISTEN and timeouts to drive refreshes as
needed.
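
Roughly like this (a sketch with hypothetical names); a small worker LISTENs
on the channel and coalesces notifications with a timeout before refreshing:

  CREATE FUNCTION note_stale() RETURNS trigger LANGUAGE plpgsql AS $$
  BEGIN
      PERFORM pg_notify('mv_refresh', TG_TABLE_NAME);
      RETURN NULL;      -- AFTER ... FOR EACH STATEMENT, result is ignored
  END $$;

  CREATE TRIGGER base_table_changed
      AFTER INSERT OR UPDATE OR DELETE ON some_base_table
      FOR EACH STATEMENT EXECUTE PROCEDURE note_stale();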

> - from a FIFO queue based on a schedule, so that matviews are
> stable between applications and/or to avoid burdening the machine
> during peak periods

Ditto.

> - incremental update, or even full refresh, on an attempt to query
> a "stale" matview

I record and store (in a per-view history table) differences computed
during a refresh.

> - explicit request to apply incremental updates or refresh

I make this decision by using one trigger function or another.  One set
of trigger functions generates updates properly (by updating the
materialization table), and another merely marks the view as stale so that a
deferred refresh picks it up later.

Re: [HACKERS] Tackling JsonPath support

2016-11-28 Thread Nico Williams
On Mon, Nov 28, 2016 at 05:56:41PM +0100, Pavel Stehule wrote:
> On 28 Nov 2016 at 17:26, "David Fetter"  wrote:
> > There's another option we should also consider: jq.  It's available under a
> > PostgreSQL-compatible license, and has had a LOT of work put into
> > correctness and performance.
> 
> we can use it for inspiration. but the syntax of this tool is little bit
> too complex and too original against Json path ... jsonpath is relative
> simple implementation of xpath to json
> 
> we have one proprietary syntax already, two is maybe too much :-)

jq is hardly proprietary :)

JSON Path is not expressive enough (last I looked) and can be mapped
onto jq if need be anyways.

libjq has a number of desirable features, mostly its immutable/COW data
structures.  In libjq data structures are only mutated when there's
only one reference to them, but libjq's jv API is built around
immutability, so jv values are always notionally immutable.  For
example, one writes:

  jv a = jv_array();

  a = jv_array_append(a, jv_true()); // `a' is notionally new, but since
                                     // it had only one reference, its
                                     // memory is reused

and similarly for objects.  One could instead write:

  jv a = jv_array_append(jv_array(), jv_true());
  
or

  jv a = JV_ARRAY(jv_true());

One of the nice things about libjv is that almost every function
consumes a reference of every jv value passed in, with very few
exceptions.  This simplifies memory management, or at least avoidance of
double-free and use-after-free (it can be harder to track down leaks
though, since tools like valgrind don't understand that jv_copy() call
sites can be like allocations).

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

