Re: Fwd: Emit namespace in post-copy output

2021-07-27 Thread Corey Huinker
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:    not tested

Passed make check-world. Running make installcheck-world had 2 errors out of 
209, but I got those same 2 errors on a clean branch.

Feature is as-described, and very simple.

As far as I can tell, there is no external specification for vacuum or any 
related utility.

I searched the documentation and found several examples of invocations of
the VACUUM FULL command and the vacuumdb utility, but at no point was sample
output shown, so this change will not require updating the documentation.

The new status of this patch is: Ready for Committer


Re: Grammar railroad diagram

2021-07-10 Thread Corey Huinker
>
>
> Another way that I tested and it's working is to use
> https://www.bottlecaps.de/convert/ paste the postgresql grammar there
> and press "convert" and after press "view diagram".
>

I tried this out and I'm pleased to see that one of the outputs is xhtml +
SVG, because SVGs have hover-over tool-tips, an important aspect of
accessibility and my major concern the last time a similar thing was
proposed [1].

[1]
https://www.postgresql.org/message-id/cah2-wzmfc+p3pc_u1dsgm3lawurzkx5pqzmxtglgsxbf8gf...@mail.gmail.com


Re: Emit namespace in post-copy output

2021-06-23 Thread Corey Huinker
On Tue, Jun 22, 2021 at 6:08 PM Mike  wrote:

> When running a VACUUM or CLUSTER command, the namespace name is not part
> of the emitted message.
>
> Using `vacuumdb` CLI tool recently with multiple jobs, I found that
> reading the output messages harder to match the relations with their
> namespaces.
>
> Example:
>
> INFO:  vacuuming "sendgrid.open"
> INFO:  vacuuming "mailgun.open"
> ...
> INFO:  "open": found 0 removable, 31460776 nonremovable row versions in
> 1358656 pages
> DETAIL:  0 dead row versions cannot be removed yet.
> CPU 31.35s/261.26u sec elapsed 1620.68 sec.
> ...
>
> In this example, the user can't readily tell which `open` relation was
> completed.
>
> Attached is a patch using existing functions to include the namespace in
> the output string.
>
> Looking forward to feedback!
> -Mike Fiedler
>

I've added this to the open commitfest:
https://commitfest.postgresql.org/33/3200/

The change is quite simple, just 3 lines, adding the schema name to two
different lines of output.

As such, there is no obvious documentation to change, though I can imagine
that we have sample output from vacuum, vacuumdb or cluster somewhere that
would need to be updated.

I cobbled together a very simple test:

~/pgdata$ /usr/local/pgsql/bin/psql postgres
psql (14beta2)
Type "help" for help.
postgres=# create database mike_test;
CREATE DATABASE
postgres=# \c mike_test
You are now connected to database "mike_test" as user "corey".
mike_test=# create schema foo;
CREATE SCHEMA
mike_test=# create table foo.bar(x integer);
CREATE TABLE
mike_test=# VACUUM FULL VERBOSE foo.bar;
INFO:  vacuuming "foo.bar"
INFO:  "foo.bar": found 0 removable, 0 nonremovable row versions in 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
VACUUM


And of course vacuumdb

~/pgdata$ /usr/local/pgsql/bin/vacuumdb --full --verbose mike_test
--table=foo.bar
vacuumdb: vacuuming database "mike_test"
INFO:  vacuuming "foo.bar"
INFO:  "foo.bar": found 0 removable, 0 nonremovable row versions in 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.


So far, so good.


Re: test runner (was Re: SQL-standard function body)

2021-04-11 Thread Corey Huinker
>
> > This is nice.  Are there any parallelism capabilities?
>
> Yes. It defaults to number-of-cores processes, but obviously can also be
> specified explicitly. One very nice part about it is that it'd work
> largely the same on windows (which has practically unusable testing
> right now). It probably doesn't yet, because I just tried to get it
> build and run tests at all, but it shouldn't be a lot of additional
> work.
>

The pidgin developers speak very highly of meson, for the same reasons
already mentioned in this thread.


A Case For Inlining Immediate Referential Integrity Checks

2021-03-14 Thread Corey Huinker
A Case For Inlining Immediate Referential Integrity Checks
--

The following is an overview of how Postgres currently implements
referential integrity, some problems with that architecture, attempted
solutions for those problems, and a suggestion of another possible solution.

Notes On Notation and Referential Integrity In General
--

All referential integrity is ultimately of this form:

R(X) => T(Y)

Where one referencing table R has a set of columns X that references a set
of columns Y comprising a unique constraint on a target table T. Note that
the Y set of columns is usually the primary key of T, but does not have
to be.
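To make the notation concrete, a minimal sketch (the table and column names
here are mine, not part of the proposal):

-- T(Y): the referenced table; Y = (y) carries a unique constraint
CREATE TABLE t (y integer PRIMARY KEY);
-- R(X): the referencing table; X = (x) references T(Y)
CREATE TABLE r (x integer REFERENCES t (y));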

The basic referential integrity checks fall into two basic categories,
Insert and Update, which can be checked Immediately following the statement,
or can be Deferred to the end of the transaction.

The Insert check is fairly straightforward. Any insert to R, or update of R
that modifies [1] any column in X, is checked to see if all of the X columns
are NOT NULL, and if so, a lookup is done on T to find a row with matching
values of Y. If none is found, then an error is raised.

The Update check is more complicated, as it covers any UPDATE operation
that modifies [1] any column in Y where all of the values of Y are NOT
NULL, as well as any DELETE operation where all of the columns of Y are NOT
NULL. For any Update check, the table R is scanned for any X tuples matching
the previous values of Y, and for any matches found, an action is taken.
That action can be to fail the operation (NO ACTION, RESTRICT), update the X
values to fixed values (SET NULL, SET DEFAULT), or to delete those rows in R
(CASCADE).
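In DDL terms, the action is chosen per constraint; a sketch using the
tables above:

CREATE TABLE r2 (
    x integer REFERENCES t (y)
        ON UPDATE SET NULL  -- Update check on T: set X to a fixed value
        ON DELETE CASCADE   -- Delete check on T: delete the rows in R
);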


Current Implementation
--

Currently, these operations are handled via per-row triggers. In our
general case, one trigger is placed on R for INSERT operations, and one
trigger is placed on T for DELETE operations, and an additional trigger is
placed on T for UPDATE operations that affect any column of Y.

These Insert trigger functions invoke the C function RI_FKey_check() [2].
The trigger is fired unconditionally, and the trigger itself determines
whether there is a referential integrity check to be made. Ultimately this
trigger invokes an SPI query of the form SELECT 1 FROM T WHERE <Y = X
values> FOR KEY SHARE. This query is generally quite straightforward for the
planner, as it becomes either a scan of a single unique index, or a
partition search followed by a scan of a single unique index. The operation
succeeds if a row is found, and fails if it does not.

The Update trigger functions are implemented with a set of C functions
RI_[noaction|restrict|cascade|setnull|setdefault]_[upd|del]() [3]. These
functions each generate a variation of an SPI query in one of the following
forms:

 cascade: DELETE FROM R WHERE <X matches the old values of Y>
 restrict/noaction: SELECT 1 FROM R WHERE <X matches the old values of Y> FOR KEY SHARE
 setnull: UPDATE R SET x1 = NULL, ... WHERE <X matches the old values of Y>
 setdefault: UPDATE R SET x1 = DEFAULT, ... WHERE <X matches the old values of Y>

These triggers are either executed at statement time (Immediate) or are
queued for execution as a part of the transaction commit (Deferred).
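A sketch of the Deferred case, again using the tables above:

CREATE TABLE r3 (x integer REFERENCES t (y) DEFERRABLE INITIALLY DEFERRED);

BEGIN;
INSERT INTO r3 VALUES (999);  -- no check fires yet
INSERT INTO t VALUES (999);   -- satisfies the reference before commit
COMMIT;                       -- the queued RI trigger runs here and passes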

Problems With The Current Implementation
--

The main problems with this architecture come down to visibility and
performance.

The foremost problem with this implementation is that these extra queries
are not visible to the end user in any way. It is possible to infer that the
functions executed by looking at the constraint definitions and comparing
pg_stat_user_tables or pg_stat_user_indexes before and after the operation,
but in general the time spent in these functions accrues to the DML
statement (Immediate) or COMMIT statement (Deferred) without any insight
into what took place. This is especially vexing in situations where an
operation as simple as "DELETE FROM highly_referenced_table WHERE id = 1"
hits the primary key index, but takes several seconds to run.
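The closest a user can get today is bracketing the DML with snapshots of the
stats views, e.g. (a sketch with hypothetical table names; these counters
are also updated with some lag):

SELECT relname, seq_scan, idx_scan
  FROM pg_stat_user_tables WHERE relname = 'referencing_table';
DELETE FROM highly_referenced_table WHERE id = 1;
SELECT relname, seq_scan, idx_scan
  FROM pg_stat_user_tables WHERE relname = 'referencing_table';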

The performance of Insert operations is generally not too bad, in that the
query boils down to an Index Scan for a single row. The problem, however,
is that this query must be executed for every row inserted. The query
itself is only planned once, and that query plan is cached for later
re-use. That removes some of the query overhead, but also incurs a growing
cache of plans which can create memory pressure if the number of foreign
keys is large, and indeed this has become a problem for at least one
customer [4]. Some profiling of the RI check indicated that about half of
the time of the insert was spent in SPI functions that could be bypassed if
the C function called index_beginscan and index_rescan directly [5]. And
these indications bore out when Amit Langote wrote a patch [6] which finds
the designated index from the constraint (with some drilling through
partitions if need be) and then invokes the scan functions. This method
showed about a halving of the insert time.

Re: simplifying foreign key/RI checks

2021-02-28 Thread Corey Huinker
>
> > It seems to me 1 (RI_PLAN_CHECK_LOOKUPPK) is still alive. (Yeah, I
> > know that doesn't mean the usefulness of the macro but the mechanism
> > the macro suggests, but it is confusing.) On the other hand,
> > RI_PLAN_CHECK_LOOKUPPK_FROM_PK and RI_PLAN_LAST_ON_PK seem to be no
> > longer used.  (Couldn't we remove them?)
>
> Yeah, better to just remove those _PK macros and say this module no
> longer runs any queries on the PK table.
>
> How about the attached?
>
>
Sorry for the delay.
I see that the changes were made as described.
Passes make check and make check-world yet again.
I'm marking this Ready For Committer unless someone objects.


Re: parse_slash_copy doesn't support psql variables substitution

2021-02-11 Thread Corey Huinker
On Wed, Feb 10, 2021 at 8:33 AM Pavel Stehule 
wrote:

> Hi
>
> Is there some reason why \copy statement (parse_slash_copy parser) doesn't
> support psql variables?
>
> Regards
>
> Pavel
>

I remember wondering about that when I was working on the \if stuff. I dug
into it a bit, but the problem was out of scope for my goals.

The additional options recently added to \g reduced my need for \copy, and
it seemed like there was some effort to have input pipes as well, which
would eliminate the need for \copy altogether.


Re: simplifying foreign key/RI checks

2021-01-24 Thread Corey Huinker
On Sun, Jan 24, 2021 at 6:51 AM Amit Langote 
wrote:

> On Sun, Jan 24, 2021 at 11:26 AM Corey Huinker 
> wrote:
> > On Sat, Jan 23, 2021 at 12:52 PM Zhihong Yu  wrote:
> >>
> >> Hi,
>
> Thanks for the review.
>
> >> +   for (i = 0; i < riinfo->nkeys; i++)
> >> +   {
> >> +   Oid eq_opr = eq_oprs[i];
> >> +   Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
> >> +   RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr,
> typeid);
> >> +
> >> +   if (pk_nulls[i] != 'n' &&
> OidIsValid(entry->cast_func_finfo.fn_oid))
> >>
> >> It seems the pk_nulls[i] != 'n' check can be lifted ahead of the
> assignment to the three local variables. That way, ri_HashCompareOp
> wouldn't be called when pk_nulls[i] == 'n'.
>
> Good idea, so done.  Although, there can't be nulls right now.
>
> >> +   case TM_Updated:
> >> +   if (IsolationUsesXactSnapshot())
> >> ...
> >> +   case TM_Deleted:
> >> +   if (IsolationUsesXactSnapshot())
> >>
> >> It seems the handling for TM_Updated and TM_Deleted is the same. The
> cases for these two values can be put next to each other (saving one block
> of code).
>
> Ah, yes.  The TM_Updated case used to be handled a bit differently in
> earlier unposted versions of the patch, though at some point I
> concluded that the special handling was unnecessary, but didn't
> realize what you just pointed out.  Fixed.
>
> > I'll pause on reviewing v4 until you've addressed the suggestions above.
>
> Here's v5.
>

v5 patches apply to master.
Suggested If/then optimization is implemented.
Suggested case merging is implemented.
Passes make check and make check-world yet again.
Just to confirm: we *don't* free the RI_CompareHashEntry because it points
to an entry in a hash table allocated in TopMemoryContext, aka the lifetime
of the session, correct?

Anybody else want to look this patch over before I mark it Ready For
Committer?


Re: simplifying foreign key/RI checks

2021-01-23 Thread Corey Huinker
On Sat, Jan 23, 2021 at 12:52 PM Zhihong Yu  wrote:

> Hi,
>
> +   for (i = 0; i < riinfo->nkeys; i++)
> +   {
> +   Oid eq_opr = eq_oprs[i];
> +   Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
> +   RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
> +
> +   if (pk_nulls[i] != 'n' &&
> OidIsValid(entry->cast_func_finfo.fn_oid))
>
> It seems the pk_nulls[i] != 'n' check can be lifted ahead of the
> assignment to the three local variables. That way, ri_HashCompareOp
> wouldn't be called when pk_nulls[i] == 'n'.
>
> +   case TM_Updated:
> +   if (IsolationUsesXactSnapshot())
> ...
> +   case TM_Deleted:
> +   if (IsolationUsesXactSnapshot())
>
> It seems the handling for TM_Updated and TM_Deleted is the same. The cases
> for these two values can be put next to each other (saving one block of
> code).
>
> Cheers
>

I'll pause on reviewing v4 until you've addressed the suggestions above.


Re: simplifying foreign key/RI checks

2021-01-21 Thread Corey Huinker
>
>
>
> I decided not to deviate from pk_ terminology so that the new code
> doesn't look too different from the other code in the file.  Although,
> I guess we can at least call the main function
> ri_ReferencedKeyExists() instead of ri_PrimaryKeyExists(), so I've
> changed that.
>

I think that's a nice compromise, it makes the reader aware of the concept.


>
> I've attached the updated patch.
>

Missing "break" added. Check.
Comment updated. Check.
Function renamed. Check.
Attribute mapping matching test (and assertion) added. Check.
Patch applies to an as-of-today master, passes make check and check world.
No additional regression tests required, as no new functionality is
introduced.
No docs required, as there is nothing user-facing.

Questions:
1. There's a palloc for mapped_partkey_attnums, which is never freed. Is
the prevailing memory context short-lived enough that we don't care?
2. Same question for the AttrMap map: should there be a free_attrmap()?


Re: simplifying foreign key/RI checks

2021-01-19 Thread Corey Huinker
>
> I decided not to deviate from pk_ terminology so that the new code
> doesn't look too different from the other code in the file.  Although,
> I guess we can at least call the main function
> ri_ReferencedKeyExists() instead of ri_PrimaryKeyExists(), so I've
> changed that.
>

I agree with leaving the existing terminology where it is for this patch.
Changing the function name is probably enough to alert the reader that the
things that are called pks may not be precisely that.


Re: Release SPI plans for referential integrity with DISCARD ALL

2021-01-18 Thread Corey Huinker
On Wed, Jan 13, 2021 at 1:03 PM Corey Huinker 
wrote:

> In addition to that, a following case would be solved with this approach:
>> When many processes are referencing many tables that have foreign key
>> constraints defined, a huge amount of memory will be consumed
>> regardless of whether referenced tables are partitioned or not.
>>
>> Attached the patch.  Any thoughts?
>>
>
> Amit Langote has done some great work at eliminating SPI from
> INSERT/UPDATE triggers entirely, thus reducing the number of cached plans
> considerably.
>
> I think he was hoping to have a patch formalized this week, if time
> allowed.
>
> It doesn't have DELETE triggers in it, so this patch might still have good
> value for deletes on a commonly used enumeration table.
>
> However, our efforts might be better focused on eliminating SPI from
> delete triggers as well, an admittedly harder task.
>

Amit's patch is now available in this thread [1]. I'm curious if it has any
effect on your memory pressure issue.

[1]
https://www.postgresql.org/message-id/ca+hiwqgkfjfydeq5vhph6eqpkjsbfpddy+j-kxyfepqedts...@mail.gmail.com


Re: simplifying foreign key/RI checks

2021-01-18 Thread Corey Huinker
On Mon, Jan 18, 2021 at 9:45 PM Amit Langote 
wrote:

> On Tue, Jan 19, 2021 at 2:47 AM Zhihong Yu  wrote:
> >
> > Hi,
> > I was looking at this statement:
> >
> > insert into f select generate_series(1, 200, 2);
> >
> > Since certain generated values (the second half) are not in table p,
> wouldn't insertion for those values fail ?
> > I tried a scaled down version (1000th) of your example:
> >
> > yugabyte=# insert into f select generate_series(1, 2000, 2);
> > ERROR:  insert or update on table "f" violates foreign key constraint
> "f_a_fkey"
> > DETAIL:  Key (a)=(1001) is not present in table "p".
>
> Sorry, a wrong copy-paste by me.  Try this:
>
> create table p (a numeric primary key);
> insert into p select generate_series(1, 200);
> create table f (a bigint references p);
>
> -- Unpatched
> insert into f select generate_series(1, 200, 2);
> INSERT 0 100
> Time: 6527.652 ms (00:06.528)
>
> update f set a = a + 1;
> UPDATE 100
> Time: 8108.310 ms (00:08.108)
>
> -- Patched:
> insert into f select generate_series(1, 200, 2);
> INSERT 0 100
> Time: 3312.193 ms (00:03.312)
>
> update f set a = a + 1;
> UPDATE 100
> Time: 4292.807 ms (00:04.293)
>
> > For v1-0002-Avoid-using-SPI-for-some-RI-checks.patch :
> >
> > +* Collect partition key values from the unique key.
> >
> > At the end of the nested loop, should there be an assertion that
> partkey->partnatts partition key values have been found ?
> > This can be done by using a counter (initialized to 0) which is
> incremented when a match is found by the inner loop.
>
> I've updated the patch to add the Assert.  Thanks for taking a look.
>
> --
> Amit Langote
> EDB: http://www.enterprisedb.com


v2 patch applies and passes make check and make check-world. Perhaps, given
that the missing break at line 418 caused no test failures, we could add
another regression test if we're aiming for 100% code path coverage. As it
is, I think the compiler warning was a sufficient alert.

The code is easy to read, and the comments touch on the major points of
what complexities arise from partitioned tables.

A somewhat pedantic complaint I have brought up off-list is that this patch
continues the pattern of variable and function names assuming that the
foreign key references the primary key of the referenced table. A foreign
key constraint need only reference a unique index; it doesn't have to be the
primary key. Granted, that unique index behaves exactly as a primary key
would, so conceptually it is very similar, but keeping with the existing
naming (pk_rel, pk_type, etc.) can lead a developer to think that it would
be just as correct to find the referenced relation and get the primary key
index from there, which would not always be correct. This patch correctly
grabs the index from the constraint itself, so no problem there.
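To illustrate the distinction (my example, not from the patch):

CREATE TABLE t2 (
    id   serial PRIMARY KEY,
    code text UNIQUE          -- a unique constraint that is not the PK
);
-- this FK references the unique index on code, not the primary key, so
-- "look up the referenced table's PK" would find the wrong index
CREATE TABLE fk_code (code text REFERENCES t2 (code));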

I like that this patch changes the absolute minimum of the code in order to
get a very significant performance benefit. It does so in a way that should
reduce resource pressure found in other places [1]. This will in turn
reduce the performance penalty of "doing the right thing" in terms of
defining enforced foreign keys. It seems to get a clearer performance boost
than was achieved with previous efforts at statement level triggers.

This patch completely sidesteps the DELETE case, which has more insidious
performance implications, but is also far less common, and whose solution
will likely be very different.

[1]
https://www.postgresql.org/message-id/cakkq508z6r5e3jdqhfpwszsajlpho3oyyoamfesaupto5vg...@mail.gmail.com


Re: simplifying foreign key/RI checks

2021-01-18 Thread Corey Huinker
>
>
> In file included from
> /home/japin/Codes/postgresql/Debug/../src/include/postgres.h:47:0,
>  from
> /home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c:24:
> /home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c:
> In function ‘ri_PrimaryKeyExists’:
> /home/japin/Codes/postgresql/Debug/../src/include/utils/elog.h:134:5:
> warning: this statement may fall through [-Wimplicit-fallthrough=]
>   do { \
>  ^
> /home/japin/Codes/postgresql/Debug/../src/include/utils/elog.h:156:2:
> note: in expansion of macro ‘ereport_domain’
>   ereport_domain(elevel, TEXTDOMAIN, __VA_ARGS__)
>   ^~
> /home/japin/Codes/postgresql/Debug/../src/include/utils/elog.h:229:2:
> note: in expansion of macro ‘ereport’
>   ereport(elevel, errmsg_internal(__VA_ARGS__))
>   ^~~
> /home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c:417:5:
> note: in expansion of macro ‘elog’
>  elog(ERROR, "unexpected table_tuple_lock status: %u", res);
>  ^~~~
> /home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c:419:4:
> note: here
> default:
> ^~~
>
> --
> Regrads,
> Japin Li.
> ChengDu WenWu Information Technology Co.,Ltd.
>

I also get this warning. Adding a "break;" at line 418 resolves the warning.


Re: Release SPI plans for referential integrity with DISCARD ALL

2021-01-13 Thread Corey Huinker
>
> In addition to that, a following case would be solved with this approach:
> When many processes are referencing many tables that have foreign key
> constraints defined, a huge amount of memory will be consumed
> regardless of whether referenced tables are partitioned or not.
>
> Attached the patch.  Any thoughts?
>

Amit Langote has done some great work at eliminating SPI from INSERT/UPDATE
triggers entirely, thus reducing the number of cached plans considerably.

I think he was hoping to have a patch formalized this week, if time allowed.

It doesn't have DELETE triggers in it, so this patch might still have good
value for deletes on a commonly used enumeration table.

However, our efforts might be better focused on eliminating SPI from delete
triggers as well, an admittedly harder task.


Re: Huge memory consumption on partitioned table with FKs

2020-11-30 Thread Corey Huinker
On Mon, Nov 30, 2020 at 9:48 PM Tom Lane  wrote:

> Corey Huinker  writes:
> > Given that we're already looking at these checks, I was wondering if this
> > might be the time to consider implementing these checks by directly
> > scanning the constraint index.
>
> Yeah, maybe.  Certainly ri_triggers is putting a huge amount of effort
> into working around the SPI/parser/planner layer, to not a lot of gain.
>
> However, it's not clear to me that that line of thought will work well
> for the statement-level-trigger approach.  In that case you might be
> dealing with enough tuples to make a different plan advisable.
>
> regards, tom lane
>

Bypassing SPI would probably mean that we stay with row level triggers, and
the cached query plan would go away, perhaps replaced by an
already-looked-up-this-tuple hash sorta like what the cached nested loops
effort is doing.

I've been meaning to give this a try when I got some spare time. This may
inspire me to try again.


Re: Huge memory consumption on partitioned table with FKs

2020-11-30 Thread Corey Huinker
>
> I think this can be solved easily in the patch, by having
> ri_BuildQueryKey() compare the parent's fk_attnums to the parent; if
> they are equal then use the parent's constaint_id, otherwise use the
> child constraint.  That way, the cache entry is reused in the common
> case where they are identical.
>

Somewhat of a detour, but in reviewing the patch for Statement-Level RI
checks, Andres and I observed that SPI made for a large portion of the RI
overhead.

Given that we're already looking at these checks, I was wondering if this
might be the time to consider implementing these checks by directly
scanning the constraint index.


Re: Feature proposal for psql

2020-09-19 Thread Corey Huinker
>> One limitation of this approach is that \set can't span lines, so
>> writing complex queries would be kinda painful.  But that would
>> be a good limitation to address separately; \set isn't the only
>> metacommand where can't-span-lines is a problem sometimes.
>> If you seriously want to pursue adding a feature like this,
>> probably the -hackers list is a more appropriate discussion
>> forum than -novice.
>>
>> regards, tom lane
>>
>
>  The ability to save and retrieve multi-line queries would be quite nice
> though, often I would like to save a query too large to type.
>
> I think I don't know psql well enough to propose a viable syntax, so I
> guess that would be up to experts here...
> But I would be pretty happy to implement it.
>
> Regards
> Denis
>
>
Well, if you want to do it right now, you can do this:

db=> select * from foo;
 x  | y
----+----
  1 |  1
  2 |  2
  3 |  3
  4 |  4
  5 |  5
  6 |  6
  7 |  7
  8 |  8
  9 |  9
 10 | 10
(10 rows)
db=> select * from foo where x = :xval \w query1.sql
db=> \set xval 4
db=> \i query1.sql
 x | y
---+---
 4 | 4
(1 row)


Granted, that involves adding files to the filesystem, setting variables
rather than passing parameters, remembering what those variables were, and
having the discipline to not have overlapping uses for variable names
across multiple files.

So the key shortcomings right now seem to be:
* no way to pass in values to an \i or \ir and no way to locally scope them
* one file per query

Setting variables locally in a \ir would need to somehow push and pop
existing variable values because those vars are scoped at the session
level, and that might confuse the user when they set the var inside the
included file expecting the calling session to keep the value.
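A quick sketch of that clobbering risk, assuming query1.sql itself happens
to contain a \set:

db=> \set xval 4
db=> \i query1.sql      -- suppose the file contains: \set xval 10
db=> \echo :xval
10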

Perhaps we could add a notion of a "bag of tricks" dir in each user's home
directory, and a slash command \wbag (better name suggestions welcome) that
behaves like \w but assumes the file will go in ~/.psql-bag-of-tricks/  and
\ibag which includes a file from the same dir.


Re: use pg_get_functiondef() in pg_dump

2020-08-17 Thread Corey Huinker
>
> I'm sure there's a lot of folks who'd like to see more of the logic we
> have in pg_dump for building objects from the catalog available to more
> tools through libpgcommon- psql being one of the absolute first
> use-cases for exactly that (there's certainly no shortage of people
> who've asked how they can get a CREATE TABLE statement for a table by
> using psql...).
>

I count myself among those folks (see
https://www.postgresql.org/message-id/CADkLM%3DfxfsrHASKk_bY_A4uomJ1Te5MfGgD_rwwQfV8wP68ewg%40mail.gmail.com
for
discussion of doing DESCRIBE and SHOW CREATE-ish functions either on server
side or client side).

I'm all for having this as "just" a set of pg_get_*def functions, because
they allow the results to be used in queries. Granted, the shape of the
result set may not be stable, but that's the sort of thing we can warn about
the same way we warn about changes to pg_stat_activity. At that point any
DESCRIBE/SHOW CREATE server-side functions essentially become just shells
around the pg_get_*def() functions, with no particular requirement to make
those new commands work inside a SELECT.

Would it be totally out of left field to have the functions have an
optional "version" parameter, defaulted to null, that would be used to give
backwards compatible results if and when we do make a breaking change?
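Purely hypothetically (no such parameter exists today), that might look
like:

-- current behavior
SELECT pg_get_functiondef('myfunc'::regproc);
-- hypothetical: ask for the result shape as of version 13
SELECT pg_get_functiondef('myfunc'::regproc, 130000);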


Re: PG 13 release notes, first draft

2020-05-05 Thread Corey Huinker
>
>
> >
> > Please do list Jürgen, Corey and Roger as authors of the glossary.
>
> (Actually I should be listed as well, as the time I spent on it was
> considerable.)
>

+1, the time spent was quite considerable


Re: Proposing WITH ITERATIVE

2020-04-29 Thread Corey Huinker
>
>
> > Perhaps something like this would be more readable
> >
> > WITH t AS (
> >UPDATE ( SELECT 1 AS ctr, 'x' as val )
> >SET ctr = ctr + 1, val = val || 'x'
> >WHILE ctr <= 100
> >RETURNING ctr, val
> > )
> >
> > The notion of an UPDATE on an ephemeral subquery isn't that special, see
> > "subquery2" in
> >
> https://docs.oracle.com/cd/B19306_01/appdev.102/b14261/update_statement.htm
> ,
>
> I must admit that I do not like much needing another level of subquery,
> but maybe it could just be another named query in the WITH statement.
>

So like this:
WITH initial_conditions as (SELECT 1 as ctr, 'x' as val)
UPDATE initial_conditions
SET ctr = ctr + 1, val = val || 'x'
WHILE ctr <= 100
RETURNING ctr, val


> ISTM that UPDATE is quite restrictive as the number of rows cannot
> change, which does not seem desirable at all? How could I add or remove
> rows from one iteration to the next?
>

My understanding was that maintaining a fixed number of rows was a desired
feature.


> ISTM that the WHILE would be checked before updating, so that WHILE FALSE
> does nothing, in which case its position after SET is odd.
>

True, but having the SELECT before the FROM is equally odd.


> Having both WHERE and WHILE might look awkward.
>

Maybe an UNTIL instead of WHILE?
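That is, a hypothetical respelling of the sketch above:

WITH t AS (
   UPDATE ( SELECT 1 AS ctr, 'x' AS val )
   SET ctr = ctr + 1, val = val || 'x'
   UNTIL ctr > 100
   RETURNING ctr, val
)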


>
> Also it looks much more procedural this way, which is the point, but also
> depart from the declarative SELECT approach of WITH RECURSIVE.
>

Yeah, just throwing it out as a possibility. Looking again at what I
suggested, it looks a bit like the Oracle "CONNECT BY level <= x" idiom.

I suspect that the SQL standards body already has some preliminary work
done, and we should ultimately follow that.


Re: Add A Glossary

2020-04-29 Thread Corey Huinker
On Wed, Apr 29, 2020 at 3:15 PM Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> Why are all the glossary terms capitalized?  Seems kind of strange.
>
>
They weren't intended to be, and they don't appear to be in the page I'm
looking at. Are you referring to the anchor like in
https://www.postgresql.org/docs/devel/glossary.html#GLOSSARY-RELATION ? If
so, that all-capping is part of the rendering, as the ids were all named in
all-lower-case.


Re: Proposing WITH ITERATIVE

2020-04-29 Thread Corey Huinker
On Wed, Apr 29, 2020 at 10:34 AM Jonah H. Harris 
wrote:

> On Wed, Apr 29, 2020 at 7:22 AM Peter Eisentraut <
> peter.eisentr...@2ndquadrant.com> wrote:
>
>> Yeah the RECURSIVE vs ITERATIVE is a bit of a red herring here.  As you
>> say, the RECURSIVE keyword doesn't specify the processing but marks the
>> fact that the specification of the query is recursive.
>>
>
> Agreed. I started thinking through Fabien's response last night.
>
> I think a syntax that would fit better within the existing framework
>> would be something like
>>
>> WITH RECURSIVE t AS (
>>  SELECT base case
>>REPLACE ALL  -- instead of UNION ALL
>>  SELECT recursive case
>> )
>>
>
> I was originally thinking more along the lines of Fabien's approach, but
> this is similarly interesting.
>

Obviously I'm very concerned about doing something that the SQL Standard
will clobber somewhere down the road. Having said that, the recursive
syntax always struck me as awkward even by SQL standards.

Perhaps something like this would be more readable

WITH t AS (
UPDATE ( SELECT 1 AS ctr, 'x' as val )
SET ctr = ctr + 1, val = val || 'x'
WHILE ctr <= 100
RETURNING ctr, val
)

The notion of an UPDATE on an ephemeral subquery isn't that special, see
"subquery2" in
https://docs.oracle.com/cd/B19306_01/appdev.102/b14261/update_statement.htm ,
so the only syntax here without precedent is dropping a WHILE into an
UPDATE statement.


Re: More efficient RI checks - take 2

2020-04-22 Thread Corey Huinker
On Wed, Apr 22, 2020 at 2:36 PM Andres Freund  wrote:

> Hi,
>
> On 2020-04-22 13:46:22 -0400, Robert Haas wrote:
> > On Wed, Apr 22, 2020 at 1:18 PM Alvaro Herrera 
> wrote:
> > > Well, I was actually thinking in building ready-made execution trees,
> > > bypassing the planner altogether.  But apparently no one thinks that
> > > this is a good idea, and we don't have any code that does that already,
> > > so maybe it's not a great idea.
>
> I was commenting on what I understood Corey to say, but was fairly
> unclear about it. But I'm also far from sure that I understood Corey
> correctly...
>

I was unclear because, even after my failed foray into statement level
triggers for RI checks, I'm still pretty inexperienced in this area.

I'm just happy that it's being discussed.


Re: More efficient RI checks - take 2

2020-04-20 Thread Corey Huinker
>
> I can imagine removal of the SPI from the current implementation (and
> constructing the plans "manually"), but note that the queries I use in my
> patch are no longer that trivial. So the SPI makes sense to me because it
> ensures regular query planning.
>

As an intermediate step, in the case where we have one row, it should be
simple enough to extract that row manually, and do an SPI call with fixed
values rather than the join to the ephemeral table, yes?
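That is, a sketch of the idea (not the patch's actual queries):

-- with multiple queued events, the check joins the transient tuple set:
SELECT 1 FROM t WHERE y IN (SELECT x FROM transient_rows) FOR KEY SHARE;
-- for a single affected row, the value could be inlined instead:
SELECT 1 FROM t WHERE y = $1 FOR KEY SHARE;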


> As for the tuplestore, I'm not sure the startup cost is a problem: if
> you're
> concerned about the 1-row case, the row should usually be stored in memory.
>



> > and once that is done, we could see about step #2.
>
> As I said during my review of your patch last year, I think the RI
> semantics
> has too much in common with that of triggers. I'd need more info to imagine
> such a change.
>

As a general outline, I think that DML would iterate over the 2 sets of
potentially relevant RI definitions rather than iterating over the
triggers.

The similarities between RI and general triggers are obvious, which
explains why they went that route initially, but they're also a crutch: all
RI operations boil down to either an iteration over a tuplestore to do
lookups in an index (when checking for referenced rows), or a hash join of
the transient data against the un-indexed table (when checking for
referencing rows), and people who know this stuff far better than me seem to
think that SPI overhead is best avoided when possible. I'm looking forward
to having more time to spend on this.


Re: Poll: are people okay with function/operator table redesign?

2020-04-13 Thread Corey Huinker
>
> Yeah, back at the beginning of this exercise, Alvaro wondered aloud
> if we should go to something other than tables altogether.  I dunno
> what that'd look like though.
>

It would probably look like our acronyms and glossary pages.

Maybe the return example and return values get replaced with a
programlisting?


Re: Poll: are people okay with function/operator table redesign?

2020-04-13 Thread Corey Huinker
>
> Thinking out loud, it'd also be great if we could add in some anchors as
> well, so perhaps in the future on the pgweb side we could add in some
> discoverable links that other documentation has -- which in turn people
> could click / link to others directly to the function name.
>

+1


Re: Add A Glossary

2020-04-11 Thread Corey Huinker
>
>
> Term 'relation': A sequence is internally a table with one row - right?
> Shall we extend the list of concrete relations by 'sequence'? Or is this
> not necessary because 'table' is already there?
>

I wrote one for sequence; it was a bit math-y for Alvaro's taste, so we're
going to try again.


Re: More efficient RI checks - take 2

2020-04-08 Thread Corey Huinker
On Wed, Apr 8, 2020 at 1:06 PM Pavel Stehule 
wrote:

>
>
> st 8. 4. 2020 v 18:36 odesílatel Antonin Houska  napsal:
>
>> After having reviewed [1] more than a year ago (the problem I found was
>> that
>> the transient table is not available for deferred constraints), I've
>> tried to
>> implement the same in an alternative way. The RI triggers still work as
>> row
>> level triggers, but if multiple events of the same kind appear in the
>> queue,
>> they are all passed to the trigger function at once. Thus the check query
>> does
>> not have to be executed that frequently.
>>
>
I'm excited that you picked this up!


>
>> Some performance comparisons are below. (Besides the execution time,
>> please
>> note the difference in the number of trigger function executions.) In
>> general,
>> the checks are significantly faster if there are many rows to process,
>> and a
>> bit slower when we only need to check a single row. However I'm not sure
>> about
>> the accuracy if only a single row is measured (if a single row check is
>> performed several times, the execution time appears to fluctuate).
>>
>
These numbers are very promising, and much more in line with my initial
expectations. Obviously the impact on single-row DML is of major concern,
though.

It is hard task to choose good strategy for immediate constraints, but for
> deferred constraints you know how much rows should be checked, and then you
> can choose better strategy.
>

> Is possible to use estimation for choosing method of RI checks?
>

In doing my initial attempt, the feedback I was getting was that the people
who truly understood the RI checks fell into the following groups:
1. people who wanted to remove the SPI calls from the triggers
2. people who wanted to completely refactor RI to not use triggers
3. people who wanted to completely refactor triggers

While #3 is clearly beyond the scope for an endeavor like this, #1 seems
like it would nearly eliminate the 1-row penalty (we'd still have the
TupleStore init penalty, but it would just be a handy queue structure, and
maybe that cost would be offset by removing the SPI overhead), and once
that is done, we could see about step #2.


Re: Add A Glossary

2020-04-04 Thread Corey Huinker
On Sat, Apr 4, 2020 at 2:55 AM Fabien COELHO  wrote:

>
> > BTW it's now visible at:
> > https://www.postgresql.org/docs/devel/glossary.html


Nice. I went looking for it yesterday and the docs hadn't rebuilt yet.


> ISTM that occurrences of these words elsewhere in the documentation should
> link to the glossary definitions?
>

Yes, that's a big project. I was considering writing a script to compile
all the terms as search terms, paired with their glossary ids, and then
invoke git grep to identify all pages that have term FOO but don't have
glossary-foo. We would then go about gloss-linking those pages as
appropriate, but only a few pages at a time to keep scope sane. Also, I'm
unclear about the circumstances under which we should _not_ tag a term. I
remember hearing that we should only tag it on the first usage, but is that
per section or per page?


> As the definitions are short and to the point, maybe the HTML display
> could (also) "hover" the definitions when the mouse passes over the word,
> using the "title" attribute?
>

I like that idea, if it doesn't conflict with accessibility standards
(maybe that's just titles on images, not sure).
I suspect we would want to just carry over the first sentence or so with a
... to avoid cluttering the screen with my overblown definition of a
sequence.
I suggest we pursue this idea in another thread, as we'd probably want to
do it for acronyms as well.


>
> "ACID" does not appear as an entry, nor in the acronyms sections. Also no
> DCL, although DML & DDL are in acronyms.
>

It needs to be in the acronyms page, and in light of all the docbook
wizardry that I've learned from Alvaro, those should probably get their own
acronym-foo ids as well. The cutoff date for 13 fast approaches, so it
might be for 14+ unless doc-only patches are treated differently.


> Entries could link to relevant wikipedia pages, like the acronyms section
> does?
>

They could. I opted not to do that because each external link invites
debate about how authoritative that link is, which is easier to do with
acronyms. Now that the glossary is a reality, it's easier to have those
discussions.


Re: Add A Glossary

2020-04-03 Thread Corey Huinker
>
> we have it, we can start thinking of patching the main part of the docs
> to make reference to it by using  in key spots.  Right now
> the glossary links to itself, but it makes lots of sense to have other
> places point to it.
>

I have some ideas about how to patch the main docs, but will leave those to
a separate thread.


> * I commented out the definition of "sequence", which seemed to go into
>   excessive detail.  Let's have a more concise definition?
>

That one's my fault.


>
> Patches for these omissions, and other contributions, welcome.
>

Thanks for all your work on this!


Re: Add A Glossary

2020-04-02 Thread Corey Huinker
On Thu, Apr 2, 2020 at 8:44 AM Jürgen Purtz  wrote:

> +1 and many thanks to Alvaro's edits.
>
>
I did some of the grunt work Alvaro alluded to in v6, and the results are
attached and they build, which means there are no invalid links.

Notes:
* no definition wordings were changed
* added a linkend to all remaining glossterms that do not immediately
follow a glossentry
* renamed id glossary-temporary-tables to glossary-temporary-table
* temporarily re-added an id for glossary-row as we have many references to
that. unsure if we should use the term Tuple in all those places or say Row
while linking to glossary-tuple, or something else
* temporarily re-added an id for glossary-segment, glossary-wal-segment,
glossary-analytic-function, as those were also referenced and will need
similar decisions made
* added a stub entry for glossary-unique-index, unsure if it should have a
definition of its own, or whether we split it into unique and index.
* I noticed several cases where a glossterm is used twice in a definition,
but didn't de-term them.
* I'm curious about how we should tag a term when using it in its own
definition. Same as anywhere else?
From 4603ce04306e77f5508bb207b42e5dec1425e7c5 Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Thu, 2 Apr 2020 15:32:43 -0400
Subject: [PATCH] glossary v7

---
 doc/src/sgml/filelist.sgml |1 +
 doc/src/sgml/glossary.sgml | 1589 
 doc/src/sgml/postgres.sgml |1 +
 3 files changed, 1591 insertions(+)
 create mode 100644 doc/src/sgml/glossary.sgml

diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 1043d0f7ab..cf21ef857e 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -170,6 +170,7 @@
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
new file mode 100644
index 00..edfcf9d725
--- /dev/null
+++ b/doc/src/sgml/glossary.sgml
@@ -0,0 +1,1589 @@
+
+ Glossary
+ 
+  This is a list of terms and their meaning in the context of
+  PostgreSQL and relational database
+  systems in general.
+ 
+
+ 
+  
+   Aggregate Function
+   
+
+ A function that
+ combines (aggregates) multiple input values,
+ for example by counting, averaging or adding,
+ yielding a single output value.
+
+
+ For more information, see
+ .
+
+
+   
+  
+
+  
+   Analyze (operation)
+   
+
+ The process of collecting statistics from data in
+ tables
+ and other relations
+ to help the query planner
+ to make decisions about how to execute
+ queries.
+
+   
+  
+
+  
+   Analytic Function
+   
+  
+
+  
+   Atomic
+   
+
+ In reference to a datum:
+ the fact that its value that cannot be broken down into smaller
+ components.
+
+   
+   
+
+ In reference to a
+ database transaction:
+ see atomicity.
+
+   
+  
+
+  
+   Atomicity
+   
+
+ The property of a transaction
+ that either all its operations complete as a single unit or none do.
+ This is one of the ACID properties.
+
+   
+  
+
+  
+   Attribute
+   
+
+ An element with a certain name and data type found within a
+ tuple or
+ table.
+
+   
+  
+
+  
+   Autovacuum
+   
+
+ Background processes that routinely perform
+ Vacuum and Analyze
+ operations.
+
+
+ For more information, see
+ .
+
+   
+  
+
+  
+   Backend (process)
+   
+
+ Processes of an Instance which act on behalf of
+ client Connections and handle their requests.
+
+
+ (Don't confuse this term with the similar terms
+ Background Worker or
+ Background Writer).
+
+   
+  
+
+  
+   Background Worker (process)
+   
+
+ Individual processes within an Instance, which
+ run system- or user-supplied code.  A typical use case is a process
+ which handles parts of an SQL query to take
+ advantage of parallel execution on servers with multiple
+ CPUs.
+   
+   
+For more information, see
+.
+   
+   
+  
+
+  
+   Background Writer (process)
+   
+
+ A process that continuously writes dirty pages from
+ Shared Memory to the file system.
+ It wakes up periodically, but
+ works only for a short period in order to distribute its expensive
+ I/O activity over time, instead of generating fewer
+ larger I/O peaks which could block other processes.
+
+
+ For more information, see
+ .
+
+   
+  
+
+  
+   Cast
+   
+
+ A conversion of a Datum from its current data
+ type to another data type.
+
+   
+  
+
+  
+   Catalog
+   
+
+ The SQL standard uses this term to
+ indicate what is called a Database in
+ PostgreSQL's terminology.
+
+
+ This should not be confused with the
+ System Catalog.
+
+
+ For more information, see
+ .
+
+   
+  
+
+  
+   Check Constraint
+   
+
+ A type of Constraint defined on a

Re: Add A Glossary

2020-04-01 Thread Corey Huinker
>
> I propose we define "planner" and make "optimizer" a <glosssee> entry.
>

I have no objection to more entries, or edits to entries, but am concerned
that the process leads to someone having to manually merge several
start-from-scratch patches, with no clear sense of when we'll be done. It
may make sense to appoint an edit-collector.


> I further propose not to define the term "normalized", at least not for
> now.  That seems a very deep rabbit hole.
>

+1 I think we appointed a guy named Xeno to work on that definition. He
says he's getting close...


Re: Add A Glossary

2020-04-01 Thread Corey Huinker
>
> 2. I found out that "see xyz" and "see also" have bespoke markup in
> Docbook -- <glosssee> and <glossseealso>.  I changed some glossentries
> to use those, removing some glossdefs and changing a couple of paras to
> glossseealsos.  I also removed all "id" properties from glossentries
> that are just <glosssee>, because I think it's a mistake to have
> references to entries that will make the reader look up a different
> term; for me as a reader that's annoying, and I don't like to annoy
> people.
>

+1 These structural enhancements are great. I'm fine with removing the id
from just-glossee, and glad that we're keeping the entry to aid discovery.


> I rewrote the definition for "atomic" once again.  Made it two
> glossdefs, because I can.  If you don't like this, I can undo.
>

+1 Splitting this into two definitions, one for each context, is the most
sensible thing and I don't know why I didn't do that in the first place.


Re: Add A Glossary

2020-03-31 Thread Corey Huinker
On Tue, Mar 31, 2020 at 2:09 PM Justin Pryzby  wrote:

> On Sun, Oct 13, 2019 at 04:52:05PM -0400, Corey Huinker wrote:
> > 1. It's obviously incomplete. There are more terms, a lot more, to add.
>
> How did you come up with the initial list of terms ?
>

1. I asked some newer database people to come up with a list of terms that
they used.
2. I then added some more terms that seemed obvious given that first list.
3. That combined list was long on general database concepts and theory, and
short on administration concepts
4. Then Jürgen suggested that we integrate his working list of terms, very
much focused on internals, so I did that.
5. Everything after that was applying suggested edits and new terms.


> Here's some ideas; I'm *not* suggesting to include all of everything, but
> hopefully start with a coherent, self-contained list.
>

I don't think this list will ever be complete. It will always be a work in
progress. I'd prefer to get the general structure of a glossary committed
in the short term, and we're free to follow up with edits that focus on the
wording.


>
> grep -roh '[^<]*' doc/src/ |sed 's/.*/\L&/' |sort |uniq -c
> |sort -nr |less
>
> Maybe also:
> object identifier
> operator classes
> operator family
> visibility map
>

Just so I can prioritize my work, which of these things, along with your
suggestions in previous emails, would you say is a barrier to considering
this ready for a committer?


Re: Add A Glossary

2020-03-30 Thread Corey Huinker
On Sun, Mar 29, 2020 at 5:29 AM Jürgen Purtz  wrote:

> On 27.03.20 21:12, Justin Pryzby wrote:
> > On Fri, Mar 20, 2020 at 11:32:25PM +0100, Jürgen Purtz wrote:
>  +Archiver
> >>> Can you change that to archiver process ?
> >> I prefer the short term without the addition of 'process' - concerning
>> 'Archiver' as well as the other cases. But I'm not a native English
> >> speaker.
> > I didn't like it due to lack of context.
> >
> > What about "wal archiver" ?
> >
> > It occured to me when I read this.
> >
> https://www.postgresql.org/message-id/20200327.163007.128069746774242774.horikyota.ntt%40gmail.com
> >
> "WAL archiver" is ok for me. In the current documentation we have 2
> places with "WAL archiver" and 4 with "archiver"-only
> (high-availability.sgml, monitoring.sgml).
>
> "backend process" is an exception to the other terms because the
> standalone term "backend" is sensibly used in diverse situations.
>
> Kind regards, Jürgen
>

I've taken Alvaro's fixes and done my best to incorporate the feedback
into a new patch, which Roger (tech writer) reviewed yesterday.

The changes are too numerous to list, but the highlights are:

New definitions:
* All four ACID terms
* Vacuum (split off from Autovacuum)
* Tablespace
* WAL Archiver (replaces Archiver)

Changes to existing terms:
* Implemented most wording changes recommended by Justin
* all remaining links were either made into xrefs or edited out of existence

* de-tagged most second uses of a term within a definition


Did not do:
* Address the "... Process" suffix suggested by Justin. There isn't
consensus on these changes, and I'm neutral on the matter
* Change the Cast definition. I think it's important to express that a cast
has a FROM datatype as well as a TO
* Anything host/server related, as I couldn't see a consensus reached

Other thoughts:
* Trivial definitions that are just see-other-definition are ok with me, as
the goal of this glossary is to aid in discovery of term meanings, so
knowing that two terms are interchangeable is itself helpful


It is my hope that this revision represents the final _structural_ change
to the glossary. New definitions and edits to existing definitions will, of
course, go on forever.
From 8a163603102f51a3eddfb05c51baf3b840c5d7f7 Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Mon, 30 Mar 2020 13:08:27 -0400
Subject: [PATCH] glossary v4

---
 doc/src/sgml/filelist.sgml |1 +
 doc/src/sgml/glossary.sgml | 1551 
 doc/src/sgml/postgres.sgml |1 +
 3 files changed, 1553 insertions(+)
 create mode 100644 doc/src/sgml/glossary.sgml

diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 1043d0f7ab..cf21ef857e 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -170,6 +170,7 @@
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
new file mode 100644
index 00..eab14f3c9b
--- /dev/null
+++ b/doc/src/sgml/glossary.sgml
@@ -0,0 +1,1551 @@
+
+ Glossary
+ 
+  This is a list of terms and their meaning in the context of
+  PostgreSQL and relational database
+  systems in general.
+ 
+  
+   
+Aggregating
+
+ 
+  The act of combining a collection of data (input) values into
+  a single output value, which may not be of the same type as the
+  input values.
+ 
+
+   
+
+   
+Aggregate Function
+
+ 
+  A Function that combines multiple input values,
+  for example by counting, averaging or adding them all together,
+  yielding a single output value.
+ 
+ 
+  For more information, see
+  .
+ 
+ 
+  See also Window Function.
+ 
+
+   
+
+   
+Analytic
+
+ 
+  A Function whose computed value can reference
+  values found in nearby Rows of the same
+  Result Set.
+ 
+ 
+  For more information, see
+  .
+ 
+
+   
+
+   
+Atomic
+
+ 
+  In reference to the value of an Attribute or
+  Datum: an item that cannot be broken down
+  into smaller components.
+ 
+ 
+  In reference to an operation: an event that cannot be completed in
+  part; it must either entirely succeed or entirely fail. For
+  example, a series of SQL statements can be
+  combined into a Transaction, and that
+  transaction is said to be atomic.
+  Atomic.
+ 
+
+   
+
+   
+Atomicity
+
+ 
+  One of the ACID properties. This is the state of 
+  being Atomic in the operational/transactional sense.
+ 
+
+   
+
+   
+Attribute
+
+ 
+  An element with a certain name and data type found within a
+  Tuple or Table.
+ 
+
+   
+
+   
+Autovacuum
+
+ 
+  Background Worker processes that routinely
+  perform Vacuum operations.
+ 
+ 
+  For more information, see
+  .
+ 
+
+   
+
+   
+Backend Process
+
+ 
+  Processes of an 

Re: Add A Glossary

2020-03-24 Thread Corey Huinker
>
>
> > > +  Records to the file system and creates a special
> >
> > Does the chckpointer actually write WAL ?
>
> Yes.
>
> > An FK doesn't require the values in its table to be unique, right ?
>
> I believe it does require that the values are unique.
>
> > I think there's some confusion.  Constraints are not objects, right ?
>
> I think constraints are definitely objects. They have names and you
> can, for example, COMMENT on them.
>
> > Do we use shared_buffers for WAL ?
>
> No.
>
> (I have not reviewed the patch; these are just a few comments on your
> comments.)
>
>
I'm going to be coalescing the feedback into an updated patch very soon
(tonight/tomorrow), so please keep the feedback on the text/wording coming
until then.
If anyone has a first attempt at all the ACID definitions, I'd love to see
those as well.


Re: optimisation? collation "C" sorting for GroupAggregate for all deterministic collations

2020-03-22 Thread Corey Huinker
>
> Perhaps this is what you mean by "deterministic", but isn't it
> possible for some collations to treat multiple byte sequences as equal
> values? And those multiple byte sequences wouldn't necessarily occur
> sequentially in C collation, so it wouldn't be possible to work around
> that by having the grouping node use one collation but the sorting
> node use the C one.
>
> If my memory is incorrect, then this sounds like an intriguing idea.
>
>
I could see the value in a hash aggregate in the C collation that then
passes its result up as a partial aggregate to another step, which applies
the real collation and finalizes the aggregation before sorting.
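The equivalence this leans on: with a deterministic collation, strings
compare equal only when they are byte-equal, so grouping under "C" yields
the same groups. A sketch, assuming a hypothetical words(val text) table:

-- same groups as GROUP BY val, but the grouping/sorting step can use
-- cheap byte-wise comparisons; only the final output order still needs
-- the real collation
SELECT val COLLATE "C" AS val, count(*)
FROM words
GROUP BY val COLLATE "C";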


Re: Add A Glossary

2020-03-20 Thread Corey Huinker
On Fri, Mar 20, 2020 at 6:32 PM Jürgen Purtz  wrote:

> man pages: Sorry, if I confused someone with my poor English. I just
> want to express in my 'offline' mail that we don't have to worry about
> man page generation. The patch doesn't affect files in the /ref
> subdirectory from where man pages are created.
>

It wasn't your poor English - everyone else understood what you meant. I
had wondered if our docs went into man page format as well, so my research
was still time well spent.


Re: Add A Glossary

2020-03-20 Thread Corey Huinker
>
> It's hard to review work from a professional tech writer.  I'm under the
> constant impression that I'm ruining somebody's perfect end product,
> making a fool of myself.


If it makes you feel better, it's a mix of definitions I wrote that Roger
proofed and restructured, ones that Jürgen had written for a separate
effort which then got a Roger-pass, and then some edits of my own and some
by Jürgen which I merged without consulting Roger.


Re: Add A Glossary

2020-03-20 Thread Corey Huinker
>
> Jürgen mentioned off-list that the man page doesn't build. I was going to
>> look into that, but if anyone has more familiarity with that, I'm listening.
>>
>
Looking at this some more, I'm not sure anything needs to be done for man
pages. man1 is for executables, man3 seems to be dblink and SPI, and man7
is all SQL commands. This isn't any of those. The only possible thing left
would be how to render the text of a <acronym>foo</acronym>:
sgml/postgres.sgml:  &acronyms;
sgml/release.sgml:[A-Z][A-Z_ ]+[A-Z_] , ,
, 
sgml/stylesheet.css:acronym { font-style: inherit; }

filelist.sgml, postgres.sgml, and stylesheet.css already have the
corresponding change, and the release.sgml hit is just an incidental mention
of acronym.

Of course I could be missing something.

>


Re: Add A Glossary

2020-03-19 Thread Corey Huinker
On Thu, Mar 19, 2020 at 8:11 PM Alvaro Herrera 
wrote:

> I gave this a look.  I first reformatted it so I could read it; that's
> 0001.  Second I changed all the long  items into s, which
>

Thanks! I didn't know about xrefs, that is a big improvement.


> are shorter and don't have to repeat the title of the refered to page.
> (Of course, this changes the link to be in the same style as every other
> link in our documentation; some people don't like it. But it's our
> style.)
>
> There are some mistakes.  "Tupple" is most glaring one -- not just the
> typo but also the fact that it goes to sql-revoke.  A few definitions
> we'll want to modify.  Nothing too big.  In general I like this work and
> I think we should have it in pg13.
>
> Please bikeshed the definition of your favorite term, and suggest what
> other terms to add.  No pointing out of mere typos yet, please.
>

Jürgen mentioned off-list that the man page doesn't build. I was going to
look into that, but if anyone has more familiarity with that, I'm listening.


> I think we should have the terms Consistency, Isolation, Durability.
>

+1


Re: Add A Glossary

2020-03-18 Thread Corey Huinker
On Fri, Mar 13, 2020 at 12:18 AM Jürgen Purtz  wrote:

>
> The statement that names of schema objects are unique isn't *strictly* true,
> just *mostly* true. Take the case of unique constraints.
>
> Concerning CONSTRAINTS you are right. Constraints seem to be an exception:
>
>- Their names belong to a schema, but are not necessarily unique
>within that context:
>https://www.postgresql.org/docs/current/catalog-pg-constraint.html.
>- There is a UNIQUE index within the system catalog pg_constraints:
> "pg_constraint_conrelid_contypid_conname_index"
>UNIQUE, btree (conrelid, contypid, conname), which expresses that
>names are unique within the context of a table/constraint-type.
>Nevertheless, tests have shown that some stronger restrictions exist
>across table borders (which seem to be implemented in the CREATE
>statements - or as a consequence of the correlation you mentioned
>between constraint and index?).
>
> I hope that there are no more such exceptions to the global rule 'object
> names in a schema are unique':
> https://www.postgresql.org/docs/current/sql-createschema.html
>
> These facts must be mentioned as a short note in the glossary and in more
> detail in the later patch about the architecture.
>
>
I did what I could to address the near uniqueness, as well as incorporate
your earlier edits into this new, squashed patch attached.
From dbce6922194eb4ad8de57e81e182b9a6eebf859e Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Tue, 10 Mar 2020 11:26:29 -0400
Subject: [PATCH] add glossary page with revisions

---
 doc/src/sgml/filelist.sgml |1 +
 doc/src/sgml/glossary.sgml | 1072 
 doc/src/sgml/postgres.sgml |1 +
 3 files changed, 1074 insertions(+)
 create mode 100644 doc/src/sgml/glossary.sgml

diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 3da2365ea9..504c8a6326 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -170,6 +170,7 @@
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
new file mode 100644
index 00..d28bfb6fcf
--- /dev/null
+++ b/doc/src/sgml/glossary.sgml
@@ -0,0 +1,1072 @@
+
+ Glossary
+ 
+  This is a list of terms and their meaning in the context of PostgreSQL and Databases in general.
+ 
+  
+   
+Aggregate
+
+ 
+  To combine a collection of data values into a single value, whose value may not be of the same type as the original values. Aggregate Functions combine multiple Rows that share a common set of values into one Row, which means that the only data visible is the values in common and the aggregates of the non-common data.
+ 
+ 
+  For more information, see Aggregate Functions.
+ 
+
+   
+
+   
+Analytic
+
+ 
+  A Function whose computed value can reference values found in nearby Rows of the same Result Set.
+ 
+ 
+  For more information, see Window Functions.
+ 
+
+   
+
+   
+Archiver
+
+ 
+  A process that backs up WAL Files in order to reclaim space on the file system.
+ 
+ 
+  For more information, see Backup and Restore: Continuous Archiving and Point-in-Time Recovery (PITR).
+ 
+
+   
+
+   
+Atomic
+
+ 
+  In reference to the value of an Attribute or Datum: cannot be broken down into smaller components.
+ 
+ 
+  In reference to an operation: An event that cannot be completed in part: it must either entirely succeed or entirely fail. A series of SQL statements can be combined into a Transaction, and that transaction is said to be Atomic.
+ 
+
+   
+
+   
+Attribute
+
+ 
+  An element with a certain name and data type found within a Tuple or Table.
+ 
+
+   
+
+   
+Autovacuum
+
+ 
+  Processes that remove outdated MVCC Records of the Heap and Index.
+ 
+ 
+  For more information, see Routine Database Maintenance Tasks: Routine Vacuuming.
+ 
+
+   
+
+   
+Backend Process
+
+ 
+  Processes of an Instance which act on behalf of client Connections and handle their requests.
+ 
+ 
+  (Don't confuse this term with the similar terms Background Worker or Background Writer).
+ 
+
+   
+
+   
+Backend Server
+
+ 
+  See Instance.
+ 
+
+   
+
+   
+Background Worker
+
+ 
+  Individual processes within an Instance, which run system- or user-supplied code. Typical use cases are processes which handle parts of an SQL query to take advantage of parallel execution on servers with multiple CPUs.
+
+
+ For more information, see Background Worker Processes.
+
+
+   
+
+   
+Background Writer
+
+ 
+  Continuously writes dirty pages from Shared Memory to the file system. It starts periodically, but works only for a short period in order to distribute
+expensive I/O activity over time instead of generating 

Re: Add A Glossary

2020-03-11 Thread Corey Huinker
>
>
> * Transaction - yes, all those things could be "visible" or they could be
> "side effects". It may be best to leave the over-simplified definition in
> place, and add a "For more information see <tutorial-transactions>"
>

transaction-iso would be a better linkref in this case.


Re: Add A Glossary

2020-03-11 Thread Corey Huinker
>
> It will be helpful for diff-ing to restrict the length of lines in the
> SGML files to 71 characters (as usual).


I did it that way for the following reasons:
1. It aids grep-ability.
2. The committers seem to be moving towards that for SQL strings, mostly
for reason #1.
3. I recall that the code is put through a linter as one of the final steps
before release, and I assumed that the SGML gets the same treatment.
4. Even if #3 is false, it's easy enough for me to do manually for this
one file once we've settled on the text of the definitions.

As for the changes, most things seem fine; I specifically like:
* Checkpoint - looks good
* yes, PGDATA should have been a literal
* Partition - the a/b split works for me
* Unlogged - it reads better

I'm not so sure about the rest; my responses to your ???s:
* The statement that names of schema objects are unique isn't *strictly* true,
just *mostly* true. Take the case of unique constraints. The constraint
has a name and the unique index has the same name, to the point where
adding a unique constraint using an existing index renames that index to
conform to the constraint name.
* Serializable "other way around" question - It's both. Outside the
transaction you can't see changes made inside another transaction (though
you can be blocked by them), and inside serializable you can't see any
changes made since you started. Does that make sense? Were you asking a
different question? (A two-session sketch follows at the end of this message.)
* Transaction - yes, all those things could be "visible" or they could be
"side effects". It may be best to leave the over-simplified definition in
place, and add a "For more information see <tutorial-transactions>"
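
To illustrate the serializable point, a minimal two-session sketch
(assuming a hypothetical table accts with one integer column and ten
rows at the start):

-- Session 1                              -- Session 2
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM accts;  -- 10
                                          INSERT INTO accts SELECT 11;
                                          COMMIT;
SELECT count(*) FROM accts;  -- still 10: the snapshot dates from the
COMMIT;                      -- transaction's first statement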


Re: Add A Glossary

2020-03-11 Thread Corey Huinker
On Wed, Mar 11, 2020 at 12:50 PM Jürgen Purtz  wrote:

> I made changes on top of 0001-add-glossary-page.patch which was supplied
> by C. Huinker. This affects not only terms proposed by me but also his
> original terms. If my changes are not obvious, please let me know and I
> will describe my motivation.
>
> Please note especially lines marked with question marks.
>
> It will be helpful for diff-ing to restrict the length of lines in the
> SGML files to 71 characters (as usual).
>
> J. Purtz
>

A new person replied off-list with some suggested edits, all of which
seemed pretty good. I'll incorporate them myself if that person chooses to
remain off-list.


Re: Add A Glossary

2020-03-10 Thread Corey Huinker
This latest version is an attempt at merging the work of Jürgen Purtz into
what I had posted earlier. There was relatively little overlap in the terms
we had chosen to define.

Each glossary definition now has a reference id (good idea Jürgen), the
form of which is "glossary-term". So we can link to the glossary from
outside if we so choose.

I encourage everyone to read the definitions, and suggest fixes to any
inaccuracies or awkward phrasings. Mostly, though, I'm seeking feedback on
the structure itself, and hoping to get that committed.


On Tue, Feb 11, 2020 at 11:22 PM Corey Huinker 
wrote:

> It seems like this could be a good idea, still the patch has been
>> waiting on his author for more than two weeks now, so I have marked it
>> as returned with feedback.
>>
>
> In light of feedback, I enlisted the help of an actual technical writer
> (Roger Harkavy, CCed) and we eventually found the time to take a second
> pass at this.
>
> Attached is a revised patch.
>
>
From 690473e51fc442c55c1744f69813795fce9d22dc Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Tue, 10 Mar 2020 11:26:29 -0400
Subject: [PATCH] add glossary page

---
 doc/src/sgml/filelist.sgml |1 +
 doc/src/sgml/glossary.sgml | 1008 
 doc/src/sgml/postgres.sgml |1 +
 3 files changed, 1010 insertions(+)
 create mode 100644 doc/src/sgml/glossary.sgml

diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 3da2365ea9..504c8a6326 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -170,6 +170,7 @@
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
new file mode 100644
index 00..851e9debe6
--- /dev/null
+++ b/doc/src/sgml/glossary.sgml
@@ -0,0 +1,1008 @@
+
+ Glossary
+ 
+  This is a list of terms and their meaning in the context of PostgreSQL and Databases in general.
+ 
+  
+   
+Aggregate
+
+ 
+  To combine a collection of data values into a single value, whose value may not be of the same type as the original values. Aggregate Functions combine multiple Rows that share a common set of values into one Row, which means that the only data visible is the values in common and the aggregates of the non-common data.
+ 
+ 
+  For more information, see Aggregate Functions.
+ 
+
+   
+
+   
+Analytic
+
+ 
+  A Function whose computed value can reference values found in nearby Rows of the same Result Set.
+ 
+ 
+  For more information, see Window Functions.
+ 
+
+   
+
+   
+Archiver
+
+ 
+  A process that backs up WAL Files in order to reclaim space on the file system.
+ 
+ 
+  For more information, see Backup and Restore: Continuous Archiving and Point-in-Time Recovery (PITR).
+ 
+
+   
+
+   
+Atomic
+
+ 
+  In reference to the value of an Attribute or Datum: cannot be broken up into smaller components.
+ 
+ 
+  In reference to an operation: An event that cannot be completed in part: it must either entirely succeed or entirely fail. A series of SQL statements can be combined into a Transaction, and that transaction is said to be Atomic.
+ 
+
+   
+
+   
+Attribute
+
+ 
+  A typed data element found within a Tuple or Relation or Table.
+ 
+
+   
+
+   
+Autovacuum
+
+ 
+  Processes that remove outdated MVCC Records of the Heap and Index.
+ 
+ 
+  For more information, see Routine Database Maintenance Tasks: Routine Vacuuming.
+ 
+
+   
+
+   
+Backend Process
+
+ 
+  Processes of an Instance which act on behalf of client Connections and handle their requests.
+ 
+ 
+  (Don't confuse this term with the similar terms Background Worker or Background Writer).
+ 
+
+   
+
+   
+Backend Server
+
+ 
+  See Instance.
+ 
+
+   
+
+   
+Background Worker
+
+ 
+  Individual processes within an Instance, which run system- or user-supplied code. Typical use cases are processes which handle parts of an SQL query to take advantage of parallel execution on servers with multiple CPUs.
+
+
+ For more information, see Background Worker Processes.
+
+
+   
+
+   
+Background Writer
+
+ 
+  Continuously writes dirty pages from Shared Memory to the file system. It starts periodically, but works only for a short period in order to distribute
+expensive I/O activity over time instead of generating a few large I/O peaks which could block other processes.
+ 
+ 
+  For more information, see Server Configuration: Resource Consumption.
+ 
+
+   
+
+   
+Cast
+
+ 
+  A conversion of a Datum from its current data type to another data type.
+ 
+
+   
+
+ 
+Catalog
+
+ 
+  The SQL standard uses this standalone term to indicate

Re: Resolving the python 2 -> python 3 mess

2020-02-18 Thread Corey Huinker
>
> So, as with Jesse's example, what I'm wondering is whether or not 2to3
> will fix that for you (or even flag it).  The basic difference between
> the two alternatives I suggested is whether we force people to put their
> python function through that converter before we'll even try to run it.
> Subtleties that 2to3 doesn't catch seem like non-reasons to insist on
> applying it.
>

The 2018 vintage of 2to3 didn't catch it.

It's not firsthand knowledge, but I just watched a nearby team have some
production issues where one library couldn't fetch b'http://foo.org', so I'm
guessing 2to3 still doesn't catch those things, or they stopped using it.


Re: Resolving the python 2 -> python 3 mess

2020-02-17 Thread Corey Huinker
>
> A possible gotcha in this approach is if there are any python 2/3
> incompatibilities that would not manifest as syntax errors or
> obvious runtime errors, but would allow old code to execute and
> silently do the wrong thing.  One would hope that the Python crowd
> weren't dumb enough to do that, but I don't know whether it's true.
> If there are nasty cases like that, maybe what we have to do is allow
> plpythonu/plpython2u functions to be dumped and reloaded into a
> python-3-only install, but refuse to execute them until they've
> been converted.
>

Unfortunately, I think there are cases like that. The shift to Unicode as
the default string means that some functions that used to return a `str`
now return a `bytes` (I know of this in the hashlib and base64 modules, but
probably also in URL request data and others), and to use a `bytes` in
string manipulation you have to first explicitly convert it to some string
encoding. So things like a function that wraps around a python crypto
library would be the exact places where those was-str-now-bytes functions
would be used.
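
To make that concrete, here is a hypothetical PL/Python function of the
sort described (the function name is made up, and the plpython3u language
is assumed to be installed). 2to3 leaves it essentially untouched, yet the
str/bytes shift changes its behavior:

CREATE FUNCTION b64_tag(t text) RETURNS text
LANGUAGE plpython3u AS $$
import base64
enc = base64.b64encode(t.encode('utf-8'))
# Under Python 2, b64encode() returned a str and the concatenation below
# just worked. Under Python 3 it returns bytes, so without the explicit
# decode() this raises TypeError -- at runtime, not at conversion time.
return 'b64:' + enc.decode('ascii')
$$;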


Re: Add A Glossary

2020-02-11 Thread Corey Huinker
>
> It seems like this could be a good idea, still the patch has been
> waiting on his author for more than two weeks now, so I have marked it
> as returned with feedback.
>

In light of feedback, I enlisted the help of an actual technical writer
(Roger Harkavy, CCed) and we eventually found the time to take a second
pass at this.

Attached is a revised patch.
From f087e44fe4db7996880cf4df982297018d444363 Mon Sep 17 00:00:00 2001
From: Corey Huinker 
Date: Wed, 12 Feb 2020 04:17:59 +
Subject: [PATCH] add glossary page with initial definitions

---
 doc/src/sgml/filelist.sgml |   1 +
 doc/src/sgml/glossary.sgml | 540 +
 doc/src/sgml/postgres.sgml |   1 +
 3 files changed, 542 insertions(+)
 create mode 100644 doc/src/sgml/glossary.sgml

diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 3da2365ea9..504c8a6326 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -170,6 +170,7 @@
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
new file mode 100644
index 00..1b881690fa
--- /dev/null
+++ b/doc/src/sgml/glossary.sgml
@@ -0,0 +1,540 @@
+
+
+
+ Glossary
+ 
+  This is a list of terms and their definitions in the context of PostgreSQL and databases in general.
+ 
+  
+   
+Aggregate
+
+ 
+  The act of combining a defined collection of data values into a single value that may not be the same type as the original values. Aggregate functions are most often used with Grouping operations which define the separate sets of data by the common values shared within those sets.
+ 
+
+   
+
+   
+Analytic
+
+ 
+  A function whose computed value can reference values found in nearby rows of the same result set.
+ 
+
+   
+
+   
+Atomic
+
+ 
+  When referring to the value of an attribute or datum: cannot be broken up into smaller components.
+ 
+ 
+  When referring to an operation: An event that cannot be partially completed; it must either completely succeed or completely fail. A series of SQL statements can be combined into a transaction, and that transaction is described as atomic.
+ 
+
+   
+
+   
+Attribute
+
+ 
+  A typed data element found within a tuple or relation or table.
+ 
+
+   
+
+   
+BYTEA
+
+ 
+  A data type for storing binary data. It is roughly analogous to the BLOB data type in other database products.
+ 
+
+   
+
+   
+Cast
+
+ 
+  The act of converting a datum from its current data type to another data type.
+ 
+
+   
+
+   
+Check Constraint
+
+ 
+  A type of constraint defined for a relation which restricts the values allowed in one or more attributes. The check constraint can make reference to any attribute in the relation, but cannot reference other rows of the same relation or other relations.
+ 
+
+   
+
+   
+Column
+
+ 
+  An attribute found in a table or view.
+ 
+
+   
+
+   
+Commit
+
+ 
+  The act of finalizing a transaction within the database.
+ 
+
+   
+
+   
+Concurrency
+
+ 
+  The concept that multiple independent operations can be happening within the database at the same time.
+ 
+
+   
+
+   
+Constraint
+
+ 
+  A method of restricting the values of data allowed within a relation. Constraints can currently be of the following types: Check Constraint, Unique Constraint, and Exclusion Constraint.
+ 
+
+   
+
+   
+Datum
+
+ 
+  The internal representation of a SQL datatype.
+ 
+
+   
+
+   
+Delete
+
+ 
+  A SQL command that removes rows from a given table or relation.
+ 
+
+   
+
+   
+Exclusion Constraint
+
+ 
+  Exclusion constraints define both a set of columns for matching rows, and rules where values in one row would conflict with values in another.
+ 
+
+   
+
+   
+Foreign Data Wrapper
+
+ 
+  A means of representing data outside the local database so that it appears as if it were in local tables. With a Foreign Data Wrapper it is possible to define a Foreign Server and Foreign Tables.
+ 
+
+   
+
+   
+Foreign Key
+
+ 
+  A type of constraint defined on one or more columns in a table which requires the value in those columns to uniquely identify a row in the specified table.
+ 
+
+   
+
+   
+Foreign Server
+
+ 
+  A named collection of Foreign Tables which all use the same Foreign Data Wrapper and have other configured attributes in common.
+ 
+
+   
+
+   
+Foreign Table
+
+ 
+  A relation which appears to have rows and columns like a regular table, but when queried will instead forward the request for data through its Foreign Data Wrapper, which will return results structured according to the definition

Add Change Badges to documentation

2019-10-18 Thread Corey Huinker
Attached is a patch to implement change badges in our documentation.

What's a change badge? It's my term for a visual cue in the documentation
used to indicate that the nearby section of documentation is new in this
version or otherwise changed from the previous version.

One example of change badges being used is in the DocBook documentation
reference:
https://tdg.docbook.org/tdg/4.5/ref-elements.html#common.attributes

Docbook used graphical badges, which seemed to be a bad idea. Instead, I
went with a decorated text span like one finds in gmail labels or Reddit
"flair".

The badges are implemented using the "revision" attribute available on
all docbook tags. All one needs to do to indicate a change is to change one
tag, and add a revision attribute. For example:

<acronym revision="new in 13">ANSI</acronym>

will add a small green text box with the text "new in 13" immediately
preceding the rendered elements. I have attached a screenshot
(badges_in_acronyms.png) of an example of this from my browser viewing
changes to the acronyms.html file. This obviously lacks the polish of
viewing the page on a full website, but it does give you an idea of the
flexibility of the change badge, and where badge placement is (and is not)
a good idea.

What are the benefits of using this?

I think the benefits are as follows:

1. It shows a casual user what pieces are new on that page (new functions,
new keywords, new command options, etc).

2. It also works in the negative: a user can quickly skim a page, and
lacking any badges, feel confident that everything there works in the way
that it did in version N-1.

3. It also acts as a subtle cue for the user to click on the previous
version to see what it used to look like, confident that there *will* be a
difference on the previous version.


How would we implement this?

1. All new documentation pages would get a "NEW" badge in their title.

2. New function definitions, new command options, etc would get a "NEW"
badge as visually close to the change as is practical.

3. Changes to existing functions, options, etc. would get a badge of
"UPDATED"

4. At major release time, we could do one of two things:

4a. We could keep the NEW/UPDATED badges in the fixed release version, and
then completely remove them from the master, because for version N+1, they
won't be new anymore. This can be accomplished with an XSL transform
looking for any tag with the "revision" attribute

4b. We could code in the version number at release time, and leave it in
place. So in version 14 you could find both "v13" and "v14" badges, and in
version 15 you could find badges for 15, 14, and 13. At some point (say
v17), we start retiring the v13 badges, and in v18 we'd retire the v14
badges, and so on, to keep the clutter to a minimum.

Back to the patch:
I implemented this only for html output, and the colors I chose are very
off-brand for postgres, so that will have to change. There's probably some
spacing/padding issues I haven't thought of. Please try it out, make some
modifications to existing document pages to see how badges would work in
those contexts.
From ded965fc90b223a834ac52d55512587b7a6ea139 Mon Sep 17 00:00:00 2001
From: Corey Huinker 
Date: Fri, 18 Oct 2019 06:15:10 -0400
Subject: [PATCH] add document change badges

---
 doc/src/sgml/acronyms.sgml  |  6 +++---
 doc/src/sgml/stylesheet-html-common.xsl | 10 ++
 doc/src/sgml/stylesheet.css | 10 ++
 3 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/acronyms.sgml b/doc/src/sgml/acronyms.sgml
index f638665dc9..87bfef04be 100644
--- a/doc/src/sgml/acronyms.sgml
+++ b/doc/src/sgml/acronyms.sgml
@@ -10,7 +10,7 @@
   
 

-ANSI
+ANSI
 
  
   https://en.wikipedia.org/wiki/American_National_Standards_Institute;>
@@ -19,7 +19,7 @@
 

 
-   
+   
 API
 
  
@@ -31,7 +31,7 @@

 ASCII
 
- 
+ 
   https://en.wikipedia.org/wiki/Ascii;>American Standard
   Code for Information Interchange
  
diff --git a/doc/src/sgml/stylesheet-html-common.xsl b/doc/src/sgml/stylesheet-html-common.xsl
index 9edce52a10..cb04cb7f0d 100644
--- a/doc/src/sgml/stylesheet-html-common.xsl
+++ b/doc/src/sgml/stylesheet-html-common.xsl
@@ -289,4 +289,14 @@ set   toc,title
   
 
 
+
+
+  
+  
+  
+
+  
+  
+
+
 
diff --git a/doc/src/sgml/stylesheet.css b/doc/src/sgml/stylesheet.css
index 1a66c789d5..d0cae2f59f 100644
--- a/doc/src/sgml/stylesheet.css
+++ b/doc/src/sgml/stylesheet.css
@@ -109,3 +109,13 @@ acronym		{ font-style: inherit; }
 width: 75%;
   }
 }
+
+/* version badge styling */
+span.revision-badge {
+	visibility: visible  ;
+color: white;
+	background-color: #00933C;
+	border: 1px solid #00;
+	border-radius: 2px;
+padding: 1px;
+}
-- 
2.14.1



Add A Glossary

2019-10-14 Thread Corey Huinker
Attached is a v1 patch to add a Glossary to the appendix of our current
documentation.

I believe that our documentation needs a glossary for a few reasons:

1. It's hard to ask for help if you don't know the proper terminology of
the problem you're having.

2. Readers who are new to databases may not understand a few of the terms
that are used casually both in the documentation and in forums. This helps
to make our documentation a bit more useful as a teaching tool.

3. Readers whose primary language is not English may struggle to find the
correct search terms, and this glossary may help them grasp that a given
term has a usage in databases that is different from common English usage.

3b. If we are not able to find the resources to translate all of the
documentation into a given language, translating the glossary page would be
a good first step.

4. The glossary would be web-searchable, and draw viewers to the official
documentation.

5. adding link anchors to each term would make them cite-able, useful in
forum conversations.


A few notes about this patch:

1. It's obviously incomplete. There are more terms, a lot more, to add.

2. The individual definitions supplied are off-the-cuff, and should be
thoroughly reviewed.

3. The definitions as a whole should be reviewed by an actual tech writer
(one was initially involved but had to step back due to prior commitments),
and the definitions should be normalized in terms of voice, tone, audience,
etc.

4. My understanding of DocBook is not strong. The glossary vs glosslist tag
issue is a bit confusing to me, and I'm not sure if the glossary tag is
even appropriate for our needs.

5. I've made no effort at making each term an anchor, nor have I done any
CSS styling at all.

6. I'm not quite sure how to handle terms that have different definitions
in different contexts. Should that be two glossdefs following one
glossterm, or two separate def/term pairs?

Please review and share your thoughts.
From 343d5c18bf23f98341b510595e3e042e002242cb Mon Sep 17 00:00:00 2001
From: Corey Huinker 
Date: Sun, 13 Oct 2019 17:57:36 +
Subject: [PATCH] add glossary page with sample terms and definitions

---
 doc/src/sgml/filelist.sgml  |   1 +
 doc/src/sgml/glossary.sgml  | 618 
 doc/src/sgml/stylesheet.css |   2 +
 3 files changed, 621 insertions(+)
 create mode 100644 doc/src/sgml/glossary.sgml

diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 3da2365ea9..504c8a6326 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -170,6 +170,7 @@
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
new file mode 100644
index 00..016eee2d76
--- /dev/null
+++ b/doc/src/sgml/glossary.sgml
@@ -0,0 +1,618 @@
+
+
+
+ Glossary
+
+ 
+  This is a list of terms and their definitions in the context of PostgreSQL and databases in general.
+
+  
+
+   
+Aggregate
+
+ 
+  To combine a collection of data values into a single value, whose value
+may not be of the same type as the original values. Aggregate functions combine
+multiple rows that share a common set of values into one row, which means that
+the only data visible is the values in common, and the aggregates of the
+non-common data.
+ 
+
+   
+
+   
+Analytic
+
+ 
+  A function whose computed value can reference values found in nearby rows
+of the same result set.
+ 
+
+   
+
+   
+Atomic
+
+ 
+  In reference to the value of an Attribute or Datum: cannot be broken up
+into smaller components.
+ 
+ 
+  In reference to an operation: An event that cannot be completed in part:
+it must either entirely succeed or entirely fail. A series of SQL statements can
+be combined into a Transaction, and that transaction is said to be Atomic.
+ 
+
+   
+
+   
+Attribute
+
+ 
+  A typed data element found within a Tuple or Relation or Table.
+ 
+
+   
+
+   
+Cast
+
+ 
+  A conversion of a Datum from its current data type to another data type.
+ 
+
+   
+
+   
+Check Constraint
+
+ 
+  A type of constraint defined on a relation which restricts the values
+allowed in one or more Attributes. The check constraint can make reference to
+any Attribute in the Relation, but cannot reference other rows of the same
+relation or other relations.
+ 
+
+   
+
+   
+Column
+
+ 
+  An Attribute found in a Table or View.
+ 
+
+   
+
+   
+Commit
+
+ 
+  The act of finalizing a Transaction within the database.
+ 
+
+   
+
+   
+Concurrency
+
+ 
+  The concept that multiple independent operations can be happening within
+the database at the same time.
+ 
+
+   
+
+   
+Constraint
+
+ 
+  A method of restricting the values of data allowed within a Table.
+ 
+
+   
+
+   
+Datum
+
+ 
+  The internal

Re: Extending range type operators to cope with elements

2019-09-14 Thread Corey Huinker
>
>
> >- @> contains range/element
> >- <@ element/range is contained by
>


> I'm not a heavy user of range types, so I can't really judge how useful
> that is in practice, but it seems like a fairly natural extension of the
> existing operators. I mean, if I understand it correctly, the proposed
> behavior is equal to treating the element as a "collapsed range".
>

I used to give a talk on ranges and partitioning, prior to postgresql
getting native partitioning (see:
https://wiki.postgresql.org/images/1/1b/Ranges%2C_Partitioning_and_Limitations.pdf
 )
In that talk, I mention the need for exactly these operators, specifically
for an extension called range_partitioning which had some logic for "If I
were to insert a row with this value, what partition would it end up in?"
which allowed for a subsequent COPY operation directly to that partition.
That logic essentially binary-searched a series of ranges, so it needed an
"elem <@ range" as well as << and >>.

Yes, constructing a collapsed range was the work-around I used in the
absence of real functions.
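
For the archives: the containment case already works on elements today;
it's the ordering operators where the collapsed-range trick is needed
(the values below are arbitrary):

SELECT 42 <@ int4range(10, 100);                        -- true, built in
SELECT int4range(42, 42, '[]') << int4range(100, 200);  -- "42 << range",
                                                        -- via a collapsed range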

That extension has been replaced by real table partitioning and the planner
itself now does similar logic for partition pruning.

So yes, I've had a need for those operators in the past. What I don't know
is whether adding these functions will be worth the catalog clutter.


Re: Referential Integrity Checks with Statement-level Triggers

2019-08-01 Thread Corey Huinker
>
>
> > The people who expressed opinions on nuking triggers from orbit (it's
> the only way to be sure) have yet to offer up any guidance on how to
> proceed from here, and I suspect it's because they're all very busy getting
> things ready for v12. I definitely have an interest in working on this for
> 13, but I don't feel good about striking out on my own without their input.
>
> Very interesting thread, but the current patch has been through two
> CFs without comments or new patches, so I'm going to mark it "Returned
> with feedback".  I hope all this discussion will trigger more research
> in this space.
>

I've noticed that the zedstore efforts ran into the same problem that
refactoring triggers has: we cannot determine which columns in a table will
be affected by a trigger, so we have to assume that all of them will be.
This causes a lot of unnecessary overhead with triggers. If we had a
compilation step for triggers (which ultimately means a compilation step
for procedures) which kept a dependency tree of which tables/columns were
touched, then we would have that insight. It's true that one dynamic
statement or SELECT * would force us right back to keep-everything, but if
procedures which did not do such things had performance benefits, that
would be an incentive to code them more fastidiously.


Re: \describe*

2019-08-01 Thread Corey Huinker
>
> It seems this topic is ongoing so I've moved it to the September CF,
> but it's in "Waiting on Author" because we don't have a concrete patch
> that applies (or agreement on what it should do?) right now.
>

All recent work has been investigating the need(s) we're trying to address.
This is as good a time as any to share my findings so far (with much
collaboration from Dave Fetter).

1. Adding helper commands to psql aids only psql, and a great number of
users do not, or can not, use psql. So adding something on the server side
would have broader usage and appeal. Furthermore, some access tools
(especially browser-based ones) are not good about returning non-tabular
results, so helper commands that return result sets would have the broadest
usage.

2. Our own interest in server-side commands is all over the map. Some just
want the convenience of having them server side, or familiarity with
$OTHER_DB. Others want to eliminate the need for some code in pg_dump,
JDBC, or elsewhere.

3. There isn't much consensus in the other databases, though all of them do
*something*:

SQLServer
---

SQLServer has sp_help (
https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sp-help-transact-sql?view=sql-server-2017
 )

which contextually returns one of two different result sets (name, owner,
object type) or (column name, type, storage, length, precision, scale,
nullable, default, rule, collation)

DB2
--
Has a describe command (source:
https://www.ibm.com/support/knowledgecenter/SSEPGG_11.1.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0002019.html)
 which
can be used to describe query output (data type, data type length, column
name, column name length).

It also has an option to DESCRIBE TABLE foo which returns a set of
(col_name, schema_of_datatype, data_type, data_type_length,
data_type_scale, Nulls t/f)

It also has DESCRIBE INDEXES FOR TABLE foo which returns a set of (schema
of index, name of index, unique flag, number of columns, index type)

It also has DESCRIBE DATA PARTITIONS FOR TABLE which as you might guess
shows partitions.

All of these options have a SHOW DETAIL modifier which adds more columns.

MySQL
--

(https://dev.mysql.com/doc/refman/8.0/en/show-columns.html)
MySQL has SHOW COLUMNS which also returns a set of (name, type similar to
format_type(), null flag, PK or index indicator, default value, notes about
auto-increment/autogeneration/implicit triggers), and can be extended to
show privileges and comments with the EXTENDED and FULL options.

MySQL has a DESCRIBE command, but it is a synonym of EXPLAIN.

MySQL also has a raft of commands like SHOW CREATE USER, SHOW CREATE VIEW,
SHOW CREATE TRIGGER, SHOW CREATE TABLE, etc. (ex:
https://dev.mysql.com/doc/refman/8.0/en/show-create-user.html)  These
commands all return a result set of of exactly one column, each row
representing one SQL statement, essentially doing a single-object
schema-only pg_dump.

Oracle
-

https://docs.oracle.com/cd/B19306_01/server.102/b14357/ch12019.htm

SQL*Plus has a describe command that works on tables, views, and
composite types (tabular set of: name, null, type), procedures (tabular set
of: arg name, type, in/out), and packages (a series of sets, one per type
and procedure).

SQLcl has the INFO statement, which is roughly analogous to psql's \d in
that it is a mix of tabular and non-tabular information.

Oracle itself has dbms_metadata.get_ddl() which seems analogous to mysql's
SHOW CREATE commands.

Snowflake
--

Snowflake has DESCRIBE TABLE
https://docs.snowflake.net/manuals/sql-reference/sql/desc-table.html and
DESCRIBE VIEW
https://docs.snowflake.net/manuals/sql-reference/sql/desc-view.html

These return a set of: (name, type, column type, null flag, default,
primary key, unique key, check, expression, comment).

It also has an option for describing "stage" tables, which are s3 buckets
with a file format associated; the closest postgresql analog would be a
file_fdw foreign table, and there is a separate result set format for that.

Snowflake has no concept of indexes (it understands that there are things
called unique keys, and it remembers that you said you wanted one, but
does nothing to enforce it), so no command for that.

These result sets are not composable in a query, however, they are stored
in the RESULT_SCAN cache, which means that you can run a describe, and then
immediately fetch the results of that command as if it was a table.

Snowflake also has a get_ddl() function
https://docs.snowflake.net/manuals/sql-reference/functions/get_ddl.html which
is a one-column result set of statements to re-create the given object.


From all this, I have so far concluded:

1. There is real demand to be able to easily see the basic structure of
tables, views, and indexes in a way that strikes a balance between detail
and clutter.
2. There is some acknowledgement that this data would be useful if it were
further filtered through SQL, though only one vendor 

Re: Catching missing Datum conversions

2019-07-20 Thread Corey Huinker
>
> I should probably split this into "actionable" (categories 3 and 4)
> and "noise and scaffolding" patches.
>

Breaking down the noise-and-scaffolding into some subgroups might make the
rather long patches more palatable/exceedingly-obvious:
* (Datum) 0 ---> NullDatum
* 0 > NullDatum
* The DatumGetPointer(allParameterTypes) null tests

Having said that, everything you did seems really straightforward, except
for

src/backend/rewrite/rewriteDefine.c
src/backend/statistics/mcv.c
src/backend/tsearch/ts_parse.c

and those seem like cases where the DatumGetXXX was a no-op before Datum
was a struct.


Re: SHOW CREATE

2019-07-05 Thread Corey Huinker
On Fri, Jul 5, 2019 at 12:32 PM David Fetter  wrote:

> Folks,
>
> Corey Huinker put together the documentation for this proposed
> feature. Does this seem like a reasonable way to do it?
>
>
In doing that work, it became clear that the command was serving two
masters:
1. A desire to see the underlying nuts and bolts of a given database object.
2. A desire to essentially make the schema portion of pg_dump a server side
command.

To that end, I see splitting this into two commands, SHOW CREATE and SHOW
DUMP.

SHOW DUMP would be the original command minus the object type and object name
specifier, and it would dump the entire current database as seen from the
current user (again, no data).

SHOW CREATE would still have all the object_type parameters as before, but
would only dump the one specified object, plus any dependent objects
specified in the WITH options (comments, grants, indexes, constraints,
partitions, all).

Please note that any talk of a server side DESCRIBE is separate from this.
That would be a series of commands that would have result sets tailored to
the object type, and each one would be an inherent compromise between
completeness and readability.

I'd like to hear what others have to say, and incorporate that feedback
into a follow up proposal.


Re: \describe*

2019-06-22 Thread Corey Huinker
>
> > So what is the uptake on implementing this at the server side, ie.
> > DESCRIBE?
>
> I'm pretty skeptical of this idea, unless you are willing to throw
> away at least one and possibly both of the following goals:
>
> 1. Compatibility with psql's existing \d behavior.
>

I don't think *compatibility* with the behavior should be a goal in itself.
Coverage of the majority of the use-cases is.

2. Usability of DESCRIBE for any purpose whatsoever other than emitting
> something that looks just like what psql prints.
>
> We've migrated many of the \d displays so far away from "a single query
> result" that I don't believe there's a way for a server command to
> duplicate them, at least not without some seriously unholy in-bed-ness
> between the server command and some postprocessing logic in describe.c.
> (At which point you've lost whatever system architectural value there
> might be in the whole project, since having a more-arm's-length
> relationship there kinda seems like the point to me.)
>

I think there's a genuine use for regular printed output, and there's also
a use for a query-able output. Maybe that queryable output is just a JSONB
output that the outer query can pick apart as it sees fit, and that would
handle the fact that the data often doesn't fit into a single query's
output.

Incidentally, I had need of this very functionality in Snowflake the other
day. The data dictionary there isn't capable of telling you which columns
are in a primary key, but that information is printed when you run
"DESCRIBE my_table".  The workaround is to run "DESCRIBE my_table" and then
make another query using a table function to recall the output of the last
query made in the session, and then filter that. Yeah, as a pattern it's
weird and sad, but it shows that there's are uses for something
DESCRIBE-ish on the server side.
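
From memory, that Snowflake pattern looks roughly like this (treat the
quoted column names as approximate):

DESCRIBE TABLE my_table;
SELECT "name"
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE "primary key" = 'Y';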

So if we're going server-side on DESCRIBE, it should be its own entity,
not beholden to design decisions made in psql.


> There are a bunch of other little behavioral differences that you just
> can't replicate server-side, like the fact that localization of the
> results depends on psql's LC_MESSAGES not the server's.  Maybe people
> would be okay with changing that, but it's not a transparent
> reimplementation.
>

I think people would be OK with that. We're asking the server what it knows
about an object, not how psql feels about that same information.

I think if we want to have server-side describe capability, we're better
> off just to implement a DESCRIBE command that's not intended to be exactly
> like \d anything, and not try to make it be the implementation for \d
> anything.  (This was, in fact, where David started IIUC.  Other people's
> sniping at that idea hasn't yielded any better idea.)
>

I'm very much in support of server-side DESCRIBE that's not beholden to \d
in any way. For instance, I'm totally fine with DESCRIBE not being able to
handle wildcard patterns.

My initial suggestion for client-side \describe was mostly borne of it
being easy to implement a large subset of the \d commands to help users.
Not all users have psql access, so having a server side command helps more
people.

It could be that we decide that DESCRIBE is set-returning, and we have to
break up \d functionality to suit. By this I mean that we might find it
simpler to require DESCRIBE TABLE foo to only show columns with minimal
information about PKs and follow up commands like "DESCRIBE TABLE foo
INDEXES" or "DESCRIBE TABLE foo CONSTRAINTS" to keep output in tabular
format.


> In particular, I'm really strongly against having "\describe-foo-bar"
> invoke DESCRIBE, because (a) that will break compatibility with the
> existing \des command, and (b) it's not actually saving any typing,
> and (c) I think it'd confuse users no end.
>

+1. Having psql figure out which servers can give proper
server-side describes would boggle the mind.


> Of course, this line of thought does lead to the conclusion that we'd be
> maintaining psql/describe.c and server-side DESCRIBE in parallel forever,
>

Not fun, but what's our motivation for adding new \d functionality once
a viable DESCRIBE is in place? Wouldn't the \d commands essentially be
feature-frozen at that point?


> which doesn't sound like fun.  But we should be making DESCRIBE with an
> eye to more use-cases than psql.  If it allows jdbc to not also maintain
> a pile of equivalent code, that'd be a win.  If it allows pg_dump to toss
> a bunch of logic overboard (or at least stop incrementally adding new
> variants), that'd be a big win.
>

I don't know enough about JDBC internals to know what sort of non-set
results it can handle, but that seems key to showing us how to proceed.

As for pg_dump, that same goal was a motivation for a similar server-side
command "SHOW CREATE " (essentially, pg_dump of ) which
would have basically the same design issues as DESCRIBE would, though the
result set would be a much simpler SETOF text.


Re: Extracting only the columns needed for a query

2019-06-16 Thread Corey Huinker
>
> The thing that most approaches to this have fallen down on is triggers ---
> that is, a trigger function might access columns mentioned nowhere in the
> SQL text.  (See 8b6da83d1 for a recent example :-()  If you have a plan
> for dealing with that, then ...
>

Well, if we had a trigger language that compiled to  at creation
time, and that trigger didn't do any dynamic/eval code, we could store
which attributes and rels were touched inside the trigger.

I'm not sure if that trigger language would be sql, plpgsql with a
"compile" pragma, or maybe we exhume PSM, but it could have some side
benefits:

  1. This same issue haunts any attempts at refactoring triggers and
referential integrity, so narrowing the scope of what a trigger touches
will help there too
  2. additional validity checks
  3. (this is an even bigger stretch) possibly a chance to combine multiple
triggers into one statement, or combine mutliple row-based triggers into a
statement level trigger

Of course, this all falls apart with one dynamic SQL or one SELECT *, but
it would be incentive for the users to refactor code to not do things that
impede trigger optimization.


Re: PostgreSQL 12 Beta 1 press release draft

2019-05-21 Thread Corey Huinker
For CTEs, do we want to give an example of forcing inlining, rather than
the current example of forcing materialization?

According to the docs, virtual generated columns aren't yet supported. I'm
pretty sure the docs are right. Do we still want to mention it?

Otherwise it looks good to me.

On Tue, May 21, 2019 at 11:39 PM Jonathan S. Katz 
wrote:

> Hi,
>
> Attached is a draft of the PG12 Beta 1 press release that is going out
> this Thursday. The primary goals of this release announcement are to
> introduce new features, enhancements, and changes that are available in
> PG12, as well as encourage our users to test and provide feedback to
> help ensure the stability of the release.
>
> Speaking of feedback, please provide me with your feedback on the
> technical correctness of this announcement so I can incorporate changes
> prior to the release.
>
> Thanks!
>
> Jonathan
>


Re: Table as argument in postgres function

2019-05-21 Thread Corey Huinker
>
>
>> Is there anything preventing us from having the planner resolve object
>> names from strings?
>>
>
> The basic problem is fact so when you use PREPARE, EXECUTE protocol, you
> has not parameters in planning time.
>

I agree that it defeats PREPARE as it is currently implemented with
PQprepare(), and it would never be meaningful to have a query plan that
hasn't finalized which objects are involved.

But could it be made to work with PQexecParams(), where the parameter
values are already provided?

Could we make a version of PQprepare() that takes an extra array of
paramValues for object names that must be supplied at prepare-time?


Re: Table as argument in postgres function

2019-05-19 Thread Corey Huinker
>
>
> You can pass table name as text or table object id as regclass type.
>
> inside procedure you should to use dynamic sql - execute statement.
> Generally you cannot to use a variable as table or column name ever.
>
> Dynamic SQL is other mechanism - attention on SQL injection.
>

On this note, Snowflake has the ability to parameterize object names
(see:
https://docs.snowflake.net/manuals/sql-reference/identifier-literal.html )

So you can do things like
SELECT col_a, col_b FROM identifier('a_table_name')
or as a bind variable
SELECT col_a, col_b FROM identifier($1)

Which is their way of avoiding SQL injection attacks in *some* circumstances.
Their implementation of it is a bit uneven, but it has proven useful for my
work.

I can see where this obviously would prevent the planning of a prepared
statement when a table name is a parameter, but the request comes up often
enough, and the benefits of avoiding SQL injection attacks are significant
enough, that maybe we should try to enable it for one-offs. I don't
necessarily think we need an identifier(string) function; a
'schema.table'::regclass would be more our style.
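
For comparison, a sketch of what the regclass flavor already buys us in a
procedural context (function and table names here are hypothetical): the
cast resolves and validates the name at call time, and regclass output
comes back schema-qualified and quoted as needed.

CREATE FUNCTION count_rows(tbl regclass) RETURNS bigint
LANGUAGE plpgsql AS $$
DECLARE n bigint;
BEGIN
    -- splicing a regclass value sidesteps SQL injection via the name
    EXECUTE format('SELECT count(*) FROM %s', tbl) INTO n;
    RETURN n;
END $$;

SELECT count_rows('some_schema.some_table');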

Is there anything preventing us from having the planner resolve object
names from strings?


Re: range_agg

2019-05-04 Thread Corey Huinker
>
> One question is how to aggregate ranges that would leave gaps and/or
> overlaps. So in my extension there is a one-param version that forbids
> gaps & overlaps, but I let you permit them by passing extra parameters,
> so the signature is:
>

Perhaps a third way would be to allow and preserve the gaps.

A while back I wrote an extension called disjoint_date_range for storing
sets of dates where it was assumed that most dates would be contiguous. The
basic idea was that the core datatype was an array of ranges of dates, and
with every modification you'd unnest them all to their discrete elements
and use a window function to identify "runs" of dates and recompose them
into a canonical set. It was an efficient way of representing "Every day
last year except for June 2nd and August 4th, when we closed business for
special events."
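
That "runs" step is the classic gaps-and-islands trick; a minimal sketch,
assuming one row per distinct date in a hypothetical table dates(d date):

SELECT daterange(min(d), max(d), '[]') AS run
FROM (SELECT d,
             -- dates in one contiguous run share the same anchor value
             d - (row_number() OVER (ORDER BY d))::int AS anchor
      FROM dates) s
GROUP BY anchor
ORDER BY 1;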

For arrays of ranges the principle is the same but it'd get a bit more
tricky: you'd have to order by low bound, use window functions to detect
adjacency/overlap to identify your runs, and then generate the canonical
minimum set of ranges in your array.


Re: [PATCH v1] Add \echo_stderr to psql

2019-04-21 Thread Corey Huinker
>
>
>\warn ...
>\warning ...
>

These two seem about the best to me, drawing from the perl warn command.

I suppose we could go the bash >&2 route here, but I don't want to.


Re: DWIM mode for psql

2019-03-31 Thread Corey Huinker
On Sun, Mar 31, 2019 at 5:04 PM Andres Freund  wrote:

> On 2019-04-01 09:52:34 +1300, Thomas Munro wrote:
> > +/*
> > + * This program is free software: you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License as published by
> > + *  the Free Software Foundation, either version 3 of the License, or
> > + *  (at your option) any later version.
>
> Indentation bug. You really need to work a bit more careful.
>

The patch applies cleanly, and passes "make check", but it generated an
executable called "mongodb".
Should I have run "make maintainer-clean" first?


Re: Syntax diagrams in user documentation

2019-03-28 Thread Corey Huinker
On Thu, Mar 28, 2019 at 6:49 PM Peter Geoghegan  wrote:

> On Thu, Mar 28, 2019 at 3:46 PM Jeremy Schneider 
> wrote:
> > We're just gearing up for the Google Season of Docs and I think this
> > would be a great task for a doc writer to help with.  Any reason to
> > expect serious objections to syntax diagram graphics in the docs?
>
> It might be hard to come to a consensus, because it's one of those
> things that everybody can be expected to have an opinion on. It
> probably won't be hard to get something committed that's clearly more
> informative than what we have right now, though.
>
> There is a question about how we maintain consistency between the
> syntax diagrams in psql if we go this way, though. Not sure what to do
> about that.
>

This discussion is highly relevant to an upcoming talk I have called "In
Aid Of RTFM", and the work I hope would follow from it.

While I personally like these bubble charts because they remind me of my
misspent youth at IBM, they have some drawbacks:

1. They look like something out of an IBM manual
2. Images conceal information from visually impaired people
3. They aren't copy paste-able text
4. They aren't easily comparable
5. They bake in the language of the comments

The merits of #1 can be argued forever, and it's possible that a more
modern bubble chart theme is possible.

#2 is problematic, because things like ADA compliance and the EU
Accessibility Requirements frown upon conveying text inside images. The way
around this might be to have the alt-text of the image be the original
syntax as we have it now.

#3 is important when attempting to relay the relevant excerpt of a very
large documentation page via email or slack. Yes, I could right click and
copy the URL of the image (in this case
https://www.sqlite.org/images/syntax/insert-stmt.gif and others), but
that's more work than copy-paste. We could add an HTML anchor to each image
(my talk discusses our current lack of reference anchors) and that would
mitigate it somewhat. Making the original text available via mouse-over or
a "copy text" link might work too.

#3b As long as I live, I will never properly memorize the syntax for RANGE
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. I will google this and
copy-paste it. I suspect I'm not alone. If it's available only in an image,
then I can't copy paste, and I *will* mistype some part of that at least
twice.

#4 isn't such an immediate issue, but one of my points in the talk is that
right now there is no way to easily distinguish text on a page that is new
in the most recent version of pgsql (i.e. a red-line markup). We could of
course flag that an image changed from version X-1 to X, but it would be
tougher to convey which parts of the image changed.

#5 is not such a big issue because most of what is in the diagram is pure
syntax, but comments will leak in, and those snippets of English will be
buried very deep in bubble-markup.


GIN indexes on an = ANY(array) clause

2019-03-13 Thread Corey Huinker
(moving this over from pgsql-performance)

A client had an issue with a query that had a WHERE clause something like
this:

WHERE 123456 = ANY(integer_array_column)


I was surprised that this didn't use the pre-existing GIN index on
integer_array_column, whereas recoding as

WHERE ARRAY[123456] <@ integer_array_column


did cause the GIN index to be used. Is this a known/expected behavior? If
so, is there any logical reason why we couldn't have the planner pick up on
that?
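
A minimal reproduction, with hypothetical names (the GIN opclass for
arrays supports <@, but nothing maps = ANY onto it):

CREATE TABLE docs (id int, tags integer[]);
CREATE INDEX docs_tags_gin ON docs USING gin (tags);

EXPLAIN SELECT * FROM docs WHERE 123456 = ANY (tags);    -- cannot use the index
EXPLAIN SELECT * FROM docs WHERE ARRAY[123456] <@ tags;  -- can use the index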

Flo Rance (toura...@gmail.com) was nice enough to show that yes, this is
expected behavior.

Which leaves the questions:
- is the transformation I made algebraically correct in the general case?
- if so, could we have the planner do that automatically when in the
presence of a matching GIN index?

This seems like it might tie in with the "enforcement of foreign keys
within an array" thread (which I can't presently find...).


Re: \describe*

2019-03-08 Thread Corey Huinker
On Mon, Mar 4, 2019 at 1:45 PM Corey Huinker 
wrote:

>
>>> - Tab completion for \describe-verbose.
>>> I know that \d+ tab completion is also not there, but I think we must
>>> have tab completion for \describe-verbose.
>>>
>>> postgres=# \describe-
>>> \describe-extension
>>>  \describe-replication-publication \describe-user-mapping
>>> \describe-foreign-data-wrapper
>>> \describe-replication-subscription\describe-view
>>> \describe-foreign-server  \describe-role
>>> \describe-window-function
>>> \describe-foreign-table   \describe-rule
>>>  ...
>>>
>>
> I just confirmed that there isn't tab completion for the existing S/+
> options, so it's hard to justify them for the equivalent verbose suffixes.
>

We can add completions for describe[-thing-]-verbose, but the
auto-completions start to run into combinatoric complexity, and the
original short-codes don't do that completion, probably for the same reason.

>>> +   success = listTables("tvmsE", NULL, show_verbose, show_system);
>>> +   }
>>> +   status = PSQL_CMD_UNKNOWN;
>>>
>>>
> I'll look into this, thanks!
>

This was fixed, good find.



> - Confusion about \desc and \desC
>>> There is confusion while running the \desc command. I know the problem,
>>> but the user may be confused by this.
>>> postgres=# \desC
>>>List of foreign servers
>>>  Name | Owner | Foreign-data wrapper
>>> --+---+--
>>> (0 rows)
>>>
>>> postgres=# \desc
>>> Invalid command \desc. Try \? for help.
>>>
>>
I've changed the code to first strip out 0-1 instances of "-verbose" and
"-system"; the remaining string must then be an exact match of a describe
command or it's an error. This same system could be applied to the short
commands to strip out 'S' and '+', and it might clean up the original code a
bit.

>>> This command shows a list of relations "\d":
>>> postgres=# \describe-aggregatE-function
>>> List of relations
>>>  Schema | Name | Type  |  Owner
>>> +--+---+-
>>>  public | foo  | table | vagrant
>>> (1 row)
>>>
>>
Same issue, same fix.


>>> I have done a brief code review except for the documentation code. I
>>> don't like this code
>>>
>>> if (cmd_match(cmd,"describe-aggregate-function"))
>>>
>>>  success = describeAggregates(pattern, show_verbose, show_system);
>>>  else if (cmd_match(cmd,
>>> "describe-access-method"))
>>>  success =
>>> describeAccessMethods(pattern, show_verbose);
>>>  else if (cmd_match(cmd,
>>> "describe-tablespace"))
>>>  success = describeTablespaces(pattern,
>>> show_verbose);
>>>  else if (cmd_match(cmd,
>>> "describe-conversion"))
>>>  success = listConversions(pattern,
>>> show_verbose, show_system);
>>>  else if (cmd_match(cmd, "describe-cast"))
>>>  success = listCasts(pattern,
>>> show_verbose
>>>
>>>
>>> This can be achieved with the list/array/hash table, so I have changed
>>> that code in the attached patch just for a sample if you want I can do that
>>> for whole code.
>>>
>>
> There are some problems with a hash table. The function signatures vary
> quite a lot, and some require additional psql_scan_slash_options to be
> called. The hash option, if implemented, probably should be expanded to all
> slash commands, at which point maybe it belongs in psqlscanslash.l...
>

As I suspected, there's a lot of variance in the function signatures of the
various listSomething()/describeSomething() commands, and listDbRoleSettings
requires a second pattern to be scanned. As far as I know, PsqlScanState
isn't known inside describe.h, so building and using a hash table would be a
lot of work for uncertain gain. The original code just plows through strings
in alphabetical order, breaking things up by comparing leading characters,
so I largely did the same at the des/describe levels.

Instead of a hash table, it might be fun to write

Re: Re: \describe*

2019-03-05 Thread Corey Huinker
>
>
> I agree with Andres and Robert.  This patch should be pushed to PG13.
>
> I'll do that on March 8 unless there is a compelling argument not to.
>
>
No objection. I'll continue to work on it, though.


Re: \describe*

2019-03-04 Thread Corey Huinker
>
>
>> - Tab completion for \describe-verbose.
>> I know that \d+ tab completion is also not there, but I think we must
>> have tab completion for \describe-verbose.
>>
>> postgres=# \describe-
>> \describe-extension
>>  \describe-replication-publication \describe-user-mapping
>> \describe-foreign-data-wrapper
>> \describe-replication-subscription\describe-view
>> \describe-foreign-server  \describe-role
>>   \describe-window-function
>> \describe-foreign-table   \describe-rule
>>  ...
>>
>
I just confirmed that there isn't tab completion for the existing S/+
options, so it's hard to justify them for the equivalent verbose suffixes.



>> (1 row)
>> Invalid command \describe. Try \? for help.
>>
>>
>> I think this status is causing the problem.
>>
>>
>>
>> +   /* standard listing of interesting things */
>> +   success = listTables("tvmsE", NULL, show_verbose, show_system);
>> +   }
>> +   status = PSQL_CMD_UNKNOWN;
>>
>>
I'll look into this, thanks!



> - Confusion about \desc and \desC
>> There is confusion while running the \desc command. I know the problem,
>> but the user may be confused by this.
>> postgres=# \desC
>>List of foreign servers
>>  Name | Owner | Foreign-data wrapper
>> --+---+--
>> (0 rows)
>>
>> postgres=# \desc
>> Invalid command \desc. Try \? for help.
>>
>> - Auto-completion of commands.
>> There is some more confusion in the completion of commands.
>>
>> This command shows a list of aggregates:
>> postgres=# \describe-aggregate-function
>>  List of aggregate functions
>>  Schema | Name | Result data type | Argument data types | Description
>> +--+--+-+-
>> (0 rows)
>>
>>
>>
>> This command shows a list of relations, like "\d":
>> postgres=# \describe-aggregatE-function
>> List of relations
>>  Schema | Name | Type  |  Owner
>> +--+---+-
>>  public | foo  | table | vagrant
>> (1 row)
>>
>> This command also shows a list of relations, like "\d":
>> postgres=# \describe-aggr
>> List of relations
>>  Schema | Name | Type  |  Owner
>> +--+---+-
>>  public | foo  | table | vagrant
>> (1 row)
>>
>> This command shows error messages.
>> postgres=# \descr
>> Invalid command \descr. Try \? for help.
>>
>>
I will look into it.



>
>> I have done a brief code review except for the documentation code. I
>> don't like this code
>>
>> if (cmd_match(cmd, "describe-aggregate-function"))
>>     success = describeAggregates(pattern, show_verbose, show_system);
>> else if (cmd_match(cmd, "describe-access-method"))
>>     success = describeAccessMethods(pattern, show_verbose);
>> else if (cmd_match(cmd, "describe-tablespace"))
>>     success = describeTablespaces(pattern, show_verbose);
>> else if (cmd_match(cmd, "describe-conversion"))
>>     success = listConversions(pattern, show_verbose, show_system);
>> else if (cmd_match(cmd, "describe-cast"))
>>     success = listCasts(pattern, show_verbose
>>
>>
>> This can be achieved with the list/array/hash table, so I have changed
>> that code in the attached patch just for a sample if you want I can do that
>> for whole code.
>>
>
There are some problems with a hash table. The function signatures vary quite
a lot, and some require additional psql_scan_slash_options to be called.
The hash option, if implemented, probably should be expanded to all slash
commands, at which point maybe it belongs in psqlscanslash.l...

>


Re: Referential Integrity Checks with Statement-level Triggers

2019-02-25 Thread Corey Huinker
>
>
> In order to avoid per-row calls of the constraint trigger functions, we
> could
> try to "aggregate" the constraint-specific events somehow, but I think a
> separate queue would be needed for the constraint-specific events.
>
> In general, the (after) triggers and constraints have too much in common,
> so
> separation of these w/o seeing code changes is beyond my imagination.
>
>
Yeah, there's a lot of potential for overlap where a trigger could "borrow"
an RI tuplestore or vice versa.

The people who expressed opinions on nuking triggers from orbit (it's the
only way to be sure) have yet to offer up any guidance on how to proceed
from here, and I suspect it's because they're all very busy getting things
ready for v12. I definitely have an interest in working on this for 13, but
I don't feel good about striking out on my own without their input.


Re: some ri_triggers.c cleanup

2019-02-25 Thread Corey Huinker
>
> Right, this makes a lot of sense, similar to how ri_restrict() combines
> RESTRICT and NO ACTION.
>

I'm pretty sure that's where I got the idea, yes.


Re: \describe*

2019-02-23 Thread Corey Huinker
>
> Given that this patch has been added to the last commitfest for v12, I
> think we should mark it as targeting 13, so it can be skipped over by
> people looking to get things into v12.  Even leaving fairness aside, I
> don't think it's likely to be ready quickly enough...
>

Obviously this patch is nowhere near the importance of most patches slated
for v12, but I would hope it can be considered, time permitting.

The size of the patch may look large (1036 lines), but 650+ of that is pure
documentation changes, ~50 lines of added autocomplete strings, ~140 lines
are added TailMatches calls (one per new autocomplete string), and what
remains is strncmp() calls to match those same strings, so it's pretty mild
in terms of impact.


Re: some ri_triggers.c cleanup

2019-02-23 Thread Corey Huinker
On Fri, Feb 22, 2019 at 1:12 PM Corey Huinker 
wrote:

> On Fri, Feb 22, 2019 at 11:05 AM Peter Eisentraut <
> peter.eisentr...@2ndquadrant.com> wrote:
>
>> ri_triggers.c is endlessly long and repetitive.  I want to clean it up a
>> bit (more).
>>
>
> Having just been down this road, I agree that a lot of cleanup is needed
> and possible.
>
>
>> I looked into all these switch cases for the unimplemented MATCH PARTIAL
>> option.  I toyed around with how a MATCH PARTIAL implementation would
>> actually look like, and it likely wouldn't use the existing code
>> structure anyway, so let's just simplify this for now.
>>
>
> +1
>
>
>
>> Attached are some patches.
>
>
> I intend to look this over in much greater detail, but I did skim the code
> and it seems like you left the SET DEFAULT and SET NULL paths separate. In
> my attempt at statement-level triggers I realized that they only differed
> by one literal value, so I parameterized the function.
>
>

I've looked it over more closely now and I think that it's a nice
improvement.

As I suspected, the code for SET NULL and SET DEFAULT are highly similar
(see .diff), the major difference being two constants, the order of some
variable declarations, and the recheck in the set-default case.

The changes were so simple that I felt remiss not adding the patch for you
(see .patch).

Passes make check.
diff --git a/set_null.c b/set_default.c
index bc323ec..b2dd91d 100644
--- a/set_null.c
+++ b/set_default.c
@@ -1,10 +1,10 @@
 /*
- * ri_setnull -
+ * ri_setdefault -
  *
- * Common code for ON DELETE SET NULL and ON UPDATE SET NULL
+ * Common code for ON DELETE SET DEFAULT and ON UPDATE SET DEFAULT
  */
 static Datum
-ri_setnull(TriggerData *trigdata)
+ri_setdefault(TriggerData *trigdata)
 {
 const RI_ConstraintInfo *riinfo;
 Relationfk_rel;
@@ -30,10 +30,10 @@ ri_setnull(TriggerData *trigdata)
 elog(ERROR, "SPI_connect failed");
 
 /*
- * Fetch or prepare a saved plan for the set null operation (it's
- * the same query for delete and update cases)
+ * Fetch or prepare a saved plan for the set default operation
+ * (it's the same query for delete and update cases)
  */
-ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_SETNULL_DOUPDATE);
+ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_SETDEFAULT_DOUPDATE);
 
if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
 {
@@ -44,12 +44,12 @@ ri_setnull(TriggerData *trigdata)
 charparamname[16];
 const char *querysep;
 const char *qualsep;
-const char *fk_only;
 Oid queryoids[RI_MAX_NUMKEYS];
+const char *fk_only;
 
 /* --
  * The query string built is
- *  UPDATE [ONLY] <fktable> SET fkatt1 = NULL [, ...]
+ *  UPDATE [ONLY] <fktable> SET fkatt1 = DEFAULT [, ...]
  *  WHERE $1 = fkatt1 [AND ...]
  * The type id's for the $ parameters are those of the
  * corresponding PK attributes.
@@ -57,9 +57,9 @@ ri_setnull(TriggerData *trigdata)
  */
initStringInfo(&querybuf);
initStringInfo(&qualbuf);
+quoteRelationName(fkrelname, fk_rel);
 fk_only = fk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
 "" : "ONLY ";
-quoteRelationName(fkrelname, fk_rel);
 appendStringInfo(, "UPDATE %s%s SET",
  fk_only, fkrelname);
 querysep = "";
@@ -72,9 +72,10 @@ ri_setnull(TriggerData *trigdata)
 quoteOneName(attname,
  RIAttName(fk_rel, riinfo->fk_attnums[i]));
appendStringInfo(&querybuf,
- "%s %s = NULL",
+ "%s %s = DEFAULT",
  querysep, attname);
 sprintf(paramname, "$%d", i + 1);
ri_GenerateQual(&qualbuf, qualsep,
 paramname, pk_type,
 riinfo->pf_eq_oprs[i],
@@ -104,5 +105,20 @@ ri_setnull(TriggerData *trigdata)
 
 table_close(fk_rel, RowExclusiveLock);
 
-return PointerGetDatum(NULL);
+/*
+ * If we just deleted or updated the PK row whose key was equal to
+ * the FK columns' default values, and a referencing row exists in
+ * the FK table, we would have updated that row to the same values
+ * it already had --- and RI_FKey_fk_upd_check_required would
+ * hence believe no check is necessary.  So we need to do another
+ * lookup now and in case a reference still exists, abort the
+ * operation.  That is already implemented in the NO ACTION
+ * trigger, so just run it.  (This recheck is only needed in the
+ * SET DEFAULT case, since CASCADE would remove such rows in case
+ * of a 

Re: some ri_triggers.c cleanup

2019-02-22 Thread Corey Huinker
On Fri, Feb 22, 2019 at 11:05 AM Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> ri_triggers.c is endlessly long and repetitive.  I want to clean it up a
> bit (more).
>

Having just been down this road, I agree that a lot of cleanup is needed
and possible.


> I looked into all these switch cases for the unimplemented MATCH PARTIAL
> option.  I toyed around with how a MATCH PARTIAL implementation would
> actually look like, and it likely wouldn't use the existing code
> structure anyway, so let's just simplify this for now.
>

+1



> Attached are some patches.


I intend to look this over in much greater detail, but I did skim the code
and it seems like you left the SET DEFAULT and SET NULL paths separate. In
my attempt at statement-level triggers I realized that they only differed
by one literal value, so I parameterized the function.


Re: Referential Integrity Checks with Statement-level Triggers

2019-02-22 Thread Corey Huinker
>
>
> While the idea to use the transition table is good, this approach probably
> requires the trigger engine (trigger.c) to be adjusted, and that in a
> non-trivial way.
>

It probably does. Several people with advanced knowledge of trigger.c
expressed a desire to rebuild trigger.c from the ground up, and with it
create case-specific tuplestores for handling referential integrity
constraints, which would be lighter than either the transition tables or
the per-row invocation of a trigger. After all, we need a RI check to
happen, we don't need it to happen *through a trigger function*.

I'm also not sure if it's o.k. that performance related patch potentially
> makes performance worse in some cases. If FK violations are checked at
> statement boundary, the wasted effort / time can (at least in theory) be
> high
> if early rows violate the FK.
>

That concern was also expressed with varying levels of alarm in their
voices.

Have you considered bulk processing of individual rows by row-level trigger?
> For IMMEDIATE constraints we'd have to ensure that the trigger is notified
> that the current row is the last one from the current query, but that might
> not be difficult.
>

I'm not sure I understand what you're suggesting, but if it keeps the
overhead of one trigger firing per row deleted, then it doesn't seem like
much of a win.

Given that this patch has been punted to v13, I'd like to instead look at
how we might go about building up the transition tuplestores for the
specific purpose of doing the RI checks, not just deletes, and executing
those at the appropriate time, rather than trying to make our needs fit
into trigger form.


Re: Alternative to \copy in psql modelled after \g

2019-01-28 Thread Corey Huinker
> Otherwise "\g -" looks good as a portable solution.

+1



Re: \describe*

2019-01-24 Thread Corey Huinker
Attached is a patch to add verbose \describe commands to complement our
existing but slightly cryptic family of \d commands.

The goals of this are:
- aid user discovery of \d-commands via tab completion
- make scripts and snippets slightly more self-documenting and
understandable
- save experienced users that 0.22 seconds when they try to remember what
\dFpS+ means or which command lists user mappings.

DESIGN CHOICES:

Every new command is of the form
\describe-some-system-object-type[-system][-verbose]. The -system suffix
stands in for the 'S' suffix and -verbose stands in for '+'.

New commands used the singular form, not plural.

Every new command has a direct analog \d-command, but the reverse is not
always true, especially when it comes to the commands that can specify
multiple object types. In those cases, there are multiple long versions
that correspond to several singular parameters (\describe-view,
\describe-materialized-view, \describe-index, etc) but no combinatorics
(i.e. no \describe-view-and-foreign-table).

There is a \describe-schema and \describe-namespace, both of which perform
\dn.

There is a \describe-role but no \describe-user or \describe-database-role.

I chose \describe-privilege for \dp

I chose \describe-type for \dT instead of \describe-data-type.

The command \describe-aggregate-function is \dfa, whereas
\describe-aggregate is \da.

NOTES:

There is currently nothing stopping you from using the short form suffixes
on long form commands, but the reverse isn't true. For example, you can
type \describe-functionS+ and it'll work, but \df-verbose will not. I allow
this mostly because it would take work to prevent it.

Documentation XML was updated but not formatted to make the diff easier to
read.

No regression cases were added. Currently our coverage of \d commands in
psql itself is quite minimal:

~/src/postgres$ grep '\\d' src/test/regress/sql/psql.sql | sort | uniq
\copyright
\d psql_serial_tab_id_seq
\df exp
\dt arg1
\e arg1 arg2


but perhaps we could test it indirectly in these other areas:

~/src/postgres/src/test/regress/sql$ grep '\\d' * | sed -e 's/^.*\\d/\\d/g'
-e 's/ .*//g' | sort | uniq -c
156 \d
  2 \d'
  1 \d*',
157 \d+
  1 \d{4})',
  1 \da
  2 \d+):([a-zA-Z0-9+/=]+)\$([a-zA-Z0-9+=/]+):([a-zA-Z0-9+/=]+)',
  4 \des
  8 \des+
  1 \det+
  4 \deu
  6 \deu+
  1 \dew
 14 \dew+
 21 \df
  1 \dfn
  1 \dfp
  4 \dp
  4 \dRp
  6 \dRp+
  2 \dRs
  3 \dRs+
  2 \dt



On Mon, Jan 29, 2018 at 9:56 AM David Fetter  wrote:

> On Mon, Jan 29, 2018 at 02:51:53PM +, Ryan Murphy wrote:
> > >
> > > >What I propose is in fact a server command, which at least three of
> > > >the other popular RDBMSs already have.
> > >
> > Well to actually implement it, it would probably be a client command,
> > because that's what \d* are.
>
> Why should this command be silo'ed off to the psql client?  If it's a
> server command, it's available to all clients, not just psql.
>
> > We would most likely want them implemented the same, to avoid
> > needless complexity.
>
> We could certainly have \d call DESCRIBE for later versions of the
> server.  \ commands which call different SQL depending on server
> version have long been a standard practice.
>
> > I think people are more ok with \describe (with the backslash), which
> > seems
> > like what you're suggesting anyway.  I read Vik's "hard pass" as being on
> > having DESCRIBE which looks like an SQL command but would actually be
> > implemented on the client.  This seems simpler at first but could cause
> > deep confusion later.
>
> If we implement \d as DESCRIBE for server versions as of when DESCRIBE
> is actually implemented, we've got wins all around.
>
> Best,
> David.
> --
> David Fetter  http://fetter.org/
> Phone: +1 415 235 3778
>
> Remember to vote!
> Consider donating to Postgres: http://www.postgresql.org/about/donate
>
From e67e61ae789b09c98fe03378c819224d838c2f65 Mon Sep 17 00:00:00 2001
From: Corey Huinker 
Date: Fri, 25 Jan 2019 00:57:23 +
Subject: [PATCH] Add \describe commands to compliment \d commands

---
 doc/src/sgml/ref/psql-ref.sgml | 175 -
 src/bin/psql/command.c | 132 -
 src/bin/psql/describe.c|  13 ++-
 src/bin/psql/describe.h|   3 +
 src/bin/psql/tab-complete.c| 135 -
 5 files changed, 381 insertions(+), 77 deletions(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 6c76cf2f00..363d6d9678 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -871,6 +871,17 @@ testdb=
 same line.
 
 
+
+The family of meta-commands starting with \d often
+have a

Re: Referential Integrity Checks with Statement-level Triggers

2019-01-23 Thread Corey Huinker
Attached is a patch that refactors DELETE triggers to fire at the statement
level.

I chose delete triggers partly out of simplicity, and partly because there
some before/after row linkage in the ON UPDATE CASCADE cases where
statement level triggers might not be feasible as we have currently
implemented them.

After having done the work, I think INSERT triggers would be similarly
straightforward, but wanted to limit scope.

Also, after having stripped the delete cases out of the update-or-delete
functions, it became obvious that the on-update-set-null and
on-update-set-default cases differed by only 3-4 lines, so those functions
were combined.

On a vagrant VM running on my desktop machine, I'm seeing a speed-up of
about 25% in the benchmark provided. I think that figure is cloudy and
below my expectations. Perhaps we'd get a much better picture of whether or
not this is worth it on a bare metal machine, or at least a VM better
suited to benchmarking.

Currently 4 make-check tests are failing. Two of them appear to be false
positives (the tests make assumptions about triggers that are no longer
true), and the other two are outside the scope of this benchmark, so I'll
revisit them if we go forward.

ri-set-logic.sql is an edited benchmark script adapted from Kevin
Grittner's benchmark that he ran against hand-rolled triggers and posted on
2016-11-02
ri_test.out is a copy-paste of two runs of the benchmark script.

Many thanks to everyone who helped, often despite their own objections to
the overall reasoning behind the endeavor. I'm aware that a large
contingent of highly experienced people would very much like to replace our
entire trigger architecture, or at least divorce RI checks from triggers.
Maybe this patch spurs on that change. Even if nothing comes of it, it's
been a great learning experience.

On Sat, Dec 22, 2018 at 11:28 AM Emre Hasegeli  wrote:

> > It is far from a premature optimization IMO, it is super useful and
> something I was hoping would happen ever since I heard about transition
> tables being worked on.
>
> Me too.  Never-ending DELETEs are a common pain point especially for
> people migrated from MySQL which creates indexes for foreign keys
> automatically.
>
From 8a73f9233211076421a565b5c90ecd029b5e6581 Mon Sep 17 00:00:00 2001
From: vagrant 
Date: Wed, 23 Jan 2019 16:59:17 +
Subject: [PATCH] Change Delete RI triggers to Statement-Level Triggers

---
 src/backend/commands/tablecmds.c|   9 +-
 src/backend/commands/trigger.c  |   2 +
 src/backend/utils/adt/ri_triggers.c | 779 
 3 files changed, 566 insertions(+), 224 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 28a137bb53..21f5bf94a4 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8954,6 +8954,11 @@ createForeignKeyActionTriggers(Relation rel, Oid refRelOid, Constraint *fkconstr
 {
 	CreateTrigStmt *fk_trigger;
 
+	TriggerTransition *del = makeNode(TriggerTransition);
+	del->name = "pg_deleted_transition_table";
+	del->isNew = false;
+	del->isTable = true;
+
 	/*
 	 * Build and execute a CREATE CONSTRAINT TRIGGER statement for the ON
 	 * DELETE action on the referenced table.
@@ -8961,11 +8966,11 @@ createForeignKeyActionTriggers(Relation rel, Oid refRelOid, Constraint *fkconstr
 	fk_trigger = makeNode(CreateTrigStmt);
 	fk_trigger->trigname = "RI_ConstraintTrigger_a";
 	fk_trigger->relation = NULL;
-	fk_trigger->row = true;
+	fk_trigger->row = false;
 	fk_trigger->timing = TRIGGER_TYPE_AFTER;
 	fk_trigger->events = TRIGGER_TYPE_DELETE;
 	fk_trigger->columns = NIL;
-	fk_trigger->transitionRels = NIL;
+	fk_trigger->transitionRels = list_make1(del);
 	fk_trigger->whenClause = NULL;
 	fk_trigger->isconstraint = true;
 	fk_trigger->constrrel = NULL;
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 7ffaeaffc6..080587215f 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -510,7 +510,9 @@ CreateTrigger(CreateTrigStmt *stmt, const char *queryString,
 			 *
 			 * Currently this is enforced by the grammar, so just Assert here.
 			 */
+			/*
 			Assert(!stmt->isconstraint);
+			*/
 
 			if (tt->isNew)
 			{
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index e1aa3d0044..6f89ab4c77 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -194,9 +194,10 @@ static int	ri_constraint_cache_valid_count = 0;
 static bool ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
   HeapTuple old_row,
   const RI_ConstraintInfo *riinfo);
-static Datum ri_restrict(TriggerData *trigdata, bool is_no_action);
-static Datum ri_setnull(TriggerData *trigdata);
-static Datum ri_setdefault(TriggerData *trigdata);
+static Datum ri_on_update_restrict(TriggerData *trigdata, bool is_no_action);
+static Datum ri_on_delete_restrict(TriggerData *trigdata, bool is_no_action);
+static 

Re: Statement-level Triggers For Uniqueness Checks

2019-01-08 Thread Corey Huinker
On Fri, Jan 4, 2019 at 7:49 AM Peter Eisentraut
 wrote:
>
> On 25/12/2018 00:56, Corey Huinker wrote:
> > The regression diff (attached) seems to imply that the triggers simply
> > are not firing, though.
>
> The reason for this was explained by Dean.  If you take out the check
> that he mentioned, then your trigger fires but crashes.  In your changed
> unique_key_recheck(), "slot" is not initialized before use (or ever).

Thanks. I'll be revisiting this shortly. Dean's information made me
think the potential for a gain is smaller than initially imagined.



Re: Statement-level Triggers For Uniqueness Checks

2018-12-24 Thread Corey Huinker
So I took a first pass at this, and I got stuck.

The basic design is that instead of creating one row-level trigger per
deferrable unique constraint, we instead create one insert-statement level
trigger and one update-statement level trigger. Both call the function
unique_key_recheck(), which now attempts to walk the inserted transition
table, doing basically the same checks that were done in the per-row
trigger. I'm hoping for some performance advantage for large row
inserts/updates due to N-1 fewer triggers firing and N-1 attempts to lock
the unique index.

The regression diff (attached) seems to imply that the triggers simply are
not firing, though. I have verified that the triggers are created:

test=# CREATE TEMPORARY TABLE test ( x integer PRIMARY KEY DEFERRABLE
INITIALLY DEFERRED );

CREATE TABLE

test=# SELECT * FROM pg_trigger WHERE tgrelid = 'test'::regclass;

  oid  | tgrelid |            tgname            | tgfoid | tgtype | tgenabled | tgisinternal | tgconstrrelid | tgconstrindid | tgconstraint | tgdeferrable | tginitdeferred | tgnargs | tgattr | tgargs | tgqual | tgoldtable |          tgnewtable
-------+---------+------------------------------+--------+--------+-----------+--------------+---------------+---------------+--------------+--------------+----------------+---------+--------+--------+--------+------------+------------------------------
 16392 |   16387 | PK_ConstraintTrigger_i_16392 |   1250 |      4 | O         | t            |             0 |         16390 |        16391 | t            | t              |       0 |        | \x     |        |            | pg_inserted_transition_table
 16393 |   16387 | PK_ConstraintTrigger_u_16393 |   1250 |     16 | O         | t            |             0 |         16390 |        16391 | t            | t              |       0 |        | \x     |        |            | pg_inserted_transition_table
(2 rows)

Any idea where I went wrong?

On Mon, Dec 17, 2018 at 9:56 AM Corey Huinker 
wrote:

> In digging around the codebase (see thread: Referential Integrity Checks
> with Statement-level Triggers), I noticed that unique constraints are
> similarly enforced with a per-row trigger.
>
> The situation with unique indexes is fairly similar to the situation with
> RI checks: there is some overhead to using a transition table, but that
> overhead may be less than the cost of firing a trigger once per row
> inserted/updated.
>
> However, there are some significant differences (apologies to everyone
> already familiar with this part of the code, it's new to me).
>
> For one, there is no analog to RI_Initial_Check(). Instead the constraint
> is initially checked via building/finding the unique index that would
> enforce the uniqueness check.
>
> Then, the actual lookup done in unique_key_recheck has to contend with the
> intricacies of HOT updates, so I don't know if that can be expressed in an
> SPI query. Even if not, I think it should be possible to iterate over
> the EphemeralNamedRelation, and that by itself would have a payoff in
> reduced trigger calls.
>
> I'm going to be working on this as a POC patch separate from the RI work,
> hence the separate thread, but there's obviously a lot of overlap.
>
> All advice is appreciated.
>
>


regression.diffs
Description: Binary data


0001-Refactor-per-row-unique-key-deferred-constraint-trig.patch
Description: Binary data


Statement-level Triggers For Uniqueness Checks

2018-12-17 Thread Corey Huinker
In digging around the codebase (see thread: Referential Integrity Checks
with Statement-level Triggers), I noticed that unique constraints are
similarly enforced with a per-row trigger.

The situation with unique indexes is fairly similar to the situation with
RI checks: there is some overhead to using a transition table, but that
overhead may be less than the cost of firing a trigger once per row
inserted/updated.

However, there are some significant differences (apologies to everyone
already familiar with this part of the code, it's new to me).

For one, there is no analog to RI_Initial_Check(). Instead the constraint
is initially checked via building/finding the unique index that would
enforce the uniqueness check.

Then, the actual lookup done in unique_key_recheck has to contend with the
intricacies of HOT updates, so I don't know if that can be expressed in an
SPI query. Even if not, I think it should be possible to iterate over
the EphemeralNamedRelation, and that by itself would have a payoff in
reduced trigger calls.

I'm going to be working on this as a POC patch separate from the RI work,
hence the separate thread, but there's obviously a lot of overlap.

All advice is appreciated.
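
To make the shape of this concrete, the per-statement recheck could boil
down to a query like the following sketch. All names here are invented
(new_rows is the inserted transition table, target_table/key the
constrained table and column), and it ignores the HOT-update intricacies
mentioned above:

-- Any row returned indicates a violation: an inserted key that matches
-- more than one row in the table (itself plus a duplicate).
SELECT n.key
FROM new_rows AS n
JOIN target_table AS t ON t.key = n.key
GROUP BY n.key
HAVING count(*) > 1
LIMIT 1;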


Referential Integrity Checks with Statement-level Triggers

2018-12-17 Thread Corey Huinker
Back when Pg added statement-level triggers, I was interested in the
potential promise of moving referential integrity checks to statement-level
triggers.

The initial conversation included Kevin Grittner's POC script (in SQL),
which showed a potential 98% reduction in time spent doing RI checks.
The original thread is here:

https://www.postgresql.org/message-id/CACjxUsM4s9%3DCUmPU4YFOYiD5f%3D2ULVDBjuFSo20Twe7KbUe8Mw%40mail.gmail.com

I dug around in the code, and was rather surprised at how close we already
are to implementing this. The function RI_Initial_Check() already does a
left-join query via SPI to look for any invalid data, so if we could just
replace the near table with the transition table for inserted rows, we'd be
home free. The function SPI_register_trigger_data() makes the transition
tables visible to SPI, so I started to wonder why this hadn't been done
already.

I approached Kevin and Thomas Munro seeking feedback on my approach. I also
made it into a session at the PgConf.ASIA un-conference, and then later
with Michael Paquier at that same conference, and the coalesced feedback
was this:

- the overhead of registering the transition tables probably makes it
unprofitable for single row inserts
- the single row overhead is itself significant, so maybe the transition
tables aren't worse
- there has been talk of replacing transition tables with an in-memory data
structure that would be closer to "free" from a startup perspective and
might even coalesce the transition tables of multiple statements in the
same transaction
- because there are no declarative code changes, it's trivial to switch from
row-level to statement-level triggering via pg_upgrade
- assuming that transition tables are an overhead that only pays off when >
N rows have been updated, does it make sense to enforce RI with something
that isn't actually a trigger?
- there was also some mention that parallel query uses a queue mechanism
that might be leveraged to do row-level triggers for updates of <= N rows
and statement level for > N

That's what I have so far. I'm going to be working on a POC patch so that I
can benchmark a pure-statement-level solution, which if nothing else will
let us know the approximate value of N.

All suggestions are appreciated.
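
To make that concrete: the query RI_Initial_Check() runs today is
essentially a left join over the whole FK table looking for unmatched keys;
the statement-level version would substitute the transition table of
inserted rows. A sketch with invented names (new_rows for the transition
table, single-column keys):

SELECT n.fk_col
FROM new_rows AS n
LEFT OUTER JOIN pk_table AS p ON p.pk_col = n.fk_col
WHERE n.fk_col IS NOT NULL
  AND p.pk_col IS NULL
LIMIT 1;  -- any row returned means a violating reference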


Re: csv format for psql

2018-11-25 Thread Corey Huinker
On Sun, Nov 25, 2018 at 11:23 PM Tom Lane  wrote:

> Corey Huinker  writes:
> > Could we have another output type called "separated" that uses the
> existing
> > --fieldsep / --recordsep?
>
> Uh, what's the difference from the existing unaligned format?
>

No footer, and we'd want to escape instances of fieldsep and recordsep in
the data. Given an option to escape those, unaligned would work fine.
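
For example, with unaligned output today there is no escaping at all, so a
value containing the separator is indistinguishable from two values:

postgres=# \pset format unaligned
Output format is unaligned.
postgres=# \pset fieldsep ','
Field separator is ",".
postgres=# SELECT 'a,b' AS one, 'c' AS two;
one,two
a,b,c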


Re: csv format for psql

2018-11-25 Thread Corey Huinker
>
>
> Or we could kill both issues by hard-wiring the separator as ','.


+1

I've never encountered a situation where a customer wanted a custom
delimiter AND quoted strings. So either they wanted pure CSV or a custom
TSV.

Could we have another output type called "separated" that uses the existing
--fieldsep / --recordsep? Word will get out that csv is faster, but we'd
still have the flexibility if somebody really wanted it.


Re: Desirability of client-side expressions in psql?

2018-11-24 Thread Corey Huinker
>
> >>psql> \if :i >= 5
> >>
> > I think we're ok with that so long as none of the operators or values
> > has a
> > \ in it.
> > What barriers do you see to re-using the pgbench grammar?
>
> The pgbench expression grammar mimics SQL expression grammar,
> on integers, floats, booleans & NULL.
>
> I'm unsure about some special cases in psql (`shell command`,
> 'text' "identifier"). They can be forbidden on a new commande (\let),
> but what happens on "\if ..." which I am afraid allows them is unclear.
>
> --
> Fabien.
>

(raising this thread from hibernation now that I have the bandwidth)

It seems like the big barriers to just using pgbench syntax are:
  - the ability to indicate that the next thing to follow will be a pgbench
expression
  - a way to coax pgbench truth-y values into psql truthy values (t/f, y/n,
1/0)

For that, I see a few ways forward:

1. A suffix (-exp, or even just -x) on \if, \elif, and other existing
commands to indicate that a pgbench expression would follow.
This would look something like
\ifx \elifx \setx
\if$ \elif$ \set$

2. A command-line-esque switch or other sigil to indicate that what follows
is a pgbench expression with psql vars to interpolate
Example:
\set foo -x 1 + 4
\set foo \expr 1 + 4
\if -x :limit > 10
\if \expr :limit > 10

3. A global toggle to indicate which mode should be used by \if, \elif, and
\set
Example:
 \pset expressions [on | off]

4. A combination of #2 and #3, with a corresponding switch/sigil to indicate
"do not evaluate pgbench-style".
   This is particularly appealing to me because it would allow code
snippets from pgbench to be used without modification, while still allowing
the user to mix old and new style in an existing script.

5. A special variant of `command` where variables are interpolated before
being sent to the OS, and allow that on \if, \elif
\set foo ``expr :y + :z``
\set foo $( expr :y + :z )
\if ``expr :limit > 10``
\if $( expr :limit > 10 )

This also has some appeal because it allows for a great amount of
flexibility, but obviously constrains us with OS-dependencies. The user
might have a hard time sending commands with ')' in them if we go the $( )
route

6. Option #5, but we add an additional executable (suggested name: pgexpr)
to the client libs, which encapsulates the pgbench expression library as a
way around OS-dependent code.

7. I believe someone suggested introducing the :{! pgbench-command} or :{{
pgbench-command }} var-mode
\set foo :{! :y + :z }
\set foo :{{ :y + :z }}
\if :{! :limit > 10 }
\if :{{ :limit > 10 }}

This has some appeal as well, though I prefer the {{...}}  syntax
because "!" looks like negation, and {{ resembles the [[ x + y ]] syntax in
bash

One nice thing is that most of these options are not mutually exclusive.

Thoughts?


Re: [HACKERS] generated columns

2018-11-15 Thread Corey Huinker
>
> > 3. Radical alternative: Collapse everything into one new column.  We
> > could combine atthasdef and attgenerated and even attidentity into a new
> > column.  (Only one of the three can be the case.)  This would give
> > client code a clean break, which may or may not be good.  The
> > implementation would be uglier than #1 but probably cleaner than #2.  We
> > could also get 4 bytes back per pg_attribute row.
> >
> > I'm happy with the current choice #1, but it's worth thinking about.
>
> #3 looks very appealing in my opinion as those columns have no overlap,
> so it would take five possible values:
>

Could the removed columns live on...as generated-always columns?


Re: partitioned tables referenced by FKs

2018-11-05 Thread Corey Huinker
>
>
> > 1. it seems that we will continue to do per-row RI checks for inserts and
> > updates. However, there already exists a bulk check in
> RI_Initial_Check().
> > Could we modify this bulk check to do RI checks on a per-statement basis
> > rather than a per-row basis?
>
> One of the goals when implementing trigger transition tables was to
> supplant the current per-row implementation of RI triggers with
> per-statement.  I haven't done that, but AFAIK it remains possible :-)
>
> Changing that is definitely not a goal of this patch.
>

Then I may try to tackle it myself in a separate thread.

Without an implementation, I can't say, but if I had to guess, I would
> assume so.  Or maybe there are clever optimizations for that particular
> case.
>

But in this case there is no actual defined trigger; it's internal code
making an SPI call. Is there an indicator that tells us whether this change
was multi-row or not?


Re: partitioned tables referenced by FKs

2018-11-04 Thread Corey Huinker
On Fri, Nov 2, 2018 at 7:42 PM Alvaro Herrera 
wrote:

> Here's a patch to allow partitioned tables to be referenced by foreign
> keys.  Current state is WIP, but everything should work; see below for
> the expected exception.
>
> The design is very simple: have one pg_constraint row for each partition
> on each side, each row pointing to the topmost table on the other side;
> triggers appear on each leaf partition (and naturally they don't appear
> on any intermediate partitioned table).
>

This is an important and much needed feature!

Based on my extremely naive reading of this code, I have two perhaps
equally naive questions:

1. it seems that we will continue to do per-row RI checks for inserts and
updates. However, there already exists a bulk check in RI_Initial_Check().
Could we modify this bulk check to do RI checks on a per-statement basis
rather than a per-row basis?

2. If #1 is possible, is the overhead of transition tables too great for
the single-row case?
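
For anyone skimming the thread, a minimal example of what the patch enables
(all names invented); before this patch the REFERENCES clause below is
rejected because the referenced table is partitioned:

CREATE TABLE accounts (id int PRIMARY KEY) PARTITION BY HASH (id);
CREATE TABLE accounts_p0 PARTITION OF accounts
    FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE accounts_p1 PARTITION OF accounts
    FOR VALUES WITH (MODULUS 2, REMAINDER 1);

CREATE TABLE payments (
    payment_id int PRIMARY KEY,
    account_id int NOT NULL REFERENCES accounts (id)
);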


Re: COPY FROM WHEN condition

2018-11-02 Thread Corey Huinker
>
>
> > SELECT x.a, sum(x.b)
> > FROM ( COPY INLINE '/path/to/foo.txt' FORMAT CSV ) as x( a integer, b
> numeric, c text, d date, e json)
>
> Apologies for bike-shedding, but wouldn't the following be a better
> fit with the current COPY?
>
> COPY t(a integer, b numeric, c text, d date, e json) FROM
> '/path/to/foo.txt' WITH (FORMAT CSV, INLINE)
>

+1 Very much a better fit.

>
>


Re: COPY FROM WHEN condition

2018-11-01 Thread Corey Huinker
>
> > Are you thinking something like having a COPY command that provides
> > results in such a way that they could be referenced in a FROM clause
> > (perhaps a COPY that defines a cursor…)?
>
> That would also be nice, but what I was thinking of was that some
> highly restricted subset of cases of SQL in general could lend
> themselves to levels of optimization that would be impractical in
> other contexts.
>

If COPY (or a syntactical equivalent) can return a result set, then the
whole of SQL is available to filter and aggregate the results and we don't
have to invent new syntax, or endure confusion when COPY-WHEN syntax behaves
subtly differently from a similar FROM-WHERE.

Also, what would we be saving computationally? The whole file (or program
output) has to be consumed no matter what, the columns have to be parsed no
matter what. At least some of the columns have to be converted to their
assigned datatypes enough to know whether or not to filter the row, but we
might be able to push that logic inside a copy. I'm thinking of something like
this:

SELECT x.a, sum(x.b)
FROM ( COPY INLINE '/path/to/foo.txt' FORMAT CSV )
     AS x(a integer, b numeric, c text, d date, e json)
WHERE x.d >= '2018-11-01'
GROUP BY x.a


In this case, there is the *opportunity* to see the following optimizations:
- columns c and e are never referenced, and need never be turned into a
datum (though we might do so just to confirm that they conform to the data
type)
- if column d is converted first, we can filter on it and avoid converting
columns a,b
- whatever optimizations we can infer from knowing that the two surviving
columns will go directly into an aggregate

If we go this route, we can train the planner to notice other optimizations
and add those mechanisms at that time, and then existing code gets faster.

If we go the COPY-WHEN route, then we have to make up new syntax for every
possible future optimization.


Re: date_trunc() in a specific time zone

2018-10-29 Thread Corey Huinker
>
> >> A use case that I see quite a lot of is needing to do reports and other
> >> calculations on data per day/hour/etc but in the user's time zone.  The
> >> way to do that is fairly trivial, but it's not obvious what it does so
> >> reading queries becomes just a little bit more difficult.
>
>
+1 A client encountered this exact problem last week, and I was surprised
that the parameter didn't already exist.
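
For reference, the workaround in question is the usual shift-then-truncate
idiom (table, column, and zone names here are examples):

-- Count events per day in the user's time zone.
SELECT date_trunc('day', created_at AT TIME ZONE 'America/New_York') AS day_local,
       count(*)
FROM events
GROUP BY 1
ORDER BY 1;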


Re: CopyFrom() has become way too complicated

2018-10-15 Thread Corey Huinker
>
> I think the code needs to be split up so that CopyFrom() in the loop
> body calls CopyFromOneTuple(), which then also splits out the tuple
> routing into its own CopyFromOneTupleRoute() function (that's 200 LOC on
> its own...). I suspect it'd also be good to refactor the
> partition-change code out into its own function.
>

+1
I had a hard time with this when doing my copy_srf() misadventure.


Re: COPY FROM WHEN condition

2018-10-11 Thread Corey Huinker
On Thu, Oct 11, 2018 at 5:04 AM Surafel Temesgen 
wrote:

>
>
> On Thu, Oct 11, 2018 at 12:00 PM Christoph Moench-Tegeder <
> c...@burggraben.net> wrote:
>
>> You can:
>>   COPY ( query ) TO 'filename';
>>
> it is for COPY FROM
>
> regards
> Surafel
>

It didn't get far, but you may want to take a look at a rejected patch for
copy_srf() (set returning function)
https://www.postgresql.org/message-id/CADkLM%3DdoeiWQX4AGtDNG4PsWfSXz3ai7kY%3DPZm3sUhsUeev9Bg%40mail.gmail.com
https://commitfest.postgresql.org/12/869/

Having a set returning function gives you the full expressiveness of SQL,
at the cost of an extra materialization step.


Re: Why we allow CHECK constraint contradiction?

2018-10-09 Thread Corey Huinker
On Wed, Oct 10, 2018 at 1:44 AM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Tuesday, October 9, 2018, Imai, Yoshikazu <
> imai.yoshik...@jp.fujitsu.com> wrote:
>>
>> Are there any rows which can satisfy the ct's CHECK constraint? If not,
>> why we
>> allow creating table when check constraint itself is contradicted?
>>
>
> I'd bet on it being a combination of complexity and insufficient expected
> benefit.  Time is better spent elsewhere.  Mathmatically proving a
> contradiction in software is harder than reasoning about it mentally.
>

I've actually used that as a feature, in postgresql and other databases,
where assertions were unavailable, or procedural code was unavailable or
against policy.

Consider the following:

CREATE TABLE wanted_values ( x integer );
INSERT INTO wanted_values VALUES (1), (2), (3);

CREATE TABLE found_values ( x integer );
INSERT INTO found_values VALUES (1), (3);

CREATE TABLE missing_values (
    x integer,
    CONSTRAINT contradiction CHECK (false)
);

INSERT INTO missing_values
SELECT x FROM wanted_values
EXCEPT
SELECT x FROM found_values;

gives the error

ERROR:  new row for relation "missing_values" violates check constraint
"contradiction"
DETAIL:  Failing row contains (2).


Which can be handy when you need to fail a transaction because of bad data
and don't have branching logic available.


Re: CREATE ROUTINE MAPPING

2018-09-10 Thread Corey Huinker
On Mon, Sep 10, 2018 at 3:28 AM Hannu Krosing 
wrote:

> Hi Corey
>
> Have you looked at pl/proxy ?
>

I have, a long while ago.


> It does this and then some (sharding)
>

PL/proxy isn't a part of the SQL Standard.
PL/proxy only connects to other libpq-speaking databases.
The hope with routine mapping is that other data sources that do not easily
conform to a rows-and-columns metaphor can still expose their data to
PostgreSQL.
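
For reference, the rough shape the SQL/MED standard gives the command is
below; the routine, server, and option names are all invented here, and the
option list would be FDW-specific:

CREATE ROUTINE MAPPING top_sellers_map
    FOR FUNCTION top_sellers(integer)
    SERVER analytics_server
    OPTIONS (endpoint '/reports/top_sellers');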


Re: Alter index rename concurrently to

2018-07-25 Thread Corey Huinker
>
> You appear to be saying that you think that renaming an index
> concurrently is not safe.  In that case, this patch should be rejected.
> However, I don't think it necessarily is unsafe.  What we need is some
> reasoning about the impact, not a bunch of different options that we
> don't understand.
>

I've had this same need, and dreamed this same solution before. I also
thought about a syntax like ALTER INDEX foo RENAME TO
WHATEVER-IT-WOULD-HAVE-BEEN-NAMED-BY-DEFAULT to aid this situation.

But all of those needs fade if we have REINDEX CONCURRENTLY. I think that's
where we should focus our efforts.

A possible side effort would be something like VACUUM FULL CONCURRENTLY,
which would essentially do what pg_repack does but keep the same oid and
the stats that go with it; even that's a nice-to-have add-on to REINDEX
CONCURRENTLY.


Re: Desirability of client-side expressions in psql?

2018-07-10 Thread Corey Huinker
>
>
>psql> \if :i >= 5
>
>
I think we're ok with that so long as none of the operators or values has a
\ in it.
What barriers do you see to re-using the pgbench grammar?


Re: Query Rewrite for Materialized Views (Postgres Extension)

2018-06-18 Thread Corey Huinker
>
> Hope it is useful or interesting for someone! Questions or comments are
>> very welcome.
>>
>
> good idea.
>
> Regards
>
> Pavel
>

In a recent PgConf NYC presentation [1] I was talking about the technical
hurdles to implementing materialized views that could be kept up to date at
all times, and the benefits of having such a thing.

Some use cases can be addressed with eventually-consistent derivative table
structures (Vertica's projections, PipelineDB's continuous views, etc), but
those methods rely on the source data never having deletes or updates, or
confining those updates to the "hot" part of the source tables, so it
generally works for time-series data, but not for other cases.

It has occurred to me that Dave Fetter's work on ASSERTIONS [2] has common
underpinnings with true continuous materialized views. In both cases, the
creation of a system object causes the creation of insert/update/delete
triggers on one or more existing tables. In the case of assertions, those
triggers are run with the goal of raising an error if rows are returned
from a query. In the case of a materialized view, those same triggers would
be used to delete rows from a CMV and insert replacement rows.

If we can get always-up-to-date materialized views, then Denty's work on
query rewrite would have greatly enhanced utility.

[1]
https://postgresconf.org/conferences/2018/program/proposals/a-roadmap-to-continuous-materialized-views-b4644661-8d5a-4186-8c17-4fb82600e147
[2]
http://databasedoings.blogspot.com/2018/06/ive-posted-my-slides-for-my-asssertions.html
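
To make the trigger analogy concrete, here is a heavily simplified sketch
(insert path only, all names invented) of the kind of statement-level
trigger a continuous materialized view could generate behind the scenes; a
real implementation would derive this from the view's query rather than
have it hand-written:

CREATE TABLE sales (region text, amount numeric);
CREATE TABLE sales_by_region (region text PRIMARY KEY, total numeric);

CREATE FUNCTION sales_by_region_refresh() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- Recompute only the regions touched by this statement.
    DELETE FROM sales_by_region r
    USING (SELECT DISTINCT region FROM new_rows) n
    WHERE r.region = n.region;

    INSERT INTO sales_by_region (region, total)
    SELECT s.region, sum(s.amount)
    FROM sales s
    WHERE s.region IN (SELECT region FROM new_rows)
    GROUP BY s.region;

    RETURN NULL;
END;
$$;

CREATE TRIGGER sales_by_region_ins
    AFTER INSERT ON sales
    REFERENCING NEW TABLE AS new_rows
    FOR EACH STATEMENT
    EXECUTE PROCEDURE sales_by_region_refresh();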


Re: [RFC] Add an until-0 loop in psql

2018-04-30 Thread Corey Huinker
On Mon, Apr 30, 2018 at 7:05 AM Pierre Ducroquet <
pierre.ducroq...@people-doc.com> wrote:

> On Monday, April 30, 2018 1:01:25 PM CEST Daniel Verite wrote:
> >   Corey Huinker wrote:
> > > As of v11, DO blocks can do transactions. I think this will meet your
> > > needs.
> > They do support COMMIT and ROLLBACK in the current
> > development tree, but not VACUUM as in Pierre's example.
> >
> > postgres=# \echo :SERVER_VERSION_NAME
> > 11devel
> >
> > postgres=# do ' begin vacuum; end ';
> > ERROR:  VACUUM cannot be executed from a function
> > CONTEXT:  SQL statement "vacuum"
> > PL/pgSQL function inline_code_block line 1 at SQL statement
> >
> >
> > Best regards,
>
> Indeed, vacuum is going to be the biggest offender here, sadly.
> One could work around this of course (on top of my head, using notify to
> wake up another client that would launch the required vacuums…)
> Being able to do transactions in DO blocks is a great new feature of v11 I
> was
> not aware of. But psql saw the addition of \if recently, so why not having
> loops in there too ? (Something better than this hack of course, it was
> just a
> 10 minutes hack-sprint for a demo)
>
> Regards
>
>  Pierre
>

Bummer about vacuum.

If you dig into the very long discussion about \if (which, incidentally,
started off as a 20-line command patch called \quit-if, so don't discount
that your idea could take off), you'll see some of the problems with
looping discussed, mostly about the issues I already alluded to (no concept
of reading backwards on STDIN, scoping outside the current "file", ability
of psql vars to contain executable \commands). Once you've read it, you'll
have a pretty good grasp of the places where psql would need changes.

In the meantime, if you believe the table won't get much larger during the
operation, you could use \gexec as a finite loop iterator:

SELECT count(*)::bigint / 1000 AS num_iters FROM big_table
\gset
SELECT
    'BEGIN',
    'DELETE FROM big_table WHERE id IN (SELECT id FROM big_table WHERE bad = true LIMIT 1000)',
    'COMMIT',
    'VACUUM big_table'
FROM generate_series(1, :num_iters) g
\gexec


If the number of rows increases, then your finite loop will fall short, and
if something else deletes a bunch of rows, your loop will spin its wheels
a few times at the end, but it would do most of what you want.

