Re: Fwd: Emit namespace in post-copy output
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           not tested
Documentation:            not tested

Passed make check-world. Running make installcheck-world had 2 errors out of 209, but I got those same 2 errors on a clean branch. The feature is as described, and very simple. As far as I can tell, there is no external specification for VACUUM or any related utility. I searched the documentation and found several examples of invoking the VACUUM FULL command and the vacuumdb utility, but at no point was sample output shown, so this change will not require updating documentation.

The new status of this patch is: Ready for Committer
Re: Grammar railroad diagram
> Another way that I tested and it's working is to use
> https://www.bottlecaps.de/convert/ paste the postgresql grammar there
> and press "convert" and after press "view diagram".

I tried this out and I'm pleased to see that one of the outputs is XHTML + SVG, because SVGs have hover-over tooltips, which are an important aspect of accessibility, which was my major concern the last time a similar thing was proposed [1].

[1] https://www.postgresql.org/message-id/cah2-wzmfc+p3pc_u1dsgm3lawurzkx5pqzmxtglgsxbf8gf...@mail.gmail.com
Re: Emit namespace in post-copy output
On Tue, Jun 22, 2021 at 6:08 PM Mike wrote:
> When running a VACUUM or CLUSTER command, the namespace name is not part
> of the emitted message.
>
> Using the `vacuumdb` CLI tool recently with multiple jobs, I found it
> harder to match the relations in the output messages with their
> namespaces.
>
> Example:
>
> INFO: vacuuming "sendgrid.open"
> INFO: vacuuming "mailgun.open"
> ...
> INFO: "open": found 0 removable, 31460776 nonremovable row versions in 1358656 pages
> DETAIL: 0 dead row versions cannot be removed yet.
> CPU 31.35s/261.26u sec elapsed 1620.68 sec.
> ...
>
> In this example, the user can't readily tell which `open` relation was
> completed.
>
> Attached is a patch using existing functions to include the namespace in
> the output string.
>
> Looking forward to feedback!
> -Mike Fiedler

I've added this to the open commitfest: https://commitfest.postgresql.org/33/3200/

The change is quite simple, just 3 lines, adding the schema name to two different lines of output. As such, there is no obvious documentation to change, though I can imagine that we have sample output from vacuum, vacuumdb or cluster somewhere that would need to be updated.

I cobbled together a very simple test:

~/pgdata$ /usr/local/pgsql/bin/psql postgres
psql (14beta2)
Type "help" for help.

postgres=# create database mike_test;
CREATE DATABASE
postgres=# \c mike_test
You are now connected to database "mike_test" as user "corey".
mike_test=# create schema foo;
CREATE SCHEMA
mike_test=# create table foo.bar(x integer);
CREATE TABLE
mike_test=# VACUUM FULL VERBOSE foo.bar;
INFO: vacuuming "foo.bar"
INFO: "foo.bar": found 0 removable, 0 nonremovable row versions in 0 pages
DETAIL: 0 dead row versions cannot be removed yet.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
VACUUM

And of course vacuumdb:

~/pgdata$ /usr/local/pgsql/bin/vacuumdb --full --verbose mike_test --table=foo.bar
vacuumdb: vacuuming database "mike_test"
INFO: vacuuming "foo.bar"
INFO: "foo.bar": found 0 removable, 0 nonremovable row versions in 0 pages
DETAIL: 0 dead row versions cannot be removed yet.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.

So far, so good.
Re: test runner (was Re: SQL-standard function body)
> > This is nice. Are there any parallelism capabilities?
>
> Yes. It defaults to number-of-cores processes, but obviously can also be
> specified explicitly. One very nice part about it is that it'd work
> largely the same on Windows (which has practically unusable testing
> right now). It probably doesn't yet, because I just tried to get it to
> build and run tests at all, but it shouldn't be a lot of additional
> work.

The pidgin developers speak very highly of meson, for the same reasons already mentioned in this thread.
A Case For Inlining Immediate Referential Integrity Checks
A Case For Inlining Immediate Referential Integrity Checks
--

The following is an overview of how Postgres currently implements referential integrity, some problems with that architecture, attempted solutions for those problems, and a suggestion of another possible solution.

Notes On Notation and Referential Integrity In General
--

All referential integrity is ultimately of this form:

    R(X) => T(Y)

where a referencing table R has a set of columns X that references a set of columns Y which comprise a unique constraint on a target table T. Note that the Y-set of columns is usually the primary key of T, but does not have to be.

The basic referential integrity checks fall into two basic categories, Insert and Update/Delete, which can be checked Immediately following the statement, or can be Deferred to the end of the transaction.

The Insert check is fairly straightforward. Any insert to R, or update of R that modifies [1] any column in X, is checked to see if all of the X columns are NOT NULL, and if so, a lookup is done on T to find a row matching the tuple of Y. If none is found, then an error is raised.

The Update/Delete check is more complicated, as it covers any UPDATE operation that modifies [1] any column in Y where all of the values of Y are NOT NULL, as well as any DELETE operation where all of the columns of Y are NOT NULL. For any such check, the table R is scanned for any X tuples matching the previous values of Y, and for any matches found, an action is taken. That action can be to fail the operation (NO ACTION, RESTRICT), update the X values to fixed values (SET NULL, SET DEFAULT), or to delete those rows in R (CASCADE).

Current Implementation
--

Currently, these operations are handled via per-row triggers. In the general case, one trigger is placed on R for INSERT operations, one trigger is placed on T for DELETE operations, and an additional trigger is placed on T for UPDATE operations that affect any column of Y.
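To make the two categories of checks above concrete, here is a toy demonstration. SQLite is used purely because it is a handy in-process SQL engine; the semantics shown (the insert-side lookup, NULL exemption, and an ON DELETE CASCADE action) are the standard ones being described, not anything PostgreSQL-specific.

```python
import sqlite3

# R(X) => T(Y): table r references table t's unique column y.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite disables RI by default
conn.execute("CREATE TABLE t (y INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE r (x INTEGER REFERENCES t(y) ON DELETE CASCADE)")
conn.execute("INSERT INTO t VALUES (1)")

# Insert check: X must match some Y in T; NULLs are exempt.
conn.execute("INSERT INTO r VALUES (1)")      # ok, matches t.y = 1
conn.execute("INSERT INTO r VALUES (NULL)")   # ok, NULL skips the check
try:
    conn.execute("INSERT INTO r VALUES (2)")  # no matching Y: error
except sqlite3.IntegrityError as e:
    print("insert check failed:", e)

# Delete check with CASCADE action: matching rows in R are deleted too.
conn.execute("DELETE FROM t WHERE y = 1")
print(conn.execute("SELECT count(*) FROM r WHERE x = 1").fetchone()[0])  # 0
```

After the cascade, only the NULL row survives in r, since NULL X values never participate in the constraint.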
These Insert trigger functions invoke the C function RI_FKey_check() [2]. The trigger is fired unconditionally, and the trigger itself determines whether there is a referential integrity check to be made or not. Ultimately this trigger invokes an SPI query of the form:

    SELECT 1 FROM <pktable> WHERE (<pk attrs> = <values>) FOR KEY SHARE

This query is generally quite straightforward to the planner, as it becomes either a scan of a single unique index, or a partition search followed by a scan of a single unique index. The operation succeeds if a row is found, and fails if it does not.

The Update trigger functions are implemented with a set of C functions RI_[noaction|restrict|cascade|setnull|setdefault]_[upd|del]() [3]. These functions each generate a variation of an SPI query in one of the following forms:

    cascade:            DELETE FROM <fktable> WHERE <fk attrs match>
    restrict/noaction:  SELECT 1 FROM <fktable> WHERE <fk attrs match> FOR KEY SHARE
    setnull:            UPDATE <fktable> SET x1 = NULL, ... WHERE <fk attrs match>
    setdefault:         UPDATE <fktable> SET x1 = DEFAULT, ... WHERE <fk attrs match>

These triggers are either executed at statement time (Immediate) or are queued for execution as a part of the transaction commit (Deferred).

Problems With The Current Implementation
--

The main problems with this architecture come down to visibility and performance.

The foremost problem with this implementation is that these extra queries are not visible to the end user in any way. It is possible to infer that the functions executed by looking at the constraint definitions and comparing pg_stat_user_tables or pg_stat_user_indexes before and after the operation, but in general the time spent in these functions accrues to the DML statement (Immediate) or COMMIT statement (Deferred) without any insight into what took place. This is especially vexing in situations where an operation as simple as "DELETE FROM highly_referenced_table WHERE id = 1" hits the primary key index, but takes several seconds to run.

The performance of Insert operations is generally not too bad, in that the query boils down to an Index Scan for a single row.
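The per-action query shapes above can be sketched as a simple template generator. This is a rough illustration (not PostgreSQL source); the function name, placeholder style, and `$n` parameter numbering are purely illustrative:

```python
# Map each referential action to the shape of query run against the
# referencing table, as described above. Illustrative only.
def ri_query_for_action(action, fk_table, fk_cols):
    match_clause = " AND ".join(f"{c} = ${i + 1}" for i, c in enumerate(fk_cols))
    templates = {
        "cascade":    f"DELETE FROM {fk_table} WHERE {match_clause}",
        "restrict":   f"SELECT 1 FROM {fk_table} WHERE {match_clause} FOR KEY SHARE",
        "noaction":   f"SELECT 1 FROM {fk_table} WHERE {match_clause} FOR KEY SHARE",
        "setnull":    f"UPDATE {fk_table} SET "
                      + ", ".join(f"{c} = NULL" for c in fk_cols)
                      + f" WHERE {match_clause}",
        "setdefault": f"UPDATE {fk_table} SET "
                      + ", ".join(f"{c} = DEFAULT" for c in fk_cols)
                      + f" WHERE {match_clause}",
    }
    return templates[action]

print(ri_query_for_action("cascade", "r", ["x1", "x2"]))
# DELETE FROM r WHERE x1 = $1 AND x2 = $2
```

The point of showing this is that every one of these is a parameterized statement keyed by (constraint, action), which is exactly what gets planned once and cached per constraint.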
The problem, however, is that this query must be executed for every row inserted. The query itself is only planned once, and that query plan is cached for later re-use. That removes some of the query overhead, but also incurs a growing cache of plans which can create memory pressure if the number of foreign keys is large, and indeed this has become a problem for at least one customer [4]. Some profiling of the RI check indicated that about half of the time of the insert was spent in SPI functions that could be bypassed if the C function called index_beginscan and index_rescan directly [5]. And these indications bore out when Amit Langote wrote a patch [6] which finds the designated index from the constraint (with some drilling through partitions if need be) and then invokes the scan functions. This method showed about a halving of the
Re: simplifying foreign key/RI checks
> > It seems to me 1 (RI_PLAN_CHECK_LOOKUPPK) is still alive. (Yeah, I
> > know that doesn't mean the usefulness of the macro but the mechanism
> > the macro suggests, but it is confusing.) On the other hand,
> > RI_PLAN_CHECK_LOOKUPPK_FROM_PK and RI_PLAN_LAST_ON_PK seem to be no
> > longer used. (Couldn't we remove them?)
>
> Yeah, better to just remove those _PK macros and say this module no
> longer runs any queries on the PK table.
>
> How about the attached?

Sorry for the delay. I see that the changes were made as described. Passes make check and make check-world yet again. I'm marking this Ready For Committer unless someone objects.
Re: parse_slash_copy doesn't support psql variables substitution
On Wed, Feb 10, 2021 at 8:33 AM Pavel Stehule wrote:
> Hi
>
> Is there some reason why \copy statement (parse_slash_copy parser) doesn't
> support psql variables?
>
> Regards
>
> Pavel

I remember wondering about that when I was working on the \if stuff. I dug into it a bit, but the problem was out of scope for my goals. The additional options recently added to \g reduced my need for \copy, and it seemed like there was some effort to have input pipes as well, which would eliminate the need for \copy altogether.
Re: simplifying foreign key/RI checks
On Sun, Jan 24, 2021 at 6:51 AM Amit Langote wrote:
> On Sun, Jan 24, 2021 at 11:26 AM Corey Huinker wrote:
> > On Sat, Jan 23, 2021 at 12:52 PM Zhihong Yu wrote:
> >>
> >> Hi,
>
> Thanks for the review.
>
> >> +        for (i = 0; i < riinfo->nkeys; i++)
> >> +        {
> >> +            Oid         eq_opr = eq_oprs[i];
> >> +            Oid         typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
> >> +            RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
> >> +
> >> +            if (pk_nulls[i] != 'n' && OidIsValid(entry->cast_func_finfo.fn_oid))
> >>
> >> It seems the pk_nulls[i] != 'n' check can be lifted ahead of the
> >> assignment to the three local variables. That way, ri_HashCompareOp
> >> wouldn't be called when pk_nulls[i] == 'n'.
>
> Good idea, so done. Although, there can't be nulls right now.
>
> >> +            case TM_Updated:
> >> +                if (IsolationUsesXactSnapshot())
> >> ...
> >> +            case TM_Deleted:
> >> +                if (IsolationUsesXactSnapshot())
> >>
> >> It seems the handling for TM_Updated and TM_Deleted is the same. The
> >> cases for these two values can be put next to each other (saving one
> >> block of code).
>
> Ah, yes. The TM_Updated case used to be handled a bit differently in
> earlier unposted versions of the patch, though at some point I
> concluded that the special handling was unnecessary, but didn't
> realize what you just pointed out. Fixed.
>
> > I'll pause on reviewing v4 until you've addressed the suggestions above.
>
> Here's v5.

v5 patches apply to master.
Suggested if/then optimization is implemented.
Suggested case merging is implemented.
Passes make check and make check-world yet again.

Just to confirm, we *don't* free the RI_CompareHashEntry because it points to an entry in a hash table which is TopMemoryContext aka lifetime of the session, correct?

Anybody else want to look this patch over before I mark it Ready For Committer?
Re: simplifying foreign key/RI checks
On Sat, Jan 23, 2021 at 12:52 PM Zhihong Yu wrote:
> Hi,
>
> +        for (i = 0; i < riinfo->nkeys; i++)
> +        {
> +            Oid         eq_opr = eq_oprs[i];
> +            Oid         typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
> +            RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
> +
> +            if (pk_nulls[i] != 'n' && OidIsValid(entry->cast_func_finfo.fn_oid))
>
> It seems the pk_nulls[i] != 'n' check can be lifted ahead of the
> assignment to the three local variables. That way, ri_HashCompareOp
> wouldn't be called when pk_nulls[i] == 'n'.
>
> +            case TM_Updated:
> +                if (IsolationUsesXactSnapshot())
> ...
> +            case TM_Deleted:
> +                if (IsolationUsesXactSnapshot())
>
> It seems the handling for TM_Updated and TM_Deleted is the same. The cases
> for these two values can be put next to each other (saving one block of
> code).
>
> Cheers

I'll pause on reviewing v4 until you've addressed the suggestions above.
Re: simplifying foreign key/RI checks
> I decided not to deviate from pk_ terminology so that the new code
> doesn't look too different from the other code in the file. Although,
> I guess we can at least call the main function
> ri_ReferencedKeyExists() instead of ri_PrimaryKeyExists(), so I've
> changed that.

I think that's a nice compromise, it makes the reader aware of the concept.

> I've attached the updated patch.

Missing "break" added. Check.
Comment updated. Check.
Function renamed. Check.
Attribute mapping matching test (and assertion) added. Check.

Patch applies to an as-of-today master, passes make check and check-world. No additional regression tests required, as no new functionality is introduced. No docs required, as there is nothing user-facing.

Questions:
1. There's a palloc for mapped_partkey_attnums, which is never freed. Is the prevailing memory context short-lived enough that we don't care?
2. Same question for the AttrMap map: should there be a free_attrmap()?
Re: simplifying foreign key/RI checks
> I decided not to deviate from pk_ terminology so that the new code
> doesn't look too different from the other code in the file. Although,
> I guess we can at least call the main function
> ri_ReferencedKeyExists() instead of ri_PrimaryKeyExists(), so I've
> changed that.

I agree with leaving the existing terminology where it is for this patch. Changing the function name is probably enough to alert the reader that the things that are called pks may not be precisely that.
Re: Release SPI plans for referential integrity with DISCARD ALL
On Wed, Jan 13, 2021 at 1:03 PM Corey Huinker wrote:
>> In addition to that, a following case would be solved with this approach:
>> When many processes are referencing many tables defined foreign key
>> constraints thoroughly, a huge amount of memory will be consumed
>> regardless of whether referenced tables are partitioned or not.
>>
>> Attached the patch. Any thoughts?
>
> Amit Langote has done some great work at eliminating SPI from
> INSERT/UPDATE triggers entirely, thus reducing the number of cached plans
> considerably.
>
> I think he was hoping to have a patch formalized this week, if time
> allowed.
>
> It doesn't have DELETE triggers in it, so this patch might still have good
> value for deletes on a commonly used enumeration table.
>
> However, our efforts might be better focused on eliminating SPI from
> delete triggers as well, an admittedly harder task.

Amit's patch is now available in this thread [1]. I'm curious if it has any effect on your memory pressure issue.

[1] https://www.postgresql.org/message-id/ca+hiwqgkfjfydeq5vhph6eqpkjsbfpddy+j-kxyfepqedts...@mail.gmail.com
Re: simplifying foreign key/RI checks
On Mon, Jan 18, 2021 at 9:45 PM Amit Langote wrote:
> On Tue, Jan 19, 2021 at 2:47 AM Zhihong Yu wrote:
> >
> > Hi,
> > I was looking at this statement:
> >
> > insert into f select generate_series(1, 200, 2);
> >
> > Since certain generated values (the second half) are not in table p,
> > wouldn't insertion for those values fail ?
> > I tried a scaled down version (1000th) of your example:
> >
> > yugabyte=# insert into f select generate_series(1, 2000, 2);
> > ERROR: insert or update on table "f" violates foreign key constraint "f_a_fkey"
> > DETAIL: Key (a)=(1001) is not present in table "p".
>
> Sorry, a wrong copy-paste by me. Try this:
>
> create table p (a numeric primary key);
> insert into p select generate_series(1, 200);
> create table f (a bigint references p);
>
> -- Unpatched
> insert into f select generate_series(1, 200, 2);
> INSERT 0 100
> Time: 6527.652 ms (00:06.528)
>
> update f set a = a + 1;
> UPDATE 100
> Time: 8108.310 ms (00:08.108)
>
> -- Patched:
> insert into f select generate_series(1, 200, 2);
> INSERT 0 100
> Time: 3312.193 ms (00:03.312)
>
> update f set a = a + 1;
> UPDATE 100
> Time: 4292.807 ms (00:04.293)
>
> > For v1-0002-Avoid-using-SPI-for-some-RI-checks.patch :
> >
> > +        * Collect partition key values from the unique key.
> >
> > At the end of the nested loop, should there be an assertion that
> > partkey->partnatts partition key values have been found ?
> > This can be done by using a counter (initialized to 0) which is
> > incremented when a match is found by the inner loop.
>
> I've updated the patch to add the Assert. Thanks for taking a look.
>
> --
> Amit Langote
> EDB: http://www.enterprisedb.com

v2 patch applies and passes make check and make check-world.

Perhaps, given the missing break at line 418 without any tests failing, we could add another regression test if we're into 100% code path coverage. As it is, I think the compiler warning was a sufficient alert.
The code is easy to read, and the comments touch on the major points of what complexities arise from partitioned tables.

A somewhat pedantic complaint I have brought up off-list is that this patch continues the pattern of the variable and function names making the assumption that the foreign key is referencing the primary key of the referenced table. Foreign key constraints need only reference a unique index; it doesn't have to be the primary key. Granted, that unique index is behaving exactly as a primary key would, so conceptually it is very similar, but keeping with the existing naming (pk_rel, pk_type, etc.) can lead a developer to think that it would be just as correct to find the referenced relation and get the primary key index from there, which would not always be correct. This patch correctly grabs the index from the constraint itself, so no problem there.

I like that this patch changes the absolute minimum of the code in order to get a very significant performance benefit. It does so in a way that should reduce resource pressure found in other places [1]. This will in turn reduce the performance penalty of "doing the right thing" in terms of defining enforced foreign keys. It seems to get a clearer performance boost than was achieved with previous efforts at statement-level triggers.

This patch completely sidesteps the DELETE case, which has more insidious performance implications, but is also far less common, and whose solution will likely be very different.

[1] https://www.postgresql.org/message-id/cakkq508z6r5e3jdqhfpwszsajlpho3oyyoamfesaupto5vg...@mail.gmail.com
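To illustrate the unique-index point with a quick toy example (SQLite is used here only because it's a handy in-process engine; the behavior shown is standard SQL, and the table/column names are made up):

```python
import sqlite3

# A foreign key targets a unique constraint, which need not be the
# primary key: here f references p.code, not p.id.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
    CREATE TABLE p (
        id   INTEGER PRIMARY KEY,   -- the PK, *not* the FK target
        code TEXT UNIQUE            -- the unique index the FK references
    )
""")
conn.execute("CREATE TABLE f (p_code TEXT REFERENCES p(code))")
conn.execute("INSERT INTO p VALUES (1, 'alpha')")

conn.execute("INSERT INTO f VALUES ('alpha')")  # ok: matches the unique column
try:
    conn.execute("INSERT INTO f VALUES ('1')")  # the PK value is irrelevant here
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

A lookup against p's primary key index would wrongly accept the second insert, which is exactly the trap the pk_* naming can lure a developer into.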
Re: simplifying foreign key/RI checks
> In file included from /home/japin/Codes/postgresql/Debug/../src/include/postgres.h:47:0,
>                  from /home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c:24:
> /home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c: In function ‘ri_PrimaryKeyExists’:
> /home/japin/Codes/postgresql/Debug/../src/include/utils/elog.h:134:5: warning: this statement may fall through [-Wimplicit-fallthrough=]
>   do { \
>   ^
> /home/japin/Codes/postgresql/Debug/../src/include/utils/elog.h:156:2: note: in expansion of macro ‘ereport_domain’
>   ereport_domain(elevel, TEXTDOMAIN, __VA_ARGS__)
>   ^~
> /home/japin/Codes/postgresql/Debug/../src/include/utils/elog.h:229:2: note: in expansion of macro ‘ereport’
>   ereport(elevel, errmsg_internal(__VA_ARGS__))
>   ^~~
> /home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c:417:5: note: in expansion of macro ‘elog’
>   elog(ERROR, "unexpected table_tuple_lock status: %u", res);
>   ^~~~
> /home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c:419:4: note: here
>   default:
>   ^~~
>
> --
> Regards,
> Japin Li.
> ChengDu WenWu Information Technology Co.,Ltd.

I also get this warning. Adding a "break;" at line 418 resolves the warning.
Re: Release SPI plans for referential integrity with DISCARD ALL
> In addition to that, a following case would be solved with this approach:
> When many processes are referencing many tables defined foreign key
> constraints thoroughly, a huge amount of memory will be consumed
> regardless of whether referenced tables are partitioned or not.
>
> Attached the patch. Any thoughts?

Amit Langote has done some great work at eliminating SPI from INSERT/UPDATE triggers entirely, thus reducing the number of cached plans considerably.

I think he was hoping to have a patch formalized this week, if time allowed.

It doesn't have DELETE triggers in it, so this patch might still have good value for deletes on a commonly used enumeration table.

However, our efforts might be better focused on eliminating SPI from delete triggers as well, an admittedly harder task.
Re: Huge memory consumption on partitioned table with FKs
On Mon, Nov 30, 2020 at 9:48 PM Tom Lane wrote:
> Corey Huinker writes:
> > Given that we're already looking at these checks, I was wondering if this
> > might be the time to consider implementing these checks by directly
> > scanning the constraint index.
>
> Yeah, maybe. Certainly ri_triggers is putting a huge amount of effort
> into working around the SPI/parser/planner layer, to not a lot of gain.
>
> However, it's not clear to me that that line of thought will work well
> for the statement-level-trigger approach. In that case you might be
> dealing with enough tuples to make a different plan advisable.
>
> regards, tom lane

Bypassing SPI would probably mean that we stay with row-level triggers, and the cached query plan would go away, perhaps replaced by an already-looked-up-this-tuple hash, sorta like what the cached nested loops effort is doing.

I've been meaning to give this a try when I got some spare time. This may inspire me to try again.
Re: Huge memory consumption on partitioned table with FKs
> I think this can be solved easily in the patch, by having
> ri_BuildQueryKey() compare the parent's fk_attnums to the parent; if
> they are equal then use the parent's constraint_id, otherwise use the
> child constraint. That way, the cache entry is reused in the common
> case where they are identical.

Somewhat of a detour, but in reviewing the patch for statement-level RI checks, Andres and I observed that SPI made for a large portion of the RI overhead.

Given that we're already looking at these checks, I was wondering if this might be the time to consider implementing these checks by directly scanning the constraint index.
Re: Feature proposal for psql
>> One limitation of this approach is that \set can't span lines, so
>> writing complex queries would be kinda painful. But that would
>> be a good limitation to address separately; \set isn't the only
>> metacommand where can't-span-lines is a problem sometimes.
>> If you seriously want to pursue adding a feature like this,
>> probably the -hackers list is a more appropriate discussion
>> forum than -novice.
>>
>> regards, tom lane
>
> The ability to save and retrieve multi-line queries would be quite nice
> though, often I would like to save a query too large to type.
>
> I think I don't know psql well enough to propose a viable syntax, so I
> guess that would be up to experts here...
> But I would be pretty happy to implement it.
>
> Regards
> Denis

Well, if you want to do it right now, you can do this:

db=> select * from foo;
  x |  y
 ---+----
  1 |  1
  2 |  2
  3 |  3
  4 |  4
  5 |  5
  6 |  6
  7 |  7
  8 |  8
  9 |  9
 10 | 10
(10 rows)

db=> select * from foo where x = :xval \w query1.sql
db=> \set xval 4
db=> \i query1.sql
 x | y
---+---
 4 | 4
(1 row)

Granted, that involves adding files to the filesystem, setting variables rather than passing parameters, remembering what those variables were, and having the discipline to not have overlapping uses for variable names across multiple files.

So the key shortcomings right now seem to be:
* no way to pass in values to an \i or \ir, and no way to locally scope them
* one file per query

Setting variables locally in a \ir would need to somehow push and pop existing variable values, because those vars are scoped at the session level, and that might confuse the user when they set the var inside the included file expecting the calling session to keep the value.

Perhaps we could add a notion of a "bag of tricks" dir in each user's home directory, and a slash command \wbag (better name suggestions welcome) that behaves like \w but assumes the file will go in ~/.psql-bag-of-tricks/ and \ibag which includes a file from the same dir.
Re: use pg_get_functiondef() in pg_dump
> I'm sure there's a lot of folks who'd like to see more of the logic we
> have in pg_dump for building objects from the catalog available to more
> tools through libpgcommon - psql being one of the absolute first
> use-cases for exactly that (there's certainly no shortage of people
> who've asked how they can get a CREATE TABLE statement for a table by
> using psql...).

I count myself among those folks (see https://www.postgresql.org/message-id/CADkLM%3DfxfsrHASKk_bY_A4uomJ1Te5MfGgD_rwwQfV8wP68ewg%40mail.gmail.com for discussion of doing DESCRIBE and SHOW CREATE-ish functions either on server side or client side).

I'm all for having this as "just" a set of pg_get_*def functions, because they allow for the results to be used in queries. Granted, the shape of the result set may not be stable, but that's the sort of thing we can warn about the same way we have warnings for changes to pg_stat_activity. At that point any DESCRIBE/SHOW CREATE server-side functions essentially become just shells around the pg_get_*def() calls, with no particular requirement to make those new commands work inside a SELECT.

Would it be totally out of left field to have the functions take an optional "version" parameter, defaulted to null, that would be used to give backwards-compatible results if and when we do make a breaking change?
Re: PG 13 release notes, first draft
> > > > > > Please do list Jürgen, Corey and Roger as authors of the glossary. > > (Actually I should be listed as well, as the time I spent on it was > considerable.) > +1, the time spent was quite considerable
Re: Proposing WITH ITERATIVE
> > Perhaps something like this would be more readable
> >
> > WITH t AS (
> >    UPDATE ( SELECT 1 AS ctr, 'x' as val )
> >    SET ctr = ctr + 1, val = val || 'x'
> >    WHILE ctr <= 100
> >    RETURNING ctr, val
> > )
> >
> > The notion of an UPDATE on an ephemeral subquery isn't that special, see
> > "subquery2" in
> > https://docs.oracle.com/cd/B19306_01/appdev.102/b14261/update_statement.htm
>
> I must admit that I do not like much needing another level of subquery,
> but maybe it could just be another named query in the WITH statement.

So like this:

WITH initial_conditions as (SELECT 1 as ctr, 'x' as val)
UPDATE initial_conditions
SET ctr = ctr + 1, val = val || 'x'
WHILE ctr <= 100
RETURNING ctr, val

> ISTM that UPDATE is quite restrictive as the number of rows cannot
> change, which does not seem desirable at all? How could I add or remove
> rows from one iteration to the next?

My understanding was that maintaining a fixed number of rows was a desired feature.

> ISTM that the WHILE would be checked before updating, so that WHILE FALSE
> does nothing, in which case its position after SET is odd.

True, but having the SELECT before the FROM is equally odd.

> Having both WHERE and WHILE might look awkward.
> Maybe an UNTIL instead of WHILE?
>
> Also it looks much more procedural this way, which is the point, but also
> departs from the declarative SELECT approach of WITH RECURSIVE.

Yeah, just throwing it out as a possibility. Looking again at what I suggested, it looks a bit like the Oracle "CONNECT BY level <= x" idiom. I suspect that the SQL standards body already has some preliminary work done, and we should ultimately follow that.
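For what it's worth, here's a tiny procedural model of how I'd expect the sketched WHILE iteration to behave, assuming the WHILE condition is checked before each update and RETURNING emits each updated row. This is only my reading of the proposed syntax, not defined semantics:

```python
# Model: start from the initial row set; while the WHILE condition holds
# for some row, apply the SET clause to qualifying rows and emit the
# updated rows (the RETURNING stream).
def iterate(initial_rows, cond, step):
    rows = list(initial_rows)
    returned = []
    while any(cond(r) for r in rows):
        rows = [step(r) if cond(r) else r for r in rows]
        returned.extend(rows)
    return returned

out = iterate(
    [{"ctr": 1, "val": "x"}],                                     # initial_conditions
    cond=lambda r: r["ctr"] <= 100,                               # WHILE ctr <= 100
    step=lambda r: {"ctr": r["ctr"] + 1, "val": r["val"] + "x"},  # SET clause
)
print(len(out), out[-1]["ctr"], len(out[-1]["val"]))  # 100 101 101
```

Note the fixed-row-count property falls out naturally here: the step only rewrites rows, it never adds or removes them, which matches the UPDATE framing.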
Re: Add A Glossary
On Wed, Apr 29, 2020 at 3:15 PM Peter Eisentraut <peter.eisentr...@2ndquadrant.com> wrote:
> Why are all the glossary terms capitalized? Seems kind of strange.

They weren't intended to be, and they don't appear to be in the page I'm looking at. Are you referring to the anchor, as in https://www.postgresql.org/docs/devel/glossary.html#GLOSSARY-RELATION ? If so, that all-capping is part of the rendering, as the ids were all named in all-lower-case.
Re: Proposing WITH ITERATIVE
On Wed, Apr 29, 2020 at 10:34 AM Jonah H. Harris wrote:
> On Wed, Apr 29, 2020 at 7:22 AM Peter Eisentraut <peter.eisentr...@2ndquadrant.com> wrote:
>
>> Yeah the RECURSIVE vs ITERATIVE is a bit of a red herring here. As you
>> say, the RECURSIVE keyword doesn't specify the processing but marks the
>> fact that the specification of the query is recursive.
>
> Agreed. I started thinking through Fabien's response last night.
>
>> I think a syntax that would fit better within the existing framework
>> would be something like
>>
>> WITH RECURSIVE t AS (
>>     SELECT base case
>>   REPLACE ALL  -- instead of UNION ALL
>>     SELECT recursive case
>> )
>
> I was originally thinking more along the lines of Fabien's approach, but
> this is similarly interesting.

Obviously I'm very concerned about doing something that the SQL Standard will clobber somewhere down the road. Having said that, the recursive syntax always struck me as awkward even by SQL standards.

Perhaps something like this would be more readable:

WITH t AS (
   UPDATE ( SELECT 1 AS ctr, 'x' as val )
   SET ctr = ctr + 1, val = val || 'x'
   WHILE ctr <= 100
   RETURNING ctr, val
)

The notion of an UPDATE on an ephemeral subquery isn't that special, see "subquery2" in https://docs.oracle.com/cd/B19306_01/appdev.102/b14261/update_statement.htm , so the only syntax here without precedent is dropping a WHILE into an UPDATE statement.
Re: More efficient RI checks - take 2
On Wed, Apr 22, 2020 at 2:36 PM Andres Freund wrote:
> Hi,
>
> On 2020-04-22 13:46:22 -0400, Robert Haas wrote:
> > On Wed, Apr 22, 2020 at 1:18 PM Alvaro Herrera wrote:
> > > Well, I was actually thinking in building ready-made execution trees,
> > > bypassing the planner altogether. But apparently no one thinks that
> > > this is a good idea, and we don't have any code that does that already,
> > > so maybe it's not a great idea.
>
> I was commenting on what I understood Corey to say, but was fairly
> unclear about it. But I'm also far from sure that I understood Corey
> correctly...

I was unclear because, even after my failed foray into statement-level triggers for RI checks, I'm still pretty inexperienced in this area. I'm just happy that it's being discussed.
Re: More efficient RI checks - take 2
> I can imagine removal of the SPI from the current implementation (and
> constructing the plans "manually"), but note that the queries I use in my
> patch are no longer that trivial. So the SPI makes sense to me because it
> ensures regular query planning.

As an intermediate step, in the case where we have one row, it should be simple enough to extract that row manually, and do an SPI call with fixed values rather than the join to the ephemeral table, yes?

> As for the tuplestore, I'm not sure the startup cost is a problem: if you're
> concerned about the 1-row case, the row should usually be stored in memory.
>
> > and once that is done, we could see about step #2.
>
> As I said during my review of your patch last year, I think the RI semantics
> has too much in common with that of triggers. I'd need more info to imagine
> such a change.

As a general outline, I think that DML would iterate over the 2 sets of potentially relevant RI definitions rather than iterating over the triggers. The similarities between RI and general triggers are obvious, which explains why they went that route initially, but they're also a crutch: all RI operations boil down to either an iteration over a tuplestore to do lookups in an index (when checking for referenced rows), or a hash join of the transient data against the un-indexed table (when checking for referencing rows), and people who know this stuff far better than me seem to think that SPI overhead is best avoided when possible.

I'm looking forward to having more time to spend on this.
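To spell out the two check shapes I mean, here's a toy model. For the insert side, each new tuple is probed against the referenced table's unique index; for the delete side, the transient set of affected keys is hash-joined against the (un-indexed) referencing table. All names and structures here are illustrative only, not anything from ri_triggers.c:

```python
def insert_check(new_fk_values, unique_index):
    # unique_index: a set standing in for the index on the referenced key.
    # Each non-NULL new FK value is probed individually; NULLs are exempt.
    return [v for v in new_fk_values
            if v is not None and v not in unique_index]  # violations

def delete_check(deleted_keys, fk_table_rows):
    # Build a hash table over the (usually small) transient key set, then
    # scan the referencing table once, probing each row against it.
    transient = set(deleted_keys)
    return [row for row in fk_table_rows if row in transient]  # still referenced

index = {1, 2, 3}
print(insert_check([1, None, 9], index))  # [9]: 9 has no referenced row
print(delete_check([2, 3], [1, 1, 3]))    # [3]: key 3 is still referenced
```

The asymmetry is the point: the insert side gets an index for free (the unique constraint), while the delete side generally cannot assume an index on the referencing columns, hence the hash-join shape.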
Re: Poll: are people okay with function/operator table redesign?
> > Yeah, back at the beginning of this exercise, Alvaro wondered aloud > if we should go to something other than tables altogether. I dunno > what that'd look like though. > It would probably look like our acronyms and glossary pages. Maybe the return example and return values get replaced with a programlisting?
Re: Poll: are people okay with function/operator table redesign?
> > Thinking out loud, it'd also be great if we could add in some anchors as > well, so perhaps in the future on the pgweb side we could add in some > discoverable links that other documentation has -- which in turn people > could click / link to others directly to the function name. > +1
Re: Add A Glossary
> > > Term 'relation': A sequence is internally a table with one row - right? > Shall we extend the list of concrete relations by 'sequence'? Or is this > not necessary because 'table' is already there? > I wrote one for sequence, it was a bit math-y for Alvaro's taste, so we're going to try again.
Re: More efficient RI checks - take 2
On Wed, Apr 8, 2020 at 1:06 PM Pavel Stehule wrote: > > > st 8. 4. 2020 v 18:36 odesílatel Antonin Houska napsal: > >> After having reviewed [1] more than a year ago (the problem I found was >> that >> the transient table is not available for deferred constraints), I've >> tried to >> implement the same in an alternative way. The RI triggers still work as >> row >> level triggers, but if multiple events of the same kind appear in the >> queue, >> they are all passed to the trigger function at once. Thus the check query >> does >> not have to be executed that frequently. >> > I'm excited that you picked this up! > >> Some performance comparisons are below. (Besides the execution time, >> please >> note the difference in the number of trigger function executions.) In >> general, >> the checks are significantly faster if there are many rows to process, >> and a >> bit slower when we only need to check a single row. However I'm not sure >> about >> the accuracy if only a single row is measured (if a single row check is >> performed several times, the execution time appears to fluctuate). >> > These numbers are very promising, and much more in line with my initial expectations. Obviously the impact on single-row DML is of major concern, though. It is hard task to choose good strategy for immediate constraints, but for > deferred constraints you know how much rows should be checked, and then you > can choose better strategy. > > Is possible to use estimation for choosing method of RI checks? > In doing my initial attempt, the feedback I was getting was that the people who truly understood the RI checks fell into the following groups: 1. people who wanted to remove the SPI calls from the triggers 2. people who wanted to completely refactor RI to not use triggers 3. 
people who wanted to completely refactor triggers. While #3 is clearly beyond the scope of an endeavor like this, #1 seems like it would nearly eliminate the 1-row penalty (we'd still have the TupleStore initialization penalty, but it would just be a handy queue structure, and maybe that cost would be offset by removing the SPI overhead), and once that is done, we could see about step #2.
Re: Add A Glossary
On Sat, Apr 4, 2020 at 2:55 AM Fabien COELHO wrote: > > > BTW it's now visible at: > > https://www.postgresql.org/docs/devel/glossary.html Nice. I went looking for it yesterday and the docs hadn't rebuilt yet. > ISTM that occurrences of these words elsewhere in the documentation should > link to the glossary definitions? > Yes, that's a big project. I was considering writing a script to compile all the terms as search terms, paired with their glossary ids, and then invoke git grep to identify all pages that have term FOO but don't have glossary-foo. We would then go about gloss-linking those pages as appropriate, but only a few pages at a time to keep scope sane. Also, I'm unclear about the circumstances under which we should _not_ tag a term. I remember hearing that we should only tag it on the first usage, but is that per section or per page? > As the definitions are short and to the point, maybe the HTML display > could (also) "hover" the definitions when the mouse passes over the word, > using the "title" attribute? > I like that idea, if it doesn't conflict with accessibility standards (maybe that's just titles on images, not sure). I suspect we would want to just carry over the first sentence or so with a ... to avoid cluttering the screen with my overblown definition of a sequence. I suggest we pursue this idea in another thread, as we'd probably want to do it for acronyms as well. > > "ACID" does not appear as an entry, nor in the acronyms sections. Also no > DCL, although DML & DDL are in acronyms. > It needs to be in the acronyms page, and in light of all the docbook wizardry that I've learned from Alvaro, those should probably get their own acronym-foo ids as well. The cutoff date for 13 fast approaches, so it might be for 14+ unless doc-only patches are treated differently. > Entries could link to relevant wikipedia pages, like the acronyms section > does? > They could. 
I opted not to do that because each external link invites debate about how authoritative that link is, which is easier to do with acronyms. Now that the glossary is a reality, it's easier to have those discussions.
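The term-scanning script mentioned earlier could be sketched like this (hypothetical terms and file contents; a real version would drive git grep over doc/src/sgml rather than an in-memory dict):

```python
# Hypothetical sketch of the term-scanning idea: given a map of glossary
# terms to their ids, report which pages mention a term without linking
# to its glossary entry. Terms and file names here are invented.
import re

GLOSSARY = {"tuple": "glossary-tuple", "vacuum": "glossary-vacuum"}

def missing_links(pages):
    """pages: dict of page name -> SGML/text content."""
    report = {}
    for name, text in pages.items():
        gaps = [term for term, gid in GLOSSARY.items()
                if re.search(r'\b' + term + r'\b', text, re.I)
                and gid not in text]
        if gaps:
            report[name] = gaps
    return report

pages = {
    "storage.sgml": 'Each tuple is stored ... <xref linkend="glossary-tuple"/>',
    "maintenance.sgml": "Routine vacuum operations reclaim space.",
}
assert missing_links(pages) == {"maintenance.sgml": ["vacuum"]}
```

This only finds candidate pages; deciding which occurrence of a term to tag (first per page vs. first per section) would still be a manual, few-pages-at-a-time pass.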
Re: Add A Glossary
> > we have it, we can start thinking of patching the main part of the docs > to make reference to it by using in key spots. Right now > the glossary links to itself, but it makes lots of sense to have other > places point to it. > I have some ideas about how to patch the main docs, but will leave those to a separate thread. > * I commented out the definition of "sequence", which seemed to go into > excessive detail. Let's have a more concise definition? > That one's my fault. > > Patches for these omissions, and other contributions, welcome. > Thanks for all your work on this!
Re: Add A Glossary
On Thu, Apr 2, 2020 at 8:44 AM Jürgen Purtz wrote: > +1 and many thanks to Alvaros edits. > > I did some of the grunt work Alvaro alluded to in v6, and the results are attached and they build, which means there are no invalid links. Notes: * no definition wordings were changed * added a linkend to all remaining glossterms that do not immediately follow a glossentry * renamed id glossary-temporary-tables to glossary-temporary-table * temporarily re-added an id for glossary-row as we have many references to that. unsure if we should use the term Tuple in all those places or say Row while linking to glossary-tuple, or something else * temporarily re-added an id for glossary-segment, glossary-wal-segment, glossary-analytic-function, as those were also referenced and will need similar decisions made * added a stub entry for glossary-unique-index, unsure if it should have a definition on it's own, or we split it into unique and index. * I noticed several cases where a glossterm is used twice in a definition, but didn't de-term them * I'm curious about how we should tag a term when using it in its own definition. same as anywhere else? From 4603ce04306e77f5508bb207b42e5dec1425e7c5 Mon Sep 17 00:00:00 2001 From: coreyhuinker Date: Thu, 2 Apr 2020 15:32:43 -0400 Subject: [PATCH] glossary v7 --- doc/src/sgml/filelist.sgml |1 + doc/src/sgml/glossary.sgml | 1589 doc/src/sgml/postgres.sgml |1 + 3 files changed, 1591 insertions(+) create mode 100644 doc/src/sgml/glossary.sgml diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml index 1043d0f7ab..cf21ef857e 100644 --- a/doc/src/sgml/filelist.sgml +++ b/doc/src/sgml/filelist.sgml @@ -170,6 +170,7 @@ + diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml new file mode 100644 index 00..edfcf9d725 --- /dev/null +++ b/doc/src/sgml/glossary.sgml @@ -0,0 +1,1589 @@ + + Glossary + + This is a list of terms and their meaning in the context of + PostgreSQL and relational database + systems in general. 
+ + + + + Aggregate Function + + + A function that + combines (aggregates) multiple input values, + for example by counting, averaging or adding, + yielding a single output value. + + + For more information, see + . + + + + + + + Analyze (operation) + + + The process of collecting statistics from data in + tables + and other relations + to help the query planner + to make decisions about how to execute + queries. + + + + + + Analytic Function + + + + + Atomic + + + In reference to a datum: + the fact that its value that cannot be broken down into smaller + components. + + + + + In reference to a + database transaction: + see atomicity. + + + + + + Atomicity + + + The property of a transaction + that either all its operations complete as a single unit or none do. + This is one of the ACID properties. + + + + + + Attribute + + + An element with a certain name and data type found within a + tuple or + table. + + + + + + Autovacuum + + + Background processes that routinely perform + Vacuum and Analyze + operations. + + + For more information, see + . + + + + + + Backend (process) + + + Processes of an Instance which act on behalf of + client Connections and handle their requests. + + + (Don't confuse this term with the similar terms + Background Worker or + Background Writer). + + + + + + Background Worker (process) + + + Individual processes within an Instance, which + run system- or user-supplied code. A typical use case is a process + which handles parts of an SQL query to take + advantage of parallel execution on servers with multiple + CPUs. + + +For more information, see +. + + + + + + Background Writer (process) + + + A process that continuously writes dirty pages from + Shared Memory to the file system. + It wakes up periodically, but + works only for a short period in order to distribute its expensive + I/O activity over time, instead of generating fewer + larger I/O peaks which could block other processes. + + + For more information, see + . 
+ + + + + + Cast + + + A conversion of a Datum from its current data + type to another data type. + + + + + + Catalog + + + The SQL standard uses this term to + indicate what is called a Database in + PostgreSQL's terminology. + + + This should not be confused with the + System Catalog. + + + For more information, see + . + + + + + + Check Constraint + + + A type of Constraint defined on a
Re: Add A Glossary
> > I propose we define "planner" and make "optimizer" a <glosssee> entry. > I have no objection to more entries, or edits to entries, but am concerned that the process leads to someone having to manually merge several start-from-scratch patches, with no clear sense of when we'll be done. It may make sense to appoint an edit-collector. > I further propose not to define the term "normalized", at least not for > now. That seems a very deep rabbit hole. > +1 I think we appointed a guy named Xeno to work on that definition. He says he's getting close...
Re: Add A Glossary
> > 2. I found out that "see xyz" and "see also" have bespoke markup in > Docbook -- <glosssee> and <glossseealso>. I changed some glossentries > to use those, removing some glossdefs and changing a couple of paras to > glossseealsos. I also removed all "id" properties from glossentries > that are just <glosssee>, because I think it's a mistake to have > references to entries that will make the reader look up a different > term; for me as a reader that's annoying, and I don't like to annoy > people. > +1 These structural enhancements are great. I'm fine with removing the id from just-glosssee, and glad that we're keeping the entry to aid discovery. > I rewrote the definition for "atomic" once again. Made it two > glossdefs, because I can. If you don't like this, I can undo. > +1 Splitting this into two definitions, one for each context, is the most sensible thing and I don't know why I didn't do that in the first place.
Re: Add A Glossary
On Tue, Mar 31, 2020 at 2:09 PM Justin Pryzby wrote: > On Sun, Oct 13, 2019 at 04:52:05PM -0400, Corey Huinker wrote: > > 1. It's obviously incomplete. There are more terms, a lot more, to add. > > How did you come up with the initial list of terms ? > 1. I asked some newer database people to come up with a list of terms that they used. 2. I then added some more terms that seemed obvious given that first list. 3. That combined list was long on general database concepts and theory, and short on administration concepts 4. Then Jürgen suggested that we integrate his working list of terms, very much focused on internals, so I did that. 5. Everything after that was applying suggested edits and new terms. > Here's some ideas; I'm *not* suggesting to include all of everything, but > hopefully start with a coherent, self-contained list. > I don't think this list will ever be complete. It will always be a work in progress. I'd prefer to get the general structure of a glossary committed in the short term, and we're free to follow up with edits that focus on the wording. > > grep -roh '[^<]*' doc/src/ |sed 's/.*/\L&/' |sort |uniq -c > |sort -nr |less > > Maybe also: > object identifier > operator classes > operator family > visibility map > Just so I can prioritize my work, which of these things, along with your suggestions in previous emails, would you say is a barrier to considering this ready for a committer?
Re: Add A Glossary
On Sun, Mar 29, 2020 at 5:29 AM Jürgen Purtz wrote: > On 27.03.20 21:12, Justin Pryzby wrote: > > On Fri, Mar 20, 2020 at 11:32:25PM +0100, Jürgen Purtz wrote: > +Archiver > >>> Can you change that to archiver process ? > >> I prefer the short term without the addition of 'process' - concerning > >> 'Archiver' as well as the other cases. But I'm not a native English > >> speaker. > > I didn't like it due to lack of context. > > > > What about "wal archiver" ? > > > > It occurred to me when I read this. > > > https://www.postgresql.org/message-id/20200327.163007.128069746774242774.horikyota.ntt%40gmail.com > > > "WAL archiver" is ok for me. In the current documentation we have 2 > places with "WAL archiver" and 4 with "archiver"-only > (high-availability.sgml, monitoring.sgml). > > "backend process" is an exception to the other terms because the > standalone term "backend" is sensibly used in diverse situations. > > Kind regards, Jürgen > I've taken Alvaro's fixes and done my best to incorporate the feedback into a new patch, which Roger (tech writer) reviewed yesterday. The changes are too numerous to list, but the highlights are: New definitions: * All four ACID terms * Vacuum (split off from Autovacuum) * Tablespace * WAL Archiver (replaces Archiver) Changes to existing terms: * Implemented most wording changes recommended by Justin * all remaining links were either made into xrefs or edited out of existence * de-tagged most second uses of a term within a definition Did not do: * Address the " Process" suffix suggested by Justin. There isn't consensus on these changes, and I'm neutral on the matter * change the Cast definition.
I think it's important to express that a cast has a FROM datatype as well as a TO * anything host/server related as I couldn't see a consensus reached Other thoughts: * Trivial definitions that are just see-other-definition are ok with me, as the goal of this glossary is to aid in discovery of term meanings, so knowing that two terms are interchangable is itself helpful It is my hope that this revision represents the final _structural_ change to the glossary. New definitions and edits to existing definitions will, of course, go on forever. From 8a163603102f51a3eddfb05c51baf3b840c5d7f7 Mon Sep 17 00:00:00 2001 From: coreyhuinker Date: Mon, 30 Mar 2020 13:08:27 -0400 Subject: [PATCH] glossary v4 --- doc/src/sgml/filelist.sgml |1 + doc/src/sgml/glossary.sgml | 1551 doc/src/sgml/postgres.sgml |1 + 3 files changed, 1553 insertions(+) create mode 100644 doc/src/sgml/glossary.sgml diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml index 1043d0f7ab..cf21ef857e 100644 --- a/doc/src/sgml/filelist.sgml +++ b/doc/src/sgml/filelist.sgml @@ -170,6 +170,7 @@ + diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml new file mode 100644 index 00..eab14f3c9b --- /dev/null +++ b/doc/src/sgml/glossary.sgml @@ -0,0 +1,1551 @@ + + Glossary + + This is a list of terms and their meaning in the context of + PostgreSQL and relational database + systems in general. + + + +Aggregating + + + The act of combining a collection of data (input) values into + a single output value, which may not be of the same type as the + input values. + + + + + +Aggregate Function + + + A Function that combines multiple input values, + for example by counting, averaging or adding them all together, + yielding a single output value. + + + For more information, see + . + + + See also Window Function. + + + + + +Analytic + + + A Function whose computed value can reference + values found in nearby Rows of the same + Result Set. + + + For more information, see + . 
+ + + + + +Atomic + + + In reference to the value of an Attribute or + Datum: an item that cannot be broken down + into smaller components. + + + In reference to an operation: an event that cannot be completed in + part; it must either entirely succeed or entirely fail. For + example, a series of SQL statements can be + combined into a Transaction, and that + transaction is said to be atomic. + Atomic. + + + + + +Atomicity + + + One of the ACID properties. This is the state of + being Atomic in the operational/transactional sense. + + + + + +Attribute + + + An element with a certain name and data type found within a + Tuple or Table. + + + + + +Autovacuum + + + Background Worker processes that routinely + perform Vacuum operations. + + + For more information, see + . + + + + + +Backend Process + + + Processes of an
Re: Add A Glossary
> > > > > + Records to the file system and creates a special > > > > Does the checkpointer actually write WAL ? > > Yes. > > > An FK doesn't require the values in its table to be unique, right ? > > I believe it does require that the values are unique. > > > I think there's some confusion. Constraints are not objects, right ? > > I think constraints are definitely objects. They have names and you > can, for example, COMMENT on them. > > > Do we use shared_buffers for WAL ? > > No. > > (I have not reviewed the patch; these are just a few comments on your > comments.) > > I'm going to be coalescing the feedback into an updated patch very soon (tonight/tomorrow), so please keep the feedback on the text/wording coming until then. If anyone has a first attempt at all the ACID definitions, I'd love to see those as well.
Re: optimisation? collation "C" sorting for GroupAggregate for all deterministic collations
> > Perhaps this is what you mean by "deterministic", but isn't it > possible for some collations to treat multiple byte sequences as equal > values? And those multiple byte sequences wouldn't necessarily occur > sequentially in C collation, so it wouldn't be possible to work around > that by having the grouping node use one collation but the sorting > node use the C one. > > If my memory is incorrect, then this sounds like an intriguing idea. > > I could see the value in a hash aggregate on the C collation that then passes itself as a partial aggregate up to another step, which applies the real collation and then finalizes the aggregation before sorting.
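To illustrate the two-phase idea (a toy Python model, not anything in PostgreSQL; case-folding stands in for a real collation's notion of equality, so byte-distinct keys can still be collation-equal):

```python
# Hypothetical sketch: hash-aggregate on the raw ("C" collation) key
# first, then combine partial states for keys the real collation
# considers equal. Case-insensitive equality is the stand-in collation.
from collections import defaultdict

def two_phase_count(rows):
    # Phase 1: cheap hash aggregate on byte-identical keys.
    partial = defaultdict(int)
    for key in rows:
        partial[key] += 1
    # Phase 2: merge partial states under collation equality.
    final = defaultdict(int)
    for key, count in partial.items():
        final[key.lower()] += count   # the collation's notion of "equal"
    return dict(final)

assert two_phase_count(["Foo", "foo", "BAR", "Foo"]) == {"foo": 3, "bar": 1}
```

The win, if any, would come from phase 1 grouping most duplicates with cheap byte comparisons, leaving the expensive collation-aware comparisons for the much smaller set of partial states.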
Re: Add A Glossary
On Fri, Mar 20, 2020 at 6:32 PM Jürgen Purtz wrote: > man pages: Sorry, if I confused someone with my poor English. I just > want to express in my 'offline' mail that we don't have to worry about > man page generation. The patch doesn't affect files in the /ref > subdirectory from where man pages are created. > It wasn't your poor English - everyone else understood what you meant. I had wondered if our docs went into man page format as well, so my research was still time well spent.
Re: Add A Glossary
> > It's hard to review work from a professional tech writer. I'm under the > constant impression that I'm ruining somebody's perfect end product, > making a fool of myself. If it makes you feel better, it's a mix of definitions I wrote that Roger proofed and restructured, ones that Jürgen had written for a separate effort which then got a Roger-pass, and then some edits of my own and some by Jürgen which I merged without consulting Roger.
Re: Add A Glossary
> > Jürgen mentioned off-list that the man page doesn't build. I was going to >> look into that, but if anyone has more familiarity with that, I'm listening. >> > Looking at this some more, I'm not sure anything needs to be done for man pages. man1 is for executables, man3 seems to be dblink and SPI, and man7 is all SQL commands. This isn't any of those. The only possible thing left would be how to render the text of a foo sgml/postgres.sgml: sgml/release.sgml:[A-Z][A-Z_ ]+[A-Z_] , , , sgml/stylesheet.css:acronym { font-style: inherit; } filelist.sgml, postgres.sgml, and stylesheet.css already have the corresponding change, and the release.sgml is just an incidental mention of acronym. Of course I could be missing something. >
Re: Add A Glossary
On Thu, Mar 19, 2020 at 8:11 PM Alvaro Herrera wrote: > I gave this a look. I first reformatted it so I could read it; that's > 0001. Second I changed all the long items into s, which > Thanks! I didn't know about xrefs, that is a big improvement. > are shorter and don't have to repeat the title of the refered to page. > (Of course, this changes the link to be in the same style as every other > link in our documentation; some people don't like it. But it's our > style.) > > There are some mistakes. "Tupple" is most glaring one -- not just the > typo but also the fact that it goes to sql-revoke. A few definitions > we'll want to modify. Nothing too big. In general I like this work and > I think we should have it in pg13. > > Please bikeshed the definition of your favorite term, and suggest what > other terms to add. No pointing out of mere typos yet, please. > Jürgen mentioned off-list that the man page doesn't build. I was going to look into that, but if anyone has more familiarity with that, I'm listening. > I think we should have the terms Consistency, Isolation, Durability. > +1
Re: Add A Glossary
On Fri, Mar 13, 2020 at 12:18 AM Jürgen Purtz wrote: > > The statement that names of schema objects are unique isn't *strictly* true, > just *mostly* true. Take the case of a unique constraint. > > Concerning CONSTRAINTS you are right. Constraints seem to be an exception: > >- Their name belongs to a schema, but are not necessarily unique >within this context: >https://www.postgresql.org/docs/current/catalog-pg-constraint.html. >- There is a UNIQUE index within the system catalog pg_constraints: > "pg_constraint_conrelid_contypid_conname_index" >UNIQUE, btree (conrelid, contypid, conname), which expresses that >names are unique within the context of a table/constraint-type. >Nevertheless tests have shown that some stronger restrictions exist across >table borders (which seems to be implemented in CREATE statements - or as >a consequence of your mentioned correlation between constraint and index > ?). > > I hope that there are no more such exceptions to the global rule 'object > names in a schema are unique': > https://www.postgresql.org/docs/current/sql-createschema.html > > These facts must be mentioned as a short note in the glossary and in more > detail in the later patch about the architecture. > > > I did what I could to address the near uniqueness, as well as incorporate your earlier edits into this new, squashed patch attached.
From dbce6922194eb4ad8de57e81e182b9a6eebf859e Mon Sep 17 00:00:00 2001 From: coreyhuinker Date: Tue, 10 Mar 2020 11:26:29 -0400 Subject: [PATCH] add glossary page with revisions --- doc/src/sgml/filelist.sgml |1 + doc/src/sgml/glossary.sgml | 1072 doc/src/sgml/postgres.sgml |1 + 3 files changed, 1074 insertions(+) create mode 100644 doc/src/sgml/glossary.sgml diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml index 3da2365ea9..504c8a6326 100644 --- a/doc/src/sgml/filelist.sgml +++ b/doc/src/sgml/filelist.sgml @@ -170,6 +170,7 @@ + diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml new file mode 100644 index 00..d28bfb6fcf --- /dev/null +++ b/doc/src/sgml/glossary.sgml @@ -0,0 +1,1072 @@ + + Glossary + + This is a list of terms and their meaning in the context of PostgreSQL and Databases in general. + + + +Aggregate + + + To combine a collection of data values into a single value, whose value may not be of the same type as the original values. Aggregate Functions combine multiple Rows that share a common set of values into one Row, which means that the only data visible in the values in common, and the aggregates of the non-common data. + + + For more information, see Aggregate Functions. + + + + + +Analytic + + + A Function whose computed value can reference values found in nearby Rows of the same Result Set. + + + For more information, see Window Functions. + + + + + +Archiver + + + A process that backs up WAL Files in order to reclaim space on the file system. + + + For more information, see Backup and Restore: Continuous Archiving and Point-in-Time Recovery (PITR). + + + + + +Atomic + + + In reference to the value of an Attribute or Datum: cannot be broken down into smaller components. + + + In reference to an operation: An event that cannot be completed in part: it must either entirely succeed or entirely fail. A series of SQL statements can be combined into a Transaction, and that transaction is said to be Atomic. 
+ + + + + +Attribute + + + An element with a certain name and data type found within a Tuple or Table. + + + + + +Autovacuum + + + Processes that remove outdated MVCC Records of the Heap and Index. + + + For more information, see Routine Database Maintenance Tasks: Routine Vacuuming. + + + + + +Backend Process + + + Processes of an Instance which act on behalf of client Connections and handle their requests. + + + (Don't confuse this term with the similar terms Background Worker or Background Writer). + + + + + +Backend Server + + + See Instance. + + + + + +Background Worker + + + Individual processes within an Instance, which run system- or user-supplied code. Typical use cases are processes which handle parts of an SQL query to take advantage of parallel execution on servers with multiple CPUs. + + + For more information, see Background Worker Processes. + + + + + +Background Writer + + + Writes continuously dirty pages from Shared Memory to the file system. It starts periodically, but works only for a short period in order to distribute +expensive I/O activity over time instead of generating
Re: Add A Glossary
> > > * Transaction - yes, all those things could be "visible" or they could be > "side effects". It may be best to leave the over-simplified definition in > place, and add a "For more information see <<tutorial-transactions>>" transaction-iso would be a better linkref in this case.
Re: Add A Glossary
> > It will be helpful for diff-ing to restrict the length of lines in the > SGML files to 71 characters (as usual). I did it that way for the following reasons: 1. It aids grep-ability 2. The committers seem to be moving towards that for SQL strings, mostly for reason #1 3. I recall that the code is put through a linter as one of the final steps before release, and I assumed that the SGML gets the same. 4. Even if #3 is false, it's easy enough for me to do manually for this one file once we've settled on the text of the definitions. As for the changes, most things seem fine, I specifically like: * Checkpoint - looks good * yes, PGDATA should have been a literal * Partition - the a/b split works for me * Unlogged - it reads better Things I'm not so sure on / responses to your ???s: * The statement that names of schema objects are unique isn't *strictly* true, just *mostly* true. Take the case of a unique constraint. The constraint has a name and the unique index has the same name, to the point where adding a unique constraint using an existing index renames that index to conform to the constraint name. * Serializable "other way around" question - It's both. Outside the transaction you can't see changes made inside another transaction (though you can be blocked by them), and inside serializable you can't see any changes made since you started. Does that make sense? Were you asking a different question? * Transaction - yes, all those things could be "visible" or they could be "side effects". It may be best to leave the over-simplified definition in place, and add a "For more information see <>
Re: Add A Glossary
On Wed, Mar 11, 2020 at 12:50 PM Jürgen Purtz wrote: > I made changes on top of 0001-add-glossary-page.patch which was supplied > by C. Huinker. This affects not only terms proposed by me but also his > original terms. If my changes are not obvious, please let me know and I > will describe my motivation. > > Please note especially lines marked with question marks. > > It will be helpful for diff-ing to restrict the length of lines in the > SGML files to 71 characters (as usual). > > J. Purtz > A new person replied off-list with some suggested edits, all of which seemed pretty good. I'll incorporate them myself if that person chooses to remain off-list.
Re: Add A Glossary
This latest version is an attempt at merging the work of Jürgen Purtz into what I had posted earlier. There was relatively little overlap in the terms we had chosen to define. Each glossary definition now has a reference id (good idea Jürgen), the form of which is "glossary-term". So we can link to the glossary from outside if we so choose. I encourage everyone to read the definitions, and suggest fixes to any inaccuracies or awkward phrasings. Mostly, though, I'm seeking feedback on the structure itself, and hoping to get that committed. On Tue, Feb 11, 2020 at 11:22 PM Corey Huinker wrote: > It seems like this could be a good idea, still the patch has been >> waiting on his author for more than two weeks now, so I have marked it >> as returned with feedback. >> > > In light of feedback, I enlisted the help of an actual technical writer > (Roger Harkavy, CCed) and we eventually found the time to take a second > pass at this. > > Attached is a revised patch. > > From 690473e51fc442c55c1744f69813795fce9d22dc Mon Sep 17 00:00:00 2001 From: coreyhuinker Date: Tue, 10 Mar 2020 11:26:29 -0400 Subject: [PATCH] add glossary page --- doc/src/sgml/filelist.sgml |1 + doc/src/sgml/glossary.sgml | 1008 doc/src/sgml/postgres.sgml |1 + 3 files changed, 1010 insertions(+) create mode 100644 doc/src/sgml/glossary.sgml diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml index 3da2365ea9..504c8a6326 100644 --- a/doc/src/sgml/filelist.sgml +++ b/doc/src/sgml/filelist.sgml @@ -170,6 +170,7 @@ + diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml new file mode 100644 index 00..851e9debe6 --- /dev/null +++ b/doc/src/sgml/glossary.sgml @@ -0,0 +1,1008 @@ + + Glossary + + This is a list of terms and their in the context of PostgreSQL and Databases in general. + + + +Aggregate + + + To combine a collection of data values into a single value, whose value may not be of the same type as the original values. 
Aggregate Functions combine multiple Rows that share a common set of values into one Row, which means that the only data visible in the values in common, and the aggregates of the non-common data. + + + For more information, see Aggregate Functions. + + + + + +Analytic + + + A Function whose computed value can reference values found in nearby Rows of the same Result Set. + + + For more information, see Window Functions. + + + + + +Archiver + + + A process that backs up WAL Files in order to reclaim space on the file system. + + + For more information, see Backup and Restore: Continuous Archiving and Point-in-Time Recovery (PITR). + + + + + +Atomic + + + In reference to the value of an Attribute or Datum: cannot be broken up into smaller components. + + + In reference to an operation: An event that cannot be completed in part: it must either entirely succeed or entirely fail. A series of SQL statements can be combined into a Transaction, and that transaction is said to be Atomic. + + + + + +Attribute + + + A typed data element found within a Tuple or Relation or Table. + + + + + +Autovacuum + + + Processes that remove outdated MVCC Records of the Heap and Index. + + + For more information, see Routine Database Maintenance Tasks: Routine Vacuuming. + + + + + +Backend Process + + + Processes of an Instance which act on behalf of client Connections and handle their requests. + + + (Don't confuse this term with the similar terms Background Worker or Background Writer). + + + + + +Backend Server + + + See Instance. + + + + + +Background Worker + + + Individual processes within an Instance, which run system- or user-supplied code. Typical use cases are processes which handle parts of an SQL query to take advantage of parallel execution on servers with multiple CPUs. + + + For more information, see Background Worker Processes. + + + + + +Background Writer + + + Writes continuously dirty pages from Shared Memory to the file system. 
It starts periodically, but works only for a short period in order to distribute expensive I/O activity over time instead of generating a few large I/O peaks which could block other processes. + + + For more information, see Server Configuration: Resource Consumption. + + + + + +Cast + + + A conversion of a Datum from its current data type to another data type. + + + + + +Catalog + + + The SQL standard uses this standalone term to indicate
Re: Resolving the python 2 -> python 3 mess
> > So, as with Jesse's example, what I'm wondering is whether or not 2to3 > will fix that for you (or even flag it). The basic difference between > the two alternatives I suggested is whether we force people to put their > python function through that converter before we'll even try to run it. > Subtleties that 2to3 doesn't catch seem like non-reasons to insist on > applying it. > The 2018 vintage of 2to3 didn't catch it. It's not firsthand knowledge, but I just watched a nearby team have some production issues where one library couldn't fetch b'http://foo.org' so I'm guessing 2to3 still doesn't catch those things, or they stopped using it.
Re: Resolving the python 2 -> python 3 mess
> > A possible gotcha in this approach is if there are any python 2/3 > incompatibilities that would not manifest as syntax errors or > obvious runtime errors, but would allow old code to execute and > silently do the wrong thing. One would hope that the Python crowd > weren't dumb enough to do that, but I don't know whether it's true. > If there are nasty cases like that, maybe what we have to do is allow > plpythonu/plpython2u functions to be dumped and reloaded into a > python-3-only install, but refuse to execute them until they've > been converted. > Unfortunately, I think there are cases like that. The shift to Unicode as the default string means that some functions that used to return a `str` now return a `bytes` (I know of this in the hashlib and base64 modules, but probably also in URL request data and others), and to use a `bytes` in string manipulation you have to first explicitly convert it to some string encoding. So things like a function that wraps around a python crypto library would be the exact places where those was-str-now-bytes functions would be used.
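A minimal sketch of the silent behavior change described above. The hashlib/base64 calls are standard library, but the surrounding header-building scenario is hypothetical; under Python 2 these results were str and could be concatenated directly, while under Python 3 they are bytes and must be explicitly decoded:

```python
import base64
import hashlib

# Both of these return bytes under Python 3 (they were str under Python 2).
digest = hashlib.sha256(b"some payload").digest()
token = base64.b64encode(digest)

# Python 2 code could write "X-Auth: " + token directly; under Python 3
# that raises TypeError until the bytes are decoded to a str.
header = "X-Auth: " + token.decode("ascii")
```

Code that stored or compared such a token as a string would not fail at 2to3 time; it would fail (or silently mismatch) at runtime, which is exactly the class of problem that makes a refuse-until-converted policy attractive.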
Re: Add A Glossary
> > It seems like this could be a good idea, still the patch has been > waiting on his author for more than two weeks now, so I have marked it > as returned with feedback. > In light of feedback, I enlisted the help of an actual technical writer (Roger Harkavy, CCed) and we eventually found the time to take a second pass at this. Attached is a revised patch. From f087e44fe4db7996880cf4df982297018d444363 Mon Sep 17 00:00:00 2001 From: Corey Huinker Date: Wed, 12 Feb 2020 04:17:59 + Subject: [PATCH] add glossary page with initial definitions --- doc/src/sgml/filelist.sgml | 1 + doc/src/sgml/glossary.sgml | 540 + doc/src/sgml/postgres.sgml | 1 + 3 files changed, 542 insertions(+) create mode 100644 doc/src/sgml/glossary.sgml diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml index 3da2365ea9..504c8a6326 100644 --- a/doc/src/sgml/filelist.sgml +++ b/doc/src/sgml/filelist.sgml @@ -170,6 +170,7 @@ + diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml new file mode 100644 index 00..1b881690fa --- /dev/null +++ b/doc/src/sgml/glossary.sgml @@ -0,0 +1,540 @@ + + + + Glossary + + This is a list of terms and their definitions in the context of PostgreSQL and databases in general. + + + +Aggregate + + + The act of combining a defined collection of data values into a single value that may not be the same type as the original values. Aggregate functions are most often used with Grouping operations which define the separate sets of data by the common values shared within those sets. + + + + + +Analytic + + + A function whose computed value can reference values found in nearby rows of the same result set. + + + + + +Atomic + + + When referring to the value of an attribute or datum: cannot be broken up into smaller components. + + + When referring to an operation: An event that cannot be partially completed; it must either completely succeed or completely fail. 
A series of SQL statements can be combined into a transaction, and that transaction is described as atomic. + + + + + +Attribute + + + A typed data element found within a tuple or relation or table. + + + + + +BYTEA + + + A data type for storing binary data. It is roughly analogous to the BLOB data type in other database products. + + + + + +Cast + + + The act of converting a datum from its current data type to another data type. + + + + + +Check Constraint + + + A type of constraint defined for a relation which restricts the values allowed in one or more attributes. The check constraint can make reference to any attribute in the relation, but cannot reference other rows of the same relation or other relations. + + + + + +Column + + + An attribute found in a table or view. + + + + + +Commit + + + The act of finalizing a transaction within the database. + + + + + +Concurrency + + + The concept that multiple independent operations can be happening within the database at the same time. + + + + + +Constraint + + + A method of restricting the values of data allowed within a relation. Constraints can currently be of the following types: Check Constraint, Unique Constraint, and Exclusion Constraint. + + + + + +Datum + + + The internal representation of a SQL datatype. + + + + + +Delete + + + A SQL command that removes rows from a given table or relation. + + + + + +Exclusion Constraint + + + Exclusion constraints define both a set of columns for matching rows, and rules where values in one row would conflict with values in another. + + + + + +Foreign Data Wrapper + + + A means of representing data outside the local database so that it appears as if it were in local tables. With a Foreign Data Wrapper it is possible to define a Foreign Server and Foreign Tables. + + + + + +Foreign Key + + + A type of constraint defined on one or more columns in a table which requires the value in those columns to uniquely identify a row in the specified table. 
+ + + + + +Foreign Server + + + A named collection of Foreign Tables which all use the same Foreign Data Wrapper and have other configured attributes in common. + + + + + +Foreign Table + + + A relation which appears to have rows and columns like a regular table, but when queried will instead forward the request for data through its Foreign Data Wrapper, which will return results structured according to the definition
Add Change Badges to documentation
Attached is a patch to implement change badges in our documentation. What's a change badge? It's my term for a visual cue in the documentation used to indicate that the nearby section of documentation is new in this version or otherwise changed from the previous version. One example of change badges being used is in the DocBook documentation reference: https://tdg.docbook.org/tdg/4.5/ref-elements.html#common.attributes DocBook used graphical badges, which seemed to be a bad idea. Instead, I went with a decorated text span like one finds in gmail labels or Reddit "flair". The badges are implemented by using the "revision" attribute available on all docbook tags. All one needs to do to indicate a change is to change one tag, and add a revision attribute. For example: will add a small green text box with the text "new in 13" immediately preceding the rendered elements. I have attached a screenshot (badges_in_acronyms.png) of an example of this from my browser viewing changes to the acronyms.html file. This obviously lacks the polish of viewing the page on a full website, but it does give you an idea of the flexibility of the change badge, and where badge placement is (and is not) a good idea. What are the benefits of using this? I think the benefits are as follows: 1. It shows a casual user what pieces are new on that page (new functions, new keywords, new command options, etc). 2. It also works in the negative: a user can quickly skim a page, and lacking any badges, feel confident that everything there works in the way that it did in version N-1. 3. It also acts as a subtle cue for the user to click on the previous version to see what it used to look like, confident that there *will* be a difference on the previous version. How would we implement this? 1. All new documentation pages would get a "NEW" badge in their title. 2. New function definitions, new command options, etc would get a "NEW" badge as visually close to the change as is practical. 3. 
Changes to existing functions, options, etc. would get a badge of "UPDATED" 4. At major release time, we could do one of two things: 4a. We could keep the NEW/UPDATED badges in the fixed release version, and then completely remove them from the master, because for version N+1, they won't be new anymore. This can be accomplished with an XSL transform looking for any tag with the "revision" attribute 4b. We could code in the version number at release time, and leave it in place. So in version 14 you could find both "v13" and "v14" badges, and in version 15 you could find badges for 15, 14, and 13. At some point (say v17), we start retiring the v13 badges, and in v18 we'd retire the v14 badges, and so on, to keep the clutter to a minimum. Back to the patch: I implemented this only for html output, and the colors I chose are very off-brand for postgres, so that will have to change. There's probably some spacing/padding issues I haven't thought of. Please try it out, make some modifications to existing document pages to see how badges would work in those contexts. 
From ded965fc90b223a834ac52d55512587b7a6ea139 Mon Sep 17 00:00:00 2001 From: Corey Huinker Date: Fri, 18 Oct 2019 06:15:10 -0400 Subject: [PATCH] add document change badges --- doc/src/sgml/acronyms.sgml | 6 +++--- doc/src/sgml/stylesheet-html-common.xsl | 10 ++ doc/src/sgml/stylesheet.css | 10 ++ 3 files changed, 23 insertions(+), 3 deletions(-) diff --git a/doc/src/sgml/acronyms.sgml b/doc/src/sgml/acronyms.sgml index f638665dc9..87bfef04be 100644 --- a/doc/src/sgml/acronyms.sgml +++ b/doc/src/sgml/acronyms.sgml @@ -10,7 +10,7 @@ -ANSI +ANSI https://en.wikipedia.org/wiki/American_National_Standards_Institute;> @@ -19,7 +19,7 @@ - + API @@ -31,7 +31,7 @@ ASCII - + https://en.wikipedia.org/wiki/Ascii;>American Standard Code for Information Interchange diff --git a/doc/src/sgml/stylesheet-html-common.xsl b/doc/src/sgml/stylesheet-html-common.xsl index 9edce52a10..cb04cb7f0d 100644 --- a/doc/src/sgml/stylesheet-html-common.xsl +++ b/doc/src/sgml/stylesheet-html-common.xsl @@ -289,4 +289,14 @@ set toc,title + + + + + + + + + + diff --git a/doc/src/sgml/stylesheet.css b/doc/src/sgml/stylesheet.css index 1a66c789d5..d0cae2f59f 100644 --- a/doc/src/sgml/stylesheet.css +++ b/doc/src/sgml/stylesheet.css @@ -109,3 +109,13 @@ acronym { font-style: inherit; } width: 75%; } } + +/* version badge styling */ +span.revision-badge { + visibility: visible ; +color: white; + background-color: #00933C; + border: 1px solid #00; + border-radius: 2px; +padding: 1px; +} -- 2.14.1
Add A Glossary
Attached is a v1 patch to add a Glossary to the appendix of our current documentation. I believe that our documentation needs a glossary for a few reasons: 1. It's hard to ask for help if you don't know the proper terminology of the problem you're having. 2. Readers who are new to databases may not understand a few of the terms that are used casually both in the documentation and in forums. This helps to make our documentation a bit more useful as a teaching tool. 3. Readers whose primary language is not English may struggle to find the correct search terms, and this glossary may help them grasp that a given term has a usage in databases that is different from common English usage. 3b. If we are not able to find the resources to translate all of the documentation into a given language, translating the glossary page would be a good first step. 4. The glossary would be web-searchable, and draw viewers to the official documentation. 5. adding link anchors to each term would make them cite-able, useful in forum conversations. A few notes about this patch: 1. It's obviously incomplete. There are more terms, a lot more, to add. 2. The individual definitions supplied are off-the-cuff, and should be thoroughly reviewed. 3. The definitions as a whole should be reviewed by an actual tech writer (one was initially involved but had to step back due to prior commitments), and the definitions should be normalized in terms of voice, tone, audience, etc. 4. My understanding of DocBook is not strong. The glossary vs glosslist tag issue is a bit confusing to me, and I'm not sure if the glossary tag is even appropriate for our needs. 5. I've made no effort at making each term an anchor, nor have I done any CSS styling at all. 6. I'm not quite sure how to handle terms that have different definitions in different contexts. Should that be two glossdefs following one glossterm, or two separate def/term pairs? Please review and share your thoughts. 
From 343d5c18bf23f98341b510595e3e042e002242cb Mon Sep 17 00:00:00 2001 From: Corey Huinker Date: Sun, 13 Oct 2019 17:57:36 + Subject: [PATCH] add glossary page with sample terms and definitions --- doc/src/sgml/filelist.sgml | 1 + doc/src/sgml/glossary.sgml | 618 doc/src/sgml/stylesheet.css | 2 + 3 files changed, 621 insertions(+) create mode 100644 doc/src/sgml/glossary.sgml diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml index 3da2365ea9..504c8a6326 100644 --- a/doc/src/sgml/filelist.sgml +++ b/doc/src/sgml/filelist.sgml @@ -170,6 +170,7 @@ + diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml new file mode 100644 index 00..016eee2d76 --- /dev/null +++ b/doc/src/sgml/glossary.sgml @@ -0,0 +1,618 @@ + + + + Glossary + + + This is a list of terms and their definitions in the context of PostgreSQL and databases in general. + + + + +Aggregate + + + To combine a collection of data values into a single value, whose value +may not be of the same type as the original values. Aggregate functions combine +multiple rows that share a common set of values into one row, which means that +the only data visible are the values in common, and the aggregates of the +non-common data. + + + + + +Analytic + + + A function whose computed value can reference values found in nearby rows +of the same result set. + + + + + +Atomic + + + In reference to the value of an Attribute or Datum: cannot be broken up +into smaller components. + + + In reference to an operation: An event that cannot be completed in part: +it must either entirely succeed or entirely fail. A series of SQL statements can +be combined into a Transaction, and that transaction is said to be Atomic. + + + + + +Attribute + + + A typed data element found within a Tuple or Relation or Table. + + + + + +Cast + + + A conversion of a Datum from its current data type to another data type. 
+ + + + + +Check Constraint + + + A type of constraint defined on a relation which restricts the values +allowed in one or more Attributes. The check constraint can make reference to +any Attribute in the Relation, but cannot reference other rows of the same +relation or other relations. + + + + + +Column + + + An Attribute found in a Table or View. + + + + + +Commit + + + The act of finalizing a Transaction within the database. + + + + + +Concurrency + + + The concept that multiple independent operations can be happening within +the database at the same time. + + + + + +Constraint + + + A method of restricting the values of data allowed within a Table. + + + + + +Datum + + + The internal
Re: Extending range type operators to cope with elements
> > > >- @> contains range/element > >- <@ element/range is contained by > I'm not a heavy user of range types, so I can't really judge how useful > that is in practice, but it seems like a fairly natural extension of the > existing operators. I mean, if I understand it correctly, the proposed > behavior is equal to treating the element as a "collapsed range". > I used to give a talk on ranges and partitioning, prior to postgresql getting native partitioning (see: https://wiki.postgresql.org/images/1/1b/Ranges%2C_Partitioning_and_Limitations.pdf ) In that talk, I mention the need for exactly these operators, specifically for an extension called range_partitioning which had some logic for "If I were to insert a row with this value, what partition would it end up in?" which allowed for a subsequent COPY operation directly to that partition. That logic essentially binary-searched a series of ranges, so it needed an "elem <@ range" as well as << and >>. Yes, constructing a collapsed range was the work-around I used in the absence of real functions. That extension has been replaced by real table partitioning, and the planner itself now does similar logic for partition pruning. So yes, I've had a need for those operators in the past. What I don't know is whether adding these functions will be worth the catalog clutter.
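The partition-routing lookup described above can be sketched in a few lines. This is an illustrative reconstruction, not the actual range_partitioning extension code, and the partition bounds are made up:

```python
from bisect import bisect_right

# Hypothetical disjoint, sorted, half-open partition bounds [low, high).
partitions = [(0, 10, "p0"), (10, 100, "p1"), (100, 1000, "p2")]
lows = [low for low, _, _ in partitions]

def route(value):
    """Binary-search for the range containing value: the 'elem <@ range'
    probe, with None standing in for values that fall before, between,
    or after all ranges (the << and >> cases)."""
    i = bisect_right(lows, value) - 1
    if i < 0:
        return None  # value sorts before every range
    low, high, name = partitions[i]
    return name if value < high else None
```

With real element/range operators, each probe in the binary search becomes a single `elem <@ range` test instead of constructing a collapsed range per comparison.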
Re: Referential Integrity Checks with Statement-level Triggers
> > > > The people who expressed opinions on nuking triggers from orbit (it's > the only way to be sure) have yet to offer up any guidance on how to > proceed from here, and I suspect it's because they're all very busy getting > things ready for v12. I definitely have an interest in working on this for > 13, but I don't feel good about striking out on my own without their input. > > Very interesting thread, but the current patch has been through two > CFs without comments or new patches, so I'm going to mark it "Returned > with feedback". I hope all this discussion will trigger more research > in this space. > I've noticed that the zedstore efforts ran into the same problem that refactoring triggers has: we cannot determine which columns in a table will be affected by a trigger, so we have to assume that all of them will be. This causes a lot of unnecessary overhead with triggers. If we had a compilation step for triggers (which, ultimately, means a compilation step for procedures) which kept a dependency tree of which tables/columns were touched, then we would have that insight. It's true that one dynamic statement or SELECT * would force us right back to keep-everything, but if procedures which did not do such things had performance benefits, that would be an incentive to code them more fastidiously.
Re: \describe*
> > It seems this topic is ongoing so I've moved it to the September CF, > but it's in "Waiting on Author" because we don't have a concrete patch > that applies (or agreement on what it should do?) right now. > All recent work has been investigating the need(s) we're trying to address. This is as good of a time as any to share my findings (with much collaboration with Dave Fetter) so far. 1. Adding helper commands to psql aids only psql, and a great number of users do not, or can not, use psql. So adding something on the server side would have broader usage and appeal. Furthermore, some access tools (especially browser-based ones) are not good about returning non-tabular results, so helper commands that return result sets would have the broadest usage. 2. Our own interest in server-side commands is all over the map. Some just want the convenience of having them server side, or familiarity with $OTHER_DB. Others want to eliminate the need for some code in pg_dump, JDBC, or elsewhere. 3. There isn't much consensus in the other databases, though all of them do *something*: SQLServer --- SQLServer has sp_help ( https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sp-help-transact-sql?view=sql-server-2017 ) which contextually returns one of two different result sets (name, owner, object type) or (column name, type, storage, length, precision, scale, nullable, default, rule, collation) DB2 -- Has a describe command (source: https://www.ibm.com/support/knowledgecenter/SSEPGG_11.1.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0002019.html) which can be used to describe query output (data type, data type length, column name, column name length). 
It also has an option to DESCRIBE TABLE foo which returns a set of (col_name, schema_of_datatype, data_type, data_type_length, data_type_scale, Nulls t/f) It also has DESCRIBE INDEXES FOR TABLE foo which returns a set of (schema of index, name of index, unique flag, number of columns, index type) It also has DESCRIBE DATA PARTITIONS FOR TABLE which as you might guess shows partitions. All of these options have a SHOW DETAIL modifier which adds more columns. MySQL -- (https://dev.mysql.com/doc/refman/8.0/en/show-columns.html) MySQL has SHOW COLUMNS which also returns a set of (name, type similar to format_type(), null flag, PK or index indicator, default value, notes about auto-increment/auto-generation/implicit triggers), and can be extended to show privileges and comments with the EXTENDED and FULL options. MySQL has a DESCRIBE command, but it is a synonym of EXPLAIN. MySQL also has a raft of commands like SHOW CREATE USER, SHOW CREATE VIEW, SHOW CREATE TRIGGER, SHOW CREATE TABLE, etc. (ex: https://dev.mysql.com/doc/refman/8.0/en/show-create-user.html) These commands all return a result set of exactly one column, each row representing one SQL statement, essentially doing a single-object schema-only pg_dump. Oracle - https://docs.oracle.com/cd/B19306_01/server.102/b14357/ch12019.htm SQL*Plus has a describe command that works on tables and views and composite types (tabular set of: name, null, type), procedures (tabular set of: arg name, type, in/out), and packages (a series of sets, one per type and procedure). SQLcl has the INFO statement, which is roughly analogous to psql's \d in that it is a mix of tabular and non-tabular information. Oracle itself has dbms_metadata.get_ddl() which seems analogous to mysql's SHOW CREATE commands. 
Snowflake -- Snowflake has DESCRIBE TABLE https://docs.snowflake.net/manuals/sql-reference/sql/desc-table.html and DESCRIBE VIEW https://docs.snowflake.net/manuals/sql-reference/functions/get_ddl.html which return a set of: (name, type, column type, null flag, default, primary key, unique key, check, expression, comment). It also has an option for describing "stage" tables, which are s3 buckets with a file format associated; the closest postgresql analog would be a file_fdw foreign table, and there is a separate result set format for that. Snowflake has no concept of indexes (it understands that there are things called unique keys, and it remembers that you said you wanted one, but does nothing to enforce it), so no command for that. These result sets are not composable in a query; however, they are stored in the RESULT_SCAN cache, which means that you can run a describe, and then immediately fetch the results of that command as if it was a table. Snowflake also has a get_ddl() function https://docs.snowflake.net/manuals/sql-reference/sql/desc-view.html which is a one-column result set of statements to re-create the given object. From all this, I have so far concluded: 1. There is real demand to be able to easily see the basic structure of tables, views, and indexes in a way that strikes a balance between detail and clutter. 2. There is some acknowledgement that this data would be useful if it was further filtered through SQL, though only one vendor
Re: Catching missing Datum conversions
> > I should probably split this into "actionable" (categories 3 and 4) > and "noise and scaffolding" patches. > Breaking down the noise-and-scaffolding into some subgroups might make the rather long patches more palatable/exceedingly-obvious: * (Datum) 0 ---> NullDatum * 0 > NullDatum * The DatumGetPointer(allParameterTypes) null tests Having said that, everything you did seems really straightforward, except for src/backend/rewrite/rewriteDefine.c src/backend/statistics/mcv.c src/backend/tsearch/ts_parse.c and those seem like cases where the DatumGetXXX was a no-op before Datum was a struct.
Re: SHOW CREATE
On Fri, Jul 5, 2019 at 12:32 PM David Fetter wrote: > Folks, > > Corey Huinker put together the documentation for this proposed > feature. Does this seem like a reasonable way to do it? > > In doing that work, it became clear that the command was serving two masters: 1. A desire to see the underlying nuts and bolts of a given database object. 2. A desire to essentially make the schema portion of pg_dump a server side command. To that end, I see splitting this into two commands, SHOW CREATE and SHOW DUMP. SHOW DUMP would be the original command minus the object type and object name specifier, and it would dump the entire current database as seen from the current user (again, no data). SHOW CREATE would still have all the object_type parameters as before, but would only dump the one specified object, plus any dependent objects specified in the WITH options (comments, grants, indexes, constraints, partitions, all). Please note that any talk of a server side DESCRIBE is separate from this. That would be a series of commands that would have result sets tailored to the object type, and each one would be an inherent compromise between completeness and readability. I'd like to hear what others have to say, and incorporate that feedback into a follow up proposal.
Re: \describe*
> > > So what is the uptake on implementing this at the server side, ie. > > DESCRIBE? > > I'm pretty skeptical of this idea, unless you are willing to throw > away at least one and possibly both of the following goals: > > 1. Compatibility with psql's existing \d behavior. > I don't think *compatibility* with the behavior should be a goal in itself. Coverage of the majority of the use-cases is. 2. Usability of DESCRIBE for any purpose whatsoever other than emitting > something that looks just like what psql prints. > > We've migrated many of the \d displays so far away from "a single query > result" that I don't believe there's a way for a server command to > duplicate them, at least not without some seriously unholy in-bed-ness > between the server command and some postprocessing logic in describe.c. > (At which point you've lost whatever system architectural value there > might be in the whole project, since having a more-arm's-length > relationship there kinda seems like the point to me.) > I think there's a genuine use for regular printed output, and there's also a use for a query-able output. Maybe that queryable output is just a JSONB output that the outer query can pick apart as it sees fit, and that would handle the fact that the data often doesn't fit into a single query's output. Incidentally, I had need of this very functionality in Snowflake the other day. The data dictionary there isn't capable of telling you which columns are in a primary key, but that information is printed when you run "DESCRIBE my_table". The workaround is to run "DESCRIBE my_table" and then make another query using a table function to recall the output of the last query made in the session, and then filter that. Yeah, as a pattern it's weird and sad, but it shows that there are uses for something DESCRIBE-ish on the server side. So if we're going server-side on DESCRIBE, it should be its own entity, not beholden to design decisions made in psql. 
> There are a bunch of other little behavioral differences that you just > can't replicate server-side, like the fact that localization of the > results depends on psql's LC_MESSAGES not the server's. Maybe people > would be okay with changing that, but it's not a transparent > reimplementation. > I think people would be OK with that. We're asking the server what it knows about an object, not how psql feels about that same information. I think if we want to have server-side describe capability, we're better > off just to implement a DESCRIBE command that's not intended to be exactly > like \d anything, and not try to make it be the implementation for \d > anything. (This was, in fact, where David started IIUC. Other people's > sniping at that idea hasn't yielded any better idea.) > I'm very much in support of server-side DESCRIBE that's not beholden to \d in any way. For instance, I'm totally fine with DESCRIBE not being able to handle wildcard patterns. My initial suggestion for client-side \describe was mostly borne of it being easy to implement a large subset of the \d commands to help users. Not all users have psql access, so having a server side command helps more people. It could be that we decide that DESCRIBE is set-returning, and we have to break up \d functionality to suit. By this I mean that we might find it simpler to require DESCRIBE TABLE foo to only show columns with minimal information about PKs and follow up commands like "DESCRIBE TABLE foo INDEXES" or "DESCRIBE TABLE foo CONSTRAINTS" to keep output in tabular format. > In particular, I'm really strongly against having "\describe-foo-bar" > invoke DESCRIBE, because (a) that will break compatibility with the > existing \des command, and (b) it's not actually saving any typing, > and (c) I think it'd confuse users no end. > +1. Having psql figure out which servers can give proper server-side describes would boggle the mind. 
> Of course, this line of thought does lead to the conclusion that we'd be > maintaining psql/describe.c and server-side DESCRIBE in parallel forever, > Not fun, but what's our motivation for adding new new \d functionality once a viable DESCRIBE is in place? Wouldn't the \d commands essentially be feature-frozen at that point? > which doesn't sound like fun. But we should be making DESCRIBE with an > eye to more use-cases than psql. If it allows jdbc to not also maintain > a pile of equivalent code, that'd be a win. If it allows pg_dump to toss > a bunch of logic overboard (or at least stop incrementally adding new > variants), that'd be a big win. > I don't know enough about JDBC internals to know what sort of non-set results it can handle, but that seems key to showing us how to proceed. As for pg_dump, that same goal was a motivation for a similar server-side command "SHOW CREATE " (essentially, pg_dump of ) which would have basically the same design issues as DESCRIBE would, though the result set would be a much simpler SETOF text.
Re: Extracting only the columns needed for a query
> > The thing that most approaches to this have fallen down on is triggers --- > that is, a trigger function might access columns mentioned nowhere in the > SQL text. (See 8b6da83d1 for a recent example :-() If you have a plan > for dealing with that, then ... > Well, if we had a trigger language that compiled at creation time, and that trigger didn't do any dynamic/eval code, we could store which attributes and rels were touched inside the trigger. I'm not sure if that trigger language would be sql, plpgsql with a "compile" pragma, or maybe we exhume PSM, but it could have some side benefits: 1. This same issue haunts any attempts at refactoring triggers and referential integrity, so narrowing the scope of what a trigger touches will help there too 2. additional validity checks 3. (this is an even bigger stretch) possibly a chance to combine multiple triggers into one statement, or combine multiple row-based triggers into a statement level trigger Of course, this all falls apart with one dynamic SQL or one SELECT *, but it would be incentive for the users to refactor code to not do things that impede trigger optimization.
Re: PostgreSQL 12 Beta 1 press release draft
For CTEs, is forcing inlining the example we want to give, rather than the example of forcing materialization that was given? According to the docs, virtual generated columns aren't yet supported. I'm pretty sure the docs are right. Do we still want to mention it? Otherwise it looks good to me. On Tue, May 21, 2019 at 11:39 PM Jonathan S. Katz wrote: > Hi, > > Attached is a draft of the PG12 Beta 1 press release that is going out > this Thursday. The primary goals of this release announcement are to > introduce new features, enhancements, and changes that are available in > PG12, as well as encourage our users to test and provide feedback to > help ensure the stability of the release. > > Speaking of feedback, please provide me with your feedback on the > technical correctness of this announcement so I can incorporate changes > prior to the release. > > Thanks! > > Jonathan >
Re: Table as argument in postgres function
>> Is there anything preventing us from having the planner resolve object
>> names from strings?
>
> The basic problem is the fact that when you use the PREPARE/EXECUTE
> protocol, you do not have parameter values at planning time.

I agree that it defeats PREPARE as it is currently implemented with PQprepare(), and it would never be meaningful to have a query plan that hasn't finalized which objects are involved. But could it be made to work with PQexecParams(), where the parameter values are already provided? Could we make a version of PQprepare() that takes an extra array of paramValues for object names that must be supplied at prepare-time?
Re: Table as argument in postgres function
> You can pass the table name as text, or the table's object id as the
> regclass type. Inside the procedure you should use dynamic SQL - an
> EXECUTE statement. Generally you cannot use a variable as a table or
> column name, ever. Dynamic SQL is a different mechanism - pay attention
> to SQL injection.

On this note, Snowflake has the ability to parameterize object names (see: https://docs.snowflake.net/manuals/sql-reference/identifier-literal.html ). So you can do things like

    SELECT col_a, col_b FROM identifier('a_table_name')

or, as a bind variable:

    SELECT col_a, col_b FROM identifier($1)

which is their way of avoiding SQL injection attacks in *some* circumstances. Their implementation of it is a bit uneven, but it has proven useful for my work. I can see where this obviously would prevent the planning of a prepared statement when a table name is a parameter, but the request comes up often enough, and the benefits of avoiding SQL injection attacks are significant enough, that maybe we should try to enable it for one-off queries. I don't necessarily think we need an identifier(string) function; a 'schema.table'::regclass would be more our style. Is there anything preventing us from having the planner resolve object names from strings?
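For comparison, the closest PostgreSQL equivalent today is dynamic SQL inside a plpgsql function, where a regclass parameter both validates the name and quotes it safely (the function and table names below are made up for illustration):

```sql
-- A sketch of safe identifier "parameterization" as it exists today:
-- the regclass cast validates that the relation exists, and its text
-- output is already schema-qualified and quoted as needed, so
-- interpolating it with format('%s') does not open an injection hole.
CREATE FUNCTION count_rows(tab regclass) RETURNS bigint AS $$
DECLARE
    result bigint;
BEGIN
    EXECUTE format('SELECT count(*) FROM %s', tab) INTO result;
    RETURN result;
END $$ LANGUAGE plpgsql;

-- Usage: the caller passes the name as text and the cast resolves it.
-- SELECT count_rows('a_schema.a_table');
```

This gets the injection safety, but (as noted above) the query inside is still planned per-EXECUTE, not as a prepared statement.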
Re: range_agg
> One question is how to aggregate ranges that would leave gaps and/or
> overlaps. So in my extension there is a one-param version that forbids
> gaps & overlaps, but I let you permit them by passing extra parameters,
> so the signature is:

Perhaps a third way would be to allow and preserve the gaps. A while back I wrote an extension called disjoint_date_range for storing sets of dates where it was assumed that most dates would be contiguous. The basic idea was that the core datatype was an array of ranges of dates, and with every modification you'd unnest them all to their discrete elements and use a window function to identify "runs" of dates and recompose them into a canonical set. It was an efficient way of representing "Every day last year except for June 2nd and August 4th, when we closed business for special events." For arrays of ranges the principle is the same, but it'd get a bit more tricky: you'd have to order by low bound, use window functions to detect adjacency/overlap to identify your runs, and then generate the canonical minimum set of ranges in your array.
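The "runs of dates" recomposition described above is the classic gaps-and-islands technique. A minimal sketch (the input array is invented for the example):

```sql
-- Recompose discrete dates into a canonical minimal set of ranges.
-- Consecutive dates share the same (date - row_number) value, so grouping
-- by that difference identifies each contiguous run.
WITH days(d) AS (
    SELECT unnest(ARRAY['2020-06-01','2020-06-02','2020-06-03',
                        '2020-06-05']::date[])
), runs AS (
    SELECT d,
           d - (row_number() OVER (ORDER BY d))::int AS run_id
    FROM days
)
SELECT daterange(min(d), max(d), '[]') AS run
FROM runs
GROUP BY run_id
ORDER BY 1;
```

The '[]' bounds are canonicalized by daterange into half-open form, so the two runs come back as [2020-06-01,2020-06-04) and [2020-06-05,2020-06-06). The trickier array-of-ranges case would replace row_number with a lag()-based comparison of each range's lower bound against the running maximum upper bound.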
Re: [PATCH v1] Add \echo_stderr to psql
> > >\warn ... >\warning ... > These two seem about the best to me, drawing from the perl warn command. I suppose we could go the bash &2 route here, but I don't want to.
Re: DWIM mode for psql
On Sun, Mar 31, 2019 at 5:04 PM Andres Freund wrote: > On 2019-04-01 09:52:34 +1300, Thomas Munro wrote: > > +/* > > + * This program is free software: you can redistribute it and/or modify > > + * it under the terms of the GNU General Public License as published by > > + * the Free Software Foundation, either version 3 of the License, or > > + * (at your option) any later version. > > Indentation bug. You really need to work a bit more careful. > The patch applies cleanly, and passes "make check", but it generated an executable called "mongodb". Should I have run "make maintainer-clean" first?
Re: Syntax diagrams in user documentation
On Thu, Mar 28, 2019 at 6:49 PM Peter Geoghegan wrote:
> On Thu, Mar 28, 2019 at 3:46 PM Jeremy Schneider wrote:
> > We're just gearing up for the Google Season of Docs and I think this
> > would be a great task for a doc writer to help with. Any reason to
> > expect serious objections to syntax diagram graphics in the docs?
>
> It might be hard to come to a consensus, because it's one of those
> things that everybody can be expected to have an opinion on. It
> probably won't be hard to get something committed that's clearly more
> informative than what we have right now, though.
>
> There is a question about how we maintain consistency between the
> syntax diagrams in psql if we go this way, though. Not sure what to do
> about that.

This discussion is highly relevant to an upcoming talk I have called "In Aid Of RTFM", and the work I hope would follow from it. While I personally like these bubble charts because they remind me of my misspent youth at IBM, they have some drawbacks:

1. They look like something out of an IBM manual
2. Images conceal information from visually impaired people
3. They aren't copy-pasteable text
4. They aren't easily comparable
5. They bake in the language of the comments

The merits of #1 can be argued forever, and it's possible that a more modern bubble chart theme is possible.

#2 is problematic, because things like ADA compliance and the EU Accessibility Requirements frown upon conveying text inside images. The way around this might be to have the alt-text of the image be the original syntax as we have it now.

#3 is important when attempting to relay the relevant excerpt of a very large documentation page via email or slack. Yes, I could right click and copy the URL of the image (in this case https://www.sqlite.org/images/syntax/insert-stmt.gif and others), but that's more work than copy-paste. We could add an HTML anchor to each image (my talk discusses our current lack of reference anchors) and that would mitigate it somewhat.
Making the original text available via mouse-over or a "copy text" link might work too.

#3b: As long as I live, I will never properly memorize the syntax for RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. I will google this and copy-paste it. I suspect I'm not alone. If it's available only in an image, then I can't copy-paste, and I *will* mistype some part of that at least twice.

#4 isn't such an immediate issue, but one of my points in the talk is that right now there is no way to easily distinguish text on a page that is new in the most recent version of pgsql (i.e. a red-line markup). We could of course flag that an image changed from version X-1 to X, but it would be tougher to convey which parts of the image changed.

#5 is not such a big issue because most of what is in the diagram is pure syntax, but comments will leak in, and those snippets of English will be buried very deep in bubble-markup.
GIN indexes on an = ANY(array) clause
(moving this over from pgsql-performance)

A client had an issue with a query that had a WHERE clause something like this:

    WHERE 123456 = ANY(integer_array_column)

I was surprised that this didn't use the pre-existing GIN index on integer_array_column, whereas recoding it as

    WHERE ARRAY[123456] <@ integer_array_column

did cause the GIN index to be used. Is this a known/expected behavior? If so, is there any logical reason why we couldn't have the planner pick up on that?

Flo Rance (toura...@gmail.com) was nice enough to show that yes, this is expected behavior. Which leaves the questions:

- is the transformation I made algebraically correct in the general case?
- if so, could we have the planner do that automatically in the presence of a matching GIN index?

This seems like it might tie in with the enforcement of foreign keys within an array thread (which I can't presently find...).
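A minimal reproduction of the reported behavior might look like this (table and index names are invented; the EXPLAIN outcomes are as reported in this thread, not something I've re-verified here):

```sql
-- Hypothetical setup matching the report:
CREATE TABLE docs (id bigint, tags integer[]);
CREATE INDEX docs_tags_gin ON docs USING gin (tags);

-- Reported NOT to use the GIN index: "= ANY(array_col)" is not an
-- indexable operator expression for the array GIN operator class.
EXPLAIN SELECT * FROM docs WHERE 123456 = ANY (tags);

-- Reported to use the GIN index: <@ (is-contained-by) is one of the
-- operators the built-in array GIN opclass supports.
EXPLAIN SELECT * FROM docs WHERE ARRAY[123456] <@ tags;
```

The two predicates are equivalent for a one-element left-hand array, which is what makes the proposed automatic transformation plausible.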
Re: \describe*
On Mon, Mar 4, 2019 at 1:45 PM Corey Huinker wrote: > >>> - Tab completion for \descibe-verbose. >>> I know that \d+ tab completion is also not there, but I think we must >>> have tab completion for \descibe-verbose. >>> >>> postgres=# \describe- >>> \describe-extension >>> \describe-replication-publication \describe-user-mapping >>> \describe-foreign-data-wrapper >>> \describe-replication-subscription\describe-view >>> \describe-foreign-server \describe-role >>> \describe-window-function >>> \describe-foreign-table \describe-rule >>> ... >>> >> > I just confirmed that there isn't tab completion for the existing S/+ > options, so it's hard to justify them for the equivalent verbose suffixes. > We can add completions for describe[-thing-]-verbose, but the auto-completions start to run into combinatoric complexity, and the original short-codes don't do that completion, probably for the same reason. + success = >>> listTables("tvmsE", NULL, show_verbose, show_system); >>> + } >>> + status = >>> PSQL_CMD_UNKNOWN; >>> >>> > I'll look into this, thanks! > This was fixed, good find. > - Confusion about \desc and \desC >>> There is confusion while running the \desc command. I know the problem, >>> but the user may confuse by this. >>> postgres=# \desC >>>List of foreign servers >>> Name | Owner | Foreign-data wrapper >>> --+---+-- >>> (0 rows) >>> >>> postgres=# \desc >>> Invalid command \desc. Try \? for help. >>> >> I've changed the code to first strip out 0-1 instances of "-verbose" and "-system" and the remaining string must be an exact match of a describe command or it's an error. This same system could be applied to the short commands to strip out 'S' and '+' and it might clean up the original code a bit. This command shows a list of relation "\d" >>> postgres=# \describe-aggregatE-function >>> List of relations >>> Schema | Name | Type | Owner >>> +--+---+- >>> public | foo | table | vagrant >>> (1 row) >>> >> Same issue, same fix. 
>>> I have done a brief code review except for the documentation code. I >>> don't like this code >>> >>> if (cmd_match(cmd,"describe-aggregate-function")) >>> >>> success = describeAggregates(pattern, show_verbose, show_system); >>> else if (cmd_match(cmd, >>> "describe-access-method")) >>> success = >>> describeAccessMethods(pattern, show_verbose); >>> else if (cmd_match(cmd, >>> "describe-tablespace")) >>> success = describeTablespaces(pattern, >>> show_verbose); >>> else if (cmd_match(cmd, >>> "describe-conversion")) >>> success = listConversions(pattern, >>> show_verbose, show_system); >>> else if (cmd_match(cmd, "describe-cast")) >>> success = listCasts(pattern, >>> show_verbose >>> >>> >>> This can be achieved with the list/array/hash table, so I have changed >>> that code in the attached patch just for a sample if you want I can do that >>> for whole code. >>> >> > There's some problems with a hash table. The function signatures vary > quite a lot, and some require additional psql_scan_slash_options to be > called. The hash option, if implemented, probably should be expanded to all > slash commands, at which point maybe it belongs in psqlscanslash.l... > As I suspected, there's a lot of variance in the function signatures of the various listSomething()/describeSomething() commands, and listDbRoleSettings requires a second pattern to be scanned, and as far as I know PsqlScanState isn't known inside describe.h, so building and using a hash table would be a lot of work for uncertain gain. The original code just plows through strings in alphabetical order, breaking things up by comparing leading characters, so I largely did the same at the des/decribe levels. Instead of a hash table, It might be fun to write
Re: Re: \describe*
> > > I agree with Andres and Robert. This patch should be pushed to PG13. > > I'll do that on March 8 unless there is a compelling argument not to. > > No objection. I'll continue to work on it, though.
Re: \describe*
> > >> - Tab completion for \descibe-verbose. >> I know that \d+ tab completion is also not there, but I think we must >> have tab completion for \descibe-verbose. >> >> postgres=# \describe- >> \describe-extension >> \describe-replication-publication \describe-user-mapping >> \describe-foreign-data-wrapper >> \describe-replication-subscription\describe-view >> \describe-foreign-server \describe-role >> \describe-window-function >> \describe-foreign-table \describe-rule >> ... >> > I just confirmed that there isn't tab completion for the existing S/+ options, so it's hard to justify them for the equivalent verbose suffixes. > (1 row) >> Invalid command \describe. Try \? for help. >> >> >> I think this status is causing the problem. >> >> >> >> + /* >> standard listing of interesting things */ >> + success = >> listTables("tvmsE", NULL, show_verbose, show_system); >> + } >> + status = PSQL_CMD_UNKNOWN; >> >> I'll look into this, thanks! > - Confusion about \desc and \desC >> There is confusion while running the \desc command. I know the problem, >> but the user may confuse by this. >> postgres=# \desC >>List of foreign servers >> Name | Owner | Foreign-data wrapper >> --+---+-- >> (0 rows) >> >> postgres=# \desc >> Invalid command \desc. Try \? for help. >> >> - Auto-completion of commands. >> There is some more confusion in the completion of commands. >> >> This command shows List of aggregates. >> postgres=# \describe-aggregate-function >> List of aggregate functions >> Schema | Name | Result data type | Argument data types | Description >> +--+--+-+- >> (0 rows) >> >> >> >> This command shows a list of relation "\d" >> postgres=# \describe-aggregatE-function >> List of relations >> Schema | Name | Type | Owner >> +--+---+- >> public | foo | table | vagrant >> (1 row) >> >> This command also shows a list of relations "\d". 
>> postgres=# \describe-aggr >> List of relations >> Schema | Name | Type | Owner >> +--+---+- >> public | foo | table | vagrant >> (1 row) >> >> This command shows error messages. >> postgres=# \descr >> Invalid command \descr. Try \? for help. >> >> I will look into it. > >> I have done a brief code review except for the documentation code. I >> don't like this code >> >> if (cmd_match(cmd,"describe-aggregate-function")) >> >> success = describeAggregates(pattern, show_verbose, show_system); >> else if (cmd_match(cmd, >> "describe-access-method")) >> success = describeAccessMethods(pattern, >> show_verbose); >> else if (cmd_match(cmd, >> "describe-tablespace")) >> success = describeTablespaces(pattern, >> show_verbose); >> else if (cmd_match(cmd, >> "describe-conversion")) >> success = listConversions(pattern, >> show_verbose, show_system); >> else if (cmd_match(cmd, "describe-cast")) >> success = listCasts(pattern, show_verbose >> >> >> This can be achieved with the list/array/hash table, so I have changed >> that code in the attached patch just for a sample if you want I can do that >> for whole code. >> > There's some problems with a hash table. The function signatures vary quite a lot, and some require additional psql_scan_slash_options to be called. The hash option, if implemented, probably should be expanded to all slash commands, at which point maybe it belongs in psqlscanslash.l... >
Re: Referential Integrity Checks with Statement-level Triggers
> > > In order to avoid per-row calls of the constraint trigger functions, we > could > try to "aggregate" the constraint-specific events somehow, but I think a > separate queue would be needed for the constraint-specific events. > > In general, the (after) triggers and constraints have too much in common, > so > separation of these w/o seeing code changes is beyond my imagination. > > Yeah, there's a lot of potential for overlap where a trigger could "borrow" an RI tuplestore or vice versa. The people who expressed opinions on nuking triggers from orbit (it's the only way to be sure) have yet to offer up any guidance on how to proceed from here, and I suspect it's because they're all very busy getting things ready for v12. I definitely have an interest in working on this for 13, but I don't feel good about striking out on my own without their input.
Re: some ri_triggers.c cleanup
> > Right, this makes a lot of sense, similar to how ri_restrict() combines > RESTRICT and NO ACTION. > I'm pretty sure that's where I got the idea, yes.
Re: \describe*
> > Given that this patch has been added to the last commitfest for v12, I > think we should mark it as targeting 13, so it can be skipped over by > people looking to get things into v12. Even leaving fairness aside, I > don't think it's likely to be ready quickly enough... > Obviously this patch is nowhere near the importance of most patches slated for v12, but I would hope it can be considered, time permitting. The size of the patch may look large (1036 lines), but 650+ of that is pure documentation changes, ~50 lines of added autocomplete strings, ~140 lines are added TailMatches calls (one per new autocomplete string), and what remains is strncmp() calls to match those same strings, so it's pretty mild in terms of impact.
Re: some ri_triggers.c cleanup
On Fri, Feb 22, 2019 at 1:12 PM Corey Huinker wrote: > On Fri, Feb 22, 2019 at 11:05 AM Peter Eisentraut < > peter.eisentr...@2ndquadrant.com> wrote: > >> ri_triggers.c is endlessly long and repetitive. I want to clean it up a >> bit (more). >> > > Having just been down this road, I agree that a lot of cleanup is needed > and possible. > > >> I looked into all these switch cases for the unimplemented MATCH PARTIAL >> option. I toyed around with how a MATCH PARTIAL implementation would >> actually look like, and it likely wouldn't use the existing code >> structure anyway, so let's just simplify this for now. >> > > +1 > > > >> Attached are some patches. > > > I intend to look this over in much greater detail, but I did skim the code > and it seems like you left the SET DEFAULT and SET NULL paths separate. In > my attempt at statement level triggers I realized that they only differed > by the one literal value, and parameterized the function. > > I've looked it over more closely now and I think that it's a nice improvement. As I suspected, the code for SET NULL and SET DEFAULT are highly similar (see .diff), the major difference being two constants, the order of some variable declarations, and the recheck in the set-default case. The changes were so simple that I felt remiss not adding the patch for you (see .patch). Passes make check. 
diff --git a/set_null.c b/set_default.c index bc323ec..b2dd91d 100644 --- a/set_null.c +++ b/set_default.c @@ -1,10 +1,10 @@ /* - * ri_setnull - + * ri_setdefault - * - * Common code for ON DELETE SET NULL and ON UPDATE SET NULL + * Common code for ON DELETE SET DEFAULT and ON UPDATE SET DEFAULT */ static Datum -ri_setnull(TriggerData *trigdata) +ri_setdefault(TriggerData *trigdata) { const RI_ConstraintInfo *riinfo; Relationfk_rel; @@ -30,10 +30,10 @@ ri_setnull(TriggerData *trigdata) elog(ERROR, "SPI_connect failed"); /* - * Fetch or prepare a saved plan for the set null operation (it's - * the same query for delete and update cases) + * Fetch or prepare a saved plan for the set default operation + * (it's the same query for delete and update cases) */ -ri_BuildQueryKey(, riinfo, RI_PLAN_SETNULL_DOUPDATE); +ri_BuildQueryKey(, riinfo, RI_PLAN_SETDEFAULT_DOUPDATE); if ((qplan = ri_FetchPreparedPlan()) == NULL) { @@ -44,12 +44,12 @@ ri_setnull(TriggerData *trigdata) charparamname[16]; const char *querysep; const char *qualsep; -const char *fk_only; Oid queryoids[RI_MAX_NUMKEYS]; +const char *fk_only; /* -- * The query string built is - * UPDATE [ONLY] SET fkatt1 = NULL [, ...] + * UPDATE [ONLY] SET fkatt1 = DEFAULT [, ...] * WHERE $1 = fkatt1 [AND ...] * The type id's for the $ parameters are those of the * corresponding PK attributes. @@ -57,9 +57,9 @@ ri_setnull(TriggerData *trigdata) */ initStringInfo(); initStringInfo(); +quoteRelationName(fkrelname, fk_rel); fk_only = fk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ? 
"" : "ONLY "; -quoteRelationName(fkrelname, fk_rel); appendStringInfo(, "UPDATE %s%s SET", fk_only, fkrelname); querysep = ""; @@ -72,9 +72,10 @@ ri_setnull(TriggerData *trigdata) quoteOneName(attname, RIAttName(fk_rel, riinfo->fk_attnums[i])); appendStringInfo(, - "%s %s = NULL", + "%s %s = DEFAULT", querysep, attname); sprintf(paramname, "$%d", i + 1); +sprintf(paramname, "$%d", i + 1); ri_GenerateQual(, qualsep, paramname, pk_type, riinfo->pf_eq_oprs[i], @@ -104,5 +105,20 @@ ri_setnull(TriggerData *trigdata) table_close(fk_rel, RowExclusiveLock); -return PointerGetDatum(NULL); +/* + * If we just deleted or updated the PK row whose key was equal to + * the FK columns' default values, and a referencing row exists in + * the FK table, we would have updated that row to the same values + * it already had --- and RI_FKey_fk_upd_check_required would + * hence believe no check is necessary. So we need to do another + * lookup now and in case a reference still exists, abort the + * operation. That is already implemented in the NO ACTION + * trigger, so just run it. (This recheck is only needed in the + * SET DEFAULT case, since CASCADE would remove such rows in case + * of a
Re: some ri_triggers.c cleanup
On Fri, Feb 22, 2019 at 11:05 AM Peter Eisentraut < peter.eisentr...@2ndquadrant.com> wrote: > ri_triggers.c is endlessly long and repetitive. I want to clean it up a > bit (more). > Having just been down this road, I agree that a lot of cleanup is needed and possible. > I looked into all these switch cases for the unimplemented MATCH PARTIAL > option. I toyed around with how a MATCH PARTIAL implementation would > actually look like, and it likely wouldn't use the existing code > structure anyway, so let's just simplify this for now. > +1 > Attached are some patches. I intend to look this over in much greater detail, but I did skim the code and it seems like you left the SET DEFAULT and SET NULL paths separate. In my attempt at statement level triggers I realized that they only differed by the one literal value, and parameterized the function.
Re: Referential Integrity Checks with Statement-level Triggers
> While the idea to use the transition table is good, this approach probably
> requires the trigger engine (trigger.c) to be adjusted, and that in a
> non-trivial way.
>
> In general, the (after) triggers and constraints have too much in common,
> so separation of these w/o seeing code changes is beyond my imagination.

Yeah, there's a lot of potential for overlap where a trigger could "borrow" an RI tuplestore or vice versa. The people who expressed opinions on nuking triggers from orbit (it's the only way to be sure) have yet to offer up any guidance on how to proceed from here, and I suspect it's because they're all very busy getting things ready for v12. I definitely have an interest in working on this for 13, but I don't feel good about striking out on my own without their input.

It probably does require that. Several people with advanced knowledge of trigger.c expressed a desire to rebuild trigger.c from the ground up, and with it create case-specific tuplestores for handling referential integrity constraints, which would be lighter than either the transition tables or the per-row invocation of a trigger. After all, we need an RI check to happen; we don't need it to happen *through a trigger function*.

> I'm also not sure if it's o.k. that a performance-related patch
> potentially makes performance worse in some cases. If FK violations are
> checked at statement boundary, the wasted effort / time can (at least in
> theory) be high if early rows violate the FK.

That concern was also expressed with varying levels of alarm in their voices.

> Have you considered bulk processing of individual rows by a row-level
> trigger? For IMMEDIATE constraints we'd have to ensure that the trigger
> is notified that the current row is the last one from the current query,
> but that might not be difficult.

I'm not sure I understand what you're suggesting, but if it keeps the overhead of one trigger firing per row deleted, then it doesn't seem like much of a win. Given that this patch has been punted to v13, I'd like to instead look at how we might go about building up the transition tuplestores for the specific purpose of doing the RI checks (not just deletes) and executing those at the appropriate time, rather than trying to make our needs fit into trigger form.
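As a rough illustration of the set-based check under discussion, here is a hand-rolled, statement-level analogue of an ON DELETE NO ACTION check. All names are hypothetical, and this is only an approximation of what the internal RI machinery does (it ignores MATCH semantics, deferral, and snapshot subtleties):

```sql
-- One set-based query over the transition table replaces N per-row
-- trigger firings: the deleted keys are joined against the referencing
-- table, and any surviving reference is a violation.
CREATE FUNCTION ri_check_after_delete() RETURNS trigger AS $$
BEGIN
    IF EXISTS (SELECT 1
               FROM deleted_pks d
               JOIN fk_table f ON f.pk_id = d.id)
    THEN
        RAISE foreign_key_violation
            USING MESSAGE = 'rows in fk_table still reference deleted keys';
    END IF;
    RETURN NULL;
END $$ LANGUAGE plpgsql;

CREATE TRIGGER ri_stmt_delete
    AFTER DELETE ON pk_table
    REFERENCING OLD TABLE AS deleted_pks
    FOR EACH STATEMENT
    EXECUTE FUNCTION ri_check_after_delete();
```

This also makes the quoted objection concrete: the EXISTS check runs only after the whole DELETE has executed, so a violation in the first deleted row is not detected until statement end.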
Re: Alternative to \copy in psql modelled after \g
> Otherwise "\g -" looks good as a portable solution. +1
Re: \describe*
Attached is a patch to add verbose \describe commands to complement our existing but slightly cryptic family of \d commands. The goals of this are:

- aid user discovery of \d-commands via tab completion
- make scripts and snippets slightly more self-documenting and understandable
- save experienced users that 0.22 seconds where they try to remember what \dFpS+ means or which command lists user mappings.

DESIGN CHOICES:

Every new command is of the form \describe-some-system-object-type[-system][-verbose]. The -system suffix stands in for the 'S' suffix and -verbose stands in for '+'. New commands use the singular form, not plural. Every new command has a direct analog \d-command, but the reverse is not always true, especially when it comes to the commands that can specify multiple object types. In those cases, there are multiple long versions that correspond to several singular parameters (\describe-view, \describe-materialized-view, \describe-index, etc.) but no combinatorics (i.e. no \describe-view-and-foreign-table). There is a \describe-schema and a \describe-namespace, both of which perform \dn. There is a \describe-role but no \describe-user or \describe-database-role. I chose \describe-privilege for \dp. I chose \describe-type for \dT instead of \describe-data-type. The command \describe-aggregate-function is \dfa, whereas \describe-aggregate is \da.

NOTES:

There is currently nothing stopping you from using the short form suffixes on long form commands, but the reverse isn't true. For example, you can type \describe-functionS+ and it'll work, but \df-verbose will not. I allow this mostly because it would take work to prevent it. Documentation XML was updated but not reformatted, to keep the diff easier to read. No regression cases were added.
Currently our coverage of \d commands in psql itself is quite minimal:

~/src/postgres$ grep '\\d' src/test/regress/sql/psql.sql | sort | uniq
\copyright
\dt arg1
\e arg1 arg2
\df exp
\d psql_serial_tab_id_seq

but perhaps we could test it indirectly in these other areas:

~/src/postgres/src/test/regress/sql$ grep '\\d' * | sed -e 's/^.*\\d/\\d/g' -e 's/ .*//g' | sort | uniq -c
156 \d
2 \d'
1 \d*',
157 \d+
1 \d{4})',
1 \da
2 \d+):([a-zA-Z0-9+/=]+)\$([a-zA-Z0-9+=/]+):([a-zA-Z0-9+/=]+)',
4 \des
8 \des+
1 \det+
4 \deu
6 \deu+
1 \dew
14 \dew+
21 \df
1 \dfn
1 \dfp
4 \dp
4 \dRp
6 \dRp+
2 \dRs
3 \dRs+
2 \dt

On Mon, Jan 29, 2018 at 9:56 AM David Fetter wrote:
> On Mon, Jan 29, 2018 at 02:51:53PM +, Ryan Murphy wrote:
> > > What I propose is in fact a server command, which at least three of
> > > the other popular RDBMSs already have.
> >
> > Well to actually implement it, it would probably be a client command,
> > because that's what \d* are.
>
> Why should this command be silo'ed off to the psql client? If it's a
> server command, it's available to all clients, not just psql.
>
> > We would most likely want them implemented the same, to avoid
> > needless complexity.
>
> We could certainly have \d call DESCRIBE for later versions of the
> server. \ commands which call different SQL depending on server
> version have long been a standard practice.
>
> > I think people are more ok with \describe (with the backslash), which seems
> > like what you're suggesting anyway. I read Vik's "hard pass" as being on
> > having DESCRIBE which looks like an SQL command but would actually be
> > implemented on the client. This seems simpler at first but could cause
> > deep confusion later.
>
> If we implement \d as DESCRIBE for server versions as of when DESCRIBE
> is actually implemented, we've got wins all around.
>
> Best,
> David.
> --
> David Fetter http://fetter.org/
> Phone: +1 415 235 3778
>
> Remember to vote!
> Consider donating to Postgres: http://www.postgresql.org/about/donate > From e67e61ae789b09c98fe03378c819224d838c2f65 Mon Sep 17 00:00:00 2001 From: Corey Huinker Date: Fri, 25 Jan 2019 00:57:23 + Subject: [PATCH] Add \describe commands to compliment \d commands --- doc/src/sgml/ref/psql-ref.sgml | 175 - src/bin/psql/command.c | 132 - src/bin/psql/describe.c| 13 ++- src/bin/psql/describe.h| 3 + src/bin/psql/tab-complete.c| 135 - 5 files changed, 381 insertions(+), 77 deletions(-) diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index 6c76cf2f00..363d6d9678 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -871,6 +871,17 @@ testdb= same line. + +The family of meta-commands starting with \d often +have a
Re: Referential Integrity Checks with Statement-level Triggers
Attached is a patch that refactors DELETE triggers to fire at the statement level. I chose delete triggers partly out of simplicity, and partly because there is some before/after row linkage in the ON UPDATE CASCADE cases where statement-level triggers might not be feasible as we have currently implemented them. After having done the work, I think INSERT triggers would be similarly straightforward, but I wanted to limit scope. Also, after having stripped the delete cases out of the update-or-delete functions, it became obvious that the on-update-set-null and on-update-set-default cases differed by only 3-4 lines, so those functions were combined.

On a vagrant VM running on my desktop machine, I'm seeing a speed-up of about 25% in the benchmark provided. I think that figure is cloudy and below my expectations. Perhaps we'd get a much better picture of whether or not this is worth it on a bare metal machine, or at least a VM better suited to benchmarking.

Currently 4 make-check tests are failing, two of which appear to be false positives (the test makes assumptions about triggers that are no longer true); the other two are outside the scope of this benchmark, so I'll revisit them if we go forward.

ri-set-logic.sql is an edited benchmark script adapted from Kevin Grittner's benchmark that he ran against hand-rolled triggers and posted on 2016-11-02. ri_test.out is a copy-paste of two runs of the benchmark script.

Many thanks to everyone who helped, often despite their own objections to the overall reasoning behind the endeavor. I'm aware that a large contingent of highly experienced people would very much like to replace our entire trigger architecture, or at least divorce RI checks from triggers. Maybe this patch spurs on that change. Even if nothing comes of it, it's been a great learning experience.
On Sat, Dec 22, 2018 at 11:28 AM Emre Hasegeli wrote: > > It is far from a premature optimization IMO, it is super useful and > something I was hoping would happen ever since I heard about transition > tables being worked on. > > Me too. Never-ending DELETEs are a common pain point especially for > people migrated from MySQL which creates indexes for foreign keys > automatically. > From 8a73f9233211076421a565b5c90ecd029b5e6581 Mon Sep 17 00:00:00 2001 From: vagrant Date: Wed, 23 Jan 2019 16:59:17 + Subject: [PATCH] Change Delete RI triggers to Statement-Level Triggers --- src/backend/commands/tablecmds.c| 9 +- src/backend/commands/trigger.c | 2 + src/backend/utils/adt/ri_triggers.c | 779 3 files changed, 566 insertions(+), 224 deletions(-) diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 28a137bb53..21f5bf94a4 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -8954,6 +8954,11 @@ createForeignKeyActionTriggers(Relation rel, Oid refRelOid, Constraint *fkconstr { CreateTrigStmt *fk_trigger; + TriggerTransition *del = makeNode(TriggerTransition); + del->name = "pg_deleted_transition_table"; + del->isNew = false; + del->isTable = true; + /* * Build and execute a CREATE CONSTRAINT TRIGGER statement for the ON * DELETE action on the referenced table. 
@@ -8961,11 +8966,11 @@ createForeignKeyActionTriggers(Relation rel, Oid refRelOid, Constraint *fkconstr fk_trigger = makeNode(CreateTrigStmt); fk_trigger->trigname = "RI_ConstraintTrigger_a"; fk_trigger->relation = NULL; - fk_trigger->row = true; + fk_trigger->row = false; fk_trigger->timing = TRIGGER_TYPE_AFTER; fk_trigger->events = TRIGGER_TYPE_DELETE; fk_trigger->columns = NIL; - fk_trigger->transitionRels = NIL; + fk_trigger->transitionRels = list_make1(del); fk_trigger->whenClause = NULL; fk_trigger->isconstraint = true; fk_trigger->constrrel = NULL; diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c index 7ffaeaffc6..080587215f 100644 --- a/src/backend/commands/trigger.c +++ b/src/backend/commands/trigger.c @@ -510,7 +510,9 @@ CreateTrigger(CreateTrigStmt *stmt, const char *queryString, * * Currently this is enforced by the grammar, so just Assert here. */ + /* Assert(!stmt->isconstraint); + */ if (tt->isNew) { diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c index e1aa3d0044..6f89ab4c77 100644 --- a/src/backend/utils/adt/ri_triggers.c +++ b/src/backend/utils/adt/ri_triggers.c @@ -194,9 +194,10 @@ static int ri_constraint_cache_valid_count = 0; static bool ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel, HeapTuple old_row, const RI_ConstraintInfo *riinfo); -static Datum ri_restrict(TriggerData *trigdata, bool is_no_action); -static Datum ri_setnull(TriggerData *trigdata); -static Datum ri_setdefault(TriggerData *trigdata); +static Datum ri_on_update_restrict(TriggerData *trigdata, bool is_no_action); +static Datum ri_on_delete_restrict(TriggerData *trigdata, bool is_no_action); +static
Re: Statement-level Triggers For Uniqueness Checks
On Fri, Jan 4, 2019 at 7:49 AM Peter Eisentraut wrote:

> On 25/12/2018 00:56, Corey Huinker wrote:
> > The regression diff (attached) seems to imply that the triggers simply
> > are not firing, though.
>
> The reason for this was explained by Dean. If you take out the check
> that he mentioned, then your trigger fires but crashes. In your changed
> unique_key_recheck(), "slot" is not initialized before use (or ever).

Thanks. I'll be revisiting this shortly. Dean's information made me think the potential gain is smaller than initially imagined.
Re: Statement-level Triggers For Uniqueness Checks
So I took a first pass at this, and I got stuck.

The basic design is that instead of creating one row-level trigger per deferrable unique constraint, we instead create one insert-statement-level trigger and one update-statement-level trigger. Both call the function unique_key_recheck(), which now attempts to walk the inserted transition table, doing basically the same checks that were done in the per-row trigger. I'm hoping for some performance advantage on large inserts/updates due to N-1 fewer triggers firing and N-1 fewer attempts to lock the unique index.

The regression diff (attached) seems to imply that the triggers simply are not firing, though. I have verified that the triggers are created:

test=# CREATE TEMPORARY TABLE test ( x integer PRIMARY KEY DEFERRABLE INITIALLY DEFERRED );
CREATE TABLE
test=# SELECT * FROM pg_trigger WHERE oid tgconstraint tgdeferrable tginitdeferred tgnargs tgqual tgargs tgconstrindid tgenabled tgisinternal tgnewtable tgrelid tgattr tgconstrrelid tgfoid tgname tgoldtable tgtype
test=# SELECT * FROM pg_trigger WHERE tgrelid = 'test'::regclass;
  oid  | tgrelid |            tgname            | tgfoid | tgtype | tgenabled | tgisinternal | tgconstrrelid | tgconstrindid | tgconstraint | tgdeferrable | tginitdeferred | tgnargs | tgattr | tgargs | tgqual | tgoldtable |          tgnewtable
-------+---------+------------------------------+--------+--------+-----------+--------------+---------------+---------------+--------------+--------------+----------------+---------+--------+--------+--------+------------+------------------------------
 16392 |   16387 | PK_ConstraintTrigger_i_16392 |   1250 |      4 | O         | t            |             0 |         16390 |         16391 | t            | t              |       0 |        | \x     |        |            | pg_inserted_transition_table
 16393 |   16387 | PK_ConstraintTrigger_u_16393 |   1250 |     16 | O         | t            |             0 |         16390 |         16391 | t            | t              |       0 |        | \x     |        |            | pg_inserted_transition_table
(2 rows)

Any idea where I went wrong?

On Mon, Dec 17, 2018 at 9:56 AM Corey Huinker wrote:

> In digging around the codebase (see thread: Referential Integrity Checks
> with Statement-level Triggers), I noticed that unique constraints are
> similarly enforced with a per-row trigger.
> The situation with unique indexes is fairly similar to the situation with
> RI checks: there is some overhead to using a transition table, but that
> overhead may be less than the cost of firing a trigger once per row
> inserted/updated.
>
> However, there are some significant differences (apologies to everyone
> already familiar with this part of the code, it's new to me).
>
> For one, there is no analog to RI_Initial_Check(). Instead the constraint
> is initially checked via building/finding the unique index that would
> enforce the uniqueness check.
>
> Then, the actual lookup done in unique_key_recheck() has to contend with the
> intricacies of HOT updates, so I don't know if that can be expressed in an
> SPI query. Even if not, I think it should be possible to iterate over
> the EphemeralNamedRelation, and that would itself have a payoff in
> reduced trigger calls.
>
> I'm going to be working on this as a POC patch separate from the RI work,
> hence the separate thread, but there's obviously a lot of overlap.
>
> All advice is appreciated.

regression.diffs
Description: Binary data

0001-Refactor-per-row-unique-key-deferred-constraint-trig.patch
Description: Binary data
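The statement-level recheck described in this thread walks the inserted transition table once instead of firing a trigger per row. A toy Python sketch of that walk follows; the data structures are illustrative stand-ins (the real unique_key_recheck() must contend with HOT chains and the actual index machinery), and the function name is mine:

```python
def recheck_unique(inserted_rows, existing_keys, key):
    """Walk the inserted transition table once, flagging key values that
    collide with pre-existing keys or with other rows in the same batch."""
    seen = set(existing_keys)
    duplicates = []
    for row in inserted_rows:
        k = row[key]
        if k in seen:
            duplicates.append(k)  # deferred-constraint violation candidate
        seen.add(k)
    return duplicates


# One pass over a 3-row "transition table" replaces three per-row firings.
dups = recheck_unique([{"x": 1}, {"x": 2}, {"x": 1}], existing_keys={2}, key="x")
```

The within-batch check (second occurrence of x=1) is what the per-row design gets implicitly from the index; a statement-level version has to perform it explicitly, as above.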
Statement-level Triggers For Uniqueness Checks
In digging around the codebase (see thread: Referential Integrity Checks with Statement-level Triggers), I noticed that unique constraints are similarly enforced with a per-row trigger.

The situation with unique indexes is fairly similar to the situation with RI checks: there is some overhead to using a transition table, but that overhead may be less than the cost of firing a trigger once per row inserted/updated.

However, there are some significant differences (apologies to everyone already familiar with this part of the code, it's new to me).

For one, there is no analog to RI_Initial_Check(). Instead the constraint is initially checked via building/finding the unique index that would enforce the uniqueness check.

Then, the actual lookup done in unique_key_recheck() has to contend with the intricacies of HOT updates, so I don't know if that can be expressed in an SPI query. Even if not, I think it should be possible to iterate over the EphemeralNamedRelation, and that would itself have a payoff in reduced trigger calls.

I'm going to be working on this as a POC patch separate from the RI work, hence the separate thread, but there's obviously a lot of overlap.

All advice is appreciated.
Referential Integrity Checks with Statement-level Triggers
Back when Pg added statement-level triggers, I was interested in the potential promise of moving referential integrity checks to statement-level triggers. The initial conversation included Kevin Grittner's POC script (in SQL), which showed a potential for a 98% reduction in time spent doing RI checks. The original thread is here:

https://www.postgresql.org/message-id/CACjxUsM4s9%3DCUmPU4YFOYiD5f%3D2ULVDBjuFSo20Twe7KbUe8Mw%40mail.gmail.com

I dug around in the code, and was rather surprised at how close we already are to implementing this. The function RI_Initial_Check() already does a left-join query via SPI to look for any invalid data, so if we could just replace the near table with the transition table for inserted rows, we'd be home free. The function SPI_register_trigger_data() makes the transition tables visible to SPI, so I started to wonder why this hadn't been done already.

I approached Kevin and Thomas Munro seeking feedback on my approach. I also made it into a session at the PgConf.ASIA un-conference, and then later discussed it with Michael Paquier at that same conference. The coalesced feedback was this:

- the overhead of registering the transition tables probably makes it unprofitable for single-row inserts
- the single-row overhead is itself significant, so maybe the transition tables aren't worse
- there has been talk of replacing transition tables with an in-memory data structure that would be closer to "free" from a startup perspective and might even coalesce the transition tables of multiple statements in the same transaction
- because no declarative code changes are involved, it's trivial to switch from row-level to statement-level triggering via pg_upgrade
- assuming that transition tables are an overhead that only pays off when > N rows have been updated, does it make sense to enforce RI with something that isn't actually a trigger?
- there was also some mention that parallel query uses a queue mechanism that might be leveraged to do row-level triggers for updates of <= N rows and statement-level triggers for > N

That's what I have so far. I'm going to be working on a POC patch so that I can benchmark a pure-statement-level solution, which if nothing else will let us know the approximate value of N.

All suggestions are appreciated.
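To make the proposed mechanism concrete: RI_Initial_Check() validates a whole table with one set-oriented left join, and the statement-level trigger would run the same shape of query with the inserted-rows transition table standing in for the FK table. A toy Python sketch of that check follows; the function name and row shapes are mine, purely illustrative:

```python
def ri_check_inserted(inserted_fk_rows, pk_keys, fk_col):
    """Set-based FK validation over a transition table: return rows whose
    non-NULL reference finds no matching primary key -- the rows that a
    'LEFT JOIN pk ... WHERE pk.key IS NULL' query would report."""
    return [row for row in inserted_fk_rows
            if row[fk_col] is not None and row[fk_col] not in pk_keys]


# One query over the batch instead of N per-row index probes.
bad = ri_check_inserted([{"fk": 1}, {"fk": 3}, {"fk": None}],
                        pk_keys={1, 2}, fk_col="fk")
```

The NULL handling mirrors SQL FK semantics: a NULL reference is never a violation, which is exactly what the left-join formulation gives for free.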
Re: csv format for psql
On Sun, Nov 25, 2018 at 11:23 PM Tom Lane wrote:

> Corey Huinker writes:
> > Could we have another output type called "separated" that uses the existing
> > --fieldsep / --recordsep?
>
> Uh, what's the difference from the existing unaligned format?

No footer, and we'd want to escape instances of fieldsep and recordsep found in the data. Given an option to do that escaping, unaligned would work fine.
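The escaping being discussed is mechanical; a small sketch of what such an option might do (function names and the choice of backslash as the escape character are my invention, not psql behavior):

```python
def escape_unaligned(value, fieldsep="|", recordsep="\n", esc="\\"):
    """Escape separator and escape-character occurrences so that output
    fields can be split back apart unambiguously."""
    out = value.replace(esc, esc + esc)       # escape the escape char first
    out = out.replace(fieldsep, esc + fieldsep)
    return out.replace(recordsep, esc + recordsep)


def format_row(row, fieldsep="|"):
    """Join one row's fields the way an escaping unaligned mode might."""
    return fieldsep.join(escape_unaligned(v, fieldsep) for v in row)
```

Doubling the escape character before escaping the separators is the step that keeps the output reversible; skipping it is the classic bug in ad hoc delimiter escaping.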
Re: csv format for psql
> > Or we could kill both issues by hard-wiring the separator as ','.

+1

I've never encountered a situation where a customer wanted a custom delimiter AND quoted strings. So either they wanted pure CSV or a customized TSV.

Could we have another output type called "separated" that uses the existing --fieldsep / --recordsep? Word will get out that csv is faster, but we'd still have the flexibility if somebody really wanted it.
Re: Desirability of client-side expressions in psql?
> > > psql> \if :i >= 5
> >
> > I think we're ok with that so long as none of the operators or values has a
> > \ in it.
> >
> > What barriers do you see to re-using the pgbench grammar?
>
> The pgbench expression grammar mimics SQL expression grammar,
> on integers, floats, booleans & NULL.
>
> I'm unsure about some special cases in psql (`shell command`,
> 'text' "identifier"). They can be forbidden on a new command (\let),
> but what happens on "\if ..." which I am afraid allows them is unclear.
>
> --
> Fabien.

(raising this thread from hibernation now that I have the bandwidth)

It seems like the big barriers to just using pgbench syntax are:
- the ability to indicate that the next thing to follow will be a pgbench expression
- a way to coax pgbench truthy values into psql truthy values (t/f, y/n, 1/0)

For that, I see a few ways forward:

1. A suffix on \if, \elif, an -exp suffix (or even just -x) to existing commands to indicate that a pgbench expression would follow. This would look something like: \ifx \elifx \setx \if$ \elif$ \set$

2. A command-line-esque switch or other sigil to indicate that what follows is a pgbench expression with psql vars to interpolate. Example:
   \set foo -x 1 + 4
   \set foo \expr 1 + 4
   \if -x :limit > 10
   \if \expr :limit > 10

3. A global toggle to indicate which mode should be used by \if, \elif, and \set. Example: \pset expressions [on | off]

4. A combination of #2 and #3 with a corresponding switch/sigil to indicate "do not evaluate pgbench-style". This is particularly appealing to me because it would allow code snippets from pgbench to be used without modification, while still allowing the user to mix in old/new style in an existing script.

5.
A special variant of `command` where variables are interpolated before being sent to the OS, and allow that on \if, \elif:
   \set foo ``expr :y + :z``
   \set foo $( expr :y + :z )
   \if ``expr :limit > 10``
   \if $( expr :limit > 10 )
This also has some appeal because it allows for a great amount of flexibility, but obviously constrains us with OS dependencies. The user might have a hard time sending commands with ')' in them if we go the $( ) route.

6. Option #5, but we add an additional executable (suggested name: pgexpr) to the client libs, which encapsulates the pgbench expression library as a way around OS-dependent code.

7. I believe someone suggested introducing the :{! pgbench-command } or :{{ pgbench-command }} var-mode:
   \set foo :{! :y + :z }
   \set foo :{{ :y + :z }}
   \if :{! :limit > 10 }
   \if :{{ :limit > 10 }}
This has some appeal as well, though I prefer the {{...}} syntax because "!" looks like negation, and {{ resembles the [[ x + y ]] syntax in bash.

One nice thing is that most of these options are not mutually exclusive.

Thoughts?
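On the truthy-coercion barrier mentioned above, the mapping itself is small. A sketch of coaxing pgbench-style boolean results into the values \if accepts (the function name is mine; the accepted spellings follow the forms psql's \if already recognizes, and any mismatch with the real lexer is my assumption):

```python
def to_psql_bool(value):
    """Map a pgbench-expression result onto a psql \\if-style boolean."""
    truthy = {"true", "t", "yes", "y", "on", "1"}
    falsy = {"false", "f", "no", "n", "off", "0"}
    s = str(value).strip().lower()
    if s in truthy:
        return True
    if s in falsy:
        return False
    # psql rejects unrecognized values rather than guessing
    raise ValueError(f"unrecognized boolean value: {value!r}")
```

Integer results from pgbench expressions (e.g. `1`) pass through str() and land in the same table, which is the behavior the "coax" bullet is after.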
Re: [HACKERS] generated columns
> > > 3. Radical alternative: Collapse everything into one new column. We > > could combine atthasdef and attgenerated and even attidentity into a new > > column. (Only one of the three can be the case.) This would give > > client code a clean break, which may or may not be good. The > > implementation would be uglier than #1 but probably cleaner than #2. We > > could also get 4 bytes back per pg_attribute row. > > > > I'm happy with the current choice #1, but it's worth thinking about. > > #3 looks very appealing in my opinion as those columns have no overlap, > so it would take five possible values: > Could the removed columns live on...as generated-always columns?
Re: partitioned tables referenced by FKs
> > > 1. it seems that we will continue to do per-row RI checks for inserts and
> > > updates. However, there already exists a bulk check in RI_Initial_Check().
> > > Could we modify this bulk check to do RI checks on a per-statement basis
> > > rather than a per-row basis?
> >
> > One of the goals when implementing trigger transition tables was to
> > supplant the current per-row implementation of RI triggers with
> > per-statement. I haven't done that, but AFAIK it remains possible :-)
> >
> > Changing that is definitely not a goal of this patch.

Then I may try to tackle it myself in a separate thread.

> Without an implementation, I can't say, but if I had to guess, I would
> assume so. Or maybe there are clever optimizations for that particular
> case.

But in this case there is no actual defined trigger, it's internal code making an SPI call... is there an indicator that tells us whether this change was multi-row or not?
Re: partitioned tables referenced by FKs
On Fri, Nov 2, 2018 at 7:42 PM Alvaro Herrera wrote:

> Here's a patch to allow partitioned tables to be referenced by foreign
> keys. Current state is WIP, but everything should work; see below for
> the expected exception.
>
> The design is very simple: have one pg_constraint row for each partition
> on each side, each row pointing to the topmost table on the other side;
> triggers appear on each leaf partition (and naturally they don't appear
> on any intermediate partitioned table).

This is an important and much needed feature!

Based on my extremely naive reading of this code, I have two perhaps equally naive questions:

1. it seems that we will continue to do per-row RI checks for inserts and updates. However, there already exists a bulk check in RI_Initial_Check(). Could we modify this bulk check to do RI checks on a per-statement basis rather than a per-row basis?

2. If #1 is possible, is the overhead of transition tables too great for the single-row case?
Re: COPY FROM WHEN condition
> > > SELECT x.a, sum(x.b)
> > > FROM ( COPY INLINE '/path/to/foo.txt' FORMAT CSV ) as x( a integer, b numeric, c text, d date, e json)
>
> Apologies for bike-shedding, but wouldn't the following be a better
> fit with the current COPY?
>
> COPY t(a integer, b numeric, c text, d date, e json) FROM
> '/path/to/foo.txt' WITH (FORMAT CSV, INLINE)

+1 Very much a better fit.
Re: COPY FROM WHEN condition
> > Are you thinking something like having a COPY command that provides
> > results in such a way that they could be referenced in a FROM clause
> > (perhaps a COPY that defines a cursor…)?
>
> That would also be nice, but what I was thinking of was that some
> highly restricted subset of cases of SQL in general could lend
> themselves to levels of optimization that would be impractical in
> other contexts.

If COPY (or a syntactical equivalent) can return a result set, then the whole of SQL is available to filter and aggregate the results, and we don't have to invent new syntax or endure confusion when COPY-WHEN syntax behaves subtly differently from a similar FROM-WHERE.

Also, what would we be saving computationally? The whole file (or program output) has to be consumed no matter what, and the columns have to be parsed no matter what. At least some of the columns have to be converted to their assigned datatypes enough to know whether or not to filter the row, but we might be able to push that logic inside a copy.

I'm thinking of something like this:

SELECT x.a, sum(x.b)
FROM ( COPY INLINE '/path/to/foo.txt' FORMAT CSV )
     AS x( a integer, b numeric, c text, d date, e json)
WHERE x.d >= '2018-11-01'

In this case, there is the *opportunity* to see the following optimizations:
- columns c and e are never referenced, and need never be turned into a datum (though we might do so just to confirm that they conform to the data type)
- if column d is converted first, we can filter on it and avoid converting columns a and b
- whatever optimizations we can infer from knowing that the two surviving columns will go directly into an aggregate

If we go this route, we can train the planner to notice other optimizations and add those mechanisms at that time, and then existing code gets faster. If we go the COPY-WHEN route, then we have to make up new syntax for every possible future optimization.
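The deferred-conversion idea in that message can be illustrated outside the planner. A toy Python version that converts the filter column (d) first and only materializes a and b for surviving rows, never touching c or e; the column layout follows the example in the mail, and this is a sketch of the optimization's effect, not proposed executor behavior:

```python
import csv
import io
from datetime import date


def filter_rows(csv_text, cutoff):
    """Parse CSV rows, converting only the filter column (index 3, 'd')
    up front; columns a (int) and b (float) are converted only for rows
    that survive the filter, and c/e are never converted at all."""
    out = []
    for raw in csv.reader(io.StringIO(csv_text)):
        d = date.fromisoformat(raw[3])   # convert the filter column first
        if d < cutoff:
            continue                     # a, b, c, e never converted
        out.append((int(raw[0]), float(raw[1])))  # only surviving a, b
    return out
```

The payoff scales with selectivity: the more rows the WHERE clause discards, the more datatype conversions are skipped entirely.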
Re: date_trunc() in a specific time zone
> > >> A use case that I see quite a lot of is needing to do reports and other > >> calculations on data per day/hour/etc but in the user's time zone. The > >> way to do that is fairly trivial, but it's not obvious what it does so > >> reading queries becomes just a little bit more difficult. > > +1 A client encountered this exact problem last week, and I was surprised that the parameter didn't already exist.
Re: CopyFrom() has become way too complicated
> > I think the code needs to be split up so that CopyFrom() in the loop > body calls CopyFromOneTuple(), which then also splits out the tuple > routing into its own CopyFromOneTupleRoute() function (that's 200 LOC on > its own...). I suspect it'd also be good to refactor the > partition-change code out into its own function. > +1 I had a hard time with this when doing my copy_srf() misadventure.
Re: COPY FROM WHEN condition
On Thu, Oct 11, 2018 at 5:04 AM Surafel Temesgen wrote: > > > On Thu, Oct 11, 2018 at 12:00 PM Christoph Moench-Tegeder < > c...@burggraben.net> wrote: > >> You can: >> COPY ( query ) TO 'filename'; >> > it is for COPY FROM > > regards > Surafel > It didn't get far, but you may want to take a look at a rejected patch for copy_srf() (set returning function) https://www.postgresql.org/message-id/CADkLM%3DdoeiWQX4AGtDNG4PsWfSXz3ai7kY%3DPZm3sUhsUeev9Bg%40mail.gmail.com https://commitfest.postgresql.org/12/869/ Having a set returning function gives you the full expressiveness of SQL, at the cost of an extra materialization step.
Re: Why we allow CHECK constraint contradiction?
On Wed, Oct 10, 2018 at 1:44 AM David G. Johnston < david.g.johns...@gmail.com> wrote:

> On Tuesday, October 9, 2018, Imai, Yoshikazu < imai.yoshik...@jp.fujitsu.com> wrote:
> >
> > Are there any rows which can satisfy the ct's CHECK constraint? If not, why we
> > allow creating table when check constraint itself is contradicted?
>
> I'd bet on it being a combination of complexity and insufficient expected
> benefit. Time is better spent elsewhere. Mathematically proving a
> contradiction in software is harder than reasoning about it mentally.

I've actually used that as a feature, in PostgreSQL and other databases, where assertions were unavailable, or procedural code was unavailable or against policy. Consider the following:

CREATE TABLE wanted_values ( x integer );
INSERT INTO wanted_values VALUES (1), (2), (3);

CREATE TABLE found_values ( x integer );
INSERT INTO found_values VALUES (1), (3);

CREATE TABLE missing_values (
    x integer,
    CONSTRAINT contradiction CHECK (false)
);

INSERT INTO missing_values
SELECT x FROM wanted_values
EXCEPT
SELECT x FROM found_values;

gives the error

ERROR:  new row for relation "missing_values" violates check constraint "contradiction"
DETAIL:  Failing row contains (2).

Which can be handy when you need to fail a transaction because of bad data and don't have branching logic available.
Re: CREATE ROUTINE MAPPING
On Mon, Sep 10, 2018 at 3:28 AM Hannu Krosing wrote: > Hi Corey > > Have you looked at pl/proxy ? > I have, a long while ago. > It does this and then some (sharding) > PL/proxy isn't a part of the SQL Standard. PL/proxy only connects to other libpq-speaking databases. The hope with routine mapping is that other data sources that do not easily conform to a rows-and-columns metaphor can still expose their data to postgresql.
Re: Alter index rename concurrently to
> > You appear to be saying that you think that renaming an index > concurrently is not safe. In that case, this patch should be rejected. > However, I don't think it necessarily is unsafe. What we need is some > reasoning about the impact, not a bunch of different options that we > don't understand. > I've had this same need, and dreamed this same solution before. I also thought about a syntax like ALTER INDEX foo RENAME TO WHATEVER-IT-WOULD-HAVE-BEEN-NAMED-BY-DEFAULT to aid this situation. But all of those needs fade if we have REINDEX CONCURRENTLY. I think that's where we should focus our efforts. A possible side effort into something like a VACUUM FULL CONCURRENTLY, which would essentially do what pg_repack does, but keeping the same oid and the stats that go with it, but even that's a nice-to-have add-on to REINDEX CONCURRENTLY.
Re: Desirability of client-side expressions in psql?
> psql> \if :i >= 5

I think we're ok with that so long as none of the operators or values has a \ in it.

What barriers do you see to re-using the pgbench grammar?
Re: Query Rewrite for Materialized Views (Postgres Extension)
> > Hope it is useful or interesting for someone! Questions or comments are >> very welcome. >> > > good idea. > > Regards > > Pavel > In a recent PgConf NYC presentation [1] I was talking about the technical hurdles to implementing materialized views that could be kept up to date at all times, and the benefits of having such a thing. Some use cases can be addressed with eventually-consistent derivative table structures (Vertica's projections, PipelineDB's continuous views, etc), but those methods rely on the source data never having deletes or updates, or confining those updates to the "hot" part of the source tables, so it generally works for time-series data, but not for other cases. It has occurred to me that Dave Fetter's work on ASSERTIONS [2] has common underpinnings with true continuous materialized views. In both cases, the creation of a system object causes the creations of insert/update/delete triggers on one or more existing tables. In the case of assertions, those triggers are run with the goal of raising an error if rows are returned from a query. In the case of a materialized view, those same triggers would be used to delete rows from a CMV and insert replacements rows. If we can get always-up-to-date materialized views, then Denty's work on query rewrite would have greatly enhanced utility. [1] https://postgresconf.org/conferences/2018/program/proposals/a-roadmap-to-continuous-materialized-views-b4644661-8d5a-4186-8c17-4fb82600e147 [2] http://databasedoings.blogspot.com/2018/06/ive-posted-my-slides-for-my-asssertions.html
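The always-up-to-date materialized view sketched in that message can be mimicked in miniature: per-statement triggers receive transition tables and apply deltas to a stored aggregate, deleting rows that no longer apply and inserting replacements. A toy Python sketch for a count-by-key view (class and method names are mine, and this glosses over transactional visibility, which is the hard part the mail alludes to):

```python
from collections import Counter


class CountingMatView:
    """Toy always-current materialized view of SELECT key, count(*) GROUP BY key,
    maintained by statement-level 'triggers' fed transition tables."""

    def __init__(self):
        self.counts = Counter()

    def on_insert(self, new_rows):
        """AFTER INSERT trigger body: apply the NEW transition table."""
        for row in new_rows:
            self.counts[row["key"]] += 1

    def on_delete(self, old_rows):
        """AFTER DELETE trigger body: apply the OLD transition table,
        dropping groups whose count reaches zero."""
        for row in old_rows:
            self.counts[row["key"]] -= 1
            if self.counts[row["key"]] == 0:
                del self.counts[row["key"]]
```

Unlike the append-only schemes mentioned (projections, continuous views), the delete path here is first-class, which is exactly what makes the general case harder than time-series data.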
Re: [RFC] Add an until-0 loop in psql
On Mon, Apr 30, 2018 at 7:05 AM Pierre Ducroquet < pierre.ducroq...@people-doc.com> wrote:

> On Monday, April 30, 2018 1:01:25 PM CEST Daniel Verite wrote:
> > Corey Huinker wrote:
> > > As of v11, DO blocks can do transactions. I think this will meet your
> > > needs.
> >
> > They do support COMMIT and ROLLBACK in the current
> > development tree, but not VACUUM as in Pierre's example.
> >
> > postgres=# \echo :SERVER_VERSION_NAME
> > 11devel
> >
> > postgres=# do ' begin vacuum; end ';
> > ERROR: VACUUM cannot be executed from a function
> > CONTEXT: SQL statement "vacuum"
> > PL/pgSQL function inline_code_block line 1 at SQL statement
> >
> > Best regards,
>
> Indeed, vacuum is going to be the biggest offender here, sadly.
> One could work around this of course (on top of my head, using notify to wake
> up another client that would launch the required vacuums…)
> Being able to do transactions in DO blocks is a great new feature of v11 I was
> not aware of. But psql saw the addition of \if recently, so why not have
> loops in there too? (Something better than this hack of course, it was just a
> 10 minutes hack-sprint for a demo)
>
> Regards
>
> Pierre

Bummer about vacuum.

If you dig into the very long discussion about \if (which, incidentally, started off as a 20-line command patch called \quit-if, so don't discount that your idea could take off), you'll see some of the problems with looping discussed, mostly the issues I already alluded to (no concept of reading backwards on STDIN, scoping outside the current "file", the ability of psql vars to contain executable \commands), and you'll have a pretty good grasp of the places where psql would need changes.
In the meantime, if you believe the table won't get much larger during the operation, you could use \gexec as a finite loop iterator:

SELECT count(*)::bigint / 1000 AS num_iters FROM big_table
\gset

SELECT 'BEGIN',
       'DELETE FROM big_table WHERE id IN (SELECT id FROM big_table WHERE bad = true LIMIT 1000)',
       'VACUUM big_table',
       'COMMIT'
FROM generate_series(1, :num_iters) g
\gexec

If the number of rows increases, then your finite loop will fall short, and if something else deletes a bunch of rows, your loop will spin its wheels a few times at the end, but it would do most of what you want.
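The \gexec pattern above amounts to precomputing a fixed number of batch iterations, then emitting one BEGIN/DELETE/VACUUM/COMMIT group per iteration. A rough Python sketch of that statement generation (the function name and shape are mine, purely illustrative of the arithmetic):

```python
def gexec_script(total_rows, batch_size=1000, table="big_table"):
    """Emit the statements the \\gexec query in the mail would produce:
    floor(total_rows / batch_size) groups of BEGIN/DELETE/VACUUM/COMMIT."""
    num_iters = total_rows // batch_size  # mirrors count(*)::bigint / 1000
    stmts = []
    for _ in range(num_iters):
        stmts += [
            "BEGIN",
            f"DELETE FROM {table} WHERE id IN "
            f"(SELECT id FROM {table} WHERE bad = true LIMIT {batch_size})",
            f"VACUUM {table}",
            "COMMIT",
        ]
    return stmts
```

The integer division is also where the "falls short if rows increase" caveat comes from: the iteration count is frozen before the first batch runs.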