Re: [HACKERS] A little RLS oversight?
On 25 July 2015 at 19:12, Joe Conway joe.con...@crunchydata.com wrote:
> On 07/22/2015 02:17 PM, Dean Rasheed wrote:
>> Hmm, I think it probably ought to do more, based on whether or not RLS
>> is being bypassed or in force-mode -- see the first few checks in
>> get_row_security_policies(). Perhaps a new SQL-callable function
>> exposing those checks and calling check_enable_rls(). It's probably
>> still worth including the c.relrowsecurity = false check in SQL to save
>> calling the function for the majority of tables that don't have RLS.
>
> Please see the attached patch and let me know what you think. I believe
> the only thing lacking is documentation for the two new user visible
> functions. Comments?

I'm not convinced about exporting convert_table_name() from acl.c, particularly with such a non-descriptive name. It's only a couple of lines of code, so I think they may as well just be included directly in the new function, as seems to be common practice elsewhere.

As it stands, passing checkAsUser = GetUserId() to check_enable_rls() will cause it to skip the check for row_security = OFF, and so it may falsely report that RLS is active when the user has bypassed it. To avoid that, row_security_active() needs to pass checkAsUser = InvalidOid to check_enable_rls().

[ Actually there seem to be a few other callers of check_enable_rls() that suffer the same problem. I don't really understand the reasoning behind that check in check_enable_rls() - if the current user has the BYPASSRLS attribute and row_security = OFF, shouldn't RLS be disabled on every table no matter how it is accessed? ]

I think it would be better if the security context check were part of this new function too. Right now that can't make any difference, since only the RI triggers set SECURITY_ROW_LEVEL_DISABLED and this function cannot be called in that code path, but it's possible that in the future other code might set SECURITY_ROW_LEVEL_DISABLED, so moving the check down guarantees that check_enable_rls()/row_security_active() always accurately return the RLS status for the table.

While I was looking at it, I spotted a couple of other things to tidy up in existing related code:

* The comment for GetUserIdAndSecContext() needed updating for the new RLS bit.

* Calling GetUserIdAndSecContext() and then throwing away the user_id returned seems ugly. There is already a code style precedent for checking the other bits of SecurityRestrictionContext, so I've added a similar bit-testing function for SECURITY_ROW_LEVEL_DISABLED which makes this code a bit neater.

Attached is an updated patch (still needs some docs for the functions).

Regards,
Dean

rls-pg-stats.v2.patch
Description: Binary data
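For illustration, such a bit-testing helper could look roughly like the existing InLocalUserIdChange()/InSecurityRestrictedOperation() helpers in miscinit.c; the function name below is invented for the example and may not match what the patch actually uses:

    /*
     * Sketch only: test whether row-level security has been disabled in the
     * current security restriction context, in the same style as the other
     * SecurityRestrictionContext bit-testing helpers in miscinit.c.
     */
    bool
    InRowLevelSecurityDisabled(void)
    {
        return (SecurityRestrictionContext & SECURITY_ROW_LEVEL_DISABLED) != 0;
    }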
Re: [HACKERS] checkpointer continuous flushing
Hello, Attached is very minor v5 update which does a rebase completes the cleanup of doing a full sort instead of a chuncked sort. Attached is an updated version of the patch which turns the sort option into a boolean, and also include the sort time in the checkpoint log. There is still an open question about whether the sorting buffer allocation is lost on some signals and should be reallocated in such event. In such case, probably the allocation should be managed from CheckpointerMain, and the lazy allocation could remain for other callers (I guess just initdb). More open questions: - best name for the flush option (checkpoint_flush_to_disk, checkpoint_flush_on_write, checkpoint_flush, ...) - best name for the sort option (checkpoint_sort, checkpoint_sort_buffers, checkpoint_sort_ios, ...) Other nice-to-have inputs: - tests on a non-linux system with posix_fadvise (FreeBSD? others?) - tests on a large dedicated box Attached are some scripts to help with testing, if someone's feels like that: - cp_test.sh: run some tests, to adapt to one's setup... - cp_test_count.pl: show percent of late transactions - avg.py: show stats about stuff sh grep 'progress: ' OUTPUT_FILE | cut -d' ' -f4 | avg.py *BEWARE* that if pgbench got stuck some 0 data are missing, look for the actual tps in the output file and for the line count to check whether it is the case... some currently submitted patch on pgbench helps, see https://commitfest.postgresql.org/5/199/ As this pgbench patch is now in master, pgbench is less likely to get stuck, but check nevertheless that the number of progress line matches the expected number. -- Fabien.diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index bbe1eb0..0257e34 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -2483,6 +2483,28 @@ include_dir 'conf.d' /listitem /varlistentry + varlistentry id=guc-checkpoint-sort xreflabel=checkpoint_sort + termvarnamecheckpoint_sort/varname (typebool/type) + indexterm + primaryvarnamecheckpoint_sort/ configuration parameter/primary + /indexterm + /term + listitem + para +Whether to sort buffers before writting them out to disk on checkpoint. +For a HDD storage, this setting allows to group together +neighboring pages written to disk, thus improving performance by +reducing random write activity. +This sorting should have limited performance effects on SSD backends +as such storages have good random write performance, but it may +help with wear-leveling so be worth keeping anyway. +The default is literalon/. +This parameter can only be set in the filenamepostgresql.conf/ +file or on the server command line. + /para + /listitem + /varlistentry + varlistentry id=guc-checkpoint-warning xreflabel=checkpoint_warning termvarnamecheckpoint_warning/varname (typeinteger/type) indexterm @@ -2504,6 +2526,24 @@ include_dir 'conf.d' /listitem /varlistentry + varlistentry id=guc-checkpoint-flush-to-disk xreflabel=checkpoint_flush_to_disk + termvarnamecheckpoint_flush_to_disk/varname (typebool/type) + indexterm + primaryvarnamecheckpoint_flush_to_disk/ configuration parameter/primary + /indexterm + /term + listitem + para +When writing data for a checkpoint, hint the underlying OS that the +data must be sent to disk as soon as possible. This may help smoothing +disk I/O writes and avoid a stall when fsync is issued at the end of +the checkpoint, but it may also reduce average performance. +This setting may have no effect on some platforms. +The default is literaloff/. 
+ /para + /listitem + /varlistentry + varlistentry id=guc-min-wal-size xreflabel=min_wal_size termvarnamemin_wal_size/varname (typeinteger/type) indexterm diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml index e3941c9..eea6668 100644 --- a/doc/src/sgml/wal.sgml +++ b/doc/src/sgml/wal.sgml @@ -546,6 +546,29 @@ /para para + When hard-disk drives (HDD) are used for terminal data storage + xref linkend=guc-checkpoint-sort allows to sort pages + so that neighboring pages on disk will be flushed together by + chekpoints, reducing the random write load and improving performance. + If solid-state drives (SSD) are used, sorting pages induces no benefit + as their random write I/O performance is good: this feature could then + be disabled by setting varnamecheckpoint_sort/ to valueoff/. + It is possible that sorting may help with SSD wear leveling, so it may + be kept on that account. + /para + + para + On Linux and POSIX platforms, xref linkend=guc-checkpoint-flush-to-disk + allows to hint the OS that pages written on checkpoints must be flushed + to disk quickly.
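For anyone wanting to experiment with the flush-hint idea outside the patch, below is a rough sketch of one common way such a hint can be issued on POSIX systems. This is only an illustration of the general technique, not the code from the patch; the helper name is made up, and the HAVE_* macros stand in for configure-style feature tests:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>

    /*
     * Hint the OS to start writing back a just-written range of a file, so
     * that the fsync issued at the end of the checkpoint has less work to do.
     * On Linux, sync_file_range() gives finer control; posix_fadvise() is a
     * more portable fallback and may be a no-op on some platforms.
     */
    static void
    hint_flush_range(int fd, off_t offset, off_t nbytes)
    {
    #if defined(HAVE_SYNC_FILE_RANGE)
        if (sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE) < 0)
            perror("sync_file_range");
    #elif defined(HAVE_POSIX_FADVISE)
        int rc = posix_fadvise(fd, offset, nbytes, POSIX_FADV_DONTNEED);

        if (rc != 0)
            fprintf(stderr, "posix_fadvise: %s\n", strerror(rc));
    #endif
    }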
Re: [HACKERS] Failing assertions in indxpath.c, placeholder.c and brin_minmax.c
Andreas Seltenreich seltenre...@gmx.de writes:
> when running my random query generator contraption[1] against the
> regression database of 9.5 or master, it occasionally triggers one of
> the following three assertions.

Very very cool tool! Please keep doing that testing.

The first two seem to be planner problems, so I'll take responsibility for digging into those. But the third appears to be plain old brain fade in the BRIN code. It can be reproduced by

    regression=# explain select * from brintest where int4col = NULL::integer::information_schema.cardinal_number;
                                       QUERY PLAN
    ---------------------------------------------------------------------------------
     Bitmap Heap Scan on brintest  (cost=52.01..56.02 rows=1 width=339)
       Recheck Cond: (int4col = ((NULL::integer)::information_schema.cardinal_number)::integer)
       ->  Bitmap Index Scan on brinidx  (cost=0.00..52.01 rows=1 width=0)
             Index Cond: (int4col = ((NULL::integer)::information_schema.cardinal_number)::integer)
    (4 rows)

    regression=# select * from brintest where int4col = NULL::integer::information_schema.cardinal_number;
    server closed the connection unexpectedly

or you can do it like this:

    regression=# select * from brintest where int4col = (select NULL::integer);
    server closed the connection unexpectedly

or you could do it with a join to a table containing some null values. You need some complication because if you just write a plain null literal:

    regression=# explain select * from brintest where int4col = NULL::integer;
                               QUERY PLAN
    --------------------------------------------------------------
     Result  (cost=0.00..106.30 rows=1 width=339)
       One-Time Filter: NULL::boolean
       ->  Seq Scan on brintest  (cost=0.00..106.30 rows=1 width=339)
    (3 rows)

the planner knows that int4eq is strict, so it reduces the WHERE clause to constant NULL and doesn't bother with an indexscan.

Bottom line is that somebody failed to consider the possibility of a null comparison value reaching the BRIN index lookup machinery. The code stanza that's failing supposes that only IS NULL or IS NOT NULL tests could have SK_ISNULL set, but that's just wrong.

regards, tom lane
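For illustration only, the kind of check being described might be sketched like this. It is an outline of the reasoning rather than the actual fix that went in; the helper name and the range-summary arguments are invented for the example:

    #include "postgres.h"
    #include "access/skey.h"

    /*
     * Decide whether a scan key carrying SK_ISNULL can possibly be satisfied
     * by a BRIN range.  An explicit IS [NOT] NULL test is answered from the
     * range's null-summary bits; a plain operator comparison whose argument
     * happens to be NULL involves a strict operator, so nothing can match.
     */
    static bool
    null_key_can_match(ScanKey key, bool range_has_nulls, bool range_all_nulls)
    {
        Assert(key->sk_flags & SK_ISNULL);

        if (key->sk_flags & SK_SEARCHNULL)
            return range_has_nulls || range_all_nulls;   /* IS NULL */
        if (key->sk_flags & SK_SEARCHNOTNULL)
            return !range_all_nulls;                     /* IS NOT NULL */

        /* e.g. "int4col = (select NULL::integer)": strict operator, no match */
        return false;
    }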
Re: [HACKERS] Improving replay of XLOG_BTREE_VACUUM records
On 2015-07-24 09:53:49 +0300, Heikki Linnakangas wrote:

On 05/02/2015 02:10 AM, Jim Nasby wrote:

This looks like a good way to address this until the more significant work can be done. I'm not a fan of RBM_ZERO_NO_BM_VALID; how about RBM_ZERO_BM_INVALID? or BM_NOT_VALID? Or maybe I'm just trying to impose too much English on the code; I see the logic to NO_BM_VALID...

    + * RBM_ZERO_NO_BM_VALID is the same as RBM_ZERO_AND_LOCK, but does not set
    + * BM_VALID bit before returning buffer so that no one could pin it.

It would be better to explain why we want that mode. How about:

    RBM_ZERO_NO_BM_VALID is the same as RBM_ZERO_AND_LOCK but does not set
    BM_VALID before returning the buffer. This is done to ensure that no one
    can pin the buffer without actually reading the buffer contents in. This
    is necessary while replaying XLOG_BTREE_VACUUM records in hot standby.

That's a rather strange mode. The BM_VALID flag is internal to the buffer manager - if you don't know how the buffer manager works, you cannot understand that description. I couldn't quite understand what that means myself. What can or can you not do with a buffer that's not marked as BM_VALID?

I'd suggest a mode like this instead:

    In RBM_IF_IN_CACHE mode, the page is returned if it is in the buffer
    cache already. If it's not, it is not read from disk, and InvalidBuffer
    is returned instead.

To me it sounds like this shouldn't go through the full ReadBuffer() rigamarole. That code is already complex enough, and here it's really not needed. I think it'll be much easier to review - and actually faster in many cases - to simply have something like

    bool
    BufferInCache(Relation, ForkNumber, BlockNumber)
    {
        /* XXX: setup tag, hash, partition */
        LWLockAcquire(newPartitionLock, LW_SHARED);
        buf_id = BufTableLookup(newTag, newHash);
        LWLockRelease(newPartitionLock);

        return buf_id != -1;
    }

and then fall back to the normal ReadBuffer() when it's in cache.

Andres
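To illustrate how such a helper might be used, here is a sketch of the redo-side caller. BufferInCache() is the hypothetical function from the mail (with its signature adjusted here to take a RelFileNode), and the loop is a simplified stand-in for what btree_xlog_vacuum() actually does:

    #include "postgres.h"
    #include "access/xlogutils.h"
    #include "storage/bufmgr.h"

    /*
     * Sketch: take a cleanup lock on each leaf page in the vacuumed range
     * during recovery, skipping pages that are not resident in shared
     * buffers, since a page nobody has in the buffer pool cannot be pinned
     * by any backend anyway.
     */
    static void
    lock_vacuumed_range(RelFileNode rnode, BlockNumber start, BlockNumber end)
    {
        BlockNumber blkno;

        for (blkno = start; blkno < end; blkno++)
        {
            Buffer      buffer;

            if (!BufferInCache(rnode, MAIN_FORKNUM, blkno))     /* hypothetical */
                continue;

            buffer = XLogReadBufferExtended(rnode, MAIN_FORKNUM, blkno, RBM_NORMAL);
            if (BufferIsValid(buffer))
            {
                LockBufferForCleanup(buffer);
                UnlockReleaseBuffer(buffer);
            }
        }
    }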
Re: [HACKERS] Restore-reliability mode
On Thu, Jul 23, 2015 at 04:53:49PM -0300, Alvaro Herrera wrote:
> Peter Geoghegan wrote:
>> On Sat, Jun 6, 2015 at 12:58 PM, Noah Misch n...@leadboat.com wrote:
>>> - Call VALGRIND_MAKE_MEM_NOACCESS() on a shared buffer when its local
>>>   pin count falls to zero. Under CLOBBER_FREED_MEMORY, wipe a shared
>>>   buffer when its global pin count falls to zero.
>>
>> Did a patch for this ever materialize?
>
> I think the first part would be something like the attached.

Neat. Does it produce any new complaints during make installcheck?
Re: [HACKERS] GSets: Getting collation related error when GSets has text column
Hi, On 2015-07-17 18:55:52 +0530, Jeevan Chalke wrote: Attached patch which attempts to fix this issue. However I am not sure whether it is correct. But it does not add any new issues as such. Added few test in the patch as well. Pushed the fix. Thanks for the report and fix. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Buildfarm failure from overly noisy warning message
chipmunk failed last night,
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chipmunk&dt=2015-07-26%2007%3A36%3A32
like so:

    == pgsql.build/src/test/regress/regression.diffs ===
    *** /home/pgbfarm/buildroot/REL9_3_STABLE/pgsql.build/src/test/regress/expected/create_index.out  Sun Jul 26 10:37:41 2015
    --- /home/pgbfarm/buildroot/REL9_3_STABLE/pgsql.build/src/test/regress/results/create_index.out   Sun Jul 26 10:51:48 2015
    ***************
    *** 14,19 ****
    --- 14,20 ----
      CREATE INDEX tenk1_hundred ON tenk1 USING btree(hundred int4_ops);
      CREATE INDEX tenk1_thous_tenthous ON tenk1 (thousand, tenthous);
      CREATE INDEX tenk2_unique1 ON tenk2 USING btree(unique1 int4_ops);
    + WARNING:  could not send signal to process 30123: No such process
      CREATE INDEX tenk2_unique2 ON tenk2 USING btree(unique2 int4_ops);
      CREATE INDEX tenk2_hundred ON tenk2 USING btree(hundred int4_ops);
      CREATE INDEX rix ON road USING btree (name text_ops);
    ==

What's evidently happened here is that our session tried to boot an autovacuum process off a table lock, only that process was gone by the time we issued the kill() call. No problem really ... but aside from causing buildfarm noise, I could see somebody getting really panicky if this message appeared on a production server. I'm inclined to reduce the WARNING to LOG, and/or skip it altogether if the error is ESRCH.

The relevant code in ProcSleep() is:

    ereport(LOG,
            (errmsg("sending cancel to blocking autovacuum PID %d", pid),
             errdetail_log("%s", logbuf.data)));

    if (kill(pid, SIGINT) < 0)
    {
        /* Just a warning to allow multiple callers */
        ereport(WARNING,
                (errmsg("could not send signal to process %d: %m", pid)));
    }

so logging failures at LOG level seems pretty reasonable. One could also argue that both of these ereports ought to be downgraded to DEBUG1 or less, since this mechanism is pretty well shaken out by now.

Thoughts?

regards, tom lane
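For illustration, the "skip it altogether if the error is ESRCH" option could look something like this (a sketch of the idea only, not necessarily the change that was eventually committed):

    if (kill(pid, SIGINT) < 0)
    {
        /*
         * The autovacuum worker may have exited between the time we looked
         * it up in the lock table and now; that race is harmless, so stay
         * quiet about it and only complain about unexpected failures.
         */
        if (errno != ESRCH)
            ereport(WARNING,
                    (errmsg("could not send signal to process %d: %m", pid)));
    }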
Re: [HACKERS] Grouping Sets: Fix unrecognized node type bug
On 2015-07-17 19:57:22 +0100, Andrew Gierth wrote: Attached is the current version of my fix (with Jeevan's regression tests plus one of mine). Pushed, thanks for the report and fix! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] CustomScan and readfuncs.c
Hello,

While investigating ParallelAppend, I noticed a few problems in CustomScan that prevent reproducing an equivalent plan node on a background worker from its serialized string form.

1. CustomScanMethods->TextOutCustomScan callback

This callback allows custom information to be output by nodeToString(). Originally we intended this callback for debugging only, because CustomScan must be copyObject() safe, so all private data must be stored in custom_exprs or custom_private. However, it leads to another problem when we try to reproduce a CustomScan node from the string form generated by outfuncs.c. If TextOutCustomScan prints something, an upcoming _readCustomScan would have to deal with an unexpected number of tokens in an unexpected format. I'd like to propose omitting this callback prior to the v9.5 release, to minimize compatibility issues.

2. Reproducing the method table on a background worker

The methods field of CustomPath/Scan/ScanState is expected to be a reference to a static structure, so copyObject() does not copy the entire table, only the pointer. However, we have no way to guarantee that the callback functions have the same entry-point addresses in background workers. So we may need an infrastructure to reproduce the same CustomScan node with the same callback table, probably identified by name. We have a few ways to solve the problem:

* Add a system catalog, with a function returning the pointer.
  The simplest way, like FDWs: a system catalog maps a name to a function that returns the callback pointers. It also needs SQL statement support, which is a small downside.

* Register by name, but only during shared_preload_libraries.
  Like an early version of the CustomScan interface, this requires custom-scan providers to register a pair of name and callbacks, but only while shared_preload_libraries is being processed, to guarantee the callbacks are also registered in the background workers. (Is this assumption right on Windows?)

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com
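As an illustration of the register-by-name idea, a minimal sketch might look as below. All names here are invented for the example and this is not an existing API; header locations are approximate:

    #include "postgres.h"
    #include "nodes/pg_list.h"
    #include "nodes/plannodes.h"
    #include "utils/memutils.h"

    typedef struct CustomScanMethodsEntry
    {
        const char *name;
        const CustomScanMethods *methods;
    } CustomScanMethodsEntry;

    static List *registered_custom_methods = NIL;

    /* Called by providers from _PG_init() during shared_preload_libraries. */
    void
    RegisterCustomScanMethods(const char *name, const CustomScanMethods *methods)
    {
        MemoryContext oldcxt = MemoryContextSwitchTo(TopMemoryContext);
        CustomScanMethodsEntry *entry = palloc(sizeof(CustomScanMethodsEntry));

        entry->name = pstrdup(name);
        entry->methods = methods;
        registered_custom_methods = lappend(registered_custom_methods, entry);
        MemoryContextSwitchTo(oldcxt);
    }

    /* Used when reconstructing a CustomScan node, e.g. in a background worker. */
    const CustomScanMethods *
    GetCustomScanMethods(const char *name)
    {
        ListCell   *lc;

        foreach(lc, registered_custom_methods)
        {
            CustomScanMethodsEntry *entry = (CustomScanMethodsEntry *) lfirst(lc);

            if (strcmp(entry->name, name) == 0)
                return entry->methods;
        }
        elog(ERROR, "custom scan provider \"%s\" was not registered", name);
        return NULL;            /* keep compiler quiet */
    }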
Re: [HACKERS] A little RLS oversight?
On 07/26/2015 07:19 AM, Dean Rasheed wrote:
> I'm not convinced about exporting convert_table_name() from acl.c,
> particularly with such a non-descriptive name. It's only a couple of
> lines of code, so I think they may as well just be included directly
> in the new function, as seems to be common practice elsewhere.

fair enough

> As it stands, passing checkAsUser = GetUserId() to check_enable_rls()
> will cause it to skip the check for row_security = OFF, and so it may
> falsely report that RLS is active when the user has bypassed it. To
> avoid that row_security_active() needs to pass checkAsUser = InvalidOid
> to check_enable_rls().
>
> [ Actually there seem to be a few other callers of check_enable_rls()
> that suffer the same problem. I don't really understand the reasoning
> behind that check in check_enable_rls() - if the current user has the
> BYPASSRLS attribute and row_security = OFF, shouldn't RLS be disabled
> on every table no matter how it is accessed? ]

Hmmm, I can see that now but find it surprising. Seems like an odd interface, and probably deserves some explanation in the function comments. Maybe Stephen or someone else can weigh in on this...

> I think it would be better if the security context check were part of
> this new function too. Right now that can't make any difference, since
> only the RI triggers set SECURITY_ROW_LEVEL_DISABLED and this function
> cannot be called in that code path, but it's possible that in the
> future other code might set SECURITY_ROW_LEVEL_DISABLED, so moving the
> check down guarantees that check_enable_rls()/row_security_active()
> always accurately return the RLS status for the table.
>
> While I was looking at it, I spotted a couple of other things to tidy
> up in existing related code:
>
> * The comment for GetUserIdAndSecContext() needed updating for the new
>   RLS bit.
>
> * Calling GetUserIdAndSecContext() and then throwing away the user_id
>   returned seems ugly. There is already a code style precedent for
>   checking the other bits of SecurityRestrictionContext, so I've added
>   a similar bit-testing function for SECURITY_ROW_LEVEL_DISABLED which
>   makes this code a bit neater.
>
> Attached is an updated patch (still needs some docs for the functions).

Thanks for that. I'll add the docs.

Joe
Re: [HACKERS] Buildfarm failure from overly noisy warning message
Hi,

On 2015-07-26 10:56:05 -0400, Tom Lane wrote:
>   CREATE INDEX tenk2_unique1 ON tenk2 USING btree(unique1 int4_ops);
> + WARNING:  could not send signal to process 30123: No such process
>
> What's evidently happened here is that our session tried to boot an
> autovacuum process off a table lock, only that process was gone by the
> time we issued the kill() call. No problem really ... but aside from
> causing buildfarm noise, I could see somebody getting really panicky if
> this message appeared on a production server.

Oops.

> I'm inclined to reduce the WARNING to LOG

Hm, that doesn't seem like a very nice solution, given that LOG is even more likely to end up in the server log.

> and/or skip it altogether if the error is ESRCH.

Combined with a comment why we're ignoring that case, that'd work for me.

- Andres
Re: [HACKERS] Grouping Sets: Fix unrecognized node type bug
On 2015-07-17 11:37:26 +0530, Jeevan Chalke wrote:
> However I wonder why we are supporting GROUPING SETS inside GROUPING
> SETS. On Oracle, it is throwing an error. We are not trying to be
> Oracle compatible, but just curious to know.

The SQL specification seems to be pretty unambiguous about supporting nested grouping set specifications. Check 7.9 <group by clause>: a <grouping set> (nested inside a <grouping sets specification>) can contain a <grouping sets specification>.

Regards,
Andres
Re: [HACKERS] extend pgbench expressions with functions
Hello Heikki, As soon as we add more functions, the way they are documented needs to be reworked too; we'll need to add a table in the manual to list them. Here is a v8 with abs, min, max, random, gaussian et exponential. [...] There is no real doc, WIP... Here is a v9 with a doc. The implicit typing of expressions is improved. I also added two debug functions which allow to show intermediate integer (idebug) or double (ddebug). \set i idebug(random(1, 10)) will print the random value and assign it to i. I updated the defaut scripts, which seems to speed up meta command evaluations. The initial version does less than 2 million evals per second: sh cat before.sql \set naccounts 10 * :scale \setrandom aid 1 :naccounts sh ./pgbench -T 3 -P 1 -f before.sql [...] tps = 1899004.793098 (excluding connections establishing) The updated version does more than 3 million evals per second: sh cat after.sql \set aid random(1, 10 * :scale) sh ./pgbench -T 3 -P 1 -f after.sql [...] tps = 3143168.813690 (excluding connections establishing) Suggestion: A possibility would be to remove altogether the \setrandom stuff as the functionality is available through \set, maybe with an error message to advise about using \set with one of the random functions. That would remove about 200 ugly locs... Another option would be to translate the setrandom stuff into a \set expression, that would maybe save 100 locs for the eval, but keep and expand a little the manual parser part. -- Fabien.diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml index 2517a3a..73b7caf 100644 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -758,17 +758,20 @@ pgbench optional replaceableoptions/ /optional replaceabledbname/ Sets variable replaceablevarname/ to an integer value calculated from replaceableexpression/. The expression may contain integer constants such as literal5432/, - references to variables literal:/replaceablevariablename/, + double constants such as literal3.14156/, + references to integer variables literal:/replaceablevariablename/, and expressions composed of unary (literal-/) or binary operators (literal+/, literal-/, literal*/, literal//, literal%/) - with their usual associativity, and parentheses. + with their usual associativity, integer function calls and parentheses. + xref linkend=functions-pgbench-func-table shows the available + functions. 
/para para Examples: programlisting \set ntellers 10 * :scale -\set aid (1021 * :aid) % (10 * :scale) + 1 +\set aid (1021 * random(1, 10 * :scale)) % (10 * :scale) + 1 /programlisting/para /listitem /varlistentry @@ -918,18 +921,110 @@ pgbench optional replaceableoptions/ /optional replaceabledbname/ /varlistentry /variablelist + !-- list pgbench functions in alphabetical order -- + table id=functions-pgbench-func-table +titlePgBench Functions/title +tgroup cols=5 + thead + row + entryFunction/entry + entryReturn Type/entry + entryDescription/entry + entryExample/entry + entryResult/entry + /row + /thead + tbody + row + entryliteralfunctionabs(replaceablea/)/// + entrysame as replaceablea// + entryinteger or double absolute value/ + entryliteralabs(-17)// + entryliteral17// + /row + row + entryliteralfunctionddebug(replaceablex/)/// + entrydouble/ + entrystderr print for debug and return argument/ + entryliteralddebug(5432.1)// + entryliteral5432.1// + /row + row + entryliteralfunctiondouble(replaceablei/)/// + entrydouble/ + entryevaluate as int and cast to double/ + entryliteraldouble(5432)// + entryliteral5432.0// + /row + row + entryliteralfunctionexporand(replaceablei/, replaceablej/, replaceablet/)/// + entryinteger/ + entryexponentially distributed random integer in the bounds, see below/ + entryliteralexporand(1, 10, 3.0)// + entryint between literal1/ and literal10// + /row + row + entryliteralfunctionidebug(replaceablei/)/// + entryinteger/ + entrystderr print for debug and return argument/ + entryliteralidebug(5432)// + entryliteral5432// + /row + row + entryliteralfunctionint(replaceablex/)/// + entryinteger/ + entryevaluate as double and cast to int/ + entryliteralint(5.4 + 3.8)// + entryliteral9// + /row + row + entryliteralfunctiongaussrand(replaceablei/, replaceablej/, replaceablet/)/// + entryinteger/ + entrygaussian distributed random integer in the bounds, see below/ + entryliteralgaussrand(1, 10, 2.5)// + entryint between literal1/ and literal10// + /row + row +
Re: [HACKERS] Gsets: ROW expression semantic broken between 9.4 and 9.5
>>>>> "Jeevan" == Jeevan Chalke <jeevan.cha...@enterprisedb.com> writes:

 Jeevan> Hi
 Jeevan> It looks like we have broken the ROW expression without
 Jeevan> explicit ROW keyword in GROUP BY.

Andres has given the short version, but here's the long version:

In the spec, GROUP BY ROW(a,b) is an error, while GROUP BY (a,b) is exactly equivalent to GROUP BY a,b.

Previously, pg treated GROUP BY (a,b) as if it were GROUP BY ROW(a,b) since it was parsing it as an expression, and (a,b) in an expression is shorthand for ROW(a,b).

However, the parens are significant in many contexts in the grouping set syntax, e.g. ROLLUP(a,(b,c)) is equivalent to GROUPING SETS ((a,b,c), (a), ()), and we have to be able to parse both GROUPING SETS (a,b) (which is two grouping sets) and GROUPING SETS ((a,b),(c,d)), which means that we can't actually use the grammar to distinguish expressions from parenthesized sublists.

What the code therefore does is to explicitly distinguish (a,b) and ROW(a,b), and treat the first as a list and the second as a single expression. This is documented in the following NOTE in queries.sgml:

    <note>
     <para>
      The construct <literal>(a,b)</> is normally recognized in expressions as a
      <link linkend="sql-syntax-row-constructors">row constructor</link>.
      Within the <literal>GROUP BY</> clause, this does not apply at the top
      levels of expressions, and <literal>(a,b)</> is parsed as a list of
      expressions as described above. If for some reason you <emphasis>need</>
      a row constructor in a grouping expression, use <literal>ROW(a,b)</>.
     </para>
    </note>

http://www.postgresql.org/docs/9.5/static/queries-table-expressions.html#QUERIES-GROUPING-SETS

Andres has suggested that this should be mentioned in an incompatibility note in the release notes. I'm not sure that's needed, since I don't believe there are any cases where previously valid queries change in behavior; a query such as

    select (a,b) from (values (1,2)) v(a,b) group by (a,b);

previously evaluated the row constructor before grouping, while now it groups by a and b separately and evaluates the row constructor afterwards. If there's a way to make this change affect the result, I've not found it yet, even when using volatile functions.

--
Andrew (irc:RhodiumToad)
Re: [HACKERS] Sharing aggregate states between different aggregate functions
On 07/09/2015 12:44 PM, David Rowley wrote:
> On 15 June 2015 at 12:05, David Rowley david.row...@2ndquadrant.com wrote:
>> This basically allows an aggregate's state to be shared between other
>> aggregate functions when both aggregates' transition functions (and a
>> few other things) match. There's quite a number of aggregates in our
>> standard set which will benefit from this optimisation.
>
> After compiling the original patch with another compiler, I noticed a
> couple of warnings. The attached fixes these.

I spent some time reviewing this. I refactored the ExecInitAgg code rather heavily to make it more readable (IMHO); see attached. What do you think? Did I break anything?

Some comments:

* In aggref_has_compatible_states(), you give up if aggtype or aggcollid differ. But those properties apply to the final function, so you were leaving some money on the table by disallowing state-sharing if they differ.

* The filter and input expressions are initialized for every Aggref, before the deduplication logic kicks in. The AggrefExprState.aggfilter, aggdirectargs and args fields really belong to the AggStatePerAggState struct instead. This is not a new issue, but now that we have a convenient per-aggstate struct to put them in, let's do so.

* There was a reference-after-free bug in aggref_has_compatible_states; you cannot ReleaseSysCache and then continue pointing to the struct.

* The code was a bit fuzzy on which parts of the per-aggstate are filled in at what time. Some of the fields were overwritten every time a match was found. With the same values, so no harm done, but I found it confusing. I refactored ExecInitAgg in the attached patch to clear that up.

* The API of build_aggregate_fnexprs() was a bit strange now that some callers use it to only fill in the final function, while others call it to fill in both the transition functions and the final function. I split it into two separate functions.

* I wonder if we should do this duplicate elimination at plan time. It's very fast, so I'm not worried about that right now, but you had grand plans to expand this so that an aggregate could optionally use one of many different kinds of state values. At that point, it certainly seems like a planning decision to decide which aggregates share state. I think we can leave it as it is for now, but it's something to perhaps consider later.

BTW, the name of the AggStatePerAggStateData struct is pretty horrible. The repeated AggState feels awkward. Now that I've stared at the patch for some time, it doesn't bother me anymore, but it took me quite a while to get over that. I'm sure it will for others too. And it's not just that struct; the comments talk about "aggregate state", which could be confused to mean AggState, but it actually means AggStatePerAggStateData. I don't have any great suggestions, but can you come up with a better naming scheme?
- Heikki diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c index 0f911f2..fd922bd 100644 --- a/src/backend/executor/execQual.c +++ b/src/backend/executor/execQual.c @@ -4485,35 +4485,15 @@ ExecInitExpr(Expr *node, PlanState *parent) break; case T_Aggref: { -Aggref *aggref = (Aggref *) node; AggrefExprState *astate = makeNode(AggrefExprState); astate-xprstate.evalfunc = (ExprStateEvalFunc) ExecEvalAggref; if (parent IsA(parent, AggState)) { AggState *aggstate = (AggState *) parent; - int naggs; aggstate-aggs = lcons(astate, aggstate-aggs); - naggs = ++aggstate-numaggs; - - astate-aggdirectargs = (List *) ExecInitExpr((Expr *) aggref-aggdirectargs, - parent); - astate-args = (List *) ExecInitExpr((Expr *) aggref-args, - parent); - astate-aggfilter = ExecInitExpr(aggref-aggfilter, - parent); - - /* - * Complain if the aggregate's arguments contain any - * aggregates; nested agg functions are semantically - * nonsensical. (This should have been caught earlier, - * but we defend against it here anyway.) - */ - if (naggs != aggstate-numaggs) - ereport(ERROR, -(errcode(ERRCODE_GROUPING_ERROR), - errmsg(aggregate function calls cannot be nested))); + aggstate-numaggs++; } else { diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c index 2bf48c5..fcc3859 100644 --- a/src/backend/executor/nodeAgg.c +++ b/src/backend/executor/nodeAgg.c @@ -152,17 +152,28 @@ /* - * AggStatePerAggData - per-aggregate working state for the Agg scan + * AggStatePerAggStateData - per aggregate state data for the Agg scan + * + * Working state for calculating the aggregate state, using the state + * transition function. This struct does not store the information needed + * to produce the final aggregate result from the state value; that's stored + * in AggStatePerAggData instead. This separation allows multiple aggregate + * results to be
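For illustration, the kind of test aggref_has_compatible_states() performs could be sketched roughly as below. This is a simplified outline of the idea, not the patch's actual code; the real check also considers the initial transition value and a few other properties:

    #include "postgres.h"
    #include "catalog/pg_aggregate.h"
    #include "nodes/primnodes.h"

    /*
     * Two aggregate calls can share one transition state when they use the
     * same transition function and transition type and receive the same
     * input expressions and FILTER clause, even if their final functions
     * differ -- e.g. sum(x) and avg(x) over the same column.
     */
    static bool
    aggs_can_share_state(Form_pg_aggregate agg1, Aggref *ref1,
                         Form_pg_aggregate agg2, Aggref *ref2)
    {
        return agg1->aggtransfn == agg2->aggtransfn &&
               agg1->aggtranstype == agg2->aggtranstype &&
               equal(ref1->args, ref2->args) &&
               equal(ref1->aggfilter, ref2->aggfilter);
    }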
Re: [HACKERS] Combining Aggregates
On 04/01/2015 06:28 PM, Robert Haas wrote:
> On Mon, Mar 30, 2015 at 1:28 AM, Michael Paquier michael.paqu...@gmail.com wrote:
>> I've been thinking of bumping this patch to the June commitfest as the
>> patch only exists to provide the basic infrastructure for things like
>> parallel aggregation, aggregate before join, and perhaps auto updating
>> materialised views. It seems unlikely that any of those things will
>> happen for 9.5.
>> Yeah, I guess so... Does anybody object to me moving this to June's
>> commitfest? Not from my side FWIW. I think it actually makes sense.
> +1. I'd like to devote some time to looking at this, but I don't have
> the time right now. The chances that we can do something useful with it
> in 9.6 seem good.

And the June commitfest is now in progress.

This patch seems sane to me, as far as it goes. However, there's no planner or executor code to use the aggregate combining for anything. I'm not a big fan of dead code, I'd really like to see something to use this.

The main use case people have been talking about is parallel query, but is there some other case this would be useful right now, without the parallel query feature? You and Simon talked about this case:

> 2. Queries such as:
>
>     SELECT p.name, SUM(s.qty) FROM sales s INNER JOIN product p
>       ON s.product_id = p.product_id GROUP BY p.name;
>
> Such a query could be transformed into:
>
>     SELECT p.name, SUM(qty) FROM
>       (SELECT product_id, SUM(qty) AS qty FROM sales GROUP BY product_id) s
>     INNER JOIN product p ON p.product_id = s.product_id GROUP BY p.name;
>
> Of course the outer query's SUM and GROUP BY would not be required if
> there happened to be a UNIQUE index on product(name), but assuming
> there's not then the above should produce the results faster. This of
> course works ok for SUM(), but for something like AVG() or STDDEV() the
> combine/merge aggregate functions would be required to process those
> intermediate aggregate results that were produced by the sub-query.

Any chance you could implement that in the planner?

- Heikki
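As an illustration of what a "combine" function does, here is a minimal sketch for sum(int4), whose transition state is a plain int8 running total. The function name is invented for the example and this is not code from the patch:

    #include "postgres.h"
    #include "fmgr.h"

    PG_FUNCTION_INFO_V1(int4_sum_combine);

    /*
     * Combine two partial transition states of sum(int4).  Each state is an
     * int8 running total, so combining them is plain addition.  A real
     * implementation would also have to deal with strict/NULL states.
     */
    Datum
    int4_sum_combine(PG_FUNCTION_ARGS)
    {
        int64   state1 = PG_GETARG_INT64(0);
        int64   state2 = PG_GETARG_INT64(1);

        PG_RETURN_INT64(state1 + state2);
    }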
[HACKERS] Failing assertions in indxpath.c, placeholder.c and brin_minmax.c
Hi,

when running my random query generator contraption[1] against the regression database of 9.5 or master, it occasionally triggers one of the following three assertions. Someone more knowledgeable might want to take a look at them...

--
FailedAssertion(!(outer_rel->rows > 0), File: indxpath.c, Line: 1911)
--
sample query:

    select rel1925354.loid as c0, rel1925353.version as c1
    from (select rel1925352.aa as c0, rel1925352.aa as c1
          from public.b as rel1925352
          where (rel1925352.bb is NULL) and (rel1925352.bb rel1925352.bb)) as subq_303136
    inner join pg_catalog.pg_stat_ssl as rel1925353 on (subq_303136.c0 = rel1925353.pid)
    right join pg_catalog.pg_largeobject as rel1925354 on (subq_303136.c0 = rel1925354.pageno)
    where (rel1925353.clientdn !~ rel1925353.clientdn) and (rel1925353.cipher = rel1925353.clientdn);

,----[ git bisect ]
| first bad commit: [3f8c23c4d31d4a0e801041733deb2c7cfa577b32] Improve
| predtest.c's ability to reason about operator expressions.
`----

--
FailedAssertion(!(!bms_is_empty(phinfo->ph_eval_at)), File: placeholder.c, Line: 109)
--
sample query:

    select rel1600276.viewowner as c0, rel1600274.maxwritten_clean as c1, rel1600275.n_tup_hot_upd as c2
    from pg_catalog.pg_stat_bgwriter as rel1600274
    inner join pg_catalog.pg_stat_xact_all_tables as rel1600275
      on (rel1600274.maxwritten_clean = rel1600275.seq_scan)
    right join pg_catalog.pg_views as rel1600276
      right join pg_catalog.pg_operator as rel1600277
        on (rel1600276.viewname = rel1600277.oprname)
      on (rel1600275.relname = rel1600277.oprname)
    where 3 is not NULL;

,----[ git bisect ]
| first bad commit: [f4abd0241de20d5d6a79b84992b9e88603d44134] Support
| flattening of empty-FROM subqueries and one-row VALUES tables.
`----

--
FailedAssertion(!(key->sk_flags & 0x0080), File: brin_minmax.c, Line: 177)
--
sample query:

    select rel167978.namecol as c0
    from information_schema.parameters as rel167972
    left join public.student as rel167977
      inner join public.brintest as rel167978
        on (rel167977.age = rel167978.int4col)
      on (rel167972.interval_precision = rel167977.age)
    where rel167977.name rel167977.name;

regards,
andreas

Footnotes:
[1] https://github.com/anse1/sqlsmith
Re: [HACKERS] GSets: Fix bug involving GROUPING and HAVING together
On 2015-07-24 11:34:22 +0100, Andrew Gierth wrote:
>>>>>> "Andrew" == Andrew Gierth <and...@tao11.riddles.org.uk> writes:
>
>  Andrew> The other is that in subquery_planner, the optimization of
>  Andrew> converting HAVING clauses to WHERE clauses is suppressed if
>  Andrew> parse->groupingSets isn't empty. (It is empty if there's either
>  Andrew> no group by clause at all, or if there's exactly one grouping
>  Andrew> set, which must not be empty, however derived.) This is costing
>  Andrew> us some optimizations, especially in the case of an explicit
>  Andrew> GROUP BY () clause; I'll look into this.
>
> I'm inclined to go with the simplest approach here, which copies the
> quals if there are grouping sets. The only way we could safely move a
> qual without copying is if we can show that none of the grouping sets
> is empty, that is (), and at this point the grouping set list has not
> been expanded so it's not trivial to determine that.

I pushed this to both master and 9.5. While this isn't strictly a bugfix, it seemed better to keep the branches the same at this point.
Re: [HACKERS] CustomScan and readfuncs.c
Kouhei Kaigai kai...@ak.jp.nec.com writes:
> While investigating ParallelAppend, I noticed a few problems in
> CustomScan that prevent reproducing an equivalent plan node on a
> background worker from its serialized string form.
>
> 1. CustomScanMethods->TextOutCustomScan callback
>
> This callback allows custom information to be output by nodeToString().
> Originally we intended this callback for debugging only, because
> CustomScan must be copyObject() safe, so all private data must be stored
> in custom_exprs or custom_private. However, it leads to another problem
> when we try to reproduce a CustomScan node from the string form
> generated by outfuncs.c. If TextOutCustomScan prints something, an
> upcoming _readCustomScan would have to deal with an unexpected number of
> tokens in an unexpected format.

Um ... wait a second. There is no support in readfuncs for any plan node type, and never has been, and I seriously doubt that there ever should be. I do not think it makes sense to ship plans around in the way you seem to have in mind. (Also, I don't think the problems you mention are exactly unique to CustomScan. There's no reason to assume that FDW plans could survive this treatment either, since we do not know what's in the fdw_private stuff; certainly no one has ever suggested that it should not contain pointers to static data.)

> I'd like to propose omitting this callback prior to the v9.5 release, to
> minimize compatibility issues.

I regard our commitment to cross-version compatibility for the custom scan APIs as being essentially zero, for reasons previously discussed. So if this goes away in 9.6 it will not matter, but we might as well leave it in for now for debug support.

regards, tom lane
Re: [HACKERS] GSets: Fix bug involving GROUPING and HAVING together
On 2015-07-14 14:51:09 +0530, Jeevan Chalke wrote:
> Fix this by adding GroupingFunc node in this walker. We do it correctly
> in contain_aggs_of_level_walker(), in which we have handling for
> GroupingFunc. Attached patch to fix this.

Pushed, thanks for the fix!
Re: [HACKERS] Buildfarm failure from overly noisy warning message
Andrew Dunstan and...@dunslane.net writes:
> On 07/26/2015 11:00 AM, Andres Freund wrote:
>> On 2015-07-26 10:56:05 -0400, Tom Lane wrote:
>>> I'm inclined to reduce the WARNING to LOG
>>
>> Hm, that doesn't seem like a very nice solution, given that LOG is even
>> more likely to end up in the server log.
>>
>>> and/or skip it altogether if the error is ESRCH.
>>
>> Combined with a comment why we're ignoring that case that'd work for me.
>
> +1 for this. if the process is gone we shouldn't really have a problem,
> should we?

The only real reason for printing anything here is the possibility that the shared lock table is sufficiently corrupted that we think there's an autovacuum process blocking us when there isn't. However, I don't see any particular reason to think that this is more probable than any other bad consequence of corrupted shmem, so I don't feel a need to be in the user's face with a WARNING. The evidence so far is that the race condition is real, but I don't recall anyone ever reporting this in a context where we'd suspect actual shmem corruption.

I do, however, think it might be reasonable to LOG the failure given that we LOG'd the action. Is nobody else on board with downgrading both ereports to DEBUG?

regards, tom lane
Re: [HACKERS] anole: assorted stability problems
On 2015-07-07 13:25:24 +0200, Andres Freund wrote: So, it's starting to look good. Not exactly allowing for a lot of confidence yet, but still: http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=anolebr=HEAD Since there have not been any relevant failures since, I'm going to remove this issue from the open items list. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] security labels on databases are bad for dump restore
On Thu, Jul 23, 2015 at 12:14:14PM -0400, Robert Haas wrote: On Wed, Jul 22, 2015 at 3:42 PM, Adam Brightwell adam.brightw...@crunchydatasolutions.com wrote: I like Noah's proposal of having pg_dump --create reproduce all database-level state. Should it be enabled by default? If so, then wouldn't it make more sense to call it --no-create and do the opposite? So, --no-create would exclude rather than include database-level information? Would enabling it by default cause issues with the current expected use of the tool by end users? This seems a bit hairy to me. If we want to transfer responsibility for dumping this stuff from pg_dumpall to pg_dump, I have no problem with that at all. But doing it only when --create is specified seems odd. Then, does pg_dumpall -g dump it or not? The principle I had in mind was to dump ACLs, pg_db_role_setting entries, comments and security labels if and only if we emit a CREATE statement for the object they modify. That is already the rule for objects located inside databases. Since pg_dumpall -g does not emit CREATE DATABASE statements[1], it would not dump these attributes of databases. If it does, then we're sorta dumping it in two places when --create is used. If it doesn't, then when --create is not used we're doing it nowhere. Yep. Plain pg_dump dumps the contents of a database without dumping the database itself. I don't like that as a default, but we're stuck with it. [1] pg_dumpall -g --binary-upgrade _does_ emit CREATE DATABASE statements, so _it_ would dump these attributes. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] False comment about speculative insertion
Attached patch removes a reference to an executor README section about speculative insertion. In fact, the high-level overview of speculative insertion ended up at the top of execIndexing.c. The executor README was not touched by the ON CONFLICT patch at all. I don't think it's necessary to refer to execIndexing.c within ExecInsert instead. All the routines being called from ExecInsert() that relate to speculative insertion are in execIndexing.c anyway. -- Peter Geoghegan From 5ea69e5f98a4eeb4c9f6ffc8f161e3e16f0cda86 Mon Sep 17 00:00:00 2001 From: Peter Geoghegan peter.geoghega...@gmail.com Date: Sun, 26 Jul 2015 12:28:39 -0700 Subject: [PATCH] Remove false comment about speculative insertion There never was an executor README section that discussed speculative insertion in the original ON CONFLICT commit. --- src/backend/executor/nodeModifyTable.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c index 874ca6a..1ef76d0 100644 --- a/src/backend/executor/nodeModifyTable.c +++ b/src/backend/executor/nodeModifyTable.c @@ -351,8 +351,7 @@ ExecInsert(ModifyTableState *mtstate, * * We loop back here if we find a conflict below, either during * the pre-check, or when we re-check after inserting the tuple - * speculatively. See the executor README for a full discussion - * of speculative insertion. + * speculatively. */ vlock: specConflict = false; -- 1.9.1 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] security labels on databases are bad for dump restore
On 20 July 2015 at 01:18, Noah Misch n...@leadboat.com wrote: On Wed, Jul 15, 2015 at 11:08:53AM +0200, Andres Freund wrote: On 2015-07-15 12:04:40 +0300, Alvaro Herrera wrote: Andres Freund wrote: One thing worth mentioning is that arguably the problem is caused by the fact that we're here emitting database level information in pg_dump, normally only done for dumpall. Consistency with existing practice would indeed have pg_dump ignore pg_shseclabel and have pg_dumpall reproduce its entries. Existing practice is pretty broken though, and not necessarily a good guide. COMMENT ON DATABASE and SECURITY LABEL FOR DATABASE are dumped by pg_dump, but always refer to the database's name at the time it was dumped, so restoring it can break. GRANTs on databases are ignored and not dumped by pg_dump or by pg_dumpall --globals-only. The only way to dump them seems to be to use pg_dumpall, which nobody uses in the real world. I'd be strongly in favour of teaching GRANT, SECURITY LABEL, COMMENT ON DATABASE, etc to recognise CURRENT_DATABASE as a keyword. Then dumping them in pg_dump --create, and in pg_dump -Fc . In practice I see zero real use of pg_dumpall without --globals-only, and almost everyone does pg_dump -Fc . I'd like to see that method case actually preserve the whole state of the system and do the right thing sensibly. A pg_restore option to skip database-level settings could be useful, but I think by default they should be restored. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive
On Sat, Jul 25, 2015 at 10:30 PM, Ildus Kurbangaliev i.kurbangal...@postgrespro.ru wrote:
> On Jul 24, 2015, at 10:02 PM, Robert Haas robertmh...@gmail.com wrote:
>> Also, the patch should not invent a new array similar but not quite
>> identical to LockTagTypeNames[]. This is goofy:
>>
>>     + if (tranche_id > 0)
>>     +     result->tranche = tranche_id;
>>     + else
>>     +     result->tranche = LW_USERDEFINED;
>>
>> If we're going to make everybody pass a tranche ID when they call
>> LWLockAssign(), then they should have to do that always, not sometimes
>> pass a tranche ID and otherwise pass 0, which is actually a valid
>> tranche ID but to which you've given a weird special meaning here. I'd
>> also name the constants differently, like LWLOCK_TRANCHE_XXX or
>> something. LW_ is a somewhat ambiguous prefix. The idea of tranches is
>> really that each tranche is an array of items, each of which contains
>> 1 or more lwlocks. Here you are intermingling different tranches. I
>> guess that won't really break anything but it seems ugly.
>
> Maybe it will be better to split LWLockAssign into two functions then:
> keep the name LWLockAssign for user-defined locks, and add another
> function that takes a tranche_id.
>
>> On Jul 25, 2015, at 1:54 PM, Amit Kapila amit.kapil...@gmail.com wrote:
>> That he anyway has to do: whether you go by defining individual arrays
>> or by having a unified WaitEventType enum for the individual arrays, he
>> has to find out that array. Another thing is that with that you can
>> just encapsulate this information in one byte in structure
>> PgBackendStatus, rather than using more bytes (4 bytes), and I think
>> the function for reporting the wait event will be much more simplified.
>
> In my way there is no special meaning for the names. The arrays with
> names are located in lwlock.c and lock.c, and can be used for other
> things (for example tracing). One byte sounds good only for this case.

Do you mean to say that you need more than 256 events? I am not sure if we can add that many events without adding a performance penalty in some path. The original idea was proposed for one byte and the patch was written considering the same; now you are planning to extend it (which is okay), but modifying it without any prior consent is slightly a matter of concern.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
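To illustrate the one-byte idea under discussion, a minimal sketch follows. The enum values and the st_wait_event field are invented for this example and are not taken from the patch:

    #include "postgres.h"

    /*
     * Sketch: a single uint8 wait-event code stored in PgBackendStatus.
     * Zero means "not waiting"; the remaining values are ordinals in a
     * unified wait-event enumeration, so at most 255 distinct events can
     * be represented.
     */
    typedef enum WaitEventCode
    {
        WAIT_EVENT_NONE = 0,
        WAIT_EVENT_LWLOCK_BASE = 1,         /* lwlock tranches start here */
        WAIT_EVENT_HEAVYWEIGHT_BASE = 200   /* heavyweight lock types start here */
    } WaitEventCode;

    static inline void
    pgstat_report_wait_event(volatile uint8 *st_wait_event, uint8 event)
    {
        /*
         * A single aligned byte can be read and written in one operation,
         * so no changecount protocol is needed for this field.
         */
        *st_wait_event = event;
    }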
Re: [HACKERS] Combining Aggregates
On 27 July 2015 at 04:58, Heikki Linnakangas hlinn...@iki.fi wrote: On 04/01/2015 06:28 PM, Robert Haas wrote: On Mon, Mar 30, 2015 at 1:28 AM, Michael Paquier michael.paqu...@gmail.com wrote: I've been thinking of bumping this patch to the June commitfest as the patch only exists to provide the basic infrastructure for things like parallel aggregation, aggregate before join, and perhaps auto updating materialised views. It seems unlikely that any of those things will happen for 9.5. Yeah, I guess so... Does anybody object to me moving this to June's commitfest? Not from my side FWIW. I think it actually makes sense. +1. I'd like to devote some time to looking at this, but I don't have the time right now. The chances that we can do something useful with it in 9.6 seem good. And the June commitfest is now in progress. This patch seems sane to me, as far as it goes. However, there's no planner or executor code to use the aggregate combining for anything. I'm not a big fan of dead code, I'd really like to see something to use this. Thanks for looking. I partially agree with that, it is a little weird to put in code that's yet to be used. I'd certainly agree 95% if this was the final commitfest of 9.5, but we're in the first commitfest of 9.6 and I think there's a very high probability of this getting used in 9.6, and likely that probability would be even higher if the code is already in. Perhaps it's a little bit in the same situation as to Robert's parallel worker stuff? The main use case people have been talking about is parallel query, but is there some other case this would be useful right now, without the parallel query feature? You and Simon talked about this case: 2. Queries such as: SELECT p.name, SUM(s.qty) FROM sales s INNER JOIN product p ON s.product_id = p.product_id GROUP BY p.name; Such a query could be transformed into: SELECT p.name,SUM(qty) FROM (SELECT product_id,SUM(qty) AS qty FROM sales GROUP BY product_id) s INNER JOIN product p ON p.product_id = s.product_id GROUP BY p_name; Of course the outer query's SUM and GROUP BY would not be required if there happened to be a UNIQUE index on product(name), but assuming there's not then the above should produce the results faster. This of course works ok for SUM(), but for something like AVG() or STDDEV() the combine/merge aggregate functions would be required to process those intermediate aggregate results that were produced by the sub-query. Any chance you could implement that in the planner? Yes! I'm actually working on it now and so far have it partially working. Quite likely I'll be able to submit for CF2. There's still some costing tweaks to do. So far it just works for GROUP BY with no aggs. I plan to fix that later using this patch. I don't want to talk too much about it on this thread, but in a test query which is the one in my example, minus the SUM(qty), with 1 million sales records, and 100 products, performance goes from 350ms to 200ms on my machine, so looking good so far. Regards David Rowley -- David Rowley http://www.2ndQuadrant.com/ http://www.2ndquadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] pgbench - allow backslash-continuations in custom scripts
Hello,

Attached is the revised version of this patch. The first patch is unchanged from before. The second fixes a kind of bug. The third is the new one, allowing backslash continuation for backslash commands.

Ah, thanks:-) Would you consider adding the patch to the next commitfest? I may put myself as a reviewer...

No problem.

[...] I'm not satisfied by the design but I don't see another way..

I'll try to have a look.

Thanks.

I have no idea how to deal with the copy of psqlscan.[lh] from psql. Currently they are simply dead copies of those of psql.

I think that there should be no copies; it should use relative symbolic links so that the files are kept synchronized.

Yeah, I think so, but symlinks could cause problems with git and on Windows. Another way would be to make copies from the psql directory.

They live next door to each other. Indeed there are plenty of links already which are generated by makefiles (see src/bin/pg_xlogdump/*), and probably a copy is made on Windows. There should be no file duplication within the source tree.

Thank you for pointing that out. The Makefile of pg_xlogdump re-creates symlinks to those files, and they are replaced with cp on platforms where symlinks are not usable. But the files are explicitly added to the .sln file on MSVC. Anyway, I'll address it.

regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: [HACKERS] Failing assertions in indxpath.c, placeholder.c and brin_minmax.c
Andreas Seltenreich seltenre...@gmx.de writes:
> when running my random query generator contraption[1] against the
> regression database of 9.5 or master, it occasionally triggers one of
> the following three assertions.

I've fixed the first two of these --- thanks for the report!

> ,----[ git bisect ]
> | first bad commit: [3f8c23c4d31d4a0e801041733deb2c7cfa577b32] Improve
> | predtest.c's ability to reason about operator expressions.
> `----

I'm a bit confused about this aspect of your report though, because in my hands that example fails clear back to 9.2. It doesn't seem to require the predtest.c improvement to expose the fault.

regards, tom lane
Re: [HACKERS] Failing assertions in indxpath.c, placeholder.c and brin_minmax.c
On Sun, Jul 26, 2015 at 7:07 AM, Tom Lane t...@sss.pgh.pa.us wrote: Andreas Seltenreich seltenre...@gmx.de writes: when running my random query generator contraption[1] against the regression database of 9.5 or master, it occasionally triggers one of the following three assertions. Very very cool tool! Please keep doing that testing. The SQLite people have been using a tool like this for some time. They've also had luck finding bugs with a generic fuzz-testing tool called american fuzzy lop (yes, seriously, that's what it's called), which apparently is the state of the art. I myself ran that tool against Postgres. I didn't spend enough time to tweak it in a way that might have been effective. I also didn't figure out a way to make iterations fast enough for the tool to be effective, because I was invoking Postgres in single-user mode. I might pick it up again in the future, but probably for a more targeted case. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Combining Aggregates
On 04/01/2015 06:28 PM, Robert Haas wrote: On Mon, Mar 30, 2015 at 1:28 AM, Michael Paquier michael.paqu...@gmail.com wrote: I've been thinking of bumping this patch to the June commitfest as the patch only exists to provide the basic infrastructure for things like parallel aggregation, aggregate before join, and perhaps auto updating materialised views. It seems unlikely that any of those things will happen for 9.5. Yeah, I guess so... Does anybody object to me moving this to June's commitfest? Not from my side FWIW. I think it actually makes sense. +1. I'd like to devote some time to looking at this, but I don't have the time right now. The chances that we can do something useful with it in 9.6 seem good. And the June commitfest is now in progress. This patch seems sane to me, as far as it goes. However, there's no planner or executor code to use the aggregate combining for anything. I'm not a big fan of dead code, I'd really like to see something to use this. +1, this patch itself looks good for me, but... The main use case people have been talking about is parallel query, but is there some other case this would be useful right now, without the parallel query feature? You and Simon talked about this case: 2. Queries such as: SELECT p.name, SUM(s.qty) FROM sales s INNER JOIN product p ON s.product_id = p.product_id GROUP BY p.name; Such a query could be transformed into: SELECT p.name,SUM(qty) FROM (SELECT product_id,SUM(qty) AS qty FROM sales GROUP BY product_id) s INNER JOIN product p ON p.product_id = s.product_id GROUP BY p_name; Of course the outer query's SUM and GROUP BY would not be required if there happened to be a UNIQUE index on product(name), but assuming there's not then the above should produce the results faster. This of course works ok for SUM(), but for something like AVG() or STDDEV() the combine/merge aggregate functions would be required to process those intermediate aggregate results that were produced by the sub-query. Any chance you could implement that in the planner? It likely needs planner enhancement prior to other applications... http://www.postgresql.org/message-id/ca+tgmobgwkhfzc09b+s2lxjtword5ht-avovdvaq4+rpwro...@mail.gmail.com Once planner allowed to have both of normal path and partial aggregation paths to compare according to the cost, it is the straightforward way to do. Here are various academic research, for example, below is the good starting point to clarify aggregate queries that we can run with 2-phase approach. http://www.researchgate.net/publication/2715288_Performing_Group-By_before_Join Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New functions
On Wed, Jul 8, 2015 at 10:18 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Wed, Jul 8, 2015 at 9:15 PM, Дмитрий Воронин carriingfat...@yandex.ru wrote: 07.07.2015, 18:34, Michael Paquier michael.paqu...@gmail.com: Speaking of which, I have rewritten the patch as attached. This looks way cleaner than the previous version submitted. Dmitry, does that look fine to you? I am switching this patch to Waiting on Author. Regards, -- Michael Michael, hello. I'm looking at your variant of the patch. You created the function ssl_extensions_info(), which gives information about SSL extensions in a more informative view. So, it's cool. OK cool. I think a committer could look at it, hence switching it to Ready for Committer. Note for committers: attached is a small script that will generate a client certificate with extensions enabled. This is helpful when testing this patch. Once created, then simply connect with something like this connection string: host=127.0.0.1 sslmode=verify-full sslcert=client.crt sslkey=client.key sslrootcert=server.crt By querying the new function implemented by this patch, the information about the client certificate extensions will show up. -- Michael ssl_key_generate Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
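For anyone testing this, a minimal usage sketch (assuming the set-returning form of the function from the patch; the exact column list is whatever the patch defines) would be to connect over SSL with the connection string above and run:

SELECT * FROM ssl_extensions_info();

The rows describing the client certificate's extensions should then show up.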
Re: [HACKERS] spgist recovery assertion failure
On Mon, Jul 27, 2015 at 2:00 PM, Noah Misch n...@leadboat.com wrote: When I caused a crash during the create_index regression test, recovery hit an assertion failure. Minimal test case: psql -X EOSQL CREATE TABLE t (c text); INSERT INTO t SELECT 'P0123456789abcdef' FROM generate_series(1,1000); INSERT INTO t VALUES ('P0123456789abcdefF'); CREATE INDEX ON t USING spgist (c); EOSQL pg_ctl -m immediate -w restart On which platform are you seeing the failure? I am afraid I could not reproduce the failure on Linux and OSX after testing it on HEAD. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Combining Aggregates
On 27 July 2015 at 12:14, Kouhei Kaigai kai...@ak.jp.nec.com wrote: The main use case people have been talking about is parallel query, but is there some other case this would be useful right now, without the parallel query feature? You and Simon talked about this case: 2. Queries such as: SELECT p.name, SUM(s.qty) FROM sales s INNER JOIN product p ON s.product_id = p.product_id GROUP BY p.name; Such a query could be transformed into: SELECT p.name,SUM(qty) FROM (SELECT product_id,SUM(qty) AS qty FROM sales GROUP BY product_id) s INNER JOIN product p ON p.product_id = s.product_id GROUP BY p_name; Of course the outer query's SUM and GROUP BY would not be required if there happened to be a UNIQUE index on product(name), but assuming there's not then the above should produce the results faster. This of course works ok for SUM(), but for something like AVG() or STDDEV() the combine/merge aggregate functions would be required to process those intermediate aggregate results that were produced by the sub-query. Any chance you could implement that in the planner? It likely needs planner enhancement prior to other applications... http://www.postgresql.org/message-id/ca+tgmobgwkhfzc09b+s2lxjtword5ht-avovdvaq4+rpwro...@mail.gmail.com I had thought this too, but I'm not sure that's 100% correct. As I just said in the my previous email on this thread, I am now working on group by before join. I had started it with the intentions of path-ifying the grouping planner, but I soon realised that the grouping_planner() does quite a bit more at that final stage that I propose to do to allow group by before join. This is mostly around handling of DISTINCT, HAVING and LIMIT. I don't think those need to be handled in my patch, perhaps with the exception of DISTINCT without GROUP BY, but not when both are present. Instead I've started by inventing GroupingPath and I'm now building these new path types once there's enough tables joined for all Vars of the GROUP BY and agg parameters. I believe this optimisation needs to be costed as pushing the GROUP BY down to happen as early as possible is *not* always a win. Perhaps the JOIN is very selective and eliminates many groups. Hence I've added costing and allowed the planner to choose what it thinks is cheapest. Once planner allowed to have both of normal path and partial aggregation paths to compare according to the cost, it is the straightforward way to do. I've ended up with 2 boolean members to struct GroupingPath, combine_states and finalize_aggs. I plan to modify nodeAgg.c to use these, and EXPLAIN to give a better description of what its doing. Here are various academic research, for example, below is the good starting point to clarify aggregate queries that we can run with 2-phase approach. http://www.researchgate.net/publication/2715288_Performing_Group-By_before_Join Thanks, I've been using that very paper. Regards David Rowley -- David Rowley http://www.2ndQuadrant.com/ http://www.2ndquadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
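As a hedged illustration of that costing point (same hypothetical sales/product schema as above, not taken from the patch): in

SELECT p.name, SUM(s.qty)
FROM sales s
INNER JOIN product p ON p.product_id = s.product_id
WHERE p.name = 'widget'
GROUP BY p.name;

the join keeps rows for a single product, so grouping all of sales by product_id before the join would aggregate every product's rows only to throw almost all of the groups away, whereas joining first touches far fewer rows. Hence the need for GroupingPath to be costed against the ordinary join-then-group plan rather than applied unconditionally.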
[HACKERS] Documentation tweak for row-valued expressions and null
Hi I wonder if it might be worth adding a tiny note to the manual to point out that the special logic for row-valued-expression IS [ NOT ] NULL doesn't apply anywhere else that we handle nulls or talk about [non]-null values in the manual. See attached. -- Thomas Munro http://www.enterprisedb.com note-about-row-is-null.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
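For context, a quick illustration of the special logic in question (this is existing standard behaviour, not something the attached note changes):

SELECT ROW(NULL, NULL) IS NULL;   -- true: all fields are null
SELECT ROW(1, NULL) IS NULL;      -- false: not all fields are null
SELECT ROW(1, NULL) IS NOT NULL;  -- also false: not all fields are non-null

So a row value can test as neither NULL nor NOT NULL, which is exactly the behaviour that does not carry over to the other places where the manual talks about null and non-null values.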
[HACKERS] spgist recovery assertion failure
When I caused a crash during the create_index regression test, recovery hit an assertion failure. Minimal test case:

psql -X <<EOSQL
CREATE TABLE t (c text);
INSERT INTO t SELECT 'P0123456789abcdef' FROM generate_series(1,1000);
INSERT INTO t VALUES ('P0123456789abcdefF');
CREATE INDEX ON t USING spgist (c);
EOSQL
pg_ctl -m immediate -w restart

The log ends with:

29294 2015-07-27 04:41:43.330 GMT LOG: database system was not properly shut down; automatic recovery in progress
29294 2015-07-27 04:41:43.330 GMT LOG: redo starts at 0/17A4070
TRAP: FailedAssertion(!(xldata->parentBlk == -1), File: spgxlog.c, Line: 338)
29292 2015-07-27 04:41:43.338 GMT LOG: startup process (PID 29294) was terminated by signal 6: Aborted
29292 2015-07-27 04:41:43.338 GMT LOG: aborting startup due to startup process failure

REL9_4_STABLE is unaffected. I suspect, but have not checked, that a standby replaying WAL from a cluster running make installcheck would observe the same problem. Thanks, nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CustomScan and readfuncs.c
Kouhei Kaigai kai...@ak.jp.nec.com writes: During the investigation of ParallelAppend, I noticed there are a few problems in CustomScan that prevent reproducing an equivalent plan node on a background worker from its serialized string. 1. CustomScanMethods->TextOutCustomScan callback This callback allows custom information to be output by nodeToString. Originally, we intended this callback for debugging only, because CustomScan must be copyObject() safe, thus all the private data must also be stored in custom_exprs or custom_private. However, it leads to another problem when we try to reproduce a CustomScan node from the string form generated by outfuncs.c. If TextOutCustomScan prints something, the upcoming _readCustomScan has to deal with an unexpected number of tokens in an unexpected format. Um ... wait a second. There is no support in readfuncs for any plan node type, and never has been, and I seriously doubt that there ever should be. I do not think it makes sense to ship plans around in the way you seem to have in mind. (Also, I don't think the problems you mention are exactly unique to CustomScan. There's no reason to assume that FDW plans could survive this treatment either, since we do not know what's in the fdw_private stuff; certainly no one has ever suggested that it should not contain pointers to static data.) Yep, no Plan node types are supported at this moment; however, they will appear soon with the Funnel + PartialSeqScan nodes. That serializes a partial plan subtree using nodeToString(), then gives the flattened PlannedStmt to background workers. I'm now investigating applying the same structure to the Append node to kick off child nodes in parallel. Once various plan node types appear in readfuncs.c, we have to care about this problem, don't we? I'm working toward the patch submission of ParallelAppend for the next commitfest, so I'd like to reach a consensus on how to treat this matter. I'd like to propose omitting this callback prior to the v9.5 release, for the fewest compatibility issues. I regard our commitment to cross-version compatibility for the custom scan APIs as being essentially zero, for reasons previously discussed. So if this goes away in 9.6 it will not matter, but we might as well leave it in for now for debug support. I don't argue this point strongly. If TextOutCustomScan is to be obsoleted in 9.6, it is simply a kindness to developers not to encourage use of this callback. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Failing assertions in indxpath.c, placeholder.c and brin_minmax.c
On Sun, Jul 26, 2015 at 10:32 PM, Andreas Seltenreich seltenre...@gmx.de wrote: Michael Paquier writes: Footnotes: [1] https://github.com/anse1/sqlsmith This is really interesting stuff. I think that it would be possible to extract self-contained test cases from your tool and those queries to reproduce the failures. It is written that this tools connects to a database to retrieve the schema, what is it exactly in the case of those failures? I used the database regression that pg_regress leaves behind when you remove the --temp-install from it's default invocation through make check. Sorry about not being explicit about that. So, dropping one of the queries into src/test/regress/sql/smith.sql and invoking make check EXTRA_TESTS=smith was all that was needed to integrate them. I was then able to perform git bisect run on this command. Er, plus consing the expected output file. I'm using the regression db a lot when hacking on sqlsmith, as it contains much more nasty things than your average database. Ah, OK. Thanks. The code is licensed as GPL, has a dependency on libpqxx and is written in C++, so it cannot be integrated into core as a test module in this state, but I think that it would be definitely worth having something like that in the code tree that runs on the buildfarm. We could have caught up those problems earlier. Now I imagine that this is a costly run, so we had better have a switch to control if it is run or not, like a configure option or a flag. Thoughts? -- Michael
Re: [HACKERS] raw output from copy
On 7 July 2015 at 14:32, Pavel Stehule pavel.steh...@gmail.com wrote: Hi, the previous patch was broken and buggy. Here is a new version with fixed upload and more tests I routinely see people trying to use COPY ... FORMAT binary to export a single binary field (like an image, for example) and getting confused by the header PostgreSQL adds. Or using text-format COPY and struggling with the hex escaping. It's clearly something people have trouble with. It doesn't help that while lo_import and lo_export can read paths outside the datadir (and refuse to read from within it), pg_read_binary_file is superuser only and disallows absolute paths. There's no corresponding pg_write_binary_file. So users who want to import and export a single binary field tend to try to use COPY. We have functionality for large objects that has no equivalent for 'bytea'. I don't love the use of COPY for this, but it gets us support for arbitrary clients pretty easily. Otherwise it'd be server-side only via local filesystem access, or require special psql-specific functionality like we have for lo_import etc. The main point is that this is a real-world thing. People want to do it, try to do it, and have problems doing it. So it's a solution to a real issue. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
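As a hedged illustration of the first failure mode (hypothetical table and column names), this is roughly what people attempt:

COPY (SELECT img FROM images WHERE id = 1)
  TO '/tmp/image.png' WITH (FORMAT binary);

The resulting file is not the raw bytea: COPY's binary format wraps it in a file header plus per-tuple field-count and field-length words, so the "exported image" will not open. Text-format COPY instead emits the hex-escaped representation (\x89504e47... for a PNG), which is the other common surprise mentioned above.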
Re: [HACKERS] Parallel Seq Scan
On Thu, Jul 23, 2015 at 9:42 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Wed, Jul 22, 2015 at 9:14 PM, Robert Haas robertmh...@gmail.com wrote: One thing I noticed that is a bit dismaying is that we don't get a lot of benefit from having more workers. Look at the 0.1 data. At 2 workers, if we scaled perfectly, we would be 3x faster (since the master can do work too), but we are actually 2.4x faster. Each process is on the average 80% efficient. That's respectable. At 4 workers, we would be 5x faster with perfect scaling; here we are 3.5x faster. So the third and fourth worker were about 50% efficient. Hmm, not as good. But then going up to 8 workers bought us basically nothing. I think the improvement also depends on how costly the qualification is: if it is costly, even for the same selectivity the gains will keep showing up at a higher number of clients, and for simple qualifications we will see that the cost of having more workers (processing data over multiple tuple queues) starts dominating over the benefit we can achieve from them. Yes, that's correct. When the qualification cost is increased, the performance also keeps increasing with the number of workers. Instead of using all the configured workers per query, how about deciding the number of workers based on the cost of the qualification? I am not sure whether we have any information available to find out the qualification cost. This way the workers will be distributed to all backends properly. Regards, Hari Babu Fujitsu Australia -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
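To illustrate what a "costly qualification" means here, a hedged sketch with a hypothetical table and columns:

SELECT count(*) FROM big_table
WHERE sqrt(abs(sin(c1))) + ln(c2 + 1.0) > 0.5;   -- expensive per-row filter

Each row costs far more to evaluate than a simple "c1 > 100", so additional workers keep paying for themselves up to a higher worker count before the tuple-queue overhead starts to dominate.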
Re: [HACKERS] spgist recovery assertion failure
On Mon, Jul 27, 2015 at 02:19:09PM +0900, Michael Paquier wrote: On Mon, Jul 27, 2015 at 2:00 PM, Noah Misch n...@leadboat.com wrote: When I caused a crash during the create_index regression test, recovery hit an assertion failure. Minimal test case: psql -X EOSQL CREATE TABLE t (c text); INSERT INTO t SELECT 'P0123456789abcdef' FROM generate_series(1,1000); INSERT INTO t VALUES ('P0123456789abcdefF'); CREATE INDEX ON t USING spgist (c); EOSQL pg_ctl -m immediate -w restart On which platform are you seeing the failure? I am afraid I could not reproduce the failure on Linux and OSX after testing it on HEAD. I saw it on ppc64 Fedora and on {ppc32 PostgreSQL, ppc64 kernel} AIX. Like you, I don't see it on x86_64 Ubuntu. Perhaps it is specific to big-endian. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LWLock deadlock and gdb advice
On 2015-07-16 PM 04:03, Jeff Janes wrote: On Wed, Jul 15, 2015 at 8:44 AM, Heikki Linnakangas hlinn...@iki.fi wrote: Both. Here's the patch. Previously, LWLockAcquireWithVar set the variable associated with the lock atomically with acquiring it. Before the lwlock-scalability changes, that was straightforward because you held the spinlock anyway, but it's a lot harder/expensive now. So I changed the way acquiring a lock with a variable works. There is now a separate flag, LW_FLAG_VAR_SET, which indicates that the current lock holder has updated the variable. The LWLockAcquireWithVar function is gone - you now just use LWLockAcquire(), which always clears the LW_FLAG_VAR_SET flag, and you can call LWLockUpdateVar() after that if you want to set the variable immediately. LWLockWaitForVar() always waits if the flag is not set, i.e. it will not return regardless of the variable's value, if the current lock-holder has not updated it yet. I ran this for a while without casserts and it seems to work. But with casserts, I get failures in the autovac process on the GIN index. I don't see how this is related to the LWLock issue, but I didn't see it without your patch. Perhaps the system just didn't survive long enough to uncover it without the patch (although it shows up pretty quickly). It could just be an overzealous Assert, since the casserts off didn't show problems. bt and bt full are shown below. I got a similar assert failure but with a btree index (pg_attribute_relid_attnum_index). The backtrace looks like Jeff's: (gdb) bt #0 0x003969632625 in raise () from /lib64/libc.so.6 #1 0x003969633e05 in abort () from /lib64/libc.so.6 #2 0x0092eb9e in ExceptionalCondition (conditionName=0x9c2220 !(((PageHeader) (page))-pd_special = (__builtin_offsetof (PageHeaderData, pd_linp))), errorType=0x9c0c41 FailedAssertion, fileName=0x9c0c10 nbtree.c, lineNumber=903) at assert.c:54 #3 0x004e02d8 in btvacuumpage (vstate=0x7fff2c7655f0, blkno=9, orig_blkno=9) at nbtree.c:903 #4 0x004e0067 in btvacuumscan (info=0x7fff2c765cd0, stats=0x279f7d0, callback=0x668f6d lazy_tid_reaped, callback_state=0x279e338, cycleid=49190) at nbtree.c:821 #5 0x004dfdde in btbulkdelete (fcinfo=0x7fff2c7657d0) at nbtree.c:676 #6 0x00939769 in FunctionCall4Coll (flinfo=0x7fff2c765bb0, collation=0, arg1=140733939342544, arg2=0, arg3=6721389, arg4=41542456) at fmgr.c:1375 #7 0x004d2a01 in index_bulk_delete (info=0x7fff2c765cd0, stats=0x0, callback=0x668f6d lazy_tid_reaped, callback_state=0x279e338) at indexam.c:690 #8 0x0066861d in lazy_vacuum_index (indrel=0x7fd40ab846f0, stats=0x279e770, vacrelstats=0x279e338) at vacuumlazy.c:1367 #9 0x006678a8 in lazy_scan_heap (onerel=0x274ec90, vacrelstats=0x279e338, Irel=0x279e790, nindexes=2, scan_all=0 '\000') at vacuumlazy.c:1098 #10 0x006660f7 in lazy_vacuum_rel (onerel=0x274ec90, options=99, params=0x27bdc88, bstrategy=0x27bdd18) at vacuumlazy.c:244 #11 0x00665c1a in vacuum_rel (relid=1249, relation=0x7fff2c7662a0, options=99, params=0x27bdc88) at vacuum.c:1388 #12 0x006643ce in vacuum (options=99, relation=0x7fff2c7662a0, relid=1249, params=0x27bdc88, va_cols=0x0, bstrategy=0x27bdd18, isTopLevel=1 '\001') at vacuum.c:293 #13 0x0075d23c in autovacuum_do_vac_analyze (tab=0x27bdc80, bstrategy=0x27bdd18) at autovacuum.c:2807 #14 0x0075c632 in do_autovacuum () at autovacuum.c:2328 #15 0x0075b457 in AutoVacWorkerMain (argc=0, argv=0x0) at autovacuum.c:1647 #16 0x0075b0a5 in StartAutoVacWorker () at autovacuum.c:1454 #17 0x0076f9cc in StartAutovacuumWorker () at postmaster.c:5261 #18 0x0076f28a in 
sigusr1_handler (postgres_signal_arg=10) at postmaster.c:4918 #19 signal handler called #20 0x0039696e1353 in __select_nocancel () from /lib64/libc.so.6 #21 0x0076ace7 in ServerLoop () at postmaster.c:1582 #22 0x0076a538 in PostmasterMain (argc=3, argv=0x26f9330) at postmaster.c:1263 #23 0x006c1c2e in main (argc=3, argv=0x26f9330) at main.c:223 Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sharing aggregate states between different aggregate functions
On 27 July 2015 at 03:24, Heikki Linnakangas hlinn...@iki.fi wrote: On 07/09/2015 12:44 PM, David Rowley wrote: On 15 June 2015 at 12:05, David Rowley david.row...@2ndquadrant.com wrote: This basically allows an aggregate's state to be shared between other aggregate functions when both aggregate's transition functions (and a few other things) match There's quite a number of aggregates in our standard set which will benefit from this optimisation. After compiling the original patch with another compiler, I noticed a couple of warnings. The attached fixes these. I spent some time reviewing this. I refactored the ExecInitAgg code rather heavily to make it more readable (IMHO); see attached. What do you think? Did I break anything? Thanks for taking the time to look at this and makes these fixes. I'm just looking over your changes: - ereport(ERROR, - (errcode(ERRCODE_INVALID_FUNCTION_DEFINITION), - errmsg(aggregate %u needs to have compatible input type and transition type, - aggref-aggfnoid))); + ereport(ERROR, + (errcode(ERRCODE_INVALID_FUNCTION_DEFINITION), + errmsg(aggregate needs to have compatible input type and transition type))); I can't quite see the reason to remove the agg OID from the error message here. It seems to be still valid to use as build_peraggstate_for_aggref() only is called when nothing is shared. - * agg_input_types, agg_state_type, agg_result_type identify the input, - * transition, and result types of the aggregate. These should all be - * resolved to actual types (ie, none should ever be ANYELEMENT etc). + * agg_input_types identifies the input types of the aggregate. These should + * be resolved to actual types (ie, none should ever be ANYELEMENT etc). I'm not sure I understand why agg_state_type and agg_result_type have changed here. + peraggstate-sortstates = (Tuplesortstate **) + palloc0(sizeof(Tuplesortstate *) * numGroupingSets); + for (currentsortno = 0; currentsortno numGroupingSets; currentsortno++) + peraggstate-sortstates[currentsortno] = NULL; This was not you, but this NULL setting looks unneeded due to the palloc0(). Some comments: * In aggref_has_compatible_states(), you give up if aggtype or aggcollid differ. But those properties apply to the final function, so you were leaving some money on the table by disallowing state-sharing if they differ. Good catch, and accurate analogy. Thanks for fixing. * The filter and input expressions are initialized for every AggRef, before the deduplication logic kicks in. The AggrefExprState.aggfilter, aggdirectargs and args fields really belong to the AggStatePerAggState struct instead. This is not a new issue, but now that we have a convenient per-aggstate struct to put them in, let's do so. Good idea. I failed to notice that code over there in execQual.c so I agree that where you've moved it to is much better. * There was a reference-after free bug in aggref_has_compatible_states; you cannot ReleaseSysCache and then continue pointing to the struct. Thanks for fixing. In this function I also wasn't quite sure if it was with comparing both non-NULL INITCOND's here. I believe my code comments may slightly contradict what the code actually does, as the comments talk about them having to match, but the code just bails if any are non-NULL. The reason I didn't check them was because it seems inevitable that some duplicate work needs to be done when setting up the INITCOND. Perhaps it's worth it? 
select aggfnoid || '(' || typname || ')',aggtransfn,agginitval from pg_aggregate inner join pg_type on aggtranstype = oid where aggtransfn in (select aggtransfn from pg_aggregate group by aggtransfn having count(*)1) order by aggtransfn; This indicates that everything using float4_accum as a transfn could benefit from that. I just wasn't sure how far to go. * The code was a bit fuzzy on which parts of the per-aggstate are filled in at what time. Some of the fields were overwritten every time a match was found. With the same values, so no harm done, but I found it confusing. I refactored ExecInitAgg in the attached patch to clear that up. * There API of build_aggregate_fnexprs() was a bit strange now that some callers use it to only fill in the final function, some call it to fill both the transition functions and the final function. I split it to two separate functions. That's much better. * I wonder if we should do this duplicate elimination at plan time. It's very fast, so I'm not worried about that right now, but you had grand plans to expand this so that an aggregate could optionally use one of many different kinds of state values. At that point, it certainly seems like a planning decision to decide which aggregates share state. I think we can leave it as it is for now, but it's something to perhaps consider later. I don't think I'm going to get the time to work on the supporting aggregate stuff you're talking about, but I think it's a good enough
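As a hedged example of the kind of sharing that query points at (hypothetical table and column names): in

SELECT avg(x), variance(x), stddev(x) FROM measurements;

all three aggregates over a real (float4) column use float4_accum as their transition function with the same initial state, so with this patch a single transition state can serve all of them instead of maintaining three identical ones.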
Re: [HACKERS] Buildfarm failure from overly noisy warning message
Tom Lane t...@sss.pgh.pa.us wrote: + WARNING: could not send signal to process 30123: No such process What's evidently happened here is that our session tried to boot an autovacuum process off a table lock, only that process was gone by the time we issued the kill() call. I'm inclined to reduce the WARNING to LOG, and/or skip it altogether if the error is ESRCH. One could also argue that both of these ereports ought to be downgraded to DEBUG1 or less, since this mechanism is pretty well shaken out by now. Thoughts? I think a LOG entry when an autovacuum process is actually canceled has value just in case it is happening on a particular table so frequently that the table starts to bloat. I see no reason to log anything if there is an intention to cancel an autovacuum process but it actually completes before we can do so. IMV the best solution would be to proceed straight to the kill() and only log something if it was found by kill(). -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] more RLS oversights
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 07/03/2015 10:03 AM, Noah Misch wrote: (2) CreatePolicy() and AlterPolicy() omit to create a pg_shdepend entry for each role in the TO clause. Test case: Please see the attached patch. Note that I used SHARED_DEPENDENCY_ACL for this. It seems appropriate, but possibly we should invent a new shared dependency type for this use case? Comments? Thanks, Joe -BEGIN PGP SIGNATURE- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAEBAgAGBQJVtSlAAAoJEDfy90M199hlrkIP/0BqMj30JpJVj3H5pIopU0RZ pesBzFzzsWmt2QmasX9hajRfo6eXQAWPmKQmw/NDTJvHHNACxLdo0MHE7A9y7No7 aZIFTm0KhnG4jpzxfzpGqQ4ron5Vsc2TlNQgCBYCtbEEtuD0mV2qmcuTkqGte7iB 7iqneRBhmTYBy63X7S0ir4AhDi+JJg8P4F7YuU/XMcha5v5CpNJAToPpW7mCoZ8O 9w/RbXCHso7p1DIxfx4tfxVwLQ7j7G2j0rXbuA2e6OKfwZWWrk5E8Wnvc3xy3yCY J37fcjOxFdhU/j1j+Tr90LYOSuRu5fQ4z6PmmsPkBM3T0Vllz/nNP64aVKbmHzga szrIBERRs2icKa4OI08qZF42ObIEv0/t/NuyrXpegY6awRNNNsjyZvwM+51zdjw1 fuWh+2rObzh3RDH8lPB0N0rVfDMM+wU85Vaekckv/7UQ/pbWcwsYD/tykA1engQ8 Ww8kBAEct4dWdqppTqvLLxLgo4d+vuiS1mJ7SRY2aZFRX8QQ8TfB3eObETUsU4/X 9UWyMn+E0Au9ICX+wfD4whaBKst8EuO36qOZx0oZt+++EMOlzypgkCCxDtZO0KWd KZHbtmJXHFVH+ihbEGXAKMIx+gLqSDG3IKy9MIJxpB3JY3XSdBNqobL2hiQy36/w svK7i+mEYmUCQ6pB1j8c =P1AS -END PGP SIGNATURE- diff --git a/src/backend/commands/policy.c b/src/backend/commands/policy.c index 17b48d4..20ac54e 100644 *** a/src/backend/commands/policy.c --- b/src/backend/commands/policy.c *** *** 22,27 --- 22,28 #include catalog/indexing.h #include catalog/namespace.h #include catalog/objectaccess.h + #include catalog/pg_authid.h #include catalog/pg_policy.h #include catalog/pg_type.h #include commands/policy.h *** *** 48,54 static void RangeVarCallbackForPolicy(const RangeVar *rv, Oid relid, Oid oldrelid, void *arg); static char parse_policy_command(const char *cmd_name); ! static ArrayType *policy_role_list_to_array(List *roles); /* * Callback to RangeVarGetRelidExtended(). --- 49,55 static void RangeVarCallbackForPolicy(const RangeVar *rv, Oid relid, Oid oldrelid, void *arg); static char parse_policy_command(const char *cmd_name); ! static Datum *policy_role_list_to_array(List *roles, int *num_roles); /* * Callback to RangeVarGetRelidExtended(). *** parse_policy_command(const char *cmd_nam *** 130,159 /* * policy_role_list_to_array ! * helper function to convert a list of RoleSpecs to an array of role ids. */ ! static ArrayType * ! policy_role_list_to_array(List *roles) { ! ArrayType *role_ids; ! Datum *temp_array; ListCell *cell; - int num_roles; int i = 0; /* Handle no roles being passed in as being for public */ if (roles == NIL) { ! temp_array = (Datum *) palloc(sizeof(Datum)); ! temp_array[0] = ObjectIdGetDatum(ACL_ID_PUBLIC); ! role_ids = construct_array(temp_array, 1, OIDOID, sizeof(Oid), true, ! 'i'); ! return role_ids; } ! num_roles = list_length(roles); ! temp_array = (Datum *) palloc(num_roles * sizeof(Datum)); foreach(cell, roles) { --- 131,158 /* * policy_role_list_to_array ! * helper function to convert a list of RoleSpecs to an array of ! * role id Datums. */ ! static Datum * ! policy_role_list_to_array(List *roles, int *num_roles) { ! Datum *role_oids; ListCell *cell; int i = 0; /* Handle no roles being passed in as being for public */ if (roles == NIL) { ! *num_roles = 1; ! role_oids = (Datum *) palloc(*num_roles * sizeof(Datum)); ! role_oids[0] = ObjectIdGetDatum(ACL_ID_PUBLIC); ! return role_oids; } !*num_roles = list_length(roles); ! 
role_oids = (Datum *) palloc(*num_roles * sizeof(Datum)); foreach(cell, roles) { *** policy_role_list_to_array(List *roles) *** 164,187 */ if (spec-roletype == ROLESPEC_PUBLIC) { ! if (num_roles != 1) ereport(WARNING, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg(ignoring roles specified other than public), errhint(All roles are members of the public role.))); ! temp_array[0] = ObjectIdGetDatum(ACL_ID_PUBLIC); ! num_roles = 1; ! break; } else ! temp_array[i++] = ObjectIdGetDatum(get_rolespec_oid((Node *) spec, false)); } ! role_ids = construct_array(temp_array, num_roles, OIDOID, sizeof(Oid), true, ! 'i'); ! ! return role_ids; } /* --- 163,187 */ if (spec-roletype == ROLESPEC_PUBLIC) { ! Datum *tmp_role_oids; ! ! if (*num_roles != 1) ereport(WARNING, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg(ignoring roles specified other than public), errhint(All roles are members of the public role.))); ! *num_roles = 1; ! tmp_role_oids = (Datum *) palloc(*num_roles * sizeof(Datum)); ! tmp_role_oids[0] = ObjectIdGetDatum(ACL_ID_PUBLIC); ! ! return tmp_role_oids; } else