Re: [HACKERS] Patch to support SEMI and ANTI join removal
On Wed, Nov 19, 2014 at 11:49 PM, David Rowley <dgrowle...@gmail.com> wrote:
> On Sun, Nov 16, 2014 at 12:19 PM, David Rowley <dgrowle...@gmail.com> wrote:
>> On Sun, Nov 16, 2014 at 10:09 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
>>> I propose that we keep track of whether there are any potentially
>>> skippable joins at the top of the plan. When we begin execution we do a
>>> single if test to see if there is run-time work to do. If we pass the
>>> run-time tests we then descend the tree and prune the plan to completely
>>> remove unnecessary nodes. We end with an EXPLAIN and EXPLAIN ANALYZE
>>> that looks like this:
>>>
>>>                      QUERY PLAN
>>> ------------------------------------------------------
>>>  Aggregate (actual rows=1 loops=1)
>>>    ->  Seq Scan on t1 (actual rows=100 loops=1)
>>>
>>> Doing that removes all the overheads and complexity; it also matches
>>> how join removal currently works.

I've attached an updated patch which works in this way. All of the skipping
code that I had added to the executor's join functions has now been removed.

Here's an example output with the plan trimmed, and then untrimmed:

set constraints b_c_id_fkey deferred;
explain (costs off) select b.* from b inner join c on b.c_id = c.id;
  QUERY PLAN
---------------
 Seq Scan on b
(1 row)

-- add an item to the trigger queue by updating a referenced record.
update c set id = 2 where id = 1;
explain (costs off) select b.* from b inner join c on b.c_id = c.id;
          QUERY PLAN
-------------------------------
 Hash Join
   Hash Cond: (b.c_id = c.id)
   ->  Seq Scan on b
   ->  Hash
         ->  Seq Scan on c
(5 rows)

A slight quirk with the patch as it stands is that I'm unconditionally NOT
removing Sort nodes that sit below a MergeJoin node. The reason for this is
that I've not quite figured out a way to determine whether the sort order
is still required. An example of this can be seen in the regression tests:

-- check merge join nodes are removed properly
set enable_hashjoin = off;
-- this should remove joins to b and c.
explain (costs off) select COUNT(*) from a inner join b on a.b_id = b.id left join c on a.id = c.id;
      QUERY PLAN
----------------------------
 Aggregate
   ->  Sort
         Sort Key: a.b_id
         ->  Seq Scan on a
(4 rows)

As the patch stands there are still a couple of FIXMEs in there, so there's
still a bit of work to do yet. Comments are welcome.

Regards

David Rowley

Attachment: inner_join_removals_2014-11-24_7cde1e4.patch

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] make check-world regress failed
Hi,

PostgreSQL check-world regress failed with current GIT HEAD on my Kubuntu 14.10.

uname -a
Linux vlD-kuci 3.16.0-24-generic #32-Ubuntu SMP Tue Oct 28 13:13:18 UTC 2014 i686 athlon i686 GNU/Linux

gdb -d /home/src/postgresql-devel/postgresql-git/postgresql/src -c core
...
Loaded symbols for /home/src/postgresql-devel/dev-build/src/test/regress/regress.so
(gdb) bt
#0  0xb76ecc7c in __kernel_vsyscall ()
#1  0xb7075577 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#2  0xb7076cf3 in __GI_abort () at abort.c:89
#3  0x084c326a in ?? ()
#4  0x0a56c3b8 in ?? ()
#5  0xb76d232f in pg_atomic_init_u64 (ptr=0xbfa16fd4, val=0) at /home/src/postgresql-devel/postgresql-git/postgresql/src/include/port/atomics.h:445
#6  0xb76d50e4 in test_atomic_uint64 () at /home/src/postgresql-devel/postgresql-git/postgresql/src/test/regress/regress.c:1022
#7  0xb76d5756 in test_atomic_ops (fcinfo=0xa57c76c) at /home/src/postgresql-devel/postgresql-git/postgresql/src/test/regress/regress.c:1114
#8  0x0825bfee in ?? ()
...

The attached patch fixes the check-world regression failure for me.

Best regards,
Vladimir Kokovic, DP senior

Attachment: generic-gcc.diff
Re: [HACKERS] proposal: plpgsql - Assert statement
I wrote:
> The core of that complaint is that we'd have to make ASSERT a plpgsql
> reserved word, which is true enough as things stand today.  However, why
> is it that plpgsql statement-introducing keywords need to be reserved?
> The only reason for that AFAICS is to allow us to distinguish the
> statements from assignments.  But it seems like that could possibly be
> gotten around with some work.

Attached is a draft patch that de-reserves 17 of plpgsql's existing
reserved words, leaving 24 still reserved (of which only 9 are not also
reserved in the core grammar).  This would make it painless to invent an
ASSERT statement, as well as any other new statement type that's not
associated with looping or block structure.  (The limiting factor on those
is that if a statement could have an opt_block_label, its keyword still
has to be reserved, unless we want to complicate matters a bunch more.)

			regards, tom lane

diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y
index 82378c7..4af28ea 100644
*** a/src/pl/plpgsql/src/pl_gram.y
--- b/src/pl/plpgsql/src/pl_gram.y
*** unreserved_keyword :
*** 2315,2346 ****
--- 2315,2360 ----
  				| K_ALIAS
  				| K_ARRAY
  				| K_BACKWARD
+ 				| K_CLOSE
+ 				| K_COLLATE
  				| K_COLUMN
  				| K_COLUMN_NAME
  				| K_CONSTANT
  				| K_CONSTRAINT
  				| K_CONSTRAINT_NAME
+ 				| K_CONTINUE
  				| K_CURRENT
  				| K_CURSOR
  				| K_DATATYPE
  				| K_DEBUG
+ 				| K_DEFAULT
  				| K_DETAIL
+ 				| K_DIAGNOSTICS
  				| K_DUMP
+ 				| K_ELSIF
  				| K_ERRCODE
  				| K_ERROR
+ 				| K_EXCEPTION
+ 				| K_EXIT
+ 				| K_FETCH
  				| K_FIRST
  				| K_FORWARD
+ 				| K_GET
  				| K_HINT
  				| K_INFO
+ 				| K_INSERT
  				| K_IS
  				| K_LAST
  				| K_LOG
  				| K_MESSAGE
  				| K_MESSAGE_TEXT
+ 				| K_MOVE
  				| K_NEXT
  				| K_NO
  				| K_NOTICE
+ 				| K_OPEN
  				| K_OPTION
+ 				| K_PERFORM
  				| K_PG_CONTEXT
  				| K_PG_DATATYPE_NAME
  				| K_PG_EXCEPTION_CONTEXT
*** unreserved_keyword :
*** 2349,2356 ****
--- 2363,2372 ----
  				| K_PRINT_STRICT_PARAMS
  				| K_PRIOR
  				| K_QUERY
+ 				| K_RAISE
  				| K_RELATIVE
  				| K_RESULT_OID
+ 				| K_RETURN
  				| K_RETURNED_SQLSTATE
  				| K_REVERSE
  				| K_ROW_COUNT
diff --git a/src/pl/plpgsql/src/pl_scanner.c b/src/pl/plpgsql/src/pl_scanner.c
index 6a5a04b..db01ba5 100644
*** a/src/pl/plpgsql/src/pl_scanner.c
--- b/src/pl/plpgsql/src/pl_scanner.c
*** IdentifierLookup plpgsql_IdentifierLooku
*** 31,51 ****
   *
   * We keep reserved and unreserved keywords in separate arrays.  The
   * reserved keywords are passed to the core scanner, so they will be
!  * recognized before (and instead of) any variable name.  Unreserved
!  * words are checked for separately, after determining that the identifier
   * isn't a known variable name.  If plpgsql_IdentifierLookup is DECLARE then
   * no variable names will be recognized, so the unreserved words always work.
   * (Note in particular that this helps us avoid reserving keywords that are
   * only needed in DECLARE sections.)
   *
   * In certain contexts it is desirable to prefer recognizing an unreserved
!  * keyword over recognizing a variable name.  Those cases are handled in
!  * pl_gram.y using tok_is_keyword().
   *
!  * For the most part, the reserved keywords are those that start a PL/pgSQL
!  * statement (and so would conflict with an assignment to a variable of the
!  * same name).  We also don't sweat it much about reserving keywords that
!  * are reserved in the core grammar.  Try to avoid reserving other words.
   */
--- 31,58 ----
   *
   * We keep reserved and unreserved keywords in separate arrays.  The
   * reserved keywords are passed to the core scanner, so they will be
!  * recognized before (and instead of) any variable name.  Unreserved words
!  * are checked for separately, usually after determining that the identifier
   * isn't a known variable name.  If plpgsql_IdentifierLookup is DECLARE then
   * no variable names will be recognized, so the unreserved words always work.
   * (Note in particular that this helps us avoid reserving keywords that are
   * only needed in DECLARE sections.)
   *
   * In certain contexts it is desirable to prefer recognizing an unreserved
!  * keyword over recognizing a variable name.  In particular, at the start
!  * of a statement we should prefer unreserved keywords unless the statement
!  * looks like an assignment (i.e., first token is followed by ':=' or '[').
!  * This rule allows most statement-introducing keywords to be kept unreserved.
!  * (We still have to reserve initial keywords that might follow a block
!  * label, unfortunately, since the method used to determine if we are at
!  * start of statement doesn't recognize such cases.  We'd also have to
!  * reserve any keyword that could legitimately be followed by ':=' or '['.)
!  * Some additional cases are handled in pl_gram.y using
Re: [HACKERS] Turning recovery.conf into GUCs
On 2014-11-21 10:59:23 -0800, Josh Berkus wrote:

On 11/21/2014 10:54 AM, Stephen Frost wrote:

* Josh Berkus (j...@agliodbs.com) wrote: Either way, from the code it is
clear that we only stay in recovery if standby_mode is directly turned on.
This makes the whole check for a specially named file unnecessary, IMO: we
should just check the value of standby_mode (which is off by default).

So, what happens when someone does pg_ctl promote? Somehow standby_mode
needs to get set to off. Maybe we write standby_mode = off to
postgresql.auto.conf? Uhh, and then not work for the sane folks who disable
postgresql.auto.conf? No thanks..

Other ideas then, without reverting to the old system? Being able to
promote servers over port 5432 will be a huge advantage for container-based
systems, so I don't want to give that up as a feature.

Why is a trigger file making that impossible? Adding the code from pg_ctl
promote into a SQL-callable function is a couple minutes worth of work. A
GUC alone won't work very well - standby_mode is checked in specific
places; you can't just turn that off and expect that that'd result in
speedy promotion. And it'd break people using scripts like pg_standby. And
it also has other effects. So, no.

Greetings,

Andres Freund

--
Andres Freund	http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] superuser() shortcuts
On 2014-11-21 10:12:40 -0500, Stephen Frost wrote:
> * Andres Freund (and...@2ndquadrant.com) wrote:
>> I still think this change makes the error message more verbose, without
>> any win in clarity.
>
> Can we agree that there should be consistency?

Consistency with what? Are you thinking of the messages in
aclchk.c:no_priv_msg? I don't think that's really comparable. A permission
denied on a relation is much easier to understand than replication
permissions and such. It'd surely not be better if pg_basebackup emitted an
error message bar actually helpful information.

Btw, the replication permission use in postinit.c isn't related to slots.

> I'm not really particular about which way we go with the specific
> wording (suggestions welcome..) but the inconsistency should be dealt
> with.

Meh.

Greetings,

Andres Freund

--
Andres Freund	http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] group locking: incomplete patch, just for discussion
On 2014-11-21 08:31:01 -0500, Robert Haas wrote:

On Thu, Nov 20, 2014 at 8:31 PM, Andres Freund and...@2ndquadrant.com wrote:
I can't follow that logic. Why do you think self-exclusive locks are the
only ones that matter? All the others can be acquired by jumping in front
of the lock's waitqueue?

That is true, as a practical matter, in many cases. But from the point of
view of the lock manager, it's really not true. It is almost true if you
assume that the lock level acquired by the parallel backend will be weaker
than the lock level held by the user backend, but not quite, because
RowExclusiveLock conflicts with ShareLock.

The assumption that a parallel backend will acquire a lock level <= the
worker's lock imo is a pretty sane one - we obviously can't predict things
if not. And I don't think any of the other presented approaches can do
that. I'm not really bothered by RowExclusiveLock vs. ShareLock. Unless I
miss something, that can only be problematic if the primary acquires a
ShareLock and the worker tries a RowExclusive. That just can't be sane.

The assumption that the parallel backend's lock level will always be weaker
seems shaky. It's probably true for relation locks, but if we ever want to
support parallel workers that can write any data whatsoever, they're going
to need to be able to take relation extension locks.

I don't think extension locks are a good argument here. As you noted
upthread, they really need to be used consistently across processes. And
they shouldn't/wouldn't be granted by your suggested mechanism either,
right?

If one backend in a parallel group attempts to acquire the relation
extension lock while another worker holds it, the rule that all group
members should be regarded as waiting for each other will result in
instantaneous deadlock. That's pretty undesirable, and suggests to me that
the whole model is defective in some way.

Shouldn't the deadlock checker actually check that all involved processes
are actually waiting for a lock? It seems a bit pointless to deadlock when
one node is actually progressing.

That seems like it'd be an absolute PITA for anything involving system
tables and such.

I don't buy the plan time/execution time argument. We'll need to be able to
adapt plans to the availability of bgworker slots *anyway*. Otherwise we
either need to set the number of bgworkers to MaxConnections *
MaxParallelism or live with errors as soon as too many processes use
parallelism. The former is clearly unrealistic given PG's per-backend
overhead and the latter will be a *much* bigger PITA than all the deadlock
scenarios talked about here. It'll constantly fail, even if everything is
ok.

I certainly think that any parallel node needs to be able to cope with
getting fewer workers than it would ideally like to have. But that's
different from saying that when our own backend takes new locks, we have to
invalidate plans. Those seem like largely unrelated problems.

No need to invalidate them if you have the infrastructure to run with no
parallel workers - just use the path that's used if there's no workers
available.

Well, it's OK for a query to run with less parallelism than the planner
thought optimal. It's not OK for it to attempt to run with the requested
degree of parallelism and deadlock. I think if the deadlock detector gets
involved it's quite unlikely that you're going to be able to make things
degrade smoothly to non-parallel operation.

That was in response to your argument about plan invalidation - which I
don't buy because it'd only reuse the graceful degradation capabilities we
need anyway.

The heavyweight locking issue is really killing me, though. I don't really
understand why you're not content with just detecting deadlocks for now.
Everything else seems like bells and whistles for later.

I don't think it's OK for a parallel query to just randomly deadlock.

I think that should be the end goal, yes. It's OK for parallel query to
excuse itself politely and retire from the field, but I don't think it's OK
for it to say, oh, sure, I can parallelize that for you, and then fail by
deadlocking with itself.

But I also think that right now it doesn't necessarily need to be the focus
immediately. This is a topic that I think will be much easier to approach
if some demonstration of actual parallel users were available. Doesn't have
to be committable or such, but I think it'll help to shape how this has to
look. I think unrecognized deadlocks will make things too annoying to use,
but I don't think it needs to be guaranteed deadlock-free or such.

Greetings,

Andres Freund
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 23 November 2014 14:45, Amit Kapila wrote:

Thanks a lot for debugging and fixing the issue.

> The stacktrace of the crash is as below:
>
> #0  0x0080108cf3a4 in .strlen () from /lib64/libc.so.6
> #1  0x0080108925bc in ._IO_vfprintf () from /lib64/libc.so.6
> #2  0x0080108bc1e0 in .__GL__IO_vsnprintf_vsnprintf () from /lib64/libc.so.6
> #3  0x0fff7e581590 in .appendPQExpBufferVA () from /data/akapila/workspace/master/installation/lib/libpq.so.5
> #4  0x0fff7e581774 in .appendPQExpBuffer () from /data/akapila/workspace/master/installation/lib/libpq.so.5
> #5  0x10003748 in .run_parallel_vacuum ()
> #6  0x10003f60 in .vacuum_parallel ()
> #7  0x10002ae4 in .main ()
> (gdb) f 5
> #5  0x10003748 in .run_parallel_vacuum ()
>
> So now the real reason here is that the list of tables passed to the
> function is corrupted. The below code seems to be the real culprit:
>
> vacuum_parallel()
> {
> ..
> 	if (!tables || !tables->head)
> 	{
> 		SimpleStringList dbtables = {NULL, NULL};
> 		...
> 		..
> 		tables = &dbtables;
> 	}
> ..
> }
>
> In the above code dbtables is local to the if block and the code is using
> its address to assign it to tables, which is used outside the if block's
> scope; moving the declaration to the outer scope fixes the problem in my
> environment. Find the updated patch that fixes this problem attached with
> this mail. Let me know your opinion about the same.

Yes, that's the reason for the corruption; this must be causing both the
issues: sending a corrupted query to the server as well as the crash at the
client side.

> While looking at this problem, I have noticed a couple of other
> improvements:
>
> a. In the prepare_command() function, the patch is doing init of the sql
> buffer (initPQExpBuffer(sql);) which I think is not required, as at both
> places from where this function is called it is done by the caller. I
> think this will lead to a memory leak.

Fixed.

> b. In the prepare_command() function, for fixed strings you can use
> appendPQExpBufferStr(), which is what the original code used as well.

Changed as per the comment.

> c.
> run_parallel_vacuum()
> {
> ..
> 	prepare_command(connSlot[free_slot].connection, full, verbose,
> 					and_analyze, analyze_only, freeze, sql);
>
> 	appendPQExpBuffer(sql, "%s", cell->val);
> ..
> }
>
> I think it is better to end the command with ';' by using
> appendPQExpBufferStr(sql, ";"); in the above code.

Done.

The latest patch is attached, please have a look.

Regards,
Dilip Kumar

Attachment: vacuumdb_parallel_v18.patch
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Mon, Nov 24, 2014 at 7:34 AM, Dilip kumar dilip.ku...@huawei.com wrote:
> On 23 November 2014 14:45, Amit Kapila wrote:
>> Thanks a lot for debugging and fixing the issue.
> Latest patch is attached, please have a look.

I think some of the comments given upthread are still not handled:

a. About cancel handling
b. Setting of the inAbort flag for the case where PQcancel is successful
c. Performance data for the --analyze-in-stages switch
d. Have one pass over the comments in the patch. I can still see some
   wrongly formatted multiline comments. Refer below:

+	/* Otherwise, we got a stage from vacuum_all_databases(), so run
+	 * only that one. */

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] postgres_fdw behaves oddly
On Sun, Nov 23, 2014 at 2:46 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Etsuro Fujita fujita.ets...@lab.ntt.co.jp writes:
>> (2014/11/19 18:21), Ashutosh Bapat wrote:
>>> Ok. I added that comment to the commitfest and changed the status to
>>> ready for committer.
>>
>> Many thanks!
>
> I committed this with some cosmetic adjustments, and one not-so-cosmetic
> one: I left it continuing to allow CTID conditions to be sent to the
> remote server.  That works today and we shouldn't disable it, especially
> not in the back branches where it could mean a performance regression in
> a minor release.

Thanks.

> As for the question of whether reltargetlist or tlist should be examined,
> reltargetlist is the correct thing.  I do not see a need to modify
> use_physical_tlist either.  If you trace through what happens in e.g.
> postgres_fdw, you'll notice that it only bothers to retrieve actually-used
> columns (which it computes correctly) from the remote side.  It then
> constructs a scan tuple that corresponds to the foreign table's physical
> column set, inserting NULLs for any unfetched columns.  This is handed
> back to execScan.c and matches what a heapscan or indexscan would produce.
> The point of the use_physical_tlist optimization is to avoid a projection
> step *on this tuple* if we don't really need to do one (ie, if it'd be
> just as good for the scan node's output tuple to be identical to the
> row's physical tuple).  If we disabled use_physical_tlist then we'd be
> forcing a projection step that would have the effect of removing the
> NULLs for unfetched columns, but it would not actually be of value, just
> as it isn't in the corresponding cases for heap/index scans.

For the current patch, that's fair.

In grouping_planner() the targetlist of the topmost plan node of the join
tree is changed to the set of expressions requested in the query:

1554 		else
1555 		{
1556 			/*
1557 			 * Otherwise, just replace the subplan's flat tlist with
1558 			 * the desired tlist.
1559 			 */
1560 			result_plan->targetlist = sub_tlist;
1561 		}

This means that for a simple SELECT from the foreign table, projection
would be necessary in ExecScan if all the columns are not required. So the
strategy of emulating a heap or index scan wastes extra cycles and memory
in adding NULL columns which are not required and then filtering them out
during the projection (within ExecScan itself). It also limits the ability
of foreign scans to fetch shippable expressions when those expressions are
part of the targetlist and fetching their values (instead of the columns
participating in the expressions) reduces the size of the fetched row.
These cases do not exist for heap scans or index scans because the tuples
are obtained directly from the buffer, so no extra memory is required for
the extra columns, and projection as well as expression evaluation is
required anyway.

> BTW, given that there wasn't any very good way to test the createplan.c
> fix except by adding test cases to postgres_fdw, I didn't see much value
> in splitting the patch in two.  The committed patch includes tests that
> the createplan.c bug is fixed as well as the postgres_fdw one.
>
> 			regards, tom lane

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] [TODO] Track number of files ready to be archived in pg_stat_archiver
Hearing nothing from the original author, this patch, which had been in
state "Waiting on Author" for a couple of days, is switched to "Returned
with Feedback".

Regards,
--
Michael