Re: [HACKERS] libpq bad async behaviour
On 14 January 2015 at 08:40, Andres Freund and...@2ndquadrant.com wrote:

> I think that kind of solution isn't likely to be satisfying. The amount of
> porting work is just not going to be worth the cost. And it won't be easily
> hideable in the API at all as the callers will expect a normal fd.

All that consumers of the API need is something they can `select()` or equivalent on.

> Yeah, this is a problem. Fixing it isn't easy, though, I think.
>
> I think
>     extern PostgresPollingStatusType PQconnectPoll(PGconn *conn);
> has the right interface. It returns what upper layers need to wait for. I
> think we should extend pretty much that to more interfaces.

This would be a fine solution. That enum indeed has the correct values/semantics.

> This likely means that we'll need extended versions of PQflush() and
> PQconsumeInput() - afaics it shouldn't be much more? PQping?

PQconnectPoll already has it. Though, I think we could probably even reduce this down to a single common function for all cases: PQpoll() or similar.
Re: [HACKERS] OOM on EXPLAIN with lots of nodes
On Tue, Jan 13, 2015 at 8:16 PM, Tom Lane t...@sss.pgh.pa.us wrote:

> I wrote:
>> Heikki Linnakangas hlinnakan...@vmware.com writes:
>>> But do we really need to backpatch any of this?
>>
>> Alexey's example consumes only a couple hundred MB in 9.2, vs about 7GB
>> peak in 9.3 and up. That seems like a pretty nasty regression.
>
> I did a bit more measurement of the time and backend memory consumption
> for Alexey's example EXPLAIN:
>
>     9.2:              0.9 sec, circa 200 MB
>     HEAD:             56 sec, circa 7300 MB
>     with patch below: 3.7 sec, circa 300 MB
>
> So while this doesn't get us all the way back down to where we were before
> we started trying to guarantee unique table/column identifiers in EXPLAIN
> printouts, it's at least a lot closer. Not sure whether to just commit
> this to HEAD and call it a day, or to risk back-patching.

I think we need to back-patch something; that's a pretty nasty regression, and I have some EDB-internal reports that might be from the same cause.

I'm not too concerned about forcibly breaking the API here, but I can understand why somebody might want to do that. If we do, I like the idea of renaming ExplainInitState(), or maybe replacing it with a NewExplainState() function that is used instead. But I'm not sure how necessary that really is.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT UPDATE and RLS
* Dean Rasheed (dean.a.rash...@gmail.com) wrote:

> Turns out it wasn't as simple as that. prepend_row_security_policies()
> really could get called multiple times for the same RTE, because the call
> to query_tree_walker() at the end of fireRIRrules() would descend into the
> just-added quals again. The simplest fix seems to be to process RLS in a
> separate loop at the end, so that it can have its own infinite recursion
> detection, which is different from that needed for pre-existing security
> quals and with check options from security barrier views. This approach
> simplifies things a bit, and ensures that we only try to expand RLS once
> for each RTE.

Right, I specifically recall having prepend_row_security_policies() getting called multiple times for the same RTE. I like this approach of using a separate loop though, and it strikes me that it lends more weight to the argument that we're better off with these as independent considerations.

>> Also, I'm thinking that it would be better to refactor things a bit and
>> have prepend_row_security_policies() just return the new securityQuals
>> and withCheckOptions to add. Then fireRIRrules() would only have to
>> recurse into the new quals being added, not the already-processed quals.
>
> Hmm, good point. Turns out that refactoring actually became necessary in
> order to fix this bug, but I think it makes things cleaner and more
> efficient.

Sounds good, I'll take a look.

> Here's an updated patch with a new test for this bug. I've been developing
> the fixes for these RLS issues as one big patch, but I suppose it would be
> easy to split up, if that's preferred.

I'm alright with it as-is for now.

Thanks!

Stephen
Re: [HACKERS] Safe memory allocation functions
On Tue, Jan 13, 2015 at 10:10 AM, Tom Lane t...@sss.pgh.pa.us wrote:

> However, there is a larger practical problem with this whole concept,
> which is that experience should teach us to be very wary of the assumption
> that asking for memory the system can't give us will just lead to nice
> neat malloc-returns-NULL behavior. Any small perusal of the mailing list
> archives will remind you that very often the end result will be SIGSEGV,
> OOM kills, unrecoverable trap-on-write when the kernel realizes it can't
> honor a copy-on-write promise, yadda yadda. Agreed that it's arguable that
> these only occur in misconfigured systems ... but misconfiguration appears
> to be the default in a depressingly large fraction of systems. (This is
> another reason for _safe not being the mot juste :-()

I don't really buy this. It's pretty incredible to think that after a malloc() failure there is absolutely no hope of carrying on sanely. If that were true, we wouldn't be able to ereport() out-of-memory errors at any severity less than FATAL, but of course it doesn't work that way. Moreover, AllocSetAlloc() contains malloc() and, if that fails, calls malloc() again with a smaller value, without even throwing an error.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 9:49 AM, Andres Freund and...@2ndquadrant.com wrote:

> On 2015-01-14 09:47:19 -0600, Merlin Moncure wrote:
>> On Wed, Jan 14, 2015 at 9:30 AM, Andres Freund and...@2ndquadrant.com wrote:
>>> If you gdb in, and type 'fin' a couple times, to wait till the function
>>> finishes, is there actually any progress? I'm wondering whether it's
>>> just many catalog accesses + contention, or some other problem.
>>> Alternatively set a breakpoint on ScanPgRelation() or so and see how
>>> often it's hit.
>>
>> well, i restarted the database, forgetting my looper was running which
>> immediately spun up and it got stuck again with a similar profile (lots
>> of cpu in spinlock):
>>
>> Samples: 3K of event 'cycles', Event count (approx.): 2695723228
>>  31.16%  postgres      [.] s_lock
>>  22.32%  postgres      [.] tas
>>  12.13%  postgres      [.] tas
>>   5.93%  postgres      [.] spin_delay
>>   5.69%  postgres      [.] LWLockRelease
>>   3.75%  postgres      [.] LWLockAcquireCommon
>>   3.61%  perf          [.] 0x000526c4
>>   2.51%  postgres      [.] FunctionCall2Coll
>>   1.48%  libc-2.17.so  [.] 0x0016a132
>>
>>> If you gdb in, and type 'fin' a couple times,
>>
>> (gdb) fin
>> Run till exit from #0  0x7ff4c63f7a97 in semop () from /lib/x86_64-linux-gnu/libc.so.6
>> 0x006de073 in PGSemaphoreLock ()
>> (gdb) fin
>> Run till exit from #0  0x006de073 in PGSemaphoreLock ()
>>
>> It returned once. Second time, it didn't at least so far (minute or so).
>
> Hm, that's autovac though, not the normal user backends that actually do
> stuff, right? If you could additionally check those, it'd be great.

got it to reproduce. had to take a break to get some work done.
(gdb) fin
Run till exit from #0  0x7f7b07099dc3 in select () from /lib/x86_64-linux-gnu/libc.so.6
0x008d3ac4 in pg_usleep ()
(gdb) fin
Run till exit from #0  0x008d3ac4 in pg_usleep ()
0x00750b69 in s_lock ()
(gdb) fin
Run till exit from #0  0x00750b69 in s_lock ()
0x00750844 in LWLockRelease ()
(gdb) fin
Run till exit from #0  0x00750844 in LWLockRelease ()
0x0073 in LockBuffer ()
(gdb) fin
Run till exit from #0  0x0073 in LockBuffer ()
0x004b2db4 in _bt_relandgetbuf ()
(gdb) fin
Run till exit from #0  0x004b2db4 in _bt_relandgetbuf ()
0x004b7116 in _bt_moveright ()
(gdb) fin
Run till exit from #0  0x004b7116 in _bt_moveright ()

so it looks like nobody ever exits from _bt_moveright. any last requests before I start bisecting down?

merlin
Re: [HACKERS] [PATCH] explain sortorder
Hi,

we will also remove the following "is LC_COLLATE" hint in the next version, showing only mandatory info as suggested.

    /* for those who use COLLATE although their default is already the wanted */
    if (strcmp(collname, localeptr) == 0)
    {
        appendStringInfo(sortorderInformation, "(%s is LC_COLLATE)", collname);
    }

Anybody insisting on that?

Arne

Note: I see, at the moment we use the wrong default for DESC. We'll fix that.

On Wed, 14 Jan 2015, Heikki Linnakangas wrote:

> On 01/14/2015 05:26 PM, Timmer, Marius wrote:
>> Hello Heikki,
>>
>> abbreviated version: Sorry, the problem is only the unhandy patch text
>> format, not different opinions how to proceed.
>>
>> Long version: The v7 patch file already addressed your suggestions, but
>> the file contained several (old) local commits, the new ones at the end
>> of the patch text/file.
>
> Ah, missed that. I stopped reading when I saw the old stuff there :-).
>
>> v7.1 is attached and addresses this issue providing a clean patch file.
>
> Ok, thanks, will take a look.
>
>> V8 will - as mentioned - add missing docs and regression tests,
>
> Great!
>
> - Heikki
Re: [HACKERS] [PATCH] explain sortorder
Hello Heikki,

abbreviated version: Sorry, the problem is only the unhandy patch text format, not different opinions how to proceed.

Long version: The v7 patch file already addressed your suggestions, but the file contained several (old) local commits, the new ones at the end of the patch text/file. v7.1 is attached and addresses this issue providing a clean patch file. V8 will - as mentioned - add missing docs and regression tests, Mike suggested.

VlG-Arne

Marius
---
Marius Timmer
Zentrum für Informationsverarbeitung
Westfälische Wilhelms-Universität Münster
Einsteinstraße 60
mtimm...@uni-muenster.de

On 13.01.2015 at 18:52, Heikki Linnakangas hlinnakan...@vmware.com wrote:

> On 01/13/2015 06:04 PM, Timmer, Marius wrote:
>> - malloc() (StringInfo is used as suggested now).
>
> There really shouldn't be any snprintf() calls in the patch, when
> StringInfo is used correctly...
>
>     @@ -1187,6 +1187,7 @@ explain (verbose, costs off) select * from matest0 order by 1-id;
>        Sort
>          Output: matest0.id, matest0.name, ((1 - matest0.id))
>          Sort Key: ((1 - matest0.id))
>     +    Sort Order: ASC NULLS LAST
>        ->  Result
>              Output: matest0.id, matest0.name, (1 - matest0.id)
>        ->  Append
>
> This patch isn't going to be committed with this output format. Please
> change per my suggestion earlier:
>
>> I don't like this output. If there are a lot of sort keys, it gets
>> difficult to match the right ASC/DESC element to the sort key it applies
>> to. (Also, there seems to be double-spaces in the list) I would suggest
>> just adding the information to the Sort Key line. As long as you don't
>> print the modifiers when they are defaults (ASC and NULLS LAST), we
>> could print the information even in non-VERBOSE mode. So it would look
>> something like:
>>
>>     Sort Key: sortordertest.n1 NULLS FIRST, sortordertest.n2 DESC
>>
>> Or if you don't agree with that, explain why.
- Heikki
Re: [HACKERS] Removing INNER JOINs
On 1/13/15 5:02 AM, David Rowley wrote:

> I can't quite get my head around what you mean here, as the idea sounds
> quite similar to something that's been discussed already and ruled out. If
> we're joining relation a to relation b, say the plan chosen is a merge
> join. If we put some special node as the parent of the merge join then how
> will we know to skip or not skip any sorts that are there especially for
> the merge join, or perhaps the planner choose an index scan as a sorted
> path, where now that the join is removed, could become a faster seqscan.
> The whole plan switching node discussion came from this exact problem.
> Nobody seemed to like the non-optimal plan that was not properly optimised
> for having the relation removed.

In my mind this is the same as a root level Alternative Plan, so you'd be free to do whatever you wanted in the alternate:

    -> blah blah
       -> Alternate
          -> Merge join
             ...
          -> SeqScan

I'm guessing this would be easier to code, but that's just a guess. The other advantage is if you can't eliminate the join to table A at runtime you could still eliminate table B, whereas a top-level Alternate node doesn't have that flexibility.

This does have a disadvantage of creating more plan variations to consider. With a single top-level Alternate node there's only one other option. I believe multiple Alternates would create more options to consider. Ultimately, unless this is easier to code than a top-level alternate, it's probably not worth pursuing.

> It also seems that transitioning through needless nodes comes at a cost.
> This is why I quite liked the Alternative Plan node idea, as it allowed
> me to skip over the alternative plan node at plan initialisation.

For init I would expect this to result in a smaller number of nodes than a top-level Alternate, because you wouldn't be duplicating all the stuff above the joins. That said, I rather doubt it's worth worrying about the cost to init; won't it be completely swamped by extra planning cost, no matter how we go about this?
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Parallel Seq Scan
On Tue, Jan 13, 2015 at 6:25 AM, John Gorman johngorm...@gmail.com wrote:

> One approach that has worked well for me is to break big jobs into much
> smaller bite size tasks. Each task is small enough to complete quickly.
> We add the tasks to a task queue and spawn a generic worker pool which
> eats through the task queue items. This solves a lot of problems.
>
> - Small to medium jobs can be parallelized efficiently.
> - No need to split big jobs perfectly.
> - We don't get into a situation where we are waiting around for a worker
>   to finish chugging through a huge task while the other workers sit idle.
> - Worker memory footprint is tiny so we can afford many of them.
> - Worker pool management is a well known problem.
> - Worker spawn time disappears as a cost factor.
> - The worker pool becomes a shared resource that can be managed and
>   reported on and becomes considerably more predictable.

I think this is a good idea, but for now I would like to keep our goals somewhat more modest: let's see if we can get parallel sequential scan, and only parallel sequential scan, working and committed. Ultimately, I think we may need something like what you're talking about, because if you have a query with three or six or twelve different parallelizable operations in it, you want the available CPU resources to switch between those as their respective needs may dictate. You certainly don't want to spawn a separate pool of workers for each scan. But I think getting that all working in the first version is probably harder than what we should attempt. We have a bunch of problems to solve here just around parallel sequential scan and the parallel mode infrastructure: heavyweight locking, prefetching, the cost model, and so on. Trying to add to that all of the problems that might attend on a generic task queueing infrastructure fills me with no small amount of fear.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Typo fix in alter_table.sgml
On Wed, Jan 14, 2015 at 2:55 AM, Michael Paquier michael.paqu...@gmail.com wrote:

> I noticed that SET STATISTICS was not in a literal block in
> alter_table.sgml:
>
>      <para>
>     - SET STATISTICS acquires a <literal>SHARE UPDATE EXCLUSIVE</literal> lock.
>     + <literal>SET STATISTICS</literal> acquires a
>     + <literal>SHARE UPDATE EXCLUSIVE</literal> lock.
>      </para>
>
> That's a small detail, still.. Patch is attached.

OK, committed. But that's not a typo as stated in $SUBJECT but rather a markup fix.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] ereport bug
On Wed, Jan 14, 2015 at 1:40 PM, Tom Lane t...@sss.pgh.pa.us wrote:

> Robert Haas robertmh...@gmail.com writes:
>> On Mon, Jan 12, 2015 at 6:27 AM, Dmitry Voronin carriingfat...@yandex.ru wrote:
>>> I am attaching to this letter a test case that shows the behavior
>>> errcontext() macro and the way to fix it.
>>
>> So the upshot of this is that given errfinish(A, B, C), where A, B, and
>> C are expressions, my gcc is choosing to evaluate C, then B, then A,
>> then the errfinish call itself. But whoever wrote the errcontext() macro
>> evidently thought, in this kind of situation, the compiler would be
>> certain to evaluate A, then B, then C, then errfinish. But it doesn't.
>
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=1f9bf05e539646103c518bcbb49c04919b238f7a

Oops, sorry. I missed that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] pg_basebackup fails with long tablespace paths
On Tue, Jan 13, 2015 at 4:41 PM, Peter Eisentraut pete...@gmx.net wrote:

> I think the key point I'm approaching is that the information should only
> ever be in one place, all the time. This is not dissimilar from why we
> took the tablespace location out of the system catalogs. Users might have
> all kinds of workflows for how they back up, restore, and move their
> tablespaces. This works pretty well right now, because the authoritative
> configuration information is always in plain view. The proposal is
> essentially that we add another location for this information, because
> the existing location is incompatible with some operating system tools.
> And, when considered by a user, that second location might or might not
> collide with or overwrite the first location at some mysterious times. So
> I think the preferable fix is not to add a second location, but to make
> the first location compatible with said operating system tools, possibly
> in the way I propose above.

I see. I'm a little concerned that following symlinks may be cheaper than whatever system we would come up with for caching the tablespace-name-to-file-name mappings. But that concern might be unfounded, and apart from it I have no reason to oppose your proposal, if you want to do the work.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] INSERT ... ON CONFLICT UPDATE and RLS
On 14 January 2015 at 08:43, Dean Rasheed dean.a.rash...@gmail.com wrote:

> On 12 January 2015 at 14:24, Stephen Frost sfr...@snowman.net wrote:
>> Interesting, thanks for the work! I had been suspicious that there was
>> an issue with the recursion handling.

So continuing to review the RLS code, I spotted the following in prepend_row_security_policies():

    /*
     * We may end up getting called multiple times for the same RTE, so check
     * to make sure we aren't doing double-work.
     */
    if (rte->securityQuals != NIL)
        return false;

which looked suspicious. I assume that's just a hang-up from an earlier attempt to prevent infinite recursion in RLS expansion, but actually it's wrong because in the case of an UPDATE to a security barrier view on top of a table with RLS enabled, the view's securityQuals will get added to the RTE first, and that shouldn't prevent the underlying table's RLS securityQuals from being added. AFAICT, it should be safe to just remove the above code. I can't see how prepend_row_security_policies() could end up getting called more than once for the same RTE.

Turns out it wasn't as simple as that. prepend_row_security_policies() really could get called multiple times for the same RTE, because the call to query_tree_walker() at the end of fireRIRrules() would descend into the just-added quals again. The simplest fix seems to be to process RLS in a separate loop at the end, so that it can have its own infinite recursion detection, which is different from that needed for pre-existing security quals and with check options from security barrier views. This approach simplifies things a bit, and ensures that we only try to expand RLS once for each RTE.

Also, I'm thinking that it would be better to refactor things a bit and have prepend_row_security_policies() just return the new securityQuals and withCheckOptions to add. Then fireRIRrules() would only have to recurse into the new quals being added, not the already-processed quals.
Turns out that refactoring actually became necessary in order to fix this bug, but I think it makes things cleaner and more efficient.

Here's an updated patch with a new test for this bug. I've been developing the fixes for these RLS issues as one big patch, but I suppose it would be easy to split up, if that's preferred.

Regards,
Dean

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
new file mode 100644
index 9cbbcfb..34ddf41
*** a/src/backend/optimizer/plan/planner.c
--- b/src/backend/optimizer/plan/planner.c
*** inheritance_planner(PlannerInfo *root)
*** 790,795 ****
--- 790,796 ----
  {
  	Query	   *parse = root->parse;
  	int			parentRTindex = parse->resultRelation;
+ 	Bitmapset  *resultRTindexes = NULL;
  	List	   *final_rtable = NIL;
  	int			save_rel_array_size = 0;
  	RelOptInfo **save_rel_array = NULL;
*** inheritance_planner(PlannerInfo *root)
*** 814,820 ****
--- 815,835 ----
  	 * (1) would result in a rangetable of length O(N^2) for N targets, with
  	 * at least O(N^3) work expended here; and (2) would greatly complicate
  	 * management of the rowMarks list.
+ 	 *
+ 	 * Note that any RTEs with security barrier quals will be turned into
+ 	 * subqueries during planning, and so we must create copies of them too,
+ 	 * except where they are target relations, which will each only be used
+ 	 * in a single plan.
  	 */
+ 	resultRTindexes = bms_add_member(resultRTindexes, parentRTindex);
+ 	foreach(lc, root->append_rel_list)
+ 	{
+ 		AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(lc);
+ 		if (appinfo->parent_relid == parentRTindex)
+ 			resultRTindexes = bms_add_member(resultRTindexes,
+ 											 appinfo->child_relid);
+ 	}
+
  	foreach(lc, root->append_rel_list)
  	{
  		AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(lc);
*** inheritance_planner(PlannerInfo *root)
*** 885,905 ****
  		{
  			RangeTblEntry *rte = (RangeTblEntry *) lfirst(lr);

! 			if (rte->rtekind == RTE_SUBQUERY)
  			{
  				Index		newrti;

  				/*
  				 * The RTE can't contain any references to its own RT
! 				 * index, so we can save a few cycles by applying
!
 				 * ChangeVarNodes before we append the RTE to the
! 				 * rangetable.
  				 */
  				newrti = list_length(subroot.parse->rtable) + 1;
  				ChangeVarNodes((Node *) subroot.parse, rti, newrti, 0);
  				ChangeVarNodes((Node *) subroot.rowMarks, rti, newrti, 0);
  				ChangeVarNodes((Node *) subroot.append_rel_list, rti, newrti, 0);
  				rte = copyObject(rte);
  				subroot.parse->rtable = lappend(subroot.parse->rtable, rte);
  			}
--- 900,928 ----
  		{
  			RangeTblEntry *rte = (RangeTblEntry *) lfirst(lr);

! 			/*
! 			 * Copy subquery RTEs and RTEs with security barrier quals
! 			 * that will be turned into subqueries, except those that are
! 			 * target relations.
! 			 */
! 			if (rte->rtekind == RTE_SUBQUERY ||
! 				(rte->securityQuals !=
Re: [HACKERS] orangutan seizes up during isolation-check
On 1/1/15 11:04 PM, Noah Misch wrote:

> Clusters hosted on OS X fall into these categories:
>
> 1) Unaffected configuration. This includes everyone setting a valid
>    messages locale via LANG, LC_ALL or LC_MESSAGES.
> 2) Affected configuration. Through luck and light use, the cluster would
>    not experience the crashes/hangs.
> 3) Cluster would experience the crashes/hangs.
>
> DBAs in (3) want the FATAL at startup, but those in (2) want a LOG
> message instead. DBAs in (1) don't care. Since intermittent postmaster
> hangs are far worse than startup failure, if (2) and (3) have similar
> population, FATAL is the better bet. If (2) is sufficiently more populous
> than (3), then the many small pricks from startup failure do add up to
> hurt more than the occasional postmaster hang. Who knows how that
> calculation plays out.
>
> The first attached patch, for all branches, adds LOG-level messages and
> an assertion. So cassert builds will fail hard, while others won't. The
> second patch, for master only, changes the startup-time message to FATAL.
> If we decide to use FATAL in all branches, I would just squash them into
> one.

What I'm seeing now is that the unaccent regression tests when run under make check-world abort with

    FATAL:  postmaster became multithreaded during startup
    HINT:  Set the LC_ALL environment variable to a valid locale.

My locale settings for this purpose are

    LANG=en_US.UTF-8
    LC_COLLATE=en_US.UTF-8
    LC_CTYPE=en_US.UTF-8
    LC_MESSAGES=en_US.UTF-8
    LC_MONETARY=en_US.UTF-8
    LC_NUMERIC=en_US.UTF-8
    LC_TIME=en_US.UTF-8
    LC_ALL=

The environment variable LANG is set, all the other ones are not set. I could presumably set LC_ALL, but then I lose the ability to set different locales for different categories. Running

    LC_ALL=$LANG make -C contrib/unaccent check

doesn't fix it; I get the same error.

The behavior is also a bit strange. The step "== starting postmaster ==" hangs for one minute before failing.
This probably has nothing to do with your change, but maybe pg_regress could detect postmaster startup failures better.
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 4:53 PM, Merlin Moncure mmonc...@gmail.com wrote:

> yeah. via:
>
>     cds2=# \copy (select s as page, (bt_page_items('pg_class_oid_index', s)).*
>            from generate_series(1,12) s) to '/tmp/page_items.csv' csv header;

My immediate observation here is that blocks 2 and 9 have identical metadata (from their page opaque area), but partially non-matching data items (however, the number of items on each block is consistent and correct according to that metadata, as far as it goes). I think block 9 is supposed to have a right-link to block 5 (block 5 seems to think so, at least -- its left link is 9).

So now the question is: how did that inconsistency arise? It didn't necessarily arise at the time of the (presumed) split of block 2 to create 9. It could be that the opaque area was changed by something else, some time later. I'll investigate more.

--
Peter Geoghegan
Re: [HACKERS] s_lock.h default definitions are rather confused
On 2015-01-14 12:27:42 -0500, Robert Haas wrote:

> On Mon, Jan 12, 2015 at 12:57 PM, Andres Freund and...@2ndquadrant.com wrote:
>> On 2015-01-10 23:03:36 +0100, Andres Freund wrote:
>>> On 2015-01-10 16:09:42 -0500, Tom Lane wrote:
>>>> I've not tried to build HEAD on my HPPA dinosaur for awhile, but I did
>>>> just now, and I am presented with boatloads of this:
>>>>
>>>>     ../../../src/include/storage/s_lock.h:759: warning: `S_UNLOCK' redefined
>>>>     ../../../src/include/storage/s_lock.h:679: warning: this is the location of the previous definition
>>>>
>>>> which is not too surprising because the default definition at line 679
>>>> precedes the HPPA-specific one at line 759.
>>>
>>> That's 0709b7ee72e4bc71ad07b7120acd117265ab51d0. Not too surprising
>>> that it broke and wasn't noticed without access to hppa - the hppa code
>>> uses gcc inline assembly outside of the big defined(__GNUC__) and
>>> inside the section headed "Platforms that use non-gcc inline assembly".
>>>
>>>> I'm not particularly interested in untangling the effects of the
>>>> recent hackery in s_lock.h enough to figure out how the overall
>>>> structure got broken, but I trust one of you will clean up the mess.
>>>
>>> I think it's easiest solved by moving the gcc inline assembly up to the
>>> rest of the gcc inline assembly. That'll require duplicating a couple
>>> lines, but seems easier to understand nonetheless. Not pretty.
>>
>> Robert, do you have a better idea?
>
> How about hoisting the entire hppa section up to the top of the file,
> before the __GNUC__ || __INTEL_COMPILER section?

We could do that, but I'd rather not see that ugliness at the top every time I open the file :P Right now I think a #ifdef/#undef S_UNLOCK in the relevant gcc section is sufficient and acceptable. It's after all the HPPA section that doesn't really play by the rules.
Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
On Fri, Jan 9, 2015 at 10:51 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:

>> When custom-scan node replaced a join-plan, it shall have at least two
>> child plan-nodes. The callback handler of PlanCustomPath needs to be
>> able to call create_plan_recurse() to transform the underlying
>> path-nodes to plan-nodes, because this custom-scan node may take other
>> built-in scan or sub-join nodes as its inner/outer input. In case of
>> FDW, it shall kick any underlying scan relations to remote side, thus
>> we may not expect ForeignScan has underlying plans...
>
> Do you have an example of this?

Yes, even though full code set is too large for patch submission...

https://github.com/pg-strom/devel/blob/master/src/gpuhashjoin.c#L1880

This create_gpuhashjoin_plan() is the PlanCustomPath callback of GpuHashJoin. It takes GpuHashJoinPath, inherited from CustomPath, that has multiple underlying scan/join paths. Once it is called back from the backend, it also calls create_plan_recurse() to make inner/outer plan nodes according to the paths. As a result, we can see the following query execution plan, in which CustomScan takes underlying scan plans.
postgres=# EXPLAIN SELECT * FROM t0 NATURAL JOIN t1 NATURAL JOIN t2;
                                     QUERY PLAN
 --------------------------------------------------------------------------------
 Custom Scan (GpuHashJoin)  (cost=2968.00..140120.31 rows=3970922 width=143)
   Hash clause 1: (aid = aid)
   Hash clause 2: (bid = bid)
   Bulkload: On
   ->  Custom Scan (GpuScan) on t0  (cost=500.00..57643.00 rows=409 width=77)
   ->  Custom Scan (MultiHash)  (cost=734.00..734.00 rows=4 width=37)
         hash keys: aid
         nBatches: 1  Buckets: 46000  Memory Usage: 99.99%
         ->  Seq Scan on t1  (cost=0.00..734.00 rows=4 width=37)
   ->  Custom Scan (MultiHash)  (cost=734.00..734.00 rows=4 width=37)
         hash keys: bid
         nBatches: 1  Buckets: 46000  Memory Usage: 49.99%
         ->  Seq Scan on t2  (cost=0.00..734.00 rows=4 width=37)
(13 rows)

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com
Re: [HACKERS] Improving RLS qual pushdown
On Wed, Jan 14, 2015 at 9:22 AM, Dean Rasheed dean.a.rash...@gmail.com wrote: On 14 January 2015 at 13:29, Robert Haas robertmh...@gmail.com wrote: One thing they could still leak is the number of times they got called, and thus possibly the number of unseen rows. Now if the expressions get constant-folded away that won't be an issue, but a clever user can probably avoid that. Right now, EXPLAIN ANALYSE can be used to tell you the number of unseen rows. Is that something that people are concerned about, and are there any plans to change it? Interesting question. I don't know. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Shouldn't CREATE TABLE LIKE copy the relhasoids property?
On 01/14/2015 07:29 PM, Tom Lane wrote:
 dst1 doesn't get an OID column:

 regression=# create table src1 (f1 int) with oids;
 CREATE TABLE
 regression=# create table dst1 (like src1);
 CREATE TABLE
 regression=# \d+ src1
                          Table "public.src1"
  Column |  Type   | Modifiers | Storage | Stats target | Description
 --------+---------+-----------+---------+--------------+-------------
  f1     | integer |           | plain   |              |
 Has OIDs: yes

 regression=# \d+ dst1
                          Table "public.dst1"
  Column |  Type   | Modifiers | Storage | Stats target | Description
 --------+---------+-----------+---------+--------------+-------------
  f1     | integer |           | plain   |              |

 If you don't find that problematic, how about this case?

 regression=# create table src2 (f1 int, primary key(oid)) with oids;
 CREATE TABLE
 regression=# create table dst2 (like src2 including indexes);
 ERROR:  column "oid" named in key does not exist

I agree it's odd, and probably wrong, although it's been like that for a
very long time, hasn't it?

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 6:50 PM, Peter Geoghegan p...@heroku.com wrote:
 This is great, but it's not exactly clear which bt_page_items() page is
 which - some are skipped, but I can't be sure which. Would you mind
 rewriting that query to indicate which block is under consideration by
 bt_page_items()?

yeah. via:

cds2=# \copy (select s as page, (bt_page_items('pg_class_oid_index', s)).* from generate_series(1,12) s) to '/tmp/page_items.csv' csv header;

merlin

page,itemoffset,ctid,itemlen,nulls,vars,data
1,1,(0,9),16,f,f,16 0b 00 00 00 00 00 00
1,2,(2,50),16,f,f,70 00 00 00 00 00 00 00
1,3,(2,52),16,f,f,71 00 00 00 00 00 00 00
1,4,(2,24),16,f,f,ae 00 00 00 00 00 00 00
1,5,(2,25),16,f,f,af 00 00 00 00 00 00 00
1,6,(2,51),16,f,f,24 02 00 00 00 00 00 00
1,7,(2,53),16,f,f,25 02 00 00 00 00 00 00
1,8,(4,66),16,f,f,3a 03 00 00 00 00 00 00
1,9,(2,55),16,f,f,3b 03 00 00 00 00 00 00
1,10,(2,56),16,f,f,3c 03 00 00 00 00 00 00
1,11,(4,15),16,f,f,70 04 00 00 00 00 00 00
1,12,(1,31),16,f,f,71 04 00 00 00 00 00 00
1,13,(4,14),16,f,f,bd 04 00 00 00 00 00 00
1,14,(4,17),16,f,f,be 04 00 00 00 00 00 00
1,15,(1,34),16,f,f,d0 04 00 00 00 00 00 00
1,16,(1,35),16,f,f,d1 04 00 00 00 00 00 00
1,17,(0,2),16,f,f,df 04 00 00 00 00 00 00
1,18,(3,10),16,f,f,e1 04 00 00 00 00 00 00
1,19,(3,11),16,f,f,e7 04 00 00 00 00 00 00
1,20,(7,6),16,f,f,eb 04 00 00 00 00 00 00
1,21,(2,23),16,f,f,ec 04 00 00 00 00 00 00
1,22,(4,16),16,f,f,ed 04 00 00 00 00 00 00
1,23,(4,12),16,f,f,ee 04 00 00 00 00 00 00
1,24,(4,64),16,f,f,89 05 00 00 00 00 00 00
1,25,(3,31),16,f,f,8a 05 00 00 00 00 00 00
1,26,(1,48),16,f,f,8b 08 00 00 00 00 00 00
1,27,(4,63),16,f,f,18 09 00 00 00 00 00 00
1,28,(0,64),16,f,f,20 09 00 00 00 00 00 00
1,29,(0,65),16,f,f,21 09 00 00 00 00 00 00
1,30,(4,18),16,f,f,5c 09 00 00 00 00 00 00
1,31,(1,11),16,f,f,5d 09 00 00 00 00 00 00
1,32,(4,40),16,f,f,28 0a 00 00 00 00 00 00
1,33,(3,47),16,f,f,29 0a 00 00 00 00 00 00
1,34,(3,48),16,f,f,2a 0a 00 00 00 00 00 00
1,35,(3,49),16,f,f,2b 0a 00 00 00 00 00 00
1,36,(3,41),16,f,f,2c 0a 00 00 00 00 00 00
1,37,(4,45),16,f,f,2d 0a 00 00 00 00 00 00
1,38,(3,42),16,f,f,2e 0a 00 00 00 00 00 00
1,39,(4,48),16,f,f,2f 0a 00 00 00 00 00 00
1,40,(4,49),16,f,f,30 0a 00 00 00 00 00 00
1,41,(4,44),16,f,f,31 0a 00 00 00 00 00 00
1,42,(3,43),16,f,f,32 0a 00 00 00 00 00 00
1,43,(7,26),16,f,f,33 0a 00 00 00 00 00 00
1,44,(3,50),16,f,f,34 0a 00 00 00 00 00 00
1,45,(4,71),16,f,f,35 0a 00 00 00 00 00 00
1,46,(4,47),16,f,f,37 0a 00 00 00 00 00 00
1,47,(3,46),16,f,f,38 0a 00 00 00 00 00 00
1,48,(3,44),16,f,f,39 0a 00 00 00 00 00 00
1,49,(4,41),16,f,f,3a 0a 00 00 00 00 00 00
1,50,(0,1),16,f,f,3b 0a 00 00 00 00 00 00
1,51,(4,42),16,f,f,3c 0a 00 00 00 00 00 00
1,52,(0,61),16,f,f,5a 0a 00 00 00 00 00 00
1,53,(0,54),16,f,f,5b 0a 00 00 00 00 00 00
1,54,(0,55),16,f,f,5c 0a 00 00 00 00 00 00
1,55,(0,56),16,f,f,5d 0a 00 00 00 00 00 00
1,56,(0,57),16,f,f,5e 0a 00 00 00 00 00 00
1,57,(0,59),16,f,f,5f 0a 00 00 00 00 00 00
1,58,(0,50),16,f,f,60 0a 00 00 00 00 00 00
1,59,(7,14),16,f,f,61 0a 00 00 00 00 00 00
1,60,(0,42),16,f,f,62 0a 00 00 00 00 00 00
1,61,(0,43),16,f,f,63 0a 00 00 00 00 00 00
1,62,(0,68),16,f,f,64 0a 00 00 00 00 00 00
1,63,(0,69),16,f,f,65 0a 00 00 00 00 00 00
1,64,(7,3),16,f,f,66 0a 00 00 00 00 00 00
1,65,(7,4),16,f,f,67 0a 00 00 00 00 00 00
1,66,(0,53),16,f,f,68 0a 00 00 00 00 00 00
1,67,(7,22),16,f,f,69 0a 00 00 00 00 00 00
1,68,(7,23),16,f,f,6a 0a 00 00 00 00 00 00
1,69,(7,24),16,f,f,6b 0a 00 00 00 00 00 00
1,70,(1,72),16,f,f,6c 0a 00 00 00 00 00 00
1,71,(1,73),16,f,f,6d 0a 00 00 00 00 00 00
1,72,(1,74),16,f,f,6e 0a 00 00 00 00 00 00
1,73,(1,6),16,f,f,6f 0a 00 00 00 00 00 00
1,74,(1,7),16,f,f,70 0a 00 00 00 00 00 00
1,75,(1,75),16,f,f,71 0a 00 00 00 00 00 00
1,76,(1,76),16,f,f,72 0a 00 00 00 00 00 00
1,77,(1,67),16,f,f,73 0a 00 00 00 00 00 00
1,78,(0,40),16,f,f,74 0a 00 00 00 00 00 00
1,79,(0,41),16,f,f,75 0a 00 00 00 00 00 00
1,80,(1,50),16,f,f,76 0a 00 00 00 00 00 00
1,81,(1,51),16,f,f,77 0a 00 00 00 00 00 00
1,82,(1,49),16,f,f,78 0a 00 00 00 00 00 00
1,83,(1,58),16,f,f,79 0a 00 00 00 00 00 00
1,84,(1,59),16,f,f,7a 0a 00 00 00 00 00 00
1,85,(1,89),16,f,f,7b 0a 00 00 00 00 00 00
1,86,(1,70),16,f,f,7c 0a 00 00 00 00 00 00
1,87,(1,71),16,f,f,7d 0a 00 00 00 00 00 00
1,88,(1,56),16,f,f,7e 0a 00 00 00 00 00 00
1,89,(1,57),16,f,f,7f 0a 00 00 00 00 00 00
1,90,(1,52),16,f,f,80 0a 00 00 00 00 00 00
1,91,(1,53),16,f,f,81 0a 00 00 00 00 00 00
1,92,(1,43),16,f,f,82 0a 00 00 00 00 00 00
1,93,(7,1),16,f,f,83 0a 00 00 00 00 00 00
1,94,(1,61),16,f,f,84 0a 00 00 00 00 00 00
1,95,(1,62),16,f,f,85 0a 00 00 00 00 00 00
1,96,(0,29),16,f,f,86 0a 00 00 00 00 00 00
1,97,(0,30),16,f,f,87 0a 00 00 00 00 00 00
1,98,(6,35),16,f,f,88 0a 00 00 00 00 00 00
1,99,(1,36),16,f,f,89 0a 00 00 00 00 00 00
1,100,(1,37),16,f,f,8a 0a 00 00 00 00 00 00
1,101,(1,63),16,f,f,8b 0a 00 00 00 00 00 00
1,102,(1,64),16,f,f,8d 0a 00 00 00 00 00 00
1,103,(7,61),16,f,f,8e 0a 00 00 00 00 00 00
1,104,(2,19),16,f,f,8f 0a 00 00 00 00 00 00
1,105,(2,22),16,f,f,90 0a
Re: [HACKERS] Async execution of postgres_fdw.
On Tue, Jan 13, 2015 at 6:46 AM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote:
 Is it possible to use the parallel query infrastructure being built by
 Robert or to do something like parallel seq scan? That will work, not
 just for Postgres FDW but all the FDWs. But, I think, from the
 performance view, every scan of multiple foreign scans don't need a
 corresponding local process.

Quite so. I think this is largely a separate project.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] __attribute__ for non-gcc compilers
On 2015-01-14 17:46:39 -0500, Robert Haas wrote: On Wed, Jan 14, 2015 at 5:29 PM, Andres Freund and...@2ndquadrant.com wrote: I think it's better than the alternatives: a) Don't support 64bit atomics on any 32bit platform. I think that'd be sad because there's some places that could greatly benefit from being able to read/store/manipulate e.g. LSNs atomically. b) Double the size of 64bit atomics on 32bit platforms, and add TYPEALIGN() to every access inside the atomics implementation. c) Require 64 atomics to be aligned appropriately manually in every place they're embedded. I think that's completely impractical. The only viable one imo is a) I can't really fault that reasoning, but if __attribute__((align)) only works on some platforms, then you've got silent, subtle breakage on the ones where it doesn't. The __attribute__((align()))'s are in compiler specific code sections anyway - and there's asserts ensuring that the alignment is correct in all routines where it matters (IIRC). Those are what caught the problem. C.f. http://archives.postgresql.org/message-id/20150108204635.GK6299%40alap3.anarazel.de I think I'd for now simply not define pg_attribute_aligned() on platforms where it's not supported, instead of defining it empty. If we need a softer variant we can name it pg_attribute_aligned_if_possible or something. Sounds sane? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Shouldn't CREATE TABLE LIKE copy the relhasoids property?
dst1 doesn't get an OID column:

regression=# create table src1 (f1 int) with oids;
CREATE TABLE
regression=# create table dst1 (like src1);
CREATE TABLE
regression=# \d+ src1
                         Table "public.src1"
 Column |  Type   | Modifiers | Storage | Stats target | Description
--------+---------+-----------+---------+--------------+-------------
 f1     | integer |           | plain   |              |
Has OIDs: yes

regression=# \d+ dst1
                         Table "public.dst1"
 Column |  Type   | Modifiers | Storage | Stats target | Description
--------+---------+-----------+---------+--------------+-------------
 f1     | integer |           | plain   |              |

If you don't find that problematic, how about this case?

regression=# create table src2 (f1 int, primary key(oid)) with oids;
CREATE TABLE
regression=# create table dst2 (like src2 including indexes);
ERROR:  column "oid" named in key does not exist

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
This is great, but it's not exactly clear which bt_page_items() page is which - some are skipped, but I can't be sure which. Would you mind rewriting that query to indicate which block is under consideration by bt_page_items()? Thanks -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Turning recovery.conf into GUCs
On 01/15/2015 02:24 AM, Robert Haas wrote:
 I think your idea of adding read-only GUCs with the same names as all
 of the recovery.conf parameters is a clear win. Even if we do nothing
 more than that, it makes the values visible from the SQL level, and
 that's good. But I think we should go further and make them really be
 GUCs. Otherwise, if you want to be able to change one of those
 parameters other than at server startup time, you've got to invent a
 separate system for reloading them on SIGHUP. If you make them part of
 the GUC mechanism, you can use that. I think it's quite safe to say
 that the whole reason we *have* a GUC mechanism is so that all of our
 configuration can be done through one grand, unified mechanism rather
 than being fragmented across many files.

+1

I do find it ironic that the creator of the Grand Unified Configuration
Settings is arguing against unifying the files.

Some renaming might be in order. Heikki previously suggested merging all
of the recovery_target_whatever settings down into a single parameter
recovery_target='kindofrecoverytarget furtherdetailsgohere', like
recovery_target='xid 1234' or recovery_target='name bob'. Maybe that
would be more clear (or not). Not keen on non-atomic settings,
personally.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
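Heikki's single-string proposal implies a small parser that splits the target kind from its detail. A minimal standalone sketch of that split, purely illustrative (the enum, struct, and function names here are invented; this is not the proposed GUC code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical kinds for a unified recovery_target setting. */
typedef enum
{
	RTARGET_INVALID,
	RTARGET_XID,
	RTARGET_TIME,
	RTARGET_NAME
} RecoveryTargetKind;

typedef struct
{
	RecoveryTargetKind kind;
	char		value[64];		/* "1234", "bob", ... */
} RecoveryTarget;

/*
 * Split a value such as "xid 1234" or "name bob" at the first space.
 * Returns false if there is no recognizable kind or no detail part.
 */
static bool
parse_recovery_target(const char *raw, RecoveryTarget *out)
{
	const char *space = strchr(raw, ' ');
	size_t		kindlen;

	if (space == NULL || *(space + 1) == '\0')
		return false;
	kindlen = (size_t) (space - raw);

	if (kindlen == 3 && strncmp(raw, "xid", 3) == 0)
		out->kind = RTARGET_XID;
	else if (kindlen == 4 && strncmp(raw, "time", 4) == 0)
		out->kind = RTARGET_TIME;
	else if (kindlen == 4 && strncmp(raw, "name", 4) == 0)
		out->kind = RTARGET_NAME;
	else
		return false;

	snprintf(out->value, sizeof(out->value), "%s", space + 1);
	return true;
}
```

Josh's objection is visible even in this toy: one string carries two pieces of state, so validation and error reporting have to happen inside a parser rather than per-setting.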
Re: [HACKERS] pg_rewind in contrib
On Wed, Jan 7, 2015 at 8:44 AM, Andrew Dunstan and...@dunslane.net wrote: I understand, but I think pg_rewind is likely to be misleading to many users who will say but I don't want just to rewind. I'm not wedded to the name I suggested, but I think we should look at possible alternative names. We do have experience of misleading names causing confusion (e.g. initdb). FWIW, pg_rewind sounds pretty good to me. But maybe I'm just not understanding what the use case for this, apart from rewinding? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] __attribute__ for non-gcc compilers
On 2015-01-14 15:48:47 -0500, Robert Haas wrote:
 On Tue, Jan 13, 2015 at 4:18 PM, Oskari Saarenmaa o...@ohmu.fi wrote:
 Commit db4ec2ffce35 added alignment attributes for 64-bit atomic
 variables as required on 32-bit platforms using
 __attribute__((aligned(8))). That works fine with GCC, and would work
 with Solaris Studio Compiler [1] and IBM XL C [2], but
 src/include/c.h defines __attribute__ as an empty macro when not
 using GCC. Unfortunately we can't just disable that #define and
 enable all __attributes__ for Solaris CC and XLC as we use a bunch of
 attributes that are not supported by those compilers, and using them
 unconditionally would generate a lot of warnings.

 Attached is a patch that defines custom macros for each attribute,
 enables them individually for compilers that support them, and never
 defines __attribute__. I have tested this with GCC 4.9.2 on Linux
 x86-64 and Solaris CC 5.12 on Sparc (32-bit build); I don't have
 access to an IBM box with XLC.

 I guess my first question is whether we want to be relying on
 __attribute__((aligned)) in the first place.

I think it's better than the alternatives:

a) Don't support 64bit atomics on any 32bit platform. I think that'd be
   sad because there's some places that could greatly benefit from being
   able to read/store/manipulate e.g. LSNs atomically.
b) Double the size of 64bit atomics on 32bit platforms, and add
   TYPEALIGN() to every access inside the atomics implementation.
c) Require 64bit atomics to be aligned appropriately manually in every
   place they're embedded. I think that's completely impractical.

The only viable one imo is a).

Since the atomics code is compiler dependent anyway, I don't have much
of a problem with relying on compiler-specific alignment instructions.
Really, the only problem right now is that __attribute__ is defined to
be empty on compilers where it has meaning.

 If we do, then this seems like a pretty sensible and necessary change.

I think it's also an improvement independent from the alignment code.
Having a more widely portable __attribute__((noreturn)), __attribute__((unused)), __attribute__((format(..))) seems like a good general improvement. And all of these are supported in other compilers than gcc. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
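The per-attribute macro scheme being discussed can be sketched in a few lines. The macro names follow the thread, but the compiler test and the fake atomic type are illustrative only, not the submitted patch:

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/*
 * One macro per attribute, defined only where supported.  Crucially,
 * pg_attribute_aligned gets NO empty fallback: on a compiler without
 * support, code that needs it fails to build instead of silently
 * mis-aligning 64-bit atomics.
 */
#if defined(__GNUC__)
#define pg_attribute_aligned(a) __attribute__((aligned(a)))
#define pg_attribute_noreturn __attribute__((noreturn))
#define pg_attribute_unused __attribute__((unused))
#endif

/* attributes that are safe to define away when unsupported */
#ifndef pg_attribute_noreturn
#define pg_attribute_noreturn
#endif
#ifndef pg_attribute_unused
#define pg_attribute_unused
#endif

/* a stand-in for a 64-bit atomic that must be 8-byte aligned even on
 * 32-bit platforms */
typedef struct pg_attribute_aligned(8)
{
	volatile unsigned long long value;
} fake_atomic_uint64;
```

Leaving `pg_attribute_aligned` undefined where unsupported turns the "silent, subtle breakage" Robert worries about into a compile-time error, which is the point of Andres's suggestion.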
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 2:32 PM, Peter Geoghegan p...@heroku.com wrote: On Wed, Jan 14, 2015 at 12:24 PM, Peter Geoghegan p...@heroku.com wrote: Could you write some code to print out the block number (i.e. BlockNumber blkno) if there are more than, say, 5 retries within _bt_moveright()? Obviously I mean that the block number should be printed, no matter whether or not the P_INCOMPLETE_SPLIT() path is taken or not. So you should just move this to the top of the for(;;) loop, so it's always available to print: BlockNumber blkno = BufferGetBlockNumber(buf); (gdb) print BufferGetBlockNumber(buf) $15 = 9 ..and it stays 9, continuing several times having set breakpoint. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 4:26 PM, Merlin Moncure mmonc...@gmail.com wrote:
 The index is the oid index on pg_class. Some more info:
 *) temp table churn is fairly high. Several dozen get spawned and
 destroyed at the start of a replication run, all at once, due to some
 dodgy coding via dblink. During the replication run, the temp table
 churn rate drops.
 *) running btreecheck, I see:

I knew I wrote that tool for a reason. :-)

It would be great if I could get a raw dump of the index using
contrib/pageinspect at this juncture too, as already described.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] __attribute__ for non-gcc compilers
On Wed, Jan 14, 2015 at 5:29 PM, Andres Freund and...@2ndquadrant.com wrote: I think it's better than the alternatives: a) Don't support 64bit atomics on any 32bit platform. I think that'd be sad because there's some places that could greatly benefit from being able to read/store/manipulate e.g. LSNs atomically. b) Double the size of 64bit atomics on 32bit platforms, and add TYPEALIGN() to every access inside the atomics implementation. c) Require 64 atomics to be aligned appropriately manually in every place they're embedded. I think that's completely impractical. The only viable one imo is a) I can't really fault that reasoning, but if __attribute__((align)) only works on some platforms, then you've got silent, subtle breakage on the ones where it doesn't. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 5:39 PM, Peter Geoghegan p...@heroku.com wrote:
 On Wed, Jan 14, 2015 at 3:38 PM, Merlin Moncure mmonc...@gmail.com wrote:
 (gdb) print BufferGetBlockNumber(buf)
 $15 = 9
 ..and it stays 9, continuing several times having set breakpoint.

 And the index involved? I'm pretty sure that this is an internal page, no?

The index is the oid index on pg_class. Some more info:
*) temp table churn is fairly high. Several dozen get spawned and
destroyed at the start of a replication run, all at once, due to some
dodgy coding via dblink. During the replication run, the temp table
churn rate drops.
*) running btreecheck, I see:

cds2=# select bt_index_verify('pg_class_oid_index');
NOTICE:  page 7 of index pg_class_oid_index is deleted
NOTICE:  page 10 of index pg_class_oid_index is deleted
NOTICE:  page 12 of index pg_class_oid_index is deleted
 bt_index_verify
─────────────────

cds2=# select bt_leftright_verify('pg_class_oid_index');
WARNING:  left link/right link pair don't comport at level 0, block 9, last: 2, current left: 4
WARNING:  left link/right link pair don't comport at level 0, block 9, last: 9, current left: 4
WARNING:  left link/right link pair don't comport at level 0, block 9, last: 9, current left: 4
WARNING:  left link/right link pair don't comport at level 0, block 9, last: 9, current left: 4
WARNING:  left link/right link pair don't comport at level 0, block 9, last: 9, current left: 4
[repeat infinity until cancel]

which looks like the index is corrupted? ISTM _bt_moveright is hanging
because it's trying to move from block 9 to block 9 and so loops forever.

merlin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
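The hang described here - block 9's right-link pointing back at block 9 - can be modeled in miniature. This toy (array-based "pages", invented names, not the real `_bt_moveright` code) shows why a move-right loop never terminates on such corruption unless it remembers where it has been:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of B-tree right-links: rightlink[b] is the block to the
 * right of block b, 0 meaning "no right sibling".  Block numbers and
 * the array size are illustrative. */
#define NBLOCKS 16

/*
 * Follow right-links from 'start', as a search moving right would.
 * Returns true if a sane rightmost page is reached, false if a cycle
 * (such as a block whose right-link points at itself) is detected.
 */
static bool
follow_rightlinks(const unsigned rightlink[NBLOCKS], unsigned start)
{
	bool		seen[NBLOCKS] = {false};
	unsigned	blk = start;

	while (blk != 0)
	{
		if (seen[blk])
			return false;		/* corrupted: we have been here before */
		seen[blk] = true;
		blk = rightlink[blk];
	}
	return true;
}
```

The real code has no such visited-set (it cannot afford one), so a self-referencing right-link like the one bt_leftright_verify reports simply spins forever - consistent with the spinlock-heavy endless loop in the subject line.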
Re: [HACKERS] Shouldn't CREATE TABLE LIKE copy the relhasoids property?
Andrew Dunstan and...@dunslane.net writes: On 01/14/2015 07:29 PM, Tom Lane wrote: If you don't find that problematic, how about this case? regression=# create table src2 (f1 int, primary key(oid)) with oids; CREATE TABLE regression=# create table dst2 (like src2 including indexes); ERROR: column oid named in key does not exist I agree it's odd, and probably wrong, although it's been like that for a very long time, hasn't it? Sure, LIKE has always behaved this way. It still seems wrong though. As a reference point, creating a table that inherits from src1 or src2 will result in it having oids (even if you say WITHOUT OIDS). regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg
On 12.1.2015 01:28, Ali Akbar wrote:

Or else we implement what you suggest below (more comments below):

Thinking about the 'release' flag a bit more - maybe we could do this
instead:

    if (release && astate->private_cxt)
        MemoryContextDelete(astate->mcontext);
    else if (release)
    {
        pfree(astate->dvalues);
        pfree(astate->dnulls);
        pfree(astate);
    }

i.e. either destroy the whole context if possible, and just free the
memory when using a shared memory context. But I'm afraid this would
penalize the shared memory context, because that's intended for cases
where all the build states coexist in parallel and then at some point
are all converted into a result and thrown away. Adding pfree() calls is
no improvement here, and just wastes cycles.

As per Tom's comment, i'm using parent memory context instead of shared
memory context below.

In the future, if some code writer decided to use subcontext=false, to
save memory in cases where there are many array accumulations, and the
parent memory context is long-living, the current code can cause a
memory leak. So i think we should implement your suggestion (pfreeing
astate), and warn about the implication in the API comment. The API user
must choose between release=true, wasting cycles but preventing a memory
leak, or managing memory from the parent memory context.

I'm wondering whether this is necessary after fixing makeArrayResult to
use the private_cxt flag. It's still possible to call makeMdArrayResult
directly (with the wrong 'release' value).

Another option might be to get rid of the 'release' flag altogether, and
just use the 'private_cxt' - I'm not aware of code using release=false
with private_cxt=true (e.g. to build the same array twice from the same
astate). But maybe there's such code, and another downside is that it'd
break the existing API.

In one possible use case, for efficiency maybe the caller will create a
special parent memory context for all array accumulation.
Then uses makeArrayResult* with release=false, and in the end releases
the parent memory context once for all. Yeah, although I'd much rather
not break the existing code at all. That is - my goal is not to make it
slower unless absolutely necessary (and in that case the code may be
fixed per your suggestion). But I'm not convinced it's worth it.

OK. Do you think we need to note this in the comments? Something like
this: "If using subcontext=false, the caller must be careful about
memory usage, because makeArrayResult* will not free the memory used."

Yes, I think it's worth mentioning. But I think it makes sense to move
the error handling into initArrayResultArr(), including the
get_element_type() call, and remove the element_type from the signature.
This means initArrayResultAny() will call get_element_type() twice, but
I guess that's negligible. And everyone who calls initArrayResultArr()
will get the error handling for free.

Patch v7 attached, implementing those two changes, i.e.

* makeMdArrayResult(..., astate->private_cxt)
* move error handling into initArrayResultArr()
* remove element_type from initArrayResultArr() signature

Reviewing the v7 patch:
- applies cleanly to current master. patch format, whitespace, etc. is good
- make check runs without error
- performance & memory usage still consistent

If you think we don't have to add the comment (see above), i'll mark
this as ready for committer.

Attached is v8 patch, with a few comments added:

1) before initArrayResult() - explaining when it's better to use a
   single memory context, and when it's more efficient to use a separate
   memory context for each array build state

2) before makeArrayResult() - explaining that it won't free memory when
   allocated in a single memory context (and that a pfree() has to be
   used if necessary)

3) before makeMdArrayResult() - explaining that it's illegal to use
   release=true unless using a subcontext

Otherwise the v8 patch is exactly the same as v7.
Assuming the comments make it sufficiently clear, I agree with marking
this as 'ready for committer'.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/backend/executor/nodeSubplan.c b/src/backend/executor/nodeSubplan.c
index f3ce1d7..9eb4d63 100644
--- a/src/backend/executor/nodeSubplan.c
+++
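For readers following the subcontext discussion above, the trade-off can be shown with a minimal standalone model: a toy arena stands in for a MemoryContext, and all names are invented for illustration - nothing here is the actual array_agg code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Minimal arena standing in for a MemoryContext: it remembers every
 * allocation and frees them all at once. */
typedef struct
{
	void	   *chunks[32];
	int			nchunks;
} Arena;

static void *
arena_alloc(Arena *a, size_t sz)
{
	void	   *p = malloc(sz);

	a->chunks[a->nchunks++] = p;
	return p;
}

static void
arena_delete(Arena *a)
{
	for (int i = 0; i < a->nchunks; i++)
		free(a->chunks[i]);
	a->nchunks = 0;
}

/* A build state either gets its own arena (subcontext=true) or lives
 * inside the caller's long-lived one. */
typedef struct
{
	Arena	   *cxt;
	bool		private_cxt;
	long	   *dvalues;
} Builder;

static Builder *
builder_init(Arena *parent, bool subcontext)
{
	Arena	   *cxt = parent;
	Builder    *b;

	if (subcontext)
	{
		cxt = malloc(sizeof(Arena));
		cxt->nchunks = 0;
	}
	b = arena_alloc(cxt, sizeof(Builder));
	b->cxt = cxt;
	b->private_cxt = subcontext;
	b->dvalues = arena_alloc(cxt, 8 * sizeof(long));
	return b;
}

/*
 * With a private context the result step deletes it outright; with a
 * shared context nothing is freed until the caller deletes the parent.
 */
static void
builder_release(Builder *b)
{
	if (b->private_cxt)
	{
		Arena	   *cxt = b->cxt;

		arena_delete(cxt);		/* one call frees builder + dvalues */
		free(cxt);
	}
	/* else: caller's arena_delete(parent) reclaims it later */
}
```

This makes the leak scenario concrete: with `subcontext=false` and a long-living parent, every released builder's memory stays allocated until the parent context finally goes away - exactly the implication the API comment is being asked to warn about.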
Re: [HACKERS] Out-of-bounds write and incorrect detection of trigger file in pg_standby
On Wed, Jan 14, 2015 at 8:24 AM, Michael Paquier michael.paqu...@gmail.com wrote:
 In pg_standby.c, we use a 32-byte buffer in CheckForExternalTrigger
 into which is read the content of the trigger file defined by -f:

 CheckForExternalTrigger(void)
 {
     char        buf[32];
     [...]
     if ((len = read(fd, buf, sizeof(buf))) < 0)
     {
         stuff();
     }
     if (len < sizeof(buf))
         buf[len] = '\0';

 However it happens that if the file read contains 32 bytes or more, we
 enforce \0 in a position out of bounds.

 While looking at that, I also noticed that pg_standby can trigger a
 failover as long as the beginning of buf matches either "smart" or
 "fast", but I think we should check for a perfect match instead of a
 comparison of the first bytes, no? Attached is a patch to fix the
 out-of-bound issue as well as improvements for the detection of the
 trigger file content. I think that the out-of-bound fix should be
 backpatched, while the improvements for the trigger file are
 master-only. Coverity has pointed out the out-of-bound issue.

I would imagine that the goal might have been to accept not only "smart"
but also "smart\r" and "smart\n" and "smart\r\n". It's a little weird
that we also accept "smartypants", though.

Instead of doing this:

    if (len < sizeof(buf))
        buf[len] = '\0';

...I would suggest making the size of the buffer one greater than the
size of the read(), and then always nul-terminating the buffer. It
seems to me that would make the code easier to reason about.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 3:38 PM, Merlin Moncure mmonc...@gmail.com wrote:
 (gdb) print BufferGetBlockNumber(buf)
 $15 = 9

 ..and it stays 9, continuing several times having set breakpoint.

And the index involved? I'm pretty sure that this is an internal page, no?

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] s_lock.h default definitions are rather confused
Andres Freund and...@2ndquadrant.com writes: Right now I think a #ifdef/undef S_UNLOCK in the relevant gcc section sufficient and acceptable. It's after all the HPPA section that doesn't really play by the rules. Works for me. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 6:26 PM, Merlin Moncure mmonc...@gmail.com wrote:
 On Wed, Jan 14, 2015 at 5:39 PM, Peter Geoghegan p...@heroku.com wrote:
 On Wed, Jan 14, 2015 at 3:38 PM, Merlin Moncure mmonc...@gmail.com wrote:
 (gdb) print BufferGetBlockNumber(buf)
 $15 = 9
 ..and it stays 9, continuing several times having set breakpoint.

 And the index involved? I'm pretty sure that this is an internal page, no?

 The index is the oid index on pg_class. Some more info:
 *) temp table churn is fairly high. Several dozen get spawned and
 destroyed at the start of a replication run, all at once, due to some
 dodgy coding via dblink. During the replication run, the temp table
 churn rate drops.
 *) running btreecheck, I see:

 cds2=# select bt_index_verify('pg_class_oid_index');
 NOTICE:  page 7 of index pg_class_oid_index is deleted
 NOTICE:  page 10 of index pg_class_oid_index is deleted
 NOTICE:  page 12 of index pg_class_oid_index is deleted
  bt_index_verify
 ─────────────────

 cds2=# select bt_leftright_verify('pg_class_oid_index');
 WARNING:  left link/right link pair don't comport at level 0, block 9, last: 2, current left: 4
 WARNING:  left link/right link pair don't comport at level 0, block 9, last: 9, current left: 4
 WARNING:  left link/right link pair don't comport at level 0, block 9, last: 9, current left: 4
 WARNING:  left link/right link pair don't comport at level 0, block 9, last: 9, current left: 4
 WARNING:  left link/right link pair don't comport at level 0, block 9, last: 9, current left: 4
 [repeat infinity until cancel]

 which looks like the index is corrupted? ISTM _bt_moveright is hanging
 because it's trying to move from block 9 to block 9 and so loops forever.
per Peter the following might be useful:

cds2=# select * from bt_metap('pg_class_oid_index');
 magic  │ version │ root │ level │ fastroot │ fastlevel
────────┼─────────┼──────┼───────┼──────────┼───────────
 340322 │       2 │    3 │     1 │        3 │         1

cds2=# select (bt_page_stats('pg_class_oid_index', s)).* from generate_series(1,12) s;
 blkno │ type │ live_items │ dead_items │ avg_item_size │ page_size │ free_size │ btpo_prev │ btpo_next │ btpo  │ btpo_flags
───────┼──────┼────────────┼────────────┼───────────────┼───────────┼───────────┼───────────┼───────────┼───────┼────────────
     1 │ l    │        119 │          0 │            16 │      8192 │      5768 │         0 │         4 │     0 │          1
     2 │ l    │         25 │          0 │            16 │      8192 │      7648 │         4 │         9 │     0 │          1
     3 │ r    │          8 │          0 │            15 │      8192 │      7996 │         0 │         0 │     1 │          2
     4 │ l    │        178 │          0 │            16 │      8192 │      4588 │         1 │         2 │     0 │          1
     5 │ l    │          7 │          0 │            16 │      8192 │      8008 │         9 │        11 │     0 │          1
     6 │ l    │          5 │          0 │            16 │      8192 │      8048 │        11 │         8 │     0 │          1
     7 │ d    │          0 │          0 │             0 │      8192 │         0 │        -1 │        -1 │ 12366 │          0
     8 │ l    │        187 │          0 │            16 │      8192 │      4408 │         6 │         0 │     0 │          1
     9 │ l    │         25 │          0 │            16 │      8192 │      7648 │         4 │         9 │     0 │          1
    10 │ d    │          0 │          0 │             0 │      8192 │         0 │        -1 │        -1 │ 12366 │          0
    11 │ l    │          6 │          0 │            16 │      8192 │      8028 │         5 │         6 │     0 │          1
    12 │ d    │          0 │          0 │             0 │      8192 │         0 │        -1 │        -1 │ 10731 │          0

merlin

cds2=# select (bt_page_items('pg_class_oid_index', s)).* from generate_series(1,12) s;
NOTICE:  page is deleted
NOTICE:  page is deleted
NOTICE:  page is deleted
NOTICE:  page is deleted
NOTICE:  page is deleted
NOTICE:  page is deleted
NOTICE:  page is deleted
NOTICE:  page is deleted
NOTICE:  page is deleted
NOTICE:  page is deleted
NOTICE:  page is deleted
NOTICE:  page is deleted
NOTICE:  page is deleted
NOTICE:  page is deleted
NOTICE:  page is deleted
NOTICE:  page is deleted
NOTICE:  page is deleted
NOTICE:  page is deleted
 itemoffset │  ctid  │ itemlen │ nulls │ vars │           data
────────────┼────────┼─────────┼───────┼──────┼──────────────────────────
          1 │ (0,9)  │      16 │ f     │ f    │ 16 0b 00 00 00 00 00 00
          2 │ (2,50) │      16 │ f     │ f    │ 70 00 00 00 00 00 00 00
          3 │ (2,52) │      16 │ f     │ f    │ 71 00 00 00 00 00 00 00
          4 │ (2,24) │      16 │ f     │ f    │ ae 00 00 00 00 00 00 00
          5 │ (2,25) │      16 │ f     │ f    │ af 00 00 00 00 00 00 00
          6 │ (2,51) │      16 │ f     │ f    │ 24 02 00 00
Re: [HACKERS] Out-of-bounds write and incorrect detection of trigger file in pg_standby
On Thu, Jan 15, 2015 at 7:13 AM, Robert Haas robertmh...@gmail.com wrote:
 Instead of doing this:

     if (len < sizeof(buf))
         buf[len] = '\0';

 ...I would suggest making the size of the buffer one greater than the
 size of the read(), and then always nul-terminating the buffer. It
 seems to me that would make the code easier to reason about.

How about the attached then? This way we still detect the same way any
invalid values:
-    if ((len = read(fd, buf, sizeof(buf))) < 0)
+    if ((len = read(fd, buf, sizeof(buf) - 1)) < 0)

Regards,
--
Michael

diff --git a/contrib/pg_standby/pg_standby.c b/contrib/pg_standby/pg_standby.c
index d6b1692..2f9f2b4 100644
--- a/contrib/pg_standby/pg_standby.c
+++ b/contrib/pg_standby/pg_standby.c
@@ -418,7 +418,7 @@ CheckForExternalTrigger(void)
 		return;
 	}

-	if ((len = read(fd, buf, sizeof(buf))) < 0)
+	if ((len = read(fd, buf, sizeof(buf) - 1)) < 0)
 	{
 		fprintf(stderr, "WARNING: could not read \"%s\": %s\n",
 				triggerPath, strerror(errno));

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
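Robert's suggestion - a buffer one byte larger than the read, unconditional NUL termination, and an exact keyword match - can be sketched as standalone C. The function name, return codes, and the newline handling below are illustrative, not the actual pg_standby patch:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define TRIGGER_BUF 32

/*
 * Read at most TRIGGER_BUF bytes of the trigger file into a buffer that
 * is one byte larger, so buf[len] = '\0' is always in bounds.  Then
 * demand an exact keyword (tolerating one trailing newline) instead of
 * a prefix match.  Returns 1 for smart, 2 for fast, 0 for no match,
 * -1 on error (values are invented for this sketch).
 */
static int
read_trigger_kind(const char *path)
{
	char		buf[TRIGGER_BUF + 1];
	int			fd;
	ssize_t		len;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	len = read(fd, buf, TRIGGER_BUF);
	close(fd);
	if (len < 0)
		return -1;
	buf[len] = '\0';			/* always safe: buffer is len+1 big */

	/* strip a single trailing newline, if any */
	if (len > 0 && buf[len - 1] == '\n')
		buf[len - 1] = '\0';

	if (strcmp(buf, "smart") == 0)
		return 1;
	if (strcmp(buf, "fast") == 0)
		return 2;
	return 0;					/* "smartypants" no longer matches */
}
```

The exact-match `strcmp` addresses Michael's second point; a fuller version would also tolerate `\r\n`, per Robert's guess about the original intent.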
Re: [HACKERS] Improving RLS qual pushdown
Robert Haas wrote: On Wed, Jan 14, 2015 at 9:22 AM, Dean Rasheed dean.a.rash...@gmail.com wrote: On 14 January 2015 at 13:29, Robert Haas robertmh...@gmail.com wrote: One thing they could still leak is the number of times they got called, and thus possibly the number of unseen rows. Now if the expressions get constant-folded away that won't be an issue, but a clever user can probably avoid that.

Right now, EXPLAIN ANALYSE can be used to tell you the number of unseen rows. Is that something that people are concerned about, and are there any plans to change it?

Interesting question. I don't know.

Wasn't this part of the covert channel discussion that took place way before RLS was committed? As I recall, it was argued that such covert channels are acceptable as long as their bandwidth is low.

-- Álvaro Herrera  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 5:23 PM, Peter Geoghegan p...@heroku.com wrote: My immediate observation here is that blocks 2 and 9 have identical metadata (from their page opaque area), but partially non-matching data items (however, the number of items on each block is consistent and correct according to that metadata, as far as it goes). I think block 9 is supposed to have a right-link to block 5 (block 5 seems to think so, at least -- its left link is 9).

I am mistaken on one detail here - blocks 2 and 9 are actually fully identical. I still have no idea why, though.
-- Peter Geoghegan
Re: [HACKERS] initdb -S and tablespaces
At 2015-01-14 11:59:08 +0100, and...@2ndquadrant.com wrote:

+	if (ControlFile->state != DB_SHUTDOWNED &&
+		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
+		perform_fsync(data_directory);

a) Please think of a slightly more descriptive name than perform_fsync

OK. (I just copied the name initdb uses, because at the time I was still thinking in terms of a later patch moving this to src/common.) What do you think of fsync_recursively? fsync_pgdata? I think fsync_recursively(data_directory) reads well.

b) I think we should check here whether fsync is enabled.

OK, will do.

c) I'm wondering if we should add fsync to the control file and also perform an fsync if the last shutdown was clean, but fsync was disabled.

Explain? "Add fsync to the control file" means store the value of the fsync GUC setting in the control file? And would the fsync you mention be dependent on that setting, or unconditional?

+static void
+pre_sync_fname(char *fname, bool isdir)
+{

this essentially already exists in the backend in parts. C.f. pg_flush_data.

OK, I missed that. I'll rework the patch to use it.

Theoretically you should also invoke fsync on directories.

OK.

+	 * We need to name the parent of PGDATA. get_parent_directory() isn't
+	 * enough here, because it can result in an empty string.
+	 */
+	snprintf(pdir, MAXPGPATH, "%s/..", pg_data);
+	canonicalize_path(pdir);

Hm. Why is this needed? Just syncing "." should work?

Not sure, will investigate. Thanks.

-- Abhijit
Re: [HACKERS] Improving RLS qual pushdown
* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: Robert Haas wrote: On Wed, Jan 14, 2015 at 9:22 AM, Dean Rasheed dean.a.rash...@gmail.com wrote: On 14 January 2015 at 13:29, Robert Haas robertmh...@gmail.com wrote: One thing they could still leak is the number of times they got called, and thus possibly the number of unseen rows. Now if the expressions get constant-folded away that won't be an issue, but a clever user can probably avoid that. Right now, EXPLAIN ANALYSE can be used to tell you the number of unseen rows. Is that something that people are concerned about, and are there any plans to change it? Interesting question. I don't know. Wasn't this part of the covert channel discussion that took place way before RLS was committed? As I recall, it was argued that such covert channels are acceptable as long as their bandwidth is low.

Yes, it was part of the discussion and no, there are no plans to try and hide row counts in explain analyze, nor to deal with things like unique constraint or foreign key reference violations. There are other areas which need improvement which will help address covert channel activity (better auditing, better control over what actions are allowed to different users when it comes to creating objects, modifying permissions and policies, throttling, etc).

Thanks, Stephen
Re: [HACKERS] WITH CHECK and Column-Level Privileges
Noah, * Noah Misch (n...@leadboat.com) wrote: On Mon, Jan 12, 2015 at 05:16:40PM -0500, Stephen Frost wrote: Alright, here's an updated patch which doesn't return any detail if no values are visible or if only a partial key is visible. I browsed this patch. There's been no mention of foreign key constraints, but ri_ReportViolation() deserves similar hardening. If a user has only DELETE privilege on a PK table, FK violation messages should not leak the PK values. Modifications to the foreign side are less concerning, since the user will often know the attempted value; still, I would lock down both sides. Ah, yes, good point. Please add a comment explaining the safety of each row-emitting message you haven't changed. For example, the one in refresh_by_match_merge() is safe because getting there requires MV ownership. Sure. Instead of duplicating an entire ereport() to change whether you include an errdetail, use condition ? errdetail(...) : 0. Yeah, that's a bit nicer, will do. Thanks! Stephen
Re: [HACKERS] pg_rewind in contrib
On Wed, Jan 14, 2015 at 6:43 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Here is a new version. There are now a fair amount of code changes compared to the version on github, like the logging and progress information, and a lot of comment changes. I also improved the documentation.

Perhaps this could be a direct link to the page of pg_rewind, with <xref linkend="app-pgrewind">?

-possibly new, system. Once complete, the primary and standby can be
+possibly new, system. The <application>pg_rewind</> utility can be
+used to speed up this process on large clusters.
+Once complete, the primary and standby can be

Missing a dot at the end of the phrase of this paragraph:

+<para>
+ This option specifies the target data directory that is synchronized
+ with the source. The target server must shut down cleanly before
+ running <application>pg_rewind</application>
+</para>

Compilation of pg_rewind on MSVC is not done. It is still listed as a contrib/ module in Mkvcbuild.pm. The compilation of pg_xlogdump fails because the inclusion of dbcommands_xlog.h is missing in rmgrdesc.c (patch attached). src/bin/Makefile has not been updated to compile pg_rewind.
-- Michael

diff --git a/contrib/pg_xlogdump/rmgrdesc.c b/contrib/pg_xlogdump/rmgrdesc.c
index 180818d..deb1b8f 100644
--- a/contrib/pg_xlogdump/rmgrdesc.c
+++ b/contrib/pg_xlogdump/rmgrdesc.c
@@ -23,6 +23,7 @@
 #include "access/xlog_internal.h"
 #include "catalog/storage_xlog.h"
 #include "commands/dbcommands.h"
+#include "commands/dbcommands_xlog.h"
 #include "commands/sequence.h"
 #include "commands/tablespace.h"
 #include "rmgrdesc.h"
Partitioning: issues/ideas (Was: Re: [HACKERS] On partitioning)
On 06-01-2015 PM 03:40, Amit Langote wrote: I agree that while we are discussing these points, we could also be discussing how we solve problems of the existing partitioning implementation using whatever the above things end up being. Proposed approaches to solve those problems might be useful to drive the first step as well, or perhaps that's how it should be done anyway.

I realize the discussion has not quite brought us to *conclusions* so far, though surely there has been valuable input from people. Anyway, starting a new thread with a summary of what has been discussed (please note that the order of listing the points does not necessarily connote the priority):

* It has been repeatedly pointed out that we may want to decouple partitioning from inheritance, because implementing partitioning as an extension of the inheritance mechanism means that we have to keep all the existing semantics, which might limit what we want to do with the special case of it which is partitioning; in other words, we would find ourselves in the difficult position of having to inject special-case code into a very generalized mechanism, namely inheritance. Specifically, do we regard partitions as pg_inherits children of their partitioning parent?

* Syntax: do we want to make it similar to one of the many other databases out there? Or we could invent our own. I like the syntax that Robert suggested, which covers the cases of RANGE and LIST partitioning without actually having to use those keywords explicitly; something like the following:

CREATE TABLE parent PARTITION ON (column [ USING opclass ] [, ... ]);

CREATE TABLE child PARTITION OF parent_name FOR VALUES { (value, ...) [ TO (value, ...) ] }

So instead of making a hard distinction between range and list partitioning, you can say:

CREATE TABLE child_name PARTITION OF parent_name FOR VALUES (3, 5, 7);

wherein child is effectively a LIST partition;

CREATE TABLE child PARTITION OF parent_name FOR VALUES (8) TO (12);

wherein child is effectively a RANGE partition on one column;

CREATE TABLE child PARTITION OF parent_name FOR VALUES (20, 120) TO (30, 130);

wherein child is effectively a RANGE partition on two columns.

I wonder if we could add a clause like DISTRIBUTED BY to complement PARTITION ON, representing a hash-distributed/partitioned table (that could be a syntax to support sharded tables maybe; we would definitely want to move ahead in that direction, I guess).

* Catalog: We would like to have a catalog structure suitable to implement capabilities like multi-column partitioning and sub-partitioning (with an arbitrary number of levels in the hierarchy). I had suggested that we create two new catalogs, viz. pg_partitioned_rel and pg_partition_def, to store metadata about the partition key of a partitioned relation and the partition bound info of a partition, respectively. Also, see the point about the on-disk representation of partition bounds.

* It is desirable to treat partitions as pg_class relations, with perhaps a new relkind(s). We may want to choose an implementation where only the lowest-level relations in a partitioning hierarchy have storage; those at the upper layers are mere placeholder relations, though of course with associated constraints determined by the partitioning criteria (with appropriate metadata entered into the additional catalogs). I am not quite sure whether each kind of relation involved in the partitioning scheme has a separate namespace and, if so, how we implement that.

* In the initial implementation, we could just live with partitioning on a set of columns (and not arbitrary expressions of them).

* We perhaps do not need multi-column LIST partitions, as they are not very widely used and may complicate the implementation.

* There are a number of suggestions about how we represent partition bounds (on-disk) - pg_node_tree, RECORD (a composite type or the rowtype associated with the relation itself), etc. An important point to consider here may be that the partition key may contain more than one column.

* How we represent the partition definition in memory (for a given partitioned relation) - an important point to remember is that such a representation should be efficient to iterate through or binary-searchable. Also see the points about tuple-routing and partition-pruning.

* Overflow/catchall partition: it seems we do not want/need them. It might seem desirable, for example, in cases where a big transaction enters a large number of tuples all but one of which find a defined partition; we may not want to error out in such a case but instead enter that erring tuple into the overflow partition. If we choose to implement that, we would like to also implement the capability to move the tuples into the appropriate partition once it's defined. Related is the notion of automatically creating partitions if one is not already defined for a just-entered tuple; but there may be locking troubles if many concurrent sessions try to do that.

* Tuple-routing: based on the internal
Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf
On Wed, Jan 14, 2015 at 9:01 PM, Sawada Masahiko sawada.m...@gmail.com wrote: Hi all, postgresql.auto.conf is loaded after postgresql.conf whenever the configuration file is loaded or reloaded. This means that a parameter in postgresql.auto.conf has quite high priority, so the same parameter in postgresql.conf does not work at all, even if the user sets it manually. If the user wants to change the configuration of a stopped PostgreSQL server, the user needs to merge the two configuration files (postgresql.conf and postgresql.auto.conf) while maintaining consistency manually.

I think one way to currently do it is with the help of the ALTER SYSTEM ... RESET command. Basically the user can check via pg_settings whether the variable comes from postgresql.auto.conf, and if he already knows a duplicate copy of the same is available in postgresql.conf and that is the value he wants to use, then the RESET command can be used to remove the setting from postgresql.auto.conf.

From an operational perspective, having a written config with duplicate entries is not a good thing. I think we need to merge the two configuration files into one (or more than one, if 'include' is used). One solution is to add a merging tool/command which merges the two configuration files while maintaining consistency.

If you want to think of some solution which can make the usage of ALTER SYSTEM more convenient, then we should think only in terms of changing/removing the value in postgresql.auto.conf, as that is the place where we have added values programmatically. One thought I have along this line is that currently there doesn't seem to be a way to know if a setting has an entry both in postgresql.conf and postgresql.auto.conf; if we had some way of knowing that (pg_settings?), then it could be convenient for the user to decide whether the value in postgresql.auto.conf is useful or not, and if it's not useful, use ALTER SYSTEM ... RESET to remove it from postgresql.auto.conf.

With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Safe memory allocation functions
Robert Haas wrote: On Tue, Jan 13, 2015 at 10:10 AM, Tom Lane t...@sss.pgh.pa.us wrote: However, there is a larger practical problem with this whole concept, which is that experience should teach us to be very wary of the assumption that asking for memory the system can't give us will just lead to nice neat malloc-returns-NULL behavior. Any small perusal of the mailing list archives will remind you that very often the end result will be SIGSEGV, OOM kills, unrecoverable trap-on-write when the kernel realizes it can't honor a copy-on-write promise, yadda yadda. Agreed that it's arguable that these only occur in misconfigured systems ... but misconfiguration appears to be the default in a depressingly large fraction of systems. (This is another reason for _safe not being the mot juste :-()

I don't really buy this. It's pretty incredible to think that after a malloc() failure there is absolutely no hope of carrying on sanely. If that were true, we wouldn't be able to ereport() out-of-memory errors at any severity less than FATAL, but of course it doesn't work that way. Moreover, AllocSetAlloc() contains malloc() and, if that fails, calls malloc() again with a smaller value, without even throwing an error.

I understood Tom's point differently: instead of malloc() failing, malloc() will return a supposedly usable pointer, but later usage of it will lead to a crash of some sort. We know this does happen in reality, because people do report it; but we also know how to fix it. And for systems that have been correctly set up, the new behavior (using some plan B for when malloc actually fails, instead of spuriously succeeding only to cause a later crash) will be much more convenient.

-- Álvaro Herrera  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] orangutan seizes up during isolation-check
On Wed, Jan 14, 2015 at 04:48:53PM -0500, Peter Eisentraut wrote: What I'm seeing now is that the unaccent regression tests, when run under make check-world, abort with

FATAL: postmaster became multithreaded during startup
HINT: Set the LC_ALL environment variable to a valid locale.

contrib/unaccent/Makefile sets NO_LOCALE=1, so that makes sense. I expect the patch over here will fix it: http://www.postgresql.org/message-id/20150109063015.ga2491...@tornado.leadboat.com

The behavior is also a bit strange. The step == starting postmaster == hangs for one minute before failing. This probably has nothing to do with your change, but maybe pg_regress could detect postmaster startup failures better.

Yeah, I think any postmaster startup failure has that effect. Not ideal.
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 5:23 PM, Peter Geoghegan p...@heroku.com wrote: My immediate observation here is that blocks 2 and 9 have identical metadata (from their page opaque area), but partially non-matching data items (however, the number of items on each block is consistent and correct according to that metadata, as far as it goes). I think block 9 is supposed to have a right-link to block 5 (block 5 seems to think so, at least -- its left link is 9).

For the convenience of others, here is a version with a normalized data column added (interpreting the datums as little-endian Oids).
-- Peter Geoghegan

page,itemoffset,ctid,itemlen,nulls,vars,data,normalized data
1,1,(0,9),16,f,f,16 0b 00 00 00 00 00 00,2838
1,2,(2,50),16,f,f,70 00 00 00 00 00 00 00,112
1,3,(2,52),16,f,f,71 00 00 00 00 00 00 00,113
1,4,(2,24),16,f,f,ae 00 00 00 00 00 00 00,174
1,5,(2,25),16,f,f,af 00 00 00 00 00 00 00,175
1,6,(2,51),16,f,f,24 02 00 00 00 00 00 00,548
1,7,(2,53),16,f,f,25 02 00 00 00 00 00 00,549
1,8,(4,66),16,f,f,3a 03 00 00 00 00 00 00,826
1,9,(2,55),16,f,f,3b 03 00 00 00 00 00 00,827
1,10,(2,56),16,f,f,3c 03 00 00 00 00 00 00,828
1,11,(4,15),16,f,f,70 04 00 00 00 00 00 00,1136
1,12,(1,31),16,f,f,71 04 00 00 00 00 00 00,1137
1,13,(4,14),16,f,f,bd 04 00 00 00 00 00 00,1213
1,14,(4,17),16,f,f,be 04 00 00 00 00 00 00,1214
1,15,(1,34),16,f,f,d0 04 00 00 00 00 00 00,1232
1,16,(1,35),16,f,f,d1 04 00 00 00 00 00 00,1233
1,17,(0,2),16,f,f,df 04 00 00 00 00 00 00,1247
1,18,(3,10),16,f,f,e1 04 00 00 00 00 00 00,1249
1,19,(3,11),16,f,f,e7 04 00 00 00 00 00 00,1255
1,20,(7,6),16,f,f,eb 04 00 00 00 00 00 00,1259
1,21,(2,23),16,f,f,ec 04 00 00 00 00 00 00,1260
1,22,(4,16),16,f,f,ed 04 00 00 00 00 00 00,1261
1,23,(4,12),16,f,f,ee 04 00 00 00 00 00 00,1262
1,24,(4,64),16,f,f,89 05 00 00 00 00 00 00,1417
1,25,(3,31),16,f,f,8a 05 00 00 00 00 00 00,1418
1,26,(1,48),16,f,f,8b 08 00 00 00 00 00 00,2187
1,27,(4,63),16,f,f,18 09 00 00 00 00 00 00,2328
1,28,(0,64),16,f,f,20 09 00 00 00 00 00 00,2336
1,29,(0,65),16,f,f,21 09 00 00 00 00 00 00,2337
1,30,(4,18),16,f,f,5c 09 00 00 00 00 00 00,2396
1,31,(1,11),16,f,f,5d 09 00 00 00 00 00 00,2397
1,32,(4,40),16,f,f,28 0a 00 00 00 00 00 00,2600
1,33,(3,47),16,f,f,29 0a 00 00 00 00 00 00,2601
1,34,(3,48),16,f,f,2a 0a 00 00 00 00 00 00,2602
1,35,(3,49),16,f,f,2b 0a 00 00 00 00 00 00,2603
1,36,(3,41),16,f,f,2c 0a 00 00 00 00 00 00,2604
1,37,(4,45),16,f,f,2d 0a 00 00 00 00 00 00,2605
1,38,(3,42),16,f,f,2e 0a 00 00 00 00 00 00,2606
1,39,(4,48),16,f,f,2f 0a 00 00 00 00 00 00,2607
1,40,(4,49),16,f,f,30 0a 00 00 00 00 00 00,2608
1,41,(4,44),16,f,f,31 0a 00 00 00 00 00 00,2609
1,42,(3,43),16,f,f,32 0a 00 00 00 00 00 00,2610
1,43,(7,26),16,f,f,33 0a 00 00 00 00 00 00,2611
1,44,(3,50),16,f,f,34 0a 00 00 00 00 00 00,2612
1,45,(4,71),16,f,f,35 0a 00 00 00 00 00 00,2613
1,46,(4,47),16,f,f,37 0a 00 00 00 00 00 00,2615
1,47,(3,46),16,f,f,38 0a 00 00 00 00 00 00,2616
1,48,(3,44),16,f,f,39 0a 00 00 00 00 00 00,2617
1,49,(4,41),16,f,f,3a 0a 00 00 00 00 00 00,2618
1,50,(0,1),16,f,f,3b 0a 00 00 00 00 00 00,2619
1,51,(4,42),16,f,f,3c 0a 00 00 00 00 00 00,2620
1,52,(0,61),16,f,f,5a 0a 00 00 00 00 00 00,2650
1,53,(0,54),16,f,f,5b 0a 00 00 00 00 00 00,2651
1,54,(0,55),16,f,f,5c 0a 00 00 00 00 00 00,2652
1,55,(0,56),16,f,f,5d 0a 00 00 00 00 00 00,2653
1,56,(0,57),16,f,f,5e 0a 00 00 00 00 00 00,2654
1,57,(0,59),16,f,f,5f 0a 00 00 00 00 00 00,2655
1,58,(0,50),16,f,f,60 0a 00 00 00 00 00 00,2656
1,59,(7,14),16,f,f,61 0a 00 00 00 00 00 00,2657
1,60,(0,42),16,f,f,62 0a 00 00 00 00 00 00,2658
1,61,(0,43),16,f,f,63 0a 00 00 00 00 00 00,2659
1,62,(0,68),16,f,f,64 0a 00 00 00 00 00 00,2660
1,63,(0,69),16,f,f,65 0a 00 00 00 00 00 00,2661
1,64,(7,3),16,f,f,66 0a 00 00 00 00 00 00,2662
1,65,(7,4),16,f,f,67 0a 00 00 00 00 00 00,2663
1,66,(0,53),16,f,f,68 0a 00 00 00 00 00 00,2664
1,67,(7,22),16,f,f,69 0a 00 00 00 00 00 00,2665
1,68,(7,23),16,f,f,6a 0a 00 00 00 00 00 00,2666
1,69,(7,24),16,f,f,6b 0a 00 00 00 00 00 00,2667
1,70,(1,72),16,f,f,6c 0a 00 00 00 00 00 00,2668
1,71,(1,73),16,f,f,6d 0a 00 00 00 00 00 00,2669
1,72,(1,74),16,f,f,6e 0a 00 00 00 00 00 00,2670
1,73,(1,6),16,f,f,6f 0a 00 00 00 00 00 00,2671
1,74,(1,7),16,f,f,70 0a 00 00 00 00 00 00,2672
1,75,(1,75),16,f,f,71 0a 00 00 00 00 00 00,2673
1,76,(1,76),16,f,f,72 0a 00 00 00 00 00 00,2674
1,77,(1,67),16,f,f,73 0a 00 00 00 00 00 00,2675
1,78,(0,40),16,f,f,74 0a 00 00 00 00 00 00,2676
1,79,(0,41),16,f,f,75 0a 00 00 00 00 00 00,2677
1,80,(1,50),16,f,f,76 0a 00 00 00 00 00 00,2678
1,81,(1,51),16,f,f,77 0a 00 00 00 00 00 00,2679
1,82,(1,49),16,f,f,78 0a 00 00 00 00 00 00,2680
1,83,(1,58),16,f,f,79 0a 00 00 00 00 00 00,2681
1,84,(1,59),16,f,f,7a 0a 00 00 00 00 00 00,2682
1,85,(1,89),16,f,f,7b 0a 00 00 00 00 00 00,2683
1,86,(1,70),16,f,f,7c 0a 00 00 00 00 00 00,2684
1,87,(1,71),16,f,f,7d 0a 00 00 00 00 00 00,2685
1,88,(1,56),16,f,f,7e 0a 00 00 00 00 00 00,2686
1,89,(1,57),16,f,f,7f 0a 00 00 00 00 00 00,2687
1,90,(1,52),16,f,f,80 0a 00 00 00 00 00 00,2688
1,91,(1,53),16,f,f,81 0a
Re: [HACKERS] Overhauling our interrupt handling
Hello, I have finally caught up with this. I think I should finalize my commitfest item for this issue, with .. Rejected?

This mail is an answer to http://archives.postgresql.org/message-id/20150110022542.GF12509%40alap3.anarazel.de but I decided that it's a good idea to move it into a new thread, since the patch series has outgrown its original target. It's invasive and touches many arcane areas of the code, so I think more eyes than a long deep thread is likely to attract are a good thing.

As a short description of the goal of the patch series: This started out as steps on the way to being able to safely handle terminate_backend() et al. when we're reading/writing from the client. E.g. because the client is dead and we haven't noticed yet and are stuck writing, or because we want to implement an idle_in_transaction timeout. Making these changes allowed me to see that not doing significant work in signal handlers can make code much simpler and more robust. The number of bugs postgres has had in the past that involved doing too much in signal handlers is significant. Thus I've since extended the patch series to get rid of nearly all nontrivial work in signal handlers.

It sounds pretty nice.

All the patches in the series up to 0008 have commit messages providing more detail. A short description of each patch follows:

0001: Replace walsender's latch with the general shared latch.

New patch that removes ImmediateInterruptOK behaviour from walsender. I think that's a rather good idea, because walsender currently seems to assume WaitLatchOrSocket is reentrant - which I don't think is really guaranteed. Hasn't been reviewed yet, but I think it's not far from being committable.

Doesn't this patchset contain per-socket non-blocking control for win32? It should make the code (above the win32 socket layer itself) simpler.

0002: Use a nonblocking socket for FE/BE communication and block using latches.

Has previously been reviewed by Heikki. I think Noah also had a look, although I'm not sure how close that was. I think this can be committed soon.

0003: Introduce and use infrastructure for interrupt processing during client reads.

From here on ImmediateInterruptOK isn't set during client communication. Normal interrupts and sinval/async interrupts are processed outside of signal handlers. Especially the sinval/async changes greatly simplify the respective code. Has been reviewed by Heikki in an earlier version - I think I addressed all the review comments. I'd like somebody else to look at the sinval/async.c changes before committing. I really like the simplification this change allowed. I think this patch might not be safe without 0004, because I can't convince myself that it's safe to interrupt latch waits with work that might itself use the same latch (sinval interrupts). But it's easier to review this way, as 0004 is quite big.

0004: Process 'die' interrupts while reading/writing from the client socket.

This is the reason Horiguchi-san started this thread. I think the important debate here is whether we think it's acceptable that there are situations (a full socket buffer, but an alive connection) where the client previously got an error, but not anymore afterwards. I think that's much better than having unkillable connections, but I can see others being of a different opinion.

This patch yields code that is a bit confusing, like the following:

| secure_raw_write(Port *port, const void *ptr, size_t len)
| {
...
|     w = WaitLatchOrSocket(MyLatch, ...
|     if (w & WL_LATCH_SET)
...
|         errno = EINTR;
|     else if (w & WL_SOCKET_WRITEABLE)
|         goto wloop;
|
|     errno = save_errno;

The errno set in the WL_LATCH_SET case always vanishes. Specifically, the EINTR set for SIGTERM (WL_LATCH_SET) is overwritten by EAGAIN. As a result, pg_terminate_backend() cannot kill the backend until the write stops blocking. "errno = save_errno" should be the alternative (else branch) of the line "errno = EINTR", and I confirmed that this change leads to the desirable (as of me) behavior.

0005: Move deadlock and other interrupt handling in proc.c out of signal handlers.

I'm surprised how comparatively simple this turned out to be. For robustness and understanding I think this is a great improvement. New and not reviewed at all. Needs significant review. But works in my simple testcases.

0006: Don't allow immediate interrupts during authentication anymore.

So far we've set ImmediateInterruptOK to true during large parts of the client authentication - that's not all that pretty; interrupts might arrive while we're in some system routines. Due to patches 0003/0004 we are now able to safely serve
Re: [HACKERS] Parallel Seq Scan
On 1/13/15 9:42 PM, Amit Kapila wrote: As an example one of the the strategy could be if the table size is X MB and there are 8 workers, then divide the work as X/8 MB for each worker (which I have currently used in patch) and another could be each worker does scan 1 block at a time and then check some global structure to see which next block it needs to scan, according to me this could lead to random scan. I have read that some other databases also divide the work based on partitions or segments (size of segment is not very clear). Long-term I think we'll want a mix between the two approaches. Simply doing something like blkno % num_workers is going to cause imbalances, but trying to do this on a per-block basis seems like too much overhead. Also long-term, I think we also need to look at a more specialized version of parallelism at the IO layer. For example, during an index scan you'd really like to get IO requests for heap blocks started in the background while the backend is focused on the mechanics of the index scan itself. While this could be done with the stuff Robert has written I have to wonder if it'd be a lot more efficient to use fadvise or AIO. Or perhaps it would just be better to deal with an entire index page (remembering TIDs) and then hit the heap. But I agree with Robert; there's a lot yet to be done just to get *any* kind of parallel execution working before we start thinking about how to optimize it. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
[HACKERS] Enforce creation of destination folders for source files in pg_regress (Was: pg_regress writes into source tree)
Hi all,

pg_regress will fail with test suites using only source files if the destination folders do not exist in the code tree. This is annoying because it forces us to maintain empty folders sql/ and expected/ with a .gitignore ignoring everything. The issue has been discussed here (http://www.postgresql.org/message-id/54935d9d.7010...@dunslane.net) but nobody actually sent a patch, so here is one, and a thread for this discussion.

Regards,
-- Michael

*** a/src/test/regress/pg_regress.c
--- b/src/test/regress/pg_regress.c
*** 496,501 **** convert_sourcefiles_in(char *source_subdir, char *dest_dir, char *dest_subdir, c
--- 496,502 ----
  {
  	char		testtablespace[MAXPGPATH];
  	char		indir[MAXPGPATH];
+ 	char		result_dir[MAXPGPATH];
  	struct stat st;
  	int			ret;
  	char	  **name;
*** 520,525 **** convert_sourcefiles_in(char *source_subdir, char *dest_dir, char *dest_subdir, c
--- 521,534 ----
  		/* Error logged in pgfnames */
  		exit(2);

+ 	/*
+ 	 * Enforce creation of destination directory if it does not exist yet.
+ 	 * This is useful for tests using only source files.
+ 	 */
+ 	snprintf(result_dir, MAXPGPATH, "%s/%s", dest_dir, dest_subdir);
+ 	if (!directory_exists(result_dir))
+ 		make_directory(result_dir);
+
  	snprintf(testtablespace, MAXPGPATH, "%s/testtablespace", outputdir);

  #ifdef WIN32
*** 565,571 **** convert_sourcefiles_in(char *source_subdir, char *dest_dir, char *dest_subdir, c
  		/* build the full actual paths to open */
  		snprintf(prefix, strlen(*name) - 6, "%s", *name);
  		snprintf(srcfile, MAXPGPATH, "%s/%s", indir, *name);
! 		snprintf(destfile, MAXPGPATH, "%s/%s/%s.%s", dest_dir, dest_subdir,
  				 prefix, suffix);

  		infile = fopen(srcfile, "r");
--- 574,580 ----
  		/* build the full actual paths to open */
  		snprintf(prefix, strlen(*name) - 6, "%s", *name);
  		snprintf(srcfile, MAXPGPATH, "%s/%s", indir, *name);
! 		snprintf(destfile, MAXPGPATH, "%s/%s.%s", result_dir, prefix, suffix);

  		infile = fopen(srcfile, "r");
[HACKERS] Define and document a guaranteed ordering for RETURNING?
Hi all It's recently occurred to me that many people rely on the RETURNING clause for multi-valued INSERTs, INSERT INTO ... SELECT, etc, returning rows in the same order they were supplied as input into the INSERT. I often see lock-step iteration over the inputs to an INSERT and the RETURNING result-tuples used to map generated sequence values to the corresponding input. I can't find anything relevant in the docs about the order of results in RETURNING, and it strikes me that this should probably be codified one way or the other. The docs simply read: The optional RETURNING clause causes INSERT to compute and return value(s) based on each row actually inserted. This is primarily useful for obtaining values that were supplied by defaults, such as a serial sequence number. However, any expression using the table's columns is allowed. The syntax of the RETURNING list is identical to that of the output list of SELECT. Looking at the code I don't see any reason why it wouldn't currently be safe to assume that INSERT ... VALUES (...), (...), ... RETURNING ... and INSERT ... SELECT ... ORDER BY ... RETURNING ... always return rows in the exact order of their input. But that may not always be true and if it's not documented as safe, it's not necessarily wise to rely on it. If you can't rely on RETURNING order you have to instead identify a candidate key on the input and use a wider RETURNING result that includes that candidate key as well as whatever else you want returned. Then map the results onto the inputs. No fun at all. So - I propose to document that INSERT ... RETURNING guarantees that the result-tuples will be in the same order as the input to the INSERT ... RETURNING statement. So if that order is well-defined, as in the case of multi-VALUES clauses or SELECT ... ORDER BY ..., the result order is also well defined. (An alternative would be adding RETURNING ... WITH ORDINALITY. Which seems pretty horrible). Thoughts/comments? 
Barring objections I'll write up a docs patch for the docs list. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
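The candidate-key workaround Craig describes (needed whenever RETURNING order cannot be assumed) could be sketched on the client side roughly like this; the table data and names here are purely illustrative:

```python
# Hypothetical sketch: map RETURNING rows (arbitrary order) back onto the
# INSERT inputs via a candidate key included in the RETURNING list.

def map_returning_to_inputs(inputs, returned, key):
    """inputs: rows supplied to the INSERT, as dicts;
    returned: RETURNING rows as dicts, each containing `key`;
    key: candidate-key column present in both."""
    by_key = {row[key]: row for row in returned}
    return [(inp, by_key[inp[key]]) for inp in inputs]

inputs = [{"email": "a@x"}, {"email": "b@x"}]
# Simulated RETURNING rows, deliberately out of input order:
returned = [{"email": "b@x", "id": 2}, {"email": "a@x", "id": 1}]
pairs = map_returning_to_inputs(inputs, returned, "email")
# pairs now associates each input with its generated id
```

If the ordering guarantee were documented instead, a plain `zip(inputs, returned)` would suffice and the wider RETURNING list would be unnecessary.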
Re: [HACKERS] Parallel Seq Scan
On Wed, Jan 14, 2015 at 9:30 AM, Ashutosh Bapat ashutosh.ba...@enterprisedb.com wrote: On Wed, Jan 14, 2015 at 9:12 AM, Amit Kapila amit.kapil...@gmail.com wrote: Here we have to decide what the strategy should be and how much each worker should scan. As an example, one strategy could be: if the table size is X MB and there are 8 workers, divide the work as X/8 MB for each worker (which I have currently used in the patch); another could be that each worker scans one block at a time and then checks some global structure to see which block it needs to scan next, though according to me this could lead to random scans. I have read that some other databases also divide the work based on partitions or segments (the size of a segment is not very clear). A block can contain useful tuples, i.e. tuples which are visible and fulfil the quals, plus useless tuples, i.e. tuples which are dead, invisible, or which do not fulfil the quals. Depending upon the contents of these blocks, esp. the ratio of (useful tuples)/(useless tuples), even though we divide the relation into equal-sized runs, each worker may take a different amount of time. So, instead of dividing the relation into a number of runs equal to the number of workers, it might be better to divide it into fixed-size runs with size (total number of blocks / number of workers), and let a worker pick up a run after it finishes with the previous one. The smaller the size of the runs, the better the load balancing but the higher the cost of starting with a run. So, we have to strike a balance. I think your suggestion is good and it somewhat falls in line with what Robert has suggested, but instead of block-by-block you seem to be suggesting doing it in chunks (where the chunk size is not clear); however, the one point against this is that such a strategy for parallel sequential scan could lead to random scans, which can hurt the operation badly.
Nonetheless, I will think more along these lines of making the work distribution dynamic so that we can ensure that all workers are kept busy. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
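The chunked distribution discussed above can be sketched in miniature (this is an illustrative model, not PostgreSQL code): blocks are grouped into fixed-size runs, and each worker atomically claims the next run when it finishes one, so faster workers naturally process more runs. The chunk size trades load balancing against per-run startup cost, exactly the balance Ashutosh mentions:

```python
# Illustrative sketch of dynamic chunked work distribution: a shared
# cursor hands out half-open block ranges under a lock.
import threading

def make_claimer(total_blocks, chunk_size):
    lock = threading.Lock()
    state = {"next": 0}
    def claim():
        with lock:                      # stands in for an atomic fetch-add
            start = state["next"]
            if start >= total_blocks:
                return None             # no more work
            end = min(start + chunk_size, total_blocks)
            state["next"] = end
            return (start, end)         # half-open block range [start, end)
    return claim

claim = make_claimer(total_blocks=100, chunk_size=32)
runs = []
while (r := claim()) is not None:       # each worker would loop like this
    runs.append(r)
```

With chunk_size equal to total_blocks/nworkers this degenerates to the static X/8 split; with chunk_size of 1 block it becomes the global-structure scheme that risks random I/O.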
Re: [HACKERS] Minor configure tweak to simplify adjusting gcc warnings
On 2015-01-14 10:01:39 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2015-01-14 09:34:23 -0500, Tom Lane wrote: Well, that would only fix my problem if we added a configure-time test for whether gcc recognizes z, which frankly seems like a waste of cycles. I've probably got the last one left in captivity that doesn't. Hm. I had kinda assumed that %z support for sprintf and gcc's recognition of the format string would coincide and we could just use the %z result. But gull's output doesn't actually work that way. It's only reasonable to assume that gcc matches sprintf if gcc is the native (vendor-supplied) compiler for the platform. That's not the case on gaur. It used to be very very commonly not the case, though I think a lot of vendors have now abandoned their proprietary compilers. If we were to test for this at all, I think we'd need to make the test separate from the one for sprintf's behavior. I've already given up... Given how infrequent it is, suppressing it for gull seems sufficient. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
Merlin Moncure mmonc...@gmail.com writes: On Wed, Jan 14, 2015 at 8:41 AM, Tom Lane t...@sss.pgh.pa.us wrote: What are the autovac processes doing (according to pg_stat_activity)? pid,running,waiting,query 7105,00:28:40.789221,f,autovacuum: VACUUM ANALYZE pg_catalog.pg_class Hah, I suspected as much. Is that the one that's stuck in LockBufferForCleanup, or the other one that's got a similar backtrace to all the user processes? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
Merlin Moncure mmonc...@gmail.com writes: There were seven processes with that exact backtrace (except that randomly they are sleeping in the spinloop). Something else interesting: autovacuum has been running all night as well. Unlike the other processes, however, its cpu utilization does not register in top. I backtraced them as well. They are not identical. One of them looks like this: What are the autovac processes doing (according to pg_stat_activity)? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 9:05 AM, Tom Lane t...@sss.pgh.pa.us wrote: Merlin Moncure mmonc...@gmail.com writes: On Wed, Jan 14, 2015 at 8:41 AM, Tom Lane t...@sss.pgh.pa.us wrote: What are the autovac processes doing (according to pg_stat_activity)? pid,running,waiting,query 7105,00:28:40.789221,f,autovacuum: VACUUM ANALYZE pg_catalog.pg_class Hah, I suspected as much. Is that the one that's stuck in LockBufferForCleanup, or the other one that's got a similar backtrace to all the user processes? Yes, it is: pg_class is coming from LockBufferForCleanup(). As you can see above, it has a shorter runtime. It was killed off once about a half hour ago, which did not free up the logjam. However, AV spawned it again and now it does not respond to cancel. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On 2015-01-14 09:22:45 -0600, Merlin Moncure wrote: On Wed, Jan 14, 2015 at 9:11 AM, Andres Freund and...@2ndquadrant.com wrote: On 2015-01-14 10:05:01 -0500, Tom Lane wrote: Merlin Moncure mmonc...@gmail.com writes: On Wed, Jan 14, 2015 at 8:41 AM, Tom Lane t...@sss.pgh.pa.us wrote: What are the autovac processes doing (according to pg_stat_activity)? pid,running,waiting,query 7105,00:28:40.789221,f,autovacuum: VACUUM ANALYZE pg_catalog.pg_class It'd be interesting to know whether that vacuum gets very frequent semaphore wakeups. Could you strace it for a second or three? for 30 seconds+ it just looks like this: mmoncure@mernix2 ~ $ sudo strace -p 7105 Process 7105 attached semop(5701638, {{4, -1, 0}}, 1 Ok. So that explains why it's not interruptible. all of the other processes are yielding out of the spinlock, for example: select(0, NULL, NULL, NULL, {0, 1408}) = 0 (Timeout) Note the above isn't the spinlock, it's the process's semaphore. It'll only get set if the refcount ever indicates that nobody but autovac is holding the lock. How did this perform on 9.4? this is a new project. However, I can run it vs. an earlier version. Can you guess how many times these dynamic statements are planned? How many different relations are accessed in the dynamically planned queries? only once or twice, and only a couple of tables. Hm. Odd. The first -g profile seemed to indicate a hell of a lot of time was spent in LWLockRelease() - indicating that there's actually progress. Later profiles/backtraces were less clear. If you gdb in and type 'fin' a couple of times, to wait till the function finishes, is there actually any progress? I'm wondering whether it's just many catalog accesses + contention, or some other problem. Alternatively set a breakpoint on ScanPgRelation() or so and see how often it's hit. I can send the code off-list if you guys think it'd help. Might be interesting.
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 8:41 AM, Tom Lane t...@sss.pgh.pa.us wrote: Merlin Moncure mmonc...@gmail.com writes: There were seven processes with that exact backtrace (except that randomly they are sleeping in the spinloop). Something else interesting: autovacuum has been running all night as well. Unlike the other processes, however, its cpu utilization does not register in top. I backtraced them as well. They are not identical. One of them looks like this: What are the autovac processes doing (according to pg_stat_activity)? pid,running,waiting,query 27885,15:40:46.904039,f,autovacuum: VACUUM ANALYZE onesitepmc.propertyguestcard 7209,00:24:44.162684,f,SELECT n.nspname as Schema, c.relname as Name, CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'm' THEN 'materialized view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' THEN 'special' WHEN 'f' THEN 'foreign table' END as Type, pg_catalog.pg_get_userbyid(c.relowner) as Owner FROM pg_catalog.pg_class c LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace WHERE c.relkind IN ('r','v','m','S','f','') AND n.nspname <> 'pg_catalog' AND n.nspname <> 'information_schema' AND n.nspname !~ '^pg_toast' AND pg_catalog.pg_table_is_visible(c.oid) ORDER BY 1,2; 7105,00:28:40.789221,f,autovacuum: VACUUM ANALYZE pg_catalog.pg_class 31282,15:28:29.070942,f,SELECT CDSReconcileRunTable(867) 31792,15:27:09.483365,f,SELECT CDSReconcileRunTable(874) 7135,00:26:08.977953,t,vacuum VERBOSE pg_class; 32725,15:24:46.008341,f,SELECT CDSReconcileRunTable(884) 32367,15:25:44.4206,f,SELECT CDSReconcileRunTable(881) 32492,15:25:31.27934,f,SELECT CDSReconcileRunTable(883) 31899,15:26:57.246415,f,SELECT CDSReconcileRunTable(875) 32368,15:25:44.418445,f,SELECT CDSReconcileRunTable(880) 7270,00:21:42.274104,f,SELECT n.nspname as Schema, c.relname as Name, CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'm' THEN 'materialized view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' THEN 'special' WHEN 'f' THEN 'foreign table' END as Type, pg_catalog.pg_get_userbyid(c.relowner) as Owner FROM pg_catalog.pg_class c LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace WHERE c.relkind IN ('r','v','m','S','f','') AND n.nspname <> 'pg_catalog' AND n.nspname <> 'information_schema' AND n.nspname !~ '^pg_toast' AND pg_catalog.pg_table_is_visible(c.oid) ORDER BY 1,2; 7403,00:00:00,f,COPY ( select pid, now ( ) - query_start as running, waiting, query from pg_stat_activity ) TO STDOUT csv header ; notes: *) SELECT CDSReconcileRunTable(xxx) are the replication processes. They are burning cpu. *) SELECT n.nspname ... is via psql \d. These stick when invoked and do not respond to cancel; they are also burning cpu. *) The autovacuum processes do not respond to cancel; however, one of them (tracing in from lazy_vacuum_index()) did respond earlier. Now, upon being respawned via AV, it doesn't, with the same backtrace. The blocked manual VACUUM VERBOSE did cancel. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On 2015-01-14 10:05:01 -0500, Tom Lane wrote: Merlin Moncure mmonc...@gmail.com writes: On Wed, Jan 14, 2015 at 8:41 AM, Tom Lane t...@sss.pgh.pa.us wrote: What are the autovac processes doing (according to pg_stat_activity)? pid,running,waiting,query 7105,00:28:40.789221,f,autovacuum: VACUUM ANALYZE pg_catalog.pg_class It'd be interesting to know whether that vacuum gets very frequent semaphore wakeups. Could you strace it for a second or three? How did this perform on 9.4? Can you guess how many times these dynamic statements are planned? How many different relations are accessed in the dynamically planned queries? Hah, I suspected as much. Is that the one that's stuck in LockBufferForCleanup, or the other one that's got a similar backtrace to all the user processes? Do you have a theory? Right now it primarily looks like contention on a single buffer due to the high number of dynamic statements, possibly made worse by the signalling between normal pinners and vacuum waiting for cleanup. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On 2015-01-14 10:13:32 -0500, Tom Lane wrote: Merlin Moncure mmonc...@gmail.com writes: Yes, it is pg_class is coming from LockBufferForCleanup (). As you can see above, it has a shorter runtime. So it was killed off once about a half hour ago which did not free up the logjam. However, AV spawned it again and now it does not respond to cancel. Interesting. That seems like there might be two separate issues at play. It's plausible that LockBufferForCleanup might be interfering with other attempts to scan the index, but then why wouldn't killing the AV have unstuck things? LockBufferForCleanup() unfortunately isn't interruptible. I've every now and then seen vacuums being stuck for a long while, trying to acquire cleanup locks - IIRC I complained about that on list even. So autovac will only be cancelled when going to the next page. And even if the page ever gets a zero (well, one) refcount, by the time autovac is woken up via the semaphore, it'll often end up being used again. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
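The cleanup-lock behaviour Andres describes can be modelled very roughly (this is an assumption-laden sketch, not backend code): the lock is granted only when the requester holds the *sole* pin on the buffer, so a steady stream of readers each briefly pinning the page can keep VACUUM waiting, and by the time VACUUM is woken via its semaphore the page may already be pinned again:

```python
# Toy model of LockBufferForCleanup-style starvation (illustrative only).

def try_cleanup_lock(pin_count):
    # the requester already holds one pin; success requires no other pins
    return pin_count == 1

# Simulated pin counts observed at each semaphore wakeup: new readers
# keep re-pinning the page before VACUUM gets a chance.
pin_counts_at_wakeup = [3, 2, 2, 1]
attempts = 0
for pins in pin_counts_at_wakeup:
    attempts += 1
    if try_cleanup_lock(pins):
        break
```

Since the wait itself is not interruptible, a cancel is only noticed between pages, which matches the observed unresponsive autovacuum.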
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
Andres Freund and...@2ndquadrant.com writes: On 2015-01-14 10:05:01 -0500, Tom Lane wrote: Hah, I suspected as much. Is that the one that's stuck in LockBufferForCleanup, or the other one that's got a similar backtrace to all the user processes? Do you have a theory? Right now it primarily looks like contention on a single buffer due to the high number of dynamic statements, possibly made worse by the signalling between normal pinners and vacuum waiting for cleanup. The readers shouldn't be contending with each other. It's certainly plausible enough that there's a steady stream of readers that would prevent LockBufferForCleanup from getting in, but that should not result in any readers getting blocked. So something's gotten broken in that area. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improving RLS qual pushdown
On 14 January 2015 at 13:29, Robert Haas robertmh...@gmail.com wrote: One thing they could still leak is the number of times they got called, and thus possibly the number of unseen rows. Now if the expressions get constant-folded away that won't be an issue, but a clever user can probably avoid that. Right now, EXPLAIN ANALYSE can be used to tell you the number of unseen rows. Is that something that people are concerned about, and are there any plans to change it? Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 9:11 AM, Andres Freund and...@2ndquadrant.com wrote: On 2015-01-14 10:05:01 -0500, Tom Lane wrote: Merlin Moncure mmonc...@gmail.com writes: On Wed, Jan 14, 2015 at 8:41 AM, Tom Lane t...@sss.pgh.pa.us wrote: What are the autovac processes doing (according to pg_stat_activity)? pid,running,waiting,query 7105,00:28:40.789221,f,autovacuum: VACUUM ANALYZE pg_catalog.pg_class It'd be interesting to know whether that vacuum gets very frequent semaphore wakeups. Could you strace it for a second or three? for 30 seconds+ it just looks like this: mmoncure@mernix2 ~ $ sudo strace -p 7105 Process 7105 attached semop(5701638, {{4, -1, 0}}, 1 all of the other processes are yielding out of the spinlock, for example: select(0, NULL, NULL, NULL, {0, 1408}) = 0 (Timeout) How did this perform on 9.4? this is a new project. However, I can run it vs. an earlier version. Can you guess how many times these dynamic statements are planned? How many different relations are accessed in the dynamically planned queries? only once or twice, and only a couple of tables. This is an operation that should only take a few seconds (inserting a few tens of thousands of rows), but it has been blocked for many hours now. Usually it runs through in a few seconds. This is either a deadlock or a deadlock-emulating sequence of operations. I'll try to pull the commits that Peter suggested and see if that helps (I'm getting ready to bring the database down). I can send the code off-list if you guys think it'd help. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Merging postgresql.conf and postgresql.auto.conf
Hi all, postgresql.auto.conf is loaded after postgresql.conf whenever the configuration is loaded or reloaded. This means that a parameter in postgresql.auto.conf takes priority, so the same parameter in postgresql.conf does not work at all even if the user sets it manually. If the user wants to change the configuration of a stopped PostgreSQL server, they need to merge the two configuration files (postgresql.conf and postgresql.auto.conf) manually while maintaining consistency. From an operational perspective, having a written config with duplicate entries is not a good thing. I think we need to merge the two configuration files into one (or more than one, if 'include' directives are used). One solution is to add a merging tool/command which merges the two configuration files while maintaining consistency. Has this topic already been discussed? Regards, --- Sawada Masahiko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
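The precedence problem Sawada-san describes can be sketched with a deliberately simplified last-setting-wins model (the real GUC parser handles quoting, includes, and more; this is only an illustration of why a manual edit in postgresql.conf is silently overridden):

```python
# Simplified sketch: files are read in load order, later settings win,
# so postgresql.auto.conf (read last) overrides postgresql.conf.

def effective_settings(*files):
    settings = {}
    for text in files:                              # files in load order
        for line in text.splitlines():
            line = line.split("#", 1)[0].strip()    # drop comments
            if "=" in line:
                name, value = line.split("=", 1)
                settings[name.strip()] = value.strip()
    return settings

conf = "shared_buffers = 128MB  # edited by hand\nwork_mem = 4MB"
auto = "shared_buffers = '2GB'"                     # written by ALTER SYSTEM
merged = effective_settings(conf, auto)
# the hand edit to shared_buffers is lost; work_mem survives
```

A merge tool would essentially have to produce one file with this resolved mapping, and, as Tom notes below in this thread, doing that without mangling comments is the hard part.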
Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf
Sawada Masahiko sawada.m...@gmail.com writes: The postgresql.auto.conf is loaded after loading of postgresql.conf whenever configuration file is loaded or reloaded. This means that parameter in postgresql.auto.conf is quite high priority, so the parameter in postgresql.conf does not work at all even if user set it manually. If user want to change stopped postgresql server then user need to merge two configuration file(postgresql.conf and postgresql.auto.conf) while maintaining the consistency manually. From an operational perspective having a written config with duplicate entries is not good thing. I think we need to merge two configuration file into one (or more than one, if it uses like 'include' word) The one solution is to add merging tool/commnand which merges two configuration file while maintaining the consistency. This topic have been already discussed? Yes. The entire reason that postgresql.auto.conf is separate is that we despaired of reading and rewriting postgresql.conf automatically without making a hash of material in the comments. Calling the logic a merge tool does not make that problem go away. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minor configure tweak to simplify adjusting gcc warnings
Andres Freund and...@2ndquadrant.com writes: On 2015-01-13 22:19:30 -0500, Tom Lane wrote: The reason I got interested in this is that I attempted to pass in CFLAGS=-Wno-format to configure, to suppress format warnings on buildfarm member gaur (whose gcc is too old to recognize z modifiers). That doesn't work because -Wall turns the warnings right back on again. If the user-supplied CFLAGS were inserted after -Wall then it would work. Along the lines of http://archives.postgresql.org/message-id/54B58BA3.8040302%40ohmu.fi, I wonder if the better fix isn't to define pg_format_attribute(...) and define it empty if the compiler doesn't support what we want. Well, that would only fix my problem if we added a configure-time test for whether gcc recognizes z, which frankly seems like a waste of cycles. I've probably got the last one left in captivity that doesn't. Not that I have any great objection to improving the attribute-slinging logic. It just doesn't seem like a good fix for this particular issue. A slightly more complicated change could be applied to make sure that *all* of the CFLAGS forcibly inserted by configure appear before any externally-sourced CFLAGS, allowing any of them to be overridden from the environment variable. I'm not sure if it's worth the trouble to do that, but if there's interest I could make it happen. I think it'd be a good idea, but unless you're enthusiastic I guess there are more important things. Nah, I'm fine with doing it, it's just a couple more lines of code. I feared people might think it wasn't worth adding extra complexity for, but as long as someone else likes the idea I'll go do it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
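The ordering problem Tom describes comes down to gcc honoring the last of conflicting -W options on the command line; an illustrative (assumed, not taken from the build logs) pair of invocations:

```text
# user CFLAGS inserted before configure's flags: -Wall re-enables the warning
gcc -Wno-format -Wall -c foo.c      # format warnings back ON
# user CFLAGS appended after configure's flags: the override sticks
gcc -Wall -Wno-format -c foo.c      # format warnings suppressed
```

Hence the proposal to place all configure-inserted flags before any externally-sourced CFLAGS.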
Re: [HACKERS] Minor configure tweak to simplify adjusting gcc warnings
On 2015-01-14 09:34:23 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2015-01-13 22:19:30 -0500, Tom Lane wrote: The reason I got interested in this is that I attempted to pass in CFLAGS=-Wno-format to configure, to suppress format warnings on buildfarm member gaur (whose gcc is too old to recognize z modifiers). That doesn't work because -Wall turns the warnings right back on again. If the user-supplied CFLAGS were inserted after -Wall then it would work. Along the lines of http://archives.postgresql.org/message-id/54B58BA3.8040302%40ohmu.fi, I wonder if the better fix isn't to define pg_format_attribute(...) and define it empty if the compiler doesn't support what we want. Well, that would only fix my problem if we added a configure-time test for whether gcc recognizes z, which frankly seems like a waste of cycles. I've probably got the last one left in captivity that doesn't. Hm. I had kinda assumed that %z support for sprintf and gcc's recognition of the format string would coincide and we could just use the %z result. But gull's output doesn't actually work that way. A slightly more complicated change could be applied to make sure that *all* of the CFLAGS forcibly inserted by configure appear before any externally-sourced CFLAGS, allowing any of them to be overridden from the environment variable. I'm not sure if it's worth the trouble to do that, but if there's interest I could make it happen. I think it'd be a good idea, but unless you're enthusiastic I guess there are more important things. Nah, I'm fine with doing it, it's just a couple more lines of code. I feared people might think it wasn't worth adding extra complexity for, but as long as someone else likes the idea I'll go do it. Ok, cool. I had thought the templates and subdirectory overrides would make it slightly less than trivial.
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minor configure tweak to simplify adjusting gcc warnings
Andres Freund and...@2ndquadrant.com writes: On 2015-01-14 09:34:23 -0500, Tom Lane wrote: Well, that would only fix my problem if we added a configure-time test for whether gcc recognizes z, which frankly seems like a waste of cycles. I've probably got the last one left in captivity that doesn't. Hm. I had kinda assumed that %z support for sprintf and gcc's recognition of the format string would coincide and we could just use the %z result. But gull's output doesn't actually work that way. It's only reasonable to assume that gcc matches sprintf if gcc is the native (vendor-supplied) compiler for the platform. That's not the case on gaur. It used to be very very commonly not the case, though I think a lot of vendors have now abandoned their proprietary compilers. If we were to test for this at all, I think we'd need to make the test separate from the one for sprintf's behavior. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
Merlin Moncure mmonc...@gmail.com writes: Yes, it is pg_class is coming from LockBufferForCleanup (). As you can see above, it has a shorter runtime. So it was killed off once about a half hour ago which did not free up the logjam. However, AV spawned it again and now it does not respond to cancel. Interesting. That seems like there might be two separate issues at play. It's plausible that LockBufferForCleanup might be interfering with other attempts to scan the index, but then why wouldn't killing the AV have unstuck things? Anyway it's now seeming that Peter may have the right idea about where to look. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] explain sortorder
On 01/14/2015 05:26 PM, Timmer, Marius wrote: Hello Heikki, abbreviated version: Sorry, the problem is only the unhandy patch text format, not different opinions on how to proceed. Long version: The v7 patch file already addressed your suggestions, but the file contained several (old) local commits, with the new ones at the end of the patch text/file. Ah, missed that. I stopped reading when I saw the old stuff there :-). v7.1 is attached and addresses this issue by providing a clean patch file. Ok, thanks, will take a look. V8 will - as mentioned - add the missing docs and regression tests. Great! - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fillfactor for GIN indexes
On Thu, Jan 8, 2015 at 2:03 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Thu, Jan 8, 2015 at 6:31 AM, Alexander Korotkov aekorot...@gmail.com wrote: On Wed, Jan 7, 2015 at 4:11 PM, Michael Paquier michael.paqu...@gmail.com I am attaching an updated patch, with the default fillfactor value at 75%, and with the page split code using the fillfactor rate. Thoughts? Rewritten version of patch is attached. I made following changes: Thanks! With this patch (and my previous version as well) GIN indexes with default fillfactor have a size higher than 9.4 indexes, 9.4 behavior being consistent only with fillfactor=100 and not the default of 90. Are we fine with that? IMO, this patch has value to control random updates on GIN indexes, but we should have a default fillfactor of 100 to have index size consistent with 9.4. Thoughts? -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
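If the patch under discussion goes in, usage would presumably mirror other index storage parameters; the following is an assumed illustration of the syntax being proposed, not committed behavior, and the table/column names are made up:

```sql
-- Hypothetical, per the patch being discussed: reserve free space in GIN
-- pages for future insertions instead of packing them full.
CREATE INDEX idx_docs_fts ON docs USING gin (tsv) WITH (fillfactor = 75);
```

The open question in the thread is only the default: 75 controls random-update bloat better, while 100 keeps index sizes consistent with 9.4.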
Re: [HACKERS] inherit support for foreign tables
On 2014/12/23 0:36, Tom Lane wrote: Yeah, we need to do something about the PlanRowMark data structure. Aside from the pre-existing issue in postgres_fdw, we need to fix it to support inheritance trees in which more than one rowmark method is being used. That rte.hasForeignChildren thing is a crock, The idea I'd had about that was to convert the markType field into a bitmask, so that a parent node's markType could represent the logical OR of the rowmark methods being used by all its children. I've not attempted to code this up though, and probably won't get to it until after Christmas. One thing that's not clear is what should happen with ExecRowMark. As I said before, that seems to me like a good idea. So I'll update the patch based on that if you're okay with it. Or have you found any problem concerning the above idea? Best regards, Etsuro Fujita -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] install libpq.dll in bin directory on Windows / Cygwin
On Wed, Dec 24, 2014 at 3:08 PM, Michael Paquier michael.paqu...@gmail.com wrote: Attached are two patches, one for MinGW/cygwin, a slightly modified version from Peter, and a second implementing the same thing for the MSVC scripts. The method for MSVC is similar to what is done in Peter's patch: roughly, it checks if SO_MAJOR_VERSION is present in the Makefile of a given library, the path of the Makefile being found by looking at the location of the .rc in the vcproj file (this could be better, but I could not come up with a better method). TBH, it would be good to be completely consistent in the way we build things on Windows, and we may as well consider a backpatch to fix this long-standing bug. The MSVC patch of course removes the hack copying libpq.dll from lib/ to bin/. I mentioned the fix for MSVC scripts as well here: http://www.postgresql.org/message-id/cab7npqqiuepzphd3mmk+q7_cqqrkk_v1fvxknymri660z4d...@mail.gmail.com Peter, this patch has been waiting for input for a couple of weeks. IMO, it would be good to finally get a fix for this bug, and we have patches for both MSVC (the patch I sent) and MinGW (your stuff). -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Define and document a guaranteed ordering for RETURNING?
Apparently I'm semi-blind - the docs also note that: If the INSERT command contains a RETURNING clause, the result will be similar to that of a SELECT statement containing the columns and values defined in the RETURNING list, computed over the row(s) inserted by the command. ... so perhaps it's enough to just explicitly mention that ordering is preserved.
Re: [HACKERS] PATCH: Reducing lock strength of trigger and foreign key DDL
I wrote: I think that we could pass it to a committer. Marked as such. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] __attribute__ for non-gcc compilers
On 15.01.2015 at 00:54, Andres Freund wrote: I think I'd for now simply not define pg_attribute_aligned() on platforms where it's not supported, instead of defining it empty. If we need a softer variant we can name it pg_attribute_aligned_if_possible or something. Good point: all attributes that actually change program behavior (aligned and packed, for now) need to error out during compilation if they're used and not supported by the compiler. Attributes which may improve optimization or just provide more information for the developers (noreturn, unused, format) can be defined empty when they're not supported (or are not supported well enough: GCC 3 doesn't know about %z in printf formats). / Oskari -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Compression of full-page-writes
Marking this patch as returned with feedback for this CF, moving it to the next one. I doubt that there will be much progress here for the next couple of days, so let's try at least to get something for this release cycle. -- Michael
Re: [HACKERS] Join push-down support for foreign tables
On Fri, Dec 26, 2014 at 1:48 PM, Shigeru Hanada shigeru.han...@gmail.com wrote: Hmm, I agree that supporting N-way joins is very useful. Postgres-XC's SQL generator seems to give us a hint for such a case; I'll check it out again. Switching to returned with feedback, as this patch has been waiting for feedback for a couple of weeks now. -- Michael
Re: [HACKERS] Async execution of postgres_fdw.
On Tue, Jan 13, 2015 at 8:46 PM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: I'll look into the case after this, but I'd like to send a revised patch at this point. Hm. Seems like this patch is not completely baked yet. Horiguchi-san, as you are obviously still working on it, would you agree to move it to the next CF? -- Michael
[HACKERS] Commit fest 2015-12 enters money time
Hi all,

We are soon entering the money time for this CF. The last month has been mainly a vacation period, with progress being phantom-like on many fronts; still, there are a couple of patches that are marked as ready for committer:

- Foreign table inheritance, whose first patch has been committed
- speedup tidbitmap patch: cache page
- pg_basebackup vs. Windows and tablespaces (extend base backup to include the symlink file used to restore symlinks)
- If pg_hba.conf has changed since last reload, emit a HINT to the client and server log
- Add restore_command_retry_interval option to control timeout of restore_command nonzero status code
- documentation update for doc/src/sgml/func.sgml
- Reducing lock strength of trigger and foreign key DDL

I am going to make a first pass on all the patches to check their status. Please note that patches waiting for input from their authors will be marked as returned with feedback.

Thanks,
-- Michael
Re: [HACKERS] [PATCH] add ssl_protocols configuration option
On Mon, Dec 15, 2014 at 11:15 PM, Alex Shulgin a...@commandprompt.com wrote: Michael Paquier michael.paqu...@gmail.com writes: Perhaps ssloptions.[ch], unless you plan to add non-option-related code there later? I don't think anything other than common options-related code fits in there, so ssloptions.c makes sense to me. BTW, there is no Regent code in your openssl.c, so the copyright statement is incorrect. Good catch, I just blindly copied that from some other file. There have been arguments in favor and against this patch... But marking it as returned with feedback because of a lack of activity in the last couple of weeks. Feel free to update if you think that's not correct, and please move this patch to commit fest 2014-12. Attached is a new version that addresses the earlier feedback: renamed the added *.[ch] files and removed the incorrect copyright line. I'm moving this to the current CF. This patch has been at a status quo since the beginning of this CF, and the arguments remain the same. If nothing is happening, maybe it would be better to mark it as returned with feedback? Thoughts? -- Michael
Re: [HACKERS] patch : Allow toast tables to be moved to a different tablespace
On Tue, Dec 30, 2014 at 11:48 AM, Andreas Karlsson andr...@proxel.se wrote: Here is my review of the feature. Marked as returned with feedback. -- Michael
Re: [HACKERS] Support UPDATE table SET(*)=...
On Mon, Dec 15, 2014 at 7:50 PM, Atri Sharma atri.j...@gmail.com wrote: I have moved patch to current CF and marked it as Waiting on Author since I plan to resubmit after addressing the concerns. Nothing happened in the last month, so marking as returned with feedback. -- Michael
[HACKERS] EvalPlanQual behaves oddly for FDW queries involving system columns
Here is an example using postgres_fdw.

[Terminal 1]
postgres=# create table t (a int, b int);
CREATE TABLE
postgres=# insert into t values (1, 1);
INSERT 0 1
postgres=# begin;
BEGIN
postgres=# update t set b = b * 2;
UPDATE 1

[Terminal 2]
postgres=# create foreign table ft (a int) server loopback options (table_name 'lbt');
CREATE FOREIGN TABLE
postgres=# insert into ft values (1);
INSERT 0 1
postgres=# select tableoid, ctid, * from ft;
 tableoid | ctid  | a
----------+-------+---
    25092 | (0,1) | 1
(1 row)

postgres=# select ft.tableoid, ft.ctid, ft.* from t, ft where t.a = ft.a for update;

[Terminal 1]
postgres=# commit;
COMMIT

[Terminal 2]
postgres=# select ft.tableoid, ft.ctid, ft.* from t, ft where t.a = ft.a for update;
 tableoid |      ctid      | a
----------+----------------+---
        0 | (4294967295,0) | 1
(1 row)

Note that tableoid and ctid have been changed! I think the reason is that EvalPlanQualFetchRowMarks doesn't properly set tableoid and ctid for foreign tables, IIUC. I think this should be fixed. Please find attached a patch. The patch slightly relates to [1], so if it is reasonable, I'll update [1] on top of this.

[1] https://commitfest.postgresql.org/action/patch_view?id=1386

Best regards,
Etsuro Fujita

*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
***************
*** 2947,2953 **** make_tuple_from_result_row(PGresult *res,
  	tuple = heap_form_tuple(tupdesc, values, nulls);
  
  	if (ctid)
! 		tuple->t_self = *ctid;
  
  	/* Clean up */
  	MemoryContextReset(temp_context);
--- 2947,2953 ----
  	tuple = heap_form_tuple(tupdesc, values, nulls);
  
  	if (ctid)
! 		tuple->t_self = tuple->t_data->t_ctid = *ctid;
  
  	/* Clean up */
  	MemoryContextReset(temp_context);
*** a/src/backend/executor/execMain.c
--- b/src/backend/executor/execMain.c
***************
*** 795,800 **** InitPlan(QueryDesc *queryDesc, int eflags)
--- 795,801 ----
  	{
  		PlanRowMark *rc = (PlanRowMark *) lfirst(l);
  		Oid			relid;
+ 		char		relkind;
  		Relation	relation;
  		ExecRowMark *erm;
***************
*** 817,826 **** InitPlan(QueryDesc *queryDesc, int eflags)
--- 818,833 ----
  				break;
  			case ROW_MARK_COPY:
  				/* there's no real table here ... */
+ 				relkind = rt_fetch(rc->rti, rangeTable)->relkind;
+ 				if (relkind == RELKIND_FOREIGN_TABLE)
+ 					relid = getrelid(rc->rti, rangeTable);
+ 				else
+ 					relid = InvalidOid;
  				relation = NULL;
  				break;
  			default:
  				elog(ERROR, "unrecognized markType: %d", rc->markType);
+ 				relid = InvalidOid;
  				relation = NULL;	/* keep compiler quiet */
  				break;
  		}
***************
*** 831,836 **** InitPlan(QueryDesc *queryDesc, int eflags)
--- 838,844 ----
  		erm = (ExecRowMark *) palloc(sizeof(ExecRowMark));
  		erm->relation = relation;
+ 		erm->relid = relid;
  		erm->rti = rc->rti;
  		erm->prti = rc->prti;
  		erm->rowmarkId = rc->rowmarkId;
***************
*** 2318,2325 **** EvalPlanQualFetchRowMarks(EPQState *epqstate)
  			/* build a temporary HeapTuple control structure */
  			tuple.t_len = HeapTupleHeaderGetDatumLength(td);
! 			ItemPointerSetInvalid(&(tuple.t_self));
! 			tuple.t_tableOid = InvalidOid;
  			tuple.t_data = td;
  
  			/* copy and store tuple */
--- 2326,2342 ----
  			/* build a temporary HeapTuple control structure */
  			tuple.t_len = HeapTupleHeaderGetDatumLength(td);
! 			/* if relid is valid, rel is a foreign table; set system columns */
! 			if (OidIsValid(erm->relid))
! 			{
! 				tuple.t_self = td->t_ctid;
! 				tuple.t_tableOid = erm->relid;
! 			}
! 			else
! 			{
! 				ItemPointerSetInvalid(&(tuple.t_self));
! 				tuple.t_tableOid = InvalidOid;
! 			}
  			tuple.t_data = td;
  
  			/* copy and store tuple */
*** a/src/backend/executor/nodeForeignscan.c
--- b/src/backend/executor/nodeForeignscan.c
***************
*** 22,27 ****
--- 22,28 ----
   */
  #include "postgres.h"
  
+ #include "access/htup_details.h"
  #include "executor/executor.h"
  #include "executor/nodeForeignscan.h"
  #include "foreign/fdwapi.h"
***************
*** 53,65 **** ForeignNext(ForeignScanState *node)
  	/*
  	 * If any system columns are requested, we have to force the tuple into
  	 * physical-tuple form to avoid "cannot extract system attribute from
! 	 * virtual tuple" errors later.  We also insert a valid value for
! 	 * tableoid, which is the only actually-useful system column.
  	 */
  	if (plan->fsSystemCol && !TupIsNull(slot))
  	{
  		HeapTuple	tup = ExecMaterializeSlot(slot);
  
  		tup->t_tableOid = RelationGetRelid(node->ss.ss_currentRelation);
  	}
--- 54,69 ----
  	/*
  	 * If any system columns are requested, we have to force the tuple into
  	 * physical-tuple form to avoid "cannot extract system attribute from
! 	 * virtual tuple" errors later.  We also insert a valid value for TID
! 	 * and tableoid, which are the only actually-useful system columns.
  	 */
  	if (plan->fsSystemCol && !TupIsNull(slot))
  	{
  		HeapTuple	tup = ExecMaterializeSlot(slot);
  
+ 		/*
Re: [HACKERS] compress method for spgist - 2
Marking this patch as returned with feedback because it has been waiting for input from the author for a couple of weeks now. Heikki, the refactoring patch has some value, are you planning to push it? -- Michael
Re: [HACKERS] logical column ordering
On Sun, Jan 4, 2015 at 10:37 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: So I'm reworking my patch with that in mind. Switching to returned with feedback. Alvaro, feel free to add an entry to the next CF if you are planning to work on it again. -- Michael
Re: [HACKERS] DROP PRIVILEGES OWNED BY
On Thu, Dec 18, 2014 at 1:43 AM, Marko Tiikkaja ma...@joh.to wrote: I don't have a problem with that. It would probably work, too, since FROM is already fully reserved. Marking patch as returned with feedback as there has been no input from Marko in the last couple of weeks. -- Michael
Re: [HACKERS] moving from contrib to bin
On Tue, Dec 23, 2014 at 3:24 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Mon, Dec 22, 2014 at 11:30 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Michael Paquier wrote: And here is an updated patch following those lines. Similarly to the things in contrib/, a set of variables is used to define for each module the extra libraries, include dirs, etc. This refactors quite a bit of code, even if there are a couple of exceptions like pg_xlogdump/ or pg_basebackup/. In a broad look, this looks a lot better. I think we should press forward with this whole set of patches and see what the buildfarm thinks. Here is a new series of patches for all those things, with the following additions: - Some cleanup of the MSVC scripts compared to the last patch - Moved documentation to ref/ - Removed mention of the authors of the utilities moved (?) This set of patches is basically made of the former set, with 2 additional patches for MSVC stuff and documentation pages moved to ref/. Where are we on this? This patch has been waiting for input from its author for the last couple of weeks. -- Michael
Re: [HACKERS] Typo fix in alter_table.sgml
Robert Haas wrote: But that's not a typo as stated in $SUBJECT but rather a markup fix. Definitely. Thanks. -- Michael
Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf
On Thu, Jan 15, 2015 at 1:15 AM, Tom Lane t...@sss.pgh.pa.us wrote: Sawada Masahiko sawada.m...@gmail.com writes: On Thu, Jan 15, 2015 at 12:37 AM, Tom Lane t...@sss.pgh.pa.us wrote: Yes. The entire reason that postgresql.auto.conf is separate is that we despaired of reading and rewriting postgresql.conf automatically without making a hash of material in the comments. Calling the logic a merge tool does not make that problem go away. The merge tool would not only merge all the parameters in the two configurations into one file but also remove duplicate parameters. That is, the configuration files would logically become one file. I'll just say one more time that if we thought this could work, we'd not have arrived at the separate-files design to begin with. You can work on it if you like, but I will bet a good deal that you will not end up with something that gets accepted. Yep, I don't intend to propose that again. I thought that maintaining the configuration file would be complicated, so I just thought we could add a supporting tool. Regards, --- Sawada Masahiko
Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf
On Thu, Jan 15, 2015 at 12:37 AM, Tom Lane t...@sss.pgh.pa.us wrote: Sawada Masahiko sawada.m...@gmail.com writes: The postgresql.auto.conf is loaded after postgresql.conf whenever the configuration file is loaded or reloaded. This means that a parameter in postgresql.auto.conf takes quite high priority, so the same parameter in postgresql.conf does not work at all, even if the user sets it manually. If the user wants to change the configuration of a stopped PostgreSQL server, then the user needs to merge the two configuration files (postgresql.conf and postgresql.auto.conf) while maintaining consistency manually. From an operational perspective, having a written config with duplicate entries is not a good thing. I think we need to merge the two configuration files into one (or more than one, if 'include' is used). One solution is to add a merging tool/command which merges the two configuration files while maintaining consistency. Has this topic already been discussed? Yes. The entire reason that postgresql.auto.conf is separate is that we despaired of reading and rewriting postgresql.conf automatically without making a hash of material in the comments. Calling the logic a merge tool does not make that problem go away. The merge tool would not only merge all the parameters in the two configurations into one file but also remove duplicate parameters. That is, the configuration files would logically become one file. It will clearly be complicated work for the user to rewrite postgresql.conf manually while maintaining consistency. (On top of that, executing the ALTER SYSTEM command is not allowed for users other than superusers.) Is it bad that ALTER SYSTEM parses postgresql.conf again at AlterSystemSetConfigFile() to get the line number and file name of that parameter? (Or PostgreSQL could continue to keep the line number and file name of the parameter.) Regards, --- Sawada Masahiko
Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf
Sawada Masahiko sawada.m...@gmail.com writes: On Thu, Jan 15, 2015 at 12:37 AM, Tom Lane t...@sss.pgh.pa.us wrote: Yes. The entire reason that postgresql.auto.conf is separate is that we despaired of reading and rewriting postgresql.conf automatically without making a hash of material in the comments. Calling the logic a merge tool does not make that problem go away. The merge tool would not only merge all the parameters in the two configurations into one file but also remove duplicate parameters. That is, the configuration files would logically become one file. I'll just say one more time that if we thought this could work, we'd not have arrived at the separate-files design to begin with. You can work on it if you like, but I will bet a good deal that you will not end up with something that gets accepted. regards, tom lane
Re: [HACKERS] [RFC] Incremental backup v3: incremental PoC
Hi Marco, thank you for sending an updated patch. I am writing down a report of this initial (and partial) review.

IMPORTANT: This patch is not complete, as stated by Marco. See the "Conclusions" section for my proposed TODO list.

== Patch application

I have been able to successfully apply your patch and compile it. Regression tests passed.

== Initial run

I have created a fresh new instance of PostgreSQL and activated streaming replication to be used by pg_basebackup. I have done a pgbench run with scale 100. I have taken a full consistent backup with pg_basebackup (in plain format):

pg_basebackup -v -F p -D $BACKUPDIR/backup-$(date '+%s') -x

I have been able to verify that the backup_profile is correctly placed in the destination PGDATA directory. Here is an excerpt:

POSTGRESQL BACKUP PROFILE 1
START WAL LOCATION: 0/358 (file 00010003)
CHECKPOINT LOCATION: 0/38C
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2015-01-14 10:07:07 CET
LABEL: pg_basebackup base backup
FILE LIST
\N \N t 1421226427 206 backup_label
\N \N t 1421225508 88 postgresql.auto.conf
...

As suggested by Marco, I have manually taken the LSN from this file (the next version must do this automatically). I have then executed pg_basebackup and activated the incremental feature by using the LSN from the previous backup, as follows:

LSN=$(awk '/^START WAL/{print $4}' backup_profile)
pg_basebackup -v -F p -D $BACKUPDIR/backup-$(date '+%s') -I $LSN -x

The time taken by this operation has been much lower than the previous one, and the size is much lower (I have not done any operation in the meantime):

du -hs backup-1421226*
1,5G backup-1421226427
17M  backup-1421226427

I have done some checks on the file system and then used the prototype of the recovery script in Python written by Marco:

./recover.py backup-1421226427 backup-1421226427 new-data

The cluster started successfully. I have then run a pg_dump of the pgbench database and was able to reload it on the initial cluster.
== Conclusions

The first run of this patch seems promising. While the discussion on the LSN map continues (which is mainly an optimisation of this patch), I would really like to see this patch progress, as it would be a killer feature in several contexts (not in every context). Just in this period we are releasing file-based incremental backup for Barman, and customers using the alpha version are experiencing on average a deduplication ratio between 50% and 70%. This is for example an excerpt of "barman show-backup" from one of our customers (a daily saving of 550GB is not bad):

Base backup information:
  Disk usage       : 1.1 TiB (1.1 TiB with WALs)
  Incremental size : 564.6 GiB (-50.60%)
  ...

My opinion, Marco, is that for version 5 of this patch, you:

1) update the information on the wiki (it is outdated - I know you have been busy with the LSN map optimisation)
2) modify pg_basebackup in order to accept a directory (or tar file) and automatically detect the LSN from the backup profile
3) add the documentation regarding the backup profile and pg_basebackup

Once we have all of this, we can continue trying the patch. Some unexplored paths are:

* tablespace usage
* tar format
* performance impact (in both read-only and heavily updated contexts)
* consistency checks

I would then leave the pg_restorebackup utility for version 6 (unless you want to do everything at once). One limitation of the current recovery script is that it cannot accept multiple incremental backups (it just accepts three parameters: base backup, incremental backup and merge destination). Maybe you can change the syntax as follows:

./recover.py DESTINATION BACKUP_1 BACKUP_2 [BACKUP_3, ...]

Thanks a lot for working on this. I am looking forward to continuing the review.
Ciao, Gabriele -- Gabriele Bartolini - 2ndQuadrant Italia - Managing Director PostgreSQL Training, Services and Support gabriele.bartol...@2ndquadrant.it | www.2ndQuadrant.it 2015-01-13 17:21 GMT+01:00 Marco Nenciarini marco.nenciar...@2ndquadrant.it : On 13/01/15 12:53, Gabriele Bartolini wrote: Hi Marco, could you please send an updated version of the patch against the current HEAD in order to facilitate reviewers? Here is the updated patch for incremental file-based backup. It is based on the current HEAD. I'm now working on the client tool to rebuild a full backup starting from a file-based incremental backup. Regards, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it
Re: [HACKERS] Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs
On Mon, Jan 5, 2015 at 12:54 PM, Tom Lane t...@sss.pgh.pa.us wrote: TBH, my first reaction to this entire patch is unfavorable: it's a solution in search of a problem. It adds substantial complication not only for users but for PG developers in order to solve a rather narrow performance issue. What would make sense to me is to teach the planner about inlining SQL functions that include ORDER BY clauses, so that the performance issue of a double sort could be avoided entirely transparently to the user. That's not a bad idea, but it only helps for SQL functions. Actually, the problem I have run into in the past was not that the planner didn't know the ordering of the SRF's return value, but that it had no statistics for it, and therefore made bad optimization decisions. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 9:30 AM, Andres Freund and...@2ndquadrant.com wrote: If you gdb in, and type 'fin' a couple times, to wait till the function finishes, is there actually any progress? I'm wondering whether it's just many catalog accesses + contention, or some other problem. Alternatively set a breakpoint on ScanPgRelation() or so and see how often it's hit.

well, i restarted the database, forgetting my looper was running which immediately spun up and it got stuck again with a similar profile (lots of cpu in spinlock):

Samples: 3K of event 'cycles', Event count (approx.): 2695723228
 31.16%  postgres      [.] s_lock
 22.32%  postgres      [.] tas
 12.13%  postgres      [.] tas
  5.93%  postgres      [.] spin_delay
  5.69%  postgres      [.] LWLockRelease
  3.75%  postgres      [.] LWLockAcquireCommon
  3.61%  perf          [.] 0x000526c4
  2.51%  postgres      [.] FunctionCall2Coll
  1.48%  libc-2.17.so  [.] 0x0016a132

If you gdb in, and type 'fin' a couple times,

(gdb) fin
Run till exit from #0 0x7ff4c63f7a97 in semop () from /lib/x86_64-linux-gnu/libc.so.6
0x006de073 in PGSemaphoreLock ()
(gdb) fin
Run till exit from #0 0x006de073 in PGSemaphoreLock ()

It returned once. Second time, it didn't at least so far (minute or so). I can send the code off-list if you guys think it'd help. Might be interesting. sent (off-list). merlin
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On 2015-01-14 09:47:19 -0600, Merlin Moncure wrote: On Wed, Jan 14, 2015 at 9:30 AM, Andres Freund and...@2ndquadrant.com wrote: If you gdb in, and type 'fin' a couple times, to wait till the function finishes, is there actually any progress? I'm wondering whether it's just many catalog accesses + contention, or some other problem. Alternatively set a breakpoint on ScanPgRelation() or so and see how often it's hit. well, i restarted the database, forgetting my looper was running which immediately spun up and it got stuck again with a similar profile (lots of cpu in spinlock): Samples: 3K of event 'cycles', Event count (approx.): 2695723228 31.16% postgres[.] s_lock 22.32% postgres[.] tas 12.13% postgres[.] tas 5.93% postgres[.] spin_delay 5.69% postgres[.] LWLockRelease 3.75% postgres[.] LWLockAcquireCommon 3.61% perf[.] 0x000526c4 2.51% postgres[.] FunctionCall2Coll 1.48% libc-2.17.so[.] 0x0016a132 If you gdb in, and type 'fin' a couple times, (gdb) fin Run till exit from #0 0x7ff4c63f7a97 in semop () from /lib/x86_64-linux-gnu/libc.so.6 0x006de073 in PGSemaphoreLock () (gdb) fin Run till exit from #0 0x006de073 in PGSemaphoreLock () It returned once. Second time, it didn't at least so far (minute or so). Hm, that's autovac though, not the normal user backends that actually do stuff, right? If you could additionally check those, it'd be great. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Minor configure tweak to simplify adjusting gcc warnings
Andres Freund and...@2ndquadrant.com writes: I've already given up... Given how infrequent it is, suppressing it for gull seems sufficient. I'm confused --- I see no format warnings in gull's current reports. regards, tom lane
Re: [HACKERS] Minor configure tweak to simplify adjusting gcc warnings
On 2015-01-14 11:09:10 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: I've already given up... Given how infrequent it is, suppressing it for gull seems sufficient. I'm confused --- I see no format warnings in gull's current reports. Sorry, it was me being confused. I somehow switched gaur and gull because I had tabs open for both. Turns out that gaur doesn't do %z at all... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services