Re: Postgres with pthread
> But it is a theory. The main idea of this prototype was to prove or disprove
> this expectation in practice.
> But please notice that it is a very raw prototype. A lot of stuff is not
> working yet. And supporting all of the existing Postgres functionality requires
> much more effort (and even more effort is needed to optimize Postgres
> for this architecture).
>
> I just want to receive some feedback and know if the community is interested in
> any further work in this direction.

Looks good. You are right, it is a theory. If your prototype does actually show what we think it does, then it is a good and interesting result.

I think we need careful analysis to show where these exact gains come from. The actual benefit is likely not evenly distributed across the list of possible benefits. Did they arise because you produced a stripped-down version of Postgres? Or did they arise from using threads? It would not be the first time a result shown in a prototype did not translate into real gains in a completed project.

I might also read your results to show that connection concentrators would be a better area of work, since 100 connections perform better than 1000 in both cases, so why bother optimising for 1000 connections at all? In which case we should read the benefit at the 100-connections line, where it shows the lower 28% gain, closer to the gain your colleague reported.

So I think we don't yet have enough to make a decision.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Runtime Partition Pruning
On Wed, Dec 6, 2017 at 1:21 PM, David Rowley wrote:
> On 2 December 2017 at 08:04, Robert Haas wrote:
>> On Fri, Dec 1, 2017 at 6:20 AM, Beena Emerson wrote:
>>> David Q1:
>>> postgres=# explain analyse execute ab_q1 (3,3); --const
>>>                               QUERY PLAN
>>> ----------------------------------------------------------------------
>>>  Append (cost=0.00..43.90 rows=1 width=8) (actual time=0.006..0.006 rows=0 loops=1)
>>>    ->  Seq Scan on ab_a3_b3 (cost=0.00..43.90 rows=1 width=8) (actual time=0.005..0.005 rows=0 loops=1)
>>>          Filter: ((a = 3) AND (b = 3))
>>>  Planning time: 0.588 ms
>>>  Execution time: 0.043 ms
>>> (5 rows)
>>
>> I think the EXPLAIN ANALYZE output should show something attached to
>> the Append node so that we can tell that partition pruning is in use.
>> I'm not sure if that is as simple as "Run-Time Partition Pruning: Yes"
>> or if we can give a few more useful details.
>
> It already does. Any subnode with "(never executed)" was pruned
> at runtime. Do we really need anything else to tell us that?

I have added the partition quals that are used for pruning. PFA the updated patch. I have changed the names of variables to make them more appropriate, along with adding more code comments and doing some refactoring and other code cleanups.

A few cases:

1.
Only runtime pruning - David's case 1

explain analyse execute ab_q1 (2,3);
                              QUERY PLAN
----------------------------------------------------------------------
 Append (cost=0.00..395.10 rows=9 width=8) (actual time=0.101..0.101 rows=0 loops=1)
   Runtime Partition Pruning: ((a = $1) AND (b = $2))
   ->  Seq Scan on ab_a1_b1 (cost=0.00..43.90 rows=1 width=8) (never executed)
         Filter: ((a = $1) AND (b = $2))
   ->  Seq Scan on ab_a1_b2 (cost=0.00..43.90 rows=1 width=8) (never executed)
         Filter: ((a = $1) AND (b = $2))
   ->  Seq Scan on ab_a1_b3 (cost=0.00..43.90 rows=1 width=8) (never executed)
         Filter: ((a = $1) AND (b = $2))
   ->  Seq Scan on ab_a2_b1 (cost=0.00..43.90 rows=1 width=8) (never executed)
         Filter: ((a = $1) AND (b = $2))
   ->  Seq Scan on ab_a2_b2 (cost=0.00..43.90 rows=1 width=8) (never executed)
         Filter: ((a = $1) AND (b = $2))
   ->  Seq Scan on ab_a2_b3 (cost=0.00..43.90 rows=1 width=8) (actual time=0.007..0.007 rows=0 loops=1)
         Filter: ((a = $1) AND (b = $2))
   ->  Seq Scan on ab_a3_b1 (cost=0.00..43.90 rows=1 width=8) (never executed)
         Filter: ((a = $1) AND (b = $2))
   ->  Seq Scan on ab_a3_b2 (cost=0.00..43.90 rows=1 width=8) (never executed)
         Filter: ((a = $1) AND (b = $2))
   ->  Seq Scan on ab_a3_b3 (cost=0.00..43.90 rows=1 width=8) (never executed)
         Filter: ((a = $1) AND (b = $2))
 Planning time: 0.780 ms
 Execution time: 0.220 ms
(22 rows)

2. Runtime pruning after optimizer pruning - David's case 2. ((a >= 4) AND (a <= 5)) is used during optimizer pruning and only (a = $1) is used for runtime pruning.

=# explain (analyse, costs off, summary off) execute ab_q1 (4);
                           QUERY PLAN
---------------------------------------------------------------
 Append (actual time=0.062..0.062 rows=0 loops=1)
   Runtime Partition Pruning: (a = $1)
   ->  Seq Scan on ab_a4 (actual time=0.005..0.005 rows=0 loops=1)
         Filter: ((a >= 4) AND (a <= 5) AND (a = $1))
   ->  Seq Scan on ab_a5 (never executed)
         Filter: ((a >= 4) AND (a <= 5) AND (a = $1))
(6 rows)

3. Nestloop Join

tbl1.col1 only has values from 1 to 10.
=# \d+ tprt
                                  Table "public.tprt"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
 col1   | integer |           |          |         | plain   |              |
 col2   | integer |           |          |         | plain   |              |
Partition key: RANGE (col1)
Partitions: tprt_1 FOR VALUES FROM (1) TO (5001),
            tprt_2 FOR VALUES FROM (5001) TO (10001),
            tprt_3 FOR VALUES FROM (10001) TO (20001)

=# explain (analyse, costs off, summary off) SELECT * FROM tbl1 JOIN tprt ON tbl1.col1 > tprt.col1;
                              QUERY PLAN
----------------------------------------------------------------------
 Nested Loop (actual time=0.053..0.192 rows=45 loops=1)
   ->  Seq Scan on tbl1 (actual time=0.007..0.009 rows=10 loops=1)
   ->  Append (actual time=0.003..0.004 rows=4 loops=10)
         Runtime Partition Pruning Join Filter: (tbl1.col1 > col1)
         ->  Index Scan using tprt1_idx on tprt_1 (actual time=0.002..0.004 rows=5 loops=9)
               Index Cond: (tbl1.col1 > col1)
         ->  Index Scan using
Re: Speeding up pg_upgrade
On Tue, Dec 5, 2017 at 09:30:53AM -0500, Stephen Frost wrote:
> > > The other concern is if there's changes made to the catalogs by non-DDL
> > > activity that needs to be addressed too (logical replication?); nothing
> > > definite springs to mind off-hand for me, but perhaps others will think
> > > of things.
> >
> > Yes, it could extend to many parts of the system, which is why I am
> > asking if it is worth it.
>
> My initial reaction is that it's worth it, but then I also wonder about
> other issues (having to get an ANALYZE done on the new cluster before
> opening it up, for example..) and it makes me wonder if perhaps it'll
> end up being too much risk for too little gain.

Yes, dump/reload of analyze statistics seems like a better use of time. I have avoided it since it locks us into supporting the text representation of data types, but at this point it might be worth it.

-- 
  Bruce Momjian          http://momjian.us
  EnterpriseDB           http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                     Ancient Roman grave inscription +
Re: Add PGDLLIMPORT lines to some variables
On 5 December 2017 at 22:49, Robert Haas wrote:
>
> Committed with these additions. Please check that I haven't messed
> anything up.

Looks good to me. For the record, the commit is

commit c572599c65bfe0387563233faabecd2845073538
Author: Robert Haas
Date:   Tue Dec 5 09:23:57 2017 -0500

    Mark assorted variables PGDLLIMPORT.

    This makes life easier for extension authors who wish to support
    Windows.

    Brian Cloutier, slightly amended by me.

    Discussion: http://postgr.es/m/cajcy68fscdnhmzfps4kyo00cadkvxvea-28h-otenk-pa2o...@mail.gmail.com

plus back branches.

I was going to pipe up here to add ReplicationSlotCtl to the list. Otherwise the only way to access slot information is via the SPI and pg_stat_replication_slots, which isn't super fun. And it's not like ReplicationSlotCtl is any more internal than MyReplicationSlot.

I missed the boat on your commit, but ... please? Patches attached. MyReplicationSlot was only made PGDLLIMPORT in 9.6, so there's one for 9.4/9.5 and one for 9.6, 10, and master. Personally I don't care about 9.4/9.5 in the slightest for this, but that's where c572599c is backpatched to.
-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

From de25c6aff0e07b6acf761a3b939baf2be0fa6389 Mon Sep 17 00:00:00 2001
From: Craig Ringer
Date: Thu, 7 Dec 2017 13:40:13 +0800
Subject: [PATCH v1] Make MyReplicationSlot and ReplicationSlotCtl PGDLLIMPORT

---
 src/include/replication/slot.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index c129a4a..0fee63c 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -136,8 +136,8 @@ typedef struct ReplicationSlotCtlData
 /*
  * Pointers to shared memory
  */
-extern ReplicationSlotCtlData *ReplicationSlotCtl;
-extern ReplicationSlot *MyReplicationSlot;
+extern PGDLLIMPORT ReplicationSlotCtlData *ReplicationSlotCtl;
+extern PGDLLIMPORT ReplicationSlot *MyReplicationSlot;
 
 /* GUCs */
 extern PGDLLIMPORT int max_replication_slots;
-- 
2.9.5

From 9e7331ecb3889cbdccc3f8a3a62ac71f215bb3f6 Mon Sep 17 00:00:00 2001
From: Craig Ringer
Date: Thu, 7 Dec 2017 13:32:11 +0800
Subject: [PATCH v1] Mark ReplicationSlotCtl as PGDLLIMPORT

---
 src/include/replication/slot.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 0c44233..c3b7cc8 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -156,7 +156,7 @@ typedef struct ReplicationSlotCtlData
 /*
  * Pointers to shared memory
  */
-extern ReplicationSlotCtlData *ReplicationSlotCtl;
+extern PGDLLIMPORT ReplicationSlotCtlData *ReplicationSlotCtl;
 extern PGDLLIMPORT ReplicationSlot *MyReplicationSlot;
 
 /* GUCs */
-- 
2.9.5
Re: [HACKERS] INSERT ON CONFLICT and partitioned tables
On 2017/12/02 2:57, Robert Haas wrote:
> On Fri, Dec 1, 2017 at 2:44 AM, Amit Langote wrote:
>> I forgot to consider the fact that mtstate could be NULL in
>> ExecSetupPartitionTupleRouting(), so it would result in dereferencing a
>> NULL pointer when called from CopyFrom(); that is fixed in the attached
>> updated patch.
>
> a ? b : false can more simply be spelled a && b.
>
> Committed after changing it like that, fixing the broken documentation
> build, and making minor edits to the comments and documentation.

Thanks for committing.

Regards,
Amit
Re: Postgres with pthread
On 7 December 2017 at 11:44, Tsunakawa, Takayuki <tsunakawa.ta...@jp.fujitsu.com> wrote:
> From: Craig Ringer [mailto:cr...@2ndquadrant.com]
> > I'd personally expect that an immediate conversion would result in very
> > little speedup, a bunch of code deleted, a bunch of complexity
> > added. And it'd still be massively worthwhile, to keep medium to long
> > term complexity and feature viability in control.
>
> +1
> I hope for things like:
>
> * More performance statistics like system-wide LWLock waits, without the
>   concern about fixed shared memory size
> * Dynamic memory sizing, such as shared_buffers, work_mem,
>   maintenance_work_mem

I'm not sure how threaded operations would help us much there. If we could split shared_buffers into extents we could do this with something like dsm already. Without the ability to split it into extents, we can't do it with locally malloc'd memory in a threaded system either.

Re performance diagnostics, though, you can already get a lot of useful data from PostgreSQL's SDT tracepoints, which are usable with perf and DTrace amongst other tools. Dynamic userspace 'perf' probes can tell you a lot too. I'm confident you could collect some seriously useful data with perf tracepoints and 'perf script' these days. (BTW, I extended the https://wiki.postgresql.org/wiki/Profiling_with_perf article a bit yesterday with some tips on this.)

Of course better built-in diagnostics would be nice. But I really don't see how it'd have much to do with the threaded vs forked model of execution; we can allocate chunks of memory with dsm now, after all.

> * Running multi-threaded components in postgres extensions (is it really
>   safe to run a JVM for PL/Java in a single-threaded postgres?)

PL/Java is a giant mess for so many more reasons than that. The JVM is a heavyweight-startup, lightweight-thread-model system. It doesn't play at all well with postgres's lightweight, process-fork()-based CoW model.
You can't fork() the JVM because fork() doesn't play nice with threads, at all. So you have to start it in each backend individually, which is just awful.

One of the nice things, if Pg got a threaded model, would be that you could embed a JVM, Mono/.NET runtime, etc. and have your sessions work together in ways you cannot currently sensibly do. Folks using MS SQL, Oracle, etc. are pretty used to being able to do this, and while it should be done with caution it can offer huge benefits for some complex workloads.

Right now, if a PostgreSQL user wants to do anything involving IPC, shared data, etc., we pretty much have to write quite complex C extensions to do it.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: Postgres with pthread
On 7 December 2017 at 05:58, Thomas Munro wrote:
>
> Using a ton of thread-local variables may be a useful stepping stone,
> but if we want to be able to separate threads/processes from sessions
> eventually then I guess we'll want to model sessions as first-class
> objects and pass them around explicitly or using a single TLS variable
> current_session.

Yep. This is the real reason I'm excited by the idea of a threading conversion. PostgreSQL's architecture conflates "connection", "session" and "executor" into one somewhat muddled mess. I'd love to be able to untangle that to the point where we can pool executors amongst active queries, while retaining idle sessions' state properly even while they're in a transaction.

Yeah, that's a long way off, but it'd be a whole lot more practical if we didn't have to serialize and deserialize the entire session state to do it.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: Postgres with pthread
On 7 December 2017 at 01:17, Andres Freund wrote:
> > I think you've done us a very substantial service by pursuing
> > this far enough to get some quantifiable performance results.
> > But now that we have some results in hand, I think we're best
> > off sticking with the architecture we've got.
>
> I don't agree.
>
> I'd personally expect that an immediate conversion would result in very
> little speedup, a bunch of code deleted, a bunch of complexity
> added. And it'd still be massively worthwhile, to keep medium to long
> term complexity and feature viability in control.

Personally I think it's a pity we didn't land up here before the foundations for parallel query went in - DSM, shm_mq, DSA, etc. I know the EDB folks at least looked into it though, and presumably there were good reasons to go in this direction. Maybe that was just "the community will never accept a threaded conversion" at the time, though. Now we have quite a lot of homebrew infrastructure to consider if we do a conversion.

That said, it might in some ways make things easier. shm_mq, for example, would likely convert to a threaded backend with minimal changes to callers, and probably only limited changes to shm_mq itself. So maybe these abstractions will prove to have been a win in some ways. Except DSA, and even then it could serve as a transitional API...

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
On Thu, Dec 7, 2017 at 1:21 AM, Alvaro Herrera wrote:
> Put together, I propose the attached delta for 0001.

I have been looking at Andres' 0001 and your tweaks here for some time since yesterday... I have also performed sanity checks using all the scripts that have accumulated in my archives for this stuff. This looks solid to me. I have not seen failures with broken hot chains, REINDEX, etc.

> This way, an xmax that has exactly the OldestXmin value will return
> RECENTLY_DEAD rather than DEAD, which seems reasonable to me (since the
> OldestXmin value itself is supposed to be still possibly visible to
> somebody). Also, this way it is consistent with the other comparison to
> OldestXmin at the bottom of the function. There is no reason for the
> "else" or the extra braces.

+1. It would be nice to add a comment in the patched portion mentioning that the new code had better match what is at the bottom of the function.

+   else if (!MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple), false))
+   {
+       /*
+        * Not in Progress, Not Committed, so either Aborted or crashed.
+        * Mark the Xmax as invalid.
+        */
+       SetHintBits(tuple, buffer, HEAP_XMAX_INVALID, InvalidTransactionId);
    }

-   /*
-    * Not in Progress, Not Committed, so either Aborted or crashed.
-    * Remove the Xmax.
-    */
-   SetHintBits(tuple, buffer, HEAP_XMAX_INVALID, InvalidTransactionId);

    return HEAPTUPLE_LIVE;

I would find it cleaner if the last "else if" were put into its own separate if condition, and if the comment mentioned that, for a multixact still running, this refers to an updating transaction that aborted, so hint bits are not set.

> Your commit message does a poor job of acknowledging prior work on
> diagnosing the problem starting from Dan's initial test case and patch.

(Nit: I have extracted from Dan's test case an isolation test, which Andres has reduced to a subset of permutations. Peter G. also complained about what is visibly the same bug we are discussing here, but without a test case.)

-- 
Michael
Re: pgsql: When VACUUM or ANALYZE skips a concurrently dropped table, log i
On Wed, Dec 6, 2017 at 4:31 PM, Tom Lane wrote:
> Well, I think it's a minute per query, not per whole test script. But in
> any case, if it's taking a longer time than any other isolation test on
> the CLOBBER_CACHE_ALWAYS critters, then it's also taking a proportionately
> longer time than any other test on every other platform, and is therefore
> costing every developer precious time today and indefinitely far into the
> future. I continue to say that this test ain't worth it.

Sure. But you also continue not to respond to my arguments about why it IS worth it. I don't want to spend a lot of time fighting about this, but it looks to me like your preferences here are purely arbitrary. Yesterday, you added - without discussion - a test that I had "obviously" left out by accident. Today, you want a test removed that I added on purpose but which you assert has insufficient value. So, sometimes you think it's worth adding tests that make the test suite longer, and other times you think it isn't. That's fair enough -- everyone comes down in different places on this at different times -- but the only actual reason you've offered is that the script contains a command that runs for over a minute on very slow machines that have been artificially slowed down 100x. That's a silly reason: it means that on real machines we're talking less than a second of runtime even without modifying the test case, and if we do modify the test, it can probably be made much less.

Please give me a little time to see if I can speed up this test enough to fix this problem. If that doesn't work out, then we can rip this out.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: compress method for spgist - 2
On 06.12.2017 21:49, Alexander Korotkov wrote:
> On Wed, Dec 6, 2017 at 6:08 PM, Nikita Glukhov wrote:
>> On 05.12.2017 23:59, Alexander Korotkov wrote:
>>> On Tue, Dec 5, 2017 at 1:14 PM, Darafei Praliaskouski wrote:
>>>> The following review has been posted through the commitfest application:
>>>> make installcheck-world: not tested
>>>> Implements feature: not tested
>>>> Spec compliant: not tested
>>>> Documentation: tested, passed
>>>>
>>>> I've read the updated patch and see my concerns addressed. I'm looking
>>>> forward to SP-GiST compress method support, as it will allow usage of
>>>> the SP-GiST index infrastructure for PostGIS.
>>>>
>>>> The new status of this patch is: Ready for Committer
>>>
>>> I went through this patch. I found the documentation changes to be
>>> insufficient, and I've made some improvements. In particular, I didn't
>>> understand why reconstructedValue is claimed to be of
>>> spgConfigOut.leafType while it should be of spgConfigIn.attType, both
>>> from general logic and from the code. I've fixed that. Nikita, correct
>>> me if I'm wrong.
>>
>> I think we are reconstructing a leaf datum, so the documentation was
>> correct, but the code in spgWalk() and freeScanStackEntry() wrongly used
>> attType instead of attLeafType. Fixed patch is attached.
>
> The reconstructed datum is used for index-only scan. Thus, it should be
> the original indexed datum of attType, unless we have a decompress method
> and pass the reconstructed datum through it. But if we have a compress
> method and do not have a decompress method, then index-only scan seems to
> be impossible.
>
> Also, I wonder whether we should check for the existence of the compress
> method when attType and leafType are not the same in the spgvalidate()
> function? We don't do this for now.

I've added a compress method existence check to spgvalidate().

> It would be nice to avoid double calling of the config method. A possible
> option could be to memorize the difference between attribute type and
> leaf type in the high bit of functionset, which is guaranteed to be free.

I decided to simply set the compress method's bit in functionset when this method is not required.
-- 
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/spgist.sgml b/doc/src/sgml/spgist.sgml
index 139c8ed..b4a8be4 100644
--- a/doc/src/sgml/spgist.sgml
+++ b/doc/src/sgml/spgist.sgml
@@ -240,20 +240,22 @@
    There are five user-defined methods that an index operator class for
-   SP-GiST must provide. All five follow the convention
-   of accepting two internal arguments, the first of which is a
-   pointer to a C struct containing input values for the support method,
-   while the second argument is a pointer to a C struct where output values
-   must be placed. Four of the methods just return void, since
-   all their results appear in the output struct; but
+   SP-GiST must provide, and one is optional. All five
+   mandatory methods follow the convention of accepting two internal
+   arguments, the first of which is a pointer to a C struct containing input
+   values for the support method, while the second argument is a pointer to a
+   C struct where output values must be placed. Four of the mandatory methods
+   just return void, since all their results appear in the output struct; but
    leaf_consistent additionally returns a boolean result.
    The methods must not modify any fields of their input structs. In all
    cases, the output struct is initialized to zeroes before calling the
-   user-defined method.
+   user-defined method. The optional sixth method compress
+   accepts the datum to be indexed as the only argument and returns a value
+   suitable for physical storage in a leaf tuple.
-   The five user-defined methods are:
+   The five mandatory user-defined methods are:

@@ -283,6 +285,7 @@
 typedef struct spgConfigOut
 {
     Oid         prefixType;     /* Data type of inner-tuple prefixes */
     Oid         labelType;      /* Data type of inner-tuple node labels */
+    Oid         leafType;       /* Data type of leaf-tuple values */
     bool        canReturnData;  /* Opclass can reconstruct original data */
     bool        longValuesOK;   /* Opclass can cope with values > 1 page */
 } spgConfigOut;

@@ -305,6 +308,22 @@
    class is capable of segmenting long values by repeated suffixing
    (see ).

+   leafType is typically the same as
+   attType. For reasons of backward
+   compatibility, method config can
+   leave leafType uninitialized; that would
+   give the same effect as setting leafType equal
+   to attType. When attType
+   and leafType are different, then the optional
+   method compress must be provided.
+   Method compress is responsible
+   for transformation of datums to
Logical replication without a Primary Key
Hackers,

In the docs it says:

"If the table does not have any suitable key, then it can be set to replica identity "full", which means the entire row becomes the key."

How does that work? Is it using one of the hidden columns on a row?

Thanks,

JD

-- 
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc
PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.org
***** Unless otherwise stated, opinions are my own. *****
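For reference, a minimal sketch of the feature being asked about (table and publication names here are made up for illustration). REPLICA IDENTITY FULL does not use a hidden column; it causes the old values of all columns to be written to WAL for UPDATE/DELETE, so the subscriber matches rows by comparing every column:

```sql
-- Hypothetical table with no primary key or unique index.
CREATE TABLE t_no_pk (a int, b text);

-- Use the entire row as the replica identity.
ALTER TABLE t_no_pk REPLICA IDENTITY FULL;

-- Publish it; UPDATEs and DELETEs now log the full old row,
-- which the subscriber uses to locate the matching row.
CREATE PUBLICATION pub_no_pk FOR TABLE t_no_pk;
```

This is a sketch of usage only; it requires a running server with logical replication configured, and whether it answers the semantics being asked about is addressed later in the thread.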
Re: es_query_dsa is broken
On Wed, Dec 6, 2017 at 10:16 PM, Amit Kapila wrote:
> On Wed, Dec 6, 2017 at 1:14 AM, Robert Haas wrote:
>> On Tue, Dec 5, 2017 at 7:24 AM, Amit Kapila wrote:
>>> + EState *estate = gatherstate->ps.state;
>>> +
>>> + /* Install our DSA area while executing the plan. */
>>> + estate->es_query_dsa = gatherstate->pei->area;
>>>   outerTupleSlot = ExecProcNode(outerPlan);
>>> + estate->es_query_dsa = NULL;
>>>
>>> Won't the above coding pattern create a problem if ExecProcNode
>>> throws an error and an outer block catches it and continues execution
>>> (consider the case of execution inside PL blocks)?
>>
>> I don't see what the problem is. The query that got aborted by the
>> error wouldn't be sharing an EState with one that didn't.
>
> That's right. Ignore my comment, I got confused. Other than that, I
> don't see any problem with the code as such, apart from the fact that it
> looks slightly hacky. I think Thomas or someone needs to develop a patch
> on the lines you have mentioned, or what Thomas was trying to describe in
> his email, and see how it comes out.

Yeah, it is a bit hacky, but I can't see another way to fix it without changing released APIs, and it's only for one release and will certainly be unhackified in v11. For v11 I think we need to decide between:

1. Removing es_query_dsa and injecting the right context into the executor tree as discussed.

2. Another idea mentioned by Robert in an off-list chat: we could consolidate all DSM segments in a multi-gather plan into one. See the nearby thread where someone had over a hundred Gather nodes and had to crank up max_connections to reserve enough DSM slots. Of course, optimising for that case doesn't make too much sense (I suspect multi-gather plans will become less likely with Parallel Append and Parallel Hash in the picture anyway), but it would reduce a bunch of duplicated work we do when it happens, as well as scarce slot consumption.
If we did that, then all of a sudden es_query_dsa would make sense again (i.e. it'd be whole-query scoped), and we could revert that hacky change. Or we could do both things anyway.

-- 
Thomas Munro
http://www.enterprisedb.com
Re: [HACKERS] Transaction control in procedures
On Wed, Dec 6, 2017 at 8:41 AM, Peter Eisentraut wrote:
> On 12/5/17 13:33, Robert Haas wrote:
>> On Tue, Dec 5, 2017 at 1:25 PM, Peter Eisentraut wrote:
>>> I think ROLLBACK in a cursor loop might not make sense, because the
>>> cursor query itself could have side effects, so a rollback would have to
>>> roll back the entire loop. That might need more refined analysis before
>>> it could be allowed.
>>
>> COMMIT really has the same problem; if the cursor query has side
>> effects, you can't commit those side effects piecemeal as the loop
>> executes and have things behave sanely.
>
> The first COMMIT inside the loop would commit the cursor query. This
> isn't all that different from what you'd get now if you coded this
> manually using holdable cursors or just plain client code. Clearly, you
> can create a mess if the loop body interacts with the loop expression,
> but that's already the case.
>
> But if you coded something like this yourself now and ran a ROLLBACK
> inside the loop, the holdable cursor would disappear (unless previously
> committed), so you couldn't proceed with the loop.
>
> The SQL standard for persistent stored modules explicitly prohibits
> COMMIT and ROLLBACK in cursor loop bodies. But I think people will
> eventually want it.

They may want it, but silently promoting all cursors to held ones is not the way to give it to them, unless we narrow it down to the 'for-loop derived cursor' only. Even then we should consider syntax decoration:

FOR x IN SELECT WITH HOLD
LOOP
END LOOP;

merlin
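A sketch of the pattern being debated (the procedure and table here are hypothetical, and whether the cursor survives the COMMIT is exactly what is under discussion; this is not committed syntax or behavior):

```sql
-- Hypothetical: COMMIT inside a FOR loop over a query in a procedure.
-- The first COMMIT would implicitly have to persist the loop's cursor,
-- i.e. the "silent promotion to a held cursor" objected to above.
CREATE PROCEDURE process_items()
LANGUAGE plpgsql
AS $$
DECLARE
    r record;
BEGIN
    FOR r IN SELECT id FROM items ORDER BY id LOOP
        UPDATE items SET processed = true WHERE id = r.id;
        COMMIT;  -- commit each row's work; loop cursor must survive this
    END LOOP;
END;
$$;
```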
Re: Postgres with pthread
On Thu, Dec 7, 2017 at 6:08 AM, Andres Freund wrote:
> On 2017-12-06 19:40:00 +0300, Konstantin Knizhnik wrote:
>> As far as I remember, several years ago when the implementation of
>> intra-query parallelism was just started, there was a discussion whether
>> to use threads or keep the traditional Postgres process architecture.
>> The decision was made to keep processes. So now we have bgworkers, the
>> shared message queue, DSM, ...
>> The main argument for that decision was that switching to threads would
>> require rewriting most of the Postgres code.
>>
>> It seemed to be quite a reasonable argument, and until now I agreed with
>> it. But recently I wanted to check it myself.
>
> I think that's something pretty important to play with. There've been
> several discussions lately, both on and off list / in person, that we're
> taking on more-and-more technical debt just because we're using
> processes. Besides the above, we've grown:
> - a shared memory allocator
> - a shared memory hashtable
> - weird looking thread aware pointers
> - significant added complexity in various projects due to addresses not
>   being mapped to the same address etc.

Yes, those are all workarounds for an ancient temporary design choice. To quote from a 1989 paper[1]: "Currently, POSTGRES runs as one process for each active user. This was done as an expedient to get a system operational as quickly as possible. We plan on converting POSTGRES to use lightweight processes [...]". +1 for sticking to the plan.

While personally contributing to the technical debt items listed above, I always imagined that all that machinery could become compile-time options controlled with --with-threads, and dsa_get_address() would melt away leaving only raw pointers, and dsa_area would forward to the MemoryContext + ResourceOwner APIs, or something like that. It's unfortunate that we lose type safety along the way though. (If only there were some way we could write dsa_pointer<T>.
In fact it was also a goal of the original project to adopt C++, based on a comment in 4.2's nodes.h: "Eventually this code should be transmogrified into C++ classes, and this is more or less compatible with those things.")

If there were a good way to reserve (but not map) a large address range before forking, there could also be an intermediate build mode that keeps the multi-process model but where DSA behaves as above, which might be an interesting way to decouple the DSA-go-faster-and-reduce-tech-debt project from the threading project. We could manage the reserved address space ourselves and map DSM segments with MAP_FIXED, so dsa_get_address() address decoding could be compiled away. One way would be to mmap a huge range backed with /dev/zero, then map-with-MAP_FIXED segments over the top of it, and then remap /dev/zero back into place when finished; but that sucks because it gives you that whole mapping in your core files and relies on overcommit, which we don't like -- hence my interest in a way to reserve but not map.

>> The first problem with porting Postgres to pthreads is the static
>> variables widely used in the Postgres code.
>> Most modern compilers support thread-local variables; for example, GCC
>> provides the __thread keyword.
>> Such variables are placed in a separate segment which is addressed
>> through a segment register (on Intel).
>> So access time to such variables is the same as to normal static
>> variables.
>
> I experimented similarly. Although I'm not 100% sure that if we were to
> go for it, we wouldn't instead want to abstract our session concept
> further, or well, at all.

Using a ton of thread-local variables may be a useful stepping stone, but if we want to be able to separate threads/processes from sessions eventually then I guess we'll want to model sessions as first-class objects and pass them around explicitly or using a single TLS variable current_session.
> I think the biggest problem with doing this for real is that it's a huge
> project, and that it'll take a long time.
>
> Thanks for working on this!

+1

[1] http://db.cs.berkeley.edu/papers/ERL-M90-34.pdf

-- 
Thomas Munro
http://www.enterprisedb.com
Re: Accessing base table relational names via RelOptInfo
On 7 December 2017 at 09:26, Walter Cai wrote:
> I want to be able to programmatically access the relation names from
> inside the calc_joinrel_size_estimate method (in costsize.c) using the
> RelOptInfo *outer_rel, RelOptInfo *inner_rel arguments. I'm pretty sure I
> need to use the Relids relids variables in the RelOptInfo struct but I'm not
> sure where to go from there.

If you loop through the RelOptInfo->relids with bms_next_member() and look up the RangeTblEntry from root->simple_rte_array[] you can get the "relid", which is the Oid of the relation. You can then call get_rel_name() on that Oid. You'll also need to ensure the relid actually belongs to a relation; for this, check that the RangeTblEntry->rtekind is RTE_RELATION. You might also want to think about schema names, since two relations can share the same name in different schemas.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
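Put together, the recipe above looks roughly like this in C. This is an untested sketch that only compiles inside the PostgreSQL source tree, and the wrapper function name is mine:

```c
#include "postgres.h"

#include "nodes/relation.h"
#include "utils/lsyscache.h"

/* Hypothetical helper: log the name of every base relation in rel->relids. */
static void
log_base_rel_names(PlannerInfo *root, RelOptInfo *rel)
{
	int			x = -1;

	while ((x = bms_next_member(rel->relids, x)) >= 0)
	{
		RangeTblEntry *rte = root->simple_rte_array[x];

		/* Only plain relations carry a valid relation Oid */
		if (rte != NULL && rte->rtekind == RTE_RELATION)
			elog(LOG, "base relation: %s.%s",
				 get_namespace_name(get_rel_namespace(rte->relid)),
				 get_rel_name(rte->relid));
	}
}
```

Calling it on both outer_rel and inner_rel from calc_joinrel_size_estimate() would print the schema-qualified names of the base tables on each side of the join.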
Re: pgsql: When VACUUM or ANALYZE skips a concurrently dropped table, log i
On Wed, Dec 6, 2017 at 12:57 PM, Tom Lane wrote:
> What appears to be happening is that a database-wide ANALYZE takes more
> than a minute under CLOBBER_CACHE_ALWAYS, causing isolationtester.c's
> hardwired one-minute timeout to trigger.
>
> While you could imagine doing something to get around that, I do not
> believe that this test is worth memorializing in perpetuity to begin
> with. I'd recommend just taking it out again.

Mumble. I don't really mind that, but I'll bet $0.05 that this will get broken at some point and we won't notice right away without the isolation test. Is it really our policy that no isolation test can take more than a minute on the slowest buildfarm critter? If somebody decides to start running CLOBBER_CACHE_ALWAYS on an even-slower critter, will we just nuke isolation tests from orbit until the tests pass there? I have difficulty seeing that as a sound approach.

Another thought is that it might not be necessary to have a database-wide ANALYZE to trigger this. I managed to reproduce it locally by doing VACUUM a, b while alternately locking a and b, so that I let the name lookups complete, but then blocked trying to vacuum a, and then at that point dropped b, then released the VACUUM.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Accessing base table relational names via RelOptInfo
Hi, I hope this is the appropriate list to send this to.

*Context:* I (grad student) am trying to insert my own cardinality estimates into the optimizer.

*What I need to do to get there:* I want to be able to programmatically access the relation names from inside the calc_joinrel_size_estimate method (in costsize.c) using the RelOptInfo *outer_rel, RelOptInfo *inner_rel arguments. I'm pretty sure I need to use the Relids relids variables in the RelOptInfo struct but I'm not sure where to go from there.

Any help would be much appreciated.

Best,
Walter
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Andres Freund wrote:

> I've played around quite some with the attached patch. So far, after
> applying the second patch, neither VACUUM nor VACUUM FULL / CLUSTER make
> the situation worse for already existing corruption. HOT pruning can
> change the exact appearance of existing corruption a bit, but I don't
> think it can make the corruption meaningfully worse. It's a bit
> annoying and scary to add so many checks to backbranches but it kinda
> seems required. The error message texts aren't perfect, but these are
> "should never be hit" type elog()s so I'm not too worried about that.

Looking at 0002: I agree with the stuff being done here. I think a couple of these checks could be moved one block outward in terms of scope; I don't see any reason why the check should not apply in that case. I didn't catch any place missing additional checks.

Despite these being "shouldn't happen" conditions, I think we should turn these up all the way to ereports with an errcode and all, and also report the XIDs being complained about. No translation required, though. Other than those changes and minor copy editing (commit attached), 0002 looks good to me.

I started thinking it'd be good to report the block number whenever anything happened while scanning the relation. The best way to go about this seems to be to add an errcontext callback to lazy_scan_heap, so I attach a WIP untested patch to add that. (I'm not proposing this for back-patch for now, mostly because I don't have the time/energy to push for it right now.)

It appears that you got all the places that seem to reasonably need additional checks.
--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

>From 9ac665638c86460f0b767628203f8bf28df35e49 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera
Date: Wed, 6 Dec 2017 16:44:25 -0300
Subject: [PATCH 1/2] tweaks for 0002

---
 src/backend/access/heap/heapam.c  | 67 +++
 src/backend/commands/vacuumlazy.c |  6 ++--
 2 files changed, 50 insertions(+), 23 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index eb93718baa..5c284a4c32 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6387,17 +6387,23 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 	else if (MultiXactIdPrecedes(multi, cutoff_multi))
 	{
 		if (MultiXactIdPrecedes(multi, relminmxid))
-			elog(ERROR, "encountered multixact from before horizon");
+			ereport(ERROR,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg_internal("found multixact %u from before relminmxid %u",
+									 multi, relminmxid)));

 		/*
-		 * This old multi cannot possibly have members still running.  If it
-		 * was a locker only, it can be removed without any further
-		 * consideration; but if it contained an update, we might need to
-		 * preserve it.
+		 * This old multi cannot possibly have members still running, but
+		 * verify just in case.  If it was a locker only, it can be removed
+		 * without any further consideration; but if it contained an update, we
+		 * might need to preserve it.
 		 */
 		if (MultiXactIdIsRunning(multi, HEAP_XMAX_IS_LOCKED_ONLY(t_infomask)))
-			elog(ERROR, "multixact from before cutoff found to be still running");
+			ereport(ERROR,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg_internal("multixact %u from before cutoff %u found to be still running",
+									 multi, cutoff_multi)));

 		if (HEAP_XMAX_IS_LOCKED_ONLY(t_infomask))
 		{
@@ -6419,9 +6425,14 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 			if (TransactionIdPrecedes(xid, cutoff_xid))
 			{
 				if (TransactionIdPrecedes(xid, relfrozenxid))
-					elog(ERROR, "encountered xid from before horizon");
+					ereport(ERROR,
+							(errcode(ERRCODE_DATA_CORRUPTED),
+							 errmsg_internal("found xid %u from before relfrozenxid %u",
+											 xid, relfrozenxid)));
 				if
Re: compress method for spgist - 2
On Wed, Dec 6, 2017 at 6:08 PM, Nikita Glukhov wrote:
> On 05.12.2017 23:59, Alexander Korotkov wrote:
>
> On Tue, Dec 5, 2017 at 1:14 PM, Darafei Praliaskouski wrote:
>
>> The following review has been posted through the commitfest application:
>> make installcheck-world: not tested
>> Implements feature: not tested
>> Spec compliant: not tested
>> Documentation: tested, passed
>>
>> I've read the updated patch and see my concerns addressed.
>>
>> I'm looking forward to SP-GiST compress method support, as it will allow
>> usage of SP-GiST index infrastructure for PostGIS.
>>
>> The new status of this patch is: Ready for Committer
>
> I went through this patch. I found the documentation changes to be
> insufficient, and I've made some improvements.
>
> In particular, I didn't understand why reconstructedValue is claimed to be
> of spgConfigOut.leafType while it should be of spgConfigIn.attType, both
> from general logic and code. I've fixed that. Nikita, correct me if I'm
> wrong.
>
> I think we are reconstructing a leaf datum, so the documentation was correct
> but the code in spgWalk() and freeScanStackEntry() wrongly used attType
> instead of attLeafType. Fixed patch is attached.

The reconstructed datum is used for index-only scans. Thus, it should be the original indexed datum of attType, unless we have a decompress method and pass the reconstructed datum through it.

> Also, I wonder: should we check for existence of the compress method when
> attType and leafType are not the same in the spgvalidate() function? We
> don't do this for now.
>
> I've added a compress method existence check to spgvalidate().

It would be nice to avoid calling the config method twice. A possible option could be to memorize the difference between the attribute type and the leaf type in the high bit of functionset, which is guaranteed to be free.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: pow support for pgbench
Raúl Marín Rodríguez wrote:
> I don't want to go too deep into it, but you get stuff like this:
>
> Select pow(2.0, -3)::text = pow(2, -3)::text;
>  ?column?
> ----------
>  f

Indeed, to me, that has turned out to be the most intriguing part of the whole thread. Needs to be in some SQL subtleties exam somewhere:

select pow(2.0,-3), pow(2,-3);
        pow         |  pow
--------------------+-------
 0.1250000000000000 | 0.125

Looks like the first call resolves to the numeric version, while the second (with integer arguments) resolves to the double one:

select pow(2.0,-3) is of (numeric), pow(2,-3) is of (double precision);
 ?column? | ?column?
----------+----------
 t        | t

Still, a numeric 0.125 doesn't always have those trailing zeros:

select pow(2.0,-3), pow(2,-3)::numeric;
        pow         |  pow
--------------------+-------
 0.1250000000000000 | 0.125

What's going on in the representation?

select numeric_send(pow(2.0,-3)), numeric_send(pow(2,-3)::numeric);
  numeric_send  |  numeric_send
----------------+----------------
 \x0001001004e2 | \x0001000304e2

I assume the 10 vs. 03 hex in the tails of those things represent either 'precision' or 'scale' of 16 vs. 3? I don't get much help from IS OF (numeric(p,s)), which seems to ignore any p,s and just be true for any numeric. But here, this matches:

select numeric_send(0.125::numeric(16,16));
  numeric_send
----------------
 \x0001001004e2

How does numeric_power choose the precision and scale of its result? Is that something the standard dictates?

Given that 0.125 is exact for this answer, at first I wanted to ask if numeric_power could be made to produce the result with precision 3, but then I realized that's backwards. A result with precision 3 would be like saying, eh, it's somewhere between 0.1245 and 0.1255. If a result is known to be exact, it would be better to go the other way and return it as numeric(huge).

That then led me to wonder if the cast float8_numeric is really doing the right thing. Is it turning 0.125 (an exact representation as float8) into numeric(3,3), again hedging as if it might be anything from 0.1245 to 0.1255?
Would it be better for float8_numeric to produce a numeric with the precision/scale reflecting the actual limits of float8?

Ok, now I've been driven to UTSL. It looks as if the intent of the snprintf(..., "%.*g", DBL_DIG, val) in float8_numeric could have been to accomplish that. It doesn't, though, as (at least on my platform) %g drops trailing zeros, though there is a documented 'alternate form' flag # that prevents that. It works in bash:

bash-4.2$ printf '%.*g\n' 15 0.125
0.125
bash-4.2$ printf '%#.*g\n' 15 0.125
0.125000000000000

Does the standard prescribe how cast(float8 as numeric) ought to select the precision/scale?

Sorry to drift OT, as this is more about the SQL functions than pgbench, but it was too puzzling to ignore. :)

-Chap
Re: Usage of epoch in txid_current
Hi,

On 2017-12-06 17:39:09 +0530, Amit Kapila wrote:
> We are using ShmemVariableCache->nextXid at many places so always
> converting/truncating it to 32-bit number before using seems slightly
> awkward, so we can think of using a separate nextBigXid 64bit number
> as well.

-many. It's not actually that many places, and a lot of them should be updated anyway, to be epoch aware. Let's not add this kind of crumminess to avoid a few truncating casts here and there.

> Either way, it is not clear to me how we will keep it
> updated after recovery. Right now, the mechanism is quite simple: at
> the beginning of a recovery we take the value of nextXid from the
> checkpoint record and then, if any xlog record indicates an xid that
> follows nextXid, we advance it. Here, the point to note is that we
> take the xid from the WAL record (which means that it assumes xids are
> non-contiguous or some xids are consumed without being logged) and
> increment it. Unless we plan to change something in that logic (like
> storing 64-bit xids in WAL records), it is not clear to me how to make
> it work. OTOH, recovering the value of the epoch, which increments only
> at wraparound, seems fairly straightforward as described in my previous
> email.

I think it should be fairly simple if we simply add the 64-bit xid to the existing clog extension WAL records.

Greetings,

Andres Freund
Re: Postgres with pthread
Hi, On 2017-12-06 12:28:29 -0500, Robert Haas wrote: > > Possibly we even want to continue having various > > processes around besides that, the most interesting cases involving > > threads are around intra-query parallelism, and pooling, and for both a > > hybrid model could be beneficial. > > I think if we only use threads for intra-query parallelism we're > leaving a lot of money on the table. For example, if all > shmem-connected backends are using the same process, then we can make > max_locks_per_transaction PGC_SIGHUP. That would be sweet, and there > are probably plenty of similar things. Moreover, if threads are this > thing that we only use now and then for parallel query, then our > support for them will probably have bugs. If we use them all the > time, we'll actually find the bugs and fix them. I hope. I think it'd make a lot of sense to go there gradually. I agree that we probably want to move to more and more use of threads, but we also want our users not to kill us ;). Initially we'd surely continue to use partitioned dynahash for locks, which'd make resizing infeasible anyway. Similar for shared buffers (which I find a hell of a lot more interesting to change at runtime than max_locks_per_transaction), etc... - Andres
Re: Postgres with pthread
On 12/06/2017 06:08 PM, Andres Freund wrote:
> I think the biggest problem with doing this for real is that it's a huge
> project, and that it'll take a long time.

An additional issue is that this could break a lot of extensions, in a way that is not apparent at compile time. This means we may need to break all extensions to force extension authors to check whether they are thread safe. I do not like making life hard for our extension community, but if the gains are big enough it might be worth it.

> Thanks for working on this!

Seconded.

Andreas
Re: Postgres with pthread
> "barely a 50% speedup" - Hah. I don't believe the numbers, but that'd be > huge. They are numbers derived from a benchmark that any sane person would be using a connection pool for in a production environment, but impressive if true none the less.
Re: [HACKERS] Walsender timeouts and large transactions
On 05/12/17 21:07, Robert Haas wrote:
> On Mon, Dec 4, 2017 at 10:59 PM, Craig Ringer wrote:
>> To me it looks like it's time to get this committed, marking as such.
>
> This version has noticeably more code rearrangement than before, and
> I'm not sure that is actually buying us anything. Why not keep the
> changes minimal?

Yeah, we moved things around in the end; the main reason is that it now works closer to how it was originally designed to work, meaning that the slow path is not so slow when !pq_is_send_pending(), which seems to have been the reasoning for the original coding. It's not completely necessary for fixing the bug, but why make things slower than they need to be.

> Also, TBH, this doesn't seem to have been carefully reviewed for style:
>
> -if (!pq_is_send_pending())
> -return;
> +/* Try taking fast path unless we get too close to walsender timeout. */
> +if (now < TimestampTzPlusMilliseconds(last_reply_timestamp,
> + wal_sender_timeout / 2))
> +{
> +if (!pq_is_send_pending())
> +return;
> +}
>
> Generally we write if (a && b) { ... } not if (a) { if (b) .. }

It's rather ugly with && because one of the conditions spans two lines, but okay, here you go. I am keeping the brackets even though we normally don't for one-liners, because it's completely unreadable without them IMHO.

> -}
> +};

Oops.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From 61a23cad58a0016876e3d6c829f6353ee491b4c7 Mon Sep 17 00:00:00 2001
From: Petr Jelinek
Date: Tue, 12 Sep 2017 17:31:28 +0900
Subject: [PATCH] Fix walsender timeouts when decoding large transaction

The logical slots have a fast code path for sending data in order to not
impose too high a per-message overhead. The fast path skips checks for
interrupts and timeouts. However, the existing coding failed to consider
the fact that a transaction with a large number of changes may take a very
long time to be processed and sent to the client.
This causes walsender to ignore interrupts for a potentially long time and,
more importantly, it will cause walsender to be killed due to timeout at
the end of such a transaction.

This commit changes the fast path to also check for interrupts and only
allows taking the fast path when the last keepalive check happened less
than half of the walsender timeout ago; otherwise the slower code path
will be taken.
---
 src/backend/replication/walsender.c | 66 +
 1 file changed, 37 insertions(+), 29 deletions(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index fa1db748b5..e015870a4e 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1151,6 +1151,8 @@ static void
 WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,
 				bool last_write)
 {
+	TimestampTz now;
+
 	/* output previously gathered data in a CopyData packet */
 	pq_putmessage_noblock('d', ctx->out->data, ctx->out->len);
@@ -1160,23 +1162,54 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,
 	 * several releases by streaming physical replication.
 	 */
 	resetStringInfo(&tmpbuf);
-	pq_sendint64(&tmpbuf, GetCurrentTimestamp());
+	now = GetCurrentTimestamp();
+	pq_sendint64(&tmpbuf, now);
 	memcpy(&ctx->out->data[1 + sizeof(int64) + sizeof(int64)],
 		   tmpbuf.data, sizeof(int64));

-	/* fast path */
+	CHECK_FOR_INTERRUPTS();
+
+	/* Try to flush pending output to the client */
 	if (pq_flush_if_writable() != 0)
 		WalSndShutdown();

-	if (!pq_is_send_pending())
-		return;
+	/* Try taking fast path unless we get too close to walsender timeout.
 */
+	if (now < TimestampTzPlusMilliseconds(last_reply_timestamp,
+										  wal_sender_timeout / 2) &&
+		!pq_is_send_pending())
+	{
+		return;
+	}

+	/* If we have pending write here, go to slow path */
 	for (;;)
 	{
 		int			wakeEvents;
 		long		sleeptime;
-		TimestampTz now;
+
+		/* Check for input from the client */
+		ProcessRepliesIfAny();
+
+		now = GetCurrentTimestamp();
+
+		/* die if timeout was reached */
+		WalSndCheckTimeOut(now);
+
+		/* Send keepalive if the time has come */
+		WalSndKeepaliveIfNecessary(now);
+
+		if (!pq_is_send_pending())
+			break;
+
+		sleeptime = WalSndComputeSleeptime(now);
+
+		wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH |
+			WL_SOCKET_WRITEABLE | WL_SOCKET_READABLE | WL_TIMEOUT;
+
+		/* Sleep until something happens or we time out */
+		WaitLatchOrSocket(MyLatch, wakeEvents,
+						  MyProcPort->sock, sleeptime,
+						  WAIT_EVENT_WAL_SENDER_WRITE_DATA);

 		/*
 		 * Emergency bailout if postmaster has died. This is to avoid the
@@ -1198,34 +1231,9 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,
 			SyncRepInitConfig();
 		}

-		/* Check for input from the client */
-
Re: Postgres with pthread
Hi,

On 2017-12-06 11:53:21 -0500, Tom Lane wrote:
> Konstantin Knizhnik writes:
> However, if I guess at which numbers are supposed to be what,
> it looks like even the best case is barely a 50% speedup.

"barely a 50% speedup" - Hah. I don't believe the numbers, but that'd be huge.

> That would be worth pursuing if it were reasonably low-hanging
> fruit, but converting PG to threads seems very far from being that.

I don't think immediate performance gains are the interesting part about using threads. It's rather that their absence adds a lot of complexity to existing / submitted code, and makes some very commonly requested features a lot harder to implement:

- we've a lot of duplicated infrastructure around dynamic shared memory: dsm.c, dsa.c, dshash.c etc. A lot of these, especially dsa.c, are going to become a lot more complicated over time; just look at how complicated good multi-threaded allocators are.

- we're adding a lot of slowness to parallelism, just because we have different memory layouts in different processes. Instead of just passing pointers through queues, we put entire tuples in there. We deal with dsm-aware pointers.

- a lot of features have been a lot harder (parallelism!), and a lot of frequently requested ones are so hard due to processes that they never got off the ground (in-core pooling, process reuse, parallel worker reuse).

- due to the statically sized shared memory, a lot of our configuration is pretty fundamentally PGC_POSTMASTER, even though that presents a lot of administrative problems.

...

> I think you've done us a very substantial service by pursuing
> this far enough to get some quantifiable performance results.
> But now that we have some results in hand, I think we're best
> off sticking with the architecture we've got.

I don't agree. I'd personally expect that an immediate conversion would result in very little speedup, a bunch of code deleted, a bunch of complexity added.
And it'd still be massively worthwhile, to keep medium to long term complexity and feature viability in control. Greetings, Andres Freund
Re: Postgres with pthread
On Wed, Dec 6, 2017 at 11:53 AM, Tom Lane wrote:
> barely a 50% speedup.

I think that's an awfully strange choice of adverb. This is, by its author's own admission, a rough cut at this, probably with very little of the optimization that could ultimately be done, and it's already buying 50% on some test cases? That sounds phenomenally good to me. A 50% speedup is huge, and chances are that it can be made quite a bit better with more work, or that it already is quite a bit better with the right test case.

TBH, based on previous discussion, I expected this to initially be *slower* but still worthwhile in the long run because of optimizations that it would let us do eventually with parallel query and other things. If it's this much faster out of the gate, that's really exciting.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Postgres with pthread
Hi!

On 2017-12-06 19:40:00 +0300, Konstantin Knizhnik wrote:
> As far as I remember, several years ago when implementation of intra-query
> parallelism was just started there was discussion whether to use threads or
> leave traditional Postgres process architecture. The decision was made to
> leave processes. So now we have bgworkers, shared message queue, DSM, ...
> The main argument for such decision was that switching to threads would
> require rewriting most of the Postgres code.
> It seemed to be a quite reasonable argument, and until now I agreed with it.
>
> But recently I wanted to check it myself.

I think that's something pretty important to play with. There've been several discussions lately, both on and off list / in person, that we're taking on more and more technical debt just because we're using processes. Besides the above, we've grown:
- a shared memory allocator
- a shared memory hashtable
- weird looking thread aware pointers
- significant added complexity in various projects due to addresses not being mapped to the same address etc.

> The first problem with porting Postgres to pthreads is static variables
> widely used in Postgres code.
> Most modern compilers support thread-local variables; for example GCC
> provides the __thread keyword.
> Such variables are placed in a separate segment which is addressed through
> a segment register (on Intel).
> So access time to such variables is the same as to normal static variables.

I experimented similarly. Although I'm not 100% sure that if we were to go for it, we wouldn't instead want to abstract our session concept further, or well, at all.

> Certainly maybe not all compilers have builtin support for TLS, and maybe
> not on all hardware platforms are they implemented as efficiently as on
> Intel.
> So certainly such approach decreases portability of Postgres. But IMHO it
> is not so critical.

I'd agree there, but I don't think the project necessarily does.

> What I have done:
> 1.
Add session_local (defined as __thread) to the definition of most static
> and global variables.
> I left some variables pointing to shared memory as static. Also I had to
> change initialization of some static variables,
> because the address of a TLS variable can not be used in static initializers.
> 2. Change implementation of GUCs to make them thread specific.
> 3. Replace fork() with pthread_create
> 4. Rewrite the file descriptor cache to be global (shared by all threads).

That one I'm very unconvinced of; that's going to add a ton of new contention.

> What are the advantages of using threads instead of processes?
>
> 1. No need to use shared memory. So there is no static limit for the amount
> of memory which can be used by Postgres. No need for distributed shared
> memory and other stuff designed to share memory between backends and
> bgworkers.

This imo is the biggest part. We can stop duplicating OS and our own implementations in a shmem-aware way.

> 2. Threads significantly simplify implementation of parallel algorithms:
> interaction and transferring data between threads can be done easily and
> more efficiently.

That's imo the same as 1.

> 3. It is possible to use more efficient/lightweight synchronization
> primitives. Postgres now mostly relies on its own low-level sync primitives,
> whose user-level implementation
> uses spinlocks and atomics and then falls back to OS semaphores/poll. I am
> not sure how much gain we can get by replacing these primitives with ones
> optimized for threads.
> My colleague from the Firebird community told me that just replacing
> processes with threads can obtain a 20% increase in performance, but it is
> just the first step, and replacing sync primitives
> can give a much greater advantage. But maybe for Postgres with its
> low-level primitives it is not true.

I don't believe that that's actually the case to any significant degree.

> 6. Faster backend startup. Certainly starting a backend at each user's
> request is a bad thing in any case.
Some kind of connection pooling should be used in
> any case to provide acceptable performance. But in any case, the start of a
> new backend process in postgres causes a lot of page faults which have a
> dramatic impact on performance. And there is no such problem with threads.

I don't buy this in itself. The connection establishment overhead isn't largely the fork, it's all the work afterwards. I do think it makes connection pooling etc easier.

> I just want to receive some feedback and know if the community is interested
> in any further work in this direction.

I personally am. I think it's beyond high time that we move to take advantage of threads.

That said, I don't think just replacing processes with threads is the right thing. I'm pretty sure we'd still want to have postmaster as a separate process, for robustness. Possibly we even want to continue having various processes around besides that; the most interesting cases involving threads are around intra-query parallelism, and pooling, and for both a hybrid
Re: [HACKERS] parallel.c oblivion of worker-startup failures
On Wed, Dec 6, 2017 at 9:54 AM, Tom Lane wrote:
> Maybe track "worker is known to have launched" in the leader's state
> someplace?

That's probably one piece of it, yes.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: views
Andrzej Barszcz writes:
> I would be happy when checkRuleResultList in rewriteDefine.c lost so strict
> conditions for returning clause. Are there any reasons to have such
> restriction for views ?

Uh, so that the results of a query that invokes the rule are well-defined? If you think that for your application, it's okay for the RETURNING rule to not bother providing useful data for some columns, you could just have it return nulls for those. But I don't think it's appropriate for the system itself to make a value judgement like that. If the columns don't match up, it could very well be user error, and we should report that.

regards, tom lane
Re: Postgres with pthread
Here it is formatted a little better:

Connections   Vanilla/default   Vanilla/prepared   pthreads/default   pthreads/prepared
10            100               191                106                207
100           67                131                105                168
1000          41                65                 55                 102

So a little over 50% performance improvement for a couple of the test cases.

On Wed, Dec 6, 2017 at 11:53 AM, Tom Lane wrote:
> Konstantin Knizhnik writes:
> > Below are some results (1000xTPS) of select-only (-S) pgbench with scale
> > 100 at my desktop with quad-core i7-4770 3.40GHz and 16Gb of RAM:
> > Connections Vanilla/default Vanilla/prepared
> > pthreads/default pthreads/prepared
> > 10 100 191 106 207
> > 100 67 131 105 168
> > 1000 41 65 55 102
>
> This table is so mangled that I'm not very sure what it's saying.
> Maybe you should have made it an attachment?
>
> However, if I guess at which numbers are supposed to be what,
> it looks like even the best case is barely a 50% speedup.
> That would be worth pursuing if it were reasonably low-hanging
> fruit, but converting PG to threads seems very far from being that.
>
> I think you've done us a very substantial service by pursuing
> this far enough to get some quantifiable performance results.
> But now that we have some results in hand, I think we're best
> off sticking with the architecture we've got.
>
> regards, tom lane
views
Dear hackers,

I would be happy if checkRuleResultList in rewriteDefine.c lost its strict conditions for the RETURNING clause. Are there any reasons to have such a restriction for views ?

What I mean, in short:

create view ... as select MAIN_TABLE ... joins ...

1. Updating the view means updating MAIN_TABLE
2. I want to return after insert at least the primary key

Neither the rule system nor an INSTEAD OF trigger gives me such a possibility, but I know that it's possible after modification of checkRuleResultList.
Postgres with pthread
Hi hackers,

As far as I remember, several years ago when implementation of intra-query parallelism was just started, there was discussion whether to use threads or leave the traditional Postgres process architecture. The decision was made to leave processes. So now we have bgworkers, shared message queue, DSM, ... The main argument for such a decision was that switching to threads would require rewriting most of the Postgres code. It seemed to be a quite reasonable argument, and until now I agreed with it.

But recently I wanted to check it myself.

The first problem with porting Postgres to pthreads is the static variables widely used in Postgres code. Most modern compilers support thread-local variables; for example GCC provides the __thread keyword. Such variables are placed in a separate segment which is addressed through a segment register (on Intel). So access time to such variables is the same as to normal static variables. Certainly, maybe not all compilers have builtin support for TLS, and maybe not on all hardware platforms are they implemented as efficiently as on Intel. So such an approach certainly decreases the portability of Postgres, but IMHO it is not so critical.

What I have done:
1. Add session_local (defined as __thread) to the definition of most static and global variables. I left some variables pointing to shared memory as static. Also I had to change initialization of some static variables, because the address of a TLS variable can not be used in static initializers.
2. Change implementation of GUCs to make them thread specific.
3. Replace fork() with pthread_create.
4. Rewrite the file descriptor cache to be global (shared by all threads).

I have not changed the Postgres synchronization primitives and shared memory. It took me about one week of work.

What is not done yet:
1. Handling of signals (I expect that Win32 code can be somehow reused here).
2. Deallocation of memory and closing files on backend (thread) termination.
3.
Interaction of the postmaster and backends with the PostgreSQL auxiliary processes (threads), such as autovacuum, bgwriter, checkpointer, stats collector, ...

What are the advantages of using threads instead of processes?
1. No need to use shared memory, so there is no static limit on the amount of memory which can be used by Postgres. No need for dynamic shared memory and the other stuff designed to share memory between backends and bgworkers.
2. Threads significantly simplify the implementation of parallel algorithms: interaction and transfer of data between threads can be done easily and more efficiently.
3. It is possible to use more efficient/lightweight synchronization primitives. Postgres now mostly relies on its own low-level sync primitives, whose user-level implementation uses spinlocks and atomics and then falls back to OS semaphores/poll. I am not sure how much gain we can get by replacing these primitives with ones optimized for threads. My colleague from the Firebird community told me that just replacing processes with threads gave a 20% increase in performance, but that is just the first step, and replacing the sync primitives can give a much greater advantage. But maybe for Postgres, with its low-level primitives, that is not true.
4. Threads are more lightweight entities than processes. A context switch between threads takes less time than between processes, and threads consume less memory. It is usually possible to spawn more threads than processes.
5. More efficient access to virtual memory. Since all threads share the same address space, the TLB is used much more efficiently in this case.
6. Faster backend startup. Certainly, starting a backend for each user request is a bad thing in any case; some kind of connection pooling should be used to provide acceptable performance. But in any case, the start of a new backend process in Postgres causes a lot of page faults, which have a dramatic impact on performance, and there is no such problem with threads.
Certainly, processes also have some advantages compared with threads:
1. Better isolation and error protection
2. Easier error handling
3. Easier control of used resources

But this is theory. The main idea of this prototype was to prove or disprove these expectations in practice. I didn't expect large differences in performance, because the synchronization primitives are not changed and I performed my experiments on Linux, where threads and processes are implemented in a similar way. Below are some results (in thousands of TPS) of a select-only (-S) pgbench with scale 100 on my desktop with a quad-core i7-4770 3.40GHz and 16Gb of RAM:

Connections   Vanilla/default   Vanilla/prepared   pthreads/default   pthreads/prepared
10            100               191                106                207
100           67                131                105                168
1000          41                65                 55                 102

As you can see, for a small number of connections results
Re: ALTER TABLE ADD COLUMN fast default
On 12/06/2017 11:05 AM, Tom Lane wrote:
> Andrew Dunstan writes:
>> On 12/06/2017 10:16 AM, Tom Lane wrote:
>>> I'm quite disturbed both by the sheer invasiveness of this patch
>>> (eg, changing the API of heap_attisnull()) and by the amount of logic
>>> it adds to fundamental tuple-deconstruction code paths. I think
>>> we'll need to ask some hard questions about whether the feature gained
>>> is worth the distributed performance loss that will ensue.
>> I'm sure we can reduce the footprint some.
>> w.r.t. heap_attisnull, only a handful of call sites actually need the
>> new functionality, 8 or possibly 10 (I need to check more on those in
>> relcache.c). So we could instead of changing the function provide a new
>> one to be used at those sites. I'll work on that and submit a new patch
>> fairly shortly.
> Meh, I'm not at all sure that's an improvement. A version of
> heap_attisnull that ignores the possibility of a "missing" attribute value
> will give the *wrong answer* if the other version should have been used
> instead. Furthermore it'd be an attractive nuisance because people would
> fail to notice if an existing call were now unsafe. If we're to do this
> I think we have no choice but to impose that compatibility break on
> third-party code.

Fair point.

> In general, you need to be thinking about this in terms of making sure
> that it's impossible to accidentally not update code that needs to be
> updated. Another example is that I noticed that some places the patch
> has s/CreateTupleDescCopy/CreateTupleDescCopyConstr/, evidently because
> the former will fail to copy the missing-attribute support data.
> I think that's an astonishingly bad idea. There is basically no situation
> in which CreateTupleDescCopy defined in that way will ever be safe to use.
> The missing-attribute info is now a fundamental part of a tupdesc and it
> has to be copied always, just as much as e.g. atttypid.
> I would opine that it's a mistake to design TupleDesc as though the
> missing-attribute data had anything to do with default values. It may
> have originated from the same place but it's a distinct thing. Better
> to store it in a separate sub-structure.

OK, will work on all that. Thanks again.

cheers

andrew

-- 
Andrew Dunstan  https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
I think you've done a stellar job of identifying what the actual problem was. I like the new (simpler) coding of that portion of HeapTupleSatisfiesVacuum.

freeze-the-dead is not listed in isolation_schedule; an easy fix. I confirm that the test crashes with an assertion failure without the code fix, and that it doesn't with it.

I think the comparison to OldestXmin should be reversed:

    if (!TransactionIdPrecedes(xmax, OldestXmin))
        return HEAPTUPLE_RECENTLY_DEAD;

    return HEAPTUPLE_DEAD;

This way, an xmax that has exactly the OldestXmin value will return RECENTLY_DEAD rather than DEAD, which seems reasonable to me (since the OldestXmin value itself is supposed to be still possibly visible to somebody). Also, this way it is consistent with the other comparison to OldestXmin at the bottom of the function. There is no reason for the "else" or the extra braces.

Put together, I propose the attached delta for 0001.

Your commit message does a poor job of acknowledging prior work on diagnosing the problem starting from Dan's initial test case and patch.

I haven't looked at your 0002 yet.

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index 8be2980116..aab03835d1 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -1324,25 +1324,23 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	else if (TransactionIdDidCommit(xmax))
 	{
 		/*
-		 * The multixact might still be running due to lockers. If the
-		 * updater is below the horizon we have to return DEAD regardless
-		 * - otherwise we could end up with a tuple where the updater has
-		 * to be removed due to the horizon, but is not pruned away. It's
-		 * not a problem to prune that tuple because all the lockers will
-		 * also be present in the newer tuple version.
+		 * The multixact might still be running due to lockers. If the
+		 * updater is below the xid horizon, we have to return DEAD
+		 * regardless -- otherwise we could end up with a tuple where the
+		 * updater has to be removed due to the horizon, but is not pruned
+		 * away. It's not a problem to prune that tuple, because any
+		 * remaining lockers will also be present in newer tuple versions.
 		 */
-		if (TransactionIdPrecedes(xmax, OldestXmin))
-		{
-			return HEAPTUPLE_DEAD;
-		}
-		else
+		if (!TransactionIdPrecedes(xmax, OldestXmin))
 			return HEAPTUPLE_RECENTLY_DEAD;
+
+		return HEAPTUPLE_DEAD;
 	}
 	else if (!MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple), false))
 	{
 		/*
 		 * Not in Progress, Not Committed, so either Aborted or crashed.
-		 * Remove the Xmax.
+		 * Mark the Xmax as invalid.
 		 */
 		SetHintBits(tuple, buffer, HEAP_XMAX_INVALID, InvalidTransactionId);
 	}
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index e41b9164cd..eb566ebb6c 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -44,6 +44,7 @@ test: update-locked-tuple
 test: propagate-lock-delete
 test: tuplelock-conflict
 test: tuplelock-update
+test: freeze-the-dead
 test: nowait
 test: nowait-2
 test: nowait-3
Re: ALTER TABLE ADD COLUMN fast default
Andrew Dunstan writes:
> On 12/06/2017 10:16 AM, Tom Lane wrote:
>> I'm quite disturbed both by the sheer invasiveness of this patch
>> (eg, changing the API of heap_attisnull()) and by the amount of logic
>> it adds to fundamental tuple-deconstruction code paths. I think
>> we'll need to ask some hard questions about whether the feature gained
>> is worth the distributed performance loss that will ensue.
> I'm sure we can reduce the footprint some.
> w.r.t. heap_attisnull, only a handful of call sites actually need the
> new functionality, 8 or possibly 10 (I need to check more on those in
> relcache.c). So we could instead of changing the function provide a new
> one to be used at those sites. I'll work on that and submit a new patch
> fairly shortly.

Meh, I'm not at all sure that's an improvement. A version of heap_attisnull that ignores the possibility of a "missing" attribute value will give the *wrong answer* if the other version should have been used instead. Furthermore it'd be an attractive nuisance because people would fail to notice if an existing call were now unsafe. If we're to do this I think we have no choice but to impose that compatibility break on third-party code.

In general, you need to be thinking about this in terms of making sure that it's impossible to accidentally not update code that needs to be updated. Another example is that I noticed that some places the patch has s/CreateTupleDescCopy/CreateTupleDescCopyConstr/, evidently because the former will fail to copy the missing-attribute support data. I think that's an astonishingly bad idea. There is basically no situation in which CreateTupleDescCopy defined in that way will ever be safe to use. The missing-attribute info is now a fundamental part of a tupdesc and it has to be copied always, just as much as e.g. atttypid.

I would opine that it's a mistake to design TupleDesc as though the missing-attribute data had anything to do with default values.
It may have originated from the same place but it's a distinct thing. Better to store it in a separate sub-structure. regards, tom lane
Re: ALTER TABLE ADD COLUMN fast default
On 12/06/2017 10:16 AM, Tom Lane wrote:
> Andrew Dunstan writes:
>> Attached is a patch for $subject. It's based on Serge Reilau's patch of
>> about a year ago, taking into account comments made at the time, with
>> bitrot removed and other enhancements such as documentation.
>> Essentially it stores a value in pg_attribute that is used when the
>> stored tuple is missing the attribute. This works unless the default
>> expression is volatile, in which case a table rewrite is forced as
>> happens now.
> I'm quite disturbed both by the sheer invasiveness of this patch
> (eg, changing the API of heap_attisnull()) and by the amount of logic
> it adds to fundamental tuple-deconstruction code paths. I think
> we'll need to ask some hard questions about whether the feature gained
> is worth the distributed performance loss that will ensue.

Thanks for the prompt comments.

I'm sure we can reduce the footprint some.

w.r.t. heap_attisnull, only a handful of call sites actually need the new functionality, 8 or possibly 10 (I need to check more on those in relcache.c). So we could instead of changing the function provide a new one to be used at those sites. I'll work on that and submit a new patch fairly shortly.

cheers

andrew

-- 
Andrew Dunstan  https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: ALTER TABLE ADD COLUMN fast default
Andrew Dunstan writes:
> Attached is a patch for $subject. It's based on Serge Reilau's patch of
> about a year ago, taking into account comments made at the time, with
> bitrot removed and other enhancements such as documentation.
> Essentially it stores a value in pg_attribute that is used when the
> stored tuple is missing the attribute. This works unless the default
> expression is volatile, in which case a table rewrite is forced as
> happens now.

I'm quite disturbed both by the sheer invasiveness of this patch (eg, changing the API of heap_attisnull()) and by the amount of logic it adds to fundamental tuple-deconstruction code paths. I think we'll need to ask some hard questions about whether the feature gained is worth the distributed performance loss that will ensue.

regards, tom lane
Re: compress method for spgist - 2
On 05.12.2017 23:59, Alexander Korotkov wrote:

On Tue, Dec 5, 2017 at 1:14 PM, Darafei Praliaskouski wrote:

The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: not tested
Spec compliant: not tested
Documentation: tested, passed

I've read the updated patch and see my concerns addressed. I'm looking forward to SP-GiST compress method support, as it will allow usage of the SP-GiST index infrastructure for PostGIS.

The new status of this patch is: Ready for Committer

I went through this patch. I found the documentation changes to be insufficient, and I've made some improvements. In particular, I didn't understand why reconstructedValue is claimed to be of spgConfigOut.leafType while it should be of spgConfigIn.attType, both from general logic and code. I've fixed that. Nikita, correct me if I'm wrong.

I think we are reconstructing a leaf datum, so the documentation was correct, but the code in spgWalk() and freeScanStackEntry() wrongly used attType instead of attLeafType. A fixed patch is attached.

Also, I wonder whether we should check for existence of the compress method when attType and leafType are not the same in the spgvalidate() function? We don't do this for now.

I've added a compress method existence check to spgvalidate().

0002-spgist-polygon-8.patch is OK for me so soon.

-- 
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/spgist.sgml b/doc/src/sgml/spgist.sgml index 139c8ed..b4a8be4 100644 --- a/doc/src/sgml/spgist.sgml +++ b/doc/src/sgml/spgist.sgml @@ -240,20 +240,22 @@ There are five user-defined methods that an index operator class for - SP-GiST must provide. All five follow the convention - of accepting two internal arguments, the first of which is a - pointer to a C struct containing input values for the support method, - while the second argument is a pointer to a C struct where output values - must be placed.
Four of the methods just return void, since - all their results appear in the output struct; but + SP-GiST must provide, and one is optional. All five + mandatory methods follow the convention of accepting two internal + arguments, the first of which is a pointer to a C struct containing input + values for the support method, while the second argument is a pointer to a + C struct where output values must be placed. Four of the mandatory methods just + return void, since all their results appear in the output struct; but leaf_consistent additionally returns a boolean result. The methods must not modify any fields of their input structs. In all cases, the output struct is initialized to zeroes before calling the - user-defined method. + user-defined method. Optional sixth method compress + accepts datum to be indexed as the only argument and returns value suitable + for physical storage in leaf tuple. - The five user-defined methods are: + The five mandatory user-defined methods are: @@ -283,6 +285,7 @@ typedef struct spgConfigOut { Oid prefixType; /* Data type of inner-tuple prefixes */ Oid labelType; /* Data type of inner-tuple node labels */ +Oid leafType; /* Data type of leaf-tuple values */ boolcanReturnData; /* Opclass can reconstruct original data */ boollongValuesOK; /* Opclass can cope with values 1 page */ } spgConfigOut; @@ -305,6 +308,22 @@ typedef struct spgConfigOut class is capable of segmenting long values by repeated suffixing (see ). + + + leafType is typically the same as + attType. For the reasons of backward + compatibility, method config can + leave leafType uninitialized; that would + give the same effect as setting leafType equal + to attType. When attType + and leafType are different, then optional + method compress must be provided. + Method compress is responsible + for transformation of datums to be indexed from attType + to leafType. + Note: both consistent functions will get scankeys + unchanged, without transformation using compress. 
+ @@ -380,10 +399,16 @@ typedef struct spgChooseOut } spgChooseOut; - datum is the original datum that was to be inserted - into the index. - leafDatum is initially the same as - datum, but can change at lower levels of the tree + datum is the original datum of + spgConfigIn.attType + type that was to be inserted into the index. + leafDatum is a value of + spgConfigOut.leafType + type which is initially an result of method + compress applied to datum + when method compress is provided, or same value as + datum otherwise. + leafDatum
Re: [HACKERS] Custom compression methods
On Fri, 1 Dec 2017 21:47:43 +0100 Tomas Vondra wrote:
>
> +1 to do the rewrite, just like for other similar ALTER TABLE commands

Ok. What about the following syntax:

ALTER COLUMN DROP COMPRESSION - removes compression from the column with the rewrite and removes the related compression options, so the user can drop the compression method.

ALTER COLUMN SET COMPRESSION NONE - for the cases when the users want to just disable compression for future tuples. After that they can keep compressed tuples, or in the case when they have a large table they can decompress tuples partially using e.g. UPDATE, and then use ALTER COLUMN DROP COMPRESSION, which will be much faster then.

ALTER COLUMN SET COMPRESSION WITH will change compression for new tuples but will not touch old ones. If the users want recompression, they can use a DROP/SET COMPRESSION combination.

I don't think that SET COMPRESSION with the rewrite of the whole table will be useful on any somewhat big table, and at the same time big tables are where users need compression the most. I understand that ALTER with the rewrite sounds logical and is much easier to implement (and it doesn't require Oids in tuples), but it could be unusable.

-- 
Regards,
Ildus Kurbangaliev
Re: [HACKERS] Transaction control in procedures
On 12/5/17 13:33, Robert Haas wrote:
> On Tue, Dec 5, 2017 at 1:25 PM, Peter Eisentraut wrote:
>> I think ROLLBACK in a cursor loop might not make sense, because the
>> cursor query itself could have side effects, so a rollback would have to
>> roll back the entire loop. That might need more refined analysis before
>> it could be allowed.
>
> COMMIT really has the same problem; if the cursor query has side
> effects, you can't commit those side effects piecemeal as the loop
> executed and have things behave sanely.

The first COMMIT inside the loop would commit the cursor query. This isn't all that different from what you'd get now if you coded this manually using holdable cursors or just plain client code. Clearly, you can create a mess if the loop body interacts with the loop expression, but that's already the case.

But if you coded something like this yourself now and ran a ROLLBACK inside the loop, the holdable cursor would disappear (unless previously committed), so you couldn't proceed with the loop.

The SQL standard for persistent stored modules explicitly prohibits COMMIT and ROLLBACK in cursor loop bodies. But I think people will eventually want it.

>>> - COMMIT or ROLLBACK inside a procedure with a SET clause attached,
>>
>> That also needs to be prohibited because of the way the GUC nesting
>> currently works. It's probably possible to fix it, but it would be a
>> separate effort.
>>
>>> and/or while SET LOCAL is in effect either at the inner or outer
>>> level.
>>
>> That seems to work fine.
>
> These two are related -- if you don't permit anything that makes
> temporary changes to GUCs at all, like SET clauses attached to
> functions, then SET LOCAL won't cause any problems. The problem is if
> you do a transaction operation when something set locally is in the
> stack of values, but not at the top.

Yes, that's exactly the problem. So right now I'm just preventing the problematic scenario.
To fix that, one would possibly have to replace the stack with something not quite a stack.

New patch attached.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From 18aaf292fbb22647e09c38cc21b56ff98643e518 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Wed, 6 Dec 2017 09:29:25 -0500
Subject: [PATCH v4] Transaction control in PL procedures

In each of the supplied procedural languages (PL/pgSQL, PL/Perl, PL/Python, PL/Tcl), add language-specific commit and rollback functions/commands to control transactions in procedures in that language. Add similar underlying functions to SPI. Some additional cleanup so that transaction commit or abort doesn't blow away data structures still used by the procedure call. Add execution context tracking to CALL and DO statements so that transaction control commands can only be issued in top-level procedure and block calls, not function calls or other procedure or block calls.
---
 doc/src/sgml/plperl.sgml                          |  49 +
 doc/src/sgml/plpgsql.sgml                         |  91
 doc/src/sgml/plpython.sgml                        |  38
 doc/src/sgml/pltcl.sgml                           |  41
 doc/src/sgml/ref/call.sgml                        |   7 +
 doc/src/sgml/ref/create_procedure.sgml            |   7 +
 doc/src/sgml/ref/do.sgml                          |   7 +
 doc/src/sgml/spi.sgml                             | 219
 src/backend/commands/functioncmds.c               |  45 +++-
 src/backend/executor/spi.c                        | 112 --
 src/backend/tcop/utility.c                        |   6 +-
 src/backend/utils/mmgr/portalmem.c                |  58 +++---
 src/include/commands/defrem.h                     |   4 +-
 src/include/executor/spi.h                        |   5 +
 src/include/executor/spi_priv.h                   |   4 +
 src/include/nodes/nodes.h                         |   3 +-
 src/include/nodes/parsenodes.h                    |   7 +
 src/include/utils/portal.h                        |   1 +
 src/pl/plperl/GNUmakefile                         |   2 +-
 src/pl/plperl/SPI.xs                              |  12 ++
 src/pl/plperl/expected/plperl_transaction.out     |  94 +
 src/pl/plperl/plperl.c                            |   6 +
 src/pl/plperl/sql/plperl_transaction.sql          |  88
 src/pl/plpgsql/src/pl_exec.c                      |  66 +-
 src/pl/plpgsql/src/pl_funcs.c                     |  44
 src/pl/plpgsql/src/pl_gram.y                      |  34 +++
 src/pl/plpgsql/src/pl_handler.c                   |   8 +
 src/pl/plpgsql/src/pl_scanner.c                   |   2 +
 src/pl/plpgsql/src/plpgsql.h                      |  22 +-
 src/pl/plpython/Makefile                          |   1 +
 src/pl/plpython/expected/plpython_test.out        |   4 +-
 src/pl/plpython/expected/plpython_transaction.out | 104
ALTER TABLE ADD COLUMN fast default
Attached is a patch for $subject. It's based on Serge Reilau's patch of about a year ago, taking into account comments made at the time. with bitrot removed and other enhancements such as documentation. Essentially it stores a value in pg_attribute that is used when the stored tuple is missing the attribute. This works unless the default expression is volatile, in which case a table rewrite is forced as happens now. Comments welcome. cheers andrew -- Andrew Dunstanhttps://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 3f02202..d5dc14a 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -1150,6 +1150,18 @@ + atthasmissing + bool + + +This column has a value which is used where the column is entirely +missing from the row, as happens when a column is added after the +row is created. The actual value used is stored in the +attmissingval column. + + + + attidentity char @@ -1229,6 +1241,18 @@ + + attmissingval + bytea + + +This column has a binary representation of the value used when the column +is entirely missing from the row, as happens when the column is added after +the row is created. The value is only used when +atthasmissing is true. + + + diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml index 7bcf242..780bead 100644 --- a/doc/src/sgml/ref/alter_table.sgml +++ b/doc/src/sgml/ref/alter_table.sgml @@ -1115,26 +1115,28 @@ ALTER TABLE [ IF EXISTS ] name -When a column is added with ADD COLUMN, all existing -rows in the table are initialized with the column's default value -(NULL if no DEFAULT clause is specified). -If there is no DEFAULT clause, this is merely a metadata -change and does not require any immediate update of the table's data; -the added NULL values are supplied on readout, instead. 
+ When a column is added with ADD COLUMN and a + non-volatile DEFAULT is specified, the default + is evaluated at the time of the statement and the result stored in the + table's metadata. That value will be used for the column for + all existing rows. If no DEFAULT is specified + NULL is used. In neither case is a rewrite of the table required. -Adding a column with a DEFAULT clause or changing the type of -an existing column will require the entire table and its indexes to be -rewritten. As an exception when changing the type of an existing column, -if the USING clause does not change the column -contents and the old type is either binary coercible to the new type or -an unconstrained domain over the new type, a table rewrite is not needed; -but any indexes on the affected columns must still be rebuilt. Adding or -removing a system oid column also requires rewriting the entire -table. Table and/or index rebuilds may take a significant amount of time -for a large table; and will temporarily require as much as double the disk -space. + Adding a column with a volatile DEFAULT or + changing the type of + an existing column will require the entire table and its indexes to be + rewritten. As an exception when changing the type of an existing column, + if the USING clause does not change the column + contents and the old type is either binary coercible to the new type or + an unconstrained domain over the new type, a table rewrite is not needed; + but any indexes on the affected columns must still be rebuilt. Adding or + removing a system oid column also requires rewriting + the entire table. + Table and/or index rebuilds may take a significant amount of time + for a large table; and will temporarily require as much as double the disk + space. 
diff --git a/src/backend/access/common/heaptuple.c b/src/backend/access/common/heaptuple.c index a1a9d99..8188acf 100644 --- a/src/backend/access/common/heaptuple.c +++ b/src/backend/access/common/heaptuple.c @@ -76,6 +76,105 @@ * */ +/* + * Return the missing value of an attribute, or NULL if it + * there is no missing value. + */ +static Datum +getmissingattr(TupleDesc tupleDesc, + int attnum, bool *isnull) +{ + int missingnum; + Form_pg_attribute att; + AttrDefault *attrdef; + + Assert(attnum <= tupleDesc->natts); + Assert(attnum > 0); + + att = TupleDescAttr(tupleDesc, attnum - 1); + + if (att->atthasmissing) + { + Assert(tupleDesc->constr); + Assert(tupleDesc->constr->num_defval > 0); + Assert(tupleDesc->constr->has_missingval); + + attrdef = tupleDesc->constr->defval; + + /* + * Look through the
Re: [HACKERS] parallel.c oblivion of worker-startup failures
On Wed, Dec 6, 2017 at 1:43 AM, Amit Kapila wrote:
> Looks good to me.

Committed and back-patched to 9.4 where dynamic background workers were introduced.

>> Barring objections, I'll commit and
>> back-patch this; then we can deal with the other part of this
>> afterward.
>
> Sure, I will rebase on top of the commit unless you have some comments
> on the remaining part.

I'm not in love with that part of the fix; the other parts of that if statement are just testing variables, and a function call that takes and releases an LWLock is a lot more expensive. Furthermore, that test will often be hit in practice, because we'll often arrive at this function before workers have actually finished. On top of that, we'll typically arrive here having already communicated with the worker in some way, such as by receiving tuples, which means that in most cases we already know that the worker was alive at least at some point, and therefore the extra test isn't necessary. We only need that test, if I understand correctly, to cover the failure-to-launch case, which is presumably very rare.

All that having been said, I don't quite know how to do it better. It would be easy enough to modify this function so that if it ever received a result other than BGWH_NOT_YET_STARTED for a worker, it wouldn't call this function again for that worker. However, that's only mitigating the problem, not really fixing it. We'll still end up frequently inquiring - admittedly only once - about the status of a worker with which we've previously communicated via the tuple queue. Maybe we just have to live with that problem, but do you (or does anyone else) have an idea?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
On Wed, Dec 6, 2017 at 5:03 PM, Andres Freund wrote:
> Ping. I'm a bit surprised that a bug fixing a significant data
> corruption issue has gotten no reviews at all.

Note that I was planning to look at this problem today and tomorrow my time; I got stuck with CF handling last week and a conference this week. If you think that helps, I'll be happy to help to the extent of what I can do.

-- 
Michael