Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On Fri, Jun 16, 2017 at 5:31 AM, Marina Polyakova wrote:
> And thank you very much for your explanation how and why transactions with
> failures should be retried! I'll try to implement all of it.

To be clear, part of "retrying from the beginning" means that if a
result from one statement is used to determine the content of (or
whether to run) a subsequent statement, that first statement must be
run in the new transaction and the results evaluated again to
determine what to use for the later statement. You can't simply
replay the statements that were run during the first try. For
examples that help give a feel for why that is, see:

https://wiki.postgresql.org/wiki/SSI

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
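The retry discipline described above -- the whole transaction is re-run
from the beginning, so a statement whose result drives a later statement
is executed and re-evaluated again on every attempt -- can be sketched as
a driver-side helper. Everything here (the exception class standing in
for SQLSTATE 40001 and the toy transaction function) is invented for
illustration, not code from the thread:

```python
class SerializationFailure(Exception):
    """Stands in for a serialization failure (SQLSTATE 40001) raised
    by the database driver."""

def run_transaction(txn_fn, max_retries=None):
    """Re-run txn_fn from the beginning until it commits without a
    serialization failure.  max_retries=None mimics the unlimited-retry
    frameworks described in the thread; retries are invisible to the
    caller, who just sees the final result."""
    attempt = 0
    while True:
        attempt += 1
        try:
            # txn_fn issues BEGIN ... COMMIT itself; on each attempt it
            # re-runs EVERY statement, including the ones whose results
            # feed later statements -- nothing is replayed from a log.
            return txn_fn()
        except SerializationFailure:
            if max_retries is not None and attempt > max_retries:
                raise

# Toy demonstration: fails twice, then succeeds.  The "dependent" value
# is recomputed on every attempt rather than replayed.
attempts = []
def txn():
    attempts.append(1)
    first_result = len(attempts)   # imagine: a SELECT that drives an UPDATE
    if first_result < 3:
        raise SerializationFailure
    return first_result

assert run_transaction(txn) == 3
```

With `max_retries` set, the helper gives up after that many retries
beyond the first attempt; with the default, it retries forever, which is
safe only because (as noted above) a serialization failure implies some
*other* transaction committed, so each retry contends with a new set of
transactions.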
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On Thu, Jun 15, 2017 at 4:18 PM, Alvaro Herrera wrote:
> Kevin Grittner wrote:
> As far as I understand her proposal, it is exactly the opposite -- if a
> transaction fails, it is discarded. And this P.S. note is asking
> whether this is a good idea, or would we prefer that failing
> transactions are retried.
>
> I think it's pretty obvious that transactions that failed with
> some serializability problem should be retried.

Agreed all around.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On Thu, Jun 15, 2017 at 2:16 PM, Andres Freund wrote:
> On 2017-06-14 11:48:25 +0300, Marina Polyakova wrote:
>> I suggest a patch where pgbench client sessions are not disconnected because
>> of serialization or deadlock failures and these failures are mentioned in
>> reports.
>
> I think that's a good idea and sorely needed.

+1

>> P.S. Does this use case (do not retry transaction with serialization or
>> deadlock failure) is most interesting or failed transactions should be
>> retried (and how much times if there seems to be no hope of success...)?
>
> I can't quite parse that sentence, could you restate?

The way I read it was that the most interesting solution would retry
a transaction from the beginning on a serialization failure or
deadlock failure. Most people who use serializable transactions (at
least in my experience) run through a framework that does that
automatically, regardless of what client code initiated the
transaction. These retries are generally hidden from the client code
-- it just looks like the transaction took a bit longer.

Sometimes people will have a limit on the number of retries. I never
used such a limit and never had a problem, because our implementation
of serializable transactions will not throw a serialization failure
error until one of the transactions involved in causing it has
successfully committed -- meaning that the retry can only hit this
again on a *new* set of transactions.

Essentially, the transaction should only count toward the TPS rate
when it eventually completes without a serialization failure.

Marina, did I understand you correctly?

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
Re: [HACKERS] GSoC 2017 weekly progress reports (week 2)
On Tue, Jun 13, 2017 at 1:02 PM, Andrew Borodin wrote:
> 2017-06-13 18:00 GMT+05:00 Shubham Barai :
> Good job!

+1! :-)

> So, in current HEAD test predicate_gist_2.spec generate false
> positives, but with your patch, it does not?

Keep in mind that false positives do not break *correctness* of
serializable transactions as long as it involves another transaction.
(It *would* be a bug if a transaction running alone could cause a
serialization failure.) A false *negative* is always a bug. That
said, false positives hurt performance, so we should keep the rate as
low as practicable.

> I'd suggest keeping spec tests with your code in the same branch, it's
> easier.

+1

> Also it worth to clean up specs style and add some words to
> documentation.

+1

> Kevin, all, how do you think, is it necessary to expand these tests
> not only on Index Only Scan, but also on Bitmap Index Scan? And may be
> KNN version of scan too?
> I couldn't find such tests for B-tree, do we have them?

Off-hand, I don't know. It would be interesting to run regression
tests with code coverage and look at the index AMs.

https://www.postgresql.org/docs/9.6/static/regress-coverage.html

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
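For anyone wanting to try the coverage run suggested above, the
documented build targets look roughly like this (a sketch; any extra
configure options for your environment are assumed):

```shell
# Build with coverage instrumentation, run the regression tests, then
# generate an HTML report; drill down into src/backend/access/ to see
# which index-AM paths the tests actually exercise.
./configure --enable-coverage
make
make check
make coverage-html     # report is written under ./coverage/
```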
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table
On Wed, Jun 7, 2017 at 5:00 PM, Peter Geoghegan wrote:
> My assumption about how transition tables ought to behave here is
> based on the simple fact that we already fire both AFTER
> statement-level triggers, plus my sense of aesthetics, or bias. I
> admit that I might be missing the point, but if I am it would be
> useful to know how.

Well, either will work. My inclination is that a single statement
should cause one execution of the FOR EACH STATEMENT trigger, but if
a good case can be made that we should have a FOR EACH STATEMENT
trigger fire for each clause within a statement -- well, it won't be
a problem for matview maintenance. The biggest hurt there would be
to *my* sense of aesthetics. ;-)

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table
On Wed, Jun 7, 2017 at 4:48 PM, Thomas Munro wrote:
> Is there anything about that semantics that is incompatible with the
> incremental matview use case?

Nothing incompatible at all. If we had separate "new" tables for
UPDATE and DELETE we would logically need to do a "counting"-style
UNION of them just like we will want to do with the OLD (with a count
of -1) and NEW (with a count of 1) to get to a delta now. So keeping
them separate would be marginally more work for incremental matview,
but not a big deal.

> For example, are the "counting" and
> "DRed" algorithms you've proposed (for non-recursive and recursive
> views) driven entirely by deletions and insertions, so that updates
> look like deletes followed by inserts anyway?

Counting is best done from a "delta" relation which has old and new
combined with opposite counts. I'm sure that will be fine with DRed,
too.

> If so, I think our
> current treatment of ON CONFLICT DO UPDATE should be fine: you can't
> tell whether the tuples in the new table result from insert or update
> without also consulting the old table, but if the algorithm treats all
> tuples in the old table as deletes (even though in this case they
> result from updates) and all tuples in the new table as inserts (even
> though in this case *some* of them result from updates) anyway then I
> don't think it matters.

I don't think so, either.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
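A minimal sketch of the "delta" relation described above -- old rows
with count -1 combined with new rows at count +1. The transition-table
names and the key column are invented for illustration:

```sql
-- Assuming a statement-level trigger declared with
--   REFERENCING OLD TABLE AS old_rows NEW TABLE AS new_rows
-- the delta for counting-style matview maintenance combines both
-- transition tables with opposite counts:
SELECT k, -1 AS cnt FROM old_rows
UNION ALL
SELECT k, +1 AS cnt FROM new_rows;
```

A row updated in place appears once in each branch, so it nets to zero
for any aggregate that it did not actually change.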
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table
On Tue, Jun 6, 2017 at 4:42 PM, Robert Haas wrote:
> So, are you willing and able to put any effort into this, like say
> reviewing the patch Thomas posted, and if so when and how much? If
> you're just done and you aren't going to put any more work into
> maintaining it (for whatever reasons), then I think you should say so
> straight out. People shouldn't have to guess whether you're going to
> maintain your patch or not.

It has become clear that the scope of the problems being found now
exceeds what I can be sure of being able to fix in time to make for a
stable release, in spite of the heroic efforts Thomas has been
putting in. I had hoped to get this into the first or second CF of
this release, same with the release before, and same with the release
before that. At least landing it in the final CF drew the level of
review and testing needed to polish it, but it's far from ideal
timing (or procedure). I can revert from v10 and deal with all of
this for the first CF of some future release, but if someone feels
they can deal with it in v10 I'll stand back and offer what help I
can.

You mentioned blame earlier. That seems pointless to me. I'm
looking to see what works and what doesn't. When I went to develop
SSI it was because my employer needed it, and they asked what it
would take to get that done; I said one year of my time without being
drawn off for other tasks to get it to a point where they would be
better off using it than not, and then two years of half-time work to
address community concerns, get it committed, and follow up on
problems. They gave me that and it worked, and worked well. In this
case, resources to carry it through were not reserved when I started,
and when I became full-up with other things, problems surfaced. That
doesn't work. I don't want to start something big again until I have
resources set up and reserved, as a priority, to see it through.
It's a matter of what works for both me and the community and what
doesn't.

In the meantime I'll try to keep to small enough issues that the
resources required to support what I do are not beyond what I can
deliver.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table
On Wed, Jun 7, 2017 at 3:42 AM, Thomas Munro wrote:
> On Wed, Jun 7, 2017 at 7:27 PM, Thomas Munro wrote:
>> On Wed, Jun 7, 2017 at 10:47 AM, Peter Geoghegan wrote:
>>> I suppose you'll need two tuplestores for the ON CONFLICT DO UPDATE
>>> case -- one for updated tuples, and the other for inserted tuples.
>>
>> Hmm. Right. INSERT ... ON CONFLICT DO UPDATE causes both AFTER
>> INSERT and AFTER UPDATE statement-level triggers to be fired, but then
>> both kinds see all the tuples -- those that were inserted and those
>> that were updated. That's not right.
>
> Or maybe it is right.

The idea of transition tables is that you see all changes to the
target of a single statement in the form of delta relations -- with
an "old" table for any rows affected by a delete or update and a
"new" table for any rows affected by an insert or update. If we have
a single statement that combines INSERT and UPDATE behaviors, it
might make sense to have an "old" table for updates and a single
"new" table for both.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
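The case under discussion can be illustrated with a sketch; the table,
trigger, and function names here are invented, and the REFERENCING
clause follows the transition-table syntax from the patch:

```sql
-- Both statement-level triggers fire for a single
-- INSERT ... ON CONFLICT DO UPDATE statement.
CREATE TRIGGER t_ins AFTER INSERT ON t
  REFERENCING NEW TABLE AS new_rows
  FOR EACH STATEMENT EXECUTE PROCEDURE log_inserts();

CREATE TRIGGER t_upd AFTER UPDATE ON t
  REFERENCING OLD TABLE AS old_rows NEW TABLE AS new_rows
  FOR EACH STATEMENT EXECUTE PROCEDURE log_updates();

-- One statement, two behaviors: rows with no conflict are inserted,
-- conflicting rows are updated -- so which tuples should each
-- trigger's transition tables contain?
INSERT INTO t (id, v) VALUES (1, 'x')
  ON CONFLICT (id) DO UPDATE SET v = excluded.v;
```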
Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions
On Sun, Jun 4, 2017 at 11:27 AM, Mengxing Liu wrote:
> "vmstat 1" output is as follow. Because I used only 30 cores (1/4 of all),
> cpu user time should be about 12*4 = 48.
> There seems to be no process blocked by IO.
>
> procs -----------memory---------- ---swap-- -----io---- --system-- ------cpu-----
>  r  b   swpd   free      buff   cache     si so bi bo    in     cs  us sy id wa st
> 28  0   0 981177024 315036 7084376000  0   900     1      0  99  0  0
> 21  1   0 981178176 315036 7084378400  0  0 25482 329020 12  3 85  0  0
> 18  1   0 981179200 315036 7084379200  0  0 26569 323596 12  3 85  0  0
> 17  0   0 981175424 315036 7084380800  0  0 25374 322992 12  4 85  0  0
> 12  0   0 981174208 315036 7084382400  0  0 24775 321577 12  3 85  0  0
>  8  0   0 981179328 315036 7084533600  0  0 13115 199020  6  2 92  0  0
> 13  0   0 981179200 315036 7084579200  0  0 22893 301373 11  3 87  0  0
> 11  0   0 981179712 315036 7084580800  0  0 26933 325728 12  4 84  0  0
> 30  0   0 981178304 315036 7084582400  0  0 23691 315821 11  4 85  0  0
> 12  1   0 981177600 315036 7084583200  0  0 29485 320166 12  4 84  0  0
> 32  0   0 981180032 315036 7084584800  0  0 25946 316724 12  4 84  0  0
> 21  0   0 981176384 315036 7084586400  0  0 24227 321938 12  4 84  0  0
> 21  0   0 981178880 315036 7084588000  0  0 25174 326943 13  4 83  0  0

This machine has 120 cores? Is hyperthreading enabled? If so, what
you are showing might represent a total saturation of the 30 cores.
Context switches of about 300,000 per second are pretty high. I
can't think of when I've seen that except when there is high spinlock
contention.

Just to put the above in context, how did you limit the test to 30
cores? How many connections were open during the test?

> The flame graph is attached. I use 'perf' to generate the flame graph. Only
> the CPUs running PG server are profiled.
> I'm not familiar with other part of PG. Can you find anything unusual in the
> graph?

Two SSI functions stand out:

  10.86% PredicateLockTuple
   3.51% CheckForSerializableConflictIn

In both cases, most of that seems to go to lightweight locking.
Since you said this is a CPU graph, that again suggests spinlock
contention issues.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
Re: [HACKERS] GSoC 2017 : Proposal for predicate locking in gist index
On Thu, Jun 1, 2017 at 12:49 AM, Andrew Borodin wrote:
> First, I just do not know, can VACUUM erase page with predicate lock?

For handling in btree, see:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/nbtree/nbtpage.c;h=f815fd40b20e98a88cb2fb5c71005ea125a459c9;hb=refs/heads/master#l1406

Note also this discussion:

https://www.postgresql.org/message-id/4d669122.80...@enterprisedb.com

It doesn't look like we ever got to the optimization Heikki suggested
in that post, so on rare occasions we could see a false positive from
this. Perhaps we should give this another look while we're in the
AMs.

> Right now, GiST never deletes pages, as far as I know, so this
> question is only matter of future compatibility.

OK.

> Second, when we are doing GiST inserts, we can encounter unfinished
> split. That's not a frequent situation, but still. Should we skip
> finishing split or should we add it to collision check too?

When a page is split, I think you need to call
PredicateLockPageSplit() before the split gets to the point that an
insert can reach the new page.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
[HACKERS] Re: GSoC 2017 weekly progress reports ("Explicitly support predicate locks in index access methods besides b-tree")
On Mon, Jun 5, 2017 at 12:40 PM, Shubham Barai wrote:
> GSoC (week 1)
> 4. went through source code of gist index to understand its implementation
>
> 5. found appropriate places in gist index AM to insert calls to existing
> functions (PredicateLockPage(), CheckForSerializableConflictIn() ...etc)

When you have a patch for any AM, please post it to the list. Each
AM will be reviewed and considered for commit separately.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table
On Tue, Jun 6, 2017 at 3:41 PM, Marko Tiikkaja wrote:
> On Tue, Jun 6, 2017 at 10:25 PM, Kevin Grittner wrote:
>> It took years to get an in-depth review, then I was asked
>> not to commit it because others were working on patches that would
>> conflict. That just doesn't leave enough time to address these
>> issues before release. Fundamentally, I'm not sure that there is a
>> level of interest sufficient to support the effort.
>
> I can only say that the lack of this feature comes up on a weekly basis on
> IRC, and a lot of people would be disappointed to see it reverted.

Well, at PGCon I talked with someone who worked on the implementation
in Oracle 20-some years ago. He said they had a team of 20 people
working full time on the feature for years to get it working. Now, I
think the PostgreSQL community is a little lighter on its feet, but
without more active participation from others than there has been so
far, I don't intend to take another run at it. I'll be happy to
participate at such point that it's not treated as a second-class
feature set. Barring that, anyone else who wants to take the patch
and run with it is welcome to do so.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table
Nice as it would be to add a SQL standard feature and advance the
effort to get to incremental maintenance of materialized views, and
much as I really appreciate the efforts Thomas has put into trying to
solve these problems, I agree that it is best to revert the feature.

It took years to get an in-depth review, then I was asked not to
commit it because others were working on patches that would conflict.
That just doesn't leave enough time to address these issues before
release. Fundamentally, I'm not sure that there is a level of
interest sufficient to support the effort.

I'll give it a few days for objections before reverting.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table
On Fri, Jun 2, 2017 at 8:46 PM, Thomas Munro wrote:
> So, afterTriggers.query_stack is used to handle the reentrancy that
> results from triggers running further statements that might fire
> triggers. It isn't used for dealing with extra ModifyTable nodes that
> can appear in a plan because of wCTEs. Could it also be used for that
> purpose? I think that would only work if it is the case that each
> ModifyTable node begins and then runs to completion (ie no interleaving
> of wCTE execution) and then its AS trigger fires, which I'm not sure
> about.

I don't think we want to commit to anything that depends on a CTE
creating an optimization fence, although *maybe* that would be OK in
the case of DML as a CTE. That's a pretty special case; I'm not sure
whether the standard discusses it.

> If that is the case, perhaps AfterTriggerBeginQuery and
> AfterTriggerEndQuery could become the responsibility of
> nodeModifyTable.c rather than execMain.c. I haven't tried this yet
> and it may well be too ambitious at this stage.

In a world where one statement can contain multiple DML statements
within CTEs, that may make sense regardless of transition table
issues; but I agree we're past the time to be considering making such
a change for v10.

> Other ideas: (1) ban wCTEs that target relations with transition
> tables in PG10, because we're out of time;

Given that we haven't even discussed what to do about an UPDATE
statement with a CTE that updates the same table when there are
BEFORE EACH ROW UPDATE triggers on that table (which perhaps return
NULL for some but not all rows?), I'm not sure we've yet plumbed the
depths of this morass. For example, for this:

drop table if exists t1 cascade;
create table t1 (c1 int not null, c2 text not null default '');
insert into t1 (c1) values (1), (2), (3), (4), (5);
drop function if exists t1_upd_func() cascade;
create function t1_upd_func()
  returns trigger
  language plpgsql
as $$
begin
  if old.c1 between 2 and 4 then
    new.c2 = old.c2 || ';' || old.c1::text || ',' || new.c1::text;
  end if;
  if old.c1 > 3 then
    return null;
  end if;
  return new;
end;
$$;
create trigger t1_upd_update
  before update on t1
  for each row execute procedure t1_upd_func();
with x as (update t1 set c1 = c1 - 1 returning c1)
  update t1 set c1 = t1.c1 + 1 from x where x.c1 = t1.c1;
select * from t1;

... what is the correct result? I get this:

 c1 |  c2
----+------
  4 |
  5 |
  0 |
  1 | ;2,1
  2 | ;3,2
(5 rows)

It is definitely not what I expected, and seems wrong in multiple
ways from a logical perspective, looking at those as set-based
operations. (Tautologies about the procedural mechanism that creates
the result are not defenses of the result itself.)

Can we fix things exclusive of transition tables in this release? If
not, the only reasonable thing to do is to try to avoid further
complicating things with CTE/transition table usage until we sort out
the basics of triggers firing from CTE DML in the first place.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions
On Sat, Jun 3, 2017 at 1:51 AM, Mengxing Liu wrote:
> I tried 30 cores. But the CPU utilization is about 45%~70%.
> How can we distinguish where the problem is? Is disk I/O or Lock?

A simple way is to run `vmstat 1` for a bit during the test. Can you
post a portion of the output of that here? If you can configure the
WAL directory to a separate mount point (e.g., use the --waldir
option of initdb), a snippet of `iostat 1` output would be even
better.

I think the best thing may be if you can generate a CPU flame graph
of the worst case you can make for these lists:

http://www.brendangregg.com/flamegraphs.html

IMO, such a graph highlights the nature of the problem better than
anything else.

> Does "traversing these list" means the function "RWConflictExists"?
> And "10% waiting on the corresponding kernel mutexes" means the
> lock in the function CheckForSerializableIn/out ?

Since they seemed to be talking specifically about the conflict list,
I had read that as SerializableXactHashLock, although the wording is
a bit vague -- they may mean something more inclusive.

> If I used 30 cores for server, and 90 clients, RWConflictExists
> consumes 1.0% CPU cycles, and CheckForSerializableIn/out takes 5%
> in all.
> But the total CPU utilization of PG is about 50%. So the result
> seems to be matched.
> If we can solve this problem, we can use this benchmark for the
> future work.

If you can get a flame graph of CPU usage on this load, that would be
ideal. At that point, we can discuss what is best to attack.
Reducing something that is 10% of the total PostgreSQL CPU load in a
particular workload sounds like it could still have significant
value, although if you see a way to attack the other 90% (or some
portion of it larger than 10%) instead, I think we could adjust the
scope based on those results.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
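A hypothetical recipe for the flame graph suggested above, using perf
and Brendan Gregg's FlameGraph scripts (the sampling rate, duration,
and paths are placeholders to adapt to the test setup):

```shell
# Sample all CPUs system-wide while the benchmark runs, then fold the
# stacks and render an SVG flame graph.
perf record -F 99 -a -g -- sleep 60
perf script > out.perf
git clone https://github.com/brendangregg/FlameGraph
./FlameGraph/stackcollapse-perf.pl out.perf > out.folded
./FlameGraph/flamegraph.pl out.folded > pg_ssi.svg
```

Searching the resulting SVG for RWConflictExists,
CheckForSerializableConflictIn/Out, and the LWLock/spinlock frames
beneath them should show where the SSI time actually goes.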
Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions
> Mengxing Liu wrote:
>> The CPU utilization of CheckForSerializableConflictOut/In is
>> 0.71%/0.69%.

How many cores were on the system used for this test? The paper
specifically said that they didn't see performance degradation on the
PostgreSQL implementation until 32 concurrent connections with 24 or
more cores. The goal here is to fix *scaling* problems. Be sure you
are testing at an appropriate scale. (The more sockets, cores, and
clients, the better, I think.)

On Fri, Jun 2, 2017 at 10:08 AM, Alvaro Herrera wrote:
> Kevin mentioned during PGCon that there's a paper by some group in
> Sydney that developed a benchmark on which this scalability
> problem showed up very prominently.

https://pdfs.semanticscholar.org/6c4a/e427e6f392b7dec782b7a60516f0283af1f5.pdf

"[...] we see a much better scalability of pgSSI than the
corresponding implementations on InnoDB. Still, the overhead of
pgSSI versus standard SI increases significantly with higher MPL than
one would normally expect, reaching around 50% with MPL 128."

"Our profiling showed that PostgreSQL spend 2.3% of the overall
runtime in traversing these list, plus 10% of its runtime waiting on
the corresponding kernel mutexes."

If you cannot duplicate their results, you might want to contact the
authors for more details on their testing methodology. Note that the
locking around access to the list is likely to be a bigger problem
than the actual walking and manipulation of the list itself.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
Re: [HACKERS] GSoC 2017 : Proposal for predicate locking in gist index
On Wed, May 31, 2017 at 3:02 PM, Shubham Barai wrote:
> I have been accepted as GSoC student for the project "Explicitly support
> predicate locks in index access methods besides b-tree". I want to share my
> approach for implementation of page level predicate locking in gist index.

For the benefit of all following the discussion, implementing support
in an index AM conceptually consists of two things:

(1) Any scan with user-visible results must create SIRead predicate
locks on "gaps" scanned. (For example, a scan just to find an
insertion spot for an index entry does not generally count, whereas a
scan to satisfy a user "EXISTS" test does.)

(2) Any insert into the index must CheckForSerializableConflictIn()
at enough points that at least one of them will detect an overlap
with a predicate lock from a preceding scan if the inserted index
entry might have changed the results of that preceding scan.

Detecting such a conflict does not automatically result in a
serialization failure, but is only part of tracking the information
necessary to make such a determination. All that is handled by the
existing machinery if the AM does the above.

With a btree index, locking gaps at the leaf level is normally
sufficient, because both scan and insertion of an index entry must
descend that far.

> The main difference between b-tree and gist index while searching for a
> target tuple is that in gist index, we can determine if there is a match or
> not at any level of the index.

Agreed. A gist scan can fail at any level, but for that scan to have
produced a different result due to insertion of an index entry,
*some* page that the scan looked at must be modified.

> The simplest way to do that will be by inserting a call for
> PredicateLockPage() in gistscanpage().

Yup.

> Insertion algorithm also needs to check for conflicting predicate locks at
> each level of the index.

Yup.

> We can insert a call for CheckForSerializableConflictIn() at two places in
> gistdoinsert().
>
> 1. after acquiring an exclusive lock on internal page (in case we are trying
> to replace an old search key)
>
> 2. after acquiring an exclusive lock on leaf page
>
> If there is not enough space for insertion, we have to copy predicate lock
> from an old page to all new pages generated after a successful split
> operation. For that, we can insert a call for PredicateLockPageSplit() in
> gistplacetopage().

That all sounds good. When you code a patch, don't forget to add
tests to src/test/isolation.

Do you need any help getting a development/build/test environment set
up?

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
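A sketch of what such an src/test/isolation spec might look like -- two
serializable transactions whose gist-index range scans and inserts
overlap so that, with correct predicate locking, one of them must fail.
The table, steps, and values here are invented for illustration:

```
# Hypothetical isolation spec: rw-conflict through a gist index.
setup
{
  CREATE TABLE gist_point_tbl (id int, p point);
  CREATE INDEX gist_pointidx ON gist_point_tbl USING gist (p);
}

teardown
{
  DROP TABLE gist_point_tbl;
}

session "s1"
step "rxy1" { BEGIN ISOLATION LEVEL SERIALIZABLE;
              SELECT count(*) FROM gist_point_tbl
                WHERE p << point(2500, 2500); }
step "wy1"  { INSERT INTO gist_point_tbl VALUES (1, point(8005, 8005)); }
step "c1"   { COMMIT; }

session "s2"
step "rxy2" { BEGIN ISOLATION LEVEL SERIALIZABLE;
              SELECT count(*) FROM gist_point_tbl
                WHERE p >> point(7500, 7500); }
step "wx2"  { INSERT INTO gist_point_tbl VALUES (2, point(2005, 2005)); }
step "c2"   { COMMIT; }
```

Each session reads a region the other writes into; interleaved
permutations should produce a serialization failure on one side, and a
missing failure would indicate a gap in the predicate-lock coverage.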
[HACKERS] Re: transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Wed, May 31, 2017 at 1:44 AM, Noah Misch wrote:
> IMMEDIATE ATTENTION REQUIRED.

I should be able to complete review and testing by Friday. If there
are problems I might not take action until Monday; otherwise I should
be able to do so on Friday.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Thu, May 18, 2017 at 5:16 AM, Amit Langote wrote:
> Do we need to update documentation? Perhaps, some clarification on the
> inheritance/partitioning behavior somewhere.

Yeah, I think so.

> -Assert((enrmd->reliddesc == InvalidOid) != (enrmd->tupdesc == NULL));
> +Assert((enrmd->reliddesc == InvalidOid) !=
> +       (enrmd->tupdesc == NULL));
>
> Perhaps, unintentional change?

Agreed; the line is not long enough to need to wrap.

> I'm not sure if it's significant for transition tables, but what if a
> partition's BR trigger modified the tuple? Would we want to include the
> modified version of the tuple in the transition table or the original as
> the patch does? Same for the code in CopyFrom().

Good spot! If the BR trigger on the child table modifies or
suppresses the action, I strongly feel that must be reflected in the
transition table. This needs to be fixed.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
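The suppression case above can be made concrete with a sketch; the
table, function, and trigger names are invented for illustration:

```sql
-- A BEFORE ROW trigger on a partition/child that suppresses some
-- updates by returning NULL.  Per the position above, rows suppressed
-- here should NOT appear in the statement's transition tables, and a
-- tuple modified here should appear in its modified form.
CREATE FUNCTION skip_even_rows() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
  IF new.c1 % 2 = 0 THEN
    RETURN NULL;   -- suppress the update for this row
  END IF;
  new.c2 = new.c2 || ' (touched)';   -- modify the row that goes through
  RETURN new;
END;
$$;

CREATE TRIGGER child_br BEFORE UPDATE ON child_tbl
  FOR EACH ROW EXECUTE PROCEDURE skip_even_rows();
```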
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Tue, May 16, 2017 at 4:50 PM, Thomas Munro wrote:
> On Wed, May 17, 2017 at 9:20 AM, Kevin Grittner wrote:
>> On Wed, May 10, 2017 at 7:02 AM, Thomas Munro wrote:
>>> On Wed, May 10, 2017 at 11:10 PM, Thomas Munro wrote:
>>>> 2. If you attach a row-level trigger with transition tables to any
>>>> inheritance child, it will see transition tuples from all tables in
>>>> the inheritance hierarchy at or below the directly named table that
>>>> were modified by the same statement, sliced so that they appear as
>>>> tuples from the directly named table.
>>>
>>> Of course that's a bit crazy, not only for trigger authors to
>>> understand and deal with, but also for plan caching: it just doesn't
>>> really make sense to have a database object, even an ephemeral one,
>>> whose type changes depending on how the trigger was invoked, because
>>> the plans stick around.
>>
>> The patch to add transition tables changed caching of a trigger
>> function to key on the combination of the function and the target
>> relation, rather than having one cache entry regardless of the
>> target table.
>
> Right. That works as long as triggers always see tuples in the format
> of the relation that they're attached to, and I think that's the only
> sensible choice. The problem I was thinking about was this: We have
> only one pair of tuplestores and in the current design it holds tuples
> of a uniform format, yet it may (eventually) need to be scanned by a
> statement trigger attached to the named relation AND any number of
> row triggers attached to children with potentially different formats.
> That implies some extra conversions are required at scan time. I had
> a more ambitious patch that would deal with some of that by tracking
> storage format and scan format separately (next time your row trigger
> is invoked the scan format will be the same but the storage format may
> be different depending on which relation you named in a query), but
> I'm putting that to one side for this release. That was a bit of a
> rabbit hole, and there are some tricky design questions about tuple
> conversions (to behave like DB2 with subtables may even require a
> tuplestore with per-tuple type information) and also the subset of
> rows that each row trigger should see (which may require extra tuple
> origin metadata or separate tuplestores).

Yeah, I wish this had surfaced far earlier, but I missed it and none
of the reviews prior to commit caught it, either. :-( It seems too
big to squeeze into v10 now. I just want to have that general
direction in mind and not paint ourselves into any corner that makes
it hard to get there in the next release.

> I'm about to post a much simpler patch that collects uniform tuples
> from children, addressing the reported bug, and simply rejects
> transition tables on row triggers on tables that are in either kind of
> inheritance hierarchy. More soon...

I agree that we can safely go that far, but not farther. Thanks!

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Wed, May 10, 2017 at 7:02 AM, Thomas Munro wrote: > On Wed, May 10, 2017 at 11:10 PM, Thomas Munro > wrote: >> 2. If you attach a row-level trigger with transition tables to any >> inheritance child, it will see transition tuples from all tables in >> the inheritance hierarchy at or below the directly named table that >> were modified by the same statement, sliced so that they appear as >> tuples from the directly named table. > > Of course that's a bit crazy, not only for trigger authors to > understand and deal with, but also for plan caching: it just doesn't > really make sense to have a database object, even an ephemeral one, > whose type changes depending on how the trigger was invoked, because > the plans stick around. The patch to add transition tables changed caching of a trigger function to key on the combination of the function and the target relation, rather than having one cache entry regardless of the target table. -- Kevin Grittner VMware vCenter Server https://www.vmware.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
[Apologies to all for my recent absence from community lists, and
special thanks to Thomas and Robert for picking up the slack.]

On Tue, May 9, 2017 at 4:51 PM, Thomas Munro wrote:
> On Tue, May 9, 2017 at 10:29 PM, Thomas Munro wrote:
> Recall that transition tables can be specified for statement-level
> triggers AND row-level triggers.  If you specify them for row-level
> triggers, then they can see all rows changed so far each time they
> fire.

No, they see all rows from the statement, each time.

test=# create table t (c int not null);
CREATE TABLE
test=# create function t_func()
test-#   returns trigger
test-#   language plpgsql
test-# as $$
test$# begin
test$#   raise notice '% / % = %',
test$#     new.c,
test$#     (select sum(c) from n),
test$#     (select new.c::float / sum(n.c) from n);
test$#   return null;
test$# end;
test$# $$;
CREATE FUNCTION
test=# create trigger t_trig
test-#   after insert or update on t
test-#   referencing new table as n
test-#   for each row
test-#   execute procedure t_func();
CREATE TRIGGER
test=# insert into t select generate_series(1,5);
NOTICE:  1 / 15 = 0.0667
NOTICE:  2 / 15 = 0.133
NOTICE:  3 / 15 = 0.2
NOTICE:  4 / 15 = 0.267
NOTICE:  5 / 15 = 0.333
INSERT 0 5

This behavior is required for this feature by the SQL standard.

> Now our policy of firing the statement level triggers only for
> the named relation but firing row-level triggers for all modified
> relations leads to a tricky problem for the inheritance case: what
> type of transition tuples should the child table's row-level triggers
> see?

The record format for the object on which the trigger was declared, IMO.

> Suppose you have an inheritance hierarchy like this:
>
>   animal
>     -> mammal
>       -> cat
>
> You define a statement-level trigger on "animal" and another
> statement-level trigger on "mammal".  You define a row-level trigger
> on "cat".  When you update either "animal" or "mammal", the row
> triggers on "cat" might run.  Row-level triggers on "cat" see OLD and
> NEW as "cat" tuples, of course, but if they are configured to see
> transition tables, should they see "cat", "mammal" or "animal" tuples
> in the transition tables?  With my patch as it is, that depends on
> which level of the hierarchy you explicitly updated!

I think that the ideal behavior would be that if you define a trigger
on "cat", you see rows in the "cat" format; if you define a trigger on
rows for "mammal", you see rows in the "mammal" format; if you define
a trigger on rows for "animal", you see rows in the "animal" format.

Also, the ideal would be that we support an ONLY option for trigger
declaration.  If your statement is ONLY on a given level in the
hierarchy, the row triggers for only that level would fire.  If you
don't use ONLY, a row trigger at that level would fire for operations
at that level or any child level, but with a record format matching
the level of the trigger.

Now, that may be too ambitious for this release.  If so, I suggest we
not implement anything that would be broken by the above, and throw a
"not implemented" error when necessary.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
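[The hierarchy under discussion, sketched as DDL.  Columns and the
trigger body are invented for illustration only, and note that the
simpler patch Thomas proposed would reject transition tables on
row-level triggers on inheritance children entirely, so this shows the
*proposed* ideal behavior, not necessarily what any committed code
accepts.]

create table animal (name text);
create table mammal (fur_color text) inherits (animal);
create table cat (lives int) inherits (mammal);

-- A row-level trigger on the leaf table.  OLD and NEW are always "cat"
-- tuples here; under the proposed rule the transition tables o and n
-- would likewise always use "cat"'s row type, regardless of whether
-- the triggering statement named animal, mammal, or cat.
create function cat_row_func() returns trigger language plpgsql as $$
begin
  raise notice 'cat % now has % lives', new.name, new.lives;
  return null;
end $$;

create trigger cat_row
  after update on cat
  referencing old table as o new table as n
  for each row
  execute procedure cat_row_func();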
Re: [HACKERS] How huge does mvtest_huge need to be?
On Wed, May 3, 2017 at 4:37 PM, Tom Lane wrote: > At this point I'm inclined to just delete the whole test. ok -- Kevin Grittner VMware vCenter Server https://www.vmware.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How huge does mvtest_huge need to be?
On Wed, May 3, 2017 at 12:08 PM, Tom Lane wrote:
> So ... is there a good reason to be using a large table here, and
> if so what is it, and how big does the table really need to be
> to provide useful test coverage?

Hm.  This seems like a particularly useless size.  It would test a
possibly useful corner case if it were over 10MB, so that it was over
vacuum's truncation threshold, but that would obviously be even
slower.  It doesn't seem justified.  How about 500 rows, so it at
least goes to a second page, which is then truncated to one page?
The "huge" in the object names then seems odd, of course.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
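[The "second page" suggestion can be sanity-checked with a scratch
table.  Table and column names here are invented, not the real
mvtest_huge schema, and page counts depend on build settings:]

-- With a single int column, on the order of a couple hundred heap
-- tuples fit per default 8kB page, so 500 rows spill past page one.
create table mvtest_small (amt int);
insert into mvtest_small select generate_series(1, 500);
select pg_relation_size('mvtest_small') / 8192 as pages;

-- Deleting the tail and vacuuming lets vacuum truncate the relation
-- back toward a single page.
delete from mvtest_small where amt > 10;
vacuum mvtest_small;
select pg_relation_size('mvtest_small') / 8192 as pages;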
Re: [HACKERS] Transition tables for triggers on foreign tables and views
On Fri, Apr 28, 2017 at 10:56 PM, Tom Lane wrote:
> They will fire if you have an INSTEAD OF row-level trigger; the existence
> of that trigger is what determines whether we implement DML on a view
> through the view's own triggers or through translation to an action on
> the underlying table.
>
> I do not think it'd be reasonable to throw an error for creation of
> a statement-level view trigger when there's no row-level trigger,
> because that just imposes a hard-to-deal-with DDL ordering dependency.
>
> You could make a case for having the updatable-view translation code
> print a WARNING if it notices that there are statement-level triggers
> that cannot be fired due to the translation.

Oh, I see -- you can add all the AFTER ... FOR EACH STATEMENT triggers
you want for an updatable view and they will quietly sit there without
firing no matter how many statements perform the supposedly triggering
action, but as soon as you add an INSTEAD OF ... FOR EACH ROW trigger
they spring to life.  On the face of it that seems to me to violate
the POLA, but I kinda see how it evolved.

I need to look at this and the rather similar odd behavior under
inheritance.  I hope to post something Friday.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Mon, May 1, 2017 at 11:53 AM, Robert Haas wrote: > On Mon, May 1, 2017 at 12:10 PM, Kevin Grittner wrote: >> On Mon, May 1, 2017 at 10:01 AM, Robert Haas wrote: >>> It seems pretty clear to me that this is busted. >> >> I don't think you actually tested anything that is dependent on any >> of my patches there. > > I was testing which rows show up in a transition table, so I assumed > that was related to the transition tables patch. Note that this is > not about which triggers are fired, just about how inheritance > interacts with transition tables. Yeah, I got confused a bit there, comparing to the updateable views case. -- Kevin Grittner VMware vCenter Server https://www.vmware.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Mon, May 1, 2017 at 10:01 AM, Robert Haas wrote: > It seems pretty clear to me that this is busted. I don't think you actually tested anything that is dependent on any of my patches there. > Adding this as an open item. Kevin? It will take some time to establish what legacy behavior is and how the new transition tables are impacted. My first reaction is that a trigger on the parent should fire for any related action on a child (unless maybe the trigger is defined with an ONLY keyword???) using the TupleDesc of the parent. Note that the SQL spec mandates that even in a AFTER EACH ROW trigger the transition tables must represent all rows affected by the STATEMENT. I think that this should be independent of triggers fired at the row level. I think the rules should be similar for updateable views. This will take some time to investigate, discuss and produce a patch. I think best case is Friday. -- Kevin Grittner VMware vCenter Server https://www.vmware.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transition tables for triggers on foreign tables and views
On Fri, Apr 28, 2017 at 3:54 PM, Kevin Grittner wrote: > Not firing a table's DML because it was fired off a view would be crazy, should read: Not firing a table's trigger because the trigger is on DML run from a view's INSTEAD OF trigger would be crazy, -- Kevin Grittner VMware vCenter Server https://www.vmware.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transition tables for triggers on foreign tables and views
On Fri, Apr 28, 2017 at 2:42 PM, Tom Lane wrote:
> Kevin Grittner writes:
>> On Tue, Apr 25, 2017 at 6:17 PM, Thomas Munro wrote:
>>> For views, aside from the question of transition tables, I noticed
>>> that statement triggers don't seem to fire at all with updatable
>>> views.  Surely they should -- isn't that a separate bug?
>
>> I checked out 25dc142a (before any of my commits for $subject),
>> built it, and tried the above -- with no warning generated.  I then
>> used an UPDATE and DELETE against the view, also with no trigger
>> fired (nor any error during trigger creation or DML).  Does anyone
>> know whether such a trigger ever fired at any point in the commit
>> history?
>
> [ experiments... ]  They did, and do, fire if you do it the old-style
> way with an INSTEAD OF row trigger.

Here is the table from near the start of CREATE TRIGGER docs, "folded"
such that I hope it remains intelligible in a fixed-width font after
email systems get done with it:

  When        Event                 Row-level        Statement-level
  ----------  --------------------  ---------------  -------------------
  BEFORE      INSERT/UPDATE/DELETE  Tables and       Tables, views, and
                                    foreign tables   foreign tables
              TRUNCATE              —                Tables
  AFTER       INSERT/UPDATE/DELETE  Tables and       Tables, views, and
                                    foreign tables   foreign tables
              TRUNCATE              —                Tables
  INSTEAD OF  INSERT/UPDATE/DELETE  Views            —
              TRUNCATE              —                —

The claim seems to be that you can create a
  { BEFORE | AFTER } { event [ OR ... ] } ... FOR EACH STATEMENT
trigger, where event is { INSERT | UPDATE | DELETE }, on an updateable
view.  Well, you can *create* it, but it will never fire.

> They don't fire if you're relying
> on an updatable view.  It seems we fire the table's statement triggers
> instead, ie, the update is fully translated into an update on the
> underlying table.

Well, certainly that should *also* happen.  Not firing a table's DML
because it was fired off a view would be crazy, or so it seems to me.

> I'm not sure how intentional that was, but it's not a completely
> unreasonable definition on its face, and given the lack of field
> complaints since 9.3, I think we should probably stick to it.

Are you talking about being able to create INSERT, UPDATE, and DELETE
triggers on the view (which never fire), or about firing triggers on
the table when an INSTEAD OF trigger is fired?

> However, if you didn't understand that from the documentation,
> then we have a documentation problem.

I totally get that there are INSTEAD OF triggers and have no quibble
with how they behave.  I can't even argue that the above chart is
wrong in terms of what CREATE TRIGGER allows.  However, creating
triggers that can never fire seems entirely wrong.

>> If we do get these working, don't they deserve at least
>> one regression test?
>
> Are you sure there isn't one?

Well, I was sort of hoping that the triggers that can now be defined
but can never fire *did* fire at some point.  But if that were true,
and they subsequently were broken, it would mean we lacked regression
tests for that case.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
Re: [HACKERS] Transition tables for triggers on foreign tables and views
On Tue, Apr 25, 2017 at 6:17 PM, Thomas Munro wrote:
> My colleague Prabhat Sahu reported off-list that transition tables
> don't work for views.  I probably should have thought about that when
> I fixed something similar for partitioned tables, and after some
> experimentation I see that this is also broken for foreign tables.

Good spot.  Don't know why that didn't occur to me to look at.

> For foreign tables using postgres_fdw, I see that transition tables
> capture new rows for INSERT but capture nothing for DELETE and UPDATE.

Will dig into that in a bit, but first...

> For views, aside from the question of transition tables, I noticed
> that statement triggers don't seem to fire at all with updatable
> views.  Surely they should -- isn't that a separate bug?
>
> Example:
>
> create table my_table (i int);
> create view my_view as select * from my_table;
> create function my_trigger_function() returns trigger language plpgsql as
>   $$ begin raise warning 'hello world'; return null; end; $$;
> create trigger my_trigger after insert or update or delete on my_view
>   for each statement execute procedure my_trigger_function();
> insert into my_view values (42);
>
> ... and the world remains ungreeted.

I checked out 25dc142a (before any of my commits for $subject), built
it, and tried the above -- with no warning generated.  I then used an
UPDATE and DELETE against the view, also with no trigger fired (nor
any error during trigger creation or DML).  Does anyone know whether
such a trigger ever fired at any point in the commit history?  Should
we remove the documentation claiming that anything except INSTEAD OF
triggers works on a view, and generate errors for attempts to do
otherwise, or is there something to salvage that has worked at some
point?  If we do get these working, don't they deserve at least one
regression test?

Will post again after I have a chance to review the FDW issue.
-- Kevin Grittner VMware vCenter Server https://www.vmware.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On How To Shorten the Steep Learning Curve Towards PG Hacking...
On Tue, Mar 28, 2017 at 10:36 PM, Craig Ringer wrote:
> Personally I have to agree that the learning curve is very steep.  Some
> of the docs and presentations help, but there's a LOT to understand.

Some small patches can be kept to a fairly narrow set of areas, and it
helps if you can find a similar capability you can crib technique from
for handling some of the more mysterious areas it might brush up
against.

When I started working on my first *big* patch that was bound to touch
many areas (around the start of development for 9.1) I counted lines
of code and found over a million lines just in .c and .h files.  We're
now closing in on 1.5 million lines.  That's not counting over 376,000
lines of documentation in .sgml files, over 12,000 lines of text in
README* files, over 26,000 lines of perl code, over 103,000 lines of
.sql code (60% of which is in regression tests), over 38,000 lines of
.y code (for flex/bison parsing), about 9,000 lines of various types
of code just for generating the configure file, and over 439,000 lines
of .po files (for message translations).  I'm sure I missed a lot of
important stuff there, but it gives some idea of the challenge it is
to get your head around it all.

My first advice is to try to identify which areas of the code you will
need to touch, and read those over.  Several times.  Try to infer the
API to areas *that* code needs to reference from looking at other code
(as similar to what you want to work on as you can find), reading code
comments and README files, and asking questions.

Secondly, there is a lot that is considered to be "coding rules" that
is, as far as I've been able to tell, only contained inside the heads
of veteran PostgreSQL coders, with occasional references in the
discussion list archives.  Asking questions, proposing approaches
before coding, and showing work in progress early and often will help
a lot in terms of discovering these issues and allowing you to
rearrange things to fit these conventions.

If someone with the "gift of gab" is able to capture these and put
them into a readily available form, that would be fantastic.

> * SSI (haven't gone there yet myself)

For anyone wanting to approach this area, there is a fair amount to
look at.  There is some overlap, but in rough order of "practical" to
"theoretical foundation", you might want to look at:

https://www.postgresql.org/docs/current/static/transaction-iso.html
https://wiki.postgresql.org/wiki/SSI
The SQL standard
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob_plain;f=src/backend/storage/lmgr/README-SSI;hb=refs/heads/master
http://www.vldb.org/pvldb/vol5.html
http://hdl.handle.net/2123/5353
Papers cited in these last two.

I have found papers authored by Alan Fekete or Atul Adya particularly
enlightening.

If any of the other areas that Craig listed have similar work
available, maybe we should start a Wiki page where we list areas of
code (starting with the list Craig included) as section headers, and
put links to useful reading below each?

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
Re: [HACKERS] recent deadlock regression test failures
On Mon, Apr 10, 2017 at 1:17 PM, Tom Lane wrote:
> Kevin Grittner writes:
>> On Mon, Apr 10, 2017 at 9:28 AM, Tom Lane wrote:
>>> I notice that the safe-snapshot code path is not paying attention to
>>> parallel-query cases, unlike the lock code path.  I'm not sure how
>>> big a deal that is...
>
>> Parallel workers in serializable transactions should be using the
>> transaction number of the "master" process to take any predicate
>> locks, and if parallel workers are doing any DML work against
>> tuples, that should be using the master transaction number for
>> xmin/xmax and serialization failure testing.
>
> Right, but do they record the master's PID rather than their own in
> the SERIALIZABLEXACT data structure?

There had better be just a single SERIALIZABLEXACT data structure for
a serializable transaction, or it just doesn't work.  I can't see how
it would have anything but the master's PID in it.  If parallel
execution is possible in a serializable transaction and things are
done any other way, we have a problem.

> Maybe it's impossible for a parallel worker to acquire its own
> snapshot at all, in which case this is moot.  But I'm nervous.

I have trouble seeing what sane semantics we could possibly have if
each worker for a parallel scan used a different snapshot -- under any
transaction isolation level.  I know that under the read committed
transaction isolation level we can follow xmax pointers to hit tuples
on the target relation which are not in the snapshot used for the
scan, but I don't see how that would negate the need to have the scan
itself run on a single snapshot.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
Re: [HACKERS] recent deadlock regression test failures
On Mon, Apr 10, 2017 at 9:28 AM, Tom Lane wrote: > Thomas Munro writes: >> Here's a pair of draft patches for review: Thanks. > Pushed with cosmetic improvements. Thanks. > I notice that the safe-snapshot code path is not paying attention to > parallel-query cases, unlike the lock code path. I'm not sure how > big a deal that is... Parallel workers in serializable transactions should be using the transaction number of the "master" process to take any predicate locks, and if parallel workers are doing any DML work against tuples, that should be using the master transaction number for xmin/xmax and serialization failure testing. If either of those rules are being violated, parallel processing should probably be disabled within a serializable transaction. I don't think safe-snapshot processing needs to do anything special if the above rules are followed, nor can I see anything special it *could* do. -- Kevin Grittner VMware vCenter Server https://www.vmware.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] recent deadlock regression test failures
On Sat, Apr 8, 2017 at 12:56 PM, Tom Lane wrote:
> Thomas Munro writes:
>> On Sat, Apr 8, 2017 at 4:22 PM, Tom Lane wrote:
>> Based on the above, here is a version that introduces a simple boolean
>> function pg_waiting_for_safe_snapshot(pid) and adds that to the
>> query.  This was my "option 1" upthread.
>
> Nah, this is not good either.  Yes, it's a fairly precise conversion
> of what Kevin's patch did, but I think that patch is wrong even
> without considering the performance angle.  If you look back at the
> discussions in Feb 2016 that led to what we had, it turned out to be
> important not just to be able to say that process X is blocked, but
> that it is blocked by one of the other isolationtest sessions, and
> not by some random third party (like autovacuum).  I do not know
> whether there is, right now, any way for autovacuum to be the guilty
> party for a SafeSnapshot wait --- but it does not seem like a good
> plan to assume that there never will be.

It would make no sense to run autovacuum at the serializable
transaction isolation level, and only overlapping read-write
serializable transactions can block the attempt to acquire a safe
snapshot.

> So I think what we need is functionality very much like what you had
> in the prior patch, ie identify the specific PIDs that are causing
> process X to wait for a safe snapshot.  I'm just not happy with how
> you packaged it.
>
> Here's a sketch of what I think we ought to do:
>
> 1. Leave pg_blocking_pids() alone; it's a published API now.

Fair enough.

> 2. Add GetSafeSnapshotBlockingPids() more or less as you had it
> in the previous patch (+ Kevin's recommendations).  There might be
> value in providing a SQL-level way to call that, but I'm not sure,
> and it would be independent of fixing isolationtester anyway.
It seems entirely plausible that someone might want to know what is holding up the start of a backup or large report which uses the READ ONLY DEFERRABLE option, so I think there is a real use case for a documented SQL function similar to pg_blocking_pids() to show the pids of connections currently running transactions which are holding things up. Of course, they may not initially know whether it is being blocked by heavyweight locks or concurrent serializable read-write transactions, but it should not be a big deal to run two separate functions. Given the inability to run isolation tests to cover the deferrable code, we used a variation on DBT-2 that could cause serialization anomalies to generate a high concurrency saturation run using serializable transactions, and started a SERIALIZABLE READ ONLY DEFERRABLE transaction 1200 times competing with this load, timing how long it took to start. To quote the VLDB paper[1], "The median latency was 1.98 seconds, with 90% of transactions able to obtain a safe snapshot within 6 seconds, and all within 20 seconds. Given the intended use (long-running transactions), we believe this delay is reasonable." That said, a single long-running serializable read-write transaction could hold up the attempt indefinitely -- there is no maximum bound. Hence the benefit of a SQL function to find out what's happening. > 3. Invent a SQL function that is dedicated specifically to supporting > isolationtester and need not be documented at all; this gets us out > of the problem of whether it's okay to whack its semantics around > anytime somebody thinks of something else they want to test. > I'm imagining an API like > > isolation_test_is_waiting_for(int, int[]) returns bool > > so that isolationtester's query would reduce to something like > > SELECT pg_catalog.isolation_test_is_waiting_for($1, '{...}') > > which would take even more cycles out of the parse/plan overhead for it > (which is basically what's killing the CCA animals). 
Internally, this > function would call pg_blocking_pids and, if necessary, > GetSafeSnapshotBlockingPids, and would check for matches in its array > argument directly; it could probably do that significantly faster than the > general-purpose array && code. So we'd have to expend a bit of backend C > code, but we'd have something that would be quite speedy and we could > customize in future without fear of breaking behavior that users are > depending on. Good suggestion. Thomas, would you like to produce a patch along these lines, or should I? -- Kevin Grittner [1] Dan R. K. Ports and Kevin Grittner. 2012. Serializable Snapshot Isolation in PostgreSQL. Proceedings of the VLDB Endowment, Vol. 5, No. 12. The 38th International Conference on Very Large Data Bases, August 27th - 31st 2012, Istanbul, Turkey. http://vldb.org/pvldb/vol5/p1850_danrkports_vldb2012.pdf -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
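[For readers unfamiliar with the feature being timed above, the safe
snapshot wait is requested with standard commands; a long-running
report might do something like this (the table name is invented):]

-- The BEGIN blocks until a "safe" snapshot is available -- that is,
-- until no overlapping read-write serializable transaction could
-- create an anomaly visible to this snapshot -- and the transaction
-- then runs without taking predicate locks or risking a
-- serialization failure.
begin transaction isolation level serializable read only deferrable;
select count(*) from some_big_table;
commit;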
Re: [HACKERS] recent deadlock regression test failures
On Fri, Apr 7, 2017 at 9:24 PM, Thomas Munro wrote: > 2. Did I understand correctly that it is safe to scan the list of > SERIALIZABLEXACTs and access the possibleUnsafeConflicts list while > holding only SerializableXactHashLock, Yes. > and that 'inLink' is the correct link to be following? If you're starting from the blocked (read-only) transaction (which you are), inLink is the one to follow. Note: It would be better form to use the SxactIsDeferrableWaiting() macro than repeat the bit-testing code directly in your function. -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Add GUCs for predicate lock promotion thresholds
On Mon, Feb 27, 2017 at 7:35 AM, Dagfinn Ilmari Mannsåker wrote:
> Kevin Grittner writes:
>
>> It occurred to me that it would make sense to allow these settings
>> to be attached to a database or table (though *not* a role).  I'll
>> look at that when I get back if you don't get to it first.
>
> Attached is a draft patch (no docs or tests) on top of the previous one,
> adding reloptions for the per-relation and per-page thresholds.  That
> made me realise that the corresponding GUCs don't need to be PGC_SIGHUP,
> but could be PGC_SUSET or PGC_USERSET.  I'll adjust the original patch
> if you agree.  I have not got around to adding per-database versions
> of the setting, but could have a stab at that.

Unfortunately, I was unable to get the follow-on patch to allow
setting by relation into a shape I liked.  Let's see what we can do
for the next release.  The first patch was applied with only very
minor tweaks.

I realize that nothing would break if individual users could set
their granularity thresholds on individual connections, but as
someone who has filled the role of DBA, the thought kinda made my
skin crawl.  I left it as PGC_SIGHUP for now; we can talk about
loosening that up later, but I think we should have one or more
use-cases that outweigh the opportunities for confusion and bad
choices by individual programmers to justify that.

--
Kevin Grittner
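[In postgresql.conf the thresholds discussed here look roughly like the
following.  The names match what was committed for PostgreSQL 10; the
defaults shown are from memory, so verify against the current
documentation before relying on them:]

max_pred_locks_per_transaction = 64  # pre-existing setting: lock pool size
max_pred_locks_per_relation = -2     # a negative value means
                                     # max_pred_locks_per_transaction
                                     # divided by its absolute value
max_pred_locks_per_page = 2          # tuple locks on a page before
                                     # promotion to a page lock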
Re: [HACKERS] recent deadlock regression test failures
On Fri, Apr 7, 2017 at 3:47 PM, Thomas Munro wrote: > On Sat, Apr 8, 2017 at 6:35 AM, Kevin Grittner wrote: >> On Fri, Apr 7, 2017 at 12:52 PM, Andres Freund wrote: >> >>> I'd rather fix the issue, than remove the tests entirely. Seems quite >>> possible to handle blocking on Safesnapshot in a similar manner as >>> pg_blocking_pids? >> >> I'll see what I can figure out. > > Ouch. These are the other ways I thought of to achieve this: > > https://www.postgresql.org/message-id/CAEepm%3D1MR4Ug9YsLtOS4Q9KAU9aku0pZS4RhBN%3D0LY3pJ49Ksg%40mail.gmail.com > > I'd be happy to write one of those, but it may take a day as I have > some other commitments. Please give it a go. I'm dealing with putting out fires with customers while trying to make sure I have tested the predicate locking GUCs patch sufficiently. (I think it's ready to go, and has passed all tests so far, but there are a few more I want to run.) I'm not sure I can come up with a solution faster than you, given that. Since it is an improvement to performance for a test-only environment on a feature that is already committed and not causing problems for production environments, hopefully people will tolerate a fix a day or two out. If not, we'll have to revert and get it into the first CF for v11. -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remaining 2017-03 CF entries
On Fri, Apr 7, 2017 at 1:37 PM, Andres Freund wrote: > I think it makes sense to go through those and see whether it's > realistic to commit any of them. > > Ready for Committer: > > Add GUCs for predicate lock promotion thresholds: > - claimed by Kevin, should be easy enough I was planning on pushing this today. -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] recent deadlock regression test failures
On Fri, Apr 7, 2017 at 12:52 PM, Andres Freund wrote: > I'd rather fix the issue, than remove the tests entirely. Seems quite > possible to handle blocking on Safesnapshot in a similar manner as > pg_blocking_pids? I'll see what I can figure out. -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] recent deadlock regression test failures
On Fri, Apr 7, 2017 at 12:28 PM, Tom Lane wrote: > Andrew Dunstan writes: >> On 04/07/2017 12:57 PM, Andres Freund wrote: >>> I don't think any recent changes are supposed to affect deadlock >>> detector behaviour? > >> Both these machines have CLOBBER_CACHE_ALWAYS set. And on both machines >> recent changes have made the isolation tests run much much longer. > > Ouch. I see friarbird's run time for the isolation tests has gone from an > hour and change to over 5 hours in one fell swoop. hyrax not much better. > Oddly, non-CCA animals don't seem to have changed much. > > Eyeing recent patches, it seems like the culprit must be Kevin's > addition to isolationtester's wait query: Ouch. Without this we don't have regression test coverage for the SERIALIZABLE READ ONLY DEFERRABLE code, but it's probably not worth adding 4 hours to any tests, even if it only shows up with CLOBBER_CACHE_ALWAYS. I assume the consensus is that I should revert it? -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GSoC] Push-based query executor discussion
Sorry, I didn't notice that this was going to a public list. That URL is only available to people who signed up as mentors for PostgreSQL GSoC participation this year. Does the link to the draft work for you? -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GSoC] Push-based query executor discussion
On Thu, Apr 6, 2017 at 8:11 AM, Alexander Korotkov wrote: >> https://docs.google.com/document/d/1dvBETE6IJA9AcXd11XJNPsF_VPcDhSjy7rlsxj262l8/edit?usp=sharing > I'd love to see a comment from Andres Freund who is leading executor > performance improvements. Note that the final proposal is here: https://summerofcode.withgoogle.com/serve/5874530240167936/ Also, I just entered a comment about an important question that I think needs to be answered right up front. -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] delta relations in AFTER triggers
On Mon, Apr 3, 2017 at 7:16 PM, Thomas Munro wrote: > On Tue, Apr 4, 2017 at 3:41 AM, Kevin Grittner wrote: >> On Mon, Apr 3, 2017 at 8:59 AM, Tom Lane wrote: >>> Thomas Munro writes: >>>> Or perhaps the code to inject trigger data transition tables into SPI >>>> (a near identical code block in these three patches) should be somewhere >>>> common so that each PL would only need to call a function. If so, >>>> where should that go? >>> >>> spi.c? >> >> Until now, trigger.c didn't know about SPI, and spi.c didn't know >> about triggers. The intersection was left to referencing code, like >> PLs. Is there any other common code among the PLs dealing with this >> intersection? If so, maybe a new triggerspi.c file (or >> spitrigger.c?) would make sense. Possibly it could make sense from >> a code structure PoV even for a single function, but it seems kinda >> iffy for just this function. As far as I can see it comes down to >> adding it to spi.c or creating a new file -- or just duplicating >> these 30-some lines of code to every PL. > > Ok, how about SPI_register_trigger_data(TriggerData *)? Or any name > you prefer... I didn't suggest something as specific as > SPI_register_transition_tables because I think it's plausible that > someone might want to implement SQL standard REFERENCING OLD/NEW ROW > AS and make that work in all PLs too one day, and this would be the > place to do that. > > See attached, which adds the call to all four built-in PLs. Thoughts? Worked on the docs some more and then pushed it. Nice job cutting the number of *.[ch] lines by 30 while adding support for the other three core PLs. :-) -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SERIALIZABLE with parallel query
On Fri, Mar 10, 2017 at 8:19 PM, Thomas Munro wrote: > On Wed, Feb 22, 2017 at 2:01 PM, Robert Haas wrote: >> I don't think I know enough about the serializable code to review >> this, or at least not quickly, but it seems very cool if it works. >> Have you checked what effect it has on shared memory consumption? > > I'm not sure how to test that. Kevin, can you provide any pointers to > the test workloads you guys used when developing SSI? During development there was first and foremost the set of tests which wound up implemented in the isolation testing environment developed by Heikki, although for most of the development cycle these tests were run by a python tool written by Markus Wanner (which was not as fast as Heikki's C-based tool, but provided a lot more detail in case of failure -- so it was very nice to have until we got down to polishing things). The final stress test to chase out race conditions and the performance benchmarks were all run by Dan Ports on big test machines at MIT. I'm not sure how much I can say to elaborate on what is in section 8 of this paper: http://vldb.org/pvldb/vol5/p1850_danrkports_vldb2012.pdf I seem to remember that he had some saturation run versions of the "DBT-2++" tests, modified to monitor for serialization anomalies missed by the implementation, which he sometimes ran for days at a time on MIT testing machines. There were some very narrow race conditions uncovered by this testing, which at high concurrency on a 16 core machine might hit a particular problem less often than once per day. I also remember that both Anssi Kääriäinen and Yamamoto Takahashi did a lot of valuable testing of the patch and found problems that we had missed. Perhaps they can describe their testing or make other suggestions. -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC 2017 Proposal for "Explicitly support predicate locks in index access methods besides btree"
On Sat, Apr 1, 2017 at 9:03 AM, anant khandelwal wrote: > My name is Anant Khandelwal; currently I am pursuing a master's from IIT - Delhi > and previously I was a software engineer. > > I am particularly interested in working on the project "Explicitly support > predicate locks in index access methods besides btree". Anant, Your post was mostly identical (as in copy/paste level identical) to a post by Shubham Barai four days earlier. https://www.postgresql.org/message-id/calxaepvrcjzz0sj2kb_ghatrrdej08rygurftr5nuqxc6ut...@mail.gmail.com https://www.postgresql.org/message-id/CAD=a8sbzwgd4u-tasjp0owtpileffhoz5o5ev0um6digqbv...@mail.gmail.com Unless you can produce convincing proof to the contrary, your proposal will be disqualified because of plagiarism. -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] delta relations in AFTER triggers
On Mon, Apr 3, 2017 at 8:59 AM, Tom Lane wrote: > Thomas Munro writes: >> Or perhaps the code to inject trigger data transition tables into SPI >> (a near identical code block these three patches) should be somewhere >> common so that each PLs would only need to call a function. If so, >> where should that go? > > spi.c? Until now, trigger.c didn't know about SPI, and spi.c didn't know about triggers. The intersection was left to referencing code, like PLs. Is there any other common code among the PLs dealing with this intersection? If so, maybe a new triggerspi.c file (or spitrigger.c?) would make sense. Possibly it could make sense from a code structure PoV even for a single function, but it seems kinda iffy for just this function. As far as I can see it comes down to adding it to spi.c or creating a new file -- or just duplicating these 30-some lines of code to every PL. -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC 2017 Proposal for "Explicitly support predicate locks in index access methods besides btree"
> 1. _hash_first() > -> _hash_readnext() > -> _hash_readprev() > > 2. CheckForSerializableConflictIn() > > -> _hash_doinsert() > > 3. PredicateLockPageSplit() I have not looked in detail at this point, but that seems plausible and indicates admirable research effort in drafting this proposal. > This is just an idea; also I understand that page level predicate locking > exists in the btree AM, where it required the addition of 17 function calls > to implement, but remains unimplemented in the gin, gist, spgist, brin, and > hash index AMs. So we need to insert function calls at other places also. Exactly. > Also, tell me: can we evaluate the performance of PostgreSQL on the following > workloads? > > SIBENCH microbenchmark > TPC-C++ Those are good, but pgbench (both read-only and read-write loads) is popular to include because any pg hacker can duplicate the tests. https://www.postgresql.org/docs/current/static/pgbench.html -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] delta relations in AFTER triggers
On Fri, Mar 31, 2017 at 12:58 PM, Robert Haas wrote: > On Fri, Mar 31, 2017 at 1:50 PM, Tom Lane wrote: >> I would vote for calling the struct typedef EphemeralNamedRelation and >> using the abbreviation ENR (capitalized that way, not as Enr) in related >> function names, where that seemed sensible. I really do *not* like >> "Enr" as a global name. > > Yeah, I had the same thought about capitalization but wasn't sure if > it was worth suggesting. But since Tom did, +1 from me. Will do. -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] delta relations in AFTER triggers
On Thu, Mar 30, 2017 at 11:51 AM, Kevin Grittner wrote: > New version attached. It needs some of these problem cases added to > the testing, and a mention in the docs that only C and plpgsql > triggers can use the feature so far. I'll add those tomorrow. Done and attached. Now the question is, should it be pushed? It's been through just about every CF in the last three years with little modification, and finally got a thorough enough review in this CF that I think it can be considered. Here are the numbers: 85 files changed, 2266 insertions(+), 132 deletions(-) Of that, 70 lines are the plpgsql implementation (which I should probably push separately), about 200 lines are docs, and 623 lines are new regression tests. Most of the rest only comes into play if the feature is used. This adds support for a SQL standard sub-feature, although only in triggers written in C and plpgsql. (Other PLs will probably require fewer lines than plpgsql.) It also provides infrastructure needed to get incremental maintenance of materialized views based on just simple declarative DDL. Tom has expressed hope that it could be used to improve performance and memory usage for AFTER triggers, and I believe it can, but that that should be a follow-on patch. It might provide the basis of set-based statement-level enforcement of referential integrity, with the regression tests providing a rough proof of concept. My inclination is to push it late today, but be ready to revert if there are any hard-to-fix surprise problems missed in review and testing; but if the general preference is to hold it for version 11, that's OK with me, too. -- Kevin Grittner transition-v14.diff.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] delta relations in AFTER triggers
On Thu, Mar 23, 2017 at 7:14 PM, Thomas Munro wrote: > my only other comment would be a bikeshed issue: > Enr isn't a great name for a struct. I know, but EphemeralNamedRelation starts to get kinda long, especially when making the normal sorts of concatenations. I started making the change and balked when I saw things like EphemeralNamedRelationMetadataData and a function named EphemeralNamedRelationMetadataGetTupDesc() in place of EnrmdGetTupDesc(). A 40-character function name makes for a lot of line-wrapping to stay within pgindent limits. Any suggestions? -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] delta relations in AFTER triggers
On Thu, Mar 23, 2017 at 11:36 PM, Thomas Munro wrote: > One more thought: should this be allowed? > > postgres=# create table mytab (i int) partition by list (i); > CREATE TABLE > postgres=# create table mytab1 partition of mytab for values in (42); > CREATE TABLE > postgres=# create function my_trigger_function() returns trigger as $$ > begin end; $$ language plpgsql; > CREATE FUNCTION > postgres=# create trigger my_trigger after update on mytab referencing > old table as my_old for each statement execute procedure > my_trigger_function(); > CREATE TRIGGER > Perhaps the moral equivalent should be possible for statement triggers > with transition tables, and that already works with your patch as far > as I know. So I think your patch probably just needs to reject them > on partitioned tables. > [patch provided] Yeah, that looks good. Included in next patch version. On Sun, Mar 26, 2017 at 6:39 AM, Thomas Munro wrote: > BTW I had to make the following change to your v12 because of commit b8d7f053: Yeah, I ran into that, too, and used exactly the same fix. On Sun, Mar 26, 2017 at 6:39 AM, Thomas Munro wrote: > On Fri, Mar 24, 2017 at 1:14 PM, Thomas Munro >> When PlanCacheRelCallback runs, I don't think it understands that >> these named tuplestore RangeTblEntry objects are dependent on the >> subject table. Could that be fixed like this? >> >> @@ -2571,6 +2582,9 @@ extract_query_dependencies_walker(Node *node, >> PlannerInfo *context) >> if (rte->rtekind == RTE_RELATION) >> context->glob->relationOids = >> >> lappend_oid(context->glob->relationOids, rte->relid); >> + else if (rte->rtekind == RTE_NAMEDTUPLESTORE) >> + context->glob->relationOids = >> + >> lappend_oid(context->glob->relationOids, [subject table's OID]); > > I'm not sure if this is the right approach and it may have style > issues, but it does fix the crashing in the ALTER TABLE case I > reported: see attached patch which applies on top of your v12. 
I had been working along similar lines, but had not gotten it working. Merged your version and mine, taking the best of both. :-) Thanks for the reviews and the fixes! New version attached. It needs some of these problem cases added to the testing, and a mention in the docs that only C and plpgsql triggers can use the feature so far. I'll add those tomorrow. -- Kevin Grittner transition-v13.diff.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [pgsql-students] GSoC 2017 Proposal for "Explicitly support predicate locks in index access methods besides btree"
On Tue, Mar 28, 2017 at 12:36 PM, Shubham Barai wrote: > * Hash Index > currently, I am trying to understand how the splitting of bucket works. > should we acquire predicate lock on every page from an old and new bucket in > case there is a predicate lock on a page of an old bucket? For page level locking in hash indexes, I think it might be fine to lock the first page of a bucket. Also, in case you missed it, here are some guidelines I suggested. There weren't any comments, which means they probably didn't offend anyone else too badly. They're just my opinion, but you might want to consider them: Each GSoC student proposal should be a PDF file of 6 to 8 pages. In the end, Google will publish these documents on a web page, so the student should make each proposal something which they will be happy to have future potential employers review. Some ideas for desirable content: - A resume or CV of the student, including any prior GSoC work - Their reasons for wanting to participate - What else they have planned for the summer, and what their time commitment to the GSoC work will be - A clear statement that there will be no intellectual property problems with the work they will be doing -- that the PostgreSQL community will be able to use their work without encumbrances (e.g., there should be no agreements related to prior or ongoing work which might assign the rights to the work they do to someone else) - A description of what they will do, and how - Milestones with dates - What they consider to be the test that they have successfully completed the project -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [pgsql-students] GSoC 2017 Proposal for "Explicitly support predicate locks in index access methods besides btree"
On Tue, Mar 28, 2017 at 12:36 PM, Shubham Barai wrote: > My name is Shubham Barai and I am a final year student at Maharashtra > Institute of Technology, Pune, India. I am very interested in contributing to > PostgreSQL this year through a GSoC project. Welcome! Sorry I didn't spot this post earlier -- I'm behind on reading the community lists and this didn't trigger any of the phrases that pop things out to my attention right away. > I am particularly interested in working on the project "Explicitly support > predicate locks in index access methods besides btree". I have gone through > some research papers which were recommended on > https://wiki.postgresql.org/wiki/GSoC_2017 to understand the concept of > Serializable Snapshot Isolation used in PostgreSQL. I have also browsed > through the codebase to get some idea of different access methods for gin, > gist, and hash indexes. I want to discuss my proposal to get some feedback > before I write my final proposal. Sorry, I am discussing my proposal a little > late. I was really busy with my academics. Understandable, but please be careful to get your final proposal in by the deadline. Deadlines in GSoC are not flexible. > Currently, only B+-trees support page level predicate locking. For other > indexes, it acquires a relation level lock, which can lead to unnecessary > serialization failure due to a rw dependency caused by any insert into the > index. So, the main task of this project is to support page level predicate > locking for the remaining indexes. Right. > [calls out several places that specific calls to predicate locking functions > are needed] > There may be a lot of other places where we need to insert function calls > for predicate locking that I haven't included yet. I didn't go into details > of every index AM. That will be about half the work of the project.
It is fine to identify examples for your proposal, to show that you know what to look for, but tracking down every last location can be completed after the proposal is accepted. The other half of the work will be testing and responding to issues others might raise. > can anyone help me find existing tests for b-tree? I think this should be it: kgrittn@kevin-desktop:~/pg/master$ find src/backend/access/nbtree -type f -name '*.c' | grep -v '/tmp_check/' | grep -v '/Debug/' | xargs egrep -n 'PredicateLock|SerializableConflict' src/backend/access/nbtree/nbtinsert.c:201: CheckForSerializableConflictIn(rel, NULL, buf); src/backend/access/nbtree/nbtinsert.c:402: CheckForSerializableConflictIn(rel, NULL, buf); src/backend/access/nbtree/nbtinsert.c:784: PredicateLockPageSplit(rel, src/backend/access/nbtree/nbtsearch.c:1040: PredicateLockRelation(rel, scan->xs_snapshot); src/backend/access/nbtree/nbtsearch.c:1052: PredicateLockPage(rel, BufferGetBlockNumber(buf), src/backend/access/nbtree/nbtsearch.c:1483: PredicateLockPage(rel, blkno, scan->xs_snapshot); src/backend/access/nbtree/nbtsearch.c:1578: PredicateLockPage(rel, BufferGetBlockNumber(so->currPos.buf), scan->xs_snapshot); src/backend/access/nbtree/nbtsearch.c:1869: PredicateLockRelation(rel, scan->xs_snapshot); src/backend/access/nbtree/nbtsearch.c:1874: PredicateLockPage(rel, BufferGetBlockNumber(buf), scan->xs_snapshot); src/backend/access/nbtree/nbtpage.c:1410: PredicateLockPageCombine(rel, leafblkno, leafrightsib); -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Guidelines for GSoC student proposals / Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions
On Wed, Mar 29, 2017 at 11:17 PM, Mengxing Liu wrote: > Thanks, I've updated the proposal. Just one issue: > I agree that we can make the skip list a general data structure. But > can we use the fixed-level skip list as a Plan B? Or a quick attempt > before the general data structure? > Because I am not familiar with shared memory structures and the tricks > used in them, I cannot estimate how much time it would take. It's not really too bad for fixed allocation shared memory, and I can help with that. If I thought it would save much I could see doing a prototype without generalization, but you would still have most of the same shared memory issues, since the structure *must* live in shared memory. -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
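For illustration only, here is what the fixed-level variant Mengxing mentions looks like as a data structure -- plain Python with invented names, not the fixed-allocation shared-memory C code the project actually needs:

```python
import random

class Node:
    def __init__(self, key, level):
        self.key = key
        self.next = [None] * level      # one forward link per level

class FixedLevelSkipList:
    """Skip list with a fixed maximum number of levels."""
    MAX_LEVEL = 4

    def __init__(self):
        self.head = Node(None, self.MAX_LEVEL)   # sentinel node

    def _random_level(self):
        # Classic coin-flip level assignment, capped at the fixed maximum.
        level = 1
        while level < self.MAX_LEVEL and random.random() < 0.5:
            level += 1
        return level

    def insert(self, key):
        # Record the rightmost node before the insertion point at each level.
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for lvl in range(self.MAX_LEVEL - 1, -1, -1):
            while node.next[lvl] is not None and node.next[lvl].key < key:
                node = node.next[lvl]
            update[lvl] = node
        new = Node(key, self._random_level())
        for lvl in range(len(new.next)):
            new.next[lvl] = update[lvl].next[lvl]
            update[lvl].next[lvl] = new

    def contains(self, key):
        # Descend from the top level, then check the bottom-level successor.
        node = self.head
        for lvl in range(self.MAX_LEVEL - 1, -1, -1):
            while node.next[lvl] is not None and node.next[lvl].key < key:
                node = node.next[lvl]
        node = node.next[0]
        return node is not None and node.key == key
```

A shared-memory version keeps the same shape but must draw nodes from a fixed-size allocation and store offsets rather than pointers, and that part is much the same whether or not the structure is generalized.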
Re: [HACKERS] Guidelines for GSoC student proposals / Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions
On Sat, Mar 25, 2017 at 9:24 PM, Mengxing Liu wrote: > I've updated the proposal according to your comments. > But I am still wondering that can you review it for a double-check > to make sure I've made everything clear? Additional comments added. I'm afraid a few new issues came to mind reading it again. (Nothing serious; just some points that could benefit from a little elaboration.) -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Guidelines for GSoC student proposals / Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions
On Wed, Mar 22, 2017 at 2:24 AM, Mengxing Liu wrote: > I've finished a draft proposal for "Eliminate O(N^2) scaling from > rw-conflict tracking in serializable transactions". I've attached some comments to the document; let me know if they don't show up for you or you need clarification. Overall, if the comments are addressed, I think it is great. -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Guidelines for GSoC student proposals / Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions
On Wed, Mar 22, 2017 at 2:24 AM, Mengxing Liu wrote: > I've finished a draft proposal for "Eliminate O(N^2) scaling from > rw-conflict tracking in serializable transactions". You can find > it from GSOC website or by the link below. > https://docs.google.com/document/d/17TAs3EJIokwPU7UTUmnlVY3ElB-VHViyX1zkQJmrD1A/edit?usp=sharing > > I was wondering if you have time to review the proposal and give me some > comments? Will take a look and give you an initial review in a day or two. -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Candidate for local inline function?
On Fri, Mar 17, 2017 at 3:23 PM, Andres Freund wrote: > On 2017-03-17 15:17:33 -0500, Kevin Grittner wrote: >> Why do we warn of a hazard here instead of eliminating said hazard >> with a static inline function declaration in executor.h? > > Presumably because it was written long before we started relying on > inline functions :/ Right. git blame says it was changed in 2004. >> /* >> * ExecEvalExpr was formerly a function containing a switch statement; >> * now it's just a macro invoking the function pointed to by an ExprState >> * node. Beware of double evaluation of the ExprState argument! >> */ >> #define ExecEvalExpr(expr, econtext, isNull) \ >> ((*(expr)->evalfunc) (expr, econtext, isNull)) >> >> Should I change that to a static inline function doing exactly what >> the macro does? In the absence of multiple evaluations of a >> parameter with side effects, modern versions of gcc have generated >> the same code for a macro versus a static inline function, at least >> in the cases I checked. > > I'm absolutely not against changing this to an inline function, but I'd > prefer if that code weren't touched quite right now, there's a large > pending patch of mine in the area. If you don't mind, I'll just include > the change there, rather than have a conflict? Fine with me. -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Candidate for local inline function?
Why do we warn of a hazard here instead of eliminating said hazard with a static inline function declaration in executor.h? /* * ExecEvalExpr was formerly a function containing a switch statement; * now it's just a macro invoking the function pointed to by an ExprState * node. Beware of double evaluation of the ExprState argument! */ #define ExecEvalExpr(expr, econtext, isNull) \ ((*(expr)->evalfunc) (expr, econtext, isNull)) Should I change that to a static inline function doing exactly what the macro does? In the absence of multiple evaluations of a parameter with side effects, modern versions of gcc have generated the same code for a macro versus a static inline function, at least in the cases I checked. -- Kevin Grittner
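The double-evaluation hazard the comment warns about can be demonstrated in any language by writing out an argument expression twice, the way the preprocessor would; a small Python analog (not PostgreSQL code):

```python
calls = 0

def next_val():
    """Stand-in for a macro argument that has a side effect."""
    global calls
    calls += 1
    return 3

# Macro-style expansion: the argument expression appears twice in the
# expanded text, so its side effect runs twice.
a = next_val() * next_val()
assert calls == 2 and a == 9

def square(x):
    # Inline-function style: the argument is evaluated exactly once
    # before the body runs.
    return x * x

calls = 0
b = square(next_val())
assert calls == 1 and b == 9
```

With a side-effect-free argument the two behave identically, which is consistent with gcc generating the same code for the macro and the static inline function in the cases checked above.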
[HACKERS] Guidelines for GSoC student proposals
I've found various sources that give hints about what a student proposal should look like, but nothing I could just give as a link, so I pulled together what I could find, tempered by my own ideas and opinions. I suggest that we send the below, or something like it to each student who expresses interest in making a proposal, or who submits a proposal that doesn't meet the below guidelines. Thoughts or suggestions for changes before we do? Remember, time is short, so this cannot be a 200 message bike-shedding debate -- we just need to provide some sort of guidance to students in a timely way, with the timeline being: February 27 - March 20 Potential student participants discuss application ideas with mentoring organizations March 20 16:00 UTC Student application period opens April 3 16:00 UTC Student application deadline Each GSoC student proposal should be a PDF file of 6 to 8 pages. In the end, Google will publish these documents on a web page, so the student should make each proposal something which they will be happy to have future potential employers review. 
Some ideas for desirable content: - A resume or CV of the student, including any prior GSoC work - Their reasons for wanting to participate - What else they have planned for the summer, and what their time commitment to the GSoC work will be - A clear statement that there will be no intellectual property problems with the work they will be doing -- that the PostgreSQL community will be able to use their work without encumbrances (e.g., there should be no agreements related to prior or ongoing work which might assign the rights to the work they do to someone else) - A description of what they will do, and how - Milestones with dates - What they consider to be the test that they have successfully completed the project Note that a student proposal is supposed to be far more detailed than the ideas for projects provided by the organization -- those are intended to be ideas for what the student might write up as a proposal, not ready-to-go proposal documents. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions
On Wed, Mar 15, 2017 at 11:35 AM, Mengxing Liu wrote: >> On a NUMA machine it is not at all unusual to see bifurcated results >> -- with each run coming in very close to one number or a second >> number, often at about a 50/50 rate, with no numbers falling >> anywhere else. This seems to be based on where the processes and >> memory allocations happen to land. >> > > Do you mean that for a NUMA machine, there usually exist two > different results of its performance? > Just two? Neither three nor four? In my personal experience, I have often seen two timings that each run randomly matched; I have not seen nor heard of more, but that doesn't mean it can't happen. ;-) > At first, I will compile and install PostgreSQL by myself and try > the profile tools (perf or oprofile). perf is newer, and generally better if you can use it. Don't try to use either on HP hardware -- the BIOS uses some of the same hardware registers that other manufacturers leave for use of profilers; an HP machine is likely to freeze or reboot if you try to run either of those profilers under load. > Then I will run one or two benchmarks using different config, > where I may need your help to ensure that my tests are close to the > practical situation. Yeah, we should talk about OS and PostgreSQL configuration before you run any benchmarks. Neither tends to come configured as I would run a production system. > PS: Disabling NUMA in the BIOS means that the CPU can use its own memory > controller when accessing local memory to reduce hops. NUMA means that each CPU chip directly controls some of the RAM (possibly with other, non-CPU controllers for some RAM). The question is whether the BIOS or the OS controls the memory allocation. The OS looks at what processes are on what cores and tries to use "nearby" memory for allocations.
This can be pessimal if the amount of RAM that is under contention is less than the size of one memory segment, since all CPU chips need to ask the one managing that RAM for each access. In such a case, you actually get best performance using a cpuset which just uses one CPU package and the memory segments directly managed by that CPU package. Without the cpuset you may actually see better performance for this workload by letting the BIOS interleave allocations, which spreads the RAM allocations around to memory managed by all CPUs, and no one CPU becomes the bottleneck. The access is still not uniform, but you dodge a bottleneck -- albeit less efficiently than using a custom cpuset. > On the contrary, "enable" means UMA. No. Think about this: regardless of this BIOS setting each RAM package is directly connected to one CPU package, which functions as its memory controller. Most of the RAM used by PostgreSQL is for disk buffers -- shared buffers in shared memory and OS cache. Picture processes running on different CPU packages accessing a single particular shared buffer. Also picture processes running on different CPU packages using the same spinlock at the same time. No BIOS setting can avoid the communications and data transfer among the 8 CPU packages, and the locking of the cache lines. > Therefore, I think Tony is right, we should disable this setting. > > I got the information from here. > http://frankdenneman.nl/2010/12/28/node-interleaving-enable-or-disable/ Ah, that explains it. There is no such thing as "disabling NUMA" -- you can have the BIOS force interleaving, or you can have the BIOS leave the NUMA memory assignment to the OS. I was assuming that by "disabling NUMA" you meant to have BIOS control it through interleaving. You meant the opposite -- disabling the BIOS override of OS NUMA control. I agree that we should leave NUMA scheduling to the OS. 
There are, however, some non-standard OS configuration options that allow NUMA to behave better with PostgreSQL than the defaults allow. We will need to tune a little. The author of that article seems to be assuming that the usage will be with applications like word processing, spreadsheets, or browsers -- where the OS can place all the related processes on a single CPU package and all (or nearly all) memory allocations can be made from associated memory -- yielding a fairly uniform and fast access when the BIOS override is disabled. On a database product which wants to use all the cores and almost all of the memory, with heavy contention on shared memory, the situation is very different. Shared resource access time is going to be non-uniform no matter what you do. The difference is that the OS can still make *process local* allocations from nearby memory segments, while BIOS cannot. -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions
On Tue, Mar 14, 2017 at 3:45 PM, Kevin Grittner wrote: >> On 3/14/17 17:34, Mengxing Liu wrote: >>> There are several alternative benchmarks. Tony suggested that we >>> should use TPC-E and TPC-DS. > > More benchmarks is better, all other things being equal. Keep in > mind that good benchmarking practice with PostgreSQL generally > requires a lot of setup time (so that we're starting from the exact > same conditions for every run), a lot of run time (so that the > effects of vacuuming, bloat, and page splitting all come into play, > like they would in the real world), and a lot of repetitions of each > run (to account for variation). In particular, on a NUMA machine it > is not at all unusual to see bifurcated Sorry I didn't finish that sentence. On a NUMA machine it is not at all unusual to see bifurcated results -- with each run coming in very close to one number or a second number, often at about a 50/50 rate, with no numbers falling anywhere else. This seems to be based on where the processes and memory allocations happen to land. Different people have dealt with this in different ways. If you can get five or more runs of a given test, perhaps the best is to throw away the high and low values and average the rest. Other approaches I've seen are to average all, take the median, or take the best result. The worst result of a set of runs is rarely interesting, as it may be due to some periodic maintenance kicking in (perhaps at the OS level and not related to database activity at all). -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
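The throw-away-high-and-low approach described above can be sketched in a few lines of shell and awk; the TPS figures here are invented to mimic bifurcated runs clustering around two numbers:

```shell
# Hedged sketch of "throw away the high and low values and average the
# rest".  The sample TPS values are made up to show a bifurcated set.
runs="4100 4080 5210 5195 4095"

trimmed_mean=$(printf '%s\n' $runs | sort -n | awk '
    { v[NR] = $1 }
    END {
        sum = 0
        for (i = 2; i < NR; i++) sum += v[i]   # skip min (1) and max (NR)
        printf "%.0f\n", sum / (NR - 2)
    }')
echo "trimmed mean TPS: $trimmed_mean"   # 4463 for the sample above
```

With a bifurcated distribution like this sample, note that the trimmed mean still lands between the two clusters; reporting both cluster centers may be more honest for such runs.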
Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions
On Tue, Mar 14, 2017 at 6:00 AM, DEV_OPS wrote: > On 3/14/17 17:34, Mengxing Liu wrote: >>>>> The worst problems have been >>>>> seen with 32 or more cores on 4 or more sockets with a large number >>>>> of active connections. I don't know whether you have access to a >>>>> machine capable of putting this kind of stress on it (perhaps at >>>>> your university?), but if not, the community has access to various >>>>> resources we should be able to schedule time on. >>>> There is a NUMA machine ( 120 cores, 8 sockets) in my lab. >>> Fantastic! Can you say a bit more about the architecture and OS? >> >> Intel(R) Xeon(R) CPU at 2.3GHz, with 1TB physical DRAM and 1.5 TB >> SSD, running Ubuntu 14.04, Kernel 3.19. >> I guess NUMA is disabled in BIOS, but I will check that. I'm not sure what it would mean to "disable" NUMA -- if the CPU chips are each functioning as memory controller for a subset of the RAM you will have non-uniform memory access speeds from any core to different RAM locations. You can switch it to "interleaved" access, so the speed of access from a core to any logical memory address will be essentially random, rather than letting the OS try to arrange things so that processes do most of their access to memory that is faster for them. It is generally better to tune NUMA in the OS than to randomize things at the BIOS level. >> However, there is only one NIC, so network could be the >> bottleneck if we use too many cores? Well, if we run the pgbench client on the database server box, the NIC won't matter at all. If we move the client side to another box, I still think that when we hit this problem, it will dwarf any impact of the NIC throughput. > The configuration is really cool, for the SSD, is it SATA interface? > NVMe interface, or is PCIe Flash? different SSD interfaces have different > performance characteristics. Yeah, knowing the model of SSD, as well as which particular Xeon we're using, would be good. 
> for the NUMA, because the affinity issue will impact PostgreSQL > performance. so, please disable it if possible I disagree. (see above) >> There are several alternative benchmarks. Tony suggested that we >> should use TPC-E and TPC-DS. More benchmarks is better, all other things being equal. Keep in mind that good benchmarking practice with PostgreSQL generally requires a lot of setup time (so that we're starting from the exact same conditions for every run), a lot of run time (so that the effects of vacuuming, bloat, and page splitting all come into play, like they would in the real world), and a lot of repetitions of each run (to account for variation). In particular, on a NUMA machine it is not at all unusual to see bifurcated >> Personally, I am more familiar with TPC-C. Unfortunately, the TPC-C benchmark does not create any cycles in the transaction dependencies, meaning that it is not a great tool for benchmarking serializable transactions. I know there are variations on TPC-C floating around that add a transaction type to do so, but on a quick search right now, I was unable to find one. >> And pgbench seems to only have a TPC-B built-in benchmark. You can feed it your own custom queries, instead. >> Well, I think we can easily find the implementations of these >> benchmarks for PostgreSQL. Reportedly, some of the implementations of TPC-C are not very good. There is DBT-2, but I've known a couple of people to look at that and find that it needed work before they could use it. Based on the PostgreSQL versions mentioned on the Wiki page, it has been neglected for a while: https://wiki.postgresql.org/wiki/DBT-2 >> The paper you recommended to me used a special benchmark defined >> by themselves. But it is quite simple and easy to implement. It also has the distinct advantage that we *know* they created a scenario where the code we want to tune was using most of the CPU on the machine. >> For me, the challenge is profiling the execution. 
Are there any >> tools in PostgreSQL to analyze where the CPU cycles are consumed? >> Or do I have to instrument and time it myself? Generally oprofile or perf is used if you want to know where the time is going. This creates a slight dilemma -- if you configure your build with --enable-cassert, you get the best stack traces and you can more easily break down execution profiles. That is especially true if you disable optimization and don't omit frame pointers. But all of those things distort the benchmarks -- adding a lot of time, and not necessarily in proportion to where the time goes with an optimized build. -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
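As a sketch of the profiling workflow discussed above -- assuming perf and pgbench are available, with option details varying by perf version, and the database name invented:

```shell
# Hedged sketch of a perf-based profiling run.  The script is written
# out rather than executed here, since it assumes a live server, perf,
# and pgbench on the box.
cat > profile-run.sh <<'EOF'
#!/bin/sh
# Sample the whole box while pgbench drives load; -g records call
# graphs, and dwarf unwinding helps with optimized builds that omit
# frame pointers (see the cassert/optimization trade-off above).
perf record -a -g --call-graph dwarf -- sleep 60 &
pgbench -c 32 -j 8 -T 60 bench
wait
# Summarize where the CPU cycles went.
perf report --sort symbol
EOF
chmod +x profile-run.sh
echo "wrote profile-run.sh"
```

Profiling the optimized build this way, and reserving the cassert build for correctness runs, avoids the distortion Kevin describes.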
Re: [HACKERS] delta relations in AFTER triggers
On Sun, Mar 12, 2017 at 4:08 PM, Thomas Munro wrote: > I found a new way to break it: run the trigger function so > that the plan is cached by plpgsql, then ALTER TABLE incompatibly, > then run the trigger function again. See attached. The first part doesn't seem so bad. Using the transition table in a FOR EACH STATEMENT trigger you get: test=# update hoge set name = (name::text || name::text)::integer; ERROR: attribute 2 has wrong type DETAIL: Table has type integer, but query expects text. CONTEXT: SQL statement "SELECT (SELECT string_agg(id || '=' || name, ',') FROM d)" PL/pgSQL function hoge_upd_func() line 3 at RAISE ... while putting each row on its own line from a FOR EACH ROW trigger you get: test=# update hoge set name = (name::text || name::text)::integer; ERROR: type of parameter 15 (integer) does not match that when preparing the plan (text) CONTEXT: SQL statement "SELECT (SELECT string_agg(old.id || '=' || old.name, ','))" PL/pgSQL function hoge_upd_func() line 3 at RAISE Does anyone think the first message needs improvement? If so, to what? Obviously the next part is a problem. With the transition table we get a segfault: test=# -- now drop column 'name' test=# alter table hoge drop column name; ALTER TABLE test=# update hoge set id = id; server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. While the row-oriented query manages to dodge it: test=# alter table hoge drop column name; ALTER TABLE test=# update hoge set id = id; ERROR: record "old" has no field "name" CONTEXT: SQL statement "SELECT (SELECT string_agg(old.id || '=' || old.name, ','))" PL/pgSQL function hoge_upd_func() line 3 at RAISE I expected that existing mechanisms would have forced re-planning of a trigger function if the table the function was attached to was altered. Either that was a bit "optimistic", or the old TupleDesc is used for the new plan. Will track down which it is, and fix it. 
>> Do you suppose we should have all PLs that are part of the base >> distro covered? > > I vote for doing that in Postgres 11. My pl/python patch[1] may be a > useful starting point, but I haven't submitted it in this CF and > nobody has shown up with pl/tcl or pl/perl versions. OK, but we'd better add something to the docs saying that only C and plpgsql triggers are supported in v10. >> What is necessary to indicate an additional SQL feature covered? > > I assume you're talking about information_schema.sql_features I had forgotten we had that in a table. I was thinking more of the docs: https://www.postgresql.org/docs/current/static/features.html I guess if we change one, we had better change the other. (Or are they generated off a common source?) > a couple of thoughts occurred to me when looking for > references to transition tables in an old draft standard I have. > These are both cases where properties of the subject table should > probably also affect access to the derived transition tables: > > * What privileges implications are there for transition tables? I'm > wondering about column and row level privileges; for example, if you > can't see a column in the subject table, I'm guessing you shouldn't be > allowed to see it in the transition table either, but I'm not sure. I'll see how that works in FOR EACH ROW triggers. We should match that, I think. Keep in mind that not just anyone can put a trigger on a table. > * In future we could consider teaching it about functional > dependencies as required by the spec; if you can SELECT id, name FROM > GROUP BY id, I believe you should be able to SELECT > id, name FROM GROUP BY id, but currently you can't. Interesting idea. I'll post a new patch once I figure out the dropped column on the cached function plan. -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions
On Sat, Mar 11, 2017 at 8:39 PM, Mengxing Liu wrote: >> The worst problems have been >> seen with 32 or more cores on 4 or more sockets with a large number >> of active connections. I don't know whether you have access to a >> machine capable of putting this kind of stress on it (perhaps at >> your university?), but if not, the community has access to various >> resources we should be able to schedule time on. > > There is a NUMA machine ( 120 cores, 8 sockets) in my lab. Fantastic! Can you say a bit more about the architecture and OS? > I think it's enough to put this kind of stress. The researchers who saw this bottleneck reported that performance started to dip at 16 cores and the problem was very noticeable at 32 cores. A stress test with 120 cores on 8 sockets will be great! I think perhaps the first milestone on the project should be to develop a set of benchmarks we want to compare to at the end. That would need to include a stress test that clearly shows the problem we're trying to solve, along with some cases involving one or two connections -- just to make sure we don't harm performance for low-contention situations. -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions
On Fri, Mar 10, 2017 at 9:12 PM, Mengxing Liu wrote: > My name is Mengxing Liu. I am interested in the project "Eliminate > O(N^2) scaling from rw-conflict tracking in serializable > transactions”. After discussing with Kevin off-list, I think it's > time to post discussion here. I am afraid that there are two things > that I need your help with. Thank you very much. Welcome to the -hackers list! This is a key part of how development happens in the community. Don't be shy about posting questions and ideas, but also expect fairly blunt discussion to occur at times; don't let that put you off. >>> So the task is clear. We can use a tree-like or hash-like data >>> structure to speed up this function. >> >> Right -- especially with a large number of connections holding a >> large number of conflicts. In one paper with high concurrency, they >> found over 50% of the CPU time for PostgreSQL was going to these >> functions (including functions called by them). This seems to me to >> be due to the O(N^2) (or possibly worse) performance from the number >> of connections. > > Does anyone know the title of this paper? I want to reproduce its > workloads. I seem to remember there being a couple other papers or talks, but this is probably the most informative: http://sydney.edu.au/engineering/it/research/tr/tr693.pdf >> Remember, I think most of the work here is going to be in >> benchmarking. We not only need to show improvements in simple test >> cases using readily available tools like pgbench, but think about >> what types of cases might be worst for the approach taken and show >> that it still does well -- or at least not horribly. It can be OK >> to have some slight regression in an unusual case if the common >> cases improve a lot, but any large regression needs to be addressed >> before the patch can be accepted. 
There are some community members >> who are truly diabolical in their ability to devise "worst case" >> tests, and we don't want to be blind-sided by a bad result from one >> of them late in the process. >> > > Are there any documents or links introducing how to test and > benchmark PostgreSQL? I may need some time to learn about it. There is pgbench: https://www.postgresql.org/docs/devel/static/pgbench.html A read-only load and a read/write mix should both be tested, probably with a few different scales and client counts to force the bottleneck to be in different places. The worst problems have been seen with 32 or more cores on 4 or more sockets with a large number of active connections. I don't know whether you have access to a machine capable of putting this kind of stress on it (perhaps at your university?), but if not, the community has access to various resources we should be able to schedule time on. It may pay for you to search through the archives of the last year or two to look for other benchmarks and see what people have previously done. -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
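The read-only/read-write matrix described above might be enumerated like this; the database name "bench", the scales, the run length, and the client counts are illustrative assumptions, not prescriptions from the thread:

```shell
# Hedged sketch: a benchmark matrix of read-only (-S, select-only) and
# read/write (default tpcb-like) pgbench runs over a couple of scales
# and several client counts.  Commands are written to a plan file
# rather than executed, since no server is assumed here.
{
    for scale in 100 1000; do
        # Rebuild so every run starts from the exact same conditions.
        echo "pgbench -i -s $scale bench"
        for clients in 16 32 64; do
            echo "pgbench -S -c $clients -j 4 -T 300 bench  # read-only"
            echo "pgbench -c $clients -j 4 -T 300 bench     # read/write"
        done
    done
} > bench-plan.txt
cat bench-plan.txt
```

Re-initializing before each run group is what lets vacuuming, bloat, and page-splitting effects accumulate comparably across runs, per the advice quoted above.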
Re: [HACKERS] GSOC Introduction / Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions
> [two people interested in the same GSoC project] It's an interesting problem to have. If neither of you chooses to voluntarily back down, the obvious resolution is for the mentors to vote on the better proposal. If we do that early enough during the student application period, there might still be time for the person whose proposal wasn't chosen to submit a proposal for an alternative project. As I see it, that means: - I would tend to favor a proposal submitted on the first day (beginning March 20 16:00 UTC), if only one is. - I would push for very early voting by the PostgreSQL mentors. -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSOC Introduction / Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions
[including Mengxing Liu in response, for reasons that should become obvious below...] Hi George, On Thu, Mar 9, 2017 at 6:49 PM, George Papadrosou wrote: > my name is George Papadrosou, this is my first semester as a > graduate student at Georgia Tech, and I would like to submit a > proposal to Google Summer of Code, for the project "Eliminate > O(N^2) scaling from rw-conflict tracking in serializable > transactions”. I was recently contacted off-list by Mengxing Liu, who has been looking at the same project, and said he was planning to submit a GSoC proposal. Rather than have one of you sit this out, do either of you feel comfortable taking a different project instead? Since you've both been looking at the serializable code and supporting documents, perhaps one of you could change to the other suggested Serializable project? > I am going to prepare a draft proposal for this project and share > it with you soon. The project’s description is pretty clear; do > you think it should be more strictly defined in the proposal? At a minimum, the proposal should include a list of milestones you expect to reach along the way, and a timeline indicating when you expect to reach them. Some description of the benchmarks you intend to run would also be very good. > Until then, I would like to familiarize myself a bit with the > codebase and fix some bug/todo. I didn’t find many [E] marked > tasks in the todo list so the task I was thinking of is "\s without > arguments (display history) fails with libedit, doesn't use pager > either - psql \s not working on OSX”. However, it works on my OSX > El Capitan laptop with Postgres 9.4.4. Would you suggest some > other starter task? There is a CommitFest in progress; reviewing patches is a good way to become involved and familiar with the community processes. 
https://wiki.postgresql.org/wiki/CommitFest -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] SQL Standard Feature T211
[separate thread from transition table patch, since a different audience might be interested] Four things are required to claim support for Feature T211, "Basic trigger capability": - support for the CREATE TRIGGER statement - the ability to declare and reference transition tables for AFTER triggers - support for the DROP TRIGGER statement - support for TRIGGER in the GRANT and REVOKE statements With the addition of transition tables we have all four, although I'm not sure whether the CREATE TRIGGER statement conforms closely enough to claim the feature. The two basic issues I can think of are: - we don't allow the OLD and NEW transition variable names to be specified -- using hard-coded values instead - we don't support the standard formats, instead supporting only our EXECUTE PROCEDURE syntax Do we leave T211, "Basic trigger capability" on our "unsupported" list until those two points are addressed? Does anyone know of anything else that would need to be done? -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
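For reference, the transition-table declaration at the heart of T211 looks like this; the table, trigger, and function names are invented, the REFERENCING ... TABLE AS clause is the standard transition-table piece, and EXECUTE PROCEDURE is the nonstandard PostgreSQL spelling noted above:

```shell
# Hedged sketch: write out an example of the T211 syntax under
# discussion.  Names are invented; nothing here is from the patch.
cat > t211-example.sql <<'EOF'
CREATE TRIGGER orders_audit
    AFTER UPDATE ON orders
    REFERENCING OLD TABLE AS old_rows NEW TABLE AS new_rows
    FOR EACH STATEMENT
    EXECUTE PROCEDURE audit_orders();
EOF
echo "wrote t211-example.sql"
```

Note that old_rows and new_rows illustrate exactly the point above: the trigger author chooses the transition variable names, rather than being limited to hard-coded OLD and NEW.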
Re: [HACKERS] delta relations in AFTER triggers
On Tue, Mar 7, 2017 at 6:28 PM, Kevin Grittner wrote: > New patch attached. And bit-rotted less than 24 hours later by fcec6caa. New patch attached just to fix bit-rot. That conflicting patch might be a candidate to merge into the new Ephemeral Named Relation provided by my patch, for more flexibility and extensibility... -- Kevin Grittner transition-v12.diff.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] delta relations in AFTER triggers
On Mon, Feb 20, 2017 at 7:44 PM, Thomas Munro wrote: > Was it intentional that this test doesn't include any statements that > reach the case where the trigger does RAISE EXCEPTION 'RI error'? > From the output generated there doesn't seem to be any evidence that > the triggers run at all, though I know from playing around with this > that they do Tests expanded to cover more. Some bugs found and fixed in the process. :-/ > + * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group > > These copyright messages are missing 3 years' worth of bugfixes. Those were only in files created with the initial patch in 2014. No bug fixes missing. Updated the years to 2017, though. > +SPI_get_caller_relation(const char *name) > > Do we need this function? It's not used by the implementation. Good point. It seemed useful way back when, but since no uses for it emerged, it should be removed until such time (if any) that it would be useful. > +typedef struct NamedTuplestoreScan > +{ > + Scan scan; > + char *enrname; > +} NamedTuplestoreScan; > > Nearly all plan node structs always have a comment for the members below > 'scan'; I think this needs one too because 'enrname' is not > self-explanatory. Done. > /* > + * Capture the NEW and OLD transition TABLE tuplestores (if specified for > + * this trigger). > + */ > + if (trigdata->tg_newtable || trigdata->tg_oldtable) > + { > + estate.queryEnv = create_queryEnv(); > + if (trigdata->tg_newtable) > + { > + Enr enr = palloc(sizeof(EnrData)); > + > + enr->md.name = trigdata->tg_trigger->tgnewtable; > + enr->md.tupdesc = trigdata->tg_relation->rd_att; > + enr->md.enrtuples = tuplestore_tuple_count(trigdata->tg_newtable); > + enr->reldata = trigdata->tg_newtable; > + register_enr(estate.queryEnv, enr); > + SPI_register_relation(enr); > + } > > Why do we have to call register_enr and also SPI_register_relation here? Essentially, because plpgsql does some things through SPI and some things not. Both cases are covered. 
> On Mon, Dec 19, 2016 at 4:35 AM, Thomas Munro > wrote: >> On Sun, Dec 18, 2016 at 3:15 PM, Kevin Grittner wrote: >>> There were a few changes Thomas included in the version he posted, >>> without really delving into an explanation for those changes. Some >>> or all of them are likely to be worthwhile, but I would rather >>> incorporate them based on explicit discussion, so this version >>> doesn't do much other than generalize the interface a little, >>> change some names, and add more regression tests for the new >>> feature. (The examples I worked up for the rough proof of concept >>> of enforcement of RI through set logic rather than row-at-a-time >>> navigation were the basis for the new tests, so the idea won't get >>> totally lost.) Thomas, please discuss each suggested change (e.g., >>> the inclusion of the query environment in the parameter list of a >>> few more functions). >> >> I was looking for omissions that would cause some kind of statements >> to miss out on ENRs arbitrarily. It seemed to me that >> parse_analyze_varparams should take a QueryEnvironment, mirroring >> parse_analyze, so that PrepareQuery could pass it in. Otherwise, >> PREPARE wouldn't see ENRs. Is there any reason why SPI_prepare should >> see them but PREPARE not? > > Any thoughts about that? Do you see any way to test that code, or would it be dead code there "just in case" we later decided to do something that needed it? I'm not a big fan of the latter. I've had to spend too much time maintaining and/or ripping out code that fits that description. On Mon, Feb 20, 2017 at 9:43 PM, Thomas Munro wrote: > On Tue, Feb 21, 2017 at 7:14 AM, Thomas Munro > wrote: >> On Fri, Jan 20, 2017 at 2:49 AM, Kevin Grittner wrote: >>> Attached is v9 which fixes bitrot from v8. No other changes. >>> >>> Still needs review. > > Based on a suggestion from Robert off-list I tried inserting into a > delta relation from a trigger function and discovered that it > segfaults Fixed. 
Such an attempt now generates something like this: ERROR: relation "d" cannot be the target of a modifying statement CONTEXT: SQL statement "INSERT INTO d VALUES (100, 100, 'x')" PL/pgSQL function level2_table_bad_usage_func() line 3 at SQL statement New patch attached. Miscellanea: Do you suppose we should have all PLs that are part of the base distro covered? What is necessary to indicate an additional SQL feature covered? -- Kevin Grittner transition-v11.diff.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] WARNING: relcache reference leak: relation "p1" not closed
With e434ad39ae7316bcf35fd578dd34ad7e1ff3c25f I did a `make world`, `make install-world`, a fresh default initdb, a start with default config, `make installcheck`, connected to the regression database with psql as the initial superuser, and ran: regression=# vacuum freeze analyze; WARNING: relcache reference leak: relation "p1" not closed VACUUM -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WARNING: relcache reference leak: relation "p1" not closed
[original message held up for review -- should be along eventually...] On Mon, Mar 6, 2017 at 3:11 PM, Kevin Grittner wrote: > With e434ad39ae7316bcf35fd578dd34ad7e1ff3c25f I did a `make world`, > `make install-world`, a fresh default initdb, a start with default > config, `make installcheck`, connected to the regression database > with psql as the initial superuser, and ran: > > regression=# vacuum freeze analyze; > WARNING: relcache reference leak: relation "p1" not closed > VACUUM Still present in e6477a8134ace06ef3a45a7ce15813cd398e72d8 -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] delta relations in AFTER triggers
On Thu, Mar 2, 2017 at 9:04 AM, David Steele wrote: > On 2/20/17 10:43 PM, Thomas Munro wrote: >> Based on a suggestion from Robert off-list I tried inserting into a >> delta relation from a trigger function and discovered that it >> segfaults: >> >> * frame #0: 0x0001057705a6 >> postgres`addRangeTableEntryForRelation(pstate=0x7fa58186a4d0, >> rel=0x, alias=0x, inh='\0', >> inFromCl='\0') + 70 at parse_relation.c:1280 [opt] >> frame #1: 0x00010575bbda >> postgres`setTargetTable(pstate=0x7fa58186a4d0, >> relation=0x7fa58186a098, inh=, alsoSource='\0', >> requiredPerms=1) + 90 at parse_clause.c:199 [opt] >> frame #2: 0x000105738530 postgres`transformStmt [inlined] >> transformInsertStmt(pstate=) + 69 at analyze.c:540 [opt] >> frame #3: 0x0001057384eb >> postgres`transformStmt(pstate=, parseTree=) >> + 2411 at analyze.c:279 [opt] >> frame #4: 0x000105737a42 >> postgres`transformTopLevelStmt(pstate=, >> parseTree=0x7fa58186a438) + 18 at analyze.c:192 [opt] >> frame #5: 0x0001059408d0 >> postgres`pg_analyze_and_rewrite_params(parsetree=0x7fa58186a438, >> query_string="insert into d values (100, 100, 'x')", >> parserSetup=(plpgsql.so`plpgsql_parser_setup at pl_comp.c:1017), >> parserSetupArg=0x7fa58185c2a0, queryEnv=0x7fa581857798) + 128 >> at postgres.c:706 [opt] > > Do you know when you will have a new patch ready? > > This looks like an interesting and important feature but I think it's > going to have to come together quickly if it's going to make it into v10. I hope to post a new version addressing review comments by Monday (6 March). -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal
On Thu, Feb 9, 2017 at 1:44 PM, Tom Lane wrote: > Robert Haas writes: >> On Thu, Feb 9, 2017 at 2:08 PM, Tom Lane wrote: >>> Although this doesn't really settle whether we ought to do 3a (with >>> backwards-compatibility function aliases in core) or 3b (without 'em). >>> Do people want to re-vote, understanding that those are the remaining >>> choices? > >> I prefer (3c) put them in an extension and let people that need 'em >> install 'em, but not have them available by default. > > As far as the core code is concerned, 3b and 3c are the same thing. I think so, too, if we're talking about an extension in core. My vote is for 3b. If someone wants to write the alias extension and make it available outside of core, fine -- though they don't need anyone's vote to do so. -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal
On Sun, Feb 5, 2017 at 10:24 PM, Michael Paquier wrote: >> So... somebody want to tally up the votes here? > > Here is what I have, 6 votes clearly stated: > 1. Rename nothing: Daniel, > 2. Rename directory only: Andres > 3. Rename everything: Stephen, Vladimir, David S, Michael P (with > aliases for functions, I could live without at this point...) I vote for 3 as well, with 1 as the only sane alternative. -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal
On Thu, Jan 26, 2017 at 3:55 PM, Robert Haas wrote: > The substantive issue here is whether we should go forward with this > change, back out the change we already did, or leave things as they > are. Tom, David, and I seem to be in lock step on at least the > following conclusion: halfway in between is bad. I agree. > So I have every > intention of continuing to push very hard for us to go either forward > or backward. +1 Given the number of times I've known of people deleting files from pg_xlog because the name made the contents seem unimportant, I think finishing the job is better. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Add GUCs for predicate lock promotion thresholds
On Mon, Jan 23, 2017 at 9:50 AM, Dagfinn Ilmari Mannsåker wrote: > [new patch addressing issues raised] Thanks! >> In releases prior to this patch, max_pred_locks_per_relation was >> calculated as "max_pred_locks_per_transaction / 2". I know that >> people have sometimes increased max_pred_locks_per_transaction >> specifically to raise the limit on locks within a relation before >> the promotion to relation granularity occurs. It seems kinda >> anti-social not to support a special value for continuing that >> behavior or, if we don't want to do that, at least modifying >> pg_upgrade to set max_pred_locks_per_relation to the value that was >> in effect in the old version. > > This is exactly what we've been doing at my workplace It occurred to me that it would make sense to allow these settings to be attached to a database or table (though *not* a role). I'll look at that when I get back if you don't get to it first. >>> ilm...@ilmari.org (Dagfinn Ilmari Mannsåker) writes: >>>> One thing I don't like about this patch is that if a user has >>>> increased max_pred_locks_per_transaction, they need to set >>>> max_pred_locks_per_relation to half of that to retain the current >>>> behaviour, or they'll suddenly find themselves with a lot more >>>> relation locks. If it's possible to make a GUCs default value >>>> dependent on the value of another, that could be a solution. >>>> Otherwise, the page lock threshold GUC could be changed to be >>>> expressed as a fraction of max_pred_locks_per_transaction, to keep >>>> the current behaviour. >> >>> Another option would be to have a special sentinel (e.g. -1) which is >>> the default, and keeps the current behaviour. > > The attached updated patch combines the two behaviours. Positive values > mean an absolute number of locks, while negative values mean > max_predicate_locks_per_transaction / -max_predicate_locks_per_relation. > Making the default value -2 keeps backwards compatibility. 
> > Alternatively we could have a separate setting for the fraction (which > could then be a floating-point number, for finer control), and use the > absolute value if that is zero. I will need some time to consider that. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Add GUCs for predicate lock promotion thresholds
On Mon, Jan 23, 2017 at 9:50 AM, Dagfinn Ilmari Mannsåker wrote: > Kevin Grittner writes: > >> (1) The new GUCs are named max_pred_locks_per_*, but the >> implementation passes them unmodified from a function specifying at >> what count the lock should be promoted. We either need to change >> the names or (probably better, only because I can't think of a good >> name for the other way) return the GUC value + 1 from the function. >> Or maybe change the function name and how the returned number is >> used, if we care about eliminating the overhead of the increment >> instruction. That actually seems least confusing. > > I've done the latter, and renamed the function MaxPredicateChildLocks. > I also changed the default for max_pred_locks_per_page from 3 to 4 and > the minimum to 0, to keep the effective thresholds the same. > >> (2) The new GUCs are declared and documented to be set only on >> startup. This was probably just copied from >> max_pred_locks_per_transaction, but that sets an allocation size >> while these don't. I think these new GUCs should be PGC_SIGHUP, >> and documented to change on reload. > > Fixed. > >> (3) The documentation for max_pred_locks_per_relation needs a fix. >> Both page locks and tuple locks for the relation are counted toward >> the limit. > > Fixed. > >> In releases prior to this patch, max_pred_locks_per_relation was >> calculated as "max_pred_locks_per_transaction / 2". I know that >> people have sometimes increased max_pred_locks_per_transaction >> specifically to raise the limit on locks within a relation before >> the promotion to relation granularity occurs. It seems kinda >> anti-social not to support a special value for continuing that >> behavior or, if we don't want to do that, at least modifying >> pg_upgrade to set max_pred_locks_per_relation to the value that was >> in effect in the old version. 
> > This is exactly what we've been doing at my workplace, and I mentioned > this in my original email: > >>> ilm...@ilmari.org (Dagfinn Ilmari Mannsåker) writes: >>>> One thing I don't like about this patch is that if a user has >>>> increased max_pred_locks_per_transaction, they need to set >>>> max_pred_locks_per_relation to half of that to retain the current >>>> behaviour, or they'll suddenly find themselves with a lot more >>>> relation locks. If it's possible to make a GUCs default value >>>> dependent on the value of another, that could be a solution. >>>> Otherwise, the page lock threshold GUC could be changed to be >>>> expressed as a fraction of max_pred_locks_per_transaction, to keep >>>> the current behaviour. >> >>> Another option would be to have a special sentinel (e.g. -1) which is >>> the default, and keeps the current behaviour. > > The attached updated patch combines the two behaviours. Positive values > mean an absolute number of locks, while negative values mean > max_predicate_locks_per_transaction / -max_predicate_locks_per_relation. > Making the default value -2 keeps backwards compatibility. > > Alternatively we could have a separate setting for the fraction (which > could then be a floating-point number, for finer control), and use the > absolute value if that is zero. > > Regards, > > Ilmari > > -- > "A disappointingly low fraction of the human race is, > at any given time, on fire." - Stig Sandbeck Mathisen > -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
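The combined semantics Ilmari describes (a positive value is an absolute lock count; a negative value is a divisor applied to max_pred_locks_per_transaction, with -2 as the backward-compatible default) can be modeled in a few lines. This is an illustrative sketch in Python, not the patch's actual C code, and the function name is invented:

```python
def effective_rel_threshold(per_relation: int, per_transaction: int) -> int:
    """Model of the proposed combined GUC semantics (illustrative only).

    A positive per_relation value is an absolute number of locks; a
    negative value means per_transaction / -per_relation.  A default of
    -2 reproduces the historical hard-coded behavior of
    max_pred_locks_per_transaction / 2.  Zero is not meaningful here.
    """
    if per_relation > 0:
        return per_relation
    return per_transaction // -per_relation

# Default (-2) keeps the historical behavior:
assert effective_rel_threshold(-2, 64) == 32
# A positive value is taken as an absolute count, independent of the
# per-transaction setting:
assert effective_rel_threshold(100, 64) == 100
```

A separate floating-point fraction GUC, as suggested at the end of the quoted message, would trade this single-knob encoding for finer control at the cost of one more setting.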
Re: [HACKERS] [PATCH] Add GUCs for predicate lock promotion thresholds
A couple things occurred to me after hitting "Send". In addition to the prior 2 points: (3) The documentation for max_pred_locks_per_relation needs a fix. Both page locks and tuple locks for the relation are counted toward the limit. In releases prior to this patch, max_pred_locks_per_relation was calculated as "max_pred_locks_per_transaction / 2". I know that people have sometimes increased max_pred_locks_per_transaction specifically to raise the limit on locks within a relation before the promotion to relation granularity occurs. It seems kinda anti-social not to support a special value for continuing that behavior or, if we don't want to do that, at least modifying pg_upgrade to set max_pred_locks_per_relation to the value that was in effect in the old version. In any event, it seems more like autovacuum_work_mem or autovacuum_vacuum_cost_limit than like effective_cache_size. Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [PATCH] Add GUCs for predicate lock promotion thresholds
On Thu, Jan 5, 2017 at 12:19 PM, Tom Lane wrote: > ilm...@ilmari.org (Dagfinn Ilmari Mannsåker) writes: >> ilm...@ilmari.org (Dagfinn Ilmari Mannsåker) writes: >>> One thing I don't like about this patch is that if a user has increased >>> max_pred_locks_per_transaction, they need to set >>> max_pred_locks_per_relation to half of that to retain the current >>> behaviour, or they'll suddenly find themselves with a lot more relation >>> locks. If it's possible to make a GUCs default value dependent on the >>> value of another, that could be a solution. Otherwise, the page lock >>> threshold GUC could be changed to be expressed as a fraction of >>> max_pred_locks_per_transaction, to keep the current behaviour. > >> Another option would be to have a special sentinel (e.g. -1) which is >> the default, and keeps the current behaviour. > > FWIW, interdependent GUC defaults are generally unworkable. > See commit a16d421ca and the sorry history that led up to it. > I think you could make it work as a fraction, though. I think that, ultimately, both a fraction of actual (the number of tuples on a page or the number of pages in a relation) *and* an absolute number (as this patch implements) could be useful. The former would be more "adaptable" -- providing reasonable behavior for different sized tuples and different sized tables, while the latter would prevent a single table with very small tuples or a lot of pages from starving all other uses. This patch implements the easier part, and is likely to be very useful to many users without the other part, so I think it is worth accepting as a step in the right direction, and consistent with not letting the perfect be the enemy of the good. There are a couple minor formatting issues I can clean up as committer, but there are a couple more substantive things to note. 
(1) The new GUCs are named max_pred_locks_per_*, but the implementation passes them unmodified from a function specifying at what count the lock should be promoted. We either need to change the names or (probably better, only because I can't think of a good name for the other way) return the GUC value + 1 from the function. Or maybe change the function name and how the returned number is used, if we care about eliminating the overhead of the increment instruction. That actually seems least confusing. (2) The new GUCs are declared and documented to be set only on startup. This was probably just copied from max_pred_locks_per_transaction, but that sets an allocation size while these don't. I think these new GUCs should be PGC_SIGHUP, and documented to change on reload. Other than that, I think it is in good shape and I would feel good about committing it. Of course, if there are any objections to it, I will not go ahead without reaching consensus. I am on vacation next week, so I would not do so before then; in fact I may not have a chance to respond to any comments before then. If the author wishes to make these changes, great; otherwise I can do it before commit. I will mark this Ready for Committer. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] delta relations in AFTER triggers
On Fri, Jan 20, 2017 at 2:08 PM, Nico Williams wrote: > On Fri, Jan 20, 2017 at 01:37:33PM -0600, Kevin Grittner wrote: >> There is currently plenty of room for pseudo-MV implementations, >> and may be for a while. It's a good indication of the need for the >> feature in core. An implementation in the guts of core can have >> advantages that nothing else can, of course. For example, for >> eager application of the deltas, nothing will be able to beat >> capturing tuples already in RAM and being looked at for possible >> trigger firing into a RAM-with-spill-to-disk tuplestore. > > BTW, automatic updates of certain types of MVs should be easy: add > constraints based on NEW/OLD rows from synthetic triggers to the > underlying query. Convincing me that this is a good idea for actual MVs, versus pseudo-MVs using tables, would be an uphill battle. I recognize the need to distinguish between MVs which contain recursive CTEs in their definitions and MVs that don't, so that the DRed algorithm can be used for the former and the counting algorithm for the latter; but firing triggers for row-at-a-time maintenance is not going to be efficient for very many cases, and the cost of identifying those cases to handle them differently is probably going to exceed any gains. Comparative benchmarks, once there is an implementation using set-based techniques, could potentially convince me; but there's not much point arguing about it before that exists. :-) > However, there is a bug in the query planner that prevents this > from being very fast. At some point I want to tackle that bug. What bug is that? > Basically, the planner does not notice that a table source in a > join has a lookup key sufficiently well-specified by those additional > constraints that it should be the first table source in the outermost > loop. Is that a description of what you see as the bug? Can you give an example, to clarify the point? I am dubious, though, of the approach in general, as stated above. 
>> I don't have time to review what you've done right now, but will >> save that link to look at later, if you give permission to borrow >> from it (with proper attribution, of course) if there is something >> that can advance what I'm doing. If such permission is not >> forthcoming, I will probably avoid looking at it, to avoid any >> possible copyright issues. > > Our intention is to contribute this. We're willing to sign > reasonable contribution agreements. Posting a patch to these lists constitutes an assertion that you have authority to share the IP, and are doing so. Referencing a URL is a bit iffy, since it doesn't leave an archival copy of the contribution under the community's control. > I'd appreciate a review, for sure. Thanks! Would it be possible to get your approach running using tables and/or (non-materialized) views as an extension? A trigger-based way to maintain pseudo-MVs might make an interesting extension, possibly even included in contrib if it could be shown to have advantages over built-in MVs for some non-trivial applications. > There's a gotcha w.r.t. NULL columns, but it affects the built-in > REFRESH as well, IIRC. The commentary in our implementation > discusses that in more detail. Could you report that on a new thread on the lists? I've seen comments about such a "gotcha", but am not clear on the details. It probably deserves its own thread. Once understood, we can probably fix it. Thanks! -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
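For readers unfamiliar with the counting algorithm mentioned above, the core idea is that each derived row carries a multiplicity, so a delta can be applied to the materialized result without re-running the view query. This is a toy illustration in Python, not anything from the proposed implementation; the class and method names are invented:

```python
from collections import Counter

class CountingView:
    """Toy counting-algorithm maintenance for a duplicate-preserving view.

    Each derived row carries a count of its derivations; applying a delta
    adjusts counts, and a row leaves the view only when its count hits zero.
    """
    def __init__(self):
        self.counts = Counter()

    def apply_delta(self, inserted=(), deleted=()):
        # Set-based application of a transition delta: no per-row
        # re-execution of the defining query is needed.
        for row in inserted:
            self.counts[row] += 1
        for row in deleted:
            self.counts[row] -= 1
            if self.counts[row] == 0:
                del self.counts[row]

    def rows(self):
        return set(self.counts)

v = CountingView()
v.apply_delta(inserted=[("a", 1), ("a", 1), ("b", 2)])
v.apply_delta(deleted=[("a", 1)])
assert v.rows() == {("a", 1), ("b", 2)}   # one derivation of ("a", 1) remains
v.apply_delta(deleted=[("a", 1)])
assert v.rows() == {("b", 2)}
```

This is also why the algorithm only suits non-recursive view definitions; recursive ones need the DRed (delete-and-rederive) approach mentioned in the message above.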
Re: [HACKERS] delta relations in AFTER triggers
On Thu, Jan 19, 2017 at 4:14 PM, Nico Williams wrote: > I hope what I've done about delta relations will be mostly irrelevant > given your patch (which I've not looked at in detail), Reviews welcome! > but just FYI, > I've built an alternate, all-SQL-coded materialized view system that > captures deltas between refreshes and deltas from direct DMLs of the > materialized view: There is currently plenty of room for pseudo-MV implementations, and may be for a while. It's a good indication of the need for the feature in core. An implementation in the guts of core can have advantages that nothing else can, of course. For example, for eager application of the deltas, nothing will be able to beat capturing tuples already in RAM and being looked at for possible trigger firing into a RAM-with-spill-to-disk tuplestore. > https://github.com/twosigma/postgresql-contrib/blob/master/pseudo_mat_views.sql > > There are some good ideas there, IMO, even if that implementation were > useless because of your patch. I don't have time to review what you've done right now, but will save that link to look at later, if you give permission to borrow from it (with proper attribution, of course) if there is something that can advance what I'm doing. If such permission is not forthcoming, I will probably avoid looking at it, to avoid any possible copyright issues. > Incidentally, it's really nice that PG has some "higher order" SQL > features that make this sort of thing easier. In particular, here, row > values and record types, and being able to refer to a table as a column > of the table's record type. Yeah, I found that quite handy in developing the REFRESH feature, and expect to be using it in incremental maintenance. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] delta relations in AFTER triggers
Attached is v10, which fixes bitrot from v9 caused by commit ea15e186. Still needs review. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company [Attachment: transition-v10.diff.gz (GNU Zip compressed data)]
Re: [HACKERS] delta relations in AFTER triggers
Attached is v9, which fixes bitrot from v8. No other changes. Still needs review. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company [Attachment: transition-v9.diff.gz (GNU Zip compressed data)]
Re: [HACKERS] Packages: Again
On Fri, Jan 13, 2017 at 12:35 PM, Serge Rielau wrote: > Yes my proposal to nest schemata is “radical” and this community > is not falling into that camp. > But there is nothing holy about database.schema.object.attribute It is mandated by the U.S. and international SQL standard documents. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Patch to implement pg_current_logfile() function
On Fri, Jan 13, 2017 at 7:09 AM, Michael Paquier wrote: > There is no real harm in including current_logfiles in base > backups, but that's really in the same bag as postmaster.opts or > postmaster.pid, particularly if the log file name has a > timestamp. I'm going to dispute that -- if postmaster.opts and postmaster.pid are present when you restore, it takes away a level of insurance against restoring a corrupted image of the database without knowing it. In particular, if the backup_label file is deleted (which happens with alarming frequency), the startup code may think it is dealing with a cluster that crashed rather than with a restore of a backup. This often leads to corruption (anything from "database can't start" to subtle index corruption that isn't noticed for months). The presence of log files from the time of the backup does not present a similar hazard. So while I agree that there is no harm in including current_logfiles in base backups, I object to the comparisons to the more dangerous files. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal
On Thu, Jan 12, 2017 at 12:12 PM, Tom Lane wrote: > Vladimir Rusinov writes: >> On Thu, Jan 12, 2017 at 4:57 PM, Euler Taveira wrote: >>> As Robert suggested in the other email: extension to create old names. > >> I don't follow the reasoning for the extension. It seem to have downsides >> of both alternatives combined: we still break people's code, and we add >> even more maintenance burden than just keeping aliases. > > Yeah, I'm not terribly for the extension idea. Robert cited the precedent > of contrib/tsearch2, but I think the history of that is a good argument > against this: tsearch2 is still there 9 years later and there's no > indication that we'll ever get rid of it. We can let things rot > indefinitely in core too. If we do ever agree to get rid of the aliases, > stripping them out of pg_proc.h will not be any harder than removing > a contrib directory would be. How about just leaving it to someone who cares about the aliases to post such an extension at pgxn.org (or anywhere else they like)? It can be around as long as someone cares enough to maintain it. I guess that makes my vote -1 on aliases by the project. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [DOCS] [HACKERS] Questionable tag usage
On Tue, Jan 10, 2017 at 9:58 AM, Tom Lane wrote: > whether to continue using "see section m.n"-type cross-references For my part, I have a preference for including the section name with the link text, although if it took much work to add it (rather than being the new default) I might question whether the benefit was worth the effort of adding it. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] RustgreSQL
On Tue, Jan 10, 2017 at 7:48 AM, Robert Haas wrote: > I'm not meaning to be funny or sarcastic or disrespectful when I say > that I think C is the best possible language for PostgreSQL. It works > great, and we've got a ton of investment in making it work. I can't > see why we'd want to start converting even a part of the code to > something else. Perhaps it seems like a good idea from 10,000 feet, > but in practice I believe it would be fraught with difficulties - and > if it injected even a few additional instructions into hot code paths, > it would be a performance loser. It strikes me that exactly the set of functions that Joel is suggesting could be written in another language is the set where the declarations in the .h files could be considered for replacement with static inline functions, potentially giving a significant performance boost which would not be available if they were instead converted to another language. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Odd behavior with PG_TRY
On Fri, Jan 6, 2017 at 5:43 AM, Michael Paquier wrote: > If a variable is modified within PG_TRY and then referenced in > PG_CATCH it needs to be marked as volatile to be strictly in > conformance with POSIX. This also ensures that any compiler does not > do any stupid optimizations with those variables in the way they are > referenced and used. That raises the question of why PG_exception_stack is not marked as volatile, since the macros themselves modify it within the PG_TRY block and reference it within the PG_CATCH block. Is there some reason this variable is immune to the problem? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [PATCH] guc-ify the formerly hard-coded MAX_SEND_SIZE to max_wal_send
On Thu, Jan 5, 2017 at 7:32 PM, Jonathon Nelson wrote: > On Thu, Jan 5, 2017 at 1:01 PM, Andres Freund wrote: >> On 2017-01-05 12:55:44 -0600, Jonathon Nelson wrote: >>> In our lab environment and with a 16MiB setting, we saw substantially >>> better network utilization (almost 2x!), primarily over high bandwidth >>> delay product links. >> >> That's a bit odd - shouldn't the OS network stack take care of this in >> both cases? I mean either is too big for TCP packets (including jumbo >> frames). What type of OS and network is involved here? > > In our test lab, we make use of multiple flavors of Linux. No jumbo frames. > We simulated anything from 0 to 160ms RTT (with varying degrees of jitter, > packet loss, etc.) using tc. Even with everything fairly clean, at 80ms RTT > there was a 2x improvement in performance. Is there compression and/or encryption being performed by the network layers? My experience with both is that they run faster on bigger chunks of data, and that might happen before the data is broken into packets. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
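One way to see why a compressing layer in the path can favor larger sends: each independently compressed chunk pays fixed stream overhead and loses cross-chunk redundancy. The sketch below uses Python's zlib purely as a stand-in for whatever a network layer might do -- it says nothing about the WAL sender itself, and compression ratio is used here as an observable proxy for the per-chunk cost:

```python
import zlib

data = b"postgres wal record " * 4096  # ~80 kB of compressible data

# Compress once, as a single large chunk:
whole = len(zlib.compress(data))

# Compress the same bytes as independent 1 kB chunks, as a layer working
# on small sends might:
chunk = 1024
pieces = sum(len(zlib.compress(data[i:i + chunk]))
             for i in range(0, len(data), chunk))

# Per-chunk stream overhead and lost cross-chunk redundancy make the
# small chunks strictly worse here:
assert whole < pieces
```

Whether this, the TCP window dynamics, or something else explains the observed 2x at 80 ms RTT would need to be measured on the actual setup.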
Re: [HACKERS] delta relations in AFTER triggers
On Sun, Dec 4, 2016 at 11:35 PM, Haribabu Kommi wrote: > Moved to next CF with "waiting on author" status. Patch v8 attempts to address the issues explicitly raised in Thomas Munro's review. An opaque query environment is created that, for now, only passes through ephemeral named relations, of which the only initial implementation is named tuplestores; but the techniques are intended to support both other types of ephemeral named relations and environmental properties (things affecting parsing, planning, and execution that are not coming from the system catalog) besides ENRs. There is no clue in the access to the QE whether something is, for example, stored in a list or a hash table. That's on purpose, so that the addition of other properties or changes to their implementation doesn't affect the calling code. There were a few changes Thomas included in the version he posted, without really delving into an explanation for those changes. Some or all of them are likely to be worthwhile, but I would rather incorporate them based on explicit discussion, so this version doesn't do much other than generalize the interface a little, change some names, and add more regression tests for the new feature. (The examples I worked up for the rough proof of concept of enforcement of RI through set logic rather than row-at-a-time navigation were the basis for the new tests, so the idea won't get totally lost.) Thomas, please discuss each suggested change (e.g., the inclusion of the query environment in the parameter list of a few more functions). Changed to "Needs review" status. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company [Attachment: transition-v8.diff.gz (GNU Zip compressed data)]
Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages] [and 1 more messages]
I guess the preceding posts leave us with these guarantees about read-only transactions which we might want to make explicit in the documentation: (1) A serialization failure cannot be initially thrown on a COMMIT attempt for a read-only transaction; however, if a subtransaction catches a serialization failure exception on a statement within the transaction, and doesn't re-throw it or throw any other error which terminates the transaction, the serialization failure can show up on the COMMIT attempt. (NOTE: We may want to check whether the "doomed" flag is set on a write conflict for a serializable transaction. It seems to me that it should be, but that might have been missed. If so, that should be treated as a bug and fixed.) (2) A read-only transaction cannot show results inconsistent with already-committed transactions, so there is no chance of needing to discard results from a read-only transaction due to failure of the transaction to commit. Both of these should hold for both explicit read-only transactions (which are set to READ ONLY after a BEGIN or START, or due to default_transaction_read_only being set to true and not overridden), and implicit read-only transactions. It is still worthwhile to explicitly set serializable transactions to read-only whenever possible, for performance reasons. The idea that a serialization failure is not possible on the first (or only) statement of a read-only transaction was in error, and should be disregarded. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
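The retry discipline discussed in this thread -- rerun the whole transaction from the beginning, so that any result feeding a later statement is re-evaluated on each attempt -- can be simulated without a server. This Python sketch uses an invented SerializationFailure class as a stand-in for SQLSTATE 40001/40P01 errors from a real driver; it is not a real database client:

```python
class SerializationFailure(Exception):
    """Stand-in for a serialization/deadlock failure from a real driver."""

def run_transaction(work, max_attempts=10):
    """Re-run `work` from the beginning on serialization failure.

    The entire transaction is re-executed, so statements whose results
    drive later statements are evaluated afresh on every attempt; the
    retries are invisible to the caller apart from added latency.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return work()
        except SerializationFailure:
            if attempt == max_attempts:
                raise

# Simulated workload that conflicts twice before succeeding:
attempts = {"n": 0}
def workload():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise SerializationFailure
    return "committed"

assert run_transaction(workload) == "committed"
assert attempts["n"] == 3
```

Only the attempt that completes without a serialization failure would count toward a TPS rate, matching the accounting proposed upthread.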
Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages] [and 1 more messages]
On Fri, Dec 16, 2016 at 8:24 AM, Robert Haas wrote: > On Thu, Dec 15, 2016 at 9:01 AM, Kevin Grittner wrote: >> I also realized some other properties of read-only transactions >> that might interest you (and that I should probably document). >> Since the only way for a read-only transaction to be the one >> experiencing a serialization failure is if Tout commits before the >> read-only transaction (which is always Tin) acquires its snapshot, >> Tpivot is still running when Tin acquires its snapshot, Tpivot >> commits before a serialization failure involving Tin is detected, >> and *then* Tin reads a data set affected by the writes of Tpivot. >> Since a snapshot is only acquired when a statement is run which >> requires a snapshot, that means that a query run in an implicit >> transaction (i.e., there is no START or BEGIN statement to >> explicitly start it; the SELECT or other data-accessing statement >> automatically starts the transaction so it has a valid context in >> which to run) that does not write data can never return bad results >> nor receive a serialization failure. Nor can those things happen >> on the *first* or *only* non-writing statement in an explicit >> transaction. > > I don't understand this argument. Every statement in a read-only, > serializable transaction runs with the same snapshot, so I don't see > how it can make a difference whether we're in the middle of running > the first statement or the tenth. Tpivot might commit in the middle > of executing the first statement of the transaction, which might then > -- later on during the execution of that same statement -- do > something that causes it to acquire a bunch more SIREAD locks. Good point. For the read-only transaction to be the one to receive a serialization failure, it must acquire a snapshot while Tpivot is still running, and read a data set which was affected by Tpivot, and must do so after Tpivot has successfully committed. 
However, if the commit of Tpivot comes after Tin has parsed the statement, determined that it is one that requires a snapshot, and acquired its snapshot, but before it reads the modified data set, Tin could get the serialization failure. Muddled thinking on my part to think of acquiring the snapshot as atomic with running the statement which caused the snapshot to be acquired. Good catch! -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages] [and 1 more messages]
On Thu, Dec 15, 2016 at 9:53 AM, Ian Jackson wrote: > I don't think "set max_pred_locks_per_transaction generously" is a > practical thing to write in the documentation, because the application > programmer, or admin, has no sensible way to calculate what a > sufficiently generous value is. OK. > You seem to be implying that code relying on the summarised data might > make over-optimistic decisions. That seems dangerous to me, but (with > my very dim view of database innards) I can't immediately see how to > demonstrate that it must in any case be excluded. No, with any of these conditions, the information on which a decision to generate a serialization failure is based is summarized into something less granular, and in all cases we turn any "in doubt" situations into serialization failures; that is, you can get false positives (a serialization failure exception where complete information could have avoided it) but not false negatives (a serialization anomaly appearing in the database or query results from a transaction which commits). Based on that alone, I think it is fair to say that it does not weaken either guarantee: that a serialization failure on commit is not possible for a read-only transaction unless the initial exception is suppressed in a subtransaction, and that anomalous results are not possible in a read-only transaction. The granularity promotion of predicate locks could not affect the guarantees about never seeing a serialization failure on the first statement that reads data in a read-only transaction, but I would need to take a very close look at how the SLRU summarization of committed transactions might affect that one -- we lose some of the detail about the relative order of the commits and snapshot acquisitions, and that might be enough to allow a false positive on that first statement. That would require more study than I can give it this month. 
I do remember that Dan ran some saturation workloads to stress this feature for days and weeks at a time pushing things to the point of using the SLRU summarization, and I remember thinking it odd that certain tests generated zero errors on the read-only transactions, which I'm pretty sure were single-statement transactions. It was only during this week's discussion that I had the epiphany about that only being possible if the read-only transaction had multiple statements; but the fact that such long saturation runs with SLRU summarization showed no errors on read-only transactions supports the idea that such summarization doesn't compromise that guarantee. Unfortunately, it falls short of proof. :-( -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages] [and 1 more messages]
transactions -- those that, in spite of not being *declared* as READ ONLY, never wrote any data. Although these have higher overhead than transactions explicitly declared READ ONLY up front, many of the properties of the explicitly declared read-only transaction hold -- including, it seems to me, the property of never returning badly serialized data and of never being chosen to receive the serialization failure error unless both Tout and Tpivot have already committed (in that order). > Supposing I understand your `doomed' flag correctly, I think it is > then probably possible to construct an argument that proves that > allowing the application to trap and suppress serialisation failures > does not make it harder to provide coherency guarantees. That is the intention. > Or to put it another way: does pgsql already detect serialisation > problems (in transactions which only read) at the point where it would > otherwise return data not consistent with any serialisation order ? > (As it does in the `Rollover' example.) Yes, as explained above. > If so presumably it always throws a serialisation failure at that > point. I think that is then sufficient. There is no need to tell the > application programmer they have to commit even transactions which > only read. Well, if they don't explicitly start a transaction there is no need to explicitly commit it, period. An implicit transaction is created if a statement needing execution context (such as a SELECT) is started outside of any explicit transaction, but such a transaction is always automatically committed or rolled back upon completion of the statement. There is always a transaction, but there is not always a need to explicitly manage it. > If my supposition is right then I will try to develop this argument > more formally.
> I think that would be worthwhile because the converse > property is very surprising to non-database programmers, and would > require very explicit documentation by pgsql, and careful attention by > application programmers. It would be nice to be able to document a > stronger promise. If you can put together a patch to improve the documentation, that is always welcome! In case you're not already aware of how we do things, patches are developed against the master branch, and then there is a determination about how far back to back-patch it in stable branches. While the rule is that stable branches are only modified to correct serious bugs or security vulnerabilities, in order to make it as safe as possible for people to apply minor releases without fear of breaking something that works, I think we could consider an argument for back-patching a doc change that clarifies or fills omissions that make it difficult to use a feature correctly. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages] [and 1 more messages]
On Wed, Dec 14, 2016 at 11:12 AM, Ian Jackson wrote: > Kevin Grittner writes ("Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: > Retry on constraint violation [and 2 more messages] [and 1 more messages]"): >> On Wed, Dec 14, 2016 at 10:20 AM, Ian Jackson >> wrote: >> I would alter that slightly to: >> >> There is a consistent serialization of all serializable >> transactions which successfully commit. > > Here `serializable' means SERIALIZABLE ? I'm not entirely sure what you mean to convey by the capitalization, so I'll just say that 'serializable' there referred to the transaction isolation level. (I *think* that was what you were getting at.) >> For examples, please see this Wiki page. You might be particularly >> interested in the examples in the "Read Only Transactions" section: >> >> https://wiki.postgresql.org/wiki/SSI > > Thanks. I read that part of the wiki page. But in that example, we > are told that T1 will be aborted, not T3. That is true in the first ("Deposit Report") example in that section. The second ("Rollover") example shows the read-only transaction (T3) being the one which is aborted and retried. > Can it happen that a transaction which does not make any update > attempts, will see "impossible" data, and that this is only detected > at COMMIT ? Does that apply even to READ ONLY transactions ? As Robert pointed out, a read-only transaction cannot normally be aborted by a serialization failure on COMMIT. Under exceptional conditions, like an attempt to suppress the serialization failure, you might see the commit aborted, though. Also as pointed out by Robert, the state seen by a read-only transaction doesn't lack internal consistency, but it will be rolled back with a serialization failure exception if it would show a state which is inconsistent with some successfully-committed state of the database. In the "Rollover" example, the first time T3 is attempted, its SELECT would have shown rows containing 100 and 11, were it not canceled.
That could have been consistent with the earlier state of 100 and 10 and the business rules that the first number can only change by having the second number added to it, and the second number can only change by being incremented; but that state and those rules don't fit with the *later* state of 110, 11, because that requires that the second number be added to the first before it was incremented, while allowing the result of the first T3 transaction attempt to be seen would tell us that the increment happened first. Since we've already allowed successful commit of transactions putting things into a state only consistent with adding 10 to 100 before incrementing 10 to 11, we cancel the read-only transaction and start over. This time it will show something consistent with the apparent order of execution of the other transactions. Note that neither the order in which the first two transactions started (T1->T2) nor the order in which they committed (T2->T1) determines the apparent order of execution. It is the rw-dependencies that control it (T1 reads a version of the data from before T2's work is applied, so T1 *appears* to have run before T2 in the apparent order of execution). Since both are successfully committed with that apparent order of execution, a third transaction (T3), which sees the work of T2 (since it had committed when the snapshot for T3 was taken) but not T1 (since it had not committed when the snapshot for T3 was taken), cannot be allowed to proceed. I know an example like that can cause one's head to hurt a bit (been there), but even if you don't fight your way to a full grasp of that case, it will hopefully give you some idea both of why we can have high concurrency with this approach and of why it is necessary to discard results from failed transactions. >> Once a serialization failure occurs the transaction is flagged as >> "doomed" and will not, under any circumstances be allowed to >> commit. If you find any exception to that, please report it as a >> bug. > Right.
> I think this prevents any exception-catching arrangements from > suppressing the serialisation failure. Since AIUI it is not possible > to run the outer COMMIT from within an exception trapping context. Right. > If it /is/ possible to run that outer COMMIT in a way which swallows > the exception then [...] That is not possible, as I understand things. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
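[Editor's note: the "Rollover" reasoning in the message above can be replayed as a small, self-contained simulation. This is an illustrative sketch only, modeling transactions as pure functions on a pair of numbers (x, y); none of these names are PostgreSQL internals. It shows that the snapshot T3 saw matches no serial order that is consistent with the finally committed state.]

```python
# Replay of the wiki "Rollover" example.  T1 adds the second number to the
# first; T2 increments the second number.  Initial state: (100, 10).
def t1(state):
    x, y = state
    return (x + y, y)

def t2(state):
    x, y = state
    return (x, y + 1)

def observable_states(order, state):
    """All states a read-only transaction could see under a serial order."""
    states = [state]
    for txn in order:
        state = txn(state)
        states.append(state)
    return states

initial = (100, 10)
serial_t1_first = observable_states([t1, t2], initial)  # T1 then T2
serial_t2_first = observable_states([t2, t1], initial)  # T2 then T1

# What actually committed: T1 read y=10 before T2's increment, so the
# apparent order of execution is T1 -> T2 and the final state is (110, 11).
committed_final = (110, 11)

# T3's snapshot saw T2's commit but not T1's: (100, 11).  That state occurs
# only in the T2-first serial order, whose final state contradicts what was
# actually committed -- so T3 must be rolled back and retried.
t3_snapshot = (100, 11)
```

Checking membership makes the argument mechanical: `committed_final` is the last state of the T1-first order, `t3_snapshot` appears only in the T2-first order, and the T2-first order ends at (111, 11), which was never committed.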
Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages] [and 1 more messages]
On Wed, Dec 14, 2016 at 10:20 AM, Ian Jackson wrote: > Let me try to summarise my understanding of what the developers think > they can and intend to promise, about SERIALIZABLE transactions: > > There is a consistent serialisation of all transactions which > successfully commit; or which do not attempt to make any changes. > > A "consistent serialisation" means that there is an order in which > the same transactions might have been executed giving exactly the same > answers. This includes, if applicable, the same errors. (The > database is free to generate serialisation failure errors 40P01 and > 40001 whenever it chooses.) I would alter that slightly to: There is a consistent serialization of all serializable transactions which successfully commit. > A transaction which attempts to make any changes, and which does not > commit (whether because the application never asks for COMMIT, or > because of reported serialisation failure) might see internally > inconsistent data, or an internally-consistent view which is not > compatible with any serialisation of other transactions. An > application which needs a coherent view should not rely on any of the > information from such a transaction. Even a read-only transaction can see a state that is not consistent with the business rules (as enforced in the software), given that some particular later state is reached. For examples, please see this Wiki page. You might be particularly interested in the examples in the "Read Only Transactions" section: https://wiki.postgresql.org/wiki/SSI > Serialisation failures in subtransactions might cause the parent > transaction to experience a serialisation failure too. There is currently at least one bug which may allow serialization anomalies into the database if a constraint violation error is thrown in a subtransaction and that subtransaction catches and suppresses that exception and rolls back its work without throwing an error.
I expect that any bugs of this type will be fixed in a minor release set soon -- probably the next one that is released. Note that I don't think that an exception from any source other than a declarative constraint can cause this type of problem, and that other conditions must exist in combination with this to create a serialization anomaly. A serialization failure within any subtransaction will ensure that the top level transaction will fail, even if there is an attempt to catch this exception and commit the top level transaction. It would be possible to catch a serialization failure exception and throw some *other* exception to terminate the transaction; however, (to step into very convoluted territory) if that other exception is caught and suppressed, the serialization failure error would occur again at COMMIT. Once a serialization failure occurs the transaction is flagged as "doomed" and will not, under any circumstances, be allowed to commit. If you find any exception to that, please report it as a bug. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
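[Editor's note: the "doomed" flag described above can be illustrated with a toy model. This is a sketch of the documented behavior, not PostgreSQL code; the class, exception, and method names are all hypothetical. The point it demonstrates: once a serialization failure has been raised inside a transaction, catching and swallowing the exception in client code cannot rescue it -- COMMIT fails anyway.]

```python
class SerializationFailure(Exception):
    """Stands in for SQLSTATE 40001 (serialization_failure)."""

class Transaction:
    def __init__(self):
        self.doomed = False

    def execute(self, causes_conflict):
        if causes_conflict:
            # The flag is set before the error ever reaches the client,
            # so suppressing the exception cannot clear it.
            self.doomed = True
            raise SerializationFailure("could not serialize access")

    def commit(self):
        if self.doomed:
            # Even if the earlier exception was caught and suppressed,
            # the transaction is not allowed to commit.
            raise SerializationFailure("doomed transaction cannot commit")
        return "committed"
```

The correct client response, as discussed upthread, is a retry loop around the *whole* transaction: discard everything, start a new transaction, and re-run the statements from the beginning.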
Re: [HACKERS] [PATCH] Add GUCs for predicate lock promotion thresholds
On Wed, Dec 14, 2016 at 8:16 AM, Dagfinn Ilmari Mannsåker wrote: > Attached is a patch Please add this to the open CommitFest to ensure that it is reviewed. http://commitfest.postgresql.org/action/commitfest_view/open -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company