Re: [HACKERS] pgcrypto missing header file inclusions
On Sat, Dec 28, 2013 at 10:36:19PM -0500, Peter Eisentraut wrote:

While playing around with the pginclude tools, I noticed that pgcrypto header files are failing to include some header files whose symbols they use. This change would fix it:

diff --git a/contrib/pgcrypto/pgp.h b/contrib/pgcrypto/pgp.h
index 3022abf..f856e07 100644
--- a/contrib/pgcrypto/pgp.h
+++ b/contrib/pgcrypto/pgp.h
@@ -29,6 +29,9 @@
  * contrib/pgcrypto/pgp.h
  */
 
+#include "mbuf.h"
+#include "px.h"
+
 enum PGP_S2K_TYPE
 {
 	PGP_S2K_SIMPLE = 0,

Does that look reasonable?

It's a style question - currently pgp.h expects other headers to be included in .c files. Including them in pgp.h might be better style, but then please sync .c files with it too.

-- marko

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Polymorphic function calls
On 12/30/2013 01:22 AM, Sergey Konoplev wrote: On Sun, Dec 29, 2013 at 8:44 AM, knizhnik knizh...@garret.ru wrote:

create function volume(r base_table) returns integer as $$
begin
    return r.x*r.y;
end;
$$ language plpgsql strict stable;

create function volume(r derived_table) returns integer as $$
begin
    return r.x*r.y*r.z;
end;
$$ language plpgsql strict stable;

[...]

Execution plans of the first and second queries are very similar. The difference is that the type of r_1 in the first query is base_table. It is obvious that a query should return a fixed set of columns, so "select * from base_table;" cannot return the z column. But passing the derived_table type instead of the base_table type to the volume() function for a record belonging to derived_table seems to be possible and not contradicting anything, isn't it?

Correct. Postgres chooses a function based on the passed signature. When you specify base_table it will choose volume(base_table) and for base_table it will be volume(derived_table) as well.

I think you mean "and for derived_table it will be volume(derived_table) as well".

Certainly I understand the reasons for such behavior and that it would be difficult to introduce some other non-contradictory model... But polymorphism is one of the basic features of the OO approach. Definitely PostgreSQL is not an OODBMS. But it supports inheritance. And I wonder if it is possible/desirable to make an exception to the function invocation rules for derived tables?

Actually a query on a table with inherited tables is implemented as a join of several independent table traversals. But for all derived tables we restrict the context to the scope of the base table. If it is possible to keep the real record type when it is passed to some function, we can support polymorphic calls... Will it violate some principles/rules? I am not sure...

FYI, there is a common practice to follow the DRY principle with inheritance and polymorphic functions in Postgres. On your example it might be shown like this:

create function volume(r base_table) returns integer as $$
begin
    return r.x*r.y;
end;
$$ language plpgsql strict stable;

create function volume(r derived_table) returns integer as $$
begin
    return volume(r::base_table) * r.z;
end;
$$ language plpgsql strict stable;

I do not completely understand you here. A function of a derived class can refer to the overridden function when its result depends on the result of the overridden function. But it is not always true; for example, you can derive class Circle from Ellipse, but the formula for circle volume does not need to use the implementation of volume for ellipse.

In any case, my question was not how to implement a polymorphic volume function. I asked whether it is possible to change PostgreSQL function lookup rules to support something like virtual calls.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Upgrade to Autoconf 2.69
On 2013-12-29 02:48:21 -0500, Tom Lane wrote:

4. The server tries to start, and fails because it can't find a WAL file containing the last checkpoint record. This is pretty unsurprising given the facts above. The reason you don't see any "no such file" report is that XLogFileRead() will report any BasicOpenFile() failure *other than* ENOENT. And nothing else makes up for that.

Re point 4: the logic, if you can call it that, in xlog.c and xlogreader.c is making my head spin. There are about four levels of overcomplicated and undercommented code before you ever get down to XLogFileRead, so I have no idea which level to blame for the lack of error reporting in this specific case. But there are pretty clearly some cases in which ignoring ENOENT in XLogFileRead isn't such a good idea, and XLogFileRead isn't being told when to do that or not.

Yes, that code is pretty horrid. In Heikki's and my defense, I don't think the xlogreader.c split had much to do with it tho. I think the erroring-out path essentially is:

ReadRecord() -> XLogReadRecord()* -> ReadPageInternal()* -> XLogPageRead() -> WaitForWALToBecomeAvailable() -> XLogFileReadAnyTLI() -> XLogFileRead()

The *ed functions are new, but it's really code that was in ReadRecord() before. So I don't think too much has changed since 9.0ish, although the timeline switch didn't make it simpler.

As far as I can tell XLogFileRead() actually is told when it's ok to ignore an error - the notfoundOK parameter. It's just that we're always passing true for it when we're not streaming... I think it might be sufficient to make passing that flag additionally conditional on fetching_ckpt; that's already passed to WaitForWALToBecomeAvailable(), so we'd just need to add it to XLogFileReadAnyTLI().
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
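Andres's suggested tweak - tolerating ENOENT only when we are not fetching the checkpoint record - can be sketched as a small predicate. This is a minimal illustration, not the actual xlog.c code; the helper name is invented, and in the real code the decision is the notfoundOK argument of XLogFileRead():

```c
#include <errno.h>
#include <stdbool.h>

/* Sketch: decide whether a failed BasicOpenFile() on a WAL segment may
 * be silently ignored.  missing_wal_file_is_ok() is a hypothetical name;
 * the real logic is spread across XLogFileRead() and its callers. */
static bool
missing_wal_file_is_ok(int open_errno, bool fetching_ckpt)
{
    if (open_errno != ENOENT)
        return false;           /* any other failure is always reported */
    if (fetching_ckpt)
        return false;           /* the checkpoint segment must exist */
    return true;                /* keep scanning other WAL sources */
}
```

Under this rule, the startup case Tom describes (checkpoint segment missing entirely) would produce a visible "no such file" error instead of silence.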
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On 12/30/2013 05:57 AM, Peter Geoghegan wrote: Now, when you actually posted the prototype, I realized that it was the same basic design that I'd cited in my very first e-mail on the IGNORE patch (the one that didn't have row locking at all) - nobody else wanted to do heap insertion first for promise tuples. I read that 2007 thread [1] a long time ago, but that recognition only came when you posted your first prototype, or perhaps shortly before when you started participating on list. Ah, I didn't remember that thread. Yeah, apparently I proposed the exact same design back then. Simon complained about the dead tuples being left behind, but I don't think that's a big issue with the design we've been playing around now; you only end up with dead tuples when two backends try to insert the same value concurrently, which shouldn't happen very often. Other than that, there wasn't any discussion on whether that's a good approach or not. In short, I think that my approach may be better because it doesn't conflate row locking with value locking (therefore making it easier to reason about, maintaining modularity), You keep saying that, but I don't understand what you mean. With your approach, an already-inserted heap tuple acts like a value lock, just like in my approach. You have the page-level locks on b-tree pages in addition to that, but the heap-tuple based mechanism is there too. I'm not sure that this is essential to your design, and I'm not sure what your thoughts are on this, but Andres has defended the idea of promise tuples that lock old values indefinitely pending the outcome of another xact where we'll probably want to update, and I think this is a bad idea. Andres recently seemed less convinced of this than in the past [2], but I'd like to hear your thoughts. It's very pertinent, because I think releasing locks needs to be cheap, and rendering old promise tuples invisible is not cheap. 
Well, killing an old promise tuple is not cheap, but it shouldn't happen often. In both approaches, what probably matters more is the overhead of the extra heavy-weight locking. But this is all speculation, until we see some benchmarks.

I said that I have limited enthusiasm for expanding the modularity violation that exists within the btree AM. Based on what Andres has said in the recent past on this thread about the current btree code, that in my opinion, _bt_check_unique() doing [locking heap and btree buffers concurrently] is a bug that needs fixing [3], can you really blame me? What this patch does not need is another controversy.

It seems pretty reasonable and sane that we'd implement this by generalizing from the existing mechanism. _bt_check_unique() is a modularity violation, agreed. Beauty is in the eye of the beholder, I guess, but I don't see either patch making that any better or worse.

Now, enough with the venting. Back to the drawing board, to figure out how best to fix the deadlock issue with the insert_on_dup-kill-on-conflict-2.patch. Please help me with that.

I will help you. I'll look at it tomorrow.

Thanks!

[1] http://www.postgresql.org/message-id/1172858409.3760.1618.ca...@silverbirch.site
[2] http://www.postgresql.org/message-id/20131227075453.gb17...@alap2.anarazel.de
[3] http://www.postgresql.org/message-id/20130914221524.gf4...@awork2.anarazel.de

- Heikki

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On 2013-12-29 19:57:31 -0800, Peter Geoghegan wrote: On Sun, Dec 29, 2013 at 8:18 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote:

My position is not based on a gut feeling. It is based on carefully considering the interactions of the constituent parts, plus the experience of actually building a working prototype.

I also carefully considered all that stuff, and reached a different conclusion. Plus I also actually built a working prototype (for some approximation of "working" - it's still a prototype).

Well, clearly you're in agreement with me about unprincipled deadlocking. That's what I was referring to here.

Maybe you should describe what you mean with "unprincipled". Sure, the current patch deadlocks, but I don't see anything fundamental, unresolvable there. So I don't understand what the word "unprincipled" means in that sentence...

Andres recently seemed less convinced of this than in the past [2], but I'd like to hear your thoughts.

Not really, I just don't have the time/energy to fight for it (aka write a POC) at the moment. I still think any form of promise tuple, be it index or heap based, is a much better, more general approach than yours. That doesn't preclude other approaches from being workable though.

I didn't say that locking index pages is obviously better, and I certainly didn't say anything about what you've done being fundamentally flawed. I said that I have limited enthusiasm for expanding the modularity violation that exists within the btree AM. Based on what Andres has said in the recent past on this thread about the current btree code, that in my opinion, _bt_check_unique() doing [locking heap and btree buffers concurrently] is a bug that needs fixing [3], can you really blame me?

Uh. But that was said in the context of *your* approach being flawed. Because it - at that time, I didn't look at the newest version yet - extended the concept of holding btree page locks across external operations to far more code, even including user-defined code! And you argued that that isn't a problem, using _bt_check_unique() as an argument.

I don't really see why your patch is less of a modularity violation than Heikki's POC. It's just a different direction.

PS. In btreelock_insert_on_dup_v5.2013_12_28.patch, the language used in the additional text in README is quite difficult to read. Too many difficult sentences and constructs for a non-native English speaker like me. I had to look up "concomitantly" in a dictionary and I'm still not sure I understand that sentence :-).

+1 on the simpler language front as a fellow non-native speaker. Personally, the biggest thing I think you could do in favor of your position is trying to be a bit more succinct in the mailing list discussions. I certainly fail at times at that as well, but I really try to work on it...

Greetings,

Andres Freund

-- 
Andres Freund	                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Polymorphic function calls
On Mon, Dec 30, 2013 at 2:03 AM, knizhnik knizh...@garret.ru wrote: On 12/30/2013 01:22 AM, Sergey Konoplev wrote: On Sun, Dec 29, 2013 at 8:44 AM, knizhnik knizh...@garret.ru wrote:

But passing the derived_table type instead of the base_table type to the volume() function for a record belonging to derived_table seems to be possible and not contradicting anything, isn't it?

Correct. Postgres chooses a function based on the passed signature. When you specify base_table it will choose volume(base_table) and for base_table it will be volume(derived_table) as well.

I think you mean "and for derived_table it will be volume(derived_table) as well".

Sure. Sorry for the typo.

Certainly I understand the reasons for such behavior and that it would be difficult to introduce some other non-contradictory model... But polymorphism is one of the basic features of the OO approach. Definitely PostgreSQL is not an OODBMS. But it supports inheritance. And I wonder if it is possible/desirable to make an exception to the function invocation rules for derived tables?

Actually a query on a table with inherited tables is implemented as a join of several independent table traversals.

I would say it is more like a union.

But for all derived tables we restrict the context to the scope of the base table. If it is possible to keep the real record type when it is passed to some function, we can support polymorphic calls... Will it violate some principles/rules? I am not sure...

Well, it is not implemented in Postgres.

FYI, there is a common practice to follow the DRY principle with inheritance and polymorphic functions in Postgres. On your example it might be shown like this:

create function volume(r base_table) returns integer as $$
begin
    return r.x*r.y;
end;
$$ language plpgsql strict stable;

create function volume(r derived_table) returns integer as $$
begin
    return volume(r::base_table) * r.z;
end;
$$ language plpgsql strict stable;

I do not completely understand you here. A function of a derived class can refer to the overridden function when its result depends on the result of the overridden function. But it is not always true; for example, you can derive class Circle from Ellipse, but the formula for circle volume does not need to use the implementation of volume for ellipse.

Sure, it is not a universal thing.

In any case, my question was not how to implement a polymorphic volume function. I asked whether it is possible to change PostgreSQL function lookup rules to support something like virtual calls.

You can use this hack to make it:

create or replace function volume(r base_table, t text) returns integer as $$
declare
    v integer;
begin
    execute format('select volume(r) from %I r where r.x = %L', t, r.x) into v;
    return v;
end;
$$ language plpgsql;

select volume(r, tableoid::regclass::text) from base_table r;

 volume
--------
      2
     60
(2 rows)

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA
http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray...@gmail.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
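The distinction being discussed - resolving volume() by the declared type of the row versus by the row's real type - is essentially static versus virtual dispatch. A toy C sketch of the two behaviors, with all names invented for illustration (this is not Postgres code):

```c
/* Base carries a function pointer playing the role of the record's
 * "real type"; Derived extends it with an extra column z. */
typedef struct Base
{
    int x, y;
    int (*volume) (const struct Base *);    /* hook for the real type */
} Base;

typedef struct Derived
{
    Base base;
    int  z;
} Derived;

static int
base_volume(const Base *r)
{
    return r->x * r->y;
}

static int
derived_volume(const Base *r)
{
    const Derived *d = (const Derived *) r;
    return r->x * r->y * d->z;
}

/* Static dispatch: the callee is fixed by the declared type at the call
 * site - analogous to how Postgres picks volume(base_table) today. */
static int
volume_static(const Base *r)
{
    return base_volume(r);
}

/* Dynamic dispatch: the callee follows the record's real type - the
 * "virtual call" behavior being asked about. */
static int
volume_dynamic(const Base *r)
{
    return r->volume(r);
}

/* Both demos pass the same derived row (x=2, y=3, z=5) as a Base. */
static int
demo_static(void)
{
    Derived d = {{2, 3, derived_volume}, 5};
    return volume_static(&d.base);
}

static int
demo_dynamic(void)
{
    Derived d = {{2, 3, derived_volume}, 5};
    return volume_dynamic(&d.base);
}
```

The SQL hack above emulates volume_dynamic() by re-fetching the row through its tableoid, at the cost of an extra query per call.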
[HACKERS] Patch: Show process IDs of processes holding a lock; show relation and tuple infos of a lock to acquire
Hi there,

I created two patches improving the log messages generated by log_lock_waits.

The first patch shows the process IDs holding a lock we try to acquire (show_pids_in_lock_log.patch). Sample output (log_lock_waits=on required):

session 1: BEGIN; LOCK TABLE foo IN SHARE MODE;
session 2: BEGIN; LOCK TABLE foo IN SHARE MODE;
session 3: BEGIN; LOCK TABLE foo IN EXCLUSIVE MODE;

Output w/o patch:

LOG:  process 13777 still waiting for ExclusiveLock on relation 16385 of database 16384 after 1000.030 ms

Output with patch:

LOG:  process 13777 still waiting for ExclusiveLock on relation 16385 of database 16384 after 1000.030 ms
CONTEXT:  processes owning lock: 13775, 13776

The second patch (show_table_name_and_tuple_in_lock_log.patch) includes relation info (table name and OID) as well as some tuple information (if available). Sample output (log_lock_waits=on required):

session 1: CREATE TABLE foo (val integer); INSERT INTO foo (val) VALUES (1); BEGIN; UPDATE foo SET val = 3;
session 2: BEGIN; UPDATE foo SET val = 2;

Output w/o patch:

LOG:  process 24774 acquired ShareLock on transaction 696 after 11688.720 ms

Output with patch:

LOG:  process 24774 acquired ShareLock on transaction 696 after 11688.720 ms
CONTEXT:  relation name: foo (OID 16385)
tuple (ctid (0,1)): (1)

Regarding this patch I am not really sure where to put the functions. Currently they are located in backend/storage/lmgr/lmgr.c because XactLockTableWait() is located there, too. What do you think?

I also created two test specs for easy creation of the log output; however, I was not able to provide an expected file since the process IDs vary from test run to test run.
Regards,

-- 
Christian Kruse	               http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 122afb2..fdfacdf 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1195,13 +1195,19 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 		 */
 		if (log_lock_waits && deadlock_state != DS_NOT_YET_CHECKED)
 		{
-			StringInfoData buf;
+			StringInfoData buf,
+						buf1;
 			const char *modename;
 			long		secs;
 			int			usecs;
 			long		msecs;
+			SHM_QUEUE  *procLocks;
+			PROCLOCK   *proclock;
+			bool		first = true;
+			int			lockHoldersNum = 0;
 
 			initStringInfo(&buf);
+			initStringInfo(&buf1);
 			DescribeLockTag(&buf, &locallock->tag.lock);
 			modename = GetLockmodeName(locallock->tag.lock.locktag_lockmethodid,
 									   lockmode);
@@ -1211,10 +1217,42 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 			msecs = secs * 1000 + usecs / 1000;
 			usecs = usecs % 1000;
 
+			LWLockAcquire(partitionLock, LW_SHARED);
+
+			procLocks = &(lock->procLocks);
+			proclock = (PROCLOCK *) SHMQueueNext(procLocks, procLocks,
+												 offsetof(PROCLOCK, lockLink));
+
+			while (proclock)
+			{
+				if (MyProc->pid != proclock->tag.myProc->pid)
+				{
+					if (first)
+					{
+						appendStringInfo(&buf1, "%d",
+										 proclock->tag.myProc->pid);
+						first = false;
+					}
+					else
+						appendStringInfo(&buf1, ", %d",
+										 proclock->tag.myProc->pid);
+
+					lockHoldersNum++;
+				}
+
+				proclock = (PROCLOCK *) SHMQueueNext(procLocks, &proclock->lockLink,
+													 offsetof(PROCLOCK, lockLink));
+			}
+
+			LWLockRelease(partitionLock);
+
 			if (deadlock_state == DS_SOFT_DEADLOCK)
 				ereport(LOG,
 						(errmsg("process %d avoided deadlock for %s on %s by rearranging queue order after %ld.%03d ms",
-								MyProcPid, modename, buf.data, msecs, usecs)));
+								MyProcPid, modename, buf.data, msecs, usecs),
+						 (errcontext(ngettext("process owning lock: %s",
+											  "processes owning lock: %s",
+											  lockHoldersNum), buf1.data))));
 			else if (deadlock_state == DS_HARD_DEADLOCK)
 			{
 				/*
@@ -1226,13 +1264,19 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 				 */
 				ereport(LOG,
 						(errmsg("process %d detected deadlock while waiting for %s on %s after %ld.%03d ms",
-								MyProcPid, modename, buf.data, msecs, usecs)));
+								MyProcPid, modename, buf.data, msecs, usecs),
+						 (errcontext(ngettext("process owning lock: %s",
+											  "processes owning lock: %s",
+											  lockHoldersNum), buf1.data))));
 			}
 
 			if (myWaitStatus == STATUS_WAITING)
 				ereport(LOG,
 						(errmsg("process %d still waiting for %s on %s after %ld.%03d ms",
-								MyProcPid, modename, buf.data, msecs, usecs)));
+								MyProcPid, modename, buf.data, msecs, usecs),
+						 (errcontext(ngettext("process owning lock: %s",
+											  "processes owning lock: %s",
+											  lockHoldersNum), buf1.data))));
 			else if (myWaitStatus == STATUS_OK)
 				ereport(LOG,
 						(errmsg("process %d acquired %s on %s after %ld.%03d ms",
@@ -1252,7 +1296,10 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 		if (deadlock_state !=
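The core of the patch's loop is joining the other holders' PIDs with ", " while counting them for the ngettext() singular/plural choice. A standalone sketch of just that formatting step, with snprintf() standing in for appendStringInfo() and a plain array standing in for the PROCLOCK queue (the helper name is hypothetical):

```c
#include <stdio.h>
#include <string.h>

/* Collect the PIDs of the other lock holders into a comma-separated
 * list, skipping our own PID, and count them - the count drives the
 * "process owning lock" vs "processes owning lock" message choice. */
static void
format_lock_holders(const int *pids, int npids, int my_pid,
                    char *out, size_t outsz, int *lock_holders_num)
{
    int     first = 1;

    out[0] = '\0';
    *lock_holders_num = 0;
    for (int i = 0; i < npids; i++)
    {
        size_t  len;

        if (pids[i] == my_pid)
            continue;           /* don't list ourselves, as the patch does */
        len = strlen(out);
        snprintf(out + len, outsz - len, first ? "%d" : ", %d", pids[i]);
        first = 0;
        (*lock_holders_num)++;
    }
}
```

With the PIDs from the sample sessions above ({13775, 13776, 13777}, our own PID being 13777), this yields "13775, 13776" and a count of 2, matching the CONTEXT line in the sample output.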
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Mon, Dec 30, 2013 at 8:19 AM, Andres Freund and...@2ndquadrant.com wrote: Maybe you should describe what you mean with unprincipled. Sure, the current patch deadlocks, but I don't see anything fundamental, unresolvable there. So I don't understand what the word unprincipled means in that sentence.. Maybe it is resolvable, and maybe it's worth resolving - I never said that it wasn't, I just said that I doubt the latter. By unprincipled deadlocking, I mean deadlocking that cannot be reasonably prevented by a user. Currently, I think that never deadlocking is a reasonable aspiration for all applications. It's never really necessary. When it occurs, we can advise users to do simple analysis and application refactoring to prevent it. With unprincipled deadlocking, we can give no such advice. The only advice we can give is to stop doing so much upserting, which is a big departure from how things are today. AFAICT, no one disagrees with my view that this is bad, and probably unacceptable. Uh. But that was said in the context of *your* approach being flawed. Because it - at that time, I didn't look at the newest version yet - extended the concept of holding btree page locks across external operation to far much more code, even including user defined code!. And you argued that that isn't a problem using _bt_check_unique() as an argument. That's a distortion of my position at the time. I acknowledged from the start that all buffer locking was problematic (e.g. [1]), and was exploring alternative locking approaches and the merit of the design. This is obviously the kind of project that needs to be worked at through iterative prototyping. While arguing that deadlocking would not occur, I lost sight of the bigger picture. But even if that wasn't true, I don't know why you feel the need to go on and on about buffer locking like this months later. Are you trying to be provocative? Can you *please* stop? Everyone knows that the btree heap access is a modularity violation. 
Even the AM docs say that the heap access is "without a doubt ugly and non-modular". So my original point remains, which is that expanding that is obviously going to be controversial, and probably legitimately so. I thought that your earlier remarks on _bt_check_unique() were a good example of this sentiment, but I hardly need such an example.

I don't really see why your patch is less of a modularity violation than Heikki's POC. It's just a different direction.

My approach does not regress modularity because it doesn't do anything extra with the heap at all, and only btree insertion routines are affected. Locking is totally localized to the btree insertion routines - one .c file. At no other point does anything else have to care, and it's obvious that this situation won't change in the future when we decide to do something else cute with infomask bits or whatever. That's a *huge* distinction.

[1] http://www.postgresql.org/message-id/cam3swzr2x4hjg7rjn0k4+hfdgucyx2prep0y3a7nccejeow...@mail.gmail.com

-- 
Peter Geoghegan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On 2013-12-30 12:29:22 -0800, Peter Geoghegan wrote:

But even if that wasn't true, I don't know why you feel the need to go on and on about buffer locking like this months later. Are you trying to be provocative? Can you *please* stop?

ERR? Peter? *You* quoted a statement of mine that only made sense in its original context. And I *did* say that the point about buffer locking applied to the *past* version of the patch.

Andres

-- 
Andres Freund	                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On 12/30/2013 12:45 PM, Andres Freund wrote: On 2013-12-30 12:29:22 -0800, Peter Geoghegan wrote: But even if that wasn't true, I don't know why you feel the need to go on and on about buffer locking like this months later. Are you trying to be provocative? Can you *please* stop? ERR? Peter? *You* quoted a statement of mine that only made sense in it's original context. And I *did* say that the point about buffer locking applied to the *past* version of the patch. Alright this seems to have gone from confusion about the proposal to confusion about the confusion. Might I suggest a cooling off period and a return to the discussion in possibly a Wiki page where the points/counter points could be laid out more efficiently? Andres -- Adrian Klaver adrian.kla...@gmail.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Mon, Dec 30, 2013 at 7:22 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Ah, I didn't remember that thread. Yeah, apparently I proposed the exact same design back then. Simon complained about the dead tuples being left behind, but I don't think that's a big issue with the design we've been playing around now; you only end up with dead tuples when two backends try to insert the same value concurrently, which shouldn't happen very often. Right, because you check first, which also has a cost, paid in CPU cycles and memory bandwidth, and buffer lock contention. As opposed to a cost almost entirely localized to inserters into a single leaf page per unique index for only an instant. You're checking *all* unique indexes. You call check_exclusion_or_unique_constraint() once per unique index (or EC), and specify to wait on the xact, at least until a conflict is found. So if you're waiting on an xact, your conclusion that earlier unique indexes had no conflicts could soon become totally obsolete. So for non-idiomatic usage, say like the usage Andres in particular cares about for MM conflict resolution, I worry about the implications of that. I'm not asserting that it's a problem, but it does seem like something that's quite hard to reason about. Maybe Andres can comment. In short, I think that my approach may be better because it doesn't conflate row locking with value locking (therefore making it easier to reason about, maintaining modularity), You keep saying that, but I don't understand what you mean. With your approach, an already-inserted heap tuple acts like a value lock, just like in my approach. You have the page-level locks on b-tree pages in addition to that, but the heap-tuple based mechanism is there too. Yes, but that historic behavior isn't really value locking at all. That's very much like row locking, because there is a row, not the uncertain intent to try to insert a row. 
Provided your transaction commits and the client's transaction doesn't delete the row, the row is definitely there. For upsert, conflicts may well be the norm, not the exception. Value locking is the exclusive lock on the buffer held during _bt_check_unique(). I'm trying to safely extend that mechanism, to reach consensus among unique indexes, which to me seems the most logical and conservative approach. For idiomatic usage, it's only sensible for there to be a conflict on one unique index, known ahead of time. If you don't know where the conflict will be, then typically your DML statement is unpredictable, just like the MySQL feature. Otherwise, for MM conflict resolution, I think it makes sense to pick those conflicts off, one at a time, dealing with exactly one row per conflict. I mean, even with your approach, you're still not dealing with later conflicts in later unique indexes, right? The fact that you prevent conflicts on previously non-conflicting unique indexes only, and, I suppose, not later ones too, seems quite arbitrary. Well, killing an old promise tuple is not cheap, but it shouldn't happen often. In both approaches, what probably matters more is the overhead of the extra heavy-weight locking. But this is all speculation, until we see some benchmarks. Fair enough. We'll know more when we have fixed the exclusion constraint supporting patch, which will allow us to make a fair comparison. I'm working on it. Although I'd suggest that having dead duplicates in indexes where that's avoidable is a cost that isn't necessarily that easily characterized. I especially don't like that you're currently doing the UNIQUE_CHECK_PARTIAL deferred unique constraint thing of always inserting, continuing on for *all* unique indexes regardless of finding a duplicate. Whatever overhead my approach may imply around lock contention, clearly the cost to index scans is zero. The other thing is that if you're holding old value locks (i.e. 
previously inserted btree tuples, from earlier unique indexes) pending resolving a value conflict, you're holding those value locks indefinitely pending the completion of the other guy's xact, just in case there ends up being no conflict, which in general is unlikely. So in extreme cases, that could be the difference between waiting all day (on someone sitting on a lock that they very probably have no use for), and not waiting at all. _bt_check_unique() is a modularity violation, agreed. Beauty is in the eye of the beholder, I guess, but I don't see either patch making that any better or worse. Clearly the way in which you envisage releasing locks to prevent unprincipled deadlocking implies that btree has to know more about the heap, and maybe even that the heap has to know something about btree, or at least about amcanunique AMs (including possible future amcanunique AMs that may or may not be well suited to implementing this the same way). -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your
Re: [HACKERS] Proposal: variant of regclass
It seems fine to me if the new function ignores the specific error of "relation does not exist" while continuing to throw other errors.

Thanks. Here is the revised conceptual patch. I'm going to post a concrete patch and register it for the 2014-01 Commit Fest.

Before proceeding with the work, I would like to make sure that the following is the complete list of new functions. Inside parentheses are the corresponding original functions.

toregproc (regproc)
toregoper (regoper)
toregclass (regclass)
toregtype (regtype)

In addition to the above, we have regprocedure, regoperator, regconfig, pg_ts_config and regdictionary. However, I don't see much of a use case/users for new functions corresponding to them. Opinions?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
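The agreed-on semantics - swallow only "relation does not exist", propagate everything else - can be sketched as a toy lookup wrapper. This is purely illustrative: the two-entry "catalog", the helper name, and the error-flag convention are all invented (the real functions would return SQL NULL and reuse the parser's lookup paths):

```c
#include <string.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Hypothetical toregclass()-style lookup: returns the OID, or InvalidOid
 * standing in for SQL NULL; *is_error is set only for failures other
 * than "not found" (here, a syntactically invalid empty name). */
static Oid
toregclass_sketch(const char *name, int *is_error)
{
    *is_error = 0;
    if (name == NULL || name[0] == '\0')
    {
        *is_error = 1;          /* bad input: still an error, per above */
        return InvalidOid;
    }
    if (strcmp(name, "pg_class") == 0)
        return 1259;            /* pg_class really has OID 1259 */
    if (strcmp(name, "pg_type") == 0)
        return 1247;            /* pg_type really has OID 1247 */
    return InvalidOid;          /* missing relation: swallowed, NULL result */
}
```

The useful property is that callers can probe for a relation's existence in one expression without a savepoint or an exception block.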
Re: [HACKERS] Proposal: variant of regclass
On 12/31/2013 02:38 AM, Tatsuo Ishii wrote: Before proceeding the work, I would like to make sure that followings are complete list of new functions. Inside parentheses are corresponding original functions. toregproc (regproc) toregoper (regoper) toregclass (regclass) toregtype (regtype) Pardon the bikeshedding, but those are hard to read for me. I would prefer to go with the to_timestamp() model and add an underscore to those names. -- Vik -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] truncating pg_multixact/members
Alvaro Herrera wrote:

1. slru.c doesn't consider file names longer than 4 hexadecimal chars. For 9.3, I propose we skip this and tweak the code to consider files whose names are 4 or 5 chars in length, to remain compatible with existing installations that have pg_multixact/members containing a mixture of 4-char and 5-char file names. Attached is a patch for this.

2. pg_multixact/members truncation requires more intelligence to avoid removing files that are still needed. Right now we use modulo-2^32 arithmetic, but this doesn't work because the useful range can span more than what we can represent within that range.

#2c At start of truncation, save end-of-range in MultiXactState. This state is updated by GetNewMultiXactId as new files are created. That way, before each new file is created, the truncation routine knows to skip it. Attached is a patch implementing this.

I also attach a patch implementing a "burn multixact" utility, initially coded by Andres Freund, tweaked by me. I used it to run a bunch of wraparound cycles and everything seems to behave as expected. (I don't recommend applying this patch; I'm posting it merely because it's a very useful debugging tool.)

One problem I see is the length of time before freezing multis: they live for far too long, causing the SLRU files to eat way too much disk space. I ran burnmulti in a loop, creating multis of 3 members each, with a min freeze age of 50 million, and this leads to ~770 files in pg_multixact/offsets and ~2900 files in pg_multixact/members. Each file is 32 pages long, 256kB apiece. Probably enough to be bothersome. I think for computing the freezing point for multis, we should slash min_freeze_age by 10 or something like that. Or just set a hardcoded one million.

3. New pg_multixact/members generation requires more intelligence to avoid stomping on files from the previous wraparound cycle. Right now there is no defense against this at all. I still have no idea how to attack this.
-- Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From b264bfe61b315a5f65d72b9550592cc9f73bf0a8 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvhe...@alvh.no-ip.org>
Date: Tue, 31 Dec 2013 00:42:11 -0300
Subject: [PATCH 1/3] pg_burn_multixact utility

Andres Freund, minor tweaks by me
---
 contrib/pageinspect/heapfuncs.c          | 42 ++
 contrib/pageinspect/pageinspect--1.1.sql |  5
 src/backend/access/heap/heapam.c         |  2 +-
 src/backend/access/transam/multixact.c   | 12 +
 src/include/access/multixact.h           |  3 ++-
 5 files changed, 57 insertions(+), 7 deletions(-)

diff --git a/contrib/pageinspect/heapfuncs.c b/contrib/pageinspect/heapfuncs.c
index 6d8f6f1..93a3317 100644
--- a/contrib/pageinspect/heapfuncs.c
+++ b/contrib/pageinspect/heapfuncs.c
@@ -29,6 +29,8 @@
 #include "funcapi.h"
 #include "utils/builtins.h"
 #include "miscadmin.h"
+#include "access/multixact.h"
+#include "access/transam.h"

 Datum heap_page_items(PG_FUNCTION_ARGS);
@@ -224,3 +226,43 @@ heap_page_items(PG_FUNCTION_ARGS)
 	else
 		SRF_RETURN_DONE(fctx);
 }
+
+extern Datum
+pg_burn_multixact(PG_FUNCTION_ARGS);
+PG_FUNCTION_INFO_V1(pg_burn_multixact);
+
+Datum
+pg_burn_multixact(PG_FUNCTION_ARGS)
+{
+	int			rep = PG_GETARG_INT32(0);
+	int			size = PG_GETARG_INT32(1);
+	MultiXactMember *members;
+	MultiXactId ret;
+	TransactionId id = ReadNewTransactionId() - size;
+	int			i;
+
+	if (rep < 1)
+		elog(ERROR, "need to burn, burn, burn");
+
+	members = palloc(size * sizeof(MultiXactMember));
+	for (i = 0; i < size; i++)
+	{
+		members[i].xid = id++;
+		members[i].status = MultiXactStatusForShare;
+
+		if (!TransactionIdIsNormal(members[i].xid))
+		{
+			id = FirstNormalTransactionId;
+			members[i].xid = id++;
+		}
+	}
+
+	MultiXactIdSetOldestMember();
+
+	for (i = 0; i < rep; i++)
+	{
+		ret = MultiXactIdCreateFromMembers(size, members, true);
+	}
+
+	PG_RETURN_INT64((int64) ret);
+}
diff --git a/contrib/pageinspect/pageinspect--1.1.sql b/contrib/pageinspect/pageinspect--1.1.sql
index 22a47d5..b895246 100644
--- a/contrib/pageinspect/pageinspect--1.1.sql
+++ b/contrib/pageinspect/pageinspect--1.1.sql
@@ -105,3 +105,8 @@
 CREATE FUNCTION fsm_page_contents(IN page bytea)
 RETURNS text
 AS 'MODULE_PATHNAME', 'fsm_page_contents'
 LANGUAGE C STRICT;
+
+CREATE FUNCTION pg_burn_multixact(num int4, size int4)
+RETURNS int4
+AS 'MODULE_PATHNAME', 'pg_burn_multixact'
+LANGUAGE C STRICT;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 90e9e6f..1b8469f 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -5544,7 +5544,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 	 * Create a new multixact with the surviving members of the previous
 	 * one, to set as new Xmax in the tuple.
 	 */
-	xid = MultiXactIdCreateFromMembers(nnewmembers, newmembers);
+	xid =
Re: [HACKERS] [PATCH] Regression tests in windows ignore white space
On Mon, Dec 30, 2013 at 11:08 AM, David Rowley dgrowle...@gmail.com wrote: On Mon, Dec 30, 2013 at 6:02 PM, Amit Kapila amit.kapil...@gmail.com wrote: For the Windows build, I am using whatever the latest Git provides rather than downloading individual components, which might not be good, but I find it convenient. The latest Git (1.8.4) download on Windows still provides 2.7, which is the reason I am on an older version. However I agree that it is better to use the latest version.

It looks like the diff version I'm using is from msys, and msys is what is in my search path rather than the git\bin path. To be honest I didn't realise that Git for Windows came with bison and flex (at least I see bison.exe and flex.exe in the git\bin path.. I don't remember putting them there).

Yes, Git for Windows comes with bison, flex, etc. required for the PG build.

I don't seem to even have git\bin in my %path% environment variable at all, so all those tools are being picked up from the msys path. I'd need to remind myself about the msys install process, but since we're already saying in the docs that msys should be installed,

I think that is just one of the ways to set up on Windows; not every user uses msys.

then would the fix for this not just be as simple as my patch plus a note in the docs to say to ensure msys\bin occurs before git\bin in the %path% environment var, minimum supported diff version is 2.8.

To be honest, I am also not fully aware of the project policy in a matter where we have to mention the minimum version of a dependent executable (diff). I request some senior members to please let us know whether we can make PG dependent on diff version >= 2.8 (at least for Windows), as some of the options are not available in older versions. Another option could be: if the version of diff available is >= 2.8, then use '--strip-trailing-cr', else use the existing option '-w', but I am not sure if it will serve the purpose completely.

Did you install msys?
if so does it have a later version of diff? No. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Proposal: variant of regclass
On 12/31/2013 02:38 AM, Tatsuo Ishii wrote: Before proceeding the work, I would like to make sure that followings are complete list of new functions. Inside parentheses are corresponding original functions. toregproc (regproc) toregoper (regoper) toregclass (regclass) toregtype (regtype) Pardon the bikeshedding, but those are hard to read for me. I would prefer to go with the to_timestamp() model and add an underscore to those names. I have no problem with adding to_. Objection? Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
[HACKERS] fix_PGSTAT_NUM_TABENTRIES_macro patch
In src/include/pgstat.h, the PGSTAT_NUM_TABENTRIES macro attempts to subtract off the size of the PgStat_MsgTabstat struct up to the m_entry[] field. This macro was correct up until the fields m_block_read_time and m_block_write_time were added to that struct, as the macro was not changed to include their size. This patch fixes the macro. As an aside, the PGSTAT_MSG_PAYLOAD define from which these field sizes are being subtracted is a bit of a WAG, if I am reading the code correctly, in which case whether the two additional fields are subtracted from that WAG is perhaps not critical. But the code is certainly easier to understand if the macro matches the definition of the struct. Mark Dilger fix_PGSTAT_NUM_TABENTRIES_macro Description: Binary data
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Sun, Dec 29, 2013 at 9:09 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: While mulling this over further, I had an idea about this: suppose we marked the tuple in some fashion that indicates that it's a promise tuple. I imagine an infomask bit, although the concept makes me wince a bit since we don't exactly have bit space coming out of our ears there. Leaving that aside for the moment, whenever somebody looks at the tuple with a mind to calling XactLockTableWait(), they can see that it's a promise tuple and decide to wait on some other heavyweight lock instead. The simplest thing might be for us to acquire a heavyweight lock on the promise tuple before making index entries for it, and then have callers always wait on that, instead of transitioning from the tuple lock to the xact lock. Yeah, that seems like it should work. You might not even need an infomask bit for that; just take the other heavyweight lock always before calling XactLockTableWait(), whether it's a promise tuple or not. If it's not, acquiring the extra lock is a waste of time but if you're going to sleep anyway, the overhead of one extra lock acquisition hardly matters. Are you suggesting that I lock the tuple only (say, through a special LockPromiseTuple() call), or lock the tuple *and* call XactLockTableWait() afterwards? You and Robert don't seem to be in agreement about which, here. From here on I assume Robert's idea (only get the special promise lock where appropriate), because that makes more sense to me. I've taken a look at this idea, but got frustrated. You're definitely going to need an infomask bit for this. Otherwise, how do you differentiate between a pending promise tuple and a fulfilled promise tuple (or a tuple that never had anything to do with promises in the first place)? On the one hand, you'll want to wake up as soon as it becomes clear that the former is not going to become the latter.
On the other hand, you really will want to wait until xact end on the pending promise tuple when it becomes a fulfilled promise, or on an already-fulfilled promise tuple, or a plain old tuple. It's either locking the promise tuple, or locking the xid; never both, because the combination makes no sense in any case (unless you're talking about the case where you lock the promise tuple and then later *somehow* decide that you need to lock the xid, as the upserter releases promise tuple locks directly within ExecInsert() upon successful insertion). The fact that your LockPromiseTuple() call didn't find someone else with the lock does not mean no one ever promised the tuple (assuming no infomask bit has the relevant info). Obviously you can't just have upserters hold on to the promise tuple locks until xact end if the promiser's insertion succeeds, for the same reason we don't with regular in-memory tuple locks: they're totally unbounded. So not only are you going to need an infomask promise bit, you're going to need to go and unset the bit in the event of a *successful* insertion, so that waiters know to wait on your xact now when you finally UnlockPromiseTuple() within ExecInsert() to finish off successful insertion. *And*, all XactLockTableWait() promise waiters need to go back and check that, just in case. This problem illustrates what I mean about conflating row locking with value locking. I think the interlocking with buffer locks and heavyweight locks to make that work could be complex. Hmm. Can you elaborate? What I meant is that you should be wary of what you go on to describe below. The inserter has to acquire the heavyweight lock before releasing the buffer lock, because otherwise another inserter (or deleter or updater) might see the tuple, acquire the heavyweight lock, and fall to sleep on XactLockTableWait(), before the inserter has grabbed the heavyweight lock. If that race condition happens, you have the original problem again, ie.
the updater unnecessarily waits for the inserting transaction to finish, even though it already killed the tuple it inserted. Right. Can you suggest a workaround to the above problems? -- Peter Geoghegan
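The lifecycle being debated — mark a tuple as a pending promise before making index entries, then clear the marker on successful insertion so that later waiters fall back to waiting on the inserter's xid — can be sketched with plain bit operations. The flag name and mask value below are invented for illustration (as the thread notes, real infomask bits are scarce), and this is not proposed PostgreSQL code:

```c
#include <stdint.h>
#include <stdbool.h>

#define HEAP_TUPLE_IS_PROMISE 0x0008u   /* hypothetical infomask flag */

typedef struct TupleHeader
{
    uint16_t infomask;
} TupleHeader;

/* Before making index entries: mark the tuple as a pending promise.
 * Waiters that see this bit sleep on the promise-tuple lock. */
static void
promise_begin(TupleHeader *tup)
{
    tup->infomask |= HEAP_TUPLE_IS_PROMISE;
}

/* On successful insertion: clear the bit, so waiters switch to
 * XactLockTableWait() on the inserting xid instead. */
static void
promise_fulfill(TupleHeader *tup)
{
    tup->infomask &= (uint16_t) ~HEAP_TUPLE_IS_PROMISE;
}

/* A waiter picks which lock to sleep on by inspecting the bit. */
static bool
wait_on_promise_lock(const TupleHeader *tup)
{
    return (tup->infomask & HEAP_TUPLE_IS_PROMISE) != 0;
}
```

The "just in case" re-check Peter mentions corresponds to inspecting the bit again after waking, since it may have been cleared while the waiter slept.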