Re: [HACKERS] determine snapshot after obtaining locks for first statement
Hi,

Quoting Tom Lane <t...@sss.pgh.pa.us>:
> Not at the table level. If you could lock only at the tuple level maybe
> you'd have something

AFAIU this is about the tuple-level lock, yes.

> but it seems like you can't find the target tuples without having
> acquired a snapshot.

Maybe not *the* target tuple, but we could certainly find candidate target tuples. Of course it's impossible to determine visibility without a snapshot (and thus to find *the* one). But it seems to me it might suffice to optimistically pick a plausible tuple (i.e. one determined by a candidate snapshot) and try to lock that. Only after we hold the lock do we take a real snapshot and re-check visibility of the tuple we've locked. If it's the visible target tuple we want to update, we're fine; if not, another transaction updated the tuple and we have to look for the new version of that tuple. As we've taken the new snapshot *after* that other transaction updated the tuple of interest, we now see the *new* tuple and can continue with our transaction normally (instead of having to abort with a serialization failure).

So yes, to me this looks like a theoretically possible performance gain. It only helps the very first tuple write, and it seems to apply to SERIALIZABLE transactions exclusively.

One minor gotcha remains, though: there's another possible cause for the visibility check to fail, namely that our initial pick with the candidate snapshot was simply wrong. In that unfortunate case we can continue as described, but it's worth noting that we were waiting on the wrong lock (a tuple that's not visible according to the real snapshot might, for example, have come from an aborted transaction). The candidate snapshot should therefore be reasonably good, but I don't think that's much of an issue.

If we want to completely eliminate serialization failures on the first (writing) operation within a transaction, we'd have to repeat these steps whenever a visibility check fails, meaning we degrade the real snapshot acquired after the first lock to a candidate snapshot for the second tuple lock we try. Maybe "candidate snapshots" is a good short name for such a feature?

Another line of thought: isn't this like READ COMMITTED for just the first operation in a SERIALIZABLE transaction?

Regards

Markus Wanner

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
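The optimistic lock-then-snapshot protocol described above can be sketched as a toy simulation. This is a hypothetical model, not PostgreSQL internals: row versions are dicts with an `xmin`, a snapshot is just an integer cutoff, and visibility is simple comparison. The names (`lock_then_snapshot`, `take_snapshot`) are invented for illustration.

```python
# Toy sketch of the "candidate snapshot" idea: pick a plausible target
# tuple with a (possibly stale) candidate snapshot, lock it, then take
# the real snapshot and re-check which version is actually visible.

def visible(version, snapshot):
    # A version is visible if its creating xact committed at or before
    # the snapshot point (wildly simplified visibility rule).
    return version["xmin"] <= snapshot

def lock_then_snapshot(versions, candidate_snapshot, take_snapshot):
    """versions: all versions of one logical row; returns the version we
    end up holding the lock on, under the real snapshot."""
    # Step 1: optimistic pick using the candidate snapshot.
    pick = max((v for v in versions if visible(v, candidate_snapshot)),
               key=lambda v: v["xmin"])
    while True:
        pick["locked"] = True              # acquire the tuple lock
        real = take_snapshot()             # real snapshot *after* the lock
        latest = max((v for v in versions if visible(v, real)),
                     key=lambda v: v["xmin"])
        if latest is pick:
            return pick                    # we locked the right version
        pick["locked"] = False             # wrong guess: follow the update
        pick = latest                      # chain and retry, no abort needed
```

The point of the sketch is the retry in the loop: a concurrent update between the candidate pick and the real snapshot is handled by re-locking the newer version rather than raising a serialization failure.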
Re: [HACKERS] [BUG?] strange behavior in ALTER TABLE ... RENAME TO on inherited columns
This is a patch for the matter I reported before. When a column is inherited from multiple relations, ALTER TABLE ... RENAME TO is problematic. This patch fixes the matter: more precisely, it forbids renaming columns that are inherited from multiple relations and merged.

Also see the past discussion: http://archives.postgresql.org/pgsql-hackers/2009-11/msg00138.php

  postgres=# CREATE TABLE t1 (a int, b int);
  CREATE TABLE
  postgres=# CREATE TABLE t2 (b int, c int);
  CREATE TABLE
  postgres=# CREATE TABLE t3 (x text) INHERITS (t1, t2);
  NOTICE:  merging multiple inherited definitions of column "b"
  CREATE TABLE
  postgres=# SELECT * FROM t3;
   a | b | c | x
  ---+---+---+---
  (0 rows)

It looks fine to me.

  postgres=# ALTER TABLE t1 RENAME b TO y;
  ALTER TABLE
  postgres=# SELECT * FROM t3;
   a | y | c | x
  ---+---+---+---
  (0 rows)
  postgres=# SELECT * FROM t1;
   a | y
  ---+---
  (0 rows)

Still fine.

  postgres=# SELECT * FROM t2;
  ERROR:  could not find inherited attribute "b" of relation "t3"

Oops. When we refer to t3 via t2, it expects the inheritance child to also have the column b, but it was already renamed.

One remaining trouble is the regression test. The misc_create test creates the a_star table, which is inherited by b_star and c_star, which are in turn inherited by d_star. The misc test then renames a_star.a, but this patch forbids that. It looks like it should work, but it is a corner case: d_star.a is eventually inherited from a_star via both b_star and c_star, and those are all the inheritance paths. In general we have no reasonable way to rename all the related columns above and below in the inheritance hierarchy.

Thanks,

(2009/11/05 9:57), KaiGai Kohei wrote:
> Tom Lane wrote:
>> Thom Brown <thombr...@gmail.com> writes:
>>> 2009/11/4 Alvaro Herrera <alvhe...@commandprompt.com>:
>>>> KaiGai Kohei wrote:
>>>>> I think we should not allow to rename a column with attinhcount > 1.
>>>> I think we should fix ALTER TABLE to cope with multiple inheritance.
>>> I'd be interested to see how this should work.
>> Yeah. I don't think a fix is possible, because there is no
>> non-astonishing way for it to behave. I think KaiGai is right that
>> forbidding the rename is the best solution.
>
> The attached patch forbids the rename when the attribute is inherited
> from multiple parents.
>
>   postgres=# CREATE TABLE t1 (a int, b int);
>   CREATE TABLE
>   postgres=# CREATE TABLE t2 (b int, c int);
>   CREATE TABLE
>   postgres=# CREATE TABLE t3 (d int) INHERITS (t1, t2);
>   NOTICE:  merging multiple inherited definitions of column "b"
>   CREATE TABLE
>   postgres=# SELECT * FROM t3;
>    a | b | c | d
>   ---+---+---+---
>   (0 rows)
>   postgres=# ALTER TABLE t1 RENAME b TO x;
>   ERROR:  cannot rename multiple inherited column "b"
>
> The regression test detected a problem in the misc test. It tries to
> rename column a of the a_star table, but that now fails due to the new
> restriction.
>
>   --
>   -- test the star operators a bit more thoroughly -- this time,
>   -- throw in lots of NULL fields...
>   --
>   -- a is the type root
>   -- b and c inherit from a (one-level single inheritance)
>   -- d inherits from b and c (two-level multiple inheritance)
>   -- e inherits from c (two-level single inheritance)
>   -- f inherits from e (three-level single inheritance)
>   --
>   CREATE TABLE a_star ( class char, a int4 );
>   CREATE TABLE b_star ( b text ) INHERITS (a_star);
>   CREATE TABLE c_star ( c name ) INHERITS (a_star);
>   CREATE TABLE d_star ( d float8 ) INHERITS (b_star, c_star);
>
> In the misc test:
>
>   *** 242,278 ***
>     ALTER TABLE c_star* RENAME COLUMN c TO cc;
>     ALTER TABLE b_star* RENAME COLUMN b TO bb;
>     ALTER TABLE a_star* RENAME COLUMN a TO aa;
>   + ERROR:  cannot rename multiple inherited column a
>     SELECT class, aa FROM a_star* x WHERE aa ISNULL;
>   ! ERROR:  column "aa" does not exist
>   ! LINE 1: SELECT class, aa ...
>
> It seems to me this is a case where the regression test should be fixed
> up. (We don't have any reasonable way to know whether a certain
> attribute has the same source or not.)
>
> Any comments?

-- OSS Platform Development Division, NEC KaiGai Kohei <kai...@ak.jp.nec.com>

*** src/test/regress/sql/inherit.sql (revision 2486)
--- src/test/regress/sql/inherit.sql (working copy)
***************
*** 336,338 ****
--- 336,350 ----
  CREATE TABLE inh_error2 (LIKE t4 INCLUDING STORAGE) INHERITS (t1);
  DROP TABLE t1, t2, t3, t4, t12_storage, t12_comments, t1_inh, t13_inh, t13_like, t_all;
+
+ -- Test for renaming
+ CREATE TABLE t1 (a int, b int);
+ CREATE TABLE t2 (b int, c int);
+ CREATE TABLE t3 (d int) INHERITS(t1, t2);
+ ALTER TABLE t1 RENAME a TO x;
+ ALTER TABLE t1 RENAME b TO y;  -- to be failed
+ ALTER TABLE t3 RENAME d TO z;
+ SELECT * FROM t3;
+ DROP TABLE t3;
+ DROP TABLE t2;
+ DROP TABLE t1;

*** src/test/regress/expected/inherit.out
Re: [HACKERS] Streaming replication and non-blocking I/O
On Wed, Dec 16, 2009 at 6:53 PM, Heikki Linnakangas
<heikki.linnakan...@enterprisedb.com> wrote:
> Great! The logical next step is to move the handling of TimelineID and
> system identifier out of libpq as well.

All right.

> 0. Begin by connecting to the master just like a normal backend does.
>    We don't necessarily need the new ProtocolVersion code either, though
>    it's probably still a good idea to reject connections to older server
>    versions.

And I think such a backend should switch to walsender mode when the startup packet arrives. Otherwise we would have to authenticate the backend twice in different contexts, i.e., once as a normal backend and once as a walsender, so settings for each context would be required in pg_hba.conf. That seems odd to me. Thoughts?

> 1. Get the system identifier of the master.
>
>    Slave -> Master: Query message, with a query string like
>    GET_SYSTEM_IDENTIFIER
>
>    Master -> Slave: RowDescription, DataRow, CommandComplete, and
>    ReadyForQuery messages. The system identifier is returned in the
>    DataRow message. This is identical to what happens when a query is
>    executed against a normal backend using the simple query protocol,
>    so walsender can use PQexec() for this.

s/walsender/walreceiver ?

A signal cannot cancel PQexec() while it is waiting for a message from the server. We might need to change the SIGTERM handler of walreceiver so that it calls proc_exit() immediately if we are inside PQexec().

> 2. Another query exchange like above, for the timeline ID (or these two
>    steps can be joined into one query, to eliminate one round-trip).
>
> 3. Request a backup history file, if needed:
>
>    Slave -> Master: Query message, with a query string like
>    GET_BACKUP_HISTORY_FILE XXX, where XXX is an XLogRecPtr or file name.
>
>    Master -> Slave: RowDescription, DataRow, CommandComplete and
>    ReadyForQuery messages as usual. The file contents are returned in
>    the DataRow message.
>
> 4. Start replication:
>
>    Slave -> Master: Query message, with a query string like
>    START REPLICATION, giving the RecPtr of the starting point.
>
>    Master -> Slave: CopyOutResponse followed by a continuous stream of
>    CopyData messages with WAL contents.

Seems OK.

> This minimizes the changes to the protocol and libpq, with a clear way
> of extending by adding new commands. Similar to what you did a long time
> ago, connecting as an actual backend at first and then switching to
> walsender mode after running a few queries, but this would all be
> handled in a separate loop in walsender instead of running as a
> full-blown backend.

Agreed. Only walsender should be allowed to handle the query strings you proposed, so that we avoid touching the parser.

Regards,

-- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
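The handshake steps above can be sketched as the ordered list of simple-protocol query strings the walreceiver would issue. This is only an illustration of the proposal: `GET_SYSTEM_IDENTIFIER` and `GET_BACKUP_HISTORY_FILE` are taken from the thread, while `GET_TIMELINE_ID` and the `%X/%X` rendering of the start LSN are assumed names for the unnamed parts.

```python
# Hypothetical sketch of the proposed walreceiver handshake, expressed
# as the sequence of query strings sent over a normal libpq connection.

def handshake_commands(start_lsn, need_backup_history=False):
    """start_lsn: (xlogid, xrecoff) pair, rendered in the usual %X/%X
    notation. Returns the queries in the order they would be issued."""
    cmds = [
        "GET_SYSTEM_IDENTIFIER",   # step 1: compare system identifiers
        "GET_TIMELINE_ID",         # step 2 (could be merged with step 1)
    ]
    if need_backup_history:        # step 3, only when restoring a backup
        cmds.append("GET_BACKUP_HISTORY_FILE %X/%X" % start_lsn)
    cmds.append("START REPLICATION %X/%X" % start_lsn)  # step 4: COPY out
    return cmds
```

Each command before the last one is an ordinary PQexec() round-trip; only the final one switches the connection into the continuous CopyData stream.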
Re: [HACKERS] Range types
Tom Lane <t...@sss.pgh.pa.us> writes:
> Dimitri Fontaine <dfonta...@hi-media.com> writes:
>> Tom Lane <t...@sss.pgh.pa.us> writes:
>>>   foreach p2_member in unnest(p2) loop
>>>     p1 := array(select period_except(p1_member, p2_member)
>>>                   from unnest(p1) p1_member);
>>>   end loop;
>>>
>>> But maybe it can be done in a single SQL command.
>>
>> Yeah, as soon as you have LATERAL, I think. Without it there's no way
>> to compose SRFs in SQL, AFAIK.
>
> Hm, how would you do it with LATERAL? The problem is not so much
> composition as the need for a variable number of rounds of composition.

Let's have a try at it:

  select p2_member, array_accum(p1)
    from unnest(p2) as p2_member
         lateral (select period_except(p1_member, p2_member)
                    from unnest(p1) p1_member) as x(p1);

I'm not sure I understand how the explicit looping over unnest(p2) differs from using LATERAL, or even whether that's what you're talking about when you mention a variable number of rounds.

Regards,

-- dim
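The operation being composed above can be modeled outside SQL. Below is a minimal sketch of `period_except` semantics on half-open integer intervals, plus the nested-loop formulation from the thread (one round of subtraction per p2 member); the interval representation and names are assumptions for illustration, not the actual range-type implementation.

```python
# Model of period_except on half-open [start, end) integer intervals.

def period_except(p1m, p2m):
    """Subtract period p2m from period p1m; yields 0, 1, or 2 pieces."""
    (s1, e1), (s2, e2) = p1m, p2m
    if e2 <= s1 or s2 >= e1:            # no overlap: p1m survives whole
        return [p1m]
    out = []
    if s1 < s2:
        out.append((s1, s2))            # left remainder
    if e2 < e1:
        out.append((e2, e1))            # right remainder
    return out

def multi_except(p1, p2):
    """The foreach-loop from the thread: subtract every member of p2
    from the (growing/shrinking) set of periods p1."""
    for p2m in p2:
        p1 = [piece for p1m in p1 for piece in period_except(p1m, p2m)]
    return p1
```

This makes the "variable number of rounds" point concrete: each p2 member can split surviving p1 pieces again, so the result of one round feeds the next.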
[HACKERS] How should the notice message from the primary be handled in the standby?
On Tue, Dec 15, 2009 at 12:56 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Use PQsetNoticeReceiver. The other one is just there for backwards
> compatibility.

Thanks! I'm thinking of leaving the handling of notice messages unchanged, because there is no notice message from the primary that is worth logging in the standby as well. Thoughts?

Regards,

-- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Update on true serializable techniques in MVCC
Robert Haas wrote:
>> A predicate can include columns from an index plus others. Am I
>> missing something?
>
> Hmm, interesting point. In that case you couldn't use the index to
> enforce predicate locking under MVCC without disabling HOT. But there
> will be other cases where that wouldn't help anyway: a predicate could
> also include unindexed columns exclusively. For those, the traditional
> approach (not the one discussed in this paper) probably requires
> locking against any heap insert, or checking each new heap insert
> against the constraint, or... something.

> If I understand that correctly [...] by acquiring a shared lock on the
> next row in order, as a scan is made to check whether rows match a
> predicate. The scan might be through the data records or through an
> index.

I would say that in the case of a table scan, the whole table will be SILOCKed. I guess that's pretty much unavoidable if you want serializability.

Yours,
Laurenz Albe
Re: [HACKERS] Update on true serializable techniques in MVCC
[ Forgot the list, resending. ]

2009/12/16 Boszormenyi Zoltan <z...@cybertec.at>:
> Robert Haas írta:
>> On Wed, Dec 16, 2009 at 1:25 PM, Robert Haas <robertmh...@gmail.com> wrote:
>>> On Wed, Dec 16, 2009 at 1:14 PM, Alvaro Herrera
>>> <alvhe...@commandprompt.com> wrote:
>>>> So you'd have to disable HOT updates when true serializability was
>>>> active?
>>> I thought about that, but I don't think so. HOT only applies to
>>> updates, and predicate locking only applies to inserts. Unless I have
>>> my head in the sand?
>> Err, no, wait. Predicate locking can apply to updates, but since HOT
>> updates never update an indexed column, I think we might still be OK?
> A predicate can include columns from an index plus others. Am I missing
> something?

This whole concept (next-key locking) also applies when there are no indexes. In the case of a table scan, the "next key" is either the next row relative to the scanned range (if the DBMS supports the notion of non-full table scans, for example if the table contents are themselves stored in sorted order), or something that indicates that the whole table was scanned (i.e., a table lock). Therefore, with next-key locking you'd better not have too many table scans if you want any concurrent transactions.

Nicolas
Re: [HACKERS] Hot Standby, release candidate?
On sön, 2009-12-13 at 19:20 +, Simon Riggs wrote:
> Barring resolving a few points and subject to even more testing, this
> is the version I expect to commit to CVS on Wednesday.

So it's Thursday now. Please keep us updated on the schedule, as we need to decide when to wrap alpha3 and whether to reopen the floodgates for post-CF commits.
Re: [HACKERS] PATCH: Spurious 22 in hstore.sgml
Magnus Hagander wrote:
> On Wed, Dec 16, 2009 at 20:34, David E. Wheeler wrote:
>> *** a/doc/src/sgml/hstore.sgml
>> --- b/doc/src/sgml/hstore.sgml
> Heh, interesting. That clearly shouldn't be there. Applied.

Does this count as catch-22?

Yours,
Laurenz Albe
Re: [HACKERS] Hot Standby, release candidate?
On Thu, 2009-12-17 at 12:01 +0200, Peter Eisentraut wrote:
> On sön, 2009-12-13 at 19:20 +, Simon Riggs wrote:
>> Barring resolving a few points and subject to even more testing, this
>> is the version I expect to commit to CVS on Wednesday.
> So it's Thursday now. Please keep us updated on the schedule, as we
> need to decide when to wrap alpha3 and whether to reopen the
> floodgates for post-CF commits.

I've been looking at a semaphore deadlock problem reported by Hiroyuki Yamada. It's a serious issue, though luckily somewhat restricted in scope. I don't think it's wise to rush in with a solution, since that involves some work with semaphores and I could easily make that area less stable by acting too quickly.

What I will do now is put in a restriction on lock waits in Hot Standby, which will apply only to Alpha3. That avoids the deadlock issue in an easy and safe way, though it is heavy-handed, and I aim to replace it fairly soon, following discussion and testing.

-- Simon Riggs www.2ndQuadrant.com
Re: [HACKERS] PATCH: Spurious 22 in hstore.sgml
On Thu, Dec 17, 2009 at 11:01:19AM +0100, Albe Laurenz wrote:
> Magnus Hagander wrote:
>> On Wed, Dec 16, 2009 at 20:34, David E. Wheeler wrote:
>>> *** a/doc/src/sgml/hstore.sgml
>>> --- b/doc/src/sgml/hstore.sgml
>> Heh, interesting. That clearly shouldn't be there. Applied.
> Does this count as catch-22?

Patch-22, perhaps?

Cheers,
David.

-- David Fetter <da...@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] Hot Standby and prepared transactions
> On Wed, 2009-12-16 at 19:35 +0900, Hiroyuki Yamada wrote:
>> * There is a window between gathering lock information in
>>   GetRunningTransactionLocks() and writing WAL in
>>   LogAccessExclusiveLocks().
>> * In the current lock redo algorithm, locks are released when the
>>   transaction holding the lock is committed or aborted.
>> ... then what happens if a transaction holding an ACCESS EXCLUSIVE
>> lock commits in the window?
>
> Yes, that was a problem in that code. Fixed in git. We were doing it
> for prepared transactions but not for normal xacts. I will look again
> at that code. Thanks very much for reading the code. Any more?!?

Well, I've read some more and have a question. The implementation assumes that transactions write a COMMIT/ABORT WAL record when they end, but it does not seem to write an ABORT record on an immediate shutdown. So:

 1. acquire an ACCESS EXCLUSIVE lock on table A in xact 1
 2. execute an immediate shutdown of the active node
 3. restart it
 4. acquire an ACCESS EXCLUSIVE lock on table A in xact 2

...then duplicate lock acquisition by two different transactions can occur in the standby node. Am I missing something? Or has this already been reported?

regards,

-- Hiroyuki YAMADA Kokolink Corporation yam...@kokolink.net
Re: [HACKERS] Hot Standby and prepared transactions
Hiroyuki Yamada wrote:
> The implementation assumes that transactions write a COMMIT/ABORT WAL
> record when they end, but it does not seem to write an ABORT record on
> an immediate shutdown. So:
>  1. acquire an ACCESS EXCLUSIVE lock on table A in xact 1
>  2. execute an immediate shutdown of the active node
>  3. restart it
>  4. acquire an ACCESS EXCLUSIVE lock on table A in xact 2
> ...then duplicate lock acquisition by two different transactions can
> occur in the standby node. Am I missing something? Or has this already
> been reported?

Replay of the shutdown checkpoint that is written when the server is restarted should abort the old transaction and release its locks. Hmm, looking at the code, I think Simon threw out that baby with the bathwater when he removed support for starting standby from a shutdown checkpoint.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Hot Standby and prepared transactions
On Thu, 2009-12-17 at 19:55 +0900, Hiroyuki Yamada wrote:
> Well, I've read some more and have a question. The implementation
> assumes that transactions write a COMMIT/ABORT WAL record when they
> end, but it does not seem to write an ABORT record on an immediate
> shutdown. So:
>  1. acquire an ACCESS EXCLUSIVE lock on table A in xact 1
>  2. execute an immediate shutdown of the active node
>  3. restart it
>  4. acquire an ACCESS EXCLUSIVE lock on table A in xact 2
> ...then duplicate lock acquisition by two different transactions can
> occur in the standby node. Am I missing something? Or has this already
> been reported?

This was a tricky point in the design, because we already knew that abort records are not always written for every transaction. ProcArrayApplyRecoveryInfo() was specifically designed to prune away older transactions that might have become stuck in that state; it calls StandbyReleaseOldLocks() to remove any such locks. The pruning operation is also one of the reasons why we need to log XLOG_RUNNING_XACTS on a regular basis.

Duplicate lock acquisition isn't specifically checked for, since it is blocked on the primary, but lock release is handled correctly if duplicates did exist.

I've checked the code some more to see whether the above is possible. At step (3) we perform a shutdown checkpoint, which would/should be responsible for performing the prune operation that would prevent your case from being a problem. The code around shutdown checkpoints has changed a few times, and it looks like this bug was introduced by my edit on Dec 6 that removed shutdown checkpoints as starting places. They still need to be handled, since we now don't write an XLOG_RUNNING_XACTS record during a shutdown checkpoint. I will re-add this now.

Thanks again: keep going, you're on a roll.

-- Simon Riggs www.2ndQuadrant.com
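The pruning idea behind StandbyReleaseOldLocks() can be sketched with a toy model. This is an illustration of the concept only (plain integer xids, no wraparound, a dict standing in for the standby's lock table), not the actual C implementation:

```python
# Toy model of pruning standby lock entries for transactions that can
# never commit (e.g. implicitly aborted by an immediate shutdown).

def release_old_locks(locks, running_xids):
    """locks: dict mapping xid -> lock description on the standby.
    running_xids: xids the latest XLOG_RUNNING_XACTS record reports as
    still running on the primary. Any lock held by an xid not in that
    set belongs to a transaction that silently died, so drop it."""
    running = set(running_xids)
    stale = [xid for xid in locks if xid not in running]
    for xid in stale:
        del locks[xid]          # the xact can never commit: release
    return stale
```

This is why regularly logging XLOG_RUNNING_XACTS matters: without a fresh running-xacts snapshot after a restart, the standby has no basis for deciding which lock holders are gone.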
Re: [HACKERS] Streaming replication and non-blocking I/O
Fujii Masao wrote:
> On Wed, Dec 16, 2009 at 6:53 PM, Heikki Linnakangas
> <heikki.linnakan...@enterprisedb.com> wrote:
>> 0. Begin by connecting to the master just like a normal backend does.
>>    We don't necessarily need the new ProtocolVersion code either, though
>>    it's probably still a good idea to reject connections to older server
>>    versions.
> And, I think that such backend should switch to walsender mode when the
> startup packet arrives. Otherwise, we would have to authenticate such a
> backend twice in different contexts, i.e., as a normal backend and as a
> walsender. So the settings for each context would be required in
> pg_hba.conf. This is odd, I think. Thoughts?

True.

>> This is identical to what happens when a query is executed against a
>> normal backend using the simple query protocol, so walsender can use
>> PQexec() for this.
> s/walsender/walreceiver ?

Right.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Streaming replication and non-blocking I/O
On Thu, Dec 17, 2009 at 9:02 PM, Heikki Linnakangas
<heikki.linnakan...@enterprisedb.com> wrote:
>> And, I think that such backend should switch to walsender mode when the
>> startup packet arrives. Otherwise, we would have to authenticate such a
>> backend twice in different contexts, i.e., as a normal backend and as a
>> walsender. So the settings for each context would be required in
>> pg_hba.conf. This is odd, I think. Thoughts?
> True.

Currently this switch depends on whether XLOG_STREAMING_CODE is sent from the standby, which in turn depends on whether PQstartXLogStreaming() is called. But as the next step, we should get rid of such libpq changes as well.

I'm thinking of making the standby send the walsender-switch code the same way as application_name: walreceiver always specifies an option like replication=on in the conninfo string and calls PQconnectdb(), which sends the code as part of the startup packet. And no environment variable should be defined for it, to avoid user misconfiguration. Thoughts? Better ideas?

Regards,

-- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
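The conninfo-driven switch proposed above can be sketched as a trivial server-side check. The option name `replication=on` is the one floated in the thread; the parsing here is a simplification (real conninfo strings allow quoting and whitespace rules this sketch ignores):

```python
# Hypothetical sketch: decide from startup-packet options whether a new
# connection should become a walsender instead of a normal backend.

def wants_walsender(conninfo):
    """conninfo: space-separated key=value pairs, as in a libpq
    connection string. Returns True if the client asked to stream WAL."""
    opts = dict(kv.split("=", 1) for kv in conninfo.split() if "=" in kv)
    return opts.get("replication") == "on"
```

The appeal of this approach is that the decision travels in the startup packet itself, so authentication happens exactly once, in the walsender context, and pg_hba.conf needs no duplicate entries.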
Re: [HACKERS] Hot Standby and prepared transactions
On Thu, 2009-12-17 at 13:24 +0200, Heikki Linnakangas wrote:
> Hmm, looking at the code, I think Simon threw out that baby with the
> bathwater when he removed support for starting standby from a shutdown
> checkpoint.

Hmm, I think that code was for starting points only. It would not have been executed in normal running of the standby, so it appears the bug is older than that. The absence of the baby now appears inconclusive.

-- Simon Riggs www.2ndQuadrant.com
Re: [HACKERS] determine snapshot after obtaining locks for first statement
Tom Lane wrote:
> Kevin Grittner writes:
>> Tom Lane wrote:
>>> (Besides which the lock acquired by UPDATE isn't exclusive and
>>> wouldn't block anyway...)
>> It blocks other UPDATEs.
> Not at the table level.

The question was about whether we could change the timing of when we get the current locks, not about getting different locks than we currently do. Clearly, adding a table lock would make no sense.

> If you could lock only at the tuple level maybe you'd have something,
> but it seems like you can't find the target tuples without having
> acquired a snapshot.

That would be the fatal difference, then. InnoDB doesn't actually keep old versions of a row; it generates them from undo records in the log when needed, which might be why it was feasible to get the lock before the snapshot there. If we need the snapshot before we can get the lock, this isn't an optimization that is available to us.

Thanks,

-Kevin
Re: [HACKERS] Hot Standby and prepared transactions
Simon Riggs wrote:
> On Thu, 2009-12-17 at 13:24 +0200, Heikki Linnakangas wrote:
>> Hmm, looking at the code, I think Simon threw out that baby with the
>> bathwater when he removed support for starting standby from a shutdown
>> checkpoint.
> Hmm, I think that code was for starting points only. It would not have
> been executed in normal running of the standby, so it appears the bug
> is older than that. The absence of the baby now appears inconclusive.

This is the piece of code we're talking about, in xlog_redo():

--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7340,41 +7340,6 @@ xlog_redo(XLogRecPtr lsn, XLogRecord *record)
 		ShmemVariableCache->oldestXid = checkPoint.oldestXid;
 		ShmemVariableCache->oldestXidDB = checkPoint.oldestXidDB;
 
-		/*
-		 * We know nothing was running on the master at this point, though
-		 * we can only use it as a starting point iff wal_standby_info was
-		 * enabled, otherwise we may not get further information about
-		 * changes from the master.
-		 */
-		if (standbyState >= STANDBY_UNINITIALIZED &&
-			checkPoint.XLogStandbyInfoMode)
-		{
-			RunningTransactionsData running;
-			TransactionId oldestRunningXid;
-			TransactionId *xids;
-			int			nxids;
-
-			oldestRunningXid = PrescanPreparedTransactions(&xids, &nxids);
-
-			/*
-			 * Construct a RunningTransactions snapshot representing a shut
-			 * down server, with only prepared transactions still alive.
-			 *
-			 * StandbyRecoverPreparedTransactions will call SubTransSetParent
-			 * for any subtransactions, so we consider this a non-overflowed
-			 * snapshot.
-			 */
-			running.xcnt = nxids;
-			running.subxid_overflow = false;
-			running.nextXid = checkPoint.nextXid;
-			running.oldestRunningXid = oldestRunningXid;
-			running.xids = xids;
-
-			ProcArrayInitRecoveryInfo(oldestRunningXid);
-			ProcArrayApplyRecoveryInfo(&running);
-
-			StandbyRecoverPreparedTransactions();
-		}
-
 		/* Check to see if any changes to max_connections give problems */
 		if (standbyState != STANDBY_DISABLED)
 			CheckRequiredParameterValues(checkPoint);

That removed piece of code was executed in the standby whenever we saw a shutdown checkpoint. It calls ProcArrayApplyRecoveryInfo(), which calls ExpireOldKnownAssignedTransactionIds() and StandbyReleaseOldLocks() to clean up known-assigned-xid entries and locks of the implicitly-aborted transactions. I see now that in the presence of prepared transactions we would fail to clean up failed transactions with XIDs newer than the oldest prepared transaction, but other than that it looks fine to me.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Streaming replication and non-blocking I/O
Fujii Masao wrote:
> I'm thinking of making the standby send the walsender-switch code the
> same way as application_name: walreceiver always specifies an option
> like replication=on in the conninfo string and calls PQconnectdb(),
> which sends the code as part of the startup packet. And no environment
> variable should be defined for it, to avoid user misconfiguration.

Sounds good.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] determine snapshot after obtaining locks for first statement
Markus Wanner wrote:
> Another line of thought: isn't this like READ COMMITTED for just the
> first operation in a SERIALIZABLE transaction?

Hmmm... You have a point.

-Kevin
Re: [HACKERS] Update on true serializable techniques in MVCC
On 16.12.09 16:40, Kevin Grittner wrote:
> Nicolas Barbier <nicolas.barb...@gmail.com> wrote:
>> I am not sure whether the serialization failures that it may cause
>> are dependent on the plan used.
> They are.

But so are failures due to deadlocks even today, no? The processing order of UPDATEs that involve joins isn't any more well-defined than the order of rows returned by a similarly complex SELECT, I think.

Actually, I think the whole SIREAD-lock idea can be seen as a kind of 2PL with opportunistic locking and deferred deadlock detection. Instead of making readers and writers block each other, you let them proceed in parallel, and check whether that resulted in any mutual toe-stepping later. The surprising part is that SIREAD locks and the inConflict/outConflict flags actually provide enough information to detect possible problems. Or at least this is the idea I got after skimming through the thesis for an hour or so.

best regards,
Florian Pflug
Re: [HACKERS] Update on true serializable techniques in MVCC
Albe Laurenz wrote:
> I would say that in the case of a table scan, the whole table will be
> SILOCKed. I guess that's pretty much unavoidable if you want
> serializability.

Agreed.

-Kevin
Re: [HACKERS] Hot Standby and prepared transactions
On Thu, 2009-12-17 at 15:18 +0200, Heikki Linnakangas wrote:
> That removed piece of code was executed in the standby whenever we saw
> a shutdown checkpoint. It calls ProcArrayApplyRecoveryInfo(), which
> calls ExpireOldKnownAssignedTransactionIds() and
> StandbyReleaseOldLocks() to clean up known-assigned-xid entries and
> locks of the implicitly-aborted transactions.

OK, I was presuming that running StandbyRecoverPreparedTransactions() and ProcArrayInitRecoveryInfo() twice would cause problems.

> I see now that in the presence of prepared transactions we would fail
> to clean up failed transactions with XIDs newer than the oldest
> prepared transaction

Good! I just spotted that too, just prior to posting my fix, so I am rewriting it again now.

-- Simon Riggs www.2ndQuadrant.com
Re: [HACKERS] Update on true serializable techniques in MVCC
Nicolas Barbier wrote: Therefore, with next-key locking you'd better not have too many table scans if you want to have any concurrent transactions. Well, I would say that you don't want too many table scans on heavily updated tables if you don't want too many serialization failures. Keep in mind that the SIREAD locks would not block anything. A dangerous structure, involving two adjacent rw-dependencies, must be found before anything is rolled back. -Kevin
Re: [HACKERS] Largeobject Access Controls (r2460)
2009/12/17 Takahiro Itagaki itagaki.takah...@oss.ntt.co.jp: Robert Haas robertmh...@gmail.com wrote: 2009/12/16 KaiGai Kohei kai...@ak.jp.nec.com: long desc: When turned on, privilege checks on large objects perform with backward compatibility as 8.4.x or earlier releases. Mostly English quality, but there are some other issues too. Proposed patch attached. I remember we had discussions about the version number, like Don't use '8.5' because it might be released as '9.0', no? I chose to follow the style which Tom indicated that he preferred here. We don't use the phrase 8.4.x series anywhere else in the documentation, so this doesn't seem like a good time to start. Tom or I will go through and renumber everything if we end up renaming the release to 9.0. Another comment is I'd like to keep link linkend=catalog-pg-largeobject-metadata for the first structnamepg_largeobject/structname in each topic. Those two things aren't the same. Perhaps you meant link linkend=catalog-pg-largeobject? I'll tweak the pg_largeobject and pg_largeobject_metadata sections to make sure each has a link to the other and commit this. I also found one more spelling mistake so I will include that correction as well. ...Robert
[HACKERS] NOT IN Doesn't use Anti Joins?
I'm sure there is a good reason why NOT IN will not use an Anti-Join plan equivalent to NOT EXISTS due to NULL handling, but in this particular case the value being compared is in the PRIMARY KEY of both structures being joined. The NOT IN plan was killed after 10 minutes. The NOT EXISTS plan returned data in roughly 10ms. Is there a reason why the NOT IN plan could not use Anti-Joins when the column being compared against is guaranteed to be NOT NULL? Too much planner overhead to determine nullness of the column?

sk=# explain select * from source_reb_listing where listing_id not in (select listing_id from source_reb_listing_specs) order by file_id desc limit 5;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=729015.39..3420463.83 rows=5 width=28)
   ->  Index Scan Backward using source_reb_listing_fileid_idx on source_reb_listing  (cost=729015.39..169537219655.96 rows=314954 width=28)
         Filter: (NOT (SubPlan 1))
         SubPlan 1
           ->  Materialize  (cost=729015.39..1185280.74 rows=32810035 width=8)
                 ->  Seq Scan on source_reb_listing_specs  (cost=0.00..568040.35 rows=32810035 width=8)
(6 rows)

sk=# explain select * from source_reb_listing where not exists (select * from source_reb_listing_specs as t where t.listing_id = source_reb_listing.listing_id) order by file_id desc limit 5;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=0.00..35.31 rows=5 width=28)
   ->  Nested Loop Anti Join  (cost=0.00..3880495.87 rows=549496 width=28)
         ->  Index Scan Backward using source_reb_listing_fileid_idx on source_reb_listing  (cost=0.00..1107142.20 rows=629907 width=28)
         ->  Index Scan using source_reb_listing_specs_pkey on source_reb_listing_specs t  (cost=0.00..1592.74 rows=408 width=8)
               Index Cond: (t.listing_id = source_reb_listing.listing_id)
(5 rows)
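The NULL-handling obstacle mentioned above can be made concrete outside of SQL. Below is a hedged Python sketch (not PostgreSQL code; the function names are illustrative) of SQL's three-valued NOT IN semantics, showing why the planner cannot blindly rewrite NOT IN as an anti-join unless it can prove the subquery column is NOT NULL:

```python
# A sketch of SQL's three-valued NOT IN semantics. If the subquery can
# produce NULL, "x NOT IN (...)" is never true -- it is either false or
# unknown -- whereas NOT EXISTS simply tests for a matching non-NULL row.

NULL = None  # stand-in for SQL NULL

def sql_not_in(x, subquery_values):
    """Evaluate 'x NOT IN (subquery)' under SQL three-valued logic.
    Returns True, False, or NULL (unknown)."""
    saw_null = False
    for v in subquery_values:
        if v is NULL:
            saw_null = True          # comparison with NULL is unknown
        elif v == x:
            return False             # definite match: NOT IN is false
    return NULL if saw_null else True  # one unknown poisons the result

def sql_not_exists(x, subquery_values):
    """Evaluate 'NOT EXISTS (subquery WHERE v = x)': NULLs never match."""
    return not any(v is not NULL and v == x for v in subquery_values)

# With no NULLs the two agree, so an anti-join would be safe:
assert sql_not_in(3, [1, 2]) is True
assert sql_not_exists(3, [1, 2]) is True

# With a NULL in the subquery, NOT IN filters the row out (unknown is not
# true), while NOT EXISTS keeps it -- the plans are not interchangeable:
assert sql_not_in(3, [1, NULL]) is NULL
assert sql_not_exists(3, [1, NULL]) is True
```

When both `listing_id` columns are primary keys, the NULL case cannot arise, which is exactly why the anti-join transform would be safe in the reported query.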
Re: [HACKERS] NOT IN Doesn't use Anti Joins?
On Thu, Dec 17, 2009 at 9:02 AM, Rod Taylor rod.tay...@gmail.com wrote: Is there a reason why the NOT IN plan could not use Anti-Joins when the column being compared against is guaranteed to be NOT NULL? Too much planner overhead to determine nullness of the column? I doubt it. I think it's just a question of round tuits. I think Tom hasn't felt it worth the effort since you can work around it by rewriting the query, and nobody else has bothered to work up a patch. If you feel like working on it, I think it would be a nice improvement. ...Robert
Re: [HACKERS] Range types
Dimitri Fontaine dfonta...@hi-media.com writes: Tom Lane t...@sss.pgh.pa.us writes: Hm, how would you do it with LATERAL? The problem is not so much composition as the need for a variable number of rounds of composition. Let's have a try at it: select p2_member, array_accum(p1) from unnest(p2) as p2_member lateral (select period_except(p1_member, p2_member) from unnest(p1) p1_member) as x(p1); I don't think that does it. Maybe I misunderstand LATERAL, but what that looks like to me is that each p1 will be separately filtered by each p2, giving rise to a distinct element in the output. What we need is for each p1 to be filtered by *all* p2's, successively (though in any order). regards, tom lane
Re: [HACKERS] Hot Standby and prepared transactions
On Thu, 2009-12-17 at 13:38 +, Simon Riggs wrote: I see now that in the presence of prepared transactions, we would fail to clean up failed transactions with XIDs older than the oldest prepared transaction. Good! I just spotted that also, just prior to posting my fix, so rewriting it again now. solution pushed to git -- Simon Riggs www.2ndQuadrant.com
Re: [HACKERS] determine snapshot after obtaining locks for first statement
Markus Wanner mar...@bluegap.ch wrote: Another line of thought: isn't this like READ COMMITTED for just the first operation in a SERIALIZABLE transaction? I've mulled it over and I have two different logical proofs that this is safe; if anyone is dubious I'd be happy to share. This seems likely to be of significant benefit in some workloads, and I can't see anywhere that it is likely to cost much. Any objections to adding this to the TODO list as a performance item? -Kevin
Re: [HACKERS] determine snapshot after obtaining locks for first statement
On Thu, Dec 17, 2009 at 10:05 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Markus Wanner mar...@bluegap.ch wrote: Another line of thought: isn't this like READ COMMITTED for just the first operation in a SERIALIZABLE transaction? I've mulled it over and I have two different logical proofs that this is safe; if anyone is dubious I'd be happy to share. This seems likely to be of significant benefit in some workloads, and I can't see anywhere that it is likely to cost much. Any objections to adding this to the TODO list as a performance item? I thought you concluded two emails ago that it wouldn't work for PG? It's certainly not clear to me what exactly the TODO would be. ...Robert
Re: [HACKERS] determine snapshot after obtaining locks for first statement
Robert Haas robertmh...@gmail.com wrote: Kevin Grittner kevin.gritt...@wicourts.gov wrote: Markus Wanner mar...@bluegap.ch wrote: Another line of thought: isn't this like READ COMMITTED for just the first operation in a SERIALIZABLE transaction? I've mulled it over and I have two different logical proofs that this is safe; if anyone is dubious I'd be happy to share. This seems likely to be of significant benefit in some workloads, and I can't see anywhere that it is likely to cost much. Any objections to adding this to the TODO list as a performance item? I thought you concluded two emails ago that it wouldn't work for PG? It's certainly not clear to me what exactly the TODO would be. Tom's emails had me pretty convinced that this technique wouldn't work in PostgreSQL, but Markus put a fresh perspective on it which makes it seem relatively painless. (Although, as is often the case, my perspective may be naive.) Basically, in a SERIALIZABLE transaction, if the first statement which would require a snapshot would currently fail with ERROR: could not serialize access due to concurrent update we would instead get a fresh snapshot and retry -- which is what we do in a READ COMMITTED transaction. One way of looking at this is that any transaction which fails with a serialization error can be retried with a reasonable chance of success. There is no evidence of anything wrong with the transaction itself, just that its actions conflicted with those of a concurrent transaction. For the case we're discussing, that other transaction has now committed. (We blocked waiting to see whether it would commit or roll back.) If this is the first statement which needed a snapshot, retrying it with a new snapshot can't create any conflicting views of the data. We *could* view this sort of as an automatic transaction retry in the limited situations where the database engine can determine what to do. 
(If there had been prior statements, you can't really know that the current statement would have been issued by the client had the prior statements been run against a different snapshot.) Where this view of things is a little off is that explicit locks obtained earlier in the transaction would still be held; we're not really starting the *whole* transaction over. While this doesn't seem a fatal flaw, it does mean the other way of looking at it is more technically correct. The other way of looking at it is that until a statement succeeds with a given snapshot, you have not fixed your snapshot for the serializable transaction. A retry similar to what we do for READ COMMITTED would just be part of obtaining the one snapshot used for the SERIALIZABLE transaction -- it isn't fixed until that first statement succeeds. I'm assuming that this could be a fairly small change because we already have code to do exactly the right thing for READ COMMITTED transactions. The logic to choose which way to handle the commit of a transaction which held a competing lock would need to be modified to use the READ COMMITTED logic on the first statement which obtains a snapshot in a SERIALIZABLE transaction, and the snapshot for a SERIALIZABLE transaction would not be fixed until the completion of the first statement needing a snapshot. -Kevin
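The server-side behavior being proposed is essentially what a well-behaved client already does by hand. As a hedged illustration (the `with_retry` and `SerializationFailure` names are ours, not any real driver API), here is the client-side retry pattern that the proposal would make unnecessary for the first statement of a serializable transaction:

```python
# Client-side sketch of the retry pattern under discussion: a serializable
# transaction that fails with SQLSTATE 40001 carries no evidence of being
# wrong in itself, so it can simply be re-run -- which is morally what
# taking a fresh snapshot server-side would do for the first statement.

class SerializationFailure(Exception):
    """Stands in for SQLSTATE 40001 (could not serialize access)."""

def with_retry(txn, max_attempts=5):
    """Run txn(); on a serialization failure, retry from the start."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except SerializationFailure:
            if attempt == max_attempts:
                raise  # give up; surface the failure to the caller

# Simulated transaction that conflicts once and then succeeds:
attempts = []
def txn():
    attempts.append(1)
    if len(attempts) < 2:
        raise SerializationFailure
    return "committed"

assert with_retry(txn) == "committed"
assert len(attempts) == 2
```

For a multi-statement transaction the application still needs this loop, since only the client knows whether later statements depended on results seen under the aborted snapshot.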
Re: [HACKERS] determine snapshot after obtaining locks for first statement
Kevin Grittner kevin.gritt...@wicourts.gov writes: Basically, in a SERIALIZABLE transaction, if the first statement which would require a snapshot would currently fail with ERROR: could not serialize access due to concurrent update we would instead get a fresh snapshot and retry -- which is what we do in a READ COMMITTED transaction. This sounds like a pretty horrid kluge. For one thing, the statement might already have done a great deal of work before you hit the failure. (Admittedly, that work will be lost anyway if we abort, but it's not a localized change to make it all happen all over again.) Also, aborting that statement without also losing any previously-acquired locks would require establishing a hidden subtransaction, with ensuing extra costs to be paid even when there isn't a failure. I think you misunderstand how READ COMMITTED works; it does not change the snapshot for the entire statement, it only follows the update chain for a particular tuple that's been chosen for update or delete. I'm assuming that this could be a fairly small change It would not be. regards, tom lane
Re: [HACKERS] Range types
"Tom" == Tom Lane t...@sss.pgh.pa.us writes: Someone mentioned LATERAL? Tom Lane t...@sss.pgh.pa.us writes: Hm, how would you do it with LATERAL? The problem is not so much composition as the need for a variable number of rounds of composition. Let's have a try at it: select p2_member, array_accum(p1) from unnest(p2) as p2_member lateral (select period_except(p1_member, p2_member) from unnest(p1) p1_member) as x(p1); Tom: I don't think that does it. Maybe I misunderstand LATERAL, but what that looks like to me is that each p1 will be separately filtered by each p2, giving rise to a distinct element in the output. What we need is for each p1 to be filtered by *all* p2's, successively (though in any order). Right, that's not a job for LATERAL, though it could be done easily enough in one statement with a recursive CTE, I think. -- Andrew (irc:RhodiumToad)
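The operation Tom describes — each p1 filtered by *all* p2's, through a variable number of rounds — can be sketched outside SQL. Below is a hedged Python model (the `period_except`/`subtract_all` names mirror the discussion but the half-open integer-pair representation is our assumption):

```python
# Successive subtraction of periods: subtract every period in p2 from
# every surviving fragment of p1, one round per p2 member. This is the
# variable number of rounds of composition that plain LATERAL cannot
# express but a recursive CTE (or a loop) can. Periods are half-open
# (start, end) pairs.

def period_except(p, q):
    """Return the 0..2 pieces of period p not covered by period q."""
    (s, e), (qs, qe) = p, q
    pieces = []
    if qs > s:
        pieces.append((s, min(e, qs)))   # part of p before q
    if qe < e:
        pieces.append((max(s, qe), e))   # part of p after q
    return [(a, b) for a, b in pieces if a < b]  # drop empty pieces

def subtract_all(p1, p2):
    """Filter each p1 member through every p2 member in turn."""
    result = list(p1)
    for q in p2:  # one round of composition per p2 member
        result = [piece for p in result for piece in period_except(p, q)]
    return result

# (0,10) minus (2,4) and (6,8) leaves three fragments:
assert subtract_all([(0, 10)], [(2, 4), (6, 8)]) == [(0, 2), (4, 6), (8, 10)]
```

Note that because each round operates on the output of the previous round, the order of the p2 members does not matter, matching Tom's "successively (though in any order)".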
Re: [HACKERS] determine snapshot after obtaining locks for first statement
On Thu, Dec 17, 2009 at 3:39 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Basically, in a SERIALIZABLE transaction, if the first statement which would require a snapshot would currently fail with ERROR: could not serialize access due to concurrent update we would instead get a fresh snapshot and retry -- which is what we do in a READ COMMITTED transaction. So for multi-statement transactions I don't see what this buys you. You'll still have to write the code to retry, and postgres retrying in the cases where it can isn't really going to be a whole lot better. Moreover I think it would kick in less often than you might expect and sometimes surprise people by not kicking in when they expect it to. Any internal queries could count (though I think you get away with catalog operations in SnapshotNow), any volatile functions, etc. So people might write a single-statement SQL transaction and not bother writing retry logic and then be surprised by errors. I'm unclear why serialization failures would be rare. It depends entirely on the application. If you're querying records which are busy from concurrent updates you could get a continuous stream of serialization failures. It seems better to report the situation to the user all the time, since they have to handle it already and might want to know about the problem and implement some kind of backoff, rather than hide it from them only sometimes, so that they still have to write code to handle it but aren't allowed to handle it consistently. This isn't the first time that we've seen advantages that could be had from packaging up a whole transaction so the database can see everything the transaction needs to do. Perhaps we should have an interface for saying you're going to feed a series of commands which you want the database to repeat for you verbatim automatically on serialization failures. 
Since you can't construct the queries based on the results of previous queries the database would be free to buffer them all up and run them together at the end of the transaction, which would allow the other tricky optimizations we've pondered in the past as well. -- greg
Re: [HACKERS] determine snapshot after obtaining locks for first statement
Tom Lane t...@sss.pgh.pa.us wrote: Kevin Grittner kevin.gritt...@wicourts.gov writes: we would instead get a fresh snapshot and retry -- which is what we do in a READ COMMITTED transaction. I think you misunderstand how READ COMMITTED works; it does not change the snapshot for the entire statement, it only follows the update chain for a particular tuple that's been chosen for update or delete. Thanks for the clarification. That does not work for SERIALIZABLE at all, because other tables or rows referenced in that first statement would be using the original snapshot. Indeed, the behavior under READ COMMITTED could be astonishing in certain circumstances, as it breaks atomicity: all of the results of a transaction should be visible in the database, or none of them should be. It should never be possible to see the results of some operations in a transaction without the others.

connection1:

test=# create table t (c1 int not null primary key, c2 int);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "t_pkey" for table "t"
CREATE TABLE
test=# insert into t values (1, 101), (2, 7);
INSERT 0 2
test=# start TRANSACTION ISOLATION LEVEL READ COMMITTED;
START TRANSACTION
test=# update t set c2 = c2 + (select c2 from t where c1 = 2) where c1 = 1;
UPDATE 1
test=# update t set c2 = 11 where c1 = 2;
UPDATE 1

connection2:

test=# START TRANSACTION ISOLATION LEVEL READ COMMITTED;
START TRANSACTION
test=# update t set c2 = c2 + (select c2 from t where c1 = 2) where c1 = 1;
[blocks]

connection1:

test=# commit;
COMMIT

connection2:

UPDATE 1
test=# commit;
COMMIT
test=# select * from t;
 c1 | c2
----+-----
  2 |  11
  1 | 115
(2 rows)

The update on connection2 added the modified value of the first update from connection1 to the unmodified value from the second update on connection1. In other words, the atomicity of the update on connection1 is broken in this case. I'm not sure why this is considered OK. 
At a minimum it should be mentioned in our documentation of our implementation of the READ COMMITTED isolation level. -Kevin
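The mixed-snapshot arithmetic in the two-connection example above can be traced in a toy model. This is a hedged sketch of the behavior Tom describes, not PostgreSQL internals: the target row is re-read via the update chain, while the subquery in the same statement still sees the statement's original snapshot.

```python
# Replaying connection2's statement from the example:
#   UPDATE t SET c2 = c2 + (SELECT c2 FROM t WHERE c1 = 2) WHERE c1 = 1;
# after connection1 has committed both of its updates.

snapshot = {1: 101, 2: 7}    # what connection2's statement snapshot sees
committed = {1: 108, 2: 11}  # table contents after connection1 commits

target_c2 = committed[1]     # target row: latest version via update chain
subselect_c2 = snapshot[2]   # subquery: still the old snapshot value
new_c2 = target_c2 + subselect_c2

# Mixing the two produces 115: connection1's first update is visible,
# its second is not -- the broken atomicity Kevin points out.
assert new_c2 == 115
```

Under a single consistent snapshot the result would have been either 101 + 7 = 108 (both of connection1's updates invisible) or 108 + 11 = 119 (both visible); 115 corresponds to neither.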
Re: [HACKERS] Range types
Tom Lane wrote: Dimitri Fontaine dfonta...@hi-media.com writes: Tom Lane t...@sss.pgh.pa.us writes: Hm, how would you do it with LATERAL? The problem is not so much composition as the need for a variable number of rounds of composition. Let's have a try at it: select p2_member, array_accum(p1) from unnest(p2) as p2_member lateral (select period_except(p1_member, p2_member) from unnest(p1) p1_member) as x(p1); I don't think that does it. Maybe I misunderstand LATERAL, but what that looks like to me is that each p1 will be separately filtered by each p2, giving rise to a distinct element in the output. What we need is for each p1 to be filtered by *all* p2's, successively (though in any order). regards, tom lane That approach will only work if you coalesce your inputs into non-contiguous sets (NCS) first. Overlapping ranges would break it in a hurry. In addition to two coalesce operations, period_except would be calculated 1000x for a pair of 100 element arrays. The original solution, while not short, was probably a little more elegant than Tom gave credit for. In a single pass it pulls out only the data points needed to build the resultant NCS without making assumptions that the inputs were coalesced. I think I'll still be able to do a single-pass solution for continuous ranges. I just won't be able to do the coalesce operations inline with the set operations. Scott
[HACKERS] Re: determine snapshot after obtaining locks for first statement
Greg Stark gsst...@mit.edu wrote: So for multi-statement transactions I don't see what this buys you. Well, I became interested when Dr. Cahill said that adding this optimization yielded dramatic improvements in his high contention benchmarks. Clearly it won't help every load pattern. You'll still have to write the code to retry, and postgres retrying in the cases where it can isn't really going to be a whole lot better. In my view, any use of a relational database always carries with it the possibility of a serialization error. In other database products I've run into situations where a simple SELECT at READ COMMITTED can result in a serialization failure, so in my view all application software should use a framework capable of recognizing and automatically recovering from these. I just try to keep them to a manageable level. people might write a single-statement SQL transaction and not bother writing retry logic and then be surprised by errors. As has often been said here -- you can't always protect people from their own stupidity. I'm unclear why serialization failures would be rare. Did I say that somewhere??? It seems better to report the situation to the user all the time since they have to handle it already and might want to know about the problem and implement some kind of backoff The point was to avoid a serialization failure and its related rollback. Do you think we should be reporting something to the users every time a READ COMMITTED transaction blocks and then picks the updated row? (Actually, given that the results may be based on an inconsistent view of the database, maybe we should.) This isn't the first time that we've seen advantages that could be had from packaging up a whole transaction so the database can see everything the transaction needs to do. Perhaps we should have an interface for saying you're going to feed a series of commands which you want the database to repeat for you verbatim automatically on serialization failures. 
Since you can't construct the queries based on the results of previous queries the database would be free to buffer them all up and run them together at the end of the transaction which would allow the other tricky optimizations we've pondered in the past as well. How is that different from putting the logic into a function and retrying on serialization failure? Are you just proposing a more convenient mechanism to do the same thing? -Kevin
Re: [HACKERS] determine snapshot after obtaining locks for first statement
Kevin Grittner kevin.gritt...@wicourts.gov writes: Thanks for the clarification. That does not work for SERIALIZABLE at all, because other tables or rows referenced in that first statement would be using the original snapshot. Indeed, the behavior under READ COMMITTED could be astonishing in certain circumstances as it breaks atomicity: Yup. That is stated fairly clearly already in the description of READ COMMITTED mode, no? http://developer.postgresql.org/pgdocs/postgres/transaction-iso.html#XACT-READ-COMMITTED regards, tom lane
[HACKERS] COPY IN as SELECT target
Recently there was discussion about allowing a COPY statement to be a SELECT target, returning a text array, although the syntax wasn't really nailed down that I recall. I was thinking that we might have COPY RETURNING ARRAY FROM ... instead of COPY tablename opt_column_list FROM ... then we possibly could do things like: SELECT t[5] as a, 3*(t[3]::numeric) as b FROM (COPY RETURNING ARRAY FROM STDIN CSV) as t; Thoughts? cheers andrew
Re: [HACKERS] determine snapshot after obtaining locks for first statement
Tom Lane t...@sss.pgh.pa.us wrote: Kevin Grittner kevin.gritt...@wicourts.gov writes: the behavior under READ COMMITTED could be astonishing in certain circumstances as it breaks atomicity: Yup. That is stated fairly clearly already in the description of READ COMMITTED mode, no? http://developer.postgresql.org/pgdocs/postgres/transaction-iso.html#XACT-READ-COMMITTED : it is possible for an updating command to see an inconsistent : snapshot: it can see the effects of concurrent updating commands : on the same rows it is trying to update, but it does not see : effects of those commands on other rows in the database. This : behavior makes Read Committed mode unsuitable for commands that : involve complex search conditions I don't know how many times I've read that page (many), yet I never properly comprehended the impact of that part. I think the last bit I quoted above is somewhat misleading, in that it implies that the issue is limited to complex search conditions. In the failing case I showed in this thread, the search conditions involved are comparisons for equality of an integer literal to the one-column integer primary key. It seems like any join or subquery which references a table is vulnerable, yes? -Kevin
Re: [HACKERS] COPY IN as SELECT target
Andrew Dunstan wrote: Recently there was discussion about allowing a COPY statement to be a SELECT target, returning a text array, although the syntax wasn't really nailed down that I recall. I was thinking that we might have COPY RETURNING ARRAY FROM ... instead of COPY tablename opt_column_list FROM ... It's not really returning an array, is it? It's returning a bag of rows like a (sub)query. then we possibly could do things like: SELECT t[5] as a, 3*(t[3]::numeric) as b FROM (COPY RETURNING ARRAY FROM STDIN CSV) as t; How about just COPY FROM? As in SELECT t[5] as a, 3*(t[3]::numeric) as b FROM (COPY FROM STDIN CSV) as t -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] COPY IN as SELECT target
On Thu, Dec 17, 2009 at 12:23 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: How about just COPY FROM? As in SELECT t[5] as a, 3*(t[3]::numeric) as b FROM (COPY FROM STDIN CSV) as t I had the same thought. Though it would also be nice to allow something like: COPY (type1, type2, type3, type4) FROM STDIN CSV ...which is obviously going to create a horrible parser problem if you actually tried to use that syntax. ...Robert
Re: [HACKERS] COPY IN as SELECT target
On Thu, Dec 17, 2009 at 12:28:50PM -0500, Robert Haas wrote: On Thu, Dec 17, 2009 at 12:23 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: How about just COPY FROM? As in SELECT t[5] as a, 3*(t[3]::numeric) as b FROM (COPY FROM STDIN CSV) as t I had the same thought. Though it would also be nice to allow something like: COPY (type1, type2, type3, type4) FROM STDIN CSV ...which is obviously going to create a horrible parser problem if you actually tried to use that syntax. How about using the CTE syntax? Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] determine snapshot after obtaining locks for first statement
Kevin Grittner kevin.gritt...@wicourts.gov writes: Tom Lane t...@sss.pgh.pa.us wrote: http://developer.postgresql.org/pgdocs/postgres/transaction-iso.html#XACT-READ-COMMITTED I don't know how many times I've read that page (many), yet I never properly comprehended the impact of that part. I think the last bit I quoted above is somewhat misleading, in that it implies that the issue is limited to complex search conditions. In the failing case I showed in this thread, the search conditions involved are comparisons for equality of an integer literal to the one-column integer primary key. It seems like any join or subquery which references a table is vulnerable, yes? Well, it would all depend on what you're trying to do. Typical single-row UPDATE commands aren't really affected by this problem, and in fact the behavior is pretty much exactly what they want as long as the WHERE conditions don't involve columns that are being changed. Maybe we should replace the bit about complex search conditions with something about referencing multiple rows to perform one update. I'm not very sure what a clearer explanation would look like though. regards, tom lane
Re: [HACKERS] COPY IN as SELECT target
On Thu, Dec 17, 2009 at 12:38 PM, David Fetter da...@fetter.org wrote: On Thu, Dec 17, 2009 at 12:28:50PM -0500, Robert Haas wrote: On Thu, Dec 17, 2009 at 12:23 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: How about just COPY FROM? As in SELECT t[5] as a, 3*(t[3]::numeric) as b FROM (COPY FROM STDIN CSV) as t I had the same thought. Though it would also be nice to allow something like: COPY (type1, type2, type3, type4) FROM STDIN CSV ...which is obviously going to create a horrible parser problem if you actually tried to use that syntax. How about using the CTE syntax? I'm not sure what you're suggesting exactly, but the problem with the syntax I suggested is that COPY (...) TO whatever expects the ... part to be a subselect. You can't make COPY (...) FROM have something in there other than a subselect, because the parser can't fast-forward and look at the word FROM and then go back and decide how to parse the parenthesized stuff. That's almost magic in the general case. You'd have to stick a keyword in there before the opening parentheses. ...Robert
Re: [HACKERS] COPY IN as SELECT target
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Andrew Dunstan wrote: COPY RETURNING ARRAY FROM ... It's not really returning an array, is it? It's returning a bag of rows like a (sub)query. How about just COPY FROM? The problem with COPY FROM is that it hard-wires a decision that there is one and only one possible result format, which I think we pretty much proved already is the wrong thing. I'm not thrilled with RETURNING ARRAY either, but we need to leave ourselves wiggle room to have more than one result format from the same source file. regards, tom lane
Re: [HACKERS] determine snapshot after obtaining locks for first statement
On Thu, Dec 17, 2009 at 5:39 PM, Tom Lane t...@sss.pgh.pa.us wrote: Well, it would all depend on what you're trying to do. Typical single-row UPDATE commands aren't really affected by this problem, and in fact the behavior is pretty much exactly what they want as long as the WHERE conditions don't involve columns that are being changed. Maybe we should replace the bit about complex search conditions with something about referencing multiple rows to perform one update. I'm not very sure what a clearer explanation would look like though. I wonder if RETURNING hasn't created a whole new set of cases where our READ COMMITTED behaviour is bogus. I guess it's equivalent to having used SELECT FOR UPDATE. -- greg
Re: [HACKERS] determine snapshot after obtaining locks for first statement
Tom Lane t...@sss.pgh.pa.us wrote: I'm not very sure what a clearer explanation would look like As a stab at it, how about?: This behavior makes Read Committed mode unsuitable for many UPDATE or DELETE commands with joins or subqueries -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] determine snapshot after obtaining locks for first statement
Greg Stark gsst...@mit.edu writes: I wonder if RETURNING hasn't created a whole new set of cases where our READ COMMITTED behaviour is bogus. I don't see how. It just gives you access to the same values that were actually used by the UPDATE. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] determine snapshot after obtaining locks for first statement
On Thu, Dec 17, 2009 at 12:51 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Tom Lane t...@sss.pgh.pa.us wrote: I'm not very sure what a clearer explanation would look like As a stab at it, how about?: This behavior makes Read Committed mode unsuitable for many UPDATE or DELETE commands with joins or subqueries I don't think that's any clearer, though it is more disparaging. :-) Note we also say: The partial transaction isolation provided by Read Committed mode is adequate for many applications, and this mode is fast and simple to use; however, it is not sufficient for all cases. Applications that do complex queries and updates might require a more rigorously consistent view of the database than Read Committed mode provides. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] determine snapshot after obtaining locks for first statement
On Thu, 2009-12-17 at 12:58 -0500, Robert Haas wrote: On Thu, Dec 17, 2009 at 12:51 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Tom Lane t...@sss.pgh.pa.us wrote: I'm not very sure what a clearer explanation would look like As a stab at it, how about?: This behavior makes Read Committed mode unsuitable for many UPDATE or DELETE commands with joins or subqueries I don't think that's any clearer, though it is more disparaging. :-) Note we also say: The partial transaction isolation provided by Read Committed mode is adequate for many applications, and this mode is fast and simple to use; however, it is not sufficient for all cases. Applications that do complex queries and updates might require a more rigorously consistent view of the database than Read Committed mode provides. What is needed here is a layman's context of what isolation modes are good for what type of operation. Neither your explanation nor Tom's is particularly useful except to say, Crap, I might be screwed but I don't know if I am... how do I find out? Joshua D. Drake ...Robert -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564 Consulting, Training, Support, Custom Development, Engineering If the world pushes look it in the eye and GRR. Then push back harder. - Salamander -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] determine snapshot after obtaining locks for first statement
Kevin Grittner kevin.gritt...@wicourts.gov writes: Tom Lane t...@sss.pgh.pa.us wrote: I'm not very sure what a clearer explanation would look like As a stab at it, how about?: This behavior makes Read Committed mode unsuitable for many UPDATE or DELETE commands with joins or subqueries After thinking a bit, I'd be inclined to add a new paragraph. In particular, now that FOR UPDATE actually works in subqueries, it'd be worth pointing out that you can add that to guard against this type of issue. Perhaps, after the DELETE FROM website example, we could add something like UPDATEs and DELETEs involving joins or subqueries are particularly at risk, since they may perform an update based on a combination of old rows from other tables with an up-to-date target row. This risk can be mitigated by adding FOR UPDATE or FOR SHARE to subqueries, so that all rows directly involved in an update are guaranteed current. However that will also increase the risk of deadlock failures. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
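As a concrete sketch of the mitigation in the proposed paragraph (the table and column names here are invented for illustration, and this assumes a server version where FOR SHARE is accepted in subqueries):

```sql
-- Hypothetical example: cancel orders belonging to blacklisted customers.
-- Without FOR SHARE, the subquery reads customers from the statement's
-- snapshot, so the set of blacklisted rows may be stale by the time each
-- target order row is locked and updated.
UPDATE orders
   SET status = 'cancelled'
  FROM (SELECT id
          FROM customers
         WHERE blacklisted
           FOR SHARE) AS c   -- lock the rows the update decision depends on
 WHERE orders.customer_id = c.id;
```

As the paragraph notes, the extra row locks also widen the window for deadlock failures.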
Re: [HACKERS] determine snapshot after obtaining locks for first statement
On Thu, Dec 17, 2009 at 1:05 PM, Tom Lane t...@sss.pgh.pa.us wrote: Kevin Grittner kevin.gritt...@wicourts.gov writes: Tom Lane t...@sss.pgh.pa.us wrote: I'm not very sure what a clearer explanation would look like As a stab at it, how about?: This behavior makes Read Committed mode unsuitable for many UPDATE or DELETE commands with joins or subqueries After thinking a bit, I'd be inclined to add a new paragraph. In particular, now that FOR UPDATE actually works in subqueries, it'd be worth pointing out that you can add that to guard against this type of issue. Perhaps, after the DELETE FROM website example, we could add something like UPDATEs and DELETEs involving joins or subqueries are particularly at risk, since they may perform an update based on a combination of old rows from other tables with an up-to-date target row. This risk can be mitigated by adding FOR UPDATE or FOR SHARE to subqueries, so that all rows directly involved in an update are guaranteed current. However that will also increase the risk of deadlock failures. I like that. It might also be worth trying to explain that if you select some data out of the database, do a computation with it, and then use the results to drive an update, you're going to want to make the initial select be FOR SHARE. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
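The read-compute-write pattern Robert describes might be sketched like this (the table is invented for the example; the FOR SHARE holds the row steady between the read and the write):

```sql
BEGIN;
-- Lock the row the computation is based on, so a concurrent UPDATE
-- cannot slip in between our read and our write.
SELECT balance FROM accounts WHERE id = 42 FOR SHARE;
-- ... application code computes the new balance from the value read ...
UPDATE accounts SET balance = 1250.00 WHERE id = 42;
COMMIT;
```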
Re: [HACKERS] determine snapshot after obtaining locks for first statement
Robert Haas robertmh...@gmail.com wrote: I don't think that's any clearer, though it is more disparaging. :-) It's certainly not my goal to knock PostgreSQL. The precise conditions in which an UPDATE or DELETE can view an inconsistent database state (and therefore potentially persist something based on that inconsistent state) are that it has a FROM clause and/or subqueries which reference data changed by a concurrent database transaction which also affects rows which are targets of the UPDATE or DELETE. Precise descriptions of problem circumstances seem more useful to developers than vague statements like it's usually good enough, except when it isn't. If an accurate description of the behavior is considered disparaging, perhaps it's the behavior which should change, not just the description of it. Since I never use READ COMMITTED for updates, I'm not going to weigh in on whether this is a big enough problem to merit the effort and overhead of a different implementation; I'm just suggesting we should put the information out there more explicitly. My wording was still a little on the vague side, in an attempt to keep it short; perhaps that was a mistake. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] determine snapshot after obtaining locks for first statement
Joshua D. Drake j...@commandprompt.com writes: What is needed here is a layman's context of what isolation modes are good for what type of operation. Neither your explanation nor Tom's is particularly useful except to say, Crap, I might be screwed but I don't know if I am... how do I find out? If we had a simple way to characterize that, we'd not be having this discussion :-( One possibility is to try to list the risky cases. So far I can think of:

* updates using a WHERE clause that tests columns being changed by other transactions
* updates using subqueries/joins so that the result depends on other rows besides the one directly updated/deleted, and those other rows are subject to concurrent changes

But I'm not sure this is a complete list, and an incomplete one might do more harm than good ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
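The first risky case can be illustrated with a sketch (an invented table, two concurrent READ COMMITTED sessions):

```sql
-- Session A:
UPDATE jobs SET state = 'running' WHERE state = 'pending';
-- Session B, at the same time:
UPDATE jobs SET state = 'pending' WHERE state = 'queued';
-- If B commits while A is blocked, A rechecks only the rows its snapshot
-- matched: rows that B newly moved into state = 'pending' are invisible
-- to A's scan, even though they satisfy A's WHERE clause after B commits.
```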
Re: [HACKERS] determine snapshot after obtaining locks for first statement
Tom Lane t...@sss.pgh.pa.us wrote: After thinking a bit, I'd be inclined to add a new paragraph. In particular, now that FOR UPDATE actually works in subqueries, it'd be worth pointing out that you can add that to guard against this type of issue. Perhaps, after the DELETE FROM website example, we could add something like UPDATEs and DELETEs involving joins or subqueries are particularly at risk, since they may perform an update based on a combination of old rows from other tables with an up-to-date target row. This risk can be mitigated by adding FOR UPDATE or FOR SHARE to subqueries, so that all rows directly involved in an update are guaranteed current. However that will also increase the risk of deadlock failures. Much better than my suggestion. Including both the problem conditions and the solution is ideal. I'd missed that we now allow FOR UPDATE and FOR SHARE on subqueries. Nice enhancement. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] determine snapshot after obtaining locks for first statement
Kevin Grittner kevin.gritt...@wicourts.gov writes: ... The precise conditions in which an UPDATE or DELETE can view an inconsistent database state (and therefore potentially persist something based on that inconsistent state) are that it has a FROM clause and/or subqueries which reference data changed by a concurrent database transaction which also affects rows which are targets of the UPDATE or DELETE. Are we sure that's a precise and complete description? I don't have a problem with putting a description just like that in the docs, but I'm not yet convinced it's right. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] determine snapshot after obtaining locks for first statement
Tom Lane t...@sss.pgh.pa.us wrote: Are we sure that's a precise and complete description? I don't have a problem with putting a description just like that in the docs, but I'm not yet convinced it's right. Well, I thought it was when I typed it. You mentioned referencing other columns in the updated rows; I'll test to see how that behaves. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY IN as SELECT target
Tom Lane wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Andrew Dunstan wrote: COPY RETURNING ARRAY FROM ... It's not really returning an array, is it? It's returning a bag of rows like a (sub)query. How about just COPY FROM? The problem with COPY FROM is that it hard-wires a decision that there is one and only one possible result format, which I think we pretty much proved already is the wrong thing. I'm not thrilled with RETURNING ARRAY either, but we need to leave ourselves wiggle room to have more than one result format from the same source file. Well, we could have RETURNING type-expression with text[] supported for the first iteration. In answer to Heikki's argument, what I wanted was exactly to return an array of text for each row. Whatever we have needs to be able to handle the possibility of ragged input (see previous discussion) so we can't tie it down too tightly. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY IN as SELECT target
Andrew Dunstan and...@dunslane.net writes: Tom Lane wrote: The problem with COPY FROM is that it hard-wires a decision that there is one and only one possible result format, which I think we pretty much proved already is the wrong thing. I'm not thrilled with RETURNING ARRAY either, but we need to leave ourselves wiggle room to have more than one result format from the same source file. Well, we could have RETURNING type-expression with text[] supported for the first iteration. In answer to Heikki's argument, what I wanted was exactly to return an array of text for each row. Whatever we have needs to be able to handle the possibility of ragged input (see previous discussion) so we can't tie it down too tightly. I think that there are two likely possibilities for the result format:

* Raw data after just the de-escaping and column separation steps. Array of text is probably the right thing here, at least for a text COPY (doesn't seem to cover the binary case though).
* The data converted to some specified row type.

RETURNING type-expression is probably not good since it looks more like the second case than the first --- and in fact it could be outright ambiguous, what if your data actually is one column that is a text array? If we're willing to assume these are the *only* possibilities then we could use COPY FROM ... for the first and COPY RETURNING type-list FROM ... for the second. I'm a bit uncomfortable with that assumption though; it seems likely that we'll want to shoehorn in some more alternatives later. (Like, what about the binary case?) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY IN as SELECT target
In answer to Heikki's argument, what I wanted was exactly to return an array of text for each row. Whatever we have needs to be able to handle the possibility of ragged input (see previous discussion) so we can't tie it down too tightly. I would have *lots* of use for this feature. Mind you, returning (arbitrary expression) would be even better, but if we can get returning TEXT[] for 8.5, I think it's worth doing on its own. --Josh Berkus -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] determine snapshot after obtaining locks for first statement
On Thu, Dec 17, 2009 at 1:12 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Robert Haas robertmh...@gmail.com wrote: I don't think that's any clearer, though it is more disparaging. :-) It's certainly not my goal to knock PostgreSQL. The precise conditions in which an UPDATE or DELETE can view an inconsistent database state (and therefore potentially persist something based on that inconsistent state) are that it has a FROM clause and/or subqueries which reference data changed by a concurrent database transaction which also affects rows which are targets of the UPDATE or DELETE. Precise descriptions of problem circumstances seem more useful to developers than vague statements like it's usually good enough, except when it isn't. If an accurate description of the behavior is considered disparaging, perhaps it's the behavior which should change, not just the description of it. Since I never use READ COMMITTED for updates, I'm not going to weigh in on whether this is a big enough problem to merit the effort and overhead of a different implementation; I'm just suggesting we should put the information out there more explicitly. My wording was still a little on the vague side, in an attempt to keep it short; perhaps that was a mistake. Don't get me wrong, I don't love the current behavior. (I don't have a competing proposal either.) But I think we want to describe it with precision, because there are also many cases where _it works fine_. Telling people when it works and when it doesn't work is a lot more useful than attempting to qualitatively estimate how good or bad it is. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY IN as SELECT target
On Thu, Dec 17, 2009 at 1:37 PM, Tom Lane t...@sss.pgh.pa.us wrote: Andrew Dunstan and...@dunslane.net writes: Tom Lane wrote: The problem with COPY FROM is that it hard-wires a decision that there is one and only one possible result format, which I think we pretty much proved already is the wrong thing. I'm not thrilled with RETURNING ARRAY either, but we need to leave ourselves wiggle room to have more than one result format from the same source file. Well, we could have RETURNING type-expression with text[] supported for the first iteration. In answer to Heikki's argument, what I wanted was exactly to return an array of text for each row. Whatever we have needs to be able to handle the possibility of ragged input (see previous discussion) so we can't tie it down too tightly. I think that there are two likely possibilities for the result format:

* Raw data after just the de-escaping and column separation steps. Array of text is probably the right thing here, at least for a text COPY (doesn't seem to cover the binary case though).
* The data converted to some specified row type.

Agreed. RETURNING type-expression is probably not good since it looks more like the second case than the first --- and in fact it could be outright ambiguous, what if your data actually is one column that is a text array? If we're willing to assume these are the *only* possibilities then we could use COPY FROM ... for the first and COPY RETURNING type-list FROM ... for the second. I'm a bit uncomfortable with that assumption though; it seems likely that we'll want to shoehorn in some more alternatives later. (Like, what about the binary case?) You might want to specify column names as well as types, in this second case. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] determine snapshot after obtaining locks for first statement
Kevin Grittner kevin.gritt...@wicourts.gov wrote: Tom Lane t...@sss.pgh.pa.us wrote: Are we sure that's a precise and complete description? I don't have a problem with putting a description just like that in the docs, but I'm not yet convinced it's right. Well, I thought it was when I typed it. You mentioned referencing other columns in the updated rows; I'll test to see how that behaves. Some quick testing seems to show that for the rows on which we were blocking, all columns reflect all updates from the concurrent transaction on which we were waiting, including columns used in the WHERE clause. I'm not sure exactly what other tests might be necessary. I'm having trouble coming up with anything which doesn't involve a join or subquery, but that could be a failure of imagination. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY IN as SELECT target
Robert Haas robertmh...@gmail.com writes: You might want to specify column names as well as types, in this second case. Well, we could do it like VALUES: arbitrarily name the columns column1 ... columnN and tell people to use an alias if they want other names. If it's convenient to fit column names into the syntax, good, but we don't absolutely have to. [ thinks... ] Although actually the obvious SQL-ish syntax for a rowtype specification is ( colname typename [ , ... ] ) so that's probably what we'd want to do in the processed-data case. Not sure about the raw-data case --- maybe a predetermined name is okay there. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
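The VALUES behavior being referenced already works this way: unnamed columns come back as column1 ... columnN, and an alias list renames them. For example:

```sql
-- Default names:
SELECT column1, column2 FROM (VALUES (1, 'one'), (2, 'two')) AS v;
-- Renamed via an alias list:
SELECT num, word FROM (VALUES (1, 'one'), (2, 'two')) AS v(num, word);
```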
Re: [HACKERS] determine snapshot after obtaining locks for first statement
Kevin Grittner kevin.gritt...@wicourts.gov writes: Some quick testing seems to show that for the rows on which we were blocking, all columns reflect all updates from the concurrent transaction on which we were waiting, including columns used in the WHERE clause. I'm not sure exactly what other tests might be necessary. I'm having trouble coming up with anything which doesn't involve a join or subquery, but that could be a failure of imagination. The issue that I was thinking about is that there are actually two rounds of WHERE testing involved in a READ COMMITTED update: first we fetch a row that matches the WHERE clause *in the query snapshot*, and then we fetch its most up-to-date version and recheck the WHERE condition for that. If the updated version no longer satisfies WHERE we ignore it. The trouble with this is that the same transaction that changed that row to not satisfy WHERE might have also changed some other row so that it now *does* satisfy WHERE, but we won't ever find that other row because in the query snapshot it doesn't pass the WHERE. The website example in the docs is meant to illustrate this hazard. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] determine snapshot after obtaining locks for first statement
Robert Haas robertmh...@gmail.com wrote: Don't get me wrong, I don't love the current behavior. (I don't have a competing proposal either.) But I think we want to describe it with precision, because there are also many cases where _it works fine_. Telling people when it works and when it doesn't work is a lot more useful than attempting to qualitatively estimate how good or bad it is. It sounds like we're in violent agreement. I'm not sure what I said which might have led you to believe I felt otherwise. [reviews thread] The suggestion you felt was more disparaging was: : This behavior makes Read Committed mode unsuitable for : many UPDATE or DELETE commands with joins or subqueries You do realize that what is already in the documentation, for which this was a suggested replacement, was?: : This behavior makes Read Committed mode unsuitable for : commands that involve complex search conditions I'm not seeing where I made it more disparaging; I was trying to clarify under what circumstances it was a problem. If you have a suggestion for a better way to phrase the part I left alone, feel free to suggest something. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY IN as SELECT target
On Thu, Dec 17, 2009 at 1:50 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: You might want to specify column names as well as types, in this second case. Well, we could do it like VALUES: arbitrarily name the columns column1 ... columnN and tell people to use an alias if they want other names. If it's convenient to fit column names into the syntax, good, but we don't absolutely have to. [ thinks... ] Although actually the obvious SQL-ish syntax for a rowtype specification is ( colname typename [ , ... ] ) so that's probably what we'd want to do in the processed-data case. Yeah, I think that's good. Not sure about the raw-data case --- maybe a predetermined name is okay there. I would expect so. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY IN as SELECT target
Hi, On 17 Dec 2009, at 19:39, Josh Berkus wrote: Mind you, returning (arbitrary expression) would be even better, but if we can get returning TEXT[] for 8.5, I think it's worth doing on its own. Well, you already have it as soon as you have text[]:

INSERT INTO destination
  SELECT row[0], row[1], myfunction(row[0], row[1]), row[2]::int + 1
    FROM (COPY RETURNING text[] FROM '/path/to/file.csv' CSV HEADER) AS file(row);

Of course, as Andrew already said, what the syntax here does not cover is ragged file processing, that is, accepting file content when the rows do not all have the same number of columns. But if you have ragged input reading and COPY as a relation in a query, then you're able to apply any expression you want in the query itself, such as transforming the input slightly in order to conform to PostgreSQL datatype input syntaxes. Regards, -- dim Let's deprecate pgloader. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] determine snapshot after obtaining locks for first statement
Tom Lane t...@sss.pgh.pa.us wrote: [a transaction] might have also changed some other row so that it now *does* satisfy WHERE, but we won't ever find that other row because in the query snapshot it doesn't pass the WHERE. OK; got it. No way to fix that, really, without getting a fresh snapshot and re-starting the command, is there? I take it from your earlier posts that wouldn't be pretty. On the bright side, to be taken as showing an inconsistent state, the transaction on which we block has to both move one or more rows into the matching set as well as moving one or more rows out. Another example of the phenomenon:

connection1:

test=# create table t (name text not null primary key, is_it boolean not null);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "t_pkey" for table "t"
CREATE TABLE
test=# insert into t values ('huey', true), ('dewey', false), ('louie', false);
INSERT 0 3
test=# start transaction isolation level read committed;
START TRANSACTION
test=# update t set is_it = not is_it where name in ('huey', 'dewey');
UPDATE 2

connection2:

test=# start transaction isolation level read committed;
START TRANSACTION
test=# select * from t where is_it for update;
[blocks]

connection1:

test=# commit;
COMMIT

connection2:

 name | is_it
------+-------
(0 rows)

test=# select * from t where is_it for update;
 name  | is_it
-------+-------
 dewey | t
(1 row)

So this particular issue means that the rows affected will be the intersection of the rows matching the WHERE clause before and after the conflicting concurrent transaction(s) commit. The join/subquery issue means that all values used would be based on the snapshot at the start of the statement, except that values from rows updated by concurrent transactions on which we blocked would be based on the updated rows. Any other issues? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: Spurious 22 in hstore.sgml
David Fetter wrote: On Thu, Dec 17, 2009 at 11:01:19AM +0100, Albe Laurenz wrote: Magnus Hagander wrote: On Wed, Dec 16, 2009 at 20:34, David E. Wheeler wrote: *** a/doc/src/sgml/hstore.sgml --- b/doc/src/sgml/hstore.sgml Heh, interesting. That clearly shouldn't be there. Applied. Does this count as catch-22? Patch-22, perhaps? Wait, that 22 was holding up the entire database system. ;-) -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] remove redundant ownership checks
KaiGai, * KaiGai Kohei (kai...@ak.jp.nec.com) wrote: The patch was not attached... This patch either does too much, or not enough. I would either leave the Assert() in-place as a double-check (I presume that's why it was there in the first place, and if that Assert() fails then our assumption about the permissions check being already done on the object in question would be wrong, since the check is done against the passed-in 'rel' and the assert is that 'rel' and 'ruletup->ev_class' are the same; if they're not, then we might need to do perms checking on ruletup->ev_class), or remove the now-unused variable eventRelationOid. Overall, I agree with removing this check as it's already done by ATSimplePermissions() and we don't double-check the permissions in the other things called through ATExecCmd() (though there are cases where specific commands have to do *additional* checks beyond what ATSimplePermissions() does.. it might be worth looking into what those are and thinking about whether we should move/consolidate/etc them, or whether it makes sense to leave them where they are). Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] An example of bugs for Hot Standby
On Wed, 2009-12-16 at 14:05 +, Simon Riggs wrote: On Wed, 2009-12-16 at 10:33 +, Simon Riggs wrote: On Tue, 2009-12-15 at 20:25 +0900, Hiroyuki Yamada wrote: Hot Standby node can freeze when startup process calls LockBufferForCleanup(). This bug can be reproduced by the following procedure. Interesting. Looks like this can happen, which is a shame cos I just removed the wait checking code after not ever having seen a wait. Thanks for the report. Must-fix item for HS. So this deadlock can happen at two places:

1. When a relation lock waits behind an AccessExclusiveLock and then Startup runs LockBufferForCleanup()
2. When Startup is a pin count waiter and a lock acquire begins to wait on a relation lock

So we must put in direct deadlock detection in both places. We can't use the normal deadlock detector because in case (1) the backend might already have exceeded deadlock_timeout.

Better proposal:

 * It's possible for 3-way deadlocks to occur in Hot Standby mode.
 * If a user backend sleeps on a lock while it holds a buffer pin, that
 * leaves open the risk of deadlock. The user backend will only sleep
 * if it waits behind an AccessExclusiveLock held by the Startup process.
 * If the Startup process then tries to access any buffer that is pinned,
 * then it too will sleep and neither process will ever wake.
 *
 * We need to make a deadlock check in two places: in the user backend
 * when we sleep on a lock, and in the Startup process when we sleep
 * on a buffer pin. We need both checks because the deadlock can occur
 * from both directions.
 *
 * Just before a user backend sleeps on a lock, we accumulate a list of
 * buffers pinned by the backend. We then grab an LWLock
 * and then check each of the buffers to see if the Startup process is
 * waiting on them. If so, we release the lock and throw a deadlock error.
 * If the Startup process is not waiting, we then record the pinned buffers
 * in the BufferDeadlockRisk data structure and release the lock.
 * When we later get the lock we remove the deadlock risk.
 *
 * When the Startup process is about to wait on a buffer pin, it checks
 * the buffer it is about to pin against the BufferDeadlockRisk list. If the
 * buffer is already held by one or more lock waiters then we send a
 * conflict cancel to them and wait for them to die before rechecking
 * the buffer lock. This way we only cancel direct deadlocks.

It doesn't solve the general problem of buffer waits, but that may be solvable by a different mechanism. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: Spurious 22 in hstore.sgml
Bruce Momjian wrote: David Fetter wrote: On Thu, Dec 17, 2009 at 11:01:19AM +0100, Albe Laurenz wrote: Magnus Hagander wrote: On Wed, Dec 16, 2009 at 20:34, David E. Wheeler wrote: *** a/doc/src/sgml/hstore.sgml --- b/doc/src/sgml/hstore.sgml Heh, interesting. That clearly shouldn't be there. Applied. Does this count as catch-22? Patch-22, perhaps? Wait, that 22 was holding up the entire database system. ;-) You mean holding it up as in that was a load bearing integer! or holding it up as in this is a small caliber firearm -- gimme yer schema! ? Greg W.
Re: [HACKERS] [GENERAL] Installing PL/pgSQL by default
Bruce Momjian wrote: Tom Lane wrote: Bruce Momjian br...@momjian.us writes: I installed PL/pgSQL by default via initdb with the attached patch. The only problem is that pg_dump still dumps out the language creation: CREATE PROCEDURAL LANGUAGE plpgsql; ALTER PROCEDURAL LANGUAGE plpgsql OWNER TO postgres; What is odd is that I used the same process that initdb uses to create other objects. Does anyone know why this is happening? I think pg_dump pays attention to what schema the objects are in, and that's most likely creating them in PUBLIC. Try adding set search_path = pg_catalog. It's not impossible that we'll have to tweak pg_dump a bit; it's never had to deal with languages that shouldn't be dumped ... I found that pg_dump tests for pg_language.lanispl == true, which is true for all the stored procedure languages. I can easily special-case plpgsql, or check for FirstNormalObjectId, though I don't see that used in pg_dump currently. A more difficult issue is whether we should preserve the fact that plpgsql was _removed_ in the pg_dump output, i.e., if someone removes plpgsql from a database, do we issue a DROP LANGUAGE in pg_dump? I don't remember us having to deal with anything like this before. OK, the attached patch installs plpgsql by default from initdb, and suppresses the dumping of CREATE LANGUAGE in 8.5 and in 8.3/8.4 if binary upgrade is used (because you know you are upgrading to a release that has plpgsql installed by default). The 8.3/8.4 part is necessary so the schema load doesn't generate any errors and cause pg_migrator to exit. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. 
+ Index: doc/src/sgml/installation.sgml === RCS file: /cvsroot/pgsql/doc/src/sgml/installation.sgml,v retrieving revision 1.333 diff -c -c -r1.333 installation.sgml *** doc/src/sgml/installation.sgml 15 Dec 2009 22:59:53 - 1.333 --- doc/src/sgml/installation.sgml 17 Dec 2009 23:35:36 - *** *** 2266,2279 is commandcreatelang/command failing with unusual errors. For example, running as the owner of the PostgreSQL installation: screen ! -bash-3.00$ createlang plpgsql template1 ! createlang: language installation failed: ERROR: could not load library /opt/dbs/pgsql748/lib/plpgsql.so: A memory address is not in the address space for the process. /screen Running as a non-owner in the group posessing the PostgreSQL installation: screen ! -bash-3.00$ createlang plpgsql template1 ! createlang: language installation failed: ERROR: could not load library /opt/dbs/pgsql748/lib/plpgsql.so: Bad address /screen Another example is out of memory errors in the PostgreSQL server logs, with every memory allocation near or greater than 256 MB --- 2266,2279 is commandcreatelang/command failing with unusual errors. For example, running as the owner of the PostgreSQL installation: screen ! -bash-3.00$ createlang plperl template1 ! createlang: language installation failed: ERROR: could not load library /opt/dbs/pgsql748/lib/plperl.so: A memory address is not in the address space for the process. /screen Running as a non-owner in the group posessing the PostgreSQL installation: screen ! -bash-3.00$ createlang plperl template1 ! 
createlang: language installation failed: ERROR: could not load library /opt/dbs/pgsql748/lib/plperl.so: Bad address /screen Another example is out of memory errors in the PostgreSQL server logs, with every memory allocation near or greater than 256 MB Index: src/bin/initdb/initdb.c === RCS file: /cvsroot/pgsql/src/bin/initdb/initdb.c,v retrieving revision 1.178 diff -c -c -r1.178 initdb.c *** src/bin/initdb/initdb.c 11 Dec 2009 03:34:56 - 1.178 --- src/bin/initdb/initdb.c 17 Dec 2009 23:35:36 - *** *** 176,181 --- 176,182 static void setup_privileges(void); static void set_info_version(void); static void setup_schema(void); + static void load_plpgsql(void); static void vacuum_db(void); static void make_template0(void); static void make_postgres(void); *** *** 1894,1899 --- 1895,1925 } /* + * load PL/pgsql server-side language + */ + static void + load_plpgsql(void) + { + PG_CMD_DECL; + + fputs(_(loading PL/pgSQL server-side language ... ), stdout); + fflush(stdout); + + snprintf(cmd, sizeof(cmd), + \%s\ %s template1 %s, + backend_exec, backend_options, + DEVNULL); + + PG_CMD_OPEN; + + PG_CMD_PUTS(CREATE LANGUAGE plpgsql;\n); + + PG_CMD_CLOSE; + + check_ok(); + } + + /* * clean everything up in template1 */ static void *** *** 3126,3131 --- 3152,3159
Re: [HACKERS] [PATCH] remove redundant ownership checks
(2009/12/18 6:38), Stephen Frost wrote: KaiGai, * KaiGai Kohei (kai...@ak.jp.nec.com) wrote: The patch was not attached... This patch either does too much, or not enough. I would either leave the Assert() in place as a double-check (I presume that's why it was there in the first place, and if that Assert() fails then our assumption about the permissions check being already done on the object in question would be wrong, since the check is done against the passed-in 'rel' and the assert is that 'rel' and 'ruletup->ev_class' are the same; if they're not, then we might need to do perms checking on ruletup->ev_class) or remove the now-unused variable eventRelationOid. My preference is the latter option, because the pg_rewrite entry to be checked is fetched using the RULERELNAME syscache, which takes the OID of the relation and the name of the rule. If this Assert() failed, it would imply the syscache mechanism has a problem, not only the integrity of the pg_rewrite system catalog. The attached patch is a revised one. Thanks, Overall, I agree with removing this check as it's already done by ATSimplePermissions() and we don't double-check the permissions in the other things called through ATExecCmd() (though there are cases where specific commands have to do *additional* checks beyond what ATSimplePermissions() does.. it might be worth looking into what those are and thinking about if we should move/consolidate/etc them, or if it makes sense to leave them where they are). Thanks, Stephen -- OSS Platform Development Division, NEC KaiGai Kohei kai...@ak.jp.nec.com *** base/src/backend/rewrite/rewriteDefine.c (revision 2486) --- base/src/backend/rewrite/rewriteDefine.c (working copy) *** *** 671,677 { Relation pg_rewrite_desc; Oid owningRel = RelationGetRelid(rel); - Oid eventRelationOid; HeapTuple ruletup; bool changed = false; --- 671,676 *** *** 690,704 rulename, get_rel_name(owningRel; /* - * Verify that the user has appropriate permissions. 
- */ - eventRelationOid = ((Form_pg_rewrite) GETSTRUCT(ruletup))->ev_class; - Assert(eventRelationOid == owningRel); - if (!pg_class_ownercheck(eventRelationOid, GetUserId())) - aclcheck_error(ACLCHECK_NOT_OWNER, ACL_KIND_CLASS, - get_rel_name(eventRelationOid)); - - /* * Change ev_enabled if it is different from the desired new state. */ if (DatumGetChar(((Form_pg_rewrite) GETSTRUCT(ruletup))->ev_enabled) != --- 689,694
Re: [HACKERS] [PATCH] remove redundant ownership checks
KaiGai Kohei kai...@ak.jp.nec.com writes: [ patch to remove EnableDisableRule's permissions check ] I don't particularly like this patch, mainly because I disagree with randomly removing permissions checks without any sort of plan about where they ought to go. There are two principal entry points in rewriteDefine.c (the other one being DefineQueryRewrite), and currently they both do permissions checks. There is also a third major function RenameRewriteRule, currently ifdef'd out because it's not used, which is commented as Note that it lacks a permissions check, indicating that somebody (possibly me, I don't recall for sure) thought it was surprising that there wasn't such a check there. This is sensible if you suppose that this file implements rule utility commands that are more or less directly called by the user, which is in fact the case for DefineQueryRewrite, and it's not obvious why it wouldn't be the case for EnableDisableRule. Since that's a globally exposed function, it's quite possible that there's third-party code calling it and expecting it to do the appropriate permissions check. (A quick look at Slony, in particular, would be a good idea here.) If we're going to start moving these checks around we need a very well-defined notion of where permissions checks should be made, so that everyone knows what to expect. I have not seen any plan for that. Removing one check at a time because it appears to not be necessary in the code paths you've looked at is not a plan. regards, tom lane
Re: [HACKERS] determine snapshot after obtaining locks for first statement
On Thu, 2009-12-17 at 13:13 -0500, Tom Lane wrote: Joshua D. Drake j...@commandprompt.com writes: What is needed here is a layman's context of what isolation modes are good for what type of operation. Neither your explanation nor Tom's is particularly useful except to say, Crap, I might be screwed but I don't know if I am... how do I find out? If we had a simple way to characterize that, we'd not be having this discussion :-( Certainly true. Sorry if I came off harsh; my intent was to illustrate that more verbose yet less detailed information isn't going to help. Joshua D. Drake -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564 Consulting, Training, Support, Custom Development, Engineering Respect is earned, not gained through arbitrary and repetitive use of Mr. or Sir.
Re: [HACKERS] determine snapshot after obtaining locks for first statement
On Thu, 2009-12-17 at 12:16 -0600, Kevin Grittner wrote: Tom Lane t...@sss.pgh.pa.us wrote: After thinking a bit, I'd be inclined to add a new paragraph. In particular, now that FOR UPDATE actually works in subqueries, it'd be worth pointing out that you can add that to guard against this type of issue. Perhaps, after the DELETE FROM website example, we could add something like UPDATEs and DELETEs involving joins or subqueries are particularly at risk, since they may perform an update based on a combination of old rows from other tables with an up-to-date target row. This risk can be mitigated by adding FOR UPDATE or FOR SHARE to subqueries, so that all rows directly involved in an update are guaranteed current. However that will also increase the risk of deadlock failures. Much better than my suggestion. Including both the problem conditions and the solution is ideal. I'd missed that we now allow FOR UPDATE and FOR SHARE on subqueries. Nice enhancement. +1 Joshua D. Drake -Kevin
Re: [HACKERS] Largeobject Access Controls (r2460)
Robert Haas robertmh...@gmail.com wrote: Another comment is I'd like to keep link linkend=catalog-pg-largeobject-metadata for the first structnamepg_largeobject/structname in each topic. Those two things aren't the same. Perhaps you meant link linkend=catalog-pg-largeobject? Oops, yes. Thank you for the correction. We also have 8.4.x series in the core code. Do you think we should also rewrite those messages? One of them is a user-visible message. LargeObjectAlterOwner() : pg_largeobject.c * The 'lo_compat_privileges' is not checked here, because we * don't have any access control features in the 8.4.x series * or earlier release. * So, it is not a place we can define a compatible behavior. guc.c {lo_compat_privileges, PGC_SUSET, COMPAT_OPTIONS_PREVIOUS, gettext_noop(Enables backward compatibility in privilege checks on large objects), gettext_noop(When turned on, privilege checks on large objects perform with backward compatibility as 8.4.x or earlier releases.) Regards, --- Takahiro Itagaki NTT Open Source Software Center
Re: [HACKERS] patch - per-tablespace random_page_cost/seq_page_cost
On Thu, Dec 3, 2009 at 11:00 AM, Robert Haas robertmh...@gmail.com wrote: On Sat, Nov 28, 2009 at 9:54 PM, David Rowley dgrow...@gmail.com wrote: Robert Haas Wrote: Hmm. I'm not able to reliably detect a performance difference between unpatched CVS HEAD (er... git master branch) and same with spcoptions-v2.patch applied. I figured that if there were going to be an impact, it would be most likely to manifest itself in a query that touches lots and lots of tables but does very little actual work. So I used the attached script to create 200 empty tables, 100 in the default tablespace and 100 in tablespace dork (also known as, why I am working on this at 11 PM on Thanksgiving). Then I did: SELECT * FROM a1, a2, a3, ..., a100; (I've not read the patch, but I've just read the thread) If you're just benchmarking the planner times to see if the extra lookups are affecting the planning times, would it not be better to benchmark EXPLAIN SELECT * FROM a1, a2, a3, ..., a100; ? Otherwise any small changes might be drowned out in the execution time. Scanning 100 relations even if they are empty could account for quite a bit of that time, right? Possibly, but even if I can measure a difference doing it that way, it's not clear that it matters. It's fairly certain that there will be a performance degradation if we measure carefully enough, but if that difference is imperceptible in real-world scenarios, then it's not worth worrying about. Still, I probably will test it just to see. I did some fairly careful benchmarking of EXPLAIN SELECT * FROM a1, a2, a3, ..., a100. I explained this query 100 times via DBD::Pg and used time to measure how long the script took to run. I ran the script three times. And the result is... the unpatched version came out 1.7% SLOWER than the patched version. 
This seems difficult to take seriously, since it can't possibly be faster to do a syscache lookup and parse an array than it is to fetch a constant from a known memory address, but that's what I got. At any rate, it seems pretty clear that it's not hurting much. ...Robert
Re: [HACKERS] patch - per-tablespace random_page_cost/seq_page_cost
On Thu, Nov 26, 2009 at 4:25 PM, Robert Haas robertmh...@gmail.com wrote: On Sat, Nov 14, 2009 at 4:58 PM, Tom Lane t...@sss.pgh.pa.us wrote: I don't think there's even a solid consensus right now on which GUCs people would want to set at the tablespace level. This seems like an important point that we need to nail down. The original motivation for this patch was based on seq_page_cost and random_page_cost, to cover the case where, for example, one tablespace is on an SSD and another tablespace is on a RAID array. Greg Stark proposed adding effective_io_concurrency, and that makes plenty of sense to me, but I'm sort of disinclined to attempt to implement that as part of this patch because I have no familiarity with that part of the code and no hardware that I can use to test either the current behavior or the modified behavior. Since I'm recoding this to use the reloptions mechanism, a patch to add support for that should be pretty easy to write as a follow-on patch once this goes in. Any other suggestions? Going once... going twice... since no one has suggested anything or spoken against the proposal above, I'm just going to implement seq_page_cost and random_page_cost for now. Current version of patch is attached. I've revised it to use the reloptions stuff, but I don't think it's committable as-is because it currently thinks that extracting options from a pg_tablespace tuple is a cheap operation, which was true in the non-reloptions-based implementation but is less true now. At least, some benchmarking needs to be done to figure out whether and to what extent this is an issue. Per the email that I just sent a few minutes ago, there doesn't appear to be a performance impact in doing this even in a relatively stupid way - every call that requires seq_page_cost and/or random_page_cost results in a syscache lookup and then uses the relcache machinery to parse the returned array. That leaves the question of what the most elegant design is here. 
Tom suggested upthread that we should tag every RelOptInfo - and, presumably, IndexOptInfo, though it wasn't discussed - with this information. I don't however much like the idea of adding identically named members in both places. Should the number of options expand in the future, this will become silly very quickly. One option is to define a struct with seq_page_cost and random_page_cost that is then included in RelOptInfo and IndexOptInfo. It would seem to make sense to make the struct, rather than a pointer to the struct, the member, because it makes the copyfuncs/equalfuncs stuff easier to handle, and there's not really any benefit in incurring more palloc overhead. However, I'm sort of inclined to go ahead and invent a mini-cache for tablespaces. It avoids the (apparently insignificant) overhead of reparsing the array multiple times, but it also avoids bloating RelOptInfo and IndexOptInfo with more members than really necessary. It seems like a good idea to add one member to those structures anyway, for reltablespace, but copying all the values into every one we create just seems silly. Admittedly there are only two values right now, but again we may want to add more someday, and caching at the tablespace level just seems like the right way to do it. Thoughts? ...Robert
Re: [HACKERS] [PATCH] remove redundant ownership checks
On Thu, Dec 17, 2009 at 7:19 PM, Tom Lane t...@sss.pgh.pa.us wrote: If we're going to start moving these checks around we need a very well-defined notion of where permissions checks should be made, so that everyone knows what to expect. I have not seen any plan for that. Removing one check at a time because it appears to not be necessary in the code paths you've looked at is not a plan. I'm not completely familiar with the existing code structure here, but it sort of seems like in general you might want to divide up the processing of a statement into a parse analysis phase, a permissions checking phase, and an execution phase. The parse analysis seems to be mostly separated out into transformXyz() functions, but the permissions checking is mixed in with the execution. Disentangling that seems like a job and a half. ...Robert
Re: [HACKERS] Streaming replication and non-blocking I/O
On Thu, Dec 17, 2009 at 10:25 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Fujii Masao wrote: I'm thinking of making the standby send the walsender-switch-code the same way as application_name; walreceiver always specifies an option like replication=on in the conninfo string and calls PQconnectdb(), which sends the code as a part of the startup packet. And, the environment variable for that should not be defined, to avoid user mis-configuration, I think. Sounds good. Okay. Design clarification again; 0. Begin by connecting to the master using PQconnectdb() with a new conninfo option specifying the request for replication. The startup packet with the request is sent to the master, then the backend switches to walsender mode. The walsender goes into its main loop and waits for requests from the walreceiver. 1. Get the system identifier of the master. Slave -> Master: Query message, with a query string like GET_SYSTEM_IDENTIFIER Master -> Slave: RowDescription, DataRow, CommandComplete, and ReadyForQuery messages. The system identifier is returned in the DataRow message. 2. Another query exchange like above, for the timeline ID. Slave -> Master: Query message, with a query string like GET_TIMELINE Master -> Slave: RowDescription, DataRow, CommandComplete, and ReadyForQuery messages. The timeline ID is returned in the DataRow message. 3. Request a backup history file, if needed: Slave -> Master: Query message, with a query string like GET_BACKUP_HISTORY_FILE XXX where XXX is an XLogRecPtr. Master -> Slave: RowDescription, DataRow, CommandComplete and ReadyForQuery messages as usual. The file contents are returned in the DataRow message. In 1, 2, 3, the walreceiver uses PQexec() to send the Query message and receive the results. 4. Start replication Slave -> Master: Query message, with a query string START REPLICATION: XXX, where XXX is the RecPtr of the starting point. Master -> Slave: CopyOutResponse followed by a continuous stream of CopyData messages with WAL contents. 
Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] [PATCH] remove redundant ownership checks
* Tom Lane (t...@sss.pgh.pa.us) wrote: KaiGai Kohei kai...@ak.jp.nec.com writes: [ patch to remove EnableDisableRule's permissions check ] I don't particularly like this patch, mainly because I disagree with randomly removing permissions checks without any sort of plan about where they ought to go. The goal of this was to increase consistency with the rest of the code, in particular, ATPrepCmd checks ownership rights on the table, and anything which wants to check permissions beyond that has to do it independently. Do I like that? No, not really. There are two principal entry points in rewriteDefine.c (the other one being DefineQueryRewrite), and currently they both do permissions checks. DefineQueryRewrite gets called out of tcop/utility.c (either under a CREATE VIEW or a CREATE RULE). tcop/utility.c expects the functions it calls to handle permissions checking (except in the one case where it doesn't call a function - T_IndexStmt). Interestingly, DefineIndex, while it handles a number of other permissions checks, doesn't check that the caller is the table owner - it expects callers to have done that (which happens both in tcop/utility.c for CREATE INDEX and in ATPrepCmd for ALTER TABLE ...). There is also a third major function RenameRewriteRule, currently ifdef'd out because it's not used, which is commented as Note that it lacks a permissions check, indicating that somebody (possibly me, I don't recall for sure) thought it was surprising that there wasn't such a check there. This is sensible if you suppose that this file implements rule utility commands that are more or less directly called by the user, which is in fact the case for DefineQueryRewrite, and it's not obvious why it wouldn't be the case for EnableDisableRule. Since that's a globally exposed function, it's quite possible that there's third-party code calling it and expecting it to do the appropriate permissions check. (A quick look at Slony, in particular, would be a good idea here.) 
Personally I find it surprising that things called from ATExecCmd() expect some permissions checks to have been done already. If we're going to start moving these checks around we need a very well-defined notion of where permissions checks should be made, so that everyone knows what to expect. I have not seen any plan for that. Removing one check at a time because it appears to not be necessary in the code paths you've looked at is not a plan. What I suggested previously, though it may be naive, is to do the permissions checks when you actually have enough information to do them. At the moment we do a 'simple' check in ATPrepCmd (essentially, ownership of the relation) and then any more complicated checks have to be done by the function which actually understands what's going on enough to know what else needs to be checked (eg: ATAddForeignKeyConstraint). As I see it, you've mainly got three steps: Parsing the command (syntax, basic understanding) Validation (check for object existence, get object info, etc) Implementation (do the requested action) I would put the permissions checking between Validation and Implementation, ideally at the 'top' of Implementation. This, imv, is pretty similar to how we handle DML commands today- parsing/validation are done first but then the permissions checks aren't done until we actually go to run the command (of course, this also deals with prepared queries and the like). Right now what we have are a bunch of checks scattered around during Validation, as we come across things we think should be checked (gee, we know the table the user referenced, let's check if he owns it now). Of course, this patch isn't doing that because it was intended to make the existing code consistent, not institute a new policy for how permissions checking should be done. 
The big downside of trying to define a new policy is that it's a lot more difficult to get agreement on, and there are likely to be exceptions brought out that might make the policy appear to be ill-formed. Do people think this is a sensible approach? Are there known exceptions where this won't work or would cause problems? Is it possible to make that kind of delineation in general? Should we work up an example patch which moves some of these checks in that direction? Thanks for your comments. Stephen
Re: [HACKERS] LATERAL
On Sun, Oct 18, 2009 at 2:57 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: You could probably convince me that a merge join is not going to be too useful (how often can you want a merge join on the inner side of a nested loop? Why not? As Andrew pointed out, what we're really trying to accomplish here is consider sub-join plans that are parameterized by a value obtained from an outer relation. I think we shouldn't artificially limit what we consider. But anyway I think we're on the same page here: what we ought to do is try implementing this scheme without any extra restrictions on what it considers, and see what the performance is like. We can try to limit what it considers if it turns out not to work well in the simplest form. You mention what we ought to do here, so - is this something that you're planning to have a go at? Another question I have - while generalizing the inner-indexscan machinery is an interesting join optimization technique, I'm thinking that it actually has very little to do with LATERAL. Is there any reason to suppose that one or the other needs to be done first? ...Robert
Re: [HACKERS] [PATCH] remove redundant ownership checks
* Robert Haas (robertmh...@gmail.com) wrote: Disentangling that seems like a job and a half. Indeed it will be, but I think it would be a good thing to actually have a defined point at which permissions checking is to be done. Trying to read the code and figure out what permissions you need to perform certain actions, when some of those checks are done as 'prep work' far up the tree, isn't fun. It makes validation of the checks we say we do in the documentation more difficult too. Not to mention that if we want to allow more granular permission granting for certain operations, it gets even uglier.. Thanks, Stephen
Re: [HACKERS] [PATCH] remove redundant ownership checks
On Thu, Dec 17, 2009 at 10:14 PM, Stephen Frost sfr...@snowman.net wrote: * Robert Haas (robertmh...@gmail.com) wrote: Disentangling that seems like a job and a half. Indeed it will be, but I think it would be a good thing to actually have a defined point at which permissions checking is to be done. Completely agree... ...Robert
Re: [HACKERS] Largeobject Access Controls (r2460)
On Thu, Dec 17, 2009 at 7:27 PM, Takahiro Itagaki itagaki.takah...@oss.ntt.co.jp wrote: Another comment is I'd like to keep link linkend=catalog-pg-largeobject-metadata for the first structnamepg_largeobject/structname in each topic. Those two things aren't the same. Perhaps you meant link linkend=catalog-pg-largeobject? Oops, yes. Thank you for the correction. We also have 8.4.x series in the core code. Do you think we should also rewrite those messages? One of them is a user-visible message. Yes. I started going through the comments tonight. Partial patch attached. There were two comments that I was unable to understand and therefore could not reword - the one at the top of pg_largeobject_aclmask_snapshot(), and the second part of the comment at the top of LargeObjectExists(): * Note that LargeObjectExists always scans the system catalog * with SnapshotNow, so it is unavailable to use to check * existence in read-only accesses. In both cases, I'm lost. Help? In acldefault(), there is this comment: /* Grant SELECT,UPDATE by default, for now */ This doesn't seem to match what the code is doing, so I think we should remove it. I also notice that dumpBlobComments() is now misnamed, but it seems we've chosen to add a comment mentioning that fact rather than fixing it. That doesn't seem like the right approach. ...Robert diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c index 809df7a..b0aea41 100644 --- a/src/backend/catalog/aclchk.c +++ b/src/backend/catalog/aclchk.c @@ -4261,9 +4261,8 @@ pg_language_ownercheck(Oid lan_oid, Oid roleid) /* * Ownership check for a largeobject (specified by OID) * - * Note that we have no candidate to call this routine with a certain - * snapshot except for SnapshotNow, so we don't provide an interface - * with _snapshot() version now. + * This is only used for operations like ALTER LARGE OBJECT that are always + * relative to SnapshotNow. 
*/ bool pg_largeobject_ownercheck(Oid lobj_oid, Oid roleid) diff --git a/src/backend/catalog/pg_largeobject.c b/src/backend/catalog/pg_largeobject.c index ada5b88..dfbf350 100644 --- a/src/backend/catalog/pg_largeobject.c +++ b/src/backend/catalog/pg_largeobject.c @@ -79,10 +79,8 @@ LargeObjectCreate(Oid loid) } /* - * Drop a large object having the given LO identifier. - * - * When we drop a large object, it is necessary to drop both of metadata - * and data pages in same time. + * Drop a large object having the given LO identifier. Both the data pages + * and metadata must be dropped. */ void LargeObjectDrop(Oid loid) @@ -191,13 +189,12 @@ LargeObjectAlterOwner(Oid loid, Oid newOwnerId) if (!superuser()) { /* - * The 'lo_compat_privileges' is not checked here, because we - * don't have any access control features in the 8.4.x series - * or earlier release. - * So, it is not a place we can define a compatible behavior. + * lo_compat_privileges is not checked here, because ALTER + * LARGE OBJECT ... OWNER did not exist at all prior to + * PostgreSQL 8.5. + * + * We must be the owner of the existing object. */ - - /* Otherwise, must be owner of the existing object */ if (!pg_largeobject_ownercheck(loid, GetUserId())) ereport(ERROR, (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), @@ -251,9 +248,8 @@ LargeObjectAlterOwner(Oid loid, Oid newOwnerId) /* * LargeObjectExists * - * Currently, we don't use system cache to contain metadata of - * large objects, because massive number of large objects can - * consume not a small amount of process local memory. + * We don't use the system cache for large object metadata, for fear of + * using too much local memory. 
  *
  * Note that LargeObjectExists always scans the system catalog
  * with SnapshotNow, so it is unavailable to use to check

diff --git a/src/backend/commands/comment.c b/src/backend/commands/comment.c
index 8f8ecc7..ece2a30 100644
--- a/src/backend/commands/comment.c
+++ b/src/backend/commands/comment.c
@@ -1449,7 +1449,7 @@ CommentLargeObject(List *qualname, char *comment)
 	 *
 	 * See the comment in the inv_create() which describes
 	 * the reason why LargeObjectRelationId is used instead
-	 * of the LargeObjectMetadataRelationId.
+	 * of LargeObjectMetadataRelationId.
 	 */
 	CreateComments(loid, LargeObjectRelationId, 0, comment);
 }

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index c799b13..22b8ea7 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1229,9 +1229,9 @@ static struct config_bool ConfigureNamesBool[] =
 	{
 		{"lo_compat_privileges", PGC_SUSET, COMPAT_OPTIONS_PREVIOUS,
-			gettext_noop("Enables backward compatibility in privilege checks on large objects"),
-			gettext_noop("When turned on, privilege checks on large objects perform with backward compatibility as 8.4.x or earlier releases.")
+			gettext_noop("Enables backward compatibility mode for privilege checks on large objects"),
+
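The behavior the lo_compat_privileges GUC toggles can be illustrated with a minimal, self-contained sketch. This is an invented simplification for illustration only, not PostgreSQL's actual implementation (the real check lives in pg_largeobject_aclmask_snapshot() and consults catalog ACLs); the names below mirror acl.h conventions but are assumptions:

```c
#include <stdbool.h>
#include <assert.h>

/* Simplified ACL bits, modeled loosely on PostgreSQL's acl.h */
#define ACL_SELECT    (1 << 0)
#define ACL_UPDATE    (1 << 1)
#define ACL_NO_RIGHTS 0

/* Hypothetical stand-in for the lo_compat_privileges GUC */
static bool lo_compat_privileges = false;

/*
 * Toy large-object privilege check: when the compatibility flag is on,
 * behave as 8.4.x did (there were no per-object privileges, so every
 * requested mode is allowed); otherwise mask the request against the
 * object's ACL, so only explicitly granted modes survive.
 */
static int
lo_aclmask(int object_acl, int requested)
{
    if (lo_compat_privileges)
        return requested;           /* pre-8.5 behavior: no checks */
    return object_acl & requested;  /* 8.5 behavior: honor the ACL */
}
```

With the flag off, a caller holding only SELECT who requests SELECT|UPDATE gets back just ACL_SELECT; with the flag on, the full request is granted regardless of the ACL, which is exactly the backward-compatibility escape hatch the GUC description talks about.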
Re: [HACKERS] [PATCH] remove redundant ownership checks
(2009/12/18 9:19), Tom Lane wrote:

KaiGai Kohei kai...@ak.jp.nec.com writes:
[ patch to remove EnableDisableRule's permissions check ]

I don't particularly like this patch, mainly because I disagree with randomly removing permissions checks without any sort of plan about where they ought to go. There are two principal entry points in rewriteDefine.c (the other one being DefineQueryRewrite), and currently they both do permissions checks. There is also a third major function, RenameRewriteRule, currently ifdef'd out because it's not used, which is commented as "Note that it lacks a permissions check", indicating that somebody (possibly me, I don't recall for sure) thought it was surprising that there wasn't such a check there. This is sensible if you suppose that this file implements rule utility commands that are more or less directly called by the user, which is in fact the case for DefineQueryRewrite, and it's not obvious why it wouldn't be the case for EnableDisableRule. Since that's a globally exposed function, it's quite possible that there's third-party code calling it and expecting it to do the appropriate permissions check. (A quick look at Slony, in particular, would be a good idea here.)

Treating this permission check as part of EnableDisableRule()'s specification is one possible standpoint. However, if we adopt that standpoint, we should skip ATSimplePermissions() for the ENABLE/DISABLE RULE options in ATPrepCmd(), because ATExecCmd() calls EnableDisableRule(), which already applies the permission checks per that specification. We don't need to apply the same checks twice; either one of them would be enough. But I don't think skipping ATSimplePermissions() is the right direction, because the existing PG model clearly handles rewrite rules as properties of a relation, even though they are not stored within pg_class. So it is quite natural to check permission to modify a relation's properties in ATPrepCmd().
--
OSS Platform Development Division, NEC
KaiGai Kohei kai...@ak.jp.nec.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Largeobject Access Controls (r2460)
Robert Haas robertmh...@gmail.com wrote:

In both cases, I'm lost. Help?

They might be contrasted with the comments for myLargeObjectExists:

  Since we use MVCC visibility in loread(), metadata for large
  object also should be visible in MVCC rule.

If I understand them, they say:
* pg_largeobject_aclmask_snapshot requires a snapshot which will be used in loread().
* Don't use LargeObjectExists if you need MVCC visibility.

In acldefault(), there is this comment: /* Grant SELECT,UPDATE by default, for now */ This doesn't seem to match what the code is doing, so I think we should remove it.

Ah, ACL_NO_RIGHTS is the default.

I also notice that dumpBlobComments() is now misnamed, but it seems we've chosen to add a comment mentioning that fact rather than fixing it.

Hmmm, now it dumps not only comments but also the ownership of large objects. Should we rename it dumpBlobMetadata() or so?

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
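The distinction being drawn here, between a SnapshotNow-style "latest committed state" lookup and an MVCC-snapshot lookup, can be sketched with a toy visibility check. This is a deliberately simplified invention (PostgreSQL's real HeapTupleSatisfiesMVCC also consults commit status and an in-progress xid list); all names below are assumptions for illustration:

```c
#include <stdbool.h>
#include <stdint.h>
#include <assert.h>

/* Toy tuple header: inserting and deleting transaction ids (0 = not deleted) */
typedef struct { uint32_t xmin; uint32_t xmax; } Tuple;

/* Toy MVCC snapshot: only transactions with xid < xmin_horizon are visible */
typedef struct { uint32_t xmin_horizon; } Snapshot;

/*
 * A tuple is visible under the snapshot if it was inserted by a transaction
 * the snapshot can see, and was not deleted by a transaction the snapshot
 * can also see. A later snapshot can thus see tuples (and deletions) that
 * an earlier snapshot cannot, which is the whole point of the MVCC rule.
 */
static bool
tuple_visible(const Tuple *t, const Snapshot *s)
{
    if (t->xmin >= s->xmin_horizon)
        return false;   /* inserted too recently for this snapshot */
    if (t->xmax != 0 && t->xmax < s->xmin_horizon)
        return false;   /* already deleted, as far as this snapshot knows */
    return true;
}
```

The existence-check hazard in the quoted comment falls out directly: a metadata row deleted by xid 7 is still "there" for a snapshot taken before that transaction, so a SnapshotNow-style check and an MVCC check can disagree about whether the large object exists.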
Re: [HACKERS] COPY IN as SELECT target
2009/12/17 Andrew Dunstan and...@dunslane.net:

Recently there was discussion about allowing a COPY statement to be a SELECT target, returning a text array, although the syntax wasn't really nailed down that I recall. I was thinking that we might have

COPY RETURNING ARRAY FROM ...

instead of

COPY tablename opt_column_list FROM ...

then we possibly could do things like:

SELECT t[5] as a, 3*(t[3]::numeric) as b FROM (COPY RETURNING ARRAY FROM STDIN CSV) as t;

Thoughts?

In this case COPY doesn't return an array - so RETURNING ARRAY is a little bit strange. What about

SELECT .. FROM (COPY VALUES [(columns)] FROM )

Regards
Pavel

cheers
andrew