Re: [HACKERS] refresh materialized view concurrently
On Tue, Jul 9, 2013 at 12:50 PM, Kevin Grittner kgri...@ymail.com wrote:
> Thanks again!  New patch attached.

After a couple more attempts at trying to break it, I mark this as ready to go.

One small question: why do we use multiple unique indexes if they exist?  One index isn't enough?

-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] refresh materialized view concurrently
Hitoshi Harada umi.tan...@gmail.com wrote:
> After a couple more attempts at trying to break it, I mark this as
> ready to go.

Thanks.

> One small question: why do we use multiple unique indexes if they exist?

Two reasons:

(1) By only matching up rows which test as equal on all columns used in unique indexes, we can use UPDATE for the matches rather than DELETE and INSERT, without fear of hitting a transient duplicate key error on one of the indexes.  Since any update to an indexed column prevents a HOT update, and results in the equivalent of a DELETE and an INSERT anyway, there's no performance loss from letting rows not match if *any* column which is part of *any* unique index doesn't match.

(2) The planner can use one of the unique indexes to optimize the diff phase.  How would we pick the one which is fastest?  By comparing for equality on all the columns used in all unique indexes, we let the planner decide.  I see no reason to try to second-guess it; just let it do its thing.

> One index isn't enough?

It would be enough for correctness, yes.

-- 
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
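[Editor's illustration.]  The matching strategy Kevin describes can be sketched roughly as follows; the names (mv with a unique index on column a, newdata holding the freshly generated contents) are hypothetical, and the real patch builds its diff query internally through SPI rather than as a literal statement like this:

```sql
-- Sketch only, not the patch's exact SQL.  Join old and new contents on
-- every column covered by a unique index (here just "a").  Matched pairs
-- that differ in other columns can be handled with UPDATE; rows present
-- on only one side become DELETEs (old side) or INSERTs (new side).
SELECT mv.ctid AS old_tid, newdata.*
  FROM mv
  FULL JOIN newdata ON mv.a = newdata.a
 WHERE (mv.*) IS DISTINCT FROM (newdata.*);
```

Because the join condition mentions every unique-index column, the planner is free to pick whichever index makes the join cheapest, which is the point of reason (2) above.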
Re: [HACKERS] refresh materialized view concurrently
On Sat, Jul 6, 2013 at 9:20 AM, Kevin Grittner kgri...@ymail.com wrote:
> Hitoshi Harada umi.tan...@gmail.com wrote:
>> Oops!
>>
>> drop materialized view if exists mv;
>> drop table if exists foo;
>> create table foo(a, b) as values(1, 10);
>> create materialized view mv as select * from foo;
>> create unique index on mv(a);
>> insert into foo select * from foo;
>> refresh materialized view mv;
>> refresh materialized view concurrently mv;
>>
>> test=# refresh materialized view mv;
>> ERROR:  could not create unique index "mv_a_idx"
>> DETAIL:  Key (a)=(1) is duplicated.
>> test=# refresh materialized view concurrently mv;
>> REFRESH MATERIALIZED VIEW
>
> Indeed.  Thanks for the careful testing.  Fixed by scanning the temp
> table for duplicates before generating the diff:
>
> test=# refresh materialized view concurrently mv;
> ERROR:  new data for "mv" contains duplicate rows without any NULL columns
> DETAIL:  Row: (1,10)

I think the point is not to check for duplicate rows; it is about unique key constraint violations.  So if you change the original table foo to values(1, 10), (1, 20), the issue can still be reproduced.  In non-concurrent operation it is checked by reindex_index when swapping the heap, but apparently we are not doing the constraint check in concurrent mode.

>> [ matview with all columns covered by unique indexes fails ]
>
> Fixed.
>
>> Other than these, I've found the index is opened with NoLock, relying
>> on the ExclusiveLock on the parent matview; ALTER INDEX SET TABLESPACE
>> or something similar can run concurrently, but it is presumably safe.
>> DROP INDEX and REINDEX would be blocked by the ExclusiveLock.
>
> Since others were also worried that an index definition could be
> modified while another process is holding an ExclusiveLock on its
> table, I changed this.

OK, the others are resolved.

One thing I need to apologize for is make_temptable_name_n: I claimed the previous coding was a potential memory overrun, but the relation name is actually defined by make_new_heap, so unless that function generates an absurdly long name there is no possibility of producing a name that overruns NAMEDATALEN.  So +1 for reverting make_temptable_name_n, which was invented needlessly.

I've found another issue, regarding matview_maintenance_depth.  If an SPI call fails (pg_cancel_backend is quite possible) and errors out in the middle of processing, this value stays above zero, so subsequently we can issue DML on the matview.  We should make sure the value goes back to zero before jumping out of this function.

-- 
Hitoshi Harada
Re: [HACKERS] refresh materialized view concurrently
Robert Haas robertmh...@gmail.com wrote:
> Hitoshi Harada umi.tan...@gmail.com wrote:
>> Other than these, I've found the index is opened with NoLock, relying
>> on the ExclusiveLock on the parent matview; ALTER INDEX SET TABLESPACE
>> or something similar can run concurrently, but it is presumably safe.
>> DROP INDEX and REINDEX would be blocked by the ExclusiveLock.
>
> I doubt very much that this is safe.  And even if it is safe today, I
> think it's a bad idea, because we're likely to try to reduce lock
> levels in the future.  Taking no lock on a relation we're opening, even
> an index, seems certain to be a bad idea.

What we're talking about is taking a look at the index definition while the indexed table involved is covered by an ExclusiveLock.  Why is that more dangerous than inserting entries into an index without taking a lock on that index while the indexed table is covered by a RowExclusiveLock, as happens on INSERT?  RowExclusiveLock is a much weaker lock, and we can't add entries to an index without looking at its definition.  Should we be taking out locks on every index for every INSERT?  If not, what makes that safe while merely looking at the definition is too risky?

-- 
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] refresh materialized view concurrently
Kevin Grittner kgri...@ymail.com writes:
> Robert Haas robertmh...@gmail.com wrote:
>> I doubt very much that this is safe.  And even if it is safe today, I
>> think it's a bad idea, because we're likely to try to reduce lock
>> levels in the future.  Taking no lock on a relation we're opening,
>> even an index, seems certain to be a bad idea.

I'm with Robert on this.

> What we're talking about is taking a look at the index definition while
> the indexed table involved is covered by an ExclusiveLock.  Why is that
> more dangerous than inserting entries into an index without taking a
> lock on that index while the indexed table is covered by a
> RowExclusiveLock, as happens on INSERT?

I don't believe that that happens.  If it does, it's a bug.  Either the planner or the executor should be taking a lock on each index touched by a query.

			regards, tom lane
Re: [HACKERS] refresh materialized view concurrently
On Wed, Jul 3, 2013 at 10:25 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Kevin Grittner kgri...@ymail.com writes:
>> What we're talking about is taking a look at the index definition
>> while the indexed table involved is covered by an ExclusiveLock.  Why
>> is that more dangerous than inserting entries into an index without
>> taking a lock on that index while the indexed table is covered by a
>> RowExclusiveLock, as happens on INSERT?
>
> I don't believe that that happens.  If it does, it's a bug.  Either the
> planner or the executor should be taking a lock on each index touched
> by a query.

It seems Kevin's right.  Not sure why that doesn't break.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] refresh materialized view concurrently
Robert Haas robertmh...@gmail.com writes:
> On Wed, Jul 3, 2013 at 10:25 AM, Tom Lane t...@sss.pgh.pa.us wrote:
>> I don't believe that that happens.  If it does, it's a bug.  Either
>> the planner or the executor should be taking a lock on each index
>> touched by a query.
>
> It seems Kevin's right.  Not sure why that doesn't break.

Are we somehow not going through ExecOpenIndices?

			regards, tom lane
Re: [HACKERS] refresh materialized view concurrently
On Wed, Jul 3, 2013 at 10:47 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Are we somehow not going through ExecOpenIndices?

I dunno.  I just did a quick black-box test:

CREATE TABLE foo (a int primary key);
BEGIN;
INSERT INTO foo VALUES (1);
SELECT relation::regclass, locktype, mode, granted FROM pg_locks;

I get:

 relation | locktype      | mode             | granted
----------+---------------+------------------+---------
 pg_locks | relation      | AccessShareLock  | t
 foo      | relation      | RowExclusiveLock | t
          | virtualxid    | ExclusiveLock    | t
          | transactionid | ExclusiveLock    | t

No foo_pkey anywhere.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] refresh materialized view concurrently
Robert Haas robertmh...@gmail.com writes:
> I dunno.  I just did a quick black-box test:
> [ begin; insert; inspect pg_locks ]
> No foo_pkey anywhere.

That proves nothing, as we don't keep such locks after the query (and there's no reason to AFAICS).  See ExecCloseIndices.

			regards, tom lane
Re: [HACKERS] refresh materialized view concurrently
On 2013-07-03 11:08:32 -0400, Tom Lane wrote:
> Robert Haas robertmh...@gmail.com writes:
>> I dunno.  I just did a quick black-box test:
>> [ begin; insert; inspect pg_locks -- no foo_pkey anywhere ]
>
> That proves nothing, as we don't keep such locks after the query (and
> there's no reason to AFAICS).  See ExecCloseIndices.

Should be easy enough to test by hacking LOCK TABLE to lock indexes and taking out a conflicting lock (SHARE?) in a second transaction.  I wonder if we shouldn't just generally allow that; I remember relaxing that check before (when playing with CREATE/DROP CONCURRENTLY, afair).

Greetings,

Andres Freund

-- 
Andres Freund	http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] refresh materialized view concurrently
Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> Tom Lane t...@sss.pgh.pa.us wrote:
>>> Are we somehow not going through ExecOpenIndices?
>>
>> I dunno.  I just did a quick black-box test:
>> [ begin; insert; without commit ]
>> No foo_pkey anywhere.
>
> That proves nothing, as we don't keep such locks after the query (and
> there's no reason to AFAICS).  See ExecCloseIndices.

OK.  I had seen that no locks were held after the insert and wasn't aware that we acquired and then released them for each insert within a transaction.  On the other hand, we acquire locks on all indexes even for a HOT UPDATE which uses a seqscan, and hold those until end of transaction.  Is there a reason for that?

I suppose that since a concurrent refresh is very likely to lock all these indexes with RowExclusiveLock anyway, as a result of the DELETE, UPDATE, and INSERT statements subsequently issued through SPI, I might as well acquire that lock when I look at the definition and not release it -- so that the subsequent lock acquisitions are local to the backend, and therefore faster.

-- 
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] refresh materialized view concurrently
Kevin Grittner kgri...@ymail.com writes:
> OK.  I had seen that no locks were held after the insert and wasn't
> aware that we acquired and then released them for each insert within a
> transaction.  On the other hand, we acquire locks on all indexes even
> for a HOT UPDATE which uses a seqscan, and hold those until end of
> transaction.  Is there a reason for that?

Sounds dubious to me; although in the HOT code it might be that there's no convenient place to release the index locks.

			regards, tom lane
Re: [HACKERS] refresh materialized view concurrently
Tom Lane t...@sss.pgh.pa.us wrote:
> Kevin Grittner kgri...@ymail.com writes:
>> we acquire locks on all indexes even for a HOT UPDATE which uses a
>> seqscan, and hold those until end of transaction.  Is there a reason
>> for that?
>
> Sounds dubious to me; although in the HOT code it might be that there's
> no convenient place to release the index locks.

Further testing shows that any UPDATE or DELETE statement acquires a RowExclusiveLock on every index on the table and holds it until end of transaction, whether or not any rows are affected and regardless of whether an index scan or a seqscan is used.  In fact, just an EXPLAIN of an UPDATE or DELETE does so.  It is only INSERT which releases the locks at the end of the statement.

-- 
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
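[Editor's illustration.]  Kevin's observation can be reproduced from psql with a throwaway table; the table and index names below are hypothetical examples, and the exact pg_locks contents will vary by session:

```sql
-- Sketch of the test Kevin describes: even a bare EXPLAIN of an UPDATE
-- locks the table's indexes, and the locks persist to end of transaction.
CREATE TABLE t (a int PRIMARY KEY, b int);

BEGIN;
EXPLAIN UPDATE t SET b = b + 1;   -- planning alone touches the indexes
SELECT relation::regclass, mode
  FROM pg_locks
 WHERE locktype = 'relation';
-- Expect t_pkey to appear with RowExclusiveLock here, held until
-- COMMIT/ROLLBACK; after a plain INSERT, by contrast, the index lock
-- is released at the end of the statement.
ROLLBACK;
```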
Re: [HACKERS] refresh materialized view concurrently
Kevin Grittner kgri...@ymail.com writes:
> Further testing shows that any UPDATE or DELETE statement acquires a
> RowExclusiveLock on every index on the table and holds it until end of
> transaction, whether or not any rows are affected and regardless of
> whether an index scan or a seqscan is used.  In fact, just an EXPLAIN
> of an UPDATE or DELETE does so.  It is only INSERT which releases the
> locks at the end of the statement.

Hm, possibly the planner is taking those locks.  I don't think the executor's behavior is any different.  But the planner only cares about indexes in SELECT/UPDATE/DELETE, since an INSERT has no interest in scanning the target table.

			regards, tom lane
Re: [HACKERS] refresh materialized view concurrently
On Thu, Jun 27, 2013 at 12:19 AM, Hitoshi Harada umi.tan...@gmail.com wrote:
> On Wed, Jun 26, 2013 at 1:38 AM, Kevin Grittner kgri...@ymail.com wrote:
>> New version attached.
>
> Will take another look.

Oops!

drop materialized view if exists mv;
drop table if exists foo;
create table foo(a, b) as values(1, 10);
create materialized view mv as select * from foo;
create unique index on mv(a);
insert into foo select * from foo;
refresh materialized view mv;
refresh materialized view concurrently mv;

test=# refresh materialized view mv;
ERROR:  could not create unique index "mv_a_idx"
DETAIL:  Key (a)=(1) is duplicated.
test=# refresh materialized view concurrently mv;
REFRESH MATERIALIZED VIEW

Here's one more.

create table foo(a, b, c) as values(1, 2, 3);
create materialized view mv as select * from foo;
create unique index on mv (a);
create unique index on mv (b);
create unique index on mv (c);
insert into foo values(2, 3, 4);
insert into foo values(3, 4, 5);
refresh materialized view concurrently mv;

test=# refresh materialized view concurrently mv;
ERROR:  syntax error at or near "FROM"
LINE 1: UPDATE public.mv x SET FROM pg_temp_2.pg_temp_16615_2 d WHE...
                               ^
QUERY:  UPDATE public.mv x SET FROM pg_temp_2.pg_temp_16615_2 d WHERE d.tid IS NOT NULL AND x.ctid = d.tid

Other than these, I've found the index is opened with NoLock, relying on the ExclusiveLock on the parent matview; ALTER INDEX SET TABLESPACE or something similar can run concurrently, but it is presumably safe.  DROP INDEX and REINDEX would be blocked by the ExclusiveLock.

-- 
Hitoshi Harada
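[Editor's note, an interpretation not spelled out in the thread.]  A plausible reading of this second failure: when every column of the matview is covered by some unique index, the diff matches rows on all columns, so any matched pair is already identical and the generated UPDATE is left with an empty SET list, which is a syntax error:

```sql
-- What the patch generated (malformed: nothing between SET and FROM):
--   UPDATE public.mv x SET FROM pg_temp_2.pg_temp_16615_2 d
--   WHERE d.tid IS NOT NULL AND x.ctid = d.tid;
--
-- With no columns left outside the unique indexes, matched rows cannot
-- differ, so a fix can simply skip the UPDATE step and run only the
-- DELETE/INSERT legs of the diff.
```

This matches Kevin's later note ("[ matview with all columns covered by unique indexes fails ] Fixed.") that the case needed special handling.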
Re: [HACKERS] refresh materialized view concurrently
On Tue, Jul 2, 2013 at 4:02 AM, Hitoshi Harada umi.tan...@gmail.com wrote:
> Other than these, I've found the index is opened with NoLock, relying
> on the ExclusiveLock on the parent matview; ALTER INDEX SET TABLESPACE
> or something similar can run concurrently, but it is presumably safe.
> DROP INDEX and REINDEX would be blocked by the ExclusiveLock.

I doubt very much that this is safe.  And even if it is safe today, I think it's a bad idea, because we're likely to try to reduce lock levels in the future.  Taking no lock on a relation we're opening, even an index, seems certain to be a bad idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] refresh materialized view concurrently
On Tue, Jun 25, 2013 at 9:07 AM, Robert Haas robertmh...@gmail.com wrote:
> On Fri, Jun 21, 2013 at 5:20 AM, Hitoshi Harada umi.tan...@gmail.com wrote:
>> If I don't miss something, the requirement for the CONCURRENTLY option
>> is to allow simple SELECT readers to read the matview concurrently
>> while the view is populating the new data, while INSERT/UPDATE/DELETE
>> and SELECT FOR UPDATE/SHARE are still blocked.  So I wonder why it is
>> not possible just to acquire ExclusiveLock on the matview while
>> populating the data and swap the relfile by taking a brief
>> AccessExclusiveLock.  This lock escalation is no deadlock hazard, I
>> suppose, because a concurrent operation would block the other at the
>> point the ExclusiveLock is acquired, and ExclusiveLock conflicts with
>> AccessExclusiveLock.  Then you don't need the complicated SPI logic or
>> the unique key index dependency.
>
> This is no good.  One, all lock upgrades are deadlock hazards.  In this
> case, that plays out as follows: suppose that the session running
> REFRESH MATERIALIZED VIEW CONCURRENTLY also holds a lock on something
> else.  Some other process takes an AccessShareLock on the materialized
> view and then tries to take a conflicting lock on the other object.
> Kaboom, deadlock.  Granted, the chances of that happening in practice
> are small, but it IS the reason why we typically try to avoid having
> long-running operations perform lock upgrades.  Users get really
> annoyed when their DDL runs for an hour and then rolls back.

OK, that's not safe.  What I was thinking was something similar to compare-and-swap, where the whole operation is atomic under an AccessExclusiveLock.  What if we release the ExclusiveLock once the new matview heap has been created and re-acquire AccessExclusiveLock before trying to swap?

Note a matview is a little different from an index, which I know people are discussing in the REINDEX CONCURRENTLY thread, in that the content of a matview does not change incrementally (at least at this point); it only changes wholesale, in the swapping operation performed by the same REFRESH MATERIALIZED VIEW command.  The only race condition is that between releasing the ExclusiveLock and re-acquiring the AccessExclusiveLock, someone else can go ahead with the same operation and create another new heap.  If that happens, let's abort, because I guess that's the way our transaction system works anyway; in the case of unique key index insertion, for example, if I find another guy inserting the same value into the index, I wait for him to finish his work, and if his transaction commits I give up; otherwise I go ahead.  Maybe it's annoying if an hour-long operation finally gets aborted, but my purpose is actually achieved by the other guy.  If the primary goal of this feature is to let readers read the matview concurrently, it should be OK?  Hmm, but in such cases the first guy always wins, and the second guy who may come an hour later loses, so we cannot get the result of the latest command...  I still wonder whether there should be some way.

> Two, until we get MVCC catalog scans, it's not safe to update any
> system catalog tuple without an AccessExclusiveLock on some locktag
> that will prevent concurrent catalog scans for that tuple.  Under
> SnapshotNow semantics, concurrent readers can fail to see that the
> object is present at all, leading to mysterious failures - especially
> if some of the object's catalog scans are seen and others are missed.

So what I'm saying above is to take AccessExclusiveLock for swapping the relfile in the catalog.  This doesn't violate your statement, I suppose.  I'm actually still skeptical about MVCC catalogs, because even if you can make catalog lookups MVCC, the relfile on the filesystem is not MVCC.  If session 1 changes the relfilenode in pg_class, commits the transaction, and deletes the old relfile from the filesystem, another concurrent session 2 that took a snapshot just before session 1 made the change keeps running and tries to open this relation, grabbing the old relfilenode and opening it from the filesystem -- ERROR: relfile not found.  So everyone actually needs to see up-to-date information that is synchronized with what the filesystem says, and that's SnapshotNow.  In my experimental thought above about the compare-and-swap approach, the compare phase needs to see the most recent valid information, otherwise it never notices that someone else did something new.

Since I haven't read the whole thread, maybe this has already been discussed, but it would help if you could clarify this concern.

Thanks,

-- 
Hitoshi Harada
Re: [HACKERS] refresh materialized view concurrently
On Wed, Jun 26, 2013 at 1:38 AM, Kevin Grittner kgri...@ymail.com wrote:
> Hitoshi Harada umi.tan...@gmail.com wrote:
>> I spent a few hours to review the patch.
>
> Thanks!
>
>> As far as I can tell, the overall approach is as follows.
>>
>> - create a new temp heap as non-concurrent does, but with
>>   ExclusiveLock on the matview, so that readers wouldn't be blocked
>
> Non-concurrent creates the heap in the matview's tablespace and
> namespace, so the "temp" part is different in concurrent generation.
> This difference is why concurrent can be faster when few rows change.

It's still not clear to me why you need temp in concurrent and not in non-concurrent.  If this type of operation always creates a temp table and just swaps it with the existing one, why can't we just make it temp always?  And if performance is the only concern, is temp better than just turning off WAL for the table, or an UNLOGGED table?

> Also, before the next step there is an ANALYZE of the temp table, so
> the planner can make good choices in the next step.
>
>> - with this temp table and the matview, query FULL JOIN and extract
>>   the difference between the original matview and the temp heap
>>   (via SPI)
>
> Right; into another temp table.
>
>> - this operation requires a unique index for performance reasons (or
>>   correctness reasons too)
>
> It is primarily for correctness in the face of duplicate rows which
> have no nulls.  Do you think the reasons need to be better documented
> with comments?

Ah, yes; even after looking at the patch I was confused about whether it was for performance or correctness.  It's a shame we cannot refresh it concurrently if we have duplicate rows in the matview.  I thought it would make sense to allow it without a unique key if it were only a performance tradeoff.

> I also modified the confusing error message to something close to the
> suggestion from Robert.  What to do about the Assert that the matview
> is not a system relation seems like material for a separate patch.
> After review, I'm inclined to remove the test altogether, so that
> extensions can create matviews in pg_catalog.

I like this better.

> New version attached.

Will take another look.

Thanks,

-- 
Hitoshi Harada
Re: [HACKERS] refresh materialized view concurrently
On 2013-06-27 00:12:07 -0700, Hitoshi Harada wrote:
>> Two, until we get MVCC catalog scans, it's not safe to update any
>> system catalog tuple without an AccessExclusiveLock on some locktag
>> that will prevent concurrent catalog scans for that tuple.  Under
>> SnapshotNow semantics, concurrent readers can fail to see that the
>> object is present at all, leading to mysterious failures - especially
>> if some of the object's catalog scans are seen and others are missed.
>
> So what I'm saying above is to take AccessExclusiveLock for swapping
> the relfile in the catalog.  This doesn't violate your statement, I
> suppose.  I'm actually still skeptical about MVCC catalogs, because
> even if you can make catalog lookups MVCC, the relfile on the
> filesystem is not MVCC.  If session 1 changes the relfilenode in
> pg_class, commits the transaction, and deletes the old relfile from
> the filesystem, another concurrent session 2 that took a snapshot just
> before session 1 made the change keeps running and tries to open this
> relation, grabbing the old relfilenode and opening it from the
> filesystem -- ERROR: relfile not found.

We can play cute tricks akin to what CREATE INDEX CONCURRENTLY currently does, i.e. wait for all other transactions that could have possibly seen the old relfilenode (they must have at least a share lock on the relation) before dropping the actual storage.  The reason we cannot currently do that in most scenarios is that we cannot perform transactional/MVCC updates of non-exclusively-locked objects, due to the SnapshotNow problem of seeing multiple or no versions of a row during a single scan.

Greetings,

Andres Freund

-- 
Andres Freund	http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] refresh materialized view concurrently
Andres Freund and...@2ndquadrant.com wrote:
> We can play cute tricks akin to what CREATE INDEX CONCURRENTLY
> currently does, i.e. wait for all other transactions that could have
> possibly seen the old relfilenode (they must have at least a share
> lock on the relation) before dropping the actual storage.

Not only would that be slower than the submitted patch in cases where only a few rows differ, but it could be waiting to swap in that new heap for an unbounded amount of time.  I think the current patch will play nicer with incremental maintenance than what you suggest here.

-- 
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] refresh materialized view concurrently
Hitoshi Harada umi.tan...@gmail.com wrote: Kevin Grittner kgri...@ymail.com wrote: Hitoshi Harada umi.tan...@gmail.com wrote: As far as I can tell, the overall approach is as follows. - create a new temp heap as non-concurrent does, but with ExclusiveLock on the matview, so that reader wouldn't be blocked Non-concurrent creates the heap in the matview's tablespace and namespace, so the temp part is different in concurrent generation. This difference is why concurrent can be faster when few rows change. It's still not clear to me why you need temp in concurrent and not in non-concurrent. Well, temp tables can be in an entirely different tablespace, so you can't just move the heap of a temp table into place as the new heap for the matview (as we do for a non-concurrent REFRESH) -- at least not without potentially copying everything an extra time. For concurrent we are modifying the existing matview heap, and the freshly generated set of values, as well as the diff table, are just needed, well, temporarily. That makes the behavior of temp tables ideal. Not only are they not logged, they are potentially placed on faster tablespaces, and the data written to them might never be written to disk: http://www.postgresql.org/docs/9.2/interactive/runtime-config-resource.html#GUC-TEMP-BUFFERS If this type of operations is always creating temp table and just swap it with existing one, why can't we just make it temp always? Because of the potentially differrent tablespaces. And if the performance is the only concern, is temp better than just turning off WAL for the table or UNLOGGED table? Yes, it is. - this operation requires unique index for performance reason (or correctness reason too) It is primarily for correctness in the face of duplicate rows which have no nulls. Do you think the reasons need to be better documented with comments? Ah, yes, even after looking at patch I was confused if it was for performance or correctness. 
This is the part of the function's comment which attempts to explain the problem. * This join cannot work if there are any * duplicated rows in either the old or new versions, in the sense that every * column would compare as equal between the two rows. It does work correctly * in the face of rows which have at least one NULL value, with all non-NULL * columns equal. The behavior of NULLs on equality tests and on UNIQUE * indexes turns out to be quite convenient here; the tests we need to make * are consistent with default behavior. If there is at least one UNIQUE * index on the materialized view, we have exactly the guarantee we need. By * joining based on equality on all columns which are part of any unique * index, we identify the rows on which we can use UPDATE without any problem. * If any column is NULL in either the old or new version of a row (or both), * we must use DELETE and INSERT, since there could be multiple rows which are * NOT DISTINCT FROM each other, and we could otherwise end up with the wrong * number of occurrences in the updated relation. I'm open to suggestions on better wording. As an example of the way the full join gets into trouble with duplicate rows when used for a diff, see this example: test=# create table old (c1 int, c2 int); CREATE TABLE test=# create table new (c1 int, c2 int); CREATE TABLE test=# insert into old values test-# (1,1),(1,2),(1,2),(1,2),(1,3),(1,null),(1,null); INSERT 0 7 test=# insert into new values test-# (1,1),(1,2),(1,2),(1,2),(1,2),(1,2),(1,4),(1,null),(1,null),(1,null); INSERT 0 10 At this point it is clear that we need to add two rows with values (1,2) and we need to wind up with one more row with values (1,null) than we already have. We also need to delete (1,3) and add (1,4). 
But full join logic fails to get things right for the case of duplicate rows with no nulls:

test=# select old, new
test-# from old
test-# full join new on old.c1 = new.c1 and old.c2 = new.c2
test-# where (old.*) is distinct from (new.*);
  old  |  new
-------+-------
 (1,3) |
 (1,)  |
 (1,)  |
       | (1,4)
       | (1,)
       | (1,)
       | (1,)
(7 rows)

It's a shame we cannot refresh it concurrently if we have duplicate rows in the matview. I thought it would make sense to allow it without a unique key if it was only a performance tradeoff. Well, we *could* allow it without a unique index, but the code would be more complex and significantly slower. I think we would still want to handle it the way the current patch does when a unique index is present, even if we have a way to handle cases where such an index is not present. Even if I were convinced it was worthwhile to support the more general case, I would want to commit this patch first, and add the more complicated code as a follow-on patch. At this point I'm not convinced of the value of supporting concurrent refresh of materialized views which contain duplicate rows, nor of the benefit of allowing it to work ten
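For contrast, here is a rough sketch (my own illustration, not code from the patch) of the kind of set-based merge that becomes safe once a unique index guarantees there are no fully-duplicated rows. Plain tables stand in for the matview heap and the freshly generated temp heap, since SQL-level DML on a matview is not allowed, and all names here are invented:

```sql
-- cur stands in for the matview heap, fresh for the newly generated data;
-- the UNIQUE constraint on k is what makes one-to-one matching possible.
CREATE TABLE cur   (k int PRIMARY KEY, v int);
CREATE TABLE fresh (k int PRIMARY KEY, v int);
INSERT INTO cur   VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO fresh VALUES (1, 10), (2, 25), (4, 40);

-- Delete rows whose key is gone from the new data: removes (3,30).
DELETE FROM cur
WHERE NOT EXISTS (SELECT 1 FROM fresh f WHERE f.k = cur.k);

-- Update rows whose key survives but whose payload changed: (2,20) becomes (2,25).
UPDATE cur
SET v = f.v
FROM fresh f
WHERE cur.k = f.k AND cur.v IS DISTINCT FROM f.v;

-- Insert rows with keys not previously present: adds (4,40).
INSERT INTO cur
SELECT * FROM fresh f
WHERE NOT EXISTS (SELECT 1 FROM cur c WHERE c.k = f.k);
```

With duplicate keyless rows, as in the old/new example above, none of these three statements can be written to produce the right multiplicities, which is the point Kevin is making.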
Re: [HACKERS] refresh materialized view concurrently
On Fri, Jun 21, 2013 at 5:20 AM, Hitoshi Harada umi.tan...@gmail.com wrote: If I don't miss something, the requirement for the CONCURRENTLY option is to allow simple SELECT readers to read the matview concurrently while the view is populating the new data, while INSERT/UPDATE/DELETE and SELECT FOR UPDATE/SHARE are still blocked. So, I wonder why it is not possible just to acquire ExclusiveLock on the matview while populating the data and swap the relfile by taking a small AccessExclusiveLock. This lock escalation is no deadlock hazard, I suppose, because concurrent operations would block each other at the point ExclusiveLock is acquired, and ExclusiveLock conflicts with AccessExclusiveLock. Then you don't need the complicated SPI logic or the unique key index dependency. This is no good. One, all lock upgrades are deadlock hazards. In this case, that plays out as follows: suppose that the session running REFRESH MATERIALIZED VIEW CONCURRENTLY also holds a lock on something else. Some other process takes an AccessShareLock on the materialized view and then tries to take a conflicting lock on the other object. Kaboom, deadlock. Granted, the chances of that happening in practice are small, but it IS the reason why we typically try to avoid having long-running operations perform lock upgrades. Users get really annoyed when their DDL runs for an hour and then rolls back. Two, until we get MVCC catalog scans, it's not safe to update any system catalog tuple without an AccessExclusiveLock on some locktag that will prevent concurrent catalog scans for that tuple. Under SnapshotNow semantics, concurrent readers can fail to see that the object is present at all, leading to mysterious failures - especially if some of the object's catalog entries are seen and others are missed. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
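To make the lock-upgrade hazard concrete, here is one possible interleaving, with hypothetical session and object names (this assumes a REFRESH implementation that upgrades from ExclusiveLock to AccessExclusiveLock for the heap swap, which is exactly what the patch avoids):

```sql
-- Session 1: hypothetical lock-upgrading REFRESH, which also holds a lock on table t
--   BEGIN;
--   LOCK TABLE t IN SHARE MODE;
--   REFRESH MATERIALIZED VIEW mv;   -- takes ExclusiveLock on mv, starts populating

-- Session 2:
--   BEGIN;
--   SELECT count(*) FROM mv;                -- takes AccessShareLock on mv
--   LOCK TABLE t IN ACCESS EXCLUSIVE MODE;  -- blocks behind session 1's lock on t

-- Session 1 (continuing):
--   ...tries to upgrade to AccessExclusiveLock on mv to swap the relfilenode,
--   which conflicts with session 2's AccessShareLock on mv.  Each session now
--   waits for the other: deadlock, and the hour-long refresh rolls back.
```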
Re: [HACKERS] refresh materialized view concurrently
On Fri, Jun 14, 2013 at 9:05 AM, Kevin Grittner kgri...@ymail.com wrote: Attached is a patch for REFRESH MATERIALIZED VIEW CONCURRENTLY for 9.4 CF1. The goal of this patch is to allow a refresh without interfering with concurrent reads, using transactional semantics. I spent a few hours to review the patch. As far as I can tell, the overall approach is as follows. - create a new temp heap as non-concurrent does, but with ExclusiveLock on the matview, so that readers wouldn't be blocked - with this temp table and the matview, query FULL JOIN and extract the difference between the original matview and the temp heap (via SPI) - this operation requires a unique index for performance reasons (or correctness reasons too) - run ANALYZE on this diff table - run UPDATE, INSERT and DELETE via SPI, to do the merge - close these temp heaps If I don't miss something, the requirement for the CONCURRENTLY option is to allow simple SELECT readers to read the matview concurrently while the view is populating the new data, while INSERT/UPDATE/DELETE and SELECT FOR UPDATE/SHARE are still blocked. So, I wonder why it is not possible just to acquire ExclusiveLock on the matview while populating the data and swap the relfile by taking a small AccessExclusiveLock. This lock escalation is no deadlock hazard, I suppose, because concurrent operations would block each other at the point ExclusiveLock is acquired, and ExclusiveLock conflicts with AccessExclusiveLock. Then you don't need the complicated SPI logic or the unique key index dependency. Assuming I'm asking something wrong and going for the current approach, here's what I've found in the patch. - I'm not sure if the unique key index requirement is reasonable or not, because users only add the CONCURRENTLY option and are not asking for a merge or an incremental update. - This could overflow the diffname buffer. 
+ quoteRelationName(tempname, tempRel);
+ strcpy(diffname, tempname);
+ strcpy(diffname + strlen(diffname) - 1, "_2\"");

- As others pointed out, quoteOneName can be replaced with quote_identifier - This looks harmless, but I wonder if you need to change relkind.

*** 665,672 **** make_new_heap(Oid OIDOldHeap, Oid NewTableSpace)
      OldHeap->rd_rel->relowner,
      OldHeapDesc,
      NIL,
!     OldHeap->rd_rel->relkind,
!     OldHeap->rd_rel->relpersistence,
      false,
      RelationIsMapped(OldHeap),
      true,
--- 680,687 ----
      OldHeap->rd_rel->relowner,
      OldHeapDesc,
      NIL,
!     RELKIND_RELATION,
!     relpersistence,
      false,
      RelationIsMapped(OldHeap),
      true,

Since OldHeap->rd_rel->relkind has been working with 'm', too, not only 'r'? - I found the two additional parameters on make_new_heap ugly, but couldn't come up with a better solution. Maybe we can pass the Relation of the old heap to the function instead of the Oid.. That's about it from me. Thanks, -- Hitoshi Harada
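The FULL JOIN diff step summarized in the review above is conceptually similar to the following (table and column names are illustrative only; the actual patch assembles the query string internally, quoting identifiers, and runs it via SPI):

```sql
-- mv holds the current matview contents; newdata is the freshly built temp heap.
-- The join condition is equality on every column belonging to any unique index
-- (here just column a); fully identical matched pairs are filtered out, so the
-- diff contains only rows that need some kind of change.
CREATE TEMP TABLE diff AS
SELECT mv.ctid AS tid, newdata.*
FROM mv
FULL JOIN newdata ON mv.a = newdata.a
WHERE (mv.*) IS DISTINCT FROM (newdata.*);
-- tid non-NULL, new columns present   => UPDATE the matview row at tid in place
-- tid non-NULL, new side entirely NULL => DELETE the matview row at tid
-- tid NULL                             => INSERT the new row
```

Joining only on the unique-index columns is what lets matched-but-changed rows become UPDATEs, while the unique index guarantees the match is one-to-one.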
Re: [HACKERS] refresh materialized view concurrently
On Fri, Jun 21, 2013 at 2:20 AM, Hitoshi Harada umi.tan...@gmail.com wrote: On Fri, Jun 14, 2013 at 9:05 AM, Kevin Grittner kgri...@ymail.com wrote: Attached is a patch for REFRESH MATERIALIZED VIEW CONCURRENTLY for 9.4 CF1. The goal of this patch is to allow a refresh without interfering with concurrent reads, using transactional semantics. I spent a few hours to review the patch. Oh, BTW, though it is not part of this patch, I came across this.

! /*
!  * We're not using materialized views in the system catalogs.
!  */
  Assert(!IsSystemRelation(matviewRel));

Of course we don't have a builtin matview on a system catalog, but it is possible to create such a thing with allow_system_table_mods=true, so the Assert doesn't look good to me. Thanks, -- Hitoshi Harada
Re: [HACKERS] refresh materialized view concurrently
On 2013-06-21 02:43:23 -0700, Hitoshi Harada wrote: On Fri, Jun 21, 2013 at 2:20 AM, Hitoshi Harada umi.tan...@gmail.com wrote: On Fri, Jun 14, 2013 at 9:05 AM, Kevin Grittner kgri...@ymail.com wrote: Attached is a patch for REFRESH MATERIALIZED VIEW CONCURRENTLY for 9.4 CF1. The goal of this patch is to allow a refresh without interfering with concurrent reads, using transactional semantics. I spent a few hours to review the patch. Oh, BTW, though it is not part of this patch, I came across this.

! /*
!  * We're not using materialized views in the system catalogs.
!  */
  Assert(!IsSystemRelation(matviewRel));

Of course we don't have a builtin matview on a system catalog, but it is possible to create such a thing with allow_system_table_mods=true, so the Assert doesn't look good to me. Anybody using allow_system_table_mods gets to keep the pieces. There are so many ways to break just about everything using it that I don't think worrying much makes sense. If you want you could replace that by an elog(), but... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] refresh materialized view concurrently
On 17 June 2013 00:43, Kevin Grittner kgri...@ymail.com wrote: Especially when one is known to be better than the other already. What is the hypothetical technique you're arguing is inferior? For my own part, I haven't gotten beyond the phase of knowing that to meet all requests for the feature, it would need to be available at about the same point that AFTER EACH STATEMENT triggers fire, but that it should not involve any user-written triggers. Have you implemented something similar to what you think I might be considering? Do you have benchmark results? Can you share details? Recording the changeset required by replication is known to be more efficient using WAL-based extraction than using triggers. WAL writes are effectively free, and using WAL concentrates the reads to avoid random I/O in large databases. That would be the most suitable approach for continuously updated matviews, or frequent updates. Extraction using multiple snapshots is also possible, using a technique similar to the CONCURRENTLY mechanism. That would require re-scanning the whole table, which might be overkill depending upon the number of changes. That would work for reasonably infrequent updates. Given that we also want to do concurrent CLUSTER and ALTER TABLE ... SET TABLESPACE using changeset extraction I think it's time that discussion happened on hackers. No objections to that here; but please don't hijack this thread for that discussion. There are multiple features all requiring efficient change set extraction. It seems extremely relevant to begin discussing what that mechanism might be in each case, so we don't develop 2 or even 3 different ones while everybody ignores each other. As you said, we should be helping each other and working together, and I agree. 
-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] refresh materialized view concurrently
On 14.06.2013 19:05, Kevin Grittner wrote: Attached is a patch for REFRESH MATERIALIZED VIEW CONCURRENTLY for 9.4 CF1. The goal of this patch is to allow a refresh without interfering with concurrent reads, using transactional semantics. It is my hope to get this committed during this CF to allow me to focus on incremental maintenance for the rest of the release cycle. I must say this seems a bit pointless on its own. But if it's a stepping stone to incremental maintenance, I have no objections. I didn't need to touch very much outside of matview-specific files for this. My biggest concern is that I needed two small functions which did *exactly* what some static functions in ri_triggers.c were doing and couldn't see where the best place to share them from was. For the moment I just duplicated them, but my hope would be that they could be put in a suitable location and called from both places, rather than duplicating the 30-some lines of code. The function signatures are: void quoteOneName(char *buffer, const char *name) void quoteRelationName(char *buffer, Relation rel) I'd just use quote_identifier and quote_qualified_identifier instead. I didn't understand this error message:

+ if (!foundUniqueIndex)
+     ereport(ERROR,
+             (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+              errmsg("concurrent refresh requires a unique index on just columns for all rows of the materialized view")));
+

What does that mean? - Heikki
Re: [HACKERS] refresh materialized view concurrently
Simon Riggs si...@2ndquadrant.com wrote: There are multiple features all requiring efficient change set extraction. It seems extremely relevant to begin discussing what that mechanism might be in each case Changeset extraction has nothing to do with this patch, and cannot possibly be useful for it. Please keep discussion which is completely unrelated to this patch off this thread. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] refresh materialized view concurrently
On 17 June 2013 12:13, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 14.06.2013 19:05, Kevin Grittner wrote: Attached is a patch for REFRESH MATERIALIZED VIEW CONCURRENTLY for 9.4 CF1. The goal of this patch is to allow a refresh without interfering with concurrent reads, using transactional semantics. It is my hope to get this committed during this CF to allow me to focus on incremental maintenance for the rest of the release cycle. I must say this seems a bit pointless on its own. But if it's a stepping stone to incremental maintenance, I have no objections. There are generally 4 kinds of mat view:
1. Transactionally updated
2. Incremental update, eventually consistent
3. Incremental update, regular refresh
4. Full refresh
At the moment we only have type 4 and it holds a full lock while it runs. We definitely need a CONCURRENTLY option and this is it. Implementing the other types won't invalidate what we currently have, so this makes sense to me. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] refresh materialized view concurrently
On 17 June 2013 13:15, Kevin Grittner kgri...@ymail.com wrote: Simon Riggs si...@2ndquadrant.com wrote: There are multiple features all requiring efficient change set extraction. It seems extremely relevant to begin discussing what that mechanism might be in each case Changeset extraction has nothing to do with this patch, and cannot possibly be useful for it. Please keep discussion which is completely unrelated to this patch off this thread. Kevin, You mentioned incremental maintenance in your original post and I have been discussing it. Had you not mentioned it, I doubt I would have thought of it. But since you did mention it, and it's clearly an important issue, it seems relevant to have discussed it here and now. I'm happy to wait for you to start the thread discussing it directly though. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] refresh materialized view concurrently
Heikki Linnakangas hlinnakan...@vmware.com wrote: On 14.06.2013 19:05, Kevin Grittner wrote: Attached is a patch for REFRESH MATERIALIZED VIEW CONCURRENTLY for 9.4 CF1. The goal of this patch is to allow a refresh without interfering with concurrent reads, using transactional semantics. It is my hope to get this committed during this CF to allow me to focus on incremental maintenance for the rest of the release cycle. I must say this seems a bit pointless on its own. I completely disagree. When I read what people were posting about the materialized view creation that went into 9.3, there were many comments by people that they can't use it until the materialized views can be refreshed without blocking readers. There is a clear need for this. It doesn't do much to advance incremental maintenance, but it is a much smaller patch which will make matviews usable by a lot of people who can't use the initial feature set. I didn't understand this error message:

+ if (!foundUniqueIndex)
+     ereport(ERROR,
+             (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+              errmsg("concurrent refresh requires a unique index on just columns for all rows of the materialized view")));
+

What does that mean? It means that the REFRESH MATERIALIZED VIEW CONCURRENTLY command cannot be used on a materialized view unless it has at least one UNIQUE index which is not partial (i.e., there is no WHERE clause) and is not indexing on an expression (i.e., the index is entirely on bare column names). Set logic to do the diff is hard to get right if the tables are not proper sets (i.e., they contain duplicate rows). I can see at least three ways it *could* be done, but all of them are much more complex and significantly slower. With a UNIQUE index on some set of columns in all rows the correct guarantees exist to use fast set logic. It isn't that it's needed for access; it is needed to provide a guarantee that there is no row without NULLs that exactly duplicates another row. 
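As a concrete illustration of the requirement Kevin describes (object names here are invented): of the three unique indexes below, only the first would satisfy the check, because the second indexes an expression and the third is partial:

```sql
CREATE TABLE foo (a int, b text);
CREATE MATERIALIZED VIEW mv AS SELECT a, b FROM foo;

CREATE UNIQUE INDEX mv_plain ON mv (a);                -- qualifies: bare columns, no WHERE clause
CREATE UNIQUE INDEX mv_expr  ON mv ((lower(b)));       -- excluded: expression index
CREATE UNIQUE INDEX mv_part  ON mv (a) WHERE a > 0;    -- excluded: partial index
```

An expression or partial index cannot serve here because it does not guarantee uniqueness of bare column values across all rows of the view, which is exactly the guarantee the diff logic relies on.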
-- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] refresh materialized view concurrently
Simon Riggs si...@2ndquadrant.com wrote: Kevin Grittner kgri...@ymail.com wrote: Changeset extraction has nothing to do with this patch, and cannot possibly be useful for it. Please keep discussion which is completely unrelated to this patch off this thread. You mentioned incremental maintenance in your original post and I have been discussing it. Had you not mentioned it, I doubt I would have thought of it. But since you did mention it, and it's clearly an important issue, it seems relevant to have discussed it here and now. What I said was that I wanted to get this out of the way before I started working on incremental maintenance. I'm happy to wait for you to start the thread discussing it directly though. Cool. -Kevin -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] refresh materialized view concurrently
2013/6/17 Heikki Linnakangas hlinnakan...@vmware.com: + errmsg("concurrent refresh requires a unique index on just columns for all rows of the materialized view"))); Maybe my English is failing me here, but I don’t understand the “just” part. Nicolas -- A. Because it breaks the logical sequence of discussion. Q. Why is top posting bad?
Re: [HACKERS] refresh materialized view concurrently
Nicolas Barbier nicolas.barb...@gmail.com wrote: 2013/6/17 Heikki Linnakangas hlinnakan...@vmware.com: + errmsg("concurrent refresh requires a unique index on just columns for all rows of the materialized view"))); Maybe my English is failing me here, but I don’t understand the “just” part. It means that the index must not use any expressions in the list of what it's indexing on -- only column names. Suggestions for better wording would be welcome. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] refresh materialized view concurrently
On Mon, Jun 17, 2013 at 11:21 AM, Kevin Grittner kgri...@ymail.com wrote: Nicolas Barbier nicolas.barb...@gmail.com wrote: 2013/6/17 Heikki Linnakangas hlinnakan...@vmware.com: + errmsg("concurrent refresh requires a unique index on just columns for all rows of the materialized view"))); Maybe my English is failing me here, but I don’t understand the “just” part. It means that the index must not use any expressions in the list of what it's indexing on -- only column names. Suggestions for better wording would be welcome. Random idea: ERROR: materialized view "%s" does not have a unique key Perhaps augmented with: HINT: Create a UNIQUE btree index with no WHERE clause on one or more columns of the materialized view. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] refresh materialized view concurrently
On 06/17/2013 04:13 AM, Heikki Linnakangas wrote: On 14.06.2013 19:05, Kevin Grittner wrote: Attached is a patch for REFRESH MATERIALIZED VIEW CONCURRENTLY for 9.4 CF1. The goal of this patch is to allow a refresh without interfering with concurrent reads, using transactional semantics. It is my hope to get this committed during this CF to allow me to focus on incremental maintenance for the rest of the release cycle. I must say this seems a bit pointless on its own. But if it's a stepping stone to incremental maintenance, I have no objections. Actually, CONCURRENTLY was cited as the #1 deficiency for the matview feature for 9.3. With it, the use-case for Postgres matviews is broadened considerably. So it's very valuable on its own, even if (for example) INCREMENTAL didn't get done for 9.3. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] refresh materialized view concurrently
On 14 June 2013 17:05, Kevin Grittner kgri...@ymail.com wrote: Attached is a patch for REFRESH MATERIALIZED VIEW CONCURRENTLY for 9.4 CF1. The goal of this patch is to allow a refresh without interfering with concurrent reads, using transactional semantics. Is there a reason to keep the non-concurrent behaviour? Anybody that wants to explicitly lock should just run a LOCK statement. Can't we treat behaviour when fully locked as an optimisation, so we can just do the right thing without extra thought and keywords? It is my hope to get this committed during this CF to allow me to focus on incremental maintenance for the rest of the release cycle. Incremental maintenance will be very straightforward using the logical changeset extraction code Andres is working on. Having two parallel mechanisms for changeset extraction in one release seems like a waste of time. Especially when one is known to be better than the other already. Given that we also want to do concurrent CLUSTER and ALTER TABLE ... SET TABLESPACE using changeset extraction I think it's time that discussion happened on hackers. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] refresh materialized view concurrently
Simon Riggs si...@2ndquadrant.com wrote: On 14 June 2013 17:05, Kevin Grittner kgri...@ymail.com wrote: Attached is a patch for REFRESH MATERIALIZED VIEW CONCURRENTLY for 9.4 CF1. The goal of this patch is to allow a refresh without interfering with concurrent reads, using transactional semantics. Is there a reason to keep the non-concurrent behaviour? Yes. For initial population, truncation, and replacement of contents when more than a small percentage of rows are affected. Anybody that wants to explicitly lock should just run a LOCK statement. Can't we treat behaviour when fully locked as an optimisation, so we can just do the right thing without extra thought and keywords? Are you suggesting that we use heap replacement or DML depending on what heavyweight locks are held when the statement is executed? It is my hope to get this committed during this CF to allow me to focus on incremental maintenance for the rest of the release cycle. Incremental maintenance will be very straightforward using the logical changeset extraction code Andres is working on. At most, changeset extraction will help with obtaining the initial delta for the base relations, which is less than 5% of what needs doing for incremental maintenance of materialized views. If it looks like a good fit, of course I'll use it. Having two parallel mechanisms for changeset extraction in one release seems like a waste of time. I haven't looked in depth at what technique to use for capturing the base relation deltas. The changeset extraction technique is something to consider, for sure. I have a lot of work left to see whether it works for this. In particular, to handle all requested timings, it would need to have low enough latency to provide a delta during the completion of each DML statement, to support requests for eager maintenance of a materialized view -- where the transaction which just changed the base relation would see the effect if they queried the matview. 
That may not be something to tackle in this release, but there are people who want it, and I would prefer to pick a technique which doesn't have a latency high enough to make that impractical. That's not to say that I know that to be a problem for using the changeset extraction technique for this -- just that I haven't gotten to the point of evaluating that. Especially when one is known to be better than the other already. What is the hypothetical technique you're arguing is inferior? For my own part, I haven't gotten beyond the phase of knowing that to meet all requests for the feature, it would need to be available at about the same point that AFTER EACH STATEMENT triggers fire, but that it should not involve any user-written triggers. Have you implemented something similar to what you think I might be considering? Do you have benchmark results? Can you share details? Given that we also want to do concurrent CLUSTER and ALTER TABLE ... SET TABLESPACE using changeset extraction I think it's time that discussion happened on hackers. No objections to that here; but please don't hijack this thread for that discussion. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company