Re: [HACKERS] 9.2.3 crashes during archive recovery
On 05.03.2013 14:09, KONDO Mitsumasa wrote: Hi, Horiguch's patch does not seem to record minRecoveryPoint in ReadRecord(); Attempt patch records minRecoveryPoint. [crash recovery -> record minRecoveryPoint in control file -> archive recovery] I think that this is an original intention of Heikki's patch.

Yeah. That fix isn't right, though; XLogPageRead() is supposed to return true on success, and false on error, and the patch makes it return 'true' on error, if archive recovery was requested but we're still in crash recovery. The real issue here is that I missed the two "return NULL;"s in ReadRecord(), so the code that I put in the next_record_is_invalid codepath isn't run if XLogPageRead() doesn't find the file at all. Attached patch is the proper fix for this.

I also found a bug in latest 9.2_stable. It does not get latest timeline and recovery history file in archive recovery when master and standby timeline is different.

Works for me.. Can you create a test script for that? Remember to set recovery_target_timeline='latest'.

- Heikki

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 92adc4e..74a54f6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4010,7 +4010,7 @@ ReadRecord(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt)
 retry:
     /* Read the page containing the record */
     if (!XLogPageRead(RecPtr, emode, fetching_ckpt, randAccess))
-        return NULL;
+        goto next_record_is_invalid;

     pageHeaderSize = XLogPageHeaderSize((XLogPageHeader) readBuf);
     targetRecOff = RecPtr->xrecoff % XLOG_BLCKSZ;
@@ -4168,7 +4168,7 @@ retry:
         }

         /* Wait for the next page to become available */
         if (!XLogPageRead(&pagelsn, emode, false, false))
-            return NULL;
+            goto next_record_is_invalid;

         /* Check that the continuation record looks valid */
         if (!(((XLogPageHeader) readBuf)->xlp_info & XLP_FIRST_IS_CONTRECORD))
Re: [HACKERS] Materialized views WIP patch
On 5 March 2013 22:02, Tom Lane t...@sss.pgh.pa.us wrote: FWIW, my opinion is that doing anything like this in the planner is going to be enormously expensive. Index matching is already pretty expensive, and that has the saving grace that we only do it once per base relation. Your sketch above implies trying to match to MVs once per considered join relation, which will be combinatorially worse. Even with a lot of sweat over reducing the cost of the matching, it will hurt.

As we already said: no MVs = zero overhead = no problem. It costs in the cases where time savings are possible and not otherwise. The type of queries and their typical run times are different to the OLTP case, so any additional planning time is likely to be acceptable.

I'm sure we can have a deep discussion about how to optimise the planner for this; I'm pretty sure there are reasonable answers to the not-small difficulties along that road. Most importantly, I want to make sure we don't swing the door shut on improvements here, especially if you (Tom) are not personally convinced enough to do the work yourself.

Making transactional MVs work would be in part justified by the possible existence of automatic lookaside planning, so yes, the two things are not linked but certainly closely related.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Enabling Checksums
On 5 March 2013 09:35, Heikki Linnakangas hlinnakan...@vmware.com wrote: Are there objectors? In addition to my hostility towards this patch in general, there are some specifics in the patch I'd like to raise (read out in a grumpy voice): ;-) We all want to make the right choice here, so all viewpoints gratefully received so we can decide. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On 5 March 2013 18:02, Jeff Davis pg...@j-davis.com wrote: Fletcher is probably significantly faster than CRC-16, because I'm just doing int32 addition in a tight loop. Simon originally chose Fletcher, so perhaps he has more to say. IIRC the research showed Fletcher was significantly faster for only a small loss in error detection rate. It was sufficient to make our error detection 1 million times better, possibly more. That seems sufficient to enable early detection of problems, since if we missed the first error, a second is very likely to be caught (etc). So I am assuming that we're trying to catch a pattern of errors early, rather than guarantee we can catch the very first error. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support for REINDEX CONCURRENTLY
On 2013-03-06 13:21:27 +0900, Michael Paquier wrote: Please find attached updated patch realigned with your comments. You can find my answers inline... The only thing that needs clarification is the comment about UNIQUE_CHECK_YES/UNIQUE_CHECK_NO. Except that all the other things are corrected or adapted to what you wanted. I am also including now tests for matviews.

On Wed, Mar 6, 2013 at 1:49 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-03-05 22:35:16 +0900, Michael Paquier wrote:

+ for (count = 0; count < num_indexes; count++)
+     index_insert(toastidxs[count], t_values, t_isnull,
+                  &(toasttup->t_self),
+                  toastrel,
+                  toastidxs[count]->rd_index->indisunique ?
+                  UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);

The indisunique check looks like a copy pasto to me, albeit not yours...

Yes it is the same for all the indexes normally, but it looks more solid to me to do that as it is. So unchanged.

Hm, if the toast indexes aren't unique anymore loads of stuff would be broken. Anyway, not your fault.

I definitely cannot understand where you are going here. Could you be more explicit? Why could this be a problem? Without my patch a similar check is used for toast indexes.

There's no problem. I just dislike the pointless check which caters for a situation that doesn't exist... Forget it, sorry.

+ if (count == 0)
+     snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
+              OIDOldHeap);
+ else
+     snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index_cct%d",
+              OIDOldHeap, count);
+ RenameRelationInternal(lfirst_oid(lc),
+                        NewToastName);
+ count++;
+ }

Hm. It seems wrong that this layer needs to know about _cct.

Any other idea? For the time being I removed cct and added only a suffix based on the index number...

Hm. It seems like throwing an error would be sufficient, that path is only entered for shared catalogs, right? Having multiple toast indexes would be a bug.

Don't think so. Even if now those APIs are used only for catalog tables, I do not believe that this function has been designed to be used only with shared catalogs. Removing the cct suffix makes sense though...

Forget what I said.

+ /*
+  * Index is considered as a constraint if it is PRIMARY KEY or EXCLUSION.
+  */
+ isconstraint = indexRelation->rd_index->indisprimary ||
+     indexRelation->rd_index->indisexclusion;

unique constraints aren't mattering here?

No they are not. Unique indexes are not counted as constraints in the case of index_create. Previous versions of the patch did that but there are issues with unique indexes using expressions.

Hm. index_create's comment says: * isconstraint: index is owned by PRIMARY KEY, UNIQUE, or EXCLUSION constraint. There are unique indexes that are constraints and some that are not. Looking at ->indisunique is not sufficient to determine whether its one or not.

Hum... OK. I changed that using a method based on get_index_constraint for a given index. So if the constraint Oid is invalid, it means that this index has no constraints and its concurrent entry won't create an index in consequence. It is more stable this way.

Sounds good. Just to make that clear:

To get a unique index without constraint:
CREATE TABLE table_u(id int, data int);
CREATE UNIQUE INDEX table_u__data ON table_u(data);

To get a constraint:
ALTER TABLE table_u ADD CONSTRAINT table_u__id_unique UNIQUE(id);

+ /*
+  * Phase 3 of REINDEX CONCURRENTLY
+  *
+  * During this phase the concurrent indexes catch up with the INSERT that
+  * might have occurred in the parent table and are marked as valid once done.
+  *
+  * We once again wait until no transaction can have the table open with
+  * the index marked as read-only for updates. Each index validation is done
+  * with a separate transaction to avoid opening transaction for an
+  * unnecessary too long time.
+  */

Maybe I am being dumb because I have the feeling I said differently in the past, but why do we not need a WaitForMultipleVirtualLocks() here? The comment seems to say we need to do so.

Yes you said the contrary in a previous review. The purpose of this function is to first gather the locks and then
Re: [HACKERS] WIP: index support for regexp search
On Wed, Jan 23, 2013 at 7:29 PM, Tom Lane t...@sss.pgh.pa.us wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: On 23.01.2013 09:36, Alexander Korotkov wrote: On Wed, Jan 23, 2013 at 6:08 AM, Tom Lanet...@sss.pgh.pa.us wrote: The biggest problem is that I really don't care for the idea of contrib/pg_trgm being this cozy with the innards of regex_t.

The only option I see now is to provide a method like export_cnfa which would export the corresponding CNFA in a fixed format.

Yeah, I think that makes sense. The transformation code in trgm_regexp.c would probably be more readable too, if it didn't have to deal with the regex guts representation of the CNFA. Also, once you have an intermediate representation of the original CNFA, you could do some of the transformation work on that representation, before building the transformed graph containing trigrams. You could eliminate any non-alphanumeric characters, joining states connected by arcs with non-alphanumeric characters, for example.

It's not just the CNFA though; the other big API problem is with mapping colors back to characters. Right now, that not only knows way too much about a part of the regex internals we have ambitions to change soon, but it also requires pg_wchar2mb_with_len() and lowerstr(), neither of which should be known to the regex library IMO. So I'm not sure how we divvy that up sanely. To be clear: I'm not going to insist that we have to have a clean API factorization before we commit this at all. But it worries me if we don't even know how we could get to that, because we are going to need it eventually.

Now, we probably don't have enough time before 9.3 to solve an API problem :(. It's likely we have to choose: either commit to 9.3 without clean API factorization, or postpone it to 9.4.

-- With best regards, Alexander Korotkov.
Re: [HACKERS] Writable foreign tables: how to identify rows
On 2013-03-05 19:30:53 -0500, Tom Lane wrote: One of the core problems for a writable-foreign-tables feature is how to identify a previously-fetched row for UPDATE or DELETE actions. In an ordinary Postgres table, we use the ctid system column for that, but a remote table doesn't necessarily have such a thing. ... For postgres_fdw, that would really be enough, since it could just cause a ctid column to be created with the usual definition. Then it could put the remote ctid into the usual t_self field in returned tuples. Supporting magic identifiers that aren't of the TID data type is considerably harder, mainly because it's not clear how heap_getsysattr() could know how to fetch the column value. I have some rough ideas about that, but I suggest that we might want to punt on that extension for the time being.

What about just making it a bytea in FDWs? Now it's not nice to waste space for a (probably 1-byte) bytea header, but it's not too bad either. The FDW author then needs to ensure only the correct data ends up in that system column.

Greetings, Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] [GENERAL] Floating point error
Maciek Sakrejda wrote: On Tue, Mar 5, 2013 at 12:03 AM, Albe Laurenz laurenz.a...@wien.gv.at wrote: I don't think that it is about looking nice. C doesn't promise you more than FLT_DIG or DBL_DIG digits of precision, so PostgreSQL cannot either. If you allow more, that would mean that if you store the same number on different platforms and query it, it might come out differently. Among other things, that would be a problem for the regression tests. Thank you: I think this is what I was missing, and what wasn't clear from the proposed doc patch. But then how can pg_dump assume that it's always safe to set extra_float_digits = 3? Why the discrepancy between default behavior and what pg_dump gets? It can't know whether the dump is to be restored into the same system or a different one (and AFAICT, there's not even an option to tweak extra_float_digits there). How about this elaboration? Yours, Laurenz Albe float-2.patch Description: float-2.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Writable foreign tables: how to identify rows
On Wed, Mar 6, 2013 at 12:35 PM, Pavan Deolasee pavan.deola...@gmail.com wrote: In the context of postgres_fdw, I am not sure if we need an additional system column like a node_id. Would there be a possibility where tuples to-be-modified are coming from different foreign tables and at runtime we need to decide where to execute the UPDATE/DELETE operation ? If we start supporting inheritance involving foreign tables as child tables, this will become a reality. Foreign tables have tableoid system column which holds pg_class.oid of the foreign table. It seems sufficient to determine source server. -- Shigeru HANADA -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
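A quick way to see what Hanada-san is referring to (the table names here are made up): tableoid is available on foreign-table scans just as on ordinary tables, so each row can be traced back to the foreign table it came from, and from there to the remote server via pg_foreign_table and pg_foreign_server.

    SELECT tableoid::regclass AS source_table, t.*
    FROM some_parent_table t;
    -- source_table shows which (foreign) child table each row came from;
    -- that table's pg_foreign_table entry then points at the remote server.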
[HACKERS] Materialized View patch broke pg_dump
It looks like the recent matview patch broke pg_dump in a way which makes it impossible to dump 9.1 and 9.2 databases. It fails with

pg_dump: [Archivierer (DB)] query failed: ERROR: function pg_relation_is_scannable(oid) does not exist

Looking into this issue, it seems the version check in getTables() of pg_dump.c is wrong. Shouldn't the check be

if (fout->remoteVersion >= 90300) { }

since this is where pg_relation_is_scannable() is introduced?

-- Thanks Bernd
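For context, the function the failing query calls simply does not exist before 9.3, so any dump query that references it unconditionally breaks against older servers. A rough illustration (the matview name is made up):

    -- On a 9.3 server this resolves normally:
    SELECT pg_relation_is_scannable('some_matview'::regclass);
    -- On a 9.1/9.2 server the same call fails exactly as reported:
    --   ERROR: function pg_relation_is_scannable(oid) does not exist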
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Wednesday, March 06, 2013 2:57 AM Heikki Linnakangas wrote: On 04.03.2013 06:39, Amit Kapila wrote: On Sunday, March 03, 2013 8:19 PM Craig Ringer wrote: On 02/05/2013 11:53 PM, Amit Kapila wrote: Performance data for the patch is attached with this mail. Conclusions from the readings (these are same as my previous patch):

I've been investigating the pglz option further, and doing performance comparisons of the pglz approach and this patch. I'll begin with some numbers:

unpatched (63d283ecd0bc5078594a64dfbae29276072cdf45):

                testname                 | wal_generated |     duration
-----------------------------------------+---------------+------------------
 two short fields, no change             |    1245525360 | 9.94613695144653
 two short fields, one changed           |    1245536528 | 10.146910905838
 two short fields, both changed          |    1245523160 | 11.2332470417023
 one short and one long field, no change |    1054926504 | 5.90477800369263
 ten tiny fields, all changed            |    1411774608 | 13.4536008834839
 hundred tiny fields, all changed        |     635739680 | 7.57448387145996
 hundred tiny fields, half changed       |     636930560 | 7.56888699531555
 hundred tiny fields, half nulled        |     573751120 | 6.68991994857788

Amit's wal_update_changes_v10.patch:

                testname                 | wal_generated |     duration
-----------------------------------------+---------------+------------------
 two short fields, no change             |    1249722112 | 13.0558869838715
 two short fields, one changed           |    1246145408 | 12.9947438240051
 two short fields, both changed          |    1245951056 | 13.0262880325317
 one short and one long field, no change |     678480664 | 5.70031690597534
 ten tiny fields, all changed            |    1328873920 | 20.0167419910431
 hundred tiny fields, all changed        |     638149416 | 14.4236788749695
 hundred tiny fields, half changed       |     635560504 | 14.8770561218262
 hundred tiny fields, half nulled        |     558468352 | 16.2437210083008

pglz-with-micro-optimizations-1.patch:

                testname                 | wal_generated |     duration
-----------------------------------------+---------------+------------------
 two short fields, no change             |    1245519008 | 11.6702048778534
 two short fields, one changed           |    1245756904 | 11.3233819007874
 two short fields, both changed          |    1249711088 | 11.6836447715759
 one short and one long field, no change |     664741392 | 6.44810795783997
 ten tiny fields, all changed            |    1328085568 | 13.9679481983185
 hundred tiny fields, all changed        |     635974088 | 9.15514206886292
 hundred tiny fields, half changed       |     636309040 | 9.13769292831421
 hundred tiny fields, half nulled        |     496396448 | 8.77351498603821

For some of the tests, it doesn't even execute the main part of compression/encoding. The reason is that the length of the tuple is less than the strategy min length, so it returns from the below check in function pglz_delta_encode():

    if (strategy->match_size_good <= 0 ||
        slen < strategy->min_input_size ||
        slen > strategy->max_input_size)
        return false;

The tests for which it doesn't execute encoding are below:
two short fields, no change
two short fields, one changed
two short fields, both changed
ten tiny fields, all changed

For the above cases, the reason for the difference in timings between both approaches and the original could be that this check is done after some processing. So I think if we check the length in log_heap_update, then there should not be a timing difference for the above test scenarios. I can check that once.

This optimization helps only when the tuple is of length 128~200 bytes and up to 1800 bytes (till it turns to toast), otherwise it could result in overhead without any major WAL reduction. In fact I think in one of my initial patches there is a check: if the length of the tuple is greater than 128 bytes then perform the optimization. I shall try to run both patches for cases when the length of the tuple is 128~200 bytes, as this optimization has benefits in those cases.
In each test, a table is created with a large number of identical rows, and fillfactor=50. Then a full-table UPDATE is performed, and the UPDATE is timed. Duration is the time spent in the UPDATE (lower is better), and wal_generated is the amount of WAL generated by the updates (lower is better). The summary is that Amit's patch is a small win in terms of CPU usage, in the best case where the table has few columns, with one
Re: [HACKERS] Writable foreign tables: how to identify rows
On 06-Mar-2013, at 4:12 PM, Shigeru Hanada shigeru.han...@gmail.com wrote: On Wed, Mar 6, 2013 at 12:35 PM, Pavan Deolasee pavan.deola...@gmail.com wrote: In the context of postgres_fdw, I am not sure if we need an additional system column like a node_id. Would there be a possibility where tuples to-be-modified are coming from different foreign tables and at runtime we need to decide where to execute the UPDATE/DELETE operation ? If we start supporting inheritance involving foreign tables as child tables, this will become a reality. Foreign tables have tableoid system column which holds pg_class.oid of the foreign table. It seems sufficient to determine source server. I think you are right. In the context of foreign tables, tableoid might be enough to get the source. Unfortunately, Postgres-XC had not yet leveraged foreign tables fully and hence special handling was required. Thanks, Pavan -- Shigeru HANADA -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Writable foreign tables: how to identify rows
On Wed, Mar 6, 2013 at 9:30 AM, Tom Lane t...@sss.pgh.pa.us wrote: For postgres_fdw, that would really be enough, since it could just cause a ctid column to be created with the usual definition. Then it could put the remote ctid into the usual t_self field in returned tuples. I'm not sure whether postgres_fdw should support, but updatable views have no system column including ctid. So, we need magic identifier, perhaps it would be set of primary key value(s), to support updating remote updatable views via foreign tables. Just a heads up. -- Shigeru HANADA -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
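The underlying point is easy to demonstrate locally (a minimal sketch, not from the original message):

    CREATE TABLE t (id int PRIMARY KEY, val text);
    CREATE VIEW v AS SELECT * FROM t;
    SELECT ctid FROM t LIMIT 1;   -- works: tables expose ctid
    SELECT ctid FROM v LIMIT 1;   -- ERROR: column "ctid" does not exist

So a foreign table that maps onto a remote updatable view would have to identify rows by something else, such as the view's primary key columns.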
Re: [HACKERS] Support for REINDEX CONCURRENTLY
On 2013-03-06 20:59:37 +0900, Michael Paquier wrote: OK. Patches updated... Please see attached. With all the work done on those patches, I suppose this is close to being something clean... Yes, its looking good. There are loads of improvements possible but those can very well be made incrementally. I have the feeling we are talking past each other. Unless I miss something *there is no* WaitForMultipleVirtualLocks between phase 2 and 3. But one WaitForMultipleVirtualLocks for all would be totally sufficient. OK, sorry for the confusion. I added a call to WaitForMultipleVirtualLocks also before phase 3. Honestly, I am still not very comfortable with the fact that the ShareLock wait on parent relation is done outside each index transaction for build and validation... Changed as requested though... Could you detail your concerns a bit? I tried to think it through multiple times now and I still can't see a problem. The lock only ensures that nobody has the relation open with the old index definition in mind... Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support for REINDEX CONCURRENTLY
On Wed, Mar 6, 2013 at 9:09 PM, Andres Freund and...@2ndquadrant.comwrote: On 2013-03-06 20:59:37 +0900, Michael Paquier wrote: OK. Patches updated... Please see attached. With all the work done on those patches, I suppose this is close to being something clean... Yes, its looking good. There are loads of improvements possible but those can very well be made incrementally. I have the feeling we are talking past each other. Unless I miss something *there is no* WaitForMultipleVirtualLocks between phase 2 and 3. But one WaitForMultipleVirtualLocks for all would be totally sufficient. OK, sorry for the confusion. I added a call to WaitForMultipleVirtualLocks also before phase 3. Honestly, I am still not very comfortable with the fact that the ShareLock wait on parent relation is done outside each index transaction for build and validation... Changed as requested though... Could you detail your concerns a bit? I tried to think it through multiple times now and I still can't see a problem. The lock only ensures that nobody has the relation open with the old index definition in mind... I am making a comparison with CREATE INDEX CONCURRENTLY where the ShareLock wait is made inside the build and validation transactions. Was there any particular reason why CREATE INDEX CONCURRENTLY wait is done inside a transaction block? That's my only concern. -- Michael
Re: [HACKERS] Support for REINDEX CONCURRENTLY
On 2013-03-06 21:19:57 +0900, Michael Paquier wrote: On Wed, Mar 6, 2013 at 9:09 PM, Andres Freund and...@2ndquadrant.comwrote: On 2013-03-06 20:59:37 +0900, Michael Paquier wrote: OK. Patches updated... Please see attached. With all the work done on those patches, I suppose this is close to being something clean... Yes, its looking good. There are loads of improvements possible but those can very well be made incrementally. I have the feeling we are talking past each other. Unless I miss something *there is no* WaitForMultipleVirtualLocks between phase 2 and 3. But one WaitForMultipleVirtualLocks for all would be totally sufficient. OK, sorry for the confusion. I added a call to WaitForMultipleVirtualLocks also before phase 3. Honestly, I am still not very comfortable with the fact that the ShareLock wait on parent relation is done outside each index transaction for build and validation... Changed as requested though... Could you detail your concerns a bit? I tried to think it through multiple times now and I still can't see a problem. The lock only ensures that nobody has the relation open with the old index definition in mind... I am making a comparison with CREATE INDEX CONCURRENTLY where the ShareLock wait is made inside the build and validation transactions. Was there any particular reason why CREATE INDEX CONCURRENTLY wait is done inside a transaction block? That's my only concern. Well, it needs to be executed in a transaction because it needs a valid resource owner and a previous CommitTransactionCommand() will leave that at NULL. And there is no reason in the single-index case of CREATE INDEX CONCURRENTLY to do it in a separate transaction. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: index support for regexp search
* Alexander Korotkov (aekorot...@gmail.com) wrote: Now, we probably don't have enough of time before 9.3 to solve an API problem :(. It's likely we have to choose either commit to 9.3 without clean API factorization or postpone it to 9.4. As much as I'd like this to get in, I don't think there's a question here- it should be punted to 9.4. These don't look like small issues to me. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Enabling Checksums
On 06.03.2013 10:41, Simon Riggs wrote: On 5 March 2013 18:02, Jeff Davispg...@j-davis.com wrote: Fletcher is probably significantly faster than CRC-16, because I'm just doing int32 addition in a tight loop. Simon originally chose Fletcher, so perhaps he has more to say. IIRC the research showed Fletcher was significantly faster for only a small loss in error detection rate. It was sufficient to make our error detection 1 million times better, possibly more. That seems sufficient to enable early detection of problems, since if we missed the first error, a second is very likely to be caught (etc). So I am assuming that we're trying to catch a pattern of errors early, rather than guarantee we can catch the very first error. Fletcher's checksum is good in general, I was mainly worried about truncating the Fletcher-64 into two 8-bit values. I can't spot any obvious weakness in it, but if it's indeed faster and as good as a straightforward Fletcher-16, I wonder why that method is not more widely used. Another thought is that perhaps something like CRC32C would be faster to calculate on modern hardware, and could be safely truncated to 16-bits using the same technique you're using to truncate the Fletcher's Checksum. Greg's tests showed that the overhead of CRC calculation is significant in some workloads, so it would be good to spend some time to optimize that. It'd be difficult to change the algorithm in a future release without breaking on-disk compatibility, so let's make sure we pick the best one. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
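For readers who have not looked at the algorithm, the Fletcher idea is just two running sums, the second accumulating the first; here is a throwaway plpgsql sketch of a classic mod-255 Fletcher-16 (only an illustration of the arithmetic, not the patch's C code, which uses a truncated Fletcher-64):

    CREATE OR REPLACE FUNCTION fletcher16_demo(data bytea) RETURNS integer
    LANGUAGE plpgsql IMMUTABLE AS $$
    DECLARE
        s1 int := 0;   -- plain sum of the input bytes
        s2 int := 0;   -- sum of the running sums
    BEGIN
        FOR i IN 0 .. length(data) - 1 LOOP
            s1 := (s1 + get_byte(data, i)) % 255;
            s2 := (s2 + s1) % 255;
        END LOOP;
        RETURN s2 * 256 + s1;   -- pack the two 8-bit sums into 16 bits
    END;
    $$;

    SELECT fletcher16_demo('hello world'::bytea);

The second sum is what makes the result sensitive to byte ordering, which a plain additive checksum is not.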
Re: [HACKERS] Materialized views WIP patch
Simon Riggs si...@2ndquadrant.com writes: On 5 March 2013 22:02, Tom Lane t...@sss.pgh.pa.us wrote: FWIW, my opinion is that doing anything like this in the planner is going to be enormously expensive. As we already said: no MVs = zero overhead = no problem. Well, in the first place that statement is false on its face: we'll still spend cycles looking for relevant MVs, or at least maintaining a complexly-indexed cache that helps us find out that there are none in a reasonable amount of time. In the second place, even if it were approximately true it wouldn't help the people who were using MVs. It costs in the cases where time savings are possible and not otherwise. And that is just complete nonsense: matching costs whether you find a match or not. Could we have a little less Pollyanna-ish optimism and a bit more realism about the likely cost of such a feature? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] Floating point error
On 03/05/2013 07:23 PM, Tom Lane wrote: Maciek Sakrejda m.sakre...@gmail.com writes: Thank you: I think this is what I was missing, and what wasn't clear from the proposed doc patch. But then how can pg_dump assume that it's always safe to set extra_float_digits = 3?

It's been proven (don't have a link handy, but the paper is at least a dozen years old) that 3 extra digits are sufficient to accurately reconstruct any IEEE single or double float value, given properly written conversion functions in libc. So that's where that number comes from. Now, if either end is not using IEEE floats, you may or may not get equivalent results --- but it's pretty hard to make any guarantees at all in such a case.

There's also gdtoa, which returns the shortest decimal representation which rounds to the same decimal number. It would print 0.1 as 0.1, but 0.1 + 0.2 as 0.30000000000000004.

-- Florian Weimer / Red Hat Product Security Team
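The behaviour under discussion is easy to see from psql; a rough sketch (the exact output can vary by platform and libc):

    SET extra_float_digits = 0;
    SELECT 0.1::float8 + 0.2::float8;   -- typically displays as 0.3
    SET extra_float_digits = 3;
    SELECT 0.1::float8 + 0.2::float8;   -- shows the extra digits, e.g. 0.300000000000000044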
Re: [HACKERS] Optimizing pglz compressor
On Tue, Mar 5, 2013 at 8:32 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: With these tweaks, I was able to make pglz-based delta encoding perform roughly as well as Amit's patch. Out of curiosity, do we know how pglz compares with other algorithms, e.g. lz4 ? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On 2013-03-06 13:34:21 +0200, Heikki Linnakangas wrote: On 06.03.2013 10:41, Simon Riggs wrote: On 5 March 2013 18:02, Jeff Davispg...@j-davis.com wrote: Fletcher is probably significantly faster than CRC-16, because I'm just doing int32 addition in a tight loop. Simon originally chose Fletcher, so perhaps he has more to say. IIRC the research showed Fletcher was significantly faster for only a small loss in error detection rate. It was sufficient to make our error detection 1 million times better, possibly more. That seems sufficient to enable early detection of problems, since if we missed the first error, a second is very likely to be caught (etc). So I am assuming that we're trying to catch a pattern of errors early, rather than guarantee we can catch the very first error. Fletcher's checksum is good in general, I was mainly worried about truncating the Fletcher-64 into two 8-bit values. I can't spot any obvious weakness in it, but if it's indeed faster and as good as a straightforward Fletcher-16, I wonder why that method is not more widely used. I personally am not that convinced that fletcher is a such good choice given that it afaik doesn't distinguish between all-zero and all-one runs that are long enough. Another thought is that perhaps something like CRC32C would be faster to calculate on modern hardware, and could be safely truncated to 16-bits using the same technique you're using to truncate the Fletcher's Checksum. Greg's tests showed that the overhead of CRC calculation is significant in some workloads, so it would be good to spend some time to optimize that. It'd be difficult to change the algorithm in a future release without breaking on-disk compatibility, so let's make sure we pick the best one. I had implemented a noticeably faster CRC32 implementation somewhere around 201005202227.49990.and...@anarazel.de . I have since repeatedly seen pg's CRC32 implementation being a major limitation, so I think brushing up that patch would be a good idea. We might think about switching the polynom for WAL at the same time, given, as you say, CRC32c is available in hardware. The bigger problem is probably stuff like the control file et al. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Tatsuo Ishii is...@postgresql.org wrote: Was the remaining work on docs done? I would like to test MVs and am waiting for the docs completed. I think they are done. If you notice anything missing or in need of clarification please let me know. At this point missing docs would be a bug in need of a fix. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized View patch broke pg_dump
Bernd Helmle maili...@oopsware.de wrote: It looks like the recent matview patch broke pg_dump in a way, which make it impossible to dump 9.1 and 9.2 databases. it fails with pg_dump: [Archivierer (DB)] query failed: ERROR: function pg_relation_is_scannable(oid) does not exist Looking into this issue, it seems the version check in getTables() of pg_dump.c is wrong. Shouldn't the check be if (fout-remoteVersion = 90300) { } since this is where pg_relation_is_scannable() is introduced? Right. Will fix. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Optimizing pglz compressor
On Wed, Mar 6, 2013 at 8:32 AM, Joachim Wieland j...@mcknight.de wrote: On Tue, Mar 5, 2013 at 8:32 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: With these tweaks, I was able to make pglz-based delta encoding perform roughly as well as Amit's patch. Out of curiosity, do we know how pglz compares with other algorithms, e.g. lz4 ? This has been a subject of much recent discussion. It compares very poorly, but installing a new compressor tends to be problematic due to patent concerns (something which I disagree with but it's there). All that said, Heikki's proposed changes seem to be low risk and quite fast. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized View patch broke pg_dump
Bernd Helmle maili...@oopsware.de wrote: Looking into this issue, it seems the version check in getTables() of pg_dump.c is wrong. Shouldn't the check be if (fout-remoteVersion = 90300) { } since this is where pg_relation_is_scannable() is introduced? Fixed. Thanks for the report! -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Kevin, I haven't seen a reply to this. Were you able to give my notes below any consideration?

On 2/15/13 12:44 PM, Peter Eisentraut wrote: On 1/25/13 1:00 AM, Kevin Grittner wrote: New patch rebased, fixes issues raised by Thom Brown, and addresses some of your points.

This patch doesn't apply anymore, so I just took a superficial look. I think the intended functionality and the interfaces look pretty good. Documentation looks complete, tests are there. I have a couple of notes:

* What you call WITH [NO] DATA, Oracle calls BUILD IMMEDIATE/DEFERRED. It might be better to use that as well then.

* You use fields named relkind in the parse nodes, but they don't actually contain relkind values, which is confusing. I'd just name the field is_matview or something.

* More generally, I wouldn't be so fond of combining the parse handling of CREATE TABLE AS and CREATE MATERIALIZED VIEW. They are similar, but then again so are a lot of other things.

* Some of the terminology is inconsistent. A materialized view is sometimes called valid, populated, or built, with approximately the same meaning. Personally, I would settle on built, as per above, but it should be one term only.

* I find the name of the relisvalid column a bit confusing. Especially because it only applies to materialized views, and there is already a meaning of valid for indexes. (Recall that indexes are also stored in pg_class, but they are concerned about indisvalid.) I would name it something like relmvbuilt.

Btw., half of the patch seems to consist of updating places referring to relkind. Is something wrong with the meaning of relkind that this sort of thing is required? Maybe these places should be operating in terms of features, not accessing relkind directly.
Re: [HACKERS] Materialized views WIP patch
On Tue, Mar 5, 2013 at 9:08 PM, Robert Haas robertmh...@gmail.com wrote: All that having been said, it's hard for me to imagine that anyone really cares about any of this until we have an incremental update feature, which right now we don't. Actually, I'm betting that's going to be significantly harder than automatic-query-rewrite, when all is said and done. Automatic-query-rewrite, if and when we get it, will not be easy and will require a bunch of work from someone with a good understanding of the planner, but it strikes me as the sort of thing that might work out to one large project and then it's done. Whereas, incremental update sounds to me like a series of projects over a series of releases targeting various special cases, where we can always point to some improvements vs. release N-1 but we're never actually done and able to move on to the next thing. As a roadmap goes, I think that's OK. Even a reasonably simplistic and partial implementation of incremental update will benefit a lot of users. But in terms of relative difficulty, it's not at all obvious to me that that's the easier part of the project.

That's true, but it's true for a lot of Postgres features. The only ones that are one-shot projects are buried deep in the internals. Anything with UI implications inevitably has limitations, and then other people come along and work on removing or extending those features.

I do agree with Tom though -- the most frequently asked for materialized view in the past has always been select count(*) from tab. People assume we already do this and are surprised when we don't. The cookie cutter solution for it is basically exactly what an incrementally updated materialized view solution would look like (with the queue of updates with transaction information that are periodically flattened into the aggregate). Rewriting this might be a bit tricky and require heuristics to determine just how much work to expend trying to match materialized views, but this type of view would be where most of the win would be.

I also can't see implementing query rewriting for non-transactionally-accurate materialized views. If people want a snapshot of the data that may be out of date, that's great. I can see tons of use cases for that. But then surely they won't be surprised to have to query the snapshot explicitly. I can't see going to all this trouble to implement transactions and snapshots and wal logging and so on and then silently rewriting queries to produce data that is not up to date. I think users would be surprised to find bog-standard SQL occasionally producing incorrect results.

That said, there are cases where snapshots might be up to date even though we don't implement any incremental updates. If the underlying data is read-only or hasn't received any update commits since the snapshot was taken then it might still be useful. There are tons of ETL applications where you load the data once and then build MVs for it and never touch the underlying data again.

-- greg
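The cookie-cutter count(*) materialization Greg refers to usually looks something like the following (a sketch with made-up names; the deferred variant he describes inserts per-transaction deltas into a queue table and flattens them periodically instead of updating a single counter row on every change):

    CREATE TABLE tab (id int PRIMARY KEY, payload text);
    CREATE TABLE tab_rowcount (n bigint NOT NULL);
    INSERT INTO tab_rowcount SELECT count(*) FROM tab;

    CREATE FUNCTION tab_rowcount_update() RETURNS trigger
    LANGUAGE plpgsql AS $$
    BEGIN
        IF TG_OP = 'INSERT' THEN
            UPDATE tab_rowcount SET n = n + 1;
        ELSIF TG_OP = 'DELETE' THEN
            UPDATE tab_rowcount SET n = n - 1;
        END IF;
        RETURN NULL;
    END;
    $$;

    CREATE TRIGGER tab_rowcount_trig AFTER INSERT OR DELETE ON tab
        FOR EACH ROW EXECUTE PROCEDURE tab_rowcount_update();

    -- SELECT count(*) FROM tab can now be answered from tab_rowcount,
    -- at the cost of serializing concurrent writers on the counter row.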
Re: [HACKERS] Enabling Checksums
On Wed, Mar 06, 2013 at 01:34:21PM +0200, Heikki Linnakangas wrote: On 06.03.2013 10:41, Simon Riggs wrote: On 5 March 2013 18:02, Jeff Davispg...@j-davis.com wrote: Fletcher is probably significantly faster than CRC-16, because I'm just doing int32 addition in a tight loop. Simon originally chose Fletcher, so perhaps he has more to say.

IIRC the research showed Fletcher was significantly faster for only a small loss in error detection rate. It was sufficient to make our error detection 1 million times better, possibly more. That seems sufficient to enable early detection of problems, since if we missed the first error, a second is very likely to be caught (etc). So I am assuming that we're trying to catch a pattern of errors early, rather than guarantee we can catch the very first error.

Fletcher's checksum is good in general, I was mainly worried about truncating the Fletcher-64 into two 8-bit values. I can't spot any obvious weakness in it, but if it's indeed faster and as good as a straightforward Fletcher-16, I wonder why that method is not more widely used.

I was wondering about the effectiveness of this resulting truncated hash function as well.

Another thought is that perhaps something like CRC32C would be faster to calculate on modern hardware, and could be safely truncated to 16-bits using the same technique you're using to truncate the Fletcher's Checksum. Greg's tests showed that the overhead of CRC calculation is significant in some workloads, so it would be good to spend some time to optimize that. It'd be difficult to change the algorithm in a future release without breaking on-disk compatibility, so let's make sure we pick the best one.

If picking a CRC, why not a short optimal one rather than truncating CRC32C? I've been reading about optimal checksums for small messages for other reasons and found this paper quite good: http://www.ece.cmu.edu/~koopman/roses/dsn04/koopman04_crc_poly_embedded.pdf I was interested in small messages and small checksums, so this paper may not be as much help here. Other than CRCs and Fletcher sums, Pearson hashing with a 16-bit block might be worth considering. Either a Pearson hash or a 16-bit CRC is small enough to implement with a lookup table rather than a formula.

I've been wondering what kind of errors we expect. Single bit flips? Large swaths of bytes corrupted? Are we more worried about collisions (the odds total garbage has the same checksum) or the odds we detect a flip of n bits? I would think since the message is large, and a write to the wrong location seems about as likely as a bit flip, a Pearson hash would be good. Any choice seems like it would be a nice improvement in noticing a storage stack problem. The difference would be subtle: can I accurately estimate the odds of undetected corruption that occurred since the condition was first detected, or does the checksum/hash perform poorly there?

Garick
Re: [HACKERS] Optimizing pglz compressor
On 2013-03-06 09:36:19 -0600, Merlin Moncure wrote: On Wed, Mar 6, 2013 at 8:32 AM, Joachim Wieland j...@mcknight.de wrote: On Tue, Mar 5, 2013 at 8:32 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: With these tweaks, I was able to make pglz-based delta encoding perform roughly as well as Amit's patch. Out of curiosity, do we know how pglz compares with other algorithms, e.g. lz4 ? This has been a subject of much recent discussion. It compares very poorly, but installing a new compressor tends to be problematic due to patent concerns (something which I disagree with but it's there). All that said, Heikki's proposed changes seem to be low risk and quite fast. Imo the licensing part is by far the smaller one. The interesting part is making a compatible change to the way toast compression works that supports multiple compression schemes. Afaics nobody has done that work. After that the choice of to-be-integrated compression schemes needs to be discussed, sure. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On 2013-03-06 11:21:21 -0500, Garick Hamlin wrote: If picking a CRC why not a short optimal one rather than truncate CRC32C? CRC32C is available in hardware since SSE4.2. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Writable foreign tables: how to identify rows
Shigeru Hanada shigeru.han...@gmail.com writes: I'm not sure whether postgres_fdw should support, but updatable views have no system column including ctid. So, we need magic identifier, perhaps it would be set of primary key value(s), to support updating remote updatable views via foreign tables. Yeah, I considered that. I thought seriously about proposing that we forget magic row identifiers altogether, and instead make postgres_fdw require a remote primary key for a foreign table to be updatable. That would solve the immediate problem since we'd no longer need any system columns at all. On balance though I think it's better for postgres_fdw to use ctid for this purpose, at least for now: ctid is better-performing and more reliable than a PK (since someone might mis-describe the PK, or change a row's PK value underneath us). Perhaps in a future release we could add a table option to use PK row identification, but I don't want to try to implement both methods in postgres_fdw right now. On the other hand, I don't have a problem with decreeing that non-Postgres FDWs need to use PK row identification in the first release; which would be the consequence if we don't do anything about allowing new system columns in 9.3. We will certainly need that style of row identification to be written and tested anyway. It won't stop us from extending things later. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
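In concrete terms, the two identification schemes differ mainly in the shape of the statement the FDW ends up issuing against the remote side; roughly (a sketch, not the actual SQL postgres_fdw generates):

    -- ctid-based identification (remote server is also Postgres):
    UPDATE remote_tab SET val = $1 WHERE ctid = $2;

    -- primary-key-based identification (the generic fallback):
    UPDATE remote_tab SET val = $1 WHERE id = $2;

The PK form stays correct only as long as the declared key really is unique and never changes underneath us, which is exactly the mis-description risk mentioned above.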
Re: [HACKERS] Optimizing pglz compressor
On Wed, Mar 6, 2013 at 8:53 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-03-06 09:36:19 -0600, Merlin Moncure wrote: On Wed, Mar 6, 2013 at 8:32 AM, Joachim Wieland j...@mcknight.de wrote: On Tue, Mar 5, 2013 at 8:32 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: With these tweaks, I was able to make pglz-based delta encoding perform roughly as well as Amit's patch. Out of curiosity, do we know how pglz compares with other algorithms, e.g. lz4? This has been a subject of much recent discussion. It compares very poorly, but installing a new compressor tends to be problematic due to patent concerns (something which I disagree with but it's there). All that said, Heikki's proposed changes seem to be low risk and quite fast. Imo the licensing part is by far the smaller one. The interesting part is making a compatible change to the way toast compression works that supports multiple compression schemes. Afaics nobody has done that work. After that the choice of to-be-integrated compression schemes needs to be discussed, sure.

Another thing to consider would be some way of recording an exemplar value for each column which is used to seed whatever compression algorithm is used. I think there is often a lot of redundancy that does not appear within any given value, but does appear when viewing all the values of a given column. Finding some way to take advantage of that could give a big improvement in compression ratio.

Cheers, Jeff
Re: [HACKERS] Support for REINDEX CONCURRENTLY
On Wed, Mar 6, 2013 at 8:59 PM, Michael Paquier michael.paqu...@gmail.com wrote: OK. Patches updated... Please see attached.

I found odd behavior. After I made REINDEX CONCURRENTLY fail twice, I found that an index which was not marked as INVALID remained unexpectedly.

=# CREATE TABLE hoge (i int primary key);
CREATE TABLE
=# INSERT INTO hoge VALUES (generate_series(1,10));
INSERT 0 10
=# SET statement_timeout TO '1s';
SET
=# REINDEX TABLE CONCURRENTLY hoge;
ERROR: canceling statement due to statement timeout
=# \d hoge
     Table "public.hoge"
 Column |  Type   | Modifiers
--------+---------+-----------
 i      | integer | not null
Indexes:
    "hoge_pkey" PRIMARY KEY, btree (i)
    "hoge_pkey_cct" PRIMARY KEY, btree (i) INVALID
=# REINDEX TABLE CONCURRENTLY hoge;
ERROR: canceling statement due to statement timeout
=# \d hoge
     Table "public.hoge"
 Column |  Type   | Modifiers
--------+---------+-----------
 i      | integer | not null
Indexes:
    "hoge_pkey" PRIMARY KEY, btree (i)
    "hoge_pkey_cct" PRIMARY KEY, btree (i) INVALID
    "hoge_pkey_cct1" PRIMARY KEY, btree (i) INVALID
    "hoge_pkey_cct_cct" PRIMARY KEY, btree (i)

+The recommended recovery method in such cases is to drop the concurrent
+index and try again to perform <command>REINDEX CONCURRENTLY</>.

If an invalid index depends on a constraint like a primary key, dropping the concurrent index cannot actually drop the index. In this case, you need to issue "alter table ... drop constraint ..." to recover the situation. I think this information should be documented.

Regards,

-- Fujii Masao
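Using the names from that session, the recovery being described looks roughly like this (a sketch, assuming the leftover constraint carries the same name as its index; an invalid index that backs a constraint cannot be removed with DROP INDEX, so the constraint has to be dropped instead):

    -- fails: the index is required by the primary key constraint
    DROP INDEX hoge_pkey_cct;
    -- works, and removes the invalid index along with the constraint
    ALTER TABLE hoge DROP CONSTRAINT hoge_pkey_cct;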
Re: [HACKERS] Optimizing pglz compressor
On 2013-03-06 09:08:10 -0800, Jeff Janes wrote: On Wed, Mar 6, 2013 at 8:53 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-03-06 09:36:19 -0600, Merlin Moncure wrote: On Wed, Mar 6, 2013 at 8:32 AM, Joachim Wieland j...@mcknight.de wrote: On Tue, Mar 5, 2013 at 8:32 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: With these tweaks, I was able to make pglz-based delta encoding perform roughly as well as Amit's patch. Out of curiosity, do we know how pglz compares with other algorithms, e.g. lz4? This has been a subject of much recent discussion. It compares very poorly, but installing a new compressor tends to be problematic due to patent concerns (something which I disagree with but it's there). All that said, Heikki's proposed changes seem to be low risk and quite fast. Imo the licensing part is by far the smaller one. The interesting part is making a compatible change to the way toast compression works that supports multiple compression schemes. Afaics nobody has done that work. After that the choice of to-be-integrated compression schemes needs to be discussed, sure.

Another thing to consider would be some way of recording an exemplar value for each column which is used to seed whatever compression algorithm is used. I think there is often a lot of redundancy that does not appear within any given value, but does appear when viewing all the values of a given column. Finding some way to take advantage of that could give a big improvement in compression ratio.

But then that value cannot be changed/removed because existing values depend on it. So either it's a one-off thing - which means you can only compute it after the table is filled to some extent - or you need to have a growing list of such values somewhere (refcounting it would be hard). I think the more reasonable route for such a thing would be to try and get page-level compression working. Which is a pretty hard project, I admit.

Greetings, Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Support for REINDEX CONCURRENTLY
On 2013-03-07 02:09:49 +0900, Fujii Masao wrote: On Wed, Mar 6, 2013 at 8:59 PM, Michael Paquier michael.paqu...@gmail.com wrote: OK. Patches updated... Please see attached. I found odd behavior. After I made REINDEX CONCURRENTLY fail twice, I found that the index which was not marked as INVALID remained unexpectedly.

That's to be expected. Indexes need to be valid *before* we can drop the old one. So if you abort at the right moment you will see those and that's imo fine.

=# CREATE TABLE hoge (i int primary key); CREATE TABLE =# INSERT INTO hoge VALUES (generate_series(1,10)); INSERT 0 10 =# SET statement_timeout TO '1s'; SET =# REINDEX TABLE CONCURRENTLY hoge; ERROR: canceling statement due to statement timeout =# \d hoge Table public.hoge Column | Type | Modifiers +-+--- i | integer | not null Indexes: hoge_pkey PRIMARY KEY, btree (i) hoge_pkey_cct PRIMARY KEY, btree (i) INVALID =# REINDEX TABLE CONCURRENTLY hoge; ERROR: canceling statement due to statement timeout =# \d hoge Table public.hoge Column | Type | Modifiers +-+--- i | integer | not null Indexes: hoge_pkey PRIMARY KEY, btree (i) hoge_pkey_cct PRIMARY KEY, btree (i) INVALID hoge_pkey_cct1 PRIMARY KEY, btree (i) INVALID hoge_pkey_cct_cct PRIMARY KEY, btree (i)

Huh, why did that go through? It should have errored out?

+The recommended recovery method in such cases is to drop the concurrent +index and try again to perform <command>REINDEX CONCURRENTLY</>. If an invalid index depends on a constraint like a primary key, dropping the concurrent index cannot actually drop the index. In this case, you need to issue "alter table ... drop constraint ..." to recover the situation. I think this information should be documented.

I think we just shouldn't set ->isprimary on the temporary indexes. Now we switch only the relfilenodes and not the whole index, that should be perfectly fine.

Greetings, Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] transforms
On 3/5/13 5:52 PM, Josh Berkus wrote: More on this: the problem appears to be that the symbols for hstore are loaded only if I've just just created the extension in that database: I see. This is a problem with any kind of dynamically loadable module that uses another module. The other module is only loaded either at creation time or when a function from it is first used (or if a preload mechanism is used). At run time, this will sort itself out, because all the required modules will be loaded. The problem is when you create the glue extension and haven't invoked any code from any of the dependent extension in the session. Abstractly, the possible solutions are either not to check the functions when the extension is created (possibly settable by a flag) or to somehow force a load of all dependent extensions when the new extension is created. (I say extension here even though dynamically loadable modules are attached to functions, which makes this even more confusing.) In normal programming languages, this is normally addressed by placing explicit load/require/import statements before you do anything else. What we are doing here is more like an autoload functionality that some environments have. Those have the same problem. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
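In practice the workaround amounts to forcing the dependent module into the session by hand before creating the glue extension, along these lines (a sketch; hstore_plperl is used here only as a plausible name for such a glue extension):

    LOAD 'hstore';                   -- explicitly load the module the glue code references
    CREATE EXTENSION hstore_plperl;  -- its functions can now resolve the hstore symbols

Alternatively, listing the module in shared_preload_libraries or local_preload_libraries sidesteps the problem, at the cost of loading it everywhere.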
Re: [HACKERS] Optimizing pglz compressor
On Wed, Mar 6, 2013 at 10:53 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-03-06 09:36:19 -0600, Merlin Moncure wrote: On Wed, Mar 6, 2013 at 8:32 AM, Joachim Wieland j...@mcknight.de wrote: On Tue, Mar 5, 2013 at 8:32 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: With these tweaks, I was able to make pglz-based delta encoding perform roughly as well as Amit's patch. Out of curiosity, do we know how pglz compares with other algorithms, e.g. lz4 ? This has been a subject of much recent discussion. It compares very poorly, but installing a new compressor tends to be problematic due to patent concerns (something which I disagree with but it's there). All that said, Heikki's proposed changes seem to be low risk and quite fast. Imo the licensing part is by far the smaller one. The interesting part is making a compatible change to the way toast compression works that supports multiple compression schemes. Afaics nobody has done that work. After that the choice of to-be-integrated compression schemes needs to be discussed, sure. That wasn't the conversation I remember. lz4 completely smokes pglz (Heikki's changes notwithstanding) and is bsd licensed so AFAICS there it no reason to support multiple compression technologies except for migration purposes (and even if you did want to, a plausible way to do that was identified). ...but that's a separate topic for another day. Heikki's proposal seems like a win either way. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support for REINDEX CONCURRENTLY
On Thu, Mar 7, 2013 at 2:17 AM, Andres Freund and...@2ndquadrant.com wrote:

Indexes:
    "hoge_pkey" PRIMARY KEY, btree (i)
    "hoge_pkey_cct" PRIMARY KEY, btree (i) INVALID
    "hoge_pkey_cct1" PRIMARY KEY, btree (i) INVALID
    "hoge_pkey_cct_cct" PRIMARY KEY, btree (i)

Huh, why did that go through? It should have errored out?

I'm not sure why. Anyway, hoge_pkey_cct_cct should not appear, or should be marked as invalid, I think.

+The recommended recovery method in such cases is to drop the concurrent
+index and try again to perform <command>REINDEX CONCURRENTLY</>.

If an invalid index depends on a constraint like a primary key, dropping the concurrent index cannot actually drop the index. In this case, you need to issue ALTER TABLE ... DROP CONSTRAINT ... to recover the situation. I think this information should be documented.

I think we just shouldn't set ->isprimary on the temporary indexes. Now we switch only the relfilenodes and not the whole index, that should be perfectly fine.

Sounds good. But what about other constraint cases like unique constraints? Can those other cases also be resolved by not setting ->isprimary?

Regards,

-- Fujii Masao

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Writable foreign tables: how to identify rows
Kohei KaiGai kai...@kaigai.gr.jp writes: 2013/3/6 Tom Lane t...@sss.pgh.pa.us: I think if we're going to support magic row identifiers, they need to be actual system columns, complete with negative attnums and entries in pg_attribute. Sorry, -1 for me. The proposed design tried to kill two birds with one stone. The reason why the new GetForeignRelWidth() can reserve multiple slot for pseudo-columns is, that intends to push-down complex calculation in target-list to the remote computing resource. [ shrug... ] That's just pie in the sky: there is no such capability in the submitted patch, nor is it close to being implementable. When weighing that against the very real probability that this patch won't get into 9.3 at all, I have no problem tossing that overboard to try to get to something committable. The larger issue here is that the patch is confusing the set of possibly-computed columns that could be returned by a scan node with the catalog-specified set of columns for a table. I don't think that changing the (representation of the) latter on the fly within the planner is a tenable approach. You've had to put in an unreasonable number of crude hacks already to make that sort-of work, but I have exactly zero confidence that you've hacked everyplace that would have to change. I also have no confidence that there aren't unfixable problems where the same TupleDesc would need to be in both states to satisfy the expectations of different bits of code. (An example of the risks here is that, IIRC, parts of the planner use the relation's TupleDesc as a guide to what relname.* means. So I'm pretty suspicious that the existing patch breaks behavior for whole-row Vars in some cases.) Really the idea is a bit broken in this form anyway, because if we did have a plan that involved calculating (x-y)^2 at the scan level, we'd want that expression to be represented using Vars for x and y, not some made-up Var whose meaning is not apparent from the system catalogs. Also, the right way to deal with this desire is to teach the planner in general, not just FDWs, about pushing expensive calculations down the plan tree --- basically, resurrecting Joe Hellerstein's thesis work, which we ripped out more than ten years ago. I don't think there's that much that would need to change about the planner's data structures, but deciding where is cheapest to evaluate an expression is a pretty hard problem. Trying to handle that locally within FDWs is doomed to failure IMO. So my intention is to get rid of GetForeignRelWidth() and make use of the existing ctid system column for returning remote TIDs in postgres_fdw. (On review I notice that we're creating ctid columns for foreign tables already, so we don't even need the proposed hook to let FDWs control that; though we will certainly want one in future if we are to support non-TID magic row identifiers.) If you find that unacceptable, I'm quite willing to mark this patch Returned With Feedback and get on with dealing with some of the other forty-odd patches in the CF queue. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
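To illustrate what reusing the existing ctid machinery means, a small sketch on an ordinary local table (table and column names here are made up; for postgres_fdw the idea is that the same system column would carry the remote TID):

    SELECT ctid, id FROM some_table;                     -- ctid is the built-in row identifier
    UPDATE some_table SET id = id WHERE ctid = '(0,1)';  -- a TID can be used to target a row
    -- note: ctid is only stable within a snapshot; it changes when a row is updated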
Re: [HACKERS] transforms
Peter, At run time, this will sort itself out, because all the required modules will be loaded. The problem is when you create the glue extension and haven't invoked any code from any of the dependent extension in the session. Just invoking code doesn't seem to be enough. I tried just using the Hstore data type, and then loading hstore_plperl, but that still didn't work. It seems like only CREATE EXTENSION loads *all* the symbols. Abstractly, the possible solutions are either not to check the functions when the extension is created (possibly settable by a flag) or to somehow force a load of all dependent extensions when the new extension is created. The latter would be ideal, but I don't know that we currently have any mechanism for it. --Josh -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Optimizing pglz compressor
On 2013-03-06 11:31:06 -0600, Merlin Moncure wrote:
On Wed, Mar 6, 2013 at 10:53 AM, Andres Freund and...@2ndquadrant.com wrote:
On 2013-03-06 09:36:19 -0600, Merlin Moncure wrote:
On Wed, Mar 6, 2013 at 8:32 AM, Joachim Wieland j...@mcknight.de wrote:
On Tue, Mar 5, 2013 at 8:32 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote:

With these tweaks, I was able to make pglz-based delta encoding perform roughly as well as Amit's patch.

Out of curiosity, do we know how pglz compares with other algorithms, e.g. lz4?

This has been a subject of much recent discussion. It compares very poorly, but installing a new compressor tends to be problematic due to patent concerns (something which I disagree with, but it's there). All that said, Heikki's proposed changes seem to be low risk and quite fast.

Imo the licensing part is by far the smaller one. The interesting part is making a compatible change to the way toast compression works that supports multiple compression schemes. Afaics nobody has done that work. After that the choice of to-be-integrated compression schemes needs to be discussed, sure.

That wasn't the conversation I remember. lz4 completely smokes pglz (Heikki's changes notwithstanding) and is bsd licensed, so AFAICS there is no reason to support multiple compression technologies except for migration purposes (and even if you did want to, a plausible way to do that was identified).

Well, we need to permanently support at least two technologies - otherwise we will break pg_upgrade. And there is absolutely no reason to believe nothing better than lz4 will come along in the future, so just supporting two seems like a bad idea.

And sure, there are rough ideas on how to support different compression schemes in toast, but nobody has implemented anything afaics. It would be possible to reuse what I proposed for indirect toast tuples for that purpose, although I am not sure whether I like using varatt_external's in multiple types as indicated by varatt1_be.va_len_1be (renamed to va_type or such) pointing to different types. Which means that an uncompressible datum would potentially have multiple representations.

...but that's a separate topic for another day. Heikki's proposal seems like a win either way.

Yes.

Greetings,

Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transforms
On 2013-03-06 09:53:29 -0800, Josh Berkus wrote:
Peter,

At run time, this will sort itself out, because all the required modules will be loaded. The problem is when you create the glue extension and haven't invoked any code from any of the dependent extensions in the session.

Just invoking code doesn't seem to be enough. I tried just using the hstore data type, and then loading hstore_plperl, but that still didn't work. It seems like only CREATE EXTENSION loads *all* the symbols.

Your error looks like you're erroring out in plperl, not in hstore?

Greetings,

Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Mon, Mar 4, 2013 at 3:13 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 04.03.2013 20:58, Greg Smith wrote: There is no such thing as a stable release of btrfs, and no timetable for when there will be one. I could do some benchmarks of that but I didn't think they were very relevant. Who cares how fast something might run when it may not work correctly? btrfs might as well be /dev/null to me right now--sure it's fast, but maybe the data won't be there at all. This PostgreSQL patch hasn't seen any production use, either. In fact, I'd consider btrfs to be more mature than this patch. Unless you think that there will be some major changes to the worse in performance in btrfs, it's perfectly valid and useful to compare the two. A comparison with ZFS would be nice too. That's mature, and has checksums. We've had a few EnterpriseDB customers who have had fantastically painful experiences with PostgreSQL + ZFS. Supposedly, aligning the ZFS block size to the PostgreSQL block size is supposed to make these problems go away, but in my experience it does not have that effect. So I think telling people who want checksums go use ZFS is a lot like telling them oh, I see you have a hangnail, we recommend that you solve that by cutting your arm off with a rusty saw. There may be good reasons to reject this patch. Or there may not. But I completely disagree with the idea that asking them to solve the problem at the filesystem level is sensible. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Strange Windows problem, lock_timeout test request
Robert Haas robertmh...@gmail.com writes: On Wed, Feb 27, 2013 at 8:58 AM, Stephen Frost sfr...@snowman.net wrote: It's still entirely possible to get 99% done and then hit that last tuple that you need a lock on and just tip over the lock_timeout_stmt limit due to prior waiting and ending up wasting a bunch of work, hence why I'm not entirely sure that this is that much better than statement_timeout. I tend to agree that this should be based on the length of any individual lock wait, not the cumulative duration of lock waits. Otherwise, it seems like it'll be very hard to set this to a meaningful value. For example, if you set this to 1 minute, and that means the length of any single wait, then you basically know that it'll only kick in if there is some other, long-running transaction that's holding the lock. But if it means the cumulative length of all waits, it's not so clear, because now you might also have this kick in if you wait for 100ms on 600 different occasions. In other words, complex queries that lock or update many tuples may get killed even if they never wait very long at all for any single lock. That seems like it will be almost indistinguishable from random, unprincipled query cancellations. Yeah. I'm also unconvinced that there's really much use-case territory here that statement_timeout doesn't cover well enough. To have a case that statement-level lock timeout covers and statement_timeout doesn't, you need to suppose that you know how long the query can realistically wait for all locks together, but *not* how long it's going to run in the absence of lock delays. That seems a bit far-fetched, particularly when thinking of row-level locks, whose cumulative timeout would presumably need to scale with the number of rows the query will visit. If statement-level lock timeouts were cheap to add, that would be one thing; but given that they're complicating the code materially, I think we need a more convincing argument for them. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
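For reference, a sketch of how the two settings would be used, assuming the per-lock-wait semantics argued for above (lock_timeout is the GUC name used in the patch under discussion, not yet a released feature at this point in the thread):

    SET statement_timeout = '5min';  -- bounds the whole statement, lock waits included
    SET lock_timeout = '10s';        -- would cancel a statement that blocks this long on any single lock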
Re: [HACKERS] Strange Windows problem, lock_timeout test request
2013-03-06 19:53 keltezéssel, Tom Lane írta: Robert Haas robertmh...@gmail.com writes: On Wed, Feb 27, 2013 at 8:58 AM, Stephen Frost sfr...@snowman.net wrote: It's still entirely possible to get 99% done and then hit that last tuple that you need a lock on and just tip over the lock_timeout_stmt limit due to prior waiting and ending up wasting a bunch of work, hence why I'm not entirely sure that this is that much better than statement_timeout. I tend to agree that this should be based on the length of any individual lock wait, not the cumulative duration of lock waits. Otherwise, it seems like it'll be very hard to set this to a meaningful value. For example, if you set this to 1 minute, and that means the length of any single wait, then you basically know that it'll only kick in if there is some other, long-running transaction that's holding the lock. But if it means the cumulative length of all waits, it's not so clear, because now you might also have this kick in if you wait for 100ms on 600 different occasions. In other words, complex queries that lock or update many tuples may get killed even if they never wait very long at all for any single lock. That seems like it will be almost indistinguishable from random, unprincipled query cancellations. Yeah. I'm also unconvinced that there's really much use-case territory here that statement_timeout doesn't cover well enough. To have a case that statement-level lock timeout covers and statement_timeout doesn't, you need to suppose that you know how long the query can realistically wait for all locks together, but *not* how long it's going to run in the absence of lock delays. That seems a bit far-fetched, particularly when thinking of row-level locks, whose cumulative timeout would presumably need to scale with the number of rows the query will visit. If statement-level lock timeouts were cheap to add, that would be one thing; but given that they're complicating the code materially, I think we need a more convincing argument for them. OK, so it's not wanted. Surprise, surprise, it was already dropped from the patch. Can you _please_ review the last patch and comment on it instead of the state of past? Thanks and best regards, Zoltán Böszörményi regards, tom lane -- -- Zoltán Böszörményi Cybertec Schönig Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
There may be good reasons to reject this patch. Or there may not. But I completely disagree with the idea that asking them to solve the problem at the filesystem level is sensible. Yes, can we get back to the main issues with the patch? 1) argument over whether the checksum is sufficient to detect most errors, or if it will give users false confidence. 2) performance overhead. Based on Smith's report, I consider (2) to be a deal-killer right now. The level of overhead reported by him would prevent the users I work with from ever employing checksums on production systems. Specifically, the writing checksums for a read-only query is a defect I think is prohibitively bad. When we first talked about this feature for 9.2, we were going to exclude hint bits from checksums, in order to avoid this issue; what happened to that? (FWIW, I still support the idea of moving hint bits to a separate filehandle, as we do with the FSM, but clearly that's not happening for 9.3 ...) -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support for REINDEX CONCURRENTLY
On Thu, Mar 7, 2013 at 2:09 AM, Fujii Masao masao.fu...@gmail.com wrote: On Wed, Mar 6, 2013 at 8:59 PM, Michael Paquier michael.paqu...@gmail.com wrote: OK. Patches updated... Please see attached. I found odd behavior. After I made REINDEX CONCURRENTLY fail twice, I found that the index which was not marked as INVALID remained unexpectedly. =# CREATE TABLE hoge (i int primary key); CREATE TABLE =# INSERT INTO hoge VALUES (generate_series(1,10)); INSERT 0 10 =# SET statement_timeout TO '1s'; SET =# REINDEX TABLE CONCURRENTLY hoge; ERROR: canceling statement due to statement timeout =# \d hoge Table public.hoge Column | Type | Modifiers +-+--- i | integer | not null Indexes: hoge_pkey PRIMARY KEY, btree (i) hoge_pkey_cct PRIMARY KEY, btree (i) INVALID =# REINDEX TABLE CONCURRENTLY hoge; ERROR: canceling statement due to statement timeout =# \d hoge Table public.hoge Column | Type | Modifiers +-+--- i | integer | not null Indexes: hoge_pkey PRIMARY KEY, btree (i) hoge_pkey_cct PRIMARY KEY, btree (i) INVALID hoge_pkey_cct1 PRIMARY KEY, btree (i) INVALID hoge_pkey_cct_cct PRIMARY KEY, btree (i) Invalid indexes cannot be reindexed concurrently and are simply bypassed during process, so _cct_cct has no reason to exist. For example here is what I get with a relation having an invalid index: ioltas=# \d aa Table public.aa Column | Type | Modifiers +-+--- a | integer | Indexes: aap btree (a) aap_cct btree (a) INVALID ioltas=# reindex table concurrently aa; WARNING: cannot reindex concurrently invalid index public.aap_cct, skipping REINDEX +The recommended recovery method in such cases is to drop the concurrent +index and try again to perform commandREINDEX CONCURRENTLY/. If an invalid index depends on the constraint like primary key, drop the concurrent index cannot actually drop the index. In this case, you need to issue alter table ... drop constraint ... to recover the situation. I think this information should be documented. You are right. I'll add a note in the documentation about that. Personally I find it more instinctive to use DROP CONSTRAINT for a primary key as the image I have of a concurrent index is a twin of the index it rebuilds. -- Michael
Re: [HACKERS] Bug in tm2timestamp
On Mon, Mar 04, 2013 at 05:08:26PM -0300, Alvaro Herrera wrote: Another point worth considering is that most of this is duplicated by ecpg's libpgtypes. Do we want to fix that one too, or do we just let it continue to be broken? I note that other bugs are already unfixed in ecpg's copy. One other idea to consider is moving these things to Meaning that a fix wasn't put there, too? But in light of this bug and other already fixed date/time bugs, perhaps it's warranted? Opinions please. I'd love to go to a single source. Most of libpgtypes was taken from the backend back when it was developed. Michael -- Michael Meskes Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org Jabber: michael.meskes at gmail dot com VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bug in tm2timestamp
On Mon, Mar 04, 2013 at 05:55:26PM -0300, Alvaro Herrera wrote: error codes for the caller to figure out. Maybe we could create a layer on top of ereport, that gets both the error message, sqlstate etc, and ... Couldn't we just create ecpg's own version of ereport, that does the right thing for ecpg? Michael -- Michael Meskes Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org Jabber: michael.meskes at gmail dot com VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bug in tm2timestamp
Michael Meskes wrote: On Mon, Mar 04, 2013 at 05:08:26PM -0300, Alvaro Herrera wrote: Another point worth considering is that most of this is duplicated by ecpg's libpgtypes. Do we want to fix that one too, or do we just let it continue to be broken? I note that other bugs are already unfixed in ecpg's copy. One other idea to consider is moving these things to Meaning that a fix wasn't put there, too? Yes, a fix was put there by Tom (which is why I retracted my comment initially). I did note that the ecpg code has diverged from the backend code; it's not unlikely that other bug fixes have not gone to the ecpg copy. But I didn't investigate each difference in detail. But in light of this bug and other already fixed date/time bugs, perhaps it's warranted? Opinions please. I'd love to go to a single source. Most of libpgtypes was taken from the backend back when it was developed. I will keep that in mind, if I get back to moving the timestamp/datetime code to src/common. It's not a high priority item right now. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support for REINDEX CONCURRENTLY
On 2013-03-07 02:34:54 +0900, Fujii Masao wrote:
On Thu, Mar 7, 2013 at 2:17 AM, Andres Freund and...@2ndquadrant.com wrote:

Indexes:
    "hoge_pkey" PRIMARY KEY, btree (i)
    "hoge_pkey_cct" PRIMARY KEY, btree (i) INVALID
    "hoge_pkey_cct1" PRIMARY KEY, btree (i) INVALID
    "hoge_pkey_cct_cct" PRIMARY KEY, btree (i)

Huh, why did that go through? It should have errored out?

I'm not sure why. Anyway, hoge_pkey_cct_cct should not appear or should be marked as invalid, I think.

Hm. Yea. I am still not sure yet why hoge_pkey_cct_cct sprang into existence, but that hoge_pkey_cct1 springs into existence makes sense.

I see a problem here: there is a moment between phase 3 and 4 where both the old and the new indexes are valid and ready. That's not good, because if we abort in that moment we essentially have doubled the amount of indexes. Options:

a) we live with it
b) we only mark the new index as valid within phase 4. That should be fine I think?
c) we invent some other state to mark indexes that are in-progress to replace another one.

I guess b) seems fine?

+The recommended recovery method in such cases is to drop the concurrent
+index and try again to perform <command>REINDEX CONCURRENTLY</>.

If an invalid index depends on a constraint like a primary key, dropping the concurrent index cannot actually drop the index. In this case, you need to issue ALTER TABLE ... DROP CONSTRAINT ... to recover the situation. I think this information should be documented.

I think we just shouldn't set ->isprimary on the temporary indexes. Now we switch only the relfilenodes and not the whole index, that should be perfectly fine.

Sounds good. But what about other constraint cases like unique constraints? Can those other cases also be resolved by not setting ->isprimary?

Unique indexes can exist without a constraint attached, so that's fine. I need to read a bit more code to see whether it's safe to unset it, although indisexclusion and indimmediate might be more important.

Greetings,

Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Kevin Grittner kgri...@ymail.com wrote:
Tatsuo Ishii is...@postgresql.org wrote:

Was the remaining work on docs done? I would like to test MVs and am waiting for the docs to be completed.

I think they are done. If you notice anything missing or in need of clarification please let me know. At this point missing docs would be a bug in need of a fix.

I decided to take another pass through to try to spot anyplace I might have missed. I found that I had missed documenting the new pg_matviews system view, so I have just pushed that.

I also think that something should be done about the documentation for indexes. Right now that always refers to a "table". It would clearly be awkward to change that to "table or materialized view" everywhere. I wonder if most of those should be changed to "relation" with a few mentions that the relation could be a table or a materialized view, or whether some less intrusive change would be better. Opinions welcome.

-- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support for REINDEX CONCURRENTLY
On 2013-03-07 05:26:31 +0900, Michael Paquier wrote:
On Thu, Mar 7, 2013 at 2:34 AM, Fujii Masao masao.fu...@gmail.com wrote:
On Thu, Mar 7, 2013 at 2:17 AM, Andres Freund and...@2ndquadrant.com wrote:

Indexes:
    "hoge_pkey" PRIMARY KEY, btree (i)
    "hoge_pkey_cct" PRIMARY KEY, btree (i) INVALID
    "hoge_pkey_cct1" PRIMARY KEY, btree (i) INVALID
    "hoge_pkey_cct_cct" PRIMARY KEY, btree (i)

Huh, why did that go through? It should have errored out?

I'm not sure why. Anyway, hoge_pkey_cct_cct should not appear or should be marked as invalid, I think.

CHECK_FOR_INTERRUPTS were not added at each phase and they are needed in case the process is interrupted by the user. This has been mentioned in a past review but it was missing, so it might have slipped out during a refactoring or something. Btw, I am surprised to see that this *_cct_cct index has been created knowing that hoge_pkey_cct is invalid. I tried with the latest version of the patch and even the patch attached but couldn't reproduce it.

The strange thing about hoge_pkey_cct_cct is that it seems to imply that an invalid index was reindexed concurrently? But I don't see how it could happen either. Fujii, can you reproduce it?

+The recommended recovery method in such cases is to drop the concurrent
+index and try again to perform <command>REINDEX CONCURRENTLY</>.

If an invalid index depends on a constraint like a primary key, dropping the concurrent index cannot actually drop the index. In this case, you need to issue ALTER TABLE ... DROP CONSTRAINT ... to recover the situation. I think this information should be documented.

I think we just shouldn't set ->isprimary on the temporary indexes. Now we switch only the relfilenodes and not the whole index, that should be perfectly fine.

Sounds good. But what about other constraint cases like unique constraints? Can those other cases also be resolved by not setting ->isprimary?

We should stick with the concurrent index being a twin of the index it rebuilds for consistency.

I don't think it's legal. We cannot simply have two indexes with 'indisprimary'. Especially not if both are valid. Also, there will be no pg_constraint row that refers to it, which violates very valid expectations that both users and pg may have.

Also, I think that it is important from the session viewpoint to perform a swap with 2 valid indexes. If the process fails just before swapping indexes, the user might want to do that himself and drop the old index, then use the concurrent one.

The most likely outcome will be to rerun REINDEX CONCURRENTLY. Which will then reindex one more index since it now has the old valid index and the new valid index. Also, I don't think it's fair game to expose indexes that used to belong to a constraint, without a constraint supporting them, as valid indexes.

Greetings,

Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On Mar 6, 2013, at 1:51 PM, Kevin Grittner kgri...@ymail.com wrote: I also think that something should be done about the documentation for indexes. Right now that always refers to a table. It would clearly be awkward to change that to table or materialized view everywhere. I wonder if most of thosse should be changed to relation with a few mentions that the relation could be a table or a materialized view, or whether some less intrusive change would be better. Opinions welcome. Isn’t a materialized view really just a table that gets updated periodically? And isn’t a non-matierialized view also thought of as a “relation”? If the answer to both those questions is “yes,” I think the term should remain “table,” with a few mentions that the term includes materialized views (and excludes foreign tables). Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Kevin Grittner kgri...@ymail.com wrote: Tatsuo Ishii is...@postgresql.org wrote: Was the remaining work on docs done? I would like to test MVs and am waiting for the docs completed. I think they are done. If you notice anything missing or in need of clarification please let me know. At this point missing docs would be a bug in need of a fix. Ok. I decided to take another pass through to try to spot anyplace I might have missed. I found that I had missed documenting the new pg_matviews system view, so I have just pushed that. I also think that something should be done about the documentation for indexes. Right now that always refers to a table. It would clearly be awkward to change that to table or materialized view everywhere. I wonder if most of thosse should be changed to relation with a few mentions that the relation could be a table or a materialized view, or whether some less intrusive change would be better. Opinions welcome. You might want to add small description about MVs to Tutorial documentation 3.2 Views. Here is the current description of views in the doc. - 3.2. Views Refer back to the queries in Section 2.6. Suppose the combined listing of weather records and city location is of particular interest to your application, but you do not want to type the query each time you need it. You can create a view over the query, which gives a name to the query that you can refer to like an ordinary table: CREATE VIEW myview AS SELECT city, temp_lo, temp_hi, prcp, date, location FROM weather, cities WHERE city = name; SELECT * FROM myview; Making liberal use of views is a key aspect of good SQL database design. Views allow you to encapsulate the details of the structure of your tables, which might change as your application evolves, behind consistent interfaces. Views can be used in almost any place a real table can be used. Building views upon other views is not uncommon. - -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
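Along those lines, a possible tutorial-style follow-up paragraph could show the materialized counterpart of the quoted example (a sketch only, reusing the tables from the quoted tutorial text):

    CREATE MATERIALIZED VIEW mymatview AS
        SELECT city, temp_lo, temp_hi, prcp, date, location
        FROM weather, cities
        WHERE city = name;

    SELECT * FROM mymatview;

    -- unlike a plain view, the stored result is only updated on demand
    REFRESH MATERIALIZED VIEW mymatview;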
Re: [HACKERS] Enabling Checksums
Robert, We've had a few EnterpriseDB customers who have had fantastically painful experiences with PostgreSQL + ZFS. Supposedly, aligning the ZFS block size to the PostgreSQL block size is supposed to make these problems go away, but in my experience it does not have that effect. So I think telling people who want checksums go use ZFS is a lot like telling them oh, I see you have a hangnail, we recommend that you solve that by cutting your arm off with a rusty saw. Wow, what platform are you using ZFS on? (we have a half-dozen clients on ZFS ...) -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Wed, Mar 6, 2013 at 2:14 PM, Josh Berkus j...@agliodbs.com wrote: Based on Smith's report, I consider (2) to be a deal-killer right now. I was pretty depressed by those numbers, too. The level of overhead reported by him would prevent the users I work with from ever employing checksums on production systems. Agreed. Specifically, the writing checksums for a read-only query is a defect I think is prohibitively bad. That particular part doesn't bother me so much as some of the others - but let's step back and look at the larger issue. I suspect we can all agree that the performance of this feature is terrible. The questions I think we should be asking are: 1. Are the problems fundamental, or things where we can reasonable foresee future improvement? The latter situation wouldn't bother me very much even if the current situation is pretty bad, but if there's no real hope of improvement, that's more of a problem. 2. Are the performance results sufficiently bad that we think this would be more of a liability than an asset? When we first talked about this feature for 9.2, we were going to exclude hint bits from checksums, in order to avoid this issue; what happened to that? I don't think anyone ever thought that was a particularly practical design. I certainly don't. (FWIW, I still support the idea of moving hint bits to a separate filehandle, as we do with the FSM, but clearly that's not happening for 9.3 ...) Or, most likely, ever. The whole benefit of hint bits is that the information you need is available in the same bytes you have to read anyway. Moving the information to another fork (not filehandle) would probably give up most of the benefit. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Wed, Mar 6, 2013 at 6:00 PM, Josh Berkus j...@agliodbs.com wrote: We've had a few EnterpriseDB customers who have had fantastically painful experiences with PostgreSQL + ZFS. Supposedly, aligning the ZFS block size to the PostgreSQL block size is supposed to make these problems go away, but in my experience it does not have that effect. So I think telling people who want checksums go use ZFS is a lot like telling them oh, I see you have a hangnail, we recommend that you solve that by cutting your arm off with a rusty saw. Wow, what platform are you using ZFS on? (we have a half-dozen clients on ZFS ...) Not us, customers. But as to platform, I have yet to run across anyone running ZFS on anything but Solaris. I'd be interested to hear your experiences. Mine rhyme with sun a play dreaming. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On 03/06/2013 03:06 PM, Robert Haas wrote: On Wed, Mar 6, 2013 at 6:00 PM, Josh Berkus j...@agliodbs.com wrote: We've had a few EnterpriseDB customers who have had fantastically painful experiences with PostgreSQL + ZFS. Supposedly, aligning the ZFS block size to the PostgreSQL block size is supposed to make these problems go away, but in my experience it does not have that effect. So I think telling people who want checksums go use ZFS is a lot like telling them oh, I see you have a hangnail, we recommend that you solve that by cutting your arm off with a rusty saw. Wow, what platform are you using ZFS on? (we have a half-dozen clients on ZFS ...) Not us, customers. But as to platform, I have yet to run across anyone running ZFS on anything but Solaris. I'd be interested to hear your experiences. Mine rhyme with sun a play dreaming. I would guess he meant on X86_64 or Sparc. JD -- Command Prompt, Inc. - http://www.commandprompt.com/ PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC @cmdpromptinc - 509-416-6579 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
Andres Freund and...@2ndquadrant.com writes: On 2013-03-06 11:21:21 -0500, Garick Hamlin wrote: If picking a CRC why not a short optimal one rather than truncate CRC32C? CRC32C is available in hardware since SSE4.2. I think that should be at most a fourth-order consideration, since we are not interested solely in Intel hardware, nor do we have any portable way of getting at such a feature even if the hardware has it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] odd behavior in materialized view
Fujii Masao masao.fu...@gmail.com wrote:
On Tue, Mar 5, 2013 at 7:36 AM, Kevin Grittner kgri...@ymail.com wrote:
Fujii Masao masao.fu...@gmail.com wrote:

When I accessed the materialized view in the standby server, I got the following ERROR message. Looks odd to me. Is this a bug?

ERROR:  materialized view hogeview has not been populated
HINT:  Use the REFRESH MATERIALIZED VIEW command.

The procedure to reproduce this error message is:

In the master server:
CREATE TABLE hoge (i int);
INSERT INTO hoge VALUES (generate_series(1,100));
CREATE MATERIALIZED VIEW hogeview AS SELECT * FROM hoge;
DELETE FROM hoge;
REFRESH MATERIALIZED VIEW hogeview;
SELECT count(*) FROM hogeview;

In the standby server:
SELECT count(*) FROM hogeview;

SELECT count(*) goes well in the master, and expectedly returns 0. OTOH, in the standby, it emits the error message.

Will investigate.

Thanks!

And I found another problem. When I ran the following SQLs in the master, a PANIC error occurred in the standby.

CREATE TABLE hoge (i int);
INSERT INTO hoge VALUES (generate_series(1,100));
CREATE MATERIALIZED VIEW hogeview AS SELECT * FROM hoge;
VACUUM ANALYZE;

The PANIC error messages that I got in the standby are:

WARNING:  page 0 of relation base/12297/16387 is uninitialized
CONTEXT:  xlog redo visible: rel 1663/12297/16387; blk 0
PANIC:  WAL contains references to invalid pages
CONTEXT:  xlog redo visible: rel 1663/12297/16387; blk 0

base/12297/16387 is the file of the materialized view 'hogeview'.

I was able to replicate both bugs, and they both appear to be fixed by the attached, which I have just pushed.

-- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

*** a/src/backend/commands/matview.c
--- b/src/backend/commands/matview.c
***************
*** 14,19 ****
--- 14,20 ----
   */
  #include "postgres.h"
+ #include "access/heapam_xlog.h"
  #include "access/multixact.h"
  #include "access/relscan.h"
  #include "access/xact.h"
***************
*** 68,77 ****
  SetRelationIsScannable(Relation relation)
  	Assert(relation->rd_rel->relkind == RELKIND_MATVIEW);
  	Assert(relation->rd_isscannable == false);

- 	RelationOpenSmgr(relation);
  	page = (Page) palloc(BLCKSZ);
  	PageInit(page, BLCKSZ, 0);
  	smgrextend(relation->rd_smgr, MAIN_FORKNUM, 0, (char *) page, true);
  	pfree(page);
  	smgrimmedsync(relation->rd_smgr, MAIN_FORKNUM);
--- 69,83 ----
  	Assert(relation->rd_rel->relkind == RELKIND_MATVIEW);
  	Assert(relation->rd_isscannable == false);

  	page = (Page) palloc(BLCKSZ);
  	PageInit(page, BLCKSZ, 0);
+
+ 	if (RelationNeedsWAL(relation))
+ 		log_newpage(&(relation->rd_node), MAIN_FORKNUM, 0, page);
+
+ 	RelationOpenSmgr(relation);
  	smgrextend(relation->rd_smgr, MAIN_FORKNUM, 0, (char *) page, true);
+
  	pfree(page);
  	smgrimmedsync(relation->rd_smgr, MAIN_FORKNUM);

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Small patch for CREATE TRIGGER documentation
I found this sentence somewhat confusing: It is possible for a column's value to change even when the trigger is not fired, http://www.postgresql.org/docs/devel/static/sql-createtrigger.html#SQL-CREATETRIGGER-NOTES More precise would be something along the lines It is possible that changes to the column's value will not cause the trigger to be fired. The attached patch hopefully rewords the entire paragraph for clarity. Regards Ian Barwick create-trigger-doc-notes-2013-03-07.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On 03/06/2013 07:34 PM, Heikki Linnakangas wrote: It'd be difficult to change the algorithm in a future release without breaking on-disk compatibility, On-disk compatibility is broken with major releases anyway, so I don't see this as a huge barrier. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On 2013-03-07 08:37:40 +0800, Craig Ringer wrote: On 03/06/2013 07:34 PM, Heikki Linnakangas wrote: It'd be difficult to change the algorithm in a future release without breaking on-disk compatibility, On-disk compatibility is broken with major releases anyway, so I don't see this as a huge barrier. Uh, pg_upgrade? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On 3/6/13 1:34 PM, Robert Haas wrote: We've had a few EnterpriseDB customers who have had fantastically painful experiences with PostgreSQL + ZFS. Supposedly, aligning the ZFS block size to the PostgreSQL block size is supposed to make these problems go away, but in my experience it does not have that effect. There are a couple of major tuning issues you have to get right for good ZFS performance, like its tendency to gobble more RAM than is necessarily appropriate for a PostgreSQL host. If you nail down all those and carefully setup everything it can work OK. When Sun had a bunch of good engineers working on the problem they certainly pulled it off. I managed a 3TB database on a ZFS volume for a while myself. Being able to make filesystem snapshots cleanly and easily was very nice. As for the write performance implications of COW, though, at a couple of points I was only able to keep that system ingesting data fast enough if I turned fsync off :( It's not as if even ZFS makes all the filesystem issues the database worries about go away either. Take a look at http://www.c0t0d0s0.org/archives/6071-No,-ZFS-really-doesnt-need-a-fsck.html as an example. That should leave you with a healthy concern over ZFS handling of power interruption and lying drives. [NTFS and ext3] have the same problem, but it has different effects, that aren't as visible as in ZFS. ext4 actually fixed this for most hardware though, and I believe ZFS still has the same uberblock concern. ZFS reliability and its page checksums are good, but they're not magic for eliminating torn page issues. Normally I would agree with Heikki's theory of let's wait a few years and see if the filesystem will take care of it idea. But for me, the when do we get checksums? clock started ticking in 2006 when ZFS popularized its implementation, and now it's gone off and it keeps ringing at new places. I would love it if FreeBSD had caught a massive popularity wave in the last few years, so ZFS was running in a lot more places. Instead what I keep seeing is deployments Linux with filesystem choices skewed toward conservative. Forget about the leading edge--I'd be happy if I could get one large customer to migrate off of ext3... -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support for REINDEX CONCURRENTLY
On Thu, Mar 7, 2013 at 7:19 AM, Andres Freund and...@2ndquadrant.comwrote: On 2013-03-07 05:26:31 +0900, Michael Paquier wrote: On Thu, Mar 7, 2013 at 2:34 AM, Fujii Masao masao.fu...@gmail.com wrote: On Thu, Mar 7, 2013 at 2:17 AM, Andres Freund and...@2ndquadrant.com wrote: Indexes: hoge_pkey PRIMARY KEY, btree (i) hoge_pkey_cct PRIMARY KEY, btree (i) INVALID hoge_pkey_cct1 PRIMARY KEY, btree (i) INVALID hoge_pkey_cct_cct PRIMARY KEY, btree (i) Huh, why did that go through? It should have errored out? I'm not sure why. Anyway hoge_pkey_cct_cct should not appear or should be marked as invalid, I think. CHECK_FOR_INTERRUPTS were not added at each phase and they are needed in case process is interrupted by user. This has been mentioned in a pas review but it was missing, so it might have slipped out during a refactoring or smth. Btw, I am surprised to see that this *_cct_cct index has been created knowing that hoge_pkey_cct is invalid. I tried with the latest version of the patch and even the patch attached but couldn't reproduce it. The strange think about hoge_pkey_cct_cct is that it seems to imply that an invalid index was reindexed concurrently? But I don't see how it could happen either. Fujii, can you reproduce it? Curious about that also. +The recommended recovery method in such cases is to drop the concurrent +index and try again to perform commandREINDEX CONCURRENTLY/. If an invalid index depends on the constraint like primary key, drop the concurrent index cannot actually drop the index. In this case, you need to issue alter table ... drop constraint ... to recover the situation. I think this informataion should be documented. I think we just shouldn't set -isprimary on the temporary indexes. Now we switch only the relfilenodes and not the whole index, that should be perfectly fine. Sounds good. But, what about other constraint case like unique constraint? Those other cases also can be resolved by not setting -isprimary? We should stick with the concurrent index being a twin of the index it rebuilds for consistency. I don't think its legal. We cannot simply have two indexes with 'indisprimary'. Especially not if bot are valid. Also, there will be no pg_constraint row that refers to it which violates very valid expectations that both users and pg may have. So what to do with that? Mark the concurrent index as valid, then validate it and finally mark it as invalid inside the same transaction at phase 4? That's moving 2 lines of code... -- Michael
Re: [HACKERS] Enabling Checksums
On 03/07/2013 08:41 AM, Andres Freund wrote: On 2013-03-07 08:37:40 +0800, Craig Ringer wrote: On 03/06/2013 07:34 PM, Heikki Linnakangas wrote: It'd be difficult to change the algorithm in a future release without breaking on-disk compatibility, On-disk compatibility is broken with major releases anyway, so I don't see this as a huge barrier. Uh, pg_upgrade? Yeah. I was thinking that pg_upgrade copes with a lot of incompatibilities already, but this is lower-level. Darn. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support for REINDEX CONCURRENTLY
On Thu, Mar 7, 2013 at 9:48 AM, Michael Paquier michael.paqu...@gmail.comwrote: On Thu, Mar 7, 2013 at 7:19 AM, Andres Freund and...@2ndquadrant.comwrote: On 2013-03-07 05:26:31 +0900, Michael Paquier wrote: On Thu, Mar 7, 2013 at 2:34 AM, Fujii Masao masao.fu...@gmail.com wrote: On Thu, Mar 7, 2013 at 2:17 AM, Andres Freund and...@2ndquadrant.com wrote: Indexes: hoge_pkey PRIMARY KEY, btree (i) hoge_pkey_cct PRIMARY KEY, btree (i) INVALID hoge_pkey_cct1 PRIMARY KEY, btree (i) INVALID hoge_pkey_cct_cct PRIMARY KEY, btree (i) Huh, why did that go through? It should have errored out? I'm not sure why. Anyway hoge_pkey_cct_cct should not appear or should be marked as invalid, I think. CHECK_FOR_INTERRUPTS were not added at each phase and they are needed in case process is interrupted by user. This has been mentioned in a pas review but it was missing, so it might have slipped out during a refactoring or smth. Btw, I am surprised to see that this *_cct_cct index has been created knowing that hoge_pkey_cct is invalid. I tried with the latest version of the patch and even the patch attached but couldn't reproduce it. The strange think about hoge_pkey_cct_cct is that it seems to imply that an invalid index was reindexed concurrently? But I don't see how it could happen either. Fujii, can you reproduce it? Curious about that also. +The recommended recovery method in such cases is to drop the concurrent +index and try again to perform commandREINDEX CONCURRENTLY/. If an invalid index depends on the constraint like primary key, drop the concurrent index cannot actually drop the index. In this case, you need to issue alter table ... drop constraint ... to recover the situation. I think this informataion should be documented. I think we just shouldn't set -isprimary on the temporary indexes. Now we switch only the relfilenodes and not the whole index, that should be perfectly fine. Sounds good. But, what about other constraint case like unique constraint? Those other cases also can be resolved by not setting -isprimary? We should stick with the concurrent index being a twin of the index it rebuilds for consistency. I don't think its legal. We cannot simply have two indexes with 'indisprimary'. Especially not if bot are valid. Also, there will be no pg_constraint row that refers to it which violates very valid expectations that both users and pg may have. So what to do with that? Mark the concurrent index as valid, then validate it and finally mark it as invalid inside the same transaction at phase 4? That's moving 2 lines of code... Sorry phase 4 is the swap phase. Validation happens at phase 3. -- Michael
Re: [HACKERS] Strange Windows problem, lock_timeout test request
On Wed, Feb 27, 2013 at 8:58 AM, Stephen Frost sfr...@snowman.net wrote: * Boszormenyi Zoltan (z...@cybertec.at) wrote: But unlike statement_timeout, with lock_timeout_stmt the statement can still finish after this limit as it does useful work besides waiting for locks. It's still entirely possible to get 99% done and then hit that last tuple that you need a lock on and just tip over the lock_timeout_stmt limit due to prior waiting and ending up wasting a bunch of work, hence why I'm not entirely sure that this is that much better than statement_timeout. I tend to agree that this should be based on the length of any individual lock wait, not the cumulative duration of lock waits. Otherwise, it seems like it'll be very hard to set this to a meaningful value. For example, if you set this to 1 minute, and that means the length of any single wait, then you basically know that it'll only kick in if there is some other, long-running transaction that's holding the lock. But if it means the cumulative length of all waits, it's not so clear, because now you might also have this kick in if you wait for 100ms on 600 different occasions. In other words, complex queries that lock or update many tuples may get killed even if they never wait very long at all for any single lock. That seems like it will be almost indistinguishable from random, unprincipled query cancellations. Now, you could try to work around that by varying the setting based on the complexity of the query you're about to run, but that seems like a pain in the neck, to put it mildly. And it might still not give you the behavior that you want. Personally, I'd think a big part of the appeal of this is to make sure that you don't hang waiting for a RELATION level lock for too long before giving up. And for that, scaling with the complexity of the query would be exactly the wrong thing to do, even if you could figure out some system for it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.3] OAT_POST_ALTER object access hooks
On Sun, Jan 27, 2013 at 1:55 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: The part-1 patch adds catalog/objectaccess.c to have entrypoints of object_access_hook, instead of simple macro definition, to simplify invocations with arguments. It is just a replacement of existing OAT_POST_CREATE and OAT_DROP invocation with new style, Looks good. Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On 3/4/13 7:04 PM, Daniel Farina wrote: Corruption has easily occupied more than one person-month of time last year for us. Just FYI for anyone that's experienced corruption... we've looked into doing row-level checksums at work. The only challenge we ran into was how to check them when reading data back. I don't remember the details but there was an issue with doing this via SELECT rules. It would be possible if you were willing to put writable views on all your tables (which isn't actually as horrible as it sounds; it wouldn't be hard to write a function to automagically do that for you). -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
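For what it's worth, a rough sketch of one way such a per-row checksum could be wired up (hypothetical table and column names, not the scheme referred to above; the writable-view or trigger plumbing needed to maintain the checksum on writes is omitted):

    -- store an md5 of the payload next to it and verify it through a view on read
    CREATE TABLE accounts_raw (
        id      integer PRIMARY KEY,
        balance numeric,
        row_md5 text
    );

    CREATE VIEW accounts AS
        SELECT id, balance,
               row_md5 = md5(id::text || '|' || balance::text) AS checksum_ok
        FROM accounts_raw;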
Re: [HACKERS] Enabling Checksums
On 3/6/13 1:14 PM, Josh Berkus wrote: There may be good reasons to reject this patch. Or there may not. But I completely disagree with the idea that asking them to solve the problem at the filesystem level is sensible. Yes, can we get back to the main issues with the patch? 1) argument over whether the checksum is sufficient to detect most errors, or if it will give users false confidence. 2) performance overhead. Based on Smith's report, I consider (2) to be a deal-killer right now. The level of overhead reported by him would prevent the users I work with from ever employing checksums on production systems. FWIW, the write workload most likely wouldn't be a problem for us. I am concerned about the reported 24-32% hit when reading back in from FS cache... that might kill this for us. I'm working on doing a test to see how bad it actually is for us... but getting stuff like that done at work is like pulling teeth, so we'll see... Specifically, the writing checksums for a read-only query is a defect I think is prohibitively bad. When we first talked about this feature for 9.2, we were going to exclude hint bits from checksums, in order to avoid this issue; what happened to that? (FWIW, I still support the idea of moving hint bits to a separate filehandle, as we do with the FSM, but clearly that's not happening for 9.3 ...) +1 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Using indexes for partial index builds
On 2/2/13 4:05 AM, Paul Norman wrote: Hello, After a discussion on IRC in #postgresql, I had a feature suggestion and it was suggested I write it up here. I have a large (200GB, 1.7b rows) table with a number of columns, but the two of interest here are a hstore column, tags and a postgis geometry column, geom. There is a GIN index on tags and a gist index on geom. These took about 36-48 hours to build in total. Obviously index building on a table this size is not trivial. Periodically I want to do a number of specialized queries on objects with a particular tag or in a particular area. To do this I often want to create a partial index. For example, I created the index btree ((tags -> 'name_1'::text) text_pattern_ops) WHERE tags ? 'name_1'::text. My understanding is to create this index PostgreSQL does a scan of the entire table, even though the GIN index on tags could be used to identify which rows could belong in the index. Where the WHERE condition selects only a small portion of the table this is scanning a lot more data than is necessary. Another case where it would be useful is when I am conducting a detailed analysis of some aspect of the rows in a particular city. This leads to all the queries being of the form SELECT ... FROM ... WHERE is_in_my_area(geom)[1]. My current project is doing analysis involving addresses. The ability to create an index like btree((tags -> 'addr:housenumber'), (tags -> 'addr:street'), (tags -> 'addr:city')) WHERE is_in_my_area(geom) in a reasonable time would allow me to use a view instead of copying the local area to a temporary table and indexing that table. The local area is about 350k rows, or about 0.02% of the database. [1] The actual function for determining if it's in my area is long and not really essential to the point here. Something worth considering on this... I suspect it's possible to use an index-only scan to do this, regardless of whether the heap page is all visible. The reason is that the newly created index would just use the same access methodology as the original index, so any dead rows would be ignored. We'd almost certainly need to block vacuums during the build however. Obviously not an issue for regular index builds, but it would be for concurrent ones. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal 9.4 plpgsql: allows access to call stack from GET DIAGNOSTICS statement
On 2/2/13 3:23 AM, Pavel Stehule wrote: Hello I propose enhancing the GET DIAGNOSTICS statement with a new field, PG_CONTEXT. It is similar to GET STACKED DIAGNOSTICS' PG_EXCEPTION_CONTEXT. The motivation for this proposal is the possibility of getting the call stack for debugging without raising an exception. This code is based on cleaned-up code from Orafce, where it has been used for four years without any error reports.

CREATE OR REPLACE FUNCTION public.inner(integer)
 RETURNS integer
 LANGUAGE plpgsql
AS $function$
declare _context text;
begin
  get diagnostics _context = pg_context;
  raise notice '***%***', _context;
  return 2 * $1;
end;
$function$

postgres=# select outer_outer(10);
NOTICE: ***PL/pgSQL function inner(integer) line 4 at GET DIAGNOSTICS
PL/pgSQL function outer(integer) line 3 at RETURN
PL/pgSQL function outer_outer(integer) line 3 at RETURN***
CONTEXT: PL/pgSQL function outer(integer) line 3 at RETURN
PL/pgSQL function outer_outer(integer) line 3 at RETURN

That could *really* stand for some indentation in the output instead of the *** business. But other than that, this definitely seems useful. I had no idea _context was even an option... :/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.3] OAT_POST_ALTER object access hooks
On Sun, Jan 27, 2013 at 1:55 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: The part-2 patch adds new OAT_POST_ALTER event type, and its relevant permission checks on contrib/sepgsql. This documentation hunk is unclear: +On <xref linkend="sql-createfunction">, <literal>install</> permission +will be checked if <literal>leakproof</> attribute was given, not only +<literal>create</> on the new function. Based on previous experience reading your patches, I'm guessing that what you actually mean is that both things are checked, but the wording doesn't make that clear. Also, if permissions are now checked on functions, doesn't the previous sentence need an update? +In addition, <literal>add_name</> and <literal>remove_name</> permission +will be checked towards relevant schema when we try to rename or set +new schema on the altered object. Suggest: In addition, <literal>remove_name</> and <literal>add_name</> will be checked on the old and new schemas, respectively, when an object is moved to a new schema. +A few additional checks are applied depending on object types. For certain object types, additional checks are performed. +On <xref linkend="sql-alterfunction">, <literal>install</> permission +will be checked if <literal>leakproof</> attribute was turned on, not +only <literal>setattr</> on the new function. This is a near-duplicate of the previous hunk and suffers from the same awkwardness. + * is_internal: TRUE if constraint is constructed unless user's intention I dunno what this means. What's the difference between an internal constraint and a non-internal constraint, and why do we need that distinction? This seems to percolate to a lot of places; I'd rather not do that without a real good reason. + /* XXX - should be checked at caller side */ XXX should be used only for things that really ought to be revisited and changed. See the wording I used in the just-committed part 1 patch. +#include "catalog/objectaccess.h" This is the only hunk in collationcmds.c, hence presumably not needed. + /* Post create hook of this access method operator */ + InvokeObjectPostCreateHook(AccessMethodOperatorRelationId, + entryoid, 0); I suggest uniformly adding a blank line before each of these hunks, rather than adding it for some and not others. I think, though, that we could probably ditch the comments throughout. They don't add anything, really. @@ -3330,7 +3342,6 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel, */ break; case AT_SetTableSpace: /* SET TABLESPACE */ - /* * Nothing to do here; Phase 3 does the work */ Spurious whitespace hunk. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On 3/6/13 6:34 AM, Heikki Linnakangas wrote: Another thought is that perhaps something like CRC32C would be faster to calculate on modern hardware, and could be safely truncated to 16-bits using the same technique you're using to truncate the Fletcher's Checksum. Greg's tests showed that the overhead of CRC calculation is significant in some workloads, so it would be good to spend some time to optimize that. It'd be difficult to change the algorithm in a future release without breaking on-disk compatibility, so let's make sure we pick the best one. Simon sent over his first rev of this using a quick to compute 16 bit checksum as a reasonable trade-off, one that it's possible to do right now. It's not optimal in a few ways, but it catches single bit errors that are missed right now, and Fletcher-16 computes quickly and without a large amount of code. It's worth double-checking that the code is using the best Fletcher-16 approach available. I've started on that, but I'm working on your general performance concerns first, with the implementation that's already there. From what I've read so far, I think picking Fletcher-16 instead of the main alternative, CRC-16-IBM AKA CRC-16-ANSI, is a reasonable choice. There's a good table showing the main possibilities here at https://en.wikipedia.org/wiki/Cyclic_redundancy_check One day I hope that in-place upgrade learns how to do page format upgrades, with the sort of background conversion tools and necessary tracking metadata we've discussed for that work. When that day comes, I would expect it to be straightforward to upgrade pages from 16 bit Fletcher checksums to 32 bit CRC-32C ones. Ideally we would be able to jump on the CRC-32C train today, but there's nowhere to put all 32 bits. Using a Fletcher 16 bit checksum for 9.3 doesn't prevent the project from going that way later though, once page header expansion is a solved problem. The problem with running CRC32C in software is that the standard fast approach uses a slicing technique that requires a chunk of pre-computed data be around, a moderately large lookup table. I don't see that there's any advantage to having all that baggage around if you're just going to throw away half of the result anyway. More on CRC32Cs in my next message. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
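For reference, the core of a textbook Fletcher-16 looks like the C sketch below. This is the generic algorithm only, not the code in Simon's patch (which works over 8K pages and differs in detail); the "best approach available" question above is largely about deferring the modulo-255 reductions so that many bytes are processed per division:

#include <stdint.h>
#include <stddef.h>

/* Naive byte-at-a-time Fletcher-16; illustrative only. */
static uint16_t
fletcher16(const uint8_t *buf, size_t len)
{
    uint32_t    sum1 = 0;       /* simple running sum, modulo 255 */
    uint32_t    sum2 = 0;       /* sum of sums, makes the check order-sensitive */

    for (size_t i = 0; i < len; i++)
    {
        sum1 = (sum1 + buf[i]) % 255;
        sum2 = (sum2 + sum1) % 255;
    }
    return (uint16_t) ((sum2 << 8) | sum1);
}

Optimized variants accumulate runs of bytes before reducing modulo 255, which is where most of the speed difference between implementations comes from.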
Re: [HACKERS] Enabling Checksums
On 3/6/13 1:24 PM, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2013-03-06 11:21:21 -0500, Garick Hamlin wrote: If picking a CRC why not a short optimal one rather than truncate CRC32C? CRC32C is available in hardware since SSE4.2. I think that should be at most a fourth-order consideration, since we are not interested solely in Intel hardware, nor do we have any portable way of getting at such a feature even if the hardware has it. True, but that situation might actually improve. The Castagnoli CRC-32C that's accelerated on the better Intel CPUs is also used to protect iSCSI and SCTP (a streaming protocol). And there is an active project to use a CRC32C to checksum ext4 metadata blocks on Linux: https://ext4.wiki.kernel.org/index.php/Ext4_Metadata_Checksums https://groups.google.com/forum/?fromgroups=#!topic/linux.kernel/APKfoMzjgdY Now, that project doesn't make the Postgres feature obsolete, because there's nowhere to put checksum data for every block on ext4 without whacking block alignment. The filesystem can't make an extra 32 bits appear on every block any more than we can. It's using a similar trick to the PG checksum feature, grabbing some empty space just for the metadata then shoving the CRC32C into there. But the fact that this is going on means that there are already Linux kernel modules built with both software/hardware accelerated versions of the CRC32C function. And the iSCSI/SCTP use cases mean it's not out of the question this will show up in other useful forms one day. Maybe two years from now, there will be a common Linux library that autoconf can find to compute the CRC for us--with hardware acceleration when available, in software if not. The first of those ext4 links above even discusses the exact sort of issue we're facing. The author wonders if the easiest way to proceed for 16 bit checksums is to compute the CRC32C, then truncate it, simply because CRC32C creation is so likely to get hardware help one day. I think that logic doesn't apply to the PostgreSQL case as strongly though, as the time before we can expect a hardware accelerated version to be available is much further off than it is for a Linux kernel developer. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
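As a rough illustration of what the hardware path looks like, the SSE4.2 CRC32 instruction is exposed through compiler intrinsics, and truncating the result to 16 bits is just a mask. The sketch below assumes an SSE4.2-capable CPU and a GCC/Clang build with -msse4.2, and it is byte-at-a-time for clarity; real code would use the 8-byte form and need a runtime software fallback, which is exactly the portability concern raised above:

#include <stdint.h>
#include <stddef.h>
#include <nmmintrin.h>          /* SSE4.2 intrinsics */

/* Hardware-assisted CRC-32C, truncated to 16 bits; illustrative only. */
static uint16_t
crc32c_trunc16(const uint8_t *buf, size_t len)
{
    uint32_t    crc = 0xFFFFFFFF;   /* conventional initial value */

    for (size_t i = 0; i < len; i++)
        crc = _mm_crc32_u8(crc, buf[i]);

    crc ^= 0xFFFFFFFF;              /* final inversion */
    return (uint16_t) (crc & 0xFFFF);   /* keep only the low half */
}

Whether throwing away half of a hardware-computed CRC-32C beats a well-tuned Fletcher-16 in software is the trade-off under discussion here.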
Re: [HACKERS] Enabling Checksums
On Wed, Mar 6, 2013 at 11:04 PM, Robert Haas robertmh...@gmail.com wrote: When we first talked about this feature for 9.2, we were going to exclude hint bits from checksums, in order to avoid this issue; what happened to that? I don't think anyone ever thought that was a particularly practical design. I certainly don't. Really? I thought it was pretty much the consensus for a good while. The main problem it ran into was that we kept turning up hint bits that we didn't realize we had. Index line pointers turned out to have hint bits, page headers have one, and so on. As long as it was just the heap page per-tuple transaction hint bits it seemed plausible to just skip them or move them all to a contiguous block. Once it started to look like the checksumming code had to know about every data structure on every page it seemed a bit daunting. But that wasn't something we realized for quite a long time. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
TL;DR summary: on a system I thought was a fair middle of the road server, pgbench tests are averaging about a 2% increase in WAL writes and a 2% slowdown when I turn on checksums. There are a small number of troublesome cases where that overhead rises to closer to 20%, an upper limit that's shown up in a few tests aiming to stress this feature now. On 3/4/13 10:09 PM, Jeff Davis wrote: = Test 2 - worst-case overhead for calculating checksum while reading data = Jeff saw an 18% slowdown, I get 24 to 32%. This one bothers me because the hit is going to happen during the very common situation where data is shuffling a lot between a larger OS cache and shared_buffers taking a relatively small fraction. I believe that test 1 and test 2 can be improved a little, if there is a need. Right now we copy the page and then calculate the checksum on the copy. If we instead calculate as we're copying, I believe it will make it significantly faster. It's good to know there's at least some ideas for optimizing this one further. I think the situation where someone has: shared_buffers < database < total RAM is fairly common for web applications. For people on Amazon EC2 instances for example, giving out the performance tuning advice of "get a bigger instance until the database fits in RAM" works amazingly well. If the hotspot of that data set fits in shared_buffers, those people will still be in good shape even with checksums enabled. If the hot working set is spread out more randomly, though, it's not impossible to see how they could suffer regularly from this ~20% OS cache->shared buffers movement penalty. Regardless, Jeff's three cases are good synthetic exercises to see worst-case behavior, but they are magnifying small differences. To see a more general case, I ran through a series of pgbench tests in its standard write mode. In order to be useful, I ended up using a system with a battery-backed write cache, but with only a single drive attached. I needed fsync to be fast to keep that from being the bottleneck. But I wanted physical I/O to be slow. I ran three test sets at various size/client loads: one without the BBWC (which I kept here because it gives some useful scale to the graphs), one with the baseline 9.3 code, and one with checksums enabled on the cluster. I did only basic postgresql.conf tuning:

checkpoint_segments | 64
shared_buffers | 2GB

There's two graphs comparing sets attached, you can see that the slowdown of checksums for this test is pretty minor. There is a clear gap between the two plots, but it's not a very big one, especially if you note how much difference a BBWC makes. I put the numeric results into a spreadsheet, also attached. There's so much noise in pgbench results that I found it hard to get a single number for the difference; they bounce around about +/-5% here. Averaging across everything gives a solid 2% drop when checksums are on that looked detectable above the noise. Things are worse on the bigger data sets. At the highest size I tested, the drop was more like 7%. The two larger size / low client count results I got were really bad, 25% and 16% drops. I think this is closing in on the range of things: perhaps only 2% when most of your data fits in shared_buffers, more like 10% if your database is bigger, and in the worst case 20% is possible. I don't completely trust those 25/16% numbers though, I'm going to revisit that configuration. The other thing I track now in pgbench-tools is how many bytes of WAL are written.
Since the total needs to be measured relative to work accomplished, the derived number that looks useful there is average bytes of WAL per transaction. On smaller databases this is around 6K, while larger databases topped out for me at around 22K WAL bytes/transaction. Remember that the pgbench transaction is several statements. Updates touch different blocks in pgbench_accounts, index blocks, and the small tables. The WAL increase from checksumming is a bit more consistent than the TPS rates. Many cases were 3 to 5%. There was one ugly case where it hit 30%, and I want to dig into where that came from more. On average, again it was a 2% increase over the baseline. Cases where you spew hint bit WAL data where before none were written (Jeff's test #3) remain a far worse performer than any of these. Since pgbench does a VACUUM before starting, none of those cases were encountered here though. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com attachment: clients-sets.png attachment: scaling-sets.png attachment: Checksum-pgbench.xls (MS-Excel spreadsheet) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Wed, Mar 6, 2013 at 8:17 PM, Greg Smith g...@2ndquadrant.com wrote: TL;DR summary: on a system I thought was a fair middle of the road server, pgbench tests are averaging about a 2% increase in WAL writes and a 2% slowdown when I turn on checksums. There are a small number of troublesome cases where that overhead rises to closer to 20%, an upper limit that's shown up in a few tests aiming to stress this feature now. I have only done some cursory research, but a cpu-time share of 20% seems to be expected for InnoDB's CRC computation [0]. Although a galling number, this comparison with other systems may be a way to see how much of that overhead is avoidable or just the price of entry. It's unclear how this 20% cpu-time compares to your above whole-system results, but it's enough to suggest that nothing comes for (nearly) free. [0]: http://mysqlha.blogspot.com/2009/05/innodb-checksum-performance.html -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposal for Allow postgresql.conf values to be changed via SQL [review]
On 3/5/13 9:07 AM, Amit Kapila wrote: In v11 patch, I have changed name of directory to config. For file name, currently I have not changed, but if you feel it needs to be changed, kindly suggest any one of above or if any other better you have in mind. This seems fine for now. Hashing out exactly which name is best is not going to be the thing that determines whether this can be committed or not. Your v11 applies cleanly for me too, and the code does look a lot nicer. The code diff itself is down to 1300 lines and not nearly as disruptive, the rest is docs or the many regression tests you've added. That's still not the sort of small change Tom was hoping was possible, but it's closer to the right range. It may not really be possible to make this much smaller. I'm out of time for today, but I'll try to do some functional review work on this tomorrow if no one beats me to it. I would find it very useful personally if this feature were committed, and we know there's plenty of user demand for it too. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On 3/7/13 12:15 AM, Daniel Farina wrote: I have only done some cursory research, but cpu-time of 20% seem to expected for InnoDB's CRC computation[0]. Although a galling number, this comparison with other systems may be a way to see how much of that overhead is avoidable or just the price of entry. It's unclear how this 20% cpu-time compares to your above whole-system results, but it's enough to suggest that nothing comes for (nearly) free. That does provide a useful measuring point: how long does the computation take compared to the memcpy that moves the buffer around. It looks like they started out with 3.2 memcpy worth of work, and with enough optimization ended up at 1.27 worth. The important thing to keep in mind is that shared_buffers works pretty well at holding on to the most frequently accessed information. A typical server I see will show pg_statio information suggesting 90%+ of block requests are coming from hits there, the rest misses suggesting a mix of OS cache and real disk reads. Let's say 90% are hits, 5% are fetches at this 20% penalty, and 5% are real reads where the checksum time is trivial compared to physical disk I/O. That works out to be a real average slowdown of 6%. I think way more deployments are going to be like that case, which matches most of my pgbench runs, than the worse case workloads. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposal for Allow postgresql.conf values to be changed via SQL [review]
On Thursday, March 07, 2013 10:54 AM Greg Smith wrote: On 3/5/13 9:07 AM, Amit Kapila wrote: In v11 patch, I have changed name of directory to config. For file name, currently I have not changed, but if you feel it needs to be changed, kindly suggest any one of above or if any other better you have in mind. This seems fine for now. Hashing out exactly which name is best is not going to be the thing that determines whether this can be committed or not. Your v11 applies cleanly for me too, and the code does look a lot nicer. The code diff itself is down to 1300 lines and not nearly as disruptive, the rest is docs or the many regression tests you've added. I also think the regression tests added may be more than required. Currently they are organized as below:

1. Three types of parameters to use with set persistent:
   a. parameters for which a new connection is required (log_connections)
   b. parameters for which a server restart is required
   c. parameters for which a reload is required
   I think the third category, c, has many parameters; maybe 3-4 parameters should be sufficient. The initial intention in keeping more parameters was to test it for various types (Boolean, integer, float, string).
2. Set some of the parameters to default, then try reload and reconnect.
3. Again set those parameters to some value and re-verify by reload and reconnect.
4. Test the set persistent command inside functions and transactions.
5. Set all parameter values to default.

I think the tests for step 3 can be removed and the tests for step 5 can be merged into step 2. If you think the above optimizations to reduce the number of tests are okay, then I will update the patch. That's still not the sort of small change Tom was hoping was possible, but it's closer to the right range. It may not really be possible to make this much smaller. I'm out of time for today, but I'll try to do some functional review work on this tomorrow if no one beats me to it. I would find it very useful personally if this feature were committed, and we know there's plenty of user demand for it too. With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers