Re: [HACKERS] Minor improvement to func.sgml
On Mon, Jun 1, 2015 at 10:44 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: Here is a doc patch to add materialized views and foreign tables to database objects that pg_table_is_visible() can be used with. Good catch, as usual. Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Further issues with jsonb semantics, documentation
On Thu, Jun 4, 2015 at 12:10 PM, Peter Geoghegan p...@heroku.com wrote: jsonb_delete() should certainly be able to traverse objects, but it's much less clear that it should be able to *traverse* arrays (affecting arrays is a different story, though). That's why I proposed not supporting traversing arrays with it or with jsonb_set(). This would also remove the questionable second shadow type system within the text[] rhs operand, which seems like a good thing. Here is a further example of why I find this new shadow type system for rhs text[] operands to be pretty questionable:

postgres=# select jsonb_set('[1, 2, 3, 4, 5,6,7,8,9,10,11,12]', '{5e10}'::text[], '"Input unsanitized"') ;
                         jsonb_set
-----------------------------------------------------------
 [1, 2, 3, 4, 5, "Input unsanitized", 7, 8, 9, 10, 11, 12]
(1 row)

BTW, there is a bug here -- strtol() needs additional defenses [1] (before casting to int):

postgres=# select jsonb_set('[1, 2, 3, 4, 5,6,7,8,9,10,11,12,13,14,15,16,17,18]', '{9223372036854775806}'::text[], '"Input unsanitized"', false) ;
                                    jsonb_set
----------------------------------------------------------------------------------
 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, "Input unsanitized", 18]
(1 row)

[1] https://www.securecoding.cert.org/confluence/display/cplusplus/INT06-CPP.+Use+strtol()+or+a+related+function+to+convert+a+string+token+to+an+integer -- Peter Geoghegan
Re: [HACKERS] [idea] more aggressive join pushdown on postgres_fdw
On Sat, May 30, 2015 at 9:03 PM, Kouhei Kaigai kai...@ak.jp.nec.com wrote: Yesterday, JPUG held an unconference event at Tokyo, and Hanada-san gave a talk about the join-pushdown feature of postgres_fdw. At this talk, someone proposed an interesting idea to make join pushdown more aggressive/effective. Let me share it with pgsql-hackers. He said, we may have a workload that joins a large foreign-scan and a small local-scan, regardless of the plan type. For example:

joinrel (expected nrows = 5)
 + outerrel ForeignScan (expected nrows = 100)
 + innerrel LocalScan (expected nrows = 5)

In this case, we may be able to run the entire joinrel on the remote side and then fetch just 5 rows, if the FDW driver constructs a VALUES() clause according to the contents of LocalScan and then makes an entire join query together with the one kept in ForeignScan. If the above ForeignScan has the following remote query:

SELECT a, b, c FROM t0 WHERE d < 100

we may be able to construct the query below to run the remote join with the local (small) relation:

SELECT a, b, c, x, y FROM (SELECT a, b, c FROM t0 WHERE d < 100) AS ft JOIN (VALUES (1,'aaa'), (2,'bbb'), (3,'ccc'), (4,'ddd'), (5,'eee')) AS lt (x, y) ON ft.a = lt.x

The VALUES clause can be mechanically constructed according to the result set of LocalScan, and it is not difficult to make such a remote query on top of the existing ForeignScan. As a result, it will dramatically reduce the amount of network traffic and CPU cycles needed to form/deform tuples. I don't intend to implement this idea urgently (of course, join pushdown for the both-ForeignScan case has higher priority); however, it makes sense to keep the future direction in mind. Also, as an aside, even though Hanada-san mentioned that ForeignScan does not need an infrastructure to initialize child path nodes, this idea may require ForeignScan to have a local child path. Neat idea. This ties into something I've thought about and mentioned before: what if the innerrel is local, but there's a replicated copy on the remote server? 
Perhaps both cases are worth thinking about at some point. I think there are both merits and demerits for each; neither of them is an always-better strategy.

* Push out local table as VALUES(...) clause
  Good: No restriction to functions/operators in the local scan or underlying plan node.
  Bad: High cost for data format modification (HeapTupleSlot => VALUES(...) clause in text), and 2-way data transfer.

* Remote join between foreign table and replicated table
  Good: Data already exists on the remote side; no need to kick out the contents of the local relation (and no need to consume CPU cycles to make the VALUES() clause).
  Bad: Functions/operators are restricted, as existing postgres_fdw is doing. Only immutable and built-in ones are available to run on the remote side.

BTW, do we need either of the tables to be a foreign table at all, if the entire database is (synchronously) replicated? Also, a loopback server may be a candidate even if not replicated (although it may be an entrance to deadlock heaven). Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com
Re: [HACKERS] Further issues with jsonb semantics, documentation
On 06/04/2015 03:10 PM, Peter Geoghegan wrote: On Thu, Jun 4, 2015 at 6:43 AM, Andrew Dunstan and...@dunslane.net wrote: I've noticed some more issues with the jsonb documentation, and the new jsonb stuff generally. I didn't set out to give Andrew feedback on the semantics weeks after feature freeze, but unfortunately this feels like another discussion that we need to have now rather than later. Yes, I wish you had raised these issues months ago when this was published. That's the way the process is supposed to work. I also wish that I had managed to do that. As you know, I was working overtime to get UPSERT into 9.5 during that period. Finding time to review things is always difficult, and I wish I could do more. That's happened to me in the past. My view has generally been that in that case I have missed my chance, and I need to live with what others have done. That seems to me preferable to tearing up any pretense we might have to be following a defined development process. I should point out that I have already gone out of my way to accommodate concerns you expressed extremely late about this set of features, and I have lately indicated another area where we can adjust it to meet your objections. Re-litigating this wholesale seems quite a different kettle of fish, however. Just in case it's not clear: I am not at all happy. cheers andrew
Re: [HACKERS] Further issues with jsonb semantics, documentation
On Thu, Jun 4, 2015 at 5:31 PM, Andrew Dunstan and...@dunslane.net wrote: Just in case it's not clear: I am not at all happy. I've offered to help you with several of the issues I raised; I had intended to offer more help. The issues I raised seem pretty substantive to me. I'm trying to make sure that we don't end up with something bad that we need to live with indefinitely. I have offered you something not far off an "everybody wins" proposal (i.e. no real loss of functionality), and that was my first proposal. I don't know what more I could do for you. I *am* trying to help. -- Peter Geoghegan
Re: [HACKERS] Re: [COMMITTERS] pgsql: Map basebackup tablespaces using a tablespace_map file
On Thu, Jun 4, 2015 at 8:43 PM, Andrew Dunstan and...@dunslane.net wrote: On 06/04/2015 12:44 AM, Amit Kapila wrote: Given that the function raises an error on failure, I think it will otherwise be OK as is. Please find an updated patch attached with this mail. No attachment. cheers I have sent it in the next mail, but anyway sending it again in case you have missed it. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com remove_only_symlinks_during_recovery_v2.patch
Re: [HACKERS] Re: [COMMITTERS] pgsql: Map basebackup tablespaces using a tablespace_map file
On Fri, Jun 5, 2015 at 9:57 AM, Andrew Dunstan and...@dunslane.net wrote: On 06/04/2015 11:35 PM, Amit Kapila wrote: Theoretically, I don't see much problem with changing the checks the way you have done in the patch, but it becomes different than what we have in destroy_tablespace_directories() and it is slightly changing the way the check was originally done in create_tablespace_directories(); basically, the original check will try unlink if lstat returns a non-zero return code. If you want to proceed with the changed checks as in v3, then maybe we can modify the comments on top of the function remove_tablespace_symlink() which indicate that it works like destroy_tablespace_directories(). The difference is that here we're getting the list from a base backup and it seems to me the risk of having a file we don't really want to unlink is significantly greater. Okay, I think I can understand why you want to be cautious about having a different check for this path, but in that case there is a chance that recovery might fail when it tries to create a symlink for that file. Shouldn't we then try to call this new function only when we are trying to restore from a tablespace_map file, and also is there a need for the ifdef S_ISLINK in remove_tablespace_link? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [idea] more aggressive join pushdown on postgres_fdw
On Thu, Jun 4, 2015 at 9:40 PM, Kouhei Kaigai kai...@ak.jp.nec.com wrote: Neat idea. This ties into something I've thought about and mentioned before: what if the innerrel is local, but there's a replicated copy on the remote server? Perhaps both cases are worth thinking about at some point. I think there are both merits and demerits for each; neither of them is an always-better strategy.

* Push out local table as VALUES(...) clause
  Good: No restriction to functions/operators in the local scan or underlying plan node.
  Bad: High cost for data format modification (HeapTupleSlot => VALUES(...) clause in text), and 2-way data transfer.

* Remote join between foreign table and replicated table
  Good: Data already exists on the remote side; no need to kick out the contents of the local relation (and no need to consume CPU cycles to make the VALUES() clause).
  Bad: Functions/operators are restricted, as existing postgres_fdw is doing. Only immutable and built-in ones are available to run on the remote side.

Sure. BTW, do we need either of the tables to be a foreign table at all, if the entire database is (synchronously) replicated? Also, a loopback server may be a candidate even if not replicated (although it may be an entrance to deadlock heaven). I suppose it's possible that this sort of thing could work out to a win, but I think it's much less likely to work out than pushing down a foreign/local join using either the VALUES trick or a replicated copy. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Re: [COMMITTERS] pgsql: Map basebackup tablespaces using a tablespace_map file
On 06/04/2015 09:23 AM, Amit Kapila wrote: Okay, as we both seem to agree that it can be mostly used in the tablespace symlinks context, I have changed the name to remove_tablespace_symlink() and moved the function to tablespace.c. The S_ISLINK check is used for non-Windows code, so I'm not sure adding it here makes any real difference now that we have made it specific to tablespaces, and we might need to write small port-specific code if we want to add the S_ISLINK check. Where is it used? I can't see it called at all in tablespace.c or xlog.c. Perhaps I'm being overcautious, but here's more or less what I had in mind. cheers andrew

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 666fa37..8d75d0c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -38,6 +38,7 @@
 #include "catalog/catversion.h"
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
+#include "commands/tablespace.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/bgwriter.h"
@@ -6094,7 +6095,6 @@ StartupXLOG(void)
 		if (read_tablespace_map(&tablespaces))
 		{
 			ListCell   *lc;
-			struct stat st;
 
 			foreach(lc, tablespaces)
 			{
@@ -6105,27 +6105,9 @@ StartupXLOG(void)
 
 				/*
 				 * Remove the existing symlink if any and Create the symlink
-				 * under PGDATA. We need to use rmtree instead of rmdir as
-				 * the link location might contain directories or files
-				 * corresponding to the actual path. Some tar utilities do
-				 * things that way while extracting symlinks.
+				 * under PGDATA.
 				 */
-				if (lstat(linkloc, &st) == 0 && S_ISDIR(st.st_mode))
-				{
-					if (!rmtree(linkloc, true))
-						ereport(ERROR,
-								(errcode_for_file_access(),
-								 errmsg("could not remove directory \"%s\": %m",
-										linkloc)));
-				}
-				else
-				{
-					if (unlink(linkloc) < 0 && errno != ENOENT)
-						ereport(ERROR,
-								(errcode_for_file_access(),
-								 errmsg("could not remove symbolic link \"%s\": %m",
-										linkloc)));
-				}
+				remove_tablespace_symlink(linkloc);
 
 				if (symlink(ti->path, linkloc) < 0)
 					ereport(ERROR,
diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c
index 4ec1aff..e8acf27 100644
--- a/src/backend/commands/tablespace.c
+++ b/src/backend/commands/tablespace.c
@@ -627,31 +627,9 @@ create_tablespace_directories(const char *location, const Oid tablespaceoid)
 
 	/*
 	 * In recovery, remove old symlink, in case it points to the wrong place.
-	 *
-	 * On Windows, junction points act like directories so we must be able to
-	 * apply rmdir; in general it seems best to make this code work like the
-	 * symlink removal code in destroy_tablespace_directories, except that
-	 * failure to remove is always an ERROR.
 	 */
 	if (InRecovery)
-	{
-		if (lstat(linkloc, &st) == 0 && S_ISDIR(st.st_mode))
-		{
-			if (rmdir(linkloc) < 0)
-				ereport(ERROR,
-						(errcode_for_file_access(),
-						 errmsg("could not remove directory \"%s\": %m",
-								linkloc)));
-		}
-		else
-		{
-			if (unlink(linkloc) < 0 && errno != ENOENT)
-				ereport(ERROR,
-						(errcode_for_file_access(),
-						 errmsg("could not remove symbolic link \"%s\": %m",
-								linkloc)));
-		}
-	}
+		remove_tablespace_symlink(linkloc);
 
 	/*
 	 * Create the symlink under PGDATA
@@ -848,6 +826,61 @@ directory_is_empty(const char *path)
 	return true;
 }
 
+/*
+ * remove_tablespace_symlink
+ *
+ * This function removes symlinks in pg_tblspc. On Windows, junction points
+ * act like directories so we must be able to apply rmdir. This function
+ * works like the symlink removal code in destroy_tablespace_directories,
+ * except that failure to remove is always an ERROR.
+ */
+void
+remove_tablespace_symlink(const char *linkloc)
+{
+	struct stat st;
+
+	if (lstat(linkloc, &st) != 0)
+	{
+		if (errno == ENOENT)
+			return;
+		else
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not stat \"%s\": %m",
+							linkloc)));
+	}
+
+	if (S_ISDIR(st.st_mode))
+	{
+		/*
+		 * This will fail if the directory isn't empty, but not
+		 * if it's a junction point.
+		 */
+		if (rmdir(linkloc) < 0)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not remove directory \"%s\": %m",
+							linkloc)));
+	}
+#ifdef S_ISLNK
+	else if (S_ISLNK(st.st_mode))
+	{
+		if (unlink(linkloc) < 0 && errno != ENOENT)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not remove symbolic link \"%s\": %m",
+							linkloc)));
+	}
+#endif
+	else
+	{
+		/* Refuse to remove anything that's not a directory or symlink */
+		ereport(ERROR,
+				(ERRCODE_SYSTEM_ERROR,
+				 errmsg("Not a directory or symbolic link: \"%s\"",
+						linkloc)));
+	}
+}
 
 /*
  * Rename a tablespace
diff --git a/src/include/commands/tablespace.h b/src/include/commands/tablespace.h
index 86b0477..6b928a5 100644
--- a/src/include/commands/tablespace.h
+++ b/src/include/commands/tablespace.h
@@ -56,6
Re: [HACKERS] [idea] more aggressive join pushdown on postgres_fdw
On Thu, Jun 4, 2015 at 9:40 PM, Kouhei Kaigai kai...@ak.jp.nec.com wrote: Neat idea. This ties into something I've thought about and mentioned before: what if the innerrel is local, but there's a replicated copy on the remote server? Perhaps both cases are worth thinking about at some point. I think there are both merits and demerits for each; neither of them is an always-better strategy.

* Push out local table as VALUES(...) clause
  Good: No restriction to functions/operators in the local scan or underlying plan node.
  Bad: High cost for data format modification (HeapTupleSlot => VALUES(...) clause in text), and 2-way data transfer.

* Remote join between foreign table and replicated table
  Good: Data already exists on the remote side; no need to kick out the contents of the local relation (and no need to consume CPU cycles to make the VALUES() clause).
  Bad: Functions/operators are restricted, as existing postgres_fdw is doing. Only immutable and built-in ones are available to run on the remote side.

Sure. BTW, do we need either of the tables to be a foreign table at all, if the entire database is (synchronously) replicated? Also, a loopback server may be a candidate even if not replicated (although it may be an entrance to deadlock heaven). I suppose it's possible that this sort of thing could work out to a win, but I think it's much less likely to work out than pushing down a foreign/local join using either the VALUES trick or a replicated copy. Hmm, it might be too aggressive an approach. If we tried to implement it, postgres_fdw would need to add so many junk paths (more expensive than the usual local ones) to consider remote joins between replicated local tables. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com
Re: [HACKERS] Re: [COMMITTERS] pgsql: Map basebackup tablespaces using a tablespace_map file
On Fri, Jun 5, 2015 at 7:29 AM, Andrew Dunstan and...@dunslane.net wrote: On 06/04/2015 09:23 AM, Amit Kapila wrote: Okay, as we both seem to agree that it can be mostly used in the tablespace symlinks context, I have changed the name to remove_tablespace_symlink() and moved the function to tablespace.c. The S_ISLINK check is used for non-Windows code, so I'm not sure adding it here makes any real difference now that we have made it specific to tablespaces, and we might need to write small port-specific code if we want to add the S_ISLINK check. Where is it used? I can't see it called at all in tablespace.c or xlog.c. The following files use the S_ISLINK check: basebackup.c, fd.c, initdb.c, copy_fetch.c, pg_rewind/filemap.c, and all these places use it with #ifndef WIN32. Perhaps I'm being overcautious, but here's more or less what I had in mind. What is making you feel nervous? If it is that we should not use an unlink call without checking S_ISLINK, then we are already doing the same at many other places (rewriteheap.c, slru.c, timeline.c, xlog.c). It is already defined for Windows as pgunlink. Theoretically, I don't see much problem with changing the checks the way you have done in the patch, but it becomes different than what we have in destroy_tablespace_directories() and it is slightly changing the way the check was originally done in create_tablespace_directories(); basically, the original check will try unlink if lstat returns a non-zero return code. If you want to proceed with the changed checks as in v3, then maybe we can modify the comments on top of the function remove_tablespace_symlink() which indicate that it works like destroy_tablespace_directories(). With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [CORE] Restore-reliability mode
On Fri, Jun 5, 2015 at 8:53 AM, Craig Ringer cr...@2ndquadrant.com wrote: On 4 June 2015 at 22:43, Stephen Frost sfr...@snowman.net wrote: Josh, * Josh Berkus (j...@agliodbs.com) wrote: I would argue that if we delay 9.5 in order to do a 100% manual review of code, without adding any new automated tests or other non-manual tools for improving stability, then it's a waste of time; we might as well just release the beta, and our users will find more issues than we will. I am concerned that if we declare a cleanup period, especially in the middle of the summer, all that will happen is that the project will go to sleep for an extra three months. This is the exact same concern that I have. A delay just to have a delay is not useful. I completely agree that we need more automated testing, etc, though getting all of that set up and running could be done at any time too; there's no reason to wait, nor do I believe delaying 9.5 would make such automated testing appear. In terms of specific testing improvements, things I think we need to have covered and runnable on the buildfarm are:

* pg_dump and pg_restore testing (because it's scary we don't do this)

We do test it in some way with pg_upgrade, using a set of objects that are not removed by the regression test suite. Extension dumps are not covered yet, though.

* WAL archiving based warm standby testing with promotion
* Two node streaming replication with promotion, both with a slot and with archive fallback
* Three node cascading streaming replication with middle node promotion then tail end node promotion
* Logical decoding streaming testing, comparing to expected decoded output
* hard-kill the postmaster, start up from crashed datadir
* pg_basebackup + start up from backup
* pg_start_backup, rsync, pg_stop_backup, start up in hot standby
* Tests of crash recovery during various DDL operations

Well, steps in this direction are the point of this patch, the replication test suite: https://commitfest.postgresql.org/5/197/ And this one, addition of Windows support for TAP tests: https://commitfest.postgresql.org/5/207/

* DDL deparse test coverage for all operations

What do you have in mind except what is already in objectaddress.sql and src/test/modules/test_ddl_deparse/?

* disk exhaustion tests both for pg_xlog and for the main datadir, showing we can recover OK when disk is filled then space is freed

This may be tricky. How would you emulate that?

Is pg_tap a reasonable starting point for this sort of testing?

IMO, using the TAP machinery would be a good base for that. What is lacking is a basic set of Perl routines that one can easily use to set up test scenarios.

How would a test that would've caught the multixact issues look?

I have not followed closely those discussions, not sure about that. Regards, -- Michael
[HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
On Fri, Jun 5, 2015 at 11:47 AM, Thomas Munro thomas.mu...@enterprisedb.com wrote: On Fri, Jun 5, 2015 at 9:29 AM, Robert Haas robertmh...@gmail.com wrote: Here's a new version with some more fixes and improvements: - SetOffsetVacuumLimit was failing to set MultiXactState->oldestOffset when the oldest offset became known if the now-known value happened to be zero. Fixed. - SetOffsetVacuumLimit now logs useful information at the DEBUG1 level, so that you can see that it's doing what it's supposed to. - TruncateMultiXact now calls DetermineSafeOldestOffset to adjust the offsetStopLimit even if it can't truncate anything. This seems useless, but it's not, because it may be that the last checkpoint advanced lastCheckpointedOldest from a bogus value (i.e. 1) to a real value, and now we can actually set offsetStopLimit properly. - TruncateMultiXact no longer calls find_multixact_start when there are no remaining multixacts. This is actually a completely separate bug that goes all the way back to 9.3.0 and can potentially cause TruncateMultiXact to remove every file in pg_multixact/offsets. Restarting the cluster becomes impossible because TrimMultiXact barfs. - TruncateMultiXact now logs a message if the oldest multixact does not precede the earliest one on disk and is not equal to the next multixact and yet does not exist. The value of the log message is that it discovered the bug mentioned in the previous line, so I think it's earning its keep. With this version, I'm able to see that when you start up 9.3.latest+this patch with a cluster that has a bogus value of 1 in relminmxid, datminmxid, and the control file, autovacuum vacuums everything in sight, all the values get set back to the right thing, and the next checkpoint enables the member-wraparound guards. This works with both autovacuum=on and autovacuum=off; the emergency mechanism kicks in as intended. 
We'll want to warn people with big databases who upgrade to 9.3.0 - 9.3.4 via pg_upgrade that they may want to pre-vacuum those tables before upgrading to avoid a vacuum storm. But generally I'm pretty happy with this: forcing those values to get fixed so that we can guard against member-space wraparound seems like the right thing to do. So, to summarize, this patch does the following: - Fixes the failure-to-start problems introduced in 9.4.2 in complicated pg_upgrade scenarios. - Prevents the new calls to find_multixact_start we added in 9.4.2 from happening during recovery, where they can only create failure scenarios. The call in TruncateMultiXact that has been there all along is not eliminated, but now handles failure more gracefully. - Fixes possible incorrect removal of every single pg_multixact/offsets file when no multixacts exist; one file should be kept. - Forces aggressive autovacuuming when the control file's oldestMultiXid doesn't point to a valid MultiXact and enables member wraparound at the next checkpoint following the correction of that problem. With this patch, when I run the script checkpoint-segment-boundary.sh from http://www.postgresql.org/message-id/CAEepm=1_KbHGbmPVmkUGE5qTP+B4efoCJYS0unGo-Mc5NV=u...@mail.gmail.com I see the following during shutdown checkpoint: LOG: could not truncate directory "pg_multixact/offsets": apparent wraparound That message comes from SimpleLruTruncate. Suggested patch attached. -- Thomas Munro http://www.enterprisedb.com fence-post.patch
Re: [HACKERS] Re: [COMMITTERS] pgsql: Map basebackup tablespaces using a tablespace_map file
On 06/04/2015 11:35 PM, Amit Kapila wrote: On Fri, Jun 5, 2015 at 7:29 AM, Andrew Dunstan and...@dunslane.net wrote: On 06/04/2015 09:23 AM, Amit Kapila wrote: Okay, as we both seem to agree that it can be mostly used in the tablespace symlinks context, I have changed the name to remove_tablespace_symlink() and moved the function to tablespace.c. The S_ISLINK check is used for non-Windows code, so I'm not sure adding it here makes any real difference now that we have made it specific to tablespaces, and we might need to write small port-specific code if we want to add the S_ISLINK check. Where is it used? I can't see it called at all in tablespace.c or xlog.c. The following files use the S_ISLINK check: basebackup.c, fd.c, initdb.c, copy_fetch.c, pg_rewind/filemap.c, and all these places use it with #ifndef WIN32. Perhaps I'm being overcautious, but here's more or less what I had in mind. What is making you feel nervous? If it is that we should not use an unlink call without checking S_ISLINK, then we are already doing the same at many other places (rewriteheap.c, slru.c, timeline.c, xlog.c). It is already defined for Windows as pgunlink. Theoretically, I don't see much problem with changing the checks the way you have done in the patch, but it becomes different than what we have in destroy_tablespace_directories() and it is slightly changing the way the check was originally done in create_tablespace_directories(); basically, the original check will try unlink if lstat returns a non-zero return code. If you want to proceed with the changed checks as in v3, then maybe we can modify the comments on top of the function remove_tablespace_symlink() which indicate that it works like destroy_tablespace_directories(). The difference is that here we're getting the list from a base backup and it seems to me the risk of having a file we don't really want to unlink is significantly greater. 
cheers andrew
[HACKERS] Incorrect order of database-locking operations in InitPostgres()
I've been chasing the intermittent "cache lookup failed for access method 403" failure at session startup that's been seen lately in the buildfarm, for instance here: http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=axolotl&dt=2015-06-04%2019%3A22%3A46 (Axolotl has shown this 3 times in the last 90 days, not sure if any others have seen it.) I hypothesized that this was triggered by the VACUUM FULL pg_am in the concurrently running vacuum.sql regression test, so I started running the regression tests in parallel with a shell script doing

while sleep 0.1; do psql -c 'vacuum full pg_am' regression; done

and sure enough, I can reproduce it once in awhile. I've not tracked it down yet, but I have gotten pretty annoyed by an unrelated problem: every so often the DROP DATABASE regression at the start of the regression test sequence will hang up for 5 seconds and then fail, complaining there's another session already in the DB. Meanwhile, the current psql -c call also hangs up for 5 seconds. What is happening is a sort of deadlock between InitPostgres, which does this:

    /* Now we can mark our PGPROC entry with the database ID */
    /* (We assume this is an atomic store so no lock is needed) */
    MyProc->databaseId = MyDatabaseId;

and then this:

    if (!bootstrap)
        LockSharedObject(DatabaseRelationId, MyDatabaseId, 0,
                         RowExclusiveLock);

and dropdb/CountOtherDBBackends, which first gets the database lock and then looks to see if any other sessions are advertising the target database OID in their PGPROC. If such sessions exist and don't go away within 5 seconds then CountOtherDBBackends fails. So, if an incoming connection sets MyProc->databaseId between the time that dropdb acquires the database lock and the time that CountOtherDBBackends looks, we have an effective deadlock because that incoming session will then block on the database lock. 
AFAICS, this is easy to fix by swapping the order of the above-mentioned operations, ie get the lock before setting MyProc->databaseId, as per attached patch. Processes examining the PGPROC array must acquire the database lock before looking for sessions in the target DB, so they'll still see any conflicting session. On the other hand, the incoming session already has to recheck that the target DB is still there once it's got the lock, so there's no risk of problems on that side either. Unless somebody can see a problem with this, I propose to apply and back-patch this change. The behavior is infrequent, but it's pretty nasty when it does occur, since all incoming connections for the target database are hung up for 5 seconds. The case of DROP DATABASE might not be a big deal, but this would also for example cause unwanted failures in CREATE DATABASE if there were short-term connections to the template database. regards, tom lane

diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index aa67f75..a5d88c1 100644
*** a/src/backend/utils/init/postinit.c
--- b/src/backend/utils/init/postinit.c
*** InitPostgres(const char *in_dbname, Oid
*** 848,865 ****
  		strcpy(out_dbname, dbname);
  	}
  
- 	/* Now we can mark our PGPROC entry with the database ID */
- 	/* (We assume this is an atomic store so no lock is needed) */
- 	MyProc->databaseId = MyDatabaseId;
- 
- 	/*
- 	 * We established a catalog snapshot while reading pg_authid and/or
- 	 * pg_database; but until we have set up MyDatabaseId, we won't react to
- 	 * incoming sinval messages for unshared catalogs, so we won't realize it
- 	 * if the snapshot has been invalidated. Assume it's no good anymore.
- 	 */
- 	InvalidateCatalogSnapshot();
- 
  	/*
  	 * Now, take a writer's lock on the database we are trying to connect to.
  	 * If there is a concurrently running DROP DATABASE on that database, this
--- 848,853 ----
*** InitPostgres(const char *in_dbname, Oid
*** 867,875 ****
  	 * pg_database).
  	 *
  	 * Note that the lock is not held long, only until the end of this startup
! 	 * transaction. This is OK since we are already advertising our use of
! 	 * the database in the PGPROC array; anyone trying a DROP DATABASE after
! 	 * this point will see us there.
  	 *
  	 * Note: use of RowExclusiveLock here is reasonable because we envision
  	 * our session as being a concurrent writer of the database. If we had a
--- 855,868 ----
  	 * pg_database).
  	 *
  	 * Note that the lock is not held long, only until the end of this startup
! 	 * transaction. This is OK since we will advertise our use of the
! 	 * database in the PGPROC array before dropping the lock (in fact, that's
! 	 * the next thing to do). Anyone trying a DROP DATABASE after this point
! 	 * will see us in PGPROC once they have the lock.
! 	 *
! 	 * Ordering is important here because we don't want to advertise ourselves
! 	 * as being in this database until we have the lock; otherwise we create
! 	 * what amounts to a
Re: [HACKERS] Memory leak with XLogFileCopy since de768844 (WAL file with .partial)
On Thu, Jun 4, 2015 at 10:40 PM, Fujii Masao masao.fu...@gmail.com wrote: On Mon, Jun 1, 2015 at 4:24 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Thu, May 28, 2015 at 9:09 PM, Michael Paquier michael.paqu...@gmail.com wrote: Since commit de768844, XLogFileCopy of xlog.c returns to the caller a pstrdup'd string that can be used afterwards for other things. XLogFileCopy is used in only one place, and it happens that the result string is never freed at all, leaking memory. That seems to be almost harmless because the startup process will exit just after calling XLogFileCopy(). No? Yes, that's harmless. My point here is correctness; prevention does not hurt, particularly if this code path gets used more in the future. Also, the current error message in case of failure is very C-like: elog(ERROR, "InstallXLogFileSegment should not have failed"); I thought that we are not supposed to use function names in the error messages. Wouldn't something like this be better suited: "could not copy partial log file" or "could not copy partial log file into temporary segment file"? Or we can extend InstallXLogFileSegment so that we can give the log level to it. When InstallXLogFileSegment is called after XLogFileCopy, we can give ERROR as the log level and cause an exception to occur if link() or rename() fails in InstallXLogFileSegment. That's neat. Done this way in the attached. 
-- Michael diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 666fa37..5ee68c1 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -807,7 +807,7 @@ static bool XLogCheckpointNeeded(XLogSegNo new_segno); static void XLogWrite(XLogwrtRqst WriteRqst, bool flexible); static bool InstallXLogFileSegment(XLogSegNo *segno, char *tmppath, bool find_free, XLogSegNo max_segno, - bool use_lock); + bool use_lock, int elevel); static int XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli, int source, bool notexistOk); static int XLogFileReadAnyTLI(XLogSegNo segno, int emode, int source); @@ -3012,7 +3012,7 @@ XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock) max_segno = logsegno + CheckPointSegments; if (!InstallXLogFileSegment(installed_segno, tmppath, *use_existent, max_segno, -use_lock)) +use_lock, LOG)) { /* * No need for any more future segments, or InstallXLogFileSegment() @@ -3041,18 +3041,15 @@ XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock) /* * Copy a WAL segment file in pg_xlog directory. * - * dstfname destination filename * srcfname source filename * upto how much of the source file to copy? (the rest is filled with *zeros) * - * If dstfname is not given, the file is created with a temporary filename, - * which is returned. Both filenames are relative to the pg_xlog directory. - * - * NB: Any existing file with the same name will be overwritten! + * The file is created with a temporary filename, which is returned. Both + * filenames are relative to the pg_xlog directory. */ static char * -XLogFileCopy(char *dstfname, char *srcfname, int upto) +XLogFileCopy(char *srcfname, int upto) { char srcpath[MAXPGPATH]; char tmppath[MAXPGPATH]; @@ -3150,25 +3147,8 @@ XLogFileCopy(char *dstfname, char *srcfname, int upto) CloseTransientFile(srcfd); - /* - * Now move the segment into place with its final name. 
(Or just return - * the path to the file we created, if the caller wants to handle the rest - * on its own.) - */ - if (dstfname) - { - char dstpath[MAXPGPATH]; - - snprintf(dstpath, MAXPGPATH, XLOGDIR "/%s", dstfname); - if (rename(tmppath, dstpath) < 0) - ereport(ERROR, - (errcode_for_file_access(), - errmsg("could not rename file \"%s\" to \"%s\": %m", - tmppath, dstpath))); - return NULL; - } - else - return pstrdup(tmppath); + /* return name to caller, to let it handle the rest */ + return pstrdup(tmppath); } /* @@ -3195,6 +3175,8 @@ XLogFileCopy(char *dstfname, char *srcfname, int upto) * place. This should be TRUE except during bootstrap log creation. The * caller must *not* hold the lock at call. * + * elevel: log level used by this routine. + * * Returns TRUE if the file was installed successfully. FALSE indicates that * max_segno limit was exceeded, or an error occurred while renaming the * file into place. @@ -3202,7 +3184,7 @@ static bool InstallXLogFileSegment(XLogSegNo *segno, char *tmppath, bool find_free, XLogSegNo max_segno, - bool use_lock) + bool use_lock, int elevel) { char path[MAXPGPATH]; struct stat stat_buf; @@ -3247,7 +3229,7 @@ InstallXLogFileSegment(XLogSegNo *segno, char *tmppath, { if (use_lock) LWLockRelease(ControlFileLock); - ereport(LOG, + ereport(elevel, (errcode_for_file_access(), errmsg("could not link file \"%s\" to \"%s\" (initialization
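The pattern in the patch above — letting the caller pass the severity instead of hard-coding LOG — is easy to illustrate. A hedged Python analogue using the logging module (toy names; the real change is in C, in InstallXLogFileSegment):

```python
import logging

logging.basicConfig(level=logging.DEBUG)

def install_segment(name, installed, *, level=logging.WARNING):
    """Toy analogue of InstallXLogFileSegment's new elevel argument: the same
    helper reports a failure at whatever severity its caller chooses."""
    if name in installed:               # pretend the link()/rename() step failed
        logging.log(level, "could not install segment %s", name)
        if level >= logging.ERROR:      # PostgreSQL's ERROR throws; mimic that
            raise RuntimeError("could not install segment %s" % name)
        return False
    installed.add(name)
    return True

segs = {"000000010000000000000001"}
print(install_segment("000000010000000000000002", segs))    # True
print(install_segment("000000010000000000000001", segs))    # False, logged at WARNING
try:
    # a caller running after the copy step wants a hard failure, so it passes ERROR
    install_segment("000000010000000000000001", segs, level=logging.ERROR)
except RuntimeError as exc:
    print("raised:", exc)
```

The design point is that most callers tolerate an install failure (they just retry or clean up), while the post-copy caller cannot, so severity belongs to the call site rather than the helper.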
[HACKERS] Re: [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1
On Thu, Jun 4, 2015 at 5:29 PM, Robert Haas robertmh...@gmail.com wrote: - Forces aggressive autovacuuming when the control file's oldestMultiXid doesn't point to a valid MultiXact and enables member wraparound at the next checkpoint following the correction of that problem. Err, enables member wraparound *protection* at the next checkpoint, not the wraparound itself. It's worth noting that every startup will now include one of the following two messages: LOG: MultiXact member wraparound protections are now enabled Or: LOG: MultiXact member wraparound protections are disabled because oldest checkpointed MultiXact %u does not exist on disk ...where %u is probably 1. If you get the second one, you'll get the first one later after vacuum has done its thing and a checkpoint has happened. This is, obviously, some log chatter for people who don't have a problem and never have had one, but I think it's worth emitting the first message at startup even when there's no problem, so that people don't have to make inferences from the absence of a message. We can tell people very simply that (1) if they see the first message, everything is fine; (2) if they see the second message, autovacuum is going to clean things up and they will eventually see the first message; and (3) if they see neither message, they haven't upgraded to a fixed version yet. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [idea] more aggressive join pushdown on postgres_fdw
On Sat, May 30, 2015 at 9:03 PM, Kouhei Kaigai kai...@ak.jp.nec.com wrote: Yesterday, JPUG held an unconference event at Tokyo, and Hanada-san had a talk about the join-pushdown feature of postgres_fdw. At this talk, someone proposed an interesting idea to make join pushdown more aggressive/effective. Let me share it with pgsql-hackers. He said, we may have a workload that joins a large foreign-scan and a small local-scan, regardless of the plan type. For example:

    joinrel (expected nrows = 5)
     + outerrel ForeignScan (expected nrows = 100)
     + innerrel LocalScan (expected nrows = 5)

In this case, we may be able to run the entire joinrel on the remote side and then fetch just 5 rows, if the fdw-driver constructs a VALUES() clause according to the contents of LocalScan and then makes an entire join query combined with the one kept in the ForeignScan. If the above ForeignScan has the following remote query,

    SELECT a, b, c FROM t0 WHERE d < 100

we may be able to construct the query below to run the remote join with the local (small) relation:

    SELECT a, b, c, x, y FROM (SELECT a, b, c FROM t0 WHERE d < 100) AS ft
      JOIN (VALUES (1,'aaa'), (2,'bbb'), (3,'ccc'), (4,'ddd'), (5,'eee')) AS lt (x, y)
        ON ft.a = lt.x

The VALUES clause can be mechanically constructed according to the result set of LocalScan, and it is not difficult to make such a remote query on top of the existing ForeignScan. As a result, it will reduce the amount of network traffic and the CPU cycles needed to form/deform tuples dramatically. I don't intend to implement this idea urgently (of course, join pushdown for the case where both relations are ForeignScans has higher priority), however, it makes sense to keep the future direction in mind. Also, as an aside, even though Hanada-san mentioned that ForeignScan does not need an infrastructure to initialize child path nodes, this idea may require ForeignScan to have a local child path. Neat idea. This ties into something I've thought about and mentioned before: what if the innerrel is local, but there's a replicated copy on the remote server? 
Perhaps both cases are worth thinking about at some point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
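The query construction Kaigai describes is mechanical. A hedged sketch of turning a local result set into a VALUES join — a hypothetical helper with deliberately naive literal quoting, not postgres_fdw code:

```python
def remote_join_sql(foreign_sql, local_rows, local_cols, join_cond, out_cols):
    """Build the pushed-down join: inline the small local scan as a VALUES list
    so the whole join runs on the remote server."""
    def lit(v):
        if isinstance(v, str):
            return "'" + v.replace("'", "''") + "'"     # naive string quoting only
        return str(v)
    values = ", ".join("(" + ", ".join(lit(c) for c in row) + ")"
                       for row in local_rows)
    return ("SELECT %s FROM (%s) AS ft JOIN (VALUES %s) AS lt (%s) ON %s"
            % (", ".join(out_cols), foreign_sql, values,
               ", ".join(local_cols), join_cond))

sql = remote_join_sql(
    "SELECT a, b, c FROM t0 WHERE d < 100",                        # kept in the ForeignScan
    [(1, 'aaa'), (2, 'bbb'), (3, 'ccc'), (4, 'ddd'), (5, 'eee')],  # LocalScan result rows
    ["x", "y"], "ft.a = lt.x", ["a", "b", "c", "x", "y"])
print(sql)
```

A real implementation would of course have to deparse literals through the server's type output and quoting machinery; the point here is only that the remote query is a pure string transformation of things the FDW already has.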
Re: [HACKERS] [CORE] Restore-reliability mode
On 4 June 2015 at 22:43, Stephen Frost sfr...@snowman.net wrote: Josh, * Josh Berkus (j...@agliodbs.com) wrote: I would argue that if we delay 9.5 in order to do a 100% manual review of code, without adding any new automated tests or other non-manual tools for improving stability, then it's a waste of time; we might as well just release the beta, and our users will find more issues than we will. I am concerned that if we declare a cleanup period, especially in the middle of the summer, all that will happen is that the project will go to sleep for an extra three months. This is the exact same concern that I have. A delay just to have a delay is not useful. I completely agree that we need more automated testing, etc, though getting all of that set up and running could be done at any time too- there's no reason to wait, nor do I believe delaying 9.5 would make such automated testing appear. In terms of specific testing improvements, things I think we need to have covered and runnable on the buildfarm are: * pg_dump and pg_restore testing (because it's scary we don't do this) * WAL archiving based warm standby testing with promotion * Two node streaming replication with promotion, both with a slot and with archive fallback * Three node cascading streaming replication with middle node promotion then tail end node promotion * Logical decoding streaming testing, comparing to expected decoded output * DDL deparse test coverage for all operations * pg_basebackup + start up from backup * hard-kill the postmaster, start up from crashed datadir * pg_start_backup, rsync, pg_stop_backup, start up in hot standby * disk exhaustion tests both for pg_xlog and for the main datadir, showing we can recover OK when disk is filled then space is freed * Tests of crash recovery during various DDL operations Obviously some of these overlap, so one test can cover more than one item. 
Implementing these requires stepping outside the comfortable zone of pg_regress and the isolationtester and having something that can manage multiple data directories. It's also hard to be sure you're testing the same thing each time - for example, when using streaming replication with archive fallback, it might be tricky to ensure that your replica falls behind and falls back to WAL archive each time. There's always SIGSTOP I guess. While these are multi-node tests, at least in PostgreSQL we can just run on different ports, so there's no need to muck about with containers or VMs. I already run some of these tests using Ansible for BDR, but I don't imagine that'd be acceptable in core. It's Python, and it's not especially well suited to use as a regression testing framework, it's just what I had to hand and already needed for other automation tasks. Is pg_tap a reasonable starting point for this sort of testing? Am I missing obvious and important tests? How would a test that would've caught the multixact issues look? -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
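As a rough illustration of "something that can manage multiple data directories", here is a hedged Python sketch of a node abstraction: each node owns a data directory and a port, so several clusters coexist on one host. Paths and node names are made up, and the sketch only *builds* the command lines (the binaries themselves — initdb, pg_ctl, pg_basebackup — are the standard ones); actually running them would need subprocess calls and a PostgreSQL installation.

```python
import shlex

class PostgresNode:
    """Minimal multi-node harness sketch: one data directory and port per node."""
    def __init__(self, name, datadir, port):
        self.name, self.datadir, self.port = name, datadir, port

    def init_cmd(self):
        return ["initdb", "-D", self.datadir]

    def start_cmd(self):
        return ["pg_ctl", "-D", self.datadir, "-o", "-p %d" % self.port,
                "-w", "start"]

    def basebackup_cmd(self, standby):
        # clone this node into a standby, writing recovery settings (-R)
        return ["pg_basebackup", "-D", standby.datadir, "-p", str(self.port), "-R"]

    def promote_cmd(self):
        return ["pg_ctl", "-D", self.datadir, "promote"]

primary = PostgresNode("primary", "/tmp/pg_primary", 5433)
standby = PostgresNode("standby", "/tmp/pg_standby", 5434)
for cmd in (primary.init_cmd(), primary.start_cmd(),
            primary.basebackup_cmd(standby), standby.start_cmd(),
            standby.promote_cmd()):
    print(shlex.join(cmd))
```

Several of the scenarios in Stephen's list (streaming replication with promotion, basebackup-and-start) reduce to sequencing exactly these five commands across two or three such nodes.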
Re: [HACKERS] Further issues with jsonb semantics, documentation
On 06/04/2015 04:13 PM, David E. Wheeler wrote: On Jun 4, 2015, at 12:16 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: I'm just skimming here, but if a jsonb_path type is being proposed, Is this not the purpose of JSQuery? https://code.google.com/p/gwtquery/wiki/JsQuery No, it doesn't seem to have anything at all to do with it. What I suggested would be an implementation of json_pointer - see http://tools.ietf.org/html/rfc6901 cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
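For reference, RFC 6901 evaluation is small enough to sketch directly. A hedged Python implementation of JSON Pointer resolution — illustrative of what a json_pointer-style SQL type would address, not a proposal for its implementation:

```python
def json_pointer(doc, pointer):
    """Resolve an RFC 6901 JSON Pointer against a parsed JSON document."""
    if pointer == "":
        return doc                     # the empty pointer names the whole document
    if not pointer.startswith("/"):
        raise ValueError("non-empty JSON Pointer must start with '/'")
    for token in pointer[1:].split("/"):
        # unescape in this order per the RFC: ~1 -> / first, then ~0 -> ~
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(doc, list):
            doc = doc[int(token)]      # array index (the '-' append form not handled)
        else:
            doc = doc[token]
    return doc

doc = {"foo": ["bar", "baz"], "a/b": 1, "m~n": 8}
print(json_pointer(doc, "/foo/0"))     # bar
print(json_pointer(doc, "/a~1b"))      # 1
print(json_pointer(doc, "/m~0n"))      # 8
```

Note that, unlike the text[] paths discussed upthread, the escaping rules make a pointer unambiguous even when keys contain "/" — one argument for a dedicated type over a bare array of text.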
[HACKERS] Re: [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1
On Thu, Jun 4, 2015 at 12:57 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, Jun 4, 2015 at 9:42 AM, Robert Haas robertmh...@gmail.com wrote: Thanks for the review. Here's a new version. I've fixed the things Alvaro and Noah noted, and some compiler warnings about set-but-unused variables. I also tested it, and it doesn't quite work as hoped. If started on a cluster where oldestMultiXid is incorrectly set to 1, it starts up and indicates that the member wraparound guards are disabled. But even after everything is fixed, they don't get enabled until after the next full restart. I think that's because TruncateMultiXact() bails out too early, without calling DetermineSafeOldestOffset. My attempt at a quick fix for that problem didn't work out, so I'm posting this version for now to facilitate further review and testing. Here's a new version with some more fixes and improvements: - SetOffsetVacuumLimit was failing to set MultiXactState->oldestOffset when the oldest offset became known if the now-known value happened to be zero. Fixed. - SetOffsetVacuumLimit now logs useful information at the DEBUG1 level, so that you can see that it's doing what it's supposed to. - TruncateMultiXact now calls DetermineSafeOldestOffset to adjust the offsetStopLimit even if it can't truncate anything. This seems useless, but it's not, because it may be that the last checkpoint advanced lastCheckpointedOldest from a bogus value (i.e. 1) to a real value, and now we can actually set offsetStopLimit properly. - TruncateMultiXact no longer calls find_multixact_start when there are no remaining multixacts. This is actually a completely separate bug that goes all the way back to 9.3.0 and can potentially cause TruncateMultiXact to remove every file in pg_multixact/offsets. Restarting the cluster becomes impossible because TrimMultiXact barfs. 
- TruncateMultiXact now logs a message if the oldest multixact does not precede the earliest one on disk and is not equal to the next multixact and yet does not exist. The value of the log message is that it discovered the bug mentioned in the previous line, so I think it's earning its keep. With this version, I'm able to see that when you start up a 9.3.latest+this patch with a cluster that has a bogus value of 1 in relminmxid, datminmxid, and the control file, autovacuum vacuums everything in sight, all the values get set back to the right thing, and the next checkpoint enables the member-wraparound guards. This works with both autovacuum=on and autovacuum=off; the emergency mechanism kicks in as intended. We'll want to warn people with big databases who upgrade to 9.3.0 - 9.3.4 via pg_upgrade that they may want to pre-vacuum those tables before upgrading to avoid a vacuum storm. But generally I'm pretty happy with this: forcing those values to get fixed so that we can guard against member-space wraparound seems like the right thing to do. So, to summarize, this patch does the following: - Fixes the failure-to-start problems introduced in 9.4.2 in complicated pg_upgrade scenarios. - Prevents the new calls to find_multixact_start we added in 9.4.2 from happening during recovery, where they can only create failure scenarios. The call in TruncateMultiXact that has been there all along is not eliminated, but now handles failure more gracefully. - Fixes possible incorrect removal of every single pg_multixact/offsets file when no multixacts exist; one file should be kept. - Forces aggressive autovacuuming when the control file's oldestMultiXid doesn't point to a valid MultiXact and enables member wraparound at the next checkpoint following the correction of that problem. 
Thanks, -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company commit 87aa15fe5257060e0c971e135dd9f460fdc00bd0 Author: Robert Haas rhaas@postgresql.org Date: Thu Jun 4 11:58:49 2015 -0400 bar diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c index 9568ff1..7c457a6 100644 --- a/src/backend/access/transam/multixact.c +++ b/src/backend/access/transam/multixact.c @@ -198,13 +198,24 @@ typedef struct MultiXactStateData /* next-to-be-assigned offset */ MultiXactOffset nextOffset; + /* Have we completed multixact startup? */ + bool finishedStartup; + /* - * Oldest multixact that is still on disk. Anything older than this - * should not be consulted. These values are updated by vacuum. + * Oldest multixact that is still potentially referenced by a relation. + * Anything older than this should not be consulted. These values are + * updated by vacuum. */ MultiXactId oldestMultiXactId; Oid oldestMultiXactDB; + + /* + * Oldest multixact offset that is potentially referenced by a + * multixact referenced by a relation. We don't always know this value, + * so there's a flag here to indicate whether or not we currently do. + */ MultiXactOffset oldestOffset; + bool oldestOffsetKnown;
[HACKERS] Re: [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1
On Fri, Jun 5, 2015 at 9:29 AM, Robert Haas robertmh...@gmail.com wrote: Here's a new version with some more fixes and improvements: - SetOffsetVacuumLimit was failing to set MultiXactState-oldestOffset when the oldest offset became known if the now-known value happened to be zero. Fixed. - SetOffsetVacuumLimit now logs useful information at the DEBUG1 level, so that you can see that it's doing what it's supposed to. - TruncateMultiXact now calls DetermineSafeOldestOffset to adjust the offsetStopLimit even if it can't truncate anything. This seems useless, but it's not, because it may be that the last checkpoint advanced lastCheckpointedOldest from a bogus value (i.e. 1) to a real value, and now we can actually set offsetStopLimit properly. - TruncateMultiXact no longer calls find_multixact_start when there are no remaining multixacts. This is actually a completely separate bug that goes all the way back to 9.3.0 and can potentially cause TruncateMultiXact to remove every file in pg_multixact/offsets. Restarting the cluster becomes impossible because TrimMultiXact barfs. - TruncateMultiXact now logs a message if the oldest multixact does not precede the earliest one on disk and is not equal to the next multixact and yet does not exist. The value of the log message is that it discovered the bug mentioned in the previous line, so I think it's earning its keep. With this version, I'm able to see that when you start up a 9.3.latest+this patch with a cluster that has a bogus value of 1 in relminmxid, datminmxid, and the control file, autovacuum vacuums everything in sight, all the values get set back to the right thing, and the next checkpoint enables the member-wraparound guards. This works with both autovacuum=on and autovacuum=off; the emergency mechanism kicks in as intended. We'll want to warn people with big databases who upgrade to 9.3.0 - 9.3.4 via pg_upgrade that they may want to pre-vacuum those tables before upgrading to avoid a vacuum storm. 
But generally I'm pretty happy with this: forcing those values to get fixed so that we can guard against member-space wraparound seems like the right thing to do. So, to summarize, this patch does the following: - Fixes the failure-to-start problems introduced in 9.4.2 in complicated pg_upgrade scenarios. - Prevents the new calls to find_multixact_start we added in 9.4.2 from happening during recovery, where they can only create failure scenarios. The call in TruncateMultiXact that has been there all along is not eliminated, but now handles failure more gracefully. - Fixes possible incorrect removal of every single pg_multixact/offsets file when no multixacts exist; one file should be kept. - Forces aggressive autovacuuming when the control file's oldestMultiXid doesn't point to a valid MultiXact and enables member wraparound at the next checkpoint following the correction of that problem. With this patch, when I run the script checkpoint-segment-boundary.sh from http://www.postgresql.org/message-id/CAEepm=1_KbHGbmPVmkUGE5qTP+B4efoCJYS0unGo-Mc5NV=u...@mail.gmail.com I see the following during shutdown checkpoint: LOG: could not truncate directory pg_multixact/offsets: apparent wraparound That message comes from SimpleLruTruncate. -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Fix documentation bug in how to calculate the quasi-unique pg_log session_id
On Tue, Jun 2, 2015 at 11:22 AM, Joel Jacobson j...@trustly.com wrote: Fix documentation bug in how to calculate the quasi-unique pg_log session_id Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] brin regression test intermittent failures
Tom Lane wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: Evidently there is a problem right there. If I simply add an order by tenthous as proposed by Peter, many more errors appear; and what errors appear differs if I change shared_buffers. I think the real fix for this is to change the hand-picked values used in the brinopers table, so that they all pass the test using some reasonable ORDER BY specification in the populating query (probably tenk1.unique1). I may be confused, but why would the physical ordering of the table entries make a difference to the correct answers for this test? (I can certainly see why that might break the brin code, but not why it should change the seqscan's answers.) We create the brintest using a scan of tenk1 LIMIT 100, without specifying the order. So whether we find rows that match each test query is pure chance. Also, what I'd just noticed is that all of the cases that are failing are ones where the expected number of matching rows is exactly 1. I am wondering if the test is sometimes just missing random rows, and we're not seeing any reported problem unless that makes it go down to no rows. (But I do not know how that could simultaneously affect the seqscan case ...) Yeah, we compare the ctid sets of the results, and we assume that a seqscan would get that correctly. I think it would be a good idea to extend the brinopers table to include the number of expected matches, and to complain if that's not what we got, rather than simply checking for zero. That sounds reasonable. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1
Hi, On 2015-06-04 12:57:42 -0400, Robert Haas wrote: + /* + * Do we need an emergency autovacuum? If we're not sure, assume yes. + */ + return !oldestOffsetKnown || + (nextOffset - oldestOffset > MULTIXACT_MEMBER_SAFE_THRESHOLD); I think without teaching autovac about those rules, this might just lead to lots of autovac processes starting without knowing they should do something? They know about autovacuum_multixact_freeze_age, but they know neither about !oldestOffsetKnown nor, afaics, about pending offset wraparounds? -static MultiXactOffset -find_multixact_start(MultiXactId multi) +static bool +find_multixact_start(MultiXactId multi, MultiXactOffset *result) { MultiXactOffset offset; int pageno; @@ -2630,6 +2741,9 @@ find_multixact_start(MultiXactId multi) pageno = MultiXactIdToOffsetPage(multi); entryno = MultiXactIdToOffsetEntry(multi); + if (!SimpleLruDoesPhysicalPageExist(MultiXactOffsetCtl, pageno)) + return false; + /* lock is acquired by SimpleLruReadPage_ReadOnly */ slotno = SimpleLruReadPage_ReadOnly(MultiXactOffsetCtl, pageno, multi); offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno]; @@ -2642,25 +2756,31 @@ find_multixact_start(MultiXactId multi) I think it'd be a good idea to also return false in case of an InvalidMultiXactId - that'll be returned if the page has been zeroed. Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
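The emergency-vacuum test quoted in this thread relies on unsigned 32-bit arithmetic: subtracting two MultiXactOffset counters gives the in-use distance even after the counter wraps past 2^32. A hedged Python sketch of that check — the threshold value here is illustrative, not the real constant:

```python
MASK = 0xFFFFFFFF                  # MultiXactOffset is a 32-bit unsigned counter
SAFE_THRESHOLD = 2 ** 31           # illustrative stand-in for MULTIXACT_MEMBER_SAFE_THRESHOLD

def members_in_danger(next_offset, oldest_offset, oldest_known=True):
    """Modular 32-bit subtraction measures members in use across a wrap."""
    if not oldest_known:
        return True                # not sure? assume yes, as the quoted comment says
    used = (next_offset - oldest_offset) & MASK
    return used > SAFE_THRESHOLD

# 356 members in use across the wrap boundary: no emergency
print(members_in_danger(100, 0xFFFFFF00))            # False
# more than half the offset space in use: emergency vacuum
print(members_in_danger(SAFE_THRESHOLD + 10, 0))     # True
print(members_in_danger(0, 0, oldest_known=False))   # True
```

In C this works without masking because unsigned subtraction wraps by definition; the Python mask only reproduces that behavior on arbitrary-precision integers.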
Re: [HACKERS] brin regression test intermittent failures
I wrote: I think it would be a good idea to extend the brinopers table to include the number of expected matches, and to complain if that's not what we got, rather than simply checking for zero. Also, further experimentation shows that there are about 30 entries in the brinopers table that give rise to seqscan plans even when we're commanding a bitmap scan, presumably because those operators aren't brin-indexable. They're not the problematic cases, but things like ((charcol)::text 'A'::text) Is there a reason to have such things in the table, or is this just a thinko? Or is it actually a bug that we're getting such plans? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] brin regression test intermittent failures
Tom Lane wrote: I wrote: I think it would be a good idea to extend the brinopers table to include the number of expected matches, and to complain if that's not what we got, rather than simply checking for zero. Also, further experimentation shows that there are about 30 entries in the brinopers table that give rise to seqscan plans even when we're commanding a bitmap scan, presumably because those operators aren't brin-indexable. They're not the problematic cases, but things like ((charcol)::text 'A'::text) Is there a reason to have such things in the table, or is this just a thinko? Or is it actually a bug that we're getting such plans? No, I left those there knowing that there are no plans involving brin -- in a way, they provide some future proofing if some of those operators are made indexable later. I couldn't think of a way to test that the plans are actually using the brin index or not, but if we can do that in some way, that would be good. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Further issues with jsonb semantics, documentation
On 06/04/2015 11:33 AM, Jim Nasby wrote: On 6/4/15 8:43 AM, Andrew Dunstan wrote: You are conflating two different things here, quite pointlessly. The RH operand of ?| is not a path, whereas the RH operand of this - variant is. The fact that they are both text arrays doesn't mean that they should mean the same thing. And this is really the whole problem with the rest of your analysis. Has the idea of a specific json_path datatype been discussed? I feel it would add a lot of clarity to the operators. It would also make it easy to have an array of paths, something that's difficult to do today because a path can be an arbitrary length and arrays don't support that. I actually thought of doing something like that earlier today, although I was thinking of making it an array under the hood - I'm not sure how much call there is for an array of paths. We could probably finesse that. I agree that there is some sense in having such a type, especially if we later want to implement json(b)_patch, see http://tools.ietf.org/html/rfc6902. And if we do we should call the type json_pointer to be consistent with http://tools.ietf.org/html/rfc6901. However, this is certainly not 9.5 material. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RFC: Remove contrib entirely
On 06/04/2015 10:34 AM, Robert Haas wrote: On Thu, Jun 4, 2015 at 11:49 AM, Joshua D. Drake j...@commandprompt.com wrote: Except, that is kind of the point. Why are we adding to it? If you don't know the answer to that question already, then you probably shouldn't be proposing to get rid of the thing. I know the answer some people are saying. That doesn't mean it is the correct answer, that I agree with it or that it is a good answer. I think it's because there are some things we want to include in the core distribution without baking them irrevocably into the server. I have mentioned before isn't really what this discussion is about. Stephen Frost and I even went through the entire list of modules of what should and should not be included. Which, IMV, is 100% reasonable. Nobody is arguing with that. Sincerely, Joshua D. Drake -- Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564 PostgreSQL Centered full stack support, consulting and development. Announcing I'm offended is basically telling the world you can't control your own emotions, so everyone else should do it for you. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RFC: Remove contrib entirely
On Thu, Jun 4, 2015 at 1:57 PM, Joshua D. Drake j...@commandprompt.com wrote: I think it's because there are some things we want to include in the core distribution without baking them irrevocably into the server. I have mentioned before isn't really what this discussion is about. Stephen Frost and I even went through the entire list of modules of what should and should not be included. Which, IMV, is 100% reasonable. Nobody is arguing with that. Well then I am confused. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xid wrap / optimize frozen tables?
On Wed, Jun 3, 2015 at 2:49 PM, Jeff Janes jeff.ja...@gmail.com wrote: I would not be surprised if it were the reading, not the writing, which caused the performance problem. Of course I screwed up that last sentence. I meant the opposite, it would not surprise me if it were the writing that caused the problem, despite there being 5 times less of it. Jeff
Re: [HACKERS] Re: [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1
On Thu, Jun 4, 2015 at 1:27 PM, Andres Freund and...@anarazel.de wrote: On 2015-06-04 12:57:42 -0400, Robert Haas wrote: + /* + * Do we need an emergency autovacuum? If we're not sure, assume yes. + */ + return !oldestOffsetKnown || + (nextOffset - oldestOffset > MULTIXACT_MEMBER_SAFE_THRESHOLD); I think without teaching autovac about those rules, this might just lead to lots of autovac processes starting without knowing they should do something? They know about autovacuum_multixact_freeze_age, but they know neither about !oldestOffsetKnown nor, afaics, about pending offset wraparounds? You're right, but that's why the latest patch has changes in MultiXactMemberFreezeThreshold. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
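The offset checks being discussed rest on wraparound-safe comparison of unsigned 32-bit counters. A minimal Python model of that kind of circular arithmetic (the function names are illustrative, not the actual multixact.c implementations):

```python
MASK = 0xFFFFFFFF  # model a MultiXactOffset as an unsigned 32-bit counter


def precedes(a, b):
    # Circular comparison for wraparound-prone counters: a precedes b
    # iff the signed 32-bit difference (a - b) is negative.
    return ((a - b) & MASK) >= 0x80000000


def would_wrap(boundary, start, distance):
    # Would allocating `distance` offsets starting at `start` cross
    # `boundary`?  True when start is before the boundary but the
    # finish point has moved at or past it.
    finish = (start + distance) & MASK
    return precedes(start, boundary) and precedes(boundary, finish)
```

Note the comparison works even when `start + distance` wraps past 2^32, which is the whole point of doing it modulo 2^32 instead of with ordinary `<`.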
Re: [HACKERS] brin regression test intermittent failures
Alvaro Herrera alvhe...@2ndquadrant.com writes: Tom Lane wrote: Also, further experimentation shows that there are about 30 entries in the brinopers table that give rise to seqscan plans even when we're commanding a bitmap scan, presumably because those operators aren't brin-indexable. They're not the problematic cases, but things like ((charcol)::text > 'A'::text) Is there a reason to have such things in the table, or is this just a thinko? Or is it actually a bug that we're getting such plans? No, I left those there knowing that there are no plans involving brin -- in a way, they provide some future proofing if some of those operators are made indexable later. On closer investigation, I think the ones involving charcol are a flat out bug in the test, namely failure to quote "char". Observe:

regression=# explain select ctid from brintest where charcol = 'A'::char;
                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on brintest  (cost=0.00..101.88 rows=1 width=6)
   Filter: ((charcol)::text = 'A'::text)
(2 rows)

regression=# explain select ctid from brintest where charcol = 'A'::"char";
                              QUERY PLAN
-----------------------------------------------------------------------
 Bitmap Heap Scan on brintest  (cost=48.02..58.50 rows=3 width=6)
   Recheck Cond: (charcol = 'A'::"char")
   ->  Bitmap Index Scan on brinidx  (cost=0.00..48.02 rows=3 width=0)
         Index Cond: (charcol = 'A'::"char")
(4 rows)

Presumably we'd like to test the latter case not the former. The other cases that I found involve cidrcol, and seem to represent an actual bug in the brin planning logic, ie failure to disregard a no-op cast. I'll look closer. I couldn't think of a way to test that the plans are actually using the brin index or not, but if we can do that in some way, that would be good. Yeah, we can do that --- the way I found out there's a problem is to modify the test script to check the output of EXPLAIN.
So at this point it looks like (1) chipmunk's issue might be explained by lack of forced ORDER BY; (2) the test script could be improved to test more carefully, and it has got an issue with char vs "char"; (3) there might be a planner bug. regards, tom lane
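The EXPLAIN-checking approach Tom describes can be sketched as a tiny helper. This is a hypothetical illustration (the function name and the captured plan strings are assumptions, not the actual regression harness), assuming the plan text has been captured as a string:

```python
def plan_uses_brin(plan_text, index_name="brinidx"):
    # A plan that actually used the BRIN index contains a
    # "Bitmap Index Scan" node naming that index; a plain
    # Seq Scan fallback does not.
    return f"Bitmap Index Scan on {index_name}" in plan_text


# Sample captured plans, mirroring the two EXPLAIN outputs above.
seqscan_plan = """Seq Scan on brintest  (cost=0.00..101.88 rows=1 width=6)
  Filter: ((charcol)::text = 'A'::text)"""

brin_plan = """Bitmap Heap Scan on brintest  (cost=48.02..58.50 rows=3 width=6)
  Recheck Cond: (charcol = 'A'::"char")
  ->  Bitmap Index Scan on brinidx  (cost=0.00..48.02 rows=3 width=0)
        Index Cond: (charcol = 'A'::"char")"""
```

A test script could warn for any brinopers entry whose plan fails this check, instead of only noticing when zero rows come back.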
Re: [HACKERS] RFC: Remove contrib entirely
On Thu, Jun 4, 2015 at 11:49 AM, Joshua D. Drake j...@commandprompt.com wrote: Except, that is kind of the point. Why are we adding to it? If you don't know the answer to that question already, then you probably shouldn't be proposing to get rid of the thing. I think it's because there are some things we want to include in the core distribution without baking them irrevocably into the server. Which, IMV, is 100% reasonable. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minor issue with BRIN regression tests
Peter Geoghegan p...@heroku.com writes: Attached patch adjusts BRIN regression tests to make a non-obvious dependency on tuple order explicit. Currently, an index-only scan plan is used by the query that I've adjusted. I'd rather be sure that that continues. Applied with a correction: the ordering that was being used was really ORDER BY thousand, tenthous because that's the order of the relevant index. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1
Alvaro Herrera wrote: Robert Haas wrote: So here's a patch taking a different approach. I tried to apply this to 9.3 but it's messy because of pgindent. Would anyone have a problem with me backpatching a pgindent run of multixact.c? Done. -- Álvaro Herrera  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] RFC: Remove contrib entirely
On 06/04/2015 11:01 AM, Robert Haas wrote: On Thu, Jun 4, 2015 at 1:57 PM, Joshua D. Drake j...@commandprompt.com wrote: I think it's because there are some things we want to include in the core distribution without baking them irrevocably into the server. That, as I have mentioned before, isn't really what this discussion is about. Stephen Frost and I even went through the entire list of modules of what should and should not be included. Which, IMV, is 100% reasonable. Nobody is arguing with that. Well then I am confused.

My argument was (after some preliminary discussion):

1. Review contrib
2. All modules that are core worthy install by default
3. Push all other contrib out into the wild

Possibly:

1. Have a contrib project that sat outside of core
2. Push all contrib or some contrib to pgxn (or some other place)

Because:

1. Decrease in code maintenance for core
2. Removal of the idea that contrib is a holding queue for not quite up to snuff code
3. Most extensions don't need to follow the same development pattern that core does
4. Eliminate the EGO of saying I have a contrib module in core

Derived from:

1. 15 years of the same argument (current source: pg_audit)
2. Helping reduce overall resource need to manage contrib
3. And now (Nasby's excellent argument)

I am not trying to throw a wrench in things. I am trying to help streamline them. A lot of the arguments presented just don't hold water (eating your own dogfood) because we aren't stopping anyone from doing that testing. Whether or not the source (whatever it may be) is in -contrib doesn't prevent that testing. Sincerely, Joshua D. Drake -- Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564 PostgreSQL Centered full stack support, consulting and development. Announcing I'm offended is basically telling the world you can't control your own emotions, so everyone else should do it for you.
Re: [HACKERS] RFC: Remove contrib entirely
On 2015-06-04 11:33:47 -0700, Joshua D. Drake wrote: My argument was (after some preliminary discussion): 1. Review contrib 2. All modules that are core worthy install by default 3. Push all other contrib out into the wild Possibly: 1. Have a contrib project that sat outside of core 2. Push all contrib or some contrib to pgxn (or some other place) Because:

1. Decrease in code maintenance for core

The modules that aren't going to be in core hardly cost time, do they?

2. Removal of the idea that contrib is a holding queue for not quite up to snuff code

I don't think that idea exists widely.

3. Most extensions don't need to follow the same development pattern that core does

The point is that users want them to follow that.

4. Eliminate the EGO of saying I have a contrib module in core

Seriously?

1. 15 years of the same argument (current source: pg_audit)

I don't see getting rid of contrib helping with that. The only change will then be whether something should be in core. And there's stuff that's just very hard to develop out of core; but doesn't necessarily immediately belong into core. E.g. pg_stat_statements is relatively closely tied to things in core.

2. Helping reduce overall resource need to manage contrib

This discussion is a significant share of that.
Re: [HACKERS] brin regression test intermittent failures
Alvaro Herrera alvhe...@2ndquadrant.com writes: Tom Lane wrote: I may be confused, but why would the physical ordering of the table entries make a difference to the correct answers for this test? (I can certainly see why that might break the brin code, but not why it should change the seqscan's answers.) We create the brintest using a scan of tenk1 LIMIT 100, without specifying the order. So whether we find rows that match each test query is pure chance. Oooh ... normally that would not matter, but I wonder if what's happening on chipmunk is that the synchronized-seqscan logic kicks in and causes us to read some other part of tenk1 than we normally would, as a consequence of some concurrent activity or other. The connection to smaller than normal shared_buffers would be that it would change our idea of what's a large enough table to justify using synchronized seqscans. Peter's patch failed to hit the place where this matters, btw. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RFC: Remove contrib entirely
On 2015-06-04 20:41:46 +0200, Andres Freund wrote: 1. 15 years of the same argument (current source: pg_audit) I don't see getting rid of contrib helping with that. The only change will then be whether something should be in core. And there's stuff that's just very hard to develop out of core; but doesn't necessarily immediately belong into core. E.g. pg_stat_statements is relatively closely tied to things in core. An even better example is pg_upgrade. It'd not have been possible to bring it somewhere close to maturity without having it in contrib/. It requires core cooperation, and we weren't immediately ready to declare it as part of core.
Re: [HACKERS] Further issues with jsonb semantics, documentation
On Thu, Jun 4, 2015 at 1:02 PM, Peter Geoghegan p...@heroku.com wrote: I would like these new-to-9.5 deletion operators to work at the top level only, like operator jsonb ? text and operator jsonb ?| text, sharing their idea of a key, __including that string array elements are keys__. We haven't got a containment-style nested delete operator for 9.5, but we can hope for it in the future. In the meantime, you get much of the benefit of that with jsonb_delete(), which *can* support nested deletion. It does so by buying into the operator jsonb ? text idea of a key (including that string array elements are keys), although with a twist: the paths text[] right operand operates at multiple nesting levels (not supporting traversing arrays, as Andrew implemented it, but OTOH adding support for deleting String array elements based on the string alone, useful for tag arrays). If in 9.6 we have something like a jsonb @- jsonb operator for containment-style deletion, and a 9.5-era operator jsonb - text and operator jsonb - text[] pair for existence-style deletion (matching operator jsonb ? text, operating only on the top level), that will be pretty good. The fact that jsonb_delete() will have somewhat bridged the gap nesting-deletion-wise for 9.5 (without being usable through an operator) won't really matter then. I want to keep the twist I described out of any jsonb operators that are shipped, and only use it within functions. To be clear: these two paragraphs are a proposal about how I'd like to change things for 9.5 to make the jsonb operators more consistent than the way things are in the master branch, while still offering nested deletion through a function. -- Peter Geoghegan
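The top-level-only deletion semantics Peter proposes can be modeled in a few lines. A sketch (the function name is hypothetical, and Python dicts/lists stand in for jsonb objects/arrays):

```python
def jsonb_delete_key(doc, key):
    """Model of top-level-only deletion: for an object, drop the key;
    for an array, drop string elements equal to the key (mirroring how
    `jsonb ? text` treats string array elements as keys).  Nested
    values are never traversed."""
    if isinstance(doc, dict):
        return {k: v for k, v in doc.items() if k != key}
    if isinstance(doc, list):
        # Only string elements can match a text key; other element
        # types (numbers, sub-objects) are left alone.
        return [v for v in doc if not (isinstance(v, str) and v == key)]
    return doc
```

Under this model, deleting "a" from {"a": 1, "b": {"a": 2}} leaves the nested "a" intact, which is exactly the consistency with the top-level-only `?` operator that the proposal argues for.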
Re: [HACKERS] Further issues with jsonb semantics, documentation
On Jun 4, 2015, at 12:16 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: I'm just skimming here, but if a jsonb_path type is being proposed, Is this not the purpose of JSQuery? https://code.google.com/p/gwtquery/wiki/JsQuery David
Re: [HACKERS] RFC: Remove contrib entirely
On Jun 4, 2015, at 11:53 AM, Neil Tiffin ne...@neiltiffin.com wrote: I have looked at PGXN and would never install anything from it. Why? Because it is impossible to tell, without inside knowledge or a lot of work, what is actively maintained and tested, and what is an abandoned proof-of-concept or idea. Well, you can see the last release dates for a basic idea of that sort of thing. Also the release status (stable, unstable, testing). There is no indication of what versions of pg any of PGXN modules are tested on, or even if there are tests that can be run to prove the module works correctly with a particular version of pg. Yeah, I’ve been meaning to integrate http://pgxn-tester.org/ results for all modules, which would help with that. In the meantime you can hit that site itself. Awesome work by Tomas Vondra. There are many modules that have not been updated for several years. What is their status? If they break is there still someone around to fix them or even cares about them? If not, then why waste my time. These are challenges to open-source software in general, and not specific to PGXN. So adding to Jim’s comment above, anything that vets or approves PGXN modules is, in my opinion, essentially required to make PGXN useful for anything other than a scratchpad. Most of the distributions on PGXN feature links to their source code repositories. A big help would be to pull in the date of the last git commit in the module overview and ask the authors to edit the readme to add what major version of pg the author last tested or ran on. That’s difficult to maintain; I used to do it for pgTAP, was too much work. pgxn-tester.org is a much better idea. Best, David
Re: [HACKERS] Streaming replication for psycopg2
On Tue, Jun 2, 2015 at 2:23 PM, Shulgin, Oleksandr oleksandr.shul...@zalando.de wrote: Hello, I've submitted a patch to psycopg2 to support streaming replication protocol (COPY_BOTH): https://github.com/psycopg/psycopg2/pull/322 It would be great if more people had a chance to take a look and provide feedback about it. In particular, please see example usage at this github comment[1]. Some bikeshedding is really needed here. :-)

It has been suggested to me that instead of putting the sync/stop methods into the replication message class, like this:

class ReplicationMessage(str):
    # wal_end
    # data_start
    # send_time

    # def commit(self): ...
    # def stop(self): ...

it would make more sense to put them into the cursor, like this:

class ReplicationCursor(...):
    def sync_server(self, msg): ...
    def stop_replication(self): ...

The client code will then be able to do this:

class LogicalWriter(object):
    def __init__(self, cursor):
        self.cursor = cursor

    def write(self, msg):  # receives instance of ReplicationMessage
        if should_stop_replication():
            self.cursor.stop_replication()
            return

        self.actually_store_the_message(msg)

        if stored_reliably() and want_to_report_now():
            self.cursor.sync_server(msg)

        # return value not examined by caller

That seems like a more sane interface to me. -- Alex
Re: [HACKERS] RFC: Remove contrib entirely
On 06/04/2015 09:00 AM, Andrew Dunstan wrote: No. You keep getting this wrong. The fact that we have extensions doesn't mean that we want to throw out everything that is an extension. It's perfectly reasonable for us to maintain some ourselves, not least as part of eating our own dog food. Yes. I agree with that. Which is why I suggested .Org -contrib project. JD -- Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564 PostgreSQL Centered full stack support, consulting and development. Announcing I'm offended is basically telling the world you can't control your own emotions, so everyone else should do it for you.
Re: [HACKERS] Further issues with jsonb semantics, documentation
On 6/4/15 8:43 AM, Andrew Dunstan wrote: You are conflating two different things here, quite pointlessly. The RH operand of ?| is not a path, whereas the RH operand of this - variant is. The fact that they are both text arrays doesn't mean that they should mean the same thing. And this is really the whole problem with the rest of your analysis. Has the idea of a specific json_path datatype been discussed? I feel it would add a lot of clarity to the operators. It would also make it easy to have an array of paths, something that's difficult to do today because a path can be an arbitrary length and arrays don't support that. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1
On Thu, Jun 4, 2015 at 9:42 AM, Robert Haas robertmh...@gmail.com wrote: Thanks for the review. Here's a new version. I've fixed the things Alvaro and Noah noted, and some compiler warnings about set but unused variables. I also tested it, and it doesn't quite work as hoped. If started on a cluster where oldestMultiXid is incorrectly set to 1, it starts up and indicates that the member wraparound guards are disabled. But even after everything is fixed, they don't get enabled until after the next full restart. I think that's because TruncateMultiXact() bails out too early, without calling DetermineSafeOldestOffset. My attempt at a quick fix for that problem didn't work out, so I'm posting this version for now to facilitate further review and testing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

commit eb39cf10e4ff853ed4b9d3fab599cf42911e6f70
Author: Robert Haas rhaas@postgresql.org
Date:   Thu Jun 4 11:58:49 2015 -0400

    bar

diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 699497c..209d3e6 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -196,13 +196,24 @@ typedef struct MultiXactStateData
 	/* next-to-be-assigned offset */
 	MultiXactOffset nextOffset;
 
+	/* Have we completed multixact startup? */
+	bool		finishedStartup;
+
 	/*
-	 * Oldest multixact that is still on disk. Anything older than this
-	 * should not be consulted. These values are updated by vacuum.
+	 * Oldest multixact that is still potentially referenced by a relation.
+	 * Anything older than this should not be consulted. These values are
+	 * updated by vacuum.
 	 */
 	MultiXactId oldestMultiXactId;
 	Oid			oldestMultiXactDB;
+
+	/*
+	 * Oldest multixact offset that is potentially referenced by a
+	 * multixact referenced by a relation. We don't always know this value,
+	 * so there's a flag here to indicate whether or not we currently do.
+	 */
 	MultiXactOffset oldestOffset;
+	bool		oldestOffsetKnown;
 
 	/*
 	 * This is what the previous checkpoint stored as the truncate position.
@@ -219,6 +230,7 @@ typedef struct MultiXactStateData
 
 	/* support for members anti-wraparound measures */
 	MultiXactOffset offsetStopLimit;
+	bool		offsetStopLimitKnown;
 
 	/*
 	 * Per-backend data starts here. We have two arrays stored in the area
@@ -348,10 +360,11 @@ static bool MultiXactOffsetPrecedes(MultiXactOffset offset1,
 					 MultiXactOffset offset2);
 static void ExtendMultiXactOffset(MultiXactId multi);
 static void ExtendMultiXactMember(MultiXactOffset offset, int nmembers);
-static void DetermineSafeOldestOffset(MultiXactId oldestMXact);
+static void DetermineSafeOldestOffset(MultiXactOffset oldestMXact);
 static bool MultiXactOffsetWouldWrap(MultiXactOffset boundary,
 						 MultiXactOffset start, uint32 distance);
-static MultiXactOffset find_multixact_start(MultiXactId multi);
+static bool SetOffsetVacuumLimit(bool finish_setup);
+static bool find_multixact_start(MultiXactId multi, MultiXactOffset *result);
 static void WriteMZeroPageXlogRec(int pageno, uint8 info);
@@ -960,7 +973,8 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
 	 * against catastrophic data loss due to multixact wraparound. The basic
 	 * rules are:
 	 *
-	 * If we're past multiVacLimit or the safe threshold for member storage space,
+	 * If we're past multiVacLimit or the safe threshold for member storage
+	 * space, or we don't know what the safe threshold for member storage is,
 	 * start trying to force autovacuum cycles.
 	 * If we're past multiWarnLimit, start issuing warnings.
 	 * If we're past multiStopLimit, refuse to create new MultiXactIds.
@@ -969,6 +983,7 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
 	 *--
 	 */
 	if (!MultiXactIdPrecedes(result, MultiXactState->multiVacLimit) ||
+		!MultiXactState->oldestOffsetKnown ||
 		(MultiXactState->nextOffset - MultiXactState->oldestOffset
 		 > MULTIXACT_MEMBER_SAFE_THRESHOLD))
 	{
@@ -1083,7 +1098,8 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
 	 *--
 	 */
 #define OFFSET_WARN_SEGMENTS	20
-	if (MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit, nextOffset,
+	if (MultiXactState->offsetStopLimitKnown &&
+		MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit, nextOffset,
 								 nmembers))
 		ereport(ERROR,
 				(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
@@ -1095,7 +,8 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
 					MultiXactState->offsetStopLimit - nextOffset - 1),
 				 errhint("Execute a database-wide VACUUM in database with OID %u with reduced vacuum_multixact_freeze_min_age and vacuum_multixact_freeze_table_age settings.",
 						 MultiXactState->oldestMultiXactDB)));
-	else if (MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit,
+	else if (MultiXactState->offsetStopLimitKnown &&
+			 MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit,
Re: [HACKERS] RFC: Remove contrib entirely
On Thu, Jun 4, 2015 at 11:22 AM, Andrew Dunstan and...@dunslane.net wrote: The biggest problem is that packagers tend just to bundle contrib together in one lump. If we could divide it into two, something like standard modules and misc, with the former being included with the server package, I think that would be an advance, although packagers might reasonably want to treat pgcrypto as a special case. The problem is that it's very hard to agree on which stuff ought to be standard and which stuff ought to be misc. There's probably some stuff that almost everyone would agree is pretty useful (like hstore and postgres_fdw) but figuring out which stuff isn't useful is a lot harder. Almost anything you say "that's junk" about, someone else will say "no, that thing is great, I use it all the time." For example, I just offered contrib/isn as a sort of archetypal example of stuff that's pretty marginal crap and Josh immediately came back and said, hey, I use that! I don't see any principled way of getting past that difficulty. Just because a module isn't regularly used by someone who reads -hackers daily doesn't mean it's not worth keeping. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] brin regression test intermittent failures
Alvaro Herrera alvhe...@2ndquadrant.com writes: Tom Lane wrote: Fixed, see 79f2b5d583e2e2a7; but AFAICS this has no real-world impact so it does not explain whatever is happening on chipmunk. Ah, thanks for diagnosing that. The chipmunk failure is strange -- notice it only references the = operators, except for type box for which it's ~= that fails. The test includes a lot of operators ... Actually not --- if you browse through the last half dozen failures on chipmunk you will notice that (1) the set of operators complained of varies a bit from one failure to the next; (2) more often than not, this is one of the failures: WARNING: no results for (boxcol,@,box,((1,2),(300,400))) Certainly the majority of the complaints are about equality operators, but not quite all of them. Also, we have quite a number of ARM boxes: apart from chipmunk we have gull, hamster, mereswine, dangomushi, axolotl, grison. (hamster and chipmunk report hostname -m as armv6l, the others armv7l). All of them are running Linux, either Fedora or Debian. Most are using gcc, compilation flags look pretty standard. I have no idea what might be different about chipmunk compared to any other ARM buildfarm critter ... Heikki, any thoughts on that? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RFC: Remove contrib entirely
On 06/04/2015 08:55 AM, Jim Nasby wrote: Personally, I'd rather we publish a list of formally vetted and approved versions of PGXN modules. There are many benefits to that, and the downside of not having that stuff as part of make check would be overcome by the explicit testing we would need to have for approved modules. I tried to come up with more words but: Exactly. The benefit to this idea so far exceeds the benefit to having contrib. Sincerely, JD -- The most kicking donkey PostgreSQL Infrastructure company in existence. The oldest, the most experienced, the consulting company to the stars. Command Prompt, Inc. http://www.commandprompt.com/ +1 -503-667-4564 - 24x7 - 365 - Proactive and Managed Professional Services! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] brin regression test intermittent failures
Tom Lane wrote: Actually not --- if you browse through the last half dozen failures on chipmunk you will notice that (1) the set of operators complained of varies a bit from one failure to the next; (2) more often than not, this is one of the failures: WARNING: no results for (boxcol,@,box,((1,2),(300,400))) Certainly the majority of the complaints are about equality operators, but not quite all of them. Hm. Well, what this message says is that we ran that query using both BRIN and seqscan, and that in both cases no row was returned. Note that if the BRIN and seqscan cases had returned different sets of rows, the error message would have been different. So this might be related to the way the test table is created, rather than to a bug in BRIN. Peter G. recently pointed out that this seems to be relying on an index-only scan on table tenk1 and suggested an ORDER BY. Maybe that assumption is being violated on chipmunk, and so the table is populated differently from what the test actually expects. I just noticed that chipmunk has shared_buffers=10MB on its buildfarm config. I don't see that in any of the other ARM animals. Maybe that can change the plan choice. I will test locally with reduced shared_buffers and see if I can reproduce the results. -- Álvaro Herrera  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] brin regression test intermittent failures
Alvaro Herrera wrote: Hm. Well, what this message says is that we ran that query using both BRIN and seqscan, and that in both cases no row was returned. Note that if the BRIN and seqscan cases had returned different sets of rows, the error message would have been different. So this might be related to the way the test table is created, rather than to a bug in BRIN. Peter G. recently pointed out that this seems to be relying on an index-only scan on table tenk1 and suggested an ORDER BY. Maybe that assumption is being violated on chipmunk and so the table populated is different than what the table actually expects. Evidently there is a problem right there. If I simply add an order by tenthous as proposed by Peter, many more errors appear; and what errors appear differs if I change shared_buffers. I think the real fix for this is to change the hand-picked values used in the brinopers table, so that they all pass the test using some reasonable ORDER BY specification in the populating query (probably tenk1.unique1). -- Álvaro Herrera  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] RFC: Remove contrib entirely
Robert Haas robertmh...@gmail.com writes: On Thu, Jun 4, 2015 at 11:22 AM, Andrew Dunstan and...@dunslane.net wrote: The biggest problem is that packagers tend just to bundle contrib together in one lump. If we could divide it into two, something like standard modules and misc, with the former being included with the server package, I think that would be an advance, although packagers might reasonably want to treat pgcrypto as a special case. As an ex-packager, I agree that would be a good thing. In particular we should try to avoid people packaging stuff that's meant either for server testing or as sample-source-code-not-useful-in-itself. We've made some progress on the former via src/test/modules but I wonder if we're all the way there; test_decoding surely shouldn't be in contrib on this measure should it? BTW, pg_upgrade is also a special case from a packager's viewpoint, since it needs to be bundled with back-version executables. The problem is that it's very hard to agree on which stuff ought to be standard and which stuff ought to be misc. There's probably some stuff that almost everyone would agree is pretty useful (like hstore and postgres_fdw) but figuring out which stuff isn't useful is a lot harder. Almost anything you say - that's junk - someone else will say - no, that thing is great, I use it all the time. For example, I just offered contrib/isn as a sort of archetypal example of stuff that's pretty marginal crap and Josh immediately came back and said, hey, I use that! I don't see any principled way of getting past that difficulty. Just because a module isn't regularly used by someone who reads -hackers daily doesn't mean it's not worth keeping. Yeah. One possible way of measuring this would be to go through the commit logs and count commits in contrib/ that either added a new feature or fixed a field-reported bug (ie, did not arise simply from core-code-driven housekeeping). Either would be solid evidence that somebody out there is using it. 
Gathering the evidence would be pretty tedious though :-( regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
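Tom's commit-counting idea could be roughed out mechanically. A sketch (the commit data here would plausibly come from something like `git log --name-only -- contrib/`; the housekeeping-message prefixes are illustrative assumptions, and telling real features/bugfixes from housekeeping would still need manual review):

```python
from collections import Counter


def count_module_commits(commits):
    """Count commits per contrib module.

    `commits` is an iterable of (touched_paths, message) pairs, e.g.
    parsed from `git log --name-only`.  Commits whose messages look
    like pure housekeeping are skipped, per Tom's criterion that only
    feature additions and field-reported bug fixes count as evidence
    of real-world use.
    """
    housekeeping_prefixes = ("pgindent", "Stamp", "Translation")  # illustrative
    counts = Counter()
    for paths, message in commits:
        if message.startswith(housekeeping_prefixes):
            continue
        modules = {p.split("/")[1] for p in paths if p.startswith("contrib/")}
        for module in modules:
            counts[module] += 1
    return counts


# Hypothetical sample data:
sample = [
    (["contrib/hstore/hstore.c"], "Add hstore feature"),
    (["contrib/isn/isn.c"], "Fix field-reported isn bug"),
    (["contrib/hstore/hstore.c", "contrib/isn/isn.c"], "pgindent run"),
]
```

The per-module tallies would then give a rough activity ranking to argue about, rather than deciding module by module from memory.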
Re: [HACKERS] brin regression test intermittent failures
Alvaro Herrera alvhe...@2ndquadrant.com writes: Evidently there is a problem right there. If I simply add an order by tenthous as proposed by Peter, many more errors appear; and what errors appear differs if I change shared_buffers. I think the real fix for this is to change the hand-picked values used in the brinopers table, so that they all pass the test using some reasonable ORDER BY specification in the populating query (probably tenk1.unique1). I may be confused, but why would the physical ordering of the table entries make a difference to the correct answers for this test? (I can certainly see why that might break the brin code, but not why it should change the seqscan's answers.) Also, what I'd just noticed is that all of the cases that are failing are ones where the expected number of matching rows is exactly 1. I am wondering if the test is sometimes just missing random rows, and we're not seeing any reported problem unless that makes it go down to no rows. (But I do not know how that could simultaneously affect the seqscan case ...) I think it would be a good idea to extend the brinopers table to include the number of expected matches, and to complain if that's not what we got, rather than simply checking for zero. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
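The extension Tom suggests (recording the expected match count per brinopers entry and complaining on any mismatch, not just on zero rows) could look roughly like this; a hypothetical helper, with Python standing in for the test script's logic:

```python
def check_oper(expected_count, seq_rows, brin_rows):
    """Sketch of the tightened check: compare the seqscan and BRIN
    results against each other AND against an explicit expected row
    count, so that silently missing rows are caught even when both
    scans agree."""
    problems = []
    if set(seq_rows) != set(brin_rows):
        problems.append("seqscan and brin scan disagree")
    if len(brin_rows) != expected_count:
        problems.append(f"expected {expected_count} rows, got {len(brin_rows)}")
    return problems
```

With the current zero-only check, a query expected to match exactly one row that matches none is the only detectable failure; this version would also flag the case Tom worries about, where random rows go missing but at least one survives.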
Re: [HACKERS] RFC: Remove contrib entirely
On 06/04/2015 07:34 AM, Robert Haas wrote: I don't think it's very practical to talk about getting rid of contrib when we reliably add to it in every release: Except, that is kind of the point. Why are we adding to it? Contrib (AFAICS) is all things that don't need to be in -core. That is the whole point of having extensions, is it not? Sincerely, JD -- Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564 PostgreSQL Centered full stack support, consulting and development. Announcing I'm offended is basically telling the world you can't control your own emotions, so everyone else should do it for you. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RFC: Remove contrib entirely
On 6/4/15 10:30 AM, Robert Haas wrote: On Thu, Jun 4, 2015 at 11:22 AM, Andrew Dunstan and...@dunslane.net wrote: The biggest problem is that packagers tend just to bundle contrib together in one lump. If we could divide it into two, something like standard modules and misc, with the former being included with the server package, I think that would be an advance, although packagers might reasonably want to treat pgcrypto as a special case. The problem is that it's very hard to agree on which stuff ought to be standard and which stuff ought to be misc. What I took away upthread was that the idea here was to distinguish things that were intended as a POC (like worker_spi, auth_delay and test_decoding) from everything else. I think the real problem here that we're skirting around is this idea of 'blessed extensions', because that's really the only user benefit contrib brings: the idea that this stuff is formally blessed by the community. If that's really what we're after then we should just be explicit about that. Then we can decide if the best way to approach that is keeping it in the main repo (as opposed to, say, publishing a list of explicit PGXN package versions and their checksums). Personally, I'd rather we publish a list of formally vetted and approved versions of PGXN modules. There are many benefits to that, and the downside of not having that stuff as part of make check would be overcome by the explicit testing we would need to have for approved modules. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RFC: Remove contrib entirely
On 06/04/2015 11:49 AM, Joshua D. Drake wrote: On 06/04/2015 07:34 AM, Robert Haas wrote: I don't think it's very practical to talk about getting rid of contrib when we reliably add to it in every release: Except, that is kind of the point. Why are we adding to it? Contrib (AFAICS) is all things that don't need to be in -core. That is the whole point of having extensions, is it not? No. You keep getting this wrong. The fact that we have extensions doesn't mean that we want to throw out everything that is an extension. It's perfectly reasonable for us to maintain some ourselves, not least as part of eating our own dog food. And I say that as someone who has created and maintains quite a few third party extensions. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Dependency between bgw_notify_pid and bgw_flags
Hi, Documentation here http://www.postgresql.org/docs/devel/static/bgworker.html does not indicate any relation between the fields bgw_notify_pid and bgw_flags of the BackgroundWorker structure. But one has to set BGWORKER_BACKEND_DATABASE_CONNECTION in order to use the bgw_notify_pid feature. In BackgroundWorkerStateChange:

	/*
	 * Copy the PID to be notified about state changes, but only if the
	 * postmaster knows about a backend with that PID. It isn't an error
	 * if the postmaster doesn't know about the PID, because the backend
	 * that requested the worker could have died (or been killed) just
	 * after doing so. Nonetheless, at least until we get some experience
	 * with how this plays out in the wild, log a message at a relatively
	 * high debug level.
	 */
	rw->rw_worker.bgw_notify_pid = slot->worker.bgw_notify_pid;
	if (!PostmasterMarkPIDForWorkerNotify(rw->rw_worker.bgw_notify_pid))
	{
		elog(DEBUG1, "worker notification PID %lu is not valid",
			 (long) rw->rw_worker.bgw_notify_pid);
		rw->rw_worker.bgw_notify_pid = 0;
	}

bgw_notify_pid gets wiped out (and that too silently) if PostmasterMarkPIDForWorkerNotify() returns false. PostmasterMarkPIDForWorkerNotify() only looks at the BackendList, which does not contain all the background workers created by Register*BackgroundWorker() calls. Only a background worker which has set BGWORKER_BACKEND_DATABASE_CONNECTION gets added to the BackendList in maybe_start_bgworker():

	if (rw->rw_worker.bgw_flags & BGWORKER_BACKEND_DATABASE_CONNECTION)
	{
		if (!assign_backendlist_entry(rw))
			return;
	}
	else
		rw->rw_child_slot = MyPMChildSlot = AssignPostmasterChildSlot();

	do_start_bgworker(rw);		/* sets rw->rw_pid */

	if (rw->rw_backend)
	{
		dlist_push_head(&BackendList, &rw->rw_backend->elem);

Should we change the documentation to say one needs to set BGWORKER_BACKEND_DATABASE_CONNECTION in order to use the bgw_notify_pid feature?
OR we should fix the code not to wipe out bgw_notify_pid in the code above (maybe change PostmasterMarkPIDForWorkerNotify() to scan the list of other background workers as well)? -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] RFC: Remove contrib entirely
On Fri, May 29, 2015 at 10:30 AM, Stephen Frost sfr...@snowman.net wrote: That work will be much less if we simply split what's in contrib now into extension and contrib directories, as it's still all one source repo to the packagers. If we punt things out (unless they're being formally deprecated/removed) to another repo somewhere, then the packagers are going to have to deal with new source repos and related complexity. I'm not saying that's a horrible thing and it might make sense in some cases, but generally it's a lot easier to go from one source package to a bunch of binary ones than from lots of tiny source packages to lots of tiny packages. The versioning aspect of this does come into play though, as having everything with one relatively slow versioning cycle (on the order of months) is actually keeping the load on the packagers down. Lots of little releases, all at different times, from lots of different source packages, would increase the workload. These are all good points. I'm not sure where I ultimately come down on the question about in-repo vs. out-of-repo. My gut feeling is that if the community is willing to continue maintaining contrib modules, then that's ultimately a good thing and many of them are relatively low-maintenance anyway. Having a high barrier to entry for new modules looks a bit odd, given the definition of contrib, but would be more understandable with a proper 'extensions' directory. Of course, we'd have to collectively agree that we feel comfortable with a lower barrier for contrib than what is being done now, if the distinction is going to have any meaning. I don't agree. It may make sense to keep some stuff in contrib that is not great code or not really that useful just for historical reasons. But if we accept every extension with the quality of, say, contrib/isn into the core distribution, our core distribution is going to be huge and bloated with crap.
I'd like to point out that we've already done some significant cleanup of contrib - some things got moved to src/test/modules, and other stuff to src/bin. So what's in that directory is already a lot more homogenous than it was a few years ago. Pretty much everything in there is a loadable module, and if it does something SQL-visible, it's packaged as an extension. The only exceptions are oid2name, pg_standby, start-scripts, and vacuumlo. For a directory with 60 things in it, that's pretty good. I don't think it's very practical to talk about getting rid of contrib when we reliably add to it in every release: 9.1 added auth_delay, dummy_seclabel, file_fdw, pg_test_fsync, and sepgsql 9.2 added pg_test_timing and tcn 9.3 added pg_xlogdump, postgres_fdw, and worker_spi 9.4 added pg_prewarm, test_decoding, and test_shm_mq 9.5 added hstore_plperl, hstore_plpython, ltree_plpython, tsm_system_rows, and tsm_system_time I haven't really gotten familiar with the 9.5 stuff, but most of that earlier stuff was, uh, pretty good stuff. We wouldn't have been better off rejecting it, and we wouldn't have been better off putting it into the main tree. Any individual person who looks at contrib will no doubt see a fairly high percentage of stuff they don't care about. Given that part of the charter of contrib is to hold things that we don't want to put in the core product for one reason or another, that is to be expected. But every single one of those 18 contrib modules we've added in the last 5 years was something that someone cared deeply about, many of them get real use, and we're just sticking our head in the sand if we think that's not going to keep happening. For what it's worth, I also don't particularly support renaming contrib. I don't really see that it buys us enough to justify the hassle it will cause. 
One thing that may be worth doing yet is separating the code that is just intended as a POC (like worker_spi, auth_delay and test_decoding) from the stuff that you are really intended to run in production (like tcn and hstore). But that distinction is fuzzier than you might think, because while auth_delay was intended as a POC, I've subsequently heard rumors of it being used in production with satisfactory results. It's very easy to get the idea that you know what PostgreSQL users use but usually that tends to mean what I use and the community is broad enough that those things are Not The Same. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Map basebackup tablespaces using a tablespace_map file
On 06/04/2015 12:44 AM, Amit Kapila wrote: Given that the function raises an error on failure, I think it will otherwise be OK as is. Please find an updated patch attached with this mail. No attachment. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RFC: Remove contrib entirely
On 06/04/2015 10:34 AM, Robert Haas wrote: For what it's worth, I also don't particularly support renaming contrib. I don't really see that it buys us enough to justify the hassle it will cause. One thing that may be worth doing yet is separating the code that is just intended as a POC (like worker_spi, auth_delay and test_decoding) from the stuff that you are really intended to run in production (like tcn and hstore). But that distinction is fuzzier than you might think, because while auth_delay was intended as a POC, I've subsequently heard rumors of it being used in production with satisfactory results. It's very easy to get the idea that you know what PostgreSQL users use but usually that tends to mean what I use and the community is broad enough that those things are Not The Same. The biggest problem is that packagers tend just to bundle contrib together in one lump. If we could divide it into two, something like standard modules and misc, with the former being included with the server package, I think that would be an advance, although packagers might reasonably want to treat pgcrypto as a special case. I'm also not in favor of booting packages wholesale out of core, for all the good reasons you and others have advanced. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [CORE] Restore-reliability mode
Josh, * Josh Berkus (j...@agliodbs.com) wrote: I would argue that if we delay 9.5 in order to do a 100% manual review of code, without adding any new automated tests or other non-manual tools for improving stability, then it's a waste of time; we might as well just release the beta, and our users will find more issues than we will. I am concerned that if we declare a cleanup period, especially in the middle of the summer, all that will happen is that the project will go to sleep for an extra three months. This is the exact same concern that I have. A delay just to have a delay is not useful. I completely agree that we need more automated testing, etc, though getting all of that set up and running could be done at any time too- there's no reason to wait, nor do I believe delaying 9.5 would make such automated testing appear. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] xid wrap / optimize frozen tables?
Just FYI: We have worked around these issues by running regular (scripted and thus controlled) vacuums on all tables but the active ones and adding L2 ZFS caching (l2arc). I hope to get back to this again soon. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1
On Wed, Jun 03, 2015 at 04:53:46PM -0400, Robert Haas wrote: So here's a patch taking a different approach. In this approach, if the multixact whose members we want to look up doesn't exist, we don't use a later one (that might or might not be valid). Instead, we attempt to cope with the unknown. That means: 1. In TruncateMultiXact(), we don't truncate. I like that change a lot. It's much easier to seek forgiveness for wasting <= 28 GiB of disk than for deleting visibility information wrongly. 2. If setting the offset stop limit (the point where we refuse to create new multixact space), we don't arm the stop point. This means that if you're in this situation, you run without member wraparound protection until it's corrected. A message gets logged once per checkpoint telling you that you have this problem, and another message gets logged when things get straightened out and the guards are enabled. 3. If setting the vacuum force point, we assume that it's appropriate to immediately force vacuum. Those seem reasonable, too. I've only tested this very lightly - this is just to see what you and Noah and others think of the approach. As compared with the previous approach, it has the advantage of making minimal assumptions about the sanity of what's on disk. It has the disadvantage that, for some people, the member-wraparound guard won't be enabled at startup -- but note that those people can't start 9.3.7/9.4.2 *at all*, so currently they are either running without member wraparound protection anyway (if they haven't upgraded to those releases) or they're down entirely. That disadvantage is negligible, considering. Another disadvantage is that we'll be triggering what may be quite a bit of autovacuum activity for some people, which could be painful. On the plus side, they'll hopefully end up with sane relminmxid and datminmxid guards afterwards. That sounds good so long as each table requires just one successful emergency autovacuum.
I'm not seeing code to ensure that the launched autovacuum will indeed perform a full-table scan and update relminmxid; is it there? For sites that can't tolerate an autovacuum storm, what alternative can we provide? Is "SET vacuum_multixact_freeze_table_age = 0; VACUUM table" of every table, done before applying the minor update, sufficient?

 static void
-DetermineSafeOldestOffset(MultiXactId oldestMXact)
+DetermineSafeOldestOffset(MultiXactOffset oldestMXact)

Leftover change from an earlier iteration? The values passed continue to be MultiXactId values.

 	/* move back to start of the corresponding segment */
-	oldestOffset -= oldestOffset %
-		(MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT);
+	offsetStopLimit = oldestOffset - (oldestOffset %
+		(MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT));
+	/* always leave one segment before the wraparound point */
+	offsetStopLimit -= (MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT);
+
+	/* if nothing has changed, we're done */
+	if (prevOffsetStopLimitKnown && offsetStopLimit == prevOffsetStopLimit)
+		return;

 	LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
-	/* always leave one segment before the wraparound point */
-	MultiXactState->offsetStopLimit = oldestOffset -
-		(MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT);
+	MultiXactState->offsetStopLimit = oldestOffset;

That last line needs s/oldestOffset/offsetStopLimit/, I presume. Thanks, nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [CORE] [HACKERS] postpone next week's release
On 06/04/2015 12:17 PM, Andres Freund wrote: On 2015-06-04 11:51:44 +0300, Heikki Linnakangas wrote: So, I'm all for refactoring and adding abstractions where it makes sense, but it's not going to solve design problems. I personally don't really see the multixact changes being that bad on the overall design. It pretty much just extended an earlier design. Now that wasn't great, but I don't think too many people had realized that at that point. The biggest problem was underestimating the complexity. Yeah, many of the issues were pre-existing, and would've been good to fix anyway. The multixact issues remind me of the another similar thing we did: the visibility map. It too was non-critical when it was first introduced, but later we started using it for index-only-scans, and it suddenly became important that it's up-to-date and crash-safe. We did uncover some bugs in that area when index-only-scans were introduced, similar to the multixact bugs, only not as bad because it didn't lead to data loss. I don't have any point to make with that comparison, but it was similar in many ways. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Construction of Plan-node by CSP (RE: Custom/Foreign-Join-APIs)
-----Original Message----- From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tom Lane Sent: Thursday, May 28, 2015 5:35 AM To: Robert Haas Cc: Kaigai Kouhei(海外 浩平); Thom Brown; Kohei KaiGai; Shigeru Hanada; pgsql-hackers@postgreSQL.org Subject: Re: [HACKERS] Construction of Plan-node by CSP (RE: Custom/Foreign-Join-APIs) Robert Haas robertmh...@gmail.com writes: Tom, do you want to review this patch and figure out how to solve the underlying problem? If not, I will take care of it. But I will be unhappy if I put time and effort into this and then you insist on changing everything afterwards, again. [ sorry for slow response, been busy ] I will take a look. Tom, how about your availability? From my side, I adjusted my extension (PG-Strom) to fit the infrastructure you proposed, then confirmed it is workable even if a custom-scan that replaced a relations join takes more than two Path nodes in the custom_children list of CustomPath, without exporting create_plan_recurse(). Below is an example of a custom-scan (GpuJoin) that involves a four-relations join. Its code base is the latest master + custom-join-children.v2.patch; unchanged from the last post.

postgres=# explain analyze select avg(x) from t0 natural join t1 natural join t2 natural join t3 group by cat;
                                  QUERY PLAN
------------------------------------------------------------------------------
 HashAggregate  (cost=298513.77..298514.10 rows=26 width=12) (actual time=5622.028..5622.033 rows=26 loops=1)
   Group Key: t0.cat
   ->  Custom Scan (GpuJoin)  (cost=4702.00..249169.85 rows=9868784 width=12) (actual time=540.718..2075.566 rows=1000 loops=1)
         Bulkload: On (density: 100.00%)
         Depth 1: Logic: GpuHashJoin, HashKeys: (cid), JoinQual: (cid = cid), nrows_ratio: 0.98936439
         Depth 2: Logic: GpuHashJoin, HashKeys: (bid), JoinQual: (bid = bid), nrows_ratio: 0.99748135
         Depth 3: Logic: GpuHashJoin, HashKeys: (aid), JoinQual: (aid = aid), nrows_ratio: 1.
         ->  Custom Scan (BulkScan) on t0  (cost=0.00..242858.60 rows=1060 width=24) (actual time=8.555..903.864 rows=1000 loops=1)
         ->  Seq Scan on t3  (cost=0.00..734.00 rows=4 width=4) (actual time=0.019..4.370 rows=4 loops=1)
         ->  Seq Scan on t2  (cost=0.00..734.00 rows=4 width=4) (actual time=0.004..4.182 rows=4 loops=1)
         ->  Seq Scan on t1  (cost=0.00..734.00 rows=4 width=4) (actual time=0.005..4.275 rows=4 loops=1)
 Planning time: 0.918 ms
 Execution time: 6178.264 ms
(13 rows)

Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com custom-join-children.v2.patch Description: custom-join-children.v2.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Missing -i / --ignore-version in pg_dump help
On Wed, Jun 3, 2015 at 1:53 AM, Fujii Masao masao.fu...@gmail.com wrote: On Fri, May 22, 2015 at 11:01 PM, Tom Lane t...@sss.pgh.pa.us wrote: Fujii Masao masao.fu...@gmail.com writes: On Fri, May 22, 2015 at 8:59 PM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: There are some reason to -i, --ignore-version option doesn't appear in pg_dump help? Because it's obsolete option, I think. Yeah, see commit c22ed3d523782c43836c163c16fa5a7bb3912826. Considering we shipped that in 8.4, maybe it's time to remove all trace of those switches. We'd still have to wait a couple releases before it'd be safe to use -i for something else, but it'd be a start. +1 Barring any objection, I will apply the attached patch and remove that obsolete option.. Applied. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RFC: Remove contrib entirely
On Jun 4, 2015, at 10:55 AM, Jim Nasby jim.na...@bluetreble.com wrote: Personally, I'd rather we publish a list of formally vetted and approved versions of PGXN modules. There are many benefits to that, and the downside of not having that stuff as part of make check would be overcome by the explicit testing we would need to have for approved modules. I have looked at PGXN and would never install anything from it. Why? Because it is impossible to tell, without inside knowledge or a lot of work, what is actively maintained and tested, and what is an abandoned proof-of-concept or idea. There is no indication of what versions of pg any of PGXN modules are tested on, or even if there are tests that can be run to prove the module works correctly with a particular version of pg. There are many modules that have not been updated for several years. What is their status? If they break is there still someone around to fix them or even cares about them? If not, then why waste my time. So adding to Jim’s comment above, anything that vets or approves PGXN modules is, in my opinion, essentially required to make PGXN useful for anything other than a scratchpad. A big help would be to pull in the date of the last git commit in the module overview and ask the authors to edit the readme to add what major version of pg the author last tested or ran on. When I install from contrib, at least I have some minimal assurance that the code is meant to work with the corresponding version of pg. Neil -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Further issues with jsonb semantics, documentation
On Thu, Jun 4, 2015 at 6:43 AM, Andrew Dunstan and...@dunslane.net wrote: I've noticed some more issues with the jsonb documentation, and the new jsonb stuff generally. I didn't set out to give Andrew feedback on the semantics weeks after feature freeze, but unfortunately this feels like another discussion that we need to have now rather than later. Yes, I wish you had raised these issues months ago when this was published. That's the way the process is supposed to work. I also wish that I managed to do that. As you know, I was working overtime to get UPSERT into 9.5 during that period. Finding time to review things is always difficult, and I wish I could do more. operator jsonb - integer === I think it was a bad idea to allow array-style removal of object key/value pairs. ISTM that it implies a level of stability in the ordering that doesn't make sense. Besides, is it really all that useful? But I agree that it's not a great contribution to science, especially since the index will be applied to the list of elements in the somewhat counter-intuitive storage order we use, and we could just raise an error if we try to apply integer delete to an object instead of an array. Cool. Do you want to write a patch, or should I? Also, what about negative array subscripting (making the 9.4-era operator jsonb - integer operator support that for consistency with the new operator jsonb - integer operator)? Should I write the patch? Will you commit it if I do? operator jsonb - text[] (and *nested* deletion more generally) === Summary: I think that this operator has many problems, and should be scrapped (although only as an operator). IMV nested deletion should only be handled by functions, and the way that nested deletion works in general should be slightly adjusted. The new operator jsonb - text[] operator is confusingly inconsistent with: A) operator jsonb text What exactly is this? I have no idea what you're talking about. It's a typo -- I meant operator jsonb - text.
The fact that operator jsonb - text and operator jsonb - text[] diverge in the way they do seems confusing. The fact that hstore uses it that way doesn't really concern me. Since hstore isn't nested it doesn't make a whole lot of sense for it to mean anything else there. It seems pretty obvious to me that it makes just as much sense as in hstore. In hstore, you might want to delete multiple key/value pairs at once, for exactly the same reason as you might want to with jsonb. Certainly, you'll also want to support nested deletion with jsonb, but that's beside the point. But json(b) is nested, and jsonb - path seems quite a reasonable treatment, something you're much more likely to want to do than removeing top level elements in bulk. Probably true. I think that this interface for nested deletion is complicated enough and inconsistent enough that I'd rather not have an operator at all, just a function (so somewhat like jsonb_set() -- jsonb_delete()). That is my main point on operator jsonb - text[]; I think the interface is complicated and inconsistent with everything else for no good reason. Regarding nested deletion behavior more generally, consider this example of how this can work out badly: postgres=# select jsonb_delete(jsonb_set('[a]', '{5}', 'b'), '{5}') ; jsonb_delete -- [a, b] (1 row) Here, we're adding and then deleting an array element at offset 5 (the string b). But the element is never deleted by the outer jsonb_delete(), because we can't rely on the element actually being stored at offset 5. Seems a bit fragile. The behaviour of jsonb_set is pretty explicitly documented. If we wanted to do something else then we'd have to disable the special meaning given to negative indices, but that would mean in turn we wouldn't be able to prepend to an array. jsonb_delete() should certainly be able to traverse objects, but it's much less clear that it should be able to *traverse* arrays (affecting arrays is a different story, though). 
That's why I proposed not supporting traversing arrays with it or with jsonb_set(). This would also remove the questionable second shadow type system within the text[] rhs operand too, which seems like a good thing. I think that traversing arrays in nested documents is a rare requirement, because the ordering within arrays is unstable. If you already know the ordinal number of the thing you want to nuke, then you probably have already locked the row, and you might as well manipulate the JSON using Javascript or Python at that stage. Making jsonb_delete() buy into the operator jsonb ? text idea of a key (a thing that it must delete) would also allow jsonb_delete() to reliably delete particular strings in arrays, which actually does make a lot of sense (think of arrays of tags). But FWIW it's the inconsistency that bothers me most. More importantly, consider the
Re: [HACKERS] brin regression test intermittent failures
I wrote: The other cases that I found involve cidrcol, and seem to represent an actual bug in the brin planning logic, ie failure to disregard a no-op cast. I'll look closer. I leapt to the wrong conclusion on that one. The reason for failure to match to an index column had nothing to do with the extra cast, and everything to do with the fact that there was no such index column. I think we're probably good now, though it'd be wise to keep an eye on chipmunk for awhile to verify that it doesn't see the problem anymore. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] brin regression test intermittent failures
Tom Lane wrote: I wrote: The other cases that I found involve cidrcol, and seem to represent an actual bug in the brin planning logic, ie failure to disregard a no-op cast. I'll look closer. I leapt to the wrong conclusion on that one. The reason for failure to match to an index column had nothing to do with the extra cast, and everything to do with the fact that there was no such index column. Oops! Thanks for reviewing this. I think we're probably good now, though it'd be wise to keep an eye on chipmunk for awhile to verify that it doesn't see the problem anymore. Will do. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Further issues with jsonb semantics, documentation
I'm just skimming here, but if a jsonb_path type is being proposed, perhaps it would be better not to have operators that take text or text[] as second argument. We can provide that functionality with just functions. For example, it will be confusing to have jsonb 'some json value' - '{foo,bar}' operate too differently from jsonb 'some json value' - json_path '{foo,bar}' And it will be a nasty regression to have 9.5 allow jsonb 'some json value' - '{foo,bar}' and then have 9.6 error out with ambiguous operator when the json_path thing is added. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] two-arg current_setting() with fallback
Hi, Attached patch which fixes my review comments. Since code changes were good, just fixed reported cosmetic changes. David, can you please cross check? Thanks -- Jeevan B Chalke Principal Software Engineer, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company diff --git a/contrib/tsearch2/tsearch2.c b/contrib/tsearch2/tsearch2.c index 143dabb..4354c5b 100644 --- a/contrib/tsearch2/tsearch2.c +++ b/contrib/tsearch2/tsearch2.c @@ -363,7 +363,7 @@ tsa_tsearch2(PG_FUNCTION_ARGS) tgargs[i + 1] = trigger-tgargs[i]; tgargs[1] = pstrdup(GetConfigOptionByName(default_text_search_config, - NULL)); + NULL, false)); tgargs_old = trigger-tgargs; trigger-tgargs = tgargs; trigger-tgnargs++; diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index c6e3540..7f6c4ad 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -16435,7 +16435,7 @@ SELECT collation for ('foo' COLLATE de_DE); indexterm primarycurrent_setting/primary /indexterm -literalfunctioncurrent_setting(parametersetting_name/parameter)/function/literal +literalfunctioncurrent_setting(parametersetting_name/parameter [, parametermissing_ok/parameter ])/function/literal /entry entrytypetext/type/entry entryget current value of setting/entry @@ -16474,7 +16474,9 @@ SELECT collation for ('foo' COLLATE de_DE); The function functioncurrent_setting/function yields the current value of the setting parametersetting_name/parameter. It corresponds to the acronymSQL/acronym command -commandSHOW/command. An example: +commandSHOW/command. If parametersetting/parameter does not exist, +throws an error unless parametermissing_ok/parameter is +literaltrue/literal. 
     An example:
<programlisting>
SELECT current_setting('datestyle');

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index b3c9f14..a43925d 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -7137,7 +7137,7 @@ ExtractSetVariableArgs(VariableSetStmt *stmt)
 		case VAR_SET_VALUE:
 			return flatten_set_variable_args(stmt->name, stmt->args);
 		case VAR_SET_CURRENT:
-			return GetConfigOptionByName(stmt->name, NULL);
+			return GetConfigOptionByName(stmt->name, NULL, false);
 		default:
 			return NULL;
 	}
@@ -7206,7 +7206,7 @@ set_config_by_name(PG_FUNCTION_ARGS)
 							 true, 0, false);

 	/* get the new current value */
-	new_value = GetConfigOptionByName(name, NULL);
+	new_value = GetConfigOptionByName(name, NULL, false);

 	/* Convert return string to text */
 	PG_RETURN_TEXT_P(cstring_to_text(new_value));
@@ -7633,7 +7633,7 @@ GetPGVariableResultDesc(const char *name)
 		const char *varname;

 		/* Get the canonical spelling of name */
-		(void) GetConfigOptionByName(name, &varname);
+		(void) GetConfigOptionByName(name, &varname, false);

 		/* need a tuple descriptor representing a single TEXT column */
 		tupdesc = CreateTemplateTupleDesc(1, false);
@@ -7656,7 +7656,7 @@ ShowGUCConfigOption(const char *name, DestReceiver *dest)
 	char	   *value;

 	/* Get the value and canonical spelling of name */
-	value = GetConfigOptionByName(name, &varname);
+	value = GetConfigOptionByName(name, &varname, false);

 	/* need a tuple descriptor representing a single TEXT column */
 	tupdesc = CreateTemplateTupleDesc(1, false);
@@ -7740,19 +7740,26 @@ ShowAllGUCConfig(DestReceiver *dest)
 }

 /*
- * Return GUC variable value by name; optionally return canonical
- * form of name.  Return value is palloc'd.
+ * Return GUC variable value by name; optionally return canonical form of
+ * name.  If the GUC is unset, then throw an error unless missing_ok is true,
+ * in which case return NULL.  Return value is palloc'd.
  */
 char *
-GetConfigOptionByName(const char *name, const char **varname)
+GetConfigOptionByName(const char *name, const char **varname, bool missing_ok)
 {
 	struct config_generic *record;

 	record = find_option(name, false, ERROR);
 	if (record == NULL)
+	{
+		if (missing_ok)
+			return NULL;
+
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_OBJECT),
 				 errmsg("unrecognized configuration parameter \"%s\"", name)));
+	}
+
 	if ((record->flags & GUC_SUPERUSER_ONLY) && !superuser())
 		ereport(ERROR,
 				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
@@ -8046,12 +8053,42 @@ show_config_by_name(PG_FUNCTION_ARGS)
 	varname = TextDatumGetCString(PG_GETARG_DATUM(0));

 	/* Get the value */
-	varval = GetConfigOptionByName(varname, NULL);
+	varval = GetConfigOptionByName(varname, NULL, false);
+
+	/* Convert to text */
+	PG_RETURN_TEXT_P(cstring_to_text(varval));
+}
+
+/*
+ * show_config_by_name_missing_ok - equiv to SHOW X command but implemented as
+ * a function.  If X does not exist, suppress the error and just return NULL
+ * if missing_ok is TRUE.
+ */
+Datum
+show_config_by_name_missing_ok(PG_FUNCTION_ARGS)
+{
+	char	   *varname;
+	bool		missing_ok;
+	char	   *varval;
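For orientation, here is a sketch of how the two-argument form behaves, as the feature ultimately shipped. The setting name 'myapp.level' is only a placeholder for an unknown custom GUC, not something from the patch above:

```sql
-- One-argument form: an unknown setting raises an error.
SELECT current_setting('myapp.level');
-- ERROR:  unrecognized configuration parameter "myapp.level"

-- Two-argument form with missing_ok = true: returns NULL instead of erroring.
SELECT current_setting('myapp.level', true);

-- A setting that exists behaves the same either way.
SELECT current_setting('datestyle', true);
```

The point of the second argument is exactly what the comment in the patched GetConfigOptionByName() says: suppress the "unrecognized configuration parameter" error and hand back NULL.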
Re: [CORE] [HACKERS] postpone next week's release
On 05/30/2015 11:47 PM, Andres Freund wrote: I don't think it's primarily a problem of lack of review; although that is a large problem. I think the biggest systematic problem is that the compound complexity of postgres has increased dramatically over the years. Features have added complexity little by little, each increment not looking that bad on its own. But very little has been done to manage complexity. Since 8.0 the code size has roughly doubled, but little has been done to manage the increased complexity. Few new abstractions have been introduced and the structure of the code is largely the same. As a somewhat extreme example, let's look at StartupXLOG(). In 8.0 it was ~500 LOC, in master it's ~1500. The interactions in 8.0 were complex; they have gotten much more complex since. It fulfills lots of different roles, all in one function (roughly in the order things happen, but simplified):
* Read the control file/determine whether we crashed
* recovery.conf handling
* backup label handling
* tablespace map handling (huh, I missed that this was added directly to StartupXLOG. What a bad idea)
* Determine whether we're doing archive recovery, read the relevant checkpoint if so
* relcache init file removal
* timeline switch handling
* Loading the checkpoint we're starting from
* Initialization of a lot of subsystems
* crash recovery/replay
  * Including pgstat, unlogged table, exported snapshot handling
  * iff hot standby, some more subsystems are initialized here
  * hot standby state handling
  * replay process initialization
* crash replay itself, including
  * progress tracking
  * recovery pause handling
  * nextxid tracking
  * timeline increase handling
  * hot standby state handling
* unlogged relations handling
* archive recovery handling
* creation/initialization of the end of recovery checkpoint
* timeline increment if failover
* subsystem initialization iff !hot_standby
* end of recovery actions
Yes, that's one routine.
And, to make things even funnier, half of that routine isn't exercised by our tests. You can argue that this is an outlier, but I don't think so. Heapam, the planner, etc. have similar cases. And I think this, to some degree, explains a lot of the multixact problems. While there were a few simple bugs, most of them were interactions between the various subsystems that are rather intricate. I think this explanation is wrong. I agree that there are many places that would be good to refactor - like StartupXLOG() - but the multixact code was not too bad in that regard. IIRC the patch included some refactoring; it added some new helper functions in heapam.c, for example. You can argue that it didn't do enough of it, but that was not the big issue. The big issue was at the architecture level. Basically, we liked vacuuming of XIDs and clog so much that we decided that it'd be nice if you had to vacuum multixids too, in order to not lose data. Many of the bugs and issues were not new - we had multixids before - but we upped the ante and turned minor locking bugs into data loss. And that had nothing to do with the code structure - we'd have similar issues if we had rewritten everything in Java, with the same design. So, I'm all for refactoring and adding abstractions where it makes sense, but it's not going to solve design problems. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] two-arg current_setting() with fallback
The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: tested, passed

I have reviewed the patch. Here are my review comments:
1. Patch does not apply due to some recent changes in pg_proc.h.
2. -GetConfigOptionByName(const char *name, const char **varname) +GetConfigOptionByNameMissingOK(const char *name, const char **varname, bool missing_ok) Will it be better if we keep the name as is and change the callers to pass false for the missing_ok parameter? It looks weird to have an extra #define just to avoid that. I see only a countable number of callers and thus see NO issues changing those.
3. The Oid used for the new function is already used. Check unused_oids.sh.
4. Changes in builtins.h are accidental. Need to remove that.
However, the code changes look good and implement the desired feature. The new status of this patch is: Waiting on Author -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [CORE] [HACKERS] postpone next week's release
On 2015-06-04 11:51:44 +0300, Heikki Linnakangas wrote: I think this explanation is wrong. I agree that there are many places that would be good to refactor - like StartupXLOG() - but the multixact code was not too bad in that regard. IIRC the patch included some refactoring, it added some new helper functions in heapam.c, for example. You can argue that it didn't do enough of it, but that was not the big issue. Yea, but the bugs were more around the interactions to other parts of the system. Like e.g. crash recovery, which now is about bug 7 or so. And those are the ones that are hard to understand. The big issue was at the architecture level. Basically, we liked vacuuming of XIDs and clog so much that we decided that it'd be nice if you had to vacuum multixids too, in order to not lose data. Many of the bugs and issues were not new - we had multixids before - but we upped the ante and turned minor locking bugs into data loss. And that had nothing to do with the code structure - we'd have similar issues if we had rewritten everything java, with the same design. I think we're probably just using slightly different terms here - for me one very good way of fixing some structurally bad things *is* improving the design. If you look at the bugs around multixacts: The first few were around ctid-chaining, hard to find and fix because there's about 8-10 places implementing it with slight differences. The next bunch were around vacuuming, some of them oversights, a good bunch of them more fundamental. Crash recovery wasn't thought about (lack of testing/review), and more generally the new code tripped over bad old decisions (hey, wraparound is ok!). Then there were a bunch of stupid bugs in crash-recovery (testing mainly), and larger scale bugs (hey, let's access stuff during recovery). Then there's the whole row level locking code - which is by now among the hardest to understand code in postgres - and voila it contained a bunch of oversights that were hard to spot. 
So yes, I think nicer code to work with would have prevented us from making a significant portion of these. It might have also made us realize earlier how significant the increase in complexity was. So, I'm all for refactoring and adding abstractions where it makes sense, but it's not going to solve design problems. I personally don't really see the multixact changes being that bad on the overall design. It pretty much just extended an earlier design. Now that wasn't great, but I don't think too many people had realized that at that point. The biggest problem was underestimating the complexity. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [CORE] [HACKERS] postpone next week's release
On 30 May 2015 at 05:08, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Fri, May 29, 2015 at 6:33 PM, Andres Freund and...@anarazel.de wrote: Why? A large portion of the input required to go from beta towards a release is from actual users. To see when things break, what confuses them and such. I have two concerns: 1. I'm concerned that once we release beta, any idea about reverting a feature or fixing something that is broken will get harder, because people will say well, we can't do that after we've released a beta. I confess to particularly wanting a solution to the item listed as custom-join has no way to construct Plan nodes of child Path nodes, the history of which I'll avoid recapitulating until I'm sure I can do it while maintaining my blood pressure at safe levels. 2. Also, if we're going to make significant multixact-related changes to 9.5 to try to improve reliability, as you proposed on the other thread, then it would be nice to do that before beta, so that it gets tested. Of course, someone is bound to point out that we could make those changes in time for beta2, and people could test that. But in practice I think that'll just mean that stuff is only out there for let's say 2 months before we put it in a major release, which ain't much. I think your position is completely nuts. The GROUPING SETS code is desperately in need of testing. The custom-plan code is desperately in need of fixing and testing. The multixact code is desperately in need of testing. The open-items list has several other problems besides those. All of those problems are independent. If we insist on tackling them serially rather than in parallel, 9.5 might not come out till 2017. I agree that we are not in a position to promise features won't change. So let's call it an alpha not a beta --- but for heaven's sake let's try to move forward on all these issues, not just some of them. I think releasing 9.5 in some form NOW will aid its software quality. 
We've never linked the beta release date to the final release date, so if the quality proves to be as poor as some people think, then the list of bugs will show that and we release later. AFAIK the beta period is exactly the time when we are allowed to pull features from the release. I welcome the idea that we test it; if it's stable and it works, we release it. If it doesn't, we pull it. Not releasing our software yet making a list of our fears doesn't work towards a solution. Our fears will make us shout at each other too, so I for one would rather skip that part and do some practical actions. -- Simon Riggs  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Further issues with jsonb semantics, documentation
On Thu, Jun 4, 2015 at 12:16 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: I'm just skimming here, but if a jsonb_path type is being proposed, perhaps it would be better not to have operators that take text or text[] as second argument. We can provide that functionality with just functions. For example, it will be confusing to have jsonb 'some json value' - '{foo,bar}' operate too differently from jsonb 'some json value' - json_path '{foo,bar}' And it will be a nasty regression to have 9.5 allow jsonb 'some json value' - '{foo,bar}' and then have 9.6 error out with ambiguous operator when the json_path thing is added. Fair point, but FWIW I don't think it'll end up being a new type like json_path -- it'll just be jsonb, as with containment. I can see there being an operator that performs deletion in a very similar way to how "operator jsonb @> jsonb" performs containment (recall that jsonb containment is a very JSON-ish flavor of containment). I would like these new-to-9.5 deletion operators to work at the top level only, like "operator jsonb ? text" and "operator jsonb ?| text[]", sharing their idea of a key, __including that string array elements are keys__. We haven't got a containment-style nested delete operator for 9.5, but we can hope for it in the future. In the meantime, you get much of the benefit of that with jsonb_delete(), which *can* support nested deletion. It does so by buying into the "operator jsonb ? text" idea of a key (including that string array elements are keys), although with a twist: the paths text[] right operand operates at multiple nesting levels (not supporting traversing arrays, as Andrew implemented it, but OTOH adding support for deleting String array elements based on the string alone, useful for "tag" arrays).
If in 9.6 we have something like an operator jsonb @- jsonb operator for containment style deletion, and a 9.5 era operator jsonb - text and operator jsonb - text[] pair of operators for existence style deletion (matching operator jsonb ? text, operating only on the top level), that will be pretty good. The fact that jsonb_delete() will have somewhat bridged the gap nesting-deletion-wise for 9.5 (without being usable through an operator) won't really matter then. I want to keep the twist I described out of any jsonb operators that are shipped, and only use it within functions. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
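The top-level-only deletion semantics discussed in this thread can be made concrete with the operators as they ultimately shipped in 9.5; note that path-based deletion ended up spelled #- rather than - text[]. A sketch for orientation, not taken from the thread itself:

```sql
-- Top-level key deletion, matching the "existence" idea of jsonb ? text:
SELECT '{"a": 1, "b": 2}'::jsonb - 'a';
-- {"b": 2}

-- Top-level array element deletion by index:
SELECT '["x", "y", "z"]'::jsonb - 1;
-- ["x", "z"]

-- Nested deletion takes a path (order of elements matters), not a set of keys:
SELECT '{"a": {"b": 1, "c": 2}}'::jsonb #- '{a,b}';
-- {"a": {"c": 2}}
```

Splitting the spelling this way sidesteps the ambiguity Peter describes: the `-` operators share the top-level "key" notion of `?`, while `#-` is unambiguously a path.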
Re: [HACKERS] Publish autovacuum informations
2015-01-05 17:44 GMT+01:00 Guillaume Lelarge guilla...@lelarge.info: 2015-01-05 17:40 GMT+01:00 Robert Haas robertmh...@gmail.com: On Wed, Dec 31, 2014 at 12:46 PM, Tom Lane t...@sss.pgh.pa.us wrote: I'd be all right with putting the data structure declarations in a file named something like autovacuum_private.h, especially if it carried an annotation that if you depend on this, don't be surprised if we break your code in future. Works for me. I am not in general surprised when we do things that break my code, or anyway, the code that I'm responsible for maintaining. But I think it makes sense to segregate this into a separate header file so that we are clear that it is only exposed for the benefit of extension authors, not so that other things in the core system can touch it. I'm fine with that too. I'll try to find some time to work on that. So I took a look at this this week. I discovered, with the help of a coworker, that I can already use the AutoVacuumShmem pointer and read the struct. Unfortunately, it doesn't give me as much details as I would have liked. The list of databases and tables aren't in shared memory. They are local to the process that uses them. Putting them in shared memory (if at all possible) would imply a much bigger patch than I was willing to write right now. Thanks anyway for the help. -- Guillaume. http://blog.guillaume.lelarge.info http://www.dalibo.com
Re: [HACKERS] Re: [COMMITTERS] pgsql: Map basebackup tablespaces using a tablespace_map file
On Thu, Jun 4, 2015 at 10:14 AM, Amit Kapila amit.kapil...@gmail.com wrote: On Thu, Jun 4, 2015 at 1:52 AM, Andrew Dunstan and...@dunslane.net wrote: On 06/02/2015 11:55 PM, Amit Kapila wrote: On Tue, Jun 2, 2015 at 10:26 PM, Andrew Dunstan and...@dunslane.net wrote: Well, it seems to me the new function is being altogether way too trusting about the nature of what it's being asked to remove. In the first place, the S_ISDIR/rmdir branch should only be for Windows, and secondly in the other branch we should be checking that S_ISLNK is true. It would actually be nice if we could test for a junction point on Windows, but that seems to be a bit difficult. I think during recovery for tablespace related operations, it is quite possible to have a directory instead of a symlink in some special cases (see function TablespaceCreateDbspace() and comments in destroy_tablespace_directories() { ..Try to remove the symlink..}). Also this new function is being called from create_tablespace_directories() which uses the code as written in the new function, so it doesn't make much sense to split it into Windows- and non-Windows-specific code. Looking at it again, this might not be as bad as I thought, but I do think we should probably call the function something other than rmsymlink(). That seems too generic, since it also tries to remove directories - albeit that this will fail if the directory isn't empty. And I still think we should add a test for S_ISLNK in the second branch. As it stands the function could try to unlink anything that's not a directory. That might be safe-ish in the context it's used in for the tablespace code, but it's far from safe enough for a function that's in src/common. Okay, as we both seem to agree that it can be mostly used in tablespace symlinks context, so I have changed the name to remove_tablespace_symlink() and moved the function to tablespace.c.
The S_ISLNK check is used for non-Windows code, so I am not sure adding it here makes any real difference now that we have made it specific to tablespaces, and we might need to write small port-specific code if we want to add the S_ISLNK check. Given that the function raises an error on failure, I think it will otherwise be OK as is. Please find an updated patch attached with this mail. Sorry, forgot to attach the patch in the previous mail, now attaching. Thanks Bruce for reminding me offlist. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com remove_only_symlinks_during_recovery_v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Memory leak with XLogFileCopy since de768844 (WAL file with .partial)
On Mon, Jun 1, 2015 at 4:24 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Thu, May 28, 2015 at 9:09 PM, Michael Paquier michael.paqu...@gmail.com wrote: Since commit de768844, XLogFileCopy of xlog.c returns to the caller a pstrdup'd string that can be used afterwards for other things. XLogFileCopy is used in only one place, and it happens that the result string is never freed at all, leaking memory. That seems to be almost harmless because the startup process will exit just after calling XLogFileCopy(). No? Having a second look at that, XLogFileCopy() is called only in one place, and dstfname is never used, hence I think that it would make sense to return unconditionally a temporary file name to the caller. +1 Also the current error message in case of failure is very C-like: elog(ERROR, "InstallXLogFileSegment should not have failed"); I thought that we try not to use function names in the error messages. Wouldn't something like "could not copy partial log file" or "could not copy partial log file into temporary segment file" be more appropriate? Or we can extend InstallXLogFileSegment so that we can give the log level to it. When InstallXLogFileSegment is called after XLogFileCopy, we can give ERROR as the log level and cause an exception to occur if link() or rename() fails in InstallXLogFileSegment. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1
On Thu, Jun 4, 2015 at 2:42 AM, Noah Misch n...@leadboat.com wrote: I like that change a lot. It's much easier to seek forgiveness for wasting <= 28 GiB of disk than for deleting visibility information wrongly. I'm glad you like it. I concur. 2. If setting the offset stop limit (the point where we refuse to create new multixact space), we don't arm the stop point. This means that if you're in this situation, you run without member wraparound protection until it's corrected. A message gets logged once per checkpoint telling you that you have this problem, and another message gets logged when things get straightened out and the guards are enabled. 3. If setting the vacuum force point, we assume that it's appropriate to immediately force vacuum. Those seem reasonable, too. Cool. I've only tested this very lightly - this is just to see what you and Noah and others think of the approach. As compared with the previous approach, it has the advantage of making minimal assumptions about the sanity of what's on disk. It has the disadvantage that, for some people, the member-wraparound guard won't be enabled at startup -- but note that those people can't start 9.3.7/9.4.2 *at all*, so currently they are either running without member wraparound protection anyway (if they haven't upgraded to those releases) or they're down entirely. That disadvantage is negligible, considering. All right. Another disadvantage is that we'll be triggering what may be quite a bit of autovacuum activity for some people, which could be painful. On the plus side, they'll hopefully end up with sane relminmxid and datminmxid guards afterwards. That sounds good so long as each table requires just one successful emergency autovacuum. I'm not seeing code to ensure that the launched autovacuum will indeed perform a full-table scan and update relminmxid; is it there? No. Oops. For sites that can't tolerate an autovacuum storm, what alternative can we provide?
Is "SET vacuum_multixact_freeze_table_age = 0; VACUUM" of every table, done before applying the minor update, sufficient? I don't know. In practical terms, they probably need to ensure that if pg_multixact/offsets/ does not exist, no relations have relminmxid = 1 and no remaining databases have datminmxid = 1. Exactly what it will take to get there is possibly dependent on which minor release you are running; on current minor releases, I am hopeful that what you propose is sufficient.

 static void
-DetermineSafeOldestOffset(MultiXactId oldestMXact)
+DetermineSafeOldestOffset(MultiXactOffset oldestMXact)

Leftover change from an earlier iteration? The values passed continue to be MultiXactId values. Oopsie.

 	/* move back to start of the corresponding segment */
-	oldestOffset -= oldestOffset %
-		(MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT);
+	offsetStopLimit = oldestOffset - (oldestOffset %
+		(MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT));
+	/* always leave one segment before the wraparound point */
+	offsetStopLimit -= (MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT);
+
+	/* if nothing has changed, we're done */
+	if (prevOffsetStopLimitKnown && offsetStopLimit == prevOffsetStopLimit)
+		return;

 	LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
-	/* always leave one segment before the wraparound point */
-	MultiXactState->offsetStopLimit = oldestOffset -
-		(MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT);
+	MultiXactState->offsetStopLimit = oldestOffset;

That last line needs s/oldestOffset/offsetStopLimit/, I presume. Another oops. Thanks for the review. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
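The segment-boundary arithmetic in the hunk above can be checked with plain SQL. The constant here is only a stand-in for MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT, whose real value is a compile-time constant in multixact.c, and the offset is an arbitrary example:

```sql
-- Illustrative arithmetic only; members_per_seg and oldest_offset are
-- placeholder values, not the actual server constants.
WITH params(oldest_offset, members_per_seg) AS (
    VALUES (123456789::bigint, 52352::bigint)
)
SELECT oldest_offset
       - (oldest_offset % members_per_seg)   -- round down to a segment boundary
       - members_per_seg                     -- leave one spare segment
       AS offset_stop_limit
FROM params;
```

The two subtractions mirror the patch: first round the oldest offset down to the start of its SLRU segment, then back off one full segment so the stop limit never lands in the segment still holding live member data.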
Re: [HACKERS] Further issues with jsonb semantics, documentation
On 06/03/2015 10:02 PM, Peter Geoghegan wrote: I've noticed some more issues with the jsonb documentation, and the new jsonb stuff generally. I didn't set out to give Andrew feedback on the semantics weeks after feature freeze, but unfortunately this feels like another discussion that we need to have now rather than later. Yes, I wish you had raised these issues months ago when this was published. That's the way the process is supposed to work.

operator jsonb - integer
===
Summary: I think that this operator has a problem, but a problem that can easily be fixed. I think it was a bad idea to allow array-style removal of object key/value pairs. ISTM that it implies a level of stability in the ordering that doesn't make sense. Besides, is it really all that useful?

The origin of this is nested hstore. Looking at my last version of that patch, I see:

SELECT 'a=>1, b=>2, c=>3'::hstore - 3;
           ?column?
------------------------------
 "a"=>"1", "b"=>"2", "c"=>"3"
(1 row)

But I agree that it's not a great contribution to science, especially since the index will be applied to the list of elements in the somewhat counter-intuitive storage order we use, and we could just raise an error if we try to apply integer delete to an object instead of an array.

operator jsonb - text[] (and *nested* deletion more generally)
===
Summary: I think that this operator has many problems, and should be scrapped (although only as an operator). IMV nested deletion should only be handled by functions, and the way that nested deletion works in general should be slightly adjusted. The new operator jsonb - text[] operator is confusingly inconsistent with: A) operator jsonb text What exactly is this? I have no idea what you're talking about. and: B) the established operator hstore - text[] operator, since that operator deletes all key/value pairs that have keys that match any of the right operand text array values.
In contrast, this new operator is passed as its right operand an array of text elements that constitute a path (so the order in the rhs text[] operand matters). If the text element in the rhs text[] operand happens to be what would pass for a Postgres integer literal, it can be used to traverse lhs array values through subscripting at that nesting level. The fact that hstore uses it that way doesn't really concern me. Since hstore isn't nested, it doesn't make a whole lot of sense for it to mean anything else there. But json(b) is nested, and jsonb - path seems quite a reasonable treatment, something you're much more likely to want to do than removing top level elements in bulk. Regarding nested deletion behavior more generally, consider this example of how this can work out badly:

postgres=# select jsonb_delete(jsonb_set('["a"]', '{5}', '"b"'), '{5}');
 jsonb_delete
--------------
 ["a", "b"]
(1 row)

Here, we're adding and then deleting an array element at offset 5 (the string "b"). But the element is never deleted by the outer jsonb_delete(), because we can't rely on the element actually being stored at offset 5. Seems a bit fragile. The behaviour of jsonb_set is pretty explicitly documented. If we wanted to do something else then we'd have to disable the special meaning given to negative indices, but that would mean in turn we wouldn't be able to prepend to an array. More importantly, consider the inconsistency with operator jsonb text (point A above):

postgres=# select '["a"]'::jsonb ?| '{a}'::text[]; -- historic/9.4 behavior
 ?column?
----------
 t
(1 row)

postgres=# select '["a"]'::jsonb - '{a}'::text[]; -- new to 9.5 operator, does not delete!
 ?column?
----------
 ["a"]
(1 row)

You are conflating two different things here, quite pointlessly. The RH operand of ?| is not a path, whereas the RH operand of this - variant is. The fact that they are both text arrays doesn't mean that they should mean the same thing. And this is really the whole problem with the rest of your analysis.
cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
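To make the jsonb_set array-index semantics debated in this thread concrete, here is a sketch of the behavior as the function shipped in 9.5 (these examples are illustrative, not taken from the emails above):

```sql
-- An index past the end appends, so a path of '{5}' need not leave the
-- element stored at offset 5 (which is what tripped up the
-- jsonb_delete(jsonb_set(...)) example in the discussion):
SELECT jsonb_set('["a"]', '{5}', '"b"');
-- ["a", "b"]

-- Negative indexes count from the end, which is the "special meaning"
-- that enables prepending but makes offsets unstable as a deletion key:
SELECT jsonb_set('["a", "b", "c"]', '{-1}', '"z"');
-- ["a", "b", "z"]
```

This is the documented trade-off Andrew refers to: dropping the negative-index rule would make paths behave more like stable offsets, but would remove the ability to prepend to an array via jsonb_set.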