Re: [HACKERS] spoonbill vs. -HEAD
On 04/03/2013 12:59 AM, Tom Lane wrote:
> I wrote:
>> I think the simplest fix is to insert "PG_SETMASK(&UnBlockSig)" into
>> StatementCancelHandler() and any other handlers that might exit via
>> longjmp.  I'm a bit inclined to only do this on platforms where a
>> problem is demonstrable, which so far is only OpenBSD.  (You'd
>> think that all BSDen would have the same issue, but the buildfarm
>> shows otherwise.)
>
> BTW, on further thought it seems like maybe this is an OpenBSD bug,
> at least in part: what is evidently happening is that the temporary
> blockage of SIGINT during the handler persists even after we've
> longjmp'd back to the main loop.  But we're using sigsetjmp(..., 1)
> to establish that longjmp handler --- so why isn't the original signal
> mask reinstalled when we return to the main loop?
>
> If (your version of) OpenBSD is getting this wrong, it'd explain why
> we've not seen similar behavior elsewhere.

hmm - trolling the OpenBSD CVS history brings up this:

http://www.openbsd.org/cgi-bin/cvsweb/src/sys/arch/sparc64/sparc64/machdep.c?r1=1.143;sortby=date#rev1.143


Stefan
Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review]
On Wednesday, April 03, 2013 2:23 AM Greg Smith wrote:
> On 4/1/13 5:44 AM, Amit Kapila wrote:
>
>> I think in that case we can have 3 separate patches
>> 1. Memory growth defect fix
>> 2. Default postgresql.conf to include config directory and SET Persistent
>>    into single file implementation
>> 3. Rearrangement of GUC validation into validate_conf_option function.
>>
>> As already there is review happened for point 2 and point 1 is an existing
>> code defect fix, so in my opinion
>> Patch 1 and 2 should be considered for 9.3.
>
> They have been considered for 9.3, I just doubt they could get committed
> right now.  In order for this to go in as part of the very last 9.3
> feature changes (which is where we're at in the development cycle),
> you'd need to have a committer volunteer to take on the job of doing a
> second level of review on it.  And then that would have to happen
> without any other issues popping up.  That usually is not how it works.
> Normally the first round of committer review finds another round of
> issues, and there's at least one more update before commit.  (Note that
> this is exactly what just happened today, with the review from Peter
> Eisentraut)
>
> I'm not saying it's impossible for this feature to go in to 9.3, but I'm
> not seeing any committers volunteer to take on the job either.  I do
> want to see this feature go in--I'll update it for 9.4 even if you
> don't--but we're already into April.  There isn't much time left for
> 9.3.  And the security release this week has gobbled up a good chunk of
> committer and packager time unexpectedly, which is just bad luck for
> your submission.
>
> From a process perspective, features that enter the last CF of a
> release that are very close to ready from the start have good odds of
> being committed.  You've done an excellent job of updating this in
> response to feedback, but it has involved a long list of changes so far.
> It's fair to say this was still a rough feature at the start of CF
> 2013-01, and now it's good but can be usefully polished a bit more.
>
> For something of this size, going from rough feature to commit quality
> normally takes more than one CF.  I don't have any obvious errors to
> point out right now.  But I think there's still some room to improve on
> this before commit.  Andres mentioned on another thread that he thought
> merging some of your ideas with the version Zoltan did was useful to
> look at, and I was thinking of something similar.  This is close to
> being ready, and I hope you won't get discouraged just because it's
> probably going to slip to 9.4.

Thank you for keeping me motivated for this patch, now and throughout the last CF, by providing valuable suggestions and comments.

With Regards,
Amit Kapila.
Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review]
On Tuesday, April 02, 2013 9:49 PM Peter Eisentraut wrote:
> I'm going to ignore most of the discussion that led up to this and give
> this patch a fresh look.

Thank you.

> +
> +   SET PERSISTENT max_connections To 10;
> +
>
> The To should probably be capitalized.

Okay, shall fix it.

> I doubt this example actually works because changing max_connections
> requires a restart.  Try to find a better example.

Any parameter changed with this command only becomes effective after a restart or SIGHUP, so I will change the example to use a SIGHUP parameter.

> It's weird that SET LOCAL and SET SESSION actually *set* the value, and
> the second key word determines how long the setting will last.  SET
> PERSISTENT doesn't actually set the value.  I predict that this will be
> a new favorite help-it-doesn't-work FAQ.
>
>   SCHEMA
> ! SET [ PERSISTENT ] SCHEMA 'value' is an alias for
>   SET search_path TO value.  Only one
>   schema can be specified using this syntax.
>
> I don't think [ PERSISTENT ] needs to be added in these and similar
> snippets.  We don't mention LOCAL etc. here either.

Agreed, shall fix this.

> --- 34,41
>   postgresql.conf, pg_hba.conf, and
>   pg_ident.conf are traditionally stored in
>   PGDATA (although in PostgreSQL 8.0 and
> ! later, it is possible to place them elsewhere).  By default the
> ! directory config is stored in PGDATA, however it should be kept
> ! along with postgresql.conf.
>
> This chunk doesn't make any sense to me.  "Should" is always tricky.
> Why should I, and why might I not?

I mean to say here that the user needs to move the config directory and its contents along with postgresql.conf.  Shall we change it as below:

    By default the config directory is stored in PGDATA, however it needs
    to be kept along with postgresql.conf.

> + config
> + Subdirectory containing automatically generated configuration files
> +
>   base
>   Subdirectory containing per-database subdirectories
>
> Only automatically generated ones?

No; other files can also be present.  How about changing it to:

    Subdirectory containing generated configuration files.

Any other suggestions?  This new directory will be used to place generated files.

>   COPY_STRING_FIELD(name);
>   COPY_NODE_FIELD(args);
>   COPY_SCALAR_FIELD(is_local);
> + COPY_SCALAR_FIELD(is_persistent);
>
>   return newnode;
> }
>
> I suggest changing is_local into a new trivalued field that stores LOCAL
> or SESSION or PERSISTENT.
>
>   n->is_local = false;
>   $$ = (Node *) n;
> }

Okay, I will change it.

> + | SET PERSISTENT set_persistent
> +   {
> +       VariableSetStmt *n = $3;
> +       n->is_persistent = true;
> +       $$ = (Node *) n;
> +   }
> ;
>
> set_rest:
>
> Why can't you use SET PERSISTENT set_rest?

Because SET PERSISTENT cannot be used with some of those syntaxes (for example SESSION AUTHORIZATION), and it is also not supported inside transaction blocks.

> *** a/src/backend/replication/basebackup.c
> --- b/src/backend/replication/basebackup.c
> ***
> *** 755,760 sendDir(char *path, int basepathlen, bool sizeonly)
> --- 755,766
>                   strlen(PG_TEMP_FILE_PREFIX)) == 0)
>                   continue;
>
> +         /* skip auto conf temporary file */
> +         if (strncmp(de->d_name,
> +                     PG_AUTOCONF_FILENAME ".",
> +                     sizeof(PG_AUTOCONF_FILENAME)) == 0)
> +             continue;
> +
>
> Maybe pg_basebackup should be taught to ignore certain kinds of
> temporary files in general.  The file name shouldn't be hardcoded into
> pg_basebackup.  This would effectively make the configuration file
> naming scheme part of the replication protocol.  See other thread about
> pg_basebackup client/server compatibility.  This needs to be
> generalized.

As we are just trying to ignore the file during the backup, why should it affect the replication protocol?  I mean the protocol would be affected if we tried to send something new, or didn't send what is expected on the other side.  Also, the same is done for the temp and backup label files.

> (Side thought: Does pg_basebackup copy editor temp and backup files?)

Currently pg_basebackup doesn't copy temp or backup files.  The check I made in the code is similar to the check for those two.

> + ereport(elevel,
> +         (errmsg("Configuration parameters changed with SET PERSISTENT command before start of server "
> +                 "will not be effective as \"%s\" file is not accessible.", PG_AUTOCONF_FILENAME)));
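[Editor's illustration of the SIGHUP-parameter example discussed above --- a hedged sketch only, using the patch's proposed SET PERSISTENT syntax (not yet committed) and autovacuum_naptime as an assumed SIGHUP-context parameter:]

    -- Record the new value in the auto-generated configuration file.
    SET PERSISTENT autovacuum_naptime TO '5min';

    -- The value only takes effect once the server re-reads its
    -- configuration, e.g. after a reload (SIGHUP):
    SELECT pg_reload_conf();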
[HACKERS] commit dfda6ebaec67 versus wal_keep_segments
This commit introduced a problem with wal_keep_segments:

commit dfda6ebaec6763090fb78b458a979b558c50b39b
Author: Heikki Linnakangas
Date:   Sun Jun 24 18:06:38 2012 +0300

    Don't waste the last segment of each 4GB logical log file.

In a side window do:

    watch "ls -lrt /tmp/data/pg_xlog"

    dfda6ebaec/bin/initdb -D /tmp/data
    dfda6ebaec/bin/pg_ctl -D /tmp/data -l logfile restart -o "--fsync=off --wal_keep_segments=20"
    createdb
    pgbench -i -s10
    pgbench -T3600

xlogs accumulate until there are about 26 of them.  Then all of a sudden they drop down to 11 of them.  Then it builds back up to around 26, and seems to stay there permanently.

At some point when it is over-pruning and recycling, it recycles the log files that are still needed for recovery, and if the database crashes at that point it will not recover, because it can't find either the primary or secondary checkpoint records.

Cheers,

Jeff
[HACKERS] psql crash fix
I found that psql will crash if given a PSQLRC value containing a tilde:

    $ PSQLRC="~/x" psql test
    *** glibc detected *** psql: free(): invalid pointer: 0x7fffb7c933ec ***

This is on Debian Squeeze 6.0.7.

The fix is to pstrdup() the value returned by getenv(), so it can be free()'ed later --- you can't free getenv()-returned values:

    As typically implemented, getenv() returns a pointer to a string
    within the environment list.  The caller must take care not to modify
    this string, since that would change the environment of the process.

This bug exists in 9.2 and git head.  I also removed the return value from expand_tilde() as no caller was using it.

--
  Bruce Momjian          http://momjian.us
  EnterpriseDB           http://enterprisedb.com

  + It's impossible for everything to be true. +

diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
new file mode 100644
index be5e34a..3dea92c
*** a/src/bin/psql/common.c
--- b/src/bin/psql/common.c
*** session_username(void)
*** 1645,1655 ****
   * substitute '~' with HOME or '~username' with username's home dir
   *
   */
! char *
  expand_tilde(char **filename)
  {
  	if (!filename || !(*filename))
! 		return NULL;
  
  	/*
  	 * WIN32 doesn't use tilde expansion for file names. Also, it uses tilde
--- 1645,1655 ----
   * substitute '~' with HOME or '~username' with username's home dir
   *
   */
! void
  expand_tilde(char **filename)
  {
  	if (!filename || !(*filename))
! 		return;
  
  	/*
  	 * WIN32 doesn't use tilde expansion for file names. Also, it uses tilde
*** expand_tilde(char **filename)
*** 1697,1701 ****
  	}
  #endif
  
! 	return *filename;
  }
--- 1697,1701 ----
  	}
  #endif
  
! 	return;
  }
diff --git a/src/bin/psql/common.h b/src/bin/psql/common.h
new file mode 100644
index d8bb093..db645da
*** a/src/bin/psql/common.h
--- b/src/bin/psql/common.h
*** extern bool is_superuser(void);
*** 44,49 ****
  extern bool standard_strings(void);
  extern const char *session_username(void);
  
! extern char *expand_tilde(char **filename);
  
  #endif   /* COMMON_H */
--- 44,49 ----
  extern bool standard_strings(void);
  extern const char *session_username(void);
  
! extern void expand_tilde(char **filename);
  
  #endif   /* COMMON_H */
diff --git a/src/bin/psql/startup.c b/src/bin/psql/startup.c
new file mode 100644
index 5cb6b5f..5d7fe6e
*** a/src/bin/psql/startup.c
--- b/src/bin/psql/startup.c
*** process_psqlrc(char *argv0)
*** 610,616 ****
  	char		rc_file[MAXPGPATH];
  	char		my_exec_path[MAXPGPATH];
  	char		etc_path[MAXPGPATH];
! 	char	   *envrc;
  
  	find_my_exec(argv0, my_exec_path);
  	get_etc_path(my_exec_path, etc_path);
--- 610,616 ----
  	char		rc_file[MAXPGPATH];
  	char		my_exec_path[MAXPGPATH];
  	char		etc_path[MAXPGPATH];
! 	char	   *envrc = getenv("PSQLRC");
  
  	find_my_exec(argv0, my_exec_path);
  	get_etc_path(my_exec_path, etc_path);
*** process_psqlrc(char *argv0)
*** 618,629 ****
  	snprintf(rc_file, MAXPGPATH, "%s/%s", etc_path, SYSPSQLRC);
  	process_psqlrc_file(rc_file);
  
- 	envrc = getenv("PSQLRC");
- 
  	if (envrc != NULL && strlen(envrc) > 0)
  	{
! 		expand_tilde(&envrc);
! 		process_psqlrc_file(envrc);
  	}
  	else if (get_home_path(home))
  	{
--- 618,630 ----
  	snprintf(rc_file, MAXPGPATH, "%s/%s", etc_path, SYSPSQLRC);
  	process_psqlrc_file(rc_file);
  
  	if (envrc != NULL && strlen(envrc) > 0)
  	{
! 		/* might need to free() this */
! 		char	   *envrc_alloc = pstrdup(envrc);
! 
! 		expand_tilde(&envrc_alloc);
! 		process_psqlrc_file(envrc_alloc);
  	}
  	else if (get_home_path(home))
  	{
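[Editor's note: a minimal standalone sketch of the rule the patch relies on --- not part of the patch itself.  free() may only be given pointers obtained from the malloc family, while getenv() returns a pointer into the process environment; copying first makes it freeable:]

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int
    main(void)
    {
        char   *envrc = getenv("PSQLRC");   /* points into the environment list */

        if (envrc == NULL)
            return 0;

        /* Copy first if the value will later be rewritten or freed. */
        char   *copy = strdup(envrc);

        printf("PSQLRC = %s\n", copy);
        free(copy);     /* fine: strdup()'d memory */
        /* free(envrc) here would be undefined behavior -- the reported crash */
        return 0;
    }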
Re: [HACKERS] spoonbill vs. -HEAD
I wrote:
> I think the simplest fix is to insert "PG_SETMASK(&UnBlockSig)" into
> StatementCancelHandler() and any other handlers that might exit via
> longjmp.  I'm a bit inclined to only do this on platforms where a
> problem is demonstrable, which so far is only OpenBSD.  (You'd
> think that all BSDen would have the same issue, but the buildfarm
> shows otherwise.)

BTW, on further thought it seems like maybe this is an OpenBSD bug, at least in part: what is evidently happening is that the temporary blockage of SIGINT during the handler persists even after we've longjmp'd back to the main loop.  But we're using sigsetjmp(..., 1) to establish that longjmp handler --- so why isn't the original signal mask reinstalled when we return to the main loop?

If (your version of) OpenBSD is getting this wrong, it'd explain why we've not seen similar behavior elsewhere.

This theory doesn't really exonerate our code completely, because we use sigsetjmp(..., 0) in PG_TRY, which means that in a code path where we catch a longjmp and don't PG_RE_THROW it, we could be left with the signal disabled.  I don't believe that is happening in the isolation-test cases, though.

			regards, tom lane
[HACKERS] CREATE EXTENSION BLOCKS
Hackers,

I am working on scripts to copy data from Oracle via oracle_fdw.  They each do something like this:

    CREATE SCHEMA migrate_stuff;
    SET search_path TO migrate_stuff,public;
    CREATE EXTENSION oracle_fdw SCHEMA migrate_stuff;

    CREATE SERVER oracle_stuff FOREIGN DATA WRAPPER oracle_fdw
    OPTIONS (dbserver :'oracle_uri');

    CREATE USER MAPPING FOR postgres SERVER oracle_stuff
    OPTIONS (user :'oracle_user', password :'oracle_pass');

    CREATE FOREIGN TABLE migrate_stuff (
        stuff_id integer,
        name text
    ) SERVER oracle_stuff OPTIONS(table 'STUFF');

    INSERT INTO my.stuff SELECT * FROM migrate_stuff;
    DROP SCHEMA migrate_stuff CASCADE;
    COMMIT;

Then I run them in parallel:

    for file in migrate*.sql; do
        psql -d foo -f $file &
    done
    wait

This works fine except for one thing: the first CREATE EXTENSION statement blocks all the others.  Even when I create the extension in separate schemas in each script!  I have to remove the CREATE EXTENSION statement, create it in public before any of the scripts run, then drop it when they're done.

I'm okay with this workaround, but wasn't sure if the blocking of CREATE EXTENSION was intentional or a known issue (I did not see it documented in http://www.postgresql.org/docs/current/static/sql-createextension.html).

Thanks,

David
Re: [HACKERS] spoonbill vs. -HEAD
Stefan Kaltenbrunner writes:
> On 03/26/2013 11:30 PM, Tom Lane wrote:
>> A different line of thought is that the cancel was received by the
>> backend but didn't succeed in cancelling the query for some reason.

> I added the "pgcancel failed" codepath you suggested but it does not
> seem to get triggered at all so the above might actually be what is
> happening...

Stefan was kind enough to grant me access to spoonbill, and after some experimentation I found out the problem.  It seems that OpenBSD blocks additional deliveries of a signal while the signal handler is in progress, and that this is implemented by just calling sigprocmask() before and after calling the handler.  Therefore, if the handler doesn't return normally --- like, say, it longjmps --- the restoration of the previous mask never happens.  So we're left with the signal still blocked, meaning second and subsequent attempts to interrupt the backend don't work.

This isn't revealed by the regular regression tests because they don't exercise PQcancel, but several recently-added isolation tests do attempt to PQcancel the same backend more than once.

It's a bit surprising that it's taken us this long to recognize the problem.  Typical use of PQcancel doesn't necessarily cause a failure: StatementCancelHandler() won't exit through longjmp unless ImmediateInterruptOK is true, which is only the case while waiting for a heavyweight lock.  But still, you'd think somebody would've run into the case in normal usage.

I think the simplest fix is to insert "PG_SETMASK(&UnBlockSig)" into StatementCancelHandler() and any other handlers that might exit via longjmp.  I'm a bit inclined to only do this on platforms where a problem is demonstrable, which so far is only OpenBSD.  (You'd think that all BSDen would have the same issue, but the buildfarm shows otherwise.)

BTW, this does not seem to explain the symptoms shown at http://www.postgresql.org/message-id/4fe4d89a.8020...@kaltenbrunner.cc because what we were seeing there was that *all* signals appeared to be blocked.  However, after this round of debugging I no longer have a lot of faith in OpenBSD's ps, because it was lying to me about whether the process had signals blocked or not (or at least, it couldn't see the effects of the interrupt signal disable, although when I added debugging code to print the active signal mask according to sigprocmask() I got told the truth).  So I'm not sure how much trust to put in those older ps results.  It's possible that the previous failures were a manifestation of something related to this bug.

			regards, tom lane
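[Editor's illustration of the mechanism described above --- a standalone sketch, not PostgreSQL code; the real fix is PG_SETMASK(&UnBlockSig), approximated here with raw sigprocmask():]

    #include <setjmp.h>
    #include <signal.h>
    #include <stdio.h>
    #include <unistd.h>

    static sigjmp_buf mainloop_buf;

    static void
    cancel_handler(int signo)
    {
        sigset_t    unblock;

        /*
         * While this handler runs, SIGINT is blocked.  Normally the kernel
         * restores the old mask when the handler returns -- but we leave
         * via siglongjmp(), so (per the OpenBSD behavior described above)
         * the mask may never be restored.  Unblocking explicitly before
         * jumping mirrors the proposed PG_SETMASK(&UnBlockSig) fix.
         */
        sigemptyset(&unblock);
        sigaddset(&unblock, SIGINT);
        sigprocmask(SIG_UNBLOCK, &unblock, NULL);

        siglongjmp(mainloop_buf, 1);
    }

    int
    main(void)
    {
        struct sigaction sa;

        sa.sa_handler = cancel_handler;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;        /* SIGINT is auto-blocked during the handler */
        sigaction(SIGINT, &sa, NULL);

        /* sigsetjmp(..., 1) saves the mask, as in the backend main loop */
        if (sigsetjmp(mainloop_buf, 1) != 0)
            printf("canceled; a second Ctrl-C should still be deliverable\n");

        for (;;)
            pause();            /* stand-in for the backend main loop */
    }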
Re: [HACKERS] WIP: index support for regexp search
On Wed, Apr 3, 2013 at 12:36 AM, Erikjan Rijkers wrote:
> On Mon, April 1, 2013 23:15, Alexander Korotkov wrote:
>> [trgm-regexp-0.14.patch.gz]
>
> Hi Alexander,

Hi Erik!

> Something went wrong in this version of the patch: many (most) queries
> that were earlier spectacularly fast have become slow, often slower than
> a seqscan or only marginally faster.  See the attached numbers; it
> compares head(seqscan) with trgm-regex patch versions 13 and 14.
>
> I did not even complete the test-run because version 14 is so clearly
> inferior to 13 (and earlier, as far as I can remember).
>
> (let me know if you want the whole result, I can run it overnight)

Yes, there was a bug in the new color map scan implementation.  It is fixed in the attached version.  Could you re-run the tests, please?

The attached version of the patch also addresses the problem that, with the API we are now using, it's not correct to use special values of color number.  It introduces an additional structure, ExtendedColor, which contains an ExtendedColorClass field for handling the additional "virtual" colors introduced by the analysis.

--
With best regards,
Alexander Korotkov.

[Attachment: trgm-regexp-0.15.patch.gz --- GNU Zip compressed data]
Re: [HACKERS] Getting to 9.3 beta
On 29 March 2013 15:05, Tom Lane wrote:
> Bruce Momjian writes:
>> Our final 9.3 commit-fest has exceeded the two-month mark, so it is
>> time to start targeting a date to close it and get to 9.3 beta.  I see
>> 25 items still needing attention before we can close it:
>
>> https://commitfest.postgresql.org/action/commitfest_view?id=17
>
>> What is a reasonable timeframe to target for completion of these items?
>
> TBH, once Andrew commits the JSON patch, I wouldn't have a problem with
> moving all the rest to Returned With Feedback or the next CF.  None of
> the others seem to me to be close-to-committable (with the possible
> exception of the Max LSN patch, which I've not looked at), and April is
> not the time to be doing development.

Agreed.

I completely agree with the need to end this CF here/now/soon.  As someone who has normally tried to stretch the limit on what is possible, I do recognise the need for this to come to an end.  (Certainly my own available time is running out.)

Thanks very much to everybody that tried to get so much into the release.

> Next week is going to be tied up with the back-branch releases, but
> maybe we could target beta for the week after?  The main gating factor
> at this point really would be how quickly we could write some draft
> release notes, so people know what to test.

I think we probably need a little more time for open items discussion and lists (not more feature commits, just docs and I-said-I'd-change-thats).  In the past that has taken a couple of weeks to work through, so I'd guess there's a few things to sort out there.

So I suggest about 23 April for Beta1.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review]
On 4/1/13 5:44 AM, Amit Kapila wrote:
> I think in that case we can have 3 separate patches
> 1. Memory growth defect fix
> 2. Default postgresql.conf to include config directory and SET Persistent
>    into single file implementation
> 3. Rearrangement of GUC validation into validate_conf_option function.
>
> As already there is review happened for point 2 and point 1 is an existing
> code defect fix, so in my opinion
> Patch 1 and 2 should be considered for 9.3.

They have been considered for 9.3, I just doubt they could get committed right now.  In order for this to go in as part of the very last 9.3 feature changes (which is where we're at in the development cycle), you'd need to have a committer volunteer to take on the job of doing a second level of review on it.  And then that would have to happen without any other issues popping up.  That usually is not how it works.  Normally the first round of committer review finds another round of issues, and there's at least one more update before commit.  (Note that this is exactly what just happened today, with the review from Peter Eisentraut)

I'm not saying it's impossible for this feature to go in to 9.3, but I'm not seeing any committers volunteer to take on the job either.  I do want to see this feature go in--I'll update it for 9.4 even if you don't--but we're already into April.  There isn't much time left for 9.3.  And the security release this week has gobbled up a good chunk of committer and packager time unexpectedly, which is just bad luck for your submission.

From a process perspective, features that enter the last CF of a release that are very close to ready from the start have good odds of being committed.  You've done an excellent job of updating this in response to feedback, but it has involved a long list of changes so far.  It's fair to say this was still a rough feature at the start of CF 2013-01, and now it's good but can be usefully polished a bit more.

For something of this size, going from rough feature to commit quality normally takes more than one CF.  I don't have any obvious errors to point out right now.  But I think there's still some room to improve on this before commit.  Andres mentioned on another thread that he thought merging some of your ideas with the version Zoltan did was useful to look at, and I was thinking of something similar.  This is close to being ready, and I hope you won't get discouraged just because it's probably going to slip to 9.4.

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.com
Re: [HACKERS] WIP: index support for regexp search
On Mon, April 1, 2013 23:15, Alexander Korotkov wrote:
> [trgm-regexp-0.14.patch.gz]

Hi Alexander,

Something went wrong in this version of the patch: many (most) queries that were earlier spectacularly fast have become slow, often slower than a seqscan or only marginally faster.  See the attached numbers; it compares head (seqscan) with trgm-regex patch versions 13 and 14.

I did not even complete the test-run because version 14 is so clearly inferior to 13 (and earlier, as far as I can remember).

(let me know if you want the whole result, I can run it overnight)

Thanks,

Erik Rijkers

[Attachment: re-head-13-14-20130402-2219.txt.bz2 --- BZip2 compressed data]
Re: [HACKERS] Spin Lock sleep resolution
On Tue, 2 Apr 2013 09:01:36 -0700 Jeff Janes wrote:
> Sorry.  I triple checked that the patch was there, but it seems like if you
> save a draft with an attachment, when you come back later to finish and
> send it, the attachment may not be there anymore.  The Gmail Offline team
> still has a ways to go.  Hopefully it is actually there this time.

I'll give the patch a try.  I have a workload that is impacted by spinlocks fairly heavily sometimes, and this might help or at least give me more information.

Thanks!

-dg

--
David Gould      da...@sonic.net
If simplicity worked, the world would be overrun with insects.
Re: [HACKERS] Drastic performance loss in assert-enabled build in HEAD
Kevin Grittner wrote:
> Tom Lane wrote:
>> Another reason why I don't like this code is that
>> pg_relation_is_scannable is broken by design:
>>
>>     relid = PG_GETARG_OID(0);
>>     relation = RelationIdGetRelation(relid);
>>     result = relation->rd_isscannable;
>>     RelationClose(relation);
>>
>> You can't do that: if the relcache entry doesn't already exist,
>> this will try to construct one while not holding any lock on the
>> relation, which is subject to all sorts of race conditions.
>
> Hmm.  I think I had that covered earlier but messed up in
> rearranging to respond to review comments.  Will review both new
> calling locations.

For the SQL-level function, does this look OK?:

diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index d589d26..94e55f0 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -850,9 +850,13 @@ pg_relation_is_scannable(PG_FUNCTION_ARGS)
 	bool		result;
 
 	relid = PG_GETARG_OID(0);
-	relation = RelationIdGetRelation(relid);
+	relation = try_relation_open(relid, AccessShareLock);
+
+	if (relation == NULL)
+		PG_RETURN_BOOL(false);
+
 	result = relation->rd_isscannable;
-	RelationClose(relation);
+	relation_close(relation, AccessShareLock);
 
 	PG_RETURN_BOOL(result);
 }

I think the call in ExecCheckRelationsScannable() is safe because it comes after the tables are all already locked.  I put it there so that the appropriate lock strength should be used, based on whether it was locked by ExecInitNode() or something before it.  Am I missing something?  Can I not count on the lock being held at that point?  Would the right level of API here be relation_open() with NoLock rather than RelationIdGetRelation()?  Or is there some other call which is more appropriate there?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Page replacement algorithm in buffer cache
Sent from my iPad

On 02-Apr-2013, at 23:41, Merlin Moncure wrote:
> On Tue, Apr 2, 2013 at 12:50 PM, Atri Sharma wrote:
>> On Tue, Apr 2, 2013 at 9:24 PM, Robert Haas wrote:
>>
>>> One thought I had for fiddling with usage_count is to make it grow
>>> additively (x = x + 1) and decay exponentially (x = x >> 1).  I'm not
>>> sure the idea is any good, but one problem with the current system is
>>> that it's pretty trivial for a buffer to accumulate five touches, and
>>> after that we lose all memory of what the frequency of access is, so
>>> pages of varying different levels of "hotness" can all have usage
>>> count 5.  This might allow a little more refinement without letting
>>> the time to degrade the usage count get out of control.
>>
>> This is just off the top of my head, but one possible solution could
>> be to quantize the levels of hotness.  Specifically, we could
>> categorize buffers based on hotness.  All buffers start in level 1 and
>> usage_count 0.  Whichever buffer reaches usage_count of 5, and the next
>> clock sweep which wants to increment its usage_count (hence taking it
>> above 5) sees that, it promotes the buffer to the next level, and
>> resets its usage_count to 0.  Same logic applies for each level.  When
>> we decrement usage_count and see that it is zero (for some buffer), if
>> it is in a level > 1, we demote the buffer to the next lower level.  If
>> the buffer is in level 1, it is a potential candidate for replacement.
>>
>> This will allow us to have a loose idea about the hotness of a page,
>> without actually storing the usage_count for a buffer.  We can still
>> update usage_count without locking, as buffers in high contention
>> which miss an update in their usage_count won't be affected by that
>> missed update, in accordance with all the discussion upthread.
>
> how is that different from usage_count itself?  usage_count *is* a
> measure of hotness.  the arbitrary cap at 5 is paranoia to prevent the
> already considerable damage that occurs in the situation Andres is
> talking about (where everything is marked 'hot' so you have to sweep a
> lot more).
>
> also, any added complexity in terms of manipulating usage_count is a
> move away from the lockless maintenance I'm proposing.  maybe my idea
> is a non-starter on that basis alone, but the mechanic should be kept
> as simple as possible.  the idea to move it to the bgwriter is to
> pre-emptively do the work that backends are now doing: try and keep
> ahead of the allocations being done so that buffer requests are
> satisfied quickly.

I agree, we want to reduce the complexity of usage_count.  I was only musing on the point Robert raised, if we want to continue using usage_count and refine it for more accurate tracking of hotness of a buffer.

Regards,

Atri
Re: [HACKERS] Drastic performance loss in assert-enabled build in HEAD
Tom Lane wrote:
> Kevin Grittner writes:
>> Tom Lane wrote:
>>> So maybe I'm nuts to care about the performance of an
>>> assert-enabled backend, but I don't really find a 4X runtime
>>> degradation acceptable, even for development work.  Does anyone
>>> want to fess up to having caused this, or do I need to start
>>> tracking down what changed?
>
>> I checked master HEAD for a dump of regression and got about 4
>> seconds.  I checked right after my initial push of matview code
>> and got 2.5 seconds.  I checked just before that and got 1
>> second.  There was some additional pg_dump work for matviews
>> after the initial push which may or may not account for the rest
>> of the time.
>
> I poked at this a bit, and eventually found that the performance
> differential goes away if I dike out the
> pg_relation_is_scannable() call in getTables()'s table-collection
> query.  What appears to be happening is that those calls cause a
> great many more relcache entries to be made than used to happen,
> plus many entries are made earlier in the run than they used to
> be.  Many of those entries have subsidiary memory contexts, which
> results in an O(N^2) growth in the time spent in AllocSetCheck,
> since postgres.c does MemoryContextCheck(TopMemoryContext) once
> per received command, and pg_dump will presumably issue O(N)
> commands in an N-table database.
>
> So one question that brings up is whether we need to dial back
> the amount of work done in memory context checking, but I'm loath
> to do so in development builds.  That code has fingered an awful
> lot of bugs.  OTOH, if the number of tables in the regression DB
> continues to grow, we may have little choice.
>
> Anyway, the immediate takeaway is that this represents a horribly
> expensive way for pg_dump to find out which matviews aren't
> scannable.  The cheap way for it to find out would be if we had a
> boolean flag for that in pg_class.  Would you review the bidding
> as to why things were done the way they are?  Because in general,
> having to ask the kernel something is going to suck in any case,
> so basing it on the size of the disk file doesn't seem to me to
> be basically a good thing.

The early versions of the patch had a boolean in pg_class to track this, but I got complaints from Robert and Noah (and possibly others?) that this got too ugly in combination with the support for unlogged matviews, and they suggested the current approach.

For an unlogged matview we need to replace the heap (main fork) with the init fork before the database is up and running, so there would need to be some way for that to result in forcing the flag in pg_class.  I was attempting to do that when a matview which was flagged as scannable in pg_class was opened, but I gotta agree that it wasn't pretty.  Noah suggested a function based on testing the heap to see if it looked like the init fork, and the current state of affairs is my attempt to make it work that way.

We could definitely minimize the overhead by only testing this for matviews.  It seemed potentially useful for the function to have some purpose for other types of relations, so I was trying to avoid that.  Maybe that was a bad idea.

> We could alleviate the pain by changing pg_dump's query to
> something like
>
>   (case when c.relkind = 'm'
>    then pg_relation_is_scannable(c.oid)
>    else false end)

Yeah, that's the sort of thing I was thinking of.  If we're going to go that way, we may want to touch one or two other spots.

> but TBH this feels like bandaging a bad design.

So far nobody has been able to suggest a better way to support both unlogged matviews and some way to prevent a matview from being used if it does not contain materialized data for its query from *some* point in time.  Suggestions welcome.

> Another reason why I don't like this code is that
> pg_relation_is_scannable is broken by design:
>
>     relid = PG_GETARG_OID(0);
>     relation = RelationIdGetRelation(relid);
>     result = relation->rd_isscannable;
>     RelationClose(relation);
>
> You can't do that: if the relcache entry doesn't already exist,
> this will try to construct one while not holding any lock on the
> relation, which is subject to all sorts of race conditions.

Hmm.  I think I had that covered earlier but messed up in rearranging to respond to review comments.  Will review both new calling locations.

> In general, direct calls on RelationIdGetRelation are probably
> broken.

If valid contexts for use of the function are that limited, it might merit a comment where the function is defined.  I'm not sure what a good alternative to this would be.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Page replacement algorithm in buffer cache
On Tue, Apr 2, 2013 at 12:50 PM, Atri Sharma wrote:
> On Tue, Apr 2, 2013 at 9:24 PM, Robert Haas wrote:
>
>> One thought I had for fiddling with usage_count is to make it grow
>> additively (x = x + 1) and decay exponentially (x = x >> 1).  I'm not
>> sure the idea is any good, but one problem with the current system is
>> that it's pretty trivial for a buffer to accumulate five touches, and
>> after that we lose all memory of what the frequency of access is, so
>> pages of varying different levels of "hotness" can all have usage
>> count 5.  This might allow a little more refinement without letting
>> the time to degrade the usage count get out of control.
>
> This is just off the top of my head, but one possible solution could
> be to quantize the levels of hotness.  Specifically, we could
> categorize buffers based on hotness.  All buffers start in level 1 and
> usage_count 0.  Whichever buffer reaches usage_count of 5, and the next
> clock sweep which wants to increment its usage_count (hence taking it
> above 5) sees that, it promotes the buffer to the next level, and
> resets its usage_count to 0.  Same logic applies for each level.  When
> we decrement usage_count and see that it is zero (for some buffer), if
> it is in a level > 1, we demote the buffer to the next lower level.  If
> the buffer is in level 1, it is a potential candidate for replacement.
>
> This will allow us to have a loose idea about the hotness of a page,
> without actually storing the usage_count for a buffer.  We can still
> update usage_count without locking, as buffers in high contention
> which miss an update in their usage_count won't be affected by that
> missed update, in accordance with all the discussion upthread.

how is that different from usage_count itself?  usage_count *is* a measure of hotness.  the arbitrary cap at 5 is paranoia to prevent the already considerable damage that occurs in the situation Andres is talking about (where everything is marked 'hot' so you have to sweep a lot more).

also, any added complexity in terms of manipulating usage_count is a move away from the lockless maintenance I'm proposing.  maybe my idea is a non-starter on that basis alone, but the mechanic should be kept as simple as possible.  the idea to move it to the bgwriter is to pre-emptively do the work that backends are now doing: try and keep ahead of the allocations being done so that buffer requests are satisfied quickly.

merlin
Re: [HACKERS] Page replacement algorithm in buffer cache
On Tue, Apr 2, 2013 at 9:24 PM, Robert Haas wrote:
> One thought I had for fiddling with usage_count is to make it grow
> additively (x = x + 1) and decay exponentially (x = x >> 1).  I'm not
> sure the idea is any good, but one problem with the current system is
> that it's pretty trivial for a buffer to accumulate five touches, and
> after that we lose all memory of what the frequency of access is, so
> pages of varying different levels of "hotness" can all have usage
> count 5.  This might allow a little more refinement without letting
> the time to degrade the usage count get out of control.

This is just off the top of my head, but one possible solution could be to quantize the levels of hotness.  Specifically, we could categorize buffers based on hotness.  All buffers start in level 1 and usage_count 0.  Whichever buffer reaches usage_count of 5, and the next clock sweep which wants to increment its usage_count (hence taking it above 5) sees that, it promotes the buffer to the next level, and resets its usage_count to 0.  Same logic applies for each level.  When we decrement usage_count and see that it is zero (for some buffer), if it is in a level > 1, we demote the buffer to the next lower level.  If the buffer is in level 1, it is a potential candidate for replacement.

This will allow us to have a loose idea about the hotness of a page, without actually storing the usage_count for a buffer.  We can still update usage_count without locking, as buffers in high contention which miss an update in their usage_count won't be affected by that missed update, in accordance with all the discussion upthread.

Thoughts/Comments?

Regards,

Atri

--
Regards,

Atri
l'apprenant
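[Editor's illustration of Robert's additive-increase/exponential-decay idea quoted above --- a toy sketch under assumed parameters (e.g. the higher cap), not actual bufmgr code:]

    #include <stdint.h>

    #define MAX_USAGE_COUNT 15      /* assumed cap, higher than today's 5 */

    typedef struct BufferDescSketch
    {
        uint32      usage_count;    /* hotness estimate */
    } BufferDescSketch;

    /* On buffer access: grow additively, saturating at the cap. */
    static void
    touch_buffer(BufferDescSketch *buf)
    {
        if (buf->usage_count < MAX_USAGE_COUNT)
            buf->usage_count += 1;          /* x = x + 1 */
    }

    /* On each clock-sweep pass: decay exponentially instead of by 1,
     * so frequently-touched buffers keep a distinguishable count. */
    static int
    sweep_buffer(BufferDescSketch *buf)
    {
        buf->usage_count >>= 1;             /* x = x >> 1 */
        return buf->usage_count == 0;       /* eviction candidate at zero */
    }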
Re: [HACKERS] citext like searches using index
On Apr 2, 2013, at 10:16 AM, Tom Lane wrote:
>> Is this knowledge encapsulated in a to-do?
>
> I added an item to the "Indexes" section of the TODO page.

Great, thanks.

David
Re: [HACKERS] Page replacement algorithm in buffer cache
On 2013-04-02 18:26:23 +0100, Greg Stark wrote:
> I'm confused by this thread.  We *used* to maintain an LRU.  The whole
> reason for the clock-sweep algorithm is precisely to avoid maintaining
> a linked list of least recently used buffers since the head of that
> list is a point of contention.

I don't think anybody is proposing to put the LRU back into a linked list; given the frequency of usagecount manipulations, that would probably end pretty badly.

What I think Robert, Tom and I are talking about is putting *some* buffers with usagecount = 0 into a linked list, so that when a backend requires a new page it can take one buffer from the freelist instead of

a) possibly touching quite a few pages (I have seen sweeps visit *every* existing buffer header 4 times) to find one with usagecount = 0
b) having to write the page out itself

If everything is going well, that would mean only the bgwriter (or a bgfreelist process or whatever) performs the clock sweep.  Others take *new* pages from the freelist instead of performing part of the sweep themselves.

Makes more sense?

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
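[Editor's sketch of the division of labor Andres describes --- a background process refilling a freelist that backends drain.  All names, the single-lock scheme, and the clock_sweep_next_victim() helper are illustrative assumptions, not bufmgr code:]

    #include <pthread.h>

    /* A toy freelist of clean, usagecount==0 buffers behind one lock. */
    typedef struct FreeListSketch
    {
        pthread_mutex_t lock;
        int         ids[128];       /* buffer ids ready for immediate reuse */
        int         nfree;
    } FreeListSketch;

    /* Stub standing in for the clock sweep; a real one would decay
     * usage counts and write out dirty victims before handing them over. */
    static int
    clock_sweep_next_victim(void)
    {
        static int  next = 0;
        return next++ % 128;
    }

    /* Background-writer side: run the sweep and keep the list topped up. */
    static void
    refill_freelist(FreeListSketch *fl, int target)
    {
        pthread_mutex_lock(&fl->lock);
        while (fl->nfree < target)
            fl->ids[fl->nfree++] = clock_sweep_next_victim();
        pthread_mutex_unlock(&fl->lock);
    }

    /* Backend side: grab a ready buffer; sweep only if the list is empty. */
    static int
    get_free_buffer(FreeListSketch *fl)
    {
        int         buf = -1;

        pthread_mutex_lock(&fl->lock);
        if (fl->nfree > 0)
            buf = fl->ids[--fl->nfree];
        pthread_mutex_unlock(&fl->lock);

        if (buf < 0)
            buf = clock_sweep_next_victim();    /* slow path, as today */
        return buf;
    }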
Re: [HACKERS] Page replacement algorithm in buffer cache
I'm confused by this thread.  We *used* to maintain an LRU.  The whole reason for the clock-sweep algorithm is precisely to avoid maintaining a linked list of least recently used buffers, since the head of that list is a point of contention.
Re: [HACKERS] citext like searches using index
"David E. Wheeler" writes: > On Apr 2, 2013, at 8:03 AM, Tom Lane wrote: >>> Are there any widely known non-built-in cases besides citext? >> Well, indxpath.c knows about text LIKE and network subset operators, >> and it would be nice if it knew how to do the same type of optimization >> for range inclusion, ie btree_indexed_col <@ range_constant. The latter >> doesn't seem implementable in the current infrastructure because ranges >> aren't all built-in types. I agree that the citext case isn't too >> compelling in isolation, but surely the range case is interesting. On further reflection, I withdraw the claim that range-inclusion couldn't be implemented in the current design. Although the various range types might not be built-in, the "anyelement <@ anyrange" operator *is* built-in, so its OID could be added to the switch statements in indxpath.c. I don't think it'd be terribly difficult to then add datatype-agnostic code to pry apart the range value and construct a derived "btree_indexed_col >= lowbound AND btree_indexed_col <= highbound" indexclause. But this could not be extended to citext or other plugin extensions, because their operators don't have hard-wired OIDs. Anyway, even if the specific claim about ranges is bogus, I still think there are enough data points to justify the idea that a more extensible mechanism would be worth having. At the same time, there's a reason why it's not yet got to the top of anyone's priority list. > Is this knowledge encapsulated in a to-do? I added an item to the "Indexes" section of the TODO page. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page replacement algorithm in buffer cache
On 2013-04-02 12:56:56 -0400, Tom Lane wrote:
> Andres Freund writes:
>> On 2013-04-02 12:22:03 -0400, Tom Lane wrote:
>>> I agree in general, though I'm not sure the bgwriter process can
>>> reasonably handle this need along with what it's already supposed to be
>>> doing.  We may need another background process that is just responsible
>>> for keeping the freelist populated.
>
>> What else is the bgwriter actually doing otherwise?  Sure, it doesn't put
>> the pages on the freelist, but otherwise it's trying to do the above,
>> isn't it?
>
> I think it will be problematic to tie buffer-undirtying to putting both
> clean and dirty buffers into the freelist.  It might chance to work all
> right to use one scan process for both, but I'm afraid it's more likely
> that we'd end up either overserving one goal or underserving the other.

Hm.  I had imagined that we would only ever put clean buffers into the freelist, and that we would never write out a buffer that we don't need for a new page.  I don't see much point in randomly writing out buffers that won't be needed for something else soon.  Currently we can't do much better than basically undirtying random buffers, since we don't really know which page will reach a usagecount of zero, because the bgwriter doesn't manipulate usagecounts.

One other scenario I can see is the problem of strategy buffer reuse of dirtied pages (hint bits, pruning) during seqscans, where we would benefit from pages being written out fast, but I can't imagine that that could be handled very well by something like the bgwriter?  Am I missing something?

> Also note the entire design of BgBufferSync right now is predicated on
> the assumption that the rate of motion of the scan strategy point
> reflects the rate at which new buffers are needed.  If backends are
> supposed to always get new buffers from the freelist, that logic becomes
> circular: the bgwriter would be watching itself.  Perhaps we can
> refactor the feedback control loop logic so that the buffer scan rate is
> driven by changes in the length of the freelist, but I'm not sure
> exactly what the consequences would be.

Yea, that will definitely need refactoring.  What I am imagining is that the pacing is basically built on top of a few StrategyControl variables like:

* number of buffers taken from the freelist
* number of buffers acquired by a backend because the freelist was empty
* number of buffers written out by a backend because the freelist was empty

Those should be pretty cheap to maintain and should be enough to implement sensible pacing for the bgwriter (see the sketch after this message).

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
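[Editor's sketch of what those pacing counters might look like --- field names and the control rule are assumptions, not patch code; the idea is that bgwriter drives its scan rate from deltas of these counters rather than from the strategy point:]

    #include <stdint.h>

    /* Assumed additions to shared strategy state for freelist pacing. */
    typedef struct StrategyPacingCounters
    {
        uint64      bufs_from_freelist;     /* allocations served from freelist */
        uint64      bufs_alloc_fallback;    /* backend had to sweep: list empty */
        uint64      bufs_written_by_backend;/* backend wrote a dirty victim */
    } StrategyPacingCounters;

    /*
     * One possible bgwriter control rule: if fallback allocations grew
     * since the last cycle, the freelist is draining faster than it is
     * refilled, so raise the refill target; decay slowly otherwise.
     */
    static int
    next_refill_target(const StrategyPacingCounters *now,
                       const StrategyPacingCounters *prev,
                       int cur_target)
    {
        uint64      misses = now->bufs_alloc_fallback - prev->bufs_alloc_fallback;

        if (misses > 0)
            return cur_target * 2;      /* ramp up aggressively on misses */
        if (cur_target > 16)
            return cur_target - 1;      /* decay slowly when keeping up */
        return cur_target;
    }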
Re: [HACKERS] Page replacement algorithm in buffer cache
Andres Freund writes:
> On 2013-04-02 12:22:03 -0400, Tom Lane wrote:
>> I agree in general, though I'm not sure the bgwriter process can
>> reasonably handle this need along with what it's already supposed to be
>> doing.  We may need another background process that is just responsible
>> for keeping the freelist populated.

> What else is the bgwriter actually doing otherwise?  Sure, it doesn't put
> the pages on the freelist, but otherwise it's trying to do the above,
> isn't it?

I think it will be problematic to tie buffer-undirtying to putting both clean and dirty buffers into the freelist.  It might chance to work all right to use one scan process for both, but I'm afraid it's more likely that we'd end up either overserving one goal or underserving the other.

Also note the entire design of BgBufferSync right now is predicated on the assumption that the rate of motion of the scan strategy point reflects the rate at which new buffers are needed.  If backends are supposed to always get new buffers from the freelist, that logic becomes circular: the bgwriter would be watching itself.  Perhaps we can refactor the feedback control loop logic so that the buffer scan rate is driven by changes in the length of the freelist, but I'm not sure exactly what the consequences would be.

			regards, tom lane
Re: [HACKERS] Page replacement algorithm in buffer cache
On 2013-04-02 12:22:03 -0400, Tom Lane wrote:
> Robert Haas writes:
>> But, having said that, I still think the best idea is what Andres
>> proposed, which pretty much matches my own thoughts: the bgwriter
>> needs to populate the free list, so that buffer allocations don't have
>> to wait for linear scans of the buffer array.  That's just plain too
>> slow.
>
> I agree in general, though I'm not sure the bgwriter process can
> reasonably handle this need along with what it's already supposed to be
> doing.  We may need another background process that is just responsible
> for keeping the freelist populated.

What else is the bgwriter actually doing otherwise?  Sure, it doesn't put the pages on the freelist, but otherwise it's trying to do the above, isn't it?

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Page replacement algorithm in buffer cache
On 2013-04-02 11:54:32 -0400, Robert Haas wrote:
> On Tue, Apr 2, 2013 at 11:32 AM, Merlin Moncure wrote:
>> That's a very fair point, although not being able to evict pinned
>> buffers is a highly mitigating aspect.  Also CLOG is a different beast
>> entirely -- it's much more dense (2 bits!) vs a tuple, so a single page
>> can hold a lot of high priority things.  But you could be right anyways.
>>
>> Given that, I wouldn't feel very comfortable with forced eviction
>> without knowing for sure high priority buffers were immune from that.
>> Your nailing idea is maybe the ideal solution.  Messing around with
>> the usage_count mechanic is tempting (like raising the cap and making
>> the sweeper more aggressive as it iterates), but probably really
>> difficult to get right, and, hopefully, ultimately moot.
>
> One thought I had for fiddling with usage_count is to make it grow
> additively (x = x + 1) and decay exponentially (x = x >> 1).  I'm not
> sure the idea is any good, but one problem with the current system is
> that it's pretty trivial for a buffer to accumulate five touches, and
> after that we lose all memory of what the frequency of access is, so
> pages of varying different levels of "hotness" can all have usage
> count 5.  This might allow a little more refinement without letting
> the time to degrade the usage count get out of control.
>
> But, having said that, I still think the best idea is what Andres
> proposed, which pretty much matches my own thoughts: the bgwriter
> needs to populate the free list, so that buffer allocations don't have
> to wait for linear scans of the buffer array.  That's just plain too
> slow.

That way the usagecount should go down far more slowly, which essentially makes it more granular.  And I think fiddling on that level before we have a more sensible buffer acquisition implementation is pretty premature, since that will change too much.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Page replacement algorithm in buffer cache
Robert Haas writes:
> But, having said that, I still think the best idea is what Andres
> proposed, which pretty much matches my own thoughts: the bgwriter
> needs to populate the free list, so that buffer allocations don't have
> to wait for linear scans of the buffer array.  That's just plain too
> slow.

I agree in general, though I'm not sure the bgwriter process can reasonably handle this need along with what it's already supposed to be doing.  We may need another background process that is just responsible for keeping the freelist populated.

			regards, tom lane
Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review]
I'm going to ignore most of the discussion that led up to this and give this patch a fresh look. + + SET PERSISTENT max_connections To 10; + The To should probably be capitalized. I doubt this example actually works because changing max_connections requires a restart. Try to find a better example. It's weird that SET LOCAL and SET SESSION actually *set* the value, and the second key word determines how long the setting will last. SET PERSISTENT doesn't actually set the value. I predict that this will be a new favorite help-it-doesn't-work FAQ. SCHEMA ! SET [ PERSISTENT ] SCHEMA 'value' is an alias for SET search_path TO value. Only one schema can be specified using this syntax. I don't think [ PERSISTENT ] needs to be added in these and similar snippets. We don't mention LOCAL etc. here either. --- 34,41 postgresql.conf, pg_hba.conf, and pg_ident.conf are traditionally stored in PGDATA (although in PostgreSQL 8.0 and ! later, it is possible to place them elsewhere). By default the directory config is stored ! in PGDATA, however it should be kept along with postgresql.conf. This chunk doesn't make any sense to me. "Should" is always tricky. Why should I, and why might I not? + config + Subdirectory containing automatically generated configuration files + + + base Subdirectory containing per-database subdirectories Only automatically generated ones? COPY_STRING_FIELD(name); COPY_NODE_FIELD(args); COPY_SCALAR_FIELD(is_local); + COPY_SCALAR_FIELD(is_persistent); return newnode; } I suggest changing is_local into a new trivalued field that stores LOCAL or SESSION or PERSISTENT. n->is_local = false; $$ = (Node *) n; } + | SET PERSISTENT set_persistent + { + VariableSetStmt *n = $3; + n->is_persistent = true; + $$ = (Node *) n; + } ; set_rest: Why can't you use SET PERSISTENT set_rest? *** a/src/backend/replication/basebackup.c --- b/src/backend/replication/basebackup.c *** *** 755,760 sendDir(char *path, int basepathlen, bool sizeonly) --- 755,766 strlen(PG_TEMP_FILE_PREFIX)) == 0) continue; + /* skip auto conf temporary file */ + if (strncmp(de->d_name, + PG_AUTOCONF_FILENAME ".", + sizeof(PG_AUTOCONF_FILENAME)) == 0) + continue; + Maybe pg_basebackup should be taught to ignore certain kinds of temporary files in general. The file name shouldn't be hardcoded into pg_basebackup. This would effectively make the configuration file naming scheme part of the replication protocol. See other thread about pg_basebackup client/server compatibility. This needs to be generalized. (Side thought: Does pg_basebackup copy editor temp and backup files?) + ereport(elevel, + (errmsg("Configuration parameters changed with SET PERSISTENT command before start of server " + "will not be effective as \"%s\" file is not accessible.", PG_AUTOCONF_FILENAME))); I'm having trouble interpreting this: How can you change something with SET PERSISTENT before the server starts? Also: errmsg messages should start with a lowercase letter and should generally be much shorter. Please review other cases in your patch as well. + appendStringInfoString(&buf, "# Do not edit this file manually! " + "It is overwritten by the SET PERSISTENT command \n"); There is some punctuation or something missing at the end. I suggest you break this into two lines. It's pretty long. I think the naming of these files is suboptimal: + #define PG_AUTOCONF_DIR "config" "config" would seem to include pg_hba.conf and others. Or postgresql.conf for that matter. Maybe I should move postgresql.conf into config/. Another great new FAQ. 
The normal convention for this is "postgresql.conf.d". I don't see any reason not to use that. One reason that was brought up is that this doesn't match other things currently in PGDATA, but (a) actually it does (hint: postgresql.conf), and (b) neither does "config".

+ #define PG_AUTOCONF_FILENAME "persistent.auto.conf"

"auto" sounds like the result of an automatic tuning tool. Why not just persistent.conf? Or set-persistent.conf? That makes the association clear enough.

I don't know why the concept of a configuration directory is being introduced here. I mean, I'm all for that, but it seems completely independent of this. Some people
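Returning to the copyfuncs/gram.y comments above: a minimal C sketch of the suggested trivalued field (the names here are hypothetical; the actual change would land in parsenodes.h and is up to the patch author) might look like this:

/* Hypothetical sketch only -- not the patch's actual definition. */
typedef enum VariableSetScope
{
	VAR_SET_SCOPE_SESSION,		/* SET SESSION ... (the default) */
	VAR_SET_SCOPE_LOCAL,		/* SET LOCAL ... */
	VAR_SET_SCOPE_PERSISTENT	/* SET PERSISTENT ... */
} VariableSetScope;

typedef struct VariableSetStmtSketch
{
	char		   *name;		/* name of the variable to be set */
	struct List	   *args;		/* list of A_Const nodes */
	VariableSetScope scope;		/* replaces is_local + is_persistent */
} VariableSetStmtSketch;

With a single scope field, copyfuncs.c would carry one COPY_SCALAR_FIELD(scope) instead of one boolean per flavor, and adding further scopes later would not require another flag.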
Re: [HACKERS] in-catalog Extension Scripts and Control parameters (templates?)
Heikki Linnakangas writes:
> I'm quite worried about the security ramifications of this patch. Today, if
> you're not sure if a system has e.g sslinfo installed, you can safely just
> run "CREATE EXTENSION sslinfo". With this patch, that's no longer true,
> because "foo" might not be the extension you're looking for. Mallory

With the attached patch, you can't create a template for an extension that is already available, so to protect against your scenario you only have to make "sslinfo" available. Please also note that when actually installing the "sslinfo" extension, the one from the system will get preferred over the one from the templates, should you have both available.

Now, I can see why you would still think it's not enough. Barring better ideas, the current patch restricts the feature to superusers.

> Documentation doesn't build, multiple errors. In addition to the reference
> pages, there should be a section in the main docs about these templates.

I'm now working on that, setting up the documentation tool set.

>> postgres=# create template for extension myextension version '1.0' with ()
>> as 'foobar';
>> CREATE TEMPLATE FOR EXTENSION
>> postgres=# create extension myextension;
>> ERROR:  syntax error at or near "foobar"
>> LINE 1: create extension myextension;
>>         ^
>
> Confusing error message.

Introducing a syntax error in hstore--1.1.sql leads to the same message. Even if we agree that it must be changed, I think that's for another patch.

  ~# create extension hstore;
  ERROR:  syntax error at or near "foobar" at character 115
  STATEMENT:  create extension hstore;
  ERROR:  syntax error at or near "foobar"

>> postgres=# create template for extension myextension version '1.0' with ()
>> as $$create table foobar(i int4) $$;
>> CREATE TEMPLATE FOR EXTENSION
>> postgres=# create extension myextension;
>> CREATE EXTENSION
>> postgres=# select * from foobar;
>> ERROR:  relation "foobar" does not exist
>> LINE 1: select * from foobar;
>>                       ^
>
> Where did that table go?

The control->schema was not properly initialized when not given by the user. That's fixed, and I added a regression test.

> Ah, here... Where did that "1.0" schema come from?

Fixed too, was the same bug.

>> postgres=> create template for extension myextension version '1.0' with
>> (schema public) as $$ create function evilfunc() returns int4 AS
>> 'evilfunc' language internal; $$;
>> CREATE TEMPLATE FOR EXTENSION
>> postgres=> create extension myextension version '1.0';
>> ERROR:  permission denied for language internal
>> postgres=> drop template for extension myextension version '1.0';
>> ERROR:  extension with OID 16440 does not exist
>
> Something wrong with catalog caching.

Works for me; I couldn't reproduce the bug here, working from Álvaro's version 4 of the patch. Maybe he did already fix it, and you tested my version 3?

>> $ make -s install
>> /usr/bin/install: cannot stat `./hstore--1.0.sql': No such file or directory
>> make: *** [install] Error 1
>
> Installing hstore fails.

Fixed in the attached. Seeing that makes me think that you actually used Álvaro's version 4, though.

>> postgres=> create template for extension sslinfo version '1.0' with
>> (schema public) as $$ create function evilfunc() returns int4 AS
>> 'evilfunc' language internal; $$;
>> ERROR:  extension "sslinfo" is already available

Expected.
>> postgres=> create template for extension sslinfo2 version '1.0' with
>> (schema public) as $$ create function evilfunc() returns int4 AS
>> 'evilfunc' language internal; $$;
>> CREATE TEMPLATE FOR EXTENSION
>> postgres=> alter template for extension sslinfo2 rename to sslinfo;
>> ALTER TEMPLATE FOR EXTENSION
>
> If we check for an existing extension at CREATE, should also check for that
> in ALTER ... RENAME TO.

Indeed, good catch. Fixed in the attached version 5 of the patch.

I didn't add a regression test for that case, because we need to know which extensions are available when we try to obtain this error, and I don't know how to do that. We could create a template for the extension foobar, then foobar2, then rename foobar2 to foobar, but that wouldn't exercise the same code path.

Thanks again for your review,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support

templates.v5.patch.gz
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Spin Lock sleep resolution
On Monday, April 1, 2013, Tom Lane wrote:
> Jeff Janes writes:
>> The problem is that the state is maintained only to an integer number of
>> milliseconds starting at 1, so it can take a number of attempts for the
>> random increment to jump from 1 to 2, and then from 2 to 3.
>
> Hm ... fair point, if you assume that the underlying OS has a sleep
> resolution finer than 1ms.  Otherwise it would not matter.

Let's say you get a long stretch of increments that are all a ratio of <1.5 fold; for simplicity, let's say they are all 1.3 fold. When you do intermediate truncations of the state variable, it never progresses at all.

perl -le '$foo=1; foreach (1..10) {$foo*=1.3; print int $foo}'
perl -le '$foo=1; foreach (1..10) {$foo*=1.3; $foo=int $foo; print int $foo}'

Obviously the true stochastic case is not quite that stark.

> No patch seen here ...

Sorry. I triple checked that the patch was there, but it seems like if you save a draft with an attachment, when you come back later to finish and send it, the attachment may not be there anymore. The Gmail Offline team still has a ways to go.

Hopefully it is actually there this time.

Cheers,

Jeff

spin_delay_microsecond_v1.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
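Jeff's perl one-liners above translate directly into C. This is a sketch of the arithmetic only, with made-up variable names, not the actual s_lock.c code: a multiplicative backoff whose state is truncated to whole milliseconds after every step can get stuck, while a finer-grained state progresses normally.

#include <stdio.h>

/*
 * Demonstrate the truncation pathology: grow a delay by 1.3x per step,
 * once keeping the state precise and once truncating it to whole
 * milliseconds after each step.
 */
int
main(void)
{
	double	precise = 1.0;		/* state kept finer than 1 ms */
	int		truncated = 1;		/* state truncated to whole ms each step */

	for (int i = 1; i <= 10; i++)
	{
		precise *= 1.3;
		truncated = (int) (truncated * 1.3);	/* 1 * 1.3 -> 1, forever */
		printf("step %2d: precise=%2d ms  truncated=%d ms\n",
			   i, (int) precise, truncated);
	}
	return 0;
}

The precise state prints 1, 1, 2, 2, 3, ... while the truncated state stays at 1 indefinitely, which is presumably the effect the attached microsecond-resolution patch is meant to remove.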
Re: [HACKERS] citext like searches using index
On Apr 2, 2013, at 8:03 AM, Tom Lane wrote:
>> Are there any widely known non-built-in cases besides citext?
>
> Well, indxpath.c knows about text LIKE and network subset operators,
> and it would be nice if it knew how to do the same type of optimization
> for range inclusion, ie btree_indexed_col <@ range_constant.  The latter
> doesn't seem implementable in the current infrastructure because ranges
> aren't all built-in types.  I agree that the citext case isn't too
> compelling in isolation, but surely the range case is interesting.

Is this knowledge encapsulated in a to-do?

David

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page replacement algorithm in buffer cache
On Tue, Apr 2, 2013 at 11:32 AM, Merlin Moncure wrote:
> That's a very fair point, although not being able to evict pinned
> buffers is a highly mitigating aspect.  Also CLOG is a different beast
> entirely -- it's much more dense (2 bits!) vs a tuple, so a single page
> can hold a lot of high priority things.  But you could be right anyways.
>
> Given that, I wouldn't feel very comfortable with forced eviction
> without knowing for sure high priority buffers were immune from that.
> Your nailing idea is maybe the ideal solution.  Messing around with
> the usage_count mechanic is tempting (like raising the cap and making
> the sweeper more aggressive as it iterates), but probably really
> difficult to get right, and, hopefully, ultimately moot.

One thought I had for fiddling with usage_count is to make it grow additively (x = x + 1) and decay exponentially (x = x >> 1). I'm not sure the idea is any good, but one problem with the current system is that it's pretty trivial for a buffer to accumulate five touches, and after that we lose all memory of what the frequency of access is, so pages of varying levels of "hotness" can all have usage count 5. This might allow a little more refinement without letting the time to degrade the usage count get out of control.

But, having said that, I still think the best idea is what Andres proposed, which pretty much matches my own thoughts: the bgwriter needs to populate the free list, so that buffer allocations don't have to wait for linear scans of the buffer array. That's just plain too slow.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
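For concreteness, the additive-increase / exponential-decay idea above could look like this minimal C sketch. The names and the cap are hypothetical; the real logic would live in the buffer manager:

#include <stdint.h>

#define USAGE_COUNT_CAP 64		/* hypothetical higher cap; current code uses 5 */

/* Called when a backend touches (pins) the buffer. */
static inline void
usage_on_access(uint8_t *usage_count)
{
	if (*usage_count < USAGE_COUNT_CAP)
		(*usage_count)++;		/* additive increase: x = x + 1 */
}

/* Called when the clock sweep passes over the buffer. */
static inline void
usage_on_sweep(uint8_t *usage_count)
{
	*usage_count >>= 1;			/* exponential decay: x = x >> 1 */
}

At steady state, halving per sweep pass plus one increment per touch settles the count near twice the number of touches per sweep cycle (up to the cap), so buffers of different hotness end up with distinguishable counts instead of all saturating at 5.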
Re: [HACKERS] Page replacement algorithm in buffer cache
On Tue, Apr 2, 2013 at 9:55 AM, Robert Haas wrote:
> On Tue, Apr 2, 2013 at 1:53 AM, Merlin Moncure wrote:
>> That seems pretty unlikely because of A) sheer luck of hitting that
>> page for the dropout (if your buffer count is N the chances of losing
>> it would seem to be 1/N at most) and B) highly used pages are much more
>> likely to be pinned and thus immune from eviction.  But my issue with
>> this whole line of analysis is that I've never been able to turn it
>> up in simulated testing.  Probably to do it you'd need very very fast
>> storage.
>
> Well, if you have shared_buffers=8GB, that's a million buffers.  One
> in a million events happen pretty frequently on a heavily loaded
> server, which, on recent versions of PostgreSQL, can support several
> hundred thousand queries per second, each of which accesses multiple
> buffers.
>
> I've definitely seen evidence that poor choices of which CLOG buffer
> to evict can result in a noticeable system-wide stall while everyone
> waits for it to be read back in.  I don't have any similar evidence
> for shared buffers, but I wouldn't be very surprised if the same
> danger exists there, too.

That's a very fair point, although not being able to evict pinned buffers is a highly mitigating aspect. Also CLOG is a different beast entirely -- it's much more dense (2 bits!) vs a tuple, so a single page can hold a lot of high priority things. But you could be right anyways.

Given that, I wouldn't feel very comfortable with forced eviction without knowing for sure high priority buffers were immune from that. Your nailing idea is maybe the ideal solution. Messing around with the usage_count mechanic is tempting (like raising the cap and making the sweeper more aggressive as it iterates), but probably really difficult to get right, and, hopefully, ultimately moot.

merlin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] citext like searches using index
Peter Eisentraut writes:
> On 4/2/13 10:26 AM, Tom Lane wrote:
>> The issue with the LIKE special case is that left-anchored patterns
>> are (to some extent) indexable with ordinary btree indexes, and so we
>> want to exploit that rather than tell people they have to have a whole
>> other index.

> In practice, you need an index specifically for pattern matching anyway.

Well, you do if you want to search for general patterns, but there's been enough interest in the LIKE optimization as-is to make me think we could not get away with removing it.

>> So it's a pretty special case, but there are just enough instances of it
>> to wish for some not-so-hard-wired way to deal with it.

> Are there any widely known non-built-in cases besides citext?

Well, indxpath.c knows about text LIKE and network subset operators, and it would be nice if it knew how to do the same type of optimization for range inclusion, ie btree_indexed_col <@ range_constant. The latter doesn't seem implementable in the current infrastructure because ranges aren't all built-in types. I agree that the citext case isn't too compelling in isolation, but surely the range case is interesting.

			regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page replacement algorithm in buffer cache
On Tue, Apr 2, 2013 at 1:53 AM, Merlin Moncure wrote:
> That seems pretty unlikely because of A) sheer luck of hitting that
> page for the dropout (if your buffer count is N the chances of losing
> it would seem to be 1/N at most) and B) highly used pages are much more
> likely to be pinned and thus immune from eviction.  But my issue with
> this whole line of analysis is that I've never been able to turn it
> up in simulated testing.  Probably to do it you'd need very very fast
> storage.

Well, if you have shared_buffers=8GB, that's a million buffers. One in a million events happen pretty frequently on a heavily loaded server, which, on recent versions of PostgreSQL, can support several hundred thousand queries per second, each of which accesses multiple buffers.

I've definitely seen evidence that poor choices of which CLOG buffer to evict can result in a noticeable system-wide stall while everyone waits for it to be read back in. I don't have any similar evidence for shared buffers, but I wouldn't be very surprised if the same danger exists there, too.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page replacement algorithm in buffer cache
On Tue, Apr 2, 2013 at 6:32 AM, Andres Freund wrote:
> On 2013-04-01 17:56:19 -0500, Jim Nasby wrote:
>> On 3/23/13 7:41 AM, Ants Aasma wrote:
>>> Yes, having bgwriter do the actual cleaning up seems like a good idea.
>>> The whole bgwriter infrastructure will need some serious tuning. There
>>> are many things that could be shifted to background if we knew it
>>> could keep up, like hint bit setting on dirty buffers being flushed
>>> out. But again, we have the issue of having good tests to see where
>>> the changes hurt.
>>
>> I think at some point we need to stop depending on just bgwriter for all
>> this stuff. I believe it would be much cleaner if we had separate procs for
>> everything we needed (although some synergies might exist; if we wanted to
>> set hint bits during write then bgwriter *is* the logical place to put that).
>>
>> In this case, I don't think keeping stuff on the free list is close enough
>> to checkpoints that we'd want bgwriter to handle both. At most we might want
>> them to pass some metrics back and forth.
>
> bgwriter isn't doing checkpoints anymore; there's the checkpointer since 9.2.
>
> In my personal experience and measurement, bgwriter is pretty close to
> useless right now. I think - pretty similar to what Amit has done - it
> should perform part of a real clock sweep, instead of just looking ahead
> of the current position without changing usagecounts and the sweep
> position, and put enough buffers on the freelist to sustain the need till
> its next activity phase. I hacked around that one night in a hotel and
> got impressive speedups (and quite some breakage) for bigger than s_b
> workloads.
>
> That would reduce quite a few pain points:
> - fewer different processes/cpus looking at buffer headers ahead in the cycle
> - fewer cpus changing usagecounts
> - dirty pages are far more likely to be flushed out already when a new
>   page is needed
> - stuff like the relation extension lock, which right now frequently has
>   to search far and wide for new pages while holding the extension lock
>   exclusively, should finish quite a bit faster
>
> If the freelist lock is separated from the lock protecting the clock
> sweep, this should get quite a bit of a scalability boost without the
> potential unfairness you can have with partitioning the lock or such.

I agree.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] citext like searches using index
On 4/2/13 10:26 AM, Tom Lane wrote:
> The issue with the LIKE special case is that left-anchored patterns
> are (to some extent) indexable with ordinary btree indexes, and so we
> want to exploit that rather than tell people they have to have a whole
> other index.

In practice, you need an index specifically for pattern matching anyway. The difference would be that instead of using a different operator class that has no recorded association with the original operator, you'd use a different access method that is associated with the operator in normal ways.

> So it's a pretty special case, but there are just enough instances of it
> to wish for some not-so-hard-wired way to deal with it.

Are there any widely known non-built-in cases besides citext?

Presumably the reason you use citext is that you use a lot of letters. So I kind of doubt that there are going to be a lot of cases of pattern searches on citext with left-anchored patterns starting with (a sufficient number of) non-letters. It's possible, of course, but doesn't seem important enough by itself.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] citext like searches using index
Peter Eisentraut writes:
> On 3/30/13 11:35 PM, Tom Lane wrote:
>> The LIKE index optimization is hard-wired into
>> match_special_index_operator(), which never heard of citext's ~~
>> operators.
>>
>> I've wanted for years to replace that mechanism with something that
>> would support plug-in extensions, but have no very good idea how to
>> do it.

> I have been thinking there should be a GiST index that associates with
> the pattern matching operators.  I haven't worked out the details, but
> at least this seems like it might be the right framework to solve this
> problem.

No, not really. You can already build things like that with GiST; see pg_trgm (which already does LIKE and will soon be able to do regexes).

The issue with the LIKE special case is that left-anchored patterns are (to some extent) indexable with ordinary btree indexes, and so we want to exploit that rather than tell people they have to have a whole other index. It doesn't make sense IMO to try to mark ~~ as a btree operator, because btree doesn't really have any ability for operator-specific code to do what would have to be done. What does make sense is for the planner to understand how to extract a btree-indexable clause out of what's in the query, as that (a) keeps the complexity out of the run-time machinery, and (b) provides an opportunity for the planner to estimate whether the whole thing is worth the trouble or not, which it frequently isn't in LIKE cases.

So it's a pretty special case, but there are just enough instances of it to wish for some not-so-hard-wired way to deal with it.

			regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
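To make the "extract a btree-indexable clause" idea concrete, here is a heavily simplified C sketch of deriving range bounds from a left-anchored pattern. Everything here is illustrative: it handles ASCII bytes only and increments the last byte naively, whereas the real machinery (match_special_index_operator() and make_greater_string()) must also cope with collations, multibyte encodings, escape characters, and byte overflow.

#include <stdio.h>
#include <string.h>

/*
 * Given a left-anchored LIKE pattern, extract its fixed prefix and build
 * the btree-indexable range it implies: col >= prefix AND col < upper.
 * Returns 0 if the pattern has no fixed prefix (not indexable this way).
 */
static int
like_prefix_bounds(const char *pattern, char *lower, char *upper, size_t buflen)
{
	size_t		n = 0;

	/* copy bytes up to the first wildcard */
	while (pattern[n] && pattern[n] != '%' && pattern[n] != '_' &&
		   n < buflen - 1)
	{
		lower[n] = pattern[n];
		n++;
	}
	lower[n] = '\0';
	if (n == 0)
		return 0;

	/* upper bound: the prefix with its last byte incremented */
	memcpy(upper, lower, n + 1);
	upper[n - 1]++;
	return 1;
}

int
main(void)
{
	char		lo[64], hi[64];

	if (like_prefix_bounds("foo%", lo, hi, sizeof(lo)))
		printf("col >= '%s' AND col < '%s'\n", lo, hi);	/* foo .. fop */
	return 0;
}

The derived quals become index conditions, while the original LIKE is still applied as a filter, which is also why the planner gets to weigh whether the derived range is selective enough to be worth it.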
Re: [HACKERS] citext like searches using index
On 3/30/13 11:35 PM, Tom Lane wrote:
> The LIKE index optimization is hard-wired into
> match_special_index_operator(), which never heard of citext's ~~
> operators.
>
> I've wanted for years to replace that mechanism with something that
> would support plug-in extensions, but have no very good idea how to
> do it.

I have been thinking there should be a GiST index that associates with the pattern matching operators. I haven't worked out the details, but at least this seems like it might be the right framework to solve this problem.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page replacement algorithm in buffer cache
On 2013-04-01 17:56:19 -0500, Jim Nasby wrote:
> On 3/23/13 7:41 AM, Ants Aasma wrote:
>> Yes, having bgwriter do the actual cleaning up seems like a good idea.
>> The whole bgwriter infrastructure will need some serious tuning. There
>> are many things that could be shifted to background if we knew it
>> could keep up, like hint bit setting on dirty buffers being flushed
>> out. But again, we have the issue of having good tests to see where
>> the changes hurt.
>
> I think at some point we need to stop depending on just bgwriter for all this
> stuff. I believe it would be much cleaner if we had separate procs for
> everything we needed (although some synergies might exist; if we wanted to
> set hint bits during write then bgwriter *is* the logical place to put that).
>
> In this case, I don't think keeping stuff on the free list is close enough to
> checkpoints that we'd want bgwriter to handle both. At most we might want
> them to pass some metrics back and forth.

bgwriter isn't doing checkpoints anymore; there's the checkpointer since 9.2.

In my personal experience and measurement, bgwriter is pretty close to useless right now. I think - pretty similar to what Amit has done - it should perform part of a real clock sweep, instead of just looking ahead of the current position without changing usagecounts and the sweep position, and put enough buffers on the freelist to sustain the need till its next activity phase. I hacked around that one night in a hotel and got impressive speedups (and quite some breakage) for bigger than s_b workloads.

That would reduce quite a few pain points:
- fewer different processes/cpus looking at buffer headers ahead in the cycle
- fewer cpus changing usagecounts
- dirty pages are far more likely to be flushed out already when a new page is needed
- stuff like the relation extension lock, which right now frequently has to search far and wide for new pages while holding the extension lock exclusively, should finish quite a bit faster

If the freelist lock is separated from the lock protecting the clock sweep, this should get quite a bit of a scalability boost without the potential unfairness you can have with partitioning the lock or such.

Greetings,

Andres Freund

--
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
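As a rough illustration of that division of labor, here is a pseudo-C sketch. All names here (ClockSweepNext, FreelistPush, and friends) are assumptions made up for this sketch, not PostgreSQL APIs; the real code would live around freelist.c and the bgwriter loop.

#include <stdbool.h>

/* A stand-in for the parts of a buffer header the sweep cares about. */
typedef struct SketchBufferDesc
{
	int		buf_id;
	int		usage_count;
	int		refcount;			/* pin count */
	bool	dirty;
} SketchBufferDesc;

/* assumed primitives, hypothetical for this sketch */
extern SketchBufferDesc *ClockSweepNext(void);	/* advances the shared sweep hand */
extern void	FlushSketchBuffer(SketchBufferDesc *buf);
extern void	FreelistPush(SketchBufferDesc *buf);
extern int	FreelistSize(void);

/*
 * Run by the bgwriter during each activity phase: do the clock sweep
 * work (decay usage counts, flush dirty victims) so backends find
 * ready-to-reuse buffers on the freelist instead of sweeping themselves.
 */
static void
BgWriterStockFreelist(int target)
{
	while (FreelistSize() < target)
	{
		SketchBufferDesc *buf = ClockSweepNext();

		if (buf->refcount > 0)
			continue;			/* pinned; leave it alone */

		if (buf->usage_count > 0)
		{
			buf->usage_count--;	/* decay, exactly as a backend sweep would */
			continue;
		}

		if (buf->dirty)
			FlushSketchBuffer(buf);	/* write out ahead of need */

		FreelistPush(buf);		/* clean victim, ready for the next allocation */
	}
}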
Re: [HACKERS] regression test failed when enabling checksum
On 2013-04-01 19:51:19 -0700, Jeff Janes wrote:
> On Mon, Apr 1, 2013 at 10:37 AM, Jeff Janes wrote:
>> On Tue, Mar 26, 2013 at 4:23 PM, Jeff Davis wrote:
>>> Patch attached. Only brief testing done, so I might have missed
>>> something. I will look more closely later.
>>
>> After applying your patch, I could run the stress test described here:
>>
>> http://archives.postgresql.org/pgsql-hackers/2012-02/msg01227.php
>>
>> But altered to make use of initdb -k, of course.
>>
>> Over 10,000 cycles of crash and recovery, I encountered two cases of
>> checksum failures after recovery, example:
>> ...
>>
>> Unfortunately I already cleaned up the data directory before noticing the
>> problem, so I have nothing to post for forensic analysis. I'll try to
>> reproduce the problem.
>
> I've reproduced the problem, this time in block 74 of relation
> base/16384/4931589, and a tarball of the data directory is here:
>
> https://docs.google.com/file/d/0Bzqrh1SO9FcELS1majlFcTZsR0k/edit?usp=sharing
>
> (the table is in database jjanes under role jjanes, the binary is commit
> 9ad27c215362df436f8c)
>
> What I would probably really want is the data as it existed after the crash
> but before recovery started, but since the postmaster immediately starts
> recovery after the crash, I don't know of a good way to capture this.
>
> I guess one thing to do would be to extract from the WAL the most recent
> FPW for block 74 of relation base/16384/4931589 (assuming it hasn't been
> recycled already) and see if it matches what is actually in that block of
> that data file, but I don't currently know how to do that.

Since I bragged somewhere else recently that it should be easy to do now that we have pg_xlogdump, I hacked it up so it dumps all the full page writes into the directory specified by --dump-bkp=PATH. It currently overwrites previous full page writes to the same page, but that should be trivial to change if you want, by adding %X.%X for the LSN into the path sprintf.
Greetings,

Andres Freund

--
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

>From 09cdce611dc74082901ca1a646135a5ea1af709c Mon Sep 17 00:00:00 2001
From: Andres Freund
Date: Tue, 2 Apr 2013 11:43:23 +0200
Subject: [PATCH] pg_xlogdump: add option for dumping full page writes into a
 directory

---
 contrib/pg_xlogdump/pg_xlogdump.c | 56 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 55 insertions(+), 1 deletion(-)

diff --git a/contrib/pg_xlogdump/pg_xlogdump.c b/contrib/pg_xlogdump/pg_xlogdump.c
index d6d5498..a5e4186 100644
--- a/contrib/pg_xlogdump/pg_xlogdump.c
+++ b/contrib/pg_xlogdump/pg_xlogdump.c
@@ -40,6 +40,7 @@ typedef struct XLogDumpConfig
 	bool		bkp_details;
 	int			stop_after_records;
 	int			already_displayed_records;
+	char	   *dump_bkp;
 
 	/* filter options */
 	int			filter_by_rmgr;
@@ -373,6 +374,46 @@ XLogDumpDisplayRecord(XLogDumpConfig *config, XLogRecPtr ReadRecPtr, XLogRecord
 				   bkpb.block, bkpb.hole_offset, bkpb.hole_length);
 		}
 	}
+
+	if (config->dump_bkp != NULL)
+	{
+		int			bkpnum;
+		char	   *blk = (char *) XLogRecGetData(record) + record->xl_len;
+
+		for (bkpnum = 0; bkpnum < XLR_MAX_BKP_BLOCKS; bkpnum++)
+		{
+			BkpBlock	bkpb;
+			char		bkpbdata[BLCKSZ];
+			int			outfd;
+			char		outpath[MAXPGPATH];
+
+			if (!(XLR_BKP_BLOCK(bkpnum) & record->xl_info))
+				continue;
+
+			memcpy(&bkpb, blk, sizeof(BkpBlock));
+			blk += sizeof(BkpBlock);
+
+			memcpy(bkpbdata, blk, bkpb.hole_offset);
+			memset(bkpbdata + bkpb.hole_offset, 0, bkpb.hole_length);
+			memcpy(bkpbdata + bkpb.hole_offset + bkpb.hole_length,
+				   blk + bkpb.hole_offset,
+				   BLCKSZ - bkpb.hole_offset - bkpb.hole_length);
+
+			blk += BLCKSZ - bkpb.hole_length;
+
+			sprintf(outpath, "%s/%u-%u-%u:%u_%s", config->dump_bkp,
+					bkpb.node.spcNode, bkpb.node.dbNode, bkpb.node.relNode,
+					bkpb.block, forkNames[bkpb.fork]);
+			outfd = open(outpath, O_WRONLY|O_TRUNC|O_CREAT, S_IRUSR|S_IWUSR);
+			if (outfd < 0)
+				fatal_error("could not open output file %s: %s",
+							outpath, strerror(errno));
+			if (write(outfd, bkpbdata, BLCKSZ) != BLCKSZ)
+				fatal_error("could not successfully write output block %s: %s",
+							outpath, strerror(errno));
+			close(outfd);
+		}
+	}
 }
 
 static void
@@ -387,6 +428,7 @@ usage(void)
 	printf("  -?, --help             show this help, then exit\n");
 	printf("\nContent options:\n");
 	printf("  -b, --bkp-details      output detailed information about backup blocks\n");
+	printf("  -d, --dump-bkp=path    dump all full page images into PATH\n");
 	printf("  -e, --end=RECPTR       stop reading at log position RECPTR\n");
 	printf("  -n, --limit=N          number of records to display\n");
 	printf("  -p, --path=PATH        directory in which to find log segment files\n");
@@ -413,6 +455,7 @@ main(int argc, char **argv)
 	static struct option l
Re: [HACKERS] regression test failed when enabling checksum
On 2 April 2013 02:53, Jeff Davis wrote:
>> Any idea
>> what is going on?
>
> Not right now.

Since I'm now responsible for the quality of this patch, I need to say this before someone else does: we have until the end of the week to fix this conclusively, or I will need to consider whether to revert the patch in this release.

Thanks for your time and trouble to locate problems. It may be that the patch is revealing an underlying bug, so we don't yet have evidence for the source of the bug.

> My primary suspect is what's going on in
> visibilitymap_set() and heap_xlog_visible(), which is more complex than
> some of the other code. That would require some VACUUM activity, which
> isn't in your workload -- do you think autovacuum may kick in sometimes?

Other candidates might be:

* the page hole is non-empty, so the overwrite by the full page write may leave the block in an intermediate state. Tom recently fixed RestoreBkpBlock() to avoid it zeroing the contents first before applying the page.

* something to do with access to GetPageLSN()

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [PATCH] Exorcise "zero-dimensional" arrays (Was: Re: [HACKERS] Should array_length() Return NULL)
On 29/03/13 13:12, Brendan Jurd wrote:
> On 28 March 2013 20:34, Dean Rasheed wrote:
>> Is the patch also going to allow empty arrays in higher dimensions
>> where not just the last dimension is empty? It doesn't allow that at
>> present. It seems as though, if it's allowing 1-by-0 arrays like '{{}}'
>> and '[4:4][8:7]={{}}', it should also allow 0-by-0 arrays like
>> '[4:3][8:7]={}', and 0-by-3 arrays like '[4:3][11:13]={}'.
>
> I think the array literal parser would reject this on the grounds that
> the brace structure doesn't match the dimension specifiers. You could
> modify that check to respect zero length in dimensions other than the
> innermost one, but it's hard to say whether it would be worth the
> effort.
>
> It might be instructive to hear from somebody who does (or intends to)
> use multidim arrays for some practical purpose, but I don't even know
> whether such a person exists within earshot of this list.
>
> Cheers,
> BJ

Multiple dimension arrays 'might' be something I may need, or at least usefully use, in my current project. I am not ready to do the database design to that level of detail yet.

However, I know I will need to do 'clever'(TM) things with multiple image areas on images, and possibly group several images together (and still access their individual image areas). So potentially arrays of dimension 2 or greater may be useful. Certainly, some images will not have any image areas defined, and some arrays might be empty.

Cheers,
Gavin