Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review]
2013-01-05 05:58 keltezéssel, Amit kapila írta: On Friday, January 04, 2013 10:57 PM Boszormenyi Zoltan wrote: Hi, I am reviewing your patch. Thank you very much. Since you are using a constant string, it would be a little faster to use sizeof(string)-1 as it can be computed at compile-time and not run the strlen() all the time this code is reached. I have reffered the code just above for PG_TEMP_FILE_PREFIX in same function sendDir(). I have that not only that place, but there other places where strlen is used for PG_TEMP_FILE_PREFIX. I think in this path, such optimization might not help much. However if you think we should take care of this, then I can find other places where similar change can be done to make it consistent? Yes, but it needs a separate cleanup patch. Also, since the SET PERSISTENT patch seems to create a single lock file, sizeof(string) (which includes the 0 byte at the end of the string, so it matches the whole filename) can be used instead of strlen(string) or sizeof(string)-1 that match the filename as a prefix only. In create_conf_lock_file(): Can't we add a new LWLock and use it as a critical section instead of waiting for 10 seconds? It would be quite surprising to wait 10 seconds when accidentally executing two SET PERSISTENT statements in parallel. Yes, you are right adding a new LWLock will avoid the use of sleep. However I am just not sure if for only this purpose we should add a new LWLock? Similar logic is used in CreateLockFile() for postmaster file but it doesn't use sleep. How about reducing the time of sleep or removing sleep, in that user might get error and he need to retry to get his command successful? With Regards, Amit Kapila. -- -- Zoltán Böszörményi Cybertec Schönig Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: Proposal for Allow postgresql.conf values to be changed via SQL [review]
On Sat, Jan 05, 2013 at 08:05:11AM +0100, Boszormenyi Zoltan wrote: 2013-01-05 05:58 keltez?ssel, Amit kapila ?rta: On Friday, January 04, 2013 10:57 PM Boszormenyi Zoltan wrote: In create_conf_lock_file(): Can't we add a new LWLock and use it as a critical section instead of waiting for 10 seconds? It would be quite surprising to wait 10 seconds when accidentally executing two SET PERSISTENT statements in parallel. Yes, you are right adding a new LWLock will avoid the use of sleep. However I am just not sure if for only this purpose we should add a new LWLock? Similar logic is used in CreateLockFile() for postmaster file but it doesn't use sleep. How about reducing the time of sleep or removing sleep, in that user might get error and he need to retry to get his command successful? I think removing the loop entirely is better. Agreed. A new LWLock with a narrow purpose is fine. CreateLockFile() happens early, before shared memory is available. Its approach is not a relevant precedent for code that runs later. However, the behaviour should be discussed by a wider audience: - the backends should wait for each other just like other statements that fight for the same object (in which case locking is needed), or - throw an error immediately on conflicting parallel work All other things being equal, the first choice sounds better. (I don't think the decision is essential to the utility of this feature.) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_retainxlog for inclusion in 9.3?
On Fri, Jan 4, 2013 at 7:13 PM, Peter Eisentraut pete...@gmx.net wrote: On 1/3/13 12:30 PM, Robert Haas wrote: On Thu, Jan 3, 2013 at 11:32 AM, Magnus Hagander mag...@hagander.net wrote: Any particular reason? It goes pretty tightly together with pg_receivexlog, which is why I'd prefer putting it alongside that one. But if you have a good argument against it, I can change my mind :) Mostly that it seems like a hack, and I suspect we may come up with a better way to do this in the future. It does seem like a hack. Couldn't this be implemented with a backend switch instead? It definitely is a bit of a hack. I assume by backend switch you mean guc, right? If so, no, not easily so. Because it's the archiver process that does the deleting. And this process does not have access to a full backend interface, e.g. the ability to run a query. We could make it look at the same data that's currently shown in pg_stat_replicatoin through shared memory, but thta would *only* work in the very most simple cases (e.g. a single pg_receivexlog and no other replication). The ability to run a custom SQL query is going to be necessary for anything a bit more advanced. Also, as a small practical matter, since this is a server-side program (since it's being used as archive_command), we shouldn't put it into the pg_basebackup directory, because that would blur the lines about what to install where, in particular for the translations. Good argument. That along with the being a hack, and the comment from Robert, means that maybe contrib/ is a better place for it, yes. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bad examples in pg_dump README
On Fri, Jan 4, 2013 at 3:36 AM, Josh Kupershmidt schmi...@gmail.com wrote: The ./src/bin/pg_dump README contains several inoperable examples. First, this suggestion: | or to list tables: | | pg_restore backup-file --table | less seems bogus, i.e. the --table option requires an argument specifing which table to restore, and does not list tables. Next, this suggested command: | pg_restore backup-file -l --oid --rearrange | less hasn't worked since 7.4 or thereabouts, since we don't have --rearrange or --oid anymore. (I'm not sure we ever had --oid, maybe it was supposed to be --oid-order?). Then, three examples claim we support a --use flag, e.g. | pg_restore backup.bck --use=toc.lis script.sql where presumably --use-list was meant. Further little gripes with this README include mixed use of tabs and spaces for the examples, and lines running over the 80-column limit which at least some of our other READMEs seem to honor. I started out attempting to fix up the README, keeping the original example commands intact, but upon further reflection I think the examples of dumping, restoring, and selective-restoring in that REAMDE don't belong there anyway. There are already better examples of such usage in the pg_dump/pg_restore documentation pages, and IMO that's where such generally-useful usage information belongs. I propose slimming down the pg_dump README, keeping intact the introductory notes and details of the tar format. Do we need to keep it at all, really? Certainly the introductory part is covered in the main documentation already... I believe the tar notes are also out of date. For example, they claim that you can't expect pg_restore to work with an uncompressed tar format - yet the header in pg_backup_tar.c says that an uncompressed tar format is compatible with a directory format dump, and thus you *can* use pg_restore. (And fwiw,the note about the user should probably go in src/port/ now that we moved the tar header creation there a few days ago) I would suggest we just drop the README file completely. I don't think it adds any value at all. Any objections to that path? :) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Make pg_basebackup configure and start standby [Review]
On Thu, Jan 3, 2013 at 1:33 PM, Boszormenyi Zoltan z...@cybertec.at wrote: 2013-01-02 11:54 keltezéssel, Boszormenyi Zoltan írta: 2013-01-02 10:37 keltezéssel, Boszormenyi Zoltan írta: 2013-01-02 10:12 keltezéssel, Magnus Hagander írta: Attached is the quotes-v2 patch, the function is renamed and the comment is modified plus the pg_basebackup v21 patch that uses this function. Rebased after pgtar.h was added. The quotes.c comment was modified even more so it doesn't say this function is used for SQL string literals. Quotes patch applied with a few small changes: 1) Support for the msvc build (needs an entry added for new files that go in src/port if they should be used on Windows) 2) Makefile entries should be alphabetical (yes, that's really trivial nitpicking :D) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposal: Store timestamptz of database creation on pg_database
On Fri, Jan 4, 2013 at 4:07 PM, Peter Eisentraut pete...@gmx.net wrote: On 1/3/13 3:26 PM, Robert Haas wrote: It's true, as we've often said here, that leveraging the OS facilities means that we get the benefit of improving OS facilities for free - but it also means that we never exceed what the OS facilities are able to provide. And that should be the deciding factor, shouldn't it? Clearly, the OS timestamps do not satisfy the requirements of tracking database object creation times. +1 And IMHO we must decide what we do or if we'll don't anything. In this thread was discussed many ways to how to implement and how not implement, so I compile some important points discussed before (sorry if I forgot something): * the original proposal was just to add a column in shared catalog 'pg_database' to track creation time (I already sent a patch [1]), but the discussion going to implement a way to track creation time off all database objects * some people said if we implement that then we must have some way to alter creation times by SQL (ALTER cmds) and also have a way to dump and restore this info by pg_dump/pg_restore. Some agreed and others disagree. * we talk about implement it with EventTriggers, but they not cover shared objects (like databases, roles, tablespaces,...), and someone talked to extend EventTriggers to cover this kind of objects or maybe we have a way to create *shared tables* (this is what I understood). This way force to every people implement your own track time system or maybe someone share a extension to do that. * also we discuss about create two new catalogs, one local and another shared (like pg_description and pg_shdescription) to track creation times of all database objects. Please fix if I forgot something. Anyway, we must decide what to do. I don't know enough, but I have another idea. What you guys think about we have tables like stats tables to track creation times, with a GUC to enable or disable this behavior. Regards, [1] http://archives.postgresql.org/pgsql-hackers/2013-01/msg00111.php -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] [PATCH] Make pg_basebackup configure and start standby [Review]
On Sat, Jan 5, 2013 at 3:41 PM, Magnus Hagander mag...@hagander.net wrote: On Thu, Jan 3, 2013 at 1:33 PM, Boszormenyi Zoltan z...@cybertec.at wrote: 2013-01-02 11:54 keltezéssel, Boszormenyi Zoltan írta: 2013-01-02 10:37 keltezéssel, Boszormenyi Zoltan írta: 2013-01-02 10:12 keltezéssel, Magnus Hagander írta: Attached is the quotes-v2 patch, the function is renamed and the comment is modified plus the pg_basebackup v21 patch that uses this function. Rebased after pgtar.h was added. The quotes.c comment was modified even more so it doesn't say this function is used for SQL string literals. Quotes patch applied with a few small changes: 1) Support for the msvc build (needs an entry added for new files that go in src/port if they should be used on Windows) 2) Makefile entries should be alphabetical (yes, that's really trivial nitpicking :D) After some further modifications, I've applied the pg_basebackup patch as well. Thanks! And again, apologies for dragging the process out longer than should've been necessary. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposal: Store timestamptz of database creation on pg_database
* Fabrízio de Royes Mello (fabriziome...@gmail.com) wrote: * also we discuss about create two new catalogs, one local and another shared (like pg_description and pg_shdescription) to track creation times of all database objects. Creating a separate catalog (or two) every time we want to track XYZ for all objects is rather overkill... Thinking about this a bit more, and noting that pg_description/shdescription more-or-less already exist as a framework for tracking 'something' for 'all catalog entries'- why don't we just add these columns to those tables..? This would also address Peter's concern about making sure we do this 'wholesale' and in one release rather than spread across multiple releases- just make sure it covers the same set of things which 'comment' does. Also, I don't think we really need a GUC for this. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] [PATCH] Make pg_basebackup configure and start standby [Review]
2013-01-05 16:58 keltezéssel, Magnus Hagander írta: On Sat, Jan 5, 2013 at 3:41 PM, Magnus Hagander mag...@hagander.net wrote: On Thu, Jan 3, 2013 at 1:33 PM, Boszormenyi Zoltan z...@cybertec.at wrote: 2013-01-02 11:54 keltezéssel, Boszormenyi Zoltan írta: 2013-01-02 10:37 keltezéssel, Boszormenyi Zoltan írta: 2013-01-02 10:12 keltezéssel, Magnus Hagander írta: Attached is the quotes-v2 patch, the function is renamed and the comment is modified plus the pg_basebackup v21 patch that uses this function. Rebased after pgtar.h was added. The quotes.c comment was modified even more so it doesn't say this function is used for SQL string literals. Quotes patch applied with a few small changes: 1) Support for the msvc build (needs an entry added for new files that go in src/port if they should be used on Windows) 2) Makefile entries should be alphabetical (yes, that's really trivial nitpicking :D) After some further modifications, I've applied the pg_basebackup patch as well. Thank you very much! Thanks! And again, apologies for dragging the process out longer than should've been necessary. I blamed it on me not having done reviews earlier in the commitfest, which I finally did last week. Now, if only Tom could find some time to review the lock_timeout patch... ;-) Thanks again and best regards, Zoltán Böszörményi -- -- Zoltán Böszörményi Cybertec Schönig Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] enhanced error fields
Hello 2013/1/4 Robert Haas robertmh...@gmail.com: On Fri, Dec 28, 2012 at 1:21 PM, Peter Geoghegan pe...@2ndquadrant.com wrote: Now, as to the question of whether we need to make sure that everything is always fully qualified - I respectfully disagree with Stephen, and maintain my position set out in the last round of feedback [1], which is that we don't need to fully qualify everything, because *for the purposes of error handling*, which is what I think we should care about, these fields are sufficient: + column_name text, + table_name text, + constraint_name text, + schema_name text, If you use the same constraint name everywhere, you might get the same error message. The situations in which this scheme might fall down are too implausible for me to want to bloat up all those ereport sites any further (something that Stephen also complained about). I think that the major outstanding issues are concerning whether or not I have the API here right. I make explicit guarantees as to the availability of certain fields for certain errcodes (a small number of Class 23 - Integrity Constraint Violation codes). No one has really said anything about that, though I think it's important. I also continue to think that we shouldn't have routine_name, routine_schema and trigger_name fields - I think it's wrong-headed to have an exception handler magically act on knowledge about where the exception came from that has been baked into the exception - is there any sort of precedent for this? Pavel disagrees here. Again, I defer to others. I don't really agree with this. To be honest, my biggest concern about this patch is that it will make it take longer to report an error. At least in the cases where these additional fields are included, it will take CPU time to populate those fields, and maybe there will be some overhead even in the cases where they aren't even used (although I'd expect that to be too little to measure). Now, maybe that doesn't matter in the case where the error is being reported back to the client, because the overhead of shipping packets across even a local socket likely dwarfs the overhead, but I think it matters a lot where you are running a big exception-catching loop. That is already painfully slow, and -1 from me on anything that makes it significantly slower. I have a feeling this isn't the first time I'm ranting about this topic in relation to this patch, so apologies if this is old news. We did these tests independently - Peter and me with same result - in use cases developed for highlighting impact of this patch you can see some slowdown - if I remember well - less than 3% - and it raised exceptions really intensively - and It can be visible only from stored procedures environment - if somebody use it from client side, then impact is zero. But if we decide that there is no performance issue or that we don't care about the hit there, then I think Stephen and Pavel are right to want a large amount of very precise detail. What's the use case for this feature? Presumably, it's for people for whom parsing the error message is not a practical option, so either they textual error message doesn't identify the target object sufficiently precisely, and they want to make sure they know what it applies to; or else it's for people who want any error that applies to a table to be handled the same way (without worrying about exactly which error they have got). Imagine, for example, someone writing a framework that will be used as a basis for many different applications. It might want to do something, like, I don't know, update the comment on a table every time an error involving that table occurs. Clearly, precise identification of the table is a must. In a particular application development environment, it's reasonable to rely on users to name things sensibly, but if you're shipping a tool that might be dropped into any arbitrary environment, that's significantly more dangerous. Similarly, for a function-related error, you'd need something like that looks like the output of pg_proc.oid::regprocedure, or individual fields with the various components of that output. That sounds like routine_name et. al. Probably we don't need all fields mentioned in ANSI SQL, because some from these fields has no sense in pg, but routine name and trigger name is really basic for some little bit more sophisticate work with exception on PL level. Regards Pavel I'm not happy about the idea of shipping OIDs back to the client. OIDs are too user-visible as it is; we should try to make them less not moreso. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] enhanced error fields
2013/1/4 Peter Geoghegan pe...@2ndquadrant.com: On 4 January 2013 18:07, Tom Lane t...@sss.pgh.pa.us wrote: Exactly. To my mind, the *entire* point of this patch is to remove the need for people to try to dig information out of potentially-localized message strings. It's not clear to me that we have to strain to provide information that isn't in the currently-reported messages --- we are only trying to make it easier for client-side code to extract the information it's likely to need. It seems that we're in agreement, then. I'll prepare a version of the patch very similar to the one I previously posted, but with some caveats about how reliably the values can be used. I think that that should be fine. is there agreement of routine_name and trigger_name fields? Regards Pavel -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reporting hba lines
On Fri, Jun 29, 2012 at 4:39 PM, Tom Lane t...@sss.pgh.pa.us wrote: Magnus Hagander mag...@hagander.net writes: Turned out to be a bit more work than I thought, since the current parser reads pg_hba byte by byte, and not line by line. So I had to change that. See attached, seems reasonable? A couple of comments: * In some places you have if ((c = *(*lineptr)++) != '\0') and in other places just if ((c = *(*lineptr)++)). This should be consistent (and personally I prefer the first way). * I'm not sure that this conversion is right: ! if (c != EOF) ! ungetc(c, fp); --- ! if (c != '\0') ! (*lineptr)--; In the file case, it's impossible to push back EOF, and unnecessary since another getc will produce EOF again anyway. In the string case, though, I think you might need to decrement the lineptr unconditionally, else next call will run off the end of the string no? * This bit seems a bit ugly, and not Windows-aware either: ! /* We don't store the trailing newline */ ! if (rawline[strlen(rawline)-1] == '\n') ! rawline[strlen(rawline)-1] = '\0'; ! It might be better to strip trailing \n and \r from the line immediately upon read, rather than here. I re-found this branch that I had completely forgotten about, when cleaning up my reposiroty. oops. Attached is an updated version of the patch, per the comments from Tom and rebased on top of the current master. Since it's been a long time ago, and some code churn in the area, another round of review is probably a good thing... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ hba_line3.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] git author vs committer
On Thu, Sep 13, 2012 at 9:00 AM, Magnus Hagander mag...@hagander.net wrote: On Thu, Sep 13, 2012 at 5:22 AM, Peter Eisentraut pete...@gmx.net wrote: On Wed, 2012-09-12 at 19:13 +0200, Magnus Hagander wrote: Just to be clear, what you're saying is we want to change the policy that says committer must be on list of approved committers commiter==author to committer must be on list of approved committers author must be on list of approved committers? Yes, that would be my request. Definitely sounds like a reasonable thing to do. So unless there are objections, I'll try to get around to getting that done not too long from now. Sorry, this took much longer than I had initially planned for, but at least it's been done now. Hopefully I didn't break anything else in the process :) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] enhanced error fields
On 5 January 2013 16:56, Pavel Stehule pavel.steh...@gmail.com wrote: It seems that we're in agreement, then. I'll prepare a version of the patch very similar to the one I previously posted, but with some caveats about how reliably the values can be used. I think that that should be fine. is there agreement of routine_name and trigger_name fields? Well, Tom and I are both opposed to including those fields. Peter E seemed to support it in some way, but didn't respond to Tom's criticisms (which were just a restatement of my own). So, it seems to me that we're not going to do that, assuming nothing changes. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposal: Store timestamptz of database creation on pg_database
* Stephen Frost sfr...@snowman.net wrote: * Fabrízio de Royes Mello (fabriziome...@gmail.com) wrote: * also we discuss about create two new catalogs, one local and another shared (like pg_description and pg_shdescription) to track creation times of all database objects. Creating a separate catalog (or two) every time we want to track XYZ for all objects is rather overkill... Thinking about this a bit more, and noting that pg_description/shdescription more-or-less already exist as a framework for tracking 'something' for 'all catalog entries'- why don't we just add these columns to those tables..? But those tables are filled only when we execute COMMENT ON statement... then your idea is create a 'null' comment every time we create a single object... is it? Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] enhanced error fields
2013/1/5 Peter Geoghegan pe...@2ndquadrant.com: On 5 January 2013 16:56, Pavel Stehule pavel.steh...@gmail.com wrote: It seems that we're in agreement, then. I'll prepare a version of the patch very similar to the one I previously posted, but with some caveats about how reliably the values can be used. I think that that should be fine. is there agreement of routine_name and trigger_name fields? Well, Tom and I are both opposed to including those fields. Peter E seemed to support it in some way, but didn't respond to Tom's criticisms (which were just a restatement of my own). So, it seems to me that we're not going to do that, assuming nothing changes. if I understand well Robert Haas is for including these fields - so score is still 2:2 - but this is not a match :) I have no more new arguments for these fields - yes, there are no change Pavel -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bad examples in pg_dump README
On Sat, Jan 5, 2013 at 7:34 AM, Magnus Hagander mag...@hagander.net wrote: On Fri, Jan 4, 2013 at 3:36 AM, Josh Kupershmidt schmi...@gmail.com wrote: Do we need to keep it at all, really? Certainly the introductory part is covered in the main documentation already... Pretty much. (I did find note #2 mildly interesting.) I believe the tar notes are also out of date. For example, they claim that you can't expect pg_restore to work with an uncompressed tar format - yet the header in pg_backup_tar.c says that an uncompressed tar format is compatible with a directory format dump, and thus you *can* use pg_restore. (And fwiw,the note about the user should probably go in src/port/ now that we moved the tar header creation there a few days ago) Hrm yeah, so the second paragraph under the tar section can certainly be axed. I would suggest we just drop the README file completely. I don't think it adds any value at all. Any objections to that path? :) I think that's OK, since there's not much left in that README after removing the bogus examples, introductory text that's covered elsewhere, and obsolete second paragraph about the tar format. Perhaps we could keep the other paragraphs about the tar format, either in the header comments for pg_backup_tar.c or in src/port/, though? Oh, and for this comment in pg_backup_tar.c: | * See the headers to pg_backup_files pg_restore for more details. there is no longer a pg_backup_files.c. Josh -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_retainxlog for inclusion in 9.3?
On Tue, Jan 1, 2013 at 10:10 AM, Magnus Hagander mag...@hagander.net wrote: So, it turns out the reason I got no feedback on this tool, was that I forgot both to email about and to actually push the code to github :O So this is actually code that's almost half a year old and that I was supposed to submit for the first or second commitfest. Oops. So, the tool and a README for it right now can be found at https://github.com/mhagander/pg_retainxlog for the time being. The idea behind the tool is to plug a hole in the case when pg_receivexlog is used for log archiving, which is that you still need a blocking archive_command in order to make sure that files aren't recycled on the master. So for 9.2 you can do this with an archive_command that checks if the file has appeared properly on the slave - but that usually means you're back at requiring ssh connectivity between the machines, for example. Even though this information is actually avialable on the master... This can also be of use to pure replication scenarios, where you don't know how to tune wal_keep_segments, but using actual live feedback instead of guessing. When pg_retainxlog is used as an archive_command, it will check the pg_stat_replication view instead of checking the slave. It will then only return ok once the requested logfile has been replicated to the slave. By default it will look for a replication client named pg_receivexlog, but it supports overriding the query with anything - so you can say things like needs to have arrived on at least two replication slaves before we consider it archived. Or if used instead of wal_keep_segmnets, needs to have arrived at *all* replication slaves. Is this a tool that people would like to see included in the general toolchain? If so, I'll reformat it to work in the general build environment and submit it for the last commitfest. (comments on the code itself are of course also welcome) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers +1 to this concept, however it may be implemented. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Invent a one-shot variant of CachedPlans for better performanc
Simon Riggs si...@2ndquadrant.com writes: On 4 January 2013 22:42, Tom Lane t...@sss.pgh.pa.us wrote: Invent a one-shot variant of CachedPlans for better performance. Just a moment of reflection here but I did already invent a one-shot plan concept in a patch submission to 9.2, called exactly that, which enables a variety of optimizations, this being one. If you're speaking of http://archives.postgresql.org/pgsql-hackers/2011-06/msg01168.php that patch was vastly more invasive than this one, with vastly less clear performance characteristics (none of which were documented by test cases). And it had the potential for unexpected user-visible semantics changes; for instance, as submitted it made EXPLAIN produce results different from what might actually happen at execution. I don't think there's any comparison to this patch at all except in the nomenclature. We need a full one-shot concept linking the planner and executor for all sorts of reasons, not just this one. We can discuss the practicality of individual optimizations but the linkage should be clear throughout the whole infrastructure. I thought then, and I think now, that such a concept was too squishy to be useful as an actual guide to what to change. The particular arguments you advanced then have been overtaken by events anyway; notably that Marti Raudsepp's work on caching stable subexpressions at execution seems like a much more general answer to the problem of handling stable functions efficiently. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposal: Store timestamptz of database creation on pg_database
* Fabrízio de Royes Mello (fabriziome...@gmail.com) wrote: But those tables are filled only when we execute COMMENT ON statement... then your idea is create a 'null' comment every time we create a single object... is it? Yes, and have the actual 'description' field (as it's variable) at the end of the catalog. Regarding the semantics of it- I was thinking about how directories and unix files work. Basically, adding or removing a sub-object would update the alter time on the object itself, changing an already existing object or sub-object would update only the object/sub-object's alter time. Creating an object or sub/object would set its create time and alter time to the same value. I would distinguish 'create' from 'ctime', however, and have our 'create' time be only the actual *creation* time of the object. ALTER table OWNER TO user; would update tables alter time. Open to other thoughts on this and perhaps we should create a wiki page to start documentating the semantics. Once we get agreement there, it's just a bit of code. :) Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
Robert, I'm sure it's possible; I don't *think* it's terribly easy. The usual algorithm for cycle detection is to have each node send to the next node the path that the data has taken. But, there's no unique identifier for each slave that I know of - you could use IP address, but that's not really unique. And, if the WAL passes through an archive, how do you deal with that? Not that I know how to do this, but it seems like a more direct approach is to check whether there's a master anywhere up the line. H. Still sounds fairly difficult. I'm sure somebody could figure all of this stuff out, but it seems fairly complicated for the benefit we'd get. I just don't think this is going to be a terribly common problem; if it turns out I'm wrong, I may revise my opinion. :-) I don't think it'll be that common either. The problem is that when it does happen, it'll be very hard for the hapless sysadmin involved to troubleshoot. To me, it seems that lag monitoring between master and standby is something that anyone running a complex replication configuration should be doing - and yeah, I think anything involving four standbys (or cascading) qualifies as complex. If you're doing that, you should notice pretty quickly that your replication lag is increasing steadily. There are many reasons why replication lag would increase steadily. You might also check pg_stat_replication the master and notice that there are no connections there any more. Well, if you've created a true cycle, every server has one or more replicas. The original case I presented was the most probably cause of accidental cycles: the original master dies, and the on-call sysadmin accidentally connects the first replica to the last replica while trying to recover the cluster. AFAICT, the only way to troubleshoot a cycle is to test every server in the network to see if it's a master and has replicas, and if no server is a master with replicas, it's a cycle. Again, not fast or intuitive. Could someone miss those tell-tale signs? Sure. But they could also set autovacuum_naptime to an hour and then file a support ticket complaining that about table bloat - and they do. Personally, as user screw-ups go, I'd consider that scenario (and its fourteen cousins, twenty-seven second cousins, and three hundred and ninety two other extended family members) as higher-priority and lower effort to fix than this particular thing. I agree that this isn't a particularly high-priority issue. I do think it should go on the TODO list, though, just in case we get a GSOC student or other new contributor who wants to tackle it. --Josh -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
On 21 December 2012 14:08, Robert Haas robertmh...@gmail.com wrote: I'm sure it's possible; I don't *think* it's terribly easy. I'm inclined to agree that this isn't a terribly pressing issue. Certainly, the need to introduce a bunch of new infrastructure to detect this case seems hard to justify. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PERFORM] Slow query: bitmap scan troubles
Jeff Janes jeff.ja...@gmail.com writes: [moved to hackers] On Wednesday, December 5, 2012, Tom Lane wrote: Hm. To tell you the truth, in October I'd completely forgotten about the January patch, and was thinking that the 1/1 cost had a lot of history behind it. But if we never shipped it before 9.2 then of course that idea is false. Perhaps we should backpatch the log curve into 9.2 --- that would reduce the amount of differential between what 9.2 does and what previous branches do for large indexes. I think we should backpatch it for 9.2.3. I've seen another email which is probably due to the same issue (nested loop vs hash join). And some monitoring of a database I am responsible for suggests it might be heading in that direction as well as the size grows. I received an off-list report of a case where not only did the 1/1 factor cause a nestloop-vs-hashjoin decision to be made wrongly, but even adding the ln() computation as in commit bf01e34b556 didn't fix it. I believe the index in question was on the order of 2 pages, so it's not too hard to see why this might be the case: * historical fudge factor 4 * 2/10 = 0.8 * 9.2 fudge factor 4 * 2/1 = 8.0 * with ln() correction 4 * ln(1 + 2/1) = 4.39 or so At this point I'm about ready to not only revert the 10-to-1 change, but keep the ln() adjustment, ie make the calculation be random_page_cost * ln(1 + index_pages/10). This would give essentially the pre-9.2 behavior for indexes up to some tens of thousands of pages, and keep the fudge factor from getting out of control even for very very large indexes. But I am wondering if it should be present at all in 9.3. When it was introduced, the argument seemed to be that smaller indexes might be easier to keep in cache. No. The argument is that if we don't have some such correction, the planner is liable to believe that different-sized indexes have *exactly the same cost*, if a given query would fetch the same number of index entries. This is quite easy to demonstrate when experimenting with partial indexes, in particular - without the fudge factor the planner sees no advantage of a partial index over a full index from which the query would fetch the same number of entries. We do want the planner to pick the partial index if it's usable, and a fudge factor is about the least unprincipled way to make it do so. The argument for increasing the penalty by a factor of 10 was that the smaller one could be swamped by noise such as page-boundary-roundoff behavior. Yeah, I wrote that, but in hindsight it seems like a mistaken idea. The noise problem is that because we round off page count and row count estimates to integers at various places, it's fairly easy for small changes in statistics to move a plan's estimated cost by significantly more than this fudge factor will. However, the case that the fudge factor is meant to fix is indexes that are otherwise identical for the query's purposes --- and any roundoff effects will be the same. (The fudge factor itself is *not* rounded off anywhere, it flows directly to the bottom-line cost for the indexscan.) One thing which depends on the index size which, as far as I can tell, is not currently being counted is the cost of comparing the tuples all the way down the index. This would be proportional to log2(indextuples) * cpu_index_tuple_cost, or maybe log2(indextuples) * (cpu_index_tuple_cost+cpu_operator_cost), or something like that. Yeah, I know. I've experimented repeatedly over the years with trying to account explicitly for index descent costs. But every time, anything that looks even remotely principled turns out to produce an overly large correction that results in bad plan choices. I don't know exactly why this is, but it's true. One other point is that I think it is better for any such correction to depend on the index's total page count, not total tuple count, because otherwise two indexes that are identical except for bloat effects will appear to have identical costs. So from that standpoint, the ln() form of the fudge factor seems quite reasonable as a crude form of index descent cost estimate. The fact that we're needing to dial it down so much reinforces my feeling that descent costs are close to negligible in practice. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] dynamic SQL - possible performance regression in 9.2
On Wednesday, January 2, 2013, Tom Lane wrote: Jeff Janes jeff.ja...@gmail.com writes: Using a RULE-based partitioning instead with row by row insertion, the plancache changes slowed it down by 300%, and this patch doesn't change that. But that seems to be down to the insertion getting planned repeatedly, because it decides the custom plan is cheaper than the generic plan. Whatever savings the custom plan may have are clearly less than the cost of doing the planning repeatedly. That scenario doesn't sound like it has anything to do with the one being discussed in this thread. But what do you mean by rule-based partitioning exactly? A rule per se wouldn't result in a cached plan at all, let alone one with parameters, which would be necessary to trigger any use of the custom-cached-plan code path. Right, it is not related to the dynamic SQL, but is to the plan-cache. Test cases are way more interesting than hand-wavy complaints. Sorry, when exiled to the hinterlands I have more time to test various things but not a good enough connectivity to describe them well. I'm attaching the test case to load 1e5 rows into a very skinny table with 100 partitions using rules. origin is from a few days ago, origin_reduce_copies is Heikki's patch, and origin_one_shot is your now-committed patch. (unshown are e6faf910d75027 and e6faf910d75027_prev, but that is where the regression was introduced) JJ /usr/local/pgsql_REL9_1_7/ Time: 64252.6907920837 ms JJ origin/ Time: 186657.824039459 ms JJ origin_reduce_copies/ Time: 185370.236873627 ms JJ origin_one_shot/ Time: 189104.484081268 ms The root problem is that it thinks the generic plan costs about 50% more than the custom one. I don't know why it thinks that, or how much it is worth chasing it down. On the other hand, your patch does fix almost all of the 9.2.[012] regression of using the following dynamic SQL trigger (instead of RULES) to load into the same test case. CREATE OR REPLACE FUNCTION foo_insert_trigger() RETURNS trigger AS $$ DECLARE tablename varchar(24); BEGIN tablename = 'foo_' || new.partition; EXECUTE 'INSERT INTO '|| tablename ||' VALUES (($1).*)' USING NEW ; RETURN NULL; END; $$ LANGUAGE plpgsql; CREATE TRIGGER foo_insert_trigger BEFORE INSERT ON foo FOR EACH ROW EXECUTE PROCEDURE foo_insert_trigger(); Cheers, Jeff ## The purpose of this program is to load the same files that \copy would ## load into a pgsql table, but do so one row at a time rather than in bulk. ## This could be useful to demonstrate the difference in loading efficiency between bulk ## and row by row, without having to use different formats for each. ## This makes no attempt to deal with escape characters like \t and \n the same way \copy does, ## so the loaded results will not be identical but that shouldn't matter if used only for ## benchmarking and not for actual production loading (and why would you use this for that purpose when ## \copy is available?) ## When loading into a 2 column unindexed untriggered table, Perl takes about half the CPU time and ## postgres about half. When loading to a table with triggers or indexs, Perl's slice becomes ## mostly immaterial. use strict; use warnings; use DBI; use Time::HiRes qw(time); my $start=time(); my $dbi=DBI-connect('DBI:Pg:'); my ($columns) = $dbi-selectrow_array(select count(*) from information_schema.columns where table_name=?, undef, $ARGV[0]); $columns 0 or die no such table '$ARGV[0]'; ## prepare an insert with as many placeholders as the table has columns my $insert=$dbi-prepare(Insert into $ARGV[0] values ( . (join ',', map '?', 1..$columns) .')'); open my $fh, , $ARGV[1] or die Couldn't open '$ARGV[1]': $!; $dbi-begin_work(); while ($fh) { chomp; my @x=split /\t/; $insert-execute(@x); }; $dbi-commit(); my $stop=time(); ## make a timing output that look like the one psql \timing would generate print Time: , 1000* ($stop-$start), ms\n; -- -- make rand.csv like below. -- perl -le 'print join \t, int(rand()*1e9),$_%100 foreach 1..1e5' rand.csv -- -- execute on different versions: -- while (true) ; do for f in /usr/local/pgsql_REL9_1_7/ /usr/local/pgsql_REL9_2_2/ origin_94afbd5831fbc1926f/; do pg_ctl stop ; rm -r /tmp/data; $f/bin/initdb; $f/bin/pg_ctl start -w -o --shared_buffers=512MB --checkpoint_segments=60 --shared_buffers=16MB; createdb; echo JJ $f; psql -f rules_test.sql -X ; done ; done rules_test.sql.out -- -- and pull results from the log file -- tail -n 100 -f rules_test.sql.out |grep -P 'JJ|Time:' -a drop table foo cascade; create table foo ( x integer, partition integer); --create index on foo (x); create table foo_0 (like foo including indexes) inherits (foo); create table foo_1 (like foo including indexes) inherits (foo); create table foo_2 (like foo including indexes) inherits (foo); create table foo_3 (like foo including indexes) inherits (foo); create table foo_4 (like foo including indexes) inherits (foo); create table foo_5 (like foo
Re: [HACKERS] too much pgbench init output
On 19.12.2012 06:30, Jeevan Chalke wrote: On Mon, Dec 17, 2012 at 5:37 AM, Tomas Vondra t...@fuzzy.cz mailto:t...@fuzzy.cz wrote: Hi, attached is a new version of the patch that (a) converts the 'log_step_seconds' variable to a constant (and does not allow changing it using a command-line option etc.) (b) keeps the current logging as a default (c) adds a -q switch that enables the new logging with a 5-second interval I'm still not convinced there should be yet another know for tuning the log interval - opinions? It seems that you have generated a patch over your earlier version and due to that it is not cleanly applying on fresh sources. Please generate patch on fresh sources. Seems you're right - I've attached the proper patch against current master. However, I absolutely no issues with the design. Also code review is already done and looks good to me. I think to move forward on this we need someone from core-team. So I am planning to change its status to ready-for-committor. Before that please provide updated patch for final code review. As a committer, I have looked into the patch. I noticed two things: 1) In the help you put '-q' option into Common options section. I think this should be moved to Initialization options section because the option is only applied while initializing. 2) Shouldn't a long option for '-q' be provided? Something like '--quiet-progress-logging'? 3) No patches for docs found (doc/src/sgml/pgbench.sgml) -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] question: foreign key constraints and AccessExclusive locks
When adding a foreign key constraint on tableA which references tableB, why is an AccessExclusive lock on tableB necessary? Wouldn't a lock that prevents writes be sufficient, or does PostgreSQL have to modify *both* tables in some fashion? I'm using PostgreSQL 8.4 on Linux. -- Jon -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] too much pgbench init output
On 6.1.2013 03:03, Tatsuo Ishii wrote: As a committer, I have looked into the patch. I noticed two things: 1) In the help you put '-q' option into Common options section. I think this should be moved to Initialization options section because the option is only applied while initializing. Good point, moved. 2) Shouldn't a long option for '-q' be provided? Something like '--quiet-progress-logging'? I don't think so. Currently pgbench has either short or long option, not both (for the same thing). I believe we should stick to this and either choose -q or --quiet-logging but not both. 3) No patches for docs found (doc/src/sgml/pgbench.sgml) I've added a brief description of the -q option into the docs. IMHO it's enough but feel free to add some more details. There's one more thing I've just noticed - the original version of the patch simply removed the old logging, but this one keeps both old and quiet logging. But the old logging still uses this: fprintf(stderr, %d of %d tuples (%d%%) done.\n, while the new logging does this fprintf(stderr, %d of %d tuples (%d%%) done (elapsed %.2f s, remaining %.2f s).\n, i.e. it prints additional info about elapsed/estimated time. Do we want to keep it this way (i.e. not to mess with the old logging) or do we want to add these new fields to the old logging too? I suggest to add it to the old logging, to keep the log messages the same, the only difference being the logging frequency. Tomas diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c index e376452..31764fe 100644 --- a/contrib/pgbench/pgbench.c +++ b/contrib/pgbench/pgbench.c @@ -39,6 +39,7 @@ #include portability/instr_time.h #include ctype.h +#include math.h #ifndef WIN32 #include sys/time.h @@ -102,6 +103,7 @@ extern int optind; #define MAXCLIENTS 1024 #endif +#define LOG_STEP_SECONDS 5 /* seconds between log messages */ #define DEFAULT_NXACTS 10 /* default nxacts */ intnxacts = 0; /* number of transactions per client */ @@ -150,6 +152,7 @@ char *index_tablespace = NULL; #define naccounts 10 bool use_log;/* log transaction latencies to a file */ +bool use_quiet; /* quiet logging onto stderr */ bool is_connect; /* establish connection for each transaction */ bool is_latencies; /* report per-command latencies */ intmain_pid; /* main process id used in log filename */ @@ -359,6 +362,7 @@ usage(void) -n do not run VACUUM after initialization\n -F NUM fill factor\n -s NUM scaling factor\n +-q quiet logging (one message each 5 seconds)\n --foreign-keys\n create foreign key constraints between tables\n --index-tablespace=TABLESPACE\n @@ -1362,6 +1366,11 @@ init(bool is_no_vacuum) charsql[256]; int i; + /* used to track elapsed time and estimate of the remaining time */ + instr_time start, diff; + double elapsed_sec, remaining_sec; + int log_interval = 1; + if ((con = doConnect()) == NULL) exit(1); @@ -1430,6 +1439,8 @@ init(bool is_no_vacuum) } PQclear(res); + INSTR_TIME_SET_CURRENT(start); + for (i = 0; i naccounts * scale; i++) { int j = i + 1; @@ -1441,10 +1452,33 @@ init(bool is_no_vacuum) exit(1); } - if (j % 10 == 0) + /* If we want to stick with the original logging, print a message each +* 100k inserted rows. */ + if ((! use_quiet) (j % 10 == 0)) fprintf(stderr, %d of %d tuples (%d%%) done.\n, - j, naccounts * scale, - (int) (((int64) j * 100) / (naccounts * scale))); + j, naccounts * scale, + (int) (((int64) j * 100) / (naccounts * scale))); + /* let's not call the timing for each row, but only each 100 rows */ + else if (use_quiet (j % 100 == 0)) + { + INSTR_TIME_SET_CURRENT(diff); + INSTR_TIME_SUBTRACT(diff, start); + + elapsed_sec = INSTR_TIME_GET_DOUBLE(diff); + remaining_sec = (scale * naccounts - j) * elapsed_sec / j; + + /* have we reached the next interval (or end)? */ + if ((j == scale * naccounts) ||
Re: [HACKERS] PATCH: optimized DROP of multiple tables within a transaction
On 4.1.2013 17:42, Robert Haas wrote: On Mon, Dec 31, 2012 at 11:51 AM, Tomas Vondra t...@fuzzy.cz wrote: I thought I followed the conding style - which guidelines have I broken? + /* If there are no non-local relations, then we're done. Release the memory + * and return. */ Multi-line comments should start with a line containing only /* and end with a line containing only */. +DropRelFileNodeAllBuffers(RelFileNodeBackend * rnodes, int nnodes) and +rnode_comparator(const void * p1, const void * p2) The extra spaces after the asterisks should be removed. +void smgrdounlinkall(SMgrRelation * rels, int nrels, bool isRedo) +{ void should be on a line by itself. Sorry to nitpick. No, thanks for the nitpicking! Code style is important. As for BSEARCH_LIMIT, I don't have a great idea - maybe just DROP_RELATIONS_BSEARCH_LIMIT? Sounds good. I've changed the name and fixed the codestyle issues in the attached version of the patch. Tomas diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c index 8dc622a..8c12a43 100644 --- a/src/backend/catalog/storage.c +++ b/src/backend/catalog/storage.c @@ -312,6 +312,11 @@ smgrDoPendingDeletes(bool isCommit) PendingRelDelete *pending; PendingRelDelete *prev; PendingRelDelete *next; + + SMgrRelation *srels = palloc(sizeof(SMgrRelation)); + int nrels = 0, + i = 0, + maxrels = 1; prev = NULL; for (pending = pendingDeletes; pending != NULL; pending = next) @@ -335,14 +340,31 @@ smgrDoPendingDeletes(bool isCommit) SMgrRelation srel; srel = smgropen(pending-relnode, pending-backend); - smgrdounlink(srel, false); - smgrclose(srel); + + /* extend the array if needed (double the size) */ + if (maxrels = nrels) { + maxrels *= 2; + srels = repalloc(srels, sizeof(SMgrRelation) * maxrels); + } + + srels[nrels++] = srel; } /* must explicitly free the list entry */ pfree(pending); /* prev does not change */ } } + + if (nrels 0) + { + smgrdounlinkall(srels, nrels, false); + + for (i = 0; i nrels; i++) + smgrclose(srels[i]); + } + + pfree(srels); + } /* diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c index dddb6c0..52ca2b0 100644 --- a/src/backend/storage/buffer/bufmgr.c +++ b/src/backend/storage/buffer/bufmgr.c @@ -62,6 +62,7 @@ #define BUF_WRITTEN0x01 #define BUF_REUSABLE 0x02 +#define DROP_RELATIONS_BSEARCH_LIMIT 10 /* GUC variables */ bool zero_damaged_pages = false; @@ -108,6 +109,7 @@ static volatile BufferDesc *BufferAlloc(SMgrRelation smgr, static void FlushBuffer(volatile BufferDesc *buf, SMgrRelation reln); static void AtProcExit_Buffers(int code, Datum arg); +static int rnode_comparator(const void * p1, const void * p2); /* * PrefetchBuffer -- initiate asynchronous read of a block of a relation @@ -2094,35 +2096,87 @@ DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber forkNum, * */ void -DropRelFileNodeAllBuffers(RelFileNodeBackend rnode) +DropRelFileNodeAllBuffers(RelFileNodeBackend *rnodes, int nnodes) { - int i; + int i, j, n = 0; + RelFileNode * nodes; + + nodes = palloc(sizeof(RelFileNode) * nnodes); /* non-local relations */ /* If it's a local relation, it's localbuf.c's problem. */ - if (RelFileNodeBackendIsTemp(rnode)) + for (i = 0; i nnodes; i++) { - if (rnode.backend == MyBackendId) - DropRelFileNodeAllLocalBuffers(rnode.node); + if (RelFileNodeBackendIsTemp(rnodes[i])) + { + if (rnodes[i].backend == MyBackendId) + DropRelFileNodeAllLocalBuffers(rnodes[i].node); + } + else + { + nodes[n++] = rnodes[i].node; + } + } + + /* +* If there are no non-local relations, then we're done. Release the memory +* and return. +*/ + if (n == 0) + { + pfree(nodes); return; } + /* sort the list of rnodes (but only
Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system
On 3.1.2013 20:33, Magnus Hagander wrote: On Thu, Jan 3, 2013 at 8:31 PM, Tomas Vondra t...@fuzzy.cz wrote: On 3.1.2013 18:47, Heikki Linnakangas wrote: How about creating the new directory as a direct subdir of $PGDATA, rather than buried in global? global is supposed to contain data related to shared catalog relations (plus pg_control), so it doesn't seem like the right location for per-database stat files. Also, if we're going to have admins manually zapping the directory (hopefully when the system is offline), that's less scary if the directory is not buried as deep. That's clearly possible and it's a trivial change. I was thinking about that actually, but then I placed the directory into global because that's where the pgstat.stat originally was. Yeah, +1 for a separate directory not in global. OK, I moved the files from global/stat to stat. Tomas diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c index be3adf1..4ec485e 100644 --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -64,10 +64,14 @@ /* -- * Paths for the statistics files (relative to installation's $PGDATA). + * Permanent and temprorary, global and per-database files. * -- */ -#define PGSTAT_STAT_PERMANENT_FILENAME global/pgstat.stat -#define PGSTAT_STAT_PERMANENT_TMPFILE global/pgstat.tmp +#define PGSTAT_STAT_PERMANENT_DIRECTORYstat +#define PGSTAT_STAT_PERMANENT_FILENAME stat/global.stat +#define PGSTAT_STAT_PERMANENT_TMPFILE stat/global.tmp +#define PGSTAT_STAT_PERMANENT_DB_FILENAME stat/%d.stat +#define PGSTAT_STAT_PERMANENT_DB_TMPFILE stat/%d.tmp /* -- * Timer definitions. @@ -115,8 +119,11 @@ int pgstat_track_activity_query_size = 1024; * Built from GUC parameter * -- */ +char *pgstat_stat_directory = NULL; char *pgstat_stat_filename = NULL; char *pgstat_stat_tmpname = NULL; +char *pgstat_stat_db_filename = NULL; +char *pgstat_stat_db_tmpname = NULL; /* * BgWriter global statistics counters (unused in other processes). @@ -219,11 +226,16 @@ static intlocalNumBackends = 0; */ static PgStat_GlobalStats globalStats; -/* Last time the collector successfully wrote the stats file */ -static TimestampTz last_statwrite; +/* Write request info for each database */ +typedef struct DBWriteRequest +{ + Oid databaseid; /* OID of the database to write */ + TimestampTz request_time; /* timestamp of the last write request */ +} DBWriteRequest; -/* Latest statistics request time from backends */ -static TimestampTz last_statrequest; +/* Latest statistics request time from backends for each DB */ +static DBWriteRequest * last_statrequests = NULL; +static int num_statrequests = 0; static volatile bool need_exit = false; static volatile bool got_SIGHUP = false; @@ -252,11 +264,17 @@ static void pgstat_sighup_handler(SIGNAL_ARGS); static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create); static PgStat_StatTabEntry *pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry, Oid tableoid, bool create); -static void pgstat_write_statsfile(bool permanent); -static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent); +static void pgstat_write_statsfile(bool permanent, bool force); +static void pgstat_write_db_statsfile(PgStat_StatDBEntry * dbentry, bool permanent); +static void pgstat_write_db_dummyfile(Oid databaseid); +static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent, bool onlydbs); +static void pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash, bool permanent); static void backend_read_statsfile(void); static void pgstat_read_current_status(void); +static bool pgstat_write_statsfile_needed(); +static bool pgstat_db_requested(Oid databaseid); + static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg); static void pgstat_send_funcstats(void); static HTAB *pgstat_collect_oids(Oid catalogid); @@ -285,7 +303,6 @@ static void pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int le static void pgstat_recv_deadlock(PgStat_MsgDeadlock *msg, int len); static void pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len); - /* * Public functions called from postmaster follow * @@ -549,8 +566,34 @@ startup_failed: void pgstat_reset_all(void) { - unlink(pgstat_stat_filename); - unlink(PGSTAT_STAT_PERMANENT_FILENAME); + DIR * dir; + struct dirent * entry; + + dir = AllocateDir(pgstat_stat_directory); + while ((entry = ReadDir(dir, pgstat_stat_directory)) != NULL) + { + char fname[strlen(pgstat_stat_directory) + strlen(entry-d_name) + 1]; + + if
Re: [HACKERS] too much pgbench init output
On 6.1.2013 03:03, Tatsuo Ishii wrote: As a committer, I have looked into the patch. I noticed two things: 1) In the help you put '-q' option into Common options section. I think this should be moved to Initialization options section because the option is only applied while initializing. Good point, moved. In addition to this, I'd suggest to add checking -q is only possible with -i option since without -i, -q is meaningless. 2) Shouldn't a long option for '-q' be provided? Something like '--quiet-progress-logging'? I don't think so. Currently pgbench has either short or long option, not both (for the same thing). I believe we should stick to this and either choose -q or --quiet-logging but not both. Ok. 3) No patches for docs found (doc/src/sgml/pgbench.sgml) I've added a brief description of the -q option into the docs. IMHO it's enough but feel free to add some more details. Good. There's one more thing I've just noticed - the original version of the patch simply removed the old logging, but this one keeps both old and quiet logging. But the old logging still uses this: fprintf(stderr, %d of %d tuples (%d%%) done.\n, while the new logging does this fprintf(stderr, %d of %d tuples (%d%%) done (elapsed %.2f s, remaining %.2f s).\n, i.e. it prints additional info about elapsed/estimated time. Do we want to keep it this way (i.e. not to mess with the old logging) or do we want to add these new fields to the old logging too? I suggest to add it to the old logging, to keep the log messages the same, the only difference being the logging frequency. If we do so, probably '-q' is not appropeate option name any more, since the only difference between old logging and new one is, the former is printed every 1 lines while the latter is every 5 seconds, which is not really quiet. What do you think? -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review]
On Saturday, January 05, 2013 12:35 PM Boszormenyi Zoltan wrote: 2013-01-05 05:58 keltezéssel, Amit kapila írta: On Friday, January 04, 2013 10:57 PM Boszormenyi Zoltan wrote: Hi, I am reviewing your patch. Thank you very much. Yes, you are right adding a new LWLock will avoid the use of sleep. However I am just not sure if for only this purpose we should add a new LWLock? Similar logic is used in CreateLockFile() for postmaster file but it doesn't use sleep. How about reducing the time of sleep or removing sleep, in that user might get error and he need to retry to get his command successful? I think removing the loop entirely is better. However, the behaviour should be discussed by a wider audience: - the backends should wait for each other just like other statements that fight for the same object (in which case locking is needed), or - throw an error immediately on conflicting parallel work Okay, I shall update the patch with first option as suggested by Noah as well. I also tried to trigger one of the errors by creating the lock file manually. You need an extra space between the ... retry and or ...: + ereport(ERROR, + (errcode_for_file_access(), +errmsg(could not create lock file \%s\: %m , LockFileName), +errhint(May be too many concurrent edit into file happening, please wait!! and retry +or .lock is file accidently left there please clean the file from config_dir))); Will fix. Thank you for review. With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] too much pgbench init output
On 6.1.2013 05:07, Tatsuo Ishii wrote: On 6.1.2013 03:03, Tatsuo Ishii wrote: As a committer, I have looked into the patch. I noticed two things: 1) In the help you put '-q' option into Common options section. I think this should be moved to Initialization options section because the option is only applied while initializing. Good point, moved. In addition to this, I'd suggest to add checking -q is only possible with -i option since without -i, -q is meaningless. Done. There's one more thing I've just noticed - the original version of the patch simply removed the old logging, but this one keeps both old and quiet logging. But the old logging still uses this: fprintf(stderr, %d of %d tuples (%d%%) done.\n, while the new logging does this fprintf(stderr, %d of %d tuples (%d%%) done (elapsed %.2f s, remaining %.2f s).\n, i.e. it prints additional info about elapsed/estimated time. Do we want to keep it this way (i.e. not to mess with the old logging) or do we want to add these new fields to the old logging too? I suggest to add it to the old logging, to keep the log messages the same, the only difference being the logging frequency. If we do so, probably '-q' is not appropeate option name any more, since the only difference between old logging and new one is, the former is printed every 1 lines while the latter is every 5 seconds, which is not really quiet. What do you think? AFAIK the 5 second logging is much quieter in most cases (and a bit more verbose when the initialization gets very slower), so I think the quiet logging is not a bad match although maybe there's a better name. This change (adding the elapsed/remaining fields to the original loggin) would be consistent with this name, because considering a single line, the -q is more verbose right now. So I'd stick with the -q option and added the fields to the original logging. But I'm not opposing a different name, I just can't think of a better one. Tomas diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c index e376452..39bd8a5 100644 --- a/contrib/pgbench/pgbench.c +++ b/contrib/pgbench/pgbench.c @@ -39,6 +39,7 @@ #include portability/instr_time.h #include ctype.h +#include math.h #ifndef WIN32 #include sys/time.h @@ -102,6 +103,7 @@ extern int optind; #define MAXCLIENTS 1024 #endif +#define LOG_STEP_SECONDS 5 /* seconds between log messages */ #define DEFAULT_NXACTS 10 /* default nxacts */ intnxacts = 0; /* number of transactions per client */ @@ -150,6 +152,7 @@ char *index_tablespace = NULL; #define naccounts 10 bool use_log;/* log transaction latencies to a file */ +bool use_quiet; /* quiet logging onto stderr */ bool is_connect; /* establish connection for each transaction */ bool is_latencies; /* report per-command latencies */ intmain_pid; /* main process id used in log filename */ @@ -359,6 +362,7 @@ usage(void) -n do not run VACUUM after initialization\n -F NUM fill factor\n -s NUM scaling factor\n +-q quiet logging (one message each 5 seconds)\n --foreign-keys\n create foreign key constraints between tables\n --index-tablespace=TABLESPACE\n @@ -1362,6 +1366,11 @@ init(bool is_no_vacuum) charsql[256]; int i; + /* used to track elapsed time and estimate of the remaining time */ + instr_time start, diff; + double elapsed_sec, remaining_sec; + int log_interval = 1; + if ((con = doConnect()) == NULL) exit(1); @@ -1430,6 +1439,8 @@ init(bool is_no_vacuum) } PQclear(res); + INSTR_TIME_SET_CURRENT(start); + for (i = 0; i naccounts * scale; i++) { int j = i + 1; @@ -1441,10 +1452,33 @@ init(bool is_no_vacuum) exit(1); } - if (j % 10 == 0) + /* If we want to stick with the original logging, print a message each +* 100k inserted rows. */ + if ((! use_quiet) (j % 10 == 0)) fprintf(stderr, %d of %d tuples (%d%%) done.\n, - j, naccounts * scale, - (int) (((int64) j * 100) / (naccounts * scale))); + j, naccounts * scale, + (int) (((int64) j * 100) / (naccounts * scale))); + /* let's not call