Re: [HACKERS] Core Extensions relocation
Peter Eisentraut wrote:
> For the directory name, I'd prefer either src/extensions (since there is more than one), or if you want to go for short somehow, src/ext. (Hmm, I guess the installation subdirectory is also called "extension". But it felt wrong on first reading anyway.)

I jumped between those two a couple of times myself, settling on "extension" to match the installation location, as you figured out. Assuming that name shouldn't change at this point, this seemed the best way to name the new directory, even though I agree it seems weird at first.

> What version did you branch this off? :)

Long enough ago that apparently I've missed some major changes; Magnus already pointed out I needed to revisit how MODULEDIR was used. Looks like I need to rebuild the first patch in this series yet again, which shouldn't be too bad. The second time I did that, I made the commits atomic enough that the inevitable third one would be easy.

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Creating new remote branch in git?
On 06/10/2011 12:19 AM, Alex Hunsaker wrote:
> It looks like if you push the remote branch first everything should work nicely:
>
>   git checkout master
>   git push origin origin:refs/heads/REL9_1_STABLE
>   git fetch   # fetch the new branch
>   git checkout REL9_1_STABLE

This is basically the state of the art right now for the most frequently deployed versions of git. I don't think checking out master first is necessary though. Potentially useful automation/trivia for alternate approaches includes:

1) Write a little script to do this messy chore, so you don't have to remember this weird "create a new branch using a full refspec" syntax. There is an example named git-create-branch along with a short tutorial on this subject at http://www.zorched.net/2008/04/14/start-a-new-branch-on-your-remote-git-repository/

2) Use git_remote_branch https://github.com/webmat/git_remote_branch which is the swiss army knife of remote branch hackery automation.

3) Rather than manually hacking the config files, use "git config" to do it. I'm not sure if this is completely workable, but something like this might connect the newly created branch to your local one after pushing it out, without actually opening the config in an editor:

  git config branch.REL9_1_STABLE.remote origin
  git config branch.REL9_1_STABLE.merge refs/heads/REL9_1_STABLE

4) Use a system with git >= 1.7.0, which adds:

  git branch --set-upstream REL9_1_STABLE origin/REL9_1_STABLE
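The script from item 1 doesn't have to be complicated. Here's a minimal sketch of such a helper, assuming "origin" as the remote name; the GIT override is only there so the command sequence can be previewed without touching a repository:

```shell
#!/bin/sh
# Sketch of a "git-create-branch" style helper for the steps above.
# Assumes the remote is named "origin"; set GIT="echo git" to print
# the commands instead of running them.
: "${GIT:=git}"

create_remote_branch() {
    branch="$1"
    # Push the new branch to the remote first, using the full refspec,
    $GIT push origin "origin:refs/heads/$branch" &&
    # then fetch so the new remote branch is known locally,
    $GIT fetch origin &&
    # and check it out, creating a local tracking branch.
    $GIT checkout "$branch"
}
```

Called as "create_remote_branch REL9_1_STABLE", that reproduces the quoted sequence, minus the initial checkout of master.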
Re: [HACKERS] tuning autovacuum
On 06/09/2011 05:41 PM, Tom Lane wrote:
> As Robert said, we're already seeing scalability problems with the pg_stats subsystem. I'm not eager to add a bunch more per-table counters, at least not without some prior work to damp down the ensuing performance hit.

That's fair. Anyone who is running into the sort of autovacuum issues prompting this discussion would happily pay the overhead to get better management of that; it's one of the easiest things to justify more per-table stats on, IMHO. Surely the per-tuple counters are vastly more of a problem than these messages could ever be.

But concerns about stats overload are why I was highlighting issues around sending multiple messages per vacuum, and why incremental updates as it runs are unlikely to work out. Balancing that trade-off, getting enough data to help but not so much that the overhead is obnoxious, is the non-obvious, tricky part of the design here.
Re: [HACKERS] tuning autovacuum
On 06/09/2011 04:43 PM, Bernd Helmle wrote:
> I'd go further and expose the info or details issued by VACUUM VERBOSE into the view, too, at least the number of pages visited and cleaned (or dead but not yet cleaned). Customers are heavily interested in these numbers and i've found pgfouine to provide those numbers very useful.

Agreed there. The fact that VACUUM VERBOSE reports them suggests they're not too terribly difficult to track either. What we'd probably need to do with those is handle them like the other stats in the system: store a total number for visited/cleaned/dead for each relation, then increment the total as each vacuum finishes. That way, you could point a standard monitoring system at it and see trends. Just saving the last snapshot of data there isn't as useful.

I'm seeing these as being like the counters in pg_stat_bgwriter; while it's easy to think of VACUUM "what work happened?" data as info you just want the last snapshot of, a continuous incrementing counter can do that and a lot of other things too. Anyone who is extracting useful data from pg_stat_bgwriter can use the same logic to track this data, even if it only moves forward in big chunks as vacuum completes. And it may be feasible to update it in the middle, too.

Stepping into implementation for a second, the stats that are showing up in pg_stat_user_tables are being driven by a PgStat_MsgVacuum message sent to the stats collector when vacuum finishes. While that's the obvious place to put some more stuff, it's not necessarily the right way to build a better monitoring infrastructure. Two things to consider:

-It's not really aimed at being called multiple times for one operation ("needs vacuum", "started vacuum", "finished vacuum")

-There is a mix of things that make sense as long-term counters and things that update as snapshots--the timestamps are the main thing there.
I haven't thought about it enough to have a real opinion on whether you can squeeze everything into the existing message by adding more fields, or if another type of message is necessary. Just pointing out that it's not trivially obvious which approach is better. What is unambiguous is that all this new data is really going to need a new view for it, pg_stat_vacuum or something like that. The fields that are already in pg_stat_user_tables can stay there as deprecated for a while, but this all wants to be in its own new view.

This would really be a nice medium-sized feature that DBAs would love, and it would help adoption on big sites. I have some ideas on how to get some funding to develop it because I keep running into this, but if someone wants to run with the idea I'd be happy to just help instead.
Re: [HACKERS] tuning autovacuum
Robert Haas wrote:
> Well, if there are more tables that need vacuuming than there are workers available at any given time, there will be a delay. We probably don't keep track of that delay at present, but we could.

There are at least four interesting numbers to collect each time autovacuum runs:

1) This one: when was the threshold crossed? I believe one of the AV workers would have to pause periodically to update these if they're all busy doing work.

2) What time did the last autovacuum start at?

3) How many dead rows were there at the point when it started?

4) When did the last autovacuum end? (currently the only value stored)

There may be a 5th piece of state worth exposing/saving that I haven't looked at yet, something related to how much work was skipped by the partial vacuum logic introduced in 8.4. I haven't looked at that code enough to know which is the right metric to measure its effectiveness by, but I have this gut feeling it's eventually going to be critical for distinguishing between the various common types of vacuum-heavy workloads that show up.

All of these need to be stored in a system table/view, so that an admin can run a query to answer questions like:

-What is AV doing right now?
-How far behind is AV on tables it needs to clean but hasn't even started on?
-How long is the average AV taking on my big tables?
-As I change the AV parameters, what does it do to the runtimes against my big tables?

As someone who is found by a lot of people whose problems revolve around databases with heavy writes or update churn, limitations in the current state of tracking what autovacuum does have been moving way up my priority list over the last year. I now have someone who is always running autovacuum on the same table, 24x7. It finishes every two days, and when it does, the 20% threshold is already crossed for it to start again.
The "wait until a worker was available" problem isn't there, but I need a good way to track all of the other three things to have a hope of improving their situation. Right now getting the data I could use takes parsing log file output and periodic dumps of pg_stat_user_tables, then stitching the whole mess together. You can't run a heavily updated database in the TB+ range and make sense of what autovacuum is doing without a large effort matching output from log_autovacuum_min_duration against the stats that are visible in pg_stat_user_tables. It must get easier than that to support the sort of bigger tables it's possible to build now. And if this data starts getting tracked, we can start to move toward AV parameters that are actually aimed at real-world units, too.
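As a stopgap, the "periodic dumps" half of that stitching job can at least be scripted. A minimal sketch, where the database name is a parameter and the column list is an assumption based on what pg_stat_user_tables exposes; the PSQL override is only there so the invocation can be previewed without a server:

```shell
#!/bin/sh
# Snapshot the autovacuum-related columns of pg_stat_user_tables along
# with a timestamp, so successive dumps can later be correlated with
# log_autovacuum_min_duration output.  Set PSQL="echo psql" to preview
# the command instead of running it.
: "${PSQL:=psql}"

snapshot_autovacuum_stats() {
    db="$1"
    # -A -t gives unaligned, tuples-only output, easy to append to a log
    $PSQL -d "$db" -A -t -c "SELECT now(), relname, n_dead_tup, last_autovacuum FROM pg_stat_user_tables ORDER BY n_dead_tup DESC"
}
```

Run from cron every few minutes, the output is the raw material for the "how far behind is AV" questions above, crude as that is compared to a real pg_stat_vacuum view.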
Re: [HACKERS] Core Extensions relocation
Vinicius Abrahao wrote:
> This is my first post at Hackers, so sorry if I am been a noob here, but I am pretty confused about how to create the extension pg_buffercache.

This list is for talking about development of new features, normally on the latest development version of the software (right now 9.1). There is no such thing as CREATE EXTENSION in versions before that. A question like "how do I install pg_buffercache for 9.0?" should normally get sent to one of the other mailing lists; any of pgsql-performance, pgsql-admin, or pgsql-general would be appropriate places to ask that. This one really isn't. It's also better to avoid taking someone else's discussion and replying to it with your questions.

> But even so, I need to ask, because my production is on other versions: What is the right way to install this contrib at 9.0.1, 9.0.2 and 9.0.4?

But since I happen to know the answer, here's an example from a RedHat-derived Linux system running PostgreSQL 9.0.4, logged in as the postgres user:

  -bash-3.2$ locate pg_buffercache.sql
  /usr/pgsql-9.0/share/contrib/pg_buffercache.sql
  /usr/pgsql-9.0/share/contrib/uninstall_pg_buffercache.sql
  -bash-3.2$ psql -d pgbench -f /usr/pgsql-9.0/share/contrib/pg_buffercache.sql
  SET
  CREATE FUNCTION
  CREATE VIEW
  REVOKE
  REVOKE
  -bash-3.2$ psql -d pgbench -c "select count(*) from pg_buffercache"
   count
  -------
    4096

The location of the file will be different on other platforms, but that's the basic idea of how you install it.
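For contrast, the reason this confusion comes up: on 9.1 and later the same install collapses into a single extension command. A sketch reusing the pgbench database name from the example above; the PSQL override is only there so the command can be previewed without a server:

```shell
#!/bin/sh
# On 9.1 and later, pg_buffercache is installed through the extension
# mechanism rather than by running the contrib SQL script directly.
# Set PSQL="echo psql" to preview the command instead of running it.
: "${PSQL:=psql}"

install_pg_buffercache_91() {
    db="$1"
    $PSQL -d "$db" -c "CREATE EXTENSION pg_buffercache"
}
```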
Re: [HACKERS] patch for new feature: Buffer Cache Hibernation
On 06/05/2011 08:50 AM, Mitsuru IWASAKI wrote:
> It seems that I don't have enough time to complete this work. You don't need to keep cc'ing me, and I'm very happy if postgres to be the first DBMS which support buffer cache hibernation feature.

Thanks for submitting the patch, and we'll see what happens from here. I've switched to bcc'ing you here, and we should get you off everyone else's cc: list soon. If this feature ends up getting committed, I'll try to remember to drop you a note about it so you can see what happened.
Re: [HACKERS] Getting a bug tracker for the Postgres project
On 05/31/2011 05:41 PM, Alvaro Herrera wrote:
> Excerpts from Josh Berkus's message of mar may 31 17:05:23 -0400 2011:
>> BTW, we talked to Debian about debbugs ages ago, and the Debian project said that far too much of debbugs was not portable to other projects.

The good news is that the GNU folk proved them wrong, as evidenced elsewhere in the thread. What happened is that one of the authors got motivated (not sure why/how) to put a major amount of work into making the code portable so that sites other than Debian could use it. So past perceptions about it being really hard were correct; that's just been fixed since then.
Re: [HACKERS] patch for new feature: Buffer Cache Hibernation
On 06/01/2011 03:03 AM, Tatsuo Ishii wrote:
> Also I really want to see the performance comparison between these two approaches in the real world database.

Well, tell me how big of a performance improvement you want PgFincore to win by, and I'll construct a benchmark where it does that. If you pick a database size that fits in the OS cache, but is bigger than shared_buffers, the difference between the approaches is huge. The opposite--trying to find a case where this hibernation approach wins--is extremely hard to do.

Anyway, further discussion of this patch is kind of a waste right now. We've never gotten the patch actually sent to the list to establish a proper contribution (just pointers to a web page), and there's been no feedback on that or the other suggestions for redesign (extension repackaging, GUC renaming, removing unused code, and a few more). Unless the author shows up again in the next two weeks, this is getting bounced back with no review as code we can't use.
Re: [HACKERS] How can I check the treatment of bug fixes?
On 05/27/2011 08:36 AM, MauMau wrote:
> I posted a patch for bug #6011 to pgsql-hackers several days ago. How can I check the status of bug fixes? I'm worried that the patch might be forgotten, because bug #5842 was missed for two months until Bruce noticed it.

Discussion here seems to have wandered far away from useful suggestions for you; let's see if it's possible to return to that. The best way to confirm when a bug is resolved is to subscribe to the pgsql-committers mailing list. If a commit for this fix appears, odds are good the original bug number will be referenced. Even if it isn't, you may recognize it via its description. Until you see that, the bug is almost certainly still open.

Bugs that are considered to impact the current version under development are sometimes listed at http://wiki.postgresql.org/wiki/Open_Items Adding a bug there that's not really specific to the new version may not be considered good form by some. It is the closest thing to an open bug tracker around though, and adding items there means they won't be forgotten about; it's checked regularly by developers considering when it's a good time to release another alpha or beta.

In my mind, clarifying under what circumstances it's appropriate for people to put a bug onto the Open Items list would be a useful way to spend a little time. Certainly more productive than firing more bullets at the unkillable zombie that is bug tracking software.
Re: [HACKERS] Getting a bug tracker for the Postgres project
On 05/29/2011 05:17 AM, Peter Eisentraut wrote:
> Here is a list to choose from: http://en.wikipedia.org/wiki/Comparison_of_issue_tracking_systems

I turned this into a spreadsheet to sort and prune more easily; if anyone wants that let me know, it's not terribly useful beyond what I'm posting here. 44 total, 16 of which are open-source.

I would say that having an e-mail interface is the next major cut to make. While distasteful, it's possible for this project to adopt a solution that doesn't use PostgreSQL, and one interesting candidate is in that category. It's not feasible to adopt one that doesn't integrate well with e-mail though. List of software without listed e-mail integration: Fossil, GNATS, Liberum Help Desk, MantisBT, org-mode, Flyspray, ikiwiki, Trac.

The 8 F/OSS programs left after that filter are:

OTRS
Debbugs
Request Tracker
Zentrack
LibreSource
Redmine
Roundup
Bugzilla

The next two filters you might apply are:

Support for Git: Redmine, Bugzilla
PostgreSQL back-end: OTRS, Request Tracker, LibreSource, Redmine, Roundup, Bugzilla

There are a couple of additional nice-to-have items I saw on the feature list, and they all seem to spit out just Redmine & Bugzilla. Those are the two I've ended up using the most on PostgreSQL-related projects, too, so that isn't a surprise to me. While I'm not a strong fan of Redmine, it has repeatedly been the lesser of the evils available here for three different companies I've worked at or dealt with.

Greg Stark is right that Debbugs has a lot of interesting features similar to the desired workflow here. It's not tied to just Debian anymore; the GNU project is also using it now. And the database backend isn't that terrible to consider: it's flat files with a BerkeleyDB index built on top. I think if it was perfect except for that, it would still be worth considering.
Debbugs is far from a general purpose solution though, so any customization to support differences in this project's workflow would likely end up being one-off hacks. The VCS support might be a problem, but I've gotten the impression that git is increasingly popular for other Debian work. Since the program is in Perl, I can't imagine it's a gigantic task to switch that out, and probably one other people would like to see.
Re: [HACKERS] pgbench--new transaction type
On 05/29/2011 03:11 PM, Jeff Janes wrote:
> If you use "pgbench -S -M prepared" at a scale where all data fits in memory, most of what you are benchmarking is network/IPC chatter, and table locking.

If you profile it, you'll find a large amount of the time is actually spent doing more mundane things, like memory allocation. The network and locking issues are really not the bottleneck at all in a surprising number of these cases. Your patch isn't really dependent on your being right about the cause here, which means this doesn't impact your submissions any. Just wanted to clarify that what people expect is slowing things down in this situation and what actually shows up when you profile are usually quite different.

I'm not sure whether this feature makes sense to add to pgbench, but it's interesting to have it around for developer testing. The way you've built this isn't messing with the code too much to accomplish that, and your comments about it being hard to do this using "-f" are all correct. Using a custom test file makes it easy to shoot yourself in the foot unless you're quite careful.

Some numbers for single-client runs on 64-bit AMD Opteron Linux:

12,567 sps under -S
19,646 sps under -S -M prepared
58,165 sps under -P

10,000 is too big of a burst to run at once. The specific thing I'm concerned about is what happens if you try this mode when using "-T" to enforce a runtime limit, and your SELECT rate isn't high. If you're only doing 100 SELECTs/second because your scale is big enough to be seek bound, you could overrun by nearly two minutes. I think this is just a matter of turning the optimization around a bit. Rather than starting with a large batch size and presuming that's ideal, a different approach is really needed here. You want the smallest batch size that gives you a large win. My guess is that most of the gain here comes from increasing the batch size to something in the 10 to 100 range; jumping to 10K is probably overkill.
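For reference, the runs behind those numbers look something like this sketch. Everything here is a standard pgbench flag except -P, which is the new transaction type from the patch under review; the runtime and database name are arbitrary choices, and the PGBENCH override is only there so the invocations can be previewed without a server:

```shell
#!/bin/sh
# Compare the three single-client SELECT modes discussed above against
# an already-initialized pgbench database.  Note -P is the transaction
# type added by the patch, not a standard pgbench option.
# Set PGBENCH="echo pgbench" to preview the invocations.
: "${PGBENCH:=pgbench}"

compare_select_modes() {
    db="$1"
    # $mode is intentionally unquoted so "-S -M prepared" splits into flags
    for mode in "-S" "-S -M prepared" "-P"; do
        $PGBENCH -n -c 1 -T 60 $mode "$db"
    done
}
```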
Could you try some smaller numbers and see where the big increases start falling off?

Obligatory code formatting nitpick: try not to overrun the right margin any further than the existing code around line 1779, where you add more ttype comments. That needs to turn into a multi-line comment. The rest of the patch looks fine, and don't worry about resubmitting for that; it's just something to tweak on your next update. A slightly more descriptive filename for the patch would help out those of us who look at a lot of pgbench patches, too. Something like "pgbench_loop_v1.patch" for example would make it easier for me to remember which patch this was by its name.
Re: [HACKERS] patch for new feature: Buffer Cache Hibernation
On 05/07/2011 03:32 AM, Mitsuru IWASAKI wrote:
> For 1, I've just finish my work. The latest patch is available at: http://people.freebsd.org/~iwasaki/postgres/buffer-cache-hibernation-postgresql-20110507.patch

Reminder here--we can't accept code based on it being published to a web page. You'll need to e-mail it to the pgsql-hackers mailing list to be considered for the next PostgreSQL CommitFest, which is starting in a few weeks. Code submitted to the mailing list is considered a release of it to the project under the PostgreSQL license, which we can't just assume for things when given only a URL to them.

Also, you suggested you were out of time to work on this. If that's the case, we'd like to know that so we don't keep cc'ing you about things in expectation of an answer. Someone else may pick this up as a project to continue working on. But it's going to need a fair amount of revision before it matches what people want here, and I'm not sure how much of what you've written is going to end up in any commit that may happen from this idea.
Re: [HACKERS] tackling full page writes
On 05/24/2011 04:34 PM, Robert Haas wrote:
> we could name this feature "partial full page writes" and enable it either with a setting of full_page_writes=partial

+1 to overloading the initial name, but only if the parameter is named 'maybe', 'sometimes', or 'perhaps'. I've been looking into a similar refactoring of the names here, where we bundle all of these speed-over-safety things (fsync, full_page_writes, etc.) into one control so they're easier to turn off at once. Not sure if it should be named "web_scale" or "do_you_feel_lucky_punk".

> 3. Going a bit further, Greg proposed the idea of ripping out our current WAL infrastructure altogether and instead just having one WAL record that says "these byte ranges on this page changed to have these new contents".

The main thing that makes this idea particularly interesting to me, over the other two, is that it might translate well into the idea of using sync_file_range to aim for a finer fsync call on Linux than is currently possible.
Re: [HACKERS] New/Revised TODO? Gathering actual read performance data for use by planner
Michael Nolan wrote:
> Based on last year's discussion of this TODO item, it seems thoughts have been focused on estimating how much data is being satisfied from PG's shared buffers. However, I think that's only part of the problem.

Sure, but neither it nor what you're talking about is the real gating factor on making an improvement here. Figuring out how to expose all this information to the optimizer so it can use it when planning is the hard part. Collecting a read time profile is just one of the many ways you can estimate what's in cache, and each of the possible methods has good and bad trade-offs. I've been suggesting that people assume that's a solved problem--I'm pretty sure what you're proposing was done by Greg Stark once, with a prototype even built--and instead ask what you're going to do next once you have this data.

> This data would probably need to be kept separately for each table or index, as some tables or indexes may be mostly or fully in cache or on faster physical media than others, although in the absence of other data about a specific table or index, data about other relations in the same tablespace might be of some use.

This is the important part. Model how the data needs to get stored such that the optimizer can make decisions using it, and I consider it easy to figure out how it will get populated later. There are actually multiple ways to do it, and it may end up being something where people plug in an implementation that fits their workload, rather than just having one available.
Re: [HACKERS] 9.2 schedule
On 05/24/2011 01:35 PM, Josh Berkus wrote:
> I would suggest instead adding a new page to postgresql.org/developer which lists the development schedule, rather than linking to that wiki page. Maybe on this page? http://www.postgresql.org/developer/roadmap

Now that I look at the roadmap page again, I think all that would really be needed here is to tweak its wording a bit. If the description on there of the link to the wiki looked like this:

  General development information
  A wiki page about various aspects of the PostgreSQL development process, including detailed schedules and submission guidelines

I think that's enough info to keep there. Putting more information back onto the main site when it can live happily on the wiki seems counterproductive to me; if there's concern about things like vandalism, we can always lock the page. I could understand the argument that "it looks more professional to have it on the main site", but perception over function only goes so far for me. Adding a link back to the wiki from the https://commitfest.postgresql.org/ page would complete being able to navigate among the three major sites here, no matter which one people started at.
Re: [HACKERS] 9.2 schedule
On 05/24/2011 05:03 PM, Peter Eisentraut wrote:
> On mån, 2011-05-23 at 22:44 -0400, Greg Smith wrote:
>> -Given that work in August is particularly difficult to line up with common summer schedules around the world, having the other >1 month gap in the schedule go there makes sense.
>
> You might want to add a comment on the schedule page about the June/July/August timing, because it looks like a typo, and the meeting minutes are also inconsistent in how they talk about June and July.

Yes, I was planning to (and just did) circle back to the minutes to make everything match up. It's now self-consistent, with the same dates as the schedule, and it explains the rationale better. I'm not sure how to address the feeling of a typo you get from the schedule page beyond that.
Re: [HACKERS] 9.2 schedule
David Fetter wrote:
> I thought we'd agreed on the timing for the first CF, and that I was to announce it in the PostgreSQL Weekly News, so I did just that.

Yes, and excellent. The other ideas were:

-Publish information about the full schedule to some of the more popular mailing lists

-Link to this page more obviously from postgresql.org (a fixed redirect URL is probably the right approach) to "bless" it, and potentially improve its search rank too.

The specific new problem being highlighted to work on here is that the schedule and development process are actually quite good as open-source projects go, but that fact isn't visible at all unless you're already on the inside of the project.
[HACKERS] 9.2 schedule
At the developer meeting last week: http://wiki.postgresql.org/wiki/PgCon_2011_Developer_Meeting there was an initial schedule for 9.2 hammered out and dutifully transcribed at http://wiki.postgresql.org/wiki/PostgreSQL_9.2_Development_Plan , and the one part I wasn't sure I had written down correctly I see Robert already fixed.

There was a suggestion to add some publicity around the schedule for this release. There's useful PR value to making it more obvious to people that the main development plan is regular and time-based, even if the release date itself isn't fixed. The right time to make an initial announcement like that is "soon", particularly if a goal here is to get more submitted into the first 9.2 CF coming in only a few weeks. Anyone have changes to suggest before this starts working its way toward an announcement?

The main parts of the discussion leading to changes from the 9.1 schedule, as I recall them, are:

-Shooting for a slightly earlier branch/initial 9.2 CommitFest in June helps some with patch developer bit-rot, and may let developers who are focused on new features be productive for more of the year. The perception that new development is unwelcome between the final CF and version release seems to have overshot a bit from its intention. It's not the best period to chat on this list, with many people distracted by release goals. But some people just aren't in the right position to work on alpha/beta testing and stability work then, and the intention was not to block them from doing something else if that's the case. (A similar bit brought up during one of the patch prep talks is that review is also welcome outside of a CF, which isn't really clear.)

-The last CF of the release is tough to reschedule usefully due to concerns about December/beginning of the year holidays.

-Given that work in August is particularly difficult to line up with common summer schedules around the world, having the other >1 month gap in the schedule go there makes sense.
As for why there aren't more changes, it's hard to argue that the 9.1 process was broken such that it needs heavy modification. There were a large number of new features committed, people seem satisfied with the quality of the result so far, and the schedule didn't go too far off the rails. Outside of the manpower issues (which are serious), the sections that strained the most against problems seem really hard to identify with anything other than hindsight. The tension between "ship it" and "make the release better" is a really fundamental one to software development. The two main ideas that pop up regularly, organizing more CommitFests or making them shorter, are both hard to adopt without more active volunteers working on review (both at the initial and committer level) and an increase in available CF manager time. An idea I heard a couple of people suggest is that it would take a CF manager focused exclusively on the "patch chasing" parts of the role--not someone who is also trying to develop, commit, or review during the CF--before this would be feasible to consider. Some sort of relief for making that role less demanding is needed here, before it's practical to schedule those even more often. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Adding an example for replication configuration to pg_hba.conf
Two things that could be changed from this example to make it more useful:

-Document at least a little bit more how this is different from the "all/all" rule. I can imagine users wondering "do I use this instead of the other one? In addition? Is it redundant if I have 'all' in there?" A little more description aiming at the expected FAQs would make this much more useful.

-The default database is based on your user name, which is postgres in most packaged builds but not if you compile your own. I don't know whether it's practical to consider substituting that into this file, or if it's just enough to mention that as an additional doc comment. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
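To make the contrast with the "all/all" rule concrete: the "all" database keyword in pg_hba.conf does not match walsender replication connections, so the replication example is neither redundant with nor replaced by an "all" line. A sketch of how the two might sit side by side (the addresses and auth methods here are invented for illustration):

```
# TYPE  DATABASE     USER      ADDRESS          METHOD

# Matches ordinary connections to every database, but NOT
# streaming replication connections:
host    all          all       192.168.0.0/24   md5

# Replication connections only match a line whose database
# field is the "replication" keyword:
host    replication  postgres  192.168.0.10/32  md5
```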
Re: [HACKERS] Why not install pgstattuple by default?
Greg Smith wrote: Any packager who grabs the shared/postgresql/extension directory in 9.1, which I expect to be all of them, shouldn't need any changes to pick up this adjustment. For example, pgstattuple installs these files:

share/postgresql/extension/pgstattuple--1.0.sql
share/postgresql/extension/pgstattuple--unpackaged--1.0.sql
share/postgresql/extension/pgstattuple.control

And these are the same locations they were already at. ...and the bit I missed here is that there's a fourth file here: lib/postgresql/pgstattuple.so If you look at a 9.1 spec file, such as http://svn.pgrpms.org/browser/rpm/redhat/9.1/postgresql/EL-6/postgresql-9.1.spec , you'll find:

%files contrib
...
%{pgbaseinstdir}/lib/pgstattuple.so

Which *does* require a packager change to relocate from the contrib package to the main server one. So the theory that a change here might happen without pushing a repackaging suggestion toward packagers is busted. This does highlight that some packaging guidelines would be needed here to complete this work. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why not install pgstattuple by default?
Greg Smith wrote: Attached is a second patch to move a number of extensions from contrib/ to src/test/. Extensions there are built by the default build target, making installation of the postgresql-XX-contrib package unnecessary for them to be available. That was supposed to be contrib/ to src/extension , no idea where that test bit came from. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] cache estimates, cache access cost
Cédric Villemain wrote: http://git.postgresql.org/gitweb?p=users/c2main/postgres.git;a=shortlog;h=refs/heads/analyze_cache This rebases easily to make Cedric's changes move to the end; I just pushed a version with that change to https://github.com/greg2ndQuadrant/postgres/tree/analyze_cache if anyone wants a cleaner one to browse. I've attached a patch too if that's more your thing. I'd recommend not getting too stuck on the particular hook Cédric has added here to compute the cache estimate, which uses mmap and mincore to figure it out. It's possible to compute similar numbers, albeit less accurate, using an approach similar to how pg_buffercache inspects things. And I even once wrote a background writer extension that collected this sort of data as it was running the LRU scan anyway. Discussions of this idea seem to focus on how the "what's in the cache?" data is collected, which as far as I'm concerned is the least important part. There are multiple options, some work better than others, and there's no reason that can't be swapped out later. The more important question is how to store the data collected and then use it for optimizing queries. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us diff --git a/contrib/Makefile b/contrib/Makefile index 6967767..47652d5 100644 --- a/contrib/Makefile +++ b/contrib/Makefile @@ -27,6 +27,7 @@ SUBDIRS = \ lo \ ltree \ oid2name \ + oscache \ pageinspect \ passwordcheck \ pg_archivecleanup \ diff --git a/contrib/oscache/Makefile b/contrib/oscache/Makefile new file mode 100644 index 000..8d8dcc5 --- /dev/null +++ b/contrib/oscache/Makefile @@ -0,0 +1,15 @@ +# contrib/oscache/Makefile + +MODULE_big = oscache +OBJS = oscache.o + +ifdef USE_PGXS +PG_CONFIG = pg_config +PGXS := $(shell $(PG_CONFIG) --pgxs) +include $(PGXS) +else +subdir = contrib/oscache +top_builddir = ../.. 
+include $(top_builddir)/src/Makefile.global +include $(top_srcdir)/contrib/contrib-global.mk +endif diff --git a/contrib/oscache/oscache.c b/contrib/oscache/oscache.c new file mode 100644 index 000..1ad7dc2 --- /dev/null +++ b/contrib/oscache/oscache.c @@ -0,0 +1,151 @@ +/*- + * + * oscache.c + * + * + * Copyright (c) 2011, PostgreSQL Global Development Group + * + * IDENTIFICATION + * contrib/oscache/oscache.c + * + *- + */ +/* { POSIX stuff */ +#include <stdlib.h> /* exit, calloc, free */ +#include <sys/stat.h> /* stat, fstat */ +#include <sys/types.h> /* size_t, mincore */ +#include <unistd.h> /* sysconf, close */ +#include <sys/mman.h> /* mmap, mincore */ +/* } */ + +/* { PostgreSQL stuff */ +#include "postgres.h" /* general Postgres declarations */ +#include "utils/rel.h" /* Relation */ +#include "storage/bufmgr.h" +#include "catalog/catalog.h" /* relpath */ +/* } */ + +PG_MODULE_MAGIC; + +void _PG_init(void); + +float4 oscache(Relation, ForkNumber); + +/* + * Module load callback + */ +void +_PG_init(void) +{ + /* Install hook. */ + OSCache_hook = &oscache; +} + +/* + * oscache performs the os cache inspection for the relation. + * It returns the percentage of blocks in OS cache. 
+ */ +float4 +oscache(Relation relation, ForkNumber forkNum) +{ + int segment = 0; + char *relationpath; + char filename[MAXPGPATH]; + int fd; + int64 total_block_disk = 0; + int64 total_block_mem = 0; + + /* OS things */ + int64 pageSize = sysconf(_SC_PAGESIZE); /* Page size */ + register int64 pageIndex; + + relationpath = relpathperm(relation->rd_node, forkNum); + + /* + * For each segment of the relation + */ + snprintf(filename, MAXPGPATH, "%s", relationpath); + while ((fd = open(filename, O_RDONLY)) != -1) + { + // for stat file + struct stat st; + // for mmap file + void *pa = (char *)0; + // for calloc file + unsigned char *vec = (unsigned char *)0; + int64 block_disk = 0; + int64 block_mem = 0; + + if (fstat(fd, &st) == -1) + { + close(fd); + elog(ERROR, "Can not stat object file : %s", +filename); + return 0; + } + + /* + * if file ok + * then process + */ + if (st.st_size != 0) + { + /* number of block in the current file */ + block_disk = st.st_size/pageSize; + + /* TODO We need to split mmap size to be sure (?) to be able to mmap */ + pa = mmap(NULL, st.st_size, PROT_NONE, MAP_SHARED, fd, 0); + if (pa == MAP_FAILED) + { +close(fd); +elog(ERROR, "Can not mmap object file : %s, errno = %i,%s\nThis error can happen if there is not enought space in memory to do the projection. Please mail ced...@2ndquadrant.fr with '[oscache] ENOMEM' as subject.", + filename, errno, strerror(errno)); +return 0; + } + + /* Prepare our vector containing all blocks informatio
Re: [HACKERS] performance-test farm
Tom Lane wrote: There's no such thing as a useful performance test that runs quickly enough to be sane to incorporate in our standard regression tests. To throw a hard number out: I can get a moderately useful performance test on a SELECT-only workload from pgbench in about one minute. That's the bare minimum; stepping up to 5 minutes is really necessary before I'd want to draw any real conclusions. More importantly, a large portion of the time I'd expect regression test runs to be happening with debug/assert on. We've well established this trashes pgbench performance. One of the uglier bits of adding the "performance farm" feature to the buildfarm code was hacking in a whole different set of build options for it. Anyway, what I was envisioning here was that performance farm systems would also execute the standard buildfarm tests, but not the other way around. We don't want performance numbers from some platform that is failing the basic tests. I would just expect that systems running the performance tests would cycle through regression testing much less often, as they might miss a commit because they were running a longer test at the time. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
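For reference, the minute-scale SELECT-only test described above is the sort of thing pgbench's -S mode provides; a sketch of the invocation (the database name and scale factor are arbitrary examples, and this of course needs a running server to point at):

```
# one-time initialization; scale factor 100 is roughly a 1.5GB database
pgbench -i -s 100 bench

# 5 minute SELECT-only run with 8 clients and 4 worker threads
pgbench -S -c 8 -j 4 -T 300 bench
```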
Re: [HACKERS] performance-test farm
Tomas Vondra wrote: Actually I was not aware of how the buildfarm works, all I knew was there's something like that because some of the hackers mention a failed build on the mailing list occasionally. So I guess this is a good opportunity to investigate it a bit ;-) Anyway I'm not sure this would give us the kind of environment we need to do benchmarks ... but it's worth to think of. The idea is that buildfarm systems that are known to have a) reasonable hardware and b) no other concurrent work going on could also do performance tests. The main benefit of this approach is it avoids duplicating all of the system management and source code building work needed for any sort of thing like this; just leverage the buildfarm parts when they solve similar enough problems. Someone has actually done all that already; source code was last sync'd to the build farm master at the end of March: https://github.com/greg2ndQuadrant/client-code By far the #1 thing needed to move this forward from where it's stuck at now is someone willing to dig into the web application side of this. We're collecting useful data. It needs to now be uploaded to the server, saved, and then reports of what happened generated. Eventually graphs of performance results over time will be straightforward to generate. But the whole idea requires that someone else (not Andrew, who has enough to do) sit down and figure out how to extend the web UI with these new elements. I guess we could run a script that collects all those important parameters and then detect changes. Anyway we still need some 'really stable' machines that are not changed at all, to get a long-term baseline. I have several such scripts I use, and know where two very serious ones developed by others are at too. This part is not a problem. If the changes are big enough to matter, they will show up as a difference on the many possible "how is the server configured?" reports, we just need to pick the most reasonable one. 
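One way such a parameter-collection script could detect changes is to reduce the interesting settings to a digest; a hypothetical sketch (the function name and input shape are mine, with the pairs imagined as the output of a query like SELECT name, setting FROM pg_settings WHERE source <> 'default'):

```python
import hashlib

def config_fingerprint(settings):
    """Reduce (name, value) setting pairs to a stable digest, so two
    benchmark runs can be treated as comparable only when it matches."""
    canon = "\n".join("%s=%s" % pair for pair in sorted(settings))
    return hashlib.sha256(canon.encode()).hexdigest()
```

Sorting makes the result independent of row order, and any changed value produces a different fingerprint, which is exactly the sort of difference worth flagging on a report.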
It's a small detail I'm not worried about yet. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Infinity bsearch crash on Windows
Tom Lane wrote: SELECT 'INFINITY'::TIMESTAMP; Hmm ... I bet this is related to the recent reports about ALTER USER VALID UNTIL 'infinity' crashing on Windows. Can the people seeing this get through the regression tests? Perhaps more to the point, what is their setting of TimeZone? What does the pg_timezone_abbrevs view show for them? I must have missed that thread; I think I'm missing one of these lists (pgsql-bugs maybe?). I've cc'd Mark Watson so maybe you can get better responses without me in the middle if needed; for this one, he reports:

Show timezone gives US/Eastern
Select * from pg_timezone_abbrevs returns zero rows

My Linux system that doesn't have this problem is in US/Eastern, too, but I get 189 rows in pg_timezone_abbrevs. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Infinity bsearch crash on Windows
A 9.1Beta1 test report from Richard Broersma (and confirmed on another system by Mark Watson) showed up on pgsql-testers this week at http://archives.postgresql.org/pgsql-testers/2011-05/msg0.php with the following test crashing his Windows server every time:

SELECT 'INFINITY'::TIMESTAMP;

That works fine for me on Linux. Richard chased the error in the logs, which was a generic "you can't touch that memory" one, down to a full stack trace: http://archives.postgresql.org/pgsql-testers/2011-05/msg9.php It looks like it's losing its mind inside of src/backend/utils/adt/datetime.c , specifically at these lines in datebsearch:

3576         while (last >= base)
3577         {
3578             position = base + ((last - base) >> 1);
3579             result = key[0] - position->token[0];

Why crash there only on Windows? Was the problem actually introduced above this part of the code? These are all questions I have no answer for. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Formatting Curmudgeons WAS: MMAP Buffers
Heikki Linnakangas wrote: Well, my first patch was two-phase commit. And I had never even used PostgreSQL before I dived into the source tree and started to work on that. Well, everyone knows you're awesome. A small percentage of the people who write patches end up having the combination of background skills, mindset, and approach to pull something like that off. But there are at least a dozen submissions that start review with "I don't think this will ever work, but I can't even read your malformed patch to be sure" for every one that went like 2PC. If every submitter was a budding Heikki we wouldn't need patch submission guidelines at all. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Formatting Curmudgeons WAS: MMAP Buffers
Josh Berkus wrote: As I don't think we can change this, I think the best answer is to tell people "Don't submit a big patch to PostgreSQL until you've done a few small patches first. You'll regret it". When I last did a talk about getting started writing patches, I had a few people ask me afterwards if I'd ever run into problems with having patch submissions rejected. I said I hadn't. When asked what my secret was, I told them my first serious submission modified exactly one line of code[1]. And *that* I had to defend in regards to its performance impact.[2] Anyway, I think the intro message should be "Don't submit a big patch to PostgreSQL until you've done a small patch and some patch review" instead though. It's both a good way to learn what not to do, and it helps with one of the patch acceptance bottlenecks. The problem is not the process itself, but that there is little documentation of that process, and that much of that documentation does not match the defacto process. Obviously, the onus is on me as much as anyone else to fix this. I know the documentation around all this has improved a lot since then. Unfortunately there are plenty of submissions done by people who never read it. Sometimes it's because people didn't know about it; in others I suspect it was seen but some hard parts ignored because it seemed like too much work. One of these days I'm going to write the "Formatting Curmudgeon Guide to Patch Submission", to give people an idea just how much diff reading and revision a patch should go through in order to keep common issues like spurious whitespace diffs out of it. Submitters can either spend a block of time sweating those details out themselves, or force the job onto a reviewer/committer; they're always there. And every minute it's sitting in someone else's hands who is doing that job instead of reading the real code, the odds of the patch being kicked back go up. 
[1] http://archives.postgresql.org/pgsql-patches/2007-03/msg00553.php [2] http://archives.postgresql.org/pgsql-patches/2007-02/msg00222.php -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why not install pgstattuple by default?
On 05/09/2011 02:31 PM, Robert Haas wrote: I don't think we should be too obstinate about trying to twist the arm of packagers who (as Tom points out) will do whatever they want in spite of us, but the current state of contrib, with all sorts of things of varying type, complexity, and value mixed together cannot possibly be a good thing. I think the idea I'm running with for now means that packagers won't actually have to do anything. I'd expect typical packaging for 9.1 to include share/postgresql/extension from the build results without being more specific. You need to grab 3 files from there to get the plpgsql extension, and I can't imagine any packager listing them all by name. So if I dump the converted contrib extensions to put files under there, and remove them from the contrib build area destination, I suspect they will magically jump from the contrib to the extensions area of the server package at next package build; no packager level changes required. The more I look at this, the less obtrusive of a change it seems to be. Only people who will really notice are users who discover more in the basic server package, and of course committers with backporting to do. Since packaged builds of 9.1 current with beta1 seem to be in short supply still, this theory looks hard to prove just yet. I'm very excited that it's packaging week here however (rolls eyes), so I'll check it myself. I'll incorporate the suggestions made since I posted that test patch and do a bigger round of this next, end to end with an RPM set as the output. It sounds like everyone who has a strong opinion on what this change might look like has sketched a similar looking bikeshed. Once a reasonable implementation is hammered out, I'd rather jump to the big argument between not liking change vs. the advocacy benefits to PostgreSQL of doing this; they are considerable in my mind. 
-- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why not install pgstattuple by default?
On 05/09/2011 03:31 PM, Alvaro Herrera wrote: For executables we already have src/bin. Do we really need a separate place for, say, pg_standby or pg_upgrade? There are really no executables in contrib that I find myself regularly desperate for/angry at because they're not installed as an integral part of the server, the way I regularly find myself cursing some things that are now extensions. The only contrib executable I use often is pgbench, and that's normally early in server life--when it's still possible to get things installed easily most places. And it's something that can be removed when finished, in cases where people are nervous about the contrib package. Situations where pg_standby or pg_upgrade suddenly pop up as emergency needs seem unlikely too, which is also the case with oid2name, pg_test_fsync, pg_archivecleanup, and vacuumlo. I've had that happen with pg_archivecleanup exactly once since it appeared--running out of space and that was the easiest way to make the problem go away immediately and permanently--but since it was on an 8.4 server we had to download the source and build anyway. Also, my experience is that people are not that paranoid about running external binaries, even though they could potentially do harm to the database. Can always do a backup beforehand. But the idea of loading a piece of code that lives in the server all the time freaks them out. The way the word contrib implies (and sometimes is meant to mean) low quality, while stuff that ships with the main server package does not, has been beaten up here for years already. It's only a few cases where that's not fully justified, and the program can easily become an extension, that I feel are really worth changing here. There are 49 directories in contrib/ ; at best maybe 20% of them will ever fall into that category. 
-- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Formatting Curmudgeons WAS: MMAP Buffers
On 05/09/2011 12:06 PM, Andrew Dunstan wrote: The fact that we can do in place upgrades of the data only addresses one pain point in upgrading. Large legacy apps require large retesting efforts when upgrading, often followed by lots more work renovating the code for backwards incompatibilities. This can be a huge cost for what the suits see as little apparent gain, and making them do it more frequently in order to stay current will not win us any friends. I just had a "why a new install on 8.3?" conversation today, and it was all about the application developer not wanting to do QA all over again for a later release. Right now, one of the major drivers for "why upgrade?" has been the performance improvements in 8.3, relative to any older version. The main things pushing happy 8.3 sites to 8.4 or 9.0 that I see are either VACUUM issues (improved with partial vacuum in 8.4) or wanting real-time replication (9.0). I predict many sites that don't want either are likely to sit on 8.3 for a really long time. The community won't be able to offer a compelling reason why smaller sites in particular should go through the QA an upgrade requires. The fact that the app QA time is now the main driver--not the dump and reload time--is good, because it does make it easier for the people with the biggest data sets to move. They're the ones that need the newer versions the most anyway, and in that regard having in-place upgrade start showing up as of 8.3 was really just in time. I think 8.3 is going to be one of those releases like 7.4, where people just keep running it forever. At least shortening the upgrade path has made that concern a little bit better. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] improvements to pgtune
ased, while working on simpler programs that don't aim so high leads to software that ships to the world and finds users. The only reason pgtune is now available in packaged form on multiple operating systems is that I ignored all advice about aiming for a complicated tool and instead wrote a really simple one. That was hard enough to finish. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch for new feature: Buffer Cache Hibernation
Mitsuru IWASAKI wrote: the patch is available at: http://people.freebsd.org/~iwasaki/postgres/buffer-cache-hibernation-postgresql-20110508.patch We can't accept patches just based on a pointer to a web site. Please e-mail this to the mailing list so that it can be considered a submission under the project's licensing terms. I hope this would be committable and the final version. PostgreSQL has high standards for code submissions. Extremely few submissions are committed without significant revisions to them based on code review. So far you've gotten a first round of high-level design review; there are several additional steps before something is considered for a commit. The whole process is outlined at http://wiki.postgresql.org/wiki/Submitting_a_Patch From a couple of minutes of reading the patch, the first things that pop out as problems are:

-All of the ControlFile -> controlFile renaming has added a larger difference to ReadControlFile than I would consider ideal.

-Touching StrategyControl is not something this patch should be doing.

-I don't think your justification ("debugging or portability") for keeping around your original code in here is going to be sufficient to do so.

-This should not be named enable_buffer_cache_hibernation. That very large diff you ended up with in the regression tests is because all of the settings named enable_* are optimizer control settings. Using the name "buffer_cache_hibernation" instead would make a better starting point.

From a bigger picture perspective, this really hasn't addressed any of my comments about shared_buffers only being the beginning of the useful cache state to worry about here. I'd at least like the solution to the buffer cache save/restore to have a plan for how it might address that too one day. This project is also picky about only committing code that fits into the long-term picture for desired features. Having a working example of a server-side feature doing cache storage and restoration is helpful though. 
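The shared_buffers part of that cache state can already be captured from SQL with the pg_buffercache extension; a rough sketch of such a capture query (my own illustration of the idea, not part of the patch):

```sql
-- Which blocks of the current database are buffered right now; a
-- hibernation scheme could write this out and re-read those blocks later
SELECT b.relfilenode, b.relforknumber, b.relblocknumber
FROM pg_buffercache b
WHERE b.reldatabase = (SELECT oid FROM pg_database
                       WHERE datname = current_database())
ORDER BY 1, 2, 3;
```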
Don't think your work here is unappreciated--it is. Getting this feature added is just a harder problem than what you've done so far. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why not install pgstattuple by default?
Attached patch is a first cut at what moving one contrib module (in this case pg_buffercache) to a new directory structure might look like. The idea is that src/extension could become a place for "first-class" extensions to live. Those are ones the community is committed to providing in core, but are just better implemented as extensions than in-database functions, for reasons that include security. This idea has been shared by a lot of people for a while; the only problem is that it wasn't really practical to implement cleanly until the extensions code hit. I think it is now, and this attempts to prove it. Since patches involving file renaming are clunky, the changes might be easier to see at https://github.com/greg2ndQuadrant/postgres/commit/507923e21e963c873a84f1b850d64e895776574f where I just pushed this change too. The install step for the modules looks like this now:

gsmith@grace:~/pgwork/src/move-contrib/src/extension/pg_buffercache$ make install
/bin/mkdir -p '/home/gsmith/pgwork/inst/move-contrib/lib/postgresql'
/bin/mkdir -p '/home/gsmith/pgwork/inst/move-contrib/share/postgresql/extension'
/bin/sh ../../../config/install-sh -c -m 755 pg_buffercache.so '/home/gsmith/pgwork/inst/move-contrib/lib/postgresql/pg_buffercache.so'
/bin/sh ../../../config/install-sh -c -m 644 ./pg_buffercache.control '/home/gsmith/pgwork/inst/move-contrib/share/postgresql/extension/'
/bin/sh ../../../config/install-sh -c -m 644 ./pg_buffercache--1.0.sql ./pg_buffercache--unpackaged--1.0.sql '/home/gsmith/pgwork/inst/move-contrib/share/postgresql/extension/'

$ psql -c "create extension pg_buffercache"
CREATE EXTENSION

The only clunky bit I wasn't really happy with is the amount of code duplication that comes from having a src/extension/Makefile that looks almost, but not quite, identical to contrib/Makefile. The rest of the changes don't seem too bad to me, and even that's really only 36 lines that aren't touched often. 
Yes, the paths are different, so backports won't happen without an extra step. But the code changes required were easier than I was expecting, due to the general good modularity of the extensions infrastructure. So long as the result ends up in share/postgresql/extension/ , whether they started in contrib/ or src/extension/ doesn't really matter to CREATE EXTENSION. But having them broken out this way makes it easy for the default Makefile to build and install them all. (I recognize I didn't do that last step yet though) I'll happily go convert pgstattuple and the rest of the internal diagnostics modules to this scheme, and do the doc cleanups, this upcoming week if it means I'll be able to use those things without installing all of contrib one day. Ditto for proposing RPM and Debian packaging changes that match them. All that work will get paid back the first time I don't have to fill out a bunch of paperwork (again) at a customer site justifying why they need to install the contrib [RPM|deb] package (which has some scary stuff in it) on all their servers, just so I can get some bloat or buffer inspection module. 
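For anyone who hasn't looked at the patch, the near-duplicate Makefile mentioned above amounts to something like this hypothetical sketch of src/extension/Makefile, reusing the same recursion machinery contrib/Makefile does (only one module shown):

```make
# src/extension/Makefile (sketch)
subdir = src/extension
top_builddir = ../..
include $(top_builddir)/src/Makefile.global

SUBDIRS = pg_buffercache

$(recurse)
```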
-- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us diff --git a/contrib/Makefile b/contrib/Makefile index 6967767..04cf330 100644 --- a/contrib/Makefile +++ b/contrib/Makefile @@ -30,7 +30,6 @@ SUBDIRS = \ pageinspect \ passwordcheck \ pg_archivecleanup \ - pg_buffercache \ pg_freespacemap \ pg_standby \ pg_stat_statements \ diff --git a/contrib/pg_buffercache/Makefile b/contrib/pg_buffercache/Makefile deleted file mode 100644 index 323c0ac..000 --- a/contrib/pg_buffercache/Makefile +++ /dev/null @@ -1,18 +0,0 @@ -# contrib/pg_buffercache/Makefile - -MODULE_big = pg_buffercache -OBJS = pg_buffercache_pages.o - -EXTENSION = pg_buffercache -DATA = pg_buffercache--1.0.sql pg_buffercache--unpackaged--1.0.sql - -ifdef USE_PGXS -PG_CONFIG = pg_config -PGXS := $(shell $(PG_CONFIG) --pgxs) -include $(PGXS) -else -subdir = contrib/pg_buffercache -top_builddir = ../.. -include $(top_builddir)/src/Makefile.global -include $(top_srcdir)/contrib/contrib-global.mk -endif diff --git a/contrib/pg_buffercache/pg_buffercache--1.0.sql b/contrib/pg_buffercache/pg_buffercache--1.0.sql deleted file mode 100644 index 9407d21..000 --- a/contrib/pg_buffercache/pg_buffercache--1.0.sql +++ /dev/null @@ -1,17 +0,0 @@ -/* contrib/pg_buffercache/pg_buffercache--1.0.sql */ - --- Register the function. -CREATE FUNCTION pg_buffercache_pages() -RETURNS SETOF RECORD -AS 'MODULE_PATHNAME', 'pg_buffercache_pages' -LANGUAGE C; - --- Create a view for convenient access. -CREATE VIEW pg_buffercache AS - SELECT P.* FROM pg_buffercache_pages() AS P - (bufferid integer, relfilenode oid, reltablespace oid, reldatabase oid, - r
Re: [HACKERS] Why not install pgstattuple by default?
On 05/06/2011 04:06 PM, Tom Lane wrote: FWIW, I did move pg_config from -devel to the "main" (really client) postgresql package in Fedora, as of 9.0. That will ensure it's present in either client or server installations. Eventually that packaging will reach RHEL ... We should make sure that the PGDG packages adopt that for 9.1 then, so it starts catching on more. Unless Devrim has changed that since I last installed an RPM set, in 9.0 it's still in the same place:

$ rpm -qf /usr/pgsql-9.0/bin/pg_config
postgresql90-devel-9.0.2-2PGDG.rhel5

While Peter's question about whether it's really all that useful is reasonable, I'd at least like to get a better error message when you don't have everything needed to compile extensions. I think the shortest path to that is making pg_config more likely to be installed, then checking whether the file "pg_config --pgxs" references exists. I'll see if I can turn that idea into an actual change to propose. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
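To make the check described in the message above concrete, here is a rough sketch of the logic. The function name and messages are made up for illustration; this is not from any actual patch:

```python
import os

def pgxs_problem(pg_config, pgxs):
    """Given the path to pg_config (None if it isn't on the PATH) and
    the PGXS Makefile path it reports via --pgxs, return an error
    message describing what's missing, or None if compiling
    extensions looks possible."""
    if pg_config is None:
        return "pg_config not found; install the client or -devel package"
    if pgxs is None or not os.path.exists(pgxs):
        return "PGXS Makefile %r is missing; install the -devel package" % pgxs
    return None
```

In practice the inputs would come from looking up pg_config on the PATH and running `pg_config --pgxs`; the point is just that both pieces have to exist before PGXS builds can work, and each missing piece deserves its own error message.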
Re: [HACKERS] Why not install pgstattuple by default?
On 05/07/2011 12:42 PM, Peter Eisentraut wrote: On fre, 2011-05-06 at 14:32 -0400, Greg Smith wrote: Given the other improvements in being able to build extensions in 9.1, we really should push packagers to move pg_config from the PostgreSQL development package into the main one starting in that version. I've gotten bit by this plenty of times. Do you need pg_config to install extensions? No, but you still need it to build them. PGXN is a source code distribution method, not a binary one. It presumes users can build modules they download using PGXS. No pg_config, no working PGXS, no working PGXN. For such a small binary to ripple out to that impact is bad. The repmgr program we distribute has the same problem, so I've been getting first-hand reports recently of just how many people are likely to run into this. You have to install postgresql-devel with RPM, and on Debian the very non-obvious postgresql-server-dev-$version. Anyway, I didn't want to hijack this thread beyond pointing out that if there is any package reshuffling that happens for contrib changes, it should check for and resolve this problem too. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why not install pgstattuple by default?
On 05/06/2011 05:58 PM, Josh Berkus wrote: Yeah, I wasn't thinking of including all of contrib. There's a lot of reasons not to do that. I was asking about pgstattuple in particular, since it's: (a) small (b) has no external dependencies (c) adds no stability risk or performance overhead (d) is usually needed on production systems when it's needed at all It's possible that we have one or two other diagnostic utilities which meet the above profile. pageinspect, maybe? I use pgstattuple, pageinspect, pg_freespacemap, and pg_buffercache regularly enough that I wish they were more common. Throw in pgrowlocks and you've got the whole group Robert put into the debug set. It makes me sad every time I finish a utility using one of these and realize I'll have to include the whole "make sure you have the contrib modules installed" disclaimer in its documentation again. These are the only ones I'd care about moving into a more likely place. The rest of the contrib modules are the sort where if you need them, you realize that early and get them installed. These are different by virtue of their need popping up most often during emergencies. The fact that I believe they all match the low impact criteria too makes it even easier to consider. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch for new feature: Buffer Cache Hibernation
On 05/05/2011 05:06 AM, Mitsuru IWASAKI wrote: In summary, PgFincore's target is File System Buffer Cache, Buffer Cache Hibernation's target is DB Buffer Cache(shared buffers). Right. The thing to realize is that shared_buffers is becoming a smaller fraction of the total RAM used by the database every year. On Windows it's been stuck at useful settings being less than 512MB for a while now. And on UNIX systems, around 8GB seems to be the effective upper limit. Best case, shared_buffers is only going to be around 25% of total RAM; worst case, you might have a Windows server with 64GB of RAM where shared_buffers is less than 1% of total RAM. There's nothing wrong with the general idea you're suggesting. It's just only targeting a small (and shrinking) subset of the real problem here. Rebuilding cache state starts with shared_buffers, but that's not enough of the problem to be an effective tweak on many systems. I think that all the complexity with CRCs etc. is unlikely to lead anywhere too, and those two issues are not completely unrelated. The simplest, safest thing here is the right way to approach this, not the most complicated one, and a simpler format might add some flexibility here to reload more cache state too. The bottleneck on reloading the cache state is reading everything from disk. Trying to micro-optimize any other part of that is moving in the wrong direction to me. I doubt you'll ever measure a useful benefit that overcomes the expense of maintaining the code. And you seem to be moving to where someone can't restore cache state when they change shared_buffers. A simpler implementation might still work in that situation: if shared_buffers shrinks, reload until you run out of buffers; if it grows, reload until you're done with the original size. 
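That tolerant reload policy is simple enough to sketch in a few lines. This is only an illustration of the idea, with made-up names, not code from the patch:

```python
def reload_buffers(saved_tags, shared_buffers):
    """Reload as much saved cache state as fits in the current pool.
    If shared_buffers shrank since the snapshot was taken, stop when
    the pool fills; if it grew, stop when the snapshot runs out."""
    reloaded = []
    for tag in saved_tags:
        if len(reloaded) >= shared_buffers:
            break  # pool is full: silently drop the rest of the snapshot
        reloaded.append(tag)  # stands in for reading the block back in
    return reloaded
```

Either way the restore degrades gracefully instead of failing outright when the setting has changed, which is the flexibility a simpler on-disk format buys.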
-- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why not install pgstattuple by default?
Christopher Browne wrote: I'm getting "paper cuts" quite a bit these days over the differences between what different packaging systems decide to install. The one *I* get notably bit on, of late, is that I have written code that expects to have pg_config to do some degree of self-discovery, only to find production folk complaining that they only have "psql" available in their environment. Given the other improvements in being able to build extensions in 9.1, we really should push packagers to move pg_config from the PostgreSQL development package into the main one starting in that version. I've gotten bit by this plenty of times. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Compiling a PostgreSQL 7.3.2 project with Eclipse
Krešimir Križanović wrote: However, when I compile and run the project, in the Eclipse console I get: POSTGRES backend interactive interface $Revision: 1.115 $ $Date: 2006/02/06 01:19:46 $ I don’t know what to do with this backend interface. I would like to get a postmaster running and to connect to a database with psql. However, when I try to start psql, it says that there is no postmaster running. An example of a session with that interface is at http://archives.postgresql.org/pgsql-hackers/2000-01/msg01471.php ; you may find it useful at some point. Your problem is caused by a change made to PostgreSQL's naming convention after the 7.3 fork you're using. In earlier versions, "postgres" meant start the server in single user mode: http://www.postgresql.org/docs/7.3/static/app-postgres.html While "postmaster" started it as a proper server: http://www.postgresql.org/docs/7.3/static/app-postmaster.html In modern versions, they are the same thing. The Eclipse example uses "postgres", which starts the regular server now, but in 7.3 only started single user mode. Change where you run the program to use "postmaster" instead and it should work more like what you're expecting. Doing something useful with the TelegraphCQ code is probably going to take you a while. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch for new feature: Buffer Cache Hibernation
Alvaro Herrera wrote: As for gain, I have heard of test setups requiring hours of runtime in order to prime the buffer cache. And production ones too. I have multiple customers where a server restart is almost a planned multi-hour downtime. The system may be back up, but for a couple of hours performance is so terrible it's barely usable. You can watch the MB/s ramp up as the more random data fills in over time; getting that taken care of in a larger block more amenable to elevator sorting would be a huge help. I never bothered with this particular idea though because shared_buffers is only a portion of the important data. Cedric's pgfincore code digs into the OS cache, too, which can then save enough to be really useful here. And that's already got a snapshot/restore feature. The slides at http://www.pgcon.org/2010/schedule/events/261.en.html have a useful intro to that; pages 30 through 34 are the neat ones. That provides some other neat APIs for preloading popular data into cache too. I'd rather work on getting something like that into core than adding something that only targets shared_buffers. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Predicate locking
Kevin Grittner wrote: Check where the plan goes from a table scan to an indexed access. Also look at what is showing for SIRead locks in pg_locks as you go. Between those two bits of information, it should become apparent. OK, so this doesn't look to be an index lock related thing at all here. Updated test case does this to create the table and show some additional state: drop table t; create table t (id bigint, value bigint); insert into t(id,value) (select s,1 from generate_series(1,348) as s); create index t_idx on t(id); begin transaction; set transaction isolation level serializable; explain analyze select * from t where id = 2; select pid,locktype,relation::regclass,page,tuple from pg_locks where mode='SIReadLock'; insert into t (id, value) values (-2, 1); select pid,locktype,relation::regclass,page,tuple from pg_locks where mode='SIReadLock'; Do the same thing as before on the second process: begin transaction; set transaction isolation level serializable; select * from t where id = 3; insert into t (id, value) values (-3, 0); commit; Then return to the first client to commit. When I execute that with 348 records, the case that fails, it looks like this:

gsmith=# explain analyze select * from t where id = 2;
                                        QUERY PLAN
-------------------------------------------------------------------------------------------
 Seq Scan on t  (cost=0.00..6.35 rows=2 width=16) (actual time=0.106..0.286 rows=1 loops=1)
   Filter: (id = 2)
 Total runtime: 0.345 ms
(3 rows)

gsmith=# select pid,locktype,relation::regclass,page,tuple from pg_locks where mode='SIReadLock';
 pid  | locktype | relation | page | tuple
------+----------+----------+------+-------
 1495 | relation | t        |      |

So it's actually grabbing a lock on the entire table in that situation. The other client does the same thing, and they collide with the described serialization failure. 
The minute I try that with a table that is 349 rows instead, it switches plans:

gsmith=# explain analyze select * from t where id = 2;
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on t  (cost=4.27..6.29 rows=2 width=16) (actual time=0.169..0.171 rows=1 loops=1)
   Recheck Cond: (id = 2)
   ->  Bitmap Index Scan on t_idx  (cost=0.00..4.27 rows=2 width=0) (actual time=0.144..0.144 rows=1 loops=1)
         Index Cond: (id = 2)
 Total runtime: 0.270 ms
(5 rows)

gsmith=# select pid,locktype,relation::regclass,page,tuple from pg_locks where mode='SIReadLock';
 pid  | locktype | relation | page | tuple
------+----------+----------+------+-------
 1874 | page     | t_idx    |    1 |
 1874 | tuple    | t        |    0 |     2
(2 rows)

Grabbing a lock on the index page and the row, as Dan explained it would. This actually eliminates this particular serialization failure altogether here though, even with these still on the same table and index page. So the root problem with Vlad's test isn't the index lock at all; it's heavy locking from the sequential scan that's executing on the trivial cases. If he expands his tests to use a larger amount of data, such that the plan switches to a realistic one, his results with the new serialization mode may very well be more satisfying. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Predicate locking
Kevin Grittner wrote: I don't think Vlad is being unreasonable here; he's provided a test case demonstrating the behavior he'd like to see, and shown it doesn't work as expected. ... on a toy table with contrived values. How different is this from the often-asked question about why a query against a four-line table is not using the index they expect, and how can we expect it to scale if it doesn't? It's not, but in that case I've been known to show someone that the behavior they're seeing doesn't happen on a larger table. My point was just that no one has really done that here yet: provided an example showing SSI serialization working as a substitute for predicate locking in this sort of use case. I trust that the theory is sound here, and yet I'd still like to see that demonstrated. This is why I launched into making a less trivial test case. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Predicate locking
Dan Ports wrote: Yes, you're right -- the current implementation of SSI only locks indexes at the granularity of index pages. So although those transactions don't actually access the same records, they're detected as a conflict because they're on the same index page. Let's try to demonstrate that with an update to Vlad's example. Run this on a first client to generate the same type of table, but with an easy to vary number of rows in it: drop table t; create table t (id bigint, value bigint); insert into t(id,value) (select s,1 from generate_series(1,348) as s); create index t_idx on t(id); begin transaction; set transaction isolation level serializable; select * from t where id = 2; insert into t (id, value) values (-2, 1); Execute this on the second client: begin transaction; set transaction isolation level serializable; select * from t where id = 3; insert into t (id, value) values (-3, 0); commit; And then return to the first one to run COMMIT. I'm getting a serialization failure in that case. However, if I increase the generate_series to create 349 rows (or more) instead, it works. I don't see where it crosses a page boundary in table or index size going from 348 to 349, so I'm not sure why I'm seeing a change happening there. In both cases, there's only one non-header index block involved. I don't think Vlad is being unreasonable here; he's provided a test case demonstrating the behavior he'd like to see, and shown it doesn't work as expected. If we can prove that test does work on non-trivial sized tables, and that it only suffers from to-be-optimized broader locks than necessary, that would make a more compelling argument against the need for proper predicate locks. I don't fully understand why this attempt I tried to do that is working the way it does though. 
-- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] improvements to pgtune
Daniel Farina wrote: It seems like in general it lacks a feedback mechanism to figure out settings from workloads, instead relying on Greg Smith's sizable experience to do some arithmetic and get you off the ground in a number of common cases. To credit appropriately, the model used right now actually originated with a Josh Berkus spreadsheet, from before I was doing this sort of work full-time. That's held up pretty well, but it doesn't fully reflect how I do things nowadays. The recent realization that pgtune is actually shipping as a package for Debian/Ubuntu has made me realize this is a much higher profile project now, one that I should revisit doing a better job on. Every time I've gotten pulled into discussions of setting parameters based on live monitoring, it's turned into a giant black hole--absorbs a lot of energy, nothing useful escapes from it. I credit completely ignoring that idea altogether, and using the simplest possible static settings instead, as one reason I managed to ship code here that people find useful. I'm not closed to the idea, just not optimistic it will lead anywhere useful. That makes it hard to work on when there are so many obvious things guaranteed to improve the program that could be done instead. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] improvements to pgtune
Shiv wrote: On the program I hope to learn as much about professional software engineering principles as PostgreSQL. My project is aimed towards extending and hopefully improving upon pgtune. If any of you have some ideas or thoughts to share. I am all ears!! Well, first step on the software engineering side is to get a copy of the code in a form you can modify. I'd recommend grabbing it from https://github.com/gregs1104/pgtune ; while there is a copy of the program on git.postgresql.org, it's easier to work with the one on github instead. I can push updates over to the copy on postgresql.org easily enough, and that way you don't have to worry about getting an account on that server. There's a long list of suggested improvements to make at https://github.com/gregs1104/pgtune/blob/master/TODO Where I would recommend getting started is doing some of the small items on there, some of which I have already put comments into the code about but just not finished yet. Some examples: -Validate against min/max -Show original value in output -Limit shared memory use on Windows (see notes on shared_buffers at http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server for more information) -Look for postgresql.conf file using PGDATA environment variable -Look for settings files based on path of the pgtune executable -Save a settings reference files for newer versions of PostgreSQL (right now I only target 8.4) and allow passing in the version you're configuring. A common mistake made by GSOC students is to dive right in to trying to make big changes. You'll be more successful if you get practice at things like preparing and sharing patches on smaller changes first. At the next level, there are a few larger features that I would consider valuable that are not really addressed by the program yet: -Estimate how much shared memory is used by the combination of settings. 
See Table 17-2 at http://www.postgresql.org/docs/9.0/static/kernel-resources.html ; those numbers aren't perfect, and improving that table is its own useful project. But it gives an idea how they fit together. I have some notes at the end of the TODO file on how I think the information needed to produce this needs to be passed around the inside of pgtune. -Use that estimate to produce a sysctl.conf file for one platform; Linux is the easiest one to start with. I've attached a prototype showing how to do that, written in bash. -Write a Python-TK or web-based front-end for the program. Now that I know someone is going to work on this program again, I'll see what I can do to clean some parts of it up. There are a couple of things it's easier for me to just fix rather than to describe, like the way I really want to change how it adds comments to the settings it changes. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us #!/bin/bash # Output lines suitable for sysctl configuration based # on total amount of RAM on the system. The output # will allow up to 50% of physical memory to be allocated # into shared memory. # On Linux, you can use it as follows (as root): # # ./shmsetup >> /etc/sysctl.conf # sysctl -p # Early FreeBSD versions do not support the sysconf interface # used here. The exact version where this works hasn't # been confirmed yet. 
page_size=`getconf PAGE_SIZE` phys_pages=`getconf _PHYS_PAGES` if [ -z "$page_size" ]; then echo Error: cannot determine page size exit 1 fi if [ -z "$phys_pages" ]; then echo Error: cannot determine number of memory pages exit 2 fi shmall=`expr $phys_pages / 2` shmmax=`expr $shmall \* $page_size` echo \# Maximum shared segment size in bytes echo kernel.shmmax = $shmmax echo \# Maximum number of shared memory segments in pages echo kernel.shmall = $shmall -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
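Since pgtune itself is written in Python, the arithmetic from the shell prototype above could be folded into it directly. A rough equivalent of the calculation (assuming Linux-style sysconf values for the inputs):

```python
import os

def shared_memory_sysctls(phys_pages, page_size):
    """Mirror the shmsetup shell prototype: allow up to 50% of
    physical RAM to be allocated as shared memory."""
    shmall = phys_pages // 2       # maximum total shared memory, in pages
    shmmax = shmall * page_size    # maximum single segment, in bytes
    return {"kernel.shmall": shmall, "kernel.shmmax": shmmax}

# On a live Linux system the inputs would come from sysconf:
#   shared_memory_sysctls(os.sysconf("SC_PHYS_PAGES"), os.sysconf("SC_PAGE_SIZE"))
```

Keeping the computation as a pure function of (phys_pages, page_size) also makes it trivial to unit test against known machine sizes, which the shell version can't easily do.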
Re: [HACKERS] Improving the memory allocator
On 04/25/2011 05:45 PM, Andres Freund wrote: The profile I used in this case was: pgbench -h /tmp/ -p5433 -s 30 pgbench -S -T 20 I'd suggest collecting data from running this with "-M prepared" at some point too, so that you can get a basic idea which of these allocations are avoided when using prepared statements. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] branching for 9.2devel
On 04/25/2011 02:26 PM, Josh Berkus wrote: Overall, I think the advantages to a faster/shorter CF cycle outweigh the disadvantages enough to make it at least worth trying. I'm willing to run the first 1-week CF, as well as several of the others during the 9.2 cycle to try and make it work. It will be interesting to see if it's even possible to get all the patches assigned a reviewer in a week. The only idea I've come up with for making this idea more feasible is to really buckle down on the idea that all submitters should also be reviewing. I would consider a fair policy to say that anyone who doesn't volunteer to review someone else's patch gets nudged toward the bottom of the reviewer priority list. Didn't offer to review someone else's patch? Don't be surprised if you get bumped out of a week long 'fest. This helps with two problems. It helps fill in short-term reviewers, and in the long-term it makes submitters more competent. The way things are setup now, anyone who submits a patch without having done a review first is very likely to get their patch bounced back; the odds of getting everything right without that background are low. Ideally submitters might even start fixing their own patches without reviewer prompting, once they get into someone else's and realize what they didn't do. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] fsync reliability
On 04/23/2011 09:58 AM, Matthew Woodcraft wrote: As far as I can make out, the current situation is that this fix (the auto_da_alloc mount option) doesn't work as advertised, and the ext4 maintainers are not treating this as a bug. See https://bugzilla.kernel.org/show_bug.cgi?id=15910 I agree with the resolution that this isn't a bug. As pointed out there, XFS does the same thing, and this behavior isn't going away any time soon. Leaving behind zero-length files in situations where developers tried to optimize away a necessary fsync happens. Here's the part where the submitter goes wrong: "We first added a fsync() call for each extracted file. But scattered fsyncs resulted in a massive performance degradation during package installation (factor 10 or more, some reported that it took over an hour to unpack a linux-headers-* package!) In order to reduce the I/O performance degradation, fsync calls were deferred..." Stop right there; the slow path was the only one that had any hope of being correct. It can actually slow things by a factor of 100X or more, worst-case. "So, we currently have the choice between filesystem corruption or major performance loss": yes, you do. Writing files is tricky, and it can either be slow or safe. If you're going to avoid even trying to enforce the right thing here, you're really going to get burned. It's unfortunate that so many people have gotten used to the speed of the configuration that's been common for a while now, ext3 with cheap hard drives: all writes are cached unsafely, but the filesystem resists a few bad behaviors. Much of the struggle where people say "this is so much slower, I won't put up with it" and try to code around it is futile, and it's hard to separate out the attempts to find such optimizations from the legitimate complaints. Anyway, you're right to point out that the filesystem is not necessarily going to save anyone from some of the tricky rename situations even with the improvements made to delayed allocation. 
They've fixed some of the worst behavior of the earlier implementation, but there are still potential issues in that area it seems. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] fsync reliability
On 04/24/2011 10:06 PM, Daniel Farina wrote: On Thu, Apr 21, 2011 at 8:51 PM, Greg Smith wrote: There's still the "fsync'd a data block but not the directory entry yet" issue as fall-out from this too. Why doesn't PostgreSQL run into this problem? Because the exact code sequence used is this one:

open
write
fsync
close

And Linux shouldn't ever screw that up, or the similar rename path. Here's what the close man page says, from http://linux.die.net/man/2/close : Theodore Ts'o addresses this *exact* sequence of events, and suggests if you want that rename to definitely stick that you must fsync the directory: http://www.linuxfoundation.org/news-media/blogs/browse/2009/03/don%E2%80%99t-fear-fsync Not exactly. That's talking about the sequence used for creating a file, plus a rename. When new WAL files are being created, I believe the ugly part of this is avoided. The path when WAL files are recycled using rename does seem to be the one with the most likely edge case. The difficult case Ts'o's discussion is trying to satisfy involves creating a new file and then swapping it for an old one atomically. PostgreSQL never does that exactly. It creates new files, pads them with zeros, and then starts writing to them; it also renames old files that are already of the correct length. Combined with the fact that there are always fsyncs after writes to the files, this case really isn't exactly the same as any of the others people are complaining about. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
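For reference, the create-then-rename pattern discussed in the message above (fsync the file, rename it, then fsync the containing directory so the new directory entry is durable too) looks like this in outline. This is a generic sketch of the technique, not the actual WAL recycling code:

```python
import os

def durable_rename(src, dst):
    """Atomically replace dst with src and make the result crash-safe:
    flush the file contents, do the rename, then flush the directory
    so the updated directory entry reaches disk as well."""
    fd = os.open(src, os.O_RDONLY)
    try:
        os.fsync(fd)        # file contents are on disk
    finally:
        os.close(fd)
    os.rename(src, dst)     # atomic swap of the names
    dirfd = os.open(os.path.dirname(dst) or ".", os.O_RDONLY)
    try:
        os.fsync(dirfd)     # directory entry is on disk (works on Linux)
    finally:
        os.close(dirfd)
```

Skipping that final directory fsync is exactly the gap being debated: after a crash, the file's data may be durable while the rename that made it visible under its new name is not.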
Re: [HACKERS] fsync reliability
On 04/22/2011 09:32 AM, Simon Riggs wrote: OK, that's good, but ISTM we still have a hole during RemoveOldXlogFiles() where we don't fsync or open/close the file, just rename it. This is also something that many applications rely upon working as hoped for here, even though it's not technically part of POSIX. Early versions of ext4 broke that, and it caused a giant outcry of complaints. http://www.h-online.com/open/news/item/Ext4-data-loss-explanations-and-workarounds-740671.html has a good summary. This was broken on ext4 from around 2.6.28 to 2.6.30, but the fix for it was so demanded that it's even been ported by the relatively lazy distributions to their 2.6.28/2.6.29 kernels. There may be a small window for metadata issues here if you've put the WAL on ext2 and there's a crash in the middle of rename. That factors into why any suggestions I make about using ext2 come with a load of warnings about the risk of not journaling. It's hard to predict every type of issue that fsck might force you to come to terms with after a crash on ext2, and if there was a problem with this path I'd expect it to show up as something to be reconciled then. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] fsync reliability
Simon Riggs wrote: We do issue fsync and then close, but only when we switch log files. We don't do that as part of the normal commit path. Since all these files are zero-filled before use, the space is allocated for them, and the remaining important filesystem layout metadata gets flushed during the close. The only metadata that changes after that--things like the last access time--isn't important to the WAL functioning. So the metadata doesn't need to be updated after a normal commit; it's already there. There are two main risks when crashing while fsync is in the middle of executing a push out to physical storage: torn pages due to partial data writes, and other out of order writes. The only filesystems where this isn't true are the copy on write ones, where the blocks move around on disk too. But those all have their own more careful guarantees about metadata too. The issue you raise above where "fsync is not safe for Write Ahead Logging" doesn't sound good. I don't think what you've said has fully addressed that yet. We could replace the commit path with O_DIRECT and physically order the data blocks, but I would guess the code path to durable storage has way too many bits of code tweaking it for me to feel happy that was worth it. As far as I can tell the CRC is sufficient protection against that. All of the data being torn up here is data that hasn't really been committed yet. Once you trust that the metadata problem isn't real, reordered writes are only going to destroy things that are in the middle of being flushed to disk. A synchronous commit mangled this way will be rolled back regardless because it never really finished (fsync didn't return); an asynchronous one was never guaranteed to be on disk. On many older Linux systems O_DIRECT is a less reliable code path than write/fsync is, so you're right that isn't necessarily a useful step forward. 
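The CRC argument can be seen in miniature: replay walks the log and stops at the first record whose checksum doesn't match, so a torn or reordered tail is simply never applied. A toy model using zlib.crc32 as a stand-in (the real WAL record format and CRC algorithm differ):

```python
import zlib

def make_record(payload):
    """Prefix a payload with its CRC32 (4 bytes, big-endian)."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload

def replay(records):
    """Apply records in order, treating the first CRC mismatch as
    end-of-log: a torn write at the tail only mangles records that
    were never acknowledged as committed."""
    applied = []
    for rec in records:
        crc, payload = rec[:4], rec[4:]
        if zlib.crc32(payload).to_bytes(4, "big") != crc:
            break  # partial/garbled record detected; stop replay cleanly
        applied.append(payload)
    return applied
```

The intact prefix of the log is recovered; everything at or past the damage is discarded, which is safe precisely because fsync never returned for those records.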
-- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Formatting Curmudgeons WAS: MMAP Buffers
Andrew Dunstan wrote: Personally, I want pgindent installed in /usr/local/ or similar. That way I can have multiple trees and it will work in all of them without my having to build it for each. What I don't want is for the installed patched BSD indent to conflict with the system's indent, which is why I renamed it. If you still think that's a barrier to easy use, then I think we need a way to provide hooks in the makefiles for specifying the install location, so we can both be satisfied. I don't think there's a conflict here, because the sort of uses I'm worried about don't want to install the thing at all; just to run it. I don't really care what "make install" does because I never intend to run it; dumping into /usr/local is completely reasonable for the people who do. Since there's no script I know of other than your prototype, I don't think repackaging is likely to break anything. That makes it worth doing *now* rather than later. But frankly, I'd rather do without an extra script if possible. Fine, the renaming bit I'm not really opposed to. The odds that anyone using this thing isn't reading this message exchange are pretty low. There is the documentation backport issue if you make any serious change to it, though. Maybe put the new version in another location, leave the old one where it is? There's a fair number of steps to this though. It's possible to automate them all such that running the program is trivial. I don't know how we'd ever get that same ease of use without some sort of scripting for the whole process. Could probably do it in a makefile instead, but I don't know if that's really any better. The intersection between people who want to run this and people who don't have bash available is pretty slim, I think. I might re-write in Perl to make it more portable, but I think that will be at the expense of making it harder for people to tweak if it doesn't work out of the box. More code, too.
-- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench \for or similar loop
Kevin Grittner wrote: I'm not clear on exactly what you're proposing there, but the thing I've considered doing is having threads to try to keep a FIFO queue populated with a configurable transaction mix, while a configurable number of worker threads pull those transactions off the queue and... This is like the beginning of an advertisement for how Tsung is useful for simulating complicated workloads. The thought of growing pgbench to reach that level of capabilities makes my head hurt. When faced with this same issue, the sysbench team decided to embed Lua as their scripting language; sample scripts: http://bazaar.launchpad.net/~sysbench-developers/sysbench/0.5/files/head:/sysbench/tests/db/ This isn't very well known because the whole MySQL community fracturing has impacted their ability to actually release this overhaul--seems like they spend all their time just trying to add support for each new engine of the month. I don't even like Lua, yet this still seems like a much better idea than trying to build on top of the existing pgbench codebase. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench \for or similar loop
Alvaro Herrera wrote: Why do we have pgbench at all in the first place? Surely we could rewrite it in plpgsql with proper stored procedures. pgbench gives you a driver program with the following useful properties:

1) Multiple processes are spawned and each gets its own connection
2) A time/transaction limit is enforced across all of the connections at once
3) Timing information is written to a client-side log file
4) The work of running the clients can happen on a remote system, so that it's possible to just test the server-side performance
5) The program is similar enough to any other regular client, using the standard libpq interface, that connection-related overhead should be similar to a real workload.

All of those have some challenges before you could duplicate them in a stored procedure context. My opinion of this feature is similar to the one Aiden already expressed: there's already so many ways to do this sort of thing using shell-oriented approaches (as well as generate_series) that it's hard to get too excited about implementing it directly in pgbench. Part of the reason for adding the \shell and \setshell commands was to make tricky things like this possible without having to touch the pgbench code further. I, for example, would solve the problem you're facing like this:

1) Write a shell script that generates the file I need
2) Call it from pgbench using \shell, passing the size it needs. Have that write a delimited file with the data required.
3) Import the whole thing with COPY.

And next thing you know you've even got the CREATE/COPY optimization as a possibility to avoid WAL, as well as the ability to avoid creating the data file more than once if the script is smart enough. Sample data file generation can be difficult; most of the time I'd rather solve it in a general programming language. The fact that simple generation cases could be done with the mechanism you propose is true.
However, this only really helps cases that are too complicated to express with generate_series, yet not so complicated that you really want a full programming language to generate the data. I don't think there's that much middle ground in that use case. But if this is what you think makes your life easier, I'm not going to tell you you're wrong. And I don't feel that your desire for this feature means you must tackle a more complicated thing instead--even though what I personally would much prefer is something making this sort of thing easier to do in regression tests, too. That's a harder problem, though, and you're only volunteering to solve an easier one than that. Stepping aside from debate over usefulness, my main code concern is that each time I look at the pgbench code for yet another tacked on bit, it's getting increasingly creakier and harder to maintain. It's never going to be a good benchmark driver program capable of really complicated tasks. And making it try keeps piling on the risk of breaking it for its intended purpose of doing simple tests. If you can figure out how to keep the code contortions to implement the feature under control, there's some benefit there. I can't think of a unique reason for it; again, lots of ways to solve this already. But I'd probably use it if it were there. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] fsync reliability
On 04/21/2011 04:26 AM, Simon Riggs wrote: However, that begs the question of what happens with WAL. At present, we do nothing to ensure that "the entry in the directory containing the file has also reached disk". Well, we do, but it's not obvious why that is unless you've stared at this for far too many hours. A clear description of the possible issue you and Dan are raising showed up on LKML a few years ago: http://lwn.net/Articles/270891/ Here's the most relevant part, which directly addresses the WAL case: "[fsync] is unsafe for write-ahead logging, because it doesn't really guarantee any _ordering_ for the writes at the hard storage level. So aside from losing committed data, it can also corrupt structural metadata. With ext3 it's quite easy to verify that fsync/fdatasync don't always write a journal entry. (Apart from looking at the kernel code :-) Just write some data, fsync(), and observe the number of writes in /proc/diskstats. If the current mtime second _hasn't_ changed, the inode isn't written. If you write data, say, 10 times a second to the same place followed by fsync(), you'll see a little more than 10 write I/Os, and less than 20." There's a terrible hack suggested where you run fchmod to force the journal out in the next fsync that makes me want to track the poster down and shoot him, but this part raises a reasonable question. The main issue he's complaining about here is a moot one for PostgreSQL. If the WAL writes have been reordered but have not completed, the minute WAL replay hits the spot with a missing block the CRC32 will be busted and replay is finished. The fact that he's assuming a database would have such a naive WAL implementation that it would corrupt the database if blocks are written out of order in between fsync calls returning is one of the reasons this whole idea never got more traction--hard to get excited about a proposal whose fundamentals rest on an assumption that doesn't turn out to be true on real databases.
There's still the "fsync'd a data block but not the directory entry yet" issue as fall-out from this too. Why doesn't PostgreSQL run into this problem? Because the exact code sequence used is this one: open write fsync close And Linux shouldn't ever screw that up, or the similar rename path. Here's what the close man page says, from http://linux.die.net/man/2/close : "A successful close does not guarantee that the data has been successfully saved to disk, as the kernel defers writes. It is not common for a filesystem to flush the buffers when the stream is closed. If you need to be sure that the data is physically stored use fsync(2). (It will depend on the disk hardware at this point.)" What this is alluding to is that if you fsync before closing, the close will write all the metadata out too. You're busted if your write cache lies, but we already know all about that issue. There was a discussion of issues around this on LKML a few years ago, with Alan Cox getting the good pull quote at http://lkml.org/lkml/2009/3/27/268 : "fsync/close() as a pair allows the user to correctly indicate their requirements." While fsync doesn't guarantee that metadata is written out, and neither does close, kernel developers seem to all agree that fsync-before-close means you want everything on disk. Filesystems that don't honor that will break all sorts of software. It is of course possible there are bugs in some part of this code path, where a clever enough test case might expose a window of strange file/metadata ordering. I think it's too weak of a theorized problem to go specifically chasing after though. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Formatting Curmudgeons WAS: MMAP Buffers
're not participating in the thing that needs the most help. This is not a problem you make better with fuzzy management directives to be nicer to people. There are real software engineering issues about how to ensure good code quality at its core. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Formatting Curmudgeons WAS: MMAP Buffers
On 04/21/2011 12:39 PM, Robert Haas wrote: In fact, I've been wondering if we shouldn't consider extending the support window for 8.2 past the currently-planned December 2011. There seem to be quite a lot of people running that release precisely because the casting changes in 8.3 were so painful, and I think the incremental effort on our part to extend support for another year would be reasonably small. The pending EOL for 8.2 is the only thing that keeps me sane when speaking with people who refuse to upgrade, yet complain that their 8.2 install is slow. This last month, that seems to be more than the usual number of "why does autovacuum suck so much?" complaints that would all go away with an 8.3 upgrade. Extending the EOL is not doing any of these users a favor. Every day that goes by when someone is on a version of PostgreSQL that won't ever allow in-place upgrade is just making the eventual dump and reload they face worse. The time spent porting to 8.3 is a one-time thing; the suffering you get trying to have a 2011 sized database on 2006's 8.2 just keeps adding up the longer you postpone it. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Formatting Curmudgeons WAS: MMAP Buffers
Andrew Dunstan wrote: Now we could certainly make this quite a bit slicker. Apart from anything else, we should change the indent source code tarball so it unpacks into its own directory. Having it unpack into the current directory is ugly and unfriendly. And we should get rid of the "make clean" line in the install target of entab's makefile, which just seems totally ill-conceived. I think the script I submitted upthread has most of the additional slickness needed here. Looks like we both were working on documenting a reasonable way to do this at the same time the other day. The idea of any program here relying on being able to write to /usr/local/bin as your example did makes this harder for people to run; that's why I made everything in the build tree and just pushed the appropriate directories into the PATH. Since I see providing a script to automate this whole thing as the preferred way to make this easier, re-packaging the indent source tarball to extract to a directory doesn't seem worth the backwards compatibility trouble it will introduce. Improving the entab makefile I don't have an opinion on. It might also be worth setting it up so that instead of having to pass a path to a typedefs file on the command line, we default to a file sitting in, say, /usr/local/etc. Then you'd just be able to say "pgindent my_file.c". OK, so I need to update my script to handle either indenting a single file, or doing all of them. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Formatting Curmudgeons WAS: MMAP Buffers
Robert Haas wrote: But it turns out that it doesn't really matter. Whitespace or no whitespace, if you don't read the diff before you hit send, it's likely to contain some irrelevant cruft, whether whitespace changes or otherwise. Right. Presuming that pgindent will actually solve anything leaps over two normally incorrect assumptions:

- That the main tree was already formatted with pgindent before you started, so no stray diffs will result from it touching things the submitter isn't even involved in.
- That there are no larger code formatting or diff issues except for spacing.

This has been a nagging loose end for a while, so I'd like to see pgindent's rough edges get sorted out so it's easier to use. But whitespace errors because of bad editors are normally just a likely sign of a patch with bigger problems, rather than something that can get fixed and then the submission is good. There is no substitute for the discipline of reading your own diff before submission. I'll easily obsess over mine for an hour before I submit something major, and that time is always well spent. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Formatting Curmudgeons WAS: MMAP Buffers
Andrew Dunstan wrote: What makes you think it isn't possible to run pgindent? There are no secret incantations. The first hit newbies find looking for info about pgindent is http://blog.hagander.net/archives/185-pgindent-vs-dash.html which sure looks like secret incantations to me. The documentation in src/tools/pgindent/README reads like a magic spell too:

find . -name '*.[ch]' -type f -print | \
egrep -v -f src/tools/pgindent/exclude_file_patterns | \
xargs -n100 pgindent src/tools/pgindent/typedefs.list

And it doesn't actually work as written unless you've installed pgindent, entab/detab, and the specially patched NetBSD indent into the system PATH somewhere--unreasonable given that this may be executing on a source-only tree that has never been installed. The fact that the documentation is only in the README and not with the rest of the code conventions isn't helping either. The last time I tried to do this a few years ago I failed miserably and never came back. I know way more about building software now though, and just got this to work for the first time. Attached is a WIP wrapper script for running pgindent that builds all the requirements into temporary directories, rather than expecting you to install anything system-wide or into a PostgreSQL destination directory. Drop this into src/tools/pgindent, make it executable, and run it from that directory. It should do the right thing on any system that has "make" as an alias for "gmake" (TODO: be better about that in the file, along with some other nagging things). When I just ran it against master I got a bunch of modified files, but most of them look like things that have been touched recently, so I think it did the right thing. A test of my work here from someone who isn't running this for the first time would be helpful. If this works well enough, I think it would make a good helper script to include in the distribution. The loose ends I can take care of easily enough once basic validation is finished.
-- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us

#!/bin/bash -ex

# Presume that we're in the pgindent directory at the start.
# TODO this should switch to the directory where this script is located,
# in case someone calls src/tools/pgindent/run-pgindent

# Back to the base directory
pushd ../../..

# Sanity check we're in the right place
if [ ! -d src/tools/pgindent ] ; then
  echo "run-pgindent can only be run from within the src/tools/pgindent directory, aborting"
  popd
  exit 1
fi

echo pgindent setting up environment
wget ftp://ftp.postgresql.org/pub/dev/indent.netbsd.patched.tgz
mkdir -p indent
cd indent
zcat ../indent.netbsd.patched.tgz | tar xvf -
rm -f ../indent.netbsd.patched.tgz
make
INDENT_DIR=`pwd`
cd ..

pushd src/tools/entab
make
ln -s entab detab
ENTAB_DIR=`pwd`
popd

export PATH="$INDENT_DIR:$ENTAB_DIR:$PATH"

wget -O src/tools/pgindent/typedefs.list http://buildfarm.postgresql.org/cgi-bin/typedefs.pl

# This cleanup can only happen if there is already a makefile; assume
# that if there isn't, this tree is clean enough
if [ -f GNUmakefile ] ; then
  # TODO this may need to be "gmake" on some systems instead
  make maintainer-clean
fi

echo pgindent starting run
find . -name '*.[ch]' -type f -print | \
  egrep -v -f src/tools/pgindent/exclude_file_patterns | \
  xargs -n100 src/tools/pgindent/pgindent src/tools/pgindent/typedefs.list

# Cleanup of utilities built temporarily here
unlink src/tools/entab/detab
rm -rf indent

popd
echo pgindent run complete

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MMAP Buffers
Radosław Smogura wrote: Yes, but, hmm... in Netbeans I had really long gaps (probably 8 spaces, from tabs), so deeper "ifs" and comments at the end of variables went out of my screen. I really wanted to not format this, but sometimes I needed to. The guide at http://www.open-source-editor.com/editors/how-to-make-netbeans-use-tabs-for-indention.html seems to cover how to fix this in Netbeans. You want it to look like that screen shot: 4 spaces per indent with matching tab size of 4, and "Expand Tabs to Spaces" unchecked. Generally, if you look at the diff you've created and your new code doesn't line up right with what's already there, that means the tab/space setup isn't quite right when you were editing. Reading the diff is useful for catching all sorts of other issues, too, so it's just generally a good practice. As Peter already mentioned, the big problem here is that you checked in a modified configure file. I also note that you use C++ style "//" comments, which aren't allowed under the coding guidelines--even though they work fine on many common platforms. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Formatting Curmudgeons WAS: MMAP Buffers
Joshua Berkus wrote: Then you can say that politely and firmly with direct reference to the problem, rather than making the submitter feel bad. That's exactly what happened. And then you responded that it was possible to use a patch without fixing the formatting first. That's not true, and those of us who do patch review are tired of even trying. Our project has an earned reputation for being rejection-happy curmudgeons. This is something I heard more than once at MySQLConf, including from one student who chose to work on Drizzle instead of PostgreSQL for that reason. I think that we could stand to go out of our way to be helpful to first-time submitters. I'll trade you anecdotes by pointing out that I heard from half a dozen business people that the heavy emphasis on quality control and standards was the reason they were looking into leaving MySQL-derived distributions for PostgreSQL. I've spent days of time working on documentation to help new submitters get their patches improved to where they meet this community's standards. This thread just inspired another round of that. What doesn't help is ever telling someone they can ignore those standards and still do something useful we're interested in. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MMAP Buffers
Robert Haas wrote: The OP says that this patch maintains the WAL-before-data rule without any explanation of how it accomplishes that seemingly quite amazing feat. I assume I'm going to have to read this patch at some point to refute this assertion, and I think that sucks. I don't think you have to read any patch that doesn't follow the submission guidelines. The fact that you do is a great contribution to the community. But if I were suggesting how your time would be best spent improving PostgreSQL, "reviewing patches that don't meet coding standards" would be at the bottom of the list. There's always something better for the project you could be working on instead. I just added http://wiki.postgresql.org/wiki/Submitting_a_Patch#Reasons_your_patch_might_be_returned , recycling some existing text, adding some new suggestions. I hope I got the tone of that text right. The intention was to have a polite but clear place to point submitters to when their suggestion doesn't meet the normal standards here, such that they might even get bounced before even entering normal CommitFest review. This MMAP patch looks like it has all 5 of the problems mentioned on that now more focused list. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MMAP Buffers
Tom Lane wrote: * On the other side of the coin, I have seen many a patch that was written to minimize the length of the diff to the detriment of readability or maintainability of the resulting code, and that's *not* a good tradeoff. Sure, that's possible. But based on the reviews I've done, I'd say that the fact someone is even aware that minimizing their diff is something important to consider automatically puts them far ahead of the average new submitter. There is a high percentage of patches where the submitter generates a diff and sends it without even looking at it. That a person would look at their diff and go too far in trying to make it small doesn't happen nearly as much. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MMAP Buffers
Joshua Berkus wrote: Guys, can we *please* focus on the patch for now, rather than the formatting, which is fixable with sed? Never, and that's not true. Heikki was being nice; I wouldn't have even slogged through it long enough to ask the questions he did before kicking it back as unusable. A badly formatted patch makes it impossible to evaluate whether the changes from a submission are reasonable or not without the reviewer fixing it first. And you can't automate correcting it, it takes a lot of tedious manual work. Start doing a patch review every CommitFest cycle and you very quickly realize it's not an ignorable problem. And lack of discipline in minimizing one's diff is always a sign of other code quality issues. Potential contributors to PostgreSQL should know that a badly formatted patch faces an automatic rejection, because no reviewer can work with it easily. This fact is not a mystery; in fact it's documented at http://wiki.postgresql.org/wiki/Submitting_a_Patch : "The easiest way to get your patch rejected is to make lots of unrelated changes, like reformatting lines, correcting comments you felt were poorly worded etc. Each patch should have the minimum set of changes required to fulfil the single stated objective." I think I'll go improve that text next--something like "Ways to get your patch rejected" should be its own section. The problem here isn't whether someone used an IDE or not, it's that this proves they didn't read their own patch before submitting it. Reading one's own diff and reflecting on what you've changed is one of the extremely underappreciated practices of good open-source software development. Minimizing the size of that diff is perhaps the most important thing someone can do in order to make their changes to a piece of software better. Not saying something that leads in that direction would be a disservice to the submitter. P.S. 
You know what else I feel should earn an automatic rejection without any reviewer even looking at the code? Submitting a patch that claims to improve performance and not attaching the test case you used, along with detailed notes about before/after tests on your own hardware. A hand wave "it's faster" is never good enough, and it's extremely wasteful of our limited reviewer resources to try and duplicate what the submitter claimed. Going to add something about that to the submission guidelines too. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Single client performance on trivial SELECTs
I did some more research into the memory allocation hotspots found in the profile, with regard to what MySQL has done in those areas. (And by "research" I mean "went to the bar last night and asked Baron about it") The issue of memory allocation being a limiter on performance has been nagging that community for long enough that the underlying malloc used can even be swapped with a LD_PRELOAD trick: http://dev.mysql.com/doc/refman/5.5/en/mysqld-safe.html#option_mysqld_safe_malloc-lib Plenty of people have benchmarked that and seen a big difference between the implementations; some sample graphs: http://locklessinc.com/articles/mysql_performance/ http://blogs.sun.com/timc/entry/mysql_5_1_memory_allocator http://mysqlha.blogspot.com/2009/01/double-sysbench-throughput-with_18.html To quote from the last of those, "Malloc is a bottleneck for sysbench OLTP readonly", so this problem is not unique to PostgreSQL. As of 5.5 the better builds are all defaulting to TCMalloc, which is interesting but probably not as useful because it's focused on improving multi-threaded performance: http://goog-perftools.sourceforge.net/doc/tcmalloc.html I'm not sure exactly what is useful to be learned from that specific work. But it does suggest two things: one, this is far from an easy thing to fix. Two, the only reason MySQL does so well on it is because there was some focused work on it, taking quite a while to accomplish, and involving many people. Doing better for PostgreSQL is something I see as more of a long-term goal, rather than something it would be reasonable to expect quick progress on. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Single client performance on trivial SELECTs
Heikki Linnakangas wrote: In this case you could just use prepared statements and get rid of all the parser related overhead, which includes much of the allocations. Trying that gives me around 9200 TPS instead of 5500 on my laptop, so a pretty big gain here. Will have to include that in my next round of graphs across multiple client loads once I'm home again and can run easily on my server. To provide a matching profile from the same system as the one I already submitted from, for archival sake, here's what the profile I get looks like with "-M prepared":

samples  %       image name  symbol name
33093    4.8518  postgres    AllocSetAlloc
30012    4.4001  postgres    hash_seq_search
27149    3.9803  postgres    MemoryContextAllocZeroAligned
26987    3.9566  postgres    hash_search_with_hash_value
25665    3.7628  postgres    hash_any
16820    2.4660  postgres    _bt_compare
14778    2.1666  postgres    LockAcquireExtended
12263    1.7979  postgres    AllocSetFreeIndex
11727    1.7193  postgres    tas
11602    1.7010  postgres    SearchCatCache
11022    1.6159  postgres    pg_encoding_mbcliplen
10963    1.6073  postgres    MemoryContextAllocZero
9296     1.3629  postgres    MemoryContextCreate
8368     1.2268  postgres    fmgr_isbuiltin
7973     1.1689  postgres    LockReleaseAll
7423     1.0883  postgres    ExecInitExpr
7309     1.0716  postgres    pfree

-- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Single client performance on trivial SELECTs
This week several list regulars here waded into the MySQL Convention. I decided to revisit PostgreSQL vs. MySQL performance using the sysbench program as part of that. It's not important for what I'm going to describe to understand exactly which statements sysbench runs here or how to use it, but if anyone is curious I've got some more details about how I ran the tests in my talk slides at http://projects.2ndquadrant.com/talks The program has recently gone through some fixes that make it run a bit better, both in general and against PostgreSQL. The write tests are still broken against PostgreSQL, but it now seems to do a reasonable job simulating a simple SELECT-only workload. A fix from Jignesh recently made its way into the database generation side of the code that makes it less tedious to test with, too. The interesting part was how per-client scaling compared between the two databases; graph attached. On my 8 core server, PostgreSQL scales nicely up to a steady 50K TPS. I see the same curve, almost identical numbers, with PostgreSQL and pgbench--no reason to suspect sysbench is doing anything shady. The version of MySQL I used hits around 67K TPS with InnoDB when busy with lots of clients. That part doesn't bother me; nobody expects PostgreSQL to be faster on trivial SELECT statements, and the gap isn't that big. The shocking part was the single client results. I'm used to seeing Postgres get around 7K TPS per core on those, which was the case here, and I never considered that an interesting limitation to think about before. MySQL turns out to hit 38K TPS doing the same work. Now that's a gap interesting enough to make me wonder what's going on.
Easy enough to exercise the same sort of single client test case with pgbench and put it under a profiler:

sudo opcontrol --init
sudo opcontrol --setup --no-vmlinux
createdb pgbench
pgbench -i -s 10 pgbench
psql -d pgbench -c "vacuum"
sudo opcontrol --start
sudo opcontrol --reset
pgbench -S -n -c 1 -T 60 pgbench
sudo opcontrol --dump ; sudo opcontrol --shutdown
opreport -l image:$HOME/pgwork/inst/test/bin/postgres

Here are the top calls, from my laptop rather than the server that I generated the graph against. It does around 5.5K TPS with 1 client and 10K with 2 clients, so same basic scaling:

samples   %       image name  symbol name
53548     6.7609  postgres    AllocSetAlloc
32787     4.1396  postgres    MemoryContextAllocZeroAligned
26330     3.3244  postgres    base_yyparse
21723     2.7427  postgres    hash_search_with_hash_value
20831     2.6301  postgres    SearchCatCache
19094     2.4108  postgres    hash_seq_search
18402     2.3234  postgres    hash_any
15975     2.0170  postgres    AllocSetFreeIndex
14205     1.7935  postgres    _bt_compare
13370     1.6881  postgres    core_yylex
10455     1.3200  postgres    MemoryContextAlloc
10330     1.3042  postgres    LockAcquireExtended
10197     1.2875  postgres    ScanKeywordLookup
9312      1.1757  postgres    MemoryContextAllocZero

I don't know nearly enough about the memory allocator to comment on whether it's possible to optimize it better for this case to relieve any bottleneck. We might just get a small gain and then push the limiter to the parser or hash functions. I was surprised to find that's where so much of the time was going, though.

P.S. When showing this graph in my talk, I pointed out that anyone who is making decisions about which database to use based on trivial SELECTs on small databases isn't going to be choosing between PostgreSQL and MySQL anyway--they'll be deploying something like MongoDB instead if that's the important metric.
-- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
Re: [HACKERS] Postgresql on multi-core CPU's: is this old news?
On 04/05/2011 02:21 PM, Mischa Sandberg wrote: Came across the following in a paper from Oct 2010. Was wondering is this is old news I missed in this group. http://pdos.csail.mit.edu/papers/linux:osdi10.pdf about Linux optimization on multi-core CPU's. Only a little old; http://postgresql.1045698.n5.nabble.com/MIT-benchmarks-pgsql-multicore-up-to-48-performance-td3173545.html shows most of the obvious comments to be made about it. There is more detail explaining why the hand-waving done in the paper about increasing NUM_LOCK_PARTITIONS is not a simple improvement at http://postgresql.1045698.n5.nabble.com/Lock-partitions-td1952557.html -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
Re: [HACKERS] 2nd Level Buffer Cache
On 03/24/2011 03:36 PM, Jim Nasby wrote: On Mar 23, 2011, at 5:12 PM, Tom Lane wrote: Robert Haas writes: It looks like the only way anything can ever get put on the free list right now is if a relation or database is dropped. That doesn't seem too good. Why not? AIUI the free list is only for buffers that are totally dead, ie contain no info that's possibly of interest to anybody. It is *not* meant to substitute for running the clock sweep when you have to discard a live buffer. Turns out we've had this discussion before: http://archives.postgresql.org/pgsql-hackers/2010-12/msg01088.php and http://archives.postgresql.org/pgsql-hackers/2010-12/msg00689.php Investigating this has been on the TODO list for four years now: http://archives.postgresql.org/pgsql-hackers/2007-04/msg00781.php I feel that work in this area is blocked behind putting together a decent mix of benchmarks that can be used to test whether changes here are actually good or bad. All of the easy changes to buffer allocation strategy, ones that you could verify by inspection and simple tests, were made in 8.3. The stuff that's left has the potential to either improve or reduce performance, and which will happen is very workload dependent. Setting up systematic benchmarks of multiple workloads to run continuously on big hardware is a large, boring, expensive problem that few can justify financing (except for Jim of course), and even fewer want to volunteer time toward. This whole discussion of cache policy tweaks is fun, but I just delete all the discussion now because it's just going in circles without a good testing regime. The right way to start is by saying "this is the benchmark I'm going to improve with this change, and it has a profiled hotspot at this point". 
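For readers who haven't been in freelist.c, the clock sweep under discussion works roughly like this. This is a deliberately simplified sketch, not the real StrategyGetBuffer() code, which also consults the free list, takes locks, and caps how high usage counts can climb:

```c
/* Deliberately simplified clock-sweep victim selection; the real
 * PostgreSQL version (src/backend/storage/buffer/freelist.c) also
 * consults the free list, takes locks, and caps usage counts. */
#define NBUFFERS 8

typedef struct
{
	int			usage_count;	/* bumped on access, decayed by the sweep */
	int			pinned;			/* in use by a backend, cannot be evicted */
} BufferDesc;

static BufferDesc buffers[NBUFFERS];
static int	next_victim = 0;

/* Advance the clock hand until a buffer with usage_count 0 turns up,
 * decrementing counts as we pass over recently used buffers.  Returns
 * the victim's index, or -1 if everything stayed pinned. */
int
clock_sweep(void)
{
	int			trycounter = NBUFFERS * 5;	/* bound the scan */

	while (trycounter-- > 0)
	{
		BufferDesc *buf = &buffers[next_victim];
		int			this_buf = next_victim;

		next_victim = (next_victim + 1) % NBUFFERS;
		if (buf->pinned)
			continue;
		if (buf->usage_count == 0)
			return this_buf;	/* found a victim to evict */
		buf->usage_count--;
	}
	return -1;
}
```

The policy tweaks people keep proposing amount to changing when counts are bumped, how fast they decay, or what feeds a free list in front of this loop--which is exactly why benchmarks, not inspection, have to decide between them.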
-- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC 2011 Eager MV implementation proposal
On 03/31/2011 04:38 PM, AAMIR KHAN wrote: I would like to implement eager materialized views. An eager materialized view will be updated whenever the view changes. This is done with a system of triggers on all of the underlying tables. Last summer someone worked on snapshot materialized views. That didn't result in anything that could be committed. If you wanted to work on trying to actually finish that, it might go somewhere useful. There are a number of hard problems in getting a working implementation of materialized views, all of which get ignored by all of the student proposals we receive, and what you're talking about doesn't address any of them. You really should read all of the messages in the following threads: http://archives.postgresql.org/pgsql-hackers/2010-04/msg00479.php http://archives.postgresql.org/pgsql-hackers/2010-06/msg00743.php http://archives.postgresql.org/pgsql-hackers/2010-07/msg00396.php And the following summaries: http://wiki.postgresql.org/wiki/Materialized_Views_GSoC_2010 http://rhaas.blogspot.com/2010/04/materialized-views-in-postgresql.html And then say how what you're suggesting fits into the issues raised last summer. The theory and mechanics of implementing eager MVs are interesting problems. But working on them won't lead toward code that can be committed to PostgreSQL this year. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
Re: [HACKERS] Performance Farm Release
I just dusted off this code and brought it back to current again. Basically a lot of reformatting the new performance farm parts to minimize their diff. Once that was done, all of the other buildfarm client updates since then applied cleanly. The result is now sitting as a fork of Andrew's client code repo at https://github.com/greg2ndQuadrant/client-code , replacing the repo Scott published at https://github.com/slux/PostgreSQL-Performance-Farm ; much easier to avoid future bit-rot with this structure. The main changes made here are now pretty easy to read on github: https://github.com/greg2ndQuadrant/client-code/commit/d0339a59ceb4711a6b042d3f1d053c77f07720f4 and I've attached the code as a patch here. It also uses some scripts from pgbench-tools that aren't the interesting part. I got a nibble from Endpoint at PGEast about interest in hacking the buildfarm server code to add support for a new test type, so this may start moving forward again if that works out. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us diff --git a/build-farm.conf b/build-farm.conf index 77ff694..fc4b103 100644 --- a/build-farm.conf +++ b/build-farm.conf @@ -39,6 +39,16 @@ my $branch; build_root => '/path/to/buildroot', use_vpath => undef, # set true to do vpath builds + # Performance farm testing + make_perf => 1, # set true to make performance farms, must keep on if not doing a build farm + clients => [qw(8 12 24)], + maxworkers => '12', + sets => '1', + scale => [qw(1 10)], + scriptvar => '-S', + transvar => '-T 60', + pgopts => '-l', + # path to directory with auxiliary web script # if relative, the must be relative to buildroot/branch # possibly only necessary now on WIndows, if at all @@ -150,6 +160,18 @@ my $branch; --with-openssl )], + # Options for performance tests + perf_opts => + [qw( + --enable-nls + --enable-integer-datetimes + --with-perl + --with-python + --with-tcl + --with-krb5 
--with-includes=/usr/include/et + --with-openssl + )], + # per-branch contents of extra config for check stage # each branch has an array of setting lines (no \n required) extra_config => diff --git a/run_build.pl b/run_build.pl index 666268f..81487af 100755 --- a/run_build.pl +++ b/run_build.pl @@ -139,6 +139,14 @@ my ($buildroot,$target,$animal, $print_success, $aux_path, $trigger_filter, my $scm_timeout_secs = $PGBuild::conf{scm_timeout_secs} || $PGBuild::conf{cvs_timeout_secs}; +my $make_perf = $PGBuild::conf{make_perf}; +my $clients = $PGBuild::conf{clients}; +my $maxworkers = $PGBuild::conf{maxworkers}; +my $sets = $PGBuild::conf{sets}; +my $scale = $PGBuild::conf{scale}; +my $scriptvar= $PGBuild::conf{scriptvar}; +my $pgopts = $PGBuild::conf {pgopts}; + print scalar(localtime()),": buildfarm run for $animal:$branch starting\n" if $verbose; @@ -152,6 +160,7 @@ if (ref($force_every) eq 'HASH') } my $config_opts = $PGBuild::conf{config_opts}; +my $perf_opts = $PGBuild::conf{perf_opts}; my $scm = new PGBuild::SCM \%PGBuild::conf; my $buildport; @@ -261,6 +270,9 @@ chdir $buildroot || die "chdir to $buildroot: $!"; mkdir $branch unless -d $branch; +my $bench ="$branch/bench"; +mkdir $bench unless -d $bench; + chdir $branch || die "chdir to $buildroot/$branch"; # rename legacy status files/directories @@ -542,6 +554,15 @@ set_last('run.snap',$current_snap) unless $nostatus; my $started_times = 0; +# Additional setup for performance farm +my $perf_farm = 0; +my $time = time; +my $benchpath = "$bench/$time"; +if($make_perf) +{ + $perf_farm = 1; +} + # each of these routines will call send_result, which calls exit, # on any error, so each step depends on success in the previous # steps. 
@@ -593,6 +614,13 @@ foreach my $locale (@locales) make_install_check($locale); + if($perf_farm) + { + system("mkdir $buildroot/$benchpath"); + print time_str(),"running pgbench analysis ...\n" if $verbose; + benchmark(); + } + # releases 8.0 and earlier don't support the standard method for testing # PLs so only check them for later versions @@ -739,6 +767,218 @@ sub check_make return 'OK'; } +sub benchmark +{ + return if $skip_steps{benchmark}; + my $script; + my @scripts; + my @path; + chdir "$installdir/bin/"; + system("./createdb pgbench"); + my ($scount,$setcount,$ccount,$workers); + my $shorthost; + foreach $scount (@$scale) + { + my @benchout = `./pgbench -i -s $scount pgbench >$devnull 2>&1`; + for($setcount = 0; $setcount<$sets; $setcount++) + { + foreach $ccount (@$clients) + { +my $transvar = $PGBuild::conf{transvar}; +if
Re: [HACKERS] sync rep design architecture (was "disposition of remaining patches")
Daniel Farina wrote: Server A syncreps to Server B Now I want to provision server A-prime, which will eventually take the place of A. Server A syncreps to Server B Server A syncreps to Server A-prime Right now, as it stands, the syncrep patch will be happy as soon as the data has been fsynced to either B or A-prime; I don't think we can guarantee at any point that A-prime can become the leader, and feed B. One of the very fundamental breaks between how this patch implements sync rep and what some people might expect is this concern. Having such tight control over the exact order of failover isn't quite here yet, so sometimes people will need to be creative to work within the restrictions of what is available. The path for this case is probably: 1) Wait until A' is caught up 2) Switchover to B as the right choice to be the new master, with A' as its standby and A going off-line at the same time. 3) Switchover the master role from B to A'. Bring up B as its standby. There are other possible transition plans available too. I appreciate that you would like to do this as an atomic operation, rather than handling it as two steps--one of which puts you in a middle point where B, a possibly inferior standby, is operating as the master. There are a dozen other complicated "my use case says I want and it must be done as " requests for Sync Rep floating around here, too. They're all getting ignored in favor of something smaller that can get built today. The first question I'd ask is whether you could settle for this more cumbersome switchover plan for now. The second is whether implementing what this feature currently does would get in the way of coding what you really want eventually. I didn't get the Streaming Rep + Hot Standby features I wanted in 9.0 either.
But committing what was reasonable to include in that version let me march forward with very useful new code, doing another year of development on my own projects and getting some new things fixed in core. And so far it looks like 9.1 will sort out all of the kinks I was unhappy about. The same sort of thing will need to happen to get Sync Rep committed and then made appropriate for more use cases. There isn't any margin left for discussions of scope creep here; really it's "is this subset useful for some situations and stable enough to commit" now. 2. The unprivileged user can disable syncrep, in any situation. This flexibility is *great*, but you don't really want people to do it while one is performing the switchover. For the moment you may have to live with a situation where user connections must be blocked during the brief moment of switchover to eliminate this issue. That's what I end up doing with 9.0 production systems to get a really clean switchover; there's a second of hiccup even in the best case. I'm not sure yet of the best way to build a UI to make that more transparent in the sync rep case. It's sure not a problem that's going to get solved in this release, though. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
Re: [HACKERS] disposition of remaining patches
Robert Haas wrote: 2. Synchronous replication. Splitting up this patch has allowed some This has gotten a bunch of review, on several different threads. I assume Simon will publish an update when he gets back to his keyboard... That was the idea. If anyone has any serious concerns about the current patch, please don't hold off just because you know Simon is away for a bit. We've been trying to keep that from impacting community progress too badly this week. On top of 4 listed reviewers I know Dan Farina is poking at the last update, so we may see one more larger report on top of what's already shown up. And Jaime keeps kicking the tires too. What Simon was hoping is that a week of others looking at this would produce enough feedback that it might be possible to sweep the remaining issues up soon after he's back. It looks to me like that's about when everything else that's still open will probably settle too. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep v17
Daniel Farina wrote: As it will be somewhat hard to prove the durability guarantees of commit without special heroics, unless someone can suggest a mechanism. Could you introduce a hack creating deterministic server side crashes in order to test this out? The simplest thing that comes to mind is a rule like "kick shared memory in the teeth to force a crash after every 100 commits", then see if #100 shows up as expected. Pick two different small numbers for the interval and you could probably put that on both sides to simulate all sorts of badness. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
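To make that suggestion concrete, the "crash after every 100 commits" hack could be as small as a counter checked in the commit path. The function names here are invented for illustration; in a real test this would be patched into the server, and per the suggestion above you'd pick two different interval values for the two sides of the replication pair:

```c
#include <stdlib.h>

static int	commit_count = 0;
static int	crash_interval = 100;	/* pick different values per server */

/* Returns nonzero on every crash_interval'th commit; split out from the
 * crash itself so the decision logic can be tested in isolation. */
int
crash_due(void)
{
	return (++commit_count % crash_interval) == 0;
}

/* Invented hook: a patched server would call this just before a commit
 * is acknowledged, simulating a hard crash at a deterministic point so
 * you can then check whether that commit survived where it should. */
void
crash_test_hook_at_commit(void)
{
	if (crash_due())
		abort();
}
```

After the crash, the durability check is simply whether commit number crash_interval is visible on the side that was supposed to have fsynced it.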
Re: [HACKERS] Debian readline/libedit breakage
Andrew Dunstan wrote: You're assuming a fact not in evidence, namely the existence of an identifiable group of "libedit folks". Last time I looked there was no such group. There appear to be two people working periodically on the upstream NetBSD libedit: http://cvsweb.netbsd.org/bsdweb.cgi/src/lib/libedit/?sortby=date And a third who periodically packages that at http://www.thrysoee.dk/editline/ Those are the group as far as I can tell. It's not encouraging that the Debian issue with libedit+UTF8 has been documented for almost a year now: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=579729 (And we shouldn't assume that GnuTLS is the right replacement for OpenSSL either, BTW). The idea of using NSS instead is an interesting one. Looking at http://en.wikipedia.org/w/index.php?title=Comparison_of_TLS_Implementations it does seem to match the basic feature set of OpenSSL. And the nss_compat_ossl compatibility layer might be useful: http://fedoraproject.org/wiki/Nss_compat_ossl I find it hard to get excited about working to replace the software that has a reasonable license here (readline) rather than trying to eliminate dependence on the one with an unreasonable license (OpenSSL). -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
Re: [HACKERS] [PERFORM] pgbench to the MAXINT
Tom Lane wrote: I think that might be a good idea --- it'd reduce the cross-platform variability of the results quite a bit, I suspect. random() is not to be trusted everywhere, but I think erand48 is pretty much the same wherever it exists at all (and src/port/ provides it elsewhere). Given that pgbench will run with threads in some multi-worker configurations, after some more portability research I think odds are good we'd get nailed by http://sourceware.org/bugzilla/show_bug.cgi?id=10320 : "erand48 implementation not thread safe but POSIX says it should be". The AIX docs have a similar warning on them, so who knows how many versions of that library have the same issue. Maybe we could make sure the one in src/port/ is thread safe and make sure pgbench only uses it. This whole area continues to be messy enough that I think the patch needs to brew for another CF before it will all be sorted out properly. I'll mark it accordingly and can pick this back up later. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
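The thread-safety problem referenced in that glibc bug is about hidden shared state in some erand48() implementations; the call signature itself already supports safe use, because each worker can own its 48-bit state array and pass it explicitly on every call. A sketch of that arrangement (the struct and function names here are mine, not pgbench's):

```c
/* Each worker thread owns its PRNG state and passes it explicitly to
 * erand48(), so no hidden global state is shared between threads. */
#define _XOPEN_SOURCE 600		/* for erand48() declaration */
#include <stdlib.h>

typedef struct
{
	unsigned short xseed[3];	/* this worker's private 48-bit state */
} WorkerState;

/* Seed a worker's generator; 0x330E is the traditional low word used
 * by the *rand48 family. */
void
worker_init(WorkerState *w, unsigned short seed)
{
	w->xseed[0] = 0x330E;
	w->xseed[1] = seed;
	w->xseed[2] = (unsigned short) (seed ^ 0x5DEE);
}

/* Uniform integer in [min, max], touching only this worker's state, so
 * multiple threads can draw numbers concurrently without interference. */
long
worker_getrand(WorkerState *w, long min, long max)
{
	/* erand48() returns a double in [0.0, 1.0) */
	return min + (long) (erand48(w->xseed) * (double) (max - min + 1));
}
```

Making the src/port/ fallback follow this explicit-state style everywhere would sidestep the libc bug entirely, at the cost of auditing that no pgbench path still calls the implicit-state variants.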
Re: [HACKERS] Debian readline/libedit breakage
Markus Wanner wrote: Anybody realized that this Debian bug (and several others) got closed in the mean time (Sunday)? According to the changelog [1], Martin Pitt (which I'm CC'ing here, as he might not be aware of this thread, yet) worked around this issue by pre-loading readline via LD_PRELOAD for psql. Personally, I'm a bit suspicious about that solution (technically as well as from a licensing perspective), but it's probably the simplest way to let only psql link against readline. This originated in http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=608442 , and from what I'm reading there it sounds like Martin is inserting this as a workaround but it hasn't passed through Debian Legal yet. I would expect them to reject this as unacceptable. Dynamic linking via LD_PRELOAD is still linking, even if it happens at runtime. I commend Martin for buying some time here by doing that, but this doesn't change the urgency of coming up with an alternate solution much to me. As I see it, that change could be reverted at any time via pushback from legal. As far as working around this by releasing our own packages goes, that's useful, but I'd also characterize it as only a workaround rather than a real solution. OpenSSL is open-source, but it's not "free software" by the standards of the FSF, which I feel is a completely reasonable position given the license. When you depend on a software stack built from unambiguously free software, having non-free components wedged in there that you depend on is never a good idea. I won't consider this truly resolved until GnuTLS support for PostgreSQL is in core. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
Re: [HACKERS] Debian readline/libedit breakage
Dimitri Fontaine wrote: Now, what I think I would do about the core package is a quite simple backport of them, using Martin's excellent work. Do we want our own QA on them? If yes, I think I would need some help here, maybe with some build farm support for running from our debian packages rather than from either CVS or git. What the RPM packaging does is run this (approximately):

pushd src/test/regress
make all
make MAX_CONNECTIONS=5 check

Something similar might be sufficient for QA on the Debian packaging too. The overhead of the buildfarm makes sense if you want to rebuild after every single commit. It may be overkill to go through that just for testing .deb packaging, which realistically you're only going to want to do after each point release. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
Re: [HACKERS] Debian readline/libedit breakage
charles.mcdev...@emc.com wrote: The GNU people will never be 100% satisfied by anything you do to psql, other than making it GPL. Readline is specifically licensed in a way to try to force this (but many disagree with their ability to force this). The "GNU people" are perfectly content with the license of PostgreSQL. They are unhappy with the license terms of OpenSSL, which is fair because they are ridiculous. Eric Young and the rest of the contributors produced a useful piece of software, and made it considerably less valuable to the world due to the ego trip terms: http://www.openssl.org/source/license.html -- the worst specific problem is the requirement to acknowledge OpenSSL use in advertising of projects that use it. The PostgreSQL community has had similar issues with popular software commonly used on top of PostgreSQL that happened to use a non-standard license with unique terms. It would be both hypocritical and incorrect to now blame the GNU projects for taking a similar stand on this one. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
Re: [HACKERS] Debian readline/libedit breakage
Stephen Frost wrote: -Adding GnuTLS support to PostgreSQL would require solving several code quality issues I'm curious about this, but I don't know that I've got time to dive into it and solve it. :/ Note that the past discussion was about the difficulty of matching the existing OpenSSL API using GnuTLS. I wasn't trying to suggest there were issues specifically with GnuTLS's code quality. It's more that the APIs are just different enough that it's not trivial to do a swap--which is surprising given how many people have seemingly needed to do exactly this conversion. You'd think there'd be a simple "OpenSSL-like" interface available for GnuTLS by now or something. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
Re: [HACKERS] Debian readline/libedit breakage
-Known and serious bugs/limitations in libedit relative to libreadline
-Adding GnuTLS support to PostgreSQL would require solving several code quality issues

Ideologically, I find the worst offender here to be the OpenSSL license. From a license purity perspective I'd like to see their ridiculous requirements bypassed altogether by doing whatever is necessary to get GnuTLS support working. But pragmatically, fixing the bugs and adding features to libedit may be the easier route here. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
Re: [HACKERS] Spread checkpoint sync
e/16384/16752 time=0.002 msec
2011-02-10 02:54:09 EST: DEBUG: checkpoint sync: number=19 file=base/16384/16761_fsm time=0.001 msec
2011-02-10 02:54:09 EST: DEBUG: checkpoint sync: number=20 file=base/16384/16749_vm time=0.001 msec
2011-02-10 02:54:09 EST: DEBUG: checkpoint sync: number=21 file=base/16384/16385 time=0.001 msec
2011-02-10 02:54:09 EST: DEBUG: checkpoint sync: number=22 file=base/16384/16761.1 time=175.575 msec
2011-02-10 02:54:10 EST: LOG: checkpoint complete: wrote 242614 buffers (46.3%); 0 transaction log file(s) added, 0 removed, 34 recycled; write=716.637 s, sync=54.659 s, total=772.976 s; sync files=22, longest=20.874 s, average=2.484 s

That's 12 minutes for the write phase, even though checkpoints should be happening every 5 minutes here. With that bad of a write phase overrun, spread sync had no room to work, so no net improvement at all. What is happening here is similar to the behavior I described seeing on my client system but didn't have an example to share until now.
During the write phase, looking at "Dirty:" in /proc/meminfo showed the value peaking at over 1GB while writes were happening, and eventually the background writer process wasn't getting any serious CPU time compared to the backends; this is what it looked like via ps:

%CPU %MEM    TIME+ COMMAND
 40    0   1:51.28 /home/gsmith/pgwork/inst/spread-sync/bin/pgbench -f /home/gsmith/pgbench-tools
 28.1  0   0:39.71 postgres: gsmith pgbench ::1(43871) UPDATE
 28    0   0:39.28 postgres: gsmith pgbench ::1(43875) UPDATE
 28.1  0   0:39.92 postgres: gsmith pgbench ::1(43865) UPDATE
 28.1  0   0:39.54 postgres: gsmith pgbench ::1(43868) UPDATE
 28    0   0:39.36 postgres: gsmith pgbench ::1(43870) INSERT
 28.1  0   0:39.47 postgres: gsmith pgbench ::1(43877) UPDATE
 18    0   0:39.39 postgres: gsmith pgbench ::1(43864) COMMIT
 18.1  0   0:39.78 postgres: gsmith pgbench ::1(43866) UPDATE
 18    0   0:38.99 postgres: gsmith pgbench ::1(43867) UPDATE
 18.1  0   0:39.55 postgres: gsmith pgbench ::1(43872) UPDATE
 18.1  0   0:39.90 postgres: gsmith pgbench ::1(43873) UPDATE
 18.1  0   0:39.64 postgres: gsmith pgbench ::1(43876) UPDATE
 18.1  0   0:39.93 postgres: gsmith pgbench ::1(43878) UPDATE
 18.1  0   0:39.83 postgres: gsmith pgbench ::1(43863) UPDATE
 18    0   0:39.47 postgres: gsmith pgbench ::1(43869) UPDATE
 18.1  0   0:40.11 postgres: gsmith pgbench ::1(43874) UPDATE
 10    0   0:11.91 [flush-9:1]
  0    0  27:43.75 [xfsdatad/6]
  9.4  0   0:02.21 postgres: writer process

I want to make this problem go away, but as you can see spreading the sync calls around isn't enough. I think the main write loop needs to get spread out more, too, so that the background writer is trying to work at a more reasonable pace. I am pleased I've been able to reproduce this painful behavior at home using test data, because that much improves my odds of being able to isolate its cause and test solutions. But it's a tricky problem, and I'm certainly not going to fix it in the next week.
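Spreading the write loop essentially means inserting a computed nap between buffer writes so they cover the whole target interval instead of being issued flat out. This is only a sketch of the arithmetic, not how the actual checkpoint code paces itself (which tracks a completion fraction rather than a fixed per-write delay):

```c
/* Sketch of write-phase pacing: given a count of dirty buffers and a
 * target duration for the write phase, compute how long to nap after
 * each buffer write so the I/O is spread evenly across the interval. */
long
nap_usec_per_write(int buffers_to_write, double target_seconds)
{
	if (buffers_to_write <= 0)
		return 0;
	return (long) (target_seconds * 1000000.0 / buffers_to_write);
}
```

For instance, spreading the 242614 buffers above across a 300 second target would call for roughly 1.2 ms between writes; the hard part is that the dirty buffer count and the backend write rate both keep changing while the checkpoint runs.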
-- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PERFORM] pgbench to the MAXINT
Stephen Frost wrote: Just wondering, did you consider just calling random() twice and smashing the result together..? I did. The problem is that even within the 32 bits that random() returns, it's not uniformly distributed. Combining two of them isn't really going to solve the distribution problem, just move it around. Some number of lower-order bits are less random than the others, and which they are is implementation dependent. Poking around a bit more, I just discovered another possible approach is to use erand48 instead of rand in pgbench, which is either provided by the OS or emulated in src/port/erand48.c. That's way more resolution than needed here, given that 2^48 pgbench accounts would be a scale of about 2.8 billion, which makes for a database of about 42 petabytes. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
Re: [HACKERS] [PERFORM] pgbench to the MAXINT
concerned about actually appear. Unfortunately I can't fit all of those tests in right now, as throwing around one of these 300GB data sets is painful--when you're only getting 72 TPS, looking for large scale patterns in the transactions takes a long time to do. For example, if I really wanted a good read on how bad the data distribution skew due to the small random range is, I'd need to let some things run for a week just for a first pass.

I'd like to see this go in, but the problems I've spotted are such that I would completely understand this being considered not ready by others. Just having this patch available here is a very useful step forward in my mind, because now people can always just grab it and do a custom build if they run into a larger system. Wavering between Returned with Feedback and Ready for Committer here. Thoughts?

-- Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 55ca1e8..5b9b582 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -60,6 +60,8 @@
 #define INT64_MAX INT64CONST(0x7FFFFFFFFFFFFFFF)
 #endif
 
+#define MAX_RANDOM_VALUE64 INT64_MAX
+
 /*
  * Multi-platform pthread implementations
  */
@@ -364,15 +366,84 @@ usage(const char *progname)
 		   progname, progname);
 }
 
+/*
+ * strtoint64 -- convert a string to 64-bit integer
+ *
+ * This function is a modified version of scanint8() from
+ * src/backend/utils/adt/int8.c.
+ */
+static int64
+strtoint64(const char *str)
+{
+	const char *ptr = str;
+	int64		result = 0;
+	int			sign = 1;
+
+	/*
+	 * Do our own scan, rather than relying on sscanf which might be broken
+	 * for long long.
+	 */
+
+	/* skip leading spaces */
+	while (*ptr && isspace((unsigned char) *ptr))
+		ptr++;
+
+	/* handle sign */
+	if (*ptr == '-')
+	{
+		ptr++;
+
+		/*
+		 * Do an explicit check for INT64_MIN. Ugly though this is, it's
+		 * cleaner than trying to get the loop below to handle it portably.
+		 */
+		if (strncmp(ptr, "9223372036854775808", 19) == 0)
+		{
+			result = -INT64CONST(0x7fffffffffffffff) - 1;
+			ptr += 19;
+			goto gotdigits;
+		}
+		sign = -1;
+	}
+	else if (*ptr == '+')
+		ptr++;
+
+	/* require at least one digit */
+	if (!isdigit((unsigned char) *ptr))
+		fprintf(stderr, "invalid input syntax for integer: \"%s\"\n", str);
+
+	/* process digits */
+	while (*ptr && isdigit((unsigned char) *ptr))
+	{
+		int64		tmp = result * 10 + (*ptr++ - '0');
+
+		if ((tmp / 10) != result)	/* overflow? */
+			fprintf(stderr, "value \"%s\" is out of range for type bigint\n", str);
+		result = tmp;
+	}
+
+gotdigits:
+
+	/* allow trailing whitespace, but not other trailing chars */
+	while (*ptr != '\0' && isspace((unsigned char) *ptr))
+		ptr++;
+
+	if (*ptr != '\0')
+		fprintf(stderr, "invalid input syntax for integer: \"%s\"\n", str);
+
+	return ((sign < 0) ? -result : result);
+}
+
 /* random number generator: uniform distribution from min to max inclusive */
-static int
-getrand(int min, int max)
+static int64
+getrand(int64 min, int64 max)
 {
 	/*
 	 * Odd coding is so that min and max have approximately the same chance of
 	 * being selected as do numbers between them.
 	 */
-	return min + (int) (((max - min + 1) * (double) random()) / (MAX_RANDOM_VALUE + 1.0));
+	return min + (int64) (((max - min + 1) * (double) random()) / (MAX_RANDOM_VALUE + 1.0));
 }
 
 /* call PQexec() and exit() on failure */
@@ -887,7 +958,7 @@ top:
 		if (commands[st->state] == NULL)
 		{
 			st->state = 0;
-			st->use_file = getrand(0, num_files - 1);
+			st->use_file = (int) getrand(0, num_files - 1);
 			commands = sql_files[st->use_file];
 		}
 	}
@@ -1007,7 +1078,7 @@ top:
 		if (pg_strcasecmp(argv[0], "setrandom") == 0)
 		{
 			char	   *var;
-			int			min,
+			int64		min,
 						max;
 			char		res[64];
@@ -1019,15 +1090,15 @@ top:
 					st->ecnt++;
 					return true;
 				}
-				min = atoi(var);
+				min = strtoint64(var);
 			}
 			else
-				min = atoi(argv[2]);
+				min = strtoint64(argv[2]);
 
 #ifdef NOT_USED
 			if (min < 0)
 			{
-				fprintf(stderr, "%s: invalid minimum number %d\n", argv[0], min);
+				fprintf(stderr, "%s: invalid minimum number " INT64_FORMAT "\n", argv[0], min);
 				st->ecnt++;
 				return;
 			}
@@ -1041,22 +1112,22 @@ top:
 					st->ecnt++;
 					return true;
 				}
-				max = atoi(var);
+				max = strtoint64(var);
 			}
 			else
-				max = atoi(argv[3]);
+				max = strtoint64(argv[3]);
 
-			if (max < min || max > MAX_RANDOM_VALUE)
+			if (max < min || max > MAX_RANDOM_VALUE64)
 			{
-				fprintf(stderr
[HACKERS] Allow pg_archivecleanup to ignore extensions
One bit of feedback I keep getting from people who archive their WAL files is that the fairly new pg_archivecleanup utility doesn't handle the case where those archives are compressed. As the sort of users who are concerned about compression are also often the ones with giant archives they struggle to clean up, they would certainly appreciate having a bundled utility to take care of that. The attached patch provides an additional option to the utility to provide this capability. It just strips a provided extension off any matching file it considers before running the test for whether it should be deleted or not. It includes updates to the usage message and some docs about how this might be used. Code by Jaime Casanova and myself.

Here's an example of it working:

$ psql -c "show archive_command"
 archive_command
 cp -i %p archive/%f < /dev/null && gzip archive/%f

[Yes, I know that can be written more cleanly. I call external scripts with more serious error handling than you can put into a single command line for this sort of thing in production.]

$ psql -c "select pg_start_backup('test',true)"
$ psql -c "select pg_stop_backup()"
$ psql -c "checkpoint"
$ psql -c "select pg_switch_xlog()"
$ cd $PGDATA/archive
$ ls
000000010000000000000025.gz
000000010000000000000026.gz
000000010000000000000027.gz
000000010000000000000028.00000020.backup.gz
000000010000000000000028.gz
000000010000000000000029.gz
$ pg_archivecleanup -d -x .gz `pwd` 000000010000000000000028.00000020.backup
pg_archivecleanup: keep WAL file "/home/gsmith/pgwork/data/archivecleanup/archive/000000010000000000000028" and later
pg_archivecleanup: removing file "/home/gsmith/pgwork/data/archivecleanup/archive/000000010000000000000025.gz"
pg_archivecleanup: removing file "/home/gsmith/pgwork/data/archivecleanup/archive/000000010000000000000027.gz"
pg_archivecleanup: removing file "/home/gsmith/pgwork/data/archivecleanup/archive/000000010000000000000026.gz"
$ ls
000000010000000000000028.00000020.backup.gz
000000010000000000000028.gz
000000010000000000000029.gz

We recently got some on-list griping that pg_standby doesn't handle archive files that are compressed, either.
Given how the job I'm working on this week is going, I'll probably have to add that feature next. That's actually an easier source code hack than this one, because of how the pg_standby code modularizes the concept of a restore command.

-- Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us

diff --git a/contrib/pg_archivecleanup/pg_archivecleanup.c b/contrib/pg_archivecleanup/pg_archivecleanup.c
index 7989207..a95b659 100644
*** a/contrib/pg_archivecleanup/pg_archivecleanup.c
--- b/contrib/pg_archivecleanup/pg_archivecleanup.c
*** const char *progname;
*** 36,41 ****
--- 36,42 ----
  /* Options and defaults */
  bool		debug = false;		/* are we debugging? */
+ char	   *additional_ext = NULL;	/* Extension to remove from filenames */
  char	   *archiveLocation;	/* where to find the archive? */
  char	   *restartWALFileName; /* the file from which we can restart restore */
*** static void
*** 93,105 ****
--- 94,135 ----
  CleanupPriorWALFiles(void)
  {
  	int			rc;
+ 	int			chop_at;
  	DIR		   *xldir;
  	struct dirent *xlde;
+ 	char		walfile[MAXPGPATH];
 
  	if ((xldir = opendir(archiveLocation)) != NULL)
  	{
  		while ((xlde = readdir(xldir)) != NULL)
  		{
+ 			strncpy(walfile, xlde->d_name, MAXPGPATH);
+ 			/*
+ 			 * Remove any specified additional extension from the filename
+ 			 * before testing it against the conditions below.
+ 			 */
+ 			if (additional_ext)
+ 			{
+ 				chop_at = strlen(walfile) - strlen(additional_ext);
+ 				/*
+ 				 * Only chop if this is long enough to be a file name and the
+ 				 * extension matches.
+ 				 */
+ 				if ((chop_at >= (XLOG_DATA_FNAME_LEN - 1)) &&
+ 					(strcmp(walfile + chop_at, additional_ext) == 0))
+ 				{
+ 					walfile[chop_at] = '\0';
+ 					/*
+ 					 * This is too chatty even for regular debug output, but
+ 					 * leaving it in for program testing.
+ 					 */
+ 					if (false)
+ 						fprintf(stderr,
+ 								"removed extension='%s' from file=%s result=%s\n",
+ 								additional_ext, xlde->d_name, walfile);
+ 				}
+ 			}
+ 
  			/*
  			 * We ignore the timeline part of the XLOG segment identifiers in
  			 * deciding whether a segment is still needed. This ensures that
*** CleanupPriorWALFiles(void)
*** 113,122 ****
  			 * file. Note that this means files are not removed in the order
  			 * they were originally written, in case this worries you.
  			 */
! 			if (strlen(xlde->d_name) == XLO
Re: [HACKERS] Spread checkpoint sync
Kevin Grittner wrote:
> There are occasional posts from those wondering why their read-only queries are so slow after a bulk load, and why they are doing heavy writes. (I remember when I posted about that, as a relative newbie, and I know I've seen others.)

Sure; I created http://wiki.postgresql.org/wiki/Hint_Bits a while back specifically to have a resource explaining that mystery to offer people. But there's a difference between having a performance issue that people don't understand, and having a real bottleneck you can't get rid of. My experience is that people who have hint bit issues run into them as a minor side-effect of a larger vacuum issue, and that if you get that under control, the hint bits are only a minor detail in comparison. That makes it hard to get too excited about optimizing them.

-- Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
Re: [HACKERS] [PERFORM] pgbench to the MAXINT
The update on the work to push towards a bigger pgbench is that I now have the patch running and generating databases larger than any previously possible scale:

$ time pgbench -i -s 25000 pgbench
...
2500000000 tuples done.
...
real    258m46.350s
user    14m41.970s
sys     0m21.310s

$ psql -d pgbench -c "select pg_size_pretty(pg_relation_size('pgbench_accounts'));"
 pg_size_pretty
 313 GB

$ psql -d pgbench -c "select pg_size_pretty(pg_relation_size('pgbench_accounts_pkey'));"
 pg_size_pretty
 52 GB

$ time psql -d pgbench -c "select count(*) from pgbench_accounts"
   count
 2500000000

real    18m48.363s
user    0m0.010s
sys     0m0.000s

The only thing wrong with the patch already sent that needed fixing to reach this point was this line:

for (k = 0; k < naccounts * scale; k++)

which needed an (int64) cast for the multiplied value in the middle there. Unfortunately the actual test itself doesn't run yet. Every line I see when running the SELECT-only test says:

client 0 sending SELECT abalance FROM pgbench_accounts WHERE aid = 1;

So something about the updated random generation code isn't quite right yet. Now that I have this monster built, I'm going to leave it on the server until I can sort that out, which hopefully will finish up in the next day or so.

-- Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
Re: [HACKERS] Spread checkpoint sync
Cédric Villemain wrote:
> Is it worth starting a new thread on the different I/O improvements done so far or ongoing, and how we might add new GUCs (if required!) with intelligence between those patches? (For instance, a hint bit I/O limit probably needs a tunable defining something similar to hint_write_completion_target and/or an I/O throttling strategy... items which are still in gestation.)

Maybe, but I wouldn't bring all that up right now. Trying to wrap up the CommitFest, too distracting, etc. As a larger statement on this topic, I'm never very excited about redesigning here starting from any point other than "saw a bottleneck on a production system". There's a long list of such things already around waiting to be addressed, and I've never seen any good evidence of work related to hint bits being on it. Please correct me if you know of some--I suspect you do from the way you're bringing this up. If we were to consider kicking off some larger work here, I would first ask where the data supporting that work being necessary is. It's hard enough to fix a bottleneck that's staring right at you; trying to address one that's just theorized is impossible.

-- Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
Re: [HACKERS] Spread checkpoint sync
What's happening then is that there can be a long pause between the end of the write phase and when syncs start to happen, which I consider a good thing. It gives the kernel a little more time to try and get writes moving out to disk. Here's what that looks like on my development desktop:

2011-02-07 00:46:24 EST: LOG:  checkpoint starting: time
2011-02-07 00:48:04 EST: DEBUG:  checkpoint sync: estimated segments=10
2011-02-07 00:48:24 EST: DEBUG:  checkpoint sync: naps=99
2011-02-07 00:48:36 EST: DEBUG:  checkpoint sync: number=1 file=base/16736/16749.1 time=12033.898 msec
2011-02-07 00:48:36 EST: DEBUG:  checkpoint sync: number=2 file=base/16736/16749 time=60.799 msec
2011-02-07 00:48:48 EST: DEBUG:  checkpoint sync: naps=59
2011-02-07 00:48:48 EST: DEBUG:  checkpoint sync: number=3 file=base/16736/16756 time=0.003 msec
2011-02-07 00:49:00 EST: DEBUG:  checkpoint sync: naps=60
2011-02-07 00:49:00 EST: DEBUG:  checkpoint sync: number=4 file=base/16736/16750 time=0.003 msec
2011-02-07 00:49:12 EST: DEBUG:  checkpoint sync: naps=60
2011-02-07 00:49:12 EST: DEBUG:  checkpoint sync: number=5 file=base/16736/16737 time=0.004 msec
2011-02-07 00:49:24 EST: DEBUG:  checkpoint sync: naps=60
2011-02-07 00:49:24 EST: DEBUG:  checkpoint sync: number=6 file=base/16736/16749_fsm time=0.004 msec
2011-02-07 00:49:36 EST: DEBUG:  checkpoint sync: naps=60
2011-02-07 00:49:36 EST: DEBUG:  checkpoint sync: number=7 file=base/16736/16740 time=0.003 msec
2011-02-07 00:49:48 EST: DEBUG:  checkpoint sync: naps=60
2011-02-07 00:49:48 EST: DEBUG:  checkpoint sync: number=8 file=base/16736/16749_vm time=0.003 msec
2011-02-07 00:50:00 EST: DEBUG:  checkpoint sync: naps=60
2011-02-07 00:50:00 EST: DEBUG:  checkpoint sync: number=9 file=base/16736/16752 time=0.003 msec
2011-02-07 00:50:12 EST: DEBUG:  checkpoint sync: naps=60
2011-02-07 00:50:12 EST: DEBUG:  checkpoint sync: number=10 file=base/16736/16754 time=0.003 msec
2011-02-07 00:50:12 EST: LOG:  checkpoint complete: wrote 14335 buffers (43.7%); 0 transaction log file(s)
added, 0 removed, 64 recycled; write=47.873 s, sync=127.819 s, total=227.990 s; sync files=10, longest=12.033 s, average=1.209 s

Since this is ext3 the spike during the first sync is brutal anyway, but it tried very hard to avoid that: it waited 99 * 200ms = 19.8 seconds between writing the last buffer and when it started syncing them (00:48:04 to 00:48:24). Given the slow write for #1, it was then behind, so it immediately moved on to #2. But after that, it was able to insert a moderate nap time between successive syncs--60 naps is 12 seconds, and it keeps that pace for the remainder of the sync. This is the same sort of thing I'd worked out as optimal on the system this patch originated from, except it had a lot more dirty relations; that's why its naptime was the 3 seconds hard-coded into earlier versions of this patch. Results on XFS with mini-server class hardware should be interesting...

-- Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 4df69c2..f58ac3e 100644
*** a/src/backend/postmaster/bgwriter.c
--- b/src/backend/postmaster/bgwriter.c
*** static bool am_bg_writer = false;
*** 168,173 ****
--- 168,175 ----
  static bool ckpt_active = false;
 
+ static int	checkpoint_flags = 0;
+ 
  /* these values are valid when ckpt_active is true: */
  static pg_time_t ckpt_start_time;
  static XLogRecPtr ckpt_start_recptr;
*** static pg_time_t last_xlog_switch_time;
*** 180,186 ****
  static void CheckArchiveTimeout(void);
  static void BgWriterNap(void);
! static bool IsCheckpointOnSchedule(double progress);
  static bool ImmediateCheckpointRequested(void);
  static bool CompactBgwriterRequestQueue(void);
--- 182,188 ----
  static void CheckArchiveTimeout(void);
  static void BgWriterNap(void);
! static bool IsCheckpointOnSchedule(double progress, double target);
  static bool ImmediateCheckpointRequested(void);
  static bool CompactBgwriterRequestQueue(void);
*** CheckpointWriteDelay(int flags, double p
*** 691,696 ****
--- 693,701 ----
  	if (!am_bg_writer)
  		return;
 
+ 	/* Cache this value for a later spread sync */
+ 	checkpoint_flags = flags;
+ 
  	/*
  	 * Perform the usual bgwriter duties and take a nap, unless we're behind
  	 * schedule, in which case we just try to catch up as quickly as possible.
*** CheckpointWriteDelay(int flags, double p
*** 698,704 ****
  	if (!(flags & CHECKPOINT_IMMEDIATE) &&
  		!shutdown_requested &&
  		!ImmediateCheckpointRequested() &&
! 		IsCheckpointOnSchedule(progress))
  	{
  		if (got_SIGHUP)
  		{
--- 703,709 ----
  	if (!(flags & CHECKPOINT_IMMEDIATE) &&
Re: [HACKERS] keeping a timestamp of the last stats reset (for a db, table and function)
Tomas Vondra wrote:
> Because when I create a database, the field is NULL - that's true. But once I connect to the database, the stats are updated and the field is set (thanks to the logic in pgstat.c).

OK--so it does what I was hoping for, I just didn't test it the right way. Let's call that a documentation issue and move on. Attached is an updated patch that fixes the docs and some other random bits. Looks ready for committer to me now. Make sure to adjust PGSTAT_FILE_FORMAT_ID, do a catversion bump, and set final OIDs for the new functions.

Below is what changed since the last posted version, mainly as feedback for Tomas:

-Explained more clearly that pg_stat_reset and pg_stat_reset_single_counters will both touch the database reset time, and that it's initialized upon first connection to the database.

-Added the reset time to the list of fields in pg_stat_database and pg_stat_bgwriter.

-Fixed some tab/whitespace issues. It looks like you had tab stops set at 8 characters during some points when you were editing non-code files. Also, there were a couple of spots where you used a tab while the surrounding text used spaces. You can normally see both types of errors if you read a patch; they show up as misaligned things in the context diff.

-Removed some extra blank lines that didn't fit the style of the surrounding code.

Basically, all the formatting bits I'm nitpicking about I found just by reading the patch itself; they all stuck right out. I'd recommend a pass of that before submitting things if you want to try and avoid those in the future.
-- Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index ca83421..100f938 100644
*** a/doc/src/sgml/monitoring.sgml
--- b/doc/src/sgml/monitoring.sgml
*** postgres: user
--- 267,273 ----
  by backends (that is, not by the background writer), how many times
  those backends had to execute their own fsync calls (normally the
  background writer handles those even when the backend does its own
! write), total buffers allocated, and time of last statistics reset.
*** postgres: user
--- 278,286 ----
  number of transactions committed and rolled back in that database,
  total disk blocks read, total buffer hits (i.e., block read requests
  avoided by finding the block already in buffer cache),
! number of rows returned, fetched, inserted, updated and deleted, the
  total number of queries cancelled due to conflict with recovery (on
! standby servers), and time of last statistics reset.
*** postgres: user
+ pg_stat_get_db_stat_reset_time(oid)
+ timestamptz
+ 
+ Time of the last statistics reset for the database. Initialized to the
+ system time during the first connection to each database. The reset time
+ is updated when you call pg_stat_reset on the
+ database, as well as upon execution of
+ pg_stat_reset_single_table_counters against any
+ table or index in it.
+ 
  pg_stat_get_numscans(oid)
  bigint
*** postgres: user
  bgwriter_lru_maxpages parameter
+ 
+ pg_stat_get_bgwriter_stat_reset_time()
+ timestamptz
+ 
+ Time of the last statistics reset for the background writer, updated
+ when executing pg_stat_reset_shared('bgwriter')
+ on the database cluster.
+ 
  pg_stat_get_buf_written_backend()
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 718e996..904714f 100644
*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
*** CREATE VIEW pg_stat_database AS
*** 523,529 ****
  			pg_stat_get_db_tuples_inserted(D.oid) AS tup_inserted,
  			pg_stat_get_db_tuples_updated(D.oid) AS tup_updated,
  			pg_stat_get_db_tuples_deleted(D.oid) AS tup_deleted,
! 			pg_stat_get_db_conflict_all(D.oid) AS conflicts
  	FROM pg_database D;
 
  CREATE VIEW pg_stat_database_conflicts AS
--- 523,530 ----
  			pg_stat_get_db_tuples_inserted(D.oid) AS tup_inserted,
  			pg_stat_get_db_tuples_updated(D.oid) AS tup_updated,
  			pg_stat_get_db_tuples_deleted(D.oid) AS tup_deleted,
! 			pg_stat_get_db_conflict_all(D.oid) AS conflicts,
! 			pg_stat_get_db_stat_reset_time(D.oid) AS stats_reset
  	FROM pg_database D;
 
  CREATE VIEW pg_stat_database_conflicts AS
*** CREATE VIEW pg_stat_bgwriter AS
*** 570,576 **
Re: [HACKERS] Linux filesystem performance and checkpoint sorting
Josh Berkus wrote:
> So: Linux flavor? Kernel version? Disk system and PG directory layout?

OS configuration and PostgreSQL settings are saved into the output from the later runs (I added that somewhere in the middle): http://www.2ndquadrant.us/pgbench-results/294/pg_settings.txt

That's Ubuntu 10.04, kernel 2.6.32. There is a test rig bug that queries the wrong PostgreSQL settings in the later ones, but they didn't change after #294 here. The kernel configuration stuff is accurate though, which confirms exactly which settings for the dirty_* parameters were in effect during the tests where I was changing those around.

16GB of RAM, 8 Hyperthreaded cores (4 real ones) via Intel i7-870. Areca ARC-1210 controller, 256MB of cache.

Filesystem  1K-blocks  Used  Available  Use%  Mounted on
/dev/sda1         40G  7.5G        30G   20%  /
/dev/md1         838G   15G       824G    2%  /stripe
/dev/sdd1        149G  2.1G       147G    2%  /xlog

/stripe is a 3 disk RAID0, set up to only use the first section of each drive ("short-stroked"). That makes its performance a little more like a small SAS disk, rather than the cheapo 7200RPM SATA drives they actually are (Western Digital 640GB WD6400AAKS-65A7B). /xlog is a single disk, 160GB WD1600AAJS-00WAA. OS, server logs, and test results information all go to the root filesystem on a different drive. My aim was to get similar performance to what someone with an 8-disk RAID10 array might see, except without the redundancy. Basic entry-level database server here in 2011.

bonnie++ on the main database disk: read 301MB/s, write 215MB/s, seeks 423.4/second. Measured around 10K small commits/second to prove the battery-backed write cache works fine.

-- Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
Re: [HACKERS] Linux filesystem performance and checkpoint sorting
Mark Kirkwood wrote:
> Are you going to do some runs with ext4? I'd be very interested to see how it compares (assuming that you are on a kernel version 2.6.32 or later so ext4 is reasonably stable...).

Yes, before I touch this system significantly I'll do ext4 as well, and this is running the Ubuntu 10.04 2.6.32 kernel, so ext4 should be stable enough. I have some PostgreSQL work that needs to get finished first though.

-- Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
Re: [HACKERS] Spread checkpoint sync
As already mentioned in the broader discussion at http://archives.postgresql.org/message-id/4d4c4610.1030...@2ndquadrant.com , I'm seeing no solid performance swing in the checkpoint sorting code itself. Better sometimes, worse others, but never by a large amount. Here's what the statistics part derived from the sorted data looks like on a real checkpoint spike:

2011-02-04 07:02:51 EST: LOG:  checkpoint starting: xlog
2011-02-04 07:02:51 EST: DEBUG:  BufferSync 10 dirty blocks in relation.segment_fork 17216.0_2
2011-02-04 07:02:51 EST: DEBUG:  BufferSync 159 dirty blocks in relation.segment_fork 17216.0_1
2011-02-04 07:02:51 EST: DEBUG:  BufferSync 10 dirty blocks in relation.segment_fork 17216.3_0
2011-02-04 07:02:51 EST: DEBUG:  BufferSync 548 dirty blocks in relation.segment_fork 17216.4_0
2011-02-04 07:02:51 EST: DEBUG:  BufferSync 808 dirty blocks in relation.segment_fork 17216.5_0
2011-02-04 07:02:51 EST: DEBUG:  BufferSync 799 dirty blocks in relation.segment_fork 17216.6_0
2011-02-04 07:02:51 EST: DEBUG:  BufferSync 807 dirty blocks in relation.segment_fork 17216.7_0
2011-02-04 07:02:51 EST: DEBUG:  BufferSync 716 dirty blocks in relation.segment_fork 17216.8_0
2011-02-04 07:02:51 EST: DEBUG:  BufferSync 3857 buffers to write, 8 total dirty segment file(s) expected to need sync
2011-02-04 07:03:31 EST: DEBUG:  checkpoint sync: number=1 file=base/16384/17216.5 time=1324.614 msec
2011-02-04 07:03:31 EST: DEBUG:  checkpoint sync: number=2 file=base/16384/17216.4 time=0.002 msec
2011-02-04 07:03:31 EST: DEBUG:  checkpoint sync: number=3 file=base/16384/17216_fsm time=0.001 msec
2011-02-04 07:03:47 EST: DEBUG:  checkpoint sync: number=4 file=base/16384/17216.10 time=16446.753 msec
2011-02-04 07:03:53 EST: DEBUG:  checkpoint sync: number=5 file=base/16384/17216.8 time=5804.252 msec
2011-02-04 07:03:53 EST: DEBUG:  checkpoint sync: number=6 file=base/16384/17216.7 time=0.001 msec
2011-02-04 07:03:54 EST: DEBUG:  compacted fsync request queue from 32768 entries to 2 entries
2011-02-04 07:03:54 EST: CONTEXT:  writing block 1642223 of relation base/16384/17216
2011-02-04 07:04:00 EST: DEBUG:  checkpoint sync: number=7 file=base/16384/17216.11 time=6350.577 msec
2011-02-04 07:04:00 EST: DEBUG:  checkpoint sync: number=8 file=base/16384/17216.9 time=0.001 msec
2011-02-04 07:04:00 EST: DEBUG:  checkpoint sync: number=9 file=base/16384/17216.6 time=0.001 msec
2011-02-04 07:04:00 EST: DEBUG:  checkpoint sync: number=10 file=base/16384/17216.3 time=0.001 msec
2011-02-04 07:04:00 EST: DEBUG:  checkpoint sync: number=11 file=base/16384/17216_vm time=0.001 msec
2011-02-04 07:04:00 EST: LOG:  checkpoint complete: wrote 3813 buffers (11.6%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=39.073 s, sync=29.926 s, total=69.003 s; sync files=11, longest=16.446 s, average=2.720 s

You can see that it ran out of fsync absorption space in the middle of the sync phase, which is usually when compaction is needed, but the recent patch to fix that kicked in and did its thing. A couple of observations:

-The total number of buffers I'm computing based on the sorted checkpoint writes is not a perfect match to the number reported by the "checkpoint complete" status line. Sometimes they are the same, sometimes not. Not sure why yet.

-The estimate for "expected to need sync" computed as a by-product of the checkpoint sorting is not completely accurate either. This particular one has a fairly large error in it, percentage-wise, being off by 3 with a total of 11. Presumably these are absorbed fsync requests that were already queued up before the checkpoint even started. So any time estimate I derive based on this count is only going to be approximate.

-The order in which the sync phase processes files is unrelated to the order in which they are written out. Note that 17216.10 here, the biggest victim (cause?) of the I/O spike, isn't even listed among the checkpoint writes!
The fuzziness here is a bit disconcerting, and I'll keep digging into why it happens. But I don't see any reason not to continue forward using the rough count here to derive a nap time from, which I can then feed into the "useful leftovers" patch that Robert already refactored here. I can always sharpen up that estimate later; next I need to get some solid results I can share on what the delay time does to the throughput/latency pattern.

-- Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
Re: [HACKERS] Spread checkpoint sync
Michael Banck wrote:
> On Sat, Jan 15, 2011 at 05:47:24AM -0500, Greg Smith wrote:
> > For example, the pre-release Squeeze numbers we're seeing are awful so far, but it's not really done yet either.
> Unfortunately, it does not look like Debian squeeze will change any more (or has changed much since your post) at this point, except for maybe further stable kernel updates. Which file system did you see those awful numbers on and could you maybe go into some more detail?

Once the release comes out any day now I'll see if I can duplicate them on hardware I can talk about fully, and share the ZCAV graphs if it's still there. The server I've been running all of the extended pgbench tests in this thread on is running Ubuntu simply as a temporary way to get 2.6.32 before Squeeze ships. Last time I tried installing one of the Squeeze betas I didn't get anywhere; hoping the installer bug I ran into has been sorted when I try again.

-- Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
[HACKERS] Linux filesystem performance and checkpoint sorting
Switching to a new thread for this summary since there's some much more generic info here...at this point I've finished exploring the major Linux filesystem and tuning options I wanted to, as part of examining changes to the checkpoint code. You can find all the raw data at http://www.2ndquadrant.us/pgbench-results/index.htm

Here are some highlights of what's been demonstrated there recently, with a summary of some of the more subtle and interesting data in the attached CSV file too:

-On ext3, tuning the newish kernel tunables dirty_bytes and dirty_background_bytes down to a lower level than was possible using the older dirty_*ratio ones shows a significant reduction in maximum latency; it drops to about 1/4 of the worst-case behavior. Unfortunately transactions per second takes a 10-15% hit in the process. Not shown in the data there is that the VACUUM cleanup time between tests was really slowed down too, running at around half the speed it manages when the system has a full-size write cache.

-Switching from ext3 to xfs gave over a 3X speedup on the smaller test set: from the 600-700 TPS range to around 2200 TPS. The TPS rate on the larger data set actually slowed down a touch on XFS, around 10%. Still, such a huge win when it's better makes it easy to excuse the occasional cases where it's a bit slower. And the latency situation is just wildly better, the main thing that drove me toward using XFS more in the first place: anywhere from 1/6 to 1/25 of the worst-case latency seen on ext3. With abusively high client counts for this hardware, you can still see >10 second pauses, but you don't see the >40 second ones ext3 experiences at moderate client counts.

-Switching to the lowest possible dirty_*bytes parameters on XFS was negative in every way. TPS was cut in half, and maximum latency actually went up. Between this and the nasty VACUUM slowdown, I don't really see that much potential for these new tunables.
They do lower latency on ext3 a lot, but even there the penalty you pay for that is quite high. VACUUM in particular seems to really, really benefit from having a giant write cache to dump its work into--possibly due to the way the ring buffer implementation avoids using the database's own cache for that work. -Since earlier tests suggested sorting checkpoints gave little change on ext3, I started testing that with XFS instead. The result is a bit messy. At the lower scale, TPS went up a bit, but so did maximum latency. At the higher scale, TPS dropped in some cases (typically less than 1%), but most latency results were better too. At this point I would say checkpoint sorting remains a wash: you can find workloads it benefits a little, and others it penalizes a little. I would say that it's neutral enough on average that if it makes sense to include for other purposes, that's unlikely to be a really bad change for anyone. But I wouldn't want to see it committed by itself; there needs to be some additional benefit from the sorting before it's really worthwhile. 
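For reference, the byte-based write-cache tunables discussed above are set through sysctl. A minimal sketch of a persistent configuration fragment follows; the 64MB/256MB values here are illustrative placeholders, not the settings used in these benchmarks (and note the kernel zeroes the corresponding *_ratio tunable when the *_bytes form is set):

```shell
# Sketch of an /etc/sysctl.d/ fragment -- values are made-up examples,
# not the ones tested above.
vm.dirty_background_bytes = 67108864    # start background writeback at ~64MB of dirty data
vm.dirty_bytes = 268435456              # block writers once ~256MB is dirty

# Or apply at runtime without editing any file:
#   sysctl -w vm.dirty_background_bytes=67108864
#   sysctl -w vm.dirty_bytes=268435456
```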
-- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

Attached CSV data, one row per line:

"Compact fsync",,"ext3",,,"XFS + Regular Writes",,"Sorted Writes"
"scale","clients","tps","max_latency","XFS Speedup","tps","max_latency","tps","max_latency","TPS Delta","%","Latency Delta"
500,16,631,17116.31,3.49,2201,1290.73,2210,2070.74,9,0.41%,780.01
500,32,655,24311.54,3.37,2205,1379.14,2357,1971.2,152,6.89%,592.06
500,64,727,38040.39,3.11,2263,1440.48,2332,1763.29,69,3.05%,322.81
500,128,687,48195.77,3.2,2201,1743.11,2221,2742.18,20,0.91%,999.07
500,256,747,46799.48,2.92,2184,2429.74,2171,2356.14,-13,-0.60%,-73.6
1000,16,321,40826.58,1.21,389,1586.17,386,1598.54,-3,-0.77%,12.37
1000,32,345,27910.51,0.91,314,2150.94,331,2078.02,17,5.41%,-72.91
1000,64,358,45138.1,0.94,336,6681.57,320,6469.71,-16,-4.76%,-211.87
1000,128,372,47125.46,0.88,328,8707.42,330,9037.63,2,0.61%,330.21
1000,256,350,83232.14,0.91,317,11973.35,315,11248.18,-2,-0.63%,-725.17
Re: [HACKERS] [PERFORM] pgbench to the MAXINT
Robert Haas wrote: At least in my book, we need to get this committed in the next two weeks, or wait for 9.2.

Yes, I was just saying that I was not going to get started in the first week or two, given the other pgbench-related tests I had queued up already. Those are wrapping up nicely, and I'll start testing the performance of this change over the weekend.

-- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
Re: [HACKERS] keeping a timestamp of the last stats reset (for a db, table and function)
Thinking I should start with why I think this patch is neat...most of the servers I deal with are up 24x7 minus small amounts of downtime, presuming everyone does their job right that is. In that environment, having a starting timestamp for when the last stats reset happened lets you quickly compute some figures in per-second terms that are pretty close to actual average activity on the server. Some examples of how I would use this:

psql -c "
SELECT CAST(buffers_backend * block_size AS numeric) / seconds_uptime /
       (1024*1024) AS backend_mb_per_sec
FROM (SELECT *,
        extract(epoch from now()-stats_reset) AS seconds_uptime,
        (SELECT cast(current_setting('block_size') AS int8)) AS block_size
      FROM pg_stat_bgwriter) AS raw
WHERE raw.seconds_uptime > 0
"

 backend_mb_per_sec
--------------------
   4.27150807681618

psql -c "
SELECT datname,
       CAST(xact_commit AS numeric) / seconds_uptime AS commits_per_sec
FROM (SELECT *,
        extract(epoch from now()-stats_reset) AS seconds_uptime
      FROM pg_stat_database) AS raw
WHERE raw.seconds_uptime > 0
"

  datname  |  commits_per_sec
-----------+--------------------
 template1 | 0.0338722604313051
 postgres  | 0.0363144438470267
 gsmith    | 0.0820573653236174
 pgbench   |  0.059147072347085

Now I reset, put some load on the system, and check the same stats afterward; watch how close these match up:

$ psql -d pgbench -c "select pg_stat_reset()"
$ pgbench -j 4 -c 32 -T 30 pgbench
transaction type: TPC-B (sort of)
scaling factor: 100
query mode: simple
number of clients: 32
number of threads: 4
duration: 30 s
number of transactions actually processed: 6604
tps = 207.185627 (including connections establishing)
tps = 207.315043 (excluding connections establishing)

 datname | commits_per_sec
---------+------------------
 pgbench | 183.906308135572

Both these examples work as I expected, and some playing around with the patch didn't find any serious problems with the logic it implements.
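The per-second math those queries do is simple enough to sanity-check outside the database. A small Python sketch of the same computation, using the numbers from the pgbench run above (the query reported ~184 commits/sec rather than 6604/30 ≈ 220 because its measurement window also covers the idle time before and after the 30-second benchmark):

```python
from datetime import datetime, timedelta

def per_second(counter_value, stats_reset, now):
    """Convert a cumulative stats counter into an average per-second rate,
    treating the stats_reset timestamp as the start of the window --
    the same arithmetic as the extract(epoch from now()-stats_reset) queries."""
    seconds_uptime = (now - stats_reset).total_seconds()
    if seconds_uptime <= 0:
        return None  # stats were just reset, or never initialized
    return counter_value / seconds_uptime

# 6604 transactions over an exactly 30-second window
start = datetime(2011, 1, 1, 12, 0, 0)
end = start + timedelta(seconds=30)
print(per_second(6604, start, end))  # ~220.13 commits/sec
```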
One issue though, an oversight that I think can be improved upon; watch what happens when I create a new database:

$ createdb blank
$ psql -c "select datname,stats_reset from pg_stat_database where datname='blank'"

 datname | stats_reset
---------+-------------
 blank   |

That's not really what I would hope for here. One major sort of situation I'd like this feature to work in is the one where someone asks for help but has never touched their database stats before, which is exactly what I'm simulating here. In this case that person would be out of luck--the opposite of the experience I'd like a newbie to have at this point.

The logic Tomas put in here to initialize things in the face of never having had a stats reset is reasonable. But I think to really be complete, this needs to hook database creation and make sure the value gets initialized with the current timestamp, not just left blank. Do that, and I think this will make a nice incremental feature on top of the existing stats structure.

-- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books