Re: [HACKERS] Parsing config files in a directory
On Mon, 26 Oct 2009, Alvaro Herrera wrote: some things are defined in postgresql.conf by initdb and you probably want to be able to change them by SET PERSISTENT anyway (e.g. lc_messages, listen_addresses, shared_buffers) An obvious next step once the directory parsing is committed is to change initdb to put all of its changes into a separate file. Ideally, 8.5 would ship with a postgresql.conf having zero active settings, and the conf/ directory would have two entries: initdb.conf : shared_buffers, lc_messages, listen_addresses, etc. persistent.conf : Blank except for comment text People who want to continue managing just the giant postgresql.conf are free to collapse the initdb.conf back into the larger file instead. If we wanted to make that transition easier, an option to initdb saying do things the old way might make sense. I think the best we can do here is make a path where new users who don't ask for anything special get a setup that's easy for tools to work on, while not completely deprecating the old approach for those who want it--but you have to ask for it. -- * Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parsing config files in a directory
On Mon, 26 Oct 2009, Alvaro Herrera wrote: But to me this also says that SET PERSISTENT has to go over 00initdb.conf and add a comment mark to the setting. Now you're back to being screwed if the server won't start because of your change, because you've lost the original working setting. I think the whole idea of making tools find duplicates and comment them out as part of making their changes is fundamentally broken, and it's just going to get worse when switching to use more config files. The fact that user edits can introduce the same problem, where something is set in more than one file but only one of them works, means that you can run into this even if tool editing hygiene is perfect. A whole new approach is needed if you're going to get rid of this problem both for tools and for manual edits. What I've been thinking of is making it possible to run a configuration file check that scans the config structure exactly the same way as the server, but when it finds a duplicate setting it produces a warning showing where the one being ignored is. The patch added near to the end of 8.4 development that remembers the source file and line number of lines already parsed made that more straightforward I think. Not having that data is what made this hard to write when I last considered it a while ago. If you had that utility, it's a simple jump to then make it run in a --fix mode that just comments out every such ignored duplicate. Now you've got a solution to this problem that handles any sort of way users can mess with the configuration. One might even make a case that this tool should get run just after every time the server starts successfully. -- * Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
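The duplicate check described in this message can be sketched roughly as follows. This is a minimal illustration in Python, not the proposed utility: the function name find_ignored_duplicates is hypothetical, the line matching is far simpler than what guc-file.l accepts (it ignores include directives and treats every "#" as a comment start, even inside quotes), and it assumes the caller passes the files in the same order the server would parse them.

```python
import re

# Simplified assumption: "name = value" or "name value", nothing fancier.
SETTING_RE = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_.]*)\s*=?\s*(.*?)\s*$")


def find_ignored_duplicates(files):
    """Return (name, filename, lineno) for every setting that a later line overrides.

    files is a list of (filename, text) pairs in parse order.
    """
    last_seen = {}  # setting name -> (filename, lineno) of the most recent assignment
    ignored = []
    for filename, text in files:
        for lineno, raw in enumerate(text.splitlines(), start=1):
            line = raw.split("#", 1)[0]  # naive comment stripping (assumption)
            if not line.strip():
                continue
            match = SETTING_RE.match(line)
            if match is None:
                continue
            name = match.group(1).lower()
            if name in last_seen:
                # The earlier assignment loses; remember where it lives.
                ignored.append((name,) + last_seen[name])
            last_seen[name] = (filename, lineno)
    return ignored


if __name__ == "__main__":
    sample = [
        ("postgresql.conf", "shared_buffers = 32MB\n# comment\nwork_mem = 1MB\n"),
        ("conf/99persistent.conf", "shared_buffers = 1GB\n"),
    ]
    for name, filename, lineno in find_ignored_duplicates(sample):
        print(f"WARNING: {name} at {filename}:{lineno} is overridden later")
```

A --fix mode along the lines suggested above would then only need to prefix each reported line with a comment mark, which is why having the source file and line number available matters.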
Re: [HACKERS] Parsing config files in a directory
On Mon, 26 Oct 2009, Tom Lane wrote: BTW, why do we actually need an includedir mechanism for this? A simple include of a persistent.conf file seems like it would be enough. Sure, you could do it that way. This patch is more about elegance than about being strictly required. The general consensus here seemed to be that if you're going to start shipping the database with more than one config file, rather than just hacking those in one at a time it would be preferable to grab a directory of them. That seems to be how similar programs handle things once the number of shipped config files goes from 1 to more than 1. One thing this discussion has made me reconsider is whether one of those files needs to be enforced as always the last one to be parsed, similar to how postgresql.conf is always the first one. I am slightly concerned that a future SET PERSISTENT mechanism might update a setting that's later overridden by a file that just happens to be found later than the mythical persistent.conf. I'd rather worry about that in the future than burden the current design with that detail though. Alvaro already introduced the init-script way of handling this by suggesting the configuration file name 00initdb; using that and 99persistent would seem to be a reasonable solution that's quite familiar to much of the target audience here. Note that I don't think that standard requires anything beyond what the proposed patch already does, processing files in alphabetical order. -- * Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Parsing config files in a directory
On Mon, 26 Oct 2009, Tom Lane wrote: When and if there is some evidence of people actually getting confused, we could consider trying to auto-comment-out duplicate settings. But I've never heard of any other tool doing that, and fail to see why we should think Postgres needs to. It's what people tend to do when editing the postgresql.conf file(s) by hand, which is why I think there's some expectation that tools will continue that behavior. What everyone should understand is that we don't have more tools exactly because their design always gets burdened with details like that. This is easy to handle by hand, but hard to get a program to do in a way that satisfies what everyone is looking for. Raising the bar for tool-assisted changes (and I'm including SET PERSISTENT in that category) like that is one reason so few such tools have been written. -- * Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parsing config files in a directory
On Mon, 26 Oct 2009, Kevin Grittner wrote: We do find the include capabilities useful. For example, for our 72 production servers for county Circuit Court systems, we copy an identical postgresql.conf file to each county, with the last line being an include to an overrides conf file in /etc/. For most counties that file is empty. For counties where we've installed extra RAM or where data is not fully cached, we override settings like effective_cache_size or the page costs. I can't see where any of the options under discussion would do much to help an environment like ours -- they seem more likely to help shops with fewer servers or more relaxed deployment procedures. That's exactly a use case the parsing config files in a directory feature aims to make easier to manage. You can just mix and match files that adjust a subset of the postgresql.conf without having to explicitly include them. For this sort of situation, you could create a base set of configuration changes, then a set that customizes for less common server configurations, and possibly even server-specific ones. Copy in the subset from that master list of possible configuration sets that apply to this server and you're done. Since variations on this feedback keep coming up, let's be clear here: there is nothing this patch aims to add you can't already do with include files. It's just a way to make more aggressive use of include files easier to manage, and therefore make doing so in the default configuration less objectionable. -- * Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Parsing config files in a directory
On Mon, 26 Oct 2009, Greg Stark wrote: When scanning postgresql.conf.d we should follow the Apache/Debian standard of scanning only files which match a single simple hard-coded template. I think the convention is basically the regexp ^[0-9a-zA-Z-]*.conf$. It's important that it exclude typical backup file conventions like foo~ or foo.bak and lock file conventions like .#foo. There's no need for this to be configurable and I think that would be actively harmful. If the default glob pattern is *.conf, won't all those already be screened out? I can see your point that letting it be adjustable will inevitably result in some fool one day writing a bad matching pattern that does grab backup/lock files. But is that concern so important that we should limit what people who know what they're doing are allowed to do? That also seems to be the theme of the rest of your comments about how to reorganize the postgresql.conf file. Your comments about what should and shouldn't be configurable presume it's OK for your priorities and what you like to be enforced as policy on everyone. Whether or not I agree with you, I object to the idea of dictating in this area because it just encourages argument. The goal here is to add flexibility and ways people can choose to work with the configuration, not to replace what's being done now outright with an approach everyone must adopt. -- * Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD
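To make the glob versus regexp question above concrete, here is a small Python comparison. It is only a sketch under stated assumptions: the helper names glob_accepts and strict_accepts are hypothetical, the regexp is the quoted one with the dot escaped (the quoted form leaves it unescaped, so it would match any character there), and the glob helper emulates the usual shell rule that "*" does not match a leading dot, which Python's fnmatch does not enforce on its own.

```python
import re
from fnmatch import fnmatchcase

# The quoted template, with the dot before "conf" escaped (assumption).
STRICT = re.compile(r"^[0-9a-zA-Z-]*\.conf$")


def glob_accepts(name):
    """Shell-style *.conf match; hidden files (leading dot) are skipped."""
    return not name.startswith(".") and fnmatchcase(name, "*.conf")


def strict_accepts(name):
    """Match against the hard-coded template quoted above."""
    return STRICT.match(name) is not None


if __name__ == "__main__":
    names = ["00initdb.conf", "foo.conf~", "foo.conf.bak", ".#foo.conf", "my_file.conf"]
    for name in names:
        print(f"{name:15} glob={glob_accepts(name)!s:5} strict={strict_accepts(name)}")
```

Under these assumptions both patterns reject the backup and lock file names; the visible difference is that the strict template also rejects names with characters outside its set, such as the underscore in my_file.conf.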
Re: [HACKERS] Parsing config files in a directory
It sounds like there's a consensus brewing here on what should get done with this particular patch now. Let me try to summarize:

-The new feature should be activated by allowing you to specify a directory to include in the postgresql.conf like this:

  includedir 'conf'

 with the same basic semantics for how that directory name is interpreted as the existing include directive. Tom has some concerns on how this will be implemented, with glob portability to Windows and error cleanup as two of the issues to consider.
-Within that directory, only file names of the form *.conf will be processed. More flexibility is hard to implement and of questionable value.
-The order they are processed in will be alphabetical. This allows (but doesn't explicitly require) using the common convention of names like 99name to get a really obvious ordering.
-The default postgresql.conf should be updated to end with the sample includedir statement shown above. This will make anything that goes into there be processed after the main file, and therefore override anything in it.
-An intended purpose here is making tools easier to construct. It's impractical to expect every tool that touches files in the config directory to do an exhaustive sweep to find every other place there might be a conflict and comment them all out. The fact that pg_settings shows users the exact file and line of the setting that is the active one is a good enough tool to allow DBAs to work through most of the problem cases.

And as far as how it impacts planning:

-A future patch to initdb could move the changes it makes from the primary file to one in the config directory. It might make sense to use a name like 00initdb.conf to encourage a known good naming practice for files in the config directory; that doesn't need to get nailed down now though.
-This patch makes it easier to envision implementing a smaller default postgresql.conf, but it doesn't require such a change to be useful.
-SET PERSISTENT is still a bit away.
This patch assists in providing a cleaner preferred way to implement that, and certainly doesn't make it harder to build. The issue of how to handle backing out changes that result in a non-functional server configuration is still there. And there's some support for the idea that the SQL interface should do more sanity checks to make sure its setting changes aren't being overridden by config files parsed later than we might expect from external tuning tools. Magnus, was there anything else you wanted feedback on here? -- * Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
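The ordering rules summarized in this message (main file first, then *.conf files from the included directory in alphabetical order, later values overriding earlier ones) can be modelled in a few lines of Python. This is a sketch only: effective_settings is a hypothetical name, settings are passed in as already-parsed dictionaries, and Python's sorted() stands in for whatever collation the real patch ends up using.

```python
def effective_settings(main, conf_dir):
    """Return {name: (value, source_file)} after applying the override order.

    main     -- dict of settings read from postgresql.conf
    conf_dir -- dict mapping file names in the included directory to their settings
    """
    result = {name: (value, "postgresql.conf") for name, value in main.items()}
    for filename in sorted(conf_dir):  # alphabetical, so 00... comes before 99...
        if not filename.endswith(".conf"):
            continue  # only names of the form *.conf are processed
        for name, value in conf_dir[filename].items():
            result[name] = (value, filename)  # later file wins
    return result


if __name__ == "__main__":
    main = {"shared_buffers": "32MB", "work_mem": "1MB"}
    conf_dir = {
        "99persistent.conf": {"shared_buffers": "2GB"},
        "00initdb.conf": {"shared_buffers": "1GB", "lc_messages": "C"},
        "notes.txt": {"work_mem": "64MB"},
    }
    for name, (value, source) in sorted(effective_settings(main, conf_dir).items()):
        print(f"{name} = {value}  (from {source})")
```

Recording the source file next to each value mirrors the role pg_settings plays in the summary: when two files set the same name, the DBA can at least see which one won.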
Re: [HACKERS] Parsing config files in a directory
On Tue, 27 Oct 2009, Dimitri Fontaine wrote: I parse the current status as always reading files in the postgresql.conf.d directory located in the same place as the current postgresql.conf file. Way upthread I pointed out that what some packagers have really wanted for a while now is to put the local postgresql.conf changes into /etc rather than have them live where the database does. Allowing the directory to be customized makes that possible. The idea is to improve flexibility and options for DBAs and packagers as long as it's not difficult to implement the idea, and allowing for a relocatable config directory isn't that hard. Tom had a reservation about allowing the user to control the overloading behavior, but it appears that what we're trying to provide is a way for tools not to fight against the DBA but help him/her. So Greg Stark's idea does sound better: .d/ files are read first in alphabetical order, then postgresql.conf is read. If the DBA wants to manually edit the configuration and be sure his edit will have effect, he just edits postgresql.conf. No wondering. We're trying to make allowances and a smooth upgrade path for old-school users who don't want to use this approach. At the same time, let's be clear: people who do that are going to find themselves increasingly cut off from recommended practice moving forward. I want to make it possible for them to continue operating as they have been, while making it obvious that approach is on its way out. If you want a future where it's easier for tools to operate, the config directory goes last and overrides anything put in the primary postgresql.conf in the default config. Having it inserted as an explicit includedir line lets the DBA move it to the front themselves if they want to. One thing we cannot do is make the includedir line implicit. It must be the case that someone who opens a new postgresql.conf file and browses it sees exactly what's being done, so they can disable it or move the order it happens in around.
The regexp is still to be agreed upon, [0-9a-zA-Z-_.]+.conf or sth. This is being left to the author of the code to decide. There's reason to believe that *.conf is going to be hard enough to implement, and that's acceptable. If it turns out that it's easier than expected to make a full regex syntax possible here, maybe this should get revisited on next review. Then the pg_settings view could also embed the comments. That whole bit you outlined is an interesting idea, but it doesn't impact this patch so I'd rather not see it drag discussion out further right now. 00-initdb.conf if you want some bikesheding to happen That's a future patch anyway, we can bikeshed more after it's been submitted. One file per GUC is certainly never going to fly though, it's been hard enough getting people to accept going from one file to more than one. -- * Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parsing config files in a directory
On Tue, 27 Oct 2009, Kevin Grittner wrote: I have 200 clusters. I understand the proposal. I see no benefit to me. -Kevin, the troglodyte ;-) It looks like we'll have to settle this the only way your kind understands then: a battle to the death using clubs. See you at the next conference! -- * Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Parsing config files in a directory
On Tue, 27 Oct 2009, Greg Stark wrote: If they all had to edit the same file then they have to deal with writing out values and also reading them back. Everyone would need a config file parser and have to make deductions about what other tools were trying to do and how to interact with them. Exactly, that's the situation we're trying to escape from now in a nutshell. To answer Tom's question about providing better guidelines for tool authors, I was hoping to provide the first such tool and submit a patch for refactoring initdb using the same approach before 8.5 is done. I'd rather see that nailed down with a concrete proof of concept attached that implements a candidate approach by example rather than to just talk about it in general. I don't think that needs to hold up work on this patch though, particularly given that I'm dependent on this one being committed for my plan to work. -- * Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parsing config files in a directory
On Tue, 27 Oct 2009, Robert Haas wrote: I guess I didn't consider the possibility that someone might reuse an 8.4 postgresql.conf on an 8.5 server. That could be awkward. Happens all the time, and it ends up causing problems like people still having settings for GUCs that don't even exist anymore. You know how we could make this problem less likely to bite people? By putting everything the user wants to customize that isn't done by initdb into another file. Then they can just move that file into the new version. That's the direction we're trying to move here, except much slower than you're suggesting because we've already thought about some of these gotchas. Obviously you could do the same thing by completely gutting the whole postgresql.conf, but I was hoping for a step in the right direction that doesn't require something that drastic yet. The length of this thread has already proven why it's not worth even trying to completely trim the file down. Had you never brought that up this discussion would be done already. If you have a strong feeling about this, write a patch and submit it; I'm not going to talk about this anymore. I was thinking that the algorithm would be something like: Read the old postgresql.conf and write it back out to a new file line by line This sounds familiar...oh, that's right, this is almost the same algorithm pgtune uses. And it sucks, and it's a pain to convert the tool into C because of it, and the fact that you have to write this sort of boring code before you can do a single line of productive work is one reason why we don't have more tools available; way too much painful grunt work to write. True, but actually having a good SET PERSISTENT command would solve most of this problem, because the tools could just use that. The system running the tool and the one where the changes are being made are not the same. The database isn't necessarily even up when the tool is being run yet.
The main overlap here is that one of the output formats available to future tools could be a series of SET PERSISTENT commands one could then run elsewhere, which is already on my pgtune roadmap when it's possible to implement. You're doing a good job of reminding me why I didn't have a good vision of where this all needed to go until after I wrote a working tuning tool, to get a feel for the painful parts. I wish I could share all of the postgresql.conf files I've seen so you could better appreciate how people torture the poor file in the field. -- * Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
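For readers who have not written one of these, the "read the old file and write it back out line by line" approach criticized above looks roughly like the Python sketch below. It is not pgtune's code: rewrite_conf is a hypothetical name, the assignment regexp is a deliberate simplification of the real grammar, and real files contain enough odd quoting and whitespace that a production tool would need considerably more care.

```python
import re

# Simplified assumption: an active setting is "name = value" or "name value".
ASSIGN_RE = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_.]*)\s*(=|\s)\s*\S")


def rewrite_conf(text, changes):
    """Return new file text with the settings in changes applied.

    Old active values are kept as comments so a working setting is not lost;
    settings that never appeared in the file are appended at the end.
    """
    pending = dict(changes)
    out = []
    for line in text.splitlines():
        match = ASSIGN_RE.match(line)
        name = match.group(1).lower() if match else None
        if name in changes:
            out.append("#" + line)  # comment out the old value
            if name in pending:
                out.append(f"{name} = {pending.pop(name)}")
        else:
            out.append(line)  # comments, blanks, and untouched settings pass through
    for name, value in pending.items():
        out.append(f"{name} = {value}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    old = "# memory\nshared_buffers = 32MB\t# default\nwork_mem=1MB\n"
    print(rewrite_conf(old, {"shared_buffers": "1GB", "max_connections": "200"}), end="")
```

Even this toy version has to decide what counts as an assignment, how to preserve comments, and where new settings go, which is the grunt work a separate tool-owned file in the config directory would avoid.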
Re: [HACKERS] Parsing config files in a directory
On Wed, 28 Oct 2009, Alvaro Herrera wrote: Huh, isn't this code in initdb.c already? The sketched out design I have for a contrib/pgtune in C presumes that I'd start by refactoring the relevant bits from initdb into a library for both programs to use. But the initdb code doesn't care about preserving existing values when making changes to them; it just throws in its new settings and moves along. So what's there already only handles about half the annoying parts most people would expect a tuning tool that reads the existing file and operates on it to do. Also, I wouldn't be surprised to find that it chokes on some real-world postgresql.conf files. The postgresql.conf.sample it's being fed is fairly pristine. A tuning tool that intends to read any postgresql.conf it's fed can't always assume it's in exactly standard form. I've recently started collecting complicated postgresql.conf lines that crashed my Python code as people submit bug reports with those. You might be surprised at all of the places people put whitespace. -- * Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Parsing config files in a directory
On Wed, 28 Oct 2009, Tom Lane wrote: Why in the world are you looking at initdb? The standard reference for postgresql.conf-reading code, by definition, is guc-file.l. I think the odds of building something that works right, without borrowing that same flex logic, are about nil. initdb was the only sample around that actually makes changes to the postgresql.conf. It's also a nice simple standalone program that's easy to borrow pieces from, which guc-file.l is not. That's the reason it looks tempting at first. If as you say the only right way to do this is to use the flex logic, that just reinforces how high the bar is for someone who wants to write a tool that modifies the file. Periodically we get people who show up saying hey, I'd like to write a little [web|cli|gui] tool to help people update their postgresql.conf file, and when the answer they get includes first you need to implement this grammar... that scares off almost all of them. It didn't work on me because I used to write compilers for fun before flex existed. But even I just skimmed it and pragmatically wrote a simpler postgresql.conf parser implementation that worked well enough to get a working prototype out the door, rather than properly implementing the whole grammar. -- * Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Parsing config files in a directory
On Wed, 28 Oct 2009, Greg Stark wrote: It's also a blatant violation of packaging rules for Debian if not every distribution. If you edit the user's configuration file then there's no way to install a modified default configuration file. You can't tell the automatic modifications apart from the user's modifications. So the user will get a prompt asking if he wants the new config file or to keep his modifications which he never remembered making. The postgresql.conf file being modified is generated by initdb, and it's already being customized per install by the initdb-time rules like detection for maximum supported shared_buffers. It isn't one of the files installed by the package manager where the logic you're describing kicks in. The conflict case would show up, to use a RHEL example, if I edited a /etc/sysconfig/postgresql file and then a changed version of that file appeared upstream. Stuff in PGDATA is all yours and not tracked as a config file. -- * Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parsing config files in a directory
On Wed, 28 Oct 2009, Josh Berkus wrote: It's the basic and unsolvable issue of how do you have a file which is both perfectly human-readable-and-editable *and* perfectly machine-readable-and-editable at the same time. Let's see...if I remember correctly from the last two rounds of this discussion, this is the point where someone pops up and says that switching to XML for the postgresql.conf will solve this problem. Whoever does that this time goes into the ring with Kevin and I, but they don't get a club. (All fight proceeds to benefit SPI of course). -- * Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parsing config files in a directory
On Wed, 28 Oct 2009, Robert Haas wrote: It would be completely logical to break up the configuration file into subfiles by TOPIC. That would complicate things for tool-writers because they would need to get each setting into the proper file, and we currently don't have any infrastructure for that. Already done:

# select name,category from pg_settings limit 1;
       name       |                             category
------------------+-------------------------------------------------------------------
 add_missing_from | Version and Platform Compatibility / Previous PostgreSQL Versions

You could make one per category, and pgtune for example already knows all this info. The somewhat arbitrary category assignments Josh put things into are what Peter was complaining about upthread. Questions like is 'effective_cache_size' a memory parameter or an optimizer one? show why this is not trivial to do well. -- * Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD
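As a rough illustration of the "one file per category" idea, the following Python sketch groups (name, category) rows, such as those returned by the pg_settings query above, into per-category file names. Everything about the naming is an assumption made for the example: category_filename and group_by_file are hypothetical helpers, only the part of the category before the "/" is used, and the second sample row is illustrative rather than taken from this thread.

```python
import re
from collections import defaultdict


def category_filename(category):
    """Turn a pg_settings category into a file name (hypothetical scheme)."""
    top = category.split("/", 1)[0].strip().lower()  # drop the subcategory
    slug = re.sub(r"[^a-z0-9]+", "-", top).strip("-")
    return slug + ".conf"


def group_by_file(rows):
    """Map each target file name to the list of setting names it would hold."""
    files = defaultdict(list)
    for name, category in rows:
        files[category_filename(category)].append(name)
    return dict(files)


if __name__ == "__main__":
    rows = [
        ("add_missing_from",
         "Version and Platform Compatibility / Previous PostgreSQL Versions"),
        # Illustrative row; the category text is an assumption.
        ("effective_cache_size", "Query Tuning / Planner Cost Constants"),
    ]
    for filename, names in sorted(group_by_file(rows).items()):
        print(filename, names)
```

The mechanics are trivial; as the message notes, the hard part is agreeing on which category a setting such as effective_cache_size really belongs to.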
Re: [HACKERS] Patch set under development to add usage reporting.
On Fri, 30 Oct 2009, John Murtari wrote: We now have a basic patch set that works and is basically stable (not recommended for production servers!). We've dedicated a page at our web site and it hopefully has answers to most of your questions, and also has the patch set for download. These are for 7.4.19 - the version included with RHEL 4. This is kind of interesting, but targeting 7.4.19 isn't going to get you very far toward code anyone else will use. That release is 6 years old, it's filled with unsolvable limitations, it's basically at end of life. The fact that it's bundled with RHEL4 and there are some legacy installs still floating around are the only reason it's not completely gone from everyone's radar. In short, if you actually care about your data, you should be running a newer version of the database regardless of what RHEL ships. And you should be building patches against no earlier than 8.4 if you want something that has any hope of being accepted into mainstream development. Eventually the patch will need to apply to the 8.5 work in progress source code tree before it's even a candidate to merge. You can probably get away with developing against a more stable version like 8.4.1, if you must target something people can also deploy, but even that's not ideal and will eventually turn into a code merge hurdle. -- * Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] next CommitFest
On Sun, 8 Nov 2009, Robert Haas wrote: I would personally prefer not to be involved in the management of the next CommitFest. Having done all of the July CommitFest and a good chunk of the September CommitFest, I am feeling a bit burned out. I was just poking around on the Wiki, and it looks like the role of the CommitFest manager isn't very well documented yet. Since you've done all of them since introducing the new CF software, I'm not sure if anyone else even knows exactly what you've been doing. The transition over to that was so successful there isn't even a copy of the schedule for 8.5 on the Wiki itself. Could you find some time this week to rattle off an outline of the work involved? It's hard to decide whether to volunteer to help without having a better idea of what's required. -- * Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch committers
Bruce Momjian wrote: True, but even I avoid patches I don't understand, and practicing by applying them could lead to a very undesirable outcome, e.g. instability. The usual type of practice here should come from applying trivial patches, or ones that don't impact code quality. Docs patches come to mind as a good way someone could get used to the commit process without introducing much potential mayhem along the way. As far as keeping new people away from complicated patches, ultimately you just have to trust that anyone who can commit has a reasonable idea of their own capabilities. I seriously doubt you're going to find a new committer jumping right in by committing hot standby out of the gate just because they could do so. -- Greg Smith2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch committers
Robert Haas wrote: I tried to help, but I was fairly tied up with overall CommitFest management and did not have time for a full read-through of every patch. I think it's completely unreasonable to expect the CF manager to do any patch review themselves. It's a hard enough job to keep going without actually getting your hands into the details. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] Patch committers
Bruce Momjian wrote: I also think the bad economy is making it harder for people/companies to devote time to community stuff when paid work is available. I think this explains away more of the recent situation than you're giving it credit for. When everybody's fat and happy and it's easy to generate/raise money, it's also easy to throw money toward the community. When times are tight, giving away work that you might charge for (or have already charged for) is harder for a company to justify. It's easy to plan to have someone do community work when you hire them, only to realize down the line that business has dried up enough that you're stuck with the choice between them doing that and a job that will make or break an upcoming payroll. And that's where a lot more businesses are at right now than at any time in a long while. After looking for an example of the boom/bust cycle impacting this community's work that's old enough to be clearer in hindsight, I would suggest noting that Great Bridge was officially announced in May of 2000 and was gone by the end of 2001. Overlay those dates on top of http://www.google.com/finance?q=INDEXNASDAQ:.IXIC after switching Zoom to show 10 years. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] next CommitFest
Selena Deckelmann wrote: On Tue, Nov 10, 2009 at 10:40 PM, Greg Smith gsm...@gregsmith.com wrote: I was just poking around on the Wiki, and it looks like the role of the CommitFest manager isn't very well documented yet. It's pretty straightforward. Robert has actually done a great job of communicating about this to the patch reviewers. That's good to hear. What I was hinting at was that some of the community knowledge here should start getting written down now that the process has matured, rather than trying to directly transfer just to one other person. I'm not sure if Robert has shared 100% of what he does with the reviewers or not, but in general the easiest way to divest yourself of a position is to document how someone else can do it. I don't know that having to poke through list archives or chat with someone is necessarily the best way to transfer that knowledge. -- Greg Smith2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] next CommitFest
Robert Haas wrote: Here's an attempt. http://wiki.postgresql.org/wiki/Running_a_CommitFest Perfect, that's the sort of thing I was looking for the other day but couldn't find anywhere. I just made a pass through better wiki-fying that and linking it to the related pages in this area. Two things look to be true at the moment: 1) The call for reviewers is already running late and needs to start ASAP. 2) Some of the experienced helpers from the previous CFs, like Selena, should eventually be able to help; it's just that everybody is busy during the window when the first round of action has to happen here. Given all that, I'm thinking that unless we get an enthusiastic volunteer by tomorrow, I'll kick off the call for reviewers myself and follow that through to initial patch assignments. I don't expect to have as much time as Robert put into the last couple of CommitFests after that, but this one looks smaller and with more familiar patches than those. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] write ahead logging in standby (streaming replication)
Tom Lane wrote: Fujii Masao masao.fu...@gmail.com writes: The problem is that fsync needs to be issued too frequently, which would be harmless in asynchronous replication, but not in synchronous one. A transaction would have to wait for the primary's and standby's fsync before returning a success to a client. Surely that is exactly what is *required* if the user has asked for synchronous replication. This is a distressingly common thing people get wrong about replication. You can either have synchronous replication, which as you say has to be slow: you must wait for an fsync ACK from the secondary and a return trip before you can say something is committed on the primary. Or you can get better performance by not waiting for all of those things, but the minute you do that it's *not* synchronous replication anymore. You can't get high-performance and true synchronous behavior; you have to pick one. The best you can do if you need both is work on accelerating fsync everywhere using the standard battery-backed write cache technique. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
[HACKERS] CommitFest 2009-11 Call for Reviewers
In a few days the 3rd 8.5 development CommitFest, 2009-11, is going to kick off, with the end goal being an alpha3 prerelease. If you have a patch in progress, you'll need to submit it before the deadline of 2009-11-15 00:00:00 GMT for it to be considered during this round: http://wiki.postgresql.org/wiki/Submitting_a_Patch The actual process of the CommitFest itself is fairly well documented at this point: http://wiki.postgresql.org/wiki/Reviewing_a_Patch http://wiki.postgresql.org/wiki/RRReviewers http://wiki.postgresql.org/wiki/Running_a_CommitFest For lack of a more qualified volunteer, I'll be handling the initial round of patch assignments and reviewer organization. I suspect we'll reorganize on the fly as things proceed based on who has time; I'd certainly welcome patch-chasing help in addition to reviewing. Since the backlog for this CommitFest is so far lighter than we've seen recently, the small patches that don't already have an active reviewer shouldn't be too difficult to get through. Please send me an email (without copying the list) if you are available to help with review. Include any information that might be helpful in assigning you an appropriate patch. If there's a specific one you want to claim, by all means let me know that. All reviewers will need to be subscribed to the RRR mailing list, so when you write me please also follow the subscription link at http://archives.postgresql.org/pgsql-rrreviewers/ to add yourself to that list, too, if you're not already there. The set of patches I have the least feel for are the five ECPG submissions, some of which were reviewed already. I would particularly appreciate any early information reviewers might provide about their capability/willingness to work on that set. Those are not so easy to just split among multiple people due to how they relate to one another. 
-- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] [RRR] CommitFest 2009-11 Call for Reviewers
Josh Berkus wrote: On 11/12/09 9:45 AM, Greg Smith wrote: For lack of a more qualified volunteer, I'll be handling the initial round of patch assignments and reviewer organization. Hmmm? Who's more qualified than you, exactly? I was alluding to the fact that Robert isn't available to handle this one. -- Greg Smith2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] CommitFest 2009-11 Call for Reviewers
Tom Lane wrote: AFAIK the ecpg patches are all waiting on Michael Meskes to have time to review/commit them. ecpg is pretty much his turf and no other committers are likely to touch these patches. Great to know, and since some of the regular reviewers already made a pass through them there's probably not too much general feedback left anyway. I just marked all of those as having Michael as the reviewer. If it gets to where those are the main remaining hold-up I guess we'll revisit who else might help out then. Would rather get the patches it's more obvious how to handle out of the way first. Not considering those, HS/SR, or other patches with an already assigned reviewer, we're at 16 patches in the queue, and I've got 9 reviewer volunteers just so far today. Barring a flood of last-minute entries, if I can get each reviewer to handle one patch and a moderate percentage of them to handle two, that should be all it takes for this round. Will move the rest of the discussion here to just rrreviewers. -- Greg Smith2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch committers
Tom Lane wrote: While I'm not against promoting more committers to deal with the influx of patches, the only way I know for people to get to the skill level of being fully competent reviewers is to have done a lot of patch writing themselves. The dynamic going on right now is that many people who might otherwise be writing their own patches are instead doing patch review to try and keep the project as a whole moving forward. I actually had two off-list discussions about that just today; that topic pops up pretty regularly as I talk with contributors at all levels. Since most people have an upper limit on how much community time they can spend, every minute spent reviewing is a minute not spent working on your own patches. The way you're describing the qualification process, it would be easy to conclude that there's a reviewer ladder, and a developer ladder, and only climbing the latter leads to being a committer--that no matter how much review you do, it doesn't really count as a committer grade skill. I'm not sure that's the message you want to be sending, because anyone who dreams of being a committer is going to stay as far away from doing review as they can if that notion spreads. Based on the growing frustration with "doing review doesn't leave me with time for my own patches" I keep hearing, that perception is already something to be wary of. If the primary criterion is generating patches that apply with minimal changes, you could make a case that someone who's gotten skilled enough as a reviewer to only pass through patches of that quality should get some recognition even if they didn't write them. That's clearly a useful subset of the skills needed to commit patches only if they look to be ready for it.
-- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] write ahead logging in standby (streaming replication)
Fujii Masao wrote: Personally, I think that semi-synchronous replication is sufficient for HA. Whether or not you think it's sufficient for what you have in mind, synchronous replication requires a return ACK from the secondary before you say things are committed on the primary. If you don't do that, it's not true sync replication anymore; it's asynchronous replication. Plenty of people decide that a local commit combined with a promise to synchronize as soon as possible to the slave is good enough for their apps, which as you say is getting referred to as semi-synchronous replication nowadays. That's an awful name though, because it's not true--that's asynchronous replication, just aiming for minimal lag. It's OK to say that's what you want, but you can't say it's really a synchronous commit anymore if you do things that way. -- Greg Smith2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] write ahead logging in standby (streaming replication)
Fujii Masao wrote: Umm... what is your definition of synchronous? I'm planning to provide four synchronization modes as follows, for v8.5. Does this fit in your thought? The primary waits ... before returning success of a transaction; * nothing - asynchronous replication * recv ACK - semi-synchronous replication * fsync ACK - semi-synchronous replication * redo ACK - synchronous replication Or, in synchronous replication, we must wait a fsync and a redo ACK? Right, those are the possibilities, all four of them have valid use cases in the field and are worth implementing. I don't like the label semi-synchronous replication myself, but it's a valuable feature to implement, and that is unfortunately the term other parts of the industry use for that approach. But everyone needs to be extremely careful with the terminology here: if you say synchronous replication, that *only* means what you're labeling redo ACK (WAL ACK really). Synchronous replication should not be used as a group term that includes the semi-synchronous variations, which are in fact asynchronous despite their marketing name. If someone means semi-synchronous, but they say synchronous thinking it's a shared term also applicable to the semi-synchronous variations here, that's just going to be confusing for everyone. -- Greg Smith2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
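To make the terminology in this message concrete, here is a minimal Python sketch of the four wait modes exactly as listed in the quoted proposal. It is only an illustration: the names WaitMode, REQUIRED_ACK, label and primary_may_report_commit are hypothetical and are not taken from the streaming replication patch.

```python
from enum import Enum


class WaitMode(Enum):
    """What the primary waits for before returning success (per the quoted list)."""
    NOTHING = "nothing"
    RECV_ACK = "recv"
    FSYNC_ACK = "fsync"
    REDO_ACK = "redo"


# Standby acknowledgement required in each mode; None means the primary does not wait.
REQUIRED_ACK = {
    WaitMode.NOTHING: None,
    WaitMode.RECV_ACK: "recv",
    WaitMode.FSYNC_ACK: "fsync",
    WaitMode.REDO_ACK: "redo",
}


def label(mode):
    """Return the term used in this thread for a given wait mode."""
    if mode is WaitMode.NOTHING:
        return "asynchronous"
    if mode is WaitMode.REDO_ACK:
        return "synchronous"
    # recv ACK and fsync ACK are the variants marketed as semi-synchronous.
    return "semi-synchronous"


def primary_may_report_commit(mode, acks_received):
    """True if the primary may tell the client the transaction committed.

    acks_received is the set of acknowledgements ("recv", "fsync", "redo")
    that have arrived from the standby so far (a hypothetical representation).
    """
    required = REQUIRED_ACK[mode]
    return required is None or required in acks_received
```

With this model, only REDO_ACK gets the plain label "synchronous", which is the distinction the message above asks everyone to keep.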
Re: [HACKERS] [PATCH] SE-PgSQL/lite (r2429)
KaiGai Kohei wrote: In the v8.4 development cycle, I got a suggestion to reduce a burden of reviewer to split off a few functionalities, such as security_context system column and row-level access controls. I lost track of this patch and related bits somewhere along the way, had to triage my unread mail a few times. Could someone summarize how it now fits into plans for more general row-level access controls in the database? I know incompatibilities between the SEPostgreSQL model for row filtering and thoughts for a more general permissions feature that did something similar were a major design issue in the early 8.4 versions of SEPostgreSQL, and that as you say you've been working on that. I'm not sure what relationship there is between those two today though, or exactly where the general non-SELinux row filtering is at on the roadmap. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] write ahead logging in standby (streaming replication)
Fujii Masao wrote: On Fri, Nov 13, 2009 at 1:49 PM, Greg Smith g...@2ndquadrant.com wrote: Right, those are the possibilities, all four of them have valid use cases in the field and are worth implementing. I don't like the label semi-synchronous replication myself, but it's a valuable feature to implement, and that is unfortunately the term other parts of the industry use for that approach. BTW, MySQL and DRBD use the term semi-synchronous: http://forge.mysql.com/wiki/ReplicationFeatures/SemiSyncReplication http://www.drbd.org/users-guide/s-replication-protocols.html Yeah, that's the other parts of the industry I was referring to. MySQL uses semi-synchronous to distinguish between its completely asynchronous default replication mode and one where it provides a somewhat safer implementation. The description reads more as asynchronous with some synchronous elements, not one style of synchronous implementation. None of their documentation wanders into the problem area here by calling it a true synchronous solution when it's really not--MySQL Cluster is their synchronous vehicle. It's fine to adopt the term semi-synchronous, as it's become quite popular and people are going to label the PG implementation with it regardless of what is settled on here. But we should all try to be careful to use it as correctly as possible. -- Greg Smith2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] SE-PgSQL/lite (r2429)
KaiGai Kohei wrote: I found a uncertain term in your comment. It seems to me the model has two meanings in this context. - The way to make access control decision (allowed? or denied?). - The granularity of access controls (tables? columns? or tuples?). What I meant by "the SEPostgreSQL model for row filtering" was the original implementation you had, where row filtering was handled by code specific to SEPostgreSQL, not something generic enough to be used for other purposes. I wasn't sure what if anything from there was still in the patch, and you answered that clearly enough. Thanks for clarifying where things are at. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] next CommitFest
Simon Riggs wrote: All the CF manager needs to do is ensure that every patch submitted chalks up one review. If you think about it, we wouldn't actually need any rr reviewers at all then, because if we have 20 patches we would have 20 reviews due. So the whole scheme is self-balancing In fact, just suggesting the guideline that everyone who submits a patch should review one here was sufficient to pull in a number of submitters who volunteered to do a single review as well, moving some distance toward what you're describing. It seems we had a perception here that joining rrreviewers subscribed you to doing multiple patch reviews; I've let multiple submitters who were trying to help out know it's OK to just grab one patch and review without even getting involved on that list. Take a look at https://commitfest.postgresql.org/action/commitfest_view?id=4 right now. I've been suggesting to people that they assign themselves to the patches they like, and it's nearing completely populated two days before the CommitFest has even started. I have 6 reviewers that haven't been assigned anything yet and there are only 8 unassigned patches out there. In several cases, assigning the reviewer turned out to be quite easy because so many submitters joined in--just assign someone who submitted a patch in the same area. So far it looks sufficient to introduce the expectation that submitters should also do a review, without even needing to make that a firm rule. That helps increase the reviewer pool significantly, addressing the general problem Robert has been fighting, while not forcing people like Dave who have other pulls on their time into a review role. We'll see whether the follow-through here is good or not; maybe this will decay yet.
For now, simply telling submitters that the review of their own patches might be influenced by whether they do a good job reviewing someone else's has improved things considerably over past CommitFests, and it's hard to imagine how someone could complain about a guideline that fair. The most difficult part here remains finding reviewers for the really big patches. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] Commitfest patches
On Fri, 28 Mar 2008, Gregory Stark wrote: I described which interfaces worked on Linux and Solaris based on empirical tests. I posted source code for synthetic benchmarks so we could test it on a wide range of hardware. I posted graphs based on empirical results. Is it possible to post whatever script that generates the graph (gnuplot?) so people can compare the results they get to yours? It's when I realized I didn't have that and would have to recreate one myself that my intention to just run some quick results and ship them to you lost momentum and I never circled back to it. It would be nice if you made it easier for people to generate fancy results here as immediate gratification for them (while of course just asking them to send the raw data). I can run some tests on smaller Linux/Solaris systems to see if they don't show a regression, that was my main concern about this experiment. Some of the discussion that followed your original request for tests was kind of confusing as far as how to interpret the results as well; I think I know what to look for but certainly wouldn't mind some more guidance there, too. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New boxes available for QA
On Tue, 1 Apr 2008, Guillaume Smet wrote: I wonder if it's not worth it to have a very simple thing already reporting results as the development cycle for 8.4 has already started (perhaps several pgbench unit tests testing various type of queries with a daily tree) The pgbench-tools utilities I was working on at one point anticipated this sort of test starting one day. You can't really get useful results out of pgbench without running it enough times that you get average or median values. I dump everything into a results database which can be separated from the databases used for running the test, and then it's easy to compare day to day aggregate results across different query types. I haven't had a reason to work on that recently, but if you've got a semi-public box ready for benchmarks now I do. Won't be able to run any serious benchmarks on the systems you described, but should be great for detecting basic regressions and testing less popular compile-time options as you describe. As far as the other more powerful machines you mentioned go, would need to know a bit more about the disks and disk controller in there to comment about whether those are worth the trouble to integrate. The big missing piece of community hardware that remains elusive would be a system with >=4 cores, >=8GB RAM, and >=8 disks with a usable write-caching controller in it. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
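As a rough illustration of the "run it enough times and compare medians day to day" idea above, here is a small Python sketch. It assumes each pgbench run has already been reduced to a (day, query_type, tps) tuple; that layout, the function names summarize and regressions, and the 10% tolerance are all hypothetical and do not reflect the actual pgbench-tools results database.

```python
from collections import defaultdict
from statistics import median


def summarize(runs):
    """Collapse repeated runs into one median TPS per (day, query_type)."""
    grouped = defaultdict(list)
    for day, query_type, tps in runs:
        grouped[(day, query_type)].append(tps)
    return {key: median(values) for key, values in grouped.items()}


def regressions(summary, baseline_day, current_day, tolerance=0.10):
    """List query types whose median TPS fell by more than `tolerance`."""
    flagged = []
    for (day, query_type), base_tps in summary.items():
        if day != baseline_day:
            continue
        current = summary.get((current_day, query_type))
        if current is not None and current < base_tps * (1 - tolerance):
            flagged.append(query_type)
    return sorted(flagged)


# Made-up numbers: one outlier run on Monday does not distort the median.
runs = [
    ("mon", "select", 100), ("mon", "select", 110), ("mon", "select", 400),
    ("tue", "select", 90), ("tue", "select", 95), ("tue", "select", 92),
]
summary = summarize(runs)
print(regressions(summary, "mon", "tue"))
```

Using the median rather than a single run is the point: the 400 TPS outlier would badly skew an average, but the Monday median stays at 110.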
Re: [JDBC] Re: [HACKERS] How embarrassing: optimization of a one-shot query doesn't work
On Tue, 1 Apr 2008, Guillaume Smet wrote: A good answer is probably to plan optional JDBC benchmarks in the benchfarm design - not all people want to run Java on their boxes but we have servers of our own to do so. The original pgbench was actually based on an older test named JDBCbench. That code is kind of old and buggy at this point. But with some care and cleanup it's possible to benchmark not only relative Java performance with it, but you can compare it with pgbench running the same queries on the same tables to see how much overhead going through Java is adding. Original code at http://mmmysql.sourceforge.net/performance/ , there's also some improved versions at http://developer.mimer.com/features/feature_16.htm I'm not sure if all of those changes are net positive for PostgreSQL though, they weren't last time I played with this. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] build multiple indexes in single table pass?
On Tue, 1 Apr 2008, Andrew Dunstan wrote: I don't know if this has come up before exactly, but is it possible that we could get a performance gain from building multiple indexes from a single sequential pass over the base table? It pops up regularly, you might even have walked by a discussion of this idea with myself, Jan, and Jignesh over the weekend. Jignesh pointed out that index creation was a major drag on his PostgreSQL benchmarking operations and I've run into that myself. I have a large dataset and creating a simple index takes around 70% of the time it takes to load the data in the first place, his multiple index tables took multiples of load time to index. Considering that the bulk load isn't exactly speedy either this gives you an idea how much room for improvement there is. The idea we were bouncing around went a step past that and considered this: if you have good statistics on a table, and you have a sample set of queries you want to execute against it, how would you use that information to plan what indexes should be created? Needing to be able to create multiple indexes at once efficiently was an implementation detail to pull that off. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
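The single-pass idea being discussed can be sketched with a toy in-memory model in Python. This is only meant to show the shape of the saving (one scan feeding several index builds instead of one scan per index); it is not how PostgreSQL builds indexes, and build_indexes_single_pass and its arguments are hypothetical names.

```python
def build_indexes_single_pass(rows, key_funcs):
    """Build several sorted (key, row_id) lists from one scan of `rows`.

    key_funcs maps an index name to a function extracting that index's key.
    """
    entries = {name: [] for name in key_funcs}
    for row_id, row in enumerate(rows):          # the one sequential pass
        for name, key in key_funcs.items():
            entries[name].append((key(row), row_id))
    # Each index still pays for its own sort; only the table scan is shared.
    return {name: sorted(pairs) for name, pairs in entries.items()}


rows = [{"a": 3, "b": "x"}, {"a": 1, "b": "z"}, {"a": 2, "b": "y"}]
indexes = build_indexes_single_pass(
    rows,
    {"by_a": lambda r: r["a"], "by_b": lambda r: r["b"]},
)
print(indexes["by_a"])
```

Note that the sort work per index does not go away in this model, so the gain depends on how much of the total time is spent reading the base table.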
[HACKERS] Patch queue - wiki (was varadic patch)
On Wed, 2 Apr 2008, Bruce Momjian wrote: The new permanent ones are permanent against mailbox movement, and in fact the comments and thread merging also travels with the email. The "someone replied to your comment" links in e-messages I've been getting the last few days have all been working, which is a first. The configuration you're running right now I'd consider the first candidate to be a stable version, so thumbs up from me for reaching that point. It's clear to me only now that you can think of the patch queue as being a list with this structure:
1) Patch name (defaults to the subject of the first message)
2) List of messages related to that patch
3) List of comments
4) Status
5) Assigned reviewers
Bruce's toolchain converts an mbox of messages to generate the first two, then has a web interface to allow adding the third. Right now the message list is internally consistent but not useful in the long term (doesn't have links to the archives, just this temporary page). Until the "search for message ID" feature is added to the archives I don't know that this situation can be improved. Those hacking on tools to convert Bruce's currently preferred working form (that revolves around mbox files) into something else that's web oriented are stuck with considering how all the above information is going to be handled before everybody will be satisfied. I can see how a script that converts the current pages into wiki markup, with placeholders where someone can manually update the comments to summarize those on the page, would be helpful. That basically creates an easier to read Queue summary like Stephan was doing for 8.3--that included items 1,4,5 from the above. But that's a one-way operation that doesn't really help with the commenting situation, and it's inevitably going to lag behind the mailbox-centered queue unless it's made fully automatic.
I can't think of anything better that doesn't require building some sort of database that holds all this information and drives page generation. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
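The five-part structure listed in this message maps naturally onto a small record type. The Python sketch below is one hypothetical way to write it down; PatchEntry, its field names, the "pending" default and summary_line are all invented for illustration and do not describe Bruce's actual toolchain.

```python
from dataclasses import dataclass, field


@dataclass
class PatchEntry:
    name: str                                        # 1) defaults to first message subject
    messages: list = field(default_factory=list)     # 2) related message IDs
    comments: list = field(default_factory=list)     # 3) reviewer comments
    status: str = "pending"                          # 4) assumed default status
    reviewers: list = field(default_factory=list)    # 5) assigned reviewers


def summary_line(entry):
    """Render items 1, 4 and 5 only, like the queue summary described above."""
    reviewers = ", ".join(entry.reviewers) or "unassigned"
    return f"{entry.name} | {entry.status} | {reviewers}"


entry = PatchEntry("varadic patch", messages=["<id1@example.com>"])
print(summary_line(entry))
```

Keeping messages and comments on the record, even though the summary ignores them, is what would let a database drive both the detail and the summary pages from one source.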
Re: [HACKERS] Patch queue - wiki (was varadic patch)
On Fri, 4 Apr 2008, Dave Page wrote: We must be talking at cross purposes because I really cannot believe you're asking me how to add a link to a wiki page :-o He wants to know how to automate turning an entire mbox file full of them into wiki markup, not how to do one at a time. Other people have been running such tools for Bruce but he doesn't have one he can become comfortable with running himself yet. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
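For a sense of what such automation might look like, here is a minimal Python sketch using the standard library mailbox module to turn every message in an mbox file into one line of MediaWiki-style list markup. The function name mbox_to_wiki and the output layout are assumptions made for illustration; this is not one of the scripts that were actually being run for Bruce.

```python
import mailbox


def mbox_to_wiki(path):
    """Return one wiki bullet per message in the mbox file at `path`."""
    lines = []
    box = mailbox.mbox(path, create=False)   # do not create the file if missing
    try:
        for msg in box:
            subject = msg["Subject"] or "(no subject)"
            sender = msg["From"] or "(unknown sender)"
            msg_id = (msg["Message-ID"] or "").strip("<>")
            # '* ' starts a list item and '''...''' is bold in MediaWiki markup.
            lines.append(
                f"* '''{subject}''' from {sender} (Message-ID: {msg_id})"
            )
    finally:
        box.close()
    return "\n".join(lines)
```

The Message-ID is kept in the output because it is the only stable handle that could later be turned into an archive link.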
Re: [HACKERS] modules
On Thu, 3 Apr 2008, Joshua D. Drake wrote: IMO the core modules should be compiled via configure with something like: ./configure --enable-module=ALL If you really want to make the problems with using contrib modules go away, so they are a) installed even by lazy ISPs who just do compile/make/make install, and b) not viewed as second-class citizens when people have to ask them to be installed, this won't do it. You should default to installing all the modules and provide configure options to turn them off instead. All PostgreSQL installations should have them all available (but not installed in the database, as you point out) unless someone goes out of their way to circumvent that. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Commit fest queue
On Wed, 9 Apr 2008, Tom Lane wrote: Gregory Stark [EMAIL PROTECTED] writes: What would move us in the direction of this mythical patch tracker would be if we knew exactly what our workflow was. Once we know what our workflow is then we could pick a tool which enforces that workflow. Well, I don't think we want or need an enforced workflow. What we need is just a list of pending patches so that nothing falls through the cracks. Making sure nothing falls through the cracks is exactly the point of an enforced workflow. It might be a manual operation, it might be some piece of software, but ultimately you need a well-defined process where things move around but don't get dropped. Exactly how said enforcement happens is certainly open to discussion though. Last time I chimed in on this subject I tried unsuccessfully to move discussion into this area--trying to nail down the structure of a patch processing workflow--but all I managed to do was kick off a discussion of the trivia involved with one step. A better attempt is below. As you say, most of the work is in recognizing which emails deserve to be entered into the list, and that's not subject to automation (not in this decade anyway). Sure, but that can still be an input to the workflow. Since I'm unfazed by criticism and have been watching this whole 'Fest fairly closely, I'll even throw out a sample for a more formal workflow outline. Always easier to map this stuff out when you've got a dummy proposal to beat up.
This is aimed to look somewhat like what happened this time around (except using the newer tools that are basically built now) rather than to be a more grand vision:

Input: submissions to -patches and -hackers
Processing: Saved via mail reader software
Output: mbox file with relevant items
Person: Bruce

Input: mbox file
Processing: Run script
Output: Patch queue detail wiki page, with links to the archives
Person: Greg Stark via his script

Input: Patch queue detail
Processing: Manually editing page, perhaps with some tool assistance
Output: Patch queue summary wiki page
Person: Alvaro

Input: Patch queue summary
Processing: Patch committed, removed from page
Output: Updated patch queue summary, e-mail to author
Person: Tom, Bruce, other committers

Input: Patch queue summary
Processing: Patch changed to be a TODO item
Output: Expanded TODO list, updated patch queue summary, e-mail to author
Person: Bruce

Input: Patch queue summary
Processing: Patch rejected or bounced back with comments
Output: Reduced patch queue summary, e-mail to author
Person: Bruce

There's a clear hole for messages to fall into when they're being summarized into the patch summary step, I recall Tom saying something about items that didn't make it into the current summary. That needs to be improved a bit. I also note that I didn't diagram separate review steps because I didn't see them happen in a formal way this time around that I could use as a model. As a sideline observer here it seems to me that Bruce has a good and hard to replace process to kick this all off already going, so don't mess with that. It would be nice to find vict...err, volunteers to pull him out of the later steps though for a net reduction in his time. Simply getting things organized better from the start should help with getting more people helping out with review; the common complaint seemed to be "I can't figure out what to help with in this big mess" which having a summary from the start should improve.
-- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
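The hole between the detail page and the summary page mentioned in this message is the kind of thing a mechanical check could catch. Here is a minimal Python sketch, assuming each page can be reduced to a list of patch names; dropped_items and its arguments are hypothetical and not part of any existing tool.

```python
def dropped_items(detail_queue, summary_queue, resolved=()):
    """Return patches on the detail page that are neither summarized nor resolved.

    `resolved` holds patches already committed, turned into TODO items,
    or rejected, which are expected to be missing from the summary.
    """
    return sorted(set(detail_queue) - set(summary_queue) - set(resolved))


detail = ["patch A", "patch B", "patch C", "patch D"]
summary = ["patch A", "patch C"]
committed = ["patch D"]
print(dropped_items(detail, summary, committed))
```

Anything this returns is an item that moved out of one step without arriving in the next, which is exactly what an enforced workflow is supposed to prevent.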
Re: [HACKERS] Commit fest queue
On Wed, 9 Apr 2008, Marc G. Fournier wrote:
> Do other large projects accept patches 'ad hoc' like we do? FreeBSD? Linux? KDE?

The Linux procedure is documented at http://www.mjmwired.net/kernel/Documentation/SubmittingPatches
Linux was forced into some structure by the SCO lawsuit circa 2004, in that they track who patches came from more carefully now. But the process of submission to the Linux kernel developers' mailing list is even less organized than here; as stated in that document, they will drop patches without comment whenever they please. However, they do have a person designated "Trivial Patch Monkey", which is such a great title that you have to forgive the rest of the problems in the process.

FreeBSD includes a program called send-pr just to submit problem reports into their system, which can include feature changes. You can get an idea how sophisticated their tracking for bug patches is by looking at http://www.freebsd.org/cgi/query-pr-summary.cgi?query

KDE's process works similarly to here, e-mail based with specific people assigned to track submissions to the various portions of the project: http://developer.kde.org/documentation/other/developer-faq.html#q2.21

GNOME makes all submitters create a report in bugzilla and tracks from there: http://live.gnome.org/GnomeLove/SubmittingPatches

Apache also pushes everything through bugzilla: http://httpd.apache.org/dev/patches.html
The interesting quote there is: "Traditionally, patches have been submitted on the developer's mailing list as well as through the bug database. Unfortunately, this has made it hard to easily track the patches. And without being able to easily track them, too many of them have been ignored. Patches must now be submitted through the bug database..."

The thing that would obviously go away if this project were to switch to such a model is that right now, there are lots of ideas that go by that would never be submitted as patches like that. But Bruce snags them and turns them into todo items and such rather than letting the idea just get lost in the archives.

-- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Commit fest queue
On Thu, 10 Apr 2008, Brendan Jurd wrote:
> [Automatic e-mail notification] is trivial to configure in a real tracker. Less so for a wiki page, but it could still be accomplished with the careful application of script-fu.

Anyone who is interested can sign up right now for e-mail notification whenever a specific wiki page is modified; that's a standard MediaWiki feature. If you wanted, you could even sign up a mailing list as the entity being notified. That's not exactly what you had in mind I think, but it's close enough to be useful for now.

-- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] How to submit a patch
On Wed, 16 Apr 2008, Joshua D. Drake wrote:
> I've added a redirect at http://wiki.postgresql.org/wiki/CommitFest which currently points to May, but should be updated whenever we close a commitfest against new submissions.
> We should also update the FAQ.

I wouldn't bother with that yet. That whole area of the Wiki is still moving around a bit, and I expect some more usefully targeted pages will emerge ("How to submit a patch" comes to mind). Having a stable CommitFest URL is handy, but I don't think it's where the FAQ should be sending people.

-- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Plan targetlists in EXPLAIN output
On Thu, 17 Apr 2008, Tom Lane wrote:
> For debugging the planner work I'm about to do, I'm expecting it will be useful to be able to get EXPLAIN to print the targetlist of each plan node, not just the quals (conditions) as it's historically done.

I've heard that some of the academic users of PostgreSQL were hoping to add features in this area to make the planner internals easier to use for educational purposes. It would be nice if that were available for such purposes without having to recompile.

-- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Lessons from commit fest
On Fri, 18 Apr 2008, Gregory Stark wrote:
> The reason I was asking these questions was because I was thinking about how hard it would be to generate the list from a textual analysis instead of using object files.

Is there some reason I'm missing why the listing Doxygen creates isn't good enough here? http://doxygen.postgresql.org/globals_type.html
Scraping that HTML seems like it would be pretty straightforward.

-- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] How to submit a patch
On Wed, 16 Apr 2008, Heikki Linnakangas wrote:
> Based on my observations, there's basically three different workflows a patch can follow (assuming the patch gets committed in the end)

This list was so good that I used it as the basis for a new page on the wiki: http://wiki.postgresql.org/wiki/Submitting_a_Patch

I just did a big cleanup of the whole developer's area there. Rather than the nested mess there before, there's now a fairly complete entry page: http://wiki.postgresql.org/wiki/Development_information
That should have the majority of what most people are looking for. The previous project management page was collapsed into the above. There's still a "Development projects" subpage there, but that's fairly specific to people who know what they're looking for, I think. The March Commitfest section might be slimmed down a bit after the May one is better defined.

One small change I'd suggest on the main site: http://www.postgresql.org/developer/coding links to http://wiki.postgresql.org/wiki/Developer_and_Contributor_Resources which is now a redirect to the above page. I separated out the advocacy contributors to their own section, which made the longer title unneeded. It would be nice one day to change that to use the shorter Development_information URL instead. It would also be worth considering a direct link to that URL in the manual; I believe it will remain stable now.

-- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] How to submit a patch
On Sat, 19 Apr 2008, Joshua D. Drake wrote:
> Greg Smith wrote:
>> One small change I'd suggest on the main site: http://www.postgresql.org/developer/coding links to http://wiki.postgresql.org/wiki/Developer_and_Contributor_Resources which is now a redirect to the above page.
> This request should be on -www and as a note I don't know that I like the idea.

There were two suggestions there, and technically one of them goes to -docs instead if we're gonna get picky.

1) Switch the URL that's already on the coding page to more directly point to a URL that actually exists, rather than a redirect. That seems undebatable.

2) Put a link to an area that contains information like current CommitFest progress in the development section of the manual.

(2) was already suggested here recently; I said I didn't think that was a good idea until the content there stabilized because I planned a reorganization. I was just announcing that I believe that to be stable now, and I nominate the revised Developer information page as the one to link to.

If you don't like the idea of embedding a few choice URLs from the wiki into the main documentation in general, I don't know why. The manual is great for some things, the wiki is great for others, and the best way to use both for what they're good at is to start coupling them together at appropriate points.

-- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Commitfest namespacing (was: TODO, FAQs to Wiki?)
On Tue, 22 Apr 2008, Brendan Jurd wrote:
> I wonder if we should namespace the CommitFest pages by year as well as month (i.e., move CommitFest:May to CommitFest:May2008).

This already came up on pgsql-www and, as I just replied over there, the current structure has some things I'd like to fix beyond just this (and there's a pending namespace vs. categories argument brewing there). That's the list where this sort of thing will get hashed out. Please come join so you can get sucke...err, volunteer to help out even more than you already have.

> This way, even after we've had a CommitFest:May in 2009/2010/etc., the history of the May 2008 CommitFest will still be easily viewable as a discrete item.

There ultimately should be pages for CommitFest:2008 and CommitFest:8.4 that the Wiki generates itself. I'd prefer not to see any band-aid changes made in this area that aren't thinking forward to address those as well. Work on improving the structure for May instead like you've been doing; that's much more valuable right now IMHO.

> We probably need to have the following redirects in place: * CommitFest:Current (for reviewers)...

Ditto here. I already intend to eliminate the CommitFest redirect you've put there and replace it with a page listing the Views available one day, and I'd prefer not to see more of these floating around. Redirects are designed to be a useful hack when a page gets removed or to handle common shortcuts/errors. In general, if you're relying on them heavily for external navigation structure, you're probably not using the right tool for that sort of job.

-- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] TODO, FAQs to Wiki?
On Mon, 21 Apr 2008, Tino Wildenhain wrote:
> Alvaro Herrera wrote:
>> I suggest we start an experiment with the FAQ in XML Docbook, which is amenable to automatic processing, and move from there.
> Well... or reStructuredText which has the advantage of being human editable? (without specialized editor that is)

reST is a reasonable tool for building small documents, but I don't use it because it really doesn't scale well to handle larger ones. Given that the rest of the project is already committed to using Docbook for those larger documents, I think it's hard to justify the additional toolchain needed for reST processing just to make the FAQ a little easier to edit.

-- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Per-table random_page_cost for tables that we know are always cached
On Tue, 22 Apr 2008, PFC wrote:
> Example : let's imagine a cache priority setting.

Which we can presume the DBA will set incorrectly because the tools needed to set that right aren't easy to use.

> An alternative would be for the background writer to keep some stats and do the thing for us :
> - begin bgwriter scan
> - setup hashtable of [relid = page count]
> - at each page that is scanned, increment page count for this relation...

I've already got a plan sketched out that does this, which I didn't manage to get finished in time for 8.3. What I wanted it for was not this purpose, but instrumentation of what's in the cache that admins can look at. Right now you can get that out of pg_buffercache, but that's kind of intrusive because of the locks it takes. In many cases I'd be perfectly happy with an approximation of what's inside the buffer cache, accumulated while the page header is being locked anyway as the BGW passes over it. And as you note, having this data available can be handy for internal self-tuning as well once it's out there.

Jim threw out that you can just look at the page hit percentages instead. That's not completely true. If you've had some nasty query blow out your buffer cache, or if the server has been up a looong time and the total stats don't really reflect recent reality, what's in the buffer cache and what the stats say has historically been cached can diverge.

> This would not examine whatever is in the OS' cache, though.

I don't know that it's too unrealistic to model the OS as just being an extrapolated bigger version of the buffer cache. I can think of a couple of ways those can diverge:

1) Popular pages that get high usage counts can end up with a higher representation in shared_buffers than the OS

2) If you've been doing something like a bulk update, you can have lots of pages that have been written recently in the OS cache that aren't really accounted for fully in shared_buffers, because they never get a high enough usage count to stay there (only used once) but can fill the OS cache as they're spooled up to write.

I'm not sure that either of these cases is so strong that it invalidates your basic idea though. There's a pending 8.4 TODO to investigate whether increasing the maximum usage count a buffer can get would be an improvement. If that number got bumped up I could see (2) become more of a problem.

I'd be somewhat concerned about turning this mechanism on by default though, at least at first. A hybrid approach that gives the DBA some control might work well. Maybe have an "adjust estimates for cache contents" knob that you can toggle on a per-session or per-table basis?

-- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
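The per-relation accounting proposed in the quoted text above (a hashtable of relid to page count, filled during one pass over the buffer pool) can be sketched roughly as follows. This is only an illustration in Python, not PostgreSQL code: `BufferHeader`, `scan_buffer_pool` and `cached_fraction` are hypothetical names, and the real background writer works on shared memory structures under locks that this sketch ignores.

```python
from collections import Counter
from typing import NamedTuple, Optional, Sequence


class BufferHeader(NamedTuple):
    """Hypothetical stand-in for the per-buffer metadata a scan would see."""
    relid: int        # relation the cached page belongs to
    usage_count: int  # popularity counter for the buffer


def scan_buffer_pool(buffers: Sequence[Optional[BufferHeader]]) -> Counter:
    """Make one pass over the pool and count cached pages per relation."""
    pages_per_rel: Counter = Counter()
    for header in buffers:
        if header is None:  # unused buffer slot
            continue
        pages_per_rel[header.relid] += 1
    return pages_per_rel


def cached_fraction(pages_per_rel: Counter, relid: int, relation_pages: int) -> float:
    """Approximate share of a relation's pages that were in the pool at scan time."""
    if relation_pages <= 0:
        return 0.0
    return min(1.0, pages_per_rel[relid] / relation_pages)


# Tiny made-up pool: two pages of relation 16384, one of 16390, one free slot.
pool = [BufferHeader(16384, 5), BufferHeader(16384, 1), None, BufferHeader(16390, 3)]
counts = scan_buffer_pool(pool)
print(counts[16384], cached_fraction(counts, 16384, relation_pages=4))
```

The result is only a snapshot from the last scan, which matches the "approximation" caveat in the message: it says nothing about the OS cache, and it goes stale between passes.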
Re: [HACKERS] [0/4] Proposal of SE-PostgreSQL patches
On Thu, 1 May 2008, Andrej Ricnik-Bay wrote:
> Not a hacker, just a curious reader ... are there equivalent frameworks for the other supported platforms? E.g. MacOS, *BSD, Windows?

SELinux is a Linux implementation of ideas from an earlier NSA project named Flask. There is a port of another variant of that, Flask/TE, that is making its way into the BSD variants via a project called SEBSD. TrustedBSD, Darwin (OS X), and OpenSolaris all have projects in this area already (the Solaris one just launched last month). A good starter page is http://www.trustedbsd.org/sebsd.html

Particularly given the common heritage, I suspect that the PostgreSQL side of all these projects will be similar, and that once those hooks are in place it will just be a matter of tying them into the higher levels of the other framework. It would be too ambitious to target all of them at once for a first pass, but it may be worth a look at the fundamentals of SEBSD to make sure the right hooks look like they're in place.

Windows has this thing called Group Policy that's supposedly leaped forward for Windows Server 2008. They are now advertising it as "like SELinux, but better". The presentation PDF I just read on that subject sounds like something written by the crazy guy at Broadway and 57th Street I used to walk by, who talked into pieces of fruit as if they were his cell phone. It's such a deluded and wildly misguided bit of sales fluff that you can't take it seriously, and the whole thing just leaves me feeling sorry for them instead.

-- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] [0/4] Proposal of SE-PostgreSQL patches
On Wed, 30 Apr 2008, KaiGai Kohei wrote:
> [1/4] sepostgresql-pgace-8.4devel-3-r739.patch provides PGACE (PostgreSQL Access Control Extension) framework.
> http://sepgsql.googlecode.com/files/sepostgresql-pgace-8.4devel-3-r739.patch

For those overwhelmed by sheer volume here, this is the patch to start with, because it's got all the core changes to the server. I'm also in the camp that would like to see this feature added, but rather than just giving it a +1 I started looking at it. The overall code is nice: easy to understand, structured modularly. I have some concerns though. The first two things that jump out at me on an initial review appear right from the beginning for those who want to take a look:

-I'm a bit unnerved by both the performance and reliability implications of how the security check calls are done in every case, even if there is no SELinux support included. Those checks are sitting in some pretty low level tuple and heap calls. The approach taken here is to put all the #ifdef logic into the underlying ACE interface (see patch [2/4]), so that the caller doesn't have to care. If SELinux support is off then the calls turn into void x(y) {} or bool a(b) { return true; } This is a very clean design, but it's putting extra (possibly optimized away) calls into a lot of places. While it would be uglier, it might make sense to put that on/off logic in all the places where the calls are made, so that when you turn SELinux support off most of the code really does go completely away rather than just turning into stubs.

-The only error reporting and handling method used is elog(ERROR, ...). That seems a bit heavy handed for something that can be expected to happen all the time. If I understand this correctly, when you're scanning a table with 1000 rows where you're only allowed to see 50% of them, that's going to be 500 calls to elog(), one for each tuple you can't see. Having a tuple get screened out isn't really an error per se, and while I can see how sensitive installs would want those all reported, there are others where this volume of log activity would be too much. Just because someone with classified clearance is looking at a big table that also has a lot of secret info in it, not all installs will want a million errors reported just because there's data available that person can't see. At a minimum, this needs some finer log control, and maybe a rethinking altogether of how to handle error cases.

-- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] [0/4] Proposal of SE-PostgreSQL patches
On Mon, 5 May 2008, Tom Lane wrote:
> elog() should not be used for user-facing errors. I couldn't easily tell just which of the messages are likely to be seen by users and which ones should be "can't happen" cases, but certainly there are a whole lot of these that need to be ereport()s. Likely there need to be some new ERRCODEs too.

And it would be a nice step toward the scenarios I was asking about if there was a GUC variable for what level to log security violations at. I realize now the tuple-level warnings are going into the SELinux logs rather than the PostgreSQL ones, but it should be easier to change policy violations that impact the server to something other than just ERROR.

-- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] [0/4] Proposal of SE-PostgreSQL patches
On Tue, 6 May 2008, Tom Lane wrote:
> And of course the next question after that is why we should want to depend on SELinux at all, rather than implementing row filtering in the framework of SQL permissions...

It may be the case that clean row and column filtering at the SQL layer are prerequisites for a clean SELinux implementation, where the only difference is that the permission checks are handled by asking SELinux instead of looking in the catalog.

As for why SQL restrictions alone aren't enough...the simple answer is because it's not SELinux, which I say in all seriousness because it is turning into a requirement in some places. SELinux lets you control what a user login is capable of no matter what application they run, and managing those capabilities can happen in one place--the SELinux tools.

There are lots of ways to address OS login problems. Let's say the logins have a PAM plug-in that restricts what you can log in to based on what machine you're on, and also requires one of those random-number-generating key cards so that you can't steal someone else's username/password. If you've got a scheme like that, and the database enforces SELinux restrictions, it doesn't matter whether your DBA followed all the PostgreSQL security rules correctly, as long as they got the SELinux mapping part right. And you don't have to make sure whatever custom security mechanism you've integrated into the login or post-login process is recognized by the database proper at all, as long as the restrictions can be mapped to the SELinux+database space.

Simple example of something hard to replicate without this framework: you discover someone is a rat. You update your list of active users and push that to all your servers. Now even if said rat is already logged into the database server and busy doing 'psql -o /disk/usbkey -c "select * from secretdata"' you just cut them off in the middle of the query--without needing to find all the database servers and execute alter table secretdata set ..., just by doing simple user account maintenance the way people are already comfortable with and have procedures for.

That's the basic idea here--put the authorization into one layer where it's easy (for some definitions of easy) to manage and extensible as needed, without having to touch the individual applications directly, just by adjusting what permissions you publish when data is requested.

I'm sure someone can raise issues or suggest alternate implementations for my specific examples, but much like other privilege escalation defense mechanisms, these environments look for redundant layers of security. In reality users of this would aim for a completely locked down base PostgreSQL *and* a completely locked down SELinux implementation integrated into that, reinforcing one another, rather than just relying on one level of security.

-- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Link requirements creep
On Sat, 17 May 2008, Tom Lane wrote:
> I was displeased to discover just now that in a standard RPM build of PG 8.3, psql and the other basic client programs pull in libxml2 and libxslt; this creates a package dependency that should not be there by any stretch of the imagination.

When we noticed this recently, my digging suggested you'll be hard pressed to have a RedHat system now without those two installed. The libxslt RPM provides necessary components for KDE, GNOME, and Sun's Java RPM. libxml2 is far more intertwined even than that.

These dependencies are unpleasant technically, but I don't think they introduce any real functional creep. It would be difficult to even strip a system down to the point where these packages weren't available.

-- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] New DTrace probes proposal
On Sat, 17 May 2008, Robert Lor wrote:
> I'd like to propose adding the following probes (some of which came from Simon) to 8.4.

There's also a big DTrace probe set patch available from OmniTI:
https://labs.omniti.com/project-dtrace/trunk/postgresql/
http://labs.omniti.com/trac/project-dtrace/wiki/Applications#PostgreSQL
I don't know if you've looked at that before. There's some overlap but many unique and handy probes to each set. I think it would be nice to consider a superset union of the two.

I would guess OmniTI would be glad to have their set assimilated into core as well so they don't have to maintain their patch past 8.3; hopefully Theo or Robert will chime in on that.

-- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Can't t compile current HEAD
On Thu, 15 May 2008, Nikhils wrote:
> On Thu, May 15, 2008 at 11:59 AM, Pavel Stehule [EMAIL PROTECTED] wrote:
> I always use a ~/.cvsrc containing
> My .cvsrc also includes:

Good hints, and there's now a little section including them all at http://wiki.postgresql.org/wiki/Working_with_CVS#Initial_setup

-- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] [GSoC08]some detail plan of improving hash index
On Fri, 16 May 2008, Josh Berkus wrote:
> For a hard-core benchmark, I'd try EAStress (SpecJAppserver Lite)

This reminds me...Jignesh had some interesting EAStress results at the East conference I was curious to try and replicate more publicly one day. Now that there are some initial benchmarking servers starting to become available, it strikes me that this would make a good test case to run on some of those periodically. I don't have a spare $2K for a commercial license right now, but there's a cheap ($250) non-profit license for EAStress around. That might be a useful purchase for one of the PG non-profits to make one day though.

-- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] triggers on prepare, commit, rollback... ?
On Tue, 20 May 2008, Hannu Krosing wrote:
> Tell others that this trx failed, maybe log a failure ?
> OTOH, this can be implemented by a daemon that sits on tail -f logfile | grep ROLLBACK

In order to follow the log files like that successfully in many environments, you need to stay in sync as the underlying log file changes (it might rotate every day for example). Unfortunately it's not as simple as just using tail.

-- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
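To make the rotation problem concrete, here is a rough Python sketch of a log follower that notices when the file behind a path has been replaced or truncated and reopens it. The class name `RotationAwareTail` and the polling design are assumptions made for illustration only; the sketch also assumes rotation happens by rename or truncation, and a production daemon would have more edge cases to handle (partial lines, files that disappear for a while, and so on).

```python
import os
from typing import List


class RotationAwareTail:
    """Poll a log file for new lines, reopening it when it is rotated."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._fh = None
        self._inode = None

    def _open(self) -> None:
        # Binary mode so that tell() is a plain byte offset.
        self._fh = open(self.path, "rb")
        self._inode = os.fstat(self._fh.fileno()).st_ino

    def read_new_lines(self) -> List[str]:
        """Return lines appended since the last call (possibly none)."""
        if self._fh is None:
            try:
                self._open()
            except FileNotFoundError:
                return []
        # Drain whatever is left in the file we already have open.
        raw = self._fh.readlines()
        try:
            current = os.stat(self.path)
        except FileNotFoundError:
            current = None  # rotated away, replacement not created yet
        if current is not None:
            rotated = current.st_ino != self._inode
            truncated = current.st_size < self._fh.tell()
            if rotated or truncated:
                # The path now names different content: start over on it.
                self._fh.close()
                self._open()
                raw.extend(self._fh.readlines())
        return [line.decode("utf-8", errors="replace") for line in raw]
```

A daemon would call `read_new_lines()` in a loop with a short sleep and filter for the lines it cares about, such as those containing ROLLBACK. Draining the old file handle before reopening is what keeps lines written just before the rotation from being lost.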
Re: [HACKERS] Catching exceptions from COPY
On Wed, 28 May 2008, Darren Reed wrote:
> Is it feasible to add the ability to catch exceptions from COPY?

Depends on what you consider feasible. There's a start to a plan for that on the TODO list: http://www.postgresql.org/docs/faqs.TODO.html but it's not trivial to implement.

It's also possible to do this right now using pgloader: http://pgfoundry.org/projects/pgloader/
That requires some setup and there's overhead to passing through that loading layer.

A third possibility is to write a short script specifically aimed at your copy need that breaks your input files into smaller chunks and loads them, kicking back the ones that don't load, or breaking them into even smaller chunks until you've found the problem line or lines.

-- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
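The third possibility above, loading in chunks and splitting any chunk that fails until the bad lines are isolated, might look something like this Python sketch. The function name `load_rejecting_bad_rows` and the `try_copy` callback are hypothetical: `try_copy` stands in for whatever actually runs COPY on a batch of rows inside its own transaction and reports success or failure, which is not shown here.

```python
from typing import Callable, List, Sequence, Tuple


def load_rejecting_bad_rows(
    rows: Sequence[str],
    try_copy: Callable[[List[str]], bool],
    chunk_size: int = 1000,
) -> Tuple[int, List[str]]:
    """Load rows chunk by chunk; bisect failing chunks down to the bad rows.

    Returns (number of rows loaded, list of rejected rows).
    """
    loaded = 0
    rejected: List[str] = []
    # Work list of chunks, reversed so pop() hands them back in input order.
    pending = [list(rows[i:i + chunk_size]) for i in range(0, len(rows), chunk_size)]
    pending.reverse()
    while pending:
        chunk = pending.pop()
        if try_copy(chunk):
            loaded += len(chunk)
        elif len(chunk) == 1:
            rejected.append(chunk[0])  # a single row that will not load
        else:
            mid = len(chunk) // 2
            pending.append(chunk[mid:])  # second half, handled later
            pending.append(chunk[:mid])  # first half, handled next
    return loaded, rejected


# Fake loader for demonstration: a chunk "loads" only if every row has
# exactly two tab-separated fields.
def fake_copy(chunk: List[str]) -> bool:
    return all(row.count("\t") == 1 for row in chunk)


sample = ["1\ta", "2\tb", "broken", "4\td", "5\te\textra"]
print(load_rejecting_bad_rows(sample, fake_copy, chunk_size=4))
```

With a mostly clean input file this costs only a few extra load attempts per bad row, while a file full of errors degrades toward one attempt per row, which is the trade-off implied by "breaking them into even smaller chunks".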
Re: [HACKERS] [PERFORM] Memory question on win32 systems
On Thu, 29 May 2008, Justin wrote:
> I'm confused trying to figure out how caches are being used and being moved through the postgresql backend.

The shared_buffers cache holds blocks from the database files. That's it. If you want some more information about how that actually works, head to http://www.westnet.com/~gsmith/content/postgresql/ and read "Inside the PostgreSQL Buffer Cache".

The work memory allocated for sorting is separate from that, and it doesn't cache anything. It just provides working room for a query that's being executed right now.

-- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Thu, 29 May 2008, David Fetter wrote:
> It's a giant up-hill slog to sell warm standby to those in charge of making resources available because the warm standby machine consumes SA time, bandwidth, power, rack space, etc., but provides no tangible benefit, and this feature would have exactly the same problem.

This is an interesting commentary on the priorities of the customers you're selling to, but I don't think you can extrapolate from that too much. The deployments I normally deal with won't run a system unless there's a failover backup available, period, and the fact that such a feature is not integrated into the core yet is a major problem for them. Read-only slaves are very nice to have, but by no means a prerequisite before core replication will be useful to some people. Hardware/machine resources are only worth a tiny fraction of what the data is in some environments, and in some of those downtime is really, really expensive.

-- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Thu, 29 May 2008, Tom Lane wrote:
> There's no point in having read-only slave queries if you don't have a trustworthy method of getting the data to them.

This is a key statement that highlights the difference in how you're thinking about this compared to some other people here. As far as some are concerned, the already working log shipping *is* a trustworthy method of getting data to the read-only slaves. There are plenty of applications (web oriented ones in particular) where if you could direct read-only queries against a slave, the resulting combination would be a giant improvement over the status quo even if that slave was as much as archive_timeout behind the master. That quantity of lag is perfectly fine for a lot of the same apps that have read scalability issues.

If you're someone who falls into that camp, the idea of putting the sync replication job before the read-only slave one seems really backwards.

I fully accept that it may be the case that it doesn't make technical sense to tackle them in any order besides sync first, then read-only slaves, because of dependencies in the implementation between the two. If that's the case, it would be nice to explicitly spell out what those dependencies are, to deflect criticism of the planned prioritization.

-- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Fri, 30 May 2008, Andreas 'ads' Scherbaum wrote:
> Then you ship 16 MB binary stuff every 30 seconds or every minute but you only have some kbyte real data in the logfile.

Not if you use pg_clearxlogtail ( http://www.2ndquadrant.com/replication.htm ), which got lost in the giant March commitfest queue but should probably wander into contrib as part of 8.4.

-- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Sat, 31 May 2008, Gurjeet Singh wrote: Not if you use pg_clearxlogtail This means we need to modify pg_standby to not check for filesize when reading XLogs. No, the idea is that you run the segments through pg_clearxlogtail | gzip, which then compresses lightly used segments massively because all the unused bytes are 0. File comes out the same size at the other side, but you didn't ship a full 16MB if there was only a few KB used. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
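A rough sketch of why clearing the tail helps, written in Python rather than taken from the actual pg_clearxlogtail tool. It builds a fake 16MB segment holding only 8KB of real data and compares the gzip output with a zeroed tail against a tail of leftover bytes. The random filler and the helper names are assumptions made for illustration; nothing here reads a real WAL file.

```python
import gzip
import os

SEGMENT_SIZE = 16 * 1024 * 1024  # segment size mentioned in the thread


def make_segment(used_bytes: int, zero_tail: bool) -> bytes:
    """Build a fake segment: `used_bytes` of data followed by a tail."""
    payload = os.urandom(used_bytes)  # stand-in for real log records
    tail_len = SEGMENT_SIZE - used_bytes
    # Assumption: an uncleared tail holds leftover bytes, modeled here as
    # random data. A cleared tail is all zeros.
    tail = bytes(tail_len) if zero_tail else os.urandom(tail_len)
    return payload + tail


def shipped_size(segment: bytes) -> int:
    """Number of bytes that would cross the wire after gzip."""
    return len(gzip.compress(segment))


cleared = make_segment(8 * 1024, zero_tail=True)
dirty = make_segment(8 * 1024, zero_tail=False)
cleared_size = shipped_size(cleared)
dirty_size = shipped_size(dirty)

# The receiving side gets back a full-size file after decompression.
restored = gzip.decompress(gzip.compress(cleared))
```

The cleared segment shrinks to a small fraction of its size on the wire, while the decompressed file is still the full segment length, which is why pg_standby would not need to change in this scheme.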
Re: [HACKERS] Overhauling GUCS
On Fri, 30 May 2008, Josh Berkus wrote: 1) Add several pieces of extra information to guc.c in the form of extra gettext commands: default value, subcategory, long description, recommendations, enum lists. 2) Incorporate this data into pg_settings. When you last brought this up in February (I started on a long reply to http://archives.postgresql.org/pgsql-hackers/2008-02/msg00759.php that I never quite finished) the thing I got stuck on was how to deal with the way people tend to comment in these files as they change things. One problem I didn't really see addressed by the improvements you're suggesting is how to handle migrating customized settings to a new version (I'm talking about 8.4-9.0 after this is in place; 8.3-8.4 is a whole different problem). It would be nice to preserve history of what people did, like in your examples (which look exactly like what I find myself running into in the field). Now, that will get a lot easier just by virtue of having a smaller config file, but I think that adding something into pg_settings that allows saving user-added commentary would be a nice step toward some useful standardization on that side of things. It would make future automated tools aimed at parsing and generating new files, as part of things like version upgrades, a lot easier if there was a standard way such comments were handled in addition to the raw data itself. The other thing I'd like to see make its way into pg_settings, so that tools can operate on it just by querying the database, is noting what file the setting came from so that you can track things like include file usage. I think with those two additions (comments and source file tracking) it would even be conceivable to clone a working facsimile of even a complicated postgresql.conf file set remotely just by reading pg_settings.
While a bit outside of the part you're specifically aiming to improve here, if you could slip these two additions in I think it would be a boon to future writers of multi-server management tools as well. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
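To make the two proposed additions concrete, here is a minimal sketch of rebuilding a set of configuration files from rows shaped like an extended pg_settings. The `sourcefile` and `comments` fields are hypothetical stand-ins for the source-file tracking and user commentary suggested above, and the sample rows are made up.

```python
from collections import defaultdict


def rebuild_files(rows):
    """Group settings by the file they came from and emit file contents.

    Each row is a dict with 'name', 'setting', 'sourcefile' and an optional
    'comments' list (the last two are hypothetical fields for this sketch).
    """
    files = defaultdict(list)
    for row in rows:
        lines = files[row["sourcefile"]]
        for comment in row.get("comments", []):
            lines.append(f"# {comment}")
        lines.append(f"{row['name']} = {row['setting']}")
    return {path: "\n".join(lines) + "\n" for path, lines in files.items()}


rows = [
    {"name": "shared_buffers", "setting": "1024MB",
     "sourcefile": "postgresql.conf",
     "comments": ["raised after the March load test"]},
    {"name": "work_mem", "setting": "16MB",
     "sourcefile": "tuning.conf"},
]
rebuilt = rebuild_files(rows)
```

With rows like these a remote tool could write out one file per source, keeping each user comment directly above the setting it describes.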
Re: [HACKERS] Overhauling GUCS
On Sun, 1 Jun 2008, Peter Eisentraut wrote: Josh Berkus wrote: 1. Most people have no idea how to set these. Could you clarify this? I can't really believe that people are incapable of editing a configuration file. The big problem isn't the editing; it's knowing what to set the configuration values to. This is not to say that editing a configuration file should be considered reasonable. Any GUCS overhaul should include a design goal of being able to completely manage the configuration system using, say, pgadmin (the "manage settings via port access" part that Josh already mentioned). This is why I was suggesting additions aimed at assimilating all the things that are in the postgresql.conf file. Joshua has been banging a drum for a while now that all this data needs to get pushed into the database itself. The GUCS data is clearly structured like a database table. Josh's suggested changes are basically adding all the columns needed to it in order to handle everything you'd want to do to the table. If you think of it in those terms and make it possible to manipulate that data using the tools already available for updating tables, you'll open up the potential to add a whole new class of user-friendly applications for making configuration easier to manage. However, I don't fully agree with taking that idea as far as Joshua has suggested (only having the config data in the database), because having everything in a simple text file that can be managed with SCM etc. has significant value. It's nice to allow admins to be able to make simple changes with just a file edit. It's nice that you can look at all the parameters in one place and browse them. However, I do think that the internal database representation must be capable of holding everything in the original postgresql.conf file and producing an updated version of the file, either locally or remotely, as needed. 4. We don't seem to be getting any closer to autotuning. True. But how does your proposal address this?
The idea that Josh's suggestions are working toward is simplifying the construction of tools that operate on the server configuration file, so that it's easier to write an autotuning tool. Right now, writing such a tool in a generic way gets so bogged down just in parsing/manipulating the postgresql.conf file that it's hard to focus on actually doing the tuning part. If we go back to his original suggestion: http://wiki.postgresql.org/wiki/GUCS_Overhaul Add a script called pg_generate_conf to generate a postgresql.conf based on guc.c and command-line switches (rather than postgresql.conf.sample) It's an easy jump from there to imagine a pg_generate_conf that provides a wizard interface to update a configuration file. I foresee a little GUI or web app that connects to a server on port 5432, finds out some basic information about the server, and gives something like this:

Parameter             Current  Recommended  Change?
shared_buffers        32MB     1024MB       [X]
effective_cache_size  128MB    3000MB       [ ]
work_mem              1MB      16MB         [ ]

Josh has the actual brains behind such an app all planned out if you look at his presentations, but without the larger overhaul it's just not possible to make the implementation elegant. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
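As an illustration only, the wizard output described above could be driven by a small data structure like the one below. The `Recommendation` class and `render` function are hypothetical names for this sketch; the three rows reuse the example values from the message, and no tuning logic is implied.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    name: str
    current: str
    recommended: str
    apply: bool  # whether the user ticked the "Change?" box


def render(recs):
    """Format recommendations as a fixed-width table."""
    lines = [f"{'Parameter':<22}{'Current':<9}{'Recommended':<13}Change?"]
    for r in recs:
        mark = "[X]" if r.apply else "[ ]"
        lines.append(f"{r.name:<22}{r.current:<9}{r.recommended:<13}{mark}")
    return "\n".join(lines)


def accepted_settings(recs):
    """Return 'name = value' lines only for the changes the user accepted."""
    return [f"{r.name} = {r.recommended}" for r in recs if r.apply]


recs = [
    Recommendation("shared_buffers", "32MB", "1024MB", True),
    Recommendation("effective_cache_size", "128MB", "3000MB", False),
    Recommendation("work_mem", "1MB", "16MB", False),
]
table = render(recs)
to_write = accepted_settings(recs)
```

The hard part the thread is about is not this display step but getting `current` out of the existing file and writing `to_write` back without disturbing anything else.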
Re: [HACKERS] Overhauling GUCS
On Sun, 1 Jun 2008, Joshua D. Drake wrote: Well I don't know that a minimum of comments is what I am arguing as much as not too much comments. Josh's proposal included making three levels of documentation-level comments available: terse, normal, and verbose. The verbose comment level probably should include a web link to full documentation. The way the comments litter the existing file, the status quo that's called normal mode in this proposal, is IMHO a complete mess. Most use cases I can think of want either no comments or really verbose ones; the limited commentary in the current sample postgresql.conf seems a middle ground that's not right for anybody. The key thing here in my mind is that it should be possible to transform between those three different verbosity levels without losing any settings or user-added comments. They're really just different views on the same data, and which view you're seeing should be easy to change without touching the data. I just extracted the original design proposal and some of the relevant follow-up in this thread, made some additional suggestions, and put the result at http://wiki.postgresql.org/wiki/GUCS_Overhaul I think reading that version makes it a bit clearer what the proposed overhaul is aiming for. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
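One way to picture "different views on the same data" is the sketch below, where the verbosity level only controls which generated comments are printed and the settings and user comments always survive. The `Setting` fields, the `render` function, and the `#!` prefix for generated comments are all assumptions for illustration, not part of the proposal's text.

```python
from dataclasses import dataclass, field


@dataclass
class Setting:
    name: str
    value: str
    short_doc: str  # hypothetical one-line description
    long_doc: str   # hypothetical longer description
    doc_url: str    # link shown only at the verbose level
    user_comments: list = field(default_factory=list)


def render(settings, level):
    """Render settings at 'terse', 'normal' or 'verbose' comment level."""
    if level not in ("terse", "normal", "verbose"):
        raise ValueError(f"unknown level: {level}")
    out = []
    for s in settings:
        if level == "normal":
            out.append(f"#! {s.short_doc}")
        elif level == "verbose":
            out.append(f"#! {s.long_doc}")
            out.append(f"#! See {s.doc_url}")
        # User comments and the setting itself appear at every level.
        for comment in s.user_comments:
            out.append(f"# {comment}")
        out.append(f"{s.name} = {s.value}")
    return "\n".join(out)


example = Setting(
    name="shared_buffers",
    value="1024MB",
    short_doc="memory for shared buffers",
    long_doc="amount of memory the server uses for shared memory buffers",
    doc_url="http://www.postgresql.org/docs/8.3/interactive/runtime-config-resource.html",
    user_comments=["benchmarked on the new array"],
)
terse = render([example], "terse")
verbose = render([example], "verbose")
```

Switching levels is then just a re-render of the same `Setting` objects, which is the property the message asks for.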
Re: [HACKERS] Overhauling GUCS
On Mon, 2 Jun 2008, Jignesh K. Shah wrote: Most people I have seen will increase one or few but not all parameters related to memory which can result in loss of performance and productivity in figuring out. If it becomes easier to build a simple tool available to help people tune their configurations, that should help here without having to do anything more complicated than that. What happened to AvailRAM setting and base all memory gucs on that. Like some of the other GUC simplification ideas that show up sometimes (unifying all I/O and limiting background processes based on that total is another), this is hard to do internally. Josh's proposal has a fair amount of work involved, but the code itself doesn't need to be clever or too intrusive. Unifying all the memory settings would require being both clever and intrusive, and I doubt you'll find anybody who could pull it off who isn't already overtasked with more important improvements for the 8.4 timeframe. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Overhauling GUCS
On Mon, 2 Jun 2008, Tom Lane wrote: Greg Smith [EMAIL PROTECTED] writes: Joshua has been banging a drum for a while now that all this data needs to get pushed into the database itself. This is, very simply, not going to happen. Right, there are also technical challenges in the way of that ideal. I was only mentioning the reasons why it might not be the best idea even if it were feasible. However, I do not see why the limitations you bring up must get in the way of thinking about how to interact and manage the configuration data in a database context, even though it ultimately must be imported and exported to a flat file. The concerns you bring up again about leaving the database in an unstartable state are a particularly real danger in the "only has access to 5432" hosted provider case that this redesign is trying to satisfy. I added a Gotchas section to the wiki page so that this issue doesn't get forgotten about. The standard way to handle this situation is to have a known good backup configuration floating around. Adding something in that area may end up being a hard requirement before remote editing makes sense. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Overhauling GUCS
On Tue, 3 Jun 2008, Paul van den Bogaard wrote: So overhauling the GUC parameters is one step, but adding proper instrumentation in order to really measure the impact of the new setting is necessary too. Correct, but completely off-topic regardless. One problem to be solved here is to take PostgreSQL tuning from zero to, say, 50% automatic. Wander the user lists for a few months; the number of completely misconfigured systems out there is considerable, partly because the default values for many parameters are completely unreasonable for modern hardware and there's no easy way to improve on that without someone educating themselves. Getting distracted by the requirements of the high-end systems will give you a problem you have no hope of executing in a reasonable time period. By all means bring that up as a separate (and much, much larger) project: Database Benchmarking and Sensitivity Analysis of Performance Tuning Parameters would make a nice PhD project for somebody, and there's probably a good patent in there somewhere. Even if you had such a tool, it wouldn't be usable by non-experts unless the mundane GUC generation issues are dealt with first, and that's where this is at right now. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Overhauling GUCS
On Wed, 4 Jun 2008, Andreas Pflug wrote: When reading this thread, I'm wondering if anybody ever saw a config file for a complex software product that was easily editable and understandable. I would recommend Apache's httpd.conf as an example of something that's easy to edit and follow. Like any complex product, the comments in the configuration file itself can't possibly be sufficient by themselves. But in general I've found Apache's config file to have enough comments to jog my memory when I'm editing it while not being overwhelming. They provide enough detail that when I run into a setting I don't understand there's enough context provided that it's easy to search for more information. Poking around with Google for a bit, here's a reasonable sample: http://webdav.org/goliath/dav_on_x/apache.conf IMHO the best compromise in machine and human readability is an XML format. If the primary PostgreSQL configuration file becomes XML I will quit working with the project. I'm not kidding. If you think XML is easy to generate, edit by hand, and use revision control on, we are at such a fundamental disagreement that I wouldn't even try and directly argue with you. Instead I'll quote Eric Raymond: "The most serious problem with XML is that it doesn't play well with traditional Unix tools. Software that wants to read an XML format needs an XML parser; this means bulky, complicated programs." http://www.catb.org/esr/writings/taoup/html/ch05s02.html#id2907018 Let me suggest the following requirement instead which naturally rules it out: it should be possible for a DBA-level coder to write a simple shell script that does something useful with the configuration file in order for having a text-based configuration to be useful in this context. To give a simple example, I can write a single-line [sed|awk|perl] command that will let me update the value for one parameter in the current postgresql.conf file.
When you can give me a one-liner that does that on an XML file in any shell language in that class, then we might have something to talk about. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
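The kind of single-parameter update mentioned above is a one-regex job on a flat `name = value` file. The message refers to sed, awk, or perl; the sketch below shows the same idea in Python under simplifying assumptions: one setting per line, an optional leading `#` for a commented-out default, and no `#` characters inside quoted values. `set_parameter` is a name invented for this example.

```python
import re


def set_parameter(conf_text: str, name: str, value: str) -> str:
    """Replace the first line that sets `name`, keeping any trailing comment."""
    pattern = re.compile(
        rf"^[ \t]*#?[ \t]*{re.escape(name)}[ \t]*=[ \t]*[^#\n]*?([ \t]*#.*)?$",
        re.MULTILINE,
    )

    def replace(match):
        trailing = match.group(1) or ""
        return f"{name} = {value}{trailing}"

    new_text, count = pattern.subn(replace, conf_text, count=1)
    if count == 0:
        raise KeyError(f"{name} not found")
    return new_text


sample = (
    "max_connections = 100\n"
    "shared_buffers = 32MB  # min 128kB\n"
    "work_mem = 1MB\n"
)
updated = set_parameter(sample, "shared_buffers", "1024MB")
```

Only the matching line changes; every other line, including comments, passes through untouched, which is what makes the flat format friendly to diffs and revision control.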
Re: [HACKERS] Overhauling GUCS
On Wed, 4 Jun 2008, Tom Lane wrote: The real problem we need to solve is how to allow newbies to have the system auto-configured to something that more or less solves their problems. Putting the config settings in XML does not accomplish that, and neither does putting them inside the database. The subtle issue here is that what makes sense for the database configuration changes over time; there's not just one initial generation and you're done. postgresql.conf files can end up moving from one machine to another for example. I think something that doesn't recognize that reality and move toward a tune-up capability as well as initial generation wouldn't be as useful, and that's where putting the settings inside the database helps so much. Also, there's a certain elegance to having an optimization tool that works against either a new installation or an existing one. I personally have zero interest in a one-shot config generator. It just doesn't solve the problems I see in the field. Performance starts out just fine even with the default settings when people first start, and then goes to hell after the system has been running for a while (and possibly moved to another machine). By that point nobody wants to mess with their configuration file unless it's one simple change at a time. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] rfc: add pg_dump options to dump output
On Tue, 3 Jun 2008, Tom Lane wrote: Well, the stuff included into the dump by pg_dump -v is informative, too. But we stopped doing that by default because of complaints. I remain unconvinced that this proposal won't suffer the same fate. I think it would be reasonable to only include the list of options used in the dump if you use one that changes what appears in the dump. That way, you wouldn't see anything by default. But if you make a modification that will likely break a diff with an existing dump done with the default parameters, the option change that introduced that should show at the very beginning. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
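A minimal sketch of the rule suggested above: print the option list only when an option that alters the dump's contents was used. The switches in the set are existing pg_dump options chosen as examples, not a complete list, and the header text itself is a made-up format, not something pg_dump emits.

```python
# Example pg_dump switches that change what appears in the dump.
# Illustrative subset only.
CONTENT_CHANGING = {"--schema-only", "--data-only", "--no-owner", "--no-privileges"}


def options_header(argv):
    """Return a header comment line, or '' if only default output is produced."""
    used = [arg for arg in argv if arg in CONTENT_CHANGING]
    if not used:
        return ""
    # Hypothetical header format for this sketch.
    return "-- pg_dump options: " + " ".join(used) + "\n"


default_header = options_header(["--file=out.sql", "mydb"])
changed_header = options_header(["--schema-only", "--no-owner", "mydb"])
```

A default dump stays byte-for-byte free of the extra line, so diffs against older default dumps are unaffected.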
Re: [HACKERS] Overhauling GUCS
On Wed, 4 Jun 2008, Andrew Dunstan wrote: Tom Lane wrote: * Can we build a configuration wizard to tell newbies what settings they need to tweak? That would trump all the other suggestions conclusively. Anyone good at expert systems? Sigh. I guess we need to start over again. Last year around this time, there was one of the recurring retreads of this topic named "PostgreSQL Configuration Tool for Dummies": http://archives.postgresql.org/pgsql-performance/2007-06/msg00386.php Josh Berkus pointed out that he already had the expert system part of this problem solved pretty well with a spreadsheet: http://pgfoundry.org/docman/view.php/1000106/84/calcfactors.sxc (that's in the OpenOffice Calc format if you don't know the extension) That particular spreadsheet has more useful tuning suggestions in this area than 99.9% of PostgreSQL users have or will ever know. You can nitpick the exact recommendations, but the actual logic and thinking involved is pretty well solved. It could use a touch of tweaking and modernization but it's not too far off from being as good as you're likely to get at making guesses without asking the user too many questions. There is one ugly technical issue, that you can't increase shared_buffers usefully in many situations because of SHMMAX restrictions, and that issue will haunt any attempt to be completely automatic. Where Josh got hung up, where I got hung up, where Lance Campbell stopped with his Dummies tool, and what some unknown number of other people have been thwarted by, is that taking that knowledge and turning it into a tool useful to users is surprisingly difficult. The reason for that is the current postgresql.conf file and how it maps internally to GUC information isn't particularly well suited to automated generation, analysis, or updates. I think Josh got lost somewhere in the "parsing the file" stage.
The parts I personally got stuck on were distinguishing user-added comments from ones the system put in, plus being completely dissatisfied with how lossy the internal GUC process was (I would like a lot more information out of pg_settings than is currently there). Lance's helper tool was hobbled by the limitations of being a simple web application. That's the background to Josh's proposal. It has about an 80% overlap with what I was working on suggesting, which is why I jumped on his bandwagon so fast. The outline at http://wiki.postgresql.org/wiki/GUCS_Overhaul includes the superset of our respective thinking on the first step here toward straightening out this mess, further expanded with observations made in this thread. I would respectfully point out that comments about the actual tuning itself have no bearing whatsoever on this proposal. This is trying to nail down all the features needed to support both doing an initial generation and subsequent incremental improvements to the postgresql.conf file, while also reducing some redundancy in the code itself. Reducing the scope to only handling initial generation would make this a smaller task. But it turns out that the work required to cut down on the redundancy and the work required to better support incremental updates are almost the same. Josh's stated agenda is to get this right in one swoop, with only one version worth of disruption to the format, and that goal is served better IMHO as well by addressing all these changes as one batch. I will attempt to resist further outbursts about non-productive comments here, and each time I am tempted, work on prototyping the code I think this really needs instead. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Overhauling GUCS
On Wed, 4 Jun 2008, Aidan Van Dyk wrote: * Are backends always writing out dirty buffers because there are no free ones? This might mean tweaking settings affecting bgwriter. What you mean on the first one is "are backends always writing out dirty buffers because there are no *clean* ones"; the server operates with no *free* buffers as standard operations. Figuring that out is now easy in 8.3 with the pg_stat_bgwriter view. * Are the evicted buffers ones with really high usage counts? This might mean an increase in shared buffers would help? Evicted buffers must have a 0 usage count. The correct question to ask is "are buffers never getting high usage counts because they keep getting evicted too fast?" You can look at that in 8.3 using pg_buffercache; I've got suggested queries as part of my buffer cache presentation at http://www.westnet.com/~gsmith/content/postgresql/ * Are we always spilling small amounts of data to disk for sorting? A small work_mem increase might help... I was just talking to someone today about building a monitoring tool for this. Not having a clear way to recommend people monitor use of work_mem and its brother, spilled-to-disk sorts, is an issue right now; I'll whack that one myself if someone doesn't beat me to it before I get time. * Are all our reads from disk really quick? This probably means OS pagecache has our whole DB, and means random_page_cost could be tweaked? This is hard to do with low overhead in an OS-independent way. The best solution available now would use dtrace to try and nail it down. There's movement in this area (systemtap for Linux, recent discussion at the PGCon Developer Meeting of possibly needing more platform-specific code) but it's not quite there yet. So everything you mentioned is either recently added/documented or being actively worked on somewhere, and the first two were things I worked on myself after noticing they were missing.
Believe me, I feel the items that still aren't there, but they're moving along at their own pace. There's already more tuning knowledge available than tools to help apply that knowledge to other people's systems, which is why I think a diversion to focus just on that part is so necessary. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
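For the first question, a tool could start from the three write counters that the 8.3 pg_stat_bgwriter view exposes (`buffers_checkpoint`, `buffers_clean`, `buffers_backend`). The sketch below only does the arithmetic on a row that has already been fetched; the numbers are invented and no particular threshold for "too many backend writes" is implied.

```python
def write_shares(row):
    """Fraction of buffer writes done by checkpoints, the bgwriter, and backends.

    `row` is a dict holding the pg_stat_bgwriter counters named below.
    """
    keys = ("buffers_checkpoint", "buffers_clean", "buffers_backend")
    total = sum(row[k] for k in keys)
    if total == 0:
        return {k: 0.0 for k in keys}
    return {k: row[k] / total for k in keys}


# Made-up counter values for illustration.
sample_row = {
    "buffers_checkpoint": 6000,
    "buffers_clean": 1000,
    "buffers_backend": 3000,
}
shares = write_shares(sample_row)
```

A high `buffers_backend` share is the signal that backends are frequently finding no clean buffer and writing dirty ones out themselves.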
Re: [HACKERS] Overhauling GUCS
, so that it's easy to figure out what people changed. That is sometimes handy to include as part of this sort of analysis, and it's necessary to provide improvements like a strip the unnecessary junk out of this file that many people would like from this sort of tool. When you show people that you recommend increasing a value to something larger, any comments about that setting will be shown and they'll know not to follow the tool's advice if there's a history there. This seems like such a better place to be that I'd rather drive toward the server-side changes necessary to support it rather than fight the difficult tool creation problems. That's why the focus on a new API for 'writing my config' for me; that particular goal is just one part of a set of revisions that streamline the tool creation process in a not necessarily obvious way. Unless, of course, you've tried to write a full-circle config tuning tool, in which case most of the proposed changes in this overhaul jump right out at you. [1] In the shared_buffers case, it may be possible to just recommend a value without caring one bit what the current one is. But for work_mem, you really need to actually understand the value if you want any real intelligence that combines that information with the maximum connections, so that you can compute how much memory is left over for things like effective_cache_size. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
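The arithmetic hinted at in footnote [1] might look like the sketch below. It assumes each connection runs at most one work_mem-sized sort at a time, which understates real usage because a single query can use work_mem more than once. The function names, the 4GB machine, and the 100 connections are assumptions for illustration; the 1024MB and 16MB values reuse figures that appear elsewhere in the thread.

```python
UNITS = {"kB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def to_bytes(setting: str) -> int:
    """Parse values such as '16MB' or '8kB' into bytes."""
    for suffix, factor in UNITS.items():
        if setting.endswith(suffix):
            return int(setting[: -len(suffix)]) * factor
    raise ValueError(f"no recognized unit in {setting!r}")


def memory_left_for_cache(total_ram, shared_buffers, work_mem, max_connections):
    """Rough bytes remaining for the OS cache after the server's own use."""
    sort_budget = to_bytes(work_mem) * max_connections
    left = to_bytes(total_ram) - to_bytes(shared_buffers) - sort_budget
    return max(left, 0)


left = memory_left_for_cache("4GB", "1024MB", "16MB", 100)
left_mb = left // UNITS["MB"]
```

Here 4096MB minus 1024MB minus 100 x 16MB leaves 1472MB, which is the kind of figure a tool would feed into an effective_cache_size suggestion. The point of the footnote is that this only works if the tool can actually parse the current work_mem value.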
Re: [HACKERS] Overhauling GUCS
On Thu, 5 Jun 2008, Heikki Linnakangas wrote: A configuration wizard would be nice, but it would be a good start to add a section to the manual on how to do the basic tuning. AFAICS we don't have one. Clear instructions on how to set the few most important settings like shared_buffers and checkpoint_timeout/segments would probably be enough, with a link to the main configuration section that explains the rest of the settings. It hasn't gelled yet but I'm working on that. Most of the text needed is now linked to at http://wiki.postgresql.org/wiki/Performance_Optimization I already talked with Chris Browne about merging his document I put first in that list with useful pieces from some of mine into one more comprehensive document on the Wiki, covering everything you mention here. If we took a snapshot of that when it's done and dumped that into the manual, I don't think that would be a problem to wrap up before 8.4 is done. I'd like to include a link to the above performance page in that section of the manual as well, both so that people are more likely to find fresh content as well as to give them pointers toward more resources than the manual can possibly cover. If people don't read the manual, we can add a link to it from postgresql.conf.sample, add a screen to the Windows installer suggesting to read it, or even open postgresql.conf in Notepad. They don't. Putting pointers toward a relatively simple performance tuning document a bit more in people's faces might help lower some of the criticism the project takes over providing low defaults for so many things. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Overhauling GUCS
On Thu, 5 Jun 2008, Magnus Hagander wrote: We really need a proper API for it, and the stuff in pgAdmin isn't even enough to base one on. I would be curious to hear your opinion on whether the GUC overhaul discussed in this thread is a useful precursor to building such a proper API. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Overhauling GUCS
On Thu, 5 Jun 2008, Alvaro Herrera wrote: I must say that I am confused by this thread. What's the discussed GUC overhaul? http://wiki.postgresql.org/wiki/GUCS_Overhaul I drop that URL in every other message in hopes that people might start commenting on it directly if they see it enough; the fact that you're confused says I may need to keep that up :( (1) Add a lot more comments to each setting (2) Add documentation links to each setting (3) Move more frequently used settings to the top of the file (4) Ship different sample config files (5) Create an expert system to suggest tuning (6) Other random ideas (XML, settings in database, others?) To me, there are two ideas that are doable right now, which are (2) and (4). (1) seems to be a step backwards in pg_hba.conf experience, and we would have to maintain duplicate documentation. (3) seems messy. (5) is a lot of work; do we have volunteers? As for (6), the two examples I give can be easily dismissed. (2) and (4) do not seem necessary to get the config API built. (1) is in that proposal but is strictly optional as something to put in the configuration file itself. The idea behind (2) is to enable tool authors to have an easier way to suggest where to head for more information. I'd like for it to be trivial for a tool to say "Suggested value for x is y; see http://www.postgresql.org/docs/8.3/interactive/runtime-config-resource.html for more information". I know what most of the settings I tinker with do, but even I'd like it to be easier to find the right spot in the manual; for newbies it's vital. You are correct that (2) isn't strictly necessary here, but it's valuable and will be easier to wrap into this than to bolt on later. (3), (4), (5), and (6) were off-topic diversions. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Overhauling GUCS
On Thu, 5 Jun 2008, Aidan Van Dyk wrote: People like me don't want to have postgresql.conf be *only* a machine-generated file, which I am not allowed to edit anymore because next DBA doing a SET PERSISTENT type of command is going to cause postgres to write out something else, over-writing my carefully documented reason for some particular setting. This is why there's the emphasis on preserving comments as they pass into the GUC structure and back to an output file. This is one of the implementation details I haven't fully made up my mind on: how to clearly label user comments in the postgresql.conf to distinguish them from verbose ones added to the file. I have no intention of letting manual user edits go away; what I'm trying to do here (and this part is much more me than Josh) is make them more uniform such that they can co-exist with machine edits without either stomping on the other. Right now doing that is difficult, because it's impossible to tell the default comments from the ones the users added and the current comment structure bleeds onto the same lines as the settings. But the big issue I have (not that it really matters, because I'm not one of the ones working on it, so please don't take this as me telling anyone what they can or can't do) is that that goal doesn't solve any of the listed problems stated in the proposal 1. Most people have no idea how to set these. Making it much easier to build recommendation tools is how this helps here. 2. The current postgresql.conf file is a huge mess of 194 options, the vast majority of which most users will never touch. The proposed pg_generate_conf tool includes options to spit out a basic configuration file instead of the complete one. 3. GUCS lists are kept in 3 different places (guc.c, postgresql.conf, and settings.sgml), which are only synched with each other manually. The proposal throws away having a separate postgresql.conf file, so that reduces it from 3 places to 2.
That's moving in the right direction. 4. We don't seem to be getting any closer to autotuning. If you try to build a tuning tool, these areas end up being the unnecessarily hard parts. Thanks for the comments on the proposal. I'm only bothering to respond to messages like yours now, and am deleting all of the continuing attempts to divert the discussion over to parameter tuning details or expanding the scope here. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Overhauling GUCS
On Thu, 5 Jun 2008, Alvaro Herrera wrote: FWIW smb.conf uses ; for one purpose and # for the other. They're actually combining the way UNIX files use # with how Windows INI files use ; in a config file context, which I personally find a little weird. I was already considering keeping user comments as # while making all system-inserted ones #! ; many people are already used to #! having a special system-related meaning from its use in UNIX shell scripting which makes it easier to remember. I think the next step to this whole plan is to generate a next-gen postgresql.conf mock-up showing what each of the outputs from the pg_generate_conf tool might look like to get feedback on that; it will make what is planned here a bit easier to understand as well. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
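If the `#` versus `#!` convention floated above were adopted, telling the two kinds of comments apart becomes a prefix check. The sketch below is only an illustration of that idea; `classify` and `strip_system_comments` are invented names and the sample file content is made up.

```python
def classify(line: str) -> str:
    """Label a config line as blank, system comment, user comment, or setting."""
    stripped = line.strip()
    if not stripped:
        return "blank"
    if stripped.startswith("#!"):
        return "system_comment"  # inserted by a tool such as pg_generate_conf
    if stripped.startswith("#"):
        return "user_comment"    # written by a person, must be preserved
    return "setting"


def strip_system_comments(conf_text: str) -> str:
    """Drop generated comments while keeping user comments and settings."""
    kept = [ln for ln in conf_text.splitlines() if classify(ln) != "system_comment"]
    return "\n".join(kept)


sample = "\n".join([
    "#! Memory used for shared buffers",
    "# raised after the load test",
    "shared_buffers = 1024MB",
])
kinds = [classify(ln) for ln in sample.splitlines()]
terse = strip_system_comments(sample)
```

A tool can then regenerate or remove its own `#!` lines freely, since anything starting with a plain `#` is known to belong to the user.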
Re: [HACKERS] Overhauling GUCS
On Fri, 6 Jun 2008, Peter Eisentraut wrote:

> - What settings do newbies (or anyone else) typically need to change? Please post a list.
> - What values would you set those settings to? Please provide a description for arriving at a value, which can later be transformed into code. Note that in some cases, not even the documentation provides more than handwaving help.

Josh's spreadsheet at http://pgfoundry.org/docman/view.php/1000106/84/calcfactors.sxc provides five different models for setting the most critical parameters based on different types of workloads. Everyone can quibble over the fine tuning, but having a good starter set of reasonable settings for these parameters is a solved problem. It's just painful to build a tool to apply the available expert knowledge that is already around.

> - If we know better values, why don't we set them by default?

Because there's not enough information available; the large difference in how you tune for different workloads is one example. Another is that people tune for peak and projected activity rather than just what's happening right now. Every model suggested for a tuning wizard recognizes you need to ask some set of questions to nail things down.

I continue to repeat in broken-record style: exactly what a tuning tool will ask about and what settings it will suggest is not important, and getting into that is an entirely different discussion (one that gets hashed out every single day on pgsql-performance). The fact that writing such a tool is harder than it should be is the issue here.

> Another orthogonal stumbling block on the way to making all of this automatic is that the surely critical shared_buffers setting will in any useful configuration require messing around with kernel settings that no PostgreSQL tool can really help with.

Yes. So? All you can do is point this out to users.
--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Overhauling GUCS
On Fri, 6 Jun 2008, Heikki Linnakangas wrote:

> Or perhaps we should explicitly mark the settings the tool has generated, and comment out:
>
>     #shared_buffers = 32MB    # commented out by wizard on 2008-06-05
>     shared_buffers = 1024MB   # automatically set by wizard on 2008-06-05

What I would like to do is make the tool spit out a revision history in the same way I find all big IT shops handling this already: by putting a revision history style commentary above the current setting. Here's a sample:

    # 2008-03-02 : 32MB : postgres : Database default
    # 2008-05-02 : 512MB : pg_autotune : Wizard update
    # 2008-05-15 : 1024MB : gsmith : Increased after benchmark tests
    shared_buffers = 1024MB

If the first tuning tool that comes into existence used this format, and the format was reasonable, I think it would be possible to get people making manual edits to adopt it as well.

The exact details of how this should look are off-topic for the main discussion here, though, so I'd prefer if this whole line of discussion died off. Anyone who wants to comment on this whole area, feel free to contact me off-list or edit the Wiki page (which has a section on this topic now) to hash out suggestions in this area; I'm trying to keep this thread somewhat focused now.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
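To make the revision-history format from the message above concrete, here is a minimal Python sketch of a tool-side update. It assumes the "date : value : who : note" layout from the sample; the function name `set_with_history` and its arguments are hypothetical, and no real tool is being described.

```python
from datetime import date


def set_with_history(conf_lines, name, value, who, note, when=None):
    """Return new lines with `name` set to `value` and one history
    comment added directly above the setting.

    Simplification: an inline comment after the old value is dropped.
    """
    when = when or date.today().isoformat()
    entry = f"# {when} : {value} : {who} : {note}"
    out, found = [], False
    for line in conf_lines:
        stripped = line.strip()
        key = stripped.split("=", 1)[0].strip() if "=" in stripped else None
        if not stripped.startswith("#") and key == name:
            out.append(entry)                  # newest history line goes last
            out.append(f"{name} = {value}")    # then the active setting
            found = True
        else:
            out.append(line)
    if not found:
        out.extend([entry, f"{name} = {value}"])
    return out


before = [
    "# 2008-03-02 : 32MB : postgres : Database default",
    "# 2008-05-02 : 512MB : pg_autotune : Wizard update",
    "shared_buffers = 512MB",
]
after = set_with_history(before, "shared_buffers", "1024MB", "gsmith",
                         "Increased after benchmark tests", when="2008-05-15")
print("\n".join(after))
```

Because the history lives in ordinary comment lines, someone editing by hand can follow the same pattern without any tool at all, which is the adoption path the message hopes for.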
Re: [HACKERS] Overhauling GUCS
On Fri, 6 Jun 2008, Tom Lane wrote:

> I grow weary of this thread.

If we keep it up for, oh, another three years, then maybe you'll be as weary as I am of struggling with problems in this area. Striking a balance between the wants and needs of people who want a fancy GUI tool for configuring database settings and those who want to edit things manually is a difficult problem that is not going away. If this didn't keep coming back to haunt me all the time I'd like to forget about it myself.

> I will say it once more: I do not believe for one instant that the current formatting of postgresql.conf is the major impediment, or even a noticeable impediment, to producing a useful configuration wizard.

Arguments about formatting changes to postgresql.conf are a tangent to the central questions here, and having just closed some open comments on that, I am with you on ignoring those as off-topic, the same way I keep minimizing "what are the parameters to tune?" comments. Here are the relevant questions, around since the first message, that are not attracting discussion:

1) Is it worthwhile to expand the information stored in the GUC structure to make it better capable of supporting machine generation and to provide more information for tool authors via pg_settings? The exact fields that should or shouldn't be included remain controversial; consider default value, per-session/runtime/restart, and enum lists as the list of things that are most needed there.

2) Should the sample postgresql.conf file be replaced by a program that generates it using that beefed up structure instead, therefore removing one file that has to be manually kept in sync with the rest of the code base right now?

3) What now makes sense for a way to update database parameters for users whose primary (or only in some cases) access to the server is over the database port, given the other changes have improved automatic config file generation?
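Question 2 can be pictured with a small Python sketch that renders a sample file from a table of settings. Everything here is an assumption made for illustration: the `SETTINGS` table is a hypothetical stand-in for the expanded GUC structure, the field names and descriptions are invented, and `generate_conf` is not the proposed pg_generate_conf tool.

```python
# Hypothetical stand-in for the richer GUC structure. Field names, values
# and descriptions are illustrative, not the server's actual table.
SETTINGS = [
    {"name": "shared_buffers", "default": "32MB", "context": "postmaster",
     "basic": True, "desc": "Memory used for the shared buffer cache"},
    {"name": "work_mem", "default": "1MB", "context": "user",
     "basic": True, "desc": "Memory for each sort or hash operation"},
    {"name": "geqo_threshold", "default": "12", "context": "user",
     "basic": False, "desc": "Number of FROM items at which GEQO is used"},
]


def generate_conf(settings, basic_only=False):
    """Render a sample config as text, one commented-out entry per setting."""
    lines = []
    for s in settings:
        if basic_only and not s["basic"]:
            continue   # the short file leaves out rarely touched settings
        lines.append(f"# {s['desc']} (context: {s['context']})")
        lines.append(f"#{s['name']} = {s['default']}")
        lines.append("")
    return "\n".join(lines)


print(generate_conf(SETTINGS, basic_only=True))
```

The point of the sketch is only that the file becomes an output of the settings table, so there is one less copy to keep in sync by hand; the same table could also produce the complete file by leaving `basic_only` off.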
> If you wish to prove otherwise, provide a complete wizard except for the parts that touch the config file, and I will promise to finish it.

You do realize that if I provided you with such a sample, the "not implemented yet" config API stubs it needs to work would be exactly what is suggested in the proposal page, right? I (and Josh) didn't just make them all up out of nowhere, you know. I wrote a message here already about the seemingly inevitable path the budding wizard tool hacker follows and why that leads into some of the changes suggested.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Overhauling GUCS
On Fri, 6 Jun 2008, Gregory Stark wrote:

> Greg Smith [EMAIL PROTECTED] writes:
>> 1) Is it worthwhile to expand the information stored in the GUC structure to make it better capable of supporting machine generation and to provide more information for tool authors via pg_settings? The exact fields that should or shouldn't be included remains controversial; consider default value, per-session/runtime/restart, and enum lists as the list of things that are most needed there.
>
> Isn't that a list of what's *already* there?

I should have been clearer there. Some of the items suggested are already in the structure, but aren't visible via pg_settings. In those cases it's just exporting information that's already there. In others (like the suggestion to add a URL to the documentation) it is actually a new field being added as well as its corresponding entry in the settings view.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Overhauling GUCS
On Fri, 6 Jun 2008, Tom Lane wrote:

> Well, you can't see the default or reset values in pg_settings, only the current value. However, I fail to see the use of either of those for a configure wizard.

I'm under the impression that the primary reason to put the default in there is to make it easier for a file generator program to be decoupled a bit from the internal representation. Regardless, these values should be exposed for tool writers. If you build a prototype interface for an interactive settings changing tool, you quickly discover that showing the default, range, and recommended setting are all valuable things people would like to see when deciding what to change a setting to. And there's no reason accumulating all that info should be the responsibility of a tool writer when it's easy to expose and keep up to date inside the database itself.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Mon, 9 Jun 2008, Tom Lane wrote:

> It should also be pointed out that the whole thing becomes uninteresting if we get real-time log shipping implemented. So I see absolutely no point in spending time integrating pg_clearxlogtail now.

There are remote replication scenarios over a WAN (mainly aimed at disaster recovery) that want to keep a fairly updated database without putting too much traffic over the link. People in that category really want zeroed tail+compressed archives, but probably not the extra overhead that comes with shipping smaller packets in a real-time implementation.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
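Why a zeroed tail matters for compressed archives can be shown with a toy Python experiment. This is an illustration under loud assumptions, not a model of pg_clearxlogtail or of real WAL contents: the "segment" is scaled down to 1 MiB, only its first tenth is treated as new data, and random bytes stand in for data that compresses poorly.

```python
import os
import zlib

SEGMENT = 1024 * 1024      # scaled-down stand-in for a WAL segment
USED = SEGMENT // 10       # pretend only the first 10% holds new records

# Random bytes stand in for data that compresses poorly.
live = os.urandom(USED)
stale_tail = os.urandom(SEGMENT - USED)   # leftovers in a reused file
zero_tail = bytes(SEGMENT - USED)         # tail cleared before archiving

as_is = len(zlib.compress(live + stale_tail))
zeroed = len(zlib.compress(live + zero_tail))

print(f"compressed as-is:        {as_is} bytes")
print(f"compressed, zeroed tail: {zeroed} bytes")
```

In this toy setup the zeroed version shrinks to roughly the size of the used portion, while the untouched one barely compresses at all. Real segments will behave less dramatically, but the direction of the effect is what makes the approach attractive for a slow WAN link.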
Re: [HACKERS] How to Sponsor a Feature
On Wed, 11 Jun 2008, Andrew Dunstan wrote:

> If we want to help people to sponsor features, then I think we need to deal with subjects like finding someone to undertake the development, the sponsor's relationship with the developer, methods and times of payment, etc.

The bit on the wiki is helpful for developers trying to get a new feature implemented but I think that's where its scope ends. There seems to be the occasional person wandering by here that it really doesn't help, though. Periodically you'll see "I want feature $X in PostgreSQL. I'm willing to help fund it. What do I do?" In most of those that have wandered by recently, $X is a known feature any number of other people want. Good sample cases here are recent requests to help fund or implement materialized views, supporting queries on read-only slaves, and SQL window support. I don't think these people need guidance on how to manage the project; they need some sort of way to feel comfortable saying "will pledge $Y for feature $X" in a way that makes sense on both sides.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Overhauling GUCS
On Wed, 11 Jun 2008, Tom Lane wrote:

> Who said anything about loops? What I am talking about is what happens during
>
>     set memory_usage = X;  // implicitly sets work_mem = X/100, say
>     set work_mem = Y;
>     set memory_usage = Z;
>
> What is work_mem now, and what's your excuse for saying so, and how will you document the behavior so that users can understand it? (Just to make things interesting, assume that some of the above SETs happen via changing postgresql.conf rather than directly.)

People are already exposed to issues in this area via things like the include file mechanism. You can think of that two ways. You can say, "there's already problems like this so who cares if there's another one." Or, you can say "let's not add even more confusion like that."

Having a mini programming language for setting parameters is interesting and all, and it might be enough to do a good job of handling the basic newbie setup chores. But I don't think it's a complete solution and therefore I find moving in that direction a bit of a distraction; your concerns about ambiguity just amplify that feeling. It's unlikely that will get powerful enough to enable the one true config file that just works for everybody. There's too many things that depend a bit on both data access pattern and on overall database size/structure no matter what you do.

[If only there were some technology that did workload profiling and set the server parameters based on that. Some sort of dynamic tuning tool; wouldn't that be great? Oh well, that's just a dream right now I guess.]

I'm not sure if I've stated this explicitly yet, but I personally have no interest in just solving the newbie problem. I want a tool to help out tuning medium to large installs, and generating a simple config file is absolutely something that should come out of that as a bonus. Anything that just targets the simple installs, though, I'm not very motivated to chase after.
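The ambiguity in the three SET commands quoted at the top of this message can be made concrete with a small Python model. Both policies below are invented for illustration; memory_usage is the hypothetical parameter from the quoted example, and neither function describes how the server behaves.

```python
def apply_last_write_wins(commands):
    """Every SET memory_usage re-derives work_mem, overwriting manual values."""
    state = {}
    for name, value in commands:
        state[name] = value
        if name == "memory_usage":
            state["work_mem"] = value // 100
    return state


def apply_explicit_sticks(commands):
    """A manually SET work_mem survives later changes to memory_usage."""
    state, explicit = {}, set()
    for name, value in commands:
        state[name] = value
        if name == "memory_usage":
            if "work_mem" not in explicit:
                state["work_mem"] = value // 100
        else:
            explicit.add(name)
    return state


# X = 1000, Y = 50, Z = 2000 (arbitrary numbers, units ignored)
commands = [("memory_usage", 1000), ("work_mem", 50), ("memory_usage", 2000)]
print(apply_last_write_wins(commands)["work_mem"])   # prints 20
print(apply_explicit_sticks(commands)["work_mem"])   # prints 50
```

Both answers are defensible, which is the documentation problem the quoted question points at, and the model gets murkier still once some of the commands come from a config file reload rather than a session.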
--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Overhauling GUCS
On Thu, 12 Jun 2008, Bruce Momjian wrote:

> I am thinking a web-based wizard would make the most sense.

I have not a single customer I work with who could use an external web-based wizard. Way too many companies have privacy policy restrictions that nobody dare cross by giving out any info about their server, or sometimes that they're even using PostgreSQL inside the firewall. If it's not a tool that you can run on the same server you're running PostgreSQL on, I'd consider that another diversion that's not worth pursuing.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] How to Sponsor a Feature
On Thu, 12 Jun 2008, Alvaro Herrera wrote:

> Incidentally, we have minutes from the meeting. Is it OK to publish them openly?

There's a set of minutes already up at http://wiki.postgresql.org/wiki/PgCon_2008_Developer_Meeting

> There was no solution proposed to the escrow problem, nor to allow sponsoring of one feature by multiple independent individuals.

Pity, as those are the main things I get asked about. I've been thinking about this a fair amount recently, and it is difficult to figure out how SPI can handle this in a reasonable way. It almost has to keep a hands-off approach, but the central organizers here are where people would think they should come for advice in this area.

The best approach I've thought of is to have something like http://www.postgresql.org/support/professional_support that is instead a catalog of companies and/or associated worker bees who have successfully had submissions committed. Then the only interaction SPI/Core would have is to confirm that the claims people were making about what patches they were involved in were factual, which should be easy enough to verify just with the release notes, while disclaiming any interaction in contracting with said companies/individuals. This implements a meritocracy suggesting who people might work with by noting what areas they've worked in successfully before.

For example, the last time I fielded one of these, the person I was advising wanted some PITR work done. I of course pointed them toward 2ndquadrant because everything they asked about was in code Simon wrote in the first place, and some pointers over to the release notes were sufficient to prove that was true.

As for a format, I was thinking the directory would be organized like this:

    Company
      Person A
        8.3 features involved in
        8.2 features
      Person B
        8.2 features
      ...
      Current/future projects
        8.4 add feature
        Eventually add feature

Nothing new, really; I'm just suggesting an alternate view on the data that's available if you know how to look for it, structured in a way that would make it easier for potential sponsors to navigate.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] How to Sponsor a Feature
On Mon, 16 Jun 2008, Peter Eisentraut wrote:

> Joshua D. Drake wrote:
>> In reality though, what should happen is we should have a list of companies and consultants that are willing to be paid to implement features, todos and bug fixes.
>
> I think the professional support company listing is already that list.

It is a much larger superset of that list. There's a lot of entries there that provide support in various ways, but not core code customizations. You cannot expect anyone not already involved in the community to have any idea which of those companies have any track record of getting new features implemented. Maybe all that's needed is to extend the "provides" section there with a tag for those who are willing to take that sort of work on.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS]
On Wed, 25 Jun 2008, yuan fang wrote:

> i am studying the source code of postgresql and want to become a developer of it. What should i do?

1) If you send e-mail to pgsql-hackers, include a useful subject

2) Read the intros at http://www.postgresql.org/developer/

3) For browsing the code itself, I like http://doxygen.postgresql.org/

4) Notes on how to deal with version control issues, patch submission, and to find out what development is going on currently are all at http://wiki.postgresql.org/wiki/Development_information

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Location for pgstat.stat
On Tue, 1 Jul 2008, Tom Lane wrote:

> Magnus Hagander [EMAIL PROTECTED] writes:
>> Tom Lane wrote:
>>> Hmm ... that would almost certainly result in the stats being lost over a system shutdown. How much do we care?
>>
>> Only for those who put it on a ramdrive. The default, unless you move/sync it off, would still be the same as it is today. While not perfect, the performance difference of going to a ramdrive might easily be enough to offset that in some cases, I think.
>
> Well, what I was wondering about is whether it'd be worth adding logic to copy the file to/from a safer location at startup/shutdown.

Anyone who needs fast stats storage enough that they're going to symlink it to RAM should be perfectly capable of scripting server startup/shutdown to shuffle that to/from a more permanent location. Compared to the admin chores you're likely to encounter before reaching that scale it's a pretty easy job, and it's not like losing that data file is a giant loss in any case.

The only thing I could see putting into the server code to help support this situation is rejecting an old stats file and starting from scratch instead if they restored a previous version after a crash that didn't save an updated copy.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
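The startup/shutdown shuffling mentioned in the message above might look something like the following Python sketch. The function names and paths are hypothetical, temporary directories stand in for the ramdisk and the permanent location, and hooking this into an actual init script is left out entirely.

```python
import os
import shutil
import tempfile


def restore_stats(saved_copy, ram_path):
    """Before server start: seed the RAM-backed file from the saved copy."""
    if os.path.exists(saved_copy):
        shutil.copy2(saved_copy, ram_path)


def save_stats(ram_path, saved_copy):
    """After server stop: copy the RAM-backed file to permanent storage."""
    if os.path.exists(ram_path):
        tmp = saved_copy + ".tmp"
        shutil.copy2(ram_path, tmp)
        os.replace(tmp, saved_copy)   # swap in the new copy in one step


# Demo with temporary directories standing in for the two locations.
ram_dir = tempfile.mkdtemp()     # stand-in for a ramdisk mount
disk_dir = tempfile.mkdtemp()    # stand-in for permanent storage
ram_file = os.path.join(ram_dir, "pgstat.stat")
disk_file = os.path.join(disk_dir, "pgstat.stat")

with open(ram_file, "wb") as f:
    f.write(b"stats written by the server")

save_stats(ram_file, disk_file)      # at shutdown
os.remove(ram_file)                  # ramdisk contents vanish on reboot
restore_stats(disk_file, ram_file)   # at next startup
with open(ram_file, "rb") as f:
    print(f.read())
```

After a crash there is no shutdown step, so the saved copy would be stale. That is the case the last paragraph of the message is about, and this sketch does nothing to detect it.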
Re: [HACKERS] posix advises ...
On Sat, 12 Jul 2008, Abhijit Menon-Sen wrote:

> At 2008-07-12 00:52:42 +0100, [EMAIL PROTECTED] wrote:
>> The later versions of mine had a GUC named effective_spindle_count which I think is nicely abstracted away from the implementation details.
>
> Yes, that does sound much better. (The patch I read had a preread_pages_bitmapscan variable instead.)

This patch does need a bit of general care in a couple of areas. The reviewing game plan I'm working through goes like this:

1) Update the original fadvise test program Greg Stark wrote to be a bit easier to use for testing general compatibility of this approach. I want to collect some data from at least two Linux and Solaris systems with different disk setups.

2) Check out effective_spindle_count and see if it looks like a reasonable way to tune this feature. If so, will probably need to merge that in to Zoltan's version of the patch. May need some other cleanup in that patch set as well--I'm not sure that closed XLOG patch that got pushed into here as well is really helpful for example.

3) Generate a sequential scan test program aimed to hobble the Linux kernel in the way Zoltan described as motivation for his work. I'm working with Jeff Davis this week to try and repurpose some of his synchronized scan test programs to handle this while we're both in the same place for a bit.

4) Generate a bitmap scan test program to check the original patch.

5) If the performance results look useful and consistent, then move toward cleaning up broader compatibility issues like the segfault concerns Zoltan mentioned.

Going to take a while to work through all that, but performance patches with platform-specific benefit are always painful like this.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
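For readers unfamiliar with the call under discussion, here is a rough Python sketch of the general posix_fadvise pattern: tell the kernel which blocks are wanted, then read them. It is not the patch and not the test program mentioned in step 1; the 8192-byte block size, the function name, and the demo file are all assumptions, and the `os` calls used are only available on Unix-like systems.

```python
import os
import tempfile

BLOCK = 8192  # assumed block size for this sketch


def read_blocks(path, block_numbers, advise=True):
    """Read the given blocks, optionally hinting the kernel first."""
    fd = os.open(path, os.O_RDONLY)
    try:
        if advise and hasattr(os, "posix_fadvise"):
            # Tell the kernel about every block we are about to need, so it
            # can start fetching them before the reads below ask for them.
            for b in block_numbers:
                os.posix_fadvise(fd, b * BLOCK, BLOCK, os.POSIX_FADV_WILLNEED)
        return [os.pread(fd, BLOCK, b * BLOCK) for b in block_numbers]
    finally:
        os.close(fd)


# Small demo file: block i is filled with the byte value i.
with tempfile.NamedTemporaryFile(delete=False) as f:
    for i in range(8):
        f.write(bytes([i]) * BLOCK)
    demo_path = f.name

blocks = read_blocks(demo_path, [6, 1, 4])
print([blk[0] for blk in blocks])
os.remove(demo_path)
```

On a file this small everything sits in cache and the hint changes nothing measurable. Any benefit would only show up with scattered reads against data that is not cached and storage that can service several requests at once, which is why the plan above calls for testing on systems with different disk setups.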
Re: [HACKERS] Sorting writes during checkpoint
On Mon, 7 Jul 2008, ITAGAKI Takahiro wrote:

> I plan to test it on RAID-5 disks, where sequential writes are much better than random writes. I'll send the results as evidence.

If you're running more tests here, please turn on log_checkpoints and collect the logs while the test is running. I'm really curious if there's any significant difference in what that reports here in the sorted case vs. the regular one.

> Smoothed checkpoint in 8.3 spreads write(), but calls fsync() at once. With sorted writes, we can call fsync() segment-by-segment for each write of dirty pages contained in the segment. It could improve worst response time during checkpoints.

Further decreasing the amount of data that is fsync'd at any point in time might be a bigger improvement than just the sorting itself is doing (so far I haven't seen anything really significant just from the sort but am still testing).

One thing I didn't see any comments from you on is how/if the sorted writes patch lowers worst-case latency. That's the area I'd hope an improved fsync protocol would help most with, rather than TPS, which might even go backwards because writes won't be as bunched and therefore will have more seeking. It's easy enough to analyze the data coming from pgbench -l to figure that out; example shell snippet that shows just the worst ones:

    pgbench -l -N db &
    p=$!
    wait $p
    mv pgbench_log.${p} pgbench.log
    cat pgbench.log | cut -f 3 -d " " | sort -n | tail

Actually graphing the latencies can be even more instructive; I have some examples of that on my web page you may have seen before.

> In addition, the current smgr layer is completely useless because it cannot be extended dynamically and cannot handle multiple md-layer modules. I would rather merge current smgr and part of bufmgr into a new smgr and add smgr_hook() than bulk_io_hook().

I don't really have a firm opinion here about the code to comment on this specific suggestion, but I will say that I've found the amount of layering in this area makes it difficult to understand just what's going on sometimes (especially when new to it). A lot of that abstraction felt a bit pass-through to me, and anything that would collapse that a bit would be helpful for streamlining the code instrumenting going on with things like dtrace.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
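The worst-case latency check from the shell pipeline in the message above can also be sketched in Python, which is a convenient starting point for the graphing it mentions. The sketch assumes, as that pipeline does, that the third space-separated field of each log line is the per-transaction latency; the sample lines are made up and only mimic the general shape of such a log.

```python
def worst_latencies(log_lines, count=10):
    """Return the `count` largest values from the third field, ascending."""
    latencies = []
    for line in log_lines:
        fields = line.split()
        if len(fields) >= 3:
            latencies.append(int(fields[2]))
    return sorted(latencies)[-count:]


# Made-up log lines; only the third column matters for this sketch.
sample_log = """\
0 1 1523 0 1215400000 100000
0 2 987 0 1215400000 200000
1 1 250311 0 1215400001 100000
1 2 1104 0 1215400001 200000
0 3 48770 0 1215400002 100000
""".splitlines()

print(worst_latencies(sample_log, count=2))
```

Comparing the tail of this list between a sorted-writes run and a regular run speaks to the worst-case question directly, where an average TPS figure would hide it.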
Re: [HACKERS] [DOCS] [ADMIN] shared_buffers and shmmax
On Thu, 24 Jul 2008, Greg Sabino Mullane wrote:

> Bite the bullet and start showing the buffer settings as a pure number of bytes everywhere, and get rid of the confusing '8kB' unit in pg_settings?

There are already some changes needed in this area to execute the full GUC cleanup/wizard plan that's being worked on. The pg_settings view really should show the value both as the user input it and as it's stored internally for cases like these, which lowers the confusion here a bit even without going so far as converting everything to bytes.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
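The confusion around the '8kB' unit comes down to a multiplication that a tool can do on the user's behalf. Below is a minimal Python sketch of showing both representations side by side; the rows are hypothetical examples in the shape of (name, setting, unit), and the helper names are made up for this illustration.

```python
UNIT_BYTES = {"kB": 1024, "8kB": 8 * 1024}


def to_bytes(setting, unit):
    """Convert a stored numeric setting plus its unit into bytes."""
    return int(setting) * UNIT_BYTES[unit]


def pretty(nbytes):
    """Render a byte count with the largest unit that divides it evenly."""
    for suffix, size in (("GB", 1024**3), ("MB", 1024**2), ("kB", 1024)):
        if nbytes >= size and nbytes % size == 0:
            return f"{nbytes // size}{suffix}"
    return f"{nbytes}B"


# Hypothetical rows: (name, setting, unit)
rows = [("shared_buffers", "4096", "8kB"), ("work_mem", "1024", "kB")]
for name, setting, unit in rows:
    n = to_bytes(setting, unit)
    print(f"{name}: stored as {setting} x {unit} = {n} bytes ({pretty(n)})")
```

Showing the stored number next to a human-readable form is the halfway option the message describes: nothing has to be converted to bytes internally for the display to stop being confusing.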