Re: [HACKERS] Confusion over Python drivers {license}
Jeff Davis wrote: Keep in mind that backwards compatibility is not the only issue here; forwards compatibility matters as well. A lot of the encoding issues I wrote up ( http://wiki.postgresql.org/wiki/Driver_development ) will probably be real bugs in a python3 application using a driver that doesn't understand encoding properly. Sure, but the benefit of being the de-facto standard for some problem, because you're already supported by everybody, is that it's easier to attract the most help to move forward too. The disclaimers on investments always read "past performance is not indicative of future results". I don't think that's really true in the open-source project case. Generally, I'd rather trust a project that's been keeping up with things for a long time already to continue to do so, particularly one that's built up a community around it already, rather than risk switching to one without a known good track record in that regard. (Exercise for the reader: consider how what I just said applies in both a positive and negative way toward MySQL and its associated development model) -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
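The encoding hazards Jeff mentions are easy to demonstrate without any driver at all. A minimal sketch (no real driver or connection involved, just simulated wire bytes) of what goes wrong when a driver decodes data without consulting the connection's client_encoding:

```python
# Hedged sketch: simulates a driver returning bytes from a UTF-8 database
# while decoding with a wrong default codec. No real driver involved.
wire_bytes = "naïve".encode("utf-8")  # what the server actually sent

wrong = wire_bytes.decode("latin-1")  # encoding-unaware driver: mojibake
right = wire_bytes.decode("utf-8")    # encoding-aware driver: correct text

print(wrong)  # 'naÃ¯ve' -- no exception raised, just corrupt data
print(right)  # 'naïve'
```

No exception is raised in the broken case, which is exactly what makes this class of bug dangerous in a python3 application: the corruption is silent until it reaches the user.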
Re: [HACKERS] Re: [COMMITTERS] pgsql: Remove old-style VACUUM FULL (which was known for a little while
Robert Haas wrote: Recording some bookkeeping information in pg_class so that pg_migrator can tell what's going on at a glance seems like the right approach, but I'm fuzzy on the details. November of 2008 was a pretty good month for me, so I enjoyed this flashback to it. That's when the path for how to handle space reservation wandered to this same place and then died there: how to know for sure what information to put into the catalog for the upgrade utility, before the upgrade utility exists. Message from Bruce at http://archives.postgresql.org/message-id/200901300457.n0u4v1707...@momjian.us and my follow-up summarized/linked to the highlights of the earlier discussion on that one. This time around, the way the upgrade is being staged allows a possible path through this dependency chain, as noted by Tom: It would probably be useful to have a utility that runs *in 9.0* and gets rid of MOVED bits, so that we could drop support for them in 9.1. But it's not happening for 9.0. As long as this one gets deprecated nicely here--so the server still knows how to deal with the ugly parts, but will not introduce any more of them--this should make for a good test case to gain experience with handling this whole class of problem. If the above exercise finishes with a clear "had we just recorded in the catalog before 9.0 came out we could have done this more easily", I think it would be much more likely that a future "we should record in the catalog to improve the odds of adding this feature in a way that can upgrade to it in-place" decision might get made correctly in advance of the upgrade utility actually existing. Right now, just like the 8.4 case, it seems quite possible no one will develop a plan in time they can prove will work well enough to justify adding speculative catalog support for it. Much easier to figure that out in retrospect though, after the matching utility that uses the data exists. 
If you think through the implications of that far enough, eventually you start to realize that you really can't even add a feature that requires an in-place upgrade hack to fix without first having the code that performs said hack done. Otherwise you're never completely sure that you put the right catalog pieces and related support code into the version you want to upgrade from. This is why it's not unheard of for commercial database products to require working in-place upgrade code to exist *before* the feature change gets committed. In this case, we get a lucky break in that it's easy to leave support for the old path in there and punt the problem for now. I hope that we all learn something useful about this class of issue during this opportunity to get away with that with little downside. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL
Heikki Linnakangas wrote: Simon Riggs wrote: Might it not be simpler to add a parameter onto pg_standby? We send %s to tell pg_standby the standby_mode of the server which is calling it so it can decide how to act in each case. That would work too, but it doesn't seem any simpler to me. On the contrary. I don't know that the ideas are mutually exclusive. Should the server internals do a basic check that the files they're about to process seem sane before processing them? Yes. Should we provide a full-featured program that knows how far back it can delete archives rather than expecting people to write their own? Yes. And a modified pg_standby seems the easiest way to do that, since it already knows how to do the computations, among other benefits. I already have a work in progress patch to pg_standby I'm racing to finish here that cleans up some of the UI warts in the program, including the scary sounding and avoidable warnings Selena submitted a patch to improve. I think I'm going to add this feature to it, too, whether or not it's deemed necessary. I'm really tired of example setups provided that say "just write a simple script using cp like " and then supporting those non-production quality messes in the field. My scripts for this sort of thing have more error checking in them than functional code. And pg_standby already has a healthy sense of paranoia built into it. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.us
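To illustrate the gap between a bare `cp` and a production-quality archive script, here is a minimal sketch of the checks such a script needs. The function name and WAL-style paths are made up for illustration, and a real deployment would use pg_standby or an equivalent battle-tested tool rather than this:

```python
import os
import shutil
import tempfile

def archive_wal(src, dest_dir):
    """Copy a WAL segment safely: refuse to overwrite an existing segment,
    write to a temporary name first, fsync, then rename so that a
    partially-copied file is never visible under the final name."""
    dest = os.path.join(dest_dir, os.path.basename(src))
    if os.path.exists(dest):
        raise FileExistsError("refusing to overwrite %s" % dest)
    fd, tmp = tempfile.mkstemp(dir=dest_dir)
    try:
        with os.fdopen(fd, "wb") as out, open(src, "rb") as inp:
            shutil.copyfileobj(inp, out)
            out.flush()
            os.fsync(out.fileno())  # data really on disk before rename
        os.rename(tmp, dest)        # atomic on POSIX filesystems
    except BaseException:
        os.unlink(tmp)              # don't leave temp debris behind
        raise
```

The temp-file-plus-rename pattern is the important part: a recovery process scanning the archive directory can never see a half-copied segment, which is one of the failure modes a bare `cp` in archive_command invites.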
Re: [HACKERS] Confusion over Python drivers {license}
Tom Lane wrote: If you feel that a BSD/MIT license is a must-have for your purposes, you're certainly free to push development of one of the other driver projects instead, and to try to organize some other people to help. I don't believe anyone is trying to funnel all development effort into psycopg2. Certainly not, and I hope no one has gotten the impression that there's anything "official" being recognized about psycopg happening here because it hasn't. Anointing a "one true driver" (a phrase that seems to keep popping up in external discussions of this topic) isn't the sort of thing the PostgreSQL core does. And all that happens to people who ignore what I tell them to do is that they receive a steady stream of long, sarcastic e-mails for a while afterwards. I just updated http://wiki.postgresql.org/wiki/Python_PostgreSQL_Driver_TODO to reflect a few corrections I noted while researching (everybody else is welcome to edit that page too you know), and to include some of the recent feedback showing up on this list. I know I was just looking for a Python driver that is compatible with the apps I most often run into, documented well enough that I can write my own if I feel like it, fast enough, and free enough that I can deploy the result wherever I want. That seemed similar to the priorities other people who had an opinion here suggested too. Pragmatically, psycopg2 just seemed to have the shortest path toward being something useful to the largest userbase in that sort of context, and we've unofficially rolled down that path a bit. This rabble-rousing seems to have nudged both development communities toward being more closely aligned in the future in a couple of ways, which is great, but I wouldn't read much more into things than that. Other projects will continue to grow and shrink, and the pure Python ones in particular continue to be quite valuable because they fill a niche that psycopg2 doesn't target at all. 
I'd sure like to see all three of those projects merge into one big one though. My bet has to be on pg8000, since it has the perfect license, supports all the necessary Python versions, and it's been around long enough (almost two years) that support for it is already in the latest SQLAlchemy beta. We seem to have revitalized discussion around modernizing PyGreSQL too, so I wouldn't discount that one completely yet either. For those who feel a true BSD license is vital, I direct you toward http://mailman.vex.net/pipermail/pygresql/2010-February/002315.html to learn more about what direction they could use some help in going. But whatever you do, don't start another project instead. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] Confusion over Python drivers {license}
Kevin Ar18 wrote: Based on that, I guess my question is what would it have taken to have picked one of BSD/MIT projects and working with those people instead? In other words, what key things affected the decision for psycopg? What areas is it so far ahead in or that would have just been too much work to fix in the other implementations? A lot of it as I see it is plain old fashioned popularity resulting from long sustained development. psycopg has been around since 2001, the current version is already compatible with SQLAlchemy, Django, Zope, PyDO, and it's integral to the large Python/PostgreSQL toolchain from Skype including tools like Londiste. When something is supported in that many places, and the main complaints are its license and documentation rather than its code, I know I'm not going to start by assuming I should just hack on somebody else's code to try and replace it just because their license is a little better. And I'd consider BSD/MIT only a little better than the LGPL. That sort of bad decision making is how we got to so many abandoned Python drivers in the first place. If everybody who decided they were going to write their own from scratch had decided to work on carefully and painfully refactoring and improving PyGreSQL instead, in an incremental way that grew its existing community along the way, we might have a BSD driver with enough features and users to be a valid competitor to psycopg2. But writing something shiny new from scratch is fun, while worrying about backward compatibility and implementing all the messy parts you need to really be complete on a project like this isn't, so instead we have a stack of not quite right drivers without any broad application support. (Aside: I have a bit of vocabulary I regularly use for this now. 
Open-source projects that just ignore working on an existing stack of working implementations, to instead start from scratch building something of questionably improved fanciness, regardless of its impact on backward compatibility and the existing userbase, have in my terminology "PulseAudioed" the older projects). The gauntlet I would throw down for anyone who thinks there's a better approach here is to demonstrate a driver that has working support going back to Python 2.4 for any two of the apps on my list above. Get even that much of a foothold, and maybe the fact that yours is more Pythonic, cleaner, or better licensed matters. Until then, working apps have to be the primary motivation for what to work on here, unless there's a really terrible problem with the driver. The existing psycopg license and the web site issues were in combination enough to reach that level of total issues for a number of people. With the news from Federico that a license improvement is approaching and signs of major documentation improvements, that problem seems like it will soon be solved as far as I'm concerned. When someone manages to make their alternative driver a prerequisite for an app I need, only at that point will that driver show back up on my radar. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] About psycopg2 (by its author)
I tried to keep the part of that discussion that went into a more public form limited to listing suspected issues for further investigation. You can see that the list at http://wiki.postgresql.org/wiki/Python_PostgreSQL_Driver_TODO uses terms like "Confirm" and "Review" rather than saying outright there's a feature or bug issue. People say all sorts of things about bugs in software that aren't necessarily true, and I know your mailing list support is excellent once people report things there. I just checked, and apparently I first publicly plugged your project almost two years ago: http://archives.postgresql.org/pgsql-general/2008-04/msg00779.php But I will point out the reality is that the lack of a published (on a web page, stuff buried in list archives doesn't count) known bug list, development roadmap, and even standard web page documentation is working against your software being deployed more widely. If we had one, a lot of these questions about features or bugs wouldn't even pop up. Luckily the code quality is good enough that people put up with that situation, but I can tell you from plenty of conversations on this topic that all of your users I've come across (who are also our users!) cite this as a major weakness of your project. If the license issues get sorted out as you plan, that part I think we can end up helping out with using our infrastructure. You might note Marko Kreen already created http://wiki.postgresql.org/wiki/Psycopg to start working on just that. I think we'd all be fine with continuing to expand on that rather than worry about your revamping the initd.org site just to address the documentation goals we have. And we would certainly want to work more closely with you and your other contributors on that, to make sure everything is accurate and complete. 
-- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] Confusion over Python drivers
Massa, Harald Armin wrote: I agree that there are some performance-challenges with pure-Python drivers. And we should not forget to look for the reasons for the incubation of that many pure-Python drivers: a) Python is no longer one-language, one-implementation. There are (at least) cPython (the original), Jython (on JVM), IronPython (from Microsoft on CLR), PyPy (Python on Python), Unladen Swallow (from Google on LLVM). In addition the nearly-Pythons as in Cython, RPython and ShedSkin... especially a) is a point to consider when standardizing on a PostgreSQL-blessed Python-PostgreSQL driver: it's getting one driver that satisfies the needs of the most people. How will the blessing extend to Jython / IronPython / PyPy? The point isn't so much "standardizing". Having a low performance Python driver turns into a PostgreSQL PR issue. The last thing we need is the old "PostgreSQL is slow" meme to crop back up again via the Python community, if the driver suggested by the community isn't written with performance as a goal so that, say, PostgreSQL+Python looks really slow compared to MySQL+Python. And if you're writing a database driver with performance as a goal, native Python is simply not an option. Now, once *that* problem is under control, and there's a nicely licensed, well documented, major feature complete, and good performing driver, at that point it would be completely appropriate to ask "what about people who want support for other Python platforms and don't care if it's slower?". And as you say, nurturing the "incubation" of such drivers is completely worthwhile. I just fear that losing focus by wandering too far in that direction, before resolving the main problem here, is just going to delay resolving the parts of the Python driver situation I feel people are most displeased with. 
-- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] Confusion over Python drivers
Andrew McNamara wrote: I got sick of the constant stream of escaping bugs impacting on psycopg and pyPgSQL, and wrote my own DB-API driver, using the more modern libpq/binary/protocol 3 APIs where ever possible. The result is BSD licensed: http://code.google.com/p/ocpgdb/ I added you into the list at http://wiki.postgresql.org/wiki/Python Can you check what I put in there, confirm Windows compatibility, and comment on Python 3.X support? I'd be curious to hear more about the escaping bugs you ran into as well. We already have some notes on the TODO that pushing more of this work toward the standard libpq routines would seem appropriate for things passing between the driver and libpq. Were the issues you ran into on that side, or more on the Python side of how things were being formatted? -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.us
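For anyone following along, the escaping bug class under discussion is easy to demonstrate in isolation. This sketch involves no driver at all; the two helper functions are hypothetical stand-ins for what a buggy versus a correct driver effectively does, and a real driver should defer to libpq (PQescapeStringConn) or server-side parameters rather than roll its own quoting:

```python
# Hedged sketch of the escaping bug class; helper names are made up.
def naive_quote(value):
    # What a buggy driver effectively does: wrap in quotes, no escaping.
    return "'" + value + "'"

def escaped_quote(value):
    # SQL-standard doubling of single quotes. Shown only to make the
    # contrast concrete; production code should use libpq's routines.
    return "'" + value.replace("'", "''") + "'"

evil = "O'Brien'; DROP TABLE accounts; --"
print(naive_quote(evil))    # malformed statement / injection vector
print(escaped_quote(evil))  # one well-formed string literal
```

The first value terminates the literal early, which is the whole family of bugs Andrew describes; pushing the work down to libpq or to protocol-level parameters removes the driver from the escaping business entirely.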
Re: [HACKERS] Confusion over Python drivers
Josh Berkus wrote: Anyway, I don't yet have a full diagnosis on the transaction control issue or I'd already have posted it to psycopg -- it may be a toxic interaction between Django and Psycopg2 rather than psycopg2 alone. I'd not have brought it up except for this discussion. I'm going to remove it from the list on the wiki then for now. I don't want to annoy the developers by adding a more speculative bug that might not even be in their software. If you get to where it's confirmed and info posted to their list, please add a link back into the page once it's reported, i.e. link to their mailing list archives or something like that. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.us
Re: [HACKERS] Confusion over Python drivers
Greg Smith wrote: Here's a full TODO page that includes everything mentioned here as best I could summarize it: http://wiki.postgresql.org/wiki/Python_PostgreSQL_Driver_TODO Looks like the first action item is to talk with the Psycopg people about their license. Oh: and I'm going to take care of this. License changes can be a very sensitive topic and I'm told that discussion probably needs to happen in Italian too; I can arrange that. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.us
Re: [HACKERS] Confusion over Python drivers
Marko Kreen wrote: I think we should concentrate on the PR problem and technical issues related to that, keep the other low-level and non-user-visible issues out. Or at least separate. (PsycopgTodo wiki page?) That's just a matter of prioritizing the issues. Put the big ones at the top, the trivia at the bottom, and as you knock stuff off of there, somewhere along the way you'll discover you've made enough progress that the PR issues start going away, because people are able to get their work done with less drama. Here's a full TODO page that includes everything mentioned here as best I could summarize it: http://wiki.postgresql.org/wiki/Python_PostgreSQL_Driver_TODO Looks like the first action item is to talk with the Psycopg people about their license. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] Re: Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)
Robert Haas wrote: Well it seems that what we're trying to implement is more like it_would_be_nice_if_you_would_start_syncing_this_file_range_but_its_ok_if_you_dont(), so maybe that would work. Anyway, is there something that we can agree on and get committed here for 9.0, or should we postpone this to 9.1? It seems simple enough that we ought to be able to get it done, but we're running out of time and we don't seem to have a clear vision here yet... This is turning into yet another one of those situations where something simple and useful is being killed by trying to generalize it way more than it needs to be, given its current goals and its lack of external interfaces. There's no catversion bump or API breakage to hinder future refactoring if this isn't optimally designed internally from day one. The feature is valuable and there seems at least one spot where it may be resolving the possibility of a subtle OS interaction bug by being more thorough in the way that it writes and syncs. The main contention seems to be over naming and completely optional additional abstraction. I consider the whole "let's make this cover every type of complicated sync on every platform" goal interesting and worthwhile, but it's completely optional for this release. The stuff being fretted over now is ultimately an internal interface that can be refactored at will in later releases with no user impact. If the goal here could be shifted back to finding the minimal level of abstraction that doesn't seem completely wrong, then updating the function names and comments to match that more closely, this could return to committable. That's all I thought was left to do when I moved it to "ready for committer", and as far as I've seen this expanded scope of discussion has just moved backwards from that point. 
-- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] Confusion over Python drivers
Marko Kreen wrote: The pg8000 / bpgsql seem to be toy projects, and anyway you don't want to use pure-Python drivers in high-performance environments. We are not talking about C#/Java here. Right, and the comments from James reinforce this general idea: there is little value to the people who most want Python+PostgreSQL support in working on any of the pure Python implementations, because best case performance is still half or less of the ones that more directly wrap libpq. Even the best case with psycopg is enough of a performance hit as it is. py-postgresql seems to be more serious, but it's python3 only, which makes it irrelevant today. Also true. Psycopg was the leader, especially in web environments, but it has a non-obvious license and with a dead website it does not seem that attractive. Although it is still well-maintained. Best path forward would be to talk with the Psycopg guys about license clarification/change. Agreed. A relicensed Psycopg, with a more professional looking introduction/documentation page (like the budding page on our Wiki) than what the initd web site has, seems like the best platform to hack on top of to me as well. The fact that they've moved to git recently makes it that much easier for another branch to exist even outside of their somewhat troubled infrastructure. To summarize what I saw on this thread, the primary wishlist of changes to it are:
-License change
-Consider refactoring to better follow standard driver practices, such as using PQexecParams
-Improvement in transaction control to resolve issues that cause idle transactions
-Possible simplifications in how it implements async operations, to improve robustness/reduce code complexity
-Confirm/add multi-threaded support
-Confirm/add support for the most common standard types (such as array)
PyGreSQL is the oldest, older than DB-API, and so its DB-API interface seems an afterthought and is untested/underused - eg. it does not support bytea. 
And if Psycopg can't be relicensed happily and/or improved as above, as the only other native Python driver PyGreSQL looks like the next candidate to build on top of. Its major issues are:
-Test/complete/refactor for full DB-API 2.0 support
-Add bytea support
-Add extension support, perhaps modeled on what Psycopg provides
-Build a COPY extension
-Confirm/add multi-threaded support
-Confirm/add support for the most common standard types (such as array)
Any other suggestions before I turn the above into a roadmap page on the wiki? -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.us
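For reference on what "full DB-API 2.0 support" means in practice, the core surface is small. Here it is sketched against the stdlib sqlite3 module, which implements the same spec (note the paramstyle differs between drivers: sqlite3 uses qmark placeholders, psycopg2 uses pyformat):

```python
import sqlite3  # stdlib DB-API 2.0 implementation, used as a stand-in

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (id INTEGER, name TEXT)")
# Parameterized execute/executemany are among the pieces the drivers
# discussed above most often get wrong or leave untested:
cur.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b")])
conn.commit()
cur.execute("SELECT name FROM t WHERE id = ?", (2,))
print(cur.fetchone()[0])  # b
conn.close()
```

An application written strictly against this interface can in principle move between compliant drivers, which is why untested or incomplete DB-API layers (as with PyGreSQL's) end up on the issues list above.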
Re: [HACKERS] Re: Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)
Andres Freund wrote: On 02/03/10 14:42, Robert Haas wrote: Well, maybe we should start with a discussion of what kernel calls you're aware of on different platforms and then we could try to put an API around it. In linux there is sync_file_range. On newer Posixish systems one can emulate that with mmap() and msync() (in batches obviously). No idea about windows. There's a series of parameters you can pass into CreateFile: http://msdn.microsoft.com/en-us/library/aa363858(VS.85).aspx A lot of these are already mapped inside of src/port/open.c in a pretty straightforward way from the POSIX-oriented interface:
O_RDWR, O_WRONLY -> GENERIC_WRITE, GENERIC_READ
O_RANDOM -> FILE_FLAG_RANDOM_ACCESS
O_SEQUENTIAL -> FILE_FLAG_SEQUENTIAL_SCAN
O_SHORT_LIVED -> FILE_ATTRIBUTE_TEMPORARY
O_TEMPORARY -> FILE_FLAG_DELETE_ON_CLOSE
O_DIRECT -> FILE_FLAG_NO_BUFFERING
O_DSYNC -> FILE_FLAG_WRITE_THROUGH
You have to read the whole "Caching Behavior" section to see exactly how all of those interact, and even then notes like http://support.microsoft.com/kb/99794 are needed to follow the fine points of things like FILE_FLAG_NO_BUFFERING vs. FILE_FLAG_WRITE_THROUGH. So anything that's setting those POSIX open flags better than before is getting the benefit of that improvement on Windows, too. But that's not quite the same as the changes using fadvise to provide better targeted cache control hints. I'm getting the impression that doing much better on Windows might fall into the same sort of category as Solaris, where the primary interface for this sort of thing is to use an AIO implementation instead: http://msdn.microsoft.com/en-us/library/aa365683(VS.85).aspx The effective_io_concurrency feature had proof of concept test programs that worked using AIO, but actually following through on that implementation would require a major restructuring of how the database interacts with the OS in terms of reads and writes of blocks. 
It looks to me like doing something similar to sync_file_range on Windows would be similarly difficult. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.us
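For comparison, the fadvise-style cache hinting mentioned above is reachable even from Python's os module on platforms that support it; sync_file_range itself has no stdlib wrapper, so this sketch uses posix_fadvise as the nearest available stand-in, guarded for platforms that lack it:

```python
import os
import tempfile

# Hedged sketch of fadvise-style cache control: write, force to disk,
# then hint the kernel that the cached pages are no longer needed.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"x" * 8192)
    os.fsync(fd)  # data must be durable before dropping cached pages
    if hasattr(os, "posix_fadvise"):  # Linux and some other POSIX systems
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        print("hinted")
    else:
        print("fadvise unsupported on this platform")
finally:
    os.close(fd)
    os.unlink(path)
```

The hint is purely advisory, which is part of why the thread above keeps circling around naming: "it would be nice if you would start syncing this file range" is the strongest promise any of these interfaces actually makes.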
Re: [HACKERS] Confusion over Python drivers
Bruce Momjian wrote: While I realize experienced people can easily navigate this confusion... No, that's the worst part--the more you know and the deeper you dig into it, the more broken you realize the whole thing is. When one of the best drivers (in some respects) has a web page that looks like this: http://initd.org/ that doesn't seem so bad at first--but if you're experienced, you know that the page has been in that disturbing state since late 2006. You start digging into the driver mess, figure you just need to learn how things fit together, but the hole keeps getting bigger as you dig. The issues here have already been identified: the Perl DBI is an excellent spec, while the Python one is so weak everybody ends up needing their own extensions to it. And then portability *even among Python PostgreSQL drivers* goes out the window. If somebody built a BSD/MIT licensed driver that replaces every useful feature of all the forks, with no major problems, and a couple of major projects switched over to it (think "Skype" level big), maybe this mess could get resolved. I think it would take someone already familiar with the major issues involved a couple of months of regular work to make any serious progress on it. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] development setup and libdir
Ivan Sergio Borgonovo wrote: I'm absolutely aware I can't ask features unless someone is willing to implement them or is paid for... but the easier/cheaper it is to build up a dev/test environment the more people will try to build/test something for postgres. I do all my build/test work on Ubuntu using my peg tool: http://github.com/gregs1104/peg specifically because I find it such a giant pain to deal with the Debian packaging when developing. It does all of the things you're complaining take too long to setup just for development work. Dump the whole working tree in your home directory, build and test everything there, avoid the whole mess of what the OS install cares about. Once I've got something that works, only then do I bother figuring out how I'm going to get that integrated back into the packaging system's PostgreSQL install. The difficulties in making modules generally easier to install are well understood, and many people have been working on potential improvements to that whole situation for a while now. It's a hard problem though. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] 64-bit size pgbench
Tom Lane wrote: In the past we've rejected proposed patches for pgbench on the grounds that they would make results non-comparable to previous results. So the key question here is how much this affects the speed. Please be sure to test that on a 32-bit machine, not a 64-bit one. Sheesh, who has a 32-bit machine anymore? I'll see what older hardware I can dig up. I've realized there are two separate issues to be concerned about: 1) On small scale data sets, what's the impact of the main piece of data being shuffled around in memory (the account number in the accounts table) now being 64 bits? That part might be significantly worse on 32-bit hardware. 2) How does the expansion in size of the related primary key on that data impact the breakpoint where the database doesn't fit in RAM anymore? I did just update my pgbench-tools package this month so that it happily runs against either 8.3 or 8.4/9.0 and I've done two rounds of extensive test runs lately, so plenty of data to compare against here. ! retval = (int64) strtol(res, &endptr, 10); That bit is merely wishful thinking :-( I did specifically say I didn't trust that call one bit. There is a middle ground position here, similar to what Robert suggested, that I just add a "large mode" to the program for people who need it without touching the current case. That might allow me to sidestep some of these issues I may not have a good answer to with getting the \setshell feature working right in 64 bits, could just make that one specific to "regular mode". In any case, I think this limitation in what pgbench can do has risen to be a full-on bug at this point for the expected users of the next version, and I'll sit on this until there's something better we can make available. 
-- Greg Smith2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
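For what it's worth, the randomness concern above is a real one: a single random() call yields at most 2^31 distinct values, so scaling one result onto a range wider than that (as the existing getrand() coding does, via a double) leaves account numbers that can never be drawn at all. Here's a rough sketch of one way around that, combining two random() results; this is purely an illustration of the issue, not the approach the attached patch takes, and the getrand64 name is made up:

```c
/*
 * Illustrative only: cover a full 64-bit range by gluing two 31-bit
 * random() results together before reducing to [min, max].  The modulo
 * reduction has a tiny bias, but unlike double-based scaling it can
 * reach every value in a range wider than 2^31.
 */
#define _XOPEN_SOURCE 600
#include <stdint.h>
#include <stdlib.h>

int64_t
getrand64(int64_t min, int64_t max)
{
	/* two 31-bit draws give one 62-bit value */
	uint64_t	r = ((uint64_t) random() << 31) | (uint64_t) random();

	return min + (int64_t) (r % (uint64_t) (max - min + 1));
}
```

With a scale-5000 accounts table (5e8 rows) the old coding already loses precision in the double math; past scale ~21475 the range itself no longer fits in 32 bits at all.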
Re: [HACKERS] WARNING: pgstat wait timeout
I just found a few of these errors in a log file during some pgbench testing tonight. Linux, recent CVS HEAD; given the range of systems and versions this has been reported against now, this bug doesn't look like a platform or version/build specific issue. Unfortunately the instance I had up wasn't set up very well for logging, but I did notice that all of the ones I saw had nearby messages about autovacuum issues, which seems to match Tom's earlier suggestion at http://archives.postgresql.org/pgsql-bugs/2009-07/msg00083.php that the source of the warning (but not necessarily the underlying problem) for these is the autovacuum launcher complaining; here are two different sets:

ERROR: canceling autovacuum task
CONTEXT: automatic analyze of table "pgbench.public.pgbench_accounts"
WARNING: pgstat wait timeout
WARNING: pgstat wait timeout

ERROR: canceling autovacuum task
CONTEXT: automatic analyze of table "pgbench.public.pgbench_accounts"
WARNING: pgstat wait timeout
ERROR: canceling autovacuum task
CONTEXT: automatic analyze of table "pgbench.public.pgbench_accounts"
ERROR: canceling autovacuum task
CONTEXT: automatic analyze of table "pgbench.public.pgbench_accounts"

Because of what I'm (not) seeing in pg_stat_bgwriter, I'm suspicious that its underlying work or messages may be involved here. I'm not seeing the sort of totals I expect in that view after these large bouts of activity. Now, my use tonight has included the new pg_stat_reset_shared('bgwriter') to clear out those stats between runs, so perhaps that's involved too. Guessing not only because of the reports going back to 8.4 that also have this error message. Will keep an eye out for this one now that I know I might run into it, have already cranked up the logging and will look for something reproducible to gather more info.
-- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
[HACKERS] 64-bit size pgbench
Attached is a patch that fixes a long-standing bug in pgbench: it won't handle scale factors above ~4000 (around 60GB) because it uses 32-bit integers for its computations related to the number of accounts, and it just crashes badly when you exceed that. This month I've run into two systems where that was barely enough to exceed physical RAM, so I'd expect this to be a significant limiting factor during 9.0's lifetime. A few people have complained about it already in 8.4. The index size on the big accounts table has to increase for this to work; it's a bigint now instead of an int. That's going to mean a drop in results for some tests, just because less index will fit in RAM. I'll quantify that better before submitting something final here. I still have some other testing left to do as well: making sure I didn't break the new \setshell feature (am suspicious of strtol()), confirming the random numbers are still as random as they should be (there was a little bug in the debugging code related to that, too). Was looking for general feedback on whether the way I've converted this to use 64 bit integers for the account numbers seems appropriate, and to see if there's any objection to fixing this in general given the potential downsides. Here's the patch in action on previously unreachable sizes (this is a system with 8GB of RAM, so I'm basically just testing seek speed here):

$ ./pgbench -j 4 -c 8 -T 30 -S pgbench
starting vacuum...end.
transaction type: SELECT only
scaling factor: 5000
query mode: simple
number of clients: 8
number of threads: 4
duration: 30 s
number of transactions actually processed: 2466
tps = 82.010509 (including connections establishing)
tps = 82.042946 (excluding connections establishing)

$ psql -x -c "select relname,reltuples from pg_class where relname='pgbench_accounts'" -d pgbench
relname   | pgbench_accounts
reltuples | 5e+08

$ psql -x -c "select pg_size_pretty(pg_table_size('pgbench_accounts'))" -d pgbench
pg_size_pretty | 63 GB

$ psql -x -c "select aid from pgbench_accounts order by aid limit 1" -d pgbench
aid | 1

$ psql -x -c "select aid from pgbench_accounts order by aid desc limit 1" -d pgbench
aid | 5

-- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 38086a5..8a7064a 100644
*** a/contrib/pgbench/pgbench.c
--- b/contrib/pgbench/pgbench.c
*** usage(const char *progname)
*** 313,326 ****
  }
  
  /* random number generator: uniform distribution from min to max inclusive */
! static int
! getrand(int min, int max)
  {
  	/*
  	 * Odd coding is so that min and max have approximately the same chance of
  	 * being selected as do numbers between them.
  	 */
! 	return min + (int) (((max - min + 1) * (double) random()) / (MAX_RANDOM_VALUE + 1.0));
  }
  
  /* call PQexec() and exit() on failure */
--- 313,326 ----
  }
  
  /* random number generator: uniform distribution from min to max inclusive */
! static int64
! getrand(int64 min, int64 max)
  {
  	/*
  	 * Odd coding is so that min and max have approximately the same chance of
  	 * being selected as do numbers between them.
  	 */
! 	return min + (int64) (((max - min + 1) * (double) random()) / (MAX_RANDOM_VALUE + 1.0));
  }
  
  /* call PQexec() and exit() on failure */
*** runShellCommand(CState *st, char *variab
*** 630,636 ****
  	FILE	   *fp;
  	char		res[64];
  	char	   *endptr;
! 	int			retval;
  
  	/*
  	 * Join arguments with whilespace separaters.  Arguments starting with
--- 630,636 ----
  	FILE	   *fp;
  	char		res[64];
  	char	   *endptr;
! 	int64		retval;
  
  	/*
  	 * Join arguments with whilespace separaters.  Arguments starting with
*** runShellCommand(CState *st, char *variab
*** 704,710 ****
  	}
  
  	/* Check whether the result is an integer and assign it to the variable */
! 	retval = (int) strtol(res, &endptr, 10);
  	while (*endptr != '\0' && isspace((unsigned char) *endptr))
  		endptr++;
  	if (*res == '\0' || *endptr != '\0')
--- 704,710 ----
  	}
  
  	/* Check whether the result is an integer and assign it to the variable */
! 	retval = (int64) strtol(res, &endptr, 19);
  	while (*endptr != '\0' && isspace((unsigned char) *endptr))
  		endptr++;
  	if (*res == '\0' || *endptr != '\0')
*** runShellCommand(CState *st, char *variab
*** 712,718 ****
  		fprintf(stderr, "%s: must return an integer ('%s' returned)\n",
  				argv[0], res);
  		return false;
  	}
! 	snprintf(res, sizeof(res), "%d", retval);
  
  	if (!putVariable(st, "setshell", variable, res))
  		return false;
--- 712,718 ----
  		fprintf(stderr, "%s: mu
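As an aside for anyone following the later discussion of this hunk: the `retval = (int64) strtol(res, &endptr, 19)` line has two independent problems. The base argument should be 10, and strtol() returns long, which is only 32 bits on most 32-bit platforms, so the value is truncated before the cast widens it. A sketch of what a genuinely 64-bit parse might look like follows; this is illustrative only, not the submitted patch, and the parse_int64 name is made up:

```c
/*
 * Sketch of parsing a 64-bit value portably.  strtoll() returns
 * long long, which C99 guarantees is at least 64 bits, so the value
 * survives even where long is 32 bits.  Returns 1 on success, 0 on
 * overflow or trailing junk.
 */
#include <stdlib.h>
#include <stdint.h>
#include <errno.h>

int
parse_int64(const char *s, int64_t *result)
{
	char	   *endptr;
	long long	v;

	errno = 0;
	v = strtoll(s, &endptr, 10);	/* base 10, 64-bit-capable result */
	if (errno != 0 || endptr == s || *endptr != '\0')
		return 0;
	*result = (int64_t) v;
	return 1;
}
```

The matching snprintf() would also need a 64-bit format (the server sources use an INT64_FORMAT macro for exactly this portability problem) rather than "%d".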
Re: [HACKERS] Streaming replication, and walsender during recovery
Tom Lane wrote: (Anyway, the argument that it's important for performance is pure speculation AFAIK, untainted by any actual measurements. Given the lack of optimization of WAL replay, it seems entirely possible that the last thing you want to burden a slave with is sourcing data to more slaves.) On any typical production hardware, the work of WAL replay is going to leave at least one (and probably more) CPUs idle, and have plenty of network resources to spare too because it's just shuffling WAL in/out rather than dealing with so many complicated client conversations. And the thing you want to redistribute--the WAL file--is practically guaranteed to be sitting in the OS cache at the point where you'd be doing it, so no disk use either. You'll disrupt a little bit of memory/CPU cache, sure, but that's about it as far as leeching resources from the main replay in order to support the secondary slave. I'll measure it fully the next time I have one set up to give some hard numbers; I've never seen it rise to the point where it was worth worrying about before. Anyway, I think what Simon was trying to suggest was that it's possible right now to ship partial WAL files over as they advance, if you monitor pg_xlogfile_name_offset and are willing to coordinate copying chunks over. That basic idea is even built already--the Skytools walmgr deals with partial WALs for example. Having all that built into the server with a nicer UI is awesome, but it's been possible to build something with the same basic feature set since 8.2. Getting that going with a chain of downstream slaves is not so easy though, so there's something that I think would be unique to the 9.0 implementation. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] CommitFest status summary 2010-01-27
Robert Haas wrote: Waiting for Initial Review (4) == plpython3 - no reviewer yet This whole feature seems quite interesting, and I'm really impressed at how much work James has put into rethinking what a Python PL should look like. But isn't the fact that there isn't even a reviewer for it yet evidence it needs more time to develop a bit of a community first before being considered for core? This seems to me like the sort of patch you'd want to play with for a month or two before you could really have an informed opinion about how working with it flows compared to the existing implementation, and that's not going to be possible here.

Provide rowcount for utility SELECTs

I think I can write a review for this one right now based on the discussion that's already happened:
- Multiple people think the feature is valuable and it seems worth exploring
- The right way to handle the protocol change here is still not clear
- There are any number of subtle ways clients might be impacted by this change, and nailing that down and determining who might break is an extremely wide ranging bit of QA work that's barely been explored yet

That last part screams "returned with feedback" for something submitted to the last CF before beta to me. As a candidate for 9.1-alpha1 where there's plenty of time to figure out what it breaks and revert if that turns out to be bad? That sounds like a much better time to be fiddling with something in this area. I would like to see the following in order to make this patch have a shot at being committable:

1) Provide some examples showing how the feature would work in practice and of use-cases for it.
2) To start talking about what's going to break, some notes about what this does to the regression tests would be nice.
3) Demonstrate with example sessions the claims that there are no backward compatibility issues here. I read "when you mix old server with new clients or new server with old client, it will just work as before", but until I see a session proving that I have to assume that's based on code inspection rather than actual tests, and therefore not necessarily true. (Nothing personal, Zoltan--just done way too much QA in the last year to believe anything I'm told about code without a matching demo).
4) Investigate and be explicit about the potential breakage here both for libpq clients and at least one additional driver too. If I saw a demonstration that this didn't break the JDBC driver, for example, I'd feel a lot better about the patch.

Expecting a community reviewer to do all that just to confirm this code change is worthwhile seems a pretty big stretch to me, and I wouldn't consider it very likely that set of work could get finished in time for this CF. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] Clustering Docs WAS: Mammoth in Core?
Alvaro Herrera wrote: Greg Smith wrote: Alvaro Herrera wrote: Greg Smith wrote: We've got pgsql-cluster-hackers to discuss this particular area. Huh, is this a new list? It wasn't added to wwwmaster's list of lists, apparently, right? The archives are at http://archives.postgresql.org/pgsql-cluster-hackers/ but it's not listed at http://archives.postgresql.org/ for some reason. Because it wasn't added to the database. It seems we've gotten sloppy about the list creation process (though I admit it doesn't seem to be documented anywhere). If you give me a description for the list I'll add it. Discussion of adding replication and clustering features to PostgreSQL, both inside the database and via integration with additional software. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] Clustering Docs WAS: Mammoth in Core?
Alvaro Herrera wrote: Greg Smith wrote: We've got pgsql-cluster-hackers to discuss this particular area. Huh, is this a new list? It wasn't added to wwwmaster's list of lists, apparently, right? The archives are at http://archives.postgresql.org/pgsql-cluster-hackers/ but it's not listed at http://archives.postgresql.org/ for some reason. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] C function accepting/returning cstring vs. text
Ivan Sergio Borgonovo wrote: Is there a book? You'll find a basic intro to this area in "PostgreSQL Developer's Handbook" by Geschwinde and Schonig. You're already past the level of questions they answer in there though. This whole cstring/text issue you're asking about, I figured out by reading the source code, deciphering the examples in the talk at http://www.joeconway.com/presentations/tut_oscon_2004.pdf , and then using interesting keywords there ("DatumGetCString" is a good one to find interesting places in the code) to find code examples and messages on that topic in the list archives. Finally, finding some commits/patches that do things similar to what you want, and dissecting how they work, is quite informative too. I agree the learning curve could be shortened a lot, and have actually submitted a conference talk proposal in this area to try and improve that. You're a few months ahead of me having something written down to share though. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
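Since the cstring/text distinction is exactly the sort of thing that's hard to dig out of the docs: a text value is a varlena--a 4-byte length word (which counts itself) followed by the string bytes, with no NUL terminator--while a cstring is an ordinary NUL-terminated char *. Inside the server you'd use the existing text_to_cstring()/cstring_to_text() helpers rather than rolling your own; the standalone toy below (all mock_* names are made up, this is not server code) just demonstrates the layout difference that makes the conversion necessary:

```c
/*
 * Toy illustration of the text-vs-cstring layout difference.
 * mock_text mimics an untoasted varlena: length header plus bytes,
 * no trailing '\0'.
 */
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

typedef struct mock_text
{
	int32_t		vl_len_;		/* total size, including this header */
	char		vl_dat[];		/* string bytes, NOT NUL-terminated */
} mock_text;

mock_text *
mock_cstring_to_text(const char *s)
{
	size_t		len = strlen(s);
	mock_text  *t = malloc(sizeof(int32_t) + len);

	t->vl_len_ = (int32_t) (sizeof(int32_t) + len);
	memcpy(t->vl_dat, s, len);	/* the '\0' is deliberately not copied */
	return t;
}

char *
mock_text_to_cstring(const mock_text *t)
{
	size_t		len = t->vl_len_ - sizeof(int32_t);
	char	   *s = malloc(len + 1);

	memcpy(s, t->vl_dat, len);
	s[len] = '\0';				/* the cstring gets its terminator back */
	return s;
}
```

This is also why calling strlen() or printf("%s") directly on a text Datum misbehaves: there is no terminator there to find.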
Re: [HACKERS] Re: Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)
Greg Stark wrote: Actually before we get there could someone who demonstrated the speedup verify that this patch still gets that same speedup? Let's step back a second and get to the bottom of why some people are seeing this and others aren't. The original report here suggested this was an ext4 issue. As I pointed out recently on the performance list, the reason for that is likely that the working write-barrier support for ext4 means it's passing through the fsync to "lying" hard drives via a proper cache flush, which didn't happen on your typical ext3 install. Given that, I'd expect I could see the same issue with ext3 given a drive with its write cache turned off, so that's the theory I started trying to prove before seeing the patch operate. What I did was create a little test program that created 5 databases and then dropped them:

\timing
create database a;
create database b;
create database c;
create database d;
create database e;
drop database a;
drop database b;
drop database c;
drop database d;
drop database e;

(All of the drop times were very close by the way; around 100ms, nothing particularly interesting there) If I have my system's boot drive (attached to the motherboard, not on the caching controller) in its regular, lying mode with write cache on, the creates take the following times:

Time: 713.982 ms
Time: 659.890 ms
Time: 590.842 ms
Time: 675.506 ms
Time: 645.521 ms

A second run gives similar results; seems quite repeatable for every test I ran so I'll just show one run of each. If I then turn off the write-cache on the drive:

$ sudo hdparm -W 0 /dev/sdb

And repeat, these times show up instead:

Time: 6781.205 ms
Time: 6805.271 ms
Time: 6947.037 ms
Time: 6938.644 ms
Time: 7346.838 ms

So there's the problem case reproduced, right on regular old ext3 and Ubuntu Jaunty: around 7 seconds to create a database, not real impressive.
Applying the last patch you attached, with the cache on, I see this:

Time: 396.105 ms
Time: 389.984 ms
Time: 469.800 ms
Time: 386.043 ms
Time: 441.269 ms

And if I then turn the write cache off, back to slow times, but much better:

Time: 2162.687 ms
Time: 2174.057 ms
Time: 2215.785 ms
Time: 2174.100 ms
Time: 2190.811 ms

That makes the average times I'm seeing on my server:

HEAD     Cached: 657 ms   Uncached: 6964 ms
Patched  Cached: 417 ms   Uncached: 2183 ms

Modest speedup even with a caching drive, and a huge speedup in the case when you have one with slow fsync. Looks to me that if you address Tom's concern about documentation and function naming, committing this patch will certainly deliver as promised on the performance side. Maybe 2 seconds is still too long for some people, but it's at least a whole lot better. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
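For anyone wanting to reproduce this class of measurement without creating databases: the operation being timed above is essentially "how long until fsync() returns", which depends entirely on whether the drive honors the cache flush. A rough standalone sketch of that probe follows; the file path, payload, and function name are arbitrary choices for illustration, not anything from the patch:

```c
/*
 * Time how long one small write takes to become durable.  With the
 * drive write cache on, fsync() returns as soon as the (possibly
 * lying) drive accepts the data; with hdparm -W 0 it has to wait for
 * the platter, which is where the ~7 second createdb times come from.
 */
#include <fcntl.h>
#include <unistd.h>
#include <sys/time.h>

/* returns elapsed milliseconds for one write+fsync, or -1.0 on error */
double
timed_durable_write(const char *path)
{
	struct timeval start, end;
	const char	buf[] = "some WAL-ish payload\n";
	int			fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

	if (fd < 0)
		return -1.0;
	gettimeofday(&start, NULL);
	if (write(fd, buf, sizeof(buf)) != (ssize_t) sizeof(buf) ||
		fsync(fd) != 0)
	{
		close(fd);
		return -1.0;
	}
	gettimeofday(&end, NULL);
	close(fd);
	return (end.tv_sec - start.tv_sec) * 1000.0 +
		(end.tv_usec - start.tv_usec) / 1000.0;
}
```

Running a loop of these against the data partition with the cache toggled via hdparm should show the same order-of-magnitude gap as the createdb numbers above.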
Re: [HACKERS] Dividing progress/debug information in pg_standby, and stat before copy
Tom Lane wrote: Right, but the question is: is there enough use-case left in it to justify spending community effort on polishing rough edges? It's not like we haven't got plenty else to do to get 9.0 out the door. The point I was trying to make is that I'm on the hook to do this particular job whether or not the community feels it's worthwhile. The only real decision is whether there's enough interest in upgrading this utility to review and possibly commit the result, and I'm trying to minimize the public resources needed even for those parts. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] Dividing progress/debug information in pg_standby, and stat before copy
Tom Lane wrote: Maybe I'm missing something, but I thought pg_standby would be mostly dead once SR hits the streets. Is it worth spending lots of time on? I have to do the work I outlined regardless, to support installs on earlier versions (luckily there are few backport issues for this code). Also, some people are going to have working WAL shipping installs they just don't want to mess with as part of the upgrade--particularly given the relative newness of the SR code--that they should be able to convert over easily. One reason we hadn't really brought up merging these pg_standby changes into core yet is for the reason you describe. Simon and I didn't think there'd be much community uptake on the idea given it's a less likely approach for future installs to use, and didn't want to distract everyone with this topic. But if it mainly occupies time from people who have similar requirements driving work on it anyway, as appears to be the case with Selena, it would be nice to see a "final" pg_standby get shipped as a result that has fewer obvious rough parts. Doubt much work will go into it beyond this release though. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] Dividing progress/debug information in pg_standby, and stat before copy
Selena Deckelmann wrote: I can scan through the code tonight and look for other cases where this might be an issue. The main thing I'm looking for is to distinguish between harmful and non-harmful errors. Where I think this is headed is that every line that comes out of this program gets tagged with an explicit log level, same as everything else in the server and using the same terminology (LOG, INFO, WARNING, DEBUG), and we just need to make sure those levels are appropriate given the severity of the message. I was planning a similar sweep through all the messages the program produces to classify them like that; I have a starter set of suggestions from Simon to chew on that I'm working through. Yes, a wrapper is desperately needed with timestamps. Right--that's something I was planning to add eventually too, so we might as well get the right infrastructure in there to make that easy while we're mucking around with all this logging anyway. I'll touch base with you later this week once I've done my initial pass through all this; not sure we need to drag this list through all those details until we've got a unified patch to propose. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] Dividing progress/debug information in pg_standby, and stat before copy
Josh Berkus wrote: We discussed this issue at LCA where I encountered these bogus error messages when I was doing the demo of HS. I consider Selena's patch to be a bug-fix for beta of 9.0, not a feature. Currently the database reports a lot of false error messages when running in standby mode, and once we have 1000's more users using standby mode, we're going to see a lot of related confusion. Does anyone have a complete list of the false error messages and what context they show up in so that a proper test case could be constructed? I extracted some pg_standby changes from Simon last week that have some overlap with Selena's patch (better logging, remove bogus link feature, throw fewer false error messages out). I'm not quite ready to submit anything here just yet, I'm going to break that into more targeted patches, but I will be later this week. I share the concern here that some of these issues are annoying enough to be considered bugs, and I need to fix them regardless of whether anybody else does. I'd be happy to work with Selena as a review pair here, to knock out the worst of the problems with this program, now that the use-case for it should be more popular. pg_standby could use a bit of an upgrade based on the rough edges found by all its field tests, most of which are in error handling and logging. I don't have anything like her stat check in what I'm working on, so there's certainly useful stuff uniquely in each patch. As far as her questions go:

> * Could we just re-use '-l' for logging?

The patch I'm working on adds "-v verbosity" so that logging can be a bit more fine-grained than that even. Having both debug and a progress report boolean can then get folded into a single verbosity level, rather than maintaining two similar paths. Just make debug equal to the highest verbosity and maybe start deprecating that switch altogether. One reason I'm not quite ready to submit what I've got yet is that I want to unify things better here. I think that I'd prefer to use the same terminology as log_min_messages for the various options, and make an ELOG-like macro wrapper for this code to use instead of all these terrible direct fprintf([stderr|stdout]... calls.

> * Is there a way to get a non-module to use the ereport/elog system?

And that work would make this transition easier to make, too, if it became feasible. I fear that's outside of the scope of what anyone wants to touch at this point though. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] Clustering Docs WAS: Mammoth in Core?
Joshua D. Drake wrote: My suggestion would be to start a new thread entirely. And mine would be to drop this whole topic altogether until after the CommitFest is over, and instead focus on the replication features already committed that need work before beta can even start. The only reason I said something is that I objected to the idea that the information on the wiki was complete enough to be considered a roadmap for others to follow in this area, and wanted to clarify that it wasn't--it's helpful, but everybody knows it still needs work yet. Actually improving things enough that it is complete--that work, I think, is well outside of what this list should be focusing on right now. We've got pgsql-cluster-hackers to discuss this particular area. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] MySQL-ism help patch for psql
Ross J. Reedstrom wrote: So a quick mapping of most-needed commands, and a pointer to the docs for the full ramifications and subtle differences seems to fit the existing documentation module. And that's been done at least twice already:

http://blog.endpoint.com/2009/12/mysql-and-postgres-command-equivalents.html
http://en.wikibooks.org/wiki/Converting_MySQL_to_PostgreSQL

-- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] commit fests
Kevin Grittner wrote: Other posts have suggested that "review fests" might be helpful in this period. Again, it sounds to me, from other posts on this thread, as though the primary risk is that people working on the release could see something they couldn't resist getting drawn into -- taking them off-task and delaying the release. The obvious solution to that would be to create a pgsql-journeyman-peer-review list for review fests during the release window. Be careful, you're wandering quickly down the classic path by which you'll find yourself in charge of doing some work here. I think it's completely reasonable to say that someone could organize pgsql-rrreviewers (as an initial working area, maybe another list eventually) for a periodic ReviewFest during periods where those patches won't be considered for commit, such as beta. Now that most patch submitters have gotten used to doing a matching bit of peer review, the pool of people to do the reviews is there without having to pull anyone else into that. I could even see the rrreviewers list or another one split out of it grow into a somewhat gentler place for people to ask for help with their patch development too--just ban all the grumpy people from there (I'll unsubscribe myself). The important thing is that everyone would need to be careful to respect not letting that spill over onto this list during the periods there is no official CommitFest going on, or there will be a net increase in said grumpy people. Looking at stats here for the recent CFs, about 40% of patches submitted are returned with feedback (or rejected) rather than being committed anyway. And I'm estimating that >80% of patches only reach committable state after a round of review+corrections first. Getting a lot of that work out of the way outside of the regular CF seems a worthwhile goal.
Starting the first CommitFest of the next version (normally a quite painful one) with a set of patches already marked "Ready for Committer" or pruned out with feedback already, all because they've been through a round or two of initial review, would be a nice improvement. Just drop a summary of what's been done onto pgsql-hackers once the CF opens again for the ones still in the running and off you go. The existing CF app could be used to track the early review work too, so someone who wasn't on pgsql-rrreviewers could dig into the archives to catch up in a few minutes by following the message ID links. I do that all the time for patches I had previously been ignoring and deleting out of my mailbox. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] commit fests
Kevin Grittner wrote: Does it really take the concerted efforts of the whole community five months to take things from the deadline for patch commits (end of last CF) to release? The idea is that it takes five months of complete lockdown to give the code enough testing time to hopefully reach stability. I don't think that part is particularly controversial--reduce it, and quality drops, period. And as pointed out by a couple of people, even one release per year of a database is more than many users really want to consume, as evidenced by the number of 7.X installs we still hear about, kept going because "we don't want to touch the working app". The question then is what else you could be doing during that period. Let's say that the day 9.0 beta 1 came out, a new 9.1 branch was created and CommitFests against it started, during what right now would be time exclusively devoted to beta testing. Completely feasible idea. There would be some forward/backporting duplication of work while those were running in parallel, and that would be harder than it needs to be until something like a git transition happens. But you certainly could do it. So why not do that? Developing new features is fun and tends to attract sponsorship dollars. Testing a frozen release, finding bugs, and resolving them is boring, and no one sponsors it. Therefore, if you let both things go on at once, I guarantee you almost all of the community attention will be diverted toward new development during any period where both are happening at the same time. Give developers a choice between shiny and profitable vs. dull and unpaid, and they'll focus on the former every time. That means that there's no possible way you can keep new development open without hurting the dreary work around stabilizing the beta in the process. You have to put all the fun toys away in order to keep focus on the painful parts.
Plenty of other projects don't work this way, with a beta drop being a kick-off to resume full-on development. There's a reason why PostgreSQL has a better reputation for quality releases than they do. It's an enterprise-class database; you don't just throw code in there, release every quarter, and expect the result to be stable. If developers are turned away because they want more instant gratification for their development efforts, they're working on the wrong type of project for them. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] HS/SR and smart shutdown
Heikki Linnakangas wrote: It's a good question if that still makes sense with Hot Standby. Perhaps we should redefine smart shutdown in standby mode to shut down as soon as all read-only connections have died. I've advocated in the past that an escalating shutdown procedure would be helpful to have available in general. Start kicking off clients with smart, continue to fast if there are any left, and if there are still any left after that (I have seen COPY clients that ignore fast) disconnect them and go to immediate to completely kill them. Once you've started the server on the road to shutdown, even with smart, you've basically committed to going all the way down by whatever means is available anyway, so why not make that more automated and easier? If something like that were available, I could see inserting a step in the middle there specifically aimed at resolving this issue. Maybe it's just a change to the beginning of fast shutdown, or to the end of smart as I think you're suggesting. Perhaps you only get it if you do one of these escalating shutdowns I'm proposing, making that the preferred way to handle HS servers. -- Greg Smith2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
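[Editor's sketch] The escalation described above (smart, then fast, then disconnect/immediate) amounts to a small state machine. A minimal illustration, where `stop` and `clients_remain` are hypothetical callbacks standing in for issuing `pg_ctl stop -m <mode>` and checking for surviving backends; this is not pg_ctl's actual interface:

```python
def escalating_shutdown(stop, clients_remain,
                        modes=("smart", "fast", "immediate")):
    """Try each shutdown mode in order, escalating only while clients
    are still connected; return the mode that finished the job."""
    for mode in modes:
        stop(mode)                # e.g. pg_ctl stop -m <mode>
        if not clients_remain():  # all backends gone? we're done
            return mode
    return modes[-1]              # immediate always wins in the end
```

A well-behaved set of clients never sees anything past smart; the stubborn COPY clients mentioned above are what force the walk all the way to immediate.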
Re: [HACKERS] MonetDB test says that PostgreSQL often has errors or missing results
Mark Wong wrote: What the TPC provides isn't really a usable kit. It could be entertaining to see how their kit works. The one for TPC-H seems to work for a lot of people; the best of the intros I found for how to make it go was http://bhairav.serc.iisc.ernet.in/doc/Installation/tpch.htm , there are others. I'd guess MonetDB followed a procedure just like that, discovered some queries didn't work right, and just called it a day and published rather than investigate. -- Greg Smith2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MonetDB test says that PostgreSQL often has errors or missing results
Josh Berkus wrote: Actually, the report which MonetDB has published I believe is illegal. If they're not running it through the TPC, they can't claim it's a "TPCH" result. I just resisted getting into that but now you've set me off again. Presumably they're using the public TPC-H data and query generator distributed by the TPC, and there's certainly plenty of other unofficial reports of results using that floating around. Where I think they really crossed the line here is using that kit to produce unaudited results, and then publishing results that included comparisons against a competitor, which is clearly not what the TPC intends you to do here. I'd rather refute their results than cry to the daddy TPC about it though. Not that I'd be upset if someone else, perhaps someone who already has contacts within the TPC, did so. (looking up into the air and whistling) -- Greg Smith2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [PERFORM] Re: Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)
Greg Stark wrote: On Tue, Jan 19, 2010 at 2:52 PM, Greg Stark wrote: Barring any objections shall I commit it like this? Actually before we get there could someone who demonstrated the speedup verify that this patch still gets that same speedup? I think the final version of this patch could use at least one more performance report confirming that it does something useful. We got a lot of data from Andres, but do we know that the improvements here hold for others too? I can take a look at it later this week; I have some interest in this area. -- Greg Smith2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] MonetDB test says that PostgreSQL often has errors or missing results
Dann Corbit wrote: See: http://monetdb.cwi.nl/SQL/Benchmark/TPCH/ If the result is correct, then the problem queries should be added to the regression test suite. If the result is not correct, then perhaps they could get assistance on proper configuration of PostgreSQL and rerun the tests. We've already gotten a couple of reports that talk about the specific reasons why PostgreSQL 8.2 doesn't do well on this test, such as http://www.mail-archive.com/pgsql-gene...@postgresql.org/msg120882.html (broken queries) and http://archives.postgresql.org/pgsql-performance/2008-09/msg00157.php (why the default settings are completely ludicrous for this test). One of these days I'm going to re-run the whole suite with 8.4 and see how we're doing. I seriously doubt any queries are still flat out broken, given how many reports I've seen of successful (albeit sometimes slow) runs of the whole thing. That's just MonetDB being negligent in reporting things to favor their own agenda. It's completely permissible to customize queries that won't execute at all in order to fix their compatibility with a particular database; the fact that they didn't look into that or even say what was wrong is just shoddy journalism. I'm hoping to have some crow to serve them about another benchmark result they've published soon; they're back to really rubbing me the wrong way again lately. -- Greg Smith2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Mammoth in Core?
Takahiro Itagaki wrote: The conclusion is splitting existing projects into some 'modules', and getting the modules one by one into core. Voted features are here: http://wiki.postgresql.org/wiki/ClusterFeatures This page was a bit messy for someone who didn't attend the meeting to follow. I just cleaned it up so that the features are listed in voting order, and to have more inter-page links. It's better, but could still use some more work. There are a few things I noted that I didn't see in the voting order; I moved those all into another section. If anyone who was there can help prioritize those it would be helpful. Also, the voting included "XID set", but I didn't see any section that clearly matched that, so there may be a missing description section there too. Stepping back for a second, there are three big-picture things that I think need to be sorted out before it's possible for this to be useful guidance for something like suggesting how JD might best fit his Mammoth work into the broad work being done in this area, which I consider a subset of the larger important question "how can people help here?":

- There are dependencies that exist regardless of the order in which people would like to see the features done. For example, the one I thought I saw listed and therefore documented is that the "Modification trigger into core" feature presumes you've already resolved "Generalized Data Queue". Are there more of those? Does "Function scan push-down" depend on "Export snapshots" already being done? Those are the kind of questions I'm left with when reading.

- Who, if anyone, is actually working in this area already, with prototypes or at all? In some cases these are listed (like notes referencing Slony/Londiste etc.); it would be helpful to explicitly nail those down in every case. For the Mammoth example, that might help identify which components they have a uniquely useful piece for.

- Is there anyone who's taken on responsibility for working on any of these specific parts yet? 
When we had a similar work prioritization session at the PGCon developer's meeting, most of the action items there came with a clearly labeled person on the hook who was working on them. I don't see any notion like that from this meeting's outcome. -- Greg Smith2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Table size does not include toast size
Tom Lane wrote: I'm inclined to think that table vs. index is the right level of abstraction for these functions, and that breaking it down further than that isn't all that helpful. We have the bottom-level information (per-fork relation size) available for those who really want the details. Fair enough; this certainly knocks off all the important stuff already, I just wanted a final sanity-check opinion. This one is ready for a committer to look at now. My test case seems to work fine with a moderately complex set of things to navigate. The main thing I'm not familiar enough with to have looked at deeply is exactly how the FSM and toast computations are done, to check whether there are any corner cases in how it navigates forks and such that aren't considered. -- Greg Smith2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Table size does not include toast size
Bernd Helmle wrote: These are two new functions pg_table_size() and pg_indexes_size(). This patch also changes pg_total_relation_size() to be a shorthand for pg_table_size() + pg_indexes_size(). Attached is a test program to exercise these new functions. I thoroughly abuse generate_series and arrays to create a table with a few megabytes of both regular and TOAST-ed text, and with two indexes on it. Here are the results from a sample run (it's random data so each run will be a bit different):

 pg_relation_size       | 11,755,520
 pages_size             | 11,755,520
 toast_and_fsm          | 22,159,360
 pg_table_size          | 33,914,880
 pg_indexes_size        |    524,288
 pkey                   |    262,144
 i                      |    262,144
 pg_total_relation_size | 34,439,168
 computed_total         | 34,439,168

This seems to work as expected. You can see that pg_relation_size gives a really misleading value for this table, whereas the new pg_table_size does what DBAs were asking for here. Having pg_indexes_size around is handy too. I looked over the code a bit; everything in the patch looks clean too. The only question I'm left with after browsing the patch and staring at the above results is whether it makes sense to expose a pg_toast_size function. That would make the set available here capable of handling almost every situation somebody might want to know about, making this area completely done as I see it. In addition to being a useful shorthand on its own, that would then allow you to indirectly compute just the FSM size, which seems like an interesting number to know as feedback on what VACUUM is up to. It's easy enough to add, too: the calculate_toast_table_size code needed is already in the patch, just have to add another external function to expose it. I don't think there's any useful case for further exposing the two component parts of the toast size. If you're enough of a hacker to know what to do with those, you can certainly break them down yourself. 
-- Greg Smith2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com

\x
DROP TABLE test;
CREATE TABLE test(s SERIAL PRIMARY KEY, d TEXT);
CREATE INDEX i on TEST(s);
INSERT INTO test(d) SELECT array_to_string(array(
    SELECT chr(ascii('A') + (random() * 64)::integer)
    FROM generate_series(1,2)),'')   -- Size
    FROM generate_series(1,1000);    -- Rows
INSERT INTO test(d) SELECT array_to_string(array(
    SELECT chr(ascii('A') + (random() * 64)::integer)
    FROM generate_series(1,1000)),'')  -- Size
    FROM generate_series(1,1);         -- Rows
--insert into test (d) SELECT repeat('xyz123'::text,(1+random() * 1000)::integer) FROM generate_series(1,10);
ANALYZE test;
SELECT pg_relation_size(relname::regclass), relpages * 8192 AS pages_size
    FROM pg_class WHERE relname='test';
SELECT pg_table_size('test'::regclass) - pg_relation_size('test'::regclass) AS toast_and_fsm;
SELECT pg_table_size('test'::regclass);
SELECT pg_indexes_size('test'::regclass),
    pg_relation_size('test_pkey'::regclass) AS pkey,
    pg_relation_size('i'::regclass) AS i;
SELECT pg_total_relation_size('test'::regclass),
    pg_table_size('test'::regclass) + pg_indexes_size('test'::regclass) AS computed_total;

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clearing global statistics
Magnus Hagander wrote: Maybe this should be "Unrecognized reset target: %s", target, and also a errhint() saying which targets are allowed. Thoughts? That seems reasonable. The other thing I realized is that I forgot to add the new function to the right place in doc/src/sgml/func.sgml : pg_stat_reset_shared I can send an updated patch with both of these things fixed tomorrow. Given that we're talking 5 lines of change here, if it's easier for you to just patch a working copy you've already started on that's fine too. -- Greg Smith2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming replication status
Stefan Kaltenbrunner wrote: Another popular question is "how far behind real-time is the archiver process?" You can do this right now by duplicating the same xlog file name scanning and sorting that the archiver does in your own code, looking for .ready files. It would be simpler if you could call pg_last_archived_xlogfile() and then just grab that file's timestamp. well that one seems a more reasonable reasoning to me however I'm not so sure that the proposed implementation feels right - though can't come up with a better suggestion for now. That's basically where I'm at, and I was looking more for feedback on that topic rather than to get lost defending use-cases here. There are a few of them, and you can debate their individual merits all day. As a general comment on your line of criticism here: that "we're monitoring that already via " does not mean that an additional check is without value. The kind of people who like redundancy in their database like it in their monitoring, too. I feel there's at least one unique thing exposing this bit buys you, and the fact that it can also be a useful secondary source of information for systems monitoring is a welcome bonus--regardless of whether good practice already supplies a primary one. If you continue your line of thought you will have to add all kind of stuff to the database, like CPU usage tracking, getting information about running processes, storage health. I'm looking to expose something that only the database knows for sure--"what is the archiver working on?"--via the standard way you ask the database questions, a SELECT call. The database doesn't know anything about the CPU, running processes, or storage, so suggesting this path leads in that direction doesn't make any sense. 
-- Greg Smith2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming replication status
Stefan Kaltenbrunner wrote: Greg Smith wrote: The other popular request that keeps popping up here is providing an easy way to see how backlogged the archive_command is, to make it easier to monitor for out of disk errors that might prove catastrophic to replication. I tend to disagree - in any reasonable production setup basic stuff like disk space usage is monitored by non-application specific matters. While monitoring backlog might be interesting for other reasons, citing disk space usage/exhaustions seems just wrong. I was just mentioning that one use of the data, but there are others. Let's say that your archive_command works by copying things over to a NFS mount, and the mount goes down. It could be a long time before you noticed this via disk space monitoring. But if you were monitoring "how long has it been since the last time pg_last_archived_xlogfile() changed?", this would jump right out at you. Another popular question is "how far behind real-time is the archiver process?" You can do this right now by duplicating the same xlog file name scanning and sorting that the archiver does in your own code, looking for .ready files. It would be simpler if you could call pg_last_archived_xlogfile() and then just grab that file's timestamp. I think it's also important to consider the fact that diagnostic internals exposed via the database are far more useful to some people than things you have to setup outside of it. You talk about reasonable configurations above, but some production setups are not so reasonable. In many of the more secure environments I've worked in (finance, defense), there is *no* access to the database server beyond what comes out of port 5432 without getting a whole separate team of people involved. If the DBA can write a simple monitoring program themselves that presents data via the one port that is exposed, that makes life easier for them. 
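[Editor's sketch] The stuck-NFS-mount check described above is just a staleness test on whatever the proposed pg_last_archived_xlogfile() returns. A minimal illustration of the poller logic, assuming that function existed; the function name, the (filename, timestamp) bookkeeping, and the five-minute threshold are all illustrative:

```python
def check_archiver(prev, current_file, now, threshold_s=300):
    """prev is the last observed (filename, change_time) pair; current_file
    is whatever the hypothetical pg_last_archived_xlogfile() just returned.
    Return the updated pair plus a flag saying the archiver looks stalled."""
    last_file, changed_at = prev
    if current_file != last_file:
        return (current_file, now), False  # progress was made; remember it
    return prev, (now - changed_at) > threshold_s
```

A monitoring script would call this once per polling interval, keeping `prev` between runs, and alert whenever the flag trips.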
This same issue pops up sometimes when we consider the shared hosting case too, where the user may not have the option of running a full-fledged monitoring script. -- Greg Smith2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming replication status
Fujii Masao wrote: "I'm thinking something like pg_standbys_xlog_location() [on the primary] which returns one row per standby servers, showing pid of walsender, host name/ port number/user OID of the standby, the location where the standby has written/flushed WAL. DBA can measure the gap from the combination of pg_current_xlog_location() and pg_standbys_xlog_location() via one query on the primary." This function is useful but not essential for troubleshooting, I think. So I'd like to postpone it. Sure; in a functional system where primary and secondary are both up, you can assemble the info using the new functions you just added, so this other one is certainly optional. I just took a brief look at the code of the features you added, and it looks like it exposes the minimum necessary to make this whole thing possible to manage. I think it's OK if you postpone this other bit; there's more important stuff for you to work on. So: the one piece of information I thought was most important to expose here at an absolute minimum is there now. Good progress. The other popular request that keeps popping up here is providing an easy way to see how backlogged the archive_command is, to make it easier to monitor for out of disk errors that might prove catastrophic to replication. I just spent some time looking through the WAL/archiving code in that context. It looks to me that this information isn't really stored anywhere right now. The only thing that knows what segment is currently queued up to copy over is pgarch_ArchiverCopyLoop via its call to pgarch_readyXlog. Now, this is a pretty brute-force piece of code: it doesn't remember its previous work at all, it literally walks the archive_status directory looking for *.ready files that have names that look like xlog files, then returns the earliest. 
That unfortunately means that it's not even thinking in the same terms as all these other functions, which are driven by the xlog_location advancing, and then the filename is computed from that. All you've got is the filename at this point, and it's not even guaranteed to be real--you could easily fool this code if you dropped an inappropriately named file into that directory. I could easily update this code path to save the name of the last archived file in memory while all this directory scanning is going on anyway, and then provide a UDF to expose that bit of information. The result would need to have documentation that disclaims it like this: pg_last_archived_xlogfile() text: Get the name of the last file the archive_command [tried to|successfully] archived since the server was started. If archiving is disabled or no xlog files have become ready to archive since startup, a blank line will be returned. It is possible for this function to return a result that does not reflect an actual xlogfile if files are manually added to the server's archive_status directory. I'd find this extremely handy as a hook for monitoring scripts that want to watch the server but don't have access to the filesystem directly, even given those limitations. I'd prefer to have the "tried to" version, because it will populate with the name of the troublesome file it's stuck on even if archiving never gets its first segment delivered. I'd happily write a patch to handle all that if I thought it would be accepted. I fear that the whole approach will be considered a bit too hackish and get rejected on that basis though. Not really sure of a "right" way to handle this though. Anything better is going to be more complicated because it requires passing more information into the archiver, with little gain for that work beyond improving the quality of this diagnostic routine. And I think most people would find what I described above useful enough. 
-- Greg Smith2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] plpython3
James William Pye wrote: The documentation is back up, so please be sure to look at the numerous examples provided therein. In addition to that, I'll try to get some contrasting examples posted as a follow-up to an earlier message. "In plpython you do X whereas in plpython3 you do Y." I had read the documentation before so it wasn't that bad that I couldn't see it. Basically, the issue I had is that it's not really clear which features are unique to this implementation that make it compelling. So more targeted examples like you're considering now would help. -- Greg Smith2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clearing global statistics
Itagaki Takahiro wrote: To be honest, I have a plan to add performance statistics counters to postgres. It is not bgwriter's counters, but cluster-level. I'd like to use your infrastructure in my work, too :) Attached patch provides just that. It still works basically the same as my earlier version, except you pass it a name of what you want to reset, and if you don't give it the only valid one right now ('bgwriter') it rejects it (for now):

gsmith=# select checkpoints_req,buffers_alloc from pg_stat_bgwriter;
 checkpoints_req | buffers_alloc
-----------------+---------------
               4 |           129
(1 row)

gsmith=# select pg_stat_reset_shared('bgwriter');
 pg_stat_reset_shared
----------------------

(1 row)

gsmith=# select checkpoints_req,buffers_alloc from pg_stat_bgwriter;
 checkpoints_req | buffers_alloc
-----------------+---------------
               0 |             7
(1 row)

gsmith=# select pg_stat_reset_shared('rubbish');
ERROR:  Unrecognized reset target

I turn the input text into an enum choice as part of composing the message to the stats collector. If you wanted to add some other shared cluster-wide reset capabilities into there you could re-use most of this infrastructure. Just add an extra enum value, map the text into that enum, and write the actual handler that does the reset work. You should be able to reuse the same new message type and external UI I implemented for this specific clearing feature. I didn't see any interaction to be concerned about here with Magnus's suggestion that he wanted to target stats reset on objects such as a single table at some point. The main coding choice I wasn't really sure about is how I flag the error case where you pass in a bad value. I do that validation and throw ERRCODE_SYNTAX_ERROR before composing the message to the stats collector. I didn't know if I should create a whole new error code just for this specific case or if reusing another error code was more appropriate. 
Also, I didn't actually have the collector process itself validate the data at all, it just quietly ignores bad messages on the presumption everything is already being checked during message creation. That seems consistent with the other code here--the other message handlers only seem to throw errors when something really terrible happens, not when they just don't find something useful to do. -- Greg Smith2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 1f70fd4..b630bc8 100644 *** a/doc/src/sgml/monitoring.sgml --- b/doc/src/sgml/monitoring.sgml *** postgres: user + + + pg_stat_reset_shared() + text + +Reset some of the shared statistics counters for the database cluster to +zero (requires superuser privileges). Calling +pg_stat_reset_shared('bgwriter') will zero all the values shown by +pg_stat_bgwriter. + + diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c index 4fd2abf..7335a50 100644 *** a/src/backend/postmaster/pgstat.c --- b/src/backend/postmaster/pgstat.c *** static void pgstat_recv_tabstat(PgStat_M *** 270,275 --- 270,276 static void pgstat_recv_tabpurge(PgStat_MsgTabpurge *msg, int len); static void pgstat_recv_dropdb(PgStat_MsgDropdb *msg, int len); static void pgstat_recv_resetcounter(PgStat_MsgResetcounter *msg, int len); + static void pgstat_recv_resetsharedcounter(PgStat_MsgResetsharedcounter *msg, int len); static void pgstat_recv_autovac(PgStat_MsgAutovacStart *msg, int len); static void pgstat_recv_vacuum(PgStat_MsgVacuum *msg, int len); static void pgstat_recv_analyze(PgStat_MsgAnalyze *msg, int len); *** pgstat_reset_counters(void) *** 1153,1158 --- 1154,1190 pgstat_send(&msg, sizeof(msg)); } + /* -- + * pgstat_reset_shared_counters() - + * + * Tell the statistics collector to reset cluster-wide shared counters. 
+ * -- + */ + void + pgstat_reset_shared_counters(const char *target) + { + PgStat_MsgResetsharedcounter msg; + + if (pgStatSock < 0) + return; + + if (!superuser()) + ereport(ERROR, + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), + errmsg("must be superuser to reset statistics counters"))); + + if (strcmp(target, "bgwriter") == 0) + msg.m_resettarget = RESET_BGWRITER; + else + { + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("Unrecognized reset target"))); + } + + pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_RESETSHAREDCOUNTER); + pgstat_send(&msg, sizeof(msg)); + } /* -- * pgstat_report_autovac() - *** PgstatCollectorMain(int argc, char *argv *** 2915,2920 --- 2947,2958 len); break;
Re: [HACKERS] Clearing global statistics
Tom Lane wrote: Seems like a more appropriate solution would be to make it easier to do that subtraction, ie, make it easier to capture the values at a given time point and then get deltas from there. It's more general (you could have multiple saved sets of values), and doesn't require superuser permissions to do, and doesn't have the same potential for damn-I-wish-I-hadn't-done-that moments. You can make the same argument about the existing pg_stat_reset mechanism. I would love to completely rework the stats infrastructure so that it's easier to capture values with timestamps, compute diffs, and do trending. However, I'm not sure the database itself is necessarily the best place to do that at anyway. People who know what they're doing are already handling this exact job using external tools that grab regular snapshots for that purpose, so why try to duplicate that work? I'm trying to triage here, to scrub off the worst of the common problems. I would never claim this is the perfect direction to follow forever. There are a number of people who consider the inability to reset the pg_stat_bgwriter stats in any way a bug that's gone unfixed for two versions now. Your larger point that this style of implementation is not ideal as a long-term way to manage statistics I would completely agree with, I just don't have the time to spend on a major rewrite to improve that. What I can offer is a fix for the most common issue I get complaints about, in the form of a tool much more likely to be used correctly by people who go looking for it than misused IMHO. -- Greg Smith2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
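[Editor's sketch] The capture-and-subtract approach Tom suggests is simple to picture: save the cumulative counters under a label, then diff any two saved sets, with no superuser reset and no lost history. A minimal illustration using plain dicts to stand in for a saved pg_stat_bgwriter row; the snapshot store and labels are purely hypothetical:

```python
snapshots = {}

def save_snapshot(label, counters):
    """Record a copy of cumulative counters under a label."""
    snapshots[label] = dict(counters)

def delta(older, newer):
    """Per-interval values between two saved snapshots -- the same
    subtraction a reset-based workflow performs destructively."""
    a, b = snapshots[older], snapshots[newer]
    return {k: b[k] - a[k] for k in b if k in a}
```

Because nothing is destroyed, you can keep multiple labeled sets around and compare any pair, which is exactly the generality being argued for.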
Re: [HACKERS] Clearing global statistics
Rafael Martinez wrote: One thing I miss from the statistics you can get via pg_stat_* is information about how long we have been collecting stats (or in other words, when was the last time the stats were reset) I've considered adding this for the same reasons you're asking about it, but am not happy with the trade-offs involved. The problem is that you have to presume the server was running the entirety of the time since stats were reset for that data to be useful. So unless people are in that situation, they're going to get data that may not represent what they think it does. Realistically, if you want a timestamp that always means something useful you have to reset the stats at every server start, which leads us to: Before 8.3, we had the stats_reset_on_server_start parameter and the pg_postmaster_start_time() function. This was an easy way of resetting *all* statistics delivered by pg_stat_* and knowing when this was done. We were able to produce stats with information about sec/hours/days average values in an easy way. With this new feature I'm submitting, you can adjust your database startup scripts to make this happen again. Start the server, immediately loop over every database and call pg_stat_reset on them all, and call pg_stat_reset_shared('bgwriter'). Now you've got completely cleared stats that are within a second or two of pg_postmaster_start_time(), which should be close enough for most purposes. Theoretically we could automate that better, but I've found it hard to justify working on given that it's not that difficult to handle outside of the database once the individual pieces are exposed. -- Greg Smith2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clearing global statistics
Euler Taveira de Oliveira wrote: Greg Smith escreveu: pg_stat_reset( which text ) which := 'buffers' | 'checkpoints' | 'tables' | 'functions' | ... What about adding 'all' too? Or the idea is resetting all global counters when we call pg_stat_reset() (without parameters)? Once there's more than one piece to clear, maybe adding an 'all' target makes sense. In the context of the updated patch I've just finished, it just doesn't make sense given the code involved. -- Greg Smith2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] Clearing global statistics
Tom Lane wrote: Actually, that brings up a more general question: what's with the enthusiasm for clearing statistics *at all*? ISTM that's something you should do only in dire emergencies, like the collector went haywire and has now got a bunch of garbage numbers. The notion of resetting subsets of the stats seems even more dubious, because now you have numbers that aren't mutually comparable. So I fail to understand the desire to expend valuable development time on any of this. When doing checkpoint tuning, the usual starting point is to consider the ratio of time-based to segment-based checkpoints, along with the corresponding balance of buffers written by the backends vs. by the checkpoint process. When that shows poor behavior, typically because checkpoint_segments is too low, you change its value and then resume monitoring at the new setting. Right now, you're still carrying around the history of the bad period forever though, and every check of pg_stat_bgwriter requires manually subtracting the earlier values out. What people would like to do is reset those after adjusting checkpoint_segments, and then you can eyeball the proportions directly instead. That's exactly what the patch does. If I didn't see this request in the field every month I wouldn't have spent a minute on a patch to add it. There was a suggestion that subsets of the data I'm clearing might be useful to target, which I rejected on the grounds that it made it possible to get an inconsistent set of results, as you're concerned about. You really need to clear everything that shows up in pg_stat_bgwriter or not touch it at all. The main use case I'm trying to support is the person who just made a config change and now wants to do: select pg_stat_reset(); select pg_stat_reset_shared('bgwriter'); So that all of the stats they're now dealing with are from the same post-tuning time period.
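Once the stats are cleared after a tuning change, eyeballing those proportions can be done with a query along these lines (a sketch against the standard pg_stat_bgwriter view):

```sql
SELECT checkpoints_timed, checkpoints_req,
       buffers_checkpoint, buffers_backend,
       round(100.0 * checkpoints_req /
             nullif(checkpoints_timed + checkpoints_req, 0), 1)
         AS pct_forced_by_segments
FROM pg_stat_bgwriter;
```

A high pct_forced_by_segments here is the usual hint that checkpoint_segments is too low.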
Having numbers that are "mutually comparable" across the whole system is exactly the reason why this new call is needed, because there's this one part you just can't touch. -- Greg Smith, 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] plpython3
James William Pye wrote: On Jan 13, 2010, at 2:27 PM, Peter Eisentraut wrote: The problem I'm having with this discussion is that every time someone asks what the supposed advantages of this new Python PL are, a feature list like the above is dumped, I agree that this is unfortunate, but how else can we discuss the advantages? It boils down to comparing a couple of feature lists, and *maybe* some implementation details. No? Code samples. You're trying to unseat a well established incumbent here, and you're never going to do that with documentation. Maybe your plpython3 has some compelling features to it. I don't know, because even with several thousand lines of basic Python code to my credit I cannot understand a single one of the arguments you presented for why your implementation is better--except agreeing that, yes, tracebacks are useful. And even on that one, I'm not going to take your word on the superiority of your implementation. You're writing way over people's heads here. (Doesn't help that your docs link at the bottom of http://wiki.postgresql.org/wiki/WIP:plpython3 is broken either). If one has to be a Python expert to understand your position, you've already lost. Python code is easy to read though. If you'd said "here's a great example of how Function Modules are an improvement over what you can do with the current pl/python," that would be infinitely more useful than the list of language trivia related to them. You should be aiming to put Peter on the spot to respond to claims you make like "you can't do this easily with the current implementation" after showing an elegant bit of code. One of the things I'm increasingly frustrated by (and don't take this personally, this is a general comment coming more from the last CF rather than something I mean to single you out for) is how many patch submissions we get that don't have *compelling* examples showing their value. Have a better programming approach to something?
Show me the old way and how the new way is better. Performance improvement? Provide a complete, self-contained example showing how to demonstrate it. New type of feature? Cut and paste a whole session showing how it's used, with every single command you typed after initdb. Basically, if a reviewer can't confirm your patch is doing something useful in five minutes and be excited that they've watched something interesting happen, you're dramatically decreasing the chances that your patch will ever go anywhere. I hope that everyone submitting patches reads http://www.depesz.com/ at least once in a while. One of the things I really enjoy about his blog is how he shows complete working examples of so many patches. To pick a standout recent entry, http://www.depesz.com/index.php/2010/01/03/waiting-for-8-5-exclusion-constraints/ takes "exclusion constraints"--a feature I didn't follow a bit of the discussion about--and works through the whole feature with a series of examples that, while still complicated, are completely self-contained and possible to follow along until you understand how it all fits together. Patch submitters should consider it a goal to make life that easy for the reviewer stuck with checking their patch out. -- Greg Smith, 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] [PATCH] remove redundant ownership checks
There's nowhere for this to go right now except for "Returned with Feedback". It's extremely valuable to have had this patch submitted. I don't believe an exact spot of contention with the current code was ever highlighted so clearly before this discussion, because previous patches were just too big. We do need to get this whole thing off the list for a while now though; I think it's gotten quite a fair slice of discussion already. -- Greg Smith, 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] Streaming replication status
Stefan Kaltenbrunner wrote: so is there an actually concrete proposal of _what_ internals to expose? The pieces are coming together...summary:
-Status quo: really bad, but could probably ship anyway because existing PITR is no better and people manage to use it
-Add slave pg_current_xlog_location() and something like pg_standby_received_xlog_location(): much better, gets rid of the worst issues here.
-Also add pg_standbys_xlog_location() on the master: while they could live without it, this really helps out the "alert/monitor" script writer whose use cases keep popping up here.
Details...the original idea from Fujii was: "I'm thinking something like pg_standbys_xlog_location() [on the primary] which returns one row per standby servers, showing pid of walsender, host name/ port number/user OID of the standby, the location where the standby has written/flushed WAL. DBA can measure the gap from the combination of pg_current_xlog_location() and pg_standbys_xlog_location() via one query on the primary." After some naming quibbles and questions about what direction that should happen in, Tom suggested the initial step here is: "It seems to me that we should have at least two functions available on the slave: latest xlog location received and synced to disk by walreceiver (ie, we are guaranteed to be able to replay up to here); and latest xlog location actually replayed (ie, the state visible to queries on the slave). The latter perhaps could be pg_current_xlog_location()." So there's the first two of them: on the slave, pg_current_xlog_location() giving the latest location replayed, and a new one named something like pg_standby_received_xlog_location(). If you take the position that an unreachable standby does provide answers to these questions too (you just won't like them), this pair might be sufficient to ship.
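A monitoring script on the slave might then read the gap like this (hypothetical: pg_standby_received_xlog_location() is only a proposed name at this point, not a committed function):

```sql
-- On the standby; both functions per the proposal above
SELECT pg_standby_received_xlog_location() AS received,
       pg_current_xlog_location()          AS replayed;
```

A growing difference between the two columns would mean replay is falling behind receipt.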
To help deal with all the error situations where the standby isn't reachable and segments are piling up (possibly leading to a full disk), the next figure that seems to answer the most questions is asking the primary "what's the location of the last WAL segment file in the pile of ones to be archived/distributed that has been requested (or processed, if that's the easier thing to note) by the standby?". That's what is named pg_standbys_xlog_location() in the first paragraph I quoted. If you know enough to identify that segment file on disk, you can always look at its timestamp (and the ones on the rest of the files in that directory) in a monitoring script to turn that information into segments or a time measurement instead--xlog segments are nicely ordered, after all. -- Greg Smith, 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] Streaming replication status
Bruce Momjian wrote: Right, so what is the risk of shipping without any fancy monitoring? You can monitor the code right now by watching the output shown in the ps display and by trolling the database logs. If I had to I could build a whole monitoring system out of those components, it would just be very fragile. I'd rather see one or two very basic bits of internals exposed beyond those to reduce that effort. I think it's a stretch to say that request represents a design change; a couple of UDFs to expose some internals is all I think it would take to dramatically drop the amount of process/log scraping required here to support a SR system. I guess the slightly more ambitious performance monitoring bits that Simon was suggesting may cross the line as being too late to implement now though (depends on how productive the people actually coding on this are, I guess), and certainly the ideas thrown out for implementing any smart behavior or alerting when replication goes bad, like Josh's "archiving_lag_action", seem past the deadline to get addressed now--even though I agree with the basic idea. -- Greg Smith, 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] Clearing global statistics
Magnus Hagander wrote: I have on my TODO to implement the ability to do stats reset on a single object (say, one table only). Please take this into consideration when you design/name this, so there's no unnecessary overlap :-) Same goes for the stats message itself. The idea suggested upthread was to add this: pg_stat_reset( which text ) which := 'buffers' | 'checkpoints' | 'tables' | 'functions' | ... Now, the way the pg_stat_bgwriter values are computed, it doesn't actually make sense to separate out clearing the buffers/checkpoints stats, since one of those values is in both categories: buffers_checkpoint. They're really all too tightly coupled to break apart. So I was thinking of this: pg_stat_reset( which text ) which := 'bgwriter' | ... I could convert the patch I've got to be an initial implementation of this new "pg_stat_reset with a parameter", laying some useful groundwork in the process too. Then people who want to reset more things can just re-use that same outline and message passing mechanism, just adding comparisons for new text and a handler to go with it--not even touching the catalog again. This may not mesh well with what you plan though. If pg_stat_reset is updated to reset stats on an individual table, that could be a second version that takes in a regclass: pg_stat_reset('tablename'::regclass) But that seems like a confusing bit of overloading--I can easily see people thinking that pg_stat_reset('bgwriter') would be resetting the stats for a relation named 'bgwriter' rather than what it actually does if I build it that way. So, combining with Peter's naming suggestion, I think what I should build is: pg_stat_reset_shared( which text ) which := 'bgwriter' | ... Which satisfies what I'm looking for now, and future patches that need to reset other cluster-wide shared statistics can re-use this without needing to add a whole new function/stats message.
I think that best satisfies the cross-section of use cases we're expecting now. Any comments before I update my patch to do that? -- Greg Smith, 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] Streaming replication status
Heikki Linnakangas wrote: Greg Smith wrote: I don't think anybody can deploy this feature without at least some very basic monitoring here. I like the basic proposal you made back in September for adding a pg_standbys_xlog_location to replace what you have to get from ps right now: http://archives.postgresql.org/pgsql-hackers/2009-09/msg00889.php That's basic, but enough that people could get by for a V1. It would be more straightforward to have a function in the standby to return the current replay location. It feels more logical to poll the standby to get the status of the standby, instead of indirectly from the master. Besides, the master won't know how far the standby is if the connection to the standby is broken. This is one reason I was talking in my other message about getting simple stats on how bad the archive_command backlog is, which I'd think is an easy way to inform the DBA "the standby isn't keeping up and disk is filling" in a way that's more database-centric than just looking at disk space getting gobbled. I think that it's important to be able to get whatever useful information you can from both the primary and the standby, because most of the interesting (read: painful) situations here are when one or the other is down. The fundamental questions here are: -When things are running normally, how much is the standby lagging by? This is needed for a baseline of good performance, by which you can detect problems before they get too bad. -If the standby is down altogether, how can I get more information about the state of things from the primary? -If the primary is down, how can I tell more from the standby? Predicting what people are going to want to do when one of these bad conditions pops up is a large step ahead of where I think this discussion should be focusing on now. You have to show how you're going to measure the badness here in the likely failure situations before you can then take action on them. 
If you do the former well enough, admins will figure out how to deal with the latter in a way compatible with their business processes in the first version. -- Greg Smith, 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] Streaming replication status
Fujii Masao wrote: On Mon, Jan 11, 2010 at 5:36 PM, Craig Ringer wrote: Personally, I'd be uncomfortable enabling something like that without _both_ an admin alert _and_ the ability to refresh the slave's base backup without admin intervention. What feature do you specifically need as an alert? Just writing the warning into the logfile is enough? Or need to notify by using SNMP trap message? Though I'm not sure if this is a role of Postgres. It's impossible for the database to have any idea whatsoever how people are going to want to be alerted. Provide functions to monitor things like replication lag, like the number of segments queued up to feed to archive_command, and let people build their own alerting mechanism for now. They're going to do that anyway, so why waste precious time here building something that's unlikely to fit any but a very narrow use case? -- Greg Smith, 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] Streaming replication status
Fujii Masao wrote: On Sun, Jan 10, 2010 at 8:17 PM, Simon Riggs wrote: What could happen is that the standby could slowly lag behind master. We don't have any way of monitoring that, as yet. Setting ps display is not enough here. I agree that the statistical information about replication activity is very useful. But I think that it's not an urgent issue. Shall we think it later? I don't think anybody can deploy this feature without at least some very basic monitoring here. I like the basic proposal you made back in September for adding a pg_standbys_xlog_location to replace what you have to get from ps right now: http://archives.postgresql.org/pgsql-hackers/2009-09/msg00889.php That's basic, but enough that people could get by for a V1. -- Greg Smith, 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] damage control mode
Bruce Momjian wrote: I think the big issue with 8.4 was, do we close the commit-fest when we have open issues, and we aren't clear on how to fix them? A lot of unresolved issues get kept for that pre-beta period because all of a sudden we have to resolve all those complex problems. I don't see that changing for 8.5. That's not quite how I remember it; let's compare directly. Right now we have this: http://wiki.postgresql.org/wiki/PostgreSQL_8.5_Open_Items And at this point in the year 8.4 looked like this: http://wiki.postgresql.org/index.php?title=PostgreSQL_8.4_Open_Items&oldid=4121 You wrote a good summary of the open issues for 8.4 at the end of January that I think is informative reading for how things are compared with then: http://archives.postgresql.org/message-id/200901270402.n0r42xj01...@momjian.us That point had three major features in it that all got pushed to 8.5, and the whole in-place upgrade situation was completely active as well--Bruce finishing up a first good pg_migrator beta on 2009-02-18. I'd consider that a fourth major feature open at that point, left off that list. Now, considering that list of four, one is committed this time around (Hot Standby), another quite clearly tabled already (SEPostgreSQL), the third is still active but in much better shape (Streaming Replication), and the fourth (in-place upgrade) just closed its main list of open issues specific to this release. How about bugs? The big data dump of open issues and known bugs that Tom put together didn't show up until March 26: http://wiki.postgresql.org/index.php?title=PostgreSQL_8.4_Open_Items&direction=prev&oldid=5588 http://archives.postgresql.org/message-id/24552.1238093...@sss.pgh.pa.us Even with that whole mess, the 8.4 beta started on 2009-04-15, and 8.4.0 made it out on July 1.
Concerns about trimming the CF list are certainly valid, but I think any comparison with 8.4 should recognize that there were four giant patches/issues floating around at this point for that release, while this time there's only a much closer to release SR. While the rest of the patches Robert is concerned about have their complications and aren't trivial, there's not one of them that anybody is going to fight over the way HS, SEPostgreSQL, and in-place upgrades were. Not to trivialize them or their authors' work, but frankly changes to things like Listen/Notify and the GIN/GiST changes that seem to be gathering heat here seem pretty minor compared to the four giant things that were open at this point in the 8.4 cycle--the controversial ones seem like they have a pretty low destabilizing potential to me from what I've seen. Personally, I'd like the topic of a thread on "damage control" to be all about testing the one big patch that's already in there (HS), its related bits like the VACUUM FULL changes, and potentially SR too. Those are things that are touching internals that can introduce all sorts of new bugs, in the same way that the FSM changes were pretty scary for 8.4. Those are the things I worry about every day right now, not whether, say, a very walled-off indexing change for data that only a fraction of the user base uses might destabilize things. Getting excited about pre-emptively kicking out patches that may not even be commit candidates anyway is distracting attention from the stuff that really matters right now IMHO. I'm more concerned about how to reach a bug list as mature as the one Tom presented at the end of March last year, but earlier in this cycle, for the features that are already in there. (Oh, and to head off "well what are you doing about it?", I just updated pgbench-tools this week to add some missing 8.4 features, eventually working toward supporting some of the new 8.5 stuff before beta too.
I'm hoping to get a couple of months of machine testing time in between now and release time.) -- Greg Smith, 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] Add .gitignore files to CVS?
Peter Eisentraut wrote: Probably someone to actually track the open items that are mentioned every time this discussion happens. http://wiki.postgresql.org/wiki/Switching_PostgreSQL_from_CVS_to_Git now has what I believe to be the state of the world in this area. -- Greg Smith, 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] git help
Kevin Grittner wrote: What would the normal workflows be to: rebase the postgresql.git clone and the serializable branch on the server? rebase my working copy on my desktop from the serializable branch on the server? The way you asked this suggests you've chewed on someone's advanced workflow and swallowed some wrong bits. Rebase is an operation you do when you're trying to be picky about what and how you publish your changes to the world. Let's say you've updated your local branch:
serializable -> commit A -> commit B
Meanwhile, the upstream master has gotten a bunch of new commits made to it:
master -> commit C -> commit D
If you then merge those changes in, you'll have the following history:
serializable -> commit A -> commit B -> commit C -> commit D
Now, what happens if you want to submit a patch showing your changes? This isn't really what you want, is it? Ideally, you'd want to see your changes--A,B--on top of the updated master instead. Basically, throw these new C,D changes into history behind yours. That's what rebase allows you to do. On a normal day, you won't be doing that at all. You'll be merging from the various sources and pushing to others. Rebasing is something you do before publishing your changes to the world, or as part of repo history cleanup. It's not something you have to do all the time. Also: rebase can allow you to squash A&B into one commit before you publish them, too, to make the commit log easier to read. Did a bunch of dumb stuff on your local copy? Squash it all into a single, final version before sharing with the world. Again, not necessarily a routine operation you have to do regularly. You could easily commit happily and usefully share your work without running rebase once.
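The histories above can be reproduced in a throwaway repo; this is a sketch (branch names follow the example, the file contents are arbitrary):

```shell
# Work lands on "serializable", upstream commits land on master,
# then rebase replays A and B on top of the new upstream commits.
set -e
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.email you@example.com
git config user.name "You"
echo base > f; git add f; git commit -qm "base"
git checkout -qb serializable
echo A >> f; git commit -qam "A"
echo B >> f; git commit -qam "B"
git checkout -q master
echo C > g; git add g; git commit -qm "C"
echo D >> g; git commit -qam "D"
git checkout -q serializable
git rebase -q master     # replay A and B on top of C and D
git log --format=%s      # linear history: B, A, D, C, base (newest first)
```

The final log shows A and B sitting on top of the updated master, which is exactly the shape you want a submitted patch to have.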
However, note that once you do push something out, and people grab it, you've lost the opportunity to do some rebase cleanup without making life difficult for them--you can't wipe out commit log history once it's in somebody else's repo, or your copy is going to diverge from theirs, making a future merge messy for one of you. This is why some people compulsively rebase everything before they publish. You shouldn't assume you have to do that from day one though. On this project, if your code gets committed one day, it's not going to be with the commit history intact anyway, so it really doesn't matter at the end. -- Greg Smith, 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] damage control mode
Robert Haas wrote: If we're going to have any chance of getting these patches in, we have to give the patch authors good feedback early in the CommitFest so that they have time to make the necessary revisions before the end of the CommitFest. If we think we can swing it, I'm happy to handle these patches in the normal way; I'm also happy to say we'll review them all but not commit them for fear of destabilizing the tree; or we can punt them altogether. Presuming enough reviewers (which should be the case this time given the expectation that submitters also review), the suggested pacing here now has every patch passing through a round of review and potentially one update within ten days. Given that, I'm not sure why you're looking to special case anything here. Give everybody a fair initial run, just with the reminder that the bar for "ready for committer" is higher than normal on this last CF, which people have certainly gotten some warning of. If your patch smells funny at all or seems like it will take a lot of work from a committer to apply, it's out. Giving someone a review but then telling them "it looks good, but we just don't want to commit it right now" is more fair than not getting that author's patch a review at all, right? So why bounce them prematurely? -- Greg Smith, 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] Testing with concurrent sessions
Robert Haas wrote: It just seems crazy to me to try to test anything without proper language bindings. Opening a psql session and parsing the results seems extraordinarily painful. I've written a Python based program that spawns a captive psql and talks to it--twice, for different people--that ultimately uses the same sort of open3() spawning David mentioned is available via IPC::Open3. You can throw together a prototype that works well enough for some purposes in a couple of hours. I don't know that it would ever be really robust though. The main problem with that whole approach is that you have to be extremely careful in how you deal with the situation where the captive program is spewing an unknown amount of information back at you. How do you know when it's done? It's easy for the child and its master to deadlock if you're not careful. In the past I worked around that issue by just waiting for the process to end and then returning everything it had written until that time. I don't know that this would be flexible enough for what's needed for concurrent testing, where people are probably going to want more of a "send a command, get some lines back again" approach that keeps the session open. If I thought a captive psql would work well in this context I'd have written a prototype already. I'm not sure if it's actually possible to do this well enough to meet expectations. Parsing psql output is completely viable for trivial purposes though, and if the requirements were constrained enough it might work well enough for simple concurrent testing. While both concurrent psql and the libpq shim you suggested would take more work, I feel a bit more confident those would result in something that really worked as expected on every platform when finished.
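The captive-program approach can be sketched like this, using /bin/sh as a stand-in for psql (a real version would spawn psql and use its \echo meta-command for the marker). The hard part called out above--knowing when the child is done--is handled here with a sentinel echoed after every command:

```python
# Sketch: drive a captive interactive program over pipes, reading each
# command's output until a sentinel marker appears, so the parent never
# blocks forever waiting on an unknown amount of output.
import subprocess

SENTINEL = "__DONE__"

child = subprocess.Popen(
    ["/bin/sh"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE,
    text=True, bufsize=1,
)

def run(cmd):
    """Send one command, then collect output lines until the sentinel."""
    child.stdin.write(cmd + "\n")
    child.stdin.write("echo " + SENTINEL + "\n")
    child.stdin.flush()
    lines = []
    for line in child.stdout:
        if line.strip() == SENTINEL:
            break
        lines.append(line.rstrip("\n"))
    return lines

out = run("echo hello")
print(out)  # ['hello']
child.stdin.close()
child.wait()
```

The session stays open between commands, which is the "send a command, get some lines back" shape described above; stderr handling and timeouts are where the robustness problems start.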
-- Greg Smith, 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] Setting oom_adj on linux?
Magnus Hagander wrote: On Mon, Jan 4, 2010 at 16:45, Alvaro Herrera wrote: Magnus Hagander wrote: I realize this is a very platform-specific thing, but should we consider setting the value of /proc/<pid>/oom_adj when running on linux? See: http://archives.postgresql.org/message-id/20080201223336.GC24780%40alvh.no-ip.org Can't find a useful consensus though? In http://archives.postgresql.org/pgsql-hackers/2008-02/msg00049.php Tom points out that while you could make this adjustment in the init scripts for PostgreSQL, actually doing so is quite questionable as a packaging decision. That's where that thread ended as far as I was concerned. The best I think anyone could do here is to add such a capability into some of the init scripts, but it would probably need to be disabled by default. Since that sort of defeats the purpose of the change, I'm not sure what the benefit there is--if you have to turn it on, you might as well do something at a higher level instead. -- Greg Smith, 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] Thoughts on statistics for continuously advancing columns
Joshua D. Drake wrote: postgres=# analyze verbose test_ten_million; INFO: analyzing "public.test_ten_million" INFO: "test_ten_million": scanned 3000 of 44248 pages, containing 678000 live rows and 0 dead rows; 3000 rows in sample, 1048 estimated total rows ANALYZE Time: 20145.148 ms At ever larger table sizes, this would turn into 3000 random seeks all over the drive, one at a time, because there's no async I/O here to queue requests better than that for this access pattern. Let's say they take 10ms each, not an unrealistic amount of time on current hardware. That's 30 seconds, best case, which is similar to what JD's example is showing even on a pretty small data set. Under load it could easily take over a minute, hammering the disks the whole time, and in a TOAST situation you're doing even more work. It's not outrageous and it doesn't scale linearly with table size, but it's not something you want to happen any more than you have to either--consider the poor client who is trying to get their work done while that is going on. On smaller tables, you're both more likely to grab a useful next page via readahead, and to just have the data you need cached in RAM already. There's a couple of "shelves" in the response time to finish ANALYZE as you exceed L1/L2 CPU cache size and RAM size, then it trails downward as the seeks get longer and longer once the data you need is spread further across the disk(s). That the logical beginning of a drive is much faster than the logical end doesn't help either. I should generate that graph again one day somewhere I can release it at... -- Greg Smith, 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
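The back-of-envelope arithmetic above generalizes to a one-line model: once the table no longer fits in cache, each sampled page costs roughly one random seek.

```python
# Best-case ANALYZE time from random page reads alone; the 3000-page
# sample and 10ms seek figures are the ones used in the message above.
def analyze_seconds(sample_pages=3000, seek_ms=10.0):
    return sample_pages * seek_ms / 1000.0

print(analyze_seconds())              # 30.0 -- the 30 second best case
print(analyze_seconds(seek_ms=20.0))  # 60.0 -- slower seeks under load
```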
[HACKERS] Thread safety and libxml2
We got an interesting report on the testers list today: http://archives.postgresql.org/pgsql-testers/2009-12/msg0.php Basically, configure failed on their OpenBSD system because thread safety is on but their libxml2 wasn't compiled with threaded support: http://xmlsoft.org/threads.html Disabling either feature (no --with-libxml or --disable-thread-safety) gives a working build. I wonder if it's worthwhile to document this coupling between thread safety and libxml2 in either http://developer.postgresql.org/pgdocs/postgres/install-procedure.html or even the release notes. It seems quite likely to bite someone else again in the future. -- Greg Smith, 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] Thoughts on statistics for continuously advancing columns
Joshua D. Drake wrote: We normally don't notice because most sets won't incur a penalty. We got a customer who has a single table that is over 1TB in size... We notice. Granted that is the extreme, but it would only take a quarter of that size (which is common) to start seeing issues.

Right, and the only thing that makes this case less painful is that you don't really need the stats to be updated quite as often in situations with that much data. If, say, your stats say there's 2B rows in the table but there's actually 2.5B, that's a big error, but unlikely to change the types of plans you get. Once there's millions of distinct values, it takes a big change for plans to shift, etc.

-- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
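A toy illustration (nothing like the planner's real cost model) of why a 2B-vs-2.5B rowcount error rarely shifts plans: for a lookup on a column with millions of distinct values, the estimated rows per value is roughly reltuples/ndistinct, and a 25% error in reltuples moves that estimate far too little to flip an index scan into a sequential scan:

```python
# Hypothetical numbers: a 2B-row table whose stats are stale by 500M rows,
# with 5M distinct values in the lookup column.
def rows_per_value(reltuples, ndistinct):
    # Crude per-value row estimate, in the spirit of reltuples/n_distinct.
    return reltuples / ndistinct

stale = rows_per_value(2_000_000_000, 5_000_000)   # 400 rows per value
fresh = rows_per_value(2_500_000_000, 5_000_000)   # 500 rows per value

# Either estimate is a vanishing fraction of the table, so the choice of
# an index scan over scanning billions of rows doesn't change.
print(stale, fresh)  # 400.0 500.0
```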
Re: [HACKERS] parse tree to XML format
matt wrote: We are trying to gather statistics about our queries and get automatic suggestions for what indexes to utilize ...it's easier to figure that out on queries that are in some format, else we have to effectively parse the queries ourselves or hack the postgresql parser...which we don't want to do...

You don't want the parse tree at all then; you want the query plans, as shown by EXPLAIN, which is a completely different thing. I'm a bit concerned you've got blinders on about what path you're going to take to work on this problem. Getting EXPLAIN plans out in machine-readable format solves only a tiny fraction of the things you need to figure out in order to select better indexes. You'd be better off instrumenting your existing server with log analysis tools instead, whether or not they include that specific format, rather than chasing after a feature only added in a version you can't put into production yet. There are a couple of ways to log information about the queries that are taking a long time to execute listed at http://wiki.postgresql.org/wiki/Logging_Difficult_Queries that you can use to help sort through finding the queries that are taking a lot of resources. Another helpful bit you should know about is that you can save log files in CSV format, which makes them easier to import for later analysis: http://www.postgresql.org/docs/current/static/runtime-config-logging.html

Did you mention that the 8.5 code has such functionality? I would like to download the code and play with it a bit, any pointers what I need to do to get the XML?

http://developer.postgresql.org/pgdocs/postgres/sql-explain.html The "FORMAT XML" option is what you want here. Not sure how easy it is to combine that with auto-explain. I just wrote something yesterday about a tool I created to make testing these pre-releases easier at http://notemagnet.blogspot.com/2009/12/testing-postgresql-85-alpha3-with-peg.html that you might find helpful for your evaluation.
Unless you have a good way to simulate your production app against a test server, I'm not sure what wandering down this path will accomplish for you though.

-- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
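For what it's worth, the XML output is straightforward to post-process once you have it. The sketch below parses a hand-written approximation of what EXPLAIN (FORMAT XML) emits (element names are based on the 8.5-era format, including its XML namespace; verify against your server's actual output):

```python
# Sketch: pulling plan node types out of EXPLAIN (FORMAT XML) output.
# The sample XML here is a hand-written approximation of the real format.
import xml.etree.ElementTree as ET

sample = """<explain xmlns="http://www.postgresql.org/2009/explain">
  <Query>
    <Plan>
      <Node-Type>Seq Scan</Node-Type>
      <Relation-Name>test_ten_million</Relation-Name>
      <Total-Cost>169748.00</Total-Cost>
    </Plan>
  </Query>
</explain>"""

root = ET.fromstring(sample)
# ElementTree requires the namespace spelled out in Clark notation.
NS = "{http://www.postgresql.org/2009/explain}"
nodes = [n.text for n in root.iter(NS + "Node-Type")]
print(nodes)  # ['Seq Scan']
```

Collecting node types and costs this way across a workload is the easy part; deciding which indexes to add from that data is the hard part the thread is really about.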
Re: [PERFORM] [HACKERS] Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)
Andres Freund wrote: As I said, the real benefit only occurred after adding posix_fadvise(.., FADV_DONTNEED), which is somewhat plausible, because i.e. the directory entries don't need to get scheduled for every file and because the kernel can reorder a whole directory nearly sequentially. Without the advice the kernel doesn't know in time that it should write that data back, and it won't do it for 5 seconds by default on linux or such...

I know they just fiddled with the logic in the last release, but for most of the Linux kernels out there now pdflush wakes up every 5 seconds by default. But typically it only worries about writing things that have been in the queue for 30 seconds or more until you've filled quite a bit of memory, so that's also an interesting number. I tried to document the main tunables and describe how they fit together at http://www.westnet.com/~gsmith/content/linux-pdflush.htm

It would be interesting to graph the "Dirty" and "Writeback" figures in /proc/meminfo over time with and without this patch in place. That should make it obvious what the kernel is doing differently in the two cases.

-- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
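A minimal sketch of that sampling idea; the parsing is shown against a captured sample string so it runs anywhere, but on Linux you'd read /proc/meminfo in a loop (with a time.sleep between samples) and plot the two series:

```python
# Extract the "Dirty" and "Writeback" figures from /proc/meminfo text.
# Values in that file are reported in kB.
def dirty_writeback(meminfo_text):
    vals = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("Dirty", "Writeback"):
            vals[key] = int(rest.split()[0])
    return vals

# Sample captured output; on Linux use open("/proc/meminfo").read() instead.
sample = "MemTotal: 4040736 kB\nDirty: 14228 kB\nWriteback: 0 kB\n"
print(dirty_writeback(sample))  # {'Dirty': 14228, 'Writeback': 0}
```

With the fadvise patch you'd expect Dirty to drain promptly instead of sitting high until pdflush's expiry interval kicks in.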
Re: [HACKERS] Table size does not include toast size
Tom Lane wrote: Perhaps invent pg_table_size() = base table + toast table + toast index and pg_indexes_size() = all other indexes for table giving us the property pg_table_size + pg_indexes_size = pg_total_relation_size Right; that's exactly the way I'm computing things now, I just have to crawl way too much catalog data to do it. I also agree that if we provide pg_table_size, the issue of "pg_relation_size doesn't do what I want" goes away without needing to even change the existing documentation--people don't come to that section looking for "relation", they're looking for "table". Bernd, there's a basic spec if you have time to work on this. -- Greg Smith2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
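The proposed decomposition is easy to state as an invariant; here is a toy model with made-up byte counts (the real functions would read on-disk relation sizes from the catalogs):

```python
# Toy model of the proposed size functions. The invariant being designed:
#   pg_table_size + pg_indexes_size == pg_total_relation_size
# All numbers below are invented purely for illustration.
sizes = {
    "heap": 362_479_616,       # base table
    "toast": 120_000_512,      # toast table
    "toast_index": 2_105_344,  # toast table's index
    "indexes": 224_641_024,    # all other indexes on the table
}

pg_table_size = sizes["heap"] + sizes["toast"] + sizes["toast_index"]
pg_indexes_size = sizes["indexes"]
pg_total_relation_size = sum(sizes.values())

assert pg_table_size + pg_indexes_size == pg_total_relation_size
```

That property is what makes the pair intuitive: "table" covers everything TOAST-related, "indexes" covers the rest, and together they account for the total.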
Re: [HACKERS] Table size does not include toast size
Bernd Helmle wrote: I've created a C function a while ago that extracts the TOAST size for a given relation. This gave me the opportunity to do a pg_relation_size(oid) + pg_relation_toast_size(oid) for a given table oid to calculate the on-disk data size required by a table. Maybe we should include such a function in core?

Writing such a thing is already on my to-do list; it's absolutely a missing piece of the puzzle here. If you've got such a patch, by all means submit it. I just ran into my first heavily TOASTy database recently, and the way I'm computing sizes on the relations there is too complicated for my tastes, so it's completely unreasonable to expect regular users to do that.

To answer Rafael's concerns directly: you're right that this is confusing. pg_relation_size is always going to do what it does right now just because of how that fits into the design of the database. However, the documentation should be updated to warn against the issue with TOAST here. And it should be easier to get the total you'd like to see here: main relation + toasted parts, since that's what most DBAs want in this area.

-- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] Backup history file should be replicated in Streaming Replication?
Robert Haas wrote: I think (as I did/do with Hot Standby) that the most important thing here is to get to a point where we have a reasonably good feature that is of some use, and commit it. It will probably have some annoying limitations; we can remove those later. I have a feeling that what we have right now is going to be non-robust in the face of network breaks, but that is a problem that can be fixed by a future patch.

Improving robustness in all the situations where you'd like things to be better for replication is a never-ending job. As I understand it, a major issue with this patch right now is how it links to the client libpq. That's the sort of problem that can make this uncommittable. As long as the fundamentals are good, it's important not to get lost in optimizing the end UI here if it's at the expense of getting something you can deploy at all in the process. If Streaming Replication ships with a working core but a horribly complicated setup/failover mechanism, that's infinitely better than not shipping at all because resources were diverted toward making things more robust or easier to set up instead. Also, the pool of authors who can work on tweaking the smaller details here is larger than those capable of working on the fundamental streaming replication code.

-- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] Closing out CommitFest 2009-11
CommitFest 2009-11 is now closed, having committed 27 patches in 33 days. For comparison's sake, 2009-09 committed 20 patches in 29 days, 2009-07 37 patches in 34 days, and 2008-09 29 patches in 30 days. The much bigger 2008-11 involved 58 patches going on for months; the bulk of it committed 28 patches in 36 days. Seems pretty consistent at this point: at the average patch contribution size seen over the last year, about one of those gets committed per day once we enter a CommitFest. I didn't bother accounting for things that were committed outside of the official dates, so it's actually a bit worse than that, but that gives a rough idea that's easy to remember. Also, just based on the last three CFs, 42% of patches are either returned with feedback or rejected (with quite a bit more CF to CF variation). The working estimation figure I'd suggest is that once a CF reaches 50 incoming patches, it's unlikely it will finish in a month.

CommitFest 2010-01, the last one for 8.5, begins on January 15th, 2010. I'll be out of commission with projects by then, so unless Robert wants to reprise his role as CF manager we may need to get someone else involved to do it. Between the CF application and how proactive everyone involved is at this point (almost all authors, reviewers, and committers do the bulk of the state changes and link to messages in the archives for you), the job of running things does keep getting easier. And the guidelines for how to be the CF manager are pretty nailed down now--you could just execute on a pretty mechanical plan and expect to make useful progress. It's still a lot of time though. I've never had an appreciation for exactly how many messages flow through this list like I do now, after a month of needing to read and pay attention to every single one of them.
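Those per-fest numbers reduce to a commit rate as follows (just the figures quoted above, restated as patches per day):

```python
# CommitFest commit rates from the figures cited above:
# (patches committed, days the fest ran).
fests = {
    "2009-11": (27, 33),
    "2009-09": (20, 29),
    "2009-07": (37, 34),
    "2008-09": (29, 30),
    "2008-11 (bulk)": (28, 36),
}

rates = {name: patches / days for name, (patches, days) in fests.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.2f} patches/day")
# Every fest lands near one patch committed per day, the rule of thumb above.
```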
For those of you still furiously working on a patch with that deadline: if you have a large patch and it hasn't already been reviewed in a previous CommitFest, I wouldn't give you good odds of it even being looked at during that one. There doesn't seem to be any official warning of this where people will likely notice it, but this topic has been discussed on the list here. Large patches submitted just before the deadline for a release have not fared very well historically. Recognizing that, there's really no tolerance for chasing after them (at the expense of postponing the beta) left for this release. Just figured I'd pass along that warning before somebody discovers it the hard way, by working madly to finish their submission up only to see it get kicked to the next version anyway.

-- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] new CommitFest states
Robert Haas wrote: On Mon, Dec 14, 2009 at 12:38 PM, Greg Smith wrote: Robert Haas wrote: I don't think there should be a transition from Returned with Feedback back to Waiting for review. Granted we might allow that occasionally as an exceptional case, but normally Returned with Feedback is a final state. The main reason I put that in there is that sometimes a reviewer or even the CF manager (I did this myself once this time) will mark something "Returned with feedback", thinking there's no way the issues pointed out can be addressed right now. And then, a day or two later, in comes a patch that does just that; surprise! Hmm, I'm not aware of any actual cases of this. I'm usually pretty conservative about jumping to RWF unless there's been lag or we're near the end of the CommitFest, so it doesn't come up.

I've concluded that the times this happened were just me being too aggressive about closing some patches out after getting behind, and I removed the path you objected to from the page so as not to encourage that behavior. I think that http://wiki.postgresql.org/wiki/Running_a_CommitFest makes for a pretty reasonable and quite detailed set of guidelines now for the whole process, which means we've successfully gotten "what Robert did to make things work well" documented fully. All it's missing is for the "Discussing review" state to be an official one. I could undo things back to where it's not listed, but I do think it matches what we really do better and might as well be recognized as such.

-- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] Closing out CommitFest 2009-11
On Tue, Dec 15, 2009 Greg Smith wrote: Sounds like we just are waiting for Simon to finish up, which is expected to happen by tomorrow, and for Tom to wrap up working on the ProcessUtility_hook. That makes the first reasonable date to consider alpha3 packaging Thursday 12/17 I think. Update: I just nagged Simon about this some more, as I know everyone is waiting for this and he's too deep in the code to be real talkative. It sounds like triage around the issues raised in the "An example of bugs for Hot Standby" thread is finished at this point, and he's still hopeful to get this wrapped up and committed this week--which given this is Friday means real soon now. -- Greg Smith2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] PostgreSQL project policy compendium
Kurt Harriman wrote: On 12/16/2009 7:10 AM, Tom Lane wrote: the project policy is to not require any compiler features not found in C89. Is there somewhere a compendium of such policies which fledgling hackers should consult to avoid embarrassment?

The lists of suggestions at http://wiki.postgresql.org/wiki/Developer_FAQ and http://wiki.postgresql.org/wiki/Submitting_a_Patch are what we've got now. There are a number of us who try to record any interesting discussion bits like this one into one of those, to shorten the future learning curve for others.

-- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] Closing out CommitFest 2009-11
Robert Haas wrote: On Tue, Dec 15, 2009 at 12:27 PM, Greg Smith wrote: -New VACUUM FULL I get the impression there is still some discussion that needs to happen about the design of this. I think we should mark it Returned with Feedback for now, and let whoever ends up working on it resubmit whatever ends up getting agreed on.

Given that some of the issues here intertwine with Hot Standby integration, Simon has already said he'll work on getting this committed, executing on the plan outlined at http://archives.postgresql.org/message-id/1260843982.1955.3436.ca...@ebony I just pinged him, and even the first useful step there isn't going to wrap up in time for this CommitFest and therefore alpha3. Accordingly, this one is going to stay in "ready for committer" but slip past stamping the end of this CF. I've marked him as the committer there and bounced it forward to the next CF.

Simon is still burning the midnight oil to get us a version of HS that can be committed; the hope is that it's ready to go tomorrow. I think the best we can do here is to block and/or document any known limitations/bugs; so long as the feature works for almost all use cases, it seems worth including now. Given that there's a working VF rewrite patch that just needs final integration and testing, it seems reasonable to me to commit HS with a warning that VF still has open issues when combined with that feature. Then everyone can continue to work through removing any reason you'd need the problematic form of VF anyway, so that the caveat turns into a non-issue. I'm more concerned about making sure we get plenty of testing in on HS *alone* to make sure there aren't any non-HS regressions introduced by its changes, before that gets even more complicated by adding SR on top of it.

As for the tsearch improvements, not to trivialize the patch, but I think this one will survive being committed between alpha3 & CF 2010-01 if it doesn't make it in this week.
Teodor can work on getting that committed when he has time, I don't think we need to wait for it. Sounds like we just are waiting for Simon to finish up, which is expected to happen by tomorrow, and for Tom to wrap up working on the ProcessUtility_hook. That makes the first reasonable date to consider alpha3 packaging Thursday 12/17 I think. -- Greg Smith2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
[HACKERS] Closing out CommitFest 2009-11
We're down to five patches that are ready for a committer still on the table: -New VACUUM FULL -tsearch parser inefficiency with urls or emails -ProcessUtility_hook -Aggregate ORDER BY support -Hot Standby I just bounced "Streaming Replication" forward to the next CF, and specifically noted the concerns about it linking with libpq. That one seems like it needs more time to brew. Also moved forward all of the still open ECPG patches. With a set of new code just showing up today, I don't think anyone wants to hold up this CF waiting for those to get processed completely. Michael can obviously keep working on them as he gets time. -- Greg Smith2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [patch] executor and slru dtrace probes
Zdenek Kotala wrote: Bernd Helmle wrote on Mon, 14 Dec 2009 at 20:42 +0100: Oh, and I was under the impression the last discussions were about executor probes only (note the patch is split up into two parts now, SLRU and executor probes). The latter won't be fixed, but it seems the SLRU part at least is ready. I would like to add a SimpleLruInit probe. I'm busy with PG packaging, but I hope that I will send an updated version tomorrow.

I'd like to see as many of these DTrace probes as possible make it into 8.5, and it's great that you've picked up these older ones and updated them. It looks like a lot of progress was made on actually measuring the overhead of adding these probes even when they're not enabled, which is good. But I'm afraid we're already out of time for this one if you're still tweaking the probes here. With a functional change like that, our normal process at this point would be to have the reviewer re-evaluate things before they head to a committer, and I don't feel like this patch is quite at 100% yet--in particular, the probe documentation is improving but still a bit rough. I don't feel like we're quite ready to mark this one for commit, and today we really want to clear the queue of things for committers to deal with.

Please send that updated version, and let's keep working on this into the next CommitFest, where it will be at the front of the queue rather than at the tail, as it ended up this time based on its submission date. You're not really getting a fair chunk of time here between your review and the end because of problems lining up reviewer time; that shouldn't happen next time.

-- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] Hot Standby, release candidate?
Simon Riggs wrote: On Mon, 2009-12-14 at 11:44 +0200, Peter Eisentraut wrote: On mån, 2009-12-14 at 08:54 +0000, Simon Riggs wrote: Wednesday because that seemed a good delay to allow review. Josh and I had discussed the value of getting the patch into Alpha3, so that was my wish and aim. I'm not aware of any particular date for end of commitfest, though possibly you are about to update me on that? Commit fests for 8.5 have usually run from the 15th to the 15th of next month, but the CF manager may choose to vary that. FWIW, the alpha release manager may also vary the release timeline of alpha3. ;-) I'm hoping that the alpha release manager can wait until Wednesday, please.

At this point we've got 5 small to medium sized patches in the "Ready for Committer" queue. Maybe that gets all done on Tuesday, maybe it slips to Wednesday or later. It's not like a bell goes off tomorrow and we're done; there's probably going to be just a little slip here. In any case, it's certainly not the case that this is all done right now such that the alpha gets packaged on Tuesday just because it's the 15th. It sounds like the worst case is that the alpha would have to wait a day for Hot Standby to be finally committed, which seems well worth doing if it means HS gets that much more testing on it. It would be a help to eliminate the merge conflict issues for the Streaming Replication team by giving them only one code base to worry about merges against. And on the PR side, announcing HS as hitting core and available in the alpha is huge.

-- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] XLogInsert
Jaime Casanova wrote: So in this extreme case avg tps is just 6 transactions better

Great job trying to find the spot where the code worked better. I'm not so sure I trust pgbench results where the TPS was so low though. Which leads us right back to exactly how Jeff measured his original results. As I said already, I think we need more insight into Jeff's performance report, a way to replicate that test, and a look at the latency as reported by the updated LWLock patch that Pierre submitted. Tweaking your test to give more useful results is a nice second opinion on top of that. But we're out of time for now, so this patch is getting returned with feedback. I encourage Jeff to resubmit the same patch or a better one with a little more data on performance measurements to our final 8.5 CommitFest, in hopes we can confirm this is an improvement worth committing.

-- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] pgbench: new feature allowing to launch shell commands
Takahiro Itagaki wrote: Michael Paquier wrote: Yeah, the grammar ":variable" is used to set a parameter; the user will feel disoriented if there is no support. The attached patch supports this new case, based on Itagaki-san's last version. What is the spec of "\setshell :variable"? Literally interpreted, it declares a variable with the name ":variable". But pgbench cannot use such variables, because only variables named with letters, numbers, and _ are accepted. Should we forbid assigning variables whose names contain invalid characters instead?

After reviewing this a bit more I've realized my idea wasn't very good here. If this is what we're already accepting: \set nbranches :scale It's completely reasonable to say that *only* this works: \setshell aid expr 1 + :naccounts While this does not: \setshell :aid expr 1 + :naccounts And I was wrong to ask that it should. Sorry to have led you both around over this, my bad.

Proposed patch attached. It checks whether variable names consist of isalnum characters or _. The check is only performed when a new variable is assigned, to avoid overhead. In a normal workload, variables with the same names are re-used repeatedly. I don't think the additional check would decrease performance of pgbench.

This is interesting, but we're already past where this should have wrapped up and I'm not sure if it's worth fiddling with this code path right now. I think your idea is good, just outside of what we should fool with right now, and I'm out of time to work on testing it further too. How about this: the version you updated at http://archives.postgresql.org/pgsql-hackers/2009-12/msg00847.php , your pgbenchshell_20091210.patch, worked perfectly for me except for one complaint I've since dropped. How about we move toward committing that one, and maybe we look at this whole variable name handling improvement as a separate patch later if you think it's worth pursuing?
At this point I'd just like to have a version of the shell/setshell feature go into the pgbench code without dragging things out further into new territory. -- Greg Smith2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
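The variable-name rule being discussed (letters, digits, and underscore only) amounts to a check like the sketch below. The real implementation is C inside pgbench; note also that Python's isalnum accepts any Unicode letter while C's isalnum is narrower, so this is only the shape of the logic:

```python
# Sketch of pgbench's variable-name rule: accept only names made of
# letters, digits, and underscore. A ":"-prefixed name like ":aid" is
# rejected -- callers that accept ":aid" syntax must strip the colon
# before assignment.
def is_valid_pgbench_variable(name):
    return bool(name) and all(c.isalnum() or c == "_" for c in name)

print(is_valid_pgbench_variable("aid"))      # True
print(is_valid_pgbench_variable(":aid"))     # False
print(is_valid_pgbench_variable("n_accts"))  # True
```

Performing the check only on first assignment, as the patch does, keeps it off the per-transaction hot path.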
Re: [HACKERS] pgAdmin III: timestamp displayed in what time zone?
Fred Janon wrote: Thanks Greg, at least among the socially unacceptable insults I got the right list to post my question in.

Aww, is somebody having a case of the Mondays? Frankly, that was all useful advice. I was pointing out that despite what you thought, you might actually be decreasing your chances of getting an answer to your question in the future if you cross-post the way you did. Here's an equally helpful hint: when you have a question you'd like answered, don't insult someone in the group who spends their time every day answering questions in that area, particularly if they've actually given you a useful suggestion. Next time, I might be the best qualified to help you out, and you haven't really endeared yourself to me with your comments here.

BTW, this list is listed as the list for tech questions in the pgAdmin tips, therefore if you don't want to be disturbed, you might want to remove it from the pgAdmin tips.

When I look at http://www.pgadmin.org/support/ for example, it suggests the right list. I only see this one listed in the Translation section, as the place to ask to get added to the translators list. Does anyone know where the tips section suggesting people send tech questions to pgsql-hackers he's referring to is at? That seems like it should be cleaned up to point to the pgAdmin list instead, if it's floating around there somewhere. Fred, if you could recall exactly what path you followed to end up here and let us know, we can try to keep someone else from being confused and instead direct them to the right place. It would be a nice gesture on your part to end our conversation here having done something useful, rather than with you just venting at me.

-- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] pgbench: new feature allowing to launch shell commands
Takahiro Itagaki wrote: Heavily cleaned up patch attached. Please review it. This is almost there. The refactoring work you both did is exactly how I was hoping this would end up looking, code-wise, and the feature set is basically feature complete except for one small UI concern I have. Attached is an updated version of my skewed-select.sql test program, adjusted to take advantage of the now supported syntax. Note that in order to run this properly, you need to pass the database scale into pgbench when you run it like this: pgbench -T 10 -j 4 -c 8 -s 10 -f skewed-select.sql pgbench Because when you use a custom "-f" script, :scale isn't set otherwise. In the current version of the patch, you can do this: \setshell aid skewed-acct.pl :naccounts But you can't do this: \setshell :aid skewed-acct.pl :naccounts What's worse is that you don't get an error in that second case--it just doesn't set the variable. I thought this was fixed by one of Michael's versions here, so maybe this is a regression from the other clean-up? Since all the other ways you interact with this type of variable in pgbench require the ":", not supporting it in this context seems like a recipe for weird problems for those trying to use it. If we can get an updated version where you can use either syntax for the variable name passed to setshell, this one will be ready for commit I think. I was tempted to fix the bug myself, but I seem to be doing better as a tester on this patch rather than working on its development. Everything else--docs, code--looks pretty good now. Only thing we might add is a warning that you're going to completely tank your pgbench results by using the high-overhead shell command if your test would otherwise be limited by CPU. This feature is only really useful as far as performance testing goes if you're using it on an I/O bound test. 
I have some more pgbench hints to apply at some point in the future, so it wouldn't be a problem to skip this for now; I can bundle it into that section later.

-- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com

\set naccounts 10 * :scale
-- This works
\setshell aid skewed-acct.pl :naccounts
-- But this doesn't and should:
-- \setshell :aid skewed-acct.pl :naccounts
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
\shell echo ::aid :aid selected from :naccounts accounts
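The skewed-acct.pl helper used in the script above never appears in the thread. Purely as a hypothetical stand-in, a skewed account-id generator (the kind that makes a \setshell-driven test exercise hot cached rows more than a uniform pgbench run would) might look like:

```python
# Hypothetical stand-in for the skewed-acct.pl helper: emit an account id
# in [1, naccounts] with low ids heavily favored. The power-law shape and
# the skew exponent are invented for illustration.
import random

def skewed_aid(naccounts, skew=3.0, rng=random.random):
    # Raising a uniform draw in [0,1) to a power clusters values near 0,
    # i.e. toward the low account ids.
    return 1 + int((rng() ** skew) * naccounts)

random.seed(42)
ids = [skewed_aid(1_000_000) for _ in range(5)]
assert all(1 <= i <= 1_000_000 for i in ids)
print(ids)
```

In the \setshell context, the script would print its single result to stdout so pgbench could capture it into the variable.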
Re: [HACKERS] new CommitFest states
Robert Haas wrote: I don't think there should be a transition from Returned with Feedback back to Waiting for review. Granted we might allow that occasionally as an exceptional case, but normally Returned with Feedback is a final state. I did throw some disclaimers in the notes about this particular subject at the bottom of the table. The main reason I put that in there is that sometimes a reviewer or even the CF manager (I did this myself once this time) will mark something "Returned with feedback", thinking there's no way the issues pointed out can be addressed right now. And then, a day or two later, in comes a patch that does just that; surprise! Since it seems to happen anyway, and I'd prefer not to get in the position where people are screaming "you threw me out with 'RWF' unfairly", I thought it was better to accept that possibility so long as the whole thing is tightly bounded as far as how much time the author has to do it. (Also, Waiting for review is actually the wrong name for the state it's trying to talk about.) Uh, what are you talking about here? -- Greg Smith2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby, release candidate?
Heikki Linnakangas wrote: I note that if it was easy to run pgindent yourself on a patch before committing/submitting, we wouldn't need to have this discussion. I don't know how hard it is to get it working right, but I recall I tried once and gave up. What sort of problems did you run into? -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] new CommitFest states
Kevin Grittner wrote: http://wiki.postgresql.org/wiki/Running_a_CommitFest It seems to me that a patch could move from "Discussing review" to "Needs review" -- if the reviewer decided to discuss the approach before continuing the review process and the discussion confirms the approach as viable. In that case, the patch would be in "Needs review" the whole time. "Discussing review" is intended to be an "I'm done but not sure of the next step for this patch" state the reviewer can use. In the situation you described, the patch would never have left "Needs review". I just made that clearer by documenting that it's shorthand for "discussing review results". I also added a transition path for a similar situation though, where the discussion concludes the reviewer didn't do the right thing in the first place (even though they thought they did) and they return to reviewing after realizing what was missing. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] pgAdmin III: timestamp displayed in what time zone?
Fred Janon wrote: Sorry if it's a double post, but I thought that it would be more likely I would get an answer on the hackers list. In that case just posting here would have been better than hitting both. I usually ignore any request for help that is posted on more than one list just to draw attention to itself; I wouldn't be surprised if there are other people who are similarly annoyed by the behavior and do likewise. In your case, both lists you picked were the wrong ones, since you're asking a pgAdmin-specific question which isn't the direct topic of any of the pgsql-* lists; pgadmin-support was the right place for it: http://archives.postgresql.org/pgadmin-support/ -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] new CommitFest states
I didn't really get any feedback on my proposal to add a new "Discussing review" state to help out the reviewers and CF manager. To show how adding it helps track the common flow of patches through the system, I turned the whole CF into a big state machine and marked how the transitions should normally happen at http://wiki.postgresql.org/wiki/Running_a_CommitFest If you carefully read the "Followup" section, which Robert wrote initially, you can see it implies such a state exists via the "Bogged down in discussion" comments. I'd just like a way to directly flag patches that are entering discussion because the reviewer isn't sure what happens next, to make this whole process more mechanical, fair, and predictable to the bulk of the participants (reviewers and authors). I wasn't able to figure out how to do that without adding this additional state to the system. Robert's main objection to this idea--that at any point there should be one obvious person with the next action to take, and authors should take more responsibility for their patches--is a good ideal. But since the CF manager ends up being the backstop for breakdowns in the process, they're always a second possible source for transitions. I'd rather they be a more predictable source, for a number of reasons. Note that a major secondary goal here is to make it easier to bring on board new CF managers and have them hit the ground running, by having near-deterministic suggestions for much of what they need to do. It's also not a coincidence that some of the more boring parts of that job step toward being specified well enough that one might start automating them too. How to construct a simple "nag bot" that mails pgsql-rrreviewers to perform most of the CF actions listed here is pretty obvious working from this table.
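To make the state-machine framing concrete, here is a minimal sketch of how a transition table plus the imagined nag-bot check might be expressed; the exact state set, transitions, and idle threshold here are illustrative guesses, not a copy of the Running_a_CommitFest page:

```python
# Allowed patch-state transitions as an adjacency map. A nag bot (or a new
# CF manager) can mechanically answer "what may happen next?" from any state.
TRANSITIONS = {
    "Needs Review": {"Discussing Review", "Waiting on Author",
                     "Ready for Committer", "Returned with Feedback"},
    "Discussing Review": {"Needs Review", "Waiting on Author",
                          "Returned with Feedback"},
    "Waiting on Author": {"Needs Review", "Returned with Feedback"},
    "Ready for Committer": {"Committed", "Needs Review"},
    "Returned with Feedback": {"Needs Review"},  # the exceptional resubmit case
    "Committed": set(),
}

def can_move(src, dst):
    # True when the table allows moving a patch from src to dst.
    return dst in TRANSITIONS.get(src, set())

def stale_patches(patches, today, max_idle_days=5):
    # Hypothetical nag-bot helper: patches sitting too long in a
    # non-final state, as {name: (state, last_update_date)}.
    final = {"Committed", "Returned with Feedback"}
    return [name for name, (state, last_update) in patches.items()
            if state not in final and (today - last_update).days > max_idle_days]
```

A nag bot built on this would just run something like stale_patches() daily and mail the offenders' reviewers.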
-- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] Adding support for SE-Linux security
I just did a round of integrating some of the big-picture feedback that has shown up here since the meeting into http://wiki.postgresql.org/wiki/SEPostgreSQL_Review_at_the_BWPUG , mainly supplementing the references in the "Works outside of SELinux" section with the new reading suggested by Stephen Smalley and Joshua Brindle. I'm trying to keep that a fairly readable intro to the controversial parts rather than going deeply technical. What I'm not going to try to track is all the low-level implementation details that are bouncing around right now; my brain is too full this week to cram more OID trivia into it. That would be a good thing for someone to summarize eventually and then throw onto the wiki somewhere else, so that it's easier to remember the context of what/why decisions were made. The way Simon has been keeping an ongoing log at http://wiki.postgresql.org/wiki/Hot_Standby shows a reasonable way to organize such a thing for a similarly complicated patch. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] [PATCH] dtrace probes for memory manager
Tom Lane wrote: It's amazing to me that we've never gone back and improved on the original quick-and-dirty MemoryContextStats mechanism. I certainly find myself using that a lot for issues like tracking down memory leaks. That code hasn't really gone anywhere since Neil tweaked the indentation two years ago. What sorts of improvements were you looking for? I started on trying to improve this area at one point but didn't get far. My first step was going to be just wrapping the call into a UDF to make it easier to reach without having to set gdb loose. I thought that might expand the possible uses for MemoryContextStats to "help find a memory leak safely on the production box", and therefore attract more attention to improving it. People really don't like running gdb on production, but a UDF dumping the same data wouldn't seem so risky. As Daniel Farina pointed out to me one day in a "duh" moment, that idea is quite obviously doomed by the fact that people want dumps from processes that you won't be talking to in a UDF context. You won't be able to find a leak in the background writer by dumping the context the UDF can see. There would need to be some sort of IPC/signaling mechanism in each process if you wanted it to work everywhere, and once that realization hit I realized this was a deeper problem than it appeared. If you want to make it easier to export into user space, it seems to me the memory context stats data either needs to get pushed somewhere it's easier to get to (the way the regular stats are), or there needs to be some way of polling any random PG process to find it--presumably by passing the request the pid of the process you want to instrument. Neither of which may really be a reasonable idea at all, particularly since in most cases you're using a debugger to track your leak down only after reproducing a test case for it.
-- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] Adding support for SE-Linux security
Stephen Frost wrote: I agree with this- one issue is, unfortunately, an overabundance from KaiGai of "code-writing man-power". This is an odd situation for this community, in general, so we're having a hard time coming to grips with it. There are plenty of parallels to when Zdenek was writing a ton of in-place upgrade code faster than anyone else was fully consuming it. The "do it right or don't do it at all" approach of the PG community seems particularly hard to reconcile with larger patches from people we don't get enough face time with. It's easy to get deadlocked without a good way to navigate out when faced with a set of difficult decisions and only electronic communications between participants. Shoot, you and Robert have spent time hashing out technical arguments in person and we still got a little rough patch on-list this week out of the debate. I hate to even use this terminology, but these big patches seem to need a project manager advocate sometimes: someone who knows everyone well enough to clear these stalled spots, smooth over any personality conflicts, and is motivated to intervene because they need the feature. I see it as a sort of scaling problem that might be a recurring one. It would be nice if we could all get better at identifying when it does happen, and perhaps find someone to help with planning before we waste so much time chasing code that isn't going to be accepted yet again. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] thread safety on clients
Tom Lane wrote: Also, the reason why Bruce's mistake exposed this is interesting. Omitting the #define for ENABLE_THREAD_SAFETY does not actually break "pgbench -j" at all -- it has a fallback strategy that uses multiple subprocesses instead of multiple threads. However, it has only one srandom() call, which occurs in the parent process before forking. This means that the subprocesses all start with the same random number state, which means they generate the same sequence of "random" account IDs to update. We just got that as a bug report the other day too, with suggested fixes: http://archives.postgresql.org/pgsql-hackers/2009-12/msg00841.php I'm inclined to think this is bad, and we should fix pgbench to re-randomize after forking. If we don't, we'll have synchronized behavior on some platforms and not others, which doesn't seem like a good idea. On the other hand, if you did want that type of behavior, it's hard to see how you'd get it. Is it worth trying to provide that as a (probably non-default) mode in pgbench? If so, how would you do it on threaded platforms? It sounds like a random pgbench run for a while would certainly expose the same thing you're concerned about eventually. I doubt it's worth the trouble to codify a bug just because it found another bug (it's hard enough to maintain code that only has to work right). If we want to stress this behavior, it would be easier to just test with a bunch of clients going through a scale=1 database and use enough transactions to make update collisions likely. I'm working on a guide to stress testing new alpha builds with pgbench that will be ready in time for alpha3. I could easily add that as one of the suggested tests to run if you're concerned about getting more test coverage of that code path.
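A small sketch of the failure mode under discussion (Python standing in for pgbench's C, with made-up seeds and pids): workers that inherit a single parent seed replay identical account-id streams, while mixing something per-worker into the seed--one way to "re-randomize after forking"--restores independent streams:

```python
import random

def account_ids(seed, naccounts=100000, n=5):
    # Simulate one pgbench worker's stream of random account ids.
    rng = random.Random(seed)
    return [rng.randrange(1, naccounts + 1) for _ in range(n)]

# Buggy pattern: seed once in the parent, then fork -- every child
# inherits the same PRNG state, so all "random" streams are identical.
parent_seed = 12345
child_a = account_ids(parent_seed)
child_b = account_ids(parent_seed)

# Fix: reseed per child after forking, e.g. by mixing in its pid
# (1001 and 1002 are made-up pids for illustration).
fixed_a = account_ids(parent_seed ^ 1001)
fixed_b = account_ids(parent_seed ^ 1002)
```

The first pair of streams collide on every transaction's target row; the second pair spread their updates out the way independent clients would.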
-- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] SE-PostgreSQL/Lite Review
Robert Haas wrote: One comment I have in general about this process is that I think it would enormously reduce the level of pain associated with making these kinds of changes if we could get patches that were not full of minor issues that need to be cleaned up (like comments not properly adjusted, spelling/grammar errors, formatting inconsistent with the rest of the code base, poor English). Sure, it's not fair to ask the committers to do that. But it's not really something we can expect to get in all situations from our English-as-a-second-language contributors either--it's hard enough to get these details right even on stuff done in this country by native speakers. At least we've got a big stack of comments that usually make sense if you read them to work with on the SE-PostgreSQL patches, which is much better than the alternative. I think we need a two-pronged attack on this issue. Eventually I think someone who wants this feature in there will need to sponsor someone (and not even necessarily a coder) to do a sizable round of plain old wording cleanup on the comment text of the patch. Robert is right to point out the committers are too expensive, in terms of what work for the community they could be doing instead, to waste on that job. English editing good enough to help a lot for this purpose isn't hard to find, though. As far as formatting goes, it drives me a little crazy that it's not more common practice for contributors to do their own targeted pgindent runs to at least nail some of the low-hanging fruit in that area, due to the difficulty of working with the tool. That's something else I think it would be nice to invest a little time in improving. For example, there's no reason we should still have to kick back patches regularly just because of tab/space issues, and I know I run into one of those nearly every CommitFest.
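As a toy illustration of the sort of low-hanging-fruit check contributors could run before submitting--this is not pgindent, just a simplistic screen for the tab/space mixups mentioned above, assuming the tab-indented style PostgreSQL uses:

```python
import re

def flag_space_indented_lines(lines):
    # PostgreSQL C sources indent with tabs; flag lines whose indentation
    # includes a run of four or more spaces where a tab belongs.
    return [lineno for lineno, line in enumerate(lines, start=1)
            if re.match(r"^\t* {4,}", line)]
```

Running this over a patch's changed files would catch the most common "kick it back" formatting complaint before a committer ever sees it; real continuation-line alignment rules are more subtle than this one regex.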
-- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] Adding support for SE-Linux security
Tom Lane wrote: It's been perfectly clear since day one, and was reiterated as recently as today http://archives.postgresql.org/message-id/4b21757e.7090...@2ndquadrant.com that what the security community wants is row-level security. I think David Quigley's comments from earlier today summarize the situation better than I did: "For our purposes in DoD we need the MAC Framework and the row based access controls. But if a good starting point is to just do the access control over the database objects, then it will be useful for some commercial cases and some limited military cases." So it's not without value even in its current "Lite" form. But there's clearly a whole lot more use-cases that would benefit from a version with row filtering. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] SE-PostgreSQL/Lite Review
Stephen Frost wrote: * Greg Smith (g...@2ndquadrant.com) wrote: I have to be honest and say that I'm not optimistic that this is possible or even a good idea to accomplish in the time remaining during this release. While I agree with you, I wish you hadn't brought it up. :) Mostly because I feel like it may discourage people from wanting to spend time on it due to desire to focus on things which are likely to make it into the next release. The reason I mentioned it is that I was getting a general feeling that the community at large was going to view not getting the patch into 8.5 as a final straw for working on it. I wanted to be clear that the scope of doing this right may extend beyond that, but that shouldn't be a reason to give up on it. I think KaiGai's comments about how it took 3 years to get SELinux similarly integrated into the Linux core rather than shipping as an add-on are encouraging, in that the way and degree to which we're struggling with this topic is not something we should consider extraordinary and therefore a failing. This is not an easy thing to do for anyone who tries it. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] Adding support for SE-Linux security
Tom Lane wrote: My guess is that a credible SEPostgres offering will require a long-term amount of work at least equal to, and very possibly a good deal more than, what it took to make a native Windows port. Wow, if I thought that was the case I'd be as negative about the whole thing as you obviously are. In my head, I've been mentally bounding the effort by thinking that its worst-case work would be more like what it took to add role-based security to the system. I'd think that adding a new feature to the existing security setup couldn't be more painful than adding security in the first place, right? I didn't carefully watch either play out, but I was under the impression that the Windows port was quite a bit more work than that. Since the current discussion keeps going around in circles, the question I was trying to tilt the other thread I started towards was "what would need to change in the current PostgreSQL code to make the impact of adding the SEPostgreSQL code smaller?" I'd be curious to hear any thoughts you had on that topic. We already sort of refactored out "adding row-level security" as one answer to that; I feel like there may be others in there too. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
Re: [HACKERS] thread safety on clients
Bruce Momjian wrote: Peter Eisentraut wrote: if i remove the -j option then it runs without a problem Possibly related to the incomplete removal of the enable-thread-safety option that I just posted about. I thought about that but I can't figure out how that would affect pgbench. The "-j" option is the recent addition to pgbench that causes it to launch multiple client threads when enabled, each handling a subset of the transactions. There are blocks of code in pgbench.c now that depend on having sane values for thread safety in libpq. That it may be detecting the wrong thing and operating in an unsafe way after the recent change is what Peter's suggesting. This is good, actually, because I don't think we had many client-side thread-safety tests floating around to catch problems in this area before. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
[HACKERS] CommitFest 2009-11: Reviews complete
We have a couple of lingering re-reviews that are still ongoing (myself on "pgbench setshell" and the two DTrace ones), and the ECPG patches are still hanging around, but we're basically finished with reviews for all submitted patches at this point. So far 12 patches have been committed since the official CF started, and there's now 8 queued up as "Ready for Committer"--not counting the HS/SR pair or the ones still being worked on. That means the work on the committers' part is at best around halfway done, so please don't start throwing new material at the list just yet. This is particularly true given that I think there's still potential for HS or SR to make it in this round; I know I'd like to see one or both available in the easier-to-test alpha3 build. Thanks to all our reviewers for helping cut down on the amount of work our committers have to do. It certainly feels to me like the average quality of patches making their way through to our committers keeps inching upwards, so they spend less time fighting things like patch bit-rot and stuff that just plain doesn't work, and more time doing serious review work. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com