Re: [HACKERS] pgbench throttling latency limit
On 10/13/14, 1:54 PM, Heikki Linnakangas wrote:
> Greg Smith, I'd still appreciate it if you could take a look at this, to check how this will work for pgbench-tools.

I'll do a QA pass on the committed version looking for issues, and update the toolchain I publish to be compatible with it along the way too.

--
Greg Smith greg.sm...@crunchydatasolutions.com
Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal for 9.5: monitoring lock time for slow queries
On 8/13/14, 5:14 AM, MauMau wrote:
> I'm interested in b, too. I was thinking of proposing a performance diagnostics feature like Oracle's wait events (V$SYSTEM_EVENT and V$SESSION_EVENT). So, if you do this, I'd like to contribute to the functional design, code and doc review, and testing.

I already wrote up a design like this once: http://www.postgresql.org/message-id/509300f7.5000...@2ndquadrant.com

The main problem when I tried to code it was figuring out how to store the data. When I wrote that, I thought I could just stuff it into a shared memory block the way pg_stat_statements did. That didn't really work out. I think it's manageable now, because the Dynamic Shared Memory management written since then has the right sort of shape to do the job. This is one of those jobs where I think the coding itself is the hard part, not the design nor the review.

What I really want is something that dumps this data into memory, with a second process that persists it to disk in batches. I think that's the only way we'll get high performance on reads while still saving enough data to be useful after a crash. We're working on getting a few things in this area fully funded so we can dig into them harder. The idea of designing for high-speed audit logging into memory and then persisting to disk has a lot of overlap with this one, and that may get picked up as well.
Re: [HACKERS] Fixed xloginsert_locks for 9.4
On 10/3/14, 8:26 AM, Andres Freund wrote:
> #define NUM_XLOGINSERT_LOCKS 1
> tps = 52.711939 (including connections establishing)
> #define NUM_XLOGINSERT_LOCKS 8
> tps = 286.496054 (including connections establishing)
> #define NUM_XLOGINSERT_LOCKS 16
> tps = 346.113313 (including connections establishing)
> #define NUM_XLOGINSERT_LOCKS 24
> tps = 363.242111 (including connections establishing)

Just to clarify: that 10% number I threw out was meant as a rough estimate for a system with the default configuration, which is all that I tested. It seemed likely that people would need to tune all the usual things like checkpoint_segments, shared_buffers, etc. before seeing much better. You did all that, and sure enough the gain went up; thanks for confirming my guess.

I still don't think that means this needs a GUC for 9.4. Look at that jump from 1 to 8. The low-hanging fruit here hasn't just been knocked off. It's been blended into a frozen drink, poured into a glass, and had a little paper umbrella put on top. I think that's enough for 9.4. But, yes, let's see if we can add delivery to the side of the pool in the next version too.
Re: [HACKERS] Fixed xloginsert_locks for 9.4
On 10/3/14, 10:11 AM, Andres Freund wrote:
> So 25% performance on a relatively small machine improvements aren't worth a GUC? That are likely to be larger on a bigger machine? I utterly fail to see why that's a service to our users.

I didn't say that. I said I don't think they're worth a GUC today if the change can be quietly and automatically slipped into the next release--and that seems quite feasible. I have introduced GUCs that almost no one can tune properly into the system before. Can't say I was pleased with how that played out.

Another thing I don't know yet, and this is going to take me a while to characterize, is how that 25% gain--on a benchmark specifically designed to highlight the problem use case--plays out across the mixes of more average cases I try as well. Is it 0.1% for a typical pgbench workload? Then that GUC isn't so exciting anymore either. And there's one more big issue I'd prefer not to discover from real-world complaints: is there any downside to making this number very large on a system where it shouldn't be?

The history of settings like this says that providing an exposed knob will result in some people tinkering with it and making the system slower. The gain of going from 1 to 8 is so clear and simple that I'm not too worried about performance regressions like that. But if we make it a GUC and it can be set to 100 to extract maximum performance on big machines...I'd take a bet that we'll find people setting it to 100, saying "more must be better" where it isn't. Every time someone wanders into pgsql-performance with a complaint, each one of these obscure GUCs they tweaked magnifies the troubleshooting mess a little.

I do not disagree with you fundamentally here: this *is* worth refining further, for sure, and the gains might be even greater if that keeps going.
My guess is just that the Postgres community would not get a net benefit from chasing this as a GUC in 9.4, not by the time you account for all the future overhead and risk it adds to the release. That was Heikki's gut feel when he yanked it out already; I was mainly trying to do a sanity check on that. You've made a good case that it wasn't the ideal answer. Even with that new data, I still don't think it was an outright bad decision though--especially not in an October where there's no new version out yet. This thread spun out of Open Items, and cutting complexity should be the preferred direction for everything left on there now.
Re: [HACKERS] Fixed xloginsert_locks for 9.4
On 10/3/14, 5:23 PM, Andres Freund wrote:
> How are we ever going to be able to tune it further without feedback from actual production usage?

With improved, targeted synthetic test cases that isolate the bottleneck to where it's really obvious--way more obvious than it will ever be in production? I just watched you do a very successful round of chewing right through this problem that way. And haven't the synthetic tests already been more useful for guiding development than feedback from production for some time now?

An alternate way to state one of your questions along this angle is: how can we tell whether the fixed limit of 8 is still a bottleneck on a 9.4 server unless there's a GUC to increase past it? In my first message here I was trying to highlight that we have little idea what that world looks like yet. This change is already so beneficial and large that the hotspots on systems impacted by it are probably going to move somewhere *very* different than in earlier versions. And when that happens, it's typically not the thing you just made way, way faster that is still the problem.

If it turns out a large bottleneck in real-world 9.4 deployments is *still* xloginsert_locks, and there's no GUC for tuning it beyond 8, then the people in the support trenches are going to see removing that GUC as a terrible error. That might happen. I'm not trying to criticize your work here, but you haven't actually made the case yet that this is likely--you cheated a little with the I/O part to get around where the bottleneck shifts once you have a decent number of slots (8). That tweak was part of why you got a larger gain than I did. So that's one bad path we might see in the future, and if we end up there, all of us in the support trenches will suffer for having done the wrong thing. I get what you're saying there, and I'll owe you an apology if it plays out that way.
In every other potential future I can imagine, and in every future I consider probable, eliminating the GUC for now makes life easier. Everyone has already talked a bunch about why extra GUCs have a support cost too. It's one less thing to maintain, document, mess with, break in the field, talk about, and suffer unintended regressions from.

Do you want to put the GUC right back again in the active branch to keep progress on this easier to make, since it's already really clear there's still gain to be chased here? I think you've justified doing that. I'm still running the commit with the GUC here. I will join the annoyed-that-it's-gone group the minute I do my next git pull.
Re: [HACKERS] Time measurement format - more human readable
On 9/29/14, 1:08 AM, Andres Freund wrote:
> On 2014-09-28 20:32:30 -0400, Gregory Smith wrote:
>> There are already a wide range of human readable time interval output formats available in the database; see the list at http://www.postgresql.org/docs/current/static/datatype-datetime.html#INTERVAL-STYLE-OUTPUT-TABLE
> He's talking about psql's \timing...

I got that. My point was that even though psql's timing report is kind of a quick thing hacked into there, if it were revised I'd expect two things to happen eventually:

- Asking whether any of the interval conversion code can be re-used for this purpose, rather than adding yet another standard custom to one code path.

- Asking whether this should really just be treated like a full interval instead, and then overlapping with a significant amount of that baggage so that you have all the existing format choices.
Re: [HACKERS] open items for 9.4
On 9/29/14, 2:30 PM, Andres Freund wrote:
>> Can we explain those reasons in the form of documentation?
> Yes. Try and benchmark it. It'll be hardware and workload dependant.

I missed this whole thing, and eventually I have to circle back to it. I could do it this week. Could you (or someone else familiar with the useful benchmarks) give me more detail on how to replicate one case where things should improve? I can do that, and I have a lab full of hardware if it's easier to spot on one type of server.

That exercise will probably lead, in a day or two, to a useful opinion on the feature in its final form, any associated GUCs/tunables, and the necessary level of associated documentation. This is a pretty low bar, right? Examples and documentation sufficient just for me to catch up is not asking for very much. If it isn't possible to do that in a very short period of time, that would not bode well for broader consumption.
Re: [HACKERS] Time measurement format - more human readable
On 9/28/14, 7:49 AM, Bogdan Pilch wrote:
> I have created a small patch to postgres source (in particular the psql part of it) that modifies the way time spent executing the SQL commands is printed out. The idea is to have a human readable time printed

There are already a wide range of human readable time interval output formats available in the database; see the list at http://www.postgresql.org/docs/current/static/datatype-datetime.html#INTERVAL-STYLE-OUTPUT-TABLE

If none of those are acceptable to you, it would be difficult but not impossible to justify something new. I could see tweaking one of those into a slightly updated new style aimed at this specific job, especially since it doesn't have to consider things like negative intervals.

There's value in sometimes printing time measurements using one of these interval styles, instead of the relatively raw values given right now. It would need to be an option though, and one that lets the user choose any of the supported interval formats. I personally would prefer to never see the existing format the number is reported in go away--too much software already expects it to be there, in that format. But adding this human readable version after that, when the user asks specifically for it, could be an acceptable addition.

So there's a rough spec for the job you'd have to take on here. I'd expect it to expand in scope almost immediately to also consider the output of similar time intervals from mechanisms like log_min_duration_statement, rather than just focusing on psql timing data. There's a whole second round of almost inevitable scope creep to working on this. If you were hoping what you submitted might be considered directly, sorry; that's not going to happen. Handling input and output of times and dates is a very deep topic, and small patches trying to adjust such behavior without grappling with the full complexity are normally rejected outright.
There are cases where the existing code just has simple hacks in there now that could easily be whacked around. But once the topic of cleaning those up appears, swapping in an alternate simple hack is rarely how it goes. It normally heads toward considering the full right thing to do to handle all cases usefully.
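To make the discussion concrete, here is a minimal sketch (in Python, purely for illustration) of the kind of human-readable duration rendering being debated for \timing output. The component labels and the rule set are my own assumptions, not any committed psql format:

```python
def human_time(ms):
    """Render a millisecond duration as h/min/s components plus a
    fractional-millisecond remainder, omitting leading zero units."""
    parts = []
    for label, unit_ms in (("h", 3600000), ("min", 60000), ("s", 1000)):
        if ms >= unit_ms:
            n, ms = divmod(ms, unit_ms)
            parts.append("%d %s" % (n, label))
    parts.append("%.3f ms" % ms)   # always keep raw-ms precision at the end
    return " ".join(parts)

print(human_time(3723004.5))  # 1 h 2 min 3 s 4.500 ms
print(human_time(250.0))      # 250.000 ms
```

Note how short durations still come out in the familiar raw-milliseconds form; that is one way to address the backward compatibility concern about software parsing the existing output.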
Re: [HACKERS] proposal: rounding up time value less than its unit.
On 9/26/14, 3:22 PM, Tom Lane wrote:
> We could alternatively try to split up these cases into multiple GUCs, which I guess is what you're imagining as a multi-year battle.

No, I was just pointing out that even the cleanest major refactoring possible here is going to result in "broken config file" complaints for years. And no matter how well that goes, a second rewrite in the next major version, addressing whatever things nobody saw coming in the first redesign, is a very real possibility. The minute the GUC box is opened that far, it will not close up again in one release, or likely even two, and the griping from the field is going to take 3+ years to settle.

I have no desire to waste time on partial measures either, though. I think I've been clear that's my theme on this--if we're gonna mess with this significantly, let's do it right and in a big way. If there's a really trivial fix to apply, that's fine too. Throw an error when the value is between the special one and the useful minimum, no other changes; that I could see slipping into a single release without much pain. It might even be possible to write that as a useful sub-step toward a bigger plan. I wouldn't bother breaking it out as a goal unless it just happened to fall out of the larger design, though.

The rest of the rounding and error handling ideas wandering around seem to be in an uncomfortable middle ground to me. They are neither large enough to feel like a major improvement happened, nor trivial enough to apply with minimal work/risk. And this is not a bug that must be fixed ASAP--it's an unfriendly UI surprise.
Re: [HACKERS] Last Commitfest patches waiting review
On 9/27/14, 4:18 AM, Heikki Linnakangas wrote:
> add latency limit to pgbench throttling (--rate)
> ---
> Rukh: Are you planning to review the latest patch version?

Rukh is free to jump onto this ASAP, but if it's still there next week I already had my eye on picking it up and taking it out for a spin. I already updated my pgbench-tools set to incorporate the rate limit for 9.4, and I use that part all the time now. I think I understand Fabien's entire game plan for this now, and I want the remainder of the set to land in 9.5.
Re: [HACKERS] proposal: rounding up time value less than its unit.
On 9/26/14, 2:17 PM, Tom Lane wrote:
> Well, ok, let's allow zero as a special case, but it has to be written as 0 not something else. If you try to set a positive value then we should act as though the min_val is 1 unit. So I'm coming around to the idea that "throw an error if a nonzero input would round (or truncate) to zero" is a reasonable solution.

I expressed some distaste for throwing errors before, but I find this specific rule reasonable. I'll even write the patch. I owe the GUC system a re-match after my wal_buffers auto-scaling thing for 9.1 rippled to cause extra work for you (maybe Robert too).

I just changed this submission from "Ready for Committer" to "Returned with Feedback", with the feedback being that we'd like to see the special values that round to 0 treated differently altogether first, before touching any rounding. And that's a whole new submission for the next CF.

> I think it'd be even more reasonable if we also fixed the rounding rule to be round to nearest, but the two changes can be considered independently.
>
> regards, tom lane

Personally I'm still -1 on any rounding change, instead preferring to work toward eliminating these 0/-1 special values altogether. Outside of those special cases, I feel you are fundamentally right that if the rounding really matters, the unit is simply too large. And I believe that is the case for the log rotation one right now. I can see my idea for rescaling the rotation age parameter is an unpopular one, but I still like it. That cleanly splits out into a third thing though, where it can live or die without being tied to the rest of these issues.
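The rule under discussion can be sketched in a few lines. This is an illustration in Python, not the actual guc.c code; the function name and unit table are mine:

```python
# Milliseconds per supported time unit, as in postgresql.conf settings.
UNIT_MS = {"ms": 1, "s": 1000, "min": 60 * 1000, "h": 60 * 60 * 1000}

def parse_time_setting(value, unit, native_unit_ms):
    """Convert a setting to the parameter's native unit by truncation,
    but reject any nonzero input that would truncate to zero--e.g.
    '30s' for a parameter whose native unit is minutes."""
    ms = value * UNIT_MS[unit]
    native = ms // native_unit_ms        # truncation, the current behavior
    if value != 0 and native == 0:
        raise ValueError("nonzero value %d%s rounds to zero "
                         "in this parameter's unit" % (value, unit))
    return native

print(parse_time_setting(90, "s", UNIT_MS["min"]))  # 1
```

Under this rule, `0` still works as the special disable value, `90s` quietly truncates to 1 minute as before, and only the surprising `30s`-to-zero case becomes an error.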
Re: [HACKERS] proposal: rounding up time value less than its unit.
On 9/26/14, 2:34 PM, David Johnston wrote:
> I don't get where we need to do anything else besides that...the whole actual min values comment is unclear to me.

If you look at pg_settings, there is a minimum value exposed there as min_val. For some of these parameters, that number would normally be 1. But since we have decided that 0 is a special flag value, min_val is 0 instead. There are others where min_val is -1 for the same reason, where functionally the minimum is really 0. Some of us would like to see min_val reflect the useful minimum, period, and move all of these special cases out of there. That is a multi-year battle to engage in though, and there's little real value to the user community coming out of it relative to that work scope.
Re: [HACKERS] proposal: rounding up time value less than its unit.
On 9/26/14, 2:50 PM, David G Johnston wrote:
> Like I said I just want to fix the bug and call it a day :)

I'm afraid you've come to the wrong project and mailing list for *that*.
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 9/26/14, 2:38 PM, Gavin Flower wrote:
> Curious: would it be both feasible and useful to have multiple workers process a 'large' table, without complicating things too much? They could each start at a different position in the file.

Not really feasible without a major overhaul. It might be mildly useful in one rare case. Occasionally I'll find very hot single tables that vacuum is constantly processing, despite the table mostly living in RAM because the server has a lot of memory. You can set vacuum_cost_page_hit=0 in order to get vacuum to chug through such a table as fast as possible. However, the speed at which that happens will often then be limited by how fast a single core can read from memory, for things in shared_buffers. That is limited by the speed of memory transfers from a single NUMA memory bank. Which bank you get will vary depending on the core that owns that part of shared_buffers' memory, but it's only one at a time. On large servers, that can be only a small fraction of the total memory bandwidth the server is able to reach.

I've attached a graph showing how this works on a system with many NUMA banks of RAM, and this is only a medium sized system. This server can hit 40GB/s of memory transfers in total; no one process will ever see more than 8GB/s. If we had more vacuum processes running against the same table, there would be more situations where they were doing work against different NUMA memory banks at the same time, therefore making faster progress through the hits in shared_buffers possible.

In the real world, this situation is rare enough compared to disk-bound vacuum work that I doubt it's worth getting excited over. Systems with lots of RAM where performance is regularly dominated by one big ugly table are common though, so I wouldn't just rule the idea out as not useful either.
Re: [HACKERS] proposal: rounding up time value less than its unit.
On 9/25/14, 1:41 AM, David Johnston wrote:
> If the error message is written correctly most people upon seeing the error will simply fix their configuration and move on - regardless of whether they were proactive in doing so having read the release notes.

The important part to realize here is that most people will never see such an error message. There is a person/process who breaks the postgresql.conf, a process that asks for a configuration restart/reload (probably via pg_ctl), and then the postmaster process creating a server log entry that shows the error (maybe in pgstartup.log, maybe in pg_log, maybe in syslog, maybe in the Windows Event Log). In practice, the top of that food chain never knows what's happening at the bottom unless something goes so seriously wrong the system is down, and they are forced to drill down into all of these log destinations. That's why a subset of us consider any error message based approach to GUC guidance a complete waste of time. I won't even address the rest of your comments; you're focusing on trivia around something that just fundamentally isn't useful at all.

My challenge to anyone who thinks error checking has value for this issue is to demonstrate how that would play out usefully on a mainstream Postgres system like RHEL/CentOS, Debian, or even Windows. Put your bogus setting in the config file, activate that config file in a Postgres that reports the rounding errors people dislike, and show me how that mistake is made apparent to the user who made it. I've done similar exercises myself, and my guess is that if the system is up at all, those error messages went by completely unnoticed.
Re: [HACKERS] add modulo (%) operator to pgbench
On 9/25/14, 8:38 AM, Robert Haas wrote:
> That's my whole reason for not wanting to adopt Fabien's approach in the first place: I was cool with exposing C's modulo operator, but any other modulo semantics seem like they should be built up from general-purpose primitives.

Maybe; I don't quite understand his requirements well enough yet to know if that's possible, or if it's easier to give him a full special operator of his own. But what you did makes that easier too, so forward progress regardless.

> Anyway, I think the first thing is that somebody needs to spend some time testing, polishing, and documenting this patch, before we start adding to it. I'm hoping someone else will volunteer - other tasks beckon.

I bounced it over here for that, and I expect to help with those parts, presumably in addition to Fabien's help: https://commitfest.postgresql.org/action/patch_view?id=1581
Re: [HACKERS] proposal: rounding up time value less than its unit.
On 9/25/14, 2:02 PM, Peter Eisentraut wrote:
> But having the same parameter setting mean different things in different versions is the path to complete madness.

Could we go so far as to eventually remove support for unitless time settings? The fact that people are setting raw numbers in the configuration file, and have to know the unit to understand what they just did, has never been something I liked.
Re: [HACKERS] proposal: rounding up time value less than its unit.
On 9/24/14, 6:45 PM, Peter Eisentraut wrote:
> But then this proposal is just one of several others that break backward compatibility, and does so in a more or less silent way. Then we might as well pick another approach that gets closer to the root of the problem.

I was responding to some desire to get a quick fix that cut off the worst of the problem, and the trade-offs of the other ideas bothered me even more. Obvious breakage is annoying, but small and really subtle version-to-version incompatibilities drive me really crazy. A 60X scale change to log_rotation_age will be big enough that impacted users should definitely notice, while not being so big it's scary. Rounding differences are small enough to miss. And I do see whacking the sole GUC that's set in minutes--which I was surprised to find existed at all--as a useful step forward.

I can't agree with Stephen's optimism that some wording cleanup is all that's needed here. I predict it will be an annoying, multiple commit job just to get the docs right, describing both the subtle rounding change and how it will impact users. (Cry an extra tear for the translators.) Meanwhile, I'd wager that the entire work of my log_rotation_age unit change idea, from code to docs to release notes, will take less time than just getting the docs done on any rounding change. It would probably cause less field breakage and confusion in the end too.

The bad case I threw out is a theorized one that shows why we can't go completely crazy with jerking units around, why 1000:1 adjustments are hard. I'm not actually afraid of that example being in the wild in any significant quantity. The resulting regression from a 60X scale change should not be so bad that people will suffer greatly from it either. It would probably be like the big 10:1 change in default_statistics_target: it seemed scary that some people might be nailed by it, but it didn't turn out to be such a big deal.

I don't see any agreement on the real root of a problem here yet.
That makes it fuzzy to gauge whether any smaller change leads that way or not. I personally would be fine doing nothing right now, instead waiting until that's charted out--especially if the alternative is applying any of the rounding or error throwing ideas suggested so far. I'd say that I hate to be the guy who tells everyone else they're wrong, but we all know I enjoy it.
Re: [HACKERS] add modulo (%) operator to pgbench
On 9/24/14, 10:10 PM, Robert Haas wrote:
> I think you're making a mountain out of a molehill. I implemented this today in about three hours.

I think you're greatly understating your own efficiency at shift/reducing parser mountains down to molehills. Fabien even guessed the LOC size of the resulting patch with less than a 9% error. That's some top notch software metrics and development work there, boys; kudos all around.

Let's get this operator support whipped into shape; then we can add the 2 to 3 versions of the modulo operator needed to make the major use cases work. (There was never going to be just one version, hacked in with a quick patch, that satisfied the multiple ways you can do this.) Then it's on to the last missing pieces of Fabien's abnormally distributed test workload vision.

He seems kind of stressed about the process lately; not sure what to say about that. Yes, the PostgreSQL community is hostile to short, targeted feature improvements unless they come already fit into a large, standards compliant strategy for improving the database. That doesn't work well when approached by scratch-an-itch style development. Nothing we can really do about it.
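For anyone following along, the "2 to 3 versions" of modulo in play are the usual suspects: truncated (C's `%`), floored, and Euclidean. They only differ for negative operands. A quick Python sketch for clarity (pgbench itself is C, and these function names are mine, not from any patch):

```python
def mod_trunc(a, b):
    """C-style (truncated) modulo: result takes the sign of the dividend."""
    return a - b * int(a / b)   # int() truncates toward zero, like C division

def mod_floor(a, b):
    """Floored modulo (Python's own %): result takes the sign of the divisor."""
    return a - b * (a // b)     # // floors toward negative infinity

def mod_euclid(a, b):
    """Euclidean modulo: result is always non-negative."""
    return a % abs(b)

# The three disagree only when signs differ:
print(mod_trunc(-7, 3), mod_floor(-7, 3), mod_euclid(-7, 3))  # -1 2 2
```

A pgbench script hashing negative account ids into buckets wants the floored or Euclidean flavor, which is why exposing only C's operator was felt to be incomplete.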
Re: [HACKERS] RLS feature has been committed
On 9/23/14, 9:00 AM, Andres Freund wrote:
> I also think committers need to be much more careful when committing patches which they (or their employer) appear to have a business interest in. Rushing ahead to commit the patch of somebody 'unrelated' leaves a completely different taste than committing your colleagues patch. *INDEPENDENT* of the actual reasons and the state of the patch.

I haven't been doing much personal development work around here lately, but I did more than a little of the planning on how to handle this as responsibly as I felt it deserved. I think this is worth talking about a little bit, because this whole topic--how to deal with a funded project where the committer is also a contributor--isn't something a lot of people have visibility into. (Not talking about you, Andres; you know the deal.)

This commit didn't happen until I first succeeded in getting Stephen Frost fully funded as a community PostgreSQL contributor (for this job, as well as others with a broader community appeal). Everyone attached to the budget side very explicitly understands that includes an open-ended responsibility to the community for addressing fall-out from RLS, from unforeseen issues to revisiting the things left on the known open items list. It's hard to reach perfect before commit; eventually you just have to shoot and see what happens.

Just to be really safe, I also kicked off training a whole new PostgreSQL contributor (Adam Brightwell) five months ago, so that by the time we got here he'd be making his own contributions to the security code of the database. That has included exercises tracking down bugs in the early RLS POC deployments already going on here at Crunchy, so that Stephen isn't the sole Postgres community expertise bottleneck on the code even at this company. (I am NOT on the hook for fixing RLS bugs.)

The decision on when to commit was strictly Stephen's.
During our last chat, we agreed this seemed like an ideal time of year in the development cycle for such a thing to land though. 9.5 is a fresh branch, and there is plenty of time to clean up and/or revert if that's the unfortunate end for anything going in today. Plus the ongoing CommitFest schedule means everyone in the community already is arranging to provide review resources needed in the near future pipeline. I can understand that Robert feels a procedural sting and/or slight due to the exact sequence of events here, and I'm staying out of that. But in general, I don't know what we could have done much better to be a responsible contributor, to finally push this long in development feature over the finish line. A description like "rushing ahead" would feel inappropriate to me for this one, given how much care went into timing even roughly when this particular commit should happen. The time of year in particular was chosen very specifically for minimal disruption, based on both major version release dates and the related community development cycle around them. -- Greg Smith greg.sm...@crunchydatasolutions.com Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling shared buffer eviction
On 9/23/14, 10:31 AM, Robert Haas wrote: I suggest we count these things: 1. The number of buffers the reclaimer has put back on the free list. 2. The number of times a backend has run the clocksweep. 3. The number of buffers past which the reclaimer has advanced the clock sweep (i.e. the number of buffers it had to examine in order to reclaim the number counted by #1). 4. The number of buffers past which a backend has advanced the clocksweep (i.e. the number of buffers it had to examine in order to allocate the number of buffers counted by #3). 5. The number of buffers allocated from the freelist which the backend did not use because they'd been touched (what you're calling buffers_touched_freelist). All sound reasonable. To avoid wasting time here, I think it's only worth doing all of these as DEBUG level messages for now. Then only go through the overhead of exposing the ones that actually seem relevant. That's what I did for the 8.3 work, and most of that data at this level was barely relevant to anyone but me then or since. We don't want the system views to include so much irrelevant trivia that finding the important parts becomes overwhelming. I'd like to see that level of instrumentation--just the debug level messages--used to quantify the benchmarks that people are running already, to prove they are testing what they think they are. That would have caught the test error you already stumbled on, for example. Simple paranoia says there may be more issues like that hidden in here somewhere, and this set you've identified should find any more of them around. If all that matches up so the numbers for the new counters seem sane, I think that's enough to commit the first basic improvement here. Then make a second pass over exposing the useful internals that let everyone quantify either the improvement or things that may need to be tunable. 
-- Greg Smith greg.sm...@crunchydatasolutions.com Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Core function in Postgres
On 9/23/14, 8:02 PM, Michael Paquier wrote: pgsql-hackers is as well a mailing list where people have technical discussions about features and patches, hence your question would be more adapted for pgsql-novice or pgsql-general. Let's be fair and get the details perfect if we're going to advise people on list policy, and that's not quite right if you compare against http://www.postgresql.org/list/ The official description says you consider trying elsewhere first, and "If your question cannot be answered by people in the other lists, and it is likely that only a developer will know the answer, you may re-post your question in this list." From that, it's completely reasonable that Mingzhe guessed how main() is used in PostgreSQL is something only a developer would know the answer to, and therefore should go here instead of pgsql-general. My feedback would be to point out that pgsql-general is also a list where many of the developers monitor traffic, and that's the right place to post first for this sort of question. The mailing list page does not make that completely obvious though, so this is an understandable thing to be unsure about. This is definitely not a pgsql-novice question. The mailing list page could probably use an explicit clarification that pgsql-general *is* an appropriate venue for simpler questions that only a developer will know the answer to. -- Greg Smith greg.sm...@crunchydatasolutions.com Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: rounding up time value less than its unit.
On 9/23/14, 1:21 AM, David Johnston wrote: This patch should fix the round-to-zero issue. If someone wants to get rid of zero as a special case let them supply a separate patch for that improvement. I am going to wander into this fresh after just reading everything once (but with more than a little practice with real-world GUC mucking) and say that, no, it should not even do that. The original complaint here is real and can be straightforward to fix, but I don't like any of the proposals so far. My suggestion: fix the one really bad one in 9.5 with an adjustment to the units of log_rotation_age. That's a small commit that should thank Tomonari Katsumata for discovering the issue and even suggesting a fix (that we don't use) and a release note; sample draft below. Stop there. Let someone with a broader objection take on the fact that zero (and -1) values have special properties, and that sucks, as a fully reasoned redesign. I have a larger idea for minimizing rounding issues of timestamps in particular at the bottom of this too. I wouldn't dare take on changes to rounding of integers without sorting out the special flag value issue first, because the variety of those in the database is large compared to the time ones.

== log_rotation_age ==

Back to where this started at http://www.postgresql.org/message-id/53992ff8.2060...@po.ntts.co.jp : log_rotation_age would be disabled if we set it less than one minute, with this example of a surprising behavior:

log_rotation_age = 10s

postgres=# show log_rotation_age;
 log_rotation_age
------------------
 0

That's bad and the GUC system is not being friendly; fully agreed. But this is just one where the resolution for the parameter is poor. log_rotation_age is the *only* GUC with a time interval that's set in minutes:

postgres=# SELECT name, unit FROM pg_settings WHERE unit = 'min';
       name       | unit
------------------+------
 log_rotation_age | min

For comparison, there are 10 GUCs set in seconds and 13 in ms in HEAD. 
We could just change the units for log_rotation_age to seconds, then let the person who asks for a 10s rotation time suffer for that decision only with many log files. The person who instead chooses 10ms may find log rotation disabled altogether because that rounds to zero, but now we are not talking about surprises on something that seems sane--that's a fully unreasonable user request. Following that style of fix all the way through to the sort of release notes needed would give something like this: log_rotation_age is now set in units of seconds instead of minutes. Earlier installations of PostgreSQL that set this value directly, to a value in minutes, should change that setting to use a time unit before migrating to this version. And we could be done for now.

== Flag values and error handling ==

I would like to see using 0 and -1 as special values in GUCs overhauled one day, with full disregard for backward compatibility as a straightjacket on doing the right thing. This entire class of behavior is annoying. But I am skeptical anything less than such an overhaul will really be useful. To me it's not worth adding new code, documentation, and associated release notes to guide migration, all to try to describe new rounding trivia--which I don't see as better nor worse than the trade-offs of what happens today. I can't take the idea of throwing errors for something this minor seriously at all. There are a lot more precedents for spreading settings around in a few places now, from include_dir to postgresql.auto.conf. There are so many ways to touch a config and not notice the error message now that I really don't want to see any more situations appear where trying to change a setting will result in a broken file added into that mix. None of this broken-units-due-to-rounding stuff that I've found, now that I went out of my way looking for some, even comes close to rising to that level of serious to me. 
Only this log rotation one is so badly out of line that it will act quite badly.

== Overhauling all time unit GUCs ==

Here's the complete list of potential ugly time settings to review in the future, if my suggestion of only fixing log_rotation_age were followed:

gsmith=# SELECT name, setting, unit, min_val FROM pg_settings
         WHERE unit IN ('s','ms') AND (min_val::integer) <= 0
         ORDER BY unit, min_val, name;
             name             | setting | unit | min_val
------------------------------+---------+------+---------
 autovacuum_vacuum_cost_delay | 20      | ms   | -1
 log_autovacuum_min_duration  | -1      | ms   | -1
 log_min_duration_statement   | -1      | ms   | -1
 max_standby_archive_delay    | 3       | ms   | -1
 max_standby_streaming_delay  | 3       | ms   | -1
 lock_timeout                 | 0       | ms   | 0
 statement_timeout            | 0       | ms   | 0
 vacuum_cost_delay            | 0       | ms   | 0
 wal_receiver_timeout         | 6
Re: [HACKERS] proposal: rounding up time value less than its unit.
On 9/23/14, 10:52 PM, David Johnston wrote: Given the following, why not just pick ms for log_rotation_age now? Right now there are people out there who have configurations that look like this:

log_rotation_age = 60

in order to get hourly rotation. These users will suffer some trauma should they upgrade to a version where this parameter now means a new log file pops every 60 seconds instead. If they didn't catch the warning in the release notes and it happens, I'm pretty comfortable they'll survive though, just with a bunch of cursing until it's resolved. I'd even wager a beer that more than 10% of the PostgreSQL installs that get hit won't even notice. I'd prefer not to make that experience into one where they get a log file every 60ms though. That's seriously bad. I'm not opposed to making major changes like that, I just like them to be part of updates big enough that people are unlikely to miss that something is different. Just doing this log_rotation_age thing is small enough that I expect people to miss the change in the release notes. That limits how much I think we can change the magnitude of an easy to miss setting with a large unit adjustment. Are there any special considerations for people using pg_dump vs. those using pg_upgrade? I don't know that there's any code in pg_upgrade aiming at this sort of job. I'd prefer not to add "find parameters that have changed in meaning and flag them" to pg_upgrade's job requirements. It has enough problems to worry about, and we really don't do this very often. If we are going to go that far, why not simply modify the GUC input routine to require unit values on these particular parameters? Direct manipulation of pg_settings probably would need some attention but everything else could reject integers and non-unit-specifying text. That would be fine by me as a long-term direction, but it's more of a two to three version release level of change. 
To keep that from being terrible for users, we'd need to provide a transition release that helped people find the problem ones without units. Only after that was in the wild for a while could we have some confidence that enough people had flushed the issue out to turn it into a hard requirement. The project went through this exact sort of exercise with the standard_conforming_strings GUC being the way we flagged the bad constructs for people. That was a much harder job because it touched everyone's application code too. But it took many years to play out. I'd be shocked if we could herd our flock of old-time users fast enough to remove unitless GUC values from PostgreSQL in less than three years' worth of new releases. -- Greg Smith greg.sm...@crunchydatasolutions.com Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling shared buffer eviction
On 9/23/14, 7:13 PM, Robert Haas wrote: I think we expose far too little information in our system views. Just to take one example, we expose no useful information about lwlock acquire or release, but a lot of real-world performance problems are caused by lwlock contention. I sent over a proposal for what I was calling Performance Events about a year ago. The idea was to provide a place to save data about lock contention, weird checkpoint sync events, that sort of thing. Replacing log parsing to get at log_lock_waits data was my top priority. Once that's there, lwlocks was an obvious next target. Presumably we just needed collection to be low enough overhead, and then we can go down to whatever shorter locks we want; the lower the overhead, the faster the events we can measure. Sometimes the database will never be able to instrument some of its fastest events without blowing away the event itself. We'll still have perf / dtrace / systemtap / etc. for those jobs. But those are not the problems of the average Postgres DBA's typical day. The data people need to solve this sort of thing in production can't always show up in counters. You'll get evidence the problem is there, but you need more details to actually find the culprit. Some info about the type of lock, tables and processes involved, maybe the query that's running, that sort of thing. You can kind of half-ass the job if you make per-table counters for everything, but we really need more, both to serve our users and to compare well against what other databases provide for tools. That's why I was trying to get the infrastructure to capture all that lock detail, without going through the existing logging system first. Actually building Performance Events fell apart on the storage side: figuring out where to put it all without waiting for a log file to hit disk. I wanted in-memory storage so clients don't wait for anything, then a potentially lossy persistence writer. 
I thought I could get away with a fixed size buffer like pg_stat_statements uses. That was optimistic. Trying to do better got me lost in memory management land without making much progress. I think the work you've now done on dynamic shared memory gives the right shape of infrastructure that I could pull this off now. I even have funding to work on it again, and it's actually the #2 thing I'd like to take on as I get energy for new feature development. (#1 is the simple but time consuming job of adding block write counters, the lack of which is just killing me on some fast growing installs) I have a lot of unread messages on this list to sort through right now. I know I saw someone try to revive the idea of saving new sorts of performance log data again recently; can't seem to find it again right now. That didn't seem like it went any farther than thinking about the specifications though. The last time, I jumped right over that and hit a wall with this one hard part of the implementation instead: low overhead memory management for saving everything. -- Greg Smith greg.sm...@crunchydatasolutions.com Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange
On 8/28/14, 12:18 PM, Robert Haas wrote: At least in situations that I've encountered, it's typical to be able to determine the frequency with which a given table needs to be vacuumed to avoid runaway bloat, and from that you can work backwards to figure out how fast you must process it in MB/s, and from there you can work backwards to figure out what cost delay will achieve that effect. But if the system tinkers with the cost delay under the hood, then you're vacuuming at a different (slower) rate and, of course, the table bloats. The last time I took a whack at this, I worked toward making all of the parameters operate in terms of target MB/s, for exactly this style of thinking and goal. Those converted into the same old mechanism under the hood and I got the math right to give the same behavior for the simple cases, but that could have been simplified eventually. I consider that line of thinking to be the only useful one here. The answer I like to these values that don't inherit as expected in the GUC tree is to nuke that style of interface altogether in favor of a simpler, bandwidth-measured one, then perhaps add multiple QoS levels. Certainly no interest in treating the overly complicated innards of cost computation as a bug and fixing them with even more complicated behavior. The part of this I was trying hard to find time to do myself by the next CF was a better bloat measure tool needed to actually see the problem better. With that in hand, and some nasty test cases, I wanted to come back to simplified MB/s vacuum parameters with easier to understand sharing rules again. If other people are hot to go on that topic, I don't care if I actually do the work; I just have a pretty clear view of what I think people want. The only plausible use case for setting a per-table rate that I can see is when you actually want the system to use that exact rate for that particular table. That's the main one, for these "must run on schedule or else" jobs. Yes. 
On 8/29/14, 9:45 AM, Alvaro Herrera wrote: Anyway it seems to me maybe there is room for a new table storage parameter, say autovacuum_do_balance which means to participate in the balancing program or not. If that eliminates some of the hairy edge cases, sure. A useful concept to consider is having a soft limit that most things work against, along with a total hard limit for the server. When one of these tight-schedule jobs with !autovacuum_do_balance starts, it must run at its designed speed with no concern for anyone else. Which means: a) Its bandwidth gets pulled out of the regular, soft limit numbers until it's done. The last time I had one of these jobs, once the big important boys were running, everyone else in the regular shared set was capped at vacuum_cost_limit=5 worth of work. Just enough to keep up with system catalog things, and over the course of many hours process small tables. b) If you try to submit multiple locked-rate jobs at once, and the total goes over the hard limit, they have to just be aborted. If the rush of users comes back at 8AM, and you can clean the table up by then if you give it 10MB/s, what you cannot do is let some other user decrease your rate such that you're unfinished at 8AM. Then you'll have aggressive AV competing against the user load you were trying to prepare for. It's better to just throw a serious error that forces someone to look at the hard limit budget and adjust the schedule instead. The systems with this sort of problem are getting cleaned up every single day, almost continuously; missing a day is not bad as long as it's noted and fixed again before the next cleanup window. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling shared buffer eviction
On 9/16/14, 8:18 AM, Amit Kapila wrote: I think the main reason for the slight difference is that when the size of shared buffers is almost the same as the data size, the number of buffers it needs from the clock sweep is very small, as an example in the first case (when the size of shared buffers is 12286MB), it actually needs at most 256 additional buffers (2MB) via clock sweep, whereas bgreclaimer will put 2000 (high water mark) additional buffers (0.5% of shared buffers is greater than 2000) in the free list, so bgreclaimer does some extra work when it is not required This is exactly what I was warning about, as the sort of lesson learned from the last round of such tuning. There are going to be spots where trying to tune the code to be aggressive on the hard cases will work great. But you need to make that dynamic to some degree, such that the code doesn't waste a lot of time sweeping buffers when the demand for them is actually weak. That will make all sorts of cases that look like this slower. We should be able to tell these apart if there's enough instrumentation and solid logic inside of the program itself though. The 8.3 era BGW coped with a lot of these issues using a particular style of moving average with fast reaction time, plus instrumenting the buffer allocation rate as accurately as it could. So before getting into high/low water mark questions, are you comfortable that there's a clear, accurate number that measures the activity level that's important here? And have you considered ways it might be averaged over time or have a history that's analyzed? The exact fast attack / slow decay weighted moving average of the 8.3 BGW, the thing that tried to smooth the erratic data set possible here, was a pretty critical part of getting itself auto-tuning to workload size. It ended up being much more important than the work of setting the arbitrary watermark levels. 
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling shared buffer eviction
On 9/11/14, 7:01 AM, Andres Freund wrote: I'm not convinced that these changes can be made without also changing the bgwriter logic. Have you measured whether there are differences in how effective the bgwriter is? Not that it's very effective right now :) The current background writer tuning went out of its way to do nothing when it wasn't clear there was something that always worked. What happened with all of the really clever schemes was that they worked on some workloads, and just trashed others. Most of the gain from the 8.3 rewrite came from looking at well theorized ideas for how to handle things like pre-emptive LRU scanning for writes, and just throwing them out altogether in favor of ignoring the problem. The magic numbers left in or added to the code were tuned to do very little work, intentionally. If anything, since then the pressure to do nothing has gone up in the default install, because now people are very concerned about extra wakeups using power. To find bad cases before, I was running about 30 different test combinations by the end, Heikki was running another set in the EDB lab, I believe there was a lab at NTT running their own set too. What went in was the combination that didn't break any of them badly--not the one that performed best on the good workloads. This looks like it's squashed one of the very fundamental buffer scaling issues though; well done Amit. What I'd like to see is preserving the heart of that while touching as little as possible. When in doubt, do nothing; let the backends suck it up and do the work themselves. I had to take a health break from community development for a while, and I'm hoping to jump back into review again for the rest of the current development cycle. I'll go back to my notes and try to recreate the pathological cases that plagued both the 8.3 BGW rewrite and the aborted 9.2 fsync spreading effort I did; get those running again and see how they do on this new approach. 
I have a decent sized 24 core server that should be good enough for this job. I'll see what I can do. -- Greg Smith greg.sm...@crunchydatasolutions.com Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench throttling latency limit
On 9/10/14, 10:57 AM, Fabien COELHO wrote: Indeed. I think that people do not like it to change. I remember that I suggested to change timestamps to .yy instead of the unreadable yyy, and be told not to, because some people have tool which process the output so the format MUST NOT CHANGE. So my behavior is not to avoid touching anything in this area. That somewhat hysterical version of events isn't what I said. Heikki has the right idea for backpatching, so let me expand on that rationale, with an eye toward whether 9.5 is the right time to deal with this. Not all software out there will process epoch timestamps with milliseconds added as a fraction at the end. Being able to read an epoch time in seconds as an integer is a well defined standard; the fraction part is not. Here's an example of the problem, from a Mac OS X system:

$ date -j -f "%a %b %d %T %Z %Y" "`date`" +%s
1410544903
$ date -r 1410544903
Fri Sep 12 14:01:43 EDT 2014
$ date -r 1410544903.532
usage: date [-jnu] [-d dst] [-r seconds] [-t west] [-v[+|-]val[ymwdHMS]] ...
            [-f fmt date | [[[mm]dd]HH]MM[[cc]yy][.ss]] [+format]

The current file format allows any random shell script to use a tool like cut to pull out the second-resolution timestamp column as an epoch integer field, then pass it through even a utility as simple as date to reformat that. And for a lot of people, second resolution is perfectly fine anyway. The change you propose will make that job harder for some people, in order to make the job you're interested in easier. I picked the simplest possible example, but there are more. Whether epoch timestamps can have millisecond parts depends on your time library in Java, in Python some behavior depends on whether you have 2.6 or earlier, I don't think gnuplot handles millisecond ones at all yet; the list goes on and on. 
Some people will just have to apply a second split to the timestamp strings pgbench outputs, at the period, and use the left side, where right now they can just split the whole thing on a space. What you want to do is actually fine with me--and as far as I know, I'm the producer of the most popular pgbench latency parsing script around--but it will be a new sort of headache. I just wanted the benefit to outweigh that. Breaking the existing scripts and burning compatibility with simple utilities like date was not worth the tiny improvement you wanted in your personal workflow. That's just not how we do things in PostgreSQL. If there's a good case that the whole format needs to be changed anyway, like adding a new field, then we might as well switch to fractional epoch timestamps now too. When I added timestamps to the latency log in 8.3, parsers that handled milliseconds were even more rare. Today it's still inconsistent, but the workarounds are good enough for me now. There's a lot more people using things like Python instead of bash pipelines here in 2014 too. -- Greg Smith greg.sm...@crunchydatasolutions.com Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support for N synchronous standby servers
On 9/12/14, 2:28 PM, Robert Haas wrote: I hate to be the guy always suggesting a mini-language (cf. recent discussion of an expression syntax for pgbench), but we could do much more powerful and flexible things here if we had one. For example, suppose we let each element of synchronous_standby_names use the constructs (X,Y,Z,...) While I have my old list history hat on this afternoon: when the 9.1 deadline was approaching, I said that some people were not going to be happy until "is it safe to commit?" calls an arbitrary function that is passed the names of all the active servers, and then they could plug whatever consensus rule they wanted into there. And then I said that if we actually wanted to ship something, it should be some stupid simple thing like just putting a list of servers in synchronous_standby_names and proceeding if one is active. One of those two ideas worked out... Can you make a case for why it needs to be a mini-language instead of a function? -- Greg Smith greg.sm...@crunchydatasolutions.com Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?
On 6/11/14, 10:26 AM, Robert Haas wrote: Now, as soon as we introduce the concept that selecting from a table might not really mean read from the table but read from the table after applying this owner-specified qual, we're opening up a whole new set of attack surfaces. Every pg_dump is an opportunity to hack somebody else's account, or at least audit their activity. I'm in full agreement we should clearly communicate the issues around pg_dump in particular, because they can't necessarily be eliminated altogether without some major work that's going to take a while to finish. And if the work-around is some sort of GUC for killing RLS altogether, that's ugly but not unacceptable to me as a short-term fix. One of the difficult design requests in my inbox right now asks how pg_dump might be changed both to reduce its overlap with superuser permissions and to allow auditing of its activity. Those requests aren't going away; their incoming frequency is actually rising quite fast right now. They're both things people expect from serious SQL oriented commercial database products, and I'd like to see PostgreSQL continue to displace those as we reach feature parity in those areas. Any way you implement finer grained user permissions and auditing features will be considered a new attack vector when you use those features. The way the proposed RLS feature inserts an arbitrary function for reads has a similar new attack vector when you use that feature. I'm kind of surprised to see this turn into a hot button all of the sudden though, because my thought on all that so far has been a giant so what? This is what PostgreSQL does. You wanna write your own C code and then link the thing right into the server, so that bugs can expose data and crash the whole server? Not only can you shoot yourself in the foot that way, we supply a sample gun and bullets in contrib. 
How about writing arbitrary code in any one of a dozen server-side languages of wildly varying quality, then hooking that code so it runs as a trigger function whenever you change a row? PostgreSQL is *on it*; we love letting people write some random thing, and then running that random thing against your data as a side-effect of doing an operation. And if you like that...just wait until you learn about this half-assed rules feature we have too! And when the database breaks because the functions people inserted were garbage, that's their fault, not a cause for a CVE. And when someone blindly installs adminpack because it sounded like a pgAdmin requirement, lets a monitoring system run as root so it can watch pg_stat_activity, and then discovers that pair of reasonable decisions suddenly means any fool with monitoring access can call pg_file_unlink...that's their fault too. These are powerful tools with serious implications, and they're expected to be used by equally serious users. We as a development community do need to put a major amount of work into refactoring all of these security mechanisms. There should be less of these embarrassing incidents where bad software design really forced the insecure thing to happen, which I'd argue is the case for that pg_stat_activity example. And luckily so far development resources are appearing for organizations I know of working in that direction recently, as fast as the requirements are rising. I think there's a good outcome at the end of that road. But let's not act like RLS is a scary bogeyman because it introduces a new way to hack the server or get surprising side-effects. That's expected and possibly unavoidable behavior in a feature like this, and there are much worse instances of arbitrary function risk throughout the core code already. 
-- Greg Smith greg.sm...@crunchydatasolutions.com Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch (v2) for updatable security barrier views
On 4/9/14 9:56 PM, Stephen Frost wrote: As for docs and testing, those are things we would certainly be better off with and may mean that this isn't able to make it into 9.4, which is fair, but I wouldn't toss it out solely due to that.

We have a git repo with multiple worked-out code examples, ones that have been in active testing for months now. It's private just to reduce mistakes as things are cleared for public consumption, but I (and Mark and Jeff here) can pull anything we like from it to submit to hackers. There's also a test case set from Yeb Havinga that he used for his review. I expected to turn at least one of those into a Postgres regression test.

The whole thing squealed to a stop when the regression tests Craig was working on were raising multiple serious questions. I didn't see much sense in bundling more boring, passing tests when the ones we already had seemed off--and no one was sure why. Now that Tom has given some guidance on the row locking weirdness, maybe it's time for me to start updating those into make check form.

The documentation has been in a similar holding pattern. I have lots of resources to help document what does and doesn't work here to the quality expected in the manual. I just need a little more confidence about which feature set is commit worthy. The documentation that makes sense is very different if only updatable security barrier views is committed. -- Greg Smith greg.sm...@crunchydatasolutions.com Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/
Re: [HACKERS] Pending 9.4 patches
On 4/7/14 2:59 AM, Craig Ringer wrote: On 04/05/2014 03:57 AM, Andres Freund wrote: c07) Updatable security barrier views. This needs a serious look by a committer. I've been exercising it via row security and it's been looking pretty solid. It isn't a huge or intrusive patch, and it's seen several rounds of discussion during its development and refinement. (Thanks Dean).

Same here, nothing but good feedback from testing. The updatable security barrier views patch has been sitting in Ready For Committer since late January. Unfortunately, I just learned that some of the people who might commit in this area--Stephen Frost for example--thought there were still major outstanding issues with that part. I (and I think Craig too) have been waiting for that to be picked up by a committer, Stephen was waiting for me or Craig to fix unrelated bugs, and that's where the CF process has been deadlocked on this one.

A core bugfix with locking in security barrier views is required before the regression tests can be fixed up properly, for one thing. Tom also expressed concerns about how plan invalidation works, though it's not yet clear whether that was just miscommunication about how it works on my part or whether there's a concrete problem there. This is similarly stuck. I'm not out of resources, there's just nothing I can do here myself.

For this to move forward, a committer needs to pick up the security barrier views part. We also need a bug fix for the issue that's breaking the regression tests. Once all that's done, RLS on top of updatable security barrier views might be committable. But there's no way to tell for sure until those other two bits are sorted out.

I'd really love to finish this off for 9.4, but other projects have to come first. I have no other projects ahead of this for the remainder of this month. I just can't figure out what to do next until there's a committer (or committers, if someone else is going to take on the locking bug) identified. 
I looked at the locking problem enough to know that one is over my head. -- Greg Smith greg.sm...@crunchydatasolutions.com Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/
Re: [HACKERS] four minor proposals for 9.5
On 4/6/14 2:46 PM, Pavel Stehule wrote: Proposed options are interesting for enterprise use, when you have some smarter tools for log entry processing, and when you need a complex view of the performance of billions of queries - when cancel time and lock time are important pieces in the mosaic of server fitness.

I once sent a design proposal over for something I called Performance Events that included this. It will be difficult to get everything people want to track into log_line_prefix macro form. You're right that this particular one can probably be pushed into there, but you're adding four macros just for this feature. And that's only a fraction of what people expect from database per-query performance metrics.

The problem I got stuck on with the performance event project was where to store the data collected. If you want to keep up with read rates, you can't use the existing log infrastructure. It has to be something faster, lighter. I wanted to push the data into shared memory somewhere instead. Then some sort of logging consumer could drain that queue and persist it to disk. Since then, we've had a number of advances, particularly these two:

- Dynamic shared memory allocation.
- Query log data from pg_stat_statements can persist.

With those important fundamentals available, I'm wandering around right now trying to get development resources to pick the whole event logging idea up again. The hardest parts of the infrastructure I was stuck on in the past are in the code today. -- Greg Smith greg.sm...@crunchydatasolutions.com Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/
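As a footnote to the message above, a minimal sketch (not PostgreSQL code; all names here are invented for illustration) of the pattern it describes: the query hot path dumps events into a bounded in-memory buffer and never blocks on I/O, while a separate consumer drains that buffer and persists events in batches.

```python
# Hypothetical sketch of "dump events into memory, drain to disk in
# batches". A bounded queue stands in for the shared memory block; a
# callable sink stands in for the process that persists to disk.
import queue
import threading

events = queue.Queue(maxsize=10000)  # stands in for a shared-memory ring

def record_event(name, duration_ms):
    """Hot path: enqueue and return immediately; drop if the buffer is full."""
    try:
        events.put_nowait((name, duration_ms))
        return True
    except queue.Full:
        return False  # losing an event beats stalling the query

def drain(batch_size, sink):
    """Consumer: pull events and persist them in batches to amortize writes."""
    buf = []
    while True:
        item = events.get()
        if item is None:  # shutdown sentinel
            if buf:
                sink(buf)
            return
        buf.append(item)
        if len(buf) >= batch_size:
            sink(buf)
            buf = []
```

Dropping rather than blocking when the buffer fills is the design choice that keeps read-heavy workloads fast; the trade-off is that event data is lossy under bursts.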
Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance
On 1/20/14 9:46 AM, Mel Gorman wrote: They could potentially be used to evaluate any IO scheduler changes. For example -- deadline scheduler with these parameters has X transactions/sec throughput with average latency of Y milliseconds and a maximum fsync latency of Z seconds. Evaluate how well the out-of-box behaviour compares against it with and without some set of patches. At the very least it would be useful for tracking historical kernel performance over time and bisecting any regressions that got introduced. Once we have a test case I think many kernel developers (me at least) can run automated bisections.

That's the long term goal. What we used to get out of pgbench were things like 60 second latencies when a checkpoint hit with GBs of dirty memory. That does happen in the real world, but that's not a realistic case you can tune for very well. In fact, tuning for it can easily degrade performance on more realistic workloads.

The main complexity I don't have a clear view of yet is how much unavoidable storage-level latency there is in all of the common deployment types. For example, I can take a server with a 256MB battery-backed write cache and set dirty_background_bytes to be smaller than that. So checkpoint spikes go away, right? No. Eventually you will see dirty_background_bytes of data going into an already full 256MB cache. And when that happens, the latency will be based on how long it takes to write the cached 256MB out to the disks. If you have a single disk or RAID-1 pair, that random I/O could easily happen at 5MB/s or less, and that makes for a 51 second cache clearing time.

This is a lot better now than it used to be because fsync hasn't flushed the whole cache in many years now. (Only RHEL5 systems still in the field suffer much from that era of code.) But you do need to look at the distribution of latency a bit because of how the cache impacts things; you can't just consider min/max values. 
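The cache-clearing arithmetic above is worth making explicit; a trivial check of the numbers:

```python
def cache_drain_seconds(cache_mb, drain_mb_per_s):
    # Worst-case stall: time to flush a full write cache at the
    # array's sustained random-write speed.
    return cache_mb / drain_mb_per_s

# 256MB BBWC draining at ~5MB/s of random I/O on a single disk or
# RAID-1 pair: roughly the 51 second figure quoted above.
worst_case = cache_drain_seconds(256, 5)  # 51.2 seconds
```

The same formula explains why merely doubling a disk's onboard cache can double worst-case latency if the drain rate stays fixed.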
Take the BBWC out of the equation, and you'll see latency proportional to how long it takes to clear the disk's cache out. It's fun upgrading from a disk with 32MB of cache to 64MB only to watch worst-case latency double. At least the kernel does the right thing now, using that cache when it can while forcing data out when fsync calls arrive. (That's another important kernel optimization we'll never be able to teach the database.) -- Greg Smith greg.sm...@crunchydatasolutions.com Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/
Re: [HACKERS] [v9.4] row level security
On 12/13/13 11:40 PM, Craig Ringer wrote: You may want to check out the updated writable security-barrier views patch. http://www.postgresql.org/message-id/52ab112b.6020...@2ndquadrant.com It may offer a path forward for the CF submission for RLS, letting us get rid of the var/attr fiddling that many here objected to.

With my advocacy hat on, I'd like to revisit this idea now that there's a viable updatable security barrier view submission. I thought the most serious showstopper feedback from the last CF's RLS submission was that this needed to be sorted out first. Reworking KaiGai's submission to merge against Dean's new one makes it viable again in my mind, and I'd like to continue toward re-reviewing it as part of this CF in that light. Admittedly it's not ideal to try and do that at the same time the barrier view patch is being modified, but I see that as a normal CF merge of things based on other people's submissions.

I mentioned advocacy because the budding new PostgreSQL test instances I'm seeing now will lose a lot of momentum if we end up with no user visible RLS features in 9.4. The pieces we have now can assemble into something that's useful, and I don't think that goal is unreasonably far away. -- Greg Smith greg.sm...@crunchydatasolutions.com Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/
Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance
On 1/17/14 10:37 AM, Mel Gorman wrote: There is not an easy way to tell. To be 100%, it would require an instrumentation patch or a systemtap script to detect when a particular page is being written back and track the context. There are approximations though. Monitor nr_dirty pages over time.

I have a benchmarking wrapper for the pgbench testing program called pgbench-tools: https://github.com/gregs1104/pgbench-tools

As of October, on Linux it now plots the Dirty value from /proc/meminfo over time. You get that on the same time axis as the transaction latency data. The report at the end includes things like the maximum amount of dirty memory observed during the test sampling. That doesn't tell you exactly what's happening to the level someone reworking the kernel logic might want, but you can easily see things like the database's checkpoint cycle reflected by watching the dirty memory total.

This works really well for monitoring production servers too. I have a lot of data from a plugin for the Munin monitoring system that plots the same way. Once you have some history about what's normal, it's easy to see when systems fall behind in a way that's ruining writes, and the high water mark often correlates with bad responsiveness periods.

Another recent change is that pgbench for the upcoming PostgreSQL 9.4 now allows you to specify a target transaction rate. Seeing the write latency behavior with that in place is far more interesting than anything we were able to watch with pgbench before. The pgbench write tests we've been doing for years mainly told you the throughput rate when all of the caches were always as full as the database could make them, and tuning for that is not very useful. Turns out it's far more interesting to run at 50% of what the storage is capable of, then watch what happens to latency when you adjust things like the dirty_* parameters. 
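For anyone who wants to reproduce that Dirty sampling by hand, a minimal sketch (the function here is mine, not pgbench-tools code) of pulling the value out of /proc/meminfo text; the sample string stands in for an actual read of the file on Linux:

```python
# Parse the Dirty line from /proc/meminfo-style text, as sampled per
# time point when plotting dirty memory against transaction latency.
def dirty_kb(meminfo_text):
    for line in meminfo_text.splitlines():
        if line.startswith("Dirty:"):
            return int(line.split()[1])  # /proc/meminfo reports kB
    return None  # field missing (e.g. not a Linux meminfo dump)

# Stand-in for open("/proc/meminfo").read() on a live system
sample = "MemTotal: 16331712 kB\nDirty: 482304 kB\nWriteback: 0 kB\n"
dirty = dirty_kb(sample)  # 482304 kB, i.e. ~471MB of dirty memory
```

Sampling this once a second alongside latency data is enough to see checkpoint cycles as sawtooth waves in the dirty memory total.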
I've been working on the problem of how we can make a benchmark test case that acts enough like real busy PostgreSQL servers that we can share it with kernel developers, and then everyone has an objective way to measure changes. These rate limited tests are working much better for that than anything I came up with before.

I am skeptical that the database will take over very much of this work and perform better than the Linux kernel does. My take is that our most useful role would be providing test cases kernel developers can add to a performance regression suite. Ugly "we never thought that would happen" situations seem to be at the root of many of the kernel performance regressions people here get nailed by. Effective I/O scheduling is very hard, and we are unlikely to ever out-innovate the kernel hacking community by pulling more of that into the database.

It's already possible to experiment with moving in that direction with tuning changes. Use a larger database shared_buffers value, tweak checkpoints to spread I/O out, and reduce things like dirty_ratio. I do some of that, but I've learned it's dangerous to wander too far that way. If instead you let Linux do even more work--give it a lot of memory to manage and room to re-order I/O--that can work out quite well.

For example, I've seen a lot of people try to keep latency down by using the deadline scheduler and very low settings for the expire times. The theory is great, but it never works out in the real world for me. Here's the sort of deadline tuning I deploy instead now:

echo 500 > ${DEV}/queue/iosched/read_expire
echo 30 > ${DEV}/queue/iosched/write_expire
echo 1048576 > ${DEV}/queue/iosched/writes_starved

These numbers look insane compared to the defaults, but I assure you they're from a server that's happily chugging through 5 to 10K transactions/second around the clock. PostgreSQL forces writes out with fsync when they must go out, but this sort of tuning is basically giving up on it managing writes beyond that. 
We really have no idea what order they should go out in. I just let the kernel have a large pile of work queued up, and trust things like the kernel's block elevator and congestion code are smarter than the database can possibly be. -- Greg Smith greg.sm...@crunchydatasolutions.com Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/
Re: [HACKERS] gaussian distribution pgbench
On 12/19/13 5:52 PM, Gavin Flower wrote: Curious, wouldn't the common usage pattern tend to favour a skewed distribution, such as the Poisson Distribution (it has been over 40 years since I studied this area, so there may be better candidates).

Some people like database load testing with a Pareto principle distribution, where 80% of the activity hammers 20% of the rows such that locking becomes important. (That's one specific form of Pareto distribution.) The standard pgbench load indirectly gets you quite a bit of that due to all the contention on the branches table. Targeting all of that at a single table can be more realistic.

My last round of reviewing a pgbench change left me pretty worn out with wanting to extend that code much further. Adding in some new probability distributions would be fine though; that's a narrow change.

We shouldn't get too excited about pgbench remaining a great tool for too much longer though. pgbench is fast approaching a wall nowadays, where it's hard for any single client machine to fully saturate today's larger servers. You basically need a second large server to generate load, whereas what people really want is a bunch of coordinated small clients. (That sort of wall was in early versions too; it just got pushed upward a lot by the multi-worker changes in 9.0 coming around the same time desktop core counts really skyrocketed.)

pgbench started as a clone of a now-abandoned Java project called JDBCBench. I've been seriously considering a move back toward that direction lately. Nowadays spinning up ten machines to run load generation is trivial. The idea of extending pgbench's C code to support multiple clients running at the same time and collating all of their results is not a project I'd be excited about. It should remain a perfectly fine tool for PostgreSQL developers to find code hotspots, but that's only so useful. 
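Going back to the Pareto point above: a minimal sketch (illustrative only, not pgbench's implementation) of generating an 80/20 skewed row access pattern. Raising a uniform draw to the power ln(0.2)/ln(0.8), about 7.21, sends roughly 80% of picks into the first 20% of row IDs, concentrating lock contention the way the Pareto principle workload wants.

```python
# 80/20 row picker: P(row < x*n) = x**(1/SKEW), so at x = 0.2 the hit
# probability is 0.2**(ln(0.8)/ln(0.2)) = 0.8 exactly.
import math
import random

SKEW = math.log(0.2) / math.log(0.8)  # ~7.21 for the classic 80/20 split

def skewed_row(n_rows, rng=random):
    # Uniform draw in [0,1), pushed toward 0 by the exponent; low row
    # IDs become the hot rows.
    return int(n_rows * rng.random() ** SKEW)

# Sanity check: with 100k draws over 100k rows, about 80% of accesses
# should land on the first 20k row IDs.
rng = random.Random(42)
hits = sum(skewed_row(100_000, rng) < 20_000 for _ in range(100_000))
```

A real benchmark harness would map these IDs onto primary keys; the point is only that one exponent gives a tunable hot-spot size without table-wide uniformity.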
(At this point someone normally points out Tsung solved all of those problems years ago if you'd only give it a chance. I think it's kind of telling that work on sysbench is rewriting the whole thing so you can use Lua for your test scripts.)
Re: [HACKERS] row security roadmap proposal
On 12/18/13 10:21 PM, Craig Ringer wrote: In the end, sometimes I guess there's no replacement for WHERE call_some_procedure()

That's where I keep ending up at. The next round of examples I'm reviewing this week plug pl/pgsql code into that model. And the one after that actually references locally cached data that starts out stored in LDAP on another machine altogether. That one I haven't even asked for permission to share with the community because of my long-standing LDAP allergy, but the whole thing plugs into the already submitted patch just fine. (Shrug)

I started calling all of the things that generate data for RLS to filter on label providers. You've been using SELinux as an example of a future label provider. Things like this LDAP-originated bit are another provider. Making the database itself a richer label provider one day is an interesting usability improvement to map out. But in the proof of concept things I've been getting passed, I haven't seen an example where I'd use that yet anyway. The real world label providers are too complicated.
Re: [HACKERS] row security roadmap proposal
On 12/16/13 9:36 AM, Craig Ringer wrote: - Finish and commit updatable security barrier views. I've still got a lot of straightening out to do there.

I don't follow why you've put this part first. It has a lot of new development and the risks that go along with that, but the POC projects I've been testing are more interested in the view side issues.

- Decide on and implement a structure for row-security functionality itself. I'm persuaded by Robert's comments here, that whatever we expose must be significantly more usable than a DIY on top of views (with the fix for updatable security barrier views to make that possible).

Can't say I agree we should be getting tangled into making this slick on the user side right now. If you open a new can of worms like hierarchical access labels, this goes back into basic design, and we'll spend months on that before returning to exactly the same implementation details. I tried to make a case for how having a really generic mechanism that doesn't presume to know how labels will be assigned can be a good thing. Given the state this is all in right now, I'd much rather publish a hard-to-use but powerful API than presume you're going to get an easier-to-use design right.

The place I'm at here is trying to figure out the simplest useful thing that could be committed and then hammer on the details. (Not the first time I've beat that drum on a feature.) Your roadmap goes way past that, which is great to avoid being painted into a corner, but I'm thinking more about what a useful feature freeze point would look like given it's December 16 now. -- Greg Smith greg.sm...@crunchydatasolutions.com Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/