Re: [HACKERS] Caching for stable expressions with constant arguments v6
On Mon, Jan 16, 2012 at 12:06 PM, Marti Raudsepp wrote:
> Here's v6 of my expression caching patch.

A little review...

First, I noticed a change of behaviour. I'm not sure whether to call it good or bad. With this function:

create or replace function cached_random() returns numeric as $$
begin
    raise notice 'cached';
    return random();
end;
$$ language plpgsql stable;

if you execute:

select *, cached_random() from (select generate_series(1, 10)) i;

on HEAD you get 10 random numbers; with your patch you get the same random number 10 times. That means your patch turns the STABLE promise into a hard one. Personally I think that's good, but I know there are people who mistakenly use volatile functions inside stable ones.

--- You seem to be moving code in simplify_function(); do you think it would be useful to do that independently, at least if it provides some clarity to the code?

--- Benchmarks. I ran a few tests on my laptop (which is not very performant, but...). From what I can see there is not much gain for the amount of complexity added. There should be cases with a lot more gain, for example if you use a function to hide a SELECT and call that function several times in the same query -- but I guess that should work the same with a CTE.

Configuration:

        name         | setting
 --------------------+---------
  shared_buffers     | 4096
  synchronous_commit | off

 filesystem: xfs

-- This is from bench_cache.sh but with -T 150

select * from ts
 where ts between to_timestamp('2005-01-01', 'YYYY-MM-DD')
              and to_timestamp('2005-01-01', 'YYYY-MM-DD');

    head  0.423855
    cache 1.949242

select * from ts where ts > now();

    head  2.420200
    cache 2.580885

/* uncachable */
select * from one
 where ts >= to_date(clock_timestamp()::date::text, 'YYYY-MM-DD')
   and ts <  (to_date(clock_timestamp()::date::text, 'YYYY-MM-DD') + interval '1 year');

    head  955.007129
    cache 846.917163

/* cachable */
select * from one
 where ts >= to_date(now()::date::text, 'YYYY-MM-DD')
   and ts <  (to_date(now()::date::text, 'YYYY-MM-DD') + interval '1 year');

    head  827.484067
    cache 801.743863

A benchmark with pgbench at scale 1 (average of 3 runs, -T 300, clients = 1 except on the second run):

== simple ==              head 261.833794   cache 250.22167
== simple (10 clients) == head 244.075592   cache 233.815389
== extended ==            head 194.676093   cache 202.778580
== prepared ==            head 300.460328   cache 302.061739
== select only ==         head 886.207252   cache 909.832986

A benchmark with pgbench at scale 20 (average of 3 runs, -T 300, clients = 1 except on the second run):

== simple ==              head 19.890278    cache 19.536342
== simple (10 clients) == head 40.864455    cache 44.457357
== extended ==            head 21.372751    cache 19.992955
== prepared ==            head 19.543434    cache 20.226981
== select only ==         head 31.780529    cache 36.410658

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: 24x7 support and training

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
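The behaviour change described above -- one evaluation per query instead of one per row -- can be modeled outside the database (an illustrative Python sketch, not the patch's C implementation):

```python
import random

def uncached_rows(n):
    # HEAD behaviour: the "stable" function is re-evaluated for every row.
    return [random.random() for _ in range(n)]

def cached_rows(n):
    # Patched behaviour: a stable expression with constant arguments is
    # evaluated once and the result reused for every row of the query.
    value = random.random()
    return [value for _ in range(n)]

print(len(set(uncached_rows(10))))  # almost certainly 10 distinct values
print(len(set(cached_rows(10))))    # always 1
```

This is exactly why the patch makes the STABLE promise a hard one: a volatile body hidden inside a stable function is evaluated only once per query.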
Re: [HACKERS] Group commit, revised
On Mon, Jan 30, 2012 at 6:57 AM, Heikki Linnakangas wrote:
> On 27.01.2012 15:38, Robert Haas wrote:
>> On Fri, Jan 27, 2012 at 8:35 AM, Heikki Linnakangas wrote:
>>> Yeah, we have to be careful with any overhead in there, it can be a hot
>>> spot. I wouldn't expect any measurable difference from the above, though.
>>> Could I ask you to rerun the pgbench tests you did recently with this
>>> patch? Or can you think of a better test for this?
>>
>> I can't do so immediately, because I'm waiting for Nate Boley to tell
>> me I can go ahead and start testing on that machine again. But I will
>> do it once I get the word.
>
> I committed this. I ran a pgbench test on an 8-core box and didn't see any
> slowdown. It would still be good if you get a chance to rerun the bigger
> test, but I feel confident that there's no measurable slowdown.

Is it safe to assume that, under "#ifdef LWLOCK_STATS", a call to LWLockAcquire will always precede any calls to LWLockWaitUntilFree when a new process is started, so that the stats arrays get calloc'd? I guess it is right now, because the only user is WALWrite, which would never be acquired before WALInsert is at least once. But this doesn't seem very future-proof. I guess the same complaint could be lodged against LWLockConditionalAcquire. Since people wouldn't be expected to define LWLOCK_STATS on production builds, perhaps this issue is ignorable.

Cheers,

Jeff
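The initialization-order hazard described above can be sketched abstractly: if per-lock stats storage is allocated lazily on first LWLockAcquire, any other entry point that touches the stats must not run first. (This is an illustrative Python model with invented names; the real code is C in lwlock.c.)

```python
class LWLockStats:
    """Toy model of per-process lock statistics allocated on demand."""
    def __init__(self):
        self.counts = {}  # lock name -> acquisition/wait count

    def acquire(self, lock_name):
        # LWLockAcquire allocates (callocs) the stats slot on first use.
        self.counts.setdefault(lock_name, 0)
        self.counts[lock_name] += 1

    def wait_until_free_fragile(self, lock_name):
        # Mirrors the complaint: assumes acquire() already created the slot,
        # so it breaks if this is the first lock operation in the process.
        self.counts[lock_name] += 1  # KeyError if acquire() never ran

    def wait_until_free_safe(self, lock_name):
        # Future-proof variant: allocate on demand here too.
        self.counts.setdefault(lock_name, 0)
        self.counts[lock_name] += 1

stats = LWLockStats()
stats.wait_until_free_safe("WALWrite")  # fine even with no prior acquire
try:
    LWLockStats().wait_until_free_fragile("WALWrite")
except KeyError:
    print("stats slot was never allocated")
```

Today the fragile ordering happens to hold (WALWrite is never the first lock touched), which is why the problem is latent rather than visible.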
Re: [HACKERS] LWLockWaitUntilFree (was: Group commit, revised)
On Fri, Feb 3, 2012 at 12:57 PM, Robert Haas wrote:
> I think I recommended a bad name for this function. It's really
> LWLockAcquireOrWaitUntilFree. Can we rename that?
>
> We might also want to consider what all the behaviors are that we
> might want here. It strikes me that there are two possible actions
> when a lock is immediately available (acquire or noop) and three
> possible actions when it isn't (wait-then-acquire, wait-then-noop, or
> immediate-noop), for a total of six possible combinations:
>
> acquire/wait-then-acquire = LWLockAcquire
> acquire/wait-then-noop = what you just added
> acquire/immediate-noop = LWLockConditionalAcquire
> noop/wait-then-acquire
> noop/wait-then-noop
> noop/immediate-noop
>
> The last one is clearly pointless, since you don't need a lock at all
> to do nothing. But is there a use case for either of the other two?

You've only considered combinations with the same mode on each side of the "/". One thing that I've wanted is acquire-exclusive-immediately-and-return-true / wait-then-acquire-shared-and-return-false. The idea was that there is something I need to verify has been done, but it doesn't matter who did it. To do the work I need exclusive, but to verify that the work has already been done I only need shared. So if I don't get the exclusive lock immediately, then by the time I could get it the work has probably been done by someone else, and I only need a shared lock to verify that.

Come to think of it, I think what I wanted that behavior for was the very thing that inspired this WaitUntilFree work: improving group commit. But I see that in this case the verification is being done under a spinlock rather than a shared LWLock. I don't know if there is any use case left over for my proposed behavior, or if spinlock verification would always be easy enough to arrange so as to make it pointless.
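The pattern described above -- take the exclusive lock to do the work, or fall back to waiting merely to verify that someone else did it -- can be sketched like this (an illustrative Python model; Python's Lock has no shared mode, so the shared/exclusive distinction is collapsed and all names here are invented):

```python
import threading

class WorkOnce:
    """Ensure some work has been done, by whoever gets there first."""
    def __init__(self):
        self.lock = threading.Lock()
        self.done = False

    def ensure_done(self, do_work):
        # Fast path: try to take the "exclusive" lock without blocking.
        if self.lock.acquire(blocking=False):
            try:
                if self.done:
                    return "verified it was already done"
                do_work()
                self.done = True
                return "did the work"
            finally:
                self.lock.release()
        # Couldn't get the lock immediately: by the time it frees up, the
        # work has probably been done by someone else, so waiting and then
        # verifying is enough ("shared" access in the LWLock analogy).
        with self.lock:
            if not self.done:
                do_work()
                self.done = True
                return "did the work after waiting"
            return "verified it was already done"

w = WorkOnce()
print(w.ensure_done(lambda: None))   # did the work
print(w.ensure_done(lambda: None))   # verified it was already done
```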
Cheers,

Jeff
Re: [HACKERS] Checkpoint sync pause
On Mon, Jan 16, 2012 at 5:59 PM, Greg Smith wrote:
> On 01/16/2012 11:00 AM, Robert Haas wrote:
>> Also, I am still struggling with what the right benchmarking methodology
>> even is to judge whether any patch in this area "works". Can you provide
>> more details about your test setup?
>
> The "test" setup is a production server with a few hundred users at peak
> workload, reading and writing to the database. Each RAID controller (a
> couple of them with their own tablespaces) has either 512MB or 1GB of
> battery-backed write cache. The setup that leads to the bad situation
> happens like this:
>
> -The steady stream of backend writes that happen between checkpoints have
> filled up most of the OS write cache. A look at /proc/meminfo shows around
> 2.5GB "Dirty:"

"backend writes" includes bgwriter writes, right?

> -Since we have shared_buffers set to 512MB to try and keep checkpoint storms
> from being too bad, there might be 300MB of dirty pages involved in the
> checkpoint. The write phase dumps this all into Linux's cache. There's now
> closer to 3GB of dirty data there. @64GB of RAM, this is still only 4.7%
> though--just below the effective lower range for dirty_background_ratio.

Has a newer kernel with dirty_background_bytes been tried, so that it could be set to a lower level? If so, how did it do? Or does it just refuse to obey below the 5% level as well?

> Linux is perfectly content to let it all sit there.
>
> -Sync phase begins. Between absorption and the new checkpoint writes, there
> are >300 segments to sync waiting here.

If there is 3GB of dirty data spread over >300 segments, and each segment is about full-sized (1GB), then on average <1% of each segment is dirty? If that analysis holds, then it seems like there is simply an awful lot of data that has to be written randomly, no matter how clever the re-ordering is.
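That back-of-envelope estimate is easy to check (illustrative arithmetic, assuming exactly 300 full 1GB segments):

```python
dirty_bytes = 3 * 1024**3   # ~3GB of dirty data in the OS cache
segments = 300              # >300 segments awaiting sync
segment_bytes = 1024**3     # each segment file up to 1GB

dirty_per_segment = dirty_bytes / segments
fraction = dirty_per_segment / segment_bytes
print(f"{dirty_per_segment / 1024**2:.1f} MB dirty per segment "
      f"({fraction:.1%} of each 1GB file)")
# -> 10.2 MB dirty per segment (1.0% of each 1GB file)
```

With more than 300 segments the per-segment fraction drops below 1%, which is why the writes look essentially random to the storage layer.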
In other words, it is not that a harried or panicked kernel or RAID controller is failing to do good re-ordering when it has opportunities to; it is just that the data is dirtied too randomly for substantial reordering to be possible even under ideal conditions.

Does the BBWC, once given an fsync command and reporting success, write out those blocks forthwith, or does it lollygag around like the kernel (under non-fsync) does? If it is waiting around for write-combining opportunities that will never actually materialize in sufficient quantities to make up for the wait, how do you get it to stop?

Was the sorted checkpoint with an fsync after every file (real file, not VFD) one of the changes you tried?

> -The first few syncs force data out of Linux's cache and into the BBWC.
> Some of these return almost instantly. Others block for a moderate number
> of seconds. That's not necessarily a showstopper, on XFS at least. So long
> as the checkpointer is not being given all of the I/O in the system, the
> fact that it's stuck waiting for a sync doesn't mean the server is
> unresponsive to the needs of other backends. Early data might look like
> this:
>
> DEBUG: Sync #1 time=21.969000 gap=0.00 msec
> DEBUG: Sync #2 time=40.378000 gap=0.00 msec
> DEBUG: Sync #3 time=12574.224000 gap=3007.614000 msec
> DEBUG: Sync #4 time=91.385000 gap=2433.719000 msec
> DEBUG: Sync #5 time=2119.122000 gap=2836.741000 msec
> DEBUG: Sync #6 time=67.134000 gap=2840.791000 msec
> DEBUG: Sync #7 time=62.005000 gap=3004.823000 msec
> DEBUG: Sync #8 time=0.004000 gap=2818.031000 msec
> DEBUG: Sync #9 time=0.006000 gap=3012.026000 msec
> DEBUG: Sync #10 time=302.75 gap=3003.958000 msec

Syncs 3 and 5 kind of surprise me. It seems like the times should be more bimodal. If the cache is already full, why doesn't the system promptly collapse, like it does later? And if it is not, why would a sync take 12 seconds, or even 2 seconds?
Or is this just evidence that the gaps you are inserting are partially, but not completely, effective?

> [Here 'gap' is a precise measurement of how close the sync pause feature is
> working, with it set to 3 seconds. This is from an earlier version of this
> patch. All the timing issues I used to measure went away in the current
> implementation because it doesn't have to worry about doing background
> writer LRU work anymore, with the checkpointer split out]
>
> But after a few hundred of these, every downstream cache is filled up. The
> result is seeing more really ugly sync times, like #164 here:
>
> DEBUG: Sync #160 time=1147.386000 gap=2801.047000 msec
> DEBUG: Sync #161 time=0.004000 gap=4075.115000 msec
> DEBUG: Sync #162 time=0.005000 gap=2943.966000 msec
> DEBUG: Sync #163 time=962.769000 gap=3003.906000 msec
> DEBUG: Sync #164 time=45125.991000 gap=3033.228000 msec
> DEBUG: Sync #165 time=4.031000 gap=2818.013000 msec
> DEBUG: Sync #166 time=212.537000 gap=3039.979000 msec
> DEBUG: Sync #167 time=0.005000 gap=2820.023000 msec
> ...
> DEBUG: Sync #355 time=2.55 gap=2806.425000 msec
> LOG: Sync 355 files longest=45125.991
Re: [HACKERS] Review of: explain / allow collecting row counts without timing info
On Fri, Feb 3, 2012 at 8:12 AM, Robert Haas wrote:
> On Sat, Jan 21, 2012 at 10:32 PM, Jeff Janes wrote:
>> On Fri, Jan 13, 2012 at 3:07 PM, Tomas Vondra wrote:
>>> Fixed. The default value of the TIMING option did not work as intended; it
>>> was set to true even for plain EXPLAIN (without ANALYZE). In that case
>>> the EXPLAIN failed.
>>
>> I've applied this over the "show Heap Fetches in EXPLAIN ANALYZE
>> output for index-only scans" patch. It applied and built and passes
>> installcheck.
>>
>> I have not done a full review of this, but I want to say that I want
>> this very much. Not only can I get the row counts, but also the Heap
>> Fetches and the output of BUFFERS, without the overhead of timing. I
>> haven't looked at the contrib aspects of it at all.
>>
>> Thank you for making this.
>
> +1.
>
> After looking at this a bit, I think we can make the interface a bit
> more convenient. I'd like to propose that we call the EXPLAIN option
> "ROWS" rather than "TIMING", and that we provide the row-count
> information if *either* ROWS or ANALYZE is specified.

If we are allowed to make big breaks with the past, I would get rid of ANALYZE altogether. "Analyze" is basically a synonym of "explain", or is contained within it: you can't explain anything until you have first analyzed it. Without ANALYZE, you are explaining the analysis done by the planner; with it, you are explaining an analysis done by the planner and the executor. So we could do away with ANALYZE and have TIMING, ROWS, and BUFFERS, any of which causes the query to actually be executed, not just planned.

I suspect we will be unwilling to make such a break with the past. In that case, I think I prefer the originally proposed semantics, even though I agree they are somewhat less natural. ANALYZE is a big flag that means "this query will be executed, not just planned".
If we are not going to make a major break, but only nibble around the edges, then I don't think we should remove the property that the query is executed if and only if ANALYZE is specified.

Cheers,

Jeff
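The two proposals differ only in which options imply execution and which imply row counts. The difference can be tabulated with a tiny decision model (illustrative only; this is a model of the discussed semantics, not psql syntax):

```python
def submitted_semantics(opts):
    """Original patch: ANALYZE alone controls execution; TIMING off
    drops only the timings."""
    executes = "ANALYZE" in opts
    return {
        "executes": executes,
        "row_counts": executes,
        "timing": executes and opts.get("TIMING", True),
    }

def rows_proposal(opts):
    """Robert's proposal: ROWS (or ANALYZE) executes and shows row counts;
    timing comes only with ANALYZE."""
    executes = "ANALYZE" in opts or "ROWS" in opts
    return {
        "executes": executes,
        "row_counts": executes,
        "timing": "ANALYZE" in opts,
    }

# EXPLAIN (ANALYZE, TIMING off)  vs  EXPLAIN (ROWS): same observable result,
# but only the first spells out that the query will be executed.
print(submitted_semantics({"ANALYZE": True, "TIMING": False}))
print(rows_proposal({"ROWS": True}))
```

Both spellings produce row counts without timing; the disagreement is purely about whether execution should remain tied to the single ANALYZE keyword.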
Re: [HACKERS] some longer, larger pgbench tests with various performance-related patches
Robert Haas wrote:
> A couple of things stand out at me from these graphs. First, some
> of these transactions had really long latency. Second, there are a
> remarkable number of seconds, all through the tests, during which no
> transactions at all manage to complete, sometimes several seconds
> in a row. I'm not sure why. Third, all of the tests initially start
> off processing transactions very quickly, and get slammed down very
> hard, probably because the very high rate of transaction processing
> early on causes a checkpoint to occur around 200 s.

The amazing performance at the very start of all of these tests suggests that there is a write-back cache (presumably battery-backed) which absorbs writes until it becomes full, at which point actual disk writes become the bottleneck.

The problems you mention here, where no transactions complete, sound like the usual problem that many people have complained about on the lists, where the controller cache becomes so overwhelmed that activity seems to cease while the controller catches up. Greg, and to a lesser degree myself, have written about this for years.

On the nofpw graph, I wonder whether the lower write rate just takes that much longer to fill the controller cache. I don't think it's out of the question that it could take 700 seconds instead of 200 seconds, depending on whether full pages are being fsynced to WAL.

This effect is precisely why I think that on such machines the DW feature may be a huge help. If one small file is being written to and fsynced repeatedly, it stays "fresh" enough not to actually be written to the disk (it will stay in OS or controller cache), and the disks are freed up to write everything else, helping to keep the controller cache from being overwhelmed. (Whether the patches to date are effective at achieving this is a separate question -- I'm convinced the concept is sound for certain important workloads.)
> I didn't actually log when the checkpoints were occurring,

It would be good to have that information if you can get it for future tests.

-Kevin
[HACKERS] Re: Review of: explain / allow collecting row counts without timing info
On Fri, Feb 03, 2012 at 11:12:33AM -0500, Robert Haas wrote:
> After looking at this a bit, I think we can make the interface a bit
> more convenient. I'd like to propose that we call the EXPLAIN option
> "ROWS" rather than "TIMING", and that we provide the row-count
> information if *either* ROWS or ANALYZE is specified.
>
> Then, if you want rows without timing, you can say this:
>
> EXPLAIN (ROWS) query...
>
> Rather than, as in the approach taken by the current patch:
>
> EXPLAIN (ANALYZE, TIMING off) query...
>
> ...which is a little more long-winded.

I favor the originally-submitted syntax. I like having a single option, ANALYZE, signal whether to actually run the statement. The submitted syntax also aligns with the facts that would motivate someone to use it: "timing has high overhead on my system", or "timing makes the output unstable".

We have both estimated row counts and actual row counts. It may someday prove useful to have an invocation that plans the query and shows estimated row counts, omitting only cost estimates. Having a "ROWS" option that implies running the query could complicate that effort.

Thanks,
nm
Re: [HACKERS] ecpglib use PQconnectdbParams
On Thu, Feb 02, 2012 at 08:01:48PM +0200, Peter Eisentraut wrote:
> I noticed ecpglib still uses PQconnectdb() with a craftily assembled
> connection string. Here is a patch to use PQconnectdbParams() instead.

Thanks, committed. Will sync as soon as I'm online again.

Michael

--
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at googlemail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL
Re: [HACKERS] Should we add crc32 in libpgport?
On Tue, Jan 31, 2012 at 3:43 PM, Tom Lane wrote:
> Daniel Farina writes:
>> On Tue, Jan 17, 2012 at 2:14 AM, Daniel Farina wrote:
>>> See the attached patch. It has a detailed cover letter/comment at the
>>> top of the file.
>
>> I have amended that description to be more accurate.
>
> I looked at this a bit, and it seems to go considerably further than
> I had in mind: unless I've missed something, this will instantiate a
> couple of kilobytes of static data in every .c file that includes
> pg_crc.h, directly or indirectly. While that might be tolerable in an
> external project, there are going to be a LOT of copies of that table
> in the backend, many of them unused. Did you check to see how much
> larger the backend executable got? For that matter, aren't there a lot
> of build warnings about unused static variables?

Ah, yes, I think my optimizations were off when building, or something. I didn't get such verbosity at first, and then I remember doing something slightly different and getting a lot of output. I didn't pay attention to the build size. I will investigate.

> It occurs to me that rather than an #ifdef symbol, it might be
> appropriate to put the constant table into a separate .h file,
> say pg_crc_tables.h, so that users would control it by including
> that file or not rather than an #ifdef symbol. This is pretty
> trivial but might look a bit nicer.

I agree. I was about to say "what about a preprocessor hack..." after the last paragraph, but I see you beat me to the punch. I'll have a look soon.

--
fdr
Re: [HACKERS] Hot standby fails if any backend crashes
On Thu, Feb 2, 2012 at 8:48 PM, Tom Lane wrote:
> It's a bit disturbing that nobody has reported this from the field yet.
> Seems to imply that hot standby isn't being used much.

I have seen this, but didn't get to dig in, as I thought it could be a problem from other things done outside Postgres (it also came up in #6200, but I didn't mention it there). Consider it retroactively reported.

--
fdr
Re: [HACKERS] Review of: explain / allow collecting row counts without timing info
On 3.2.2012 22:51, Robert Haas wrote:
> On Fri, Feb 3, 2012 at 2:56 PM, Tomas Vondra wrote:
>> OK, thanks for the explanation. I don't like the idea of subsets as it
>> IMHO makes it less obvious what options are enabled. For example this
>>
>> EXPLAIN (ROWS) query...
>>
>> does not immediately show it's actually going to do ANALYZE.
>
> Well, it isn't, if ANALYZE means rows + timing...

Yeah, sure. It depends on how you define ANALYZE. I see that as "stats collected at runtime" (in contrast to a query plan, which is just a... well, plan).

>> but what if someone wants both at the same time? Maybe he could do
>>
>> EXPLAIN (ROWS, BUFFERS)
>>
>> and treat that as a union of those subsets. I don't think it's worth it.
>
> Yeah, I forgot that we'd have to allow that, though I don't think it
> would be a big deal to fix that.
>
>> I surely can live with both solutions (mine or the one you proposed).
>
> Let's wait and see if anyone else has an opinion.

Sure. And thanks for your feedback!

Tomas
Re: [HACKERS] Review of: explain / allow collecting row counts without timing info
On Fri, Feb 3, 2012 at 2:56 PM, Tomas Vondra wrote:
> OK, thanks for the explanation. I don't like the idea of subsets as it
> IMHO makes it less obvious what options are enabled. For example this
>
> EXPLAIN (ROWS) query...
>
> does not immediately show it's actually going to do ANALYZE.

Well, it isn't, if ANALYZE means rows + timing...

> I prefer to keep the current 'ANALYZE' definition, i.e. collecting both
> row counts and timing data (which is what 99% of people want anyway),
> and an option to disable the timing.
>
> And the BUFFERS option currently works exactly like that, so defining
> ROWS the way you proposed would be inconsistent with the current behavior.
>
> Sure, we could redefine BUFFERS as a subset, so you could do
>
> EXPLAIN (ROWS) ... instead of ... EXPLAIN (ANALYZE, TIMING off)
> EXPLAIN (BUFFERS) ... instead of ... EXPLAIN (ANALYZE, BUFFERS on)
>
> but what if someone wants both at the same time? Maybe he could do
>
> EXPLAIN (ROWS, BUFFERS)
>
> and treat that as a union of those subsets. I don't think it's worth it.

Yeah, I forgot that we'd have to allow that, though I don't think it would be a big deal to fix.

> I surely can live with both solutions (mine or the one you proposed).

Let's wait and see if anyone else has an opinion.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] double writes using "double-write buffer" approach [WIP]
On Fri, Feb 3, 2012 at 3:14 PM, Dan Scales wrote:
> Thanks for the feedback! I think you make a good point about the small size
> of dirty data in the OS cache. I think what you can say about this
> double-write patch is that it will not work well for configurations that
> have a small Postgres cache and a large OS cache, since every write from the
> Postgres cache requires double-writes and an fsync.

The general guidance for setting shared_buffers these days is 25% of RAM up to a maximum of 8GB, so the configuration that you're describing as not optimal for this patch is the one normally used when running PostgreSQL. I've run across several cases where larger values of shared_buffers are a huge win, because the entire working set can then be accommodated in shared_buffers. But it's certainly not the case that all working sets fit.

And in this case, I think that's beside the point anyway. I had shared_buffers set to 8GB on a machine with much more memory than that, but the database created by pgbench -i -s 10 is about 156 MB, so the problem isn't that there is too little PostgreSQL cache available. The entire database fits in shared_buffers, with most of it left over. However, because of the BufferAccessStrategy stuff, pages start to get forced out to the OS pretty quickly. Of course, we could disable the BufferAccessStrategy stuff when double_writes is in use, but bear in mind that the reason we have it in the first place is to prevent cache thrashing effects. It would be imprudent of us to throw that out the window without replacing it with something else that would provide similar protection. And even if we did, that would just delay the day of reckoning. You'd be able to blast through and dirty the entirety of shared_buffers at top speed, but then as soon as you started replacing pages performance would slow to an utter crawl, just as it did here, only you'd need a bigger scale factor to trigger the problem.

The more general point here is that there are MANY aspects of PostgreSQL's design that assume that shared_buffers accounts for a relatively small percentage of system memory. Here's another one: we assume that backends that need temporary memory for sorts and hashes (i.e. work_mem) can just allocate it from the OS. If we were to start recommending setting shared_buffers to large percentages of the available memory, we'd probably have to rethink that. Most likely, we'd need some kind of in-core mechanism for allocating temporary memory from the shared memory segment. And here's yet another one: we assume that it is better to recycle old WAL files and overwrite the contents rather than create new, empty ones, because we assume that the pages from the old files may still be present in the OS cache. We also rely on the fact that an evicted CLOG page can be pulled back in quickly without (in most cases) a disk access. We also rely on shared_buffers not being too large to avoid walloping the I/O controller too hard at checkpoint time -- which is forcing some people to set shared_buffers much smaller than would otherwise be ideal.

In other words, even if setting shared_buffers to most of the available system memory would fix the problem I mentioned, it would create a whole bunch of new ones, many of them non-trivial. It may be a good idea to think about what we'd need to do to work efficiently in that sort of configuration, but there is going to be a very large amount of thinking, testing, and engineering that has to be done to make it a reality.

There's another issue here, too. The idea that we're going to write data to the double-write buffer only when we decide to evict the pages strikes me as a bad one. We ought to proactively start dumping pages to the double-write area as soon as they're dirtied, and fsync them after every N pages, so that by the time we need to evict some page that requires a double-write, it's already durably on disk in the double-write buffer, and we can do the real write without having to wait. It's likely that, to make this perform acceptably for bulk loads, you'll need the writes to the double-write buffer and the fsyncs of that buffer to be done by separate processes, so that one backend (the background writer, perhaps) can continue spooling additional pages to the double-write files while some other process (a new auxiliary process?) fsyncs the ones that are already full. Along with that, the page replacement algorithm probably needs to be adjusted to avoid evicting pages that need an as-yet-unfinished double-write like the plague, even to the extent of allowing the BufferAccessStrategy rings to grow if the double-writes can't be finished before the ring wraps around.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
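The proposed pipeline -- spool pages to the double-write area as soon as they are dirtied, fsync every N pages, and only allow cheap eviction of pages whose double-write copy is already durable -- can be sketched as a toy model (illustrative Python with invented names; the real design would live in the buffer manager and use separate processes):

```python
FSYNC_BATCH = 4  # fsync the double-write area after this many spooled pages

class DoubleWriteModel:
    def __init__(self):
        self.spooled = []     # pages written to the DW area, not yet fsynced
        self.durable = set()  # pages whose DW copy has been fsynced

    def page_dirtied(self, page):
        # Proactively spool the page to the double-write area...
        self.spooled.append(page)
        # ...and fsync the batch once it fills, making those copies durable.
        if len(self.spooled) >= FSYNC_BATCH:
            self.fsync_dw()

    def fsync_dw(self):
        self.durable.update(self.spooled)
        self.spooled.clear()

    def can_evict_cheaply(self, page):
        # Evicting a page whose double-write copy isn't durable would force a
        # synchronous fsync wait, so the replacement policy should avoid it.
        return page in self.durable

dw = DoubleWriteModel()
for p in range(1, 6):
    dw.page_dirtied(p)
print(sorted(dw.durable))        # pages 1-4 are durable in the DW area
print(dw.can_evict_cheaply(5))   # False: page 5's batch hasn't been fsynced
```

In the full design, the spooling and the batched fsyncs would be done by different processes, so that dirtying never blocks on an fsync of the double-write area.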
Re: [HACKERS] LWLockWaitUntilFree (was: Group commit, revised)
On Fri, Jan 27, 2012 at 8:35 AM, Heikki Linnakangas wrote:
>> I think you should break this off into a new function,
>> LWLockWaitUntilFree(), rather than treating it as a new LWLockMode.
>> Also, instead of adding lwWaitOnly, I would suggest that we generalize
>> lwWaiting and lwExclusive into a field lwWaitRequest, which can be set
>> to 1 for exclusive, 2 for shared, 3 for wait-for-free, etc. That way
>> we don't have to add another boolean every time someone invents a new
>> type of wait -- not that that should hopefully happen very often. A
>> side benefit of this is that you can simplify the test in
>> LWLockRelease(): keep releasing waiters until you come to an exclusive
>> waiter, then stop. This possibly saves one shared memory fetch and
>> subsequent test instruction, which might not be trivial -- all of this
>> code is extremely hot.
>
> Makes sense. Attached is a new version, doing exactly that.

I think I recommended a bad name for this function. It's really LWLockAcquireOrWaitUntilFree. Can we rename that?

We might also want to consider what all the behaviors are that we might want here. It strikes me that there are two possible actions when a lock is immediately available (acquire or noop) and three possible actions when it isn't (wait-then-acquire, wait-then-noop, or immediate-noop), for a total of six possible combinations:

acquire/wait-then-acquire = LWLockAcquire
acquire/wait-then-noop = what you just added
acquire/immediate-noop = LWLockConditionalAcquire
noop/wait-then-acquire
noop/wait-then-noop
noop/immediate-noop

The last one is clearly pointless, since you don't need a lock at all to do nothing. But is there a use case for either of the other two? I can't immediately think of any reason why we'd want noop/wait-then-acquire, but noop/wait-then-noop seems potentially useful (that's what I originally thought LWLockWaitUntilFree was going to do).

For example, if for some reason you wanted all the WAL flushes to happen in one process, you could have everyone else use that primitive to sleep and then recheck whether enough flushing had happened. (I don't actually think that's better; it's not very robust against the WAL writer going away.) Or, for ProcArrayLock, we could (as you proposed previously) mark our XID with a removal sequence number in lieu of actually removing it, and then wait for ProcArrayLock to be free before overwriting it with a new XID.

Since we're out of time for 9.2 I don't think we want to devote too much time to experimenting with this right now, but it might be interesting to play around with it next cycle.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
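The six combinations in the matrix above can be enumerated mechanically; three correspond to existing or just-added primitives, one is pointless, and two are open questions (illustrative Python table using the names from the message):

```python
from itertools import product

when_free = ["acquire", "noop"]
when_busy = ["wait-then-acquire", "wait-then-noop", "immediate-noop"]

known = {
    ("acquire", "wait-then-acquire"): "LWLockAcquire",
    ("acquire", "wait-then-noop"): "the new LWLockWaitUntilFree",
    ("acquire", "immediate-noop"): "LWLockConditionalAcquire",
    ("noop", "immediate-noop"): "pointless (no lock needed to do nothing)",
}

# Enumerate every (free-action, busy-action) pair and name its primitive.
for combo in product(when_free, when_busy):
    label = known.get(combo, "no primitive yet")
    print(f"{combo[0]}/{combo[1]:<17} -> {label}")
```

The two unlabeled rows, noop/wait-then-acquire and noop/wait-then-noop, are the ones whose use cases are debated above.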
Re: [HACKERS] double writes using "double-write buffer" approach [WIP]
Hi Robert, Thanks for the feedback! I think you make a good point about the small size of dirty data in the OS cache. I think what you can say about this double-write patch is that it will work not work well for configurations that have a small Postgres cache and a large OS cache, since every write from the Postgres cache requires double-writes and an fsync. However, it should work much better for configurations with a much large Postgres cache and relatively smaller OS cache (including the configurations that I've given performance results for). In that case, there is a lot more capacity for dirty pages in the Postgres cache, and you won't have nearly as many dirty evictions. The checkpointer is doing a good number of the writes, and this patch sorts the checkpointer's buffers so its IO is efficient. Of course, I can also increase the size of the non-checkpointer ring buffer to be much larger, though I wouldn't want to make it too large, since it is consuming memory. If I increase the size of the ring buffers significantly, I will probably need to add some data structures so that the ring buffer lookups in smgrread() and smgrwrite() are more efficient. Can you let me know what the shared_buffers and RAM sizes were for your pgbench run? I can try running the same workload. If the size of shared_buffers is especially small compared to RAM, then we should increase the size of shared_buffers when using double_writes. Thanks, Dan - Original Message - From: "Robert Haas" To: "Dan Scales" Cc: "PG Hackers" Sent: Thursday, February 2, 2012 7:19:47 AM Subject: Re: [HACKERS] double writes using "double-write buffer" approach [WIP] On Fri, Jan 27, 2012 at 5:31 PM, Dan Scales wrote: > I've been prototyping the double-write buffer idea that Heikki and Simon > had proposed (as an alternative to a previous patch that only batched up > writes by the checkpointer). I think it is a good idea, and can help > double-writes perform better in the case of lots of backend evictions. 
> It also centralizes most of the code change in smgr.c. However, it is
> trickier to reason about.

This doesn't compile on MacOS X, because there's no writev().

I don't understand how you can possibly get away with such small buffers. AIUI, you must retain every page in the double-write buffer until it's been written and fsync'd to disk. That means the most dirty data you'll ever be able to have in the operating system cache with this implementation is (128 + 64) * 8kB = 1.5MB. Granted, we currently have occasional problems with the OS caching too *much* dirty data, but that seems like it's going way, way too far in the opposite direction. That's barely enough for the system to do any write reordering at all. I am particularly worried about what happens when a ring buffer is in use.

I tried running "pgbench -i -s 10" with this patch applied, full_page_writes=off, double_writes=on. It took 41.2 seconds to complete. The same test with the stock code takes 14.3 seconds; and the actual situation is worse for double-writes than those numbers might imply, because the index build time doesn't seem to be much affected, while the COPY takes a small eternity with the patch compared to the usual way of doing things. I think the slowdown on COPY once the double-write buffer fills is on the order of 10x.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of: explain / allow collecting row counts without timing info
On 3.2.2012 18:28, Robert Haas wrote:
> On Fri, Feb 3, 2012 at 12:08 PM, Tomas Vondra wrote:
>> I don't think changing the EXPLAIN syntax this way is a good idea. I think
>> that one option should not silently enable/disable others, so ROWS should
>> not enable ANALYZE automatically.
>
> I didn't propose that. The point is that the desired behavior
> (however we name it) is a SUBSET of what ANALYZE does.
>
> So we can either:
>
> 1. Have ANALYZE enable all the behavior, and have another option
> (TIMING) that can be used to turn some of it back off again.
>
> 2. Have ANALYZE enable all the behavior, and have another option
> (ROWS) that enables just the subset of it that we want.
>
> I prefer #2 to #1, because I think it's a bit confusing to have an
> option whose effect is to partially disable another option. YMMV, of
> course.

OK, thanks for the explanation. I don't like the idea of subsets, as IMHO it makes it less obvious which options are enabled. For example

EXPLAIN (ROWS) query...

does not immediately show that it's actually going to do ANALYZE. I prefer to keep the current ANALYZE definition, i.e. collecting both row counts and timing data (which is what 99% of people want anyway), plus an option to disable the timing. And the BUFFERS option currently works exactly like that, so defining ROWS the way you proposed would be inconsistent with the current behavior.

Sure, we could redefine BUFFERS as a subset, so you could do

EXPLAIN (ROWS) ... instead of ... EXPLAIN (ANALYZE, TIMING off)
EXPLAIN (BUFFERS) ... instead of ... EXPLAIN (ANALYZE, BUFFERS on)

but what if someone wants both at the same time? Maybe they could do EXPLAIN (ROWS, BUFFERS) and treat that as a union of those subsets. I don't think it's worth it. I can surely live with either solution (mine or the one you proposed).

kind regards
Tomas
Re: [HACKERS] Review of: explain / allow collecting row counts without timing info
On Fri, Feb 3, 2012 at 12:08 PM, Tomas Vondra wrote:
> I don't think changing the EXPLAIN syntax this way is a good idea. I think
> that one option should not silently enable/disable others, so ROWS should
> not enable ANALYZE automatically.

I didn't propose that. The point is that the desired behavior (however we name it) is a SUBSET of what ANALYZE does.

So we can either:

1. Have ANALYZE enable all the behavior, and have another option (TIMING) that can be used to turn some of it back off again.

2. Have ANALYZE enable all the behavior, and have another option (ROWS) that enables just the subset of it that we want.

I prefer #2 to #1, because I think it's a bit confusing to have an option whose effect is to partially disable another option. YMMV, of course.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Review of: explain / allow collecting row counts without timing info
On 3 Únor 2012, 17:12, Robert Haas wrote:
> On Sat, Jan 21, 2012 at 10:32 PM, Jeff Janes wrote:
>> On Fri, Jan 13, 2012 at 3:07 PM, Tomas Vondra wrote:
>>> Fixed. The default value of TIMING option did not work as intended, it
>>> was set to true even for plain EXPLAIN (without ANALYZE). In that case
>>> the EXPLAIN failed.
>>
>> I've applied this over the "show Heap Fetches in EXPLAIN ANALYZE
>> output for index-only scans" patch. It applied and built and passes
>> installcheck.
>>
>> I have not done a full review of this, but I want to say that I want
>> this very much. Not only can I get the row counts, but also the Heap
>> Fetches and the output of BUFFERS, without the overhead of timing. I
>> haven't looked at contrib aspects of it at all.
>>
>> Thank you for making this.
>
> +1.
>
> After looking at this a bit, I think we can make the interface a bit
> more convenient. I'd like to propose that we call the EXPLAIN option
> "ROWS" rather than "TIMING", and that we provide the row-count
> information if *either* ROWS or ANALYZE is specified.
>
> Then, if you want rows without timing, you can say this:
>
> EXPLAIN (ROWS) query...
>
> Rather than, as in the approach taken by the current patch:
>
> EXPLAIN (ANALYZE, TIMING off) query...
>
> ...which is a little more long-winded. This also has the nice
> advantage of making the name of the EXPLAIN option match the name of
> the INSTRUMENT flag.
>
> Revised patch along those lines attached. Barring objections, I'll
> commit this version.

I don't think changing the EXPLAIN syntax this way is a good idea. I think that one option should not silently enable/disable others, so ROWS should not enable ANALYZE automatically. It should throw an error just like the TIMING option does when invoked without ANALYZE (IIRC). Which leads to syntax like this

EXPLAIN (ANALYZE, ROWS)

and that's a bit strange because it mentions ROWS but actually manipulates TIMING behind the curtains.
You can't actually do EXPLAIN (ANALYZE, ROWS off) because row counts are always collected. So I think the syntax I proposed makes more sense.

kind regards
Tomas
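For concreteness, the two spellings under discussion look like this side by side (a hypothetical session; `pgbench_accounts` is just an example table, and ROWS is the proposed option name, not necessarily what was committed):

```sql
-- Current patch (Tomas): ANALYZE stays the umbrella option and
-- TIMING off merely suppresses the per-node timing overhead:
EXPLAIN (ANALYZE, TIMING off) SELECT count(*) FROM pgbench_accounts;

-- Robert's counter-proposal: ROWS alone would execute the query and
-- report row counts without timing:
EXPLAIN (ROWS) SELECT count(*) FROM pgbench_accounts;

-- Tomas's objection in practice: combining subsets needs a union,
EXPLAIN (ROWS, BUFFERS) SELECT count(*) FROM pgbench_accounts;
-- whereas today's option model composes uniformly:
EXPLAIN (ANALYZE, TIMING off, BUFFERS) SELECT count(*) FROM pgbench_accounts;
```

Either way the executed plan is the same; the debate is only over how the options compose.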
[HACKERS] xlog min recovery request ... is past current point ...
PostgreSQL 9.0.4: While bringing up a streaming replica, and while it is working its way through the WAL segments before connecting to the primary, I see a lot of messages of the form:

2012-02-01 21:26:13.978 PST,,,24448,,4f2a1e61.5f80,54,,2012-02-01 21:25:53 PST,1/0,0,LOG,0,"restored log file ""00010DB40065"" from archive",""
2012-02-01 21:26:14.032 PST,,,24448,,4f2a1e61.5f80,55,,2012-02-01 21:25:53 PST,1/0,0,WARNING,01000,"xlog min recovery request DB5/42E15098 is past current point DB4/657FA490","writing block 5 of relation base/155650/156470_vm xlog redo insert: rel 1663/155650/1658867; tid 9640/53"""
2012-02-01 21:26:14.526 PST,,,24448,,4f2a1e61.5f80,56,,2012-02-01 21:25:53 PST,1/0,0,LOG,0,"restored log file ""00010DB40066"" from archive",""

All of these are on _vm relations. The recovery completed successfully and the secondary connected to the primary without issue, so: are these messages something to be concerned over?

--
Christophe Pettus
x...@thebuild.com
Re: [HACKERS] Review of: explain / allow collecting row counts without timing info
On Sat, Jan 21, 2012 at 10:32 PM, Jeff Janes wrote:
> On Fri, Jan 13, 2012 at 3:07 PM, Tomas Vondra wrote:
>> Fixed. The default value of TIMING option did not work as intended, it
>> was set to true even for plain EXPLAIN (without ANALYZE). In that case
>> the EXPLAIN failed.
>
> I've applied this over the "show Heap Fetches in EXPLAIN ANALYZE
> output for index-only scans" patch. It applied and built and passes
> installcheck.
>
> I have not done a full review of this, but I want to say that I want
> this very much. Not only can I get the row counts, but also the Heap
> Fetches and the output of BUFFERS, without the overhead of timing. I
> haven't looked at contrib aspects of it at all.
>
> Thank you for making this.

+1.

After looking at this a bit, I think we can make the interface a bit more convenient. I'd like to propose that we call the EXPLAIN option "ROWS" rather than "TIMING", and that we provide the row-count information if *either* ROWS or ANALYZE is specified.

Then, if you want rows without timing, you can say this:

EXPLAIN (ROWS) query...

Rather than, as in the approach taken by the current patch:

EXPLAIN (ANALYZE, TIMING off) query...

...which is a little more long-winded. This also has the nice advantage of making the name of the EXPLAIN option match the name of the INSTRUMENT flag.

Revised patch along those lines attached. Barring objections, I'll commit this version.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

explain-rows.patch Description: Binary data
Re: [HACKERS] patch for parallel pg_dump
On Fri, Feb 3, 2012 at 7:52 AM, Robert Haas wrote:
> If you're OK with that much I'll go do it.

Sure, go ahead!
[HACKERS] Assertion failure in AtCleanup_Portals
If I run the following sequence of commands, I get an assertion failure in current HEAD.

postgres=# BEGIN;
BEGIN
postgres=# SELECT 1/0;
ERROR: division by zero
postgres=# ROLLBACK TO A;
ERROR: no such savepoint
postgres=# \q

The process fails when the session is closed and the aborted transaction gets cleaned up at proc_exit time. The stack trace is as below.

(gdb) bt
#0 0x00c16422 in __kernel_vsyscall ()
#1 0x00244651 in *__GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#2 0x00247a82 in *__GI_abort () at abort.c:92
#3 0x0844b800 in ExceptionalCondition (conditionName=0x86161e8 "!(portal->cleanup == ((void *)0))", errorType=0x8615e94 "FailedAssertion", fileName=0x8615e88 "portalmem.c", lineNumber=793) at assert.c:57
#4 0x08472a5a in AtCleanup_Portals () at portalmem.c:793
#5 0x080f2535 in CleanupTransaction () at xact.c:2373
#6 0x080f40d7 in AbortOutOfAnyTransaction () at xact.c:3855
#7 0x0845cc5e in ShutdownPostgres (code=0, arg=0) at postinit.c:966
#8 0x08332ba7 in shmem_exit (code=0) at ipc.c:221
#9 0x08332ab3 in proc_exit_prepare (code=0) at ipc.c:181
#10 0x08332a15 in proc_exit (code=0) at ipc.c:96
#11 0x0835b07c in PostgresMain (argc=2, argv=0xa12fa64, username=0xa12f958 "pavan") at postgres.c:4091

I analyzed this a bit. It seems that the first statement throws an error, runs AbortCurrentTransaction and puts the transaction block in TBLOCK_ABORT state. Any commands executed after this would usually fail with the error "current transaction is aborted". But the ROLLBACK TO gets past this check because IsTransactionExitStmt() returns success (and rightly so). So we end up creating a portal to run the ROLLBACK TO command. This command fails because the SAVEPOINT A does not exist, and we throw an error. But since the transaction block is already in TBLOCK_ABORT state, AbortCurrentTransaction doesn't do any work. The cleanup routine for the portal remains unexecuted.
Later, when the backend exits and cleanup routines are called, the assertion inside AtCleanup_Portals fails. I am not sure what the right fix for this is. I see Tom fixed a similar complaint some time back: http://git.postgresql.org/pg/commitdiff/6252c4f9e201f619e5eebda12fa867acd4e4200e But maybe it's not enough to cover all code paths, as demonstrated by this example.

Thanks,
Pavan

-- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?
On Wed, Feb 1, 2012 at 2:39 PM, Simon Riggs wrote:
> On Wed, Feb 1, 2012 at 7:31 PM, Peter Eisentraut wrote:
>> On sön, 2012-01-29 at 22:01 +, Simon Riggs wrote:
>>> Patch now locks index in AccessExclusiveLock in final stage of drop.
>>
>> Doesn't that defeat the point of doing the CONCURRENTLY business in the
>> first place?
>
> That was my initial reaction.
>
> We lock the index in AccessExclusiveLock only once we are certain
> nobody else is looking at it any more.
>
> So it's a Kansas City Shuffle, with safe locking in case of people
> doing strange low level things.

Yeah, I think this is much safer, and in this version that doesn't seem to harm concurrency. Given our previous experiences in this area, I wouldn't like to bet my life savings on this having no remaining bugs - but if it does, I can't find them. I'll mark this "Ready for Committer".

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [v9.2] sepgsql's DROP Permission checks
On Sat, Jan 28, 2012 at 1:53 PM, Kohei KaiGai wrote:
> 2012/1/26 Robert Haas :
>> On Thu, Jan 26, 2012 at 7:27 AM, Kohei KaiGai wrote:
>>> It seems to me reasonable design.
>>> The attached patch is rebased one according to your perform-deletion patch.
>>
>> That looks pretty sensible. But I don't think this is true any more:
>>
>> + Please note that it shall not be checked on the objects removed by
>> + cascaded deletion according to the standard manner in SQL.
>>
>> I've been thinking more about the question of object access hooks with
>> arguments as well. In a couple of designs you've proposed, we've
>> needed InvokeObjectAccessHook0..11 or whatever, and I don't like that
>> design - it seems grotty and not type-safe. However, looking at what
>> you did in this patch, I have another idea. Suppose that we add ONE
>> additional argument to the object access hook function, as you've done
>> here, and if a particular type of hook invocation needs multiple
>> arguments, it can pass them with a struct. In fact, let's use a
>> struct regardless, for uniformity, and pass the value as a void *.
>> That is:
>>
>> typedef struct ObjectAccessDrop {
>>     int dropflags;
>> } ObjectAccessDrop;
>>
>> At the call site, we do this:
>>
>> if (object_access_hook)
>> {
>>     ObjectAccessDrop arg;
>>     arg.dropflags = flags;
>>     InvokeObjectAccessHook(..., arg);
>> }
>>
>> If there's no argument, then we can just do:
>>
>> InvokeObjectAccessHook(..., NULL);
>>
>> The advantage of this is that if we change the structure definition,
>> loadable modules compiled against a newer code base should either (1)
>> still work or (2) fail to compile. The way we have it right now, if
>> we decide to pass a second argument (say, the DropBehavior) to the
>> hook, we're potentially going to silently break sepgsql no matter how
>> we do it. But if we enforce use of a struct, then the only thing the
>> client should ever be doing with the argument it gets is casting it to
>> ObjectAccessDrop *.
>> Then, if we've added fields to the struct, the
>> code will still do the right thing even if the field order has been
>> changed or whatever. If we've removed fields or changed their data
>> types, things should blow up fairly obviously instead of seeming to
>> work but actually failing.
>>
>> Thoughts?
>
> I also think your idea is good; flexible and reliable toward future
> enhancement.
>
> If we have one point to improve this idea, do we need to deliver
> "access", "classid", "objectid" and "subid" as separate arguments?
>
> If we define a type to deliver information on the object access hook as follows:
>
> typedef struct {
>     ObjectAccessType access;
>     ObjectAddress address;
>     union {
>         struct {
>             int flags;
>         } drop;
>     } arg;
> } ObjectAccessHookData;
>
> then the only argument that object_access_hook takes should be a pointer to this
> structure, and there is no need to type cast on the module side.
>
> One disadvantage is that it needs to set up this structure on the caller
> side, including the ObjectAccessType and ObjectAddress information. However, it can be
> embedded within a preprocessor macro to keep the number of lines down, as we currently do.
>
> example:
> #define InvokeDropAccessHook(classid, objectid, objsubid, flags) \
>     do {
>         if (object_access_hook)
>         {
>             ObjectAccessHookData __hook_data;
>
>             __hook_data.access = OAT_DROP;
>             __hook_data.address.classId = (classid);
>             __hook_data.address.objectId = (objectid);
>             __hook_data.address.objectSubid = (objsubid);
>             __hook_data.args.drop.flags = (flags);
>
>             (*object_access_hook)(&__hook_data);
>         }
>     } while (0)
>
> How about your opinion?

I don't see any real advantage of that. One advantage of the current design is that any hook types which *don't* require extra arguments need not set up and pass a structure; they can just pass NULL. So I suggest we keep classid, objid, and subid as separate arguments, and just add one new one which can be type-specific.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] patch for parallel pg_dump
On Thu, Feb 2, 2012 at 8:31 PM, Joachim Wieland wrote:
> On Wed, Feb 1, 2012 at 12:24 PM, Robert Haas wrote:
>> I think we're more-or-less proposing to rename "Archive" to
>> "Connection", aren't we?
>>
>> And then ArchiveHandle can store all the things that aren't related to
>> a specific connection.
>
> How about something like that:
> (Hopefully you'll come up with better names...)
>
> StateHandle {
>     Connection
>     Schema
>     Archive
>     Methods
>     union {
>         DumpOptions
>         RestoreOptions
>     }
> }
>
> Dumping would mean to do:
>
> Connection -> Schema -> Archive using DumpOptions through the
> specified Methods
>
> Restore:
>
> Archive -> Schema -> Connection using RestoreOptions through the
> specified Methods
>
> Dumping from one database and restoring into another one would be two
> StateHandles with different Connections, Archive == NULL, Schema
> pointing to the same Schema, Methods most likely also pointing to the
> same function pointer table and each with different Options in the
> union of course.
>
> Granted, you could say that above I've only grouped the elements of
> the ArchiveHandle, but I don't really see that breaking it up into
> several objects makes it any better or easier. There could be some
> accessor functions to hide the details of the whole object to the
> different consumers.

I'm not sure I understand what the various structures would contain. My gut feeling for how to begin grinding through this is to go through and do the following:

1. Rename Archive to Connection.
2. Add a PGconn object to it.
3. Change the declaration inside ArchiveHandle from "Archive public" to "Connection source_connection".

I think that'd get us significantly closer to sanity and be pretty simple to understand, and then we can take additional passes over it until we're happy with what we've got. If you're OK with that much I'll go do it.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] ecpglib use PQconnectdbParams
* Peter Eisentraut wrote:
> I noticed ecpglib still uses PQconnectdb() with a craftily assembled
> connection string. Here is a patch to use PQconnectdbParams() instead.

+ const char *conn_keywords[6];
+ const char *conn_values[6];

Shouldn't these be [7]? You can have up to 6 items, plus the terminator.

-- Christian Ullrich
Re: [HACKERS] Patch pg_is_in_backup()
On Fri, Feb 3, 2012 at 6:52 PM, Magnus Hagander wrote:
> On Fri, Feb 3, 2012 at 10:47, Fujii Masao wrote:
>> On Fri, Feb 3, 2012 at 6:10 PM, Bernd Helmle wrote:
>>> --On 3. Februar 2012 13:21:11 +0900 Fujii Masao wrote:
>>>> It seems to be more user-friendly to introduce a view like pg_stat_backup
>>>> rather than the function returning an array.
>>>
>>> I like this idea. A use case I saw for monitoring backup_label's in the
>>> past was mainly to discover a forgotten exclusive pg_stop_backup() (e.g.
>>> due to broken backup scripts). If the view would be able to distinguish
>>> between exclusive and non-exclusive backups, this would be great.
>>
>> Agreed. Monitoring an exclusive backup is very helpful. But I wonder
>> why we want to monitor non-exclusive backup. Is there any use case?
>
> Actually, we can already monitor much of the non-exclusive one through
> pg_stat_replication, including the info on when it was started (at
> least in almost every case, that will be more or less the
> backend_start time for that one).

Right.

>> If we want to monitor non-exclusive backup, why not pg_dump backup?
>
> .. which we can also monitor through pg_stat_activity by looking at
> application_name (which can be faked of course, but still).

Yep.

>> If there is no use case, it seems sufficient to implement the function
>> which reports the information only about exclusive backup.
>
> Yeah, thinking more of it, I think I agree. But the function should
> then probably be named in such a way that it's clear that we're
> talking about exclusive backups, e.g. not pg_is_in_backup() but
> instead pg_is_in_exclusive_backup() (renamed if we change it to return
> the timestamp instead, of course, but you get the idea).

Agreed.

Regards,

-- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Patch pg_is_in_backup()
On Fri, Feb 3, 2012 at 10:47, Fujii Masao wrote:
> On Fri, Feb 3, 2012 at 6:10 PM, Bernd Helmle wrote:
>> --On 3. Februar 2012 13:21:11 +0900 Fujii Masao wrote:
>>> It seems to be more user-friendly to introduce a view like pg_stat_backup
>>> rather than the function returning an array.
>>
>> I like this idea. A use case I saw for monitoring backup_label's in the
>> past was mainly to discover a forgotten exclusive pg_stop_backup() (e.g.
>> due to broken backup scripts). If the view would be able to distinguish
>> between exclusive and non-exclusive backups, this would be great.
>
> Agreed. Monitoring an exclusive backup is very helpful. But I wonder
> why we want to monitor non-exclusive backup. Is there any use case?

Actually, we can already monitor much of the non-exclusive one through pg_stat_replication, including the info on when it was started (at least in almost every case, that will be more or less the backend_start time for that one).

> If we want to monitor non-exclusive backup, why not pg_dump backup?

.. which we can also monitor through pg_stat_activity by looking at application_name (which can be faked of course, but still).

> If there is no use case, it seems sufficient to implement the function
> which reports the information only about exclusive backup.

Yeah, thinking more of it, I think I agree. But the function should then probably be named in such a way that it's clear that we're talking about exclusive backups, e.g. not pg_is_in_backup() but instead pg_is_in_exclusive_backup() (renamed if we change it to return the timestamp instead, of course, but you get the idea).

-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Patch pg_is_in_backup()
On Fri, Feb 3, 2012 at 6:10 PM, Bernd Helmle wrote:
> --On 3. Februar 2012 13:21:11 +0900 Fujii Masao wrote:
>> It seems to be more user-friendly to introduce a view like pg_stat_backup
>> rather than the function returning an array.
>
> I like this idea. A use case I saw for monitoring backup_label's in the
> past was mainly to discover a forgotten exclusive pg_stop_backup() (e.g.
> due to broken backup scripts). If the view would be able to distinguish
> between exclusive and non-exclusive backups, this would be great.

Agreed. Monitoring an exclusive backup is very helpful. But I wonder why we want to monitor non-exclusive backup. Is there any use case? If we want to monitor non-exclusive backup, why not pg_dump backup? If there is no use case, it seems sufficient to implement the function which reports the information only about exclusive backup.

Regards,

-- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Patch pg_is_in_backup()
--On 3. Februar 2012 13:21:11 +0900 Fujii Masao wrote:
> It seems to be more user-friendly to introduce a view like pg_stat_backup
> rather than the function returning an array.

I like this idea. A use case I saw for monitoring backup_label's in the past was mainly to discover a forgotten exclusive pg_stop_backup() (e.g. due to broken backup scripts). If the view would be able to distinguish between exclusive and non-exclusive backups, this would be great.

--
Thanks
Bernd
Re: [HACKERS] [PATCH] Fix float8 parsing of denormal values (on some platforms?)
On 02/03/2012 03:49 AM, Tom Lane wrote:
> Marti Raudsepp writes:
>> We're already seeing first buildfarm failures, on system "narwhal"
>> using an msys/mingw compiler.
>
> Yeah. After a full day's cycle, the answer seems to be that denormalized
> input works fine everywhere except:
>
> 1. mingw on Windows (even though MSVC builds work)
>
> 2. some Gentoo builds fail (knowing that platform, the phase of the moon
> is probably the most predictable factor determining this).
>
> I'm inclined at this point to remove the regression test cases, because
> we have no principled way to deal with case #2. We could install
> alternative "expected" files that accept the failure, but then what's
> the point of having the test case at all?
>
> It might be worth pressuring the mingw folk to get with the program, but
> I'm not going to be the one to do that.

No doubt they would tell us to upgrade the compiler. Narwhal's is horribly old. Neither of my mingw-based members is failing.

cheers

andrew
Re: [HACKERS] [PATCH] Fix float8 parsing of denormal values (on some platforms?)
Marti Raudsepp writes:
> We're already seeing first buildfarm failures, on system "narwhal"
> using an msys/mingw compiler.

Yeah. After a full day's cycle, the answer seems to be that denormalized input works fine everywhere except:

1. mingw on Windows (even though MSVC builds work)

2. some Gentoo builds fail (knowing that platform, the phase of the moon is probably the most predictable factor determining this).

I'm inclined at this point to remove the regression test cases, because we have no principled way to deal with case #2. We could install alternative "expected" files that accept the failure, but then what's the point of having the test case at all? It might be worth pressuring the mingw folk to get with the program, but I'm not going to be the one to do that.

regards, tom lane