Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Tue, Nov 16, 2010 at 3:39 PM, Greg Smith g...@2ndquadrant.com wrote:

> I want to next go through and replicate some of the actual database level
> tests before giving a full opinion on whether this data proves it's worth
> changing the wal_sync_method detection. So far I'm torn between whether
> that's the right approach, or if we should just increase the default value
> for wal_buffers to something more reasonable.

How about both? open_datasync seems problematic for a number of reasons - you get an immediate write-through whether you need it or not, including, as you point out, the case where you want to write several blocks at once and then force them all out together. And 64kB for a ring buffer just seems awfully small.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
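For anyone wanting to experiment with the two knobs under discussion, both can be overridden in postgresql.conf (values here are purely illustrative - the right defaults are exactly what is being debated):

```
# postgresql.conf -- illustrative values, not a recommendation
wal_sync_method = fdatasync   # rather than the platform-detected open_datasync
wal_buffers = 16MB            # the built-in default at this time was only 64kB
```

(In the released 9.1, wal_buffers ended up auto-tuned when left at -1, sized as a fraction of shared_buffers.)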
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Tue, Nov 16, 2010 at 6:25 PM, Josh Berkus j...@agliodbs.com wrote:

> On 11/16/10 12:39 PM, Greg Smith wrote:
>> I want to next go through and replicate some of the actual database level
>> tests before giving a full opinion on whether this data proves it's worth
>> changing the wal_sync_method detection. So far I'm torn between whether
>> that's the right approach, or if we should just increase the default value
>> for wal_buffers to something more reasonable.
>
> We'd love to, but wal_buffers uses sysV shmem.

*places tongue firmly in cheek* Gee, too bad there's not some other shared-memory implementation we could use...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PERFORM] Running PostgreSQL as fast as possible no matter the consequences
On Fri, Nov 5, 2010 at 8:12 AM, Jon Nelson jnelson+pg...@jamponi.net wrote:

> On Fri, Nov 5, 2010 at 7:08 AM, Guillaume Cottenceau g...@mnc.ch wrote:
>> Marti Raudsepp marti 'at' juffo.org writes:
>>> On Fri, Nov 5, 2010 at 13:32, A B gentosa...@gmail.com wrote:
>>>> I was just thinking about the case where I will have almost 100%
>>>> selects, but still need something better than a plain key-value
>>>> storage so I can do some sql queries. The server will just boot,
>>>> load data, run, hopefully not crash but if it would, just start
>>>> over with load and run.
>>>
>>> If you want fast read queries then changing
>>> fsync/full_page_writes/synchronous_commit won't help you.
>>
>> That illustrates how knowing the reasoning of this particular
>> request makes new suggestions worthwhile, while previous ones are
>> now seen as useless.
>
> I disagree that they are useless - the stated mechanism was start,
> load data, and run. Changing the params above won't likely change
> much in the 'run' stage but would they help in the 'load' stage?

Yes, they certainly will. And they might well help in the run stage, too, if there are temporary tables in use, or checkpoints flushing hint bit updates, or such things.

It's also important to crank up checkpoint_segments and checkpoint_timeout very high, especially for the bulk data load but even afterwards if there is any write activity at all. And it's important to set shared_buffers correctly, too, which helps on workloads of all kinds.

But as said upthread, turning off fsync, full_page_writes, and synchronous_commit are the things you can do that specifically trade reliability away to get speed.

In 9.1, I'm hopeful that we'll have unlogged tables, which will be even better than turning these parameters off, and for which I just posted a patch to -hackers. Instead of generating WAL and writing WAL to the OS and then NOT trying to make sure it hits the disk, we just won't generate it in the first place. But if PostgreSQL or the machine it's running on crashes, you won't need to completely blow away the cluster and start over; instead, the particular tables that you chose to create as unlogged will be truncated, and the rest of your data, including the system catalogs, will still be intact.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
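A sketch of the configuration being described, for a load-then-read workload whose data is rebuildable (values are illustrative only; checkpoint_segments is the parameter name of this era, later replaced by max_wal_size):

```
# postgresql.conf -- deliberately trades durability for speed;
# only safe when the data can be reloaded from scratch after a crash
fsync = off
full_page_writes = off
synchronous_commit = off
checkpoint_segments = 64      # default was 3; fewer, larger checkpoints
checkpoint_timeout = 30min
shared_buffers = 2GB          # size to the machine, e.g. roughly 25% of RAM
```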
Re: [PERFORM] Running PostgreSQL as fast as possible no matter the consequences
On Mon, Nov 15, 2010 at 2:27 PM, Andy Colson a...@squeakycode.net wrote:

> On 11/15/2010 9:06 AM, Robert Haas wrote:
>> In 9.1, I'm hopeful that we'll have unlogged tables, which will be even
>> better than turning these parameters off, and for which I just posted a
>> patch to -hackers. Instead of generating WAL and writing WAL to the OS
>> and then NOT trying to make sure it hits the disk, we just won't
>> generate it in the first place. But if PostgreSQL or the machine it's
>> running on crashes, you won't need to completely blow away the cluster
>> and start over; instead, the particular tables that you chose to create
>> as unlogged will be truncated, and the rest of your data, including the
>> system catalogs, will still be intact.
>
> if I am reading this right, that means: we can run our db safely (with
> fsync and full_page_writes enabled) except for tables of our choosing?
> If so, I am very +1 for this!

Yep. But we need some vic^H^Holunteers to review and test the patches.

https://commitfest.postgresql.org/action/patch_view?id=424

Code review, benchmarking, or just general tinkering and reporting what you find out on the -hackers thread would be appreciated.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
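For the curious, the feature under review looks like this (the syntax from the patch, which is what ultimately shipped in 9.1):

```sql
-- Contents survive a clean shutdown, but the table is truncated after
-- a crash instead of taking the rest of the cluster down with it.
CREATE UNLOGGED TABLE scratch_results (
    id      int PRIMARY KEY,
    payload text
);
```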
Re: [PERFORM] Difference between explain analyze and real execution time
2010/11/15 Artur Zając aza...@ang.com.pl:

> Why there is so big difference between explain analyze (0.710 ms) and
> real execution time (3309 ms)? Any suggestions?

Could it be that it takes a long time to plan for some reason? How fast is a plain EXPLAIN? What happens if you start up psql, turn on \timing, and then run EXPLAIN ANALYZE from within an interactive session? That's usually a better way to test, as it avoids counting the session-startup overhead.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
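The suggested test, spelled out for an interactive psql session (substitute the actual query for the placeholder):

```sql
\timing on
EXPLAIN SELECT ...;          -- planning only: is planning itself the slow part?
EXPLAIN ANALYZE SELECT ...;  -- \timing now reports client round-trip time,
                             -- which can be compared with the server-reported
                             -- total runtime, without session-startup overhead
```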
Re: [PERFORM] Why dose the planner select one bad scan plan.
On Thu, Nov 11, 2010 at 3:43 AM, t...@fuzzy.cz wrote:

>> Okay, I want to know how the planner computes the cost of constructing
>> a bitmap. And when the planner computes the cost of 'Bitmap Index
>> Scan', does it consider the influence of the memory cache? When I do
>> not clear the memory cache, I find the 'Bitmap Index Scan' is really
>> faster than 'Seq Scan'.
>
> There are two things here - loading the data from a disk into a cache
> (filesystem cache at the OS level / shared buffers at the PG level), and
> then the execution itself. PostgreSQL estimates the first part using an
> effective_cache_size hint, and uses that to estimate the probability
> that the data are already in the filesystem cache.

No, it does not do that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PERFORM] anti-join chosen even when slower than old plan
On Sat, Nov 13, 2010 at 4:32 AM, Marc Mamin m.ma...@intershop.de wrote:

> Hello,
>
> Just a short thought:
>
> Is it imaginable to compare the prognoses of the plans with the actual
> results and somehow log the worst cases?
>
> a) to help the DBA locate bad statistics and queries
> b) as additional information source for the planner
>
> This could possibly affect parameters of your formula on the fly.

Yeah, I've thought about this, but it's not exactly clear what would be most useful.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PERFORM] temporary tables, indexes, and query plans
On Sat, Nov 13, 2010 at 10:41 AM, Tom Lane t...@sss.pgh.pa.us wrote:

> Jon Nelson jnelson+pg...@jamponi.net writes:
>> OK. This is a highly distilled example that shows the behavior.
>>
>> BEGIN;
>> CREATE TEMPORARY TABLE foo AS
>>   SELECT x AS a, chr(x % 75 + 32) AS b, ''::text AS c
>>   FROM generate_series(1,500) AS x;
>> UPDATE foo SET c = 'foo' WHERE b = 'A';
>> CREATE INDEX foo_b_idx ON foo (b);
>> [ and the rest of the transaction can't use that index ]
>
> OK, this is an artifact of the HOT update optimization. Before creating
> the index, you did updates on the table that would have been executed
> differently if the index had existed. When the index does get created,
> its entries for those updates are incomplete, so the index can't be used
> in transactions that could in principle see the unmodified rows.

Is the "in principle" here because there might be an open snapshot other than the one under which CREATE INDEX is running, like a cursor?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PERFORM] temporary tables, indexes, and query plans
On Sat, Nov 13, 2010 at 7:54 PM, Tom Lane t...@sss.pgh.pa.us wrote:

> Robert Haas robertmh...@gmail.com writes:
>> On Sat, Nov 13, 2010 at 10:41 AM, Tom Lane t...@sss.pgh.pa.us wrote:
>>> OK, this is an artifact of the HOT update optimization. Before
>>> creating the index, you did updates on the table that would have been
>>> executed differently if the index had existed. When the index does
>>> get created, its entries for those updates are incomplete, so the
>>> index can't be used in transactions that could in principle see the
>>> unmodified rows.
>>
>> Is the "in principle" here because there might be an open snapshot
>> other than the one under which CREATE INDEX is running, like a cursor?
>
> Well, the test is based on xmin alone, not cmin, so it can't really
> tell the difference. It's unclear that it'd be worth trying.

Yeah, I'm not familiar with the logic in that area of the code, so I can't comment all that intelligently. However, I feel like there's a class of things that could potentially be optimized if we know that the only snapshot they could affect is the one we're currently using. For example, when bulk loading a newly created table with COPY or CTAS, we could set the xmin-committed hint bit if it weren't for the possibility that some snapshot with a command-ID equal to or lower than our own might take a look and get confused. That seems to require a BEFORE trigger or another open snapshot. And, if we HOT-update a tuple created by our own transaction that can't be of interest to anyone else ever again, it would be nice to either mark it for pruning or maybe even overwrite it in place; similarly, if we delete such a tuple, it would be nice to schedule its removal. There are problems with all of these ideas, and I'm not totally sure how to make any of it work, but to me this sounds suspiciously like another instance of a somewhat more general problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PERFORM] anti-join chosen even when slower than old plan
On Fri, Nov 12, 2010 at 4:15 AM, Cédric Villemain cedric.villemain.deb...@gmail.com wrote:

>> I'm wondering if we could do something with a formula like
>> 3 * amount_of_data_to_read / (3 * amount_of_data_to_read +
>> effective_cache_size) = percentage NOT cached. That is, if we're
>> reading an amount of data equal to effective_cache_size, we assume 25%
>> caching, and plot a smooth curve through that point. In the examples
>> above, we would assume that a 150MB read is 87% cached, a 1GB read is
>> 50% cached, and a 3GB read is 25% cached.
>
> But isn't it already the behavior of effective_cache_size usage?

No.

The idea of trying to know what is actually in cache strikes me as an almost certain non-starter. It can change very quickly, even as a result of the query you're actually running. And getting the information we'd need in order to do it that way would be very expensive, when it can be done at all.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PERFORM] anti-join chosen even when slower than old plan
On Fri, Nov 12, 2010 at 11:43 AM, Tom Lane t...@sss.pgh.pa.us wrote:

> I think his point is that we already have a proven formula
> (Mackert-Lohmann) and shouldn't be inventing a new one out of thin air.
> The problem is to figure out what numbers to apply the M-L formula to.
>
> I'm not sure that's really measuring the same thing, although I'm not
> opposed to using it if it produces reasonable answers. I've been
> thinking that we ought to try to use it in the context of the query as
> a whole rather than for individual table scans; the current usage
> already has some of that flavor but we haven't taken it to the logical
> conclusion.

That's got a pretty severe chicken-and-egg problem though, doesn't it? You're going to need to know how much data you're touching to estimate the costs so you can pick the best plan, but you can't know how much data will ultimately be touched until you've got the whole plan.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PERFORM] questions regarding shared_buffers behavior
On Sun, Nov 7, 2010 at 10:03 PM, Cédric Villemain cedric.villemain.deb...@gmail.com wrote:

> 2010/11/8 Mark Rostron mrost...@ql2.com:
>>>> What is the procedure that postgres uses to decide whether or not a
>>>> table/index block will be left in the shared_buffers cache at the
>>>> end of the operation?
>>>
>>> The only special cases are for sequential scans and VACUUM, which
>>> continuously re-use a small section of the buffer cache in some cases
>>> instead.
>>
>> Thanks - the part about sequential scans and the re-use of a small
>> section of shared_buffers is the bit I was interested in. I don't
>> suppose you would be able to tell me how large that re-useable area
>> might be?
>
> There are 256KB per seqscan and 256KB per vacuum. I suggest you go
> read src/backend/storage/buffer/README

Note that there is a different, higher limit for the bulk write strategy when using COPY IN or CTAS.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PERFORM] anti-join chosen even when slower than old plan
escape hatches elsewhere.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PERFORM] anti-join chosen even when slower than old plan
On Thu, Nov 11, 2010 at 1:23 PM, Tom Lane t...@sss.pgh.pa.us wrote:

> Robert Haas robertmh...@gmail.com writes:
>> Let's back up a moment and talk about what the overall goal is, here.
>> Ideally, we would like PostgreSQL to have excellent performance at all
>> times, under all circumstances, with minimal tuning. Therefore, we do
>> NOT want to add variables that will, by design, need constant manual
>> adjustment. That is why I suggested that Tom's idea of an
>> assume_cached GUC is probably not what we really want to do. On the
>> other hand, what I understand Mladen to be suggesting is something
>> completely different. He's basically saying that, of course, he wants
>> it to work out of the box most of the time, but since there are
>> guaranteed to be cases where it doesn't, how about providing some
>> knobs that aren't intended to be routinely twiddled but which are
>> available in case of emergency? Bravo, I say!
>
> Um ... those are exactly the same thing. You're just making different
> assumptions about how often you will need to twiddle the setting.
> Neither assumption is based on any visible evidence, unfortunately.
>
> I was thinking of assume_cached as something that could be
> set-and-forget most of the time, and you're entirely right to criticize
> it on the grounds that maybe it wouldn't. But to support a proposal
> that doesn't even exist yet on the grounds that it *would* be
> set-and-forget seems a tad inconsistent. We can't make that judgment
> without a whole lot more details than have been provided yet for any
> idea in this thread.

Well, maybe I misunderstood what you were proposing. I had the impression that you were proposing something that would *by design* require adjustment for each query, so evidently I missed the point.

It seems to me that random_page_cost and seq_page_cost are pretty close to set-and-forget already. We don't have many reports of people needing to tune these values on a per-query basis; most people seem to just guesstimate a cluster-wide value and call it good. Refining the algorithm should only make things better.

> I do think that something based around a settable-per-table caching
> percentage might be a reasonable way to proceed. But the devil is in
> the details, and we don't have those yet.

I think one of the larger devils in the details is deciding how to estimate the assumed caching percentage when the user hasn't specified one. Frankly, I suspect that if we simply added a reloption called assumed_caching_percentage and made it default to zero, we would make a bunch of DBAs happy; they'd knock down seq_page_cost and random_page_cost enough to account for the general level of caching and then bump assumed_caching_percentage up for hot tables/indexes (or ones that they want to have become hot).

I think we can do better than that, but the right formula isn't exactly obvious. I feel safe saying that if effective_cache_size=1GB and table_size=4MB, then we ought to take the table as fully cached. But it's far from clear what caching percentage we should assume when table_size=400MB, and it seems like the sort of thing that will lead to endless bikeshedding. There's probably no perfect answer, but I feel we can likely come up with something that is better than a constant (which would probably still be better than what we have now).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
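A purely hypothetical sketch of what such a knob might look like, using the reloption syntax that exists for other storage parameters (assumed_caching_percentage is just the name floated in this thread; no released PostgreSQL accepts it):

```sql
-- HYPOTHETICAL: this reloption does not exist; shown only to make
-- the proposal concrete.
ALTER TABLE hot_lookup_table SET (assumed_caching_percentage = 90);
```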
Re: [PERFORM] anti-join chosen even when slower than old plan
On Thu, Nov 11, 2010 at 1:58 PM, Tom Lane t...@sss.pgh.pa.us wrote:

> I wrote:
>> I do think that something based around a settable-per-table caching
>> percentage might be a reasonable way to proceed.
>
> BTW ... on reflection it seems that this would *not* solve the use-case
> Kevin described at the start of this thread. What he's got AIUI is some
> large tables whose recent entries are well-cached, and a lot of queries
> that tend to hit that well-cached portion, plus a few queries that hit
> the whole table and so see largely-not-cached behavior. We can't
> represent that very well with a caching knob at the table level. Either
> a high or a low setting will be wrong for one set of queries or the
> other.

Yeah. For Kevin's case, it seems like we want the caching percentage to vary not so much based on which table we're hitting at the moment but on how much of it we're actually reading. However, the two problems are related enough that I think it might be feasible to come up with one solution that answers both needs, or perhaps two somewhat-intertwined solutions.

> The most practical solution for his case still seems to be to twiddle
> some GUC or other locally in the maintenance scripts that do the
> full-table-scan queries. Unfortunately we don't have an equivalent of
> per-session SET (much less SET LOCAL) for per-relation attributes.
> Not sure if we want to go there.

I doubt it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PERFORM] anti-join chosen even when slower than old plan
On Thu, Nov 11, 2010 at 2:35 PM, Tom Lane t...@sss.pgh.pa.us wrote:

> Robert Haas robertmh...@gmail.com writes:
>> Yeah. For Kevin's case, it seems like we want the caching percentage
>> to vary not so much based on which table we're hitting at the moment
>> but on how much of it we're actually reading.
>
> Well, we could certainly take the expected number of pages to read and
> compare that to effective_cache_size. The thing that's missing in that
> equation is how much other stuff is competing for cache space. I've
> tried to avoid having the planner need to know the total size of the
> database cluster, but it's kind of hard to avoid that if you want to
> model this honestly.

I'm not sure I agree with that. I mean, you could easily have a database that is much larger than effective_cache_size, but only that much of it is hot. Or, the hot portion could move around over time. And for reasons of both technical complexity and plan stability, I don't think we want to try to model that. It seems perfectly reasonable to say that reading 25% of effective_cache_size will be more expensive *per-page* than reading 5% of effective_cache_size, independently of what the total cluster size is.

> Would it be at all workable to have an estimate that so many megs of a
> table are in cache (independently of any other table), and then we
> could scale the cost based on the expected number of pages to read
> versus that number? The trick here is that DBAs really aren't going to
> want to set such a per-table number (at least, most of the time) so we
> need a formula to get to a default estimate for that number based on
> some simple system-wide parameters. I'm not sure if that's any easier.

That's an interesting idea. For the sake of argument, suppose we assume that a relation which is less than 5% of effective_cache_size will be fully cached; and for anything larger we'll assume that much of it is cached. Consider a 4GB machine with effective_cache_size set to 3GB. Then we'll assume that any relation smaller than 153MB is 100% cached, a 1GB table is 15% cached, and a 3GB table is 5% cached. That doesn't seem quite right, though: the caching percentage drops off very quickly after you exceed the threshold.

*thinks*

I'm wondering if we could do something with a formula like 3 * amount_of_data_to_read / (3 * amount_of_data_to_read + effective_cache_size) = percentage NOT cached. That is, if we're reading an amount of data equal to effective_cache_size, we assume 25% caching, and plot a smooth curve through that point. In the examples above, we would assume that a 150MB read is 87% cached, a 1GB read is 50% cached, and a 3GB read is 25% cached.

> BTW, it seems that all these variants have an implicit assumption that
> if you're reading a small part of the table it's probably part of the
> working set; which is an assumption that could be 100% wrong. I don't
> see a way around it without trying to characterize the data access at
> an unworkably fine level, though.

Me neither, but I think it will frequently be true, and I'm not sure it will hurt very much when it isn't. I mean, if you execute the same query repeatedly, that data will become hot soon enough. If you execute a lot of different queries that each touch a small portion of a big, cold table, we might underestimate the costs of the index probes, but so what? There's probably no better strategy for accessing that table anyway. Perhaps you can construct an example where this underestimate affects the join order in an undesirable fashion, but I'm having a hard time getting worked up about that as a potential problem case. Our current system - where we essentially assume that the caching percentage is uniform across the board - can have the same problem in less artificial cases.
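The proposed curve can be checked with a quick calculation (effective_cache_size assumed to be 3GB = 3072MB, as in the example above):

```sql
-- percentage NOT cached = 3*r / (3*r + ecs), with r and ecs in MB
SELECT r AS read_mb,
       round(100 * 3 * r / (3 * r + 3072.0))       AS pct_not_cached,
       100 - round(100 * 3 * r / (3 * r + 3072.0)) AS pct_cached
FROM (VALUES (150), (1024), (3072)) AS t(r);
-- read_mb | pct_not_cached | pct_cached
--     150 |             13 |         87
--    1024 |             50 |         50
--    3072 |             75 |         25
```

This reproduces the 87% / 50% / 25% figures quoted in the message.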
-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PERFORM] anti-join chosen even when slower than old plan
On Wed, Nov 10, 2010 at 10:15 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote:

> But wait -- it turns out that this pain was self-inflicted. Based on
> heavy testing of the interactive queries which users run against this
> database we tuned the database for fully-cached settings, with both
> random_page_cost and seq_page_cost at 0.1. In a practical sense, the
> users are almost always running these queries against very recent data
> which is, in fact, heavily cached -- so it's no surprise that the
> queries they run perform best with plans based on such costing. The
> problem is that these weekly maintenance runs need to pass the entire
> database, so caching effects are far less pronounced. If I set
> seq_page_cost = 1 and random_page_cost = 2 I get exactly the same
> (fast) plan as above.
>
> I guess the lesson here is not to use the same costing for
> database-wide off-hours maintenance queries as for ad hoc queries
> against a smaller set of recent data by users who expect quick
> response time. I'm fine with tweaking the costs in our maintenance
> scripts, but it does tend to make me daydream about how the optimizer
> might possibly auto-tweak such things...

Wow. That's fascinating, and if you don't mind, I might mention this potential problem in a future talk at some point.

I've given some thought in the past to trying to maintain some model of which parts of the database are likely to be cached, and trying to adjust costing estimates based on that data. But it's a really hard problem, because what is and is not in cache can change relatively quickly, and you don't want to have too much plan instability. Also, for many workloads, you'd need to have pretty fine-grained statistics to figure out anything useful, which would be expensive and difficult to maintain.

But thinking over what you've written here, I'm reminded of something Peter said years ago, also about the optimizer. He was discussing the ratio of the estimated cost to the actual cost and made an off-hand remark that efforts had been made over the years to make that ratio more consistent (i.e. improve the quality of the cost estimates) but that they'd been abandoned because they didn't necessarily produce better plans. Applying that line of thinking to this problem, maybe we should give up on trying to make the estimates truly model reality, and focus more on assigning them values which work well in practice. For example, in your case, it would be sufficient to estimate the amount of data that a given query is going to grovel through and then apply some heuristic to choose values for random_page_cost and seq_page_cost based on the ratio of that value to, I don't know, effective_cache_size.

Unfortunately, to know how much data we're going to grovel through, we need to know the plan; and to decide on the right plan, we need to know how much data we're going to grovel through.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
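The workaround Kevin describes - different costs for the maintenance scripts - can be scoped to a single transaction so the interactive settings are never disturbed, using the cost values he reports work well for the uncached case:

```sql
-- In the weekly maintenance script: costs appropriate for mostly-uncached
-- data, reverted automatically at COMMIT/ROLLBACK.
BEGIN;
SET LOCAL seq_page_cost = 1;
SET LOCAL random_page_cost = 2;
-- ... database-wide maintenance queries here ...
COMMIT;
```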
Re: [PERFORM] anti-join chosen even when slower than old plan
On Wed, Nov 10, 2010 at 6:07 PM, Tom Lane t...@sss.pgh.pa.us wrote: Kevin Grittner kevin.gritt...@wicourts.gov writes: Robert Haas robertmh...@gmail.com wrote: Unfortunately, to know how much data we're going to grovel through, we need to know the plan; and to decide on the right plan, we need to know how much data we're going to grovel through. And that's where they've been ending. The only half-sane answer I've thought of is to apply a different cost to full-table or full-index scans based on the ratio with effective cache size. Kevin, yes, good point. Bravo! Let's do that. Details TBD, but suppose effective_cache_size = 1GB. What we know for sure is that a 4 GB table is not going to be fully cached but a 4 MB table may well be. In fact, I think we should assume that the 4 MB table IS cached, because the point is that if it's used at all, it soon will be. It's almost certainly a bad idea to build a plan around the idea of minimizing reads from that 4 MB table in favor of doing a substantial amount of additional work somewhere else. I suppose this could break down if you had hundreds and hundreds of 4 MB tables all of which were accessed regularly, but that's an unusual situation, and anyway it's not clear that assuming them all uncached is going to be any better than assuming them all cached. This might have some connection to some rather half-baked ideas I've been having in connection with the generalized-inner-indexscan problem. I don't have anything in the way of a coherent presentation to make yet, but the thing I'm being forced to realize is that sane modeling of a complex subplan that's on the inside of a nestloop join requires treating *every* scan type as having different costs the first time versus during rescan. If the total amount of data touched in the query is less than effective_cache_size, it's not unreasonable to suppose that I/O costs during rescan might be zero, even for a seqscan or a non-parameterized indexscan. 
In fact, only parameterized indexscans would be able to touch pages they'd not touched the first time, and so they ought to have higher not lower rescan costs in this environment. But once the total data volume exceeds effective_cache_size, you have to do something different since you shouldn't any longer assume the data is all cached from the first scan. (This isn't quite as hard as the case you're talking about, since I think the relevant data volume is the sum of the sizes of the tables used in the query; which is easy to estimate at the start of planning, unlike the portion of the tables that actually gets touched.) Well, we don't want the costing model to have sharp edges. effective_cache_size can't be taken as much more than an educated guess, and what actually happens will depend a lot on what else is going on on the system. If only one query is running on a system at a time and it is repeatedly seq-scanning a large table, the cost of reading pages in will be very small until the table grows large enough that you can't fit the whole thing in memory at once, and then will abruptly go through the roof. But realistically you're not going to know exactly where that edge is going to be, because you can't predict exactly how much concurrent activity there will be, for example, or how much work_mem allocations will push out of the OS buffer cache. So I'm thinking we should start the costs at something like 0.05/0.05 for tables that are much smaller than effective_cache_size and ramp up to 4/1 for tables that are larger than effective_cache_size. Maybe just by linearly ramping up, although that has a certain feeling of being without mathemetical soundness. An idea that isn't even half-baked yet is that once we had a cost model like that, we might be able to produce plans that are well-tuned for a heavily cached environment by applying the rescan cost model even to the first scan for a particular query. 
So that might lead to some sort of assume_cached GUC parameter, and perhaps Kevin could tune his reporting queries by turning that off instead of messing with individual cost constants. I think the real goal here should be to try to avoid needing a GUC. A lot of people could benefit if the system could make some attempt to recognize on its own which queries are likely to be cached. We already have parameters you can hand-tune for each query as necessary. Being able to set some parameters system-wide and then get sensible behavior automatically would be much nicer. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
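The cost curve sketched in the message above — 0.05/0.05 page costs for tables well under effective_cache_size, rising to the stock 4/1 above it — can be illustrated as a simple linear interpolation. This is only a sketch of the proposal, not planner code; the function name and the linear ramp are assumptions.

```python
def page_costs(table_size, effective_cache_size):
    """Hypothetical ramp between cached and uncached page costs.

    Returns (random_page_cost, seq_page_cost).  Tables much smaller than
    effective_cache_size get the cheap 'cached' costs (0.05/0.05); tables
    at or above it get the default uncached costs (4.0/1.0); in between we
    interpolate linearly, avoiding a sharp edge in the cost model.
    """
    cached = (0.05, 0.05)
    uncached = (4.0, 1.0)
    frac = min(max(table_size / effective_cache_size, 0.0), 1.0)
    return tuple(round(c + frac * (u - c), 4) for c, u in zip(cached, uncached))

ecs = 1024  # effective_cache_size, in MB
print(page_costs(4, ecs))      # (0.0654, 0.0537): essentially the cached costs
print(page_costs(4096, ecs))   # (4.0, 1.0): the standard uncached costs
```

The clamp at 1.0 is what keeps the model from going "through the roof" abruptly: the transition is spread over the whole range below effective_cache_size instead of happening at one table size.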
Re: [PERFORM] Regression: 8.3 2 seconds -> 8.4 100+ seconds
On Wed, Oct 27, 2010 at 8:41 AM, Francisco Reyes li...@stringsutils.com wrote:

-> Nested Loop (cost=293.80..719.87 rows=2434522 width=4) (actual time=228.867..241.909 rows=2 loops=1)
   -> HashAggregate (cost=293.80..294.13 rows=33 width=29) (actual time=169.551..169.553 rows=2 loops=1)
      -> Nested Loop (cost=11.33..293.71 rows=33 width=29) (actual time=145.940..169.543 rows=2 loops=1)
         -> HashAggregate (cost=11.33..11.66 rows=33 width=4) (actual time=64.730..64.732 rows=2 loops=1)
            -> Index Scan using members_commonid on members (cost=0.00..11.25 rows=33 width=4) (actual time=64.688..64.703 rows=2 loops=1)
                 Index Cond: (commonid = 3594)
         -> Index Scan using cards_membid on cards (cost=0.00..8.53 rows=1 width=33) (actual time=52.400..52.401 rows=1 loops=2)
              Index Cond: (public.cards.membid = public.members.membid)
   -> Index Scan using cards_useraccount on cards (cost=0.00..12.88 rows=2 width=33) (actual time=36.172..36.173 rows=1 loops=2)
        Index Cond: (public.cards.useraccount = public.cards.useraccount)

This part looks really strange to me. Here we have a nested loop whose outer side is estimated to produce 33 rows and whose inner side is estimated to produce 2 rows. Given that, one would think that the estimate for the loop as a whole shouldn't be more than 33 * 2 = 66 rows (or maybe a bit more if 33 is really 33.4999 rounded down, and 2 is really 2.4 rounded down). But the actual estimate is 5 orders of magnitude larger. How is that possible? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] BBU Cache vs. spindles
On Fri, Oct 29, 2010 at 11:56 AM, Aidan Van Dyk ai...@highrise.ca wrote: 1) The pages you write to must be in the page cache, or your memcpy is going to fault them in. With a plain write, you don't need the over-written page in the cache. I seem to remember a time many years ago when I got bitten by this problem. The fact that our I/O is in 8K pages means this could be a pretty severe hit, I think. 2) Now, instead of the torn-page problem being FS block/sector-size based, you can now actually have a possibly arbitrary amount of the block memory written when the kernel writes out the page. You *really* need full-page-writes. Yeah. 3) The mmap overhead required for the kernel to setup the mappings is less than the repeated syscalls of a simple write(). You'd expect to save something from that; but on the other hand, at least on 32-bit systems, there's a very limited number of 1GB files that can be simultaneously mapped into one address space, and it's a lot smaller than the number of file descriptors that you can have open. Rumor has it that cutting down the number of fds that can stay open simultaneously is pretty bad for performance, so cutting it down to a number you can count on one hand (maybe one finger) would probably be bad. Maybe on 64-bit it would be OK but it seems like an awful lot of complexity for at most a minor savings (and a pretty bad anti-savings if point #1 kicks in). Anyway this is all totally off-topic... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
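The 32-bit limit mentioned above is easy to quantify with back-of-the-envelope arithmetic. The 3 GB user address-space figure below is an assumption (the typical 3G/1G Linux split); a real process loses still more to code, heap, stacks, libraries, and shared buffers.

```python
GB = 1 << 30

# Typical 32-bit Linux split: 4 GB of virtual address space, the top 1 GB
# reserved for the kernel, leaving roughly 3 GB for userspace.
user_va = 3 * GB
segment = 1 * GB  # PostgreSQL stores large relations in 1 GB segment files

# Upper bound on how many 1 GB segments could be mmap'd at once -- before
# subtracting everything else that lives in the address space.
max_mappings = user_va // segment
print(max_mappings)  # 3
```

Three concurrent mappings (at best) versus thousands of open file descriptors is the "count on one hand, maybe one finger" comparison in the message.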
Re: [PERFORM] temporary tables, indexes, and query plans
On Wed, Oct 27, 2010 at 3:44 PM, Justin Pitts justinpi...@gmail.com wrote: Jason Pitts: RE: changing default_statistics_target (or via ALTER TABLE SET STATS) not taking effect until ANALYZE is performed. I did already know that, but it's probably good to put into this thread. However, you'll note that this is a temporary table created at the beginning of a transaction. ( giving up on replying to the group; the list will not accept my posts ) Evidently it's accepting some of them... I've been following the thread so long I had forgotten that. I rather strongly doubt that analyze can reach that table's content inside that transaction, if you are creating, populating, and querying it all within that single transaction. Actually I don't think that's a problem, at least for a manual ANALYZE. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow Query- Simple taking
On Tue, Oct 19, 2010 at 2:21 PM, Ozer, Pam po...@automotive.com wrote: I have the following query running on 8.4, which takes 3516 ms. It is very straight forward. It brings back 116412 records. The explain only takes 1348ms

select VehicleUsed.VehicleUsedId as VehicleUsedId ,
       VehicleUsed.VehicleUsedDisplayPriority as VehicleUsedDisplayPriority ,
       VehicleUsed.VehicleYear as VehicleYear ,
       VehicleUsed.VehicleUsedDisplayPriority as VehicleUsedDisplayPriority ,
       VehicleUsed.HasVehicleUsedThumbnail as HasVehicleUsedThumbnail ,
       VehicleUsed.HasVehicleUsedPrice as HasVehicleUsedPrice ,
       VehicleUsed.VehicleUsedPrice as VehicleUsedPrice ,
       VehicleUsed.HasVehicleUsedMileage as HasVehicleUsedMileage ,
       VehicleUsed.VehicleUsedMileage as VehicleUsedMileage ,
       VehicleUsed.IsCPO as IsCPO ,
       VehicleUsed.IsMTCA as IsMTCA
from VehicleUsed
where ( VehicleUsed.VehicleMakeId = 28 )
order by VehicleUsed.VehicleUsedDisplayPriority ,
         VehicleUsed.VehicleYear desc ,
         VehicleUsed.HasVehicleUsedThumbnail desc ,
         VehicleUsed.HasVehicleUsedPrice desc ,
         VehicleUsed.VehicleUsedPrice ,
         VehicleUsed.HasVehicleUsedMileage desc ,
         VehicleUsed.VehicleUsedMileage ,
         VehicleUsed.IsCPO desc ,
         VehicleUsed.IsMTCA desc

The explain is also very straight forward

Sort (cost=104491.48..105656.24 rows=116476 width=41) (actual time=1288.413..1325.457 rows=116412 loops=1)
  Sort Key: vehicleuseddisplaypriority, vehicleyear, hasvehicleusedthumbnail, hasvehicleusedprice, vehicleusedprice, hasvehicleusedmileage, vehicleusedmileage, iscpo, ismtca
  Sort Method: quicksort Memory: 19443kB
  -> Bitmap Heap Scan on vehicleused (cost=7458.06..65286.42 rows=116476 width=41) (actual time=34.982..402.164 rows=116412 loops=1)
       Recheck Cond: (vehiclemakeid = 28)
       -> Bitmap Index Scan on vehicleused_i08 (cost=0.00..7341.59 rows=116476 width=0) (actual time=22.854..22.854 rows=116412 loops=1)
            Index Cond: (vehiclemakeid = 28)
Total runtime: 1348.487 ms

Can someone tell me why after it runs the index scan it then runs a bitmap heap scan? 
It should not take this long to run should it? If I limit the results it comes back in 300ms. It doesn't. The EXPLAIN output shows it running the bitmap index scan first and then bitmap heap scan. The bitmap index scan is taking 22 ms, and the bitmap index and bitmap heap scans combined are taking 402 ms. The sort is then taking another 800+ ms for a total of 1325 ms. Any additional time is spent returning rows to the client. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
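The reading of the plan in the reply above can be double-checked with back-of-the-envelope arithmetic on the "actual time" numbers from the EXPLAIN ANALYZE output — Python standing in as a calculator here, nothing more.

```python
# Upper-bound "actual time" values from the EXPLAIN ANALYZE output, in ms.
bitmap_index_scan = 22.854   # Bitmap Index Scan finishes
bitmap_heap_scan = 402.164   # Bitmap Heap Scan finishes (includes the index scan)
sort_done = 1325.457         # Sort finishes emitting rows
total_runtime = 1348.487     # Total runtime

heap_fetches = bitmap_heap_scan - bitmap_index_scan  # fetching heap pages
sorting = sort_done - bitmap_heap_scan               # the "800+ ms" sort
returning = total_runtime - sort_done                # handing rows back

print(round(heap_fetches), round(sorting), round(returning))  # 379 923 23
```

So of the 1348 ms total, the index scan itself is a small fraction; heap fetches and especially the nine-column sort dominate, with the remainder spent returning 116k rows to the client.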
Re: [PERFORM] Slow Query- Simple taking
On Thu, Oct 28, 2010 at 10:39 AM, Robert Haas robertmh...@gmail.com wrote: Can someone tell me why after it runs the index scan it hen runs a bitmap heap scan? It should not take this long to run should it? If I limit the results it comes back in 300ms. It doesn't. The EXPLAIN output shows it running the bitmap index scan first and then bitmap heap scan. The bitmap index scan is taking 22 ms, and the bitmap index and bitmap heap scans combined are taking 402 ms. The sort is then taking another 800+ ms for a total of 1325 ms. Any additional time is spent returning rows to the client. Doh! I misread your email. You had it right, and I'm all wet. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow Query- Simple taking
On Tue, Oct 19, 2010 at 6:05 PM, Ozer, Pam po...@automotive.com wrote: On mysql the same query only takes milliseconds not seconds. That's a big difference. I can believe that MySQL is faster, because they probably don't need to do the bitmap heap scan. There is a much-anticipated feature called index-only scans that we don't have yet in PG, which would help cases like this a great deal. But I don't see how MySQL could send back 116,000 rows to the client in milliseconds, or sort them that quickly. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow Query- Simple taking
On Thu, Oct 28, 2010 at 11:23 AM, Mladen Gogala mladen.gog...@vmsinfo.com wrote: On 10/28/2010 10:53 AM, Richard Broersma wrote: On Thu, Oct 28, 2010 at 7:51 AM, Mladen Gogala mladen.gog...@vmsinfo.com wrote: Yyesss! Any time frame on that? Can you make it into 9.0.2? Maybe 9.1.0 or 9.2.0 :) 9.0's features are already frozen. Well, with all this global warming around us, index scans may still thaw in time to make it into 9.0.2 I fear this is not going to happen for 9.1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Query much faster with enable_seqscan=0
On Tue, Oct 12, 2010 at 10:28 PM, Samuel Gendler sgend...@ideasculptor.com wrote: On Tue, Sep 21, 2010 at 4:30 PM, Tom Lane t...@sss.pgh.pa.us wrote: Ogden li...@darkstatic.com writes: SELECT tr.id, tr.sid FROM test_registration tr INNER JOIN test_registration_result r on (tr.id = r.test_registration_id) WHERE tr.test_administration_id='32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid GROUP BY tr.id, tr.sid Seeing that tr.id is a primary key, I think you might be a lot better off if you avoided the inner join and group by. I think what you really want here is something like SELECT tr.id, tr.sid FROM test_registration tr WHERE tr.test_administration_id='32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid AND EXISTS(SELECT 1 FROM test_registration_result r WHERE tr.id = r.test_registration_id) regards, tom lane Could you explain the logic behind why this structure is better than the other? Is it always the case that one should just always use the 'exists(select 1 from x...)' structure when trying to strip rows that don't join, or is it just the case when you know that the rows which do join are a fairly limited subset? Does the same advantage exist if filtering rows in the joined table on some criteria, or is it better at that point to use an inner join and add a where clause to filter the joined rows. select table1.columns from table1, table2 where table1.column = 'some_value' and table1.fk = table2.pk AND table2.column = 'some_other_value' versus select table1.columns from table1 where table1.column = 'some_value' and exists(select 1 from table2 where table1.fk = table2.pk and table2.column ='some_other_value') I don't think there's much difference between those two cases. I think Tom's point was that GROUP BY can be expensive - which it certainly can. It's absolutely necessary and unavoidable for certain queries, of course, but don't include it unless you need it. 
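The difference between the two query shapes can be sketched with a toy simulation (Python standing in for SQL, purely illustrative): a plain inner join emits one row per matching pair and therefore needs the GROUP BY to de-duplicate, while a semijoin (EXISTS) emits each outer row at most once.

```python
registrations = [{"id": 1, "sid": "a"}, {"id": 2, "sid": "b"}]
results = [{"test_registration_id": 1}, {"test_registration_id": 1},
           {"test_registration_id": 3}]

# Inner join: one output row per matching pair, so id=1 comes out twice
# and must be de-duplicated afterwards (the GROUP BY in the original query).
joined = [tr for tr in registrations
          for r in results if r["test_registration_id"] == tr["id"]]

# Semijoin (EXISTS): each outer row is emitted at most once; no dedup pass.
semi = [tr for tr in registrations
        if any(r["test_registration_id"] == tr["id"] for r in results)]

print(len(joined), len(semi))  # 2 1
```

Both produce the same distinct registrations in the end; the EXISTS form just never manufactures the duplicates that the GROUP BY then has to remove.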
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] how to get the total number of records in report
On Tue, Oct 19, 2010 at 7:56 PM, Merlin Moncure mmonc...@gmail.com wrote: On Mon, Oct 18, 2010 at 1:16 AM, AI Rumman rumman...@gmail.com wrote: At present for reporting I use following types of query: select crm.*, crm_cnt.cnt from crm, (select count(*) as cnt from crm) crm_cnt; Here count query is used to find the total number of records. Same FROM clause is copied in both the part of the query. Is there any other good alternative way to get this similar value? Probably the best way to do this type of thing is handle it on the client. However, if you want to do it this way and your from clause is more complex than 'from table', you can possibly improve on this with a CTE: with q as (select * from something expensive) select q.*, q_cnt.cnt from q, (select count(*) as cnt from q) q_cnt; The advantage here is that the CTE is materialized without having to do the whole query again. This can be win or loss depending on the query. What about select crm.*, sum(1) over () as crm_count from crm limit 10; -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
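The sum(1) over () suggestion works because window functions are evaluated over the full result set before LIMIT applies, so every emitted row carries the total count. A toy model of that evaluation order (Python standing in for SQL, illustrative only):

```python
rows = list(range(47))  # pretend "crm" holds 47 rows

# Window functions run over the whole result set *before* LIMIT, so every
# row gets the full count attached even when only 10 rows are kept.
with_count = [(r, len(rows)) for r in rows]  # sum(1) over () == count(*) over ()
limited = with_count[:10]                    # LIMIT 10

print(limited[0])  # (0, 47): the first row, plus the total row count
```

This avoids writing the FROM clause twice, at the cost of the window-function pass over the full result set.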
Re: [PERFORM] BBU Cache vs. spindles
On Wed, Oct 27, 2010 at 12:41 AM, Rob Wultsch wult...@gmail.com wrote: On Tue, Oct 26, 2010 at 7:25 AM, Robert Haas robertmh...@gmail.com wrote: On Tue, Oct 26, 2010 at 10:13 AM, Rob Wultsch wult...@gmail.com wrote: The double write buffer is one of the few areas where InnoDB does more IO (in the form of fsynch's) than PG. InnoDB also has fuzzy checkpoints (which help to keep dirty pages in memory longer), buffering of writing out changes to secondary indexes, and recently tunable page level compression. Baron Schwartz was talking to me about this at Surge. I don't really understand how the fuzzy checkpoint stuff works, and I haven't been able to find a good description of it anywhere. How does it keep dirty pages in memory longer? Details on the other things you mention would be interesting to hear, too. For checkpoint behavior: http://books.google.com/books?id=S_yHERPRZScC&pg=PA606&lpg=PA606&dq=fuzzy+checkpoint&source=bl&ots=JJrzRUKBGh&sig=UOMPsRy5E-YDgjAFkaSVn3dps_M&hl=en&ei=_k8yTOfeHYzZnAepyumLBA&sa=X&oi=book_result&ct=result&resnum=8&ved=0CEYQ6AEwBw#v=onepage&q=fuzzy%20checkpoint&f=false I would think that best case behavior sharp checkpoints with a large checkpoint_completion_target would have behavior similar to a fuzzy checkpoint. Well, under that definition of a fuzzy checkpoint, our checkpoints are fuzzy even with checkpoint_completion_target=0. What Baron seemed to be describing was a scheme whereby you could do what I might call partial checkpoints. IOW, you want to move the redo pointer without writing out ALL the dirty buffers in memory, so you write out the pages with the oldest LSNs and then move the redo pointer to the oldest LSN you have left. Except that doesn't quite work, because the page might have been dirtied at LSN X and then later updated again at LSN Y, and you still have to flush it to disk before moving the redo pointer to any value > X. So you work around that by maintaining a first dirtied LSN for each page as well as the current LSN. 
I'm not 100% sure that this is how it works or that it would work in PG, but even assuming that it is and does, I'm not sure what the benefit is over the checkpoint-spreading logic we have now. There might be some benefit in sorting the writes that we do, so that we can spread out the fsyncs. So, write all the blocks to a given file, fsync, and then repeat for each underlying data file that has at least one dirty block. But that's completely orthogonal to (and would actually be hindered by) the approach described in the preceding paragraph. Insert (for InnoDB 1.1+ evidently it also does delete and purge) buffering: http://dev.mysql.com/doc/refman/5.5/en/innodb-insert-buffering.html We do something a bit like this for GIST indices. It would be interesting to see if it also has a benefit for btree indices. For a recent ~800GB db I had to restore, the insert buffer saved 92% of io needed for secondary indexes. Compression: http://dev.mysql.com/doc/innodb-plugin/1.0/en/innodb-compression-internals.html For many workloads 50% compression results in negligible impact to performance. For certain workloads compression can help performance. Please note that InnoDB also has a non-tunable TOAST-like feature. Interesting. I am surprised this works well. It seems that this only works for pages that can be compressed by >=50%, which seems like it could result in a lot of CPU wasted on failed attempts to compress. Given that InnoDB is not shipping its logs across the wire, I don't think many users would really care if it used the double writer or full page writes approach to the redo log (other than the fact that the log files would be bigger). PG on the other hand *is* pushing its logs over the wire... So how is InnoDB doing replication? Is there a second log just for that? The other log is the binary log and it is one of the biggest problems with MySQL. 
Running MySQL in such a way that the binary log stays in sync with the InnoDB redo has a very significant impact on performance. http://www.mysqlperformanceblog.com/2010/10/23/mysql-limitations-part-2-the-binary-log/ http://mysqlha.blogspot.com/2010/09/mysql-versus-mongodb-update-performance.html (check out the pretty graph) Hmm. That seems kinda painful. Having to ship full page images over the wire doesn't seems so bad by comparison, though I'm not very happy about having to do that either. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
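The partial-checkpoint scheme described in the message above — track a first-dirtied LSN per page, flush the pages whose first-dirtied LSN is oldest, then advance the redo pointer only to the oldest first-dirtied LSN that remains — can be sketched as a toy model. This is an illustration of the idea only, not InnoDB's or PostgreSQL's actual implementation; all names are assumed.

```python
def partial_checkpoint(dirty_pages, flush_budget):
    """Flush the pages with the oldest *first* dirtied LSNs, then return
    the new redo pointer and the still-dirty pages.

    dirty_pages: {page_id: (first_dirtied_lsn, current_lsn)}.  The redo
    pointer may only advance to the oldest first-dirtied LSN remaining,
    because a page dirtied at X and re-dirtied at Y > X must be flushed
    before the redo pointer passes X.
    """
    by_age = sorted(dirty_pages.items(), key=lambda kv: kv[1][0])
    remaining = dict(by_age[flush_budget:])  # pages we chose not to flush
    redo_ptr = min((first for first, _ in remaining.values()), default=None)
    return redo_ptr, remaining

pages = {"p1": (100, 250), "p2": (120, 130), "p3": (200, 300)}
redo, left = partial_checkpoint(pages, flush_budget=2)
print(redo, sorted(left))  # 200 ['p3']: p1 and p2 flushed, pointer at 200
```

Note that p1's *current* LSN (250) is newer than p3's first-dirtied LSN (200); sorting by current LSN alone would wrongly let the redo pointer pass 100 while p1 was still dirty, which is exactly the problem the first-dirtied LSN solves.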
Re: [PERFORM] Re: Postgres insert performance and storage requirement compared to Oracle
On Wed, Oct 27, 2010 at 6:13 AM, Ivan Voras ivo...@freebsd.org wrote: On 10/26/10 17:41, Merlin Moncure wrote: On Tue, Oct 26, 2010 at 11:08 AM, Leonardo Francalanci m_li...@yahoo.it wrote: temp tables are not wal logged or synced. Periodically they can be flushed to a permanent table. What do you mean with Periodically they can be flushed to a permanent table? Just doing insert into tabb select * from temptable yup, that's exactly what I mean -- this will give you more uniform In effect, when so much data is in temporary storage, a better option would be to simply configure synchronous_commit = off (better in the sense that the application would not need to be changed). The effects are almost the same - in both cases transactions might be lost but the database will survive. Gee, I wonder if it would be possible for PG to automatically do an asynchronous commit of any transaction which touches only temp tables. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Auto ANALYZE criteria
On Wed, Oct 13, 2010 at 5:20 PM, Joe Miller joe.d.mil...@gmail.com wrote: Thanks for fixing the docs, but if that's the case, I shouldn't be seeing the behavior that I'm seeing. Should I flesh out this test case a little better and file a bug? A reproducible test case is always a good thing to have... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] BBU Cache vs. spindles
On Fri, Oct 22, 2010 at 3:05 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Rob Wultsch wult...@gmail.com wrote: I would think full_page_writes=off + double write buffer should be far superior, particularly given that the WAL is shipped over the network to slaves. For a reasonably brief description of InnoDB double write buffers, I found this: http://www.mysqlperformanceblog.com/2006/08/04/innodb-double-write/ One big question before even considering this would be how to determine whether a potentially torn page is inconsistent. Without a page CRC or some such mechanism, I don't see how this technique is possible. There are two sides to this problem: figuring out when to write a page to the double write buffer, and figuring out when to read it back from the double write buffer. The first seems easy: we just do it whenever we would XLOG a full page image. As to the second, when we write the page out to the double write buffer, we could also write to the double write buffer the LSN of the WAL record which depends on that full page image. Then, at the start of recovery, we scan the double write buffer and remember all those LSNs. When we reach one of them, we replay the full page image. The good thing about this is that it would reduce WAL volume; the bad thing about it is that it would probably mean doing two fsyncs where we only now do one. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
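The two-sided scheme proposed above — store the full page image plus the LSN of the dependent WAL record in the double-write buffer, then at recovery scan the buffer and replay each image when its LSN is reached — can be sketched as a toy model. All names and structures here are assumptions for illustration, not any engine's real on-disk format.

```python
class DoubleWriteBuffer:
    """Toy double-write buffer: a list of (lsn, page_id, page_image)."""
    def __init__(self):
        self.entries = []

    def write(self, lsn, page_id, image):
        # Called whenever we would otherwise XLOG a full page image:
        # store the image together with the WAL LSN that depends on it.
        self.entries.append((lsn, page_id, image))

def recover(dwb, wal):
    """Replay WAL, restoring full page images at the recorded LSNs."""
    images = {lsn: (pid, img) for lsn, pid, img in dwb.entries}
    pages = {}
    for lsn, page_id, delta in wal:
        if lsn in images:               # reached a remembered LSN:
            pid, img = images[lsn]      # replay the full page image first,
            pages[pid] = img            # overwriting any torn on-disk page
        pages[page_id] = pages.get(page_id, "") + delta
    return pages

dwb = DoubleWriteBuffer()
dwb.write(10, "A", "base|")                 # full image taken at LSN 10
wal = [(10, "A", "+x"), (11, "A", "+y")]    # subsequent WAL deltas
print(recover(dwb, wal))  # {'A': 'base|+x+y'}
```

The page image lands in the double-write buffer instead of the WAL stream, which is the WAL-volume saving; the price is the extra fsync of the buffer noted at the end of the message.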
Re: [PERFORM] BBU Cache vs. spindles
On Tue, Oct 26, 2010 at 10:13 AM, Rob Wultsch wult...@gmail.com wrote: The double write buffer is one of the few areas where InnoDB does more IO (in the form of fsynch's) than PG. InnoDB also has fuzzy checkpoints (which help to keep dirty pages in memory longer), buffering of writing out changes to secondary indexes, and recently tunable page level compression. Baron Schwartz was talking to me about this at Surge. I don't really understand how the fuzzy checkpoint stuff works, and I haven't been able to find a good description of it anywhere. How does it keep dirty pages in memory longer? Details on the other things you mention would be interesting to hear, too. Given that InnoDB is not shipping its logs across the wire, I don't think many users would really care if it used the double writer or full page writes approach to the redo log (other than the fact that the log files would be bigger). PG on the other hand *is* pushing its logs over the wire... So how is InnoDB doing replication? Is there a second log just for that? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Select count(*), the sequel
On Sat, Oct 16, 2010 at 2:44 PM, Kenneth Marshall k...@rice.edu wrote: Interesting data points. The amount of rows that you managed to insert into PostgreSQL before Oracle gave up the ghost is 95% of the rows in the Oracle version of the database. To count 5% fewer rows, it took PostgreSQL 24 seconds longer. Or adjusting for the missing rows, 52 seconds longer for the entire table or 18% longer than the full table scan in Oracle. This seems to be well within the table layout size differences, possibly due to the fillfactor used -- not really bad at all. I don't think this is due to fillfactor - the default fillfactor is 100, and anyway we ARE larger on disk than Oracle. We really need to do something about that, and the changes to NUMERIC in 9.1 are a step in that direction, but I think a lot more work is needed. I think it would be really helpful if we could try to quantify where the extra space is going. Some places to look: - Bloated representations of individual datatypes. (I know that even the new NUMERIC format is larger than Oracle's NUMBER.) - Excessive per-tuple overhead. Ours is 24 bytes, plus the item pointer. - Alignment requirements. We have a fair number of datatypes that require 4 or 8 byte alignment. How much is that hurting us? - Compression. Maybe Oracle's algorithm does better than PGLZ. If we can quantify where we're losing vs. Oracle - or any other competitor - that might give us some idea where to start looking. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
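The per-tuple overhead in the list above can be put into numbers with a rough sketch of heap math: a 24-byte tuple header, a 4-byte item pointer, and padding to an 8-byte alignment boundary. This ignores page headers, fill, and TOAST, so it is a lower bound on real on-disk cost.

```python
def heap_bytes_per_row(payload_bytes, align=8, header=24, item_pointer=4):
    """Rough on-disk cost of one heap tuple: the header plus payload,
    rounded up to the alignment boundary, plus the page's item pointer."""
    padded = -(-(header + payload_bytes) // align) * align  # ceil to align
    return padded + item_pointer

# A row with 20 bytes of user data costs more than twice that on disk:
print(heap_bytes_per_row(20))  # 52: 48 aligned bytes + 4-byte item pointer
```

For narrow rows the fixed overhead dominates, which is why per-tuple overhead and alignment are the first places to look when comparing on-disk sizes against other systems.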
Re: [PERFORM] HashJoin order, hash the large or small table? Postgres likes to hash the big one, why?
On Mon, Oct 18, 2010 at 9:40 PM, Scott Carey sc...@richrelevance.com wrote: 8.4.5 I consistently see HashJoin plans that hash the large table, and scan the small table. This is especially puzzling in some cases where I have 30M rows in the big table and ~ 100 in the small... shouldn't it hash the small table and scan the big one? Here is one case I saw just recently

Hash Cond: ((a.e_id)::text = (ta.name)::text)
  -> Index Scan using c_a_s_e_id on a (cost=0.00..8.21 rows=14 width=27)
       Index Cond: (id = 12)
  -> Hash (cost=89126.79..89126.79 rows=4825695 width=74)
       -> Seq Scan on p_a_1287446030 tmp (cost=0.00..89126.79 rows=4825695 width=74)
            Filter: (id = 12)

Can we have the complete EXPLAIN output here, please? And the query? For example, this would be perfectly sensible if the previous line started with Hash Semi Join or Hash Anti Join.

rhaas=# explain select * from little where exists (select * from big where big.a = little.a);
                            QUERY PLAN
------------------------------------------------------------------
 Hash Semi Join  (cost=3084.00..3478.30 rows=10 width=4)
   Hash Cond: (little.a = big.a)
   ->  Seq Scan on little  (cost=0.00..1.10 rows=10 width=4)
   ->  Hash  (cost=1443.00..1443.00 rows=100000 width=4)
         ->  Seq Scan on big  (cost=0.00..1443.00 rows=100000 width=4)
(5 rows)

I'm also a bit suspicious of the fact that the hash condition has a cast to text on both sides, which implies, to me anyway, that the underlying data types are not text. 
That might mean that the query planner doesn't have very good statistics, which might mean that the join selectivity estimates are wackadoo, which can apparently cause this problem:

rhaas=# explain select * from little, big where little.a = big.a;
                            QUERY PLAN
------------------------------------------------------------------
 Hash Join  (cost=3084.00..3577.00 rows=2400 width=8)
   Hash Cond: (little.a = big.a)
   ->  Seq Scan on little  (cost=0.00..34.00 rows=2400 width=4)
   ->  Hash  (cost=1443.00..1443.00 rows=100000 width=4)
         ->  Seq Scan on big  (cost=0.00..1443.00 rows=100000 width=4)
(5 rows)

rhaas=# analyze;
ANALYZE
rhaas=# explain select * from little, big where little.a = big.a;
                            QUERY PLAN
------------------------------------------------------------------
 Hash Join  (cost=1.23..1819.32 rows=10 width=8)
   Hash Cond: (big.a = little.a)
   ->  Seq Scan on big  (cost=0.00..1443.00 rows=100000 width=4)
   ->  Hash  (cost=1.10..1.10 rows=10 width=4)
         ->  Seq Scan on little  (cost=0.00..1.10 rows=10 width=4)
(5 rows)

This doesn't appear to make a lot of sense, but... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] odd postgresql performance (excessive lseek)
On Tue, Oct 19, 2010 at 10:36 AM, Tom Lane t...@sss.pgh.pa.us wrote: Jon Nelson jnelson+pg...@jamponi.net writes: This is another situation where using pread would have saved a lot of time and sped things up a bit, but failing that, keeping track of the file position ourselves and only lseek'ing when necessary would also help. No, it wouldn't; you don't have the slightest idea what's going on there. Those lseeks are for the purpose of detecting the current EOF location, ie, finding out whether some other backend has extended the file recently. We could get rid of them, but only at the cost of putting in some other communication mechanism instead. I don't get it. Why would be doing that in a tight loop within a single backend? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
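The lseek(SEEK_END) pattern Tom describes — asking the kernel for the current end of file so that extensions made by another process (in PostgreSQL, another backend) are visible — is easy to demonstrate with a scratch file. A minimal sketch:

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"x" * 100)
size_before = os.lseek(fd, 0, os.SEEK_END)   # kernel reports EOF at 100

# A second descriptor stands in for another backend extending the file.
fd2 = os.open(path, os.O_WRONLY | os.O_APPEND)
os.write(fd2, b"y" * 50)
os.close(fd2)

# A fresh lseek on the original descriptor sees the new EOF; a file
# position cached in userspace would still say 100.  This visibility is
# what the repeated lseeks buy, absent some other signaling mechanism.
size_after = os.lseek(fd, 0, os.SEEK_END)
print(size_before, size_after)  # 100 150

os.close(fd)
os.unlink(path)
```

This is also why simply caching the position in the backend, as the original poster suggested, is not a drop-in replacement: the cache goes stale the moment any other backend extends the relation.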
Re: [PERFORM] Slow count(*) again...
On Wed, Oct 13, 2010 at 1:59 PM, Jesper Krogh jes...@krogh.cc wrote: On 2010-10-13 15:28, Robert Haas wrote: On Wed, Oct 13, 2010 at 6:16 AM, Neil Whelchel neil.whelc...@gmail.com wrote: I might go as far as to rattle the cage of the developers to see if it makes any sense to add some column oriented storage capability to Postgres. That would be the hot ticket to be able to specify an attribute on a column so that the back end could shadow or store a column in a column oriented table so aggregate functions could work on them with good efficiency, or is that an INDEX? I'd love to work on that, but without funding it's tough to find the time. It's a big project. Is it hugely different from just getting the visibility map suitable for doing index-only scans and extracting values from the index directly as Heikki has explained? I think that there's a lot more to a real column-oriented database than index-only scans, although, of course, index-only scans are very important. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow count(*) again...
On Thu, Oct 14, 2010 at 12:22 AM, mark dvlh...@gmail.com wrote: Could this be an interesting test use of https://www.fossexperts.com/ ? 'Community' driven proposal - multiple people / orgs agree to pay various portions? Maybe with multiple funders a reasonable target fund amount could be reached. Just throwing around ideas here. This is a bit off-topic, but as of now, they're only accepting proposals for projects to be performed by CommandPrompt itself. So that doesn't help me much (note the sig). But in theory it's a good idea. Of course, when and if they open it up, then what? If more than one developer or company is interested in a project, who determines who gets to do the work and get paid for it? If that determination is made by CommandPrompt itself, or if it's just a free-for-all to see who can get their name on the patch that ends up being committed, it's going to be hard to get other people/companies to take it very seriously. Another problem is that even when they do open it up, they apparently intend to charge 7.5 - 15% of the contract value as a finder's fee. That's a lot of money. For a $100 project it's totally reasonable, but for a $10,000 project it's far more expensive than the value of the service they're providing can justify. (Let's not even talk about a $100,000 project.) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] How does PG know if data is in memory?
On Tue, Oct 12, 2010 at 10:35 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: (1) Heavily used data could be kept fully cached in RAM and not driven out by transient activity. We've attempted to address this problem by adding logic to prevent the buffer cache from being trashed by vacuums, bulk loads, and sequential scans. It would be interesting to know if anyone has examples of that logic falling over or proving inadequate. (2) You could flag a cache used for (1) above as using relaxed LRU accounting -- it saved a lot of time tracking repeated references, leaving more CPU for other purposes. We never do strict LRU accounting. (3) Each named cache had its own separate set of locks, reducing contention. We have lock partitions, but as discussed recently on -hackers, they seem to start falling over around 26 cores. We probably need to improve that, but I'd rather do that by making the locking more efficient and by increasing the number of partitions rather than by allowing users to partition the buffer pool by hand. (4) Large tables for which the heap was often scanned in its entirety or for a range on the clustered index could be put in a relatively small cache with large I/O buffers. This avoided blowing out the default cache space for situations which almost always required disk I/O anyway. I think, but am not quite sure, that my answer to point #1 is also relevant here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow count(*) again...
On Tue, Oct 12, 2010 at 1:07 PM, Tom Lane t...@sss.pgh.pa.us wrote: Anyway, if anyone is hot to make COUNT(*) faster, that's where to look. I don't think any of the previous discussion in this thread is on-point at all, except for the parts where people suggested avoiding it. I kind of hope that index-only scans help with this, too. If you have a wide table and a narrow (but not partial) index, and if the visibility map bits are mostly set, it ought to be cheaper to read the index than the table - certainly in the case where any disk I/O is involved, and maybe even if it isn't. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow count(*) again...
On Wed, Oct 13, 2010 at 4:38 AM, Neil Whelchel neil.whelc...@gmail.com wrote: There seems to be a lot of discussion about VACUUM FULL, and its problems. The overall buzz seems to be that VACUUM FULL is a bad idea (I could be wrong here). It has been some time since I have read the changelogs, but I seem to remember that there have been some major changes to VACUUM FULL recently. Maybe this needs to be re-visited in the documentation. In 9.0, VACUUM FULL does something similar to what CLUSTER does. This is a much better idea than what it did in 8.4 and prior.

crash:~# time psql -U test test -c "VACUUM FULL log;"
VACUUM

real    4m49.055s
user    0m0.000s
sys     0m0.000s

crash:~# time psql -U test test -c "SELECT count(*) FROM log;"
  count
----------
 10050886
(1 row)

real    0m9.665s
user    0m0.000s
sys     0m0.004s

A huge improvement from the minute and a half before the VACUUM FULL. This is a very surprising result that I would like to understand better. Let's assume that your UPDATE statement bloated the table by 2x (you could use pg_relation_size to find out exactly; the details probably depend on fillfactor, which you might want to lower if you're going to do lots of updates). That ought to mean that count(*) has to grovel through twice as much data, so instead of taking 9 seconds it ought to take 18 seconds. Where the heck is the other 1:12 going? This might sort of make sense if the original table was laid out sequentially on disk and the updated table was not, but how and why would that happen? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
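[The pg_relation_size check suggested above can be sketched as follows; the table name log comes from the OP's transcript, and exact sizes will of course vary:]

```sql
-- Measure the heap to see how much the UPDATE bloated the table.
-- pg_total_relation_size additionally counts indexes and TOAST data.
SELECT pg_size_pretty(pg_relation_size('log'))       AS heap_size,
       pg_size_pretty(pg_total_relation_size('log')) AS total_size;

-- If the table will be updated heavily, a lower fillfactor leaves free
-- space on each page for new row versions, which limits future bloat:
ALTER TABLE log SET (fillfactor = 70);
```

Note that fillfactor only affects newly written pages, so it is most useful when set before loading data or followed by a VACUUM FULL or CLUSTER to rewrite the table.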
Re: [PERFORM] Slow count(*) again...
On Wed, Oct 13, 2010 at 6:16 AM, Neil Whelchel neil.whelc...@gmail.com wrote: I might go as far as to rattle the cage of the developers to see if it makes any sense to add some column oriented storage capability to Postgres. That would be the hot ticket to be able to specify an attribute on a column so that the back end could shadow or store a column in a column oriented table so aggregate functions could work on them with good efficiency, or is that an INDEX? I'd love to work on that, but without funding it's tough to find the time. It's a big project. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] How does PG know if data is in memory?
On Mon, Oct 11, 2010 at 11:11 PM, gnuo...@rcn.com wrote: An approach that works can be found in DB2, and likely elsewhere. The key is that tablespaces/tables/indexes/buffers are all attached through the bufferpool (the DB2 term). A tablespace/bufferpool match is defined. Then tables and indexes are assigned to the tablespace (and implicitly, the bufferpool). As a result, one can effectively pin data in memory. This is very useful, but not low hanging fruit to implement. The introduction of rudimentary tablespaces is a first step. I assumed that the point was to get to a DB2-like structure at some point. Yes? We already have tablespaces, and our data already is accessed through the buffer pool. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Runtime dependency from size of a bytea field
On Fri, Oct 8, 2010 at 12:53 AM, Sander, Ingo (NSN - DE/Munich) ingo.san...@nsn.com wrote: The difference to my test is that we use the ODBC interface in our C program. Could it be that the difference in the runtimes is caused by the ODBC? I've heard tell that ODBC is substantially slower than a native libpq connection, but I don't know that for a fact, not being an ODBC user. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] gist indexes for distance calculations
On Fri, Oct 1, 2010 at 1:56 AM, Jesper Krogh jes...@krogh.cc wrote: On 2010-09-30 20:33, Marcelo Zabani wrote: If you can also pinpoint me to where I can find this sort of information (index utilization and planning, performance tuning), I'd be very grateful. Thank you already, Isn't this what the knngist patches are for? https://commitfest.postgresql.org/action/patch_view?id=350 http://www.sai.msu.su/~megera/wiki/knngist Those are for when you want to order by distance; the OP is trying to *filter* by distance, which is different. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] How does PG know if data is in memory?
On Mon, Oct 4, 2010 at 6:47 PM, Jeremy Harris j...@wizmail.org wrote: On 10/04/2010 04:22 AM, Greg Smith wrote: I had a brain-storming session on this subject with a few of the hackers in the community in this area a while back that I haven't had a chance to do something with yet (it exists only as a pile of scribbled notes so far). There are a couple of ways to collect data on what's in the database and OS cache, and a couple of ways to then expose that data to the optimizer. But that needs to be done very carefully, almost certainly as only a manual process at first, because something that's producing cache feedback all of the time will cause plans to change all the time, too. Where I suspect this is going is that we may end up tracking various statistics over time, then periodically providing a way to export a mass of typical "% cached" data back to the optimizer for use in plan cost estimation purposes. But the idea of monitoring continuously and always planning based on the most recent data available has some stability issues, both from a "too many unpredictable plan changes" and a "bad short-term feedback loop" perspective, as mentioned by Tom and Kevin already. Why not monitor the distribution of response times, rather than cached vs. not? That a) avoids the issue of discovering what was a cache hit b) deals neatly with multilevel caching c) feeds directly into cost estimation. I was hot on doing better cache modeling a year or two ago, but the elephant in the room is that it's unclear that it solves any real-world problem. The OP is clearly having a problem, but there's not enough information in his post to say what is actually causing it, and it's probably not caching effects. We get occasional complaints of the form "the first time I run this query it's slow, and then after that it's fast" but, as Craig Ringer pointed out upthread, not too many. And even with respect to the complaints we do get, it's far from clear that the cure is any better than the disease. 
Taking caching effects into account could easily result in the first execution being slightly less slow and all of the subsequent executions being moderately slow. That would not be an improvement for most people. The reports that seem really painful to me are the ones where people with really big machines complain of needing HOURS for the cache to warm up, and having the system bogged down to a standstill until then. But changing the cost model isn't going to help them either. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] [HACKERS] MIT benchmarks pgsql multicore (up to 48)performance
On Wed, Oct 6, 2010 at 10:07 PM, Stephen Frost sfr...@snowman.net wrote: * Robert Haas (robertmh...@gmail.com) wrote: It's good to be you. They're HP BL465 G7's w/ 2x 12-core AMD processors and 48G of RAM. Unfortunately, they currently only have local storage, but it seems unlikely that would be an issue for this. I don't suppose you could try to replicate the lseek() contention? I can give it a shot, but the impression I had from the paper is that the lseek() contention wouldn't be seen without the changes to the lock manager...? Or did I misunderstand? rereads appropriate section of paper Looks like the lock manager problems hit at 28 cores, and the lseek problems at 36 cores. So your system might not even be big enough to manifest either problem. It's unclear to me whether a 48-core system would be able to see the lseek issues without improvements to the lock manager, but perhaps it would be possible by, say, increasing the number of lock partitions by 8x. It would be nice to segregate these issues though, because using pread/pwrite is probably a lot less work than rewriting our lock manager. Do you have tools to measure the lseek overhead? If so, we could prepare a patch to use pread()/pwrite() and just see whether that reduced the overhead, without worrying so much about whether it was actually a major bottleneck. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] [HACKERS] MIT benchmarks pgsql multicore (up to 48)performance
On Thu, Oct 7, 2010 at 1:21 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Robert Haas robertmh...@gmail.com wrote: perhaps it would be possible by, say, increasing the number of lock partitions by 8x. It would be nice to segregate these issues though, because using pread/pwrite is probably a lot less work than rewriting our lock manager. You mean easier than changing this 4 to a 7?:

#define LOG2_NUM_LOCK_PARTITIONS  4

Or am I missing something? Right. They did something more complicated (and, I think, better) than that, but that change by itself might be enough to ameliorate the lock contention enough to see the lseek() issue. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Odd behaviour with redundant CREATE statement
On Mon, Sep 27, 2010 at 3:27 PM, Gurjeet Singh singh.gurj...@gmail.com wrote: On Mon, Sep 27, 2010 at 8:50 PM, Dave Crooke dcro...@gmail.com wrote: Our Java application manages its own schema. Some of this is from Hibernate, but some is hand-crafted JDBC. By way of an upgrade path, we have a few places where we have added additional indexes to optimize performance, and so at startup time the application issues CREATE INDEX ... statements for these, expecting to catch the harmless exception "ERROR: relation date_index already exists", as a simpler alternative to using the meta-data to check for it first. In general, this seems to work fine, but we have one installation where we observed one of these CREATE statements hanging up in the database, as if waiting for a lock, thus stalling the app startup. You can tell if it is really waiting by looking at 'select * from pg_locks', and checking the 'granted' column. CREATE INDEX (without CONCURRENTLY) tries to acquire a share-lock on the table, which will conflict with any concurrent INSERT, UPDATE, DELETE, or VACUUM. It probably tries to acquire the lock before noticing that the index is a duplicate. CREATE INDEX CONCURRENTLY might be an option, or you could write and call a PL/pgsql function (or, in 9.0, use a DO block) to test for the existence of the index before trying to create it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
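[A minimal sketch of the 9.0 DO-block approach described above; only the index name date_index comes from the OP's error message, while the table and column names are hypothetical stand-ins:]

```sql
DO $$
BEGIN
    IF NOT EXISTS (
        SELECT 1
          FROM pg_class c
          JOIN pg_namespace n ON n.oid = c.relnamespace
         WHERE c.relkind = 'i'              -- indexes only
           AND c.relname = 'date_index'
           AND n.nspname = 'public'
    ) THEN
        -- Reached only when the index is absent, so there is no duplicate-name
        -- error and, more importantly, no share-lock wait on the table
        -- when the index already exists.
        EXECUTE 'CREATE INDEX date_index ON some_table (some_date_col)';
    END IF;
END
$$;
```

On 8.x, which lacks DO, the same body can be wrapped in a PL/pgsql function and called from JDBC at startup.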
Re: [PERFORM] [HACKERS] MIT benchmarks pgsql multicore (up to 48)performance
On Wed, Oct 6, 2010 at 6:31 PM, Ivan Voras ivo...@freebsd.org wrote: On 10/04/10 20:49, Josh Berkus wrote: The other major bottleneck they ran into was a kernel one: reading from the heap file requires a couple lseek operations, and Linux acquires a mutex on the inode to do that. The proper place to fix this is certainly in the kernel but it may be possible to work around in Postgres. Or we could complain to Kernel.org. They've been fairly responsive in the past. Too bad this didn't get posted earlier; I just got back from LinuxCon. So you know someone who can speak technically to this issue? I can put them in touch with the Linux geeks in charge of that part of the kernel code. Hmmm... lseek? As in lseek() then read() or write() idiom? It AFAIK cannot be fixed since you're modifying the global stream position variable and something has got to lock that. Well, there are lock free algorithms using CAS, no? OTOH, pread() / pwrite() don't have to do that. Hey, I didn't know about those. That sounds like it might be worth investigating, though I confess I lack a 48-core machine on which to measure the alleged benefit. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] [HACKERS] MIT benchmarks pgsql multicore (up to 48)performance
On Wed, Oct 6, 2010 at 9:30 PM, Stephen Frost sfr...@snowman.net wrote: * Robert Haas (robertmh...@gmail.com) wrote: Hey, I didn't know about those. That sounds like it might be worth investigating, though I confess I lack a 48-core machine on which to measure the alleged benefit. I've got a couple 24-core systems, if it'd be sufficiently useful to test with.. It's good to be you. I don't suppose you could try to replicate the lseek() contention? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Query much faster with enable_seqscan=0
On Wed, Sep 22, 2010 at 9:36 AM, Ogden li...@darkstatic.com wrote: On Sep 21, 2010, at 2:34 PM, Ogden wrote: On Sep 21, 2010, at 2:16 PM, Greg Smith wrote: Joshua D. Drake wrote: PostgreSQL's defaults are based on extremely small and some would say (non production) size databases. As a matter of course I always recommend bringing seq_page_cost and random_page_cost more in line. Also, they presume that not all of your data is going to be in memory, and the query optimizer needs to be careful about what it does and doesn't pull from disk. If that's not the case, like here where there's 8GB of RAM and a 7GB database, dramatic reductions to both seq_page_cost and random_page_cost can make sense. Don't be afraid to think lowering below 1.0 is going too far--something more like 0.01 for sequential and 0.02 for random may actually reflect reality here. I have done just that, per your recommendations, and now what took 14 seconds only takes less than a second, so it was certainly these figures I messed around with. I have set:

seq_page_cost = 0.01
random_page_cost = 0.02
cpu_tuple_cost = 0.01

Everything seems to run faster now. I think this should be fine - I'll keep an eye on things over the next few days. I truly appreciate everyone's help. Ogden I spoke too soon - well I came in this morning and reran the query that was sped up yesterday by a lot after tweaking those numbers. This morning the first time I ran it, it took 16 seconds whereas every subsequent run was a matter of 2 seconds. I assume there is OS caching going on for those results. Is this normal, or could it also be the speed of my disks which is causing a lag when I first run it (it's RAID 5 across 6 disks)? Is there any explanation for this, or what should those settings really be? Perhaps 0.01 is too low? Yeah, I think those numbers are a bit low. Your database probably isn't fully cached. 
Keep in mind there's going to be some fluctuation as to what is and is not in cache, and you can't expect whatever plan the planner picks to be exactly perfect for both cases. I might try something more like 0.2 / 0.1. If you really need the query to be fast, though, you might need to do more than jigger the page costs. Did you try Tom's suggested rewrite? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
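[One safe way to experiment with the values Robert suggests is to change them only in the current session and compare plans; this is a sketch in which the SELECT is a stand-in for the slow query, and the seq/random split of "0.2 / 0.1" is a judgment call for the hardware at hand:]

```sql
-- Trial values for this session only; server-wide defaults are untouched.
SET seq_page_cost = 0.1;
SET random_page_cost = 0.2;

EXPLAIN ANALYZE SELECT 1;  -- substitute the problem query here

-- Put the session back to the configured defaults:
RESET seq_page_cost;
RESET random_page_cost;
```

Only once the per-session experiment shows a consistently better plan is it worth making the change permanent in postgresql.conf.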
Re: [PERFORM] how to enforce index sub-select over filter+seqscan
On Thu, Sep 23, 2010 at 10:26 AM, Dmitry Teslenko dtesle...@gmail.com wrote: I know I can set enable_seqscan = off. Is there other ways to enforce index usage? Not really, but I suspect random_page_cost and seq_page_cost might help the planner make better decisions. Is your data by any chance mostly cached in memory? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] turn off caching for performance test
On Fri, Aug 27, 2010 at 1:57 PM, Scott Marlowe scott.marl...@gmail.com wrote: On Thu, Aug 26, 2010 at 4:32 AM, Willy-Bas Loos willy...@gmail.com wrote: Hi, I have a colleague that is convinced that the website is faster if enable_seqscan is turned OFF. I'm convinced of the opposite (better to leave it ON), but i would like to show it, prove it to him. Stop, you're both doing it wrong. The issue isn't whether or not turning off seq scans will make a few things faster here and there, it's why is the query planner choosing sequential scans when it should be choosing index scans. So, what are your non-default settings in postgresql.conf? Have you increased effective_cache_size yet? Lowered random_page_cost? Raised default stats target and re-analyzed? Have you been looking at the problem queries with explain analyze? What does it have to say about the planners choices? [a bit behind on my email] This was exactly my thought on first reading this post. If the indexes are faster and PG thinks they are slower, it's a good bet that there are some parameters that need tuning. Specifically, effective_cache_size may be too low, and random_page_cost and seq_page_cost are almost certainly too high. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Is disableing nested_loops a bad idea ?
On Thu, Sep 16, 2010 at 10:13 AM, Franck Routier franck.rout...@axege.com wrote: Thanks Kevin and Samuel for your input. The point is we already made a lot of tweaking to try to tune postgresql to behave correctly. I work with Damien, and here is a post he did in July to explain the kind of problems we have http://comments.gmane.org/gmane.comp.db.postgresql.performance/25745 The end of the thread was Robert Hass concluding that Disabling nestloops altogether, even for one particular query, is often going to be a sledgehammer where you need a scalpel. But then again, a sledgehammer is better than no hammer. So I wanted to better understand to what extent using a sledgehammer will impact me :-) One particular case where you may get a nasty surprise is:

Nested Loop
  -> Whatever
  -> Index Scan

This isn't necessarily terrible if the would-be index scan is on a small table, because a hash join may not be too bad. It may not be too good, either, but if the would-be index scan is on a large table the whole thing might turn into a merge join. That can get pretty ugly. Of course in some cases the planner may be able to rejigger the whole plan in some way that mitigates the damage, but not necessarily. One of the things I've noticed about our planner is that it becomes less predictable in stressful situations. As you increase the number of tables involved in join planning, for example, the query planner still delivers a lot of very good plans, but not quite as predictably. Things don't slow down uniformly across the board; instead, most of the plans remain pretty good but every once in a while (and with increasing frequency as you keep cranking up the table count) you get a bad one. Shutting off any of the enable_* constants will, I think, produce a similar effect. 
Many queries can be adequately handled using some other technique and you won't really notice it, but you may find that you have a few (or someone will eventually write one) which *really* need whatever technique you turned off for decent performance. At that point you don't have a lot of options... Incidentally, it's Haas, rather than Hass. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Using Between
On Tue, Sep 21, 2010 at 4:04 PM, Ozer, Pam po...@automotive.com wrote: There are 850,000 records in vehicleused. And the database is too big to be kept in memory. Ah. So in other words, you are retrieving about half the rows in that table. For those kinds of queries, using the index tends to actually be slower, because (1) you read the index in addition to reading the table, which has CPU and I/O cost, and (2) instead of reading the table sequentially, you end up jumping around and reading it out of order, which tends to result in more disk seeks and defeats the OS prefetch logic. The query planner is usually pretty smart about making good decisions about this kind of thing. As a debugging aid (but never in production), you can try disabling enable_seqscan and see what plan you get that way. If it's slower, well then the query planner did the right thing. If it's faster, then probably you need to adjust seq_page_cost and random_page_cost a bit. But my guess is that it will be somewhere between a lot slower and only very slightly faster. A whole different line of inquiry is to ask the more general question "how can I make this query faster?", but I'm not sure whether you're unhappy with how the query is running or just curious about why the index isn't being used. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
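[The debugging aid described above - never in production, as noted - can be sketched like this; the query is a stand-in built from the filter seen elsewhere in the thread:]

```sql
BEGIN;
-- Force the planner away from sequential scans for this transaction only:
SET LOCAL enable_seqscan = off;

EXPLAIN ANALYZE
SELECT count(*)
  FROM vehicleused
 WHERE vehicleusedpricerangefloor BETWEEN 0 AND 15000;

ROLLBACK;  -- SET LOCAL settings revert when the transaction ends
```

If the forced index plan comes out slower than the original, the planner's choice of a sequential scan was the right one.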
Re: [PERFORM] Using Between
On Wed, Sep 22, 2010 at 11:18 AM, Ozer, Pam po...@automotive.com wrote: The question is how can we make it faster. If there's just one region ID for any given postal code, you might try adding a column to vehicleused and storing the region IDs there. You could possibly populate that column using a trigger; probably it doesn't change unless the postalcode changes. Then you could index that column and query against it directly, rather than joining to PostalCodeRegionCountyCity. Short of that, I don't see any obvious way to avoid reading most of the vehicleused table. There may or may not be an index that can speed that up slightly and of course you can always throw hardware at the problem, but fundamentally reading half a million or more rows isn't going to be instantaneous. Incidentally, it would probably simplify things to store postal codes in the same case throughout the system. If you can avoid the need to write lower(x) = lower(y) and just write x = y you may get better plans. I'm not sure that's the case in this particular example but it's something to think about. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
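[The trigger-based denormalization suggested above might look roughly like this. It is a sketch: the postalcode column on vehicleused is assumed to exist, and it stores the region id on the stated assumption that each postal code maps to exactly one region:]

```sql
ALTER TABLE vehicleused ADD COLUMN regionid integer;

CREATE OR REPLACE FUNCTION vehicleused_set_region() RETURNS trigger AS $$
BEGIN
    -- Look the region up once at write time instead of joining at query time.
    -- lower() on both sides guards against mixed-case postal codes, per the
    -- caveat in the message above.
    SELECT p.regionid INTO NEW.regionid
      FROM postalcoderegioncountycity p
     WHERE lower(p.postalcode) = lower(NEW.postalcode);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER vehicleused_region
    BEFORE INSERT OR UPDATE ON vehicleused
    FOR EACH ROW EXECUTE PROCEDURE vehicleused_set_region();

CREATE INDEX vehicleused_regionid_idx ON vehicleused (regionid);
```

With the column populated and indexed, the query can filter with WHERE regionid = 26 directly and skip the join entirely.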
Re: [PERFORM] Useless sort by
On Mon, Sep 13, 2010 at 1:09 PM, Gaetano Mendola mend...@gmail.com wrote: I see your point, but some functions, like unique and count, are not affected by the order of the values fed to them, and I don't think unique has to give out the unique values in the same order they were fed, either. Gee, I'd sure expect it to. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Useless sort by
On Wed, Sep 22, 2010 at 11:05 PM, gnuo...@rcn.com wrote: Spoken like a dyed in the wool COBOL coder. The RM has no need for order; it's set based. I've dabbled in PG for some time, and my sense is increasingly that PG developers are truly code oriented, not database (set) oriented. I'm struggling to think of an adequate response to this. I think I'm going to go with: huh? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] GPU Accelerated Sorting
On Mon, Aug 30, 2010 at 9:46 AM, Eliot Gable ega...@gmail.com wrote: Not sure if anyone else saw this, but it struck me as an interesting idea if it could be added to PostgreSQL. GPU accelerated database operations could be very... interesting. Of course, this could be difficult to do in a way that usefully increases performance of PostgreSQL, but I'll leave that up to you guys to figure out. http://code.google.com/p/back40computing/wiki/RadixSorting It would be hard to use this because, in addition to the fact that this is specific to a very particular type of hardware, it only works if you're trying to do a very particular type of sort. For example, it wouldn't handle multi-byte characters properly. And it wouldn't handle integers properly either - you'd end up sorting negatives after positives. You could possibly still find applications for it but they'd be quite narrow, I think. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Using Between
, vehicleused.vehicleusedmileagerangefloor, vehicleused.hasvehicleusedmileage, vehicleused.VehicleUsedIntId.distinct_count, vehicleused.VehicleUsedPrice.average, vehicleused.VehicleUsedId.distinct_count, vehicleused.iscpo, vehicleused.ismtca, vehicleused.cpoprogramoemid, vehicleused.cpoprogram3rdpartyid
        Filter: ((vehicleusedpricerangefloor >= 0) AND (vehicleusedpricerangefloor <= 15000))
  -> Hash (cost=283.32..283.32 rows=522 width=6) (actual time=1.084..1.084 rows=532 loops=1)
        Output: postalcoderegioncountycity.postalcode
        -> Bitmap Heap Scan on postalcoderegioncountycity (cost=12.30..283.32 rows=522 width=6) (actual time=0.092..0.361 rows=532 loops=1)
              Output: postalcoderegioncountycity.postalcode
              Recheck Cond: (regionid = 26)
              -> Bitmap Index Scan on postalcoderegioncountycity_i05 (cost=0.00..12.17 rows=522 width=0) (actual time=0.082..0.082 rows=532 loops=1)
                    Index Cond: (regionid = 26)
Total runtime: 1945.244 ms

How many rows are in the vehicleused table in total? Is your database small enough to fit in memory? Do you have any non-default settings in postgresql.conf? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] slow DDL creation
On Tue, Aug 31, 2010 at 11:35 AM, Kevin Kempter cs_...@consistentstate.com wrote: On Monday 30 August 2010 17:04, bricklen wrote: On Mon, Aug 30, 2010 at 3:28 PM, Kevin Kempter cs_...@consistentstate.com wrote: Hi all ; we have an automated partition creation process that includes the creation of an FK constraint. we have a few other servers with similar scenarios and this is the only server that stinks perf-wise when we create the new partitions. Anyone have any thoughts on how to debug this? we're running postgres 8.4.4 on CentOS 5.5 Thanks in advance Is the referenced column indexed? no, but its for a new partition so there's no data as of yet in the partition What exactly does stinks mean? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: Fwd: [PERFORM] Vacuum Full + Cluster + Vacuum full = non removable dead rows
On Sat, Aug 21, 2010 at 9:49 AM, Alexandre de Arruda Paes adald...@gmail.com wrote: Only for discussion: the CLUSTER command, to my limited knowledge, is an intrusive command that cannot recover the dead tuples either. Only TRUNCATE can do this job, but obviously it is not applicable all the time. Either VACUUM or CLUSTER will recover *dead* tuples. What you can't recover are tuples that are still visible to some running transaction. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
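The distinction is easy to check in practice: any long-open transaction pins the oldest XID the system considers visible, and neither VACUUM nor CLUSTER can remove tuples deleted after that horizon. A hedged sketch of where to look, using the pre-9.2 pg_stat_activity column names (procpid, current_query); output is illustrative:

```sql
-- Long-open transactions that can hold back dead-tuple removal:
SELECT procpid, xact_start, current_query
  FROM pg_stat_activity
 WHERE xact_start IS NOT NULL
 ORDER BY xact_start
 LIMIT 5;

-- Forgotten two-phase transactions have the same effect:
SELECT * FROM pg_prepared_xacts;
```

If either query shows something days old, killing or committing it is what actually lets the next VACUUM reclaim the "non removable" rows.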
Re: [PERFORM] Two fast searches turn slow when used with OR clause
On Thu, Aug 5, 2010 at 2:34 PM, Craig James craig_ja...@emolecules.com wrote:

explain analyze
select p.price, p.amount, p.units, s.catalogue_id, vn.version_id
from plus p
join sample s on (p.compound_id = s.compound_id and p.supplier_id = s.supplier_id)
join version vn on (s.version_id = vn.version_id)
join parent pn on (s.parent_id = pn.parent_id)
where vn.isosmiles = 'Fc1n1B1OC(C)(C)C(C)(C)O1'
   or pn.isosmiles = 'Fc1n1B1OC(C)(C)C(C)(C)O1'
order by price;

Well, you can't evaluate the WHERE clause here until you've joined {s vn pn}.

If I only query the VERSION table, it's very fast:

explain analyze
select p.price, p.amount, p.units, s.catalogue_id, vn.version_id
from plus p
join sample s on (p.compound_id = s.compound_id and p.supplier_id = s.supplier_id)
join version vn on (s.version_id = vn.version_id)
where vn.isosmiles = 'Fc1n1B1OC(C)(C)C(C)(C)O1'
order by price;

But here you can push the WHERE clause all the way down to the vn table, and evaluate it right at the get-go, which is pretty much exactly what is happening. In the first case, you have to join all 297,306 vn rows against s, because they could be interesting if the other half of the WHERE clause turns out to hold. In the second case, you can throw away 297,305 of those 297,306 rows before doing anything else, because there's no possibility that they can ever be interesting. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
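A standard workaround for this class of problem, not proposed in the thread but worth noting for the archives, is to split the OR across a UNION so that each branch carries a restriction the planner can push down; a sketch against the tables above (untested against the poster's schema):

```sql
SELECT p.price, p.amount, p.units, s.catalogue_id, vn.version_id
FROM plus p
JOIN sample s ON (p.compound_id = s.compound_id AND p.supplier_id = s.supplier_id)
JOIN version vn ON (s.version_id = vn.version_id)
WHERE vn.isosmiles = 'Fc1n1B1OC(C)(C)C(C)(C)O1'
UNION
SELECT p.price, p.amount, p.units, s.catalogue_id, vn.version_id
FROM plus p
JOIN sample s ON (p.compound_id = s.compound_id AND p.supplier_id = s.supplier_id)
JOIN version vn ON (s.version_id = vn.version_id)
JOIN parent pn ON (s.parent_id = pn.parent_id)
WHERE pn.isosmiles = 'Fc1n1B1OC(C)(C)C(C)(C)O1'
ORDER BY price;
```

Using UNION rather than UNION ALL removes the rows that match both halves, which matches the semantics of the original OR.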
Re: [PERFORM] Questions on query planner, join types, and work_mem
On Wed, Aug 11, 2010 at 9:42 PM, Bruce Momjian br...@momjian.us wrote: This confused me. If we are assuming the data is in effective_cache_size, why are we adding sequential/random page cost to the query cost routines? See the comments for index_pages_fetched(). We basically assume that all data starts uncached at the beginning of each query - in fact, each plan node. effective_cache_size only measures the chances that if we hit the same block again later in the execution of something like a nested-loop-with-inner-indexscan, it'll still be in cache. It's an extremely weak knob, and unless you have tables or indices that are larger than RAM, the only mistake you can make is setting it too low. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
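The comments Robert points to implement the Mackert-Lohman approximation. A small Python transcription of the formula in index_pages_fetched() (src/backend/optimizer/path/costsize.c of that era; treat this as an illustrative sketch, not the planner itself):

```python
def index_pages_fetched(tuples_fetched, table_pages, cache_pages):
    """Mackert-Lohman estimate of distinct pages fetched, as in
    index_pages_fetched() in costsize.c.  T = pages in the table,
    b = pages assumed cacheable (derived from effective_cache_size)."""
    T = max(float(table_pages), 1.0)
    b = max(float(cache_pages), 1.0)
    N = float(tuples_fetched)
    if T <= b:
        # Table fits in cache: estimate distinct pages, capped at T.
        return min(2.0 * T * N / (2.0 * T + N), T)
    # Cache smaller than table: past 'lim' fetches, additional tuples
    # cost more because previously read pages get evicted.
    lim = 2.0 * T * b / (2.0 * T - b)
    if N <= lim:
        return 2.0 * T * N / (2.0 * T + N)
    return b + (N - lim) * (T - b) / T

# Fetching 100 tuples from a 100-page table that fits in cache:
print(round(index_pages_fetched(100, 100, 1000), 2))   # 66.67
```

Note how weakly b enters the formula when the table fits in cache, which is exactly Robert's point about effective_cache_size being a weak knob.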
Re: [PERFORM] Questions on query planner, join types, and work_mem
On Tue, Aug 3, 2010 at 3:03 AM, Hannu Krosing ha...@2ndquadrant.com wrote: In case of fully cached database it is closer to 1. In the case of a fully cached database I believe the correct answer begins with a decimal point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [PERFORM] Strange explain on partitioned tables
On Mon, Jul 26, 2010 at 5:26 PM, Gerald Fontenay gvfonte...@lbl.gov wrote: Thank you for your response. So if I query only my target child table, this should be just like any other single table wrt planning right? I have thousands of these tables. (I suppose that I'm only using inheritance for the sake of organization in this situation...) Yeah, I wouldn't expect planning time to be affected by whether a table has parents; only whether it has children. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Questions on query planner, join types, and work_mem
On Mon, Aug 2, 2010 at 5:23 PM, Peter Hussey pe...@labkey.com wrote: I already had effective_cache_size set to 500MB. I experimented with lowering random_page_cost to 3, then 2. It made no difference in the choice of plan that I could see. In the explain analyze output the estimated costs of the nested loop were in fact lowered, but so were the costs of the hash join plan, and the hash join remained the lowest predicted cost in all tests I tried. What do you get if you set random_page_cost to a small value such as 0.01? What seems wrong to me is that the hash join strategy shows almost no difference in estimated costs as work_mem goes from 1MB to 500MB. The cost function decreases by 1%, but the actual time for the query to execute decreases by 86% as work_mem goes from 1MB to 500MB. Wow. It would be interesting to find out how many batches are being used. Unfortunately, releases prior to 9.0 don't display that information. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
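For the archives: on 9.0 and later the batch count appears directly on the Hash node in EXPLAIN ANALYZE output, along these lines (numbers illustrative; a batch count above 1 means the hash spilled to disk because work_mem was too small):

```sql
EXPLAIN ANALYZE SELECT ...;   -- the hash-join query in question
--   ->  Hash  (cost=... rows=... width=...) (actual time=... rows=... loops=1)
--         Buckets: 1024  Batches: 16  Memory Usage: 4096kB
```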
Re: [PERFORM] Explains of queries to partitioned tables
On Mon, Jul 26, 2010 at 4:47 AM, Vlad Arkhipov arhi...@dc.baikal.ru wrote: There is a partitioned table with 2 partitions:

drop table if exists p cascade;
create table p (
  id bigint not null primary key,
  ts timestamp);
create table p_actual ( check (ts is null) ) inherits (p);
create table p_historical ( check (ts is not null) ) inherits (p);
-- I skipped the triggers and rules creation
insert into p (id, ts) values (1, '2000-01-01');
insert into p (id, ts) values (2, null);
insert into p (id, ts) values (3, '2001-01-01');
insert into p (id, ts) values (4, '2005-01-01');
analyze p;
analyze p_actual;
analyze p_historical;

Here is the explain output for the query 'select * from p where ts is null':

Result  (cost=0.00..188.10 rows=10 width=16) (actual time=0.028..0.038 rows=1 loops=1)
  ->  Append  (cost=0.00..188.10 rows=10 width=16) (actual time=0.023..0.029 rows=1 loops=1)
        ->  Seq Scan on p  (cost=0.00..187.00 rows=9 width=16) (actual time=0.002..0.002 rows=0 loops=1)
              Filter: (ts IS NULL)
        ->  Seq Scan on p_actual p  (cost=0.00..1.10 rows=1 width=16) (actual time=0.014..0.016 rows=1 loops=1)
              Filter: (ts IS NULL)
Total runtime: 0.080 ms

You can see that the optimizer expects 10 rows in the table p, and as a result of this assumption the whole query is estimated at 10 rows. Will this cause a performance impact further on? pg_stats does not contain any statistics on the table 'p'. Is this the cause of such behaviour?
The estimation is worse for some other queries, for example 'select * from p where ts is not null':

Result  (cost=0.00..188.30 rows=1764 width=16) (actual time=0.021..0.049 rows=3 loops=1)
  ->  Append  (cost=0.00..188.30 rows=1764 width=16) (actual time=0.016..0.032 rows=3 loops=1)
        ->  Seq Scan on p  (cost=0.00..187.00 rows=1761 width=16) (actual time=0.003..0.003 rows=0 loops=1)
              Filter: (ts IS NOT NULL)
        ->  Seq Scan on p_historical p  (cost=0.00..1.30 rows=3 width=16) (actual time=0.008..0.015 rows=3 loops=1)
              Filter: (ts IS NOT NULL)
Total runtime: 0.095 ms

It would be easier to comment on this if you mentioned things like which version of PG you're using and what you have constraint_exclusion set to, but as a general comment, ANALYZE doesn't store statistics for any tables that are empty, because it assumes that at some point you're going to put data in them. So in this case p_historical is probably using fake stats. But it's not clear that it really matters: you haven't got any relevant indices, so a sequential scan is the only possible plan; and even if you did have some, there are only 4 rows, so a sequential scan is probably the only plan that makes sense anyway. And your query ran in a tenth of a millisecond, which is pretty zippy. So I'm not really sure what the problem is. If this isn't the real data, post an example with the real data and ask for help about that. http://wiki.postgresql.org/wiki/SlowQueryQuestions -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.
On Wed, Jul 28, 2010 at 3:44 PM, Josh Berkus j...@agliodbs.com wrote: On 7/27/10 6:56 PM, Tom Lane wrote: Yeah, if it weren't for that I'd say sure let's try it. But I'm afraid we'd be introducing significant headaches in return for a gain that's quite speculative. Well, the *gain* isn't speculative. For example, I am once again dealing with the issue that PG backend processes on Solaris never give up their RAM, resulting in pathological swapping situations if you have many idle connections. This requires me to install pgpool, which is overkill (since it has load balancing, replication, and more) just to make sure that connections get recycled so that I don't have 300 idle connections eating up 8GB of RAM. Relative to switching databases, I'd tend to say that, like pgbouncer and pgpool, we don't need to support that. Each user/database combo can have their own pool. While not ideal, this would be good enough for 90% of users. However, if we don't support that, we can't do any sort of pooling-ish thing without the ability to pass file descriptors between processes; and Tom seems fairly convinced there's no portable way to do that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.
On Wed, Jul 28, 2010 at 4:10 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: However, if we don't support that, we can't do any sort of pooling-ish thing without the ability to pass file descriptors between processes; and Tom seems fairly convinced there's no portable way to do that. Well, what it would come down to is: are we prepared to not support pooling on platforms without such a capability? It's certainly possible to do it on many modern platforms, but I don't believe we can make it happen everywhere. Generally we've tried to avoid having major features that don't work everywhere ... I suppose it depends on the magnitude of the benefit. And how many platforms aren't covered. And how much code is required. In short, until someone writes a patch, who knows? I think the core question we should be thinking about is what would be the cleanest method of resetting a backend - either for the same database or for a different one, whichever seems easier. And by cleanest, I mean least likely to introduce bugs. If we can get to the point where we have something to play around with, even if it's kind of kludgey or doesn't quite work, it'll give us some idea of whether further effort is worthwhile and how it should be directed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.
On Sat, Jul 24, 2010 at 2:23 AM, Craig Ringer cr...@postnewspapers.com.au wrote: On 24/07/10 01:28, Robert Haas wrote: Well, if we could change the backends so that they could fully reinitialize themselves (disconnect from a database to which they are bound, etc.), I don't see why we couldn't use the Apache approach. This would offer the bonus on the side that it'd be more practical to implement database changes for a connection, akin to MySQL's USE. Inefficient, sure, but possible. Yep. I don't care about that current limitation very much. I think anyone changing databases all the time probably has the wrong design and should be using schema. I'm sure there are times it'd be good to be able to switch databases on one connection, though. I pretty much agree with this. I think this is merely slightly nice on its own, but I think it might be a building-block to other things that we might want to do down the road. Markus Wanner's Postgres-R replication uses worker processes; autovacuum does as well; and then there's parallel query. I can't help thinking that not needing to fork a new backend every time you want to connect to a new database has got to be useful. My question with all this remains: is it worth the effort when external poolers already solve the problem. Whether it's worth the effort is something anyone who is thinking about working on this will have to decide for themselves. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.
On Tue, Jul 27, 2010 at 4:40 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Thu, Jul 22, 2010 at 5:29 PM, Andres Freund and...@anarazel.de wrote: The problem is harder for us because a backend can't switch identities once it's been assigned to a database. I haven't heard an adequate explanation of why that couldn't be changed, though. Possibly it might decrease the performance significantly enough by reducing the cache locality (syscache, prepared plans)? Those things are backend-local. The worst case scenario is you've got to flush them all when you reinitialize, in which case you still save the overhead of creating a new process. Flushing them all is not zero-cost; it's not too hard to believe that it could actually be slower than forking a clean new backend. I'm not so sure I believe that. Is a sinval overrun slower than forking a clean new backend? Is DISCARD ALL slower that forking a clean new backend? How much white space is there between either of those and what would be needed here? I guess it could be slower, but I wouldn't want to assume that without evidence. What's much worse, it's not zero-bug. We've got little bitty caches all over the backend, including (no doubt) some caching behavior in third-party code that wouldn't get the word about whatever API you invented to deal with this. I think this is probably the biggest issue with the whole idea, and I agree there would be some pain involved. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.
On Tue, Jul 27, 2010 at 9:56 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Tue, Jul 27, 2010 at 4:40 PM, Tom Lane t...@sss.pgh.pa.us wrote: Flushing them all is not zero-cost; it's not too hard to believe that it could actually be slower than forking a clean new backend. I'm not so sure I believe that. I'm not asserting it's true, just suggesting it's entirely possible. Other than the fork() cost itself and whatever authentication activity there might be, practically all the startup cost of a new backend can be seen as cache-populating activities. You'd have to redo all of that, *plus* pay the costs of getting rid of the previous cache entries. Maybe the latter costs less than a fork(), or maybe not. fork() is pretty cheap on modern Unixen. What's much worse, it's not zero-bug. I think this is probably the biggest issue with the whole idea, and I agree there would be some pain involved. Yeah, if it weren't for that I'd say sure let's try it. But I'm afraid we'd be introducing significant headaches in return for a gain that's quite speculative. I agree that the gain is minimal of itself; after all, how often do you need to switch databases, and what's the big deal if the postmaster has to fork a new backend? Where I see it as a potentially big win is when it comes to things like parallel query. I can't help thinking that's going to be a lot less efficient if you're forever forking new backends. Perhaps the point here is that you'd actually sort of like to NOT flush all those caches unless it turns out that you're switching databases - many installations probably operate with essentially one big database, so chances are good that even if you distributed connections / parallel queries to backends round-robin, you'd potentially save quite a bit of overhead. Of course, for the guy who has TWO big databases, you might hope to be a little smarter, but that's another problem altogether. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.
On Fri, Jul 23, 2010 at 11:58 AM, Hannu Krosing ha...@krosing.net wrote: On Thu, 2010-07-22 at 20:57 -0400, Robert Haas wrote: On Thu, Jul 22, 2010 at 3:15 PM, Hannu Krosing ha...@krosing.net wrote: On Thu, 2010-07-22 at 14:36 -0400, Robert Haas wrote: On Mon, Jul 12, 2010 at 6:58 AM, Craig Ringer cr...@postnewspapers.com.au wrote: So rather than asking should core have a connection pool perhaps what's needed is to ask what can an in-core pool do that an external pool cannot do? Avoid sending every connection through an extra hop. not really. in-core != magically-in-right-backend-process Well, how about if we arrange it so it IS in the right backend process? I don't believe magic is required. Do you have any design in mind for how you can make it so? Well, if we could change the backends so that they could fully reinitialize themselves (disconnect from a database to which they are bound, etc.), I don't see why we couldn't use the Apache approach. There's a danger of memory leaks, but that's why Apache has MaxRequestsPerChild, and it works pretty darn well. Of course, passing file descriptors would be even nicer (you could pass the connection off to a child that was already bound to the correct database, perhaps), but as has been pointed out more than once, that's not portable. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
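The portability worry here is about SCM_RIGHTS-style descriptor passing over Unix-domain sockets. Where it is available, the mechanics look like this toy sketch (Python on a Unix platform, with a pipe standing in for a client connection; the "other process" is simulated by the second end of a socketpair):

```python
import array
import os
import socket

# A pair of connected AF_UNIX sockets standing in for the
# postmaster <-> backend control channel.
parent, child = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

r, w = os.pipe()                    # pretend r is a client connection fd
os.write(w, b"hello")

# Send the descriptor as SCM_RIGHTS ancillary data (at least one
# byte of normal payload must accompany it).
fds = array.array("i", [r])
parent.sendmsg([b"x"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds.tobytes())])

# The receiver gets a *new* descriptor referring to the same open file.
msg, ancdata, flags, addr = child.recvmsg(1, socket.CMSG_SPACE(fds.itemsize))
level, ctype, cdata = ancdata[0]
received = array.array("i")
received.frombytes(cdata[:received.itemsize])

data_read = os.read(received[0], 5)
print(data_read.decode())           # hello
```

The same dance in C uses sendmsg()/recvmsg() with a cmsghdr of type SCM_RIGHTS; the portability problem Tom alludes to is that not every platform PostgreSQL supports guarantees this facility.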
Re: [PERFORM] Slow query with planner row strange estimation
On Mon, Jul 12, 2010 at 4:33 PM, phb07 ph...@apra.asso.fr wrote: Dimitri wrote: It's probably one of the cases when having HINTS in PostgreSQL may be very helpful.. SELECT /*+ enable_nestloop=off */ ... FROM ... will just fix this query without impacting other queries and without adding any additional instructions into the application code.. So why is there such resistance to implementing hints within SQL queries in PG?.. +1. Another typical case where it would be helpful is setting the cursor_tuple_fraction GUC variable for a specific statement, without being obliged to issue 2 SET statements, one before the SELECT and the other after. We've previously discussed adding a command something like: LET (variable = value, variable = value, ...) command ...which would set those variables just for that one command. But honestly I'm not sure how much it'll help with query planner problems. Disabling nestloops altogether, even for one particular query, is often going to be a sledgehammer where you need a scalpel. But then again, a sledgehammer is better than no hammer. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
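For what it's worth, SET LOCAL already gets partway there today: it scopes the change to one transaction rather than one session, so no cleanup statement is needed afterwards. A sketch:

```sql
BEGIN;
SET LOCAL enable_nestloop = off;   -- or: SET LOCAL cursor_tuple_fraction = 0.1;
SELECT ...;                        -- the one query that needs the sledgehammer
COMMIT;                            -- the setting reverts automatically
```

It still requires wrapping the statement in a transaction, which is exactly the inconvenience the proposed LET syntax would remove.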
Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.
On Mon, Jul 12, 2010 at 6:58 AM, Craig Ringer cr...@postnewspapers.com.au wrote: On 12/07/10 17:45, Matthew Wakeling wrote: I'm surprised. Doesn't apache httpd do this? Does it have to do a whole load of non-portable stuff? It seems to work on a whole load of platforms. A lot of what Apache HTTPd does is handled via the Apache Portable Runtime (APR). It contains a lot of per-platform handlers for various functionality. Apache just has all of the worker processes call accept() on the socket, and whichever one the OS hands it off to gets the job. The problem is harder for us because a backend can't switch identities once it's been assigned to a database. I haven't heard an adequate explanation of why that couldn't be changed, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.
On Mon, Jul 12, 2010 at 6:58 AM, Craig Ringer cr...@postnewspapers.com.au wrote: So rather than asking should core have a connection pool perhaps what's needed is to ask what can an in-core pool do that an external pool cannot do? Avoid sending every connection through an extra hop. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.
On Thu, Jul 22, 2010 at 3:15 PM, Hannu Krosing ha...@krosing.net wrote: On Thu, 2010-07-22 at 14:36 -0400, Robert Haas wrote: On Mon, Jul 12, 2010 at 6:58 AM, Craig Ringer cr...@postnewspapers.com.au wrote: So rather than asking should core have a connection pool perhaps what's needed is to ask what can an in-core pool do that an external pool cannot do? Avoid sending every connection through an extra hop. not really. in-core != magically-in-right-backend-process Well, how about if we arrange it so it IS in the right backend process? I don't believe magic is required. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.
On Thu, Jul 22, 2010 at 5:29 PM, Andres Freund and...@anarazel.de wrote: The problem is harder for us because a backend can't switch identities once it's been assigned to a database. I haven't heard an adequate explanation of why that couldn't be changed, though. Possibly it might decrease the performance significantly enough by reducing the cache locality (syscache, prepared plans)? Those things are backend-local. The worst case scenario is you've got to flush them all when you reinitialize, in which case you still save the overhead of creating a new process. The best case scenario is that you can keep some of them around, in which case, great. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow query with planner row strange estimation
On Fri, Jul 9, 2010 at 6:13 AM, damien hostin damien.hos...@axege.com wrote: Have you tried running ANALYZE on the production server? You might also want to try ALTER TABLE ... SET STATISTICS to a large value on some of the join columns involved in the query. Hello, Before comparing the test case on the two machines, I ran ANALYZE on the whole database and looked at the pg_stats table to see if anything changed for the columns, but on the production server the stats never became as good as on the desktop computer. I set the statistics target at 1 on the columns used by the join, ran ANALYZE, which took a 300-row sample, then looked at the stats. The stats are not as good as on the desktop. The row number is nearly the same, but only 1 or 2 values are found. The data are not balanced the same way on the two computers: - Desktop is 12000 rows with 6000 implicated in the query (50%), - Production (actually a dev/test server) is 6 million rows with 6000 implicated in the query (0.1%). The columns used in the query are nullable, and in the 5994000 other rows that are not implicated in the query these columns are null. I don't know if the statistics target is a % or a number of values to obtain. It's a number of values to obtain. But even set at the max (1), it didn't manage to collect good stats (for this particular query). I think there's a cutoff where it won't collect values unless they occur significantly more often than the average frequency. I wonder if that might be biting you here: without the actual values in the MCV table, the join selectivity estimates probably aren't too good. As I don't know what more to do, my conclusion is that the data need to be better balanced to allow ANALYZE to gather better stats. But if there is a way to improve the stats/query with this ugly unbalanced data, I'm open to it! I hope that in real production, data will never be loaded this way. If this happened we would maybe set enable_nestloop to off, but I don't think it's a good solution; other queries have a chance of getting slower.
Yeah, that usually works out poorly. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
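The two knobs discussed in this exchange look like this in practice (table and column names hypothetical):

```sql
-- Raise the per-column statistics target (overrides
-- default_statistics_target for this column), then re-sample:
ALTER TABLE my_table ALTER COLUMN join_col SET STATISTICS 1000;
ANALYZE my_table;

-- Inspect what ANALYZE actually collected, including the
-- most-common-values list that drives join selectivity estimates:
SELECT n_distinct, most_common_vals, most_common_freqs
  FROM pg_stats
 WHERE tablename = 'my_table' AND attname = 'join_col';
```

If most_common_vals stays empty even at a high target, the cutoff Robert describes is likely the reason: the interesting values simply don't occur often enough relative to the column's average frequency.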
Re: [PERFORM] Need help in performance tuning.
On Fri, Jul 9, 2010 at 12:42 AM, Tom Lane t...@sss.pgh.pa.us wrote: Samuel Gendler sgend...@ideasculptor.com writes: On Thu, Jul 8, 2010 at 8:11 PM, Craig Ringer cr...@postnewspapers.com.au wrote: If you're not using a connection pool, start using one. I see this issue and subsequent advice cross this list awfully frequently. Is there an architectural reason why postgres itself cannot pool incoming connections in order to eliminate the requirement for an external pool? Perhaps not, but there's no obvious benefit either. Since there's More Than One Way To Do It, it seems more practical to keep that as a separate problem that can be solved by a choice of add-on packages. I'm not buying it. A separate connection pooler increases overhead and management complexity, and, I believe, limits our ability to implement optimizations like parallel query execution. I'm glad there are good ones available, but the fact that they're absolutely necessary for good performance in some environments is not a feature. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [PERFORM] Slow query with planner row strange estimation
On Wed, Jul 7, 2010 at 10:39 AM, damien hostin damien.hos...@axege.com wrote: Hello again, At last, I checked the same query with the same data on my desktop computer. Just after loading the data, the queries were slow; I launched a VACUUM ANALYZE, which collected good stats on the main table, and the query became quick (~200ms). Now 1 classic SATA-disk computer is faster than our little monster server!! Have you tried running ANALYZE on the production server? You might also want to try ALTER TABLE ... SET STATISTICS to a large value on some of the join columns involved in the query. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [Slony1-general] [PERFORM] WAL partition overloaded--by autovacuum?
On Wed, Jul 7, 2010 at 3:32 PM, Richard Yen rich...@iparadigms.com wrote: On Jul 6, 2010, at 8:25 PM, Scott Marlowe wrote: Tell us what you can about your hardware setup. Sorry, I made the bad assumption that the hardware setup would be irrelevant--dunno why I thought that. My hardware setup is 2 FusionIO 160GB drives in a RAID-1 configuration, running on an HP DL360 G5. I think I figured out the problem: -- I figured that pg_xlog and data/base could both be on the FusionIO drive, since there would be no latency when there are no spindles. -- However, I didn't take into account the fact that pg_xlog might grow in size when autovacuum does its work when vacuuming to prevent XID wraparound. I *just* discovered this when one of my other replication nodes decided to die on me and fill up its disk. -- Unfortunately, my db is 114GB (including indexes) or 60GB (without indexes), leaving ~37GB for pg_xlog (since they are sharing a partition). So I'm guessing what happened was that when autovacuum ran to prevent XID wraparound, it took each table and changed the XIDs, and that got recorded in WAL, causing WAL to bloat. Is this the correct understanding? That seems logical (and un-fun), but I don't understand how you managed to fill up 37GB of disk with WAL files. Every time you fill up checkpoint_segments * 16MB of WAL files, you ought to get a checkpoint. When it's complete, WAL segments completely written before the start of the checkpoint should be recyclable. Unless I'm confused, which apparently I am. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
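Robert's arithmetic can be made concrete. The documentation of this era gives the steady-state ceiling on pg_xlog as (2 + checkpoint_completion_target) * checkpoint_segments + 1 segments of 16MB each, which is nowhere near 37GB for any plausible setting; a quick sketch:

```python
def expected_wal_ceiling_mb(checkpoint_segments,
                            checkpoint_completion_target=0.5,
                            wal_segment_mb=16):
    """Rough steady-state ceiling on pg_xlog size for PostgreSQL 8.3-9.x:
    (2 + checkpoint_completion_target) * checkpoint_segments + 1 segments."""
    files = (2 + checkpoint_completion_target) * checkpoint_segments + 1
    return files * wal_segment_mb

print(expected_wal_ceiling_mb(3))    # default checkpoint_segments: 136.0 (MB)
print(expected_wal_ceiling_mb(256))  # an aggressive setting: 10256.0 (MB)
```

So if pg_xlog genuinely reached 37GB, something was preventing segment recycling (for instance checkpoints unable to complete, or in later releases a stalled archive_command), rather than normal checkpoint behavior.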
Re: [PERFORM] big data - slow select (speech search)
On Wed, Jul 7, 2010 at 9:31 AM, Michal Fapso michal.fa...@gmail.com wrote: thank you for your help. I tried to cluster the table on hyps_wordid_index and the query execution time dropped from 4.43 to 0.19 seconds which is not that far from Lucene's performance of 0.10 second. Dang. Nice! -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [PERFORM] Highly Efficient Custom Sorting
On Sat, Jul 3, 2010 at 4:17 PM, Eliot Gable egable+pgsql-performa...@gmail.com wrote: Read RFC 2782 on random weighted load balancing of SRV records inside DNS. It may be asking a bit much to expect people here to read an RFC to figure out how to help you solve this problem, but... I've looked through the documentation on how to re-write this in C, but I cannot seem to find good documentation on working with the input array (which is an array of a complex type). I also don't see good documentation for working with the complex type. I found stuff that talks about constructing a complex type in C and returning it. However, I'm not sure how to take an input complex type and deconstruct it into something I can work with in C. Also, the memory context management stuff is not entirely clear. ...there's no question that writing things in C is a lot more work, and takes some getting used to. Still, it's fast, so maybe worth it, especially since you already know C++, and will therefore mostly just need to learn the PostgreSQL coding conventions. The best thing to do is probably to look at some of the existing examples within the backend code. Most of the datatype code is in src/backend/utils/adt. You might want to look at arrayfuncs.c (perhaps array_ref() or array_map()); and also rowtypes.c (perhaps record_cmp()). Specifically, how do I go about preserving the pointers to the data that I allocate in multi-call memory context so that they still point to the data on the next call to the function for the next result row? Am I supposed to set up some global variables to do that, or am I supposed to take a different approach? If I need to use global variables, then how do I deal with concurrency? Global variables would be a bad idea, not so much because of concurrency as because they won't get cleaned up properly. 
Again, the best thing to do is to look at existing examples, like array_unnest() in src/backend/utils/adt/arrayfuncs.c; the short answer is that you probably want to compute all your results on the first call and stash them in the FuncCallContext (funcctx->user_fctx); and then on subsequent calls just return one row per call. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
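For reference, the multi-call pattern described above looks roughly like this. This is only a sketch: the `MyState` struct and `build_results()` helper are invented placeholders, while the `PG_FUNCTION_INFO_V1`/`SRF_*` macros are the real funcapi.h interface; see array_unnest() for a complete, authoritative example.

```c
#include "postgres.h"
#include "fmgr.h"
#include "funcapi.h"

/* Hypothetical state stashed across calls. */
typedef struct
{
    int    nresults;
    Datum *rows;        /* one Datum per precomputed result row */
} MyState;

/* Hypothetical helper: does all the deconstruction and sorting up front. */
extern MyState *build_results(FunctionCallInfo fcinfo);

PG_FUNCTION_INFO_V1(my_sorted_srf);

Datum
my_sorted_srf(PG_FUNCTION_ARGS)
{
    FuncCallContext *funcctx;

    if (SRF_IS_FIRSTCALL())
    {
        MemoryContext oldcxt;
        MyState      *state;

        funcctx = SRF_FIRSTCALL_INIT();

        /*
         * Anything that must survive until the next call has to be
         * allocated in multi_call_memory_ctx -- this is how the pointers
         * stay valid between calls, with no global variables needed.
         */
        oldcxt = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);

        state = build_results(fcinfo);      /* compute everything once */
        funcctx->user_fctx = state;
        funcctx->max_calls = state->nresults;

        MemoryContextSwitchTo(oldcxt);
    }

    funcctx = SRF_PERCALL_SETUP();

    if (funcctx->call_cntr < funcctx->max_calls)
    {
        /* One precomputed row per call. */
        MyState *state = (MyState *) funcctx->user_fctx;

        SRF_RETURN_NEXT(funcctx, state->rows[funcctx->call_cntr]);
    }

    SRF_RETURN_DONE(funcctx);
}
```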
Re: [PERFORM] Two equivalent WITH RECURSIVE queries, one of them slow.
On Mon, Jul 5, 2010 at 2:07 AM, Octavio Alvarez alvar...@alvarezp.ods.org wrote: Hello. I have a tree-like table with a three-field PK (name, date, id) and one parent field. It has 5k to 6k records as of now, but it will hold about 1 million records. I am trying the following WITH RECURSIVE query:

WITH RECURSIVE t AS (
    SELECT par.id AS tid, par.name, par.date, par.id, par.text,
           par.h_title, par.h_name, par.parent
      FROM _books.par
  UNION
    SELECT t.tid AS pid, p.name, p.date, p.id, p.text,
           p.h_title, p.h_name, p.parent
      FROM t, _books.par p
     WHERE p.name = t.name AND p.date = t.date AND t.id = p.parent
)
SELECT t.tid, t.name, t.date, t.id, t.text, t.h_title, t.h_name, t.parent
  FROM t
 WHERE name = 'cfx' AND date = '2009-08-19' AND tid = '28340';

... which takes 2547.503 ms. However, if I try the same query but adding the same WHERE clause to the non-recursive term, I get much better results:

WITH RECURSIVE t AS (
    SELECT par.id AS tid, par.name, par.date, par.id, par.text,
           par.h_title, par.h_name, par.parent
      FROM _books.par
     WHERE name = 'cfx' AND date = '2009-08-19' AND par.id = '28340'
  UNION
    SELECT t.tid AS pid, p.name, p.date, p.id, p.text,
           p.h_title, p.h_name, p.parent
      FROM t, _books.par p
     WHERE p.name = t.name AND p.date = t.date AND t.id = p.parent
)
SELECT t.tid, t.name, t.date, t.id, t.text, t.h_title, t.h_name, t.parent
  FROM t
 WHERE name = 'cfx' AND date = '2009-08-19' AND tid = '28340';

... which takes 0.221 ms. I am being forced to use the slow query because I want to define it as a view, leaving the WHERE clause to the application. I think this is just a limitation of the optimizer. Recursive queries are a relatively new feature and the optimizer doesn't know a whole lot about how to deal with them. That may get improved at some point, but the optimization you're hoping for here is pretty tricky. In order to push the WHERE clauses down into the non-recursive term, the optimizer would need to prove that this doesn't change the final results.
I think that's possible here because it so happens that your recursive term only generates results that have the same name, date, and tid as some existing result, but with a slightly different recursive query that wouldn't be true, so you'd need to make the code pretty smart to work this one out. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
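One workaround while the optimizer lacks this smarts: instead of a view, wrap the recursive query in a SQL function so the caller's filter lands in the non-recursive term. A hedged sketch, using the table and columns from the thread (the function name is invented, and `$1`/`$2`/`$3` positional parameters are used for portability to 8.4-era servers):

```sql
CREATE FUNCTION par_tree(text, date, integer)
RETURNS SETOF _books.par AS $$
  WITH RECURSIVE t AS (
      -- The filter is applied in the anchor, as in the fast variant above.
      SELECT par.*
        FROM _books.par
       WHERE par.name = $1 AND par.date = $2 AND par.id = $3
    UNION
      SELECT p.*
        FROM t, _books.par p
       WHERE p.name = t.name AND p.date = t.date AND p.parent = t.id
  )
  SELECT * FROM t;
$$ LANGUAGE sql STABLE;

-- The application then calls, e.g.:
-- SELECT * FROM par_tree('cfx', '2009-08-19', 28340);
```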
Re: [PERFORM] Question about partitioned query behavior
On Tue, Jul 6, 2010 at 12:30 PM, Ranga Gopalan ranga_gopa...@hotmail.com wrote: It seems that this is an issue faced by others as well - Please see this link: http://stackoverflow.com/questions/2236776/efficient-querying-of-multi-partition-postgres-table Is this a known bug? Is this something that someone is working on, or is there a known workaround? I think that we know this problem exists, but I'm not aware that anyone is working on fixing it. There is a Merge Append patch floating around out there that I think might help with this, but AFAICS it was last updated on July 5, 2009, and still needed some more work at that time. Since this is an all-volunteer effort, complicated problems like this don't always get fixed as fast as we'd like; most of us have to spend most of our time on whatever it is that our employer pays us to do. Of course, if you're in a position to sponsor a developer, there are a number of companies that will be happy to work with you. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [PERFORM] big data - slow select (speech search)
On Thu, Jul 1, 2010 at 6:34 PM, Michal Fapso michal.fa...@gmail.com wrote: It took about 4.5 seconds. If I rerun it, it takes less than 2 milliseconds, but it is because of the cache. I need to optimize the first run. laptop ASUS, CPU dual core T2300 1.66GHz, 1.5G RAM

EXPLAIN ANALYZE SELECT h1.docid FROM hyps AS h1 WHERE h1.wordid=65658;

Bitmap Heap Scan on hyps h1 (cost=10.97..677.09 rows=171 width=4) (actual time=62.106..4416.864 rows=343 loops=1)
  Recheck Cond: (wordid = 65658)
  ->  Bitmap Index Scan on hyps_wordid_index (cost=0.00..10.92 rows=171 width=0) (actual time=42.969..42.969 rows=343 loops=1)
        Index Cond: (wordid = 65658)
Total runtime: 4432.015 ms

If I run the same query in the Lucene search engine, it takes 0.105 seconds on the same data, which is quite a huge difference. So PostgreSQL is reading 343 rows from disk in 4432 ms, or about 12 ms/row. I'm not an expert on seek times, but that might not really be that unreasonable, considering that those rows may be scattered all over the index and thus it may be basically random I/O. Have you tried clustering hyps on hyps_wordid_index? If you had a more sophisticated disk subsystem you could try increasing effective_io_concurrency, but that's not going to help with only one spindle. If you run the same query in Lucene and it takes only 0.105 s, then Lucene is obviously doing a lot less I/O. I doubt that any amount of tuning of your existing schema is going to produce that kind of result on PostgreSQL. Using the full-text search stuff, or a gin index of some kind, might get you closer, but it's hard to beat a special-purpose engine that implements exactly the right algorithm for your use case. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
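The clustering suggested above is a single command (and, as the first message in this digest reports, it cut the query from ~4.4 s to ~0.19 s in this case). Note that CLUSTER rewrites the table and takes an exclusive lock, and the ordering degrades as new rows arrive, so it may need to be re-run periodically:

```sql
-- Physically reorder the heap so rows with the same wordid are contiguous,
-- turning the scattered random reads into mostly sequential I/O:
CLUSTER hyps USING hyps_wordid_index;
ANALYZE hyps;
```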
Re: [PERFORM] Performance issues with postgresql-8.4.0: Query gets stuck sometimes
On Fri, Jul 2, 2010 at 1:40 AM, Sachin Kumar sachin...@globallogic.com wrote: At times we have observed that postgres stops responding for several minutes, even couldn’t fetch the number of entries in a particular table. One such instance happens when we execute the following steps: Sounds sort of like a checkpoint spike. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [PERFORM] Low performance SUM and GROUP BY large database
On Tue, Jun 29, 2010 at 7:59 AM, Sergio Charpinel Jr. sergiocharpi...@gmail.com wrote: One more question about two specific queries' behavior: If I add AND (ip_dst = x.x.x.x), it uses another plan and takes much more time. In both of them, I'm using a WHERE clause. Why this behavior? With either query, the planner is choosing to scan backward through the acct_2010_26_pk index to get the rows in descending order by the bytes column. It keeps scanning until it finds 50 rows that match the WHERE clause. With just the criteria on stamp_inserted, matches are pretty common, so it doesn't have to scan very far before finding 50 suitable rows. But when you add the ip_dst = 'x.x.x.x' criterion, suddenly a much smaller percentage of the rows match and so it has to read much further into the index before it finds 50 that do. A second index on just the ip_dst column might help a lot - then it could consider index-scanning for the matching rows and sorting them afterwards. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
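Concretely, the second index suggested above would look something like this (the table name `acct_2010_26` is assumed from the `acct_2010_26_pk` index mentioned in the thread, and the index name is invented):

```sql
-- Lets the planner fetch only the rows matching ip_dst and sort the
-- survivors, instead of walking the whole primary-key index backward
-- hoping to stumble on 50 matches:
CREATE INDEX acct_2010_26_ip_dst_idx ON acct_2010_26 (ip_dst);
```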
Re: [PERFORM] No hash join across partitioned tables?
On Fri, Jul 2, 2010 at 4:58 PM, Tom Lane t...@sss.pgh.pa.us wrote: Bruce Momjian br...@momjian.us writes: I am surprised there is no documentation update requirement for this. Somebody put something about it in the docs a few days ago, IIRC. That was me. http://archives.postgresql.org/pgsql-committers/2010-06/msg00144.php -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [PERFORM] PostgreSQL as a local in-memory cache
On Mon, Jun 28, 2010 at 5:57 PM, Bruce Momjian br...@momjian.us wrote: The patch also documents that synchronous_commit = false has potential committed transaction loss from a database crash (as well as an OS crash). Is this actually true? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [PERFORM] PostgreSQL as a local in-memory cache
On Tue, Jun 29, 2010 at 9:32 AM, Bruce Momjian br...@momjian.us wrote: Robert Haas wrote: On Mon, Jun 28, 2010 at 5:57 PM, Bruce Momjian br...@momjian.us wrote: The patch also documents that synchronous_commit = false has potential committed transaction loss from a database crash (as well as an OS crash). Is this actually true? I asked on IRC and was told it is true, and looking at the C code it looks true. What synchronous_commit = false does is delay writing the WAL buffers to disk and fsyncing them, not just the fsync, which is where the commit loss due to a database process crash comes from. Ah, I see. Thanks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
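A practical consequence of the discussion above: synchronous_commit can be disabled per transaction, so only the work that can tolerate losing a very recent commit after a crash takes the risk, while everything else stays fully durable:

```sql
BEGIN;
-- SET LOCAL scopes the setting to this transaction only; a crash may
-- lose commits made within roughly the last few wal_writer_delay cycles.
SET LOCAL synchronous_commit = off;
INSERT INTO page_hits (url, hit_at) VALUES ('/index', now());  -- hypothetical table
COMMIT;
```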
Re: [PERFORM] WAL+Os on a single disk
On Thu, Jun 24, 2010 at 10:55 AM, Anj Adu fotogra...@gmail.com wrote: What would you recommend to do a quick test for this? (i.e WAL on internal disk vs WAL on the 12 disk raid array)? Maybe just pgbench? http://archives.postgresql.org/pgsql-performance/2010-06/msg00223.php -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [PERFORM] PostgreSQL as a local in-memory cache
On Thu, Jun 24, 2010 at 4:40 AM, Rob Wultsch wult...@gmail.com wrote: On Fri, Jun 18, 2010 at 1:55 PM, Josh Berkus j...@agliodbs.com wrote: It must be a setting, not a version. For instance suppose you have a session table for your website and a users table. - Having ACID on the users table is of course a must ; - for the sessions table you can drop the D You're trying to solve a different use-case than the one I am. Your use-case will be solved by global temporary tables. I suggest that you give Robert Haas some feedback on that. My use case is people using PostgreSQL as a cache, or relying entirely on replication for durability. Is he? Wouldn't a global temporary table have content that is not visible between db connections? A db session may not be the same as a user session. I'm planning to implement global temporary tables, which can have different contents for each user session. And I'm also planning to implement unlogged tables, which have the same contents for all sessions but are not WAL-logged (and are truncated on startup). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [PERFORM] requested shared memory size overflows size_t
On Thu, Jun 24, 2010 at 7:19 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Craig James's message of jue jun 24 19:03:00 -0400 2010:

select relname, pg_relation_size(relname) from pg_class
 where pg_get_userbyid(relowner) = 'emol_warehouse_1'
   and relname not like 'pg_%'
 order by pg_relation_size(relname) desc;
ERROR:  relation "rownum_temp" does not exist

emol_warehouse_1=> select relname from pg_class where relname = 'rownum_temp';
   relname
-------------
 rownum_temp
(1 row)

What's the full row? I'd just add a WHERE relkind = 'r' to the above query anyway. Yeah - also, it would probably be good to call pg_relation_size on pg_class.oid rather than pg_class.relname, to avoid any chance of confusion over which objects are in which schema. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
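Spelling out both suggestions above, the query would become something like this (a sketch; passing the oid means the size lookup can never resolve the name in the wrong schema, and relkind = 'r' restricts it to plain tables):

```sql
SELECT c.relname, pg_relation_size(c.oid)
  FROM pg_class c
 WHERE pg_get_userbyid(c.relowner) = 'emol_warehouse_1'
   AND c.relkind = 'r'
 ORDER BY pg_relation_size(c.oid) DESC;
```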
Re: [PERFORM] Aggressive autovacuuming ?
On Sun, Jun 20, 2010 at 4:13 PM, Scott Marlowe scott.marl...@gmail.com wrote: The largest consequence I can see at the moment is that when I get a full vacuum (for preventing transaction-id wraparound) it would be I assume you mean the automatic database wide vacuum. I don't think 8.4 and above need that anymore. I think 8.3 does that too, but I'm not 100% sure. 8.4 (and 9.0) do still need to do vacuums to freeze tuples before transaction ID wraparound occurs. This is not to be confused with VACUUM FULL, which is something else altogether. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [PERFORM] Aggressive autovacuuming ?
On Wed, Jun 23, 2010 at 2:20 PM, Scott Marlowe scott.marl...@gmail.com wrote: On Wed, Jun 23, 2010 at 1:58 PM, Robert Haas robertmh...@gmail.com wrote: On Sun, Jun 20, 2010 at 4:13 PM, Scott Marlowe scott.marl...@gmail.com wrote: The largest consequence I can see at the moment is that when I get a full vacuum (for preventing transaction-id wraparound) it would be I assume you mean the automatic database wide vacuum. I don't think 8.4 and above need that anymore. I think 8.3 does that too, but I'm not 100% sure. 8.4 (and 9.0) do still need to do vacuums to freeze tuples before transaction ID wraparound occurs. This is not to be confused with VACUUM FULL, which is something else altogether. My point was that modern pgsql doesn't need db wide vacuum to prevent wraparound anymore, but can vacuum individual relations to prevent wraparound. Oh, I see. I didn't realize we used to do that. Looks like that change was committed 11/5/2006. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [PERFORM] PostgreSQL as a local in-memory cache
On Wed, Jun 23, 2010 at 3:37 PM, Bruce Momjian br...@momjian.us wrote: Tom Lane wrote: Dimitri Fontaine dfonta...@hi-media.com writes: Josh Berkus j...@agliodbs.com writes: a) Eliminate WAL logging entirely If we eliminate WAL logging, that means a reinstall is required for even a postmaster crash, which is a new non-durable behavior. Also, we just added wal_level = minimal, which might end up being a poor name choice if we want wal_level = off in PG 9.1. Perhaps we should have used wal_level = crash_safe in 9.0. I have added the following TODO: Consider a non-crash-safe wal_level that eliminates WAL activity * http://archives.postgresql.org/pgsql-performance/2010-06/msg00300.php I don't think we need a system-wide setting for that. I believe that the unlogged tables I'm working on will handle that case. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [PERFORM] PostgreSQL as a local in-memory cache
On Thu, Jun 17, 2010 at 1:29 PM, Josh Berkus j...@agliodbs.com wrote: a) Eliminate WAL logging entirely In addition to global temporary tables, I am also planning to implement unlogged tables, which are, precisely, tables for which no WAL is written. On restart, any such tables will be truncated. That should give you the ability to do this (by making all your tables unlogged). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
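For readers of the archive: the unlogged tables described above eventually shipped in PostgreSQL 9.1, with this syntax (the table itself is a made-up example):

```sql
-- No WAL is written for this table, so writes are fast; the price is
-- that its contents are truncated after a crash and it does not
-- participate in streaming replication.
CREATE UNLOGGED TABLE session_cache (
    session_id text PRIMARY KEY,
    payload    text
);
```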