Re: [PERFORM] Sort and index
On Sun, 24 Apr 2005 17:01:46 -0500, "Jim C. Nasby" <[EMAIL PROTECTED]> wrote: >> >> Feel free to propose better cost equations. I did. More than once. >estimated index scan cost for (project_id, id, date) is >0.00..100117429.34 while the estimate for work_units is >0.00..103168408.62; almost no difference, ~3% > even though project_id correlation is .657 This is divided by the number of index columns, so the index correlation is estimated to be 0.219. > while work_units correlation is .116. So csquared is 0.048 and 0.013, respectively, and you get a result not far away from the upper bound in both cases. The cost estimations differ by only 3.5% of (max_IO_cost - min_IO_cost). >you'll see that the cost of the index scan is way overestimated. Looking >at the code, the runcost is calculated as > >run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost); > >where csquared is indexCorrelation^2. Why is indexCorrelation squared? >The comments say a linear interpolation between min_IO and max_IO is >used, but ISTM that if it was linear then instead of csquared, >indexCorrelation would just be used. In my tests I got much more plausible results with 1 - (1 - abs(correlation))^2 Jim, are you willing to experiment with one or two small patches of mine? What version of Postgres are you running? Servus Manfred ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
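To make the interpolation concrete, here is a small sketch (not from the original mails; the I/O cost bounds are invented placeholders, only the correlations are the ones quoted above) comparing the current csquared formula with the proposed alternative:

/* cost_interp.c - sketch: csquared interpolation vs. 1 - (1 - |corr|)^2.
 * min_io/max_io are made-up bounds; the correlations are the 0.657/3 and
 * 0.116 estimates discussed in this thread. */
#include <stdio.h>
#include <math.h>

static double interp_csquared(double min_io, double max_io, double corr)
{
    double csquared = corr * corr;
    return max_io + csquared * (min_io - max_io);   /* current formula */
}

static double interp_proposed(double min_io, double max_io, double corr)
{
    double c = 1.0 - pow(1.0 - fabs(corr), 2.0);    /* proposed alternative */
    return max_io + c * (min_io - max_io);
}

int main(void)
{
    double min_io = 1000.0, max_io = 100000000.0;   /* placeholder bounds */
    double corrs[] = { 0.657 / 3.0, 0.116 };        /* per-index estimates */
    for (int i = 0; i < 2; i++)
        printf("corr=%.3f  csquared=%.0f  proposed=%.0f\n",
               corrs[i],
               interp_csquared(min_io, max_io, corrs[i]),
               interp_proposed(min_io, max_io, corrs[i]));
    return 0;
}

With both correlations so close to zero, the csquared results sit almost at max_IO_cost, while the proposed formula pulls them noticeably towards min_IO_cost.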
Re: [PERFORM] Sort and index
On Wed, 11 May 2005 16:15:16 -0500, "Jim C. Nasby" <[EMAIL PROTECTED]> wrote:
>> This is divided by the number of index columns, so the index correlation
>> is estimated to be 0.219.
>
>That seems like a pretty bad assumption to make.

Any assumption we make without looking at entire index tuples has to be bad. A new GUC variable secondary_correlation introduced by my patch at least gives you a chance to manually control the effects of additional index columns.

>> In my tests I got much more plausible results with
>>
>> 1 - (1 - abs(correlation))^2
>
>What's the theory behind that?

The same as for csquared -- pure intuition. But the numbers presented in
http://archives.postgresql.org/pgsql-hackers/2002-10/msg00072.php
seem to imply that in this case my intuition is better ;-)

Actually the above formula was not proposed in that mail. AFAIR it gives results between p2 and p3.

>And I'd still like to know why correlation squared is used.

On Wed, 02 Oct 2002 18:48:49 -0400, Tom Lane <[EMAIL PROTECTED]> wrote:
|The indexCorrelation^2 algorithm was only a quick hack with no theory
|behind it :-(.

>It depends on the patches, since this is a production machine. Currently
>it's running 7.4.*mumble*,

The patch referenced in
http://archives.postgresql.org/pgsql-hackers/2003-08/msg00931.php
is still available. It doesn't touch too many places and should be easy to review. I have been using it and its predecessors in production for more than two years. Let me know if the 74b1 version does not apply cleanly to your source tree.

Servus
Manfred

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] slow queries, possibly disk io
>On 5/31/05, Martin Fandel <[EMAIL PROTECTED]> wrote:
>> In the documentation of
>> http://www.powerpostgresql.com/Downloads/annotated_conf_80.html
>> is the shared_buffers set to 1/3 of the available RAM.

Well, it says "you should never use more than 1/3 of your available RAM", which is not quite the same as "it is set to 1/3." I'd even say: never set it higher than 1/10 of your available RAM, unless you know what you're doing and why you're doing it.

Servus
Manfred

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Odd Locking Problem
On Thu, 11 Aug 2005 16:11:58 -0500, John A Meinel <[EMAIL PROTECTED]> wrote: >the insert is occurring into table 'a' not table 'b'. >'a' refers to other tables, but these should not be modified. So your "a" is Alvaro's "b", and one of your referenced tables is Alvaro's "a". This is further supported by the fact that the problem doesn't occur with 8.1. Servus Manfred ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Insert performance (OT?)
On Tue, 19 Jul 2005 11:51:51 +0100, Richard Huxton wrote: >You could get away with one query if you converted them to left-joins: >INSERT INTO ... >SELECT * FROM upload LEFT JOIN ... WHERE f3 IS NULL >UNION >SELECT * FROM upload LEFT JOIN ... WHERE f4 IS NULL For the archives: This won't work. Each of the two SELECTs eliminates rows violating one of the two constraints but includes rows violating the other constraint. After the UNION you are back to violating both constraints :-( Servus Manfred ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] LEFT JOIN optimization
On Mon, 12 Sep 2005 00:47:57 +0300, Ksenia Marasanova <[EMAIL PROTECTED]> wrote: > -> Seq Scan on user_ (cost=0.00..7430.63 rows=12763 width=245) >(actual time=360.431..1120.012 rows=12763 loops=1) If 12000 rows of the given size are stored in more than 7000 pages, then there is a lot of free space in these pages. Try VACUUM FULL ... Servus Manfred ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Queries taking ages in PG 8.1, have been much faster in PG<=8.0
On Mon, 05 Dec 2005 10:11:41 -0500, Tom Lane <[EMAIL PROTECTED]> wrote:
>> Correlation -0.0736492
>> Correlation -0.237136

>That has considerable impact on the
>estimated cost of an indexscan

The cost estimator uses correlation squared. So all correlations between -0.3 and +0.3 can be considered equal under the assumption that estimation errors of up to 10% are acceptable.

Servus
Manfred

---(end of broadcast)---
TIP 4: Have you searched our list archives? http://archives.postgresql.org
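A quick numeric check of that claim (a sketch with invented cost bounds; only the two correlations come from the message above):

/* csquared_effect.c - sketch: how little |correlation| <= 0.3 moves the
 * interpolated index scan cost.  min_io/max_io are arbitrary. */
#include <stdio.h>

int main(void)
{
    double min_io = 1.0, max_io = 100.0;            /* illustrative bounds */
    double corrs[] = { -0.0736492, -0.237136, 0.3 };
    for (int i = 0; i < 3; i++) {
        double csq = corrs[i] * corrs[i];
        double cost = max_io + csq * (min_io - max_io);
        printf("correlation %+.6f -> csquared %.4f -> cost %.2f\n",
               corrs[i], csq, cost);
    }
    return 0;
}

Even at 0.3 the estimate moves less than 10% of the way from max_IO_cost towards min_IO_cost.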
Re: [PERFORM] Joining 2 tables with 300 million rows
On Thu, 8 Dec 2005 11:59:24 -0500 , Amit V Shah <[EMAIL PROTECTED]> wrote: > CONSTRAINT pk_runresult_has_catalogtable PRIMARY KEY >(runresult_id_runresult, catalogtable_id_catalogtable, value) >' -> Index Scan using runresult_has_catalogtable_id_runresult >on runresult_has_catalogtable runresult_has_catalogtable_1 >(cost=0.00..76.65 rows=41 width=8) (actual time=0.015..0.017 rows=1 >loops=30)' >'Index Cond: >(runresult_has_catalogtable_1.runresult_id_runresult = >"outer".runresult_id_runresult)' >'Filter: ((catalogtable_id_catalogtable = 54) AND (value >= 1))' If I were the planner, I'd use the primary key index. You seem to have a redundant(?) index on runresult_has_catalogtable(runresult_id_runresult). Dropping it might help, or it might make things much worse. But at this stage this is pure speculation. Give us more information first. Show us the complete definition (including *all* indices) of all tables occurring in your query. What Postgres version is this? And please post EXPLAIN ANALYSE output of a *slow* query. Servus Manfred ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] index v. seqscan for certain values
On Tue, 13 Apr 2004 13:55:49 -0400, Tom Lane <[EMAIL PROTECTED]> wrote:
>Possibly the
>nonuniform clumping of CID has something to do with the poor results.

It shouldn't. The sampling algorithm is designed to give each tuple the same chance of ending up in the sample, and tuples are selected independently. (IOW each one of the {N \choose n} possible samples has the same probability.) There are known problems with nonuniform distribution of dead vs. live and large vs. small tuples, but AFAICS the order of values does not matter.

Servus
Manfred

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] query slows down with more accurate stats
[Just a quick note here; a more thorough discussion of my test results will be posted to -hackers]

On Tue, 13 Apr 2004 15:18:42 -0400, Tom Lane <[EMAIL PROTECTED]> wrote:
>Well, the first problem is why is ANALYZE's estimate of the total row
>count so bad :-( ? I suspect you are running into the situation where
>the initial pages of the table are thinly populated and ANALYZE
>mistakenly assumes the rest are too. Manfred is working on a revised
>sampling method for ANALYZE that should fix this problem

The new method looks very promising with respect to row count estimation: I got estimation errors of +/- 1% where the old method was off by up to 60%. (My test methods might be a bit biased though :-))

My biggest concern at the moment is that the new sampling method violates the contract of returning each possible sample with the same probability: getting several tuples from the same page is more likely than with the old method.

Servus
Manfred

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] query slows down with more accurate stats
On Thu, 15 Apr 2004 20:18:49 -0400, Tom Lane <[EMAIL PROTECTED]> wrote:
>> getting several tuples from the same page is more likely
>> than with the old method.
>
>Hm, are you sure?

Almost sure. Let's look at a corner case: What is the probability of getting a sample with no two tuples from the same page? To simplify the problem assume that each page contains the same number of tuples c. If the number of pages is B and the sample size is n, a perfect sampling method collects a sample where all tuples come from different pages with probability (in OpenOffice.org syntax):

    p = prod from{i = 0} to{n - 1} {{c(B - i)} over {cB - i}}

or in C (keeping the arithmetic in floating point to avoid integer truncation):

    p = 1.0;
    for (i = 0; i < n; ++i)
        p *= (double) c * (B - i) / ((double) c * B - i);

This probability grows with increasing B.

>Also, I'm not at all sure that the old method satisfies that constraint
>completely in the presence of nonuniform numbers of tuples per page,
>so we'd not necessarily be going backwards anyhow ...

Yes, it boils down to a decision whether we want to replace one not quite perfect sampling method with another not quite perfect method. I'm still working on putting together the pros and cons ...

Servus
Manfred

---(end of broadcast)---
TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] query slows down with more accurate stats
On Fri, 16 Apr 2004 10:34:49 -0400, Tom Lane <[EMAIL PROTECTED]> wrote: >> p = prod from{i = 0} to{n - 1} {{c(B - i)} over {cB - i}} > >So? You haven't proven that either sampling method fails to do the >same. On the contrary, I believe that above formula is more or less valid for both methods. The point is in what I said next: | This probability grows with increasing B. For the one-stage sampling method B is the number of pages of the whole table. With two-stage sampling we have to use n instead of B and get a smaller probability (for n < B, of course). So this merely shows that the two sampling methods are not equivalent. >The desired property can also be phrased as "every tuple should be >equally likely to be included in the final sample". Only at first sight. You really expect more from random sampling. Otherwise I'd just put one random tuple and its n - 1 successors (modulo N) into the sample. This satisfies your condition but you wouldn't call it a random sample. Random sampling is more like "every possible sample is equally likely to be collected", and two-stage sampling doesn't satisfy this condition. But if in your opinion the difference is not significant, I'll stop complaining against my own idea. Is there anybody else who cares? >You could argue that a tuple on a heavily populated page is >statistically likely to see a higher T when it's part of the page sample >pool than a tuple on a near-empty page is likely to see, and therefore >there is some bias against selection of the former tuple. But given a >sample over a reasonably large number of pages, the contribution of any >one page to T should be fairly small and so this effect ought to be >small. It is even better: Storing a certain number of tuples on heavily populated pages takes less pages than to store them on sparsely populated pages (due to tuple size or to dead tuples). So heavily populated pages are less likely to be selected in stage one, and this exactly offsets the effect of increasing T. >So I think this method is effectively unbiased at the tuple level. Servus Manfred ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
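To see the difference numerically, a small sketch (the table geometry below is made up) that evaluates the product formula once with B pages, as for one-stage sampling, and once with n in place of B, as for two-stage sampling:

/* sample_distinct_pages.c - sketch: probability that a sample of n tuples
 * hits n distinct pages, one-stage vs. two-stage sampling.
 * c, B and n are invented example values. */
#include <stdio.h>

static double p_all_distinct(double c, long pages, long n)
{
    double p = 1.0;
    for (long i = 0; i < n; i++)
        p *= (c * (pages - i)) / (c * pages - i);
    return p;
}

int main(void)
{
    double c = 100.0;       /* tuples per page (assumed constant) */
    long   B = 100000;      /* pages in the table */
    long   n = 100;         /* sample size */

    printf("one-stage (B pages): %g\n", p_all_distinct(c, B, n));
    printf("two-stage (n pages): %g\n", p_all_distinct(c, n, n));
    return 0;
}

The one-stage figure is much closer to 1, illustrating that the probability grows with B and that the two methods are not equivalent.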
Re: [PERFORM] Setting Shared Buffers , Effective Cache, Sort Mem Parameters
On Thu, 22 Apr 2004 13:51:42 -0400, Pallav Kalva <[EMAIL PROTECTED]> wrote:
>I need some help with setting these parameters (shared buffers,
>effective cache, sort mem) in the pg_conf file.

It really depends on the kind of queries you intend to run, the number of concurrent active connections, the size of the working set (active part of the database), what else is running on the machine, and and and ...

Setting shared_buffers to 10000, effective_cache_size to 400000 (80% of installed RAM), and sort_mem to a few thousand might be a good start.

> Also can anyone explain
>the difference between shared buffers and effective cache , how these
>are allocated in the main memory (the docs are not clear on this).

Shared_buffers directly controls how many pages are allocated as internal cache. Effective_cache_size doesn't allocate anything, it is just a hint to the planner how much cache is available on the system level.

Servus
Manfred

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Setting Shared Buffers , Effective Cache, Sort Mem Parameters
On Fri, 23 Apr 2004 10:20:10 -0400, Pallav Kalva <[EMAIL PROTECTED]> wrote:
> the database sizes is around 2 - 4 gig and
>there are 5 of them. this machine is
> mainly for the databases and nothing is running on them.

Did I understand correctly that you run (or plan to run) five postmasters? Is there a special reason that you cannot put all your tables into one database?

>setting shared buffers to 10000 allocates (81Mb) and effective
>cache to 400000 would be around (3gig)
>does this mean that if all of the 81mb of the shared memory gets
>allocated it will use rest from the effective
>cache of (3g-81mb) ?

Simply said, if Postgres wants to access a block, it first looks whether this block is already in shared buffers, which should be the case if the block is one of the last 10000 blocks accessed. Otherwise the block has to be read in. If the OS has the block in its cache, reading it is just a (fast) memory operation, else it involves a (slow) physical disk read.

The number of database pages residing in the OS cache is totally out of control of Postgres. Effective_cache_size tells the query planner how many database pages can be *expected* to be present in the OS cache.

>increasing the shared buffers space to 2g

Setting shared_buffers to half your available memory is the worst thing you can do. You would end up caching exactly the same set of blocks in the internal buffers and in the OS cache, thus effectively making one of the caches useless. Better keep shared_buffers low and let the OS do its job.

Servus
Manfred

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
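A toy model of why it matters where a block is found (not PostgreSQL code; all timings and hit rates below are invented for illustration):

/* block_read_cost.c - toy model: expected cost of fetching a block from
 * shared buffers, the OS cache, or disk.  All numbers are invented. */
#include <stdio.h>

int main(void)
{
    double t_shared  = 0.001;   /* ms, block already in shared buffers */
    double t_oscache = 0.05;    /* ms, copied from the OS cache */
    double t_disk    = 8.0;     /* ms, physical random read */

    double p_shared = 0.70, p_oscache = 0.25, p_disk = 0.05;
    double expected = p_shared * t_shared + p_oscache * t_oscache + p_disk * t_disk;

    printf("expected per-block fetch time: %.3f ms\n", expected);
    return 0;
}

Even a small fraction of physical reads dominates the expected cost, which is why the planner wants a realistic effective_cache_size rather than a huge shared_buffers.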
Re: [PERFORM] Why will vacuum not end?
On Sat, 24 Apr 2004 10:45:40 -0400, "Shea,Dan [CIS]" <[EMAIL PROTECTED]> wrote:
>[...] 87 GB table with a 39 GB index?
>The vacuum keeps redoing the index, but there is no indication as to why it
>is doing this.

If VACUUM finds a dead tuple, it does not immediately remove index entries pointing to that tuple. It instead collects such tuple ids and later does a bulk delete, i.e. scans the whole index and removes all index items pointing to one of those tuples. The number of tuple ids that can be remembered is controlled by vacuum_mem: it is

    VacuumMem * 1024 / 6

Whenever this number of dead tuples has been found, VACUUM scans the index (which takes ca. 60000 seconds, more than 16 hours), empties the list and continues to scan the heap ...

From the number of dead tuples you can estimate how often your index will be scanned. If dead tuples are evenly distributed, expect there to be 15 index scans with your current vacuum_mem setting of 196608. So your VACUUM will run for 11 days :-(

OTOH this would mean that there are 500 million dead tuples. Do you think this is possible?

Servus
Manfred

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
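The arithmetic above as a small sketch (the dead tuple count and the per-scan time are the estimates discussed in this thread, not measurements):

/* vacuum_index_scans.c - sketch: how many bulk-delete index scans a VACUUM
 * needs and roughly how long they take, given the estimates above. */
#include <stdio.h>
#include <math.h>

int main(void)
{
    long   vacuum_mem = 196608;                  /* in KB */
    double dead_tuples = 500e6;                  /* estimated dead tuples */
    double seconds_per_scan = 60000.0;           /* one full pass over the 39 GB index */

    double batch = vacuum_mem * 1024.0 / 6.0;    /* tuple ids remembered per pass */
    double scans = ceil(dead_tuples / batch);    /* bulk-delete passes */
    double days  = scans * seconds_per_scan / 86400.0;

    printf("batch size : %.1f M tuple ids\n", batch / 1e6);
    printf("index scans: %.0f\n", scans);
    printf("scan time  : about %.1f days\n", days);
    return 0;
}

This reproduces the ca. 33.5 M tuple-id batch, the 15 index scans, and the roughly 11-day runtime quoted above.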
Re: [PERFORM] Why will vacuum not end?
On Sat, 24 Apr 2004 15:48:19 -0400, "Shea,Dan [CIS]" <[EMAIL PROTECTED]> wrote:
>Manfred is indicating the reason it is taking so long is due to the number
>of dead tuples in my index and the vacuum_mem setting.

Not dead tuples in the index, but dead tuples in the table.

>The last delete that I did before starting a vacuum had 219,177,133
>deletions.

Ok, with vacuum_mem = 196608 the bulk delete batch size is ca. 33.5 M tuple ids. 219 M dead tuples will cause 7 index scans. The time for an index scan is more or less constant, 60000 seconds in your case. So yes, a larger vacuum_mem will help, but only if you really have as much *free* memory. Forcing the machine into swapping would make things worse.

BTW, VACUUM frees millions of index pages. Is your FSM large enough?

Servus
Manfred

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Why will vacuum not end?
On Sat, 24 Apr 2004 15:58:08 -0400, "Shea,Dan [CIS]" <[EMAIL PROTECTED]> wrote:
>There were definitely 219,177,133 deletions.
>The deletions are most likely from the beginning, it was based on the
>reception_time of the data.
>I would rather not use re-index, unless it is faster than using vacuum.

I don't know whether it would be faster. But if you decide to reindex, make sure sort_mem is *huge*!

>What do you think would be the best way to get around this?
>Increase vacuum_mem to a higher amount 1.5 to 2 GB or try a re-index (rather
>not re-index so that data can be queried without doing a seqscan).

Just out of curiosity: What kind of machine is this running on? And how long does a seq scan take?

>Once the index is cleaned up, how does vacuum handle the table?

If you are lucky VACUUM frees half the index pages. And if we assume that most of the time spent scanning an index goes into random page accesses, future VACUUMs will take "only" 30000 seconds per index scan.

Servus
Manfred

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
[PERFORM] Number of pages in a random sample (was: query slows down with more accurate stats)
On Mon, 19 Apr 2004 12:00:10 -0400, Tom Lane <[EMAIL PROTECTED]> wrote:
>A possible compromise is to limit the number of pages sampled to
>something a bit larger than n, perhaps 2n or 3n. I don't have a feeling
>for the shape of the different-pages probability function; would this
>make a significant difference, or would it just waste cycles?

I would have replied earlier if I had a good answer. What I have so far contains at least one, probably two flaws. Knowing not much more than the four basic arithmetic operations I was not able to improve my model. So I post what I have:

As usual we assume a constant number c of tuples per page. If we have a table of size B pages and want to collect a sample of n tuples, the number of possible samples is (again in OOo syntax)

    left( binom{cB}{n} right)

If we select an arbitrary page, the number of possible samples that do NOT contain any tuple from this page is

    left( binom {c (B-1)} {n} right)

Let's forget about our actual implementations of sampling methods and pretend we have a perfect random sampling method. So the probability Pnot(c, B, n) that a certain page is not represented in a random sample is

    left( binom {c (B-1)} {n} right) over left( binom{cB}{n} right)

which can be transformed into the more computing-friendly form

    prod from{i=0} to{n-1} {{cB-c - i} over {cB - i}}

Clearly the probability that a certain page *is* represented in a sample is

    Pyes(c, B, n) = 1 - Pnot(c, B, n)

The next step assumes that these probabilities are independent for different pages, which in reality they are not. We simply estimate the number of pages represented in a random sample as

    numPag(c, B, n) = B * Pyes(c, B, n)

Here are some results for n = 3000:

     B \ c |   10 |  100 |  200
    -------+------+------+------
       100 |  --- |  100 |  100
      1000 |  972 |  953 |  951
      2000 | 1606 | 1559 | 1556
      3000 | 1954 | 1902 | 1899
      6000 | 2408 | 2366 | 2363
      9000 | 2588 | 2555 | 2553
     20000 | 2805 | 2788 | 2787
     30000 | 2869 | 2856 | 2856
    100000 | 2960 | 2956 | 2956

This doesn't seem to depend heavily on the number of tuples per page, which sort of justifies the assumption that c is constant.

In the next step I tried to estimate the number of pages that contain exactly 1, 2, ... tuples of the sample. My naive procedure works as follows (I'm not sure whether it is even valid as a rough approximation, constructive criticism is very welcome): For c=100, B=3000, n=3000 we expect 1902 pages to contain at least 1 tuple of the sample. There are 1098 more tuples than pages; these tuples lie somewhere in those 1902 pages from the first step. numPag(99, 1902, 1098) = 836 pages contain at least a second tuple. So the number of pages containing exactly 1 tuple is 1902 - 836 = 1066. Repeating these steps we get 611 pages with 2 tuples, 192 with 3, 30 with 4, and 3 pages with 5 tuples.

Here are some more numbers for c = 100 and n = 3000:

         B | pages with 1, 2, ... tuples
    -------+--------------------------------------------------------
       100 | 1 to 24 tuples: 0, then 1, 2, 4, 10, 18, 26, 24, 11, 4
      1000 | 108, 201, 268, 229, 113, 29, 5
      2000 | 616, 555, 292, 83, 12, 1
      3000 | 1066, 611, 192, 30, 3
      6000 | 1809, 484, 68, 5
      9000 | 2146, 374, 32, 2
     20000 | 2584, 196, 8
     30000 | 2716, 138, 3
    100000 | 2912, 44

A small C program to experimentally confirm or refute these calculations is attached. Its results are fairly compatible with the above numbers, IMHO.
Servus
Manfred

/*
** samsim.c - sampling simulator
*/
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <time.h>

typedef int bool;

#define MAX_RANDOM_VALUE (0x7FFFFFFF)

static void
initrandom()
{
    struct timeval tv;

    gettimeofday(&tv, NULL);
    srandom(tv.tv_sec ^ tv.tv_usec);
}/*initrandom*/

/* Select a random value R uniformly distributed in 0 < R < 1 */
static double
random_fract(void)
{
    long        z;

    /* random() can produce endpoint values, try again if so */
    do
    {
        z = random();
    } while (z <= 0 || z >= MAX_RANDOM_VALUE);
    return (double) z / (double) MAX_RANDOM_VALUE;
}

/*
** data structure for (modified) Algorithm S from Knuth 3.4.2
*/
typedef struct
{
    long        N;          /* number of tuples, known in advance */
    int         n;          /* sample size */
    long        t;          /* current tuple number */
    int         m;          /* tuples selected so far */
} SamplerData;
typedef SamplerData *Sampler;

static void Sampler_Init(Sampler bs, long N, int samplesize);
static bool Sampler_HasMore(Sampler bs);
static long Sampler_Next(Sampler bs);

/*
**
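As a cross-check against the simulator, a direct evaluation of the analytic estimate (this sketch is separate from the attached program; it reproduces the c = 100 column of the first table):

/* numpag.c - evaluate the analytic estimate numPag(c, B, n) from the
 * formulas above, for a few of the table sizes shown in the tables. */
#include <stdio.h>

/* probability that one given page contributes no tuple to the sample */
static double p_not(double c, double B, long n)
{
    double p = 1.0;
    for (long i = 0; i < n; i++)
        p *= (c * B - c - i) / (c * B - i);
    return p;
}

/* expected number of pages represented in the sample */
static double num_pag(double c, double B, long n)
{
    return B * (1.0 - p_not(c, B, n));
}

int main(void)
{
    long   n = 3000;
    double c = 100.0;
    double Bs[] = { 1000, 3000, 30000, 100000 };

    for (int i = 0; i < 4; i++)
        printf("B = %6.0f -> numPag = %.0f\n", Bs[i], num_pag(c, Bs[i], n));
    return 0;
}

The output (953, 1902, 2856, 2956) matches the corresponding entries in the first table.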
Re: [PERFORM] Why will vacuum not end?
On Sun, 25 Apr 2004 09:05:11 -0400, "Shea,Dan [CIS]" <[EMAIL PROTECTED]> wrote:
>It is set at max_fsm_pages = 1500000 .

This might be too low. Your index has ca. 5 M pages, you are going to delete half of its entries, and what you delete is a contiguous range of values. So up to 2.5 M index pages might be freed (minus inner nodes and pages not completely empty). And there will be lots of free heap pages too ...

I wrote:
>If you are lucky VACUUM frees half the index pages. And if we assume
>that most of the time spent scanning an index goes into random page
>accesses, future VACUUMs will take "only" 30000 seconds per index scan.

After a closer look at the code and after having slept over it, I'm not so sure any more that the number of tuple ids to be removed has only minor influence on the time spent for a bulk delete run.

After the current VACUUM has finished, would you be so kind as to run another VACUUM VERBOSE with only a few dead tuples and post the results here?

Servus
Manfred

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] analyzer/planner and clustered rows
On Thu, 29 Apr 2004 19:09:09 -0400, Joseph Shraibman <[EMAIL PROTECTED]> wrote:
>How does the analyzer/planner deal with rows clustered together?

There's a correlation value per column. Just try

    SELECT attname, correlation
      FROM pg_stats
     WHERE tablename = '...';

if you are interested. It indicates how well the hypothetical order of tuples if sorted by that column corresponds to the physical order. +1.0 is perfect correlation, 0.0 is totally chaotic, -1.0 means reverse order. The optimizer is more willing to choose an index scan if correlation for the first index column is near +/-1.

> What if the data in the table happens to be close
>together because it was inserted together originally?

Having equal values close to each other is not enough, the values should be increasing, too. Compare

    5 5 5 4 4 4 7 7 7 2 2 2 6 6 6 3 3 3 8 8 8    low correlation

and

    2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8    correlation = 1.0

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
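A small sketch that computes a plain Pearson correlation between the values above and their physical positions (the planner's statistic is computed somewhat differently, but the effect is the same): low for the first ordering, close to 1 for the second.

/* clustered_corr.c - sketch: correlation between column values and their
 * physical positions for the two example orderings above. */
#include <stdio.h>
#include <math.h>

static double correlation(const double *v, int n)
{
    double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
    for (int i = 0; i < n; i++) {
        double x = i, y = v[i];          /* x = physical position */
        sx += x; sy += y; sxx += x * x; syy += y * y; sxy += x * y;
    }
    double cov = sxy - sx * sy / n;
    double vx  = sxx - sx * sx / n;
    double vy  = syy - sy * sy / n;
    return cov / sqrt(vx * vy);
}

int main(void)
{
    double clumped[] = {5,5,5,4,4,4,7,7,7,2,2,2,6,6,6,3,3,3,8,8,8};
    double sorted[]  = {2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7,7,8,8,8};
    int n = 21;
    printf("clumped but unsorted: %.3f\n", correlation(clumped, n));
    printf("sorted:               %.3f\n", correlation(sorted,  n));
    return 0;
}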
Re: [PERFORM] planner/optimizer question
On Fri, 30 Apr 2004 19:46:24 +0200, Jochem van Dieten <[EMAIL PROTECTED]> wrote: >> While the storage overhead could be reduced to 1 bit (not a joke) > >You mean adding an isLossy bit and only where it is set the head >tuple has to be checked for visibility, if it is not set the head >tuple does not have to be checked? Yes, something like this. Actually I imagined it the other way round: a visible-to-all flag similar to the existing dead-to-all flag (search for LP_DELETE and ItemIdDeleted in nbtree.c). >> we'd >> still have the I/O overhead of locating and updating index tuples for >> every heap tuple deleted/updated. > >Would there be additional I/O for the additional bit in the index >tuple (I am unable to find the layout of index tuple headers in >the docs)? Yes, the visible-to-all flag would be set as a by-product of an index scan, if the heap tuple is found to be visible to all active transactions. This update is non-critical and, I think, not very expensive. Deleting (and hence updating) a tuple is more critical, regarding both consistency and performance. We'd have to locate all index entries pointing to the heap tuple and set their visible-to-all flags to false. ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] planner/optimizer question
On Sat, 01 May 2004 13:18:04 +0200, Jochem van Dieten <[EMAIL PROTECTED]> wrote: >Tom Lane wrote: >> Oh really? I think you need to think harder about the transition >> conditions. Indeed. >> >> Dead-to-all is reasonably safe to treat as a hint bit because *it does >> not ever need to be undone*. Visible-to-all does not have that >> property. > >Yes, really :-) No, not really :-( As Tom has explained in a nearby message his concern is that -- unlike dead-to-all -- visible-to-all starts as false, is set to true at some point in time, and is eventually set to false again. Problems arise if one backend wants to set visible-to-all to true while at the same time another backend wants to set it to false. This could be curable by using a second bit as a deleted flag (might be even the same bit that's now used as dead-to-all, but I'm not sure). An index tuple having both the visible flag (formerly called visible-to-all) and the deleted flag set would cause a heap tuple access to check visibility. But that leaves the question of what to do after the deleting transaction has rolled back. I see no clean way from the visible-and-deleted state to visible-to-all. This obviously needs another round of hard thinking ... ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Shared buffers, Sort memory, Effective Cache Size
On Wed, 21 Apr 2004 10:01:30 -0700, Qing Zhao <[EMAIL PROTECTED]> wrote:
>I have recently configured my PG7.3 on a G5 (8GB RAM) with
>shmmax set to 512MB and shared_buffer=50000, sort_mem=4096
>and effective cache size = 10000. It seems working great so far but
>I am wondering if I should make effective cache size larger myself.

Yes, much larger! And while you are at it make shared_buffers smaller.

Servus
Manfred

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] planner/optimizer question
On Wed, 28 Apr 2004 07:35:41 +0100, "Gary Doades" <[EMAIL PROTECTED]> wrote: >Why is there an entry in the index for a row if the row is not valid? Because whether a row is seen as valid or not lies in the eye of the transaction looking at it. Full visibility information is stored in the heap tuple header. The developers' consensus is that this overhead should not be in every index tuple. Servus Manfred ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Simply join in PostrgeSQL takes too long
On Tue, 27 Apr 2004 18:01:34 -0400, Rod Taylor <[EMAIL PROTECTED]> wrote: >On Tue, 2004-04-27 at 17:27, Vitaly Belman wrote: >> Hello pgsql-performance, >> >> I discussed the whole subject for some time in DevShed and didn't >> achieve much (as for results). I wonder if any of you guys can help >> out: >> >> http://forums.devshed.com/t136202/s.html >The point is that a book cannot be of a certain genre more than once. Rod, he has a hierarchy of genres. Genre 1 has 6379 child genres and a book can be in more than one of these. Vitaly, though LIMIT makes this look like a small query, DISTINCT requires the whole result set to be retrieved. 0.7 seconds doesn't look so bad for several thousand rows. Did you try with other genre_ids? Maybe a merge join is not the best choice. Set enable_mergejoin to false and see whether you get a (hopefully faster) hash join, assuming that sort_mem is large enough to keep the hash table in memory. If you send me your table contents I'll try it on Linux. Servus Manfred ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] planner/optimizer question
On Wed, 28 Apr 2004 09:05:04 -0400, Tom Lane <[EMAIL PROTECTED]> wrote: >> [ ... visibility information in index tuples ... ] >Storing that information would at least double the overhead space used >for each index tuple. The resulting index bloat would significantly >slow index operations by requiring more I/O. So it's far from clear >that this would be a win, even for those who care only about select >speed. While the storage overhead could be reduced to 1 bit (not a joke) we'd still have the I/O overhead of locating and updating index tuples for every heap tuple deleted/updated. Servus Manfred ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Simply join in PostrgeSQL takes too long
On Wed, 28 Apr 2004 08:23:35 -0400, Rod Taylor <[EMAIL PROTECTED]> wrote: >The below plpgsql forces the kind of algorithm we wish the planner could >choose. It should be fairly quick irregardless of dataset. That reminds me of hash aggregation. So here's another idea for Vitaly: SELECT book_id FROM ... WHERE ... GROUP BY book_id LIMIT ... Servus Manfred ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Simply join in PostrgeSQL takes too long
On Thu, 29 Apr 2004 13:36:47 -0400, Rod Taylor <[EMAIL PROTECTED]> wrote: >The reason for the function is that the sort routines (hash aggregation >included) will not stop in mid-sort Good point. Servus Manfred ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Quad processor options
On Tue, 11 May 2004 15:46:25 -0700, Paul Tuckfield <[EMAIL PROTECTED]> wrote:
>- the "cache" column shows that linux is using 2.3G for cache. (way too
>much)

There is no such thing as "way too much cache".

> you generally want to give memory to postgres to keep it "close" to
>the user,

Yes, but only a moderate amount of memory.

> not leave it unused to be claimed by linux cache

Cache is not unused memory.

>- I'll bet you have a low value for shared buffers, like 10000. On
>your 3G system
> you should ramp up the value to at least 1G (125000 8k buffers)

In most cases this is almost the worst thing you can do. The only thing even worse would be setting it to 1.5 G.

Postgres is just happy with a moderate shared_buffers setting. We usually recommend something like 10000. You could try 20000, but don't increase it beyond that without strong evidence that it helps in your particular case.

This has been discussed several times here, on -hackers and on -general. Search the archives for more information.

Servus
Manfred

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] [HACKERS] Number of pages in a random sample
On Mon, 26 Apr 2004 08:08:16 -0700, Sailesh Krishnamurthy <[EMAIL PROTECTED]> wrote: > "A Bi-Level Bernoulli Scheme for Database Sampling" > Peter Haas, Christian Koenig (SIGMOD 2004) Does this apply to our problem? AFAIK with Bernoulli sampling you don't know the sample size in advance. Anyway, thanks for the hint. Unfortunately I couldn't find the document. Do you have a link? Servus Manfred ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] optimization ideas for frequent, large(ish) updates in frequently accessed DB?
On Fri, 13 Feb 2004 16:21:29 +0100, I wrote: >Populate this table with > INSERT INTO idmap > SELECT id, id, true > FROM t; This should be INSERT INTO idmap SELECT DISTINCT id, id, true FROM t; Servus Manfred ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] Use of Functional Indexs and Planner estimates
On Tue, 8 Jun 2004 17:24:36 +1000, Russell Smith <[EMAIL PROTECTED]> wrote: >Also I am interested in how functional indexes have statistics collected for them, if >they do. Not in any released version. http://developer.postgresql.org/cvsweb.cgi/pgsql-server/src/backend/commands/analyze.c | Revision 1.70 / Sun Feb 15 21:01:39 2004 UTC (3 months, 3 weeks ago) by tgl | Changes since 1.69: +323 -16 lines | | First steps towards statistics on expressional (nee functional) indexes. | This commit teaches ANALYZE to store such stats in pg_statistic, but | nothing is done yet about teaching the planner to use 'em. So statistics gathering for expressional indexes will be in 7.5, but I don't know about the state of the planner ... Servus Manfred ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] I could not get postgres to utilizy indexes
On Thu, 19 Aug 2004 09:54:47 +0200, "Leeuw van der, Tim" <[EMAIL PROTECTED]> wrote: >You asked the very same question yesterday, and I believe you got some useful >answers. Why do you post the question again? Tim, no need to be rude here. We see this effect from time to time when a new user sends a message to a mailing list while not subscribed. The sender gets an automated reply from majordomo, subscribes to the list and sends his mail again. One or two days later the original message is approved (by Marc, AFAIK) and forwarded to the list. Look at the timestamps in these header lines: |Received: from postgresql.org (svr1.postgresql.org [200.46.204.71]) | by svr4.postgresql.org (Postfix) with ESMTP id 32B1F5B04F4; | Wed, 18 Aug 2004 15:54:13 + (GMT) |Received: from localhost (unknown [200.46.204.144]) | by svr1.postgresql.org (Postfix) with ESMTP id E6B2B5E4701 | for <[EMAIL PROTECTED]>; Tue, 17 Aug 2004 11:23:07 -0300 (ADT) >[more instructions] And while we are teaching netiquette, could you please stop top-posting and full-quoting. Igor, welcome to the list! Did the suggestions you got solve your problem? Servus Manfred ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Large # of rows in query extremely slow, not using
On Thu, 16 Sep 2004 20:51:11 -0500, Stephen Crowley <[EMAIL PROTECTED]> wrote: >explain analyze select * from history where date='2004-09-07' and >stock='ORCL' LIMIT 10; >" -> Index Scan using island_history_date_stock_time on >island_history (cost=0.00..183099.72 rows=102166 width=83) (actual >time=1612.000..1702.000 rows=10 loops=1)" ^^ LIMIT 10 hides what would be the most interesting info here. I don't believe that EXPLAIN ANALYSE SELECT * FROM history WHERE ... consumes lots of memory. Please try it. And when you post the results please include your Postgres version, some info about hardware and OS, and your non-default settings, especially random_page_cost and effective_cache_size. May I guess that the correlation of the physical order of tuples in your table to the contents of the date column is pretty good (examine correlation in pg_stats) and that island_history_date_stock_time is a 3-column index? It is well known that the optimizer overestimates the cost of index scans in those situations. This can be compensated to a certain degree by increasing effective_cache_size and/or decreasing random_page_cost (which might harm other planner decisions). You could also try CREATE INDEX history_date_stock ON history("date", stock); This will slow down INSERTs and UPDATEs, though. Servus Manfred ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] Large # of rows in query extremely slow, not using
On Fri, 17 Sep 2004 19:23:44 -0500, Stephen Crowley <[EMAIL PROTECTED]> wrote: >Seq Scan [...] rows=265632 > Filter: ((date = '2004-09-07'::date) AND ((stock)::text = 'MSFT'::text)) >Total runtime: 412703.000 ms > >random_page_cost and effective_cache_size are both default, 8 and 1000 Usually random_page_cost is 4.0 by default. And your effective_cache_size setting is far too low for a modern machine. >"Index Scan [...] rows=159618 >" Index Cond: ((date = '2004-09-07'::date) AND ((stock)::text = 'ORCL'::text))" >"Total runtime: 201009.000 ms" Extrapolating this to 265000 rows you should be able to get the MSFT result in ca. 330 seconds, if you can persuade the planner to choose an index scan. Fiddling with random_page_cost and effective_cache_size might do the trick. >So now this in all in proportion and works as expected.. the question >is, why would the fact that it needs to be vaccumed cause such a huge >hit in performance? When i vacuumed it did free up nearly 25% of the >space. So before the VACCUM a seq scan would have taken ca. 550 seconds. Your MSFT query with LIMIT 10 took ca. 350 seconds. It's not implausible to assume that more than half of the table had to be scanned to find the first ten rows matching the filter condition. Servus Manfred ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Possibly slow query
On Wed, 26 Jan 2005 07:16:25 -0800, "Peter Darley" <[EMAIL PROTECTED]> wrote: >SELECT User_ID >FROM Assignments A NATURAL LEFT JOIN (SELECT * FROM Assignment_Settings >WHERE Setting='Status') ASet >WHERE A.User_ID IS NOT NULL > AND ASet.Assignment_ID IS NULL >GROUP BY User_ID; "ASet.Assignment_ID IS NULL" and "value IS NULL" as you had in your original post don't necessarily result in the same set of rows. SELECT DISTINCT a.User_ID FROM Assignments a LEFT JOIN Assignment_Settings s ON (a.Assignment_ID=s.Assignment_ID AND s.Setting='Status') WHERE a.User_ID IS NOT NULL AND s.Value IS NULL; Note how the join condition can contain subexpressions that only depend on columns from one table. BTW, |neo=# \d assignment_settings | [...] | setting | character varying(250) | not null | [...] |Indexes: |[...] |"assignment_settings_assignment_id_setting" unique, btree (assignment_id, setting) storing the setting names in their own table and referencing them by id might speed up some queries (and slow down others). Certainly worth a try ... Servus Manfred ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] index scan on =, but not < ?
On Thu, 10 Mar 2005 10:24:46 +1000, David Brown <[EMAIL PROTECTED]> wrote: >What concerns me is that this all depends on the correlation factor, and >I suspect that the planner is not giving enough weight to this. The planner does the right thing for correlations very close to 1 (and -1) and for correlations near zero. For correlations somewhere between 0 and 1 the cost is estimated by interpolation, but it tends too much towards the bad end, IMHO. Servus Manfred ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] cpu_tuple_cost
On Mon, 14 Mar 2005 21:23:29 -0500, Tom Lane <[EMAIL PROTECTED]> wrote:
> I think that the "reduce random_page_cost" mantra
>is not an indication that that parameter is wrong, but that the
>cost models it feeds into need more work.

One of these areas is the cost interpolation depending on correlation. This has been discussed on -hackers in October 2002 and August 2003 ("Correlation in cost_index()"). My Postgres installations contain the patch presented during that discussion (and another index correlation patch), and I use *higher* values for random_page_cost (up to 10).

Servus
Manfred

---(end of broadcast)---
TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] multi-column index
On Wed, 16 Mar 2005 22:19:13 -0500, Tom Lane <[EMAIL PROTECTED]> wrote: >calculate the correlation explicitly for each index May be it's time to revisit an old proposal that has failed to catch anybody's attention during the 7.4 beta period: http://archives.postgresql.org/pgsql-hackers/2003-08/msg00937.php I'm not sure I'd store index correlation in a separate table today. You've invented something better for functional index statistics, AFAIR. Servus Manfred ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] multi-column index
On Thu, 17 Mar 2005 16:55:15 +0800, Christopher Kings-Lynne <[EMAIL PROTECTED]> wrote: >Make it deal with cross-table fk correlations as well :) That's a different story. I guess it boils down to cross-column statistics for a single table. Part of this is the correlation between values in two or more columns, which is not the same as the correlation between column (or index tuple) values and tuple positions. And yes, I did notice the smiley ;-) Servus Manfred ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] multi-column index
On Thu, 17 Mar 2005 23:48:30 -0800, Ron Mayer <[EMAIL PROTECTED]> wrote: >Would this also help estimates in the case where values in a table >are tightly clustered, though not in strictly ascending or descending >order? No, I was just expanding the existing notion of correlation from single columns to index tuples. >For example, address data has many fields that are related >to each other (postal codes, cities, states/provinces). This looks like a case for cross-column statistics, though you might not have meant it as such. I guess what you're talking about can also be described with a single column. In a list like 3 3 ... 3 1 1 ... 1 7 7 ... 7 4 4 ... 4 ... equal items are "clustered" together but the values are not "correlated" to their positions. This would require a whole new column characteristic, something like the probability that we find the same value in adjacent heap tuples, or the number of different values we can expect on one heap page. The latter might even be easy to compute during ANALYSE. Servus Manfred ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] multi-column index
On Thu, 17 Mar 2005 13:15:32 -0500, Tom Lane <[EMAIL PROTECTED]> wrote: >I am coming around to the view that we really do need to calculate >index-specific correlation numbers, Correlation is a first step. We might also want distribution information like number of distinct index tuples and histograms. >Now, as to the actual mechanics of getting the numbers: the above link >seems to imply reading the whole index in index order. That turned out to be surprisingly easy (no need to look at data values, no operator lookup, etc.) to implement as a proof of concept. As it's good enough for my use cases I never bothered to change it. > Which is a >hugely expensive proposition for a big index, Just a thought: Could the gathering of the sample be integrated into the bulk delete phase of VACUUM? (I know, ANALYSE is not always performed as an option to VACUUM, and VACUUM might not even have to delete any index tuples.) > We need a way >to get the number from a small sample of pages. I had better (or at least different) ideas at that time, like walking down the tree, but somehow lost impetus :-( >The idea I was toying with was to recalculate the index keys for the >sample rows that ANALYZE already acquires, and then compare/sort >those. This seems to be the approach that perfectly fits into what we have now. > This is moderately expensive CPU-wise though, and it's also not >clear what "compare/sort" means for non-btree indexes. Nothing. We'd need some notion of "clusteredness" instead of correlation. C.f. my answer to Ron in this thread. BTW, the more I think about it, the more I come to the conclusion that when the planner starts to account for "clusteredness", random page cost has to be raised. Servus Manfred ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
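A sketch of that sample-based idea (an illustration, not an actual patch): recompute the index key for each sampled row, sort by key, and correlate the key rank with the heap position. The sample data below is invented.

/* sample_correlation.c - sketch: estimate index correlation from an
 * ANALYZE-style sample by correlating heap order with key rank. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

typedef struct { long heap_pos; double key; } SampleRow;

static int by_key(const void *a, const void *b)
{
    double d = ((const SampleRow *)a)->key - ((const SampleRow *)b)->key;
    return (d > 0) - (d < 0);
}

int main(void)
{
    /* invented sample: heap position and (re)computed index key */
    SampleRow s[] = { {10, 3.0}, {20, 1.0}, {30, 4.0}, {40, 2.0}, {50, 5.0} };
    int n = 5;

    qsort(s, n, sizeof(SampleRow), by_key);

    /* Pearson correlation between key rank (0..n-1) and heap position */
    double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
    for (int i = 0; i < n; i++) {
        double x = i, y = (double) s[i].heap_pos;
        sx += x; sy += y; sxx += x * x; syy += y * y; sxy += x * y;
    }
    double r = (sxy - sx * sy / n) /
               sqrt((sxx - sx * sx / n) * (syy - sy * sy / n));
    printf("estimated correlation: %.3f\n", r);
    return 0;
}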
Re: [PERFORM] Performance advice
[ This has been written offline yesterday. Now I see that most of it has already been covered. I send it anyway ... ] On Tue, 24 Jun 2003 09:39:32 +0200, "Michael Mattox" <[EMAIL PROTECTED]> wrote: >Websites are monitored every 5 or 10 minutes (depends on client), >there are 900 monitors which comes out to 7,800 monitorings per hour. So your server load - at least INSERT, UPDATE, DELETE - is absolutely predictable. This is good. It enables you to design a cron-driven VACUUM strategy. |INFO: --Relation public.jdo_sequencex-- |INFO: Pages 28: Changed 1, Empty 0; Tup 1: Vac 5124, Keep 0, UnUsed 0. ^ This table could stand more frequent VACUUMs, every 15 minutes or so. BTW, from the name of this table and from the fact that there is only one live tuple I guess that you are using it to keep track of a sequence number. By using a real sequence you could get what you need with less contention; and you don't have to VACUUM a sequence. |INFO: --Relation public.monitorx-- |INFO: Removed 170055 tuples in 6036 pages. |CPU 0.52s/0.81u sec elapsed 206.26 sec. |INFO: Pages 6076: Changed 0, Empty 0; Tup 2057: Vac 170055, Keep 568, UnUsed 356. |Total CPU 6.28s/13.23u sec elapsed 486.07 sec. The Vac : Tup ratio for this table is more than 80. You have to VACUUM this table more often. How long is "overnight"? Divide this by 80 and use the result as the interval between VACUUM [VERBOSE] [ANALYSE] public.monitorx; Thus you'd have approximately as many dead tuples as live tuples and the table size should not grow far beyond 150 pages (after an initial VACUUM FULL, of course). Then VACUUM of this table should take no more than 20 seconds. Caveat: Frequent ANALYSEs might trigger the need to VACUUM pg_catalog.pg_statistic. > The >monitor table has columns "nextdate" and "status" which are updated with >every monitoring, [...] > updating the "nextdate" before the monitoring and inserting the >status and status item records after. Do you mean updating monitor.nextdate before the monitoring and monitor.status after the monitoring? Can you combine these two UPDATEs into one? > During the vacuum my application does slow down quite a bit Yes, because VACUUM does lots of I/O. > and afterwards is slow speeds back up. ... because the working set is slowly fetched into the cache after having been flushed out by VACUUM. Your five largest relations are monitorstatus_statusitemsx, monitorstatusitemlistd8ea58a5x, monitorstatusitemlistx, monitorstatusitemx, and monitorstatusx. The heap relations alone (without indexes) account for 468701 pages, almost 4GB. VACUUMing these five relations takes 23 minutes for freeing less than 200 out of 6 million tuples for each relation. This isn't worth it. Unless always the same tuples are updated over and over, scheduling a VACUUM for half a million deletions/updates should be sufficient. >shared_buffers = 3072 # min max_connections*2 or 16, 8KB each >sort_mem = 8192 # min 64, size in KB >vacuum_mem = 24576 # min 1024, size in KB > >The rest are left uncommented (using the defaults). As has already been said, don't forget effective_cache_size. I'm not so sure about random_page_cost. Try to find out which queries are too slow. EXPLAIN ANALYSE is your friend. One more thing: I see 2 or 3 UPDATEs and 5 INSERTs per monitoring. Are these changes wrapped into a single transaction? Servus Manfred ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
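The scheduling arithmetic above as a small sketch (the tuple counts come from the VACUUM VERBOSE output quoted above; the length of "overnight" is an assumption):

/* vacuum_interval.c - sketch: how often to VACUUM public.monitorx so dead
 * tuples stay roughly at the number of live tuples. */
#include <stdio.h>

int main(void)
{
    double live_tuples = 2057;          /* Tup from the VACUUM output */
    double dead_tuples = 170055;        /* Vac from the VACUUM output */
    double overnight_hours = 12.0;      /* assumed length of "overnight" */

    double ratio = dead_tuples / live_tuples;        /* > 80 here */
    double interval_minutes = overnight_hours * 60.0 / ratio;

    printf("Vac : Tup ratio : %.1f\n", ratio);
    printf("VACUUM interval : every %.0f minutes\n", interval_minutes);
    return 0;
}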
Re: [PERFORM] Performance advice
On Wed, 25 Jun 2003 11:47:48 +0200, "Michael Mattox" <[EMAIL PROTECTED]> wrote: >> |INFO: --Relation public.jdo_sequencex-- >> |INFO: Pages 28: Changed 1, Empty 0; Tup 1: Vac 5124, Keep 0, UnUsed 0. >> ^ >> This table could stand more frequent VACUUMs, every 15 minutes or so. > >Can you explain what the "Vac" is That's a long story, where shall I start? Search for MVCC in the docs and in the list archives. So you know that every DELETE and every UPDATE leaves behind old versions of tuples. The space occupied by these cannot be used immediately. VACUUM is responsible for finding dead tuples, which are so old that there is no active transaction that could be interested in their contents, and reclaiming the space. The number of such tuples is reported as "Vac". > and how you knew that it should be vacuumed more often? jdo_sequencex stores (5000 old versions and 1 active version of) a single row in 28 pages. Depending on when you did ANALYSE it and depending on the SQL statement, the planner might think that a sequential scan is the most efficient way to access this single row. A seq scan has to read 28 pages instead of a single page. Well, probably all 28 pages are in the OS cache or even in PG's shared buffers, but 27 pages are just wasted and push out pages you could make better use of. And processing those 28 pages does not come at no CPU cost. If you VACUUM frequently enough, this relation never grows beyond one page. >I'm using Java Data Objects (JDO) which is an O/R mapper. It generated the >schema from my object model by default it used a table for a sequence. I >just got finished configuring it to use a real postgres sequence. With the >way they have it designed, it opens and closes a connection each time it >retrieves a sequence. Would I get a performance increase if I modify their >code to retrieve multiple sequence numbers in one connection? For example I >could have it grab 50 at a time, which would replace 50 connections with 1. Better yet you modify the code to use the normal access functions for sequences. Servus Manfred ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] problem with pg_statistics
On Thu, 26 Jun 2003 10:08:05 -0400, Tom Lane <[EMAIL PROTECTED]> wrote: >Andre Schubert <[EMAIL PROTECTED]> writes: >> i think i need a little help with a problem with pg_statistic. > >Try reducing random_page_cost With index scan cost being more than 25 * seq scan cost, I guess that - all other things held equal - even random_page_cost = 1 wouldn't help. Andre might also want to experiment with effective_cache_size and with ALTER TABLE ... SET STATISTICS. Or there's something wrong with correlation? Andre, what hardware is this running on? What are the values of shared_buffers, random_page_cost, effective_cache_size, ... ? Could you show us the result of SELECT * FROM pg_stats WHERE tablename = "tbl_traffic" AND attname = "time_stamp"; Servus Manfred ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] problem with pg_statistics
On Fri, 27 Jun 2003 08:07:35 +0200, Andre Schubert <[EMAIL PROTECTED]> wrote:
>Traffic data are inserted every 5 minutes with the actual datetime
>of the transaction, that's why the table should be physically ordered by time_stamp.

So I'd expect a correlation of nearly 1. Why do your statistics show a value of -0.479749? A negative correlation is a sign of descending sort order, and correlation values closer to 0 indicate poor correspondence between column values and tuple positions. Could this be the effect of initial data loading? Are there any updates or deletions in your traffic table?

>To answer Manfreds questions:
>> Andre, what hardware is this running on? What are the values of
>> shared_buffers, random_page_cost, effective_cache_size, ... ? Could
>> you show us the result of
>>
>> SELECT * FROM pg_stats
>> WHERE tablename = "tbl_traffic" AND attname = "time_stamp";
                     ^           ^               ^          ^
Oops, these should have been single quotes. It's too hot here these days :-)

>sort_mem = 32000
>shared_buffers = 13000

Personally I would set them to lower values, but if you have good reasons ...

>#effective_cache_size = 1000  # default in 8k pages

This is definitely too low. With 512MB or more I tend to set this to ca. 80% of available RAM. Use top and free to find hints for good values.

Servus
Manfred

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] problem with pg_statistics
On Fri, 27 Jun 2003 11:10:58 +0200, Andre Schubert <[EMAIL PROTECTED]> wrote: >Once a month we delete the all data of the oldest month. >And after that a vacuum full verbose analyze is performed. >Could this cause reordering of the data ? I may be wrong, but I think VACUUM FULL starts taking tuples from the end of the relation and puts them into pages at the beginning until read and write position meet somewhere in the middle. This explains the bad correlation. >And should i do a cluster idx_ts tbl_traffic ? I think so. >> >#effective_cache_size = 1000 # default in 8k pages >> >> This is definitely too low. With 512MB or more I tend to set this to >> ca. 80% of available RAM. Use top and free to find hints for good >> values. >> > >Ok, i will talk with my coworker ( he is the sysadmin of our machine ) >and look if can use such amount of RAM, because there are several other >processes that are running on these machines. >But i will test and report ... effective_cache_size does not *control* resource consumption, it just *reports* it as a hint to the planner. Servus Manfred ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Effective Cache Size
On Tue, 1 Jul 2003 15:50:14 +0200 , Howard Oblowitz <[EMAIL PROTECTED]> wrote: >What then will be the effect of setting this too high? The planner might choose an index scan where a sequential scan would be faster. >And too low? The planner might choose a sequential scan where an index scan would be faster. >How does it impact on other applications eg Java ? It doesn't -- at least not directly. (There could be very subtle effects when Postgres does a sequential scan over a large relation thus pushing everything else out of the cache, where an index scan would have read only a small number of pages. Or when a large index scan turns your machine from CPU bound to I/O bound.) Servus Manfred ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] Version 7 question
On Tue, 1 Jul 2003 15:02:21 +0200, "Michael Mattox" <[EMAIL PROTECTED]> wrote: >I have 1.5 gigs of RAM on my >server but I'm also running a few other java programs that take up probably >500 megs total of memory, leaving me 1gig for Postgres. Should I set my >shared buffers to be 25% of 1gig? That would be 32768. Then what should my >effective cache be? Right now I have it set to 64000 which would be >512megs. Between the buffers and cache that'd be a total of 768megs, >leaving approximately 768 for my other java apps & the OS. Michael, by setting effective_cache_size you do not allocate anything. This configuration variable is just a *hint* to the planner how much RAM is used for caching on your system (as shown by top or free). Servus Manfred ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
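Put differently, Michael's figures would end up in postgresql.conf roughly as follows (the numbers are his, not a recommendation); only the first line allocates anything:

  shared_buffers = 32768          # ~256 MB actually allocated as shared memory
  effective_cache_size = 64000    # ~512 MB; merely a hint describing the OS cache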
Re: [PERFORM] Help on my database performance
On Thu, 31 Jul 2003 11:06:09 -0400, "Jianshuo Niu" <[EMAIL PROTECTED]> wrote: >I ran the same explain analyze on two similar tables. However, the table >with less data took much more time than the one with more data. Could anyone >tell me what happened? >Seq Scan on tfd_catalog (cost=0.00..43769.82 rows=161282 width=10) (actual >time=3928.64..12905.76 rows=161282 loops=1) >Total runtime: 13240.21 msec > >Seq Scan on hm_catalog (cost=0.00..22181.18 rows=277518 width=9) (actual >time=21.32..6420.76 rows=277518 loops=1) >Total runtime: 6772.95 msec The first SELECT takes almost twice the time because tfd_catalog has almost twice as many pages as hm_catalog. This may be due to wider tuples or to more dead tuples in tfd_catalog. In the former case there's not much you can do. But the high startup cost of the first SELECT is a hint that there are lots of dead tuples. So VACUUM FULL ANALYSE might help. Servus Manfred ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
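To find out which of the two explanations applies, one could compare the physical size of both tables and then clean up; this is only a sketch, using the table names from the plans above:

  SELECT relname, relpages, reltuples
    FROM pg_class
   WHERE relname IN ('tfd_catalog', 'hm_catalog');   -- relpages is in 8 kB blocks

  VACUUM FULL ANALYSE tfd_catalog;                   -- then re-check relpages

If relpages drops noticeably, dead tuples were the culprit; if not, the tuples in tfd_catalog are simply wider.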
Re: [PERFORM] Moving postgresql.conf tunables into 2003...
[jumping in late due to vacation] On Thu, 3 Jul 2003 17:06:46 -0700, Sean Chittenden <[EMAIL PROTECTED]> wrote: >> is some other problem that needs to be solved. (I'd wonder about >> index correlation myself; we know that that equation is pretty >> bogus.) > >Could be. I had him create a multi-column index on the date and a >non-unique highly redundant id. Tom has already suspected index correlation to be a possible source of the problem and recommended to CLUSTER on the index. A weakness of the current planner implementation is that a multi column index is always thought to have low correlation. In your case even after CLUSTER the 2-column index on (date, sensorid) is treated like a single column index with correlation 0.5. I have an experimental patch lying around somewhere that tries to work around these problems by offering different estimation methods for index scans. If you are interested, I'll dig it out. In the meantime have him try with a single column index on date. On 04 Jul 2003 08:29:04 -0400, Rod Taylor <[EMAIL PROTECTED]> wrote: |That's one heck of a poor estimate for the number of rows returned. | |> -> Seq Scan on mss_fwevent (cost=0.00..223312.60 rows=168478 width=12) |(actual time=24253.66..24319.87 rows=320 loops=1) > -> Index Scan using mss_fwevent_evtime_sensorid_idx on mss_fwevent > (cost=0.00..2442524.70 rows=168478 width=12) >(actual time=68.36..132.84 rows=320 loops=1) > Index Cond: ((evtime > (now() - '06:00'::interval)) AND (sensorid = 7)) > Filter: (NOT "action") Estimated number of rows being wrong by a factor 500 seems to be the main problem hiding everything else. With statistics already set to 1000, does this mean that sensorid, evtime, and action are not independent? It'd be interesting to know whether the estimation error comes from "Index Cond" or from "Filter". Servus Manfred ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
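One way to see where the misestimation comes from, along the lines suggested above, is to test the conditions separately; the index name below is made up, and building the index on a large table costs time and disk space:

  CREATE INDEX mss_fwevent_evtime_idx ON mss_fwevent (evtime);    -- single-column index on the date
  ANALYSE mss_fwevent;

  EXPLAIN ANALYSE SELECT count(*) FROM mss_fwevent
   WHERE evtime > (now() - '06:00'::interval);                    -- the "Index Cond" part alone

  EXPLAIN ANALYSE SELECT count(*) FROM mss_fwevent
   WHERE evtime > (now() - '06:00'::interval) AND sensorid = 7;   -- plus the "Filter" part

Comparing estimated and actual row counts of the two plans shows which condition the planner gets wrong.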
Re: [PERFORM] Help on my database performance
On Thu, 31 Jul 2003 16:08:11 -0400, "Jianshuo Niu" <[EMAIL PROTECTED]> wrote: >explain analyze select count(*) from tfd_catalog ; >NOTICE: QUERY PLAN: > >Aggregate (cost=15986.02..15986.02 rows=1 width=0) > (actual time=1089.99..1089.99 rows=1 loops=1) > -> Seq Scan on tfd_catalog (cost=0.00..15582.82 rows=161282 width=0) > (actual time=0.11..833.41 rows=161282 loops=1) >Total runtime: 1090.51 msec >Could you tell me what does "Aggregate (cost=15986.02..15986.02 rows=1 >width=0) (actual time=1089.99..1089.99 rows=1 loops=1)" mean? It does not >show in my previous report. In your first post you did SELECT productid FROM tfd_catalog; now you did SELECT count(*) FROM tfd_catalog; count() is an aggregate function which in your case takes 161282 rows as input and produces a single row as output. The "actual" part of the "Aggregate" line tells you that the first resulting row is generated 1089.99 milliseconds after query start and the last row (not surprisingly) at the same time. The "cost" part contains the planner's estimations for these values. Servus Manfred ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] I can't wait too much: Total runtime 432478.44 msec
On Fri, 1 Aug 2003 18:17:17 -0300, "Fernando Papa" <[EMAIL PROTECTED]> wrote: > AND cont_publicacion.fecha_publicacion = (SELECT >max(cp1.fecha_publicacion) > FROM cont_publicacion cp1 > WHERE cp1.id_instalacion = >cont_publicacion.id_instalacion >AND cp1.id_contenido = cont_publicacion.id_contenido > >AND cp1.generar_vainilla = >cont_publicacion.generar_vainilla) If certain uniqueness conditions are met, the Postgres specific DISTINCT ON clause could help totally eliminating the subselect: SELECT DISTINCT ON ( cp.id_instalacion, cp.id_contenido, cp.generar_vainilla, cp.fecha_publicacion ) cc.id_contenido ,cc.pertenece_premium ,cc.Titulo_esp as v_sufix ,cc.url_contenido ,cc.tipo_acceso ,cc.id_sbc ,cc.cant_vistos ,cc.cant_votos ,cc.puntaje_total ,cc.id_contenido_padre ,jc.imagen_tapa_especial ,jc.info_general_esp as info_general ,jc.ayuda ,jc.tips_tricks_esp as tips_tricks ,jc.mod_imagen_tapa_especial ,cp.fecha_publicacion as fecha_publicacion ,cp.generar_Vainilla FROM cont_contenido cc ,juegos_config jc ,cont_publicacion cp WHERE cc.id_instalacion= 2 AND cc.id_sbc = 619 AND cc.id_tipo = 2 AND cc.id_instalacion = jc.id_instalacion AND cc.id_contenido = jc.id_contenido AND upper(cp.generar_Vainilla) = 'S' AND cp.id_instalacion = cc.id_instalacion AND cp.id_contenido = cc.id_contenido ORDER BY cp.id_instalacion, cp.id_contenido, cp.generar_vainilla, cp.fecha_publicacion desc However, this doesn't get the result in the original order, so you have to wrap another SELECT ... ORDER BY ... LIMIT around it. Or try to move the subselect into the FROM clause: SELECT cc.id_contenido ,cc.pertenece_premium ,cc.Titulo_esp as v_sufix ,cc.url_contenido ,cc.tipo_acceso ,cc.id_sbc ,cc.cant_vistos ,cc.cant_votos ,cc.puntaje_total ,cc.id_contenido_padre ,jc.imagen_tapa_especial ,jc.info_general_esp as info_general ,jc.ayuda ,jc.tips_tricks_esp as tips_tricks ,jc.mod_imagen_tapa_especial ,cp.fecha_publicacion as fecha_publicacion ,cp.generar_Vainilla FROM cont_contenido cc ,juegos_config jc ,(SELECT DISTINCT ON ( id_instalacion, id_contenido, generar_vainilla, fecha_publicacion ) * FROM cont_publicacion ORDER BY id_instalacion, id_contenido, generar_vainilla, fecha_publicacion desc ) AS cp WHERE cc.id_instalacion= 2 AND cc.id_sbc = 619 AND cc.id_tipo = 2 AND cc.id_instalacion = jc.id_instalacion AND cc.id_contenido = jc.id_contenido AND upper(cp.generar_Vainilla) = 'S' AND cp.id_instalacion = cc.id_instalacion AND cp.id_contenido = cc.id_contenido ORDER BY cp.fecha_publicacion desc LIMIT 10 OFFSET 0 [completely untested] Servus Manfred ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] I can't wait too much: Total runtime 432478.44 msec
On Mon, 4 Aug 2003 12:02:46 -0300, "Fernando Papa" <[EMAIL PROTECTED]> wrote: >FROM >cont_contenido >,juegos_config >,cont_publicacion >,(SELECT max(cp1.fecha_publicacion) as max_pub > --change here > FROM cont_publicacion cp1) a --change here But this calculates the global maximum, not per id_instalacion, id_contenido, and generar_vainilla as in >AND cont_publicacion.fecha_publicacion = (SELECT > max(cp1.fecha_publicacion) > FROM cont_publicacion cp1 > WHERE cp1.id_instalacion = > cont_publicacion.id_instalacion > AND cp1.id_contenido = > cont_publicacion.id_contenido > AND cp1.generar_vainilla = > cont_publicacion.generar_vainilla) Servus Manfred ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
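To keep the per-group semantics and still move the subselect into the FROM clause, the derived table has to be grouped and joined back on all three columns; an untested sketch based on Fernando's query:

  ,(SELECT id_instalacion, id_contenido, generar_vainilla,
           max(fecha_publicacion) AS max_pub
      FROM cont_publicacion
     GROUP BY id_instalacion, id_contenido, generar_vainilla) a
  ...
  AND cont_publicacion.id_instalacion    = a.id_instalacion
  AND cont_publicacion.id_contenido      = a.id_contenido
  AND cont_publicacion.generar_vainilla  = a.generar_vainilla
  AND cont_publicacion.fecha_publicacion = a.max_pub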
Re: [PERFORM] OSDL Database Test Suite 3 is available on PostgreSQL
On 01 Aug 2003 11:04:10 -0700, Jenny Zhang <[EMAIL PROTECTED]> wrote: >A sample OSDL-DBT3 test result report can be found at: >http://khack.osdl.org/stp/276912/ > >Your comments are welcome, | effective_cache_size | 1000 With 4GB of memory this is definitely too low and *can* (note that I don't say *must*) lead the planner to wrong decisions. | shared_buffers | 15200 ... looks reasonable. Did you test with other values? | sort_mem | 524288 This is a bit high, IMHO, but might be ok given that DBT3 is not run with many concurrent sessions (right?). http://khack.osdl.org/stp/276912/results/plot/vmstat_swap.png shows some swapping activity towards the end of the run which could be caused by a too high sort_mem setting. Servus Manfred ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] I can't wait too much: Total runtime 432478.44 msec
On Mon, 04 Aug 2003 16:10:18 +0200, I wrote: >SELECT DISTINCT ON ( >cp.id_instalacion, >cp.id_contenido, >cp.generar_vainilla, >cp.fecha_publicacion > ) Cut'n'paste error! fecha_publicacion should not be in the DISTINCT ON list. The same error is in my second suggestion (FROM (subselect)). Servus Manfred ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
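With that correction the first suggestion would begin as shown below; fecha_publicacion stays in the ORDER BY as the last (descending) key, which is what makes DISTINCT ON keep the newest row of each group:

  SELECT DISTINCT ON (
         cp.id_instalacion,
         cp.id_contenido,
         cp.generar_vainilla
         )
         ...   -- select list unchanged
  ...
  ORDER BY cp.id_instalacion, cp.id_contenido,
           cp.generar_vainilla, cp.fecha_publicacion desc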
[PERFORM] Index correlation (was: Moving postgresql.conf tunables into 2003... )
On Thu, 07 Aug 2003 19:31:52 -0400, Tom Lane <[EMAIL PROTECTED]> wrote: >The correlation is between index order and heap order --- that is, are >the tuples in the table physically in the same order as the index? >The better the correlation, the fewer heap-page reads it will take to do >an index scan. This is true for a column that is the first column of a btree index. Correlation doesn't help with additional index columns and with functional indices. >Note it is possible to measure correlation without regard to whether >there actually is any index; But there is no need to, because the correlation is only used for index access cost estimation. >One problem we have is extrapolating from the single-column correlation >stats computed by ANALYZE to appropriate info for multi-column indexes. >It might be that the only reasonable fix for this is for ANALYZE to >compute multi-column stats too when multi-column indexes are present. I wonder whether it would be better to drop column correlation and calculate index correlation instead, i.e. correlation of index tuples to heap tuple positions. This would solve both the multi-column index and the functional index cost estimation problem. >People are used to the assumption that you don't need to re-ANALYZE >after creating a new index, but maybe we'll have to give that up. Index correlation would be computed on CREATE INDEX and whenever the heap relation is analysed ... Servus Manfred ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Moving postgresql.conf tunables into 2003...
On Tue, 5 Aug 2003 15:26:09 -0700, Sean Chittenden <[EMAIL PROTECTED]> wrote: >> I have an experimental patch lying around somewhere that tries to >> work around these problems by offering different estimation methods >> for index scans. If you are interested, I'll dig it out. > >Sure, I'll take a gander... had my head in enough Knuth recently to >even hopefully have some kind of a useful response to the patch. Sean, the patch is at http://www.pivot.at/pg/16-correlation-732.diff. A short description of its usage can be found at http://archives.postgresql.org/pgsql-performance/2002-11/msg00256.php. If you are interested how the different interpolation methods work, read the source - it shouldn't be too hard to find. You might also want to read the thread starting at http://archives.postgresql.org/pgsql-hackers/2002-10/msg00072.php. >> does this mean that sensorid, evtime, and action are not >> independent? > >Hrm... sensorid is sequence and grows proportional with evtime, >obviously. So a *low* sensorid (7) is quite uncommon for a *late* evtime? This would help understand the problem. Unfortunately I have no clue what to do about it. :-( >Having spent a fair amount of time looking at the two following plans, >it seems as though an additional statistic is needed to change the >cost of doing an index lookup when the index is linearly ordered. I'm not sure I understand what you mean by "index is linearly ordered", but I guess correlation is that statistic you are talking about. However, it is calculated per column, not per index. >Whether CLUSTER does this or not, I don't know, If you CLUSTER on an index and then ANALYSE, you get a correlation of 1.0 (== optimum) for the first column of the index. > I never heard back >from him after getting the runtime down to a few ms. :-/ Pity! I'd have liked to see EXPLAIN ANALYSE for SELECT * FROM mss_fwevent WHERE sensorid = 7 AND evtime > (now() - '6 hours'::INTERVAL) AND NOT action; SELECT * FROM mss_fwevent WHERE sensorid = 7 AND evtime > (now() - '6 hours'::INTERVAL); SELECT * FROM mss_fwevent WHERE evtime > (now() - '6 hours'::INTERVAL); SELECT * FROM mss_fwevent WHERE sensorid = 7; > Are indexes >on linearly ordered data rebalanced somehow? I thought CLUSTER only >reordered data on disk. -sc AFAIK CLUSTER re-creates all indices belonging to the table. Servus Manfred ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Serious issues with CPU usage
On Mon, 8 Sep 2003 13:50:23 +0300, Andri Saar <[EMAIL PROTECTED]> wrote: >Basically I do this: >1) select about ~700 ID's I have to poll >2) poll them >3) update those 700 rows in that "table" I used (~2700 rows total). > >And I do this cycle once per minute, so yes, I've got a zillion updates. 700 >of 2700 is roughly 25%, so I'd have to vacuum once per minute? With such a small table VACUUM should be a matter of less than one second: fred=# vacuum verbose t; INFO: --Relation public.t-- INFO: Index t_pkey: Pages 65; Tuples 16384: Deleted 4096. CPU 0.01s/0.10u sec elapsed 0.21 sec. INFO: Removed 4096 tuples in 154 pages. CPU 0.04s/0.02u sec elapsed 0.07 sec. INFO: Pages 192: Changed 192, Empty 0; Tup 16384: Vac 4096, Keep 0, UnUsed 0. Total CPU 0.08s/0.16u sec elapsed 0.36 sec. VACUUM Time: 415.00 ms And this is on a 400 MHz machine under cygwin, so don't worry if you have a real computer. Servus Manfred ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Inconsistent performance
On Mon, 15 Sep 2003 22:26:45 -0400, Christopher Browne <[EMAIL PROTECTED]> wrote: >> select count (*) from table; >The only possible plan for THAT query will involve a seq scan of the >whole table. If the postmaster already has the data in cache, it >makes sense for it to run in 1 second. If it has to read it from >disk, 12 seconds makes a lot of sense. Yes. And note that the main difference is between having the data in memory and having to fetch it from disk. I don't believe that this difference can be explained by 9000 read calls hitting the operating system's cache. >You might want to increase the "shared_buffers" parameter in >postgresql.conf; that should lead to increased stability of times as >it should be more likely that the data in "table" will remain in >cache. Let's not jump to this conclusion before we know what's going on. Joseph Bove <[EMAIL PROTECTED]> wrote in another message above: | I did have shared_buffers and sort_mem both set higher originally (15000, | 32168) As I read this I think he meant "... and had the same performance problem." Joseph, what do you get, if you run that EXPLAIN ANALYSE SELECT count(*) ... several times? What do vmstat and top show while the query is running? Are there other processes active during or between the runs? What kind of processes? Postgres backends? Web server? ... Servus Manfred ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] rewrite in to exists?
On Thu, 18 Sep 2003 13:23:37 +0800, "Christopher Kings-Lynne" <[EMAIL PROTECTED]> wrote:
>Why can't you just go:
>
>select code, id, name, date_of_service from tbl where xxx <= 29800 and xx >= 29909 and code='XX' and client_code='XX' order by id, date_of_service;

Because (ignoring conditions on code and client_code for a moment) if for a given date there is at least one row satisfying the condition on xxx, the original query returns *all* rows having this date, regardless of their xxx value. For example (** marks rows that satisfy the xxx condition, * marks rows that are returned in addition because they fall on the same date):

 id |    date    |  xxx
----+------------+-------
  1 | 2003-01-01 |     1    *
  2 | 2003-01-01 | 29800    **
  3 | 2003-01-01 |     3    *
  4 | 2003-02-02 |     2
  5 | 2003-03-03 | 29900    **

>> select code, id, name, date_of_service
>> from tbl
>> where date_of_service in
>> (select date_of_service
>> from tbl
>> where xxx >= '29800'
>> and xxx <= '29909'
>> and code = 'XX')
>> and client_code = 'XX'
>> order by id, date_of_service;

To the original poster: You did not provide a lot of information, but the following suggestions might give you an idea ...

SELECT code, id, date_of_service
  FROM tbl
 WHERE EXISTS (SELECT * FROM tbl t2
                WHERE t2.xxx >= '29800' AND t2.xxx <= '29909'
                  AND t2.code = 'XX'
                  AND tbl.date_of_service = t2.date_of_service)
   AND client_code = 'XX'
 ORDER BY id, date_of_service;

SELECT t1.code, t1.id, t1.date_of_service
  FROM tbl t1
       INNER JOIN (SELECT DISTINCT date_of_service
                     FROM tbl
                    WHERE xxx >= '29800' AND xxx <= '29909'
                      AND code = 'XX') AS t2
             ON (t1.date_of_service = t2.date_of_service)
 WHERE t1.client_code = 'XX'
 ORDER BY id, date_of_service;

SELECT DISTINCT t1.code, t1.id, t1.date_of_service
  FROM tbl AS t1
       INNER JOIN tbl AS t2
             ON (t1.date_of_service = t2.date_of_service
                 AND t2.xxx >= '29800' AND t2.xxx <= '29909'
                 AND t2.code = 'XX')
 WHERE t1.client_code = 'XX'        -- might as well put this
                                    -- condition into the ON clause
 ORDER BY id, date_of_service;

The last one assumes that there are no duplicates on code, id, date_of_service in the desired result.

Servus Manfred ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] rewrite in to exists?
On Thu, 18 Sep 2003 12:27:23 -0700 (GMT-07:00), LN Cisneros <[EMAIL PROTECTED]> wrote: >But, the EXISTS version doesn't Laurette, looking at that SELECT statement again I can't see what's wrong with it. One of us is missing something ;-) > really give me what I want... Can you elaborate? SELECT code, id, date_of_service FROM tbl WHERE EXISTS (SELECT * FROM tbl t2 WHERE t2.xxx >= '29800' AND t2.xxx <= '29909' AND t2.code = 'XX' AND tbl.date_of_service = t2.date_of_service) -- (!) AND client_code = 'XX' ORDER BY id, date_of_service; >all rows in tbl that ^^^ Well, all that have client_code = 'XX', as in your original query. > match the date of the subquery. The matching is done by the line with the (!) comment. Servus Manfred ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] osdl-dbt3 run results - puzzled by the execution plans
On Thu, 18 Sep 2003 15:36:50 -0700, Jenny Zhang <[EMAIL PROTECTED]> wrote:
>We thought the large effective_cache_size should lead us to better
>plans. But we found the opposite.

The common structure of your query plans is:

 Sort
   Sort Key: sum((partsupp.ps_supplycost * partsupp.ps_availqty))
   InitPlan
     ->  Aggregate
           ->  SubPlan
   ->  Aggregate
         Filter: (sum((ps_supplycost * ps_availqty)) > $0)
         ->  Group
               ->  Sort
                     Sort Key: partsupp.ps_partkey
                     ->  SubPlan (same as above)

where the SubPlan is

 ->  Merge Join  (cost=519.60..99880.05 rows=32068 width=65)
                 (actual time=114.78..17435.28 rows=30400 loops=1)  ctr=5.73
       Merge Cond: ("outer".ps_suppkey = "inner".s_suppkey)
       ->  Index Scan using i_ps_suppkey on partsupp
                 (cost=0.00..96953.31 rows=801712 width=34)
                 (actual time=0.42..14008.92 rows=799361 loops=1)  ctr=6.92
       ->  Sort  (cost=519.60..520.60 rows=400 width=31)
                 (actual time=106.88..143.49 rows=30321 loops=1)  ctr=3.63
             Sort Key: supplier.s_suppkey
             ->  SubSubPlan

for large effective_cache_size and

 ->  Nested Loop  (cost=0.00..130168.30 rows=32068 width=65)
                  (actual time=0.56..1374.41 rows=30400 loops=1)  ctr=94.71
       ->  SubSubPlan
       ->  Index Scan using i_ps_suppkey on partsupp
                  (cost=0.00..323.16 rows=80 width=34)
                  (actual time=0.16..2.98 rows=80 loops=380)  ctr=108.44
             Index Cond: (partsupp.ps_suppkey = "outer".s_suppkey)

for small effective_cache_size. Both subplans have an almost identical subsubplan:

 ->  Nested Loop  (cost=0.00..502.31 rows=400 width=31)
                  (actual time=0.23..110.51 rows=380 loops=1)  ctr=4.55
       Join Filter: ("inner".s_nationkey = "outer".n_nationkey)
       ->  Seq Scan on nation  (cost=0.00..1.31 rows=1 width=10)
                               (actual time=0.08..0.14 rows=1 loops=1)  ctr=9.36
             Filter: (n_name = 'ETHIOPIA'::bpchar)
       ->  Seq Scan on supplier  (cost=0.00..376.00 rows=1 width=21)
                                 (actual time=0.10..70.72 rows=1 loops=1)  ctr=5.32

I have added the ctr (cost:time ratio) for each plan node. These values are mostly between 5 and 10 with two notable exceptions:

1) ->  Sort  (cost=519.60..520.60 rows=400 width=31)
             (actual time=106.88..143.49 rows=30321 loops=1)  ctr=3.63

It has already been noticed by Matt Clark that this is the only plan node where the row count estimation looks wrong. However, I don't believe that this has great influence on the total cost of the plan, because the ctr is not far from the usual range and if it were a bit higher, it would only add a few hundred cost units to a branch costing almost 100000 units. BTW I vaguely remember that there is something strange with the way actual rows are counted inside a merge join. Look at the branch below this plan node: It shows an actual row count of 380.

2) ->  Index Scan using i_ps_suppkey on partsupp
              (cost=0.00..323.16 rows=80 width=34)
              (actual time=0.16..2.98 rows=80 loops=380)  ctr=108.44

Here we have the only plan node where loops > 1, and it is the only one where the ctr is far off. The planner computes the cost for one loop and multiplies it by the number of loops (which it estimates quite accurately to be 400), thus getting a total cost of ca. 130000. We have no reason to believe that the single loop cost is very far from reality (for a *single* index scan), but the planner does not account for additional index scans hitting pages in the cache that have been brought in by preceding scans. This is a known problem, Tom has mentioned it several times, IIRC.

Now I'm very interested in getting a better understanding of this problem, so could you please report the results of

. \d i_ps_suppkey

. VACUUM VERBOSE ANALYSE partsupp;
  VACUUM VERBOSE ANALYSE supplier;

. SELECT attname, null_frac, avg_width, n_distinct, correlation
    FROM pg_stats
   WHERE tablename = 'partsupp' AND attname IN ('ps_suppkey', ...);
  Please insert other interesting column names for ..., especially those contained in i_ps_suppkey, if any.

. SELECT relname, relpages, reltuples
    FROM pg_class
   WHERE relname IN ('partsupp', 'supplier', ...);
  Add relevant index names for the ... here.

. EXPLAIN ANALYSE SELECT ps_partkey, ps_supplycost, ps_availqty
    FROM partsupp, supplier
   WHERE ps_suppkey = s_suppkey AND s_nationkey = '';
  The idea is to eliminate parts of the plan that are always the same. Omitting nation is possibly too much of a simplification. In this case please let me know.

Servus Manfred
Re: [PERFORM] osdl-dbt3 run results - puzzled by the execution
On Fri, 19 Sep 2003 11:35:35 -0700, Jenny Zhang <[EMAIL PROTECTED]> wrote: >I posted more results as you requested: Unfortunately they only confirm what I suspected earlier: >> 2) -> Index Scan using i_ps_suppkey on partsupp >> (cost=0.00..323.16 rows=80 width=34) >> (actual time=0.16..2.98 rows=80 loops=380) >> ctr=108.44 >> the planner does not >> account for additional index scans hitting pages in the cache that >> have been brought in by preceding scans. This is a known problem PF1 = estimated number of page fetches for one loop ~ 320 L = estimated number of loops ~ 400 P = number of pages in relation ~ 21000 Cutting down the number of heap page fetches if PF1 * L > P and P < effective_cache_size seems like an obvious improvement, but I was not able to figure out where to make this change. Maybe it belongs into costsize.c near run_cost += outer_path_rows * (inner_path->total_cost - inner_path->startup_cost) * joininfactor; in cost_nestloop() or it should be pushed into the index cost estimation functions. Hackers? For now you have to keep lying about effective_cache_size to make the planner overestimate merge joins to compensate for the planner's overestimation of nested loops. Sorry for having no better answer. Servus Manfred ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Optimizing >= and <= for numbers and dates
On Wed, 1 Oct 2003 19:45:29 +0200 (MEST), "Dimitri Nagiev" <[EMAIL PROTECTED]> wrote: >template1=# explain analyze select * from mytable where >mydate>='2003-09-01'; > Seq Scan on mytable (cost=0.00..2209.11 rows=22274 width=562) (actual > time=0.06..267.30 rows=22677 loops=1) > Filter: (mydate >= '2003-09-01'::date) > Total runtime: 307.71 msec Didn't you say that there are 25000 rows in the table? I can't believe that for selecting 90% of all rows an index scan would be faster. Try SET enable_seqscan = 0; explain analyze select * from mytable where mydate>='2003-09-01'; If you find the index scan to be faster, there might be lots of dead tuples in which case you should VACUUM FULL mytable; Servus Manfred ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] planner doesn't use multicolumn index
On Wed, 8 Oct 2003 09:08:59 -0500 (CDT), Adrian Demaestri <[EMAIL PROTECTED]> wrote: >the type of the fields are int2 and >int4, the where condition is v.g. partido=99 and partida=123). Write your search condition as WHERE partido=99::int2 and partida=123 Servus Manfred ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] vacuum locking
On Fri, 17 Oct 2003 09:52:26 -0600, Rob Nagler <[EMAIL PROTECTED]> wrote: >INFO: Removed 8368 tuples in 427 pages. >CPU 0.06s/0.04u sec elapsed 1.54 sec. >INFO: Pages 24675: Changed 195, Empty 0; Tup 1031519: Vac 8368, Keep 254, UnUsed >1739. >Total CPU 2.92s/2.58u sec elapsed 65.35 sec. > >INFO: Removed 232 tuples in 108 pages. >CPU 0.01s/0.02u sec elapsed 0.27 sec. >INFO: Pages 74836: Changed 157, Empty 0; Tup 4716475: Vac 232, Keep 11, UnUsed >641. >Total CPU 10.19s/6.03u sec elapsed 261.44 sec. The low UnUsed numbers indicate that FSM is working fine. >Assuming I vacuum every 15 minutes, it would seem like max_fsm_pages >should be 1000, because that's about what was reclaimed. The default >is 10000. Do I need to change this? ISTM you are VACUUMing too aggressively. You are reclaiming less than 1% and 0.005%, respectively, of tuples. I would increase the FSM settings to ca. 1000 max_fsm_relations and 100000 max_fsm_pages and VACUUM *less* often, say every two hours or so. ... or configure autovacuum to VACUUM a table when it has 10% dead tuples. Servus Manfred ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Performance Concern
On Sun, 26 Oct 2003 00:13:36 +0300, Hannu Krosing <[EMAIL PROTECTED]> wrote: >UPDATE baz > SET customer_id = '1234' > WHERE baz_key IN ( >SELECT baz_key > FROM baz innerbaz > WHERE customer_id IS NULL > and innerbaz.baz_key = baz.baz_key > LIMIT 1000 ); AFAICS this is not what the OP intended. It is equivalent to UPDATE baz SET customer_id = '1234' WHERE customer_id IS NULL; because the subselect is now correlated to the outer query and is evaluated for each row of the outer query which makes the LIMIT clause ineffective. Servus Manfred ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
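If the intention was to touch at most 1000 rows per statement, the subselect must not reference the outer table; an uncorrelated version would be (which 1000 rows the LIMIT picks is arbitrary without an ORDER BY, but that matches the original intent):

  UPDATE baz
     SET customer_id = '1234'
   WHERE baz_key IN (SELECT baz_key
                       FROM baz
                      WHERE customer_id IS NULL
                      LIMIT 1000);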
Re: [PERFORM] Guesses on what this NestLoop is for?
On Mon, 27 Oct 2003 15:32:41 -0800, Josh Berkus <[EMAIL PROTECTED]> wrote: >FROM event_types, events > LEFT OUTER JOIN ... >WHERE events.status = 1 or events.status = 11 > and events.event_date > '2003-10-27' > and events.etype_id = event_types.etype_id > and ( ... > ); > > >What I can't figure out is what is that inredibly expensive nested loop for? Sorry, I have no answer to your question, but may I ask whether you really want to get presumably 106 output rows for each event with status 1? Or did you mean WHERE (events.status = 1 OR events.status = 11) AND ... >Ideas? I'd also try to push that NOT EXISTS condition into the FROM clause: ...LEFT JOIN (SELECT DISTINCT ON (event_id) event_id, mod_date, mod_user FROM event_history ORDER BY event_id, mod_date ) AS eh ON (events.event_id = eh.event_id) ... WHERE ... AND CASE WHEN eh.event_id IS NULL THEN events.mod_user ELSE eh.mod_user END = 562 If mod_user is NOT NULL in event_history, then CASE ... END can be simplified to COALESCE(eh.mod_user, events.mod_user). Servus Manfred ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] redundent index?
On Wed, 29 Oct 2003 10:17:24 -0500, Rod Taylor <[EMAIL PROTECTED]> wrote: >On Wed, 2003-10-29 at 09:03, Robert Treat wrote: >> Indexes: entity_watch_map_pkey primary key btree (entity_id, watch_id), >> ewm_entity_id btree (entity_id), >> >> I can't think of why the second index is there, as ISTM there is no >> instance where the first index wouldn't be used in place of the second > >The cost in evaluating the first index will be a little higher Yes, the actual cost may be a little higher. But the cost estimation might be significantly higher, so there can be border cases where the planner chooses a sequential scan over a multi-column index scan while a single-column index would correctly be recognized as being faster ... Servus Manfred ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
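Since DDL is transactional in Postgres, it is easy to measure whether the single-column index ever wins without really losing it; the WHERE value below is made up, and note that the DROP INDEX blocks writers to the table until the ROLLBACK:

  BEGIN;
  DROP INDEX ewm_entity_id;
  EXPLAIN ANALYSE SELECT count(*) FROM entity_watch_map WHERE entity_id = 42;
  ROLLBACK;    -- the index is back as if nothing had happened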
Re: [PERFORM] Seeking help with a query that takes too long
On Wed, 12 Nov 2003 08:34:50 -0500, "Nick Fankhauser" <[EMAIL PROTECTED]> wrote: > -> Index Scan using >actor_full_name_uppercase on actor (cost=0.00..6.01 rows=1 width=42) ^^ >(actual time=37.62..677.44 rows=3501 loops=1) ^ > Index Cond: >((actor_full_name_uppercase >= 'SANDERS'::character varying) AND >(actor_full_name_uppercase < 'SANDERT'::character varying)) > Filter: >(actor_full_name_uppercase ~~ 'SANDERS%'::text) Nick, can you find out why this row count estimation is so far off? \x SELECT * FROM pg_stats WHERE tablename='actor' AND attname='actor_full_name_uppercase'; BTW, there seem to be missing cases: > -> Nested Loop (cost=0.00..2214.66 rows=2 width=115) > (actual time=59.05..119929.71 rows=5879 loops=1) >-> Nested Loop (cost=0.00..2205.26 rows=3 width=76) > (actual time=51.46..66089.04 rows=5882 loops=1) Servus Manfred ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Seeking help with a query that takes too long
On Wed, 12 Nov 2003 13:27:53 -0500, "Nick Fankhauser" <[EMAIL PROTECTED]> wrote: > >> You might have to resort to brute force, like "set enable_nestloop=false". > -> Seq Scan on >actor_case_assignment (cost=0.00..209980.49 rows=8669349 width=34) (actual >time=9.13..85504.05 rows=8670467 loops=1) Does actor_case_assignment contain more columns than just the two ids? If yes, do these additional fields account for ca. 70 bytes per tuple? If not, try VACUUM FULL ANALYSE actor_case_assignment; > -> Index Scan using >actor_full_name_uppercase on actor (cost=0.00..6.01 rows=1 width=42) >(actual time=51.67..24900.53 rows=3502 loops=1) This same index scan on actor has been much faster in your previous postings (677ms, 3200ms), probably due to caching effects. 7ms per tuple returned looks like a lot of disk seeks are involved. Is clustering actor on actor_full_name_uppercase an option or would this slow down other queries? Servus Manfred ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Seeking help with a query that takes too long
On Fri, 14 Nov 2003 11:00:38 -0500, "Nick Fankhauser" <[EMAIL PROTECTED]> wrote: >Good question... I've never used clustering in PostgreSQL before, so I'm >unsure. I presume this is like clustering in Oracle where the table is >ordered to match the index? Yes, something like that. With the exception that Postgres loses the clustered status while you INSERT and UPDATE tuples. So you have to re-CLUSTER from time to time. Look at pg_stats.correlation to see if it's necessary. > Is there a way to flush out >the cache in a testing situation like this in order to start from a >consistent base? To flush Postgres shared buffers: SELECT count(*) FROM another_large_table; To flush your database pages from the OS cache: tar cf /dev/null /some/large/directory And run each of your tests at least twice to get a feeling for how caching affects your specific queries. Servus Manfred ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
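A quick way to watch the clustered state degrade over time (table and column names from Nick's earlier messages); run it after an ANALYSE so the numbers are current:

  SELECT attname, correlation
    FROM pg_stats
   WHERE tablename = 'actor'
     AND attname = 'actor_full_name_uppercase';
  -- close to 1 (or -1): still well ordered; close to 0: time to re-CLUSTER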
Re: [PERFORM] TEXT column and indexing
On Wed, 19 Nov 2003 10:18:18 +0100, Ian Barwick <[EMAIL PROTECTED]> wrote: >Indexes: >[...] >"opv_v_ix" btree (substr(value, 1, 128)) >SELECT obj_property_id > FROM object_property_value opv > WHERE opv.value = 'foo' Try ... WHERE substr(opv.value, 1, 128) = 'foo' HTH. Servus Manfred ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
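The search condition has to match the indexed expression for the index to be usable; if values can be longer than 128 characters, the exact comparison can simply be kept in addition. A sketch (whether the planner actually chooses the index still depends on the statistics):

  SELECT obj_property_id
    FROM object_property_value opv
   WHERE substr(opv.value, 1, 128) = substr('foo', 1, 128)   -- matches the index expression
     AND opv.value = 'foo';                                   -- exact comparison for long values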
Re: [PERFORM] optimization ideas for frequent, large(ish) updates in frequently accessed DB?
On Fri, 13 Feb 2004 01:58:34 +0100, "Marinos J. Yannikos" <[EMAIL PROTECTED]> wrote: >I'm looking for ideas that might improve the interactive performance of >the system, without slowing down the updates too much. IOW, you could accept slower updates. Did you actually try and throttle down the insert rate? > Here are the >characteristics of the table and its use: > >- approx. 2 million rows Doesn't sound worrying. What's the min/max/average size of these rows? How large is this table? SELECT relpages FROM pg_class WHERE relname='...'; What else is in this database, how many tables, how large is the database (du $PGDATA)? >- approx. 4-5 million rows per day are replaced in short bursts of >1-200k rows (average ~3000 rows per update) How often do you VACUUM [ANALYSE]? >- the table needs 6 indexes (not all indexes are used all the time, but >keeping them all the time slows the system down less than re-creating >some of them just before they're needed and dropping them afterwards) I agree. >- an "update" means that 1-200k rows with a common value in a particular >field are replaced with an arbitrary number of new rows (with the same >value in that field), i.e.: > >begin transaction; > delete from t where id=5; > insert into t (id,...) values (5,...); > ... [1-200k rows] >end; This is a wide variation in the number of rows. You told us the average batch size is 3000. Is this also a *typical* batch size? And what is the number of rows where you start to get the feeling that it slows down other sessions? Where do the new values come from? I don't think they are typed in :-) Do they come from external sources or from the same database? If the latter, INSERT INTO ... SELECT ... might help. >The problem is, that a large update of this kind can delay SELECT >queries running in parallel for several seconds, so the web interface >used by several people will be unusable for a short while. Silly question: By SELECT you mean pure SELECT transactions and not some transaction that *mostly* reads from the database? I mean, you are sure your SELECT queries are slowed down and not blocked by the "updates". Show us the EXPLAIN ANALYSE output for the same SELECT, once when it is fast and once when it is slow. BTW, what is fast and what is slow? >Currently, I'm using temporary tables: > [...] >This is slightly faster than inserting directly into t (and probably >faster than using COPY, even though using that might reduce the overall >load on the database). You might try using a prepared INSERT statement or COPY. >shared_buffers=100000 >(I tried many values, this seems to work well for us - 12GB RAM) >wal_buffers=500 >sort_mem=800000 >checkpoint_segments=16 >effective_cache_size=1000000 See Josh's comments. >Any help/suggestions would be greatly appreciated... Even if it's >something like "you need a faster db box, there's no other way" ;-) We have to find out what the bottleneck is. Tell us about your environment (hardware, OS, ...). Run top and/or vmstat and look for significant differences between times of normal processing and slow phases. Post top/vmstat output here if you need help. Servus Manfred ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
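A sketch of the COPY variant mentioned above; t_tmp stands for the temporary table from Marinos' mail (its real name was snipped), and the file path is made up. COPY moves a whole batch in one command instead of 1-200k single-row INSERTs:

  COPY t_tmp FROM '/path/on/the/server/batch.dat';   -- server-side file
  -- or, streamed from the client connection:
  -- COPY t_tmp FROM STDIN;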
Re: [PERFORM] optimization ideas for frequent, large(ish) updates in frequently accessed DB?
Marinos, while you are busy answering my first set of questions :-), here is an idea that might help even out resource consumption. On Fri, 13 Feb 2004 01:58:34 +0100, "Marinos J. Yannikos" <[EMAIL PROTECTED]> wrote: >begin transaction; > delete from t where id=5; > insert into t (id,...) values (5,...); > ... [1-200k rows] >end; > >The problem is, that a large update of this kind can delay SELECT >queries running in parallel for several seconds, so the web interface >used by several people will be unusable for a short while. CREATE TABLE idmap ( internalid int NOT NULL PRIMARY KEY, visibleid int NOT NULL, active bool NOT NULL ); CREATE INDEX ipmap_visible ON idmap(visibleid); Populate this table with INSERT INTO idmap SELECT id, id, true FROM t; Change SELECT ... FROM t WHERE t.id = 5; to SELECT ... FROM t INNER JOIN idmap ON (idmap.internalid = t.id AND idmap.active) WHERE idmap.visibleid = 5; When you have to replace the rows in t for id=5, start by INSERT INTO idmap VALUES (12345, 5, false); Then repeatedly INSERT INTO t (id, ...) VALUES (12345, ...); at a rate as slow as you can accept. You don't have to wrap all INSERTs into a single transaction, but batching together a few hundred to a few thousand INSERTs will improve performance. When all the new values are in the database, you switch to the new id in one short transaction: BEGIN; UPDATE idmap SET active = false WHERE visibleid = 5 AND active; UPDATE idmap SET active = true WHERE internalid = 12345; COMMIT; Do the cleanup in off-peak hours (pseudocode): FOR delid IN (SELECT internalid FROM idmap WHERE NOT active) BEGIN DELETE FROM t WHERE id = delid; DELETE FROM idmap WHERE internalid = delid; END; VACUUM ANALYSE t; VACUUM ANALYSE idmap; To prevent this cleanup from interfering with INSERTs in progress, you might want to add a "beinginserted" flag to idmap. HTH. Servus Manfred ---(end of broadcast)--- TIP 8: explain analyze is your friend
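If one big transaction per off-peak run is acceptable, the cleanup pseudocode can be collapsed into plain SQL; this sketch ignores the suggested beinginserted flag, and the row-by-row loop above has the advantage of keeping the individual transactions small:

  DELETE FROM t
   WHERE id IN (SELECT internalid FROM idmap WHERE NOT active);
  DELETE FROM idmap WHERE NOT active;

  VACUUM ANALYSE t;
  VACUUM ANALYSE idmap;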