Re: [PERFORM] Need indexes on empty tables for good performance ?
On Tue, Aug 23, 2005 at 13:41:32 +1000, "Lenard, Rohan (Rohan)" [EMAIL PROTECTED] wrote:
> I've read that indexes aren't used for COUNT(*) and I've noticed (7.3.x)
> with EXPLAIN that indexes never seem to be used on empty tables - is
> there any reason to have indexes on empty tables, or will postgresql
> never use them?

count will use indexes if appropriate. The counts themselves are NOT stored in the indexes, so counts of significant fractions of a table (in particular of the whole table) won't benefit from indexes.

You aren't going to get query speedups by putting indexes on empty tables. However, they may be required if you have unique or primary keys declared on the table, and you may want them to enforce some kinds of constraints.
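To make the distinction concrete, here is a minimal sketch (the table and index names are invented for illustration): an index can help count a selective slice of a table, but a whole-table count still has to visit every row.

    -- Hypothetical table, for illustration only
    CREATE TABLE orders (
        id         serial PRIMARY KEY,
        created_at date NOT NULL
    );
    CREATE INDEX orders_created_at_idx ON orders (created_at);

    -- Whole-table count: no index can help; every row must be visited.
    SELECT count(*) FROM orders;

    -- Count of a small slice: here the planner can use orders_created_at_idx
    -- to locate the qualifying rows instead of scanning the whole table.
    SELECT count(*) FROM orders WHERE created_at = '2005-08-23';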
Re: [PERFORM] Performance for relative large DB
tobbe wrote:
> Hi Chris. Thanks for the answer. Sorry that I was a bit unclear.
> 1) We update around 20,000 posts per night.
> 2) What I meant was that we suspect that the DBMS called PervasiveSQL
> that we are using today is much too small. That's why we're looking for
> alternatives. Today we base our solution largely on query-specific
> tables created at night, so instead of running queries directly on the
> post table (with 4-6M rows) during the day, we have the data
> pre-arranged in several much smaller tables. This is just to make the
> current DBMS cope with our amount of data. What I am particularly
> interested in is whether we can expect to run all our SELECT queries
> directly against the post table with PostgreSQL.

20k transactions per day? Doesn't seem too bad. That amounts to how many transactions per second during peak times? Personally I don't think it will be a problem, but you might want to clarify what sort of load you are expecting during its peak time.

> 3) How well does postgres work with load balancing environments. Is it
> built-in?

There is no load balancing built in. You would need to use Slony-I and possibly Pg-Pool for that. I don't know about Pg-Pool, but Slony-I was written in large part by member(s?) of the core development team, so even if it is not built in, it is not as if a team of outsiders wrote it. If you need something proprietary, there are similar solutions with replication built in which are based on PostgreSQL and licensed under proprietary licenses.

Best Wishes,
Chris Travers
Metatron Technology Consulting
Re: [PERFORM] Weird performance drop after VACUUM
On Fri, 2005-08-26 at 19:31 -0400, Tom Lane wrote:
> I think the problem is that the planner is underestimating the cost of
> evaluating this complicated filter condition:
>
>   ->  Seq Scan on scf_irsaliye irs  (cost=0.00..30.00 rows=1 width=20) (actual time=0.290..0.290 rows=0 loops=1)
>         Filter: (((karsifirma)::text = 'KENDI'::text) AND (((turu)::text = 'MAI'::text) OR ((turu)::text = 'KGI'::text) OR ((turu)::text = 'PS'::text) OR ((turu)::text = 'TS'::text) OR ((turu)::text = 'KC'::text) OR ((turu)::text = 'KCO'::text)) AND (((_key_sis_depo_dest)::text = '003l$1$$'::text) OR ((_key_sis_depo_dest)::text = '0048$1$$'::text) OR ((_key_sis_depo_dest)::text = '004b$1$$'::text) OR ((_key_sis_depo_dest)::text = '004d$1$$'::text)) AND (tarih = '2005-08-26'::date))
>
> While you could attack that by raising the cpu_operator_cost parameter,
> it would also be worth inquiring *why* the condition is so expensive to
> evaluate. I am suspicious that you are running the database in a locale
> in which strcoll() is really slow. Can you run it in C locale instead,
> or do you really need locale-aware behavior? Can you switch to a
> different database encoding? (A single-byte encoding such as Latin1
> might be faster than UTF8, for example.)

Yes, you are perfectly right. We are using UTF8 and the tr_TR.UTF8 locale. However, I tried the same tests with latin1 and the C locale; it is surely faster, but not dramatically so:

                         Before Vacuum    After Vacuum
  UTF8 and tr_TR.UTF8:   ~8 s             ~110 s
  latin1 and C:          ~7 s             ~65 s

I also played with the cpu_operator_cost parameter, and it dramatically reduced query times, though not to the level from before the vacuum:

                         Before Vacuum    After Vacuum
  UTF8 and tr_TR.UTF8:   ~8 s             ~11 s
  latin1 and C:          ~7 s             ~9 s

These values are much better, but I really wonder if I can reach the performance levels we had before the vacuum. I am also worried about side-effects that may be caused by a non-default cpu_operator_cost.

> Another possibility is to take a hard look at whether you can't
> simplify the filter condition, but that'd require more knowledge of
> your application than I have.

Yes, that is another option; we are even considering schema changes to use fewer character-type columns, but those are really costly and error-prone operations at the moment.

> Or you could just play with the order of the filter conditions ... for
> example, the date condition at the end is probably far cheaper to test
> than the text comparisons, so if that's fairly selective it'd be worth
> putting it first.

We are experimenting with this. Thanks for your help!

Best Regards,
Umit Oztosun
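For concreteness, the two knobs discussed in this exchange could be exercised roughly like this (a sketch only: the WHERE clause is reconstructed from the filter shown in the plan, and the SELECT list and cost value are placeholders):

    -- Session-level experiment; 0.0025 is the shipped default, so this
    -- quadruples the assumed per-operator CPU cost (value is illustrative):
    SET cpu_operator_cost = 0.01;

    -- Reordered filter: the cheap, selective date test first, the
    -- locale-sensitive text comparisons afterwards.
    SELECT *
      FROM scf_irsaliye irs
     WHERE irs.tarih = '2005-08-26'
       AND irs.karsifirma = 'KENDI'
       AND irs.turu IN ('MAI', 'KGI', 'PS', 'TS', 'KC', 'KCO')
       AND irs._key_sis_depo_dest IN ('003l$1$$', '0048$1$$',
                                      '004b$1$$', '004d$1$$');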
Re: [PERFORM] Weird performance drop after VACUUM
On Fri, Aug 26, 2005 at 07:31:51PM -0400, Tom Lane wrote:
> Or you could just play with the order of the filter conditions ... for
> example, the date condition at the end is probably far cheaper to test
> than the text comparisons, so if that's fairly selective it'd be worth
> putting it first.

That's an interesting approach -- could the planner do such things itself?

/* Steinar */
--
Homepage: http://www.sesse.net/
Re: [PERFORM] Inefficient queryplan for query with intersectable
On 27-8-2005 0:56, Tom Lane wrote:
> Arjen van der Meijden [EMAIL PROTECTED] writes:
>> As said, it chooses sequential scans or the wrong index plans over a
>> perfectly good plan that is just not selected when the parameters are
>> too well tuned or sequential scanning of the table is allowed.
>
> I think some part of the problem comes from using inconsistent
> datatypes. For instance, it seems very odd that the thing is not using
> a hash or something to handle
>   t_0.Cat2 IN (SELECT 545 UNION SELECT ID FROM cat WHERE ParentID = 545)
> seeing that it correctly guesses there are only going to be about 8
> rows in the union. Part of the reason is that cat2 is smallint, whereas
> the output of the union must be at least int, maybe wider depending on
> the datatype of cat.id (which you did not show us); so the comparison
> isn't hashable. Even a smallint vs int comparison would be
> mergejoinable, though, so I'm really wondering what cat.id is.

cat.id is a smallint. I replaced that subquery with these two:

  t_0.Cat2 IN (SELECT '545'::smallint UNION SELECT ID FROM cat WHERE ParentID = '545'::smallint)
  t_0.Cat2 IN (SELECT '545' UNION SELECT ID FROM cat WHERE ParentID = '545')

But apparently there is a bug in the explain mechanism of the 8.1devel I'm using (I downloaded a nightly on 25 August, somewhere in the morning CEST), since it returned:

  ERROR: bogus varno: 9

So I can't see whether the plan changed; execution times didn't change much. I also replaced the subselect with the fixed result of that query (like IN ('545', '546', ...)), but that didn't seem to make much difference in the execution time either. The plan did change, of course: it used a BitmapOr of 8 Bitmap Index Scans over pwprodukten. By the way, as far as I know, this is the only datatype mismatch in the query.

> Another big part of the problem comes from poor result size estimation.
> I'm not sure you can eliminate that entirely given the multiple
> conditions on different columns (which'd require cross-column
> statistics to really do well, which we do not have). But you could
> avoid constructs like
>   WHERE ... t_1.recordtimestamp >= (SELECT max_date - 60 FROM last_dates WHERE table_name = 'pricetracker')
> The planner is basically going to throw up its hands and make a default
> guess on the selectivity of this; it's not smart enough to decide that
> the sub-select probably represents a constant. What I'd do with this is
> to define a function marked STABLE for the sub-select result, perhaps
> something like [...] need.) Then write the query like
>   WHERE ... t_1.recordtimestamp >= get_last_date('pricetracker', 60)
> In this formulation the planner will be able to make a reasonable guess
> about how many rows will match ... at least if your statistics are up
> to date ...

I tried such a function and also tried replacing it with the fixed outcome of the subquery itself. Although that gives a considerably more accurate estimate of the rows returned, it doesn't seem to affect the basic plan much. It does make the sub-query itself use another index (the one on recordtimestamp alone, rather than the combined index on leverancierid and recordtimestamp). With that changed subquery it estimates about 4173 rows against 4405 real rows. Actually, with both the adjusted and the original query it seems to favor the hash join over a nested loop, but the rest of the plan (for the subqueries) seems to be exactly the same.
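The function body was elided in the quote above; a minimal sketch of the kind of STABLE wrapper Tom describes might look like this (assuming last_dates has columns table_name of type text and max_date of type date; everything beyond the names visible in the query is assumed):

    -- Sketch only: the definition Tom actually posted was elided above.
    -- Assumes last_dates(table_name text, max_date date).
    CREATE FUNCTION get_last_date(text, integer) RETURNS date AS $$
        SELECT max_date - $2 FROM last_dates WHERE table_name = $1;
    $$ LANGUAGE sql STABLE;

    -- Used as:  WHERE ... t_1.recordtimestamp >= get_last_date('pricetracker', 60)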
Here is the first part of the explain analyze when it can do any trick it wants:

 Hash Join  (cost=7367.43..186630.19 rows=132426 width=12) (actual time=191.726..11072.025 rows=58065 loops=1)
   Hash Cond: ("outer".produktid = "inner".id)
   ->  Seq Scan on pwprijs chart_2  (cost=0.00..137491.07 rows=7692207 width=16) (actual time=0.018..6267.744 rows=7692207 loops=1)
   ->  Hash  (cost=7366.02..7366.02 rows=565 width=4) (actual time=123.265..123.265 rows=103 loops=1)
         ->  SetOp Intersect  (cost=7332.10..7360.37 rows=565 width=4) (actual time=115.760..123.192 rows=103 loops=1)
[snip]

And here is the first (and last) part when I disable hash joins or seq scans:

 Nested Loop  (cost=7334.92..517159.39 rows=132426 width=12) (actual time=111.905..512.575 rows=58065 loops=1)
   ->  SetOp Intersect  (cost=7332.10..7360.37 rows=565 width=4) (actual time=111.588..120.035 rows=103 loops=1)
[snip]
   ->  Bitmap Heap Scan on pwprijs chart_2  (cost=2.82..895.85 rows=234 width=16) (actual time=0.344..2.149 rows=564 loops=103)
         Recheck Cond: (chart_2.produktid = "outer".id)
         ->  Bitmap Index Scan on pwprijs_produktid_idx  (cost=0.00..2.82 rows=234 width=0) (actual time=0.189..0.189 rows=564 loops=103)
               Index Cond: (chart_2.produktid = "outer".id)

Is a nested loop normally so much (3x) more costly than a hash join? Or is it just this query that gets estimated wrongly?

Best regards,
Arjen
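For reference, the plan comparison above is the kind of result one gets with session-local planner toggles, along these lines (experiment-only settings, not something to set globally):

    SET enable_hashjoin = off;  -- forbid hash joins for this session
    SET enable_seqscan = off;   -- strongly discourage sequential scans
    -- (re-run the EXPLAIN ANALYZE here to see the alternative plan)
    RESET enable_hashjoin;
    RESET enable_seqscan;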
Re: [PERFORM] Inefficient queryplan for query with intersectable
Arjen van der Meijden [EMAIL PROTECTED] writes:
> But apparently there is a bug in the explain mechanism of the 8.1devel
> I'm using (I downloaded a nightly on 25 August, somewhere in the
> morning CEST), since it returned:
>   ERROR: bogus varno: 9

Yeah, someone else sent in a test case for this failure (or at least one with a similar symptom) yesterday. I'll try to fix it today.

> Is a nested loop normally so much (3x) more costly than a hash join?
> Or is it just this query that gets estimated wrongly?

There's been some discussion that we are overestimating the cost of nestloops in general, because we don't take into account that successive scans of the inner relation are likely to find many pages already in cache from the earlier scans. So far no one's come up with a good cost model to use for this, though.

regards, tom lane
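To spell out the cache effect described here (an illustrative simplification, not the planner's actual formula): the cost model charges the full inner-scan cost once per outer row,

    cost(nestloop) ~ cost(outer scan) + n_outer * cost(inner scan)

but after the first few iterations much of the inner relation may already sit in the buffer cache, so the real cost of each rescan is lower than cost(inner scan); charging the full amount every time is what inflates the estimate.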
Re: [PERFORM] Inefficient queryplan for query with intersectable
At 10:27 AM 8/27/2005, Tom Lane wrote:
> Arjen van der Meijden [EMAIL PROTECTED] writes:
>> Is a nested loop normally so much (3x) more costly than a hash join?
>> Or is it just this query that gets estimated wrongly?
>
> There's been some discussion that we are overestimating the cost of
> nestloops in general, because we don't take into account that
> successive scans of the inner relation are likely to find many pages
> already in cache from the earlier scans. So far no one's come up with a
> good cost model to use for this, though.

It certainly seems common in the EXPLAIN ANALYZE output I see that the (estimated) cost of a Nested Loop is far higher than the actual time measured.

What happened when someone tried the naive approach of telling the planner to estimate the cost of a nested loop based on fitting whatever entities are involved in the nested loop in RAM as much as possible? When there are multiple such mappings, use whichever one results in the lowest cost for the NL in question.

Clearly, this should lead to an underestimate of the constant cost factor of the operation involved, but since nested loops are the planner's only choice with a polynomial growth function, NLs should still have a decent chance of being more expensive than other choices under most circumstances. In addition, if those costs are based on actual measurements of how long it takes to do such scans, then the estimated cost has a decent chance of being fairly accurate under such circumstances.

It might not work well, but it seems like a reasonable first attempt at a solution?

Ron Peacetree
Re: [PERFORM] Weird performance drop after VACUUM
On Sat, Aug 27, 2005 at 11:05:01AM -0400, Tom Lane wrote:
> It could, but it doesn't really have enough information. We don't
> currently have any model that some operators are more expensive than
> others. IIRC the only sort of reordering the current code will do in a
> filter condition list is to push clauses involving sub-SELECTs to the
> end.

I was more thinking along the lines of reordering "a AND/OR b" to "b AND/OR a" if b has lower selectivity than a.

/* Steinar */
--
Homepage: http://www.sesse.net/
Re: [PERFORM] Weird performance drop after VACUUM
Steinar H. Gunderson [EMAIL PROTECTED] writes:
> On Sat, Aug 27, 2005 at 11:05:01AM -0400, Tom Lane wrote:
>> It could, but it doesn't really have enough information. We don't
>> currently have any model that some operators are more expensive than
>> others. IIRC the only sort of reordering the current code will do in a
>> filter condition list is to push clauses involving sub-SELECTs to the
>> end.
>
> I was more thinking along the lines of reordering "a AND/OR b" to
> "b AND/OR a" if b has lower selectivity than a.

Yeah, but if b is considerably more expensive to evaluate than a, that could still be a net loss. To do it correctly you really need to trade off cost of evaluation against selectivity, and the planner currently only knows something about the latter (and all too often, not enough :-(). I'd like to do this someday, but until we get some cost info in there I think it'd be a mistake to do much re-ordering of conditions. Currently the SQL programmer can determine what happens by writing his query carefully --- if we reorder based on selectivity only, we could make things worse, and there'd be no way to override it.

regards, tom lane
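A small worked example of that trade-off (numbers invented): suppose condition a costs 1 unit per row and passes 50% of rows, while b costs 100 units per row and passes 10%. For "a AND b" with short-circuit evaluation, testing a first gives an expected per-row cost of

    cost(a) + sel(a) * cost(b) = 1 + 0.5 * 100 = 51

while testing the more selective b first gives 100 + 0.1 * 1 = 100.1, so reordering by selectivity alone would pick the worse order here.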
Re: [PERFORM] Weird performance drop after VACUUM
Steinar H. Gunderson [EMAIL PROTECTED] writes:
> On Fri, Aug 26, 2005 at 07:31:51PM -0400, Tom Lane wrote:
>> Or you could just play with the order of the filter conditions ... for
>> example, the date condition at the end is probably far cheaper to test
>> than the text comparisons, so if that's fairly selective it'd be worth
>> putting it first.
>
> That's an interesting approach -- could the planner do such things
> itself?

It could, but it doesn't really have enough information. We don't currently have any model that some operators are more expensive than others. IIRC the only sort of reordering the current code will do in a filter condition list is to push clauses involving sub-SELECTs to the end.

regards, tom lane
Re: [PERFORM] Limit + group + join
Mark Kirkwood [EMAIL PROTECTED] writes:
> joinlimit=# EXPLAIN SELECT c.id FROM c JOIN b ON c_id=c.id GROUP BY
> c.id ORDER BY c.id DESC LIMIT 5;
> [ fails to pick an available index-scan-backward plan ]

I looked into this and found that indeed the desirable join plan was getting generated, but it wasn't picked because query_planner didn't have an accurate idea of how much of the join needed to be scanned to satisfy the GROUP BY step. I've committed some changes that hopefully will let 8.1 be smarter about GROUP BY ... LIMIT queries.

regards, tom lane
Re: [PERFORM] Limit + group + join
Tom Lane wrote:
> I looked into this and found that indeed the desirable join plan was
> getting generated, but it wasn't picked because query_planner didn't
> have an accurate idea of how much of the join needed to be scanned to
> satisfy the GROUP BY step. I've committed some changes that hopefully
> will let 8.1 be smarter about GROUP BY ... LIMIT queries.

Very nice :-)

joinlimit=# EXPLAIN SELECT c.id FROM c JOIN b ON c_id=c.id GROUP BY c.id ORDER BY c.id DESC LIMIT 5;
                                          QUERY PLAN
------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..15.23 rows=5 width=4)
   ->  Group  (cost=0.00..243730.00 rows=80000 width=4)
         ->  Nested Loop  (cost=0.00..243530.00 rows=80000 width=4)
               ->  Index Scan Backward using c_pkey on c  (cost=0.00..1450.00 rows=80000 width=4)
               ->  Index Scan using b_on_c on b  (cost=0.00..3.01 rows=1 width=4)
                     Index Cond: (b.c_id = "outer".id)
(6 rows)

This is 8.1devel from today.

regards
Mark
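For anyone wanting to reproduce this, here is a sketch of the kind of schema the plan implies (the original test case wasn't shown in full; only the names c_pkey and b_on_c are confirmed by the plan, everything else is assumed):

    -- Reconstruction of the apparent test schema; row counts are guesses.
    CREATE TABLE c (id integer PRIMARY KEY);              -- creates index c_pkey
    CREATE TABLE b (c_id integer REFERENCES c (id));
    CREATE INDEX b_on_c ON b (c_id);

    EXPLAIN SELECT c.id FROM c JOIN b ON c_id = c.id
    GROUP BY c.id ORDER BY c.id DESC LIMIT 5;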
Re: [PERFORM] Limit + group + join
[Tom Lane]
> I looked into this and (...) I've committed some changes that hopefully
> will let 8.1 be smarter about GROUP BY ... LIMIT queries.

[Mark Kirkwood]
> Very nice :-) (...) This is 8.1devel from today.

Splendid :-) Unfortunately we will not be upgrading for some months yet, but anyway I'm happy. This provides yet another good argument for upgrading sooner.

I'm also happy to see such a perfect match:

- A problem that can be reduced from complex and production-specific to simple and easily reproducible.
- Enthusiastic people testing it and pinpointing even more precisely what conditions trigger the problem.
- Programmers actually fixing the issue.
- Testers verifying that it was fixed.

Long live postgresql! :-)