[PERFORM] partial index regarded more expensive
So, I have a table game with a timestamp attribute 'game_end', ranging from jan-2005 to present. The game table also has an attribute state, with live games being in state 2 and ended games being in state 4 (so, game_end + delta < now() usually means state=4). There are also an insignificant number of games in states 1 and 3. This query puzzles me:

  select * from game where game_end > '2005-07-30' and state in (3,4);

Now, one (at least me) should believe that the best index would be a partial index:

  resolved_game_by_date btree (game_end) WHERE ((state = 3) OR (state = 4))

NBET=> explain analyze select * from game where game_end > '2005-07-30' and state in (3,4);
                                 QUERY PLAN
---------------------------------------------------------------------------
 Index Scan using resolved_game_by_date on game  (cost=0.00..7002.87 rows=7147 width=555) (actual time=0.220..86.234 rows=3852 loops=1)
   Index Cond: (game_end > '2005-07-30 00:00:00'::timestamp without time zone)
   Filter: ((state = 3) OR (state = 4))
 Total runtime: 90.568 ms
(4 rows)

Since state has only two significant values, I wouldn't believe this index to be any good:

  game_by_state btree (state)

...and it seems like I'm right:

NBET=> explain analyze select * from game where game_end > '2005-07-30' and state in (3,4);
                                 QUERY PLAN
---------------------------------------------------------------------------
 Index Scan using game_by_state, game_by_state on game  (cost=0.00..4413.78 rows=7147 width=555) (actual time=0.074..451.771 rows=3851 loops=1)
   Index Cond: ((state = 3) OR (state = 4))
   Filter: (game_end > '2005-07-30 00:00:00'::timestamp without time zone)
 Total runtime: 457.132 ms
(4 rows)

Now, how can the planner believe the game_by_state index to be better? ('vacuum analyze game' did not significantly change the numbers, and I've run the queries several times with and without the game_by_state index to rule out caching effects.)

-- 
Tobias Brox
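For reference, the two indexes under comparison can be reconstructed roughly as follows (index, table, and column names are taken from the post; the exact DDL is an assumption):

  -- partial index: only rows already in a resolved state are indexed
  CREATE INDEX resolved_game_by_date ON game (game_end)
      WHERE state = 3 OR state = 4;

  -- plain index on the (almost two-valued) state column
  CREATE INDEX game_by_state ON game (state);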
Re: [PERFORM] partial index regarded more expensive
why not simply create an index on (game_end, state) ?
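A sketch of that suggestion (the index name here is hypothetical):

  CREATE INDEX game_by_end_state ON game (game_end, state);

  -- the planner can then check both the range condition on game_end
  -- and the state values from the same index
  SELECT * FROM game WHERE game_end > '2005-07-30' AND state IN (3,4);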
Re: [PERFORM] partial index regarded more expensive
[PFC - Wed at 08:15:13PM +0200]
> why not simply create an index on (game_end, state) ?

No, the planner prefers to use the partial index (I dropped the index on game(state)).

-- 
Tobias Brox, Nordicbet IT dept
Re: [PERFORM] Planner doesn't look at LIMIT?
I have a case that I thought was an example of this issue, and that this patch would correct. I applied this patch to an 8.0.3 source distribution, but it didn't seem to solve my problem. In a nutshell, I have a LIMIT query where the planner seems to favor a merge join over a nested loop. I've simplified the query as much as possible:

itvtrackdata3=> \d tableA
     Table "public.tableA"
 Column |   Type   | Modifiers
--------+----------+-----------
 foo    | bigint   | not null
 bar    | smallint | not null
 bap    | bigint   | not null
 bip    | bigint   | not null
 bom    | bigint   | not null
Indexes:
    "idx_tableA_bip" btree (bip) WHERE (bip = 900::bigint)
    "idx_tableA_foo" btree (foo)

itvtrackdata3=> \d tableB
     Table "tableB"
 Column |   Type   | Modifiers
--------+----------+-----------
 bim    | bigint   | not null
 bif    | smallint | not null
 baf    | smallint | not null
 bof    | smallint | not null
 buf    | smallint | not null
 foo    | bigint   | not null
Indexes:
    "idx_tableB_bim" btree (bim, foo)

itvtrackdata3=> set default_statistics_target to 1000;
SET
Time: 0.448 ms
itvtrackdata3=> analyze tableA;
ANALYZE
Time: 4237.151 ms
itvtrackdata3=> analyze tableB;
ANALYZE
Time: 46672.939 ms
itvtrackdata3=> explain analyze SELECT * FROM tableB NATURAL JOIN tableA WHERE bim >= 72555896091359 AND bim < 72555935412959 AND bim = bap ORDER BY bim ASC LIMIT 1;
                                 QUERY PLAN
---------------------------------------------------------------------------
 Limit  (cost=149626.57..252987.71 rows=1 width=50) (actual time=5684.013..5684.013 rows=1 loops=1)
   ->  Merge Join  (cost=149626.57..252987.71 rows=1 width=50) (actual time=5684.012..5684.012 rows=1 loops=1)
         Merge Cond: (("outer".bim = "inner".bap) AND ("outer".foo = "inner".foo))
         ->  Index Scan using idx_tableB_bim on tableB  (cost=0.00..97391.22 rows=55672 width=24) (actual time=0.017..0.059 rows=29 loops=1)
               Index Cond: ((bim >= 72555896091359::bigint) AND (bim < 72555935412959::bigint))
         ->  Sort  (cost=149626.57..151523.94 rows=758948 width=34) (actual time=5099.300..5442.825 rows=560856 loops=1)
               Sort Key: tableA.bap, tableA.foo
               ->  Seq Scan on tableA  (cost=0.00..47351.48 rows=758948 width=34) (actual time=0.021..1645.204 rows=758948 loops=1)
 Total runtime: 5706.655 ms
(9 rows)

Time: 5729.984 ms
itvtrackdata3=> set enable_mergejoin to false;
SET
Time: 0.373 ms
itvtrackdata3=> explain analyze SELECT * FROM tableB NATURAL JOIN tableA WHERE bim >= 72555896091359 AND bim < 72555935412959 AND bim = bap ORDER BY bim ASC LIMIT 1;
                                 QUERY PLAN
---------------------------------------------------------------------------
 Limit  (cost=0.00..432619.68 rows=1 width=50) (actual time=11.149..11.150 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..432619.68 rows=1 width=50) (actual time=11.148..11.148 rows=1 loops=1)
         Join Filter: ("outer".bim = "inner".bap)
         ->  Index Scan using idx_tableB_bim on tableB  (cost=0.00..97391.22 rows=55672 width=24) (actual time=0.017..0.062 rows=29 loops=1)
               Index Cond: ((bim >= 72555896091359::bigint) AND (bim < 72555935412959::bigint))
         ->  Index Scan using idx_tableA_foo on tableA  (cost=0.00..6.01 rows=1 width=34) (actual time=0.007..0.379 rows=1 loops=29)
               Index Cond: ("outer".foo = tableA.foo)
 Total runtime: 11.215 ms
(8 rows)

Time: 32.007 ms

Have I just flubbed the patch, or is there something else going on here?
Thanks,

--Ian

On Fri, 2005-07-22 at 12:20, Tom Lane wrote:
> I wrote:
> > Dawid Kuroczko [EMAIL PROTECTED] writes:
> > > qnex=# EXPLAIN SELECT * FROM log NATURAL JOIN useragents LIMIT 1;
> > > Limit  (cost=15912.20..15912.31 rows=1 width=272)
> > >   ->  Hash Join  (cost=15912.20..5328368.96 rows=47044336 width=272)
> > > If I set enable_hashjoin=false:
> > > qnex=# EXPLAIN ANALYZE SELECT * FROM log NATURAL LEFT JOIN useragents LIMIT 1;
> > > Limit  (cost=0.00..3.07 rows=1 width=272) (actual time=74.214..74.216 rows=1 loops=1)
> > >   ->  Nested Loop Left Join  (cost=0.00..144295895.01 rows=47044336 width=272) (actual time=74.204..74.204 rows=1 loops=1)
> > This is quite strange. The nestloop plan definitely should be preferred in the context of the LIMIT, considering that it has far lower estimated cost. And it is preferred in simple tests for me.
> After a suitable period of contemplating my navel, I figured out what is going on here: the total costs involved are large
Re: [PERFORM] Planner doesn't look at LIMIT?
Ian Westmacott [EMAIL PROTECTED] writes:
> In a nutshell, I have a LIMIT query where the planner seems to favor a merge join over a nested loop.

The planner is already estimating only one row out of the join, and so the LIMIT doesn't affect its cost estimates at all.

It appears to me that the reason the nestloop plan is fast is just chance: a suitable matching row is found very early in the scan of tableB, so that the indexscan on it can stop after 29 rows, instead of having to go through all 55000 rows in the given range of bim. If it'd have had to go through, say, half of the rows to find a match, the sort/merge plan would show up a lot better. If this wasn't chance, but was expected because there are many matching rows and not only one, then there's a statistical problem.

			regards, tom lane
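One way to check whether the one-row estimate is right (a hedged sketch: it simply counts how many rows actually satisfy the join conditions, reusing the constants from Ian's query):

  SELECT count(*)
  FROM tableB b JOIN tableA a USING (foo)
  WHERE b.bim >= 72555896091359 AND b.bim < 72555935412959
    AND b.bim = a.bap;

If this returns many rows rather than one, the early match in the nestloop was expected rather than lucky, which would point at the statistical problem Tom describes.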
[PERFORM] Speedier count(*)
I have a web page for my customers that shows them counts of records and some min/max date ranges in each table of a database, as this is how we bill them for service. They can log in and check the counts at any time. I'd like for the counts to be as fresh as possible by keeping this dynamic, but I will use a periodic 'snapshot'/cron job if that is the only option to speed this up. I have thought about using the table statistics, but the estimate error is probably unacceptable because of the billing purposes.

For some reason, the SQL Server we migrated the app from can return count(*) in a split second on multi-million row tables, even though it is a MUCH slower box hardware-wise, but it's now taking many seconds to run. I have read in the archives about the problems MVCC brings into the count(*) dilemma, forcing Pg to run a seq scan to get counts. Does SQL Server not use MVCC, or have they found another approach for arriving at this number? Compounding all the min/max and counts from the other tables, all those queries take about a minute to run. The tables will contain anywhere from 1 million to 40 million rows.

Also, I am using select ... group by ... order by .. limit 1 to get the min/max since I have already been bit by the issue of min()/max() being slower.

-Dan
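Dan's min/max workaround, sketched concretely (the table and column names here are hypothetical; on releases before 8.1, min()/max() always scan the whole table, while ORDER BY ... LIMIT 1 can walk a btree index on the column):

  -- instead of: SELECT min(created), max(created) FROM billing_log;
  SELECT created FROM billing_log ORDER BY created ASC  LIMIT 1;  -- min
  SELECT created FROM billing_log ORDER BY created DESC LIMIT 1;  -- max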
Re: [PERFORM] Speedier count(*)
> Also, I am using select ... group by ... order by .. limit 1 to get the min/max since I have already been bit by the issue of min() max() being slower.

This specific instance is fixed in 8.1.

Sincerely,

Joshua D. Drake
Re: [PERFORM] Speedier count(*)
On Wed, Aug 10, 2005 at 05:37:49PM -0600, Dan Harris wrote:
> Also, I am using select ... group by ... order by .. limit 1 to get the min/max since I have already been bit by the issue of min() max() being slower.

PostgreSQL 8.1 will have optimizations for certain MIN and MAX queries.

http://archives.postgresql.org/pgsql-committers/2005-04/msg00163.php
http://archives.postgresql.org/pgsql-committers/2005-04/msg00168.php

-- 
Michael Fuhr
Re: [PERFORM] Speedier count(*)
Dan Harris wrote:
> I have a web page for my customers that shows them count of records and some min/max date ranges in each table of a database, as this is how we bill them for service. They can log in and check the counts at any time. I'd like for the counts to be as fresh as possible by keeping this dynamic, but I will use a periodic 'snapshot'/cron job if that is the only option to speed this up. I have thought about using the table statistics, but the estimate error is probably unacceptable because of the billing purposes.
>
> For some reason, the SQL Server we migrated the app from can return count(*) in a split second on multi-million row tables, even though it is a MUCH slower box hardware-wise, but it's now taking many seconds to run. I have read in the archives the problems MVCC brings into the count(*) dilemma forcing Pg to run a seq scan to get counts. Does SQLServer not use MVCC or have they found another approach for arriving at this number? Compounding all the min/max and counts from other tables and all those queries take about a minute to run. The tables will contain anywhere from 1 million to 40 million rows.

I believe SQL Server doesn't use MVCC in the same way. At the very least, it stores some row information in the index, so it can get some info from just the index without having to go to the actual page (MVCC requires a visit to the main page to determine visibility).

Depending on how much it impacts performance, you can create an INSERT/UPDATE trigger so that whenever a new entry is added, it automatically updates a statistics table. It would be maintained as you go, rather than periodically like a cron job. I would go with cron if things can be slightly out of date (by an hour or so at least) and you need inserts/updates to not be slowed down. Otherwise I think the trigger is nicer, since it doesn't do redundant work and means everything stays up to date.

> Also, I am using select ... group by ... order by .. limit 1 to get the min/max since I have already been bit by the issue of min() max() being slower.
>
> -Dan

John
=:->
Re: [PERFORM] partial index regarded more expensive
Tobias Brox [EMAIL PROTECTED] writes:
> This query puzzles me:
> select * from game where game_end > '2005-07-30' and state in (3,4);
> ...
> Now, how can the planner believe the game_by_state-index to be better?

I suspect the problem has to do with lack of cross-column statistics. The planner does not know that state=4 is correlated with game_end, and it's probably coming up with some bogus guesses about the numbers of index rows visited in each case. You haven't given enough info to quantify this, though.

			regards, tom lane
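To see what the planner does know about the two columns, the per-column statistics can be inspected (a sketch; pg_stats is the standard system view for this):

  SELECT attname, n_distinct, most_common_vals, most_common_freqs
  FROM pg_stats
  WHERE tablename = 'game' AND attname IN ('state', 'game_end');

Each column is summarized independently, which is exactly why the correlation between state and game_end is invisible to the planner.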
[PERFORM] will deleting from a temp table always slow down the postmaster?
hi,

I have a situation here: I created a PL/pgSQL function that uses a temp table to store temporary data. Whenever I execute the function, I need to delete all the data inside the temp table first, but this slows down the search function if I keep the server running, because the old tuples are not really removed if I just use the DELETE command. So instead I use the DROP TABLE command and recreate the table.

My question is: will it slow down the postmaster if I continue to run this search function more than 300 times per day? The time to execute the search function gradually increases after I test it for a few days. Is there any way to test whether this is caused by the DROP TEMP TABLE and CREATE TEMP TABLE commands?

regards
ivan
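One alternative worth testing (a sketch, not a verified fix; the function and table names are hypothetical): TRUNCATE empties the table without leaving dead tuples behind, so the table does not have to be dropped and recreated on every call.

  CREATE OR REPLACE FUNCTION run_search() RETURNS void AS $$
  BEGIN
      -- TRUNCATE reclaims the space at once; DELETE leaves dead tuples
      -- that only VACUUM would remove.  EXECUTE also sidesteps PL/pgSQL's
      -- cached plans pointing at a dropped temp table's old OID.
      EXECUTE 'TRUNCATE TABLE search_tmp';
      -- ... populate search_tmp and run the search here ...
  END;
  $$ LANGUAGE plpgsql;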
Re: [PERFORM] Speedier count(*)
Hi Dan,

On Wed, 10 Aug 2005, Dan Harris wrote:
> I have a web page for my customers that shows them count of records and some min/max date ranges in each table of a database, as this is how we bill them for service. They can log in and check the counts at any time. I'd like for the counts to be as fresh as possible by keeping this dynamic, but I will use a periodic 'snapshot'/cron job if that is the only option to speed this up. I have thought about using the table statistics, but the estimate error is probably unacceptable because of the billing purposes.
>
> For some reason, the SQL Server we migrated the app from can return count(*) in a split second on multi-million row tables, even though it is a MUCH slower box hardware-wise, but it's now taking many seconds to run. I have read in the archives the problems MVCC brings into the count(*) dilemma forcing Pg to run a seq scan to get counts. Does SQLServer not use MVCC or have they found another

SQL Server probably jumps through a lot of hoops to do fast count(*)s. I'm sure we could do something similar -- it's just a question of complexity, resources, desirability, etc. There are other solutions, which makes the idea of doing it less attractive still.

> approach for arriving at this number? Compounding all the min/max and counts from other tables and all those queries take about a minute to run. The tables will contain anywhere from 1 million to 40 million rows.
>
> Also, I am using select ... group by ... order by .. limit 1 to get the min/max since I have already been bit by the issue of min() max() being slower.

I generally pre-generate the results. There are two ways to do this: the 'snapshot'/cron job you mentioned, or using rules and triggers to maintain 'count' tables. The idea is that if data is added, modified or removed from your table, you modify counters in these other tables.

Alternatively, feel free to post your schema and sample queries with explain analyze results to this list. Or jump on IRC at irc.freenode.net #postgresql and someone will be more than happy to look through the problem in more detail.

Thanks,

Gavin
Re: [PERFORM] Speedier count(*)
Here's a trigger I wrote to perform essentially the same purpose. The nice thing about this is it keeps the number up to date for you, but you do incur slight overhead.

CREATE TABLE test (id serial primary key, name varchar(20));
CREATE TABLE rowcount (tablename varchar(50), rowcount bigint DEFAULT 0);
CREATE INDEX rowcount_tablename ON rowcount(tablename);

CREATE OR REPLACE FUNCTION del_rowcount() RETURNS trigger AS $$
BEGIN
    -- a row is going away: decrement the counter for this table
    UPDATE rowcount SET rowcount = rowcount - 1 WHERE tablename = TG_RELNAME;
    RETURN OLD;
END;
$$ LANGUAGE PLPGSQL;

CREATE OR REPLACE FUNCTION add_rowcount() RETURNS trigger AS $$
BEGIN
    -- a row is being added: increment the counter for this table
    UPDATE rowcount SET rowcount = rowcount + 1 WHERE tablename = TG_RELNAME;
    RETURN NEW;
END;
$$ LANGUAGE PLPGSQL;

CREATE TRIGGER del_rowcount_tr BEFORE DELETE ON test
    FOR EACH ROW EXECUTE PROCEDURE del_rowcount();
CREATE TRIGGER add_rowcount_tr BEFORE INSERT ON test
    FOR EACH ROW EXECUTE PROCEDURE add_rowcount();

INSERT INTO rowcount (tablename) VALUES ('test');

root=# select * from test;
 id | name
----+------
(0 rows)

Time: 0.934 ms
root=# select * from rowcount;
 tablename | rowcount
-----------+----------
 test      |        0
(1 row)

Time: 0.630 ms
root=# insert into test (name) values ('blah');
INSERT 1190671626 1
Time: 3.278 ms
root=# select * from test;
 id | name
----+------
  5 | blah
(1 row)

Time: 0.612 ms
root=# select * from rowcount;
 tablename | rowcount
-----------+----------
 test      |        1
(1 row)

Time: 0.640 ms
root=# insert into test (name) values ('blah');
INSERT 1190671627 1
Time: 1.677 ms
root=# select * from test;
 id | name
----+------
  5 | blah
  6 | blah
(2 rows)

Time: 0.653 ms
root=# select * from rowcount;
 tablename | rowcount
-----------+----------
 test      |        2
(1 row)

Time: 0.660 ms
root=# delete from test where id = 6;
DELETE 1
Time: 2.412 ms
root=# select * from test;
 id | name
----+------
  5 | blah
(1 row)

Time: 0.631 ms
root=# select * from rowcount;
 tablename | rowcount
-----------+----------
 test      |        1
(1 row)

Time: 0.609 ms

One thing to be mindful of: TRUNCATE is NOT accounted for with this, and unfortunately the rule system doesn't allow truncate operations, so you can't work around it that way.

'njoy,
Mark

On 8/10/05 11:52 PM, Gavin Sherry [EMAIL PROTECTED] wrote:
> I generally pre-generate the results. There are two ways to do this: the 'snapshot'/cron job you mentioned, or using rules and triggers to maintain 'count' tables. The idea is that if data is added, modified or removed from your table, you modify counters in these other tables.
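Since TRUNCATE bypasses the triggers, one workaround sketch is to re-sync the counter by hand whenever a table is truncated (table names as in Mark's example):

  TRUNCATE test;
  -- the count is small (zero) right after a truncate, so this is cheap
  UPDATE rowcount
     SET rowcount = (SELECT count(*) FROM test)
   WHERE tablename = 'test';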