[PERFORM] Seq scan on zero-parameters function
Hi! I'd like to know if this is expected behavior. These are two couples of queries. In each couple, the first one has a WHERE field = function() condition, just like the second one, but in the form WHERE field = (SELECT function()). In my opinion, both should have the same execution plan, as the function has no parameters and, therefore, is constant. I'm concerned about this, because the second form looks like a workaround. *** TESTED IN: PostgreSQL 7.4.1 on i686-pc-cygwin *** pgdb=# explain analyze select count(*) from t_students where period = (select current_period_id()); QUERY PLAN --- Aggregate (cost=127.84..127.84 rows=1 width=0) (actual time=1.000..1.000 rows=1 loops=1) InitPlan - Result (cost=0.00..0.01 rows=1 width=0) (actual time=1.000..1.000 rows=1 loops=1) - Index Scan using i_t_students__period on t_students (cost=0.00..127.71 rows=44 width=0) (actual time=1.000..1.000 rows=21 loop=1) Index Cond: (period = $0) Total runtime: 1.000 ms (6 rows) pgdb=# explain analyze select count(*) from t_students where period = (select current_period_id()); QUERY PLAN --- Aggregate (cost=127.84..127.84 rows=1 width=0) (actual time=1.000..1.000 rows=1 loops=1) InitPlan - Result (cost=0.00..0.01 rows=1 width=0) (actual time=1.000..1.000 rows=1 loops=1) - Index Scan using i_t_students__period on t_students (cost=0.00..127.71 rows=44 width=0) (actual time=1.000..1.000 rows=21 loop=1) Index Cond: (period = $0) Total runtime: 1.000 ms (6 rows) pgdb=# select version(); version --- PostgreSQL 7.4.1 on i686-pc-cygwin, compiled by GCC gcc (GCC) 3.3.1 (cygming special) (1 row) pgdb=# *** TESTED IN: PostgreSQL 7.3.4 on i386-redhat-linux-gnu *** pgdb=# explain analyze select count(*) from t_students where period = current_period_id(); QUERY PLAN --- Aggregate (cost=182.32..182.32 rows=1 width=0) (actual time=49077.38..49077.38 rows=1 loops=1) - Seq Scan on t_students (cost=0.00..182.22 rows=43 width=0) (actual time=17993.89..49077.13 rows=21 loops=1) Filter: (period = current_period_id()) Total runtime: 49077.61 msec (4 rows) pgdb=# explain analyze select count(*) from t_students where period = (select current_period_id()); QUERY PLAN -- Aggregate (cost=125.19..125.19 rows=1 width=0) (actual time=131.59..131.60 rows=1 loops=1) InitPlan - Result (cost=0.00..0.01 rows=1 width=0) (actual time=41.05..41.06 rows=1 loops=1) - Index Scan using i_t_students__period on t_students (cost=0.00..125.08 rows=43 width=0) (actual time=131.28..131.48 rows=21 loops=1) Index Cond: (period = $0) Total runtime: 131.95 msec (6 rows) pgdb=# select version(); version - PostgreSQL 7.3.4 on i386-redhat-linux-gnu, compiled by GCC 2.96 (1 row) -- Octavio Alvarez. E-mail: [EMAIL PROTECTED] Agradezco que sus correos sean enviados siempre a esta dirección. ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Seq scan on zero-parameters function
Tomasz Myrta said: Dnia 2004-02-06 08:19, U¿ytkownik Octavio Alvarez napisa³: In each couple, the first one has a WHERE field = function() condition, just like the second one, but in the form WHERE field = (SELECT function()). In my opinion, both should have the same execution plan, as the function has no parameters and, therefore, is constant. Nope. What would you say about function without params returning timeofday()? Is it constant? No... :-P ;-) If you are sure, that your function returns constant value - declare it as IMMUTABLE. (look at CREATE FUNCTION documentation) Thanks for the hint. In fact, my current_period_id() is based on time, but it should be constant along the query execution. I mean, I don't want some records filtered with some values and other with other values... I'll have an uncongruent recordset. Say SELECT [field-list] FROM [complex-join] WHERE sec = datepart('second', now()); Now suppose the query takes always more than 1 second because of the complex-join or whatever reason: I will naver have a congruent recordset. IMMUTABLE wouldn't help here, only wrapping the function in a subquery. Is this expected behavior? Is this standards compliant (if it can be qualified as such)? Octavio. -- Octavio Alvarez. E-mail: [EMAIL PROTECTED] Agradezco que sus correos sean enviados siempre a esta dirección. ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Increase performance of a UNION query that thakes 655.07 msec to be runned ?
Bruno BAGUETTE [EMAIL PROTECTED] writes: Do you see a way to get better performances with this query which takes currently 655.07 msec to be done. levure= explain analyze SELECT distinct lower(substr(l_name, 1, 1)) AS initiale FROM people levure- UNION levure- SELECT distinct lower(substr(org_name, 1, 1)) AS initiale FROM organizations levure- ORDER BY initiale; This is inherently a bit inefficient since the UNION implies a DISTINCT step, thus partially repeating the DISTINCT work done inside each SELECT. It would likely be a tad faster to drop the DISTINCTs from the subselects and rely on UNION to do the filtering. However, you're still gonna have a big SORT/UNIQUE step. As of PG 7.4 you could probably get a performance win by converting the thing to use GROUP BY instead of DISTINCT or UNION: select initiale from ( select lower(substr(l_name,1,1)) as initiale from people union all select lower(substr(org_name,1,1)) as initiale from organizations ) ss group by initiale order by initiale; This should use a HashAggregate to do the unique-ification. I think that will be faster than Sort/Unique. regards, tom lane ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Increase performance of a UNION query that thakes
On Fri, 6 Feb 2004, Bruno BAGUETTE wrote: I was thinking that a index on lower(substr(l_name, 1, 1)) and another index on lower(substr(org_name, 1, 1)) should gives better performances. When I've to create theses two indexes, it seems like this is not allowed : levure= CREATE INDEX firstchar_lastname_idx ON people(lower(substr(l_name, 1, 1))); ERROR: parser: parse error at or near ( at character 59 In 7.4, I believe you would say on people((lower(substr(l_name,1,1 but I'm not sure that index would really help in practice. Do you have another idea to get better performances ? In addition to what Tom said, the row estimates look suspiciously default. You mention vacuuming, but do you ever analyze the tables? Also, what do you have sort_mem set to? ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
RE : [PERFORM] Increase performance of a UNION query that thakes 655.07 msec to be runned ?
re-Hello, As suggested by Tom, I've removed the distinct and tried it's query : levure= explain analyze select initiale from ( levure( select lower(substr(l_name,1,1)) as initiale from people levure( union all levure( select lower(substr(org_name,1,1)) as initiale from organizations levure( ) ss levure- group by initiale order by initiale; QUERY PLAN - Group (cost=1018.48..1074.32 rows=1117 width=17) (actual time=783.47..867.61 rows=39 loops=1) - Sort (cost=1018.48..1046.40 rows=11167 width=17) (actual time=782.18..801.68 rows=11167 loops=1) Sort Key: initiale - Subquery Scan ss (cost=0.00..267.67 rows=11167 width=17) (actual time=0.23..330.31 rows=11167 loops=1) - Append (cost=0.00..267.67 rows=11167 width=17) (actual time=0.22..263.69 rows=11167 loops=1) - Subquery Scan *SELECT* 1 (cost=0.00..87.93 rows=4093 width=15) (actual time=0.22..79.51 rows=4093 loops=1) - Seq Scan on people (cost=0.00..87.93 rows=4093 width=15) (actual time=0.20..53.82 rows=4093 loops=1) - Subquery Scan *SELECT* 2 (cost=0.00..179.74 rows=7074 width=17) (actual time=0.24..146.12 rows=7074 loops=1) - Seq Scan on organizations (cost=0.00..179.74 rows=7074 width=17) (actual time=0.23..100.70 rows=7074 loops=1) Total runtime: 874.79 msec (10 rows) That seems to be 200 msec longer that my first query... Indeed, I've noticed something strange : now, if I rerun my initial query, I get worse runtime than this morning : levure= EXPLAIN ANALYZE SELECT lower(substr(l_name, 1, 1)) AS initiale FROM people levure- UNION levure- SELECT lower(substr(org_name, 1, 1)) AS initiale FROM organizations levure- ORDER BY initiale; QUERY PLAN Sort (cost=1130.85..1133.64 rows=1117 width=17) (actual time=802.52..802.58 rows=39 loops=1) Sort Key: initiale - Unique (cost=1018.48..1074.32 rows=1117 width=17) (actual time=712.04..801.83 rows=39 loops=1) - Sort (cost=1018.48..1046.40 rows=11167 width=17) (actual time=712.03..732.63 rows=11167 loops=1) Sort Key: initiale - Append (cost=0.00..267.67 rows=11167 width=17) (actual time=0.21..263.54 rows=11167 loops=1) - Subquery Scan *SELECT* 1 (cost=0.00..87.93 rows=4093 width=15) (actual time=0.20..80.47 rows=4093 loops=1) - Seq Scan on people (cost=0.00..87.93 rows=4093 width=15) (actual time=0.19..54.14 rows=4093 loops=1) - Subquery Scan *SELECT* 2 (cost=0.00..179.74 rows=7074 width=17) (actual time=0.28..144.82 rows=7074 loops=1) - Seq Scan on organizations (cost=0.00..179.74 rows=7074 width=17) (actual time=0.27..99.06 rows=7074 loops=1) Total runtime: 806.47 msec (11 rows) I don't understand why this runtime has changed because no data has been added/updated/deleted since several weeks (I'm working on a copy of the production database. And this copy is not accessible for the users). My PostgreSQL version is PostgreSQL 7.3.2, I have to ask to the administrator if it can be upgraded to 7.4 in the production server. Thanks in advance for your help. --- Bruno BAGUETTE - [EMAIL PROTECTED] ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
RE : [PERFORM] Increase performance of a UNION query that thakes 655.07 msec to be runned ?
In addition to what Tom said, the row estimates look suspiciously default. You mention vacuuming, but do you ever analyze the tables? I run VACUUM FULL ANALYZE with the postgres user on all the PostgreSQL databases on the server, twice a day, sometimes more. Also, what do you have sort_mem set to? [EMAIL PROTECTED] data]# cat postgresql.conf | grep sort_mem sort_mem = 6144 # min 64, size in KB Do you think I should increase that value ? It's not so easy to do a good setup of that postgresql.conf file, is there any tool that suggests some values for that ? Thanks in advance for your tips :-) --- Bruno BAGUETTE - [EMAIL PROTECTED] ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: RE : [PERFORM] Increase performance of a UNION query that thakes
On Fri, 6 Feb 2004, Bruno BAGUETTE wrote: In addition to what Tom said, the row estimates look suspiciously default. You mention vacuuming, but do you ever analyze the tables? I run VACUUM FULL ANALYZE with the postgres user on all the PostgreSQL databases on the server, twice a day, sometimes more. Wierd, because you're getting 1000 estimated on both people and organizations. What does pg_class have to say about those two tables? Also, what do you have sort_mem set to? [EMAIL PROTECTED] data]# cat postgresql.conf | grep sort_mem sort_mem = 6144 # min 64, size in KB Do you think I should increase that value ? Hmm, I'd expect that the sort would fit in that space in general. If you want to try different values, you can set sort_mem from psql rather than changing the configuration file. On my machine the index does actually help, although I needed to lower random_page_cost a little from its default of 4 to get it to use it preferentially, but I'm also getting times about 1/3 of yours (and my machine is pretty poor) so I think I may not have data that matches yours very well. ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] COPY with INDEXES question
On Thu, 2004-02-05 at 19:46, Slavisa Garic wrote: Hi, I have a quick question. In order to speed up insertion of large number of rows (100s of thousands) I replaced the INSERT with the COPY. This works fine but one question popped into my mind. Does copy updates indexes on that table if there are some defined? Copy does nearly everything that standard inserts to. RULES are the only thing that come to mind. Triggers, indexes, constraints, etc. are all applied. -- Rod Taylor rbt [at] rbt [dot] ca Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL PGP Key: http://www.rbt.ca/rbtpub.asc signature.asc Description: This is a digitally signed message part
Re: [PERFORM] [pgsql-advocacy] MySQL+InnoDB vs. PostgreSQL test?
Mike Nolan wrote: Seriously, I am tired of this kind of question. You gotta get bold enough to stand up in a meeting like that, say guy's, you can ask me how this compares to Oracle ... but if you're seriously asking me how this compares to MySQL, call me again when you've done your homework. Can they call you at the unemployment office? It might not work with the words I used above, but the point I tried to make is that the hardest thing you can sell is a no. I mean, not just saying no, but selling it in a way that the customer will not go with the next idiot who claims we can do that. If the customer has a stupid idea, like envisioning an enterprise solution based on ImSOL, there is no way you will be able to deliver it. Paying customer or not, you will fail if you bow to their strategic decisions and ignore knowing that the stuff they want to use just doesn't fit. That is absolutely not ImSOL specific. If someone comes to me and asks for a HA scenario with zero transaction loss during failover, we can discuss a little if this is really what he needs or not, but if he needs that, the solution will be Oracle or DB2, for sure I will not claim that PostgreSQL can do that, because it cannot. Jan -- #==# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #== [EMAIL PROTECTED] # ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] Database conversion woes...
First just wanted to say thank you all for the quick and helpful answers. With all the input I know I am on the right track. With that in mind I created a perl script to do my migrations and to do it based on moving from a db name to a schema name. I had done alot of the reading on converting based on the miss match of data types that MySQL likes to use. I must say it is VERY nice having a intelligent system that say won't let a date of '-00-00' be entered. Luckily I didn't have to deal with any enumerations. So the conversion goes on. I will definitely be back and forth in here as I get the new queries written and start migrating all I can back into the pg backend using plpgsql or c for the stored procedures where required. The mammoth replicator has been working well. I had tried the pgsql-r and had limited success with it, and dbmirror was just taking to long having to do 4 db transactions just to mirror one command. I have eserv but was never really a java kind of guy. Alright then - back to my code. Again thanks for the help and info. Kevin ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] Why is query selecting sequential?
Karl, SubPlan - Seq Scan on forumlog (cost=0.00..1.18 rows=1 width=8) Filter: ((login = '%s'::text) AND (forum = '%s'::text) AND (number = $0)) Why is the subplan using a sequential scan? At minimum the index on the post number (forumlog_number) should be used, no? What would be even better would be a set of indices that allow at least two (or even all three) of the keys in the inside SELECT to be used. It's using a seq scan because you have only 1 row in the table. Don't bother testing performance before your database is populated. PostgreSQL doesn't just use an index because it's there; it uses and index because it's faster than not using one. If there is more than one row in the table, then: 1) run ANALYZE forumlog; 2) Send us the EXPLAIN ANALYZE, not just the explain for the query. -- -Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Why is query selecting sequential?
Karl, Well, still with only 5 rows in the forumlog table you're not going get realistic results compared to a loaded database. However, you are making things difficult for the parser with awkward query syntax; what you currently have encourages a sequential loop. If there are potentially several rows in forumlog for each row in post, then your query won't work either. akcs= explain analyze select forum, (replied (select lastview from forumlog where forumlog.login='genesis' and forumlog.forum='General' and number=post.number)) as newflag, * from post where forum = 'General' and toppost = 1 order by pinned desc, replied desc; Instead: if only one row in forumlog per row in post: SELECT (replied lastview) AS newflag, post.* FROM post, forumlog WHERE post.forum = 'General' and toppost = 1 and forumlog.login = 'genesis' and forumlog.forum='General' and forumlog.number=post.number; -- -Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] 7.3 vs 7.4 performance
On Fri, 2004-02-06 at 02:43, Hannu Krosing wrote: Orion Henry kirjutas N, 05.02.2004 kell 07:16: I've done some testing of 7.3.4 vs 7.4.1 and found 7.4.1 to be 20%-30% slower than 7.3.4. Is this common knowledge or am I just unlucky with my query/data selection? Things of note that might matter: the machine is a dual Opteron 1.4GHz running Fedora Core 1 Test 1 for X86_64. The 7.3.4 was from the Fedora distro and the 7.4.1 was the PGDG package. Are you sure that it is not the case that it is not tha case that 7.3.4 is 64 bit and the PGDG package is 32 ? Yes sure... I don't know if they were compiled with differing optimizations or compilers though... The database is 3.5 Gigs with 10 millions rows and the machine had 1 Gig or ram. Oh... as a side note I'm happy to announce that the 2.6 Linux kernel has more than DOUBLED the speed of all my Postgres queries over the 2.4. =) Is this on this same hardware ? No. I havent gotten the 2.6 kernel working on the Opteron yet. The 2x speedup was on a dual Athlon 2GHz. signature.asc Description: This is a digitally signed message part
Re: [PERFORM] 7.3 vs 7.4 performance
On Wed, 2004-02-04 at 21:27, Josh Berkus wrote: Orion, I've done some testing of 7.3.4 vs 7.4.1 and found 7.4.1 to be 20%-30% slower than 7.3.4. Is this common knowledge or am I just unlucky with my query/data selection? No, it's not common knowledge. It should be the other way around. Perhaps it's the queries you picked? Even so . feel free to post individual EXPLAIN ANALYZEs to the list. Thank you... Here's one good example of 7.3 beating 7.4 soundly: Again this could me some compile option since I built the 7.4 RPM from source and I got the 7.3 from Fedora or something to do with the Opteron architecture. (Yes the compiled postgres is 64 bit) SELECT cid,media_name,media_type,count(*) as count,sum(a_amount) as a,sum(case when b_amount 0 then b_amount else 0 end) as b, sum(case when b_amount 0 then b_amount else 0 end) as c FROM transdata JOIN media_info ON (media = media_type) WHERE cid = 140100 AND demo is not null AND trans_date between date '2004-01-01' AND date_trunc('month',date '2004-01-01' + interval '32 days') GROUP BY cid,media_name,media_type; Here's 7.3's time and explain real 0m34.260s user 0m0.010s sys 0m0.000s --- Aggregate (cost=7411.88..7415.32 rows=17 width=25) - Group (cost=7411.88..7413.60 rows=172 width=25) - Sort (cost=7411.88..7412.31 rows=172 width=25) Sort Key: transdata.cid, media_info.media_name, transdata.media_type - Hash Join (cost=1.22..7405.50 rows=172 width=25) Hash Cond: (outer.media_type = inner.media) - Index Scan using transdata_date_index on transdata (cost=0.00..7401.27 rows=172 width=14) Index Cond: ((trans_date = ('2004-01-01'::date)::timestamp with time zone) AND (trans_date = ('2004-02-01 00:00:00'::timestamp without time zone)::timestamp with time zone)) Filter: ((cid = 140100) AND (demo IS NOT NULL)) - Hash (cost=1.18..1.18 rows=18 width=11) - Seq Scan on media_info (cost=0.00..1.18 rows=18 width=11) Here's 7.4's time and explain real 0m43.052s user 0m0.000s sys 0m0.020s QUERY PLAN --- HashAggregate (cost=8098.26..8098.29 rows=2 width=23) - Hash Join (cost=1.22..8095.48 rows=159 width=23) Hash Cond: (outer.media_type = inner.media) - Index Scan using transdata_date_index on transdata (cost=0.00..8091.87 rows=159 width=14) Index Cond: ((trans_date = ('2004-01-01'::date)::timestamp with time zone) AND (trans_date = ('2004-02-01 00:00:00'::timestamp without time zone)::timestamp with time zone)) Filter: ((cid = 140100) AND (demo IS NOT NULL)) - Hash (cost=1.18..1.18 rows=18 width=11) - Seq Scan on media_info (cost=0.00..1.18 rows=18 width=11) signature.asc Description: This is a digitally signed message part
Re: [PERFORM] 7.3 vs 7.4 performance
Orion, Here's one good example of 7.3 beating 7.4 soundly: Again this could me some compile option since I built the 7.4 RPM from source and I got the 7.3 from Fedora or something to do with the Opteron architecture. (Yes the compiled postgres is 64 bit) Need an EXPLAIN ANALYZE, not just an EXPLAIN. -- -Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])