Re: [PERFORM] Major differences between oracle and postgres performance - what can I do ?
Don't know about Oracle, but select-distinct in MSSQL2K will indeed throw away duplicates, which chops the CPU time. Very easy to see in the graphic query plan, both in terms of CPU and the number of rows retrieved from a single-node or nested-loop subtree. Definitely a worthwhile optimization. Tom Lane [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] Stephen Frost [EMAIL PROTECTED] writes: * Tom Lane ([EMAIL PROTECTED]) wrote: [... thinks for awhile ...] It seems possible that they may use sort code that knows it is performing a DISTINCT operation and discards duplicates on sight. Given that there are only 534 distinct values, the sort would easily stay in memory if that were happening. Could this optimization be added to PostgreSQL? It sounds like a very reasonable thing to do. That's what I was wondering about too. But first I'd like to get some kind of reading on how effective it would be. If someone can demonstrate that Oracle can do sort-and-drop-dups a lot faster than it can do a straight sort of the same amount of input data, that would be a strong indication that it's worth doing. At this point we don't know if that's the source of their win or not. regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED] ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
[PERFORM] Major differences between oracle and postgres performance - what can I do ?
I'm trying to migrate an application from an Oracle backend to PostgreSQL and have a performance question. The hardware for the database is the same, a SunFire v120, 2x73GB U2W SCSI disks, 1GB RAM, 650MHz US-IIe CPU. Running Solaris 8. The table in question has 541741 rows. Under Oracle, the query ' select distinct version from vers where version is not null ' returns 534 rows in 6.14 seconds, with an execution plan showing a table scan of vers followed by a sort. The explain output on postgres shows the same execution with a scan on vers and a sort but the query time is 78.6 seconds. The explain output from PostgreSQL is: QUERY PLAN - Unique (cost=117865.77..120574.48 rows=142 width=132) - Sort (cost=117865.77..119220.13 rows=541741 width=132) Sort Key: version - Seq Scan on vers (cost=0.00..21367.41 rows=541741 width=132) Filter: (version IS NOT NULL) I do have an index on the column in question but neither oracle nor postgresql choose to use it (which given that we're visiting all rows is perhaps not surprising). I'm not as familiar with postgresql as I am with Oracle but I think I've configured comparible buffering and sort area sizes, certainly there isn't much physical IO going on in either case. What can I do to speed up this query? Other queries are slightly slower than under Oracle on the same hardware but nothing like this. Thanks! G ___ALL-NEW Yahoo! Messenger - so many all-new ways to express yourself http://uk.messenger.yahoo.com ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Major differences between oracle and postgres performance
Gary Cowell wrote: I'm not as familiar with postgresql as I am with Oracle but I think I've configured comparible buffering and sort area sizes, certainly there isn't much physical IO going on in either case. People are going to want to know: 1. version of PG 2. explain analyse output, rather than just explain 3. What values you've used for the postgresql.conf file The actual plan from explain analyse isn't going to be much use - as you say, a scan of the whole table followed by sorting is the best you'll get. However, the actual costs of these steps might say something useful. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Major differences between oracle and postgres performance
Hi , I have similare problem and found that the problem is by pg sort. It is extremly slow by me. Also in my case I tryed to migrate one db from oracle to pg . To solve this problem I dinamicaly set sort_mem to some big value. In this case the sort is working into RAM and is relative fast. You can try this and remember sort mem is per sort, not per connection. In my migration I found the only advantage for oracle is the very good sort. regards, ivan. Gary Cowell wrote: --- [EMAIL PROTECTED] wrote: You can roughly estimate time spent for just scaning the table using something like this: select sum(version) from ... where version is not null and just select sum(version) from ... The results would be interesting to compare. To answer (I hope) everyones questions at once: 1) Oracle and postmaster were not running at the same time 2) The queries were run once, to cache as much as possible then run again to get the timing 3) Distinct vs. no distinct (i.e. sort performance). select length(version) from vers where version is not null; Time: 9748.174 ms select distinct(version) from vers where version is not null; Time: 67988.972 ms So about an extra 60 seconds with the distinct on. Here is the explain analyze output from psql: # explain analyze select distinct version from vers where version is not null; QUERY PLAN --- Unique (cost=117865.77..120574.48 rows=142 width=132) (actual time=63623.428..68269.111 rows=536 loops=1) - Sort (cost=117865.77..119220.13 rows=541741 width=132) (actual time=63623.417..66127.641 rows=541741 loops=1) Sort Key: "version" - Seq Scan on vers (cost=0.00..21367.41 rows=541741 width=132) (actual time=0.218..7214.903 rows=541741 loops=1) Filter: ("version" IS NOT NULL) Total runtime: 68324.215 ms (6 rows) Time: 68326.062 ms And the non-default .conf parameters: tcpip_socket = true max_connections = 100 password_encryption = true shared_buffers = 2000 sort_mem = 16384 vacuum_mem = 8192 effective_cache_size = 4000 syslog = 2 postgresql version is 7.4.3 compiled with GCC 3.3.2 on sun4u architecture. ___ALL-NEW Yahoo! Messenger - so many all-new ways to express yourself http://uk.messenger.yahoo.com ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Major differences between oracle and postgres performance - what can I do ?
* Tom Lane ([EMAIL PROTECTED]) wrote: [... thinks for awhile ...] It seems possible that they may use sort code that knows it is performing a DISTINCT operation and discards duplicates on sight. Given that there are only 534 distinct values, the sort would easily stay in memory if that were happening. Could this optimization be added to PostgreSQL? It sounds like a very reasonable thing to do. Hopefully there wouldn't be too much complexity needed to add it. Stephen signature.asc Description: Digital signature
Re: [PERFORM] Major differences between oracle and postgres performance - what can I do ?
Stephen Frost [EMAIL PROTECTED] writes: * Tom Lane ([EMAIL PROTECTED]) wrote: [... thinks for awhile ...] It seems possible that they may use sort code that knows it is performing a DISTINCT operation and discards duplicates on sight. Given that there are only 534 distinct values, the sort would easily stay in memory if that were happening. Could this optimization be added to PostgreSQL? It sounds like a very reasonable thing to do. That's what I was wondering about too. But first I'd like to get some kind of reading on how effective it would be. If someone can demonstrate that Oracle can do sort-and-drop-dups a lot faster than it can do a straight sort of the same amount of input data, that would be a strong indication that it's worth doing. At this point we don't know if that's the source of their win or not. regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Major differences between oracle and postgres performance - what can I do ?
* Tom Lane ([EMAIL PROTECTED]) wrote: Stephen Frost [EMAIL PROTECTED] writes: * Tom Lane ([EMAIL PROTECTED]) wrote: [... thinks for awhile ...] It seems possible that they may use sort code that knows it is performing a DISTINCT operation and discards duplicates on sight. Given that there are only 534 distinct values, the sort would easily stay in memory if that were happening. Could this optimization be added to PostgreSQL? It sounds like a very reasonable thing to do. That's what I was wondering about too. But first I'd like to get some kind of reading on how effective it would be. If someone can demonstrate that Oracle can do sort-and-drop-dups a lot faster than it can do a straight sort of the same amount of input data, that would be a strong indication that it's worth doing. At this point we don't know if that's the source of their win or not. Alright, I did a couple tests, these are different systems with different hardware, but in the end I think the difference is clear: tsf=# explain analyze select distinct access_type_id from p_gen_dom_dedicated_swc_access ; QUERY PLAN Unique (cost=321591.00..333205.56 rows=16 width=10) (actual time=32891.141..37420.429 rows=16 loops=1) - Sort (cost=321591.00..327398.28 rows=2322912 width=10) (actual time=32891.137..35234.810 rows=2322912 loops=1) Sort Key: access_type_id - Seq Scan on p_gen_dom_dedicated_swc_access (cost=0.00..55492.12 rows=2322912 width=10) (actual time=0.013..3743.470 rows=2322912 loops=1) Total runtime: 37587.519 ms (5 rows) tsf=# explain analyze select access_type_id from p_gen_dom_dedicated_swc_access order by access_type_id; QUERY PLAN -- Sort (cost=321591.00..327398.28 rows=2322912 width=10) (actual time=32926.696..35278.847 rows=2322912 loops=1) Sort Key: access_type_id - Seq Scan on p_gen_dom_dedicated_swc_access (cost=0.00..55492.12 rows=2322912 width=10) (actual time=0.014..3753.443 rows=2322912 loops=1) Total runtime: 36737.628 ms (4 rows) So, about the same from postgres in each case. From Oracle: (select access_type_id from p_gen_dom_dedicated_swc_access order by access_type_id) sauron:/home/sfrost time sqlplus mci_vendor/mci @test.sql /dev/null real3m55.12s user2m25.87s sys 0m10.59s (select distinct access_type_id from p_gen_dom_dedicated_swc_access) sauron:/home/sfrost time sqlplus mci_vendor/mci @test.sql /dev/null real0m5.08s user0m3.86s sys 0m0.95s All the queries were run multiple times, though there wasn't all that much difference in the times. Both systems are pretty speedy, but I tend to feel the Postgres box is faster in CPU/disk access time, which is probably why the Oracle system took 4 minutes to do what the Postgres systems does in 40 seconds. My only other concern is the Oracle system having to do the write I/O while the postgres one doesn't... I don't see an obvious way to get around that though, and I'm not sure if it'd really make *that* big of a difference. Stephen signature.asc Description: Digital signature
Re: [PERFORM] Major differences between oracle and postgres performance - what can I do ?
=?iso-8859-1?q?Gary=20Cowell?= [EMAIL PROTECTED] writes: So it seems the idea that oracle is dropping duplicate rows prior to the sort when using distinct may indeed be the case. Okay. We won't have any short-term solution for making DISTINCT do that, but if you are on PG 7.4 you could get the same effect from using GROUP BY: instead of select distinct version from vers where version is not null try select version from vers where version is not null group by version You should get a HashAggregate plan out of that, and I'd think it'd be pretty quick when there are not many distinct values of version. regards, tom lane ---(end of broadcast)--- TIP 8: explain analyze is your friend