Re: [PERFORM] Major differences between oracle and postgres performance - what can I do ?

2004-06-29 Thread Mischa Sandberg
Don't know about Oracle, but select-distinct in MSSQL2K will indeed throw
away duplicates, which chops the CPU time. Very easy to see in the graphic
query plan, both in terms of CPU and the number of rows retrieved from a
single-node or nested-loop subtree. Definitely a worthwhile optimization.

Tom Lane [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]
 Stephen Frost [EMAIL PROTECTED] writes:
  * Tom Lane ([EMAIL PROTECTED]) wrote:
  [... thinks for awhile ...]  It seems possible that they may use sort
  code that knows it is performing a DISTINCT operation and discards
  duplicates on sight.  Given that there are only 534 distinct values,
  the sort would easily stay in memory if that were happening.

  Could this optimization be added to PostgreSQL?  It sounds like a very
  reasonable thing to do.

 That's what I was wondering about too.  But first I'd like to get
 some kind of reading on how effective it would be.  If someone can
 demonstrate that Oracle can do sort-and-drop-dups a lot faster than
 it can do a straight sort of the same amount of input data, that
 would be a strong indication that it's worth doing.  At this point
 we don't know if that's the source of their win or not.

 regards, tom lane

 ---(end of broadcast)---
 TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]




---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


[PERFORM] Major differences between oracle and postgres performance - what can I do ?

2004-06-18 Thread Gary Cowell
I'm trying to migrate an application from an Oracle
backend to PostgreSQL and have a performance question.

The hardware for the database is the same, a SunFire
v120, 2x73GB U2W SCSI  disks, 1GB RAM, 650MHz US-IIe
CPU. Running Solaris 8.

The table in question has 541741 rows. Under Oracle,
the query ' select distinct version from vers where
version is not null '  returns 534 rows in 6.14
seconds, with an execution plan showing a table scan
of vers followed by a sort.

The explain output on postgres shows the same
execution with a scan on vers and a sort but the query
time is 78.6 seconds.

The explain output from PostgreSQL is:
   QUERY PLAN
-
 Unique  (cost=117865.77..120574.48 rows=142
width=132)
   -  Sort  (cost=117865.77..119220.13 rows=541741
width=132)
 Sort Key: version
 -  Seq Scan on vers  (cost=0.00..21367.41
rows=541741 width=132)
   Filter: (version IS NOT NULL)

I do have an index on the column in question but
neither oracle nor postgresql choose to use it (which
given that we're visiting all rows is perhaps not
surprising).

I'm not as familiar with postgresql as I am with
Oracle but I think I've configured comparible
buffering and sort area sizes, certainly there isn't
much physical IO going on in either case.

What can I do to speed up this query? Other queries
are slightly slower than under Oracle on the same
hardware but nothing like this.

Thanks!

G





___ALL-NEW Yahoo! Messenger - 
so many all-new ways to express yourself http://uk.messenger.yahoo.com

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Major differences between oracle and postgres performance

2004-06-18 Thread Richard Huxton
Gary Cowell wrote:
I'm not as familiar with postgresql as I am with
Oracle but I think I've configured comparible
buffering and sort area sizes, certainly there isn't
much physical IO going on in either case.
People are going to want to know:
1. version of PG
2. explain analyse output, rather than just explain
3. What values you've used for the postgresql.conf file
The actual plan from explain analyse isn't going to be much use - as you 
say, a scan of the whole table followed by sorting is the best you'll 
get. However, the actual costs of these steps might say something useful.

--
  Richard Huxton
  Archonet Ltd
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] Major differences between oracle and postgres performance

2004-06-18 Thread pginfo




Hi ,
I have similare problem and found that the problem is by pg sort.
It is extremly slow by me.

Also in my case I tryed to migrate one db from oracle to pg .

To solve this problem I dinamicaly set sort_mem to some big value.
In this case the sort is working into RAM and is relative fast.
You can try this and remember sort mem is per sort, not per connection.

In my migration I found the only advantage for oracle is the very good sort.

regards,
ivan.

Gary Cowell wrote:

  --- [EMAIL PROTECTED] wrote:  You can roughly estimate time
spent for just scaning
  
  
the table using
something like this: 

	select sum(version) from ... where version is not
null

	and just 

	select sum(version) from ...

The results would be interesting to compare. 

  
  
To answer (I hope) everyones questions at once:

1) Oracle and postmaster were not running at the same
time
2) The queries were run once, to cache as much as
possible then run again to get the timing

3) Distinct vs. no distinct (i.e. sort performance).

select length(version) from vers where version is not
null;

Time: 9748.174 ms

select distinct(version) from vers where version is
not null;

Time: 67988.972 ms

So about an extra 60 seconds with the distinct on.

Here is the explain analyze output from psql:

# explain analyze select distinct version from vers
where version is not null;
  
 QUERY PLAN
---
 Unique  (cost=117865.77..120574.48 rows=142
width=132) (actual time=63623.428..68269.111 rows=536
loops=1)
   -  Sort  (cost=117865.77..119220.13 rows=541741
width=132) (actual time=63623.417..66127.641
rows=541741 loops=1)
 Sort Key: "version"
 -  Seq Scan on vers  (cost=0.00..21367.41
rows=541741 width=132) (actual time=0.218..7214.903
rows=541741 loops=1)
   Filter: ("version" IS NOT NULL)
 Total runtime: 68324.215 ms
(6 rows)

Time: 68326.062 ms


And the non-default .conf parameters:

tcpip_socket = true
max_connections = 100
password_encryption = true
shared_buffers = 2000
sort_mem = 16384 
vacuum_mem = 8192
effective_cache_size = 4000
syslog = 2 

postgresql version is 7.4.3
compiled with GCC 3.3.2 on sun4u architecture.





	
	
		
___ALL-NEW Yahoo! Messenger - so many all-new ways to express yourself http://uk.messenger.yahoo.com

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


  






Re: [PERFORM] Major differences between oracle and postgres performance - what can I do ?

2004-06-18 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 [... thinks for awhile ...]  It seems possible that they may use sort
 code that knows it is performing a DISTINCT operation and discards
 duplicates on sight.  Given that there are only 534 distinct values,
 the sort would easily stay in memory if that were happening.

Could this optimization be added to PostgreSQL?  It sounds like a very
reasonable thing to do.  Hopefully there wouldn't be too much complexity
needed to add it.

Stephen


signature.asc
Description: Digital signature


Re: [PERFORM] Major differences between oracle and postgres performance - what can I do ?

2004-06-18 Thread Tom Lane
Stephen Frost [EMAIL PROTECTED] writes:
 * Tom Lane ([EMAIL PROTECTED]) wrote:
 [... thinks for awhile ...]  It seems possible that they may use sort
 code that knows it is performing a DISTINCT operation and discards
 duplicates on sight.  Given that there are only 534 distinct values,
 the sort would easily stay in memory if that were happening.

 Could this optimization be added to PostgreSQL?  It sounds like a very
 reasonable thing to do.

That's what I was wondering about too.  But first I'd like to get
some kind of reading on how effective it would be.  If someone can
demonstrate that Oracle can do sort-and-drop-dups a lot faster than
it can do a straight sort of the same amount of input data, that
would be a strong indication that it's worth doing.  At this point
we don't know if that's the source of their win or not.

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Major differences between oracle and postgres performance - what can I do ?

2004-06-18 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 Stephen Frost [EMAIL PROTECTED] writes:
  * Tom Lane ([EMAIL PROTECTED]) wrote:
  [... thinks for awhile ...]  It seems possible that they may use sort
  code that knows it is performing a DISTINCT operation and discards
  duplicates on sight.  Given that there are only 534 distinct values,
  the sort would easily stay in memory if that were happening.
 
  Could this optimization be added to PostgreSQL?  It sounds like a very
  reasonable thing to do.
 
 That's what I was wondering about too.  But first I'd like to get
 some kind of reading on how effective it would be.  If someone can
 demonstrate that Oracle can do sort-and-drop-dups a lot faster than
 it can do a straight sort of the same amount of input data, that
 would be a strong indication that it's worth doing.  At this point
 we don't know if that's the source of their win or not.

Alright, I did a couple tests, these are different systems with
different hardware, but in the end I think the difference is clear:

tsf=# explain analyze select distinct access_type_id from 
p_gen_dom_dedicated_swc_access ;
   QUERY PLAN  
   
 

   
 Unique  (cost=321591.00..333205.56 rows=16 width=10) (actual 
time=32891.141..37420.429 rows=16 loops=1)
   -  Sort  (cost=321591.00..327398.28 rows=2322912 width=10) (actual 
time=32891.137..35234.810 rows=2322912 loops=1)
 Sort Key: access_type_id
 -  Seq Scan on p_gen_dom_dedicated_swc_access  (cost=0.00..55492.12 
rows=2322912 width=10) (actual time=0.013..3743.470 rows=2322912 loops=1)
 Total runtime: 37587.519 ms
(5 rows)

tsf=# explain analyze select access_type_id from p_gen_dom_dedicated_swc_access order 
by access_type_id;
QUERY PLAN 
   
--
 Sort  (cost=321591.00..327398.28 rows=2322912 width=10) (actual 
time=32926.696..35278.847 rows=2322912 loops=1)
   Sort Key: access_type_id 
   -  Seq Scan on p_gen_dom_dedicated_swc_access  (cost=0.00..55492.12 rows=2322912 
width=10) (actual time=0.014..3753.443 rows=2322912 loops=1)
 Total runtime: 36737.628 ms
(4 rows)

So, about the same from postgres in each case.  From Oracle:

(select access_type_id from p_gen_dom_dedicated_swc_access order by access_type_id)
sauron:/home/sfrost time sqlplus mci_vendor/mci @test.sql  /dev/null

real3m55.12s
user2m25.87s
sys 0m10.59s

(select distinct access_type_id from p_gen_dom_dedicated_swc_access)
sauron:/home/sfrost time sqlplus mci_vendor/mci @test.sql  /dev/null

real0m5.08s
user0m3.86s
sys 0m0.95s

All the queries were run multiple times, though there wasn't all that
much difference in the times.  Both systems are pretty speedy, but I
tend to feel the Postgres box is faster in CPU/disk access time, which
is probably why the Oracle system took 4 minutes to do what the Postgres
systems does in 40 seconds.  My only other concern is the Oracle system
having to do the write I/O while the postgres one doesn't...  I don't
see an obvious way to get around that though, and I'm not sure if it'd
really make *that* big of a difference.

Stephen


signature.asc
Description: Digital signature


Re: [PERFORM] Major differences between oracle and postgres performance - what can I do ?

2004-06-18 Thread Tom Lane
=?iso-8859-1?q?Gary=20Cowell?= [EMAIL PROTECTED] writes:
 So it seems the idea that oracle is dropping duplicate
 rows prior to the sort when using distinct may indeed
 be the case.

Okay.  We won't have any short-term solution for making DISTINCT do that,
but if you are on PG 7.4 you could get the same effect from using
GROUP BY: instead of
select distinct version from vers where version is not null
try
select version from vers where version is not null group by version
You should get a HashAggregate plan out of that, and I'd think it'd be
pretty quick when there are not many distinct values of version.

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend