Re: [PERFORM] performance comparission postgresql/ms-sql server
hi shridhar, > Heiko Kehlenbrink wrote: > >> hi list, >> >> i want to convince people to use postgresql instead of ms-sql server, so i >> set up a kind of comparission insert data / select data from postgresql / >> ms-sql server >> >> the table i use was pretty basic, >> >> id bigserial >> dist float8 >> x float8 >> y float8 >> z float8 >> >> i filled the table with a function which filled x,y,z with incremental increasing values (1,2,3,4,5,6...) and computing from that the dist value >> for every tupel (sqrt((x*x)+(y*y)+(z*z))). >> >> this works fine for both dbms >> >> postgresql needs 13:37 min for 10.000.000 tupel, >> ms-sql needs 1:01:27 h for 10.000.000 tupel. >> >> so far so good. >> >> i attached an index on the dist row and started to query the dbs with scripts which select a serial row of 100.000,200.000,500.000 tupels based >> on the dist row. >> i randomizly compute the start and the end distance and made a "select avg(dist) from table where dist > startdist and dist < enddist" > > Some basics to check quickly. > > 1. vacuum analyze the table before you start selecting. was done, > 2. for slow running queries, check explain analyze output and find out who takes > maximum time. [EMAIL PROTECTED]:~> psql -d test -c 'explain analyse select avg(dist) from massive2 where dist > (100*sqrt(3.0))::float8 and dist < (150*sqrt(3.0))::float8;' NOTICE: QUERY PLAN: Aggregate (cost=14884.61..14884.61 rows=1 width=8) (actual time=3133.24..3133.24 rows=1 loops=1) -> Index Scan using massive2_dist on massive2 (cost=0.00..13648.17 rows=494573 width=8) (actual time=0.11..2061.38 rows=49 loops=1) Total runtime: 3133.79 msec EXPLAIN seems to me that most time was needed for the index scanning... > 3. Check for typecasting. You need to typecast the query correctly e.g. > > select avg(dist) from table where dist >startdist::float8 and > dist > This might still end up with sequential scan depending upon the plan. but if > index scan is picked up, it might be plenty fast.. > nope, the dist row is float8 and the query-borders are float8 too, also the explain says that an index scann was done. > Post explain analyze for the queries if things don't improve. > see above.. > HTH > > Shridhar > best regards heiko > > ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] performance comparission postgresql/ms-sql server
Heiko Kehlenbrink wrote: i want to convince people to use postgresql instead of ms-sql server, so i set up a kind of comparission insert data / select data from postgresql / ms-sql server [...] do you have any hints like compiler-flags and so on to get the answering time from postgresql equal to ms-sql? (btw both dbms were running on exactly the same hardware) i use suse 8.1 postgresql 7.2 compiled from the rpms for using postgis, but that is anothe story... 1.5 gig ram 1.8 mhz intel cpu every help welcome Suse 8.1 comes with 2.4 series kernel I suppose. Many have witnessed a speed increase when using 2.6 series kernel. Might consider this too besides the newer PostgreSQL version already suggested. 2.6 has some scheduling options that are not enabled by default but may enhance database performance (http://story.news.yahoo.com/news?tmpl=story&cid=75&e=2&u=/nf/20040405/tc_nf/23603). Kaarel ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] performance comparission postgresql/ms-sql server
Heiko Kehlenbrink wrote: Hmm... I would suggest if you are testing, you should try 7.4.2. 7.4 has some good optimisation for hash agregates though I am not sure if it apply to averaging. would be the last option till we are runing other applications on that 7.2 system I can understand.. Also try forcing a seq. scan by turning off index scan. I guess index scan for so many rows is not exactly good thing even if tuple size if pretty small. a sequential scann gives me the following result: [EMAIL PROTECTED]:~> time psql -d test -c 'explain analyse select avg(dist) from massive2 where dist > 100*sqrt(3.0)::float8 and dist < 150*sqrt(3.0)::float8 ;' NOTICE: QUERY PLAN: Aggregate (cost=1193714.43..1193714.43 rows=1 width=8) (actual time=166718.54..166718.54 rows=1 loops=1) -> Seq Scan on massive2 (cost=0.00..1192478.00 rows=494573 width=8) (actual time=3233.22..165576.40 rows=49 loops=1) Total runtime: 166733.73 msec Certainly bad and not an option.. I can't think of anything offhand to speed this up.. Shridhar ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] performance comparission postgresql/ms-sql server
Heiko Kehlenbrink wrote: [EMAIL PROTECTED]:~> psql -d test -c 'explain analyse select avg(dist) from massive2 where dist > (100*sqrt(3.0))::float8 and dist < (150*sqrt(3.0))::float8;' NOTICE: QUERY PLAN: Aggregate (cost=14884.61..14884.61 rows=1 width=8) (actual time=3133.24..3133.24 rows=1 loops=1) -> Index Scan using massive2_dist on massive2 (cost=0.00..13648.17 rows=494573 width=8) (actual time=0.11..2061.38 rows=49 loops=1) Total runtime: 3133.79 msec EXPLAIN seems to me that most time was needed for the index scanning... Hmm... I would suggest if you are testing, you should try 7.4.2. 7.4 has some good optimisation for hash agregates though I am not sure if it apply to averaging. Also try forcing a seq. scan by turning off index scan. I guess index scan for so many rows is not exactly good thing even if tuple size if pretty small. Shridhar ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] performance comparission postgresql/ms-sql server
Heiko Kehlenbrink wrote: i use suse 8.1 postgresql 7.2 compiled from the rpms for using postgis, but that is Try v7.4, there are many performance improvements. It may not make up all the differences but it should help. ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] performance comparission postgresql/ms-sql server
"Heiko Kehlenbrink" <[EMAIL PROTECTED]> writes: > i use suse 8.1 > postgresql 7.2 compiled from the rpms for using postgis, but that is > anothe story... 7.4 might be a little quicker; but in any case you should be doing this sort of comparison using the current release, no? regards, tom lane ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] performance comparission postgresql/ms-sql server
Heiko Kehlenbrink wrote: hi list, i want to convince people to use postgresql instead of ms-sql server, so i set up a kind of comparission insert data / select data from postgresql / ms-sql server the table i use was pretty basic, id bigserial dist float8 x float8 y float8 z float8 i filled the table with a function which filled x,y,z with incremental increasing values (1,2,3,4,5,6...) and computing from that the dist value for every tupel (sqrt((x*x)+(y*y)+(z*z))). this works fine for both dbms postgresql needs 13:37 min for 10.000.000 tupel, ms-sql needs 1:01:27 h for 10.000.000 tupel. so far so good. i attached an index on the dist row and started to query the dbs with scripts which select a serial row of 100.000,200.000,500.000 tupels based on the dist row. i randomizly compute the start and the end distance and made a "select avg(dist) from table where dist > startdist and dist < enddist" Some basics to check quickly. 1. vacuum analyze the table before you start selecting. 2. for slow running queries, check explain analyze output and find out who takes maximum time. 3. Check for typecasting. You need to typecast the query correctly e.g. select avg(dist) from table where dist >startdist::float8 and dist This might still end up with sequential scan depending upon the plan. but if index scan is picked up, it might be plenty fast.. Post explain analyze for the queries if things don't improve. HTH Shridhar ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
[PERFORM] performance comparission postgresql/ms-sql server
hi list, i want to convince people to use postgresql instead of ms-sql server, so i set up a kind of comparission insert data / select data from postgresql / ms-sql server the table i use was pretty basic, id bigserial dist float8 x float8 y float8 z float8 i filled the table with a function which filled x,y,z with incremental increasing values (1,2,3,4,5,6...) and computing from that the dist value for every tupel (sqrt((x*x)+(y*y)+(z*z))). this works fine for both dbms postgresql needs 13:37 min for 10.000.000 tupel, ms-sql needs 1:01:27 h for 10.000.000 tupel. so far so good. i attached an index on the dist row and started to query the dbs with scripts which select a serial row of 100.000,200.000,500.000 tupels based on the dist row. i randomizly compute the start and the end distance and made a "select avg(dist) from table where dist > startdist and dist < enddist" Did the same with a table with 50.000.000 tupel in ms-sql and postgres. the outcome so far: 100.000 from 50.000.000: postgres: 0.88 sec ms-sql: 0.38 sec 200.000 from 50.000.000: postgres: 1.57 sec ms-sql: 0.54 sec 500.000 from 50.000.000: postgres: 3.66 sec ms-sql: 1.18 sec i try a lot of changes to the postgresql.conf regarding "Tuning PostgreSQL for performance" by Shridhar Daithankar, Josh Berkus which did not make a big diffrence to the answering times from postgresql. i'm pretty fine with the insert time... do you have any hints like compiler-flags and so on to get the answering time from postgresql equal to ms-sql? (btw both dbms were running on exactly the same hardware) i use suse 8.1 postgresql 7.2 compiled from the rpms for using postgis, but that is anothe story... 1.5 gig ram 1.8 mhz intel cpu every help welcome best regards heiko ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] performance comparission postgresql/ms-sql server
Heiko, > 100.000 from 50.000.000: > > postgres: 0.88 sec > ms-sql: 0.38 sec > > 200.000 from 50.000.000: > > postgres: 1.57 sec > ms-sql: 0.54 sec > > 500.000 from 50.000.000: > > postgres: 3.66 sec > ms-sql: 1.18 sec Questions: 1. Is this the time to return *all rows* or just the first row? Given the different way that PostgreSQL fetches rows to the client from MSSQL, it makes a difference. 2. What are your sort-mem and shared-mem settings? 3. Have you tried clustering the table? 4. Have you done a comparison of selecting random or scattered, instead of serial rows? MSSQL has a tendency to physically store rows in "order" which gives it a certain advantage in this kind of query. -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings