Re: [PERFORM] Query Performance SQL Server vs. Postgresql
I did some further analysis and here are the results:

work_mem; response_time
1MB; 62 seconds
2MB; 2 seconds
4MB; 700 milliseconds
8MB; 550 milliseconds

In all cases shared_buffers were set to the default value of 32MB. As you can see, the 1 MB to 2 MB jump in work_mem does wonders. I probably don't need this to be any higher than 8 or 16 MB. Thanks to all for the help!

Humair

Date: Mon, 22 Nov 2010 12:00:15 +0100
Subject: Re: [PERFORM] Query Performance SQL Server vs. Postgresql
From: t...@fuzzy.cz
To: huma...@hotmail.com
CC: pgsql-performance@postgresql.org

> Correct, the optimizer did not take the settings with the pg_ctl reload command. I did a pg_ctl restart and work_mem now displays the updated value. I had to bump it up all the way to 2047 MB to get the response below (with work_mem at 1024 MB I see a 7 second response time); 2047 MB is the max value that can be set for work_mem - anything more than that results in a FATAL error because of the limit.

Hmm, can you post the explain plan for the case work_mem=1024MB? I guess the difference is due to caching. According to the explain analyze there are just cache hits, no reads. Anyway, the hash join uses only about 40MB of memory, so 1024MB should be perfectly fine and the explain plan should be exactly the same as with work_mem=2047MB. And the row estimates seem quite precise, so I don't think there's some severe overestimation.

Tomas
Re: [PERFORM] Query Performance SQL Server vs. Postgresql
On Mon, Nov 22, 2010 at 7:12 PM, Humair Mohammed <huma...@hotmail.com> wrote:
> I did some further analysis and here are the results:
> work_mem; response_time
> 1MB; 62 seconds
> 2MB; 2 seconds
> 4MB; 700 milliseconds
> 8MB; 550 milliseconds
> In all cases shared_buffers were set to the default value of 32MB. As you can see, the 1 MB to 2 MB jump in work_mem does wonders. I probably don't need this to be any higher than 8 or 16 MB. Thanks to all for the help!
> Humair

work_mem directly affects how the query is planned, because certain types of plans (hash joins and large sorts) require memory. Raising or lowering shared_buffers, OTOH, is very subtle and is not something you tune to improve the execution of a single query...

merlin

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
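Merlin's point can be checked directly in a psql session. The sketch below reuses the table and column names from this thread; the exact timings and batch counts will of course vary with the machine:

```sql
-- With a tiny work_mem, the hash side must be split into many batches
-- that spill to temp files ("Buckets: 1024  Batches: 64" in the plans above).
SET work_mem = '1MB';
EXPLAIN (ANALYZE ON, BUFFERS ON)
SELECT pb.id
FROM pivotbad pb
JOIN pivotgood pg
  ON pb.id = pg.id AND pb.question = pg.question
WHERE COALESCE(pb.response, 'MISSING') <> COALESCE(pg.response, 'MISSING');

-- With enough work_mem to hold the hash table, expect "Batches: 1"
-- and a much lower runtime for the same query.
SET work_mem = '64MB';
-- ... re-run the EXPLAIN above ...
RESET work_mem;
```

Because work_mem changes only the execution strategy, the estimated costs and plan shape can stay similar while the actual runtime changes dramatically, which is exactly what the thread observed.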
Re: [PERFORM] Query Performance SQL Server vs. Postgresql
That was a typo:

work_mem = 2GB
shared_buffers = 2GB

From: pavel.steh...@gmail.com
Date: Sun, 21 Nov 2010 12:38:43 +0100
Subject: Re: [PERFORM] Query Performance SQL Server vs. Postgresql
To: huma...@hotmail.com
CC: pgsql-performance@postgresql.org

2010/11/21 Humair Mohammed <huma...@hotmail.com>:
> 1) OS/Configuration
> 64-bit Windows 7 Enterprise with 8G RAM and a Dual Core 2.67 GHz Intel CPU
> postgresql-x64-9.0 (PostgreSQL 9.0.1, compiled by Visual C++ build 1500, 64-bit)
> work_mem 2GB
> shared_buffers = 2

shared_buffers = 2 ???

Regards
Pavel Stehule

> 2) Dataset
> name,pages,tuples,pg_size_pretty
> pivotbad;1870;93496;15 MB
> pivotgood;5025;251212;39 MB
>
> 3) EXPLAIN (ANALYZE ON, BUFFERS ON)
> Hash Join  (cost=16212.30..52586.43 rows=92869 width=17) (actual time=25814.222..32296.765 rows=3163 loops=1)
>   Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text = (pg.question)::text))
>   Join Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text <> (COALESCE(pg.response, 'MISSING'::character varying))::text)
>   Buffers: shared hit=384 read=6511, temp read=6444 written=6318
>   ->  Seq Scan on pivotbad pb  (cost=0.00..2804.96 rows=93496 width=134) (actual time=0.069..37.143 rows=93496 loops=1)
>         Buffers: shared hit=192 read=1678
>   ->  Hash  (cost=7537.12..7537.12 rows=251212 width=134) (actual time=24621.752..24621.752 rows=251212 loops=1)
>         Buckets: 1024  Batches: 64  Memory Usage: 650kB
>         Buffers: shared hit=192 read=4833, temp written=4524
>         ->  Seq Scan on pivotgood pg  (cost=0.00..7537.12 rows=251212 width=134) (actual time=0.038..117.780 rows=251212 loops=1)
>               Buffers: shared hit=192 read=4833
> Total runtime: 32297.305 ms
>
> 4) INDEXES
> I can certainly add an index, but given the table sizes I am not sure if that is a factor. This by no means is a large dataset - less than 350,000 rows in total and 3 columns. Also, this was just a quick dump of data for comparison purposes.
When I saw the poor performance on the COALESCE, I pointed the data load to SQL Server and ran the same query, except with the T-SQL-specific ISNULL function.
Re: [PERFORM] Query Performance SQL Server vs. Postgresql
Correct, the optimizer did not take the settings with the pg_ctl reload command. I did a pg_ctl restart and work_mem now displays the updated value. I had to bump it up all the way to 2047 MB to get the response below (with work_mem at 1024 MB I see a 7 second response time); 2047 MB is the max value that can be set for work_mem - anything more than that results in a FATAL error because of the limit. The batches and memory usage are reflecting the right behavior with these settings. Thanks for everyone's input - the result is now matching what SQL Server was producing.

Hash Join  (cost=11305.30..39118.43 rows=92869 width=17) (actual time=145.888..326.216 rows=3163 loops=1)
  Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text = (pg.question)::text))
  Join Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text <> (COALESCE(pg.response, 'MISSING'::character varying))::text)
  Buffers: shared hit=6895
  ->  Seq Scan on pivotbad pb  (cost=0.00..2804.96 rows=93496 width=134) (actual time=0.011..11.903 rows=93496 loops=1)
        Buffers: shared hit=1870
  ->  Hash  (cost=7537.12..7537.12 rows=251212 width=134) (actual time=145.673..145.673 rows=251212 loops=1)
        Buckets: 32768  Batches: 1  Memory Usage: 39939kB
        Buffers: shared hit=5025
        ->  Seq Scan on pivotgood pg  (cost=0.00..7537.12 rows=251212 width=134) (actual time=0.004..26.242 rows=251212 loops=1)
              Buffers: shared hit=5025
Total runtime: 331.168 ms

Humair

CC: t...@fuzzy.cz; huma...@hotmail.com; pavel.steh...@gmail.com; pgsql-performance@postgresql.org
From: robertmh...@gmail.com
Subject: Re: [PERFORM] Query Performance SQL Server vs. Postgresql
Date: Sun, 21 Nov 2010 13:55:54 -0500
To: t...@sss.pgh.pa.us

On Nov 21, 2010, at 12:16 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> t...@fuzzy.cz writes:
>>> Second, I modified the work_mem setting to 2GB (reloaded config) and I see a response time of 38 seconds. Results below from EXPLAIN ANALYZE:
>> How did you reload the config? Using 'kill -HUP pid'?
>> That should work fine. Have you checked 'work_mem' after the reload? Because the explain plans are exactly the same (structure, estimated costs). The really interesting bit is this, and it did not change at all:
>> Buckets: 1024  Batches: 64  Memory Usage: 650kB
> If that didn't change, I'm prepared to bet that the OP didn't actually manage to change the active value of work_mem.

Yep. All this speculation about slow disks and/or COALESCE strikes me as likely totally off-base. I think the original poster needs to run "show work_mem" right before the EXPLAIN ANALYZE to make sure the new value they set actually stuck. There's no reason for the planner to have used only 650kB if work_mem is set to anything >= 2MB.

...Robert
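The sanity check Robert suggests is a one-liner in the same session that will run the query; a sketch:

```sql
-- Confirm the value actually in effect for *this* session,
-- right before running EXPLAIN ANALYZE:
SHOW work_mem;
-- equivalently:
SELECT current_setting('work_mem');
```

If this still shows the old value after editing postgresql.conf and reloading, the session did not pick up the change (which is what the original poster eventually confirmed - a full pg_ctl restart made the new value stick).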
Re: [PERFORM] Query Performance SQL Server vs. Postgresql
> Correct, the optimizer did not take the settings with the pg_ctl reload command. I did a pg_ctl restart and work_mem now displays the updated value. I had to bump it up all the way to 2047 MB to get the response below (with work_mem at 1024 MB I see a 7 second response time); 2047 MB is the max value that can be set for work_mem - anything more than that results in a FATAL error because of the limit.

Hmm, can you post the explain plan for the case work_mem=1024MB? I guess the difference is due to caching. According to the explain analyze there are just cache hits, no reads. Anyway, the hash join uses only about 40MB of memory, so 1024MB should be perfectly fine and the explain plan should be exactly the same as with work_mem=2047MB. And the row estimates seem quite precise, so I don't think there's some severe overestimation.

Tomas
Re: [PERFORM] Query Performance SQL Server vs. Postgresql
On Sun, Nov 21, 2010 at 10:21 PM, Humair Mohammed <huma...@hotmail.com> wrote:
> Correct, the optimizer did not take the settings with the pg_ctl reload command. I did a pg_ctl restart and work_mem now displays the updated value. I had to bump it up all the way to 2047 MB to get the response below (with work_mem at 1024 MB I see a 7 second response time); 2047 MB is the max value that can be set for work_mem - anything more than that results in a FATAL error because of the limit. The batches and memory usage are reflecting the right behavior with these settings. Thanks for everyone's input - the result is now matching what SQL Server was producing.

I believe you can set work_mem to a different value just for the duration of a single query, so you needn't have work_mem set so high for every query on the system. A single query may well use a multiple of work_mem, so you really probably don't want it that high all the time unless all of your queries are structured similarly. Just:

set work_mem='2047MB'; query; reset all;

But you should wait until someone more knowledgeable than I confirms what I just wrote.
Re: [PERFORM] Query Performance SQL Server vs. Postgresql
> I believe you can set work_mem to a different value just for the duration of a single query, so you needn't have work_mem set so high for every query on the system. A single query may well use a multiple of work_mem, so you really probably don't want it that high all the time unless all of your queries are structured similarly. Just:
>
> set work_mem='2047MB'; query; reset all;

Yes, executing set work_mem='64MB' right before the query should be just fine. Setting work_mem to 2GB is overkill most of the time (99.9%).

Tomas
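The per-query pattern described above can also be scoped to a single transaction with SET LOCAL, so the setting cannot leak into later queries even if you forget to reset it. A sketch (the 64MB value follows Tomas's suggestion; the query is the one from this thread):

```sql
BEGIN;
-- Applies only until COMMIT/ROLLBACK, then reverts automatically:
SET LOCAL work_mem = '64MB';
SELECT pb.id
FROM pivotbad pb
JOIN pivotgood pg
  ON pb.id = pg.id AND pb.question = pg.question
WHERE COALESCE(pb.response, 'MISSING') <> COALESCE(pg.response, 'MISSING');
COMMIT;
```

This is usually safer than a session-wide SET, since every concurrent sort and hash operation gets its own work_mem allowance and a large global value can exhaust memory under load.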
Re: [PERFORM] Query Performance SQL Server vs. Postgresql
2010/11/21 Humair Mohammed <huma...@hotmail.com>:
> 1) OS/Configuration
> 64-bit Windows 7 Enterprise with 8G RAM and a Dual Core 2.67 GHz Intel CPU
> postgresql-x64-9.0 (PostgreSQL 9.0.1, compiled by Visual C++ build 1500, 64-bit)
> work_mem 2GB
> shared_buffers = 2

shared_buffers = 2 ???

Regards
Pavel Stehule

> 2) Dataset
> name,pages,tuples,pg_size_pretty
> pivotbad;1870;93496;15 MB
> pivotgood;5025;251212;39 MB
>
> 3) EXPLAIN (ANALYZE ON, BUFFERS ON)
> Hash Join  (cost=16212.30..52586.43 rows=92869 width=17) (actual time=25814.222..32296.765 rows=3163 loops=1)
>   Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text = (pg.question)::text))
>   Join Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text <> (COALESCE(pg.response, 'MISSING'::character varying))::text)
>   Buffers: shared hit=384 read=6511, temp read=6444 written=6318
>   ->  Seq Scan on pivotbad pb  (cost=0.00..2804.96 rows=93496 width=134) (actual time=0.069..37.143 rows=93496 loops=1)
>         Buffers: shared hit=192 read=1678
>   ->  Hash  (cost=7537.12..7537.12 rows=251212 width=134) (actual time=24621.752..24621.752 rows=251212 loops=1)
>         Buckets: 1024  Batches: 64  Memory Usage: 650kB
>         Buffers: shared hit=192 read=4833, temp written=4524
>         ->  Seq Scan on pivotgood pg  (cost=0.00..7537.12 rows=251212 width=134) (actual time=0.038..117.780 rows=251212 loops=1)
>               Buffers: shared hit=192 read=4833
> Total runtime: 32297.305 ms
>
> 4) INDEXES
> I can certainly add an index, but given the table sizes I am not sure if that is a factor. This by no means is a large dataset - less than 350,000 rows in total and 3 columns. Also, this was just a quick dump of data for comparison purposes. When I saw the poor performance on the COALESCE, I pointed the data load to SQL Server and ran the same query except with the T-SQL-specific ISNULL function.
Re: [PERFORM] Query Performance SQL Server vs. Postgresql
> 4) INDEXES
> I can certainly add an index, but given the table sizes I am not sure if that is a factor. This by no means is a large dataset - less than 350,000 rows in total and 3 columns. Also, this was just a quick dump of data for comparison purposes. When I saw the poor performance on the COALESCE, I pointed the data load to SQL Server and ran the same query except with the T-SQL-specific ISNULL function.

350k rows definitely is a lot of rows, although with 3 INT columns it's just about 13MB of data (including overhead). But indexes can be quite handy when doing joins, as in this case.

Tomas
Re: [PERFORM] Query Performance SQL Server vs. Postgresql
Pavel Stehule wrote:
> 2010/11/21 Humair Mohammed :
>> shared_buffers = 2
>
> shared_buffers = 2 ???

Yeah, if that's not a typo, that's a very serious misconfiguration. With 8 GB of RAM in the machine, this should probably be set to somewhere between 200 MB and 2 GB, depending on your workload and what else is running on the machine. Please read through this page and make use of the information:

http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

-Kevin
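In postgresql.conf, Kevin's suggestion would look something like the fragment below. The specific numbers are illustrative for an 8 GB machine, not values from the thread, and shared_buffers requires a full server restart (not just a reload) to take effect:

```ini
# postgresql.conf - illustrative values for an 8 GB machine
shared_buffers = 512MB     # somewhere in the 200MB-2GB range suggested above
work_mem = 16MB            # per-sort/per-hash budget; see the work_mem discussion
```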
Re: [PERFORM] Query Performance SQL Server vs. Postgresql
> First, I modified the work_mem setting to 1GB (reloaded config) from the default 1MB and I see a response time of 33 seconds. Results below from EXPLAIN ANALYZE: ...
>
> Second, I modified the work_mem setting to 2GB (reloaded config) and I see a response time of 38 seconds. Results below from EXPLAIN ANALYZE: ...

How did you reload the config? Using 'kill -HUP pid'? That should work fine. Have you checked 'work_mem' after the reload? Because the explain plans are exactly the same (structure, estimated costs). The really interesting bit is this, and it did not change at all:

Buckets: 1024  Batches: 64  Memory Usage: 650kB

As Tom Lane already mentioned, splitting a hash join into batches (due to small memory) adds overhead; the optimal number of batches is 1. But I guess 1GB of work_mem is overkill - something like 64MB should be fine. The suspicious thing is that the query plans have not changed at all (especially the number of batches). I think you're not telling us something important (unintentionally, of course).

> By no means am I trying to compare the 2 products. When I noticed the slow behavior of COALESCE I tried it on SQL Server. And since they are running on the same machine, hence my comment regarding apples to apples. It is possible that this is not an apples to apples comparison other than the fact that it is running on the same machine.

OK. The point of my post was that you've provided very little info about the settings etc., so it was difficult to identify why PostgreSQL is so slow.

Tomas
Re: [PERFORM] Query Performance SQL Server vs. Postgresql
> 4) INDEXES
> I can certainly add an index, but given the table sizes I am not sure if that is a factor. This by no means is a large dataset - less than 350,000 rows in total and 3 columns. Also, this was just a quick dump of data for comparison purposes. When I saw the poor performance on the COALESCE, I pointed the data load to SQL Server and ran the same query except with the T-SQL-specific ISNULL function.

350k rows definitely is a lot of rows, although with 3 INT columns it's just about 13MB of data (including overhead). But indexes can be quite handy when doing joins, as in this case.

OK, I've just realized the tables have 3 character columns, not integers. In that case the tables are probably much bigger (and there are things like TOAST). In that case indexes may be even more important.

Tomas
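Adding indexes as suggested here would look something like the sketch below. The multi-column choice is inferred from the join keys in the query posted earlier in the thread, not stated by the original poster, and the planner may still prefer a hash join for a full-table comparison like this one:

```sql
-- Multi-column indexes matching the equality join keys used by the query:
CREATE INDEX pivotbad_id_question_idx  ON pivotbad  (id, question);
CREATE INDEX pivotgood_id_question_idx ON pivotgood (id, question);

-- Refresh statistics, then re-check the plan:
ANALYZE pivotbad;
ANALYZE pivotgood;
```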
Re: [PERFORM] Query Performance SQL Server vs. Postgresql
Pavel Stehule wrote:
> 2010/11/21 Humair Mohammed :
>> shared_buffers = 2
>
> shared_buffers = 2 ???

> Yeah, if that's not a typo, that's a very serious misconfiguration.

I guess that's a typo, as the explain plan in one of the previous posts contains "Buffers: shared hit=192 read=4833" for a sequential scan. But I still don't know why the query is so slow :-(

regards
Tomas
Re: [PERFORM] Query Performance SQL Server vs. Postgresql
t...@fuzzy.cz writes:
>> Second, I modified the work_mem setting to 2GB (reloaded config) and I see a response time of 38 seconds. Results below from EXPLAIN ANALYZE:
>
> How did you reload the config? Using 'kill -HUP pid'? That should work fine. Have you checked 'work_mem' after the reload? Because the explain plans are exactly the same (structure, estimated costs). The really interesting bit is this, and it did not change at all:
>
> Buckets: 1024  Batches: 64  Memory Usage: 650kB

If that didn't change, I'm prepared to bet that the OP didn't actually manage to change the active value of work_mem.

regards, tom lane
Re: [PERFORM] Query Performance SQL Server vs. Postgresql
1) OS/Configuration
64-bit Windows 7 Enterprise with 8G RAM and a Dual Core 2.67 GHz Intel CPU
postgresql-x64-9.0 (PostgreSQL 9.0.1, compiled by Visual C++ build 1500, 64-bit)
work_mem 2GB
shared_buffers = 2

2) Dataset
name,pages,tuples,pg_size_pretty
pivotbad;1870;93496;15 MB
pivotgood;5025;251212;39 MB

3) EXPLAIN (ANALYZE ON, BUFFERS ON)
Hash Join  (cost=16212.30..52586.43 rows=92869 width=17) (actual time=25814.222..32296.765 rows=3163 loops=1)
  Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text = (pg.question)::text))
  Join Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text <> (COALESCE(pg.response, 'MISSING'::character varying))::text)
  Buffers: shared hit=384 read=6511, temp read=6444 written=6318
  ->  Seq Scan on pivotbad pb  (cost=0.00..2804.96 rows=93496 width=134) (actual time=0.069..37.143 rows=93496 loops=1)
        Buffers: shared hit=192 read=1678
  ->  Hash  (cost=7537.12..7537.12 rows=251212 width=134) (actual time=24621.752..24621.752 rows=251212 loops=1)
        Buckets: 1024  Batches: 64  Memory Usage: 650kB
        Buffers: shared hit=192 read=4833, temp written=4524
        ->  Seq Scan on pivotgood pg  (cost=0.00..7537.12 rows=251212 width=134) (actual time=0.038..117.780 rows=251212 loops=1)
              Buffers: shared hit=192 read=4833
Total runtime: 32297.305 ms

4) INDEXES
I can certainly add an index, but given the table sizes I am not sure if that is a factor. This by no means is a large dataset - less than 350,000 rows in total and 3 columns. Also, this was just a quick dump of data for comparison purposes. When I saw the poor performance on the COALESCE, I pointed the data load to SQL Server and ran the same query except with the T-SQL-specific ISNULL function.
Re: [PERFORM] Query Performance SQL Server vs. Postgresql
I am running 64-bit Windows 7 Enterprise with 8G RAM and a Dual Core 2.67 GHz Intel CPU. Both SQL 2008 R2 and Postgresql are installed on the same machine. The IS DISTINCT FROM instead of the COALESCE does not help much. I ran 2 further tests with work_mem modifications (please note memory usage is quite low - 650kB - so I am not sure if the work_mem is a factor):

First, I modified the work_mem setting to 1GB (reloaded config) from the default 1MB and I see a response time of 33 seconds. Results below from EXPLAIN ANALYZE:

Hash Join  (cost=16212.30..52586.43 rows=92869 width=17) (actual time=26742.343..33274.317 rows=3163 loops=1)
  Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text = (pg.question)::text))
  Join Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text <> (COALESCE(pg.response, 'MISSING'::character varying))::text)
  ->  Seq Scan on pivotbad pb  (cost=0.00..2804.96 rows=93496 width=134) (actual time=0.055..40.710 rows=93496 loops=1)
  ->  Hash  (cost=7537.12..7537.12 rows=251212 width=134) (actual time=25603.460..25603.460 rows=251212 loops=1)
        Buckets: 1024  Batches: 64  Memory Usage: 650kB
        ->  Seq Scan on pivotgood pg  (cost=0.00..7537.12 rows=251212 width=134) (actual time=0.050..120.269 rows=251212 loops=1)
Total runtime: 33275.028 ms

Second, I modified the work_mem setting to 2GB (reloaded config) and I see a response time of 38 seconds. Results below from EXPLAIN ANALYZE:

Hash Join  (cost=16212.30..52586.43 rows=92869 width=17) (actual time=26574.459..38406.422 rows=3163 loops=1)
  Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text = (pg.question)::text))
  Join Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text <> (COALESCE(pg.response, 'MISSING'::character varying))::text)
  ->  Seq Scan on pivotbad pb  (cost=0.00..2804.96 rows=93496 width=134) (actual time=0.067..37.938 rows=93496 loops=1)
  ->  Hash  (cost=7537.12..7537.12 rows=251212 width=134) (actual time=26426.127..26426.127 rows=251212 loops=1)
        Buckets: 1024  Batches: 64  Memory Usage: 650kB
        ->  Seq Scan on pivotgood pg  (cost=0.00..7537.12 rows=251212 width=134) (actual time=0.038..115.319 rows=251212 loops=1)
Total runtime: 38406.927 ms

By no means am I trying to compare the 2 products. When I noticed the slow behavior of COALESCE I tried it on SQL Server. And since they are running on the same machine, hence my comment regarding apples to apples. It is possible that this is not an apples to apples comparison other than the fact that it is running on the same machine.

From: pavel.steh...@gmail.com
Date: Thu, 18 Nov 2010 07:14:24 +0100
Subject: Re: Query Performance SQL Server vs. Postgresql
To: huma...@hotmail.com
CC: pgsql-performance@postgresql.org

Hello,

there should be a problem in the statistics - they are out of reality. Please try to use the IS DISTINCT FROM operator now - maybe the statistics will be better. Next, try to increase work_mem. The hash join is untypically slow on your computer.

Regards
Pavel Stehule

2010/11/17 Humair Mohammed <huma...@hotmail.com>:
> Yes, strange indeed. I did rerun ANALYZE and VACUUM. It took 70 seconds to rerun the query. Results from EXPLAIN ANALYZE below:
>
> Hash Join  (cost=16212.30..52586.43 rows=92869 width=17) (actual time=43200.223..49502.874 rows=3163 loops=1)
>   Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text = (pg.question)::text))
>   Join Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text <> (COALESCE(pg.response, 'MISSING'::character varying))::text)
>   ->  Seq Scan on pivotbad pb  (cost=0.00..2804.96 rows=93496 width=134) (actual time=0.009..48.200 rows=93496 loops=1)
>   ->  Hash  (cost=7537.12..7537.12 rows=251212 width=134) (actual time=42919.453..42919.453 rows=251212 loops=1)
>         Buckets: 1024  Batches: 64  Memory Usage: 650kB
>         ->  Seq Scan on pivotgood pg  (cost=0.00..7537.12 rows=251212 width=134) (actual time=0.119..173.019 rows=251212 loops=1)
> Total runtime: 49503.450 ms

From: pavel.steh...@gmail.com
Date: Wed, 17 Nov 2010 05:47:51 +0100
Subject: Re: Query Performance SQL Server vs. Postgresql
To: huma...@hotmail.com
CC: pgsql-performance@postgresql.org

2010/11/17 Humair Mohammed <huma...@hotmail.com>:
> There are no indexes on the tables either in SQL Server or Postgresql - I am comparing apples to apples here. I ran ANALYZE on the postgresql tables; after that, query performance times are still high - 42 seconds with COALESCE and 35 seconds with IS DISTINCT FROM. Here is the execution plan from Postgresql for the query:
>
> select pb.id from pivotbad pb inner join pivotgood pg on pb.id = pg.id and pb.question = pg.question and coalesce(pb.response,'MISSING') <> coalesce(pg.response,'MISSING')
>
> Execution Time: 42 seconds
>
> Hash Join  (cost=16212.30..48854.24 rows=93477 width=17)
>   Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text =
Re: [PERFORM] Query Performance SQL Server vs. Postgresql
On Nov 21, 2010, at 12:16 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> t...@fuzzy.cz writes:
>>> Second, I modified the work_mem setting to 2GB (reloaded config) and I see a response time of 38 seconds. Results below from EXPLAIN ANALYZE:
>> How did you reload the config? Using 'kill -HUP pid'? That should work fine. Have you checked 'work_mem' after the reload? Because the explain plans are exactly the same (structure, estimated costs). The really interesting bit is this, and it did not change at all:
>> Buckets: 1024  Batches: 64  Memory Usage: 650kB
> If that didn't change, I'm prepared to bet that the OP didn't actually manage to change the active value of work_mem.

Yep. All this speculation about slow disks and/or COALESCE strikes me as likely totally off-base. I think the original poster needs to run "show work_mem" right before the EXPLAIN ANALYZE to make sure the new value they set actually stuck. There's no reason for the planner to have used only 650kB if work_mem is set to anything >= 2MB.

...Robert
Re: [PERFORM] Query Performance SQL Server vs. Postgresql
Hello

I don't know. I checked a similar query on a similar dataset (size), and I get times of about 3 sec on a much worse notebook. So the problem can be in disk IO operation speed - maybe in access to a TOASTed value.

2010/11/21 Humair Mohammed <huma...@hotmail.com>:
> I am running 64-bit Windows 7 Enterprise with 8G RAM and a Dual Core 2.67 GHz Intel CPU. Both SQL 2008 R2 and Postgresql are installed on the same machine. The IS DISTINCT FROM instead of the COALESCE does not help much. I ran 2 further tests with work_mem modifications (please note memory usage is quite low - 650kB - so I am not sure if the work_mem is a factor):

It has a little bit different meaning. work_mem is just a limit, so memory usage must never be greater than work_mem. If more is needed, pg increases the batch count and stores data in blocks on disk. Higher work_mem ~ fewer batches. So the ideal is 1 batch.

Regards
Pavel Stehule

> First, I modified the work_mem setting to 1GB (reloaded config) from the default 1MB and I see a response time of 33 seconds. Results below from EXPLAIN ANALYZE:
>
> Hash Join  (cost=16212.30..52586.43 rows=92869 width=17) (actual time=26742.343..33274.317 rows=3163 loops=1)
>   Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text = (pg.question)::text))
>   Join Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text <> (COALESCE(pg.response, 'MISSING'::character varying))::text)
>   ->  Seq Scan on pivotbad pb  (cost=0.00..2804.96 rows=93496 width=134) (actual time=0.055..40.710 rows=93496 loops=1)
>   ->  Hash  (cost=7537.12..7537.12 rows=251212 width=134) (actual time=25603.460..25603.460 rows=251212 loops=1)
>         Buckets: 1024  Batches: 64  Memory Usage: 650kB
>         ->  Seq Scan on pivotgood pg  (cost=0.00..7537.12 rows=251212 width=134) (actual time=0.050..120.269 rows=251212 loops=1)
> Total runtime: 33275.028 ms
>
> Second, I modified the work_mem setting to 2GB (reloaded config) and I see a response time of 38 seconds. Results below from EXPLAIN ANALYZE:
>
> Hash Join  (cost=16212.30..52586.43 rows=92869 width=17) (actual time=26574.459..38406.422 rows=3163 loops=1)
>   Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text = (pg.question)::text))
>   Join Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text <> (COALESCE(pg.response, 'MISSING'::character varying))::text)
>   ->  Seq Scan on pivotbad pb  (cost=0.00..2804.96 rows=93496 width=134) (actual time=0.067..37.938 rows=93496 loops=1)
>   ->  Hash  (cost=7537.12..7537.12 rows=251212 width=134) (actual time=26426.127..26426.127 rows=251212 loops=1)
>         Buckets: 1024  Batches: 64  Memory Usage: 650kB
>         ->  Seq Scan on pivotgood pg  (cost=0.00..7537.12 rows=251212 width=134) (actual time=0.038..115.319 rows=251212 loops=1)
> Total runtime: 38406.927 ms
>
> By no means am I trying to compare the 2 products. When I noticed the slow behavior of COALESCE I tried it on SQL Server. And since they are running on the same machine, hence my comment regarding apples to apples. It is possible that this is not an apples to apples comparison other than the fact that it is running on the same machine.
>
> From: pavel.steh...@gmail.com
> Date: Thu, 18 Nov 2010 07:14:24 +0100
> Subject: Re: Query Performance SQL Server vs. Postgresql
> To: huma...@hotmail.com
> CC: pgsql-performance@postgresql.org
>
> Hello,
>
> there should be a problem in the statistics - they are out of reality. Please try to use the IS DISTINCT FROM operator now - maybe the statistics will be better. Next, try to increase work_mem. The hash join is untypically slow on your computer.
>
> Regards
> Pavel Stehule
>
> 2010/11/17 Humair Mohammed <huma...@hotmail.com>:
>> Yes, strange indeed. I did rerun ANALYZE and VACUUM. It took 70 seconds to rerun the query. Results from EXPLAIN ANALYZE below:
>>
>> Hash Join  (cost=16212.30..52586.43 rows=92869 width=17) (actual time=43200.223..49502.874 rows=3163 loops=1)
>>   Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text = (pg.question)::text))
>>   Join Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text <> (COALESCE(pg.response, 'MISSING'::character varying))::text)
>>   ->  Seq Scan on pivotbad pb  (cost=0.00..2804.96 rows=93496 width=134) (actual time=0.009..48.200 rows=93496 loops=1)
>>   ->  Hash  (cost=7537.12..7537.12 rows=251212 width=134) (actual time=42919.453..42919.453 rows=251212 loops=1)
>>         Buckets: 1024  Batches: 64  Memory Usage: 650kB
>>         ->  Seq Scan on pivotgood pg  (cost=0.00..7537.12 rows=251212 width=134) (actual time=0.119..173.019 rows=251212 loops=1)
>> Total runtime: 49503.450 ms
>
> From: pavel.steh...@gmail.com
> Date: Wed, 17 Nov 2010 05:47:51 +0100
> Subject: Re: Query Performance SQL Server vs. Postgresql
> To: huma...@hotmail.com
> CC: pgsql-performance@postgresql.org
>
> 2010/11/17 Humair Mohammed <huma...@hotmail.com>:
>> There are no indexes on the tables either in SQL Server or Postgresql - I am comparing apples to apples here. I ran ANALYZE
Re: [PERFORM] Query Performance SQL Server vs. Postgresql
Tom Lane <t...@sss.pgh.pa.us> wrote:
> I have no idea how much memory SQL Server thinks it can use

Hmmm... That triggered an old memory - when we were running SQL Server on Windows there was some registry setting which we tweaked to prevent the OS from trying to cache disk I/O. (Sorry, I don't remember the name of it.) That helped SQL Server perform better, but would cripple PostgreSQL - it counts on OS caching. Of course, once we found that PostgreSQL was 70% faster on identical hardware with identical load, and switching the OS to Linux brought it to twice as fast, I haven't had to worry about SQL Server or Windows configurations. ;-)

Don't panic if PostgreSQL seems slower at first; it's probably a configuration or maintenance schedule issue that can be sorted out. Besides the specific advice Tom gave you, you might want to browse this page for configuration in general:

http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

And if you continue to experience performance issues, this page can help you get to a resolution quickly:

http://wiki.postgresql.org/wiki/SlowQueryQuestions

We've been very happy with the switch to PostgreSQL. We've had better performance, better reliability, less staff time needed to babysit backups, and we've gradually been using more of the advanced features not available in other products. It's well worth the effort to get over those initial bumps resulting from product differences.

-Kevin
Re: [PERFORM] Query Performance SQL Server vs. Postgresql
On 17.11.2010 05:47, Pavel Stehule wrote: 2010/11/17 Humair Mohammed huma...@hotmail.com: There are no indexes on the tables either in SQL Server or Postgresql - I am comparing apples to apples here. I ran ANALYZE on the postgresql tables,

Actually no, you're not comparing apples to apples. You've provided so little information that you may be comparing apples to cucumbers, or maybe to some strange animals.

1) Info about the install: What OS is this running on? I guess it's Windows in both cases, right? How much memory is there? What is the size of shared_buffers? The default PostgreSQL settings are very, very limited; you have to bump shared_buffers to a much larger value. What are the other important settings (e.g. work_mem)?

2) Info about the dataset: How large are the tables? I don't mean number of rows, I mean number of blocks / occupied disk space. Run this query:

SELECT relname, relpages, reltuples, pg_size_pretty(pg_table_size(oid))
FROM pg_class WHERE relname IN ('table1', 'table2');

3) Info about the plan: Please provide EXPLAIN ANALYZE output, maybe with info about buffers, e.g. something like EXPLAIN (ANALYZE ON, BUFFERS ON) SELECT ...

4) No indexes? Why have you decided not to use any indexes? If you want decent performance, you will have to use indexes. Obviously there is some overhead associated with them, but assuming they won't pay off is premature optimization unless you prove the opposite.

BTW I'm not an MSSQL expert, but it seems like it's building a bitmap index on the fly to synchronize the parallelized query - PostgreSQL does not support that.

regards Tomas
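To make the checklist above concrete, here is a sketch of the requested diagnostics as one psql session. The table names pivotbad/pivotgood are taken from later messages in the thread, so substitute your own; pg_table_size requires PostgreSQL 9.0 or later.

```sql
-- 1) Current memory settings as seen by this session
SHOW shared_buffers;
SHOW work_mem;

-- 2) On-disk size of the two tables (blocks, row estimate, pretty-printed size)
SELECT relname, relpages, reltuples,
       pg_size_pretty(pg_table_size(oid)) AS table_size
FROM pg_class
WHERE relname IN ('pivotbad', 'pivotgood');

-- 3) Plan with runtime and buffer statistics
EXPLAIN (ANALYZE ON, BUFFERS ON)
SELECT pb.id
FROM pivotbad pb
INNER JOIN pivotgood pg
        ON pb.id = pg.id
       AND pb.question = pg.question
       AND COALESCE(pb.response, 'MISSING') <> COALESCE(pg.response, 'MISSING');
```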
Re: [PERFORM] Query Performance SQL Server vs. Postgresql
I have to concur. SQL Server is written specifically, and only, for Windows, so it is optimized for Windows. PostgreSQL is written for just about everything, using common code, so there isn't much OS-specific optimization. Check out your config and send it to us. That would include the OS and hardware configs for both machines.

On Wed, Nov 17, 2010 at 3:47 PM, Tomas Vondra t...@fuzzy.cz wrote:
[quoted message snipped]
Re: [PERFORM] Query Performance SQL Server vs. Postgresql
I thought I'd seen an announcement about SQL Server for Linux on 04/01/2005? I cannot find the link right now, but I am quite certain that there was such an announcement.

From: pgsql-performance-ow...@postgresql.org To: Tomas Vondra t...@fuzzy.cz Cc: pgsql-performance@postgresql.org Sent: Wed Nov 17 15:51:55 2010 Subject: Re: [PERFORM] Query Performance SQL Server vs. Postgresql

[quoted message snipped]
Re: [PERFORM] Query Performance SQL Server vs. Postgresql
Yes, strange indeed. I did rerun ANALYZE and VACUUM. Took 70 seconds to rerun the query. Results from EXPLAIN ANALYZE below:

Hash Join  (cost=16212.30..52586.43 rows=92869 width=17) (actual time=43200.223..49502.874 rows=3163 loops=1)
  Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text = (pg.question)::text))
  Join Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text <> (COALESCE(pg.response, 'MISSING'::character varying))::text)
  ->  Seq Scan on pivotbad pb  (cost=0.00..2804.96 rows=93496 width=134) (actual time=0.009..48.200 rows=93496 loops=1)
  ->  Hash  (cost=7537.12..7537.12 rows=251212 width=134) (actual time=42919.453..42919.453 rows=251212 loops=1)
        Buckets: 1024  Batches: 64  Memory Usage: 650kB
        ->  Seq Scan on pivotgood pg  (cost=0.00..7537.12 rows=251212 width=134) (actual time=0.119..173.019 rows=251212 loops=1)
Total runtime: 49503.450 ms

From: pavel.steh...@gmail.com Date: Wed, 17 Nov 2010 05:47:51 +0100 Subject: Re: Query Performance SQL Server vs. Postgresql To: huma...@hotmail.com CC: pgsql-performance@postgresql.org

2010/11/17 Humair Mohammed huma...@hotmail.com: There are no indexes on the tables either in SQL Server or Postgresql - I am comparing apples to apples here. I ran ANALYZE on the postgresql tables; after that, query performance times are still high: 42 seconds with COALESCE and 35 seconds with IS DISTINCT FROM.

Here is the execution plan from Postgresql for the query:

select pb.id from pivotbad pb inner join pivotgood pg on pb.id = pg.id and pb.question = pg.question and coalesce(pb.response,'MISSING') <> coalesce(pg.response,'MISSING')

Execution Time: 42 seconds

Hash Join  (cost=16212.30..48854.24 rows=93477 width=17)
  Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text = (pg.question)::text))
  Join Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text <> (COALESCE(pg.response, 'MISSING'::character varying))::text)
  ->  Seq Scan on pivotbad pb  (cost=0.00..2804.96 rows=93496 width=134)
  ->  Hash  (cost=7537.12..7537.12 rows=251212 width=134)
        ->  Seq Scan on pivotgood pg  (cost=0.00..7537.12 rows=251212 width=134)

this is a little bit strange - did you ANALYZE and VACUUM? please send the result of EXPLAIN ANALYZE

Pavel

And here is the execution plan from SQL Server for the query:

select pb.id from pivotbad pb inner join pivotgood pg on pb.id = pg.id and pb.question = pg.question and isnull(pb.response,'ISNULL') <> isnull(pg.response,'ISNULL')

Execution Time: 1 second

Cost: 1%   |--Parallelism(Gather Streams)
Cost: 31%  |--Hash Match(Inner Join, HASH:([pb].[ID], [pb].[Question])=([pg].[ID], [pg].[Question]), RESIDUAL:([master].[dbo].[pivotbad].[ID] as [pb].[ID]=[master].[dbo].[pivotgood].[ID] as [pg].[ID] AND [master].[dbo].[pivotbad].[Question] as [pb].[Question]=[master].[dbo].[pivotgood].[Question] as [pg].[Question] AND [Expr1006]<>[Expr1007]))
Cost: 0%   |--Bitmap(HASH:([pb].[ID], [pb].[Question]), DEFINE:([Bitmap1008]))
Cost: 0%   |--Compute Scalar(DEFINE:([Expr1006]=isnull([master].[dbo].[pivotbad].[Response] as [pb].[Response],'ISNULL')))
Cost: 6%   |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([pb].[ID], [pb].[Question]))
Cost: 12%  |--Table Scan(OBJECT:([master].[dbo].[pivotbad] AS [pb]))
Cost: 0%   |--Compute Scalar(DEFINE:([Expr1007]=isnull([master].[dbo].[pivotgood].[Response] as [pg].[Response],'ISNULL')))
Cost: 17%  |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([pg].[ID], [pg].[Question]))
Cost: 33%  |--Table Scan(OBJECT:([master].[dbo].[pivotgood] AS [pg]), WHERE:(PROBE([Bitmap1008],[master].[dbo].[pivotgood].[ID] as [pg].[ID],[master].[dbo].[pivotgood].[Question] as [pg].[Question])))

From: pavel.steh...@gmail.com Date: Tue, 16 Nov 2010 08:12:03 +0100 Subject: Re: [PERFORM] To: huma...@hotmail.com CC: pgsql-performance@postgresql.org

2010/11/15 Humair Mohammed huma...@hotmail.com: I have 2 tables with 200,000 rows of data and 3 character/string columns: ID, Question and Response. The query below compares the data between the 2 tables based on ID and Question, and if the Response does not match between the left table and the right table it identifies the IDs where there is a mismatch. Running the query in SQL Server 2008 using the ISNULL function takes a few milliseconds. Running the same query in Postgresql takes over 70 seconds. The 2 queries are below:

SQL Server 2008 R2 Query:
select t1.id from table1 t1 inner join table2 t2 on t1.id = t2.id and t1.question = t2.question and isnull(t1.response,'ISNULL') <> isnull(t2.response,'ISNULL')

Postgres 9.1 Query:
select t1.id from table1 t1 inner join table2 t2 on t1.id = t2.id and t1.question = t2.question and
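One detail worth pulling out of the PostgreSQL plan above: the Hash node reports Buckets: 1024 Batches: 64 Memory Usage: 650kB, which means the build side did not fit in work_mem and was split into 64 batches spilled to disk. As a rough sketch (not an exact accounting), multiplying batches by the per-batch in-memory footprint estimates the work_mem needed to hash the whole inner table in a single batch:

```sql
-- Rough estimate only: 64 batches x ~650kB per batch, roughly 40 MB
SELECT pg_size_pretty((64 * 650 * 1024)::bigint) AS approx_hash_size;
```

This lines up with the later observation in the thread that the hash join needs only about 40MB, so a work_mem far below the 1-2GB range should suffice.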
Re: [PERFORM] Query Performance SQL Server vs. Postgresql
There are no indexes on the tables either in SQL Server or Postgresql - I am comparing apples to apples here. I ran ANALYZE on the postgresql tables; after that, query performance times are still high: 42 seconds with COALESCE and 35 seconds with IS DISTINCT FROM.

Here is the execution plan from Postgresql for the query:

select pb.id from pivotbad pb inner join pivotgood pg on pb.id = pg.id and pb.question = pg.question and coalesce(pb.response,'MISSING') <> coalesce(pg.response,'MISSING')

Execution Time: 42 seconds

Hash Join  (cost=16212.30..48854.24 rows=93477 width=17)
  Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text = (pg.question)::text))
  Join Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text <> (COALESCE(pg.response, 'MISSING'::character varying))::text)
  ->  Seq Scan on pivotbad pb  (cost=0.00..2804.96 rows=93496 width=134)
  ->  Hash  (cost=7537.12..7537.12 rows=251212 width=134)
        ->  Seq Scan on pivotgood pg  (cost=0.00..7537.12 rows=251212 width=134)

And here is the execution plan from SQL Server for the query:

select pb.id from pivotbad pb inner join pivotgood pg on pb.id = pg.id and pb.question = pg.question and isnull(pb.response,'ISNULL') <> isnull(pg.response,'ISNULL')

Execution Time: 1 second

Cost: 1%   |--Parallelism(Gather Streams)
Cost: 31%  |--Hash Match(Inner Join, HASH:([pb].[ID], [pb].[Question])=([pg].[ID], [pg].[Question]), RESIDUAL:([master].[dbo].[pivotbad].[ID] as [pb].[ID]=[master].[dbo].[pivotgood].[ID] as [pg].[ID] AND [master].[dbo].[pivotbad].[Question] as [pb].[Question]=[master].[dbo].[pivotgood].[Question] as [pg].[Question] AND [Expr1006]<>[Expr1007]))
Cost: 0%   |--Bitmap(HASH:([pb].[ID], [pb].[Question]), DEFINE:([Bitmap1008]))
Cost: 0%   |--Compute Scalar(DEFINE:([Expr1006]=isnull([master].[dbo].[pivotbad].[Response] as [pb].[Response],'ISNULL')))
Cost: 6%   |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([pb].[ID], [pb].[Question]))
Cost: 12%  |--Table Scan(OBJECT:([master].[dbo].[pivotbad] AS [pb]))
Cost: 0%   |--Compute Scalar(DEFINE:([Expr1007]=isnull([master].[dbo].[pivotgood].[Response] as [pg].[Response],'ISNULL')))
Cost: 17%  |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([pg].[ID], [pg].[Question]))
Cost: 33%  |--Table Scan(OBJECT:([master].[dbo].[pivotgood] AS [pg]), WHERE:(PROBE([Bitmap1008],[master].[dbo].[pivotgood].[ID] as [pg].[ID],[master].[dbo].[pivotgood].[Question] as [pg].[Question])))

From: pavel.steh...@gmail.com Date: Tue, 16 Nov 2010 08:12:03 +0100 Subject: Re: [PERFORM] To: huma...@hotmail.com CC: pgsql-performance@postgresql.org

2010/11/15 Humair Mohammed huma...@hotmail.com: I have 2 tables with 200,000 rows of data and 3 character/string columns: ID, Question and Response. The query below compares the data between the 2 tables based on ID and Question, and if the Response does not match between the left table and the right table it identifies the IDs where there is a mismatch.

Running the query in SQL Server 2008 using the ISNULL function takes a few milliseconds. Running the same query in Postgresql takes over 70 seconds. The 2 queries are below:

SQL Server 2008 R2 Query:
select t1.id from table1 t1 inner join table2 t2 on t1.id = t2.id and t1.question = t2.question and isnull(t1.response,'ISNULL') <> isnull(t2.response,'ISNULL')

Postgres 9.1 Query:
select t1.id from table1 t1 inner join table2 t2 on t1.id = t2.id and t1.question = t2.question and coalesce(t1.response,'ISNULL') <> coalesce(t2.response,'ISNULL')

What gives? I think the problem may be in the ugly predicate coalesce(t1.response,'ISNULL') <> coalesce(t2.response,'ISNULL'). Try using the IS DISTINCT FROM operator:

... AND t1.response IS DISTINCT FROM t2.response

Regards Pavel Stehule

p.s. don't use COALESCE in the WHERE clause if possible.
Re: [PERFORM] Query Performance SQL Server vs. Postgresql
Humair Mohammed huma...@hotmail.com writes: Yes, strange indeed. I did rerun ANALYZE and VACUUM. Took 70 seconds to rerun the query. Results from EXPLAIN ANALYZE below:

Hash Join  (cost=16212.30..52586.43 rows=92869 width=17) (actual time=43200.223..49502.874 rows=3163 loops=1)
  Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text = (pg.question)::text))
  Join Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text <> (COALESCE(pg.response, 'MISSING'::character varying))::text)
  ->  Seq Scan on pivotbad pb  (cost=0.00..2804.96 rows=93496 width=134) (actual time=0.009..48.200 rows=93496 loops=1)
  ->  Hash  (cost=7537.12..7537.12 rows=251212 width=134) (actual time=42919.453..42919.453 rows=251212 loops=1)
        Buckets: 1024  Batches: 64  Memory Usage: 650kB
        ->  Seq Scan on pivotgood pg  (cost=0.00..7537.12 rows=251212 width=134) (actual time=0.119..173.019 rows=251212 loops=1)
Total runtime: 49503.450 ms

I have no idea how much memory SQL Server thinks it can use, but Postgres is limiting itself to work_mem, which you've apparently left at the default 1MB. You might get a fairer comparison by bumping that up some --- try 32MB or so. You want it high enough so that the Hash output doesn't say there are multiple batches.

regards, tom lane
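Tom's suggestion can be tried without touching postgresql.conf, since work_mem can be set per session. A sketch using the query from this thread; the goal is for the Hash node to report "Batches: 1":

```sql
SET work_mem = '32MB';   -- session-local; no server restart needed

EXPLAIN ANALYZE
SELECT pb.id
FROM pivotbad pb
INNER JOIN pivotgood pg
        ON pb.id = pg.id
       AND pb.question = pg.question
       AND COALESCE(pb.response, 'MISSING') <> COALESCE(pg.response, 'MISSING');
-- Look for:  Buckets: ...  Batches: 1  Memory Usage: ...

RESET work_mem;          -- back to the configured default
```

If the Hash node still reports multiple batches, raise the value and repeat; once everything fits in one batch, larger values should not change the plan further.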
Re: [PERFORM] Query Performance SQL Server vs. Postgresql
Hello, there may be a problem with the statistics - they seem out of touch with reality. Please try the IS DISTINCT FROM operator now; maybe the statistics will be better. Next, try increasing work_mem. The hash join is atypically slow on your machine.

Regards Pavel Stehule

2010/11/17 Humair Mohammed huma...@hotmail.com: Yes, strange indeed. I did rerun ANALYZE and VACUUM. Took 70 seconds to rerun the query. Results from EXPLAIN ANALYZE below:

Hash Join  (cost=16212.30..52586.43 rows=92869 width=17) (actual time=43200.223..49502.874 rows=3163 loops=1)
  Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text = (pg.question)::text))
  Join Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text <> (COALESCE(pg.response, 'MISSING'::character varying))::text)
  ->  Seq Scan on pivotbad pb  (cost=0.00..2804.96 rows=93496 width=134) (actual time=0.009..48.200 rows=93496 loops=1)
  ->  Hash  (cost=7537.12..7537.12 rows=251212 width=134) (actual time=42919.453..42919.453 rows=251212 loops=1)
        Buckets: 1024  Batches: 64  Memory Usage: 650kB
        ->  Seq Scan on pivotgood pg  (cost=0.00..7537.12 rows=251212 width=134) (actual time=0.119..173.019 rows=251212 loops=1)
Total runtime: 49503.450 ms

From: pavel.steh...@gmail.com Date: Wed, 17 Nov 2010 05:47:51 +0100 Subject: Re: Query Performance SQL Server vs. Postgresql To: huma...@hotmail.com CC: pgsql-performance@postgresql.org

2010/11/17 Humair Mohammed huma...@hotmail.com: There are no indexes on the tables either in SQL Server or Postgresql - I am comparing apples to apples here. I ran ANALYZE on the postgresql tables; after that, query performance times are still high: 42 seconds with COALESCE and 35 seconds with IS DISTINCT FROM.
[remainder of quoted message snipped - both execution plans appear earlier in this thread]
Re: [PERFORM] Query Performance SQL Server vs. Postgresql
2010/11/17 Humair Mohammed huma...@hotmail.com: There are no indexes on the tables either in SQL Server or Postgresql - I am comparing apples to apples here. I ran ANALYZE on the postgresql tables; after that, query performance times are still high: 42 seconds with COALESCE and 35 seconds with IS DISTINCT FROM.

Here is the execution plan from Postgresql for the query:

select pb.id from pivotbad pb inner join pivotgood pg on pb.id = pg.id and pb.question = pg.question and coalesce(pb.response,'MISSING') <> coalesce(pg.response,'MISSING')

Execution Time: 42 seconds

Hash Join  (cost=16212.30..48854.24 rows=93477 width=17)
  Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text = (pg.question)::text))
  Join Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text <> (COALESCE(pg.response, 'MISSING'::character varying))::text)
  ->  Seq Scan on pivotbad pb  (cost=0.00..2804.96 rows=93496 width=134)
  ->  Hash  (cost=7537.12..7537.12 rows=251212 width=134)
        ->  Seq Scan on pivotgood pg  (cost=0.00..7537.12 rows=251212 width=134)

this is a little bit strange - did you ANALYZE and VACUUM? please send the result of EXPLAIN ANALYZE

Pavel

And here is the execution plan from SQL Server for the query:

select pb.id from pivotbad pb inner join pivotgood pg on pb.id = pg.id and pb.question = pg.question and isnull(pb.response,'ISNULL') <> isnull(pg.response,'ISNULL')

Execution Time: 1 second

[SQL Server plan snipped - identical to the one quoted earlier in this thread]

From: pavel.steh...@gmail.com Date: Tue, 16 Nov 2010 08:12:03 +0100 Subject: Re: [PERFORM] To: huma...@hotmail.com CC: pgsql-performance@postgresql.org

2010/11/15 Humair Mohammed huma...@hotmail.com: I have 2 tables with 200,000 rows of data and 3 character/string columns: ID, Question and Response. The query below compares the data between the 2 tables based on ID and Question, and if the Response does not match between the left table and the right table it identifies the IDs where there is a mismatch.

Running the query in SQL Server 2008 using the ISNULL function takes a few milliseconds. Running the same query in Postgresql takes over 70 seconds. The 2 queries are below:

SQL Server 2008 R2 Query:
select t1.id from table1 t1 inner join table2 t2 on t1.id = t2.id and t1.question = t2.question and isnull(t1.response,'ISNULL') <> isnull(t2.response,'ISNULL')

Postgres 9.1 Query:
select t1.id from table1 t1 inner join table2 t2 on t1.id = t2.id and t1.question = t2.question and coalesce(t1.response,'ISNULL') <> coalesce(t2.response,'ISNULL')

What gives?

I think the problem may be in the ugly predicate coalesce(t1.response,'ISNULL') <> coalesce(t2.response,'ISNULL'). Try using the IS DISTINCT FROM operator:

... AND t1.response IS DISTINCT FROM t2.response

Regards Pavel Stehule

p.s. don't use COALESCE in the WHERE clause if possible.
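Pavel's suggestion, spelled out in full against the table1/table2 names from the original post: IS DISTINCT FROM is a null-safe inequality, so the sentinel strings injected by COALESCE/ISNULL become unnecessary and the comparison is expressed directly on the columns.

```sql
SELECT t1.id
FROM table1 t1
INNER JOIN table2 t2
        ON t1.id = t2.id
       AND t1.question = t2.question
WHERE t1.response IS DISTINCT FROM t2.response;
-- NULL vs NULL  -> not distinct (row filtered out)
-- NULL vs 'x'   -> distinct     (row returned)
```

Unlike `<>`, which yields NULL when either operand is NULL, IS DISTINCT FROM always yields true or false, so no sentinel value ('ISNULL'/'MISSING') can collide with real data.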