Re: [PERFORM] Query Performance SQL Server vs. Postgresql

2010-11-23 Thread Humair Mohammed

I did some further analysis and here are the results:
work_mem;response_time
1MB;62 seconds
2MB;2 seconds
4MB;700 milliseconds
8MB;550 milliseconds
In all cases shared_buffers were set to the default value of 32MB. As you can 
see the 1 to 2 MB jump on the work_mem does wonders. I probably don't need this 
to be any higher than 8 or 16 MB. Thanks to all for help!
Humair
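
For reference, this kind of per-setting measurement can be reproduced in a single session without touching postgresql.conf at all. A sketch, using the pivotbad/pivotgood tables and query from earlier in this thread:

```sql
-- Sketch only: assumes the pivotbad/pivotgood tables from this thread exist.
SET work_mem = '8MB';   -- session-local, takes effect immediately
SHOW work_mem;          -- confirm the value actually took

EXPLAIN (ANALYZE, BUFFERS)
SELECT pb.id
FROM pivotbad pb
INNER JOIN pivotgood pg
        ON pb.id = pg.id
       AND pb.question = pg.question
WHERE COALESCE(pb.response, 'MISSING') <> COALESCE(pg.response, 'MISSING');

RESET work_mem;
```

Repeating the SET/EXPLAIN pair for 1MB, 2MB, 4MB and 8MB yields a table like the one above, with the Batches count in the Hash node showing why the jump happens.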
 Date: Mon, 22 Nov 2010 12:00:15 +0100
 Subject: Re: [PERFORM] Query Performance SQL Server vs. Postgresql
 From: t...@fuzzy.cz
 To: huma...@hotmail.com
 CC: pgsql-performance@postgresql.org
 
 
 
  Correct, the optimizer did not take the settings with the pg_ctl reload
  command. I did a pg_ctl restart and work_mem now displays the updated
  value. I had to bump up all the way to 2047 MB to get the response below
  (with work_mem at 1024 MB I see 7 seconds response time) and with 2047 MB
  (which is the max value that can be set for work_mem - anything more than
  that results in a FATAL error because of the limit) the results are below.
 
 Hm, can you post the explain plan for the work_mem=1024MB case? I guess the
 difference is due to caching. According to the explain analyze, there are
 just cache hits, no reads.
 
 Anyway the hash join uses only about 40MB of memory, so 1024MB should be
 perfectly fine and the explain plan should be exactly the same as with
 work_mem=2047MB. And the row estimates seem quite precise, so I don't
 think there's some severe overestimation.
 
 Tomas
 
  

Re: [PERFORM] Query Performance SQL Server vs. Postgresql

2010-11-23 Thread Merlin Moncure
On Mon, Nov 22, 2010 at 7:12 PM, Humair Mohammed huma...@hotmail.com wrote:
 I did some further analysis and here are the results:
 work_mem;response_time
 1MB;62 seconds
 2MB;2 seconds
 4MB;700 milliseconds
 8MB;550 milliseconds
 In all cases shared_buffers were set to the default value of 32MB. As you
 can see the 1 to 2 MB jump on the work_mem does wonders. I probably don't
 need this to be any higher than 8 or 16 MB. Thanks to all for help!
 Humair

work_mem directly affects how the query is planned, because certain
types of plans (hash joins and large sorts) require memory. Raising or
lowering shared_buffers, OTOH, is very subtle and is not something you
tune to improve the execution of a single query...

merlin
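
Merlin's point - that work_mem feeds directly into planning while shared_buffers does not - can be seen by running the same join under two settings. A sketch (table names come from this thread; the exact plan shapes depend on the data and version):

```sql
SET work_mem = '1MB';
EXPLAIN (ANALYZE, BUFFERS)
SELECT pb.id FROM pivotbad pb
JOIN pivotgood pg ON pb.id = pg.id AND pb.question = pg.question;
-- typically: Hash node with Batches > 1 and temp read/written buffers

SET work_mem = '64MB';
EXPLAIN (ANALYZE, BUFFERS)
SELECT pb.id FROM pivotbad pb
JOIN pivotgood pg ON pb.id = pg.id AND pb.question = pg.question;
-- typically: same Hash Join shape, Batches: 1, no temp I/O

RESET work_mem;
```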

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query Performance SQL Server vs. Postgresql

2010-11-22 Thread Humair Mohammed

That was a typo:
work_mem = 2GB
shared_buffers = 2GB
 From: pavel.steh...@gmail.com
 Date: Sun, 21 Nov 2010 12:38:43 +0100
 Subject: Re: [PERFORM] Query Performance SQL Server vs. Postgresql
 To: huma...@hotmail.com
 CC: pgsql-performance@postgresql.org
 
 2010/11/21 Humair Mohammed huma...@hotmail.com:
 
  1) OS/Configuration
  64-bit Windows 7 Enterprise with 8G RAM and a Dual Core 2.67 Ghz Intel CPU
  postgresql-x64-9.0 (PostgreSQL 9.0.1, compiled by Visual C++ build 1500,
  64-bit)
  work_mem = 2GB
  shared_buffers = 2
 
 shared_buffers = 2 ???
 
 Regards
 
 Pavel Stehule
 
 
  2) Dataset
  name,pages,tuples,pg_size_pretty
  pivotbad;1870;93496;15 MB
  pivotgood;5025;251212;39 MB
  3) EXPLAIN (ANALYZE ON, BUFFERS ON)
  Hash Join  (cost=16212.30..52586.43 rows=92869 width=17) (actual
  time=25814.222..32296.765 rows=3163 loops=1)
Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text =
  (pg.question)::text))
  Join Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text
   <> (COALESCE(pg.response, 'MISSING'::character varying))::text)
  Buffers: shared hit=384 read=6511, temp read=6444 written=6318
  ->  Seq Scan on pivotbad pb  (cost=0.00..2804.96 rows=93496 width=134)
  (actual time=0.069..37.143 rows=93496 loops=1)
        Buffers: shared hit=192 read=1678
  ->  Hash  (cost=7537.12..7537.12 rows=251212 width=134) (actual
  time=24621.752..24621.752 rows=251212 loops=1)
        Buckets: 1024  Batches: 64  Memory Usage: 650kB
        Buffers: shared hit=192 read=4833, temp written=4524
        ->  Seq Scan on pivotgood pg  (cost=0.00..7537.12 rows=251212
  width=134) (actual time=0.038..117.780 rows=251212 loops=1)
Buffers: shared hit=192 read=4833
  Total runtime: 32297.305 ms
  4) INDEXES
  I can certainly add an index but given the table sizes I am not sure if that
  is a factor. This is by no means a large dataset: less than 350,000 rows in
  total and 3 columns. Also, this was just a quick dump of data for comparison
  purposes. When I saw the poor performance on the COALESCE, I pointed the data
  load to SQL Server and ran the same query, except with the T-SQL-specific
  ISNULL function.
 
  

Re: [PERFORM] Query Performance SQL Server vs. Postgresql

2010-11-22 Thread Humair Mohammed


Correct, the optimizer did not take the settings with the pg_ctl reload 
command. I did a pg_ctl restart and work_mem now displays the updated value. I 
had to bump it up all the way to 2047 MB to get the response below (with 
work_mem at 1024 MB I see a 7 second response time); 2047 MB is the max value 
that can be set for work_mem - anything more than that results in a FATAL 
error because of the limit. The batches and memory usage now reflect the right 
behavior with these settings. Thanks for everyone's input; the result now 
matches what SQL Server was producing.
Hash Join  (cost=11305.30..39118.43 rows=92869 width=17) (actual time=145.888..326.216 rows=3163 loops=1)
  Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text = (pg.question)::text))
  Join Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text <> (COALESCE(pg.response, 'MISSING'::character varying))::text)
  Buffers: shared hit=6895
  ->  Seq Scan on pivotbad pb  (cost=0.00..2804.96 rows=93496 width=134) (actual time=0.011..11.903 rows=93496 loops=1)
        Buffers: shared hit=1870
  ->  Hash  (cost=7537.12..7537.12 rows=251212 width=134) (actual time=145.673..145.673 rows=251212 loops=1)
        Buckets: 32768  Batches: 1  Memory Usage: 39939kB
        Buffers: shared hit=5025
        ->  Seq Scan on pivotgood pg  (cost=0.00..7537.12 rows=251212 width=134) (actual time=0.004..26.242 rows=251212 loops=1)
              Buffers: shared hit=5025
Total runtime: 331.168 ms
Humair

 CC: t...@fuzzy.cz; huma...@hotmail.com; pavel.steh...@gmail.com; 
 pgsql-performance@postgresql.org
 From: robertmh...@gmail.com
 Subject: Re: [PERFORM] Query Performance SQL Server vs. Postgresql
 Date: Sun, 21 Nov 2010 13:55:54 -0500
 To: t...@sss.pgh.pa.us
 
 On Nov 21, 2010, at 12:16 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  t...@fuzzy.cz writes:
  Second, I modified the work_mem setting to 2GB (reloaded config) and I see
  a response time of 38 seconds. Results below from EXPLAIN ANALYZE:
  
  How did you reload the config? Using 'kill -HUP pid'? That should work
  fine. Have you checked 'work_mem' after the reload?
  
  Because the explain plans are exactly the same (structure, estimated
  costs). The really interesting bit is this and it did not change at all
  
Buckets: 1024 Batches: 64  Memory Usage: 650kB
  
  If that didn't change, I'm prepared to bet that the OP didn't actually
  manage to change the active value of work_mem.
 
 Yep.  All this speculation about slow disks and/or COALESCE strikes me as 
 likely totally off-base. I think the original poster needs to run 'show 
 work_mem' right before the EXPLAIN ANALYZE to make sure the new value they 
 set actually stuck. There's no reason for the planner to have used only 650kB 
 if work_mem is set to anything >= 2MB.
 
 ...Robert
  

Re: [PERFORM] Query Performance SQL Server vs. Postgresql

2010-11-22 Thread tv


 Correct, the optimizer did not take the settings with the pg_ctl reload
 command. I did a pg_ctl restart and work_mem now displays the updated
 value. I had to bump up all the way to 2047 MB to get the response below
 (with work_mem at 1024 MB I see 7 seconds response time) and with 2047 MB
 (which is the max value that can be set for work_mem - anything more than
 that results in a FATAL error because of the limit) the results are below.

Hm, can you post the explain plan for the work_mem=1024MB case? I guess the
difference is due to caching. According to the explain analyze, there are
just cache hits, no reads.

Anyway the hash join uses only about 40MB of memory, so 1024MB should be
perfectly fine and the explain plan should be exactly the same as with
work_mem=2047MB. And the row estimates seem quite precise, so I don't
think there's some severe overestimation.

Tomas




Re: [PERFORM] Query Performance SQL Server vs. Postgresql

2010-11-22 Thread Samuel Gendler
On Sun, Nov 21, 2010 at 10:21 PM, Humair Mohammed huma...@hotmail.com wrote:


 Correct, the optimizer did not take the settings with the pg_ctl reload
 command. I did a pg_ctl restart and work_mem now displays the updated value.
 I had to bump up all the way to 2047 MB to get the response below (with
 work_mem at 1024 MB I see 7 seconds response time) and with 2047 MB (which
 is the max value that can be set for work_mem - anything more than that
 results in a FATAL error because of the limit) the results are below. The
 batches and memory usage are reflecting the right behavior with these
 settings. Thanks for everyones input, the result is now matching what SQL
 Server was producing.


I believe you can set work_mem to a different value just for the duration of
a single query, so you needn't have work_mem set so high for every query
on the system.  A single query may well use a multiple of work_mem, so you
really probably don't want it that high all the time unless all of your
queries are structured similarly.  Just: set work_mem='2047MB'; run the
query; then reset all;

But you should wait until someone more knowledgable than I confirm what I
just wrote.


Re: [PERFORM] Query Performance SQL Server vs. Postgresql

2010-11-22 Thread tv
 I believe you can set work_mem to a different value just for the duration
 of
 a single query, so you needn't have work_mem set so high if for every
 query
 on the system.  A single query may well use a multiple of work_mem, so you
 really probably don't want it that high all the time unless all of your
 queries are structured similarly.  Just set work_mem='2047MB'; query;
 reset
 all;

Yes, executing set work_mem='64MB' right before the query should be just
fine. Setting work_mem to 2GB is overkill most of the time (99.9%).

Tomas
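
A transaction-scoped variant of the same idea avoids even the need for RESET: SET LOCAL reverts automatically at commit or rollback. A sketch:

```sql
BEGIN;
SET LOCAL work_mem = '64MB';  -- visible only inside this transaction
-- run the expensive query here
COMMIT;                       -- work_mem reverts to its previous value

SHOW work_mem;                -- back to the server default
```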




Re: [PERFORM] Query Performance SQL Server vs. Postgresql

2010-11-21 Thread Pavel Stehule
2010/11/21 Humair Mohammed huma...@hotmail.com:

 1) OS/Configuration
 64-bit Windows 7 Enterprise with 8G RAM and a Dual Core 2.67 Ghz Intel CPU
 postgresql-x64-9.0 (PostgreSQL 9.0.1, compiled by Visual C++ build 1500,
 64-bit)
 work_mem = 2GB
 shared_buffers = 2

shared_buffers = 2 ???

Regards

Pavel Stehule


 2) Dataset
 name,pages,tuples,pg_size_pretty
 pivotbad;1870;93496;15 MB
 pivotgood;5025;251212;39 MB
 3) EXPLAIN (ANALYZE ON, BUFFERS ON)
 Hash Join  (cost=16212.30..52586.43 rows=92869 width=17) (actual
 time=25814.222..32296.765 rows=3163 loops=1)
   Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text =
 (pg.question)::text))
   Join Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text
  <> (COALESCE(pg.response, 'MISSING'::character varying))::text)
   Buffers: shared hit=384 read=6511, temp read=6444 written=6318
   ->  Seq Scan on pivotbad pb  (cost=0.00..2804.96 rows=93496 width=134)
 (actual time=0.069..37.143 rows=93496 loops=1)
         Buffers: shared hit=192 read=1678
   ->  Hash  (cost=7537.12..7537.12 rows=251212 width=134) (actual
 time=24621.752..24621.752 rows=251212 loops=1)
         Buckets: 1024  Batches: 64  Memory Usage: 650kB
         Buffers: shared hit=192 read=4833, temp written=4524
         ->  Seq Scan on pivotgood pg  (cost=0.00..7537.12 rows=251212
 width=134) (actual time=0.038..117.780 rows=251212 loops=1)
               Buffers: shared hit=192 read=4833
 Total runtime: 32297.305 ms
 4) INDEXES
 I can certainly add an index but given the table sizes I am not sure if that
 is a factor. This is by no means a large dataset: less than 350,000 rows in
 total and 3 columns. Also, this was just a quick dump of data for comparison
 purposes. When I saw the poor performance on the COALESCE, I pointed the data
 load to SQL Server and ran the same query, except with the T-SQL-specific
 ISNULL function.




Re: [PERFORM] Query Performance SQL Server vs. Postgresql

2010-11-21 Thread tv
 4) INDEXESI can certainly add an index but given the table sizes I am not
 sure if that is a factor. This by no means is a large dataset less than
 350,000 rows in total and 3 columns. Also this was just a quick dump of
 data for comparison purpose. When I saw the poor performance on the
 COALESCE, I pointed the data load to SQL Server and ran the same query
 except with the TSQL specific ISNULL function.

350,000 rows definitely is a lot of rows, although with 3 INT columns it's
just about 13MB of data (including overhead). But indexes can be quite
handy when doing joins, as in this case.

Tomas




Re: [PERFORM] Query Performance SQL Server vs. Postgresql

2010-11-21 Thread Kevin Grittner
Pavel Stehule  wrote:
 2010/11/21 Humair Mohammed :
 
 shared_buffers = 2

 shared_buffers = 2 ???
 
Yeah, if that's not a typo, that's a very serious misconfiguration.
 
With 8 GB of RAM in the machine, this should probably be set to
somewhere between 200 MB and 2 GB, depending on your workload and
what else is running on the machine.
 
Please read through this page and make use of the information:
 
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
 
-Kevin
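
As a rough illustration of the guideline above, a postgresql.conf for an 8 GB machine might start out like the fragment below. The values are illustrative only, not from this thread, and effective_cache_size is an extra knob the wiki page discusses; tune from there for your workload.

```
# postgresql.conf - illustrative starting point for 8 GB RAM;
# shared_buffers requires a full restart (pg_ctl restart), not a reload.
shared_buffers = 512MB
work_mem = 16MB                # per sort/hash operation, so keep it modest
effective_cache_size = 4GB     # planner hint, not an allocation
```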



Re: [PERFORM] Query Performance SQL Server vs. Postgresql

2010-11-21 Thread tv
 First, I modified the work_mem setting to 1GB (reloaded config) from the
 default 1MB and I see a response time of 33 seconds. Results below from
 EXPLAIN ANALYZE:

...

 Second, I modified the work_mem setting to 2GB (reloaded config) and I see
 a response time of 38 seconds. Results below from EXPLAIN ANALYZE:

...

How did you reload the config? Using 'kill -HUP pid'? That should work
fine. Have you checked 'work_mem' after the reload?

Because the explain plans are exactly the same (structure, estimated
costs). The really interesting bit is this and it did not change at all

   Buckets: 1024 Batches: 64  Memory Usage: 650kB

As Tom Lane already mentioned, splitting a hash join into batches (due to
small memory) adds overhead; the optimal number of batches is 1. But I
guess 1GB of work_mem is overkill - something like 64MB should be fine.

The suspicious thing is that the query plans have not changed at all
(especially the number of batches). I think you're not telling us
something important (unintentionally, of course).

 By no means I am trying to compare the 2 products. When I noticed the slow
 behavior of COALESCE I tried it on SQL Server. And since they are running
 on the same machine my comment regarding apples to apples. It is possible
 that this is not an apples to apples comparison other than the fact that
 it is running on the same machine.

OK. The point of my post was that you've provided very little info about
the settings etc. so it was difficult to identify why PostgreSQL is so
slow.

Tomas




Re: [PERFORM] Query Performance SQL Server vs. Postgresql

2010-11-21 Thread tv
 4) INDEXESI can certainly add an index but given the table sizes I am
 not
 sure if that is a factor. This by no means is a large dataset less than
 350,000 rows in total and 3 columns. Also this was just a quick dump of
 data for comparison purpose. When I saw the poor performance on the
 COALESCE, I pointed the data load to SQL Server and ran the same query
 except with the TSQL specific ISNULL function.

 350,000 rows definitely is a lot of rows, although with 3 INT columns it's
 just about 13MB of data (including overhead). But indexes can be quite
 handy when doing joins, as in this case.

OK, I've just realized the tables have 3 character columns, not integers.
In that case the tables are probably much bigger (and there are things
like TOAST). In that case indexes may be even more important.

Tomas
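
If one did want to try indexes here, composite indexes on the two join keys would be the natural candidates. A sketch (the index names are made up, and the planner may well still prefer the hash join for a full-table comparison like this):

```sql
CREATE INDEX pivotbad_id_question_idx  ON pivotbad (id, question);
CREATE INDEX pivotgood_id_question_idx ON pivotgood (id, question);

-- refresh statistics so the planner can judge the new indexes
ANALYZE pivotbad;
ANALYZE pivotgood;
```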




Re: [PERFORM] Query Performance SQL Server vs. Postgresql

2010-11-21 Thread tv
 Pavel Stehule  wrote:
 2010/11/21 Humair Mohammed :

 shared_buffers = 2

 shared_buffers = 2 ???

 Yeah, if that's not a typo, that's a very serious misconfiguration.

I guess that's a typo, as the explain plan in one of the previous posts
contains

   Buffers: shared hit=192 read=4833

for a sequential scan. But I still don't know why is the query so slow :-(

regards
Tomas




Re: [PERFORM] Query Performance SQL Server vs. Postgresql

2010-11-21 Thread Tom Lane
t...@fuzzy.cz writes:
 Second, I modified the work_mem setting to 2GB (reloaded config) and I see
 a response time of 38 seconds. Results below from EXPLAIN ANALYZE:

 How did you reload the config? Using 'kill -HUP pid'? That should work
 fine. Have you checked 'work_mem' after the reload?

 Because the explain plans are exactly the same (structure, estimated
 costs). The really interesting bit is this and it did not change at all

Buckets: 1024 Batches: 64  Memory Usage: 650kB

If that didn't change, I'm prepared to bet that the OP didn't actually
manage to change the active value of work_mem.

regards, tom lane



Re: [PERFORM] Query Performance SQL Server vs. Postgresql

2010-11-21 Thread Humair Mohammed


1) OS/Configuration
64-bit Windows 7 Enterprise with 8G RAM and a Dual Core 2.67 Ghz Intel CPU
postgresql-x64-9.0 (PostgreSQL 9.0.1, compiled by Visual C++ build 1500, 64-bit)
work_mem = 2GB
shared_buffers = 2
2) Dataset
name,pages,tuples,pg_size_pretty
pivotbad;1870;93496;15 MB
pivotgood;5025;251212;39 MB
3) EXPLAIN (ANALYZE ON, BUFFERS ON)
Hash Join  (cost=16212.30..52586.43 rows=92869 width=17) (actual time=25814.222..32296.765 rows=3163 loops=1)
  Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text = (pg.question)::text))
  Join Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text <> (COALESCE(pg.response, 'MISSING'::character varying))::text)
  Buffers: shared hit=384 read=6511, temp read=6444 written=6318
  ->  Seq Scan on pivotbad pb  (cost=0.00..2804.96 rows=93496 width=134) (actual time=0.069..37.143 rows=93496 loops=1)
        Buffers: shared hit=192 read=1678
  ->  Hash  (cost=7537.12..7537.12 rows=251212 width=134) (actual time=24621.752..24621.752 rows=251212 loops=1)
        Buckets: 1024  Batches: 64  Memory Usage: 650kB
        Buffers: shared hit=192 read=4833, temp written=4524
        ->  Seq Scan on pivotgood pg  (cost=0.00..7537.12 rows=251212 width=134) (actual time=0.038..117.780 rows=251212 loops=1)
              Buffers: shared hit=192 read=4833
Total runtime: 32297.305 ms
4) INDEXES
I can certainly add an index but given the table sizes I am not sure if that
is a factor. This is by no means a large dataset: less than 350,000 rows in
total and 3 columns. Also, this was just a quick dump of data for comparison
purposes. When I saw the poor performance on the COALESCE, I pointed the data
load to SQL Server and ran the same query, except with the T-SQL-specific
ISNULL function.
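
For comparison, the two PostgreSQL spellings of the NULL-safe inequality discussed in this thread are sketched below (ISNULL is the T-SQL analogue of COALESCE with two arguments). IS DISTINCT FROM avoids the 'MISSING' sentinel value entirely; column and table names follow the thread's examples:

```sql
-- PostgreSQL, as used in this thread:
SELECT pb.id
FROM pivotbad pb
JOIN pivotgood pg ON pb.id = pg.id AND pb.question = pg.question
WHERE COALESCE(pb.response, 'MISSING') <> COALESCE(pg.response, 'MISSING');

-- PostgreSQL, without a sentinel (NULL-safe by definition):
SELECT pb.id
FROM pivotbad pb
JOIN pivotgood pg ON pb.id = pg.id AND pb.question = pg.question
WHERE pb.response IS DISTINCT FROM pg.response;
```

Note the two forms differ if 'MISSING' ever occurs as a real response value; IS DISTINCT FROM has no such collision.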
  

Re: [PERFORM] Query Performance SQL Server vs. Postgresql

2010-11-21 Thread Humair Mohammed

I am running 64-bit Windows 7 Enterprise with 8G RAM and a Dual Core 2.67 Ghz
Intel CPU. Both SQL 2008 R2 and Postgresql are installed on the same machine.
IS DISTINCT FROM instead of COALESCE does not help much. I ran 2 further
tests with work_mem modifications (please note memory usage is quite low,
650kB, so I am not sure if work_mem is a factor):
First, I modified the work_mem setting to 1GB (reloaded config) from the 
default 1MB and I see a response time of 33 seconds. Results below from EXPLAIN 
ANALYZE:
Hash Join  (cost=16212.30..52586.43 rows=92869 width=17) (actual time=26742.343..33274.317 rows=3163 loops=1)
  Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text = (pg.question)::text))
  Join Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text <> (COALESCE(pg.response, 'MISSING'::character varying))::text)
  ->  Seq Scan on pivotbad pb  (cost=0.00..2804.96 rows=93496 width=134) (actual time=0.055..40.710 rows=93496 loops=1)
  ->  Hash  (cost=7537.12..7537.12 rows=251212 width=134) (actual time=25603.460..25603.460 rows=251212 loops=1)
        Buckets: 1024  Batches: 64  Memory Usage: 650kB
        ->  Seq Scan on pivotgood pg  (cost=0.00..7537.12 rows=251212 width=134) (actual time=0.050..120.269 rows=251212 loops=1)
Total runtime: 33275.028 ms

Second, I modified the work_mem setting to 2GB (reloaded config) and I see a 
response time of 38 seconds. Results below from EXPLAIN ANALYZE:
Hash Join  (cost=16212.30..52586.43 rows=92869 width=17) (actual time=26574.459..38406.422 rows=3163 loops=1)
  Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text = (pg.question)::text))
  Join Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text <> (COALESCE(pg.response, 'MISSING'::character varying))::text)
  ->  Seq Scan on pivotbad pb  (cost=0.00..2804.96 rows=93496 width=134) (actual time=0.067..37.938 rows=93496 loops=1)
  ->  Hash  (cost=7537.12..7537.12 rows=251212 width=134) (actual time=26426.127..26426.127 rows=251212 loops=1)
        Buckets: 1024  Batches: 64  Memory Usage: 650kB
        ->  Seq Scan on pivotgood pg  (cost=0.00..7537.12 rows=251212 width=134) (actual time=0.038..115.319 rows=251212 loops=1)
Total runtime: 38406.927 ms

By no means am I trying to compare the 2 products. When I noticed the slow
behavior of COALESCE I tried it on SQL Server, and since they are running on
the same machine, hence my comment regarding apples to apples. It is possible
that this is not an apples-to-apples comparison, other than the fact that it
is running on the same machine.

 From: pavel.steh...@gmail.com
 Date: Thu, 18 Nov 2010 07:14:24 +0100
 Subject: Re: Query Performance SQL Server vs. Postgresql
 To: huma...@hotmail.com
 CC: pgsql-performance@postgresql.org
 
 Hello,
 
 there is probably a problem with the statistics - they are out of line
 with reality. Please try the IS DISTINCT FROM operator now - maybe the
 statistics will be better. Next, try increasing work_mem. The hash join
 is untypically slow on your machine.
 
 Regards
 
 Pavel Stehule
 
 2010/11/17 Humair Mohammed huma...@hotmail.com:
  Yes strange indeed, I did rerun ANALYZE and VACUUM. It took 70 seconds to
  rerun the query. Results from EXPLAIN ANALYZE below:
  Hash Join  (cost=16212.30..52586.43 rows=92869 width=17) (actual
  time=43200.223..49502.874 rows=3163 loops=1)
Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text =
  (pg.question)::text))
  Join Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text
   <> (COALESCE(pg.response, 'MISSING'::character varying))::text)
  ->  Seq Scan on pivotbad pb  (cost=0.00..2804.96 rows=93496 width=134)
  (actual time=0.009..48.200 rows=93496 loops=1)
  ->  Hash  (cost=7537.12..7537.12 rows=251212 width=134) (actual
  time=42919.453..42919.453 rows=251212 loops=1)
        Buckets: 1024  Batches: 64  Memory Usage: 650kB
        ->  Seq Scan on pivotgood pg  (cost=0.00..7537.12 rows=251212
  width=134) (actual time=0.119..173.019 rows=251212 loops=1)
  Total runtime: 49503.450 ms
 
  From: pavel.steh...@gmail.com
  Date: Wed, 17 Nov 2010 05:47:51 +0100
  Subject: Re: Query Performance SQL Server vs. Postgresql
  To: huma...@hotmail.com
  CC: pgsql-performance@postgresql.org
 
  2010/11/17 Humair Mohammed huma...@hotmail.com:
  
   There are no indexes on the tables either in SQL Server or Postgresql -
   I am
   comparing apples to apples here. I ran ANALYZE on the postgresql tables,
   after that query performance times are still high 42 seconds with
   COALESCE
   and 35 seconds with IS DISTINCT FROM.
    Here is the execution plan from Postgresql for query - select pb.id from
    pivotbad pb inner join pivotgood pg on pb.id = pg.id and pb.question =
    pg.question and coalesce(pb.response,'MISSING') <>
    coalesce(pg.response,'MISSING')
   Execution Time: 42 seconds
   Hash Join  (cost=16212.30..48854.24 rows=93477 width=17)
 Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text
   =
   

Re: [PERFORM] Query Performance SQL Server vs. Postgresql

2010-11-21 Thread Robert Haas
On Nov 21, 2010, at 12:16 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 t...@fuzzy.cz writes:
 Second, I modified the work_mem setting to 2GB (reloaded config) and I see
 a response time of 38 seconds. Results below from EXPLAIN ANALYZE:
 
  How did you reload the config? Using 'kill -HUP pid'? That should work
  fine. Have you checked 'work_mem' after the reload?
 
 Because the explain plans are exactly the same (structure, estimated
 costs). The really interesting bit is this and it did not change at all
 
   Buckets: 1024 Batches: 64  Memory Usage: 650kB
 
 If that didn't change, I'm prepared to bet that the OP didn't actually
 manage to change the active value of work_mem.

Yep.  All this speculation about slow disks and/or COALESCE strikes me as 
likely totally off-base. I think the original poster needs to run 'show 
work_mem' right before the EXPLAIN ANALYZE to make sure the new value they set 
actually stuck. There's no reason for the planner to have used only 650kB if 
work_mem is set to anything >= 2MB.

...Robert


Re: [PERFORM] Query Performance SQL Server vs. Postgresql

2010-11-20 Thread Pavel Stehule
Hello

I don't know. I checked a similar query on a similar dataset (same size),
and I get times of about 3 seconds on a much worse notebook. So the problem
may be disk I/O speed - maybe access to TOASTed values.

2010/11/21 Humair Mohammed huma...@hotmail.com:
 I am running 64-bit Windows 7 Enterprise with 8G RAM and a Dual Core 2.67
 Ghz Intel CPU. Both the SQL 2008 R2 and Postgresql are installed on the same
 machine. The DISTINCT FROM instead of the COALESCE does not help much. I ran
 2 further tests with work_mem modifications (please note memory usage is
 quite low 650kb, so I am not sure if the work_mem is a factor):

it has a slightly different meaning: work_mem is just a limit, so memory
usage can never be greater than work_mem. If more is needed, pg
increases the number of batches - storing data in blocks on disk. Higher
work_mem ~ fewer batches. So the ideal is 1 batch.
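
In concrete terms for this thread's data: the hashed side is ~251,212 rows at width 134, i.e. roughly 251212 × 134 ≈ 32 MB of raw data, and the fast plan posted later reports "Memory Usage: 39939kB" with Batches: 1. So anything comfortably above that ~40 MB should collapse the hash to a single batch (a sketch):

```sql
-- With the default 1MB the hash is split: "Buckets: 1024  Batches: 64".
SET work_mem = '64MB';
-- Re-running EXPLAIN (ANALYZE, BUFFERS) should now show a single batch,
-- e.g. "Buckets: 32768  Batches: 1", with no temp file I/O.
RESET work_mem;
```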

Regards

Pavel Stehule

 First, I modified the work_mem setting to 1GB (reloaded config) from the
 default 1MB and I see a response time of 33 seconds. Results below from
 EXPLAIN ANALYZE:
 Hash Join  (cost=16212.30..52586.43 rows=92869 width=17) (actual
 time=26742.343..33274.317 rows=3163 loops=1)
   Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text =
 (pg.question)::text))
   Join Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text
  <> (COALESCE(pg.response, 'MISSING'::character varying))::text)
   ->  Seq Scan on pivotbad pb  (cost=0.00..2804.96 rows=93496 width=134)
 (actual time=0.055..40.710 rows=93496 loops=1)
   ->  Hash  (cost=7537.12..7537.12 rows=251212 width=134) (actual
 time=25603.460..25603.460 rows=251212 loops=1)
         Buckets: 1024  Batches: 64  Memory Usage: 650kB
         ->  Seq Scan on pivotgood pg  (cost=0.00..7537.12 rows=251212
 width=134) (actual time=0.050..120.269 rows=251212 loops=1)
 Total runtime: 33275.028 ms

 Second, I modified the work_mem setting to 2GB (reloaded config) and I see a
 response time of 38 seconds. Results below from EXPLAIN ANALYZE:
 Hash Join  (cost=16212.30..52586.43 rows=92869 width=17) (actual
 time=26574.459..38406.422 rows=3163 loops=1)
   Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text =
 (pg.question)::text))
   Join Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text
  <> (COALESCE(pg.response, 'MISSING'::character varying))::text)
   ->  Seq Scan on pivotbad pb  (cost=0.00..2804.96 rows=93496 width=134)
 (actual time=0.067..37.938 rows=93496 loops=1)
   ->  Hash  (cost=7537.12..7537.12 rows=251212 width=134) (actual
 time=26426.127..26426.127 rows=251212 loops=1)
         Buckets: 1024  Batches: 64  Memory Usage: 650kB
         ->  Seq Scan on pivotgood pg  (cost=0.00..7537.12 rows=251212
 width=134) (actual time=0.038..115.319 rows=251212 loops=1)
 Total runtime: 38406.927 ms

 By no means I am trying to compare the 2 products. When I noticed the slow
 behavior of COALESCE I tried it on SQL Server. And since they are running on
 the same machine my comment regarding apples to apples. It is possible that
 this is not an apples to apples comparison other than the fact that it is
 running on the same machine.

 From: pavel.steh...@gmail.com
 Date: Thu, 18 Nov 2010 07:14:24 +0100
 Subject: Re: Query Performance SQL Server vs. Postgresql
 To: huma...@hotmail.com
 CC: pgsql-performance@postgresql.org

 Hello,

  there is probably a problem with the statistics - they are out of line
  with reality. Please try the IS DISTINCT FROM operator now - maybe the
  statistics will be better. Next, try increasing work_mem. The hash join
  is untypically slow on your machine.

 Regards

 Pavel Stehule

 2010/11/17 Humair Mohammed huma...@hotmail.com:
   Yes strange indeed, I did rerun ANALYZE and VACUUM. It took 70 seconds
   to rerun the query. Results from EXPLAIN ANALYZE below:
  Hash Join  (cost=16212.30..52586.43 rows=92869 width=17) (actual
  time=43200.223..49502.874 rows=3163 loops=1)
    Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text
  =
  (pg.question)::text))
     Join Filter: ((COALESCE(pb.response, 'MISSING'::character
   varying))::text
      <> (COALESCE(pg.response, 'MISSING'::character varying))::text)
     ->  Seq Scan on pivotbad pb  (cost=0.00..2804.96 rows=93496
   width=134)
   (actual time=0.009..48.200 rows=93496 loops=1)
     ->  Hash  (cost=7537.12..7537.12 rows=251212 width=134) (actual
   time=42919.453..42919.453 rows=251212 loops=1)
           Buckets: 1024  Batches: 64  Memory Usage: 650kB
           ->  Seq Scan on pivotgood pg  (cost=0.00..7537.12 rows=251212
   width=134) (actual time=0.119..173.019 rows=251212 loops=1)
  Total runtime: 49503.450 ms
 
  From: pavel.steh...@gmail.com
  Date: Wed, 17 Nov 2010 05:47:51 +0100
  Subject: Re: Query Performance SQL Server vs. Postgresql
  To: huma...@hotmail.com
  CC: pgsql-performance@postgresql.org
 
  2010/11/17 Humair Mohammed huma...@hotmail.com:
  
   There are no indexes on the tables either in SQL Server or Postgresql
   -
   I am
   comparing apples to apples here. I ran ANALYZE 

Re: [PERFORM] Query Performance SQL Server vs. Postgresql

2010-11-18 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 
 I have no idea how much memory SQL Server thinks it can use
 
Hmmm...  That triggered an old memory -- when we were running SQL
Server on Windows there was some registry setting which we tweaked
to prevent the OS from trying to cache disk I/O.  (Sorry I don't
remember the name of it.)  That helped SQL Server perform better,
but would cripple PostgreSQL -- it counts on OS caching.
 
Of course, once we found that PostgreSQL was 70% faster on identical
hardware with identical load, and switching the OS to Linux brought
it to twice as fast, I haven't had to worry about SQL Server or
Windows configurations.  ;-)  Don't panic if PostgreSQL seems slower
at first, it's probably a configuration or maintenance schedule
issue that can be sorted out.
 
Besides the specific advice Tom gave you, you might want to browse
this page for configuration in general:
 
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
 
And if you continue to experience performance issues, this page can
help you get to a resolution quickly:
 
http://wiki.postgresql.org/wiki/SlowQueryQuestions
 
We've been very happy with the switch to PostgreSQL.  We've had
better performance, better reliability, less staff time needed to
babysit backups, and we've been gradually using more of the advanced
features not available in other products.  It's well worth the
effort to get over those initial bumps resulting from product
differences.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query Performance SQL Server vs. Postgresql

2010-11-17 Thread Tomas Vondra
Dne 17.11.2010 05:47, Pavel Stehule napsal(a):
 2010/11/17 Humair Mohammed huma...@hotmail.com:

 There are no indexes on the tables either in SQL Server or Postgresql - I am
 comparing apples to apples here. I ran ANALYZE on the postgresql tables,

Actually no, you're not comparing apples to apples. You've provided so
little information that you may be comparing apples to cucumbers or
maybe some strange animals.

1) info about the install

What OS is this running on? I guess it's Windows in both cases, right?

How much memory is there? What is the size of shared_buffers? The
default PostgreSQL settings are very, very limited - you have to bump
it to a much larger value.

What are the other important settings (e.g. the work_mem)?
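
A quick way to answer those configuration questions from psql - a small
sketch (SHOW and pg_settings are standard; the two names below are just the
settings discussed in this thread):

```sql
-- Report the currently active values:
SHOW shared_buffers;
SHOW work_mem;

-- Or both at once, with units:
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem');
```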

2) info about the dataset

How large are the tables? I don't mean number of rows, I mean number of
blocks / occupied disk space. Run this query

SELECT relname, relpages, reltuples, pg_size_pretty(pg_table_size(oid))
FROM pg_class WHERE relname IN ('table1', 'table2');

3) info about the plan

Please, provide EXPLAIN ANALYZE output, maybe with info about buffers,
e.g. something like

EXPLAIN (ANALYZE ON, BUFFERS ON) SELECT ...

4) no indexes ?

Why have you decided not to use any indexes? If you want decent
performance, you will have to use indexes. Obviously there is some
overhead associated with them, but avoiding them is premature
optimization unless you prove otherwise.
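
As a hedged illustration of that advice - not something the original poster
had in place - indexes on the join keys could look like this (table and
column names taken from the plans quoted elsewhere in this thread):

```sql
-- Composite indexes on the equality join keys; with these in place the
-- planner can consider index-based join strategies instead of two full
-- sequential scans feeding a hash join.
CREATE INDEX pivotbad_id_question_idx  ON pivotbad  (id, question);
CREATE INDEX pivotgood_id_question_idx ON pivotgood (id, question);

-- Then update statistics and re-check the plan:
ANALYZE pivotbad;
ANALYZE pivotgood;
```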

BTW I'm not a MSSQL expert, but it seems like it's building a bitmap
index on the fly, to synchronize parallelized query - PostgreSQL does
not support that.

regards
Tomas

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query Performance SQL Server vs. Postgresql

2010-11-17 Thread Rich
I have to concur.  SQL Server is written specifically and only for Windows; it
is optimized for Windows.  PostgreSQL is written for just about every platform
using common code, so there isn't much OS-specific optimization, because it has
to run on whatever OS hosts it.  Check out your config and send it to us.  That
would include the OS and hardware configs for both machines.

On Wed, Nov 17, 2010 at 3:47 PM, Tomas Vondra t...@fuzzy.cz wrote:

 Dne 17.11.2010 05:47, Pavel Stehule napsal(a):
  2010/11/17 Humair Mohammed huma...@hotmail.com:
 
  There are no indexes on the tables either in SQL Server or Postgresql -
 I am
  comparing apples to apples here. I ran ANALYZE on the postgresql tables,

 Actually no, you're not comparing apples to apples. You've provided so
 little information that you may be comparing apples to cucumbers or
 maybe some strange animals.

 1) info about the install

 What OS is this running on? I guess it's Windows in both cases, right?

 How much memory is there? What is the size of shared_buffers? The
 default PostgreSQL settings are very, very limited - you have to bump
 it to a much larger value.

 What are the other important settings (e.g. the work_mem)?

 2) info about the dataset

 How large are the tables? I don't mean number of rows, I mean number of
 blocks / occupied disk space. Run this query

 SELECT relname, relpages, reltuples, pg_size_pretty(pg_table_size(oid))
 FROM pg_class WHERE relname IN ('table1', 'table2');

 3) info about the plan

 Please, provide EXPLAIN ANALYZE output, maybe with info about buffers,
 e.g. something like

 EXPLAIN (ANALYZE ON, BUFFERS ON) SELECT ...

 4) no indexes ?

 Why have you decided not to use any indexes? If you want a decent
 performance, you will have to use indexes. Obviously there is some
 overhead associated with them, but it's premature optimization unless
 you prove the opposite.

 BTW I'm not a MSSQL expert, but it seems like it's building a bitmap
 index on the fly, to synchronize parallelized query - PostgreSQL does
 not support that.

 regards
 Tomas

 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance



Re: [PERFORM] Query Performance SQL Server vs. Postgresql

2010-11-17 Thread Mladen Gogala
I thought I'd seen an announcement about SQL Server for Linux on 
04/01/2005? I cannot find the link right now, but I am quite certain that there 
was such an announcement.


From: pgsql-performance-ow...@postgresql.org 
pgsql-performance-ow...@postgresql.org
To: Tomas Vondra t...@fuzzy.cz
Cc: pgsql-performance@postgresql.org pgsql-performance@postgresql.org
Sent: Wed Nov 17 15:51:55 2010
Subject: Re: [PERFORM] Query Performance SQL Server vs. Postgresql

I have to concur.  SQL Server is written specifically and only for Windows; it
is optimized for Windows.  PostgreSQL is written for just about every platform
using common code, so there isn't much OS-specific optimization, because it has
to run on whatever OS hosts it.  Check out your config and send it to us.  That
would include the OS and hardware configs for both machines.

On Wed, Nov 17, 2010 at 3:47 PM, Tomas Vondra t...@fuzzy.cz wrote:
Dne 17.11.2010 05:47, Pavel Stehule napsal(a):
 2010/11/17 Humair Mohammed huma...@hotmail.com:

 There are no indexes on the tables either in SQL Server or Postgresql - I am
 comparing apples to apples here. I ran ANALYZE on the postgresql tables,

Actually no, you're not comparing apples to apples. You've provided so
little information that you may be comparing apples to cucumbers or
maybe some strange animals.

1) info about the install

What OS is this running on? I guess it's Windows in both cases, right?

How much memory is there? What is the size of shared_buffers? The
default PostgreSQL settings are very, very limited - you have to bump
it to a much larger value.

What are the other important settings (e.g. the work_mem)?

2) info about the dataset

How large are the tables? I don't mean number of rows, I mean number of
blocks / occupied disk space. Run this query

SELECT relname, relpages, reltuples, pg_size_pretty(pg_table_size(oid))
FROM pg_class WHERE relname IN ('table1', 'table2');

3) info about the plan

Please, provide EXPLAIN ANALYZE output, maybe with info about buffers,
e.g. something like

EXPLAIN (ANALYZE ON, BUFFERS ON) SELECT ...

4) no indexes ?

Why have you decided not to use any indexes? If you want a decent
performance, you will have to use indexes. Obviously there is some
overhead associated with them, but it's premature optimization unless
you prove the opposite.

BTW I'm not a MSSQL expert, but it seems like it's building a bitmap
index on the fly, to synchronize parallelized query - PostgreSQL does
not support that.

regards
Tomas

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



Re: [PERFORM] Query Performance SQL Server vs. Postgresql

2010-11-17 Thread Humair Mohammed

Yes strange indeed, I did rerun ANALYZE and VACUUM. Took 70 seconds to rerun 
the query. Results from EXPLAIN ANALYZE below:
Hash Join  (cost=16212.30..52586.43 rows=92869 width=17) (actual
time=43200.223..49502.874 rows=3163 loops=1)
  Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text =
(pg.question)::text))
  Join Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text
<> (COALESCE(pg.response, 'MISSING'::character varying))::text)
  ->  Seq Scan on pivotbad pb  (cost=0.00..2804.96 rows=93496 width=134)
(actual time=0.009..48.200 rows=93496 loops=1)
  ->  Hash  (cost=7537.12..7537.12 rows=251212 width=134) (actual
time=42919.453..42919.453 rows=251212 loops=1)
        Buckets: 1024  Batches: 64  Memory Usage: 650kB
        ->  Seq Scan on pivotgood pg  (cost=0.00..7537.12 rows=251212
width=134) (actual time=0.119..173.019 rows=251212 loops=1)
Total runtime: 49503.450 ms

 From: pavel.steh...@gmail.com
 Date: Wed, 17 Nov 2010 05:47:51 +0100
 Subject: Re: Query Performance SQL Server vs. Postgresql
 To: huma...@hotmail.com
 CC: pgsql-performance@postgresql.org
 
 2010/11/17 Humair Mohammed huma...@hotmail.com:
 
  There are no indexes on the tables either in SQL Server or Postgresql - I am
  comparing apples to apples here. I ran ANALYZE on the postgresql tables,
  after that query performance times are still high 42 seconds with COALESCE
  and 35 seconds with IS DISTINCT FROM.
  Here is the execution plan from Postgresql for query - select pb.id from
  pivotbad pb inner join pivotgood pg on pb.id = pg.id and pb.question =
  pg.question and coalesce(pb.response,'MISSING') <>
  coalesce(pg.response,'MISSING')
  Execution Time: 42 seconds
  Hash Join  (cost=16212.30..48854.24 rows=93477 width=17)
    Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text =
  (pg.question)::text))
    Join Filter: ((COALESCE(pb.response, 'MISSING'::character
  varying))::text
   <> (COALESCE(pg.response, 'MISSING'::character varying))::text)
    ->  Seq Scan on pivotbad pb  (cost=0.00..2804.96 rows=93496 width=134)
    ->  Hash  (cost=7537.12..7537.12 rows=251212 width=134)
          ->  Seq Scan on pivotgood pg  (cost=0.00..7537.12 rows=251212
  width=134)
 
 this is a little bit strange - did you ANALYZE and VACUUM?
 
 please send result of EXPLAIN ANALYZE
 
 Pavel
 
 
  And here is the execution plan from SQL Server for query - select pb.id from
  pivotbad pb inner join pivotgood pg on pb.id = pg.id and pb.question =
  pg.question and isnull(pb.response,'ISNULL') <> isnull(pg.response,'ISNULL')
  Execution Time:  1 second
  Cost: 1%  |--Parallelism(Gather Streams)
  Cost: 31%   |--Hash Match(Inner Join, HASH:([pb].[ID],
  [pb].[Question])=([pg].[ID], [pg].[Question]),
  RESIDUAL:([master].[dbo].[pivotbad].[ID] as
  [pb].[ID]=[master].[dbo].[pivotgood].[ID] as [pg].[ID] AND
  [master].[dbo].[pivotbad].[Question] as
  [pb].[Question]=[master].[dbo].[pivotgood].[Question] as [pg].[Question] AND
  [Expr1006]<>[Expr1007]))
  Cost: 0%  |--Bitmap(HASH:([pb].[ID], [pb].[Question]),
  DEFINE:([Bitmap1008]))
  Cost: 0%|--Compute
  Scalar(DEFINE:([Expr1006]=isnull([master].[dbo].[pivotbad].[Response] as
  [pb].[Response],'ISNULL')))
  Cost:  6%   |--Parallelism(Repartition Streams, Hash
  Partitioning, PARTITION COLUMNS:([pb].[ID], [pb].[Question]))
  Cost: 12%  |--Table Scan(OBJECT:([master].[dbo].[pivotbad] AS
  [pb]))
  Cost: 0% |--Compute
  Scalar(DEFINE:([Expr1007]=isnull([master].[dbo].[pivotgood].[Response] as
  [pg].[Response],'ISNULL')))
  Cost: 17% |--Parallelism(Repartition Streams, Hash
  Partitioning, PARTITION COLUMNS:([pg].[ID], [pg].[Question]))
  Cost: 33% |--Table
  Scan(OBJECT:([master].[dbo].[pivotgood] AS [pg]),
  WHERE:(PROBE([Bitmap1008],[master].[dbo].[pivotgood].[ID] as
  [pg].[ID],[master].[dbo].[pivotgood].[Question] as [pg].[Question])))
 
 
 
  From: pavel.steh...@gmail.com
  Date: Tue, 16 Nov 2010 08:12:03 +0100
  Subject: Re: [PERFORM]
  To: huma...@hotmail.com
  CC: pgsql-performance@postgresql.org
 
  2010/11/15 Humair Mohammed huma...@hotmail.com:
   I have 2 tables with a 200,000 rows of data 3 character/string columns
   ID,
   Question and Response. The query below compares the data between the 2
   tables based on ID and Question and if the Response does not match
   between
   the left table and the right table it identifies the ID's where there is
   a
   mismatch. Running the query in SQL Server 2008 using the ISNULL function
   takes a few milliseconds. Running the same query in Postgresql takes over
   70
   seconds. The 2 queries are below:
   SQL Server 2008 R2 Query
   select t1.id from table1 t1 inner join table2 t2 on t1.id = t2.id and
   t1.question = t2.question and isnull(t1.response,'ISNULL') <>
   isnull(t2.response,'ISNULL')
 
   Postgres 9.1 Query
   select t1.id from table1 t1 inner join table2 t2 on t1.id = t2.id and
   t1.question = t2.question and 

Re: [PERFORM] Query Performance SQL Server vs. Postgresql

2010-11-17 Thread Humair Mohammed


There are no indexes on the tables either in SQL Server or Postgresql - I am 
comparing apples to apples here. I ran ANALYZE on the postgresql tables, after 
that query performance times are still high 42 seconds with COALESCE and 35 
seconds with IS DISTINCT FROM.
Here is the execution plan from Postgresql for query - select pb.id from 
pivotbad pb inner join pivotgood pg on pb.id = pg.id and pb.question = 
pg.question and coalesce(pb.response,'MISSING') <>
coalesce(pg.response,'MISSING')
Execution Time: 42 seconds
Hash Join  (cost=16212.30..48854.24 rows=93477 width=17)
  Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text =
(pg.question)::text))
  Join Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text
<> (COALESCE(pg.response, 'MISSING'::character varying))::text)
  ->  Seq Scan on pivotbad pb  (cost=0.00..2804.96 rows=93496 width=134)
  ->  Hash  (cost=7537.12..7537.12 rows=251212 width=134)
        ->  Seq Scan on pivotgood pg  (cost=0.00..7537.12 rows=251212
width=134)

And here is the execution plan from SQL Server for query - select pb.id from 
pivotbad pb inner join pivotgood pg on pb.id = pg.id and pb.question = 
pg.question and isnull(pb.response,'ISNULL') <> isnull(pg.response,'ISNULL')
Execution Time:  1 second
Cost: 1%  |--Parallelism(Gather Streams)
Cost: 31%   |--Hash Match(Inner Join, HASH:([pb].[ID],
[pb].[Question])=([pg].[ID], [pg].[Question]),
RESIDUAL:([master].[dbo].[pivotbad].[ID] as
[pb].[ID]=[master].[dbo].[pivotgood].[ID] as [pg].[ID] AND
[master].[dbo].[pivotbad].[Question] as
[pb].[Question]=[master].[dbo].[pivotgood].[Question] as [pg].[Question] AND
[Expr1006]<>[Expr1007]))
    Cost: 0%  |--Bitmap(HASH:([pb].[ID], [pb].[Question]),
DEFINE:([Bitmap1008]))
        Cost: 0%    |--Compute
Scalar(DEFINE:([Expr1006]=isnull([master].[dbo].[pivotbad].[Response] as
[pb].[Response],'ISNULL')))
        Cost:  6%   |--Parallelism(Repartition Streams, Hash
Partitioning, PARTITION COLUMNS:([pb].[ID], [pb].[Question]))
        Cost: 12%  |--Table Scan(OBJECT:([master].[dbo].[pivotbad] AS [pb]))
        Cost: 0% |--Compute
Scalar(DEFINE:([Expr1007]=isnull([master].[dbo].[pivotgood].[Response] as
[pg].[Response],'ISNULL')))
            Cost: 17% |--Parallelism(Repartition Streams, Hash
Partitioning, PARTITION COLUMNS:([pg].[ID], [pg].[Question]))
                Cost: 33% |--Table Scan(OBJECT:([master].[dbo].[pivotgood] AS
[pg]), WHERE:(PROBE([Bitmap1008],[master].[dbo].[pivotgood].[ID] as
[pg].[ID],[master].[dbo].[pivotgood].[Question] as [pg].[Question])))



 From: pavel.steh...@gmail.com
 Date: Tue, 16 Nov 2010 08:12:03 +0100
 Subject: Re: [PERFORM]
 To: huma...@hotmail.com
 CC: pgsql-performance@postgresql.org
 
 2010/11/15 Humair Mohammed huma...@hotmail.com:
  I have 2 tables with a 200,000 rows of data 3 character/string columns ID,
  Question and Response. The query below compares the data between the 2
  tables based on ID and Question and if the Response does not match between
  the left table and the right table it identifies the ID's where there is a
  mismatch. Running the query in SQL Server 2008 using the ISNULL function
  takes a few milliseconds. Running the same query in Postgresql takes over 70
  seconds. The 2 queries are below:
  SQL Server 2008 R2 Query
  select t1.id from table1 t1 inner join table2 t2 on t1.id = t2.id and
  t1.question = t2.question and isnull(t1.response,'ISNULL') <>
  isnull(t2.response,'ISNULL')
 
  Postgres 9.1 Query
  select t1.id from table1 t1 inner join table2 t2 on t1.id = t2.id and
  t1.question = t2.question and coalesce(t1.response,'ISNULL') <>
  coalesce(t2.response,'ISNULL')
  What gives?
 
 I think the problem may be in the ugly predicate
 coalesce(t1.response,'ISNULL') <>
  coalesce(t2.response,'ISNULL')
 
 try using the IS DISTINCT FROM operator
 
 ... AND t1.response IS DISTINCT FROM t2.response
 
 Regards
 
 Pavel Stehule
 
 p.s. don't use COALESCE in a WHERE clause if possible.
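
Applied to the query quoted above, Pavel's suggestion would read roughly
like this (a sketch; table names as in the original post):

```sql
-- IS DISTINCT FROM treats NULLs as comparable values: NULL vs NULL is
-- "not distinct", NULL vs anything else is "distinct" - so the COALESCE
-- sentinel value is no longer needed.
SELECT t1.id
FROM table1 t1
INNER JOIN table2 t2
   ON t1.id = t2.id
  AND t1.question = t2.question
WHERE t1.response IS DISTINCT FROM t2.response;
```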
  

Re: [PERFORM] Query Performance SQL Server vs. Postgresql

2010-11-17 Thread Tom Lane
Humair Mohammed huma...@hotmail.com writes:
 Yes strange indeed, I did rerun ANALYZE and VACUUM. Took 70 seconds to rerun 
 the query. Results from EXPLAIN ANALYZE below:
 Hash Join  (cost=16212.30..52586.43 rows=92869 width=17) (actual
 time=43200.223..49502.874 rows=3163 loops=1)
   Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text =
 (pg.question)::text))
   Join Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text
 <> (COALESCE(pg.response, 'MISSING'::character varying))::text)
   ->  Seq Scan on pivotbad pb  (cost=0.00..2804.96 rows=93496 width=134)
 (actual time=0.009..48.200 rows=93496 loops=1)
   ->  Hash  (cost=7537.12..7537.12 rows=251212 width=134) (actual
 time=42919.453..42919.453 rows=251212 loops=1)
         Buckets: 1024  Batches: 64  Memory Usage: 650kB
         ->  Seq Scan on pivotgood pg  (cost=0.00..7537.12 rows=251212
 width=134) (actual time=0.119..173.019 rows=251212 loops=1)
 Total runtime: 49503.450 ms

I have no idea how much memory SQL Server thinks it can use, but
Postgres is limiting itself to work_mem which you've apparently left at
the default 1MB.  You might get a fairer comparison by bumping that up
some --- try 32MB or so.  You want it high enough so that the Hash
output doesn't say there are multiple batches.
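
Tom's suggestion can be tried per session, without editing postgresql.conf -
a sketch (32MB is only the starting point he names, not a tuned value):

```sql
-- work_mem can be changed for the current session only:
SET work_mem = '32MB';

-- Re-run the EXPLAIN ANALYZE; with enough work_mem the Hash node
-- should report "Batches: 1" instead of "Batches: 64".
EXPLAIN ANALYZE
SELECT pb.id
FROM pivotbad pb
INNER JOIN pivotgood pg
   ON pb.id = pg.id
  AND pb.question = pg.question
  AND coalesce(pb.response,'MISSING') <> coalesce(pg.response,'MISSING');
```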

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query Performance SQL Server vs. Postgresql

2010-11-17 Thread Pavel Stehule
Hello,

there seems to be a problem with the statistics - they are out of
line with reality. Please try the IS DISTINCT FROM operator now - maybe the
statistics will be better. Next, try to increase work_mem. The hash
join is atypically slow on your machine.

Regards

Pavel Stehule

2010/11/17 Humair Mohammed huma...@hotmail.com:
 Yes strange indeed, I did rerun ANALYZE and VACUUM. Took 70 seconds to rerun
 the query. Results from EXPLAIN ANALYZE below:
 Hash Join  (cost=16212.30..52586.43 rows=92869 width=17) (actual
 time=43200.223..49502.874 rows=3163 loops=1)
   Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text =
 (pg.question)::text))
   Join Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text
 <> (COALESCE(pg.response, 'MISSING'::character varying))::text)
   ->  Seq Scan on pivotbad pb  (cost=0.00..2804.96 rows=93496 width=134)
 (actual time=0.009..48.200 rows=93496 loops=1)
   ->  Hash  (cost=7537.12..7537.12 rows=251212 width=134) (actual
 time=42919.453..42919.453 rows=251212 loops=1)
         Buckets: 1024  Batches: 64  Memory Usage: 650kB
         ->  Seq Scan on pivotgood pg  (cost=0.00..7537.12 rows=251212
 width=134) (actual time=0.119..173.019 rows=251212 loops=1)
 Total runtime: 49503.450 ms

 From: pavel.steh...@gmail.com
 Date: Wed, 17 Nov 2010 05:47:51 +0100
 Subject: Re: Query Performance SQL Server vs. Postgresql
 To: huma...@hotmail.com
 CC: pgsql-performance@postgresql.org

 2010/11/17 Humair Mohammed huma...@hotmail.com:
 
  There are no indexes on the tables either in SQL Server or Postgresql -
  I am
  comparing apples to apples here. I ran ANALYZE on the postgresql tables,
  after that query performance times are still high 42 seconds with
  COALESCE
  and 35 seconds with IS DISTINCT FROM.
  Here is the execution plan from Postgresql for query - select pb.id from
  pivotbad pb inner join pivotgood pg on pb.id = pg.id and pb.question =
  pg.question and coalesce(pb.response,'MISSING') <>
  coalesce(pg.response,'MISSING')
  Execution Time: 42 seconds
  Hash Join  (cost=16212.30..48854.24 rows=93477 width=17)
    Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text
  =
  (pg.question)::text))
    Join Filter: ((COALESCE(pb.response, 'MISSING'::character
  varying))::text
   <> (COALESCE(pg.response, 'MISSING'::character varying))::text)
    ->  Seq Scan on pivotbad pb  (cost=0.00..2804.96 rows=93496
  width=134)
    ->  Hash  (cost=7537.12..7537.12 rows=251212 width=134)
          ->  Seq Scan on pivotgood pg  (cost=0.00..7537.12 rows=251212
  width=134)

  this is a little bit strange - did you ANALYZE and VACUUM?

 please send result of EXPLAIN ANALYZE

 Pavel

 
  And here is the execution plan from SQL Server for query - select pb.id
  from
  pivotbad pb inner join pivotgood pg on pb.id = pg.id and pb.question =
  pg.question and isnull(pb.response,'ISNULL')
   <> isnull(pg.response,'ISNULL')
  Execution Time:  1 second
  Cost: 1%  |--Parallelism(Gather Streams)
  Cost: 31%       |--Hash Match(Inner Join, HASH:([pb].[ID],
  [pb].[Question])=([pg].[ID], [pg].[Question]),
  RESIDUAL:([master].[dbo].[pivotbad].[ID] as
  [pb].[ID]=[master].[dbo].[pivotgood].[ID] as [pg].[ID] AND
  [master].[dbo].[pivotbad].[Question] as
  [pb].[Question]=[master].[dbo].[pivotgood].[Question] as [pg].[Question]
  AND
   [Expr1006]<>[Expr1007]))
      Cost: 0%  |--Bitmap(HASH:([pb].[ID], [pb].[Question]),
  DEFINE:([Bitmap1008]))
              Cost: 0%    |--Compute
  Scalar(DEFINE:([Expr1006]=isnull([master].[dbo].[pivotbad].[Response] as
  [pb].[Response],'ISNULL')))
              Cost:  6%   |--Parallelism(Repartition Streams, Hash
  Partitioning, PARTITION COLUMNS:([pb].[ID], [pb].[Question]))
              Cost: 12%  |--Table Scan(OBJECT:([master].[dbo].[pivotbad]
  AS
  [pb]))
              Cost: 0% |--Compute
  Scalar(DEFINE:([Expr1007]=isnull([master].[dbo].[pivotgood].[Response]
  as
  [pg].[Response],'ISNULL')))
                  Cost: 17% |--Parallelism(Repartition Streams, Hash
  Partitioning, PARTITION COLUMNS:([pg].[ID], [pg].[Question]))
                      Cost: 33% |--Table
  Scan(OBJECT:([master].[dbo].[pivotgood] AS [pg]),
  WHERE:(PROBE([Bitmap1008],[master].[dbo].[pivotgood].[ID] as
  [pg].[ID],[master].[dbo].[pivotgood].[Question] as [pg].[Question])))
 
 
 
  From: pavel.steh...@gmail.com
  Date: Tue, 16 Nov 2010 08:12:03 +0100
  Subject: Re: [PERFORM]
  To: huma...@hotmail.com
  CC: pgsql-performance@postgresql.org
 
  2010/11/15 Humair Mohammed huma...@hotmail.com:
   I have 2 tables with a 200,000 rows of data 3 character/string
   columns
   ID,
   Question and Response. The query below compares the data between the
   2
   tables based on ID and Question and if the Response does not match
   between
   the left table and the right table it identifies the ID's where there
   is
   a
   mismatch. Running the query in SQL Server 2008 using the ISNULL
   function
    takes a few milliseconds. Running the same query in Postgresql takes
   over
   70
   seconds. The 2 

Re: [PERFORM] Query Performance SQL Server vs. Postgresql

2010-11-16 Thread Pavel Stehule
2010/11/17 Humair Mohammed huma...@hotmail.com:

 There are no indexes on the tables either in SQL Server or Postgresql - I am
 comparing apples to apples here. I ran ANALYZE on the postgresql tables,
 after that query performance times are still high 42 seconds with COALESCE
 and 35 seconds with IS DISTINCT FROM.
 Here is the execution plan from Postgresql for query - select pb.id from
 pivotbad pb inner join pivotgood pg on pb.id = pg.id and pb.question =
 pg.question and coalesce(pb.response,'MISSING') <>
 coalesce(pg.response,'MISSING')
 Execution Time: 42 seconds
 Hash Join  (cost=16212.30..48854.24 rows=93477 width=17)
   Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text =
 (pg.question)::text))
   Join Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text
  <> (COALESCE(pg.response, 'MISSING'::character varying))::text)
   ->  Seq Scan on pivotbad pb  (cost=0.00..2804.96 rows=93496 width=134)
   ->  Hash  (cost=7537.12..7537.12 rows=251212 width=134)
         ->  Seq Scan on pivotgood pg  (cost=0.00..7537.12 rows=251212
 width=134)

this is a little bit strange - did you ANALYZE and VACUUM?

please send result of EXPLAIN ANALYZE

Pavel


 And here is the execution plan from SQL Server for query - select pb.id from
 pivotbad pb inner join pivotgood pg on pb.id = pg.id and pb.question =
 pg.question and isnull(pb.response,'ISNULL') <> isnull(pg.response,'ISNULL')
 Execution Time:  1 second
 Cost: 1%  |--Parallelism(Gather Streams)
 Cost: 31%       |--Hash Match(Inner Join, HASH:([pb].[ID],
 [pb].[Question])=([pg].[ID], [pg].[Question]),
 RESIDUAL:([master].[dbo].[pivotbad].[ID] as
 [pb].[ID]=[master].[dbo].[pivotgood].[ID] as [pg].[ID] AND
 [master].[dbo].[pivotbad].[Question] as
 [pb].[Question]=[master].[dbo].[pivotgood].[Question] as [pg].[Question] AND
 [Expr1006]<>[Expr1007]))
     Cost: 0%  |--Bitmap(HASH:([pb].[ID], [pb].[Question]),
 DEFINE:([Bitmap1008]))
             Cost: 0%    |--Compute
 Scalar(DEFINE:([Expr1006]=isnull([master].[dbo].[pivotbad].[Response] as
 [pb].[Response],'ISNULL')))
             Cost:  6%   |--Parallelism(Repartition Streams, Hash
 Partitioning, PARTITION COLUMNS:([pb].[ID], [pb].[Question]))
             Cost: 12%  |--Table Scan(OBJECT:([master].[dbo].[pivotbad] AS
 [pb]))
             Cost: 0% |--Compute
 Scalar(DEFINE:([Expr1007]=isnull([master].[dbo].[pivotgood].[Response] as
 [pg].[Response],'ISNULL')))
                 Cost: 17% |--Parallelism(Repartition Streams, Hash
 Partitioning, PARTITION COLUMNS:([pg].[ID], [pg].[Question]))
                     Cost: 33% |--Table
 Scan(OBJECT:([master].[dbo].[pivotgood] AS [pg]),
 WHERE:(PROBE([Bitmap1008],[master].[dbo].[pivotgood].[ID] as
 [pg].[ID],[master].[dbo].[pivotgood].[Question] as [pg].[Question])))



 From: pavel.steh...@gmail.com
 Date: Tue, 16 Nov 2010 08:12:03 +0100
 Subject: Re: [PERFORM]
 To: huma...@hotmail.com
 CC: pgsql-performance@postgresql.org

 2010/11/15 Humair Mohammed huma...@hotmail.com:
  I have 2 tables with a 200,000 rows of data 3 character/string columns
  ID,
  Question and Response. The query below compares the data between the 2
  tables based on ID and Question and if the Response does not match
  between
  the left table and the right table it identifies the ID's where there is
  a
  mismatch. Running the query in SQL Server 2008 using the ISNULL function
   takes a few milliseconds. Running the same query in Postgresql takes over
  70
  seconds. The 2 queries are below:
  SQL Server 2008 R2 Query
  select t1.id from table1 t1 inner join table2 t2 on t1.id = t2.id and
   t1.question = t2.question and isnull(t1.response,'ISNULL') <>
  isnull(t2.response,'ISNULL')

  Postgres 9.1 Query
  select t1.id from table1 t1 inner join table2 t2 on t1.id = t2.id and
   t1.question = t2.question and coalesce(t1.response,'ISNULL') <>
  coalesce(t2.response,'ISNULL')
  What gives?

 I think the problem may be in the ugly predicate
 coalesce(t1.response,'ISNULL') <>
  coalesce(t2.response,'ISNULL')

 try using the IS DISTINCT FROM operator

 ... AND t1.response IS DISTINCT FROM t2.response

 Regards

 Pavel Stehule

 p.s. don't use COALESCE in a WHERE clause if possible.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance