Re: [PERFORM] strange query plan with LIMIT

2011-06-08 Thread Claudio Freire
On Wed, Jun 8, 2011 at 7:08 AM,  anthony.ship...@symstream.com wrote:
 What seems odd to me is that the only difference between the two is the limit
 clause

Why would that seem odd?

Of course optimally executing a plan with limit is a lot different
than one without.

Just... why are you sorting by diag_id?

I believe you would be better off sorting by timestamp than diag_id,
but I don't know what the query is supposed to do.

In any case, that's a weakness I've seen in many database systems, and
postgres is no exception: order + limit strongly suggests index usage,
and when the ordered column has an anti-correlation with the where
clause (that is, too many of the first rows in the ordered output are
filtered out by the where clause), the plan with an index is
insufferably slow compared to a sequential scan + sort.

Postgres has no way to know that; it depends on the correlation between
the where clause and the ordering expressions.

If you cannot change the query, I think your only option is to either
add a specific index for that query (ie, if the where clause is always
the same, you could add a partial index), or just disable nested loops
with set enable_nestloop = false; just prior to running that query
(and remember to re-enable afterwards).
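
For example, something along these lines - the partial-index predicate and
the LIMIT below are only illustrative, borrowed from the query quoted later
in the thread:

-- only worthwhile if the where clause really is always this same window
create index tdiag_window_diag_id_idx on tdiag (diag_id)
 where create_time >= '2011-06-03 09:49:04+00'
   and create_time <  '2011-06-06 09:59:04+00';

-- or, per statement:
set enable_nestloop = false;
select * from tdiag
 where create_time >= '2011-06-03 09:49:04+00'
   and create_time <  '2011-06-06 09:59:04+00'
 order by diag_id limit 100;
set enable_nestloop = true;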

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] strange query plan with LIMIT

2011-06-08 Thread tv
 What seems odd to me is that the only difference between the two is the
 limit
 clause:

 select * from tdiag where (create_time >= '2011-06-03
 09:49:04.00+0' and create_time < '2011-06-06 09:59:04.00+0') order
 by
 diag_id limit 1;

 select * from tdiag where (create_time >= '2011-06-03
 09:49:04.00+0' and create_time < '2011-06-06 09:59:04.00+0') order
 by
 diag_id;

 and yet the plan completely changes.

As Claudio Freire already pointed out, this is expected behavior. With
LIMIT the planner prefers plans with low starting cost, as it expects to
end soon and building index bitmap / hash table would be a waste. So
actually it would be very odd if the plan did not change in this case ...

Anyway, I have no idea how to fix this cleanly - without messing with
enable_* or cost variables or other such dirty tricks.

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] strange query plan with LIMIT

2011-06-08 Thread anthony . shipman
On Wednesday 08 June 2011 17:39, Claudio Freire wrote:
 Of course optimally executing a plan with limit is a lot different
 than one without.

I imagined that limit just cuts out a slice of the query results. 
If it can find 8 rows in 0.5 seconds then I would have thought that 
returning just the first 100 of them should be just as easy.


 Just... why are you sorting by diag_id?

 I believe you would be better off sorting by timestamp than diag_id,
 but I don't know what the query is supposed to do.

The timestamp is only almost monotonic. I need to scan the table in slices and 
I use limit and offset to select the slice.

I've forced the query order with some pgsql like:

declare
query   character varying;
rec record;
begin
-- PG 8.3 doesn't have the 'using' syntax nor 'return query execute'

execute 'create temporary table tt on commit drop as ' ||
'select diag_id from tdiag ' || v_where;

query = 'select * from tdiag where diag_id in (select * from tt) ' ||
'order by diag_id ' || v_limit || ' ' || v_offset;

for rec in execute query loop
return next rec;
end loop;
end;
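
For completeness, a hedged sketch of how that fragment sits in a full
function - the function name and parameter names here are invented for
illustration only:

create or replace function tdiag_slice(v_where text, v_limit text, v_offset text)
returns setof tdiag as $$
declare
    query   character varying;
    rec     record;
begin
    -- materialize the matching ids once, then page through them by diag_id
    execute 'create temporary table tt on commit drop as ' ||
            'select diag_id from tdiag ' || v_where;

    query = 'select * from tdiag where diag_id in (select * from tt) ' ||
            'order by diag_id ' || v_limit || ' ' || v_offset;

    for rec in execute query loop
        return next rec;
    end loop;
end;
$$ language plpgsql;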

-- 
Anthony Shipman | Life is the interval
anthony.ship...@symstream.com   | between pay days.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] strange query plan with LIMIT

2011-06-08 Thread Pavel Stehule
2011/6/8  anthony.ship...@symstream.com:
 On Wednesday 08 June 2011 17:39, Claudio Freire wrote:
 Of course optimally executing a plan with limit is a lot different
 than one without.

 I imagined that limit just cuts out a slice of the query results.
 If it can find 8 rows in 0.5 seconds then I would have thought that
 returning just the first 100 of them should be just as easy.


 Just... why are you sorting by diag_id?

 I believe you would be better off sorting by timestamp than diag_id,
 but I don't know what the query is supposed to do.

 The timestamp is only almost monotonic. I need to scan the table in slices and
 I use limit and offset to select the slice.

 I've forced the query order with some pgsql like:

 declare
    query   character varying;
    rec     record;
 begin
    -- PG 8.3 doesn't have the 'using' syntax nor 'return query execute'

    execute 'create temporary table tt on commit drop as ' ||
        'select diag_id from tdiag ' || v_where;

    query = 'select * from tdiag where diag_id in (select * from tt) ' ||
            'order by diag_id ' || v_limit || ' ' || v_offset;

    for rec in execute query loop
        return next rec;
    end loop;
 end;

if you use a FOR statement, there could be a problem with the
implicit cursor - try setting the GUC cursor_tuple_fraction to 1.0.
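
For anyone following along, that is simply (note this GUC only exists in
8.4 and later, which turns out to matter below):

-- optimize the implicit FOR ... EXECUTE cursor for fetching all rows,
-- not just the first few
set cursor_tuple_fraction = 1.0;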

Regards

Pavel Stehule



 --
 Anthony Shipman                 | Life is the interval
 anthony.ship...@symstream.com   | between pay days.

 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Set of related slow queries

2011-06-08 Thread John Williams
Hi All,
I'm having issues with a set of fairly related queries in my
application.  EXPLAIN ANALYZE is showing them all to be getting stuck
performing roughly the same operation:
   ->  Bitmap Heap Scan on logparser_entry
(cost=4119.06..21520.55 rows=68787 width=8) (actual
time=107.032..444.864 rows=16168 loops=1)
 Recheck Cond: ((event_type)::text = ANY
('{Attack,DoT Tick,Critical Attack}'::text[]))
 Filter: ((((target_relation)::text <> ALL
('{Other,N/A}'::text[])) OR (NOT (target_relation IS NOT NULL))) AND
(log_id = 2))
 ->  Bitmap Index Scan on
logparser_entry_event_type_like  (cost=0.00..4101.86 rows=217733
width=0) (actual time=46.392..46.392 rows=237151 loops=1)
   Index Cond: ((event_type)::text = ANY
('{Attack,DoT Tick,Critical Attack}'::text[]))
   ->  Hash  (cost=196.49..196.49 rows=9749 width=23)
(actual time=19.606..19.606 rows=9749 loops=1)

All the queries are being generated by the Django ORM, so they are not
particularly well optimized.  I'd prefer to stay with the ORM,
as a lot of the queries are highly variable depending on the request
parameters, and so unless there are huge gains to be had by falling
back to raw SQL it will save me a lot of development time to stay with
the ORM.

The table in question (logparser_entry) currently has 815000 records
(but that only represents a very very small amount compared to what
the production server would have to handle,  as this represents only 2
log objects when I would expect easily 100 or more logs to be uploaded
per day).

Nulls should be rare in the fields.

This was being run on an AWS High CPU medium instance.  Obviously not
enough for a production system, but I would hope it would be more than
adequate for testing when I'm the only one using the app.  I opted for
High CPU because the system doesn't seem to be IO bound even on a
micro instance (nearly 0 wait time according to top) and barely
touches the RAM even when tuned to be aggressive with memory usage.
At the same time it's running at 100% CPU usage.

My server config:
Server Config
             name             |                 current_setting
------------------------------+---------------------------------------------
 version  | PostgreSQL 8.4.8 on i686-pc-linux-gnu,
compiled by GCC gcc-4.4.real (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5,
32-bit
 checkpoint_completion_target | 0.9
 effective_cache_size | 1044MB
 external_pid_file| /var/run/postgresql/8.4-main.pid
 fsync| on
 lc_collate   | en_US.UTF-8
 lc_ctype | en_US.UTF-8
 listen_addresses | *
 log_line_prefix  | %t
 log_min_duration_statement   | 250ms
 max_connections  | 25
 max_stack_depth  | 2MB
 port | 5432
 random_page_cost | 4
 server_encoding  | UTF8
 shared_buffers   | 16MB
 synchronous_commit   | off
 TimeZone | UTC
 unix_socket_directory| /var/run/postgresql
 work_mem | 250MB
(20 rows)

To try to make reading the queries easier I've attached a text file
with the queries and links to EXPLAIN ANALYZE outputs as well as
copied them below.  I've tried a lot to tune these queries, but
nothing seems to work.  The queries always spend a large amount of
time in the same place.  Is there something I'm missing that could
improve these, or even a way to rework my schema to speed things up?

Thanks,
John


SELECT   logparser_entry.id   ,
 logparser_entry.log_id   ,
 logparser_entry.encounter_id ,
 logparser_entry.entry_order  ,
 logparser_entry.timestamp,
 logparser_entry.seconds_since_start  ,
 logparser_entry.event_type   ,
 logparser_entry.actor_id ,
 logparser_entry.actor_relation   ,
 logparser_entry.target_id,
 logparser_entry.target_relation  ,
 logparser_entry.pet_owner_id ,
 logparser_entry.pet_owner_relation   ,
 logparser_entry.pet_target_owner_id  ,
 logparser_entry.pet_target_owner_relation,
 logparser_entry.ability_id   ,
 logparser_entry.effective_value  ,
 logparser_entry.blocked  ,
 logparser_entry.absorbed ,
 logparser_entry.overkill ,
 logparser_entry.overheal ,
 logparser_entry.total_value
FROM logparser_entry
WHERE(
  logparser_entry.log_id = 2
 AND  NOT
  (
   (

Re: [PERFORM] Set of related slow queries

2011-06-08 Thread Craig Ringer

On 8/06/2011 10:58 AM, John Williams wrote:


->  Bitmap Heap Scan on logparser_entry
(cost=4119.06..21520.55 rows=68787 width=8) (actual
time=107.032..444.864 rows=16168 loops=1)
  Recheck Cond: ((event_type)::text = ANY
('{Attack,DoT Tick,Critical Attack}'::text[]))
  Filter: ((((target_relation)::text <> ALL
('{Other,N/A}'::text[])) OR (NOT (target_relation IS NOT NULL))) AND
(log_id = 2))
  ->  Bitmap Index Scan on
logparser_entry_event_type_like  (cost=0.00..4101.86 rows=217733
width=0) (actual time=46.392..46.392 rows=237151 loops=1)
Index Cond: ((event_type)::text = ANY
('{Attack,DoT Tick,Critical Attack}'::text[]))
->  Hash  (cost=196.49..196.49 rows=9749 width=23)
(actual time=19.606..19.606 rows=9749 loops=1)


Thanks for including explain analyze output.

Is there any chance you can pop the full explains (not just excerpts) in 
here:


http://explain.depesz.com/

?

Big query plans tend to get mangled into unreadable garbage by mail 
clients, unfortunately.


--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] strange query plan with LIMIT

2011-06-08 Thread anthony . shipman
On Wednesday 08 June 2011 18:39, Pavel Stehule wrote:
 if you use FOR statement, there should be a problem in using a
 implicit cursor - try to set a GUC cursor_tuple_fraction to 1.0.
Alas, this is Mammoth Replicator, equivalent to PG 8.3, and it doesn't have that
parameter.
-- 
Anthony Shipman | It's caches all the way 
anthony.ship...@symstream.com   | down.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] strange query plan with LIMIT

2011-06-08 Thread tv
 On Wednesday 08 June 2011 17:39, Claudio Freire wrote:
 Of course optimally executing a plan with limit is a lot different
 than one without.

 I imagined that limit just cuts out a slice of the query results.
 If it can find 8 rows in 0.5 seconds then I would have thought that
 returning just the first 100 of them should be just as easy.

But that's exactly the problem with the LIMIT clause. The planner considers
two choices - an index scan with this estimate

Index Scan using tdiag_pkey on tdiag  (cost=0.00..19114765.76
rows=1141019 width=114)

and bitmap index scan with this estimate

Bitmap Heap Scan on tdiag  (cost=25763.48..638085.13 rows=1141019
width=114)

and says - hey, the index scan has a much lower starting cost, and I'm using
a limit so it's much better! Let's use the index scan. But then it finds out it
needs to scan most of the table and that ruins the performance.

Have you tried to create a composite index on those two columns? Not sure
if that helps but I'd try that.
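
Something roughly like this - the column order is a guess, so treat it as a
sketch to experiment with rather than a sure fix:

create index tdiag_create_time_diag_id_idx on tdiag (create_time, diag_id);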

Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] strange query plan with LIMIT

2011-06-08 Thread Pavel Stehule
Hello

2011/6/8  anthony.ship...@symstream.com:
 On Wednesday 08 June 2011 18:39, Pavel Stehule wrote:
 if you use FOR statement, there should be a problem in using a
 implicit cursor - try to set a GUC cursor_tuple_fraction to 1.0.
 Alas this is mammoth replicator, equivalent to PG 8.3 and it doesn't have that
 parameter.

That could be part of the problem - in combination with bad statistics.
Maybe you should rewrite your code to something like:

DECLARE
  i int := 0;
  x record;
BEGIN
  FOR x IN EXECUTE '...' LOOP
    RETURN NEXT x;   -- return each row as it is fetched
    i := i + 1;
    EXIT WHEN i >= limitvar;
  END LOOP;
END;

Regards

Pavel Stehule



 --
 Anthony Shipman                 | It's caches all the way
 anthony.ship...@symstream.com   | down.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Set of related slow queries

2011-06-08 Thread tv
 Thanks for including explain analyze output.

 Is there any chance you can pop the full explains (not just excerpts) in
 here:

 http://explain.depesz.com/

 ?

I believe he already did that - there's a link below each query.

Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Set of related slow queries

2011-06-08 Thread tv
--+-
  shared_buffers   | 16MB
  work_mem | 250MB

This seems a bit suspicious. Are you sure you want to keep the
shared_buffers so small and work_mem so large at the same time? There
probably are workloads where this is the right thing to do, but I doubt
this is the case. Why have you set it like this?

I don't have much experience with running Pg on AWS, but I'd try to
increase the shared buffers to say 512MB and decrease the work_mem to 16MB
(or something like that).

Undersized shared_buffers might actually be part of the problem - to
access a row, the page needs to be loaded into shared_buffers. Even though
the I/O is very fast (or the page is already in the filesystem page
cache), there's some locking etc. that needs to be done. When the cache is
small (e.g. 16MB) then the pages need to be removed and read again
frequently. This might be one of the reasons why the CPU is 100% utilized.
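
As a rough illustration only (the exact numbers would need testing on the
machine in question, they are not tuned recommendations):

# postgresql.conf - illustrative values for a machine with ~1.7GB of RAM
shared_buffers = 512MB   # up from 16MB, keeps hot pages resident
work_mem = 16MB          # down from 250MB, allocated per sort/hash node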

 SELECT   logparser_entry.id   ,
  logparser_entry.log_id   ,
  logparser_entry.encounter_id ,
  logparser_entry.entry_order  ,
  logparser_entry.timestamp,
  logparser_entry.seconds_since_start  ,
  logparser_entry.event_type   ,
  logparser_entry.actor_id ,
  logparser_entry.actor_relation   ,
  logparser_entry.target_id,
  logparser_entry.target_relation  ,
  logparser_entry.pet_owner_id ,
  logparser_entry.pet_owner_relation   ,
  logparser_entry.pet_target_owner_id  ,
  logparser_entry.pet_target_owner_relation,
  logparser_entry.ability_id   ,
  logparser_entry.effective_value  ,
  logparser_entry.blocked  ,
  logparser_entry.absorbed ,
  logparser_entry.overkill ,
  logparser_entry.overheal ,
  logparser_entry.total_value
 FROM logparser_entry
 WHERE(
   logparser_entry.log_id = 2
  AND  NOT
   (
(
 logparser_entry.actor_relation
 IN (E'Other',

 E'N/A')
AND  logparser_entry.actor_relation
 IS NOT NULL
)
   )
  AND  logparser_entry.event_type IN (E'Attack'  ,
  E'DoT Tick',
  E'Critical Attack')
  )
 ORDER BY logparser_entry.entry_order ASC
 LIMIT1
 http://explain.depesz.com/s/vEx

Well, the problem with this is that it needs to evaluate the whole result
set, sort it by entry_order and then get the 1st row. And there's no
index on entry_order, so it has to evaluate the whole result set and then
perform a traditional sort.

Try to create an index on the entry_order column - that might push it
towards an index scan (to be honest I don't know if PostgreSQL knows it can
do it this way, so maybe it won't work).
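
The index being suggested would look roughly like this (the name is
arbitrary):

create index logparser_entry_entry_order_idx on logparser_entry (entry_order);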

 SELECT   (ROUND(logparser_entry.seconds_since_start / 42)) AS interval,
  SUM(logparser_entry.effective_value)  AS
 effective_value__sum
 FROM logparser_entry
 WHERE(
   logparser_entry.log_id = 2
  AND  NOT
   (
(
 logparser_entry.actor_relation
 IN (E'Other',

 E'N/A')
AND  logparser_entry.actor_relation
 IS NOT NULL
)
   )
  AND  logparser_entry.event_type IN (E'Attack'  ,
  E'DoT Tick',
  E'Critical Attack')
  )
 GROUP BY (ROUND(logparser_entry.seconds_since_start / 42)),
  ROUND(logparser_entry.seconds_since_start  / 42)
 ORDER BY interval ASC
 http://explain.depesz.com/s/Rhb

Hm, this is probably the best plan possible - not sure how to make it
faster. I'd expect better performance with larger shared_buffers.

 http://explain.depesz.com/s/JUo

Same as above. Good plan, maybe increase shared_buffers?

 http://explain.depesz.com/s/VZA

Same as above. Good plan, maybe increase shared_buffers.

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Set of related slow queries

2011-06-08 Thread Craig Ringer

On 06/08/2011 07:08 PM, t...@fuzzy.cz wrote:

Thanks for including explain analyze output.

Is there any chance you can pop the full explains (not just excerpts) in
here:

http://explain.depesz.com/

?


I believe he already did that - there's a link below each query.


Gah, I'm blind. Thanks.

--
Craig Ringer

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Set of related slow queries

2011-06-08 Thread Shaun Thomas

On 06/08/2011 06:30 AM, t...@fuzzy.cz wrote:


  shared_buffers   | 16MB
  work_mem | 250MB


This seems a bit suspicious. Are you sure you want to keep the
shared_buffers so small and work_mem so large at the same time? There
probably are workloads where this is the right thing to do, but I doubt
this is the case. Why have you set it like this?


I must concur in this case.  I can't imagine any scenario where this 
makes sense. work_mem is allocated on a per-sort basis, not just per 
session or transaction. So a large query could allocate several of these 
and run your system out of memory and cause the OOM killer to start 
causing trouble.



I don't have much experience with running Pg on AWS, but I'd try to
increase the shared buffers to say 512MB and decrease the work_mem to
16MB (or something like that).


Easily good minimums. But it looks like your AWS only has 1GB of RAM 
(based on your effective_cache_size), so you may only want to increase 
it to 256MB. That said, reduce your work_mem to 8MB to start, and 
increase it in 4MB increments if it's still too low.


With a setting of 16MB, it has to load data in and out of memory 
constantly. Even if the host OS has cached every single block you'll 
ever use, that's only the raw table contents. Processing hundreds of 
thousands of rows still takes time; you just saved yourself the effort 
of fetching them from disk. shared_buffers is still necessary to do the 
actual work.


Now... I have some issues with your queries, which are likely the fault 
of the Django ORM, but still consider your analyze:


 http://explain.depesz.com/s/vEx

Your bitmap index scan on logparser is hilarious. The estimates are 
fine. 237k rows in 47ms when it expected 217k. If your table really does 
have 815k rows in it, that's not very selective at all. Then it adds a 
heap scan for the remaining where conditions, and you end up with 100k 
rows it then has to sort. That's never going to be fast. 600ms actually 
isn't terrible for this many rows, and it also explains your high CPU.


Then your next one:

 http://explain.depesz.com/s/Rhb

700ms, mostly because of the HashAggregate caused by grouping by 
round((seconds_since_start / 42)). You're aggregating by a calculation 
on 100k rows. Again, this will never be fast and 700ms is not terrible 
considering all the extra work the engine's doing. Again, your index 
scan returning everything and the kitchen sink is the root cause. Which 
also is evidenced here:


 http://explain.depesz.com/s/JUo

And here:

http://explain.depesz.com/s/VZA

Everything is being caused because it's always using the 
logparser_entry_event_type_like index to fetch the initial 200k rows. The 
only way to make this faster is to restrict the rows coming back. For 
instance, since you know these values are coming in every day, why 
search through all of history every time?


Why not get your timestamp column involved? Maybe you only need to look 
at Attack, DoT Tick, and Critical Attack event types for the last day, 
or week, or even month. That alone should drastically reduce your row 
counts and give the engine a much smaller data set to aggregate and sort.
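
For example, something like this - the one-week window and the column list
are purely illustrative:

select id, entry_order, event_type, effective_value
  from logparser_entry
 where log_id = 2
   and event_type in ('Attack', 'DoT Tick', 'Critical Attack')
   and "timestamp" >= now() - interval '7 days'
 order by entry_order
 limit 1;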


The thing is, the way your queries are currently written, as you get 
more data, this is just going to get worse and worse. Grabbing a quarter 
of a table that just gets bigger every day and then getting aggregates 
(group by, etc) is going to get slower every day unless you can restrict 
the result set with more where clauses. If you need reports on a lot of 
this data on a regular basis, consider running a nightly or hourly batch 
to insert them into a reporting table you can check later.
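
A minimal sketch of that idea, with a made-up summary table and the
42-second bucket used in the queries above:

create table logparser_entry_summary (
    log_id              integer not null,
    interval_bucket     integer not null,  -- round(seconds_since_start / 42)
    effective_value_sum numeric,
    primary key (log_id, interval_bucket)
);

insert into logparser_entry_summary (log_id, interval_bucket, effective_value_sum)
select log_id,
       round(seconds_since_start / 42)::int,
       sum(effective_value)
  from logparser_entry
 group by log_id, round(seconds_since_start / 42)::int;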


There's a lot you can do here.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com

__

See  http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Oracle v. Postgres 9.0 query performance

2011-06-08 Thread Tom Lane
Tony Capobianco tcapobia...@prospectiv.com writes:
 pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100)
 pg_dw-# as
 pg_dw-# select o.emailcampaignid, count(memberid) opencnt
 pg_dw-#   from openactivity o,ecr_sents s
 pg_dw-#  where s.emailcampaignid = o.emailcampaignid
 pg_dw-#  group by o.emailcampaignid;
  QUERY
 PLAN  
 -
  GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12)
    ->  Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
          ->  Index Scan using ecr_sents_ecid_idx on ecr_sents s
 (cost=0.00..38.59 rows=479 width=4)
          ->  Index Scan using openact_emcamp_idx on openactivity o
 (cost=0.00..3395.49 rows=19372 width=12)
                Index Cond: (o.emailcampaignid = s.emailcampaignid)
 (5 rows)

 Should this query be hashing the smaller table on Postgres rather than
 using nested loops?

Yeah, seems like it.  Just for testing purposes, do set enable_nestloop
= 0 and see what plan you get then.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Oracle v. Postgres 9.0 query performance

2011-06-08 Thread Tony Capobianco
pg_dw=# set enable_nestloop =0;
SET
Time: 0.165 ms
pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100)
pg_dw-# as
pg_dw-# select o.emailcampaignid, count(memberid) opencnt
pg_dw-#   from openactivity o,ecr_sents s
pg_dw-#  where s.emailcampaignid = o.emailcampaignid
pg_dw-#  group by o.emailcampaignid;
   QUERY
PLAN
-
 HashAggregate  (cost=4391163.81..4391288.05 rows=9939 width=12)
   ->  Hash Join  (cost=14.78..4344767.23 rows=9279316 width=12)
         Hash Cond: (o.emailcampaignid = s.emailcampaignid)
         ->  Seq Scan on openactivity o  (cost=0.00..3529930.67
rows=192540967 width=12)
         ->  Hash  (cost=8.79..8.79 rows=479 width=4)
               ->  Seq Scan on ecr_sents s  (cost=0.00..8.79 rows=479
width=4)

Yikes.  Two sequential scans.


On Wed, 2011-06-08 at 11:33 -0400, Tom Lane wrote:
 Tony Capobianco tcapobia...@prospectiv.com writes:
  pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100)
  pg_dw-# as
  pg_dw-# select o.emailcampaignid, count(memberid) opencnt
  pg_dw-#   from openactivity o,ecr_sents s
  pg_dw-#  where s.emailcampaignid = o.emailcampaignid
  pg_dw-#  group by o.emailcampaignid;
   QUERY
  PLAN  
  -
   GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12)
 -  Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
   -  Index Scan using ecr_sents_ecid_idx on ecr_sents s
  (cost=0.00..38.59 rows=479 width=4)
   -  Index Scan using openact_emcamp_idx on openactivity o
  (cost=0.00..3395.49 rows=19372 width=12)
 Index Cond: (o.emailcampaignid = s.emailcampaignid)
  (5 rows)
 
  Should this query be hashing the smaller table on Postgres rather than
  using nested loops?
 
 Yeah, seems like it.  Just for testing purposes, do set enable_nestloop
 = 0 and see what plan you get then.
 
   regards, tom lane
 



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Oracle v. Postgres 9.0 query performance

2011-06-08 Thread Stephen Frost
* Tony Capobianco (tcapobia...@prospectiv.com) wrote:
  HashAggregate  (cost=4391163.81..4391288.05 rows=9939 width=12)
-  Hash Join  (cost=14.78..4344767.23 rows=9279316 width=12)
  Hash Cond: (o.emailcampaignid = s.emailcampaignid)
  -  Seq Scan on openactivity o  (cost=0.00..3529930.67
 rows=192540967 width=12)
  -  Hash  (cost=8.79..8.79 rows=479 width=4)
-  Seq Scan on ecr_sents s  (cost=0.00..8.79 rows=479
 width=4)
 
 Yikes.  Two sequential scans.

Err, isn't that more-or-less exactly what you want here?  The smaller
table is going to be hashed and then you'll traverse the bigger table
and bounce each row off the hash table.  Have you tried actually running
this and seeing how long it takes?  The bigger table doesn't look to be
*that* big; if your i/o subsystem is decent and you've got a lot of
memory available for kernel caching, it should be quick.

Thanks,

Stephen




Re: [PERFORM] Oracle v. Postgres 9.0 query performance

2011-06-08 Thread Vitalii Tymchyshyn

On 08.06.11 18:40, Tony Capobianco wrote:

pg_dw=# set enable_nestloop =0;
SET
Time: 0.165 ms
pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100)
pg_dw-# as
pg_dw-# select o.emailcampaignid, count(memberid) opencnt
pg_dw-#   from openactivity o,ecr_sents s
pg_dw-#  where s.emailcampaignid = o.emailcampaignid
pg_dw-#  group by o.emailcampaignid;
QUERY
PLAN
-
  HashAggregate  (cost=4391163.81..4391288.05 rows=9939 width=12)
-   Hash Join  (cost=14.78..4344767.23 rows=9279316 width=12)
  Hash Cond: (o.emailcampaignid = s.emailcampaignid)
  -   Seq Scan on openactivity o  (cost=0.00..3529930.67
rows=192540967 width=12)
  -   Hash  (cost=8.79..8.79 rows=479 width=4)
-   Seq Scan on ecr_sents s  (cost=0.00..8.79 rows=479
width=4)

Yikes.  Two sequential scans.


Yep. Can you see any other option? Either you take each of the 479 records 
and try to find matching records in the other table using the index (first 
plan), or you read both tables fully (seq scan) and join them - the second plan.
The first plan is better if your large table is clustered well enough on the 
emailcampaignid field (479 index reads and 479 sequential table reads). 
If it's not, the 479 table reads may turn into a lot of random reads.
BTW: Maybe you have different data clustering in PostgreSQL & Oracle? 
Or the data in Oracle may be hot in caches?
Also, a sequential scan is not too bad a thing. It may be cheap enough to 
read millions of records if they are not too wide. Please show select 
pg_size_pretty(pg_relation_size('openactivity')); Have you tried to 
explain analyze the second plan?
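
For reference, the checks being asked for are just (table and column names
taken from the plans above):

select pg_size_pretty(pg_relation_size('openactivity'));

-- with set enable_nestloop = off in the session, this times the
-- hash join plan for real rather than just estimating it
explain analyze
select o.emailcampaignid, count(memberid) as opencnt
  from openactivity o, ecr_sents s
 where s.emailcampaignid = o.emailcampaignid
 group by o.emailcampaignid;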


Best regards, Vitalii Tymchyshyn




On Wed, 2011-06-08 at 11:33 -0400, Tom Lane wrote:

Tony Capobiancotcapobia...@prospectiv.com  writes:

pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100)
pg_dw-# as
pg_dw-# select o.emailcampaignid, count(memberid) opencnt
pg_dw-#   from openactivity o,ecr_sents s
pg_dw-#  where s.emailcampaignid = o.emailcampaignid
pg_dw-#  group by o.emailcampaignid;
  QUERY
PLAN
-
  GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12)
-   Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
  -   Index Scan using ecr_sents_ecid_idx on ecr_sents s
(cost=0.00..38.59 rows=479 width=4)
  -   Index Scan using openact_emcamp_idx on openactivity o
(cost=0.00..3395.49 rows=19372 width=12)
Index Cond: (o.emailcampaignid = s.emailcampaignid)
(5 rows)
Should this query be hashing the smaller table on Postgres rather than
using nested loops?

Yeah, seems like it.  Just for testing purposes, do set enable_nestloop
= 0 and see what plan you get then.



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Oracle v. Postgres 9.0 query performance

2011-06-08 Thread Tony Capobianco
Here's the explain analyze:

pg_dw=# explain analyze CREATE TABLE ecr_opens with (FILLFACTOR=100)
as
select o.emailcampaignid, count(memberid) opencnt
  from openactivity o,ecr_sents s
 where s.emailcampaignid = o.emailcampaignid
 group by o.emailcampaignid;

QUERY
PLAN   

 GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12) (actual
time=308630.967..2592279.526 rows=472 loops=1)
   ->  Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
(actual time=31.489..2589363.047 rows=8586466 loops=1)
         ->  Index Scan using ecr_sents_ecid_idx on ecr_sents s
(cost=0.00..38.59 rows=479 width=4) (actual time=0.010..13.326 rows=479
loops=1)
         ->  Index Scan using openact_emcamp_idx on openactivity o
(cost=0.00..3395.49 rows=19372 width=12) (actual time=1.336..5397.139
rows=17926 loops=479)
               Index Cond: (o.emailcampaignid = s.emailcampaignid)
 Total runtime: 2592284.336 ms


On Wed, 2011-06-08 at 17:31 +0200, t...@fuzzy.cz wrote:
  On Postgres, this same query takes about 58 minutes (could not run
  explain analyze because it is in progress):
 
  pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100)
  pg_dw-# as
  pg_dw-# select o.emailcampaignid, count(memberid) opencnt
  pg_dw-#   from openactivity o,ecr_sents s
  pg_dw-#  where s.emailcampaignid = o.emailcampaignid
  pg_dw-#  group by o.emailcampaignid;
   QUERY
  PLAN
  -
   GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12)
 -  Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
   -  Index Scan using ecr_sents_ecid_idx on ecr_sents s
  (cost=0.00..38.59 rows=479 width=4)
   -  Index Scan using openact_emcamp_idx on openactivity o
  (cost=0.00..3395.49 rows=19372 width=12)
 Index Cond: (o.emailcampaignid = s.emailcampaignid)
  (5 rows)
 
 
 Please, post EXPLAIN ANALYZE, not just EXPLAIN. Preferably using
 explain.depesz.com.
 
 regards
 Tomas
 
 



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Oracle v. Postgres 9.0 query performance

2011-06-08 Thread Pavel Stehule
Hello

what is your settings for

random_page_cost, seq_page_cost and work_mem?

Regards

Pavel Stehule

2011/6/8 Tony Capobianco tcapobia...@prospectiv.com:
 Here's the explain analyze:

 pg_dw=# explain analyze CREATE TABLE ecr_opens with (FILLFACTOR=100)
 as
 select o.emailcampaignid, count(memberid) opencnt
  from openactivity o,ecr_sents s
  where s.emailcampaignid = o.emailcampaignid
  group by o.emailcampaignid;

 QUERY
 PLAN
 
  GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12) (actual
 time=308630.967..2592279.526 rows=472 loops=1)
   -  Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
 (actual time=31.489..2589363.047 rows=8586466 loops=1)
         -  Index Scan using ecr_sents_ecid_idx on ecr_sents s
 (cost=0.00..38.59 rows=479 width=4) (actual time=0.010..13.326 rows=479
 loops=1)
         -  Index Scan using openact_emcamp_idx on openactivity o
 (cost=0.00..3395.49 rows=19372 width=12) (actual time=1.336..5397.139
 rows=17926 loops=479)
               Index Cond: (o.emailcampaignid = s.emailcampaignid)
  Total runtime: 2592284.336 ms


 On Wed, 2011-06-08 at 17:31 +0200, t...@fuzzy.cz wrote:
  On Postgres, this same query takes about 58 minutes (could not run
  explain analyze because it is in progress):
 
  pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100)
  pg_dw-# as
  pg_dw-# select o.emailcampaignid, count(memberid) opencnt
  pg_dw-#   from openactivity o,ecr_sents s
  pg_dw-#  where s.emailcampaignid = o.emailcampaignid
  pg_dw-#  group by o.emailcampaignid;
                                                   QUERY
  PLAN
  -
   GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12)
     -  Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
           -  Index Scan using ecr_sents_ecid_idx on ecr_sents s
  (cost=0.00..38.59 rows=479 width=4)
           -  Index Scan using openact_emcamp_idx on openactivity o
  (cost=0.00..3395.49 rows=19372 width=12)
                 Index Cond: (o.emailcampaignid = s.emailcampaignid)
  (5 rows)
 

 Please, post EXPLAIN ANALYZE, not just EXPLAIN. Preferably using
 explain.depesz.com.

 regards
 Tomas





 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Oracle v. Postgres 9.0 query performance

2011-06-08 Thread Tony Capobianco
Well, this ran much better.  However, I'm not sure if it's because of
set enable_nestloop = 0, or because I'm executing the query twice in a
row, where previous results may be cached.  I will try this setting in
my code for when this process runs later today and see what the result
is.

Thanks!

pg_dw=# explain analyze CREATE TABLE ecr_opens with (FILLFACTOR=100)
pg_dw-# as
pg_dw-# select o.emailcampaignid, count(memberid) opencnt
pg_dw-#   from openactivity o,ecr_sents s
pg_dw-#  where s.emailcampaignid = o.emailcampaignid
pg_dw-#  group by o.emailcampaignid;

   QUERY
PLAN   

 HashAggregate  (cost=4391163.81..4391288.05 rows=9939 width=12) (actual
time=167254.751..167254.937 rows=472 loops=1)
   ->  Hash Join  (cost=14.78..4344767.23 rows=9279316 width=12) (actual
time=0.300..164577.131 rows=8586466 loops=1)
         Hash Cond: (o.emailcampaignid = s.emailcampaignid)
         ->  Seq Scan on openactivity o  (cost=0.00..3529930.67
rows=192540967 width=12) (actual time=0.011..124351.878 rows=192542480
loops=1)
         ->  Hash  (cost=8.79..8.79 rows=479 width=4) (actual
time=0.253..0.253 rows=479 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 17kB
               ->  Seq Scan on ecr_sents s  (cost=0.00..8.79 rows=479
width=4) (actual time=0.010..0.121 rows=479 loops=1)
 Total runtime: 167279.950 ms



On Wed, 2011-06-08 at 11:51 -0400, Stephen Frost wrote:
 * Tony Capobianco (tcapobia...@prospectiv.com) wrote:
   HashAggregate  (cost=4391163.81..4391288.05 rows=9939 width=12)
 -  Hash Join  (cost=14.78..4344767.23 rows=9279316 width=12)
   Hash Cond: (o.emailcampaignid = s.emailcampaignid)
   -  Seq Scan on openactivity o  (cost=0.00..3529930.67
  rows=192540967 width=12)
   -  Hash  (cost=8.79..8.79 rows=479 width=4)
 -  Seq Scan on ecr_sents s  (cost=0.00..8.79 rows=479
  width=4)
  
  Yikes.  Two sequential scans.
 
 Err, isn't that more-or-less exactly what you want here?  The smaller
 table is going to be hashed and then you'll traverse the bigger table
 and bounce each row off the hash table.  Have you tried actually running
 this and seeing how long it takes?  The bigger table doesn't look to be
 *that* big, if your i/o subsystem is decent and you've got a lot of
 memory available for kernel cacheing, should be quick.
 
   Thanks,
 
   Stephen



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Oracle v. Postgres 9.0 query performance

2011-06-08 Thread Tony Capobianco
pg_dw=# show random_page_cost ;
 random_page_cost 
--
 4
(1 row)

Time: 0.299 ms
pg_dw=# show seq_page_cost ;
 seq_page_cost 
---
 1
(1 row)

Time: 0.250 ms
pg_dw=# show work_mem ;
 work_mem 
--
 768MB
(1 row)




On Wed, 2011-06-08 at 18:27 +0200, Pavel Stehule wrote:
 Hello
 
 what is your settings for
 
 random_page_cost, seq_page_cost and work_mem?
 
 Regards
 
 Pavel Stehule
 
 2011/6/8 Tony Capobianco tcapobia...@prospectiv.com:
  Here's the explain analyze:
 
  pg_dw=# explain analyze CREATE TABLE ecr_opens with (FILLFACTOR=100)
  as
  select o.emailcampaignid, count(memberid) opencnt
   from openactivity o,ecr_sents s
   where s.emailcampaignid = o.emailcampaignid
   group by o.emailcampaignid;
 
  QUERY
  PLAN
  
   GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12) (actual
  time=308630.967..2592279.526 rows=472 loops=1)
-  Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
  (actual time=31.489..2589363.047 rows=8586466 loops=1)
  -  Index Scan using ecr_sents_ecid_idx on ecr_sents s
  (cost=0.00..38.59 rows=479 width=4) (actual time=0.010..13.326 rows=479
  loops=1)
  -  Index Scan using openact_emcamp_idx on openactivity o
  (cost=0.00..3395.49 rows=19372 width=12) (actual time=1.336..5397.139
  rows=17926 loops=479)
Index Cond: (o.emailcampaignid = s.emailcampaignid)
   Total runtime: 2592284.336 ms
 
 
  On Wed, 2011-06-08 at 17:31 +0200, t...@fuzzy.cz wrote:
   On Postgres, this same query takes about 58 minutes (could not run
   explain analyze because it is in progress):
  
   pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100)
   pg_dw-# as
   pg_dw-# select o.emailcampaignid, count(memberid) opencnt
   pg_dw-#   from openactivity o,ecr_sents s
   pg_dw-#  where s.emailcampaignid = o.emailcampaignid
   pg_dw-#  group by o.emailcampaignid;
QUERY
   PLAN
   -
GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12)
  -  Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
-  Index Scan using ecr_sents_ecid_idx on ecr_sents s
   (cost=0.00..38.59 rows=479 width=4)
-  Index Scan using openact_emcamp_idx on openactivity o
   (cost=0.00..3395.49 rows=19372 width=12)
  Index Cond: (o.emailcampaignid = s.emailcampaignid)
   (5 rows)
  
 
  Please, post EXPLAIN ANALYZE, not just EXPLAIN. Preferably using
  explain.depesz.com.
 
  regards
  Tomas
 
 
 
 
 
  --
  Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
  To make changes to your subscription:
  http://www.postgresql.org/mailpref/pgsql-performance
 
 



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Set of related slow queries

2011-06-08 Thread John Williams
Hi All,

Let me clarify this a bit.

The memory values are ridiculous, you're completely correct.  I've
since fixed that (it had no effect sadly).

I've adjusted the shared buffers to about 400MB, as per the tuning
guide's advice to set that to around 1/4 of your system memory (the AWS
instance in question has 1.7GB).  I didn't have the shared buffers set
correctly to start because truthfully I had no idea how to increase
shmmax and I had to look that up.

The work_mem is admittedly very, very high for the system it's running
on.  I'm ok with leaving that though, because currently I'm the
only one on the machine at all (this isn't a production setup, it's a
testing setup).  Realistically it's only that high because someone
suggested trying a much higher value (I had already personally set it
to 50MB as that was about 30% larger than the largest sort I found)
to see if that improved the situation (it didn't).

Several aspects of my current data set make things look
a little wrong, so let me clarify the issue a bit.  The table is
composed of data coming from a game's combat log.  Each log represents
about 400k entries.  Since I only really care to look at data from the
perspective of each log, the log_id is in fact going to be the most
selective portion of the query.  Right now the table only has two logs
in it, making this hard to see.  But it should mean that the
situation won't get worse over time.  I will basically never be
looking at more than a 400-500k record portion of my entries table at
a time.

This stuff gets really painful because I can't very well predict the
queries, so I can't pre-calculate, and the data isn't like a system log;
I could be accepting uploads of hundreds of such logs per day.  The
actual queries that are run are a function of what the user wants to
see.  There are roughly 5 or so different data views, each of which
takes 15-25 separate queries to calculate all the various graphs and
aggregates.  Frequently I won't be looking at the overall entire log
(composed of 400k entries); instead I'll be looking at smaller slices
of the data adding: WHERE seconds_since_start >= 1500 AND
seconds_since_start <= 4000, or some such with very arbitrary min and max.

Now I should say I've seen almost this exact same work done before for
a different game.  So I can't help but feel I must be missing
something really important, either in how I'm setting up my data or how
I'm processing it.

Thanks,
John

---

John Williams
42nd Design
Email: jwilli...@42nddesign.com
Skype: druidjaidan
Phone: (520) 440-7239



On Wed, Jun 8, 2011 at 6:36 AM, Shaun Thomas stho...@peak6.com wrote:
 On 06/08/2011 06:30 AM, t...@fuzzy.cz wrote:

  shared_buffers               | 16MB
  work_mem                     | 250MB

 This seems a bit suspicious. Are you sure you want to keep the
 shared_buffers so small and work_mem so large at the same time? There
 probably are workloads where this is the right thing to do, but I doubt
 this is the case. Why have you set it like this?

 I must concur in this case.  I can't imagine any scenario where this makes
 sense. Work-mem is allocated on a per-sort basis, not just per session or
 transaction. So a large query could allocate several of these and run your
 system out of memory and cause the OOM killer to start causing trouble.

 I don't have much experience with running Pg on AWS, but I'd try to
 increase the shared buffers to say 512MB and decrease the work_mem to
 16MB (or something like that).

 Easily good minimums. But it looks like your AWS only has 1GB of RAM (based
 on your effective_cache_size), so you may only want to increase it to 256MB.
 That said, reduce your work_mem to 8MB to start, and increase it in 4MB
 increments if it's still too low.

 With a setting of 16MB, it has to load data in and out of memory constantly.
 Even if the host OS has cached every single block you'll ever use, that's
 only the raw table contents. Processing hundreds of thousands of rows still
 takes time, you just saved yourself the effort of fetching them from disk,
 shared_buffers is still necessary to do actual work.

 Now... I have some issues with your queries, which are likely the fault of
 the Django ORM, but still consider your analyze:

 http://explain.depesz.com/s/vEx

 Your bitmap index scan on logparser is hilarious. The estimates are fine.
 237k rows in 47ms when it expected 217k. If your table really does have 815k
 rows in it, that's not very selective at all. Then it adds a heap scan for
 the remaining where conditions, and you end up with 100k rows it then has to
 sort. That's never going to be fast. 600ms actually isn't terrible for this
 many rows, and it also explains your high CPU.

 Then your next one:

 http://explain.depesz.com/s/Rhb

 700ms, mostly because of the HashAggregate caused by grouping by
 round(((seconds_since_start / 42)). You're aggregating by a calculation on
 100k rows. Again, this will never be fast and 700ms is not terrible
 considering all the extra work 

Re: [PERFORM] Oracle v. Postgres 9.0 query performance

2011-06-08 Thread Tom Lane
Tony Capobianco tcapobia...@prospectiv.com writes:
 Well, this ran much better.  However, I'm not sure if it's because of
 set enable_nestloop = 0, or because I'm executing the query twice in a
 row, where previous results may be cached.  I will try this setting in
 my code for when this process runs later today and see what the result
 is.

If the performance differential holds up, you should look at adjusting
your cost parameters so that the planner isn't so wrong about which one
is faster.  Hacking enable_nestloop is a band-aid, not something you
want to use in production.

Looking at the values you gave earlier, I wonder whether the
effective_cache_size setting isn't unreasonably high.  That's reducing
the estimated cost of accessing the large table via indexscans, and
I'm thinking it reduced it too much.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Oracle v. Postgres 9.0 query performance

2011-06-08 Thread Pavel Stehule
2011/6/8 Tony Capobianco tcapobia...@prospectiv.com:
 pg_dw=# show random_page_cost ;
  random_page_cost
 --
  4
 (1 row)

 Time: 0.299 ms
 pg_dw=# show seq_page_cost ;
  seq_page_cost
 ---
  1
 (1 row)

 Time: 0.250 ms
 pg_dw=# show work_mem ;
  work_mem
 --
  768MB
 (1 row)



it is ok.

Pavel



 On Wed, 2011-06-08 at 18:27 +0200, Pavel Stehule wrote:
 Hello

 what is your settings for

 random_page_cost, seq_page_cost and work_mem?

 Regards

 Pavel Stehule

 2011/6/8 Tony Capobianco tcapobia...@prospectiv.com:
  Here's the explain analyze:
 
  pg_dw=# explain analyze CREATE TABLE ecr_opens with (FILLFACTOR=100)
  as
  select o.emailcampaignid, count(memberid) opencnt
   from openactivity o,ecr_sents s
   where s.emailcampaignid = o.emailcampaignid
   group by o.emailcampaignid;
 
  QUERY
  PLAN
  
   GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12) (actual
  time=308630.967..2592279.526 rows=472 loops=1)
    -  Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
  (actual time=31.489..2589363.047 rows=8586466 loops=1)
          -  Index Scan using ecr_sents_ecid_idx on ecr_sents s
  (cost=0.00..38.59 rows=479 width=4) (actual time=0.010..13.326 rows=479
  loops=1)
          -  Index Scan using openact_emcamp_idx on openactivity o
  (cost=0.00..3395.49 rows=19372 width=12) (actual time=1.336..5397.139
  rows=17926 loops=479)
                Index Cond: (o.emailcampaignid = s.emailcampaignid)
   Total runtime: 2592284.336 ms
 
 
  On Wed, 2011-06-08 at 17:31 +0200, t...@fuzzy.cz wrote:
   On Postgres, this same query takes about 58 minutes (could not run
   explain analyze because it is in progress):
  
   pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100)
   pg_dw-# as
   pg_dw-# select o.emailcampaignid, count(memberid) opencnt
   pg_dw-#   from openactivity o,ecr_sents s
   pg_dw-#  where s.emailcampaignid = o.emailcampaignid
   pg_dw-#  group by o.emailcampaignid;
                                                    QUERY
   PLAN
   -
    GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12)
      -  Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
            -  Index Scan using ecr_sents_ecid_idx on ecr_sents s
   (cost=0.00..38.59 rows=479 width=4)
            -  Index Scan using openact_emcamp_idx on openactivity o
   (cost=0.00..3395.49 rows=19372 width=12)
                  Index Cond: (o.emailcampaignid = s.emailcampaignid)
   (5 rows)
  
 
  Please, post EXPLAIN ANALYZE, not just EXPLAIN. Preferably using
  explain.depesz.com.
 
  regards
  Tomas
 
 
 
 
 
  --
  Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
  To make changes to your subscription:
  http://www.postgresql.org/mailpref/pgsql-performance
 





-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Oracle v. Postgres 9.0 query performance

2011-06-08 Thread Tony Capobianco
My current setting is 22G.  According to some documentation, I want to
set effective_cache_size to my OS disk cache + shared_buffers.  In this
case, I have 4 quad-core processors with 512K cache (8G) and my
shared_buffers is 7680M.  Therefore my effective_cache_size should be
approximately 16G?  Most of our other etl processes are running fine,
however I'm curious if I could see a significant performance boost by
reducing the effective_cache_size.


On Wed, 2011-06-08 at 13:03 -0400, Tom Lane wrote:
 Tony Capobianco tcapobia...@prospectiv.com writes:
  Well, this ran much better.  However, I'm not sure if it's because of
  set enable_nestloop = 0, or because I'm executing the query twice in a
  row, where previous results may be cached.  I will try this setting in
  my code for when this process runs later today and see what the result
  is.
 
 If the performance differential holds up, you should look at adjusting
 your cost parameters so that the planner isn't so wrong about which one
 is faster.  Hacking enable_nestloop is a band-aid, not something you
 want to use in production.
 
 Looking at the values you gave earlier, I wonder whether the
 effective_cache_size setting isn't unreasonably high.  That's reducing
 the estimated cost of accessing the large table via indexscans, and
 I'm thinking it reduced it too much.
 
   regards, tom lane
 



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] poor performance when recreating constraints on large tables

2011-06-08 Thread Robert Haas
On Mon, Jun 6, 2011 at 6:10 PM, Mike Broers mbro...@gmail.com wrote:
 Thanks for the suggestion, maintenance_work_mem is set to the default of
 16MB on the host that was taking over an hour as well as on the host that
 was taking less than 10 minutes.  I tried setting it to 1GB on the faster
 test server and it reduced the time from around 6-7 minutes to about 3:30.
  this is a good start, if there are any other suggestions please let me know
 - is there any query to check estimated time remaining on long running
 transactions?

Sadly, no.  I suspect that coming up with a good algorithm for that is
a suitable topic for a PhD thesis.  :-(
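
For reference, the per-session change described in the quoted text amounts
to something like this - the table, column, and constraint names are
placeholders only:

set maintenance_work_mem = '1GB';  -- used by the FK validation scan, this session only
alter table child_table
  add constraint child_table_parent_id_fkey
  foreign key (parent_id) references parent_table (id);
reset maintenance_work_mem;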

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Oracle v. Postgres 9.0 query performance

2011-06-08 Thread Kevin Grittner
Tony Capobianco tcapobia...@prospectiv.com wrote:
 
 According to some documentation, I want to set
 effective_cache_size to my OS disk cache + shared_buffers.
 
That seems reasonable, and is what has worked well for me.
 
 In this case, I have 4 quad-core processors with 512K cache (8G)
 and my shared_buffers is 7680M.  Therefore my effective_cache_size
 should be approximately 16G?
 
I didn't follow that at all.  Can you run `free` or `vmstat`?  If
so, go by what those say your cache size is.
 
 Most of our other etl processes are running fine, however I'm
 curious if I could see a significant performance boost by reducing
 the effective_cache_size.
 
Since it is an optimizer costing parameter and has no effect on
memory allocation, you can set it on a connection and run a query on
that connection to test the impact.  Why wonder about it when you
can easily test it?
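
For example (the value is picked only to illustrate the per-session test):

set effective_cache_size = '8GB';  -- affects this session's planner costing only
explain
select o.emailcampaignid, count(memberid) as opencnt
  from openactivity o, ecr_sents s
 where s.emailcampaignid = o.emailcampaignid
 group by o.emailcampaignid;
reset effective_cache_size;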
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Oracle v. Postgres 9.0 query performance

2011-06-08 Thread Samuel Gendler
On Wed, Jun 8, 2011 at 12:03 PM, Tony Capobianco tcapobia...@prospectiv.com
 wrote:

 My current setting is 22G.  According to some documentation, I want to
 set effective_cache_size to my OS disk cache + shared_buffers.  In this
 case, I have 4 quad-core processors with 512K cache (8G) and my
 shared_buffers is 7680M.  Therefore my effective_cache_size should be
 approximately 16G?  Most of our other etl processes are running fine,
 however I'm curious if I could see a significant performance boost by
 reducing the effective_cache_size.


disk cache, not CPU memory cache.  It will be some significant fraction of
total RAM on the host.  Incidentally, 16 * 512K cache = 8MB, not 8GB.

http://en.wikipedia.org/wiki/CPU_cache


Re: [PERFORM] poor performance when recreating constraints on large tables

2011-06-08 Thread Greg Smith

Samuel Gendler wrote:
Sure, but if it is a query that is slow enough for a time estimate to 
be useful, odds are good that stats that are that far out of whack 
would actually be interesting to whoever is looking at the time 
estimate, so showing some kind of 'N/A' response once things have 
gotten out of whack wouldn't be unwarranted.


The next question is what are you then going to do with that information?

The ability to track some measure of progress relative to expectations 
is mainly proposed as something helpful when a query has gone out of 
control.  When that's happened, the progress meter normally turns out to 
be fundamentally broken; the plan isn't happening at all as expected.  
So, as you say, you will get an N/A response that says the query is 
out of control, in exactly the cases where this sort of thing is expected 
to be the most useful.


At that point, you have two choices.  You can let the query keep running 
and see how long it really takes.  You have no idea how long that will 
be, all you can do is wait and see because the estimation is trashed.  
Or you can decide to kill it.  And the broken progress meter won't help 
with that decision.  So why put it there at all?


What I try to do as a force of habit is run just about everything that 
might take a while with \timing on, and try to keep statement_timeout 
to a reasonable value at all times.  Do that enough, and you get a feel 
for what reasonable and unreasonable time scales look like better than 
the query executor can be expected to figure them out for you.  It would 
be nice to provide a better UI here for tracking progress, but it would 
really work only in the simplest of cases--which are of course the ones 
you need it the least for.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance