Re: [PERFORM] Performance of query

2013-03-25 Thread Jeff Janes
On Sat, Mar 23, 2013 at 3:27 PM, Misa Simic misa.si...@gmail.com wrote:

 Hi Jeff,

 It seems my previous mail has not shown up in the list... copied/pasted
 again below

 However, you said something important:

 The join to the state table is not necessary.  Between the foreign key
 and the primary key, you know that every state exists, and that every state
 exists only once.  But, that will not solve your problem, as the join to
 the state table is not where the time goes.

 I think it is something the planner could/should be aware of... and
 discard the join


I thought that this was on the To Do list (
http://wiki.postgresql.org/wiki/Todo) but if it is, I can't find it.

I think the main concern was that it might add substantial planning time to
all queries, even ones that would not benefit from it.  I don't know if
there is a way to address this concern, other than to implement it and see
what happens.
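To illustrate the equivalence that such join elimination would rely on, here is a toy sketch in Python's sqlite3 (made-up data; the real tables live in Postgres, so this demonstrates only the SQL semantics, not the plan):

```python
import sqlite3

# With a foreign key from busbase.state to the state primary key, the
# inner join can neither add nor drop rows, so grouping the fact table
# alone must give the same counts as joining first.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE state (state TEXT PRIMARY KEY);
    CREATE TABLE busbase (id INTEGER NOT NULL,
                          state TEXT NOT NULL REFERENCES state(state));
    INSERT INTO state VALUES ('NY'), ('CA'), ('TX');
    INSERT INTO busbase VALUES (1,'NY'), (2,'NY'), (3,'CA');
""")
with_join = con.execute("""
    SELECT busbase.state, count(busbase.id) FROM busbase
    INNER JOIN state USING (state) GROUP BY busbase.state ORDER BY 1
""").fetchall()
without_join = con.execute("""
    SELECT state, count(id) FROM busbase GROUP BY state ORDER BY 1
""").fetchall()
print(with_join == without_join)  # True
print(without_join)               # [('CA', 1), ('NY', 2)]
```

A planner doing join elimination would prove this equivalence from the constraints instead of being told.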

...


 EXPLAIN ANALYZE
 SELECT busbase.state AS c0, count(busbase.id) AS m0 FROM busbase INNER
 JOIN state USING (state)
 GROUP BY busbase.state


In the original email, the table definition listed id twice, once with a
not null constraint.  If it is truly not null, then this count could be
replaced with count(1), in which case the original index on (state) would
be sufficient, and the composite on (state, id) would not be necessary.  (Yes,
this is another thing the planner could, in theory, recognize on your
behalf)
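The reason the NOT NULL constraint matters: count(id) skips NULLs, while count(1) counts every row. A small sqlite3 sketch with an invented toy table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER)")  # note: no NOT NULL constraint
con.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (None,)])

count_id, count_one = con.execute(
    "SELECT count(id), count(1) FROM t").fetchone()
print(count_id, count_one)  # 2 3 -- count(id) skipped the NULL row
```

Only with a NOT NULL constraint on id are the two interchangeable, which is what would let the narrower index on (state) alone satisfy the query.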

Based on the use of column aliases which are less meaningful than the
original column names were, I'm assuming that this is generated SQL that
you have no control over?


 after creating the composite index
 CREATE INDEX comp_statidx2
   ON busbase
   USING btree
   (state, id );







 we got:

 GroupAggregate  (cost=0.00..2610570.81 rows=51 width=3) (actual
 time=98.923..51033.888 rows=51 loops=1)
   ->  Merge Join  (cost=0.00..2310285.02 rows=60057056 width=3) (actual
 time=38.424..41992.070 rows=60057057 loops=1)
 Merge Cond: (state.state = busbase.state)
 ->  Index Only Scan using state_pkey on state  (cost=0.00..13.02
 rows=51 width=3) (actual time=0.008..0.148 rows=51 loops=1)
   Heap Fetches: 51
 ->  Index Only Scan using comp_statidx2 on busbase
  (cost=0.00..1559558.68 rows=60057056 width=3) (actual
 time=38.408..12883.575 rows=60057057 loops=1)
   Heap Fetches: 0
 Total runtime: 51045.648 ms


I don't understand why you are getting a merge join rather than a hash
join.  Nor why there is such a big difference between the actual time of
the index only scan and of the merge join itself.  I would think the two
should be about equal.  Perhaps I just don't understand the semantics of
reported actual time for merge joins.

During normal operations, how much of busbase is going to be all_visible at
any given time?  If that table sees high turnover, this plan might not work
well on the production system.


Cheers,

Jeff


Re: [PERFORM] Performance of query

2013-03-25 Thread Cindy Makarowsky
I basically don't have any control over the generated select statement.
 I'm using Mondrian and that is the select statement that gets passed to
Postgres.  You're right that if you remove the count(id), the query is
faster but I can't do that since the select statement is being executed
from Mondrian.

On Mon, Mar 25, 2013 at 2:18 PM, Jeff Janes jeff.ja...@gmail.com wrote:

 On Sat, Mar 23, 2013 at 3:27 PM, Misa Simic misa.si...@gmail.com wrote:

 Hi Jeff,

 It seems my previous mail has not shown up in the list... copied/pasted
 again below

 However, you said something important:

 The join to the state table is not necessary.  Between the foreign
 key and the primary key, you know that every state exists, and that every
 state exists only once.  But, that will not solve your problem, as the join
 to the state table is not where the time goes.

 I think it is something the planner could/should be aware of... and
 discard the join


 I thought that this was on the To Do list (
 http://wiki.postgresql.org/wiki/Todo) but if it is, I can't find it.

 I think the main concern was that it might add substantial planning time
 to all queries, even ones that would not benefit from it.  I don't know if
 there is a way to address this concern, other than to implement it and see
 what happens.

 ...


 EXPLAIN ANALYZE
 SELECT busbase.state AS c0, count(busbase.id) AS m0 FROM busbase INNER
 JOIN state USING (state)
 GROUP BY busbase.state


 In the original email, the table definition listed id twice, once with a
 not null constraint.  If it is truly not null, then this count could be
 replaced with count(1), in which case the original index on (state) would
 be sufficient, and the composite on (state, id) would not be necessary.  (Yes,
 this is another thing the planner could, in theory, recognize on your
 behalf)

 Based on the use of column aliases which are less meaningful than the
 original column names were, I'm assuming that this is generated SQL that
 you have no control over?


 after creating the composite index
 CREATE INDEX comp_statidx2
   ON busbase
   USING btree
   (state, id );







 we got:

 GroupAggregate  (cost=0.00..2610570.81 rows=51 width=3) (actual
 time=98.923..51033.888 rows=51 loops=1)
   ->  Merge Join  (cost=0.00..2310285.02 rows=60057056 width=3) (actual
 time=38.424..41992.070 rows=60057057 loops=1)
 Merge Cond: (state.state = busbase.state)
 ->  Index Only Scan using state_pkey on state  (cost=0.00..13.02
 rows=51 width=3) (actual time=0.008..0.148 rows=51 loops=1)
   Heap Fetches: 51
 ->  Index Only Scan using comp_statidx2 on busbase
  (cost=0.00..1559558.68 rows=60057056 width=3) (actual
 time=38.408..12883.575 rows=60057057 loops=1)
   Heap Fetches: 0
 Total runtime: 51045.648 ms


 I don't understand why you are getting a merge join rather than a hash
 join.  Nor why there is such a big difference between the actual time of
 the index only scan and of the merge join itself.  I would think the two
 should be about equal.  Perhaps I just don't understand the semantics of
 reported actual time for merge joins.

 During normal operations, how much of busbase is going to be all_visible
 at any given time?  If that table sees high turnover, this plan might not
 work well on the production system.


 Cheers,

 Jeff



Re: [PERFORM] Performance of query

2013-03-24 Thread Roman Konoval
I assume there are reasons not to throw away the join to state. Maybe it
can still be done as the last step. This should help further:
SELECT counts.* FROM (
   SELECT busbase.state AS state, count(busbase.id) AS m0 FROM busbase
   GROUP BY busbase.state ) AS counts
 INNER JOIN state USING (state)
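A quick equivalence check of that rewrite, using Python's sqlite3 with invented toy data (it demonstrates the SQL, not the Postgres plan; the benefit in Postgres would be that the join then sees only the 51 aggregated rows instead of 60M input rows):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE state (state TEXT PRIMARY KEY);
    CREATE TABLE busbase (id INTEGER NOT NULL, state TEXT NOT NULL);
    INSERT INTO state VALUES ('NY'), ('CA'), ('TX');
    INSERT INTO busbase VALUES (1,'NY'), (2,'NY'), (3,'CA');
""")
# Original shape: join 60M rows to state, then aggregate.
original = con.execute("""
    SELECT busbase.state AS c0, count(busbase.id) AS m0
    FROM busbase INNER JOIN state USING (state)
    GROUP BY busbase.state ORDER BY 1
""").fetchall()
# Rewritten shape: aggregate first, join the tiny result afterwards.
rewritten = con.execute("""
    SELECT counts.* FROM (
        SELECT busbase.state AS state, count(busbase.id) AS m0
        FROM busbase GROUP BY busbase.state
    ) AS counts INNER JOIN state USING (state) ORDER BY 1
""").fetchall()
print(original == rewritten)  # True
```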

Regards,
Roman Konoval


On Sun, Mar 24, 2013 at 12:27 AM, Misa Simic misa.si...@gmail.com wrote:

 Hi Jeff,

 It seems my previous mail has not shown up in the list... copied/pasted
 again below

 However, you said something important:

 The join to the state table is not necessary.  Between the foreign key
 and the primary key, you know that every state exists, and that every state
 exists only once.  But, that will not solve your problem, as the join to
 the state table is not where the time goes.

 I think it is something the planner could/should be aware of... and
 discard the join

  Merge Join  (cost=0.00..2310285.02 rows=60057056 width=3) (actual
 time=38.424..41992.070 rows=60057057 loops=1)
 Merge Cond: (state.state = busbase.state)

 this part of the plan below would save significant time if the planner
 didn't decide to take this step at all...

 Kind regards,

 Misa




 
 Hi Cindy

 TBH - I don't know...

 I have added this to list so maybe someone else can help...

 To recap:

 from the starting situation (table structure and indexes are in the first mail in
 this thread)

 EXPLAIN ANALYZE
 SELECT busbase.state AS c0, count(busbase.id) AS m0 FROM busbase INNER
 JOIN state USING (state)
 GROUP BY busbase.state

 says:
 HashAggregate  (cost=7416975.58..7416976.09 rows=51 width=7) (actual
 time=285339.465..285339.473 rows=51 loops=1)
   ->  Hash Join  (cost=2.15..7139961.94 rows=55402728 width=7) (actual
 time=0.066..269527.934 rows=60057057 loops=1)
  Hash Cond: (busbase.state = state.state)
 ->  Seq Scan on busbase  (cost=0.00..6378172.28 rows=55402728
 width=7) (actual time=0.022..251029.307 rows=60057057 loops=1)
 ->  Hash  (cost=1.51..1.51 rows=51 width=3) (actual
 time=0.028..0.028 rows=51 loops=1)
   Buckets: 1024  Batches: 1  Memory Usage: 2kB
    ->  Seq Scan on state  (cost=0.00..1.51 rows=51 width=3)
 (actual time=0.003..0.019 rows=51 loops=1)
 Total runtime: 285339.516 ms

 after creating the composite index
 CREATE INDEX comp_statidx2
   ON busbase
   USING btree
   (state, id );


 we got:

 GroupAggregate  (cost=0.00..2610570.81 rows=51 width=3) (actual
 time=98.923..51033.888 rows=51 loops=1)
   ->  Merge Join  (cost=0.00..2310285.02 rows=60057056 width=3) (actual
 time=38.424..41992.070 rows=60057057 loops=1)
 Merge Cond: (state.state = busbase.state)
 ->  Index Only Scan using state_pkey on state  (cost=0.00..13.02
 rows=51 width=3) (actual time=0.008..0.148 rows=51 loops=1)
   Heap Fetches: 51
 ->  Index Only Scan using comp_statidx2 on busbase
  (cost=0.00..1559558.68 rows=60057056 width=3) (actual
 time=38.408..12883.575 rows=60057057 loops=1)
   Heap Fetches: 0
 Total runtime: 51045.648 ms


 Question is - is it possible to improve it more?
  



Re: [PERFORM] Performance of query

2013-03-23 Thread Jeff Janes
On Friday, March 22, 2013, Cindy Makarowsky wrote:

 I have two tables in Postgres 9.2 on a Linux server with 8GB of RAM.  The
 first table has 60 million records:


You have over 40GB of data in that table, so there is no way you are going
to get it into 8GB RAM without some major reorganization.


   company character(35),
   address character(35),
   city character(20),
   contact character(35),
   title character(20),



All of those fixed width fields are probably taking up needless space, and
in your case, space is time.  Varchar would probably be better.  (And
probably longer maximum lengths as well.  Most people don't need more than
35 characters for their addresses, but the people who do are going to be
cheesed off when you inform them that you deem their address to be
unreasonable.  Unless your mailing labels only hold 35 characters)



 When I run this query:

  select state.state, count(table1.id)  from state,table1 where
 table1.state = state.state group by state.state



The join to the state table is not necessary.  Between the foreign key
and the primary key, you know that every state exists, and that every state
exists only once.  But, that will not solve your problem, as the join to
the state table is not where the time goes.



 ->  Seq Scan on busbase  (cost=0.00..6378172.28 rows=55402728
 width=7) (actual time=0.004..250046.673 rows=60057057 loops=1)


Assuming that your cost parameters are all default, this means you have
(6378172.28 - 0.01* 55402728)/1  = 5.8e6 pages, or 44.4 GB of table.  That
is, less than 10 tuples per page.

Tightly packed, you should be able to hold over 30 tuples per page.  You
are probably not vacuuming aggressively enough, or you were not doing so in
the past and never did a vacuum full to reclaim the bloated space.

In any event, your sequential scan is running at 181 MB/s.  Is this what
you would expect given your IO hardware?
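Spelling out that arithmetic (default cost parameters seq_page_cost = 1.0 and cpu_tuple_cost = 0.01, 8 kB pages, actual scan time roughly 250 s from the plan):

```python
# Seq-scan cost model: cost = pages * seq_page_cost + tuples * cpu_tuple_cost,
# so the page count can be recovered from the planner's estimate.
seq_page_cost, cpu_tuple_cost = 1.0, 0.01
total_cost, est_tuples = 6378172.28, 55402728

pages = (total_cost - cpu_tuple_cost * est_tuples) / seq_page_cost
size_gb = pages * 8192 / 1024**3            # 8 kB pages
tuples_per_page = est_tuples / pages
mb_per_sec = pages * 8192 / 1024**2 / 250   # actual time ~250 s

print(f"{pages:.2e} pages, {size_gb:.1f} GB, "
      f"{tuples_per_page:.1f} tuples/page, {mb_per_sec:.0f} MB/s")
```

This reproduces the numbers in the message: about 5.8e6 pages, 44.4 GB, under 10 tuples per page, and a scan rate in the low 180s of MB/s.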




 I've tried playing around with the settings in the config file for
 shared_buffers, work_mem, etc restarting Postgres each time and nothing
 seems to help.


How fast do you think it should run?  How fast do you need it to run?  This
seems like the type of query that would get run once per financial quarter,
or maybe once per day on off-peak times.

Cheers,

Jeff


Re: [PERFORM] Performance of query

2013-03-23 Thread Misa Simic
Hi Jeff,

It seems my previous mail has not shown up in the list... copied/pasted
again below

However, you said something important:

The join to the state table is not necessary.  Between the foreign key
and the primary key, you know that every state exists, and that every state
exists only once.  But, that will not solve your problem, as the join to
the state table is not where the time goes.

I think it is something the planner could/should be aware of... and
discard the join

 Merge Join  (cost=0.00..2310285.02 rows=60057056 width=3) (actual
time=38.424..41992.070 rows=60057057 loops=1)
Merge Cond: (state.state = busbase.state)

this part of the plan below would save significant time if the planner
didn't decide to take this step at all...

Kind regards,

Misa





Hi Cindy

TBH - I don't know...

I have added this to list so maybe someone else can help...

To recap:

from the starting situation (table structure and indexes are in the first mail in
this thread)

EXPLAIN ANALYZE
SELECT busbase.state AS c0, count(busbase.id) AS m0 FROM busbase INNER JOIN
state USING (state)
GROUP BY busbase.state

says:
HashAggregate  (cost=7416975.58..7416976.09 rows=51 width=7) (actual
time=285339.465..285339.473 rows=51 loops=1)
  ->  Hash Join  (cost=2.15..7139961.94 rows=55402728 width=7) (actual
time=0.066..269527.934 rows=60057057 loops=1)
Hash Cond: (busbase.state = state.state)
->  Seq Scan on busbase  (cost=0.00..6378172.28 rows=55402728
width=7) (actual time=0.022..251029.307 rows=60057057 loops=1)
->  Hash  (cost=1.51..1.51 rows=51 width=3) (actual
time=0.028..0.028 rows=51 loops=1)
  Buckets: 1024  Batches: 1  Memory Usage: 2kB
  ->  Seq Scan on state  (cost=0.00..1.51 rows=51 width=3)
(actual time=0.003..0.019 rows=51 loops=1)
Total runtime: 285339.516 ms

after creating the composite index
CREATE INDEX comp_statidx2
  ON busbase
  USING btree
  (state, id );


we got:

GroupAggregate  (cost=0.00..2610570.81 rows=51 width=3) (actual
time=98.923..51033.888 rows=51 loops=1)
  ->  Merge Join  (cost=0.00..2310285.02 rows=60057056 width=3) (actual
time=38.424..41992.070 rows=60057057 loops=1)
Merge Cond: (state.state = busbase.state)
->  Index Only Scan using state_pkey on state  (cost=0.00..13.02
rows=51 width=3) (actual time=0.008..0.148 rows=51 loops=1)
  Heap Fetches: 51
->  Index Only Scan using comp_statidx2 on busbase
 (cost=0.00..1559558.68 rows=60057056 width=3) (actual
time=38.408..12883.575 rows=60057057 loops=1)
  Heap Fetches: 0
Total runtime: 51045.648 ms


Question is - is it possible to improve it more?



[PERFORM] Performance of query

2013-03-22 Thread Cindy Makarowsky
I have two tables in Postgres 9.2 on a Linux server with 8GB of RAM.  The
first table has 60 million records:

CREATE TABLE table1
(
  id integer,
  update date,
  company character(35),
  address character(35),
  city character(20),
  state character(2),
  zip character(9),
  phone character(10),
  fips character(5),
  tract character(6),
  block character(4),
  status character(1),
  pre_title character(2),
  contact character(35),
  title character(20),
  pstat character(1),
  id integer NOT NULL,
  pkone character(2),
  pktwo character(2),
  pkthree character(2),
  pkfour character(2),
  centract character(15),
  CONSTRAINT table1_pkey PRIMARY KEY (id ),
CONSTRAINT fipsc FOREIGN KEY (fips)
  REFERENCES fips (fips) MATCH FULL
  ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT statec FOREIGN KEY (state)
  REFERENCES state (state) MATCH FULL
  ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT tractc FOREIGN KEY (centract)
  REFERENCES tract (centract) MATCH FULL
  ON UPDATE NO ACTION ON DELETE NO ACTION,
 CONSTRAINT zipc FOREIGN KEY (zip)
  REFERENCES zip (zip) MATCH FULL
  ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
  OIDS=FALSE
);
ALTER TABLE table1
  OWNER TO postgres;


-- Index: statidx2

-- DROP INDEX statidx2;

CREATE INDEX statidx2
  ON table1
  USING btree
  (state COLLATE pg_catalog.default );

The second table just has the 51 state records:

CREATE TABLE state
(
  state character(2) NOT NULL,
  state_name character(15),
  CONSTRAINT state_pkey PRIMARY KEY (state )
)
WITH (
  OIDS=FALSE
);
ALTER TABLE state
  OWNER TO postgres;

-- Index: stateidx

-- DROP INDEX stateidx;

CREATE UNIQUE INDEX stateidx
  ON state
  USING btree
  (state COLLATE pg_catalog.default );

When I run this query:

 select state.state, count(table1.id)  from state,table1 where table1.state
= state.state group by state.state

It takes almost 4 minutes with this output from explain:

HashAggregate  (cost=7416975.58..7416976.09 rows=51 width=7) (actual
time=284891.955..284891.964 rows=51 loops=1)
  ->  Hash Join  (cost=2.15..7139961.94 rows=55402728 width=7) (actual
time=0.049..269049.678 rows=60057057 loops=1)
Hash Cond: (busbase.state = state.state)
->  Seq Scan on busbase  (cost=0.00..6378172.28 rows=55402728
width=7) (actual time=0.004..250046.673 rows=60057057 loops=1)
->  Hash  (cost=1.51..1.51 rows=51 width=3) (actual
time=0.032..0.032 rows=51 loops=1)
  Buckets: 1024  Batches: 1  Memory Usage: 2kB
  ->  Seq Scan on state  (cost=0.00..1.51 rows=51 width=3)
(actual time=0.003..0.012 rows=51 loops=1)
Total runtime: 284892.024 ms

I've tried playing around with the settings in the config file for
shared_buffers, work_mem, etc restarting Postgres each time and nothing
seems to help.

Thanks for any help.


Re: [PERFORM] Performance of query

2013-03-22 Thread Cindy Makarowsky
But, I do have an index on Table1 on the state field which is in my group
by condition:

CREATE INDEX statidx2
  ON table1
  USING btree
  (state COLLATE pg_catalog.default );

I have vacuumed the table too.
On Fri, Mar 22, 2013 at 5:13 PM, Josh Berkus j...@agliodbs.com wrote:

 On 03/22/2013 12:46 PM, Cindy Makarowsky wrote:
  I've tried playing around with the settings in the config file for
  shared_buffers, work_mem, etc restarting Postgres each time and nothing
  seems to help.

 Well, you're summarizing 55 million rows on an unindexed table:

 ->  Seq Scan on busbase  (cost=0.00..6378172.28 rows=55402728
 width=7) (actual time=0.004..250046.673 rows=60057057 loops=1)

 ... that's where your time is going.

 My only suggestion would be to create a composite index which matches
 the group by condition on table1, and vacuum freeze the whole table so
 that you can use index-only scan on 9.2.

 --
 Josh Berkus
 PostgreSQL Experts Inc.
 http://pgexperts.com


 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance



Re: [PERFORM] Performance of query

2013-03-22 Thread Misa Simic
Hi,

there is something mixed up...

your index is on table1

but EXPLAIN ANALYZE reports a table called busbase

Kind Regards,

Misa




2013/3/22 Cindy Makarowsky cindymakarow...@gmail.com

 But, I do have an index on Table1 on the state field which is in my group
 by condition:

 CREATE INDEX statidx2
   ON table1
   USING btree
   (state COLLATE pg_catalog.default );

 I have vacuumed the table too.

 On Fri, Mar 22, 2013 at 5:13 PM, Josh Berkus j...@agliodbs.com wrote:

 On 03/22/2013 12:46 PM, Cindy Makarowsky wrote:
  I've tried playing around with the settings in the config file for
  shared_buffers, work_mem, etc restarting Postgres each time and nothing
  seems to help.

 Well, you're summarizing 55 million rows on an unindexed table:

 ->  Seq Scan on busbase  (cost=0.00..6378172.28 rows=55402728
 width=7) (actual time=0.004..250046.673 rows=60057057 loops=1)

 ... that's where your time is going.

 My only suggestion would be to create a composite index which matches
 the group by condition on table1, and vacuum freeze the whole table so
 that you can use index-only scan on 9.2.

 --
 Josh Berkus
 PostgreSQL Experts Inc.
 http://pgexperts.com







Re: [PERFORM] Performance of query

2013-03-22 Thread Cindy Makarowsky
I changed the name of the table for the post but forgot to change it in the
results of the explain.  Table1 is busbase.

On Fri, Mar 22, 2013 at 6:25 PM, Misa Simic misa.si...@gmail.com wrote:

 Hi,

 there is something mixed up...

 your index is on table1

 but EXPLAIN ANALYZE reports a table called busbase

 Kind Regards,

 Misa




 2013/3/22 Cindy Makarowsky cindymakarow...@gmail.com

 But, I do have an index on Table1 on the state field which is in my group
 by condition:

 CREATE INDEX statidx2
   ON table1
   USING btree
   (state COLLATE pg_catalog.default );

 I have vacuumed the table too.

 On Fri, Mar 22, 2013 at 5:13 PM, Josh Berkus j...@agliodbs.com wrote:

 On 03/22/2013 12:46 PM, Cindy Makarowsky wrote:
  I've tried playing around with the settings in the config file for
  shared_buffers, work_mem, etc restarting Postgres each time and nothing
  seems to help.

 Well, you're summarizing 55 million rows on an unindexed table:

 ->  Seq Scan on busbase  (cost=0.00..6378172.28 rows=55402728
 width=7) (actual time=0.004..250046.673 rows=60057057 loops=1)

 ... that's where your time is going.

 My only suggestion would be to create a composite index which matches
 the group by condition on table1, and vacuum freeze the whole table so
 that you can use index-only scan on 9.2.

 --
 Josh Berkus
 PostgreSQL Experts Inc.
 http://pgexperts.com








Re: [PERFORM] performance with query

2009-06-18 Thread Alberto Dalmaso
P.S.: to understand what the query has to do (and 80% of the views have
to do this): a lot of time is spent pivoting a table with a
structure like
identifier, description_of_value, numeric_value
that has to be transformed into
identifier, description_1, description_2, ..., description_n
where n is not a fixed number (it changes depending on the type of
calculation that was used to generate the rows in the table).

perhaps this information could help.

thanks everybody
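A minimal sketch of that kind of pivot via conditional aggregation, in Python's sqlite3 (table and column names are invented; the real views are in Postgres, where crosstab from the tablefunc extension is another option):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE vals (identifier INTEGER, descr TEXT, val NUMERIC);
    INSERT INTO vals VALUES (1,'a',10), (1,'b',20), (2,'a',30);
""")
# One output column per distinct description, discovered at run time --
# this is why n is not fixed in advance.
descrs = [r[0] for r in con.execute(
    "SELECT DISTINCT descr FROM vals ORDER BY descr")]
cols = ", ".join(
    f"max(CASE WHEN descr = '{d}' THEN val END) AS {d}" for d in descrs)
rows = con.execute(
    f"SELECT identifier, {cols} FROM vals "
    "GROUP BY identifier ORDER BY identifier").fetchall()
print(rows)  # [(1, 10, 20), (2, 30, None)]
```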




Re: [PERFORM] performance with query

2009-06-18 Thread Kevin Grittner
Alberto Dalmaso dalm...@clesius.it wrote: 
 P.S.: to understand what the query has to do (and 80% of the views
 have to do this): a lot of time is spent pivoting a table with
 a structure like
 identifier, description_of_value, numeric_value
 that has to be transformed into
 identifier, description_1, description_2, ..., description_n
 where n is not a fixed number (it changes depending on the type of
 calculation that was used to generate the rows in the table).
 
 perhaps this information could help.
 
What would help more is the actual query, if that can be shared.  It
leaves a lot less to the imagination than descriptions of it.
 
There are a couple things which have been requested which would help
pin down the reason the optimizer is not getting to a good plan, so
that it can be allowed to do a good job.  As Tom said, this would be a
much more productive focus than casting about for ways to force it to
do what you think is the best thing.  (Maybe, given the chance, it can
come up with a plan which runs in seconds, rather than over the 24
minutes you've gotten.)
 
With all the optimizer options on, and the from_collapse_limit and
join_collapse_limit values both set to 100, run an EXPLAIN (no
ANALYZE) on your big problem query.  Let us know how long the EXPLAIN
runs.  If it gets any errors, copy and paste all available
information.  (General descriptions aren't likely to get us very far.)
Since EXPLAIN without ANALYZE only *plans* the query, but doesn't run
it, it should not take long to do this.
 
If there are any views or custom functions involved, showing those
along with the query source would be good.
 
If we get this information, we have a much better chance to find the
real problem and get it fixed.
 
-Kevin



Re: [PERFORM] performance with query

2009-06-18 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 With all the optimizer options on, and the from_collapse_limit and
 join_collapse_limit values both set to 100, run an EXPLAIN (no
 ANALYZE) on your big problem query.  Let us know how long the EXPLAIN
 runs.  If it gets any errors, copy and paste all available
 information.  (General descriptions aren't likely to get us very far.)
 Since EXPLAIN without ANALYZE only *plans* the query, but doesn't run
 it, it should not take long to do this.

One issue here is that with the collapse limits cranked up to more than
geqo_threshold, he's going to be coping with GEQO's partially-random
plan selection; so whatever he reports might or might not be especially
reflective of day-to-day results.  I'm tempted to ask that he also push
up geqo_threshold.  It's possible that that *will* send the planning
time to the moon; but it would certainly be worth trying, to find out
what plan is produced.

regards, tom lane



Re: [PERFORM] performance with query

2009-06-18 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote: 
 Kevin Grittner kevin.gritt...@wicourts.gov writes:
 With all the optimizer options on, and the from_collapse_limit and
 join_collapse_limit values both set to 100, run an EXPLAIN (no
 ANALYZE) on your big problem query.  Let us know how long the
 EXPLAIN runs.  If it gets any errors, copy and paste all available
 information.  (General descriptions aren't likely to get us very
 far.)  Since EXPLAIN without ANALYZE only *plans* the query, but
 doesn't run it, it should not take long to do this.
 
 One issue here is that with the collapse limits cranked up to more
 than geqo_threshold, he's going to be coping with GEQO's partially-
 random plan selection; so whatever he reports might or might not be
 especially reflective of day-to-day results.  I'm tempted to ask
 that he also push up geqo_threshold.
 
In an earlier post[1] he said that he had geqo turned off.  It does
pay to be explicit, though; I'd hate to assume it's off if he's been
changing things.
 
Alberto, please ensure that you still have geqo off when you run the
test I suggested.  Also, I see that I didn't explicitly say that you
should send the EXPLAIN output, but that's what would be helpful.
 
 It's possible that that *will* send the planning time to the moon;
 but it would certainly be worth trying, to find out what plan is
 produced.
 
Agreed.  What plan is produced, and how long that takes.  (And whether
he gets an out of memory error.)  I figured it was best to get a clear
answer to those before moving on.
 
-Kevin
 
[1]
http://archives.postgresql.org/pgsql-performance/2009-06/msg00186.php



Re: [PERFORM] performance with query

2009-06-17 Thread Alberto Dalmaso
Ok, here are the last rows for the vacuum analyze verbose

INFO:  free space map contains 154679 pages in 39 relations
DETAIL:  A total of 126176 page slots are in use (including overhead).
126176 page slots are required to track all free space.
Current limits are:  16 page slots, 5000 relations, using 1476 kB.
The query was executed successfully, but with no result, in
1332269 ms.


and I attach the complete explain analyze of the complex query.
Giving more detail about the tables involved in the query is not so
easy, as there are a lot of them.
The joins are made between columns that are a primary key in one table and
indexed in the other.
All the WHERE clauses are on indexed columns (perhaps there are too many
indexes...)

Thanks a lot.
QUERY PLAN
Merge Right Join  (cost=508603077.17..508603195.59 rows=1 width=227) (actual 
time=73312.340..1463106.860 rows=32407 loops=1)
  Merge Cond: (ve_edil_rendite.id_domanda = domande.id_domanda)
  -  GroupAggregate  (cost=0.00..105.51 rows=1031 width=11) (actual 
time=0.098..110.794 rows=1031 loops=1)
-  Index Scan using pk_ve_edil_rendite on ve_edil_rendite  
(cost=0.00..86.84 rows=1157 width=11) (actual time=0.063..98.601 rows=1157 
loops=1)
  -  Materialize  (cost=508603077.17..508603077.18 rows=1 width=195) (actual 
time=73312.188..1462604.387 rows=32407 loops=1)
-  Nested Loop  (cost=506932259.90..508603077.17 rows=1 width=195) 
(actual time=73312.174..1462385.266 rows=32407 loops=1)
  -  Merge Join  (cost=406932259.90..408603074.89 rows=1 
width=188) (actual time=73312.123..1461834.776 rows=32407 loops=1)
Merge Cond: (domande.id_domanda = c_elaout_7.id_domanda)
-  Merge Join  (cost=406932259.90..408188339.97 rows=1 
width=240) (actual time=72975.426..1458427.886 rows=32407 loops=1)
  Merge Cond: (c_elaout_5.id_domanda = 
domande.id_domanda)
  -  Merge Join  (cost=3895.15..1259628.81 rows=138561 
width=41) (actual time=1721.643..7493.711 rows=99308 loops=1)
Merge Cond: (edil_veneto.id_domanda = 
c_elaout_5.id_domanda)
-  Merge Join  (cost=1123.18..372710.75 
rows=98122 width=29) (actual time=569.693..4135.019 rows=99308 loops=1)
  Merge Cond: (edil_veneto.id_domanda = 
c_elaout_6.id_domanda)
  -  Index Scan using IDX_pk_Edil_Veneto 
on edil_veneto  (cost=0.00..11825.14 rows=232649 width=17) (actual 
time=0.080..1157.486 rows=232471 loops=1)
  -  Index Scan using IDX_3_c_elaout on 
c_elaout c_elaout_6  (cost=0.00..359914.34 rows=98122 width=12) (actual 
time=0.094..1900.373 rows=99308 loops=1)
Index Cond: 
((c_elaout_6.node)::text = 'contributo_sociale'::text)
-  Index Scan using IDX_3_c_elaout on 
c_elaout c_elaout_5  (cost=0.00..887091.20 rows=245306 width=12) (actual 
time=0.120..2389.615 rows=250746 loops=1)
  Index Cond: ((c_elaout_5.node)::text = 
'contributo'::text)
  -  Materialize  (cost=406928364.74..406928364.75 
rows=1 width=199) (actual time=69623.122..1450472.706 rows=32407 loops=1)
-  Nested Loop  
(cost=402583154.89..406928364.74 rows=1 width=199) (actual 
time=69623.107..1450215.911 rows=32407 loops=1)
  Join Filter: ((r_enti.codice_ente)::text 
= (r_luoghi.cod_catastale)::text)
  -  Merge Join  
(cost=202583154.89..206928031.60 rows=1 width=198) (actual 
time=69611.258..115367.182 rows=32407 loops=1)
Merge Cond: (domande.id_domanda = 
c_elaout_4.id_domanda)
-  Merge Join  
(cost=202583154.89..206425374.54 rows=1 width=186) (actual 
time=69007.657..113053.726 rows=32407 loops=1)
  Merge Cond: 
(domande.id_domanda = c_elain_3.id_domanda)
  -  Merge Join  
(cost=201328203.80..205170407.27 rows=41 width=138) (actual 
time=66160.710..100104.342 rows=32407 loops=1)
Merge Cond: 
(domande.id_domanda = c_elain_7.id_domanda)
-  Merge Join  
(cost=201328203.80..204498966.35 rows=93 width=126) (actual 
time=56792.251..72298.070 rows=32407 loops=1)
  Merge Cond: 
(domande.id_domanda = c_elain_9.id_domanda)
  -  Merge Join  
(cost=201322293.83..203828121.81 rows=424 width=114) (actual 
time=47349.082..55619.999 rows=32407 loops=1)
Merge Cond: 

Re: [PERFORM] performance with query (OT)

2009-06-17 Thread Albe Laurenz
Alberto Dalmaso wrote:
[...]
 in the explanation I'll see that the db use nasted loop.
[...]

Sorry for the remark off topic, but I *love* the term
nasted loop. It should not go to oblivion unnoticed.

Yours,
Laurenz Albe



Re: [PERFORM] performance with query

2009-06-17 Thread Kevin Grittner
Alberto Dalmaso dalm...@clesius.it wrote: 
 Ok, here are the last rows for the vacuum analyze verbose
 
 INFO:  free space map contains 154679 pages in 39 relations
 DETAIL:  A total of 126176 page slots are in use (including
 overhead).
 126176 page slots are required to track all free space.
 Current limits are:  16 page slots, 5000 relations, using 1476 kB.
 
No indication of bloat there.  You could afford to free some RAM by
reducing the max_fsm_relations setting.  (You have 39 relations but
are reserving RAM to keep track of free space in 5000 relations.)
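
For reference, on 8.3 the free space map is sized with two GUCs in postgresql.conf (both were removed in 8.4, which tracks free space automatically). A sketch matching the numbers above; the values are illustrative, not a tuned recommendation, and changing them requires a server restart:

```
# postgresql.conf (PostgreSQL 8.3 and earlier only)
# ~127k page slots are in use, so keep comfortable headroom above that:
max_fsm_pages = 200000
# Free space is tracked in only 39 relations; 1000 is still a wide margin:
max_fsm_relations = 1000
```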
 
 and I attach the complete explain analyze of the complex query.
 
I'll see what I can glean from that when I get some time.
 
 All the where clausole are on indexed colums (perhaps there are too
 many indexes...)
 
That's not usually a problem.
 
The other thing I was hoping to see, which I don't think you've sent,
is an EXPLAIN ANALYZE of the same query with the settings which you
have found which cause it to pick a faster plan.  As I understand it,
that runs pretty fast, so hopefully that's a quick one for you to
produce.
 
-Kevin



Re: [PERFORM] performance with query

2009-06-17 Thread Alberto Dalmaso
What I sent is the quick execution; with other parameters this
query simply doesn't come to an end.
It is the little query that, changing the settings (using the defaults
with all the planner methods on), becomes really quick, while with these
settings (with some planner methods switched off) it becomes very slow.

I can't believe it: using these settings

set enable_hashjoin = 'on';
set enable_nestloop = 'on';
set enable_seqscan = 'on';
set enable_sort = 'on';


set from_collapse_limit = 8;
set join_collapse_limit = 3;  


select * from v_fsa_2007_estrazione;
finally ends in acceptable time (156 sec).
What does it mean to use join_collapse_limit = 3? That is really a lot
fewer objects than what I'm joining in that query.

I'm executing an explain analyze in this new situation...
Is it possible that such a configuration creates performance problems
for other queries? (join_collapse_limit is set to a really low value)

I'll make some tests in this direction.




Re: [PERFORM] performance with query

2009-06-17 Thread Kevin Grittner
Alberto Dalmaso dalm...@clesius.it wrote: 
 
 What does it mean to use join_collapse_limit = 3?
 
http://www.postgresql.org/docs/8.3/interactive/runtime-config-query.html#RUNTIME-CONFIG-QUERY-OTHER
 
-Kevin



[PERFORM] performance with query

2009-06-16 Thread Alberto Dalmaso
Hi everybody, I'm creating my database on Postgres and after some days
of hard work I've arrived at good performance and awful performance
with the same configuration.
I have a complex query that performs very well with mergejoin on and
nestloop off.
If I activate nestloop, Postgres tries to use it and the query execution
becomes inconclusive: after 3 hours still no answer, so I kill the query.
That's ok but, with this configuration, very simple little queries like
select column from table where primarykey=value become incredibly slow.
The only solution I found at the moment is to set mergejoin to off
before doing this query.
That is an awful solution, because it means I have to change
all the software (a big, old application) in the (many) points in which
this kind of query is used (and the same problem applies to setting
mergejoin off for the whole system and activating it on the connection
that has to run the hard query).
Do you have any suggestion to accelerate both the complex and the simple
queries?
I've tried a lot of combinations of the Planner Method Configuration
settings, but the only combination that really accelerates the hard
query is mergejoin on and nestloop off; other settings seem to be
useless.
Thanks in advance.




Re: [PERFORM] performance with query

2009-06-16 Thread Joshua Tolley
On Tue, Jun 16, 2009 at 03:37:42PM +0200, Alberto Dalmaso wrote:
 Hi everybody, I'm creating my database on Postgres and after some days
 of hard work I've arrived at good performance and awful performance
 with the same configuration.
 I have a complex query that performs very well with mergejoin on and
 nestloop off.
 If I activate nestloop, Postgres tries to use it and the query execution
 becomes inconclusive: after 3 hours still no answer, so I kill the query.
 That's ok but, with this configuration, very simple little queries like
 select column from table where primarykey=value become incredibly slow.
 The only solution I found at the moment is to set mergejoin to off
 before doing this query.
 That is an awful solution, because it means I have to change
 all the software (a big, old application) in the (many) points in which
 this kind of query is used (and the same problem applies to setting
 mergejoin off for the whole system and activating it on the connection
 that has to run the hard query).
 Do you have any suggestion to accelerate both the complex and the simple
 queries?
 I've tried a lot of combinations of the Planner Method Configuration
 settings, but the only combination that really accelerates the hard
 query is mergejoin on and nestloop off; other settings seem to be
 useless.
 Thanks in advance.

It would be helpful if you posted EXPLAIN ANALYZE results for both queries.
This will require you to run each query to completion; if that's not possible
for the 3 hour query, at least run EXPLAIN and post those results.

- Josh / eggyknap




Re: [PERFORM] performance with query

2009-06-16 Thread Kevin Grittner
Alberto Dalmaso dalm...@clesius.it wrote: 
 
 I have a complex query that performs very well with mergejoin on and
 nestloop off.
 If I activate nestloop, Postgres tries to use it and the query
 execution becomes inconclusive: after 3 hours still no answer, so I
 kill the query.
 That's ok but, with this configuration, very simple little queries
 like select column from table where primarykey=value become
 incredibly slow.
 The only solution I found at the moment is to set mergejoin to off
 before doing this query.
 
We'll need  a lot more information to be able to provide useful
advice.
 
What version of PostgreSQL?
 
What OS?
 
What does the hardware look like?  (CPUs, drives, memory, etc.)
 
Do you have autovacuum running?  What other regular maintenance do you
do?
 
What does your postgresql.conf file look like?  (If you can strip out
all comments and show the rest, that would be great.)
 
With that as background, if you can show us the schema for the
table(s) involved and the text of a query, along with the EXPLAIN
ANALYZE output (or just EXPLAIN, if the query runs too long to get the
EXPLAIN ANALYZE results) that would allow us to see where things are
going wrong.  Please show this information without setting any of the
optimizer options off; but then, as a diagnostic step, *also* show
EXPLAIN ANALYZE results when you set options to a configuration that
runs faster.
 
-Kevin



Re: [PERFORM] performance with query

2009-06-16 Thread Alberto Dalmaso
 What version of PostgreSQL?
8.3 that comes with opensuse 11.1
  
 What OS?
Linux, opensuse 11.1 64 bit
  
 What does the hardware look like?  (CPUs, drives, memory, etc.)
2 * opteron dual core 8 GB RAM, 70 GB SCSI U320 RAID 1
  
 Do you have autovacuum running?  What other regular maintenance do you
 do?
Yes, autovacuum and analyze are running; the only other activity is the
WAL backup
  
 What does your postgresql.conf file look like?  (If you can strip out
 all comments and show the rest, that would be great.)

I'll post only the values I've changed

shared_buffers = 1536MB 
temp_buffers = 5MB  
max_prepared_transactions = 30  
  
work_mem = 50MB # I've lot of work in order by
maintenance_work_mem =50MB  
max_stack_depth = 6MB   

max_fsm_pages = 16  
max_fsm_relations = 5000

wal_buffers = 3072kB  

enable_bitmapscan = on
enable_hashagg = on
enable_hashjoin = off
enable_indexscan = on
enable_mergejoin = on
enable_nestloop = off
enable_seqscan = off
enable_sort = off
enable_tidscan = on


effective_cache_size = 3600MB

geqo = off
default_statistics_target = 100

  
 With that as background, if you can show us the schema for the
 table(s) involved and the text of a query, along with the EXPLAIN
 ANALYZE output (or just EXPLAIN, if the query runs too long to get the
 EXPLAIN ANALYZE results) that would allow us to see where things are
 going wrong.  Please show this information without setting any of the
 optimizer options off; but then, as a diagnostic step, *also* show
 EXPLAIN ANALYZE results when you set options to a configuration that
 runs faster.
  
 -Kevin

The problem is that the simple query uses mergejoin instead of
nasted loop (obvious given the parameters I set), and in this situation
it becomes very very slow (15 sec vs 5 ms when I set mergejoin to off).

That is the explain of the complex query that works with more than
acceptable performance

Merge Right Join  (cost=508603077.17..508603195.59 rows=1 width=227)
  Merge Cond: (ve_edil_rendite.id_domanda = domande.id_domanda)
  -  GroupAggregate  (cost=0.00..105.51 rows=1031 width=11)
-  Index Scan using pk_ve_edil_rendite on ve_edil_rendite
(cost=0.00..86.84 rows=1157 width=11)
  -  Materialize  (cost=508603077.17..508603077.18 rows=1 width=195)
-  Nested Loop  (cost=506932259.90..508603077.17 rows=1
width=195)
  -  Merge Join  (cost=406932259.90..408603074.89 rows=1
width=188)
Merge Cond: (domande.id_domanda =
c_elaout_7.id_domanda)
-  Merge Join  (cost=406932259.90..408188339.97
rows=1 width=240)
  Merge Cond: (c_elaout_5.id_domanda =
domande.id_domanda)
  -  Merge Join  (cost=3895.15..1259628.81
rows=138561 width=41)
Merge Cond: (edil_veneto.id_domanda =
c_elaout_5.id_domanda)
-  Merge Join
(cost=1123.18..372710.75 rows=98122 width=29)
  Merge Cond:
(edil_veneto.id_domanda = c_elaout_6.id_domanda)
  -  Index Scan using
IDX_pk_Edil_Veneto on edil_veneto  (cost=0.00..11825.14 rows=232649
width=17)
  -  Index Scan using
IDX_3_c_elaout on c_elaout c_elaout_6  (cost=0.00..359914.34
rows=98122 width=12)
Index Cond:
((c_elaout_6.node)::text = 'contributo_sociale'::text)
-  Index Scan using IDX_3_c_elaout
on c_elaout c_elaout_5  (cost=0.00..887091.20 rows=245306 width=12)
  Index Cond:
((c_elaout_5.node)::text = 'contributo'::text)
  -  Materialize
(cost=406928364.74..406928364.75 rows=1 width=199)
-  Nested Loop
(cost=402583154.89..406928364.74 rows=1 width=199)
  Join Filter:
((r_enti.codice_ente)::text = (r_luoghi.cod_catastale)::text)
  -  Merge Join
(cost=202583154.89..206928031.60 rows=1 width=198)
Merge Cond:
(domande.id_domanda = c_elaout_4.id_domanda)
-  Merge Join
(cost=202583154.89..206425374.54 rows=1 width=186)
  Merge Cond:
(domande.id_domanda = c_elain_3.id_domanda)
  -  Merge Join
(cost=201328203.80..205170407.27 rows=41 width=138)
Merge Cond:
(domande.id_domanda = c_elain_7.id_domanda)
-  Merge Join
(cost=201328203.80..204498966.35 rows=93 width=126)
  Merge
Cond: (domande.id_domanda = 

Re: [PERFORM] performance with query

2009-06-16 Thread Matthew Wakeling

On Tue, 16 Jun 2009, Alberto Dalmaso wrote:

What does your postgresql.conf file look like?



enable_hashjoin = off
enable_nestloop = off
enable_seqscan = off
enable_sort = off


Why are these switched off?


and that is the explain of the too slow simple query

Merge Join  (cost=0.00..1032305.52 rows=4 width=12)
  Merge Cond: (domande.id_dichiarazione = c_elaout.id_domanda)
  -  Index Scan using IDX_8_domande on domande  (cost=0.00..8.39
rows=1 width=4)
Index Cond: (id_domanda = 4165757)
  -  Index Scan using IDX_2_c_elaout on c_elaout
(cost=0.00..1030283.89 rows=805279 width=12)
Filter: ((c_elaout.node)::text = 'Invalido'::text)

this cost  15 sec


with mergejoin to off:

Nested Loop  (cost=1.00..10022.97 rows=4 width=12)
  -  Index Scan using IDX_8_domande on domande  (cost=0.00..8.39
rows=1 width=4)
Index Cond: (id_domanda = 4165757)
  -  Index Scan using IDX_2_c_elaout on c_elaout  (cost=0.00..14.54
rows=4 width=12)
Index Cond: (c_elaout.id_domanda = domande.id_dichiarazione)
Filter: ((c_elaout.node)::text = 'Invalido'::text)

this cost 15 msec!!!


Well duh. What you're effectively doing is telling Postgres to NEVER use a 
nested loop. Then you're getting upset because it isn't using a nested 
loop. When you tell it to NEVER use anything (switching all join 
algorithms off), it ignores you and chooses the right plan anyway.


Matthew

--
You can configure Windows, but don't ask me how.   -- Bill Gates



Re: [PERFORM] performance with query

2009-06-16 Thread Kevin Grittner
Alberto Dalmaso dalm...@clesius.it wrote: 
 
  What version of PostgreSQL?
 8.3 that comes with opensuse 11.1
 
Could you show us the result of SELECT version(); ?
 
 max_prepared_transactions = 30
 
Unless you're using distributed transactions or need a lot of locks,
that's just going to waste some RAM.  Zero is fine for most people.
 
 maintenance_work_mem =50MB
 
That's a little small -- this only comes into play for maintenance
tasks like index builds.  Not directly part of your reported problem,
but maybe something to bump to the 1GB range.
 
 max_fsm_pages = 16
 max_fsm_relations = 5000
 
Have you done any VACUUM VERBOSE lately and captured the output?  If
so, what do the last few lines say?  (That's a lot of relations for
the number of pages; just curious how it maps to actual.)
 
 enable_hashjoin = off
 enable_nestloop = off
 enable_seqscan = off
 enable_sort = off
 
That's probably a bad idea.  If particular queries aren't performing
well, you can always set these temporarily on a particular connection.
Even then, turning these off is rarely a good idea except for
diagnostic purposes.  I *strongly* recommend you put all of these back
to the defaults of 'on' and start from there, turning off selected
items as needed to get EXPLAIN ANALYZE output to demonstrate the
better plans you've found for particular queries.
 
 effective_cache_size = 3600MB
 
That seems a little on the low side for an 8GB machine, unless you
have other things on there using a lot of RAM.  Do you?
 
If you could set the optimizer options back on and get new plans where
you show specifically which options (if any) where turned off for the
run, that would be good.  Also, please attach the plans to the email
instead of pasting -- the word wrap makes them hard to read.  Finally,
if you could do \d on the tables involved in the query, it would help.
I'll hold off looking at these in hopes that you can do the above.
 
-Kevin



Re: [PERFORM] performance with query

2009-06-16 Thread Alberto Dalmaso
On Tue, 16/06/2009 at 15.58 +0100, Matthew Wakeling wrote:
 On Tue, 16 Jun 2009, Alberto Dalmaso wrote:
  What does your postgresql.conf file look like?
 
  enable_hashjoin = off
  enable_nestloop = off
  enable_seqscan = off
  enable_sort = off
 
 Why are these switched off?
 
because of the need to pump up the performance of the complex query. If
I set them to on, then it tries to use a nasted loop even in the complex
query, and that query never arrives at a response and, of course,
I need a response from it!!!
So my problem is to find a configuration that preserves performance for
both kinds of query, but I'm not able to find it.
Tuning the RAM parameters can save about 10% of the time on the complex
query, while I see no change on the simple one...




Re: [PERFORM] performance with query

2009-06-16 Thread Kevin Grittner
Alberto Dalmaso dalm...@clesius.it wrote: 
 On Tue, 16/06/2009 at 15.58 +0100, Matthew Wakeling wrote:
 On Tue, 16 Jun 2009, Alberto Dalmaso wrote:
 
  enable_hashjoin = off
  enable_nestloop = off
  enable_seqscan = off
  enable_sort = off
 
 Why are these switched off?
 
 because of the need to pump up the performance of the complex query.
 
These really are meant primarily for diagnostic purposes.  As a last
resort, you could set them off right before running a problem query,
and set them back on again afterward; but you will be much better off
if you can cure the underlying problem.  The best chance of that is to
show us the plan you get with all turned on.
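
In SQL, one way to keep such a change scoped to a single query is SET LOCAL inside a transaction, which reverts automatically at COMMIT or ROLLBACK. A sketch (the view name is taken from elsewhere in this thread; substitute the problem query):

```sql
BEGIN;
-- Only affects the current transaction; no need to remember to re-enable:
SET LOCAL enable_nestloop = off;
SELECT * FROM v_fsa_2007_estrazione;
COMMIT;
```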
 
-Kevin



Re: [PERFORM] performance with query

2009-06-16 Thread Alberto Dalmaso
 Could you show us the result of SELECT version(); ?
of course I can 
PostgreSQL 8.3.7 on x86_64-unknown-linux-gnu, compiled by GCC gcc (SUSE
Linux) 4.3.2 [gcc-4_3-branch revision 141291]
  
 Have you done any VACUUM VERBOSE lately and captured the output?  If
 so, what do the last few lines say?  (That's a lot of relations for
 the number of pages; just curious how it maps to actual.)
It needs a lot of time (20 GB database); when I have the answer
I'll post it.
  
  enable_hashjoin = off
  enable_nestloop = off
  enable_seqscan = off
  enable_sort = off
  
 That's probably a bad idea.  If particular queries aren't performing
 well, you can always set these temporarily on a particular connection.
 Even then, turning these off is rarely a good idea except for
 diagnostic purposes.  I *strongly* recommend you put all of these back
 to the defaults of 'on' and start from there, turning off selected
 items as needed to get EXPLAIN ANALYZE output to demonstrate the
 better plans you've found for particular queries.

OK, that will become the reverse of what I'm doing now (set them to on
and set them to off only on the appropriate connections, instead of
setting them to off and setting them to on only on some appropriate
connections).
But the question is: do you think it is impossible to find a
configuration that works fine for both kinds of query? The
application has to run against an Oracle DB too... I don't want to have
to write different source code for the two databases...

  
  effective_cache_size = 3600MB
  
 That seems a little on the low side for an 8GB machine, unless you
 have other things on there using a lot of RAM.  Do you?
Yes, there are two instances of Postgres running on the same server (the
databases have to stay completely separated).
  
 If you could set the optimizer options back on and get new plans where
 you show specifically which options (if any) where turned off for the
 run, that would be good.  Also, please attach the plans to the email
 instead of pasting -- the word wrap makes them hard to read.  Finally,
 if you could do \d on the tables involved in the query, it would help.
 I'll hold off looking at these in hopes that you can do the above.
  
 -Kevin
I attach the explain of the long query after setting all the enable_*
options to on. In this condition the query will never finish...
QUERY PLAN
Nested Loop Left Join  (cost=283253.73..417552.90 rows=1 width=227)
  Join Filter: (ve_edil_rendite.id_domanda = domande.id_domanda)
  -  Nested Loop  (cost=283222.38..417485.46 rows=1 width=195)
-  Nested Loop  (cost=283222.38..417470.91 rows=1 width=247)
  -  Nested Loop  (cost=283222.38..417456.36 rows=1 width=235)
Join Filter: (edil_veneto.id_tp_superficie = 
ve_edil_tp_superfici.id_tp_superficie)
-  Nested Loop  (cost=283222.38..417455.29 rows=1 
width=228)
  -  Nested Loop  (cost=283222.38..417440.75 rows=1 
width=216)
Join Filter: ((r_enti.codice_ente)::text = 
(r_luoghi.cod_catastale)::text)
-  Nested Loop  (cost=283222.38..417107.61 
rows=1 width=215)
  -  Nested Loop  
(cost=283222.38..417093.06 rows=1 width=203)
-  Nested Loop  
(cost=283222.38..417084.75 rows=1 width=186)
  -  Nested Loop  
(cost=283222.38..417070.20 rows=1 width=174)
-  Nested Loop  
(cost=283222.38..417057.25 rows=1 width=162)
  -  Nested Loop  
(cost=283222.38..417044.33 rows=1 width=150)
-  Nested 
Loop  (cost=283222.38..417031.40 rows=1 width=138)
  -  
Nested Loop  (cost=283222.38..417016.85 rows=1 width=126)

-  Nested Loop  (cost=283222.38..416343.12 rows=44 width=114)

  -  Merge Join  (cost=283222.38..384803.94 rows=2431 width=102)

Merge Cond: (componenti.id_dichiarazione = domande.id_dichiarazione)

-  GroupAggregate  (cost=0.00..94032.39 rows=601009 width=12)

  -  Index Scan using IDX_1_componenti on componenti  
(cost=0.00..76403.45 rows=2023265 width=12)

-  Sort  (cost=283222.38..283223.41 rows=412 width=102)
 

Re: [PERFORM] performance with query

2009-06-16 Thread Kevin Grittner
Alberto Dalmaso dalm...@clesius.it wrote:
 
 do you thing it is impossible to find a
 configuration that works fine for both the kind of query?
 
No.  We probably just need a little more information.
 
 The application have to run even versus oracle db... i wont have to
 write a different source for the two database...
 
I understand completely.
 
 I attach the explain of the long query after setting all the enable_*
 options to on. In this condition the query will never finish...
 
We're getting close.  Can you share the table structure and the actual
query you are running?  It's a lot easier (for me, anyway) to put this
puzzle together with all the pieces in hand.
 
Also, if you can set off some of the optimizer options and get a fast
plan, please show us an EXPLAIN ANALYZE for that, with information on
which settings were turned off.  That will help show where bad
estimates may be causing a problem, or possibly give a hint of table
or index bloat problems.
 
I think we're getting close to critical mass for seeing the
solution
 
-Kevin



Re: [PERFORM] performance with query

2009-06-16 Thread Alberto Dalmaso
On Tue, 16/06/2009 at 11.31 -0400, Tom Lane wrote:
 Alberto Dalmaso dalm...@clesius.it writes:
  On Tue, 16/06/2009 at 15.58 +0100, Matthew Wakeling wrote:
  enable_hashjoin = off
  enable_nestloop = off
  enable_seqscan = off
  enable_sort = off
  
  Why are these switched off?
  
  because of the need to pump up the performance of the complex query.
 
 That is *not* the way to improve performance of a query.  Turning off
 specific enable_ parameters can be helpful while investigating planner
 behavior, but it is never recommended as a production solution.  You
 have already found out why.
 
   regards, tom lane
Ok, but the problem is that my very long query performes quite well when
it works with merge join but it cannot arrive to an end if it use other
kind of joining.
If i put all the parameter to on, as both of you tell me, in the
explanation I'll see that the db use nasted loop.
If i put to off nasted loop, it will use hash join.
How can I write the query so that the analyzer will use mergejoin (that
is the only option that permit the query to give me the waited answare)
without changing the settings every time on the connection?




Re: [PERFORM] performance with query

2009-06-16 Thread Kevin Grittner
Alberto Dalmaso dalm...@clesius.it wrote: 
 
 I attach the explain of the long query after setting all the enable_*
 options to on. In this condition the query will never finish...
 
I notice that you many joins in there.  If the query can't be
simplified, you probably need to boost the join_collapse_limit and
from_collapse_limit quite a bit.  If planning time goes through the
roof in that case, you may need to enable geqo -- this is what it's
intended to help.  If you try geqo, you may need to tune it; I'm not
familiar with the knobs for tuning that, so maybe someone else will
jump in if you get to that point.
 
-Kevin



Re: [PERFORM] performance with query

2009-06-16 Thread Tom Lane
Alberto Dalmaso dalm...@clesius.it writes:
 Ok, but the problem is that my very long query performes quite well when
 it works with merge join but it cannot arrive to an end if it use other
 kind of joining.
 If i put all the parameter to on, as both of you tell me, in the
 explanation I'll see that the db use nasted loop.
 If i put to off nasted loop, it will use hash join.
 How can I write the query so that the analyzer will use mergejoin (that
 is the only option that permit the query to give me the waited answare)
 without changing the settings every time on the connection?

You have the wrong mindset completely.  Instead of thinking how can I
force the planner to do it my way, you need to be thinking why is the
planner guessing wrong about which is the best way to do it?  And how
can I improve its guess?

There's not really enough information in what you've posted so far to
let people help you with that question, but one thing that strikes me
from the EXPLAIN is that you have a remarkably large number of joins.
Perhaps increasing from_collapse_limit and/or join_collapse_limit
(to more than the number of tables in the query) would help.
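
As a concrete sketch of that suggestion (the limit of 24 is an arbitrary value assumed to exceed the number of tables joined; the view name is from elsewhere in the thread):

```sql
-- Session-level: let the planner flatten and reorder the full join list.
SET from_collapse_limit = 24;
SET join_collapse_limit = 24;
EXPLAIN ANALYZE SELECT * FROM v_fsa_2007_estrazione;
```

The trade-off is longer planning time; if that explodes, geqo is the intended escape hatch.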

regards, tom lane



Re: [PERFORM] performance with query

2009-06-16 Thread Alberto Dalmaso
Unfortunately the query needs that level of complexity, as the
information I have to show is spread across different tables.
I tried geqo on at the beginning, but only with the default
parameters.
Tomorrow (my working day here in Italy finished some minutes ago, so
I will wait for the end of the explain analyze and then go home ;-P )
I'll try to increase, as you suggest, join_collapse_limit and
from_collapse_limit.
If someone can give me some information on how to configure geqo, I'll
try it again.
In the meantime, tonight I'll leave the vacuum verbose working for me.




Re: [PERFORM] performance with query

2009-06-16 Thread Alberto Dalmaso
Even though the query ends in approximately 200 sec, the explain analyze
is still working and more than 1000 sec have gone by...
I'll leave it working tonight.
Have a nice evening, and thanks for the help.




[PERFORM] performance on query

2005-10-26 Thread Sidar López Cruz

I don't know what to do with these queries...
Compared with SQL Server, SQL Server wins!!!


Table sizes:
archivos: 40MB
fotos: 55MB

select count(1) from fotos f where not exists (select a.archivo from 
archivos a where a.archivo=f.archivo)

173713 ms.
110217 ms.
83122 ms.

select count(*) from
(
select archivo from fotos
except
select archivo from archivos
) x;
201479 ms.

SELECT count(*)
FROM fotos f
LEFT JOIN archivos a USING(archivo)
WHERE a.archivo IS NULL
199523 ms.
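
All three formulations compute the same anti-join, so the first thing to check is whether `archivo` is indexed on both tables. A hedged sketch (the index names are made up; the table and column names are from the post above):

```sql
CREATE INDEX idx_fotos_archivo    ON fotos (archivo);
CREATE INDEX idx_archivos_archivo ON archivos (archivo);
ANALYZE fotos;
ANALYZE archivos;

-- Then compare the plans of the three variants, e.g.:
EXPLAIN ANALYZE
SELECT count(*)
FROM fotos f
LEFT JOIN archivos a USING (archivo)
WHERE a.archivo IS NULL;
```

On releases of this era the `LEFT JOIN ... IS NULL` form usually plans best, since `NOT EXISTS` was not turned into a true anti-join until PostgreSQL 8.4.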



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] performance on query

2005-10-26 Thread Jim C. Nasby
So the issue is that instead of taking 174 seconds the query now takes
201?

I'm guessing that SQL server might be using index covering, but that's
just a guess. Posting query plans (preferably with actual timing info;
EXPLAIN ANALYZE on PostgreSQL and whatever the equivalent would be for
MSSQL) might give us some idea.

On Wed, Oct 26, 2005 at 05:47:31PM -0600, Sidar López Cruz wrote:
 I don't know what to do with these queries...
 Compared with SQL Server, SQL Server wins!!!
 
 
 Table sizes:
 archivos: 40MB
 fotos: 55MB
 
 select count(1) from fotos f where not exists (select a.archivo from 
 archivos a where a.archivo=f.archivo)
 173713 ms.
 110217 ms.
 83122 ms.
 
 select count(*) from
 (
   select archivo from fotos
   except
   select archivo from archivos
 ) x;
 201479 ms.
 
 SELECT count(*)
 FROM fotos f
 LEFT JOIN archivos a USING(archivo)
 WHERE a.archivo IS NULL
 199523 ms.
 

-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461
