Re: [PERFORM] PostgreSQL runs a query much slower than BDE and

2006-10-02 Thread Simon Riggs
On Thu, 2006-08-17 at 14:33 -0400, Tom Lane wrote:

 There's a more interesting issue, which I'm afraid we do not have time
 to fix for PG 8.2.  The crux of the matter is that given
 
 SELECT ...
 FROM SHEEP_FLOCK f1 JOIN 
 (SELECT f.regn_no, MAX(f.transfer_date) as last_xfer_date
 FROM  SHEEP_FLOCK f
 GROUP BY f.regn_no) f2 
 ON f1.regn_no = f2.regn_no
 AND f1.transfer_date = f2.last_xfer_date
 
 if there is an index on (regn_no, transfer_date) then the planner could
 in principle do a double-column merge join between an indexscan on this
 index and the output of a GroupAggregate implementation of the subquery.
 The GroupAggregate plan would in fact only be sorting on regn_no, so
 it's not immediately obvious why this is OK.  The reason is that there
 is only one MAX() value for any particular regn_no, and so the sort
 condition that the last_xfer_date values be in order for any one value
 of regn_no is vacuous.  We could consider the subquery's output to be
 sorted by *any* list of the form regn_no, other-stuff.
 
 The planner's notion of matching pathkey lists to determine sortedness
 is not at all capable of dealing with this.  After a little bit of
 thought I'm tempted to propose that we add a concept that a particular
 pathkey list is unique, meaning that it is known to include a unique
 key for the data.  Such a key would match, for sortedness purposes,
 any requested sort ordering consisting of its columns followed by
 others.  In the above example, we would know that a GROUP BY implemented
 by GroupAggregate yields an output for which the grouping columns
 are a unique sort key.
 
 I imagine this concept is already known in the database research
 literature; anyone recognize it and know a standard name for it?

(catching up on some earlier mails)

Not seen any particular name for that around. There are quite a few
places in the optimizer, IIRC, that could use the concept of uniqueness
if it existed.

I would note that the above query plan is similar-ish to the one you'd
get if you tried to push down the GROUP BY from the top of a join. So
the uniqueness information sounds like an important precursor to that.

I've just re-checked the literature I was reading on this earlier this year:
http://portal.acm.org/ft_gateway.cfm?id=233320type=pdfcoll=dl=acmCFID=15151515CFTOKEN=6184618#search=%22db2%20order%20optimization%20tpc-d%22
"Fundamental Techniques for Order Optimization", Simmen et al.

Also, IIRC, there was some work talking about extending the Interesting
Order concept to allow groupings to be noted also.

From our work on sorting earlier, we had it that a Merge Join will
always require a Mark/Restore operation on its sorted inputs. If the
Outer input is unique then a Restore operation will never be required,
so the Mark can be avoided also and thus the materialization of the sort
can also be avoided. So some way of telling the MJ node that the sort
order is also unique would be very useful.
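
As a rough sketch of that idea, using the SHEEP_FLOCK query from above (the
index name here is made up for illustration):

  CREATE INDEX sheep_flock_regn_xfer ON sheep_flock (regn_no, transfer_date);

  -- The grouped subquery emits exactly one row per regn_no, so although a
  -- GroupAggregate only sorts on regn_no, its output is trivially also
  -- ordered by (regn_no, last_xfer_date), or by (regn_no, anything-else):
  SELECT f.regn_no, MAX(f.transfer_date) AS last_xfer_date
  FROM sheep_flock f
  GROUP BY f.regn_no;

If the planner could recognise that, it could merge-join this output against
an indexscan on (regn_no, transfer_date) without an extra sort step.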

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com




[PERFORM] How much memory in 32 bits Architecture to Shared Buffers is Possible

2006-10-02 Thread Marcelo Costa

Hi, to all!

Recently I tried increasing the shared_buffers setting on an IBM xSeries
255 (quad Xeon 2.8 GHz, 8 GB RAM, 2 SCSI 36 GB disks in RAID 1, 1 storage
array).

I tried changing shared_buffers to use 25% of RAM (2048 MB) and
effective_cache_size to 50% of RAM (4096 MB), all with max_connections
set to 220.

When I start up the cluster, many configuration error messages about shared
memory and SHMMAX show up. I tried changing shared_buffers, max_connections
and effective_cache_size, and enlarging SHMALL and SHMMAX to 4294967296
(4096 MB), but the cluster doesn't start.

Only with about 15% of RAM for shared buffers can I start the cluster.
In my tests the maximum value I can use is 1.9 GB; above that the
cluster doesn't start.

Can anybody help me and explain whether there is a limit to shared
memory on a 32-bit architecture?

Does anybody have experience tuning servers with this configuration and
increasing these settings?

thanks to all.



--
Marcelo Costa



Re: [PERFORM] any hope for my big query?

2006-10-02 Thread Shaun Thomas
On Thursday 28 September 2006 17:18, Ben wrote:

 explain select
   distinct public.album.id
 from
   public.album,public.albumjoin,public.track,umdb.node
 where
   node.dir=2811
   and albumjoin.album = public.album.id
   and public.albumjoin.track = public.track.id
   and levenshtein(substring(public.track.name for 75),
   substring(node.file for 75)) <= 10
   and public.album.id in
   (select album from albumjoin group by album having count(*)
   between 15 and 25)
 group by public.album.id
 having count(*) >= 5;

If I'm reading this right, you want all the albums with 15-25 entries in 
album join having 5 or more tracks that are (soundex type) similar to 
other nodes.  Knowing that, you can also try something like this:

select a.album
  from (select album,track from albumjoin group by album having count(1)
between 15 and 25) a
  join public.track t on (a.track = t.id)
  join umdb.node n on (levenshtein(substring(t.name for 75),
substring(n.file for 75)) < 9)
 where n.dir = 2811
 group by a.album
 having count(1) > 4;

This removes two of your tables, since you were only interested in 
albums with 15-25 albumjoins, and weren't actually using any album data 
other than the ID, which albumjoin supplies.  Your subselect is now an 
integral part of the whole query, being treated like a temp table that 
only supplies album IDs with 15-25 albumjoins.  From there, add on the 
track information, and use that to restrict the matching nodes.  Your 
explain should be better with the above.

Just remember with the levenshtein in there, you're forcing a sequence 
scan on the node table.  Depending on how big that table is, you may 
not be able to truly optimize this.

-- 

Shaun Thomas
Database Administrator

Leapfrog Online 
807 Greenwood Street 
Evanston, IL 60201 
Tel. 847-440-8253
Fax. 847-570-5750
www.leapfrogonline.com



Re: [PERFORM] Optimizing queries

2006-10-02 Thread Simon Riggs
On Tue, 2006-08-08 at 16:42 -0400, Tom Lane wrote:
 Patrice Beliveau [EMAIL PROTECTED] writes:
  SELECT * FROM TABLE
  WHERE TABLE.COLUMN1=something
  AND TABLE.COLUMN2=somethingelse
  AND function(TABLE.COLUMN3,TABLE.COLUMN4) > 0;
 
  I find out that the function process every row even if the row should be 
  rejected as per the first or the second condition.
  ... I'm using version 8.1.3
 
 PG 8.1 will not reorder WHERE clauses for a single table unless it has
 some specific reason to do so (and AFAICT no version back to 7.0 or so
 has done so either...)  So there's something you are not telling us that
 is relevant.  Let's see the exact table schema (psql \d output is good),
 the exact query, and EXPLAIN output for that query.

Is WHERE clause re-ordering done for 8.2, or is that still a TODO item?
(Don't remember seeing that at all).

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com




Re: [PERFORM] archive wal's failure and load increase.

2006-10-02 Thread Simon Riggs
On Fri, 2006-09-29 at 11:55 -0400, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  PreallocXlogFiles() adds only a *single* xlog file, sometimes.
 
 Hm, you are right.  I wonder why it's so unaggressive ... perhaps
 because under normal circumstances we soon settle into a steady
 state where each checkpoint recycles the right number of files.

That is normally the case, yes. But only for people that have correctly
judged (or massively overestimated) what checkpoint_segments should be
set at.

Currently, when we don't have enough we add one, maybe. When we have too
many we truncate right back to checkpoint_segments as quickly as
possible.

Seems like we should try and automate that completely for 8.3:
- calculate the number required by keeping a running average which
ignores a single peak value, yet takes 5 consistently high values as the
new average
- add more segments with increasing aggressiveness 1,1,2,3,5,8 segments
at a time when required 
- handle out-of-space errors fairly gracefully by waking up the
archiver, complaining to the logs and then eventually preventing
transactions from writing to logs rather than taking server down
- shrink back more slowly by halving the difference between the
overlimit and the typical value
- get rid of checkpoint_segments GUC

That should handle peaks caused by data loads, archiving interruptions
or other peak loadings.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com




Re: [PERFORM] How much memory in 32 bits Architecture to Shared Buffers

2006-10-02 Thread Joshua D. Drake
Marcelo Costa wrote:
 Hi, to all!
 
 Recently I tried increasing the shared_buffers setting on an IBM xSeries
 255 (quad Xeon 2.8 GHz, 8 GB RAM, 2 SCSI 36 GB disks in RAID 1, 1 storage
 array).

You haven't specified your OS so I am going to assume Linux.

 When I start up the cluster, many configuration error messages about shared
 memory and SHMMAX show up. I tried changing shared_buffers, max_connections
 and effective_cache_size, and enlarging SHMALL and SHMMAX to 4294967296
 (4096 MB), but the cluster doesn't start.

You have to edit your sysctl.conf see:

http://www.postgresql.org/docs/8.1/static/runtime.html
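
For reference, the kernel parameters involved normally go in /etc/sysctl.conf,
roughly like this (the values here are only illustrative, not a recommendation):

  # /etc/sysctl.conf
  kernel.shmmax = 2147483648   # largest single shared memory segment, in bytes
  kernel.shmall = 524288       # total shared memory allowed, in 4 kB pages

and are applied without a reboot with "sysctl -p".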

I *think* (I would have to double check) the limit for shared memory on
linux 32bit is 2 gig. Possibly 2 gig per CPU I don't recall. I run all
64bit now.

Sincerely,

Joshua D. Drake



-- 

   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
   Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/





Re: [PERFORM] How much memory in 32 bits Architecture to Shared Buffers is Possible

2006-10-02 Thread Marcelo Costa
Yes, my system is DEBIAN SARGE 3.0

thanks,
Marcelo

2006/10/2, Joshua D. Drake [EMAIL PROTECTED]:
 You haven't specified your OS so I am going to assume Linux.


[PERFORM] Unsubscribe

2006-10-02 Thread uwcssa


Please unsubscribe me! Thank you! 


Also, it would be better to have a message footer saying how to unsubscribe.


Re: [PERFORM] Table not getting vaccumed.

2006-10-02 Thread Jim C. Nasby
On Sat, Sep 30, 2006 at 02:55:54PM +0530, Nimesh Satam wrote:
 I am trying to vaccum one of the table using the following command:
 
 VACUUM FULL ANALYZE VERBOSE table_name;

Are you sure you want to do a vacuum full? Normally, that shouldn't be
required.
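
A plain vacuum is usually enough and, unlike VACUUM FULL, doesn't take an
exclusive lock on the table, e.g.:

  VACUUM VERBOSE ANALYZE table_name;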
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [PERFORM] selecting data from information_schema.columns performance.

2006-10-02 Thread Jim C. Nasby
On Sun, Oct 01, 2006 at 11:01:19PM -0400, Tom Lane wrote:
 Steve Martin [EMAIL PROTECTED] writes:
  I am trying to determine if there is a way to improve the performance 
  when selecting data from the information_schema.columns view.
 
 In my experience, there isn't any single one of the information_schema
 views whose performance doesn't suck :-(.  Somebody should work on that
 sometime.  I haven't looked closely enough to determine where the
 bottlenecks are.

Looking at the newsysviews stuff should prove enlightening... 
AndrewSN spent a lot of time making sure those views are very
performant.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [PERFORM] Unsubscribe

2006-10-02 Thread Markus Schaber
Hi, Uwcssa,

uwcssa wrote:
 
  Please unsubscribe me!  Thank you!   

Sorry, but we (the list members) are unable to do that; we have no
administrative power over the list. :-(

  Also, it would be better to have a message foot saying how to unsubscribe.

List unsubscribe information is contained in the Headers of every mail
that's sent over the list:

List-Archive: http://archives.postgresql.org/pgsql-performance
List-Help: mailto:[EMAIL PROTECTED]
List-ID: pgsql-performance.postgresql.org
List-Owner: mailto:[EMAIL PROTECTED]
List-Post: mailto:pgsql-performance@postgresql.org
List-Subscribe:
mailto:[EMAIL PROTECTED]
List-Unsubscribe:
mailto:[EMAIL PROTECTED]

Additionally, there is a link to the unsubscribe web form at the list
archive page: http://archives.postgresql.org/pgsql-performance/

HTH,
Markus



Re: [PERFORM] selecting data from information_schema.columns

2006-10-02 Thread Steve Martin

Hi

Thanks for you replies.

Regarding newsysviews, what is its current state? I have had a quick
look at the pgFoundry site and the last updates were 9 months ago.


The most efficient way I can find in the short term to improve
performance for our application is to create a table from
information_schema.columns and update it when tables are created or
deleted, or columns are added or removed. E.g.


=> create table my_information_schema_columns as select * from
   information_schema.columns;
=> create index my_information_schema_columns_index
   on my_information_schema_columns (table_name);


Update the table with the following statements:

When tables or columns are added:
=> insert into my_information_schema_columns select * from
   information_schema.columns
-> except select * from my_information_schema_columns;

When tables are removed (this does not work for column changes):
=> delete from my_information_schema_columns
-> where table_name = (select table_name from my_information_schema_columns
->  except select table_name from information_schema.columns);

For column changes a script will need to be created; the following
returns the rows to be deleted.  (Any alternative ideas?)
=> select table_name, column_name, ordinal_position from
   my_information_schema_columns
-> except select table_name, column_name, ordinal_position from
   information_schema.columns;



My problem now is how to get the update statements executed when a
table is created or dropped, or columns are added or removed.  For our
application, this is not very often.  My understanding is that triggers
cannot be created on system tables, therefore the updates cannot be
triggered when pg_tables is modified.  Detecting column changes is also
a problem.


Detecting when a table has been added is relatively easy and can be
performed by our application, e.g. check my_information_schema_columns;
if the table is not there, check information_schema.columns, and if it
exists there, run the update statements.


A simple method would be to run a cron job to do the updates, but I 
would like to try to be a bit more intelligent about when  the update 
statements are executed.


Regards
Steve Martin


Jim C. Nasby wrote:

 On Sun, Oct 01, 2006 at 11:01:19PM -0400, Tom Lane wrote:
  Steve Martin [EMAIL PROTECTED] writes:
   I am trying to determine if there is a way to improve the performance
   when selecting data from the information_schema.columns view.
 
  In my experience, there isn't any single one of the information_schema
  views whose performance doesn't suck :-(.  Somebody should work on that
  sometime.  I haven't looked closely enough to determine where the
  bottlenecks are.
 
 Looking at the newsysviews stuff should prove enlightening...
 AndrewSN spent a lot of time making sure those views are very
 performant.

--
  \\|// From near to far,
   @ @  from here to there,
   ---oOOo-(_)-oOOo---  funny things are everywhere.  (Dr. Seuss)





Re: [PERFORM] selecting data from information_schema.columns

2006-10-02 Thread Jim Nasby

On Oct 2, 2006, at 7:31 PM, Steve Martin wrote:
 Regarding newsysviews, what is its current state? I have had a quick
 look at the pgFoundry site and the last updates were 9 months ago.


Well, the system catalogs don't change terribly often, so it's not  
like a lot needs to be done. We'd hoped to get them into core, but  
that didn't pan out. Theoretically, we should be making the views  
look more like information_schema, but no one's gotten to it yet.


 The most efficient way I can find in the short term to improve
 performance for our application is to create a table from
 information_schema.columns and update it when tables are created or
 deleted, or columns are added or removed. E.g.


Well, there's nothing that says you have to use information_schema.  
You can always query the catalog tables directly. Even if you don't  
want to use newsysviews as-is, the code there should be very helpful  
for doing that.
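
For example, the column list itself can be pulled straight out of pg_catalog
with a rough sketch like this (not the newsysviews code, just plain catalogs):

  SELECT n.nspname AS table_schema,
         c.relname AS table_name,
         a.attname AS column_name,
         a.attnum  AS ordinal_position,
         pg_catalog.format_type(a.atttypid, a.atttypmod) AS data_type
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    JOIN pg_attribute a ON a.attrelid = c.oid
   WHERE c.relkind = 'r'          -- ordinary tables only
     AND a.attnum > 0             -- skip system columns
     AND NOT a.attisdropped
     AND n.nspname NOT IN ('pg_catalog', 'information_schema');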


There is no ability to put triggers on DDL, so the best you could do  
with your caching table is to just periodically update it.

--
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)





Re: [PERFORM] Performace Optimization for Dummies

2006-10-02 Thread Carlo Stonebanks
I have loaded three of the four cores by running three different versions of
the import program, each importing a different segment of the table. The
server jumps to 75% usage, with three postgresql processes eating up 25%
each; the actual client itself takes up just a few ticks.

Heikki Linnakangas [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]
 Carlo Stonebanks wrote:
 We urgently need a major performance improvement. We are running the
 PostgreSQL 8.1.4 on a Windows 2003 x64 Server on a dual processor, dual 
 core
 3.2Ghz Xeon box with 4gb RAM and a RAID (sorry, I don't know what type) 
 disc
 subsystem. Sorry about the long intro, but here are my questions:

 Others have already drilled down to the way you do the inserts and 
 statistics etc., so I'll just point out:

 Are you fully utilizing all the 4 cores you have? Could you parallelize 
 the loading process, if you're currently running just one client? Are you 
 I/O bound or CPU bound?

 -- 
 Heikki Linnakangas
 EnterpriseDB http://www.enterprisedb.com

 





Re: [PERFORM] Performace Optimization for Dummies

2006-10-02 Thread Carlo Stonebanks
 Did you think about putting the whole data into PostgreSQL using COPY in
 a nearly unprocessed manner, index it properly, and then use SQL and
 stored functions to transform the data inside the database to the
 desired result?

This is actually what we are doing. The slowness is in the row-by-row
transformation. Every row requires that all the inserts and updates of the
previous row be committed - that's why we have problems figuring out how to
do this using SQL set logic.

Carlo 





Re: [PERFORM] Performace Optimization for Dummies

2006-10-02 Thread Carlo Stonebanks
 My experience with that type of load process is that doing this
 row-by-row is a very expensive approach and your results bear that out.

I expected this, and had warned the client before the project started that 
this is exactly where SQL underperforms.

 It is often better to write each step as an SQL statement that operates
 on a set of rows at one time.

The problem with this approach is that every row of data is dependent on the 
previous row's data being validated and imported. e.g.

Import Row 1:
John Q Smith
Foobar Corp
123 Main St,
Bigtown, MD 12345-6789

Import Row 2:
John Quincy Smith
FuzzyLoginc Inc
123 Main St, Suite 301
Bigtown, MD 12345-6789

Import Row 3:
Bobby Jones
Foobar Corp
123 Main Strett Suite 300,
Bigtown, MD 12345

Every row must be imported into the table so that the next row may see the
data and consider it when assigning IDs to the name, company and address
(all data must be normalised). How can this be done using set logic?

 You can also improve performance by ordering your checks so that the
 ones most likely to fail happen first.

Already done - I believe the problem is definitely in the navigational 
access model. What I am doing now makes perfect sense as far as the logic of 
the process goes - any other developer will read it and understand what is 
going on. At 3000 lines of code, this will be tedious, but understandable. 
But SQL hates it.

 Trying to achieve a high level of data quality in one large project is
 not often possible. Focus on the most critical areas of checking and get
 that working first with acceptable performance, then layer on additional
 checks while tuning. The complexity of the load programs you have also
 means they are susceptible to introducing data quality problems rather
 than removing them, so an incremental approach will also aid debugging
 of the load suite.

I couldn't agree more.

Carlo





Re: [PERFORM] Performace Optimization for Dummies

2006-10-02 Thread Carlo Stonebanks
 1. fork your import somhow to get all 4 cores running

This is already happening, albeit with only 3 of the cores. No improvement -
it appears we have taken the same problem and divided it by 3, with the same
projected completion time. This is really curious, to say the least.

 2. write the code that actually does the insert in C and use the
 parameterized prepared statement.

I had already tried the parameterized prepared statement; I had mentioned that
I was surprised that it had no effect. No one here seemed surprised, or at
least didn't think of commenting on it.
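
(For reference, a parameterized prepared statement at the SQL level looks
roughly like this sketch, with made-up table and column names:

  PREPARE ins_name (text, text, text) AS
    INSERT INTO names (first_name, last_name, company) VALUES ($1, $2, $3);
  EXECUTE ins_name ('John', 'Smith', 'Foobar Corp');

the equivalent can also be driven through a client driver's prepare/execute
API.)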

 however, your general approach has been 'please give me advice, but
 only the advice that i want'.

I'm sorry, I don't understand - I had originally come asking four questions,
looking for recommendations and opinions on hardware, O/S and commercial
support. I did also ask for comments on my config setup.





Re: [PERFORM] Forcing the use of particular execution plans

2006-10-02 Thread Tim Truman
Here is an explain analyze for the query that performs slowly. I hope this
helps; unfortunately I can't reproduce the version of the query that ran
quickly and therefore can't provide an 'explain analyze' for it.

Aggregate  (cost=88256.32..88256.32 rows=1 width=0) (actual time=55829.000..55829.000 rows=1 loops=1)
  ->  Subquery Scan foobar  (cost=0.00..88256.23 rows=35 width=0) (actual time=19235.000..55829.000 rows=24 loops=1)
        ->  Append  (cost=0.00..88255.88 rows=35 width=631) (actual time=19235.000..55829.000 rows=24 loops=1)
              ->  Subquery Scan *SELECT* 1  (cost=0.00..1165.12 rows=1 width=631) (actual time=16.000..16.000 rows=0 loops=1)
                    ->  Nested Loop  (cost=0.00..1165.11 rows=1 width=631) (actual time=16.000..16.000 rows=0 loops=1)
                          ->  Index Scan using ix_transaction_merchant_id on transaction t  (cost=0.00..1159.98 rows=1 width=349) (actual time=16.000..16.000 rows=0 loops=1)
                                Index Cond: (198 = merchant_id)
                                Filter: ((transaction_date >= '2005-01-01'::date) AND (transaction_date <= '2006-09-25'::date) AND ((credit_card_no)::text ~~ '4564%549'::text))
                          ->  Index Scan using pk_merchant on merchant m  (cost=0.00..5.11 rows=1 width=282) (never executed)
                                Index Cond: (id = 198)
              ->  Subquery Scan *SELECT* 2  (cost=20.90..87090.76 rows=34 width=631) (actual time=19219.000..55813.000 rows=24 loops=1)
                    ->  Hash Join  (cost=20.90..87090.42 rows=34 width=631) (actual time=19219.000..55813.000 rows=24 loops=1)
                          Hash Cond: (outer.merchant_id = inner.id)
                          ->  Seq Scan on transaction t  (cost=0.00..87061.04 rows=1630 width=349) (actual time=234.000..55797.000 rows=200 loops=1)
                                Filter: ((transaction_date >= '2005-01-01'::date) AND (transaction_date <= '2006-09-25'::date) AND ((credit_card_no)::text ~~ '4564%549'::text))
                          ->  Hash  (cost=20.88..20.88 rows=8 width=282) (actual time=16.000..16.000 rows=0 loops=1)
                                ->  Seq Scan on merchant m  (cost=0.00..20.88 rows=8 width=282) (actual time=0.000..16.000 rows=7 loops=1)
                                      Filter: (parent_merchant_id = 198)
Total runtime: 55829.000 ms

Once again any help much appreciated.

Tim

-Original Message-
From: Dave Dutcher [mailto:[EMAIL PROTECTED] 
Sent: Thursday, 28 September 2006 1:21 AM
To: 'Tim Truman'; pgsql-performance@postgresql.org
Subject: RE: [PERFORM] Forcing the use of particular execution plans

 -Original Message-
 From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of  Tim Truman
 
 Hi,
 
 I have the following query which has been running very slowly 
 and after a
 lot of testing/trial and error I found an execution plan that 
 ran the query
 in a fraction of the time (and then lost the statistics that 
 produced it).
 What I wish to know is how to force the query to use the 
 faster execution
 plan.

It would be a bit easier to diagnose the problem if you posted EXPLAIN
ANALYZE rather than just EXPLAIN.  The two plans you posted looked very
similar except for the order of the nested loop in subquery 1 and an index
scan rather than a seq scan in subquery 2.  

My guess would be that the order of the nested loop is determined mostly by
estimates of matching rows.  If you ran an EXPLAIN ANALYZE you could tell if
the planner is estimating correctly.  If it is not, you could try increasing
your statistics target and running ANALYZE.  

To make the planner prefer an index scan over a seq scan, I would first
check the statistics again, and then you can try setting enable_seqscan to
false (enable_seqscan is meant more for testing than production) or, you
could try reducing random_page_cost, but you should test that against a
range of queries before putting it in production.
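
For instance (a sketch only; the statistics target and cost value are just
examples, and the table and column names are taken from the plan posted above):

  -- raise the statistics target on the filtered column, then re-analyze:
  ALTER TABLE transaction ALTER COLUMN transaction_date SET STATISTICS 100;
  ANALYZE transaction;

  -- testing only: see what plan you get when seq scans are discouraged
  SET enable_seqscan = off;
  -- re-run the slow query with EXPLAIN ANALYZE here
  RESET enable_seqscan;

  -- or nudge the cost model instead (the default random_page_cost is 4):
  SET random_page_cost = 2;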

Dave



Re: [PERFORM] Forcing the use of particular execution plans

2006-10-02 Thread Tom Lane
Tim Truman [EMAIL PROTECTED] writes:
 Here is an explain analyze for the query that performs slowly,

This shows that the planner is exactly correct in thinking that all
the runtime is going into the seqscan on transaction:

 Aggregate  (cost=88256.32..88256.32 rows=1 width=0) (actual time=55829.000..55829.000 rows=1 loops=1)
 ...
       ->  Seq Scan on transaction t  (cost=0.00..87061.04 rows=1630 width=349) (actual time=234.000..55797.000 rows=200 loops=1)
             Filter: ((transaction_date >= '2005-01-01'::date) AND (transaction_date <= '2006-09-25'::date) AND ((credit_card_no)::text ~~ '4564%549'::text))

Since that component of the plan was identical in your two original
plans (desired and undesired) it seems pretty clear that you have
not correctly identified what your problem is.

regards, tom lane
