Re: [PERFORM] Any way to speed this up?

2005-04-09 Thread Jim C. Nasby
Two things to point out from this last run:

50% of the time is taken scanning tblassociate:
->  Seq Scan on tblassociate a  (cost=0.00..38388.79 rows=199922 width=53)
(actual time=62.000..10589.000 rows=176431 loops=1)
      Filter: ((clientnum)::text = 'SAKS'::text)

If you had an index on clientnum and didn't cast it to text in the view,
you might be able to use an indexscan, which could be faster (depends on
how big the table actually is).
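
For example (a sketch only; the index names are made up, and the expression
form would only be needed if the ::text casts stay in the view):

CREATE INDEX ix_tblassociate_clientnum ON tblassociate (clientnum);
-- or, if the casts stay in the view:
CREATE INDEX ix_tblassociate_clientnum_txt ON tblassociate ((clientnum::text));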

This sort is taking about 25% of the time:
  ->  Sort  (cost=85542.59..86042.39 rows=199922 width=75) (actual
time=19641.000..19955.000 rows=159960 loops=1)
        Sort Key: a.locationid
        ->  Merge Right Join  (cost=60850.40..62453.22 rows=199922 width=75)
(actual time=13500.000..14734.000 rows=176431 loops=1)

I suspect it shouldn't take 5 seconds to sort 160k rows in memory, and
that this sort is spilling to disk. If you increase your working memory
(the work_mem setting), the sort might fit entirely in memory. As a quick
test, you could set work_mem to 80% of system memory and see how that
changes the speed. But you wouldn't want to set it that high in production.
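
Something like this, say (the value is only an example; in 8.0 work_mem is
specified in KB and is allocated per sort, so you would never leave it this
high):

SET work_mem = 262144;  -- 256MB, for this session only
EXPLAIN ANALYZE SELECT * FROM viwassoclist WHERE clientnum = 'SAKS';
RESET work_mem;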

On Thu, Apr 07, 2005 at 01:14:33PM -0400, Joel Fradkin wrote:
 Here is the result after putting it back to 4, the original value (I had done
 that prior to your suggestion of using 2 or 3) to see what might change.
 I also vacuumed, and thought I saw records deleted in associate, which I
 found odd as this is a test site and no new records were added or deleted.
 
 Merge Join  (cost=86788.09..87945.00 rows=10387 width=112) (actual
 time=19703.000..21154.000 rows=159959 loops=1)
   Merge Cond: ("outer".locationid = "inner".locationid)
   ->  Sort  (cost=1245.50..1246.33 rows=332 width=48) (actual
 time=62.000..62.000 rows=441 loops=1)
         Sort Key: l.locationid
         ->  Index Scan using ix_location on tbllocation l
 (cost=0.00..1231.60 rows=332 width=48) (actual time=15.000..62.000 rows=441
 loops=1)
               Index Cond: ('SAKS'::text = (clientnum)::text)
   ->  Sort  (cost=85542.59..86042.39 rows=199922 width=75) (actual
 time=19641.000..19955.000 rows=159960 loops=1)
         Sort Key: a.locationid
         ->  Merge Right Join  (cost=60850.40..62453.22 rows=199922
 width=75) (actual time=13500.000..14734.000 rows=176431 loops=1)
               Merge Cond: (("outer".id = "inner".jobtitleid) AND
 ("outer"."?column4?" = "inner"."?column10?"))
               ->  Sort  (cost=554.11..570.13 rows=6409 width=37) (actual
 time=94.000..94.000 rows=6391 loops=1)
                     Sort Key: jt.id, (jt.clientnum)::text
                     ->  Seq Scan on tbljobtitle jt  (cost=0.00..148.88
 rows=6409 width=37) (actual time=0.000..63.000 rows=6391 loops=1)
                           Filter: (1 = presentationid)
               ->  Sort  (cost=60296.29..60796.09 rows=199922 width=53)
 (actual time=13406.000..13859.000 rows=176431 loops=1)
                     Sort Key: a.jobtitleid, (a.clientnum)::text
                     ->  Seq Scan on tblassociate a  (cost=0.00..38388.79
 rows=199922 width=53) (actual time=62.000..10589.000 rows=176431 loops=1)
                           Filter: ((clientnum)::text = 'SAKS'::text)
 Total runtime: 22843.000 ms
 
 Joel Fradkin
  
 -----Original Message-----
 From: Tom Lane [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, April 07, 2005 11:43 AM
 To: Joel Fradkin
 Cc: 'PostgreSQL Perform'
 Subject: Re: [PERFORM] Any way to speed this up? 
 
 Joel Fradkin [EMAIL PROTECTED] writes:
  random_page_cost = 1.2 #4     # units are one sequential page fetch cost
 
 That is almost certainly overoptimistic; it's causing the planner to
 use indexscans when it shouldn't.  Try 2 or 3 or thereabouts.
 
   regards, tom lane
 
 

-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?



[PERFORM] Any way to speed this up?

2005-04-07 Thread Joel Fradkin

Running this explain on a Windows box, but production is on Linux; both are
8.0.1.

MSSQL is beating me on this query for some reason.

The Linux box is much more powerful. I may have to increase the cache, but I
am pretty sure it's not an issue yet.

It has 8 GB of internal memory; any recommendation on the cache size to use?



explain analyze select * from viwassoclist where clientnum = 'SAKS'

Merge Join  (cost=59871.79..60855.42 rows=7934 width=112) (actual
time=46906.000..48217.000 rows=159959 loops=1)
  Merge Cond: ("outer".locationid = "inner".locationid)
  ->  Sort  (cost=393.76..394.61 rows=338 width=48) (actual
time=62.000..62.000 rows=441 loops=1)
        Sort Key: l.locationid
        ->  Index Scan using ix_location on tbllocation l
(cost=0.00..379.56 rows=338 width=48) (actual time=15.000..62.000 rows=441
loops=1)
              Index Cond: ('SAKS'::text = (clientnum)::text)
  ->  Sort  (cost=59478.03..59909.58 rows=172618 width=75) (actual
time=46844.000..46985.000 rows=159960 loops=1)
        Sort Key: a.locationid
        ->  Merge Right Join  (cost=0.00..39739.84 rows=172618 width=75)
(actual time=250.000..43657.000 rows=176431 loops=1)
              Merge Cond: ((("outer".clientnum)::text =
("inner".clientnum)::text) AND ("outer".id = "inner".jobtitleid))
              ->  Index Scan using ix_tbljobtitle_id on tbljobtitle jt
(cost=0.00..194.63 rows=6391 width=37) (actual time=32.000..313.000
rows=5689 loops=1)
                    Filter: (1 = presentationid)
              ->  Index Scan using ix_tblassoc_jobtitleid on tblassociate a
(cost=0.00..38218.08 rows=172618 width=53) (actual time=31.000..41876.000
rows=176431 loops=1)
                    Index Cond: ((clientnum)::text = 'SAKS'::text)
Total runtime: 48500.000 ms



CREATE OR REPLACE VIEW viwassoclist AS
 SELECT a.clientnum, a.associateid, a.associatenum, a.lastname,
   a.firstname, jt.value AS jobtitle, l.name AS location,
   l.locationid AS mainlocationid, l.divisionid, l.regionid, l.districtid,
   (a.lastname::text || ', '::text) || a.firstname::text AS assocname,
   a.isactive, a.isdeleted
   FROM tblassociate a
   LEFT JOIN tbljobtitle jt ON a.jobtitleid = jt.id
     AND jt.clientnum::text = a.clientnum::text AND 1 = jt.presentationid
   JOIN tbllocation l ON a.locationid = l.locationid
     AND l.clientnum::text = a.clientnum::text;





CREATE TABLE tblassociate
(
 clientnum varchar(16) NOT NULL,
 associateid int4 NOT NULL,
 associatenum varchar(10),
 firstname varchar(50),
 middleinit varchar(5),
 lastname varchar(50),
 ssn varchar(18),
 dob timestamp,
 address varchar(100),
 city varchar(50),
 state varchar(50),
 country varchar(50),
 zip varchar(10),
 homephone varchar(14),
 cellphone varchar(14),
 pager varchar(14),
 associateaccount varchar(50),
 doh timestamp,
 dot timestamp,
 rehiredate timestamp,
 lastdayworked timestamp,
 staffexecid int4,
 jobtitleid int4,
 locationid int4,
 deptid int4,
 positionnum int4,
 worktypeid int4,
 sexid int4,
 maritalstatusid int4,
 ethnicityid int4,
 weight float8,
 heightfeet int4,
 heightinches int4,
 haircolorid int4,
 eyecolorid int4,
 isonalarmlist bool NOT NULL DEFAULT false,
 isactive bool NOT NULL DEFAULT true,
 ismanager bool NOT NULL DEFAULT false,
 issecurity bool NOT NULL DEFAULT false,
 createdbyid int4,
 isdeleted bool NOT NULL DEFAULT false,
 militarybranchid int4,
 militarystatusid int4,
 patrontypeid int4,
 identificationtypeid int4,
 workaddress varchar(200),
 testtypeid int4,
 testscore int4,
 pin int4,
 county varchar(50),
 CONSTRAINT pk_tblassociate PRIMARY KEY (clientnum, associateid),
 CONSTRAINT ix_tblassociate UNIQUE (clientnum, associatenum)
);



CREATE TABLE tbljobtitle
(
 clientnum varchar(16) NOT NULL,
 id int4 NOT NULL,
 value varchar(50),
 code varchar(16),
 isdeleted bool DEFAULT false,
 presentationid int4 NOT NULL DEFAULT 1,
 CONSTRAINT pk_tbljobtitle PRIMARY KEY (clientnum, id, presentationid)
);



CREATE TABLE tbllocation
(
 clientnum varchar(16) NOT NULL,
 locationid int4 NOT NULL,
 districtid int4 NOT NULL,
 regionid int4 NOT NULL,
 divisionid int4 NOT NULL,
 locationnum varchar(8),
 name varchar(50),
 clientlocnum varchar(50),
 address varchar(100),
 address2 varchar(100),
 city varchar(50),
 state varchar(2) NOT NULL DEFAULT 'zz'::character varying,
 zip varchar(10),
 countryid int4,
 phone varchar(15),
 fax varchar(15),
 payname varchar(40),
 contact char(36),
 active bool NOT NULL DEFAULT true,
 coiprogram text,
 coilimit text,
 coiuser varchar(255),
 coidatetime varchar(32),
 ec_note_field varchar(1050),
 locationtypeid int4,
 open_time timestamp,
 close_time timestamp,
 insurance_loc_id varchar(50),
 lpregionid int4,
 sic int4,
 CONSTRAINT pk_tbllocation PRIMARY KEY (clientnum, locationid),
 CONSTRAINT ix_tbllocation_1 UNIQUE (clientnum, locationnum, name),
 CONSTRAINT ix_tbllocation_unique_number UNIQUE (clientnum, divisionid,
   regionid, districtid, locationnum)
);



Joel Fradkin

Re: [PERFORM] Any way to speed this up?

2005-04-07 Thread Keith Worthington
On Thu, 07 Apr 2005 11:13:57 -0400, Joel Fradkin wrote
[snip]
   ->  Sort  (cost=393.76..394.61 rows=338 width=48) (actual
 time=62.000..62.000 rows=441 loops=1)
         Sort Key: l.locationid
         ->  Index Scan using ix_location on tbllocation l
 (cost=0.00..379.56 rows=338 width=48) (actual time=15.000..62.000 rows=441
 loops=1)
               Index Cond: ('SAKS'::text = (clientnum)::text)
   ->  Sort  (cost=59478.03..59909.58 rows=172618 width=75) (actual
 time=46844.000..46985.000 rows=159960 loops=1)
         Sort Key: a.locationid
[snip]
 
 CREATE TABLE tblassociate
[snip]
 
   CONSTRAINT pk_tblassociate PRIMARY KEY (clientnum, associateid),
 
   CONSTRAINT ix_tblassociate UNIQUE (clientnum, associatenum)
 
[snip]
 
 Joel Fradkin

Joel,

I am REALLY new at this and struggling to understand EXPLAIN ANALYZE output,
but for what it is worth it looks like the sort on a.locationid is taking up a
lot of the time.  I do not see an index on that column.  I would suggest
indexing tblassociate.locationid and seeing if that helps.
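
Something like this, for instance (untested, and the index name is just a
guess):

CREATE INDEX ix_tblassociate_locationid ON tblassociate (locationid);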

Kind Regards,
Keith



Re: [PERFORM] Any way to speed this up?

2005-04-07 Thread John Arbash Meinel
Joel Fradkin wrote:
Running this explain on a Windows box, but production is on Linux; both 8.0.1.
MSSQL is beating me on this query for some reason.
The Linux box is much more powerful; I may have to increase the cache, but I
am pretty sure it's not an issue yet.
It has 8 GB internal memory; any recommendation on the cache size to use?

explain analyze select * from viwassoclist where clientnum = 'SAKS'

Merge Join  (cost=59871.79..60855.42 rows=7934 width=112) (actual
time=46906.000..48217.000 rows=159959 loops=1)

The first thing I noticed was this. Notice that the estimated rows is
8k while the actual rows is 160k, which means the planner is mis-estimating
the selectivity of your merge.
  ->  Sort  (cost=59478.03..59909.58 rows=172618 width=75) (actual
time=46844.000..46985.000 rows=159960 loops=1)
        Sort Key: a.locationid

This sort actually isn't taking very long. It starts at 46800 and runs
until 47000, so it takes < 1 second.
        ->  Merge Right Join  (cost=0.00..39739.84 rows=172618
width=75) (actual time=250.000..43657.000 rows=176431 loops=1)
              Merge Cond: ((("outer".clientnum)::text =
("inner".clientnum)::text) AND ("outer".id = "inner".jobtitleid))
              ->  Index Scan using ix_tbljobtitle_id on tbljobtitle
jt  (cost=0.00..194.63 rows=6391 width=37) (actual
time=32.000..313.000 rows=5689 loops=1)
                    Filter: (1 = presentationid)
              ->  Index Scan using ix_tblassoc_jobtitleid on
tblassociate a  (cost=0.00..38218.08 rows=172618 width=53) (actual
time=31.000..41876.000 rows=176431 loops=1)
                    Index Cond: ((clientnum)::text = 'SAKS'::text)

This is where the actual expense is. The merge right join starts at 250
and runs until 43657, which seems to be caused primarily by the index
scan of tblassociate. How many rows are in tblassociate? I'm assuming
quite a bit, since the planner thinks an index scan is faster than a seq
scan for 170k rows. (If you have > 2M rows this is probably accurate.)
I don't really know how long this should take, but 38s for 172k rows
seems a little long.
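
If you want to check, something like this would tell you (the second query
reads the planner's stored estimate instead of scanning the whole table):

SELECT count(*) FROM tblassociate;
SELECT reltuples FROM pg_class WHERE relname = 'tblassociate';
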
John
=:->




Re: [PERFORM] Any way to speed this up?

2005-04-07 Thread Tom Lane
Joel Fradkin [EMAIL PROTECTED] writes:
 Running this explain on a Windows box, but production is on Linux; both 8.0.1

Are you using any nondefault optimizer settings?  The vast bulk of the
time is going into the indexscan on tblassociate (almost 42 out of the
48 seconds), and I'm a bit surprised it didn't choose a seqscan and sort
instead.  Or even more likely, forget the merge joins altogether and use
hash joins --- the other tables are plenty small enough to fit in hash
tables.
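
If you're not sure what you've changed, SHOW will tell you, for instance:

SHOW random_page_cost;
SHOW ALL;  -- lists every setting; look for the planner cost parameters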

regards, tom lane



Re: [PERFORM] Any way to speed this up?

2005-04-07 Thread Tom Lane
John Arbash Meinel [EMAIL PROTECTED] writes:
   ->  Sort  (cost=59478.03..59909.58 rows=172618 width=75) (actual
 time=46844.000..46985.000 rows=159960 loops=1)
 
       Sort Key: a.locationid
 
 This sort actually isn't taking very long. It starts at 46800 and runs
 until 47000, so it takes < 1 second.
 
 ->  Merge Right Join  (cost=0.00..39739.84 rows=172618
 width=75) (actual time=250.000..43657.000 rows=176431 loops=1)

You're not reading it quite right.  The first actual number is the
time at which the first result row was delivered, which for a sort is
after the completion of (the bulk of) the sorting work.  What you
really need to look at is the difference between the completion times
of the node and its immediate input(s).  In this case I'd blame the
sort for 46985.000 - 43657.000 msec, i.e. roughly 3.3 seconds.

Come to think of it, though, you should not be putting a whole lot of
trust in EXPLAIN ANALYZE numbers taken on Windows, because they are
based on gettimeofday which has absolutely awful resolution on that
platform.  (There's a workaround for this in our CVS, but it's not in
8.0.*.)  I think we can still conclude that the indexscan on
tblassociate is most of the cost, but I wouldn't venture to say that
it's exactly such-and-such percent.

regards, tom lane



Re: [PERFORM] Any way to speed this up?

2005-04-07 Thread Joel Fradkin
shared_buffers = 8000          # min 16, at least max_connections*2, 8KB each
work_mem = 8192 #1024          # min 64, size in KB
max_fsm_pages = 3              # min max_fsm_relations*16, 6 bytes each
effective_cache_size = 4 #1000 # typically 8KB each
random_page_cost = 1.2 #4      # units are one sequential page fetch cost

These are the items I changed.
In the development box I turned random_page_cost down to .2 because I figured
it would all be faster using an index, as all my data is at a minimum being
selected by clientnum.

But the EXPLAIN ANALYZE I sent in is from these settings above on a Windows
box. If I run the analyze (pgAdmin) on a Windows box but connect to a Linux
box, would the times be accurate, or do I have to run the analyze on the
Linux box for that to happen?

I am a little unclear why I would need an index on associate by location as
I thought it would be using indexes in location and jobtitle for their
joins.
I did not say where locationid = x in my query on the view.
I have so much to learn about SQL.
Joel






Re: [PERFORM] Any way to speed this up?

2005-04-07 Thread Tom Lane
Joel Fradkin [EMAIL PROTECTED] writes:
 random_page_cost = 1.2 #4     # units are one sequential page fetch cost

That is almost certainly overoptimistic; it's causing the planner to
use indexscans when it shouldn't.  Try 2 or 3 or thereabouts.
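
For a quick session-local test, something along these lines (the value is per
the suggestion above; the query is the one from upthread):

SET random_page_cost = 3;
EXPLAIN ANALYZE SELECT * FROM viwassoclist WHERE clientnum = 'SAKS';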

regards, tom lane



Re: [PERFORM] Any way to speed this up?

2005-04-07 Thread John Arbash Meinel
Joel Fradkin wrote:
shared_buffers = 8000          # min 16, at least max_connections*2, 8KB each
work_mem = 8192 #1024          # min 64, size in KB
max_fsm_pages = 3              # min max_fsm_relations*16, 6 bytes each
effective_cache_size = 4 #1000 # typically 8KB each
random_page_cost = 1.2 #4      # units are one sequential page fetch cost
These are the items I changed.
In the development box I turned random_page_cost down to .2 because I figured
it would all be faster using an index, as all my data is at a minimum being
selected by clientnum.

Your random_page_cost is *way* too low. I would probably change this
to no less than 2.0.
But the EXPLAIN ANALYZE I sent in is from these settings above on a Windows
box. If I run the analyze (pgAdmin) on a Windows box but connect to a Linux
box, would the times be accurate, or do I have to run the analyze on the
Linux box for that to happen?

EXPLAIN ANALYZE is done on the server side, so it doesn't matter what
you use to connect to it. The \timing flag occurs on the local side, and
is thus influenced by network latency (but it only tells you the time
for the whole query anyway).
I am a little unclear why I would need an index on associate by location as
I thought it would be using indexes in location and jobtitle for their
joins.
I did not say where locationid = x in my query on the view.
I have so much to learn about SQL.
Joel
CREATE OR REPLACE VIEW viwassoclist AS
 SELECT a.clientnum, a.associateid, a.associatenum, a.lastname,
a.firstname, jt.value AS jobtitle, l.name AS location, l.locationid
AS mainlocationid, l.divisionid, l.regionid, l.districtid,
(a.lastname::text || ', '::text) || a.firstname::text AS assocname,
a.isactive, a.isdeleted
   FROM tblassociate a
   LEFT JOIN tbljobtitle jt ON a.jobtitleid = jt.id AND
jt.clientnum::text = a.clientnum::text AND 1 = jt.presentationid
   JOIN tbllocation l ON a.locationid = l.locationid AND
l.clientnum::text = a.clientnum::text;
  ^^^
The locationid is defined in your view. This is the part that postgres
uses to merge all of the different tables together; it doesn't really
matter whether you restrict it with a WHERE clause.
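
If an index were to help the merge on this key, it would presumably need to
cover it, e.g. something like this (illustrative only; whether the planner
would actually use it is another question):

CREATE INDEX ix_tblassociate_client_loc ON tblassociate (clientnum, locationid);
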
Try just setting your random page cost back to something more
reasonable, and try again.
John
=:->




Re: [PERFORM] Any way to speed this up?

2005-04-07 Thread Joel Fradkin
Here is the result after putting it back to 4, the original value (I had done
that prior to your suggestion of using 2 or 3) to see what might change.
I also vacuumed, and thought I saw records deleted in associate, which I
found odd as this is a test site and no new records were added or deleted.

Merge Join  (cost=86788.09..87945.00 rows=10387 width=112) (actual
time=19703.000..21154.000 rows=159959 loops=1)
  Merge Cond: ("outer".locationid = "inner".locationid)
  ->  Sort  (cost=1245.50..1246.33 rows=332 width=48) (actual
time=62.000..62.000 rows=441 loops=1)
        Sort Key: l.locationid
        ->  Index Scan using ix_location on tbllocation l
(cost=0.00..1231.60 rows=332 width=48) (actual time=15.000..62.000 rows=441
loops=1)
              Index Cond: ('SAKS'::text = (clientnum)::text)
  ->  Sort  (cost=85542.59..86042.39 rows=199922 width=75) (actual
time=19641.000..19955.000 rows=159960 loops=1)
        Sort Key: a.locationid
        ->  Merge Right Join  (cost=60850.40..62453.22 rows=199922
width=75) (actual time=13500.000..14734.000 rows=176431 loops=1)
              Merge Cond: (("outer".id = "inner".jobtitleid) AND
("outer"."?column4?" = "inner"."?column10?"))
              ->  Sort  (cost=554.11..570.13 rows=6409 width=37) (actual
time=94.000..94.000 rows=6391 loops=1)
                    Sort Key: jt.id, (jt.clientnum)::text
                    ->  Seq Scan on tbljobtitle jt  (cost=0.00..148.88
rows=6409 width=37) (actual time=0.000..63.000 rows=6391 loops=1)
                          Filter: (1 = presentationid)
              ->  Sort  (cost=60296.29..60796.09 rows=199922 width=53)
(actual time=13406.000..13859.000 rows=176431 loops=1)
                    Sort Key: a.jobtitleid, (a.clientnum)::text
                    ->  Seq Scan on tblassociate a  (cost=0.00..38388.79
rows=199922 width=53) (actual time=62.000..10589.000 rows=176431 loops=1)
                          Filter: ((clientnum)::text = 'SAKS'::text)
Total runtime: 22843.000 ms

Joel Fradkin
 
-----Original Message-----
From: Tom Lane [mailto:[EMAIL PROTECTED] 
Sent: Thursday, April 07, 2005 11:43 AM
To: Joel Fradkin
Cc: 'PostgreSQL Perform'
Subject: Re: [PERFORM] Any way to speed this up? 

Joel Fradkin [EMAIL PROTECTED] writes:
 random_page_cost = 1.2 #4     # units are one sequential page fetch cost

That is almost certainly overoptimistic; it's causing the planner to
use indexscans when it shouldn't.  Try 2 or 3 or thereabouts.

regards, tom lane




Re: [PERFORM] Any way to speed this up?

2005-04-07 Thread John Arbash Meinel
Joel Fradkin wrote:
Here is the result after putting it back to 4, the original value (I had done
that prior to your suggestion of using 2 or 3) to see what might change.
I also vacuumed, and thought I saw records deleted in associate, which I
found odd as this is a test site and no new records were added or deleted.

Well, that looks 2x as fast, right?
You might try
SET enable_mergejoin TO off;
just to see if you can force a hash join and see how long that takes.
You might also try increasing work_mem.
You can do that just in the current session with
SET work_mem TO <value>;
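
For example, in one session (the work_mem value is illustrative only; in 8.0
it is specified in KB):

SET enable_mergejoin TO off;
SET work_mem TO 65536;  -- 64MB, this session only
EXPLAIN ANALYZE SELECT * FROM viwassoclist WHERE clientnum = 'SAKS';
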
John
=:->

