Re: [PERFORM] How to optimize monstrous query, sorts instead of

2003-07-02 Thread Michael Mattox
 Try this:

Rod, you improved my query last week (thank you very much), but now my
performance is getting worse and I'm not sure why.  I think I know what
happened: when I did my load testing I created data that all had the same
date, so sorting on the date was very fast.  But now that I've been running
the system for a few weeks, I've got a range of dates and the sort is very
costly.  I'm curious whether it's possible to optimize this with an index?
I've tried creating some indexes but they're never used.

explain analyze SELECT mx.idx , ms.averageconnecttimex AS
ms_averageconnecttime , ms.averagedurationx AS ms_averageduration , ms.datex
AS ms_date , ms.idx AS ms_id , ms.statusstringx AS ms_statusstring ,
ms.statusx AS ms_status , msi.actualcontentx AS msi_actualcontent ,
msi.connecttimex AS msi_connecttime , msi.correctcontentx AS
msi_correctcontent , msi.datex AS msi_date , msi.descriptionx AS
msi_description , msi.durationx AS msi_duration , msi.errorcontentx AS
msi_errorcontent , msi.idx AS msi_id , msi.monitorlocationx AS
msi_monitorlocation , msi.statusstringx AS msi_statusstring , msi.statusx AS
msi_status FROM monitorstatusx AS ms , monitorstatusitemx AS msi , monitorx
AS mx , monitorstatus_statusitemsx AS mssisx ,
monitorstatusitemlistd8ea58a5x AS litem WHERE ms.jdoidx = mssisx.jdoidx AND
mssisx.statusitemsx = litem.jdoidx AND litem.statusitemlistx = msi.jdoidx
AND mx.jdoidx = ms.monitorx AND ms.datex BETWEEN '2003-07-01
00:00:00.00+01' AND '2003-07-01 23:59:59.00+01' AND mx.idx =
'M-TEST_150-TEST_01_10560776551771895174239' ORDER BY ms.datex DESC;


                                 QUERY PLAN
---------------------------------------------------------------------------
 Sort  (cost=6882.84..6883.08 rows=97 width=827) (actual time=16712.46..16712.65 rows=225 loops=1)
   Sort Key: ms.datex
   ->  Nested Loop  (cost=0.00..6879.66 rows=97 width=827) (actual time=4413.12..16711.62 rows=225 loops=1)
         ->  Nested Loop  (cost=0.00..6587.53 rows=97 width=162) (actual time=4406.06..15941.16 rows=225 loops=1)
               ->  Nested Loop  (cost=0.00..6295.38 rows=97 width=146) (actual time=4383.59..15424.96 rows=225 loops=1)
                     ->  Nested Loop  (cost=0.00..6003.22 rows=97 width=130) (actual time=4383.53..14938.02 rows=225 loops=1)
                           ->  Index Scan using monitorx_id_index on monitorx mx  (cost=0.00..5.01 rows=1 width=46) (actual time=0.13..0.21 rows=1 loops=1)
                                 Index Cond: (idx = 'M-TEST_150-TEST_01_10560776551771895174239'::character varying)
                           ->  Index Scan using monitorstatusxmonitori on monitorstatusx ms  (cost=0.00..5996.18 rows=163 width=84) (actual time=4383.38..14936.39 rows=225 loops=1)
                                 Index Cond: ("outer".jdoidx = ms.monitorx)
                                 Filter: ((datex >= '2003-07-01 00:00:00'::timestamp without time zone) AND (datex <= '2003-07-01 23:59:59'::timestamp without time zone))
                     ->  Index Scan using monitorstatus_stjdoidb742c9b3i on monitorstatus_statusitemsx mssisx  (cost=0.00..3.01 rows=1 width=16) (actual time=2.15..2.15 rows=1 loops=225)
                           Index Cond: ("outer".jdoidx = mssisx.jdoidx)
               ->  Index Scan using monitorstatusitejdoid7db0befci on monitorstatusitemlistd8ea58a5x litem  (cost=0.00..3.01 rows=1 width=16) (actual time=2.28..2.28 rows=1 loops=225)
                     Index Cond: ("outer".statusitemsx = litem.jdoidx)
         ->  Index Scan using monitorstatusitemx_pkey on monitorstatusitemx msi  (cost=0.00..3.01 rows=1 width=665) (actual time=3.41..3.41 rows=1 loops=225)
               Index Cond: ("outer".statusitemlistx = msi.jdoidx)
 Total runtime: 16713.25 msec
(18 rows)

As you can see it takes almost 17 seconds to return only 225 rows.  The
monitorstatusx table has over 7 million rows, and for each monitor status
there's one row in each of the monitorstatusitemx and the join tables.  So I
think the size of the database is just too high for this sort.  I run my
reports offline, but what I'm finding is that at 16 seconds per report, the
reports aren't finished by morning.  My postgresql.conf is attached in case
I have it configured incorrectly.

Thanks,
Michael



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] How to optimize monstrous query, sorts instead of

2003-07-02 Thread Rod Taylor
On Wed, 2003-07-02 at 10:24, Michael Mattox wrote:
  Try this:
 
 Rod, you improved my query last week (thank you very much), but now my
 performance is getting worse and I'm not sure why.  I think I know what
 happened: when I did my load testing I created data that all had the same
 date, so sorting on the date was very fast.  But now that I've been running
 the system for a few weeks, I've got a range of dates and the sort is very
 costly.  I'm curious whether it's possible to optimize this with an index?
 I've tried creating some indexes but they're never used.

Standard questions: did you VACUUM? Regularly?  Want to try again and
send us the output from VACUUM VERBOSE?

Sounds like you created a ton of test data, then removed a bunch?  Did
you REINDEX that table?
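Those two checks can be run directly from psql; a minimal sketch using the
table name from the thread (note that on 7.3, REINDEX takes an exclusive
lock on the table while it runs):

```sql
-- Reclaim dead tuples and refresh the planner's statistics:
VACUUM VERBOSE ANALYZE monitorstatusx;

-- Rebuild indexes that may be bloated after mass deletes of test data:
REINDEX TABLE monitorstatusx;
```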

During normal use, what is your query spread like?  Mostly selects with
some inserts?  Any updates or deletes?  How often do updates or deletes
come in, and how many rows do they affect?

 ->  Index Scan using monitorstatusxmonitori on monitorstatusx ms
     (cost=0.00..5996.18 rows=163 width=84) (actual time=4383.38..14936.39 rows=225 loops=1)
       Index Cond: ("outer".jdoidx = ms.monitorx)
       Filter: ((datex >= '2003-07-01 00:00:00'::timestamp without time zone) AND
                (datex <= '2003-07-01 23:59:59'::timestamp without time zone))

The above index scan is taking the vast majority of the time (nearly 15
seconds of the 16 second total -- stop thinking about sorts!).  What
happened to the index on monitorx and datex?
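The index Rod is asking about would presumably look something like the
following sketch (the index name is hypothetical; the column pair matches
the join condition and the date filter visible in the plan):

```sql
-- Lets the planner satisfy both the join (monitorx) and the
-- BETWEEN filter (datex) from a single index:
CREATE INDEX monitorstatusx_monitor_date_idx
    ON monitorstatusx (monitorx, datex);
```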

-- 
Rod Taylor [EMAIL PROTECTED]

PGP Key: http://www.rbt.ca/rbtpub.asc




Re: [PERFORM] How to optimize monstrous query, sorts instead of

2003-07-02 Thread Rod Taylor
On Wed, 2003-07-02 at 09:46, Michael Mattox wrote:
  Shared buffers is probably too high.  How much memory is in this machine?
  Is there anything else running aside from PostgreSQL?  What does top say
  about cached / buffered data (numbers)?
 
 I was using the 25% of RAM guideline posted recently.  The machine has
 1.5gig but it also has a couple other java applications running on it
 including tomcat.

Seems you have a ton of cached data, so it's probably fine.  Only way to
really find out is a benchmark of sorts.

  I see you reduced the random_page_cost to 1.5.  Why did you do this (how
  is your disk subsystem configured)?
 
 Someone suggested I lower it to 1.5 or 1.0, not sure what the reasoning was.
 The disks are both SCSI 10,000 RPM.  My data directory is on one disk by

I'd be tempted to bump it up to 2.0 or 2.5 since data is on a single
disk (sequential scans *will* be faster than an index scan), but you
would need to run a benchmark on your disk to see if that is right.

This setting should not affect this query, however, unless you change the
value by a few orders of magnitude.

 there's now 7693057 rows in monitorstatusx and monitorstatusitemx as well as
 the necessary rows for the join tables.

Looks to me like 50% of the database should be in memory already if
Linux is caching the right stuff.

  During normal use, what is your query spread like?  Mostly selects with
  some inserts?  Any updates or deletes?  How often do updates or deletes
  come in, and how many rows do they affect?
 
 There is a query on monitorx by datex every 10 seconds (monitors are updated
 every 5 minutes, so every 10 seconds I get the monitors that are due for an
 update).  Each monitor is then saved with its status field modified, and a
 new status item is inserted.  This happens every 5 minutes.  There are 

Hence the vacuum every 5 minutes.  Sounds good.

Every monitor is updated every 5 minutes?

 Before, I guess, the index with monitorx,datex didn't do much because all
 the data had the same date.  But now that I have over 2 weeks of real data,
 it makes a difference.

Yeah, the more the date diverges, the more useful that index will be.

Your reports: are they always on the most recent date, or do they vary
in time?  Are queries distributed against old data as much as the new
stuff?

If not (the examples you've shown have been recent) you might try a
partial index on datex, monitorx and a single index on monitorx.

CREATE INDEX <index name> ON <table> (datex, monitorx) WHERE datex >= '2003-07-01';

After data from the 1st is no longer useful (being queried frequently),
drop that index and recreate it WHERE datex >= <relevant date>.

This will make a significant improvement but will sacrifice the
performance on queries for older timeframes.

Since you're doing frequent updates, don't forget to drop any non-useful
indexes. Updating indexes significantly lengthens the time taken for an
update to occur.
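Putting Rod's suggestion together, the rolling cycle might look like this
sketch (index names hypothetical, dates illustrative):

```sql
-- Partial index covering only the hot, recent rows:
CREATE INDEX ms_recent_idx ON monitorstatusx (datex, monitorx)
    WHERE datex >= '2003-07-01';

-- Plain index on the join column alone, for older-range queries:
CREATE INDEX ms_monitor_idx ON monitorstatusx (monitorx);

-- Later, once data from July 1st has gone cold, roll the window forward:
DROP INDEX ms_recent_idx;
CREATE INDEX ms_recent_idx ON monitorstatusx (datex, monitorx)
    WHERE datex >= '2003-07-08';
```

A partial index is only considered when the query's WHERE clause implies the
index predicate, so the reporting queries must filter on a datex range that
falls inside the indexed window.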

-- 
Rod Taylor [EMAIL PROTECTED]

PGP Key: http://www.rbt.ca/rbtpub.asc




Re: [PERFORM] How to optimize monstrous query, sorts instead of

2003-07-02 Thread Michael Mattox
 I'd be tempted to bump it up to 2.0 or 2.5 since data is on a single
 disk (sequential scans *will* be faster than an index scan), but you
 would need to run a benchmark on your disk to see if that is right.

I just set it to 2.5.  What kind of benchmark can I run?
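Absent a dedicated disk benchmark, one crude in-psql comparison is to time
the same scan both ways (a sketch; the enable_* settings are per-session
planner toggles, not something to leave set in production):

```sql
-- Time the planner's preferred plan:
EXPLAIN ANALYZE SELECT count(*) FROM monitorstatusx
 WHERE datex BETWEEN '2003-07-01' AND '2003-07-02';

-- Force the alternative access method and compare the actual times:
SET enable_indexscan = off;
EXPLAIN ANALYZE SELECT count(*) FROM monitorstatusx
 WHERE datex BETWEEN '2003-07-01' AND '2003-07-02';
SET enable_indexscan = on;
```

Comparing the two actual runtimes gives a rough feel for how expensive
random page fetches are relative to sequential ones on this disk.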

 Every monitor is updated every 5 minutes?

Yes.  Actually the customer can choose between every 5-60 minutes, but for a
worst-case scenario I set them to be 5 minutes for stress testing.  Note
that all 1,500 monitors are not updated at the same time; they're evenly
distributed.

 Your reports. Are they always on the most recent date, or do they vary
 in time.  Are queries distributed against old data as much as the new
 stuff?

So far my plan is to have only the current month of data in the database, so
at the first of every month I'll drop the old data.  There are two sources
of this monstrous query:

the webapp lets users make the query for the current day.
the reporting app makes the query for the prior day, the current week, and
month to date.

Between the two the queries are pretty evenly distributed.




---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] How to optimize monstrous query, sorts instead of

2003-06-25 Thread Michael Mattox
With a slight correction (you had both m and mx, so I changed them all to
mx; I hope this is what you intended) this query works.  It's exactly the
same speed, but it doesn't give me the warnings I was getting:

NOTICE:  Adding missing FROM-clause entry for table monitorx
NOTICE:  Adding missing FROM-clause entry for table
monitorstatus_statusitemsx
NOTICE:  Adding missing FROM-clause entry for table
monitorstatusitemlistd8ea58a5x

I never knew what those were from, I even searched Google trying to find out
and I couldn't understand it so I gave up.  Thanks for pointing this out for
me, and thanks for fixing my query.

Michael


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] Behalf Of Rod Taylor
 Sent: Wednesday, June 25, 2003 4:28 PM
 To: [EMAIL PROTECTED]
 Cc: Postgresql Performance
 Subject: Re: [PERFORM] How to optimize monstrous query, sorts instead of


  I didn't notice that before, thanks for pointing that out.  I just tried
  adding monitorx.idx to the select and it ended up making my query take
  several minutes long.  Any ideas how I can fix this and keep my
 performance?

 By using it aliased and non-aliased (2 different references to the same
 table) you've caused it to join itself.

 Try this:

 SELECT m.idx
  , ms.averageconnecttimex AS ms_averageconnecttime
  , ms.averagedurationx AS ms_averageduration
  , ms.datex AS ms_date
  , ms.idx AS ms_id
  , ms.statusstringx AS ms_statusstring
  , ms.statusx AS ms_status
  , msi.actualcontentx AS msi_actualcontent
  , msi.connecttimex AS msi_connecttime
  , msi.correctcontentx AS msi_correctcontent
  , msi.datex AS msi_date
  , msi.descriptionx AS msi_description
  , msi.durationx AS msi_duration
  , msi.errorcontentx AS msi_errorcontent
  , msi.idx AS msi_id
  , msi.monitorlocationx AS msi_monitorlocation
  , msi.statusstringx AS msi_statusstring
  , msi.statusx AS msi_status

   FROM monitorstatusx AS ms
  , monitorstatusitemx AS msi

  , monitorx AS mx
  , monitorstatus_statusitemsx AS mssisx
  , monitorstatusitemlistd8ea58a5x AS litem

  WHERE ms.jdoidx = mssisx.jdoidx
AND mssisx.statusitemsx = litem.jdoidx
AND litem.statusitemlistx = msi.jdoidx
AND mx.jdoidx = ms.monitorx
AND ms.datex BETWEEN '2003-06-20 08:57:21.36'
 AND '2003-06-29 08:57:21.36'
AND m.idx = 'M-TEST_1444-TEST_00_10560561260561463219352'

 ORDER BY ms.datex DESC;

 --
 Rod Taylor [EMAIL PROTECTED]

 PGP Key: http://www.rbt.ca/rbtpub.asc




---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] How to optimize monstrous query, sorts instead of using index

2003-06-25 Thread Tom Lane
Michael Mattox [EMAIL PROTECTED] writes:
 It's much slower, but I appreciate you taking the time to try.  I'm pretty
 new to SQL so I must admit this query is very confusing for me.  I'm using
 Java Data Objects (JDO, an O/R mapping framework) but the implementation I'm
 using (Kodo) isn't smart enough to do all the joins efficiently, which is
 why I had to rewrite this query by hand.

It wasn't till I read that :-( that I noticed that you were doing nested
left joins.  Fooling with the join order may be your best route to a
solution --- have you read
http://www.postgresql.org/docs/view.php?version=7.3&idoc=0&file=explicit-joins.html

regards, tom lane
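Since PostgreSQL 7.3 follows the textual order of explicit JOIN syntax, one
way to experiment with the join order is a rewrite along these lines (a
sketch reusing the thread's tables, with the column list trimmed for
brevity):

```sql
-- Explicit JOINs pin the join order: the selectively-filtered mx and ms
-- are joined first, then the narrow link tables, then the wide msi last.
SELECT mx.idx, ms.datex, msi.statusx
  FROM monitorx AS mx
  JOIN monitorstatusx AS ms ON ms.monitorx = mx.jdoidx
  JOIN monitorstatus_statusitemsx AS mssisx ON mssisx.jdoidx = ms.jdoidx
  JOIN monitorstatusitemlistd8ea58a5x AS litem
       ON litem.jdoidx = mssisx.statusitemsx
  JOIN monitorstatusitemx AS msi ON msi.jdoidx = litem.statusitemlistx
 WHERE mx.idx = 'M-TEST_150-TEST_01_10560776551771895174239'
   AND ms.datex BETWEEN '2003-07-01 00:00:00' AND '2003-07-01 23:59:59'
 ORDER BY ms.datex DESC;
```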

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match