Re: [PERFORM] How to optimize monstrous query, sorts instead of
Rod, you improved my query last week (thank you very much), but I'm not sure why my performance is getting worse. I think I know what happened: when I did my load testing I created data that all had the same date, so sorting on the date was very fast. But now that I've been running the system for a few weeks I've got a range of dates, and the sort is very costly. I'm curious whether it's possible to optimize this with an index. I've tried creating some indexes but they're never used.

explain analyze
SELECT mx.idx
     , ms.averageconnecttimex AS ms_averageconnecttime
     , ms.averagedurationx AS ms_averageduration
     , ms.datex AS ms_date
     , ms.idx AS ms_id
     , ms.statusstringx AS ms_statusstring
     , ms.statusx AS ms_status
     , msi.actualcontentx AS msi_actualcontent
     , msi.connecttimex AS msi_connecttime
     , msi.correctcontentx AS msi_correctcontent
     , msi.datex AS msi_date
     , msi.descriptionx AS msi_description
     , msi.durationx AS msi_duration
     , msi.errorcontentx AS msi_errorcontent
     , msi.idx AS msi_id
     , msi.monitorlocationx AS msi_monitorlocation
     , msi.statusstringx AS msi_statusstring
     , msi.statusx AS msi_status
  FROM monitorstatusx AS ms
     , monitorstatusitemx AS msi
     , monitorx AS mx
     , monitorstatus_statusitemsx AS mssisx
     , monitorstatusitemlistd8ea58a5x AS litem
 WHERE ms.jdoidx = mssisx.jdoidx
   AND mssisx.statusitemsx = litem.jdoidx
   AND litem.statusitemlistx = msi.jdoidx
   AND mx.jdoidx = ms.monitorx
   AND ms.datex BETWEEN '2003-07-01 00:00:00.00+01'
                    AND '2003-07-01 23:59:59.00+01'
   AND mx.idx = 'M-TEST_150-TEST_01_10560776551771895174239'
 ORDER BY ms.datex DESC;

                                 QUERY PLAN
-----------------------------------------------------------------------------
 Sort  (cost=6882.84..6883.08 rows=97 width=827) (actual time=16712.46..16712.65 rows=225 loops=1)
   Sort Key: ms.datex
   ->  Nested Loop  (cost=0.00..6879.66 rows=97 width=827) (actual time=4413.12..16711.62 rows=225 loops=1)
         ->  Nested Loop  (cost=0.00..6587.53 rows=97 width=162) (actual time=4406.06..15941.16 rows=225 loops=1)
               ->  Nested Loop  (cost=0.00..6295.38 rows=97 width=146) (actual time=4383.59..15424.96 rows=225 loops=1)
                     ->  Nested Loop  (cost=0.00..6003.22 rows=97 width=130) (actual time=4383.53..14938.02 rows=225 loops=1)
                           ->  Index Scan using monitorx_id_index on monitorx mx  (cost=0.00..5.01 rows=1 width=46) (actual time=0.13..0.21 rows=1 loops=1)
                                 Index Cond: (idx = 'M-TEST_150-TEST_01_10560776551771895174239'::character varying)
                           ->  Index Scan using monitorstatusxmonitori on monitorstatusx ms  (cost=0.00..5996.18 rows=163 width=84) (actual time=4383.38..14936.39 rows=225 loops=1)
                                 Index Cond: ("outer".jdoidx = ms.monitorx)
                                 Filter: ((datex >= '2003-07-01 00:00:00'::timestamp without time zone) AND (datex <= '2003-07-01 23:59:59'::timestamp without time zone))
                     ->  Index Scan using monitorstatus_stjdoidb742c9b3i on monitorstatus_statusitemsx mssisx  (cost=0.00..3.01 rows=1 width=16) (actual time=2.15..2.15 rows=1 loops=225)
                           Index Cond: ("outer".jdoidx = mssisx.jdoidx)
               ->  Index Scan using monitorstatusitejdoid7db0befci on monitorstatusitemlistd8ea58a5x litem  (cost=0.00..3.01 rows=1 width=16) (actual time=2.28..2.28 rows=1 loops=225)
                     Index Cond: ("outer".statusitemsx = litem.jdoidx)
         ->  Index Scan using monitorstatusitemx_pkey on monitorstatusitemx msi  (cost=0.00..3.01 rows=1 width=665) (actual time=3.41..3.41 rows=1 loops=225)
               Index Cond: ("outer".statusitemlistx = msi.jdoidx)
 Total runtime: 16713.25 msec
(18 rows)

As you can see, it takes 16 seconds to return only 225 rows. The monitorstatusx table has over 7 million rows, and for each monitor status there's one row in each of the monitorstatusitemx and join tables. So I think the size of the database is just too high for this sort. I run my reports offline, but what I'm finding is that at 16 seconds per report, the reports aren't finished by morning. My postgresql.conf is attached in case I have it configured incorrectly.

Thanks,
Michael

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?
       http://www.postgresql.org/docs/faqs/FAQ.html
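[Editor's note: a sketch of the kind of index the thread converges on, using the table and column names from the query above; the index name is illustrative. A composite index whose leading columns match both the join key and the date filter lets the planner apply the `BETWEEN` as an index condition instead of a per-row filter:]

```sql
-- Hypothetical composite index: join column first, then the timestamp,
-- so one monitor's rows can be range-scanned by date directly.
CREATE INDEX monitorstatusx_monitorx_datex
    ON monitorstatusx (monitorx, datex);

-- Refresh planner statistics so the new index is costed realistically.
ANALYZE monitorstatusx;
```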
Re: [PERFORM] How to optimize monstrous query, sorts instead of
On Wed, 2003-07-02 at 10:24, Michael Mattox wrote:
> Rod, you improved my query last week (thank you very much) but I'm not
> sure why my performance is getting worse. I think I know what happened:
> when I did my load testing I created data that all had the same date,
> so sorting on the date was very fast. But now that I've been running
> the system for a few weeks I've got a range of dates and now the sort
> is very costly. I'm curious if it's possible to optimize this with an
> index? I've tried creating some indexes but they're never used.

Standard questions: did you VACUUM? Regularly? Want to try again and send us the output from VACUUM VERBOSE?

Sounds like you created a ton of test data, then removed a bunch. Did you REINDEX that table?

During normal use, what is your query spread like? Mostly selects with some inserts? Any updates or deletes? How often do updates or deletes come in, and how many rows do they affect?

>   ->  Index Scan using monitorstatusxmonitori on monitorstatusx ms
>       (cost=0.00..5996.18 rows=163 width=84)
>       (actual time=4383.38..14936.39 rows=225 loops=1)
>         Index Cond: ("outer".jdoidx = ms.monitorx)
>         Filter: ((datex >= '2003-07-01 00:00:00'::timestamp without time zone)
>             AND (datex <= '2003-07-01 23:59:59'::timestamp without time zone))

The above index scan is taking the vast majority of the time (nearly 15 seconds of the 16 second total -- stop thinking about sorts!). What happened to the index on monitorx and datex?

-- 
Rod Taylor [EMAIL PROTECTED]
PGP Key: http://www.rbt.ca/rbtpub.asc

signature.asc
Description: This is a digitally signed message part
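[Editor's note: the maintenance steps Rod is asking about, sketched with the table name from the thread. In 7.3-era PostgreSQL, plain VACUUM reclaims dead tuples, ANALYZE refreshes planner statistics, and REINDEX rebuilds indexes bloated by mass deletes:]

```sql
-- Reclaim dead tuples and refresh statistics, with per-table detail:
VACUUM VERBOSE ANALYZE monitorstatusx;

-- Rebuild indexes bloated by large deletes
-- (note: REINDEX takes an exclusive lock on the table in this era):
REINDEX TABLE monitorstatusx;
```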
Re: [PERFORM] How to optimize monstrous query, sorts instead of
On Wed, 2003-07-02 at 09:46, Michael Mattox wrote:
> > Shared buffers is probably too high. How much memory in this machine?
> > Is there anything else running aside from PostgreSQL? What does top
> > say about cached / buffered data (number)?
>
> I was using the 25% of RAM guideline posted recently. The machine has
> 1.5 gig but it also has a couple other Java applications running on it,
> including Tomcat.

Seems you have a ton of cached data, so it's probably fine. Only way to really find out is a benchmark of sorts.

> > I see you reduced the random_page_cost to 1.5. Why did you do this
> > (how is your disk subsystem configured)?
>
> Someone suggested I lower it to 1.5 or 1.0, not sure what the reasoning
> was. The disks are both SCSI 10,000 RPM. My data directory is on one
> disk by itself.

I'd be tempted to bump it up to 2.0 or 2.5 since data is on a single disk (sequential scans *will* be faster than an index scan), but you would need to run a benchmark on your disk to see if that is right. This setting should not affect this query, however, unless you change the value by a few orders of magnitude.

> There's now 7693057 rows in monitorstatusx and monitorstatusitemx, as
> well as the necessary rows for the join tables.

Looks to me like 50% of the database should be in memory already, if Linux is caching the right stuff.

> > During normal use, what is your query spread like? Mostly selects
> > with some inserts? Any updates or deletes? How often do updates or
> > deletes come in, and how many rows do they affect?
>
> There is a query on monitorx by datex every 10 seconds (monitors are
> updated every 5 minutes, so every 10 seconds I get the monitors that
> are due for an update). Each monitor is then saved with its status
> field modified, and a new status item is inserted. This happens every
> 5 minutes. Hence the vacuum every 5 minutes.

Sounds good. Every monitor is updated every 5 minutes?

> Before, I guess the index with monitorx, datex didn't do much because
> all the data had the same date. But now that I have over 2 weeks of
> real data, it makes a difference.

Yeah, the more the dates diverge, the more useful that index will be.

Your reports: are they always on the most recent date, or do they vary in time? Are queries distributed against old data as much as the new stuff? If not (the examples you've shown have been recent) you might try a partial index on (datex, monitorx) and a single index on monitorx.

CREATE INDEX monitorstatusx_recent
    ON monitorstatusx (datex, monitorx)
 WHERE datex >= '2003-07-01';

After data from the 1st is no longer useful (being queried frequently), drop that index and recreate it with WHERE datex >= <relevant date here>. This will make a significant improvement, but will sacrifice performance on queries for older timeframes.

Since you're doing frequent updates, don't forget to drop any non-useful indexes. Updating indexes significantly lengthens the time taken for an update to occur.

-- 
Rod Taylor [EMAIL PROTECTED]
PGP Key: http://www.rbt.ca/rbtpub.asc

signature.asc
Description: This is a digitally signed message part
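[Editor's note: one subtlety of Rod's partial-index suggestion, sketched with illustrative dates. The planner uses a partial index only when the query's WHERE clause provably implies the index's predicate:]

```sql
-- Given a partial index ... WHERE datex >= '2003-07-01':

-- This range implies datex >= '2003-07-01', so the partial index applies.
SELECT count(*) FROM monitorstatusx
 WHERE datex BETWEEN '2003-07-01 00:00:00' AND '2003-07-01 23:59:59';

-- This range starts before the index predicate, so the partial index
-- cannot be used; the planner falls back to another index or a seq scan.
SELECT count(*) FROM monitorstatusx
 WHERE datex BETWEEN '2003-06-01 00:00:00' AND '2003-06-07 23:59:59';
```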
Re: [PERFORM] How to optimize monstrous query, sorts instead of
> I'd be tempted to bump it up to 2.0 or 2.5 since data is on a single
> disk (sequential scans *will* be faster than an index scan), but you
> would need to run a benchmark on your disk to see if that is right.

I just set it to 2.5. What kind of benchmark can I run?

> Every monitor is updated every 5 minutes?

Yes. Actually the customer can choose between every 5-60 minutes, but for a worst case scenario I set them to be 5 minutes for stress testing. Note that all 1,500 monitors are not updated at the same time; they're evenly distributed.

> Your reports. Are they always on the most recent date, or do they vary
> in time? Are queries distributed against old data as much as the new
> stuff?

So far my plan is to have only the current month of data in the database, so at the first of every month I'll drop the old data. There are two sources of this monstrous query: the webapp lets users make the query for the current day, and the reporting app makes the query for the prior day, the current week, and month to date. Between the two, the queries are pretty evenly distributed.

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend
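[Editor's note: one rough answer to "what kind of benchmark can I run?", sketched against the thread's table. Time the same query under a forced sequential scan and a forced index scan, then pick the random_page_cost at which the planner's preferred plan matches the actually faster one. The SET commands are per-session, so nothing persists:]

```sql
-- Plan A: index scan (sequential scans disabled for this session).
SET enable_seqscan = off;
EXPLAIN ANALYZE
SELECT count(*) FROM monitorstatusx
 WHERE datex BETWEEN '2003-07-01 00:00:00' AND '2003-07-01 23:59:59';

-- Plan B: sequential scan (index scans disabled).
RESET enable_seqscan;
SET enable_indexscan = off;
EXPLAIN ANALYZE
SELECT count(*) FROM monitorstatusx
 WHERE datex BETWEEN '2003-07-01 00:00:00' AND '2003-07-01 23:59:59';

RESET enable_indexscan;
-- Try candidate values and see where the planner's choice flips:
SET random_page_cost = 2.5;
```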
Re: [PERFORM] How to optimize monstrous query, sorts instead of
With a slight correction (you had both m and mx, so I changed them to be all mx -- I hope this is what you intended) this query works. It's exactly the same speed, but it doesn't give me the warnings I was getting:

NOTICE:  Adding missing FROM-clause entry for table "monitorx"
NOTICE:  Adding missing FROM-clause entry for table "monitorstatus_statusitemsx"
NOTICE:  Adding missing FROM-clause entry for table "monitorstatusitemlistd8ea58a5x"

I never knew what those were from. I even searched Google trying to find out, and I couldn't understand it, so I gave up. Thanks for pointing this out for me, and thanks for fixing my query.

Michael

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Rod Taylor
Sent: Wednesday, June 25, 2003 4:28 PM
To: [EMAIL PROTECTED]
Cc: Postgresql Performance
Subject: Re: [PERFORM] How to optimize monstrous query, sorts instead of

> I didn't notice that before, thanks for pointing that out. I just
> tried adding monitorx.idx to the select and it ended up making my
> query take several minutes long. Any ideas how I can fix this and keep
> my performance?

By using it aliased and non-aliased (2 different references to the same table) you've caused it to join itself.

Try this:

SELECT m.idx
     , ms.averageconnecttimex AS ms_averageconnecttime
     , ms.averagedurationx AS ms_averageduration
     , ms.datex AS ms_date
     , ms.idx AS ms_id
     , ms.statusstringx AS ms_statusstring
     , ms.statusx AS ms_status
     , msi.actualcontentx AS msi_actualcontent
     , msi.connecttimex AS msi_connecttime
     , msi.correctcontentx AS msi_correctcontent
     , msi.datex AS msi_date
     , msi.descriptionx AS msi_description
     , msi.durationx AS msi_duration
     , msi.errorcontentx AS msi_errorcontent
     , msi.idx AS msi_id
     , msi.monitorlocationx AS msi_monitorlocation
     , msi.statusstringx AS msi_statusstring
     , msi.statusx AS msi_status
  FROM monitorstatusx AS ms
     , monitorstatusitemx AS msi
     , monitorx AS mx
     , monitorstatus_statusitemsx AS mssisx
     , monitorstatusitemlistd8ea58a5x AS litem
 WHERE ms.jdoidx = mssisx.jdoidx
   AND mssisx.statusitemsx = litem.jdoidx
   AND litem.statusitemlistx = msi.jdoidx
   AND mx.jdoidx = ms.monitorx
   AND ms.datex BETWEEN '2003-06-20 08:57:21.36'
                    AND '2003-06-29 08:57:21.36'
   AND m.idx = 'M-TEST_1444-TEST_00_10560561260561463219352'
 ORDER BY ms.datex DESC;

-- 
Rod Taylor [EMAIL PROTECTED]
PGP Key: http://www.rbt.ca/rbtpub.asc

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
       (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
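[Editor's note: the alias pitfall Michael hit can be reduced to a minimal example, using a hypothetical table t. Once a table is given an alias, a reference to its real name is a *different* FROM entry, and old PostgreSQL versions silently add it, producing an accidental self-join (cross join):]

```sql
-- 't' is aliased as 'a'; the bare 't.id' does NOT refer to 'a'.
-- Old PostgreSQL adds a second FROM entry and warns:
--   NOTICE: Adding missing FROM-clause entry for table "t"
SELECT t.id FROM t AS a WHERE a.id = 1;   -- accidental self-join, slow

-- Correct: use the alias consistently everywhere.
SELECT a.id FROM t AS a WHERE a.id = 1;
```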
Re: [PERFORM] How to optimize monstrous query, sorts instead of using index
Michael Mattox [EMAIL PROTECTED] writes:
> It's much slower, but I appreciate you taking the time to try. I'm
> pretty new to SQL, so I must admit this query is very confusing for
> me. I'm using Java Data Objects (JDO, an O/R mapping framework), but
> the implementation I'm using (Kodo) isn't smart enough to do all the
> joins efficiently, which is why I had to rewrite this query by hand.

It wasn't till I read that :-( that I noticed that you were doing nested left joins. Fooling with the join order may be your best route to a solution --- have you read
http://www.postgresql.org/docs/view.php?version=7.3&idoc=0&file=explicit-joins.html

			regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if
       your joining column's datatypes do not match
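[Editor's note: a sketch of the join-order control Tom is pointing at, using the thread's tables and aliases with a trimmed select list. In 7.3-era PostgreSQL, explicit JOIN syntax constrains the order in which the planner joins the tables, so the most selective restriction (the single monitor) can be forced to the front:]

```sql
-- Explicit JOINs: restrict to one monitor first, then walk outward
-- through the join tables to the status items.
SELECT ms.datex, ms.statusx, msi.statusstringx
  FROM monitorx AS mx
  JOIN monitorstatusx AS ms
       ON mx.jdoidx = ms.monitorx
  JOIN monitorstatus_statusitemsx AS mssisx
       ON ms.jdoidx = mssisx.jdoidx
  JOIN monitorstatusitemlistd8ea58a5x AS litem
       ON mssisx.statusitemsx = litem.jdoidx
  JOIN monitorstatusitemx AS msi
       ON litem.statusitemlistx = msi.jdoidx
 WHERE mx.idx = 'M-TEST_150-TEST_01_10560776551771895174239'
   AND ms.datex BETWEEN '2003-07-01 00:00:00' AND '2003-07-01 23:59:59'
 ORDER BY ms.datex DESC;
```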