Re: [PERFORM] What is the postgres sql command for last_user_id ???
On Tue, Mar 01, 2005 at 10:46:02PM +0700, [EMAIL PROTECTED] wrote: I would like to know whether there is any command which the server will give the record ID back to the client when client puts the data and the server generates an autoincrement ID for that record. See How do I get the value of a SERIAL insert? and the question immediately following it in the FAQ: http://www.postgresql.org/files/documentation/faqs/FAQ.html#4.11.2 -- Michael Fuhr http://www.fuhr.org/~mfuhr/ ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Help with tuning this query
Ken Egervari wrote: I've tried to use Dan Tow's tuning method Who? What? and created all the right indexes from his diagraming method, but the query still performs quite slow both inside the application and just inside pgadmin III. Can anyone be kind enough to help me tune it so that it performs better in postgres? I don't think it's using the right indexes, or maybe postgres needs special treatment. I've converted the below query to SQL from a Hibernate query, so the syntax is probably not perfect but it's semantics are exactly the same. I've done so by looking at the source code, but I can't run it to get the exact SQL since I don't have the database on my home machine. Hibernate is a java thing, no? It'd be helpful to have the actual SQL the hibernate class (or whatever) generates. One of the problems with SQL is that you can have multiple ways to get the same results and it's not always possible for the planner to convert from one to the other. Anyway, people will want to see EXPLAIN ANALYSE for the query in question. Obviously, make sure you've vacuumed and analysed the tables in question recently. Oh, and make sure yousay what version of PG you're running. select s.* from shipment s inner join carrier_code cc on s.carrier_code_id = cc.id inner join carrier c on cc.carrier_id = c.id inner join carrier_to_person ctp on ctp.carrier_id = c.id inner join person p on p.id = ctp.person_id inner join shipment_status cs on s.current_status_id = cs.id inner join release_code rc on cs.release_code_id = rc.id left join shipment_status ss on ss.shipment_id = s.id where p.id = :personId and s.is_purged = false and rc.number = '9' and cs is not null and cs.date = current_date - 31 order by cs.date desc 1. Why are you quoting the 9 when checking against rc.number? 2. The cs is not null doesn't appear to be qualified - which table? Just assume I have no indexes for the moment because while some of the indexes I made make it work faster, it's still around 250 milliseconds and under heavy load, the query performs very badly (6-7 seconds). 3. If you rewrite the current_date - 31 as a suitable ago(31) function then you can use an index on cs.date 4. Are you familiar with the configuration setting join_collapse_limit? -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Help with tuning this query
Richard Huxton wrote: Ken Egervari wrote: I've tried to use Dan Tow's tuning method Who? What? http://www.singingsql.com/ Dan has written some remarkable papers on sql tuning. Some of it is pretty complex, but his book SQL Tuning is an excellent resource. -- ___ This e-mail may be privileged and/or confidential, and the sender does not waive any related rights and obligations. Any distribution, use or copying of this e-mail or the information it contains by other than an intended recipient is unauthorized. If you received this e-mail in error, please advise me (by return e-mail or otherwise) immediately. ___ ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] Help with tuning this query
Bricklen Anderson wrote: Richard Huxton wrote: Ken Egervari wrote: I've tried to use Dan Tow's tuning method Who? What? http://www.singingsql.com/ That URL is invalid for me. -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Help with tuning this query
First, what version of postgres, and have you run VACUUM ANALYZE recently? Also, please attach the result of running EXPLAIN ANALYZE. (eg, explain analyze select s.* from shipment ...) I'm using postgres 8.0. I wish I could paste explain analyze, but I won't be at work for a few days. I was hoping some Postgres/SQL experts here would be able to simply look at the query and make recommendations because it's not a very difficult or unique query. It's very possible that you don't have up-to-date statistics, which causes postgres to make a bad estimate of what the fastest plan is. I run VACUUM ANALYZE religiously. I even dumped the production database and used it as my test database after a full vacuum analyze. It's really as fresh as it can be. I don't know how to make Hibernate do what you want, but if you change the query to using subselects (not all databases support this, so hibernate might not let you), you can see a performance improvement. Yes, Hibernate supports sub-selects. In fact, I can even drop down to JDBC explicitly, so whatever SQL tricks out there I can use will work on Hibernate. In what way will sub-selects improve this query? Also sometimes using explicit joins can be worse than just letting the query manager figure it out. So something like select s.* from shipment s, carrier_code cc, carrier c, ... where s.carrier_code_id = cc.id and c.id = cc.carrier_id and I think I can avoid using joins in Hibernate, but it makes the query harder to maintain. How much of a performance benefit are we talking with this change? Since hibernate is an object language, you don't actually have to specify many joins. You can use the dot notation. Query query = session.createQuery( select shipment + from Shipment shipment + inner join shipment.cargoControlNumber.carrierCode.carrier.persons person + inner join shipment.currentStatus currentStatus + inner join currentStatus.releaseCode releaseCode + left join fetch shipment.currentStatus + where + person.id = :personId and + shipment.isPurged = false and + releaseCode.number = '9' and + currentStatus is not null and + currentStatus.date = current_date - 31 + order by currentStatus.date desc ); query.setParameter( personId, personId ); query.setFirstResult( firstResult ); query.setMaxResults( maxResults ); return query.list(); As you can see, it's fairly elegant language and maps to SQL quite well. But again, since this is generated from another program (Hibernate), I really don't know how you tell it how to tune the SQL. Probably the biggest non-bug performance improvements are from tuning the SQL. I agree, but the ones I've tried aren't good enough. I have made these indexes that apply to this query as well as others in my from looking at my SQL scripts. Many of my queries have really sped up to 14 milliseconds from these indexes. But I can't make this query run any faster. CREATE INDEX carrier_to_person_person_id_idx ON carrier_to_person USING btree (person_id); CREATE INDEX carrier_to_person_carrier_id_idx ON carrier_to_person USING btree (carrier_id); CREATE INDEX carrier_code_carrier_id_idx ON carrier_code USING btree (carrier_id); CREATE INDEX shipment_carrier_code_id_idx ON shipment USING btree (carrier_code_id); CREATE INDEX current_status_date_idx ON shipment_status USING btree (date); CREATE INDEX shipment_current_status_id_idx ON shipment USING btree (current_status_id); CREATE INDEX shipment_status_shipment_id_idx ON shipment_status USING btree (shipment_id); Thanks for your responses everyone. I'll try and get you that explain analyze. I'm just not at work at the moment but this is a problem that I'm simply puzzled and worried about. I'm getting all of this from CVS on my work server. Ken ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Help with tuning this query
Ken Egervari wrote: First, what version of postgres, and have you run VACUUM ANALYZE recently? Also, please attach the result of running EXPLAIN ANALYZE. (eg, explain analyze select s.* from shipment ...) I'm using postgres 8.0. I wish I could paste explain analyze, but I won't be at work for a few days. I was hoping some Postgres/SQL experts here would be able to simply look at the query and make recommendations because it's not a very difficult or unique query. That's the problem. Without explain analyze, it's hard to say why it is performing weird, because it *does* look like a straightforward query. It's very possible that you don't have up-to-date statistics, which causes postgres to make a bad estimate of what the fastest plan is. I run VACUUM ANALYZE religiously. I even dumped the production database and used it as my test database after a full vacuum analyze. It's really as fresh as it can be. Good. Again, this is just the first precaution, as not everyone is as careful as you. And without the explain analyze, you can't tell what the planner estimates are. I don't know how to make Hibernate do what you want, but if you change the query to using subselects (not all databases support this, so hibernate might not let you), you can see a performance improvement. Yes, Hibernate supports sub-selects. In fact, I can even drop down to JDBC explicitly, so whatever SQL tricks out there I can use will work on Hibernate. In what way will sub-selects improve this query? When doing massive joins across multiple tables (as you are doing) it is frequently faster to do a couple of small joins where you only need a couple of rows as input to the rest. Something like: select * from shipment s where s.carrier_code_id in (select cc.id from carrier_code cc join carrier c on cc.carrier_id = c.id) and s.current_status_id in (select cs.id from shipment_status cs where ...) Again it's something that you can try. I have found quite a few of my queries performed much better with subselects. I'm guessing it's because with big queries it has a harder time figuring out how to refactor (the decision tree becomes big). But I'm not really sure. I just know it can work. Also sometimes using explicit joins can be worse than just letting the query manager figure it out. So something like select s.* from shipment s, carrier_code cc, carrier c, ... where s.carrier_code_id = cc.id and c.id = cc.carrier_id and I think I can avoid using joins in Hibernate, but it makes the query harder to maintain. How much of a performance benefit are we talking with this change? Since hibernate is an object language, you don't actually have to specify many joins. You can use the dot notation. I'm not saying this *will* improve performance. It is just something to try. It very easily could not be worth the overhead. Query query = session.createQuery( select shipment + from Shipment shipment + inner join shipment.cargoControlNumber.carrierCode.carrier.persons person + inner join shipment.currentStatus currentStatus + inner join currentStatus.releaseCode releaseCode + left join fetch shipment.currentStatus + where + person.id = :personId and + shipment.isPurged = false and + releaseCode.number = '9' and + currentStatus is not null and + currentStatus.date = current_date - 31 + order by currentStatus.date desc ); query.setParameter( personId, personId ); query.setFirstResult( firstResult ); query.setMaxResults( maxResults ); return query.list(); As you can see, it's fairly elegant language and maps to SQL quite well. But again, since this is generated from another program (Hibernate), I really don't know how you tell it how to tune the SQL. Probably the biggest non-bug performance improvements are from tuning the SQL. I agree, but the ones I've tried aren't good enough. I have made these indexes that apply to this query as well as others in my from looking at my SQL scripts. Many of my queries have really sped up to 14 milliseconds from these indexes. But I can't make this query run any faster. CREATE INDEX carrier_to_person_person_id_idx ON carrier_to_person USING btree (person_id); CREATE INDEX carrier_to_person_carrier_id_idx ON carrier_to_person USING btree (carrier_id); CREATE INDEX carrier_code_carrier_id_idx ON carrier_code USING btree (carrier_id); CREATE INDEX shipment_carrier_code_id_idx ON shipment USING btree (carrier_code_id); CREATE INDEX current_status_date_idx ON shipment_status USING btree (date); CREATE INDEX shipment_current_status_id_idx ON shipment USING btree (current_status_id); CREATE INDEX shipment_status_shipment_id_idx ON shipment_status USING btree (shipment_id); Thanks for your responses everyone. I'll try and get you that explain analyze. I'm just not at work at the moment but this is a problem that I'm simply puzzled and worried
[PERFORM] Performance tradeoff
Hi All, I am wondering about the relative performance of insert into table1 select distinct a,b from ... and insert into table1 select a,b from ... group by a,b when querying tables of different sizes (10K, 100K, 1s, 10s, 100s of millions of rows). The distinct way tends to sort/unique and the group by tends to hash aggregate... any opinions on which is better? I can also change the schema to a certain extent, so would it be worthwhile to put indices on the queried tables (or refactor them) hoping the distinct does an index scan instead of sort... would the query planner take advantage of that? Thanks, Shawn ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Help with tuning this query
On Wed, 2005-03-02 at 01:51 -0500, Ken Egervari wrote: select s.* from shipment s inner join carrier_code cc on s.carrier_code_id = cc.id inner join carrier c on cc.carrier_id = c.id inner join carrier_to_person ctp on ctp.carrier_id = c.id inner join person p on p.id = ctp.person_id inner join shipment_status cs on s.current_status_id = cs.id inner join release_code rc on cs.release_code_id = rc.id left join shipment_status ss on ss.shipment_id = s.id where p.id = :personId and s.is_purged = false and rc.number = '9' and cs is not null and cs.date = current_date - 31 order by cs.date desc ... shipment contains 40,000 rows shipment_status contains 80,000 rows I may be missing something, but it looks like the second join on shipment_status (the left join) is not adding anything to your results, except more work. ss is not used for output, nor in the where clause, so what is its purpose ? if cs.date has an upper limit, it might be helpful to change the condition to a BETWEEN in any case, i would think you might need an index on shipment(carrier_code_id) shipment(current_status_id) shipment_status(id) gnari ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Help with tuning this query
select s.* from shipment s inner join carrier_code cc on s.carrier_code_id = cc.id inner join carrier c on cc.carrier_id = c.id inner join carrier_to_person ctp on ctp.carrier_id = c.id inner join person p on p.id = ctp.person_id inner join shipment_status cs on s.current_status_id = cs.id inner join release_code rc on cs.release_code_id = rc.id left join shipment_status ss on ss.shipment_id = s.id where p.id = :personId and s.is_purged = false and rc.number = '9' and cs is not null and cs.date = current_date - 31 order by cs.date desc ... shipment contains 40,000 rows shipment_status contains 80,000 rows I may be missing something, but it looks like the second join on shipment_status (the left join) is not adding anything to your results, except more work. ss is not used for output, nor in the where clause, so what is its purpose ? It does look strange doesn't it? I would think the same thing if it were the first time I looked at it. But rest assured, it's done by design. A shipment relates to many shipment_status rows, but only 1 is the current shipment_status for the shipment. The first does queries on the current status only and doesn't analyze the rest of the related items. The second left join is for eager loading so that I don't have to run a seperate query to fetch the children for each shipment. This really does improve performance because otherwise you'll have to make N+1 queries to the database, and that's just too much overhead. Since I need all the shipment_status children along with the shipment for the domain logic to work on them, I have to load them all. On average, a shipment will have 2 shipment_status rows. So if the query selects 100 shipments, the query returns 200 rows. Hibernate is intelligent enough to map the shipment_status children to the appropriate shipment automatically. if cs.date has an upper limit, it might be helpful to change the condition to a BETWEEN Well, I could create an upper limit. It would be the current date. Would adding in this redundant condition improve performance? I've clustered the shipment table so that the dates are together, which has improved performance. I'm not sure adding in this implicit condition will speed up anything, but I will definately try it. in any case, i would think you might need an index on shipment(carrier_code_id) shipment(current_status_id) shipment_status(id) Unfortunately, I have indexes on all three (Postgres implicitly creates indexes for unique keys). Here are the other 2 that are already created: CREATE INDEX shipment_carrier_code_id_idx ON shipment USING btree (carrier_code_id); CREATE INDEX shipment_current_status_id_idx ON shipment USING btree (current_status_id); So I guess we've been thinking the same thing. Don't get me wrong. These indexes speed up the query from 1.6 seconds to 250 milliseconds. I just need to be around 30 milliseconds. Another idea that had occured to me was trying to force postgres to driver on the person table because that filter ratio is so great compared to everything else, but I do remember looking at the explain days ago and it was one of the last tables being filtered/joined. Is there anyway to force postgres to pick person? The reason I ask is because this would really reduce the number of rows it pulls out from the shipment table. Thanks for comments. I'll try making that date explicit and change the query to use between to see if that does anything. Regards and many thanks, Ken ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Help with tuning this query
On Wed, 2005-03-02 at 13:28 -0500, Ken Egervari wrote: select s.* from shipment s inner join carrier_code cc on s.carrier_code_id = cc.id inner join carrier c on cc.carrier_id = c.id inner join carrier_to_person ctp on ctp.carrier_id = c.id inner join person p on p.id = ctp.person_id inner join shipment_status cs on s.current_status_id = cs.id inner join release_code rc on cs.release_code_id = rc.id left join shipment_status ss on ss.shipment_id = s.id where p.id = :personId and s.is_purged = false and rc.number = '9' and cs is not null and cs.date = current_date - 31 order by cs.date desc I may be missing something, but it looks like the second join on shipment_status (the left join) is not adding anything to your results, except more work. ss is not used for output, nor in the where clause, so what is its purpose ? ... The second left join is for eager loading so that I don't have to run a seperate query to fetch the children for each shipment. This really does improve performance because otherwise you'll have to make N+1 queries to the database, and that's just too much overhead. are you saying that you are actually doing a select s.*,ss.* ... ? if cs.date has an upper limit, it might be helpful to change the condition to a BETWEEN Well, I could create an upper limit. It would be the current date. Would adding in this redundant condition improve performance? it might help the planner estimate better the number of cs rows affected. whether this improves performance depends on whether the best plans are sensitive to this. an EXPLAIN ANALYSE might reduce the guessing. gnari ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Help with tuning this query
left join is for eager loading so that I don't have to run a seperate query to fetch the children for each shipment. This really does improve performance because otherwise you'll have to make N+1 queries to the database, and that's just too much overhead. are you saying that you are actually doing a select s.*,ss.* ... ? Yes, this is how the SQL should be written. When I manually converted the query, I forgot to include this detail. In hibernate, you don't need to specifiy the ss.* because you are dealing with objects, so you just say shipment. The ss.* is indicated in the fetch part of the Hibernate query. That was my mistake. it might help the planner estimate better the number of cs rows affected. whether this improves performance depends on whether the best plans are sensitive to this. This sounds like a good idea since cs rows are quite large. shipment and shipment_status are the largest tables in the database and they will grow very large over time. ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Help with tuning this query (with explain analyze finally)
it might help the planner estimate better the number of cs rows affected. whether this improves performance depends on whether the best plans are sensitive to this. I managed to try this and see if it did anything. Unfortunately, it made no difference. It's still 250 milliseconds. It was a good suggestion though. I believed it work too. an EXPLAIN ANALYSE might reduce the guessing. Okay, here is the explain analyze I managed to get from work. It came out to 312ms here, but without the analyze it actually runs at ~250ms. It is using indexes, so my guess is that there are too many joins or it's not driving on person fast enough. Release code is such a small table that I dont think that sequencial scan matters. Thanks for taking the time to analyze this. Sort (cost=1902.27..1902.31 rows=17 width=91) (actual time=312.000..312.000 rows=39 loops=1) Sort Key: ss.date - Hash Join (cost=617.07..1901.92 rows=17 width=91) (actual time=234.000..312.000 rows=39 loops=1) Hash Cond: (outer.carrier_code_id = inner.id) - Merge Join (cost=602.54..1882.73 rows=870 width=91) (actual time=234.000..312.000 rows=310 loops=1) Merge Cond: (outer.current_status_id = inner.id) - Index Scan using shipment_current_status_id_idx on shipment s (cost=0.00..2552.13 rows=60327 width=66) (actual time=0.000..61.000 rows=27711 loops=1) Filter: (is_purged = false) - Sort (cost=602.54..607.21 rows=1866 width=25) (actual time=125.000..125.000 rows=6934 loops=1) Sort Key: ss.id - Hash Join (cost=1.11..501.17 rows=1866 width=25) (actual time=0.000..78.000 rows=6934 loops=1) Hash Cond: (outer.release_code_id = inner.id) - Index Scan using current_status_date_idx on shipment_status ss (cost=0.00..406.78 rows=14924 width=25) (actual time=0.000..47.000 rows=15053 loops=1) Index Cond: (date = (('now'::text)::date - 31)) Filter: (id IS NOT NULL) - Hash (cost=1.10..1.10 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops=1) - Seq Scan on release_code rc (cost=0.00..1.10 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=1) Filter: ((number)::text = '9'::text) - Hash (cost=14.53..14.53 rows=2 width=4) (actual time=0.000..0.000 rows=0 loops=1) - Nested Loop (cost=4.92..14.53 rows=2 width=4) (actual time=0.000..0.000 rows=2 loops=1) - Index Scan using person_pkey on person p (cost=0.00..5.75 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=1) Index Cond: (id = 355) - Hash Join (cost=4.92..8.75 rows=2 width=8) (actual time=0.000..0.000 rows=2 loops=1) Hash Cond: (outer.id = inner.carrier_id) - Seq Scan on carrier c (cost=0.00..3.54 rows=54 width=4) (actual time=0.000..0.000 rows=54 loops=1) - Hash (cost=4.92..4.92 rows=2 width=16) (actual time=0.000..0.000 rows=0 loops=1) - Hash Join (cost=3.04..4.92 rows=2 width=16) (actual time=0.000..0.000 rows=2 loops=1) Hash Cond: (outer.carrier_id = inner.carrier_id) - Seq Scan on carrier_code cc (cost=0.00..1.57 rows=57 width=8) (actual time=0.000..0.000 rows=57 loops=1) - Hash (cost=3.04..3.04 rows=1 width=8) (actual time=0.000..0.000 rows=0 loops=1) - Index Scan using carrier_to_person_person_id_idx on carrier_to_person ctp (cost=0.00..3.04 rows=1 width=8) (actual time=0.000..0.000 rows=1 loops=1) Index Cond: (355 = person_id) Total runtime: 312.000 ms Ken ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Help with tuning this query (with explain analyze finally)
Ken Egervari [EMAIL PROTECTED] writes: Okay, here is the explain analyze I managed to get from work. What platform is this on? It seems very strange/fishy that all the actual-time values are exact integral milliseconds. regards, tom lane My machine is WinXP professional, athon xp 2100, but I get similar results on my Intel P4 3.0Ghz as well (which is also running WinXP). Why do you ask? ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Help with tuning this query (Some musings)
I took John's advice and tried to work with sub-selects. I tried this variation, which actually seems like it would make a difference conceptually since it drives on the person table quickly. But to my surprise, the query runs at about 375 milliseconds. I think it's because it's going over that shipment table multiple times, which is where the results are coming from. select s.*, ss.* from shipment s inner join shipment_status ss on s.current_status_id=ss.id inner join release_code rc on ss.release_code_id=rc.id left outer join driver d on s.driver_id=d.id left outer join carrier_code cc on s.carrier_code_id=cc.id where s.id in ( select s.id from person p inner join carrier_to_person ctp on p.id=ctp.person_id inner join carrier c on ctp.carrier_id=c.id inner join carrier_code cc on cc.carrier_id = c.id inner join shipment s on s.carrier_code_id = cc.id where p.id = 355 and s.current_status_id is not null and s.is_purged=false ) and(rc.number='9' ) and(ss.date=current_date-31 ) order by ss.date desc *** Musing 1 Also, s.current_status_id is not null is an important filter that I forgot to mention. In this example where p.id = 355, it filters out 90% of the rows. In general, that filter ratio is 0.46 though, which is not quite so high. However, this filter gets better over time because more and more users will use a filter that will make this value null. It's still not as strong as person though and probably never will be. But I thought I'd mention it nonetheless. *** Musing 2 I do think that the filter ss.date=current_date-31 is slowing this query down. I don't think it's the mention of current_date or even that it's dynamic instead of static. I think the range is just too big. For example, if I use: and ss.date between '2005-02-01 00:00:00' and '2005-02-28 23:59:59' The query still results in 250 milliseconds. But if I make the range very small - say Feb 22nd of 2005: and ss.date between '2005-02-22 00:00:00' and '2005-02-22 23:59:59' Now the entire query runs in 47 milliseconds on average. If I can't make this query perform any better, should I change the user interface to select the date instead of showing the last 31 days to benefit from this single-day filter? This causes more clicks to select the day (like from a calendar), but most users probably aren't interested in seeing the entire listing anyway. However, it's a very important requirement that users know that shipment enteries exist in the last 31 days (because they are usually sure-fire problems if they are still in this query after a few days). I guess I'm wondering if tuning the query is futile and I should get the requirements changed, or is there something I can do to really speed it up? Thanks again, Ken ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Help with tuning this query (more musings)
Ken Egervari wrote: I took John's advice and tried to work with sub-selects. I tried this variation, which actually seems like it would make a difference conceptually since it drives on the person table quickly. But to my surprise, the query runs at about 375 milliseconds. I think it's because it's going over that shipment table multiple times, which is where the results are coming from. I also made a version that runs over shipment a single time, but it's exactly 250 milliseconds. I guess the planner does the exact same thing. Why are you now left joining driver and carrier code, but inner joining shipment_status? I assume this is the *real* query that you are executing. From the earlier explain analyze, and your statements, the initial person p should be the heavily selective portion. And what does driver get you? It isn't in the return, and it isn't part of a selectivity clause. You are also double joining against carrier code, once as a left outer join, and once in the inner join. This query doesn't seem quite right. Are you sure it is generating the rows you are expecting? select s.*, ss.* from shipment s inner join shipment_status ss on s.current_status_id=ss.id inner join release_code rc on ss.release_code_id=rc.id left outer join driver d on s.driver_id=d.id left outer join carrier_code cc on s.carrier_code_id=cc.id where s.carrier_code_id in ( select cc.id from person p inner join carrier_to_person ctp on p.id=ctp.person_id inner join carrier c on ctp.carrier_id=c.id inner join carrier_code cc on cc.carrier_id = c.id where p.id = 355 ) and s.current_status_id is not null and s.is_purged=false and(rc.number='9' ) and(ss.date=current_date-31 ) order by ss.date desc You might want to post the explain analyze of this query to have a point of reference, but what about something like this: select s.*, ss.* from shipment_status ss on s.current_status_id=ss.id join (select s.* from shipment s where s.carrier_code_id in (select cc.id from person p inner join carrier_to_person ctp on p.id=ctp.person_id inner join carrier c on ctp.carrier_id=c.id inner join carrier_code cc on cc.carrier_id = c.id where p.id = 355 ) and s.current_status_id is not null and s.is_purged=false ) as i -- Just a name for the subselect since it is in a join inner join release_code rc on ss.release_code_id=rc.id where (rc.number='9' ) and(ss.date between current_date-31 and current_date()) order by ss.date desc My idea with this query is to minimize the number of shipment rows that need to be generated before joining with the other rows. My syntax is probably a little bit off, since I can't actually run it against real tables. But looking at your *original* query, you were getting 15000 rows out of shipment_status, and then 27700 rows out of shipment, which was then being merge-joined down to only 300 rows, and then hash-joined down to 39. I'm just trying to think of ways to prevent it from blossoming into 27k rows to start with. Please double check your query, because it seems to be grabbing unnecessary rows with the left joins, and then post another explain analyze with one (or several) different subselect forms. John =:- signature.asc Description: OpenPGP digital signature
Re: [PERFORM] Performance tradeoff
Shawn, I can also change the schema to a certain extent, so would it be worthwhile to put indices on the queried tables (or refactor them) hoping the distinct does an index scan instead of sort... would the query planner take advantage of that? Use the GROUP BY, with an index on the grouped columns and lots of work_mem (sort_mem in 7.4). This will give the planner the option of a hashaggregate which could be significantly faster than the other methods. -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Help with tuning this query
Ken, I've tried to use Dan Tow's tuning method and created all the right indexes from his diagraming method, but the query still performs quite slow both inside the application and just inside pgadmin III. Can anyone be kind enough to help me tune it so that it performs better in postgres? I don't think it's using the right indexes, or maybe postgres needs special treatment. FWIW, I picked up Dan Tow's book to give it a read, and they guy isn't qualified to author SQL Tuning. You should chuck that book, it won't help you -- not with Oracle or SQL Server, and certainly not with PostgreSQL. O'Reilly continues to have trouble turning out quality database books. Also, if you *were* using Dan's method, you'd be driving off Person, not Shipment. -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Help with tuning this query (with explain analyze finally)
Ken, - Merge Join (cost=602.54..1882.73 rows=870 width=91) (actual time=234.000..312.000 rows=310 loops=1) Merge Cond: (outer.current_status_id = inner.id) Hmmm ... this merge join appears to be the majority of your execution time at least within the resolution that PGWin allows us.Please try two things, and give us Explain Analyzes: 1) To determine your query order ala Dan Tow and drive off of person, please SET JOIN_COLLAPSE_LIMIT = 1 and then run Mark Kirkwood's version of the query. (Not that I believe in Dan Tow ... see previous message ... but it would be interesting to see the results. 2) Force PG to drop the merge join via SET ENABLE_MERGEJOIN = FALSE; Also, please let us know some about the server you're using and your configuration parameters, particularly: shared_buffers work_mem effective_cache_size random_page_cost -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Help with tuning this query (with explain analyze finally)
Josh, 1) To determine your query order ala Dan Tow and drive off of person, please SET JOIN_COLLAPSE_LIMIT = 1 and then run Mark Kirkwood's version of the query. (Not that I believe in Dan Tow ... see previous message ... but it would be interesting to see the results. Unfortunately, the query still takes 250 milliseconds. I tried it with other queries and the results are the same as before. Here is the explain analayze anyway: Sort (cost=2036.83..2036.87 rows=16 width=103) (actual time=328.000..328.000 rows=39 loops=1) Sort Key: cs.date - Nested Loop Left Join (cost=620.61..2036.51 rows=16 width=103) (actual time=250.000..328.000 rows=39 loops=1) - Hash Join (cost=620.61..1984.90 rows=16 width=78) (actual time=250.000..328.000 rows=39 loops=1) Hash Cond: (outer.carrier_code_id = inner.id) - Merge Join (cost=606.11..1965.99 rows=825 width=74) (actual time=250.000..328.000 rows=310 loops=1) Merge Cond: (outer.current_status_id = inner.id) - Index Scan using shipment_current_status_id_idx on shipment s (cost=0.00..2701.26 rows=60307 width=66) (actual time=0.000..77.000 rows=27711 loops=1) Filter: (is_purged = false) - Sort (cost=606.11..610.50 rows=1756 width=12) (actual time=141.000..141.000 rows=6902 loops=1) Sort Key: cs.id - Hash Join (cost=1.11..511.48 rows=1756 width=12) (actual time=0.000..109.000 rows=6902 loops=1) Hash Cond: (outer.release_code_id = inner.id) - Index Scan Backward using current_status_date_idx on shipment_status cs (cost=0.01..422.58 rows=14047 width=16) (actual time=0.000..78.000 rows=14925 loops=1) Index Cond: ((date = (('now'::text)::date - 31)) AND (date = ('now'::text)::date)) Filter: (cs.* IS NOT NULL) - Hash (cost=1.10..1.10 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops=1) - Seq Scan on release_code rc (cost=0.00..1.10 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=1) Filter: ((number)::text = '9'::text) - Hash (cost=14.49..14.49 rows=2 width=8) (actual time=0.000..0.000 rows=0 loops=1) - Nested Loop (cost=6.87..14.49 rows=2 width=8) (actual time=0.000..0.000 rows=2 loops=1) - Index Scan using person_pkey on person p (cost=0.00..5.73 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=1) Index Cond: (id = 355) - Hash Join (cost=6.87..8.74 rows=2 width=8) (actual time=0.000..0.000 rows=2 loops=1) Hash Cond: (outer.carrier_id = inner.carrier_id) - Seq Scan on carrier_code cc (cost=0.00..1.57 rows=57 width=8) (actual time=0.000..0.000 rows=57 loops=1) - Hash (cost=6.86..6.86 rows=1 width=12) (actual time=0.000..0.000 rows=0 loops=1) - Hash Join (cost=3.04..6.86 rows=1 width=12) (actual time=0.000..0.000 rows=1 loops=1) Hash Cond: (outer.id = inner.carrier_id) - Seq Scan on carrier c (cost=0.00..3.54 rows=54 width=4) (actual time=0.000..0.000 rows=54 loops=1) - Hash (cost=3.04..3.04 rows=1 width=8) (actual time=0.000..0.000 rows=0 loops=1) - Index Scan using carrier_to_person_person_id_idx on carrier_to_person ctp (cost=0.00..3.04 rows=1 width=8) (actual time=0.000..0.000 rows=1 loops=1) Index Cond: (355 = person_id) - Index Scan using shipment_status_shipment_id_idx on shipment_status ss (cost=0.00..3.20 rows=2 width=25) (actual time=0.000..0.000 rows=1 loops=39) Index Cond: (ss.shipment_id = outer.id) Total runtime: 328.000 ms 2) Force PG to drop the merge join via SET ENABLE_MERGEJOIN = FALSE; Setting this option had no effect either In fact, the query is a bit slower (266 milliseconds but 250 came up once in 20 executions). Also, please let us know some about the server you're using and your configuration parameters, particularly: shared_buffers work_mem effective_cache_size random_page_cost Well, I'm on a test machine so the settings haven't changed one bit from the defaults. This may sound embarrassing, but I bet the production server is not custom configured either. The computer I'm running these queries on is just a simple Athon XP 2100+ on WinXP with 1GB of RAM. The production server is a faster P4, but the rest is the same. Here are the 4 values in my configuration, but