Re: [PERFORM] Performance trouble finding records through related records
Hi, all. I've done some further analysis, found a form that works if I split things over two separate queries (B1 and B2, below) but still trouble when combining (B, below). This is the full pseudo-query: SELECT FROM A UNION SELECT FROM B ORDER BY dateTime DESC LIMIT 50 In that pseudo-query: - A is fast (few ms). A is all events for the given customer - B is slow (1 minute). B is all events for the same transactions as all events for the given customer Zooming in on B it looks originally as follows: SELECT events2.id, events2.transactionId, events2.dateTime FROM events_events events2 JOIN events_eventdetails details2_transKey ON events2.id = details2_transKey.event_id AND details2_transKey.keyname='transactionId' JOIN events_eventdetails details2_transValue ON substring(details2_transKey.value,0,32) = substring(details2_transValue.value,0,32) AND details2_transValue.keyname='transactionId' JOIN events_eventdetails customerDetails ON details2_transValue.event_id = customerDetails.event_id AND customerDetails.keyname='customer_id' AND substring(customerDetails.value,0,32)='598124' WHERE events2.eventtype_id IN (100,103,105,... et cetera ...) The above version of B is tremendously slow. The only fast version I've yet come to find is as follows: - Do a sub-query B1 - Do a sub-query B2 with the results of B1 B1 looks as follows: Works very fast (few ms) http://explain.depesz.com/s/7JS SELECT substring(details2_transValue.value,0,32) FROM events_eventdetails_customer_id customerDetails JOIN only events_eventdetails details2_transValue USING (event_id) WHERE customerDetails.keyname='customer_id' AND substring(customerDetails.value,0,32)='49' AND details2_transValue.keyname='transactionId' B2 looks as follows: Works very fast (few ms) http://explain.depesz.com/s/jGO SELECT events2.id, events2.dateTime FROM events_events events2 JOIN events_eventdetails details2_transKey ON events2.id = details2_transKey.event_id AND details2_transKey.keyname='transactionId' AND substring(details2_transKey.value,0,32) IN (... results of B1 ...) AND events2.eventtype_id IN (100,103,105,106,45,34,14,87,58,78,7,76,11,25,57,98,30,35,33,49,52,28,85,59,23,22,51,48,36,65,66,18,13,86,75,44,38,43,94,56,95,96,71,50,81,90,89,16,17,88,79,77,68,97,92,67,72,53,2,10,31,32,80,24,93,26,9,8,61,5,73,70,63,20,60,40,41,39,101,104,107,99,64,62,55,69,19,46,47,15,21,27,54,12,102,108) The combined version of B works slow again (3-10 seconds): http://explain.depesz.com/s/9oM SELECT events2.id, events2.dateTime FROM events_events events2 JOIN events_eventdetails details2_transKey ON events2.id = details2_transKey.event_id AND details2_transKey.keyname='transactionId' AND substring(details2_transKey.value,0,32) IN ( SELECT substring(details2_transValue.value,0,32) FROM events_eventdetails_customer_id customerDetails JOIN only events_eventdetails details2_transValue USING (event_id) WHERE customerDetails.keyname='customer_id' AND substring(customerDetails.value,0,32)='49' AND details2_transValue.keyname='transactionId') AND events2.eventtype_id IN (100,103,105,106,45,34,14,87,58,78,7,76,11,25,57,98,30,35,33,49,52,28,85,59,23,22,51,48,36,65,66,18,13,86,75,44,38,43,94,56,95,96,71,50,81,90,89,16,17,88,79,77,68,97,92,67,72,53,2,10,31,32,80,24,93,26,9,8,61,5,73,70,63,20,60,40,41,39,101,104,107,99,64,62,55,69,19,46,47,15,21,27,54,12,102,108) At the moment I see not other conclusion than to offer B1 and B2 to the database separately, but it feels like defeat :-| -- View this message in context: http://postgresql.1045698.n5.nabble.com/Performance-trouble-finding-records-through-related-records-tp3405914p3423334.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Performance trouble finding records through related records
Hi. Thanks for your response. Robert Haas wrote: Join Filter: ((events_events.transactionid)::text = (customerdetails.transactionid)::text) Now why is there a cast to text there on both sides? Do those two columns have exactly the same datatype? If not, you probably want to fix that, as it can make a big difference. Good question. I seem not able to get rid of that, even though these are same type: =# \d events_events Table public.events_events Column| Type | Modifiers --+--+-- [snip] transactionid| character varying(36)| not null [snip] =# \d events_eventdetails Table public.events_eventdetails Column | Type | Modifiers ---++-- [snip] transactionid | character varying(36) | not null [snip] (These columns allowing null or not is just something I've been playing with to no avail too.) Robert Haas wrote: Also, how many rows are there in events_events and how many in events_eventdetails? select count(*) from events_events; -- 3910163 select count(*) from events_eventdetails; -- 30216033 select count(*) from events_eventdetails_customer_id; (single partition) -- 2976101 -- View this message in context: http://postgresql.1045698.n5.nabble.com/Performance-trouble-finding-records-through-related-records-tp3405914p3413801.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Performance trouble finding records through related records
Merlin Moncure-2 wrote: 8.3? try converting the above to WHERE EXISTS or (even better) a JOIN... Thanks for that. But in my Mar 03, 2011; 10:19am post I already broke it down to the barebones with some variations, among which JOIN. The EXISTS IN variation was so poor that I left that one out. Best regards, Sander. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Performance-trouble-finding-records-through-related-records-tp3405914p3413814.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Performance trouble finding records through related records
On Thu, Mar 3, 2011 at 9:55 AM, Andy Colson a...@squeakycode.net wrote: explain analyze SELECT events_events.* FROM events_events JOIN events_eventdetails customerDetails ON events_events.transactionid = customerDetails.transactionid AND customerDetails.keyname='customer_id' AND substring(customerDetails.value,0,32)='1957' AND customerDetails.transactionid IS NOT NULL ORDER BY id LIMIT 50; -- http://explain.depesz.com/s/6aB Thanks for your efforts! Huh. Pretty much exactly the same. I'm sorry but I think I'm at my limit. I'm not sure why the nested loop takes so long, or how to get it to use something different. The join condition is showing up in the explain output as: Join Filter: ((events_events.transactionid)::text = (customerdetails.transactionid)::text) Now why is there a cast to text there on both sides? Do those two columns have exactly the same datatype? If not, you probably want to fix that, as it can make a big difference. Also, how many rows are there in events_events and how many in events_eventdetails? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Performance trouble finding records through related records
On Wed, Mar 2, 2011 at 6:12 PM, sverhagen sverha...@wps-nl.com wrote: Thanks for your help already! Hope you're up for some more :-) Andy Colson wrote: First off, excellent detail. Second, your explain analyze was hard to read... but since you are not really interested in your posted query, I wont worry about looking at it... but... have you seen: http://explain.depesz.com/ Thanks for that. Using it below :-) Andy Colson wrote: If you run the individual queries, without the union, are the part's slow too? Only problem is the second part. So that part can safely be isolated. Also the following does not play a role at this point: WHERE events2.eventtype_id IN (100,103,105,... Then I went ahead and denormalized the transactionId on both ends, so that both events_events records and events_eventdetails records have the transactionId (or NULL). That simplifies the query to this: SELECT events_events.* FROM events_events WHERE transactionid IN ( SELECT transactionid FROM events_eventdetails customerDetails WHERE customerDetails.keyname='customer_id' AND substring(customerDetails.value,0,32)='1957' AND transactionid IS NOT NULL ) ORDER BY id LIMIT 50; 8.3? try converting the above to WHERE EXISTS or (even better) a JOIN... merlin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Performance trouble finding records through related records
Andy Colson wrote: For your query, I think a join would be the best bet, can we see its explain analyze? Here is a few variations: SELECT events_events.* FROM events_events WHERE transactionid IN ( SELECT transactionid FROM events_eventdetails customerDetails WHERE customerDetails.keyname='customer_id' AND substring(customerDetails.value,0,32)='1957' AND transactionid IS NOT NULL ) ORDER BY id LIMIT 50; -- http://explain.depesz.com/s/Pnb explain analyze SELECT events_events.* FROM events_events, events_eventdetails customerDetails WHERE events_events.transactionid = customerDetails.transactionid AND customerDetails.keyname='customer_id' AND substring(customerDetails.value,0,32)='1957' AND customerDetails.transactionid IS NOT NULL ORDER BY id LIMIT 50; -- http://explain.depesz.com/s/rDh explain analyze SELECT events_events.* FROM events_events JOIN events_eventdetails customerDetails ON events_events.transactionid = customerDetails.transactionid AND customerDetails.keyname='customer_id' AND substring(customerDetails.value,0,32)='1957' AND customerDetails.transactionid IS NOT NULL ORDER BY id LIMIT 50; -- http://explain.depesz.com/s/6aB Thanks for your efforts! -- View this message in context: http://postgresql.1045698.n5.nabble.com/Performance-trouble-finding-records-through-related-records-tp3405914p3407689.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Performance trouble finding records through related records
On 3/3/2011 3:19 AM, sverhagen wrote: Andy Colson wrote: For your query, I think a join would be the best bet, can we see its explain analyze? Here is a few variations: SELECT events_events.* FROM events_events WHERE transactionid IN ( SELECT transactionid FROM events_eventdetails customerDetails WHERE customerDetails.keyname='customer_id' AND substring(customerDetails.value,0,32)='1957' AND transactionid IS NOT NULL ) ORDER BY id LIMIT 50; -- http://explain.depesz.com/s/Pnb explain analyze SELECT events_events.* FROM events_events, events_eventdetails customerDetails WHERE events_events.transactionid = customerDetails.transactionid AND customerDetails.keyname='customer_id' AND substring(customerDetails.value,0,32)='1957' AND customerDetails.transactionid IS NOT NULL ORDER BY id LIMIT 50; -- http://explain.depesz.com/s/rDh explain analyze SELECT events_events.* FROM events_events JOIN events_eventdetails customerDetails ON events_events.transactionid = customerDetails.transactionid AND customerDetails.keyname='customer_id' AND substring(customerDetails.value,0,32)='1957' AND customerDetails.transactionid IS NOT NULL ORDER BY id LIMIT 50; -- http://explain.depesz.com/s/6aB Thanks for your efforts! Huh. Pretty much exactly the same. I'm sorry but I think I'm at my limit. I'm not sure why the nested loop takes so long, or how to get it to use something different. -Andy -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Performance trouble finding records through related records
Thanks for your help already! Hope you're up for some more :-) Andy Colson wrote: First off, excellent detail. Second, your explain analyze was hard to read... but since you are not really interested in your posted query, I wont worry about looking at it... but... have you seen: http://explain.depesz.com/ Thanks for that. Using it below :-) Andy Colson wrote: If you run the individual queries, without the union, are the part's slow too? Only problem is the second part. So that part can safely be isolated. Also the following does not play a role at this point: WHERE events2.eventtype_id IN (100,103,105,... Then I went ahead and denormalized the transactionId on both ends, so that both events_events records and events_eventdetails records have the transactionId (or NULL). That simplifies the query to this: SELECT events_events.* FROM events_events WHERE transactionid IN ( SELECT transactionid FROM events_eventdetails customerDetails WHERE customerDetails.keyname='customer_id' AND substring(customerDetails.value,0,32)='1957' AND transactionid IS NOT NULL ) ORDER BY id LIMIT 50; To no avail. Also changing the above WHERE IN into implicit/explicit JOIN's doesn't make more than a marginal difference. Should joining not be very efficient somehow? http://explain.depesz.com/s/Pnb The above link nicely shows the hotspots, but I am at a loss now as how to approach them. Andy Colson wrote: Looked like your row counts (the estimate vs the actual) were way off, have you analyzed lately? Note sure what that means. Isn't all the maintenance nicely automated through my config? Andy Colson wrote: I could not tell from the explain analyze if an index was used, but I notice you have a ton of indexes on events_events table. Yes, a ton of indexes, but still not the right one :-) -- View this message in context: http://postgresql.1045698.n5.nabble.com/Performance-trouble-finding-records-through-related-records-tp3405914p3407330.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Performance trouble finding records through related records
On 03/02/2011 06:12 PM, sverhagen wrote: Thanks for your help already! Hope you're up for some more :-) Andy Colson wrote: First off, excellent detail. Second, your explain analyze was hard to read... but since you are not really interested in your posted query, I wont worry about looking at it... but... have you seen: http://explain.depesz.com/ Thanks for that. Using it below :-) Andy Colson wrote: If you run the individual queries, without the union, are the part's slow too? Only problem is the second part. So that part can safely be isolated. Also the following does not play a role at this point: WHERE events2.eventtype_id IN (100,103,105,... Then I went ahead and denormalized the transactionId on both ends, so that both events_events records and events_eventdetails records have the transactionId (or NULL). That simplifies the query to this: SELECT events_events.* FROM events_events WHERE transactionid IN ( SELECT transactionid FROM events_eventdetails customerDetails WHERE customerDetails.keyname='customer_id' AND substring(customerDetails.value,0,32)='1957' AND transactionid IS NOT NULL ) ORDER BY id LIMIT 50; To no avail. Also changing the above WHERE IN into implicit/explicit JOIN's doesn't make more than a marginal difference. Should joining not be very efficient somehow? http://explain.depesz.com/s/Pnb The above link nicely shows the hotspots, but I am at a loss now as how to approach them. Andy Colson wrote: Looked like your row counts (the estimate vs the actual) were way off, have you analyzed lately? Note sure what that means. Isn't all the maintenance nicely automated through my config? In the explain analyze you'll see stuff like: Append (cost=0.00..325645.93 rows=115469434145 width=52) (actual time=0.304..58763.738 rows=222 loops=1) This is taken from your first email. Red flags should go off when the row counts are not close. The first set is the planner's guess. The second set is what actually happened. The planner thought there would be 115,469,434,145 rows.. but turned out to only be 222. That's usually caused by bad stats. Isn't all the maintenance nicely automated through my config? I'd never assume. But the numbers in the plan you posted: http://explain.depesz.com/s/Pnb look fine to me (well, the row counts), and I didnt look to much at that plan in the first email, so we can probably ignore it. Andy Colson wrote: I could not tell from the explain analyze if an index was used, but I notice you have a ton of indexes on events_events table. Yes, a ton of indexes, but still not the right one :-) But... many indexes will slow down update/inserts. And an index on an unselective field can cause more problems than it would help. Especially if the stats are off. If PG has lots and lots of options, it'll take longer to plan querys too. If it picks an index to use, that it thinks is selective, but in reality is not, you are in for a world of hurt. For your query, I think a join would be the best bet, can we see its explain analyze? -Andy -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Performance trouble finding records through related records
On 03/01/2011 06:14 PM, sverhagen wrote: Hi, appreciated mailing list. Thanks already for taking your time for my performance question. Regards, Sander. ===POSTGRESQL VERSION AND ORIGIN=== PostgreSQL 8.3.9 on i486-pc-linux-gnu, compiled by GCC cc (GCC) 4.2.4 (Ubuntu 4.2.4-1ubuntu3) Installed using apt-get install postgresql-8.3 ===A DESCRIPTION OF WHAT YOU ARE TRYING TO ACHIEVE=== Query involving tables events_events and events_eventdetails. There is any number of events_eventdetails records for each events_events record. There may be multiple records in events_events that have the same value for their transactionId, which is available in one of their events_eventdetails records. We want a total query that returns events_events records that match condition I. or II., sorted by datetime descending, first 50. Condition I. All events_events records for which an events_eventdetails records that matches the following conditions: - Column keyname (in events_eventdetails) equals customerId, and - Column value (in events_eventdetails) equals 598124, or more precisely substring(customerDetails.value,0,32)='598124' Condition II. All events_events records that have a same value for in one of their events_eventdetails records with keyname 'transactionId' as any of the resulting events_events records of condition I. In other words: I want all events for a certain customerId, and all events with the same transactionId as those. The total query's time should be of the magnitude 100ms, but currently is of the magnitude 1min. JUST FOR THE PURPOSE OF EXPERIMENT I've now a denormalized copy of transactionId as a column in the events_events records. Been trying queries on those, with no improvements. I am not seeking WHY my query is too slow, rather trying to find a way to get it faster :-) much snippage First off, excellent detail. Second, your explain analyze was hard to read... but since you are not really interested in your posted query, I wont worry about looking at it... but... have you seen: http://explain.depesz.com/ Its nice. And last, to my questions: SELECT events1.id, events1.transactionId, events1.dateTime FROM events_events events1 JOIN events_eventdetails customerDetails ON events1.id = customerDetails.event_id AND customerDetails.keyname='customer_id' AND substring(customerDetails.value,0,32)='598124' WHERE events1.eventtype_id IN (100,103,105,106,45,34,14,87,58,78,7,76,11,25,57,98,30,35,33,49,52,28,85,59,23,22,51,48,36,65,66,18,13,86,75,44,38,43,94,56,95,96,71,50,81,90,89,16,17,88,79,77,68,97,92,67,72,53,2,10,31,32,80,24,93,26,9,8,61,5,73,70,63,20,60,40,41,39,101,104,107,99,64,62,55,69,19,46,47,15,21,27,54,12,102,108) UNION SELECT events2.id, events2.transactionId, events2.dateTime FROM events_events events2 JOIN events_eventdetails details2_transKey ON events2.id = details2_transKey.event_id AND details2_transKey.keyname='transactionId' JOIN events_eventdetails details2_transValue ON substring(details2_transKey.value,0,32) = substring(details2_transValue.value,0,32) AND details2_transValue.keyname='transactionId' JOIN events_eventdetails customerDetails ON details2_transValue.event_id = customerDetails.event_id AND customerDetails.keyname='customer_id' AND substring(customerDetails.value,0,32)='598124' WHERE events2.eventtype_id IN (100,103,105,106,45,34,14,87,58,78,7,76,11,25,57,98,30,35,33,49,52,28,85,59,23,22,51,48,36,65,66,18,13,86,75,44,38,43,94,56,95,96,71,50,81,90,89,16,17,88,79,77,68,97,92,67,72,53,2,10,31,32,80,24,93,26,9,8,61,5,73,70,63,20,60,40,41,39,101,104,107,99,64,62,55,69,19,46,47,15,21,27,54,12,102,108) ORDER BY dateTime DESC LIMIT 50 If you run the individual queries, without the union, are the part's slow too? Looked like your row counts (the estimate vs the actual) were way off, have you analyzed lately? I could not tell from the explain analyze if an index was used, but I notice you have a ton of indexes on events_events table. You have two indexes on the same fields but in reverse order: events_events_eventtype_id_datetime_ind (datetime, eventtype_id); events_events_datetime_eventtype_id_ind (eventtype_id, datetime); AND both eventtype_id and datetime are in other indexes! I think you need to review your indexes. Drop all of them and add one or two that are actually useful. A useful tool I have found for complex queries is to break them down into smaller sub sets, write sql that get's me just those sets, and them add them all back into one main query with subselects: select a,b,c,... from events_events where id in ( select id from details where some subset is needed ) and id not in ( select id frome details where some set is bad ) and id in ( select anotherid from anothertable where ... ) Its the subselects you need to think about. Find one that gets you a small set that's interesting somehow. Once you get all your little sets, its easy to combine them. -Andy -- Sent via