Re: [PERFORM] query optimization question
On Thu, 29 Jan 2004, Tom Lane wrote: jackdb-# GROUP BY memberid_ HAVING ( Um, that's not what I had in mind at all. Does GROUP BY actually do anything at all here? (You didn't answer me as to whether memberid_ is a unique identifier or not, but if it is, this GROUP BY is just an expensive no-op.) From your comment I assume that there is no transformation in pg that detects that the group by columns are unique? this is all blue-sky speculation anyway. What I'm actually more interested in is your statement that MSSQL can do the original query quickly. I find that a bit hard to believe because I don't see any relevant optimization techniques. Getting rid of the group by would not give that kind of speedup? Maybe mssql manage to rewrite the query like that before executing. -- /Dennis Björklund ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] query optimization question
On Thu, 29 Jan 2004, Jack Coates wrote: Probably better to repost it as a gzip'd attachment. That should complete with a picture of the GUI version. 26k zipped, let's see if this makes it through. Are you sure you attached it? At least when it got here there was no attachment. -- /Dennis Björklund ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] query optimization question
Tom Lane [EMAIL PROTECTED] writes: Jack Coates [EMAIL PROTECTED] writes: yup -- here it is. It will probably be a nasty mess after linewrap gets done with it, yup, sure is :-( If I was familiar with the layout I could probably decipher where the line breaks are supposed to be, but right now I'm just confused. I just replaced all newlines that are followed by lines starting in column 1 with spaces and got something reasonable: SELECT DISTINCT members_.memberid_ FROM members_ WHERE ( members_.List_='list1' AND members_.MemberType_='normal'AND members_.SubType_='mail' ) GROUP BY memberid_ HAVING ( ( select count(*) from lyrActiveRecips, outmail_where outmail11 1 0 NULLNULL1 NULL102274.5 NULLNULLNULL104.10356 NULLNULLSELECT 0 NULL |--Parallelism(Gather Streams)11 2 1 Parallelism Gather Streams NULLNULL102274.50.0 0.22011127 23 104.10356 [members_].[MemberID_] NULLPLAN_ROW-1 1.0 |--Filter(WHERE:(If ([Expr1006] IS NULL) then 0 else [Expr1006]+If ([Expr1012] IS NULL) then 0 else [Expr1012]=3)) 11 3 2 Filter Filter WHERE:(If ([Expr1006] IS NULL) then 0 else [Expr1006]+If ([Expr1012] IS NULL) then 0 else [Expr1012]=3) NULL102274.50.0 3.5393338 23 103.88345 [members_].[MemberID_] NULLPLAN_ROW-1 1.0 |--Hash Match(Right Outer Join, HASH:([lyrCompletedRecips].[MemberID])=([members_].[MemberID_]), RESIDUAL:([members_].[MemberID_]=[lyrCompletedRecips].[MemberID])) 11 4 3 Hash Match Right Outer Join HASH:([lyrCompletedRecips].[MemberID])=([members_].[MemberID_]), RESIDUAL:([members_].[MemberID_]=[lyrCompletedRecips].[MemberID]) NULL 4782883.5 0.0 21.874712 23 100.34412 [members_].[MemberID_], [Expr1006], [Expr1012] NULLPLAN_ROW-1 1.0 |--Compute Scalar(DEFINE:([Expr1012]=Convert([Expr1020]))) 11 5 4 Compute Scalar Compute Scalar DEFINE:([Expr1012]=Convert([Expr1020])) [Expr1012]=Convert([Expr1020]) 119575.35 0.0 1.3723248 15 4.3749919 [lyrCompletedRecips].[MemberID], [Expr1012] NULLPLAN_ROW-1 1.0 ||--Hash Match(Aggregate, HASH:([lyrCompletedRecips].[MemberID]), RESIDUAL:([lyrCompletedRecips].[MemberID]=[lyrCompletedRecips].[MemberID]) DEFINE:([Expr1020]=COUNT(*))) 11 6 5 Hash Match Aggregate HASH:([lyrCompletedRecips].[MemberID]), RESIDUAL:([lyrCompletedRecips].[MemberID]=[lyrCompletedRecips].[MemberID]) [Expr1020]=COUNT(*) 119575.35 0.0 1.3723248 15 4.3749919 [lyrCompletedRecips].[MemberID], [Expr1020] NULLPLAN_ROW-1 1.0 | |--Parallelism(Repartition Streams, PARTITION COLUMNS:([lyrCompletedRecips].[MemberID])) 11 7 6 Parallelism Repartition Streams PARTITION COLUMNS:([lyrCompletedRecips].[MemberID]) NULL 119640.60.0 0.32407209 173 3.002667 [lyrCompletedRecips].[MemberID] NULLPLAN_ROW-1 1.0 | |--Nested Loops(Inner Join, OUTER REFERENCES:([outmail_].[MessageID_])) 11 8 7 Nested LoopsInner Join OUTER REFERENCES:([outmail_].[MessageID_]) NULL119640.60.0 0.75014657 173 2.6785948 [lyrCompletedRecips].[MemberID] NULL PLAN_ROW-1 1.0 | |--Parallelism(Distribute Streams) 11 9 8 Parallelism Distribute Streams NULLNULL1.0 0.0 2.8501874E-2128 9.4664574E-2[outmail_].[MessageID_] NULLPLAN_ROW -1 1.0 | ||--Clustered Index Scan(OBJECT:([lmdb].[dbo].[outmail_].[IX_outmail_list]), WHERE:([outmail_].[Type_]='list'))11 10 9 Clustered Index Scan Clustered Index ScanOBJECT:([lmdb].[dbo].[outmail_].[IX_outmail_list]), WHERE:([outmail_].[Type_]='list') [outmail_].[Type_], [outmail_].[MessageID_] 1.0 0.01878925 3.981E-5128 3.7658099E-2[outmail_].[Type_], [outmail_].[MessageID_] NULLPLAN_ROW0 1.0 | |--Clustered Index Seek(OBJECT:([lmdb].[dbo].[lyrCompletedRecips].[IX_CompletedRecipsMailingID]), SEEK:([lyrCompletedRecips].[MailingID]=[outmail_].[MessageID_]), WHERE:([lyrCompletedRecips].[CompletionStatusID]=300 AN11 11 8 Clustered Index SeekClustered Index Seek OBJECT:([lmdb].[dbo].[lyrCompletedRecips].[IX_CompletedRecipsMailingID]), SEEK:([lyrCompletedRecips].[MailingID]=[outmail_].[MessageID_]),
Re: [PERFORM] query optimization question
On Thu, 2004-01-29 at 23:23, Dennis Bjorklund wrote: On Thu, 29 Jan 2004, Jack Coates wrote: Probably better to repost it as a gzip'd attachment. That should complete with a picture of the GUI version. 26k zipped, let's see if this makes it through. Are you sure you attached it? At least when it got here there was no attachment. argh; attached the 40K version which was in color, removed it to make the new one with greyscale and forgot to attach that. Here it is again: -- Jack Coates, Lyris Technologies Applications Engineer 510-549-4350 x148, [EMAIL PROTECTED] Interoperability is the keyword, uniformity is a dead end. --Olivier Fourdan pg-perf-sql-plan.tgz Description: application/compressed-tar ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] query optimization question
On Wed, 2004-01-28 at 18:04, Tom Lane wrote: Jack Coates [EMAIL PROTECTED] writes: I've got a query that needs some help, please. Is there a way to avoid all the looping? I've got freedom to work with the double-indented sections below ) AND (, but the initial select distinct wrapper is much more difficult to change. This is auto-generated code. Well, you're not going to get any serious improvement without a wholesale rewrite of the query --- I'd think that something driven by a GROUP BY memberid_ HAVING count(*) = whatever at the outer level would be a better way to approach it. As you have it, the system has no choice but to fully evaluate two very expensive subselects, from scratch, for each outer row. I hear you. There's definitely an understanding that this tool can generate some gnarly queries, and we want to redesign in a way that will allow some more intelligence to be applied to the problem. In the meantime, I'll be happy if PG grinds at the same level as other databases. MS-SQL completed that query in 25 minutes on a database with 31 times the data in it. Since I'm one of the bigger *nix fans around here, that doesn't make me happy. However... ( select count(*) from lyrActiveRecips, members_ a, outmail_ where lyrActiveRecips.UserName = a.UserNameLC_ and lyrActiveRecips.Domain = a.Domain_ and a.MemberID_ = members_.MemberID_ and outmail_.MessageID_ = lyrActiveRecips.MailingID Is memberid_ a unique identifier for members_, as one would think from the name? If so, can't you drop the join of members_ a in this subselect, and just use the corresponding fields from the outer table? ( select count(*) from lyrCompletedRecips, members_ a, outmail_ where a.MemberID_ = lyrCompletedRecips.MemberID and a.UserNameLC_ = members_.UserNameLC_ and a.Domain_ = members_.Domain_ and outmail_.MessageID_ = lyrCompletedRecips.MailingID Why are the join conditions different here from the other subselect? Can't you rephrase them the same as above, and then again remove the inner appearance of members_ ? regards, tom lane unfortunately, the column names are different between lyrcompletedrecips and lyractiverecips. However, one thing we were able to do is to reduce the number of queries by not trying to match across multiple lists. SELECT DISTINCT members_.emailaddr_, members_.memberid_ FROM members_ WHERE ( members_.List_='list1' AND members_.MemberType_='normal' AND members_.SubType_='mail' AND members_.emailaddr_ IS NOT NULL ) AND ( ( select count(*) from lyrActiveRecips, outmail_ where outmail_.MessageID_ = lyrActiveRecips.MailingID and outmail_.Type_ = 'list' and members_.MemberID_ = lyrActiveRecips.MemberID and lyrActiveRecips.NextAttempt '2004-01-20 00:00:00' ) + ( select count(*) from lyrCompletedRecips, outmail_ where members_.MemberID_ = lyrCompletedRecips.MemberID and outmail_.MessageID_ = lyrCompletedRecips.MailingID and outmail_.Type_ = 'list' and lyrCompletedRecips.FinalAttempt '2004-01-20 00:00:00' and lyrCompletedRecips.CompletionStatusID = 300 ) = 3 ); That completed in 3.5 minutes on MS-SQL. I killed the query this morning after 15 hours on PostgreSQL 7.4. I tried a GROUP BY memberid_ HAVING variation, which completed in 59 seconds on MS-SQL. I killed it after 35 minutes on PostgreSQL. On a more positive note, if you remember the benchmarking I was doing last month, PostgreSQL got some pretty good relative numbers. It requires a lot of hand-holding and tuning relative to MS-SQL, but it certainly beat the pants off of Oracle 8 and 9 for speed and ease of management. Oracle 8 was in fact unable to complete the uglier stress tests. I'll be working on a tuning recommendations white paper today. thanks for all the help, -- Jack Coates, Lyris Technologies Applications Engineer 510-549-4350 x148, [EMAIL PROTECTED] Interoperability is the keyword, uniformity is a dead end. --Olivier Fourdan ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] query optimization question
Jack Coates [EMAIL PROTECTED] writes: That completed in 3.5 minutes on MS-SQL. I killed the query this morning after 15 hours on PostgreSQL 7.4. I tried a GROUP BY memberid_ HAVING variation, which completed in 59 seconds on MS-SQL. I killed it after 35 minutes on PostgreSQL. Hm. I'd like to think that 7.4 would be competitive on grouping queries. What sort of plan did you get from it? regards, tom lane ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] query optimization question
Jack Coates [EMAIL PROTECTED] writes: jackdb=# explain SELECT DISTINCT members_.memberid_ jackdb-# FROM members_ jackdb-# WHERE ( members_.List_='list1' jackdb(# AND members_.MemberType_='normal' jackdb(# AND members_.SubType_='mail' jackdb(# AND members_.emailaddr_ IS NOT NULL ) jackdb-# GROUP BY memberid_ HAVING ( Um, that's not what I had in mind at all. Does GROUP BY actually do anything at all here? (You didn't answer me as to whether memberid_ is a unique identifier or not, but if it is, this GROUP BY is just an expensive no-op.) What I was envisioning was pulling the sub-selects up to the top level and using grouping to calculate the count(*) values for all memberids in parallel. Roughly speaking it would look like (again assuming memberid_ is unique) SELECT memberid_ FROM ( SELECT memberid_ FROM lyrActiveRecips, members_, outmail WHERE (all the conditions for this case) UNION ALL SELECT memberid_ FROM lyrCompletedRecips, members_, outmail WHERE (all the conditions for this case) ) GROUP BY memberid_ HAVING count(*) = 3; However, if you can't change the boilerplate part of your query then this is all blue-sky speculation anyway. What I'm actually more interested in is your statement that MSSQL can do the original query quickly. I find that a bit hard to believe because I don't see any relevant optimization techniques. Do they have any equivalent to EXPLAIN that would give some hint how they're doing it? regards, tom lane ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] query optimization question
On Thu, 2004-01-29 at 11:31, Tom Lane wrote: Jack Coates [EMAIL PROTECTED] writes: jackdb=# explain SELECT DISTINCT members_.memberid_ jackdb-# FROM members_ jackdb-# WHERE ( members_.List_='list1' jackdb(# AND members_.MemberType_='normal' jackdb(# AND members_.SubType_='mail' jackdb(# AND members_.emailaddr_ IS NOT NULL ) jackdb-# GROUP BY memberid_ HAVING ( Um, that's not what I had in mind at all. Does GROUP BY actually do anything at all here? (You didn't answer me as to whether memberid_ is a unique identifier or not, but if it is, this GROUP BY is just an expensive no-op.) Sorry for the misunderstanding. It should be unique, yes. What I was envisioning was pulling the sub-selects up to the top level and using grouping to calculate the count(*) values for all memberids in parallel. Roughly speaking it would look like (again assuming memberid_ is unique) SELECT memberid_ FROM ( SELECT memberid_ FROM lyrActiveRecips, members_, outmail WHERE (all the conditions for this case) UNION ALL SELECT memberid_ FROM lyrCompletedRecips, members_, outmail WHERE (all the conditions for this case) ) GROUP BY memberid_ HAVING count(*) = 3; However, if you can't change the boilerplate part of your query then this is all blue-sky speculation anyway. Got it now -- I'm running into some subquery errors trying to implement this, anyway. What I'm actually more interested in is your statement that MSSQL can do the original query quickly. I find that a bit hard to believe because I don't see any relevant optimization techniques. Do they have any equivalent to EXPLAIN that would give some hint how they're doing it? yup -- here it is. It will probably be a nasty mess after linewrap gets done with it, so let me know if you'd like me to post a copy on ftp. SELECT DISTINCT members_.memberid_ FROM members_ WHERE ( members_.List_='list1'AND members_.MemberType_='normal'AND members_.SubType_='mail' ) GROUP BY memberid_ HAVING ( ( select count(*) from lyrActiveRecips, outmail_where outmail 11 1 0 NULLNULL1 NULL102274.5NULL NULLNULL104.10356 NULLNULLSELECT 0 NULL |--Parallelism(Gather Streams)11 2 1 Parallelism Gather Streams NULLNULL102274.50.0 0.22011127 23 104.10356 [members_].[MemberID_] NULLPLAN_ROW-1 1.0 |--Filter(WHERE:(If ([Expr1006] IS NULL) then 0 else [Expr1006]+If ([Expr1012] IS NULL) then 0 else [Expr1012]=3)) 11 3 2 Filter Filter WHERE:(If ([Expr1006] IS NULL) then 0 else [Expr1006]+If ([Expr1012] IS NULL) then 0 else [Expr1012]=3) NULL102274.50.0 3.5393338 23 103.88345 [members_].[MemberID_] NULLPLAN_ROW-1 1.0 |--Hash Match(Right Outer Join, HASH:([lyrCompletedRecips].[MemberID])=([members_].[MemberID_]), RESIDUAL:([members_].[MemberID_]=[lyrCompletedRecips].[MemberID])) 11 4 3 Hash Match Right Outer Join HASH:([lyrCompletedRecips].[MemberID])=([members_].[MemberID_]), RESIDUAL:([members_].[MemberID_]=[lyrCompletedRecips].[MemberID]) NULL 4782883.5 0.0 21.874712 23 100.34412 [members_].[MemberID_], [Expr1006], [Expr1012] NULLPLAN_ROW-1 1.0 |--Compute Scalar(DEFINE:([Expr1012]=Convert([Expr1020]))) 11 5 4 Compute Scalar Compute Scalar DEFINE:([Expr1012]=Convert([Expr1020])) [Expr1012]=Convert([Expr1020]) 119575.35 0.0 1.3723248 15 4.3749919 [lyrCompletedRecips].[MemberID], [Expr1012] NULLPLAN_ROW-1 1.0 ||--Hash Match(Aggregate, HASH:([lyrCompletedRecips].[MemberID]), RESIDUAL:([lyrCompletedRecips].[MemberID]=[lyrCompletedRecips].[MemberID]) DEFINE:([Expr1020]=COUNT(*)))11 6 5 Hash Match Aggregate HASH:([lyrCompletedRecips].[MemberID]), RESIDUAL:([lyrCompletedRecips].[MemberID]=[lyrCompletedRecips].[MemberID]) [Expr1020]=COUNT(*) 119575.35 0.0 1.3723248 15 4.3749919 [lyrCompletedRecips].[MemberID], [Expr1020] NULLPLAN_ROW-1 1.0 | |--Parallelism(Repartition Streams, PARTITION COLUMNS:([lyrCompletedRecips].[MemberID])) 11 7 6 Parallelism Repartition Streams PARTITION COLUMNS:([lyrCompletedRecips].[MemberID]) NULL 119640.60.0 0.32407209 173 3.002667 [lyrCompletedRecips].[MemberID] NULLPLAN_ROW-1 1.0 | |--Nested Loops(Inner Join, OUTER REFERENCES:([outmail_].[MessageID_])) 11 8 7 Nested LoopsInner JoinOUTER REFERENCES:([outmail_].[MessageID_])NULL119640.60.0 0.75014657 173 2.6785948
Re: [PERFORM] query optimization question
On Thu, 2004-01-29 at 14:01, Tom Lane wrote: Probably better to repost it as a gzip'd attachment. That should protect the formatting and get it into the list archives. regards, tom lane complete with a picture of the GUI version. 26k zipped, let's see if this makes it through. -- Jack Coates, Lyris Technologies Applications Engineer 510-549-4350 x148, [EMAIL PROTECTED] Interoperability is the keyword, uniformity is a dead end. --Olivier Fourdan ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[PERFORM] query optimization question
Hi all, I've got a query that needs some help, please. Is there a way to avoid all the looping? I've got freedom to work with the double-indented sections below ) AND (, but the initial select distinct wrapper is much more difficult to change. This is auto-generated code. explain analyze SELECT DISTINCT members_.emailaddr_, members_.memberid_ FROM members_ WHERE ( members_.List_='list1' AND members_.MemberType_='normal' AND members_.SubType_='mail' AND members_.emailaddr_ IS NOT NULL ) AND ( ( select count(*) from lyrActiveRecips, members_ a, outmail_ where lyrActiveRecips.UserName = a.UserNameLC_ and lyrActiveRecips.Domain = a.Domain_ and a.MemberID_ = members_.MemberID_ and outmail_.MessageID_ = lyrActiveRecips.MailingID and outmail_.Type_ = 'list' and lyrActiveRecips.NextAttempt '2004-01-20 00:00:00' ) + ( select count(*) from lyrCompletedRecips, members_ a, outmail_ where a.MemberID_ = lyrCompletedRecips.MemberID and a.UserNameLC_ = members_.UserNameLC_ and a.Domain_ = members_.Domain_ and outmail_.MessageID_ = lyrCompletedRecips.MailingID and outmail_.Type_ = 'list' and lyrCompletedRecips.FinalAttempt '2004-01-20 00:00:00' and lyrCompletedRecips.CompletionStatusID = 300 ) = 3 ) ; QUERY PLAN - Unique (cost=537.06..537.07 rows=1 width=72) (actual time=114460.908..114460.908 rows=0 loops=1) - Sort (cost=537.06..537.06 rows=1 width=72) (actual time=114460.905..114460.905 rows=0 loops=1) Sort Key: emailaddr_, memberid_ - Index Scan using ix_members_list_notifyerr on members_ (cost=0.00..537.05 rows=1 width=72) (actual time=114460.893..114460.893 rows=0 loops=1) Index Cond: ((list_)::text = 'list1'::text) Filter: (((membertype_)::text = 'normal'::text) AND ((subtype_)::text = 'mail'::text) AND (emailaddr_ IS NOT NULL) AND (((subplan) + (subplan)) = 3)) SubPlan - Aggregate (cost=52.39..52.39 rows=1 width=0) (actual time=0.089..0.090 rows=1 loops=818122) - Hash Join (cost=47.55..52.39 rows=1 width=0) (actual time=0.086..0.086 rows=0 loops=818122) Hash Cond: (outer.memberid_ = inner.memberid) - Index Scan using ix_members_emaillc on members_ a (cost=0.00..4.83 rows=1 width=4) (actual time=0.077..0.081 rows=1 loops=818122) Index Cond: (((domain_)::text = ($2)::text) AND ((usernamelc_)::text = ($1)::text)) - Hash (cost=47.55..47.55 rows=1 width=4) (actual time=0.025..0.025 rows=0 loops=1) - Hash Join (cost=25.00..47.55 rows=1 width=4) (actual time=0.023..0.023 rows=0 loops=1) Hash Cond: (outer.messageid_ = inner.mailingid) - Seq Scan on outmail_ (cost=0.00..22.50 rows=6 width=4) (actual time=0.001..0.001 rows=0 loops=1) Filter: ((type_)::text = 'list'::text) - Hash (cost=25.00..25.00 rows=2 width=8) (actual time=0.003..0.003 rows=0 loops=1) - Seq Scan on lyrcompletedrecips (cost=0.00..25.00 rows=2 width=8) (actual time=0.001..0.001 rows=0 loops=1) Filter: ((finalattempt '2004-01-20 00:00:00'::timestamp without time zone) AND (completionstatusid = 300)) - Aggregate (cost=51.59..51.59 rows=1 width=0) (actual time=0.033..0.034 rows=1 loops=818122) - Hash Join (cost=27.35..51.59 rows=1 width=0) (actual time=0.028..0.028 rows=0 loops=818122) Hash Cond: (((outer.username)::text = (inner.usernamelc_)::text) AND ((outer.domain)::text = (inner.domain_)::text)) - Hash Join (cost=22.52..46.72 rows=3 width=211) (actual time=0.003..0.003 rows=0 loops=818122) Hash Cond: (outer.mailingid = inner.messageid_) - Seq Scan on lyractiverecips (cost=0.00..22.50 rows=334 width=215) (actual time=0.001..0.001 rows=0 loops=818122) Filter: (nextattempt '2004-01-20 00:00:00'::timestamp