Re: [PERFORM] Possibly slow query
Manfred, Yeah, that was a typo. It should have been ASet.Value IS NULL. I have considered storing the setting names by key, since I do have a separate table with the names and a key as you suggest, but since my application is only ~75% finished, it's still pretty important to have human readable/editable tables. Thanks, Peter Darley -Original Message- From: Manfred Koizar [mailto:[EMAIL PROTECTED] Sent: Monday, January 31, 2005 3:06 AM To: Peter Darley Cc: Richard Huxton; Pgsql-Performance Subject: Re: [PERFORM] Possibly slow query On Wed, 26 Jan 2005 07:16:25 -0800, Peter Darley [EMAIL PROTECTED] wrote: SELECT User_ID FROM Assignments A NATURAL LEFT JOIN (SELECT * FROM Assignment_Settings WHERE Setting='Status') ASet WHERE A.User_ID IS NOT NULL AND ASet.Assignment_ID IS NULL GROUP BY User_ID; ASet.Assignment_ID IS NULL and value IS NULL as you had in your original post don't necessarily result in the same set of rows. SELECT DISTINCT a.User_ID FROM Assignments a LEFT JOIN Assignment_Settings s ON (a.Assignment_ID=s.Assignment_ID AND s.Setting='Status') WHERE a.User_ID IS NOT NULL AND s.Value IS NULL; Note how the join condition can contain subexpressions that only depend on columns from one table. BTW, |neo=# \d assignment_settings | [...] | setting | character varying(250) | not null | [...] |Indexes: |[...] |assignment_settings_assignment_id_setting unique, btree (assignment_id, setting) storing the setting names in their own table and referencing them by id might speed up some queries (and slow down others). Certainly worth a try ... Servus Manfred ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Possibly slow query
Peter Darley wrote: Folks, I'm using PostgreSQL 7.4.1 on Linux, and I'm trying to figure out weather a query I have is going to be slow when I have more information in my tables. both tables involved will likely have ~500K rows within a year or so. Specifically I can't tell if I'm causing myself future problems with the subquery, and should maybe re-write the query to use a join. The reason I went with the subquery is that I don't know weather a row in Assignments will have a corresponding row in Assignment_Settings The query is: SELECT User_ID FROM Assignments A WHERE A.User_ID IS NOT NULL AND (SELECT Value FROM Assignment_Settings WHERE Setting='Status' AND Assignment_ID=A.Assignment_ID) IS NULL GROUP BY User_ID; You could always use a LEFT JOIN instead, like you say. I'd personally be tempted to select distinct user_id's then join, but it depends on how many of each. You're not going to know for sure whether you'll have problems without testing. Generate 500k rows of plausible looking test-data and give it a try. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Possibly slow query
Peter Darley wrote: Folks, I'm using PostgreSQL 7.4.1 on Linux Oh, and move to the latest in the 7.4 series too. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Possibly slow query
Richard, I tried a left join, which has to be a little weird, because there may or may not be a corresponding row in Assignment_Settings for each Assignment, and they may or may not have Setting='Status', so I came up with: SELECT User_ID FROM Assignments A NATURAL LEFT JOIN (SELECT * FROM Assignment_Settings WHERE Setting='Status') ASet WHERE A.User_ID IS NOT NULL AND ASet.Assignment_ID IS NULL GROUP BY User_ID; Which explain analyze is saying takes 0.816 ms as compared to 0.163 ms for my other query. So, I'm not sure that I'm writing the best LEFT JOIN that I can. Also, I suspect that these ratios wouldn't hold as the data got bigger and started using indexes, etc. I'll mock up a couple of tables with a bunch of data and see how things go. It would be nice to understand WHY I get the results I get, which I'm not sure I will. I'm not sure what you mean by selecting a distinct User_ID first. Since I'm joining the tables on Assignment_ID, I'm not sure how I'd do a distinct before the join (because I'd lose Assignment_ID). I was also under the impression that group by was likely to be faster than a distinct, tho I can't really recall where I got that idea from. Thanks for your suggestions! Peter Darley -Original Message- From: Richard Huxton [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 26, 2005 1:36 AM To: Peter Darley Cc: Pgsql-Performance Subject: Re: [PERFORM] Possibly slow query Peter Darley wrote: Folks, I'm using PostgreSQL 7.4.1 on Linux, and I'm trying to figure out weather a query I have is going to be slow when I have more information in my tables. both tables involved will likely have ~500K rows within a year or so. Specifically I can't tell if I'm causing myself future problems with the subquery, and should maybe re-write the query to use a join. The reason I went with the subquery is that I don't know weather a row in Assignments will have a corresponding row in Assignment_Settings The query is: SELECT User_ID FROM Assignments A WHERE A.User_ID IS NOT NULL AND (SELECT Value FROM Assignment_Settings WHERE Setting='Status' AND Assignment_ID=A.Assignment_ID) IS NULL GROUP BY User_ID; You could always use a LEFT JOIN instead, like you say. I'd personally be tempted to select distinct user_id's then join, but it depends on how many of each. You're not going to know for sure whether you'll have problems without testing. Generate 500k rows of plausible looking test-data and give it a try. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
[PERFORM] Possibly slow query
Folks, I'm using PostgreSQL 7.4.1 on Linux, and I'm trying to figure out weather a query I have is going to be slow when I have more information in my tables. both tables involved will likely have ~500K rows within a year or so. Specifically I can't tell if I'm causing myself future problems with the subquery, and should maybe re-write the query to use a join. The reason I went with the subquery is that I don't know weather a row in Assignments will have a corresponding row in Assignment_Settings The query is: SELECT User_ID FROM Assignments A WHERE A.User_ID IS NOT NULL AND (SELECT Value FROM Assignment_Settings WHERE Setting='Status' AND Assignment_ID=A.Assignment_ID) IS NULL GROUP BY User_ID; The tables and an explain analyze of the query are as follows: neo=# \d assignments; Table shopper.assignments Column | Type | Modifiers ---++--- -- assignment_id | integer| not null default nextval('shopper.assignments_assignment_id_seq'::text) sample_id | integer| not null user_id | integer| time | timestamp(0) without time zone | not null default now() address_id| integer| Indexes: assignments_pkey primary key, btree (assignment_id) assignments_sample_id unique, btree (sample_id) assignments_address_id btree (address_id) assignments_user_id btree (user_id) Triggers: assignments_check_assignment BEFORE INSERT ON assignments FOR EACH ROW EXECUTE PROCEDURE check_assignment() neo=# \d assignment_settings Table shopper.assignment_settings Column | Type | Modifiers ---++--- -- assignment_setting_id | integer| not null default nextval('shopper.assignment_settings_assignment_setting_id_seq'::text) assignment_id | integer| not null setting | character varying(250) | not null value | text | Indexes: assignment_settings_pkey primary key, btree (assignment_setting_id) assignment_settings_assignment_id_setting unique, btree (assignment_id, setting) neo=# explain analyze SELECT User_ID FROM Assignments A WHERE A.User_ID IS NOT NULL AND (SELECT Value FROM Assignment_Settings WHERE Setti ng='Status' AND Assignment_ID=A.Assignment_ID) IS NULL GROUP BY User_ID; QUERY PLAN HashAggregate (cost=1.01..1.01 rows=1 width=4) (actual time=0.057..0.058 rows=1 loops=1) - Seq Scan on assignments a (cost=0.00..1.01 rows=1 width=4) (actual time=0.033..0.040 rows=2 loops=1) Filter: ((user_id IS NOT NULL) AND ((subplan) IS NULL)) SubPlan - Seq Scan on assignment_settings (cost=0.00..0.00 rows=1 width=13) (actual time=0.001..0.001 rows=0 loops=2) Filter: (((setting)::text = 'Status'::text) AND (assignment_id = $0)) Total runtime: 0.159 ms (7 rows) Thanks in advance for any help! Thanks, Peter Darley ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]