Re: [PERFORM] Possibly slow query

2005-01-31 Thread Peter Darley
Manfred,
Yeah, that was a typo.  It should have been ASet.Value IS NULL.
I have considered storing the setting names by key, since I do have a
separate table with the names and a key as you suggest, but since my
application is only ~75% finished, it's still pretty important to have human
readable/editable tables.
Thanks,
Peter Darley

-Original Message-
From: Manfred Koizar [mailto:[EMAIL PROTECTED]
Sent: Monday, January 31, 2005 3:06 AM
To: Peter Darley
Cc: Richard Huxton; Pgsql-Performance
Subject: Re: [PERFORM] Possibly slow query


On Wed, 26 Jan 2005 07:16:25 -0800, Peter Darley
[EMAIL PROTECTED] wrote:
SELECT User_ID
FROM Assignments A NATURAL LEFT JOIN (SELECT * FROM Assignment_Settings
WHERE Setting='Status') ASet
WHERE A.User_ID IS NOT NULL
   AND ASet.Assignment_ID IS NULL
GROUP BY User_ID;

ASet.Assignment_ID IS NULL and value IS NULL as you had in your
original post don't necessarily result in the same set of rows.

SELECT DISTINCT a.User_ID
  FROM Assignments a
   LEFT JOIN Assignment_Settings s
  ON (a.Assignment_ID=s.Assignment_ID
  AND s.Setting='Status')
 WHERE a.User_ID IS NOT NULL
   AND s.Value IS NULL;

Note how the join condition can contain subexpressions that only depend
on columns from one table.

BTW,
|neo=# \d assignment_settings
| [...]
| setting   | character varying(250) | not null
| [...]
|Indexes:
|[...]
|assignment_settings_assignment_id_setting unique, btree
(assignment_id, setting)

storing the setting names in their own table and referencing them by id
might speed up some queries (and slow down others).  Certainly worth a
try ...

Servus
 Manfred


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Possibly slow query

2005-01-26 Thread Richard Huxton
Peter Darley wrote:
Folks,
I'm using PostgreSQL 7.4.1 on Linux, and I'm trying to figure out 
weather a
query I have is going to be slow when I have more information in my tables.
both tables involved will likely have ~500K rows within a year or so.
Specifically I can't tell if I'm causing myself future problems with the
subquery, and should maybe re-write the query to use a join.  The reason I
went with the subquery is that I don't know weather a row in Assignments
will have a corresponding row in Assignment_Settings
The query is:
SELECT User_ID
FROM Assignments A
WHERE A.User_ID IS NOT NULL
AND (SELECT Value FROM Assignment_Settings WHERE Setting='Status' AND
Assignment_ID=A.Assignment_ID) IS NULL
GROUP BY User_ID;
You could always use a LEFT JOIN instead, like you say. I'd personally 
be tempted to select distinct user_id's then join, but it depends on how 
many of each.

You're not going to know for sure whether you'll have problems without 
testing. Generate 500k rows of plausible looking test-data and give it a 
try.

--
  Richard Huxton
  Archonet Ltd
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faq


Re: [PERFORM] Possibly slow query

2005-01-26 Thread Richard Huxton
Peter Darley wrote:
Folks,
	I'm using PostgreSQL 7.4.1 on Linux
Oh, and move to the latest in the 7.4 series too.
--
  Richard Huxton
  Archonet Ltd
---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] Possibly slow query

2005-01-26 Thread Peter Darley
Richard,
I tried a left join, which has to be a little weird, because there may 
or
may not be a corresponding row in Assignment_Settings for each Assignment,
and they may or may not have Setting='Status', so I came up with:

SELECT User_ID
FROM Assignments A NATURAL LEFT JOIN (SELECT * FROM Assignment_Settings
WHERE Setting='Status') ASet
WHERE A.User_ID IS NOT NULL
AND ASet.Assignment_ID IS NULL
GROUP BY User_ID;

Which explain analyze is saying takes 0.816 ms as compared to 0.163 ms 
for
my other query.  So, I'm not sure that I'm writing the best LEFT JOIN that I
can.  Also, I suspect that these ratios wouldn't hold as the data got bigger
and started using indexes, etc.  I'll mock up a couple of tables with a
bunch of data and see how things go.  It would be nice to understand WHY I
get the results I get, which I'm not sure I will.

I'm not sure what you mean by selecting a distinct User_ID first.  Since
I'm joining the tables on Assignment_ID, I'm not sure how I'd do a distinct
before the join (because I'd lose Assignment_ID).  I was also under the
impression that group by was likely to be faster than a distinct, tho I
can't really recall where I got that idea from.

Thanks for your suggestions!
Peter Darley

-Original Message-
From: Richard Huxton [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 26, 2005 1:36 AM
To: Peter Darley
Cc: Pgsql-Performance
Subject: Re: [PERFORM] Possibly slow query


Peter Darley wrote:
 Folks,

   I'm using PostgreSQL 7.4.1 on Linux, and I'm trying to figure out 
 weather
a
 query I have is going to be slow when I have more information in my
tables.
 both tables involved will likely have ~500K rows within a year or so.

   Specifically I can't tell if I'm causing myself future problems with the
 subquery, and should maybe re-write the query to use a join.  The reason I
 went with the subquery is that I don't know weather a row in Assignments
 will have a corresponding row in Assignment_Settings

   The query is:
 SELECT User_ID
 FROM Assignments A
 WHERE A.User_ID IS NOT NULL
   AND (SELECT Value FROM Assignment_Settings WHERE Setting='Status' AND
 Assignment_ID=A.Assignment_ID) IS NULL
 GROUP BY User_ID;

You could always use a LEFT JOIN instead, like you say. I'd personally
be tempted to select distinct user_id's then join, but it depends on how
many of each.

You're not going to know for sure whether you'll have problems without
testing. Generate 500k rows of plausible looking test-data and give it a
try.

--
   Richard Huxton
   Archonet Ltd


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


[PERFORM] Possibly slow query

2005-01-25 Thread Peter Darley
Folks,

I'm using PostgreSQL 7.4.1 on Linux, and I'm trying to figure out 
weather a
query I have is going to be slow when I have more information in my tables.
both tables involved will likely have ~500K rows within a year or so.

Specifically I can't tell if I'm causing myself future problems with the
subquery, and should maybe re-write the query to use a join.  The reason I
went with the subquery is that I don't know weather a row in Assignments
will have a corresponding row in Assignment_Settings

The query is:
SELECT User_ID
FROM Assignments A
WHERE A.User_ID IS NOT NULL
AND (SELECT Value FROM Assignment_Settings WHERE Setting='Status' AND
Assignment_ID=A.Assignment_ID) IS NULL
GROUP BY User_ID;

The tables and an explain analyze of the query are as follows:

neo=# \d assignments;
   Table shopper.assignments
Column |  Type  |
Modifiers
---++---
--
 assignment_id | integer| not null default
nextval('shopper.assignments_assignment_id_seq'::text)
 sample_id | integer| not null
 user_id   | integer|
 time  | timestamp(0) without time zone | not null default now()
 address_id| integer|
Indexes:
assignments_pkey primary key, btree (assignment_id)
assignments_sample_id unique, btree (sample_id)
assignments_address_id btree (address_id)
assignments_user_id btree (user_id)
Triggers:
assignments_check_assignment BEFORE INSERT ON assignments FOR EACH ROW
EXECUTE PROCEDURE check_assignment()

neo=# \d assignment_settings
   Table
shopper.assignment_settings
Column |  Type  |
Modifiers
---++---
--
 assignment_setting_id | integer| not null default
nextval('shopper.assignment_settings_assignment_setting_id_seq'::text)
 assignment_id | integer| not null
 setting   | character varying(250) | not null
 value | text   |
Indexes:
assignment_settings_pkey primary key, btree (assignment_setting_id)
assignment_settings_assignment_id_setting unique, btree
(assignment_id, setting)

neo=# explain analyze SELECT User_ID FROM Assignments A WHERE A.User_ID IS
NOT NULL AND (SELECT Value FROM Assignment_Settings WHERE Setti
ng='Status' AND Assignment_ID=A.Assignment_ID) IS NULL GROUP BY User_ID;
 QUERY PLAN


 HashAggregate  (cost=1.01..1.01 rows=1 width=4) (actual time=0.057..0.058
rows=1 loops=1)
   -  Seq Scan on assignments a  (cost=0.00..1.01 rows=1 width=4) (actual
time=0.033..0.040 rows=2 loops=1)
 Filter: ((user_id IS NOT NULL) AND ((subplan) IS NULL))
 SubPlan
   -  Seq Scan on assignment_settings  (cost=0.00..0.00 rows=1
width=13) (actual time=0.001..0.001 rows=0 loops=2)
 Filter: (((setting)::text = 'Status'::text) AND
(assignment_id = $0))
 Total runtime: 0.159 ms
(7 rows)


Thanks in advance for any help!

Thanks,
Peter Darley


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]