Re: [PERFORM] Simple query: how to optimize

2005-10-28 Thread Roger Hand
On October 28, 2005 2:54 PM
Collin Peters wrote:
 I have two tables, one is called 'users' the other is 'user_activity'.
...
 I am trying to write a simple query that returns the last time each
 user logged into the system.  This is how the query looks at the
 moment:
 
 SELECT u.user_id, MAX(ua.activity_date)
 FROM pp_users u
 LEFT OUTER JOIN user_activity ua ON (u.user_id = ua.user_id AND
 ua.user_activity_type_id = 7)
 WHERE u.userstatus_id  4
 AND age(u.joined_date)  interval '30 days'
 GROUP BY u.user_id

You're first joining against the entire user table, then filtering out the users
you don't need.

Instead, filter out the users you don't need first, then do the join:

SELECT users.user_id, MAX(ua.activity_date)
FROM 
(SELECT u.user_id 
FROM pp_users u
WHERE u.userstatus_id  4
AND age(u.joined_date)  interval '30 days'
) users
LEFT OUTER JOIN user_activity ua 
  ON (users.user_id = ua.user_id 
  AND ua.user_activity_type_id = 7)
GROUP BY users.user_id

(disclaimer: I haven't actually tried this sql)

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Simple query: how to optimize

2005-10-28 Thread Collin Peters
These two queries execute at exactly the same speed.  When I run run
EXPLAIN on them both they return the *exact* same query plan as well. 
I find this strange... but it is also kind of what I expected from
reading up on various things.  I am under the impression the
postgresql will break up your query and run it as it sees best.   So
in the case of these two queries... it seems it is actually almost
converting one into the other.  Maybe I am wrong.

Is there a good resource list somewhere for postgresql query
optimization?  There are entire books devoted to the subject for
oracle but I can't find more than a few small articles on postgresql
query optimizations on the web.

Regards,
Collin

On 10/28/05, Roger Hand [EMAIL PROTECTED] wrote:
  SELECT u.user_id, MAX(ua.activity_date)
  FROM pp_users u
  LEFT OUTER JOIN user_activity ua ON (u.user_id = ua.user_id AND
  ua.user_activity_type_id = 7)
  WHERE u.userstatus_id  4
  AND age(u.joined_date)  interval '30 days'
  GROUP BY u.user_id

 You're first joining against the entire user table, then filtering out the 
 users
 you don't need.

 Instead, filter out the users you don't need first, then do the join:

 SELECT users.user_id, MAX(ua.activity_date)
 FROM
 (SELECT u.user_id
 FROM pp_users u
 WHERE u.userstatus_id  4
 AND age(u.joined_date)  interval '30 days'
 ) users
 LEFT OUTER JOIN user_activity ua
   ON (users.user_id = ua.user_id
   AND ua.user_activity_type_id = 7)
 GROUP BY users.user_id

 (disclaimer: I haven't actually tried this sql)


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Simple query: how to optimize

2005-10-28 Thread Collin Peters
A little bit more on my last post that I forget to mention.  The two
queries run at the same speed and have the same plan only if I have an
index on the user_activity.user_id column.  Otherwise they run at
different speeds.  The query you gave me actually runs slower without
the index.  All  this is making my head spin!! :O

On 10/28/05, Collin Peters [EMAIL PROTECTED] wrote:
 These two queries execute at exactly the same speed.  When I run run
 EXPLAIN on them both they return the *exact* same query plan as well.
 I find this strange... but it is also kind of what I expected from
 reading up on various things.  I am under the impression the
 postgresql will break up your query and run it as it sees best.   So
 in the case of these two queries... it seems it is actually almost
 converting one into the other.  Maybe I am wrong.

 Is there a good resource list somewhere for postgresql query
 optimization?  There are entire books devoted to the subject for
 oracle but I can't find more than a few small articles on postgresql
 query optimizations on the web.

 Regards,
 Collin

 On 10/28/05, Roger Hand [EMAIL PROTECTED] wrote:
   SELECT u.user_id, MAX(ua.activity_date)
   FROM pp_users u
   LEFT OUTER JOIN user_activity ua ON (u.user_id = ua.user_id AND
   ua.user_activity_type_id = 7)
   WHERE u.userstatus_id  4
   AND age(u.joined_date)  interval '30 days'
   GROUP BY u.user_id
 
  You're first joining against the entire user table, then filtering out the 
  users
  you don't need.
 
  Instead, filter out the users you don't need first, then do the join:
 
  SELECT users.user_id, MAX(ua.activity_date)
  FROM
  (SELECT u.user_id
  FROM pp_users u
  WHERE u.userstatus_id  4
  AND age(u.joined_date)  interval '30 days'
  ) users
  LEFT OUTER JOIN user_activity ua
ON (users.user_id = ua.user_id
AND ua.user_activity_type_id = 7)
  GROUP BY users.user_id
 
  (disclaimer: I haven't actually tried this sql)
 


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Simple query: how to optimize

2005-10-28 Thread Steinar H. Gunderson
On Fri, Oct 28, 2005 at 03:40:40PM -0700, Roger Hand wrote:
 You're first joining against the entire user table, then filtering out the 
 users
 you don't need.

That's just wrong, sorry -- the planner is perfectly able to push the WHERE
down before the join.

I'd guess the problem is the age() query; age() doesn't really return what
you'd expect, and I don't think it can use an index easily (I might be wrong
here, though). Instead, try something like

  WHERE u.joined_date = current_date - interval '30 days'

except that if you're running pre-8.0, you might want to precalculate the
right-hand side on the client.

I couldn't see EXPLAIN ANALYZE of your query, BTW -- having it would be
useful.

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Simple query: how to optimize

2005-10-28 Thread PostgreSQL
Postgres is somewhat speed-challenged on aggregate functions.
The most-repeated work-around would be something like:

SELECT u.user_id,
(SELECT activity_date
  FROM user_activity
  WHERE user_activity.user_id = pp_users.user_id
  AND user_activity_type_id = 7
  ORDER BY activity_date DESC
  LIMIT 1)
FROM pp_users u
WHERE u.userstatus_id  4
AND age(u.joined_date)  interval '30 days'

(code above is untested) I've read that aggregate functions are
improved in the 8.1 code.  I'm running 8.1beta3 on one machine
but haven't experimented to verify the claimed improvements.

Martin Nickel

Collin Peters [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]
I have two tables, one is called 'users' the other is 'user_activity'.
 The 'users' table simply contains the users in the system there is
 about 30,000 rows.  The 'user_activity' table stores the activities
 the user has taken.  This table has about 430,000 rows and also
 (notably) has a column which tracks the type of activity.  90% of the
 table is type 7 which indicates the user logged into the system.

 I am trying to write a simple query that returns the last time each
 user logged into the system.  This is how the query looks at the
 moment:

 SELECT u.user_id, MAX(ua.activity_date)
 FROM pp_users u
 LEFT OUTER JOIN user_activity ua ON (u.user_id = ua.user_id AND
 ua.user_activity_type_id = 7)
 WHERE u.userstatus_id  4
 AND age(u.joined_date)  interval '30 days'
 GROUP BY u.user_id

 The above query takes about 5 seconds but I'm wondering how it can be
 optimized.  When the query is formatted as above it does use an index
 on the user_id column of the user_activity table... but the cost is
 huge (cost=0.00..1396700.80).

 I have tried formatting it another way with a sub-query but it takes
 about the same amount to completed:

 SELECT u.user_id, ua.last
 FROM pp_users u
 LEFT OUTER JOIN (SELECT max(activity_date) as last, user_id FROM
 user_activity WHERE user_activity_type_id = 7 GROUP BY user_id) as ua
 ON (u.user_id = ua.user_id)
 WHERE u.userstatus_id  4
 AND age(u.joined_date)  interval '30 days'

 Can anybody offer any pointers on this scenario?

 Regards,
 Collin

 ---(end of broadcast)---
 TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly
 



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org