Re: [PERFORM] Simple query: how to optimize

2005-10-28 Thread Roger Hand
On October 28, 2005 2:54 PM
Collin Peters wrote:
 I have two tables, one is called 'users' the other is 'user_activity'.
...
 I am trying to write a simple query that returns the last time each
 user logged into the system.  This is how the query looks at the
 moment:
 
 SELECT u.user_id, MAX(ua.activity_date)
 FROM pp_users u
 LEFT OUTER JOIN user_activity ua ON (u.user_id = ua.user_id AND
 ua.user_activity_type_id = 7)
 WHERE u.userstatus_id < 4
 AND age(u.joined_date) < interval '30 days'
 GROUP BY u.user_id

You're first joining against the entire user table, then filtering out the users
you don't need.

Instead, filter out the users you don't need first, then do the join:

SELECT users.user_id, MAX(ua.activity_date)
FROM 
(SELECT u.user_id 
FROM pp_users u
WHERE u.userstatus_id < 4
AND age(u.joined_date) < interval '30 days'
) users
LEFT OUTER JOIN user_activity ua 
  ON (users.user_id = ua.user_id 
  AND ua.user_activity_type_id = 7)
GROUP BY users.user_id

(disclaimer: I haven't actually tried this sql)
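Another way to express "latest row per user" in Postgres is DISTINCT ON. 
Something along these lines should be equivalent (again untested, just a 
sketch using the same tables and columns as above):

SELECT DISTINCT ON (u.user_id) u.user_id, ua.activity_date
FROM pp_users u
LEFT OUTER JOIN user_activity ua 
  ON (u.user_id = ua.user_id 
  AND ua.user_activity_type_id = 7)
WHERE u.userstatus_id < 4
AND age(u.joined_date) < interval '30 days'
ORDER BY u.user_id, ua.activity_date DESC;

DISTINCT ON keeps the first row per user_id in ORDER BY order, so sorting 
activity_date descending gives each user's most recent activity.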


Re: [PERFORM] [HACKERS] Query in SQL statement

2005-10-01 Thread Roger Hand
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] Behalf Of Jim C. Nasby
 Sent: Friday, September 30, 2005 4:49 PM
 Subject: Re: [PERFORM] [HACKERS] Query in SQL statement
 
 I suggest ditching the CamelCase and going with underline_separators.
 I'd also not use the bareword id, instead using bad_user_id. And I'd
 name the table bad_user. But that's just me. :)

I converted a db from MS SQL, where tables and fields were CamelCase, and 
just lowercased the DDL to create the tables.

So table and field names were all created in lowercase, but I didn't have to 
change any of the application code: the SELECT statements worked fine with 
mixed case.

-- sample DDL
CREATE TABLE testtable
(
  fieldone int4
);

insert into TestTable (fieldone) values (11);

-- These statements will both work:

-- lowercase
SELECT fieldone FROM testtable;

-- CamelCase
SELECT FieldOne FROM TestTable;
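
Both spellings work because PostgreSQL folds unquoted identifiers to 
lowercase before lookup. The exception is double-quoted identifiers, which 
are matched case-sensitively, so this variant would fail against the table 
created above:

-- Quoted identifiers are case-sensitive: there is no table named
-- "TestTable" here, only testtable, so this errors out.
SELECT "FieldOne" FROM "TestTable";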

-Roger

 Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]


Re: [PERFORM] Query plan looks OK, but slow I/O - settings advice?

2005-08-19 Thread Roger Hand
Jeffrey W. Baker wrote:
 On Tue, 2005-08-16 at 10:46 -0700, Roger Hand wrote:
 The disks are ext3 with journalling type of ordered, but this was later 
 changed to writeback with no apparent change in speed.
 
 They're on a Dell poweredge 6650 with LSI raid card, setup as follows:
 4 disks raid 10 for indexes (145GB) - sdc1
 6 disks raid 10 for data (220GB) - sdd1
 2 mirrored disks for logs - sdb1
 
 stripe size is 32k
 cache policy: cached io (am told the controller has bbu)
 write policy: write-back
 read policy: readahead
 
 I assume you are using Linux 2.6.  

Oops, sorry I left that out. Nope, we're on 2.4:

[EMAIL PROTECTED] ~]$ uname -a
Linux xxx.xxx.xxx 2.4.21-27.0.2.ELsmp #1 SMP Wed Jan 12 23:35:44 EST 2005 i686 i686 i386 GNU/Linux

It's RedHat Enterprise AS 3.0 (Fri Nov  5 17:55:14 PST 2004).

 Have you considered booting your
 machine with elevator=deadline? 

I just did a little Googling and see that the 2.4 kernel didn't have a decent 
elevator tuning system, and that was fixed in 2.6. Hmmm 
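
For the record, on 2.6 the scheduler can be picked at boot time by appending 
elevator=deadline to the kernel line in grub.conf. Something like this 
(untested here, and the kernel path/version is just an example):

# /boot/grub/grub.conf -- example entry, adjust paths to taste
kernel /vmlinuz-2.6.9 ro root=/dev/sda1 elevator=deadline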

Thanks for the ideas ...

-Roger


Re: [PERFORM] Need for speed

2005-08-18 Thread Roger Hand
 Ulrich Wisser wrote:
 
  one of our services is click counting for on line advertising. We do
  this by importing Apache log files every five minutes. This results in a
  lot of insert and delete statements. 
...
 If you are doing mostly inserting, make sure you are in a transaction,

Well, yes, but you may need to make sure that a single transaction doesn't 
have too many inserts in it. I was having a performance problem with 
transactions containing a huge number of inserts (tens of thousands), and I 
solved it by putting a simple counter in the loop (in the Java import code, 
that is) and committing every 100 or so inserts.
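
At the SQL level the pattern is simply this (a sketch only; the table and 
columns are made up for illustration, and the real loop lives in the Java 
importer):

-- batch the import into moderate-sized transactions
BEGIN;
INSERT INTO click_log (url, hit_time) VALUES ('/ad/1', now());
INSERT INTO click_log (url, hit_time) VALUES ('/ad/2', now());
-- ... roughly 100 inserts per batch ...
COMMIT;
-- then start the next batch
BEGIN;
-- ...
COMMIT;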

-Roger

 John

  Ulrich


[PERFORM] Query plan looks OK, but slow I/O - settings advice?

2005-08-18 Thread Roger Hand
Summary
===
We are writing to the db pretty much 24 hours a day.
Recently the amount of data we write has increased, and the query speed, 
formerly okay, has taken a dive.
The query is using the indexes as expected, so I don't _think_ I have a query 
tuning issue, just an I/O problem. 
The first time a query is done it takes about 60 seconds. The second time it 
runs in about 6 seconds.
What I know I need advice on is I/O settings and various buffer settings. 
I may also need advice on other things, but just don't know it yet!

Below is ...
- an explain analyze
- details of the db setup and hardware
- some vmstat and iostat output showing the disks are very busy
- the SHOW ALL output for the db config.

Details
===
Postgres 8.0.3

Below is a sample query. (This is actually implemented as a prepared statement. 
Here I fill in the '?'s with actual values.)

electric=# EXPLAIN ANALYZE
electric-# SELECT datavalue, logfielddatatype, timestamp FROM logdata_recent 
electric-# WHERE (logfielddatatype = 70 OR logfielddatatype = 71 OR 
logfielddatatype = 69) 
electric-# AND graphtargetlog = 1327 
electric-# AND timestamp >= 1123052400 AND timestamp <= 1123138800 
electric-# ORDER BY timestamp;


                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Sort  (cost=82.48..82.50 rows=6 width=14) (actual time=60208.968..60211.232 rows=2625 loops=1)
   Sort Key: public.logdata_recent.timestamp
   ->  Result  (cost=0.00..82.41 rows=6 width=14) (actual time=52.483..60200.868 rows=2625 loops=1)
         ->  Append  (cost=0.00..82.41 rows=6 width=14) (actual time=52.476..60189.929 rows=2625 loops=1)
               ->  Seq Scan on logdata_recent  (cost=0.00..46.25 rows=1 width=14) (actual time=0.003..0.003 rows=0 loops=1)
                     Filter: (((logfielddatatype = 70) OR (logfielddatatype = 71) OR (logfielddatatype = 69)) AND (graphtargetlog = 1327) AND (timestamp >= 1123052400) AND (timestamp <= 1123138800))
               ->  Index Scan using logdata_recent_1123085306_ix_t_fld_gtl, logdata_recent_1123085306_ix_t_fld_gtl, logdata_recent_1123085306_ix_t_fld_gtl on logdata_recent_stale logdata_recent  (cost=0.00..18.08 rows=3 width=14) (actual time=52.465..60181.624 rows=2625 loops=1)
                     Index Cond: (((timestamp >= 1123052400) AND (timestamp <= 1123138800) AND (logfielddatatype = 70) AND (graphtargetlog = 1327)) OR ((timestamp >= 1123052400) AND (timestamp <= 1123138800) AND (logfielddatatype = 71) AND (graphtargetlog = 1327)) OR ((timestamp >= 1123052400) AND (timestamp <= 1123138800) AND (logfielddatatype = 69) AND (graphtargetlog = 1327)))
                     Filter: (((logfielddatatype = 70) OR (logfielddatatype = 71) OR (logfielddatatype = 69)) AND (graphtargetlog = 1327) AND (timestamp >= 1123052400) AND (timestamp <= 1123138800))
               ->  Index Scan using logdata_recent_1123139634_ix_t_fld_gtl, logdata_recent_1123139634_ix_t_fld_gtl, logdata_recent_1123139634_ix_t_fld_gtl on logdata_recent_active logdata_recent  (cost=0.00..18.08 rows=2 width=14) (actual time=0.178..0.178 rows=0 loops=1)
                     Index Cond: (((timestamp >= 1123052400) AND (timestamp <= 1123138800) AND (logfielddatatype = 70) AND (graphtargetlog = 1327)) OR ((timestamp >= 1123052400) AND (timestamp <= 1123138800) AND (logfielddatatype = 71) AND (graphtargetlog = 1327)) OR ((timestamp >= 1123052400) AND (timestamp <= 1123138800) AND (logfielddatatype = 69) AND (graphtargetlog = 1327)))
                     Filter: (((logfielddatatype = 70) OR (logfielddatatype = 71) OR (logfielddatatype = 69)) AND (graphtargetlog = 1327) AND (timestamp >= 1123052400) AND (timestamp <= 1123138800))
 Total runtime: 60214.545 ms
(13 rows)

60 seconds is much longer than it used to be. I would guess it used to be under 
10 seconds. The second time the above query is run we see the magic of caching 
as the time goes down to 6 seconds.

logdata_recent_active and logdata_recent_stale are inherited tables of 
logdata_recent, which never has any data. (This is pseudo-partitioning in 
action!)
So the very quick seq_scan on the empty logdata_recent parent table is okay 
with me.
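
(For anyone curious, the setup is roughly the following; the column types 
here are from memory and may not match the real schema exactly:)

CREATE TABLE logdata_recent (
    timestamp        int4,
    logfielddatatype int4,
    graphtargetlog   int4,
    datavalue        float4
);
-- the children hold all the rows; the parent stays empty
CREATE TABLE logdata_recent_active () INHERITS (logdata_recent);
CREATE TABLE logdata_recent_stale  () INHERITS (logdata_recent);

Queries against logdata_recent then automatically cover both children, which 
is what the Append node in the plan above is doing.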

The index is built on (timestamp, logfielddatatype, graphtargetlog). I am 
curious as to why the same index shows up 3 times in the using clause (my 
guess is it's simply listed once per arm of the three-way OR), but I can live 
without knowing the details as long as it doesn't indicate that something's 
wrong.
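
The DDL for it is along these lines (a sketch; the real index names carry the 
timestamp suffix shown in the plan):

CREATE INDEX logdata_recent_stale_ix_t_fld_gtl
    ON logdata_recent_stale (timestamp, logfielddatatype, graphtargetlog);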

The logdata_recent_stale table has 5 million rows. The size of the table 
itself, on disk, is 324MB. The size of the index is 210MB.

The disks are ext3 with journalling type of ordered, but this was later 
changed to writeback with no apparent change in speed.

Re: [PERFORM] What needs to be done for real Partitioning?

2005-04-26 Thread Roger Hand
On March 21, 2005 8:07 AM, Hannu Krosing wrote:
 On Sat, 2005-03-19 at 23:47 -0500, Tom Lane wrote:
  Well, partitioning on the primary key would be Good Enough for 95% or
  99% of the real problems out there.  I'm not excited about adding a
  large chunk of complexity to cover another few percent.
 
 Are you sure that partitioning on anything other than the PK would be
 significantly harder?
 
 I have a case where I do manual partitioning over start_time
 (timestamp), but the PK is an id from a sequence. They are almost, but
 not exactly in the same order. And I don't think that moving the PK to
 be (start_time, id) just because of partitioning on PK only would be a
 good design in any way.
 
 So please don't design the system to partition on PK only.

I agree. I have used table inheritance to implement pseudo-partitioning, and 
I am very pleased with the results so far. Real partitioning would be even 
better, but I am partitioning by timestamp, and this is not the PK, and I 
don't wish to make it one.

-Roger
