Re: [PERFORM] Simple query: how to optimize
On October 28, 2005 2:54 PM Collin Peters wrote:
> I have two tables, one is called 'users', the other is 'user_activity'. ...
> I am trying to write a simple query that returns the last time each user
> logged into the system. This is how the query looks at the moment:
>
> SELECT u.user_id, MAX(ua.activity_date)
> FROM pp_users u
> LEFT OUTER JOIN user_activity ua
>   ON (u.user_id = ua.user_id AND ua.user_activity_type_id = 7)
> WHERE u.userstatus_id <> 4
>   AND age(u.joined_date) > interval '30 days'
> GROUP BY u.user_id

You're first joining against the entire user table, then filtering out the
users you don't need. Instead, filter out the users you don't need first,
then do the join:

SELECT users.user_id, MAX(ua.activity_date)
FROM (SELECT u.user_id
      FROM pp_users u
      WHERE u.userstatus_id <> 4
        AND age(u.joined_date) > interval '30 days') users
LEFT OUTER JOIN user_activity ua
  ON (users.user_id = ua.user_id AND ua.user_activity_type_id = 7)
GROUP BY users.user_id

(disclaimer: I haven't actually tried this SQL)
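Depending on the data distribution, a supporting index on the activity table
may also help the join. This is only a suggestion to test with EXPLAIN
ANALYZE, not something from the original thread; the index name below is made
up:

-- hypothetical supporting index for the login-activity lookups;
-- partial on the single activity type (7) so it stays small
CREATE INDEX user_activity_login_idx
    ON user_activity (user_id, activity_date)
    WHERE user_activity_type_id = 7;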
Re: [PERFORM] [HACKERS] Query in SQL statement
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Jim C. Nasby
Sent: Friday, September 30, 2005 4:49 PM
Subject: Re: [PERFORM] [HACKERS] Query in SQL statement

> I suggest ditching the CamelCase and going with underline_separators. I'd
> also not use the bareword id, instead using bad_user_id. And I'd name the
> table bad_user. But that's just me. :)

I converted a db from MS SQL, where tables and fields were CamelCase, and just
lowercased the DDL to create the tables. So table and field names were all
created in lowercase, but I didn't have to change any of the application code:
the SELECT statements worked fine with mixed case.

-- sample DDL
CREATE TABLE testtable
(
  fieldone int4
);

insert into TestTable (fieldone) values (11);

-- These statements will both work:

-- lowercase
SELECT fieldone FROM testtable;

-- CamelCase
SELECT FieldOne FROM TestTable;

-Roger
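One caveat worth adding (not from the original message): this only works
because the DDL used unquoted identifiers, which PostgreSQL folds to
lowercase. Double-quoted identifiers keep their case and must then always be
quoted the same way:

-- sketch of the quoted-identifier pitfall: quoted names keep their case
-- and name a *different* table than the lowercase-folded testtable above
CREATE TABLE "TestTable" ("FieldOne" int4);

SELECT "FieldOne" FROM "TestTable";  -- works, matches the quoted definition
SELECT FieldOne FROM TestTable;      -- folded to fieldone/testtable, so this
                                     -- hits the unquoted table instead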
Re: [PERFORM] Query plan looks OK, but slow I/O - settings advice?
Jeffrey W. Baker wrote:
> On Tue, 2005-08-16 at 10:46 -0700, Roger Hand wrote:
>> The disks are ext3 with journalling type of ordered, but this was later
>> changed to writeback with no apparent change in speed.
>>
>> They're on a Dell PowerEdge 6650 with LSI RAID card, set up as follows:
>> 4 disks RAID 10 for indexes (145GB) - sdc1
>> 6 disks RAID 10 for data (220GB) - sdd1
>> 2 mirrored disks for logs - sdb1
>> stripe size is 32k
>> cache policy: cached I/O (am told the controller has a BBU)
>> write policy: write-back
>> read policy: readahead
>
> I assume you are using Linux 2.6.

Oops, sorry I left that out. Nope, we're on 2.4:

[EMAIL PROTECTED] ~]$ uname -a
Linux xxx.xxx.xxx 2.4.21-27.0.2.ELsmp #1 SMP Wed Jan 12 23:35:44 EST 2005 i686 i686 i386 GNU/Linux

It's RedHat Enterprise AS 3.0, Fri Nov 5 17:55:14 PST 2004.

> Have you considered booting your machine with elevator=deadline?

I just did a little Googling and see that the 2.4 kernel didn't have a decent
elevator tuning system, and that was fixed in 2.6. Hmmm.

Thanks for the ideas ...

-Roger
Re: [PERFORM] Need for speed
Ulrich Wisser wrote:
> one of our services is click counting for online advertising. We do this by
> importing Apache log files every five minutes. This results in a lot of
> insert and delete statements. ...

> If you are doing mostly inserting, make sure you are in a transaction,

Well, yes, but you may need to make sure that a single transaction doesn't
have too many inserts in it. I was having a performance problem when doing
transactions with a huge number of inserts (tens of thousands), and I solved
the problem by putting a simple counter in the loop (in the Java import code,
that is) and doing a commit every 100 or so inserts.

-Roger
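Purely as an illustration of the batching idea (the real loop was in the Java
import code; the table and column names here are invented for the example):

-- hypothetical click-log table
CREATE TABLE clicks (
    click_time timestamp NOT NULL,
    url        text      NOT NULL
);

-- group inserts into modest transactions instead of one giant one
BEGIN;
INSERT INTO clicks (click_time, url) VALUES ('2005-08-01 10:00:00', '/ad/1');
INSERT INTO clicks (click_time, url) VALUES ('2005-08-01 10:00:01', '/ad/2');
-- ... roughly 100 rows per batch ...
COMMIT;

BEGIN;
-- next batch of ~100 inserts
COMMIT;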
[PERFORM] Query plan looks OK, but slow I/O - settings advice?
Summary
=======
We are writing to the db pretty much 24 hours a day. Recently the amount of data we write has increased, and the query speed, formerly okay, has taken a dive. The query is using the indexes as expected, so I don't _think_ I have a query tuning issue, just an io problem. The first time a query is done it takes about 60 seconds. The second time it runs in about 6 seconds.

What I know I need advice on is io settings and various buffer settings. I may also need advice on other things, but just don't know it yet!

Below is ...
- an explain analyze
- details of the db setup and hardware
- some vmstat and iostat output showing the disks are very busy
- the SHOW ALL output for the db config.

Details
=======
Postgres 8.0.3

Below is a sample query. (This is actually implemented as a prepared statement. Here I fill in the '?'s with actual values.)

electric=# EXPLAIN ANALYZE
electric-# SELECT datavalue, logfielddatatype, timestamp FROM logdata_recent
electric-# WHERE (logfielddatatype = 70 OR logfielddatatype = 71 OR logfielddatatype = 69)
electric-# AND graphtargetlog = 1327
electric-# AND timestamp >= 1123052400 AND timestamp <= 1123138800
electric-# ORDER BY timestamp;

                                  QUERY PLAN
------------------------------------------------------------------------------
 Sort  (cost=82.48..82.50 rows=6 width=14) (actual time=60208.968..60211.232 rows=2625 loops=1)
   Sort Key: public.logdata_recent.timestamp
   ->  Result  (cost=0.00..82.41 rows=6 width=14) (actual time=52.483..60200.868 rows=2625 loops=1)
         ->  Append  (cost=0.00..82.41 rows=6 width=14) (actual time=52.476..60189.929 rows=2625 loops=1)
               ->  Seq Scan on logdata_recent  (cost=0.00..46.25 rows=1 width=14) (actual time=0.003..0.003 rows=0 loops=1)
                     Filter: (((logfielddatatype = 70) OR (logfielddatatype = 71) OR (logfielddatatype = 69)) AND (graphtargetlog = 1327) AND (timestamp >= 1123052400) AND (timestamp <= 1123138800))
               ->  Index Scan using logdata_recent_1123085306_ix_t_fld_gtl, logdata_recent_1123085306_ix_t_fld_gtl, logdata_recent_1123085306_ix_t_fld_gtl on logdata_recent_stale logdata_recent  (cost=0.00..18.08 rows=3 width=14) (actual time=52.465..60181.624 rows=2625 loops=1)
                     Index Cond: (((timestamp >= 1123052400) AND (timestamp <= 1123138800) AND (logfielddatatype = 70) AND (graphtargetlog = 1327)) OR ((timestamp >= 1123052400) AND (timestamp <= 1123138800) AND (logfielddatatype = 71) AND (graphtargetlog = 1327)) OR ((timestamp >= 1123052400) AND (timestamp <= 1123138800) AND (logfielddatatype = 69) AND (graphtargetlog = 1327)))
                     Filter: (((logfielddatatype = 70) OR (logfielddatatype = 71) OR (logfielddatatype = 69)) AND (graphtargetlog = 1327) AND (timestamp >= 1123052400) AND (timestamp <= 1123138800))
               ->  Index Scan using logdata_recent_1123139634_ix_t_fld_gtl, logdata_recent_1123139634_ix_t_fld_gtl, logdata_recent_1123139634_ix_t_fld_gtl on logdata_recent_active logdata_recent  (cost=0.00..18.08 rows=2 width=14) (actual time=0.178..0.178 rows=0 loops=1)
                     Index Cond: (((timestamp >= 1123052400) AND (timestamp <= 1123138800) AND (logfielddatatype = 70) AND (graphtargetlog = 1327)) OR ((timestamp >= 1123052400) AND (timestamp <= 1123138800) AND (logfielddatatype = 71) AND (graphtargetlog = 1327)) OR ((timestamp >= 1123052400) AND (timestamp <= 1123138800) AND (logfielddatatype = 69) AND (graphtargetlog = 1327)))
                     Filter: (((logfielddatatype = 70) OR (logfielddatatype = 71) OR (logfielddatatype = 69)) AND (graphtargetlog = 1327) AND (timestamp >= 1123052400) AND (timestamp <= 1123138800))
 Total runtime: 60214.545 ms
(13 rows)

60 seconds is much longer than it used to be. I would guess it used to be under 10 seconds.
The second time the above query is run we see the magic of caching as the time goes down to 6 seconds.

logdata_recent_active and logdata_recent_stale are inherited tables of logdata_recent, which never has any data. (This is pseudo-partitioning in action!) So the very quick seq scan on the empty logdata_recent parent table is okay with me.

The index is built on timestamp, logfielddatatype, graphtargetlog. I am curious as to why the same index shows up 3 times in the using clause, but can live without knowing the details as long as it doesn't indicate that something's wrong.

The logdata_recent_stale table has 5 million rows. The size of the table itself, on disk, is 324MB. The size of the index is 210MB.

The disks are ext3 with journalling type of ordered, but this was later changed to writeback with no apparent change in speed.
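For readers unfamiliar with the pseudo-partitioning mentioned above, here is a
minimal sketch of how such an inheritance setup typically looks in PostgreSQL
8.0. The column types and index names below are assumptions; only the
table/column names come from the plan and description above.

-- empty parent table; queries go through it and the planner appends the children
CREATE TABLE logdata_recent (
    "timestamp"      integer NOT NULL,   -- epoch seconds, judging by the query above
    logfielddatatype integer NOT NULL,
    graphtargetlog   integer NOT NULL,
    datavalue        double precision
);

-- child tables hold the actual rows
CREATE TABLE logdata_recent_active () INHERITS (logdata_recent);
CREATE TABLE logdata_recent_stale  () INHERITS (logdata_recent);

-- the index described in the post (timestamp, logfielddatatype, graphtargetlog),
-- created on each child, since indexes are not inherited from the parent
CREATE INDEX logdata_recent_active_ix_t_fld_gtl
    ON logdata_recent_active ("timestamp", logfielddatatype, graphtargetlog);
CREATE INDEX logdata_recent_stale_ix_t_fld_gtl
    ON logdata_recent_stale ("timestamp", logfielddatatype, graphtargetlog);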
Re: [PERFORM] What needs to be done for real Partitioning?
On March 21, 2005 8:07 AM, Hannu Krosing wrote:
> On L, 2005-03-19 at 23:47 -0500, Tom Lane wrote:
>> Well, partitioning on the primary key would be Good Enough for 95% or 99%
>> of the real problems out there. I'm not excited about adding a large chunk
>> of complexity to cover another few percent.
>
> Are you sure that partitioning on anything other than the PK would be
> significantly harder?
>
> I have a case where I do manual partitioning over start_time (timestamp),
> but the PK is an id from a sequence. They are almost, but not exactly, in
> the same order. And I don't think that moving the PK to be (start_time, id)
> just because of partitioning on PK only would be a good design in any way.
>
> So please don't design the system to partition on PK only.

I agree. I have used table inheritance to implement pseudo-partitioning, and I
am very pleased with the results so far. Real partitioning would be even
better, but I am partitioning by timestamp, and this is not the PK, and I
don't wish to make it one.

-Roger
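A minimal sketch of the kind of manual, non-PK partitioning being discussed
(all names and types below are illustrative, not taken from the thread): the
PK stays a sequence id while the partitioning criterion is the timestamp
column.

-- parent table: PK is a sequence id, partitioning is by start_time
-- (note: the PK constraint itself is not inherited by the children)
CREATE TABLE events (
    id         serial PRIMARY KEY,
    start_time timestamp NOT NULL,
    payload    text
);

-- monthly children carry a CHECK constraint on the partitioning column,
-- not on the PK
CREATE TABLE events_2005_03 (
    CHECK (start_time >= '2005-03-01' AND start_time < '2005-04-01')
) INHERITS (events);

CREATE TABLE events_2005_04 (
    CHECK (start_time >= '2005-04-01' AND start_time < '2005-05-01')
) INHERITS (events);

-- the application (or a rule/trigger) routes inserts to the right child;
-- queries against the parent see the rows of all children
INSERT INTO events_2005_03 (start_time, payload)
VALUES ('2005-03-19 23:47', 'example row');

SELECT count(*) FROM events WHERE start_time >= '2005-03-01';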