Re: [PERFORM] What needs to be done for real Partitioning?
On March 21, 2005 8:07 AM, Hannu Krosing wrote:
> On L, 2005-03-19 at 23:47 -0500, Tom Lane wrote:
> > Well, partitioning on the primary key would be Good Enough for 95% or
> > 99% of the real problems out there. I'm not excited about adding a
> > large chunk of complexity to cover another few percent.
>
> Are you sure that partitioning on anything else than PK would be
> significantly harder ?
>
> I have a case where I do manual partitioning over start_time
> (timestamp), but the PK is an id from a sequence. They are almost, but
> not exactly in the same order. And I don't think that moving the PK to
> be (start_time, id) just because of "partitioning on PK only" would be a
> good design in any way.
>
> So please don't design the system to partition on PK only.

I agree. I have used table inheritance to implement pseudo-partitioning, and
I am very pleased with the results so far. Real partitioning would be even
better, but I am partitioning by timestamp, and this is not the PK, and I
don't wish to make it one.

-Roger
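For illustration, a minimal sketch of the inheritance-based pseudo-partitioning
being discussed, keyed on a timestamp rather than the PK. The table and column
names below are invented, and on 8.0 a query against the parent still visits
every child, since constraint exclusion is not yet available.

-- parent table holds no rows of its own
CREATE TABLE eventlog (
    id         serial,
    start_time timestamp NOT NULL,
    payload    text
);

-- one child per time range; the CHECK constraint documents the range
CREATE TABLE eventlog_2005_03 (
    CHECK (start_time >= '2005-03-01' AND start_time < '2005-04-01')
) INHERITS (eventlog);

CREATE TABLE eventlog_2005_04 (
    CHECK (start_time >= '2005-04-01' AND start_time < '2005-05-01')
) INHERITS (eventlog);

-- the application (or a rule/trigger) routes inserts to the right child
INSERT INTO eventlog_2005_03 (start_time, payload)
    VALUES ('2005-03-19 23:47', 'example row');

-- a query against the parent sees the rows in all children
SELECT count(*) FROM eventlog WHERE start_time >= '2005-03-01';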
Re: [PERFORM] Need for speed
> Ulrich Wisser wrote:
> >
> > one of our services is click counting for on line advertising. We do
> > this by importing Apache log files every five minutes. This results in a
> > lot of insert and delete statements.
...
> If you are doing mostly inserting, make sure you are in a transaction,

Well, yes, but you may need to make sure that a single transaction doesn't
have too many inserts in it. I was having a performance problem when doing
transactions with a huge number of inserts (tens of thousands), and I solved
the problem by putting a simple counter in the loop (in the Java import code,
that is) and doing a commit every 100 or so inserts.

-Roger

> John
>
> > Ulrich
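The fix described above lives in the Java import loop, but the same idea at
the SQL level is simply grouping the inserts into moderate-sized explicit
transactions instead of one enormous one (table and column names here are
made up):

BEGIN;
INSERT INTO clicklog (url, hit_time) VALUES ('/ad/1', now());
INSERT INTO clicklog (url, hit_time) VALUES ('/ad/2', now());
-- ... roughly 100 inserts per batch ...
COMMIT;

BEGIN;
-- next batch of ~100 inserts
COMMIT;

Each COMMIT bounds the amount of work tied up in a single transaction, while
still avoiding the per-row commit overhead of autocommit.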
[PERFORM] Query plan looks OK, but slow I/O - settings advice?
Summary
===
We are writing to the db pretty much 24 hours a day. Recently the amount of
data we write has increased, and the query speed, formerly okay, has taken a
dive. The query is using the indexes as expected, so I don't _think_ I have a
query tuning issue, just an I/O problem. The first time a query is done it
takes about 60 seconds. The second time it runs in about 6 seconds.

What I know I need advice on is I/O settings and various buffer settings. I
may also need advice on other things, but just don't know it yet!

Below is ...
- an explain analyze
- details of the db setup and hardware
- some vmstat and iostat output showing the disks are very busy
- the SHOW ALL output for the db config

Details
===
Postgres 8.0.3

Below is a sample query. (This is actually implemented as a prepared
statement. Here I fill in the '?'s with actual values.)

electric=# EXPLAIN ANALYZE
electric-# SELECT datavalue, logfielddatatype, timestamp FROM logdata_recent
electric-# WHERE (logfielddatatype = 70 OR logfielddatatype = 71 OR logfielddatatype = 69)
electric-# AND graphtargetlog = 1327
electric-# AND timestamp >= 1123052400 AND timestamp <= 1123138800
electric-# ORDER BY timestamp;

                                  QUERY PLAN
------------------------------------------------------------------------------
 Sort  (cost=82.48..82.50 rows=6 width=14) (actual time=60208.968..60211.232 rows=2625 loops=1)
   Sort Key: public.logdata_recent."timestamp"
   ->  Result  (cost=0.00..82.41 rows=6 width=14) (actual time=52.483..60200.868 rows=2625 loops=1)
         ->  Append  (cost=0.00..82.41 rows=6 width=14) (actual time=52.476..60189.929 rows=2625 loops=1)
               ->  Seq Scan on logdata_recent  (cost=0.00..46.25 rows=1 width=14) (actual time=0.003..0.003 rows=0 loops=1)
                     Filter: (((logfielddatatype = 70) OR (logfielddatatype = 71) OR (logfielddatatype = 69)) AND (graphtargetlog = 1327) AND ("timestamp" >= 1123052400) AND ("timestamp" <= 1123138800))
               ->  Index Scan using logdata_recent_1123085306_ix_t_fld_gtl, logdata_recent_1123085306_ix_t_fld_gtl, logdata_recent_1123085306_ix_t_fld_gtl on logdata_recent_stale logdata_recent  (cost=0.00..18.08 rows=3 width=14) (actual time=52.465..60181.624 rows=2625 loops=1)
                     Index Cond: ((("timestamp" >= 1123052400) AND ("timestamp" <= 1123138800) AND (logfielddatatype = 70) AND (graphtargetlog = 1327)) OR (("timestamp" >= 1123052400) AND ("timestamp" <= 1123138800) AND (logfielddatatype = 71) AND (graphtargetlog = 1327)) OR (("timestamp" >= 1123052400) AND ("timestamp" <= 1123138800) AND (logfielddatatype = 69) AND (graphtargetlog = 1327)))
                     Filter: (((logfielddatatype = 70) OR (logfielddatatype = 71) OR (logfielddatatype = 69)) AND (graphtargetlog = 1327) AND ("timestamp" >= 1123052400) AND ("timestamp" <= 1123138800))
               ->  Index Scan using logdata_recent_1123139634_ix_t_fld_gtl, logdata_recent_1123139634_ix_t_fld_gtl, logdata_recent_1123139634_ix_t_fld_gtl on logdata_recent_active logdata_recent  (cost=0.00..18.08 rows=2 width=14) (actual time=0.178..0.178 rows=0 loops=1)
                     Index Cond: ((("timestamp" >= 1123052400) AND ("timestamp" <= 1123138800) AND (logfielddatatype = 70) AND (graphtargetlog = 1327)) OR (("timestamp" >= 1123052400) AND ("timestamp" <= 1123138800) AND (logfielddatatype = 71) AND (graphtargetlog = 1327)) OR (("timestamp" >= 1123052400) AND ("timestamp" <= 1123138800) AND (logfielddatatype = 69) AND (graphtargetlog = 1327)))
                     Filter: (((logfielddatatype = 70) OR (logfielddatatype = 71) OR (logfielddatatype = 69)) AND (graphtargetlog = 1327) AND ("timestamp" >= 1123052400) AND ("timestamp" <= 1123138800))
 Total runtime: 60214.545 ms
(13 rows)

60 seconds is much longer than it used to be. I would guess it used to be
under 10 seconds. The second time the above query is run we see the magic of
caching as the time goes down to 6 seconds.

logdata_recent_active and logdata_recent_stale are inherited tables of
logdata_recent, which never has any data. (This is pseudo-partitioning in
action!) So the very quick seq scan on the empty logdata_recent parent table
is okay with me.

The index is built on timestamp, logfielddatatype, graphtargetlog. I am
curious as to why the same index shows up 3 times in the "using" clause, but
can live without knowing the details as long as it doesn't indicate that
something's wrong.

The logdata_recent_stale table has 5 million rows. The size of the table
itself, on disk, is 324MB. The size of the index is 210MB.

The disks are ext3 with journalling type of ordered, but this was later
changed to writeback with no apparent change in speed.

They're on a Dell poweredge 6650 with LSI raid card, setup as follows:
4 disks raid 10 for indexes (145GB) - sdc1
6 disks raid 10 for data (220GB) - sdd1
2 mirrored disks for logs - sdb1

stripe size is 32k
cache policy: cached io (am told the controller has bbu)
write policy: write-back
read policy: readahead
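For reference, a sketch of roughly what the layout described above looks
like. The column types and the simplified index names are guesses, since the
post does not show the DDL:

-- parent table stays empty; the children hold the rows
CREATE TABLE logdata_recent (
    datavalue        float8,
    logfielddatatype int4,
    graphtargetlog   int4,
    "timestamp"      int4
);

CREATE TABLE logdata_recent_active () INHERITS (logdata_recent);
CREATE TABLE logdata_recent_stale  () INHERITS (logdata_recent);

-- the composite index described in the post, one per child table
CREATE INDEX logdata_recent_active_ix_t_fld_gtl
    ON logdata_recent_active ("timestamp", logfielddatatype, graphtargetlog);
CREATE INDEX logdata_recent_stale_ix_t_fld_gtl
    ON logdata_recent_stale ("timestamp", logfielddatatype, graphtargetlog);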
Re: [PERFORM] Query plan looks OK, but slow I/O - settings advice?
Jeffrey W. Baker wrote:
> On Tue, 2005-08-16 at 10:46 -0700, Roger Hand wrote:
>> The disks are ext3 with journalling type of ordered, but this was later
>> changed to writeback with no apparent change in speed.
>>
>> They're on a Dell poweredge 6650 with LSI raid card, setup as follows:
>> 4 disks raid 10 for indexes (145GB) - sdc1
>> 6 disks raid 10 for data (220GB) - sdd1
>> 2 mirrored disks for logs - sdb1
>>
>> stripe size is 32k
>> cache policy: cached io (am told the controller has bbu)
>> write policy: write-back
>> read policy: readahead
>
> I assume you are using Linux 2.6.

Oops, sorry I left that out. Nope, we're on 2.4:

[EMAIL PROTECTED] ~]$ uname -a
Linux xxx.xxx.xxx 2.4.21-27.0.2.ELsmp #1 SMP Wed Jan 12 23:35:44 EST 2005 i686 i686 i386 GNU/Linux

It's RedHat Enterprise AS3.0  Fri Nov 5 17:55:14 PST 2004

> Have you considered booting your
> machine with elevator=deadline?

I just did a little Googling and see that the 2.4 kernel didn't have a decent
elevator tuning system, and that was fixed in 2.6. Hmmm.

Thanks for the ideas ...

-Roger
Re: [PERFORM] [HACKERS] Query in SQL statement
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] Behalf Of Jim C. Nasby
> Sent: Friday, September 30, 2005 4:49 PM
> Subject: Re: [PERFORM] [HACKERS] Query in SQL statement
>
> I suggest ditching the CamelCase and going with underline_seperators.
> I'd also not use the bareword id, instead using bad_user_id. And I'd
> name the table bad_user. But that's just me. :)

I converted a db from MS SQL, where tables and fields were CamelCase, and
just lowercased the DDL to create the tables.

So table and field names were all created in lowercase, but I didn't have to
change any of the application code: the SELECT statements worked fine with
mixed case.

-- sample DDL
CREATE TABLE testtable
(
  fieldone int4
);

insert into TestTable (fieldone) values (11);

-- These statements will both work:

-- lowercase
SELECT fieldone FROM testtable;

-- CamelCase
SELECT FieldOne FROM TestTable;

-Roger

> Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
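A related detail of general PostgreSQL behavior (not from the original
thread): the mixed-case statements above work because unquoted identifiers
are folded to lowercase. Identifiers created with double quotes keep their
case and must be quoted the same way in every later reference. The table
below is hypothetical:

CREATE TABLE "TestTable2" ("FieldOne" int4);

SELECT "FieldOne" FROM "TestTable2";  -- works
SELECT fieldone FROM testtable2;      -- fails: relation "testtable2" does not exist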
Re: [PERFORM] Simple query: how to optimize
On October 28, 2005 2:54 PM Collin Peters wrote:
> I have two tables, one is called 'users' the other is 'user_activity'.
...
> I am trying to write a simple query that returns the last time each
> user logged into the system. This is how the query looks at the
> moment:
>
> SELECT u.user_id, MAX(ua.activity_date)
> FROM pp_users u
> LEFT OUTER JOIN user_activity ua ON (u.user_id = ua.user_id AND
> ua.user_activity_type_id = 7)
> WHERE u.userstatus_id <> 4
> AND age(u.joined_date) < interval '30 days'
> GROUP BY u.user_id

You're first joining against the entire user table, then filtering out the
users you don't need. Instead, filter out the users you don't need first,
then do the join:

SELECT users.user_id, MAX(ua.activity_date)
FROM
  (SELECT u.user_id
   FROM pp_users u
   WHERE u.userstatus_id <> 4
   AND age(u.joined_date) < interval '30 days'
  ) users
LEFT OUTER JOIN user_activity ua
  ON (users.user_id = ua.user_id AND ua.user_activity_type_id = 7)
GROUP BY users.user_id

(disclaimer: I haven't actually tried this sql)
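A composite index on the activity table covering the join and filter columns
is the kind of thing that would usually complement this rewrite. This is
purely a suggestion: the thread doesn't say what indexes already exist, and
the index name is invented.

CREATE INDEX user_activity_user_type_date_ix
    ON user_activity (user_id, user_activity_type_id, activity_date);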