[PERFORM] Sustained inserts per sec ... ?
Just curious, but does anyone have an idea of what we are capable of? I realize that record size would affect things, as well as hardware, but if anyone has numbers for maximum sustained inserts per second, along with the record size involved, that would be appreciated ... Thanks ...

--
Marc G. Fournier  Hub.Org Networking Services (http://www.hub.org)
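For reference, a rough way to get a baseline on a given box is to time a bulk load and divide; a minimal sketch (the table and row shape here are hypothetical, and generate_series() needs 8.0 or later):

CREATE TABLE insert_test (id integer, payload text);

\timing
INSERT INTO insert_test
    SELECT i, repeat('x', 100)            -- ~100-byte 'records'
      FROM generate_series(1, 100000) AS g(i);
-- sustained inserts/sec ~= 100000 / (reported time in seconds);
-- rerun with single-row INSERTs in a loop to measure per-statement
-- (and per-transaction) overhead instead of bulk throughput.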
[PERFORM] Major flood of mail to lists ...
Due to moderator error (namely, mine), several hundred messages (spread across all the lists) were just approved ... Sorry for all the incoming junk :(

--
Marc G. Fournier  Hub.Org Networking Services (http://www.hub.org)
[PERFORM] getting an index to work with partial indices ...
Try as I might, I can't seem to get it to work ... the table has >9 million rows in it, and I've created a partial index "using btree ( priority ) where priority < 0;", where the table distribution looks like:

 priority |  count
----------+---------
       -2 |  138435
       -1 |  943250
        1 |    3416
        9 | 1134171
          | 7276960
(5 rows)

And it still won't use the index:

# explain update table set priority = -3 where priority = -1;
                            QUERY PLAN
------------------------------------------------------------------
 Seq Scan on table  (cost=0.00..400735.90 rows=993939 width=278)
   Filter: (priority = -1)
(2 rows)

But it will if I try 'priority = -2' ... what is the threshold for using the index? Obviously 10% of the records is too high ... thanks ...

--
Marc G. Fournier  Hub.Org Networking Services (http://www.hub.org)
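One quick sanity check (not from the original thread, just a standard planner experiment, using the post's anonymized table name) is to rule out the index being unusable, as opposed to merely looking more expensive than a seqscan to the planner:

SET enable_seqscan = off;   -- for testing only, never in production
EXPLAIN UPDATE table SET priority = -3 WHERE priority = -1;
SET enable_seqscan = on;

If the plan switches to an index scan, compare its estimated cost to the seqscan's: the planner is simply concluding that fetching ~10% of a 9-million-row table is cheaper via sequential I/O than through the index.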
[PERFORM] CHECK vs REFERENCES
Which is faster, where the list involved is fixed? My thought is that since it doesn't have to check a separate table, the CHECK itself should be the faster of the two, but I can't find anything that seems to validate that theory ... The case is where I just want to check that a value being inserted is one of a few possible values, with that list of values rarely (if ever) changing, so having a 'flexible list' in a REFERENCEd table seems like overkill ... Thoughts, or pointers to a doc that proves, or disproves, what I believe? Thanks ...

--
Marc G. Fournier  Hub.Org Networking Services (http://www.hub.org)
Re: [PERFORM] CHECK vs REFERENCES
On Fri, 9 Sep 2005, Michael Fuhr wrote:

> On Sat, Sep 10, 2005 at 12:23:19AM -0300, Marc G. Fournier wrote:
>> Which is faster, where the list involved is fixed? My thought is that
>> since it doesn't have to check a separate table, the CHECK itself
>> should be the faster of the two, but I can't find anything that seems
>> to validate that theory ...
>
> Why not just benchmark each method as you intend to use them? Here's a
> simplistic example:
>
> CREATE TABLE test_none (
>     val integer NOT NULL
> );
>
> CREATE TABLE test_check (
>     val integer NOT NULL CHECK (val IN (1, 2, 3, 4, 5))
> );
>
> CREATE TABLE test_vals (
>     id integer PRIMARY KEY
> );
> INSERT INTO test_vals SELECT * FROM generate_series(1, 5);
>
> CREATE TABLE test_fk (
>     val integer NOT NULL REFERENCES test_vals
> );
>
> \timing
> INSERT INTO test_none SELECT 1 FROM generate_series(1, 10);
> INSERT 0 10
> Time: 3109.089 ms
> INSERT INTO test_check SELECT 1 FROM generate_series(1, 10);
> INSERT 0 10
> Time: 3492.344 ms
> INSERT INTO test_fk SELECT 1 FROM generate_series(1, 10);
> INSERT 0 10
> Time: 23578.853 ms

Yowch, I expected CHECK to be better ... but not so significantly ... I figured I'd be saving milliseconds, which, on a busy server, would add up fast ... but not 10k's of milliseconds ... Thanks, that definitely shows a major benefit ...

--
Marc G. Fournier  Hub.Org Networking Services (http://www.hub.org)
[PERFORM] pg_stat* values ...
Not having found anything so far: does anyone know of, and can point me to, tools or articles that talk about tuning based on the sort of information the pg_stat* views provide?

--
Marc G. Fournier  Hub.Org Networking Services (http://www.hub.org)
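As a starting point, two queries against the standard statistics views often surface the obvious problems; a minimal sketch:

-- tables being read most heavily by sequential scans
SELECT relname, seq_scan, seq_tup_read, idx_scan
  FROM pg_stat_user_tables
 ORDER BY seq_tup_read DESC
 LIMIT 10;

-- rough buffer cache hit ratio, per database
SELECT datname,
       blks_hit::float / (blks_hit + blks_read) AS hit_ratio
  FROM pg_stat_database
 WHERE blks_hit + blks_read > 0;

A high seq_tup_read alongside a low idx_scan usually points to a missing index; a low hit_ratio suggests shared_buffers (or the machine's RAM) is undersized for the working set.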
Re: [PERFORM] Postgres on VPS - how much is enough?
On Mon, 6 Mar 2006, Matthew Nuzum wrote:

> On 3/6/06, Nagita Karunaratne <[EMAIL PROTECTED]> wrote:
>> How big a VPS would I need to run a Postgres DB? One application will
>> add about 500 orders per day. Another will access this data to create
>> and send about 500 emails per day. A third will access this data to
>> create an after-sales survey at most 500 times per day.
>>
>> What type of VPS would I need to run a database with this type of
>> load? Is 128 MB RAM enough? What percentage of a 2.8 GHz CPU would be
>> required?
>
> My problem with running PG inside of a VPS was that the VPS used a
> virtual filesystem ... basically, a single file that had been formatted
> and loop mounted so that it looked like a regular hard drive.
> Unfortunately, it was very slow. The difference between my application
> and yours is that mine well more than filled the 1GB of RAM that I had
> allocated. If your data will fit comfortably into RAM then you may be
> fine.

We host VPSs here (http://www.hub.org) and don't use the 'single file, virtual file system' approach ... it must depend on where you host?

--
Marc G. Fournier  Hub.Org Networking Services (http://www.hub.org)
[PERFORM] Please ignore ...
Someone on this list has one of those 'confirm your email' filters on their mailbox, which is bouncing back messages ... this is an attempt to try and narrow down the address that is causing this ...

--
Marc G. Fournier  Hub.Org Hosting Solutions S.A. (http://www.hub.org)
Re: [PERFORM] FreeBSD page size
Just curious, but Bruce(?) mentioned that apparently a 32k block size was found to show a 15% improvement ... care to run one more test? :)

On Wed, 3 Sep 2003, Vivek Khera wrote:

> Ok ... simple tests have completed. Here are some numbers.
>
> FreeBSD 4.8
> PG 7.4b2
> 4GB RAM
> Dual Xeon 2.4GHz processors
> 14 U320 SCSI disks attached to Dell PERC3/DC RAID controller in RAID 5
> config with 32k stripe size
>
> Dump file:
> -rw-r--r--  1 vivek  wheel  1646633745 Aug 28 11:01 19-Aug-2003.dump
>
> When restored (after deleting one index that took up ~1Gb -- turned
> out it was redundant to another multi-column index):
>
> % df -k /u/d02
> Filesystem     1K-blocks     Used     Avail Capacity  Mounted on
> /dev/amrd1s1e  226408360 18067260 190228432     9%    /u/d02
>
> postgresql.conf alterations from standard:
> shared_buffers = 6
> sort_mem = 8192
> vacuum_mem = 131702
> max_fsm_pages = 100
> effective_cache_size = 25600
> random_page_cost = 2
>
> restore time: 14777 seconds
> vacuum analyze time: 30 minutes
> select count(*) from user_list where owner_id=315;  50388.64 ms
>
> The restore complained often about checkpoints occurring every few
> seconds:
>
> Sep  2 11:57:14 d02 postgres[49721]: [5-1] LOG:  checkpoints are
> occurring too frequently (15 seconds apart)
> Sep  2 11:57:14 d02 postgres[49721]: [5-2] HINT:  Consider increasing
> CHECKPOINT_SEGMENTS.
>
> The HINT threw me off since I had to set checkpoint_segments in
> postgresql.conf, whereas CHECKPOINT_SEGMENTS implied to me a
> compile-time constant.
>
> Anyhow, so I deleted the PG data directory and made these two changes:
>
> checkpoint_segments = 50
> sort_mem = 131702
>
> This *really* improved the time for the restore:
>
> restore time: 11594 seconds
>
> Then I reset checkpoint_segments and sort_mem back to the old values ...
>
> vacuum analyze time is still 30 minutes
> select count(*) from user_list where owner_id=315;  51363.98 ms
>
> So the select appears a bit slower, but it is hard to say why; the
> system is otherwise idle, as it is not in production yet.
>
> Then I took the suggestion to update PG's page size to 16k and did the
> same increase on sort_mem and checkpoint_segments as above. I also
> halved shared_buffers and max_fsm_pages (probably should have halved
> effective_cache_size too ...):
>
> restore time: 11322 seconds
> vacuum analyze time: 27 minutes
> select count(*) from user_list where owner_id=315;  48267.66 ms
>
> Granted, given this simple test it is hard to say whether the 16k
> blocks will make an improvement under live load, but I'm gonna give it
> a shot. The 16k block size shows me roughly a 2-6% improvement on
> these tests.
>
> So throw in my vote for 16k blocks on FreeBSD (and annotate the docs
> to tell which parameters need to be halved to account for it).
>
> --
> Vivek Khera, Ph.D.  Khera Communications, Inc.
> Internet: [EMAIL PROTECTED]  Rockville, MD  +1-240-453-8497
> AIM: vivekkhera  Y!: vivek_khera  http://www.khera.org/~vivek/
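As an aside, on later releases the compiled-in block size can be confirmed from SQL via the read-only block_size parameter (on the 7.x versions discussed here, pg_controldata reports it instead, as "Database block size"):

SHOW block_size;
-- or equivalently:
SELECT current_setting('block_size');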
Re: [PERFORM] FreeBSD page size
On Wed, 3 Sep 2003, Bruce Momjian wrote:

> Vivek Khera wrote:
>> The restore complained often about checkpoints occurring every few
>> seconds:
>>
>> Sep  2 11:57:14 d02 postgres[49721]: [5-1] LOG:  checkpoints are
>> occurring too frequently (15 seconds apart)
>> Sep  2 11:57:14 d02 postgres[49721]: [5-2] HINT:  Consider increasing
>> CHECKPOINT_SEGMENTS.
>>
>> The HINT threw me off since I had to set checkpoint_segments in
>> postgresql.conf, whereas CHECKPOINT_SEGMENTS implied to me a
>> compile-time constant.
>
> Woo hoo, my warning worked. Great.
>
> I uppercased it because config parameters are uppercased in the
> documentation. Do we mention config parameters in any other error
> messages? Should it be lowercased?

'k, to me upper case denotes a compiler #define, so I would have been confused ... I'd go with lower case and single quotes around it, to denote it's a variable to be changed ...
[PERFORM] *very* slow query to summarize data for a month ...
Table structure is simple:

CREATE TABLE traffic_logs (
    company_id bigint,
    ip_id bigint,
    port integer,
    bytes bigint,
    runtime timestamp without time zone
);

runtime is 'day of month' ... I need to summarize the month, per company, with a query like:

explain analyze SELECT ts.company_id, company_name, SUM(ts.bytes) AS total_traffic
  FROM company c, traffic_logs ts
 WHERE c.company_id = ts.company_id
   AND month_trunc(ts.runtime) = '2003-10-01'
 GROUP BY company_name, ts.company_id;

and the explain looks like:

 Aggregate  (cost=32000.94..32083.07 rows=821 width=41) (actual time=32983.36..47586.17 rows=144 loops=1)
   ->  Group  (cost=32000.94..32062.54 rows=8213 width=41) (actual time=32957.40..42817.88 rows=462198 loops=1)
         ->  Sort  (cost=32000.94..32021.47 rows=8213 width=41) (actual time=32957.38..36261.31 rows=462198 loops=1)
               Sort Key: c.company_name, ts.company_id
               ->  Merge Join  (cost=31321.45..31466.92 rows=8213 width=41) (actual time=13983.07..22642.14 rows=462198 loops=1)
                     Merge Cond: ("outer".company_id = "inner".company_id)
                     ->  Sort  (cost=24.41..25.29 rows=352 width=25) (actual time=5.52..7.40 rows=348 loops=1)
                           Sort Key: c.company_id
                           ->  Seq Scan on company c  (cost=0.00..9.52 rows=352 width=25) (actual time=0.02..2.78 rows=352 loops=1)
                     ->  Sort  (cost=31297.04..31317.57 rows=8213 width=16) (actual time=13977.49..16794.41 rows=462198 loops=1)
                           Sort Key: ts.company_id
                           ->  Index Scan using tl_month on traffic_logs ts  (cost=0.00..30763.02 rows=8213 width=16) (actual time=0.29..5562.25 rows=462198 loops=1)
                                 Index Cond: (month_trunc(runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
 Total runtime: 47587.82 msec
(14 rows)

The problem is that we're only taking a few months' worth of data, so I don't think there is much of a way to 'improve performance' on this, but figured I'd ask quickly before I do something rash ...

Note that without the month_trunc() index, the Total runtime more than doubles:

 Aggregate  (cost=39578.63..39660.76 rows=821 width=41) (actual time=87805.47..101251.35 rows=144 loops=1)
   ->  Group  (cost=39578.63..39640.23 rows=8213 width=41) (actual time=87779.56..96824.56 rows=462198 loops=1)
         ->  Sort  (cost=39578.63..39599.17 rows=8213 width=41) (actual time=87779.52..90781.48 rows=462198 loops=1)
               Sort Key: c.company_name, ts.company_id
               ->  Merge Join  (cost=38899.14..39044.62 rows=8213 width=41) (actual time=64073.98..72783.68 rows=462198 loops=1)
                     Merge Cond: ("outer".company_id = "inner".company_id)
                     ->  Sort  (cost=24.41..25.29 rows=352 width=25) (actual time=64.66..66.55 rows=348 loops=1)
                           Sort Key: c.company_id
                           ->  Seq Scan on company c  (cost=0.00..9.52 rows=352 width=25) (actual time=1.76..61.70 rows=352 loops=1)
                     ->  Sort  (cost=38874.73..38895.27 rows=8213 width=16) (actual time=64009.26..66860.71 rows=462198 loops=1)
                           Sort Key: ts.company_id
                           ->  Seq Scan on traffic_logs ts  (cost=0.00..38340.72 rows=8213 width=16) (actual time=5.02..-645982.04 rows=462198 loops=1)
                                 Filter: (date_trunc('month'::text, runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
 Total runtime: 101277.17 msec
(14 rows)
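month_trunc() isn't shown in the post, but from the second plan's filter (date_trunc('month'::text, runtime)) it is presumably just an IMMUTABLE wrapper around date_trunc(), which is what makes the functional index possible; a sketch:

CREATE FUNCTION month_trunc(timestamp without time zone)
RETURNS timestamp without time zone
AS 'SELECT date_trunc(''month'', $1)'
LANGUAGE sql IMMUTABLE;

CREATE INDEX tl_month ON traffic_logs (month_trunc(runtime));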
Re: [PERFORM] *very* slow query to summarize data for a month ...
On Mon, 10 Nov 2003, Neil Conway wrote:

> "Marc G. Fournier" <[EMAIL PROTECTED]> writes:
>> -> Index Scan using tl_month on traffic_logs ts (cost=0.00..30763.02
>>    rows=8213 width=16) (actual time=0.29..5562.25 rows=462198 loops=1)
>>      Index Cond: (month_trunc(runtime) = '2003-10-01
>>      00:00:00'::timestamp without time zone)
>
> Interesting that we get the row count estimate for this index scan so
> wrong -- I believe this is the root of the problem. Hmmm ... I would
> guess that the optimizer stats we have for estimating the selectivity
> of a functional index are pretty primitive, but I haven't looked into
> it at all. Tom might be able to shed some light ...
>
> [ In the second EXPLAIN ANALYZE, ... ]
>
>> -> Seq Scan on traffic_logs ts (cost=0.00..38340.72 rows=8213 width=16)
>>    (actual time=5.02..-645982.04 rows=462198 loops=1)
>>      Filter: (date_trunc('month'::text, runtime) = '2003-10-01
>>      00:00:00'::timestamp without time zone)
>
> Uh, what? The "actual time" seems to have finished far before it has
> begun :-) Is this just a typo, or does the actual output include a
> negative number?

This was purely a cut-n-paste ...
Re: [PERFORM] *very* slow query to summarize data for a month ...
On Mon, 10 Nov 2003, Tom Lane wrote:

> Neil Conway <[EMAIL PROTECTED]> writes:
>> Interesting that we get the row count estimate for this index scan so
>> wrong -- I believe this is the root of the problem. Hmmm ... I would
>> guess that the optimizer stats we have for estimating the selectivity
>> of a functional index are pretty primitive, but I haven't looked into
>> it at all. Tom might be able to shed some light ...
>
> Try "none at all". I have speculated in the past that it would be worth
> gathering statistics about the contents of functional indexes, but it's
> still on the to-do-someday list.
>
>>> -> Seq Scan on traffic_logs ts (cost=0.00..38340.72 rows=8213
>>>    width=16) (actual time=5.02..-645982.04 rows=462198 loops=1)
>
>> Uh, what?
>
> That is bizarre, all right. Is it reproducible?

Nope, and a subsequent run shows better results too:

 Aggregate  (cost=39674.38..39756.70 rows=823 width=41) (actual time=35573.27..49953.47 rows=144 loops=1)
   ->  Group  (cost=39674.38..39736.12 rows=8232 width=41) (actual time=35547.27..45479.27 rows=462198 loops=1)
         ->  Sort  (cost=39674.38..39694.96 rows=8232 width=41) (actual time=35547.23..39167.90 rows=462198 loops=1)
               Sort Key: c.company_name, ts.company_id
               ->  Merge Join  (cost=38993.22..39139.02 rows=8232 width=41) (actual time=16658.23..25559.08 rows=462198 loops=1)
                     Merge Cond: ("outer".company_id = "inner".company_id)
                     ->  Sort  (cost=24.41..25.29 rows=352 width=25) (actual time=5.51..7.38 rows=348 loops=1)
                           Sort Key: c.company_id
                           ->  Seq Scan on company c  (cost=0.00..9.52 rows=352 width=25) (actual time=0.02..2.80 rows=352 loops=1)
                     ->  Sort  (cost=38968.82..38989.40 rows=8232 width=16) (actual time=16652.66..19785.83 rows=462198 loops=1)
                           Sort Key: ts.company_id
                           ->  Seq Scan on traffic_logs ts  (cost=0.00..38433.46 rows=8232 width=16) (actual time=0.11..8794.43 rows=462198 loops=1)
                                 Filter: (date_trunc('month'::text, runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
 Total runtime: 49955.22 msec
Re: [PERFORM] *very* slow query to summarize data for a month ...
On Mon, 10 Nov 2003, Josh Berkus wrote:

> Marc,
>
> I'd say your machine is very low on available RAM, particularly
> sort_mem. The steps which are taking a long time are:

Here's the server:

last pid: 42651;  load averages:  1.52,  0.96,  0.88    up 28+07:43:33  20:35:44
307 processes: 2 running, 304 sleeping, 1 zombie
CPU states: 18.0% user,  0.0% nice, 29.1% system,  0.6% interrupt, 52.3% idle
Mem: 1203M Active, 1839M Inact, 709M Wired, 206M Cache, 199M Buf, 5608K Free
Swap: 8192M Total, 1804K Used, 8190M Free

>> Aggregate (cost=32000.94..32083.07 rows=821 width=41) (actual
>> time=32983.36..47586.17 rows=144 loops=1)
>>   -> Group (cost=32000.94..32062.54 rows=8213 width=41) (actual
>>      time=32957.40..42817.88 rows=462198 loops=1)
>
> and:
>
>>   -> Merge Join (cost=31321.45..31466.92 rows=8213 width=41)
>>      (actual time=13983.07..22642.14 rows=462198 loops=1)
>>        Merge Cond: ("outer".company_id = "inner".company_id)
>>        -> Sort (cost=24.41..25.29 rows=352 width=25) (actual
>>           time=5.52..7.40 rows=348 loops=1)
>
> There are also *large* delays between steps. Either your I/O is
> saturated, or you haven't run a VACUUM FULL ANALYZE in a while (which
> would also explain the estimates being off).

I thought about that before I started the thread, and ran it just in case ... I've just restarted the server with sort_mem set to 10M, and it didn't help much on the Aggregate or Merge Join:

 Aggregate  (cost=39674.38..39756.70 rows=823 width=41) (actual time=33066.25..54021.50 rows=144 loops=1)
   ->  Group  (cost=39674.38..39736.12 rows=8232 width=41) (actual time=33040.25..47005.57 rows=462198 loops=1)
         ->  Sort  (cost=39674.38..39694.96 rows=8232 width=41) (actual time=33040.22..37875.97 rows=462198 loops=1)
               Sort Key: c.company_name, ts.company_id
               ->  Merge Join  (cost=38993.22..39139.02 rows=8232 width=41) (actual time=14428.17..23568.80 rows=462198 loops=1)
                     Merge Cond: ("outer".company_id = "inner".company_id)
                     ->  Sort  (cost=24.41..25.29 rows=352 width=25) (actual time=5.80..7.66 rows=348 loops=1)
                           Sort Key: c.company_id
                           ->  Seq Scan on company c  (cost=0.00..9.52 rows=352 width=25) (actual time=0.08..3.06 rows=352 loops=1)
                     ->  Sort  (cost=38968.82..38989.40 rows=8232 width=16) (actual time=14422.27..17429.34 rows=462198 loops=1)
                           Sort Key: ts.company_id
                           ->  Seq Scan on traffic_logs ts  (cost=0.00..38433.46 rows=8232 width=16) (actual time=0.15..8119.72 rows=462198 loops=1)
                                 Filter: (date_trunc('month'::text, runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
 Total runtime: 54034.44 msec
(14 rows)

The problem is that the results we are comparing with right now are the ones that had the negative time in them :( I've just restarted the server with the default sort_mem, and here is the query with that:

 Aggregate  (cost=39691.27..39773.61 rows=823 width=41) (actual time=35077.18..50424.74 rows=144 loops=1)
   ->  Group  (cost=39691.27..39753.03 rows=8234 width=41) (actual time=35051.29..-650049.84 rows=462198 loops=1)
         ->  Sort  (cost=39691.27..39711.86 rows=8234 width=41) (actual time=35051.26..38847.40 rows=462198 loops=1)
               Sort Key: c.company_name, ts.company_id
               ->  Merge Join  (cost=39009.92..39155.76 rows=8234 width=41) (actual time=16155.37..25439.42 rows=462198 loops=1)
                     Merge Cond: ("outer".company_id = "inner".company_id)
                     ->  Sort  (cost=24.41..25.29 rows=352 width=25) (actual time=5.85..7.71 rows=348 loops=1)
                           Sort Key: c.company_id
                           ->  Seq Scan on company c  (cost=0.00..9.52 rows=352 width=25) (actual time=0.10..3.07 rows=352 loops=1)
                     ->  Sort  (cost=38985.51..39006.10 rows=8234 width=16) (actual time=16149.46..19437.47 rows=462198 loops=1)
                           Sort Key: ts.company_id
                           ->  Seq Scan on traffic_logs ts  (cost=0.00..38450.00 rows=8234 width=16) (actual time=0.16..8869.37 rows=462198 loops=1)
                                 Filter: (date_trunc('month'::text, runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
 Total runtime: 50426.80 msec
(14 rows)

And, just on a whim, here it is set to 100M:
Re: [PERFORM] *very* slow query to summarize data for a month ...
On Tue, 11 Nov 2003, Greg Stark wrote:

> Actually you might be able to get the same effect using function
> indexes like:
>
>   create index i on traffic_log (month_trunc(runtime), company_id)

I had actually thought of that one ... is it something that is only available in v7.4?

ams=# create index i on traffic_logs ( month_trunc(runtime), company_id );
ERROR:  parser: parse error at or near "," at character 54
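For what it's worth, the parse error is consistent with pre-7.4 functional indexes, which were limited to a single function call over table columns; my understanding (worth verifying against the release notes) is that 7.4 generalized this to expression indexes, so:

-- works on 7.4 and later:
CREATE INDEX i ON traffic_logs (month_trunc(runtime), company_id);

-- closest pre-7.4 approximation is two separate indexes:
CREATE INDEX tl_month   ON traffic_logs (month_trunc(runtime));
CREATE INDEX tl_company ON traffic_logs (company_id);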
Re: [PERFORM] *very* slow query to summarize data for a month ...
On Tue, 11 Nov 2003, Dennis Bjorklund wrote:

> On Mon, 10 Nov 2003, Marc G. Fournier wrote:
>
>> explain analyze SELECT ts.company_id, company_name, SUM(ts.bytes) AS total_traffic
>>   FROM company c, traffic_logs ts
>>  WHERE c.company_id = ts.company_id
>>    AND month_trunc(ts.runtime) = '2003-10-01'
>>  GROUP BY company_name, ts.company_id;
>
> What if you do
>
>   ts.runtime >= '2003-10-01' AND ts.runtime < '2003-11-01'
>
> and add an index like (runtime, company_name, company_id)?

Good thought, but even simplifying it to the *lowest* query possible, with no table joins, is painfully slow:

explain analyze SELECT ts.company_id, SUM(ts.bytes) AS total_traffic
  FROM traffic_logs ts
 WHERE month_trunc(ts.runtime) = '2003-10-01'
 GROUP BY ts.company_id;

 Aggregate  (cost=31630.84..31693.05 rows=829 width=16) (actual time=14862.71..26552.39 rows=144 loops=1)
   ->  Group  (cost=31630.84..31672.31 rows=8295 width=16) (actual time=9634.28..20967.07 rows=462198 loops=1)
         ->  Sort  (cost=31630.84..31651.57 rows=8295 width=16) (actual time=9634.24..12838.73 rows=462198 loops=1)
               Sort Key: company_id
               ->  Index Scan using tl_month on traffic_logs ts  (cost=0.00..31090.93 rows=8295 width=16) (actual time=0.26..6043.35 rows=462198 loops=1)
                     Index Cond: (month_trunc(runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
 Total runtime: 26659.35 msec
(7 rows)

-OR-

explain analyze SELECT ts.company_id, SUM(ts.bytes) AS total_traffic
  FROM traffic_logs ts
 WHERE ts.runtime >= '2003-10-01' AND ts.runtime < '2003-11-01'
 GROUP BY ts.company_id;

 Aggregate  (cost=81044.53..84424.21 rows=45062 width=16) (actual time=13307.52..29274.66 rows=144 loops=1)
   ->  Group  (cost=81044.53..83297.65 rows=450625 width=16) (actual time=10809.02..-673265.13 rows=462198 loops=1)
         ->  Sort  (cost=81044.53..82171.09 rows=450625 width=16) (actual time=10808.99..14069.79 rows=462198 loops=1)
               Sort Key: company_id
               ->  Seq Scan on traffic_logs ts  (cost=0.00..38727.35 rows=450625 width=16) (actual time=0.07..6801.92 rows=462198 loops=1)
                     Filter: ((runtime >= '2003-10-01 00:00:00'::timestamp without time zone) AND (runtime < '2003-11-01 00:00:00'::timestamp without time zone))
 Total runtime: 29385.97 msec
(7 rows)

Just as a side note: a straight scan for the records, with no SUM()/GROUP BY involved, with the month_trunc() index is still >8k msec:

 Index Scan using tl_month on traffic_logs ts  (cost=0.00..31096.36 rows=8297 width=16) (actual time=0.96..5432.93 rows=462198 loops=1)
   Index Cond: (month_trunc(runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
 Total runtime: 8092.88 msec
(3 rows)

and without the index, >15k msec:

 Seq Scan on traffic_logs ts  (cost=0.00..38719.55 rows=8297 width=16) (actual time=0.11..11354.45 rows=462198 loops=1)
   Filter: (date_trunc('month'::text, runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
 Total runtime: 15353.57 msec
(3 rows)

So the GROUP BY is affecting the overall time, but even without it, it's still taking a helluva long time ...

I'm going to modify my load script so that it dumps monthly totals to traffic_logs, and 'details' to a per-client schema.traffic_logs table (see the sketch below) ... I don't need the 'per day' totals at the top level at all, only speed ... the 'per day' totals are only required at the 'per client' level, and moving the 'per day' data into a client schema will shrink the table significantly ... If it wasn't for trying to pull in that 'whole month' summary, it would be fine :(
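The 'monthly totals at load time' plan amounts to maintaining a small summary table by hand; a minimal sketch of the idea (table and column names hypothetical):

CREATE TABLE traffic_monthly (
    company_id bigint,
    month      timestamp without time zone,
    bytes      bigint
);

-- seed it once from the detail data:
INSERT INTO traffic_monthly
    SELECT company_id, month_trunc(runtime), SUM(bytes)
      FROM traffic_logs
     GROUP BY company_id, month_trunc(runtime);

After that, the load script updates the current month's rows as it goes, and the 'whole month' report becomes a scan of a few hundred summary rows instead of a sort over 462k detail rows.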
Re: [PERFORM] *very* slow query to summarize data for a month ...
On Wed, 12 Nov 2003, Greg Stark wrote:

> "Marc G. Fournier" <[EMAIL PROTECTED]> writes:
>
>> Just as a side note, just doing a straight scan for the records, with
>> no SUM()/GROUP BY involved, with the month_trunc() index is still >8k
>> msec:
>
> One of the other advantages of these aggregate tables was that we could
> purge the old data much sooner with much less resistance from the
> business, since the reports were all still available and a lot of
> ad-hoc queries could still be done without the raw data anyways.

Actually, what I've done is do this at the 'load stage' ... but same concept ...
[PERFORM] mnogosearch under 7.4 ...
G'day all ...

Dave asked me today about 'slow downs' on the search engines, so I am looking at the various queries generated by enabling log_statement/log_duration, to get a feel for whether something is "off" ... and the following seems a bit weird ...

QueryA and QueryB are the same query, but against two different tables in the database ... QueryA takes ~4x longer to run than QueryB, but both EXPLAINs look similar ... in fact, looking at the EXPLAIN ANALYZE output, I would expect QueryB to be the slower of the two ... but the actual vs estimated times for ndict5/ndict4 seem off (ndict4 is estimated high, ndict5 is estimated low) ...

QueryA:

186_archives=# explain analyze SELECT ndict5.url_id,ndict5.intag
  FROM ndict5, url
 WHERE ndict5.word_id=1343124681
   AND url.rec_id=ndict5.url_id
   AND ((url.url || '') LIKE 'http://archives.postgresql.org/%%');

 Nested Loop  (cost=0.00..69799.69 rows=44 width=8) (actual time=113.067..26477.672 rows=14112 loops=1)
   ->  Index Scan using n5_word on ndict5  (cost=0.00..34321.89 rows=8708 width=8) (actual time=27.349..25031.666 rows=15501 loops=1)
         Index Cond: (word_id = 1343124681)
   ->  Index Scan using url_rec_id on url  (cost=0.00..4.06 rows=1 width=4) (actual time=0.061..0.068 rows=1 loops=15501)
         Index Cond: (url.rec_id = "outer".url_id)
         Filter: ((url || ''::text) ~~ 'http://archives.postgresql.org/%%'::text)
 Total runtime: 26550.566 ms
(7 rows)

QueryB:

186_archives=# explain analyze SELECT ndict4.url_id,ndict4.intag
  FROM ndict4, url
 WHERE ndict4.word_id=-2038735111
   AND url.rec_id=ndict4.url_id
   AND ((url.url || '') LIKE 'http://archives.postgresql.org/%%');

 Nested Loop  (cost=0.00..99120.97 rows=62 width=8) (actual time=26.330..6630.581 rows=2694 loops=1)
   ->  Index Scan using n4_word on ndict4  (cost=0.00..48829.52 rows=12344 width=8) (actual time=7.954..6373.098 rows=2900 loops=1)
         Index Cond: (word_id = -2038735111)
   ->  Index Scan using url_rec_id on url  (cost=0.00..4.06 rows=1 width=4) (actual time=0.059..0.066 rows=1 loops=2900)
         Index Cond: (url.rec_id = "outer".url_id)
         Filter: ((url || ''::text) ~~ 'http://archives.postgresql.org/%%'::text)
 Total runtime: 6643.462 ms
(7 rows)

--
Marc G. Fournier  Hub.Org Networking Services (http://www.hub.org)
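Both row estimates here are badly off, in opposite directions (ndict5: 8708 estimated vs 15501 actual; ndict4: 12344 estimated vs 2900 actual), which suggests the word_id statistics are too coarse for the skew in these tables. A standard first step (a sketch, not from the original thread) is to raise the statistics target on that column and re-analyze:

ALTER TABLE ndict5 ALTER COLUMN word_id SET STATISTICS 100;
ALTER TABLE ndict4 ALTER COLUMN word_id SET STATISTICS 100;
ANALYZE ndict5;
ANALYZE ndict4;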
Re: [PERFORM] another query optimization question
On Sat, 31 Jan 2004, Tom Lane wrote:

> David Teran <[EMAIL PROTECTED]> writes:
>> Apple provides a little tool that can enable / disable the l2 cache ...
>> one CPU of a dual CPU system on the fly. When I start the test app with
>> two CPUs enabled I get this output here; when I turn off one CPU while
>> the app is still running, the messages disappear as long as one CPU is
>> turned off. Reactivating the CPU again produces new error messages.
>
> Ah-hah, so the gettimeofday bug *is* linked to multiple CPUs. Marc,
> were the machines you saw it on all multi-CPU?

I'm not sure ... I thought I ran it on my P4 here in the office and saw it too, albeit not nearly as frequently ... but, in FreeBSD's case, it is a "design issue" ... there are two different functions, one that is kinda fuzzy (but fast), and the other that is designed to be exact, but at a performance cost ... or was it the same function, but a 'sysctl' variable that changes the state? Can't remember which, but it is by design on FreeBSD ... and, if we're talking about Apple, the same most likely applies, as it's based on the same kernel ...

Back of my mind, I *think* it was these sysctl variables:

kern.timecounter.method: 0
kern.timecounter.hardware: i8254

--
Marc G. Fournier  Hub.Org Networking Services (http://www.hub.org)