[PERFORM] Please ignore ...

2008-04-30 Thread Marc G. Fournier


Someone on this list has one of those 'confirm your email' filters on their 
mailbox, which is bouncing messages back ... this is an attempt to narrow 
down the address that is causing it ...

-- 
Marc G. Fournier        Hub.Org Hosting Solutions S.A. (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.org   ICQ . 7615664




Re: [PERFORM] Postgres on VPS - how much is enough?

2006-03-06 Thread Marc G. Fournier

On Mon, 6 Mar 2006, Matthew Nuzum wrote:


On 3/6/06, Nagita Karunaratne [EMAIL PROTECTED] wrote:

How big a VPS would I need to run a Postgres DB?




One application will add about 500 orders per day
Another will  access this data to create and send about 500 emails per day
A third will access this data to create an after-sales survey for at
most 500 times per day.

What type of VPS would I need to run a database with this type of load?
Is 128 MB ram enough?
What percentage of a 2.8 GHz CPU would be required?


My problem with running PG inside of a VPS was that the VPS used a
virtual filesystem... basically, a single file that had been formatted
and loop mounted so that it looked like a regular hard drive.
Unfortunately, it was very slow. The difference between my application
and yours is that mine well more than filled the 1GB of RAM that I had
allocated. If your data will fit comfortably into RAM then you may be
fine.


We host VPSs here (http://www.hub.org) and don't use a 'single file, 
virtual file system' to put them into ... it must depend on where you 
host?
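
Purely for a sense of scale (a hedged sketch, not advice from this thread; every value below is an assumption for a ~128 MB VPS of that era, with the integer units of the postgresql.conf of the day):

shared_buffers = 2000           # ~16 MB at the default 8 kB page size
work_mem = 2048                 # kB per sort; this was sort_mem before 8.0
effective_cache_size = 8000     # ~64 MB; whatever the kernel can realistically cache
max_connections = 20            # keep per-backend overhead small on a small box

With roughly 500 orders, 500 emails, and 500 surveys per day the working set is tiny, so as Matthew points out, the VPS's disk layout is more likely to hurt than the CPU share.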



Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664



[PERFORM] pg_stat* values ...

2005-12-01 Thread Marc G. Fournier


Not having found anything so far: does anyone know of, and can point me 
to, tools or articles that talk about doing tuning based on the sort of 
information the pg_stat* views provide?
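
For illustration, the kind of query such a tool or article would presumably build on (a sketch against the standard statistics views, not taken from any particular tool):

-- tables whose heap reads are mostly missing the buffer cache
SELECT relname, heap_blks_read, heap_blks_hit,
       round(heap_blks_hit::numeric / nullif(heap_blks_hit + heap_blks_read, 0), 3) AS hit_ratio
  FROM pg_statio_user_tables
 ORDER BY heap_blks_read DESC;

-- indexes the planner never uses (candidates for dropping)
SELECT relname, indexrelname, idx_scan
  FROM pg_stat_user_indexes
 WHERE idx_scan = 0;

(These views are only populated when the stats collector options of that era, e.g. stats_block_level and stats_row_level, are enabled.)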



Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664



[PERFORM] CHECK vs REFERENCES

2005-09-09 Thread Marc G. Fournier


Which is faster, where the list involved is fixed?  My thought is that 
since it doesn't have to check a separate table, the CHECK itself should 
be the faster of the two, but I can't find anything that seems to validate 
that theory ...


The case is where I just want to check that a value being inserted is one 
of a few possible values, with that list of values rarely (if ever) 
changing, so having a 'flexible list' via REFERENCES seems like overkill 
...


Thoughts, or pointers to a doc that disproves, or proves, what I believe?

Thanks ...


Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664



Re: [PERFORM] CHECK vs REFERENCES

2005-09-09 Thread Marc G. Fournier

On Fri, 9 Sep 2005, Michael Fuhr wrote:


On Sat, Sep 10, 2005 at 12:23:19AM -0300, Marc G. Fournier wrote:

Which is faster, where the list involved is fixed?  My thought is that
since it doesn't have to check a separate table, the CHECK itself should
be the faster of the two, but I can't find anything that seems to validate
that theory ...


Why not just benchmark each method as you intend to use them?  Here's
a simplistic example:

CREATE TABLE test_none (
   val  integer NOT NULL
);

CREATE TABLE test_check (
   val  integer NOT NULL CHECK (val IN (1, 2, 3, 4, 5))
);

CREATE TABLE test_vals (
   id  integer PRIMARY KEY
);
INSERT INTO test_vals SELECT * FROM generate_series(1, 5);

CREATE TABLE test_fk (
   val  integer NOT NULL REFERENCES test_vals
);

\timing

INSERT INTO test_none SELECT 1 FROM generate_series(1, 10);
INSERT 0 10
Time: 3109.089 ms

INSERT INTO test_check SELECT 1 FROM generate_series(1, 10);
INSERT 0 10
Time: 3492.344 ms

INSERT INTO test_fk SELECT 1 FROM generate_series(1, 10);
INSERT 0 10
Time: 23578.853 ms


Yowch, I expected CHECK to be better ... but not so significantly ... I 
figured I'd be saving milliseconds, which, on a busy server, would add up 
fast ... but not 10k's of milliseconds ...


Thanks, that definitely shows a major benefit ...
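
One practical footnote, not from the thread: if the fixed list ever does change, it is easiest when the CHECK constraint has an explicit name, since it can then be swapped in place. A sketch, using a named variant of the hypothetical test_check table from above:

CREATE TABLE test_check (
    val  integer NOT NULL,
    CONSTRAINT val_allowed CHECK (val IN (1, 2, 3, 4, 5))
);

-- later, when the list grows:
ALTER TABLE test_check DROP CONSTRAINT val_allowed;
ALTER TABLE test_check ADD CONSTRAINT val_allowed
    CHECK (val IN (1, 2, 3, 4, 5, 6));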


Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664



[PERFORM] getting an index to work with partial indices ...

2005-08-29 Thread Marc G. Fournier


Try as I might, I can't seem to get it to work ... the table has 9 million 
rows in it, and I've created an index using btree (priority) where priority 
< 0, and the table distribution looks like:


 priority |  count
----------+---------
       -2 |  138435
       -1 |  943250
        1 |    3416
        9 | 1134171
          | 7276960
(5 rows)

And it still won't use the index:

# explain update table set priority = -3 where priority = -1;
                            QUERY PLAN
------------------------------------------------------------------
 Seq Scan on table  (cost=0.00..400735.90 rows=993939 width=278)
   Filter: (priority = -1)
(2 rows)

But, it will if I try 'priority = -2' ... what is the threshold for using 
the index?  Obviously 10% of the records is too high ...
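
One way to see how far past the crossover the planner thinks it is (a hedged sketch, not from the original exchange; 'table' is just the placeholder name used above):

SET enable_seqscan = off;
EXPLAIN UPDATE table SET priority = -3 WHERE priority = -1;
-- compare the forced index-scan cost here with the seq-scan cost above;
-- at ~10% of 9 million rows the index scan usually estimates as more expensive
SET enable_seqscan = on;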


thanks ...


Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664



[PERFORM] Major flood of mail to lists ...

2005-05-31 Thread Marc G. Fournier


Due to moderator error (namely, mine), several hundred messages (spread 
across all the lists) were just approved ...


Sorry for all the incoming junk :(


Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664



[PERFORM] Sustained inserts per sec ... ?

2005-04-01 Thread Marc G. Fournier
Just curious, but does anyone have an idea of what we are capable of?  I 
realize that record size would affect things, as well as hardware, but 
if anyone has some idea of the max, along with the record size it was 
measured at, that would be appreciated ...
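
For anyone wanting to put a number on their own setup, a minimal sketch in the spirit of the benchmarks elsewhere in this archive (the table, row width, and row count here are made up):

CREATE TABLE insert_test (id integer, payload text);

\timing

INSERT INTO insert_test
    SELECT i, repeat('x', 100)
      FROM generate_series(1, 100000) AS g(i);
-- sustained rate is then roughly 100000 divided by the reported seconds;
-- single-row INSERTs with per-statement commits will come out far slower than this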

Thanks ...

Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664


Re: [PERFORM] another query optimization question

2004-01-31 Thread Marc G. Fournier
On Sat, 31 Jan 2004, Tom Lane wrote:

 David Teran [EMAIL PROTECTED] writes:
  Apple provides a little tool that can enable / disable the l2 cache ...
  one CPU of a dual CPU system on the fly. When I start the testapp with
  two CPUs enabled I get this output here; when I turn off one CPU while
  the app is still running the messages disappear as long as one CPU is
  turned off. Reactivating the CPU again produces new error messages.

 Ah-hah, so the gettimeofday bug *is* linked to multiple CPUs.  Marc,
 were the machines you saw it on all multi-CPU?

I'm not sure ... I thought I ran it on my P4 here in the office and saw it
too, albeit not nearly as frequently ... but, in FreeBSD's case, it is a
design issue ... there are two different functions, one that is kinda
fuzzy (but fast), and the other that is designed to be exact, but at a
performance loss ... or was it the same function, but a 'sysctl' variable
that changes the state?

Can't remember which, but it is by design on FreeBSD ... and, if we're
talking about Apple, the same most likely applies, as it's based on the
same kernel ...

Back of my mind, I *think* it was these sysctl variables:

kern.timecounter.method: 0
kern.timecounter.hardware: i8254



 
Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664



[PERFORM] mnogosearch under 7.4 ...

2003-12-24 Thread Marc G. Fournier

G'day all ...

Dave asked me today about 'slow downs' on the search engines, so I am
looking at the various queries generated by enabling
log_statement/log_duration, to get a feel for whether something is off ...
and the following seems a bit weird ...

QueryA and QueryB are the same query, but against two different tables in
the database ... QueryA takes ~4x longer to run than QueryB, but both
EXPLAINs look similar ... in fact, looking at the EXPLAIN ANALYZE output,
I would expect that QueryB would be the slower of the two ... but, the
actual vs estimated times for ndict5/ndict4 seem off (ndict4 is estimated
high, ndict5 is estimated low) ...

QueryA:

186_archives=# explain analyze SELECT ndict5.url_id,ndict5.intag
 FROM ndict5, url
WHERE ndict5.word_id=1343124681
  AND url.rec_id=ndict5.url_id
  AND ((url.url || '') LIKE 
'http://archives.postgresql.org/%%');
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..69799.69 rows=44 width=8) (actual time=113.067..26477.672 rows=14112 loops=1)
   ->  Index Scan using n5_word on ndict5  (cost=0.00..34321.89 rows=8708 width=8) (actual time=27.349..25031.666 rows=15501 loops=1)
         Index Cond: (word_id = 1343124681)
   ->  Index Scan using url_rec_id on url  (cost=0.00..4.06 rows=1 width=4) (actual time=0.061..0.068 rows=1 loops=15501)
         Index Cond: (url.rec_id = "outer".url_id)
         Filter: ((url || ''::text) ~~ 'http://archives.postgresql.org/%%'::text)
 Total runtime: 26550.566 ms
(7 rows)

QueryB:

186_archives=# explain analyze SELECT ndict4.url_id,ndict4.intag
 FROM ndict4, url
WHERE ndict4.word_id=-2038735111
  AND url.rec_id=ndict4.url_id
  AND ((url.url || '') LIKE 
'http://archives.postgresql.org/%%');

                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..99120.97 rows=62 width=8) (actual time=26.330..6630.581 rows=2694 loops=1)
   ->  Index Scan using n4_word on ndict4  (cost=0.00..48829.52 rows=12344 width=8) (actual time=7.954..6373.098 rows=2900 loops=1)
         Index Cond: (word_id = -2038735111)
   ->  Index Scan using url_rec_id on url  (cost=0.00..4.06 rows=1 width=4) (actual time=0.059..0.066 rows=1 loops=2900)
         Index Cond: (url.rec_id = "outer".url_id)
         Filter: ((url || ''::text) ~~ 'http://archives.postgresql.org/%%'::text)
 Total runtime: 6643.462 ms
(7 rows)
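
A hedged aside, not something raised in the thread: when the estimated row counts on those index scans (8708 and 12344) are that far from the actual ones (15501 and 2900), the usual first step is to raise the statistics target on the skewed column and re-analyze, for example:

ALTER TABLE ndict5 ALTER COLUMN word_id SET STATISTICS 100;
ALTER TABLE ndict4 ALTER COLUMN word_id SET STATISTICS 100;
ANALYZE ndict5;
ANALYZE ndict4;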





Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664



Re: [PERFORM] *very* slow query to summarize data for a month ...

2003-11-11 Thread Marc G. Fournier

On Tue, 11 Nov 2003, Greg Stark wrote:

 Actually you might be able to get the same effect using function indexes
 like:

 create index i on traffic_log (month_trunc(runtime), company_id)

had actually thought of that one ... is it something that is only
available in v7.4?

ams=# create index i on traffic_logs ( month_trunc(runtime), company_id );
ERROR:  parser: parse error at or near "," at character 54
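
A hedged note on that error: indexes mixing an expression with plain columns only appear in 7.4; earlier releases accept just a single function call as the whole index definition, which matches the parse error at the comma above. Roughly:

-- 7.4 and later: expression plus ordinary column in one index
CREATE INDEX i ON traffic_logs (month_trunc(runtime), company_id);

-- 7.3 and earlier: only a lone functional index is accepted
CREATE INDEX tl_month ON traffic_logs (month_trunc(runtime));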




Re: [PERFORM] *very* slow query to summarize data for a month ...

2003-11-11 Thread Marc G. Fournier


On Tue, 11 Nov 2003, Dennis Bjorklund wrote:

 On Mon, 10 Nov 2003, Marc G. Fournier wrote:

 
  explain analyze SELECT ts.company_id, company_name, SUM(ts.bytes) AS total_traffic
  FROM company c, traffic_logs ts
 WHERE c.company_id = ts.company_id
   AND month_trunc(ts.runtime) = '2003-10-01'
  GROUP BY company_name,ts.company_id;

 What if you do

   ts.runtime >= '2003-10-01' AND ts.runtime < '2003-11-01'

 and add an index like (runtime, company_name, company_id)?

Good thought, but even simplifying it to the *lowest* query possible, with
no table joins, is painfully slow:

explain analyze SELECT ts.company_id, SUM(ts.bytes) AS total_traffic
FROM traffic_logs ts
   WHERE month_trunc(ts.runtime) = '2003-10-01'
GROUP BY ts.company_id;


                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=31630.84..31693.05 rows=829 width=16) (actual time=14862.71..26552.39 rows=144 loops=1)
   ->  Group  (cost=31630.84..31672.31 rows=8295 width=16) (actual time=9634.28..20967.07 rows=462198 loops=1)
         ->  Sort  (cost=31630.84..31651.57 rows=8295 width=16) (actual time=9634.24..12838.73 rows=462198 loops=1)
               Sort Key: company_id
               ->  Index Scan using tl_month on traffic_logs ts  (cost=0.00..31090.93 rows=8295 width=16) (actual time=0.26..6043.35 rows=462198 loops=1)
                     Index Cond: (month_trunc(runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
 Total runtime: 26659.35 msec
(7 rows)



-OR-

explain analyze SELECT ts.company_id, SUM(ts.bytes) AS total_traffic
FROM traffic_logs ts
   WHERE ts.runtime >= '2003-10-01' AND ts.runtime < '2003-11-01'
GROUP BY ts.company_id;


                                                                  QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=81044.53..84424.21 rows=45062 width=16) (actual time=13307.52..29274.66 rows=144 loops=1)
   ->  Group  (cost=81044.53..83297.65 rows=450625 width=16) (actual time=10809.02..-673265.13 rows=462198 loops=1)
         ->  Sort  (cost=81044.53..82171.09 rows=450625 width=16) (actual time=10808.99..14069.79 rows=462198 loops=1)
               Sort Key: company_id
               ->  Seq Scan on traffic_logs ts  (cost=0.00..38727.35 rows=450625 width=16) (actual time=0.07..6801.92 rows=462198 loops=1)
                     Filter: ((runtime >= '2003-10-01 00:00:00'::timestamp without time zone) AND (runtime < '2003-11-01 00:00:00'::timestamp without time zone))
 Total runtime: 29385.97 msec
(7 rows)


Just as a side note, doing a straight scan for the records, with no
SUM()/GROUP BY involved, using the month_trunc() index is still 8k msec:

                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Index Scan using tl_month on traffic_logs ts  (cost=0.00..31096.36 rows=8297 width=16) (actual time=0.96..5432.93 rows=462198 loops=1)
   Index Cond: (month_trunc(runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
 Total runtime: 8092.88 msec
(3 rows)

and without the index, 15k msec:

                                                           QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Seq Scan on traffic_logs ts  (cost=0.00..38719.55 rows=8297 width=16) (actual time=0.11..11354.45 rows=462198 loops=1)
   Filter: (date_trunc('month'::text, runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
 Total runtime: 15353.57 msec
(3 rows)

so the GROUP BY is affecting the overall, but even without it, it's still
taking a helluva long time ...

I'm going to modify my load script so that it dumps monthly totals to
traffic_logs, and 'details' to a schema.traffic_logs table ... I don't
need the 'per day totals' at the top level at all, only speed ... the 'per
day totals' are only required at the 'per client' level, and moving the
'per day' data into a client schema will shrink the table significantly ...

If it wasn't for trying to pull in that 'whole month' summary, it would be
fine :(
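
A hedged sketch of what that split might look like (the table and schema names here are made up, not Marc's actual ones):

-- top level: one pre-aggregated row per company per month
CREATE TABLE traffic_logs_monthly (
    company_id  bigint,
    month       timestamp without time zone,
    bytes       bigint
);

INSERT INTO traffic_logs_monthly (company_id, month, bytes)
    SELECT company_id, month_trunc(runtime), SUM(bytes)
      FROM traffic_logs
     GROUP BY company_id, month_trunc(runtime);

-- per-day detail lives only in each client's own schema, e.g.:
CREATE SCHEMA client_acme;
CREATE TABLE client_acme.traffic_logs AS
    SELECT * FROM traffic_logs WHERE company_id = 42;

The monthly rollup then makes the 'whole month' summary a scan of a few hundred rows instead of 462,198.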



[PERFORM] *very* slow query to summarize data for a month ...

2003-11-10 Thread Marc G. Fournier

Table structure is simple:

CREATE TABLE traffic_logs (
company_id bigint,
ip_id bigint,
port integer,
bytes bigint,
runtime timestamp without time zone
);

runtime is 'day of month' ...
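
month_trunc() itself isn't shown in the message; presumably it is a small immutable SQL wrapper around date_trunc('month', ...), which would also explain why the un-indexed plans below show the inlined date_trunc() call in their filters. A sketch of that assumption, together with the functional index the plans refer to:

CREATE FUNCTION month_trunc(timestamp without time zone)
    RETURNS timestamp without time zone
    AS 'SELECT date_trunc(''month'', $1)'
    LANGUAGE sql IMMUTABLE;

CREATE INDEX tl_month ON traffic_logs (month_trunc(runtime));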

I need to summarize the month, per company, with a query as:

explain analyze SELECT ts.company_id, company_name, SUM(ts.bytes) AS total_traffic
FROM company c, traffic_logs ts
   WHERE c.company_id = ts.company_id
 AND month_trunc(ts.runtime) = '2003-10-01'
GROUP BY company_name,ts.company_id;

and the explain looks like:
                                                                      QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=32000.94..32083.07 rows=821 width=41) (actual time=32983.36..47586.17 rows=144 loops=1)
   ->  Group  (cost=32000.94..32062.54 rows=8213 width=41) (actual time=32957.40..42817.88 rows=462198 loops=1)
         ->  Sort  (cost=32000.94..32021.47 rows=8213 width=41) (actual time=32957.38..36261.31 rows=462198 loops=1)
               Sort Key: c.company_name, ts.company_id
               ->  Merge Join  (cost=31321.45..31466.92 rows=8213 width=41) (actual time=13983.07..22642.14 rows=462198 loops=1)
                     Merge Cond: ("outer".company_id = "inner".company_id)
                     ->  Sort  (cost=24.41..25.29 rows=352 width=25) (actual time=5.52..7.40 rows=348 loops=1)
                           Sort Key: c.company_id
                           ->  Seq Scan on company c  (cost=0.00..9.52 rows=352 width=25) (actual time=0.02..2.78 rows=352 loops=1)
                     ->  Sort  (cost=31297.04..31317.57 rows=8213 width=16) (actual time=13977.49..16794.41 rows=462198 loops=1)
                           Sort Key: ts.company_id
                           ->  Index Scan using tl_month on traffic_logs ts  (cost=0.00..30763.02 rows=8213 width=16) (actual time=0.29..5562.25 rows=462198 loops=1)
                                 Index Cond: (month_trunc(runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
 Total runtime: 47587.82 msec
(14 rows)

the problem is that we're only taking a few months' worth of data, so I
don't think there is much of a way to 'improve performance' on this, but
figured I'd ask quickly before I do something rash ...

Note that without the month_trunc() index, the Total runtime more than
doubles:

                                                              QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=39578.63..39660.76 rows=821 width=41) (actual time=87805.47..101251.35 rows=144 loops=1)
   ->  Group  (cost=39578.63..39640.23 rows=8213 width=41) (actual time=87779.56..96824.56 rows=462198 loops=1)
         ->  Sort  (cost=39578.63..39599.17 rows=8213 width=41) (actual time=87779.52..90781.48 rows=462198 loops=1)
               Sort Key: c.company_name, ts.company_id
               ->  Merge Join  (cost=38899.14..39044.62 rows=8213 width=41) (actual time=64073.98..72783.68 rows=462198 loops=1)
                     Merge Cond: ("outer".company_id = "inner".company_id)
                     ->  Sort  (cost=24.41..25.29 rows=352 width=25) (actual time=64.66..66.55 rows=348 loops=1)
                           Sort Key: c.company_id
                           ->  Seq Scan on company c  (cost=0.00..9.52 rows=352 width=25) (actual time=1.76..61.70 rows=352 loops=1)
                     ->  Sort  (cost=38874.73..38895.27 rows=8213 width=16) (actual time=64009.26..66860.71 rows=462198 loops=1)
                           Sort Key: ts.company_id
                           ->  Seq Scan on traffic_logs ts  (cost=0.00..38340.72 rows=8213 width=16) (actual time=5.02..-645982.04 rows=462198 loops=1)
                                 Filter: (date_trunc('month'::text, runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
 Total runtime: 101277.17 msec
(14 rows)




Re: [PERFORM] *very* slow query to summarize data for a month ...

2003-11-10 Thread Marc G. Fournier


On Mon, 10 Nov 2003, Neil Conway wrote:

 Marc G. Fournier [EMAIL PROTECTED] writes:
  -  Index Scan using tl_month on traffic_logs ts  (cost=0.00..30763.02 rows=8213 
  width=16) (actual time=0.29..5562.25 rows=462198 loops=1)
Index Cond: (month_trunc(runtime) = '2003-10-01 00:00:00'::timestamp without 
  time zone)

 Interesting that we get the row count estimate for this index scan so
 wrong -- I believe this is the root of the problem. Hmmm... I would
 guess that the optimizer stats we have for estimating the selectivity
 of a functional index is pretty primitive, but I haven't looked into
 it at all. Tom might be able to shed some light...

 [ In the second EXPLAIN ANALYZE, ... ]

  -  Seq Scan on traffic_logs ts  (cost=0.00..38340.72 rows=8213 width=16) (actual 
  time=5.02..-645982.04 rows=462198 loops=1)
Filter: (date_trunc('month'::text, runtime) = '2003-10-01 
  00:00:00'::timestamp without time zone)

 Uh, what? The actual time seems to have finished far before it has
 begun :-) Is this just a typo, or does the actual output include a
 negative number?

This was purely a cut-n-paste ...




Re: [PERFORM] *very* slow query to summarize data for a month ...

2003-11-10 Thread Marc G. Fournier


On Mon, 10 Nov 2003, Tom Lane wrote:

 Neil Conway [EMAIL PROTECTED] writes:
  Interesting that we get the row count estimate for this index scan so
  wrong -- I believe this is the root of the problem. Hmmm... I would
  guess that the optimizer stats we have for estimating the selectivity
  of a functional index is pretty primitive, but I haven't looked into
  it at all. Tom might be able to shed some light...

 Try none at all.  I have speculated in the past that it would be worth
 gathering statistics about the contents of functional indexes, but it's
 still on the to-do-someday list.

  -  Seq Scan on traffic_logs ts  (cost=0.00..38340.72 rows=8213 width=16) (actual 
  time=5.02..-645982.04 rows=462198 loops=1)

  Uh, what?

 That is bizarre, all right.  Is it reproducible?

Nope, and a subsequent run shows better results too:

                                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=39674.38..39756.70 rows=823 width=41) (actual time=35573.27..49953.47 rows=144 loops=1)
   ->  Group  (cost=39674.38..39736.12 rows=8232 width=41) (actual time=35547.27..45479.27 rows=462198 loops=1)
         ->  Sort  (cost=39674.38..39694.96 rows=8232 width=41) (actual time=35547.23..39167.90 rows=462198 loops=1)
               Sort Key: c.company_name, ts.company_id
               ->  Merge Join  (cost=38993.22..39139.02 rows=8232 width=41) (actual time=16658.23..25559.08 rows=462198 loops=1)
                     Merge Cond: ("outer".company_id = "inner".company_id)
                     ->  Sort  (cost=24.41..25.29 rows=352 width=25) (actual time=5.51..7.38 rows=348 loops=1)
                           Sort Key: c.company_id
                           ->  Seq Scan on company c  (cost=0.00..9.52 rows=352 width=25) (actual time=0.02..2.80 rows=352 loops=1)
                     ->  Sort  (cost=38968.82..38989.40 rows=8232 width=16) (actual time=16652.66..19785.83 rows=462198 loops=1)
                           Sort Key: ts.company_id
                           ->  Seq Scan on traffic_logs ts  (cost=0.00..38433.46 rows=8232 width=16) (actual time=0.11..8794.43 rows=462198 loops=1)
                                 Filter: (date_trunc('month'::text, runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
 Total runtime: 49955.22 msec




Re: [PERFORM] *very* slow query to summarize data for a month ...

2003-11-10 Thread Marc G. Fournier


On Mon, 10 Nov 2003, Josh Berkus wrote:

 Marc,

 I'd say your machine is very low on available RAM, particularly sort_mem.
 The steps which are taking a long time are:

Here's the server:

last pid: 42651;  load averages:  1.52,  0.96,  0.88
up 28+07:43:33  20:35:44
307 processes: 2 running, 304 sleeping, 1 zombie
CPU states: 18.0% user,  0.0% nice, 29.1% system,  0.6% interrupt, 52.3% idle
Mem: 1203M Active, 1839M Inact, 709M Wired, 206M Cache, 199M Buf, 5608K Free
Swap: 8192M Total, 1804K Used, 8190M Free


   Aggregate  (cost=32000.94..32083.07 rows=821 width=41) (actual
 time=32983.36..47586.17 rows=144 loops=1)
 -  Group  (cost=32000.94..32062.54 rows=8213 width=41) (actual
 time=32957.40..42817.88 rows=462198 loops=1)

 and:

 -  Merge Join  (cost=31321.45..31466.92 rows=8213 width=41)
 (actual time=13983.07..22642.14 rows=462198 loops=1)
   Merge Cond: (outer.company_id = inner.company_id)
   -  Sort  (cost=24.41..25.29 rows=352 width=25) (actual
 time=5.52..7.40 rows=348 loops=1)

 There are also *large* delays between steps.  Either your I/O is saturated,
 or you haven't run a VACUUM FULL ANALYZE in a while (which would also explain
 the estimates being off).

thought about that before I started the thread, and ran it just in case ...

just restarted the server with sort_mem set to 10M, and it didn't help much
on the Aggregate or Merge Join ...:

                                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=39674.38..39756.70 rows=823 width=41) (actual time=33066.25..54021.50 rows=144 loops=1)
   ->  Group  (cost=39674.38..39736.12 rows=8232 width=41) (actual time=33040.25..47005.57 rows=462198 loops=1)
         ->  Sort  (cost=39674.38..39694.96 rows=8232 width=41) (actual time=33040.22..37875.97 rows=462198 loops=1)
               Sort Key: c.company_name, ts.company_id
               ->  Merge Join  (cost=38993.22..39139.02 rows=8232 width=41) (actual time=14428.17..23568.80 rows=462198 loops=1)
                     Merge Cond: ("outer".company_id = "inner".company_id)
                     ->  Sort  (cost=24.41..25.29 rows=352 width=25) (actual time=5.80..7.66 rows=348 loops=1)
                           Sort Key: c.company_id
                           ->  Seq Scan on company c  (cost=0.00..9.52 rows=352 width=25) (actual time=0.08..3.06 rows=352 loops=1)
                     ->  Sort  (cost=38968.82..38989.40 rows=8232 width=16) (actual time=14422.27..17429.34 rows=462198 loops=1)
                           Sort Key: ts.company_id
                           ->  Seq Scan on traffic_logs ts  (cost=0.00..38433.46 rows=8232 width=16) (actual time=0.15..8119.72 rows=462198 loops=1)
                                 Filter: (date_trunc('month'::text, runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
 Total runtime: 54034.44 msec
(14 rows)

the problem is that the result we are comparing with right now is the one
that had the negative time in it :(  Just restarted the server with default
sort_mem, and here is the query with that:

                                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=39691.27..39773.61 rows=823 width=41) (actual time=35077.18..50424.74 rows=144 loops=1)
   ->  Group  (cost=39691.27..39753.03 rows=8234 width=41) (actual time=35051.29..-650049.84 rows=462198 loops=1)
         ->  Sort  (cost=39691.27..39711.86 rows=8234 width=41) (actual time=35051.26..38847.40 rows=462198 loops=1)
               Sort Key: c.company_name, ts.company_id
               ->  Merge Join  (cost=39009.92..39155.76 rows=8234 width=41) (actual time=16155.37..25439.42 rows=462198 loops=1)
                     Merge Cond: ("outer".company_id = "inner".company_id)
                     ->  Sort  (cost=24.41..25.29 rows=352 width=25) (actual time=5.85..7.71 rows=348 loops=1)
                           Sort Key: c.company_id
                           ->  Seq Scan on company c  (cost=0.00..9.52 rows=352 width=25) (actual time=0.10..3.07 rows=352 loops=1)
                     ->  Sort  (cost=38985.51..39006.10 rows=8234 width=16) (actual time=16149.46..19437.47 rows=462198 loops=1)
                           Sort Key: ts.company_id
                           ->  Seq Scan on traffic_logs ts  (cost=0.00..38450.00 rows=8234 width=16) (actual time=0.16..8869.37 rows=462198 loops=1)
                                 Filter: (date_trunc('month'::text, runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
 Total runtime: 50426.80 msec
(14 rows)


And, just on a whim, here it is set to 100M:

 QUERY PLAN

Re: [PERFORM] FreeBSD page size

2003-09-03 Thread Marc G. Fournier


Just curious, but Bruce(?) mentioned that apparently a 32k block size was
found to show a 15% improvement ... care to run one more test? :)

On Wed, 3 Sep 2003, Vivek Khera wrote:

 Ok... simple tests have completed.  Here are some numbers.

 FreeBSD 4.8
 PG 7.4b2
 4GB Ram
 Dual Xeon 2.4GHz processors
 14 U320 SCSI disks attached to Dell PERC3/DC RAID controller in RAID 5
  config with 32k stripe size

 Dump file:
 -rw-r--r--  1 vivek  wheel  1646633745 Aug 28 11:01 19-Aug-2003.dump

 When restored (after deleting one index that took up ~1Gb -- turned
 out it was redundant to another multi-column index):

 % df -k /u/d02
 Filesystem1K-blocks Used Avail Capacity  Mounted on
 /dev/amrd1s1e 226408360 18067260 190228432 9%/u/d02



 postgresql.conf alterations from standard:
 shared_buffers = 6
 sort_mem = 8192
 vacuum_mem=131702
 max_fsm_pages=100
 effective_cache_size=25600
 random_page_cost = 2


 restore time: 14777 seconds
 vacuum analyze time: 30 minutes
 select count(*) from user_list where owner_id=315;   50388.64 ms


 the restore complained often about checkpoints occurring every few
 seconds:

 Sep  2 11:57:14 d02 postgres[49721]: [5-1] LOG:  checkpoints are occurring too 
 frequently (15 seconds apart)
 Sep  2 11:57:14 d02 postgres[49721]: [5-2] HINT:  Consider increasing 
 CHECKPOINT_SEGMENTS.

 The HINT threw me off since I had to set checkpoint_segments in
 postgresql.conf, whereas CHECKPOINT_SEGMENTS implied to me a
 compile-time constant.

 Anyhow, so I deleted the PG data directory, and made these two
 changes:

 checkpoint_segments=50
 sort_mem = 131702

 This *really* improved the time for the restore:

 restore time: 11594 seconds

 then I reset the checkpoint_segments and sort_mem back to old
 values...

 vacuum analyze time is still 30 minutes
 select count(*) from user_list where owner_id=315;   51363.98 ms

 so the select appears a bit slower but it is hard to say why.  the
 system is otherwise idle as it is not in production yet.


 Then I took the suggestion to update PG's page size to 16k and did the
 same increase on sort_mem and checkpoint_segments as above.  I also
 halved the shared_buffers and max_fsm_pages  (probably should have
 halved the effective_cache_size too...)

 restore time: 11322 seconds
 vacuum analyze time: 27 minutes
 select count(*) from user_list where owner_id=315;   48267.66 ms


 Granted, given this simple test it is hard to say whether the 16k
 blocks will make an improvement under live load, but I'm gonna give it
 a shot.  The 16k block size shows me roughly 2-6% improvement on these
 tests.

 So throw in my vote for 16k blocks on FreeBSD (and annotate the docs
 to tell which parameters need to be halved to account for it).


 --
 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 Vivek Khera, Ph.D.Khera Communications, Inc.
 Internet: [EMAIL PROTECTED]   Rockville, MD   +1-240-453-8497
 AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/




Re: [PERFORM] FreeBSD page size

2003-09-03 Thread Marc G. Fournier


On Wed, 3 Sep 2003, Bruce Momjian wrote:

 Vivek Khera wrote:
  the restore complained often about checkpoints occurring every few
  seconds:
 
  Sep  2 11:57:14 d02 postgres[49721]: [5-1] LOG:  checkpoints are occurring too 
  frequently (15 seconds apart)
  Sep  2 11:57:14 d02 postgres[49721]: [5-2] HINT:  Consider increasing 
  CHECKPOINT_SEGMENTS.
 
  The HINT threw me off since I had to set checkpoint_segments in
  postgresql.conf, whereas CHECKPOINT_SEGMENTS implied to me a
  compile-time constant.

 Woo hoo, my warning worked.  Great.

 I uppercased it because config parameters are uppercased in the
 documentation.  Do we mention config parameters in any other error
 messages?  Should it be lowercased?

k, to me upper case denotes a compiler #define, so I would have been
confused ... I'd go with lower case and single quotes around it to denote
it's a variable to be changed ...
