Re: [PERFORM] Postgres Replaying WAL slowly

2014-07-01 Thread Jeff Frost
On Jun 30, 2014, at 4:57 PM, Jeff Frost j...@pgexperts.com wrote:

 
 On Jun 30, 2014, at 4:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 
 Ah ... that's more like a number I can believe something would have
 trouble coping with.  Did you see a noticeable slowdown with this?
 Now that we've seen that number, of course it's possible there was an
 even higher peak occurring when you saw the trouble.
 
 Perhaps there's an O(N^2) behavior in StandbyReleaseLocks, or maybe
 it just takes awhile to handle that many locks.
 
 Did you check whether the locks were all on temp tables of the
 ON COMMIT DROP persuasion?
 
 
 Unfortunately not, because I went for a poor man's: SELECT count(*) FROM 
 pg_locks WHERE mode = 'AccessExclusiveLock' 
 run in cron every minute.
 
 That said, I'd bet it was mostly ON COMMIT DROP temp tables.
 
 The unfortunate thing is I wouldn't know how to correlate that spike with the 
 corresponding slowdown because the replica is about 5.5hrs lagged at the 
 moment.
 
 Hopefully it will get caught up tonight and we can see if there's a 
 correlation tomorrow.

And indeed it did catch up overnight, and the lag increased shortly after a 
correlating spike in AccessExclusiveLocks generated by temp table creation 
with ON COMMIT DROP.
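For reference, a slightly richer version of the poor man's monitor is to count AccessExclusiveLocks per locktype, so temp-table DDL stands out from other exclusive locking (a sketch; assumes the standard pg_locks columns):

```sql
-- Sketch: break AccessExclusiveLocks down by locktype.  A spike in
-- 'relation' locks whose OIDs never show up in pg_class is a good hint
-- that they belong to uncommitted temp tables (e.g. ON COMMIT DROP).
SELECT locktype, count(*)
FROM pg_locks
WHERE mode = 'AccessExclusiveLock'
GROUP BY locktype
ORDER BY count(*) DESC;
```

Run from cron the same way as the original count(*) query, this also records which kind of object the spike was on.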




Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Jeff Frost

On Jun 30, 2014, at 9:14 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote:

 On 06/30/2014 05:46 PM, Soni M wrote:
 Here's what 'perf top' said on streaming replica :
 
 Samples: 26K of event 'cpu-clock', Event count (approx.): 19781
  95.97%  postgres [.] 0x002210f3
 
 Ok, so it's stuck doing something.. Can you get a build with debug symbols 
 installed, so that we could see the function name?
 - Heikki
 

Looks like StandbyReleaseLocks:

Samples: 10K of event 'cpu-clock', Event count (approx.): 8507
 89.21%  postgres  [.] StandbyReleaseLocks
  0.89%  libc-2.12.so  [.] __strstr_sse2
  0.83%  perf  [.] 0x0005f1e5
  0.74%  [kernel]  [k] kallsyms_expand_symbol
  0.52%  libc-2.12.so  [.] memchr
  0.47%  perf  [.] symbols__insert
  0.47%  [kernel]  [k] format_decode

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Jeff Frost
On Jun 30, 2014, at 10:29 AM, Soni M diptat...@gmail.com wrote:

 
 
 
 On Tue, Jul 1, 2014 at 12:14 AM, Andres Freund and...@2ndquadrant.com wrote:
 
 My guess is it's a spinlock, probably xlogctl->info_lck via
 RecoveryInProgress(). Unfortunately inline assembler doesn't always seem
 to show up correctly in profiles...
 
 What worked for me was to build with -fno-omit-frame-pointer - that
 normally shows the callers, even if it can't generate a proper symbol
 name.
 
 Soni: Do you use Hot Standby? Are there connections active while you
 have that problem? Any other processes with high cpu load?
 
 Greetings,
 
 Andres Freund
 
 --
  Andres Freund http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
 
 It is 
  96.62%  postgres [.] StandbyReleaseLocks
 as Jeff said. It runs for quite a long time, more than 5 minutes I think.
 
 I also use hot standby. We have 4 streaming replicas; some of them have active 
 connections, some do not. This issue has lasted more than 4 days. On one of the 
 standbys, the postgres process above is the only process consuming high CPU 
 load.

compiled with -fno-omit-frame-pointer doesn't yield much more info:

 76.24%  postgres   [.] StandbyReleaseLocks
  2.64%  libcrypto.so.1.0.1e[.] md5_block_asm_data_order
  2.19%  libcrypto.so.1.0.1e[.] RC4
  2.17%  postgres   [.] RecordIsValid
  1.20%  [kernel]   [k] copy_user_generic_unrolled
  1.18%  [kernel]   [k] _spin_unlock_irqrestore
  0.97%  [vmxnet3]  [k] vmxnet3_poll_rx_only
  0.87%  [kernel]   [k] __do_softirq
  0.77%  [vmxnet3]  [k] vmxnet3_xmit_frame
  0.69%  postgres   [.] hash_search_with_hash_value
  0.68%  [kernel]   [k] fin

However, this server started progressing through the WAL files quite a bit 
better before I finished compiling, so we'll leave it running with this version 
and see if there's more info available the next time it starts replaying slowly.




Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Jeff Frost
On Jun 30, 2014, at 11:39 AM, Andres Freund and...@2ndquadrant.com wrote:

 On 2014-06-30 11:34:52 -0700, Jeff Frost wrote:
 On Jun 30, 2014, at 10:29 AM, Soni M diptat...@gmail.com wrote:
 
 It is 
 96.62%  postgres [.] StandbyReleaseLocks
 as Jeff said. It runs for quite a long time, more than 5 minutes I think.
 
 I also use hot standby. We have 4 streaming replicas; some of them have 
 active connections, some do not. This issue has lasted more than 4 days. On 
 one of the standbys, the postgres process above is the only process consuming 
 high CPU load.
 
 compiled with -fno-omit-frame-pointer doesn't yield much more info:
 
 You'd need to do perf record -ga instead of perf record -a to see
 additional information.
 

Ah! That's right.

Here's how that looks:

Samples: 473K of event 'cpu-clock', Event count (approx.): 473738
+  68.42%      init  [kernel.kallsyms]    [k] native_safe_halt
+  26.07%  postgres  postgres             [.] StandbyReleaseLocks
+   2.82%   swapper  [kernel.kallsyms]    [k] native_safe_halt
+   0.19%       ssh  libcrypto.so.1.0.1e  [.] md5_block_asm_data_order
+   0.19%  postgres  postgres             [.] RecordIsValid
+   0.16%       ssh  libcrypto.so.1.0.1e  [.] RC4
+   0.10%  postgres  postgres             [.] hash_search_with_hash_value
+   0.06%  postgres  [kernel.kallsyms]    [k] _spin_unlock_irqrestore
+   0.05%      init  [vmxnet3]            [k] vmxnet3_poll_rx_only
+   0.04%  postgres  [kernel.kallsyms]    [k] copy_user_generic_unrolled
+   0.04%      init  [kernel.kallsyms]    [k] finish_task_switch
+   0.04%      init  [kernel.kallsyms]    [k] __do_softirq
+   0.04%       ssh  [kernel.kallsyms]    [k] _spin_unlock_irqrestore
+   0.04%       ssh  [vmxnet3]            [k] vmxnet3_xmit_frame
+   0.03%  postgres  postgres             [.] PinBuffer
+   0.03%      init  [vmxnet3]            [k] vmxnet3_xmit_frame
+   0.03%       ssh  [kernel.kallsyms]    [k] copy_user_generic_unrolled
+   0.03%  postgres  postgres             [.] XLogReadBufferExtended
+   0.03%       ssh  ssh                  [.] 0x0002aa07
+   0.03%      init  [kernel.kallsyms]    [k] _spin_unlock_irqrestore
+   0.03%       ssh  [vmxnet3]            [k] vmxnet3_poll_rx_only
+   0.02%       ssh  [kernel.kallsyms]    [k] __do_softirq
+   0.02%  postgres  libc-2.12.so         [.] _wordcopy_bwd_dest_aligned
+   0.02%  postgres  postgres             [.] mdnblocks
+   0.02%       ssh  libcrypto.so.1.0.1e  [.] 0x000e25a1
+   0.02%       scp  [kernel.kallsyms]    [k] copy_user_generic_unrolled
+   0.02%       ssh  libc-2.12.so         [.] memcpy
+   0.02%  postgres  libc-2.12.so         [.] memcpy


 But:
 
 76.24%  postgres   [.] StandbyReleaseLocks
 
 already is quite helpful.
 
 What are you doing on that system? Is there anything requiring large
 amounts of access exclusive locks on the primary? Possibly large amounts
 of temporary relations?


The last time we did a 100% logging run, the peak temp table creation was 
something like 120k/hr, but the replicas seemed able to keep up with that just 
fine.

Hopefully Soni can answer whether that has increased significantly since May.





Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Jeff Frost
On Jun 30, 2014, at 12:17 PM, Jeff Frost j...@pgexperts.com wrote:

 
 already is quite helpful.
 
 What are you doing on that system? Is there anything requiring large
 amounts of access exclusive locks on the primary? Possibly large amounts
 of temporary relations?
 
 
 The last time we did a 100% logging run, the peak temp table creation was 
 something like 120k/hr, but the replicas seemed able to keep up with that 
 just fine.
 

Sampling pg_locks on the primary shows ~50 locks with ExclusiveLock mode:

           mode           | count
--------------------------+-------
 AccessExclusiveLock      |    11
 AccessShareLock          |  2089
 ExclusiveLock            |    46
 RowExclusiveLock         |    81
 RowShareLock             |    17
 ShareLock                |     4
 ShareUpdateExclusiveLock |     5

Seems to be relatively consistent.  Of course, it's hard to say what it looked 
like back when the issue began.





Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Jeff Frost

On Jun 30, 2014, at 12:32 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Jeff Frost j...@pgexperts.com writes:
 Sampling pg_locks on the primary shows ~50 locks with ExclusiveLock mode:
 
            mode           | count
 --------------------------+-------
  AccessExclusiveLock      |    11
  AccessShareLock          |  2089
  ExclusiveLock            |    46
  RowExclusiveLock         |    81
  RowShareLock             |    17
  ShareLock                |     4
  ShareUpdateExclusiveLock |     5
 
 That's not too helpful if you don't pay attention to what the lock is on;
 it's likely that all the ExclusiveLocks are on transactions' own XIDs,
 which isn't relevant to the standby's behavior.  The AccessExclusiveLocks
 are probably interesting though --- you should look to see what those
 are on.

You're right about the ExclusiveLocks.

Here's how the AccessExclusiveLocks look:

 locktype | database |  relation  | page | tuple | virtualxid | transactionid | classid |   objid    | objsubid | virtualtransaction |  pid  |        mode         | granted
----------+----------+------------+------+-------+------------+---------------+---------+------------+----------+--------------------+-------+---------------------+---------
 relation |   111285 | 3245291551 |      |       |            |               |         |            |          | 233/170813         | 23509 | AccessExclusiveLock | t
 relation |   111285 | 3245292820 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245292833 |      |       |            |               |         |            |          | 173/1723993        | 23407 | AccessExclusiveLock | t
 relation |   111285 | 3245287874 |      |       |            |               |         |            |          | 133/3818415        | 23348 | AccessExclusiveLock | t
 relation |   111285 | 3245292836 |      |       |            |               |         |            |          | 173/1723993        | 23407 | AccessExclusiveLock | t
 relation |   111285 | 3245292774 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245292734 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245292827 |      |       |            |               |         |            |          | 173/1723993        | 23407 | AccessExclusiveLock | t
 relation |   111285 | 3245288540 |      |       |            |               |         |            |          | 133/3818415        | 23348 | AccessExclusiveLock | t
 relation |   111285 | 3245292773 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245292775 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245292743 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245292751 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245288669 |      |       |            |               |         |            |          | 133/3818415        | 23348 | AccessExclusiveLock | t
 relation |   111285 | 3245292817 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245288657 |      |       |            |               |         |            |          | 133/3818415        | 23348 | AccessExclusiveLock | t
 object   |   111285 |            |      |       |            |               |    2615 | 1246019760 |        0 | 233/170813         | 23509 | AccessExclusiveLock | t
 relation |   111285 | 3245292746 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245287876 |      |       |            |               |         |            |          | 133/3818415        | 23348 | AccessExclusiveLock | t
 relation |   111285 | 3245292739 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245292826 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245292825 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245292832

Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Jeff Frost

On Jun 30, 2014, at 12:54 PM, Matheus de Oliveira matioli.math...@gmail.com 
wrote:

 
 On Mon, Jun 30, 2014 at 4:42 PM, Jeff Frost j...@pgexperts.com wrote:
 And if you go fishing in pg_class for any of the oids, you don't find 
 anything:
 
 That is probably because you are connected in the wrong database. Once you 
 connect to the database of interest, you don't even need to query pg_class, 
 just cast relation attribute to regclass:
 
 SELECT relation::regclass, ...
 FROM pg_locks WHERE database = (SELECT oid FROM pg_database WHERE datname 
 = current_database());
 

Yah, I thought about that too, but verified I am in the correct DB.  Just for 
clarity's sake:

SELECT relation::regclass
FROM pg_locks WHERE database = (SELECT oid FROM pg_database WHERE datname = 
current_database()) and mode = 'AccessExclusiveLock';

  relation
------------
 3245508214
 3245508273
 3245508272
 3245508257
 3245508469
 3245508274
 3245508373
 3245508468
 3245508210
 3245508463
 3245508205
 3245508260
 3245508265
 3245508434
(16 rows)

Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Jeff Frost

On Jun 30, 2014, at 1:15 PM, Andres Freund and...@2ndquadrant.com wrote:

 On 2014-06-30 12:57:56 -0700, Jeff Frost wrote:
 
 On Jun 30, 2014, at 12:54 PM, Matheus de Oliveira 
 matioli.math...@gmail.com wrote:
 
 
 On Mon, Jun 30, 2014 at 4:42 PM, Jeff Frost j...@pgexperts.com wrote:
 And if you go fishing in pg_class for any of the oids, you don't find 
 anything:
 
 That is probably because you are connected in the wrong database. Once you 
 connect to the database of interest, you don't even need to query pg_class, 
 just cast relation attribute to regclass:
 
SELECT relation::regclass, ...
FROM pg_locks WHERE database = (SELECT oid FROM pg_database WHERE 
 datname = current_database());
 
 
 Yah, I thought about that too, but verified I am in the correct DB.  Just 
 for clarity's sake:
 
 So these are probably relations created in uncommitted
 transactions. Possibly ON COMMIT DROP temp tables?


That would make sense.  There are definitely quite a few of those being used.
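For readers following along, the pattern in question looks roughly like this (an illustrative sketch, not the customer's actual schema). Each such transaction takes AccessExclusiveLocks that get WAL-logged for hot standby, so the standby's startup process must track and release them:

```sql
BEGIN;
-- The temp table exists only for this transaction; its creation and
-- implicit drop at commit both generate AccessExclusiveLock traffic
-- that the standby has to process for conflict handling.
CREATE TEMP TABLE scratch (id int, payload text) ON COMMIT DROP;
INSERT INTO scratch
SELECT g, 'row ' || g FROM generate_series(1, 1000) g;
-- ... work with scratch ...
COMMIT;  -- temp table dropped here; the standby releases its locks
```

At 120k such tables per hour, the standby is releasing a steady stream of these locks during replay.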

Another item of note is the system catalogs are quite bloated:

 schemaname |  tablename   | tbloat | wastedmb | idxbloat | wastedidxmb
+--++--+--+-
 pg_catalog | pg_attribute |   3945 |   106.51 | 2770 |  611.28
 pg_catalog | pg_class |   8940 |45.26 | 4420 |   47.89
 pg_catalog | pg_type  |   4921 |18.45 | 5850 |   81.16
 pg_catalog | pg_depend|933 |10.23 |11730 |  274.37
 pg_catalog | pg_index |   3429 | 8.36 | 3920 |   24.24
 pg_catalog | pg_shdepend  |983 | 2.67 | 9360 |   30.56
(6 rows)

Would that cause the replica to spin on StandbyReleaseLocks?




Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Jeff Frost

On Jun 30, 2014, at 1:39 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 
 
 Another item of note is the system catalogs are quite bloated:
 Would that cause the replica to spin on StandbyReleaseLocks?
 
 AFAIK, no.  It's an unsurprising consequence of heavy use of short-lived
 temp tables though.
 

Yah, this has been an issue in the past, so we tend to cluster them regularly 
during off-hours to minimize the issue.
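The off-hours maintenance mentioned above might look like the following (a sketch; VACUUM FULL, like CLUSTER, rewrites the table and takes an exclusive lock, so it belongs in a quiet window):

```sql
-- Rewrite the most bloat-prone catalogs to reclaim space.
-- Requires exclusive locks; run during a maintenance window.
VACUUM FULL pg_catalog.pg_attribute;
VACUUM FULL pg_catalog.pg_class;
VACUUM FULL pg_catalog.pg_type;
VACUUM FULL pg_catalog.pg_depend;
```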

 So it seems like we have a candidate explanation.  I'm a bit surprised
 that StandbyReleaseLocks would get this slow if there are only a dozen
 AccessExclusiveLocks in place at any one time, though.  Perhaps that
 was a low point and there are often many more?
 
   

Entirely possible that it was a low point.  We'll set up some monitoring to 
track the number of AccessExclusiveLocks and see how much variance there is 
throughout the day.






Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Jeff Frost

On Jun 30, 2014, at 1:46 PM, Jeff Frost j...@pgexperts.com wrote:

 So it seems like we have a candidate explanation.  I'm a bit surprised
 that StandbyReleaseLocks would get this slow if there are only a dozen
 AccessExclusiveLocks in place at any one time, though.  Perhaps that
 was a low point and there are often many more?
 
  
 
 Entirely possible that it was a low point.  We'll set up some monitoring to 
 track the number of AccessExclusiveLocks and see how much variance there is 
 throughout the day.


Since we turned on the monitoring for that, we've seen a peak of 13,550 
AccessExclusiveLocks.  So far most of the samples have been in the double 
digits, with that peak and two other outliers: 6,118 and 12,747.

Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Jeff Frost

On Jun 30, 2014, at 4:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Ah ... that's more like a number I can believe something would have
 trouble coping with.  Did you see a noticeable slowdown with this?
 Now that we've seen that number, of course it's possible there was an
 even higher peak occurring when you saw the trouble.
 
 Perhaps there's an O(N^2) behavior in StandbyReleaseLocks, or maybe
 it just takes awhile to handle that many locks.
 
 Did you check whether the locks were all on temp tables of the
 ON COMMIT DROP persuasion?


Unfortunately not, because I went for a poor man's: SELECT count(*) FROM 
pg_locks WHERE mode = 'AccessExclusiveLock' 
run in cron every minute.

That said, I'd bet it was mostly ON COMMIT DROP temp tables.

The unfortunate thing is I wouldn't know how to correlate that spike with the 
corresponding slowdown because the replica is about 5.5hrs lagged at the moment.

Hopefully it will get caught up tonight and we can see if there's a correlation 
tomorrow.
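Tom's O(N^2) hypothesis is easy to see with a toy model (hypothetical Python, not the actual StandbyReleaseLocks code): if each lock release has to re-scan the whole list of locks the startup process is tracking, releasing N locks costs about N^2/2 comparisons.

```python
def release_all_linear(locks):
    """Release every lock, finding each one by linear scan.

    Toy model of a release loop that re-scans the tracked-lock list for
    every release; returns the number of comparisons performed.
    """
    comparisons = 0
    while locks:
        target = locks[-1]               # lock we want to release next
        for i, lock in enumerate(locks):
            comparisons += 1
            if lock == target:
                del locks[i]             # found it; drop from the list
                break
    return comparisons

# 2,000 locks already cost 2000*2001/2 = 2,001,000 comparisons; at the
# observed peak of 13,550 locks this grows to roughly 92 million.
print(release_all_linear(list(range(2_000))))  # → 2001000
```

With a hash-table lookup per lock the same workload would be linear, which is why a peak of 13,550 locks is "a number I can believe something would have trouble coping with."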

Re: [PERFORM] High CPU usage / load average after upgrading to Ubuntu 12.04

2013-02-25 Thread Jeff Frost
On 02/20/13 19:14, Josh Berkus wrote:
 Sounds to me like your IO system is stalling on fsyncs or something
 like that.  On machines with plenty of IO cranking up completion
 target usuall smooths things out. 
 It certainly seems like it does.  However, I can't demonstrate the issue
 using any simpler tool than pgbench ... even running four test_fsyncs in
 parallel didn't show any issues, nor do standard FS testing tools.


We were really starting to think that the system had an IO problem that we
couldn't tickle with any synthetic tools.  Then one of our other customers who
upgraded to Ubuntu 12.04 LTS and is also experiencing issues came across the
following LKML thread regarding pdflush on 3.0+ kernels:

https://lkml.org/lkml/2012/10/9/210

So, I went and built a couple custom kernels with this patch removed:

https://patchwork.kernel.org/patch/825212/

and the bad behavior stopped.   Best performance was with a 3.5 kernel with
the patch removed.



-- 
Jeff Frost j...@pgexperts.com
CTO, PostgreSQL Experts, Inc.
Phone: 1-888-PG-EXPRT x506
FAX: 415-762-5122
http://www.pgexperts.com/ 





[PERFORM] strange index behaviour with different statistics target

2009-01-13 Thread Jeff Frost
So, I had a query that uses a postgis geometry index and the planner was
underestimating the number of rows it would return.  Because of this,
the planner was choosing the geometry index over a compound index on the
other columns in the WHERE clause.  So, I thought, let me increase the
stats target for that geometry column.  I did, and I got a different
(and better) plan, but when I looked at the estimates for the simplified
query against the geometry column alone, I noticed that neither the cost
nor the estimated rows changed:

oitest=# explain ANALYZE  SELECT * FROM blips WHERE
((ST_Contains(blips.shape,
'010120E61049111956F1EB55C0A8E49CD843F34440')) );

 
                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using index_blips_on_shape_gist on blips  (cost=0.00..7.33 
rows=1 width=13804) (actual time=0.113..745.394 rows=2827 loops=1)
   Index Cond: (shape && 
'010120E61049111956F1EB55C0A8E49CD843F34440'::geometry)
   Filter: ((shape && 
'010120E61049111956F1EB55C0A8E49CD843F34440'::geometry) AND
_st_contains(shape,
'010120E61049111956F1EB55C0A8E49CD843F34440'::geometry))
 Total runtime: 745.977 ms
(4 rows)

Time: 747.199 ms
oitest=# ALTER TABLE blips ALTER COLUMN shape SET statistics 1000;
ALTER TABLE
Time: 0.478 ms
oitest=# ANALYZE ;
ANALYZE
Time: 7727.097 ms
oitest=# explain ANALYZE  SELECT * FROM blips WHERE
((ST_Contains(blips.shape,
'010120E61049111956F1EB55C0A8E49CD843F34440')) );

 
                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using index_blips_on_shape_gist on blips  (cost=0.00..7.33 
rows=1 width=13761) (actual time=0.117..755.781 rows=2827 loops=1)
   Index Cond: (shape && 
'010120E61049111956F1EB55C0A8E49CD843F34440'::geometry)
   Filter: ((shape && 
'010120E61049111956F1EB55C0A8E49CD843F34440'::geometry) AND
_st_contains(shape,
'010120E61049111956F1EB55C0A8E49CD843F34440'::geometry))
 Total runtime: 756.396 ms
(4 rows)

The width changed slightly, but the cost is 7.33 in both.

So, now I thought how could that have changed the plan?  Did the other
parts of the plan estimate change?  So I pulled the shape column out of
the where clause and left the others:

oitest=# ALTER TABLE blips ALTER COLUMN shape SET statistics 100;
ALTER TABLE
Time: 0.475 ms
oitest=# ANALYZE ;
ANALYZE
Time: 1225.325 ms
oitest=# explain ANALYZE  SELECT * FROM blips WHERE
(blips.content_id = 2410268 AND blips.content_type = E'Story');
 
                                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using index_blips_on_content_type_and_content_id on blips 
(cost=0.00..9.01 rows=2 width=13924) (actual time=0.026..0.027 rows=2
loops=1)
   Index Cond: (((content_type)::text = 'Story'::text) AND (content_id =
2410268))
 Total runtime: 0.046 ms
(3 rows)

Time: 1.111 ms
oitest=# ALTER TABLE blips ALTER COLUMN shape SET statistics 1000;
ALTER TABLE
Time: 0.506 ms
oitest=# ANALYZE ;
ANALYZE
Time: 7785.496 ms
oitest=# explain ANALYZE  SELECT * FROM blips WHERE
(blips.content_id = 2410268 AND blips.content_type = E'Story');
                                                            QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Index Scan using index_blips_on_content_id on blips  (cost=0.00..7.29
rows=1 width=13761) (actual time=0.013..0.014 rows=2 loops=1)
   Index Cond: (content_id = 2410268)
   Filter: ((content_type)::text = 'Story'::text)
 Total runtime: 0.034 ms
(4 rows)

Time: 1.007 ms

So, my question is, should changing the stats target on the shape column
affect the stats for the content_id and content_type columns?  Also, why
does the index on content_id win out over the compound index on
(content_type, content_id)?

index_blips_on_content_id btree (content_id)
index_blips_on_content_type_and_content_id btree (content_type,
content_id)

-- 
Jeff Frost, Owner   j...@frostconsultingllc.com
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 916-647-6411 FAX: 916-405-4032



Re: [PERFORM] strange index behaviour with different statistics target

2009-01-13 Thread Jeff Frost

On Tue, 13 Jan 2009, Tom Lane wrote:


Jeff Frost j...@frostconsultingllc.com writes:

So, my question is, should changing the stats target on the shape column
affect the stats for the content_id and content_type columns?


It would change the size of the sample for the table, which might
improve the accuracy of the stats.  IIRC you'd still get the same number
of histogram entries and most-common-values for the other columns, but
they might be more accurate.


Why would they be more accurate?  Do they somehow correlate with the other 
column's histogram and most-common-values when the stats target is increased 
on that column?


The planner is choosing a plan I like for the query, I'm just trying to 
understand why it's doing that since the planner thinks the gist index is 
going to give it a single row (vs the 2827 rows it actually gets) and the fact 
that the cost didn't change for perusing the gist index.  I guess I was 
expecting the estimated rowcount and cost for perusing the gist index to go up 
and when it didn't I was pleasantly surprised to find I got a plan I wanted 
anyway.





Also, why does the index on content_id win out over the compound index
on (content_type, content_id)?


It's deciding (apparently correctly, from the explain results) that the
larger index isn't increasing the selectivity enough to be worth its
extra search cost.  I suppose content_type = 'Story' isn't very
selective in this table?


Ah!  You're right, especially with this content_id!

--
Jeff Frost, Owner   j...@frostconsultingllc.com
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 916-647-6411 FAX: 916-405-4032



Re: [PERFORM] Index usage problem on 8.3.3

2008-10-31 Thread Jeff Frost

On Fri, 31 Oct 2008, Gregory Stark wrote:


Tom Lane [EMAIL PROTECTED] writes:


Jeff Frost [EMAIL PROTECTED] writes:

Tom Lane wrote:

Huh.  That does sound like it's a version-to-version difference.
There's nothing in the CVS log that seems related though.  Are you
willing to post your test case?


It's a customer DB, so I'll contact them and see if we can boil it down
to a test case with no sensitive data.


Well, if there was a change it seems to have been in the right direction
;-) so this is mostly just idle curiosity.  Don't jump through hoops to
get a test case.


Assuming it's not a bug...


Well, after boiling down my test case to the bare essentials, I was unable to 
reproduce the different behavior between 8.3.3 and 8.3.4.  Now I've gone back 
to the original script and can't reproduce the behavior I previously saw on 
8.3.4, and my screen session doesn't have enough scrollback to look at what 
happened previously.  I was thinking perhaps I had inadvertently committed the 
transaction, but then the act table would have been dropped, as it's a temp 
table created with ON COMMIT DROP.  I've now tested 3 times in a row, and every 
time 8.3.4 uses the seq scan just like 8.3.3, so I must have done something 
differently to get that earlier result, as Tom originally suggested.  I just 
can't think what it might have been.  Perhaps it's time to buy some glasses. :-/


--
Jeff Frost, Owner   [EMAIL PROTECTED]
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 916-647-6411 FAX: 916-405-4032



[PERFORM] Index usage problem on 8.3.3

2008-10-30 Thread Jeff Frost
I've run across a strange problem with PG 8.3.3 not using indexes on a
particular table after building the table during a transaction.

You can see a transcript of the issue here:

http://gist.github.com/21154

Interestingly, if I create another temp table 'CREATE TEMP TABLE AS
SELECT * FROM act' as seen on line 107, then add the same indexes to
that table, PG will use the indexes.  While it's not in the gist
transcript, even an extremely simple query like:

SELECT * FROM act WHERE act_usr_id = 1;

will not use the index on the original act table, but the jefftest and
jefftest2 tables both work fine.  As you can probably see in the
transcript, the tables have been ANALYZEd.  I even tried 'enable_seqscan=0;'
and that made the cost really high for the seq scan, but the planner still
chose the seq scan.

The issue does not affect 8.2.3 nor does it affect 8.3.4.  I didn't see
any mention of a fix for this sort of thing in 8.3.4's release notes.  I
was wondering if this is a known bug in 8.3.3 (and maybe other 8.3.x
versions) and just didn't make it into the release notes of 8.3.4?

-- 
Jeff Frost, Owner   [EMAIL PROTECTED]
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 916-647-6411 FAX: 916-405-4032




Re: [PERFORM] Index usage problem on 8.3.3

2008-10-30 Thread Jeff Frost
Tom Lane wrote:
 Okay.  What that means is that the indexes were created on data that had
 already been inserted and updated to some extent, resulting in
 HOT-update chains that turned out to be illegal for the new indexes.
 The way we deal with this is to mark the indexes as not usable by any
 query that can still see the dead HOT-updated tuples.

 Your best bet for dodging the problem is probably to break the operation
 into two transactions, if that's possible.  INSERT and UPDATE in the
 first xact, create the indexes at the start of the second.  (Hmm ...
 I'm not sure if that's sufficient if there are other concurrent
 transactions; but it's certainly necessary.)  Another possibility is
 to create the indexes just after data load, before you start updating
 the columns they're on.

   
Thanks Tom!

Any idea why I don't see it on 8.3.4?

-- 
Jeff Frost, Owner   [EMAIL PROTECTED]
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 916-647-6411 FAX: 916-405-4032




Re: [PERFORM] Index usage problem on 8.3.3

2008-10-30 Thread Jeff Frost

On Thu, 30 Oct 2008, Tom Lane wrote:


Any idea why I don't see it on 8.3.4?


I think it's more likely some small difference in your test conditions
than any real version-to-version difference.  In particular I think the
"still see the dead tuples" test might be influenced by the ages of
transactions running concurrently.


Interesting.  This is on a test server which has no other concurrent 
transactions and it acts the same way after I stopped 8.3.4 and started up 
8.3.3 again as it did before stopping 8.3.3 to bring up 8.3.4.  Hrmm..I'm not 
sure that makes sense.  So, I did the test with the sql script on 8.3.3, then 
shut down 8.3.3, started up 8.3.4 on the same data dir, ran the test 
successfully.  Next I shut down 8.3.4 and started 8.3.3 and verified that the 
behavior was still the same on 8.3.3.  I wonder what else I might be doing 
differently.


The good news is that making the indexes before the updates seems to make the 
planner happy!


--
Jeff Frost, Owner   [EMAIL PROTECTED]
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 916-647-6411 FAX: 916-405-4032



Re: [PERFORM] Index usage problem on 8.3.3

2008-10-30 Thread Jeff Frost


Tom Lane wrote:
 Jeff Frost [EMAIL PROTECTED] writes:
   
 On Thu, 30 Oct 2008, Tom Lane wrote:
 
 Any idea why I don't see it on 8.3.4?
 
 I think it's more likely some small difference in your test conditions
 than any real version-to-version difference.  In particular I think the
 "still see the dead tuples" test might be influenced by the ages of
 transactions running concurrently.
   

   
 Interesting.  This is on a test server which has no other concurrent 
 transactions and it acts the same way after I stopped 8.3.4 and started up 
 8.3.3 again as it did before stopping 8.3.3 to bring up 8.3.4.  Hrmm..I'm 
 not 
 sure that makes sense.  So, I did the test with the sql script on 8.3.3, 
 then 
 shut down 8.3.3, started up 8.3.4 on the same data dir, ran the test 
 successfully.  Next I shut down 8.3.4 and started 8.3.3 and verified that 
 the 
 behavior was still the same on 8.3.3.  I wonder what else I might be doing 
 differently.
 

 Huh.  That does sound like it's a version-to-version difference.
 There's nothing in the CVS log that seems related though.  Are you
 willing to post your test case?
   
   
It's a customer DB, so I'll contact them and see if we can boil it down
to a test case with no sensitive data.

-- 
Jeff Frost, Owner   [EMAIL PROTECTED]
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 916-647-6411 FAX: 916-405-4032



Re: [PERFORM] index scan cost

2008-08-09 Thread Jeff Frost

Tom Lane wrote:

Jeff Frost [EMAIL PROTECTED] writes:
  
I have two postgresql servers.  One runs 8.3.1, the other 8.3.3.  On the 8.3.1 
machine, the index scans are being planned extremely low cost:



  
  Index Scan using ix_email_entity_thread on email_entity  (cost=0.00..4.59 rows=1 width=1031) (actual time=0.095..0.120 rows=4 loops=1)
    Index Cond: (email_thread = 375629157)

  Index Scan using ix_email_entity_thread on email_entity  (cost=0.00..2218.61 rows=1151 width=931) (actual time=0.094..0.111 rows=4 loops=1)
    Index Cond: (email_thread = 375629157)



This isn't a cost problem, this is a stats problem.  Why does the
second server think 1151 rows will be returned?  Try comparing the
pg_stats entries for the email_thread column on both servers ... seems
like they must be significantly different.
  
Sorry it took me a while to close the loop on this.  So, the server that 
had the less desirable plan had actually been analyzed more recently by 
autovacuum.  When I went back to compare the stats on the faster server, 
autovacuum had analyzed it and the plan was now more similar.  Adjusting 
the stats target up for that column helped on both servers though it 
never did get back as close as before.





[PERFORM] index scan cost

2008-07-17 Thread Jeff Frost
, the reason I started looking at this is that the high cost 
changes the plan of a more complex query for the worse.


Any idea what might be influencing the plan on the other server?  I tried 
increasing the statistics target on the email_thread column and that helped to 
a certain extent.  Setting the statistics target to 1000 gets me a good enough 
plan to help the complex query in question:


                                  QUERY PLAN
--------------------------------------------------------------------------------
 Index Scan using ix_email_entity_thread on email_entity  (cost=0.00..26.36 rows=12 width=913) (actual time=0.028..0.040 rows=4 loops=1)
   Index Cond: (email_thread = 375629157)
 Total runtime: 0.092 ms
(3 rows)

But 26.36 is still not 4.59 like the other server estimates AND the statistics 
target on that column is just the default 10 on the server with the 4.59 cost 
estimate.




Re: [PERFORM] index scan cost

2008-07-17 Thread Jeff Frost

On Fri, 18 Jul 2008, Dennis Brakhane wrote:


The fast server makes a much more accurate estimation of the number
of rows to expect (4 rows are returning, 1 was estimated). The slow
server estimates 1151 rows. Try running ANALYZE on the slow one


You're quite right.  I probably didn't mention that the slow one has been 
analyzed several times.  In fact, every time I adjusted the statistics target 
for that column I re-analyzed, hence the eventually better, but still 
inaccurate, estimates toward the bottom of the post.




Re: [PERFORM] dell versus hp

2007-11-14 Thread Jeff Frost

On Wed, 14 Nov 2007, Alan Hodgson wrote:


On Tuesday 13 November 2007, Jeff Frost [EMAIL PROTECTED] wrote:

Ok, Areca ARC1261ML.  Note that results were similar for an 8 drive RAID6
vs 8 drive RAID10, but I don't have those bonnie results any longer.

Version  1.03   --Sequential Output-- --Sequential Input- --Random-
                -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine    Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
14xRAID6    63G 73967  99 455162  58 164543  23 77637  99 438570  31 912.2   1
                --Sequential Create-- --Random Create--
                -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
          files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
             16 12815  63 +++++ +++ 13041  61 12846  67 +++++ +++ 12871  59

Version  1.03   --Sequential Output-- --Sequential Input- --Random-
                -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine    Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
14xRAID10   63G 63968  92 246143  68 140634  30 77722  99 510904  36 607.8   0
                --Sequential Create-- --Random Create--
                -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
          files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
             16  6655  16 +++++ +++  5755  12  7259  17 +++++ +++  5550  12


OK, impressive RAID-6 performance (not so impressive RAID-10 performance,
but that could be a filesystem issue). Note to self; try an Areca
controller in next storage server.


I believe these were both on ext3.  I thought I had some XFS results available 
for comparison, but I couldn't find them.



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] dell versus hp

2007-11-14 Thread Jeff Frost

On Wed, 14 Nov 2007, Merlin Moncure wrote:


On Nov 14, 2007 5:24 PM, Alan Hodgson [EMAIL PROTECTED] wrote:

On Tuesday 13 November 2007, Jeff Frost [EMAIL PROTECTED] wrote:

Ok, Areca ARC1261ML.  Note that results were similar for an 8 drive RAID6
vs 8 drive RAID10, but I don't have those bonnie results any longer.

Version  1.03   --Sequential Output-- --Sequential Input- --Random-
                -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine    Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
14xRAID6    63G 73967  99 455162  58 164543  23 77637  99 438570  31 912.2   1
                --Sequential Create-- --Random Create--
                -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
          files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
             16 12815  63 +++++ +++ 13041  61 12846  67 +++++ +++ 12871  59

Version  1.03   --Sequential Output-- --Sequential Input- --Random-
                -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine    Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
14xRAID10   63G 63968  92 246143  68 140634  30 77722  99 510904  36 607.8   0
                --Sequential Create-- --Random Create--
                -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
          files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
             16  6655  16 +++++ +++  5755  12  7259  17 +++++ +++  5550  12


OK, impressive RAID-6 performance (not so impressive RAID-10 performance,
but that could be a filesystem issue). Note to self; try an Areca
controller in next storage server.


607 seeks/sec on an 8 drive raid 10 is terrible... this is not as
dependent on filesystem as sequential performance...


Then this must be horrible since it's a 14 drive raid 10. :-/  If we had more 
time for the testing, I would have tried a bunch of RAID1 volumes and 
used software RAID0 to add the +0 bit and see how that performed.


Merlin, what sort of seeks/sec from bonnie++ do you normally see from your 
RAID10 volumes?


On an 8xRAID10 volume with the smaller Areca controller we were seeing around 
450 seeks/sec.




Re: [PERFORM] dell versus hp

2007-11-13 Thread Jeff Frost

On Tue, 13 Nov 2007, Alan Hodgson wrote:


OK, I'll bite. Name one RAID controller that gives better write
performance in RAID 6 than it does in RAID 10, and post the benchmarks.

I'll grant a theoretical reliability edge to RAID 6 (although actual
implementations are a lot more iffy), but not performance.


Ok, Areca ARC1261ML.  Note that results were similar for an 8 drive RAID6 vs 8 
drive RAID10, but I don't have those bonnie results any longer.


Version  1.03   --Sequential Output-- --Sequential Input- --Random-
                -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine    Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
14xRAID6    63G 73967  99 455162  58 164543  23 77637  99 438570  31 912.2   1
                --Sequential Create-- --Random Create--
                -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
          files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
             16 12815  63 +++++ +++ 13041  61 12846  67 +++++ +++ 12871  59

Version  1.03   --Sequential Output-- --Sequential Input- --Random-
                -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine    Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
14xRAID10   63G 63968  92 246143  68 140634  30 77722  99 510904  36 607.8   0
                --Sequential Create-- --Random Create--
                -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
          files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
             16  6655  16 +++++ +++  5755  12  7259  17 +++++ +++  5550  12





Re: [PERFORM] query plan worse after analyze

2007-10-06 Thread Jeff Frost

On Sat, 6 Oct 2007, Tom Lane wrote:


Jeff Frost [EMAIL PROTECTED] writes:

Before analyze it seems to choose Bitmap Heap Scan on episodes
current_episode, but after it chooses Index Scan Backward using
index_episodes_on_publish_on on episodes current_episode.


Have you tried raising the stats target for episodes?  Seems like
the problem is a misestimate of the frequency of matches for
season_id = something.


Can you set the stats target for an entire table up?

I tried this:

ALTER TABLE episodes ALTER COLUMN season_id SET STATISTICS 1000;

and got the same plan.

And since I had this on a test server, I set the default stats target 
up to 100, reran analyze and got the same plan.


Same if I up it to 1000. :-(



[PERFORM] query plan worse after analyze

2007-10-05 Thread Jeff Frost
)
       Index Cond: (season_id = $0)
 ->  Seq Scan on seasons current_seasons_shows  (cost=0.00..1.59 rows=59 width=8) (actual time=0.002..0.018 rows=59 loops=267)
       Filter: (id IS NOT NULL)
 ->  Bitmap Heap Scan on images  (cost=0.58..2.59 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=500)
       Recheck Cond: (images.id = shows.landing_page_image_id)
       ->  Bitmap Index Scan on images_pkey  (cost=0.00..0.58 rows=1 width=0) (actual time=0.002..0.002 rows=1 loops=500)
             Index Cond: (images.id = shows.landing_page_image_id)
 ->  Bitmap Heap Scan on episodes current_episodes_seasons  (cost=0.39..2.51 rows=12 width=4) (actual time=0.006..0.010 rows=12 loops=500)
       Recheck Cond: (current_episodes_seasons.season_id = current_seasons_shows.id)
       ->  Bitmap Index Scan on index_episodes_on_season_id  (cost=0.00..0.39 rows=12 width=0) (actual time=0.004..0.004 rows=12 loops=500)
             Index Cond: (current_episodes_seasons.season_id = current_seasons_shows.id)
 SubPlan
   ->  Aggregate  (cost=15.68..15.69 rows=1 width=8) (actual time=0.022..0.022 rows=1 loops=6229)
         ->  Bitmap Heap Scan on episodes current_episode  (cost=2.34..15.65 rows=11 width=8) (actual time=0.007..0.016 rows=13 loops=6229)
               Recheck Cond: (season_id = $0)
               Filter: ((publish_on IS NOT NULL) AND (publish_on <= now()))
               ->  Bitmap Index Scan on index_episodes_on_season_id  (cost=0.00..2.34 rows=12 width=0) (actual time=0.004..0.004 rows=13 loops=6229)
                     Index Cond: (season_id = $0)
 Total runtime: 183.160 ms
(55 rows)





Re: [PERFORM] query plan worse after analyze

2007-10-05 Thread Jeff Frost

On Fri, 5 Oct 2007, Stephen Frost wrote:


* Jeff Frost ([EMAIL PROTECTED]) wrote:

Here are the plans:


It's probably just me but, honestly, I find it terribly frustrating to
try and read a line-wrapped explain-analyze output...  I realize it
might not be something you can control in your mailer, but you might
consider putting the various plans up somewhere online (perhaps a
pastebin like http://pgsql.privatepaste.com) instead of or in addition
to sending it in the email.


It's not you.  In fact, after I sent this and saw what it looked like, I put 
it into a txt file and replied with an attachment.  Unfortunately, it didn't 
bounce, nor did it show up on the list. :-(


So, here's a pastebin...it's a bit better:

http://pastebin.com/m4f0194b



Re: [PERFORM] How to ENABLE SQL capturing???

2007-08-10 Thread Jeff Frost

Michelle,

What platform are you on?  If you're on Linux, then logging to syslog will 
likely show up in the /var/log/messages file.


On Fri, 10 Aug 2007, smiley2211 wrote:



Hello all,

I have ENABLED 'log_min_duration_statement = 100' but I can't figure
out WHERE it's writing the commands to ... I have it set to 'syslogs' but
this file is 0 bytes :confused:

Should I set other parameters in my postgresql.conf file???

Thanks...Michelle


Bryan Murphy-3 wrote:


we currently have logging enabled for all queries over 100ms, and keep
the last 24 hours of logs before we rotate them.  I've found this tool
very helpful in diagnosing new performance problems that crop up:

http://pgfouine.projects.postgresql.org/

Bryan

On 8/8/07, Steinar H. Gunderson [EMAIL PROTECTED] wrote:

On Wed, Aug 08, 2007 at 01:02:24PM -0700, smiley2211 wrote:

I am trying to enable capturing of the submitted code via an
application... how do I do this in Postgres?  Performance is SLOW on my
server and I have autovacuum enabled as well as rebuilt indexes... what
else should be looked at?


Try log_min_duration_statement = 100 in postgresql.conf; it will show all
statements that take more than 100ms. Set to 0 to log _all_ statements, or
-1 to turn the logging back off.

/* Steinar */
--
Homepage: http://www.sesse.net/










Re: [PERFORM] Slow Postgresql server

2007-04-12 Thread Jeff Frost

On Thu, 12 Apr 2007, Jason Lustig wrote:


0 -- BM starts here
10  0    180 700436  16420 917400    0    0     0   176  278  2923 59 41  0  0  0
11  0    180 696736  16420 917400    0    0     0     0  254  2904 57 43  0  0  0
12  0    180 691272  16420 917400    0    0     0     0  255  3043 60 39  1  0  0
 9  0    180 690396  16420 917400    0    0     0     0  254  3078 63 36  2  0  0


Obviously, I've turned off logging now but I'd like to get it running again 
(without bogging down the server) so that I can profile the system and find 
out which queries I need to optimize. My logging settings (with unnecessary 
comments taken out) were:


So what did you get in the logs when you had logging turned on?  If you have 
the statement logging, perhaps it's worth running through pgfouine to generate 
a report.




log_destination = 'syslog'# Valid values are combinations of
redirect_stderr = off   # Enable capturing of stderr into log
log_min_duration_statement =  0  # -1 is disabled, 0 logs all 
statements

silent_mode = on# DO NOT USE without syslog or
log_duration = off
log_line_prefix = 'user=%u,db=%d'   # Special values:
log_statement = 'none'  # none, ddl, mod, all



Perhaps you just want to log slow queries > 100ms?  But since you don't seem 
to know what queries you're running on each web page, I'd suggest you just 
turn on the following and run your benchmark against it, then turn it back 
off:


log_duration = on
log_statement = 'all'

Then go grab pgfouine and run the report against the logs to see what queries 
are chewing up all your time.
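The kind of per-statement aggregation a tool like pgFouine performs can be sketched in a few lines. The log lines below are only illustrative of the "duration: ... ms  statement: ..." shape that duration logging produces; they are not output from any server in this thread:

```python
import re
from collections import defaultdict

# Sum logged durations per statement, largest first (a toy pgFouine).
LINE_RE = re.compile(r"duration: ([\d.]+) ms\s+statement: (.+)")

# Hypothetical sample log lines, not from a real server.
log_lines = [
    "LOG:  duration: 12.4 ms  statement: SELECT * FROM shows",
    "LOG:  duration: 250.0 ms  statement: SELECT * FROM episodes",
    "LOG:  duration: 130.6 ms  statement: SELECT * FROM episodes",
]

totals = defaultdict(float)
for line in log_lines:
    m = LINE_RE.search(line)
    if m:
        totals[m.group(2)] += float(m.group(1))

for stmt, ms in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{ms:8.1f} ms  {stmt}")
```

A real report would also normalize literals so that queries differing only in parameters group together, which is the part pgFouine is genuinely good at.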


So you know, we're using Postgres 8.2.3. The database currently is pretty 
small (we're just running a testing database right now with a few megabytes 
of data). No doubt some of our queries are slow, but I was concerned because 
no matter how slow the queries were (at most the worst were taking a couple 
of msecs anyway), I was getting ridiculously slow responses from the server. 
Outside of logging, our only other non-default postgresql.conf items are:


shared_buffers = 13000  # min 128kB or max_connections*16kB
work_mem = 8096 # min 64kB

In terms of the server itself, I think that it uses software raid. How can I 
tell? Our hosting company set it up with the server so I guess I could ask 
them, but is there a program I can run which will tell me the information? I 
also ran bonnie++ and got this output:


Version  1.03   --Sequential Output-- --Sequential Input- --Random-
                -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine    Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
pgtest    2000M 29277  67 33819  15 15446   4 35144  62 48887   5 152.7   0
                --Sequential Create-- --Random Create--
                -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
          files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
             16 17886  77 +++++ +++ +++++ +++ 23258  99 +++++ +++ +++++ +++


So I'm getting 33MB and 48MB write/read respectively. Is this slow? Is there 
anything I should be doing to optimize our RAID configuration?




It's not fast, but at least it's about the same speed as an average IDE drive 
from this era.  More disks would help, but since you indicate the DB fits in 
RAM with plenty of room to spare, how about you update your 
effective_cache_size to something reasonable.  You can use the output of the 
'free' command and take the cache number and divide by 8 to get a reasonable 
value on linux.  Then turn on logging and run your benchmark.  After that, run 
a pgfouine report against the log and post us the explain analyze from your 
slow queries.
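The cache-divided-by-8 rule of thumb is simple arithmetic; here is a minimal sketch, assuming the default 8 kB Postgres page size. The `cached_kb` figure is a hypothetical sample from some `free` run, not a number from this machine:

```python
# Derive an effective_cache_size value (in 8 kB pages) from the "cached"
# column of `free`, which reports kilobytes.

def effective_cache_size_pages(cached_kb, page_size_kb=8):
    # free reports kB; Postgres 8.x takes effective_cache_size in pages
    return cached_kb // page_size_kb

cached_kb = 902788  # hypothetical "cached" column from a `free` run
print(effective_cache_size_pages(cached_kb))  # -> 112848
```

As the thread says, this is an order-of-magnitude knob, so rounding is fine.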


And if Ron is indeed local, it might be worthwhile to contact him.  Someone 
onsite would likely get this taken care of much faster than we can on the 
mailing list.




Re: [PERFORM] Slow Postgresql server

2007-04-12 Thread Jeff Frost

On Thu, 12 Apr 2007, Scott Marlowe wrote:


On Thu, 2007-04-12 at 10:19, Guido Neitzer wrote:

On 12.04.2007, at 08:59, Ron wrote:




Depends. As I said - if the whole DB fits into the remaining space,
and a lot of website backend DBs do, it might just work out. But this
seems not to be the case - either the site is chewing on seq scans
all the time which will cause I/O or it is bound by the lack of
memory and swaps the whole time ... He has to find out.


It could also be something as simple as a very bloated data store.

I'd ask the user what vacuum verbose says at the end


You know, I shouldn't answer emails at night... we didn't ask when the data 
was last vacuumed or analyzed, and I believe he indicated that the only 
non-default values were memory related, so no autovacuum running.


Jason,

Before you go any further, run 'vacuum analyze;' on your DB if you're not 
doing this with regularity and strongly consider enabling autovacuum.




Re: [PERFORM] Slow Postgresql server

2007-04-11 Thread Jeff Frost

On Wed, 11 Apr 2007, Jason Lustig wrote:


Hello all,

My website has been having issues with our new Linux/PostgreSQL server being 
somewhat slow. I have done tests using Apache Benchmark and for pages that do 
not connect to Postgres, the speeds are much faster (334 requests/second vs. 
1-2 requests/second), so it seems that Postgres is what's causing the problem 
and not Apache. I did some research, and it seems that the bottleneck is in 
fact the hard drives! Here's an excerpt from vmstat:


procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa st
 1  1    140  24780 166636 575144    0    0     0  3900 1462  3299  1  4 49 48  0
 0  1    140  24780 166636 575144    0    0     0  3828 1455  3391  0  4 48 48  0
 1  1    140  24780 166636 575144    0    0     0  2440  960  2033  0  3 48 48  0
 0  1    140  24780 166636 575144    0    0     0  2552 1001  2131  0  2 50 49  0
 0  1    140  24780 166636 575144    0    0     0  3188 1233  2755  0  3 49 48  0
 0  1    140  24780 166636 575144    0    0     0  2048  868  1812  0  2 49 49  0
 0  1    140  24780 166636 575144    0    0     0  2720 1094  2386  0  3 49 49  0


As you can see, almost 50% of the CPU is waiting on I/O. This doesn't seem 
like it should be happening, however, since we are using a RAID 1 setup 
(160+160). We have 1GB ram, and have upped shared_buffers to 13000 and 
work_mem to 8096. What would cause the computer to only use such a small 
percentage of the CPU, with more than half of it waiting on I/O requests?


Well, the simple answer is a slow disk subsystem.  Is it hardware or software 
RAID1?  If hardware, what's the RAID controller?  Based on your vmstat output, 
I'd guess that this query activity is all writes since I see only blocks out. 
Can you identify what the slow queries are?  What version of postgres?  How 
large is the database?  Can you post the non-default values in your 
postgresql.conf?


I'd suggest you test your disk subsystem to see if it's as performant as you 
think with bonnie++.  Here's some output from my RAID1 test server:


Version  1.03   --Sequential Output-- --Sequential Input- --Random-
                -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine    Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
pgtest       4G 47090  92 52348  11 30954   6 41838  65 73396   8 255.9   1
                --Sequential Create-- --Random Create--
                -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
          files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
             16   894   2 +++++ +++   854   1   817   2 +++++ +++   969   2

So, that's 52MB/sec block writes and 73MB/sec block reads.  That's typical of 
a RAID1 on 2 semi-fast SATA drives.


If you're doing writes to the DB on every web page, you might consider playing 
with the commit_delay and commit_siblings parameters in the postgresql.conf. 
Also, if you're doing multiple inserts as separate transactions, you should 
consider batching them up in one transaction.
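The batching advice can be sketched as follows. The example uses Python's stdlib sqlite3 purely so it is self-contained and runnable; against Postgres the same shape applies through a driver such as psycopg, where the win is one commit (and one WAL flush) per batch instead of per row. Table and data are made up:

```python
import sqlite3

# Illustrative sketch: insert many rows in ONE transaction instead of
# committing after every INSERT.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (page TEXT, ts INTEGER)")

rows = [("/home", i) for i in range(1000)]

# One transaction for the whole batch: commits once on success,
# rolls back on error.
with conn:
    conn.executemany("INSERT INTO hits (page, ts) VALUES (?, ?)", rows)

count = conn.execute("SELECT count(*) FROM hits").fetchone()[0]
print(count)  # -> 1000
```

The failure semantics change too, of course: with one transaction the batch is all-or-nothing, which is usually what you want for related writes.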




Re: [PERFORM] SCSI vs SATA

2007-04-05 Thread Jeff Frost

On Thu, 5 Apr 2007, Scott Marlowe wrote:


I've read some recent contrary advice. Specifically advising the
sharing of all files (pg_xlogs, indices, etc..) on a huge raid array
and letting the drives load balance by brute force.


The other, at first almost counter-intuitive result was that putting
pg_xlog on a different partition on the same array (i.e. one big
physical partition broken up into multiple logical ones) helped, because
the OS overhead of writing all the data to one file system caused
performance issues.  Can't remember who reported the performance increase
off the top of my head.


I noticed this behavior on the last Areca based 8 disk Raptor system I built. 
Putting pg_xlog on a separate partition on the same logical volume was faster 
than putting it on the large volume.  It was also faster to have 8xRAID10 for 
OS+data+pg_xlog vs 6xRAID10 for data and 2xRAID1 for pg_xlog+OS.  Your 
workload may vary, but it's definitely worth testing.  The system in question 
had 1GB BBU.




Re: [PERFORM] High Load on Postgres 7.4.16 Server

2007-04-05 Thread Jeff Frost

On Thu, 5 Apr 2007, John Allgood wrote:


Hello All

I sent this message to the admin list and it never got through, so I
am trying the performance list.
We moved our application to a new machine last night. It is a Dell
PowerEdge 6950 2X Dual Core AMD Opteron 8214 2.2GHz with 8GB memory. The
machine is running Redhat AS 4 Upd 4 and Redhat Cluster Suite. The SAN is an
EMC SAS connected via fibre. We are using Postgres 7.4.16. We have recently
had some major hardware issues and replaced the hardware with brand new Dell
equipment. We expected a major performance increase over the previous setup,
as the old equipment was nearly three years old.
I will try and explain how things are configured. We have 10
separate postmasters running, 5 on each node. Each postmaster is a
single instance of one database. The databases are separated by division, and
we also keep them separate so we can restart one postmaster without needing to
restart all databases. My largest database is about 7 GB, and the others run
anywhere from 100MB - 1.8GB.
The previous configuration was RHEL3, Postgres 7.4.13, and Redhat
Cluster Suite. The application seemed to run much faster on the older
equipment.
My thought on the issues is that it could be something with the OS
tuning. My kernel.shmmax and kernel.shmall are both 1073741824. Is there
something else that I could tune in the OS? My max_connections=35 and
shared_buffers=8192 for my largest database.


John,

Was the SAN connected to the previous machine or is it also a new addition 
with the Dell hardware?  We had a fairly recent post regarding a similar 
upgrade in which the SAN ended up being the problem, so the first thing I 
would do is test the SAN with bonnie++ and/or move your application to use a 
local disk and test again.  With 8GB of RAM, I'd probably set the 
shared_buffers to at least 50000... If I remember correctly, this was the most 
you could set it to on 7.4.x and continue benefitting from it.  I'd strongly 
encourage you to upgrade to at least 8.1.8 (and possibly 8.2.3) if you can, as 
it has much better shared memory management.  You might also want to double 
check your effective_cache_size and random_page_cost to see if they are set to 
reasonable values.  Did you just copy the old postgresql.conf over?


This is the beginning of the thread I mentioned above:

http://archives.postgresql.org/pgsql-performance/2007-03/msg00104.php



Re: [PERFORM] High Load on Postgres 7.4.16 Server

2007-04-05 Thread Jeff Frost

On Thu, 5 Apr 2007, John Allgood wrote:


The hard thing about running multiple postmasters is that you have to tune
each one separately. For most of the databases I have limited max_connections
to 30-50 depending on the database. What would be reasonable values for
effective_cache_size and random_page_cost? I think I have these at the
defaults. Also, what about kernel buffers on RHEL4?


Normally, you would look at the output of 'free' and set it to the amount of 
cache/8.  For example:


 total   used   free sharedbuffers cached
Mem:   20551202025632  29488  0 505168 368132
-/+ buffers/cache:1152332 902788
Swap:  2048184   23802045804

So, you could take 902788/8 = 112848.  This machine is a bad example as it's 
just a workstation, but you get the idea.


That tells the planner it can expect the OS cache to have that much of the DB 
cached.  It's kind of an order of magnitude knob, so it doesn't have to be 
that precise.


Since you're running multiple postmasters on the same machine (5 per machine 
right?), then setting the shared_buffers up to 50000 (400MB) on each 
postmaster is probably desirable, though if you have smaller DBs on some of 
them, it might only be worth it for the largest one.  I suspect that having 
the effective_cache_size set to the output of free on each postmaster is 
desirable, but your case likely requires some benchmarking to find the optimal 
config.


If you look through the archives, there is a formula for calculating what you 
need to set the kernel shared memory parameters.  Otherwise, you can just 
start postgres and look at the log as it'll tell you what it tried to 
allocate.
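As a rough sketch of that sizing exercise: only the 8192-byte block size below is a real Postgres default; the per-connection and fixed overhead constants are illustrative guesses, not the documented formula from the archives:

```python
# Ballpark the shared memory segment a postmaster will request, for
# setting kernel.shmmax.  The overhead terms are illustrative guesses.

BLOCK_SIZE = 8192  # default Postgres block size in bytes

def approx_shmmax_bytes(shared_buffers, max_connections,
                        per_backend=400 * 1024, fixed=32 * 1024 * 1024):
    return shared_buffers * BLOCK_SIZE + max_connections * per_backend + fixed

# e.g. shared_buffers = 50000 (~400MB of buffers) and max_connections = 35
need = approx_shmmax_bytes(50000, 35)
print(need, round(need / 2**20, 1))  # bytes, and MiB for readability
```

If the computed value is too low, the postmaster simply fails at startup and logs the size it tried to allocate, which, as noted above, is itself a serviceable way to find the right number.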


Hopefully there's someone with experience running multiple postmasters on the 
same machine that can speak to the postgresql.conf knobs more specifically.


I'd still suggest you upgrade to at least 8.1.8.






Thanks

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Jeff Frost
Sent: Thursday, April 05, 2007 3:24 PM
To: John Allgood
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] High Load on Postgres 7.4.16 Server

On Thu, 5 Apr 2007, John Allgood wrote:


Hello All

I sent this message to the admin list and it never got through so I
am trying the performance list.
We moved our application to a new machine last night. It is a Dell
PowerEdge 6950 2X Dual Core. AMD Opteron 8214 2.2Ghz. 8GB Memory. The
machine is running Redhat AS 4 Upd 4 and Redhat Cluster Suite. The SAN is

an

EMC SAS connected via fibre. We are using Postgres 7.4.16. We have

recently

had some major hardware issues and replaced the hardware with brand new

Dell

equipment. We expected a major performance increase over the previous

being

the old equipment was nearly three years old
I will try and explain how things are configured. We have 10
separate postmasters running 5 on each node. Each of the postmasters is a
single instance of each database. Each database is separated by division

and

also we have them separate so we can restart an postmaster with needing to
restart all databases My largest database is about 7 GB. And the others

run

anywhere from 100MB - 1.8GB.
The other configuration was RHEL3 and Postgres 7.4.13 and Redhat
Cluster Suite. The application seemed to run much faster on the older
equipment.
My thoughts on the issues are that I could be something with the OS
tuning. Here is what my kernel.shmmax, kernel.shmall = 1073741824. Is

there

something else that I could tune in the OS. My max_connections=35 and

shared

buffers=8192 for my largest database.


John,

Was the SAN connected to the previous machine or is it also a new addition
with the Dell hardware?  We had a fairly recent post regarding a similar
upgrade in which the SAN ended up being the problem, so the first thing I
would do is test the SAN with bonnie++ and/or move your application to use a
local disk and test again.  With 8GB of RAM, I'd probably set the
shared_buffers to at least 5...If I remember correctly, this was the most
you could set it to on 7.4.x and continue benefitting from it.  I'd strongly
encourage you to upgrade to at least 8.1.8 (and possibly 8.2.3) if you can, as
it has much better shared memory management.  You might also want to double
check your effective_cache_size and random_page_cost to see if they are set to
reasonable values.  Did you just copy the old postgresql.conf over?
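For what it's worth, the arithmetic behind those settings is easy to
sanity-check: shared_buffers is counted in 8KB pages and the resulting shared
memory segment has to fit under kernel.shmmax. A quick illustrative sketch
(figures taken from this thread; it deliberately ignores the extra shared
memory PostgreSQL allocates beyond the buffer cache, so treat it as a rough
lower bound only):

```python
# Sketch: does a shared_buffers setting fit under kernel.shmmax?
# Figures are from the post above; PostgreSQL's other shared structures
# add overhead on top of this, which we deliberately ignore here.
BLCKSZ = 8192  # default PostgreSQL block size, in bytes

def shared_buffers_bytes(num_buffers: int) -> int:
    """Bytes of buffer cache for a shared_buffers setting (in pages)."""
    return num_buffers * BLCKSZ

shmmax = 1073741824   # kernel.shmmax from the post (1 GB)
buffers = 8192        # shared_buffers from the post (i.e. 64 MB)
print(shared_buffers_bytes(buffers), shared_buffers_bytes(buffers) < shmmax)
```

So the 8192-page setting above only uses 64MB of the 1GB shmmax, i.e. the
kernel limit is not what's constraining this configuration.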

This is the beginning of the thread I mentioned above:

http://archives.postgresql.org/pgsql-performance/2007-03/msg00104.php




--
Jeff Frost, Owner   [EMAIL PROTECTED]
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] strange performance regression between 7.4 and 8.1

2007-03-02 Thread Jeff Frost

On Fri, 2 Mar 2007, Guido Neitzer wrote:


On 02.03.2007, at 14:20, Alex Deucher wrote:


Ah OK.  I see what you are saying;  thank you for clarifying.  Yes,
the SAN is configured for maximum capacity; it has large RAID 5
groups.  As I said earlier, we never intended to run a DB on the SAN,
it just happened to come up, hence the configuration.


So why not dump the data once, import it into a PG configured to use 
local discs (or even ONE local disc; you might have the 16GB you gave as the 
size of the db available on the local machine, right?) and test whether the 
problem is with PG connecting to the SAN. Then you have one factor less to 
consider after all your changes.


Maybe it's just that something in the chain from PG to the actual HD spindles 
kills your random access performance for getting the actual rows.


I am actually starting to think that the SAN may be introducing some amount of 
latency that is enough to kill your random IO which is what all of the queries 
in question are doing - look up in index - fetch row from table.


If you have the time, it would be totally worth it to test with a local disk 
and see how that affects the speed.


I would think that even with RAID5, a SAN with that many spindles would be 
quite fast in raw throughput, but perhaps it's just seek latency that's 
killing you.


When you run the bonnie tests again, take note of what the seeks/sec is 
compared with the old disk.  Also, you should run bonnie with the -b switch to 
see if that causes significant slowdown of the writes...maybe minor synced 
write activity to pg_xlog is bogging the entire system down.  Is the system 
spending most of its time in IO wait?


Also, another item of note might be the actual on disk DB size..I wonder if it 
has changed significantly going from SQL_ASCII to UTF8.


In 8.1 you can do this:

SELECT  datname,
pg_size_pretty(pg_database_size(datname)) AS size
FROM pg_database;

In 7.4, you'll need to install the dbsize contrib module to get the same info.
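If you're curious what pg_size_pretty is doing, it just scales the raw byte
count up to the largest unit that keeps the number readable. A rough,
illustrative approximation in Python (the server's exact half-up rounding
differs slightly in edge cases, so this is a sketch, not the real
implementation):

```python
def size_pretty(num_bytes: int) -> str:
    # Rough approximation of pg_size_pretty(): report in the largest
    # unit where the value stays below the ~10240 threshold the server uses.
    units = ["bytes", "kB", "MB", "GB", "TB"]
    n = float(num_bytes)
    for unit in units:
        if n < 10240 or unit == units[-1]:
            return f"{round(n)} {unit}"
        n /= 1024

print(size_pretty(7516192768))  # a ~7 GB database, like the one in the thread
```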



Re: [PERFORM] strange performance regression between 7.4 and 8.1

2007-03-01 Thread Jeff Frost

On Thu, 1 Mar 2007, Joshua D. Drake wrote:


Alex Deucher wrote:

Hello,

I have noticed a strange performance regression and I'm at a loss as
to what's happening.  We have a fairly large database (~16 GB).  The
original postgres 7.4 was running on a sun v880 with 4 CPUs and 8 GB
of ram running Solaris on local scsi discs.  The new server is a sun
Opteron box with 4 cores, 8 GB of ram running postgres 8.1.4 on Linux
(AMD64) on a 4 Gbps FC SAN volume.  When we created the new database
it was created from scratch rather than copying over the old one,
however the table structure is almost identical (UTF8 on the new one
vs. C on the old). The problem is queries are ~10x slower on the new
hardware.  I read several places that the SAN might be to blame, but
testing with bonnie and dd indicates that the SAN is actually almost
twice as fast as the scsi discs in the old sun server.  I've tried
adjusting just about every option in the postgres config file, but
performance remains the same.  Any ideas?


Vacuum? Analayze? default_statistics_target? How many shared_buffers?
effective_cache_size? work_mem?


Also, an explain analyze from both the 7.4 and 8.1 systems with one of the 
10x slower queries would probably be handy.


What do you mean by created from scratch rather than copying over the old 
one?  How did you put the data in?  Did you run analyze after loading it? 
Is autovacuum enabled and if so, what are the thresholds?


--
Jeff Frost, Owner   [EMAIL PROTECTED]
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] strange performance regression between 7.4 and 8.1

2007-03-01 Thread Jeff Frost

On Thu, 1 Mar 2007, Alex Deucher wrote:


Vacuum? Analayze? default_statistics_target? How many shared_buffers?
effective_cache_size? work_mem?



I'm running the autovacuum process on the 8.1 server.  vacuuming on
the old server was done manually.

default_statistics_target and effective_cache_size are set to the
defaults on both.

postgres 7.4 server:
# - Memory -
shared_buffers = 82000          # 1000   min 16, at least max_connections*2, 8KB each
sort_mem = 8000                 # 1024   min 64, size in KB
vacuum_mem = 32000              # 8192   min 1024, size in KB
# - Free Space Map -
#max_fsm_pages = 2  # min max_fsm_relations*16, 6 bytes each
#max_fsm_relations = 1000   # min 100, ~50 bytes each
# - Kernel Resource Usage -
#max_files_per_process = 1000   # min 25

postgres 8.1 server:
# - Memory -
shared_buffers = 10             # min 16 or max_connections*2, 8KB each
temp_buffers = 2000 #1000       # min 100, 8KB each
max_prepared_transactions = 100 #5  # can be 0 or more
# note: increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
work_mem = 1 #1024              # min 64, size in KB
maintenance_work_mem = 524288 #16384  # min 1024, size in KB
#max_stack_depth = 2048         # min 100, size in KB

I've also tried using the same settings from the old server on the new
one; same performance issues.



If this is a linux system, could you give us the output of the 'free' command? 
Postgresql might be choosing a bad plan because your effective_cache_size is 
way off (it's the default now right?).  Also, what was the block read/write 
speed of the SAN from your bonnie tests?  Probably want to tune 
random_page_cost as well if it's also at the default.





Re: [PERFORM] strange performance regression between 7.4 and 8.1

2007-03-01 Thread Jeff Frost

On Thu, 1 Mar 2007, Alex Deucher wrote:


On 3/1/07, Jeff Frost [EMAIL PROTECTED] wrote:

On Thu, 1 Mar 2007, Alex Deucher wrote:

 Vacuum? Analayze? default_statistics_target? How many shared_buffers?
 effective_cache_size? work_mem?


 I'm running the autovacuum process on the 8.1 server.  vacuuming on
 the old server was done manually.

 default_statistics_target and effective_cache_size are set to the the
 defaults on both.

 postgres 7.4 server:
 # - Memory -
 shared_buffers = 82000 # 1000min 16, at least
 max_connections*2, 8KB each
 sort_mem = 8000# 1024min 64, size in KB
 vacuum_mem = 32000 # 8192min 1024, size in KB
 # - Free Space Map -
 #max_fsm_pages = 2  # min max_fsm_relations*16, 6 bytes each
 #max_fsm_relations = 1000   # min 100, ~50 bytes each
 # - Kernel Resource Usage -
 #max_files_per_process = 1000   # min 25

 postgres 8.1 server:
 # - Memory -
 shared_buffers = 10 # min 16 or max_connections*2, 
8KB

 each
 temp_buffers = 2000 #1000   # min 100, 8KB each
 max_prepared_transactions = 100 #5  # can be 0 or more
 # note: increasing max_prepared_transactions costs ~600 bytes of shared
 memory
 # per transaction slot, plus lock space (see max_locks_per_transaction).
 work_mem = 1#1024   # min 64, size in KB
 maintenance_work_mem = 524288 #16384# min 1024, size in KB
 #max_stack_depth = 2048 # min 100, size in KB

 I've also tried using the same settings from the old server on the new
 one; same performance issues.


If this is a linux system, could you give us the output of the 'free' 
command?


             total       used       free     shared    buffers     cached
Mem:       8059852    8042868      16984          0     228788       8648
-/+ buffers/cache:     153992    7905860
Swap:     15631224       2164   15629060


So, I would set effective_cache_size = 988232 (7905860/8).
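In case the derivation isn't obvious: that 988232 is the "-/+ buffers/cache"
free column from the output above (in kB) divided by the 8kB page size that
effective_cache_size is measured in. A trivial sketch of the arithmetic:

```python
# effective_cache_size is expressed in 8 kB pages; a common rule of thumb
# (used above) is the "-/+ buffers/cache" free figure reported by `free`.
def effective_cache_size_pages(free_plus_cached_kb: int) -> int:
    return free_plus_cached_kb // 8   # kB -> 8 kB pages

print(effective_cache_size_pages(7905860))  # figure from the free output above
```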



Postgresql might be choosing a bad plan because your effective_cache_size 
is

way off (it's the default now right?).  Also, what was the block read/write


yes it's set to the default.


speed of the SAN from your bonnie tests?  Probably want to tune
random_page_cost as well if it's also at the default.



                    --Sequential Output-- --Sequential Input- --Random-
             -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size       K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
luna12-san 16000M  58896  91 62931   9 35870   5 54869  82 145504 13  397.7   0




So, you're getting 62MB/s writes and 145MB/s reads.  Just FYI, that write 
speed is about the same as my single SATA drive write speed on my workstation, 
so not that great.  The read speed is decent, though, and with that sort of 
read performance, you might want to lower random_page_cost to something like 
2.5 or 2 so the planner will tend to prefer index scans.
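For anyone following along, those MB/s figures are just bonnie++'s block
"K/sec" columns divided by 1000 (treating 1 MB as 1000 K, as the rounded
62/145 numbers do). A quick sketch of the conversion:

```python
# Convert bonnie++ "K/sec" block figures to rough MB/s.
def kps_to_mbps(k_per_sec: int) -> float:
    return k_per_sec / 1000.0

# write and read block figures from the luna12-san run above
print(kps_to_mbps(62931), kps_to_mbps(145504))
```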




Re: [PERFORM] strange performance regression between 7.4 and 8.1

2007-03-01 Thread Jeff Frost

On Thu, 1 Mar 2007, Alex Deucher wrote:


On 3/1/07, Jeff Frost [EMAIL PROTECTED] wrote:

On Thu, 1 Mar 2007, Alex Deucher wrote:

  Postgresql might be choosing a bad plan because your
 effective_cache_size
  is
  way off (it's the default now right?).  Also, what was the block
 read/write
 
  yes it's set to the default.
 
  speed of the SAN from your bonnie tests?  Probably want to tune
  random_page_cost as well if it's also at the default.
 
 
                      --Sequential Output-- --Sequential Input- --Random-
               -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
  Machine Size       K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
  luna12-san 16000M  58896  91 62931   9 35870   5 54869  82 145504 13  397.7   0
 

 So, you're getting 62MB/s writes and 145MB/s reads.  Just FYI, that write
 speed is about the same as my single SATA drive write speed on my workstation,
 so not that great.  The read speed is decent, though, and with that sort of
 read performance, you might want to lower random_page_cost to something like
 2.5 or 2 so the planner will tend to prefer index scans.


 Right, but the old box was getting ~45MBps on both reads and writes,
 so it's an improvement for me :)  Thanks for the advice, I'll let you
 know how it goes.

Do you think that is because you have a different interface between you and
the SAN?  ~45MBps is pretty slow - your average 7200RPM ATA133 drive can do
that and costs quite a bit less than a SAN.

Is the SAN being shared between the database servers and other servers?
Maybe it was just random timing that gave you the poor write performance on
the old server, which might also be yielding occasional poor performance on
the new one.



The direct attached scsi discs on the old database server were getting
45MBps, not the SAN.  The SAN got 62/145Mbps, which is not as bad.  We
have 4 servers on the SAN, each with its own 4 GBps FC link via an FC
switch.  I'll try and re-run the numbers when the servers are idle
this weekend.


Sorry, I thought the old server was also attached to the SAN.  My fault for 
not hanging onto the entire email thread.


I think you're mixing and matching your capital and lower case Bs in your 
sentence above though. :-)


I suspect what you really mean is The SAN got 62/145MBps (megabytes/sec) and 
the FC link is 4Gbps (gigabits/sec) or 500MBps.  Is that correct?  If so, and 
seeing that you think there are 105 spindles on the SAN, I'd say you're either 
maxing out the switch fabric of the SAN with your servers or you have a 
really poorly performing SAN in general, or you just misunderstood the .


As a comparison, with 8 WD Raptors configured in a RAID10 with normal ext3 I 
get about 160MB/s write and 305MB/s read performance.  Hopefully the SAN has 
lots of other super nifty features that make up for the poor performance. :-(
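Since the bits-versus-bytes confusion keeps coming up in this thread, here is
the conversion made explicit (decimal units, and ignoring FC's 8b/10b encoding
overhead, which would make the real usable figure somewhat lower):

```python
# A Fibre Channel link speed in gigabits/sec expressed as megabytes/sec.
def gbps_to_mbytes_per_sec(gbps: float) -> float:
    return gbps * 1000.0 / 8.0  # gigabits -> megabits -> megabytes

print(gbps_to_mbytes_per_sec(4.0))  # the ~500 MBps figure cited above
```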




Re: [PERFORM] SELECT performance problem

2007-02-20 Thread Jeff Frost

On Tue, 20 Feb 2007, Glenn Sullivan wrote:


I am updating from 7.4.5 to 8.2.3.  I have noticed a significant
slowdown in simple searches such as
   select filename from vnmr_data where seqfil = 'sems';
This returns 12 rows out of 1 million items in the table.
On 7.4.5, this takes about 1.5 seconds.  On 8.2.3, it is taking
about 9 seconds.

I have played with different values of:
work_mem, temp_buffers, shared_buffers and effective_cache_size
and none of them make any difference.

I am running on redhat Linux 4   64bit.


Glenn,

Can you forward us the explain analyze output from 7.4.5 and 8.2.3 for the 
query in question?


Also, is the hardware the same between 7.4.5 and 8.2.3?  If not, what is the 
difference?




Re: [PERFORM] drive configuration for a new server

2007-02-02 Thread Jeff Frost

On Thu, 1 Feb 2007, Ben wrote:

I'm looking to replace some old crusty hardware with some sparkling new 
hardware. In the process, I'm looking to move away from the previous 
mentality of having the Big Server for Everything to having a cluster of 
servers, each of which handles some discrete subset of data. But rackspace 
isn't infinite, so I'm leaning towards cases that give me 8 drive bays. This 
leaves me with an interesting problem of how to configure this limited 
number of drives.


I know that ideally I would have separate spindles for WAL, indexes, and 
data. But I also know that I must be able to survive a drive failure, and I 
want at least 1TB of space for my data. I suspect with so few drive bays, I 
won't be living in an ideal world.


With an even mix of reads and writes (or possibly more writes than reads), is 
it better to use RAID10 and have everything on the same partition, or to have 
data and indexes on a 6-drive RAID5 with WAL on its own RAID1?


I'm surprised I haven't seen any responses to this, but maybe everyone's tired 
of the "what to do with X drives" question...perhaps we need a pgsql-perform 
FAQ?


At any rate, I just recently built a new PG server for a client which had 8 
Raptors with an Areca 1160 controller that has the 1GB battery backed cache 
installed.  We tested a few different configurations and decided on an 8 disk 
RAID10 with a separate WAL partition.  The separate WAL partition was 
marginally faster by a few percent.


The 8 disk RAID5 was actually a bit faster than the 8 disk RAID10 in overall 
throughput with the Areca, but we opted for the RAID10 because of reliability 
reasons.


The moral of the story is to test each config with your workload and see what 
performs the best.  In our case, the battery backed write cache seemed to 
remove the need for a separate WAL disk, but someone else's workload might 
still benefit from it.




Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-10 Thread Jeff Frost

On Wed, 10 Jan 2007, Jim C. Nasby wrote:


RAID1 for those of you who have been wondering if the BBU write back cache
mitigates the need for separate WAL (at least on this workload).  Those
are
the fastest times for each config, but ext2 WAL was always faster than the
other two options.  I didn't test any other filesystems in this go around.


Uh, if I'm reading this correctly, you're saying that WAL on a separate
ext2 vs. one big ext3 with data=writeback saved ~39 seconds out of
~158.5 minutes, or 0.4%? Is that even above the noise for your
measurements? I suspect the phase of the moon might play a bigger role
;P


That's what I thought too...cept I ran it 20 times and ext2 won by that
margin every time, so it was quite repeatable. :-/


Even so, you've got to really be hunting for performance to go through
the hassle of different filesystems just to gain 0.4%... :)


Indeed, but actually, I did the math again and it appears that it saves close 
to 2 minutes versus one big ext3.  I guess the moral of the story is that 
having a separate pg_xlog even on the same physical volume tends to be 
slightly faster for write oriented workloads.  Ext2 is slightly faster than 
ext3, but of course you could likely go with another filesystem yet and be 
even slightly faster as well. :-)


I guess the real moral of the story is that you can probably use one big ext3 
with the default config and it won't matter much more than 1-2% if you have a 
BBU.
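The "close to 2 minutes" and "1-2%" figures can be rechecked from the times
quoted earlier in the thread (157m46.713s for the separate ext2 WAL versus
159m47.098s for one big ext3); a quick sketch of that arithmetic:

```python
# Recompute the ext2-WAL vs one-big-ext3 delta from the quoted times.
def to_seconds(minutes: int, seconds: float) -> float:
    return minutes * 60 + seconds

ext2_wal = to_seconds(157, 46.713)   # separate ext2 WAL partition
ext3_all = to_seconds(159, 47.098)   # combined ext3 data/xlog
saved = ext3_all - ext2_wal          # seconds saved per run
pct = 100.0 * saved / ext3_all
print(round(saved, 3), round(pct, 2))  # ~120 s saved, i.e. ~1.3%
```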




Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-09 Thread Jeff Frost

On Tue, 9 Jan 2007, Jim C. Nasby wrote:


On Thu, Dec 28, 2006 at 02:15:31PM -0800, Jeff Frost wrote:

When benchmarking various options for a new PG server at one of my clients,
I tried ext2 and ext3 (data=writeback) for the WAL and it appeared to be
fastest to have ext2 for the WAL.  The winning time was 157m46.713s for
ext2, 159m47.098s for combined ext3 data/xlog and 158m25.822s for ext3
data=writeback.  This was on an 8x150GB Raptor RAID10 on an Areca 1130 w/
1GB BBU cache.  This config benched out faster than a 6disk RAID10 + 2 disk
RAID1 for those of you who have been wondering if the BBU write back cache
mitigates the need for separate WAL (at least on this workload).  Those are
the fastest times for each config, but ext2 WAL was always faster than the
other two options.  I didn't test any other filesystems in this go around.


Uh, if I'm reading this correctly, you're saying that WAL on a separate
ext2 vs. one big ext3 with data=writeback saved ~39 seconds out of
~158.5 minutes, or 0.4%? Is that even above the noise for your
measurements? I suspect the phase of the moon might play a bigger role
;P


That's what I thought too...cept I ran it 20 times and ext2 won by that margin 
every time, so it was quite repeatable. :-/




Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2006-12-29 Thread Jeff Frost

On Fri, 29 Dec 2006, Alvaro Herrera wrote:


Ron wrote:


C= What file system are you using?  Unlike BigDBMS, pg does not have
its own native one, so you have to choose the one that best suits
your needs.  For update heavy applications involving lots of small
updates jfs and XFS should both be seriously considered.


Actually it has been suggested that a combination of ext2 (for WAL) and
ext3 (for data, with data journalling disabled) is a good performer.
AFAIK you don't want the overhead of journalling for the WAL partition.


When benchmarking various options for a new PG server at one of my clients, I 
tried ext2 and ext3 (data=writeback) for the WAL and it appeared to be fastest 
to have ext2 for the WAL.  The winning time was 157m46.713s for ext2, 
159m47.098s for combined ext3 data/xlog and 158m25.822s for ext3 
data=writeback.  This was on an 8x150GB Raptor RAID10 on an Areca 1130 w/ 1GB 
BBU cache.  This config benched out faster than a 6disk RAID10 + 2 disk RAID1 
for those of you who have been wondering if the BBU write back cache mitigates 
the need for separate WAL (at least on this workload).  Those are the fastest 
times for each config, but ext2 WAL was always faster than the other two 
options.  I didn't test any other filesystems in this go around.




Re: [PERFORM] opportunity to benchmark a quad core Xeon

2006-12-18 Thread Jeff Frost

On Sat, 16 Dec 2006, Arjen van der Meijden wrote:


On 16-12-2006 4:24 Jeff Frost wrote:
We can add more RAM and drives for testing purposes.  Can someone suggest 
what benchmarks with what settings would be desirable to see how this 
system performs.  I don't believe I've seen any postgres benchmarks done on 
a quad xeon yet.


We've done our standard benchmark on a dual X5355:
http://tweakers.net/reviews/661

Verdict is that for a price/performance-ratio you're better off with a 5160, 
but in absolute performance it does win.




Arjen,

Have you guys run your benchmark on a quad opteron board yet?  I'm curious how 
the dual quad core Intels compare to quad dual core opteron.




[PERFORM] opportunity to benchmark a quad core Xeon

2006-12-15 Thread Jeff Frost
I have the opportunity to benchmark a system based on the Supermicro 6015B-8V. 
It has 2x Quad Xeon E5320 1.86GHZ, 4GB DDR2 533, 1x 73GB 10k SCSI.


http://www.supermicro.com/products/system/1U/6015/SYS-6015B-8V.cfm

We can add more RAM and drives for testing purposes.  Can someone suggest what 
benchmarks with what settings would be desirable to see how this system 
performs?  I don't believe I've seen any postgres benchmarks done on a quad 
xeon yet.




Re: [PERFORM] availability of SATA vendors

2006-11-22 Thread Jeff Frost

On Wed, 22 Nov 2006, Bucky Jordan wrote:


Dells (at least the 1950 and 2950) come with the Perc5, which is
basically just the LSI MegaRAID. The units I have come with a 256MB BBU,
I'm not sure if it's upgradeable, but it looks like a standard DIMM in
there...

I posted some dd and bonnie++ benchmarks of a 6-disk setup a while back
on a 2950, so you might search the archive for those numbers if you're
interested- you should be able to get the same or better from a
similarly equipped LSI setup. I don't recall if I posted pgbench
numbers, but I can if that's of interest.


I could only find the 6 disk RAID5 numbers in the archives that were run with 
bonnie++1.03.  Have you run the RAID10 tests since?  Did you settle on 6 disk 
RAID5 or 2xRAID1 + 4XRAID10?




Re: [PERFORM] availability of SATA vendors

2006-11-22 Thread Jeff Frost

On Wed, 22 Nov 2006, Joshua D. Drake wrote:


I could only find the 6 disk RAID5 numbers in the archives that were run with
bonnie++1.03.  Have you run the RAID10 tests since?  Did you settle on 6 disk
RAID5 or 2xRAID1 + 4XRAID10?


Why not 6 drive raid 10? IIRC you need 4 to start RAID 10 but only pairs
after that.


A valid question.  Does the caching raid controller negate the desire to 
separate pg_xlog from PGDATA?




Re: [PERFORM] availability of SATA vendors

2006-11-21 Thread Jeff Frost

On Fri, 17 Nov 2006, Luke Lonergan wrote:


Currently, I'm looking at Penguin, HP and Sun (though Sun's store isn't
working for me at the moment).  Maybe I just need to order a Penguin and then
buy the controller separately, but was hoping to get support from a single
entity.


Rackable or Asacomputers sell and support systems with the 3Ware or Areca
controllers.


Luke,

ASAcomputers has been the most helpful of all the vendors so far, so thanks 
for pointing me at them.  I know you've been posting results with the Areca 
and 3ware controllers, do you have a preference for one over the other?  It 
seems that you can only get 256MB cache with the 3ware 9550SX and you can get 
512MB with the 9650SE, but only the Areca cards go up to 1GB.


I'm curious how big a performance gain we would see going from 256MB cache to 
512MB to 1GB.  This is for a web site backend DB which is mostly read 
intensive, but occasionally has large bursts of write activity due to new user 
signups generated by the marketing engine.




Re: [PERFORM] availability of SATA vendors

2006-11-21 Thread Jeff Frost

On Tue, 21 Nov 2006, Joshua D. Drake wrote:




ASAcomputers has been the most helpful of all the vendors so far, so thanks
for point me at them.  I know you've been posting results with the Areca and
3ware controllers, do you have a preference for one over the other?  It seems
that you can only get 256MB cache with the 3ware 9550SX and you can get 512MB
with the 9650SE, but only the Areca cards go up to 1GB.


Don't count out LSI either. They make a great SATA controller based off
their very well respected SCSI controller.


Interesting.  Does it perform as well as the ARECAs and how much BBU cache can 
you put in it?  Oh, does it use the good ole megaraid_mbox driver as well?




[PERFORM] availability of SATA vendors

2006-11-17 Thread Jeff Frost
I see many of you folks singing the praises of the Areca and 3ware SATA 
controllers, but I've been trying to price some systems and am having trouble 
finding a vendor who ships these controllers with their systems.  Are you 
rolling your own white boxes or am I just looking in the wrong places?


Currently, I'm looking at Penguin, HP and Sun (though Sun's store isn't 
working for me at the moment).  Maybe I just need to order a Penguin and then 
buy the controller separately, but was hoping to get support from a single 
entity.




Re: [PERFORM] High CPU Usage - PostgreSQL 7.3

2006-07-12 Thread Jeff Frost
Please Cc: the list when replying to things like this so everyone can see (and 
likely help).


I'm not sure what your response is actually regarding.  Could you give some 
more detail?


On Wed, 12 Jul 2006, Rizal wrote:


so, i must upgrade my PostgreSQL 803 which i have with a new version ?

- Original Message -
From: Jeff Frost [EMAIL PROTECTED]
To: Neil Hepworth [EMAIL PROTECTED]
Cc: pgsql-performance@postgresql.org
Sent: Wednesday, July 12, 2006 10:27 AM
Subject: Re: [PERFORM] High CPU Usage - PostgreSQL 7.3



On Wed, 12 Jul 2006, Neil Hepworth wrote:


I am using version PostgreSQL 7.3.10 (RPM:
postgresql73-rhel21-7.3.10-2).  Unfortunately vacuumdb -a -v does not
give the FSM info at the end (need a newer version of postgres for
that).  Running the same queries on 8.1 reduces the time taken to
about 16 minutes, though I didn't run the test on the same hardware or
OS as I want to keep my test server as close to production as
possible, so I ran the 8.1 server on my Windows laptop (2GHz Centrino
Duo with 2GB of RAM, yes the laptop is brand new :).


Well, looks like you're at least fairly up to date, but there is a fix in
7.3.11 that you might want to get by upgrading to 7.3.15:

  * Fix race condition in transaction log management
    There was a narrow window in which an I/O operation could be
    initiated for the wrong page, leading to an Assert failure or data
    corruption.

It also appears that you can run autovacuum with 7.3 (I thought maybe it only
went back as far as 7.4).

So, is the 16 minutes on your laptop with 8.1 for windows vs 1hr on the server
for the whole set of loops?  If so, 4x isn't a bad improvement. :-)  So,
assuming you dumped/loaded the same DB onto your laptop's postgresql server,
what does the vacuumdb -a -v say on the laptop?  Perhaps we can use it to see
if your fsm settings are ok.

BTW, did you see Scott's posting here:

http://archives.postgresql.org/pgsql-performance/2006-07/msg00091.php

Since we didn't hear from you for a while, I thought perhaps Scott had hit on
the fix.  Have you tried that yet?  It certainly would help the planner out.


You might also want to turn on autovacuum and see if that helps.

What's your disk subsystem like?  In fact, what's the entire DB server
hardware like?



I run through a loop, executing the following or similar queries 8
times (well actually 12 but the last 4 don't do anything) - Jeff I've
attached complete outputs as files.

A debug output is further below (numbers after each method call name,
above each SQL statement, are times to run that statement in
milliseconds; the times on the lines are cumulative).  So the total for
one loop is 515 seconds; multiply by 8 and that gets me to over an
hour.  It is actually the deletes that take the most time: 179 seconds
and 185 seconds each loop.



CREATE TABLE fttemp670743219 AS SELECT * FROM ftone LIMIT 0
INSERT INTO fttemp670743219 ( epId, start, direction, classid,
consolidation, cnt )  SELECT epId, TO_TIMESTAMP(start, 'YYYY-MM-DD
HH24:00:00.0')::timestamp AS start, direction, classid, 60 AS
consolidation, SUM(cnt) AS cnt FROM ftone WHERE consolidation = 0 AND
start < TO_TIMESTAMP('2006-07-11 14:04:34.156433+1000', 'YYYY-MM-DD
HH24:00:00.0')::timestamp GROUP BY epId, direction,
TO_TIMESTAMP(start, 'YYYY-MM-DD HH24:00:00.0')::timestamp, classid
DELETE FROM ONLY ftone WHERE ftone.epId = fttemp670743219.epId AND
ftone.direction = fttemp670743219.direction AND ftone.start =
fttemp670743219.start AND ftone.consolidation =
fttemp670743219.consolidation AND ftone.classid =
fttemp670743219.classid
INSERT INTO ftone ( epId, start, consolidation, direction, classid,
cnt ) SELECT epId, start, consolidation, direction, classid, cnt FROM
fttemp670743219
DROP TABLE fttemp670743219
DELETE FROM ftone WHERE consolidation = 0 AND start <
TO_TIMESTAMP((TO_TIMESTAMP('2006-07-11 14:04:34.156433+1000',
'YYYY-MM-DD 00:00:00.0')::timestamp - INTERVAL '10080 MINUTE'),
'YYYY-MM-DD 00:00:00.0')::timestamp



ftone: 0:
createConsolidatedInTemporary: 188:
CREATE TABLE fttemp678233382 AS SELECT * FROM ftone LIMIT 0
createConsolidatedInTemporary: 76783:
INSERT INTO fttemp678233382 ( epPairdefnid, start, direction, classid,
consolidation, cnt )  SELECT epPairdefnid, TO_TIMESTAMP(start,
'YYYY-MM-DD HH24:00:00.0')::timestamp AS start, direction, classid, 60
AS consolidation, SUM(cnt) AS cnt FROM ftone WHERE consolidation = 0
AND start < TO_TIMESTAMP('2006-07-12 11:02:13.865444+1000',
'YYYY-MM-DD HH24:00:00.0')::timestamp GROUP BY epPairdefnid,
direction, TO_TIMESTAMP(start, 'YYYY-MM-DD HH24:00:00.0')::timestamp,
classid
replaceConsolidatedInMainTable: 179178:
DELETE FROM ONLY ftone WHERE ftone.epPairdefnid =
fttemp678233382.epPairdefnid AND ftone.direction =
fttemp678233382.direction AND ftone.start = fttemp678233382.start AND
ftone.consolidation

Re: [PERFORM] High CPU Usage - PostgreSQL 7.3

2006-07-12 Thread Jeff Frost

On Wed, 12 Jul 2006, Neil Hepworth wrote:


Yes, it was the same DB so, yes 8.1 gives roughly a four fold improvement
(assuming hardware and OS differences aren't that significant - I'd expect
the Linux version to be faster if anything); which certainly ain't bad! :)

Good idea for the vacuumdb -a -v on the laptop; I re-imported the database
and then ran it, output below:

INFO:  free space map contains 949 pages in 537 relations
DETAIL:  A total of 9024 page slots are in use (including overhead).
9024 page slots are required to track all free space.
Current limits are:  2 page slots, 1000 relations, using 186 KB.
VACUUM


Well, this looks like it's probably on track already even though it'll change 
as there are updates/deletes, but I suspect you're more or less ok with the 
FSM settings you have.




I am about to start testing Scott's suggestion now (thanks Scott - wasn't
ignoring you, just didn't have time yesterday), and I'll get back with the
results.

Before I posted the problem to this list I was focusing more on the settings
in postgresql.conf than optimising the query as I thought this might be a
general problem, for all my DB updates/queries, with the way the planner was
optimising queries; maybe assuming CPU cost was too cheap?  Do you think I
was off track in my initial thinking?  Optimising these queries is
certainly beneficial but I don't want postgres to hog the CPU for any
extended period (other apps also run on the server), so I was wondering if
the general config settings could be tuned to always prevent this
(regardless of how poorly written my queries are :)?



I guess you could nice the postmaster, on startup or renice after startup but 
I'm not aware of any conf settings that would tune postgres to avoid using the 
CPU.



Neil


On 12/07/06, Jeff Frost [EMAIL PROTECTED] wrote:


On Wed, 12 Jul 2006, Neil Hepworth wrote:

 I am using version PostgreSQL 7.3.10 (RPM:
 postgresql73-rhel21-7.3.10-2).  Unfortunately vacuumdb -a -v does not
 give the FSM info at the end (need a newer version of postgres for
 that).  Running the same queries on 8.1 reduces the time taken to
 about 16 minutes, though I didn't run the test on the same hardware or
 OS as I want to keep my test server as close to production as
 possible, so I ran the 8.1 server on my Windows laptop (2GHz Centrino
 Duo with 2GB of RAM, yes the laptop is brand new :).

Well, looks like you're at least fairly up to date, but there is a fix in
7.3.11 that you might want to get by upgrading to 7.3.15:

 * Fix race condition in transaction log management
   There was a narrow window in which an I/O operation could be
   initiated for the wrong page, leading to an Assert failure or data
   corruption.

It also appears that you can run autovacuum with 7.3 (I thought maybe it
only went back as far as 7.4).

So, is the 16 minutes on your laptop with 8.1 for windows vs 1hr on the
server for the whole set of loops?  If so, 4x isn't a bad improvement. :-)
So, assuming you dumped/loaded the same DB onto your laptop's postgresql
server, what does the vacuumdb -a -v say on the laptop?  Perhaps we can use
it to see if your fsm settings are ok.

BTW, did you see Scott's posting here:

http://archives.postgresql.org/pgsql-performance/2006-07/msg00091.php

Since we didn't hear from you for a while, I thought perhaps Scott had hit
on the fix.  Have you tried that yet?  It certainly would help the planner
out.

You might also want to turn on autovacuum and see if that helps.

What's your disk subsystem like?  In fact, what's the entire DB server
hardware like?


 I run through a loop, executing the following or similar queries 8
 times (well actually 12 but the last 4 don't do anything) - Jeff I've
 attached complete outputs as files.

 A debug output further below (numbers after each method call name,
 above each SQL statement, are times to run that statement in
 milliseconds, the times on the lines are cumulative).  So total for
 one loop is 515 seconds, multiply by 8 and that gets me to over an
 hour); it is actually the deletes that take the most time; 179 seconds
 and 185 seconds each loop.

 

 CREATE TABLE fttemp670743219 AS SELECT * FROM ftone LIMIT 0
 INSERT INTO fttemp670743219 ( epId, start, direction, classid,
 consolidation, cnt )  SELECT epId, TO_TIMESTAMP(start, 'YYYY-MM-DD
 HH24:00:00.0')::timestamp AS start, direction, classid, 60 AS
 consolidation, SUM(cnt) AS cnt FROM ftone WHERE consolidation = 0 AND
 start < TO_TIMESTAMP('2006-07-11 14:04:34.156433+1000', 'YYYY-MM-DD
 HH24:00:00.0')::timestamp GROUP BY epId, direction,
 TO_TIMESTAMP(start, 'YYYY-MM-DD HH24:00:00.0')::timestamp, classid
 DELETE FROM ONLY ftone WHERE ftone.epId = fttemp670743219.epId AND
 ftone.direction = fttemp670743219.direction AND ftone.start =
 fttemp670743219.start AND ftone.consolidation =
 fttemp670743219.consolidation AND ftone.classid =
 fttemp670743219.classid

Re: [PERFORM] High CPU Usage - PostgreSQL 7.3

2006-07-11 Thread Jeff Frost
')::timestamp - INTERVAL '10080 MINUTE'),
'YYYY-MM-DD 00:00:00.0')::timestamp
MAIN LOOP TOTAL deleteExpiredData: 505142
MAIN LOOP TOTAL generateStatistics: 515611



Thanks again,
  Neil


On 11/07/06, Jeff Frost [EMAIL PROTECTED] wrote:



On Mon, 10 Jul 2006, Neil Hepworth wrote:

 I should also explain that I run through these queries on multiple
 tables and with some slightly different parameters for the
 consolidation so I run through those 3 queries (or similar) 9 times
 and this takes a total of about 2 hours, with high CPU usage.  And I
 am running the queries from a remote Java application (using JDBC),
 the client is using postgresql-8.0-311.jdbc3.jar.  The explain analyse
 results I have provided below are from running via pgAdmin, not the
 Java app (I did a vacuum analyse of the db before running them):



Neil, did you ever answer which version of 7.3 this is?

BTW, you mentioned that this takes 2 hours, but even looping over this 9
times seems like it would only take 9 minutes (55 seconds for the SELECT and 4
seconds for the DELETE = 59 seconds times 9).  Perhaps you should post the
explain analyze for the actual query that takes so long as the planner output
will likely be quite different.

One thing I noticed is that the planner seems quite incorrect about the
number of rows it expects in the SELECT.  If you ran vacuum analyze before
this, perhaps your fsm settings are incorrect?  What does vacuumdb -a -v
output at the end?  I'm looking for something that looks like this:

INFO:  free space map: 109 relations, 204 pages stored; 1792 total pages
needed
DETAIL:  Allocated FSM size: 1000 relations + 2 pages = 182 kB shared
memory.

I see your fsm settings are non-default, so it's also possible I'm not used
to reading 7.3's explain analyze output. :-)

Also, what does vmstat output look like while the query is running?
Perhaps you're running into some context switching problems.  It would be
interesting to know how the query runs on 8.1.x just to know if we're
chasing an optimization that's fixed already in a later version.


 Subquery Scan *SELECT*  (cost=59690.11..62038.38 rows=23483
 width=16) (actual time=16861.73..36473.12 rows=560094 loops=1)
   ->  Aggregate  (cost=59690.11..62038.38 rows=23483 width=16) (actual
       time=16861.72..34243.63 rows=560094 loops=1)
         ->  Group  (cost=59690.11..61451.32 rows=234827 width=16)
             (actual time=16861.62..20920.12 rows=709461 loops=1)
               ->  Sort  (cost=59690.11..60277.18 rows=234827 width=16)
                   (actual time=16861.62..18081.07 rows=709461 loops=1)
                     Sort Key: eppairdefnid, start
                     ->  Seq Scan on ftone  (cost=0.00..36446.66
                         rows=234827 width=16) (actual time=0.45..10320.91 rows=709461 loops=1)
                           Filter: ((consolidation = 60) AND (start <
                           (to_timestamp('2006-07-10 18:43:27.391103+1000'::text,
                           'YYYY-MM-DD HH24:00:00.0'::text))::timestamp without time zone))
 Total runtime: 55378.68 msec

 *** For the delete ***:

 Hash Join  (cost=0.00..30020.31 rows=425 width=14) (actual
 time=3767.47..3767.47 rows=0 loops=1)
   Hash Cond: ("outer".eppairdefnid = "inner".eppairdefnid)
   ->  Seq Scan on ftone  (cost=0.00..23583.33 rows=1286333 width=10)
       (actual time=0.04..2299.94 rows=1286333 loops=1)
   ->  Hash  (cost=0.00..0.00 rows=1 width=4) (actual
       time=206.01..206.01 rows=0 loops=1)
         ->  Seq Scan on fttemp1600384653  (cost=0.00..0.00 rows=1
             width=4) (actual time=206.00..206.00 rows=0 loops=1)
 Total runtime: 3767.52 msec

--
Jeff Frost, Owner   [EMAIL PROTECTED]
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954





--
Jeff Frost, Owner   [EMAIL PROTECTED]
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] High CPU Usage - PostgreSQL 7.3

2006-07-11 Thread Jeff Frost

On Tue, 11 Jul 2006, Jeff Frost wrote:


On Wed, 12 Jul 2006, Neil Hepworth wrote:

You might also want to turn on autovacuum and see if that helps.

What's your disk subsystem like?  In fact, what's the entire DB server 
hardware like?


By the way, how big does the temp table get?  If it's large, it might make the 
DELETE slow because it doesn't have any indexes on any of the comparison 
columns.


DELETE FROM ONLY ftone WHERE ftone.epId = fttemp670743219.epId AND 
ftone.direction = fttemp670743219.direction AND ftone.start = 
fttemp670743219.start AND ftone.consolidation = fttemp670743219.consolidation 
AND ftone.classid = fttemp670743219.classid
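
If missing indexes are indeed the culprit, one way to test that theory is to index the temp table on the comparison columns before running the DELETE. This is only a hedged sketch (the column names are taken from the statement above; whether the 7.3 planner actually uses the index is something to verify with EXPLAIN ANALYZE):

```sql
-- Sketch: index the temp table's comparison columns, then refresh
-- planner statistics before the big DELETE against ftone.
CREATE INDEX fttemp670743219_cmp_idx
    ON fttemp670743219 (epId, direction, start, consolidation, classid);
ANALYZE fttemp670743219;
```

Since the temp table is dropped at the end of each loop, the index cost is paid once per iteration, which should still be far cheaper than the 179-second DELETE reported above.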


In your explain analyze from before, it seems that there were 0 rows in that 
table:



  ->  Seq Scan on fttemp1600384653  (cost=0.00..0.00 rows=1
      width=4) (actual time=206.00..206.00 rows=0 loops=1)
Total runtime: 3767.52 msec


but that was with the smaller set size I believe.





I run through a loop, executing the following or similar queries 8
times (well actually 12 but the last 4 don't do anything) - Jeff I've
attached complete outputs as files.

A debug output further below (numbers after each method call name,
above each SQL statement, are times to run that statement in
milliseconds, the times on the lines are cumulative).  So total for
one loop is 515 seconds, multiply by 8 and that gets me to over an
hour); it is actually the deletes that take the most time; 179 seconds
and 185 seconds each loop.



CREATE TABLE fttemp670743219 AS SELECT * FROM ftone LIMIT 0
INSERT INTO fttemp670743219 ( epId, start, direction, classid,
consolidation, cnt )  SELECT epId, TO_TIMESTAMP(start, 'YYYY-MM-DD
HH24:00:00.0')::timestamp AS start, direction, classid, 60 AS
consolidation, SUM(cnt) AS cnt FROM ftone WHERE consolidation = 0 AND
start < TO_TIMESTAMP('2006-07-11 14:04:34.156433+1000', 'YYYY-MM-DD
HH24:00:00.0')::timestamp GROUP BY epId, direction,
TO_TIMESTAMP(start, 'YYYY-MM-DD HH24:00:00.0')::timestamp, classid
DELETE FROM ONLY ftone WHERE ftone.epId = fttemp670743219.epId AND
ftone.direction = fttemp670743219.direction AND ftone.start =
fttemp670743219.start AND ftone.consolidation =
fttemp670743219.consolidation AND ftone.classid =
fttemp670743219.classid
INSERT INTO ftone ( epId, start, consolidation, direction, classid,
cnt ) SELECT epId, start, consolidation, direction, classid, cnt FROM
fttemp670743219
DROP TABLE fttemp670743219
DELETE FROM ftone WHERE consolidation = 0 AND start <
TO_TIMESTAMP((TO_TIMESTAMP('2006-07-11 14:04:34.156433+1000',
'YYYY-MM-DD 00:00:00.0')::timestamp - INTERVAL '10080 MINUTE'),
'YYYY-MM-DD 00:00:00.0')::timestamp



ftone: 0:
createConsolidatedInTemporary: 188:
CREATE TABLE fttemp678233382 AS SELECT * FROM ftone LIMIT 0
createConsolidatedInTemporary: 76783:
INSERT INTO fttemp678233382 ( epPairdefnid, start, direction, classid,
consolidation, cnt )  SELECT epPairdefnid, TO_TIMESTAMP(start,
'YYYY-MM-DD HH24:00:00.0')::timestamp AS start, direction, classid, 60
AS consolidation, SUM(cnt) AS cnt FROM ftone WHERE consolidation = 0
AND start < TO_TIMESTAMP('2006-07-12 11:02:13.865444+1000',
'YYYY-MM-DD HH24:00:00.0')::timestamp GROUP BY epPairdefnid,
direction, TO_TIMESTAMP(start, 'YYYY-MM-DD HH24:00:00.0')::timestamp,
classid
replaceConsolidatedInMainTable: 179178:
DELETE FROM ONLY ftone WHERE ftone.epPairdefnid =
fttemp678233382.epPairdefnid AND ftone.direction =
fttemp678233382.direction AND ftone.start = fttemp678233382.start AND
ftone.consolidation = fttemp678233382.consolidation AND ftone.classid
= fttemp678233382.classid
replaceConsolidatedInMainTable: 61705:
INSERT INTO ftone ( epPairdefnid, start, consolidation, direction,
classid, cnt ) SELECT epPairdefnid, start, consolidation, direction,
classid, cnt FROM fttemp678233382
consolidate: 2656:
DROP TABLE fttemp678233382
MAIN LOOP TOTAL consolidate: 320526
deleteOlderThan: 184616:
DELETE FROM ftone WHERE consolidation = 0 AND start <
TO_TIMESTAMP((TO_TIMESTAMP('2006-07-12 11:02:13.865444+1000',
'YYYY-MM-DD 00:00:00.0')::timestamp - INTERVAL '10080 MINUTE'),
'YYYY-MM-DD 00:00:00.0')::timestamp
MAIN LOOP TOTAL deleteExpiredData: 505142
MAIN LOOP TOTAL generateStatistics: 515611



Thanks again,
  Neil


On 11/07/06, Jeff Frost [EMAIL PROTECTED] wrote:



On Mon, 10 Jul 2006, Neil Hepworth wrote:

 I should also explain that I run through these queries on multiple
 tables and with some slightly different parameters for the
 consolidation so I run through those 3 queries (or similar) 9 times
 and this takes a total of about 2 hours, with high CPU usage.  And I
 am running the queries from a remote Java application (using JDBC),
 the client is using postgresql-8.0-311.jdbc3.jar.  The explain analyse
 results I have provided below are from running via pgAdmin, not the
 Java app

Re: [PERFORM] High CPU Usage - PostgreSQL 7.3

2006-07-09 Thread Jeff Frost

On Mon, 10 Jul 2006, Neil Hepworth wrote:


I am running PostgreSQL 7.3 on a Linux box (RHEL 2.1 - Xeon 2.8GHz
with 1GB of RAM) and seeing very high CPU usage (normally over 90%)
when I am running the following queries, and the queries take a long
time to return; over an hour!


First off, when is the last time you vacuum analyzed this DB and how often 
does the vacuum analyze happen.  Please post the EXPLAIN ANALYZE output for 
each of the queries below.


Also, I would strongly urge you to upgrade to a more recent version of 
postgresql.  We're currently up to 8.1.4 and it has tons of excellent 
performance enhancements as well as helpful features such as integrated 
autovacuum, point in time recovery backups, etc.


Also, I see that you're running with fsync = false.  That's quite dangerous 
especially on a production system.





CREATE TEMPORARY TABLE fttemp1600384653 AS SELECT * FROM ftoneway LIMIT 0;

INSERT INTO fttemp1600384653 SELECT epId, TO_TIMESTAMP(start,
'YYYY-MM-DD HH24:00:00.0')::timestamp AS start, 60 AS consolidation,
SUM(cnt) AS cnt FROM ftone WHERE consolidation = 0 AND start <
TO_TIMESTAMP('2006-06-27 18:43:27.391103+1000', 'YYYY-MM-DD
HH24:00:00.0')::timestamp;

DELETE FROM ONLY ftone WHERE ftoneway.epId = fttemp1600384653.epId;

The only changes I've made to the default postgresql.comf file are listed 
below:


LC_MESSAGES = 'en_US'
LC_MONETARY = 'en_US'
LC_NUMERIC = 'en_US'
LC_TIME = 'en_US'
tcpip_socket = true
max_connections = 20
effective_cache_size = 32768
wal_buffers = 128
fsync = false
shared_buffers = 3000
max_fsm_relations = 1
max_fsm_pages = 10

The tables are around a million rows but when I run against
tables of a few hundred thousand rows it still takes tens of minutes
with high CPU.  My database does have a lot of tables (can be several
thousand), can that cause performance issues?

Thanks,
  Neil

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

 http://www.postgresql.org/docs/faq




--
Jeff Frost, Owner   [EMAIL PROTECTED]
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] VACUUM vs. REINDEX

2006-07-07 Thread Jeff Frost

On Fri, 7 Jul 2006, William Scott Jordan wrote:


Hi Jeff,

We are running ANALYZE with the hourly VACUUMs.  Most of the time the VACUUM 
for this table looks like this:



INFO:  vacuuming public.event_sums
INFO:  index event_sums_event_available now contains 56121 row versions in 
2256 pages

DETAIL:  102936 index row versions were removed.
1777 index pages have been deleted, 1635 are currently reusable.
CPU 0.03s/0.16u sec elapsed 1.04 sec.
INFO:  index event_sums_date_available now contains 56121 row versions in 
5504 pages

DETAIL:  102936 index row versions were removed.
2267 index pages have been deleted, 2202 are currently reusable.
CPU 0.15s/0.25u sec elapsed 13.91 sec.
INFO:  index event_sums_price_available now contains 56121 row versions in 
4929 pages

DETAIL:  102936 index row versions were removed.
149 index pages have been deleted, 149 are currently reusable.
CPU 0.13s/0.33u sec elapsed 0.51 sec.
INFO:  event_sums: removed 102936 row versions in 3796 pages
DETAIL:  CPU 0.31s/0.26u sec elapsed 0.92 sec.
INFO:  event_sums: found 102936 removable, 35972 nonremovable row versions 
in 170937 pages

DETAIL:  8008 dead row versions cannot be removed yet.
There were 4840134 unused item pointers.
0 pages are entirely empty.
CPU 5.13s/1.68u sec elapsed 209.38 sec.
INFO:  analyzing public.event_sums
INFO:  event_sums: 171629 pages, 3000 rows sampled, 7328 estimated total 
rows


Hmmm..I was looking for something that looks like this:

INFO:  free space map: 109 relations, 204 pages stored; 1792 total pages 
needed
DETAIL:  Allocated FSM size: 1000 relations + 2 pages = 182 kB shared 
memory.

VACUUM

Maybe 7.4 doesn't give this?  Or maybe you need to run vacuumdb -a -v to get 
it?







There are a few things in the second vacuum results that catch my eye, but I 
don't have the skill set to diagnose the problem.  I do know, however, that a 
REINDEX followed by a VACUUM FULL seems to make the symptoms go away for a 
while.


And I agree that we should upgrade to an 8.x version of PG, but as with many 
things in life time, money, and risk conspire against me.


You should still be able to use autovacuum, which might make you a little 
happier.  Which 7.4 version are you using?





-William




At 04:18 PM 7/7/2006, you wrote:

On Fri, 7 Jul 2006, William Scott Jordan wrote:


Hi all!

Can anyone explain to me what VACUUM does that REINDEX doesn't?  We have a 
frequently updated table on Postgres 7.4 on FC3 with about 35000 rows 
which we VACUUM hourly and VACUUM FULL once per day.  It seem like the 
table still slows to a crawl every few weeks.  Running a REINDEX by itself 
or a VACUUM FULL by itself doesn't seem to help, but running a REINDEX 
followed immediately by a VACUUM FULL seems to solve the problem.


I'm trying to decide now if we need to include a daily REINDEX along with 
our daily VACUUM FULL, and more importantly I'm just curious to know why 
we should or shouldn't do that.


Any information on this subject would be appreciated.


William,

If you're having to VACUUM FULL that often, then it's likely your FSM 
settings are too low.  What does the last few lines of VACUUM VERBOSE say? 
Also, are you running ANALYZE with the vacuums or just running VACUUM?  You 
still need to run ANALYZE to update the planner statistics, otherwise 
things might slowly grind to a halt.  Also, you should probably consider 
setting up autovacuum and upgrading to 8.0 or 8.1 for better performance 
overall.
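
As an illustration only (the numbers below are placeholders, not tuned recommendations), raising the FSM settings is a postgresql.conf change followed by a restart; size the values from the "total pages needed" and relation counts that VACUUM VERBOSE reports:

```
# postgresql.conf -- illustrative values, size from VACUUM VERBOSE output
max_fsm_pages = 200000     # should exceed "total pages needed"
max_fsm_relations = 1000   # should exceed the relation count reported
```

With the FSM large enough to track all the dead space, routine VACUUM can reuse it and the daily VACUUM FULL becomes much less necessary.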



--
Jeff Frost, Owner   [EMAIL PROTECTED]
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954






--
Jeff Frost, Owner   [EMAIL PROTECTED]
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Recovery will take 10 hours

2006-04-20 Thread Jeff Frost


Brendan,

Is your NFS share mounted hard or soft?  Do you have space to copy the files 
locally?  I suspect you're seeing NFS slowness in your restore since you 
aren't using much in the way of disk IO or CPU.


-Jeff
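
One way to act on that suspicion is to stage the archive locally before restarting recovery. This is a sketch only: the local path is an assumption, and the gunzip-based restore_command assumes the gzip-compressed segments shown elsewhere in this thread.

```shell
# Copy the archived WAL segments off the NFS mount to local disk first,
# so recovery reads from local storage instead of going over NFS.
rsync -a /wal_archive/ /var/lib/pgsql/local_wal_archive/

# recovery.conf would then decompress from the local copy, e.g.:
#   restore_command = 'gunzip < /var/lib/pgsql/local_wal_archive/%f.gz > %p'
```

Reading sequentially from local disk also sidesteps transient NFS hiccups like the "No such file or directory" errors reported below.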

On Thu, 20 Apr 2006, Brendan Duddridge wrote:

Oops... forgot to mention that both files that postgres said were missing are 
in fact there:


A partial listing from our wal_archive directory:

-rw--- 1 postgres staff 4971129 Apr 19 20:08 0001018F0036.gz
-rw--- 1 postgres staff 4378284 Apr 19 20:09 0001018F0037.gz

There didn't seem to be any issues with the NFS mount. Perhaps it briefly 
disconnected and came back right away.



Thanks!



Brendan Duddridge | CTO | 403-277-5591 x24 |  [EMAIL PROTECTED]

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB  T2G 0V9

http://www.clickspace.com

On Apr 20, 2006, at 5:11 PM, Brendan Duddridge wrote:


Hi Jeff,

The WAL files are stored on a separate server and accessed through an NFS 
mount located at /wal_archive.


However, the restore failed about 5 hours in after we got this error:

[2006-04-20 16:41:28 MDT] LOG: restored log file 0001018F0034 
from archive
[2006-04-20 16:41:35 MDT] LOG: restored log file 0001018F0035 
from archive
[2006-04-20 16:41:38 MDT] LOG: restored log file 0001018F0036 
from archive
sh: line 1: /wal_archive/0001018F0037.gz: No such file or 
directory
[2006-04-20 16:41:46 MDT] LOG: could not open file 
pg_xlog/0001018F0037 (log file 399, segment 55): No such file 
or directory

[2006-04-20 16:41:46 MDT] LOG: redo done at 18F/36FFF254
sh: line 1: /wal_archive/0001018F0036.gz: No such file or 
directory
[2006-04-20 16:41:46 MDT] PANIC: could not open file 
pg_xlog/0001018F0036 (log file 399, segment 54): No such file 
or directory
[2006-04-20 16:41:46 MDT] LOG: startup process (PID 9190) was terminated by 
signal 6
[2006-04-20 16:41:46 MDT] LOG: aborting startup due to startup process 
failure

[2006-04-20 16:41:46 MDT] LOG: logger shutting down



The /wal_archive/0001018F0037.gz file is there and accessible on the
NFS mount.


Is there a way to continue the restore process from where it left off?

Thanks,


Brendan Duddridge | CTO | 403-277-5591 x24 |  [EMAIL PROTECTED]

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB  T2G 0V9

http://www.clickspace.com

On Apr 20, 2006, at 3:19 PM, Jeff Frost wrote:


On Thu, 20 Apr 2006, Brendan Duddridge wrote:


Hi,

We had a database issue today that caused us to have to restore to our 
most recent backup. We are using PITR so we have 3120 WAL files that need 
to be applied to the database.


After 45 minutes, it has restored only 230 WAL files. At this rate, it's 
going to take about 10 hours to restore our database.


Most of the time, the server is not using very much CPU time or I/O time. 
So I'm wondering what can be done to speed up the process?


Brendan,

Where are the WAL files being stored and how are they being read back?

--
Jeff Frost, Owner   [EMAIL PROTECTED]
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954











---(end of broadcast)---
TIP 6: explain analyze is your friend



--
Jeff Frost, Owner   [EMAIL PROTECTED]
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Best OS Configuration for Dual Xeon w/4GB

2006-03-21 Thread Jeff Frost

On Wed, 22 Mar 2006, Mark Kirkwood wrote:


Adam Witney wrote:



[EMAIL PROTECTED]:43]~:15sudo diskinfo -vt /dev/mirror/gm0


Can anyone point me to where I can find diskinfo or an equivalent to run on
my debian system, I have been googling for the last hour but can't find it!
I would like to analyse my own disk setup for comparison


I guess you could use hdparm (-t or -T flags do a simple benchmark).

Though iozone or bonnie++ are probably better.


You might also have a look at lmdd for sequential read/write performance from 
the lmbench suite: http://sourceforge.net/projects/lmbench


As numbers from lmdd are seen on this list frequently.

--
Jeff Frost, Owner   [EMAIL PROTECTED]
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954



[PERFORM] motherboard recommendations

2006-03-21 Thread Jeff Frost
It's time to build a new white box postgresql test box/workstation.  My Athlon 
XP system is getting a little long in the tooth.  Have any of you performance 
folks evaluated the Socket 939 boards on the market these days?  I'd like to 
find something that doesn't have terrible SATA disk performance.  I'm planning 
to install Gentoo x86_64 on it and run software raid, so I won't be using the 
fakeraid controllers as raid.  I have been eyeing the Abit AN8 32X board, but 
I don't really need SLI, though having an extra PCI-e might be nice in the 
future.


If you respond off-list, I'll summarize and post the results back.

Thanks for any input.

--
Jeff Frost, Owner   [EMAIL PROTECTED]
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954



[PERFORM] Open request for benchmarking input

2005-11-26 Thread Jeff Frost
Did you folks see this article on Slashdot with a fellow requesting input on 
what sort of benchmarks to run to get a good Postgresql vs Mysql dataset? 
Perhaps this would be a good opportunity for us to get some good benchmarking 
done.  Here's the article link and top text:


http://ask.slashdot.org/article.pl?sid=05/11/26/0317213

 David Lang asks: With the release of MySQL 5.0, PostgreSQL 8.1, and the flap 
over Oracle purchasing InnoDB, the age old question of performance is coming 
up again. I've got some boxes that were purchased for a data warehouse project 
that isn't going to be installed for a month or two, and could probably 
squeeze some time in to do some benchmarks on the machines. However, the 
question is: what should be done that's reasonably fair to both MySQL and 
PostgreSQL? We all know that careful selection of the benchmark can seriously 
skew the results, and I want to avoid that (in fact I would consider it close 
to ideal if the results came out that each database won in some tests). I 
would also not like to spend time generating the benchmarks only to have the 
losing side accuse me of being unfair. So, for both MySQL and PostgreSQL 
advocates, what would you like to see in a series of benchmarks?


 The hardware I have available is as follows:

* 2x dual Opteron 8G ram, 2x144G 15Krpm SCSI
* 2x dual Opteron 8G ram, 2x72G 15Krpm SCSI
* 1x dual Opteron 16G ram, 2x36G 15Krpm SCSI 16x400G 7200rpm SATA

I would prefer to use Debian Sarge as the base install of the systems (with 
custom built kernels), but would compile the databases from source rather then 
using binary packages.


For my own interests, I would like to at least cover the following bases: 32
bit vs 64 bit vs 64 bit kernel + 32 bit user-space; data warehouse type tests
(data > memory); and web prefs test (active data < RAM)


What specific benchmarks should be run, and what other things should be 
tested? Where should I go for assistance on tuning each database, evaluating 
the benchmark results, and re-tuning them?


---
Jeff Frost, Owner 
[EMAIL PROTECTED]

Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Temporary Table

2005-11-07 Thread Jeff Frost

You can use the vacuumdb external command.  Here's an example:

vacuumdb --full --analyze --table mytablename mydbname
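
To run that from cron across every database, a small wrapper script is one option. This is a sketch, not a tested drop-in: the catalog query and the assumption that vacuumdb/psql can run unattended as the postgres user are mine, not from this thread.

```shell
#!/bin/sh
# Sketch: VACUUM FULL ANALYZE the three catalog tables in every
# non-template database on this server.
for db in $(psql -At -c "SELECT datname FROM pg_database WHERE NOT datistemplate"); do
    for tbl in pg_class pg_attribute pg_depend; do
        vacuumdb --full --analyze --table "$tbl" "$db"
    done
done
```

Saved somewhere like /usr/local/bin and added to the postgres user's crontab (e.g. `0 3 * * * /usr/local/bin/vacuum_system_tables.sh`), this covers all databases without hardcoding their names.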



On Tue, 8 Nov 2005, Christian Paul B. Cosinas wrote:


But How Can I put this in the Cron of my Linux Server?
I really don't have an idea :)
What I want to do is to loop around all the databases in my server and
execute the vacuum of these 3 tables in each database.

-Original Message-
From: Joshua D. Drake [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 08, 2005 2:11 AM
To: Christian Paul B. Cosinas
Cc: 'Alvaro Nunes Melo'; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Temporary Table

Christian Paul B. Cosinas wrote:

I try to run this command in my linux server.
VACUUM FULL pg_class;
VACUUM FULL pg_attribute;
VACUUM FULL pg_depend;

But it give me the following error:
-bash: VACUUM: command not found


That needs to be run from psql ...







I choose Polesoft Lockspam to fight spam, and you?
http://www.polesoft.com/refer.html










--
Jeff Frost, Owner   [EMAIL PROTECTED]
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954



[PERFORM] Status of Opteron vs Xeon

2005-10-06 Thread Jeff Frost
What's the current status of how much faster the Opteron is compared to the 
Xeons?  I know the Opterons used to be close to 2x faster, but is that still 
the case?  I understand much work has been done to reduce the context 
switching storms on the Xeon architecture, is this correct?


--
Jeff Frost, Owner   [EMAIL PROTECTED]
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954



Re: [PERFORM] Massive performance issues

2005-09-01 Thread Jeff Frost

Well I've got 1GB of RAM, but from analysis of its use, a fair amount
isn't being used. About 50% is actually in use by applications and about
half of the rest is cache and the rest isn't being used. Does this have to do
with the max_fsm_pages and max_fsm_relations settings? I've pretty much
not touched the configuration and it's the standard Debian package.


Matt, have a look at the annotated postgresql.conf for 7.x here:

http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html

If you have the default settings, you're likely hampering yourself quite a 
bit.  The settings you most likely care about are shared_buffers, sort_mem, 
vacuum_mem, max_fsm_pages, and effective_cache_size.


Also, you may want to read the  PostgreSQL 8.0 Performance Checklist.  Even 
though it's for 8.0, it'll give you good ideas on what to change in 7.4.  You 
can find it here: http://www.powerpostgresql.com/PerfList/
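For a concrete sense of scale, a 7.4-era starting point for a 1GB machine like Matt's might look like the fragment below. These numbers are illustrative assumptions only, not recommendations; the annotated conf linked above explains each knob and how to size it for a real workload.

```
# Illustrative postgresql.conf starting points for PostgreSQL 7.4 on 1GB RAM
shared_buffers = 10000          # 7.x counts 8kB buffers; ~80MB here
sort_mem = 8192                 # kB per sort operation, per backend
vacuum_mem = 32768              # kB available to VACUUM
max_fsm_pages = 200000          # must cover pages freed between vacuums
effective_cache_size = 65536    # OS cache estimate in 8kB pages (~512MB)
```

The main trap with the Debian defaults of that era is that shared_buffers and effective_cache_size are far too small, so the planner assumes almost nothing is cached.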




Re: [PERFORM] Whence the Opterons?

2005-05-09 Thread Jeff Frost
On Mon, 9 May 2005, John A Meinel wrote:
Well, I'm speaking more from what I remember reading, than personal
testing. Probably 50% is too high, but I thought I remembered it being
more general than just specific cases.
Anandtech had a benchmark here:
http://www.anandtech.com/linux/showdoc.aspx?i=2163&p=2
It's a little old, as it's listing an Opteron 150 vs 3.6 Xeon, but it does
show that the opteron comes in almost twice as fast as the Xeon doing
Postgres.



Re: [PERFORM] Opteron vs Xeon (Was: What to do with 6 disks?)

2005-04-20 Thread Jeff Frost
On Tue, 19 Apr 2005, J. Andrew Rogers wrote:
I don't know about 2.5x faster (perhaps on specific types of loads), but the 
reason Opterons rock for database applications is their insanely good memory 
bandwidth and latency that scales much better than the Xeon.  Opterons also 
have a ccNUMA-esque I/O fabric and two dedicated on-die memory channels *per 
processor* -- no shared bus there, closer to real UNIX server iron than a 
glorified PC.
Thanks J!  That's exactly what I was suspecting it might be.  Actually, I 
found an anandtech benchmark that shows the Opteron coming in at close to 2.0x 
performance:

http://www.anandtech.com/linux/showdoc.aspx?i=2163&p=2
It's an Opteron 150 (2.4GHz) vs. a 3.6GHz Xeon from August.  I wonder if the 
differences are more pronounced with the newer Opterons.

-Jeff


[PERFORM] What to do with 6 disks?

2005-04-19 Thread Jeff Frost
Now that we've hashed out which drives are quicker and more money equals 
faster...

Let's say you had a server with 6 separate 15k RPM SCSI disks, what raid 
option would you use for a standalone postgres server?

a) 3xRAID1 - 1 for data, 1 for xlog, 1 for os?
b) 1xRAID1 for OS/xlog, 1xRAID5 for data
c) 1xRAID10 for OS/xlog/data
d) 1xRAID1 for OS, 1xRAID10 for data
e) .
I was initially leaning towards b, but after talking to Josh a bit, I suspect 
that with only 4 disks the RAID5 might be a performance detriment vs. 3 RAID1s 
or some sort of split RAID10 setup.
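Whichever layout wins, options (a) and (d) both come down to giving the WAL its own spindles. One common way to do that, sketched below with made-up paths and scratch directories: after initdb, move pg_xlog onto the dedicated array's mount point and leave a symlink behind. On a real server the postmaster must be stopped first.

```shell
# Hypothetical sketch: relocate pg_xlog onto its own mount so WAL writes
# don't compete with data-file I/O.  Stop the postmaster before doing
# this for real; paths here are demo placeholders.
move_xlog() {
    datadir="$1"; waldir="$2"
    mv "$datadir/pg_xlog" "$waldir/pg_xlog"
    ln -s "$waldir/pg_xlog" "$datadir/pg_xlog"
}

# Dry run against scratch directories standing in for the two arrays:
demo=$(mktemp -d)
mkdir -p "$demo/data/pg_xlog" "$demo/wal"
move_xlog "$demo/data" "$demo/wal"
ls -l "$demo/data/pg_xlog"    # pg_xlog is now a symlink into the WAL mount
```

The same trick works for option (a)'s third mirror: point the symlink at whichever array you dedicate to xlog.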



[PERFORM] DATA directory on network attached storage

2005-04-10 Thread Jeff Frost
We are currently considering the possibility of creating a warm standby 
machine utilizing heartbeat and a network attached storage device for the DATA 
directory.  The idea being that the warm standby machine has its postmaster 
stopped.  When heartbeat detects the death of the master server, the 
postmaster is started up on the warm standby using the shared DATA directory. 
Other than the obvious problems of both postmasters inadvertently attempting 
access at the same time, I'm curious to know if anyone has tried any similar 
setups and what the experiences have been.  Specifically, is the performance of 
gigE good enough to allow Postgres to perform under load with an NFS-mounted 
DATA dir?  Are there other problems I haven't thought about?  Any input would 
be greatly appreciated.

Thanks!