Re: [PERFORM] Tuning for warm standby

2007-09-28 Thread Merlin Moncure
On 9/27/07, Kevin Kempter [EMAIL PROTECTED] wrote:
> Hi All;
>
> I'm preparing to fire up WAL archiving on 8 production servers. We will follow
> up with implementing a warm standby scenario.
>
> Does anyone have any thoughts on how to maximize performance, yet minimize
> the potential for data loss, assuming we were not able to retrieve the final
> un-archived WAL segment from the original pg_xlog dir in the case of a crash?

The standby mechanism is actually very simple and there is very little
to do for efficient operation.  All the hard work is done inside the
WAL algorithms, and from the outside it works like a fancy rsync.

Some performance tips:
* Don't use an encrypted channel (scp) to transfer WAL segments from
the primary to the secondary.
* Make sure the link between the servers is at least gigabit.  Bonded
ethernet couldn't hurt if you can easily fit it into your topology.
* Do not write WAL segments directly (NFS, CIFS) to the remote folder.
Whatever you use, make sure it puts files into the remote folder
atomically, unless it is specifically designed to handle WAL segments.
* There's not much to do on the standby side.

I've set up a few warm standby systems with pg_standby... it works
great.  I find it works best using link mode (-l) and keeping at least
256 WAL files before it prunes (the -k option).

'archive_timeout' is a way to guarantee that your last transferred file is
no older than 'x' seconds.  I am not a big fan of setting this... most
of the servers I work with are fairly busy and I'd prefer to let the
server decide when to flip files.  I would only consider setting it on
a server that had very little write activity but where what did get
written was important.
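
To make that concrete, here is a rough sketch of the two ends (the paths,
hostname and the rsync transport are just placeholders -- check the
pg_standby README for the exact option set in your version):

  # postgresql.conf on the primary
  archive_command = 'rsync -a %p standby:/var/lib/pgsql/wal_archive/%f'
  #archive_timeout = 60          # only if you need a hard bound on staleness

  # recovery.conf on the standby
  restore_command = 'pg_standby -l -k 256 -t /tmp/pgsql.trigger.5442 /var/lib/pgsql/wal_archive %f %p'

rsync copies into a temporary file and renames it into place, which takes
care of the atomicity point above; point it at an rsync daemon (rsync://)
rather than going over ssh if you want to skip the encryption overhead.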

There is a new player in warm standby systems (developed by skype!):
http://pgfoundry.org/projects/skytools/

I haven't looked at it yet, but supposedly it can stream WAL files
over in real time.  Definitely worth looking into.  This would moot some
of the other advice I've given here.

merlin



[PERFORM] Postgres 7.4.2 hanging when vacuum full is run

2007-09-28 Thread Radhika S
Hi -

This has been happening more recently. Our database hangs after a VACUUM
and is unresponsive when we come in the next morning.

The vacuum job runs at 03:00 am daily.
The command is: /usr/local/pgsql/bin/vacuumdb --full -d DbName

Also, what exactly does VACUUM waiting mean? Is there a reason why it
never emerges from the VACUUM job?
I understand that doing a vacuumdb --full causes the tables to be locked (I'm
not really sure about the inner workings of vacuum).

Any light on this would be really appreciated.

Thanks,
Radhika


Below is what ps -ef |grep postgres shows:

 5530 ?S  0:13 /usr/local/pgsql/bin/postmaster -i
 5534 ?S  0:01 postgres: stats buffer process
 5535 ?S  0:04 postgres: stats collector process
 5621 ?S  0:53 postgres: slony myDB 10.142.20.50 idle
 5626 ?S  0:51 postgres: slony myDB 10.142.20.50 idle
 5627 ?S  0:34 postgres: slony myDB 10.142.20.50 idle
 5628 ?S  5:40 postgres: slony myDB 10.142.20.50 idle
 5637 ?S  2:09 postgres: slony myDB 10.132.20.26 idle
 5638 ?S  1:56 postgres: slony myDB 10.132.20.26 idle
 5745 ?S 42:08 postgres: abc myDB [local] idle
20774 ?S  4:29 postgres: abc myDB [local] idle
20775 ?S  0:00 postgres: abc myDB [local] idle in
transaction
20776 ?S  0:00 postgres: abc myDB [local] idle
17509 ?S  0:06 postgres: abc myDB [local] VACUUM waiting
24656 ?S  0:00 postgres: abc myDB [local] INSERT waiting
30489 ?S  0:00 postgres: abc myDB [local] SELECT waiting
30637 ?S  0:00 postgres: abc myDB [local] UPDATE waiting
30647 ?S  0:00 postgres: abc myDB [local] UPDATE waiting
30668 ?S  0:00 postgres: abc myDB [local] UPDATE waiting


-- 
It is all a matter of perspective. You choose your view by choosing where to
stand. --Larry Wall


Re: [PERFORM] Postgres 7.4.2 hanging when vacuum full is run

2007-09-28 Thread Scott Marlowe
On top of what Vivek said, you need to update your pg install.  7.4.2
had a few data-eating bugs if I remember correctly.  The 7.4 branch is up
to 7.4.18, so you're missing a lot of bug fixes (2+ years' worth).
If one of those bugs eats your data, don't expect any sympathy.
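
To check what you're actually running before and after:

  SELECT version();

Minor releases within the branch (7.4.2 -> 7.4.18) are a drop-in upgrade:
install the new binaries and restart, no dump/reload needed, barring
anything specifically called out in the release notes.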



Re: [PERFORM] sequence query performance issues

2007-09-28 Thread Tom Lane
Richard Huxton [EMAIL PROTECTED] writes:
> Hmm - why is it doing that?

I'm betting that the OP's people.uid column is not an integer.  Existing
PG releases can't use hashed subplans for cross-data-type comparisons
(8.3 will be a bit smarter).

regards, tom lane



Re: [PERFORM] Postgres 7.4.2 hanging when vacuum full is run

2007-09-28 Thread Vivek Khera


On Sep 28, 2007, at 10:28 AM, Radhika S wrote:

> 20775 ?S  0:00 postgres: abc myDB [local] idle in transaction
> 20776 ?S  0:00 postgres: abc myDB [local] idle
> 17509 ?S  0:06 postgres: abc myDB [local] VACUUM waiting
> 24656 ?S  0:00 postgres: abc myDB [local] INSERT waiting


Your vacuum is probably waiting for the idle in transaction
session to finish, so it can clean up.  It can't take its lock while your
transaction is holding locks.  Your other tasks are probably waiting behind
the vacuum.  Don't leave your transactions open for a long time.  It
is bad.
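
If you want to see exactly who is blocking whom, a quick sketch (it assumes
stats_command_string is on so that current_query is populated; column names
vary a little between versions):

  -- backends stuck waiting on a lock, and what they are running
  SELECT l.pid, l.relation::regclass AS locked_rel, l.mode,
         a.usename, a.current_query
  FROM pg_locks l
  JOIN pg_stat_activity a ON a.procpid = l.pid
  WHERE NOT l.granted;

  -- the open transactions most likely holding those locks
  SELECT procpid, usename, current_query
  FROM pg_stat_activity
  WHERE current_query = '<IDLE> in transaction';

Finish or kill whatever shows up as idle in transaction and the vacuum
(and everything queued behind it) should proceed.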




Re: [PERFORM] Searching for the cause of a bad plan

2007-09-28 Thread Csaba Nagy
On Thu, 2007-09-27 at 11:07 -0700, Ron Mayer wrote:
> Csaba Nagy wrote:
> > Well, my problem was actually solved by raising the statistics target,
>
> Would it do more benefit than harm if postgres increased the
> default_statistics_target?
>
> I see a fair number of people (myself included) asking questions whose
> resolution was to ALTER TABLE SET STATISTICS; and I think relatively
> fewer (if any?) people concerned about space in pg_statistic
> or people improving analyze time by reducing the statistics target.

Well, the cost of raising the statistics target is far from zero: with
all defaults the analyze time was ~ 10 seconds, with one column set to
100 it was ~ 1.5 minutes, and with one column set to 1000 it was 15 minutes
for the table in question (a few hundred million rows). Of course the IO
load must have been proportional to the timings... so I'm pretty sure the
current default serves most situations well.
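
(For reference, the per-column knob being discussed is just, with made-up
names:

  ALTER TABLE big_table ALTER COLUMN skewed_col SET STATISTICS 100;
  ANALYZE big_table;

so the extra analyze cost is only paid for the columns that need it.)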

Cheers,
Csaba.







Re: [PERFORM] sequence query performance issues

2007-09-28 Thread Richard Huxton

Peter Koczan wrote:

> Hello,
>
> I have a weird performance issue with a query I'm testing. Basically,
> I'm trying to port a function that generates user uids, and since
> postgres offers a sequence generator function, I figured I'd take
> advantage of that. Basically, I generate our uid range, filter out
> those which are in use, and randomly pick however many I need.
> However, when I run it, it takes forever (10 minutes and I get nothing,
> so I cancelled the query) and CPU usage on the server is maxed out.


I'd suspect either an unconstrained join or looping through seq-scans.


> Here's my query (I'll post the explain output later so as not to
> obscure my question):
> => select a.uid from generate_series(1000, 32767) as a(uid) where
> a.uid not in (select uid from people) order by random() limit 1;


I let this run to its conclusion and it's the Materialize. As you can see,
it's materializing the result set once for every value it tests against
(loops=31768).


                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=62722.66..62722.67 rows=1 width=4) (actual time=189963.485..189963.485 rows=0 loops=1)
   ->  Sort  (cost=62722.66..62723.91 rows=500 width=4) (actual time=189961.063..189961.063 rows=0 loops=1)
         Sort Key: random()
         ->  Function Scan on generate_series a  (cost=184.00..62700.25 rows=500 width=4) (actual time=189960.797..189960.797 rows=0 loops=1)
               Filter: (NOT (subplan))
               SubPlan
                 ->  Materialize  (cost=184.00..284.00 rows=1 width=2) (actual time=0.000..2.406 rows=9372 loops=31768)
                       ->  Seq Scan on people  (cost=0.00..174.00 rows=1 width=2) (actual time=0.055..7.181 rows=1 loops=1)
 Total runtime: 189967.150 ms

Hmm - why is it doing that? It's clearly confused about something.

I suspect the root of the problem is that it doesn't know what 
generate_series() will return. To the planner it's just another 
set-returning function.


This means it's getting (i) the # of rows wrong (rows=500) and also 
doesn't know (ii) there will be no nulls or (iii) what the range of 
values returned will be.


Easy enough to test:

CREATE TEMP TABLE all_uids (uid int2);
INSERT INTO all_uids SELECT generate_series(1000,32767);
ANALYSE all_uids;

EXPLAIN ANALYSE SELECT a.uid
FROM all_uids a
WHERE a.uid NOT IN (SELECT uid FROM people)
ORDER BY random() LIMIT 1;
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1884.14..1884.14 rows=1 width=2) (actual time=39.019..39.019 rows=0 loops=1)
   ->  Sort  (cost=1884.14..1923.85 rows=15884 width=2) (actual time=39.014..39.014 rows=0 loops=1)
         Sort Key: random()
         ->  Seq Scan on all_uids a  (cost=199.00..775.81 rows=15884 width=2) (actual time=38.959..38.959 rows=0 loops=1)
               Filter: (NOT (hashed subplan))
               SubPlan
                 ->  Seq Scan on people  (cost=0.00..174.00 rows=1 width=2) (actual time=0.046..7.282 rows=1 loops=1)
 Total runtime: 39.284 ms

That's more sensible.

I'd actually use a table to track unused_uids and have triggers that 
kept everything in step. However, if you didn't want to do that, I'd try 
a left-join.


EXPLAIN ANALYSE
SELECT a.uid
FROM generate_series(1000, 32767) AS a(uid)
  LEFT JOIN people p ON a.uid = p.uid
WHERE p.uid IS NULL
ORDER BY random() LIMIT 1;

Not ideal, but like I say I'd use an unused_uids table. If nothing else, 
I'd be wary about immediately re-using a uid - your db+application might 
cope fine, but these values have a tendency to be referred to elsewhere.
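
Roughly what I have in mind (an untested sketch -- it assumes uid is int2
and that plpgsql is installed; add UPDATE handling if uids ever change):

CREATE TABLE unused_uids (uid int2 PRIMARY KEY);

INSERT INTO unused_uids
SELECT g.uid::int2
FROM generate_series(1000, 32767) AS g(uid)
WHERE g.uid NOT IN (SELECT uid::int4 FROM people);

CREATE FUNCTION track_uids() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        -- uid has just been taken, so it is no longer free
        DELETE FROM unused_uids WHERE uid = NEW.uid;
    ELSIF TG_OP = 'DELETE' THEN
        -- uid has been released, put it back in the pool
        INSERT INTO unused_uids VALUES (OLD.uid);
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER people_uid_tracker
AFTER INSERT OR DELETE ON people
FOR EACH ROW EXECUTE PROCEDURE track_uids();

Picking a free uid then stays cheap no matter how full the range gets:

SELECT uid FROM unused_uids ORDER BY random() LIMIT 1;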


HTH
--
  Richard Huxton
  Archonet Ltd



Re: [PERFORM] sequence query performance issues

2007-09-28 Thread Richard Huxton

Tom Lane wrote:

> Richard Huxton [EMAIL PROTECTED] writes:
>> Hmm - why is it doing that?
>
> I'm betting that the OP's people.uid column is not an integer.  Existing
> PG releases can't use hashed subplans for cross-data-type comparisons
> (8.3 will be a bit smarter).


Looked like an int2 to me (width=2, max value ~ 32k)

--
  Richard Huxton
  Archonet Ltd



Re: [PERFORM] sequence query performance issues

2007-09-28 Thread Peter Koczan
>> Hmm - why is it doing that?
>
> I'm betting that the OP's people.uid column is not an integer.  Existing
> PG releases can't use hashed subplans for cross-data-type comparisons
> (8.3 will be a bit smarter).

*light bulb* Ahhh, that's it. So, I guess the solution is either
to cast the column or wait for 8.3 (which isn't a problem since the
port won't be done until 8.3 is released anyway).
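
For the record, the cast boils down to something like this (just a sketch,
assuming people.uid really is int2 as Richard guessed):

select a.uid from generate_series(1000, 32767) as a(uid)
where a.uid not in (select uid::int4 from people)
order by random() limit 1;

With both sides of the NOT IN the same type, the subplan can be hashed again.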

Thanks again.

Peter



Re: [PERFORM] Searching for the cause of a bad plan

2007-09-28 Thread Csaba Nagy
> Just an idea, but with the 8.3 concurrent scan support would it be
> possible to hang a more in-depth analyze over existing sequential
> scans. Then it would be a lower cost to have higher resolution in
> the statistics because the I/O component would be hidden.

The biggest problem with that is that it wouldn't be deterministic...
the table in question from my original post is never scanned
sequentially in normal operation. The other way around is also possible:
when sequential scans are too frequent, you wouldn't want to analyze
all the time either. So there would need to be logic for when to analyze
along with a sequential scan and when to do it proactively without
waiting for one... and I'm not sure it would be worth the complexity.

I think it would be much more productive if some long-running-query
tracking, combined with a background planner thread, did targeted
analyzes for specific correlations/distributions/conditions based on
what queries are actually running on the system.

Cheers,
Csaba.





[PERFORM] OOM Errors as a result of table inheritance and a bad plan(?)

2007-09-28 Thread Arctic Toucan
In keeping with some of the recent threads regarding the planner...
 
I have a fair-sized data warehouse in which I am trying to perform an
aggregation, but I am getting OOM errors in Postgres (8.2.4).
I believe the reason for the OOM is that Postgres is attempting to do a hash
aggregation, but it has grossly underestimated the number of rows resulting
from the aggregation.
The data in the database is very uniformly distributed, so I don't believe
the table stats are the cause of the problem.
This may be related to table inheritance, and can be demonstrated pretty easily.
 
CREATE TABLE foo(a INT);
ANALYZE foo;

CREATE TABLE foo_1() INHERITS(foo);
insert into foo_1 select generate_series(1,100000);
insert into foo_1 select generate_series(1,100000);
insert into foo_1 select generate_series(1,100000);
ANALYZE foo_1;

CREATE TABLE foo_2() INHERITS(foo);
insert into foo_2 select generate_series(1,100000);
insert into foo_2 select generate_series(1,100000);
insert into foo_2 select generate_series(1,100000);
ANALYZE foo_2;
-- If I query a particular partition, the plan estimate for the hash
-- aggregate is good
EXPLAIN ANALYZE SELECT a,COUNT(*) from foo_1 group by a;

 HashAggregate  (cost=5822.00..7061.01 rows=99121 width=4) (actual time=554.556..657.121 rows=100000 loops=1)
   ->  Seq Scan on foo_1  (cost=0.00..4322.00 rows=300000 width=4) (actual time=0.014..203.290 rows=300000 loops=1)
 Total runtime: 712.211 ms
-- If I query the base table, the plan estimate for the hash aggregate is
-- off by several orders of magnitude
EXPLAIN ANALYZE SELECT a,COUNT(*) from foo group by a;

 HashAggregate  (cost=11686.10..11688.60 rows=200 width=4) (actual time=1724.188..1826.630 rows=100000 loops=1)
   ->  Append  (cost=0.00..8675.40 rows=602140 width=4) (actual time=0.016..1045.134 rows=600000 loops=1)
         ->  Seq Scan on foo  (cost=0.00..31.40 rows=2140 width=4) (actual time=0.001..0.001 rows=0 loops=1)
         ->  Seq Scan on foo_1 foo  (cost=0.00..4322.00 rows=300000 width=4) (actual time=0.012..205.130 rows=300000 loops=1)
         ->  Seq Scan on foo_2 foo  (cost=0.00..4322.00 rows=300000 width=4) (actual time=0.011..203.542 rows=300000 loops=1)
 Total runtime: 1879.550 ms
(6 rows)
-- Is there something magical about the hash aggregate estimate of 200 rows?
-- I can have 30,000 or 300,000 rows in each child partition table and multiple
-- partitions with different values of a, and yet it always comes up with 200.
-- e.g.
create table foo_3() inherits(foo);
insert into foo_3 select generate_series(100000,300000);
analyze foo_3;
EXPLAIN ANALYZE SELECT a,COUNT(*) from foo group by a;
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=15568.12..15570.62 rows=200 width=4) (actual time=2386.813..2691.254 rows=300000 loops=1)
   ->  Append  (cost=0.00..11557.41 rows=802141 width=4) (actual time=0.016..1403.121 rows=800001 loops=1)
         ->  Seq Scan on foo  (cost=0.00..31.40 rows=2140 width=4) (actual time=0.002..0.002 rows=0 loops=1)
         ->  Seq Scan on foo_1 foo  (cost=0.00..4322.00 rows=300000 width=4) (actual time=0.013..201.549 rows=300000 loops=1)
         ->  Seq Scan on foo_2 foo  (cost=0.00..4322.00 rows=300000 width=4) (actual time=0.010..211.332 rows=300000 loops=1)
         ->  Seq Scan on foo_3 foo  (cost=0.00..2882.01 rows=200001 width=4) (actual time=0.011..137.262 rows=200001 loops=1)
 Total runtime: 2851.990 ms
Is this a bug, or some subtlety of the Postgres query planner?
 
In my particular case, I am doing a join out to another small table as part of 
the aggregation and using constraint exclusion on the partitions, but I believe 
the cause of my problem is the same.
 
I am running on 64bit FreeBSD/Postgres 8.2.4 on a machine with 8GB of memory.
 
An explain of the query and resulting OOM diagnostics follow:
 
The aggregation will result in 5,000,000 rows, not 5,000. In the stats_dtl
table there are 12 observations (hourly) for each customer. There are 125
different policy_ids and 25 different policy_group_ids. Policies and
policy_groups are evenly distributed across all customers.
 
 userquery Scan table_4760  (cost=2897243.60..2897418.60 rows=5000 width=152)
   ->  HashAggregate  (cost=2897243.60..2897368.60 rows=5000 width=40)
         ->  Hash Join  (cost=7.81..2241002.00 rows=37499520 width=40)
               Hash Cond: (public.customer_stats_dtl.policy_id = policy.policy_id)
               ->  Append  (cost=0.00..1641001.87 rows=5232 width=40)
                     ->  Seq Scan on customer_stats_dtl  (cost=0.00..22.45 rows=4 width=40)
                           Filter: ((period_start >= '2007-09-08 20:00:00-04'::timestamp with time zone) AND (period_start < '2007-09-09 08:00:00-04'::timestamp with time zone))
                     ->  Seq Scan on customer_stats_dtl_027 customer_stats_dtl  (cost=0.00..1640979.42 rows=5228 width=40)

[PERFORM] Non-blocking vacuum full

2007-09-28 Thread Peter Schuller
Hello,

I was wondering whether any thought has previously been given to
having a non-blocking vacuum full, in the sense of space reclamation
and table compaction.

The motivation is that it is useful to be able to assume that
operations that span a table will *roughly* scale linearly with the
size of the table. But when you have a table that over an extended
period of time starts small, grows large, and then shrinks again (where
large might be, say, 200 GB), that assumption is most definitely
not correct when you're on the downward slope of that graph. Having
this assumption remain true simplifies things a lot for certain
workloads (= my particular workload ;)).

I have only looked very, very briefly at the PG code so I don't know
how far-fetched it is, but my thought was that it should be possible
to have a slow background process (similar to normal non-full vacuums
now) that would, instead of registering dead tuples in the FSM, move
live tuples around.

Combine that slow-moving operation with a new tuple space allocation
policy that prefers earlier locations on-disk, and it should in
time result in a situation where the physical on-disk file contains
only dead tuples past a certain percentage location. At that point
the file can be truncated, giving space back to the OS as well as
eliminating all that dead space from having to be covered by
sequential scans on the table.

This does of course increase the total cost of all updates and
deletes, but it would be very useful in some scenarios. It also has the
interesting property that the scan for live tuples to move need not
touch the entire table to be effective; it could by design be applied
only to the last n percent of the table, where n would be scaled
appropriately with the frequency of the checks relative to the
update/insert frequency.

Other benefits:

  * Never vacuum full - EVER. Not even after discovering a too-small
max_fsm_pages or too-infrequent vacuums and needing to retroactively
shrink the table.
  * Increased locality in general, even if one does not care about
the disk space or sequential scanning. Particularly relevant for
low-update-frequency tables suffering from sudden shrinkage, where a
blocking VACUUM FULL is not acceptable.
  * Non-blocking CLUSTER is perhaps suddenly more trivial to implement?
Or at least SORTOFCLUSTER when you want it for reasons other than
perfect order (mostly sorted).

Opinions/thoughts?

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller [EMAIL PROTECTED]'
Key retrieval: Send an E-Mail to [EMAIL PROTECTED]
E-Mail: [EMAIL PROTECTED] Web: http://www.scode.org





Re: [PERFORM] Non-blocking vacuum full

2007-09-28 Thread Heikki Linnakangas
Peter Schuller wrote:
> I have only looked very, very briefly at the PG code so I don't know
> how far-fetched it is, but my thought was that it should be possible
> to have a slow background process (similar to normal non-full vacuums
> now) that would, instead of registering dead tuples in the FSM, move
> live tuples around.

What you've described is actually very close to VACUUM FULL. VACUUM FULL
needs to take an exclusive lock to lock out concurrent scanners that
might miss or see a tuple twice, when a live tuple is moved. That's the
fundamental problem you need to solve.

I think it's doable, if you take a copy of the tuple, and set the ctid
pointer on the old one like an UPDATE, and wait until the old tuple is
no longer visible to anyone before removing it. It does require some
changes to tuple visibility code. For example, a transaction running in
serializable mode shouldn't throw a serialization error when it tries to
update an old, moved row version, but follow the ctid pointer instead.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PERFORM] OOM Errors as a result of table inheritance and a bad plan(?)

2007-09-28 Thread Tom Lane
Arctic Toucan [EMAIL PROTECTED] writes:
> -- Is there something magical about the hash aggregate estimate of 200 rows?

Yeah, it's the default :-(

> Is this a bug, or some subtlety of the Postgres query planner?

It's an, um, known deficiency --- the planner hasn't got any idea how to
construct aggregated statistics for an inheritance tree.

regards, tom lane
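
A common stopgap, until the estimates can be trusted, is to force a
sort-based aggregate for just the offending query so it spills to disk
instead of building an oversized hash table -- roughly:

  SET enable_hashagg TO off;
  SELECT a, COUNT(*) FROM foo GROUP BY a;
  RESET enable_hashagg;

GroupAggregate is slower when the estimate would have been right, but it
won't run the backend out of memory.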



Re: [PERFORM] Non-blocking vacuum full

2007-09-28 Thread Ron Mayer
Heikki Linnakangas wrote:
> Peter Schuller wrote:
>> to have a slow background process (similar to normal non-full vacuums
> ...
> I think it's doable, if you take a copy of the tuple, and set the ctid
> pointer on the old one like an UPDATE, and wait until the old tuple is
> no longer visible to anyone before removing it. It does require some
> changes to tuple visibility code.

Wouldn't just having this slow background process repeatedly alternate
between
   update table set anycol=anycol where ctid > [some ctid near the end]
and running normal VACUUM statements do what the original poster
was asking?  And with 8.3, I guess also avoiding HOT?


