[PERFORM] Parsing VACUUM VERBOSE

2007-06-14 Thread Sabin Coanda
Hi there,

I'd like to completely understand the report generated by VACUUM VERBOSE.
Please tell me where it is documented?

TIA,
Sabin 





Re: [PERFORM] Best use of second controller with faster disks?

2007-06-14 Thread Vivek Khera


On Jun 13, 2007, at 10:36 PM, Francisco Reyes wrote:

>> FreeBSD, indeed.  The vendor, Partners Data Systems, did a wonderful
>
> This one?
> http://www.partnersdata.com

that's the one.

>> job ensuring that everything integrated well to the point of talking
>> with various FreeBSD developers, LSI engineers, etc., and sent me a
>> fully tested system end-to-end with a Sun X4100 M2, LSI 4Gb Fibre
>> card, and their RAID array, with FreeBSD installed already.
>
> Is there a management program in FreeBSD for the Areca card?
> So I understand the setup you are describing..
> Machine has Areca controller
> Connects to external enclosure
> Enclosure has LSI controller

In the past I've had systems with RAID cards: LSI and Adaptec.  The
LSI 320-2X is the fastest one I've ever had.  The Adaptec ones suck
because there is no management software for them on the newer cards
for FreeBSD, especially under amd64.

The system I'm working on now is thus:

Sun X4100 M2 with an LSI 4Gb fibre channel card connected to an
external self-contained RAID enclosure, the Triton RAID from Partners
Data.  The Triton unit has in it an Areca SATA RAID controller and 16
disks.

I have separate disks built-in to the system for boot.

> How did you get FreeBSD to newfs such a large setup?
> newfs -s /dev/raw-disk?

It is only 2Tb raw, 1.7Tb formatted :-)  I just used sysinstall to
run fdisk, label, and newfs for me.  Since it is just postgres data,
and no file will ever be larger than 1Gb, I didn't need to make any
adjustments to the newfs parameters.

> What are the speed/size of the disks?
> 7K rpm?

I splurged for the 10kRPM drives, even though they are smaller, 150Gb
each.





Re: [PERFORM] Parsing VACUUM VERBOSE

2007-06-14 Thread Guillaume Smet

Sabin,

On 6/14/07, Sabin Coanda <[EMAIL PROTECTED]> wrote:

> I'd like to completely understand the report generated by VACUUM VERBOSE.
> Please tell me where it is documented?


You can take a look at what I did for pgFouine:
http://pgfouine.projects.postgresql.org/vacuum.html
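
For orientation, this is the general shape of the output being discussed (an
abbreviated, invented example; the table name and all figures are made up):

  VACUUM VERBOSE my_table;

  INFO:  vacuuming "public.my_table"
  INFO:  index "my_table_pkey" now contains 100000 row versions in 421 pages
  INFO:  "my_table": removed 5000 row versions in 278 pages
  INFO:  "my_table": found 5000 removable, 100000 nonremovable row versions
         in 834 pages
  DETAIL:  0 dead row versions cannot be removed yet.
  VACUUM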

--
Guillaume



Re: [PERFORM] Parsing VACUUM VERBOSE

2007-06-14 Thread Dimitri Fontaine
On Thursday 14 June 2007, Sabin Coanda wrote:
> I'd like to completely understand the report generated by VACUUM VERBOSE.
> Please tell me where it is documented?

Try the pgFouine reporting tool:
  http://pgfouine.projects.postgresql.org/
  http://pgfouine.projects.postgresql.org/reports/sample_vacuum.html

It's easier to understand the vacuum verbose output from the generated report.
-- 
dim




Re: [PERFORM] [PG 8.1.0 / AIX 5.3] Vacuum processes freezing

2007-06-14 Thread Simon Riggs
On Wed, 2007-06-13 at 18:33 +0200, RESTOUX, Loïc wrote:

> 2) I believed that the poor performance during the vacuum freeze was due to
> the obsolete data statistics. But after a full restart of the database,
> performance is good. Does PostgreSQL rebuild its statistics during startup?

You probably don't need to run VACUUM FREEZE.

VACUUM FREEZE will thrash the disks much more than normal VACUUM. We're
improving that somewhat in 8.3, but the basic issue is that VACUUM
FREEZE cleans out more dead rows and so will dirty more data blocks.

Are you concurrently running DDL, Truncate or CLUSTER? That will
interfere with the operation of VACUUM.
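
A quick way to check for that kind of interference is to look for ungranted
locks while the VACUUM is running; a minimal sketch:

  SELECT locktype, relation::regclass AS relation, pid, mode, granted
  FROM pg_locks
  WHERE NOT granted;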

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [PERFORM] Parsing VACUUM VERBOSE

2007-06-14 Thread Sabin Coanda
Hi Guillaume,

Very interesting!

Thank you very much,
Sabin 





Re: [PERFORM] Parsing VACUUM VERBOSE

2007-06-14 Thread Y Sidhu

On 6/14/07, Dimitri Fontaine <[EMAIL PROTECTED]> wrote:

> On Thursday 14 June 2007, Sabin Coanda wrote:
>> I'd like to completely understand the report generated by VACUUM VERBOSE.
>> Please tell me where it is documented?
>
> Try the pgFouine reporting tool:
>   http://pgfouine.projects.postgresql.org/
>   http://pgfouine.projects.postgresql.org/reports/sample_vacuum.html
>
> It's easier to understand the vacuum verbose output from the generated
> report.
> --
> dim



Can anyone share what value they have set log_min_duration_statement to?

--
Yudhvir Singh Sidhu
408 375 3134 cell


Re: [PERFORM] Parsing VACUUM VERBOSE

2007-06-14 Thread Guillaume Smet

On 6/14/07, Y Sidhu <[EMAIL PROTECTED]> wrote:

> Can anyone share what value they have set log_min_duration_statement to?

It's OT, but we use different values for different databases and needs.

On a very loaded database with a lot of complex queries (lots of joins
on big tables, proximity queries, full text queries), we use 100 ms.
It logs ~300,000 queries. It allows us to detect big regressions or
new queries which are very slow.

On another database where I want to track transaction leaks, I'm
forced to set it to 0 ms.

Basically, the answer is: set it to the lowest value you can afford
without impacting your performance too much (and if you use syslog,
use async I/O or send your log over the network).
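
To experiment with a threshold before putting it in postgresql.conf, something
like this works (a sketch; 100 ms is just the example value above, and
changing this setting requires superuser rights):

  -- log any statement in this session that runs longer than 100 ms;
  -- 0 logs every statement, -1 disables duration-based logging
  SET log_min_duration_statement = 100;
  SHOW log_min_duration_statement;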

--
Guillaume



Re: [PERFORM] Parsing VACUUM VERBOSE

2007-06-14 Thread Y Sidhu

On 6/14/07, Guillaume Smet <[EMAIL PROTECTED]> wrote:

> On 6/14/07, Y Sidhu <[EMAIL PROTECTED]> wrote:
>> Can anyone share what value they have set log_min_duration_statement to?
>
> It's OT, but we use different values for different databases and needs.
>
> On a very loaded database with a lot of complex queries (lots of joins
> on big tables, proximity queries, full text queries), we use 100 ms.
> It logs ~300,000 queries. It allows us to detect big regressions or
> new queries which are very slow.
>
> On another database where I want to track transaction leaks, I'm
> forced to set it to 0 ms.
>
> Basically, the answer is: set it to the lowest value you can afford
> without impacting your performance too much (and if you use syslog,
> use async I/O or send your log over the network).
>
> --
> Guillaume



I am trying to answer the question of how to tell if the cleanup of an index
may be locked by a long transaction, and, in the bigger context, why vacuums
are taking so long and what triggers them. I came across the following query,
which shows one table, 'connect_tbl', with high "heap hits" and a low "heap
buffer %". Now, 'heap' seems to be a memory construct. Any light shedding is
appreciated.

SELECT
    'HEAP:'||relname AS table_name,
    (heap_blks_read+heap_blks_hit) AS heap_hits,
    ROUND(((heap_blks_hit)::NUMERIC/(heap_blks_read+heap_blks_hit)*100), 2)
      AS heap_buffer_percentage
FROM pg_statio_user_tables
WHERE (heap_blks_read+heap_blks_hit) > 0
UNION
SELECT
    'TOAST:'||relname,
    (toast_blks_read+toast_blks_hit),
    ROUND(((toast_blks_hit)::NUMERIC/(toast_blks_read+toast_blks_hit)*100), 2)
FROM pg_statio_user_tables
WHERE (toast_blks_read+toast_blks_hit) > 0
UNION
SELECT
    'INDEX:'||relname,
    (idx_blks_read+idx_blks_hit),
    ROUND(((idx_blks_hit)::NUMERIC/(idx_blks_read+idx_blks_hit)*100), 2)
FROM pg_statio_user_tables
WHERE (idx_blks_read+idx_blks_hit) > 0;

     table_name     | heap_hits | heap_buffer_percentage
--------------------+-----------+------------------------
 HEAP:connect_tbl   |    890878 |                  43.18
 HEAP:tblbound_tbl  |     43123 |                  13.80
 HEAP:tblcruel_tbl  |    225819 |                   6.98
 INDEX:connect_tbl  |    287224 |                  79.82
 INDEX:tblbound_tbl |     81640 |                  90.28
 INDEX:tblcruel_tbl |    253014 |                  50.73
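
As for spotting long-running transactions that might hold up index cleanup,
one place to start is pg_stat_activity; a sketch (column names as of 8.1, and
stats_command_string must be on for current_query to be populated):

  SELECT procpid, usename, query_start, current_query
  FROM pg_stat_activity
  ORDER BY query_start;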

--
Yudhvir Singh Sidhu
408 375 3134 cell


[PERFORM] Replication

2007-06-14 Thread Craig James

Looking for replication solutions, I find:

Slony-I
Seems good, single master only, master is a single point of failure,
no good failover system for electing a new master or having a failed
master rejoin the cluster.  Slave databases are mostly for safety or
for parallelizing queries for performance.  Suffers from O(N^2) 
communications (N = cluster size).


Slony-II
Seems brilliant, a solid theoretical foundation, at the forefront of
computer science.  But can't find project status -- when will it be
available?  Is it a pipe dream, or a nearly-ready reality?

PGReplication
Appears to be a page that someone forgot to erase from the old GBorg site.

PGCluster
Seems pretty good, but the web site is not current, there are releases in use
that are not on the web site, and it also seems to always be a couple of steps
behind the current release of Postgres.  Two single points of failure: the
load balancer and the data replicator.

Is this a good summary of the status of replication?  Have I missed any 
important solutions or mischaracterized anything?

Thanks!
Craig




Re: [PERFORM] Replication

2007-06-14 Thread Joshua D. Drake

Craig James wrote:

> Looking for replication solutions, I find:
>
> Slony-I
> Seems good, single master only, master is a single point of failure,
> no good failover system for electing a new master or having a failed
> master rejoin the cluster.  Slave databases are mostly for safety or
> for parallelizing queries for performance.  Suffers from O(N^2)
> communications (N = cluster size).


Yep



> Slony-II
> Seems brilliant, a solid theoretical foundation, at the forefront of
> computer science.  But can't find project status -- when will it be
> available?  Is it a pipe dream, or a nearly-ready reality?



Dead



> PGReplication
> Appears to be a page that someone forgot to erase from the old GBorg site.



Dead



> PGCluster
> Seems pretty good, but the web site is not current, there are releases in use
> that are not on the web site, and it also seems to always be a couple of steps
> behind the current release of Postgres.  Two single points of failure: the
> load balancer and the data replicator.



Slow as all get out for writes but cool idea

> Is this a good summary of the status of replication?  Have I missed any
> important solutions or mischaracterized anything?




log shipping, closed source solutions



> Thanks!
> Craig




Re: [PERFORM] Replication

2007-06-14 Thread Ben

Which replication problem are you trying to solve?

On Thu, 14 Jun 2007, Craig James wrote:


> Looking for replication solutions, I find:
>
> Slony-I
> Seems good, single master only, master is a single point of failure,
> no good failover system for electing a new master or having a failed
> master rejoin the cluster.  Slave databases are mostly for safety or
> for parallelizing queries for performance.  Suffers from O(N^2)
> communications (N = cluster size).
>
> Slony-II
> Seems brilliant, a solid theoretical foundation, at the forefront of
> computer science.  But can't find project status -- when will it be
> available?  Is it a pipe dream, or a nearly-ready reality?
>
> PGReplication
> Appears to be a page that someone forgot to erase from the old GBorg site.
>
> PGCluster
> Seems pretty good, but the web site is not current, there are releases in use
> that are not on the web site, and it also seems to always be a couple of steps
> behind the current release of Postgres.  Two single points of failure: the
> load balancer and the data replicator.
>
> Is this a good summary of the status of replication?  Have I missed any
> important solutions or mischaracterized anything?
>
> Thanks!
> Craig




Re: [PERFORM] Replication

2007-06-14 Thread Alexander Staubo

On 6/15/07, Craig James <[EMAIL PROTECTED]> wrote:
[snip]

> Is this a good summary of the status of replication?  Have I missed any
> important solutions or mischaracterized anything?

* Mammoth Replicator, commercial.

* Continuent uni/cluster, commercial
(http://www.continuent.com/index.php?option=com_content&task=view&id=212&Itemid=169).

* pgpool-II. Supports load-balancing and replication by implementing a
proxy that duplicates all updates to all slaves. It can partition data
by doing this, and it can semi-intelligently route queries to the
appropriate servers.

* Cybertec. This is a commercial packaging of PGCluster-II from an
Austrian company.

* Greenplum Database (formerly Bizgres MPP), commercial. Not so much a
replication solution as a way to parallelize queries, and targeted at
the data warehousing crowd. Similar to ExtenDB, but tightly integrated
with PostgreSQL.

* DRBD (http://www.drbd.org/), a device driver that replicates disk
blocks to other nodes. This works for failover only, not for scaling
reads. Easy migration of devices if combined with an NFS export.

* Skytools (https://developer.skype.com/SkypeGarage/DbProjects/SkyTools),
a collection of replication tools from the Skype people. Purports to
be simpler to use than Slony.

Lastly, and perhaps most promisingly, there's the Google Summer of
Code effort by Florian Pflug
(http://code.google.com/soc/postgres/appinfo.html?csaid=6545828A8197EBC6)
to implement true log-based replication, where PostgreSQL's
transaction logs are used to keep live slave servers up to date with a
master. In theory, such a system would be extremely simple to set up
and use, especially since it should, as far as I can see, also
transparently replicate the schema for you.

Alexander.



Re: [PERFORM] Replication

2007-06-14 Thread Kevin Grittner
>>> On Thu, Jun 14, 2007 at  6:14 PM, in message <[EMAIL PROTECTED]>,
Craig James <[EMAIL PROTECTED]> wrote: 
> Looking for replication solutions, I find:
> 
> Slony-I
> Slony-II
> PGReplication
> PGCluster
 
You wouldn't guess it from the name, but pgpool actually supports replication:
 
http://pgpool.projects.postgresql.org/
 





Re: [PERFORM] Replication

2007-06-14 Thread Craig James

Thanks to all who replied and filled in the blanks.  The problem with the web 
is you never know if you've missed something.

Joshua D. Drake wrote:

>> Looking for replication solutions, I find...
>> Slony-II
>
> Dead


Wow, I'm surprised.  Is it dead for lack of need, lack of resources, too 
complex, or all of the above?  It sounded like such a promising theoretical 
foundation.

Ben wrote:

> Which replication problem are you trying to solve?


Most of our data is replicated offline using custom tools tailored to our
loading pattern, but we have a small amount of "global" information, such as
user signups, system configuration, advertisements, and such, that goes into a
single small (~5-10 MB) "global database" used by all servers.

We need "nearly-real-time replication" and instant failover.  That is, it's
far more important for the system to keep working than to avoid losing a
little data.  Transactional integrity is not important.  Actual hardware
failures are rare, and if a user just happens to sign up, or do "save
preferences", at the instant the global-database server goes down, it's not a
tragedy.  But it's not OK for the entire web site to go down when the one
global-database server fails.

Slony-I can keep several slave databases up to date, which is nice.  And I
think I can combine it with a PGPool instance on each server, with the master
as primary and a few Slony copies as secondary.  That way, if the master goes
down, the PGPool servers all switch to their secondary Slony slaves, and
read-only access can continue.  If the master crashes, users will be able to do 
most activities, but new users can't sign up, and existing users can't change 
their preferences, until either the master server comes back, or one of the 
slaves is promoted to master.

The problem is, there don't seem to be any "vote a new master" type of tools 
for Slony-I, and also, if the original master comes back online, it has no way to know 
that a new master has been elected.  So I'd have to write a bunch of SOAP services or 
something to do all of this.

I would consider PGCluster, but it seems to be a patch to Postgres itself.  I'm 
reluctant to introduce such a major piece of technology into our entire system, 
when only one tiny part of it needs the replication service.

Thanks,
Craig





Re: [PERFORM] Best way to delete unreferenced rows?

2007-06-14 Thread Ed Tyrrill
Craig James wrote:
> Tyrrill, Ed wrote:
>>                                QUERY PLAN
>> -------------------------------------------------------------------------
>>  Merge Left Join  (cost=38725295.93..42505394.70 rows=13799645 width=8)
>>    (actual time=6503583.342..8220629.311 rows=93524 loops=1)
>>    Merge Cond: ("outer".record_id = "inner".record_id)
>>    Filter: ("inner".record_id IS NULL)
>>    ->  Index Scan using backupobjects_pkey on backupobjects
>>        (cost=0.00..521525.10 rows=13799645 width=8) (actual
>>        time=15.955..357813.621 rows=13799645 loops=1)
>>    ->  Sort  (cost=38725295.93..39262641.69 rows=214938304 width=8)
>>        (actual time=6503265.293..7713657.750 rows=214938308 loops=1)
>>          Sort Key: backup_location.record_id
>>          ->  Seq Scan on backup_location  (cost=0.00..3311212.04
>>              rows=214938304 width=8) (actual time=11.175..1881179.825
>>              rows=214938308 loops=1)
>>  Total runtime: 8229178.269 ms
>> (8 rows)
>>
>> I ran vacuum analyze after the last time any inserts, deletes, or
>> updates were done, and before I ran the query above.  I've attached my
>> postgresql.conf.  The machine has 4 GB of RAM.
>
> I thought maybe someone with more expertise than me might answer this,
> but since they haven't I'll just make a comment.  It looks to me like
> the sort of 214 million rows is what's killing you.  I suppose you
> could try to increase the sort memory, but that's a lot of memory.  It
> seems to me an index merge of a relation this large would be faster,
> but that's a topic for the experts.
>
> On a theoretical level, the problem is that it's sorting the largest
> table.  Perhaps you could re-cast the query so that it only has to
> sort the smaller table, something like
>
>    select a.id from a where a.id not in (select distinct b.id from b)
>
> where "b" is the smaller table.  There's still no guarantee that it
> won't do a sort on "a", though.  In fact one of the clever things
> about Postgres is that it can convert a query like the one above into
> a regular join, unless you do something like "select ... offset 0"
> which blocks the optimizer from doing the rearrangement.
>
> But I think the first approach is to try to tune for a better plan
> using your original query.
>
> Craig

Thanks for the input Craig.  I actually started out with a query similar
to what you suggest, but the performance was days to complete back when
the larger table, backup_location, was still under 100 million rows.
The current query is the best performer to date.  I have been playing
around with work_mem, and doubling it to 128MB did result in some
improvement, but doubling it again to 256MB showed no further gain.
Here is the explain analyze with work_mem increased to 128MB:

mdsdb=# explain analyze select backupobjects.record_id from
backupobjects left outer join backup_location using (record_id) where
backup_location.record_id is null;

                                  QUERY PLAN
------------------------------------------------------------------------------
 Merge Left Join  (cost=36876242.28..40658535.53 rows=13712990 width=8)
   (actual time=5795768.950..5795768.950 rows=0 loops=1)
   Merge Cond: ("outer".record_id = "inner".record_id)
   Filter: ("inner".record_id IS NULL)
   ->  Index Scan using backupobjects_pkey on backupobjects
       (cost=0.00..520571.89 rows=13712990 width=8) (actual
       time=2.490..201516.228 rows=13706121 loops=1)
   ->  Sort  (cost=36876242.28..37414148.76 rows=215162592 width=8)
       (actual time=4904205.255..5440137.309 rows=215162559 loops=1)
         Sort Key: backup_location.record_id
         ->  Seq Scan on backup_location  (cost=0.00..3314666.92
             rows=215162592 width=8) (actual time=4.186..1262641.774
             rows=215162559 loops=1)
 Total runtime: 5796322.535 ms
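
For what it's worth, the same anti-join can also be written with NOT EXISTS,
which the planner may treat differently; a sketch using the same tables:

  SELECT bo.record_id
  FROM backupobjects bo
  WHERE NOT EXISTS (
    SELECT 1 FROM backup_location bl
    WHERE bl.record_id = bo.record_id
  );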





Re: [PERFORM] How much ram is too much

2007-06-14 Thread Dave Cramer

Hi Andrew
On 11-Jun-07, at 11:34 AM, Andrew Sullivan wrote:

> On Mon, Jun 11, 2007 at 11:09:42AM -0400, Dave Cramer wrote:
>> and set them to anything remotely close to 128GB.
>>
>> Well, we'd give 25% of it to postgres, and the rest to the OS.
>
> Are you quite sure that PostgreSQL's management of the buffers is
> efficient with such a large one?

No, I'm not sure of this.

> In the past, that wasn't the case for relatively small buffers; with
> the replacement of single-pass LRU, that has certainly changed, but
> I'd be surprised if anyone tested a buffer as large as 32G.

So does anyone have experience above 32G?

Dave

> A
>
> --
> Andrew Sullivan  | [EMAIL PROTECTED]
>
> The whole tendency of modern prose is away from concreteness.
> --George Orwell






[PERFORM] Variable (degrading) perfomance

2007-06-14 Thread Vladimir Stankovic

Hi all,

It seems that I have an issue with the performance of a PostgreSQL server.

I'm running write-intensive, TPC-C-like tests. The workload consists of
150 to 200 thousand transactions. The performance varies dramatically, 
between 5 and more than 9 hours (I don't have the exact figure for the 
longest experiment). Initially the server is relatively fast. It 
finishes the first batch of 50k transactions in an hour. This is 
probably due to the fact that the database is RAM-resident during this 
interval. As soon as the database grows bigger than the RAM the 
performance, not surprisingly, degrades, because of the slow disks.
My problem is that the performance is rather variable and, to me,
non-deterministic. A 150k test can finish in approx. 3h30min, but
conversely it can take more than 5h to complete.
Preferably I would like to see *steady-state* performance (where my
interpretation of steady state is that the average
throughput/response time does not change over time). Is steady state
achievable despite MVCC and the inherent non-determinism between
experiments? What could be the reasons for the variable performance?
- misconfiguration of the PG parameters (e.g. autovacuum does not cope
with the dead tuples under the MVCC architecture)

- file fragmentation
- index bloat
- ???
The initial size of the database (actually the output of the 'du -h'
command) is ~400 MB. The size increases dramatically, to somewhere between
600MB and 1.1GB.
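
One way to see whether that growth is bloat rather than new data is to
compare relation sizes after each run using the planner statistics; a rough
sketch (relpages/reltuples are estimates refreshed by VACUUM/ANALYZE):

  SELECT relname, relkind, relpages, reltuples
  FROM pg_class
  WHERE relkind IN ('r', 'i')
  ORDER BY relpages DESC
  LIMIT 10;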


I have doubted the client application at some point too. However, other
server combinations using a different DBMS exhibit steady-state
performance. As a matter of fact, when PG is paired with Firebird, through
statement-based replication middleware, the performance of the pair is
steady too.


The hardware configuration:
Client machine
- 1.5 GHz CPU Pentium 4
- 1GB Rambus RAM
- Seagate st340810a IDE disk (40GB), 5400 rpms

Server machine
- 1.5 GHz CPU Pentium 4
- 640 MB Rambus RAM
- Seagate Barracuda 7200.9 (7200 rpm)
- Seagate st340810a IDE disk (40GB) - the WAL is stored on an ext2 partition

The Software configuration:
The client application is a multi-threaded Java client running on Win 
2000 Pro sp4

The database server version is  8.1.5 running on Fedora Core 6.
Please find attached:
1 - the output of vmstat  taken after  the first  60k transactions were 
executed

2 - the postgresql.conf file

Any help would be appreciated.

Best regards,
Vladimir
--

Vladimir Stankovic  T: +44 20 7040 0273
Research Student/Research Assistant F: +44 20 7040 8585
Centre for Software Reliability E: [EMAIL PROTECTED]
City University 
Northampton Square, London EC1V 0HB 

# -----------------------------
# PostgreSQL configuration file
# -----------------------------
#
# This file consists of lines of the form:
#
#   name = value
#
# (The '=' is optional.) White space may be used. Comments are introduced
# with '#' anywhere on a line. The complete list of option names and
# allowed values can be found in the PostgreSQL documentation. The
# commented-out settings shown in this file represent the default values.
#
# Please note that re-commenting a setting is NOT sufficient to revert it
# to the default value, unless you restart the postmaster.
#
# Any option can also be given as a command line switch to the
# postmaster, e.g. 'postmaster -c log_connections=on'. Some options
# can be changed at run-time with the 'SET' SQL command.
#
# This file is read on postmaster startup and when the postmaster
# receives a SIGHUP. If you edit the file on a running system, you have 
# to SIGHUP the postmaster for the changes to take effect, or use 
# "pg_ctl reload". Some settings, such as listen_addresses, require
# a postmaster shutdown and restart to take effect.


#---------------------------------------------------------------------------
# FILE LOCATIONS
#---------------------------------------------------------------------------

# The default values of these variables are driven from the -D command line
# switch or PGDATA environment variable, represented here as ConfigDir.

#data_directory = 'ConfigDir'   # use data in another directory
#hba_file = 'ConfigDir/pg_hba.conf' # host-based authentication file
#ident_file = 'ConfigDir/pg_ident.conf' # IDENT configuration file

# If external_pid_file is not explicitly set, no extra pid file is written.
#external_pid_file = '(none)'   # write an extra pid file


#---------------------------------------------------------------------------
# CONNECTIONS AND AUTHENTICATION
#---------------------------------------------------------------------------

# - Connection Settings -

#listen_addresses = 'localhost' # what IP address(es) to listen on; 
# comma-separated list of addresses;
# defaults to 'localhost', '*' = all
#port = 5432
#max_connections = 100
max_connections = 150
# not

[PERFORM] Database size

2007-06-14 Thread choksi
Hi all,

I had a database which used to hold some 50 million records, and the disk
space used was 103 GB. I deleted around 34 million records but the disk
size is still the same. Can someone please shed some light on this.

Thanks in advance for all the help.

Dhawal Choksi




Re: [PERFORM] Replication

2007-06-14 Thread Andreas Kostyrka

> Most of our data is replicated offline using custom tools tailored to
> our loading pattern, but we have a small amount of "global" information,
> such as user signups, system configuration, advertisements, and such,
> that go into a single small (~5-10 MB) "global database" used by all
> servers.

Slony provides near instantaneous failovers (in the single digit seconds
 range). You can script an automatic failover if the master server
becomes unreachable. That leaves you the problem of restarting your app
(or making it reconnect) to the new master.

5-10MB data implies such a fast initial replication, that making the
server rejoin the cluster by setting it up from scratch is not an issue.


> The problem is, there don't seem to be any "vote a new master" type of
> tools for Slony-I, and also, if the original master comes back online,
> it has no way to know that a new master has been elected.  So I'd have
> to write a bunch of SOAP services or something to do all of this.

You don't need SOAP services, and you do not need to elect a new master.
if dbX goes down, dbY takes over, you should be able to decide on a
static takeover pattern easily enough.

The point here is, that the servers need to react to a problem, but you
probably want to get the admin on duty to look at the situation as
quickly as possible anyway. With 5-10MB of data in the database, a
complete rejoin from scratch to the cluster is measured in minutes.

Furthermore, you need to check out pgpool; I seem to remember that it has
some bad habits in routing queries. (E.g. it wants to apply write
queries to all nodes, but Slony makes the other nodes read-only.
Furthermore, anything inside a BEGIN is sent to the master node, which
is bad with some ORMs that by default wrap any access in a transaction.)

Andreas



Re: [PERFORM] Replication

2007-06-14 Thread Craig James

Andreas Kostyrka wrote:

> Slony provides near instantaneous failovers (in the single digit seconds
> range). You can script an automatic failover if the master server
> becomes unreachable.


But Slony slaves are read-only, correct?  So the system isn't fully functional 
once the master goes down.


> That leaves you the problem of restarting your app
> (or making it reconnect) to the new master.


Don't you have to run a Slony app to convert one of the slaves into the master?


> 5-10MB data implies such a fast initial replication, that making the
> server rejoin the cluster by setting it up from scratch is not an issue.


The problem is to PREVENT it from rejoining the cluster.  If you have some 
semi-automatic process that detects the dead server and converts a slave to the 
master, and in the mean time the dead server manages to reboot itself (or its 
network gets fixed, or whatever the problem was), then you have two masters 
sending out updates, and you're screwed.


>> The problem is, there don't seem to be any "vote a new master" type of
>> tools for Slony-I, and also, if the original master comes back online,
>> it has no way to know that a new master has been elected.  So I'd have
>> to write a bunch of SOAP services or something to do all of this.
>
> You don't need SOAP services, and you do not need to elect a new master.
> If dbX goes down, dbY takes over; you should be able to decide on a
> static takeover pattern easily enough.


I can't see how that is true.  Any self-healing distributed system needs 
something like the following:

 - A distributed system of nodes that check each other's health
 - A way to detect that a node is down and to transmit that
   information across the nodes
 - An election mechanism that nominates a new master if the
   master fails
 - A way for a node coming online to determine if it is a master
   or a slave

Any solution less than this can cause corruption because you can have two nodes 
that both think they're master, or end up with no master and no process for 
electing a master.  As far as I can tell, Slony doesn't do any of this.  Is 
there a simpler solution?  I've never heard of one.


> The point here is, that the servers need to react to a problem, but you
> probably want to get the admin on duty to look at the situation as
> quickly as possible anyway.


No, our requirement is no administrator interaction.  We need instant, 
automatic recovery from failure so that the system stays online.


> Furthermore, you need to check out pgpool; I seem to remember that it has
> some bad habits in routing queries. (E.g. it wants to apply write
> queries to all nodes, but Slony makes the other nodes read-only.
> Furthermore, anything inside a BEGIN is sent to the master node, which
> is bad with some ORMs that by default wrap any access in a transaction.)


I should have been more clear about this.  I was planning to use PGPool in the 
PGPool-1 mode (not the new PGPool-2 features that allow replication).  So it 
would only be acting as a failover mechanism.  Slony would be used as the 
replication mechanism.

I don't think I can use PGPool as the replicator, because then it becomes a new 
single point of failure that could bring the whole system down.  If you're 
using it for INSERT/UPDATE, then there can only be one PGPool server.

I was thinking I'd put a PGPool server on every machine in failover mode only.  
It would have the Slony master as the primary connection, and a Slony slave as 
the failover connection.  The applications would route all INSERT/UPDATE 
statements directly to the Slony master, and all SELECT statements to the 
PGPool on localhost.  When the master failed, all of the PGPool servers would 
automatically switch to one of the Slony slaves.

This way, the system would keep running on the Slony slaves (so it would be 
read-only), until a sysadmin could get the master Slony back online.  And when 
the master came online, the PGPool servers would automatically reconnect and 
write-access would be restored.

Does this make sense?

Craig



Re: [PERFORM] Replication

2007-06-14 Thread Joshua D. Drake

Craig James wrote:

> Andreas Kostyrka wrote:
>> Slony provides near instantaneous failovers (in the single digit seconds
>> range). You can script an automatic failover if the master server
>> becomes unreachable.
>
> But Slony slaves are read-only, correct?  So the system isn't fully
> functional once the master goes down.


That is what promotion is for.

Joshua D. Drake







--

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/




Re: [PERFORM] Replication

2007-06-14 Thread Eugene Ogurtsov
What about "Daffodil Replicator" (GPL)?
http://sourceforge.net/projects/daffodilreplica/



--
Thanks,

Eugene Ogurtsov
Internal Development Chief Architect
SWsoft, Inc.



Craig A. James wrote:

> Looking for replication solutions, I find:
>
> Slony-I
>  Seems good, single master only, master is a single point of failure,
>  no good failover system for electing a new master or having a failed
>  master rejoin the cluster.  Slave databases are mostly for safety or
>  for parallelizing queries for performance.  Suffers from O(N^2)
>  communications (N = cluster size).
>
> Slony-II
>  Seems brilliant, a solid theoretical foundation, at the forefront of
>  computer science.  But can't find project status -- when will it be
>  available?  Is it a pipe dream, or a nearly-ready reality?
>
> PGReplication
>  Appears to be a page that someone forgot to erase from the old GBorg site.
>
> PGCluster
>  Seems pretty good, but the web site is not current, there are releases in
>  use that are not on the web site, and it also seems to always be a couple
>  of steps behind the current release of Postgres.  Two single points of
>  failure: the load balancer and the data replicator.
>
> Is this a good summary of the status of replication?  Have I missed
> any important solutions or mischaracterized anything?
>
> Thanks!
> Craig
>
> (Sorry about the premature send of this message earlier, please ignore.)




Re: [PERFORM] Database size

2007-06-14 Thread A. Kretschmer
On Fri, 08 Jun 2007, at 01:22:14 -0700, choksi wrote the following:
> Hi all,
>
> I had a database which used to hold some 50 million records, and the disk
> space used was 103 GB. I deleted around 34 million records but the disk
> size is still the same. Can someone please shed some light on this.

DELETE only marks rows as deleted; if you need the space back, you need a
VACUUM FULL.

Read more: http://www.postgresql.org/docs/current/static/sql-vacuum.html
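
For example (the table name is a placeholder; note that VACUUM FULL takes an
exclusive lock on the table while it runs):

  VACUUM FULL VERBOSE mytable;
  -- indexes can stay bloated even after VACUUM FULL, so optionally:
  REINDEX TABLE mytable;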


Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net



Re: [PERFORM] Replication

2007-06-14 Thread Andreas Kostyrka
OK, Slony supports two kinds of operation here: failover (which moves the
master role to a new node without the old master node being present, and also
drops the old node from replication) and move set (which moves the master
role with the old master's cooperation).

The use cases for these two are slightly different: one is for all kinds of
scheduled maintenance, while the other is what you do when you've got a
hardware failure.

Andreas
