[PERFORM] Parsing VACUUM VERBOSE
Hi there,

I'd like to fully understand the report generated by VACUUM VERBOSE. Please tell me where it is documented?

TIA,
Sabin
Re: [PERFORM] Best use of second controller with faster disks?
On Jun 13, 2007, at 10:36 PM, Francisco Reyes wrote:

> > FreeBSD, indeed. The vendor, Partners Data Systems, did a wonderful job ensuring that everything integrated well, to the point of talking with various FreeBSD developers, LSI engineers, etc., and sent me a fully tested system end-to-end with a Sun X4100 M2, LSI 4Gb Fibre card, and their RAID array, with FreeBSD installed already.
>
> This one? http://www.partnersdata.com

That's the one.

> Is there a management program in FreeBSD for the Areca card?
>
> So I understand the setup you are describing:
> - Machine has Areca controller
> - Connects to external enclosure
> - Enclosure has LSI controller
>
> In the past I've had systems with RAID cards: LSI and Adaptec. The LSI 320-2X is the fastest one I've ever had. The Adaptec ones suck because there is no management software for them on the newer cards for FreeBSD, especially under amd64.

The system I'm working on now is thus: a Sun X4100 M2 with an LSI 4Gb fibre channel card connected to an external self-contained RAID enclosure, the Triton RAID from Partners Data. The Triton unit has an Areca SATA RAID controller and 16 disks in it. I have separate disks built in to the system for boot.

> How did you get FreeBSD to newfs such a large setup? newfs -s /dev/raw-disk?

It is only 2Tb raw, 1.7Tb formatted :-) I just used sysinstall to run fdisk, label, and newfs for me. Since it is just postgres data, no file will ever be larger than 1Gb, so I didn't need to make any adjustments to the newfs parameters.

> What are the speed/size of the disks? 7K rpm?

I splurged for the 10kRPM drives, even though they are smaller, 150Gb each.
Re: [PERFORM] Parsing VACUUM VERBOSE
Sabin,

On 6/14/07, Sabin Coanda <[EMAIL PROTECTED]> wrote:
> I'd like to fully understand the report generated by VACUUM VERBOSE. Please tell me where it is documented?

You can take a look at what I did for pgFouine:

http://pgfouine.projects.postgresql.org/vacuum.html

--
Guillaume
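For reference, a minimal way to capture the report to a file so it can be studied or fed to a log analyzer (the database name below is just a placeholder): psql delivers the VERBOSE lines as INFO messages on stderr, so redirecting stderr is enough.

  # Collect the full VACUUM VERBOSE report for database "mydb" into vacuum.log
  psql -d mydb -c "VACUUM VERBOSE;" 2> vacuum.log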
Re: [PERFORM] Parsing VACUUM VERBOSE
On Thursday, 14 June 2007, Sabin Coanda wrote:
> I'd like to fully understand the report generated by VACUUM VERBOSE. Please tell me where it is documented?

Try the pgfouine reporting tool:

http://pgfouine.projects.postgresql.org/
http://pgfouine.projects.postgresql.org/reports/sample_vacuum.html

It's easier to understand the VACUUM VERBOSE output from the generated report.

--
dim
Re: [PERFORM] [PG 8.1.0 / AIX 5.3] Vacuum processes freezing
On Wed, 2007-06-13 at 18:33 +0200, RESTOUX, Loïc wrote:
> 2) I believed that the poor performance during the VACUUM FREEZE was due to obsolete data statistics. But after a full restart of the database, performance is good. Does PostgreSQL rebuild its statistics during startup?

You probably don't need to run VACUUM FREEZE. VACUUM FREEZE will thrash the disks much more than a normal VACUUM. We're improving that somewhat in 8.3, but the basic issue is that VACUUM FREEZE cleans out more dead rows and so will dirty more data blocks.

Are you concurrently running DDL, TRUNCATE, or CLUSTER? That will interfere with the operation of VACUUM.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
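To make the distinction concrete, a minimal sketch (the table name is hypothetical):

  -- Routine maintenance: reclaim dead rows and refresh planner statistics.
  VACUUM ANALYZE orders;

  -- VACUUM FREEZE additionally rewrites tuples to freeze their transaction IDs,
  -- dirtying far more blocks; it is rarely needed for day-to-day upkeep.
  VACUUM FREEZE orders;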
Re: [PERFORM] Parsing VACUUM VERBOSE
Hi Guillaume,

Very interesting! Thank you very much,

Sabin
Re: [PERFORM] Parsing VACUUM VERBOSE
On 6/14/07, Dimitri Fontaine <[EMAIL PROTECTED]> wrote:
> On Thursday, 14 June 2007, Sabin Coanda wrote:
> > I'd like to fully understand the report generated by VACUUM VERBOSE. Please tell me where it is documented?
>
> Try the pgfouine reporting tool:
> http://pgfouine.projects.postgresql.org/
> http://pgfouine.projects.postgresql.org/reports/sample_vacuum.html
>
> It's easier to understand the VACUUM VERBOSE output from the generated report.
>
> --
> dim

Can anyone share what value they have set log_min_duration_statement to?

--
Yudhvir Singh Sidhu
408 375 3134 cell
Re: [PERFORM] Parsing VACUUM VERBOSE
On 6/14/07, Y Sidhu <[EMAIL PROTECTED]> wrote:
> Can anyone share what value they have set log_min_duration_statement to?

It's OT, but we use different values for different databases and needs.

On a very loaded database with a lot of complex queries (lots of joins on big tables, proximity queries, full text queries), we use 100 ms. It logs ~300,000 queries. It allows us to detect big regressions or new queries which are very slow.

On another database, where I want to track transaction leaks, I'm forced to set it to 0 ms.

Basically, the answer is: set it to the lowest value you can afford without impacting your performance too much (and if you use syslog, use async I/O or send your logs over the network).

--
Guillaume
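As an illustration only (the values are the examples described above, not recommendations), the two setups would look like this in postgresql.conf:

  # Busy database: log anything slower than 100 ms
  log_min_duration_statement = 100   # milliseconds; -1 disables, 0 logs every statement

  # Database where transaction leaks are being tracked: log everything
  # log_min_duration_statement = 0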
Re: [PERFORM] Parsing VACUUM VERBOSE
On 6/14/07, Guillaume Smet <[EMAIL PROTECTED]> wrote:
> On 6/14/07, Y Sidhu <[EMAIL PROTECTED]> wrote:
> > Can anyone share what value they have set log_min_duration_statement to?
>
> It's OT, but we use different values for different databases and needs.
>
> On a very loaded database with a lot of complex queries (lots of joins on big tables, proximity queries, full text queries), we use 100 ms. It logs ~300,000 queries. It allows us to detect big regressions or new queries which are very slow.
>
> On another database, where I want to track transaction leaks, I'm forced to set it to 0 ms.
>
> Basically, the answer is: set it to the lowest value you can afford without impacting your performance too much (and if you use syslog, use async I/O or send your logs over the network).
>
> --
> Guillaume

I am trying to answer the question of how to tell whether the cleanup of an index may be locked by a long transaction. And, in the bigger context, why are vacuums taking long? What triggers them?

I came across the following query, which shows one table, 'connect_tbl', with high "heap hits" and a low "heap buffer %". Now, 'heap' seems to be a memory construct. Any light shedding is appreciated.

  SELECT
    'HEAP:'||relname AS table_name,
    (heap_blks_read+heap_blks_hit) AS heap_hits,
    ROUND(((heap_blks_hit)::NUMERIC/(heap_blks_read+heap_blks_hit)*100), 2)
      AS heap_buffer_percentage
  FROM pg_statio_user_tables
  WHERE (heap_blks_read+heap_blks_hit) > 0
  UNION
  SELECT
    'TOAST:'||relname,
    (toast_blks_read+toast_blks_hit),
    ROUND(((toast_blks_hit)::NUMERIC/(toast_blks_read+toast_blks_hit)*100), 2)
  FROM pg_statio_user_tables
  WHERE (toast_blks_read+toast_blks_hit) > 0
  UNION
  SELECT
    'INDEX:'||relname,
    (idx_blks_read+idx_blks_hit),
    ROUND(((idx_blks_hit)::NUMERIC/(idx_blks_read+idx_blks_hit)*100), 2)
  FROM pg_statio_user_tables
  WHERE (idx_blks_read+idx_blks_hit) > 0;

       table_name      | heap_hits | heap_buffer_percentage
  ---------------------+-----------+------------------------
   HEAP:connect_tbl    |    890878 |                  43.18
   HEAP:tblbound_tbl   |     43123 |                  13.80
   HEAP:tblcruel_tbl   |    225819 |                   6.98
   INDEX:connect_tbl   |    287224 |                  79.82
   INDEX:tblbound_tbl  |     81640 |                  90.28
   INDEX:tblcruel_tbl  |    253014 |                  50.73

--
Yudhvir Singh Sidhu
408 375 3134 cell
[PERFORM] Replication
Looking for replication solutions, I find:

Slony-I
  Seems good. Single master only; the master is a single point of failure; no good failover system for electing a new master or having a failed master rejoin the cluster. Slave databases are mostly for safety or for parallelizing queries for performance. Suffers from O(N^2) communications (N = cluster size).

Slony-II
  Seems brilliant, a solid theoretical foundation, at the forefront of computer science. But I can't find project status -- when will it be available? Is it a pipe dream, or a nearly-ready reality?

PGReplication
  Appears to be a page that someone forgot to erase from the old GBorg site.

PGCluster
  Seems pretty good, but the web site is not current, there are releases in use that are not on the web site, and it also seems to always be a couple of steps behind the current release of Postgres. Two single points of failure: the load balancer and the data replicator.

Is this a good summary of the status of replication? Have I missed any important solutions or mischaracterized anything?

Thanks!
Craig
Re: [PERFORM] Replication
Craig James wrote:
> Looking for replication solutions, I find:
>
> Slony-I
>   Seems good, single master only, master is a single point of failure, no good failover system for electing a new master or having a failed master rejoin the cluster. Slave databases are mostly for safety or for parallelizing queries for performance. Suffers from O(N^2) communications (N = cluster size).

Yep.

> Slony-II
>   Seems brilliant, a solid theoretical foundation, at the forefront of computer science. But can't find project status -- when will it be available? Is it a pipe dream, or a nearly-ready reality?

Dead.

> PGReplication
>   Appears to be a page that someone forgot to erase from the old GBorg site.

Dead.

> PGCluster
>   Seems pretty good, but web site is not current, there are releases in use that are not on the web site, and also seems to always be a couple steps behind the current release of Postgres. Two single points of failure: the load balancer and the data replicator.

Slow as all get out for writes, but a cool idea.

> Is this a good summary of the status of replication? Have I missed any important solutions or mischaracterized anything?

Log shipping, and closed source solutions.

> Thanks!
> Craig
Re: [PERFORM] Replication
Which replication problem are you trying to solve?

On Thu, 14 Jun 2007, Craig James wrote:
> Looking for replication solutions, I find:
>
> Slony-I
>   Seems good, single master only, master is a single point of failure, no good failover system for electing a new master or having a failed master rejoin the cluster. Slave databases are mostly for safety or for parallelizing queries for performance. Suffers from O(N^2) communications (N = cluster size).
>
> Slony-II
>   Seems brilliant, a solid theoretical foundation, at the forefront of computer science. But can't find project status -- when will it be available? Is it a pipe dream, or a nearly-ready reality?
>
> PGReplication
>   Appears to be a page that someone forgot to erase from the old GBorg site.
>
> PGCluster
>   Seems pretty good, but web site is not current, there are releases in use that are not on the web site, and also seems to always be a couple steps behind the current release of Postgres. Two single points of failure: the load balancer and the data replicator.
>
> Is this a good summary of the status of replication? Have I missed any important solutions or mischaracterized anything?
>
> Thanks!
> Craig
Re: [PERFORM] Replication
On 6/15/07, Craig James <[EMAIL PROTECTED]> wrote:
[snip]
> Is this a good summary of the status of replication? Have I missed any important solutions or mischaracterized anything?

* Mammoth Replicator, commercial.

* Continuent uni/cluster, commercial (http://www.continuent.com/index.php?option=com_content&task=view&id=212&Itemid=169).

* pgpool-II. Supports load-balancing and replication by implementing a proxy that duplicates all updates to all slaves. It can partition data by doing this, and it can semi-intelligently route queries to the appropriate servers.

* Cybertec. This is a commercial packaging of PGCluster-II from an Austrian company.

* Greenplum Database (formerly Bizgres MPP), commercial. Not so much a replication solution as a way to parallelize queries, and targeted at the data warehousing crowd. Similar to ExtenDB, but tightly integrated with PostgreSQL.

* DRBD (http://www.drbd.org/), a device driver that replicates disk blocks to other nodes. This works for failover only, not for scaling reads. Easy migration of devices if combined with an NFS export.

* Skytools (https://developer.skype.com/SkypeGarage/DbProjects/SkyTools), a collection of replication tools from the Skype people. Purports to be simpler to use than Slony.

Lastly, and perhaps most promisingly, there's the Google Summer of Code effort by Florian Pflug (http://code.google.com/soc/postgres/appinfo.html?csaid=6545828A8197EBC6) to implement true log-based replication, where PostgreSQL's transaction logs are used to keep live slave servers up to date with a master. In theory, such a system would be extremely simple to set up and use, especially since it should, as far as I can see, also transparently replicate the schema for you.

Alexander.
Re: [PERFORM] Replication
>>> On Thu, Jun 14, 2007 at 6:14 PM, in message <[EMAIL PROTECTED]>, Craig James <[EMAIL PROTECTED]> wrote:
> Looking for replication solutions, I find:
>
> Slony-I
> Slony-II
> PGReplication
> PGCluster

You wouldn't guess it from the name, but pgpool actually supports replication:

http://pgpool.projects.postgresql.org/
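As a rough sketch of what that looks like (a hypothetical pgpool-II pgpool.conf fragment with two backends; the host names are placeholders): writes are duplicated to both backends and reads are load-balanced between them.

  replication_mode = true
  load_balance_mode = true
  backend_hostname0 = 'db1.example.com'
  backend_port0 = 5432
  backend_weight0 = 1
  backend_hostname1 = 'db2.example.com'
  backend_port1 = 5432
  backend_weight1 = 1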
Re: [PERFORM] Replication
Thanks to all who replied and filled in the blanks. The problem with the web is you never know if you've missed something.

Joshua D. Drake wrote:
> > Looking for replication solutions, I find...
> > Slony-II
> Dead

Wow, I'm surprised. Is it dead for lack of need, lack of resources, too much complexity, or all of the above? It sounded like such a promising theoretical foundation.

Ben wrote:
> Which replication problem are you trying to solve?

Most of our data is replicated offline using custom tools tailored to our loading pattern, but we have a small amount of "global" information, such as user signups, system configuration, advertisements, and such, that goes into a single small (~5-10 MB) "global database" used by all servers.

We need "nearly-real-time replication" and instant failover. That is, it's far more important for the system to keep working than it is to not lose a little data. Transactional integrity is not important. Actual hardware failures are rare, and if a user just happens to sign up, or do "save preferences", at the instant the global-database server goes down, it's not a tragedy. But it's not OK for the entire web site to go down when the one global-database server fails.

Slony-I can keep several slave databases up to date, which is nice. And I think I can combine it with a PGPool instance on each server, with the master as primary and a few Slony copies as secondary. That way, if the master goes down, the PGPool servers all switch to their secondary Slony slaves, and read-only access can continue. If the master crashes, users will be able to do most activities, but new users can't sign up and existing users can't change their preferences, until either the master server comes back or one of the slaves is promoted to master.

The problem is, there don't seem to be any "vote a new master" type of tools for Slony-I, and also, if the original master comes back online, it has no way to know that a new master has been elected. So I'd have to write a bunch of SOAP services or something to do all of this.

I would consider PGCluster, but it seems to be a patch to Postgres itself. I'm reluctant to introduce such a major piece of technology into our entire system when only one tiny part of it needs the replication service.

Thanks,
Craig
Re: [PERFORM] Best way to delete unreferenced rows?
Craig James wrote:
> Tyrrill, Ed wrote:
> > QUERY PLAN
> > -----------------------------------------------------------------------
> > Merge Left Join (cost=38725295.93..42505394.70 rows=13799645 width=8) (actual time=6503583.342..8220629.311 rows=93524 loops=1)
> >   Merge Cond: ("outer".record_id = "inner".record_id)
> >   Filter: ("inner".record_id IS NULL)
> >   -> Index Scan using backupobjects_pkey on backupobjects (cost=0.00..521525.10 rows=13799645 width=8) (actual time=15.955..357813.621 rows=13799645 loops=1)
> >   -> Sort (cost=38725295.93..39262641.69 rows=214938304 width=8) (actual time=6503265.293..7713657.750 rows=214938308 loops=1)
> >        Sort Key: backup_location.record_id
> >        -> Seq Scan on backup_location (cost=0.00..3311212.04 rows=214938304 width=8) (actual time=11.175..1881179.825 rows=214938308 loops=1)
> > Total runtime: 8229178.269 ms
> > (8 rows)
> >
> > I ran vacuum analyze after the last time any inserts, deletes, or updates were done, and before I ran the query above. I've attached my postgresql.conf. The machine has 4 GB of RAM.
>
> I thought maybe someone with more expertise than me might answer this, but since they haven't I'll just make a comment. It looks to me like the sort of 214 million rows is what's killing you. I suppose you could try to increase the sort memory, but that's a lot of memory. It seems to me an index merge of a relation this large would be faster, but that's a topic for the experts.
>
> On a theoretical level, the problem is that it's sorting the largest table. Perhaps you could re-cast the query so that it only has to sort the smaller table, something like
>
>    select a.id from a where a.id not in (select distinct b.id from b)
>
> where "b" is the smaller table. There's still no guarantee that it won't do a sort on "a", though. In fact, one of the clever things about Postgres is that it can convert a query like the one above into a regular join, unless you do something like "select ... offset 0" which blocks the optimizer from doing the rearrangement.
>
> But I think the first approach is to try to tune for a better plan using your original query.
>
> Craig

Thanks for the input, Craig. I actually started out with a query similar to what you suggest, but the performance was days to complete back when the larger table, backup_location, was still under 100 million rows. The current query is the best performer to date.

I have been playing around with work_mem, and doubling it to 128MB did result in some improvement, but doubling it again to 256MB showed no further gain. Here is the explain analyze with work_mem increased to 128MB:

  mdsdb=# explain analyze select backupobjects.record_id from backupobjects left outer join backup_location using (record_id) where backup_location.record_id is null;

                                  QUERY PLAN
  -----------------------------------------------------------------------
   Merge Left Join (cost=36876242.28..40658535.53 rows=13712990 width=8) (actual time=5795768.950..5795768.950 rows=0 loops=1)
     Merge Cond: ("outer".record_id = "inner".record_id)
     Filter: ("inner".record_id IS NULL)
     -> Index Scan using backupobjects_pkey on backupobjects (cost=0.00..520571.89 rows=13712990 width=8) (actual time=2.490..201516.228 rows=13706121 loops=1)
     -> Sort (cost=36876242.28..37414148.76 rows=215162592 width=8) (actual time=4904205.255..5440137.309 rows=215162559 loops=1)
          Sort Key: backup_location.record_id
          -> Seq Scan on backup_location (cost=0.00..3314666.92 rows=215162592 width=8) (actual time=4.186..1262641.774 rows=215162559 loops=1)
   Total runtime: 5796322.535 ms
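For comparison, the anti-join can also be written with NOT EXISTS. This is only a sketch using the table and column names from this thread; on older planners a correlated NOT EXISTS may be executed as a subplan per row and be no faster than the outer-join form, so it is something to test rather than a recommendation:

  SELECT bo.record_id
  FROM backupobjects bo
  WHERE NOT EXISTS (
      SELECT 1
      FROM backup_location bl
      WHERE bl.record_id = bo.record_id
  );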
Re: [PERFORM] How much ram is too much
Hi Andrew,

On 11-Jun-07, at 11:34 AM, Andrew Sullivan wrote:
> On Mon, Jun 11, 2007 at 11:09:42AM -0400, Dave Cramer wrote:
> > and set them to anything remotely close to 128GB. Well, we'd give 25% of it to postgres, and the rest to the OS.
>
> Are you quite sure that PostgreSQL's management of the buffers is efficient with such a large one?

No, I'm not sure of this.

> In the past, that wasn't the case for relatively small buffers; with the replacement of single-pass LRU, that has certainly changed, but I'd be surprised if anyone tested a buffer as large as 32G.

So does anyone have experience above 32G?

Dave

> A
> --
> Andrew Sullivan | [EMAIL PROTECTED]
> The whole tendency of modern prose is away from concreteness. --George Orwell
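For context, a hedged postgresql.conf sketch of the 25%-of-128GB split being discussed, assuming a release that accepts memory units (older releases take shared_buffers in 8 kB pages); the figures are the rule of thumb from the thread, not tested recommendations:

  shared_buffers = 32GB          # ~25% of RAM to PostgreSQL's own buffer cache
  effective_cache_size = 96GB    # rough hint of what the OS will cache on top of that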
[PERFORM] Variable (degrading) perfomance
Hi all,

It seems that I have an issue with the performance of a PostgreSQL server. I'm running write-intensive, TPC-C-like tests. The workload consists of 150 to 200 thousand transactions. The performance varies dramatically, between 5 and more than 9 hours (I don't have the exact figure for the longest experiment).

Initially the server is relatively fast. It finishes the first batch of 50k transactions in an hour. This is probably due to the fact that the database is RAM-resident during this interval. As soon as the database grows bigger than the RAM, performance, not surprisingly, degrades because of the slow disks.

My problem is that the performance is rather variable and, to me, non-deterministic. A 150k test can finish in approx. 3h30min, but conversely it can take more than 5h to complete. Preferably I would like to see *steady-state* performance (where my interpretation of steady state is that the average throughput/response time does not change over time). Is a steady state achievable despite MVCC and the inherent non-determinism between experiments? What could be the reasons for the variable performance?

- misconfiguration of the PG parameters (e.g. autovacuum does not cope with the dead tuples on the MVCC architecture)
- file fragmentation
- index bloat
- ???

The initial size of the database (actually the output of the 'du -h' command) is ~400 MB. The size increases dramatically, to somewhere between 600 MB and 1.1 GB.

I have doubted the client application at some point too. However, other server combinations using different DBMSs exhibit steady-state performance. As a matter of fact, when PG is paired with Firebird through statement-based replication middleware, the performance of the pair is steady too.

The hardware configuration:

Client machine
- 1.5 GHz Pentium 4 CPU
- 1 GB Rambus RAM
- Seagate ST340810A IDE disk (40 GB), 5400 rpm

Server machine
- 1.5 GHz Pentium 4 CPU
- 640 MB Rambus RAM
- Seagate Barracuda 7200.9
- Seagate ST340810A IDE disk (40 GB); the WAL is stored on an ext2 partition

The software configuration: the client application is a multi-threaded Java client running on Win 2000 Pro SP4. The database server version is 8.1.5, running on Fedora Core 6.

Please find attached:
1 - the output of vmstat taken after the first 60k transactions were executed
2 - the postgresql.conf file

Any help would be appreciated.

Best regards,
Vladimir

--
Vladimir Stankovic                    T: +44 20 7040 0273
Research Student/Research Assistant   F: +44 20 7040 8585
Centre for Software Reliability       E: [EMAIL PROTECTED]
City University
Northampton Square, London EC1V 0HB

[Attached postgresql.conf: the stock 8.1 header comments and connection settings; the only non-default value visible before the attachment is truncated is max_connections = 150.]
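One of the suspects listed above is autovacuum falling behind. A hedged postgresql.conf sketch for 8.1's integrated autovacuum, with illustrative values only; note that 8.1 requires the row-level statistics collector for autovacuum to work at all:

  stats_start_collector = on
  stats_row_level = on
  autovacuum = on
  autovacuum_naptime = 60                  # seconds between checks of each database
  autovacuum_vacuum_scale_factor = 0.2     # vacuum after ~20% of a table is dead (8.1 default is 0.4)
  autovacuum_vacuum_cost_delay = 20        # throttle autovacuum I/O, in milliseconds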
[PERFORM] Database size
Hi all,

I have a database which used to hold some 50 million records, and the disk space used was 103 GB. I deleted around 34 million records, but the disk usage is still the same. Can someone please shed some light on this?

Thanks in advance for all the help.

Dhawal Choksi
Re: [PERFORM] Replication
> Most of our data is replicated offline using custom tools tailored to our loading pattern, but we have a small amount of "global" information, such as user signups, system configuration, advertisements, and such, that goes into a single small (~5-10 MB) "global database" used by all servers.

Slony provides near-instantaneous failovers (in the single-digit-seconds range). You can script an automatic failover if the master server becomes unreachable. That leaves you the problem of restarting your app (or making it reconnect) to the new master.

5-10 MB of data implies such a fast initial replication that making the server rejoin the cluster by setting it up from scratch is not an issue.

> The problem is, there don't seem to be any "vote a new master" type of tools for Slony-I, and also, if the original master comes back online, it has no way to know that a new master has been elected. So I'd have to write a bunch of SOAP services or something to do all of this.

You don't need SOAP services, and you do not need to elect a new master. If dbX goes down, dbY takes over; you should be able to decide on a static takeover pattern easily enough.

The point here is that the servers need to react to a problem, but you probably want to get the admin on duty to look at the situation as quickly as possible anyway. With 5-10 MB of data in the database, a complete rejoin from scratch to the cluster is measured in minutes.

Furthermore, you need to check out pgpool. I seem to remember that it has some bad habits in routing queries. (E.g. it wants to apply write queries to all nodes, but Slony makes the other nodes read-only. Furthermore, anything inside a BEGIN is sent to the master node, which is bad with some ORMs that by default wrap any access into a transaction.)

Andreas
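As a rough illustration of such a scripted takeover (cluster name, node IDs, and connection strings are hypothetical; this is a sketch, not a tested procedure), Slony-I's slonik utility provides a FAILOVER command that promotes a surviving node when the origin is unreachable; a planned switchover would use LOCK SET and MOVE SET instead:

  cluster name = globalcluster;
  node 1 admin conninfo = 'dbname=global host=db1 user=slony';
  node 2 admin conninfo = 'dbname=global host=db2 user=slony';
  failover (id = 1, backup node = 2);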
Re: [PERFORM] Replication
Andreas Kostyrka wrote:
> Slony provides near instantaneous failovers (in the single digit seconds range). You can script an automatic failover if the master server becomes unreachable.

But Slony slaves are read-only, correct? So the system isn't fully functional once the master goes down.

> That leaves you the problem of restarting your app (or making it reconnect) to the new master.

Don't you have to run a Slony app to convert one of the slaves into the master?

> 5-10MB data implies such a fast initial replication, that making the server rejoin the cluster by setting it up from scratch is not an issue.

The problem is to PREVENT it from rejoining the cluster. If you have some semi-automatic process that detects the dead server and converts a slave to the master, and in the mean time the dead server manages to reboot itself (or its network gets fixed, or whatever the problem was), then you have two masters sending out updates, and you're screwed.

> > The problem is, there don't seem to be any "vote a new master" type of tools for Slony-I, and also, if the original master comes back online, it has no way to know that a new master has been elected. So I'd have to write a bunch of SOAP services or something to do all of this.
>
> You don't need SOAP services, and you do not need to elect a new master. if dbX goes down, dbY takes over, you should be able to decide on a static takeover pattern easily enough.

I can't see how that is true. Any self-healing distributed system needs something like the following:

- A distributed system of nodes that check each other's health
- A way to detect that a node is down and to transmit that information across the nodes
- An election mechanism that nominates a new master if the master fails
- A way for a node coming online to determine if it is a master or a slave

Any solution less than this can cause corruption, because you can have two nodes that both think they're master, or end up with no master and no process for electing a master. As far as I can tell, Slony doesn't do any of this. Is there a simpler solution? I've never heard of one.

> The point here is, that the servers need to react to a problem, but you probably want to get the admin on duty to look at the situation as quickly as possible anyway.

No, our requirement is no administrator interaction. We need instant, automatic recovery from failure so that the system stays online.

> Furthermore, you need to checkout pgpool, I seem to remember that it has some bad habits in routing queries. (E.g. it wants to apply write queries to all nodes, but slony makes the other nodes readonly. Furthermore, anything inside a BEGIN is sent to the master node, which is bad with some ORMs, that by default wrap any access into a transaction)

I should have been more clear about this. I was planning to use PGPool in the PGPool-1 mode (not the new PGPool-2 features that allow replication). So it would only be acting as a failover mechanism. Slony would be used as the replication mechanism.

I don't think I can use PGPool as the replicator, because then it becomes a new single point of failure that could bring the whole system down. If you're using it for INSERT/UPDATE, then there can only be one PGPool server.

I was thinking I'd put a PGPool server on every machine, in failover mode only. It would have the Slony master as the primary connection, and a Slony slave as the failover connection. The applications would route all INSERT/UPDATE statements directly to the Slony master, and all SELECT statements to the PGPool on localhost.

When the master failed, all of the PGPool servers would automatically switch to one of the Slony slaves. This way, the system would keep running on the Slony slaves (so it would be read-only) until a sysadmin could get the master Slony back online. And when the master came online, the PGPool servers would automatically reconnect and write access would be restored.

Does this make sense?

Craig
Re: [PERFORM] Replication
Craig James wrote:
> Andreas Kostyrka wrote:
> > Slony provides near instantaneous failovers (in the single digit seconds range). You can script an automatic failover if the master server becomes unreachable.
>
> But Slony slaves are read-only, correct? So the system isn't fully functional once the master goes down.

That is what promotion is for.

Joshua D. Drake

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
Re: [PERFORM] Replication
What about "Daffodil Replicator" - GPL - http://sourceforge.net/projects/daffodilreplica/ -- Thanks, Eugene Ogurtsov Internal Development Chief Architect SWsoft, Inc. Craig A. James wrote: Looking for replication solutions, I find: Slony-I Seems good, single master only, master is a single point of failure, no good failover system for electing a new master or having a failed master rejoin the cluster. Slave databases are mostly for safety or for parallelizing queries for performance. Suffers from O(N^2) communications (N = cluster size). Slony-II Seems brilliant, a solid theoretical foundation, at the forefront of computer science. But can't find project status -- when will it be available? Is it a pipe dream, or a nearly-ready reality? PGReplication Appears to be a page that someone forgot to erase from the old GBorg site. PGCluster Seems pretty good, but web site is not current, there are releases in use that are not on the web site, and also seems to always be a couple steps behind the current release of Postgres. Two single-points failure spots, load balancer and the data replicator. Is this a good summary of the status of replication? Have I missed any important solutions or mischaracterized anything? Thanks! Craig (Sorry about the premature send of this message earlier, please ignore.) ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Database size
On Fri, 08.06.2007, at 01:22:14 -0700, choksi wrote the following:
> Hi all,
>
> I have a database which used to hold some 50 million records, and the disk space used was 103 GB. I deleted around 34 million records, but the disk usage is still the same. Can someone please shed some light on this?

DELETE only marks rows as deleted; if you need the space back, you need a VACUUM FULL.

Read more: http://www.postgresql.org/docs/current/static/sql-vacuum.html

Andreas
--
Andreas Kretschmer
Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (more: -> header)
GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net
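A minimal sketch (the table name is a placeholder, and the size functions assume 8.1 or later); keep in mind that VACUUM FULL takes an exclusive lock on the table while it runs:

  -- Check the size before, rewrite the table to reclaim space, then check again.
  SELECT pg_size_pretty(pg_total_relation_size('my_big_table'));
  VACUUM FULL VERBOSE my_big_table;
  SELECT pg_size_pretty(pg_total_relation_size('my_big_table'));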
Re: [PERFORM] Replication
Ok, Slony supports two kinds of operation here: failover (which moves the master node to a new one without the old master node being present; it also drops the old node from replication) and move set (which moves the master node with its cooperation).

The use cases for these two are slightly different: one is for all kinds of scheduled maintenance, while the other is what you do when you've got a hardware failure.

Andreas

-- Original message --
Subject: Re: [PERFORM] Replication
From: Craig James <[EMAIL PROTECTED]>
Date: 15.06.2007 01:48

> Andreas Kostyrka wrote:
> > Slony provides near instantaneous failovers (in the single digit seconds range). You can script an automatic failover if the master server becomes unreachable.
>
> But Slony slaves are read-only, correct? So the system isn't fully functional once the master goes down.
>
> > That leaves you the problem of restarting your app (or making it reconnect) to the new master.
>
> Don't you have to run a Slony app to convert one of the slaves into the master?
>
> > 5-10MB data implies such a fast initial replication, that making the server rejoin the cluster by setting it up from scratch is not an issue.
>
> The problem is to PREVENT it from rejoining the cluster. If you have some semi-automatic process that detects the dead server and converts a slave to the master, and in the mean time the dead server manages to reboot itself (or its network gets fixed, or whatever the problem was), then you have two masters sending out updates, and you're screwed.
>
> > > The problem is, there don't seem to be any "vote a new master" type of tools for Slony-I, and also, if the original master comes back online, it has no way to know that a new master has been elected. So I'd have to write a bunch of SOAP services or something to do all of this.
> >
> > You don't need SOAP services, and you do not need to elect a new master. if dbX goes down, dbY takes over, you should be able to decide on a static takeover pattern easily enough.
>
> I can't see how that is true. Any self-healing distributed system needs something like the following:
>
> - A distributed system of nodes that check each other's health
> - A way to detect that a node is down and to transmit that information across the nodes
> - An election mechanism that nominates a new master if the master fails
> - A way for a node coming online to determine if it is a master or a slave
>
> Any solution less than this can cause corruption because you can have two nodes that both think they're master, or end up with no master and no process for electing a master. As far as I can tell, Slony doesn't do any of this. Is there a simpler solution? I've never heard of one.
>
> > The point here is, that the servers need to react to a problem, but you probably want to get the admin on duty to look at the situation as quickly as possible anyway.
>
> No, our requirement is no administrator interaction. We need instant, automatic recovery from failure so that the system stays online.
>
> > Furthermore, you need to checkout pgpool, I seem to remember that it has some bad habits in routing queries. (E.g. it wants to apply write queries to all nodes, but slony makes the other nodes readonly. Furthermore, anything inside a BEGIN is sent to the master node, which is bad with some ORMs, that by default wrap any access into a transaction)
>
> I should have been more clear about this. I was planning to use PGPool in the PGPool-1 mode (not the new PGPool-2 features that allow replication). So it would only be acting as a failover mechanism. Slony would be used as the replication mechanism.
>
> I don't think I can use PGPool as the replicator, because then it becomes a new single point of failure that could bring the whole system down. If you're using it for INSERT/UPDATE, then there can only be one PGPool server.
>
> I was thinking I'd put a PGPool server on every machine in failover mode only. It would have the Slony master as the primary connection, and a Slony slave as the failover connection. The applications would route all INSERT/UPDATE statements directly to the Slony master, and all SELECT statements to the PGPool on localhost.
>
> When the master failed, all of the PGPool servers would automatically switch to one of the Slony slaves. This way, the system would keep running on the Slony slaves (so it would be read-only), until a sysadmin could get the master Slony back online. And when the master came online, the PGPool servers would automatically reconnect and write-access would be restored.
>
> Does this make sense?
>
> Craig