Re: [PERFORM] [ADMIN] Monitoring tool for Postgres Database
> We use Zabbix.

There's a Zabbix template for PostgreSQL called "pg_monz": http://pg-monz.github.io/pg_monz/index-en.html

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [PERFORM] Turn off Hyperthreading! WAS: 60 core performance with 9.3
> On Wed, Aug 20, 2014 at 12:13:50PM -0700, Josh Berkus wrote:
>> On a read-write test, it's 10% faster with HT off as well. Further, from their production machine we've seen that having HT on causes the machine to slow down by 5X whenever you get more than 40 cores (as in 100% of real cores or 50% of HT cores) worth of activity. So we're definitely back to "If you're using PostgreSQL, turn off Hyperthreading."
>
> Not sure how you can make such a blanket statement when so many people have tested and shown the benefits of hyper-threading. I am also unclear exactly what you tested, as I didn't see it mentioned in the email --- CPU type, CPU count, and operating system would be the minimal information required.

HT off is common knowledge for better benchmarking results, at least for me. I've never seen a better result with HT on, except on POWER.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [PERFORM] performance degradation after launching postgres cluster using pgpool-II
> Before starting the cluster, one bulk update through Java code would take about 1 hour to finish, but now it takes twice that amount of time.

pgpool-II is not very good at handling the extended protocol (mostly used by Java). If you need to execute large updates, you'd better connect to PostgreSQL directly. Note that there's no problem with some sessions connecting via pgpool-II while others connect to PostgreSQL directly.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [PERFORM] limit connections pgpool
> Hi, I want to know which pgpool parameter closes the connections in the database directly, because pgpool-II closes its children fine with the lifetime settings, but the connections in the database stay open in the idle state.

That's the result of the connection cache functionality of pgpool-II. If you don't need pgpool-II's connection cache at all, you could turn it off:

connection_cache = off

Or you could set the following to a non-zero value:

connection_life_time

After an idle connection to a PostgreSQL backend has lasted for connection_life_time seconds, the connection will be closed.

If you have further questions, you'd better subscribe to the pgpool-general mailing list and post the question there: http://www.pgpool.net/mailman/listinfo/pgpool-general

--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> this is my pool config:
>
> num_init_children = 100
> max_pool = 8
> child_life_time = 60
> child_max_connections = 0
> connection_life_time = 0
> client_idle_limit = 60
>
> and this is my postgresql.conf:
>
> max_connections = 800
> shared_buffers = 2048MB
> temp_buffers = 64MB
> work_mem = 2048MB
> maintenance_work_mem = 2048MB
> wal_buffers = 256
> checkpoint_segments = 103
>
> thanks
Re: [PERFORM] out of memory
> I have an SQL file (its size is 1GB). When I execute it, the error "String of 987098801 bytes is too long for encoding conversion" occurs. Please give me a solution.

You hit the upper limit of internal memory allocation in PostgreSQL. IMO, there's no way to avoid the error except to use a client encoding identical to the backend's.

Hackers: the particular limit seems to have been chosen with TOAST in mind (from include/utils/memutils.h):

 * XXX This is deliberately chosen to correspond to the limiting size
 * of varlena objects under TOAST. See VARSIZE_4B() and related macros
 * in postgres.h. Many datatypes assume that any allocatable size can
 * be represented in a varlena header.

IMO the SQL string size limit is a totally different thing from TOAST. Shouldn't we have a different limit for SQL strings? (MAX_CONVERSION_GROWTH is a different story, of course.)

--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
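For illustration, a minimal sketch of the workaround described above (the file name is hypothetical and the encoding is just an example):

    -- Make the client encoding identical to the server encoding so that no
    -- conversion (and no huge conversion buffer) is needed for the big file.
    SHOW server_encoding;            -- e.g. UTF8
    SET client_encoding TO 'UTF8';   -- match it before feeding the file
    \i huge_dump.sql                 -- psql meta-command to run the 1GB script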
Re: [PERFORM] pg_dump and thousands of schemas
On Sun, Jun 10, 2012 at 4:47 PM, Jeff Janes jeff.ja...@gmail.com wrote:

> On Wed, May 30, 2012 at 2:06 AM, Tatsuo Ishii is...@postgresql.org wrote:
>
> Yeah, Jeff's experiments indicated that the remaining bottleneck is lock management in the server. What I fixed so far on the pg_dump side should be enough to let partial dumps run at reasonable speed even if the whole database contains many tables. But if psql is taking AccessShareLock on lots of tables, there's still a problem.
>
> Ok, I modified the part of pg_dump where a tremendous number of LOCK TABLE statements are issued. I replaced them with a single LOCK TABLE with multiple tables. With 100k tables the LOCK statements took 13 minutes in total; now they take only 3 seconds. Comments?
>
> Could you rebase this? I tried doing it myself, but must have messed it up because it got slower rather than faster.
>
> OK, I found the problem. In fixing a merge conflict, I had it execute the query every time it appended a table, rather than just at the end.
>
> With my proposed patch in place, I find that for a full default dump your patch is slightly faster with 300,000 tables, and slightly slower with 300,000. The differences are generally 2% in either direction. When it comes to back-patching and partial dumps, I'm not really sure what to test.
>
> For the record, there is still quadratic performance on the server, albeit with a much smaller constant factor than the Reassign one. It is in get_tabstat_entry. I don't know if it is worth working on that in isolation--if PG is going to try to accommodate 100s of thousands of tables, there probably needs to be a more general way to limit the memory used by all aspects of the rel caches.

I would like to test your patch with and without my patch. Could you please give me the patches? Or do you have your own git repository?

--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [PERFORM] pg_dump and thousands of schemas
On Tue, Jun 12, 2012 at 1:54 AM, Tatsuo Ishii is...@postgresql.org wrote:

> [...]
>
> I would like to test your patch with and without my patch. Could you please give me the patches? Or do you have your own git repository?
>
> The main patch is in the commit fest as "Resource Owner reassign Locks", for the sake of pg_dump. My re-basing of your patch is attached.

I tested your patches with current master head. The result was pretty good. Before, it took 125 minutes (with 9.2 devel) to dump 100k empty tables; now it takes less than 4 minutes!

$ time pg_dump test > /dev/null

real    3m56.412s
user    0m12.059s
sys     0m3.571s

Good job!

Now I applied the rebased pg_dump patch:

real    4m1.779s
user    0m11.621s
sys     0m3.052s

Unfortunately I see no improvement. Probably my patch's value is for dumping against older backends.

--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [PERFORM] pg_dump and thousands of schemas
> We recently fixed a couple of O(N^2) loops in pg_dump, but those covered extremely specific cases that might or might not have anything to do with what you're seeing. The complainant was extremely helpful about tracking down the problems: http://archives.postgresql.org/pgsql-general/2012-03/msg00957.php http://archives.postgresql.org/pgsql-committers/2012-03/msg00225.php http://archives.postgresql.org/pgsql-committers/2012-03/msg00230.php
>
> I'm wondering if these fixes (or today's commit) include the case of a database that has ~100 thousand tables and indexes. One of my customers has had trouble with pg_dump for such a database; it takes over 10 hours.
>
> So I did a quick test with old PostgreSQL 9.0.2 and current (as of commit 2755abf386e6572bad15cb6a032e504ad32308cc). In a freshly initdb-ed database I created 100,000 tables, each with two integer attributes, one of them a primary key. Creating the tables was reasonably fast as expected (18-20 minutes). This created a 1.4GB database cluster.
>
> pg_dump dbname > /dev/null took 188 minutes on 9.0.2, which was a pretty long time, as the customer complained. Now what about current? Well, it took 125 minutes. ps showed that most of the time was spent in the backend.
>
> Below is the script to create the tables.
>
> cnt=100000
> while [ $cnt -gt 0 ]
> do
> psql -e -p 5432 -c "create table t$cnt(i int primary key, j int);" test
> cnt=`expr $cnt - 1`
> done
>
> p.s. You need to increase max_locks_per_transaction before running pg_dump (I raised it to 640 in my case).

Just for the record, I reran the test with my single-LOCK patch, and now the total runtime of pg_dump is 113 minutes: 188 minutes (9.0) -> 125 minutes (git master) -> 113 minutes (with my patch). So far, I'm glad to see 40% time savings at this point.

--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [PERFORM] pg_dump and thousands of schemas
> On Thu, May 31, 2012 at 10:45 AM, Tatsuo Ishii is...@postgresql.org wrote:
>> Just for the record, I reran the test with my single-LOCK patch, and now the total runtime of pg_dump is 113 minutes: 188 minutes (9.0) -> 125 minutes (git master) -> 113 minutes (with my patch). So far, I'm glad to see 40% time savings at this point.
>
> I see only 9.6% savings (100 * (113/125 - 1)). What am I missing?

What I meant was (100 * (113/188 - 1)).

--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [PERFORM] pg_dump and thousands of schemas
> Yeah, Jeff's experiments indicated that the remaining bottleneck is lock management in the server. What I fixed so far on the pg_dump side should be enough to let partial dumps run at reasonable speed even if the whole database contains many tables. But if psql is taking AccessShareLock on lots of tables, there's still a problem.

Ok, I modified the part of pg_dump where a tremendous number of LOCK TABLE statements are issued. I replaced them with a single LOCK TABLE with multiple tables. With 100k tables the LOCK statements took 13 minutes in total; now they take only 3 seconds. Comments?

--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 3461f3e..cc1ffd7 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -3832,6 +3832,7 @@ getTables(Archive *fout, int *numTables)
 	int			i_reloptions;
 	int			i_toastreloptions;
 	int			i_reloftype;
+	bool		lock_needed = false;

 	/* Make sure we are in proper schema */
 	selectSourceSchema(fout, "pg_catalog");
@@ -4273,15 +4274,21 @@ getTables(Archive *fout, int *numTables)
 		 * NOTE: it'd be kinda nice to lock other relations too, not only
 		 * plain tables, but the backend doesn't presently allow that.
 		 */
-		if (tblinfo[i].dobj.dump && tblinfo[i].relkind == RELKIND_RELATION)
+		if (i == 0)
 		{
 			resetPQExpBuffer(query);
-			appendPQExpBuffer(query,
-							  "LOCK TABLE %s IN ACCESS SHARE MODE",
+			appendPQExpBuffer(query, "LOCK TABLE ");
+		}
+
+		if (tblinfo[i].dobj.dump && tblinfo[i].relkind == RELKIND_RELATION)
+		{
+			if (lock_needed)
+				appendPQExpBuffer(query, ",");
+			appendPQExpBuffer(query, "%s",
 							  fmtQualifiedId(fout,
 											 tblinfo[i].dobj.namespace->dobj.name,
 											 tblinfo[i].dobj.name));
-			ExecuteSqlStatement(fout, query->data);
+			lock_needed = true;
 		}

 		/* Emit notice if join for owner failed */
@@ -4290,6 +4297,12 @@ getTables(Archive *fout, int *numTables)
 					  tblinfo[i].dobj.name);
 	}

+	if (lock_needed)
+	{
+		appendPQExpBuffer(query, " IN ACCESS SHARE MODE");
+		ExecuteSqlStatement(fout, query->data);
+	}
+
 	if (lockWaitTimeout && fout->remoteVersion >= 70300)
 	{
 		ExecuteSqlStatement(fout, "SET statement_timeout = 0");
Re: [PERFORM] pg_dump and thousands of schemas
> Yeah, Jeff's experiments indicated that the remaining bottleneck is lock management in the server. What I fixed so far on the pg_dump side should be enough to let partial dumps run at reasonable speed even if the whole database contains many tables. But if psql is taking AccessShareLock on lots of tables, there's still a problem.
>
> Ok, I modified the part of pg_dump where a tremendous number of LOCK TABLE statements are issued. I replaced them with a single LOCK TABLE with multiple tables. With 100k tables the LOCK statements took 13 minutes in total; now they take only 3 seconds. Comments?

Shall I commit this to master and all supported branches?

--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [PERFORM] pg_dump and thousands of schemas
> I'm not excited by this patch. It dodges the O(N^2) lock behavior for the initial phase of acquiring the locks, but it does nothing for the lock-related slowdown occurring in all of pg_dump's subsequent commands. I think we really need to get in the server-side fix that Jeff Janes is working on, and then re-measure to see if something like this is still worth the trouble.

Well, even with the current backend, locking 100,000 tables was done in 3 seconds in my test. So even if Jeff Janes's fix succeeds, I guess it will just save 3 seconds in my case, and if the number of tables is smaller, the saving will be smaller. This suggests that most of the time for processing LOCK has been spent in communication between pg_dump and the backend. Of course this is just my guess, though.

> I am also a tad concerned about whether we might not have problems with parsing memory usage, or some such, with thousands of tables being listed in a single command.

That's easy to fix. Just divide the LOCK statement into multiple LOCK statements.

My big concern is that even if the locking part is fixed (either by Jeff Janes's fix or by mine), much time in pg_dump is still spent on SELECTs against the system catalogs. The fix would be to turn many SELECTs into a single SELECT, probably using a big IN clause with the table OIDs.

--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
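As a hypothetical sketch of that "big IN clause" idea (the OIDs below are made up), one of the per-table catalog queries could be collapsed into a single query covering many tables:

    SELECT attrelid, attname, attacl
    FROM pg_catalog.pg_attribute
    WHERE attrelid IN (16391, 16397, 16403)   -- OIDs of the tables being dumped
      AND NOT attisdropped
      AND attacl IS NOT NULL
    ORDER BY attrelid, attnum;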
Re: [PERFORM] pg_dump and thousands of schemas
> We recently fixed a couple of O(N^2) loops in pg_dump, but those covered extremely specific cases that might or might not have anything to do with what you're seeing. The complainant was extremely helpful about tracking down the problems: http://archives.postgresql.org/pgsql-general/2012-03/msg00957.php http://archives.postgresql.org/pgsql-committers/2012-03/msg00225.php http://archives.postgresql.org/pgsql-committers/2012-03/msg00230.php

I'm wondering if these fixes (or today's commit) include the case of a database that has ~100 thousand tables and indexes. One of my customers has had trouble with pg_dump for such a database; it takes over 10 hours.

So I did a quick test with old PostgreSQL 9.0.2 and current (as of commit 2755abf386e6572bad15cb6a032e504ad32308cc). In a freshly initdb-ed database I created 100,000 tables, each with two integer attributes, one of them a primary key. Creating the tables was reasonably fast as expected (18-20 minutes). This created a 1.4GB database cluster.

pg_dump dbname > /dev/null took 188 minutes on 9.0.2, which was a pretty long time, as the customer complained. Now what about current? Well, it took 125 minutes. ps showed that most of the time was spent in the backend.

Below is the script to create the tables.

cnt=100000
while [ $cnt -gt 0 ]
do
psql -e -p 5432 -c "create table t$cnt(i int primary key, j int);" test
cnt=`expr $cnt - 1`
done

p.s. You need to increase max_locks_per_transaction before running pg_dump (I raised it to 640 in my case).

--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [PERFORM] pg_dump and thousands of schemas
> Yeah, Jeff's experiments indicated that the remaining bottleneck is lock management in the server. What I fixed so far on the pg_dump side should be enough to let partial dumps run at reasonable speed even if the whole database contains many tables. But if psql is taking AccessShareLock on lots of tables, there's still a problem.

Yes, I saw this kind of line:

29260 2012-05-30 09:39:19 JST LOG: statement: LOCK TABLE public.t10 IN ACCESS SHARE MODE

This does not seem to be a very efficient query, since LOCK TABLE can take multiple tables as arguments and we could pass as many tables as possible in one LOCK TABLE query. That way we could reduce the communication between pg_dump and the backend.

Also I noticed lots of queries like these:

29260 2012-05-30 09:39:19 JST LOG: statement: SELECT attname, attacl FROM pg_catalog.pg_attribute WHERE attrelid = '516391' AND NOT attisdropped AND attacl IS NOT NULL ORDER BY attnum

I guess this is issued for each table, and if there are tons of tables, these queries are a major bottleneck as well as LOCK. I think we could optimize this somewhat by issuing queries that extract info for multiple tables, rather than extracting info for only one table as the current implementation does. Or, even better, we could create a temp table that contains the target table OIDs and join it in the query above.

In my opinion, particular use cases such as multi-tenancy will create tons of objects in a database cluster, and the performance of pg_dump might be highlighted even more in the future.

--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
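For illustration only, a minimal sketch of the two ideas above (the table names and OIDs are hypothetical):

    -- One LOCK TABLE for many tables instead of one round trip per table:
    LOCK TABLE public.t1, public.t2, public.t3 IN ACCESS SHARE MODE;

    -- A temp table of target OIDs that the per-table catalog query can join against:
    CREATE TEMP TABLE dump_targets (relid oid PRIMARY KEY);
    INSERT INTO dump_targets VALUES (16391), (16397), (16403);

    SELECT a.attrelid, a.attname, a.attacl
    FROM pg_catalog.pg_attribute a
    JOIN dump_targets d ON d.relid = a.attrelid
    WHERE NOT a.attisdropped AND a.attacl IS NOT NULL
    ORDER BY a.attrelid, a.attnum;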
Re: [PERFORM] pg_dump and thousands of schemas
> Hugo Nabble hugo.t...@gmail.com writes:
>> If anyone has more suggestions, I would like to hear them. Thank you!
>
> Provide a test case?
>
> We recently fixed a couple of O(N^2) loops in pg_dump, but those covered extremely specific cases that might or might not have anything to do with what you're seeing. The complainant was extremely helpful about tracking down the problems: http://archives.postgresql.org/pgsql-general/2012-03/msg00957.php http://archives.postgresql.org/pgsql-committers/2012-03/msg00225.php http://archives.postgresql.org/pgsql-committers/2012-03/msg00230.php

I'm wondering if these fixes (or today's commit) include the case of a database that has ~100 thousand tables and indexes. One of my customers has had trouble with pg_dump for such a database; it takes over 10 hours.

--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [PERFORM] Oracle v. Postgres 9.0 query performance
* Tony Capobianco (tcapobia...@prospectiv.com) wrote:

> HashAggregate  (cost=4391163.81..4391288.05 rows=9939 width=12)
>   ->  Hash Join  (cost=14.78..4344767.23 rows=9279316 width=12)
>         Hash Cond: (o.emailcampaignid = s.emailcampaignid)
>         ->  Seq Scan on openactivity o  (cost=0.00..3529930.67 rows=192540967 width=12)
>         ->  Hash  (cost=8.79..8.79 rows=479 width=4)
>               ->  Seq Scan on ecr_sents s  (cost=0.00..8.79 rows=479 width=4)
>
> Yikes. Two sequential scans.

> Err, isn't that more-or-less exactly what you want here? The smaller table is going to be hashed and then you'll traverse the bigger table and bounce each row off the hash table. Have you tried actually running this and seeing how long it takes? The bigger table doesn't look to be *that* big; if your I/O subsystem is decent and you've got a lot of memory available for kernel caching, it should be quick.

Just out of curiosity, is there any chance that this kind of query is speeding up in 9.1 because of the following changes?

> * Allow FULL OUTER JOIN to be implemented as a hash join, and allow either side of a LEFT OUTER JOIN or RIGHT OUTER JOIN to be hashed (Tom Lane)
>
> Previously FULL OUTER JOIN could only be implemented as a merge join, and LEFT OUTER JOIN and RIGHT OUTER JOIN could hash only the nullable side of the join. These changes provide additional query optimization possibilities.

--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [PERFORM] pgpoolAdmin handling several pgpool-II clusters
> Can the pgpoolAdmin utility handle (administer) more than one pgpool-II cluster?

No. pgpoolAdmin only supports one pgpool-II server.

> We have a need to set up 3 independent Postgres clusters: one cluster handling cadastral maps, one handling raster maps and one handling vector maps. Each of these clusters must have a load balancer, e.g. pgpool-II. Internally in each cluster we plan to use (and have tested) PostgreSQL 9.0.3's own streaming replication. We have installed pgpool-II, and are now confronted with the complicated installation of the pgpoolAdmin web app. Hence we would very much like to have only one pgpoolAdmin instance to govern all 3 pgpool-II clusters. (Alternatively we will go for a more complex configuration with Postgres-XC.)

Because pgpoolAdmin is a web application, you could assign a browser tab to each pgpoolAdmin.

> (Have tried to post to http://pgsqlpgpool.blogspot.com - with no success)

It's my personal blog :-) Please post to the pgpool-general mailing list. You can subscribe to it from: http://lists.pgfoundry.org/mailman/listinfo/pgpool-general

--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [PERFORM] Stored procedure declared as VOLATILE = no good optimization is done
> Broadly speaking:
>
> *) a function that generates the same output from its inputs regardless of what's going on in the database, and has no side effects: IMMUTABLE

So can I say that if a function is marked IMMUTABLE, it should never modify the database? Is there any counterexample?

> *) a function that reads (only) from tables, or is an immutable function in most senses but is influenced by a GUC (or any other out-of-scope thing): STABLE

If the above is correct, it seems I can say that STABLE functions should never modify the database either.

> *) all other cases: VOLATILE (which is btw the default)

--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
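As a rough illustration of the three categories (the functions and tables below are made-up examples, not from this thread):

    -- IMMUTABLE: result depends only on the arguments; no table access, no side effects
    CREATE FUNCTION add_one(integer) RETURNS integer
        AS 'SELECT $1 + 1' LANGUAGE sql IMMUTABLE;

    -- STABLE: may read tables or GUCs, but must not modify the database
    CREATE FUNCTION item_price(integer) RETURNS numeric
        AS 'SELECT price FROM prices WHERE item_id = $1' LANGUAGE sql STABLE;

    -- VOLATILE (the default): everything else, including functions that write
    CREATE FUNCTION log_hit(integer) RETURNS void
        AS 'INSERT INTO hits(item_id) VALUES ($1)' LANGUAGE sql VOLATILE;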
Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.
From: Rajesh Kumar Mallah mallah.raj...@gmail.com
Subject: Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.
Date: Mon, 19 Jul 2010 08:06:09 +0530
Message-ID: aanlktilggkbmc9h7wlhlcdqfm5rjth1-9dpf8golv...@mail.gmail.com

> Thanks for the thought, but it (-C) does not work.

Still you need: pgbench's -c <= (pool_size + reserve_pool_size)

--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [PERFORM] multiple apaches against single postgres database
Sorry for an off-topic posting...

Michal,

> Honza Novak wrote:
>> And my questions: 1. Does someone have similar experience? Or a clue what to do with it?
>
> Sure, this is considered normal behavior for web applications. The solution is to use connection pooling.
>
>> 2. What is the correct setup of a PostgreSQL backend serving data for many (4+) Apaches? I know that there are connection pooling solutions (pgPool, pgBouncer, or Apache 2.2) and I'm thinking about them, but it seems that we have another problem besides the fact that we didn't implement any pooling solution yet.
>
> We use pgpool running on each web server. You can also have pgpool running on the database server, or even a separate server just for that. You'll have to test to see what's best for you.

As a member of the pgpool development team, I am always looking for pgpool examples in the real world which could be made public. Can you please tell me more details about your pgpool usage, if possible?

--
Tatsuo Ishii
SRA OSS, Inc. Japan

>> 3. Is there a way to somehow log what happened to the Postgres server before the accident? Do you think that logging all SQL statements would help me? If I enable it, what will the performance overhead be?
>
> What you are seeing is called positive feedback. Once the server reaches a certain performance threshold, it starts to delay the queries, which causes more load, which causes further delay, until everything comes to a halt. Sometimes the system can recover from this, if you have properly set up limits (it will just refuse the requests until it can cool off); sometimes it doesn't. The point is never to get over the threshold. Also, maybe you need better hardware for that kind of load, but since you didn't provide more detail, we can't tell you.
>
> It's quite meaningless to analyze performance once the system is overloaded. You have to analyze before that happens and identify the longest-running queries under normal load and try to optimize them. Under heavy load, even the simplest query may seem to be taking a long time, but it doesn't necessarily mean there is something wrong with it.
>
> --
> Michal Táborský
> chief systems architect
> Internet Mall, a.s.
> http://www.MALL.cz
Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL
Tatsuo Ishii [EMAIL PROTECTED] writes:
> Interesting. We (some Japanese companies including SRA OSS, Inc. Japan) did some PG scalability testing using Unisys's big 16 (physical) CPU machine and found PG scales up to 8 CPUs. However, beyond 8 CPUs PG does not scale anymore. The result can be viewed at the OSS iPedia web site (http://ossipedia.ipa.go.jp). Our conclusion was that PG has a serious lock contention problem in this environment, based on analyzing the oprofile result.
>
> 18% in s_lock is definitely bad :-(. Were you able to determine which LWLock(s) are accounting for the contention?
>
> Yes. We were interested in that too. Some people did additional tests to determine that. I don't have the report handy now. I will report back next week.

Sorry for the delay. Finally I got the oprofile data. It's huge (34MB). If you are interested, I can put it somewhere. Please let me know.

--
Tatsuo Ishii
SRA OSS, Inc. Japan
Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL
> I am thrilled to inform you all that Sun has just donated a fully loaded T2000 system to the PostgreSQL community, and it's being set up by Corey Shields at OSL (osuosl.org) and should be online probably early next week. The system has:
>
> * 8 cores, 4 hw threads/core @ 1.2 GHz. Solaris sees the system as having 32 virtual CPUs, and each can be enabled or disabled individually
> * 32 GB of DDR2 SDRAM memory
> * 2 @ 73GB internal SAS drives (10K RPM)
> * 4 Gigabit ethernet ports
>
> For complete specs, visit http://www.sun.com/servers/coolthreads/t2000/specifications.jsp
>
> I think this system is well suited for PG scalability testing, among others. We did an informal test using an internal OLTP benchmark and noticed that PG can scale to around 8 CPUs. Would be really cool if all 32 virtual CPUs could be utilized!!!

Interesting. We (some Japanese companies including SRA OSS, Inc. Japan) did some PG scalability testing using Unisys's big 16 (physical) CPU machine and found PG scales up to 8 CPUs. However, beyond 8 CPUs PG does not scale anymore. The result can be viewed at the OSS iPedia web site (http://ossipedia.ipa.go.jp). Our conclusion was that PG has a serious lock contention problem in this environment, based on analyzing the oprofile result.

You can take a look at the detailed report at: http://ossipedia.ipa.go.jp/capacity/EV0604210111/ (unfortunately only Japanese content is available at the moment; please use an automatic translation service).

The evaluation environment was:

PostgreSQL 8.1.2
OSDL DBT-1 2.1
Miracle Linux 4.0
Unisys ES7000: Xeon 2.8GHz CPU x 16, Mem 16GB (HT off)

--
Tatsuo Ishii
SRA OSS, Inc. Japan
Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL
Tatsuo Ishii [EMAIL PROTECTED] writes:
> Interesting. We (some Japanese companies including SRA OSS, Inc. Japan) did some PG scalability testing using Unisys's big 16 (physical) CPU machine and found PG scales up to 8 CPUs. However, beyond 8 CPUs PG does not scale anymore. The result can be viewed at the OSS iPedia web site (http://ossipedia.ipa.go.jp). Our conclusion was that PG has a serious lock contention problem in this environment, based on analyzing the oprofile result.
>
> 18% in s_lock is definitely bad :-(. Were you able to determine which LWLock(s) are accounting for the contention?

Yes. We were interested in that too. Some people did additional tests to determine that. I don't have the report handy now. I will report back next week.

> The test case seems to be spending a remarkable amount of time in LIKE comparisons, too. That probably is not a representative condition.

I know. I think the point is that the 18% in s_lock only appears with 12 CPUs or more.

--
Tatsuo Ishii
SRA OSS, Inc. Japan
Re: [PERFORM] High context switches occurring
> Bruce Momjian pgman@candle.pha.pa.us writes:
>> Tom Lane wrote:
>>> 1. You don't want the number of clients (-c) much higher than the scaling factor (-s in the initialization step).
>> Should we throw a warning when someone runs the test this way?
>
> Not a bad idea (though of course only for the standard scripts). Tatsuo, what do you think?

That would be annoying, since almost every user would get that kind of warning. What about improving the README?

--
Tatsuo Ishii
SRA OSS, Inc. Japan
Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
> Tatsuo Ishii [EMAIL PROTECTED] writes:
>> Probably VACUUM works well for small to medium size tables, but not for huge ones. I'm considering implementing on-the-spot salvaging of dead tuples.
>
> That's impossible on its face, except for the special case where the same transaction inserts and deletes a tuple. In all other cases, the transaction deleting a tuple cannot know whether it will commit.
>
> Of course. We need to keep a list of such tuples until commit or abort.
>
> What about other transactions, which may have started before the current one and still be running when the current one commits?

Then dead tuples should be left. Perhaps in this case we could register them in the FSM or whatever for later processing.

--
Tatsuo Ishii

> I once proposed an extra parameter added to VACUUM FULL which determines how much free space to leave in each page vacuumed. If there were room, the new tuple could be placed near the old one in most cases and thus avoid lots of disk head movement when updating huge tables in one go.
>
> Hannu Krosing [EMAIL PROTECTED]
Re: [PERFORM] PgPool changes WAS: PostgreSQL clustering VS MySQL
Peter, Ragnar,

> Are there ones that you use which might use several different connections to send a series of queries from a single web user, less than 5 seconds apart?
>
> Using Apache/Perl I often have a situation where we're sending several queries from the same user (web client) within seconds, or even simultaneously, that use different connections.
>
> So from the sound of it, the connection methods I've been using are the exception rather than the rule. Darn, it worked well for us. :-( What this would point to is NOT being able to use Slony-I for database server pooling for most web applications. Yes? Users should look to pgCluster and C-JDBC instead.

Yup. That's the limitation of async replication solutions.

> BTW, Tatsuo, what's the code relationship between pgPool and pgCluster, if any?

PGCluster consists of three kinds of servers: a load balance server, cluster servers (modified PostgreSQL backends) and a replication server. I believe some of pgpool's code is used in the load balance server to avoid reinventing the wheel. This is a beauty of open source software projects.

--
Tatsuo Ishii
Re: [PERFORM] PgPool changes WAS: PostgreSQL clustering VS MySQL
> On Mon, 2005-01-24 at 09:52 -0800, Josh Berkus wrote: [about keeping connections open in a web context]
>> Ah, clarity problem here. I'm talking about connection pooling tools from the client (webserver) side, such as Apache::DBI, PHP's pg_pconnect, Jakarta's connection pools, etc. Not pooling on the database server side, which is what pgPool provides.
>
> Note that these sometimes do not provide connection pooling as such, just persistent connections (Apache::DBI).

Right. The same thing can be said of pg_pconnect.

>> Most of these tools allocate a database connection to an HTTP/middleware client, and only release it after a specific period of inactivity. This means that you *could* count on web-user==connection for purposes of switching back and forth to the master -- as long as the connection-recycling timeout were set higher than the pgPool switch-off period.
>
> No. You can only count on web-server-process==connection, not web-user==connection, unless you can guarantee that the same user client always connects to the same web-server process.

I have the same opinion.

> Am I missing something?

--
Tatsuo Ishii
Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
> Tatsuo,
>
>> I'm not clear on what "pgPool only needs to monitor update switching by *connection*, not by *table*" means. In your example:
>>
>> (1) 00:00 User A updates "My Profile"
>> (2) 00:01 "My Profile" UPDATE finishes executing.
>> (3) 00:02 User A sees "My Profile" re-displayed
>> (6) 00:04 "My Profile":UserA cascades to the last Slave server
>>
>> I think (2) and (3) are on different connections, thus pgpool cannot judge whether the SELECT in (3) should go only to the master or not. To solve the problem you need to make pgpool understand web sessions, not database connections, and it seems impossible for pgpool to understand sessions.
>
> Depends on your connection pooling software, I suppose. Most connection pooling software only returns connections to the pool after a user has been inactive for some period ... generally more than 3 seconds. So connection continuity could be trusted.

Not sure what you mean by "most connection pooling software", but I'm sure that pgpool behaves differently.

--
Tatsuo Ishii
Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
Tatsuo Ishii [EMAIL PROTECTED] writes:
>> Probably VACUUM works well for small to medium size tables, but not for huge ones. I'm considering implementing on-the-spot salvaging of dead tuples.
>
> That's impossible on its face, except for the special case where the same transaction inserts and deletes a tuple. In all other cases, the transaction deleting a tuple cannot know whether it will commit.

Of course. We need to keep a list of such tuples until commit or abort.

--
Tatsuo Ishii
Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
> Tatsuo,
>
>> Yes. However, it would be pretty easy to modify pgpool so that it could cope with Slony-I. I.e.: 1) pgpool does the load balancing and sends the query to Slony-I's slaves and master if the query is a SELECT. 2) pgpool sends the query only to the master if the query is anything other than a SELECT. The remaining problem is that Slony-I is not a sync replication solution, so you need to be prepared that the load-balanced query results might differ among servers.
>
> Yes, please; some of us are already doing the above ad hoc. The simple workaround to replication lag is to calculate the longest likely lag (3 seconds if Slony is tuned right) and have the dispatcher (pgpool) send all requests from that connection to the master for that period. Then it switches back to pool mode where the slaves may be used.

Can I ask a question? Suppose table A gets updated on the master at time 00:00. Until 00:03 pgpool needs to send all queries regarding A to the master only. My question is, how can pgpool know that a query is related to A?

--
Tatsuo Ishii

> Of course, all of the above is only useful if you're doing a web app where 96% of query activity is selects. For additional scalability, put all of your session maintenance in memcached, so that you're not doing database writes every time a page loads.
>
> --
> Josh Berkus
> Aglio Database Solutions
> San Francisco
Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
> Tatsuo,
>
>> Suppose table A gets updated on the master at time 00:00. Until 00:03 pgpool needs to send all queries regarding A to the master only. My question is, how can pgpool know a query is related to A?
>
> Well, I'm a little late to head off tangential discussion about this, but the systems where I've implemented something similar are web applications. In the case of a web app, you don't care if most users see data which is 2 seconds out of date; with caching and whatnot, it's often much more than that! The one case where it's not permissible for a user to see old data is the case where the user is updating the data. Namely:
>
> (1) 00:00 User A updates "My Profile"
> (2) 00:01 "My Profile" UPDATE finishes executing.
> (3) 00:02 User A sees "My Profile" re-displayed
> (6) 00:04 "My Profile":UserA cascades to the last Slave server
>
> So in an application like the above, it would be a real problem if User A were to get switched over to a slave server immediately after the update; she would see the old data, assume that her update was not saved, and update again. Or send angry e-mails to [EMAIL PROTECTED].
>
> However, it makes no difference what User B sees:
>
> (1) 00:00 User A updates "My Profile" v1                        Master
> (2) 00:01 "My Profile" UPDATE finishes executing.               Master
> (3) 00:02 User A sees "My Profile" v2 displayed                 Master
> (4) 00:02 User B requests "My Profile":UserA                    Slave2
> (5) 00:03 User B sees "My Profile" v1                           Slave2
> (6) 00:04 "My Profile" v2 cascades to the last Slave server     Slave2
>
> If the web application is structured properly, the fact that User B is seeing User A's information which is 2 seconds old is not a problem (though it might be for web auctions, where it could result in race conditions; consider memcached as a helper). This means that pgPool only needs to monitor update switching by *connection*, not by *table*. Make sense?

I'm not clear on what "pgPool only needs to monitor update switching by *connection*, not by *table*" means. In your example:

(1) 00:00 User A updates "My Profile"
(2) 00:01 "My Profile" UPDATE finishes executing.
(3) 00:02 User A sees "My Profile" re-displayed
(6) 00:04 "My Profile":UserA cascades to the last Slave server

I think (2) and (3) are on different connections, thus pgpool cannot judge whether the SELECT in (3) should go only to the master or not. To solve the problem you need to make pgpool understand web sessions, not database connections, and it seems impossible for pgpool to understand sessions.

--
Tatsuo Ishii
Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
IMO the bottleneck is not WAL but table/index bloat. Lots of updates on large tables will produce lots of dead tuples. The problem is that there is no effective way to reuse these dead tuples, since VACUUM on huge tables takes a long time. 8.0 adds the new vacuum delay parameters. Unfortunately this does not help. It just makes the execution time of VACUUM longer, which means more and more dead tuples are created while updating.

Probably VACUUM works well for small to medium size tables, but not for huge ones. I'm considering implementing on-the-spot salvaging of dead tuples.

--
Tatsuo Ishii

> This is probably a lot easier than you would think. You say that your DB will have lots of data, lots of updates and lots of reads. Very likely the disk bottleneck is mostly index reads and writes, with some critical WAL fsync() calls. In the grand scheme of things, the actual data is likely not accessed very often.
>
> The indexes can be put on a RAM disk tablespace and that's the end of index problems -- just make sure you have enough memory available. Also make sure that the machine can restart correctly after a crash: the tablespace is dropped and recreated, along with the indexes. This will cause a machine restart to take some time.
>
> After that, if the WAL fsync() calls are becoming a problem, put the WAL files on a fast RAID array, either a card or an external enclosure, that has a good amount of battery-backed write cache. This way, the WAL fsync() calls will flush quickly to the RAM and Pg can move on while the RAID controller worries about putting the data to disk. With WAL, low access time is usually more important than total throughput.
>
> The truth is that you could have this running for not much money.
>
> Good Luck,
> Marty
>
>> On Thursday 20 January 2005 19:09, Bruno Almeida do Lago wrote:
>>> Could you explain to us what you have in mind for that solution? I mean, forget the PostgreSQL (or any other database) restrictions and explain to us how this hardware would be. Where would the data be stored? I've something in mind for you, but first I need to understand your needs!
>>
>> I just want to make a big database as explained in my first mail ... At the beginning we will have approx. 150 000 000 records ... each month we will add about 4-8 million new rows in a constant flow during the day ... and at the same time web users will access the database in order to read those data. The stored data are quite close to the data stored by Google ... (we are not making a Google clone ... just a lot of data, many small values and some big ones ... that's why I'm comparing with Google for data storage). Then we will have a search engine searching those data ...
>>
>> Dealing with the hardware, for the moment we have only a dual Pentium Xeon 2.8GHz with 4 GB of RAM ... and we saw we had bad performance results ... so we are thinking about a new solution with maybe several servers (server design may vary from one to the other) ... to get a kind of cluster to get better performance ...
>>
>> Am I clear?
>>
>> Regards,
Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
> Peter, Tatsuo:
>
>> What would happen with SELECT queries that, through a function or some other mechanism, update data in the database? Would those need to be passed to pgpool in some special way?
>
> Oh, yes, that reminds me. It would be helpful if pgPool accepted a control string ... perhaps one in a SQL comment ... which indicated that the statement to follow was, despite appearances, an update. For example: --STATEMENT_IS_UPDATE\n

Actually, the way pgpool judges whether a statement is a pure SELECT is very simple: pgpool just checks whether the SQL statement begins with "SELECT" (case insensitive, of course). So, for example, you could insert an SQL comment such as /*this SELECT has side effects*/ at the beginning of the line to indicate that pgpool should not send the query to the slave.

> The alternative is, of course, that pgPool direct all explicit transactions to the master ... which is a good idea anyway. So you could do:
>
> BEGIN;
> SELECT some_update_function();
> COMMIT;

Yes. pgpool already does this in load balancing. Expanding this for Slony-I is pretty easy.

--
Tatsuo Ishii
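A tiny sketch of both approaches described above, using the hypothetical some_update_function() from the message:

    -- Prefix a comment so the statement no longer begins with SELECT;
    -- pgpool then routes it to the master only, not to the slaves:
    /*this SELECT has side effects*/ SELECT some_update_function();

    -- Or wrap it in an explicit transaction, which pgpool sends to the master:
    BEGIN;
    SELECT some_update_function();
    COMMIT;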
Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
> On January 20, 2005 06:49 am, Joshua D. Drake wrote:
> Stephen Frost wrote:
> * Hervé Piedvache ([EMAIL PROTECTED]) wrote:
> On Thursday 20 January 2005 15:30, Stephen Frost wrote:
> * Hervé Piedvache ([EMAIL PROTECTED]) wrote:
>
> Is there any solution with PostgreSQL matching these needs ... ?
>
> You might look into pg_pool. Another possibility would be Slony, though I'm not sure it's at the point you need it yet; it depends on whether you can handle some delay before an insert makes it to the slave select systems.
>
> I think not ... pgpool or Slony are replication solutions ... but as I have said to Christopher Kings-Lynne, how will I manage the scalability of the database? I'll need several servers able to load a database growing and growing to get good speed performance ...
>
> They're both replication solutions, but they also help distribute the load. For example: pg_pool will distribute the select queries among the servers. They'll all get the inserts, so that hurts, but at least the select queries are distributed. Slony is similar, but your application level does the load distribution of select statements instead of pg_pool. Your application needs to know to send insert statements to the 'main' server, and select from the others.
>
> You can put pgpool in front of replicator or Slony to get load balancing for reads.
>
> Last time I checked, load balanced reads were only available in pgpool if you were using pgpool's internal replication. Has something changed recently?

Yes. However, it would be pretty easy to modify pgpool so that it could cope with Slony-I. I.e.:

1) pgpool does the load balancing and sends the query to Slony-I's slaves and master if the query is a SELECT.

2) pgpool sends the query only to the master if the query is anything other than a SELECT.

The remaining problem is that Slony-I is not a sync replication solution, so you need to be prepared that the load-balanced query results might differ among servers.

If there's enough demand, I would make such enhancements to pgpool.

--
Tatsuo Ishii

> Is there any other solution than a Cluster for our problem ?
>
> Bigger server, more CPUs/disks in one box. Try to partition up your data some way such that it can be spread across multiple machines, then if you need to combine the data, have it be replicated using Slony to a big box that has a view which joins all the tables, and do your big queries against that.
>
> But I'll arrive at the limitation of a box size quickly, I think: a 4-processor box with 64 GB of RAM ... and after?
>
> Opteron. IBM Z-series, or other big iron. Go to non-x86 hardware after that if you're going to continue to increase the size of the server. Personally I think your better bet might be to figure out a way to partition up your data (isn't that what Google does anyway?).
>
> Stephen
>
> --
> Darcy Buskermolen
> Wavefire Technologies Corp.
> ph: 250.717.0200
> fx: 250.763.1759
> http://www.wavefire.com
Re: [PERFORM] determining max_fsm_pages
> Pg: 7.4.5, 8G RAM, 200G RAID5
>
> I have my fsm set as such:
>
> max_fsm_pages = 30              # min max_fsm_relations*16, 6 bytes each
> max_fsm_relations = 500         # min 100, ~50 bytes each
>
> I just did a vacuum full on one table and saw this result:
>
> INFO: analyzing cdm.cdm_fed_agg_purch
> INFO: cdm_fed_agg_purch: 667815 pages, 3000 rows sampled, 52089570 estimated total rows
>
> My question is this: I have about 8 databases running on this server. When I do a vacuum full on each of these databases, there is an INFO section that I assume is the total pages used for that database. Should I add ALL these individual pages together, pad the total, and use this as my new max_fsm_pages? Should I do the same thing with max_fsm_relations?

I think that's too much, and a too-big FSM affects performance, in my opinion. The easiest way to calculate an appropriate FSM size is to run vacuumdb -a -v and watch the messages. At the very end, you will see something like:

INFO: free space map: 13 relations, 1447 pages stored; 1808 total pages needed
DETAIL: Allocated FSM size: 100 relations + 1600 pages = 19 kB shared memory.

In this case 1808 is the minimum FSM size. Of course this number changes depending on the frequency of VACUUM, so you need to leave some room in the FSM size.

--
Tatsuo Ishii
Re: [PERFORM] Caching of Queries
Tatsuo Ishii [EMAIL PROTECTED] writes:
>> First, it's not a particular problem with pgpool. As far as I know, any connection pool solution has exactly the same problem. Second, it's easy to fix if PostgreSQL provides functionality such as: "drop all temporary tables if any".
>
> I don't like that definition exactly --- it would mean that every time we add more backend-local state, we expect client drivers to know to issue the right incantation to reset that kind of state.
>
> I'm thinking we need to invent a command like RESET CONNECTION that resets GUC variables, drops temp tables, forgets active NOTIFYs, and generally does whatever else needs to be done to make the session state appear virgin. When we add more such state, we can fix it inside the backend without bothering clients.

Great. It's much better than what I proposed.

> I now realize that our RESET ALL command for GUC variables was not fully thought out. We could possibly redefine it as doing the above, but that might break some applications ...
>
> regards, tom lane
Re: [PERFORM] Caching of Queries
>> I don't know what you are exactly referring to in the above URL when you are talking about potential pitfalls of pooling. Please explain more.
>
> Sorry, I wasn't implying that pgpool doesn't deal with the issues, just that some people aren't necessarily aware of them up front. For instance, pgpool does an 'abort transaction' and a 'reset all' in lieu of a full reconnect (of course, since a full reconnect is exactly what we are trying to avoid).
>
> Is this enough to guarantee that a given pooled connection behaves exactly as a non-pooled connection would from a client perspective? For instance, temporary tables are usually dropped at the end of a session, so a client (badly coded perhaps) that does not already use persistent connections might be confused when the sequence 'connect, create temp table foo ..., disconnect, connect, create temp table foo ...' results in the error 'Relation 'foo' already exists'.

First, it's not a particular problem with pgpool. As far as I know, any connection pool solution has exactly the same problem. Second, it's easy to fix if PostgreSQL provides functionality such as: "drop all temporary tables if any". I think we should implement it if we agree that connection pooling should be implemented outside the PostgreSQL engine itself. I think the core developers agree with this.

--
Tatsuo Ishii
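For illustration, a minimal sketch of what a pooler can issue today when a client "disconnects", per the messages above, and why it is not quite enough:

    ABORT;      -- abort any open transaction (only a warning if none is open)
    RESET ALL;  -- reset GUC variables to their defaults
    -- Temporary tables created by the previous client still survive, however,
    -- which is the gap that a "drop all temporary tables" facility (or the
    -- RESET CONNECTION idea above) would close.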
Re: [PERFORM] Caching of Queries
>> More to the point though, I think this is a feature that really really should be in the DB, because then it's trivial for people to use.
>
> How does putting it into PGPool make it any less trivial for people to use?
>
> The answers are at http://www2b.biglobe.ne.jp/~caco/pgpool/index-e.html . Specifically, it's a separate application that needs configuration, the homepage has no real discussion of the potential pitfalls of pooling and what this implementation does to get around them, you get the idea.

I don't know what exactly you are referring to in the above URL when you talk about potential pitfalls of pooling. Please explain more.

--
Tatsuo Ishii
Re: [PERFORM] Performance Bottleneck
> Jeff wrote:
>> On Aug 8, 2004, at 1:29 AM, Martin Foster wrote:
>>> I am currently making use of Apache::DBI, which overrides the DBI::disconnect call and keeps a pool of active connections for use when need be. Since it offloads the pooling to the webserver, it seems more advantageous than pgpool which, while being able to run on an external system, adds another layer of complexity.
>>
>> Apache::DBI is not the same sort of pool as pgpool. DB connections are not shared among all your apache children (a common misconception). So if you have 300 apache kids you can have 300 db connections. With pgpool, connections are shared among all of them, so even though you have 300 kids you only have, say, 32 db connections.
>
> Seems that you are right; I never noticed that in the documentation before. I always assumed it had something to do with the long-lasting/persistent scripts that would remain in transactions for extended periods of time.
>
> Here is an odd question. While the server runs 7.4.x, the client connects with 7.3.x. Would this in itself make a difference in performance, as the protocols are different? At least based on pgpool's documentation.

In this case the server falls back from the V3 protocol (employed in 7.4 or later) to the V2 protocol (employed from 6.4 to 7.3.x). As far as pgpool is concerned, the performance difference is significant. Of course that depends on the implementation, though. FYI, here is the outline of my testing using pgbench.

H/W: Pentium 4 2.4GHz x2 / memory 1GB / HDD IDE 80GB (all PCs are the same spec)
S/W: Red Hat Linux 9 / PostgreSQL 7.3.6 / 7.4.3

postgresql.conf:
tcpip_socket = true
max_connections = 512
shared_buffers = 2048

host A: pgbench, host B: pgpool, host C: PostgreSQL 7.3.6 or 7.4.3

pgbench parameters: -S -c 10 -t 1000

result:
                                             TPS          ratio(7.4.3)  ratio(7.3.6)
without pgpool                               4357.625059  100%          100%
with pgpool (connection pool mode)           4330.290294  99.4%         94.1%
with pgpool (replication mode)               4297.614996  98.6%         87.6%
with pgpool (replication with strict mode)   4270.223136  98.0%         81.5%

--
Tatsuo Ishii
Re: [PERFORM] Performance Bottleneck
Arjen van der Meijden wrote:

> On 8-8-2004 16:29, Matt Clark wrote:
>> There are two well-worn and very mature techniques for dealing with the issue of web apps using one DB connection per apache process, both of which work extremely well and attack the issue at its source.
>>
>> 1) Use a front-end caching proxy like Squid as an accelerator. Static content will be served by the accelerator 99% of the time. Additionally, large pages can be served immediately to the accelerator by Apache, which can then go on to serve another request without waiting for the end user's dial-up connection to pull the data down. Massive speedup, fewer apache processes needed.
>
> Another version of this 1) is to run with a content accelerator; our favourite is to run Tux in front of Apache. It takes over the connection-handling stuff, has a very low memory profile (compared to Apache) and very little overhead. What it does is serve up all simple content (although you can have cgi/php/perl and other languages being processed by it, entirely disabling the need for apache in some cases) and forward/proxy everything it doesn't understand to an Apache/other webserver running on the same machine (which runs on another port).
>
> I think there are a few advantages over Squid; since it is partially done in kernel space it can be slightly faster in serving up content, apart from its simplicity, which will probably matter even more. You'll have no caching issues for pages that should not be cached or static files that change periodically (like every few seconds). Afaik Tux can handle more than 10 times as many ab-generated requests per second as a default-compiled Apache on the same machine. And besides the speed-up, you can do any request you were able to do before, since Tux will simply forward it to Apache if it doesn't understand it.
>
> Anyway, apart from all that: reducing the number of apache connections is nice, but not really the same as reducing the number of pooled connections using a db-pool... You may even be able to run with 1000 http-connections, 40 apache processes and 10 db-connections. In the case of the non-pooled setup, you'd still have 40 db-connections.
>
> In a simple test I did, I did feel pgpool had quite some overhead though. So it should be well tested, to find out where the turnover point is where it will be a gain instead of a loss...

I don't know what configuration you were using, but I noticed that UNIX domain sockets are preferred for the connection between clients and pgpool. When I tested using pgbench -C (involving connection establishment for each transaction), the with-pgpool configuration was 10 times faster than the without-pgpool configuration when using UNIX domain sockets, while there was only a 3.6-times speedup with TCP/IP sockets.

> Best regards,
>
> Arjen van der Meijden

> Other than images, there are very few static pages being loaded up by the user. Since they make up a very small portion of the traffic, it tends to be an optimization we can forgo for now.
>
> I attempted to make use of pgpool. At the default of 32 pre-forked connections, the webserver almost immediately tapped out the pgpool base and content stopped being served because no new processes were being forked to make up for it. So I raised it to a higher value (256) and it immediately segfaulted and dropped core. So I'm not sure exactly how to proceed, since I rather need the thing to fork additional servers as load hits and not the other way around.

What version of pgpool did you test? I know that a certain version (actually 2.0.2) had such a problem.
Can you try again with the latest version of pgpool? (It's 2.0.6.)

--
Tatsuo Ishii