Re: [PERFORM] [ADMIN] Monitoring tool for Postgres Database

2017-05-26 Thread Tatsuo Ishii
> We use Zabbix.

There's a Zabbix template for PostgreSQL called "pg_monz".

http://pg-monz.github.io/pg_monz/index-en.html

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Turn off Hyperthreading! WAS: 60 core performance with 9.3

2014-08-20 Thread Tatsuo Ishii
 On Wed, Aug 20, 2014 at 12:13:50PM -0700, Josh Berkus wrote:
 On a read-write test, it's 10% faster with HT off as well.
 
 Further, from their production machine we've seen that having HT on
 causes the machine to slow down by 5X whenever you get more than 40
 cores (as in 100% of real cores or 50% of HT cores) worth of activity.
 
 So we're definitely back to "If you're using PostgreSQL, turn off
 Hyperthreading."
 
 Not sure how you can make such a blanket statement when so many people
 have tested and shown the benefits of hyper-threading.  I am also
 unclear exactly what you tested, as I didn't see it mentioned in the
 email --- CPU type, CPU count, and operating system would be the minimal
 information required.

Turning HT off for better benchmark results is common knowledge, at least
for me. I've never seen a better result with HT on, except on POWER.
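
For reference, on Linux you can check whether HT/SMT is active with something
like the following (illustrative only; the /sys SMT control file exists only on
reasonably recent kernels, and many setups disable HT in the BIOS instead):

$ lscpu | grep -i 'thread(s) per core'
Thread(s) per core:    2              # 2 = HT/SMT on, 1 = off

# on kernels that expose it, SMT can be switched off at runtime:
$ echo off | sudo tee /sys/devices/system/cpu/smt/control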

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] performance degradation after launching postgres cluster using pgpool-II

2014-04-07 Thread Tatsuo Ishii
 Before starting the cluster, one bulk update through Java code would take
 about 1 hour to finish, but now it takes twice that amount of time.

pgpool-II is not very good at handling the extended protocol (mostly used
in Java). If you need to execute large updates, you'd better connect to
PostgreSQL directly. Note that there's no problem with having some
sessions connect via pgpool-II while others connect to PostgreSQL
directly.
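
For example (host names are made up; 9999 is pgpool-II's default port), the
bulk-update job could point straight at PostgreSQL while ordinary sessions
keep going through pgpool-II:

jdbc:postgresql://db-host:5432/mydb       # bulk updates: connect to PostgreSQL directly
jdbc:postgresql://pgpool-host:9999/mydb   # regular sessions: keep going through pgpool-II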

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] limit connections pgpool

2013-10-10 Thread Tatsuo Ishii
 Hi, I want to know which pgpool parameter closes the connections directly
 on the database side, because pgpool-II closes the child processes fine
 with the lifetime settings, but the connections in the database stay open
 in the idle state.

That's the result of connection cache functionality of pgpool-II. If
you don't need the connection cache of pgpool-II at all, you could
turn it off:

connection_cache = off

Or you could set the following to a non-zero value:

connection_life_time

Once an idle connection to a PostgreSQL backend has lasted for
connection_life_time seconds, the connection will be closed.
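
For example, in pgpool.conf (the values are only illustrative):

connection_cache = off         # disable pgpool-II's connection cache entirely
# -- or keep the cache but expire idle backend connections --
connection_life_time = 300     # close a cached backend connection after 300 seconds idle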

If you have further questions, it would be better to subscribe to the
pgpool-general list and post your question there:

http://www.pgpool.net/mailman/listinfo/pgpool-general
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

 this is my pool config
 
 num_init_children = 100
 max_pool = 8
 child_life_time = 60
 child_max_connections = 0
 connection_life_time = 0
 client_idle_limit = 60
 
 
 and this is my postgresql.conf
 
 max_connections = 800
 shared_buffers =  2048MB
 temp_buffers =  64MB
 work_mem = 2048MB
 maintenance_work_mem = 2048MB
 wal_buffers = 256
 checkpoint_segments = 103
 
 
 thanks
 
 
 -- 
 Sent via pgsql-performance mailing list
 (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] out of memory

2012-10-30 Thread Tatsuo Ishii
 I have an SQL file (its size is 1GB). When I execute it, the error
 "String of 987098801 bytes is too long for encoding conversion" occurs.
 Please give me a solution.

You hit an internal memory allocation limit in PostgreSQL. IMO, there's
no way to avoid the error other than using a client encoding identical to
the backend's, so that no conversion is needed.
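
As a rough sketch of that workaround (the file name is hypothetical), make
the client use the server's encoding before loading the file from psql:

-- check what the server uses
SHOW server_encoding;
-- make the client match it so no conversion happens
-- (or set PGCLIENTENCODING in the environment before starting psql)
SET client_encoding TO 'UTF8';   -- use whatever SHOW reported above
\i huge_dump.sql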

Hackers:
The particular limit seems to have been chosen with TOAST in mind (from
include/utils/memutils.h):

 * XXX This is deliberately chosen to correspond to the limiting size
 * of varlena objects under TOAST.  See VARSIZE_4B() and related macros
 * in postgres.h.  Many datatypes assume that any allocatable size can
 * be represented in a varlena header.

IMO the SQL string size limit is a totally different matter from
TOAST. Shouldn't we have a separate limit for SQL strings?
(MAX_CONVERSION_GROWTH is a different story, of course.)
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] pg_dump and thousands of schemas

2012-06-12 Thread Tatsuo Ishii
 On Sun, Jun 10, 2012 at 4:47 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Wed, May 30, 2012 at 2:06 AM, Tatsuo Ishii is...@postgresql.org wrote:
 Yeah, Jeff's experiments indicated that the remaining bottleneck is lock
 management in the server.  What I fixed so far on the pg_dump side
 should be enough to let partial dumps run at reasonable speed even if
 the whole database contains many tables.  But if psql is taking
 AccessShareLock on lots of tables, there's still a problem.

 OK, I modified the part of pg_dump where a tremendous number of LOCK
 TABLE statements are issued, replacing them with a single LOCK TABLE
 listing multiple tables. With 100k tables, the LOCK statements took 13
 minutes in total; now they take only 3 seconds. Comments?

 Could you rebase this?  I tried doing it myself, but must have messed
 it up because it got slower rather than faster.
 
 OK, I found the problem.  In fixing a merge conflict, I had it execute
 the query every time it appended a table, rather than just at the end.
 
 With my proposed patch in place, I find that for a full default dump
 your patch is slightly faster with < 300,000 tables, and slightly
 slower with > 300,000.  The differences are generally 2% in either
 direction.  When it comes to back-patching and partial dumps, I'm not
 really sure what to test.
 
 For the record, there is still a quadratic performance on the server,
 albeit with a much smaller constant factor than the Reassign one.  It
 is in get_tabstat_entry.  I don't know if it is worth working on that in
 isolation--if PG is going to try to accommodate 100s of thousands of
 tables, there probably needs to be a more general way to limit the
 memory used by all aspects of the rel caches.

I would like to test your patch, with and without my patch. Could you
please give me the patches? Or do you have your own git repository?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] pg_dump and thousands of schemas

2012-06-12 Thread Tatsuo Ishii
 On Tue, Jun 12, 2012 at 1:54 AM, Tatsuo Ishii is...@postgresql.org wrote:
 On Sun, Jun 10, 2012 at 4:47 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Wed, May 30, 2012 at 2:06 AM, Tatsuo Ishii is...@postgresql.org wrote:
 Yeah, Jeff's experiments indicated that the remaining bottleneck is lock
 management in the server.  What I fixed so far on the pg_dump side
 should be enough to let partial dumps run at reasonable speed even if
 the whole database contains many tables.  But if psql is taking
 AccessShareLock on lots of tables, there's still a problem.

 OK, I modified the part of pg_dump where a tremendous number of LOCK
 TABLE statements are issued, replacing them with a single LOCK TABLE
 listing multiple tables. With 100k tables, the LOCK statements took 13
 minutes in total; now they take only 3 seconds. Comments?

 Could you rebase this?  I tried doing it myself, but must have messed
 it up because it got slower rather than faster.

 OK, I found the problem.  In fixing a merge conflict, I had it execute
 the query every time it appended a table, rather than just at the end.

 With my proposed patch in place, I find that for a full default dump
 your patch is slightly faster with < 300,000 tables, and slightly
 slower with > 300,000.  The differences are generally 2% in either
 direction.  When it comes to back-patching and partial dumps, I'm not
 really sure what to test.

 For the record, there is still a quadratic performance on the server,
 albeit with a much smaller constant factor than the Reassign one.  It
 is in get_tabstat_entry.  I don't know if it is worth working on that in
 isolation--if PG is going to try to accommodate 100s of thousands of
 tables, there probably needs to be a more general way to limit the
 memory used by all aspects of the rel caches.

 I would like to test your patch, with and without my patch. Could you
 please give me the patches? Or do you have your own git repository?
 
 The main patch is in the commit fest as Resource Owner reassign Locks
 for the sake of pg_dump
 
 My re-basing of your patch is attached.

I tested your patches against the current master head. The result was pretty
good. Before, it took 125 minutes (with 9.2 devel) to dump 100k empty
tables; now it takes less than 4 minutes!

$ time pg_dump test > /dev/null

real3m56.412s
user0m12.059s
sys 0m3.571s

Good job!

Then I applied the rebased pg_dump patch as well:

real4m1.779s
user0m11.621s
sys 0m3.052s

Unfortunately I see no further improvement. Probably my patch's value is
mainly for dumps taken from older backends.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] pg_dump and thousands of schemas

2012-05-31 Thread Tatsuo Ishii
 We recently fixed a couple of O(N^2) loops in pg_dump, but those covered
 extremely specific cases that might or might not have anything to do
 with what you're seeing.  The complainant was extremely helpful about
 tracking down the problems:
 http://archives.postgresql.org/pgsql-general/2012-03/msg00957.php
 http://archives.postgresql.org/pgsql-committers/2012-03/msg00225.php
 http://archives.postgresql.org/pgsql-committers/2012-03/msg00230.php
 
 I'm wondering if these fixes (or today's commit) cover the case of a
 database that has ~100 thousand tables and indexes. One of my customers
 has had trouble with pg_dump on such a database; it takes over 10
 hours.
 
 So I did a quick test with old PostgreSQL 9.0.2 and current (as of
 commit 2755abf386e6572bad15cb6a032e504ad32308cc). In a freshly initdb-ed
 database I created 100,000 tables, each with two integer
 attributes, one of them a primary key. Creating the tables was
 reasonably fast, as expected (18-20 minutes). This created a 1.4GB
 database cluster.
 
 pg_dump dbname > /dev/null took 188 minutes on 9.0.2, which was a pretty
 long time, as the customer complained. And current? Well, it
 took 125 minutes. ps showed that most of the time was spent in the backend.
 
 Below is the script to create tables.
 
 cnt=100000
 while [ $cnt -gt 0 ]
 do
 psql -e -p 5432 -c "create table t$cnt(i int primary key, j int);" test
 cnt=`expr $cnt - 1`
 done
 
 p.s. You need to increase max_locks_per_transaction before running
 pg_dump (I raised it to 640 in my case).

Just for the record, I reran the test with my single-LOCK patch, and
the total runtime of pg_dump is now 113 minutes:
188 minutes (9.0) -> 125 minutes (git master) -> 113 minutes (with my patch).

So far, I'm glad to see 40% time savings at this point.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] pg_dump and thousands of schemas

2012-05-31 Thread Tatsuo Ishii
 On Thu, May 31, 2012 at 10:45 AM, Tatsuo Ishii is...@postgresql.org wrote:
 Just for the record, I reran the test with my single-LOCK patch, and
 the total runtime of pg_dump is now 113 minutes:
 188 minutes (9.0) -> 125 minutes (git master) -> 113 minutes (with my patch).

 So far, I'm glad to see 40% time savings at this point.
 
 I see only 9.6% savings (100 * (113/125 - 1)).  What am I missing?

What I meant was (100 * (113/188 - 1)).
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] pg_dump and thousands of schemas

2012-05-30 Thread Tatsuo Ishii
 Yeah, Jeff's experiments indicated that the remaining bottleneck is lock
 management in the server.  What I fixed so far on the pg_dump side
 should be enough to let partial dumps run at reasonable speed even if
 the whole database contains many tables.  But if psql is taking
 AccessShareLock on lots of tables, there's still a problem.

OK, I modified the part of pg_dump where a tremendous number of LOCK
TABLE statements are issued, replacing them with a single LOCK TABLE
listing multiple tables. With 100k tables, the LOCK statements took 13
minutes in total; now they take only 3 seconds. Comments?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 3461f3e..cc1ffd7 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -3832,6 +3832,7 @@ getTables(Archive *fout, int *numTables)
 	int			i_reloptions;
 	int			i_toastreloptions;
 	int			i_reloftype;
+	bool		lock_needed = false;
 
 	/* Make sure we are in proper schema */
 	selectSourceSchema(fout, "pg_catalog");
@@ -4273,15 +4274,21 @@ getTables(Archive *fout, int *numTables)
 		 * NOTE: it'd be kinda nice to lock other relations too, not only
 		 * plain tables, but the backend doesn't presently allow that.
 		 */
-		if (tblinfo[i].dobj.dump && tblinfo[i].relkind == RELKIND_RELATION)
+		if (i == 0)
 		{
 			resetPQExpBuffer(query);
-			appendPQExpBuffer(query,
-							  "LOCK TABLE %s IN ACCESS SHARE MODE",
+			appendPQExpBuffer(query, "LOCK TABLE ");
+		}
+
+		if (tblinfo[i].dobj.dump && tblinfo[i].relkind == RELKIND_RELATION)
+		{
+			if (lock_needed)
+				appendPQExpBuffer(query, ",");
+			appendPQExpBuffer(query, "%s",
 							  fmtQualifiedId(fout,
 											 tblinfo[i].dobj.namespace->dobj.name,
 											 tblinfo[i].dobj.name));
-			ExecuteSqlStatement(fout, query->data);
+			lock_needed = true;
 		}
 
 		/* Emit notice if join for owner failed */
@@ -4290,6 +4297,12 @@ getTables(Archive *fout, int *numTables)
 			  tblinfo[i].dobj.name);
 	}
 
+	if (lock_needed)
+	{
+		appendPQExpBuffer(query, " IN ACCESS SHARE MODE");
+		ExecuteSqlStatement(fout, query->data);
+	}
+
 	if (lockWaitTimeout && fout->remoteVersion >= 70300)
 	{
 		ExecuteSqlStatement(fout, "SET statement_timeout = 0");

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] pg_dump and thousands of schemas

2012-05-30 Thread Tatsuo Ishii
 Yeah, Jeff's experiments indicated that the remaining bottleneck is lock
 management in the server.  What I fixed so far on the pg_dump side
 should be enough to let partial dumps run at reasonable speed even if
 the whole database contains many tables.  But if psql is taking
 AccessShareLock on lots of tables, there's still a problem.
 
 OK, I modified the part of pg_dump where a tremendous number of LOCK
 TABLE statements are issued, replacing them with a single LOCK TABLE
 listing multiple tables. With 100k tables, the LOCK statements took 13
 minutes in total; now they take only 3 seconds. Comments?

Shall I commit to master and all supported branches?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] pg_dump and thousands of schemas

2012-05-30 Thread Tatsuo Ishii
 I'm not excited by this patch.  It dodges the O(N^2) lock behavior for
 the initial phase of acquiring the locks, but it does nothing for the
 lock-related slowdown occurring in all pg_dump's subsequent commands.
 I think we really need to get in the server-side fix that Jeff Janes is
 working on, and then re-measure to see if something like this is still
 worth the trouble.

Well, even with the current backend, locking 100,000 tables took only 3
seconds in my test. So even if Jeff Janes's fix succeeds, I guess it will
just save those 3 seconds in my case, and if the number of tables is
smaller, the saving will be smaller too. This suggests that most of the
time for processing LOCK has been spent in communication between pg_dump
and the backend. Of course, this is just my guess.

 I am also a tad concerned about whether we might not
 have problems with parsing memory usage, or some such, with thousands of
 tables being listed in a single command.

That's easy to fix: just split the table list across several LOCK
statements instead of one.

My big concern is that even if the locking part is fixed (either by Jeff
Janes's fix or by mine), much of pg_dump's time is still spent on SELECTs
against the system catalogs. The fix would be to turn many SELECTs into a
single SELECT, probably using a big IN clause of table OIDs.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] pg_dump and thousands of schemas

2012-05-29 Thread Tatsuo Ishii
 We recently fixed a couple of O(N^2) loops in pg_dump, but those covered
 extremely specific cases that might or might not have anything to do
 with what you're seeing.  The complainant was extremely helpful about
 tracking down the problems:
 http://archives.postgresql.org/pgsql-general/2012-03/msg00957.php
 http://archives.postgresql.org/pgsql-committers/2012-03/msg00225.php
 http://archives.postgresql.org/pgsql-committers/2012-03/msg00230.php
 
 I'm wondering if these fixes (or today's commit) cover the case of a
 database that has ~100 thousand tables and indexes. One of my customers
 has had trouble with pg_dump on such a database; it takes over 10
 hours.

So I did a quick test with old PostgreSQL 9.0.2 and current (as of
commit 2755abf386e6572bad15cb6a032e504ad32308cc). In a freshly initdb-ed
database I created 100,000 tables, each with two integer
attributes, one of them a primary key. Creating the tables was
reasonably fast, as expected (18-20 minutes). This created a 1.4GB
database cluster.

pg_dump dbname > /dev/null took 188 minutes on 9.0.2, which was a pretty
long time, as the customer complained. And current? Well, it took 125
minutes. ps showed that most of the time was spent in the backend.

Below is the script to create tables.

cnt=100000
while [ $cnt -gt 0 ]
do
psql -e -p 5432 -c "create table t$cnt(i int primary key, j int);" test
cnt=`expr $cnt - 1`
done

p.s. You need to increase max_locks_per_transaction before running
pg_dump (I raised it to 640 in my case).
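
In postgresql.conf that is a one-line change (it needs a server restart; 640 is
just the value I used, size it to your own schema):

max_locks_per_transaction = 640   # default is 64; the shared lock table holds roughly
                                  # max_locks_per_transaction * (max_connections + max_prepared_transactions) locks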
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] pg_dump and thousands of schemas

2012-05-29 Thread Tatsuo Ishii
 Yeah, Jeff's experiments indicated that the remaining bottleneck is lock
 management in the server.  What I fixed so far on the pg_dump side
 should be enough to let partial dumps run at reasonable speed even if
 the whole database contains many tables.  But if psql is taking
 AccessShareLock on lots of tables, there's still a problem.

Yes, I saw this kind of line:

29260 2012-05-30 09:39:19 JST LOG:  statement: LOCK TABLE public.t10 IN ACCESS 
SHARE MODE

It seems this is not very efficient, since LOCK TABLE can take multiple
tables as arguments, and we could pass as many tables as possible to one
LOCK TABLE query. This way we could reduce the communication between
pg_dump and the backend.
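
Something like this, with illustrative table names (issued inside a
transaction block, as pg_dump already does):

LOCK TABLE public.t1, public.t2, public.t3 IN ACCESS SHARE MODE;  -- one round trip for many tables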

Also I noticed lots of queries like these:

29260 2012-05-30 09:39:19 JST LOG:  statement: SELECT attname, attacl FROM 
pg_catalog.pg_attribute WHERE attrelid = '516391' AND NOT attisdropped AND 
attacl IS NOT NULL ORDER BY attnum

I guess this is issued for each table, and if there are tons of tables
these queries are a major bottleneck as well as LOCK. I think we could
optimize this somewhat by issuing queries that extract info for multiple
tables at once, rather than extracting info for only one table at a time
as the current implementation does.

Or, even better, we could create a temp table containing the target
table OIDs and join it to the query above, as sketched below.
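
A rough sketch of the batched form of the pg_attribute query above (the OIDs
and the dump_target_oids temp table are made up for illustration):

SELECT attrelid, attname, attacl
FROM pg_catalog.pg_attribute
WHERE attrelid IN ('516391'::oid, '516392'::oid)  -- or: attrelid IN (SELECT oid FROM dump_target_oids)
  AND NOT attisdropped
  AND attacl IS NOT NULL
ORDER BY attrelid, attnum;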

In my opinion, particular use cases such as multi-tenancy will create
tons of objects in a database cluster, and the performance of pg_dump
may be highlighted even more in the future.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] pg_dump and thousands of schemas

2012-05-25 Thread Tatsuo Ishii
 Hugo Nabble hugo.t...@gmail.com writes:
 If anyone has more suggestions, I would like to hear them. Thank you!
 
 Provide a test case?
 
 We recently fixed a couple of O(N^2) loops in pg_dump, but those covered
 extremely specific cases that might or might not have anything to do
 with what you're seeing.  The complainant was extremely helpful about
 tracking down the problems:
 http://archives.postgresql.org/pgsql-general/2012-03/msg00957.php
 http://archives.postgresql.org/pgsql-committers/2012-03/msg00225.php
 http://archives.postgresql.org/pgsql-committers/2012-03/msg00230.php

I'm wondering if these fixes (or today's commit) cover the case of a
database that has ~100 thousand tables and indexes. One of my customers
has had trouble with pg_dump on such a database; it takes over 10
hours.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Oracle v. Postgres 9.0 query performance

2011-06-09 Thread Tatsuo Ishii
 * Tony Capobianco (tcapobia...@prospectiv.com) wrote:
  HashAggregate  (cost=4391163.81..4391288.05 rows=9939 width=12)
    ->  Hash Join  (cost=14.78..4344767.23 rows=9279316 width=12)
          Hash Cond: (o.emailcampaignid = s.emailcampaignid)
          ->  Seq Scan on openactivity o  (cost=0.00..3529930.67 rows=192540967 width=12)
          ->  Hash  (cost=8.79..8.79 rows=479 width=4)
                ->  Seq Scan on ecr_sents s  (cost=0.00..8.79 rows=479 width=4)
 
 Yikes.  Two sequential scans.
 
 Err, isn't that more-or-less exactly what you want here?  The smaller
 table is going to be hashed and then you'll traverse the bigger table
 and bounce each row off the hash table.  Have you tried actually running
 this and seeing how long it takes?  The bigger table doesn't look to be
 *that* big, if your i/o subsystem is decent and you've got a lot of
 memory available for kernel cacheing, should be quick.

Just out of curiosity, is there any chance that this kind of query
speeds up in 9.1 because of the following changes?

 * Allow FULL OUTER JOIN to be implemented as a hash join, and allow
   either side of a LEFT OUTER JOIN or RIGHT OUTER JOIN to be hashed
   (Tom Lane)
   Previously FULL OUTER JOIN could only be implemented as a merge
   join, and LEFT OUTER JOIN and RIGHT OUTER JOIN could hash only the
   nullable side of the join. These changes provide additional query
   optimization possibilities.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] pgpoolAdmin handling several pgpool-II clusters

2011-05-02 Thread Tatsuo Ishii
 Can the pgpoolAdmin utility handle (administer) more than one pgpool-II cluster?

No. pgpoolAdmin only supports one pgpool-II server.

 We have a need of setting up 3 independent postgres clusters. One cluster
 handling cadastral maps, one handling raster maps and one handling vector
 maps. Each of these clusters must have a load balancer - EG pgpool-II.
 Internally in each cluster we plan to (and have tested) PostgreSQL(9.03)'s
 own streaming replication. We have installed pgpool-II, and are now
 confronted with the complicated installation of pgpoolAdmin web-app. Hence
 we would very much like to have only one pgpoolAdmin instance to govern all
 3 pgpool-II clusters.
 
 (Alternatively we will go for a more complex configuration with PostgresXC.)

Because pgpoolAdmin is a web application, you could run one pgpoolAdmin
per pgpool-II server and give each its own browser tab.

 (Have tried to post to http://pgsqlpgpool.blogspot.com - with no success)

It's my personal blog :-) Please post to the pgpool-general mailing list.

You can subscribe to it at:
http://lists.pgfoundry.org/mailman/listinfo/pgpool-general
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] pgpoolAdmin handling several pgpool-II clusters

2011-04-29 Thread Tatsuo Ishii
 Can the pgpoolAdmin utility handle (administer) more than one pgpool-II cluster?

No. pgpoolAdmin only supports one pgpool-II server.

 We have a need of setting up 3 independent postgres clusters. One cluster
 handling cadastral maps, one handling raster maps and one handling vector
 maps. Each of these clusters must have a load balancer - EG pgpool-II.
 Internally in each cluster we plan to (and have tested) PostgreSQL(9.03)'s
 own streaming replication. We have installed pgpool-II, and are now
 confronted with the complicated installation of pgpoolAdmin web-app. Hence
 we would very much like to have only one pgpoolAdmin instance to govern all
 3 pgpool-II clusters.
 
 (Alternatively we will go for a more complex configuration with PostgresXC.)

Because pgpoolAdmin is a web application, you could run one pgpoolAdmin
per pgpool-II server and give each its own browser tab.

 (Have tried to post to http://pgsqlpgpool.blogspot.com - with no success)

It's my personal blog :-) Please post to the pgpool-general mailing list.

You can subscribe to it at:
http://lists.pgfoundry.org/mailman/listinfo/pgpool-general
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Stored procedure declared as VOLATILE = no good optimization is done

2010-10-15 Thread Tatsuo Ishii
 broadly speaking:
 *) function generates same output from inputs regardless of what's
 going on in the database, and has no side effects: IMMUTABLE

So can I say that if a function is marked IMMUTABLE, it should never
modify the database? Is there any counterexample?

 *) function reads (only) from tables, or is an immutable function in
 most senses but influenced from the GUC (or any other out of scope
 thing): STABLE

It seems that, if the above is correct, I can also say STABLE functions
should never modify the database.

 *) all other cases: VOLATILE (which is btw the default)
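
For what it's worth, a minimal illustration of the three labels (the rates
and hits tables are hypothetical):

-- never touches tables, same output for the same input
CREATE FUNCTION add_one(int) RETURNS int AS 'SELECT $1 + 1'
  LANGUAGE sql IMMUTABLE;

-- only reads from tables, never modifies them
CREATE FUNCTION current_rate() RETURNS numeric AS 'SELECT rate FROM rates LIMIT 1'
  LANGUAGE sql STABLE;

-- modifies data, so it must stay VOLATILE (the default)
CREATE FUNCTION log_hit() RETURNS void AS 'INSERT INTO hits DEFAULT VALUES'
  LANGUAGE sql VOLATILE;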
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.

2010-07-18 Thread Tatsuo Ishii
From: Rajesh Kumar Mallah mallah.raj...@gmail.com
Subject: Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.
Date: Mon, 19 Jul 2010 08:06:09 +0530
Message-ID: aanlktilggkbmc9h7wlhlcdqfm5rjth1-9dpf8golv...@mail.gmail.com

  Thanks for the thought, but it (-C) does not work.

Still you need:

pgbench's -c <= (pool_size + reserve_pool_size)
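
For example, with hypothetical pool settings pool_size = 32 and
reserve_pool_size = 8, keep pgbench at or below 40 clients:

$ pgbench -c 40 -t 1000 mydb   # -c must not exceed pool_size + reserve_pool_size (32 + 8 here)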
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] multiple apaches against single postgres database

2007-10-24 Thread Tatsuo Ishii
Sorry for an off topic posting...

Michal,

 Honza Novak wrote:
  And my questions:
  1. Does someone have similar experience, or a clue what to do with it?
 
 Sure, this is considered normal behavior for web applications. The 
 solution is to use connection pooling.
 
  2. What is correct setup of postgresql backend serving data for many 
  (4+) apaches? i know that there are connection pooling solutions 
  (pgPool, pgBouncer, or apache 2.2) and i'm thinking about them, but it 
  seems that we have other problem beside that we didn't implement any 
  pooling solution yet.
 
 We use pgpool running on each web server. You can have also the pgpool 
 running on the database server or even a separate server just for that. 
 You'll have to test to see what's best for you.

As a member of the pgpool development team, I am always looking for
real-world pgpool examples that could be made public. Can you please
tell me more details about your pgpool usage, if possible?
--
Tatsuo Ishii
SRA OSS, Inc. Japan

  3. is there a way to somehow log what happened to the postgres server 
  before accident? do you think that logging of all sql statements would 
  help me? if i enable it, what will be the performance overhead?
 
 What you are seeing is called positive feedback. Once the server 
 reaches a certain performance threshold, it starts to delay the queries, 
 which causes more load, which causes further delay, until everything 
 comes to a halt. Sometimes the system can recover from this, if you have 
 properly setup limits (it will just refuse the requests until it can 
 cool off), sometimes it doesn't. The point is never get over the threshold.
 
 Also, maybe you need better hardware for that kind of load, but since 
 you didn't provide more detail, we can't tell you.
 
 It's quite meaningless to analyze performance once the system is 
 overloaded. You have to analyze before that happens and identify the 
 longest running queries under normal load and try to optimize them. 
 Under heavy load, even the simplest query may seem to be taking long 
 time, but it doesn't necessarily mean there is something wrong with it.
 
 -- 
 Michal Táborský
 chief systems architect
 Internet Mall, a.s.
 http://www.MALL.cz
 
 ---(end of broadcast)---
 TIP 5: don't forget to increase your free space map settings

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL

2006-07-13 Thread Tatsuo Ishii
  Tatsuo Ishii [EMAIL PROTECTED] writes:
   Interesting. We (some Japanese companies including SRA OSS,
   Inc. Japan) did some PG scalability testing using Unisys's big 16
   (physical) CPU machine and found PG scales up to 8 CPUs. However,
   beyond 8 CPUs PG does not scale anymore. The result can be viewed at the
   OSS iPedia web site (http://ossipedia.ipa.go.jp). Our conclusion, from
   analyzing the oprofile results, was that PG has a serious lock
   contention problem in that environment.
  
  18% in s_lock is definitely bad :-(.  Were you able to determine which
  LWLock(s) are accounting for the contention?
 
 Yes. We were interested in that too. Some people did additional tests
 to determine that. I don't have the report handy now. I will report
 back next week.

Sorry for the delay. Finally I got the oprofile data. It's huge (34MB).
If you are interested, I can put it somewhere; please let me know.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL

2006-06-16 Thread Tatsuo Ishii
 I am thrill to inform you all that Sun has just donated a fully loaded 
 T2000 system to the PostgreSQL community, and it's being setup by Corey 
 Shields at OSL (osuosl.org) and should be online probably early next 
 week. The system has
 
 * 8 cores, 4 hw threads/core @ 1.2 GHz. Solaris sees the system as 
 having 32 virtual CPUs, and each can be enabled or disabled individually
 * 32 GB of DDR2 SDRAM memory
 * 2 @ 73GB internal SAS drives (10K RPM)
 * 4 Gigabit ethernet ports
 
 For complete spec, visit 
 http://www.sun.com/servers/coolthreads/t2000/specifications.jsp
 
 I think this system is well suited for PG scalability testing, among 
 others. We did an informal test using an internal OLTP benchmark and 
 noticed that PG can scale to around 8 CPUs. Would be really cool if all 
 32 virtual CPUs can be utilized!!!

Interesting. We (some Japanese companies including SRA OSS,
Inc. Japan) did some PG scalability testing using Unisys's big 16
(physical) CPU machine and found PG scales up to 8 CPUs. However,
beyond 8 CPUs PG does not scale anymore. The result can be viewed at the
OSS iPedia web site (http://ossipedia.ipa.go.jp). Our conclusion, from
analyzing the oprofile results, was that PG has a serious lock
contention problem in that environment.

You can take a look at the detailed report at:
http://ossipedia.ipa.go.jp/capacity/EV0604210111/
(unfortunately only Japanese content is available at the moment;
please use an automatic translation service)

Evaluation environment:
PostgreSQL 8.1.2
OSDL DBT-1 2.1
Miracle Linux 4.0
Unisys ES700 Xeon 2.8GHz CPU x 16 Mem 16GB(HT off)
--
Tatsuo Ishii
SRA OSS, Inc. Japan

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL

2006-06-16 Thread Tatsuo Ishii
 Tatsuo Ishii [EMAIL PROTECTED] writes:
  Interesting. We (some Japanese companies including SRA OSS,
  Inc. Japan) did some PG scalability testing using Unisys's big 16
  (physical) CPU machine and found PG scales up to 8 CPUs. However,
  beyond 8 CPUs PG does not scale anymore. The result can be viewed at the
  OSS iPedia web site (http://ossipedia.ipa.go.jp). Our conclusion, from
  analyzing the oprofile results, was that PG has a serious lock
  contention problem in that environment.
 
 18% in s_lock is definitely bad :-(.  Were you able to determine which
 LWLock(s) are accounting for the contention?

Yes. We were interested in that too. Some people did additional tests
to determine that. I don't have the report handy now. I will report
back next week.

 The test case seems to be spending a remarkable amount of time in LIKE
 comparisons, too.  That probably is not a representative condition.

I know. I think the point is that the 18% in s_lock only appears with 12 CPUs or more.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] High context switches occurring

2005-12-11 Thread Tatsuo Ishii
 Bruce Momjian pgman@candle.pha.pa.us writes:
  Tom Lane wrote:
  1. You don't want number of clients (-c) much higher than scaling factor
  (-s in the initialization step).
 
  Should we throw a warning when someone runs the test this way?
 
 Not a bad idea (though of course only for the standard scripts).
 Tatsuo, what do you think?

That would be annoying, since almost every user would get that kind of
warning. What about improving the README instead?
--
Tatsuo Ishii
SRA OSS, Inc. Japan

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-25 Thread Tatsuo Ishii
   Tatsuo Ishii [EMAIL PROTECTED] writes:
Probably VACUUM works well for small to medium size tables, but not
for huge ones. I'm considering about to implement on the spot
salvaging dead tuples.
   
   That's impossible on its face, except for the special case where the
   same transaction inserts and deletes a tuple.  In all other cases, the
   transaction deleting a tuple cannot know whether it will commit.
  
  Of course. We need to keep a list of such tuples until commit or
  abort.
 
 what about other transactions, which may have started before the current
 one and still be running when the current one commits?

Then those dead tuples should be left as they are. Perhaps in this case we
could register them in the FSM or somewhere else for later processing.
--
Tatsuo Ishii

 I once proposed an extra parameter added to VACUUM FULL which determines
 how much free space to leave in each page vacuumed. If there were room
 the new tuple could be placed near the old one in most cases and thus
 avoid lots of disk head movement when updating huge tables in one go.
 
 
 
 Hannu Krosing [EMAIL PROTECTED]
 
 ---(end of broadcast)---
 TIP 6: Have you searched our list archives?
 
http://archives.postgresql.org
 

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] PgPool changes WAS: PostgreSQL clustering VS MySQL

2005-01-25 Thread Tatsuo Ishii
 Peter, Ragnar,
 
   Are there ones that you use which might use several different connections
   to send a series of queries from a single web-user, less than 5 seconds
   apart?
 
  Using Apache/Perl I often have a situation where we're sending several
  queries from the same user (web client) within seconds, or even
  simultaneously, that use different connections.
 
 So from the sound of it, the connection methods I've been using are the 
 exception rather than the rule.   Darn, it worked well for us.  :-(
 
 What this would point to is NOT being able to use Slony-I for database server 
 pooling for most web applications.   Yes?  Users should look to pgCluster and 
 C-JDBC instead.

Yup. That's the limitation of async replication solutions.

 BTW, Tatsuo, what's the code relationship between pgPool and pgCluster, if 
 any?

PGCluster consists of three kinds of servers: a load balance server,
cluster servers (modified PostgreSQL backends) and a replication
server. I believe some of pgpool's code is used in the load balance
server to avoid reinventing the wheel. That is a beauty of open
source software projects.
--
Tatsuo Ishii

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] PgPool changes WAS: PostgreSQL clustering VS MySQL

2005-01-24 Thread Tatsuo Ishii
 On Mon, 2005-01-24 at 09:52 -0800, Josh Berkus wrote:
  [about keeping connections open in web context]
  Ah, clarity problem here.I'm talking about connection pooling tools 
  from 
  the client (webserver) side, such as Apache::DBI, PHP's pg_pconnect, 
  Jakarta's connection pools, etc.   Not pooling on the database server side, 
  which is what pgPool provides.
 
 note that these sometimes do not provide connection pooling as such,
 just persistent connections (Apache::DBI)

Right. The same can be said of pg_pconnect.

  Most of these tools allocate a database connection to an HTTP/middleware 
  client, and only release it after a specific period of inactivity.This 
  means that you *could* count on web-user==connection for purposes of 
  switching back and forth to the master -- as long as the 
  connection-recycling 
  timeout were set higher than the pgPool switch-off period.
 
 no. you can only count on web-server-process==connection, but not
 web-user==connection, unless you can guarantee that the same user
 client always connects to the same web-server process.

I have the same opinion.

 am i missing something ?
--
Tatsuo Ishii

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-23 Thread Tatsuo Ishii
 Tatsuo,
 
  I'm not clear what "pgPool only needs to monitor update switching by
  *connection* not by *table*" means. In your example:
   (1) 00:00 User A updates My Profile
   (2) 00:01 My Profile UPDATE finishes executing.
   (3) 00:02  User A sees My Profile re-displayed
   (6) 00:04  My Profile:UserA cascades to the last Slave server
 
  I think (2) and (3) are on different connections, so pgpool cannot
  judge whether the SELECT in (3) should go only to the master or not.
 
  To solve the problem you would need to make pgpool understand web
  sessions, not just database connections, and it seems impossible for
  pgpool to understand sessions.
 
 Depends on your connection pooling software, I suppose.  Most connection 
 pooling software only returns connections to the pool after a user has been 
 inactive for some period ... generally more than 3 seconds.  So connection 
 continuity could be trusted.

Not sure what you mean by "most connection pooling software", but I'm
sure that pgpool behaves differently.
--
Tatsuo Ishii

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-23 Thread Tatsuo Ishii
 Tatsuo Ishii [EMAIL PROTECTED] writes:
  Probably VACUUM works well for small to medium size tables, but not
  for huge ones. I'm considering implementing on-the-spot salvaging of
  dead tuples.
 
 That's impossible on its face, except for the special case where the
 same transaction inserts and deletes a tuple.  In all other cases, the
 transaction deleting a tuple cannot know whether it will commit.

Of course. We need to keep a list of such tuples until commit or
abort.
--
Tatsuo Ishii

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-21 Thread Tatsuo Ishii
 Tatsuo,
 
  Yes. However, it would be pretty easy to modify pgpool so that it could
  cope with Slony-I, i.e.:
 
  1) pgpool does the load balancing and sends the query to both Slony-I's
 slave and master if the query is a SELECT.
 
  2) pgpool sends the query only to the master if the query is anything
 other than a SELECT.
 
  The remaining problem is that Slony-I is not a synchronous replication
  solution, so you need to be prepared for the load-balanced query
  results to differ among servers.
 
 Yes, please, some of us are already doing the above ad-hoc.
 
 The simple workaround to replication lag is to calculate the longest likely 
 lag (3 seconds if Slony is tuned right) and have the dispatcher (pgpool) 
 send all requests from that connection to the master for that period.   Then 
 it switches back to pool mode where the slaves may be used.

Can I ask a question?

Suppose table A gets updated on the master at time 00:00. Until 00:03
pgpool needs to send all queries regarding A to the master only. My
question is: how can pgpool know whether a query is related to A?
--
Tatsuo Ishii

 Of course, all of the above is only useful if you're doing a web app where 
 96% 
 of query activity is selects.   For additional scalability, put all of your 
 session maintenance in memcached, so that you're not doing database writes 
 every time a page loads.
 
 -- 
 Josh Berkus
 Aglio Database Solutions
 San Francisco
 

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-21 Thread Tatsuo Ishii
 Tatsuo,
 
  Suppose table A gets updated on the master at time 00:00. Until 00:03
  pgpool needs to send all queries regarding A to the master only. My
  question is, how can pgpool know a query is related to A?
 
 Well, I'm a little late to head off tangential discussion about this, but ...
 
 The systems where I've implemented something similar are for web 
 applications.  
 In the case of the web app, you don't care if a most users see data which is 
 2 seconds out of date; with caching and whatnot, it's often much more than 
 that!
 
 The one case where it's not permissible for a user to see old data is the 
 case where the user is updating the data.   Namely:
 
 (1) 00:00 User A updates My Profile
 (2) 00:01 My Profile UPDATE finishes executing.
 (3) 00:02  User A sees My Profile re-displayed
 (6) 00:04  My Profile:UserA cascades to the last Slave server
 
 So in an application like the above, it would be a real problem if User A 
 were 
 to get switched over to a slave server immediately after the update; she 
 would see the old data, assume that her update was not saved, and update 
 again.  Or send angry e-mails to [EMAIL PROTECTED]   
 
 However, it makes no difference what User B sees:
 
 (1) 00:00 User A updates My Profilev1   Master
 (2) 00:01 My Profile UPDATE finishes executing. Master
 (3) 00:02  User A sees My Profilev2 displayed   Master
 (4) 00:02  User B requests MyProfile:UserA  Slave2
 (5) 00:03  User B sees My Profilev1 Slave2
 (6) 00:04  My Profilev2 cascades to the last Slave server  Slave2
 
 If the web application is structured properly, the fact that UserB is seeing 
 UserA's information which is 2 seconds old is not a problem (though it might 
 be for web auctions, where it could result in race conditions.   Consider 
 memcached as a helper).   This means that pgPool only needs to monitor 
 update switching by *connection* not by *table*.
 
 Make sense?

I'm not clear what "pgPool only needs to monitor update switching by
*connection* not by *table*" means. In your example:

 (1) 00:00 User A updates My Profile
 (2) 00:01 My Profile UPDATE finishes executing.
 (3) 00:02  User A sees My Profile re-displayed
 (6) 00:04  My Profile:UserA cascades to the last Slave server

I think (2) and (3) are on different connections, so pgpool cannot
judge whether the SELECT in (3) should go only to the master or not.

To solve the problem you would need to make pgpool understand web
sessions, not just database connections, and it seems impossible for
pgpool to understand sessions.
--
Tatsuo Ishii

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-21 Thread Tatsuo Ishii
IMO the bottleneck is not WAL but table/index bloat. Lots of updates
on large tables will produce lots of dead tuples. The problem is, there
is no effective way to reuse these dead tuples, since VACUUM on huge
tables takes a long time. 8.0 adds new vacuum delay
parameters. Unfortunately this does not help; it just makes the
execution time of VACUUM longer, which means more and more dead tuples
are created while updating.

Probably VACUUM works well for small to medium size tables, but not
for huge ones. I'm considering implementing on-the-spot salvaging of
dead tuples.
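
(The 8.0 settings referred to above are the cost-based vacuum delay
parameters; illustrative postgresql.conf values below.)

vacuum_cost_delay = 10    # ms to sleep each time the cost limit is reached (0 disables the feature, the default)
vacuum_cost_limit = 200   # accumulated I/O cost that triggers the sleep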
--
Tatsuo Ishii

 This is probably a lot easier than you would think.  You say that your 
 DB will have lots of data, lots of updates and lots of reads.
 
 Very likely the disk bottleneck is mostly index reads and writes, with 
 some critical WAL fsync() calls.  In the grand scheme of things, the 
 actual data is likely not accessed very often.
 
 The indexes can be put on a RAM disk tablespace and that's the end of 
 index problems -- just make sure you have enough memory available.  Also 
 make sure that the machine can restart correctly after a crash: the 
 tablespace is dropped and recreated, along with the indexes.  This will 
 cause a machine restart to take some time.
 
 After that, if the WAL fsync() calls are becoming a problem, put the WAL 
 files on a fast RAID array, either a card or external enclosure, that 
 has a good amount of battery-backed write cache.  This way, the WAL 
 fsync() calls will flush quickly to the RAM and Pg can move on while the 
 RAID controller worries about putting the data to disk.  With WAL, low 
 access time is usually more important than total throughput.
 
 The truth is that you could have this running for not much money.
 
 Good Luck,
 Marty
 
  On Thursday 20 January 2005 19:09, Bruno Almeida do Lago wrote:
Could you explain us what do you have in mind for that solution? I mean,
forget the PostgreSQL (or any other database) restrictions and 
  explain us
how this hardware would be. Where the data would be stored?
   
I've something in mind for you, but first I need to understand your 
  needs!
  
  I just want to make a big database as explained in my first mail ... At the
  beginning we will have approx. 150 000 000 records ... each month we will 
  add
  about 4/8 millions new rows in constant flow during the day ... and in same
  time web users will access to the database in order to read those data.
  Stored data are quite close to data stored by google ... (we are not 
  making a
  google clone ... just a lot of data many small values and some big ones ...
  that's why I'm comparing with google for data storage).
  Then we will have a search engine searching into those data ...
  
  Dealing about the hardware, for the moment we have only a bi-pentium Xeon
  2.8Ghz with 4 Gb of RAM ... and we saw we had bad performance results 
  ... so
  we are thinking about a new solution with maybe several servers (server
  design may vary from one to other) ... to get a kind of cluster to get 
  better
  performance ...
  
  Am I clear ?
  
  Regards,
 
 
 
 
 ---(end of broadcast)---
 TIP 2: you can get off all lists at once with the unregister command
 (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
 

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-21 Thread Tatsuo Ishii
 Peter, Tatsuo:
 
 What would happen with SELECT queries that, through a function or some
  other mechanism, update data in the database?  Would those need to be
  passed to pgpool in some special way?
 
 Oh, yes, that reminds me.  It would be helpful if pgPool accepted a control 
 string ... perhaps one in a SQL comment ... which indicated that the 
 statement to follow was, despite appearances, an update.  For example:
 --STATEMENT_IS_UPDATE\n

Actually, the way pgpool judges whether a statement is a pure SELECT is
very simple: pgpool just checks whether the SQL statement begins with
exactly the word SELECT (case insensitive, of course). So, for example,
you could insert an SQL comment such as /* this SELECT has side
effects */ at the beginning of the line to indicate that pgpool should
not send the query to the slave.
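
For example (some_update_function is the same hypothetical function used
below), the leading comment keeps the statement from being load balanced:

/* this SELECT has side effects */ SELECT some_update_function();  -- goes to the master only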

 The alternative is, of course, that pgPool direct all explicit
 transactions to the master ... which is a good idea anyway. So you could do:
 
 BEGIN;
 SELECT some_update_function();
 COMMIT;

Yes. pgpool already does this for load balancing. Extending it for
Slony-I is pretty easy.
--
Tatsuo Ishii

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Tatsuo Ishii
 On January 20, 2005 06:49 am, Joshua D. Drake wrote:
  Stephen Frost wrote:
  * Herv? Piedvache ([EMAIL PROTECTED]) wrote:
  Le Jeudi 20 Janvier 2005 15:30, Stephen Frost a écrit :
  * Herv? Piedvache ([EMAIL PROTECTED]) wrote:
  Is there any solution with PostgreSQL matching these needs ... ?
  
  You might look into pg_pool.  Another possibility would be slony, though
  I'm not sure it's to the point you need it at yet, depends on if you can
  handle some delay before an insert makes it to the slave select systems.
  
  I think not ... pgpool or slony are replication solutions ... but as I
   have said to Christopher Kings-Lynne how I'll manage the scalabilty of
   the database ? I'll need several servers able to load a database growing
   and growing to get good speed performance ...
  
  They're both replication solutions, but they also help distribute the
  load.  For example:
  
  pg_pool will distribute the select queries among the servers.  They'll
  all get the inserts, so that hurts, but at least the select queries are
  distributed.
  
  slony is similar, but your application level does the load distribution
  of select statements instead of pg_pool.  Your application needs to know
  to send insert statements to the 'main' server, and select from the
  others.
 
  You can put pgpool in front of replicator or slony to get load
  balancing for reads.
 
 Last time I checked, load balanced reads were only available in pgpool if
 you were using pgpool's internal replication.  Has something changed recently?

Yes. However, it would be pretty easy to modify pgpool so that it could
cope with Slony-I, i.e.:

1) pgpool does the load balancing and sends the query to both Slony-I's
   slave and master if the query is a SELECT.

2) pgpool sends the query only to the master if the query is anything
   other than a SELECT.

The remaining problem is that Slony-I is not a synchronous replication
solution, so you need to be prepared for the load-balanced query results
to differ among servers.

If there's enough demand, I would make such enhancements to pgpool.
--
Tatsuo Ishii

  Is there any other solution than a Cluster for our problem ?
  
  Bigger server, more CPUs/disks in one box.  Try to partition up your
  data some way such that it can be spread across multiple machines, then
  if you need to combine the data have it be replicated using slony to a
  big box that has a view which joins all the tables and do your big
  queries against that.
  
  But I'll arrive at the limitation of a box size quickly, I think: 4
   processors with 64 GB of RAM ... and after that?
 
  Opteron.
 
 IBM Z-series, or other big iron.
 
 
  Go to non-x86 hardware after if you're going to continue to increase the
  size of the server.  Personally I think your better bet might be to
  figure out a way to partition up your data (isn't that what google
  does anyway?).
  
 Stephen
 
 -- 
 Darcy Buskermolen
 Wavefire Technologies Corp.
 ph: 250.717.0200
 fx:  250.763.1759
 http://www.wavefire.com
 
 ---(end of broadcast)---
 TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
 

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] determining max_fsm_pages

2004-10-29 Thread Tatsuo Ishii
 Pg: 7.4.5
 8G ram
 200G RAID5
 
 I have my  fsm set as such:
 max_fsm_pages = 30  # min max_fsm_relations*16, 6 bytes each
 max_fsm_relations = 500 # min 100, ~50 bytes each
 
 
 I just did a vacuum full on one table and saw this result:
 INFO:  analyzing cdm.cdm_fed_agg_purch
 INFO:  cdm_fed_agg_purch: 667815 pages, 3000 rows sampled, 52089570 
 estimated total rows
 
 
 My question is this: I have about 8 databases running on this server.
 When I do a vacuum full on each of these databases, there is an INFO
 section that I assume is the total pages used for that database. Should I
 add ALL these individual page counts together, pad the total, and use this
 as my new max_fsm_pages?  Should I do the same thing with max_fsm_relations?

I think that's too much, and in my opinion an overly large FSM hurts
performance. The easiest way to calculate an appropriate FSM size is to
run vacuumdb -a -v and watch the messages. At the very end, you will
see something like:

INFO:  free space map: 13 relations, 1447 pages stored; 1808 total pages needed
DETAIL:  Allocated FSM size: 100 relations + 1600 pages = 19 kB shared memory.

In this case 1808 is the minimum FSM size. Of course this number will
change depending on how frequently you VACUUM, so you need to leave some
head room in the FSM size.
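As a concrete sketch of that procedure (the padded values at the end are
only illustrative, not a recommendation for your workload):

#!/bin/sh
# Run a verbose vacuum over all databases and keep only the last free space
# map summary pair; works with the 7.x/8.x style messages quoted above.
vacuumdb -a -v 2>&1 | grep -A1 'free space map' | tail -2

# Suppose it ends with:
#   INFO:  free space map: 13 relations, 1447 pages stored; 1808 total pages needed
# Then pick settings comfortably above those numbers and put them in
# postgresql.conf, followed by a server restart, e.g.:
#   max_fsm_pages     = 3000    # > 1808 "total pages needed", with head room
#   max_fsm_relations = 100     # > 13 relations actually needing tracking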
--
Tatsuo Ishii

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Caching of Queries

2004-10-07 Thread Tatsuo Ishii
 Tatsuo Ishii [EMAIL PROTECTED] writes:
  First, it's not a particular problem with pgpool. As far as I know any
  connection pool solution has exactly the same problem. Second, it's
  easy to fix if PostgreSQL provides functionality such as: drop all
  temporary tables, if any exist.
 
 I don't like that definition exactly --- it would mean that every time
 we add more backend-local state, we expect client drivers to know to
 issue the right incantation to reset that kind of state.
 
 I'm thinking we need to invent a command like RESET CONNECTION that
 resets GUC variables, drops temp tables, forgets active NOTIFYs, and
 generally does whatever else needs to be done to make the session state
 appear virgin.  When we add more such state, we can fix it inside the
 backend without bothering clients.

Great. It's much better than I propose.
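Until such a command exists, a pooler has to spell the cleanup out itself.
A minimal shell sketch of that check-in step; the connection parameters are
placeholders, and DISCARD ALL is what later PostgreSQL releases eventually
provided for this, so older servers only have the individual pieces:

#!/bin/sh
# Sketch of the "reset the session before reuse" step a pooler performs when
# a client hands a connection back. Connection parameters are placeholders.
PSQL_OPTS="-h 127.0.0.1 -p 5432 -U pool"

# On later servers one statement covers GUCs, temp tables, LISTENs and plans:
psql $PSQL_OPTS -c "DISCARD ALL;" bench

# On older servers, only the separate pieces are available:
psql $PSQL_OPTS bench <<'SQL'
ABORT;          -- make sure no transaction is left open (warns if none was)
RESET ALL;      -- GUC variables back to their defaults
UNLISTEN *;     -- forget active NOTIFY registrations
SQL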

 I now realize that our RESET ALL command for GUC variables was not
 fully thought out.  We could possibly redefine it as doing the above,
 but that might break some applications ...
 
   regards, tom lane
 

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] Caching of Queries

2004-10-06 Thread Tatsuo Ishii
  I don't know what exactly you are referring to in the above URL
  when you are talking about potential pitfalls of pooling.
  Please explain further.
 
 Sorry, I wasn't implying that pgpool doesn't deal with the issues, just that
 some people aren't necessarily aware of them up front.  For instance, pgpool
 does an 'abort transaction' and a 'reset all' in lieu of a full reconnect
 (of course, since a full reconnect is exactly what we are trying to avoid).
 Is this is enough to guarantee that a given pooled connection behaves
 exactly as a non-pooled connection would from a client perspective?  For
 instance, temporary tables are usually dropped at the end of a session, so a
 client (badly coded perhaps) that does not already use persistent
 connections might be confused when the sequence 'connect, create temp table
 foo ..., disconnect, connect, create temp table foo ...' results in the
 error 'Relation 'foo' already exists'.

First, it's not a particular problem with pgpool. As far as I know any
connection pool solution has exactly the same problem. Second, it's
easy to fix if PostgreSQL provides functionality such as: drop all
temporary tables, if any exist. I think we should implement it if we
agree that connection pooling should be implemented outside the
PostgreSQL engine itself. I think the core developers agree with this.
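In the meantime a pooler (or client) can approximate it from the catalogs.
A rough shell sketch; the connection parameters (including the pgpool port
9999) are placeholders, and pg_my_temp_schema() plus psql's \gexec are
assumptions from later PostgreSQL releases, so on the versions discussed
here you would have to match the backend's pg_temp_N schema yourself:

#!/bin/sh
# Sketch: generate and execute DROP statements for the current session's
# temporary tables. Both steps must run on the SAME connection, because
# pg_temp is private to each backend, so everything goes through one psql.
psql -h 127.0.0.1 -p 9999 -U app bench <<'SQL'
SELECT 'DROP TABLE pg_temp.' || quote_ident(c.relname)
FROM   pg_class c
JOIN   pg_namespace n ON n.oid = c.relnamespace
WHERE  c.relkind = 'r'
AND    n.oid = pg_my_temp_schema()
\gexec
SQL

Later releases eventually bundled roughly this behaviour into DISCARD TEMP.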
--
Tatsuo Ishii

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Caching of Queries

2004-10-03 Thread Tatsuo Ishii
 More to the point though, I think this is a feature that really really 
 should be in the DB, because then it's trivial for people to use.  
 
 
 
 How does putting it into PGPool make it any less trivial for people to
 use?
 
 The answers are at  http://www2b.biglobe.ne.jp/~caco/pgpool/index-e.html 
 .  Specifically, it's a separate application that needs configuration, 
 the homepage has no real discussion of the potential pitfalls of pooling 
 and what this implementation does to get around them, you get the idea.  

I don't know what exactly you are referring to in the above URL when you
are talking about potential pitfalls of pooling. Please explain
further.
--
Tatsuo Ishii

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Performance Bottleneck

2004-08-08 Thread Tatsuo Ishii
 Jeff wrote:
  
  On Aug 8, 2004, at 1:29 AM, Martin Foster wrote:
  
  I am currently making use of Apache::DBI which overrides the
  DBI::disconnect call and keeps a pool of active connections for use
  when need be.   Since it offloads the pooling to the webserver and does
  not add another layer of complexity, it seems more advantageous than
  pgpool, even though pgpool can run on an external system.
 
  
  Apache::DBI is not the same sort of pool as pgpool.  DB connections
  are not shared among all your apache children (a common misconception).
  So if you have 300 apache kids you can have 300 db connections.
  With pgpool, connections are shared among all of them, so even though you
  have 300 kids you only have, say, 32 db connections.
  
 
 It seems that you are right; I never noticed that in the documentation
 before.   I always assumed it had something to do with the
 long-lasting/persistent scripts that would remain in transactions for
 extended periods of time.
 
 Here is an odd question.   While the server runs 7.4.x, the client
 connects with 7.3.x.  Would this in itself make a difference in
 performance, as the protocols are different?   At least based on
 pgpool's documentation.

In this case the server falls back from the V3 protocol (used in 7.4 or
later) to the V2 protocol (used from 6.4 to 7.3.x). As far as
pgpool is concerned, the performance difference is significant. Of course
that depends on the implementation, though.

FYI here is the outline of the testing using pgbench.

H/W: Pentium4 2.4GHz x2/memory 1GB/HDD IDE 80GB (all PCs are same spec)
S/W: RedHat Linux 9/PostgreSQL 7.3.6/7.4.3

postgresql.conf:
tcpip_socket = true
max_connections = 512
shared_buffers = 2048

host A: pgbench, host B: pgpool, host C: PostgreSQL 7.3.6 or 7.4.3
pgbench parameters: -S -c 10 -t 1000

result:
                                             TPS          ratio(7.4.3)  ratio(7.3.6)
without pgpool                               4357.625059  100%          100%
with pgpool (connection pool mode)           4330.290294  99.4%         94.1%
with pgpool (replication mode)               4297.614996  98.6%         87.6%
with pgpool (replication with strict mode)   4270.223136  98.0%         81.5%
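For reference, the runs above can be approximated with something like the
following rough sketch; the host names, the pgpool port (9999) and the
scale factor are placeholders:

#!/bin/sh
# Sketch of the read-only comparison above: the same pgbench options, once
# against PostgreSQL directly and once through pgpool.
DBHOST=hostC      # PostgreSQL 7.3.6 or 7.4.3
POOLHOST=hostB    # pgpool (connection pool / replication mode)

pgbench -i -s 10 -h "$DBHOST" -p 5432 bench            # initialize test data once

pgbench -S -c 10 -t 1000 -h "$DBHOST"   -p 5432 bench  # direct to PostgreSQL
pgbench -S -c 10 -t 1000 -h "$POOLHOST" -p 9999 bench  # through pgpool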
--
Tatsuo Ishii

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Performance Bottleneck

2004-08-08 Thread Tatsuo Ishii
 Arjen van der Meijden wrote:
 
  On 8-8-2004 16:29, Matt Clark wrote:
  
  There are two well-worn and very mature techniques for dealing with the
  issue of web apps using one DB connection per apache process, both of 
  which
  work extremely well and attack the issue at its source.
 
  1)Use a front-end caching proxy like Squid as an accelerator.  Static
  content will be served by the accelerator 99% of the time.  Additionally,
  large pages can be served immediately to the accelerator by Apache, which
  can then go on to serve another request without waiting for the end 
  user's
  dial-up connection to pull the data down.  Massive speedup, fewer apache
  processes needed.
  
  
  Another version of this 1) is to run with a content accelerator; our 
  favourite is to run Tux in front of Apache. It takes over the 
  connection-handling stuff, has a very low memory profile (compared to
  Apache) and very little overhead. What it does is serve up all
  simple content (although you can have cgi/php/perl and other languages 
  being processed by it, entirely disabling the need for apache in some 
  cases) and forwards/proxies everything it doesn't understand to an 
  Apache/other webserver running at the same machine (which runs on 
  another port).
  
  I think there are a few advantages over Squid; since it is partially 
  done in kernel-space it can be slightly faster in serving up content, 
  apart from its simplicity which will probably matter even more. You'll 
  have no caching issues for pages that should not be cached or static 
  files that change periodically (like every few seconds). Afaik Tux can 
  handle more than 10 times as many ab-generated requests per second as
  a default-compiled Apache on the same machine.
  And besides the speed-up, you can do any request you where able to do 
  before, since Tux will simply forward it to Apache if it didn't 
  understand it.
  
  Anyway, apart from all that: reducing the number of apache connections
  is nice, but not really the same as reducing the number of
  pooled connections using a db-pool... You may even be able to run with
  1000 http connections, 40 apache processes and 10 db connections. In
  the non-pooled setup, you'd still have 40 db connections.
  
  In a simple test I did, though, I felt pgpool had quite some overhead.
  So it should be tested well, to find out where the turnover point is
  at which it becomes a gain instead of a loss...

I don't know what configuration you were using, but I noticed
that UNIX domain sockets are preferred for the connection between
clients and pgpool. When I tested using pgbench -C (which establishes a
new connection for each transaction), the with-pgpool configuration was
10 times faster than the without-pgpool configuration when using UNIX
domain sockets, while there was only a 3.6x speedup with
TCP/IP sockets.
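To make the comparison concrete, a small sketch; the socket directory
(/tmp) and the pgpool port (9999) are assumptions, and libpq interprets a
-h argument beginning with / as a socket directory:

#!/bin/sh
# pgbench -C opens a new connection for every transaction, so connection
# set-up cost dominates; compare TCP/IP against a UNIX domain socket.
pgbench -C -S -c 10 -t 1000 -h 127.0.0.1 -p 9999 bench   # TCP/IP to pgpool
pgbench -C -S -c 10 -t 1000 -h /tmp      -p 9999 bench   # UNIX domain socket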

  Best regards,
  
  Arjen van der Meijden
  
 
 Other than images, there are very few static pages being loaded up by
 the user.   Since they make up a very small portion of the traffic, it
 tends to be an optimization we can forgo for now.
 
 I attempted to make use of pgpool.   At the default of 32 pre-forked
 connections, the webserver almost immediately exhausted the pgpool
 connection pool, and content stopped being served because no new processes
 were being forked to make up for it.
 
 So I raised it to a higher value (256) and it immediately segfaulted and
 dropped a core.   So I'm not sure exactly how to proceed, since I really
 need the thing to fork additional servers as load hits and not the other
 way around.

What version of pgpool did you test? I know that a certain version
(2.0.2) had such a problem. Can you try again with the
latest version of pgpool (2.0.6)?
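Independent of the crash, the sizing rule that usually matters here is that
pgpool pre-forks a fixed number of children and PostgreSQL must be able to
accept all of their backend connections. A hedged sketch follows; the
parameter names follow the pgpool-II style configuration file and the
values are illustrative only, so the 2.0.x config may differ slightly:

#!/bin/sh
# Sketch: keep pgpool's child/pool settings and PostgreSQL's max_connections
# consistent with each other.
cat >> pgpool.conf <<'EOF'
num_init_children = 256   # concurrent client sessions pgpool will accept
max_pool = 2              # cached backend connections per child
EOF

cat >> postgresql.conf <<'EOF'
max_connections = 520     # >= num_init_children * max_pool, plus some spare
EOF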
--
Tatsuo Ishii

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster