[PERFORM] Will shared_buffers crash a server

2011-04-29 Thread Qiang Wang
Hi,


We have PostgreSQL 8.3 running on a Debian Linux server. We built an application 
using PHP and a Postgres database, and there are approximately 150 users working 
in the software constantly. We had some performance degradation, and after some 
investigation we concluded that we needed to tune the PostgreSQL 
configuration. 


We have 10GB of memory and we tuned PostgreSQL as follows:
- max_connections = 100
- effective_cache_size = 5GB
- shared_buffers = 2GB
- wal_buffers = 30MB
- work_mem = 50MB

However, we have suffered two server crashes since tuning the configuration. 
Does anyone have any idea how this can happen?
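
For reference, a quick way to confirm which values the running server actually 
picked up is to query pg_settings (a minimal sketch; these are the standard GUC 
names for the settings listed above):

  -- Show the effective value, unit and origin of each tuned parameter.
  SELECT name, setting, unit, source
  FROM pg_settings
  WHERE name IN ('max_connections', 'shared_buffers', 'wal_buffers',
                 'work_mem', 'effective_cache_size');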

BR 

Kevin Wang


Re: [PERFORM] Will shared_buffers crash a server

2011-04-29 Thread Claudio Freire
As for the question in the title: no, if the server starts, shared
buffers should not be the reason for a subsequent crash.

On Debian, it is common for the kernel's maximum allowed shared memory
setting to prevent the server from even starting, but I guess that's
not your problem (because it did start).



[PERFORM] pgpoolAdmin handling several pgpool-II clusters

2011-04-29 Thread Jorgen
Can the pgpoolAdmin utility handle (administer) more than one pgpool-II cluster?

We need to set up 3 independent Postgres clusters: one cluster
handling cadastral maps, one handling raster maps and one handling vector
maps. Each of these clusters must have a load balancer, e.g. pgpool-II.
Internally in each cluster we plan to use (and have tested) PostgreSQL
9.0.3's own streaming replication. We have installed pgpool-II, and are now
confronted with the complicated installation of the pgpoolAdmin web app. Hence
we would very much like to have only one pgpoolAdmin instance to govern all
3 pgpool-II clusters.

(Alternatively we will go for a more complex configuration with PostgresXC.)

(Have tried to post to http://pgsqlpgpool.blogspot.com - with no success)

Kindest regards,
Jørgen Münster-Swendsen
www.kms.dk



Re: [PERFORM] FUSION-IO io cards

2011-04-29 Thread Joachim Worringen

On 04/29/2011 04:54 PM, Ben Chobot wrote:

> We have a bunch of their cards, purchased when we were still on 8.1 and
> were having difficulty with vacuums. (Duh.) They helped out a bunch for
> that. They're fast, no question about it. Each FusionIO device (they
> have cards with multiple devices) can do ~100k iops. So that's nifty.
>
> On the downside, they're also somewhat exotic, in that they need special
> kernel drivers, so they're not as easy as just buying a bunch of drives.
> More negatively, they're $$$. And even more negatively, their drivers
> are inefficient - expect to dedicate a CPU core to doing whatever they
> need done.


I would recommend having a look at Texas Memory Systems for a 
comparison. FusionIO does a lot of work in software, as Ben noted 
correctly, while TMS (their stuff is called RAMSAN) is more of an 
all-in-hardware device.


I haven't used TMS myself, but I have talked to people who do, and their 
experience with both products is that TMS is problem-free and has more 
deterministic performance. I have in fact benchmarked FusionIO and 
observed non-deterministic performance, which means performance goes 
down significantly on occasion - probably because some software-based 
housekeeping needs to be done.


--
Joachim Worringen
Senior Performance Architect

International Algorithmic Trading GmbH




Re: [PERFORM] FUSION-IO io cards

2011-04-29 Thread gnuoytr
TMS RAMSAN is a DRAM device.  TMS has built DRAM SSDs going back decades, but 
has recently gotten into flash SSDs as well.  The DRAM parts are an order of 
magnitude more expensive than others' flash SSDs, gig by gig.  They are also 
about as fast as off-CPU storage gets.

regards,
Robert




[PERFORM] 8.4.7, incorrect estimate

2011-04-29 Thread Wayne Conrad
Howdy.  We've got a query that takes less than a second unless we add an 
ORDER BY to it, after which it takes 40 seconds.  Here's the query:


select page_number, ps_id, ps_page_id from ps_page where ps_page_id in 
(select ps_page_id from documents_ps_page where document_id in (select 
document_id from temp_doc_ids)) order by ps_page_id;


The parts of the schema used in this query:

                              Table "public.ps_page"
   Column    |  Type   |                          Modifiers
-------------+---------+---------------------------------------------------------------
 ps_page_id  | integer | not null default nextval('ps_page_ps_page_id_seq'::regclass)
 ps_id       | integer | not null
 page_number | integer | not null
Indexes:
    "ps_page_pkey" PRIMARY KEY, btree (ps_page_id)
    "ps_page_ps_id_key" UNIQUE, btree (ps_id, page_number)

        Table "public.documents_ps_page"
   Column    |  Type   | Modifiers
-------------+---------+-----------
 document_id | text    | not null
 ps_page_id  | integer | not null
Indexes:
    "documents_ps_page_pkey" PRIMARY KEY, btree (document_id, ps_page_id)
    "documents_ps_page_ps_page_id_idx" btree (ps_page_id)

temp_doc_ids (temporary table):
  document_id text not null

The query with the order by (slow):

explain analyze select page_number, ps_id, ps_page_id from ps_page where 
ps_page_id in (select ps_page_id from documents_ps_page where 
document_id in (select document_id from temp_document_ids)) order by 
ps_page_id
 Merge Semi Join  (cost=212570.02..3164648.31 rows=34398932 width=12) (actual time=54749.281..54749.295 rows=5 loops=1)
   Merge Cond: (ps_page.ps_page_id = documents_ps_page.ps_page_id)
   ->  Index Scan using ps_page_pkey on ps_page  (cost=0.00..2999686.03 rows=86083592 width=12) (actual time=0.029..36659.393 rows=85591467 loops=1)
   ->  Sort  (cost=18139.39..18152.52 rows=6255 width=4) (actual time=0.080..0.083 rows=5 loops=1)
         Sort Key: documents_ps_page.ps_page_id
         Sort Method:  quicksort  Memory: 25kB
         ->  Nested Loop  (cost=26.23..17808.09 rows=6255 width=4) (actual time=0.044..0.073 rows=5 loops=1)
               ->  HashAggregate  (cost=26.23..27.83 rows=200 width=32) (actual time=0.015..0.017 rows=5 loops=1)
                     ->  Seq Scan on temp_document_ids  (cost=0.00..23.48 rows=1310 width=32) (actual time=0.004..0.007 rows=5 loops=1)
               ->  Index Scan using documents_ps_page_pkey on documents_ps_page  (cost=0.00..88.59 rows=31 width=42) (actual time=0.009..0.010 rows=1 loops=5)
                     Index Cond: (documents_ps_page.document_id = (temp_document_ids.document_id)::text)
 Total runtime: 54753.028 ms

The query without the order by (fast):

production= explain analyze select page_number, ps_id, ps_page_id from 
ps_page where ps_page_id in (select ps_page_id from documents_ps_page 
where document_id in (select document_id from temp_doc_ids));


                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=17821.42..87598.71 rows=34398932 width=12) (actual time=0.099..0.136 rows=5 loops=1)
   ->  HashAggregate  (cost=17821.42..17871.46 rows=6255 width=4) (actual time=0.083..0.096 rows=5 loops=1)
         ->  Nested Loop  (cost=26.23..17808.28 rows=6255 width=4) (actual time=0.047..0.076 rows=5 loops=1)
               ->  HashAggregate  (cost=26.23..27.83 rows=200 width=32) (actual time=0.014..0.015 rows=5 loops=1)
                     ->  Seq Scan on temp_doc_ids  (cost=0.00..23.48 rows=1310 width=32) (actual time=0.005..0.005 rows=5 loops=1)
               ->  Index Scan using documents_ps_page_pkey on documents_ps_page  (cost=0.00..88.59 rows=31 width=42) (actual time=0.010..0.010 rows=1 loops=5)
                     Index Cond: (documents_ps_page.document_id = temp_doc_ids.document_id)
   ->  Index Scan using ps_page_pkey on ps_page  (cost=0.00..11.14 rows=1 width=12) (actual time=0.007..0.007 rows=1 loops=5)
         Index Cond: (ps_page.ps_page_id = documents_ps_page.ps_page_id)
 Total runtime: 0.213 ms
(10 rows)

We notice that in all cases, the plans contain some estimated row counts 
that differ quite a bit from the actual row counts.  We tried increasing 
the statistics targets for each of the indexed columns (from 100 to 1,000 
and then to 10,000), one at a time, analyzing the table/column after each 
change.  This had no effect.
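
A minimal sketch of the kind of statement we mean, for one of the indexed 
columns (we cycled through each indexed column and the targets 1,000 and 
10,000):

  ALTER TABLE documents_ps_page ALTER COLUMN ps_page_id SET STATISTICS 1000;
  ANALYZE documents_ps_page (ps_page_id);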


Postgres version 8.4.7 on AMD64, Debian Linux wheezy (aka testing).

Where should we look next?



Re: [PERFORM] Performance

2011-04-29 Thread Greg Smith

James Mansion wrote:
> Does the server know which IO it thinks is sequential, and which it
> thinks is random?  Could it not time the IOs (perhaps optionally) and
> at least keep some sort of statistics of the actual observed times?


It makes some assumptions based on what the individual query nodes are 
doing.  Sequential scans are obviously sequential; index lookups are 
random; bitmap index scans are random.


The "measure the I/O and determine cache state from latency profile" 
approach has been tried; I believe it was Greg Stark who ran a good 
experiment of that a few years ago.  Based on the difficulties of 
figuring out what you're actually going to do with that data, I don't 
think the idea will
in all these approaches for better modeling what's in cache, and this 
one suffers the worst from that possibility.  If for example you 
discover that accessing index blocks is slow, you might avoid using them 
in favor of a measured fast sequential scan.  Once you've fallen into 
that local minimum, you're stuck there.  Since you never access the 
index blocks, they'll never get into RAM so that accessing them becomes 
fast--even though doing that once might be much more efficient, 
long-term, than avoiding the index.


There are also some severe query plan stability issues with this idea 
beyond this.  The idea that your plan might vary based on execution 
latency, that the system load going up can make query plans alter with 
it, is terrifying for a production server.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books




Re: [PERFORM] FUSION-IO io cards

2011-04-29 Thread Joachim Worringen

On 04/29/2011 06:52 PM, gnuo...@rcn.com wrote:

> TMS RAMSAN is a DRAM device.  TMS has built DRAM SSDs going back
> decades, but has recently gotten into flash SSDs as well.  The DRAM
> parts are an order of magnitude more expensive than others' flash
> SSDs, gig by gig.  They are also about as fast as off-CPU storage gets.


Their naming convention is a bit confusing, but in fact the RamSan boxes 
are available in flash and RAM-based variants:


The RamSan-630 offers 10 TB SLC Flash storage, 1,000,000 IOPS (10 GB/s) 
random sustained throughput, and just 500 watts power consumption.


I was referring to those. Of course, they may be more expensive than 
FusionIO. You get what you pay for (in this case).


--
Joachim Worringen
Senior Performance Architect

International Algorithmic Trading GmbH




Re: [PERFORM] Performance

2011-04-29 Thread Andy Colson

On 4/29/2011 1:55 PM, Greg Smith wrote:

> James Mansion wrote:
>> Does the server know which IO it thinks is sequential, and which it
>> thinks is random? Could it not time the IOs (perhaps optionally) and
>> at least keep some sort of statistics of the actual observed times?
>
> It makes some assumptions based on what the individual query nodes are
> doing. Sequential scans are obviously sequential; index lookups are
> random; bitmap index scans are random.
>
> The "measure the I/O and determine cache state from latency profile"
> approach has been tried; I believe it was Greg Stark who ran a good
> experiment of that a few years ago. Based on the difficulties of
> figuring out what you're actually going to do with that data, I don't
> think the idea will ever go anywhere. There are some really nasty
> feedback loops possible in all these approaches for better modeling
> what's in cache, and this one suffers the worst from that possibility.
> If for example you discover that accessing index blocks is slow, you
> might avoid using them in favor of a measured fast sequential scan.
> Once you've fallen into that local minimum, you're stuck there. Since
> you never access the index blocks, they'll never get into RAM so that
> accessing them becomes fast--even though doing that once might be much
> more efficient, long-term, than avoiding the index.
>
> There are also some severe query plan stability issues with this idea
> beyond this. The idea that your plan might vary based on execution
> latency, that the system load going up can make query plans alter with
> it, is terrifying for a production server.



How about if the stats were kept, but had no effect on plans, the 
optimizer, or anything else?


It would be a diagnostic tool.  When someone wrote to the list saying 
"Ah! It used the wrong index!", you could say: please post your config 
settings and the stats from 'select * from pg_stats_something'.


We (or, you really) could compare the seq_page_cost and random_page_cost 
from the config to the stats collected by PG and determine that they are 
way off... and that you should edit your config a little and restart PG.
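
For the config side of that comparison, the planner's current assumptions 
are already visible today (a minimal sketch; the observed-latency stats 
view above is hypothetical and does not exist yet):

  SELECT name, setting
  FROM pg_settings
  WHERE name IN ('seq_page_cost', 'random_page_cost', 'effective_cache_size');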


-Andy



Re: [PERFORM] Performance

2011-04-29 Thread James Mansion

Greg Smith wrote:
> There are also some severe query plan stability issues with this idea
> beyond this.  The idea that your plan might vary based on execution
> latency, that the system load going up can make query plans alter with
> it, is terrifying for a production server.


I thought I was clear that it should present some stats to the DBA, not 
that it would try to auto-tune?  This thread started with a discussion 
of appropriate tunings for random page cost vs sequential page cost, I 
believe, based on some finger-in-the-air estimate of total size vs 
available disk cache.  And it was observed that on systems that have 
very large databases but modest hot data, you can perform like a fully 
cached system much of the time.


I'm just suggesting providing statistical information to the DBA which 
will indicate whether the system has 'recently' been behaving like a 
system that runs from buffer cache and/or subsystem caches, or one that 
runs from disk platters, and what the actual observed latency difference 
is.  It may well be that this varies with time of day or day of week.  
Whether the actual latencies translate directly into the relative costs 
is another matter.







Re: [PERFORM] 8.4.7, incorrect estimate

2011-04-29 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
> Also, make sure that you run ANALYZE against your temp table right
> before running your query.

Yeah.  It's fairly hard to credit that temp_document_ids has any stats
given the way-off estimates for it.  Keep in mind that autovacuum
can't help you on temp tables: since only your own session can
access a temp table, you have to ANALYZE it explicitly if you need
the planner to have decent stats about it.
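
In other words, something along these lines (a minimal sketch using the
names from this thread):

  CREATE TEMP TABLE temp_doc_ids (document_id text NOT NULL);
  -- ... populate the temp table ...
  ANALYZE temp_doc_ids;  -- must be run explicitly; autovacuum never sees temp tables
  -- ... now run the query, so the planner has realistic row counts ...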

regards, tom lane



Re: [PERFORM] Performance

2011-04-29 Thread Greg Smith

James Mansion wrote:
> I thought I was clear that it should present some stats to the DBA,
> not that it would try to auto-tune?


You were.  But people are bound to make decisions about how to retune 
their database based on that information.  The situation when doing 
manual tuning isn't that much different; it just occurs more slowly, and 
with the potential to not react at all if the data is incoherent.  That 
might be better, but you have to assume that a naive person will just 
follow suggestions on how to re-tune based on that data the same way an 
auto-tune process would.


I don't like this whole approach because it takes something the database 
and DBA have no control over (read timing) and makes it a primary input 
to the tuning model.  Plus, the overhead of collecting this data is big 
relative to its potential value.


Anyway, how to collect this data is a separate problem from what should 
be done with it in the optimizer.  I don't actually care about the 
collection part very much; there are a bunch of approaches with various 
trade-offs.  Deciding how to tell the optimizer about what's cached 
already is the more important problem that needs to be solved before any 
of this takes you somewhere useful, and focusing on the collection part 
doesn't move that forward.  Trying to map the real world into the 
currently exposed parameter set isn't a solvable problem.  We really 
need cached_page_cost and random_page_cost, plus a way to model the 
cached state per relation that doesn't fall easily into feedback loops.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books




Re: [PERFORM] Performance

2011-04-29 Thread Claudio Freire
On Fri, Apr 29, 2011 at 11:37 PM, Greg Smith g...@2ndquadrant.com wrote:
> Anyway, how to collect this data is a separate problem from what should be
> done with it in the optimizer.  I don't actually care about the collection
> part very much; there are a bunch of approaches with various trade-offs.
> Deciding how to tell the optimizer about what's cached already is the more
> important problem that needs to be solved before any of this takes you
> somewhere useful, and focusing on the collection part doesn't move that
> forward.  Trying to map the real world into the currently exposed parameter
> set isn't a solvable problem.  We really need cached_page_cost and
> random_page_cost, plus a way to model the cached state per relation that
> doesn't fall easily into feedback loops.

This is valuable input...

I was already worried about feedback loops, and hearing that it has
been tried and resulted in them is invaluable.

From my experience, what really blows up in your face when your
servers are saturated is the effective cache size: Postgres thinks an
index will fit into the cache, but at times of high load it doesn't,
meaning that a sequential scan would actually be orders of magnitude
better - if it's a small enough table.

Perhaps just adjusting effective cache size would provide a good
enough benefit without the disastrous feedback loops?

I'll have to test that idea...
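
A minimal way to try it by hand, before writing any code (the value below
is illustrative, not a recommendation):

  SET effective_cache_size = '1GB';  -- pretend far less of the database is cached
  SHOW effective_cache_size;
  -- ... re-run EXPLAIN on the affected query and see whether the plan changes ...
  RESET effective_cache_size;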



Re: [PERFORM] index usage on queries on inherited tables

2011-04-29 Thread Robert Haas
On Apr 27, 2011, at 11:11 PM, Joseph Shraibman j...@selectacast.net wrote:
> On 04/27/2011 04:32 PM, Robert Haas wrote:
>> In the first case, PostgreSQL evidently thinks that using the indexes
>> will be slower than just ignoring them.  You could find out whether
>> it's right by trying it with enable_seqscan=off.
>
> My point is that this is just a problem with inherited tables.  It
> should be obvious to postgres that few rows are being returned, but in
> the inherited tables case it doesn't use indexes.  This was just an
> example.  On a 52 GB table, a "select id from table order by id desc
> limit 1" returns instantly, but as soon as you declare a child table
> it tries to seq scan all the tables.

Oh, sorry, I must have misunderstood. As Greg says, this is fixed in 9.1.
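
For reference, the enable_seqscan check mentioned above looks roughly like
this (a sketch; parent_table is a placeholder for the inherited table in
question):

  BEGIN;
  SET LOCAL enable_seqscan = off;  -- steer the planner away from seq scans for this test
  EXPLAIN ANALYZE
    SELECT id FROM parent_table ORDER BY id DESC LIMIT 1;
  ROLLBACK;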

...Robert



Re: [PERFORM] Performance

2011-04-29 Thread Robert Haas
On Apr 29, 2011, at 10:25 AM, James Mansion ja...@mansionfamily.plus.com 
wrote:
> Robert Haas wrote:
>> The server can and does measure hit rates for the PG buffer pool, but
>> to my knowledge there is no clear-cut way for PG to know whether
>> read() is satisfied from the OS cache or a drive cache or the platter.
>
> Does the server know which IO it thinks is sequential, and which it
> thinks is random?

No. It models this in the optimizer, but the executor has no clue.  And 
sometimes we model I/O as partly random, partly sequential, as in the case of 
heap fetches on a clustered index.  So the answer isn't even a Boolean.

...Robert


Re: [PERFORM] Performance

2011-04-29 Thread Robert Haas
On Apr 27, 2011, at 11:01 PM, Claudio Freire klaussfre...@gmail.com wrote:
> The patch may be simple, the testing not so much. I know that.
>
> What tools do we have to do that testing? There are lots, and all
> imply a lot of work. Is that work worth the trouble? Because if it
> is... why not work?
>
> I would propose a step in the right direction: a patch to compute and
> log periodical estimations of the main I/O tunables: random_page_cost,
> sequential_page_cost and effective_cache_size. Maybe per-tablespace.
> Evaluate the performance impact, and work from there.
>
> Because, probably just using those values as input to the optimizer
> won't work, because dbas will want a way to tune the optimizer,
> because the system may not be stable enough, even because even with
> accurate estimates for those values, the optimizer may not perform as
> expected. I mean, right now those values are tunables, not real
> metrics, so perhaps the optimizer won't respond well to real values.
>
> But having the ability to measure them without a serious performance
> impact is a step in the right direction, right?

Sure. It's not a real easy problem, but don't let that discourage you from 
working on it. Getting more eyeballs on these issues can only be a good thing.

...Robert


Re: [PERFORM] pgpoolAdmin handling several pgpool-II clusters

2011-04-29 Thread Tatsuo Ishii
> Can the pgpoolAdmin utility handle (administer) more than one pgpool-II cluster?

No. pgpoolAdmin only supports one pgpool-II server.

> We need to set up 3 independent Postgres clusters: one cluster
> handling cadastral maps, one handling raster maps and one handling
> vector maps. Each of these clusters must have a load balancer, e.g.
> pgpool-II. Internally in each cluster we plan to use (and have tested)
> PostgreSQL 9.0.3's own streaming replication. We have installed
> pgpool-II, and are now confronted with the complicated installation of
> the pgpoolAdmin web app. Hence we would very much like to have only
> one pgpoolAdmin instance to govern all 3 pgpool-II clusters.
>
> (Alternatively we will go for a more complex configuration with PostgresXC.)

Because pgpoolAdmin is a web application, you could assign a browser
tab to each pgpoolAdmin instance.

> (Have tried to post to http://pgsqlpgpool.blogspot.com - with no success)

It's my personal blog :-) Please post to the pgpool-general mailing list.

You can subscribe it from:
http://lists.pgfoundry.org/mailman/listinfo/pgpool-general
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp



[PERFORM] Re: VX_CONCURRENT flag on vxfs( 5.1 or later) for performance for postgresql?

2011-04-29 Thread Greg Smith

On 04/30/2011 12:24 AM, Hsien-Wen Chu wrote:

> I'm a little bit confused about why it is not safe, and my question is
> the following: for a database application, we need to avoid double
> caching. PostgreSQL's shared_buffers will cache the data, so we do not
> want the file system to cache the data too, right? So direct I/O is
> better, right?


No.  There are parts of PostgreSQL that expect the operating system to 
do write caching.  Two examples are the transaction logs and the 
processing done by VACUUM.  If you eliminate that with direct I/O, the 
slowdown can be much, much larger than what you gain by eliminating 
double-buffering on reads.


On the read side, PostgreSQL also expects that operating system features 
like read-ahead are working properly.  While this does introduce some 
double-buffering, the benefits for sequential scans are larger than that 
overhead, too.  You may not get the expected read-ahead behavior if you 
use direct I/O.


Direct I/O is not a magic switch that makes things faster; you have to 
very specifically write your application to work around what it does, 
good and bad, before it can be expected to improve things.  And 
PostgreSQL isn't written that way.  It definitely requires OS caching 
to work well.



> For VXFS, if we use the ioctl(fd, vx_cacheset, vx_concurrent) API,
> according to the vxfs document it will hold a shared lock for write
> operations, but not the exclusive lock; it is also direct I/O.


There are very specific technical requirements that you must follow when 
using direct I/O.  You don't get direct I/O without also following its 
alignment needs.  Read the Direct I/O best practices section of 
http://people.redhat.com/msnitzer/docs/io-limits.txt for a quick intro 
to the subject.  And there's this additional set of requirements you 
mention in order for this particular VXFS feature to work, which I can't 
even comment on.  But you can be sure PostgreSQL doesn't try to do 
either of those things--it's definitely not aligning for direct I/O.  
Has nothing to do with ACID or the filesystem.


Now, the VXFS implementation may do some tricks that bypass the 
alignment requirements.  But even if you got it to work, it would still 
be slower for anything but some read-only workloads.  Double buffering 
is really not that big of a performance problem, you just need to make 
sure you don't set shared_buffers to an extremely large value.
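
If you want to sanity-check how much of shared_buffers is actually in use, 
one rough way is the pg_buffercache contrib module (a sketch, assuming the 
default 8kB block size):

  -- Requires the pg_buffercache contrib module to be installed.
  SELECT count(*) AS buffers_in_use,
         pg_size_pretty(count(*) * 8192::bigint) AS approx_size
  FROM pg_buffercache
  WHERE relfilenode IS NOT NULL;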


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books

