Re: [PERFORM] shared_buffers advice

2010-05-24 Thread Konrad Garus
2010/3/11 Paul McGarry p...@paulmcgarry.com:

 I'm basically wondering how the postgresql cache (ie shared_buffers)
 and the OS page_cache interact. The general advice seems to be to
 assign 1/4 of RAM to shared buffers.

 I don't have a good knowledge of the internals but I'm wondering if
 this will effectively mean that roughly the same amount of RAM being
 used for the OS page cache will be used for redundantly caching
 something that Postgres is caching as well?

I have a similar problem but I can't see an answer in this thread.

Our dedicated server has 16 GB RAM. Among other settings,
shared_buffers is 2 GB and effective_cache_size is 12 GB.

Do shared_buffers duplicate the contents of the OS page cache? If so,
how do I know if 25% of RAM is the right value for me? Intuitively it
would seem not to be - the less redundancy the better.

Another question - is there a tool or built-in statistic that tells
when/how often/how much a table is read from disk? I mean a physical
read, not a pull from the OS cache into shared_buffers.

-- 
Konrad Garus



Re: [PERFORM] shared_buffers advice

2010-05-24 Thread Ben Chobot
On May 24, 2010, at 4:25 AM, Konrad Garus wrote:

 Do shared_buffers duplicate the contents of the OS page cache? If so,
 how do I know if 25% of RAM is the right value for me? Intuitively it
 would seem not to be - the less redundancy the better.

You can look into the pg_buffercache contrib module. 
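
For example, a minimal sketch (assuming the module is installed in the
database you connect to) that shows which relations occupy the most
space in shared_buffers:

  -- Top 10 relations by number of pages currently in shared_buffers.
  -- Assumes the pg_buffercache contrib module is installed.
  SELECT c.relname, count(*) AS buffers
  FROM pg_buffercache b
  JOIN pg_class c ON b.relfilenode = c.relfilenode
  WHERE b.reldatabase = (SELECT oid FROM pg_database
                         WHERE datname = current_database())
  GROUP BY c.relname
  ORDER BY buffers DESC
  LIMIT 10;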

 Another question - is there a tool or built-in statistic that tells
 when/how often/how much a table is read from disk? I mean a physical
 read, not a pull from the OS cache into shared_buffers.

Well, the pg_stat_* tables tell you how much logical IO is going on, but 
postgres has no way of knowing how effective the OS or disk controller caches 
are.
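
For instance, a sketch along these lines uses the standard statistics
views; note that heap_blks_read counts blocks requested from the OS,
which may still be served from the OS page cache rather than disk:

  -- Per-table logical I/O: *_hit = found in shared_buffers,
  -- *_read = requested from the OS (possibly its page cache).
  SELECT relname, heap_blks_read, heap_blks_hit,
         idx_blks_read, idx_blks_hit
  FROM pg_statio_user_tables
  ORDER BY heap_blks_read DESC
  LIMIT 10;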


Re: [PERFORM] shared_buffers advice

2010-05-24 Thread Merlin Moncure
On Wed, Mar 10, 2010 at 9:28 PM, Paul McGarry p...@paulmcgarry.com wrote:
 Hi there,

 I'm after a little bit of advice on the shared_buffers setting (I have
 read the various docs on/linked from the performance tuning wiki page,
 some very helpful stuff there so thanks to those people).

 I am setting up a 64-bit Linux server running PostgreSQL 8.3; the
 server has 64 GB of memory and Postgres is the only major application
 running on it. (This server is to go alongside some existing 8.3
 servers; we will look at 8.4/9 migration later.)

 I'm basically wondering how the postgresql cache (ie shared_buffers)
 and the OS page_cache interact. The general advice seems to be to
 assign 1/4 of RAM to shared buffers.

 I don't have a good knowledge of the internals but I'm wondering if
 this will effectively mean that roughly the same amount of RAM being
 used for the OS page cache will be used for redundantly caching
 something that Postgres is caching as well?

 i.e. when Postgres reads something from disk it will go into both the
 OS page cache and the PostgreSQL shared_buffers, and the OS page cache
 copy is unlikely to be useful for anything.

 If that is the case, what are the downsides to having less overlap
 between the caches, i.e. heavily favouring one or the other, such as
 allocating shared_buffers to a much larger percentage (such as 90-95%
 of expected 'free' memory)?

I've personally heard tons of anecdotal evidence wrt the shared_buffers
setting.  There is a bit of benchmarking info suggesting you can eke
out marginal gains via the shared_buffers setting, but you have to take
(unfortunately) o/s, hardware, filesystem and other factors all into
account.

Here is what I'm pretty confident about saying:
*) a page fault to disk is a much bigger deal than a fault from the pg
cache to the o/s cache.

many people assume that raising shared buffers decreases the chance of
a disk fault.  it doesn't -- at least not in the simple way you would
think -- all modern o/s aggressively cache filesystem data already so
we are simply layering over the o/s cache.

If your database is really big, anything that reduces disk faults is a
win and anything that increases them is a loss.  tps measurements from
pgbench are not as interesting to me as iops from the disk system.
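
As a rough in-database starting point, a sketch like the following
reports the shared_buffers hit ratio per database; blks_read counts
only misses of shared_buffers, so true disk iops may be lower than it
suggests:

  -- Hit ratio per database; blks_read may still be served by the
  -- OS page cache, so this is not a direct measure of disk faults.
  SELECT datname, blks_read, blks_hit,
         round(blks_hit::numeric / nullif(blks_hit + blks_read, 0), 3)
           AS hit_ratio
  FROM pg_stat_database;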

*) shared buffer effects are hard to detect in the single user case.

A single 'non disk bound' large query will perform pretty much the same
regardless of how you set shared buffers.  In other words, you will not
be able to easily measure differences in the setting outside of a real
or simulated production workload.

*) shared_buffers is one of the _least_ important performance settings
in postgresql.conf

Many settings, like work_mem, planner tweaks, commit settings, and
autovacuum settings, can impact your workload performance in
spectacular ways, but they tend to be 'case by case' specific.
shared_buffers affects _everything_, albeit in very subtle ways, so you
have to be careful.
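
To illustrate the 'case by case' point, settings like work_mem can be
trialled per transaction before touching postgresql.conf (a sketch; the
value and query are illustrative only):

  BEGIN;
  SET LOCAL work_mem = '64MB';  -- applies to this transaction only
  EXPLAIN ANALYZE
  SELECT g FROM generate_series(1, 1000000) AS g ORDER BY g;
  ROLLBACK;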

*) I sometimes wonder if the o/s should just manage everything.

we just said goodbye to the fsm (thank goodness for that!) -- what
about a fully o/s managed cache?  goodbye sysv ipc?  note my views
here are very simplistic -- I don't have anything close to a full
understanding of the cache machinery in the database.

merlin



[PERFORM] which hardware setup

2010-05-24 Thread Pedro Axelrud
Hello,

I work on a web app that sends email newsletters, and I have a question
about Postgres' performance in two different setups. Currently we have
one 4GB RAM VPS running our app server (a Rails app under nginx and
thin) and a 4GB RAM VPS running the database (18GB). We want to migrate
to bare metal servers, but I have drawn up two different setups and
don't know which is the best option:

Option 1:
App Server: Dual Xeon 5130 dual core with 4GB ram and SATA disk
Postgres: Xeon 3360 quad core with 4GB ram and 2 x 146GB 15k RPM SAS (RAID1)
disks

Option 2:
App Server and Postgres: Dual Xeon 5520 quad core with 12GB ram and 2x 146GB
15k RPM SAS (RAID1) disks

I know the first option would be better in terms of I/O for Postgres,
but our app server doesn't use much I/O, and with the second option we
would have much more RAM.


Thank you

Pedro Axelrud
http://mailee.me
http://softa.com.br
http://flavors.me/pedroaxl


[PERFORM] Random Page Cost and Planner

2010-05-24 Thread David Jarvis
Hi,

I wrote a query (see below) that extracts climate data from weather stations
within a given radius of a city using the dates for which those weather
stations actually have data. The query uses the measurement table's only
index:

CREATE UNIQUE INDEX measurement_001_stc_idx
  ON climate.measurement_001
  USING btree
  (station_id, taken, category_id);

The value for random_page_cost was at 2.0; reducing it to 1.1 had a
massive performance improvement (nearly an order of magnitude). While the
results now return in 5 seconds (down from ~85 seconds), problematic lines
remain. Bumping the query's end date by a single year causes a full table
scan:

sc.taken_start >= '1900-01-01'::date AND
sc.taken_end <= '1997-12-31'::date AND

How do I persuade PostgreSQL to use the indexes, regardless of the
number of years between the two dates? (A full table scan against 43
million rows is probably not the best plan.) Find the EXPLAIN ANALYSE
results below the query.
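
As an aside, planner cost settings can be changed per session, which
makes it easy to compare the plans below without editing
postgresql.conf; a minimal sketch:

  SET random_page_cost = 1.1;  -- session-local
  -- re-run EXPLAIN ANALYZE on the query below, then:
  RESET random_page_cost;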

Thanks again!

Dave

Query
  SELECT
extract(YEAR FROM m.taken) AS year,
avg(m.amount) as amount
  FROM
climate.city c,
climate.station s,
climate.station_category sc,
climate.measurement m
  WHERE
c.id = 5182 AND
earth_distance(
  ll_to_earth(c.latitude_decimal,c.longitude_decimal),
  ll_to_earth(s.latitude_decimal,s.longitude_decimal)) / 1000 <= 30 AND
s.elevation BETWEEN 0 AND 3000 AND
s.applicable = TRUE AND
sc.station_id = s.id AND
sc.category_id = 1 AND
sc.taken_start >= '1900-01-01'::date AND
sc.taken_end <= '1996-12-31'::date AND
m.station_id = s.id AND
m.taken BETWEEN sc.taken_start AND sc.taken_end AND
m.category_id = sc.category_id
  GROUP BY
extract(YEAR FROM m.taken)
  ORDER BY
extract(YEAR FROM m.taken)

1900 to 1996: Index
Sort  (cost=1348597.71..1348598.21 rows=200 width=12) (actual time=2268.929..2268.935 rows=92 loops=1)
  Sort Key: (date_part('year'::text, (m.taken)::timestamp without time zone))
  Sort Method:  quicksort  Memory: 32kB
  ->  HashAggregate  (cost=1348586.56..1348590.06 rows=200 width=12) (actual time=2268.829..2268.886 rows=92 loops=1)
        ->  Nested Loop  (cost=0.00..1344864.01 rows=744510 width=12) (actual time=0.807..2084.206 rows=134893 loops=1)
              Join Filter: ((m.taken >= sc.taken_start) AND (m.taken <= sc.taken_end) AND (sc.station_id = m.station_id))
              ->  Nested Loop  (cost=0.00..12755.07 rows=1220 width=18) (actual time=0.502..521.937 rows=23 loops=1)
                    Join Filter: ((sec_to_gc(cube_distance((ll_to_earth((c.latitude_decimal)::double precision, (c.longitude_decimal)::double precision))::cube, (ll_to_earth((s.latitude_decimal)::double precision, (s.longitude_decimal)::double precision))::cube)) / 1000::double precision) <= 30::double precision)
                    ->  Index Scan using city_pkey1 on city c  (cost=0.00..2.47 rows=1 width=16) (actual time=0.014..0.015 rows=1 loops=1)
                          Index Cond: (id = 5182)
                    ->  Nested Loop  (cost=0.00..9907.73 rows=3659 width=34) (actual time=0.014..28.937 rows=3458 loops=1)
                          ->  Seq Scan on station_category sc  (cost=0.00..970.20 rows=3659 width=14) (actual time=0.008..10.947 rows=3458 loops=1)
                                Filter: ((taken_start >= '1900-01-01'::date) AND (taken_end <= '1996-12-31'::date) AND (category_id = 1))
                          ->  Index Scan using station_pkey1 on station s  (cost=0.00..2.43 rows=1 width=20) (actual time=0.004..0.004 rows=1 loops=3458)
                                Index Cond: (s.id = sc.station_id)
                                Filter: (s.applicable AND (s.elevation >= 0) AND (s.elevation <= 3000))
              ->  Append  (cost=0.00..1072.27 rows=947 width=18) (actual time=6.996..63.199 rows=5865 loops=23)
                    ->  Seq Scan on measurement m  (cost=0.00..25.00 rows=6 width=22) (actual time=0.000..0.000 rows=0 loops=23)
                          Filter: (m.category_id = 1)
                    ->  Bitmap Heap Scan on measurement_001 m  (cost=20.79..1047.27 rows=941 width=18) (actual time=6.995..62.390 rows=5865 loops=23)
                          Recheck Cond: ((m.station_id = sc.station_id) AND (m.taken >= sc.taken_start) AND (m.taken <= sc.taken_end) AND (m.category_id = 1))
                          ->  Bitmap Index Scan on measurement_001_stc_idx  (cost=0.00..20.55 rows=941 width=0) (actual time=5.775..5.775 rows=5865 loops=23)
                                Index Cond: ((m.station_id = sc.station_id) AND (m.taken >= sc.taken_start) AND (m.taken <= sc.taken_end) AND (m.category_id = 1))
Total runtime: 2269.264 ms

1900 to 1997: Full Table Scan
Sort  (cost=1370192.26..1370192.76 rows=200 width=12) (actual time=86165.797..86165.809 rows=94 loops=1)
  Sort Key: (date_part('year'::text, (m.taken)::timestamp without time zone))
  Sort Method:  quicksort  Memory: 32kB
  ->  HashAggregate  (cost=1370181.12..1370184.62 rows=200 width=12)