subject:"\[PERFORM\] PG as in\-memory db\? How to warm up and re\-populate buffers\? How to read in all tuples into memory\?"

[PERFORM] Re: [PERFORM] Re: [PERFORM] Re: [PERFORM] Re: [PERFORM] Re: 回复： [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-03-02 Thread Tomas Vondra

On 2.3.2012 03:05, Claudio Freire wrote:
 On Thu, Mar 1, 2012 at 10:13 PM, Tomas Vondra t...@fuzzy.cz wrote:

 Maybe. I still am not sure how fsync=off affects the eviction in your
 opinion. I think it does not (or just very remotely) and you were saying
 the opposite. IMHO the eviction of (dirty) buffers is either very fast
 or slow, no matter what the fsync setting is.
 
 I was thinking page cleanup, but if you're confident it doesn't happen
 on a read-only database, I'd have to agree on all your other points.
 
 I have seen a small amount of writes on a read-only devel DB I work
 with, though. Usually in the order of 100kb/s writes per 10mb/s reads
 - I attributed that to page cleanup. In that case, it can add some
 wait time to fsync, even though it's really a slow volume of writes.
 If you're right, I'm thinking, it may be some other thing... atime
 updates maybe, I'd have to check the filesystem configuration I guess.

I'd guess those writes were caused by hint bits (~ page cleanup, but
that's a one-time thing and should be fixed by VACUUM FREEZE right after
the load). Or maybe it was related to runtime stats (i.e. pgstat).

T.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-03-01 Thread Jeff Janes

On Tue, Feb 28, 2012 at 3:46 PM, Stefan Keller sfkel...@gmail.com wrote:
 2012/2/28 Claudio Freire klaussfre...@gmail.com:

 In the OP, you say There is enough main memory to hold all table
 contents.. I'm assuming, there you refer to your current system, with
 4GB memory.

 Sorry for the confusion: I'm doing these tests on this machine with
 one table (osm_point) and one country. This table has a size of 2.6GB
 and 10 million tuples. The other machine has to deal with at least 5
 tables in total and will be hold more than one country plus routing
 etc..

What is your shared_buffers set to?  2.6GB is uncomfortably close to
4GB, considering the computer has other things it needs to use memory
for as well.

A problem is that often the shared_buffers and the OS cache end up
being basically copies of one another, rather than complementing each
other.  So on read-only applications, the actually useful size of the
total cache turns out to be max(shared_buffers, RAM - 2*shared_buffers
- unknown_overhead).

So one choice is setting shared_buffers low (0.5GB) and let the OS
cache be your main cache.  Advantages of this are that the OS cache
survives PG server restarts, gets populated even by sequential scans,
and can be pre-warmed by the tar trick.  Disadvantages are that pages
can be driven out of the OS cache by non-PG related activity, which
can be hard to monitor and control.  Also, there is some small cost to
constantly transferring data from OS cache to PG cache, but in your
case I htink that would be negligible.

The other choice is setting shared_buffers high (3GB) and having it
be your main cache.  The advantage is that non-PG activity generally
won't drive it out.  The disadvantages are that it is hard to
pre-populate as the tar trick won't work, and neither will sequential
scans on tables due to the ring buffer.

Actually, the tar trick might work somewhat if applied either shortly
before or shortly after the database is started.  If the database
starts out not using its full allotment of memory, the OS will use it
for cache, and you can pre-populate that cache.  Then as the database
runs, the PG cache gets larger by copying needed data from the OS
cache into it.  As the PG cache grows, pages need to get evicted from
OS cache to make room for it.  Ideally, the pages evicted from the OS
cache would be the ones just copied into PG, but the kernel is not
aware of that.  So the whole thing is rather sub-optimal.

Cheers,

Jeff

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-03-01 Thread Scott Marlowe

On Thu, Mar 1, 2012 at 9:57 AM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Tue, Feb 28, 2012 at 3:46 PM, Stefan Keller sfkel...@gmail.com wrote:
 2012/2/28 Claudio Freire klaussfre...@gmail.com:

 In the OP, you say There is enough main memory to hold all table
 contents.. I'm assuming, there you refer to your current system, with
 4GB memory.

 Sorry for the confusion: I'm doing these tests on this machine with
 one table (osm_point) and one country. This table has a size of 2.6GB
 and 10 million tuples. The other machine has to deal with at least 5
 tables in total and will be hold more than one country plus routing
 etc..

 What is your shared_buffers set to?  2.6GB is uncomfortably close to
 4GB, considering the computer has other things it needs to use memory
 for as well.

The real danger here is that the kernel will happily swap ut
shared_buffers memory to make room to cache more from the hard disks,
especially if that shared_mem hasn't been touched in a while.  On a
stock kernel with swappinness of 60 etc, it's quite likely the OP is
seeing the DB go to get data from shared_buffers, and the OS is
actually paging in for shared_buffers. At that point reading from
kernel cache is MUCH faster, and reading from the HDs is still
probably faster than swapping in shared_buffers.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-03-01 Thread Dave Crooke

Just curious ... has anyone tried using a ram disk as the PG primary and
DRBD as the means to make it persistent?
On Mar 1, 2012 11:35 AM, Scott Marlowe scott.marl...@gmail.com wrote:

 On Thu, Mar 1, 2012 at 9:57 AM, Jeff Janes jeff.ja...@gmail.com wrote:
  On Tue, Feb 28, 2012 at 3:46 PM, Stefan Keller sfkel...@gmail.com
 wrote:
  2012/2/28 Claudio Freire klaussfre...@gmail.com:
 
  In the OP, you say There is enough main memory to hold all table
  contents.. I'm assuming, there you refer to your current system, with
  4GB memory.
 
  Sorry for the confusion: I'm doing these tests on this machine with
  one table (osm_point) and one country. This table has a size of 2.6GB
  and 10 million tuples. The other machine has to deal with at least 5
  tables in total and will be hold more than one country plus routing
  etc..
 
  What is your shared_buffers set to?  2.6GB is uncomfortably close to
  4GB, considering the computer has other things it needs to use memory
  for as well.

 The real danger here is that the kernel will happily swap ut
 shared_buffers memory to make room to cache more from the hard disks,
 especially if that shared_mem hasn't been touched in a while.  On a
 stock kernel with swappinness of 60 etc, it's quite likely the OP is
 seeing the DB go to get data from shared_buffers, and the OS is
 actually paging in for shared_buffers. At that point reading from
 kernel cache is MUCH faster, and reading from the HDs is still
 probably faster than swapping in shared_buffers.

 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-03-01 Thread Stefan Keller

2012/3/1 Jeff Janes jeff.ja...@gmail.com:
 On Tue, Feb 28, 2012 at 3:46 PM, Stefan Keller sfkel...@gmail.com wrote:
 2012/2/28 Claudio Freire klaussfre...@gmail.com:

 In the OP, you say There is enough main memory to hold all table
 contents.. I'm assuming, there you refer to your current system, with
 4GB memory.

 Sorry for the confusion: I'm doing these tests on this machine with
 one table (osm_point) and one country. This table has a size of 2.6GB
 and 10 million tuples. The other machine has to deal with at least 5
 tables in total and will be hold more than one country plus routing
 etc..

 What is your shared_buffers set to?  2.6GB is uncomfortably close to
 4GB, considering the computer has other things it needs to use memory
 for as well.

These are the current modified settings in postgresql.conf:
shared_buffers = 128MB
work_mem = 3MB
maintenance_work_mem = 30MB
effective_cache_size = 352MB
wal_buffers = 8MB
default_statistics_target = 50
constraint_exclusion = on
checkpoint_completion_target = 0.9
checkpoint_segments = 16
max_connections = 80

-Stefan

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-03-01 Thread Andrew Dunstan




On 03/01/2012 05:52 PM, Stefan Keller wrote:

These are the current modified settings in postgresql.conf:
shared_buffers = 128MB
work_mem = 3MB


These are extremely low settings on virtually any modern computer. I 
usually look to set shared buffers in numbers of Gb and work_mem at 
least in tens if not hundreds of Mb for any significantly sized database.


cheers

andrew

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-03-01 Thread Claudio Freire

On Thu, Mar 1, 2012 at 8:08 PM, Andrew Dunstan and...@dunslane.net wrote:
 These are extremely low settings on virtually any modern computer. I usually
 look to set shared buffers in numbers of Gb and work_mem at least in tens if
 not hundreds of Mb for any significantly sized database.

For a read-only database, as was discussed, a lower shared_buffers
settings makes sense. And 128M is low enough, I'd guess.

Setting work_mem to hundreds of MB in a 4G system is suicide. Tens
even is dangerous.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-03-01 Thread Peter van Hardenberg

On Thu, Mar 1, 2012 at 4:23 PM, Claudio Freire klaussfre...@gmail.com wrote:
 For a read-only database, as was discussed, a lower shared_buffers
 settings makes sense. And 128M is low enough, I'd guess.

 Setting work_mem to hundreds of MB in a 4G system is suicide. Tens
 even is dangerous.


Why do you say that? We've had work_mem happily at 100MB for years. Is
there a particular degenerate case you're concerned about?

-p

-- 
Peter van Hardenberg
San Francisco, California
Everything was beautiful, and nothing hurt. -- Kurt Vonnegut

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-03-01 Thread Jeff Janes

On Wed, Feb 29, 2012 at 7:28 AM, Stefan Keller sfkel...@gmail.com wrote:
 2012/2/29 Stefan Keller sfkel...@gmail.com:
 2012/2/29 Jeff Janes jeff.ja...@gmail.com:
 It's quite possible the vacuum full is thrashing your disk cache due
 to maintainance_work_mem. You can overcome this issue with the tar
 trick, which is more easily performed as:

 tar cf /dev/null $PG_DATA/base

 But on many implementations, that will not work.  tar detects the
 output is going to the bit bucket, and so doesn't bother to actually
 read the data.

 Right.
 But what about the commands cp $PG_DATA/base /dev/null or cat
 $PG_DATA/base  /dev/null ?
 They seem to do something.

For me they both give errors, because neither of them works on an
directory rather than ordinary files.


 ...or let's try /dev/zero instead /dev/null:
 tar cf /dev/zero $PG_DATA/base

That does seem to work.

So, does it solve your problem?

Cheers,

Jeff

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-03-01 Thread Claudio Freire

On Thu, Mar 1, 2012 at 9:28 PM, Peter van Hardenberg p...@pvh.ca wrote:
 Setting work_mem to hundreds of MB in a 4G system is suicide. Tens
 even is dangerous.


 Why do you say that? We've had work_mem happily at 100MB for years. Is
 there a particular degenerate case you're concerned about?

Me too.

But I've analyzed the queries I'll be sending to the database and I've
carefully bound the effective amount of memory used given the load
I'll be experiencing.

Saying that it should be set to 100M without consideration for those
matters is the suicide part. work_mem applies to each sort operation.
Suppose, just for the sake of argument, that each connection is
performing 5 such sorts (ie, 5 joins of big tables - not unthinkable),
then suppose you have your max_connections to the default of 100, then
the system could request as much as 50G of ram.

I set work_mem higher in my database system since I *know* most of the
connections will not perform any merge or hash joins, nor will they
sort the output, so they won't use work_mem even once. The ones that
will, I have limited on the application side to a handful, hence I
*know* that 50G theoretical maximum will not be reached.

Can the OP say that? I have no reason to think so. Hence I don't
suggest 100M is OK on a 4G system.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[PERFORM] Re: [PERFORM] Re: [PERFORM] Re: [PERFORM] Re: [PERFORM] Re: 回复： [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-03-01 Thread Tomas Vondra

On 28.2.2012 17:42, Claudio Freire wrote:
 On Tue, Feb 28, 2012 at 1:05 PM, Tomas Vondra t...@fuzzy.cz wrote:
 On 28 Únor 2012, 15:24, Claudio Freire wrote:
 It speeds a lot more than the initial load of data.

 Assuming the database is read-only, but not the filesystem (ie: it's
 not a slave, in which case all this is moot, as you said, there are no
 writes on a slave). That is, assuming this is a read-only master, then
 read-only queries don't mean read-only filesystem. Bookkeeping tasks
 like updating catalog dbs, statistics tables, page cleanup, stuff like
 that can actually result in writes.

 Writes that go through the WAL and then the filesystem.

 I'm not sure what maintenance tasks you mean. Sure, there are tasks that
 need to be performed after the load (stats, hint bits, updating system
 catalogs etc.) but this may happen once right after the load and then
 there's effectively zero write activity. Unless the database needs to
 write temp files, but that contradicts the 'fits into RAM' assumption ...
 
 AFAIK, stats need to be constantly updated.

Err, what kind of stats are we talking about? Statistics capturing
characteristics of the data or runtime stats? There's no point in
updating data stats (histograms, MCV, ...) for read-only data and
PostgreSQL doesn't do that.

Runtime stats OTOH are collected and written continuously, that's true.
But in most cases this is not a write-heavy task, and if it is then it's
recommended to place the pg_stat_tmp on ramdrive (it's usually just few
MBs, written repeatedly).

 Not sure about the rest.

AFAIK it's like this:

  updating catalog tables - no updates on read-only data

  updating statistics - data stats: no, runtime stats: yes

  page cleanup - no (just once after the load)

 And yes, it's quite possible to require temp files without a database
 that doesn't fit in memory, only big OLAP-style queries and small
 enough work_mem.

Right. I'm not exactly sure how I arrived to the crazy conclusion that
writing temp files somehow contradicts the 'fits into RAM' assumption.
That's clearly nonsense ...

 
 The writes are always carried out by the OS - except when dirty_ratio is
 exceeded (but that's a different story) and WAL with direct I/O enabled.
 The best way to allow merging the writes in shared buffers or page cache
 is to set the checkpoint_segments / checkpoint_timeout high enough.
 That way the transactions won't need to wait for writes to data files
 (which is the part related to evictions of buffers from cache). And
 read-only transactions won't need to wait at all because they don't need
 to wait for fsync on WAL.
 
 Exactly
 
 In essence, what was required, to keep everything in RAM for as much
 as possible.

 It *does* in the same way affect buffer eviction - it makes eviction
 *very* quick, and re-population equally as quick, if everything fits
 into memory.

 No it doesn't. Only a write caused by a background process (due to full
 shared buffers) means immediate eviction. A simple write (caused by a
 checkpoint) does not evict the page from shared buffers. Not even a
 background writer evicts a page from shared buffers, it merely marks them
 as 'clean' and leaves them there. And all those writes happen on the
 background, so the clients don't need to wait for them to complete (except
 for xlog checkpoints).
 
 So, we're saying the same.

Maybe. I still am not sure how fsync=off affects the eviction in your
opinion. I think it does not (or just very remotely) and you were saying
the opposite. IMHO the eviction of (dirty) buffers is either very fast
or slow, no matter what the fsync setting is.

 With all that, and enough RAM, it already does what was requested.
 
 Maybe it would help to tune shared_buffers-to-os-cache ratio, and
 dirty_ratio to allow a big portion of RAM used for write caching (if
 there were enough writes which I doubt), but, in essence, un
 unmodified postgres installation with enough RAM to hold the whole DB
 + shared buffers in RAM should perform quite optimally.

Probably, for a read-write database that fits into memory. In case of a
read-only database I don't think this really matters because the main
issue there are temp files and if you can stuff them into page cache
then you can just increase the work_mem instead and you're golden.

Tomas

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-03-01 Thread Andrew Dunstan




On 03/01/2012 07:58 PM, Claudio Freire wrote:

On Thu, Mar 1, 2012 at 9:28 PM, Peter van Hardenbergp...@pvh.ca  wrote:

Setting work_mem to hundreds of MB in a 4G system is suicide. Tens
even is dangerous.


Why do you say that? We've had work_mem happily at 100MB for years. Is
there a particular degenerate case you're concerned about?

Me too.

But I've analyzed the queries I'll be sending to the database and I've
carefully bound the effective amount of memory used given the load
I'll be experiencing.

Saying that it should be set to 100M without consideration for those
matters is the suicide part. work_mem applies to each sort operation.
Suppose, just for the sake of argument, that each connection is
performing 5 such sorts (ie, 5 joins of big tables - not unthinkable),
then suppose you have your max_connections to the default of 100, then
the system could request as much as 50G of ram.

I set work_mem higher in my database system since I *know* most of the
connections will not perform any merge or hash joins, nor will they
sort the output, so they won't use work_mem even once. The ones that
will, I have limited on the application side to a handful, hence I
*know* that 50G theoretical maximum will not be reached.

Can the OP say that? I have no reason to think so. Hence I don't
suggest 100M is OK on a 4G system.


Well, obviously you need to know your workload. Nobody said otherwise.

cheers

andrew

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[PERFORM] Re: [PERFORM] Re: [PERFORM] Re: [PERFORM] Re: [PERFORM] Re: 回复： [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-03-01 Thread Claudio Freire

On Thu, Mar 1, 2012 at 10:13 PM, Tomas Vondra t...@fuzzy.cz wrote:

 Maybe. I still am not sure how fsync=off affects the eviction in your
 opinion. I think it does not (or just very remotely) and you were saying
 the opposite. IMHO the eviction of (dirty) buffers is either very fast
 or slow, no matter what the fsync setting is.

I was thinking page cleanup, but if you're confident it doesn't happen
on a read-only database, I'd have to agree on all your other points.

I have seen a small amount of writes on a read-only devel DB I work
with, though. Usually in the order of 100kb/s writes per 10mb/s reads
- I attributed that to page cleanup. In that case, it can add some
wait time to fsync, even though it's really a slow volume of writes.
If you're right, I'm thinking, it may be some other thing... atime
updates maybe, I'd have to check the filesystem configuration I guess.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-02-29 Thread Jeff Janes

On Tue, Feb 28, 2012 at 2:41 PM, Claudio Freire klaussfre...@gmail.com wrote:
 On Tue, Feb 28, 2012 at 5:48 PM, Stefan Keller sfkel...@gmail.com wrote:
 P.S. And yes, the database is aka 'read-only' and truncated and
 re-populated from scratch every night. fsync is off so I don't care
 about ACID. After the indexes on name, hstore and geometry are
 generated I do a VACUUM FULL FREEZE. The current installation is a
 virtual machine with 4GB memory and the filesystem is read/write.
 The future machine will be a pizza box with 72GB memory.

 I don't get this. Something's wrong.

 In the OP, you say There is enough main memory to hold all table
 contents.. I'm assuming, there you refer to your current system, with
 4GB memory.

 So your data is less than 4GB, but then you'll be throwing a 72GB
 server? It's either tremendous overkill, or your data simply isn't
 less than 4GB.

 It's quite possible the vacuum full is thrashing your disk cache due
 to maintainance_work_mem. You can overcome this issue with the tar
 trick, which is more easily performed as:

 tar cf /dev/null $PG_DATA/base

But on many implementations, that will not work.  tar detects the
output is going to the bit bucket, and so doesn't bother to actually
read the data.

...

 Another option is to issue a simple vacuum after the vacuum full.
 Simple vacuum will just scan the tables and indices, I'm hoping doing
 nothing since the vacuum full will have cleaned everything already,
 but loading everything both in the OS cache and into shared_buffers.

Doesn't it use a ring buffer strategy, so it would load to OS, but
probably not to shared_buffers?

Cheers,

Jeff

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-02-29 Thread Claudio Freire

On Wed, Feb 29, 2012 at 12:16 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 But on many implementations, that will not work.  tar detects the
 output is going to the bit bucket, and so doesn't bother to actually
 read the data.

Really? Getting smart on us?

Shame on it. Who asked it to be smart?

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-02-29 Thread Stefan Keller

2012/2/29 Jeff Janes jeff.ja...@gmail.com:
 It's quite possible the vacuum full is thrashing your disk cache due
 to maintainance_work_mem. You can overcome this issue with the tar
 trick, which is more easily performed as:

 tar cf /dev/null $PG_DATA/base

 But on many implementations, that will not work.  tar detects the
 output is going to the bit bucket, and so doesn't bother to actually
 read the data.

Right.
But what about the commands cp $PG_DATA/base /dev/null or cat
$PG_DATA/base  /dev/null ?
They seem to do something.

-Stefan

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-02-29 Thread Stefan Keller

2012/2/29 Stefan Keller sfkel...@gmail.com:
 2012/2/29 Jeff Janes jeff.ja...@gmail.com:
 It's quite possible the vacuum full is thrashing your disk cache due
 to maintainance_work_mem. You can overcome this issue with the tar
 trick, which is more easily performed as:

 tar cf /dev/null $PG_DATA/base

 But on many implementations, that will not work.  tar detects the
 output is going to the bit bucket, and so doesn't bother to actually
 read the data.

 Right.
 But what about the commands cp $PG_DATA/base /dev/null or cat
 $PG_DATA/base  /dev/null ?
 They seem to do something.

...or let's try /dev/zero instead /dev/null:
tar cf /dev/zero $PG_DATA/base

-Stefan

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[PERFORM] Re: 回复： [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-02-28 Thread Stefan Keller

Hi Wales

2012/2/27 Wales Wang wormw...@yahoo.com wrote:
 There are many approach for PostgreSQL in-memory.
 The quick and easy way is making slave pgsql run on persistent RAM
 filesystem, the slave is part of master/slave replication cluster.

 The fstab and script make RAM file system persistent is below:
 Setup:
 First, create a mountpoint for the disk :
 mkdir /mnt/ramdisk
 Secondly, add this line to /etc/fstab in to mount the drive at boot-time.
 tmpfs   /mnt/ramdisk tmpfs  defaults,size=65536M 0 0
 #! /bin/sh
 # /etc/init.d/ramdisk.sh
 #

 case $1 in
   start)
     echo Copying files to ramdisk
     rsync -av /data/ramdisk-backup/ /mnt/ramdisk/
     echo [`date +%Y-%m-%d %H:%M`] Ramdisk Synched from HD 
 /var/log/ramdisk_sync.log
     ;;
   sync)
     echo Synching files from ramdisk to Harddisk
     echo [`date +%Y-%m-%d %H:%M`] Ramdisk Synched to HD 
 /var/log/ramdisk_sync.log
     rsync -av --delete --recursive --force /mnt/ramdisk/
 /data/ramdisk-backup/
     ;;
   stop)
     echo Synching logfiles from ramdisk to Harddisk
     echo [`date +%Y-%m-%d %H:%M`] Ramdisk Synched to HD 
 /var/log/ramdisk_sync.log
     rsync -av --delete --recursive --force /mnt/ramdisk/
 /data/ramdisk-backup/
     ;;
   *)
     echo Usage: /etc/init.d/ramdisk {start|stop|sync}
     exit 1
     ;;
 esac
 exit 0

 you can run it when startup and shutdown and crontabe hoursly.

 Wales Wang

Thank you for the tipp.
Making slave pgsql run on persistent RAM filesystem is surely at least
a possibility which I'll try out.

But what I'm finally after is a solution, where records don't get
pushed back to disk a.s.a.p. but rather got hold in memory as long as
possible assuming that there is enough memory.
I suspect that currently there is quite some overhead because of that
(besides disk-oriented structures).

-Stefan

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[PERFORM] Re: [PERFORM] Re: 回复： [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-02-28 Thread Claudio Freire

On Tue, Feb 28, 2012 at 5:30 AM, Stefan Keller sfkel...@gmail.com wrote:

 But what I'm finally after is a solution, where records don't get
 pushed back to disk a.s.a.p. but rather got hold in memory as long as
 possible assuming that there is enough memory.

fsync = off ?

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[PERFORM] Re: [PERFORM] Re: [PERFORM] Re: 回复： [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-02-28 Thread Tomas Vondra

On 28 Únor 2012, 14:08, Claudio Freire wrote:
 On Tue, Feb 28, 2012 at 5:30 AM, Stefan Keller sfkel...@gmail.com wrote:

 But what I'm finally after is a solution, where records don't get
 pushed back to disk a.s.a.p. but rather got hold in memory as long as
 possible assuming that there is enough memory.

 fsync = off ?

I don't think this is a viable idea, unless you don't care about the data.

Moreover, fsyn=off does not mean not writing and writing does not mean
removing from shared buffers. A page written/fsynced during a checkpoint
may stay in shared buffers.

AFAIK the pages are not removed from shared buffers without a reason. So a
dirty buffer is written to a disk (because it needs to, to keep ACID) but
stays in shared buffers as clean (unless it was written by a backend,
which means there's not enough memory).

Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[PERFORM] Re: [PERFORM] Re: [PERFORM] Re: 回复： [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-02-28 Thread Claudio Freire

On Tue, Feb 28, 2012 at 10:38 AM, Tomas Vondra t...@fuzzy.cz wrote:
 On 28 Únor 2012, 14:08, Claudio Freire wrote:
 On Tue, Feb 28, 2012 at 5:30 AM, Stefan Keller sfkel...@gmail.com wrote:

 But what I'm finally after is a solution, where records don't get
 pushed back to disk a.s.a.p. but rather got hold in memory as long as
 possible assuming that there is enough memory.

 fsync = off ?

 I don't think this is a viable idea, unless you don't care about the data.

Well, if you keep things in memory as long as possible (as per the
quoted message), then you don't care about memory. There's no way
memory-only DBs can provide ACID guarantees.

synchronous_commit=off goes half way there without sacrificing crash
recovery, which is another option.

 Moreover, fsyn=off does not mean not writing and writing does not mean
 removing from shared buffers. A page written/fsynced during a checkpoint
 may stay in shared buffers.

The OS will write in the background (provided there's enough memory,
which was an assumption on the quoted message). It will not interfere
with other operations, so, in any case, writing or not, you get what
you want.

 AFAIK the pages are not removed from shared buffers without a reason. So a
 dirty buffer is written to a disk (because it needs to, to keep ACID) but
 stays in shared buffers as clean (unless it was written by a backend,
 which means there's not enough memory).

Just writing is not enough. ACID requires fsync. If you don't fsync
(be it with synchronous_commit=off or fsync=off), then it's not full
ACID already.
Because a crash at a bad moment can always make your data nonpersistent.

That's an unavoidable result of keeping things in memory.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[PERFORM] Re: [PERFORM] Re: [PERFORM] Re: [PERFORM] Re: 回复： [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-02-28 Thread Tomas Vondra

On 28 Únor 2012, 14:52, Claudio Freire wrote:
 On Tue, Feb 28, 2012 at 10:38 AM, Tomas Vondra t...@fuzzy.cz wrote:
 On 28 Únor 2012, 14:08, Claudio Freire wrote:
 On Tue, Feb 28, 2012 at 5:30 AM, Stefan Keller sfkel...@gmail.com
 wrote:

 But what I'm finally after is a solution, where records don't get
 pushed back to disk a.s.a.p. but rather got hold in memory as long as
 possible assuming that there is enough memory.

 fsync = off ?

 I don't think this is a viable idea, unless you don't care about the
 data.

 Well, if you keep things in memory as long as possible (as per the
 quoted message), then you don't care about memory. There's no way
 memory-only DBs can provide ACID guarantees.

 synchronous_commit=off goes half way there without sacrificing crash
 recovery, which is another option.

 Moreover, fsyn=off does not mean not writing and writing does not
 mean
 removing from shared buffers. A page written/fsynced during a
 checkpoint
 may stay in shared buffers.

 The OS will write in the background (provided there's enough memory,
 which was an assumption on the quoted message). It will not interfere
 with other operations, so, in any case, writing or not, you get what
 you want.

 AFAIK the pages are not removed from shared buffers without a reason. So
 a
 dirty buffer is written to a disk (because it needs to, to keep ACID)
 but
 stays in shared buffers as clean (unless it was written by a backend,
 which means there's not enough memory).

 Just writing is not enough. ACID requires fsync. If you don't fsync
 (be it with synchronous_commit=off or fsync=off), then it's not full
 ACID already.
 Because a crash at a bad moment can always make your data nonpersistent.

I haven't said writing is sufficient for ACID, I said it's required. Which
is kind of obvious because of the durability part.

 That's an unavoidable result of keeping things in memory.

Why? IIRC the OP was interested in keeping the data in memory for querying
and that the database is read-only after it's populated with data (once a
day). How does writing the transactional logs / data files properly
interfere with that?

I haven't investigated why exactly the data are not cached initially, but
none of the options that I can think of could be fixed by setting
fsync=off. That's something that influences writes (not read-only
database) and I don't think it influences how buffers are evicted from
shared buffers / page cache.

It might speed up the initial load of data, but that's not what the OP was
asking.

kind regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[PERFORM] Re: [PERFORM] Re: [PERFORM] Re: [PERFORM] Re: 回复： [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-02-28 Thread Claudio Freire

On Tue, Feb 28, 2012 at 11:15 AM, Tomas Vondra t...@fuzzy.cz wrote:
 I haven't investigated why exactly the data are not cached initially, but
 none of the options that I can think of could be fixed by setting
 fsync=off. That's something that influences writes (not read-only
 database) and I don't think it influences how buffers are evicted from
 shared buffers / page cache.

 It might speed up the initial load of data, but that's not what the OP was
 asking.

It speeds a lot more than the initial load of data.

Assuming the database is read-only, but not the filesystem (ie: it's
not a slave, in which case all this is moot, as you said, there are no
writes on a slave). That is, assuming this is a read-only master, then
read-only queries don't mean read-only filesystem. Bookkeeping tasks
like updating catalog dbs, statistics tables, page cleanup, stuff like
that can actually result in writes.

Writes that go through the WAL and then the filesystem.

With fsync=off, those writes happen on the background, and are carried
out by the OS. Effectively releasing postgres from having to wait on
them, and, assuming there's enough RAM, merging repeated writes to the
same sectors in one operation in the end. For stats, bookkeeping, and
who knows what else, the merging would be quite effective. With enough
RAM to hold the entire DB, the merging would effectively keep
everything in RAM (in system buffers) until there's enough I/O
bandwidth to transparently push that to persistent storage.

In essence, what was required, to keep everything in RAM for as much
as possible.

It *does* in the same way affect buffer eviction - it makes eviction
*very* quick, and re-population equally as quick, if everything fits
into memory.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[PERFORM] Re: 回复： [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-02-28 Thread Jeff Janes

On Tue, Feb 28, 2012 at 12:30 AM, Stefan Keller sfkel...@gmail.com wrote:

 Thank you for the tipp.
 Making slave pgsql run on persistent RAM filesystem is surely at least
 a possibility which I'll try out.

 But what I'm finally after is a solution, where records don't get
 pushed back to disk a.s.a.p. but rather got hold in memory as long as
 possible assuming that there is enough memory.

That is already the case.  There are two separate issues, when dirty
data is written to disk, and when clean data is dropped from memory.
The only connection between them is that dirty data can't just be
dropped, it must be written first.  But have written it, there is no
reason to immediately drop it.  When a checkpoint cleans data from the
shard_buffers, that now-clean data remains in shared_buffers.  And at
the OS level, when an fsync forces dirty data out to disk, the
now-clean data generally remains in cache (although I've seen nfs
implementations where that was not the case).

It is hard to figure out what problem you are facing.  Is your data
not getting loaded into cache, or is it not staying there?

Cheers,

Jeff

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[PERFORM] Re: [PERFORM] Re: [PERFORM] Re: [PERFORM] Re: [PERFORM] Re: 回复： [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-02-28 Thread Tomas Vondra

On 28 Únor 2012, 15:24, Claudio Freire wrote:
 On Tue, Feb 28, 2012 at 11:15 AM, Tomas Vondra t...@fuzzy.cz wrote:
 I haven't investigated why exactly the data are not cached initially,
 but
 none of the options that I can think of could be fixed by setting
 fsync=off. That's something that influences writes (not read-only
 database) and I don't think it influences how buffers are evicted from
 shared buffers / page cache.

 It might speed up the initial load of data, but that's not what the OP
 was
 asking.

 It speeds a lot more than the initial load of data.

 Assuming the database is read-only, but not the filesystem (ie: it's
 not a slave, in which case all this is moot, as you said, there are no
 writes on a slave). That is, assuming this is a read-only master, then
 read-only queries don't mean read-only filesystem. Bookkeeping tasks
 like updating catalog dbs, statistics tables, page cleanup, stuff like
 that can actually result in writes.

 Writes that go through the WAL and then the filesystem.

I'm not sure what maintenance tasks you mean. Sure, there are tasks that
need to be performed after the load (stats, hint bits, updating system
catalogs etc.) but this may happen once right after the load and then
there's effectively zero write activity. Unless the database needs to
write temp files, but that contradicts the 'fits into RAM' assumption ...

 With fsync=off, those writes happen on the background, and are carried
 out by the OS. Effectively releasing postgres from having to wait on
 them, and, assuming there's enough RAM, merging repeated writes to the
 same sectors in one operation in the end. For stats, bookkeeping, and
 who knows what else, the merging would be quite effective. With enough
 RAM to hold the entire DB, the merging would effectively keep
 everything in RAM (in system buffers) until there's enough I/O
 bandwidth to transparently push that to persistent storage.

The writes are always carried out by the OS - except when dirty_ratio is
exceeded (but that's a different story) and WAL with direct I/O enabled.
The best way to allow merging the writes in shared buffers or page cache
is to set the checkpoint_segments / checkpoint_timeout high enough.

That way the transactions won't need to wait for writes to data files
(which is the part related to evictions of buffers from cache). And
read-only transactions won't need to wait at all because they don't need
to wait for fsync on WAL.

 In essence, what was required, to keep everything in RAM for as much
 as possible.

 It *does* in the same way affect buffer eviction - it makes eviction
 *very* quick, and re-population equally as quick, if everything fits
 into memory.

No it doesn't. Only a write caused by a background process (due to full
shared buffers) means immediate eviction. A simple write (caused by a
checkpoint) does not evict the page from shared buffers. Not even a
background writer evicts a page from shared buffers, it merely marks them
as 'clean' and leaves them there. And all those writes happen on the
background, so the clients don't need to wait for them to complete (except
for xlog checkpoints).

kind regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[PERFORM] Re: [PERFORM] Re: [PERFORM] Re: [PERFORM] Re: [PERFORM] Re: 回复： [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-02-28 Thread Claudio Freire

On Tue, Feb 28, 2012 at 1:05 PM, Tomas Vondra t...@fuzzy.cz wrote:
 On 28 Únor 2012, 15:24, Claudio Freire wrote:
 It speeds a lot more than the initial load of data.

 Assuming the database is read-only, but not the filesystem (ie: it's
 not a slave, in which case all this is moot, as you said, there are no
 writes on a slave). That is, assuming this is a read-only master, then
 read-only queries don't mean read-only filesystem. Bookkeeping tasks
 like updating catalog dbs, statistics tables, page cleanup, stuff like
 that can actually result in writes.

 Writes that go through the WAL and then the filesystem.

 I'm not sure what maintenance tasks you mean. Sure, there are tasks that
 need to be performed after the load (stats, hint bits, updating system
 catalogs etc.) but this may happen once right after the load and then
 there's effectively zero write activity. Unless the database needs to
 write temp files, but that contradicts the 'fits into RAM' assumption ...

AFAIK, stats need to be constantly updated.
Not sure about the rest.

And yes, it's quite possible to require temp files without a database
that doesn't fit in memory, only big OLAP-style queries and small
enough work_mem.

 The writes are always carried out by the OS - except when dirty_ratio is
 exceeded (but that's a different story) and WAL with direct I/O enabled.
 The best way to allow merging the writes in shared buffers or page cache
 is to set the checkpoint_segments / checkpoint_timeout high enough.
 That way the transactions won't need to wait for writes to data files
 (which is the part related to evictions of buffers from cache). And
 read-only transactions won't need to wait at all because they don't need
 to wait for fsync on WAL.

Exactly

 In essence, what was required, to keep everything in RAM for as much
 as possible.

 It *does* in the same way affect buffer eviction - it makes eviction
 *very* quick, and re-population equally as quick, if everything fits
 into memory.

 No it doesn't. Only a write caused by a background process (due to full
 shared buffers) means immediate eviction. A simple write (caused by a
 checkpoint) does not evict the page from shared buffers. Not even a
 background writer evicts a page from shared buffers, it merely marks them
 as 'clean' and leaves them there. And all those writes happen on the
 background, so the clients don't need to wait for them to complete (except
 for xlog checkpoints).

So, we're saying the same.

With all that, and enough RAM, it already does what was requested.

Maybe it would help to tune shared_buffers-to-os-cache ratio, and
dirty_ratio to allow a big portion of RAM used for write caching (if
there were enough writes which I doubt), but, in essence, un
unmodified postgres installation with enough RAM to hold the whole DB
+ shared buffers in RAM should perform quite optimally.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-02-28 Thread Claudio Freire

On Tue, Feb 28, 2012 at 5:48 PM, Stefan Keller sfkel...@gmail.com wrote:
 P.S. And yes, the database is aka 'read-only' and truncated and
 re-populated from scratch every night. fsync is off so I don't care
 about ACID. After the indexes on name, hstore and geometry are
 generated I do a VACUUM FULL FREEZE. The current installation is a
 virtual machine with 4GB memory and the filesystem is read/write.
 The future machine will be a pizza box with 72GB memory.

I don't get this. Something's wrong.

In the OP, you say There is enough main memory to hold all table
contents.. I'm assuming, there you refer to your current system, with
4GB memory.

So your data is less than 4GB, but then you'll be throwing a 72GB
server? It's either tremendous overkill, or your data simply isn't
less than 4GB.

It's quite possible the vacuum full is thrashing your disk cache due
to maintainance_work_mem. You can overcome this issue with the tar
trick, which is more easily performed as:

tar cf /dev/null $PG_DATA/base

tar will read all the table's contents and populate the OS cache. From
there to shared_buffers it should be very very quick. If it is true
that your data fits in 4GB, then that should fix it all. Beware,
whatever you allocate to shared buffers will be redundantly loaded
into RAM, first in shared buffers, then in the OS cache. So your data
has to fit in 4GB - shared buffers.

I don't think query-based tricks will load everything into RAM,
because you will get sequential scans and not index scans - the
indices will remain uncached. If you forced an index scan, it would
have to read the whole index in random order (random I/O), and that
would be horribly slow. The best way is to tar the whole database into
/dev/null and be done with it.

Another option is to issue a simple vacuum after the vacuum full.
Simple vacuum will just scan the tables and indices, I'm hoping doing
nothing since the vacuum full will have cleaned everything already,
but loading everything both in the OS cache and into shared_buffers.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-02-28 Thread Stefan Keller

2012/2/28 Claudio Freire klaussfre...@gmail.com:
 On Tue, Feb 28, 2012 at 5:48 PM, Stefan Keller sfkel...@gmail.com wrote:
 P.S. And yes, the database is aka 'read-only' and truncated and
 re-populated from scratch every night. fsync is off so I don't care
 about ACID. After the indexes on name, hstore and geometry are
 generated I do a VACUUM FULL FREEZE. The current installation is a
 virtual machine with 4GB memory and the filesystem is read/write.
 The future machine will be a pizza box with 72GB memory.

 I don't get this. Something's wrong.

 In the OP, you say There is enough main memory to hold all table
 contents.. I'm assuming, there you refer to your current system, with
 4GB memory.

Sorry for the confusion: I'm doing these tests on this machine with
one table (osm_point) and one country. This table has a size of 2.6GB
and 10 million tuples. The other machine has to deal with at least 5
tables in total and will be hold more than one country plus routing
etc..

-Stefan

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[PERFORM] 回复： [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-02-27 Thread Wales Wang

There are many approach for PostgreSQL in-memory.

The quick and easy way is making slave pgsql run on persistent RAM filesystem, 
the slave is part of master/slave replication cluster.
 
The fstab and script make RAM file system persistent is below:
Setup:
First, create a mountpoint for the disk : 
mkdir /mnt/ramdisk
Secondly, add this line to /etc/fstab in to mount the drive at boot-time. 
tmpfs   /mnt/ramdisk tmpfs  defaults,size=65536M 0 0

#! /bin/sh 
# /etc/init.d/ramdisk.sh
#
 
case $1 in
  start)
    echo Copying files to ramdisk
    rsync -av /data/ramdisk-backup/ /mnt/ramdisk/
    echo [`date +%Y-%m-%d %H:%M`] Ramdisk Synched from HD  
/var/log/ramdisk_sync.log
    ;;
  sync)
    echo Synching files from ramdisk to Harddisk
    echo [`date +%Y-%m-%d %H:%M`] Ramdisk Synched to HD  
/var/log/ramdisk_sync.log
    rsync -av --delete --recursive --force /mnt/ramdisk/ /data/ramdisk-backup/
    ;;
  stop)
    echo Synching logfiles from ramdisk to Harddisk
    echo [`date +%Y-%m-%d %H:%M`] Ramdisk Synched to HD  
/var/log/ramdisk_sync.log
    rsync -av --delete --recursive --force /mnt/ramdisk/ /data/ramdisk-backup/
    ;;
  *)
    echo Usage: /etc/init.d/ramdisk {start|stop|sync}
    exit 1
    ;;
esac
exit 0
 
you can run it when startup and shutdown and crontabe hoursly.
 
Wales Wang 


 发件人： Jeff Janes jeff.ja...@gmail.com
收件人： Stefan Keller sfkel...@gmail.com 
抄送： Wales Wang wormw...@yahoo.com; pgsql-performance@postgresql.org; Stephen 
Frost sfr...@snowman.net 
发送日期： 2012年2月27日, 星期一, 上午 6:34
主题: Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? 
How to read in all tuples into memory?
  
On Sun, Feb 26, 2012 at 2:56 AM, Stefan Keller sfkel...@gmail.com wrote:
 Hi Jeff and Wales,

 2012/2/26 Jeff Janes jeff.ja...@gmail.com wrote:
 The problem is that the initial queries are too slow - and there is no
 second chance. I do have to trash the buffer every night. There is
 enough main memory to hold all table contents.

 Just that table, or the entire database?

 The entire database consisting of only about 5 tables which are
 similar but with different geometry types plus a relations table (as
 OpenStreetMap calls it).

And all of those combined fit in RAM?  With how much to spare?


 1. How can I warm up or re-populate shared buffers of Postgres?

 Instead, warm the OS cache. 燭hen data will get transferred into the
 postgres shared_buffers pool from the OS cache very quickly.

 tar -c $PGDATA/base/ |wc -c

 Ok. So with OS cache you mean the files which to me are THE database itself?

Most operating systems will use any otherwise unused RAM to cache
recently accessed file-system data.  That is the OS cache.  The
purpose of the tar is to populate the OS cache with the database
itself.  That way, when postgres wants something that isn't already
in shared_buffers, it doesn't require a disk read to get it, just a
request to the OS.

But this trick is most useful after the OS has been restarted so the
OS cache is empty.  If the OS has been up for a long time, then why
isn't it already populated with the data you need?  Maybe the data
doesn't fit, maybe some other process has trashed the cache (in which
case, why would it not continue to trash the cache on an ongoing
basis?)

Since you just recently created the tables and indexes, they must have
passed through the OS cache on the way to disk.  So why aren't they
still there?  Is shared_buffers so large that little RAM is left over
for the OS?  Did you reboot the OS?  Are there other processes running
that drive the database-specific files out of the OS cache?

 A cache to me is a second storage with controlled redudancy because
 of performance reasons.

Yeah.  But there are multiple caches, with different parties in
control and different opinions of what is redundant.

 2. Are there any hints on how to tell Postgres to read in all table
 contents into memory?

 I don't think so, at least not in core. 營've wondered if it would
 make sense to suppress ring-buffer strategy when there are buffers on
 the free-list. 燭hat way a sequential scan would populate
 shared_buffers after a restart. 燘ut it wouldn't help you get the
 indexes into cache.

 So, are there any developments going on with PostgreSQL as Stephen
 suggested in the former thread?

I don't see any active development for the upcoming release, and most
of what has been suggested wouldn't help you because they are about
re-populating the cache with previously hot data, while you are
destroying your previously hot data and wanting to specify the
future-hot data.

By the way, your explain plan would be more useful if it included
buffers.  Explain (analyze, buffers) select...

I don't know that it is ever better to run analyze without buffers,
other than for backwards compatibility.  I'm trying to get in the
habit of just automatically doing it.

Cheers,

Jeff

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)

Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-02-27 Thread Cédric Villemain

Le dimanche 26 février 2012 01:16:08, Stefan Keller a écrit :
 Hi,
 
 2011/10/24 Stephen Frost sfr...@snowman.net wrote
 
  Now, we've also been discussing ways to have PG automatically
  re-populate shared buffers and possibly OS cache based on what was in
  memory at the time of the last shut-down, but I'm not sure that would
  help your case either since you're rebuilding everything every night and
  that's what's trashing your buffers (because everything ends up getting
  moved around).  You might actually want to consider if that's doing more
  harm than good for you.  If you weren't doing that, then the cache
  wouldn't be getting destroyed every night..
 
 I'd like to come back on the issue of aka of in-memory key-value database.
 
 To remember, it contains table definition and queries as indicated in
 the appendix [0]. There exist 4 other tables of similar structure.
 There are indexes on each column. The tables contain around 10 million
 tuples. The database is read-only; it's completely updated every
 day. I don't expect more than 5 concurrent users at any time. A
 typical query looks like [1] and varies in an unforeseable way (that's
 why hstore is used). EXPLAIN tells me that the indexes are used [2].
 
 The problem is that the initial queries are too slow - and there is no
 second chance. I do have to trash the buffer every night. There is
 enough main memory to hold all table contents.
 
 1. How can I warm up or re-populate shared buffers of Postgres?

There was a patch proposed for postgresql which purpose was to 
snapshot/Restore postgresql buffers, but it is still not sure how far that 
really help to have that part loaded.

 2. Are there any hints on how to tell Postgres to read in all table
 contents into memory?

I wrote pgfincore for the OS part: you can use it to preload table/index in OS 
cache, and do snapshot/restore if you want fine grain control of what part of 
the object you want to warm.
https://github.com/klando/pgfincore


 
 Yours, Stefan
 
 
 APPENDIX
 
 [0]
 CREATE TABLE osm_point (
  osm_id integer,
  name text,
  tags hstore
  geom geometry(Point,4326)
 );
 
 
 [1]
 SELECT osm_id, name FROM osm_point
   WHERE tags @ 'tourism=viewpoint'
   AND ST_Contains(
 GeomFromText('BOX(8.42 47.072, 9.088 47.431)'::box2d, 4326),
 geom)
 
 [2]
 EXPLAIN ANALYZE returns:
  Bitmap Heap Scan on osm_point  (cost=402.15..40465.85 rows=430
 width=218) (actual time=121.888..137.
Recheck Cond: (tags @ 'tourism=viewpoint'::hstore)
Filter: (('01030...'::geometry  geom) AND
 _st_contains('01030'::geometry, geom))
-  Bitmap Index Scan on osm_point_tags_idx  (cost=0.00..402.04
 rows=11557 width=0) (actual time=1 6710 loops=1)
  Index Cond: (tags @ 'tourism=viewpoint'::hstore)
  Total runtime: 137.881 ms
 (6 rows)

-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[PERFORM] Re： [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-02-26 Thread Wales Wang

You can try PostgreSQL 9.x master/slave replication, then try run slave 
on persistent RAM Fileystem(tmpfs)
So, access your all data from slave PostgreSQL that run on tmpfs..
 


 发件人： Jeff Janes jeff.ja...@gmail.com
收件人： Stefan Keller sfkel...@gmail.com 
抄送： pgsql-performance@postgresql.org; Stephen Frost sfr...@snowman.net 
发送日期： 2012年2月26日, 星期日, 上午 10:13
主题: Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? 
How to read in all tuples into memory?
  
On Sat, Feb 25, 2012 at 4:16 PM, Stefan Keller sfkel...@gmail.com wrote:

 I'd like to come back on the issue of aka of in-memory key-value database.

 To remember, it contains table definition and queries as indicated in
 the appendix [0]. There exist 4 other tables of similar structure.
 There are indexes on each column. The tables contain around 10 million
 tuples. The database is read-only; it's completely updated every
 day. I don't expect more than 5 concurrent users at any time. A
 typical query looks like [1] and varies in an unforeseable way (that's
 why hstore is used). EXPLAIN tells me that the indexes are used [2].

 The problem is that the initial queries are too slow - and there is no
 second chance. I do have to trash the buffer every night. There is
 enough main memory to hold all table contents.

Just that table, or the entire database?


 1. How can I warm up or re-populate shared buffers of Postgres?

Instead, warm the OS cache.  Then data will get transferred into the
postgres shared_buffers pool from the OS cache very quickly.

tar -c $PGDATA/base/ |wc -c

If you need to warm just one table, because the entire base directory
won't fit in OS cache, then you need to do a bit more work to find out
which files to use.

You might feel clever and try this instead:

tar -c /dev/null $PGDATA/base/  /dev/null

But my tar program is too clever by half.  It detects that it is
writing to /dev/null, and just does not actually read the data.

 2. Are there any hints on how to tell Postgres to read in all table
 contents into memory?

I don't think so, at least not in core.  I've wondered if it would
make sense to suppress ring-buffer strategy when there are buffers on
the free-list.  That way a sequential scan would populate
shared_buffers after a restart.  But it wouldn't help you get the
indexes into cache.

Cheers,

Jeff

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-02-26 Thread Stefan Keller

Hi Jeff and Wales,

2012/2/26 Jeff Janes jeff.ja...@gmail.com wrote:
 The problem is that the initial queries are too slow - and there is no
 second chance. I do have to trash the buffer every night. There is
 enough main memory to hold all table contents.

 Just that table, or the entire database?

The entire database consisting of only about 5 tables which are
similar but with different geometry types plus a relations table (as
OpenStreetMap calls it).

 1. How can I warm up or re-populate shared buffers of Postgres?

 Instead, warm the OS cache.  Then data will get transferred into the
 postgres shared_buffers pool from the OS cache very quickly.

 tar -c $PGDATA/base/ |wc -c

Ok. So with OS cache you mean the files which to me are THE database itself?
A cache to me is a second storage with controlled redudancy because
of performance reasons.

 2. Are there any hints on how to tell Postgres to read in all table
 contents into memory?

 I don't think so, at least not in core.  I've wondered if it would
 make sense to suppress ring-buffer strategy when there are buffers on
 the free-list.  That way a sequential scan would populate
 shared_buffers after a restart.  But it wouldn't help you get the
 indexes into cache.

So, are there any developments going on with PostgreSQL as Stephen
suggested in the former thread?

2012/2/26 Wales Wang wormw...@yahoo.com:
 You can try PostgreSQL 9.x master/slave replication, then try run slave
 on persistent RAM Fileystem (tmpfs)
 So, access your all data from slave PostgreSQL that run on tmpfs..

Nice idea.
I do have a single upscaled server and up to now I hesitated to
allocate say 48 Gigabytes (out of 72) to such a RAM Fileystem (tmpfs).

Still, would'nt it be more flexible when I could dynamically instruct
PostgreSQL to behave like an in-memory database?

Yours, Stefan

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-02-26 Thread Stephen Frost

* Stefan Keller (sfkel...@gmail.com) wrote:
 So, are there any developments going on with PostgreSQL as Stephen
 suggested in the former thread?

While the idea has been getting kicked around, I don't know of anyone
actively working on developing code to implement it.

Thanks,

Stephen


signature.asc
Description: Digital signature

Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-02-26 Thread Andy Colson


On 02/25/2012 06:16 PM, Stefan Keller wrote:


1. How can I warm up or re-populate shared buffers of Postgres?
2. Are there any hints on how to tell Postgres to read in all table
contents into memory?

Yours, Stefan



How about after you load the data, vacuum freeze it, then do something like:

SELECT count(*) FROM osm_point WHERE tags @ 'tourism=junk'

-Andy



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-02-26 Thread Stefan Keller

2012/2/26 Andy Colson a...@squeakycode.net wrote:
 On 02/25/2012 06:16 PM, Stefan Keller wrote:
 1. How can I warm up or re-populate shared buffers of Postgres?
 2. Are there any hints on how to tell Postgres to read in all table
 contents into memory?

 Yours, Stefan

 How about after you load the data, vacuum freeze it, then do something like:

 SELECT count(*) FROM osm_point WHERE tags @ 'tourism=junk'

 -Andy

That good idea is what I proposed elsewhere on one of the PG lists and
got told that this does'nt help.

I can accept this approach that users should'nt directly interfere
with the optimizer. But I think it's still worth to discuss a
configuration option (per table) or so which tells PG that this table
contents should fit into memory so that it tries to load a table into
memory and keeps it there. This option probably only makes sense in
combination with unlogged tables.

Yours, Stefan

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-02-26 Thread Andy Colson


On 02/26/2012 01:11 PM, Stefan Keller wrote:

2012/2/26 Andy Colsona...@squeakycode.net  wrote:

On 02/25/2012 06:16 PM, Stefan Keller wrote:

1. How can I warm up or re-populate shared buffers of Postgres?
2. Are there any hints on how to tell Postgres to read in all table
contents into memory?

Yours, Stefan


How about after you load the data, vacuum freeze it, then do something like:

SELECT count(*) FROM osm_point WHERE tags @  'tourism=junk'

-Andy


That good idea is what I proposed elsewhere on one of the PG lists and
got told that this does'nt help.

I can accept this approach that users should'nt directly interfere
with the optimizer. But I think it's still worth to discuss a
configuration option (per table) or so which tells PG that this table
contents should fit into memory so that it tries to load a table into
memory and keeps it there. This option probably only makes sense in
combination with unlogged tables.

Yours, Stefan



I don't buy that.  Did you test it?  Who/where did you hear this?  And... how 
long does it take after you replace the entire table until things are good and 
cached?  One or two queries?

After a complete reload of the data, do you vacuum freeze it?

After a complete reload of the data, how long until its fast?

-Andy

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-02-26 Thread Cédric Villemain

Le dimanche 26 février 2012 01:16:08, Stefan Keller a écrit :
 Hi,
 
 2011/10/24 Stephen Frost sfr...@snowman.net wrote
 
  Now, we've also been discussing ways to have PG automatically
  re-populate shared buffers and possibly OS cache based on what was in
  memory at the time of the last shut-down, but I'm not sure that would
  help your case either since you're rebuilding everything every night and
  that's what's trashing your buffers (because everything ends up getting
  moved around).  You might actually want to consider if that's doing more
  harm than good for you.  If you weren't doing that, then the cache
  wouldn't be getting destroyed every night..
 
 I'd like to come back on the issue of aka of in-memory key-value database.
 
 To remember, it contains table definition and queries as indicated in
 the appendix [0]. There exist 4 other tables of similar structure.
 There are indexes on each column. The tables contain around 10 million
 tuples. The database is read-only; it's completely updated every
 day. I don't expect more than 5 concurrent users at any time. A
 typical query looks like [1] and varies in an unforeseable way (that's
 why hstore is used). EXPLAIN tells me that the indexes are used [2].
 
 The problem is that the initial queries are too slow - and there is no
 second chance. I do have to trash the buffer every night. There is
 enough main memory to hold all table contents.
 
 1. How can I warm up or re-populate shared buffers of Postgres?

There was a patch proposed for postgresql which purpose was to 
snapshot/Restore postgresql buffers, but it is still not sure how far that 
really help to have that part loaded.

 2. Are there any hints on how to tell Postgres to read in all table
 contents into memory?

I wrote pgfincore for the OS part: you can use it to preload table/index in OS 
cache, and do snapshot/restore if you want fine grain control of what part of 
the object you want to warm.
https://github.com/klando/pgfincore


 
 Yours, Stefan
 
 
 APPENDIX
 
 [0]
 CREATE TABLE osm_point (
  osm_id integer,
  name text,
  tags hstore
  geom geometry(Point,4326)
 );
 
 
 [1]
 SELECT osm_id, name FROM osm_point
   WHERE tags @ 'tourism=viewpoint'
   AND ST_Contains(
 GeomFromText('BOX(8.42 47.072, 9.088 47.431)'::box2d, 4326),
 geom)
 
 [2]
 EXPLAIN ANALYZE returns:
  Bitmap Heap Scan on osm_point  (cost=402.15..40465.85 rows=430
 width=218) (actual time=121.888..137.
Recheck Cond: (tags @ 'tourism=viewpoint'::hstore)
Filter: (('01030...'::geometry  geom) AND
 _st_contains('01030'::geometry, geom))
-  Bitmap Index Scan on osm_point_tags_idx  (cost=0.00..402.04
 rows=11557 width=0) (actual time=1 6710 loops=1)
  Index Cond: (tags @ 'tourism=viewpoint'::hstore)
  Total runtime: 137.881 ms
 (6 rows)

-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-02-26 Thread Stefan Keller

Hi,

2012/2/26 Cédric Villemain ced...@2ndquadrant.fr wrote:
 1. How can I warm up or re-populate shared buffers of Postgres?

 There was a patch proposed for postgresql which purpose was to

Which patch are you referring to?

 snapshot/Restore postgresql buffers, but it is still not sure how far that
 really help to have that part loaded.

What's not sure and why?

 2. Are there any hints on how to tell Postgres to read in all table
 contents into memory?

 I wrote pgfincore for the OS part: you can use it to preload table/index in OS
 cache, and do snapshot/restore if you want fine grain control of what part of
 the object you want to warm.
 https://github.com/klando/pgfincore

Yes, now I remember. I have a look at that.

I'd still like to see something where PG really preloads tuples and
treats them always in-memory (given they fit into RAM).
Since I have a read-only database there's no WAL and locking needed.
But as soon as we allow writes I realize that the in-memory feature
needs to be coupled with other enhancements like replication (which
somehow would avoid WAL).

Yours, Stefan

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-02-26 Thread Jeff Janes

On Sun, Feb 26, 2012 at 2:56 AM, Stefan Keller sfkel...@gmail.com wrote:
 Hi Jeff and Wales,

 2012/2/26 Jeff Janes jeff.ja...@gmail.com wrote:
 The problem is that the initial queries are too slow - and there is no
 second chance. I do have to trash the buffer every night. There is
 enough main memory to hold all table contents.

 Just that table, or the entire database?

 The entire database consisting of only about 5 tables which are
 similar but with different geometry types plus a relations table (as
 OpenStreetMap calls it).

And all of those combined fit in RAM?  With how much to spare?


 1. How can I warm up or re-populate shared buffers of Postgres?

 Instead, warm the OS cache.  Then data will get transferred into the
 postgres shared_buffers pool from the OS cache very quickly.

 tar -c $PGDATA/base/ |wc -c

 Ok. So with OS cache you mean the files which to me are THE database itself?

Most operating systems will use any otherwise unused RAM to cache
recently accessed file-system data.  That is the OS cache.  The
purpose of the tar is to populate the OS cache with the database
itself.  That way, when postgres wants something that isn't already
in shared_buffers, it doesn't require a disk read to get it, just a
request to the OS.

But this trick is most useful after the OS has been restarted so the
OS cache is empty.  If the OS has been up for a long time, then why
isn't it already populated with the data you need?  Maybe the data
doesn't fit, maybe some other process has trashed the cache (in which
case, why would it not continue to trash the cache on an ongoing
basis?)

Since you just recently created the tables and indexes, they must have
passed through the OS cache on the way to disk.  So why aren't they
still there?  Is shared_buffers so large that little RAM is left over
for the OS?  Did you reboot the OS?  Are there other processes running
that drive the database-specific files out of the OS cache?

 A cache to me is a second storage with controlled redudancy because
 of performance reasons.

Yeah.  But there are multiple caches, with different parties in
control and different opinions of what is redundant.

 2. Are there any hints on how to tell Postgres to read in all table
 contents into memory?

 I don't think so, at least not in core.  I've wondered if it would
 make sense to suppress ring-buffer strategy when there are buffers on
 the free-list.  That way a sequential scan would populate
 shared_buffers after a restart.  But it wouldn't help you get the
 indexes into cache.

 So, are there any developments going on with PostgreSQL as Stephen
 suggested in the former thread?

I don't see any active development for the upcoming release, and most
of what has been suggested wouldn't help you because they are about
re-populating the cache with previously hot data, while you are
destroying your previously hot data and wanting to specify the
future-hot data.

By the way, your explain plan would be more useful if it included
buffers.  Explain (analyze, buffers) select...

I don't know that it is ever better to run analyze without buffers,
other than for backwards compatibility.  I'm trying to get in the
habit of just automatically doing it.

Cheers,

Jeff

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-02-25 Thread Stefan Keller

Hi,

2011/10/24 Stephen Frost sfr...@snowman.net wrote
 Now, we've also been discussing ways to have PG automatically
 re-populate shared buffers and possibly OS cache based on what was in
 memory at the time of the last shut-down, but I'm not sure that would
 help your case either since you're rebuilding everything every night and
 that's what's trashing your buffers (because everything ends up getting
 moved around).  You might actually want to consider if that's doing more
 harm than good for you.  If you weren't doing that, then the cache
 wouldn't be getting destroyed every night..

I'd like to come back on the issue of aka of in-memory key-value database.

To remember, it contains table definition and queries as indicated in
the appendix [0]. There exist 4 other tables of similar structure.
There are indexes on each column. The tables contain around 10 million
tuples. The database is read-only; it's completely updated every
day. I don't expect more than 5 concurrent users at any time. A
typical query looks like [1] and varies in an unforeseable way (that's
why hstore is used). EXPLAIN tells me that the indexes are used [2].

The problem is that the initial queries are too slow - and there is no
second chance. I do have to trash the buffer every night. There is
enough main memory to hold all table contents.

1. How can I warm up or re-populate shared buffers of Postgres?
2. Are there any hints on how to tell Postgres to read in all table
contents into memory?

Yours, Stefan


APPENDIX

[0]
CREATE TABLE osm_point (
 osm_id integer,
 name text,
 tags hstore
 geom geometry(Point,4326)
);


[1]
SELECT osm_id, name FROM osm_point
  WHERE tags @ 'tourism=viewpoint'
  AND ST_Contains(
GeomFromText('BOX(8.42 47.072, 9.088 47.431)'::box2d, 4326),
geom)

[2]
EXPLAIN ANALYZE returns:
 Bitmap Heap Scan on osm_point  (cost=402.15..40465.85 rows=430
width=218) (actual time=121.888..137.
   Recheck Cond: (tags @ 'tourism=viewpoint'::hstore)
   Filter: (('01030...'::geometry  geom) AND
_st_contains('01030'::geometry, geom))
   -  Bitmap Index Scan on osm_point_tags_idx  (cost=0.00..402.04
rows=11557 width=0) (actual time=1 6710 loops=1)
 Index Cond: (tags @ 'tourism=viewpoint'::hstore)
 Total runtime: 137.881 ms
(6 rows)

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-02-25 Thread Jeff Janes

On Sat, Feb 25, 2012 at 4:16 PM, Stefan Keller sfkel...@gmail.com wrote:

 I'd like to come back on the issue of aka of in-memory key-value database.

 To remember, it contains table definition and queries as indicated in
 the appendix [0]. There exist 4 other tables of similar structure.
 There are indexes on each column. The tables contain around 10 million
 tuples. The database is read-only; it's completely updated every
 day. I don't expect more than 5 concurrent users at any time. A
 typical query looks like [1] and varies in an unforeseable way (that's
 why hstore is used). EXPLAIN tells me that the indexes are used [2].

 The problem is that the initial queries are too slow - and there is no
 second chance. I do have to trash the buffer every night. There is
 enough main memory to hold all table contents.

Just that table, or the entire database?


 1. How can I warm up or re-populate shared buffers of Postgres?

Instead, warm the OS cache.  Then data will get transferred into the
postgres shared_buffers pool from the OS cache very quickly.

tar -c $PGDATA/base/ |wc -c

If you need to warm just one table, because the entire base directory
won't fit in OS cache, then you need to do a bit more work to find out
which files to use.

You might feel clever and try this instead:

tar -c /dev/null $PGDATA/base/  /dev/null

But my tar program is too clever by half.  It detects that it is
writing to /dev/null, and just does not actually read the data.

 2. Are there any hints on how to tell Postgres to read in all table
 contents into memory?

I don't think so, at least not in core.  I've wondered if it would
make sense to suppress ring-buffer strategy when there are buffers on
the free-list.  That way a sequential scan would populate
shared_buffers after a restart.  But it wouldn't help you get the
indexes into cache.

Cheers,

Jeff

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

41 matches

Mail list logo