Re: [PERFORM] B-Heaps

2010-06-15 Thread A. Kretschmer
In response to Greg Smith:
 For details about what the database does there, see "Inside the
 PostgreSQL Buffer Cache" at http://projects.2ndquadrant.com/talks

Nice paper, btw., thanks for that!


Regards, Andreas
-- 
Andreas Kretschmer
Contact:  Heynitz: 035242/47150,   D1: 0160/7141639 (more: see header)
GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431  2EB0 389D 1DC2 3172 0C99

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



Re: [PERFORM] B-Heaps

2010-06-15 Thread Matthew Wakeling

On Mon, 14 Jun 2010, Eliot Gable wrote:

Just curious if this would apply to PostgreSQL:
http://queue.acm.org/detail.cfm?id=1814327


Absolutely, and as I said in
http://archives.postgresql.org/pgsql-performance/2010-03/msg00272.php
the same idea would be applied to the Postgres B-tree indexes instead of
heaps. It's a pretty obvious performance improvement, really: the
principle is that when you do have to fetch a page from a slower medium,
you may as well make it count for a lot.
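The page-blocking idea is easy to see with a little arithmetic. Here is a toy sketch (not Postgres code, and with far smaller pages than reality) that counts the distinct pages a root-to-leaf walk touches in a flat implicit binary heap versus a layout that packs whole subtrees into a single page, which is the idea behind B-heaps and cache-oblivious B-trees:

```python
# Toy illustration: how many distinct "pages" does a root-to-leaf walk
# touch in a plain implicit binary heap, versus a layout that stores
# whole subtrees together on one page?

PAGE = 8  # entries per page; real pages hold far more

def pages_touched_flat(depth):
    """Plain implicit heap: node i has children 2i+1 and 2i+2, stored
    level by level, so deep levels scatter across many pages."""
    i, pages = 0, set()
    for _ in range(depth):
        pages.add(i // PAGE)   # which page this node lives on
        i = 2 * i + 1          # always descend to the left child
    return len(pages)

def pages_touched_blocked(depth):
    """Blocked layout: 3 consecutive levels (2**3 - 1 = 7 nodes, which
    fits in an 8-entry page) share one page, so a single page read
    serves 3 comparisons of the descent."""
    levels_per_page = 3
    return -(-depth // levels_per_page)  # ceil(depth / levels_per_page)
```

With a 12-level descent, the flat layout touches a different page for nearly every deep level, while the blocked layout touches only a third as many; with realistic 4 KB pages the gap is much larger.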


Lots of research has already been done on this - the paper linked above is 
rather behind the times.


However, AFAIK, Postgres has not implemented this in any of its indexing 
systems.


Matthew

--
An ant doesn't have a lot of processing power available to it. I'm not trying
to be speciesist - I wouldn't want to detract you from such a wonderful
creature, but, well, there isn't a lot there, is there?
   -- Computer Science Lecturer



Re: [PERFORM] PostgreSQL as a local in-memory cache

2010-06-15 Thread Steve Wampler

Chris Browne wrote:

jgard...@jonathangardner.net writes:

My question is how can I configure the database to run as quickly as
possible if I don't care about data consistency or durability? That
is, the data is updated so often and it can be reproduced fairly
rapidly so that if there is a server crash or random particles from
space mess up memory we'd just restart the machine and move on.


For such a scenario, I'd suggest you:

- Set up a filesystem that is memory-backed.  On Linux, RamFS or TmpFS
  are reasonable options for this.

- The complication would be that your "restart the machine and move
  on" needs to consist of quite a few steps:

  - recreating the filesystem
  - fixing permissions as needed
  - running initdb to set up new PG instance
  - automating any needful fiddling with postgresql.conf, pg_hba.conf
  - starting up that PG instance
  - creating users, databases, schemas, ...


Doesn't PG now support putting both WAL and user table files onto
file systems other than the one holding the PG config files and PG
'admin' tables?  Wouldn't doing so simplify the above considerably
by allowing just the WAL and user tables on the memory-backed file
systems?  I wouldn't think the performance impact of leaving
the rest of the stuff on disk would be that large.

Or does losing WAL files mandate a new initdb?

--
Steve Wampler -- swamp...@noao.edu
The gods that smiled on your birth are now laughing out loud.



Re: [PERFORM] PostgreSQL as a local in-memory cache

2010-06-15 Thread Craig James

[oops, didn't hit reply to list first time, resending...]

On 6/15/10 9:02 AM, Steve Wampler wrote:

Chris Browne wrote:

jgard...@jonathangardner.net writes:

My question is how can I configure the database to run as quickly as
possible if I don't care about data consistency or durability? That
is, the data is updated so often and it can be reproduced fairly
rapidly so that if there is a server crash or random particles from
space mess up memory we'd just restart the machine and move on.


For such a scenario, I'd suggest you:

- Set up a filesystem that is memory-backed. On Linux, RamFS or TmpFS
are reasonable options for this.

- The complication would be that your "restart the machine and move
on" needs to consist of quite a few steps:

- recreating the filesystem
- fixing permissions as needed
- running initdb to set up new PG instance
- automating any needful fiddling with postgresql.conf, pg_hba.conf
- starting up that PG instance
- creating users, databases, schemas, ...


How about this: Set up a database entirely on a RAM disk, then install a 
WAL-logging warm standby.  If the production computer goes down, you bring the 
warm standby online, shut it down, and use tar(1) to recreate the database on 
the production server when you bring it back online.  You have speed and you 
have near-100% backup.

Craig



Re: [PERFORM] PostgreSQL as a local in-memory cache

2010-06-15 Thread jgard...@jonathangardner.net
On Jun 15, 8:47 am, Chris Browne cbbro...@acm.org wrote:
 jgard...@jonathangardner.net writes:
  My question is how can I configure the database to run as quickly as
  possible if I don't care about data consistency or durability? That
  is, the data is updated so often and it can be reproduced fairly
  rapidly so that if there is a server crash or random particles from
  space mess up memory we'd just restart the machine and move on.

 For such a scenario, I'd suggest you:

 - Set up a filesystem that is memory-backed.  On Linux, RamFS or TmpFS
   are reasonable options for this.


I had forgotten about this. I will try this out.

 - The complication would be that your "restart the machine and move
   on" needs to consist of quite a few steps:

   - recreating the filesystem
   - fixing permissions as needed
   - running initdb to set up new PG instance
   - automating any needful fiddling with postgresql.conf, pg_hba.conf
   - starting up that PG instance
   - creating users, databases, schemas, ...


I'm going to have a system in place to create these databases when I
restart the service.
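Such a provisioning system might be sketched as below, following Chris's steps. Every path, filename, and flag here is an illustrative assumption, not a tested recipe:

```python
# Hypothetical sketch of automating the "restart the machine and move on"
# sequence: recreate a tmpfs mount, initdb a fresh instance on it, and
# restore config and schema. Paths and flags are illustrative only.

def provision_commands(mountpoint="/mnt/pgram", pgdata="/mnt/pgram/data",
                       size="2G", user="postgres"):
    """Return, in order, the shell commands that would recreate a
    throwaway PostgreSQL instance on a memory-backed filesystem."""
    return [
        f"mount -t tmpfs -o size={size} tmpfs {mountpoint}",  # recreate the filesystem
        f"chown {user}: {mountpoint}",                        # fix permissions
        f"initdb -D {pgdata}",                                # set up new PG instance
        f"cp /etc/pgtuned/postgresql.conf {pgdata}/",         # pre-canned config fiddling
        f"pg_ctl -D {pgdata} start",                          # start that PG instance
        f"psql -f /etc/pgtuned/schema.sql postgres",          # users, databases, schemas
    ]
```

Generating the list separately from running it makes the sequence easy to inspect and log before anything destructive happens.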

 ...

 I wonder if this kind of installation comes into its own for more
 realistic scenarios in the presence of streaming replication.  If you
 know the WAL files have gotten to disk on another server, that's a
 pretty good guarantee :-).


I have found that pre-computing and storing values in a general
relational-type database, without durability, is an ideal way to
improve services that need to return calculated results quickly once
a simple hash lookup is no longer sufficient. Perhaps PostgreSQL
running in this mode will be the ideal solution.

Nowadays, no one is really surprised that it takes 30 seconds or so to
replicate your data everywhere, but they do detest not getting answers
to their complicated queries immediately.




Re: [PERFORM] PostgreSQL as a local in-memory cache

2010-06-15 Thread Chris Browne
swamp...@noao.edu (Steve Wampler) writes:
 Or does losing WAL files mandate a new initdb?

Losing WAL would mandate initdb, so I'd think this all fits into the
set of stuff worth putting onto ramfs/tmpfs.  Certainly it'll all be
significant to the performance focus.
-- 
select 'cbbrowne' || '@' || 'cbbrowne.com';
http://cbbrowne.com/info/internet.html
MS  apparently now  has a  team dedicated  to tracking  problems with
Linux  and publicizing them.   I guess  eventually they'll  figure out
this back fires... ;) -- William Burrow aa...@delete.fan.nb.ca



Re: [PERFORM] B-Heaps

2010-06-15 Thread Yeb Havinga

Greg Smith wrote:

Eliot Gable wrote:
Just curious if this would apply to PostgreSQL: 
http://queue.acm.org/detail.cfm?id=1814327


It's hard to take this seriously at all when it's so ignorant of 
actual research in this area.  Take a look at 
http://www.cc.gatech.edu/~bader/COURSES/UNM/ece637-Fall2003/papers/BFJ01.pdf 
for a second

Interesting paper, thanks for the reference!

PostgreSQL is modeling a much more complicated situation where there 
are many levels of caches, from CPU to disk.  When executing a query, 
the database tries to manage that by estimating the relative costs for 
CPU operations, row operations, sequential disk reads, and random disk 
reads.  Those fundamental operations are then added up to build more 
complicated machinery like sorting.  To minimize query execution cost, 
various query plans are considered, the cost computed for each one, 
and the cheapest one gets executed.  This has to take into account a 
wide variety of subtle tradeoffs related to whether memory should be 
used for things that would otherwise happen on disk.  There are three 
primary ways to search for a row, three main ways to do a join, two 
for how to sort, and they all need to have cost estimates made for 
them that balance CPU time, memory, and disk access.
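Greg's description can be made concrete with a toy version of that cost arithmetic, using the planner's default cost constants from postgresql.conf. This is an illustration of the idea, not the actual planner code:

```python
# Toy sketch of planner-style costing: fundamental operation costs are
# added up, and the cheapest access path wins. Constants are the
# documented postgresql.conf defaults.

SEQ_PAGE_COST = 1.0      # sequential page fetch (the unit of cost)
RANDOM_PAGE_COST = 4.0   # random page fetch
CPU_TUPLE_COST = 0.01    # processing one row

def seqscan_cost(pages, tuples):
    """Read every page sequentially and examine every tuple."""
    return SEQ_PAGE_COST * pages + CPU_TUPLE_COST * tuples

def indexscan_cost(matching_pages, matching_tuples):
    """Crude sketch: each page holding a matching row costs a random
    fetch, plus per-tuple CPU for the rows returned."""
    return RANDOM_PAGE_COST * matching_pages + CPU_TUPLE_COST * matching_tuples
```

For a selective predicate (say 100 matching rows on 10 pages of a 1000-page table) the index path is far cheaper; as the fraction of the table fetched grows, the random-fetch penalty eventually makes the sequential scan win.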
Do you think that the cache-oblivious algorithm described in the paper
could speed up index scans that hit disk, in Postgres's (and the OS's
and hardware's) multi-level memory case? (So that e.g. random_page_cost
could go down?)


regards,
Yeb Havinga



Re: [PERFORM] PostgreSQL as a local in-memory cache

2010-06-15 Thread Jaime Casanova
On Tue, Jun 15, 2010 at 12:37 PM, Chris Browne cbbro...@acm.org wrote:
 swamp...@noao.edu (Steve Wampler) writes:
 Or does losing WAL files mandate a new initdb?

 Losing WAL would mandate initdb, so I'd think this all fits into the
 set of stuff worth putting onto ramfs/tmpfs.  Certainly it'll all be
 significant to the performance focus.

Why is that? Isn't simply executing pg_resetxlog enough? Especially
since the OP doesn't care about losing some transactions.

-- 
Jaime Casanova www.2ndQuadrant.com
PostgreSQL support and training



Re: [PERFORM] PostgreSQL as a local in-memory cache

2010-06-15 Thread Josh Berkus
On 6/15/10 10:37 AM, Chris Browne wrote:
 swamp...@noao.edu (Steve Wampler) writes:
 Or does losing WAL files mandate a new initdb?
 
 Losing WAL would mandate initdb, so I'd think this all fits into the
 set of stuff worth putting onto ramfs/tmpfs.  Certainly it'll all be
 significant to the performance focus.

I'd like to see some figures about WAL on RAMfs vs. simply turning off
fsync and full_page_writes.  Per Gavin's tests, PostgreSQL is already
close to TokyoCabinet/MongoDB performance just with those turned off; I
wonder if actually having the WAL on a memory partition would make any
real difference in throughput.
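For reference, the settings in question are ordinary postgresql.conf knobs; the non-durable configuration being compared against would look something like the following (the combination is illustrative of the experiment, not a recommendation):

```
fsync = off                 # don't force WAL writes to disk
full_page_writes = off      # don't write full page images after checkpoints
synchronous_commit = off    # don't wait for WAL flush at commit
```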

I've seen a lot of call for this recently, especially since PostgreSQL
seems to be increasingly in use as a reporting server for Hadoop.  Might
be worth experimenting with just making WAL writing a no-op.  We'd also
want to disable checkpointing, of course.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com
