Re: [PERFORM] B-Heaps
In response to Greg Smith:

> For details about what the database does there, see "Inside the
> PostgreSQL Buffer Cache" at http://projects.2ndquadrant.com/talks

Nice paper, btw., thanks for that!

Regards, Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,  D1: 0160/7141639 (mehr: -> Header)
GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431 2EB0 389D 1DC2 3172 0C99
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] B-Heaps
On Mon, 14 Jun 2010, Eliot Gable wrote:

> Just curious if this would apply to PostgreSQL:
> http://queue.acm.org/detail.cfm?id=1814327

Absolutely, and I said as much in
http://archives.postgresql.org/pgsql-performance/2010-03/msg00272.php, but
applied to the Postgres B-tree indexes instead of heaps. It's a pretty
obvious performance improvement really - the principle is that when you do
have to fetch a page from a slower medium, you may as well make it count
for a lot.

Lots of research has already been done on this - the paper linked above is
rather behind the times. However, AFAIK, Postgres has not implemented this
in any of its indexing systems.

Matthew

--
An ant doesn't have a lot of processing power available to it. I'm not
trying to be speciesist - I wouldn't want to detract you from such a
wonderful creature, but, well, there isn't a lot there, is there?
                                        -- Computer Science Lecturer
Re: [PERFORM] PostgreSQL as a local in-memory cache
Chris Browne wrote:
> jgard...@jonathangardner.net writes:
>> My question is how can I configure the database to run as quickly as
>> possible if I don't care about data consistency or durability? That is,
>> the data is updated so often and it can be reproduced fairly rapidly so
>> that if there is a server crash or random particles from space mess up
>> memory we'd just restart the machine and move on.
>
> For such a scenario, I'd suggest you:
>
> - Set up a filesystem that is memory-backed. On Linux, RamFS or TmpFS
>   are reasonable options for this.
>
> - The complication would be that your "restart the machine and move on"
>   needs to consist of quite a few steps:
>   - recreating the filesystem
>   - fixing permissions as needed
>   - running initdb to set up a new PG instance
>   - automating any needful fiddling with postgresql.conf, pg_hba.conf
>   - starting up that PG instance
>   - creating users, databases, schemas, ...

Doesn't PG now support putting both WAL and user table files onto file
systems other than the one holding the PG config files and PG "admin"
tables? Wouldn't doing so simplify the above considerably by allowing just
the WAL and user tables on the memory-backed file systems? I wouldn't think
the performance impact of leaving the rest of the stuff on disk would be
that large.

Or does losing WAL files mandate a new initdb?

--
Steve Wampler -- swamp...@noao.edu
The gods that smiled on your birth are now laughing out loud.
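The split Steve is asking about can be sketched concretely: user tables can
be pointed at a memory-backed mount via tablespaces, while WAL is not
covered by tablespaces, so relocating it means symlinking pg_xlog elsewhere
(or using initdb's --xlogdir option). The path and names below are
hypothetical:

```sql
-- Assumes /mnt/pg_ram is a tmpfs/ramfs mount writable by the postgres
-- user, while the config files and system catalogs stay on disk.
CREATE TABLESPACE ram_space LOCATION '/mnt/pg_ram';

-- Place individual objects there explicitly...
CREATE TABLE cache_results (key text PRIMARY KEY, value text)
    TABLESPACE ram_space;

-- ...or make it the default for new objects in this session:
SET default_tablespace = ram_space;
```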
Re: [PERFORM] PostgreSQL as a local in-memory cache
[oops, didn't hit reply-to-list the first time, resending...]

On 6/15/10 9:02 AM, Steve Wampler wrote:
> Chris Browne wrote:
>> jgard...@jonathangardner.net writes:
>>> My question is how can I configure the database to run as quickly as
>>> possible if I don't care about data consistency or durability? [...]
>>
>> For such a scenario, I'd suggest you:
>> - Set up a filesystem that is memory-backed. On Linux, RamFS or TmpFS
>>   are reasonable options for this.
>> - The complication would be that your "restart the machine and move on"
>>   needs to consist of quite a few steps: recreating the filesystem,
>>   fixing permissions as needed, running initdb to set up a new PG
>>   instance, automating any needful fiddling with postgresql.conf and
>>   pg_hba.conf, starting up that PG instance, and creating users,
>>   databases, schemas, ...

How about this: set up a database entirely on a RAM disk, then install a
WAL-logging warm standby. If the production computer goes down, you bring
the warm standby online, shut it down, and use tar(1) to recreate the
database on the production server when you bring it back online. You have
speed and you have near-100% backup.

Craig
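Craig's setup would rest on ordinary WAL shipping. The parameter names
below are the real ones; the host, paths, and commands are only
illustrative:

```
# On the RAM-disk primary (postgresql.conf): ship completed WAL segments
# to the standby host as they are archived.
archive_mode = on
archive_command = 'rsync %p standby:/var/lib/pgsql/incoming/%f'

# On the warm standby (recovery.conf): replay shipped segments as they
# arrive, keeping the standby continuously in recovery.
restore_command = 'pg_standby /var/lib/pgsql/incoming %f %p %r'
```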
Re: [PERFORM] PostgreSQL as a local in-memory cache
On Jun 15, 8:47 am, Chris Browne cbbro...@acm.org wrote:
> jgard...@jonathangardner.net writes:
>> My question is how can I configure the database to run as quickly as
>> possible if I don't care about data consistency or durability? [...]
>
> For such a scenario, I'd suggest you:
>
> - Set up a filesystem that is memory-backed. On Linux, RamFS or TmpFS
>   are reasonable options for this.

I had forgotten about this. I will try this out.

> - The complication would be that your "restart the machine and move on"
>   needs to consist of quite a few steps:
>   - recreating the filesystem
>   - fixing permissions as needed
>   - running initdb to set up a new PG instance
>   - automating any needful fiddling with postgresql.conf, pg_hba.conf
>   - starting up that PG instance
>   - creating users, databases, schemas, ...

I'm going to have a system in place to create these databases when I
restart the service.

> ... I wonder if this kind of installation "comes into its own" for more
> realistic scenarios in the presence of streaming replication. If you
> know the WAL files have gotten to disk on another server, that's a
> pretty good guarantee :-).

I have found that pre-computing and storing values in a general
relational-type database without durability is an ideal use case for
services that need to return calculated results quickly, once a simple
hash lookup is no longer sufficient. Perhaps PostgreSQL running in this
mode will be the ideal solution. Nowadays, no one is really surprised that
it takes 30 seconds or so to replicate your data everywhere, but they do
detest not getting answers to their complicated queries immediately.
Re: [PERFORM] PostgreSQL as a local in-memory cache
swamp...@noao.edu (Steve Wampler) writes:
> Or does losing WAL files mandate a new initdb?

Losing WAL would mandate initdb, so I'd think this all fits into the set of
stuff worth putting onto ramfs/tmpfs. Certainly it'll all be significant to
the performance focus.

--
select 'cbbrowne' || '@' || 'cbbrowne.com';
http://cbbrowne.com/info/internet.html
"MS apparently now has a team dedicated to tracking problems with Linux and
publicizing them. I guess eventually they'll figure out this backfires..."
;) -- William Burrow aa...@delete.fan.nb.ca
Re: [PERFORM] B-Heaps
Greg Smith wrote:
> Eliot Gable wrote:
>> Just curious if this would apply to PostgreSQL:
>> http://queue.acm.org/detail.cfm?id=1814327
>
> It's hard to take this seriously at all when it's so ignorant of actual
> research in this area. Take a look at
> http://www.cc.gatech.edu/~bader/COURSES/UNM/ece637-Fall2003/papers/BFJ01.pdf
> for a second

Interesting paper, thanks for the reference!

> PostgreSQL is modeling a much more complicated situation where there are
> many levels of caches, from CPU to disk. When executing a query, the
> database tries to manage that by estimating the relative costs for CPU
> operations, row operations, sequential disk reads, and random disk
> reads. Those fundamental operations are then added up to build more
> complicated machinery like sorting. To minimize query execution cost,
> various query plans are considered, the cost computed for each one, and
> the cheapest one gets executed. This has to take into account a wide
> variety of subtle tradeoffs related to whether memory should be used for
> things that would otherwise happen on disk. There are three primary ways
> to search for a row, three main ways to do a join, and two for how to
> sort, and they all need to have cost estimates made for them that
> balance CPU time, memory, and disk access.

Do you think that the cache-oblivious algorithm described in the paper
could speed up index scans that hit disk, in Postgres's (and the
OS/hardware's) multi-level memory case? (So that e.g. random_page_cost
could go down?)

regards,
Yeb Havinga
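The per-operation costs Greg describes are exposed as planner settings, so
Yeb's random_page_cost question can be tried directly. A sketch (the table
name is hypothetical; the commented values are the stock defaults):

```sql
SHOW seq_page_cost;     -- sequential page fetch, the unit cost (default 1.0)
SHOW random_page_cost;  -- random page fetch (default 4.0)
SHOW cpu_tuple_cost;    -- per-row CPU work (default 0.01)

-- See whether a cheaper random fetch changes the chosen plan:
EXPLAIN SELECT * FROM some_table WHERE id = 42;
SET random_page_cost = 2.0;
EXPLAIN SELECT * FROM some_table WHERE id = 42;
```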
Re: [PERFORM] PostgreSQL as a local in-memory cache
On Tue, Jun 15, 2010 at 12:37 PM, Chris Browne cbbro...@acm.org wrote:
> swamp...@noao.edu (Steve Wampler) writes:
>> Or does losing WAL files mandate a new initdb?
>
> Losing WAL would mandate initdb, so I'd think this all fits into the set
> of stuff worth putting onto ramfs/tmpfs. Certainly it'll all be
> significant to the performance focus.

Why is that? Isn't simply executing pg_resetxlog enough? Especially since
the OP doesn't care about losing some transactions.

--
Jaime Casanova
www.2ndQuadrant.com
Soporte y capacitación de PostgreSQL
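For what Jaime suggests, the recovery step would look something like the
fragment below (the data directory path is hypothetical). Note that
pg_resetxlog can leave table contents internally inconsistent after a
crash, which is presumably tolerable in the OP's rebuild-from-scratch
scenario:

```shell
# Force a WAL reset on the surviving data directory, then restart;
# -f resets even if pg_control looks inconsistent. Transactions not yet
# checkpointed into the data files are lost.
pg_resetxlog -f /mnt/ramdisk/pgdata
pg_ctl -D /mnt/ramdisk/pgdata start
```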
Re: [PERFORM] PostgreSQL as a local in-memory cache
On 6/15/10 10:37 AM, Chris Browne wrote:
> swamp...@noao.edu (Steve Wampler) writes:
>> Or does losing WAL files mandate a new initdb?
>
> Losing WAL would mandate initdb, so I'd think this all fits into the set
> of stuff worth putting onto ramfs/tmpfs. Certainly it'll all be
> significant to the performance focus.

I'd like to see some figures for WAL on a RAMfs vs. simply turning off
fsync and full_page_writes. Per Gavin's tests, PostgreSQL is already close
to TokyoCabinet/MongoDB performance just with those turned off; I wonder
whether actually having the WAL on a memory partition would make any real
difference in throughput.

I've seen a lot of call for this recently, especially since PostgreSQL
seems to be increasingly in use as a reporting server for Hadoop. Might be
worth experimenting with just making WAL writing a no-op. We'd also want
to disable checkpointing, of course.

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
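The "turn it all off" variant Josh describes is mostly a postgresql.conf
matter. These are real settings (checkpointing can't be disabled outright,
only stretched out); the specific values are picked for illustration:

```
fsync = off                # never force WAL to stable storage
full_page_writes = off     # skip torn-page images in WAL
synchronous_commit = off   # commit without waiting for WAL flush
checkpoint_segments = 64   # checkpoint as rarely as...
checkpoint_timeout = 1h    # ...the settings allow
```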