Re: [HACKERS] Table and Index compression

2009-08-12 Thread Peter Eisentraut
On Tuesday 11 August 2009 13:05:39 Pierre Frédéric Caillaud wrote: Well, here is the patch. I've included a README, which I paste here. If someone wants to play with it (after the CommitFest...) feel free to do so. While it was an interesting thing to try, I don't think it

Re: [HACKERS] Table and Index compression

2009-08-12 Thread Pierre Frédéric Caillaud
For future reference, and since this keeps appearing every few months: The license of LZO is not acceptable for inclusion or use with PostgreSQL. You need to find a different library if you want to pursue this further. Yes, I know about the license... I used LZO for tests, but since my

Re: [HACKERS] Table and Index compression

2009-08-11 Thread Pierre Frédéric Caillaud
Well, here is the patch. I've included a README, which I paste here. If someone wants to play with it (after the CommitFest...) feel free to do so. While it was an interesting thing to try, I don't think it has enough potential to justify more effort... * How to test - apply

Re: [HACKERS] Table and Index compression

2009-08-07 Thread Pierre Frédéric Caillaud
First, a few things that I forgot to mention in the previous message : I like the idea too, but I think there are some major problems to solve. In particular I think we need a better solution to blocks growing than sparse files. Sparse files allow something great : to test this concept

Re: [HACKERS] Table and Index compression

2009-08-07 Thread Pierre Frédéric Caillaud
On Thu, Aug 6, 2009 at 4:03 PM, Greg Stark gsst...@mit.edu wrote: I like the idea too, but I think there are some major problems to solve. In particular I think we need a better solution to blocks growing than sparse files. How much benefit does this approach have over using TOAST compression

Re: [HACKERS] Table and Index compression

2009-08-07 Thread Sam Mason
On Fri, Aug 07, 2009 at 10:36:39AM +0200, Pierre Frédéric Caillaud wrote: Also, about compressed NTFS : it can give you disk-full errors on read(). While this may appear stupid, it is in fact very good. Is this not just because they've broken the semantics of read? As a side note, I have

Re: [HACKERS] Table and Index compression

2009-08-07 Thread Greg Stark
2009/8/7 Pierre Frédéric Caillaud li...@peufeu.com: Also, about compressed NTFS : it can give you disk-full errors on read(). I suspect it's unavoidable for similar reasons to the problems Postgres faces. When you issue a read() you have to find space in the filesystem cache to hold the data.

Re: [HACKERS] Table and Index compression

2009-08-07 Thread Sam Mason
On Fri, Aug 07, 2009 at 10:33:33AM +0100, Greg Stark wrote: 2009/8/7 Pierre Frédéric Caillaud li...@peufeu.com: Also, about compressed NTFS : it can give you disk-full errors on read(). I suspect it's unavoidable for similar reasons to the problems Postgres faces. When you issue a read()

Re: [HACKERS] Table and Index compression

2009-08-07 Thread Greg Stark
On Fri, Aug 7, 2009 at 11:29 AM, Sam Mason s...@samason.me.uk wrote: When you choose a compression algorithm you know how much space a worst case compression will take (i.e. lzo takes up to 8% more for a 4kB block size). This space should be reserved in case of situations like the above and
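The "up to 8% more for a 4kB block size" figure matches the worst-case expansion bound documented in the LZO distribution itself (the formula below comes from LZO's notes, not from this thread); a quick check:

```python
def lzo1x_worst_case(in_len: int) -> int:
    # Documented upper bound on lzo1x output size for incompressible input:
    # in_len + in_len/16 + 64 + 3 (from the LZO distribution's documentation).
    return in_len + in_len // 16 + 64 + 3

# For a 4 kB block: 4096 + 256 + 64 + 3 = 4419 bytes total,
# i.e. 323 bytes of overhead -- just under 8%, and "more than 300 bytes".
overhead = lzo1x_worst_case(4096) - 4096
```

This is the amount a page-compression scheme would have to reserve per block to survive incompressible data.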

Re: [HACKERS] Table and Index compression

2009-08-07 Thread Pierre Frédéric Caillaud
Also, I'm puzzled why the space increase would be proportional to the amount of data and be more than 300 bytes. There's no reason it wouldn't be a small fixed amount. The ideal is you set aside one bit -- if the bit is set the rest is compressed and has to save at least one bit. If the
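The single-flag scheme Greg describes can be sketched as follows (a minimal illustration only: zlib stands in for LZO, whose license keeps it out of the tree, and a whole flag byte stands in for the single bit):

```python
import os
import zlib

def pack_block(raw: bytes) -> bytes:
    """Prefix one flag byte: 0x01 = payload is compressed, 0x00 = stored raw.
    Worst-case growth is exactly one byte, never proportional to the data."""
    comp = zlib.compress(raw)
    if len(comp) < len(raw):
        return b"\x01" + comp
    return b"\x00" + raw

def unpack_block(packed: bytes) -> bytes:
    flag, payload = packed[0], packed[1:]
    return zlib.decompress(payload) if flag else payload

block = b"A" * 4096                 # highly compressible: stored compressed
noise = os.urandom(4096)            # incompressible: stored raw, +1 byte only
```

With this framing the reserved worst-case space is a constant, which is Greg's point about the fixed overhead.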

Re: [HACKERS] Table and Index compression

2009-08-07 Thread Sam Mason
On Fri, Aug 07, 2009 at 11:49:46AM +0100, Greg Stark wrote: On Fri, Aug 7, 2009 at 11:29 AM, Sam Mason s...@samason.me.uk wrote: When you choose a compression algorithm you know how much space a worst case compression will take (i.e. lzo takes up to 8% more for a 4kB block size). This space

Re: [HACKERS] Table and Index compression

2009-08-07 Thread Greg Stark
On Fri, Aug 7, 2009 at 12:48 PM, Sam Mason s...@samason.me.uk wrote: Well most users want compression for the space savings. So running out of space sooner than without compression when most of the space is actually unused would disappoint them. Note, that as far as I can tell for a

Re: [HACKERS] Table and Index compression

2009-08-07 Thread Greg Stark
For reference what I'm picturing is this: When a table is compressed it's marked read-only which bars any new tuples from being inserted or existing tuples being deleted. Then it's frozen and any pages which contain tuples which can't be frozen are waited on until they can be. When it's finished
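The offline pass Greg pictures could be sketched like this (my sketch, not the patch's actual layout: zlib stands in for LZO, and a simple offset table provides the random block access a heap file normally gets by arithmetic):

```python
import zlib

BLCKSZ = 8192  # Postgres default page size

def compress_relation(blocks):
    """One-shot pass over a read-only, frozen relation: each page is
    compressed independently so it can still be fetched at random, and an
    offset table records where each compressed page starts."""
    offsets, out = [], bytearray()
    for blk in blocks:
        assert len(blk) == BLCKSZ
        offsets.append(len(out))
        out += zlib.compress(blk)
    offsets.append(len(out))            # end sentinel for the last page
    return offsets, bytes(out)

def read_block(offsets, data, i):
    # Random access: slice between consecutive offsets, then decompress.
    return zlib.decompress(data[offsets[i]:offsets[i + 1]])

blocks = [bytes([i]) * BLCKSZ for i in range(4)]
offs, data = compress_relation(blocks)
```

Marking the table read-only first is what makes this workable: no block can ever grow, so the offset table never needs to move anything.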

Re: [HACKERS] Table and Index compression

2009-08-07 Thread Sam Mason
On Fri, Aug 07, 2009 at 12:59:57PM +0100, Greg Stark wrote: On Fri, Aug 7, 2009 at 12:48 PM, Sam Mason s...@samason.me.uk wrote: Well most users want compression for the space savings. So running out of space sooner than without compression when most of the space is actually unused would

Re: [HACKERS] Table and Index compression

2009-08-07 Thread Robert Haas
On Fri, Aug 7, 2009 at 8:18 AM, Greg Stark gsst...@mit.edu wrote: For reference what I'm picturing is this: When a table is compressed it's marked read-only which bars any new tuples from being inserted or existing tuples being deleted. Then it's frozen and any pages which contain tuples which

Re: [HACKERS] Table and Index compression

2009-08-07 Thread Pierre Frédéric Caillaud
For reference what I'm picturing is this: When a table is compressed it's marked read-only which bars any new tuples from being inserted or existing tuples being deleted. Then it's frozen and any pages which contain tuples which can't be frozen are waited on until they can be. When it's finished

Re: [HACKERS] Table and Index compression

2009-08-07 Thread Pierre Frédéric Caillaud
Not strictly related to compression, but I've noticed something really strange... pg 8.4 (vanilla) is doing it, and my compressed version is doing it too. tablespace is a RAID5 of 3 drives, xlog is on a RAID1 of 2 drives, but it does it too if I put the tablespace and data on the same volume.

Re: [HACKERS] Table and Index compression

2009-08-07 Thread Sam Mason
On Fri, Aug 07, 2009 at 03:29:44PM +0200, Pierre Frédéric Caillaud wrote: vmstat output : Sorry, I don't know enough of PGs internals to suggest anything here, but iostat may give you more details as to what's going on. -- Sam http://samason.me.uk/ -- Sent via pgsql-hackers mailing list

Re: [HACKERS] Table and Index compression

2009-08-07 Thread Kevin Grittner
Pierre Frédéric Caillaud li...@peufeu.com wrote: tablespace is a RAID5 of 3 drives, xlog is on a RAID1 of 2 drives, but it does it too if I put the tablespace and data on the same volume. it starts out relatively fast : si so bi bo in cs us sy id wa 0 0 0

Re: [HACKERS] Table and Index compression

2009-08-07 Thread Pierre Frédéric Caillaud
On Fri, 07 Aug 2009 15:42:35 +0200, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Pierre Frédéric Caillaud li...@peufeu.com wrote: tablespace is a RAID5 of 3 drives, xlog is on a RAID1 of 2 drives, but it does it too if I put the tablespace and data on the same volume. it starts out

Re: [HACKERS] Table and Index compression

2009-08-07 Thread Sam Mason
On Fri, Aug 07, 2009 at 04:17:18PM +0200, Pierre Frédéric Caillaud wrote: I'm answering my own question : at the beginning of the run, postgres creates a 800MB temporary file, then it fills the table, then deletes the temp file. Is this because I use generate_series to fill the test

Re: [HACKERS] Table and Index compression

2009-08-07 Thread Josh Berkus
Pierre, I added a field in PageHeader which contains : - 0 to indicate a non-compressed page - length of compressed data if compressed If compression gains nothing (ie gains less than 4K), the page is stored raw. It seems that only pages having a PageHeader are
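Pierre's header convention (0 = raw page, otherwise the compressed length) can be illustrated like this; zlib stands in for LZO, the 2-byte field is hypothetical, and the 4 kB threshold mirrors the "gains less than 4K, stored raw" rule, since saving less than one filesystem-block granule gains nothing on disk:

```python
import os
import struct
import zlib

BLCKSZ = 8192
HDR = struct.Struct("<H")   # hypothetical 2-byte length field in the page header

def store_page(page: bytes) -> bytes:
    comp = zlib.compress(page)
    # Store compressed only if it saves at least one 4 kB granule;
    # otherwise write length 0 and keep the page raw.
    if len(comp) <= BLCKSZ - 4096:
        return HDR.pack(len(comp)) + comp
    return HDR.pack(0) + page

def load_page(stored: bytes) -> bytes:
    (clen,) = HDR.unpack_from(stored)
    body = stored[HDR.size:]
    return zlib.decompress(body[:clen]) if clen else body

zeros = b"\x00" * BLCKSZ    # compresses far below the threshold
rnd = os.urandom(BLCKSZ)    # incompressible: falls back to raw storage
```

The nice property is that decompression never needs to guess: a zero length field always means "read the page as-is".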

[HACKERS] Table and Index compression

2009-08-06 Thread PFC
With the talk about adding compression to pg_dump lately, I've been wondering if tables and indexes could be compressed too. So I've implemented a quick on-the-fly compression patch for postgres Sorry for the long email, but I hope you find this interesting. Why compress ? 1- To

Re: [HACKERS] Table and Index compression

2009-08-06 Thread Josh Berkus
On 8/6/09 2:39 AM, PFC wrote: With the talk about adding compression to pg_dump lately, I've been wondering if tables and indexes could be compressed too. So I've implemented a quick on-the-fly compression patch for postgres I find this very interesting, and would like to test it

Re: [HACKERS] Table and Index compression

2009-08-06 Thread Guillaume Smet
Pierre, On Thu, Aug 6, 2009 at 11:39 AM, PFC li...@peufeu.com wrote: The best for this is lzo : very fast decompression, a good compression ratio on a sample of postgres table and indexes, and a license that could work. The license of lzo doesn't allow us to include it in PostgreSQL without

Re: [HACKERS] Table and Index compression

2009-08-06 Thread Greg Stark
I like the idea too, but I think there are some major problems to solve. In particular I think we need a better solution to blocks growing than sparse files. The main problem with using sparse files is that currently postgres is careful to allocate blocks early so it can fail if there's not

Re: [HACKERS] Table and Index compression

2009-08-06 Thread Robert Haas
On Thu, Aug 6, 2009 at 4:03 PM, Greg Stark gsst...@mit.edu wrote: I like the idea too, but I think there are some major problems to solve. In particular I think we need a better solution to blocks growing than sparse files. How much benefit does this approach have over using TOAST compression

Re: [HACKERS] Table and Index compression

2009-08-06 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote: On Thu, Aug 6, 2009 at 4:03 PM, Greg Stark gsst...@mit.edu wrote: I like the idea too, but I think there are some major problems to solve. In particular I think we need a better solution to blocks growing than sparse files. How much benefit does this

Re: [HACKERS] Table and Index compression

2009-08-06 Thread Josh Berkus
On 8/6/09 1:03 PM, Greg Stark wrote: One possibility is to handle only read-only tables. That would make things a *lot* simpler. But it sure would be inconvenient if it's only useful on large static tables but requires you to rewrite the whole table -- just what you don't want to do with large

Re: [HACKERS] Table and Index compression

2009-08-06 Thread Ron Mayer
I'm curious what advantages there are in building compression into the database itself, rather than using filesystem-based compression. I see ZFS articles[1] discuss how enabling compression improves performance with ZFS; for Linux, Btrfs has compression features as well[2]; and on Windows NTFS