Re: [HACKERS] Faster compression, again
Well, the patent argument, used like this, looks like a wild card, which can be freely interpreted as a mortal danger by some and a non-issue by others. A perfect scaremongering device. Quite frankly, I don't buy the idea that one implementation is safer because Google is backing it. I can't think of any reason why a byte-aligned LZ77 algorithm could face any such risk. And by the way, just look at the number of companies which had to pay protection money to Microsoft, or face litigation with Apple, because they were using Google's Android. It looks to me as though Google is more a magnet for such dangers than a protector.

Regarding test tools: yes, this is correct, Snappy C has more fuzzer tools provided within the package.

Regarding integration into btrfs, and therefore into Linux, both implementations look to be on equal terms. I haven't seen anything which says that one has a better chance than the other of being part of Linux 3.5. In fact, maybe both will be integrated at the same time. However, a little-publicized fact is that quite a few people have tried both implementations (Snappy C and LZ4), and there were more failures/difficulties reported with Snappy C. It doesn't mean that Snappy C is bad, just more complex to use. The LZ4 implementation seems more straightforward: fewer dependencies, fewer risks, less time spent to optimize it properly. In a word: simpler.

On 5 April 2012 at 01:11, Daniel Farina-4 [via PostgreSQL] <ml-node+s1045698n5619199...@n5.nabble.com> wrote:
> On Tue, Apr 3, 2012 at 7:29 AM, Huchev wrote:
>> For a C implementation, it could be interesting to consider the LZ4
>> algorithm, since it is written natively in this language. In contrast,
>> Snappy has been ported to C by Andi from the original C++ Google code,
>> which also translates into less extensive usage and testing.
>
> From what I can tell, the C implementation of snappy has more tests than
> this LZ4 implementation, including a fuzz tester. It's a maintained part
> of Linux as well, and used for btrfs --- this is why it was ported. The
> high-compression version of LZ4 is apparently LGPL.
>
> And, finally, there is the issue of patents: snappy has several
> multi-billion-dollar companies that can be held liable (originator
> Google, as well as anyone connected to Linux), and to the best of my
> knowledge, nobody has been held to extortion yet. Consider me
> unconvinced as to this line of argument.
>
> --
> fdr
Re: [HACKERS] Faster compression, again
On Tue, Apr 3, 2012 at 7:29 AM, Huchev <hugochevr...@gmail.com> wrote:
> For a C implementation, it could be interesting to consider the LZ4
> algorithm, since it is written natively in this language. In contrast,
> Snappy has been ported to C by Andi from the original C++ Google code,
> which also translates into less extensive usage and testing.

From what I can tell, the C implementation of snappy has more tests than this LZ4 implementation, including a fuzz tester. It's a maintained part of Linux as well, and used for btrfs --- this is why it was ported. The high-compression version of LZ4 is apparently LGPL.

And, finally, there is the issue of patents: snappy has several multi-billion-dollar companies that can be held liable (originator Google, as well as anyone connected to Linux), and to the best of my knowledge, nobody has been held to extortion yet. Consider me unconvinced as to this line of argument.

--
fdr
Re: [HACKERS] Faster compression, again
For a C implementation, it could be interesting to consider the LZ4 algorithm, since it is written natively in this language. In contrast, Snappy has been ported to C by Andi from the original C++ Google code, which also translates into less extensive usage and testing.

http://code.google.com/p/lz4/

Furthermore, the LZ4 license is BSD. And it has been reported in several tests as being faster than Snappy/LZO, especially on decompression speed.

http://article.gmane.org/gmane.comp.file-systems.btrfs/15744

And one last point: there is a high-compression mode, which could be useful for data rarely written/modified but often read.

http://code.google.com/p/lz4hc/
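For readers who want to try LZ4 from C, here is a minimal round-trip sketch against the lz4.h API. Hedged: it assumes a recent liblz4 exposing LZ4_compress_default and LZ4_decompress_safe; the releases current at the time of this thread exposed LZ4_compress/LZ4_uncompress instead.

    /* Minimal LZ4 round-trip sketch; not PostgreSQL code.
     * Assumes a recent liblz4 (link with -llz4). */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <lz4.h>

    int main(void)
    {
        const char *src = "a repetitive payload payload payload payload";
        int src_len = (int) strlen(src) + 1;
        int bound = LZ4_compressBound(src_len);   /* worst-case output size */
        char *comp = malloc(bound);
        char *back = malloc(src_len);

        if (!comp || !back)
            return 1;

        int comp_len = LZ4_compress_default(src, comp, src_len, bound);
        if (comp_len <= 0)
            return 1;                             /* compression failed */

        /* LZ4_decompress_safe bounds-checks malformed input */
        if (LZ4_decompress_safe(comp, back, comp_len, src_len) != src_len ||
            memcmp(src, back, src_len) != 0)
            return 1;                             /* round-trip mismatch */

        printf("%d -> %d bytes\n", src_len, comp_len);
        free(comp);
        free(back);
        return 0;
    }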
Re: [HACKERS] Faster compression, again
On Thu, Mar 15, 2012 at 10:34 PM, Daniel Farina <dan...@heroku.com> wrote:
> I'd really like to find a way to layer both message-oblivious and
> message-aware transport under FEBE with both backend and frontend
> support without committing the project to new code for-ever-and-ever.
> I guess I could investigate it in brief now, unless you've already
> thought about/done some work in that area.

Not done anything in that area myself. I think it's important that we have compression for the COPY protocol within libpq, so I'll add that to my must-do list - but I would be more than happy if you wanted to tackle that yourself.

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Faster compression, again
On Wed, Mar 14, 2012 at 11:06 AM, Daniel Farina <dan...@heroku.com> wrote:
> ...and it has been ported to C (recently, and with some quirks, like no
> LICENSE file...yet, although it is linked from the original Snappy
> project).

I poked the author about the license and he fixed it in a jiffy. Now under BSD, with Intel's copyright. He seems to be committing a few enhancements, but the snail's trail of the Internet suggests that this code has made its way into Linux as well, including btrfs. So now I guess we can have at it...

https://github.com/andikleen/snappy-c/

--
fdr
Re: [HACKERS] Faster compression, again
On Wed, Mar 14, 2012 at 6:06 PM, Daniel Farina <dan...@heroku.com> wrote:
> If we're curious how it affects replication traffic, I could probably
> gather statistics on LZO-compressed WAL traffic, of which we have a
> pretty huge amount captured.

What's the compression like for shorter chunks of data? Is it worth considering using this for the libpq copy protocol, and therefore for streaming replication also?

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Faster compression, again
On Thu, Mar 15, 2012 at 3:14 PM, Simon Riggs <si...@2ndquadrant.com> wrote:
> On Wed, Mar 14, 2012 at 6:06 PM, Daniel Farina <dan...@heroku.com> wrote:
>> If we're curious how it affects replication traffic, I could probably
>> gather statistics on LZO-compressed WAL traffic, of which we have a
>> pretty huge amount captured.
>
> What's the compression like for shorter chunks of data? Is it worth
> considering using this for the libpq copy protocol, and therefore for
> streaming replication also?

The overhead is between 1 and 5 bytes: length bytes that reserve the high bit as a continuation bit (so one byte for small data), and then straight into the data. So I think it could be applied to most payloads, even ones only a few bytes wide. Presumably that limit could be lifted, but the format description only allows for 2**32 - 1 as the uncompressed size.

I'd really like to find a way to layer both message-oblivious and message-aware transport under FEBE with both backend and frontend support without committing the project to new code for-ever-and-ever. I guess I could investigate it in brief now, unless you've already thought about/done some work in that area.

--
fdr
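For concreteness, the preamble Daniel describes is the snappy format's length varint: little-endian base-128, with the high bit of each byte as a continuation bit, so lengths up to 2**32 - 1 occupy 1 to 5 bytes. A sketch of the encoder in plain C, following the published format description:

    #include <stddef.h>
    #include <stdint.h>

    /* Encode v as the snappy preamble varint: 7 data bits per byte,
     * high bit set on every byte except the last. Returns the number
     * of bytes written, between 1 and 5. "out" needs room for 5 bytes. */
    static size_t
    varint32_encode(uint32_t v, unsigned char *out)
    {
        size_t n = 0;

        while (v >= 0x80)
        {
            out[n++] = (unsigned char) (v | 0x80);  /* continuation bit */
            v >>= 7;
        }
        out[n++] = (unsigned char) v;               /* final byte, high bit clear */
        return n;
    }

So a payload under 128 bytes costs a single preamble byte, and only lengths of 2**28 and up need all five.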
Re: [HACKERS] Faster compression, again
On Thu, Mar 15, 2012 at 10:14:12PM +0000, Simon Riggs wrote:
> On Wed, Mar 14, 2012 at 6:06 PM, Daniel Farina <dan...@heroku.com> wrote:
>> If we're curious how it affects replication traffic, I could probably
>> gather statistics on LZO-compressed WAL traffic, of which we have a
>> pretty huge amount captured.
>
> What's the compression like for shorter chunks of data? Is it worth
> considering using this for the libpq copy protocol, and therefore for
> streaming replication also?

Here is a pointer to some tests with Snappy+CouchDB:

https://github.com/fdmanana/couchdb/blob/b8f806e41727ba18ed6143cee31a3242e024ab2c/snappy-couch-tests.txt

They checked compression on smaller chunks of data. I have extracted the basic results. The first number is the original size in bytes, followed by the compressed size in bytes, the percent compressed, and the compression ratio:

    77   -> 60,   90% or 1.1:1
    120  -> 104,  87% or 1.15:1
    127  -> 80,   63% or 1.6:1
    5942 -> 2930, 49% or 2:1

It looks like a good candidate for both the libpq copy protocol and streaming replication. My two cents.

Regards,
Ken
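A sketch of how one might reproduce this sort of measurement using the snappy-c.h binding that ships with Google's snappy (assumed here; Andi Kleen's port uses different entry points, such as a snappy_env argument). Link with -lsnappy.

    /* Sketch: compress one small JSON-ish document and report the ratio. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <snappy-c.h>

    int main(void)
    {
        const char *doc = "{\"_id\":\"0001\",\"body\":\"a small document\"}";
        size_t in_len = strlen(doc);
        size_t out_len = snappy_max_compressed_length(in_len);
        char *out = malloc(out_len);

        if (!out)
            return 1;

        /* out_len is updated in place to the actual compressed size */
        if (snappy_compress(doc, in_len, out, &out_len) == SNAPPY_OK)
            printf("%zu -> %zu bytes (%.0f%% of original)\n",
                   in_len, out_len, 100.0 * (double) out_len / (double) in_len);

        free(out);
        return 0;
    }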
Re: [HACKERS] Faster compression, again
On Wed, Mar 14, 2012 at 11:06:16AM -0700, Daniel Farina wrote:
> For 9.3 at a minimum. The topic of LZO became mired in doubts about:
>
> * Potential patents
> * The author's intention for the implementation to be GPL
>
> Since then, Google released Snappy, also an LZ77-class implementation,
> and it has been ported to C (recently, and with some quirks, like no
> LICENSE file...yet, although it is linked from the original Snappy
> project). The original Snappy (C++) has a BSD license and a patent
> grant (which shields you from Google, at least).
>
> Do we want to investigate a very-fast compression algorithm inclusion
> again in the 9.3 cycle?

+1 for Snappy and a very fast compression algorithm.

Regards,
Ken
Re: [HACKERS] Faster compression, again
On Wed, Mar 14, 2012 at 1:06 PM, Daniel Farina <dan...@heroku.com> wrote:
> For 9.3 at a minimum. The topic of LZO became mired in doubts about:
>
> * Potential patents
> * The author's intention for the implementation to be GPL
>
> Since then, Google released Snappy, also an LZ77-class implementation,
> and it has been ported to C (recently, and with some quirks, like no
> LICENSE file...yet, although it is linked from the original Snappy
> project). The original Snappy (C++) has a BSD license and a patent
> grant (which shields you from Google, at least).
>
> Do we want to investigate a very-fast compression algorithm inclusion
> again in the 9.3 cycle?
>
> I've been using the similar implementation LZO for WAL archiving and it
> is a significant savings (not as much as pg_lesslog, but also less
> invasive). It is also fast enough that even if one were not to uproot
> TOAST's compression, it would probably be very close to a complete win
> for protocol traffic, whereas SSL's standardized zlib can definitely be
> a drag in some cases.
>
> This idea resurfaces often, but the reason why I wrote in about it is
> that I have a table which I had categorized as small but which was, in
> fact, 1.5MB, which made transferring it somewhat slow over a remote
> link. zlib compression takes it down to about 550K, and lzo (similar,
> but not identical) to 880K.
>
> If we're curious how it affects replication traffic, I could probably
> gather statistics on LZO-compressed WAL traffic, of which we have a
> pretty huge amount captured.

there are plenty of on gpl lz based libraries out there (for example: http://www.fastlz.org/) and always have been. they are all much faster than zlib. the main issue is patents...you have to be careful even though all the lz77/78 patents seem to have expired or apply to specifics not relevant to general use. see here for the last round of talks on this:

http://archives.postgresql.org/pgsql-performance/2009-08/msg00052.php

lzo is nearing its 20th birthday, so even if you are paranoid about patents (admittedly, there is good reason to be), the window is closing fast to have patent issues that aren't (a) expired or (b) covered by prior art on that or the various copycat implementations, at least in the US.

snappy looks amazing.

merlin
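As a point of reference for the zlib figure quoted above, the whole-buffer measurement is a short program against zlib's one-shot API. This is a sketch only: the zeroed 1.5MB buffer stands in for the real table data, and LZO is omitted since its API differs.

    /* Sketch: one-shot zlib compression of a 1.5MB buffer. Link with -lz. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <zlib.h>

    int main(void)
    {
        uLong src_len = 1500000;                   /* ~1.5MB stand-in buffer */
        Bytef *src = calloc(src_len, 1);
        uLongf dst_len = compressBound(src_len);   /* worst-case output size */
        Bytef *dst = malloc(dst_len);

        if (!src || !dst)
            return 1;

        /* one-shot deflate at the default compression level */
        if (compress2(dst, &dst_len, src, src_len, Z_DEFAULT_COMPRESSION) == Z_OK)
            printf("%lu -> %lu bytes\n", src_len, (uLong) dst_len);

        free(src);
        free(dst);
        return 0;
    }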
Re: [HACKERS] Faster compression, again
On 03/14/2012 04:10 PM, Merlin Moncure wrote:
> there are plenty of on gpl lz based libraries out there (for example:
> http://www.fastlz.org/) and always have been. they are all much faster
> than zlib. the main issue is patents...you have to be careful even
> though all the lz77/78 patents seem to have expired or apply to
> specifics not relevant to general use.

We're not going to include GPL code in the backend. We have enough trouble with readline, and that's only for psql. So the fact that there are GPL libraries can't help us, whether there are patent issues or not.

cheers

andrew
Re: [HACKERS] Faster compression, again
On Wed, Mar 14, 2012 at 04:43:55PM -0400, Andrew Dunstan wrote:
> On 03/14/2012 04:10 PM, Merlin Moncure wrote:
>> there are plenty of on gpl lz based libraries out there (for example:
>> http://www.fastlz.org/) and always have been. they are all much faster
>> than zlib. the main issue is patents...you have to be careful even
>> though all the lz77/78 patents seem to have expired or apply to
>> specifics not relevant to general use.
>
> We're not going to include GPL code in the backend. We have enough
> trouble with readline, and that's only for psql. So the fact that there
> are GPL libraries can't help us, whether there are patent issues or not.

That is what makes Google's Snappy so appealing: a BSD license.

Regards,
Ken
Re: [HACKERS] Faster compression, again
On Wed, Mar 14, 2012 at 3:43 PM, Andrew Dunstan <and...@dunslane.net> wrote:
> On 03/14/2012 04:10 PM, Merlin Moncure wrote:
>> there are plenty of on gpl lz based libraries out there (for example:
>> http://www.fastlz.org/) and always have been. they are all much faster
>> than zlib. the main issue is patents...you have to be careful even
>> though all the lz77/78 patents seem to have expired or apply to
>> specifics not relevant to general use.
>
> We're not going to include GPL code in the backend. We have enough
> trouble with readline, and that's only for psql. So the fact that there
> are GPL libraries can't help us, whether there are patent issues or not.

er, typo: I meant to say: *non-gpl* lz based... :-)

merlin
Re: [HACKERS] Faster compression, again
On Wed, Mar 14, 2012 at 2:03 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
> er, typo: I meant to say: *non-gpl* lz based... :-)

Given that, few I would say have had the traction that LZO and Snappy have had, even though in many respects they are interchangeable in the general trade-off spectrum. The question is: what burden of proof is required to convince the project that Snappy does not have exorbitant patent issues, in proportion to the utility of having a compression scheme of this type integrated?

One would think Google's lawyers did their homework to ensure they would not be trolled for hideous sums of money by disclosing and releasing the exact implementation of the compression used virtually everywhere. If anything, that may have been a more complicated issue than writing and releasing yet-another-LZ77 implementation.

--
fdr
Re: [HACKERS] Faster compression, again
Daniel Farina <dan...@heroku.com> writes:
> Given that, few I would say have had the traction that LZO and Snappy
> have had, even though in many respects they are interchangeable in the
> general trade-off spectrum. The question is: what burden of proof is
> required to convince the project that Snappy does not have exorbitant
> patent issues, in proportion to the utility of having a compression
> scheme of this type integrated?

Another not-exactly-trivial requirement is to figure out how to not break on-disk compatibility when installing an alternative compression scheme. In hindsight it might've been a good idea if pglz_compress had wasted a little bit of space on some sort of version identifier ... but it didn't.

			regards, tom lane
Re: [HACKERS] Faster compression, again
Tom Lane <t...@sss.pgh.pa.us> wrote:
> Another not-exactly-trivial requirement is to figure out how to not
> break on-disk compatibility when installing an alternative compression
> scheme. In hindsight it might've been a good idea if pglz_compress had
> wasted a little bit of space on some sort of version identifier ... but
> it didn't.

Doesn't it always start with a header of two int32 values where the first must be smaller than the second? That seems like enough to get traction for an identifiably different header for an alternative compression technique.

-Kevin
Re: [HACKERS] Faster compression, again
On Wed, Mar 14, 2012 at 2:58 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Daniel Farina <dan...@heroku.com> writes:
>> Given that, few I would say have had the traction that LZO and Snappy
>> have had, even though in many respects they are interchangeable in the
>> general trade-off spectrum. The question is: what burden of proof is
>> required to convince the project that Snappy does not have exorbitant
>> patent issues, in proportion to the utility of having a compression
>> scheme of this type integrated?
>
> Another not-exactly-trivial requirement is to figure out how to not
> break on-disk compatibility when installing an alternative compression
> scheme. In hindsight it might've been a good idea if pglz_compress had
> wasted a little bit of space on some sort of version identifier ... but
> it didn't.

I was thinking more that the latency and throughput of LZ77 schemes may best be applied first to protocol compression. The downside is that this requires more protocol wrangling, but at least terabytes of on-disk format don't enter the picture, even though LZ77 on the data itself may be attractive. I'm interested in allowing transports to be layered below FEBE (similar to how SSL is below it, except without the complexity of being tied into auth) in a couple of respects, and this is one of them.

--
fdr
Re: [HACKERS] Faster compression, again
On Wed, Mar 14, 2012 at 6:08 PM, Kevin Grittner <kevin.gritt...@wicourts.gov> wrote:
> Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Another not-exactly-trivial requirement is to figure out how to not
>> break on-disk compatibility when installing an alternative compression
>> scheme. In hindsight it might've been a good idea if pglz_compress had
>> wasted a little bit of space on some sort of version identifier ... but
>> it didn't.
>
> Doesn't it always start with a header of two int32 values where the
> first must be smaller than the second? That seems like enough to get
> traction for an identifiably different header for an alternative
> compression technique.

The first of those words is vl_len_, which we can't fiddle with too much, but the second is rawsize, which we can definitely fiddle with. Right now, rawsize > vl_len_ means it's compressed, and rawsize == vl_len_ means it's uncompressed. As you point out, rawsize < vl_len_ is undefined; also, and maybe simpler, rawsize < 0 is undefined.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Faster compression, again
Robert Haas <robertmh...@gmail.com> writes:
> On Wed, Mar 14, 2012 at 6:08 PM, Kevin Grittner
> <kevin.gritt...@wicourts.gov> wrote:
>> Doesn't it always start with a header of two int32 values where the
>> first must be smaller than the second? That seems like enough to get
>> traction for an identifiably different header for an alternative
>> compression technique.
>
> The first of those words is vl_len_, which we can't fiddle with too
> much, but the second is rawsize, which we can definitely fiddle with.
> Right now, rawsize > vl_len_ means it's compressed, and rawsize ==
> vl_len_ means it's uncompressed. As you point out, rawsize < vl_len_ is
> undefined; also, and maybe simpler, rawsize < 0 is undefined.

Well, let's please not make the same mistake again of assuming that there will never again be any other ideas in this space. IOW, let's find a way to shoehorn in an actual compression-method ID value of some sort. I don't particularly care for trying to push that into rawsize, because you don't really have more than about one bit to work with there, unless you eat the entire word for ID purposes, which seems excessive.

After looking at pg_lzcompress.c for a bit, it appears to me that the LSB of the first byte of compressed data must always be zero, because the very first control bit has to say "copy a literal byte"; you can't have a back-reference until there's some data in the output buffer. So what I suggest is that we keep rawsize the same as it is, but peek at the first byte after that to decide what we have: an even value means the existing compression method, while an odd value is an ID byte selecting some new method. This gives us room for 128 new methods before we have trouble again, while consuming only one byte, which seems like acceptable overhead for the purpose.

			regards, tom lane
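A hypothetical sketch of the dispatch Tom describes; the struct layout is simplified for illustration and is not the actual varlena/PGLZ_Header definition:

    #include <stdint.h>

    /* Hypothetical, simplified layout. */
    typedef struct
    {
        uint32_t vl_len_;    /* total stored size */
        uint32_t rawsize;    /* uncompressed size */
        uint8_t  data[1];    /* compressed stream follows */
    } compressed_header;

    /* Returns 0 for the existing pglz stream (whose first control bit,
     * the LSB of data[0], is always zero) or the odd ID byte selecting
     * one of up to 128 new methods. */
    static int
    compression_method(const compressed_header *hdr)
    {
        if ((hdr->data[0] & 1) == 0)
            return 0;               /* legacy pglz */
        return hdr->data[0];        /* odd value: new-method ID */
    }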
Re: [HACKERS] Faster compression, again
On Wed, Mar 14, 2012 at 9:44 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Well, let's please not make the same mistake again of assuming that
> there will never again be any other ideas in this space. IOW, let's
> find a way to shoehorn in an actual compression-method ID value of some
> sort. I don't particularly care for trying to push that into rawsize,
> because you don't really have more than about one bit to work with
> there, unless you eat the entire word for ID purposes, which seems
> excessive.

Well, you don't have to go that far. For example, you could dictate that, when the value is negative, the most significant byte indicates which compression algorithm is in use (128 possible compression algorithms). The remaining 3 bytes indicate the compressed length; but when they're all zero, the compressed length is instead stored in the following 4-byte word. This consumes one additional 4-byte word for values whose compressed size is >= 16MB, but that's presumably a non-problem.

> After looking at pg_lzcompress.c for a bit, it appears to me that the
> LSB of the first byte of compressed data must always be zero, because
> the very first control bit has to say "copy a literal byte"; you can't
> have a back-reference until there's some data in the output buffer. So
> what I suggest is that we keep rawsize the same as it is, but peek at
> the first byte after that to decide what we have: an even value means
> the existing compression method, while an odd value is an ID byte
> selecting some new method.

That would work, too.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
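A hypothetical helper showing the encoding Robert outlines (again illustrative, not actual PostgreSQL code): sign bit set, 7-bit method ID in the top byte, compressed length in the low 24 bits, with zero there meaning the real length follows in the next word.

    #include <stdint.h>

    /* Pack a method ID and compressed length into one negative word.
     * When the length does not fit in 24 bits (>= 16MB), the low bits
     * are left zero and the caller must emit the real length in a
     * following 4-byte word. */
    static uint32_t
    pack_method_word(uint8_t method, uint32_t compressed_len, int *needs_extra_word)
    {
        uint32_t low = compressed_len;

        *needs_extra_word = (compressed_len >= (1u << 24));
        if (*needs_extra_word)
            low = 0;                /* real length goes in the next word */
        return 0x80000000u | ((uint32_t) (method & 0x7f) << 24) | low;
    }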