On Sun, Feb 5, 2023 at 5:51 PM Tomas Vondra <tomas.von...@enterprisedb.com> wrote: > > On 2/5/23 19:36, Andrey Borodin wrote: > > On Fri, Jan 6, 2023 at 10:02 PM Andrey Borodin <amborodi...@gmail.com> > > wrote: > >> > >> Hello! Please find attached v8. > > > > I got some interesting feedback from some patch users. > > There was an oversight that frequently yielded results that are 1,2 or > > 3 bytes longer than expected. > > Looking closer I found that the correctness of the last 3-byte tail is > > checked in two places. PFA fix for this. Previously compressed data > > was correct, however in some cases few bytes longer than the result of > > current pglz implementation. > > > > Thanks. What were the consequences of the issue? Lower compression > ratio, or did we then fail to decompress the data (or would current pglz > implementation fail to decompress it)? > The data was decompressed fine. But extension tests (Citus's columnar engine) hard-coded a lot of compression ratio stuff. And there is still 1 more test where optimized version produces 1 byte longer output. I'm trying to find it, but with no success yet.
There are known and documented cases when optimized pglz version would do so. good_match without 10-division and memcmp by 4 bytes. But even disabling this, still observing 1-byte longer compression results persists... The problem is the length is changed after deleting some data, so compression of that particular sequence seems to be somewhere far away. It was funny at the beginning - to hunt for 1 byte. But the weekend is ending, and it seems that byte slipped from me again... Best regards, Andrey Borodin.