Re: pglz performance

2020-03-12 Thread Andrey M. Borodin
Hi Petr!

> On 4 Aug 2019, at 05:41, Petr Jelinek wrote:
> 
> Just so that we don't idly talk, what do you think about the attached?
> It:
> - adds new GUC compression_algorithm with possible values of pglz (default) 
> and lz4 (if lz4 is compiled in), requires SIGHUP
> - adds --with-lz4 configure option (default yes, so the configure option is 
> actually --without-lz4) that enables the lz4, it's using system library

Do you plan to work on lz4 aiming at 13 or 14?
Maybe let's register it on CF 2020-07?

Best regards, Andrey Borodin.




Re: pglz performance

2019-11-30 Thread Michael Paquier
On Fri, Nov 29, 2019 at 10:18:18AM +0500, Andrey Borodin wrote:
>> On 29 Nov 2019, at 3:43, Tomas Vondra wrote:
>>
>> OK, pushed, with some minor cosmetic tweaks on the comments (essentially
>> using the formatting trick pointed out by Alvaro), and removing one
>> unnecessary change in pglz_maximum_compressed_size.
> 
> Cool, thanks!

Yippee.  Thanks.
--
Michael




Re: pglz performance

2019-11-28 Thread Andrey Borodin



> On 29 Nov 2019, at 3:43, Tomas Vondra wrote:
> 
> OK, pushed, with some minor cosmetic tweaks on the comments (essentially
> using the formatting trick pointed out by Alvaro), and removing one
> unnecessary change in pglz_maximum_compressed_size.

Cool, thanks!

> pglz_maximum_compressed_size
It was an artifact of pgindent.

Best regards, Andrey Borodin.



Re: pglz performance

2019-11-28 Thread Tomas Vondra

On Wed, Nov 27, 2019 at 11:27:49PM +0500, Andrey Borodin wrote:




On 27 Nov 2019, at 20:28, Tomas Vondra wrote:



6) I'm pretty sure the comment in the 'while (off < len)' branch will be
badly mangled by pgindent.


I think I can just write it without line limit and then run pgindent.
Will try to do it this evening. Also, I will try to write more about
memmove.

Well, yes, I could not make pgindent format some parts of that comment, gave up 
and left only simple text.




7) The last change moving "copy" to the next line seems unnecessary.


Oh, looks like I had been rewording this comment, and eventually came
to the same text... Yes, this change is absolutely unnecessary.

Thanks!



Good. I'll wait for an updated version of the patch and then try to get
it pushed by the end of the CF.




OK, pushed, with some minor cosmetic tweaks on the comments (essentially
using the formatting trick pointed out by Alvaro), and removing one
unnecessary change in pglz_maximum_compressed_size.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: pglz performance

2019-11-27 Thread Michael Paquier
On Wed, Nov 27, 2019 at 04:28:18PM +0100, Tomas Vondra wrote:
> Yes. I was considering using the test_pglz extension first, but in the
> end I decided an end-to-end test is easier to do and more relevant.

I actually got something in this area in one of my trees:
https://github.com/michaelpq/pg_plugins/tree/master/compress_test

> Good. I'll wait for an updated version of the patch and then try to get
> it pushed by the end of the CF.

Sounds like a plan.  Thanks.
--
Michael




Re: pglz performance

2019-11-27 Thread Alvaro Herrera
On 2019-Nov-27, Andrey Borodin wrote:

> 
> 
> > On 27 Nov 2019, at 20:28, Tomas Vondra wrote:
> > 
> >>> 
> >>> 6) I'm pretty sure the comment in the 'while (off < len)' branch will be
> >>> badly mangled by pgindent.
> >> 
> >> I think I can just write it without line limit and then run pgindent.
> >> Will try to do it this evening. Also, I will try to write more about
> >> memmove.
> Well, yes, I could not make pgindent format some parts of that
> comment, gave up and left only simple text.

Please don't.  The way to keep pgindent from messing with the comment
is to surround it with dashes: start it with /*---- and end it with *---- */.
Just make sure that you use well-aligned lines shorter than 80 characters, for
cleanliness; but whatever you do, if you use the dashes then pgindent
won't touch it.

(I think the closing dash line is not necessary, but it looks better for
things to be symmetrical.)
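For illustration, a protected comment of that shape might look like this (a
sketch of the convention only, not the patch's actual comment text):

    /*----------
     * When the offset is smaller than the match length, the copied
     * region overlaps the output being produced, so the result is the
     * last 'off' bytes repeated until 'len' bytes are written.
     *----------
     */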

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pglz performance

2019-11-27 Thread Andrey Borodin


> On 27 Nov 2019, at 20:28, Tomas Vondra wrote:
> 
>>> 
>>> 6) I'm pretty sure the comment in the 'while (off < len)' branch will be
>>> badly mangled by pgindent.
>> 
>> I think I can just write it without line limit and then run pgindent.
>> Will try to do it this evening. Also, I will try to write more about
>> memmove.
Well, yes, I could not make pgindent format some parts of that comment, gave up 
and left only simple text.
>> 
>>> 
>>> 7) The last change moving "copy" to the next line seems unnecessary.
>> 
>> Oh, looks like I had been rewording this comment, and eventually came
>> to the same text... Yes, this change is absolutely unnecessary.
>> 
>> Thanks!
>> 
> 
> Good. I'll wait for an updated version of the patch and then try to get
> it pushed by the end of the CF.

PFA v5.
Thanks!

Best regards, Andrey Borodin.



v5-0001-Use-memcpy-in-pglz-decompression.patch
Description: Binary data


Re: pglz performance

2019-11-27 Thread Tomas Vondra

On Wed, Nov 27, 2019 at 05:47:25PM +0500, Andrey Borodin wrote:

Hi Tomas!

Thanks for benchmarking this!


On 26 Nov 2019, at 14:43, Tomas Vondra wrote:

On Mon, Nov 25, 2019 at 05:29:40PM +0900, Michael Paquier wrote:

On Mon, Nov 25, 2019 at 01:21:27PM +0500, Andrey Borodin wrote:

I think status Needs Review describes what is going on better. It's
not like something is awaited from my side.


Indeed.  You are right so I have moved the patch instead, with "Needs
review".  The patch status was actually incorrect in the CF app, as it
was marked as waiting on author.

@Tomas: updated versions of the patches have been sent by Andrey.


I've done benchmarks on the two last patches, using the data sets from
test_pglz repository [1], but using three simple queries:

1) prefix - first 100 bytes of the value

 SELECT length(substr(value, 0, 100)) FROM t

2) infix - 100 bytes from the middle

 SELECT length(substr(value, test_length/2, 100)) FROM t

3) suffix - last 100 bytes

 SELECT length(substr(value, test_length - 100, 100)) FROM t

See the two attached scripts, implementing this benchmark. The test
itself did 60-second pgbench runs (single client) measuring tps on two
different machines.

patch 1: v4-0001-Use-memcpy-in-pglz-decompression.patch
patch 2: v4-0001-Use-memcpy-in-pglz-decompression-for-long-matches.patch

The results (compared to master) from the first machine (i5-2500k CPU)
look like this:

                        patch 1         |          patch 2
dataset          prefix   infix  suffix | prefix   infix  suffix
----------------------------------------------------------------
00010001            99%    134%    161% |   100%    126%    152%
00010006            99%    260%    287% |   100%    257%    279%
00010008           100%    100%    100% |   100%     95%     91%
16398              100%    168%    221% |   100%    159%    215%
shakespeare.txt    100%    138%    141% |   100%    116%    117%
mr                  99%    120%    128% |   100%    107%    108%
dickens            100%    129%    132% |   100%    100%    100%
mozilla            100%    119%    120% |   100%    102%    104%
nci                100%    149%    141% |   100%    143%    135%
ooffice             99%    121%    123% |   100%     97%     98%
osdb               100%     99%     99% |   100%    100%     99%
reymont             99%    130%    132% |   100%    106%    107%
samba              100%    126%    132% |   100%    105%    111%
sao                100%    100%     99% |   100%    100%    100%
webster            100%    127%    127% |   100%    106%    106%
x-ray               99%     99%     99% |   100%    100%    100%
xml                100%    144%    144% |   100%    130%    128%

and on the other one (xeon e5-2620v4) looks like this:

                        patch 1         |          patch 2
dataset          prefix   infix  suffix | prefix   infix  suffix
----------------------------------------------------------------
00010001            98%    147%    170% |    98%    132%    159%
00010006           100%    340%    314% |    98%    334%    355%
00010008            99%    100%    105% |    99%     99%    101%
16398              101%    153%    205% |    99%    148%    201%
shakespeare.txt    100%    147%    149% |    99%    117%    118%
mr                 100%    131%    139% |    99%    112%    108%
dickens            100%    143%    143% |    99%    103%    102%
mozilla            100%    122%    122% |    99%    105%    106%
nci                100%    151%    135% |   100%    135%    125%
ooffice             99%    127%    129% |    98%    101%    102%
osdb               102%    100%    101% |   102%    100%     99%
reymont            101%    142%    143% |   100%    108%    108%
samba              100%    132%    136% |    99%    109%    112%
sao                 99%    101%    100% |    99%    100%    100%
webster            100%    132%    129% |   100%    106%    106%
x-ray               99%    101%    100% |    90%    101%    101%
xml                100%    147%    148% |   100%    127%    125%

In general, I think the results for both patches seem like a clear win, but
maybe patch 1 is a bit better, especially on the newer (xeon) CPU. So I'd
probably go with that one.




From my POV there are two interesting new points in your benchmarks:
1. They are more or less end-to-end benchmarks with the whole system involved.
2. They provide a per-payload breakdown



Yes. I was considering using the test_pglz extension first, but in the
end I decided an end-to-end test is easier to do and more relevant.


Prefix experiment is mostly related to reading from page cache and not

Re: pglz performance

2019-11-27 Thread Andrey Borodin
Hi Tomas!

Thanks for benchmarking this!

> On 26 Nov 2019, at 14:43, Tomas Vondra wrote:
> 
> On Mon, Nov 25, 2019 at 05:29:40PM +0900, Michael Paquier wrote:
>> On Mon, Nov 25, 2019 at 01:21:27PM +0500, Andrey Borodin wrote:
>>> I think status Needs Review describes what is going on better. It's
>>> not like something is awaited from my side.
>> 
>> Indeed.  You are right so I have moved the patch instead, with "Needs
>> review".  The patch status was actually incorrect in the CF app, as it
>> was marked as waiting on author.
>> 
>> @Tomas: updated versions of the patches have been sent by Andrey.
> 
> I've done benchmarks on the two last patches, using the data sets from
> test_pglz repository [1], but using three simple queries:
> 
> 1) prefix - first 100 bytes of the value
> 
>  SELECT length(substr(value, 0, 100)) FROM t
> 
> 2) infix - 100 bytes from the middle
> 
>  SELECT length(substr(value, test_length/2, 100)) FROM t
> 
> 3) suffix - last 100 bytes
> 
>  SELECT length(substr(value, test_length - 100, 100)) FROM t
> 
> See the two attached scripts, implementing this benchmark. The test
> itself did 60-second pgbench runs (single client) measuring tps on two
> different machines.
> 
> patch 1: v4-0001-Use-memcpy-in-pglz-decompression.patch
> patch 2: v4-0001-Use-memcpy-in-pglz-decompression-for-long-matches.patch
> 
> The results (compared to master) from the first machine (i5-2500k CPU)
> look like this:
> 
>                         patch 1         |          patch 2
> dataset          prefix   infix  suffix | prefix   infix  suffix
> ----------------------------------------------------------------
> 00010001            99%    134%    161% |   100%    126%    152%
> 00010006            99%    260%    287% |   100%    257%    279%
> 00010008           100%    100%    100% |   100%     95%     91%
> 16398              100%    168%    221% |   100%    159%    215%
> shakespeare.txt    100%    138%    141% |   100%    116%    117%
> mr                  99%    120%    128% |   100%    107%    108%
> dickens            100%    129%    132% |   100%    100%    100%
> mozilla            100%    119%    120% |   100%    102%    104%
> nci                100%    149%    141% |   100%    143%    135%
> ooffice             99%    121%    123% |   100%     97%     98%
> osdb               100%     99%     99% |   100%    100%     99%
> reymont             99%    130%    132% |   100%    106%    107%
> samba              100%    126%    132% |   100%    105%    111%
> sao                100%    100%     99% |   100%    100%    100%
> webster            100%    127%    127% |   100%    106%    106%
> x-ray               99%     99%     99% |   100%    100%    100%
> xml                100%    144%    144% |   100%    130%    128%
> 
> and on the other one (xeon e5-2620v4) looks like this:
> 
>                         patch 1         |          patch 2
> dataset          prefix   infix  suffix | prefix   infix  suffix
> ----------------------------------------------------------------
> 00010001            98%    147%    170% |    98%    132%    159%
> 00010006           100%    340%    314% |    98%    334%    355%
> 00010008            99%    100%    105% |    99%     99%    101%
> 16398              101%    153%    205% |    99%    148%    201%
> shakespeare.txt    100%    147%    149% |    99%    117%    118%
> mr                 100%    131%    139% |    99%    112%    108%
> dickens            100%    143%    143% |    99%    103%    102%
> mozilla            100%    122%    122% |    99%    105%    106%
> nci                100%    151%    135% |   100%    135%    125%
> ooffice             99%    127%    129% |    98%    101%    102%
> osdb               102%    100%    101% |   102%    100%     99%
> reymont            101%    142%    143% |   100%    108%    108%
> samba              100%    132%    136% |    99%    109%    112%
> sao                 99%    101%    100% |    99%    100%    100%
> webster            100%    132%    129% |   100%    106%    106%
> x-ray               99%    101%    100% |    90%    101%    101%
> xml                100%    147%    148% |   100%    127%    125%
> 
> In general, I think the results for both patches seem like a clear win, but
> maybe patch 1 is a bit better, especially on the newer (xeon) CPU. So I'd
> probably go with that one.



From my POV there are two interesting new points in your benchmarks:
1. They are more or less end-to-end benchmarks with the whole system involved.
2. They provide a per-payload breakdown

Prefix experiment is mostly related to reading from page cache and not directly 
connected with 

Re: pglz performance

2019-11-27 Thread Tomas Vondra

On Wed, Nov 27, 2019 at 05:01:47PM +0900, Michael Paquier wrote:

On Tue, Nov 26, 2019 at 09:05:59PM +0100, Tomas Vondra wrote:

Yeah, although the difference is minimal. We could probably construct a
benchmark where #2 wins, but I think these queries are fairly realistic.
So I'd just go with #1.


Nice results.  Using your benchmarks it indeed looks like patch 1 is a
winner here.


Code-wise I think the patches are mostly fine, although the comments
might need some proof-reading.

1) I wasn't really sure what a "nibble" is, but maybe it's just me and
it's a well-known term.

2) First byte use lower -> First byte uses lower

3) nibble contain upper -> nibble contains upper

4) to preven possible uncertanity -> to prevent possible uncertainty

5) I think we should briefly explain why memmove would be incompatible
with pglz, it's not quite clear to me.

6) I'm pretty sure the comment in the 'while (off < len)' branch will be
badly mangled by pgindent.

7) The last change moving "copy" to the next line seems unnecessary.


Patch 1 has a typo as well here:
+   * When offset is smaller than lengh - source and
s/lengh/length/

Okay, if we are reaching a conclusion here, Tomas or Peter, are you
planning to finish brushing the patch and potentially commit it?


I'd like some feedback from Andrey regarding the results and memmove
comment, and then I'll polish and push.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: pglz performance

2019-11-27 Thread Michael Paquier
On Tue, Nov 26, 2019 at 09:05:59PM +0100, Tomas Vondra wrote:
> Yeah, although the difference is minimal. We could probably construct a
> benchmark where #2 wins, but I think these queries are fairly realistic.
> So I'd just go with #1.

Nice results.  Using your benchmarks it indeed looks like patch 1 is a
winner here.

> Code-wise I think the patches are mostly fine, although the comments
> might need some proof-reading.
> 
> 1) I wasn't really sure what a "nibble" is, but maybe it's just me and
> it's a well-known term.
> 
> 2) First byte use lower -> First byte uses lower
> 
> 3) nibble contain upper -> nibble contains upper
> 
> 4) to preven possible uncertanity -> to prevent possible uncertainty
> 
> 5) I think we should briefly explain why memmove would be incompatible
> with pglz, it's not quite clear to me.
> 
> 6) I'm pretty sure the comment in the 'while (off < len)' branch will be
> badly mangled by pgindent.
> 
> 7) The last change moving "copy" to the next line seems unnecessary.

Patch 1 has a typo as well here:
+   * When offset is smaller than lengh - source and
s/lengh/length/

Okay, if we are reaching a conclusion here, Tomas or Peter, are you
planning to finish brushing the patch and potentially commit it?
--
Michael




Re: pglz performance

2019-11-26 Thread Tomas Vondra

On Tue, Nov 26, 2019 at 08:17:13PM +0100, Peter Eisentraut wrote:

On 2019-11-26 10:43, Tomas Vondra wrote:

In general, I think the results for both patches seem like a clear win, but
maybe patch 1 is a bit better, especially on the newer (xeon) CPU. So I'd
probably go with that one.


Patch 1 is also the simpler patch, so it seems clearly preferable.



Yeah, although the difference is minimal. We could probably construct a
benchmark where #2 wins, but I think these queries are fairly realistic.
So I'd just go with #1.

Code-wise I think the patches are mostly fine, although the comments
might need some proof-reading.

1) I wasn't really sure what a "nibble" is, but maybe it's just me and
it's a well-known term.

2) First byte use lower -> First byte uses lower

3) nibble contain upper -> nibble contains upper

4) to preven possible uncertanity -> to prevent possible uncertainty

5) I think we should briefly explain why memmove would be incompatible
with pglz, it's not quite clear to me.

6) I'm pretty sure the comment in the 'while (off < len)' branch will be
badly mangled by pgindent.

7) The last change moving "copy" to the next line seems unnecessary.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: pglz performance

2019-11-26 Thread Peter Eisentraut

On 2019-11-26 10:43, Tomas Vondra wrote:

In general, I think the results for both patches seem like a clear win, but
maybe patch 1 is a bit better, especially on the newer (xeon) CPU. So I'd
probably go with that one.


Patch 1 is also the simpler patch, so it seems clearly preferable.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pglz performance

2019-11-26 Thread Tomas Vondra

On Mon, Nov 25, 2019 at 05:29:40PM +0900, Michael Paquier wrote:

On Mon, Nov 25, 2019 at 01:21:27PM +0500, Andrey Borodin wrote:

I think status Needs Review describes what is going on better. It's
not like something is awaited from my side.


Indeed.  You are right so I have moved the patch instead, with "Needs
review".  The patch status was actually incorrect in the CF app, as it
was marked as waiting on author.

@Tomas: updated versions of the patches have been sent by Andrey.


I've done benchmarks on the two last patches, using the data sets from
test_pglz repository [1], but using three simple queries:

1) prefix - first 100 bytes of the value

  SELECT length(substr(value, 0, 100)) FROM t

2) infix - 100 bytes from the middle

  SELECT length(substr(value, test_length/2, 100)) FROM t

3) suffix - last 100 bytes

  SELECT length(substr(value, test_length - 100, 100)) FROM t

See the two attached scripts, implementing this benchmark. The test
itself did 60-second pgbench runs (single client) measuring tps on two
different machines.

patch 1: v4-0001-Use-memcpy-in-pglz-decompression.patch
patch 2: v4-0001-Use-memcpy-in-pglz-decompression-for-long-matches.patch

The results (compared to master) from the first machine (i5-2500k CPU)
look like this:

                        patch 1         |          patch 2
dataset          prefix   infix  suffix | prefix   infix  suffix
----------------------------------------------------------------
00010001            99%    134%    161% |   100%    126%    152%
00010006            99%    260%    287% |   100%    257%    279%
00010008           100%    100%    100% |   100%     95%     91%
16398              100%    168%    221% |   100%    159%    215%
shakespeare.txt    100%    138%    141% |   100%    116%    117%
mr                  99%    120%    128% |   100%    107%    108%
dickens            100%    129%    132% |   100%    100%    100%
mozilla            100%    119%    120% |   100%    102%    104%
nci                100%    149%    141% |   100%    143%    135%
ooffice             99%    121%    123% |   100%     97%     98%
osdb               100%     99%     99% |   100%    100%     99%
reymont             99%    130%    132% |   100%    106%    107%
samba              100%    126%    132% |   100%    105%    111%
sao                100%    100%     99% |   100%    100%    100%
webster            100%    127%    127% |   100%    106%    106%
x-ray               99%     99%     99% |   100%    100%    100%
xml                100%    144%    144% |   100%    130%    128%

and on the other one (xeon e5-2620v4) looks like this:

                        patch 1         |          patch 2
dataset          prefix   infix  suffix | prefix   infix  suffix
----------------------------------------------------------------
00010001            98%    147%    170% |    98%    132%    159%
00010006           100%    340%    314% |    98%    334%    355%
00010008            99%    100%    105% |    99%     99%    101%
16398              101%    153%    205% |    99%    148%    201%
shakespeare.txt    100%    147%    149% |    99%    117%    118%
mr                 100%    131%    139% |    99%    112%    108%
dickens            100%    143%    143% |    99%    103%    102%
mozilla            100%    122%    122% |    99%    105%    106%
nci                100%    151%    135% |   100%    135%    125%
ooffice             99%    127%    129% |    98%    101%    102%
osdb               102%    100%    101% |   102%    100%     99%
reymont            101%    142%    143% |   100%    108%    108%
samba              100%    132%    136% |    99%    109%    112%
sao                 99%    101%    100% |    99%    100%    100%
webster            100%    132%    129% |   100%    106%    106%
x-ray               99%    101%    100% |    90%    101%    101%
xml                100%    147%    148% |   100%    127%    125%

In general, I think the results for both patches seem like a clear win, but
maybe patch 1 is a bit better, especially on the newer (xeon) CPU. So I'd
probably go with that one.

[1] https://github.com/x4m/test_pglz

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 


pglz-test.sh
Description: Bourne shell script


pglz-load.sql
Description: application/sql


pglz.ods
Description: application/vnd.oasis.opendocument.spreadsheet


Re: pglz performance

2019-11-25 Thread Michael Paquier
On Mon, Nov 25, 2019 at 01:21:27PM +0500, Andrey Borodin wrote:
> I think status Needs Review describes what is going on better. It's
> not like something is awaited from my side.

Indeed.  You are right so I have moved the patch instead, with "Needs
review".  The patch status was actually incorrect in the CF app, as it
was marked as waiting on author.

@Tomas: updated versions of the patches have been sent by Andrey. 
--
Michael




Re: pglz performance

2019-11-25 Thread Andrey Borodin



> On 25 Nov 2019, at 13:03, Michael Paquier wrote:
> 
> On Wed, Nov 06, 2019 at 09:04:25AM +0100, Peter Eisentraut wrote:
>> OK, waiting on some independent verification of benchmark numbers.
> 
> Still waiting for these after 19 days, so the patch has been marked as
> returned with feedback.

I think status Needs Review describes what is going on better. It's not like 
something is awaited from my side.

Thanks.

Best regards, Andrey Borodin.



Re: pglz performance

2019-11-25 Thread Michael Paquier
On Wed, Nov 06, 2019 at 09:04:25AM +0100, Peter Eisentraut wrote:
> OK, waiting on some independent verification of benchmark numbers.

Still waiting for these after 19 days, so the patch has been marked as
returned with feedback.
--
Michael




Re: pglz performance

2019-11-06 Thread Peter Eisentraut

On 2019-11-01 17:59, Tomas Vondra wrote:

I'd try running the benchmarks to verify the numbers, and maybe do some
additional tests, but it's not clear to me which patches should I use.

I think the last patches with 'hacked' and 'hacked8' in the name are a
couple of months old, and the recent posts attach just a single patch.
Andrey, can you post current versions of both patches?


OK, waiting on some independent verification of benchmark numbers.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pglz performance

2019-11-06 Thread Peter Eisentraut

On 2019-11-01 16:48, Alvaro Herrera wrote:

As I understand that report, in these results "less is better", so the
hacked8 variant shows better performance (33.8) than current (42.5).
The "hacked" variant shows worse performance (48.2) that the current
code.


Which appears to be the opposite of, or at least inconsistent with, 
results earlier in the thread.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pglz performance

2019-11-03 Thread Andrey Borodin
Hi Tels!
Thanks for your interest in fast decompression.

> On 3 Nov 2019, at 12:24, Tels wrote:
> 
> I wonder if you agree and what would happen if you try this variant on your 
> corpus tests.

I've tried some different optimizations for literals, for example loop
unrolling [0] and bulk copying of literals.
These approaches were bringing some performance improvement, but with noise.
Statistically they were better in some places and worse in others, a net win, but that
"net win" depends on which data and which platforms we consider important.

The proposed patch clearly makes decompression faster on any dataset and platform.
I believe improving pglz further is viable, but optimizations like a common data
prefix seem more promising to me.
Also, I think we actually need real codecs like lz4, zstd and brotli instead of
our own reinvented wheel.

If you have some spare time, pull requests to test_pglz are welcome; let's
benchmark more micro-optimizations, it brings a lot of fun :)


--
Andrey Borodin
Open source RDBMS development team leader
Yandex.Cloud

[0] https://github.com/x4m/test_pglz/blob/master/pg_lzcompress_hacked.c#L166



Re: pglz performance

2019-11-03 Thread Tels

Hello Andrey,

On 2019-11-02 12:30, Andrey Borodin wrote:
On 1 Nov 2019, at 18:48, Alvaro Herrera wrote:

PFA two patches:
v4-0001-Use-memcpy-in-pglz-decompression.patch (known as 'hacked' in
test_pglz extension)
v4-0001-Use-memcpy-in-pglz-decompression-for-long-matches.patch (known
as 'hacked8')


Looking at the patches, it seems only the case of a match is changed. 
But when we observe a literal byte, this is copied byte-by-byte with:


 else
  {
   /*
    * An unset control bit means LITERAL BYTE. So we just
    * copy one from INPUT to OUTPUT.
    */
   *dp++ = *sp++;
  }

Maybe we can optimize this, too. For instance, you could just increase a 
counter:


 else
  {
   /*
    * An unset control bit means LITERAL BYTE. We count
    * these and copy them later.
    */
   literal_bytes++;
  }

and in the case of:

  if (ctrl & 1)
    {
      /* First copy all the pending literal bytes */
      if (literal_bytes > 0)
        {
          memcpy(dp, sp, literal_bytes);
          sp += literal_bytes;
          dp += literal_bytes;
          literal_bytes = 0;
        }

(Code untested!)

The same would need to be done at the very end, if the input ends 
without any new CTRL-byte.


Whether that gains us anything depends on how common literal bytes are.
It might be that highly compressible input has almost none, while input
that is a mix of incompressible strings and compressible ones might have
longer stretches. One example would be something like an SHA-256 hash that
is repeated twice. The first instance would be incompressible, the
second one would be just a copy. This might not happen that often in
practical inputs, though.


I wonder if you agree and what would happen if you try this variant on 
your corpus tests.


Best regards,

Tels




Re: pglz performance

2019-11-02 Thread Andrey Borodin


> On 1 Nov 2019, at 18:48, Alvaro Herrera wrote:
> 
> On 2019-Nov-01, Peter Eisentraut wrote:
> 
>> On 2019-10-25 07:05, Andrey Borodin wrote:
 On 21 Oct 2019, at 14:09, Andrey Borodin wrote:
 
 With Silesian corpus pglz_decompress_hacked is actually decreasing 
 performance on high-entropy data.
 Meanwhile pglz_decompress_hacked8 is still faster than usual 
 pglz_decompress.
 In spite of this benchmarks, I think that pglz_decompress_hacked8 is safer 
 option.
>>> 
>>> Here's v3 which takes into account recent benchmarks with Silesian Corpus
>>> and has better comments.
>> 
>> Your message from 21 October appears to say that this change makes the
>> performance worse.  So I don't know how to proceed with this.
> 
> As I understand that report, in these results "less is better", so the
> hacked8 variant shows better performance (33.8) than current (42.5).
> The "hacked" variant shows worse performance (48.2) that the current
> code.
This is correct. Thanks, Álvaro.

> The "in spite" phrase seems to have been a mistake.
Yes. Sorry, I actually thought that "in spite of" was the opposite of "despite"
and meant "in view of".

> I am surprised that there is so much variability in the performance
> numbers, though, based on such small tweaks of the code.
The Silesian Corpus is very different from WALs and PG data files. Data files are
rich in long sequences of the same byte. These sequences are long, and thus unrolled
very effectively by the memcpy method.
But the Silesian corpus is rich in short matches of a few bytes.

> On 1 Nov 2019, at 19:59, Tomas Vondra wrote:
> I'd try running the benchmarks to verify the numbers, and maybe do some
> additional tests, but it's not clear to me which patches should I use.
Cool, thanks!

> I think the last patches with 'hacked' and 'hacked8' in the name are a
> couple of months old, and the recent posts attach just a single patch.
> Andrey, can you post current versions of both patches?
PFA two patches:
v4-0001-Use-memcpy-in-pglz-decompression.patch (known as 'hacked' in test_pglz 
extension)
v4-0001-Use-memcpy-in-pglz-decompression-for-long-matches.patch (known as 
'hacked8')

Best regards, Andrey Borodin.


v4-0001-Use-memcpy-in-pglz-decompression.patch
Description: Binary data


v4-0001-Use-memcpy-in-pglz-decompression-for-long-matches.patch
Description: Binary data


Re: pglz performance

2019-11-01 Thread Tomas Vondra

On Fri, Nov 01, 2019 at 12:48:28PM -0300, Alvaro Herrera wrote:

On 2019-Nov-01, Peter Eisentraut wrote:


On 2019-10-25 07:05, Andrey Borodin wrote:
> > On 21 Oct 2019, at 14:09, Andrey Borodin wrote:
> >
> > With Silesian corpus pglz_decompress_hacked is actually decreasing 
performance on high-entropy data.
> > Meanwhile pglz_decompress_hacked8 is still faster than usual 
pglz_decompress.
> > In spite of this benchmarks, I think that pglz_decompress_hacked8 is safer 
option.
>
> Here's v3 which takes into account recent benchmarks with Silesian Corpus and
has better comments.

Your message from 21 October appears to say that this change makes the
performance worse.  So I don't know how to proceed with this.


As I understand that report, in these results "less is better", so the
hacked8 variant shows better performance (33.8) than current (42.5).
The "hacked" variant shows worse performance (48.2) that the current
code.  The "in spite" phrase seems to have been a mistake.

I am surprised that there is so much variability in the performance
numbers, though, based on such small tweaks of the code.



I'd try running the benchmarks to verify the numbers, and maybe do some
additional tests, but it's not clear to me which patches should I use.

I think the last patches with 'hacked' and 'hacked8' in the name are a
couple of months old, and the recent posts attach just a single patch.
Andrey, can you post current versions of both patches?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: pglz performance

2019-11-01 Thread Alvaro Herrera
On 2019-Nov-01, Peter Eisentraut wrote:

> On 2019-10-25 07:05, Andrey Borodin wrote:
> > > On 21 Oct 2019, at 14:09, Andrey Borodin wrote:
> > > 
> > > With Silesian corpus pglz_decompress_hacked is actually decreasing 
> > > performance on high-entropy data.
> > > Meanwhile pglz_decompress_hacked8 is still faster than usual 
> > > pglz_decompress.
> > > In spite of this benchmarks, I think that pglz_decompress_hacked8 is 
> > > safer option.
> > 
> > Here's v3 which takes into account recent benchmarks with Silesian Corpus
> > and has better comments.
> 
> Your message from 21 October appears to say that this change makes the
> performance worse.  So I don't know how to proceed with this.

As I understand that report, in these results "less is better", so the
hacked8 variant shows better performance (33.8) than current (42.5).
The "hacked" variant shows worse performance (48.2) that the current
code.  The "in spite" phrase seems to have been a mistake.

I am surprised that there is so much variability in the performance
numbers, though, based on such small tweaks of the code.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pglz performance

2019-11-01 Thread Peter Eisentraut

On 2019-10-25 07:05, Andrey Borodin wrote:

On 21 Oct 2019, at 14:09, Andrey Borodin wrote:

With Silesian corpus pglz_decompress_hacked is actually decreasing performance 
on high-entropy data.
Meanwhile pglz_decompress_hacked8 is still faster than usual pglz_decompress.
In spite of this benchmarks, I think that pglz_decompress_hacked8 is safer 
option.


Here's v3 which takes into account recent benchmarks with Silesian Corpus and
has better comments.


Your message from 21 October appears to say that this change makes the 
performance worse.  So I don't know how to proceed with this.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pglz performance

2019-10-24 Thread Andrey Borodin


> On 21 Oct 2019, at 14:09, Andrey Borodin wrote:
> 
> With Silesian corpus pglz_decompress_hacked is actually decreasing 
> performance on high-entropy data.
> Meanwhile pglz_decompress_hacked8 is still faster than usual pglz_decompress.
> In spite of this benchmarks, I think that pglz_decompress_hacked8 is safer 
> option.

Here's v3 which takes into account recent benchmarks with Silesian Corpus and
has better comments.

Thanks!


--
Andrey Borodin
Open source RDBMS development team leader
Yandex.Cloud


v3-0001-Use-memcpy-in-pglz-decompression.patch
Description: Binary data


Re: pglz performance

2019-10-21 Thread Andrey Borodin



> On 28 Sep 2019, at 10:29, Andrey Borodin wrote:
> 
> I hope to benchmark decompression on Silesian corpus soon.

I've done it, and the results are quite controversial.
The dataset adds 12 payloads to our 5. The payloads have relatively high entropy. In
many cases pglz cannot compress them at all, so decompression is a no-op and the data
is stored as is.

Decompressor pglz_decompress_hacked result 48.281747
Decompressor pglz_decompress_hacked8 result 33.868779
Decompressor pglz_decompress_vanilla result 42.510165

Tested on Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz

With Silesian corpus pglz_decompress_hacked is actually decreasing performance 
on high-entropy data.
Meanwhile pglz_decompress_hacked8 is still faster than usual pglz_decompress.
In spite of this benchmarks, I think that pglz_decompress_hacked8 is safer 
option.

I've updated the test suite [0], and anyone interested can verify the benchmarks.

--
Andrey Borodin
Open source RDBMS development team leader
Yandex.Cloud

[0] https://github.com/x4m/test_pglz



Re: pglz performance

2019-09-28 Thread Andrey Borodin
Oleg, Peter, thanks for looking into this!

I hope to benchmark decompression on Silesian corpus soon.

PFA v2 with better comments.

> On 27 Sep 2019, at 14:41, Peter Eisentraut wrote:
> 
> After reviewing this thread and more testing, I think
> 0001-Use-memcpy-in-pglz-decompression.patch appears to be a solid win
> and we should move ahead with it.
> 
> I don't, however, fully understand the code changes, and I think this
> could use more and better comments.  In particular, I wonder about
> 
> off *= 2;
I've changed this to
off += off;

> 
> This is new logic that isn't explained anywhere.
> 
> This whole function is commented a bit strangely.  It begins with
> "Otherwise", but there is nothing before it.  And what does "from OUTPUT
> to OUTPUT" mean?  There is no "output" variable.  We should make this
> match the code better.


I've added a small example to illustrate what is going on.
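For readers of the archive, the doubling idea ("off += off") can be sketched
roughly like this; it is an illustration of the technique under discussion,
not the patch's actual code, and the function name is made up:

    #include <string.h>

    /*
     * Expand a back-reference of length 'len' at distance 'off', where the
     * destination may overlap the source (off < len).  Instead of copying
     * byte by byte, copy the largest non-overlapping chunk and double the
     * usable distance each round.
     */
    static void
    copy_match_sketch(unsigned char *dp, int off, int len)
    {
        while (off < len)
        {
            memcpy(dp, dp - off, off);  /* the 'off' bytes behind dp exist */
            dp += off;
            len -= off;
            off += off;                 /* pattern repeated, distance doubles */
        }
        memcpy(dp, dp - off, len);      /* final non-overlapping chunk */
    }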

Thanks!

--
Andrey Borodin
Open source RDBMS development team leader
Yandex.Cloud


v2-0001-Use-memcpy-in-pglz-decompression.patch
Description: Binary data


Re: pglz performance

2019-09-27 Thread Peter Eisentraut
On 2019-09-04 14:45, Andrey Borodin wrote:
>> On 2019-09-04 11:22, Andrey Borodin wrote:
 What about the two patches?  Which one is better?
>>> In our observations pglz_decompress_hacked.patch is best for most of the tested
>>> platforms.
>>> The difference is that pglz_decompress_hacked8.patch will not apply the
>>> optimization if the decompressed match is not greater than 8 bytes. This
>>> optimization was suggested by Tom, that's why we benchmarked it
>>> specifically.
>>
>> The patches attached to the message I was replying to are named
>>
>> 0001-Use-memcpy-in-pglz-decompression-for-long-matches.patch
>> 0001-Use-memcpy-in-pglz-decompression.patch
>>
>> Are those the same ones?
> 
> Yes. Sorry for this confusion.
> 
> The only difference of
> 0001-Use-memcpy-in-pglz-decompression-for-long-matches.patch is that it
> falls back to the byte loop if len is <= 8.

After reviewing this thread and more testing, I think
0001-Use-memcpy-in-pglz-decompression.patch appears to be a solid win
and we should move ahead with it.

I don't, however, fully understand the code changes, and I think this
could use more and better comments.  In particular, I wonder about

off *= 2;

This is new logic that isn't explained anywhere.

This whole function is commented a bit strangely.  It begins with
"Otherwise", but there is nothing before it.  And what does "from OUTPUT
to OUTPUT" mean?  There is no "output" variable.  We should make this
match the code better.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pglz performance

2019-09-25 Thread Petr Jelinek

Hi,

On 05/08/2019 09:26, Andres Freund wrote:

Hi,

On 2019-08-05 00:19:28 +0200, Petr Jelinek wrote:

It carries that information inside the compressed value, like I said in the
other reply, that's why the extra byte.


I'm not convinced that that is a good plan - imo the reference to the
compressed data should carry that information.

I.e. in the case of toast, at least toast pointers should hold enough
information to determine the compression algorithm. And in the case of
WAL, the WAL record should contain that.


Point taken.



For external datums I suggest encoding the compression method as a
distinct VARTAG_ONDISK_COMPRESSED, and then have that include the
compression method as a field.


So the new reads/writes will use this and reads of old format won't 
change? Sounds fine.




For in-line compressed values (so VARATT_IS_4B_C), doing something
roughly like you did, indicating the type of metadata following using
the high bit sounds reasonable. But I think I'd make it so that if the
highbit is set, the struct is instead entirely different, keeping a full
4 byte byte length, and including the compression type header inside the
struct. Perhaps I'd combine the compression type with the high-bit-set
part?  So when the high bit is set, it'd be something like

{
    int32   vl_len_;    /* varlena header (do not touch directly!) */

    /*
     * Actually only 7 bits; the high bit determines whether this
     * is the old compression header (if unset), or this type of header
     * (if set).
     */
    uint8   type;

    /*
     * Stored as uint8[4], to avoid unnecessary alignment padding.
     */
    uint8   length[4];

    char    va_data[FLEXIBLE_ARRAY_MEMBER];
}



Won't this break backward compatibility on big-endian (if I understand
correctly what you are trying to achieve here)?



I think it's worth spending some effort trying to get this right - we'll
be bound by design choices for a while.



Sure, however I am not in the business of redesigning TOAST from scratch
right now, even if I do agree that the current header is far from ideal.


--
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/




Re: pglz performance

2019-09-15 Thread Oleg Bartunov
On Wed, Sep 4, 2019 at 12:22 PM Andrey Borodin  wrote:
>
> Hi, Peter! Thanks for looking into this.
>
> > On 4 Sep 2019, at 14:09, Peter Eisentraut wrote:
> >
> > On 2019-06-24 10:44, Andrey Borodin wrote:
> >>> On 18 May 2019, at 11:44, Andrey Borodin wrote:
> >>>
> >> Hi!
> >> Here's rebased version of patches.
> >>
> >> Best regards, Andrey Borodin.
> >
> > I think this is the most recent patch for the CF entry
> > .
> >
> > What about the two patches?  Which one is better?
> In our observations pglz_decompress_hacked.patch is best for most of the tested
> platforms.
> The difference is that pglz_decompress_hacked8.patch will not apply the optimization
> if the decompressed match is not greater than 8 bytes. This optimization was
> suggested by Tom, that's why we benchmarked it specifically.
>
> > Have you also considered using memmove() to deal with the overlap issue?
> Yes, memmove() resolves the ambiguity of copying overlapping regions in a way
> that is not compatible with pglz. In the proposed patch we never copy overlapping
> regions.
>
> > Benchmarks have been posted in this thread.  Where is the benchmarking
> > tool?  Should we include that in the source somehow?
>
> The benchmarking tool is here [0]. Well, the code of the benchmarking tool does not
> adhere to our standards in some places, so we did not consider including it in
> core.
> However, the most questionable part of the benchmarking is the choice of test data.
> It's about 100MB of useless WAL, a data file, and the valuable Shakespeare writings.

Why not use the 'Silesia compression corpus'
(http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia), which is used by
lzbench (https://github.com/inikep/lzbench)?  Teodor and I remember
that testing on non-English texts can be very important.


>
> Best regards, Andrey Borodin.
>
>
> [0] https://github.com/x4m/test_pglz
>
>
>


-- 
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: pglz performance

2019-09-12 Thread Alvaro Herrera
I just noticed we had two CF items pointing to this thread,

https://commitfest.postgresql.org/24/2119/
https://commitfest.postgresql.org/24/2180/

so I marked the newer one as withdrawn.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pglz performance

2019-09-04 Thread Andrey Borodin



> On 4 Sep 2019, at 17:40, Peter Eisentraut wrote:
> 
> On 2019-09-04 11:22, Andrey Borodin wrote:
>>> What about the two patches?  Which one is better?
>> In our observations pglz_decompress_hacked.patch is best for most of the tested
>> platforms.
>> The difference is that pglz_decompress_hacked8.patch will not apply the
>> optimization if the decompressed match is not greater than 8 bytes. This
>> optimization was suggested by Tom, that's why we benchmarked it specifically.
> 
> The patches attached to the message I was replying to are named
> 
> 0001-Use-memcpy-in-pglz-decompression-for-long-matches.patch
> 0001-Use-memcpy-in-pglz-decompression.patch
> 
> Are those the same ones?

Yes. Sorry for this confusion.

The only difference of
0001-Use-memcpy-in-pglz-decompression-for-long-matches.patch is that it
falls back to the byte loop if len is <= 8.

Best regards, Andrey Borodin.



Re: pglz performance

2019-09-04 Thread Peter Eisentraut
On 2019-09-04 11:22, Andrey Borodin wrote:
>> What about the two patches?  Which one is better?
> In our observations pglz_decompress_hacked.patch is best for most of the tested
> platforms.
> The difference is that pglz_decompress_hacked8.patch will not apply the optimization
> if the decompressed match is not greater than 8 bytes. This optimization was
> suggested by Tom, that's why we benchmarked it specifically.

The patches attached to the message I was replying to are named

0001-Use-memcpy-in-pglz-decompression-for-long-matches.patch
0001-Use-memcpy-in-pglz-decompression.patch

Are those the same ones?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pglz performance

2019-09-04 Thread Andrey Borodin
Hi, Peter! Thanks for looking into this.

> On 4 Sep 2019, at 14:09, Peter Eisentraut wrote:
> 
> On 2019-06-24 10:44, Andrey Borodin wrote:
>>> On 18 May 2019, at 11:44, Andrey Borodin wrote:
>>> 
>> Hi! 
>> Here's rebased version of patches.
>> 
>> Best regards, Andrey Borodin.
> 
> I think this is the most recent patch for the CF entry
> .
> 
> What about the two patches?  Which one is better?
In our observations pglz_decompress_hacked.patch is best for most of the tested
platforms.
The difference is that pglz_decompress_hacked8.patch will not apply the optimization
if the decompressed match is not greater than 8 bytes. This optimization was
suggested by Tom, that's why we benchmarked it specifically.

> Have you also considered using memmove() to deal with the overlap issue?
Yes, memmove() resolves the ambiguity of copying overlapping regions in a way that
is not compatible with pglz. In the proposed patch we never copy overlapping
regions.
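To illustrate why the two are not interchangeable, here is a minimal standalone
sketch (toy buffer and values assumed; this is not code from the patch):

    #include <string.h>
    #include <stdio.h>

    int
    main(void)
    {
        /* "ab" already decompressed; a back-reference says: copy 6 bytes
         * starting 2 bytes back, i.e. off = 2, len = 6. */
        char    buf[9] = "ab";
        char   *dp = buf + 2;
        int     off = 2, len = 6;

        /* pglz semantics: a forward byte-by-byte copy replicates the pattern */
        for (int i = 0; i < len; i++)
            dp[i] = dp[i - off];
        buf[8] = '\0';
        printf("%s\n", buf);    /* prints "abababab" */

        /*
         * memmove(dp, dp - off, len) behaves as if the source were saved to a
         * temporary first, so it would read bytes beyond the initial "ab"
         * before they are produced and would not yield the repeated pattern.
         */
        return 0;
    }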

> Benchmarks have been posted in this thread.  Where is the benchmarking
> tool?  Should we include that in the source somehow?

The benchmarking tool is here [0]. Well, the code of the benchmarking tool does not
adhere to our standards in some places, so we did not consider including it in
core.
However, the most questionable part of the benchmarking is the choice of test data.
It's about 100MB of useless WAL, a data file, and the valuable Shakespeare writings.

Best regards, Andrey Borodin.


[0] https://github.com/x4m/test_pglz





Re: pglz performance

2019-09-04 Thread Peter Eisentraut
On 2019-06-24 10:44, Andrey Borodin wrote:
>> On 18 May 2019, at 11:44, Andrey Borodin wrote:
>>
> Hi! 
> Here's rebased version of patches.
> 
> Best regards, Andrey Borodin.

I think this is the most recent patch for the CF entry
.

What about the two patches?  Which one is better?

Have you also considered using memmove() to deal with the overlap issue?

Benchmarks have been posted in this thread.  Where is the benchmarking
tool?  Should we include that in the source somehow?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pglz performance

2019-08-06 Thread Andres Freund
Hi,

On 2019-08-05 16:04:46 +0900, Michael Paquier wrote:
> On Fri, Aug 02, 2019 at 07:52:39PM +0200, Tomas Vondra wrote:
> > On Fri, Aug 02, 2019 at 10:12:58AM -0700, Andres Freund wrote:
> >> Why would they be stuck continuing to *compress* with pglz? As we
> >> fully retoast on write anyway we can just gradually switch over to the
> >> better algorithm. Decompression speed is another story, of course.
> > 
> > Hmmm, I don't remember the details of those patches so I didn't realize
> > it allows incremental recompression. If that's possible, that would mean
> > existing systems can start using it. Which is good.
> 
> It may become a problem on some platforms though (Windows?), so
> patches to improve either the compression or decompression of pglz are
> not that much crazy as they are still likely going to be used, and for
> read-mostly switching to a new algo may not be worth the extra cost so
> it is not like we are going to drop it completely either.

What's the platform dependency that you're thinking of? And how's
compression speed relevant to "read mostly"?  Switching would just
happen whenever tuple fields are changed. And it'll not have an
additional cost, because all it does is reduce the cost of a toast write
that'd otherwise have happened with pglz.


> Linking to system libraries would make our maintenance much easier,
> and when it comes to have a copy of something else in the tree we
> would be stuck with more maintenance around it.  These tend to rot
> easily.

I don't think it's really our experience that they "rot easily".


> After that comes the case where the compression algo is not
> in the binary across one server to another, in which case we have an
> automatic ERROR in case of a mismatching algo, or FATAL for
> decompression of FPWs at recovery when wal_compression is used.

Huh? That's a failure case that only exists if you don't include it in
the tree (with the option to use an out-of-tree lib)?

Greetings,

Andres Freund




Re: pglz performance

2019-08-05 Thread Andres Freund
Hi,

On 2019-08-05 00:19:28 +0200, Petr Jelinek wrote:
> It carries that information inside the compressed value, like I said in the
> other reply, that's why the extra byte.

I'm not convinced that that is a good plan - imo the reference to the
compressed data should carry that information.

I.e. in the case of toast, at least toast pointers should hold enough
information to determine the compression algorithm. And in the case of
WAL, the WAL record should contain that.

Consider e.g. adding support for slice fetching of datums compressed
with some algorithm - we should be able to determine whether that's
possible without fetching the datum (so we can take a non-exceptional
path for datums compressed otherwise). Similarly, for WAL, we should be
able to detect whether an incompatible compression format is used,
without having to invoke a generic compression routine that then fails
in some way. Or adding compression reporting for WAL to xlogdump.

I also don't particularly like baking in the assumption that we don't
support tuples larger than 1GB in further places. To me it seems likely
that we're going to have to fix that, and it's hard enough already... I
know that my patch did that too...

For external datums I suggest encoding the compression method as a
distinct VARTAG_ONDISK_COMPRESSED, and then have that include the
compression method as a field.

For in-line compressed values (so VARATT_IS_4B_C), doing something
roughly like you did, indicating the type of metadata following using
the high bit sounds reasonable. But I think I'd make it so that if the
highbit is set, the struct is instead entirely different, keeping a full
4 byte byte length, and including the compression type header inside the
struct. Perhaps I'd combine the compression type with the high-bit-set
part?  So when the high bit is set, it'd be something like

{
    int32   vl_len_;    /* varlena header (do not touch directly!) */

    /*
     * Actually only 7 bits; the high bit determines whether this
     * is the old compression header (if unset), or this type of header
     * (if set).
     */
    uint8   type;

    /*
     * Stored as uint8[4], to avoid unnecessary alignment padding.
     */
    uint8   length[4];

    char    va_data[FLEXIBLE_ARRAY_MEMBER];
}

I think it's worth spending some effort trying to get this right - we'll
be bound by design choices for a while.


It's kinda annoying that toast datums aren't better designed. Creating
them from scratch, I'd make it something like:

1) variable-width integer describing the "physical length", so that
   tuple deforming can quickly determine length - all the ifs necessary
   to determine lengths are a bottleneck. I'd probably just use a 127bit
   encoding, if the high bit is set, there's a following length byte.

2) type of toasted datum, probably also in a variable width encoding,
   starting at 1byte. Not that I think it's likely we'd overrun 256
   types - but it's cheap enough to just declare the high bit as a
   length extension bit.

These are always stored unaligned. So there's no need to deal with
padding bytes having to be zero to determine whether we're dealing with
a 1byte datum etc.
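A variable-width integer of the kind described in 1) and 2) could look roughly
like this in C (an illustration with made-up names, using 7 data bits per byte
and the high bit as a continuation marker; not code from any posted patch):

    #include <stdint.h>

    /*
     * Encode 'value' as a base-128 varint: 7 data bits per output byte,
     * high bit set on every byte except the last.  Returns the number of
     * bytes written (a single byte for values < 128).
     */
    static int
    varint_encode(uint64_t value, uint8_t *out)
    {
        int     n = 0;

        while (value >= 0x80)
        {
            out[n++] = (uint8_t) ((value & 0x7F) | 0x80);
            value >>= 7;
        }
        out[n++] = (uint8_t) value;
        return n;
    }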

Then, type dependent:

For In-line uncompressed datums
3a) alignment padding, amount determined by 2) above, i.e. we'd just
have different types for different amounts of alignment. Probably
using some heuristic to use unaligned when either dealing with data
that doesn't need alignment, or when the datum is fairly small, so
copying to get the data as unaligned won't be a significant penalty.
4a) data

For in-line compressed datums
3b) compression metadata {varint rawsize, varint compression algorithm}
4b) unaligned compressed data - there's no benefit in keeping it aligned

For External toast for uncompressed data:
3d) {toastrelid, valueid, varint rawsize}

For External toast for compressed data:
3e) {valueid, toastrelid, varint compression_algorithm, varint rawsize, varint 
extsize}


That'd make it a lot more extensible, easier to understand, faster to
decode in a lot of cases, and remove a lot of arbitrary limits.  Yes, it'd
increase the header size for small datums to two bytes, but I think
that'd be more than bought back by the other improvements.


Greetings,

Andres Freund




Re: pglz performance

2019-08-05 Thread Michael Paquier
On Fri, Aug 02, 2019 at 07:52:39PM +0200, Tomas Vondra wrote:
> On Fri, Aug 02, 2019 at 10:12:58AM -0700, Andres Freund wrote:
>> Why would they be stuck continuing to *compress* with pglz? As we
>> fully retoast on write anyway we can just gradually switch over to the
>> better algorithm. Decompression speed is another story, of course.
> 
> Hmmm, I don't remember the details of those patches so I didn't realize
> it allows incremental recompression. If that's possible, that would mean
> existing systems can start using it. Which is good.

It may become a problem on some platforms though (Windows?), so
patches to improve either the compression or decompression of pglz are
not that crazy, as they are still likely going to be used, and for
read-mostly workloads switching to a new algo may not be worth the extra cost,
so it is not like we are going to drop it completely either.  My take,
still the same as upthread, is that it mostly depends on the amount of
complexity each patch introduces compared to the performance gain.

> Another question is whether we'd actually want to include the code in
> core directly, or use system libraries (and if some packagers might
> decide to disable that, for whatever reason).

Linking to system libraries would make our maintenance much easier,
and when it comes to having a copy of something else in the tree we
would be stuck with more maintenance around it.  These tend to rot
easily.  After that comes the case where the compression algo is not
in the binary on one server but is on another, in which case we have an
automatic ERROR in case of a mismatching algo, or FATAL for
decompression of FPWs at recovery when wal_compression is used.
--
Michael




Re: pglz performance

2019-08-04 Thread Petr Jelinek

Hi,

On 05/08/2019 00:15, Andres Freund wrote:

Hi,

On 2019-08-04 17:53:26 +0200, Petr Jelinek wrote:

5) I wonder why compression_algorithm is defined as PGC_SIGHUP. Why not
allow users to set it per session? I suppose we might have a separate
option for WAL compression_algorithm.



Yeah, I was thinking we might want to change wal_compression to an enum as well.
Although that complicates the code quite a bit (the caller has to decide the
algorithm instead of the compression system doing it).


Isn't that basically required anyway? The WAL record will need to carry
information about the type of compression used, independent of
PGC_SIGHUP/PGC_USERSET, unless you want to make it an initdb option or
something super heavyweight like that.



It carries that information inside the compressed value, like I said in 
the other reply, that's why the extra byte.


--
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/




Re: pglz performance

2019-08-04 Thread Andres Freund
Hi,

On 2019-08-04 17:53:26 +0200, Petr Jelinek wrote:
> > 5) I wonder why compression_algorithm is defined as PGC_SIGHUP. Why not
> > allow users to set it per session? I suppose we might have a separate
> > option for WAL compression_algorithm.
> > 
> 
> Yeah, I was thinking we might want to change wal_compression to an enum as well.
> Although that complicates the code quite a bit (the caller has to decide the
> algorithm instead of the compression system doing it).

Isn't that basically required anyway? The WAL record will need to carry
information about the type of compression used, independent of
PGC_SIGHUP/PGC_USERSET, unless you want to make it an initdb option or
something super heavyweight like that.

We could just have the wal_compression assign hook set the compression
callback and a compression type integer or such if we want to avoid
doing that kind of thing at runtime.

Greetings,

Andres Freund




Re: pglz performance

2019-08-04 Thread Petr Jelinek

Hi,

On 04/08/2019 21:20, Andres Freund wrote:

On 2019-08-04 02:41:24 +0200, Petr Jelinek wrote:

Same here.

Just so that we don't idly talk, what do you think about the attached?


Cool!


It:
- adds new GUC compression_algorithm with possible values of pglz (default)
and lz4 (if lz4 is compiled in), requires SIGHUP


As Tomas remarked, I think it shouldn't be SIGHUP but USERSET. And I
think lz4 should be preferred, if available.  I could see us using a
list style guc, so we could set it to lz4, pglz, and the first available
one would be used.



Sounds reasonable.


- adds 1 byte header to the compressed data where we currently store the
algorithm kind, that leaves us with 254 more to add :) (that's an extra
overhead compared to the current state)


Hm. Why do we need an additional byte?  IIRC my patch added that only
for the case we would run out of space for compression formats without
extending any sizes?



Yeah, your patch worked differently (I didn't actually use any code from
it). The main reason why I add the byte is that I am storing the
algorithm in the compressed value itself, not in the varlena header. I was
mainly trying not to have every caller care about storing and loading
the compression algorithm. I also can't say I particularly like that
hack in your patch.


However, if we want to have separate GUCs for TOAST and WAL then we'll
have to do that anyway, so maybe it does not matter anymore (we can't use a
similar hack there AFAICS, though).





- changes the rawsize in TOAST header to 31 bits via bit packing
- uses the extra bit to differentiate between old and new format


Hm. Wouldn't it be easier to just use a different vartag for this?



That would only work for external TOAST pointers, right? The compressed 
varlena can also be stored inline and potentially in an index tuple.
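
To make that concrete, a rough sketch of the layout being discussed (the field
names, macros and bit assignments are illustrative guesses, not the actual patch):

    #include <stdint.h>

    /*
     * Sketch of a "new format" compressed value:
     *   - rawsize squeezed into 31 bits, with the spare bit used as a
     *     "new format" flag so old (pure pglz) values stay readable;
     *   - one extra byte in front of the compressed payload naming the
     *     algorithm, so the value itself says how to decompress it.
     */
    typedef struct CompressedDatumHeader
    {
        uint32_t info;        /* bit 31: new-format flag, bits 0-30: rawsize */
        uint8_t  algorithm;   /* 0 = pglz, 1 = lz4, 254 ids left */
        /* compressed bytes follow */
    } CompressedDatumHeader;

    #define CDH_IS_NEW_FORMAT(h)  (((h)->info & UINT32_C(0x80000000)) != 0)
    #define CDH_RAWSIZE(h)        ((h)->info & UINT32_C(0x7FFFFFFF))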





- I expect my changes to configure.in are not the greatest as I don't have
pretty much zero experience with autoconf


FWIW the configure output changes are likely because you used a modified
version of autoconf. Unfortunately debian/ubuntu ship one with vendor
patches.



Yeah, Ubuntu here, that explains it.

--
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/




Re: pglz performance

2019-08-04 Thread Andres Freund
Hi,

On 2019-08-04 02:41:24 +0200, Petr Jelinek wrote:
> Same here.
> 
> Just so that we don't idly talk, what do you think about the attached?

Cool!

> It:
> - adds new GUC compression_algorithm with possible values of pglz (default)
> and lz4 (if lz4 is compiled in), requires SIGHUP

As Tomas remarked, I think it shouldn't be SIGHUP but USERSET. And I
think lz4 should be preferred, if available.  I could see us using a
list style guc, so we could set it to lz4, pglz, and the first available
one would be used.

> - adds 1 byte header to the compressed data where we currently store the
> algorithm kind, that leaves us with 254 more to add :) (that's an extra
> overhead compared to the current state)

Hm. Why do we need an additional byte?  IIRC my patch added that only
for the case we would run out of space for compression formats without
extending any sizes?


> - changes the rawsize in TOAST header to 31 bits via bit packing
> - uses the extra bit to differentiate between old and new format

Hm. Wouldn't it be easier to just use a different vartag for this?


> - I expect my changes to configure.in are not the greatest as I don't have
> pretty much zero experience with autoconf

FWIW the configure output changes are likely because you used a modified
version of autoconf. Unfortunately debian/ubuntu ship one with vendor
patches.

Greetings,

Andres Freund




Re: pglz performance

2019-08-04 Thread Tomas Vondra

On Sun, Aug 04, 2019 at 05:53:26PM +0200, Petr Jelinek wrote:


...



4) I did a simple test with physical replication, with lz4 enabled on
both sides (well, can't build without lz4 anyway, per previous point).
It immediately failed like this:

FATAL:  failed to restore block image
CONTEXT:  WAL redo at 0/5000A40 for Btree/INSERT_LEAF: off 138
LOG:  startup process (PID 15937) exited with exit code 1

This is a simple UPDATE on a trivial table:

create table t (a int primary key);
insert into t select i from generate_series(1,1000) s(i);
update t set a = a - 10 where random () < 0.1;

with some checkpoints to force FPW (and wal_compression=on, of course).

I haven't tried `make check-world` but I suppose some of the TAP tests
should fail because of this. And if not, we need to improve coverage.



FWIW I did run check-world without problems, will have to look into this.



Not sure if we bother to set wal_compression=on for check-world (I don't
think we do, but I may be missing something), so maybe check-world does
not really test wal compression.

IMO the issue is that RestoreBlockImage() still calls pglz_decompress
directly, instead of going through pg_decompress().


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: pglz performance

2019-08-04 Thread Petr Jelinek

Hi,

On 04/08/2019 11:57, Andrey Borodin wrote:




2 Aug 2019, at 21:39, Andres Freund wrote:

On 2019-08-02 20:40:51 +0500, Andrey Borodin wrote:

We have some kind of "roadmap" of "extensible pglz". We plan to provide 
implementation on Novembers CF.


I don't understand why it's a good idea to improve the compression side
of pglz. There's plenty other people that spent a lot of time developing
better compression algorithms.

Improving compression side of pglz has two different projects:
1. Faster compression with less code and same compression ratio (patch in this 
thread).
2. Better compression ratio with at least same compression speed of 
uncompressed values.
Why I want to do patch for 2? Because it's interesting.
Will 1 or 2 be reviewed or committed? I have no idea.
Will many users benefit from 1 or 2? Yes, clearly. Unless we force everyone to 
stop compressing with pglz.



FWIW I agree.


Just so that we don't idly talk, what do you think about the attached?
It:
- adds new GUC compression_algorithm with possible values of pglz (default) and 
lz4 (if lz4 is compiled in), requires SIGHUP
- adds --with-lz4 configure option (default yes, so the configure option is 
actually --without-lz4) that enables the lz4, it's using system library
- uses the compression_algorithm for both TOAST and WAL compression (if on)
- supports slicing for lz4 as well (pglz was already supported)
- supports reading old TOAST values
- adds 1 byte header to the compressed data where we currently store the 
algorithm kind, that leaves us with 254 more to add :) (that's an extra 
overhead compared to the current state)
- changes the rawsize in TOAST header to 31 bits via bit packing
- uses the extra bit to differentiate between old and new format
- supports reading from table which has different rows stored with different 
algorithm (so that the GUC itself can be freely changed)

That's cool. I suggest defaulting to lz4 if it is available. You cannot start 
cluster on non-lz4 binaries which used lz4 once.
Do we plan the possibility of compression algorithm as extension? Or will all 
algorithms be packed into that byte in core?


What I wrote does not expect extensions providing new compression. We'd 
have to somehow reserve compression ids for specific extensions and that 
seems like a lot of extra complexity for little benefit. I don't see 
much benefit in having more than, say, 3 generic compressors (I could 
imagine adding zstd). If you are thinking about data-type-specific 
compression then I think this is the wrong layer.



What about lz4 "common prefix"? System or user-defined. If lz4 is compiled in 
we can even offer in-system training, just make sure that trained prefixes will make 
their way to standbys.



I definitely don't plan to work on the common prefix. But I don't see why that 
could not be added later.


--
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/




Re: pglz performance

2019-08-04 Thread Petr Jelinek

Hi,

On 04/08/2019 13:52, Tomas Vondra wrote:

On Sun, Aug 04, 2019 at 02:41:24AM +0200, Petr Jelinek wrote:

Hi,

On 02/08/2019 21:48, Tomas Vondra wrote:

On Fri, Aug 02, 2019 at 11:20:03AM -0700, Andres Freund wrote:




Another question is whether we'd actually want to include the code in
core directly, or use system libraries (and if some packagers might
decide to disable that, for whatever reason).


I'd personally say we should have an included version, and a
--with-system-... flag that uses the system one.



OK. I'd say to require a system library, but that's a minor detail.



Same here.

Just so that we don't idly talk, what do you think about the attached?
It:
- adds new GUC compression_algorithm with possible values of pglz 
(default) and lz4 (if lz4 is compiled in), requires SIGHUP
- adds --with-lz4 configure option (default yes, so the configure 
option is actually --without-lz4) that enables the lz4, it's using 
system library
- uses the compression_algorithm for both TOAST and WAL compression 
(if on)

- supports slicing for lz4 as well (pglz was already supported)
- supports reading old TOAST values
- adds 1 byte header to the compressed data where we currently store 
the algorithm kind, that leaves us with 254 more to add :) (that's an 
extra overhead compared to the current state)

- changes the rawsize in TOAST header to 31 bits via bit packing
- uses the extra bit to differentiate between old and new format
- supports reading from table which has different rows stored with 
different algorithm (so that the GUC itself can be freely changed)




Cool.


Simple docs and a TAP test included.

I did some basic performance testing (it's not really my thing though, 
so I would appreciate if somebody did more).
I get about 7x perf improvement on data load with lz4 compared to pglz 
on my dataset but strangely only tiny decompression improvement. 
Perhaps more importantly I also did before patch and after patch tests 
with pglz and the performance difference with my data set was <1%.


Note that this will just link against lz4, it does not add lz4 into 
PostgreSQL code-base.




WFM, although I think Andres wanted to do both (link against system and
add lz4 code as a fallback). I think the question is what happens when
you run with lz4 for a while, and then switch to binaries without lz4
support. Or when you try to replicate from lz4-enabled instance to an
instance without it. Especially for physical replication, but I suppose
it may affect logical replication with binary protocol?



I generally prefer having a system library; we don't include ICU either, 
for example.




A very brief review:


1) I wonder what "runstatedir" is about.



No idea, that stuff is generated by autoconf from configure.in.



2) This seems rather suspicious, and obviously the comment is now
entirely bogus:

/* Check that off_t can represent 2**63 - 1 correctly.
    We can't simply define LARGE_OFF_T to be 9223372036854775807,
    since some C++ compilers masquerading as C compilers
    incorrectly reject 9223372036854775807.  */
-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))




Same as above. TBH I am not sure why we even include configure in the git 
repo, given that different autoconf versions can produce different outputs.




3) I can't really build without lz4:

config.status: linking src/makefiles/Makefile.linux to src/Makefile.port
pg_lzcompress.c: In function ‘pg_compress_bound’:
pg_lzcompress.c:892:22: error: ‘SIZEOF_PG_COMPRESS_HEADER’ undeclared 
(first use in this function)

    return slen + 4 + SIZEOF_PG_COMPRESS_HEADER;
  ^
pg_lzcompress.c:892:22: note: each undeclared identifier is reported 
only once for each function it appears in




Okay, that's just a problem of SIZEOF_PG_COMPRESS_HEADER being defined 
inside the HAVE_LZ4 ifdef while it should be defined above the ifdef.




4) I did a simple test with physical replication, with lz4 enabled on
both sides (well, can't build without lz4 anyway, per previous point).
It immediately failed like this:

FATAL:  failed to restore block image
CONTEXT:  WAL redo at 0/5000A40 for Btree/INSERT_LEAF: off 138
LOG:  startup process (PID 15937) exited with exit code 1

This is a simple UPDATE on a trivial table:

create table t (a int primary key);
insert into t select i from generate_series(1,1000) s(i);
update t set a = a - 10 where random () < 0.1;

with some checkpoints to force FPW (and wal_compression=on, of course).

I haven't tried `make check-world` but I suppose some of the TAP tests
should fail because of this. And if not, we need to improve coverage.



FWIW I did run check-world without problems, will have to look into this.



5) I wonder why compression_algorithm is defined as PGC_SIGHUP. Why not
to allow users to set it per session? I suppose we might have a separate
option for WAL compression_algorithm.




Re: pglz performance

2019-08-04 Thread Tomas Vondra

On Sun, Aug 04, 2019 at 02:41:24AM +0200, Petr Jelinek wrote:

Hi,

On 02/08/2019 21:48, Tomas Vondra wrote:

On Fri, Aug 02, 2019 at 11:20:03AM -0700, Andres Freund wrote:




Another question is whether we'd actually want to include the code in
core directly, or use system libraries (and if some packagers might
decide to disable that, for whatever reason).


I'd personally say we should have an included version, and a
--with-system-... flag that uses the system one.



OK. I'd say to require a system library, but that's a minor detail.



Same here.

Just so that we don't idly talk, what do you think about the attached?
It:
- adds new GUC compression_algorithm with possible values of pglz 
(default) and lz4 (if lz4 is compiled in), requires SIGHUP
- adds --with-lz4 configure option (default yes, so the configure 
option is actually --without-lz4) that enables the lz4, it's using 
system library

- uses the compression_algorithm for both TOAST and WAL compression (if on)
- supports slicing for lz4 as well (pglz was already supported)
- supports reading old TOAST values
- adds 1 byte header to the compressed data where we currently store 
the algorithm kind, that leaves us with 254 more to add :) (that's an 
extra overhead compared to the current state)

- changes the rawsize in TOAST header to 31 bits via bit packing
- uses the extra bit to differentiate between old and new format
- supports reading from table which has different rows stored with 
different algorithm (so that the GUC itself can be freely changed)




Cool.


Simple docs and a TAP test included.

I did some basic performance testing (it's not really my thing though, 
so I would appreciate if somebody did more).
I get about 7x perf improvement on data load with lz4 compared to pglz 
on my dataset but strangely only tiny decompression improvement. 
Perhaps more importantly I also did before patch and after patch tests 
with pglz and the performance difference with my data set was <1%.


Note that this will just link against lz4, it does not add lz4 into 
PostgreSQL code-base.




WFM, although I think Andres wanted to do both (link against system and
add lz4 code as a fallback). I think the question is what happens when
you run with lz4 for a while, and then switch to binaries without lz4
support. Or when you try to replicate from lz4-enabled instance to an
instance without it. Especially for physical replication, but I suppose
it may affect logical replication with binary protocol?


A very brief review:


1) I wonder what "runstatedir" is about.


2) This seems rather suspicious, and obviously the comment is now
entirely bogus:

/* Check that off_t can represent 2**63 - 1 correctly.
   We can't simply define LARGE_OFF_T to be 9223372036854775807,
   since some C++ compilers masquerading as C compilers
   incorrectly reject 9223372036854775807.  */
-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))


3) I can't really build without lz4:

config.status: linking src/makefiles/Makefile.linux to src/Makefile.port
pg_lzcompress.c: In function ‘pg_compress_bound’:
pg_lzcompress.c:892:22: error: ‘SIZEOF_PG_COMPRESS_HEADER’ undeclared (first 
use in this function)
   return slen + 4 + SIZEOF_PG_COMPRESS_HEADER;
 ^
pg_lzcompress.c:892:22: note: each undeclared identifier is reported only once 
for each function it appears in


4) I did a simple test with physical replication, with lz4 enabled on
both sides (well, can't build without lz4 anyway, per previous point).
It immediately failed like this:

FATAL:  failed to restore block image
CONTEXT:  WAL redo at 0/5000A40 for Btree/INSERT_LEAF: off 138
LOG:  startup process (PID 15937) exited with exit code 1

This is a simple UPDATE on a trivial table:

create table t (a int primary key);
insert into t select i from generate_series(1,1000) s(i);
update t set a = a - 10 where random () < 0.1;

with some checkpoints to force FPW (and wal_compression=on, of course).

I haven't tried `make check-world` but I suppose some of the TAP tests
should fail because of this. And if not, we need to improve coverage.


5) I wonder why compression_algorithm is defined as PGC_SIGHUP. Why not
to allow users to set it per session? I suppose we might have a separate
option for WAL compression_algorithm.


6) It seems a bit strange that pg_compress/pg_decompress are now defined
in pglz_compress.{c,h}. Maybe we should invent src/common/compress.c?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: pglz performance

2019-08-04 Thread Andrey Borodin



> 2 Aug 2019, at 21:39, Andres Freund wrote:
> 
> On 2019-08-02 20:40:51 +0500, Andrey Borodin wrote:
>> We have some kind of "roadmap" of "extensible pglz". We plan to provide 
>> implementation on Novembers CF.
> 
> I don't understand why it's a good idea to improve the compression side
> of pglz. There's plenty other people that spent a lot of time developing
> better compression algorithms.
Improving the compression side of pglz involves two different projects:
1. Faster compression with less code and the same compression ratio (the patch in this 
thread).
2. Better compression ratio with at least the same compression speed of 
uncompressed values.
Why do I want to do a patch for 2? Because it's interesting.
Will 1 or 2 be reviewed or committed? I have no idea.
Will many users benefit from 1 or 2? Yes, clearly. Unless we force everyone to 
stop compressing with pglz.

>> Currently, pglz starts with empty cache map: there is no prior 4k bytes 
>> before start. We can add imaginary prefix to any data with common 
>> substrings: this will enhance compression ratio.
>> It is hard to decide on training data set for this "common prefix". So we 
>> want to produce extension with aggregate function which produces some 
>> "adapted common prefix" from users's data.
>> Then we can "reserve" few negative bytes for "decompression commands". This 
>> command can instruct database on which common prefix to use.
>> But also system command can say "invoke decompression from extension".
>> 
>> Thus, user will be able to train database compression on his data and 
>> substitute pglz compression with custom compression method seamlessly.
>> 
>> This will make hard-choosen compression unneeded, but seems overly hacky. 
>> But there will be no need to have lz4, zstd, brotli, lzma and others in 
>> core. Why not provide e.g. "time series compression"? Or "DNA compression"? 
>> Whatever gun user wants for his foot.
> 
> I think this is way too complicated, and will provide not particularly
> much benefit for the majority users.
> 
> In fact, I'll argue that we should flat out reject any such patch until
> we have at least one decent default compression algorithm in
> core. You're trying to work around a poor compression algorithm with
> complicated dictionary improvement
OK. The idea of something plugged into pglz seemed odd even to me.
But it looks like it restarted the lz4 discussion :)

> , that require user interaction, and
> only will work in a relatively small subset of the cases, and will very
> often increase compression times.
No, surely, if the implementation of the "common prefix" increases compression 
times I will not even post a patch.
BTW, lz4 also supports a "common prefix", let's do that too?
Here's a link to the Zstd dictionary builder, but it is compatible with lz4:
https://github.com/facebook/zstd#the-case-for-small-data-compression
We actually have small datums.

> 4 Aug 2019, at 5:41, Petr Jelinek wrote:
> 
> Just so that we don't idly talk, what do you think about the attached?
> It:
> - adds new GUC compression_algorithm with possible values of pglz (default) 
> and lz4 (if lz4 is compiled in), requires SIGHUP
> - adds --with-lz4 configure option (default yes, so the configure option is 
> actually --without-lz4) that enables the lz4, it's using system library
> - uses the compression_algorithm for both TOAST and WAL compression (if on)
> - supports slicing for lz4 as well (pglz was already supported)
> - supports reading old TOAST values
> - adds 1 byte header to the compressed data where we currently store the 
> algorithm kind, that leaves us with 254 more to add :) (that's an extra 
> overhead compared to the current state)
> - changes the rawsize in TOAST header to 31 bits via bit packing
> - uses the extra bit to differentiate between old and new format
> - supports reading from table which has different rows stored with different 
> algorithm (so that the GUC itself can be freely changed)
That's cool. I suggest defaulting to lz4 if it is available. You cannot start a 
cluster on non-lz4 binaries once it has used lz4.
Do we plan for the possibility of compression algorithms as extensions? Or will all 
algorithms be packed into that byte in core?
What about an lz4 "common prefix"? System or user-defined. If lz4 is compiled in 
we can even offer in-system training; we just have to make sure that trained prefixes 
make their way to standbys.
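
For what it's worth, lz4's external-dictionary API already covers the mechanics of
such a common prefix; a minimal standalone sketch (the dictionary text and buffer
sizes here are made up):

    #include <lz4.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* the trained "common prefix" would come from the user's data */
        static const char dict[] = "SELECT INSERT UPDATE DELETE integer timestamp";
        const char *datum = "SELECT ... typical short datum sharing vocabulary with the dict";
        char compressed[256];
        char restored[256];

        /* compress with the dictionary preloaded as history */
        LZ4_stream_t *ls = LZ4_createStream();
        LZ4_loadDict(ls, dict, (int) sizeof(dict) - 1);
        int clen = LZ4_compress_fast_continue(ls, datum, compressed,
                                              (int) strlen(datum),
                                              (int) sizeof(compressed), 1);
        LZ4_freeStream(ls);

        /* decompression needs exactly the same dictionary bytes */
        int rlen = LZ4_decompress_safe_usingDict(compressed, restored, clen,
                                                 (int) sizeof(restored),
                                                 dict, (int) sizeof(dict) - 1);

        printf("compressed %zu -> %d bytes, restored %d bytes\n",
               strlen(datum), clen, rlen);
        return (rlen == (int) strlen(datum)) ? 0 : 1;
    }

The API side is the easy part; the open questions are the ones raised here: how to
train the dictionary, and how to guarantee that standbys see exactly the same bytes.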

Best regards, Andrey Borodin.



Re: pglz performance

2019-08-03 Thread Petr Jelinek

Hi,

On 02/08/2019 21:48, Tomas Vondra wrote:

On Fri, Aug 02, 2019 at 11:20:03AM -0700, Andres Freund wrote:




Another question is whether we'd actually want to include the code in
core directly, or use system libraries (and if some packagers might
decide to disable that, for whatever reason).


I'd personally say we should have an included version, and a
--with-system-... flag that uses the system one.



OK. I'd say to require a system library, but that's a minor detail.



Same here.

Just so that we don't idly talk, what do you think about the attached?
It:
- adds new GUC compression_algorithm with possible values of pglz 
(default) and lz4 (if lz4 is compiled in), requires SIGHUP
- adds --with-lz4 configure option (default yes, so the configure option 
is actually --without-lz4) that enables the lz4, it's using system library

- uses the compression_algorithm for both TOAST and WAL compression (if on)
- supports slicing for lz4 as well (pglz was already supported)
- supports reading old TOAST values
- adds 1 byte header to the compressed data where we currently store the 
algorithm kind, that leaves us with 254 more to add :) (that's an extra 
overhead compared to the current state)

- changes the rawsize in TOAST header to 31 bits via bit packing
- uses the extra bit to differentiate between old and new format
- supports reading from table which has different rows stored with 
different algorithm (so that the GUC itself can be freely changed)
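
As a rough illustration of that last point, a sketch of per-datum dispatch (the ids,
helper name and exact calls below are assumptions for illustration, not the patch
itself):

    #include <stdbool.h>
    #include <stdint.h>
    #include <lz4.h>
    #include "common/pg_lzcompress.h"

    /* illustrative ids stored in the extra byte */
    enum { ALGO_PGLZ = 0, ALGO_LZ4 = 1 };

    /*
     * Decompress one stored value: the algorithm id travels with the datum,
     * so rows written while the GUC pointed at pglz and rows written while it
     * pointed at lz4 can live in the same table.
     */
    static int32_t
    decompress_datum_sketch(const unsigned char *data, int32_t complen,
                            char *dst, int32_t rawsize)
    {
        switch (data[0])
        {
            case ALGO_PGLZ:
                /* PostgreSQL's own decompressor (5-argument form, v12+) */
                return pglz_decompress((const char *) data + 1, complen - 1,
                                       dst, rawsize, true);
            case ALGO_LZ4:
                return LZ4_decompress_safe((const char *) data + 1, dst,
                                           complen - 1, rawsize);
            default:
                return -1;          /* unknown algorithm id */
        }
    }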


Simple docs and a TAP test included.

I did some basic performance testing (it's not really my thing though, 
so I would appreciate it if somebody did more).
I get about a 7x perf improvement on data load with lz4 compared to pglz 
on my dataset but strangely only a tiny decompression improvement. Perhaps 
more importantly, I also did before-patch and after-patch tests with pglz 
and the performance difference with my data set was <1%.


Note that this will just link against lz4, it does not add lz4 into 
PostgreSQL code-base.


The issues I know of:
- the pg_decompress function really ought to throw an error in the default 
branch, but that file is also used in the front-end so I am not sure how to do that
- the TAP test probably does not work with all possible configurations 
(but that's why it needs to be set in PG_TEST_EXTRA, like for example ssl)
- we don't really have any automated test for reading the old TOAST format, 
no idea how to do that
- I expect my changes to configure.in are not the greatest as I have 
pretty much zero experience with autoconf


--
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/
>From 862919a271e6bfa151fde2fed19c9e5ae6227cdc Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Sun, 4 Aug 2019 02:02:30 +0200
Subject: [PATCH] Add new GUC compression_algorithm

Sets which algorithm to use for TOAST and WAL compression.

Currently allows either pglz which is the standard PostgreSQL algorithm
or lz4 if the PostgreSQL was configured with --with-lz4 (which is the
default).

The implementation allows different values to have different compression
algorithms and also supports reading old TOAST format which always uses
the pglz.
---
 configure   | 116 ++--
 configure.in|  19 
 doc/src/sgml/config.sgml|  38 +++
 doc/src/sgml/storage.sgml   |   5 +-
 src/Makefile.global.in  |   1 +
 src/backend/access/heap/tuptoaster.c|  86 +++
 src/backend/access/transam/xloginsert.c |   5 +-
 src/backend/utils/misc/guc.c|  23 +++-
 src/common/pg_lzcompress.c  | 134 +++-
 src/include/common/pg_lzcompress.h  |  49 +++--
 src/include/pg_config.h.in  |   3 +
 src/include/postgres.h  |   3 +-
 src/test/Makefile   |   8 +-
 src/test/toast/.gitignore   |   2 +
 src/test/toast/Makefile |  25 +
 src/test/toast/README   |  25 +
 src/test/toast/t/001_lz4.pl | 124 ++
 17 files changed, 619 insertions(+), 47 deletions(-)
 create mode 100644 src/test/toast/.gitignore
 create mode 100644 src/test/toast/Makefile
 create mode 100644 src/test/toast/README
 create mode 100644 src/test/toast/t/001_lz4.pl

diff --git a/configure b/configure
index 7a6bfc2339..edd2bfefd6 100755
--- a/configure
+++ b/configure
@@ -704,6 +704,7 @@ with_system_tzdata
 with_libxslt
 with_libxml
 XML2_CONFIG
+with_lz4
 UUID_EXTRA_OBJS
 with_uuid
 with_systemd
@@ -795,6 +796,7 @@ infodir
 docdir
 oldincludedir
 includedir
+runstatedir
 localstatedir
 sharedstatedir
 sysconfdir
@@ -859,6 +861,7 @@ with_readline
 with_libedit_preferred
 with_uuid
 with_ossp_uuid
+with_lz4
 with_libxml
 with_libxslt
 with_system_tzdata
@@ -932,6 +935,7 @@ datadir='${datarootdir}'
 sysconfdir='${prefix}/etc'
 sharedstatedir='${prefix}/com'
 localstatedir='${prefix}/var'

Re: pglz performance

2019-08-02 Thread Konstantin Knizhnik




On 02.08.2019 21:20, Andres Freund wrote:

Another question is whether we'd actually want to include the code in

core directly, or use system libraries (and if some packagers might
decide to disable that, for whatever reason).

I'd personally say we should have an included version, and a
--with-system-... flag that uses the system one.

+1





Re: pglz performance

2019-08-02 Thread Tomas Vondra

On Fri, Aug 02, 2019 at 11:20:03AM -0700, Andres Freund wrote:

Hi,

On 2019-08-02 19:52:39 +0200, Tomas Vondra wrote:

Hmmm, I don't remember the details of those patches so I didn't realize
it allows incremental recompression. If that's possible, that would mean
existing systems can start using it. Which is good.


That depends on what do you mean by "incremental"? A single toasted
datum can only have one compression type, because we only update them
all in one anyway. But different datums can be compressed differently.



I meant different toast values using different compression algorithm,
sorry for the confusion.




Another question is whether we'd actually want to include the code in
core directly, or use system libraries (and if some packagers might
decide to disable that, for whatever reason).


I'd personally say we should have an included version, and a
--with-system-... flag that uses the system one.



OK. I'd say to require a system library, but that's a minor detail.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: pglz performance

2019-08-02 Thread Andres Freund
Hi,

On 2019-08-02 19:52:39 +0200, Tomas Vondra wrote:
> Hmmm, I don't remember the details of those patches so I didn't realize
> it allows incremental recompression. If that's possible, that would mean
> existing systems can start using it. Which is good.

That depends on what you mean by "incremental"? A single toasted
datum can only have one compression type, because we only update them
all in one anyway. But different datums can be compressed differently.


> Another question is whether we'd actually want to include the code in
> core directly, or use system libraries (and if some packagers might
> decide to disable that, for whatever reason).

I'd personally say we should have an included version, and a
--with-system-... flag that uses the system one.

Greetings,

Andres Freund




Re: pglz performance

2019-08-02 Thread Tomas Vondra

On Fri, Aug 02, 2019 at 10:12:58AM -0700, Andres Freund wrote:

Hi,

On 2019-08-02 19:00:39 +0200, Tomas Vondra wrote:

On Fri, Aug 02, 2019 at 09:39:48AM -0700, Andres Freund wrote:
> Hi,
>
> On 2019-08-02 20:40:51 +0500, Andrey Borodin wrote:
> > We have some kind of "roadmap" of "extensible pglz". We plan to
> > provide implementation on Novembers CF.
>
> I don't understand why it's a good idea to improve the compression side
> of pglz. There's plenty other people that spent a lot of time
> developing better compression algorithms.
>

Isn't it beneficial for existing systems, that will be stuck with pglz
even if we end up adding other algorithms?


Why would they be stuck continuing to *compress* with pglz? As we
fully retoast on write anyway we can just gradually switch over to the
better algorithm. Decompression speed is another story, of course.



Hmmm, I don't remember the details of those patches so I didn't realize
it allows incremental recompression. If that's possible, that would mean 
existing systems can start using it. Which is good.


Another question is whether we'd actually want to include the code in
core directly, or use system libraries (and if some packagers might
decide to disable that, for whatever reason).

But yeah, I agree you may have a point about optimizing pglz compression.



Here's what I had a few years back:

https://www.postgresql.org/message-id/20130621000900.GA12425%40alap2.anarazel.de
see also
https://www.postgresql.org/message-id/20130605150144.GD28067%40alap2.anarazel.de

I think we should refresh something like that patch, and:
- make the compression algorithm GUC an enum, rename
- add --with-system-lz4
- obviously refresh the copy of lz4
- drop snappy



That's a reasonable plan, I guess.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: pglz performance

2019-08-02 Thread Andres Freund
Hi,

On 2019-08-02 19:00:39 +0200, Tomas Vondra wrote:
> On Fri, Aug 02, 2019 at 09:39:48AM -0700, Andres Freund wrote:
> > Hi,
> > 
> > On 2019-08-02 20:40:51 +0500, Andrey Borodin wrote:
> > > We have some kind of "roadmap" of "extensible pglz". We plan to
> > > provide implementation on Novembers CF.
> > 
> > I don't understand why it's a good idea to improve the compression side
> > of pglz. There's plenty other people that spent a lot of time
> > developing better compression algorithms.
> > 
> 
> Isn't it beneficial for existing systems, that will be stuck with pglz
> even if we end up adding other algorithms?

Why would they be stuck continuing to *compress* with pglz? As we
fully retoast on write anyway we can just gradually switch over to the
better algorithm. Decompression speed is another story, of course.


Here's what I had a few years back:

https://www.postgresql.org/message-id/20130621000900.GA12425%40alap2.anarazel.de
see also
https://www.postgresql.org/message-id/20130605150144.GD28067%40alap2.anarazel.de

I think we should refresh something like that patch, and:
- make the compression algorithm GUC an enum, rename
- add --with-system-lz4
- obviously refresh the copy of lz4
- drop snappy

Greetings,

Andres Freund




Re: pglz performance

2019-08-02 Thread Tomas Vondra

On Fri, Aug 02, 2019 at 09:39:48AM -0700, Andres Freund wrote:

Hi,

On 2019-08-02 20:40:51 +0500, Andrey Borodin wrote:

We have some kind of "roadmap" of "extensible pglz". We plan to
provide implementation on Novembers CF.


I don't understand why it's a good idea to improve the compression side
of pglz. There's plenty other people that spent a lot of time
developing better compression algorithms.



Isn't it beneficial for existing systems, that will be stuck with pglz
even if we end up adding other algorithms?




Currently, pglz starts with empty cache map: there is no prior 4k
bytes before start. We can add imaginary prefix to any data with
common substrings: this will enhance compression ratio.  It is hard
to decide on training data set for this "common prefix". So we want
to produce extension with aggregate function which produces some
"adapted common prefix" from users's data.  Then we can "reserve" few
negative bytes for "decompression commands". This command can
instruct database on which common prefix to use.  But also system
command can say "invoke decompression from extension".

Thus, user will be able to train database compression on his data and
substitute pglz compression with custom compression method
seamlessly.

This will make hard-choosen compression unneeded, but seems overly
hacky. But there will be no need to have lz4, zstd, brotli, lzma and
others in core. Why not provide e.g. "time series compression"? Or
"DNA compression"? Whatever gun user wants for his foot.


I think this is way too complicated, and will provide not particularly
much benefit for the majority users.



I agree with this. I do see value in the feature, but probably not as a
drop-in replacement for the default compression algorithm. I'd compare
it to the "custom compression methods" patch that was submitted some
time ago.


In fact, I'll argue that we should flat out reject any such patch until
we have at least one decent default compression algorithm in core.
You're trying to work around a poor compression algorithm with
complicated dictionary improvement, that require user interaction, and
only will work in a relatively small subset of the cases, and will very
often increase compression times.



I wouldn't be so strict, I guess. But I do agree an algorithm that 
requires additional steps (training, ...) is unlikely to be a good 
candidate for the default instance compression algorithm.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: pglz performance

2019-08-02 Thread Andres Freund
Hi,

On 2019-08-02 20:40:51 +0500, Andrey Borodin wrote:
> We have some kind of "roadmap" of "extensible pglz". We plan to provide 
> implementation on Novembers CF.

I don't understand why it's a good idea to improve the compression side
of pglz. There's plenty other people that spent a lot of time developing
better compression algorithms.


> Currently, pglz starts with empty cache map: there is no prior 4k bytes 
> before start. We can add imaginary prefix to any data with common substrings: 
> this will enhance compression ratio.
> It is hard to decide on training data set for this "common prefix". So we 
> want to produce extension with aggregate function which produces some 
> "adapted common prefix" from users's data.
> Then we can "reserve" few negative bytes for "decompression commands". This 
> command can instruct database on which common prefix to use.
> But also system command can say "invoke decompression from extension".
> 
> Thus, user will be able to train database compression on his data and 
> substitute pglz compression with custom compression method seamlessly.
> 
> This will make hard-choosen compression unneeded, but seems overly hacky. But 
> there will be no need to have lz4, zstd, brotli, lzma and others in core. Why 
> not provide e.g. "time series compression"? Or "DNA compression"? Whatever 
> gun user wants for his foot.

I think this is way too complicated, and will provide not particularly
much benefit for the majority users.

In fact, I'll argue that we should flat out reject any such patch until
we have at least one decent default compression algorithm in
core. You're trying to work around a poor compression algorithm with
complicated dictionary improvements that require user interaction, will
only work in a relatively small subset of cases, and will very
often increase compression times.

Greetings,

Andres Freund




Re: pglz performance

2019-08-02 Thread Andrey Borodin
Thanks for looking into this!

> 2 Aug 2019, at 19:43, Tomas Vondra wrote:
> 
> On Fri, Aug 02, 2019 at 04:45:43PM +0300, Konstantin Knizhnik wrote:
>> 
>> It takes me some time to understand that your memcpy optimization is 
>> correct;)
Seems that comments are not explanatory enough... will try to fix.

>> I have tested different ways of optimizing this fragment of code, but failed 
>> tooutperform your implementation!
JFYI, we tried optimizations such as memcpy with a const size (optimized into 
assembly instead of a call), unrolling the literal loop, and some others. None of 
these worked better.
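
The core of the memcpy trick is small enough to show as a simplified standalone
sketch (this is not the committed code, just the idea):

    #include <string.h>

    /*
     * Copy a back-reference of "len" bytes that starts "off" bytes behind the
     * current output position "dp".
     */
    static void
    copy_match(unsigned char *dp, int off, int len)
    {
        if (off >= len)
        {
            /* source and destination cannot overlap: one memcpy is enough */
            memcpy(dp, dp - off, len);
        }
        else
        {
            /*
             * Overlapping match (off < len): bytes we are about to write are
             * themselves part of the source, which is how pglz encodes runs,
             * so fall back to a byte-by-byte copy that preserves that effect.
             */
            while (len-- > 0)
            {
                *dp = *(dp - off);
                dp++;
            }
        }
    }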

>> But ...  below are results for lz4:
>> 
>> Decompressor score (summ of all times):
>> NOTICE:  Decompressor lz4_decompress result 3.660066
>> Compressor score (summ of all times):
>> NOTICE:  Compressor lz4_compress result 10.288594
>> 
>> There is 2 times advantage in decompress speed and 10 times advantage in 
>> compress speed.
>> So may be instead of "hacking" pglz algorithm we should better switch to lz4?
>> 
> 
> I think we should just bite the bullet and add initdb option to pick
> compression algorithm. That's been discussed repeatedly, but we never
> ended up actually doing that. See for example [1].
> 
> If there's anyone willing to put some effort into getting this feature
> over the line, I'm willing to do reviews & commit. It's a seemingly
> small change with rather insane potential impact.
> 
> But even if we end up doing that, it still makes sense to optimize the
> hell out of pglz, because existing systems will still use that
> (pg_upgrade can't switch from one compression algorithm to another).

We have some kind of "roadmap" for "extensible pglz". We plan to provide an 
implementation at the November CF.

Currently, pglz starts with an empty cache map: there are no prior 4k bytes before 
the start. We can add an imaginary prefix to any data with common substrings: this 
will enhance the compression ratio.
It is hard to decide on a training data set for this "common prefix". So we want 
to produce an extension with an aggregate function which produces some "adapted 
common prefix" from the user's data.
Then we can "reserve" a few negative bytes for "decompression commands". Such a 
command can instruct the database on which common prefix to use.
But a system command can also say "invoke decompression from extension".

Thus, the user will be able to train database compression on his data and 
substitute pglz compression with a custom compression method seamlessly.

This will make a hard-chosen compression algorithm unneeded, but seems overly hacky. 
But there will be no need to have lz4, zstd, brotli, lzma and others in core. Why 
not provide e.g. "time series compression"? Or "DNA compression"? Whatever gun 
the user wants for his foot.

Best regards, Andrey Borodin.



Re: pglz performance

2019-08-02 Thread Tomas Vondra

On Fri, Aug 02, 2019 at 04:45:43PM +0300, Konstantin Knizhnik wrote:



On 27.06.2019 21:33, Andrey Borodin wrote:



13 May 2019, at 12:14, Michael Paquier wrote:

Decompression can matter a lot for mostly-read workloads and
compression can become a bottleneck for heavy-insert loads, so
improving compression or decompression should be two separate
problems, not two problems linked.  Any improvement in one or the
other, or even both, is nice to have.

Here's patch hacked by Vladimir for compression.

Key differences (as far as I see, maybe Vladimir will post more complete list 
of optimizations):
1. Use functions instead of macro-functions: not surprisingly it's easier to 
optimize them and provide less constraints for compiler to optimize.
2. More compact hash table: use indexes instead of pointers.
3. More robust segment comparison: like memcmp, but return index of first 
different byte

In weighted mix of different data (same as for compression), overall speedup is 
x1.43 on my machine.

Current implementation is integrated into test_pglz suit for benchmarking 
purposes[0].

Best regards, Andrey Borodin.

[0] https://github.com/x4m/test_pglz


It takes me some time to understand that your memcpy optimization is 
correct;)
I have tested different ways of optimizing this fragment of code, but 
failed tooutperform your implementation!

Results at my computer is simlar with yours:

Decompressor score (summ of all times):
NOTICE:  Decompressor pglz_decompress_hacked result 6.627355
NOTICE:  Decompressor pglz_decompress_hacked_unrolled result 7.497114
NOTICE:  Decompressor pglz_decompress_hacked8 result 7.412944
NOTICE:  Decompressor pglz_decompress_hacked16 result 7.792978
NOTICE:  Decompressor pglz_decompress_vanilla result 10.652603

Compressor score (summ of all times):
NOTICE:  Compressor pglz_compress_vanilla result 116.970005
NOTICE:  Compressor pglz_compress_hacked result 89.706105


But ...  below are results for lz4:

Decompressor score (summ of all times):
NOTICE:  Decompressor lz4_decompress result 3.660066
Compressor score (summ of all times):
NOTICE:  Compressor lz4_compress result 10.288594

There is 2 times advantage in decompress speed and 10 times advantage 
in compress speed.
So may be instead of "hacking" pglz algorithm we should better switch 
to lz4?




I think we should just bite the bullet and add an initdb option to pick
the compression algorithm. That's been discussed repeatedly, but we never
ended up actually doing that. See for example [1].

If there's anyone willing to put some effort into getting this feature
over the line, I'm willing to do reviews & commit. It's a seemingly
small change with rather insane potential impact.

But even if we end up doing that, it still makes sense to optimize the
hell out of pglz, because existing systems will still use that
(pg_upgrade can't switch from one compression algorithm to another).

regards

[1] 
https://www.postgresql.org/message-id/flat/55341569.1090107%402ndquadrant.com

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: pglz performance

2019-08-02 Thread Konstantin Knizhnik




On 27.06.2019 21:33, Andrey Borodin wrote:



13 May 2019, at 12:14, Michael Paquier wrote:

Decompression can matter a lot for mostly-read workloads and
compression can become a bottleneck for heavy-insert loads, so
improving compression or decompression should be two separate
problems, not two problems linked.  Any improvement in one or the
other, or even both, is nice to have.

Here's patch hacked by Vladimir for compression.

Key differences (as far as I see, maybe Vladimir will post more complete list 
of optimizations):
1. Use functions instead of macro-functions: not surprisingly it's easier to 
optimize them and provide less constraints for compiler to optimize.
2. More compact hash table: use indexes instead of pointers.
3. More robust segment comparison: like memcmp, but return index of first 
different byte

In weighted mix of different data (same as for compression), overall speedup is 
x1.43 on my machine.

Current implementation is integrated into test_pglz suit for benchmarking 
purposes[0].

Best regards, Andrey Borodin.

[0] https://github.com/x4m/test_pglz


It took me some time to understand that your memcpy optimization is 
correct ;)
I have tested different ways of optimizing this fragment of code, but 
failed to outperform your implementation!

Results on my computer are similar to yours:

Decompressor score (summ of all times):
NOTICE:  Decompressor pglz_decompress_hacked result 6.627355
NOTICE:  Decompressor pglz_decompress_hacked_unrolled result 7.497114
NOTICE:  Decompressor pglz_decompress_hacked8 result 7.412944
NOTICE:  Decompressor pglz_decompress_hacked16 result 7.792978
NOTICE:  Decompressor pglz_decompress_vanilla result 10.652603

Compressor score (summ of all times):
NOTICE:  Compressor pglz_compress_vanilla result 116.970005
NOTICE:  Compressor pglz_compress_hacked result 89.706105


But ...  below are results for lz4:

Decompressor score (summ of all times):
NOTICE:  Decompressor lz4_decompress result 3.660066
Compressor score (summ of all times):
NOTICE:  Compressor lz4_compress result 10.288594

There is a 2x advantage in decompression speed and a 10x advantage in 
compression speed.
So maybe instead of "hacking" the pglz algorithm we should switch to 
lz4?



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: pglz performance

2019-06-27 Thread Andrey Borodin


> 13 May 2019, at 12:14, Michael Paquier wrote:
> 
> Decompression can matter a lot for mostly-read workloads and
> compression can become a bottleneck for heavy-insert loads, so
> improving compression or decompression should be two separate
> problems, not two problems linked.  Any improvement in one or the
> other, or even both, is nice to have.

Here's a patch hacked up by Vladimir for compression.

Key differences (as far as I can see; maybe Vladimir will post a more complete list 
of optimizations):
1. Use functions instead of macro-functions: not surprisingly, it is easier to 
optimize them and they put fewer constraints on the compiler.
2. More compact hash table: use indexes instead of pointers.
3. More robust segment comparison: like memcmp, but it returns the index of the 
first differing byte (see the sketch below).
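
A minimal sketch of that item 3 helper (the name is illustrative, not the actual
patch code):

    /*
     * Compare two byte ranges like memcmp, but instead of a sign return how
     * many leading bytes are equal, i.e. the index of the first differing
     * byte, capped at "limit".
     */
    static int
    common_prefix_length(const unsigned char *a, const unsigned char *b, int limit)
    {
        int i = 0;

        while (i < limit && a[i] == b[i])
            i++;
        return i;
    }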

In a weighted mix of different data (same as for compression), the overall speedup is 
x1.43 on my machine.

The current implementation is integrated into the test_pglz suite for benchmarking 
purposes [0].

Best regards, Andrey Borodin.

[0] https://github.com/x4m/test_pglz


0001-Reorganize-pglz-compression-code.patch
Description: Binary data


Re: pglz performance

2019-06-24 Thread Andrey Borodin


> 18 May 2019, at 11:44, Andrey Borodin wrote:
> 
Hi! 
Here's rebased version of patches.

Best regards, Andrey Borodin.


0001-Use-memcpy-in-pglz-decompression-for-long-matches.patch
Description: Binary data


0001-Use-memcpy-in-pglz-decompression.patch
Description: Binary data


Re: pglz performance

2019-05-23 Thread Binguo Bao
Hi hackers!
I am a student participating in GSoC 2019. I am looking forward to working
with you all and learning from you.
My project would aim to provide the ability to de-TOAST a fully TOAST'd and
compressed field using an iterator.
For more details, please take a look at my proposal[0]. Any suggestions or
comments about my immature ideas would be much appreciated:)

I've implemented the first step of the project: segment pglz
compression, which provides the ability to get a subset of the raw data without
decompressing the entire field.
And I've done some test[1] for the compressor. The test result is as
follows:
NOTICE:  Test summary:
NOTICE:  Payload 00010001
NOTICE:   Decompressor name          | Compression time (ns/bit) | Decompression time (ns/bit) | ratio
NOTICE:   pglz_decompress_hacked     | 23.747444                 | 0.578344                    | 0.159809
NOTICE:   pglz_decompress_hacked8    | 23.764193                 | 0.677800                    | 0.159809
NOTICE:   pglz_decompress_hacked16   | 23.740351                 | 0.704730                    | 0.159809
NOTICE:   pglz_decompress_vanilla    | 23.797917                 | 1.227868                    | 0.159809
NOTICE:   pglz_decompress_hacked_seg | 12.261808                 | 0.625634                    | 0.184952

Comment: Compression speed increased by nearly 100% with compression rate dropped by 15%

NOTICE:  Payload 00010001 sliced by 2Kb
NOTICE:   pglz_decompress_hacked     | 12.616956                 | 0.621223                    | 0.156953
NOTICE:   pglz_decompress_hacked8    | 12.583685                 | 0.756741                    | 0.156953
NOTICE:   pglz_decompress_hacked16   | 12.512636                 | 0.774980                    | 0.156953
NOTICE:   pglz_decompress_vanilla    | 12.493062                 | 1.262820                    | 0.156953
NOTICE:   pglz_decompress_hacked_seg | 11.986554                 | 0.622654                    | 0.159590
NOTICE:  Payload 00010001 sliced by 4Kb
NOTICE:   pglz_decompress_hacked     | 15.514469                 | 0.565565                    | 0.154213
NOTICE:   pglz_decompress_hacked8    | 15.529144                 | 0.699675                    | 0.154213
NOTICE:   pglz_decompress_hacked16   | 15.514040                 | 0.721145                    | 0.154213
NOTICE:   pglz_decompress_vanilla    | 15.558958                 | 1.237237                    | 0.154213
NOTICE:   pglz_decompress_hacked_seg | 14.650309                 | 0.563228                    | 0.153652
NOTICE:  Payload 00010006
NOTICE:   Decompressor name          | Compression time (ns/bit) | Decompression time (ns/bit) | ratio
NOTICE:   pglz_decompress_hacked     | 8.610177                  | 0.153577                    | 0.052294
NOTICE:   pglz_decompress_hacked8    | 8.566785                  | 0.168002                    | 0.052294
NOTICE:   pglz_decompress_hacked16   | 8.643126                  | 0.167537                    | 0.052294
NOTICE:   pglz_decompress_vanilla    | 8.574498                  | 0.930738                    | 0.052294
NOTICE:   pglz_decompress_hacked_seg | 7.394731                  | 0.171924                    | 0.056081
NOTICE:  Payload 00010006 sliced by 2Kb
NOTICE:   pglz_decompress_hacked     | 6.724060                  | 0.295043                    | 0.065541
NOTICE:   pglz_decompress_hacked8    | 6.623018                  | 0.318527                    | 0.065541
NOTICE:   pglz_decompress_hacked16   | 6.898034                  | 0.318360                    | 0.065541
NOTICE:   pglz_decompress_vanilla    | 6.712711                  | 1.045430                    | 0.065541
NOTICE:   pglz_decompress_hacked_seg | 6.630743                  | 0.302589                    | 0.068471
NOTICE:  Payload 00010006 sliced by 4Kb
NOTICE:   pglz_decompress_hacked     | 6.624067                  | 0.220942                    | 0.058865
NOTICE:   pglz_decompress_hacked8    | 6.659424                  | 0.240183                    | 0.058865
NOTICE:   pglz_decompress_hacked16   | 6.763864                  | 0.240564                    | 0.058865
NOTICE:   pglz_decompress_vanilla    | 6.743574                  | 0.985348                    | 0.058865
NOTICE:   pglz_decompress_hacked_seg | 6.613123                  | 0.227582                    | 0.060330
NOTICE:  Payload 00010008
NOTICE:   Decompressor name          | Compression time (ns/bit) | Decompression time (ns/bit) | ratio
NOTICE:   pglz_decompress_hacked     | 52.425957                 | 1.050544                    | 0.498941
NOTICE:   pglz_decompress_hacked8    | 52.204561                 | 1.261592                    | 0.498941
NOTICE:   pglz_decompress_hacked16   | 52.328491                 | 1.466751                    | 0.498941
NOTICE:   pglz_decompress_vanilla    | 52.465308                 | 1.341271                    | 0.498941
NOTICE:   pglz_decompress_hacked_seg |

Re: pglz performance

2019-05-18 Thread Andrey Borodin



> 17 May 2019, at 18:40, Gasper Zejn wrote:
> 
> I've tested according to instructions at the test repo
> https://github.com/x4m/test_pglz
> 
> Test_pglz is at a97f63b and postgres at 6ba500.
> 
> Hardware is desktop AMD Ryzen 5 2600, 32GB RAM
> 
> Decompressor score (summ of all times):
> 
> NOTICE:  Decompressor pglz_decompress_hacked result 6.988909
> NOTICE:  Decompressor pglz_decompress_hacked8 result 7.562619
> NOTICE:  Decompressor pglz_decompress_hacked16 result 8.316957
> NOTICE:  Decompressor pglz_decompress_vanilla result 10.725826

Thanks, Gasper! Basically we observe the same 0.65x time reduction here.

That's very good that we have independent scores.

I'm still somewhat unsure that the score is fair: on payload 
00010008 we sometimes have vanilla decompression slower than 
hacked by only a few percent. And this is especially visible on AMD. The degradation for 
00010008 sliced by 8Kb reaches 10%.

I think this is because 00010008 has the highest entropy. It is 
almost random and matches are very short, but present.
00010008 
Entropy = 4.360546 bits per byte.
00010006
Entropy = 1.450059 bits per byte.
00010001 
Entropy = 2.944235 bits per byte.
shakespeare.txt 
Entropy = 3.603659 bits per byte
16398 
Entropy = 1.897640 bits per byte.
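
Figures like these are plain Shannon byte entropy; a small sketch of how such
bits-per-byte values can be computed (the exact tool used for the numbers above is
not stated, so this is only illustrative):

    #include <math.h>
    #include <stddef.h>

    /* Shannon entropy of a buffer, in bits per byte: H = -sum(p_i * log2(p_i)) */
    static double
    byte_entropy(const unsigned char *buf, size_t len)
    {
        size_t  counts[256] = {0};
        double  h = 0.0;
        size_t  i;
        int     b;

        for (i = 0; i < len; i++)
            counts[buf[i]]++;

        for (b = 0; b < 256; b++)
        {
            if (counts[b] == 0)
                continue;
            double p = (double) counts[b] / (double) len;
            h -= p * log2(p);
        }
        return h;
    }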

Best regards, Andrey Borodin.



Re: pglz performance

2019-05-17 Thread Gasper Zejn

On 16. 05. 19 19:13, Andrey Borodin wrote:
>
>> 15 May 2019, at 15:06, Andrey Borodin wrote:
>>
>> Owners of AMD and ARM devices are welcome.

I've tested according to instructions at the test repo
https://github.com/x4m/test_pglz

Test_pglz is at a97f63b and postgres at 6ba500.

Hardware is desktop AMD Ryzen 5 2600, 32GB RAM

Decompressor score (summ of all times):

NOTICE:  Decompressor pglz_decompress_hacked result 6.988909
NOTICE:  Decompressor pglz_decompress_hacked8 result 7.562619
NOTICE:  Decompressor pglz_decompress_hacked16 result 8.316957
NOTICE:  Decompressor pglz_decompress_vanilla result 10.725826


Attached is the full test run, if needed.

Kind regards,

Gasper

> Yandex hardware RND guys gave me ARM server and Power9 server. They are 
> looking for AMD and some new Intel boxes.
>
> Meanwhile I made some enhancements to test suit:
> 1. I've added Shakespeare payload: concatenation of works of this prominent 
> poet.
> 2. For each payload compute "sliced time" - time to decompress payload if it 
> was sliced by 2Kb pieces or 8Kb pieces.
> 3. For each decompressor we compute "score": (sum of time to decompress each 
> payload, each payload sliced by 2Kb and 8Kb) * 5 times
>
> I've attached full test logs, meanwhile here's results for different 
> platforms.
>
> Intel Server
> NOTICE:  0: Decompressor pglz_decompress_hacked result 10.346763
> NOTICE:  0: Decompressor pglz_decompress_hacked8 result 11.192078
> NOTICE:  0: Decompressor pglz_decompress_hacked16 result 11.957727
> NOTICE:  0: Decompressor pglz_decompress_vanilla result 14.262256
>
> ARM Server
> NOTICE:  Decompressor pglz_decompress_hacked result 12.98
> NOTICE:  Decompressor pglz_decompress_hacked8 result 13.004935
> NOTICE:  Decompressor pglz_decompress_hacked16 result 13.043015
> NOTICE:  Decompressor pglz_decompress_vanilla result 18.239242
>
> Power9 Server
> NOTICE:  Decompressor pglz_decompress_hacked result 10.992974
> NOTICE:  Decompressor pglz_decompress_hacked8 result 11.747443
> NOTICE:  Decompressor pglz_decompress_hacked16 result 11.026342
> NOTICE:  Decompressor pglz_decompress_vanilla result 16.375315
>
> Intel laptop
> NOTICE:  Decompressor pglz_decompress_hacked result 9.445808
> NOTICE:  Decompressor pglz_decompress_hacked8 result 9.105360
> NOTICE:  Decompressor pglz_decompress_hacked16 result 9.621833
> NOTICE:  Decompressor pglz_decompress_vanilla result 10.661968
>
> From these results pglz_decompress_hacked looks best.
>
> Best regards, Andrey Borodin.
>

Re: pglz performance

2019-05-17 Thread Andrey Borodin


> 17 May 2019, at 6:44, Michael Paquier wrote:
> 
> That's nice.
> 
> From the numbers you are presenting here, all of them are much better
> than the original, and there is not much difference between any of the
> patched versions.  Having a 20%~30% improvement with a patch is very
> nice.
> 
> After that comes the simplicity and the future maintainability of what
> is proposed.  I am not much into accepting a patch which has a 1%~2%
> impact for some hardwares and makes pglz much more complex and harder
> to understand.  But I am really eager to see a patch with at least a
> 10% improvement which remains simple, even more if it simplifies the
> logic used in pglz.

Here are patches for both winning versions. I'll place them on the CF.
My gut feeling is that pglz_decompress_hacked8 should be better, but on most 
architectures the benchmarks show the opposite.


Best regards, Andrey Borodin.



pglz_decompress_hacked8.diff
Description: Binary data


pglz_decompress_hacked.diff
Description: Binary data



Re: pglz performance

2019-05-16 Thread Michael Paquier
On Thu, May 16, 2019 at 10:13:22PM +0500, Andrey Borodin wrote:
> Meanwhile I made some enhancements to test suit:
> Intel Server
> NOTICE:  0: Decompressor pglz_decompress_hacked result 10.346763
> NOTICE:  0: Decompressor pglz_decompress_hacked8 result 11.192078
> NOTICE:  0: Decompressor pglz_decompress_hacked16 result 11.957727
> NOTICE:  0: Decompressor pglz_decompress_vanilla result 14.262256
> 
> ARM Server
> NOTICE:  Decompressor pglz_decompress_hacked result 12.98
> NOTICE:  Decompressor pglz_decompress_hacked8 result 13.004935
> NOTICE:  Decompressor pglz_decompress_hacked16 result 13.043015
> NOTICE:  Decompressor pglz_decompress_vanilla result 18.239242
> 
> Power9 Server
> NOTICE:  Decompressor pglz_decompress_hacked result 10.992974
> NOTICE:  Decompressor pglz_decompress_hacked8 result 11.747443
> NOTICE:  Decompressor pglz_decompress_hacked16 result 11.026342
> NOTICE:  Decompressor pglz_decompress_vanilla result 16.375315
> 
> Intel laptop
> NOTICE:  Decompressor pglz_decompress_hacked result 9.445808
> NOTICE:  Decompressor pglz_decompress_hacked8 result 9.105360
> NOTICE:  Decompressor pglz_decompress_hacked16 result 9.621833
> NOTICE:  Decompressor pglz_decompress_vanilla result 10.661968
> 
> From these results pglz_decompress_hacked looks best.

That's nice.

From the numbers you are presenting here, all of them are much better
than the original, and there is not much difference between any of the
patched versions.  Having a 20%~30% improvement with a patch is very
nice.

After that comes the simplicity and the future maintainability of what
is proposed.  I am not much into accepting a patch which has a 1%~2%
impact on some hardware and makes pglz much more complex and harder
to understand.  But I am really eager to see a patch with at least a
10% improvement which remains simple, even more if it simplifies the
logic used in pglz.
--
Michael


signature.asc
Description: PGP signature


Re: pglz performance

2019-05-16 Thread Andrey Borodin


> On 15 May 2019, at 15:06, Andrey Borodin  wrote:
> 
> Owners of AMD and ARM devices are welcome.

The Yandex hardware R&D guys gave me an ARM server and a Power9 server. They are
looking for AMD and some newer Intel boxes.

Meanwhile I made some enhancements to the test suite:
1. I've added a Shakespeare payload: a concatenation of the works of this
prominent poet.
2. For each payload we compute a "sliced time": the time to decompress the
payload if it were sliced into 2Kb or 8Kb pieces.
3. For each decompressor we compute a "score": the sum of the time to decompress
each payload plus each payload sliced into 2Kb and 8Kb pieces, accumulated over
5 runs (a sketch of the sliced measurement follows below).
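
Roughly, the sliced measurement works like the sketch below. This is simplified
and not the extension's exact code: the decompress_fn typedef and the function
name are made up here, and the pglz calls assume the PG 12-era API from
common/pg_lzcompress.h.

#include "postgres.h"
#include "common/pg_lzcompress.h"
#include "portability/instr_time.h"

typedef int32 (*decompress_fn) (const char *source, int32 slen,
                                char *dest, int32 rawsize,
                                bool check_complete);

/* Time decompression of one payload cut into slice_size pieces. */
static double
time_sliced_decompression(decompress_fn fn, const char *payload,
                          int32 len, int32 slice_size)
{
    char       *compressed = palloc(PGLZ_MAX_OUTPUT(slice_size));
    char       *restored = palloc(slice_size);
    double      total = 0.0;
    int32       pos;

    for (pos = 0; pos < len; pos += slice_size)
    {
        int32       raw = Min(slice_size, len - pos);
        int32       clen = pglz_compress(payload + pos, raw, compressed,
                                         PGLZ_strategy_always);
        instr_time  start,
                    end;

        if (clen < 0)
            continue;           /* this slice is not compressible, skip it */

        /* time only the decompression call itself */
        INSTR_TIME_SET_CURRENT(start);
        fn(compressed, clen, restored, raw, true);
        INSTR_TIME_SET_CURRENT(end);
        INSTR_TIME_SUBTRACT(end, start);
        total += INSTR_TIME_GET_DOUBLE(end);

        /* verify the round trip */
        Assert(memcmp(restored, payload + pos, raw) == 0);
    }

    pfree(compressed);
    pfree(restored);
    return total;               /* seconds spent decompressing this payload */
}

The score of a decompressor is then this number summed over every payload
(whole, sliced by 2Kb, and sliced by 8Kb) across the 5 runs.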

I've attached the full test logs; meanwhile, here are the results for different platforms.

Intel Server
NOTICE:  0: Decompressor pglz_decompress_hacked result 10.346763
NOTICE:  0: Decompressor pglz_decompress_hacked8 result 11.192078
NOTICE:  0: Decompressor pglz_decompress_hacked16 result 11.957727
NOTICE:  0: Decompressor pglz_decompress_vanilla result 14.262256

ARM Server
NOTICE:  Decompressor pglz_decompress_hacked result 12.98
NOTICE:  Decompressor pglz_decompress_hacked8 result 13.004935
NOTICE:  Decompressor pglz_decompress_hacked16 result 13.043015
NOTICE:  Decompressor pglz_decompress_vanilla result 18.239242

Power9 Server
NOTICE:  Decompressor pglz_decompress_hacked result 10.992974
NOTICE:  Decompressor pglz_decompress_hacked8 result 11.747443
NOTICE:  Decompressor pglz_decompress_hacked16 result 11.026342
NOTICE:  Decompressor pglz_decompress_vanilla result 16.375315

Intel laptop
NOTICE:  Decompressor pglz_decompress_hacked result 9.445808
NOTICE:  Decompressor pglz_decompress_hacked8 result 9.105360
NOTICE:  Decompressor pglz_decompress_hacked16 result 9.621833
NOTICE:  Decompressor pglz_decompress_vanilla result 10.661968

From these results pglz_decompress_hacked looks best.

Best regards, Andrey Borodin.

Intel Server

pgload01f/postgres M # select test_pglz();
NOTICE:  0: Time to decompress one byte in ns:
LOCATION:  test_pglz, test_pglz.c:249
NOTICE:  0: Payload 00010001
LOCATION:  test_pglz, test_pglz.c:252
NOTICE:  0: Decompressor pglz_decompress_hacked result 0.142841
LOCATION:  test_pglz, test_pglz.c:255
NOTICE:  0: Decompressor pglz_decompress_hacked8 result 0.136788
LOCATION:  test_pglz, test_pglz.c:255
NOTICE:  0: Decompressor pglz_decompress_hacked16 result 0.145459
LOCATION:  test_pglz, test_pglz.c:255
NOTICE:  0: Decompressor pglz_decompress_vanilla result 0.207186
LOCATION:  test_pglz, test_pglz.c:255
NOTICE:  0: Payload 00010001 sliced by 2Kb
LOCATION:  test_pglz, test_pglz.c:259
NOTICE:  0: Decompressor pglz_decompress_hacked result 0.747311
LOCATION:  test_pglz, test_pglz.c:262
NOTICE:  0: Decompressor pglz_decompress_hacked8 result 0.780535
LOCATION:  test_pglz, test_pglz.c:262
NOTICE:  0: Decompressor pglz_decompress_hacked16 result 0.826859
LOCATION:  test_pglz, test_pglz.c:262
NOTICE:  0: Decompressor pglz_decompress_vanilla result 1.074147
LOCATION:  test_pglz, test_pglz.c:262
NOTICE:  0: Payload 00010001 sliced by 8Kb
LOCATION:  test_pglz, test_pglz.c:266
NOTICE:  0: Decompressor pglz_decompress_hacked result 0.680256
LOCATION:  test_pglz, test_pglz.c:269
NOTICE:  0: Decompressor pglz_decompress_hacked8 result 0.746620
LOCATION:  test_pglz, test_pglz.c:269
NOTICE:  0: Decompressor pglz_decompress_hacked16 result 0.759602
LOCATION:  test_pglz, test_pglz.c:269
NOTICE:  0: Decompressor pglz_decompress_vanilla result 1.030290
LOCATION:  test_pglz, test_pglz.c:269
NOTICE:  0: Payload 00010006
LOCATION:  test_pglz, test_pglz.c:252
NOTICE:  0: Decompressor pglz_decompress_hacked result 0.040214
LOCATION:  test_pglz, test_pglz.c:255
NOTICE:  0: Decompressor pglz_decompress_hacked8 result 0.042281
LOCATION:  test_pglz, test_pglz.c:255
NOTICE:  0: Decompressor pglz_decompress_hacked16 result 0.042684
LOCATION:  test_pglz, test_pglz.c:255
NOTICE:  0: Decompressor pglz_decompress_vanilla result 0.123906
LOCATION:  test_pglz, test_pglz.c:255
NOTICE:  0: Payload 00010006 sliced by 2Kb
LOCATION:  test_pglz, test_pglz.c:259
NOTICE:  0: Decompressor pglz_decompress_hacked result 0.368333
LOCATION:  test_pglz, test_pglz.c:262
NOTICE:  0: Decompressor pglz_decompress_hacked8 result 0.367808
LOCATION:  test_pglz, test_pglz.c:262
NOTICE:  0: Decompressor pglz_decompress_hacked16 result 0.367856
LOCATION:  test_pglz, test_pglz.c:262
NOTICE:  0: Decompressor pglz_decompress_vanilla result 0.742352
LOCATION:  test_pglz, test_pglz.c:262
NOTICE:  0: Payload 00010006 sliced by 8Kb
LOCATION:  test_pglz, test_pglz.c:266
NOTICE:  0: Decompressor pglz_decompress_hacked result 0.276363
LOCATION:  test_pglz, test_pglz.c:269
NOTICE:  0: Decompressor pglz_decompress_hacked8 result 0.281572
LOCATION:  test_pglz, test_pglz.c:269
NOTICE:  0: Decompressor pglz_decompress_hacked16 result 

Re: pglz performance

2019-05-15 Thread Andrey Borodin



> On 13 May 2019, at 12:14, Michael Paquier  wrote:
> 
>> Currently we test mostly decompression improvements against two WALs
>> and one data file taken from a pgbench-generated database. Any
>> suggestions on more relevant data payloads are very welcome.
> 
> Text strings made of random data and variable length?
Like a text corpus?

>  For any test of
> this kind I think that it is good to focus on the performance of the
> low-level calls, even going as far as a simple C wrapper on top of the
> pglz APIs to test only the performance and not have extra PG-related
> overhead like palloc() which can be a barrier.
Our test_pglz extension measures only the time of the actual compression call;
it does a warmup run, and all allocations are done before the measurement.

>  Focusing on strings of
> lengths from 1kB up to 16kB may be a reasonable range of sizes, and it is
> important to keep the same uncompressed strings across runs for performance
> comparison.
We intentionally avoid using generated data and instead keep the test files
committed in the git repo.
We also check that the decompressed data matches the original input. All tests
are run 5 times.

We use a PG extension only for the simplicity of deploying the benchmarks to our
PG clusters.


Here are some test results.

Currently we test on 4 payloads:
1. WAL from cluster initialization
2. two WALs from pgbench (pgbench -i -s 10)
3. a data file taken from pgbench -i -s 10

We use these decompressors:
1. pglz_decompress_vanilla - taken from the PG source code
2. pglz_decompress_hacked - uses sliced memcpy to imitate the byte-by-byte pglz
copy
3. pglz_decompress_hacked4, pglz_decompress_hacked8, pglz_decompress_hackedX -
use memcpy only when the match is at least X bytes long; we need to determine
the best X if this approach is used (see the sketch right after this list).
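
To make the difference concrete, here is a rough sketch of the two copy
strategies for a single back-reference of length len at offset off. This is
illustrative only, not the actual patch: the function names, the chunking, and
the X threshold check are invented for this example.

#include <string.h>

/* vanilla pglz: copy byte by byte; overlapping matches (off < len) work
 * for free because every byte is written before it is read again */
static void
copy_match_bytewise(unsigned char *dp, int off, int len)
{
    while (len-- > 0)
    {
        *dp = dp[-off];
        dp++;
    }
}

/* "hacked" copy: use memcpy, sliced so that each call reads only bytes
 * that have already been written (important when off < len) */
static void
copy_match_memcpy(unsigned char *dp, int off, int len)
{
    while (len > 0)
    {
        int     chunk = (off < len) ? off : len;

        memcpy(dp, dp - off, chunk);
        dp += chunk;
        len -= chunk;
    }
}

/*
 * pglz_decompress_hackedX would then pick a strategy per match:
 *
 *     if (len < X)
 *         copy_match_bytewise(dp, off, len);
 *     else
 *         copy_match_memcpy(dp, off, len);
 */

The point of the X threshold is that for very short matches the fixed overhead
of a memcpy call can cost more than the plain byte loop, which is presumably why
the best X varies between CPUs.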

I used three platforms:
1. Server: XEON E5-2660, SM/SYS1027RN3RF/10S2.5/1U/2P (2*INTEL XEON
E5-2660/16*DDR3ECCREG/10*SAS-2.5), under Ubuntu 14, PG 9.6.
2. Desktop: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz, Ubuntu 18, PG 12devel.
3. Laptop: MacBook Pro 15 2015, 2.2 GHz Core i7 (I7-4770HQ), macOS, PG 12devel.
Owners of AMD and ARM devices are welcome.

Server results (less is better):
NOTICE:  0: Time to decompress one byte in ns:
NOTICE:  0: Payload 00010001
NOTICE:  0: Decompressor pglz_decompress_hacked result 0.647235
NOTICE:  0: Decompressor pglz_decompress_hacked4 result 0.671029
NOTICE:  0: Decompressor pglz_decompress_hacked8 result 0.699949
NOTICE:  0: Decompressor pglz_decompress_hacked16 result 0.739586
NOTICE:  0: Decompressor pglz_decompress_hacked32 result 0.787926
NOTICE:  0: Decompressor pglz_decompress_vanilla result 1.147282
NOTICE:  0: Payload 00010006
NOTICE:  0: Decompressor pglz_decompress_hacked result 0.201774
NOTICE:  0: Decompressor pglz_decompress_hacked4 result 0.211859
NOTICE:  0: Decompressor pglz_decompress_hacked8 result 0.212610
NOTICE:  0: Decompressor pglz_decompress_hacked16 result 0.214601
NOTICE:  0: Decompressor pglz_decompress_hacked32 result 0.221813
NOTICE:  0: Decompressor pglz_decompress_vanilla result 0.706005
NOTICE:  0: Payload 00010008
NOTICE:  0: Decompressor pglz_decompress_hacked result 1.370132
NOTICE:  0: Decompressor pglz_decompress_hacked4 result 1.388991
NOTICE:  0: Decompressor pglz_decompress_hacked8 result 1.388502
NOTICE:  0: Decompressor pglz_decompress_hacked16 result 1.529455
NOTICE:  0: Decompressor pglz_decompress_hacked32 result 1.520813
NOTICE:  0: Decompressor pglz_decompress_vanilla result 1.433527
NOTICE:  0: Payload 16398
NOTICE:  0: Decompressor pglz_decompress_hacked result 0.606943
NOTICE:  0: Decompressor pglz_decompress_hacked4 result 0.623044
NOTICE:  0: Decompressor pglz_decompress_hacked8 result 0.624118
NOTICE:  0: Decompressor pglz_decompress_hacked16 result 0.620987
NOTICE:  0: Decompressor pglz_decompress_hacked32 result 0.621183
NOTICE:  0: Decompressor pglz_decompress_vanilla result 1.365318

Comment: pglz_decompress_hacked wins on every payload. In most cases it is 2x
faster than the current implementation.
On 00010008 it is only marginally better.
pglz_decompress_hacked8 is a few percent slower than pglz_decompress_hacked.

Desktop results:
NOTICE:  Time to decompress one byte in ns:
NOTICE:  Payload 00010001
NOTICE:  Decompressor pglz_decompress_hacked result 0.396454
NOTICE:  Decompressor pglz_decompress_hacked4 result 0.429249
NOTICE:  Decompressor pglz_decompress_hacked8 result 0.436413
NOTICE:  Decompressor pglz_decompress_hacked16 result 0.478077
NOTICE:  Decompressor pglz_decompress_hacked32 result 0.491488
NOTICE:  Decompressor pglz_decompress_vanilla result 0.695527
NOTICE:  Payload 00010006
NOTICE:  Decompressor pglz_decompress_hacked result 0.110710
NOTICE:  Decompressor pglz_decompress_hacked4 result 0.115669
NOTICE:  Decompressor pglz_decompress_hacked8 result 0.127637
NOTICE:  Decompressor 

Re: pglz performance

2019-05-13 Thread Michael Paquier
On Mon, May 13, 2019 at 07:45:59AM +0500, Andrey Borodin wrote:
> I was reviewing Paul Ramsey's TOAST patch[0] and noticed that there
> is a lot of room for improvement in the performance of pglz compression and
> decompression.

Yes, I believe so too.  pglz is a huge CPU-consumer when it comes to
compression compared to more modern algos like lz4.

> With Vladimir, we started to investigate ways to boost byte copying
> and eventually created a test suite[1] to measure the performance of
> compression and decompression.  This is an extension with a single
> function, test_pglz(), which performs tests across different: 
> 1. Data payloads
> 2. Compression implementations
> 3. Decompression implementations

Cool.  I got something rather similar in my wallet of plugins:
https://github.com/michaelpq/pg_plugins/tree/master/compress_test
This is something I worked on mainly for FPW compression in WAL.

> Currently we test mostly decompression improvements against two WALs
> and one data file taken from a pgbench-generated database. Any
> suggestions on more relevant data payloads are very welcome.

Text strings made of random data and variable length?  For any test of
this kind I think that it is good to focus on the performance of the
low-level calls, even going as far as a simple C wrapper on top of the
pglz APIs to test only the performance and not have extra PG-related
overhead like palloc() which can be a barrier.  Focusing on strings of
lengths from 1kB up to 16kB may be a reasonable range of sizes, and it is
important to keep the same uncompressed strings across runs for performance
comparison.
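
For instance, something along these lines would already be enough. This is just
a sketch under assumptions: it uses the pglz API exported from src/common as of
the PG 12 development tree (i.e. the five-argument pglz_decompress()), and
bench_pglz, elapsed_ns and the reporting format are made up for illustration.

#include "postgres_fe.h"
#include "common/pg_lzcompress.h"

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double
elapsed_ns(const struct timespec *a, const struct timespec *b)
{
    return (b->tv_sec - a->tv_sec) * 1e9 + (b->tv_nsec - a->tv_nsec);
}

/* Compress and decompress one fixed buffer many times, print ns/byte. */
static void
bench_pglz(const char *raw, int32 rawlen, int iters)
{
    char       *compressed = malloc(PGLZ_MAX_OUTPUT(rawlen));
    char       *restored = malloc(rawlen);
    int32       clen = -1;
    struct timespec t0, t1, t2, t3;
    int         i;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < iters; i++)
        clen = pglz_compress(raw, rawlen, compressed, PGLZ_strategy_always);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (clen < 0)
        printf("len %d: not compressible\n", rawlen);
    else
    {
        clock_gettime(CLOCK_MONOTONIC, &t2);
        for (i = 0; i < iters; i++)
            pglz_decompress(compressed, clen, restored, rawlen, true);
        clock_gettime(CLOCK_MONOTONIC, &t3);

        printf("len %d: %.2f ns/byte compress, %.2f ns/byte decompress\n",
               rawlen,
               elapsed_ns(&t0, &t1) / ((double) iters * rawlen),
               elapsed_ns(&t2, &t3) / ((double) iters * rawlen));
    }

    free(compressed);
    free(restored);
}

Feeding the same pre-generated buffers of, say, 1kB to 16kB through each variant
keeps the numbers comparable across runs.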

> My laptop tests show that our decompression implementation [2] can
> be from 15% to 50% faster.  Also I've noted that compression is
> extremely slow, ~30 times slower than decompression. I believe we
> can do something about it.

That's nice.

> We focus only on boosting the existing codec without any consideration
> of other compression algorithms.

There is this as well.  A couple of algorithms have a license
compatible with Postgres, but it may be simpler to just improve
pglz.  A 10%~20% improvement is something worth doing.

> The most important questions are:
> 1. What are the relevant data sets?
> 2. What are the relevant CPUs? I have only XEON-based servers and a few
> laptops/desktops with Intel CPUs.
> 3. If compression is 30 times slower, should we rather focus on
> compression instead of decompression?

Decompression can matter a lot for mostly-read workloads and
compression can become a bottleneck for heavy-insert loads, so
improving compression or decompression should be treated as two separate
problems, not linked together.  Any improvement in one or the
other, or even both, is nice to have.
--
Michael


signature.asc
Description: PGP signature


pglz performance

2019-05-12 Thread Andrey Borodin
Hi hackers!

I was reviewing Paul Ramsey's TOAST patch[0] and noticed that there is a lot of
room for improvement in the performance of pglz compression and decompression.

With Vladimir, we started to investigate ways to boost byte copying and
eventually created a test suite[1] to measure the performance of compression and
decompression.
This is an extension with a single function, test_pglz(), which performs tests
across different:
1. Data payloads
2. Compression implementations
3. Decompression implementations

Currently we test mostly decompression improvements against two WALs and one
data file taken from a pgbench-generated database. Any suggestions on more
relevant data payloads are very welcome.
My laptop tests show that our decompression implementation [2] can be from 15% 
to 50% faster.
Also I've noted that compression is extremely slow, ~30 times slower than 
decompression. I believe we can do something about it.

We focus only on boosting the existing codec without any consideration of other
compression algorithms.

Any comments are much appreciated.

The most important questions are:
1. What are the relevant data sets?
2. What are the relevant CPUs? I have only XEON-based servers and a few
laptops/desktops with Intel CPUs.
3. If compression is 30 times slower, should we rather focus on compression
instead of decompression?

Best regards, Andrey Borodin.


[0] 
https://www.postgresql.org/message-id/flat/CANP8%2BjKcGj-JYzEawS%2BCUZnfeGKq4T5LswcswMP4GUHeZEP1ag%40mail.gmail.com
[1] https://github.com/x4m/test_pglz
[2] 
https://www.postgresql.org/message-id/C2D8E5D5-3E83-469B-8751-1C7877C2A5F2%40yandex-team.ru