Hi Tomas! Thanks for benchmarking this!
> On 26 Nov 2019, at 14:43, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
>
> On Mon, Nov 25, 2019 at 05:29:40PM +0900, Michael Paquier wrote:
>> On Mon, Nov 25, 2019 at 01:21:27PM +0500, Andrey Borodin wrote:
>>> I think status Needs Review describes what is going on better. It's
>>> not like something is awaited from my side.
>>
>> Indeed. You are right so I have moved the patch instead, with "Needs
>> review". The patch status was actually incorrect in the CF app, as it
>> was marked as waiting on author.
>>
>> @Tomas: updated versions of the patches have been sent by Andrey.
>
> I've done benchmarks on the two last patches, using the data sets from
> the test_pglz repository [1], but using three simple queries:
>
> 1) prefix - first 100 bytes of the value
>
>    SELECT length(substr(value, 0, 100)) FROM t
>
> 2) infix - 100 bytes from the middle
>
>    SELECT length(substr(value, test_length/2, 100)) FROM t
>
> 3) suffix - last 100 bytes
>
>    SELECT length(substr(value, test_length - 100, 100)) FROM t
>
> See the two attached scripts implementing this benchmark. The test
> itself did 60-second pgbench runs (single client) measuring tps on two
> different machines.
>
> patch 1: v4-0001-Use-memcpy-in-pglz-decompression.patch
> patch 2: v4-0001-Use-memcpy-in-pglz-decompression-for-long-matches.patch
>
> The results (compared to master) from the first machine (i5-2500k CPU)
> look like this:
>
>                                   patch 1            |         patch 2
> dataset                    prefix   infix  suffix  |  prefix   infix  suffix
> ----------------------------------------------------------------------------
> 000000010000000000000001      99%    134%    161%  |    100%    126%    152%
> 000000010000000000000006      99%    260%    287%  |    100%    257%    279%
> 000000010000000000000008     100%    100%    100%  |    100%     95%     91%
> 16398                        100%    168%    221%  |    100%    159%    215%
> shakespeare.txt              100%    138%    141%  |    100%    116%    117%
> mr                            99%    120%    128%  |    100%    107%    108%
> dickens                      100%    129%    132%  |    100%    100%    100%
> mozilla                      100%    119%    120%  |    100%    102%    104%
> nci                          100%    149%    141%  |    100%    143%    135%
> ooffice                       99%    121%    123%  |    100%     97%     98%
> osdb                         100%     99%     99%  |    100%    100%     99%
> reymont                       99%    130%    132%  |    100%    106%    107%
> samba                        100%    126%    132%  |    100%    105%    111%
> sao                          100%    100%     99%  |    100%    100%    100%
> webster                      100%    127%    127%  |    100%    106%    106%
> x-ray                         99%     99%     99%  |    100%    100%    100%
> xml                          100%    144%    144%  |    100%    130%    128%
>
> and on the other one (xeon e5-2620v4) they look like this:
>
>                                   patch 1            |         patch 2
> dataset                    prefix   infix  suffix  |  prefix   infix  suffix
> ----------------------------------------------------------------------------
> 000000010000000000000001      98%    147%    170%  |     98%    132%    159%
> 000000010000000000000006     100%    340%    314%  |     98%    334%    355%
> 000000010000000000000008      99%    100%    105%  |     99%     99%    101%
> 16398                        101%    153%    205%  |     99%    148%    201%
> shakespeare.txt              100%    147%    149%  |     99%    117%    118%
> mr                           100%    131%    139%  |     99%    112%    108%
> dickens                      100%    143%    143%  |     99%    103%    102%
> mozilla                      100%    122%    122%  |     99%    105%    106%
> nci                          100%    151%    135%  |    100%    135%    125%
> ooffice                       99%    127%    129%  |     98%    101%    102%
> osdb                         102%    100%    101%  |    102%    100%     99%
> reymont                      101%    142%    143%  |    100%    108%    108%
> samba                        100%    132%    136%  |     99%    109%    112%
> sao                           99%    101%    100%  |     99%    100%    100%
> webster                      100%    132%    129%  |    100%    106%    106%
> x-ray                         99%    101%    100%  |     90%    101%    101%
> xml                          100%    147%    148%  |    100%    127%    125%
>
> In general, I think the results for both patches are clearly a win, but
> maybe patch 1 is a bit better, especially on the newer (xeon) CPU. So I'd
> probably go with that one.

From my POV, there are two interesting new points in your benchmarks:

1. They are more or less end-to-end benchmarks, with the whole system involved.
2. They provide a per-payload breakdown.

The prefix experiment is mostly about reading from the page cache and is not
directly connected with decompression. It's a bit strange that we observe a 1%
degradation in certain experiments, but I believe it's noise.

The infix and suffix results are correlated. We observe no impact of the patch
on incompressible data.

test_pglz also includes slicing into 2kB and 8kB chunks; this was done to
imitate toasting. But as far as I understand, in your test the payload will be
inserted into a TOAST table too, won't it?

If so, I agree that patch 1 looks like the better option.

> On 27 Nov 2019, at 01:05, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
>
> Code-wise I think the patches are mostly fine, although the comments
> might need some proof-reading.
>
> 1) I wasn't really sure what a "nibble" is, but maybe it's just me and
> it's a well-known term.

I took the word from the pg_lzcompress.c comments:

 * The offset is in the upper nibble of T1 and in T2.
 * The length is in the lower nibble of T1.
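For reference, decoding those two tag bytes looks roughly like this (a sketch
modeled on pglz_decompress() in src/common/pg_lzcompress.c; the helper
function decode_match_tag and its signature are mine, for illustration only):

    #include <stdint.h>

    /*
     * T1 = sp[0], T2 = sp[1].  The lower nibble of T1 holds the match
     * length (biased by 3; a decoded length of 18 means one more byte
     * extends it), while the upper nibble of T1 together with all of T2
     * forms the 12-bit offset.
     */
    static const unsigned char *
    decode_match_tag(const unsigned char *sp, int32_t *len, int32_t *off)
    {
        *len = (sp[0] & 0x0f) + 3;            /* lower nibble of T1 */
        *off = ((sp[0] & 0xf0) << 4) | sp[1]; /* upper nibble of T1, plus T2 */
        sp += 2;
        if (*len == 18)                       /* nibble saturated: extension byte */
            *len += *sp++;
        return sp;
    }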
> 2) First byte use lower -> First byte uses lower
>
> 3) nibble contain upper -> nibble contains upper
>
> 4) to preven possible uncertanity -> to prevent possible uncertainty
>
> 5) I think we should briefly explain why memmove would be incompatible
> with pglz, it's not quite clear to me.

Here's the example from the patch comment:

+ * Consider input: 112341234123412341234
+ * At byte 5 here  ^ we have match with length 16 and
+ * offset 4.

That is, the compressed form is 11234M(len=16, off=4). If we simply memmove()
these 16 bytes, we will produce 112341234XXXXXXXXXXXX, where the series of X
is 12 undefined bytes that were sitting at bytes [6:18] before the copy. The
byte-by-byte loop gets this right, because each byte it writes immediately
becomes valid input for the following iterations. (A small standalone demo of
this is in the P.S. below.)

> 6) I'm pretty sure the comment in the 'while (off < len)' branch will be
> badly mangled by pgindent.

I think I can just write it without the line limit and then run pgindent. Will
try to do it this evening. Also, I will try to write more about memmove.

> 7) The last change moving "copy" to the next line seems unnecessary.

Oh, it looks like I had been rewording this comment and eventually arrived at
the same text. Yes, this change is absolutely unnecessary.

Thanks!

Best regards, Andrey Borodin.
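P.S. For illustration, here is a minimal standalone program (plain C, not from
the patch; the buffer names and the '?' placeholders standing in for
not-yet-written bytes are mine) showing why the overlapping match must be
expanded byte by byte:

    #include <stdio.h>
    #include <string.h>

    int
    main(void)
    {
        char    loop_buf[32];
        char    move_buf[32];
        char   *dp;
        int     off = 4;
        int     len = 16;

        /* Both buffers start with the 5 literal bytes "11234"; '?' marks
         * bytes the decompressor has not produced yet. */
        memset(loop_buf, '?', sizeof(loop_buf));
        memset(move_buf, '?', sizeof(move_buf));
        memcpy(loop_buf, "11234", 5);
        memcpy(move_buf, "11234", 5);

        /* Byte-at-a-time expansion, as pglz_decompress() does: every byte
         * written becomes a valid source for later iterations, so the
         * 4-byte pattern "1234" is replicated across the whole match. */
        dp = loop_buf + 5;
        while (len--)
        {
            *dp = dp[-off];
            dp++;
        }

        /* memmove() reads the source range as it is *before* the copy;
         * 12 of its 16 source bytes have not been produced yet. */
        memmove(move_buf + 5, move_buf + 1, 16);

        printf("byte-by-byte: %.21s\n", loop_buf); /* 112341234123412341234 */
        printf("memmove:      %.21s\n", move_buf); /* 112341234???????????? */
        return 0;
    }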