On Wed, Feb 12, 2014 at 8:19 PM, Bruce Momjian <br...@momjian.us> wrote:
> On Wed, Feb 12, 2014 at 10:02:32AM +0530, Amit Kapila wrote:
>
> I think 99.9% of users are never going to adjust this so we had better
> choose something we are happy to enable for effectively everyone.  In my
> reading, prefix/suffix seemed safe for everyone.  We can always revisit
> this if we think of something better later, as WAL format changes are not
> a problem for pg_upgrade.

  Agreed.

> I also think making it user-tunable is hard enough for users to know when
> to adjust that it is almost not worth the user-interface complexity it
> adds.
>
> I suggest we go with always-on prefix/suffix mode, then add some check
> so the worst case is avoided by just giving up on compression.
>
> As I said previously, I think compressing the page images is the next
> big win in this area.
>
>> I think here one might argue that for some users it is not feasible to
>> decide whether their tuple data for UPDATEs is going to be similar
>> or completely different, and they are not at all ready for any risk of
>> CPU overhead, but they would be happy to see the I/O reduction, in
>> which case it is difficult to decide the value of a table-level switch.
>> Here I think the only answer is that "nothing is free" in this world,
>> so either make sure of the application's UPDATE behaviour before going
>> to production, or just don't enable the switch and be happy with the
>> current behaviour.
>
> Again, can't we do a minimal attempt at prefix/suffix compression so
> there is no measurable overhead?

Yes, currently it is set at 25%, which means there must be at least a 25%
prefix/suffix match before we consider the tuple for compression at all;
that check is pretty fast and has almost no overhead. The worst case,
however, is the other way around, i.e. when the string has a 25%
prefix/suffix match but no match after that, or at least none in the next
few bytes.
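
To make the threshold concrete, below is a minimal sketch of such a
pre-check (hypothetical code, not the actual patch; all names are made
up): count the shared leading and trailing bytes and attempt delta
compression only when they cover at least 25% of the new tuple.

#include <stdbool.h>
#include <stddef.h>

static bool
worth_delta_encoding(const char *old_tup, size_t old_len,
                     const char *new_tup, size_t new_len)
{
    size_t prefix = 0;
    size_t suffix = 0;
    size_t max_common = (old_len < new_len) ? old_len : new_len;

    /* Count bytes that match from the front... */
    while (prefix < max_common && old_tup[prefix] == new_tup[prefix])
        prefix++;

    /* ...and from the back, without overlapping the prefix. */
    while (suffix < max_common - prefix &&
           old_tup[old_len - 1 - suffix] == new_tup[new_len - 1 - suffix])
        suffix++;

    /* Attempt compression only when at least 25% of the new tuple
     * is covered by the prefix and suffix matches. */
    return (prefix + suffix) >= new_len / 4;
}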

For example, consider the below case:

Case-1

old tuple
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

new tuple
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbaaaaaaaaaaaaaaaaaaaaaaaaa

Here there is a suffix match for 25% of the string, but no match after
that, so we have to copy the remaining 75% of the bytes as-is, byte by
byte (see the snippet below).
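
As a self-contained illustration (again hypothetical code, not from the
patch), the snippet below builds tuples shaped like the ones above, scaled
to exactly 100 bytes so byte counts equal percentages, and shows that the
suffix match is exactly 25% even though the remaining 75 bytes would all
have to be emitted as literals:

#include <stdio.h>
#include <string.h>

int
main(void)
{
    char old_tup[100];
    char new_tup[100];
    size_t suffix = 0;

    memset(old_tup, 'a', 100);       /* 100 bytes of 'a' */
    memset(new_tup, 'b', 75);        /* 75 bytes of 'b'... */
    memset(new_tup + 75, 'a', 25);   /* ...then 25 bytes of 'a' */

    /* The prefix match is 0 bytes; count the suffix match. */
    while (suffix < 100 && old_tup[99 - suffix] == new_tup[99 - suffix])
        suffix++;

    printf("suffix match: %zu bytes (%zu%%)\n", suffix, suffix);
    /* Prints 25: the threshold passes, yet the other 75 bytes get
     * copied into the WAL record one literal at a time. */
    return 0;
}
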
Now, with somewhat longer tuples (800 bytes), the performance data I
collected shows around ~11% CPU overhead. As this is a fabricated test
intended only to measure how much extra CPU the worst case consumes, users
might not see this in practice, at least with synchronous commit on,
because there is always some I/O at the end of a transaction (unless an
error occurs midway or the user rolls back the transaction, the chances of
which are low).


The first thing that comes to mind after seeing the above scenario is: why
not raise the minimum limit above 25%, since comparing the prefix and
suffix has almost negligible overhead? I tried increasing it to 35% or
more, but then we start losing from the other side, e.g. in cases where
there is a 34% match we still give up on compression.

One improvement that could be made here is that, after the prefix/suffix
match, instead of copying byte by byte as the LZ format requires, we could
copy the entire remaining part of the tuple directly. I think that would
require a format other than LZ, which is not too difficult to do, but the
question is whether we really need such a change to handle the above kind
of worst case.
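
Purely as a sketch of what such a non-LZ delta format might look like (the
header layout and all names below are invented for illustration), the idea
is to record the prefix and suffix match lengths once and then copy the
unmatched middle of the new tuple with a single memcpy instead of emitting
it byte by byte under LZ control bits:

#include <stdint.h>
#include <string.h>

typedef struct DeltaHeader
{
    uint32_t prefix_len;    /* bytes shared with start of old tuple */
    uint32_t suffix_len;    /* bytes shared with end of old tuple */
    uint32_t literal_len;   /* bytes stored verbatim after header */
} DeltaHeader;

/* Encode new_tup into dest given the match lengths already computed
 * against the old tuple; returns the number of bytes written. */
static size_t
delta_encode(char *dest, const char *new_tup, size_t new_len,
             size_t prefix_len, size_t suffix_len)
{
    DeltaHeader hdr = {
        .prefix_len = (uint32_t) prefix_len,
        .suffix_len = (uint32_t) suffix_len,
        .literal_len = (uint32_t) (new_len - prefix_len - suffix_len),
    };

    memcpy(dest, &hdr, sizeof(hdr));
    /* One bulk copy for the unmatched middle, not a per-byte loop. */
    memcpy(dest + sizeof(hdr), new_tup + prefix_len, hdr.literal_len);
    return sizeof(hdr) + hdr.literal_len;
}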


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


