On 06/02/14 16:59, Robert Haas wrote:
On Wed, Feb 5, 2014 at 6:43 AM, Heikki Linnakangas
<hlinnakan...@vmware.com> wrote:
So, I came up with the attached worst case test, modified from your latest
test suite.


               testname                | wal_generated |     duration
---------------------------------------+---------------+------------------
  ten long fields, all but one changed |     343385312 | 2.20806908607483
  ten long fields, all but one changed |     336263592 | 2.18997097015381
  ten long fields, all but one changed |     336264504 | 2.17843413352966
(3 rows)


               testname                | wal_generated |     duration
---------------------------------------+---------------+------------------
  ten long fields, all but one changed |     338356944 | 3.33501315116882
  ten long fields, all but one changed |     344059272 | 3.37364101409912
  ten long fields, all but one changed |     336257840 | 3.36244201660156
(3 rows)

So with this test, the overhead is very significant.

Yuck.  Well, that sucks.

With the skipping logic, another kind of "worst case" case is that you have
a lot of similarity between the old and new tuple, but you miss it because
you skip. For example, if you change the first few columns, but leave a
large text column at the end of the tuple unchanged.
I suspect there's no way to have our cake and eat it, too.  Most of
the work that Amit has done on this patch in the last few revs is to
cut back CPU overhead in the cases where the patch can't help because
the tuple has been radically modified.  If we're trying to get maximum
compression, we need to go the other way: for example, we could just
feed both the old and new tuples through pglz (or snappy, or
whatever).  That would allow us to take advantage not only of
similarity between the old and new tuples but also internal
duplication within either the old or the new tuple, but it would also
cost more CPU.  The concern with minimizing overhead in cases where
the compression doesn't help has thus far pushed us in the opposite
direction, namely passing over compression opportunities that a more
aggressive algorithm could find in order to keep the overhead low.
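As a rough illustration of that more aggressive direction, a delta encoder can seed its history with the old tuple and let it grow as new-tuple bytes are emitted, so matches capture both cross-tuple similarity and internal duplication. This is only a sketch with invented names and a naive quadratic match search, not pglz or the patch's actual code:

```c
#include <stdlib.h>
#include <string.h>

/* Result of a hypothetical delta pass: how many copy instructions and
 * how many literal bytes were emitted (illustrative only). */
typedef struct { int copies; int literals; } DeltaStats;

/* Shortest match worth encoding as a copy; 4 is an arbitrary choice. */
#define MIN_MATCH 4

static DeltaStats
delta_encode(const char *oldp, int oldlen, const char *newp, int newlen)
{
    DeltaStats st = {0, 0};
    /* History starts as the old tuple and grows as new-tuple bytes are
     * emitted, so a match can hit either cross-tuple similarity or
     * duplication within the new tuple itself. */
    char *hist = malloc(oldlen + newlen);
    int   histlen = oldlen;

    memcpy(hist, oldp, oldlen);

    for (int pos = 0; pos < newlen; )
    {
        int bestlen = 0;

        /* Naive O(n^2) longest-match search over the whole history. */
        for (int start = 0; start < histlen; start++)
        {
            int len = 0;

            while (start + len < histlen && pos + len < newlen &&
                   hist[start + len] == newp[pos + len])
                len++;
            if (len > bestlen)
                bestlen = len;
        }

        if (bestlen >= MIN_MATCH)
        {
            st.copies++;                    /* emit a copy instruction */
            memcpy(hist + histlen, newp + pos, bestlen);
            histlen += bestlen;
            pos += bestlen;
        }
        else
        {
            st.literals++;                  /* emit one literal byte */
            hist[histlen++] = newp[pos++];
        }
    }

    free(hist);
    return st;
}
```

The extra CPU cost is visible in the inner loop: every new-tuple byte pays for a search over an ever-growing history, which is exactly the trade-off against the cheap-but-blind skipping approach.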

Off-hand, I'm wondering why we shouldn't apply the same skipping
algorithm that Amit is using at the beginning of the string for the
rest of it as well.  It might be a little too aggressive (maybe the
skip distance shouldn't increase by quite as much as doubling every
time, or not beyond 16/32 bytes?) but I don't see why the general
principle isn't sound wherever we are in the tuple.
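For illustration, the skip-doubling idea might look something like this; the function name, the reset-to-1 policy, and the 32-byte cap are assumptions for the sketch, not the patch's actual code:

```c
#include <stddef.h>

/* Cap on how far apart probes get in a dissimilar region (assumed). */
#define MAX_SKIP 32

/*
 * Sketch: count matching bytes between old and new data, probing every
 * byte while the data is similar but doubling the skip distance (up to
 * MAX_SKIP) while it is not.  A real encoder would emit copy/literal
 * instructions instead of just counting.
 */
static size_t
skipping_match(const char *oldp, const char *newp, size_t len)
{
    size_t matched = 0;
    size_t i = 0;
    size_t skip = 1;

    while (i < len)
    {
        if (oldp[i] == newp[i])
        {
            matched++;
            skip = 1;           /* found similarity: probe densely again */
            i++;
        }
        else
        {
            i += skip;          /* dissimilar region: accelerate past it */
            if (skip < MAX_SKIP)
                skip *= 2;
        }
    }
    return matched;
}
```

On a radically modified tuple the probe distance quickly reaches the cap, so the scan touches only a fraction of the bytes; the cost is that a matching run sitting just past a skipped region (like an unchanged text column at the end) can be overshot.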

Unfortunately, despite changing things to make a history entry only
every 4th character, building the history is still pretty expensive.
By the time we even begin looking at the tuple we're gonna compress,
we've already spent something like half the total effort, and of
course we have to go further than that before we know whether our
attempt to compress is actually going anywhere.  I think that's the
central problem here.  pglz has several safeguards to ensure that it
doesn't do too much work in vain: we abort if we find nothing
compressible within first_success_by bytes, or if we emit enough total
output to be certain that we won't meet the need_rate threshold.
Those safeguards are a lot less effective here because they can't be
applied until *after* we've already paid the cost of building the
history.  If we could figure out some way to apply those guards, or
other guards, earlier in the algorithm, we could do a better job
mitigating the worst-case scenarios, but I don't have a good idea.
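In pglz terms, the two safeguards sit inside the main compression loop, which is why they only help after the history already exists. A minimal sketch, with hypothetical names and the match search elided (every byte is copied as a literal, so this only shows where the guards fire):

```c
#include <stdbool.h>

/*
 * Sketch of pglz-style bail-out guards.  "first_success_by" aborts if
 * nothing compressible appears within the first N source bytes;
 * "need_rate" aborts once the output is already too large to achieve
 * the required percentage of savings.
 */
static bool
compress_with_guards(const unsigned char *src, int slen,
                     unsigned char *dst, int *dlen,
                     int first_success_by, int need_rate)
{
    int  in = 0;
    int  out = 0;
    bool found_match = false;
    /* largest output that still saves need_rate percent of the input */
    int  result_max = slen - slen * need_rate / 100;

    while (in < slen)
    {
        /* Guard 1: no match found early enough -> give up. */
        if (!found_match && in >= first_success_by)
            return false;

        /* Guard 2: output provably too big to meet need_rate -> give up. */
        if (out >= result_max)
            return false;

        /* (real code would search the history for a match here) */
        dst[out++] = src[in++];     /* emit a literal byte */
    }

    *dlen = out;
    return found_match;
}
```

Both guards trigger only while consuming source bytes, i.e. after the up-front history build has already been paid for; an earlier guard would have to predict compressibility before, or while, the history is constructed.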

Surely the weighting should be done according to the relative scarcity
of processing power vs. I/O bandwidth? I get the impression that
different workloads and hardware configurations may favour conserving
one of processor or I/O resources. Would it be feasible to have
different logic, depending on the trade-offs identified?

