From: pgsql-hackers-ow...@postgresql.org
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Simon Riggs
Sent: Thursday, August 09, 2012 2:49 PM
On 9 August 2012 09:49, Amit Kapila <amit.kap...@huawei.com> wrote:

>>> I'd suggest we do this only when the saving is large enough for
>>> benefit, rather than do this every time.
>>   Do you mean to say: when the length of the updated values of the
>> tuple is less than some threshold (1/3 or 2/3, etc.) of the total
>> length?

> Some heuristic, yes, similar to TOAST's minimum threshold. To attempt
> removal of rows in all cases would not be worth it, so we need a fast
> path of saying let's just take all of the columns.

  Yes, it has to be done. Currently I have two ideas for deciding this:
  a. Based on the number of updated columns.
  b. Based on the length of the updated values.
  If you have any other idea, or favor one of the above, please let me
know your opinion. A rough sketch of idea (b) is below.
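
To make idea (b) concrete, here is a minimal sketch of a length-based
cutoff in the spirit of TOAST's minimum threshold. All names and numbers
(WAL_DELTA_MAX_FRACTION, WAL_DELTA_MIN_TUPLE_LEN, the 1/3 ratio) are
assumptions for illustration only, not values from the patch:

#include <stdbool.h>
#include <stddef.h>

/*
 * Illustrative thresholds -- the names, values, and the choice of
 * heuristic are assumptions for this sketch, not from the patch.
 * The idea mirrors TOAST's minimum threshold: skip the delta path
 * unless the expected saving is large enough.
 */
#define WAL_DELTA_MAX_FRACTION  3   /* changed bytes <= 1/3 of tuple */
#define WAL_DELTA_MIN_TUPLE_LEN 64  /* tiny tuples: take the fast path */

static bool
wal_update_delta_worthwhile(size_t changed_len, size_t tuple_len)
{
    /* Fast path: just log the whole tuple for small rows. */
    if (tuple_len < WAL_DELTA_MIN_TUPLE_LEN)
        return false;

    /* Encode a delta only when the changed portion is small enough. */
    return changed_len * WAL_DELTA_MAX_FRACTION <= tuple_len;
}

Idea (a) would look the same with a count of changed columns in place of
changed_len.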

>>> You don't mention whether or not the old and the new tuple are on the
>>> same data block.
>
>>   WAL reduction is done even when the old and new tuples are on
>> different data blocks.

> That makes me feel nervous. I doubt the marginal gain is worth it.
> Most updates don't cross blocks.

How can it be proved whether the gain from handling that case is
marginal or substantial?

One way is to test after the modification. I have modified the pgbench
tpc_b case as follows:
1. The schema is changed so that rows are 1800 bytes long.
2. The tpc_b script contains only updates.
3. The total length of the updated column values is 300 bytes.
4. All tables have a 100% fill factor.
5. Vacuum is off.

In such a run I would expect many of the updates to cross block
boundaries, although I am not sure and have not verified it in any way.
The run showed a good performance improvement.
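
For what it's worth, if the cross-block case ever proved not to be worth
handling, gating it off should be cheap. A sketch using the existing
ItemPointerGetBlockNumber() and HeapTuple's t_self field (the function
name is invented for this illustration, and it only compiles inside the
PostgreSQL source tree):

#include "postgres.h"
#include "access/htup.h"
#include "storage/itemptr.h"

/*
 * Sketch: decide whether the delta path is applicable, falling back
 * to full-tuple WAL when the update crosses blocks.  The function
 * name is invented; ItemPointerGetBlockNumber() and t_self are real.
 */
static bool
update_stays_on_same_block(HeapTuple oldtup, HeapTuple newtup)
{
    return ItemPointerGetBlockNumber(&oldtup->t_self) ==
           ItemPointerGetBlockNumber(&newtup->t_self);
}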



>>> Please also bear in mind that Andres will be looking to include the PK
>>> columns in every WAL record for BDR. That could be an option, but I
>>> doubt there is much value in excluding PK columns.
>
>>   Agreed. However, once the implementation by Andres is done, I can
>> merge both codes and take the performance data again, based on which
>> we can take a decision.

> It won't happen like that because there won't be a single point where
> Andres is done. If you agree, then it's worth doing it that way to
> begin with, rather than requiring us to revisit the same section of
> code twice.

This optimization is meant to reduce the amount of WAL, and adding
anything extra will certainly have some impact. However, if there is no
better way than including the PK in the WAL record, then I don't have
any problem with it. (A sketch of what such a record might carry is
below.)
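
To make the trade-off concrete, the extra payload would be something
like the following. This layout is purely an illustration -- every name
and field here is invented; it is neither this patch's WAL format nor
Andres's design:

#include <stdint.h>

/*
 * Invented, illustrative layout for a delta-encoded update record
 * that optionally carries the primary-key columns for logical
 * replication.  Not the actual WAL format of the patch or of BDR.
 */
typedef struct xl_heap_update_delta_sketch
{
    uint16_t    nchanged;       /* number of changed columns */
    uint16_t    changed_len;    /* total bytes of the changed values */
    uint16_t    pk_len;         /* bytes of PK column values; 0 if omitted */
    /*
     * On disk this would be followed by the changed column numbers,
     * the changed values, and then pk_len bytes of PK values -- the
     * "anything extra" whose WAL cost is being weighed above.
     */
} xl_heap_update_delta_sketch;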

> One huge point that needs to be thought through is how we prove this
> code actually works on WAL/recovery side. A normal regression test
> won't prove that and we don't have a framework in place for that.

My initial ideas for validating recovery:
1. Manual testing: a. Generate enough scenarios for the update operation.
                   b. For each scenario, make sure replay happens properly.
2. Community review.



With Regards,
Amit Kapila.


