On Sun, Dec 3, 2023 at 11:22 PM Tomas Vondra
<tomas.von...@enterprisedb.com> wrote:
>
> Thanks for the script. Are you also measuring the time it takes to
> decode this using test_decoding?
>
> FWIW I did more comprehensive suite of tests over the weekend, with a
> couple more variations. I'm attaching the updated scripts, running it
> should be as simple as
>
>   ./run.sh BRANCH TRANSACTIONS RUNS
>
> so perhaps
>
>   ./run.sh master 1000 3
>
> to do 3 runs with 1000 transactions per client. And it'll run a bunch of
> combinations hard-coded in the script, and write the timings into a CSV
> file (with "master" in each row).
>
> I did this on two machines (i5 with 4 cores, xeon with 16/32 cores). I
> did this with current master, the basic patch (without the 0002 part),
> and then with the optimized approach (single global hash table, see the
> 0004 part). That's what master / patched / optimized in the results is.
>
> Interestingly enough, the i5 handled this much faster, it seems to be
> better in single-core tasks. The xeon is still running, so the results
> for "optimized" only have one run (out of 3), but shouldn't change much.
>
> Attached is also a table summarizing this, and visualizing the timing
> change (vs. master) in the last couple columns. Green is "faster" than
> master (but we don't really expect that), and "red" means slower than
> master (the more red, the slower).
>
> There results are grouped by script (see the attached .tgz), with either
> 32 or 96 clients (which does affect the timing, but not between master
> and patch). Some executions have no pg_sleep() calls, some have 0.001
> wait (but that doesn't seem to make much difference).
>
> Overall, I'd group the results into about three groups:
>
> 1) good cases [nextval, nextval-40, nextval-abort]
>
> These are cases that slow down a bit, but the slowdown is mostly within
> reasonable bounds (we're making the decoding to do more stuff, so it'd
> be a bit silly to require that extra work to make no impact). And I do
> think this is reasonable, because this is pretty much an extreme / worst
> case behavior. People don't really do just nextval() calls, without
> doing anything else. Not to mention doing aborts for 100% transactions.
>
> So in practice this is going to be within noise (and in those cases the
> results even show speedup, which seems a bit surprising). It's somewhat
> dependent on CPU too - on xeon there's hardly any regression.
>
>
> 2) nextval-40-abort
>
> Here the slowdown is clear, but I'd argue it generally falls in the same
> group as (1). Yes, I'd be happier if it didn't behave like this, but if
> someone can show me a practical workload affected by this ...
>
>
> 3) irrelevant cases [all the alters taking insane amounts of time]
>
> I absolutely refuse to care about these extreme cases where decoding
> 100k transactions takes 5-10 minutes (on i5), or up to 30 minutes (on
> xeon). If this was a problem for some practical workload, we'd have
> already heard about it I guess. And even if there was such workload, it
> wouldn't be up to this patch to fix that. There's clearly something
> misbehaving in the snapshot builder.
>
>
> I was hopeful the global hash table would be an improvement, but that
> doesn't seem to be the case. I haven't done much profiling yet, but I'd
> guess most of the overhead is due to ReorderBufferQueueSequence()
> starting and aborting a transaction in the non-transactinal case. Which
> is unfortunate, but I don't know if there's a way to optimize that.
>
Before discussing the alternative ideas you shared, let me try to
clarify my understanding so that we are on the same page. Based on the
testing and discussion so far, I see two observations: (a) for
non-transactional cases, the overhead observed is mainly due to
starting/aborting a transaction for each change; (b) for transactional
cases, we see overhead due to traversing all the top-level txns and
checking the hash table for each one to find whether the change is
transactional. Am I missing something?

--
With Regards,
Amit Kapila.