On Sun, May 28, 2017 at 3:17 PM, Mark Kirkwood <
> On 28/05/17 19:01, Mark Kirkwood wrote:
>> So running in cloud land now...so far no errors - will update.
> The framework ran 600 tests last night, and I see 3 'NOK' results, i.e. 3
> failed test runs (all scale 25 and 8 pgbench clients). Given the way the
> test decides on failure (gets tired of waiting for the table md5's to
> match) - it begs the question 'What if it had waited a bit longer'? However
> from what I can see in all cases:
> - the rowcounts were the same in master and replica
> - the md5 of pgbench_accounts was different
All four tables should be wrong if there is still a transaction it is
waiting for, as all the changes happen in a single transaction.
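
To make that concrete, here is a rough sketch of a per-table comparison that reports which tables differ, rather than only that something differs. The helper names (table_md5, differing_tables) and the way rows are hashed are my own assumptions for illustration, not what the test framework actually does; rows are assumed to be plain comparable tuples already fetched from each side.

```python
import hashlib


def table_md5(rows):
    """Return an md5 hex digest of a table's rows, independent of row order.

    Hypothetical helper: rows are sorted first so that physical row order
    on master and replica does not affect the digest.
    """
    digest = hashlib.md5()
    for row in sorted(rows):
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()


def differing_tables(master, replica):
    """Return the sorted names of tables whose digests differ.

    master and replica map table name -> list of row tuples.
    A table missing on the replica is treated as empty.
    """
    return sorted(
        name
        for name, rows in master.items()
        if table_md5(rows) != table_md5(replica.get(name, []))
    )
```

If a whole transaction were merely late, I would expect this to list all four pgbench tables; a result naming only pgbench_accounts points at something else.
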
I also got a failure, after 87 iterations of a similar test case. It
waited for hours, since my script keeps waiting until I stop it manually. On
the subscriber, one account still had a zero balance. The history table on
the subscriber agreed with both history and accounts on the publisher, and
according to those the account should not have been zero, so transaction
atomicity definitely got busted.
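
The check behind that conclusion can be sketched roughly as follows. This assumes freshly initialized pgbench tables (all balances start at zero) and the default TPC-B-like script, where each transaction adds delta to one account's abalance and records the same delta in pgbench_history. The function name and the tuple layout are hypothetical; the rows would come from queries like "select aid, abalance from pgbench_accounts" and "select aid, delta from pgbench_history" on the subscriber.

```python
from collections import defaultdict


def find_inconsistent_accounts(accounts, history):
    """Find accounts whose balance disagrees with the history table.

    accounts: iterable of (aid, abalance) tuples.
    history:  iterable of (aid, delta) tuples.
    Returns a sorted list of (aid, abalance, expected_balance) for every
    account where abalance != sum of its history deltas.
    Assumes all balances started at zero (fresh pgbench -i).
    """
    expected = defaultdict(int)
    for aid, delta in history:
        expected[aid] += delta

    mismatches = []
    for aid, abalance in accounts:
        want = expected.get(aid, 0)
        if abalance != want:
            mismatches.append((aid, abalance, want))
    return sorted(mismatches)
```

A non-empty result on the subscriber, with an empty result on the publisher, means part of a transaction was applied without the rest.
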
I altered the script to also save the tellers and branches tables and
repeated the runs, but so far it hasn't failed again in over 800 iterations
using the altered script.
> ...so does seem possible that there is some bug being tickled here.
> Unfortunately the test framework blasts away the failed tables and
> subscription and continues on...I'm going to amend it to stop on failure so
> I can have a closer look at what happened.
What would you want to look at? Would saving the WAL from the master be