On Thu, Jan 25, 2018 at 6:21 PM, Thomas Munro
<thomas.mu...@enterprisedb.com> wrote:
> On Fri, Jan 26, 2018 at 9:38 AM, Claudio Freire <klaussfre...@gmail.com> 
> wrote:
>> I had the tests running in a loop all day long, and I cannot reproduce
>> that variance.
>> Can you share your steps to reproduce it, including configure flags?
> Here are two build logs where it failed:
> https://travis-ci.org/postgresql-cfbot/postgresql/builds/332968819
> https://travis-ci.org/postgresql-cfbot/postgresql/builds/332592511
> Here's one where it succeeded:
> https://travis-ci.org/postgresql-cfbot/postgresql/builds/333139855
> The full build script used is:
> ./configure --enable-debug --enable-cassert --enable-coverage
> --enable-tap-tests --with-tcl --with-python --with-perl --with-ldap
> --with-icu && make -j4 all contrib docs && make -Otarget -j3
> check-world
> This is a virtualised 4 core system.  I wonder if "make -Otarget -j3
> check-world" creates enough load on it to produce some weird timing
> effect that you don't see on your development system.

I can't reproduce it, not even with the same build script.

It's starting to look like a timing effect indeed.

I get a similar effect if there's an active snapshot in another
session while vacuum runs. I don't know how the test suite ends up in
that situation, but it seems to be the case.
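
For reference, something along these lines shows the kind of interference
I mean. It is only a sketch, assuming two psql sessions against the same
database; snapshot_demo is a hypothetical scratch table, not one used by
the regression suite, and this is not necessarily what happens there.

```sql
-- Session A: open a transaction, take a snapshot, and leave it idle.
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT 1;

-- Session B: create dead tuples and try to reclaim them while A is open.
-- (snapshot_demo is a hypothetical table name used only for illustration.)
CREATE TABLE snapshot_demo (i int);
INSERT INTO snapshot_demo SELECT generate_series(1, 100000);
DELETE FROM snapshot_demo;
VACUUM snapshot_demo;
-- Compare this size with the one obtained when session A is not open.
SELECT pg_relation_size('snapshot_demo');

-- Session A: release the snapshot so a later VACUUM in B can free the space.
COMMIT;
```

While session A's snapshot is open, the deleted rows are still potentially
visible to it, so vacuum cannot remove them or truncate the table.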

How do you suggest we go about fixing this? The test in question is
important: I've caught actual bugs in the implementation with it,
because it checks that vacuum actually frees up space.

I'm thinking this vacuum test could be put in its own parallel group,
perhaps? Since I can't reproduce the failure, I can't know whether that
will fix it, but it seems sensible.
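
Roughly what I have in mind, as a sketch only: it assumes the test in
question is the regression test named vacuum and that the change goes in
src/test/regress/parallel_schedule. The other test names below are
placeholders, not the real contents of the file.

```
# Existing group, minus vacuum (test names here are placeholders).
test: some_test another_test yet_another_test

# vacuum alone on its own line, so no other test in this schedule
# runs concurrently with it.
test: vacuum
```

Each "test:" line in that file is one parallel group, so a line with a
single name runs that test with nothing else from the schedule alongside it.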

