Hello,

At Fri, 2 Feb 2018 19:52:02 -0300, Claudio Freire <klaussfre...@gmail.com> 
wrote in <cagtbqpainqsnjc8y4w82ubtapsvsqrrg++yei5wre1mfe2i...@mail.gmail.com>
> On Thu, Jan 25, 2018 at 6:21 PM, Thomas Munro
> <thomas.mu...@enterprisedb.com> wrote:
> > On Fri, Jan 26, 2018 at 9:38 AM, Claudio Freire <klaussfre...@gmail.com> 
> > wrote:
> >> I had the tests running in a loop all day long, and I cannot reproduce
> >> that variance.
> >>
> >> Can you share your steps to reproduce it, including configure flags?
> >
> > Here are two build logs where it failed:
> >
> > https://travis-ci.org/postgresql-cfbot/postgresql/builds/332968819
> > https://travis-ci.org/postgresql-cfbot/postgresql/builds/332592511
> >
> > Here's one where it succeeded:
> >
> > https://travis-ci.org/postgresql-cfbot/postgresql/builds/333139855
> >
> > The full build script used is:
> >
> > ./configure --enable-debug --enable-cassert --enable-coverage
> > --enable-tap-tests --with-tcl --with-python --with-perl --with-ldap
> > --with-icu && make -j4 all contrib docs && make -Otarget -j3
> > check-world
> >
> > This is a virtualised 4 core system.  I wonder if "make -Otarget -j3
> > check-world" creates enough load on it to produce some weird timing
> > effect that you don't see on your development system.
> 
> I can't reproduce it, not even with the same build script.

I got the same error with "make -j3 check-world", but only twice
in many trials.

> It's starting to look like a timing effect indeed.

It seems that the truncation is being skipped, maybe because of a
concurrent autovacuum. See lazy_truncate_heap() for details. Updates
of pg_stat_*_tables can be delayed, so checking them can also fail.
Although I haven't looked at the patch closely, the "SELECT
pg_relation_size()" doesn't seem to give anything meaningful
anyway.
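
To illustrate what I mean by a truncation skip, here is a toy model
in Python (not PostgreSQL code). It assumes, from my reading of
lazy_truncate_heap(), that vacuum takes the lock conditionally,
retries at a fixed interval, and gives up on truncation once a
timeout has passed. The name try_truncate and the 5000 ms / 50 ms
defaults are assumptions made for illustration only.

```python
import threading
import time


def try_truncate(lock, timeout_ms=5000, interval_ms=50):
    """Toy model of a conditional-lock retry loop.

    Returns True if the (simulated) truncation ran, or False if it was
    skipped because the lock could not be taken within the timeout.
    The default timings are assumed values, not taken from the patch.
    """
    max_retries = timeout_ms // interval_ms
    retries = 0
    # Never block on the lock: poll it, sleeping between attempts.
    while not lock.acquire(blocking=False):
        retries += 1
        if retries > max_retries:
            # Give up silently; the relation keeps its old size.
            return False
        time.sleep(interval_ms / 1000.0)
    try:
        # The real truncation would happen here, under the lock.
        return True
    finally:
        lock.release()


if __name__ == "__main__":
    free = threading.Lock()
    print("uncontended:", try_truncate(free))

    busy = threading.Lock()
    busy.acquire()  # stands in for a conflicting lock held elsewhere
    print("contended:", try_truncate(busy, timeout_ms=200, interval_ms=50))
```

If something holds a conflicting lock for the whole retry window, the
function returns False without raising any error. In the regression
test, that would show up only as a relation size that did not shrink.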

> I get a similar effect if there's an active snapshot in another
> session while vacuum runs. I don't know how the test suite ends up in
> that situation, but it seems to be the case.
> 
> How do you suggest we go about fixing this? The test in question is
> important, I've caught actual bugs in the implementation with it,
> because it checks that vacuum effectively frees up space.
> 
> I'm thinking this vacuum test could be put on its own parallel group
> perhaps? Since I can't reproduce it, I can't know whether that will
> fix it, but it seems sensible.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

