On Tue, Dec 23, 2014 at 11:55 AM, Peter Geoghegan <p...@heroku.com> wrote:

> On Thu, Dec 18, 2014 at 9:20 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> > I've put this through an adaptation of my usual torture test, and it ran
> > fine until wraparound shutdown.  I'll poke at it more later.
>
> Could you elaborate, please? What are the details of the torture test
> you're performing?
>

I've uploaded it here.

https://drive.google.com/folderview?id=0Bzqrh1SO9FcEZ3plX0l5RWNXd00&usp=sharing

The gist of it is that I increment a count column of a random row (via pk)
in multiple connections simultaneously.

When the server crashes, or a set number of increments is reached, the
threads report their activity up to the parent, which then waits for
automatic crash recovery to finish and compares the state of the database
to the state reported by the child threads.
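
For illustration, the basic per-connection work is roughly the following
(a sketch only; the table and column names here are hypothetical, and the
real schema is in the uploaded driver):

    -- Hypothetical schema; the actual definitions are in the upload.
    CREATE TABLE foo (id int PRIMARY KEY, count int NOT NULL);

    -- Each connection repeatedly runs the equivalent of this, with a
    -- freshly chosen random id each time:
    UPDATE foo SET count = count + 1 WHERE id = :random_id;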

That is for my original code.  For this purpose, I made the count go either
up or down randomly; when a row's count passes through zero it gets
deleted, and when that row is chosen for increment/decrement again it has
to be re-inserted.  I've made this happen either through an
update-or-insert-or-retry loop (two variants) or by using your new syntax.
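
Roughly, the two approaches look like this (again a sketch with
hypothetical names, and I'm writing the new syntax in the shape I
understand it to take, so the exact spelling may not match your current
patch revision):

    -- Retry-loop variant: UPDATE, then INSERT if the row is missing.
    -- A concurrent insert can still win the race and raise
    -- unique_violation, hence the loop.
    CREATE FUNCTION bump(k int, delta int) RETURNS void AS $$
    BEGIN
        LOOP
            UPDATE foo SET count = count + delta WHERE id = k;
            IF found THEN
                RETURN;
            END IF;
            BEGIN
                INSERT INTO foo (id, count) VALUES (k, delta);
                RETURN;
            EXCEPTION WHEN unique_violation THEN
                -- someone inserted the row first; loop back and UPDATE
            END;
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;

    -- With the new syntax, a single statement does the same job
    -- (k and delta as above):
    INSERT INTO foo (id, count) VALUES (k, delta)
    ON CONFLICT (id) DO UPDATE SET count = foo.count + EXCLUDED.count;

(The deletion when a count passes through zero is a separate DELETE in
either case.)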

The upload also includes a patch that simulates a torn page write followed
by a crash, and that adds some elogs I've sometimes found useful for
tracking down problems, with new GUCs to control them.

I don't think you made changes to the WAL/recovery routines, so I don't
expect crash recovery to be a big hazard for your patch, but I wanted to
run a test whose framework I was generally familiar with, and I thought an
independently derived test might exercise some new aspects.

The one thing I noticed is that using your syntax starts out slightly
slower than the retry loop, but then gets much slower (by a factor of 2 or
3) after a while.  It might be a vacuuming issue.  The constant intentional
crashes interfere with good vacuuming behavior, and I need to retest with
the intentional crashes turned off to see if that fixes it.  I'm having
difficulty getting access to my usual testing hardware over the holidays,
so I'm not getting as much done as I hoped.

I'll try to look at your own stress tests on GitHub as well.

Cheers,

Jeff
