On Thu, Oct 9, 2014 at 7:31 PM, Andres Freund <and...@2ndquadrant.com> wrote:
>
> On 2014-10-09 18:17:09 +0530, Amit Kapila wrote:
> > On Fri, Sep 26, 2014 at 7:04 PM, Robert Haas <robertmh...@gmail.com> wrote:
> > >
> > > On another point, I think it would be a good idea to rebase the
> > > bgreclaimer patch over what I committed, so that we have a
> > > clean patch against master to test with.
> >
> > Please find the rebased patch attached with this mail. I have taken
> > some performance data as well and done some analysis based on
> > the same.
> >
> > Performance Data
> > ----------------------------
> > IBM POWER-8, 24 cores, 192 hardware threads
> > RAM = 492GB
> > max_connections = 300
> > Database Locale = C
> > checkpoint_segments = 256
> > checkpoint_timeout = 15min
> > shared_buffers = 8GB
> > scale factor = 5000
> > Client Count = number of concurrent sessions and threads (ex. -c 8 -j 8)
> > Duration of each individual run = 5mins
>
> I don't think OLTP really is the best test case for this. Especially not
> pgbench with relatively small rows *and* a uniform distribution of
> access.
>
> Try parallel COPY TO. Batch write loads are where I've seen this hurt
> badly.
>
> > patch_ver/client_count      1       8      32      64     128     256
> > HEAD                    18884  118628  251093  216294  186625  177505
> > PATCH                   18743  122578  247243  205521  179712  175031
>
> So, pretty much no benefits on any scale, right?
Almost right; there does seem to be a slight benefit at client count 8,
but that could be run-to-run variation as well.

> > Here we can see that the performance dips at higher client
> > count (>=32), which was quite surprising for me, as I was expecting
> > it to improve, because bgreclaimer reduces the contention by making
> > buffers available on the free list. So I tried to analyze the situation
> > using perf and found that in the above configuration there is contention
> > around the freelist spinlock with HEAD, and the same is removed by the
> > patch, but still the performance goes down with the patch. On further
> > analysis, I observed that after the patch there is actually an increase
> > in contention around ProcArrayLock (a shared LWLock) via GetSnapshotData,
> > which sounds a bit odd, but that's what I can see in the profiles. Based
> > on this analysis, a few ideas which I would like to investigate further are:
> > a. As there is an increase in spinlock contention, I would like to check
> > with Andres's latest patch, which reduces contention around shared
> > LWLocks.
> > b. Reduce some of the instructions added by the patch in
> > StrategyGetBuffer(); for example, instead of awakening bgreclaimer at a
> > low threshold, awaken it when a backend has to do a clock sweep.
>
> Are you sure you didn't mix up the profiles here?

I have tried this twice. Basically I am quite confident from my side, but
human error can't be ruled out. I have used the below statements:

Steps used for profiling:

During configure, use CFLAGS="-fno-omit-frame-pointer"

Terminal 1:
Start the server

Terminal 2:
./pgbench -c 64 -j 64 -T 300 -S -M prepared postgres

Terminal 3:
perf record -a -g sleep 60
-- this command is run a minute or so after the test starts

After the test is finished:
perf report -g graph,0.5,callee

Do you see any problem in the way I am collecting the perf reports?
In any case, I can try once more if you still doubt the profiles.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com