On 09/08/2017 03:35 PM, Sokolov Yura wrote:
-M prepared: Up to 11% improvement
-M prepared -S: No improvement, no regression ("noise")
-M prepared -N: Up to 12% improvement
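For context, the pgbench modes referenced above correspond to invocations like the following; scale, client counts, and duration are placeholders I've assumed, not values from this thread:

```shell
# Initialize the test database (scale factor is a placeholder)
pgbench -i -s 300 bench

# -M prepared: default TPC-B-like workload with prepared statements
pgbench -M prepared -c 64 -j 64 -T 300 bench

# -M prepared -S: SELECT-only (read-only) workload
pgbench -M prepared -S -c 64 -j 64 -T 300 bench

# -M prepared -N: skip updates to pgbench_tellers/pgbench_branches
# (avoids the hottest row contention)
pgbench -M prepared -N -c 64 -j 64 -T 300 bench
```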
For all runs the improvement shows up the closer you get to the number
of CPU threads, or above it. Although I'm not seeing the same
improvements as you on very large client counts, there are definitely gains.
It is expected:
- the patch "fixes NUMA": for example, it gives no improvement at all
on 1 socket (I've tested this using numactl to bind to 1 socket),
- and it certainly gives less improvement on 2 sockets than on 4 sockets
(28 cores vs 72 cores also makes a difference),
- one of the hot points was CLogControlLock, and it was fixed by
"Group mode for CLOG updates".
I'm planning to re-test that patch.
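The single-socket binding mentioned above can be done with numactl; a hypothetical example (node numbers and paths depend on the machine, and are assumptions here):

```shell
# Show the available NUMA nodes on this machine
numactl --hardware

# Run the server with both CPUs and memory restricted to node 0,
# so all lock traffic stays on one socket
numactl --cpunodebind=0 --membind=0 \
    pg_ctl -D "$PGDATA" -l logfile start
```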
+static inline bool
+LWLockAttemptLockOrQueueSelf(LWLock *lock, LWLockMode mode,
I'll leave it to the committer to decide if this method is too big to
be inlined. GCC 4.9 doesn't want to inline it without the directive, and
the function call then shows up noticeably in the profile.
Yes, ideally the method will be kept as "inline".

The attachment contains a version with all suggestions applied, except
the removal of:
* spins_per_delay as extern
* the calculation of skip_wait_list
Currently the calculation of skip_wait_list is mostly empirical (i.e. the best
value found by experiment).
Ok, good to know.
I strongly think that instead of spins_per_delay something dependent
on the concrete lock should be used. I tried storing it in the LWLock
itself, but it was worse.
Yes, the LWLock should be kept as small as possible, and cache-line aligned,
due to the cache storms shown by perf c2c.
Recently I realized it should be stored in an array indexed by tranche,
but I haven't implemented that yet, and therefore haven't measured it.
Different constants for each LWLock could have an impact, but the
constants would also depend on the machine setup and workload.
Thanks for working on this !
Sent via pgsql-hackers mailing list (email@example.com)