On 9 July 2018 at 14:49, Heikki Linnakangas <hlinn...@iki.fi> wrote:
> On 03/04/18 19:20, Andres Freund wrote:
>>
>> On 2018-04-03 09:56:24 -0400, Tom Lane wrote:
>>>
>>> Heikki Linnakangas <hlinn...@iki.fi> writes:
>>>>
>>>> But let's go back to why we're considering this. The idea was to
>>>> optimize this block:
>>>> ...
>>>> One trick that we could do is to replace that with a 128-bit atomic
>>>> compare-and-swap instruction. Modern 64-bit Intel systems have that,
>>>> it's called CMPXCHG16B. Don't know about other architectures. An atomic
>>>> fetch-and-add, as envisioned in the comment above, would presumably be
>>>> better, but I suspect that a compare-and-swap would be good enough to
>>>> move the bottleneck elsewhere again.
>>>
>>>
>>> +1 for taking a look at that. A bit of experimentation shows that
>>> recent gcc and clang can generate that instruction using
>>> __sync_bool_compare_and_swap or __sync_val_compare_and_swap
>>> on an __int128 value.
>>
>>
>> The problem will presumably be that early opteron AMD64s lacked that
>> instruction. I'm not sure which distributions still target them (windows
>> dropped support a few years ago), but we should make sure that neither
>> the necessary dispatch code isn't going to add so much overhead it's
>> eating into our margin, nor that the generated code SIGILLs on such
>> platforms.
>
>
> Yeah.
>
> I'll mark this as "returned with feedback" in the commitfest. The way
> forward is to test if we can get the same performance benefit from switching
> to CMPXCHG16B, and keep the WAL format unchanged. If not, then we can
> continue discussing the WAL format and the tradeoffs with xl_prev, but let's
> take the easy way out if we can.

Agreed.

Were you working on this? Or was anybody else?

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
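A minimal sketch of the 128-bit compare-and-swap idea discussed in the thread, assuming GCC or Clang on x86-64. The elided block is not reproduced here; instead the sketch assumes a hypothetical pair of 64-bit positions (the start of the next reservation and the start of the previous one) that must advance together, which is the kind of two-word update a single CMPXCHG16B can replace. The names shared_pos, Reservation and reserve() are illustrative only. When the compiler reports 16-byte CAS support (__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16, defined for example under -mcx16), the pair is packed into one unsigned __int128 and updated with __sync_val_compare_and_swap; otherwise the sketch falls back to a test-and-set spinlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Result of a reservation: where it starts and where the previous one
 * began. Hypothetical layout, for illustration only. */
typedef struct
{
    uint64_t start;
    uint64_t prev;
} Reservation;

#if defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16)

/* Both positions packed into one 16-byte word: current position in the
 * low 64 bits, previous start in the high 64 bits. CMPXCHG16B requires a
 * 16-byte aligned operand, which unsigned __int128 has on x86-64. */
static volatile unsigned __int128 shared_pos;

static Reservation
reserve(uint64_t size)
{
    assert(((uintptr_t) &shared_pos & 15) == 0);

    /* This plain read may tear; a torn value just fails the CAS below. */
    unsigned __int128 old = shared_pos;

    for (;;)
    {
        uint64_t cur = (uint64_t) old;
        uint64_t prev = (uint64_t) (old >> 64);
        unsigned __int128 desired =
            ((unsigned __int128) cur << 64) | (unsigned __int128) (cur + size);
        unsigned __int128 seen =
            __sync_val_compare_and_swap(&shared_pos, old, desired);

        if (seen == old)
            return (Reservation) { cur, prev };
        old = seen;             /* lost a race: retry from the value seen */
    }
}

#else

/* Fallback when no 16-byte CAS is available: a simple spinlock. */
static bool pos_lock;
static uint64_t cur_pos, prev_pos;

static Reservation
reserve(uint64_t size)
{
    while (__atomic_test_and_set(&pos_lock, __ATOMIC_ACQUIRE))
        ;                       /* spin until the lock is free */

    Reservation r = { cur_pos, prev_pos };
    prev_pos = cur_pos;
    cur_pos += size;

    __atomic_clear(&pos_lock, __ATOMIC_RELEASE);
    return r;
}

#endif
```

Both branches return the same values: starting from zero, reserve(10), reserve(20) and reserve(5) yield start positions 0, 10 and 30, with prev equal to 0, 0 and 10. Whether the CAS loop actually beats the spinlock under contention is the thing the thread proposes to measure; this sketch does not settle it.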
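On the concern about early AMD64 CPUs and SIGILL: support for CMPXCHG16B is reported by CPUID leaf 1 in ECX bit 13, and the instruction is only valid in 64-bit mode. A build could check this once at startup and take the CAS path only when the CPU advertises it, so the per-call dispatch cost is one predictable branch or indirect call. The sketch below assumes GCC or Clang and their <cpuid.h> header, and shows only the detection; CPUID1_ECX_CX16 and cpu_has_cmpxchg16b() are hypothetical names, and on non-x86-64 targets it simply reports false.

```c
#include <stdbool.h>

#if defined(__x86_64__)
#include <cpuid.h>
#endif

/* CPUID leaf 1 reports CMPXCHG16B support in ECX bit 13. */
#define CPUID1_ECX_CX16 (1u << 13)

/* Hypothetical helper: true if the running CPU advertises CMPXCHG16B.
 * Intended to be evaluated once at startup and the result cached. */
static bool
cpu_has_cmpxchg16b(void)
{
#if defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid returns 0 if the requested leaf is not supported. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    return (ecx & CPUID1_ECX_CX16) != 0;
#else
    return false;               /* not x86-64: CMPXCHG16B not usable here */
#endif
}
```

A caller could store the result in a flag or function pointer during startup and branch on it in the hot path, so a CPU lacking the instruction never executes it. Note that building with -mcx16 allows the compiler to emit CMPXCHG16B for any 16-byte atomic operation, so the runtime check only protects code paths that are actually gated on it.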