On Tue, Jan 3, 2017 at 12:07 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> [ shrug... ] I have not seen you putting any effort into optimizing
> slock_t on non-x86 architectures, where it might make a difference today.
> Why all the concern about shaving hypothetical future bytes for
> pg_atomic_flag?
I don't know why you're getting so wrapped around the axle here. I already explained why I think it's an issue; you can disagree if you like. As for whether I've put any effort into optimizing slock_t on non-x86 architectures: I commented in my first post to this thread about my disappointment with the situation on ppc64. I didn't realize we had that problem, and I think it would be worth fixing at some point, but I haven't quite gotten around to it in the four hours since that realization. I have previously done work on non-x86 spinlocks, though (c01c25fbe525869fa81237954727e1eb4b7d4a14).

> But to reduce this to the simplest possible terms: it really does not
> matter whether the implementation saves bytes or cycles or anything else,
> if it fails to *work*.  The existing coding for PPC fails on FreeBSD,
> and likely on some other platforms using the same compiler.  I'd be the
> first to say that that doesn't reflect well on the state of their PPC
> port, but it's reality that we ought to be cognizant of.  And we've
> worked around similar bugs nearby, eg see this bit in atomics.h:
>
>  * Given a gcc-compatible xlc compiler, prefer the xlc implementation.  The
>  * ppc64le "IBM XL C/C++ for Linux, V13.1.2" implements both interfaces, but
>  * __sync_lock_test_and_set() of one-byte types elicits SIGSEGV.
>
> From here it seems like you're arguing that we mustn't give FreeBSD
> the identical pass that we already gave IBM, for what is nearly the
> same bug.

The only point I'm making here is that the width of a spinlock is not irrelevant. Or at least, it didn't use to be. lwlock.h says:

 * LWLOCK_MINIMAL_SIZE should be 32 on basically all common platforms, but
 * because slock_t is more than 2 bytes on some obscure platforms, we allow
 * for the possibility that it might be 64.

That comment has actually been nonsense since 008608b9d51061b1f598c197477b3dc7be9c4a64, but before that commit it was relevant.
Also, before 48354581a49c30f5757c203415aa8412d85b0f70, a BufferDesc fit within a 64-byte cache line if slock_t was 1 or 2 bytes, a point commented on explicitly in 6150a1b08a9fe7ead2b25240be46dddeae9d98e1. So we've clearly made efforts in that direction in the past. Looking a little further, though, since both lwlock.c and bufmgr.c have been rewritten to use atomics directly, there may not be any remaining places where the spinlocks get hot enough to care about the extra bytes. All things being equal I still think a narrower lock is significantly better than a wider one, but we can leave solving that problem for a day when the difference can be demonstrated by benchmarks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers