Instead of de-supporting platforms that don't have CAS support, or
providing parallel implementations, we could relatively easily build a
spinlock-based fallback using the already existing requirement for
Something like an array of 16 spinlocks, indexed by a more advanced
version of ((uintptr_t) &atomics >> sizeof(char *)) % 16. The platforms
that would fall back aren't that likely to be used under heavy
concurrency, so the price for that shouldn't be too high.
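A minimal sketch of that idea in C, just to make the shape concrete. It
assumes C11 <stdatomic.h>'s atomic_flag as a stand-in for the platform's
existing test-and-set primitive, and every name in it (fallback_spinlock,
lock_index, fallback_compare_exchange_u32, ...) is hypothetical. The hash
in lock_index is only one possible "more advanced version" of the modulo
expression above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of lock stripes; 16 as suggested above. */
#define NUM_ATOMIC_SPINLOCKS 16

/*
 * Hypothetical stand-in for the platform's existing test-and-set
 * spinlock; built on C11 atomic_flag purely for illustration.
 */
typedef struct { atomic_flag flag; } fallback_spinlock;

static fallback_spinlock atomic_locks[NUM_ATOMIC_SPINLOCKS];

/* Must run once before any emulated atomic is used. */
static void fallback_atomics_init(void)
{
    for (int i = 0; i < NUM_ATOMIC_SPINLOCKS; i++)
        atomic_flag_clear(&atomic_locks[i].flag);
}

static void spin_acquire(fallback_spinlock *lock)
{
    /* Busy-wait until test-and-set observes the flag clear. */
    while (atomic_flag_test_and_set_explicit(&lock->flag, memory_order_acquire))
        ;
}

static void spin_release(fallback_spinlock *lock)
{
    atomic_flag_clear_explicit(&lock->flag, memory_order_release);
}

/* Map an object's address to one of the stripes (hypothetical hash). */
static unsigned lock_index(const void *ptr)
{
    uintptr_t a = (uintptr_t) ptr;

    a >>= 2;        /* low bits are zero for aligned objects */
    a ^= a >> 8;    /* mix in higher bits so neighbours spread out */
    unsigned idx = (unsigned) (a % NUM_ATOMIC_SPINLOCKS);
    assert(idx < NUM_ATOMIC_SPINLOCKS);
    return idx;
}

/* An "atomic" whose value is only touched with its stripe lock held. */
typedef struct { uint32_t value; } fallback_atomic_u32;

static bool fallback_compare_exchange_u32(fallback_atomic_u32 *ptr,
                                          uint32_t *expected, uint32_t newval)
{
    fallback_spinlock *lock = &atomic_locks[lock_index(ptr)];
    bool ok;

    spin_acquire(lock);
    if (ptr->value == *expected)
    {
        ptr->value = newval;
        ok = true;
    }
    else
    {
        *expected = ptr->value;   /* report the value actually seen */
        ok = false;
    }
    spin_release(lock);
    return ok;
}

static uint32_t fallback_read_u32(fallback_atomic_u32 *ptr)
{
    fallback_spinlock *lock = &atomic_locks[lock_index(ptr)];

    spin_acquire(lock);
    uint32_t v = ptr->value;
    spin_release(lock);
    return v;
}

/* Other operations can be layered on the emulated CAS, e.g. fetch-add. */
static uint32_t fallback_fetch_add_u32(fallback_atomic_u32 *ptr, uint32_t add)
{
    uint32_t old = fallback_read_u32(ptr);

    while (!fallback_compare_exchange_u32(ptr, &old, old + add))
        ;   /* old was refreshed by the failed CAS; retry */
    return old;
}
```

Each lock is held only for a handful of instructions and no code path
takes two stripes at once, so this cannot deadlock; unrelated atomics that
hash to the same stripe merely contend with each other, which is the
price the previous paragraph argues is acceptable on such platforms.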

The only real problem with that would be that we'd need to remove the
spinlock fallback for barriers, but that seems to be pretty much



