Re: _BitInt vs. _Atomic
Hello,

On Tue, 1 Aug 2023, Joseph Myers wrote:

> > Only because cmpxchg is defined in terms of memcpy/memcmp. If it were
> > defined in terms of the == operator (obviously applied recursively
> > member-wise for structs) and simple-assignment that wouldn't be a problem.
>
> It also wouldn't work for floating point, where I think clearly the atomic
> operations should consider positive and negative zero as different, and
> should consider different DFP quantum exponents for the same real number
> as different - but should also consider the same NaN (same payload, same
> choice of quiet / signaling) as being the same.

That is all true. But the current wording can't work either. It happily
requires copying around memory between types of different representations
and sizes, it makes padding observable behaviour, and due to that makes
basic algebraic guarantees not be observed (after two values are checked
equal with the predicates of the algebra, they are not then in fact equal
with predicates of the same algebra).

Ciao,
Michael.
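To make the "padding becomes observable" point concrete, here is a minimal sketch. The type `struct S` and the helpers `s_init`, `s_same_value` and `s_same_bytes` are hypothetical names introduced for illustration only; the members are written through byte copies so that the padding bytes are left exactly as `memset` set them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical example type; on most ABIs it has padding after 'a'. */
struct S { char a; int b; };

/* Fill every byte (padding included) with 'fill', then write the members
   through byte copies so that the padding bytes are not touched again. */
static void s_init(struct S *s, char a, int b, unsigned char fill)
{
  assert(s != NULL);
  memset(s, fill, sizeof *s);
  memcpy((unsigned char *)s + offsetof(struct S, a), &a, sizeof a);
  memcpy((unsigned char *)s + offsetof(struct S, b), &b, sizeof b);
}

/* Equality as a member-wise == would define it. */
static bool s_same_value(const struct S *x, const struct S *y)
{
  return x->a == y->a && x->b == y->b;
}

/* Equality as the memcmp-based wording defines it. */
static bool s_same_bytes(const struct S *x, const struct S *y)
{
  return memcmp(x, y, sizeof *x) == 0;
}
```

Two objects initialised with the same members but different fill bytes compare equal under `s_same_value` and, wherever `struct S` actually has padding, unequal under `s_same_bytes`.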
Re: _BitInt vs. _Atomic
On Tuesday, 01.08.2023 at 15:54 +, Michael Matz wrote:
> Hello,
>
> On Mon, 31 Jul 2023, Martin Uecker wrote:
>
> > > Say you have a loop like so:
> > >
> > > _Atomic T obj;
> > > ...
> > > T expected1, expected2, newval;
> > > newval = ...;
> > > expected1 = ...;
> > > do {
> > >   expected2 = expected1;
> > >   if (atomic_compare_exchange_weak(&obj, &expected2, newval))
> > >     break;
> > >   expected1 = expected2;
> > > } while (1);
> > >
> > > As written this looks of course stupid, and you may say "don't do that",
> > > but internally the copies might result from temporaries (compiler
> > > generated or wrapper function arguments, or suchlike). Now, while
> > > expected2 will contain the copied padding bits after the cmpxchg the
> > > copies to and from expected1 will possibly destroy them. Either way I
> > > don't see why the above loop should be out-of-spec, so I can write it and
> > > expect it to proceed eventually (certainly when the _strong variant is
> > > used). Any argument that would declare the above loop out-of-spec I
> > > would consider a defect in the spec.
> >
> > It is "out-of-spec" for C in the sense that it cannot be
> > expected to work with the semantics as specified in the C standard.
>
> (I call that a defect. See below)

This was extensively discussed in WG14 (before my time). In fact, there
was a defect report about the previous version defined in terms of
values and the wording was changed to memcmp / memcpy operating on
padding bytes (also to align with C++ at that time):

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2059.htm#dr_431

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1906.htm

> > In practice, what the semantics specified using memcpy/memcmp
> > allow one to do is to also apply atomic operations on non-atomic
> > types. This is not guaranteed to work by the C standard, but
> > in practice people often have to do this. For example, nobody
> > is going to copy a 256 GB numerical array with non-atomic types
> > into another data structure with atomic versions of the same
> > type just so that you can apply atomic operations on it.
> > So one simply does an unsafe cast and hopes the compiler does
> > not break this.
> >
> > If the non-atomic struct now has non-zero values in the padding,
> > and the compiler would clear those automatically for "expected",
> > you would create the problem of an infinite loop (this time
> > for real).
>
> Only because cmpxchg is defined in terms of memcpy/memcmp.

Yes, but this is intentional.

> If it were
> defined in terms of the == operator (obviously applied recursively
> member-wise for structs) and simple-assignment that wouldn't be a problem.

C has no == operator for structs, nor any concept of struct equality.
It would also cause implementation overhead and I guess could cause
severe performance issues when there are several padding bytes
distributed over an object and you need to jump over those when
copying or comparing. (How would one vectorize that?)

> In addition that would get rid of all discussion of what happens or
> doesn't happen with padding. Introducing reliance on padding bits (which
> IMHO goes against some fundamental ideas of the C standard) has
> far-reaching consequences, see below.

Working with representation bytes of objects is a rather fundamental
property of C. That you can do this using character pointers, or that
you can copy objects with memcpy and that the results can be compared
with memcmp, is something I expect to work in C.

> The current definition of the
> atomic_cmpxchg is also inconsistent with the rest of the standard:
>
> We have:
>
>   ... (C is non-atomic variant of A) ...
>   _Bool atomic_compare_exchange_strong(volatile A *object,
>                                        C *expected, C desired);
>   ... (is equivalent to atomic variant of:)
>   if (memcmp(object, expected, sizeof (*object)) == 0)
>     { memcpy(object, &desired, sizeof (*object)); return true; }
>   else
>     { memcpy(expected, object, sizeof (*object)); return false; }
>
> But we also have:
>
>   The size, representation, and alignment of an atomic type need not be
>   the same as those of the corresponding unqualified type.
>
>   (with later text only suggesting that at least for atomic integer
>   types these please be the same. But here we aren't talking about
>   integer types even.)

Reading the old meeting minutes, it seems WG14 considered the case
that an atomic type could have a content part and possibly a lock,
and you would compare only the content part (with padding) and not
the lock. But I agree, the wording should be improved.

>
> So, already the 'memcmp(object, expected, sizeof (*object))' may be
> undefined. sizeof(*object) need not be the same as sizeof(*expected).
> In particular the memcpy in the else branch might clobber memory outside
> *expected.
>
> That alone should be sufficient to show that defining this all in terms of
> memcpy/memcmp is
Re: _BitInt vs. _Atomic
On Tue, 1 Aug 2023, Michael Matz via Gcc-patches wrote:

> Only because cmpxchg is defined in terms of memcpy/memcmp. If it were
> defined in terms of the == operator (obviously applied recursively
> member-wise for structs) and simple-assignment that wouldn't be a problem.

It also wouldn't work for floating point, where I think clearly the atomic
operations should consider positive and negative zero as different, and
should consider different DFP quantum exponents for the same real number
as different - but should also consider the same NaN (same payload, same
choice of quiet / signaling) as being the same.

--
Joseph S. Myers
jos...@codesourcery.com
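The floating-point cases mentioned above can be checked with a small sketch. It assumes IEEE 754 binary64 `double` and default floating-point options; `same_value`, `same_bytes` and `check_zero_and_nan` are hypothetical helper names, not part of any library.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <string.h>

/* Comparison with the == operator. */
static bool same_value(double x, double y)
{
  return x == y;
}

/* Comparison of the object representations, as memcmp would see them. */
static bool same_bytes(double x, double y)
{
  return memcmp(&x, &y, sizeof x) == 0;
}

/* Assumes IEEE 754 doubles: the two zeros compare equal by value but have
   different bytes; a NaN is unequal to itself by value but has equal bytes. */
static void check_zero_and_nan(void)
{
  double nan_value = NAN;
  assert(same_value(0.0, -0.0) && !same_bytes(0.0, -0.0));
  assert(!same_value(nan_value, nan_value) && same_bytes(nan_value, nan_value));
}
```

So a compare-exchange built on == would treat +0.0 and -0.0 as a match and could never match a stored NaN, which is the opposite of what is argued for above.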
Re: _BitInt vs. _Atomic
Hello,

On Mon, 31 Jul 2023, Martin Uecker wrote:

> > Say you have a loop like so:
> >
> > _Atomic T obj;
> > ...
> > T expected1, expected2, newval;
> > newval = ...;
> > expected1 = ...;
> > do {
> >   expected2 = expected1;
> >   if (atomic_compare_exchange_weak(&obj, &expected2, newval))
> >     break;
> >   expected1 = expected2;
> > } while (1);
> >
> > As written this looks of course stupid, and you may say "don't do that",
> > but internally the copies might result from temporaries (compiler
> > generated or wrapper function arguments, or suchlike). Now, while
> > expected2 will contain the copied padding bits after the cmpxchg the
> > copies to and from expected1 will possibly destroy them. Either way I
> > don't see why the above loop should be out-of-spec, so I can write it and
> > expect it to proceed eventually (certainly when the _strong variant is
> > used). Any argument that would declare the above loop out-of-spec I would
> > consider a defect in the spec.
>
> It is "out-of-spec" for C in the sense that it cannot be
> expected to work with the semantics as specified in the C standard.

(I call that a defect. See below)

> In practice, what the semantics specified using memcpy/memcmp
> allow one to do is to also apply atomic operations on non-atomic
> types. This is not guaranteed to work by the C standard, but
> in practice people often have to do this. For example, nobody
> is going to copy a 256 GB numerical array with non-atomic types
> into another data structure with atomic versions of the same
> type just so that you can apply atomic operations on it.
> So one simply does an unsafe cast and hopes the compiler does
> not break this.
>
> If the non-atomic struct now has non-zero values in the padding,
> and the compiler would clear those automatically for "expected",
> you would create the problem of an infinite loop (this time
> for real).

Only because cmpxchg is defined in terms of memcpy/memcmp. If it were
defined in terms of the == operator (obviously applied recursively
member-wise for structs) and simple-assignment that wouldn't be a problem.
In addition that would get rid of all discussion of what happens or
doesn't happen with padding. Introducing reliance on padding bits (which
IMHO goes against some fundamental ideas of the C standard) has
far-reaching consequences, see below. The current definition of the
atomic_cmpxchg is also inconsistent with the rest of the standard:

We have:

  ... (C is non-atomic variant of A) ...
  _Bool atomic_compare_exchange_strong(volatile A *object,
                                       C *expected, C desired);
  ... (is equivalent to atomic variant of:)
  if (memcmp(object, expected, sizeof (*object)) == 0)
    { memcpy(object, &desired, sizeof (*object)); return true; }
  else
    { memcpy(expected, object, sizeof (*object)); return false; }

But we also have:

  The size, representation, and alignment of an atomic type need not be
  the same as those of the corresponding unqualified type.

  (with later text only suggesting that at least for atomic integer
  types these please be the same. But here we aren't talking about
  integer types even.)

So, already the 'memcmp(object, expected, sizeof (*object))' may be
undefined. sizeof(*object) need not be the same as sizeof(*expected).
In particular the memcpy in the else branch might clobber memory outside
*expected.

That alone should be sufficient to show that defining this all in terms of
memcpy/memcmp is a bad idea. But it also has other consequences: you
can't copy (simple-assign) or compare (== operator) atomic values anymore
reliably and expect the atomic_cmpxchg to work. My example from earlier
shows that you can't copy them, a similar one can be constructed for
breaking ==.

But it goes further: you can also construct an example that shows an
internal inconsistency just with using atomic_cmpxchg (of course, assume
all this to run without concurrent accesses to the respective objects):

  _Atomic T obj;
  ...
  T expected, newval;
  expected = ...;
  newval = expected + 1;         // just to make it different
  atomic_store (&obj, expected);
  if (atomic_cmpxchg_strong (&obj, &expected, newval)) {
    /* Now we have: obj == newval.
       Do we also have memcmp(&obj,&newval)==0? */
    if (!atomic_cmpxchg_strong (&obj, &newval, expected)) {
      /* No, we can't rely on that!  */
      error("what's going on?");
    }
  } else {
    /* May happen, padding of expected may not be the same
       as in obj, even after atomic_store.  */
    error("WTH? a compare after a store doesn't even work?");
  }

So, even though cmpxchg is defined in terms of memcpy/memcmp, we still
can't rely on anything after it succeeded (or failed). Simply because the
by-value passing of the 'desired' argument will have unknown padding
(within the implementation of cmpxchg) that isn't necessarily the same as
the newval object.

Now, about your suggestion of clearing or ignoring the padding bits at
specific
Re: _BitInt vs. _Atomic
On Monday, 31.07.2023 at 14:33 +, Michael Matz wrote:
> Hello,
>
> On Fri, 28 Jul 2023, Martin Uecker wrote:
>
> > > > Sorry, somehow I must be missing something here.
> > > >
> > > > If you add something you would create a new value and this may (in
> > > > an object) have random new padding. But the "expected" value should
> > > > be updated by a failed atomic_compare_exchange cycle and then have
> > > > same padding as the value stored in the atomic. So the next cycle
> > > > should succeed. The user would not change the representation of
> > > > the "expected" value but create a new value for another object
> > > > by adding something.
> > >
> > > You're right that it would pass the expected value not something after an
> > > operation on it usually. But still, expected type will be something like
> > > _BitInt(37) or _BitInt(195) and so neither the atomic_load nor what
> > > atomic_compare_exchange copies back on failure is guaranteed to have the
> > > padding bits preserved.
> >
> > For atomic_load in C a value is returned. A value does not care about
> > padding and when stored into a new object can produce new and different
> > padding.
> >
> > But for atomic_compare_exchange the memory content is copied into
> > an object passed by pointer, so here the C standard requires
> > that the padding is preserved. It explicitly states that the effect
> > is like:
> >
> > if (memcmp(object, expected, sizeof(*object)) == 0)
> >   memcpy(object, &desired, sizeof(*object));
> > else
> >   memcpy(expected, object, sizeof(*object));
> >
> > > It is true that if it is larger than 16 bytes the libatomic
> > > atomic_compare_exchange will memcpy the value back which copies the
> > > padding bits, but is there a guarantee that the user code doesn't
> > > actually copy that value further into some other variable?
> >
> > I do not think it would be surprising for a C user when
> > the next atomic_compare_exchange fails in this case.
>
> But that is a problem (the same one the cited C++ paper tries to resolve,
> IIUC).

I do not quite understand the paper. I can't see how example 3 in

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0528r0.html

could loop indefinitely with the memcpy / memcmp semantics,
but somehow the authors seem to claim this.

> Say you have a loop like so:
>
> _Atomic T obj;
> ...
> T expected1, expected2, newval;
> newval = ...;
> expected1 = ...;
> do {
>   expected2 = expected1;
>   if (atomic_compare_exchange_weak(&obj, &expected2, newval))
>     break;
>   expected1 = expected2;
> } while (1);
>
> As written this looks of course stupid, and you may say "don't do that",
> but internally the copies might result from temporaries (compiler
> generated or wrapper function arguments, or suchlike). Now, while
> expected2 will contain the copied padding bits after the cmpxchg the
> copies to and from expected1 will possibly destroy them. Either way I
> don't see why the above loop should be out-of-spec, so I can write it and
> expect it to proceed eventually (certainly when the _strong variant is
> used). Any argument that would declare the above loop out-of-spec I would
> consider a defect in the spec.

It is "out-of-spec" for C in the sense that it cannot be
expected to work with the semantics as specified in the C standard.

But I agree with you that it would be better if it just worked.

A compiler could, for example, always clear the padding when
initializing or storing atomic values. It might also clear
the padding of the initial "expected", when it is
initialized or stored to.

But it should not clear / ignore the padding when copying
to "expected" using atomic_compare_exchange or when comparing
to the memory content. See below why I think this would not
be helpful.

>
> It's never a good idea to introduce reliance on padding bits. Exactly
> because you can trivially destroy them with simple value copies.
>
> > > Anyway, for smaller or equal to 16 (or 8) bytes if
> > > atomic_compare_exchange is emitted inline I don't see what would
> > > preserve the bits.
> >
> > This then seems to be incorrect for C.
>
> Or the spec is.

In practice, what the semantics specified using memcpy/memcmp
allow one to do is to also apply atomic operations on non-atomic
types. This is not guaranteed to work by the C standard, but
in practice people often have to do this. For example, nobody
is going to copy a 256 GB numerical array with non-atomic types
into another data structure with atomic versions of the same
type just so that you can apply atomic operations on it.
So one simply does an unsafe cast and hopes the compiler does
not break this.

If the non-atomic struct now has non-zero values in the padding,
and the compiler would clear those automatically for "expected",
you would create the problem of an infinite loop (this time
for real).

Martin
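The "unsafe cast" practice described above might look like the following sketch. It is not guaranteed by ISO C; it only has a chance of working where `_Atomic long` and `long` share size and alignment, which the static assertions check. The function name `add_in_place` is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* The cast below can only work if the two layouts agree. */
_Static_assert(sizeof(_Atomic long) == sizeof(long),
               "atomic and plain long differ in size");
_Static_assert(_Alignof(_Atomic long) == _Alignof(long),
               "atomic and plain long differ in alignment");

/* Atomically adds 'delta' to array[index] and returns the previous value.
   ASSUMPTION: the implementation tolerates accessing a plain long through
   an _Atomic-qualified lvalue; the C standard does not promise this. */
static long add_in_place(long *array, size_t index, long delta)
{
  assert(array != NULL);
  _Atomic long *slot = (_Atomic long *)&array[index];  /* the unsafe cast */
  return atomic_fetch_add(slot, delta);
}
```

For a scalar like `long` there is no padding to worry about; the problem discussed in this thread appears once the element type is a struct, `long double` or a `_BitInt` whose width does not fill its storage.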
Re: _BitInt vs. _Atomic
Hello,

On Fri, 28 Jul 2023, Martin Uecker wrote:

> > > Sorry, somehow I must be missing something here.
> > >
> > > If you add something you would create a new value and this may (in
> > > an object) have random new padding. But the "expected" value should
> > > be updated by a failed atomic_compare_exchange cycle and then have
> > > same padding as the value stored in the atomic. So the next cycle
> > > should succeed. The user would not change the representation of
> > > the "expected" value but create a new value for another object
> > > by adding something.
> >
> > You're right that it would pass the expected value not something after an
> > operation on it usually. But still, expected type will be something like
> > _BitInt(37) or _BitInt(195) and so neither the atomic_load nor what
> > atomic_compare_exchange copies back on failure is guaranteed to have the
> > padding bits preserved.
>
> For atomic_load in C a value is returned. A value does not care about
> padding and when stored into a new object can produce new and different
> padding.
>
> But for atomic_compare_exchange the memory content is copied into
> an object passed by pointer, so here the C standard requires
> that the padding is preserved. It explicitly states that the effect
> is like:
>
> if (memcmp(object, expected, sizeof(*object)) == 0)
>   memcpy(object, &desired, sizeof(*object));
> else
>   memcpy(expected, object, sizeof(*object));
>
> > It is true that if it is larger than 16 bytes the libatomic
> > atomic_compare_exchange will memcpy the value back which copies the
> > padding bits, but is there a guarantee that the user code doesn't
> > actually copy that value further into some other variable?
>
> I do not think it would be surprising for a C user when
> the next atomic_compare_exchange fails in this case.

But that is a problem (the same one the cited C++ paper tries to resolve,
IIUC). Say you have a loop like so:

  _Atomic T obj;
  ...
  T expected1, expected2, newval;
  newval = ...;
  expected1 = ...;
  do {
    expected2 = expected1;
    if (atomic_compare_exchange_weak(&obj, &expected2, newval))
      break;
    expected1 = expected2;
  } while (1);

As written this looks of course stupid, and you may say "don't do that",
but internally the copies might result from temporaries (compiler
generated or wrapper function arguments, or suchlike). Now, while
expected2 will contain the copied padding bits after the cmpxchg the
copies to and from expected1 will possibly destroy them. Either way I
don't see why the above loop should be out-of-spec, so I can write it and
expect it to proceed eventually (certainly when the _strong variant is
used). Any argument that would declare the above loop out-of-spec I would
consider a defect in the spec.

It's never a good idea to introduce reliance on padding bits. Exactly
because you can trivially destroy them with simple value copies.

> > Anyway, for smaller or equal to 16 (or 8) bytes if
> > atomic_compare_exchange is emitted inline I don't see what would
> > preserve the bits.
>
> This then seems to be incorrect for C.

Or the spec is.

Ciao,
Michael.
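The loop above can be played through in a single-threaded model. Everything in this sketch is an assumption made for illustration: `struct S` stands in for `T`, `model_cas` is a non-atomic rendering of the memcmp/memcpy wording, and `copy_value` models an implementation whose value copies rewrite the padding with some fixed byte. Real compilers are free to behave differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct S { char a; int b; };   /* hypothetical type with padding */

/* Fill all bytes, padding included, then write members by byte copy. */
static void put(struct S *dst, char a, int b, unsigned char padding_fill)
{
  assert(dst != NULL);
  memset(dst, padding_fill, sizeof *dst);
  memcpy((unsigned char *)dst + offsetof(struct S, a), &a, sizeof a);
  memcpy((unsigned char *)dst + offsetof(struct S, b), &b, sizeof b);
}

/* Models "dst = src" where the copy does not preserve padding. */
static void copy_value(struct S *dst, const struct S *src,
                       unsigned char padding_fill)
{
  put(dst, src->a, src->b, padding_fill);
}

/* Single-threaded, NON-atomic model of the memcmp/memcpy wording. */
static bool model_cas(struct S *object, struct S *expected,
                      const struct S *desired)
{
  if (memcmp(object, expected, sizeof *object) == 0) {
    memcpy(object, desired, sizeof *object);
    return true;
  }
  memcpy(expected, object, sizeof *object);
  return false;
}

/* Returns the 1-based iteration in which the loop exits, or 0 if it does
   not exit within max_iterations. */
static int run_loop(unsigned char object_padding, unsigned char copy_padding,
                    int max_iterations)
{
  struct S obj, expected1, expected2, newval;
  put(&obj, 1, 2, object_padding);
  put(&expected1, 1, 2, copy_padding);
  put(&newval, 1, 3, copy_padding);
  for (int i = 1; i <= max_iterations; i++) {
    copy_value(&expected2, &expected1, copy_padding); /* expected2 = expected1 */
    if (model_cas(&obj, &expected2, &newval))
      return i;
    copy_value(&expected1, &expected2, copy_padding); /* expected1 = expected2 */
  }
  return 0;
}
```

With matching padding the loop exits in the first iteration. With differing padding, and a `struct S` that really has padding bytes, the failed exchange repairs `expected2` but the two value copies undo the repair, so the model never makes progress.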
Re: _BitInt vs. _Atomic
On Fri, 28 Jul 2023, Jakub Jelinek via Gcc-patches wrote:

> The C++ way of dealing with this is using __builtin_clear_padding,
> done on atomic stores/updates of the atomic memory (padding is cleared
> if any on the value to be stored, or on the expected and desired values).
>
> I don't know enough about the C atomic requirements whether that is feasible
> for it as well, or whether it is possible to make the padding bits partially
> or fully set somehow non-atomically without invoking UB and then make it
> never match.

If padding bits not being reliably preserved causes problems for the
compare-exchange loops in C in practice, then it would seem reasonable to
use __builtin_clear_padding internally as part of implementing those cases
of atomic compound assignment.

> And another issue is that while __atomic_load, __atomic_store,
> __atomic_exchange and __atomic_compare_exchange work on arbitrary _BitInt
> sizes, others like __atomic_fetch_add only support _BitInt or other integral
> types which have size of 1, 2, 4, 8 or 16 bytes, others emit an error
> in c-family/c-common.cc (sync_resolve_size). So, either
> resolve_overloaded_builtin should for the case when pointer is pointer to
> _BitInt which doesn't have 1, 2, 4, 8 or 16 bytes size lower those into
> a loop using __atomic_compare_exchange (or perhaps also if there is
> padding), or <stdatomic.h> should do that.

The <stdatomic.h> interfaces definitely need to work with _BitInt. My
guess is that doing this with the built-in expansion would be more robust
than putting more complicated definitions in the header that choose which
built-in functions to use depending on properties of the type (and keeping
the built-in functions limited to certain widths), but I don't know.

Note also that those <stdatomic.h> operations have no undefined behavior
on signed integer overflow.

If any ABIs require sign / zero extension of _BitInt values in memory,
care would also be needed in the case of (size of 1, 2, 4, 8 or 16 bytes,
but also has high bits required to be sign / zero extended) to ensure that
the operations are implemented so as to leave the high bits with the
expected values in case of overflow, which wouldn't result from simply
using the underlying operation for a type with the full precision of its
memory size.

--
Joseph S. Myers
jos...@codesourcery.com
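The concern about high bits can be sketched with a compare-exchange loop. To stay compilable without `_BitInt` support, this uses a 64-bit cell holding a 37-bit unsigned value as a stand-in for `unsigned _BitInt(37)`, and it assumes an ABI that wants the unused high bits kept at zero. `WIDTH`, `MASK37` and `fetch_add_u37` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define WIDTH  37
#define MASK37 ((UINT64_C(1) << WIDTH) - 1)

/* Fetch-and-add on a WIDTH-bit unsigned value stored zero-extended in a
   64-bit cell.  A plain 64-bit atomic_fetch_add would carry into bit 37 on
   overflow; masking inside the loop wraps modulo 2^WIDTH instead and keeps
   the high bits zero. */
static uint64_t fetch_add_u37(_Atomic uint64_t *obj, uint64_t operand)
{
  assert(obj != NULL);
  uint64_t expected = atomic_load(obj);
  uint64_t desired;
  do {
    desired = (expected + operand) & MASK37;
  } while (!atomic_compare_exchange_weak(obj, &expected, desired));
  return expected;   /* value observed just before the successful update */
}
```

Because `uint64_t` has no padding, this loop does not suffer from the padding issue itself; it only illustrates why the full-width underlying operation is not sufficient when the ABI constrains the extra bits.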
Re: _BitInt vs. _Atomic
On Friday, 28.07.2023 at 17:10 +0200, Jakub Jelinek wrote:
> On Fri, Jul 28, 2023 at 04:53:30PM +0200, Martin Uecker wrote:
> > > The thing is that user doesn't have much control over those
> > > padding bits, so whether _Atomic operations on long double (when it is 80
> > > bit and stores from hw actually store 10 bytes rather than 12 or 16), or
> > > _BitInt(37) or _BitInt(195) or struct S { char a; int b; }; then depend
> > > purely on luck. If the expected value is based on atomic_load on the
> > > atomic_compare_exchange location or whatever atomic_compare_exchange gave
> > > back, if in the loop one e.g. adds something to it, then again it might
> > > get different padding bits from what is originally in memory, so it isn't
> > > true that it will always succeed at least in the second loop iteration.
> >
> > Sorry, somehow I must be missing something here.
> >
> > If you add something you would create a new value and this may (in
> > an object) have random new padding. But the "expected" value should
> > be updated by a failed atomic_compare_exchange cycle and then have
> > same padding as the value stored in the atomic. So the next cycle
> > should succeed. The user would not change the representation of
> > the "expected" value but create a new value for another object
> > by adding something.
>
> You're right that it would pass the expected value not something after an
> operation on it usually. But still, expected type will be something like
> _BitInt(37) or _BitInt(195) and so neither the atomic_load nor what
> atomic_compare_exchange copies back on failure is guaranteed to have the
> padding bits preserved.

For atomic_load in C a value is returned. A value does not care about
padding and when stored into a new object can produce new and different
padding.

But for atomic_compare_exchange the memory content is copied into
an object passed by pointer, so here the C standard requires
that the padding is preserved. It explicitly states that the effect
is like:

if (memcmp(object, expected, sizeof(*object)) == 0)
  memcpy(object, &desired, sizeof(*object));
else
  memcpy(expected, object, sizeof(*object));

> It is true that if it is larger than 16 bytes the libatomic
> atomic_compare_exchange will memcpy the value back which copies the padding
> bits, but is there a guarantee that the user code doesn't actually copy that
> value further into some other variable?

I do not think it would be surprising for a C user when
the next atomic_compare_exchange fails in this case.

> Anyway, for smaller or equal
> to 16 (or 8) bytes if atomic_compare_exchange is emitted inline I don't see
> what would preserve the bits.

This then seems to be incorrect for C.

Martin
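The "effect is like" wording can be turned into a small single-threaded model to see why a second attempt succeeds once `expected` has been overwritten with the stored bytes. This is explicitly not atomic and not how any library implements it; `model_compare_exchange` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* NOT atomic: a single-threaded model of the memcmp/memcpy effect. */
static bool model_compare_exchange(void *object, void *expected,
                                   const void *desired, size_t size)
{
  assert(object != NULL && expected != NULL && desired != NULL);
  if (memcmp(object, expected, size) == 0) {
    memcpy(object, desired, size);    /* success: store desired */
    return true;
  }
  memcpy(expected, object, size);     /* failure: report what is stored */
  return false;
}
```

If the last byte of a 4-byte buffer is taken as a stand-in for padding, a first call with a mismatching last byte fails and copies the stored bytes into `expected`; an immediate second call with the same `expected` then succeeds, provided nothing rewrites `expected` in between.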
Re: _BitInt vs. _Atomic
On Fri, Jul 28, 2023 at 04:53:30PM +0200, Martin Uecker wrote:
> > The thing is that user doesn't have much control over those
> > padding bits, so whether _Atomic operations on long double (when it is 80
> > bit and stores from hw actually store 10 bytes rather than 12 or 16), or
> > _BitInt(37) or _BitInt(195) or struct S { char a; int b; }; then depend
> > purely on luck. If the expected value is based on atomic_load on the
> > atomic_compare_exchange location or whatever atomic_compare_exchange gave
> > back, if in the loop one e.g. adds something to it, then again it might get
> > different padding bits from what is originally in memory, so it isn't true
> > that it will always succeed at least in the second loop iteration.
>
> Sorry, somehow I must be missing something here.
>
> If you add something you would create a new value and this may (in
> an object) have random new padding. But the "expected" value should
> be updated by a failed atomic_compare_exchange cycle and then have
> same padding as the value stored in the atomic. So the next cycle
> should succeed. The user would not change the representation of
> the "expected" value but create a new value for another object
> by adding something.

You're right that it would pass the expected value not something after an
operation on it usually. But still, expected type will be something like
_BitInt(37) or _BitInt(195) and so neither the atomic_load nor what
atomic_compare_exchange copies back on failure is guaranteed to have the
padding bits preserved.
It is true that if it is larger than 16 bytes the libatomic
atomic_compare_exchange will memcpy the value back which copies the padding
bits, but is there a guarantee that the user code doesn't actually copy that
value further into some other variable? Anyway, for smaller or equal
to 16 (or 8) bytes if atomic_compare_exchange is emitted inline I don't see
what would preserve the bits.

	Jakub
Re: _BitInt vs. _Atomic
On Friday, 28.07.2023 at 16:26 +0200, Jakub Jelinek wrote:
> On Fri, Jul 28, 2023 at 04:03:39PM +0200, Martin Uecker wrote:
> > > On Thu, Jul 27, 2023 at 07:06:03PM +, Joseph Myers wrote:
> > > > I think there should be tests for _Atomic _BitInt types. Hopefully
> > > > atomic compound assignment just works via the logic for
> > > > compare-and-exchange loops, but does e.g. atomic_fetch_add work
> > > > with _Atomic _BitInt types?
> > >
> > > So, there are 2 issues.
> > >
> > > One is something I haven't seen being handled for C at all so far, but
> > > handled for C++ - padding bits.
> > >
> > > Already e.g. x86 long double has some padding bits - 16 bits on ia32,
> > > 48 bits on x86_64, when one does
> > >   _Atomic long double l;
> > >   ...
> > >   l += 2.0;
> > > it will sometimes work and sometimes hang forever.
> > > Similarly atomic_compare_exchange with structs which contain padding
> > > (unions with padding bits are a lost case, there is nothing that can be
> > > reliably done for that, because we don't know at runtime what is the
> > > active union member if any). And _BitInt if it doesn't use all bits in
> > > all containing limbs has padding as well (and psABI doesn't say it is sign
> > > or zero extended).
> >
> > What is the problem here? In C, atomic_compare_exchange is defined in terms
> > of the memory content which includes padding. So it may fail spuriously
> > due to padding differences (but it may fail anyway for arbitrary reasons
> > even without padding differences), but then should work in the second
> > iteration.
>
> See https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0528r3.html
> for background.

Thanks. I have seen this at that time, but it seems to refer to
C++ specific problems. At least at that time, I concluded (maybe
incorrectly) that this is not a serious problem for how things
work in C.

> The thing is that user doesn't have much control over those
> padding bits, so whether _Atomic operations on long double (when it is 80
> bit and stores from hw actually store 10 bytes rather than 12 or 16), or
> _BitInt(37) or _BitInt(195) or struct S { char a; int b; }; then depend
> purely on luck. If the expected value is based on atomic_load on the
> atomic_compare_exchange location or whatever atomic_compare_exchange gave
> back, if in the loop one e.g. adds something to it, then again it might get
> different padding bits from what is originally in memory, so it isn't true
> that it will always succeed at least in the second loop iteration.

Sorry, somehow I must be missing something here.

If you add something you would create a new value and this may (in
an object) have random new padding. But the "expected" value should
be updated by a failed atomic_compare_exchange cycle and then have
same padding as the value stored in the atomic. So the next cycle
should succeed. The user would not change the representation of
the "expected" value but create a new value for another object
by adding something.

Martin
Re: _BitInt vs. _Atomic
On Fri, Jul 28, 2023 at 04:03:39PM +0200, Martin Uecker wrote:
> > On Thu, Jul 27, 2023 at 07:06:03PM +, Joseph Myers wrote:
> > > I think there should be tests for _Atomic _BitInt types. Hopefully
> > > atomic compound assignment just works via the logic for
> > > compare-and-exchange loops, but does e.g. atomic_fetch_add work
> > > with _Atomic _BitInt types?
> >
> > So, there are 2 issues.
> >
> > One is something I haven't seen being handled for C at all so far, but
> > handled for C++ - padding bits.
> >
> > Already e.g. x86 long double has some padding bits - 16 bits on ia32,
> > 48 bits on x86_64, when one does
> >   _Atomic long double l;
> >   ...
> >   l += 2.0;
> > it will sometimes work and sometimes hang forever.
> > Similarly atomic_compare_exchange with structs which contain padding
> > (unions with padding bits are a lost case, there is nothing that can be
> > reliably done for that, because we don't know at runtime what is the active
> > union member if any). And _BitInt if it doesn't use all bits in
> > all containing limbs has padding as well (and psABI doesn't say it is sign
> > or zero extended).
>
> What is the problem here? In C, atomic_compare_exchange is defined in terms
> of the memory content which includes padding. So it may fail spuriously
> due to padding differences (but it may fail anyway for arbitrary reasons
> even without padding differences), but then should work in the second
> iteration.

See https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0528r3.html
for background. The thing is that user doesn't have much control over those
padding bits, so whether _Atomic operations on long double (when it is 80
bit and stores from hw actually store 10 bytes rather than 12 or 16), or
_BitInt(37) or _BitInt(195) or struct S { char a; int b; }; then depend
purely on luck. If the expected value is based on atomic_load on the
atomic_compare_exchange location or whatever atomic_compare_exchange gave
back, if in the loop one e.g. adds something to it, then again it might get
different padding bits from what is originally in memory, so it isn't true
that it will always succeed at least in the second loop iteration.

	Jakub
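The padding figures quoted for x86 long double follow from simple arithmetic: the x87 extended format carries 80 value bits, stored in 12 bytes on ia32 (96 - 80 = 16) or 16 bytes on x86_64 (128 - 80 = 48). A rough sketch of that calculation, assuming that `LDBL_MANT_DIG == 64` identifies the 80-bit extended format (the function name is hypothetical):

```c
#include <assert.h>
#include <float.h>
#include <limits.h>
#include <stddef.h>

/* Returns the number of padding bits in long double if it looks like the
   80-bit x87 extended format, or -1 for any other format.
   ASSUMPTION: a 64-bit significand implies the 80-bit extended format. */
static int x87_long_double_padding_bits(void)
{
  if (LDBL_MANT_DIG != 64)
    return -1;
  int storage_bits = (int)(sizeof(long double) * CHAR_BIT);
  assert(storage_bits >= 80);
  return storage_bits - 80;
}
```

On targets where long double is IEEE binary64 or binary128 the function returns -1, since those formats use all of their storage bits.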
Re: _BitInt vs. _Atomic
On Friday, 28.07.2023 at 16:03 +0200, Martin Uecker wrote:
>
> > On Thu, Jul 27, 2023 at 07:06:03PM +, Joseph Myers wrote:
> > > I think there should be tests for _Atomic _BitInt types. Hopefully
> > > atomic compound assignment just works via the logic for
> > > compare-and-exchange loops, but does e.g. atomic_fetch_add work
> > > with _Atomic _BitInt types?
> >
> > So, there are 2 issues.
> >
> > One is something I haven't seen being handled for C at all so far, but
> > handled for C++ - padding bits.
> >
> > Already e.g. x86 long double has some padding bits - 16 bits on ia32,
> > 48 bits on x86_64, when one does
> >   _Atomic long double l;
> >   ...
> >   l += 2.0;
> > it will sometimes work and sometimes hang forever.
> > Similarly atomic_compare_exchange with structs which contain padding
> > (unions with padding bits are a lost case, there is nothing that can be
> > reliably done for that, because we don't know at runtime what is the active
> > union member if any). And _BitInt if it doesn't use all bits in
> > all containing limbs has padding as well (and psABI doesn't say it is sign
> > or zero extended).
>
> What is the problem here? In C, atomic_compare_exchange is defined in terms
> of the memory content which includes padding. So it may fail spuriously
> due to padding differences (but it may fail anyway for arbitrary reasons
> even without padding differences), but then should work in the second
> iteration.

(only the weak version can fail spuriously, but the strong one can still
fail if there are differences in the padding)
Re: _BitInt vs. _Atomic
> On Thu, Jul 27, 2023 at 07:06:03PM +, Joseph Myers wrote:
> > I think there should be tests for _Atomic _BitInt types.  Hopefully atomic
> > compound assignment just works via the logic for compare-and-exchange
> > loops, but does e.g. atomic_fetch_add work with _Atomic _BitInt types?
>
> So, there are 2 issues.
>
> One is something I haven't seen being handled for C at all so far, but
> handled for C++ - padding bits.
>
> Already e.g. x86 long double has some padding bits - 16 bits on ia32,
> 48 bits on x86_64, when one does
>   _Atomic long double l;
>   ...
>   l += 2.0;
> it will sometimes work and sometimes hang forever.
> Similarly atomic_compare_exchange with structs which contain padding
> (unions with padding bits are lost case, there is nothing that can be
> reliably done for that, because we don't know at runtime what is the active
> union member if any).  And _BitInt if it doesn't use all bits in
> all containing limbs has padding as well (and psABI doesn't say it is sign
> or zero extended).

What is the problem here?  In C, atomic_compare_exchange is defined in terms
of the memory content, which includes padding.  So it may fail spuriously
due to padding differences (but it may fail anyway for arbitrary reasons
even without padding differences), but then it should work in the second
iteration.

Martin
Re: _BitInt vs. _Atomic
On Thu, Jul 27, 2023 at 07:06:03PM +, Joseph Myers wrote:
> I think there should be tests for _Atomic _BitInt types.  Hopefully atomic
> compound assignment just works via the logic for compare-and-exchange
> loops, but does e.g. atomic_fetch_add work with _Atomic _BitInt types?

So, there are 2 issues.

One is something I haven't seen being handled for C at all so far, but
handled for C++ - padding bits.

Already e.g. x86 long double has some padding bits - 16 bits on ia32,
48 bits on x86_64.  When one does
  _Atomic long double l;
  ...
  l += 2.0;
it will sometimes work and sometimes hang forever.
The same goes for atomic_compare_exchange with structs which contain padding
(unions with padding bits are a lost cause, there is nothing that can be
reliably done for them, because we don't know at runtime what the active
union member is, if any).  And _BitInt, if it doesn't use all bits in
all containing limbs, has padding as well (and the psABI doesn't say it is
sign or zero extended).

The C++ way of dealing with this is using __builtin_clear_padding,
done on atomic stores/updates of the atomic memory (padding is cleared,
if there is any, on the value to be stored, or on the expected and desired
values).

I don't know enough about the C atomic requirements to say whether that is
feasible for C as well, or whether it is possible to make the padding bits
partially or fully set somehow non-atomically without invoking UB and then
make it never match.

If one ignores this or deals with it, then

  _Atomic _BitInt(15) a;
  _Atomic(_BitInt(15)) b;
  _Atomic _BitInt(115) c;
  _Atomic _BitInt(192) d;
  _Atomic _BitInt(575) e;
  _BitInt(575) f;

  int
  main ()
  {
    a += 1wb;
    b -= 2wb;
    c += 3wb;
    d += 4wb;
    e -= 5wb;
  //  f = __atomic_fetch_add (, 54342985743985743985743895743834298574985734895743895734895wb, __ATOMIC_SEQ_CST);
  }

compiles fine with the patch set.
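As an illustration of the padding-clearing idea described above, the following is a hand-written stand-in for one specific struct, not the __builtin_clear_padding builtin itself and not what the C++ library does internally. The name clear_padding_s is hypothetical. It rebuilds the object in a zeroed byte buffer so that all padding bytes become zero while the members keep their values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct S { char a; int b; };

/* Hypothetical helper, for illustration only: rewrite *s so that every
   padding byte is zero and the members a and b are unchanged.  */
static void clear_padding_s(struct S *s)
{
    unsigned char buf[sizeof *s] = {0};   /* all bytes, padding included, start as 0 */

    assert(s != NULL);
    memcpy(buf + offsetof(struct S, a), &s->a, sizeof s->a);
    memcpy(buf + offsetof(struct S, b), &s->b, sizeof s->b);
    memcpy(s, buf, sizeof *s);            /* copy the canonical representation back */
}
```

If both the value being stored and the expected/desired values of a compare-exchange are passed through such a step, two objects with equal members also have equal bytes, so a memcmp-based comparison no longer depends on stale padding contents.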
And another issue is that while __atomic_load, __atomic_store,
__atomic_exchange and __atomic_compare_exchange work on arbitrary _BitInt
sizes, others like __atomic_fetch_add only support _BitInt or other integral
types which have a size of 1, 2, 4, 8 or 16 bytes; other sizes emit an error
in c-family/c-common.cc (sync_resolve_size).  So, either
resolve_overloaded_builtin should, for the case when the pointer is a pointer
to a _BitInt whose size is not 1, 2, 4, 8 or 16 bytes, lower those into a
loop using __atomic_compare_exchange (or perhaps also if there is padding),
or should do that.

Thoughts on that?

	Jakub
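For reference, the kind of loop such a lowering would produce could look roughly like the sketch below. It assumes the generic GCC builtins __atomic_load and __atomic_compare_exchange; the function name fetch_add_via_cas and the typedef value_t are made up, and unsigned int is used only as a stand-in so that the example builds with compilers lacking _BitInt support. It does not address the padding question.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in type for illustration; the discussion is about _BitInt(N)
   types whose size is not 1, 2, 4, 8 or 16 bytes.  */
typedef unsigned int value_t;

/* Hypothetical helper: atomically add `addend` to *ptr using only load and
   compare-exchange, returning the value *ptr held before the addition.  */
static value_t fetch_add_via_cas(value_t *ptr, value_t addend)
{
    value_t expected, desired;

    assert(ptr);
    __atomic_load(ptr, &expected, __ATOMIC_RELAXED);
    do {
        desired = expected + addend;
        /* On failure the builtin overwrites `expected` with the current
           contents of *ptr, so the next iteration recomputes `desired`.  */
    } while (!__atomic_compare_exchange(ptr, &expected, &desired,
                                        true /* weak */,
                                        __ATOMIC_SEQ_CST, __ATOMIC_RELAXED));
    return expected;
}
```

With value_t x = 10, a call fetch_add_via_cas (&x, 5) returns 10 and leaves 15 in x. The weak form is used because the loop retries anyway.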