https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82677
Bug ID: 82677 Summary: Many projects (linux, coreutils, GMP, gcrypt, openSSL, etc) are misusing asm(divq/divl) etc, potentially resulting in faulty/unintended optimisations Product: gcc Version: 7.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: inline-asm Assignee: unassigned at gcc dot gnu.org Reporter: infinity0 at pwned dot gg Target Milestone: --- Created attachment 42439 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42439&action=edit Basic optimisation example, C-reduced from flint test code If I understood the issue correctly, this is strictly speaking not a GCC bug. But I'm filing this here so there is a central place to have the discussion. There is also the possibility of GCC developers discussing a central fix for this issue in GCC itself, rather than patching 20+ FOSS projects. Background ========== When using asm() it is vital that the dependencies are expressed correctly, otherwise GCC can optimise stuff in an unintended way. Those new to the topic (e.g. me a few days ago) can read: - https://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html for a nice intro - https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html for details The Intel instruction manual [0] for the "div" and "idiv" instructions states that: - "The CF, OF, SF, ZF, AF, and PF flags are undefined." - raises a #DE (Divide error) (#a) "If the source operand (divisor) is 0" and (#b) "If the quotient is too large for the designated register." Therefore, if you want to write a general-purpose "div" utility in GCC asm, that assumes nothing about its inputs or outputs, the correct syntax would be something along the lines of: __asm__ __volatile__("divq %4" : "=a"(q), "=d"(r) : "0"((ulong)n0), "1"((ulong)n1), "rm"((ulong)d) : "cc"); - __volatile__ because raising an exception is a "side-effect", i.e. affects something outside of the declared inputs and outputs, and - "cc" because it might clobber the FLAGS register. Note that __volatile__ is necessary even if the divisor "cannot be 0" in most cases. (Therefore, it is necessary for a general-purpose "div" utility.) For example, (##) if the code is unreachable for reasons unrelated to the division operands, then the instruction will never raise a DE, but if __volatile__ is not declared then GCC might optimise the code in a way such that the div operation *is* reachable sometimes when the divisor is zero. To demonstrate that example concretely, see udiv.c as a test case, where "if (g)" should prevent the div from running, but GCC 7.2 -O2 lifts the asm() out of the inner loop and the if(), causing a DE (SIGFPE) at run time. (udiv.c was minimised from the real-world example at [1] using C-Reduce [2] - which I couldn't have done this without.) One way to make __volatile__ *actually* unnecessary, is to do an explicit unconditional check (i.e. not based on a removeable macro) for divisor != 0 before the operation, and branch away if = 0. In this case, since divisor is an input to the asm() instruction, GCC will then know not to reorder it with respect to the branch. Also, I am not sure if this sort of unexpected SIGFPE could potentially result in a security problem. If so then perhaps the priority should be raised. [0] https://software.intel.com/en-us/articles/intel-sdm [1] https://github.com/fredrik-johansson/arb/issues/194 [2] https://embed.cs.utah.edu/creduce/ Problem ======= Lots of people copied the same code (longlong.h), with the same definition of udiv_qrnnd, into their own projects. [3] At the time of writing this includes: linux, grub2, coreutils, GMP, gcrypt, mpfr, and even older versions of GCC itself. The code seems to originate from GMP, but I didn't confirm this or investigate in great detail. Now technically it is possible to use this correctly, by doing an unconditional explicit check for divisor != 0 as mentioned above. However, this complexity is *not* documented in the description, e.g. [4], which even describes the *other* condition (#b) above, for the instruction not to raise an exception - "HIGH_NUMERATOR must be less than DENOMINATOR for correct operation.". Even if it was documented naively ("divisor must be zero"), one can imagine someone interpreting this as (##) above, which would still be incorrect and still cause faulty optimisations. Also, there is a fallback macro __udiv_qrnnd_c [5] defined which reimplements the instruction in C for platforms that don't have it, and in this case GCC knows that the C "/" operation can cause div-by-zero errors, and optimises everything as intended, as one can see by compiling the udiv-alt.c attached with -DNOOPT_UDIV. So this suggests that the original author of udiv_qrnnd was not aware of these issues either - since the correctness requirements between udiv_qrnnd and __udiv_qrnnd_c are different. [3] https://codesearch.debian.net/search?q=asm.*%5C%28.*%22i%3Fdiv%5Bql%5D%5C+*%25%5Cd+path%3Alonglong.h&perpkg=1 [4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/mpi/longlong.h?id=5ce3e312ec5c11abce13215be70700778bf601f0#n56 [5] https://gmplib.org/repo/gmp/file/tip/longlong.h#l2066 [6] https://codesearch.debian.net/search?q=asm.*%5C%28.*%22i%3Fdiv%5Bql%5D%5C+*%25%5Cd&perpkg=1 Here [6] is a quick search through Debian packages, of all potential instances of this issue. At the time of writing, a lot of these are affected by the issue described above, their code sharing history with the GMP version, [3] being a subset of [6]. Some projects correctly implement a generic div operation, using both __volatile__ and "cc": - https://boringssl.googlesource.com/boringssl/+/d1425f69df16310bdca46a3d66144dcb4e3ad4fc%5E!/ - https://github.com/ceph/ceph/commit/38e60deb5af10d5e2d8023a73a955080285b2760#diff-3501c244d08381cbfa48b804dfac9f09R412 Some projects were almost there: - https://github.com/GStreamer/gstreamer/commit/b0c1ebbd0883a9191d553a40b78facff8cb7c26e#diff-b162b68977c03c9a2029a6dd58860a50R267 // whoops, misses out "cc" - https://github.com/python/cpython/commit/a0346e56acc8c5371c7b6356e1862e6e01406dbb#diff-61d4a6a33e8afe74d982d4c887e55d41R226 // whoops, misses out "volatile" Moving forward ============== I pieced together the various parts of the puzzle over the last few days from talking to various people. There was no one single person who could tell me with certainty all of the details mentioned here, so I am not sure if I got everything correct - therefore it would be good if a GCC developer could confirm my analysis. Then I suppose we shall have to start filing bug reports and patches to these projects to get things fixed. Given that this is such a wide issue perhaps it might be good for GCC to emit a warning, or else automatically treat asm(div) as volatile unless explicitly marked as non-volatile and likewise for "cc". A topic for GCC developers - I'm not in a position to know if this is feasible or not.