https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91053

            Bug ID: 91053
           Summary: __builtin___clear_cache can fail
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libgcc
          Assignee: unassigned at gcc dot gnu.org
          Reporter: oth+gccbugs at google dot com
  Target Milestone: ---

This issue arises from using libgcc in the Android Runtime's Just-In-Time
compiler. The Android Runtime uses __buitin__clear_cache() for JIT cache
maintenance and we’ve become aware that this builtin can silently fail. This
leaves the CPU caches in an unknown state potentially leading to a crash from
the execution of unintended instruction sequences.

The specific case where we've observed this failure is for devices with ARMv7
Linux kernels. The libgcc clear_cache builtin calls the cacheflush() system
call which bottoms out in v7_coherent_user_range():

  https://github.com/torvalds/linux/blob/master/arch/arm/mm/cache-v7.S#L253

The important detail in that code is that the blocks of code within USER()
macros will call the fault handler, labelled 9001, if the cache operation
causes a fault. The return value from v7_coherent_user_range is then -EFAULT.
When this happens after code updates in the JIT cache, crashes can be expected.
For example, if we didn't manage to invalidate all of the instruction cache
range, and thus executes a mix-and-match of old and new instructions.

This issue is quite hard to reproduce and typically only occurs under memory
pressure.

The documentation for __builtin___clear_cache() does not comment on the
possibilities of failures:

  https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html

but this is documented behaviour of the cacheflush() system call:

  http://man7.org/linux/man-pages/man2/cacheflush.2.html

Potential fixes could include:

  1) no change to __builtin__clear_cache, but update the documentation to
indicate that failure is possible on systems where cache flushing is a
privileged operation. On these systems callers of the builtin should clear
errno before the call and check it afterwards.

  2) change __builtin___clear_cache() to return an error code.

  3) a fix for Linux kernels so cacheflush() cannot fail. [Outside the scope of
this bug.]

Our workaround for now is to special case on ARMv7 to use the cacheflush()
system call directly and check for errors (http://r.android.com/989545).

Reply via email to