Re: [PATCH 13/X] [libsanitizer][options] Add hwasan flags and argument parsing

2019-11-07 Thread Evgenii Stepanov via gcc-patches
Clang has a function level attribute,
  __attribute__((no_sanitize("hwaddress")))
a feature macro
  #if __has_feature(hwaddress_sanitizer)
and a blacklist section
  [hwaddress]
  https://clang.llvm.org/docs/SanitizerSpecialCaseList.html

I think it makes sense for the compiler to err on the side of not losing
information and provide distinct macros for these two sanitizers. If the
kernel does not care about the difference, they can add a simple #ifdef.
They would need to, anyway, because gcc does not have feature macros and
clang does not define __SANITIZE_ADDRESS__.


On Thu, Nov 7, 2019 at 7:51 AM Andrey Konovalov 
wrote:

> On Thu, Nov 7, 2019 at 1:48 PM Matthew Malcomson
>  wrote:
> >
> > On 05/11/2019 13:11, Andrey Konovalov wrote:
> > > On Tue, Nov 5, 2019 at 12:34 PM Matthew Malcomson
> > >  wrote:
> > >>
> > >> NOTE:
> > >> --
> > >> I have defined a new macro of __SANITIZE_HWADDRESS__ that gets
> > >> automatically defined when compiling with hwasan.  This is analogous
> to
> > >> __SANITIZE_ADDRESS__ which is defined when compiling with asan.
> > >>
> > >> Users in the kernel have expressed an interest in using
> > >> __SANITIZE_ADDRESS__ for both
> > >> (
> https://lists.infradead.org/pipermail/linux-arm-kernel/2019-October/690703.html
> ).
> > >>
> > >> One approach to do this could be to define __SANITIZE_ADDRESS__ with
> > >> different values depending on whether we are compiling with hwasan or
> > >> asan.
> > >>
> > >> Using __SANITIZE_ADDRESS__ for both means that code like the kernel
> > >> which wants to treat the two sanitizers as alternate implementations
> of
> > >> the same thing gets that automatically.
> > >>
> > >> My preference is to use __SANITIZE_HWADDRESS__ since that means any
> > >> existing code will not be predicated on this (and hence I guess less
> > >> surprises), but would appreciate feedback on this given the point
> above.
> > >
> > > +Evgenii Stepanov
> > >
> > > (A repost from my answer from the mentioned thread):
> > >
> > >> Similarly, I'm thinking I'll add no_sanitize_hwaddress as the hwasan
> > >> equivalent of no_sanitize_address, which will require an update in the
> > >> kernel given it seems you want KASAN to be used the same whether using
> > >> tags or not.
> > >
> > > We have intentionally reused the same macros to simplify things. Is
> > > there any reason to use separate macros for GCC? Are there places
> > > where we need to use specifically no_sanitize_hwaddress and
> > > __SANITIZE_HWADDRESS__, but not no_sanitize_address and
> > > __SANITIZE_ADDRESS__?
> > >
> > >
> >
> > I've just looked through some open source repositories (via github
> > search) that used the existing __SANITIZE_ADDRESS__ macro.
> >
> > There are a few repos that would want to use a feature macro for hwasan
> > or asan in the exact same way as each other, but of the 31 truly
> > different uses I found, 11 look like they would need to distinguish
> > between hwasan and asan (where 4 uses I found I couldn't easily tell)
> >
> > NOTE
> > - This is a count of unique uses, ignoring those repos which use a file
> > from another repo.
> > - I'm just giving links to the first of the relevant kind that I found,
> > not putting effort into finding the "canonical" source of each
> repository.
> >
> >
> > Places that need distinction (and their reasons):
> >
> > There are quite a few that use the ASAN_POISON_MEMORY_REGION and
> > ASAN_UNPOISON_MEMORY_REGION macros to poison/unpoison memory themselves.
> >   This abstraction doesn't quite make sense in a hwasan environment, as
> > there is not really a "poisoned/unpoisoned" concept.
> >
> > https://github.com/laurynas-biveinis/unodb
> > https://github.com/darktable-org/rawspeed
> > https://github.com/MariaDB/server
> > https://github.com/ralfbrown/framepac-ng
> > https://github.com/peters/aom
> > https://github.com/pspacek/knot-resolver-docker-fix
> > https://github.com/harikrishnan94/sheap
> >
> >
> > Some use it to record their compilation "type" as `-fsanitize=address`
> > https://github.com/wallix/redemption
> >
> > Or to decide to set the environment variable ASAN_OPTIONS
> > https://github.com/dephonatine/VBox5.2.18
> >
> > Others worry about stack space due to asan's redzones (hwasan has a much
> > smaller stack memory overhead).
> > https://github.com/fastbuild/fastbuild
> > https://github.com/scylladb/seastar
> > (n.b. seastar has a lot more conditioned code that would be the same
> > between asan and hwasan).
> >
> >
> > Each of these needs to know the difference between compiling with asan
> > and hwasan, so I'm confident that having some way to determine that in
> > the source code is a good idea.
> >
> >
> > I also believe there could be code in the wild that would need to
> > distinguish between hwasan and asan where the existence of tags could be
> > problematic:
> >
> > - code already using the top-byte-ignore feature may be able to be used
> > with asan but not hwasan.
> > - Code that makes assumptions about pointer 

Re: Question on direction of GCC support for HWASAN.

2019-09-24 Thread Evgenii Stepanov via gcc-patches
On Tue, Sep 24, 2019 at 9:36 AM Szabolcs Nagy  wrote:
>
> On 23/09/2019 08:52, Martin Liška wrote:
> > On 9/20/19 7:11 PM, Matthew Malcomson wrote:
> >> The implementation is unlikely to be production-quality since
> >> development on libhwasan is only on its `platform` ABI.  This libhwasan
> >> ABI requires changes to the system libc so that it calls into libhwasan
> >> on interesting events.
> >> I haven't looked into adding these changes to glibc, but expect that
> >> most people running a Linux distribution would not want to install a
> >> special glibc to use this sanitizer.
> >
> > Can you please provide a link about what special one needs in glibc
> > to support HWASAN?
>
> i don't know if there is such a link other than taking
> a hint from the internal api in the source
> https://github.com/llvm/llvm-project/blob/master/compiler-rt/lib/hwasan/hwasan_interface_internal.h
>
> memory has to be (un)tagged on (de)allocation, which
> requires libc help to know the limits and when the
> (de)allocation happens in case of tls/stack memory
> (e.g. dealloced at unwind, longjmp, setcontext, thread
> exit, thread cancel, child exit after vfork) and in
> case of global data in dynamically loaded shared libs.

This is a slightly better link, but it misses
__hwasan_library_(load|unload) hooks:
https://github.com/llvm/llvm-project/blob/master/compiler-rt/include/sanitizer/hwasan_interface.h

You can also search bionic source for __hwasan and __sanitizer:
https://android.googlesource.com/platform/bionic/+/refs/heads/master


Re: [Patch 0/X] [WIP][RFC][libsanitizer] Introduce HWASAN to GCC

2019-09-11 Thread Evgenii Stepanov via gcc-patches
On Wed, Sep 11, 2019 at 9:37 AM Matthew Malcomson
 wrote:
>
> On 11/09/19 12:53, Martin Liška wrote:
> > On 9/9/19 5:54 PM, Matthew Malcomson wrote:
> >> On 09/09/19 11:47, Martin Liška wrote:
> >>> On 9/6/19 4:46 PM, Matthew Malcomson wrote:
>  Hello,
> 
> >> As I understand it, `hwasan-abi=interceptor` vs `platform` is about
> >> adding such MTE emulation for "application code" or "platform code (e.g.
> >> kernel)" respectively.
> >
> > Hm, are you sure? Clang also uses -fsanitize=kernel-hwaddress which should
> > be equivalent to kernel-address for -fsanitize=address.
> >
>
> I'm not at all sure it's to do with the kernel ;-}
>
> Here's the commit that adds the flag.
> https://reviews.llvm.org/D56038
>
>  From the commit message it seems the point is to distinguish between
> running on runtimes that natively support HWASAN (named the "platform"
> abi) and those where functions like malloc and pthread_create have to be
> intercepted (named the "interceptor" abi).
>
> I had assumed that targeting the kernel would be in the "platform"
> group, but it could easily not be the case.
>
> Considering the message form the below commit it seems that this is more
> targeted at instrumenting things like libc https://reviews.llvm.org/D50922.

With hwasan we tried a different approach from asan: instead of
intercepting libc we build it with sanitizer instrumentation, and rely
on a few hooks to update internal state of the tool on interesting
events, such as process startup, thread creation and destruction,
stack unwind (longjmp, vfork). This effectively puts hwasan _below_
libc (as in libc depends on libhwasan).

It has worked amazingly well for Android, where we aim to sanitize
most of platform code at once. Ex. ASan has this requirement that the
main executable needs to be built with ASan before any of the
libraries could - otherwise the tool will not be able to interpose
malloc/free symbols. As a consequence, when there are binaries that
can not be sanitized for any reason, we need to keep unsanitized
copies of all their transitive dependencies, and that turns into a
huge build/deployment mess. Hwasan approach avoids this problem by
making sure that the allocator is always there (because everything
depends on libc).

The downside, of course, is that this can not be used to sanitize a
single binary without a specially built libc. Hence the "interceptor"
ABI, which was an attempt to support running hwasan-instrumented
applications on regular, non-hwasan devices. We are not developing
this mode any longer, but it is used to run compiler-rt tests on
aarch64-android.

> I'm currently working on writing down the questions I plan to ask the
> developers of HWASAN in LLVM, I'll put this on the list :-)
>
> >>
> >>>
> >> There's an even more fundamental problem of accesses within the
> >> instrumented binary -- I haven't yet figured out how to remove the tag
> >> before accesses on architectures without the AArch64 TBI feature.
> >
> > Which should platforms like x86_64, right?
>
> Yes.
> As yet I haven't gotten anything working for architectures without TBI
> (everything except AArch64).
> This particular problem was one I was hoping for suggestions around (my
> first of the questions in my cover letter).

We have support for hwasan on x86_64 in LLVM (by removing tags before
accesses), but it is not really practical because any library built
without instrumentation is a big source of false positives. Even, say,
libc++/libstdc++. We use it exclusively for tests.

> 
>  The current patch series is far from complete, but I'm posting the 
>  current state
>  to provide something to discuss at the Cauldron next week.
> 
>  In its current state, this sanitizer only works on AArch64 with a custom 
>  kernel
>  to allow tagged pointers in system calls.  This is discussed in the 
>  below link
>  https://source.android.com/devices/tech/debug/hwasan -- the custom 
>  kernel allows
>  tagged pointers in syscalls.
> >>>
> >>> Can you be please more specific. Is the MTE in upstream linux kernel? If 
> >>> so,
> >>> starting from which version?
> >>
> >> I find I can only make complicated statements remotely clear in bullet
> >> points ;-)
> >>
> >> What I was trying to say was:
> >> - HWASAN from this patch series requires AArch64 TBI.
> >> (I have not handled architectures without TBI)
> >> - The upstream kernel does not accept tagged pointers in syscalls.
> >> (programs that use TBI must currently clear tags before passing
> >>  pointers to the kernel)
> >
> > I know that in case of ASAN, the libasan provides wrappers (interceptors) 
> > for various glibc
> > functions that are often system calls. Similar wrappers are probably used 
> > in HWASAN
> > and so that one can create the memory pointer tags.
> >
> >> - This patch series doesn't include any way to avoid passing tagged
> >> pointers to syscalls.
> >
> > I bet LLVM has the same problem so I would expect