Re: [Patch 0/X] [WIP][RFC][libsanitizer] Introduce HWASAN to GCC

2019-10-24 Thread Martin Liška
On 10/23/19 1:01 PM, Matthew Malcomson wrote:
> Hi Martin,

Hello.

> 
> I'm getting close to putting up a patch series that I believe could go 
> in before stage1 close.
> 
> I currently have to do testing on sanitizing the kernel, and track down 
> a bootstrap comparison diff in the code handling shadow-stack cleanup 
> during exception unwinding.
> 
> I just thought I'd answer these questions below to see if there's 
> anything extra I could do to make reviewing easier.

I welcome that approach.

> 
> On 23/09/19 09:02, Martin Liška wrote:
>> Hi.
>>
>> As mentioned in the next email thread, there are main objectives
>> that will help me to make a proper patch review:
>>
>> 1) Make first libsanitizer merge from trunk, it will remove the need
>> of the backports that you made. Plus I will be able to apply the
>> patchset on the current master.
> Done
>> 2) I would exclude the setjmp/longjmp - these should be upstreamed first
>> in libsanitizer.
> 
> Will exclude in the patch series, upstreaming in progress
> (https://reviews.llvm.org/D69045)
> 
>> 3) I would like to see two HWASAN options that will clearly separate the
>> 2 supported modes: TBI without MTE and MTE. Here I would appreciate having
>> a compiler farm machine with TBI which we can use for testing.
> 
> I went back and looked at clang to see that it uses 
> `-fsanitize=hwaddress` and `-fsanitize=memtag`, which are completely 
> different options.
> 
> I'm now doing the same, with the two sanitizers just using similar code 
> paths.
> 
> In fact, I'm not going to have the MTE instrumentation ready by the end 
> of stage1, so my aim is to just put the `-fsanitize=hwaddress` sanitizer 
> in, but send some outline code to the mailing list to demonstrate how 
> `-fsanitize=memtag` would fit in.

As well here. That will make it easier to merge -fsanitize=hwaddress first.

> 
> 
> ## w.r.t. a compiler farm machine with TBI
> 
> Any AArch64 machine has this feature.  However in order to use the 
> sanitizer the kernel needs to allow "tagged pointers" in syscalls.

If so, then it will be very easy to grab a machine and run a 5.4 kernel on it.
So I will be able to test the patches.

> 
> The kernel has allowed these tagged pointers in syscalls (once it's been 
> turned on with a relevant prctl) in mainline since 5.4-rc1 (i.e. the 
> start of this month).
> 
> My testing has been on a virtual machine with a mainline kernel built 
> from source.
> 
> Given that, I'm not sure how you want to proceed.
> Could we set up a virtual machine on the compiler farm?
> 
> 
>> 4) About the BUILTIN expansion: you provided a patch for a couple of them.
>> My question is whether the list is complete?
> 
> The list of BUILTINs was nowhere near complete at the time I posted the 
> RFC patches.
> 
> Since then I've added features and correspondingly added BUILTINs.
> 
> Now I believe I've added to sanitizer.def all the BUILTINs this
> sanitizer will need.
> 
>> 5) I would appreciate the patch set to be split into smaller logical parts, e.g.
>> libsanitizer changes; option introduction; stack variable handling
>> (colour/uncolour/alignment); hwasan pass and other GIMPLE-related changes;
>> RTL hooks, new RTL instructions and expansion changes.
>>
> 
> Will do!

Great.

Thanks,
Martin

> 
>> Thank you,
>> Martin
>> 
>>
> 



Re: [Patch 0/X] [WIP][RFC][libsanitizer] Introduce HWASAN to GCC

2019-10-23 Thread Matthew Malcomson
Hi Martin,

I'm getting close to putting up a patch series that I believe could go 
in before stage1 close.

I currently have to do testing on sanitizing the kernel, and track down 
a bootstrap comparison diff in the code handling shadow-stack cleanup 
during exception unwinding.

I just thought I'd answer these questions below to see if there's 
anything extra I could do to make reviewing easier.

On 23/09/19 09:02, Martin Liška wrote:
> Hi.
> 
> As mentioned in the next email thread, there are main objectives
> that will help me to make a proper patch review:
> 
> 1) Make first libsanitizer merge from trunk, it will remove the need
> of the backports that you made. Plus I will be able to apply the
> patchset on the current master.
Done
> 2) I would exclude the setjmp/longjmp - these should be upstreamed first
> in libsanitizer.

Will exclude in the patch series, upstreaming in progress
(https://reviews.llvm.org/D69045)

> 3) I would like to see two HWASAN options that will clearly separate the
> 2 supported modes: TBI without MTE and MTE. Here I would appreciate having
> a compiler farm machine with TBI which we can use for testing.

I went back and looked at clang to see that it uses 
`-fsanitize=hwaddress` and `-fsanitize=memtag`, which are completely 
different options.

I'm now doing the same, with the two sanitizers just using similar code 
paths.

In fact, I'm not going to have the MTE instrumentation ready by the end 
of stage1, so my aim is to just put the `-fsanitize=hwaddress` sanitizer 
in, but send some outline code to the mailing list to demonstrate how 
`-fsanitize=memtag` would fit in.


## w.r.t. a compiler farm machine with TBI

Any AArch64 machine has this feature.  However in order to use the 
sanitizer the kernel needs to allow "tagged pointers" in syscalls.

The kernel has allowed these tagged pointers in syscalls (once it's been 
turned on with a relevant prctl) in mainline since 5.4-rc1 (i.e. the 
start of this month).
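
For reference, a minimal sketch of what that opt-in looks like from user space, assuming the `PR_SET_TAGGED_ADDR_CTRL` / `PR_TAGGED_ADDR_ENABLE` prctl interface from the 5.4 tagged address ABI. The helper name `enable_tagged_addr_abi` is made up for illustration, and the fallback constant definitions are only there for older headers.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/prctl.h>

// Fallback definitions for headers that predate Linux 5.4.
#ifndef PR_SET_TAGGED_ADDR_CTRL
#define PR_SET_TAGGED_ADDR_CTRL 55
#endif
#ifndef PR_TAGGED_ADDR_ENABLE
#define PR_TAGGED_ADDR_ENABLE (1UL << 0)
#endif

// Hypothetical helper: ask the kernel to accept tagged pointers in
// syscall arguments for the calling thread.
// Returns 0 on success, otherwise the errno value reported by prctl
// (e.g. EINVAL on kernels or architectures without this ABI).
inline int enable_tagged_addr_abi() {
  // The unused arguments are passed as explicit zero unsigned longs.
  if (prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE, 0UL, 0UL, 0UL) == 0)
    return 0;
  return errno;
}
```

A test harness would presumably call this once at startup and skip the hwasan tests when it returns non-zero.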

My testing has been on a virtual machine with a mainline kernel built 
from source.

Given that, I'm not sure how you want to proceed.
Could we set up a virtual machine on the compiler farm?


> 4) About the BUILTIN expansion: you provided a patch for a couple of them.
> My question is whether the list is complete?

The list of BUILTINs was nowhere near complete at the time I posted the 
RFC patches.

Since then I've added features and correspondingly added BUILTINs.

Now I believe I've added to sanitizer.def all the BUILTINs this
sanitizer will need.

> 5) I would appreciate the patch set to be split into smaller logical parts, e.g.
> libsanitizer changes; option introduction; stack variable handling
> (colour/uncolour/alignment); hwasan pass and other GIMPLE-related changes;
> RTL hooks, new RTL instructions and expansion changes.
> 

Will do!

> Thank you,
> Martin
> 
> 



Re: [Patch 0/X] [WIP][RFC][libsanitizer] Introduce HWASAN to GCC

2019-09-23 Thread Martin Liška
Hi.

As mentioned in the next email thread, there are main objectives
that will help me to make a proper patch review:

1) Make first libsanitizer merge from trunk, it will remove the need
   of the backports that you made. Plus I will be able to apply the
   patchset on the current master.
2) I would exclude the setjmp/longjmp - these should be upstreamed first
   in libsanitizer.
3) I would like to see two HWASAN options that will clearly separate the
   2 supported modes: TBI without MTE and MTE. Here I would appreciate having
   a compiler farm machine with TBI which we can use for testing.
4) About the BUILTIN expansion: you provided a patch for a couple of them.
   My question is whether the list is complete?
5) I would appreciate the patch set to be split into smaller logical parts, e.g.
   libsanitizer changes; option introduction; stack variable handling
   (colour/uncolour/alignment); hwasan pass and other GIMPLE-related changes;
   RTL hooks, new RTL instructions and expansion changes.

Thank you,
Martin
   


Re: [Patch 0/X] [WIP][RFC][libsanitizer] Introduce HWASAN to GCC

2019-09-11 Thread Evgenii Stepanov via gcc-patches
On Wed, Sep 11, 2019 at 9:37 AM Matthew Malcomson wrote:
>
> On 11/09/19 12:53, Martin Liška wrote:
> > On 9/9/19 5:54 PM, Matthew Malcomson wrote:
> >> On 09/09/19 11:47, Martin Liška wrote:
> >>> On 9/6/19 4:46 PM, Matthew Malcomson wrote:
>  Hello,
> 
> >> As I understand it, `hwasan-abi=interceptor` vs `platform` is about
> >> adding such MTE emulation for "application code" or "platform code (e.g.
> >> kernel)" respectively.
> >
> > Hm, are you sure? Clang also uses -fsanitize=kernel-hwaddress which should
> > be equivalent to kernel-address for -fsanitize=address.
> >
>
> I'm not at all sure it's to do with the kernel ;-}
>
> Here's the commit that adds the flag.
> https://reviews.llvm.org/D56038
>
>  From the commit message it seems the point is to distinguish between
> running on runtimes that natively support HWASAN (named the "platform"
> abi) and those where functions like malloc and pthread_create have to be
> intercepted (named the "interceptor" abi).
>
> I had assumed that targeting the kernel would be in the "platform"
> group, but it could easily not be the case.
>
> Considering the message from the below commit it seems that this is more
> targeted at instrumenting things like libc https://reviews.llvm.org/D50922.

With hwasan we tried a different approach from asan: instead of
intercepting libc we build it with sanitizer instrumentation, and rely
on a few hooks to update internal state of the tool on interesting
events, such as process startup, thread creation and destruction,
stack unwind (longjmp, vfork). This effectively puts hwasan _below_
libc (as in libc depends on libhwasan).

It has worked amazingly well for Android, where we aim to sanitize
most of platform code at once. Ex. ASan has this requirement that the
main executable needs to be built with ASan before any of the
libraries could - otherwise the tool will not be able to interpose
malloc/free symbols. As a consequence, when there are binaries that
can not be sanitized for any reason, we need to keep unsanitized
copies of all their transitive dependencies, and that turns into a
huge build/deployment mess. Hwasan approach avoids this problem by
making sure that the allocator is always there (because everything
depends on libc).

The downside, of course, is that this can not be used to sanitize a
single binary without a specially built libc. Hence the "interceptor"
ABI, which was an attempt to support running hwasan-instrumented
applications on regular, non-hwasan devices. We are not developing
this mode any longer, but it is used to run compiler-rt tests on
aarch64-android.

> I'm currently working on writing down the questions I plan to ask the
> developers of HWASAN in LLVM, I'll put this on the list :-)
>
> >>
> >>>
> >> There's an even more fundamental problem of accesses within the
> >> instrumented binary -- I haven't yet figured out how to remove the tag
> >> before accesses on architectures without the AArch64 TBI feature.
> >
> > Which should be platforms like x86_64, right?
>
> Yes.
> As yet I haven't gotten anything working for architectures without TBI
> (everything except AArch64).
> This particular problem was one I was hoping for suggestions around (my
> first of the questions in my cover letter).

We have support for hwasan on x86_64 in LLVM (by removing tags before
accesses), but it is not really practical because any library built
without instrumentation is a big source of false positives. Even, say,
libc++/libstdc++. We use it exclusively for tests.
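
To make "removing tags before accesses" concrete, here is a rough sketch of the idea. It is not the code LLVM actually emits. It assumes the tag lives in the top 8 bits of a 64-bit pointer (as with AArch64 TBI), and the helper names are made up for the example.

```cpp
#include <cassert>
#include <cstdint>

// Assumption for this sketch: 8-bit tag in bits 56..63 of the address.
constexpr unsigned kTagShift = 56;
constexpr std::uint64_t kTagMask = 0xffULL << kTagShift;

// Hypothetical helpers, not libhwasan API.
constexpr std::uint64_t add_tag(std::uint64_t addr, std::uint8_t tag) {
  return (addr & ~kTagMask) | (static_cast<std::uint64_t>(tag) << kTagShift);
}
constexpr std::uint8_t get_tag(std::uint64_t tagged) {
  return static_cast<std::uint8_t>(tagged >> kTagShift);
}
constexpr std::uint64_t strip_tag(std::uint64_t tagged) {
  return tagged & ~kTagMask;
}

// Without TBI the hardware would fault on a tagged address, so every
// instrumented load has to go through the untagged address.
inline int load_int(std::uint64_t tagged) {
  return *reinterpret_cast<const int *>(
      static_cast<std::uintptr_t>(strip_tag(tagged)));
}
```

Code built without instrumentation has no equivalent of `strip_tag` or the tag bookkeeping, which is where mixing it with instrumented code gets awkward.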

> 
>  The current patch series is far from complete, but I'm posting the
>  current state to provide something to discuss at the Cauldron next week.
> 
>  In its current state, this sanitizer only works on AArch64 with a custom
>  kernel to allow tagged pointers in system calls.  This is discussed in the
>  below link https://source.android.com/devices/tech/debug/hwasan -- the
>  custom kernel allows tagged pointers in syscalls.
> >>>
> >>> Can you please be more specific? Is the MTE in the upstream Linux kernel?
> >>> If so, starting from which version?
> >>
> >> I find I can only make complicated statements remotely clear in bullet
> >> points ;-)
> >>
> >> What I was trying to say was:
> >> - HWASAN from this patch series requires AArch64 TBI.
> >> (I have not handled architectures without TBI)
> >> - The upstream kernel does not accept tagged pointers in syscalls.
> >> (programs that use TBI must currently clear tags before passing
> >>  pointers to the kernel)
> >
> > I know that in the case of ASAN, libasan provides wrappers (interceptors)
> > for various glibc functions that are often system calls. Similar wrappers
> > are probably used in HWASAN so that one can create the memory pointer tags.
> >
> >> - This patch series doesn't include any way to avoid passing tagged
> >> pointers to syscalls.
> >
> > I bet LLVM has the same problem so I would expect 

Re: [Patch 0/X] [WIP][RFC][libsanitizer] Introduce HWASAN to GCC

2019-09-11 Thread Matthew Malcomson
On 11/09/19 12:53, Martin Liška wrote:
> On 9/9/19 5:54 PM, Matthew Malcomson wrote:
>> On 09/09/19 11:47, Martin Liška wrote:
>>> On 9/6/19 4:46 PM, Matthew Malcomson wrote:
 Hello,

>> As I understand it, `hwasan-abi=interceptor` vs `platform` is about
>> adding such MTE emulation for "application code" or "platform code (e.g.
>> kernel)" respectively.
> 
> Hm, are you sure? Clang also uses -fsanitize=kernel-hwaddress which should
> be equivalent to kernel-address for -fsanitize=address.
> 

I'm not at all sure it's to do with the kernel ;-}

Here's the commit that adds the flag.
https://reviews.llvm.org/D56038

 From the commit message it seems the point is to distinguish between 
running on runtimes that natively support HWASAN (named the "platform" 
abi) and those where functions like malloc and pthread_create have to be 
intercepted (named the "interceptor" abi).

I had assumed that targeting the kernel would be in the "platform" 
group, but it could easily not be the case.

Considering the message from the below commit it seems that this is more
targeted at instrumenting things like libc https://reviews.llvm.org/D50922.

I'm currently working on writing down the questions I plan to ask the 
developers of HWASAN in LLVM, I'll put this on the list :-)

>>
>>>
>> There's an even more fundamental problem of accesses within the
>> instrumented binary -- I haven't yet figured out how to remove the tag
>> before accesses on architectures without the AArch64 TBI feature.
> 
> Which should be platforms like x86_64, right?

Yes.
As yet I haven't gotten anything working for architectures without TBI 
(everything except AArch64).
This particular problem was one I was hoping for suggestions around (my 
first of the questions in my cover letter).


 The current patch series is far from complete, but I'm posting the current
 state to provide something to discuss at the Cauldron next week.

 In its current state, this sanitizer only works on AArch64 with a custom
 kernel to allow tagged pointers in system calls.  This is discussed in the
 below link https://source.android.com/devices/tech/debug/hwasan -- the
 custom kernel allows tagged pointers in syscalls.
>>>
>>> Can you please be more specific? Is the MTE in the upstream Linux kernel?
>>> If so, starting from which version?
>>
>> I find I can only make complicated statements remotely clear in bullet
>> points ;-)
>>
>> What I was trying to say was:
>> - HWASAN from this patch series requires AArch64 TBI.
>> (I have not handled architectures without TBI)
>> - The upstream kernel does not accept tagged pointers in syscalls.
>> (programs that use TBI must currently clear tags before passing
>>  pointers to the kernel)
> 
> I know that in the case of ASAN, libasan provides wrappers (interceptors)
> for various glibc functions that are often system calls. Similar wrappers
> are probably used in HWASAN so that one can create the memory pointer tags.
> 
>> - This patch series doesn't include any way to avoid passing tagged
>> pointers to syscalls.
> 
> I bet LLVM has the same problem so I would expect it to be handled in the
> interceptors.
> 

I'm pretty sure this problem hasn't been solved with interceptors.

The android page describing hwasan specifically mentions the requirement 
of a Linux kernel accepting tagged pointers, and I believe this is the 
most supported environment.

https://source.android.com/devices/tech/debug/hwasan
"HWASan requires the Linux kernel to accept tagged pointers in system 
call arguments."

Also, there are surprisingly few interceptors defined in libhwasan.

Thanks,
Matthew

>> - Hence in order to test the sanitizer I'm using a kernel that has been
>> patched to accept tagged pointers in many syscalls.
>> - The link to the android.com site is just another source describing the
>> same requirement.
>>
>>
>> The support for the relaxed ABI (of accepting tagged pointers in various
>> syscalls in the kernel) is being discussed on the kernel mailing list,
>> the latest patchset I know of is here:
>> https://lkml.org/lkml/2019/7/25/725
> 
> Thanks for the pointer.
> 


Re: [Patch 0/X] [WIP][RFC][libsanitizer] Introduce HWASAN to GCC

2019-09-11 Thread Martin Liška
On 9/9/19 5:54 PM, Matthew Malcomson wrote:
> On 09/09/19 11:47, Martin Liška wrote:
>> On 9/6/19 4:46 PM, Matthew Malcomson wrote:
>>> Hello,
>>>
>>> This patch series is a WORK-IN-PROGRESS towards porting the LLVM hardware
>>> address sanitizer (HWASAN) to GCC.  The document describing HWASAN can be
>>> found here:
>>> http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html.
>>
>> Hello.
>>
>> I'm happy that you are working on the functionality for GCC and I can provide
>> the knowledge that I have of ASAN. I briefly read the patch series and I have
>> multiple questions (and observations):
>>
>> 1) Is the ambition of the patchset to be a software emulation of MTE that can
>> work on targets that do not support MTE? Is it what clang
>> names hwasan-abi=interceptor?
> 
> The ambition is to provide a software emulation of MTE for AArch64 
> targets that don't support MTE.

Hello.

It would also be great to provide the emulation on targets that do not provide
TBI (like x86_64).

> I also hope to have the framework set up so that enabling for other 
> architectures is relatively easy and can be done by those interested.
> 
> As I understand it, `hwasan-abi=interceptor` vs `platform` is about 
> adding such MTE emulation for "application code" or "platform code (e.g. 
> kernel)" respectively.

Hm, are you sure? Clang also uses -fsanitize=kernel-hwaddress which should
be equivalent to kernel-address for -fsanitize=address.

> 
>>
>> 2) Do you have real aarch64 hardware that has MTE support? Would it be
>> possible in the future to give such a machine to the GCC Compile Farm for
>> testing purposes?
> 
> No, our team doesn't have real MTE hardware. I have been testing on an
> AArch64 machine that has TBI; other work in the team that requires MTE
> support is being tested on the Arm "Fast Models" emulator.
> 
>>
>> 3) I like the idea of sharing internal functions like
>> ASAN_CHECK/HWASAN_CHECK. We should benefit from that in the future.
>>
>> 4) Am I correct that due to the escape of "tagged" pointers, one needs to have
>> an entire DSO (dynamic shared object) built with hwasan enabled? Otherwise, a
>> dereference of a tagged pointer will lead to a segfault (except with the TBI
>> feature on aarch64)?
> 
> 
> Yes, one needs to take pains to avoid the escape of tagged pointers on 
> architectures other than AArch64.

Which is the very same pain which MPX was suffering from, before it was dropped
in GCC :)

> 
> I don't believe that compiling the entire DSO with HWASAN enabled is 
> enough, since pointers can be passed across DSO boundaries.
> I haven't yet looked into how to handle this.
> 
> There's an even more fundamental problem of accesses within the 
> instrumented binary -- I haven't yet figured out how to remove the tag 
> before accesses on architectures without the AArch64 TBI feature.

Which should be platforms like x86_64, right?

> 
> 
>>
>> 5) Is there documentation/a definition of what shadow memory for memory
>> tagging looks like?
>> Is it similar to ASAN, where one can get the tag with:
>> u8 memory_tag = *((PTR >> TG) + SHADOW_OFFSET) & 0xf?
>>
> 
> Yes, it's similar.
> 
>  From the libhwasan code, the function to fetch a pointer to the shadow 
> memory byte corresponding to a memory address is MemToShadow.
> 
> constexpr uptr kShadowScale = 4;
> inline uptr MemToShadow(uptr untagged_addr) {
>   return (untagged_addr >> kShadowScale) +
>          __hwasan_shadow_memory_dynamic_address;
> }
> 
> https://github.com/llvm-mirror/compiler-rt/blob/99ce9876124e910475c627829bf14326b8073a9d/lib/hwasan/hwasan_mapping.h#L42
> 
> 
>> 6) Note that things like memtag_tag_size, memtag_granule_size define an ABI
>> of libsanitizer
>>
> 
> Yes, the sizes of these values define an ABI.
> 
> Those particular hooks are added as a demonstration for how something 
> like MTE would be implemented on top of this framework (where the 
> backend would specify the tag and granule size to match their target's
> architecture).
> 
> HWASAN itself would use the hard-coded tag and granule size that matches 
> what libsanitizer uses.
> https://github.com/llvm-mirror/compiler-rt/blob/99ce9876124e910475c627829bf14326b8073a9d/lib/hwasan/hwasan_mapping.h#L36
> 
> I define these as `HWASAN_TAG_SIZE` and `HWASAN_TAG_GRANULE_SIZE` in 
> asan.h, and when using the sanitizer library the macro 
> `HARDWARE_MEMORY_TAGGING` would be false so their values would be constant.
> 
> 
>>>
>>> The current patch series is far from complete, but I'm posting the current
>>> state to provide something to discuss at the Cauldron next week.
>>>
>>> In its current state, this sanitizer only works on AArch64 with a custom
>>> kernel to allow tagged pointers in system calls.  This is discussed in the
>>> below link https://source.android.com/devices/tech/debug/hwasan -- the
>>> custom kernel allows tagged pointers in syscalls.
>>
>> Can you be please more specific. Is the MTE in upstream 

Re: [Patch 0/X] [WIP][RFC][libsanitizer] Introduce HWASAN to GCC

2019-09-09 Thread Kostya Serebryany via gcc-patches
+Peter Collingbourne +Evgeniy Stepanov (the main developers of HWASAN
in LLVM,  FYI)
Please note that Peter has recently implemented support for globals in
LLVM's HWASAN.

--kcc

On Mon, Sep 9, 2019 at 8:55 AM Matthew Malcomson wrote:
>
> On 09/09/19 11:47, Martin Liška wrote:
> > On 9/6/19 4:46 PM, Matthew Malcomson wrote:
> >> Hello,
> >>
> >> This patch series is a WORK-IN-PROGRESS towards porting the LLVM hardware
> >> address sanitizer (HWASAN) to GCC.  The document describing HWASAN can be
> >> found here:
> >> http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html.
> >
> > Hello.
> >
> > I'm happy that you are working on the functionality for GCC and I can
> > provide the knowledge that I have of ASAN. I briefly read the patch series
> > and I have multiple questions (and observations):
> >
> > 1) Is the ambition of the patchset to be a software emulation of MTE that
> > can work on targets that do not support MTE? Is it what clang
> > names hwasan-abi=interceptor?
>
> The ambition is to provide a software emulation of MTE for AArch64
> targets that don't support MTE.
> I also hope to have the framework set up so that enabling for other
> architectures is relatively easy and can be done by those interested.
>
> As I understand it, `hwasan-abi=interceptor` vs `platform` is about
> adding such MTE emulation for "application code" or "platform code (e.g.
> kernel)" respectively.
>
> >
> > 2) Do you have real aarch64 hardware that has MTE support? Would it be
> > possible in the future to give such a machine to the GCC Compile Farm for
> > testing purposes?
>
> No, our team doesn't have real MTE hardware. I have been testing on an
> AArch64 machine that has TBI; other work in the team that requires MTE
> support is being tested on the Arm "Fast Models" emulator.
>
> >
> > 3) I like the idea of sharing internal functions like
> > ASAN_CHECK/HWASAN_CHECK. We should benefit from that in the future.
> >
> > 4) Am I correct that due to the escape of "tagged" pointers, one needs to
> > have an entire DSO (dynamic shared object) built with hwasan enabled?
> > Otherwise, a dereference of a tagged pointer will lead to a segfault
> > (except with the TBI feature on aarch64)?
>
>
> Yes, one needs to take pains to avoid the escape of tagged pointers on
> architectures other than AArch64.
>
> I don't believe that compiling the entire DSO with HWASAN enabled is
> enough, since pointers can be passed across DSO boundaries.
> I haven't yet looked into how to handle this.
>
> There's an even more fundamental problem of accesses within the
> instrumented binary -- I haven't yet figured out how to remove the tag
> before accesses on architectures without the AArch64 TBI feature.
>
>
> >
> > 5) Is there documentation/a definition of what shadow memory for memory
> > tagging looks like?
> > Is it similar to ASAN, where one can get the tag with:
> > u8 memory_tag = *((PTR >> TG) + SHADOW_OFFSET) & 0xf?
> >
>
> Yes, it's similar.
>
>  From the libhwasan code, the function to fetch a pointer to the shadow
> memory byte corresponding to a memory address is MemToShadow.
>
> constexpr uptr kShadowScale = 4;
> inline uptr MemToShadow(uptr untagged_addr) {
>   return (untagged_addr >> kShadowScale) +
>          __hwasan_shadow_memory_dynamic_address;
> }
>
> https://github.com/llvm-mirror/compiler-rt/blob/99ce9876124e910475c627829bf14326b8073a9d/lib/hwasan/hwasan_mapping.h#L42
>
>
> > 6) Note that things like memtag_tag_size, memtag_granule_size define an ABI
> > of libsanitizer
> >
>
> Yes, the sizes of these values define an ABI.
>
> Those particular hooks are added as a demonstration for how something
> like MTE would be implemented on top of this framework (where the
> backend would specify the tag and granule size to match their target's
> architecture).
>
> HWASAN itself would use the hard-coded tag and granule size that matches
> what libsanitizer uses.
> https://github.com/llvm-mirror/compiler-rt/blob/99ce9876124e910475c627829bf14326b8073a9d/lib/hwasan/hwasan_mapping.h#L36
>
> I define these as `HWASAN_TAG_SIZE` and `HWASAN_TAG_GRANULE_SIZE` in
> asan.h, and when using the sanitizer library the macro
> `HARDWARE_MEMORY_TAGGING` would be false so their values would be constant.
>
>
> >>
> >> The current patch series is far from complete, but I'm posting the current
> >> state to provide something to discuss at the Cauldron next week.
> >>
> >> In its current state, this sanitizer only works on AArch64 with a custom
> >> kernel to allow tagged pointers in system calls.  This is discussed in the
> >> below link https://source.android.com/devices/tech/debug/hwasan -- the
> >> custom kernel allows tagged pointers in syscalls.
> >
> > Can you please be more specific? Is the MTE in the upstream Linux kernel?
> > If so, starting from which version?
>
> I find I can only make complicated statements remotely clear in bullet
> points ;-)
>
> What I 

Re: [Patch 0/X] [WIP][RFC][libsanitizer] Introduce HWASAN to GCC

2019-09-09 Thread Matthew Malcomson
On 09/09/19 11:47, Martin Liška wrote:
> On 9/6/19 4:46 PM, Matthew Malcomson wrote:
>> Hello,
>>
>> This patch series is a WORK-IN-PROGRESS towards porting the LLVM hardware
>> address sanitizer (HWASAN) to GCC.  The document describing HWASAN can be
>> found here:
>> http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html.
> 
> Hello.
> 
> I'm happy that you are working on the functionality for GCC and I can provide
> the knowledge that I have of ASAN. I briefly read the patch series and I have
> multiple questions (and observations):
> 
> 1) Is the ambition of the patchset to be a software emulation of MTE that can
> work on targets that do not support MTE? Is it what clang
> names hwasan-abi=interceptor?

The ambition is to provide a software emulation of MTE for AArch64 
targets that don't support MTE.
I also hope to have the framework set up so that enabling for other 
architectures is relatively easy and can be done by those interested.

As I understand it, `hwasan-abi=interceptor` vs `platform` is about 
adding such MTE emulation for "application code" or "platform code (e.g. 
kernel)" respectively.

> 
> 2) Do you have real aarch64 hardware that has MTE support? Would it be
> possible in the future to give such a machine to the GCC Compile Farm for
> testing purposes?

No, our team doesn't have real MTE hardware. I have been testing on an
AArch64 machine that has TBI; other work in the team that requires MTE
support is being tested on the Arm "Fast Models" emulator.

> 
> 3) I like the idea of sharing internal functions like
> ASAN_CHECK/HWASAN_CHECK. We should benefit from that in the future.
> 
> 4) Am I correct that due to the escape of "tagged" pointers, one needs to have
> an entire DSO (dynamic shared object) built with hwasan enabled? Otherwise, a
> dereference of a tagged pointer will lead to a segfault (except with the TBI
> feature on aarch64)?


Yes, one needs to take pains to avoid the escape of tagged pointers on 
architectures other than AArch64.

I don't believe that compiling the entire DSO with HWASAN enabled is 
enough, since pointers can be passed across DSO boundaries.
I haven't yet looked into how to handle this.

There's an even more fundamental problem of accesses within the 
instrumented binary -- I haven't yet figured out how to remove the tag 
before accesses on architectures without the AArch64 TBI feature.


> 
> 5) Is there documentation/a definition of what shadow memory for memory
> tagging looks like?
> Is it similar to ASAN, where one can get the tag with:
> u8 memory_tag = *((PTR >> TG) + SHADOW_OFFSET) & 0xf?
> 

Yes, it's similar.

 From the libhwasan code, the function to fetch a pointer to the shadow 
memory byte corresponding to a memory address is MemToShadow.

constexpr uptr kShadowScale = 4;
inline uptr MemToShadow(uptr untagged_addr) {
  return (untagged_addr >> kShadowScale) +
         __hwasan_shadow_memory_dynamic_address;
}

https://github.com/llvm-mirror/compiler-rt/blob/99ce9876124e910475c627829bf14326b8073a9d/lib/hwasan/hwasan_mapping.h#L42
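
As a rough illustration of how that mapping is used in a check (a sketch only, not the libhwasan implementation): the tag in the pointer is compared against the shadow byte for the granule being accessed. To keep it runnable without the runtime, the shadow here is a plain array and `mem_base` takes the place of the offset by `__hwasan_shadow_memory_dynamic_address`. The 8-bit top-byte tag and the names `ToyShadow`, `tag_granules` and `access_ok` are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr unsigned kShadowScale = 4;  // one shadow byte per 16-byte granule
constexpr unsigned kTagShift = 56;    // assumed: tag in the top byte

// Toy stand-in for the shadow region. mem_base must be granule-aligned.
struct ToyShadow {
  std::uint64_t mem_base;             // lowest address covered
  std::vector<std::uint8_t> tags;     // one entry per granule

  // Same shift as MemToShadow, but indexing into the toy array.
  std::size_t index(std::uint64_t untagged_addr) const {
    return static_cast<std::size_t>((untagged_addr - mem_base) >> kShadowScale);
  }

  // Record `tag` for every granule touched by [addr, addr + size).
  void tag_granules(std::uint64_t addr, std::size_t size, std::uint8_t tag) {
    for (std::size_t i = index(addr); i <= index(addr + size - 1); ++i)
      tags[i] = tag;
  }

  // An access is allowed when the pointer tag equals the memory tag.
  bool access_ok(std::uint64_t tagged_ptr) const {
    std::uint8_t ptr_tag = static_cast<std::uint8_t>(tagged_ptr >> kTagShift);
    std::uint64_t untagged = tagged_ptr & ~(0xffULL << kTagShift);
    return tags[index(untagged)] == ptr_tag;
  }
};
```

For example, tagging 32 bytes at 0x1020 with 0xab makes accesses through a 0xab-tagged pointer to 0x1020..0x103f pass, while 0x1040 (next granule) or a differently tagged pointer fails the check.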


> 6) Note that things like memtag_tag_size, memtag_granule_size define an ABI
> of libsanitizer
> 

Yes, the sizes of these values define an ABI.

Those particular hooks are added as a demonstration for how something 
like MTE would be implemented on top of this framework (where the 
backend would specify the tag and granule size to match their target's
architecture).

HWASAN itself would use the hard-coded tag and granule size that matches 
what libsanitizer uses.
https://github.com/llvm-mirror/compiler-rt/blob/99ce9876124e910475c627829bf14326b8073a9d/lib/hwasan/hwasan_mapping.h#L36

I define these as `HWASAN_TAG_SIZE` and `HWASAN_TAG_GRANULE_SIZE` in 
asan.h, and when using the sanitizer library the macro 
`HARDWARE_MEMORY_TAGGING` would be false so their values would be constant.


>>
>> The current patch series is far from complete, but I'm posting the current
>> state to provide something to discuss at the Cauldron next week.
>>
>> In its current state, this sanitizer only works on AArch64 with a custom
>> kernel to allow tagged pointers in system calls.  This is discussed in the
>> below link https://source.android.com/devices/tech/debug/hwasan -- the
>> custom kernel allows tagged pointers in syscalls.
> 
> Can you please be more specific? Is the MTE in the upstream Linux kernel?
> If so, starting from which version?

I find I can only make complicated statements remotely clear in bullet 
points ;-)

What I was trying to say was:
- HWASAN from this patch series requires AArch64 TBI.
   (I have not handled architectures without TBI)
- The upstream kernel does not accept tagged pointers in syscalls.
   (programs that use TBI must currently clear tags before passing
   pointers to the kernel)
- This patch series doesn't include any way to avoid passing tagged
   pointers to syscalls.
- Hence in order to test the sanitizer I'm using a kernel that has been

Re: [Patch 0/X] [WIP][RFC][libsanitizer] Introduce HWASAN to GCC

2019-09-09 Thread Martin Liška
On 9/6/19 4:46 PM, Matthew Malcomson wrote:
> Hello,
> 
> This patch series is a WORK-IN-PROGRESS towards porting the LLVM hardware
> address sanitizer (HWASAN) in GCC.  The document describing HWASAN can be
> found here
> http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html.

Hello.

I'm happy that you are working on this functionality for GCC, and I can offer
the knowledge I have of ASAN. I briefly read the patch series and I have
multiple questions (and observations):

1) Is the ambition of the patchset to be a software emulation of MTE that can
   work on targets that do not support MTE? Is it what clang calls
   hwasan-abi=interceptor?

2) Do you have real aarch64 hardware with MTE support? Would it be possible
   in the future to give such a machine to the GCC Compile Farm for testing
   purposes?

3) I like the idea of sharing internal functions like ASAN_CHECK/HWASAN_CHECK.
   We should benefit from that in the future.

4) Am I correct that, because "tagged" pointers can escape, one needs to have
   an entire DSO (dynamic shared object) built with hwasan enabled? Otherwise,
   a dereference of a tagged pointer will lead to a segfault (except with the
   TBI feature on aarch64)?

5) Is there documentation (or a definition) of what the shadow memory for
   memory tagging looks like? Is it similar to ASAN, where one can get the tag
   with:
u8 memory_tag = *((PTR >> TG) + SHADOW_OFFSET) & 0xf?

6) Note that things like memtag_tag_size, memtag_granule_size define an ABI of
   libsanitizer
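
To make questions 5) and 6) concrete, here is a minimal sketch of the kind of 
shadow lookup being asked about.  It is only an illustration under stated 
assumptions, not the libsanitizer implementation: it assumes a 16-byte 
granule, an 8-bit tag in the top byte of the pointer, and it replaces 
SHADOW_OFFSET with a small simulated shadow array so that it can run on any 
host.  All names (`sim_shadow`, `colour_region`, `access_ok`, ...) are 
hypothetical.

```c
#include <stdint.h>
#include <string.h>

enum
{
  GRANULE_SHIFT = 4,  /* Assumed: one shadow byte per 16 bytes of memory.  */
  TAG_SHIFT = 56,     /* Assumed: tag is held in the top byte of a pointer.  */
  SIM_MEM_SIZE = 256  /* Size of the simulated address space in bytes.  */
};

/* Simulated shadow memory: index I holds the tag of granule I.  */
static uint8_t sim_shadow[SIM_MEM_SIZE >> GRANULE_SHIFT];

/* Strip the tag from PTR, leaving only the address bits.  */
static uint64_t
untag (uint64_t ptr)
{
  return ptr & ~(UINT64_C (0xff) << TAG_SHIFT);
}

/* Return the tag stored in the top byte of PTR.  */
static uint8_t
pointer_tag (uint64_t ptr)
{
  return (uint8_t) (ptr >> TAG_SHIFT);
}

/* Record TAG for every granule in [ADDR, ADDR + SIZE).  ADDR and SIZE are
   assumed to be granule aligned and inside the simulated address space.  */
static void
colour_region (uint64_t addr, uint64_t size, uint8_t tag)
{
  memset (&sim_shadow[addr >> GRANULE_SHIFT], tag, size >> GRANULE_SHIFT);
}

/* Return 1 if the tag in PTR matches the tag recorded for the granule it
   points into, 0 otherwise.  */
static int
access_ok (uint64_t ptr)
{
  uint8_t memory_tag = sim_shadow[untag (ptr) >> GRANULE_SHIFT];
  return memory_tag == pointer_tag (ptr);
}
```

For example, after `colour_region (32, 32, 0x2a)` a pointer to address 32 
carrying tag 0x2a passes `access_ok`, while the same pointer advanced by 32 
bytes, or an untagged pointer to address 32, does not.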

> 
> The current patch series is far from complete, but I'm posting the current
> state to provide something to discuss at the Cauldron next week.
> 
> In its current state, this sanitizer only works on AArch64 with a custom
> kernel to allow tagged pointers in system calls.  This is discussed in the
> below link
> https://source.android.com/devices/tech/debug/hwasan -- the custom kernel
> allows tagged pointers in syscalls.

Can you please be more specific? Is MTE in the upstream Linux kernel? If so,
starting from which version?

> I have also not yet put tests into the DejaGNU framework, but instead have a
> simple test file from which the tests will eventually come.  That test file is
> attached to this email despite not being in the patch series.
> 
> Something close to this patch series bootstraps and passes most regression
> tests when ~--with-build-config=bootstrap-hwasan~ is used.  The regressions it
> doesn't pass are all the other sanitizer tests and all linker plugin tests.
> The linker plugin tests fail due to a configuration problem where the library
> path is not correctly set.
> (I say "something close to this patch series" because I recently made a change
> that breaks bootstrap but I believe is the best approach once I've fixed it,
> hence for an RFC I'm leaving it in).
> 
> HWASAN works by storing a tag in the top bits of every pointer and a colour in
> a shadow memory region corresponding to every area of memory.  On every memory
> access through a pointer the tag in the pointer is checked against the colour
> in shadow memory corresponding to the memory the pointer is accessing.  If the
> tag and colour do not match then a fault is signalled.
> 
> The instrumentation required for this sanitizer has a large overlap with the
> instrumentation required for implementing MTE (which has similar functionality
> but checks are automatically done in the hardware and instructions for
> colouring shadow memory and for managing tags are provided by the
> architecture).
> https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-a-profile-architecture-2018-developments-armv85a
> 
> We hope to use the HWASAN framework to implement MTE tagging on the stack, and
> hence I have a "dummy" patch demonstrating the approach envisaged for this.

What's the situation with heap allocated memory and global variables?

> 
> Though there is still much to implement here, the general approach should be
> clear.  Any feedback is welcomed, but I have three main points that I'm
> particularly hoping for external opinions.
> 
> 1) The current approach stores a tag on the RTL representing a given variable,
>    in order to implement HWASAN for x86_64 the tag needs to be removed before
>    every memory access but not on things like function calls.
>    Is there any obvious way to handle removing the tag in these places?
>    Maybe something with legitimize_address?

I'm not a target expert, but I bet you'll need to store the tag with the RTL
representation of a stack variable.

Thanks,
Martin

> 2) The first draft presented here introduces a new RTL expression called
>    ADDTAG.  I now believe that a hook would be neater here but haven't yet
>    looked into it.  Do people agree?
>    (addtag is introduced in the patch titled "Put tags into each stack
>    variable pointer", but the reason it's introduced is so the backend can
>    define how this gets implemented with a