Re: Add support to trace comparison instructions and switch statements

2017-07-14 Thread Dmitry Vyukov via gcc-patches
On Thu, Jul 13, 2017 at 11:18 PM, Kostya Serebryany  wrote:
>> > Hi
>> >
>> > I wrote a test for "-fsanitize-coverage=trace-cmp" .
>> >
>> > Is there anybody tells me if these codes could be merged into gcc ?
>>
>>
>> Nice!
>>
>> We are currently working on Linux kernel fuzzing that use the
>> comparison tracing. We use clang at the moment, but having this
>> support in gcc would be great for kernel land.
>>
>> One concern I have: do we want to do some final refinements to the API
>> before we implement this in both compilers?
>>
>> 2 things we considered from our perspective:
>>  - communicating to the runtime which operands are constants
>>  - communicating to the runtime which comparisons are counting loop checks
>>
>> First is useful if you do "find one operand in input and replace with
>> the other one" thing. Second is useful because counting loop checks
>> are usually not useful (at least all but one).
>> In the original Go implementation I also conveyed signedness of
>> operands, exact comparison operation (<, >, etc):
>> https://github.com/dvyukov/go-fuzz/blob/master/go-fuzz-defs/defs.go#L13
>> But I did not find any use for that.
>> I also gave all comparisons unique IDs:
>> https://github.com/dvyukov/go-fuzz/blob/master/go-fuzz-dep/sonar.go#L24
>> That turned out to be useful. And there are chances we will want this
>> for C/C++ as well.
>>
>> Kostya, did anything like this pop up in your work on libfuzzer?
>> Can we still change the clang API? At least add an additional argument
>> to the callbacks?
>
>
> I'd prefer not to change the API, but extend it (new compiler flag, new
> callbacks), if absolutely needed.
> Probably make it trace-cmp-guard (similar to trace-pc-guard, with an extra
> parameter that has the ID).
> I don't like the approach with compiler-generated constant IDs.

Yes, if we do it for C/C++, we need to create globals and pass pointer
to a global to the callbacks. IDs do not work for C/C++.

> Yes, it's a bit more efficient, but much less flexible in presence of
> multiple modules, DSOs, dlopen, etc.
>
> I was also looking at completely inlining this instrumentation because it's
> pretty expensive even in it's current form
> (adding more parameters will make things worse).
> This is going to be much less flexible, of course, so I'll attack it only
> once I settle on the algorithm to handle CMPs in libFuzzer.

This will require a new, completely different API for
compiler<->runtime anyway, so we can put this aside for now.


>> At the very least I would suggest that we add an additional arg that
>> contains some flags (1/2 arg is a const, this is counting loop check,
>> etc). If we do that we can also have just 1 callback that accepts
>> uint64's for args because we can pass operand size in the flags:
>
>
> How many flag combinations do we need and do we *really* need them?
>
> If the number of flag combinations is small, I'd prefer to have separate
> callbacks (__sanitizer_cov_trace_cmp_loop_bound ?)
>
> Do we really need to know that one arg is a const?
> It could well be a constant in fact, but compiler won't see it.
>
> int foo(int n) {   ... if (i < n) ... }
> ...
> foo(42);  // not inlined.
>
> We need to handle both cases the same way.


Well, following this line of reasoning we would need only
__asan_load/storeN callbacks for asan and remove
__asan_load/store1/2/4/8, because compiler might not know the size at
compile time. Constant-ness is only an optimization. If compiler does
not know that something is a const, fine. Based on my experience with
go-fuzz and our early experience with kernel, we badly need const
hint. Otherwise fuzzer generates gazillions of candidates based on
comparison arguments. Note that kernel is several order of magnitude
larger than what people usually fuzz in user-space, inputs are more
complex and at the same time execution speed is several order of
magnitude lower. We can't rely on raw speed.

Thinking of this more, I don't thing that globals will be useful in
the kernel context (the main problem is that we have multiple
transient isolated kernels). If we track per-comparison site
information, we will probably use PCs. So I am ready to give up on
this.

Both of you expressed concerns about performance. Kostya says we
should not break existing clang API.
If we limit this to only constant-ness, then I think we can make this
both forward and backward compatible, which means we don't need to
handle it now. E.g. we can:
 - if both operands are const (if it's possible at all), don't emit any callback
 - if only one is const, emit __sanitizer_cov_trace_cmp_const1 and
pass the const in a known position (i.e. always first/second arg)
 - if none are const, emit callback __sanitizer_cov_trace_cmp_dyn1
Then compiler emits weak aliases form
__sanitizer_cov_trace_cmp_const/dyn1 to the old
__sanitizer_cov_trace_cmp1, which makes it backwards compatible.
New runtimes that implement __sanitizer_cov_trace_cmp_const/dyn1, also
need to provide 

Re: Add support to trace comparison instructions and switch statements

2017-07-13 Thread Dmitry Vyukov via gcc-patches
On Tue, Jul 11, 2017 at 1:59 PM, Wish Wu  wrote:
> Hi
>
> I wrote a test for "-fsanitize-coverage=trace-cmp" .
>
> Is there anybody tells me if these codes could be merged into gcc ?


Nice!

We are currently working on Linux kernel fuzzing that use the
comparison tracing. We use clang at the moment, but having this
support in gcc would be great for kernel land.

One concern I have: do we want to do some final refinements to the API
before we implement this in both compilers?

2 things we considered from our perspective:
 - communicating to the runtime which operands are constants
 - communicating to the runtime which comparisons are counting loop checks

First is useful if you do "find one operand in input and replace with
the other one" thing. Second is useful because counting loop checks
are usually not useful (at least all but one).
In the original Go implementation I also conveyed signedness of
operands, exact comparison operation (<, >, etc):
https://github.com/dvyukov/go-fuzz/blob/master/go-fuzz-defs/defs.go#L13
But I did not find any use for that.
I also gave all comparisons unique IDs:
https://github.com/dvyukov/go-fuzz/blob/master/go-fuzz-dep/sonar.go#L24
That turned out to be useful. And there are chances we will want this
for C/C++ as well.

Kostya, did anything like this pop up in your work on libfuzzer?
Can we still change the clang API? At least add an additional argument
to the callbacks?

At the very least I would suggest that we add an additional arg that
contains some flags (1/2 arg is a const, this is counting loop check,
etc). If we do that we can also have just 1 callback that accepts
uint64's for args because we can pass operand size in the flags:

void __sanitizer_cov_trace_cmp(uint64 arg1, uint64 arg2, uint64 flags);

But I wonder if 3 uint64 args will be too inefficient for 32 bit archs?...

If we create a global per comparison then we could put the flags into
the global:

void __sanitizer_cov_trace_cmp(uint64 arg1, uint64 arg2, something_t *global);

Thoughts?




> Index: gcc/testsuite/gcc.dg/sancov/basic3.c
> ===
> --- gcc/testsuite/gcc.dg/sancov/basic3.c (nonexistent)
> +++ gcc/testsuite/gcc.dg/sancov/basic3.c (working copy)
> @@ -0,0 +1,42 @@
> +/* Basic test on number of inserted callbacks.  */
> +/* { dg-do compile } */
> +/* { dg-options "-fsanitize-coverage=trace-cmp -fdump-tree-optimized" } */
> +
> +void foo(char *a, short *b, int *c, long long *d, float *e, double *f)
> +{
> +  if (*a)
> +*a += 1;
> +  if (*b)
> +*b = *a;
> +  if (*c)
> +*c += 1;
> +  if(*d)
> +*d = *c;
> +  if(*e == *c)
> +*e = *c;
> +  if(*f == *e)
> +*f = *e;
> +  switch(*a)
> +{
> +case 2:
> +  *b += 2;
> +  break;
> +default:
> +  break;
> +}
> +  switch(*d)
> +{
> +case 3:
> +  *d += 3;
> +case -4:
> +  *d -= 4;
> +}
> +}
> +
> +/* { dg-final { scan-tree-dump-times
> "__builtin___sanitizer_cov_trace_cmp1 \\(" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times
> "__builtin___sanitizer_cov_trace_cmp2 \\(" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times
> "__builtin___sanitizer_cov_trace_cmp4 \\(" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times
> "__builtin___sanitizer_cov_trace_cmp8 \\(" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times
> "__builtin___sanitizer_cov_trace_cmpf \\(" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times
> "__builtin___sanitizer_cov_trace_cmpd \\(" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times
> "__builtin___sanitizer_cov_trace_switch \\(" 2 "optimized" } } */
>
>
> With Regards
> Wish Wu
>
> On Mon, Jul 10, 2017 at 8:07 PM, 吴潍浠(此彼)  wrote:
>> Hi
>>
>> I write some codes to make gcc support comparison-guided fuzzing.
>> It is very like 
>> http://clang.llvm.org/docs/SanitizerCoverage.html#tracing-data-flow .
>> With -fsanitize-coverage=trace-cmp the compiler will insert extra 
>> instrumentation around comparison instructions and switch statements.
>> I think it is useful for fuzzing.  :D
>>
>> Patch is below, I may supply test cases later.
>>
>> With Regards
>> Wish Wu
>>
>> Index: gcc/asan.c
>> ===
>> --- gcc/asan.c  (revision 250082)
>> +++ gcc/asan.c  (working copy)
>> @@ -2705,6 +2705,29 @@ initialize_sanitizer_builtins (void)
>>tree BT_FN_SIZE_CONST_PTR_INT
>>  = build_function_type_list (size_type_node, const_ptr_type_node,
>> integer_type_node, NULL_TREE);
>> +
>> +  tree BT_FN_VOID_UINT8_UINT8
>> += build_function_type_list (void_type_node, unsigned_char_type_node,
>> +   unsigned_char_type_node, NULL_TREE);
>> +  tree BT_FN_VOID_UINT16_UINT16
>> += build_function_type_list (void_type_node, uint16_type_node,
>> +   uint16_type_node, NULL_TREE);
>> 

Re: Add support to trace comparison instructions and switch statements

2017-07-13 Thread Dmitry Vyukov via gcc-patches
On Thu, Jul 13, 2017 at 12:41 PM, Wish Wu  wrote:
> Hi
>
> In fact, under linux with "return address" and file "/proc/self/maps",
> we can give unique id for every comparison.

Yes, it's doable. But you expressed worries about performance hit of
merging callbacks for different sizes. Mapping pc + info from
/proc/self/maps to a unique id via an external map is an order of
magnitude slower than the hit of merged callbacks.


> For fuzzing, we may give 3 bits for every comparison as marker of if
> "<", "==" or ">" is showed. :D
>
> With Regards
> Wish Wu of Ant-financial Light-Year Security Lab
>
> On Thu, Jul 13, 2017 at 6:04 PM, Wish Wu  wrote:
>> Hi
>>
>> In my perspective:
>>
>> 1. Do we need to assign unique id for every comparison ?
>> Yes, I suggest to implement it like -fsanitize-coverage=trace-pc-guard .
>> Because some fuzzing targets may invoke dlopen() like functions to
>> load libraries(modules) after fork(), while these libraries are
>> compiled with trace-cmp as well.
>> With ALSR enabled by linker and/or kernel, return address can't be
>> a unique id for every comparison.
>>
>> 2. Should we merge cmp1(),cmp2(),cmp4(),cmp8(),cmpf(),cmpd() into one cmp() ?
>> No, It may reduce the performance of fuzzing. It may wastes
>> registers. But the number "switch" statements are much less than "if",
>> I forgive "switch"'s wasting behaviors.
>>
>> 3.Should we record operands(<,>,==,<= ..) ?
>> Probably no. As comparison,"<" , "==" and ">" all of them are
>> meaningful, because programmers must have some reasons to do that. As
>> practice , "==" is more meaningful.
>>
>> 4.Should we record comparisons for counting loop checks ?
>> Not sure.
>>
>> With Regards
>> Wish Wu of Ant-financial Light-Year Security Lab
>>
>> On Thu, Jul 13, 2017 at 4:09 PM, Dmitry Vyukov  wrote:
>>> On Tue, Jul 11, 2017 at 1:59 PM, Wish Wu  wrote:
 Hi

 I wrote a test for "-fsanitize-coverage=trace-cmp" .

 Is there anybody tells me if these codes could be merged into gcc ?
>>>
>>>
>>> Nice!
>>>
>>> We are currently working on Linux kernel fuzzing that use the
>>> comparison tracing. We use clang at the moment, but having this
>>> support in gcc would be great for kernel land.
>>>
>>> One concern I have: do we want to do some final refinements to the API
>>> before we implement this in both compilers?
>>>
>>> 2 things we considered from our perspective:
>>>  - communicating to the runtime which operands are constants
>>>  - communicating to the runtime which comparisons are counting loop checks
>>>
>>> First is useful if you do "find one operand in input and replace with
>>> the other one" thing. Second is useful because counting loop checks
>>> are usually not useful (at least all but one).
>>> In the original Go implementation I also conveyed signedness of
>>> operands, exact comparison operation (<, >, etc):
>>> https://github.com/dvyukov/go-fuzz/blob/master/go-fuzz-defs/defs.go#L13
>>> But I did not find any use for that.
>>> I also gave all comparisons unique IDs:
>>> https://github.com/dvyukov/go-fuzz/blob/master/go-fuzz-dep/sonar.go#L24
>>> That turned out to be useful. And there are chances we will want this
>>> for C/C++ as well.
>>>
>>> Kostya, did anything like this pop up in your work on libfuzzer?
>>> Can we still change the clang API? At least add an additional argument
>>> to the callbacks?
>>>
>>> At the very least I would suggest that we add an additional arg that
>>> contains some flags (1/2 arg is a const, this is counting loop check,
>>> etc). If we do that we can also have just 1 callback that accepts
>>> uint64's for args because we can pass operand size in the flags:
>>>
>>> void __sanitizer_cov_trace_cmp(uint64 arg1, uint64 arg2, uint64 flags);
>>>
>>> But I wonder if 3 uint64 args will be too inefficient for 32 bit archs?...
>>>
>>> If we create a global per comparison then we could put the flags into
>>> the global:
>>>
>>> void __sanitizer_cov_trace_cmp(uint64 arg1, uint64 arg2, something_t 
>>> *global);
>>>
>>> Thoughts?
>>>
>>>
>>>
>>>
 Index: gcc/testsuite/gcc.dg/sancov/basic3.c
 ===
 --- gcc/testsuite/gcc.dg/sancov/basic3.c (nonexistent)
 +++ gcc/testsuite/gcc.dg/sancov/basic3.c (working copy)
 @@ -0,0 +1,42 @@
 +/* Basic test on number of inserted callbacks.  */
 +/* { dg-do compile } */
 +/* { dg-options "-fsanitize-coverage=trace-cmp -fdump-tree-optimized" } */
 +
 +void foo(char *a, short *b, int *c, long long *d, float *e, double *f)
 +{
 +  if (*a)
 +*a += 1;
 +  if (*b)
 +*b = *a;
 +  if (*c)
 +*c += 1;
 +  if(*d)
 +*d = *c;
 +  if(*e == *c)
 +*e = *c;
 +  if(*f == *e)
 +*f = *e;
 +  switch(*a)
 +{
 +case 2:
 +  *b += 2;
 +  break;
 +default:

Re: Add support to trace comparison instructions and switch statements

2017-07-14 Thread Dmitry Vyukov via gcc-patches
On Fri, Jul 14, 2017 at 11:17 PM, Kostya Serebryany  wrote:
 > Hi
 >
 > I wrote a test for "-fsanitize-coverage=trace-cmp" .
 >
 > Is there anybody tells me if these codes could be merged into gcc ?


 Nice!

 We are currently working on Linux kernel fuzzing that use the
 comparison tracing. We use clang at the moment, but having this
 support in gcc would be great for kernel land.

 One concern I have: do we want to do some final refinements to the API
 before we implement this in both compilers?

 2 things we considered from our perspective:
  - communicating to the runtime which operands are constants
  - communicating to the runtime which comparisons are counting loop checks

 First is useful if you do "find one operand in input and replace with
 the other one" thing. Second is useful because counting loop checks
 are usually not useful (at least all but one).
 In the original Go implementation I also conveyed signedness of
 operands, exact comparison operation (<, >, etc):
 https://github.com/dvyukov/go-fuzz/blob/master/go-fuzz-defs/defs.go#L13
 But I did not find any use for that.
 I also gave all comparisons unique IDs:
 https://github.com/dvyukov/go-fuzz/blob/master/go-fuzz-dep/sonar.go#L24
 That turned out to be useful. And there are chances we will want this
 for C/C++ as well.

 Kostya, did anything like this pop up in your work on libfuzzer?
 Can we still change the clang API? At least add an additional argument
 to the callbacks?
>>>
>>>
>>> I'd prefer not to change the API, but extend it (new compiler flag, new
>>> callbacks), if absolutely needed.
>>> Probably make it trace-cmp-guard (similar to trace-pc-guard, with an extra
>>> parameter that has the ID).
>>> I don't like the approach with compiler-generated constant IDs.
>>
>> Yes, if we do it for C/C++, we need to create globals and pass pointer
>> to a global to the callbacks. IDs do not work for C/C++.
>>
>>> Yes, it's a bit more efficient, but much less flexible in presence of
>>> multiple modules, DSOs, dlopen, etc.
>>>
>>> I was also looking at completely inlining this instrumentation because it's
>>> pretty expensive even in it's current form
>>> (adding more parameters will make things worse).
>>> This is going to be much less flexible, of course, so I'll attack it only
>>> once I settle on the algorithm to handle CMPs in libFuzzer.
>>
>> This will require a new, completely different API for
>> compiler<->runtime anyway, so we can put this aside for now.
>>
>>
 At the very least I would suggest that we add an additional arg that
 contains some flags (1/2 arg is a const, this is counting loop check,
 etc). If we do that we can also have just 1 callback that accepts
 uint64's for args because we can pass operand size in the flags:
>>>
>>>
>>> How many flag combinations do we need and do we *really* need them?
>>>
>>> If the number of flag combinations is small, I'd prefer to have separate
>>> callbacks (__sanitizer_cov_trace_cmp_loop_bound ?)
>>>
>>> Do we really need to know that one arg is a const?
>>> It could well be a constant in fact, but compiler won't see it.
>>>
>>> int foo(int n) {   ... if (i < n) ... }
>>> ...
>>> foo(42);  // not inlined.
>>>
>>> We need to handle both cases the same way.
>>
>>
>> Well, following this line of reasoning we would need only
>> __asan_load/storeN callbacks for asan and remove
>> __asan_load/store1/2/4/8, because compiler might not know the size at
>> compile time. Constant-ness is only an optimization. If compiler does
>> not know that something is a const, fine. Based on my experience with
>> go-fuzz and our early experience with kernel, we badly need const
>> hint.
>> Otherwise fuzzer generates gazillions of candidates based on
>> comparison arguments. Note that kernel is several order of magnitude
>> larger than what people usually fuzz in user-space, inputs are more
>> complex and at the same time execution speed is several order of
>> magnitude lower. We can't rely on raw speed.
>>
>> Thinking of this more, I don't thing that globals will be useful in
>> the kernel context (the main problem is that we have multiple
>> transient isolated kernels). If we track per-comparison site
>> information, we will probably use PCs. So I am ready to give up on
>> this.
>>
>> Both of you expressed concerns about performance. Kostya says we
>> should not break existing clang API.
>> If we limit this to only constant-ness, then I think we can make this
>> both forward and backward compatible, which means we don't need to
>> handle it now. E.g. we can:
>>  - if both operands are const (if it's possible at all), don't emit any 
>> callback
>>  - if only one is const, emit __sanitizer_cov_trace_cmp_const1 and
>> pass the const in a known position (i.e. always first/second arg)
>>  - if none are const, emit callback 

Re: Add support to trace comparison instructions and switch statements

2017-07-15 Thread Dmitry Vyukov via gcc-patches
On Sat, Jul 15, 2017 at 9:21 AM, 吴潍浠(此彼)  wrote:
> Hi
>
> Implementing __sanitizer_cov_trace_cmp[1248]_const is OK .
> And I will try to find some determinate way to judge this comparison is for 
> loop or not.
> Because all the loops(for() or while()) seem to be transformed to "if" and 
> "goto" before running sancov pass.
> Does it necessary to include APIs for float and double comparison ? like 
> __sanitizer_cov_trace_cmpf(float, float)
> Why you do not include them ?


Note you don't need to implement any of this right now, since we can
make it [almost] backwards compatible. Unless, of course, it's simple
and you find it useful and want to do this.
I think it's better to get a first version of the change in as is
(implement current clang API), and then do improvements in subsequent
patches. I would expect that detecting consts should be simple. Re
loops, I don't know, I would expect that gcc has some existing
analysis for loop (it does lots of optimization with counting loop).
Re floats/doubles, I am not sure, we managed to live without them in
go-fuzz and then in libfuzzer, and I still don't see lots of value in
them. Anyway, it should be a separate patch.




> Now, I am following some suggests about my codes from Gribow. Final patch is 
> still on processing.
> I am also waiting for copyright assignment of FSF which is requested by Jeff.
>
> With Regards
> Wish Wu
>
> --
> From:Dmitry Vyukov 
> Time:2017 Jul 15 (Sat) 13:41
> To:Kostya Serebryany 
> Cc:Wish Wu ; gcc ; gcc-patches 
> ; Wish Wu ; Alexander 
> Potapenko ; andreyknvl ; Victor 
> Chibotaru ; Yuri Gribov 
> Subject:Re: Add support to trace comparison instructions and switch statements
>
>
> On Fri, Jul 14, 2017 at 11:17 PM, Kostya Serebryany  wrote:
> > Hi
> >
> > I wrote a test for "-fsanitize-coverage=trace-cmp" .
> >
> > Is there anybody tells me if these codes could be merged into gcc ?
>
>
> Nice!
>
> We are currently working on Linux kernel fuzzing that use the
> comparison tracing. We use clang at the moment, but having this
> support in gcc would be great for kernel land.
>
> One concern I have: do we want to do some final refinements to the API
> before we implement this in both compilers?
>
> 2 things we considered from our perspective:
>  - communicating to the runtime which operands are constants
>  - communicating to the runtime which comparisons are counting loop checks
>
> First is useful if you do "find one operand in input and replace with
> the other one" thing. Second is useful because counting loop checks
> are usually not useful (at least all but one).
> In the original Go implementation I also conveyed signedness of
> operands, exact comparison operation (<, >, etc):
> https://github.com/dvyukov/go-fuzz/blob/master/go-fuzz-defs/defs.go#L13
> But I did not find any use for that.
> I also gave all comparisons unique IDs:
> https://github.com/dvyukov/go-fuzz/blob/master/go-fuzz-dep/sonar.go#L24
> That turned out to be useful. And there are chances we will want this
> for C/C++ as well.
>
> Kostya, did anything like this pop up in your work on libfuzzer?
> Can we still change the clang API? At least add an additional argument
> to the callbacks?


 I'd prefer not to change the API, but extend it (new compiler flag, new
 callbacks), if absolutely needed.
 Probably make it trace-cmp-guard (similar to trace-pc-guard, with an extra
 parameter that has the ID).
 I don't like the approach with compiler-generated constant IDs.
>>>
>>> Yes, if we do it for C/C++, we need to create globals and pass pointer
>>> to a global to the callbacks. IDs do not work for C/C++.
>>>
 Yes, it's a bit more efficient, but much less flexible in presence of
 multiple modules, DSOs, dlopen, etc.

 I was also looking at completely inlining this instrumentation because it's
 pretty expensive even in it's current form
 (adding more parameters will make things worse).
 This is going to be much less flexible, of course, so I'll attack it only
 once I settle on the algorithm to handle CMPs in libFuzzer.
>>>
>>> This will require a new, completely different API for
>>> compiler<->runtime anyway, so we can put this aside for now.
>>>
>>>
> At the very least I would suggest that we add an additional arg that
> contains some flags (1/2 arg is a const, this is counting loop check,
> etc). If we do that we can also have just 1 callback that accepts
> uint64's for args because we can pass operand size in the flags:


 How many flag 

Re: Add support to trace comparison instructions and switch statements

2017-08-30 Thread Dmitry Vyukov via gcc-patches
On Sat, Aug 5, 2017 at 11:53 AM, 吴潍浠(此彼)  wrote:
> Hi all
> Is it worth adding my codes to gcc ? Are there some steps I need to do ?
> Could somebody tell me the progress ?


FYI, we've mailed a Linux kernel change that uses this instrumentation:
https://groups.google.com/forum/#!topic/syzkaller/r0ARNVV-Bhg
Another reason to have this in gcc.

Can somebody from gcc maintainers take a look at this? Jakub?

Thanks


> Maybe there should be a project like libfuzzer to solve bugs in program.
>
> Wish Wu
> --
> From:Wish Wu 
> Time:2017 Jul 21 (Fri) 13:38
> To:gcc ; gcc-patches ; Jeff Law 
> 
> Cc:wishwu007 
> Subject:Re: Add support to trace comparison instructions and switch statements
>
>
> Hi Jeff
>
> I have signed the copyright assignment, and used the name 'Wish Wu' .
> Should I send you a copy of my assignment ?
>
> The attachment is my new patch with small changes.
> Codes are checked by ./contrib/check_GNU_style.sh, except some special files.
>
> With
>
> --
> From:Jeff Law 
> Time:2017 Jul 14 (Fri) 15:37
> To:Wish Wu ; gcc ; gcc-patches 
> 
> Cc:wishwu007 
> Subject:Re: Add support to trace comparison instructions and switch statements
>
>
> On 07/10/2017 06:07 AM, 吴潍浠(此彼) wrote:
>> Hi
>>
>> I write some codes to make gcc support comparison-guided fuzzing.
>> It is very like 
>> http://clang.llvm.org/docs/SanitizerCoverage.html#tracing-data-flow .
>> With -fsanitize-coverage=trace-cmp the compiler will insert extra 
>> instrumentation around comparison instructions and switch statements.
>> I think it is useful for fuzzing.  :D
>>
>> Patch is below, I may supply test cases later.
> Before anyone can really look at this code you'll need to get a
> copyright assignment on file with the FSF.
>
> See:
> https://gcc.gnu.org/contribute.html
>
> If you've already done this, please let me know and I'll confirm with
> the FSF copyright clerk.
>
> Jeff


Re: Add support to trace comparison instructions and switch statements

2017-09-12 Thread Dmitry Vyukov via gcc-patches
Some stats from kernel build for number of trace_cmp callbacks:

gcc
non-const: 38051
const: 272726
total: 310777

clang:
non-const: 45944
const: 266299
total: 312243

The total is quite close. Gcc seems to emit more const callbacks, which is good.



On Tue, Sep 12, 2017 at 4:32 PM, Dmitry Vyukov  wrote:
> On Thu, Sep 7, 2017 at 9:02 AM, 吴潍浠(此彼)  wrote:
>> Hi
>> The trace-div and trace-gep options seems be used to evaluate corpus
>> to trigger specific kind of bugs. And they don't have strong effect to 
>> coverage.
>>
>> The trace-pc-guard is useful, but it may be much more complex than trace-pc.
>> I think the best benefit of trace-pc-guard is avoiding dealing ASLR of 
>> programs,
>> especially programs with dlopen(). Seems hard to implement it in Linux 
>> kernel.
>>
>> The inline-8bit-counters and pc-table is too close to implementation of fuzz 
>> tool and
>> option trace-pc-guard .
>>
>> I am interesting in "stack-depth" and "func". If we trace user-defined 
>> functions enter and exit,
>> we can calculate stack-depth and difference level of stack to past existed 
>> stack.
>> Adding __sanitizer_cov_trace_pc_{enter,exit} is easy , but it is not 
>> standard of llvm.
>>
>> How do you think Dmitry ?
>>
>> Wish Wu
>>
>> --
>> From:Jakub Jelinek 
>> Time:2017 Sep 6 (Wed) 22:37
>> To:Wish Wu 
>> Cc:Dmitry Vyukov ; gcc-patches 
>> ; Jeff Law ; wishwu007 
>> 
>> Subject:Re: Add support to trace comparison instructions and switch 
>> statements
>>
>>
>> On Wed, Sep 06, 2017 at 07:47:29PM +0800, 吴潍浠(此彼) wrote:
>>> Hi Jakub
>>> I compiled libjpeg-turbo and libdng_sdk with options "-g -O3 -Wall 
>>> -fsanitize-coverage=trace-pc,trace-cmp -fsanitize=address".
>>> And run my fuzzer with pc and cmp feedbacks for hours. It works fine.
>>> About __sanitizer_cov_trace_cmp{f,d} , yes, it  isn't provided by llvm. But 
>>> once we trace integer comparisons, why not real type comparisons.
>>> I remember Dmitry said it is not enough useful to trace real type 
>>> comparisons because it is rare to see them in programs.
>>> But libdng_sdk really has real type comparisons. So I want to keep them and 
>>> implementing __sanitizer_cov_trace_const_cmp{f,d} may be necessary.
>>
>> Ok.  Please make sure those entrypoints make it into the various example
>> __sanitier_cov_trace* fuzzer implementations though, so that people using
>> -fsanitize-coverage=trace-cmp in GCC will not need to hack stuff themselves.
>> At least it should be added to sanitizer_common (both in LLVM and GCC).
>>
>> BTW, https://clang.llvm.org/docs/SanitizerCoverage.html shows various other
>> -fsanitize-coverage= options, some of them terribly misnamed (e.g. trace-gep
>> using some weirdo LLVM IL acronym instead of being named by what it really
>> traces (trace-array-idx or something similar)).
>>
>> Any plans to implement some or all of those?
>
>
> Thanks, Jakub!
>
> I've tested it on Linux kernel. Compiler does not crash, code is instrumented.
>
> Re  terribly misnamed trace-gep, dunno, I will leave this to Kostya.
>
> Re __sanitizer_cov_trace_cmp{f,d}, I am still not sure.
>
>> But libdng_sdk really has real type comparisons.
>
> Do they come from input data? In what format? How do you want to use
> them? E.g. if they come from input but with using some non-trivial
> transformation and the fuzzer will try to find them in the input, it
> won't be able to do so.
> On the other hand, it does not seem to be harmful, fuzzers that are
> not interested in them can just add empty callbacks.
>
> Re trace-pc-guard, I also don't have strong preference. Global
> variables should work for kernel, but we probably will not use them in
> kernel, because even aslr aside we would need to establish some global
> numbering of these globals across multiple different machines. But
> then it's much easier and simper to just use PCs as identifiers.
>
> Re __sanitizer_cov_trace_pc_{enter,exit}, I don't think we ever
> experimented/evaluated this idea. Do you have any data that it's
> useful? I suspect that it can grow corpus too much.


Re: Add support to trace comparison instructions and switch statements

2017-09-12 Thread Dmitry Vyukov via gcc-patches
On Thu, Sep 7, 2017 at 9:02 AM, 吴潍浠(此彼)  wrote:
> Hi
> The trace-div and trace-gep options seems be used to evaluate corpus
> to trigger specific kind of bugs. And they don't have strong effect to 
> coverage.
>
> The trace-pc-guard is useful, but it may be much more complex than trace-pc.
> I think the best benefit of trace-pc-guard is avoiding dealing ASLR of 
> programs,
> especially programs with dlopen(). Seems hard to implement it in Linux kernel.
>
> The inline-8bit-counters and pc-table is too close to implementation of fuzz 
> tool and
> option trace-pc-guard .
>
> I am interesting in "stack-depth" and "func". If we trace user-defined 
> functions enter and exit,
> we can calculate stack-depth and difference level of stack to past existed 
> stack.
> Adding __sanitizer_cov_trace_pc_{enter,exit} is easy , but it is not standard 
> of llvm.
>
> How do you think Dmitry ?
>
> Wish Wu
>
> --
> From:Jakub Jelinek 
> Time:2017 Sep 6 (Wed) 22:37
> To:Wish Wu 
> Cc:Dmitry Vyukov ; gcc-patches ; 
> Jeff Law ; wishwu007 
> Subject:Re: Add support to trace comparison instructions and switch statements
>
>
> On Wed, Sep 06, 2017 at 07:47:29PM +0800, 吴潍浠(此彼) wrote:
>> Hi Jakub
>> I compiled libjpeg-turbo and libdng_sdk with options "-g -O3 -Wall 
>> -fsanitize-coverage=trace-pc,trace-cmp -fsanitize=address".
>> And run my fuzzer with pc and cmp feedbacks for hours. It works fine.
>> About __sanitizer_cov_trace_cmp{f,d} , yes, it  isn't provided by llvm. But 
>> once we trace integer comparisons, why not real type comparisons.
>> I remember Dmitry said it is not enough useful to trace real type 
>> comparisons because it is rare to see them in programs.
>> But libdng_sdk really has real type comparisons. So I want to keep them and 
>> implementing __sanitizer_cov_trace_const_cmp{f,d} may be necessary.
>
> Ok.  Please make sure those entrypoints make it into the various example
> __sanitier_cov_trace* fuzzer implementations though, so that people using
> -fsanitize-coverage=trace-cmp in GCC will not need to hack stuff themselves.
> At least it should be added to sanitizer_common (both in LLVM and GCC).
>
> BTW, https://clang.llvm.org/docs/SanitizerCoverage.html shows various other
> -fsanitize-coverage= options, some of them terribly misnamed (e.g. trace-gep
> using some weirdo LLVM IL acronym instead of being named by what it really
> traces (trace-array-idx or something similar)).
>
> Any plans to implement some or all of those?


Thanks, Jakub!

I've tested it on Linux kernel. Compiler does not crash, code is instrumented.

Re  terribly misnamed trace-gep, dunno, I will leave this to Kostya.

Re __sanitizer_cov_trace_cmp{f,d}, I am still not sure.

> But libdng_sdk really has real type comparisons.

Do they come from input data? In what format? How do you want to use
them? E.g. if they come from input but with using some non-trivial
transformation and the fuzzer will try to find them in the input, it
won't be able to do so.
On the other hand, it does not seem to be harmful, fuzzers that are
not interested in them can just add empty callbacks.

Re trace-pc-guard, I also don't have strong preference. Global
variables should work for kernel, but we probably will not use them in
kernel, because even aslr aside we would need to establish some global
numbering of these globals across multiple different machines. But
then it's much easier and simper to just use PCs as identifiers.

Re __sanitizer_cov_trace_pc_{enter,exit}, I don't think we ever
experimented/evaluated this idea. Do you have any data that it's
useful? I suspect that it can grow corpus too much.


Re: Add support to trace comparison instructions and switch statements

2017-09-03 Thread Dmitry Vyukov via gcc-patches
On Fri, Sep 1, 2017 at 6:23 PM, Jakub Jelinek  wrote:
> On Fri, Jul 21, 2017 at 01:38:17PM +0800, 吴潍浠(此彼) wrote:
>> Hi Jeff
>>
>> I have signed the copyright assignment, and used the name 'Wish Wu' .
>> Should I send you a copy of my assignment ?
>>
>> The attachment is my new patch with small changes.
>> Codes are checked by ./contrib/check_GNU_style.sh, except some special files.
>
> Please provide a ChangeLog entry, you can use ./contrib/mklog as a start.
>
> @@ -975,6 +974,10 @@ fsanitize=
>  Common Driver Report Joined
>  Select what to sanitize.
>
> +fsanitize-coverage=
> +Common Driver Report Joined
> +Select what to coverage sanitize.
> +
>
> Why Driver?  The reason fsanitize= needs it is that say for
> -fsanitize=address we add libraries in the driver, etc., but that
> isn't the case for the coverage, right?


Yes, there is no compiler-provided library that provides
implementation of the emitted instrumentation. User is meant to
provide them (or, use a third-party fuzzer that provides them).



> --- gcc/flag-types.h(revision 250199)
> +++ gcc/flag-types.h(working copy)
> @@ -250,6 +250,14 @@ enum sanitize_code {
>   | SANITIZE_BOUNDS_STRICT
>  };
>
> +/* Different trace modes.  */
> +enum sanitize_coverage_code {
> +  /* Trace PC.  */
> +  SANITIZE_COV_TRACE_PC = 1UL << 0,
> +  /* Trace Compare.  */
> +  SANITIZE_COV_TRACE_CMP = 1UL << 1
> +};
>
> No need for UL suffixes, the reason sanitize_code uses them is
> that it includes 1 << 16 and above and might be included even in target code
> (for host we require 32-bit integers, for target code it might be just
> 16-bit).
>
> --- gcc/opts.c  (revision 250199)
> +++ gcc/opts.c  (working copy)
> @@ -1519,6 +1519,17 @@ const struct sanitizer_opts_s sanitizer_opts[] =
>{ NULL, 0U, 0UL, false }
>  };
>
> +/* -f{,no-}sanitize-coverage= suboptions.  */
> +const struct sanitizer_opts_s coverage_sanitizer_opts[] =
> +{
> +#define SANITIZER_OPT(name, flags, recover) \
> +{ #name, flags, sizeof #name - 1, recover }
> +  SANITIZER_OPT (trace-pc, SANITIZE_COV_TRACE_PC, false),
> +  SANITIZER_OPT (trace-cmp, SANITIZE_COV_TRACE_CMP, false),
> +#undef SANITIZER_OPT
> +  { NULL, 0U, 0UL, false }
>
> No need to have the recover argument for the macro, just add false to it
> (unless you want to use a different struct type that wouldn't even include
> that member).
>
> +/* Given ARG, an unrecognized coverage sanitizer option, return the best
> +   matching coverage sanitizer option, or NULL if there isn't one.  */
> +
> +static const char *
> +get_closest_coverage_sanitizer_option (const string_fragment )
> +{
> +  best_match  bm (arg);
> +  for (int i = 0; coverage_sanitizer_opts[i].name != NULL; ++i)
> +{
> +  bm.consider (coverage_sanitizer_opts[i].name);
> +}
>
> Body which contains just one line shouldn't be wrapped in {}s, just use
>   for (int i = 0; coverage_sanitizer_opts[i].name != NULL; ++i)
> bm.consider (coverage_sanitizer_opts[i].name);
>
> +unsigned int
> +parse_coverage_sanitizer_options (const char *p, location_t loc,
> +unsigned int flags, int value, bool complain)
>
> Wrong formatting, unsigned int should go below const char *, like:
>
> parse_coverage_sanitizer_options (const char *p, location_t loc,
>   unsigned int flags, int value, bool 
> complain)
>
> +{
> +  while (*p != 0)
> +{
> +  size_t len, i;
> +  bool found = false;
> +  const char *comma = strchr (p, ',');
> +
> +  if (comma == NULL)
> +   len = strlen (p);
> +  else
> +   len = comma - p;
> +  if (len == 0)
> +   {
> + p = comma + 1;
> + continue;
> +   }
> +
> +  /* Check to see if the string matches an option class name.  */
> +  for (i = 0; coverage_sanitizer_opts[i].name != NULL; ++i)
> +   if (len == coverage_sanitizer_opts[i].len
> +   && memcmp (p, coverage_sanitizer_opts[i].name, len) == 0)
> + {
> +   if (value)
> + flags |= coverage_sanitizer_opts[i].flag;
> +   else
> + flags &= ~coverage_sanitizer_opts[i].flag;
> +   found = true;
> +   break;
> + }
> +
> +  if (! found && complain)
> +   {
> + const char *hint
> +   = get_closest_coverage_sanitizer_option (string_fragment (p, 
> len));
> +
> + if (hint)
> +   error_at (loc,
> + "unrecognized argument to "
> + "-f%ssanitize-coverage= option: %q.*s;"
> + " did you mean %qs?",
> + value ? "" : "no-",
> + (int) len, p, hint);
> + else
> +   error_at (loc,
> + "unrecognized argument to "
> + "-f%ssanitize-coverage= option: %q.*s",
> + value ? "" : "no-",
> + (int) len, p);
> +   }
> +
> +  if (comma == 

Re: Add support to trace comparison instructions and switch statements

2017-09-03 Thread Dmitry Vyukov via gcc-patches
On Sun, Sep 3, 2017 at 12:01 PM, Jakub Jelinek  wrote:
> On Sun, Sep 03, 2017 at 10:50:16AM +0200, Dmitry Vyukov wrote:
>> What we instrument in LLVM is _comparisons_ rather than control
>> structures. So that would be:
>> _4 = x_8(D) == 98;
>> For example, result of the comparison can be stored into a bool struct
>> field, and then used in branching long time after. We still want to
>> intercept this comparison.
>
> Then we need to instrument not just GIMPLE_COND, which is the stmt
> where the comparison decides to which of the two basic block successors to
> jump, but also GIMPLE_ASSIGN with tcc_comparison class
> gimple_assign_rhs_code (the comparison above), and maybe also
> GIMPLE_ASSIGN with COND_EXPR comparison code (that is say
>   _4 = x_1 == y_2 ? 23 : _3;
> ).
>
>> > Perhaps for -fsanitize-coverage= it might be a good idea to force
>> > LOGICAL_OP_NON_SHORT_CIRCUIT/BRANCH_COST or whatever affects GIMPLE
>> > decisions mentioned above so that the IL is closer to what the user wrote.
>>
>> If we recurse down to comparison operations and instrument them, this
>> will not be so important, right?
>
> Well, if you just handle tcc_comparison GIMPLE_ASSIGN and not GIMPLE_COND,
> then you don't handle many comparisons from the source code.  And if you
> handle both, some of the GIMPLE_CONDs might be just artificial comparisons.
> By pretending small branch cost for the tracing case you get fewer
> artificial comparisons.


Are these artificial comparisons on BOOLEAN_TYPE? I think BOOLEAN_TYPE
needs to be ignored entirely, there is just like 2 combinations of
possible values.
If not, then what it is? Is it a dup of previous comparisons?

I am not saying these modes should not be enabled. You know much
better. I just wanted to point that that integer comparisons is what
we should be handling.

Your example:

  _1 = x_8(D) == 21;
  _2 = x_8(D) == 64;
  _3 = _1 | _2;
  if (_3 != 0)

raises another point. Most likely we don't want to see speculative
comparisons. At least not yet (we will see them once we get through
the first comparison). So that may be another reason to enable these
modes (make compiler stick closer to original code).


Re: Add support to trace comparison instructions and switch statements

2017-09-03 Thread Dmitry Vyukov via gcc-patches
On Sun, Sep 3, 2017 at 12:19 PM, Dmitry Vyukov  wrote:
> On Sun, Sep 3, 2017 at 12:01 PM, Jakub Jelinek  wrote:
>> On Sun, Sep 03, 2017 at 10:50:16AM +0200, Dmitry Vyukov wrote:
>>> What we instrument in LLVM is _comparisons_ rather than control
>>> structures. So that would be:
>>> _4 = x_8(D) == 98;
>>> For example, result of the comparison can be stored into a bool struct
>>> field, and then used in branching long time after. We still want to
>>> intercept this comparison.
>>
>> Then we need to instrument not just GIMPLE_COND, which is the stmt
>> where the comparison decides to which of the two basic block successors to
>> jump, but also GIMPLE_ASSIGN with tcc_comparison class
>> gimple_assign_rhs_code (the comparison above), and maybe also
>> GIMPLE_ASSIGN with COND_EXPR comparison code (that is say
>>   _4 = x_1 == y_2 ? 23 : _3;
>> ).
>>
>>> > Perhaps for -fsanitize-coverage= it might be a good idea to force
>>> > LOGICAL_OP_NON_SHORT_CIRCUIT/BRANCH_COST or whatever affects GIMPLE
>>> > decisions mentioned above so that the IL is closer to what the user wrote.
>>>
>>> If we recurse down to comparison operations and instrument them, this
>>> will not be so important, right?
>>
>> Well, if you just handle tcc_comparison GIMPLE_ASSIGN and not GIMPLE_COND,
>> then you don't handle many comparisons from the source code.  And if you
>> handle both, some of the GIMPLE_CONDs might be just artificial comparisons.
>> By pretending small branch cost for the tracing case you get fewer
>> artificial comparisons.
>
>
> Are these artificial comparisons on BOOLEAN_TYPE? I think BOOLEAN_TYPE
> needs to be ignored entirely, there is just like 2 combinations of
> possible values.
> If not, then what it is? Is it a dup of previous comparisons?
>
> I am not saying these modes should not be enabled. You know much
> better. I just wanted to point that that integer comparisons is what
> we should be handling.
>
> Your example:
>
>   _1 = x_8(D) == 21;
>   _2 = x_8(D) == 64;
>   _3 = _1 | _2;
>   if (_3 != 0)
>
> raises another point. Most likely we don't want to see speculative
> comparisons. At least not yet (we will see them once we get through
> the first comparison). So that may be another reason to enable these
> modes (make compiler stick closer to original code).

Wait, it is not speculative in this case as branch is on _1 | _2. But
still, it just makes it harder for fuzzer to get through as it needs
to guess both values at the same time rather then doing incremental
progress.


Re: Add support to trace comparison instructions and switch statements

2017-09-03 Thread Dmitry Vyukov via gcc-patches
On Sun, Sep 3, 2017 at 12:38 PM, 吴潍浠(此彼)  wrote:
> Hi
> I will update the patch according to your requirements, and with some my 
> suggestions.
> It will take me one or two days.

Thanks! No hurry, just wanted to make sure you still want to pursue this.

> Wish Wu
>
> --
> From:Dmitry Vyukov 
> Time:2017 Sep 3 (Sun) 18:21
> To:Jakub Jelinek 
> Cc:Wish Wu ; gcc ; gcc-patches 
> ; Jeff Law ; wishwu007 
> 
> Subject:Re: Add support to trace comparison instructions and switch statements
>
>
> On Sun, Sep 3, 2017 at 12:19 PM, Dmitry Vyukov  wrote:
>> On Sun, Sep 3, 2017 at 12:01 PM, Jakub Jelinek  wrote:
>>> On Sun, Sep 03, 2017 at 10:50:16AM +0200, Dmitry Vyukov wrote:
 What we instrument in LLVM is _comparisons_ rather than control
 structures. So that would be:
 _4 = x_8(D) == 98;
 For example, result of the comparison can be stored into a bool struct
 field, and then used in branching long time after. We still want to
 intercept this comparison.
>>>
>>> Then we need to instrument not just GIMPLE_COND, which is the stmt
>>> where the comparison decides to which of the two basic block successors to
>>> jump, but also GIMPLE_ASSIGN with tcc_comparison class
>>> gimple_assign_rhs_code (the comparison above), and maybe also
>>> GIMPLE_ASSIGN with COND_EXPR comparison code (that is say
>>>   _4 = x_1 == y_2 ? 23 : _3;
>>> ).
>>>
 > Perhaps for -fsanitize-coverage= it might be a good idea to force
 > LOGICAL_OP_NON_SHORT_CIRCUIT/BRANCH_COST or whatever affects GIMPLE
 > decisions mentioned above so that the IL is closer to what the user 
 > wrote.

 If we recurse down to comparison operations and instrument them, this
 will not be so important, right?
>>>
>>> Well, if you just handle tcc_comparison GIMPLE_ASSIGN and not GIMPLE_COND,
>>> then you don't handle many comparisons from the source code.  And if you
>>> handle both, some of the GIMPLE_CONDs might be just artificial comparisons.
>>> By pretending small branch cost for the tracing case you get fewer
>>> artificial comparisons.
>>
>>
>> Are these artificial comparisons on BOOLEAN_TYPE? I think BOOLEAN_TYPE
>> needs to be ignored entirely, there is just like 2 combinations of
>> possible values.
>> If not, then what it is? Is it a dup of previous comparisons?
>>
>> I am not saying these modes should not be enabled. You know much
>> better. I just wanted to point that that integer comparisons is what
>> we should be handling.
>>
>> Your example:
>>
>>   _1 = x_8(D) == 21;
>>   _2 = x_8(D) == 64;
>>   _3 = _1 | _2;
>>   if (_3 != 0)
>>
>> raises another point. Most likely we don't want to see speculative
>> comparisons. At least not yet (we will see them once we get through
>> the first comparison). So that may be another reason to enable these
>> modes (make compiler stick closer to original code).
>
> Wait, it is not speculative in this case as branch is on _1 | _2. But
> still, it just makes it harder for fuzzer to get through as it needs
> to guess both values at the same time rather then doing incremental
> progress.


Re: [PATCH] tsan: Add optional support for distinguishing volatiles

2020-04-28 Thread Dmitry Vyukov via Gcc-patches
On Tue, Apr 28, 2020 at 4:55 PM Jakub Jelinek  wrote:
>
> On Tue, Apr 28, 2020 at 04:48:31PM +0200, Dmitry Vyukov wrote:
> > FWIW this is:
> >
> > Acked-by: Dmitry Vyukov 
> >
> > We just landed a similar change to llvm:
> > https://github.com/llvm/llvm-project/commit/5a2c31116f412c3b6888be361137efd705e05814
> >
> > Do you have any objections?
>
> I don't have objections or anything right now, we are just trying to
> finalize GCC 10 and once it branches, patches like this can be
> reviewed/committed for GCC11.

Thanks for clarification!
Then we will just wait.


Re: [PATCH] tsan: Add optional support for distinguishing volatiles

2020-04-28 Thread Dmitry Vyukov via Gcc-patches
On Thu, Apr 23, 2020 at 5:43 PM Marco Elver  wrote:
>
> Add support to optionally emit different instrumentation for accesses to
> volatile variables. While the default TSAN runtime likely will never
> require this feature, other runtimes for different environments that
> have subtly different memory models or assumptions may require
> distinguishing volatiles.
>
> One such environment are OS kernels, where volatile is still used in
> various places for various reasons, and often declare volatile to be
> "safe enough" even in multi-threaded contexts. One such example is the
> Linux kernel, which implements various synchronization primitives using
> volatile (READ_ONCE(), WRITE_ONCE()). Here the Kernel Concurrency
> Sanitizer (KCSAN) [1], is a runtime that uses TSAN instrumentation but
> otherwise implements a very different approach to race detection from
> TSAN.
>
> While in the Linux kernel it is generally discouraged to use volatiles
> explicitly, the topic will likely come up again, and we will eventually
> need to distinguish volatile accesses [2]. The other use-case is
> ignoring data races on specially marked variables in the kernel, for
> example bit-flags (here we may hide 'volatile' behind a different name
> such as 'no_data_race').
>
> [1] https://github.com/google/ktsan/wiki/KCSAN
> [2] 
> https://lkml.kernel.org/r/CANpmjNOfXNE-Zh3MNP=-gmnhvKbsfUfTtWkyg_=vqtxs4nn...@mail.gmail.com


Hi Jakub,

FWIW this is:

Acked-by: Dmitry Vyukov 

We just landed a similar change to llvm:
https://github.com/llvm/llvm-project/commit/5a2c31116f412c3b6888be361137efd705e05814

Do you have any objections?
Yes, I know volatile is not related to threading :) But 5 years we
have a similar patch for gcc for another race detector prototype:
https://gist.github.com/xairy/862ba3260348efe23a37decb93aa79e9
So this is not the first time this comes up.

Thanks




> 2020-04-23  Marco Elver  
>
> gcc/
> * params.opt: Define --param=tsan-distinguish-volatile=[0,1].
> * sanitizer.def (BUILT_IN_TSAN_VOLATILE_READ1): Define new
> builtin for volatile instrumentation of reads/writes.
> (BUILT_IN_TSAN_VOLATILE_READ2): Likewise.
> (BUILT_IN_TSAN_VOLATILE_READ4): Likewise.
> (BUILT_IN_TSAN_VOLATILE_READ8): Likewise.
> (BUILT_IN_TSAN_VOLATILE_READ16): Likewise.
> (BUILT_IN_TSAN_VOLATILE_WRITE1): Likewise.
> (BUILT_IN_TSAN_VOLATILE_WRITE2): Likewise.
> (BUILT_IN_TSAN_VOLATILE_WRITE4): Likewise.
> (BUILT_IN_TSAN_VOLATILE_WRITE8): Likewise.
> (BUILT_IN_TSAN_VOLATILE_WRITE16): Likewise.
> * tsan.c (get_memory_access_decl): Argument if access is
> volatile. If param tsan-distinguish-volatile is non-zero, and
> access if volatile, return volatile instrumentation decl.
> (instrument_expr): Check if access is volatile.
>
> gcc/testsuite/
> * c-c++-common/tsan/volatile.c: New test.
> ---
>  gcc/ChangeLog  | 19 +++
>  gcc/params.opt |  4 ++
>  gcc/sanitizer.def  | 21 
>  gcc/testsuite/ChangeLog|  4 ++
>  gcc/testsuite/c-c++-common/tsan/volatile.c | 62 ++
>  gcc/tsan.c | 53 --
>  6 files changed, 146 insertions(+), 17 deletions(-)
>  create mode 100644 gcc/testsuite/c-c++-common/tsan/volatile.c
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 5f299e463db..aa2bb98ae05 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,22 @@
> +2020-04-23  Marco Elver  
> +
> +   * params.opt: Define --param=tsan-distinguish-volatile=[0,1].
> +   * sanitizer.def (BUILT_IN_TSAN_VOLATILE_READ1): Define new
> +   builtin for volatile instrumentation of reads/writes.
> +   (BUILT_IN_TSAN_VOLATILE_READ2): Likewise.
> +   (BUILT_IN_TSAN_VOLATILE_READ4): Likewise.
> +   (BUILT_IN_TSAN_VOLATILE_READ8): Likewise.
> +   (BUILT_IN_TSAN_VOLATILE_READ16): Likewise.
> +   (BUILT_IN_TSAN_VOLATILE_WRITE1): Likewise.
> +   (BUILT_IN_TSAN_VOLATILE_WRITE2): Likewise.
> +   (BUILT_IN_TSAN_VOLATILE_WRITE4): Likewise.
> +   (BUILT_IN_TSAN_VOLATILE_WRITE8): Likewise.
> +   (BUILT_IN_TSAN_VOLATILE_WRITE16): Likewise.
> +   * tsan.c (get_memory_access_decl): Argument if access is
> +   volatile. If param tsan-distinguish-volatile is non-zero, and
> +   access if volatile, return volatile instrumentation decl.
> +   (instrument_expr): Check if access is volatile.
> +
>  2020-04-23  Srinath Parvathaneni  
>
> * config/arm/arm_mve.h (__arm_vbicq_n_u16): Modify function 
> parameter's
> diff --git a/gcc/params.opt b/gcc/params.opt
> index 4aec480798b..9b564bb046c 100644
> --- a/gcc/params.opt
> +++ b/gcc/params.opt
> @@ -908,6 +908,10 @@ Stop reverse growth if the reverse probability of best 
> edge is less than this th
>  Common Joined UInteger Var(param_tree_reassoc_width) Param 

Re: [PATCH] tsan: Add optional support for distinguishing volatiles

2020-05-18 Thread Dmitry Vyukov via Gcc-patches
On Wed, May 13, 2020 at 12:48 PM Marco Elver  wrote:
> > Hello, Jakub,
> >
> > On Tue, 28 Apr 2020 at 16:58, Dmitry Vyukov  wrote:
> > >
> > > On Tue, Apr 28, 2020 at 4:55 PM Jakub Jelinek  wrote:
> > > >
> > > > On Tue, Apr 28, 2020 at 04:48:31PM +0200, Dmitry Vyukov wrote:
> > > > > FWIW this is:
> > > > >
> > > > > Acked-by: Dmitry Vyukov 
> > > > >
> > > > > We just landed a similar change to llvm:
> > > > > https://github.com/llvm/llvm-project/commit/5a2c31116f412c3b6888be361137efd705e05814
> > > > >
> > > > > Do you have any objections?
> > > >
> > > > I don't have objections or anything right now, we are just trying to
> > > > finalize GCC 10 and once it branches, patches like this can be
> > > > reviewed/committed for GCC11.
> > >
> > > Thanks for clarification!
> > > Then we will just wait.
> >
> > Just saw the announcement that GCC11 is in development stage 1 [1]. In
> > case it is still too early, do let us know what time window we shall
> > follow up.
> >
> > Would it be useful to rebase and resend the patch?
>
> So, it's starting to look like we're really going to need this sooner
> than later. Given the feature is guarded behind a flag, and otherwise
> does not affect anything else, would it be possible to take this for
> GCC11? What do we need to do to make this happen?
>
> Thanks,
> -- Marco
>
> > [1] https://gcc.gnu.org/pipermail/gcc/2020-April/000505.html

Jakub, could you please give some update. Do we just wait? That's
fine, just want to understand because there are some interesting
discussions in the kernel re bumping compiler requirements.
Thanks