Re: HELP!! How to inhibit optimizations applied to .DEFERRED_INIT argument?

2021-06-30 Thread Qing Zhao via Gcc-patches


On Jun 30, 2021, at 2:46 AM, Richard Biener <rguent...@suse.de> wrote:

On Wed, 30 Jun 2021, Qing Zhao wrote:

Hi,

I am testing the 4th patch of -ftrivial-auto-var-init with CPU2017 today, and 
found the following issues:

In the dump file of “*t.i.031t.objsz1”, we have:

 :
 __s1_len_217 = .DEFERRED_INIT (__s1_len_176, 2);
 __s2_len_218 = .DEFERRED_INIT (__s2_len_177, 2);

It looks like this .DEFERRED_INIT initializes an already initialized
variable.

Yes.

For cases like the following:

int s2_len;
s2_len = 4;

i.e., the initialization is not at the declaration.

We cannot avoid initialization for such cases.

 I'd expect to only ever see default definition SSA names
as first argument to .DEFERRED_INIT.

You mean something like:
__s2_len_218 = .DEFERRED_INIT (__s2_len, 2);

?


 __s2_len_219 = 7;
 if (__s2_len_219 <= 3)
   goto ; [INV]
 else
   goto ; [INV]

  :
 _1 = (long unsigned int) i_175;


However, after “ccp”, in “t.i.032t.ccp1”, we have:

 :
 __s1_len_217 = .DEFERRED_INIT (__s1_len_176, 2);
 __s2_len_218 = .DEFERRED_INIT (7, 2);
 _36 = (long unsigned int) i_175;
 _37 = _36 * 8;
 _38 = argv_220(D) + _37;


It looks like the “ccp” optimization replaced the first argument of the call to 
.DEFERRED_INIT with the constant 7.
This should be avoided.

(NOTE: this issue existed in the previous patches; however, it is only exposed 
with this version since I added more verification
code in tree-cfg.c to verify the call to .DEFERRED_INIT.)

I am wondering what’s the best solution to this problem?

I think you have to trace where this "bogus" .DEFERRED_INIT comes from
originally.  Or alternatively, if this is unavoidable,

This is unavoidable, I believe.

add "constant
folding" of .DEFERRED_INIT so that deferred init of an initialized
object becomes the object itself, thus retaining the previous - eventually
partial - initialization only.

If this additional .DEFERRED_INIT is kept until the RTL expansion phase, then 
it will become a real initialization:

i.e.

s2_len = 0;//.DEFERRED_INIT expanded
s2_len = 4;// the original initialization

Then the first initialization will easily be eliminated by the existing RTL 
optimizations, right?
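
In source terms, the situation can be sketched roughly as follows. This is only an illustration in plain C, not GCC internals: the explicit zero store stands in for what an expanded .DEFERRED_INIT would emit under zero initialization, and the function names are made up for the example.

```c
#include <assert.h>

/* Original source: the variable is assigned after its declaration,
   so the declaration itself carries no initializer.  */
int length_plain(void)
{
    int s2_len;
    s2_len = 4;          /* the original initialization */
    return s2_len;
}

/* Hand-written stand-in for the same function after auto-init:
   the first store models the expanded .DEFERRED_INIT (zero mode).  */
int length_with_auto_init(void)
{
    int s2_len = 0;      /* models the expanded .DEFERRED_INIT */
    s2_len = 4;          /* the original initialization */
    return s2_len;
}

/* The extra store is dead: both versions return the same value.  */
void check_same_result(void)
{
    assert(length_plain() == length_with_auto_init());
}
```

Since the zero store is overwritten before any read, a dead-store elimination pass is free to drop it without changing the result.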

Qing


Richard.

Can we add an attribute to the internal function argument to prevent later 
optimizations from being applied to it?
Or just update the “ccp” phase to specially handle calls to .DEFERRED_INIT? (Not 
sure whether other phases have the
same issue.)

Let me know if you have any suggestion.

Thanks a lot for your help.

Qing

--
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)



HELP!! How to inhibit optimizations applied to .DEFERRED_INIT argument?

2021-06-29 Thread Qing Zhao via Gcc-patches
Hi, 

I am testing the 4th patch of -ftrivial-auto-var-init with CPU2017 today, and 
found the following issues:

In the dump file of “*t.i.031t.objsz1”, we have:

 :
  __s1_len_217 = .DEFERRED_INIT (__s1_len_176, 2);
  __s2_len_218 = .DEFERRED_INIT (__s2_len_177, 2);
  __s2_len_219 = 7;
  if (__s2_len_219 <= 3)
goto ; [INV]
  else
goto ; [INV]

   :
  _1 = (long unsigned int) i_175;
 

However, after “ccp”, in “t.i.032t.ccp1”, we have:

 :
  __s1_len_217 = .DEFERRED_INIT (__s1_len_176, 2);
  __s2_len_218 = .DEFERRED_INIT (7, 2);
  _36 = (long unsigned int) i_175;
  _37 = _36 * 8;
  _38 = argv_220(D) + _37;


It looks like the “ccp” optimization replaced the first argument of the call to 
.DEFERRED_INIT with the constant 7.
This should be avoided.

(NOTE: this issue existed in the previous patches; however, it is only exposed 
with this version since I added more verification
code in tree-cfg.c to verify the call to .DEFERRED_INIT.)

I am wondering what’s the best solution to this problem? 

Can we add an attribute to the internal function argument to prevent later 
optimizations from being applied to it?
Or just update the “ccp” phase to specially handle calls to .DEFERRED_INIT? (Not 
sure whether other phases have the
same issue.)

Let me know if you have any suggestion.

Thanks a lot for your help.

Qing

Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-22 Thread Qing Zhao via Gcc-patches
Okay.

Now, I believe that we agreed on the following:

For this current patch:

1. Use byte-repeatable pattern for pattern-initialization;
2. Use one pattern for all types;
3. Use “0xFE” for the byte pattern value.
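
For reference, here is a small sketch of how a byte-repeated pattern reads back through different types. It assumes IEEE 754 single-precision float and a two's complement 32-bit integer; the helper names are made up for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fill a float with one repeated byte and return the resulting value.  */
float float_from_byte(unsigned char byte)
{
    float f;
    memset(&f, byte, sizeof f);
    return f;
}

/* Same for a 32-bit signed integer.  */
int32_t int32_from_byte(unsigned char byte)
{
    int32_t i;
    memset(&i, byte, sizeof i);
    return i;
}

void check_patterns(void)
{
    /* 0xFF bytes read back as a plain -1, which is easy to mistake
       for a deliberate value.  */
    assert(int32_from_byte(0xFF) == -1);

    /* 0xFE bytes give an unusual integer and a very large negative
       float, roughly -1.69e+38.  */
    assert(int32_from_byte(0xFE) == -16843010);
    assert(float_from_byte(0xFE) < -1.69e38f);
    assert(float_from_byte(0xFE) > -1.70e38f);
}
```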

Possible future improvement:

1. Type specific patterns if needed;
2. User-specified pattern if needed; (add a new option for user to change the 
patterns).
3. Make the code generation part a target hook if needed.

Let me know if I missed anything.

Thanks.

Qing

> On Jun 22, 2021, at 1:18 PM, Richard Sandiford wrote:
> 
> Kees Cook  writes:
>> On Tue, Jun 22, 2021 at 09:25:57AM +0100, Richard Sandiford wrote:
>>> Kees Cook  writes:
>>>> On Mon, Jun 21, 2021 at 03:39:45PM +0000, Qing Zhao wrote:
>>>>> So, if “pattern value” is “0x”, then it’s a valid 
>>>>> canonical virtual memory address.  However, for most OS, 
>>>>> “0x” should be not in user space.
>>>>> 
>>>>> My question is, is “0xF” good for pointer? Or 
>>>>> “0x” better?
>>>> 
>>>> I think 0xFF repeating is fine for this version. Everything else is a
>>>> "nice to have" for the pattern-init, IMO. :)
>>> 
>>> Sorry to be awkward, but 0xFF seems worse than 0xAA to me.
>>> 
>>> For integer types, all values are valid representations, and we're
>>> relying on the pattern being “obviously” wrong in context.  0x…
>>> is unlikely to be a correct integer but 0x… would instead be a
>>> “nice” -1.  It would be difficult to tell in a debugger that a -1
>>> came from pattern init rather than a deliberate choice.
>> 
>> I can live with 0xAA. On x86_64, this puts it nicely in the middle of
>> the non-canonical space:
>> 
>> 0x8000 - 0x7fff
>> 
>> The only trouble is with 32-bit, where the value 0x is a
>> legitimate allocatable userspace address. If we want some kind-of middle
>> ground, how about 0xFE? That'll be non-canonical on x86_64, and at the
>> high end of the i386 kernel address space.
> 
> Sounds good to me FWIW.  That'd give float -1.694739530317379e+38
> (suspiciously big even for astrophysics, I hope!) and would still
> look unusual in an integer context.
> 
>>> I agree that, all other things being equal, it would be nice to use NaNs
>>> for floats.  But relying on wrong numerical values for floats doesn't
>>> seem worse than doing that for integers.
>>> 
>>> 0xAA… for float is (if I've got this right) -3.0316488252093987e-13,
>>> which admittedly doesn't stand out as wrong.  But I'm not sure we
>>> should sacrifice integer debugging for float debugging here.
>> 
>> In some future version type-specific patterns would be a nice improvement,
>> but I don't want that to block getting the zero-init portion landed. :)
> 
> Yeah.
> 
> Thanks,
> Richard



Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-22 Thread Qing Zhao via Gcc-patches


> On Jun 22, 2021, at 9:15 AM, Richard Biener  wrote:
> 
> On Tue, 22 Jun 2021, Qing Zhao wrote:
> 
>> 
>> 
>>> On Jun 22, 2021, at 9:00 AM, Richard Biener  wrote:
>>> 
>>> On Tue, 22 Jun 2021, Qing Zhao wrote:
>>> 
>>>> So, I am wondering why not still keep my current implementation of 
>>>> assigning different patterns to different types?
>>>> 
>>>> The major issue with this design is the code size and runtime overhead, 
>>>> but for debugging purposes, those are not that important, right? And we 
>>>> can add some optimization later to improve the code size and runtime 
>>>> overhead.
>>>> 
>>>> Otherwise, if we only use one pattern for all the types in this initial 
>>>> version, we might still need to change it later.
>>>> 
>>>> What do you think?
>>> 
>>> No, let's not re-open that discussion.  As said we can look to support
>>> multi-byte pattern if that has a chance to improve things but only
>>> as followup.
>> 
>> I am fine with this.
>> 
>> However, we need to decide now whether we will use a one-byte repeatable 
>> pattern or a multiple-byte repeatable pattern,
>> since the implementation will be different. With a one-byte pattern, the 
>> implementation will be the simplest: we can use memset for all of
>> VLA, non-VLA, zero-init, and pattern-init consistently.
>> 
>> However, if we choose a multiple-byte pattern, then the implementation will be 
>> different: we cannot use memset for pattern-init, and 
>> the implementation for VLA pattern-init is also different.
> 
> As said, we can do this as followup.  For now get the easiest thing
> working - one-byte patterns via memset.  

Okay. I will work on this.

> There's enough bits in the
> patch that will likely need followup fixes (the .DEFERRED_INIT stuff),

Do you mean your previous suggestion to merge the handling of VLA and non-VLA 
during the gimplification phase?
I have made this change locally.

> actual code generation of the init is separate enough we can deal with
> it later.  Also IMHO not all targets necessarily need to behave the
> same there.

Then, shall we make the code generation part a target hook now? Or do this 
later?

Qing
> 
> Richard.
> 
>> Qing
>>> 
>>> Thanks,
>>> Richard.
>>> 
>>>> Qing
>>>> 
>>>> On Jun 22, 2021, at 3:59 AM, Richard Biener <rguent...@suse.de> wrote:
>>>> 
>>>> On Tue, 22 Jun 2021, Richard Sandiford wrote:
>>>> 
>>>> Kees Cook <keesc...@chromium.org> writes:
>>>> On Mon, Jun 21, 2021 at 03:39:45PM +0000, Qing Zhao wrote:
>>>> So, if “pattern value” is “0x”, then it’s a valid 
>>>> canonical virtual memory address.  However, for most OS, 
>>>> “0x” should be not in user space.
>>>> 
>>>> My question is, is “0xF” good for pointer? Or 
>>>> “0x” better?
>>>> 
>>>> I think 0xFF repeating is fine for this version. Everything else is a
>>>> "nice to have" for the pattern-init, IMO. :)
>>>> 
>>>> Sorry to be awkward, but 0xFF seems worse than 0xAA to me.
>>>> 
>>>> For integer types, all values are valid representations, and we're
>>>> relying on the pattern being “obviously” wrong in context.  0x…
>>>> is unlikely to be a correct integer but 0x… would instead be a
>>>> “nice” -1.  It would be difficult to tell in a debugger that a -1
>>>> came from pattern init rather than a deliberate choice.
>>>> 
>>>> I agree that, all other things being equal, it would be nice to use NaNs
>>>> for floats.  But relying on wrong numerical values for floats doesn't
>>>> seem worse than doing that for integers.
>>>> 
>>>> 0xAA… for float is (if I've got this right) -3.0316488252093987e-13,
>>>> which admittedly doesn't stand out as wrong.  But I'm not sure we
>>>> should sacrifice integer debugging for float debugging here.
>>>> 
>>>> We can always expose the actual value as --param.  Now, I think
>>>> we'd need a two-byte pattern to reliably produce NaNs anyway,
>>>> so with floats taken out of the picture the focus should be on
>>>> pointers where IMHO val & 1 and val & 15 would be nice to have.
>>>> So sth like 0xf7 would work for those.  With a two-byte pattern
>>>> we could use 0xffef or 0x7fef.
>>>> 
>>>> Anyway, it's probably down to priorities of the project involved
>>>> (debugging FP stuff or integer stuff).
>>>> 
>>>> Richard.
>>>> 
>>>> 
>>> 
>>> -- 
>>> Richard Biener 
>>> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
>>> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
>> 
>> 
> 
> -- 
> Richard Biener 
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)



Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-22 Thread Qing Zhao via Gcc-patches


> On Jun 22, 2021, at 9:00 AM, Richard Biener  wrote:
> 
> On Tue, 22 Jun 2021, Qing Zhao wrote:
> 
>> So, I am wondering why not still keep my current implementation of 
>> assigning different patterns to different types?
>> 
>> The major issue with this design is the code size and runtime overhead, 
>> but for debugging purposes, those are not that important, right? And we 
>> can add some optimization later to improve the code size and runtime 
>> overhead.
>> 
>> Otherwise, if we only use one pattern for all the types in this initial 
>> version, we might still need to change it later.
>> 
>> What do you think?
> 
> No, let's not re-open that discussion.  As said we can look to support
> multi-byte pattern if that has a chance to improve things but only
> as followup.

I am fine with this.

However, we need to decide now whether we will use a one-byte repeatable pattern 
or a multiple-byte repeatable pattern,
since the implementation will be different. With a one-byte pattern, the 
implementation will be the simplest: we can use memset for all of
VLA, non-VLA, zero-init, and pattern-init consistently.

However, if we choose a multiple-byte pattern, then the implementation will be 
different: we cannot use memset for pattern-init, and 
the implementation for VLA pattern-init is also different.
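
To make the difference concrete, here is a rough user-level sketch in C. It is not the compiler-side expansion: the function names and the example two-byte pattern are hypothetical, chosen only to show why memset alone is enough for one byte but not for a wider pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One-byte pattern: a single memset covers any object, VLA or not,
   and zero-init is just the special case byte == 0.  */
void fill_byte_pattern(void *dst, size_t size, unsigned char byte)
{
    memset(dst, byte, size);
}

/* Multi-byte pattern: memset cannot express it, so the pattern is
   copied byte by byte, wrapping around at the pattern length.  */
void fill_multibyte_pattern(void *dst, size_t size,
                            const unsigned char *pattern,
                            size_t pattern_len)
{
    unsigned char *p = dst;

    assert(pattern_len > 0);
    for (size_t i = 0; i < size; i++)
        p[i] = pattern[i % pattern_len];
}
```

The second helper also has to cope with sizes that are not a multiple of the pattern length, which is part of what makes the multi-byte case more work.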

Qing
> 
> Thanks,
> Richard.
> 
>> Qing
>> 
>> On Jun 22, 2021, at 3:59 AM, Richard Biener <rguent...@suse.de> wrote:
>> 
>> On Tue, 22 Jun 2021, Richard Sandiford wrote:
>> 
>> Kees Cook <keesc...@chromium.org> writes:
>> On Mon, Jun 21, 2021 at 03:39:45PM +0000, Qing Zhao wrote:
>> So, if “pattern value” is “0x”, then it’s a valid canonical 
>> virtual memory address.  However, for most OS, “0x” should 
>> be not in user space.
>> 
>> My question is, is “0xF” good for pointer? Or 
>> “0x” better?
>> 
>> I think 0xFF repeating is fine for this version. Everything else is a
>> "nice to have" for the pattern-init, IMO. :)
>> 
>> Sorry to be awkward, but 0xFF seems worse than 0xAA to me.
>> 
>> For integer types, all values are valid representations, and we're
>> relying on the pattern being “obviously” wrong in context.  0x…
>> is unlikely to be a correct integer but 0x… would instead be a
>> “nice” -1.  It would be difficult to tell in a debugger that a -1
>> came from pattern init rather than a deliberate choice.
>> 
>> I agree that, all other things being equal, it would be nice to use NaNs
>> for floats.  But relying on wrong numerical values for floats doesn't
>> seem worse than doing that for integers.
>> 
>> 0xAA… for float is (if I've got this right) -3.0316488252093987e-13,
>> which admittedly doesn't stand out as wrong.  But I'm not sure we
>> should sacrifice integer debugging for float debugging here.
>> 
>> We can always expose the actual value as --param.  Now, I think
>> we'd need a two-byte pattern to reliably produce NaNs anyway,
>> so with floats taken out of the picture the focus should be on
>> pointers where IMHO val & 1 and val & 15 would be nice to have.
>> So sth like 0xf7 would work for those.  With a two-byte pattern
>> we could use 0xffef or 0x7fef.
>> 
>> Anyway, it's probably down to priorities of the project involved
>> (debugging FP stuff or integer stuff).
>> 
>> Richard.
>> 
>> 
> 
> -- 
> Richard Biener 
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)



Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-22 Thread Qing Zhao via Gcc-patches
So, I am wondering why not still keep my current implementation of assigning 
different patterns to different types?

The major issue with this design is the code size and runtime overhead, but 
for debugging purposes, those are not that important, right? And we can add some 
optimization later to improve the code size and runtime overhead.

Otherwise, if we only use one pattern for all the types in this initial 
version, we might still need to change it later.

What do you think?

Qing

On Jun 22, 2021, at 3:59 AM, Richard Biener <rguent...@suse.de> wrote:

On Tue, 22 Jun 2021, Richard Sandiford wrote:

Kees Cook <keesc...@chromium.org> writes:
On Mon, Jun 21, 2021 at 03:39:45PM +0000, Qing Zhao wrote:
So, if “pattern value” is “0x”, then it’s a valid canonical 
virtual memory address.  However, for most OS, “0x” should be 
not in user space.

My question is, is “0xF” good for pointer? Or 
“0x” better?

I think 0xFF repeating is fine for this version. Everything else is a
"nice to have" for the pattern-init, IMO. :)

Sorry to be awkward, but 0xFF seems worse than 0xAA to me.

For integer types, all values are valid representations, and we're
relying on the pattern being “obviously” wrong in context.  0x…
is unlikely to be a correct integer but 0x… would instead be a
“nice” -1.  It would be difficult to tell in a debugger that a -1
came from pattern init rather than a deliberate choice.

I agree that, all other things being equal, it would be nice to use NaNs
for floats.  But relying on wrong numerical values for floats doesn't
seem worse than doing that for integers.

0xAA… for float is (if I've got this right) -3.0316488252093987e-13,
which admittedly doesn't stand out as wrong.  But I'm not sure we
should sacrifice integer debugging for float debugging here.

We can always expose the actual value as --param.  Now, I think
we'd need a two-byte pattern to reliably produce NaNs anyway,
so with floats taken out of the picture the focus should be on
pointers where IMHO val & 1 and val & 15 would be nice to have.
So sth like 0xf7 would work for those.  With a two-byte pattern
we could use 0xffef or 0x7fef.

Anyway, it's probably down to priorities of the project involved
(debugging FP stuff or integer stuff).

Richard.



Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-21 Thread Qing Zhao via Gcc-patches


> On Jun 21, 2021, at 11:18 AM, Kees Cook  wrote:
> 
> On Mon, Jun 21, 2021 at 03:39:45PM +0000, Qing Zhao wrote:
>> So, if “pattern value” is “0x”, then it’s a valid canonical 
>> virtual memory address.  However, for most OS, “0x” should 
>> be not in user space.
>> 
>> My question is, is “0xF” good for pointer? Or 
>> “0x” better?
> 
> I think 0xFF repeating is fine for this version. Everything else is a
> "nice to have" for the pattern-init, IMO. :)

Okay, thank you!

Qing
> 
> -- 
> Kees Cook



Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-21 Thread Qing Zhao via Gcc-patches


> On Jun 21, 2021, at 10:35 AM, Richard Biener  wrote:
>>> I think we can drop -fauto-var-init=pattern and just go with block
>>> initializing which will cover padding as well which means we can
>>> stay with the odd -ftrivial-auto-var-init name used by CLANG and
>>> add no additional options.
>> 
>> Yes, this is a good idea. 
>> 
>> block initializing will cover all paddings automatically. 
>> 
>> Shall we do block initializing for both “zero initialization” and
>> “pattern initialization”?
>> 
>> Currently, for zero initialization, I used the following:
>> 
> +case AUTO_INIT_ZERO:
> +  init = build_zero_cst (TREE_TYPE (var));
> +  expand_assignment (var, init, false);
> +  break;
>> 
>> It looks like the current “expand_assignment” does not initialize
>> padding with zeroes. 
>> Shall I also use “memset” for “zero initialization”?
> 
> I'd say so, yes. 

Okay.

One more question for the current “expand_builtin_memset”:

Does the current implementation of “expand_builtin_memset” automatically handle 
short-length memsets optimally? 

I.e., do I need to specially handle char type, short type, or other types that 
can fit in a register?


>>> 
>>> There's no "safe" pattern besides all-zero for all "undefined" uses
>>> (note that uses do not necessarily use declared types).  Which is why
>>> recommending pattern init is somewhat misguided.  There's maybe 
>>> some useful pattern that more readily produces crashes, those that
>>> produce a FP sNaN for all of the float types.
>> 
>> So, pattern value as 0xFF might be better than 0xAA since 0x
>> will be a NaN value for floating type?
> 
> I think for debugging NaNs are quite nice, yes. 

For floating point, 0x is good. 
But for pointer type, is it good? (See my other email to Kees).

>> 
>> Not sure whether it’s necessary to expose this to user.
>> 
>> One question that is important to the implementation is:
>> 
>> Shall we use “byte-repeated” or “word-repeated” pattern?
>> Is “word-repeated” pattern better than “byte-repeated” pattern?
>> 
>> For implementation, “byte-repeated” pattern will make the whole
>> implementation much simpler since both “zero initialization” 
>> and “pattern initialization” can be implemented with “memset” with
>> different “value”.  
>> 
>> So, if “word-repeated” pattern will not have too much more benefit, I
>> will prefer “byte-repeated” pattern.
>> 
>> Let me know your comments here.
> 
> I have no strong opinion and prefer byte repetition for simplicity. But I 
> would document this as implementation detail that can change. 

Okay, if we finally decide to go with byte repetition, I will document this as 
implementation details that can be changed later.

Qing
> 
> Richard. 
> 
>>> 
>>>>> 
>>>>> As said, for example glibc allocator hardening with MALLOC_PERTURB_
>>>>> uses simple byte-init.
>>>> 
>>>> What’s the pattern glibc used?
>>> 
>>> The value of the MALLOC_PERTURB_ environment variable truncated to a byte.
>> 
>> Okay.
>> 
>> thanks.
>> 
>> Qing
>>> 
>>> Richard.
>>> 
> 



Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-21 Thread Qing Zhao via Gcc-patches
Hi, Kees,

On Jun 18, 2021, at 6:47 PM, Kees Cook <keesc...@chromium.org> wrote:

On Wed, Jun 16, 2021 at 07:39:02PM +, Qing Zhao wrote:
So, the major question now is:

Is one single repeatable pattern enough for pattern initialization for all 
different types of auto variables?

If YES, then the implementation for pattern initialization will be much easier 
and simpler,
 as you pointed out, and it will save me a lot of pain in implementing this part.
If NO, then we have to keep the current complicated implementation since it 
provides us
 the flexibility to assign different patterns to different types.

Honestly, I don’t have a good justification on this question myself.

The previous references I have so far are the current behavior of CLANG and 
Microsoft compiler.

For your reference,
. CLANG uses different patterns for INTEGER  (0x) and FLOAT 
(0x) and 32-bit pointer (0x00AA)
https://reviews.llvm.org/D54604
. Microsoft uses different patterns for INTEGERS ( 0xE2), FLOAT (1.0)
https://msrc-blog.microsoft.com/2020/05/13/solving-uninitialized-stack-memory-on-windows/

My understanding from CLANG’s comment is that the patterns make the program 
more likely to crash for the given type, and therefore make it easier to
catch any potential bugs.

Right, this is the justification for the different patterns. I am
fine with a static value for the first version of this functionality,
as long as it's a non-canonical virtual memory address when evaluated
as a pointer (so that the pattern can't be made to aim at a legitimate
fixed allocatable address in memory).

Just searched online, 
(https://en.wikipedia.org/wiki/X86-64#Virtual_address_space_details)

===
Canonical form addresses run from 0 through 7FFF', and from 
8000' through ', for a total of 256 TB of usable 
virtual address space.
===

So, if “pattern value” is “0x”, then it’s a valid canonical 
virtual memory address.  However, for most OSes, “0x” should not 
be in user space.

My question is, is “0xF” good for pointer? Or 
“0x” better?
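
As a quick way to compare candidate byte patterns for pointers, a check along these lines could be used. It is only a sketch and assumes x86_64 with 48-bit virtual addresses, where an address is canonical when bits 63..47 are all equal; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build a 64-bit value by repeating one byte eight times.  */
uint64_t repeat_byte64(uint8_t byte)
{
    return (uint64_t)byte * UINT64_C(0x0101010101010101);
}

/* Canonical under 48-bit addressing: bits 63..47 are all zero
   or all one.  */
bool is_canonical48(uint64_t addr)
{
    uint64_t top = addr >> 47;          /* the upper 17 bits */
    return top == 0 || top == UINT64_C(0x1FFFF);
}

void check_pointer_patterns(void)
{
    assert(is_canonical48(repeat_byte64(0x00)));   /* all zero bits */
    assert(is_canonical48(repeat_byte64(0xFF)));   /* all one bits */
    assert(!is_canonical48(repeat_byte64(0xAA)));  /* mixed upper bits */
    assert(!is_canonical48(repeat_byte64(0xFE)));  /* mixed upper bits */
}
```

Under that assumption, a repeated 0xFF byte still forms a canonical address, while repeated 0xAA and 0xFE do not.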

Thanks.
Qing


I don’t know why Microsoft chose its patterns this way.

So, for GCC, what should we do for pattern initialization: shall we choose 
one single repeatable pattern for all the types as you suggested,
or choose different patterns for different types, as Clang and the Microsoft 
compiler do?

Kees, do you have any comment on this?

How did Linux Kernel use -ftrivial-auto-var-init=pattern feature of CLANG?

It's just used as-is from the compiler, and recommended for "debug
builds".

--
Kees Cook



Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-21 Thread Qing Zhao via Gcc-patches
Hi, Richard,

> On Jun 21, 2021, at 2:53 AM, Richard Biener  wrote:
> 
>> 
>> 
>> This is for the compatibility with CLANG. -:). 
>> (https://reviews.llvm.org/D54604)
> 
> I don't care about functional 1:1 "compatibility" with CLANG.

Okay.  -:)

> 
>> 1. Pattern initialization
>> 
>>  This is the recommended initialization approach. Pattern initialization's
> 
> But elsewhere you said pattern initialization is only for debugging,
> not production …

Yes. Pattern initialization is only for debugging purpose during development 
phase.

> 
>> Use a pattern that fits them all.  I mean memory allocation hardening
>> fills allocated storage with a repeated (byte) pattern and people are
>> happy with that.  It also makes it easy to spot uninitialized storage
>> from a debugger.  So please, do not over-design this, it really doesn't
>> make any sense and the common case you are inevitably chasing here
>> would already be fine with a random repeated pattern.
>> 
>> So, My question is:
>> 
>> If we want to pattern initialize with a single repeated pattern for all 
>> types, which one is better to use:  “0x”
>> or “0x” , or another pattern that our current glibc uses? What’s that 
>> pattern?
> 
> It's set by the user.

Yes, looks like that glibc uses a byte-repeated pattern that is set by the user 
through environment variable.

> 
>> Will  “0x” in a floating type auto variable crash the program?
>> Will “0x” in a pointer type auto variable crash the program? (Might 
>> crash?)
>> 
>> 
>> (thus also my suggestion to split out
>> padding handling - now we can also split out pattern init handling,
>> maybe somebody else feels like reviewing and approving this, who knows).
>> 
>> I am okay with further splitting pattern initialization part to a separate 
>> patch. Then we will
>> have 4 independent patches in total:
>> 
>> 1. -fauto-var-init=zero and all the handling in other passes to the new 
>> added call to .DEFERRED_INIT.
>> 2. Add -fauto-var-init=pattern
>> 3. Add -fauto-var-init-padding
>> 4. Add -ftrivial-auto-var-init for CLANG compatibility.
>> 
>> Are the above the correct understanding?
> 
> I think we can drop -fauto-var-init=pattern and just go with block
> initializing which will cover padding as well which means we can
> stay with the odd -ftrivial-auto-var-init name used by CLANG and
> add no additional options.

Yes, this is a good idea. 

block initializing will cover all paddings automatically. 

Shall we do block initializing for both “zero initialization” and “pattern 
initialization”?

Currently, for zero initialization, I used the following:

>>> +case AUTO_INIT_ZERO:
>>> +  init = build_zero_cst (TREE_TYPE (var));
>>> +  expand_assignment (var, init, false);
>>> +  break;

It looks like the current “expand_assignment” does not initialize padding 
with zeroes. 
Shall I also use “memset” for “zero initialization”?
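
The effect of block initialization on padding can be seen with a small user-level sketch. This is plain C for illustration only, not the expansion code; the struct and helper names are made up, and how much padding the struct actually has depends on the target ABI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct with_padding {
    char c;   /* typically followed by padding bytes before i */
    int  i;
};

/* Return true if every byte of the object, padding included,
   equals the given byte.  */
bool all_bytes_equal(const void *obj, size_t size, unsigned char byte)
{
    const unsigned char *p = obj;

    for (size_t k = 0; k < size; k++)
        if (p[k] != byte)
            return false;
    return true;
}

void check_block_init(void)
{
    struct with_padding s;

    /* Block init writes every byte, so padding is covered too.  */
    memset(&s, 0xFE, sizeof s);
    assert(all_bytes_equal(&s, sizeof s, 0xFE));

    /* Zero-init is the same operation with a different value.  */
    memset(&s, 0, sizeof s);
    assert(all_bytes_equal(&s, sizeof s, 0));
}
```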

> 
>> As said, block-initializing with a repeated pattern is OK and I can see
>> that being useful.  Trying to produce "nicer" values for floats, bools
>> and pointers on 32bit platforms is IMHO not going to fix anything and
>> introduce as many problems as it will "fix".
>> 
>> Yes, I agree, if we can find a good repeated pattern for all types’s 
>> pattern initialization, that will be much easier and simpler to 
>> implement, I am happy to do that.  (Honestly, the part of implementation 
>> that took me most of the time is pattern-initialization.. and I am still 
>> not very comfortable with this part Of the code myself.  -:)
> 
> There's no "safe" pattern besides all-zero for all "undefined" uses
> (note that uses do not necessarily use declared types).  Which is why
> recommending pattern init is somewhat misguided.  There's maybe 
> some useful pattern that more readily produces crashes, those that
> produce a FP sNaN for all of the float types.

So, pattern value as 0xFF might be better than 0xAA since 0x will be a 
NaN value for floating type?

> 
>> And if you block-initialize stuff you then automagically cover padding.
>> I call this a win-win, no?
>> 
>> Yes, this will also initialize padding with the pattern (not zeroes, as CLANG 
>> does).
>> Shall we be compatible with CLANG on this?
> 
> No, why?

Okay.

>> in my example code (untested) you then still need
>> 
>>  expand_assignment (var, ctor, false);
>> 
>> it would be the easiest way to try pattern init with a pattern that's
>> bigger than a byte (otherwise of course the memset path is optimal).
>> 
>> If the pattern that is used to initialize all types is byte-repeatable, for 
>> example, 0xA or 0xF, then
>> We can use memset to initialize all types, however, the potential problem 
>> is, if later we decide
>> To change to another pattern that might not be byte-repeatable, then the 
>> memset implementation
>> is not proper at that time.
>> 
>> Is it possible that we might change the pattern later?
> 
> The pattern should be documented as an implementation detail 

Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-16 Thread Qing Zhao via Gcc-patches
So, the major question now is:

Is one single repeatable pattern enough for pattern initialization for all 
different types of auto variables?

If YES, then the implementation for pattern initialization will be much easier 
and simpler,
  as you pointed out, and it will save me a lot of pain in implementing this part.
If NO, then we have to keep the current complicated implementation since it 
provides us
  the flexibility to assign different patterns to different types.

Honestly, I don’t have a good justification on this question myself.

The previous references I have so far are the current behavior of CLANG and 
Microsoft compiler.

For your reference,
. CLANG uses different patterns for INTEGER  (0x) and FLOAT 
(0x) and 32-bit pointer (0x00AA)
https://reviews.llvm.org/D54604
. Microsoft uses different patterns for INTEGERS ( 0xE2), FLOAT (1.0)
https://msrc-blog.microsoft.com/2020/05/13/solving-uninitialized-stack-memory-on-windows/

My understanding from CLANG’s comment is that the patterns make the program 
more likely to crash for the given type, and therefore make it easier to
catch any potential bugs.
I don’t know why Microsoft chose its patterns this way.

So, for GCC, what should we do for pattern initialization: shall we choose 
one single repeatable pattern for all the types as you suggested,
or choose different patterns for different types, as Clang and the Microsoft 
compiler do?

Kees, do you have any comment on this?

How did Linux Kernel use -ftrivial-auto-var-init=pattern feature of CLANG?

Thanks.

Qing





Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-16 Thread Qing Zhao via Gcc-patches
Hi, Richard,

On Jun 16, 2021, at 1:19 AM, Richard Biener <rguent...@suse.de> wrote:

+/* Expand the IFN_DEFERRED_INIT function according to its second
+   argument.  */
+static void
+expand_DEFERRED_INIT (internal_fn, gcall *stmt)
+{
+  tree var = gimple_call_lhs (stmt);
+  tree init = NULL_TREE;
+  enum auto_init_type init_type
+    = (enum auto_init_type) TREE_INT_CST_LOW (gimple_call_arg (stmt, 1));
+
+  switch (init_type)
+    {
+    default:
+      gcc_unreachable ();
+    case AUTO_INIT_PATTERN:
+      init = build_pattern_cst_for_auto_init (TREE_TYPE (var));
+      expand_assignment (var, init, false);
+      break;
+    case AUTO_INIT_ZERO:
+      init = build_zero_cst (TREE_TYPE (var));
+      expand_assignment (var, init, false);
+      break;
+    }

I think actually building build_pattern_cst_for_auto_init can generate
massive garbage and for big auto vars code size is also a concern and
ideally on x86 you'd produce rep movq.  So I don't think going
via expand_assignment is good.  Instead you possibly want to lower
.DEFERRED_INIT to MEMs following expand_builtin_memset and
eventually enhance that to allow storing pieces larger than a byte.

Due to “BOOLEAN_TYPE” and “POINTER_TYPE”, we cannot always have a
repeated byte-pattern for variables that include BOOLEAN_TYPE Or Pointer
types. Therefore, lowering the .DEFERRED_INIT for “PATTERN”
initialization through “memset” is not always possible.

Let me know if I miss anything in the above. Do you have other suggestions?

The main point is that you need to avoid building the explicit initializer
only to have it consumed by assignment expansion.  If you want to keep
all the singing and dancing (as opposed to maybe initializing with a
0x1 byte pattern) then I think for efficiency you still want to
block-initialize the variable and then only fixup the special fields.

Yes, this is a good idea.

We can memset the whole structure with repeated pattern “0xAA” first,
then fix up BOOLEAN_TYPE fields and, on 32-bit platforms, POINTER_TYPE fields.
That might be more efficient.
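
The "block-initialize, then fix up" idea above can be sketched in plain C. This is only an illustration of the technique at the source level, not the RTL expansion code discussed in the thread; the struct layout and the helper name pattern_init_sample are made up for the example, and only the boolean fix-up (to 0, as described in the thread) is shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical aggregate used only for this illustration.  */
struct sample
{
  uint32_t count;
  bool ready;
  double weight;
};

/* Block-initialize *S with the repeated byte 0xAA (this also covers any
   padding), then fix up the one field whose pattern is not a repeated
   0xAA byte.  */
static void
pattern_init_sample (struct sample *s)
{
  assert (s != NULL);
  memset (s, 0xAA, sizeof *s);
  s->ready = false;	/* fix-up store: booleans get 0 */
}
```

Only one extra store is needed here; an aggregate made mostly of special fields would need many more, which is the efficiency concern raised later in the thread.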

However, after more consideration, I feel that this might be a more
general optimization for “store_constructor” itself:

I.e., if a repeated byte value such as “0xAA” covers more than a certain
threshold of the “constructor” (e.g., 70% of the total size), then we could use
a call to memset first, and then emit some additional single-field stores to
fix up the fields that have different initialization values?

This is just like the current handling of “zeroes” in “store_constructor”:
if “zeroes” occupy most of the constructor, then “clear the whole structure”
first, and then emit additional single-field stores to fix up the fields that
do not hold zeros.
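
As a rough sketch of the heuristic proposed above, here is a byte-level model in C. It is not store_constructor and does not use any GCC internals; the helper name store_image, the byte-image representation of the initializer, and the fixed 70% threshold are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not GCC code: store the SIZE-byte initializer
   image IMAGE into DST.  If a single byte value covers at least 70% of
   the image, block-fill DST with it and patch only the bytes that
   differ; otherwise copy the image directly.  Returns true when the
   block-fill path was taken.  */
static bool
store_image (unsigned char *dst, const unsigned char *image, size_t size)
{
  size_t counts[256] = { 0 };
  unsigned char best = 0;

  assert (dst != NULL && image != NULL);

  /* Find the most frequent byte value in the image.  */
  for (size_t i = 0; i < size; i++)
    if (++counts[image[i]] > counts[best])
      best = image[i];

  /* Below the (assumed) 70% threshold: plain copy.  */
  if (size == 0 || counts[best] * 10 < size * 7)
    {
      memcpy (dst, image, size);
      return false;
    }

  memset (dst, best, size);
  for (size_t i = 0; i < size; i++)
    if (image[i] != best)
      dst[i] = image[i];	/* single fix-up store */
  return true;
}
```

The zero-clearing case mentioned above is the special case where the dominant byte is 0.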

So, I think that it might be better to keep the current
“expand_assignment” for “Pattern initialization” as it is in this patch.

And then, later we can add a separate patch to add this more general
optimization in “store_constructor” to improve the run time performance
and code size in general?

What’s your opinion on this?

My point is that _building_ the constructor is what we want to avoid
since that involves a lot of overhead memory-wise, it also requires
yet another complex structure field walk with much room for errors.

Block-initializing the object is so much easier and more efficient.
Implementing block initialization with a block size different from
a single byte should be also reasonably possible.  I mean there's
wmemset (not in GCC), so such block initialization would have other
uses as well.
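
To make the "block size different from a single byte" idea concrete, here is a small C sketch of a wmemset-like fill with a 4-byte block. The helper name fill_pattern32 is hypothetical, and this is a library-level model only; it says nothing about how GCC would emit such stores at RTL expansion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fill SIZE bytes at DST with the 4-byte PATTERN
   repeated.  A trailing partial block receives the leading bytes of the
   pattern as they are laid out in memory.  */
static void
fill_pattern32 (void *dst, uint32_t pattern, size_t size)
{
  unsigned char *p = dst;

  assert (dst != NULL || size == 0);

  while (size >= sizeof pattern)
    {
      memcpy (p, &pattern, sizeof pattern);
      p += sizeof pattern;
      size -= sizeof pattern;
    }
  if (size > 0)
    memcpy (p, &pattern, size);
}
```

With a pattern whose four bytes are equal this degenerates to an ordinary memset, which is the single-byte case discussed above.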

If the pattern of the value that is used to initialize is repeatable, then
block-initializing is ideal. However, the patterns might not be completely
repeatable due to BOOLEAN (0), POINTER_TYPE on 32-bit platforms (0x00AA) and
FLOATING TYPE (NaN), so after block-initializing the whole object we still need
to add fix-up stores of these different patterns to the corresponding fields.

But that's a bug with the pattern used then.  You can never be sure that
an object is used only as its declared type but you are initializing it
as if it were.  Also all uninit uses invoke undefined behavior so I don't
see why you need to pay special attention here.  After all this makes
pattern init so much more fragile than zero-init which makes me question
it even more ...

Yes, you are right.  The major reason for the complexity of the code to handle 
pattern initialization
is because multiple different patterns are assigned to different types.

This is for the compatibility with CLANG. -:). (https://reviews.llvm.org/D54604)

For reference, I copied the part for pattern initialization from CLANG’s patch 
below:


1. Pattern initialization

  This is the recommended initialization approach. Pattern initialization's
  goal is to initialize automatic variables with values which will likely
  transform logic bugs into crashes down the line, are easily recognizable in
  a crash dump, 

Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-15 Thread Qing Zhao via Gcc-patches
Hi, Richard,


> On Jun 15, 2021, at 8:21 AM, Richard Biener  wrote:
>> 
>> 
>> +/* Expand the IFN_DEFERRED_INIT function according to its second argument.  */
>> +static void
>> +expand_DEFERRED_INIT (internal_fn, gcall *stmt)
>> +{
>> +  tree var = gimple_call_lhs (stmt);
>> +  tree init = NULL_TREE;
>> +  enum auto_init_type init_type
>> +    = (enum auto_init_type) TREE_INT_CST_LOW (gimple_call_arg (stmt, 1));
>> +
>> +  switch (init_type)
>> +    {
>> +    default:
>> +      gcc_unreachable ();
>> +    case AUTO_INIT_PATTERN:
>> +      init = build_pattern_cst_for_auto_init (TREE_TYPE (var));
>> +      expand_assignment (var, init, false);
>> +      break;
>> +    case AUTO_INIT_ZERO:
>> +      init = build_zero_cst (TREE_TYPE (var));
>> +      expand_assignment (var, init, false);
>> +      break;
>> +    }
>> 
>> I think actually building build_pattern_cst_for_auto_init can generate
>> massive garbage and for big auto vars code size is also a concern and
>> ideally on x86 you'd produce rep movq.  So I don't think going
>> via expand_assignment is good.  Instead you possibly want to lower
>> .DEFERRED_INIT to MEMs following expand_builtin_memset and
>> eventually enhance that to allow storing pieces larger than a byte.
>> 
>> Due to “BOOLEAN_TYPE” and “POINTER_TYPE”, we cannot always have a
>> repeated byte-pattern for variables that include BOOLEAN_TYPE Or Pointer
>> types. Therefore, lowering the .DEFERRED_INIT for “PATTERN”
>> initialization through “memset” is not always possible.
>> 
>> Let me know if I miss anything in the above. Do you have other suggestions?
>> 
>> The main point is that you need to avoid building the explicit initializer
>> only to have it consumed by assignment expansion.  If you want to keep
>> all the singing and dancing (as opposed to maybe initializing with a
>> 0x1 byte pattern) then I think for efficiency you still want to
>> block-initialize the variable and then only fixup the special fields.
>> 
>> Yes, this is a good idea.
>> 
>> We can memset the whole structure with repeated pattern “0xAA” first,
>> then fix up BOOLEAN_TYPE fields and, on 32-bit platforms, POINTER_TYPE fields.
>> That might be more efficient.
>> 
>> However, after more consideration, I feel that this might be a more 
>> general optimization for “store_constructor” itself:
>> 
>> I.e., if a repeated byte value such as “0xAA” covers more than a certain
>> threshold of the “constructor” (e.g., 70% of the total size), then we could
>> use a call to memset first, and then emit some additional single-field
>> stores to fix up the fields that have different initialization values?
>> 
>> This is just like the current handling of “zeroes” in “store_constructor”:
>> if “zeroes” occupy most of the constructor, then “clear the whole structure”
>> first, and then emit additional single-field stores to fix up the fields
>> that do not hold zeros.
>> 
>> So, I think that it might be better to keep the current 
>> “expand_assignment” for “Pattern initialization” as it is in this patch.
>> 
>> And then, later we can add a separate patch to add this more general 
>> optimization in “store_constructor” to improve the run time performance 
>> and code size in general?
>> 
>> What’s your opinion on this?
> 
> My point is that _building_ the constructor is what we want to avoid
> since that involves a lot of overhead memory-wise, it also requires
> yet another complex structure field walk with much room for errors.

So, you mean I should completely get rid of the newly added routine
“build_pattern_cst_for_auto_init”, since it builds constructors for RECORD,
UNION, and ARRAY types, and the current RTL expansion of constructor
assignment is not efficient enough for pattern initialization purposes?

> 
> Block-initializing the object is so much easier and more efficient.
> Implementing block initialization with a block size different from
> a single byte should be also reasonably possible.  I mean there's
> wmemset (not in GCC), so such block initialization would have other
> uses as well.

If the pattern of the value that is used to initialize is repeatable, then
block-initializing is ideal. However, the patterns might not be completely
repeatable due to BOOLEAN (0), POINTER_TYPE on 32-bit platforms (0x00AA) and
FLOATING TYPE (NaN), so after block-initializing the whole object we still need
to add fix-up stores of these different patterns to the corresponding fields.

For objects whose fields are mostly BOOLEAN, POINTER_TYPE, or FLOATING_TYPE,
pattern initialization like this might be less efficient. Do you agree?


> 
> I'm going to repeatedly point at those large chunks of code that
> handle padding and building the CTOR - I don't even want to review
> them ;)  They should not exist

So, just want to confirm -:),  do you mean to completely delete the routine 
“build_pattern_cst_for_auto_init”? And then use the approach you 

Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-14 Thread Qing Zhao via Gcc-patches
Hi, Richard:

On Jun 11, 2021, at 10:49 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:


On May 26, 2021, at 6:18 AM, Richard Biener <rguent...@suse.de> wrote:

+/* Expand the IFN_DEFERRED_INIT function according to its second argument.  */
+static void
+expand_DEFERRED_INIT (internal_fn, gcall *stmt)
+{
+  tree var = gimple_call_lhs (stmt);
+  tree init = NULL_TREE;
+  enum auto_init_type init_type
+    = (enum auto_init_type) TREE_INT_CST_LOW (gimple_call_arg (stmt, 1));
+
+  switch (init_type)
+    {
+    default:
+      gcc_unreachable ();
+    case AUTO_INIT_PATTERN:
+      init = build_pattern_cst_for_auto_init (TREE_TYPE (var));
+      expand_assignment (var, init, false);
+      break;
+    case AUTO_INIT_ZERO:
+      init = build_zero_cst (TREE_TYPE (var));
+      expand_assignment (var, init, false);
+      break;
+    }

I think actually building build_pattern_cst_for_auto_init can generate
massive garbage and for big auto vars code size is also a concern and
ideally on x86 you'd produce rep movq.  So I don't think going
via expand_assignment is good.  Instead you possibly want to lower
.DEFERRED_INIT to MEMs following expand_builtin_memset and
eventually enhance that to allow storing pieces larger than a byte.

Due to “BOOLEAN_TYPE” and “POINTER_TYPE”, we cannot always have a
repeated byte-pattern for variables that include BOOLEAN_TYPE Or Pointer
types. Therefore, lowering the .DEFERRED_INIT for “PATTERN”
initialization through “memset” is not always possible.

Let me know if I miss anything in the above. Do you have other suggestions?

The main point is that you need to avoid building the explicit initializer
only to have it consumed by assignment expansion.  If you want to keep
all the singing and dancing (as opposed to maybe initializing with a
0x1 byte pattern) then I think for efficiency you still want to
block-initialize the variable and then only fixup the special fields.

Yes, this is a good idea.

We can memset the whole structure with repeated pattern “0xAA” first,
then fix up BOOLEAN_TYPE fields and, on 32-bit platforms, POINTER_TYPE fields.
That might be more efficient.

However, after more consideration, I feel that this might be a more general 
optimization for “store_constructor” itself:

I.e., if a repeated byte value such as “0xAA” covers more than a certain
threshold of the “constructor” (e.g., 70% of the total size), then we could use
a call to memset first, and then emit some additional single-field stores to
fix up the fields that have different initialization values?

This is just like the current handling of “zeroes” in “store_constructor”:
if “zeroes” occupy most of the constructor, then “clear the whole structure”
first, and then emit additional single-field stores to fix up the fields that
do not hold zeros.

So, I think that it might be better to keep the current “expand_assignment”  
for “Pattern initialization” as it is in this patch.

And then, later we can  add a separate patch to add this more general 
optimization in “store_constructor” to improve the
run time performance and code size in general?

What’s your opinion on this?

Qing



Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-11 Thread Qing Zhao via Gcc-patches


On Jun 11, 2021, at 10:49 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:



On Jun 11, 2021, at 6:12 AM, Richard Biener <rguent...@suse.de> wrote:

On Thu, 10 Jun 2021, Qing Zhao wrote:

Hi, Richard,

I need more discussion on the following comments you raised:

On May 26, 2021, at 6:18 AM, Richard Biener <rguent...@suse.de> wrote:

+/* Expand the IFN_DEFERRED_INIT function according to its second argument.  */
+static void
+expand_DEFERRED_INIT (internal_fn, gcall *stmt)
+{
+  tree var = gimple_call_lhs (stmt);
+  tree init = NULL_TREE;
+  enum auto_init_type init_type
+    = (enum auto_init_type) TREE_INT_CST_LOW (gimple_call_arg (stmt, 1));
+
+  switch (init_type)
+    {
+    default:
+      gcc_unreachable ();
+    case AUTO_INIT_PATTERN:
+      init = build_pattern_cst_for_auto_init (TREE_TYPE (var));
+      expand_assignment (var, init, false);
+      break;
+    case AUTO_INIT_ZERO:
+      init = build_zero_cst (TREE_TYPE (var));
+      expand_assignment (var, init, false);
+      break;
+    }

I think actually building build_pattern_cst_for_auto_init can generate
massive garbage and for big auto vars code size is also a concern and
ideally on x86 you'd produce rep movq.  So I don't think going
via expand_assignment is good.  Instead you possibly want to lower
.DEFERRED_INIT to MEMs following expand_builtin_memset and
eventually enhance that to allow storing pieces larger than a byte.

When I tried to lower .DEFERRED_INIT to MEMs for  “AUTO_INIT_PATTERN”, I have 
the following questions:

1. If .DEFERRED_INIT will be lowered to MEMS through “memset”, then we 
basically initialize the whole memory covering the
auto variable, including paddings. Right?

Yes.

2. Only when the value that is used for initialization has a repeated
 byte-pattern can we lower it through “memset”. Otherwise, if the
 value does not have a repeated byte-pattern, we can NOT lower it
 through “memset”, right?

Yes.  This is why I said you should do it _similar_ to how memcpy
is implemented.  OTOH I don't see a good reason to support patterns
that are bigger than a byte ...
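
The condition in question 2 can be written down as a small C predicate. This is a sketch for illustration only; the helper name has_repeated_byte_pattern is hypothetical and does not come from the patch. A value passes when every one of its bytes equals the lowest byte, which is exactly the case a plain memset can produce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return true if the low WIDTH bytes of VALUE are
   all the same byte, i.e. the value could be produced by memset.  */
static bool
has_repeated_byte_pattern (uint64_t value, size_t width)
{
  assert (width >= 1 && width <= sizeof (uint64_t));

  uint8_t first = (uint8_t) (value & 0xff);
  for (size_t i = 1; i < width; i++)
    if ((uint8_t) ((value >> (8 * i)) & 0xff) != first)
      return false;
  return true;
}
```

For example, a 64-bit value made of eight 0xAA bytes passes, while a 32-bit value whose only nonzero byte is the lowest one does not and would need a wider store.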

Currently, for the values that are used to initialize for “AUTO_INIT_PATTERN”, 
we have:

/* The following value is a guaranteed unmappable pointer value and has a
   repeated byte-pattern which makes it easier to synthesize.  We use it for
   pointers as well as integers so that aggregates are likely to be
   initialized with this repeated value.  */
uint64_t largevalue = 0xull;
/* For 32-bit platforms it's a bit trickier because, across systems, only the
   zero page can reasonably be expected to be unmapped, and even then we need
   a very low address.  We use a smaller value, and that value sadly doesn't
   have a repeated byte-pattern.  We don't use it for integers.  */
uint32_t smallvalue = 0x00AA;

In addition to the above, for BOOLEAN_TYPE:

  case BOOLEAN_TYPE:
    /* We think that initializing the boolean variable to 0 rather than 1
       is better even for pattern initialization.  */

Due to “BOOLEAN_TYPE” and “POINTER_TYPE”, we cannot always have a
repeated byte-pattern for variables that include BOOLEAN_TYPE Or Pointer
types. Therefore, lowering the .DEFERRED_INIT for “PATTERN”
initialization through “memset” is not always possible.

Let me know if I miss anything in the above. Do you have other suggestions?

The main point is that you need to avoid building the explicit initializer
only to have it consumed by assignment expansion.  If you want to keep
all the singing and dancing (as opposed to maybe initializing with a
0x1 byte pattern) then I think for efficiency you still want to
block-initialize the variable and then only fixup the special fields.

Yes, this is a good idea.

We can memset the whole structure with repeated pattern “0xAA” first,
then fix up BOOLEAN_TYPE fields and, on 32-bit platforms, POINTER_TYPE fields.
That might be more efficient.

However, the paddings will be initialized to “0xAA”.
But this should be fine since with -fauto-var-init,  the paddings can be any 
value.

So, still should be fine.

Qing



Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-11 Thread Qing Zhao via Gcc-patches


> On Jun 11, 2021, at 6:12 AM, Richard Biener  wrote:
> 
> On Thu, 10 Jun 2021, Qing Zhao wrote:
> 
>> Hi, Richard,
>> 
>> I need more discussion on the following comments you raised:
>> 
>>> On May 26, 2021, at 6:18 AM, Richard Biener  wrote:
>>> 
>>> +/* Expand the IFN_DEFERRED_INIT function according to its second argument.  */
>>> +static void
>>> +expand_DEFERRED_INIT (internal_fn, gcall *stmt)
>>> +{
>>> +  tree var = gimple_call_lhs (stmt);
>>> +  tree init = NULL_TREE;
>>> +  enum auto_init_type init_type
>>> +    = (enum auto_init_type) TREE_INT_CST_LOW (gimple_call_arg (stmt, 1));
>>> +
>>> +  switch (init_type)
>>> +    {
>>> +    default:
>>> +      gcc_unreachable ();
>>> +    case AUTO_INIT_PATTERN:
>>> +      init = build_pattern_cst_for_auto_init (TREE_TYPE (var));
>>> +      expand_assignment (var, init, false);
>>> +      break;
>>> +    case AUTO_INIT_ZERO:
>>> +      init = build_zero_cst (TREE_TYPE (var));
>>> +      expand_assignment (var, init, false);
>>> +      break;
>>> +    }
>>> 
>>> I think actually building build_pattern_cst_for_auto_init can generate
>>> massive garbage and for big auto vars code size is also a concern and
>>> ideally on x86 you'd produce rep movq.  So I don't think going
>>> via expand_assignment is good.  Instead you possibly want to lower
>>> .DEFERRED_INIT to MEMs following expand_builtin_memset and
>>> eventually enhance that to allow storing pieces larger than a byte.
>> 
>> When I tried to lower .DEFERRED_INIT to MEMs for  “AUTO_INIT_PATTERN”, I 
>> have the following questions:
>> 
>> 1. If .DEFERRED_INIT will be lowered to MEMS through “memset”, then we 
>> basically initialize the whole memory covering the
>> auto variable, including paddings. Right?
> 
> Yes.
> 
>> 2. Only when the value that is used for initialization has a repeated
>>   byte-pattern can we lower it through “memset”. Otherwise, if the
>>   value does not have a repeated byte-pattern, we can NOT lower it
>>   through “memset”, right?
> 
> Yes.  This is why I said you should do it _similar_ to how memcpy
> is implemented.  OTOH I don't see a good reason to support patterns
> that are bigger than a byte ...
> 
>> Currently, for the values that are used to initialize for 
>> “AUTO_INIT_PATTERN”, we have:
>> 
>>  /* The following value is a guaranteed unmappable pointer value and has a
>> repeated byte-pattern which makes it easier to synthesize.  We use it for
>> pointers as well as integers so that aggregates are likely to be
>> initialized with this repeated value.  */
>>  uint64_t largevalue = 0xull;
>>  /* For 32-bit platforms it's a bit trickier because, across systems, only 
>> the
>> zero page can reasonably be expected to be unmapped, and even then we 
>> need
>> a very low address.  We use a smaller value, and that value sadly doesn't
>> have a repeated byte-pattern.  We don't use it for integers.  */
>>  uint32_t smallvalue = 0x00AA;
>> 
>> In addition to the above, for BOOLEAN_TYPE:
>> 
>>   case BOOLEAN_TYPE:
>>     /* We think that initializing the boolean variable to 0 rather than 1
>>        is better even for pattern initialization.  */
>> 
>> Due to “BOOLEAN_TYPE” and “POINTER_TYPE”, we cannot always have a 
>> repeated byte-pattern for variables that include BOOLEAN_TYPE Or Pointer 
>> types. Therefore, lowering the .DEFERRED_INIT for “PATTERN” 
>> initialization through “memset” is not always possible.
>> 
>> Let me know if I miss anything in the above. Do you have other suggestions?
> 
> The main point is that you need to avoid building the explicit initializer
> only to have it consumed by assignment expansion.  If you want to keep
> all the singing and dancing (as opposed to maybe initializing with a
> 0x1 byte pattern) then I think for efficiency you still want to
> block-initialize the variable and then only fixup the special fields.

Yes, this is a good idea. 

We can memset the whole structure with repeated pattern “0xAA” first,
then fix up BOOLEAN_TYPE fields and, on 32-bit platforms, POINTER_TYPE fields.
That might be more efficient.

> 
> But as said, all this is quite over-designed IMHO and simply
> zeroing everything would be much simpler and good enough.

So, the fundamental questions are:

1. do we need the functionality of “Pattern Initialization” for debugging 
purpose?
I se

Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-10 Thread Qing Zhao via Gcc-patches
Hi, Richard,

I need more discussion on the following comments you raised:

> On May 26, 2021, at 6:18 AM, Richard Biener  wrote:
> 
> +/* Expand the IFN_DEFERRED_INIT function according to its second argument.  */
> +static void
> +expand_DEFERRED_INIT (internal_fn, gcall *stmt)
> +{
> +  tree var = gimple_call_lhs (stmt);
> +  tree init = NULL_TREE;
> +  enum auto_init_type init_type
> +    = (enum auto_init_type) TREE_INT_CST_LOW (gimple_call_arg (stmt, 1));
> +
> +  switch (init_type)
> +    {
> +    default:
> +      gcc_unreachable ();
> +    case AUTO_INIT_PATTERN:
> +      init = build_pattern_cst_for_auto_init (TREE_TYPE (var));
> +      expand_assignment (var, init, false);
> +      break;
> +    case AUTO_INIT_ZERO:
> +      init = build_zero_cst (TREE_TYPE (var));
> +      expand_assignment (var, init, false);
> +      break;
> +    }
> 
> I think actually building build_pattern_cst_for_auto_init can generate
> massive garbage and for big auto vars code size is also a concern and
> ideally on x86 you'd produce rep movq.  So I don't think going
> via expand_assignment is good.  Instead you possibly want to lower
> .DEFERRED_INIT to MEMs following expand_builtin_memset and
> eventually enhance that to allow storing pieces larger than a byte.

When I tried to lower .DEFERRED_INIT to MEMs for  “AUTO_INIT_PATTERN”, I have 
the following questions:

1. If .DEFERRED_INIT will be lowered to MEMS through “memset”, then we 
basically initialize the whole memory covering the
auto variable, including paddings. Right?
2. Only when the value that is used for initialization has a repeated
byte-pattern can we lower it through “memset”. Otherwise,
if the value does not have a repeated byte-pattern, we can NOT lower it
through “memset”, right?

Currently, for the values that are used to initialize for “AUTO_INIT_PATTERN”, 
we have:

  /* The following value is a guaranteed unmappable pointer value and has a
 repeated byte-pattern which makes it easier to synthesize.  We use it for
 pointers as well as integers so that aggregates are likely to be
 initialized with this repeated value.  */
  uint64_t largevalue = 0xull;
  /* For 32-bit platforms it's a bit trickier because, across systems, only the
 zero page can reasonably be expected to be unmapped, and even then we need
 a very low address.  We use a smaller value, and that value sadly doesn't
 have a repeated byte-pattern.  We don't use it for integers.  */
  uint32_t smallvalue = 0x00AA;

In addition to the above, for BOOLEAN_TYPE:

  case BOOLEAN_TYPE:
    /* We think that initializing the boolean variable to 0 rather than 1
       is better even for pattern initialization.  */

Due to “BOOLEAN_TYPE” and “POINTER_TYPE”,  we cannot always have a repeated 
byte-pattern for variables that include BOOLEAN_TYPE
Or Pointer types. Therefore, lowering the .DEFERRED_INIT for “PATTERN” 
initialization through “memset” is not always possible. 

Let me know if I miss anything in the above. Do you have other suggestions?

thanks.

Qing



Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-08 Thread Qing Zhao via Gcc-patches


> On Jun 8, 2021, at 11:59 AM, Kees Cook  wrote:
> 
> On Tue, Jun 08, 2021 at 09:41:38AM +0200, Richard Biener wrote:
>> On Mon, 7 Jun 2021, Qing Zhao wrote:
>>> 
>>> Personally, I am okay with splitting padding initialization from this 
>>> current patch,
>>> Kees, what’s your opinion on this? i.e, the current -ftrivial-auto-var-init 
>>> will NOT initialize padding, we will add another option to 
>>> Explicitly initialize padding.
>> 
>> It would also be possible to have -fauto-var-init, -fauto-var-init-padding
>> and have -ftrivial-auto-var-init for clang compatibility enabling both.
> 
> Sounds good to me!

Agreed!

Then I will take this approach:

1.  Add two separate new options:
   -fauto-var-init: initialize auto variables to zero or patterns. For
variables that have paddings, only initialize valid fields, no padding
initialization;
   -fauto-var-init-padding: initialize paddings inside auto variables to
zeroes.

2.  Add another new option for Clang compatibility:
   -ftrivial-auto-var-init: enables -fauto-var-init +
-fauto-var-init-padding


Thanks.

Qing
> 
>> Or -fauto-var-init={zero,pattern,padding} and allow
>> -fauto-var-init=pattern,padding to be specified.  Note there's also
>> padding between auto variables on the stack - that "trailing"
>> padding isn't initialized either?  (yes, GCC sorts variables to minimize
>> that padding)  For example for
>> 
>> void foo()
>> {
>>  char a[3];
>>  bar (a);
>> }
>> 
>> there's 12 bytes padding after 'a', shouldn't we initialize that?  If not,
>> why's other padding important to be initialized?
> 
> This isn't a situation that I'm aware of causing real-world problems.
> The issues have all come from padding within an addressable object. I
> haven't tested Clang's behavior on this (and I have no kernel tests for
> this padding), but I do check for trailing padding, like:
> 
> struct test_trailing_hole {
>char *one;
>char *two;
>char *three;
>char four;
>/* "sizeof(unsigned long) - 1" byte padding hole here. */
> };
> 
> -Kees
> 
> -- 
> Kees Cook
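
For reference, the size of the trailing hole in the struct above can be computed with sizeof and offsetof. This is a small C11 sketch; the helper name trailing_padding is made up for the example. On common ABIs, where pointers are aligned to their own size, the result matches the "sizeof(unsigned long) - 1" bytes mentioned in the comment.

```c
#include <assert.h>
#include <stddef.h>

/* Same layout as the kernel test struct quoted above.  */
struct test_trailing_hole
{
  char *one;
  char *two;
  char *three;
  char four;
  /* trailing padding hole here */
};

/* Hypothetical helper: number of padding bytes after the last member.  */
static size_t
trailing_padding (void)
{
  size_t end_of_last_member
    = offsetof (struct test_trailing_hole, four) + sizeof (char);

  assert (sizeof (struct test_trailing_hole) >= end_of_last_member);
  return sizeof (struct test_trailing_hole) - end_of_last_member;
}
```

These are the bytes that a whole-object memset would cover but field-by-field initialization would leave untouched.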



Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-08 Thread Qing Zhao via Gcc-patches
Thanks a lot.

Kees. 

Do you have the same issue with my emails?

I see this problem mostly with the emails that I sent to the gcc-patches alias.
Other emails are fine.

> On Jun 8, 2021, at 11:56 AM, Kees Cook  wrote:
> 
> On Tue, Jun 08, 2021 at 09:37:30AM +0200, Richard Biener wrote:
>> On Mon, 7 Jun 2021, Qing Zhao wrote:
>>> On Jun 7, 2021, at 2:48 AM, Richard Biener <rguent...@suse.de> wrote:
>>> 
>>> Meh - can you try using a mailer that does proper quoting?  It's difficult
>>> to spot your added comments.  Will try anyway (and sorry for the delay)
>>> 
>>> Only the email replied to gcc-patch alias had this issue, all the other 
>>> emails I sent are fine. Not sure why?
>> 
>> All your mails have this problem for me, it makes it quite difficult to
>> follow the conversation.
> 
> I think the first step is to make sure the MUA is sending "text only"
> emails. Then configure the "quoting style" to do the standard "> "-style.
> 
> What email client are you using?

I am using Mac’s Apple Mail client on my computer. 

I have been using this mail client for a long time, but only had such issues 
recently. 

Really not sure what’s going on.

I will try to figure this out.

Qing
> 
> -- 
> Kees Cook



Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-08 Thread Qing Zhao via Gcc-patches


On Jun 8, 2021, at 2:41 AM, Richard Biener <rguent...@suse.de> wrote:



Which is also why I suggested to split out the padding initialization
bits to a separate patch (and option).

Personally, I am okay with splitting padding initialization from this current 
patch,
Kees, what’s your opinion on this? i.e, the current -ftrivial-auto-var-init 
will NOT initialize padding, we will add another option to
Explicitly initialize padding.

It would also be possible to have -fauto-var-init, -fauto-var-init-padding
and have -ftrivial-auto-var-init for clang compatibility enabling
both.

I really like this idea.

Personally, I do think that separating padding initialization from auto-var
initialization will make the design and implementation cleaner.

An additional -ftrivial-auto-var-init that enables both will serve Clang
compatibility well.

 Or -fauto-var-init={zero,pattern,padding} and allow
-fauto-var-init=pattern,padding to be specified.  Note there's also
padding between auto variables on the stack - that "trailing"
padding isn't initialized either?  (yes, GCC sorts variables to minimize
that padding)  For example for

void foo()
{
 char a[3];
 bar (a);
}

there's 12 bytes padding after 'a', shouldn't we initialize that?

Yes, in the current patch, tail paddings are also initialized.

But “paddings” between auto variables are not initialized. (They do not belong
to any variable.)

Qing


 If not,
why's other padding important to be initialized?

Richard.



Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-07 Thread Qing Zhao via Gcc-patches
Hi, 

> On Jun 7, 2021, at 2:53 AM, Richard Biener  wrote:
> 
>> 
>> To address the above suggestion:
>> 
>> My study shows: the call to __builtin_clear_padding is expanded during the
>> gimplification phase, and there is no __builtin_clear_padding expansion
>> during the RTL expansion phase.
>> However, for -ftrivial-auto-var-init, padding initialization should be done
>> in both the gimplification phase and the RTL expansion phase.
>> Since __builtin_clear_padding might not be suitable for RTL expansion,
>> reusing __builtin_clear_padding might not work.
>> 
>> Let me know if you have any more comments on this.
> 
> Yes, I didn't suggest to literally emit calls to __builtin_clear_padding 
> but instead to leverage the lowering code, more specifically share the
> code that figures _what_ is to be initialized (where the padding is)
> and eventually the actual code generation pieces.  That might need some
> refactoring but the code where padding resides should be present only
> a single time (since it's quite complex).

Okay, I see your point here.

> 
> Which is also why I suggested to split out the padding initialization
> bits to a separate patch (and option).

Personally, I am okay with splitting padding initialization from this current 
patch,
Kees, what’s your opinion on this? i.e, the current -ftrivial-auto-var-init 
will NOT initialize padding, we will add another option to 
Explicitly initialize padding.

Qing


> 
> Richard.



Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-07 Thread Qing Zhao via Gcc-patches
(Kees, can you answer one of Richard’s question below? On the reason to 
initialize padding of structures)

Richard,


On Jun 7, 2021, at 2:48 AM, Richard Biener <rguent...@suse.de> wrote:

Meh - can you try using a mailer that does proper quoting?  It's difficult
to spot your added comments.  Will try anyway (and sorry for the delay)

Only the email replied to gcc-patch alias had this issue, all the other emails 
I sent are fine. Not sure why?


Both clang and my patch add initialization to the above auto variable “line”.

So, I have the following questions that I need help with:

1. Do we need to exclude C++ classes with ctors from auto initialization?

2. I see that Clang uses a call to an internal memset to initialize such a
class, but my patch only initializes the data fields inside the class.
   Which one is better?

I can't answer either question, but generally using block-initialization
(for example via memset, but we'd generally prefer X = {}) is better for
later optimization.

Okay. So, is this the same reason as for lowering the call to .DEFERRED_INIT through 
expand_builtin_memset rather than expand_assignment?


seeing this, can you explain why using .DEFERRED_INIT does not
work for VLAs?

The major reason for going different routes for VLAs vs. non-VLAs is:

In the original gimplification phase, VLAs and non-VLAs go different routes.
I just followed the different routes for them:

In “gimplify_decl_expr”, a VLA goes to “gimplify_vla_decl” and is expanded to
a call to alloca.  Naturally, I added calls to “memset/memcpy” in 
“gimplify_vla_decl” to initialize it.

On the other hand, non-VLAs are handled differently in “gimplify_decl_expr”, so
I added calls to “.DEFERRED_INIT” to initialize them.

What’s the major issue if I add calls to “memset/memcpy” in “gimplify_vla_decl” 
to initialize VLAs?

Just inconsistency and unexpected different behavior with respect to
uninitialized warnings?

Okay.
I will try to initialize VLAs through the call to .DEFERRED_INIT too, and see 
whether there is any issue with it.
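
For readers following the VLA discussion, here is a minimal source-level sketch of what "initialize a VLA with memset" amounts to. It is only an analogy for the gimplify_vla_decl approach described above, not the patch's code; the function name vla_pattern_bytes is hypothetical and the 0xAA fill byte is taken from the pattern value mentioned elsewhere in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Allocate a VLA of N bytes, fill it with the 0xAA pattern the way an
   explicit memset would, and return how many bytes carry the pattern.
   Hypothetical helper, for illustration only.  */
size_t vla_pattern_bytes (size_t n)
{
  assert (n > 0);            /* a zero-length VLA is not valid C */
  unsigned char buf[n];      /* the VLA: its size is known only at run time */
  size_t matches = 0;
  size_t k;

  memset (buf, 0xAA, n);     /* block initialization of the whole object */

  for (k = 0; k < n; k++)
    if (buf[k] == 0xAA)
      matches++;
  return matches;
}
```

Because the size is only known at run time, the fill has to be a run-time block operation; there is no constant initializer the compiler could assign.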


@@ -5001,6 +5185,17 @@ gimplify_init_constructor (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
         /* If a single access to the target must be ensured and all elements
            are zero, then it's optimal to clear whatever their number.  */
         cleared = true;
+       else if (flag_trivial_auto_var_init > AUTO_INIT_UNINITIALIZED
+                && !TREE_STATIC (object)
+                && type_has_padding (type))
+         /* If the user requests to initialize automatic variables with
+            paddings inside the type, we should initialize the paddings too.
+            C guarantees that brace-init with fewer initializers than members
+            aggregate will initialize the rest of the aggregate as-if it were
+            static initialization.  In turn static initialization guarantees
+            that pad is initialized to zero bits.
+            So, it's better to clear the whole record under such situation.  */
+         cleared = true;

so here we have padding as well - I think this warrants to be controlled
by an extra option?  And we can maybe split this out to a separate
patch? (the whole padding stuff)

Clang does the padding initialization with this option, shall we be
consistent with Clang?

Just for the sake of consistency?  No.  Is there a technical reason
for this complication?  Say we have

 struct { short s; int i; } a;

what's the technical reason to initialize the padding?  I might
be tempted to use -ftrivial-auto-var-init but I definitely don't
want to spend cycles/instructions initializing the padding in the
above struct.

Kees, could you please answer this question? What’s the major reason to 
initialize padding
of structures from the security point of view?
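
To make the padding question concrete, here is a small sketch built around the struct from the message above. The type name pad_example and the helper names are hypothetical; the code only shows where padding comes from and what "clear the whole record" means at the source level, under the assumption of a typical ABI where int is more strictly aligned than short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The struct from the discussion.  On typical ABIs the compiler inserts
   padding after `s` so that `i` is naturally aligned.  */
struct pad_example { short s; int i; };

/* Number of bytes in the object that belong to no member.  */
size_t padding_bytes (void)
{
  return sizeof (struct pad_example) - sizeof (short) - sizeof (int);
}

/* Clear every byte of *P, padding included.  */
void clear_whole_object (struct pad_example *p)
{
  assert (p != NULL);
  memset (p, 0, sizeof *p);
}

/* Return 1 if every byte of the object representation is zero.  */
int all_bytes_zero (const struct pad_example *p)
{
  const unsigned char *bytes = (const unsigned char *) p;
  size_t k;

  assert (p != NULL);
  for (k = 0; k < sizeof *p; k++)
    if (bytes[k] != 0)
      return 0;
  return 1;
}

/* Clear an automatic object and inspect it before any member store.  */
int clear_demo (void)
{
  struct pad_example a;

  clear_whole_object (&a);
  return all_bytes_zero (&a);
}
```

On common x86-64 ABIs padding_bytes () is typically 2. Those are the bytes that member-wise initialization never writes, so copying the whole struct out of the function would copy whatever was previously on the stack in them.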


At this point I also wonder about doing the actual initialization
by block-initializing the current function frame at allocation
time.

Which phase corresponds to “allocation time”? Please point me to the specific phase and 
source file.


That would be a way smaller patch (but possibly backend
specific).  On x86 it could be a single rep mov; for all but the
VLA cases.  Just a thought.



Thanks.

Qing



Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-03 Thread Qing Zhao via Gcc-patches
Hi, Richard,


On May 26, 2021, at 6:18 AM, Richard Biener <rguent...@suse.de> wrote:

On Wed, 12 May 2021, Qing Zhao wrote:

Hi,

This is the 3rd version of the patch for the new security feature for GCC.

Please take a look and let me know your comments and suggestions.


+/* Returns true when the given TYPE has padding inside it.
+   return false otherwise.  */
+bool
+type_has_padding (tree type)
+{
+  switch (TREE_CODE (type))
+{
+case RECORD_TYPE:
+  {

btw, there's __builtin_clear_padding and a whole machinery around
it in gimple-fold.c, I'm sure that parts could be re-used if they
are necessary in the end.

To address the above suggestion:

My study shows that the call to __builtin_clear_padding is expanded during the 
gimplification phase,
and there is no __builtin_clear_padding expansion during the RTL expansion phase.
However, for -ftrivial-auto-var-init, padding initialization should be done 
both in the gimplification phase and in the RTL expansion phase.
Since __builtin_clear_padding might not be suitable for RTL expansion, reusing 
__builtin_clear_padding might not work.

Let me know if you have any more comments on this.

Thanks.

Qing


Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-03 Thread Qing Zhao via Gcc-patches
Hi, Richard,

For the following, I need more clarification:



+/* Expand the IFN_DEFERRED_INIT function according to its second argument.  */
+static void
+expand_DEFERRED_INIT (internal_fn, gcall *stmt)
+{
+  tree var = gimple_call_lhs (stmt);
+  tree init = NULL_TREE;
+  enum auto_init_type init_type
+    = (enum auto_init_type) TREE_INT_CST_LOW (gimple_call_arg (stmt, 1));
+
+  switch (init_type)
+    {
+    default:
+      gcc_unreachable ();
+    case AUTO_INIT_PATTERN:
+      init = build_pattern_cst_for_auto_init (TREE_TYPE (var));
+      expand_assignment (var, init, false);
+      break;
+    case AUTO_INIT_ZERO:
+      init = build_zero_cst (TREE_TYPE (var));
+      expand_assignment (var, init, false);
+      break;
+    }

I think actually building build_pattern_cst_for_auto_init can generate
massive garbage and for big auto vars code size is also a concern and
ideally on x86 you'd produce rep movq.  So I don't think going
via expand_assignment is good.  Instead you possibly want to lower
.DEFERRED_INIT to MEMs following expand_builtin_memset and
eventually enhance that to allow storing pieces larger than a byte.


I will lower .DEFERRED_INIT to MEMs following expand_builtin_memset for 
“AUTO_INIT_PATTERN”.
My question is:
Do I need to do the same for “AUTO_INIT_ZERO”?
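
As an illustration of "storing pieces larger than a byte", here is a user-level sketch of a pattern fill that writes one 8-byte word at a time and finishes with a short tail. It is not how expand_builtin_memset is implemented; pattern_fill and pattern_fill_selftest are hypothetical names, and the word size and pattern value are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill LEN bytes at DST with the repeating 8-byte PATTERN.  Whole words
   are stored first; the remaining 0..7 bytes are copied from the start
   of the pattern.  */
void pattern_fill (void *dst, size_t len, uint64_t pattern)
{
  unsigned char *p = dst;

  assert (dst != NULL);
  while (len >= sizeof pattern)
    {
      memcpy (p, &pattern, sizeof pattern);   /* one word-sized store */
      p += sizeof pattern;
      len -= sizeof pattern;
    }
  memcpy (p, &pattern, len);                  /* byte-sized tail */
}

/* Fill an odd-sized buffer and check that every byte got the pattern.
   With all pattern bytes equal, the result does not depend on byte order.  */
int pattern_fill_selftest (void)
{
  unsigned char buf[19];
  size_t k;

  pattern_fill (buf, sizeof buf, UINT64_C (0xAAAAAAAAAAAAAAAA));
  for (k = 0; k < sizeof buf; k++)
    if (buf[k] != 0xAA)
      return 0;
  return 1;
}
```

A block fill like this needs no large constant to be materialized for the whole object, which is the code-size and garbage concern raised above for big automatic variables.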

Thanks.

Qing



Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-05-27 Thread Qing Zhao via Gcc-patches
(Resend, the previous version messed up your questions and my answers, 
hopefully this time it’s better)

Hi, Richard,

Thanks a lot for your comments.

On May 26, 2021, at 6:18 AM, Richard Biener <rguent...@suse.de> wrote:

On Wed, 12 May 2021, Qing Zhao wrote:

Hi,

This is the 3rd version of the patch for the new security feature for GCC.

Please take a look and let me know your comments and suggestions.

thanks.

Qing

**Compared with the 2nd version, the following are the major changes:

1. use "lookup_attribute ("uninitialized",)" directly instead of adding
  one new field "uninitialized" into tree_decl_with_vis.
2. update documentation to mention that the new option will not confuse
  -Wuninitialized; GCC still considers an auto without explicit initializer
  as uninitialized.
3. change the name of "build_pattern_cst" to more specific name as
  "build_pattern_cst_for_auto_init".
4. handling of nested VLA;
  Adding new testing cases (auto-init-15/16.c) for this new handling.
5. Add  new verifications of calls to .DEFERRED_INIT in tree-cfg.c;
6. in tree-sra.c, update the handling of "grp_to_be_debug_replaced",
  bind the lhs variable to a call to .DEFERRED_INIT.
7. In tree-ssa-structalias.c, delete "find_func_aliases_for_deferred_init",
  return directly for a call to .DEFERRED_INIT in "find_func_aliases_for_call".
8. Add more detailed comments in tree-ssa-uninit.c and tree-ssa.c to explain
  the special handling on REALPART_EXPR/IMAGPART_EXPR.
9. in build_pattern_cst_for_auto_init:
  BOOLEAN_TYPE will be set to zero always;
  INTEGER_TYPE (?and ENUMERAL_TYPE) use wi::from_buffer in order to
   correctly handle 128-bit integers.
  POINTER_TYPE will not assert on SIZE < 32.
  REAL_TYPE add fallback;
10. changed gcc_assert to gcc_unreachable in several places;
11. add more comments;
12. some style issue changes.

**Please see the version 2 at:
https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567262.html


**The following 2 items are the ones I didn’t address in this version; they need 
further study and might need more discussion:

1. Using __builtin_clear_padding to replace type_has_padding.

My study shows that the call to __builtin_clear_padding is expanded during the 
gimplification phase,
and there is no __builtin_clear_padding expansion during the RTL expansion phase.
If so, for -ftrivial-auto-var-init, padding initialization should be done both 
in the gimplification phase and in the RTL expansion phase.
And since __builtin_clear_padding might not be suitable for RTL expansion, 
reusing __builtin_clear_padding might not work.

2. Pattern init to NULLPTR_TYPE and ENUMERAL_TYPE: need more comments from 
Richard Biener on this.

**The changes of the 3rd version compared to the 2nd version are:


+@item -ftrivial-auto-var-init=@var{choice}
+@opindex ftrivial-auto-var-init
+Initialize automatic variables with either a pattern or with zeroes to increase
+the security and predictability of a program by preventing uninitialized memory
+disclosure and use.

the docs do not state what "trivial" actually means?  Does it affect
C++ classes with ctors, thus is "trivial" equal to what C++ considers
a POD type?


Thank you for this question.

The name -ftrivial-auto-var-init is just for compatibility with Clang. I really 
don’t know why
they added “trivial”.

When I checked a small example of a C++ class with ctors, I saw that both Clang and my 
patch add
initialization to this class:

=
#include <iostream>

using namespace std;

class Line {
  public:
 void setLength( double len );
 double getLength( void );
 Line();  // This is the constructor
  private:
 double length;
};

// Member functions definitions including constructor
Line::Line(void) {
  cout << "Object is being created" << endl;
}
void Line::setLength( double len ) {
 length = len;
}
double Line::getLength( void ) {
 return length;
}

// Main function for the program
int main() {
 Line line;

 // set line length
 line.setLength(6.0);
 cout << "Length of line : " << line.getLength() << endl;

 return 0;
}

[…] flag_trivial_auto_var_init > AUTO_INIT_UNINITIALIZED
+&& !TREE_STATIC (exp)
+&& type_has_padding (type)))

testing flag_trivial_auto_var_init tests the global options, if TUs
are compiled with different setting of flag_trivial_auto_var_init
and you use LTO or flag_trivial_auto_var_init is specified per
function via optimize attributes it's more appropriate to test
opt_for_fn (cfun->decl, flag_trivial_auto_var_init)


Okay.  Thanks for the info. I will update this.


You do not actually test whether TARGET is an auto-var in this place,
so I question this change

Will add a check for auto variables here.

- the documentation of ftrivial-auto-var-init
also doesn't mention initialization of padding

Will add the initialization of padding to the documentation.
Clang adds the padding initialization with this option; I guess that we might need to
be compatible with it?

Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-05-25 Thread Qing Zhao via Gcc-patches
Ping….

Qing

On May 12, 2021, at 12:16 PM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:

Hi,

This is the 3rd version of the patch for the new security feature for GCC.

Please take a look and let me know your comments and suggestions.

thanks.

Qing

**Compared with the 2nd version, the following are the major changes:

1. use "lookup_attribute ("uninitialized",)" directly instead of adding
   one new field "uninitialized" into tree_decl_with_vis.
2. update documentation to mention that the new option will not confuse
   -Wuninitialized; GCC still considers an auto without explicit initializer
   as uninitialized.
3. change the name of "build_pattern_cst" to more specific name as
   "build_pattern_cst_for_auto_init".
4. handling of nested VLA;
   Adding new testing cases (auto-init-15/16.c) for this new handling.
5. Add  new verifications of calls to .DEFERRED_INIT in tree-cfg.c;
6. in tree-sra.c, update the handling of "grp_to_be_debug_replaced",
   bind the lhs variable to a call to .DEFERRED_INIT.
7. In tree-ssa-structalias.c, delete "find_func_aliases_for_deferred_init",
   return directly for a call to .DEFERRED_INIT in "find_func_aliases_for_call".
8. Add more detailed comments in tree-ssa-uninit.c and tree-ssa.c to explain
   the special handling on REALPART_EXPR/IMAGPART_EXPR.
9. in build_pattern_cst_for_auto_init:
   BOOLEAN_TYPE will be set to zero always;
   INTEGER_TYPE (?and ENUMERAL_TYPE) use wi::from_buffer in order to
correctly handle 128-bit integers.
   POINTER_TYPE will not assert on SIZE < 32.
   REAL_TYPE add fallback;
10. changed gcc_assert to gcc_unreachable in several places;
11. add more comments;
12. some style issue changes.

**Please see the version 2 at:
https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567262.html


**The following 2 items are the ones I didn’t address in this version; they need 
further study and might need more discussion:

1. Using __builtin_clear_padding to replace type_has_padding.

My study shows that the call to __builtin_clear_padding is expanded during the 
gimplification phase,
and there is no __builtin_clear_padding expansion during the RTL expansion phase.
If so, for -ftrivial-auto-var-init, padding initialization should be done both 
in the gimplification phase and in the RTL expansion phase.
And since __builtin_clear_padding might not be suitable for RTL expansion, 
reusing __builtin_clear_padding might not work.

2. Pattern init to NULLPTR_TYPE and ENUMERAL_TYPE: need more comments from 
Richard Biener on this.

**The changes of the 3rd version compared to the 2nd version are:



**The complete 3rd version of the patch is:



<3rd-version-ftrivial-auto-var-init.patch>



Re: [patch for gcc12 stage1][version 2] add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-05-05 Thread Qing Zhao via Gcc-patches
> 
>> 
>>> @@ -11950,6 +12088,72 @@ lower_bound_in_type (tree outer, tree inner)
>>>}
>>> }
>>> 
>>> +/* Returns true when the given TYPE has padding inside it.
>>> +   return false otherwise.  */
>>> +bool
>>> +type_has_padding (tree type)
>> 
>> Would it be possible to reuse __builtin_clear_padding here?
> 
> Not sure, where can I get more details on __builtin_clear_padding? I can 
> study a little bit more on this to make sure this.

After some study, my understanding is that the call to __builtin_clear_padding is 
expanded during the gimplification phase,
and there is no __builtin_clear_padding expansion during the RTL expansion phase.

Is the above understanding correct? 

If so, for -ftrivial-auto-var-init, padding initialization should be done both 
in the gimplification phase and in the RTL expansion phase. 
And since __builtin_clear_padding might not be suitable for RTL expansion, 
reusing __builtin_clear_padding might not work.

Let me know if I misunderstand something.

Qing

>> 



Re: [patch for gcc12 stage1][version 2] add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-05-05 Thread Qing Zhao via Gcc-patches
Hi, Richard, 

While making changes for the 2nd version based on your previous comments, I have 
the following questions that I need your help with:

> 
>> +  sra_stats.subtree_deferred_init++;
>> +}
>> +  else if (access->grp_to_be_debug_replaced)
>> +{
>> +  /* FIXME, this part might have some issue.  */
>> +  tree drhs = build_debug_ref_for_model (loc, agg,
>> + access->offset - top_offset,
>> + access);
>> +  gdebug *ds = gimple_build_debug_bind (get_access_replacement (access),
>> +drhs, gsi_stmt (*gsi));
>> +  gsi_insert_before (gsi, ds, GSI_SAME_STMT);
> 
> Would be good to fix the FIXME :-)
> 
> I guess the thing we need to decide here is whether -ftrivial-auto-var-init
> should affect debug-only constructs too.  If it doesn't, examining removed
> components in a debugger might show uninitialised values in cases where
> the user was expecting initialised ones.  There would be no security
> concern, but it might be surprising.
> 
> I think in principle the DRHS can contain a call to DEFERRED_INIT.
> Doing that would probably require further handling elsewhere though.

Right now, what I did is:

  else if (lhs_access->grp_to_be_debug_replaced)
{
  tree lhs_drepl = get_access_replacement (lhs_access);
  tree init_type_node
   = build_int_cst (integer_type_node, (int) init_type);
  tree call = build_call_expr_internal_loc
  (UNKNOWN_LOCATION, IFN_DEFERRED_INIT,
  TREE_TYPE (lhs_drepl), 2, lhs_drepl, init_type_node);
  gdebug *ds = gimple_build_debug_bind (lhs_drepl, call,
gsi_stmt (*gsi));
  gsi_insert_before (gsi, ds, GSI_SAME_STMT);
}

Is the above matching what you suggested?

What do you mean by “further handling elsewhere”?

> 
>> + is better even for pattern initialization.  */
>> +  return build_int_cstu (type, largevalue);
> 
> I've no objection to that choice for booleans, but: booleans in some
> languages (like Ada) can have multibit precision.  If we want booleans
> to be zero then it would probably be better to treat them as a separate
> case and just use build_zero_cst (type) for them.
> 
> Also, the above won't work correctly for 128-bit integers: it will
> zero-initialize the upper half.  It would probably be better to use
> wi::from_buffer to construct the integer instead.

You mean using wi::from_buffer to construct all the integer types (including 
64-bit, 32-bit, etc.)?

I read the source code related to “wi::from_buffer”, but it is still 
not very clear to me
how to use it for my purpose.

From my current understanding, I should use it like the following:

"

unsigned char *ptr = "0x";

int total_bytes = GET_MODE_SIZE (SCALAR_INT_TYPE_MODE (type));
wide_int result = wi::from_buffer (ptr, total_bytes);
return wide_int_to_tree (type, result);

"

Is the above correct for INTEGER type?
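
The concern about 128-bit integers can be illustrated at a smaller scale in plain C. The sketch below is an analogy only: it contrasts widening a narrower pattern constant (the upper half stays zero) with building the value from a byte buffer (every byte gets the pattern). It does not use any GCC internal API; the function names are hypothetical and the 32-bit to 64-bit widths stand in for the 64-bit to 128-bit case discussed above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Widening a narrower pattern zero-extends it: the upper half of the
   wider integer stays zero.  */
uint64_t pattern_by_widening (void)
{
  uint32_t narrow = UINT32_C (0xAAAAAAAA);

  return (uint64_t) narrow;
}

/* Building the value from a byte buffer covers every byte, whatever the
   width of the target type.  */
uint64_t pattern_from_bytes (void)
{
  unsigned char buf[sizeof (uint64_t)];
  uint64_t value;

  assert (sizeof value == sizeof buf);
  memset (buf, 0xAA, sizeof buf);        /* one pattern byte per byte */
  memcpy (&value, buf, sizeof value);    /* reinterpret the buffer */
  return value;
}
```

Here pattern_by_widening () yields 0x00000000AAAAAAAA while pattern_from_bytes () yields 0xAAAAAAAAAAAAAAAA, which is the difference the review comment points at.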

thanks.

Qing


Re: [patch for gcc12 stage1][version 2] add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-04-27 Thread Qing Zhao via Gcc-patches



> On Apr 27, 2021, at 1:30 AM, Richard Biener  wrote:
> 
>> 
>> equivalent in all respects.  And if we were trying to make them
>> equivalent, we'd need to do much more than this.
>> 
>> The same applies to the pattern case.  If “x” is initialised to a pattern
>> that happens to point to a real decl, we don't have to preserve the
>> order of accesses to the decl wrt accesses to “*x” (especially since
>> we're hoping that “*x” will trap).
>> 
>> I think for aliasing purposes, the .DEFERRED_INIT return value is still
>> analogous to an undefined SSA name, even though we will later generate
>> code to initialise it.
> 
> (only replying to this part, I'll look at the next revised patch series)

Okay, thanks.

> 
> Since .DEFERRED_INIT does not produce any pointers and is not a real
> initialization you don't need to do anything in PTA - but you might
> want to ignore it to not pessimize it by the default handling.  Thus
> 
> +  if (gimple_call_internal_p (t, IFN_DEFERRED_INIT))
> +{
> +  find_func_aliases_for_deferred_init (t);
> +  return;
> +}
> 
> should simply return and find_func_aliases_for_deferred_init can be 
> removed.

Okay. Will do that.

thanks.

Qing



Re: [patch for gcc12 stage1][version 2] add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-04-26 Thread Qing Zhao via Gcc-patches



> On Apr 26, 2021, at 12:47 PM, Richard Sandiford  
> wrote:
> 
> Qing Zhao  writes:
>>>> @@ -1831,6 +2000,17 @@ gimplify_decl_expr (tree *stmt_p, gimple_seq *seq_p)
>>>>   as they may contain a label address.  */
>>>>walk_tree (&init, force_labels_r, NULL, NULL);
>>>>}
>>>> +  /* When there is no explicit initializer, if the user requested,
>>>> +   We should insert an artificial initializer for this automatic
>>>> +   variable for non vla variables.  */
>>> 
>>> I think we should explain why we can skip VLAs here.
>> 
>> VLA is handled in another place already, it should be initialized with calls 
>> to memset/memcpy.
> 
> Yeah, what I meant here was that the comment should explain the
> difference between the handling of VLAs and non-VLAs.  It's fairly
> obvious when reading the patch, but it won't be as obvious once the
> patch is applied.

Okay, I see, will update the comments.
> 
>>>> +   children are to be processed.  TOP_OFFSET is the offset of the processed
>>>> +   subtree which has to be subtracted from offsets of individual accesses to
>>>> +   get corresponding offsets for AGG.  GSI is a statement iterator used to place
>>>> +   the new statements.  */
>>>> +static void
>>>> +generate_subtree_deferred_init (struct access *access, tree agg,
>>>> +  enum auto_init_type init_type,
>>>> +  HOST_WIDE_INT top_offset,
>>>> +  gimple_stmt_iterator *gsi,
>>>> +  location_t loc)
>>>> +{
>>>> +  do
>>>> +{
>>>> +  if (access->grp_to_be_replaced)
>>>> +  {
>>>> +tree repl = get_access_replacement (access);
>>>> +tree init_type_node
>>>> +  = build_int_cst (integer_type_node, (int) init_type);
>>>> +gimple *call = gimple_build_call_internal (IFN_DEFERRED_INIT, 2,
>>>> +   repl, init_type_node);
>>>> +gimple_call_set_lhs (call, repl);
>>> 
>>> AFAICT “access” is specifically for the lhs of the original call.
>>> So there seems to be an implicit assumption here that the lhs of the
>>> original call is the same as the first argument of the original call.
>>> Is that guaranteed/required?
>> 
>> For calls to .DEFERRED_INIT, yes, this is guaranteed.
> 
> OK, in that case…
> 
>>> If so, I think it's something that
>>> tree-cfg.c should check.  It might also be worth having an assertion
>>> in sra_modify_deferred_init.
>> I can definitely add an assertion to make sure of this.
> 
> …I think we need the tree-cfg.c check too.  Having the check there
> ensures that the invariant is maintained throughout gimple.

Okay, will add check in tree-cfg.c too.

> 
>>>> +gimple_set_location (call, loc);
>>>> +
>>>> +sra_stats.subtree_deferred_init++;
>>>> +  }
>>>> +  else if (access->grp_to_be_debug_replaced)
>>>> +  {
>>>> +/* FIXME, this part might have some issue.  */
>>>> +tree drhs = build_debug_ref_for_model (loc, agg,
>>>> +   access->offset - top_offset,
>>>> +   access);
>>>> +gdebug *ds = gimple_build_debug_bind (get_access_replacement (access),
>>>> +  drhs, gsi_stmt (*gsi));
>>>> +gsi_insert_before (gsi, ds, GSI_SAME_STMT);
>>> 
>>> Would be good to fix the FIXME :-)
>> 
>> This is the part I am not very sure about, so I added the FIXME in order to get 
>> more review and suggestions
>> on it. -:)
>>> 
>>> I guess the thing we need to decide here is whether -ftrivial-auto-var-init
>>> should affect debug-only constructs too.
>> 
>> Where can I get more details on Debug-only constructs ?
> 
> What I meant by “debug-only construct” is a piece of source-level data
> (in this case a field of an aggregate) that has been optimised out of
> the executable code but still exists in debug stmts.

Okay, I see now.

Then, we should handle this case too, I think.

>  AIUI that's what
> the code above is handling.
> 
>>>> @@ -4863,6 +4863,29 @@ find_func_aliases_for_builtin_call (struct function *fn, gcall *t)
>>>>  return false;
>>

Question on __gcov_indirect_call

2021-04-26 Thread Qing Zhao via Gcc-patches
Hi,

We hit the following linking error when building one of our important applications:


> …./ld: .o(.text.startup+0x13): unresolvable R_X86_64_TPOFF32 relocation 
> against symbol `__gcov_indirect_call’

It looks like the current TLS model of “__gcov_indirect_call” is local-exec. 

If we recompile .c with -ftls-model=initial-exec added, the above linking 
error disappears.

Our question is:

Can __gcov_indirect_call be linked into a shared library and 
referenced from non-PIC code?

If so, GCC should generate a more relaxed TLS model for it.
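
For reference, the TLS model can also be chosen per variable in user code with GCC's tls_model variable attribute, rather than for a whole translation unit with -ftls-model. The sketch below shows only that mechanism; call_count and bump_call_count are hypothetical names, and this is not how libgcov declares __gcov_indirect_call.

```c
#include <assert.h>

/* Hypothetical per-thread counter.  The attribute requests the
   initial-exec TLS model for this one variable, independent of the
   -ftls-model setting used for the rest of the file.  */
static __thread int call_count __attribute__ ((tls_model ("initial-exec")));

/* Increment the calling thread's counter and return the new value.  */
int bump_call_count (void)
{
  assert (call_count >= 0);
  return ++call_count;
}
```

Whether initial-exec is the right model for a given symbol depends on how the object is linked and loaded, which is exactly the question raised above.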

Thanks a lot for your help.

Qing



Re: [patch for gcc12 stage1][version 2] add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-04-26 Thread Qing Zhao via Gcc-patches
Hi, Richard,

Thanks a lot for your review.

> On Apr 23, 2021, at 2:05 PM, Richard Sandiford  
> wrote:
> 
> Finally getting to this now that the GCC 11 rush is over.  Sorry for
> the slow response.
> 
> I've tried to review most of the code below, but skipped the testsuite
> parts in the interests of time.  I'll probably have more comments in
> future rounds, just wanted to get the ball rolling.
> 
> This is really Richi's area more than mine though, so please take this
> with a grain of salt.
> 
> Qing Zhao  writes:
>> 2.  initialize all paddings to zero when -ftrivial-auto-var-init is present.
>> In expr.c (store_constructor):
>> 
>>Clear the whole structure when
>>-ftrivial-auto-var-init and the structure has paddings.
>> 
>> In gimplify.c (gimplify_init_constructor):
>> 
>>Clear the whole structure when
>>-ftrivial-auto-var-init and the structure has paddings.
> 
> Just to check: are we sure we want to use zero as the padding fill
> value even for -ftrivial-auto-var-init=pattern?  Or should it be
> 0xAA instead, to match the integer fill pattern?
> 
> I can see the arguments both ways, just thought it was worth asking.

For this question, I think Kees has provided the background information.
Yes, this basically follows Clang’s current implementation in order to 
match the C spec.

> 
>> […]
>> @@ -1589,6 +1592,24 @@ handle_retain_attribute (tree *pnode, tree name, tree ARG_UNUSED (args),
>>   return NULL_TREE;
>> }
>> 
>> +/* Handle a "uninitialized" attribute; arguments as in
> 
> This occurs in existing code too, but s/a/an/.
Okay, will fix it.
> 
>> +   struct attribute_spec.handler.  */
>> +
>> +static tree
>> +handle_uninitialized_attribute (tree *node, tree name, tree ARG_UNUSED (args),
>> +                                int ARG_UNUSED (flags), bool *no_add_attrs)
>> +{
>> +  if (VAR_P (*node))
>> +DECL_UNINITIALIZED (*node) = 1;
>> +  else
>> +{
>> +  warning (OPT_Wattributes, "%qE attribute ignored", name);
>> +  *no_add_attrs = true;
>> +}
>> +
>> +  return NULL_TREE;
>> +}
>> +
>> /* Handle a "externally_visible" attribute; arguments as in
>>struct attribute_spec.handler.  */
> 
>> […]
>> @@ -11689,6 +11689,34 @@ Perform basic block vectorization on trees. This flag is enabled by default at
>> @option{-O3} and by @option{-ftree-vectorize}, @option{-fprofile-use},
>> and @option{-fauto-profile}.
>> 
>> +@item -ftrivial-auto-var-init=@var{choice}
>> +@opindex ftrivial-auto-var-init
>> +Initialize automatic variables with either a pattern or with zeroes to increase
>> +program security by preventing uninitialized memory disclosure and use.
>> +
>> +The three values of @var{choice} are:
>> +
>> +@itemize @bullet
>> +@item
>> +@samp{uninitialized} doesn't initialize any automatic variables.
>> +This is C and C++'s default.
>> +
>> +@item
>> +@samp{pattern} Initialize automatic variables with values which will likely
>> +transform logic bugs into crashes down the line, are easily recognized in a
>> +crash dump and without being values that programmers can rely on for useful
>> +program semantics.
>> +The values used for pattern initialization might be changed in the future.
>> +
>> +@item
>> +@samp{zero} Initialize automatic variables with zeroes.
>> +@end itemize
>> +
>> +The default is @samp{uninitialized}.
>> +
>> +You can control this behavior for a specific variable by using the variable
>> +attribute @code{uninitialized} (@pxref{Variable Attributes}).
>> +
> 
> I think it's important to say here that GCC still considers the
> variables to be uninitialised and still considers reading them to
> be undefined behaviour.  The option is simply trying to improve the
> security and predictability of the program in the presence of these
> uninitialised variables.
> 
> I think it would also be worth saying that options like -Wuninitialized
> still try to warn about uninitialised variables, although using
> -ftrivial-auto-var-init may change which warnings are generated.
> 
> (The above comments are just a summary, not suitable for direct
> inclusion. :-))

Agreed.

Will update the documentation per your suggestions.
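
To make the documented behavior concrete, here is a minimal sketch of how the option and the attribute are meant to interact. It assumes the attribute spelling proposed in this patch; copy_greeting is a hypothetical helper used only for illustration. Both variables are written before they are read, because reading them uninitialized would still be undefined behaviour regardless of the option.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of the patch.  When built with
   -ftrivial-auto-var-init=zero or =pattern, "len" is eligible for the
   automatic initialization, while "scratch" is assumed to be exempt
   because it carries the "uninitialized" attribute.  */
size_t
copy_greeting (char *out, size_t out_size)
{
  size_t len;                                        /* Auto-initialized.  */
  char scratch[64] __attribute__ ((uninitialized));  /* Opted out.  */

  /* Every read below is preceded by a write, so the function is well
     defined with or without the option.  */
  strcpy (scratch, "hello");
  len = strlen (scratch);
  assert (len < out_size);
  memcpy (out, scratch, len + 1);
  return len;
}
```

Opting out per variable is mainly of interest for large buffers on hot paths, where the extra initialization would be measurable.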

> 
>> @item -fvect-cost-model=@var{model}
>> @opindex fvect-cost-model
>> Alter the cost model used for vectorization.  The @var{model} argument
>> […]
>> diff --git a/gcc/gimplify.c b/gcc/gimplify.c
>> 

Re: [patch for gcc12 stage1][version 2] add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-04-08 Thread Qing Zhao via Gcc-patches
Hi, Kees,

Thanks a lot for your testing on the Linux kernel. I am so happy that it works 
well this time.

> On Apr 7, 2021, at 5:19 PM, Kees Cook  wrote:
> 
> On Wed, Mar 24, 2021 at 04:21:49PM -0500, Qing Zhao wrote:
>> This is the 2nd version of the patch for the new security feature for GCC.
>> 
>> Could you please take a look at it and let me know any comments and issues.
> 
> This behaves perfectly as far as I'm able to test in the Linux kernel!
> Thank you!
> 
> For comparison to v1, here's the stack init test output for version 2 of the 
> patch:
> 
> test_stackinit: u8_zero ok
> test_stackinit: u16_zero ok
> test_stackinit: u32_zero ok
> test_stackinit: u64_zero ok
> test_stackinit: char_array_zero ok
> test_stackinit: small_hole_zero ok
> test_stackinit: big_hole_zero ok
> test_stackinit: trailing_hole_zero ok
> test_stackinit: packed_zero ok
> test_stackinit: small_hole_dynamic_partial ok
> test_stackinit: big_hole_dynamic_partial ok
> test_stackinit: trailing_hole_dynamic_partial ok
> test_stackinit: packed_dynamic_partial ok
> test_stackinit: small_hole_static_partial ok
> test_stackinit: big_hole_static_partial ok
> test_stackinit: trailing_hole_static_partial ok
> test_stackinit: packed_static_partial ok
> test_stackinit: small_hole_static_all ok
> test_stackinit: big_hole_static_all ok
> test_stackinit: trailing_hole_static_all ok
> test_stackinit: packed_static_all ok
> test_stackinit: small_hole_dynamic_all ok
> test_stackinit: big_hole_dynamic_all ok
> test_stackinit: trailing_hole_dynamic_all ok
> test_stackinit: packed_dynamic_all ok
> test_stackinit: small_hole_runtime_partial ok
> test_stackinit: big_hole_runtime_partial ok
> test_stackinit: trailing_hole_runtime_partial ok
> test_stackinit: packed_runtime_partial ok
> test_stackinit: small_hole_runtime_all ok
> test_stackinit: big_hole_runtime_all ok
> test_stackinit: trailing_hole_runtime_all ok
> test_stackinit: packed_runtime_all ok
> test_stackinit: u8_none ok
> test_stackinit: u16_none ok
> test_stackinit: u32_none ok
> test_stackinit: u64_none ok
> test_stackinit: char_array_none ok
> test_stackinit: switch_1_none XFAIL (uninit bytes: 8)
> test_stackinit: switch_2_none XFAIL (uninit bytes: 8)
> test_stackinit: small_hole_none ok
> test_stackinit: big_hole_none ok
> test_stackinit: trailing_hole_none ok
> test_stackinit: packed_none ok
> test_stackinit: user ok
> test_stackinit: all tests passed!
> 
> The switch cases are an "expected fail" still, and that's totally fine
> for now[1]. Those were all purged from real kernel code anyway. ;)
> 
> So that's a big "Ack" from me. :)
> 
> What are next steps for this patch?

My understanding is that it first needs to be reviewed and approved by the 
global reviewers. 
I am currently waiting for their comments so that I can push 
this patch into gcc12 as early as possible. 

> I know a lot of people are looking
> forward to it. (And then long-open bug[2] for auto-init can be closed.)
> 
> Thanks again!

Thank you!


Qing
> 
> -Kees
> 
> [1] Clang doesn't handle this either:
> https://bugs.llvm.org/show_bug.cgi?id=44916
> [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87210
> 
>> 
>> Thanks.
>> 
>> Qing
>> 
>> **compared to Version 1, this version added the following new features 
>> to address Kees’s comments:
>> 
>> 1.  correctly handle VLA inside a structure for pattern initialization.
>> In tree.c (build_pattern_cst):
>> 
>> +   /* if the field is a variable length array, it should be the last
>> +  field of the record, and no need to initialize.  */
>> +   if (TREE_CODE (TREE_TYPE (field)) == ARRAY_TYPE
>> +   && TYPE_SIZE (TREE_TYPE (field)) == NULL_TREE
>> +   && ((TYPE_DOMAIN (TREE_TYPE (field)) != NULL_TREE
>> +   && TYPE_MAX_VALUE (TYPE_DOMAIN (TREE_TYPE (field)))
>> +  == NULL_TREE)
>> +  || TYPE_DOMAIN (TREE_TYPE (field)) == NULL_TREE))
>> + continue;
>> 
>> 2.  initialize all paddings to zero when -ftrivial-auto-var-init is present.
>> In expr.c (store_constructor):
>> 
>>  Clear the whole structure when
>>-ftrivial-auto-var-init and the structure has paddings.
>> 
>> In gimplify.c (gimplify_init_constructor):
>> 
>>  Clear the whole structure when
>>-ftrivial-auto-var-init and the structure has pad

Re: [patch for gcc12 stage1][version 2] add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-04-07 Thread Qing Zhao via Gcc-patches
Ping

> On Mar 24, 2021, at 4:21 PM, Qing Zhao via Gcc-patches 
>  wrote:
> 
> Hi, 
> 
> This is the 2nd version of the patch for the new security feature for GCC.
> 
> Could you please take a look at it and let me know any comments and issues.
> 
> Thanks.
> 
> Qing
> 
> **compared to Version 1, this version added the following new features to 
> address Kees’s comments:
> 
> 1.  correctly handle VLA inside a structure for pattern initialization.
> In tree.c (build_pattern_cst):
> 
> +   /* if the field is a variable length array, it should be the last
> +  field of the record, and no need to initialize.  */
> +   if (TREE_CODE (TREE_TYPE (field)) == ARRAY_TYPE
> +   && TYPE_SIZE (TREE_TYPE (field)) == NULL_TREE
> +   && ((TYPE_DOMAIN (TREE_TYPE (field)) != NULL_TREE
> +   && TYPE_MAX_VALUE (TYPE_DOMAIN (TREE_TYPE (field)))
> +  == NULL_TREE)
> +  || TYPE_DOMAIN (TREE_TYPE (field)) == NULL_TREE))
> + continue;
> 
> 2.  initialize all paddings to zero when -ftrivial-auto-var-init is present.
> In expr.c (store_constructor):
> 
> Clear the whole structure when
> -ftrivial-auto-var-init and the structure has paddings.
> 
> In gimplify.c (gimplify_init_constructor):
> 
> Clear the whole structure when
> -ftrivial-auto-var-init and the structure has paddings.
> 
> As agreed with Kees, treat the issue related to auto variables outside of the 
> cases and inside the switch as a low priority one. 
> 
> 3. Add  the following new testing cases for the above 1 and 2:
> 
> * c-c++-common/auto-init-13.c: New test.
> * c-c++-common/auto-init-14.c: New test. 
> 
> * gcc.target/aarch64/auto-init-9.c: New test.
> * gcc.target/aarch64/auto-init-10.c: New test.
> * gcc.target/aarch64/auto-init-11.c: New test.
> * gcc.target/aarch64/auto-init-12.c: New test.
> * gcc.target/aarch64/auto-init-13.c: New test.
> * gcc.target/aarch64/auto-init-14.c: New test.
> * gcc.target/aarch64/auto-init-15.c: New test.
> * gcc.target/aarch64/auto-init-16.c: New test.
> * gcc.target/aarch64/auto-init-17.c: New test.
> * gcc.target/aarch64/auto-init-18.c: New test.
> * gcc.target/aarch64/auto-init-19.c: New test.
> * gcc.target/aarch64/auto-init-20.c: New test.
> 
> * gcc.target/i386/auto-init-9.c: New test.
> * gcc.target/i386/auto-init-10.c: New test.
> * gcc.target/i386/auto-init-11.c: New test.
> * gcc.target/i386/auto-init-12.c: New test.
> * gcc.target/i386/auto-init-13.c: New test.
> * gcc.target/i386/auto-init-14.c: New test.
> * gcc.target/i386/auto-init-15.c: New test.
> * gcc.target/i386/auto-init-16.c: New test.
> * gcc.target/i386/auto-init-17.c: New test.
> * gcc.target/i386/auto-init-18.c: New test.
> * gcc.target/i386/auto-init-19.c: New test.
> * gcc.target/i386/auto-init-20.c: New test.
> 
> 4. Make D the default approach: when -ftrivial-auto-var-init is specified, 
> the “.DEFERRED_INIT” approach is used by default, so there is no need to add 
> -fauto-var-init-approach=D anymore.
> 
> To compare approaches A and D, add -fauto-var-init-approach=A 
> to get that implementation. 
> 
> 5.  Delete all -fauto-var-init-approach=D in the testing cases. 
> 
> ** others are the same as Version 1, please see the version 1 description 
> at:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2021-February/565581.html
> 
> 
> ***Changelog:
> 
> gcc/:
> 
> 2021-03-24  qing zhao  
> 
> * common.opt (ftrivial-auto-var-init=): New.
> (fauto-var-init-approach=): Likewise.
> * doc/extend.texi: Document the uninitialized attribute.
> * doc/invoke.texi: Document -ftrivial-auto-var-init.
> * expr.c (store_constructor): Clear the whole structure when
> -ftrivial-auto-var-init and the structure has paddings.
> * flag-types.h (enum auto_init_type): New enumerated type
> auto_init_type.
> (enum auto_init_approach): New enumerated type auto_init_approach.
> * gimple.h (enum gf_mask): Add GF_CALL_MEMSET_FOR_UNINIT case.
> (gimple_call_set_memset_for_uninit): New function.
> (gimple_call_memset_for_uninit_p): Likewise.
> * gimplify.c (gimplify_vla_decl): Add initialization to vla per users'
> requests. 
> 

Re: [PATCH] testsuite: Disable zero-scratch-regs-{8, 9, 10, 11}.c on s390* [PR97680]

2021-03-31 Thread Qing Zhao via Gcc-patches
Yes, basically, I agree with Eric. 

One of the major reasons for intentionally putting these test cases under 
c-c++-common is to make them fail by default on platforms that do not support 
this feature yet. 

The platform maintainers can then decide whether to complete the feature on 
their platform or skip the tests if they don’t want the feature there. 

Qing

> On Mar 31, 2021, at 2:14 AM, Eric Botcazou  wrote:
> 
>> That is true, but nothing really happened during the 5 months that the tests
>> have been failing on many other architectures (except that powerpc and arm
>> had skipped those tests).  There has been a PR open for all those 5 months.
> 
> So what?  This is not the first example and I don't see anything special with 
> it.  You or maintainers can decide to XFAIL particular architectures at will, 
> but hiding the failures by default is IMO not appropriate.
> 
>> We can perhaps revert the skips after branching GCC 11 off, but I have
>> little hope other target maintainers will do what you did, so unsure if it
>> would help.  And the changes need people familiar with each of the backends
>> to decide what needs to be done and what is doable.
> 
> That's exactly the same situation as for -fstack-usage/-Wstack-usage, where I 
> intentionally made gcc.dg/stack-usage-1.c fail by default so that maintainers 
> could add the missing bits; this worked relatively well.
> 
> -- 
> Eric Botcazou
> 
> 



Re: [RFC][patch for gcc12][version 1] add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-03-15 Thread Qing Zhao via Gcc-patches
(CC’ing gcc-patch alias).

Hi, Kees,


> On Mar 12, 2021, at 3:55 PM, Kees Cook  wrote:
> 
> On Fri, Mar 12, 2021 at 03:35:28PM -0600, Qing Zhao wrote:
>> Hi, Kees,
>> 
>> I am looking at the structure padding initialization issue. And also have 
>> some questions:
>> 
>> 
>>> On Feb 24, 2021, at 10:41 PM, Kees Cook  wrote:
>>> 
>>> It looks like there is still some issues with padding and pre-case
>>> switch variables. Here's the test output, FWIW:
>>> 
>>> 
>>> test_stackinit: small_hole_static_all FAIL (uninit bytes: 3)
>>> test_stackinit: big_hole_static_all FAIL (uninit bytes: 61)
>>> test_stackinit: trailing_hole_static_all FAIL (uninit bytes: 7)
>>> test_stackinit: small_hole_dynamic_all FAIL (uninit bytes: 3)
>>> test_stackinit: big_hole_dynamic_all FAIL (uninit bytes: 61)
>>> test_stackinit: trailing_hole_dynamic_all FAIL (uninit bytes: 7)
>>> 
>>> test_stackinit: switch_1_none FAIL (uninit bytes: 8)
>>> test_stackinit: switch_2_none FAIL (uninit bytes: 8)
>>> test_stackinit: failures: 8
>>> 
>>> 
>>> /* Simple structure with padding likely to be covered by compiler. */
>>> struct test_small_hole {
>>> size_t one;
>>> char two;
>>> /* 3 byte padding hole here. */
>>> int three;
>>> unsigned long four;
>>> };
>>> 
>>> /* Try to trigger unhandled padding in a structure. */
>>> struct test_aligned {
>>> u32 internal1;
>>> u64 internal2;
>>> } __aligned(64);
>>> 
>>> struct test_big_hole {
>>> u8 one;
>>> u8 two;
>>> u8 three;
>>> /* 61 byte padding hole here. */
>>> struct test_aligned four;
>>> } __aligned(64);
>>> 
>>> struct test_trailing_hole {
>>> char *one;
>>> char *two;
>>> char *three;
>>> char four;
>>> /* "sizeof(unsigned long) - 1" byte padding hole here. */
>>> };
>>> 
>>> They fail when they're statically initialized (either fully or
>>> partially),
>> 
>> So, when the structure is not statically initialized,  the compiler 
>> initialization is good?
>> 
>> For the failing cases, what’s the behavior of the LLVM 
>> -ftrivial-auto-var-init?
>> 
>> From the LLVM patch (https://reviews.llvm.org/D54604):
>> 
>> 
>> To keep the patch simple, only some undef is removed for now, see
>> replaceUndef. The padding-related infoleaks are therefore not all gone yet.
>> This will be addressed in a follow-up, mainly because addressing 
>> padding-related
>> leaks should be a stand-alone option which is implied by variable
>> initialization.
>> 
> 
> Right, padding init happened in:
> https://github.com/llvm/llvm-project/commit/4f7bc0eee7e6099b1abd57dac3c83529944ab23c
> 
> And was further clarified that, IIUC, padding _must be zero_ regardless
> of pattern-vs-zero in:
> https://github.com/llvm/llvm-project/commit/d39fbc7e20d84364e409ce59724ce20625637062

Thanks a lot for the above information; it is very useful.
I will take a look at the LLVM patches and try to implement this feature in GCC 
as well.

> 
>> Yes, in GCC’s implementation, I think that  fixing all padding-related leaks 
>> also require a
>> separate patch.
> 
> That's fine -- but it'll need to be tied to -ftrivial-auto-var-init,
> since otherwise the memory isn't actually fully initialized. :)

Okay, will do that.

Thanks again.

Qing
> 
> -Kees
> 
> -- 
> Kees Cook



Re: [RFC][patch for gcc12][version 1] add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-03-12 Thread Qing Zhao via Gcc-patches
Hi, Kees,

I am looking at the structure padding initialization issue, and I have some 
questions:


> On Feb 24, 2021, at 10:41 PM, Kees Cook  wrote:
> 
> It looks like there is still some issues with padding and pre-case
> switch variables. Here's the test output, FWIW:
> 
> 
> test_stackinit: small_hole_static_all FAIL (uninit bytes: 3)
> test_stackinit: big_hole_static_all FAIL (uninit bytes: 61)
> test_stackinit: trailing_hole_static_all FAIL (uninit bytes: 7)
> test_stackinit: small_hole_dynamic_all FAIL (uninit bytes: 3)
> test_stackinit: big_hole_dynamic_all FAIL (uninit bytes: 61)
> test_stackinit: trailing_hole_dynamic_all FAIL (uninit bytes: 7)
> 
> test_stackinit: switch_1_none FAIL (uninit bytes: 8)
> test_stackinit: switch_2_none FAIL (uninit bytes: 8)
> test_stackinit: failures: 8
> 
> 
> /* Simple structure with padding likely to be covered by compiler. */
> struct test_small_hole {
>   size_t one;
>   char two;
>   /* 3 byte padding hole here. */
>   int three;
>   unsigned long four;
> };
> 
> /* Try to trigger unhandled padding in a structure. */
> struct test_aligned {
>   u32 internal1;
>   u64 internal2;
> } __aligned(64);
> 
> struct test_big_hole {
>   u8 one;
>   u8 two;
>   u8 three;
>   /* 61 byte padding hole here. */
>   struct test_aligned four;
> } __aligned(64);
> 
> struct test_trailing_hole {
>   char *one;
>   char *two;
>   char *three;
>   char four;
>   /* "sizeof(unsigned long) - 1" byte padding hole here. */
> };
> 
> They fail when they're statically initialized (either fully or
> partially),

So, when the structure is not statically initialized, is the compiler's 
initialization correct?

For the failing cases, what’s the behavior of the LLVM -ftrivial-auto-var-init?

From the LLVM patch (https://reviews.llvm.org/D54604):


To keep the patch simple, only some undef is removed for now, see
replaceUndef. The padding-related infoleaks are therefore not all gone yet.
This will be addressed in a follow-up, mainly because addressing padding-related
leaks should be a stand-alone option which is implied by variable
initialization.


Yes, in GCC’s implementation, I think that fixing all padding-related leaks 
also requires a
separate patch.
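
For reference, the size of the hole in the test_small_hole structure quoted above can be computed with offsetof. This is only a sketch: small_hole_padding is a hypothetical helper that appears neither in the kernel test nor in the patch, and the result depends on the ABI (3 bytes on common LP64 targets, matching the "uninit bytes: 3" in the test output).

```c
#include <assert.h>
#include <stddef.h>

/* Same layout as the kernel test structure quoted in this thread.  */
struct test_small_hole {
  size_t one;
  char two;
  /* Padding hole here; its size depends on the ABI.  */
  int three;
  unsigned long four;
};

/* Hypothetical helper: number of padding bytes between "two" and
   "three".  These are the bytes that an explicit initializer for all
   named fields does not necessarily write.  */
size_t
small_hole_padding (void)
{
  size_t end_of_two = offsetof (struct test_small_hole, two) + sizeof (char);
  size_t start_of_three = offsetof (struct test_small_hole, three);

  assert (start_of_three >= end_of_two);
  return start_of_three - end_of_two;
}
```

Since an initializer only has to set the named fields, clearing such holes needs separate handling, for example clearing the whole object first.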

Qing

> for example:
> 
> struct test_..._hole instance = { .two = ..., };
> 
> or
> 
> struct test_..._hole instance = { .one = ...,
> .two = ...,
> .three = ...,
> .four = ...,
>   };
> 
> The last case is for switch variables outside of case statements, like
> "var" here:
> 
>   switch (path) {
>   unsigned long var;
> 
>   case ..:
>   ...
>   case ..:
>   ...
>   ...
>   }
> 
> 
> I'm really looking forward to having this available. Thanks again!
> 
> -Kees
> 
> [1] 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/test_stackinit.c
> 
> -- 
> Kees Cook



Re: [RFC][patch for gcc12][version 1] add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-03-12 Thread Qing Zhao via Gcc-patches



> On Mar 11, 2021, at 6:46 PM, Kees Cook  wrote:
> 
> On Thu, Mar 11, 2021 at 03:47:17PM -0600, Qing Zhao wrote:
>> Hi, Kees,
>> 
>> Sorry for the late reply (I have been busy with other work recently).
>> 
>> Currently, I am working on the issue of flexible length array as the last 
>> field of the structure.
>> 
>> In order to fix it correctly, I have the following question:
>> 
>> 
>>> On Feb 26, 2021, at 3:42 PM, Kees Cook  wrote:
>>> 
>>> On Thu, Feb 25, 2021 at 05:56:38PM -0600, Qing Zhao wrote:
>>>> Just noticed that you didn’t add -fauto-var-init-approach=D to the command 
>>>> line.
>>> 
>>> Ah-ha! I didn't realize that was needed; thanks. However, now some of the 
>>> sources crash in a different way. Here's the reproducer:
>>> 
>>> $ cat poc.i
>>> struct a {
>>> int b;
>>> int array[];
>>> };
>>> void c() {
>>> struct a d;
>>> }
>>> 
>> 
>> For such variable length array as the last field of the structure, static 
>> initialization is not allowed, 
>> User needs to explicitly allocate memory and initialize the allocated array 
>> manually in the source code. 
>> 
>> So, if the compiler has to initialize this structure when requested by 
>> -ftrivial-auto-var-init,  I think that 
>> only the fields before the last fields need to be initialized, Is this the 
>> correct behavior you expected?
> 
> Right, that would be my expectation as well. Putting such a struct on
> the stack tends to be nonsensical, but maybe happens if part of a union,
> which would get initialized correctly, etc:
> 
> union {
>   struct a {
>   int b;
>   int array[];
>   };
>   char buf[32];
> };
> 

Okay, thanks. This issue has been fixed in my local repository.

Qing
> -- 
> Kees Cook



Re: [RFC][patch for gcc12][version 1] add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-03-11 Thread Qing Zhao via Gcc-patches
Hi, Kees,

Sorry for the late reply (I have been busy with other work recently).

Currently, I am working on the issue of a flexible array member as the last 
field of a structure.

In order to fix it correctly, I have the following question:


> On Feb 26, 2021, at 3:42 PM, Kees Cook  wrote:
> 
> On Thu, Feb 25, 2021 at 05:56:38PM -0600, Qing Zhao wrote:
>> Just noticed that you didn’t add -fauto-var-init-approach=D to the command 
>> line.
> 
> Ah-ha! I didn't realize that was needed; thanks. However, now some of the 
> sources crash in a different way. Here's the reproducer:
> 
> $ cat poc.i
> struct a {
>  int b;
>  int array[];
> };
> void c() {
>  struct a d;
> }
> 

For such a flexible array member as the last field of a structure, static 
initialization is not allowed; 
the user needs to explicitly allocate memory and initialize the allocated array 
manually in the source code. 

So, if the compiler has to initialize this structure when requested by 
-ftrivial-auto-var-init, I think that 
only the fields before the last field need to be initialized. Is this the 
behavior you expected?
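
A minimal sketch of why only the leading fields matter for an automatic variable: the flexible array member contributes no storage to the object itself, so sizeof covers just the fixed part. The helpers fixed_part_size and alloc_a are hypothetical and only illustrate the point; they are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Same shape as the reproducer in this thread.  */
struct a {
  int b;
  int array[];  /* Flexible array member: no storage of its own.  */
};

/* Hypothetical helper: bytes occupied by an automatic "struct a",
   i.e. only the fields before the flexible array member.  */
size_t
fixed_part_size (void)
{
  return sizeof (struct a);
}

/* Hypothetical helper: the array only gets storage when the user
   allocates it explicitly, and the user initializes it as well.  */
struct a *
alloc_a (size_t n)
{
  struct a *p = malloc (sizeof (struct a) + n * sizeof (int));

  if (p == NULL)
    return NULL;
  p->b = (int) n;
  memset (p->array, 0, n * sizeof (int));
  return p;
}
```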

Thanks.

Qing


> $ gcc -ftrivial-auto-var-init=pattern -fauto-var-init-approach=D -c /dev/null 
> poc.i
> during RTL pass: expand
> poc.i: In function ‘c’:
> poc.i:6:12: internal compiler error: in build_pattern_cst, at tree.c:2652
>6 |   struct a d;
>  |^
> 0x75b572 build_pattern_cst(tree_node*)
>../../../gcc/gcc/tree.c:2652
> 0x10db116 build_pattern_cst(tree_node*)
>../../../gcc/gcc/tree.c:2612
> 0xb8a230 expand_DEFERRED_INIT
>../../../gcc/gcc/internal-fn.c:2980
> 0x970e17 expand_call_stmt
>../../../gcc/gcc/cfgexpand.c:2749
> 0x970e17 expand_gimple_stmt_1
>../../../gcc/gcc/cfgexpand.c:3844
> 0x970e17 expand_gimple_stmt
>../../../gcc/gcc/cfgexpand.c:4008
> 0x9766b3 expand_gimple_basic_block
>../../../gcc/gcc/cfgexpand.c:6045
> 0x9780d6 execute
>../../../gcc/gcc/cfgexpand.c:6729
> Please submit a full bug report,
> with preprocessed source if appropriate.
> Please include the complete backtrace with any bug report.
> See <https://gcc.gnu.org/bugs/> for instructions.
> 
> I assume it's not handling the flex-array happily?
> 
> -- 
> Kees Cook



Re: [RFC][patch for gcc12][version 1] add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-02-26 Thread Qing Zhao via Gcc-patches
Thanks. I will take a look and fix this issue.

BTW, could you please also re-test with -ftrivial-auto-var-init=zero 
-fauto-var-init-approach=D, 
and let me know whether there are any new issues with 
-ftrivial-auto-var-init=zero?

(FYI, I have tested -ftrivial-auto-var-init=zero -fauto-var-init-approach=D and
also -ftrivial-auto-var-init=pattern -fauto-var-init-approach=D on cpu2017 
without any issues.)


Thanks a lot for your help.

Qing


On Feb 26, 2021, at 3:42 PM, Kees Cook  wrote:
> 
> On Thu, Feb 25, 2021 at 05:56:38PM -0600, Qing Zhao wrote:
>> Just noticed that you didn’t add -fauto-var-init-approach=D to the command 
>> line.
> 
> Ah-ha! I didn't realize that was needed; thanks. However, now some of the 
> sources crash in a different way. Here's the reproducer:
> 
> $ cat poc.i
> struct a {
>  int b;
>  int array[];
> };
> void c() {
>  struct a d;
> }
> 
> $ gcc -ftrivial-auto-var-init=pattern -fauto-var-init-approach=D -c /dev/null 
> poc.i
> during RTL pass: expand
> poc.i: In function ‘c’:
> poc.i:6:12: internal compiler error: in build_pattern_cst, at tree.c:2652
>6 |   struct a d;
>  |^
> 0x75b572 build_pattern_cst(tree_node*)
>../../../gcc/gcc/tree.c:2652
> 0x10db116 build_pattern_cst(tree_node*)
>../../../gcc/gcc/tree.c:2612
> 0xb8a230 expand_DEFERRED_INIT
>../../../gcc/gcc/internal-fn.c:2980
> 0x970e17 expand_call_stmt
>../../../gcc/gcc/cfgexpand.c:2749
> 0x970e17 expand_gimple_stmt_1
>../../../gcc/gcc/cfgexpand.c:3844
> 0x970e17 expand_gimple_stmt
>../../../gcc/gcc/cfgexpand.c:4008
> 0x9766b3 expand_gimple_basic_block
>../../../gcc/gcc/cfgexpand.c:6045
> 0x9780d6 execute
>../../../gcc/gcc/cfgexpand.c:6729
> Please submit a full bug report,
> with preprocessed source if appropriate.
> Please include the complete backtrace with any bug report.
> See <https://gcc.gnu.org/bugs/> for instructions.
> 
> I assume it's not handling the flex-array happily?
> 
> -- 
> Kees Cook



Re: [RFC][patch for gcc12][version 1] add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-02-25 Thread Qing Zhao via Gcc-patches
Hi, Kees,

Just noticed that you didn’t add -fauto-var-init-approach=D to the command line.
[qinzhao@localhost uninit]$ cat t8.c
a() { char b[1]; }
[qinzhao@localhost uninit]$ sh t
/home/qinzhao/Install/latest/bin/gcc -ftrivial-auto-var-init=pattern 
-fauto-var-init-approach=D t8.c -S
t8.c:1:1: warning: return type defaults to ‘int’ [-Wimplicit-int]
1 | a() { char b[1]; }
  | ^

Without -fauto-var-init-approach=D, I get the same error as you do. 

(This option is just temporary; its purpose is to compare two different 
implementations of “zero” initialization.
Since “pattern” initialization was added after that comparison, it does not 
support the 
default “A” approach. I plan to make “D” the default in the final version of 
the patch.)

So, please add “-fauto-var-init-approach=D” along with 
“-ftrivial-auto-var-init=pattern/zero” for the testing.

Sorry for the confusion.


> On Feb 25, 2021, at 2:00 PM, Kees Cook  wrote:
> 
> On Thu, Feb 25, 2021 at 12:15:01PM -0600, Qing Zhao wrote:
>>> On Feb 24, 2021, at 10:41 PM, Kees Cook  wrote:
>>> [...]
>>> test_stackinit: trailing_hole_none ok
>>> test_stackinit: packed_none ok
>>> test_stackinit: user ok
>>> test_stackinit: failures: 8
>> 
>> Does the above testing include “pattern initialization” in addition to “zero 
>> initialization”?
> 
> This was from the zero-init case. I've just tested pattern-init now and
> it actually crashes GCC. I minimized the test case to this:
> 
> $ cat main.i
> a() { char b[1]; }
> $ gcc -ftrivial-auto-var-init=pattern -c /dev/null main.i
> main.i:1:1: warning: return type defaults to ‘int’ [-Wimplicit-int]
>1 | a() { char b[1]; }
>  | ^
> main.i: In function ‘a’:
> main.i:1:12: internal compiler error: in gimplify_init_ctor_eval, at
> gimplify.c:4873
>1 | a() { char b[1]; }
>  |^
> 0x69740d gimplify_init_ctor_eval
>../../../gcc/gcc/gimplify.c:4873
> 0xb5ac8f gimplify_init_constructor
>../../../gcc/gcc/gimplify.c:5320
> 0xb6b68a gimplify_modify_expr
>../../../gcc/gcc/gimplify.c:5952
> 0xb533ba gimplify_expr(tree_node**, gimple**, gimple**, bool (*)(tree_node*), 
> int)
>../../../gcc/gcc/gimplify.c:14262
> 0xb56b26 gimplify_stmt(tree_node**, gimple**)
>../../../gcc/gcc/gimplify.c:7056
> 0xb68e6e gimplify_and_add(tree_node*, gimple**)
>../../../gcc/gcc/gimplify.c:489
> 0xb68e6e gimple_add_init_for_auto_var
>../../../gcc/gcc/gimplify.c:1892
> 0xb68e6e gimplify_decl_expr
>../../../gcc/gcc/gimplify.c:2010
> 0xb53bd6 gimplify_expr(tree_node**, gimple**, gimple**, bool (*)(tree_node*), 
> int)
>../../../gcc/gcc/gimplify.c:14459
> 0xb56b26 gimplify_stmt(tree_node**, gimple**)
>../../../gcc/gcc/gimplify.c:7056
> 0xb5727d gimplify_bind_expr
>../../../gcc/gcc/gimplify.c:1421
> 0xb536f0 gimplify_expr(tree_node**, gimple**, gimple**, bool (*)(tree_node*), 
> int)
>../../../gcc/gcc/gimplify.c:14463
> 0xb6ccc9 gimplify_stmt(tree_node**, gimple**)
>../../../gcc/gcc/gimplify.c:7056
> 0xb6ccc9 gimplify_body(tree_node*, bool)
>../../../gcc/gcc/gimplify.c:15498
> 0xb6d0ed gimplify_function_tree(tree_node*)
>../../../gcc/gcc/gimplify.c:15652
> 0x9ae8d7 cgraph_node::analyze()
>../../../gcc/gcc/cgraphunit.c:670
> 0x9b13a7 analyze_functions
>../../../gcc/gcc/cgraphunit.c:1233
> 0x9b1f9d symbol_table::finalize_compilation_unit()
>../../../gcc/gcc/cgraphunit.c:2511
> Please submit a full bug report,
> with preprocessed source if appropriate.
> Please include the complete backtrace with any bug report.
> See <https://gcc.gnu.org/bugs/> for instructions.
> 
> 
>>> struct test_..._hole instance = { .two = ..., };
>>> 
>>> or
>>> 
>>> struct test_..._hole instance = { .one = ...,
>>>   .two = ...,
>>>   .three = ...,
>>>   .four = ...,
>>> 
>>> };
>> 
>> So, when the structure variables are not statically initialized, all the 
>> paddings are initialized correctly by the compiler?
> 
> For the zero case, yes. (Usually such initializations happen via copies onto 
> the stack from .rodata
> sections, or accidentally from already-initialized copies.)
> 
>> In the current implementation, when the auto variable is explicitly 
>> initialized, compiler will do nothing.
>> Looks like for structure variables we need some special handling to 
>> initialize the paddings.
>> Need to study this a little bit and see how to fix it.
> 
> D

Re: [RFC][patch for gcc12][version 1] add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-02-25 Thread Qing Zhao via Gcc-patches
Hi, Kees,

Thanks a lot for your testing on the Linux kernel.
I am happy to know that the initial implementation works fine. 
I will study the padding case and the switch case to fix the issues there.
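
For the switch case, a minimal sketch of the construct in question. The function classify is hypothetical; it only shows a variable declared before the first case label. Control jumps straight to a label and never passes through the declaration, which is presumably why initialization code emitted at the declaration is never executed.

```c
#include <assert.h>

/* Hypothetical example, not taken from the kernel test.  */
unsigned long
classify (int path)
{
  switch (path)
    {
      unsigned long var;  /* In scope for every case below, but control
                             never passes through this declaration.  */
    case 1:
      var = 10;
      return var;
    case 2:
      var = 20;
      return var;
    default:
      var = 0;
      return var;
    }
}
```

Here every path assigns var before reading it, so the code is well defined; the test failure concerns the state of var before that first assignment.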


> On Feb 24, 2021, at 10:41 PM, Kees Cook  wrote:
> 
> (please keep me in CC, I'm not subscribed...)

Yes, I will.

> 
> On Thu Feb 18, 2021 Qing Zhao said:
>> Initialize automatic variables with new first class option 
>> -ftrivial-auto-var-init=[uninitialized|pattern|zero]
> 
> Yay! I'm really excited to see this. Thank you for working on
> it! I've built GCC with this applied, and it works out of the box
> for a Linux kernel build, which correctly detects the availability
> of -ftrivial-auto-var-init=[pattern|zero] for the respective
> CONFIG_INIT_STACK_ALL_PATTERN and CONFIG_INIT_STACK_ALL_ZERO options.
> 
> The output from the kernel's CONFIG_TEST_STACKINIT module shows coverage
> for most uninitialized cases. Yay! :)
> 
> It looks like there is still some issues with padding and pre-case
> switch variables. Here's the test output, FWIW:
> 
> test_stackinit: u8_zero ok
> test_stackinit: u16_zero ok
> test_stackinit: u32_zero ok
> test_stackinit: u64_zero ok
> test_stackinit: char_array_zero ok
> test_stackinit: small_hole_zero ok
> test_stackinit: big_hole_zero ok
> test_stackinit: trailing_hole_zero ok
> test_stackinit: packed_zero ok
> test_stackinit: small_hole_dynamic_partial ok
> test_stackinit: big_hole_dynamic_partial ok
> test_stackinit: trailing_hole_dynamic_partial ok
> test_stackinit: packed_dynamic_partial ok
> test_stackinit: small_hole_static_partial ok
> test_stackinit: big_hole_static_partial ok
> test_stackinit: trailing_hole_static_partial ok
> test_stackinit: packed_static_partial ok
> test_stackinit: small_hole_static_all FAIL (uninit bytes: 3)
> test_stackinit: big_hole_static_all FAIL (uninit bytes: 61)
> test_stackinit: trailing_hole_static_all FAIL (uninit bytes: 7)
> test_stackinit: packed_static_all ok
> test_stackinit: small_hole_dynamic_all FAIL (uninit bytes: 3)
> test_stackinit: big_hole_dynamic_all FAIL (uninit bytes: 61)
> test_stackinit: trailing_hole_dynamic_all FAIL (uninit bytes: 7)
> test_stackinit: packed_dynamic_all ok
> test_stackinit: small_hole_runtime_partial ok
> test_stackinit: big_hole_runtime_partial ok
> test_stackinit: trailing_hole_runtime_partial ok
> test_stackinit: packed_runtime_partial ok
> test_stackinit: small_hole_runtime_all ok
> test_stackinit: big_hole_runtime_all ok
> test_stackinit: trailing_hole_runtime_all ok
> test_stackinit: packed_runtime_all ok
> test_stackinit: u8_none ok
> test_stackinit: u16_none ok
> test_stackinit: u32_none ok
> test_stackinit: u64_none ok
> test_stackinit: char_array_none ok
> test_stackinit: switch_1_none FAIL (uninit bytes: 8)
> test_stackinit: switch_2_none FAIL (uninit bytes: 8)
> test_stackinit: small_hole_none ok
> test_stackinit: big_hole_none ok
> test_stackinit: trailing_hole_none ok
> test_stackinit: packed_none ok
> test_stackinit: user ok
> test_stackinit: failures: 8

Does the above testing include “pattern initialization” in addition to “zero 
initialization”?
> 
> The kernel's test for this is a mess[1] of macros I used to avoid losing
> my sanity from cut/pasting, but it makes the tests hard to read. To
> break it out, the failing cases are due to padding, as seen with the
> "test_small_hole", "test_big_hole", and "test_trailing_hole" structures:
> 
> /* Simple structure with padding likely to be covered by compiler. */
> struct test_small_hole {
>   size_t one;
>   char two;
>   /* 3 byte padding hole here. */
>   int three;
>   unsigned long four;
> };
> 
> /* Try to trigger unhandled padding in a structure. */
> struct test_aligned {
>   u32 internal1;
>   u64 internal2;
> } __aligned(64);
> 
> struct test_big_hole {
>   u8 one;
>   u8 two;
>   u8 three;
>   /* 61 byte padding hole here. */
>   struct test_aligned four;
> } __aligned(64);
> 
> struct test_trailing_hole {
>   char *one;
>   char *two;
>   char *three;
>   char four;
>   /* "sizeof(unsigned long) - 1" byte padding hole here. */
> };
> 
> They fail when they're statically initialized (either fully or
> partially), for example:
> 
> struct test_..._hole instance = { .two = ..., };
> 
> or
> 
> struct test_..._hole instance = { .one = ...,
> .two = ...,
> .three = ...,
> .four = ...,
>   
>   };

So, when the structure variables are not statically initialized, al

Re: PR 96391? Can we fix it for gcc11?

2021-02-23 Thread Qing Zhao via Gcc-patches
Hi, Richard,

> On Feb 9, 2021, at 11:36 AM, Richard Biener  wrote:
> 
> On Tue, 9 Feb 2021, Qing Zhao wrote:
>> 
>> Yes, I understand that without a working testing case to repeat the error, 
>> it’s very hard to debug and fix the issue. 
>> 
>> However, providing a testing case for this bug is really challenging from 
>> our side due to multiple reasons…
>> 
>> 
> 
> Note you can try reducing a proprietary testcase with tools like
> cvise or creduce.  Does your case also happen in a mingw/windows
> environment?

We are trying to install creduce on our system and noticed that it depends on 
LLVM. Is there a similar tool that depends on GCC instead?

Qing
> 
> Richard.
> 



[RFC][patch for gcc12][version 1] add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-02-18 Thread Qing Zhao via Gcc-patches
 size data are all based on the 
comparison between A and D,
 Please help me to make sure both the implementation is correct)

Currently, a first-class option, -fauto-var-init-approach=[A|B|C|D], is added 
to choose A or D. 

If the review raises no issues, I will delete the code related to “A” in the 
next version. 

I put “FIXME” in the comments in several places; please review those parts and 
let me know of any issues with them. 

***Changelog:

gcc/ChangeLog:

2021-02-17  qing zhao  <qing.z...@oracle.com>

* common.opt (ftrivial-auto-var-init=): New.
(fauto-var-init-approach=): Likewise.
* doc/extend.texi: Document the uninitialized attribute.
* doc/invoke.texi: Document -ftrivial-auto-var-init.
* flag-types.h (enum auto_init_type): New enumerated type
auto_init_type.
(enum auto_init_approach): New enumerated type auto_init_approach.
* gimple.h (enum gf_mask): Add GF_CALL_MEMSET_FOR_UNINIT case.
(gimple_call_set_memset_for_uninit): New function.
(gimple_call_memset_for_uninit_p): Likewise.
* gimplify.c (gimplify_vla_decl): Add initialization to vla per users'
requests. 
(build_deferred_init): New function.
(gimple_add_init_for_auto_var): Likewise.
(gimplify_decl_expr): Add initialization to automatic variables per
users' requests.
* internal-fn.c (expand_DEFERRED_INIT): New function.
* internal-fn.def (DEFERRED_INIT): New internal function.
* tree-cfg.c (verify_gimple_call): Skip calls to DEFERRED_INIT.
* tree-core.h (tree_decl_with_vis): Add uninitialized field.
* tree-sra.c (sra_stats): Add two new fields deferred_init and
subtree_deferred_init.
(generate_subtree_deferred_init): New function.
(sra_modify_deferred_init): Likewise.
(sra_modify_function_body): Handle calls to DEFERRED_INIT specially.
* tree-ssa-structalias.c (find_func_aliases_for_deferred_init): New
function.
(find_func_aliases_for_call): Handle calls to DEFERRED_INIT specially.
* tree-ssa-uninit.c (warn_uninit): Handle calls to DEFERRED_INIT
specially.
(check_defs): Handle calls to DEFERRED_INIT and MEMSET for uninitialized
variable specially.
(warn_uninitialized_vars): Handle calls to DEFERRED_INIT specially.
* tree-ssa.c (ssa_undefined_value_p): Handle calls to DEFERRED_INIT
specially.
* tree.c (build_pattern_cst): New function.
* tree.h (DECL_UNINITIALIZED): New macro.
(build_pattern_cst): New declaration.

gcc/c-family/ChangeLog:

2021-02-17  qing zhao  <qing.z...@oracle.com>

* c-attribs.c (handle_uninitialized_attribute): New function.
(c_common_attribute_table): Add "uninitialized" attribute.

gcc/testsuite/ChangeLog:

2021-02-17  qing zhao  <qing.z...@oracle.com>

* c-c++-common/auto-init-1.c: New test.
* c-c++-common/auto-init-10.c: New test.
* c-c++-common/auto-init-11.c: New test.
* c-c++-common/auto-init-12.c: New test.
* c-c++-common/auto-init-2.c: New test.
* c-c++-common/auto-init-3.c: New test.
* c-c++-common/auto-init-4.c: New test.
* c-c++-common/auto-init-5.c: New test.
* c-c++-common/auto-init-6.c: New test.
* c-c++-common/auto-init-7.c: New test.
* c-c++-common/auto-init-8.c: New test.
* c-c++-common/auto-init-9.c: New test.
* c-c++-common/auto-init-esra.c: New test.
* g++.dg/auto-init-uninit-pred-1_a.C: New test.
* g++.dg/auto-init-uninit-pred-1_b.C: New test.
* g++.dg/auto-init-uninit-pred-2_a.C: New test.
* g++.dg/auto-init-uninit-pred-2_b.C: New test.
* g++.dg/auto-init-uninit-pred-3_a.C: New test.
* g++.dg/auto-init-uninit-pred-3_b.C: New test.
* g++.dg/auto-init-uninit-pred-4.C: New test.
* g++.dg/auto-init-uninit-pred-loop-1_a.cc: New test.
* g++.dg/auto-init-uninit-pred-loop-1_b.cc: New test.
* g++.dg/auto-init-uninit-pred-loop-1_c.cc: New test.
* g++.dg/auto-init-uninit-pred-loop_1.cc: New test.
* gcc.dg/auto-init-uninit-1.c: New test.
* gcc.dg/auto-init-uninit-11.c: New test.
* gcc.dg/auto-init-uninit-12.c: New test.
* gcc.dg/auto-init-uninit-13.c: New test.
* gcc.dg/auto-init-uninit-14.c: New test.
* gcc.dg/auto-init-uninit-15.c: New test.
* gcc.dg/auto-init-uninit-16.c: New test.
* gcc.dg/auto-init-uninit-17.c: New test.
* gcc.dg/auto-init-uninit-18.c: New test.
* gcc.dg/auto-init-uninit-19.c: New test.
* gcc.dg/auto-init-uninit-2

Re: [PATCH] Change semantics of -frecord-gcc-switches and add -frecord-gcc-switches-format.

2021-02-16 Thread Qing Zhao via Gcc-patches
Hello,

What’s the status of this patch now? Is there any major technical issue with 
the patch?

Our company has been waiting for this patch for almost a year; we need it for 
an important application.

Could this one be approved and committed to gcc11?

Thanks.

Qing

> On Feb 5, 2021, at 3:34 AM, Martin Liška  wrote:
> 
> Hello.
> 
> Based on discussion with Richi, I'm re-sending the patch. Note that the patch
> has been waiting for a review for almost one year and I would like to see
> it in GCC 11.1.
> 
> Thank you,
> Martin
> <0001-Change-semantics-of-frecord-gcc-switches-and-add-fre.patch>



Re: PR 96391? Can we fix it for gcc11?

2021-02-10 Thread Qing Zhao via Gcc-patches



> On Feb 9, 2021, at 11:36 AM, Richard Biener  wrote:
> 
> On Tue, 9 Feb 2021, Qing Zhao wrote:
> 
>> Richard,
>> 
>> Thank you for the reply.
>> 
>> Yes, I understand that without a working testing case to repeat the error, 
>> it’s very hard to debug and fix the issue. 
>> 
>> However, providing a testing case for this bug is really challenging from 
>> our side due to multiple reasons…
>> 
>> I will discuss with our building engineer to see what we can do. Or, I will 
>> try to debug and fix this issue myself…
> 
> Note you can try reducing a proprietary testcase with tools like
> cvise or creduce.
Thanks for the suggestions. 

>  Does your case also happen in a mingw/windows
> environment?

Our application is compiled in a Linux environment. 
It looks like David Malcolm can already reproduce the error and is able to 
debug it.

Hopefully this can be fixed soon.

Thanks a lot.

Qing
> 
> Richard.
> 
>> 
>> Qing
>> 
>>> On Feb 9, 2021, at 2:18 AM, Richard Biener <rguent...@suse.de> wrote:
>>> 
>>> On Mon, 8 Feb 2021, Qing Zhao wrote:
>>> 
>>>> Hi, 
>>>> 
>>>> The bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96391
>>>> 
>>>> Bug 96391 - [10/11 Regression] internal compiler error: in 
>>>> linemap_compare_locations, at libcpp/line-map.c:1359 
>>>> 
>>>> was opened on 7/30/2020, and multiple users have reported the same issue.
>>>> 
>>>> For our important application, all the C++ modules fail with this bug
>>>> when we use gcc10 or gcc11, so we have to use icc to compile C++ and gcc
>>>> to compile C, which is very inconvenient.
>>>> 
>>>> I raised the priority of this bug to P2 on 10/09/2020 and hope it can
>>>> be fixed in gcc11.
>>>> 
>>>> I see that Michael Cronenworth has attached a preprocessed file for
>>>> reproduction purposes in comment 4.
>>>> 
>>>> So, can we have a fix for this bug in gcc11?
>>> 
>>> The issue is that the preprocessed source does not reproduce the issue and
>>> a mingw development environment is not easily accessible (to me at least).
>>> 
>>> So unless you can reproduce this in a standard linux environment and can
>>> provide a testcase I don't see a way to get this bug forward.
>>> 
>>> Richard.
>>> 
>>>> Thanks a lot.
>>>> 
>>>> Qing
>>> 
>>> -- 
>>> Richard Biener <rguent...@suse.de>
>>> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
>>> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
>> 
>> 
> 
> -- 
> Richard Biener <rguent...@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)



Testing case for fix-point type auto variables?

2021-02-10 Thread Qing Zhao via Gcc-patches
Hi,

I am trying to add test cases that verify pattern initialization and zero 
initialization of fixed-point type auto variables.
Where in the test suite directory can I find a good example of fixed-point 
type programming?

Thanks.

Qing

Re: PR 96391? Can we fix it for gcc11?

2021-02-09 Thread Qing Zhao via Gcc-patches
Richard,

Thank you for the reply.

Yes, I understand that without a working testing case to repeat the error, it’s 
very hard to debug and fix the issue. 

However, providing a testing case for this bug is really challenging from our 
side due to multiple reasons…

I will discuss with our building engineer to see what we can do. Or, I will try 
to debug and fix this issue myself…


Qing

> On Feb 9, 2021, at 2:18 AM, Richard Biener  wrote:
> 
> On Mon, 8 Feb 2021, Qing Zhao wrote:
> 
>> Hi, 
>> 
>> The bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96391
>> 
>> Bug 96391 - [10/11 Regression] internal compiler error: in 
>> linemap_compare_locations, at libcpp/line-map.c:1359 
>> 
>> was opened on 7/30/2020, and multiple users have reported the same issue.
>> 
>> For our important application, all the C++ modules fail with this bug when
>> we use gcc10 or gcc11, so we have to use icc to compile C++ and gcc to
>> compile C, which is very inconvenient.
>> 
>> I raised the priority of this bug to P2 on 10/09/2020 and hope it can be
>> fixed in gcc11.
>> 
>> I see that Michael Cronenworth has attached a preprocessed file for
>> reproduction purposes in comment 4.
>> 
>> So, can we have a fix for this bug in gcc11?
> 
> The issue is that the preprocessed source does not reproduce the issue and
> a mingw development environment is not easily accessible (to me at least).
> 
> So unless you can reproduce this in a standard linux environment and can
> provide a testcase I don't see a way to get this bug forward.
> 
> Richard.
> 
>> Thanks a lot.
>> 
>> Qing
> 
> -- 
> Richard Biener <rguent...@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)



PR 96391? Can we fix it for gcc11?

2021-02-08 Thread Qing Zhao via Gcc-patches
Hi, 

The bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96391 


Bug 96391 - [10/11 Regression] internal compiler error: in 
linemap_compare_locations, at libcpp/line-map.c:1359 

was opened on 7/30/2020, and multiple users have reported the same issue. 

For our important application, all the C++ modules fail with this bug when we 
use gcc10 or gcc11, so we have to use icc to compile C++ and gcc to compile C, 
which is very inconvenient. 

I raised the priority of this bug to P2 on 10/09/2020 and hope it can be 
fixed in gcc11. 

I see that Michael Cronenworth has attached a preprocessed file for 
reproduction purposes in comment 4. 

So, can we have a fix for this bug in gcc11?

Thanks a lot.

Qing

Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init

2021-02-02 Thread Qing Zhao via Gcc-patches
Hi,

With the following patch:

[qinzhao@localhost gcc]$ git diff tree-ssa-structalias.c
diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
index cf653be..bd18841 100644
--- a/gcc/tree-ssa-structalias.c
+++ b/gcc/tree-ssa-structalias.c
@@ -4851,6 +4851,30 @@ find_func_aliases_for_builtin_call (struct function *fn, 
gcall *t)
   return false;
 }
 
+static void
+find_func_aliases_for_deferred_init (gcall *t)
+{
+  
+  tree lhsop = gimple_call_lhs (t);
+  enum auto_init_type init_type
+    = (enum auto_init_type) TREE_INT_CST_LOW (gimple_call_arg (t, 1));
+  auto_vec lhsc;
+  auto_vec rhsc;
+  struct constraint_expr temp;
+ 
+  get_constraint_for (lhsop, &lhsc);
+  if (init_type == AUTO_INIT_ZERO && flag_delete_null_pointer_checks)
+    temp.var = nothing_id;
+  else
+    temp.var = nonlocal_id;
+  temp.type = ADDRESSOF;
+  temp.offset = 0;
+  rhsc.safe_push (temp);
+
+  process_all_all_constraints (lhsc, rhsc);
+  return;
+}
+
 /* Create constraints for the call T.  */
 
 static void
@@ -4864,6 +4888,12 @@ find_func_aliases_for_call (struct function *fn, gcall 
*t)
   && find_func_aliases_for_builtin_call (fn, t))
 return;
 
+  if (gimple_call_internal_p (t, IFN_DEFERRED_INIT))
+    {
+      find_func_aliases_for_deferred_init (t);
+      return;
+    }
+

The *.ealias dumps for the routine “bump_map” are exactly the same for 
approaches A and D. 
However, the stack size for D is still bigger than for A. 

Any suggestions?

Qing


On Feb 2, 2021, at 9:17 AM, Qing Zhao via Gcc-patches  
wrote:
> 
> 
> 
>> On Feb 2, 2021, at 1:43 AM, Richard Biener  wrote:
>> 
>> On Mon, 1 Feb 2021, Qing Zhao wrote:
>> 
>>> Hi, Richard,
>>> 
>>> I have adjusted SRA phase to split calls to DEFERRED_INIT per you 
>>> suggestion.
>>> 
>>> And now the routine “bump_map” in 511.povray is like following:
>>> ...
>>> 
>>> # DEBUG BEGIN_STMT
>>> xcoor = 0.0;
>>> ycoor = 0.0;
>>> # DEBUG BEGIN_STMT
>>> index = .DEFERRED_INIT (index, 2);
>>> index2 = .DEFERRED_INIT (index2, 2);
>>> index3 = .DEFERRED_INIT (index3, 2);
>>> # DEBUG BEGIN_STMT
>>> colour1 = .DEFERRED_INIT (colour1, 2);
>>> colour2 = .DEFERRED_INIT (colour2, 2);
>>> colour3 = .DEFERRED_INIT (colour3, 2);
>>> # DEBUG BEGIN_STMT
>>> p1$0_181 = .DEFERRED_INIT (p1$0_195(D), 2);
>>> # DEBUG p1$0 => p1$0_181
>>> p1$1_184 = .DEFERRED_INIT (p1$1_182(D), 2);
>>> # DEBUG p1$1 => p1$1_184
>>> p1$2_172 = .DEFERRED_INIT (p1$2_185(D), 2);
>>> # DEBUG p1$2 => p1$2_172
>>> p2$0_177 = .DEFERRED_INIT (p2$0_173(D), 2);
>>> # DEBUG p2$0 => p2$0_177
>>> p2$1_135 = .DEFERRED_INIT (p2$1_178(D), 2);
>>> # DEBUG p2$1 => p2$1_135
>>> p2$2_137 = .DEFERRED_INIT (p2$2_136(D), 2);
>>> # DEBUG p2$2 => p2$2_137
>>> p3$0_377 = .DEFERRED_INIT (p3$0_376(D), 2);
>>> # DEBUG p3$0 => p3$0_377
>>> p3$1_379 = .DEFERRED_INIT (p3$1_378(D), 2);
>>> # DEBUG p3$1 => p3$1_379
>>> p3$2_381 = .DEFERRED_INIT (p3$2_380(D), 2);
>>> # DEBUG p3$2 => p3$2_381
>>> 
>>> 
>>> In the above, p1, p2, and p3 are all splitted to calls to DEFERRED_INIT of 
>>> the components of p1, p2 and p3. 
>>> 
>>> With this change, the stack usage numbers with -fstack-usage for approach 
>>> A, old approach D and new D with the splitting in SRA are:
>>> 
>>> Approach A  Approach D-old  Approach D-new
>>> 
>>> 272 624 368
>>> 
>>> From the above, we can see that splitting the call to DEFERRED_INIT in SRA 
>>> can reduce the stack usage increase dramatically. 
>>> 
>>> However, looks like that the stack size for D is still bigger than A. 
>>> 
>>> I checked the IR again, and found that the alias analysis might be 
>>> responsible for this (by compare the image.cpp.026t.ealias for both A and 
>>> D):
>>> 
>>> (Due to the call to:
>>> 
>>> colour1 = .DEFERRED_INIT (colour1, 2);
>>> )
>>> 
>>> **Approach A:
>>> 
>>> Points_to analysis:
>>> 
>>> Constraints:
>>> …
>>> colour1 = 
>>> …
>>> colour1 = 
>>> colour1 = 
>>> colour1 = 
>>> colour1 = 
>>> colour1 = 
>>> ...
>>> callarg(53) = 
>>> ...
>>> _53 = colour1
>>> 
>>> Points_to sets:
>>> …
>>> colour1 = { NULL ESCAPED NONLOCAL } same as _53
>>> ...
>>> CALLUSED(48) = { NULL ESCAPED NONLOCAL

Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init

2021-02-02 Thread Qing Zhao via Gcc-patches



> On Feb 2, 2021, at 1:43 AM, Richard Biener  wrote:
> 
> On Mon, 1 Feb 2021, Qing Zhao wrote:
> 
>> Hi, Richard,
>> 
>> I have adjusted SRA phase to split calls to DEFERRED_INIT per you suggestion.
>> 
>> And now the routine “bump_map” in 511.povray is like following:
>> ...
>> 
>> # DEBUG BEGIN_STMT
>>  xcoor = 0.0;
>>  ycoor = 0.0;
>>  # DEBUG BEGIN_STMT
>>  index = .DEFERRED_INIT (index, 2);
>>  index2 = .DEFERRED_INIT (index2, 2);
>>  index3 = .DEFERRED_INIT (index3, 2);
>>  # DEBUG BEGIN_STMT
>>  colour1 = .DEFERRED_INIT (colour1, 2);
>>  colour2 = .DEFERRED_INIT (colour2, 2);
>>  colour3 = .DEFERRED_INIT (colour3, 2);
>>  # DEBUG BEGIN_STMT
>>  p1$0_181 = .DEFERRED_INIT (p1$0_195(D), 2);
>>  # DEBUG p1$0 => p1$0_181
>>  p1$1_184 = .DEFERRED_INIT (p1$1_182(D), 2);
>>  # DEBUG p1$1 => p1$1_184
>>  p1$2_172 = .DEFERRED_INIT (p1$2_185(D), 2);
>>  # DEBUG p1$2 => p1$2_172
>>  p2$0_177 = .DEFERRED_INIT (p2$0_173(D), 2);
>>  # DEBUG p2$0 => p2$0_177
>>  p2$1_135 = .DEFERRED_INIT (p2$1_178(D), 2);
>>  # DEBUG p2$1 => p2$1_135
>>  p2$2_137 = .DEFERRED_INIT (p2$2_136(D), 2);
>>  # DEBUG p2$2 => p2$2_137
>>  p3$0_377 = .DEFERRED_INIT (p3$0_376(D), 2);
>>  # DEBUG p3$0 => p3$0_377
>>  p3$1_379 = .DEFERRED_INIT (p3$1_378(D), 2);
>>  # DEBUG p3$1 => p3$1_379
>>  p3$2_381 = .DEFERRED_INIT (p3$2_380(D), 2);
>>  # DEBUG p3$2 => p3$2_381
>> 
>> 
>> In the above, p1, p2, and p3 are all splitted to calls to DEFERRED_INIT of 
>> the components of p1, p2 and p3. 
>> 
>> With this change, the stack usage numbers with -fstack-usage for approach A, 
>> old approach D and new D with the splitting in SRA are:
>> 
>>  Approach A  Approach D-old  Approach D-new
>> 
>>  272 624 368
>> 
>> From the above, we can see that splitting the call to DEFERRED_INIT in SRA 
>> can reduce the stack usage increase dramatically. 
>> 
>> However, looks like that the stack size for D is still bigger than A. 
>> 
>> I checked the IR again, and found that the alias analysis might be 
>> responsible for this (by compare the image.cpp.026t.ealias for both A and D):
>> 
>> (Due to the call to:
>> 
>>  colour1 = .DEFERRED_INIT (colour1, 2);
>> )
>> 
>> **Approach A:
>> 
>> Points_to analysis:
>> 
>> Constraints:
>> …
>> colour1 = 
>> …
>> colour1 = 
>> colour1 = 
>> colour1 = 
>> colour1 = 
>> colour1 = 
>> ...
>> callarg(53) = 
>> ...
>> _53 = colour1
>> 
>> Points_to sets:
>> …
>> colour1 = { NULL ESCAPED NONLOCAL } same as _53
>> ...
>> CALLUSED(48) = { NULL ESCAPED NONLOCAL index colour1 }
>> CALLCLOBBERED(49) = { NULL ESCAPED NONLOCAL index colour1 } same as 
>> CALLUSED(48)
>> ...
>> callarg(53) = { NULL ESCAPED NONLOCAL colour1 }
>> 
>> **Approach D:
>> 
>> Points_to analysis:
>> 
>> Constraints:
>> …
>> callarg(19) = colour1
>> callarg(19) = 
>> colour1 = callarg(19) + UNKNOWN
>> colour1 = 
>> …
>> colour1 = 
>> colour1 = 
>> colour1 = 
>> colour1 = 
>> colour1 = 
>> …
>> callarg(74) = 
>> callarg(74) = callarg(74) + UNKNOWN
>> callarg(74) = *callarg(74) + UNKNOWN
>> …
>> _53 = colour1
>> _54 = _53
>> _55 = _54 + UNKNOWN
>> _55 = 
>> _56 = colour1
>> _57 = _56
>> _58 = _57 + UNKNOWN
>> _58 = 
>> _59 = _55 + UNKNOWN
>> _59 = _58 + UNKNOWN
>> _60 = colour1
>> _61 = _60
>> _62 = _61 + UNKNOWN
>> _62 = 
>> _63 = _59 + UNKNOWN
>> _63 = _62 + UNKNOWN
>> _64 = _63 + UNKNOWN
>> ..
>> Points_to set:
>> …
>> colour1 = { ESCAPED NONLOCAL } same as callarg(19)
>> …
>> CALLUSED(69) = { ESCAPED NONLOCAL index colour1 }
>> CALLCLOBBERED(70) = { ESCAPED NONLOCAL index colour1 } same as CALLUSED(69)
>> callarg(71) = { ESCAPED NONLOCAL }
>> callarg(72) = { ESCAPED NONLOCAL }
>> callarg(73) = { ESCAPED NONLOCAL }
>> callarg(74) = { ESCAPED NONLOCAL colour1 }
>> 
>> My question:
>> 
>> Is it possible to adjust alias analysis to resolve this issue?
> 
> You probably want to handle .DEFERRED_INIT in tree-ssa-structalias.c
> find_func_aliases_for_call (it's not a builtin but you can look in
> the respective subroutine for examples).  Specifically you want to
> avoid making anything escaped or clobbered.

Okay, thanks.

Will check on that.

Qing
>> 
> 
> -- 
> Richard Biener mailto:rguent...@suse.de>>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)



Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init

2021-02-01 Thread Qing Zhao via Gcc-patches
Hi, Richard,

I have adjusted the SRA phase to split calls to DEFERRED_INIT per your suggestion.

The routine “bump_map” in 511.povray now looks like the following:
...

 # DEBUG BEGIN_STMT
  xcoor = 0.0;
  ycoor = 0.0;
  # DEBUG BEGIN_STMT
  index = .DEFERRED_INIT (index, 2);
  index2 = .DEFERRED_INIT (index2, 2);
  index3 = .DEFERRED_INIT (index3, 2);
  # DEBUG BEGIN_STMT
  colour1 = .DEFERRED_INIT (colour1, 2);
  colour2 = .DEFERRED_INIT (colour2, 2);
  colour3 = .DEFERRED_INIT (colour3, 2);
  # DEBUG BEGIN_STMT
  p1$0_181 = .DEFERRED_INIT (p1$0_195(D), 2);
  # DEBUG p1$0 => p1$0_181
  p1$1_184 = .DEFERRED_INIT (p1$1_182(D), 2);
  # DEBUG p1$1 => p1$1_184
  p1$2_172 = .DEFERRED_INIT (p1$2_185(D), 2);
  # DEBUG p1$2 => p1$2_172
  p2$0_177 = .DEFERRED_INIT (p2$0_173(D), 2);
  # DEBUG p2$0 => p2$0_177
  p2$1_135 = .DEFERRED_INIT (p2$1_178(D), 2);
  # DEBUG p2$1 => p2$1_135
  p2$2_137 = .DEFERRED_INIT (p2$2_136(D), 2);
  # DEBUG p2$2 => p2$2_137
  p3$0_377 = .DEFERRED_INIT (p3$0_376(D), 2);
  # DEBUG p3$0 => p3$0_377
  p3$1_379 = .DEFERRED_INIT (p3$1_378(D), 2);
  # DEBUG p3$1 => p3$1_379
  p3$2_381 = .DEFERRED_INIT (p3$2_380(D), 2);
  # DEBUG p3$2 => p3$2_381


In the above, p1, p2, and p3 are all split into calls to DEFERRED_INIT of 
their individual components. 

With this change, the stack usage numbers reported by -fstack-usage for 
approach A, the old approach D, and the new approach D with the splitting in 
SRA are:

  Approach A    Approach D-old    Approach D-new

  272           624               368

From the above, we can see that splitting the call to DEFERRED_INIT in SRA 
dramatically reduces the stack usage increase. 

However, it looks like the stack size for D is still bigger than for A. 

I checked the IR again and found that alias analysis might be responsible 
for this (by comparing image.cpp.026t.ealias for both A and D):

(Due to the call to:

  colour1 = .DEFERRED_INIT (colour1, 2);
)

**Approach A:

Points_to analysis:

Constraints:
…
colour1 = 
…
colour1 = 
colour1 = 
colour1 = 
colour1 = 
colour1 = 
...
callarg(53) = 
...
_53 = colour1

Points_to sets:
…
colour1 = { NULL ESCAPED NONLOCAL } same as _53
...
CALLUSED(48) = { NULL ESCAPED NONLOCAL index colour1 }
CALLCLOBBERED(49) = { NULL ESCAPED NONLOCAL index colour1 } same as CALLUSED(48)
...
callarg(53) = { NULL ESCAPED NONLOCAL colour1 }

**Approach D:

Points_to analysis:

Constraints:
…
callarg(19) = colour1
callarg(19) = 
colour1 = callarg(19) + UNKNOWN
colour1 = 
…
colour1 = 
colour1 = 
colour1 = 
colour1 = 
colour1 = 
…
callarg(74) = 
callarg(74) = callarg(74) + UNKNOWN
callarg(74) = *callarg(74) + UNKNOWN
…
_53 = colour1
_54 = _53
_55 = _54 + UNKNOWN
_55 = 
_56 = colour1
_57 = _56
_58 = _57 + UNKNOWN
_58 = 
_59 = _55 + UNKNOWN
_59 = _58 + UNKNOWN
_60 = colour1
_61 = _60
_62 = _61 + UNKNOWN
_62 = 
_63 = _59 + UNKNOWN
_63 = _62 + UNKNOWN
_64 = _63 + UNKNOWN
..
Points_to set:
…
colour1 = { ESCAPED NONLOCAL } same as callarg(19)
…
CALLUSED(69) = { ESCAPED NONLOCAL index colour1 }
CALLCLOBBERED(70) = { ESCAPED NONLOCAL index colour1 } same as CALLUSED(69)
callarg(71) = { ESCAPED NONLOCAL }
callarg(72) = { ESCAPED NONLOCAL }
callarg(73) = { ESCAPED NONLOCAL }
callarg(74) = { ESCAPED NONLOCAL colour1 }

My question:

Is it possible to adjust alias analysis to resolve this issue?

thanks.

Qing

> On Jan 18, 2021, at 10:12 AM, Qing Zhao via Gcc-patches 
>  wrote:
> 
>>>>> I checked the routine “pov::bump_map” in 511.povray_r, since it
>>>>> has a large stack increase
>>>>> due to implementation D, by examining the IR immediately before the RTL
>>>>> expansion phase
>>>>> (image.cpp.244t.optimized). I found that we have the following
>>>>> additional statements for the array elements:
>>>>> 
>>>>> void  pov::bump_map (double * EPoint, struct TNORMAL * Tnormal, double
>>>>> * normal)
>>>>> {
>>>>> …
>>>>> double p3[3];
>>>>> double p2[3];
>>>>> double p1[3];
>>>>> float colour3[5];
>>>>> float colour2[5];
>>>>> float colour1[5];
>>>>> …
>>>>> # DEBUG BEGIN_STMT
>>>>> colour1 = .DEFERRED_INIT (colour1, 2);
>>>>> colour2 = .DEFERRED_INIT (colour2, 2);
>>>>> colour3 = .DEFERRED_INIT (colour3, 2);
>>>>> # DEBUG BEGIN_STMT
>>>>> MEM  [(double[3] *)] = p1$0_144(D);
>>>>> MEM  [(double[3] *) + 8B] = p1$1_135(D);
>>>>> MEM  [(double[3] *) + 16B] = p1$2_138(D);
>>>>> p1 = .DEFERRED_INIT (p1, 2);
>>>>> # DEBUG D#12 => MEM  [(double[3] *)]
>>>>> # DEBUG p1$0 => D#12
>>>>> # DEBUG D#11 => MEM  [(double[3] *) + 8B]
>>>>> # DEBUG p1$1 

Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init

2021-01-18 Thread Qing Zhao via Gcc-patches



> On Jan 18, 2021, at 7:09 AM, Richard Sandiford  
> wrote:
> 
> Qing Zhao  writes:
>>>>> D will keep all initialized aggregates as aggregates and live which
>>>>> means stack will be allocated for it.  With A the usual optimizations
>>>>> to reduce stack usage can be applied.
>>>> 
>>>> I checked the routine “pov::bump_map” in 511.povray_r, since it
>>>> has a large stack increase
>>>> due to implementation D, by examining the IR immediately before the RTL
>>>> expansion phase
>>>> (image.cpp.244t.optimized). I found that we have the following
>>>> additional statements for the array elements:
>>>> 
>>>> void  pov::bump_map (double * EPoint, struct TNORMAL * Tnormal, double
>>>> * normal)
>>>> {
>>>> …
>>>> double p3[3];
>>>> double p2[3];
>>>> double p1[3];
>>>> float colour3[5];
>>>> float colour2[5];
>>>> float colour1[5];
>>>> …
>>>> # DEBUG BEGIN_STMT
>>>> colour1 = .DEFERRED_INIT (colour1, 2);
>>>> colour2 = .DEFERRED_INIT (colour2, 2);
>>>> colour3 = .DEFERRED_INIT (colour3, 2);
>>>> # DEBUG BEGIN_STMT
>>>> MEM  [(double[3] *)] = p1$0_144(D);
>>>> MEM  [(double[3] *) + 8B] = p1$1_135(D);
>>>> MEM  [(double[3] *) + 16B] = p1$2_138(D);
>>>> p1 = .DEFERRED_INIT (p1, 2);
>>>> # DEBUG D#12 => MEM  [(double[3] *)]
>>>> # DEBUG p1$0 => D#12
>>>> # DEBUG D#11 => MEM  [(double[3] *) + 8B]
>>>> # DEBUG p1$1 => D#11
>>>> # DEBUG D#10 => MEM  [(double[3] *) + 16B]
>>>> # DEBUG p1$2 => D#10
>>>> MEM  [(double[3] *)] = p2$0_109(D);
>>>> MEM  [(double[3] *) + 8B] = p2$1_111(D);
>>>> MEM  [(double[3] *) + 16B] = p2$2_254(D);
>>>> p2 = .DEFERRED_INIT (p2, 2);
>>>> # DEBUG D#9 => MEM  [(double[3] *)]
>>>> # DEBUG p2$0 => D#9
>>>> # DEBUG D#8 => MEM  [(double[3] *) + 8B]
>>>> # DEBUG p2$1 => D#8
>>>> # DEBUG D#7 => MEM  [(double[3] *) + 16B]
>>>> # DEBUG p2$2 => D#7
>>>> MEM  [(double[3] *)] = p3$0_256(D);
>>>> MEM  [(double[3] *) + 8B] = p3$1_258(D);
>>>> MEM  [(double[3] *) + 16B] = p3$2_260(D);
>>>> p3 = .DEFERRED_INIT (p3, 2);
>>>> ….
>>>> }
>>>> 
>>>> I guess that the above “MEM ….. = …” are the ones that make the
>>>> differences. Which phase introduced them?
>>> 
>>> Looks like SRA. But you can just dump all and grep for the first 
>>> occurrence. 
>> 
>> Yes, looks like that SRA is the one:
>> 
>> image.cpp.035t.esra:  MEM  [(double[3] *)] = p1$0_195(D);
>> image.cpp.035t.esra:  MEM  [(double[3] *) + 8B] = p1$1_182(D);
>> image.cpp.035t.esra:  MEM  [(double[3] *) + 16B] = p1$2_185(D);
> 
> I realise no-one was suggesting otherwise, but FWIW: SRA could easily
> be extended to handle .DEFERRED_INIT if that's the main source of
> excess stack usage.  A single .DEFERRED_INIT of an aggregate can
> be split into .DEFERRED_INITs of individual components.

Thanks a lot for the suggestion.
I will study the SRA code to see how to do this and then check whether it 
resolves the issue.
> 
> In other words, the investigation you're doing looks like the right way
> of deciding which passes are worth extending to handle .DEFERRED_INIT.
Yes, from the study so far, it looks like the major issue with the 
.DEFERRED_INIT approach is the stack size increase.
Hopefully we will be done after resolving this issue.

Qing

> 
> Thanks,
> Richard



Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init

2021-01-15 Thread Qing Zhao via Gcc-patches



> On Jan 15, 2021, at 11:22 AM, Richard Biener  wrote:
> 
> On January 15, 2021 5:16:40 PM GMT+01:00, Qing Zhao <qing.z...@oracle.com> wrote:
>> 
>> 
>>> On Jan 15, 2021, at 2:11 AM, Richard Biener wrote:
>>> 
>>> 
>>> 
>>> On Thu, 14 Jan 2021, Qing Zhao wrote:
>>> 
>>>> Hi, 
>>>> More data on code size and compilation time with CPU2017:
>>>> Compilation time data: the numbers are the slowdown against the
>>>> default “no”:
>>>> benchmarks  A/no D/no
>>>> 
>>>> 500.perlbench_r 5.19% 1.95%
>>>> 502.gcc_r 0.46% -0.23%
>>>> 505.mcf_r 0.00% 0.00%
>>>> 520.omnetpp_r 0.85% 0.00%
>>>> 523.xalancbmk_r 0.79% -0.40%
>>>> 525.x264_r -4.48% 0.00%
>>>> 531.deepsjeng_r 16.67% 16.67%
>>>> 541.leela_r  0.00%  0.00%
>>>> 557.xz_r 0.00%  0.00%
>>>> 
>>>> 507.cactuBSSN_r 1.16% 0.58%
>>>> 508.namd_r 9.62% 8.65%
>>>> 510.parest_r 0.48% 1.19%
>>>> 511.povray_r 3.70% 3.70%
>>>> 519.lbm_r 0.00% 0.00%
>>>> 521.wrf_r 0.05% 0.02%
>>>> 526.blender_r 0.33% 1.32%
>>>> 527.cam4_r -0.93% -0.93%
>>>> 538.imagick_r 1.32% 3.95%
>>>> 544.nab_r  0.00% 0.00%
>>>> From the above data, looks like that the compilation time impact
>>>> from implementation A and D are almost the same.
>>>> ***code size data: the numbers are the code size increase against the
>>>> default “no”:
>>>> benchmarks A/no D/no
>>>> 
>>>> 500.perlbench_r 2.84% 0.34%
>>>> 502.gcc_r 2.59% 0.35%
>>>> 505.mcf_r 3.55% 0.39%
>>>> 520.omnetpp_r 0.54% 0.03%
>>>> 523.xalancbmk_r 0.36%  0.39%
>>>> 525.x264_r 1.39% 0.13%
>>>> 531.deepsjeng_r 2.15% -1.12%
>>>> 541.leela_r 0.50% -0.20%
>>>> 557.xz_r 0.31% 0.13%
>>>> 
>>>> 507.cactuBSSN_r 5.00% -0.01%
>>>> 508.namd_r 3.64% -0.07%
>>>> 510.parest_r 1.12% 0.33%
>>>> 511.povray_r 4.18% 1.16%
>>>> 519.lbm_r 8.83% 6.44%
>>>> 521.wrf_r 0.08% 0.02%
>>>> 526.blender_r 1.63% 0.45%
>>>> 527.cam4_r  0.16% 0.06%
>>>> 538.imagick_r 3.18% -0.80%
>>>> 544.nab_r 5.76% -1.11%
>>>> Avg 2.52% 0.36%
>>>> From the above data, implementation D is almost always better than A.
>>>> This is surprising to me, and I am not sure what the reason is.
>>> 
>>> D probably inhibits most interesting loop transforms (check SPEC FP
>>> performance).
>> 
>> The call to .DEFERRED_INIT is marked as ECF_CONST:
>> 
>> /* A function to represent an artificial initialization to an
>>    uninitialized automatic variable. The first argument is the variable
>>    itself, the second argument is the initialization type.  */
>> DEF_INTERNAL_FN (DEFERRED_INIT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
>> 
>> So, I assume that such const call should minimize the impact to loop
>> optimizations. But yes, it will still inhibit some of the loop
>> transformations.
>> 
>>> It will also most definitely disallow SRA which, when
>>> an aggregate is not completely elided, tends to grow code.
>> 
>> Make sense to me. 
>> 
>> The run-time performance data for D and A are actually very similar as
>> I posted in the previous email (I listed it here for convenience)
>> 
>> Run-time performance overhead with A and D:
>> 
>> benchmarks        A / no   D / no
>> 
>> 500.perlbench_r    1.25%    1.25%
>> 502.gcc_r          0.68%    1.80%
>> 505.mcf_r          0.68%    0.14%
>> 520.omnetpp_r      4.83%    4.68%
>> 523.xalancbmk_r    0.18%    1.96%
>> 525.x264_r         1.55%    2.07%
>> 531.deepsjeng_r   11.57%   11.85%
>> 541.leela_r        0.64%    0.80%
>> 557.xz_r          -0.41%   -0.41%
>> 
>> 507.cactuBSSN_r    0.44%    0.44%
>> 508.namd_r         0.34%    0.34%
>> 510.parest_r       0.17%    0.25%
>> 511.povray_r      56.57%   57.27%
>> 519.lbm_r          0.00%    0.00%
>> 521.wrf_r         -0.28%   -0.37%
>> 526.blender_r     16.96%   17.71%
>> 527.cam4_r         0.70%    0.53%
>> 538.imagick_r      2.40%    2.40%
>> 544.nab_r          0.00%   -0.65%
>> 
>> avg                5.17%    5.37%
>> 
>> Especially for the

Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init

2021-01-15 Thread Qing Zhao via Gcc-patches



> On Jan 15, 2021, at 2:11 AM, Richard Biener  wrote:
> 
> 
> 
> On Thu, 14 Jan 2021, Qing Zhao wrote:
> 
>> Hi, 
>> More data on code size and compilation time with CPU2017:
>> Compilation time data:   the numbers are the slowdown against the
>> default “no”:
>> benchmarks  A/no D/no
>> 
>> 500.perlbench_r 5.19% 1.95%
>> 502.gcc_r 0.46% -0.23%
>> 505.mcf_r 0.00% 0.00%
>> 520.omnetpp_r 0.85% 0.00%
>> 523.xalancbmk_r 0.79% -0.40%
>> 525.x264_r -4.48% 0.00%
>> 531.deepsjeng_r 16.67% 16.67%
>> 541.leela_r  0.00%  0.00%
>> 557.xz_r 0.00%  0.00%
>> 
>> 507.cactuBSSN_r 1.16% 0.58%
>> 508.namd_r 9.62% 8.65%
>> 510.parest_r 0.48% 1.19%
>> 511.povray_r 3.70% 3.70%
>> 519.lbm_r 0.00% 0.00%
>> 521.wrf_r 0.05% 0.02%
>> 526.blender_r 0.33% 1.32%
>> 527.cam4_r -0.93% -0.93%
>> 538.imagick_r 1.32% 3.95%
>> 544.nab_r  0.00% 0.00%
>> From the above data, looks like that the compilation time impact
>> from implementation A and D are almost the same.
>> ***code size data: the numbers are the code size increase against the
>> default “no”:
>> benchmarks A/no D/no
>> 
>> 500.perlbench_r 2.84% 0.34%
>> 502.gcc_r 2.59% 0.35%
>> 505.mcf_r 3.55% 0.39%
>> 520.omnetpp_r 0.54% 0.03%
>> 523.xalancbmk_r 0.36%  0.39%
>> 525.x264_r 1.39% 0.13%
>> 531.deepsjeng_r 2.15% -1.12%
>> 541.leela_r 0.50% -0.20%
>> 557.xz_r 0.31% 0.13%
>> 
>> 507.cactuBSSN_r 5.00% -0.01%
>> 508.namd_r 3.64% -0.07%
>> 510.parest_r 1.12% 0.33%
>> 511.povray_r 4.18% 1.16%
>> 519.lbm_r 8.83% 6.44%
>> 521.wrf_r 0.08% 0.02%
>> 526.blender_r 1.63% 0.45%
>> 527.cam4_r  0.16% 0.06%
>> 538.imagick_r 3.18% -0.80%
>> 544.nab_r 5.76% -1.11%
>> Avg 2.52% 0.36%
>> From the above data, the implementation D is always better than A, it’s a
>> surprising to me, not sure what’s the reason for this.
> 
> D probably inhibits most interesting loop transforms (check SPEC FP
> performance).

The call to .DEFERRED_INIT is marked as ECF_CONST:

/* A function to represent an artificial initialization to an uninitialized
   automatic variable. The first argument is the variable itself, the
   second argument is the initialization type.  */
DEF_INTERNAL_FN (DEFERRED_INIT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)

So I assume that marking the call as const should minimize its impact on loop
optimizations. But yes, it will still inhibit some of the loop transformations.
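
For illustration only, here is a user-level analogy of why a const call should be cheap for the loop optimizers. This is a sketch, not GCC-internal code: it assumes GCC's `__attribute__((const))` function attribute as the source-level counterpart of ECF_CONST, and the names `scale` and `sum_scaled` are made up for the example. Because a const function reads no memory and has no side effects, a call with loop-invariant arguments can be evaluated once outside the loop.

```c
#include <assert.h>
#include <stddef.h>

/* const: the result depends only on the argument value; the function
   reads no global memory and has no side effects.  */
__attribute__((const)) static int scale(int x)
{
    return x * 3 + 1;
}

/* scale(k) does not change across iterations, so an optimizer is free
   to compute it once before the loop instead of n times.  */
long sum_scaled(const int *v, size_t n, int k)
{
    assert(v != NULL || n == 0);

    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i] + scale(k);
    return total;
}
```

Called with v = {1, 2, 3} and k = 2, sum_scaled returns 27 whether or not the call is hoisted; only the number of evaluations of scale changes.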

>  It will also most definitely disallow SRA which, when
> an aggregate is not completely elided, tends to grow code.

Makes sense to me.

The run-time performance data for D and A are actually very similar, as I posted
in the previous email (listed again here for convenience):

Run-time performance overhead with A and D:

benchmarks        A / no   D / no

500.perlbench_r    1.25%    1.25%
502.gcc_r          0.68%    1.80%
505.mcf_r          0.68%    0.14%
520.omnetpp_r      4.83%    4.68%
523.xalancbmk_r    0.18%    1.96%
525.x264_r         1.55%    2.07%
531.deepsjeng_r   11.57%   11.85%
541.leela_r        0.64%    0.80%
557.xz_r          -0.41%   -0.41%

507.cactuBSSN_r    0.44%    0.44%
508.namd_r         0.34%    0.34%
510.parest_r       0.17%    0.25%
511.povray_r      56.57%   57.27%
519.lbm_r          0.00%    0.00%
521.wrf_r         -0.28%   -0.37%
526.blender_r     16.96%   17.71%
527.cam4_r         0.70%    0.53%
538.imagick_r      2.40%    2.40%
544.nab_r          0.00%   -0.65%

avg                5.17%    5.37%

Especially for the SPEC FP benchmarks, I didn’t see too much performance 
difference between A and D. 
I guess that the RTL optimizations might be enough to get rid of most of the 
overhead introduced by the additional initialization. 

> 
>> stack usage data, I added -fstack-usage to the compilation line when
>> compiling CPU2017 benchmarks. And all the *.su files were generated for each
>> of the modules.
>> Since there a lot of such files, and the stack size information are embedded
>> in each of the files.  I just picked up one benchmark 511.povray to
>> check. Which is the one that 
>> has the most runtime overhead when adding initialization (both A and D). 
>> I identified all the *.su files that are different between A and D and do a
>> diff on those *.su files, and looks like that the stack size is much higher
>> with D than that with A, for example:
>> $ diff build_base_auto_init.D./bbox.su
>> build_base_auto_init.A./bbox.su5c5
>> < bbox.cpp:1782:12:int pov::sort_and_split(pov

Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init

2021-01-14 Thread Qing Zhao via Gcc-patches
Hi, 

More data on code size and compilation time with CPU2017:

Compilation time data: the numbers are the slowdown against the
default “no”:

benchmarks        A / no   D / no

500.perlbench_r    5.19%    1.95%
502.gcc_r          0.46%   -0.23%
505.mcf_r          0.00%    0.00%
520.omnetpp_r      0.85%    0.00%
523.xalancbmk_r    0.79%   -0.40%
525.x264_r        -4.48%    0.00%
531.deepsjeng_r   16.67%   16.67%
541.leela_r        0.00%    0.00%
557.xz_r           0.00%    0.00%

507.cactuBSSN_r    1.16%    0.58%
508.namd_r         9.62%    8.65%
510.parest_r       0.48%    1.19%
511.povray_r       3.70%    3.70%
519.lbm_r          0.00%    0.00%
521.wrf_r          0.05%    0.02%
526.blender_r      0.33%    1.32%
527.cam4_r        -0.93%   -0.93%
538.imagick_r      1.32%    3.95%
544.nab_r          0.00%    0.00%

From the above data, it looks like the compilation time impact of
implementations A and D is almost the same.

Code size data: the numbers are the code size increase against the
default “no”:

benchmarks        A / no   D / no

500.perlbench_r    2.84%    0.34%
502.gcc_r          2.59%    0.35%
505.mcf_r          3.55%    0.39%
520.omnetpp_r      0.54%    0.03%
523.xalancbmk_r    0.36%    0.39%
525.x264_r         1.39%    0.13%
531.deepsjeng_r    2.15%   -1.12%
541.leela_r        0.50%   -0.20%
557.xz_r           0.31%    0.13%

507.cactuBSSN_r    5.00%   -0.01%
508.namd_r         3.64%   -0.07%
510.parest_r       1.12%    0.33%
511.povray_r       4.18%    1.16%
519.lbm_r          8.83%    6.44%
521.wrf_r          0.08%    0.02%
526.blender_r      1.63%    0.45%
527.cam4_r         0.16%    0.06%
538.imagick_r      3.18%   -0.80%
544.nab_r          5.76%   -1.11%

Avg                2.52%    0.36%

From the above data, implementation D has a smaller code size increase than A
on almost every benchmark (523.xalancbmk_r is the only exception). This is
surprising to me, and I am not sure what the reason is.

Stack usage data: I added -fstack-usage to the compilation line when
compiling the CPU2017 benchmarks, and a *.su file was generated for each of
the modules.
Since there are a lot of such files, and the stack size information is embedded
in each of them, I just picked one benchmark, 511.povray_r, to check. It is the
one that has the most runtime overhead when adding initialization (both A and D).

I identified all the *.su files that differ between A and D and ran a diff on
them. It looks like the stack size is much higher with D than with A, for
example:

$ diff build_base_auto_init.D./bbox.su build_base_auto_init.A./bbox.su
5c5
< bbox.cpp:1782:12:int pov::sort_and_split(pov::BBOX_TREE**, pov::BBOX_TREE**&, long int*, long int, long int)  160 static
---
> bbox.cpp:1782:12:int pov::sort_and_split(pov::BBOX_TREE**, pov::BBOX_TREE**&, long int*, long int, long int)  96  static

$ diff build_base_auto_init.D./image.su build_base_auto_init.A./image.su
9c9
< image.cpp:240:6:void pov::bump_map(double*, pov::TNORMAL*, double*)   624 static
---
> image.cpp:240:6:void pov::bump_map(double*, pov::TNORMAL*, double*)   272 static
….

It looks like implementation D has more stack size impact than A.
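
To compare many *.su files without reading each diff by hand, something like the following could pull the byte count out of each line. This is only a sketch: it assumes the layout seen in the lines above, where the stack size is the second-to-last whitespace-separated field and the qualifier (e.g. static) is the last one. `su_stack_bytes` and `su_growth_percent` are hypothetical helper names.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Return the stack size in bytes from one line of a .su file, or -1 if
   the line does not end in "<digits> <qualifier>".  The function
   signature at the front may itself contain spaces, so the line is
   scanned from the end.  */
long su_stack_bytes(const char *line)
{
    assert(line != NULL);

    size_t end = strlen(line);
    /* Drop trailing whitespace (newline included).  */
    while (end > 0 && isspace((unsigned char)line[end - 1]))
        end--;
    /* Skip the qualifier token.  */
    while (end > 0 && !isspace((unsigned char)line[end - 1]))
        end--;
    /* Skip the whitespace between the size and the qualifier.  */
    while (end > 0 && isspace((unsigned char)line[end - 1]))
        end--;
    /* Walk back over the digits of the size field.  */
    size_t start = end;
    while (start > 0 && isdigit((unsigned char)line[start - 1]))
        start--;
    if (start == end)
        return -1;
    return strtol(line + start, NULL, 10);
}

/* Percentage by which OTHER exceeds BASE.  */
double su_growth_percent(long base, long other)
{
    assert(base > 0);
    return 100.0 * (double)(other - base) / (double)base;
}
```

For the two bump_map lines above this gives 624 and 272 bytes, i.e. D uses roughly 129% more stack than A for that function; for sort_and_split (160 vs. 96 bytes) the increase is roughly 67%.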

Do you have any insight into the reason for this?

Let me know if you have any comments and suggestions.

thanks.

Qing
> On Jan 13, 2021, at 1:39 AM, Richard Biener  wrote:
> 
> On Tue, 12 Jan 2021, Qing Zhao wrote:
> 
>> Hi, 
>> 
>> Just check in to see whether you have any comments and suggestions on this:
>> 
>> FYI, I have been continue with Approach D implementation since last week:
>> 
>> D. Adding calls to .DEFERRED_INIT during gimplification, expand the
>> .DEFERRED_INIT during expand to
>> real initialization. Adjusting uninitialized pass with the new refs with
>> “.DEFERRED_INIT”.
>> 
>> For the remaining work of Approach D:
>> 
>> ** complete the implementation of -ftrivial-auto-var-init=pattern;
>> ** complete the implementation of uninitialized warnings maintenance work 
>> for D. 
>> 
>> I have completed the uninitialized warnings maintenance work for D.
>> And finished partial of the -ftrivial-auto-var-init=pattern implementation. 
>> 
>> The following are remaining work of Approach D:
>> 
>>   ** -ftrivial-auto-var-init=pattern for VLA;
>>   ** add a new attribute for variable:
>> __attribute__((uninitialized))
>> the marked variable is uninitialized intentionally for performance purpose.
>>   ** adding complete testing cases;
>> 
>> 
>> Please let me know if you have any objection on my current decisi

Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init

2021-01-13 Thread Qing Zhao via Gcc-patches



> On Jan 13, 2021, at 9:10 AM, Richard Biener  wrote:
> 
> On Wed, 13 Jan 2021, Qing Zhao wrote:
> 
>> 
>> 
>>> On Jan 13, 2021, at 1:39 AM, Richard Biener  wrote:
>>> 
>>> On Tue, 12 Jan 2021, Qing Zhao wrote:
>>> 
>>>> Hi, 
>>>> 
>>>> Just check in to see whether you have any comments and suggestions on this:
>>>> 
>>>> FYI, I have been continue with Approach D implementation since last week:
>>>> 
>>>> D. Adding calls to .DEFERRED_INIT during gimplification, expand the
>>>> .DEFERRED_INIT during expand to
>>>> real initialization. Adjusting uninitialized pass with the new refs with
>>>> “.DEFERRED_INIT”.
>>>> 
>>>> For the remaining work of Approach D:
>>>> 
>>>> ** complete the implementation of -ftrivial-auto-var-init=pattern;
>>>> ** complete the implementation of uninitialized warnings maintenance work 
>>>> for D. 
>>>> 
>>>> I have completed the uninitialized warnings maintenance work for D.
>>>> And finished partial of the -ftrivial-auto-var-init=pattern 
>>>> implementation. 
>>>> 
>>>> The following are remaining work of Approach D:
>>>> 
>>>>  ** -ftrivial-auto-var-init=pattern for VLA;
>>>>  ** add a new attribute for variable:
>>>> __attribute__((uninitialized))
>>>> the marked variable is uninitialized intentionally for performance purpose.
>>>>  ** adding complete testing cases;
>>>> 
>>>> 
>>>> Please let me know if you have any objection on my current decision on 
>>>> implementing approach D. 
>>> 
>>> Did you do any analysis on how stack usage and code size are changed 
>>> with approach D?
>> 
>> I did the code size change comparison (I will provide the data in another 
>> email). And with this data, D works better than A in general. (This is 
>> surprise to me actually).
>> 
>> But not the stack usage.  Not sure how to collect the stack usage data, 
>> do you have any suggestion on this?
> 
> There is -fstack-usage you could use, then of course watching
> the stack segment at runtime.

I can do this for CPU2017 to collect the stack usage data and report back.

>  I'm mostly concerned about
> stack-limited "processes" such as the linux kernel which I think
> is a primary target of your work.

I don’t have any experience building the Linux kernel.
Do we have to collect data for the Linux kernel at this time? Is the CPU2017
data not enough?

Qing
> 
> Richard.
> 
>> 
>>> How does compile-time behave (we could gobble up
>>> lots of .DEFERRED_INIT calls I guess)?
>> I can collect this data too and report it later.
>> 
>> Thanks.
>> 
>> Qing
>>> 
>>> Richard.
>>> 
>>>> Thanks a lot for your help.
>>>> 
>>>> Qing
>>>> 
>>>> 
>>>>> On Jan 5, 2021, at 1:05 PM, Qing Zhao via Gcc-patches 
>>>>>  wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> This is an update for our previous discussion. 
>>>>> 
>>>>> 1. I implemented the following two different implementations in the 
>>>>> latest upstream gcc:
>>>>> 
>>>>> A. Adding real initialization during gimplification, not maintain the 
>>>>> uninitialized warnings.
>>>>> 
>>>>> D. Adding calls to .DEFERRED_INIT during gimplification, expand the
>>>>> .DEFERRED_INIT during expand to
>>>>> real initialization. Adjusting uninitialized pass with the new refs with
>>>>> “.DEFERRED_INIT”.
>>>>> 
>>>>> Note, in this initial implementation,
>>>>>   ** I ONLY implement -ftrivial-auto-var-init=zero, the implementation of 
>>>>> -ftrivial-auto-var-init=pattern 
>>>>>  is not done yet.  Therefore, the performance data is only about 
>>>>> -ftrivial-auto-var-init=zero. 
>>>>> 
>>>>>   ** I added an temporary  option -fauto-var-init-approach=A|B|C|D  to 
>>>>> choose implementation A or D for 
>>>>>  runtime performance study.
>>>>>   ** I didn’t finish the uninitialized warnings maintenance work for D. 
>>>>> (That might take more time than I expected). 
>>>>> 
>>>>> 2. I collected runtime data for CPU2017 on a x86 machine with th

Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init

2021-01-13 Thread Qing Zhao via Gcc-patches



> On Jan 13, 2021, at 1:39 AM, Richard Biener  wrote:
> 
> On Tue, 12 Jan 2021, Qing Zhao wrote:
> 
>> Hi, 
>> 
>> Just check in to see whether you have any comments and suggestions on this:
>> 
>> FYI, I have been continue with Approach D implementation since last week:
>> 
>> D. Adding calls to .DEFERRED_INIT during gimplification, expand the
>> .DEFERRED_INIT during expand to
>> real initialization. Adjusting uninitialized pass with the new refs with
>> “.DEFERRED_INIT”.
>> 
>> For the remaining work of Approach D:
>> 
>> ** complete the implementation of -ftrivial-auto-var-init=pattern;
>> ** complete the implementation of uninitialized warnings maintenance work 
>> for D. 
>> 
>> I have completed the uninitialized warnings maintenance work for D.
>> And finished partial of the -ftrivial-auto-var-init=pattern implementation. 
>> 
>> The following are remaining work of Approach D:
>> 
>>   ** -ftrivial-auto-var-init=pattern for VLA;
>>   ** add a new attribute for variable:
>> __attribute__((uninitialized))
>> the marked variable is uninitialized intentionally for performance purpose.
>>   ** adding complete testing cases;
>> 
>> 
>> Please let me know if you have any objection on my current decision on 
>> implementing approach D. 
> 
> Did you do any analysis on how stack usage and code size are changed 
> with approach D?

I did the code size comparison (I will provide the data in another email).
With this data, D works better than A in general. (This is actually a surprise
to me.)

But I have not looked at the stack usage. I am not sure how to collect the
stack usage data; do you have any suggestions?


> How does compile-time behave (we could gobble up
> lots of .DEFERRED_INIT calls I guess)?
I can collect this data too and report it later.

Thanks.

Qing
> 
> Richard.
> 
>> Thanks a lot for your help.
>> 
>> Qing
>> 
>> 
>>> On Jan 5, 2021, at 1:05 PM, Qing Zhao via Gcc-patches 
>>>  wrote:
>>> 
>>> Hi,
>>> 
>>> This is an update for our previous discussion. 
>>> 
>>> 1. I implemented the following two different implementations in the latest 
>>> upstream gcc:
>>> 
>>> A. Adding real initialization during gimplification, not maintain the 
>>> uninitialized warnings.
>>> 
>>> D. Adding calls to .DEFERRED_INIT during gimplification, expand the
>>> .DEFERRED_INIT during expand to
>>> real initialization. Adjusting uninitialized pass with the new refs with
>>> “.DEFERRED_INIT”.
>>> 
>>> Note, in this initial implementation,
>>> ** I ONLY implement -ftrivial-auto-var-init=zero, the implementation of 
>>> -ftrivial-auto-var-init=pattern 
>>>is not done yet.  Therefore, the performance data is only about 
>>> -ftrivial-auto-var-init=zero. 
>>> 
>>> ** I added an temporary  option -fauto-var-init-approach=A|B|C|D  to 
>>> choose implementation A or D for 
>>>runtime performance study.
>>> ** I didn’t finish the uninitialized warnings maintenance work for D. 
>>> (That might take more time than I expected). 
>>> 
>>> 2. I collected runtime data for CPU2017 on a x86 machine with this new gcc 
>>> for the following 3 cases:
>>> 
>>> no: default. (-g -O2 -march=native )
>>> A:  default +  -ftrivial-auto-var-init=zero -fauto-var-init-approach=A 
>>> D:  default +  -ftrivial-auto-var-init=zero -fauto-var-init-approach=D 
>>> 
>>> And then compute the slowdown data for both A and D as following:
>>> 
>>> benchmarks  A / no  D /no
>>> 
>>> 500.perlbench_r 1.25%   1.25%
>>> 502.gcc_r   0.68%   1.80%
>>> 505.mcf_r   0.68%   0.14%
>>> 520.omnetpp_r   4.83%   4.68%
>>> 523.xalancbmk_r 0.18%   1.96%
>>> 525.x264_r  1.55%   2.07%
>>> 531.deepsjeng_  11.57%  11.85%
>>> 541.leela_r 0.64%   0.80%
>>> 557.xz_  -0.41% -0.41%
>>> 
>>> 507.cactuBSSN_r 0.44%   0.44%
>>> 508.namd_r  0.34%   0.34%
>>> 510.parest_r0.17%   0.25%
>>> 511.povray_r56.57%  57.27%
>>> 519.lbm_r   0.00%   0.00%
>>> 521.wrf_r-0.28% -0.37%
>>> 526.blender_r   16.96%  17.71%
>>> 527.cam4_r  0.70%   0.53%
>>> 538.imagick_r   2.40%   2.40%
>>> 544.nab_r   

Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init

2021-01-12 Thread Qing Zhao via Gcc-patches
Hi, 

Just checking in to see whether you have any comments or suggestions on this.

FYI, I have been continuing with the Approach D implementation since last week:

D. Adding calls to .DEFERRED_INIT during gimplification, expand the
.DEFERRED_INIT during expand to
real initialization. Adjusting uninitialized pass with the new refs with
“.DEFERRED_INIT”.

The remaining work for Approach D was:

 ** complete the implementation of -ftrivial-auto-var-init=pattern;
 ** complete the implementation of uninitialized warnings maintenance work for
D.

I have completed the uninitialized warnings maintenance work for D,
and finished part of the -ftrivial-auto-var-init=pattern implementation.

The following work remains for Approach D:

   ** -ftrivial-auto-var-init=pattern for VLA;
   ** add a new attribute for variables:
      __attribute__((uninitialized))
      the marked variable is intentionally left uninitialized for performance
      purposes.
   ** add complete test cases.
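
As an illustration of how such an attribute might be used (a sketch only: it assumes the attribute ends up spelled `__attribute__((uninitialized))` as proposed above, and `checksum` / `SCRATCH_LEN` are made-up names). A large scratch buffer that is fully written before it is read is the kind of variable one would exclude from automatic initialization. A compiler that does not know the attribute will typically just warn and ignore it.

```c
#include <assert.h>
#include <stddef.h>

#define SCRATCH_LEN 4096

/* Sum the bytes of SRC after XOR-ing each with 0x5a, going through a
   local scratch buffer.  */
unsigned checksum(const unsigned char *src, size_t n)
{
    assert(n <= SCRATCH_LEN);

    /* Opt out of -ftrivial-auto-var-init for this buffer: every byte
       that is read below has been written first, so filling all 4096
       bytes on each call would be pure overhead.  */
    unsigned char scratch[SCRATCH_LEN] __attribute__((uninitialized));
    unsigned sum = 0;

    for (size_t i = 0; i < n; i++)
        scratch[i] = (unsigned char)(src[i] ^ 0x5a);
    for (size_t i = 0; i < n; i++)
        sum += scratch[i];
    return sum;
}
```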
  

Please let me know if you have any objection to my current decision to
implement approach D.

Thanks a lot for your help.

Qing


> On Jan 5, 2021, at 1:05 PM, Qing Zhao via Gcc-patches 
>  wrote:
> 
> Hi,
> 
> This is an update for our previous discussion. 
> 
> 1. I implemented the following two different implementations in the latest 
> upstream gcc:
> 
> A. Adding real initialization during gimplification, not maintain the 
> uninitialized warnings.
> 
> D. Adding calls to .DEFERRED_INIT during gimplification, expand the
> .DEFERRED_INIT during expand to
> real initialization. Adjusting uninitialized pass with the new refs with
> “.DEFERRED_INIT”.
> 
> Note, in this initial implementation,
>   ** I ONLY implement -ftrivial-auto-var-init=zero, the implementation of 
> -ftrivial-auto-var-init=pattern 
>  is not done yet.  Therefore, the performance data is only about 
> -ftrivial-auto-var-init=zero. 
> 
>   ** I added an temporary  option -fauto-var-init-approach=A|B|C|D  to 
> choose implementation A or D for 
>  runtime performance study.
>   ** I didn’t finish the uninitialized warnings maintenance work for D. 
> (That might take more time than I expected). 
> 
> 2. I collected runtime data for CPU2017 on a x86 machine with this new gcc 
> for the following 3 cases:
> 
> no: default. (-g -O2 -march=native )
> A:  default +  -ftrivial-auto-var-init=zero -fauto-var-init-approach=A 
> D:  default +  -ftrivial-auto-var-init=zero -fauto-var-init-approach=D 
> 
> And then compute the slowdown data for both A and D as following:
> 
> benchmarks        A / no   D / no
> 
> 500.perlbench_r    1.25%    1.25%
> 502.gcc_r          0.68%    1.80%
> 505.mcf_r          0.68%    0.14%
> 520.omnetpp_r      4.83%    4.68%
> 523.xalancbmk_r    0.18%    1.96%
> 525.x264_r         1.55%    2.07%
> 531.deepsjeng_r   11.57%   11.85%
> 541.leela_r        0.64%    0.80%
> 557.xz_r          -0.41%   -0.41%
> 
> 507.cactuBSSN_r    0.44%    0.44%
> 508.namd_r         0.34%    0.34%
> 510.parest_r       0.17%    0.25%
> 511.povray_r      56.57%   57.27%
> 519.lbm_r          0.00%    0.00%
> 521.wrf_r         -0.28%   -0.37%
> 526.blender_r     16.96%   17.71%
> 527.cam4_r         0.70%    0.53%
> 538.imagick_r      2.40%    2.40%
> 544.nab_r          0.00%   -0.65%
> 
> avg                5.17%    5.37%
> 
> From the above data, we can see that in general, the runtime performance 
> slowdown for 
> implementation A and D are similar for individual benchmarks.
> 
> There are several benchmarks that have significant slowdown with the new 
> added initialization for both
> A and D, for example, 511.povray_r, 526.blender_, and 531.deepsjeng_r, I will 
> try to study a little bit
> more on what kind of new initializations introduced such slowdown. 
> 
> From the current study so far, I think that approach D should be good enough 
> for our final implementation. 
> So, I will try to finish approach D with the following remaining work
> 
>  ** complete the implementation of -ftrivial-auto-var-init=pattern;
>  ** complete the implementation of uninitialized warnings maintenance 
> work for D. 
> 
> 
> Let me know if you have any comments and suggestions on my current and future 
> work.
> 
> Thanks a lot for your help.
> 
> Qing
> 
>> On Dec 9, 2020, at 10:18 AM, Qing Zhao via Gcc-patches 
>>  wrote:
>> 
>> The following are the approaches I will implement and compare:
>> 
>> Our final goal is to keep the uninitialized warning and minimize the 
>> run-time performance cost.
>> 
>> A. Adding real initialization during gimplification, not mainta

Pointer width in GCC?

2021-01-08 Thread Qing Zhao via Gcc-patches
Hi,

Is there a utility routine in GCC to query the pointer width of the current
target, i.e. whether the target uses 32-bit or 64-bit pointers?

Thanks a lot for the help.

Qing

Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init

2021-01-05 Thread Qing Zhao via Gcc-patches
I am attaching my current (incomplete) patch to gcc for your reference.

From a71eb73bee5857440c4ff67c4c82be115e0675cb Mon Sep 17 00:00:00 2001
From: qing zhao 
Date: Sat, 12 Dec 2020 00:02:28 +0100
Subject: [PATCH] First version of -ftrivial-auto-var-init

---
 gcc/common.opt            | 35 ++
 gcc/flag-types.h          | 14
 gcc/gimple-pretty-print.c |  2 +-
 gcc/gimplify.c            | 90 +++
 gcc/internal-fn.c         | 20 +++
 gcc/internal-fn.def       |  5 +++
 gcc/tree-cfg.c            |  3 ++
 gcc/tree-ssa-uninit.c     |  3 ++
 gcc/tree-ssa.c            |  5 +++
 9 files changed, 176 insertions(+), 1 deletion(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 6645539f5e5..c4c4fc28ef7 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -3053,6 +3053,41 @@ ftree-scev-cprop
 Common Report Var(flag_tree_scev_cprop) Init(1) Optimization
 Enable copy propagation of scalar-evolution information.
 
+ftrivial-auto-var-init=
+Common Joined RejectNegative Enum(auto_init_type) Var(flag_trivial_auto_var_init) Init(AUTO_INIT_UNINITIALIZED)
+-ftrivial-auto-var-init=[uninitialized|pattern|zero]   Add initializations to automatic variables.
+
+Enum
+Name(auto_init_type) Type(enum auto_init_type) UnknownError(unrecognized automatic variable initialization type %qs)
+
+EnumValue
+Enum(auto_init_type) String(uninitialized) Value(AUTO_INIT_UNINITIALIZED)
+
+EnumValue
+Enum(auto_init_type) String(pattern) Value(AUTO_INIT_PATTERN)
+
+EnumValue
+Enum(auto_init_type) String(zero) Value(AUTO_INIT_ZERO)
+
+fauto-var-init-approach=
+Common Joined RejectNegative Enum(auto_init_approach) Var(flag_auto_init_approach) Init(AUTO_INIT_A))
+-fauto-var-init-approach=[A|B|C|D] Choose the approach to initialize automatic variables.
+
+Enum
+Name(auto_init_approach) Type(enum auto_init_approach) UnknownError(unrecognized automatic variable initialization approach %qs)
+
+EnumValue
+Enum(auto_init_approach) String(A) Value(AUTO_INIT_A)
+
+EnumValue
+Enum(auto_init_approach) String(B) Value(AUTO_INIT_B)
+
+EnumValue
+Enum(auto_init_approach) String(C) Value(AUTO_INIT_C)
+
+EnumValue
+Enum(auto_init_approach) String(D) Value(AUTO_INIT_D)
+
 ; -fverbose-asm causes extra commentary information to be produced in
 ; the generated assembly code (to make it more readable).  This option
 ; is generally only of use to those who actually need to read the
diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index 9342bd87be3..bfd0692b82c 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -242,6 +242,20 @@ enum vect_cost_model {
   VECT_COST_MODEL_DEFAULT = 1
 };
 
+/* Automatic variable initialization type.  */
+enum auto_init_type {
+  AUTO_INIT_UNINITIALIZED = 0,
+  AUTO_INIT_PATTERN = 1,
+  AUTO_INIT_ZERO = 2
+};
+
+enum auto_init_approach {
+  AUTO_INIT_A = 0,
+  AUTO_INIT_B = 1,
+  AUTO_INIT_C = 2,
+  AUTO_INIT_D = 3
+};
+
 /* Different instrumentation modes.  */
 enum sanitize_code {
   /* AddressSanitizer.  */
diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index 075d6e5208a..1044d54e8d3 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -81,7 +81,7 @@ newline_and_indent (pretty_printer *buffer, int spc)
 DEBUG_FUNCTION void
 debug_gimple_stmt (gimple *gs)
 {
-  print_gimple_stmt (stderr, gs, 0, TDF_VOPS|TDF_MEMSYMS);
+  print_gimple_stmt (stderr, gs, 0, TDF_VOPS|TDF_MEMSYMS|TDF_LINENO|TDF_ALIAS);
 }
 
 
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 54cb66bd1dd..1eb0747ea2f 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -1674,6 +1674,16 @@ gimplify_return_expr (tree stmt, gimple_seq *pre_p)
   return GS_ALL_DONE;
 }
 
+/* Return the value that is used to initialize the vla DECL based 
+   on INIT_TYPE.  */
+tree memset_init_node (enum auto_init_type init_type)
+{
+  if (init_type == AUTO_INIT_ZERO)
+return integer_zero_node;
+  else
+gcc_assert (0);
+}
+
 /* Gimplify a variable-length array DECL.  */
 
 static void
@@ -1712,6 +1722,19 @@ gimplify_vla_decl (tree decl, gimple_seq *seq_p)
 
   gimplify_and_add (t, seq_p);
 
+  /* Add a call to memset to initialize this vla when the user requested.  */
+  if (flag_trivial_auto_var_init > AUTO_INIT_UNINITIALIZED
+  && !DECL_ARTIFICIAL (decl)
+  && VAR_P (decl) 
+  && !DECL_EXTERNAL (decl) 
+  && !TREE_STATIC (decl))
+  {
+t = builtin_decl_implicit (BUILT_IN_MEMSET);
+tree init_node = memset_init_node (flag_trivial_auto_var_init);
+t = build_call_expr (t, 3, addr, init_node, DECL_SIZE_UNIT (decl)); 
+gimplify_and_add (t, seq_p);
+  }
+
   /* Record the dynamic allocation associated with DECL if requested.  */
   if (flag_callgraph_info & CALLGRAPH_INFO_DYNAMIC_ALLOC)
 record_dynamic_alloc (decl);
@@ -1734,6 +1757,63 @@ force_labels_r (tree *tp, int *walk_subtrees, void *data ATTRIBUTE_UNUSED)
   return NULL_TREE;
 }
 
+
+/* Build a call to internal const functio

The performance data for two different implementation of new security feature -ftrivial-auto-var-init

2021-01-05 Thread Qing Zhao via Gcc-patches
Hi,

This is an update for our previous discussion. 

1. I implemented the following two approaches in the latest upstream gcc:

A. Adding real initialization during gimplification, without maintaining the
uninitialized warnings.

D. Adding calls to .DEFERRED_INIT during gimplification, and expanding the
.DEFERRED_INIT calls into real initialization during expand. Adjusting the
uninitialized pass to handle the new references to “.DEFERRED_INIT”.
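
To make the intended effect concrete, here is a source-level sketch of what -ftrivial-auto-var-init=zero is meant to achieve under either approach. It is an illustration, not compiler output: `pick_as_if_zero_init` is a made-up name, and the function simply writes by hand the zero store that the option would add. With A the store is inserted directly at gimplification; with D it is represented by a .DEFERRED_INIT call until expand, so the uninitialized-use analysis can still see that the source itself never assigned x on that path.

```c
#include <assert.h>

/* The user's original code, shown only as a comment because reading x
   when flag == 0 is an uninitialized read:

       int pick (int flag)
       {
         int x;
         if (flag)
           x = 42;
         return x;
       }
*/

/* The behaviour that zero-initialization aims for, spelled out by hand.  */
int pick_as_if_zero_init(int flag)
{
    assert(flag == 0 || flag == 1);

    int x = 0;          /* stands in for the compiler-added initialization */
    if (flag)
        x = 42;
    return x;
}
```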

Note that in this initial implementation:
** I ONLY implemented -ftrivial-auto-var-init=zero; the implementation of
   -ftrivial-auto-var-init=pattern is not done yet. Therefore, the performance
   data is only about -ftrivial-auto-var-init=zero.

** I added a temporary option -fauto-var-init-approach=A|B|C|D to choose
   implementation A or D for the runtime performance study.
** I didn’t finish the uninitialized warnings maintenance work for D.
   (That might take more time than I expected.)

2. I collected runtime data for CPU2017 on an x86 machine with this new gcc for
the following 3 cases:

no: default. (-g -O2 -march=native )
A:  default +  -ftrivial-auto-var-init=zero -fauto-var-init-approach=A
D:  default +  -ftrivial-auto-var-init=zero -fauto-var-init-approach=D

I then computed the slowdown for both A and D as follows:

benchmarks        A / no   D / no

500.perlbench_r    1.25%    1.25%
502.gcc_r          0.68%    1.80%
505.mcf_r          0.68%    0.14%
520.omnetpp_r      4.83%    4.68%
523.xalancbmk_r    0.18%    1.96%
525.x264_r         1.55%    2.07%
531.deepsjeng_r   11.57%   11.85%
541.leela_r        0.64%    0.80%
557.xz_r          -0.41%   -0.41%

507.cactuBSSN_r    0.44%    0.44%
508.namd_r         0.34%    0.34%
510.parest_r       0.17%    0.25%
511.povray_r      56.57%   57.27%
519.lbm_r          0.00%    0.00%
521.wrf_r         -0.28%   -0.37%
526.blender_r     16.96%   17.71%
527.cam4_r         0.70%    0.53%
538.imagick_r      2.40%    2.40%
544.nab_r          0.00%   -0.65%

avg                5.17%    5.37%

From the above data, we can see that, in general, the runtime slowdowns of
implementations A and D are similar for individual benchmarks.
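
The “avg” row appears to be the plain arithmetic mean of the 19 per-benchmark slowdowns (an assumption checked against the numbers in the table, not something stated explicitly). A small sketch that recomputes it, with the values copied from the table; `mean_percent` and the array names are made up for this example:

```c
#include <assert.h>
#include <stddef.h>

/* Per-benchmark slowdowns in percent, in table order
   (9 integer benchmarks followed by 10 floating-point ones).  */
const double slowdown_a[] = {
    1.25, 0.68, 0.68, 4.83, 0.18, 1.55, 11.57, 0.64, -0.41,
    0.44, 0.34, 0.17, 56.57, 0.00, -0.28, 16.96, 0.70, 2.40, 0.00
};
const double slowdown_d[] = {
    1.25, 1.80, 0.14, 4.68, 1.96, 2.07, 11.85, 0.80, -0.41,
    0.44, 0.34, 0.25, 57.27, 0.00, -0.37, 17.71, 0.53, 2.40, -0.65
};

/* Arithmetic mean of N percentages.  */
double mean_percent(const double *v, size_t n)
{
    assert(v != NULL && n > 0);

    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}
```

Over the 19 entries this gives about 5.17 for A and 5.37 for D, matching the avg row. Note that a plain mean is dominated by the few large outliers (511.povray_r, 526.blender_r, 531.deepsjeng_r).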

Several benchmarks show a significant slowdown from the newly added
initialization with both A and D, for example 511.povray_r, 526.blender_r, and
531.deepsjeng_r. I will study a bit more what kind of new initializations
introduced this slowdown.

From the study so far, I think that approach D should be good enough
for our final implementation.
So, I will try to finish approach D with the following remaining work:

  ** complete the implementation of -ftrivial-auto-var-init=pattern;
  ** complete the implementation of uninitialized warnings maintenance work
for D.


Let me know if you have any comments and suggestions on my current and future 
work.

Thanks a lot for your help.

Qing

> On Dec 9, 2020, at 10:18 AM, Qing Zhao via Gcc-patches 
>  wrote:
> 
> The following are the approaches I will implement and compare:
> 
> Our final goal is to keep the uninitialized warning and minimize the run-time 
> performance cost.
> 
> A. Adding real initialization during gimplification, not maintain the 
> uninitialized warnings.
> B. Adding real initialization during gimplification, marking them with 
> “artificial_init”. 
> Adjusting uninitialized pass, maintaining the annotation, making sure the 
> real init not
> Deleted from the fake init. 
> C.  Marking the DECL for an uninitialized auto variable as “no_explicit_init” 
> during gimplification,
>  maintain this “no_explicit_init” bit till after 
> pass_late_warn_uninitialized, or till pass_expand, 
>  add real initialization for all DECLs that are marked with 
> “no_explicit_init”.
> D. Adding .DEFERRED_INIT during gimplification, expand the .DEFERRED_INIT
> during expand to
> real initialization. Adjusting uninitialized pass with the new refs with
> “.DEFERRED_INIT”.
> 
> 
> In the above, approach A will be the one that has the minimum run-time cost
> and will be the baseline for the performance comparison.
> 
> I will implement approach D next. It is expected to have the most run-time
> overhead in the above list, but its implementation should be the cleanest
> among B, C, and D. Let’s see how much more performance overhead this approach
> adds. If the data is good, maybe we can avoid the effort to implement B
> and C.
> 
> If the performance of D is not good, I will implement B or C at that time.
> 
> Let me know if you have any comment or suggestions.
> 
> Thanks.
> 
> Qing



Re: How to traverse all the local variables that declared in the current routine?

2020-12-09 Thread Qing Zhao via Gcc-patches



> On Dec 9, 2020, at 9:12 AM, Richard Biener  wrote:
> 
> On Wed, Dec 9, 2020 at 4:04 PM Qing Zhao <qing.z...@oracle.com> wrote:
>> 
>> 
>> 
>> On Dec 9, 2020, at 2:23 AM, Richard Biener  
>> wrote:
>> 
>> On Tue, Dec 8, 2020 at 8:54 PM Qing Zhao  wrote:
>> 
>> 
>> 
>> 
>> On Dec 8, 2020, at 1:40 AM, Richard Biener  
>> wrote:
>> 
>> On Mon, Dec 7, 2020 at 5:20 PM Qing Zhao  wrote:
>> 
>> 
>> 
>> 
>> On Dec 7, 2020, at 1:12 AM, Richard Biener  
>> wrote:
>> 
>> On Fri, Dec 4, 2020 at 5:19 PM Qing Zhao  wrote:
>> 
>> 
>> 
>> 
>> On Dec 4, 2020, at 2:50 AM, Richard Biener  
>> wrote:
>> 
>> On Thu, Dec 3, 2020 at 6:33 PM Richard Sandiford
>>  wrote:
>> 
>> 
>> Richard Biener via Gcc-patches  writes:
>> 
>> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao  wrote:
>> 
>> Another issue is, in order to check whether an auto-variable has 
>> initializer, I plan to add a new bit in “decl_common” as:
>> /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>> unsigned decl_is_initialized :1;
>> 
>> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
>> #define DECL_IS_INITIALIZED(NODE) \
>> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>> 
>> set this bit when setting DECL_INITIAL for the variables in FE. then keep it
>> even though DECL_INITIAL might be NULLed.
>> 
>> 
>> For locals it would be more reliable to set this flag during gimplification.
>> 
>> Do you have any comment and suggestions?
>> 
>> 
>> As said above - do you want to cover registers as well as locals?  I'd do
>> the actual zeroing during RTL expansion instead since otherwise you
>> have to figure youself whether a local is actually used (see 
>> expand_stack_vars)
>> 
>> Note that optimization will already made have use of "uninitialized" state
>> of locals so depending on what the actual goal is here "late" may be too 
>> late.
>> 
>> 
>> Haven't thought about this much, so it might be a daft idea, but would a
>> compromise be to use a const internal function:
>> 
>> X1 = .DEFERRED_INIT (X0, INIT)
>> 
>> where the X0 argument is an uninitialised value and the INIT argument
>> describes the initialisation pattern?  So for a decl we'd have:
>> 
>> X = .DEFERRED_INIT (X, INIT)
>> 
>> and for an SSA name we'd have:
>> 
>> X_2 = .DEFERRED_INIT (X_1(D), INIT)
>> 
>> with all other uses of X_1(D) being replaced by X_2.  The idea is that:
>> 
>> * Having the X0 argument would keep the uninitialised use of the
>> variable around for the later warning passes.
>> 
>> * Using a const function should still allow the UB to be deleted as dead
>> if X1 isn't needed.
>> 
>> * Having a function in the way should stop passes from taking advantage
>> of direct uninitialised uses for optimisation.
>> 
>> This means we won't be able to optimise based on the actual init
>> value at the gimple level, but that seems like a fair trade-off.
>> AIUI this is really a security feature or anti-UB hardening feature
>> (in the sense that users are more likely to see predictable behaviour
>> “in the field” even if the program has UB).
>> 
>> 
>> The question is whether it's in line of peoples expectation that
>> explicitely zero-initialized code behaves differently from
>> implicitely zero-initialized code with respect to optimization
>> and secondary side-effects (late diagnostics, latent bugs, etc.).
>> 
>> Introducing a new concept like .DEFERRED_INIT is much more
>> heavy-weight than an explicit zero initializer.
>> 
>> 
>> What exactly you mean by “heavy-weight”? More difficult to implement or much 
>> more run-time overhead or both? Or something else?
>> 
>> The major benefit of the approach of “.DEFERRED_INIT”  is to enable us keep 
>> the current -Wuninitialized analysis untouched and also pass
>> the “uninitialized” info from source code level to “pass_expand”.
>> 
>> 
>> Well, "untouched" is a bit oversimplified.  You do need to handle
>> .DEFERRED_INIT as not
>> being an initialization which will definitely get interesting.
>> 
>> 
>> Yes, during uninitialized variable analysis pass, we should specially handle 
>> the defs with “.DEFERRED_INIT”, to treat them as uninitializations.
>> 
>> If we want to keep the current -Wuninitialized an

Re: How to traverse all the local variables that declared in the current routine?

2020-12-09 Thread Qing Zhao via Gcc-patches



> On Dec 9, 2020, at 2:23 AM, Richard Biener  wrote:
> 
> On Tue, Dec 8, 2020 at 8:54 PM Qing Zhao <qing.z...@oracle.com> wrote:
>> 
>> 
>> 
>> On Dec 8, 2020, at 1:40 AM, Richard Biener <richard.guent...@gmail.com> wrote:
>> 
>> On Mon, Dec 7, 2020 at 5:20 PM Qing Zhao <qing.z...@oracle.com> wrote:
>> 
>> 
>> 
>> 
>> On Dec 7, 2020, at 1:12 AM, Richard Biener  
>> wrote:
>> 
>> On Fri, Dec 4, 2020 at 5:19 PM Qing Zhao  wrote:
>> 
>> 
>> 
>> 
>> On Dec 4, 2020, at 2:50 AM, Richard Biener  
>> wrote:
>> 
>> On Thu, Dec 3, 2020 at 6:33 PM Richard Sandiford
>>  wrote:
>> 
>> 
>> Richard Biener via Gcc-patches  writes:
>> 
>> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao  wrote:
>> 
>> Another issue is, in order to check whether an auto-variable has 
>> initializer, I plan to add a new bit in “decl_common” as:
>> /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>> unsigned decl_is_initialized :1;
>> 
>> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
>> #define DECL_IS_INITIALIZED(NODE) \
>> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>> 
>> set this bit when setting DECL_INITIAL for the variables in FE. then keep it
>> even though DECL_INITIAL might be NULLed.
>> 
>> 
>> For locals it would be more reliable to set this flag during gimplification.
>> 
>> Do you have any comment and suggestions?
>> 
>> 
>> As said above - do you want to cover registers as well as locals?  I'd do
>> the actual zeroing during RTL expansion instead since otherwise you
>> have to figure youself whether a local is actually used (see 
>> expand_stack_vars)
>> 
>> Note that optimization will already made have use of "uninitialized" state
>> of locals so depending on what the actual goal is here "late" may be too 
>> late.
>> 
>> 
>> Haven't thought about this much, so it might be a daft idea, but would a
>> compromise be to use a const internal function:
>> 
>> X1 = .DEFERRED_INIT (X0, INIT)
>> 
>> where the X0 argument is an uninitialised value and the INIT argument
>> describes the initialisation pattern?  So for a decl we'd have:
>> 
>> X = .DEFERRED_INIT (X, INIT)
>> 
>> and for an SSA name we'd have:
>> 
>> X_2 = .DEFERRED_INIT (X_1(D), INIT)
>> 
>> with all other uses of X_1(D) being replaced by X_2.  The idea is that:
>> 
>> * Having the X0 argument would keep the uninitialised use of the
>> variable around for the later warning passes.
>> 
>> * Using a const function should still allow the UB to be deleted as dead
>> if X1 isn't needed.
>> 
>> * Having a function in the way should stop passes from taking advantage
>> of direct uninitialised uses for optimisation.
>> 
>> This means we won't be able to optimise based on the actual init
>> value at the gimple level, but that seems like a fair trade-off.
>> AIUI this is really a security feature or anti-UB hardening feature
>> (in the sense that users are more likely to see predictable behaviour
>> “in the field” even if the program has UB).
>> 
>> 
>> The question is whether it's in line of peoples expectation that
>> explicitely zero-initialized code behaves differently from
>> implicitely zero-initialized code with respect to optimization
>> and secondary side-effects (late diagnostics, latent bugs, etc.).
>> 
>> Introducing a new concept like .DEFERRED_INIT is much more
>> heavy-weight than an explicit zero initializer.
>> 
>> 
>> What exactly you mean by “heavy-weight”? More difficult to implement or much 
>> more run-time overhead or both? Or something else?
>> 
>> The major benefit of the approach of “.DEFERRED_INIT”  is to enable us keep 
>> the current -Wuninitialized analysis untouched and also pass
>> the “uninitialized” info from source code level to “pass_expand”.
>> 
>> 
>> Well, "untouched" is a bit oversimplified.  You do need to handle
>> .DEFERRED_INIT as not
>> being an initialization which will definitely get interesting.
>> 
>> 
>> Yes, during uninitialized variable analysis pass, we should specially handle 
>> the defs with “.DEFERRED_INIT”, to treat them as uninitializations.
>> 
>> If we want to keep the current -Wuninitialized analysis untouched, this is a 
>> quite reasonable approach.
>> 
>> However, if it’s not req

Re: How to traverse all the local variables that declared in the current routine?

2020-12-08 Thread Qing Zhao via Gcc-patches



> On Dec 8, 2020, at 1:40 AM, Richard Biener  wrote:
> 
> On Mon, Dec 7, 2020 at 5:20 PM Qing Zhao <qing.z...@oracle.com> wrote:
>> 
>> 
>> 
>> On Dec 7, 2020, at 1:12 AM, Richard Biener  
>> wrote:
>> 
>> On Fri, Dec 4, 2020 at 5:19 PM Qing Zhao  wrote:
>> 
>> 
>> 
>> 
>> On Dec 4, 2020, at 2:50 AM, Richard Biener  
>> wrote:
>> 
>> On Thu, Dec 3, 2020 at 6:33 PM Richard Sandiford
>>  wrote:
>> 
>> 
>> Richard Biener via Gcc-patches  writes:
>> 
>> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao  wrote:
>> 
>> Another issue is, in order to check whether an auto-variable has 
>> initializer, I plan to add a new bit in “decl_common” as:
>> /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>> unsigned decl_is_initialized :1;
>> 
>> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
>> #define DECL_IS_INITIALIZED(NODE) \
>> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>> 
>> set this bit when setting DECL_INITIAL for the variables in FE. then keep it
>> even though DECL_INITIAL might be NULLed.
>> 
>> 
>> For locals it would be more reliable to set this flag during gimplification.
>> 
>> Do you have any comment and suggestions?
>> 
>> 
>> As said above - do you want to cover registers as well as locals?  I'd do
>> the actual zeroing during RTL expansion instead since otherwise you
>> have to figure youself whether a local is actually used (see 
>> expand_stack_vars)
>> 
>> Note that optimization will already made have use of "uninitialized" state
>> of locals so depending on what the actual goal is here "late" may be too 
>> late.
>> 
>> 
>> Haven't thought about this much, so it might be a daft idea, but would a
>> compromise be to use a const internal function:
>> 
>> X1 = .DEFERRED_INIT (X0, INIT)
>> 
>> where the X0 argument is an uninitialised value and the INIT argument
>> describes the initialisation pattern?  So for a decl we'd have:
>> 
>> X = .DEFERRED_INIT (X, INIT)
>> 
>> and for an SSA name we'd have:
>> 
>> X_2 = .DEFERRED_INIT (X_1(D), INIT)
>> 
>> with all other uses of X_1(D) being replaced by X_2.  The idea is that:
>> 
>> * Having the X0 argument would keep the uninitialised use of the
>> variable around for the later warning passes.
>> 
>> * Using a const function should still allow the UB to be deleted as dead
>> if X1 isn't needed.
>> 
>> * Having a function in the way should stop passes from taking advantage
>> of direct uninitialised uses for optimisation.
>> 
>> This means we won't be able to optimise based on the actual init
>> value at the gimple level, but that seems like a fair trade-off.
>> AIUI this is really a security feature or anti-UB hardening feature
>> (in the sense that users are more likely to see predictable behaviour
>> “in the field” even if the program has UB).
>> 
>> 
>> The question is whether it's in line of peoples expectation that
>> explicitely zero-initialized code behaves differently from
>> implicitely zero-initialized code with respect to optimization
>> and secondary side-effects (late diagnostics, latent bugs, etc.).
>> 
>> Introducing a new concept like .DEFERRED_INIT is much more
>> heavy-weight than an explicit zero initializer.
>> 
>> 
>> What exactly you mean by “heavy-weight”? More difficult to implement or much 
>> more run-time overhead or both? Or something else?
>> 
>> The major benefit of the approach of “.DEFERRED_INIT”  is to enable us keep 
>> the current -Wuninitialized analysis untouched and also pass
>> the “uninitialized” info from source code level to “pass_expand”.
>> 
>> 
>> Well, "untouched" is a bit oversimplified.  You do need to handle
>> .DEFERRED_INIT as not
>> being an initialization which will definitely get interesting.
>> 
>> 
>> Yes, during uninitialized variable analysis pass, we should specially handle 
>> the defs with “.DEFERRED_INIT”, to treat them as uninitializations.
>> 
>> If we want to keep the current -Wuninitialized analysis untouched, this is a 
>> quite reasonable approach.
>> 
>> However, if it’s not required to keep the current -Wuninitialized analysis 
>> untouched, adding zero-initializer directly during gimplification should
>> be much easier and simpler, and also smaller run-time overhead.
>> 
>> 
>> As for optimization I fea

Re: How to traverse all the local variables that declared in the current routine?

2020-12-07 Thread Qing Zhao via Gcc-patches



> On Dec 7, 2020, at 12:05 PM, Richard Sandiford  
> wrote:
> 
> Qing Zhao <qing.z...@oracle.com> writes:
>>> On Dec 7, 2020, at 11:10 AM, Richard Sandiford  
>>> wrote:
>>>>>> 
>>>>>> Another issue is, in order to check whether an auto-variable has 
>>>>>> initializer, I plan to add a new bit in “decl_common” as:
>>>>>> /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>>>>>> unsigned decl_is_initialized :1;
>>>>>> 
>>>>>> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
>>>>>> #define DECL_IS_INITIALIZED(NODE) \
>>>>>> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>>>>>> 
>>>>>> set this bit when setting DECL_INITIAL for the variables in FE. then 
>>>>>> keep it
>>>>>> even though DECL_INITIAL might be NULLed.
>>>>>> 
>>>>>> 
>>>>>> For locals it would be more reliable to set this flag during 
>>>>>> gimplification.
>>>>>> 
>>>>>> Do you have any comment and suggestions?
>>>>>> 
>>>>>> 
>>>>>> As said above - do you want to cover registers as well as locals?  I'd do
>>>>>> the actual zeroing during RTL expansion instead since otherwise you
>>>>>> have to figure youself whether a local is actually used (see 
>>>>>> expand_stack_vars)
>>>>>> 
>>>>>> Note that optimization will already made have use of "uninitialized" 
>>>>>> state
>>>>>> of locals so depending on what the actual goal is here "late" may be too 
>>>>>> late.
>>>>>> 
>>>>>> 
>>>>>> Haven't thought about this much, so it might be a daft idea, but would a
>>>>>> compromise be to use a const internal function:
>>>>>> 
>>>>>> X1 = .DEFERRED_INIT (X0, INIT)
>>>>>> 
>>>>>> where the X0 argument is an uninitialised value and the INIT argument
>>>>>> describes the initialisation pattern?  So for a decl we'd have:
>>>>>> 
>>>>>> X = .DEFERRED_INIT (X, INIT)
>>>>>> 
>>>>>> and for an SSA name we'd have:
>>>>>> 
>>>>>> X_2 = .DEFERRED_INIT (X_1(D), INIT)
>>>>>> 
>>>>>> with all other uses of X_1(D) being replaced by X_2.  The idea is that:
>>>>>> 
>>>>>> * Having the X0 argument would keep the uninitialised use of the
>>>>>> variable around for the later warning passes.
>>>>>> 
>>>>>> * Using a const function should still allow the UB to be deleted as dead
>>>>>> if X1 isn't needed.
>>>>>> 
>>>>>> * Having a function in the way should stop passes from taking advantage
>>>>>> of direct uninitialised uses for optimisation.
>>>>>> 
>>>>>> This means we won't be able to optimise based on the actual init
>>>>>> value at the gimple level, but that seems like a fair trade-off.
>>>>>> AIUI this is really a security feature or anti-UB hardening feature
>>>>>> (in the sense that users are more likely to see predictable behaviour
>>>>>> “in the field” even if the program has UB).
>>>>>> 
>>>>>> 
>>>>>> The question is whether it's in line of peoples expectation that
>>>>>> explicitely zero-initialized code behaves differently from
>>>>>> implicitely zero-initialized code with respect to optimization
>>>>>> and secondary side-effects (late diagnostics, latent bugs, etc.).
>>>>>> 
>>>>>> Introducing a new concept like .DEFERRED_INIT is much more
>>>>>> heavy-weight than an explicit zero initializer.
>>>>>> 
>>>>>> 
>>>>>> What exactly you mean by “heavy-weight”? More difficult to implement or 
>>>>>> much more run-time overhead or both? Or something else?
>>>>>> 
>>>>>> The major benefit of the approach of “.DEFERRED_INIT”  is to enable us 
>>>>>> keep the current -Wuninitialized analysis untouched and also pass
>>>>>> the “uninitialized” info from s

Re: How to traverse all the local variables that declared in the current routine?

2020-12-07 Thread Qing Zhao via Gcc-patches



> On Dec 7, 2020, at 11:10 AM, Richard Sandiford  
> wrote:
 
 Another issue is, in order to check whether an auto-variable has 
 initializer, I plan to add a new bit in “decl_common” as:
 /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
 unsigned decl_is_initialized :1;
 
 /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
 #define DECL_IS_INITIALIZED(NODE) \
 (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
 
 set this bit when setting DECL_INITIAL for the variables in FE. then keep 
 it
 even though DECL_INITIAL might be NULLed.
 
 
 For locals it would be more reliable to set this flag during 
 gimplification.
 
 Do you have any comment and suggestions?
 
 
 As said above - do you want to cover registers as well as locals?  I'd do
 the actual zeroing during RTL expansion instead since otherwise you
 have to figure youself whether a local is actually used (see 
 expand_stack_vars)
 
 Note that optimization will already made have use of "uninitialized" state
 of locals so depending on what the actual goal is here "late" may be too 
 late.
 
 
 Haven't thought about this much, so it might be a daft idea, but would a
 compromise be to use a const internal function:
 
 X1 = .DEFERRED_INIT (X0, INIT)
 
 where the X0 argument is an uninitialised value and the INIT argument
 describes the initialisation pattern?  So for a decl we'd have:
 
 X = .DEFERRED_INIT (X, INIT)
 
 and for an SSA name we'd have:
 
 X_2 = .DEFERRED_INIT (X_1(D), INIT)
 
 with all other uses of X_1(D) being replaced by X_2.  The idea is that:
 
 * Having the X0 argument would keep the uninitialised use of the
 variable around for the later warning passes.
 
 * Using a const function should still allow the UB to be deleted as dead
 if X1 isn't needed.
 
 * Having a function in the way should stop passes from taking advantage
 of direct uninitialised uses for optimisation.
 
 This means we won't be able to optimise based on the actual init
 value at the gimple level, but that seems like a fair trade-off.
 AIUI this is really a security feature or anti-UB hardening feature
 (in the sense that users are more likely to see predictable behaviour
 “in the field” even if the program has UB).
 
 
 The question is whether it's in line of peoples expectation that
 explicitely zero-initialized code behaves differently from
 implicitely zero-initialized code with respect to optimization
 and secondary side-effects (late diagnostics, latent bugs, etc.).
 
 Introducing a new concept like .DEFERRED_INIT is much more
 heavy-weight than an explicit zero initializer.
 
 
 What exactly you mean by “heavy-weight”? More difficult to implement or 
 much more run-time overhead or both? Or something else?
 
 The major benefit of the approach of “.DEFERRED_INIT”  is to enable us 
 keep the current -Wuninitialized analysis untouched and also pass
 the “uninitialized” info from source code level to “pass_expand”.
>>> 
>>> Well, "untouched" is a bit oversimplified.  You do need to handle
>>> .DEFERRED_INIT as not
>>> being an initialization which will definitely get interesting.
>> 
>> Yes, during uninitialized variable analysis pass, we should specially handle 
>> the defs with “.DEFERRED_INIT”, to treat them as uninitializations.
> 
> Are you sure we need to do that?  The point of having the first argument
> to .DEFERRED_INIT was that that argument would still provide an
> uninitialised use of the variable.  And the values are passed and
> returned by value, so the lack of initialisation is explicit in
> the gcall itself, without knowing what the target function does.
> 
> The idea is that we can essentially treat .DEFERRED_INIT as a normal
> (const) function call.  I'd be surprised if many passes needed to
> handle it specially.
> 

Just checked with a small testing case (to emulate the .DEFERRED_INIT approach):

qinzhao@gcc10:~/Bugs/auto-init$ cat t.c
extern int DEFFERED_INIT (int, int) __attribute__ ((const));

int foo (int n, int r)
{
  int v;

  v = DEFFERED_INIT (v, 0);
  if (n < 10)
    v = r;

  return v;
}
qinzhao@gcc10:~/Bugs/auto-init$ sh t
/home/qinzhao/Install/latest_write/bin/gcc -O -Wuninitialized -fdump-tree-all 
-S t.c
t.c: In function ‘foo’:
t.c:7:7: warning: ‘v’ is used uninitialized [-Wuninitialized]
7 |   v = DEFFERED_INIT (v, 0);
  |   ^~~~

We can see that the current uninitialized variable analysis treats the newly
added artificial initialization as the first use of the uninitialized variable,
and therefore reports the warning there.
However, we should report the warning at “return v”.
So, I think that we still need to specifically handle the new added artificial 

Re: How to traverse all the local variables that declared in the current routine?

2020-12-07 Thread Qing Zhao via Gcc-patches



> On Dec 7, 2020, at 1:12 AM, Richard Biener  wrote:
> 
> On Fri, Dec 4, 2020 at 5:19 PM Qing Zhao <qing.z...@oracle.com> wrote:
>> 
>> 
>> 
>> On Dec 4, 2020, at 2:50 AM, Richard Biener  
>> wrote:
>> 
>> On Thu, Dec 3, 2020 at 6:33 PM Richard Sandiford
>>  wrote:
>> 
>> 
>> Richard Biener via Gcc-patches  writes:
>> 
>> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao  wrote:
>> 
>> Another issue is, in order to check whether an auto-variable has 
>> initializer, I plan to add a new bit in “decl_common” as:
>> /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>> unsigned decl_is_initialized :1;
>> 
>> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
>> #define DECL_IS_INITIALIZED(NODE) \
>> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>> 
>> set this bit when setting DECL_INITIAL for the variables in FE. then keep it
>> even though DECL_INITIAL might be NULLed.
>> 
>> 
>> For locals it would be more reliable to set this flag during gimplification.
>> 
>> Do you have any comment and suggestions?
>> 
>> 
>> As said above - do you want to cover registers as well as locals?  I'd do
>> the actual zeroing during RTL expansion instead since otherwise you
>> have to figure youself whether a local is actually used (see 
>> expand_stack_vars)
>> 
>> Note that optimization will already made have use of "uninitialized" state
>> of locals so depending on what the actual goal is here "late" may be too 
>> late.
>> 
>> 
>> Haven't thought about this much, so it might be a daft idea, but would a
>> compromise be to use a const internal function:
>> 
>> X1 = .DEFERRED_INIT (X0, INIT)
>> 
>> where the X0 argument is an uninitialised value and the INIT argument
>> describes the initialisation pattern?  So for a decl we'd have:
>> 
>> X = .DEFERRED_INIT (X, INIT)
>> 
>> and for an SSA name we'd have:
>> 
>> X_2 = .DEFERRED_INIT (X_1(D), INIT)
>> 
>> with all other uses of X_1(D) being replaced by X_2.  The idea is that:
>> 
>> * Having the X0 argument would keep the uninitialised use of the
>> variable around for the later warning passes.
>> 
>> * Using a const function should still allow the UB to be deleted as dead
>> if X1 isn't needed.
>> 
>> * Having a function in the way should stop passes from taking advantage
>> of direct uninitialised uses for optimisation.
>> 
>> This means we won't be able to optimise based on the actual init
>> value at the gimple level, but that seems like a fair trade-off.
>> AIUI this is really a security feature or anti-UB hardening feature
>> (in the sense that users are more likely to see predictable behaviour
>> “in the field” even if the program has UB).
>> 
>> 
>> The question is whether it's in line of peoples expectation that
>> explicitely zero-initialized code behaves differently from
>> implicitely zero-initialized code with respect to optimization
>> and secondary side-effects (late diagnostics, latent bugs, etc.).
>> 
>> Introducing a new concept like .DEFERRED_INIT is much more
>> heavy-weight than an explicit zero initializer.
>> 
>> 
>> What exactly you mean by “heavy-weight”? More difficult to implement or much 
>> more run-time overhead or both? Or something else?
>> 
>> The major benefit of the approach of “.DEFERRED_INIT”  is to enable us keep 
>> the current -Wuninitialized analysis untouched and also pass
>> the “uninitialized” info from source code level to “pass_expand”.
> 
> Well, "untouched" is a bit oversimplified.  You do need to handle
> .DEFERRED_INIT as not
> being an initialization which will definitely get interesting.

Yes, during the uninitialized variable analysis pass, we should handle the
defs with “.DEFERRED_INIT” specially and treat them as uninitialized.

>> If we want to keep the current -Wuninitialized analysis untouched, this is a 
>> quite reasonable approach.
>> 
>> However, if it’s not required to keep the current -Wuninitialized analysis 
>> untouched, adding zero-initializer directly during gimplification should
>> be much easier and simpler, and also smaller run-time overhead.
>> 
>> 
>> As for optimization I fear you'll get a load of redundant zero-init
>> actually emitted if you can just rely on RTL DSE/DCE to remove it.
>> 
>> 
>> Runtime overhead for -fauto-init=zero is one important consideration for the 
>> whole 

Re: How to traverse all the local variables that declared in the current routine?

2020-12-04 Thread Qing Zhao via Gcc-patches



> On Dec 4, 2020, at 2:50 AM, Richard Biener  wrote:
> 
> On Thu, Dec 3, 2020 at 6:33 PM Richard Sandiford
> <richard.sandif...@arm.com> wrote:
>> 
>> Richard Biener via Gcc-patches  writes:
>>> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao  wrote:
>>>> Another issue is, in order to check whether an auto-variable has 
>>>> initializer, I plan to add a new bit in “decl_common” as:
>>>>  /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>>>>  unsigned decl_is_initialized :1;
>>>> 
>>>> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
>>>> #define DECL_IS_INITIALIZED(NODE) \
>>>>  (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>>>> 
>>>> set this bit when setting DECL_INITIAL for the variables in FE. then keep 
>>>> it
>>>> even though DECL_INITIAL might be NULLed.
>>> 
>>> For locals it would be more reliable to set this flag during gimplification.
>>> 
>>>> Do you have any comment and suggestions?
>>> 
>>> As said above - do you want to cover registers as well as locals?  I'd do
>>> the actual zeroing during RTL expansion instead since otherwise you
>>> have to figure out yourself whether a local is actually used (see
>>> expand_stack_vars)
>>> 
>>> Note that optimization will already have made use of "uninitialized" state
>>> of locals so depending on what the actual goal is here "late" may be too
>>> late.
>> 
>> Haven't thought about this much, so it might be a daft idea, but would a
>> compromise be to use a const internal function:
>> 
>>  X1 = .DEFERRED_INIT (X0, INIT)
>> 
>> where the X0 argument is an uninitialised value and the INIT argument
>> describes the initialisation pattern?  So for a decl we'd have:
>> 
>>  X = .DEFERRED_INIT (X, INIT)
>> 
>> and for an SSA name we'd have:
>> 
>>  X_2 = .DEFERRED_INIT (X_1(D), INIT)
>> 
>> with all other uses of X_1(D) being replaced by X_2.  The idea is that:
>> 
>> * Having the X0 argument would keep the uninitialised use of the
>>  variable around for the later warning passes.
>> 
>> * Using a const function should still allow the UB to be deleted as dead
>>  if X1 isn't needed.
>> 
>> * Having a function in the way should stop passes from taking advantage
>>  of direct uninitialised uses for optimisation.
>> 
>> This means we won't be able to optimise based on the actual init
>> value at the gimple level, but that seems like a fair trade-off.
>> AIUI this is really a security feature or anti-UB hardening feature
>> (in the sense that users are more likely to see predictable behaviour
>> “in the field” even if the program has UB).
> 
> The question is whether it's in line with people's expectation that
> explicitly zero-initialized code behaves differently from
> implicitly zero-initialized code with respect to optimization
> and secondary side-effects (late diagnostics, latent bugs, etc.).
> 
> Introducing a new concept like .DEFERRED_INIT is much more
> heavy-weight than an explicit zero initializer.

What exactly do you mean by “heavy-weight”? More difficult to implement, much
more run-time overhead, or both? Or something else?

The major benefit of the “.DEFERRED_INIT” approach is that it lets us keep the
current -Wuninitialized analysis untouched and also pass
the “uninitialized” info from the source code level to “pass_expand”.

If we want to keep the current -Wuninitialized analysis untouched, this is a
quite reasonable approach.

However, if it’s not required to keep the current -Wuninitialized analysis
untouched, adding a zero-initializer directly during gimplification should
be much easier and simpler, and should also have smaller run-time overhead.
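
For illustration only, here is a minimal sketch of the source-level effect of
adding the zero-initializer directly. The function name foo_zero_init is
hypothetical. Because the variable is really initialized, the result is well
defined on every path, but no uninitialized use remains for -Wuninitialized
to report, which is the trade-off described above.

```c
/* Hypothetical example: what a function effectively looks like once a
   zero-initializer has been added for the automatic variable 'v'.  */
int
foo_zero_init (int n, int r)
{
  int v = 0;  /* the added initialization; 'v' had no initializer before */

  if (n < 10)
    v = r;

  return v;   /* r when n < 10, otherwise the added zero */
}
```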

> 
> As for optimization I fear you'll get a load of redundant zero-init
> actually emitted if you can just rely on RTL DSE/DCE to remove it.

Runtime overhead for -fauto-init=zero is one important consideration for the
whole feature. We should minimize the runtime overhead of zero
initialization since it will be used in production builds.
We can do some run-time performance evaluation when we have an implementation
ready.

> 
> Btw, I don't think there's any reason to cling onto clang's semantics
> for a particular switch.  We'll never be able to emulate 1:1 behavior
> and our -Wuninit behavior is probably vastly different already.

From my study so far, yes, the current -Wuninitialized behavior of Clang and
GCC is not exactly the same.

For example, for the following small testing case:
void blah(int);

int foo_2 (int n, int l, int m, int r)
{
  int v;

  if ( (n > 10) && (m != 100)  && (r < 20) )
    v = r;

  if (l > 100)
    if ( (n <= 8) &&  (m < 102)  && (r < 19) )
      blah(v); /* { dg-warning "uninitialized" "real warning" } */

  return 0;
}

GCC is able to report a maybe-uninitialized warning, but Clang cannot.
It looks like GCC’s uninitialized analysis relies on more analysis and
optimization information than Clang’s.

I am really curious about how Clang implements its uninitialized analysis.

Qing



> 
> Richard.
> 
>> Thanks,
>> Richard



Re: How to traverse all the local variables that declared in the current routine?

2020-12-03 Thread Qing Zhao via Gcc-patches
Hi, Richard,

Thanks a lot for your suggestion.

Actually, I like this idea. 

My understanding of your suggestion is:

1. During gimplification phase:

For each auto-variable that does not have an explicit initializer, insert the 
following initializer for it:

X = DEFERRED_INIT (X, INIT)

In which, DEFERRED_INIT is an internal const function, which can be defined as:

DEF_INTERNAL_FN (DEFERRED_INIT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)

Its two arguments are:

1st argument:   this uninitialized auto-variable;
2nd argument:  the initialization kind (zero | pattern);

2.  During tree to SSA phase:  

No change is needed: the current tree-to-SSA phase should automatically rewrite
the newly inserted statement above as

X_2 = DEFERRED_INIT (X_1(D), INIT);
with all other uses of X_1(D) replaced by X_2.

3. During expanding phase:

Expand each call to “DEFERRED_INIT (X, INIT)” to a zero or pattern
initialization, depending on “INIT”.

Is the above understanding correct? Am I missing anything?
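
To make the three steps above concrete, here is a minimal source-level sketch of what the expanded form of such a call would amount to. It is only a model written as ordinary C: `deferred_init`, `enum init_kind`, and `PATTERN_BYTE` are names assumed for illustration, not GCC's internal function or its real pattern value (0xAA is used only because it is the byte reported for Clang's pattern mode).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the INIT argument of DEFERRED_INIT.  */
enum init_kind { INIT_ZERO, INIT_PATTERN };

/* Assumed pattern byte, for illustration only.  */
#define PATTERN_BYTE 0xAA

/* Model of expanding "X = DEFERRED_INIT (X, INIT)": the previous
   (uninitialized) contents of the object are never read; every byte
   is overwritten according to KIND.  */
void
deferred_init (void *obj, size_t size, enum init_kind kind)
{
  assert (obj != NULL);
  memset (obj, kind == INIT_ZERO ? 0 : PATTERN_BYTE, size);
}

/* How a function containing a plain "int v;" would behave once the
   inserted call has been expanded.  */
int
read_without_explicit_init (enum init_kind kind)
{
  int v;
  deferred_init (&v, sizeof v, kind);   /* inserted by the compiler */
  return v;                             /* now reads a defined value */
}
```

In this model the first argument only names the object to fill. That mirrors the idea that the uninitialized value is kept as an operand for the warning passes but does not contribute to the result.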

More comments and questions are embedded below:


> On Dec 3, 2020, at 11:32 AM, Richard Sandiford  
> wrote:
> 
> Richard Biener via Gcc-patches  writes:
>> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao  wrote:
>>> Another issue is, in order to check whether an auto-variable has 
>>> initializer, I plan to add a new bit in “decl_common” as:
>>>  /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>>>  unsigned decl_is_initialized :1;
>>> 
>>> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
>>> #define DECL_IS_INITIALIZED(NODE) \
>>>  (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>>> 
>>> set this bit when setting DECL_INITIAL for the variables in FE. then keep it
>>> even though DECL_INITIAL might be NULLed.
>> 
>> For locals it would be more reliable to set this flag during gimplification.
>> 
>>> Do you have any comment and suggestions?
>> 
>> As said above - do you want to cover registers as well as locals?  I'd do
>> the actual zeroing during RTL expansion instead since otherwise you
>> have to figure out yourself whether a local is actually used (see 
>> expand_stack_vars)
>> 
>> Note that optimization will already have made use of the "uninitialized" state
>> of locals so depending on what the actual goal is here "late" may be too 
>> late.
> 
> Haven't thought about this much, so it might be a daft idea, but would a
> compromise be to use a const internal function:
> 
>  X1 = .DEFERRED_INIT (X0, INIT)
> 
> where the X0 argument is an uninitialised value and the INIT argument
> describes the initialisation pattern?  So for a decl we'd have:
> 
>  X = .DEFERRED_INIT (X, INIT)
> 
> and for an SSA name we'd have:
> 
>  X_2 = .DEFERRED_INIT (X_1(D), INIT)
> 
> with all other uses of X_1(D) being replaced by X_2.  The idea is that:
> 
> * Having the X0 argument would keep the uninitialised use of the
>  variable around for the later warning passes.
> 
> * Using a const function should still allow the UB to be deleted as dead
>  if X1 isn't needed.

So, current GCC will delete the UB as dead code when X1 is not needed. With
the new option, should we keep this behavior?

> 
> * Having a function in the way should stop passes from taking advantage
>  of direct uninitialised uses for optimisation.

This will resolve the issue we raised before with directly adding an “artificial”
zero-initializer during gimplification.

However, I am wondering: will the newly added const internal function affect
optimization and thereby change the behavior of the uninitialized analysis?
> 
> This means we won't be able to optimise based on the actual init
> value at the gimple level, but that seems like a fair trade-off.

Yes, with this approach:

At the gimple level, we will not be able to optimize based on the newly added init values;
At the RTL level, we will be able to optimize based on them;
RTL optimizations will be able to eliminate any redundancy introduced by the
new initializations, reducing the cost of this option.



> AIUI this is really a security feature or anti-UB hardening feature
> (in the sense that users are more likely to see predictable behaviour
> “in the field” even if the program has UB).

Yes, this option is for security purposes, and is currently used in
production by Microsoft, Apple, Google, etc.

Qing
> 
> Thanks,
> Richard



Re: How to traverse all the local variables that declared in the current routine?

2020-12-03 Thread Qing Zhao via Gcc-patches



> On Dec 3, 2020, at 10:36 AM, Richard Biener  
> wrote:
> 
> On December 3, 2020 5:07:28 PM GMT+01:00, Qing Zhao  <mailto:qing.z...@oracle.com>> wrote:
>> 
>> 
>>>> of uninitialized analysis in the later stage.
>>> 
>>> I don't see how the issue can be resolved, you can't get both, uninit
>>> warnings and no uninitialized memory.
>>> People can compile twice, once without -fzero-init to get uninit
>>> warnings and once with -fzero-init to get
>>> the extra "security".
>> 
>> So, for GCC, you think that it’s okay to get rid of the following
>> requirement:
>> 
>> C. The implementation needs to keep the current static warning on
>> uninitialized
>> variables untouched in order to avoid "forking the language”.
>> 
>> Then, we can add explanation in the user documentation of the new
>> -fzero-init and also 
>> that of the -Wuninitialized to inform users that -fzero-init will
>> change the behavior of -Wuninitialized.
>> In order to get the warnings, -fzero-init should not be added at the
>> same time?
>> 
>> With this requirement being eliminated, implementation will be much
>> easier. 
>> 
>> We can add the new initialization during the gimplification phase. Then
>> this new option will work
>> for all languages.  Is this reasonable?
> 
> I think that's reasonable indeed. Eventually doing the init after the early 
> uninit pass is possible as well.

You are suggesting putting the new pass after the early uninit pass? Why?

Qing
> 
> Richard. 
> 
>> thanks.
>> 
>> Qing
>> 
>> 
>> 
>>> 
>>> 



Re: How to traverse all the local variables that declared in the current routine?

2020-12-03 Thread Qing Zhao via Gcc-patches



> On Dec 3, 2020, at 2:45 AM, Richard Biener  wrote:
> 
> On Wed, Dec 2, 2020 at 4:36 PM Qing Zhao  <mailto:qing.z...@oracle.com>> wrote:
>> 
>> 
>> 
>> On Dec 2, 2020, at 2:45 AM, Richard Biener  
>> wrote:
>> 
>> On Tue, Dec 1, 2020 at 8:49 PM Qing Zhao  wrote:
>> 
>> 
>> Hi, Richard,
>> 
>> Could you please comment on the following approach:
>> 
>> Instead of adding the zero-initializer quite late at the pass “pass_expand”, 
>> we can add it as early as during gimplification.
>> However, we will mark these newly added zero-initializers as “artificial”, and 
>> pass this “artificial” information to
>> “pass_early_warn_uninitialized” and “pass_late_warn_uninitialized”. In these 
>> two uninitialized variable analysis passes
>> (i.e., in tree-ssa-uninit.c), we will update the check in 
>> “ssa_undefined_value_p” to consider “artificial” zero-initializers
>> (i.e., if the def_stmt is marked “artificial”, then it’s an undefined 
>> value).
>> 
>> With such an approach, we should be able to address all those conflicts.
>> 
>> Do you see any obvious issue with this approach?
>> 
>> 
>> Yes, DSE will happily elide an explicit zero-init following the
>> artificial one leading to false uninit diagnostics.
>> 
>> 
>> Indeed.  This is a big issue. And other optimizations might also be impacted 
>> by the new zero-init, resulting in changed behavior
>> of uninitialized analysis in the later stage.
> 
> I don't see how the issue can be resolved, you can't get both, uninit
> warnings and no uninitialized memory.
> People can compile twice, once without -fzero-init to get uninit
> warnings and once with -fzero-init to get
> the extra "security".

So, for GCC, you think that it’s okay to get rid of the following requirement:

C. The implementation needs to keep the current static warning on uninitialized
variables untouched in order to avoid "forking the language”.

Then, we can add an explanation to the user documentation of the new -fzero-init
option, and also
to that of -Wuninitialized, to inform users that -fzero-init will change the
behavior of -Wuninitialized:
in order to get the warnings, -fzero-init should not be used at the same time?

With this requirement eliminated, the implementation will be much easier.

We can add the new initialization during the gimplification phase. Then this new
option will work
for all languages.  Is this reasonable?

thanks.

Qing



> 
> Richard.
> 
>> 
>> What's the intended purpose of the zero-init?
>> 
>> 
>> 
>> The purpose of this new option is: (from the original LLVM patch submission):
>> 
>> "Add an option to initialize automatic variables with either a pattern or 
>> with
>> zeroes. The default is still that automatic variables are uninitialized. Also
>> add attributes to request uninitialized on a per-variable basis, mainly to 
>> disable
>> initialization of large stack arrays when deemed too expensive.
>> 
>> This isn't meant to change the semantics of C and C++. Rather, it's meant to 
>> be
>> a last-resort when programmers inadvertently have some undefined behavior in
>> their code. This patch aims to make undefined behavior hurt less, which
>> security-minded people will be very happy about. Notably, this means that
>> there's no inadvertent information leak when:
>> 
>> • The compiler re-uses stack slots, and a value is used uninitialized.
>> • The compiler re-uses a register, and a value is used uninitialized.
>> • Stack structs / arrays / unions with padding are copied.
>> This patch only addresses stack and register information leaks. There's many
>> more infoleaks that we could address, and much more undefined behavior that
>> could be tamed. Let's keep this patch focused, and I'm happy to address 
>> related
>> issues elsewhere."
>> 
>> For more details, please refer to the LLVM code review discussion on this 
>> patch:
>> https://reviews.llvm.org/D54604
>> 
>> 
>> I also wrote a simple writeup for this task based on my study and discussion 
>> with
>> Kees Cook (cc’ing him) as following:
>> 
>> 
>> thanks.
>> 
>> Qing
>> 
>> Support stack variables auto-initialization in GCC
>> 
>> 11/19/2020
>> 
>> Qing Zhao
>> 
>> ===
>> 
>> 
>> ** Background of the task:
>> 
>> The corresponding GCC bugzilla RFE was created on 9/3/2018:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87210
>> 
>>

Re: [patch][rtl-optimization][i386][pr97777] Fix a reg-stack df maintenance bug triggered by zero-call-used-regs pass.

2020-12-02 Thread Qing Zhao via Gcc-patches
Thanks a lot for your review.

I will commit the patch soon.

Qing

> On Dec 2, 2020, at 12:27 PM, Jeff Law  wrote:
> 
> 
> 
> On 12/2/20 9:32 AM, Qing Zhao wrote:
>> 
>>> So we are clearing the x87 registers with that option.  Hence the change
>>> in reg-stack behavior.  I'm a bit surprised by this as I don't see
>>> clearing the x87 registers as particularly helpful from a security
>>> standpoint.  But I haven't followed that discussion closely.
>> Even with the option -fzero-call-used-regs=used-gpr (without clearing any 
>> x87 registers), 
>> we have the same compile-time error.
>> 
>> The first thing that the new pass “zero_call_used_regs” does is:
>> 
>> df_analyze();
> OK.  So that's the key issue, until now nothing has really cared about
> the DF state after reg-stack, but the pass to zero registers does.  I
> had the order backwards in my head with register clearing happening
> before regstack.  Hence my significant confusion about why things were
> suddenly going wrong.
> 
> The patch is OK.  Thanks for patiently walking me through this.
> 
> jeff
> 



Re: [patch][rtl-optimization][i386][pr97777] Fix a reg-stack df maintenance bug triggered by zero-call-used-regs pass.

2020-12-02 Thread Qing Zhao via Gcc-patches



> On Dec 1, 2020, at 6:16 PM, Jeff Law  wrote:
>>>> From c2573c6c8552b7b4c2eedb0684ce48b5c11436ec Mon Sep 17 00:00:00 2001
>>>> From: qing zhao 
>>>> Date: Thu, 19 Nov 2020 16:46:50 +0100
>>>> Subject: [PATCH] rtl-optimization: Fix data flow maintenance bug in
>>>> reg-stack.c [pr9]
>>>> 
>>>> reg-stack pass does not maintain the data flow information correctly.
>>>> call df_insn_rescan_all after the transformation is done.
>>>> 
>>>>gcc/
>>>> PR rtl-optimization/9
>>>> * reg-stack.c (rest_of_handle_stack_regs): call
>>>> df_insn_rescan_all if reg_to_stack return true.
>>>> 
>>>> gcc/testsuite/
>>>> PR rtl-optimization/9
>>>> * gcc.target/i386/pr9.c: New test.
>>> I'd like to see more analysis here.
>>> 
>>> ie, precisely what data is out of date and why?
>> 
>> For the simple testing case, what happened is, for the insn #6:
>> 
>> (gdb) call debug_rtx(insn)
>> (insn 6 26 7 2 (set (reg:XF 8 st [84])
>> (reg:XF 9 st(1) [85])) "t.c":4:10 134 {*movxf_internal}
>>  (expr_list:REG_EQUAL (const_double:XF 0.0 [0x0.0p+0])
>> (nil)))
>> 
>> After the following statement in reg-stack.c:
>>    3080   control_flow_insn_deleted |= subst_stack_regs (insn, &regstack);
>> 
>> This insn # 6 becomes:
>> (gdb) call debug_rtx(insn)
>> (insn 6 26 7 2 (set (reg:XF 8 st)
>> (reg:XF 8 st)) "t.c":4:10 134 {*movxf_internal}
>>  (expr_list:REG_EQUAL (const_double:XF 0.0 [0x0.0p+0])
>> (nil)))
>> 
>> However, no df maintenance routine (for example,
>> df_insn_rescan) is called for this changed insn.
> So we are clearing the x87 registers with that option.  Hence the change
> in reg-stack behavior.  I'm a bit surprised by this as I don't see
> clearing the x87 registers as particularly helpful from a security
> standpoint.  But I haven't followed that discussion closely.

Even with the option -fzero-call-used-regs=used-gpr (without clearing any x87
registers),
we have the same compile-time error.

The first thing that the new pass “zero_call_used_regs” does is:

df_analyze();

And the compiler error happens inside this call. 

From the following passes list:

  NEXT_PASS (pass_stack_regs);
  PUSH_INSERT_PASSES_WITHIN (pass_stack_regs)
  NEXT_PASS (pass_split_before_regstack);
  NEXT_PASS (pass_stack_regs_run);
  POP_INSERT_PASSES ()
  POP_INSERT_PASSES ()
  NEXT_PASS (pass_late_compilation);
  PUSH_INSERT_PASSES_WITHIN (pass_late_compilation)
  NEXT_PASS (pass_zero_call_used_regs);
  NEXT_PASS (pass_compute_alignments);
  NEXT_PASS (pass_variable_tracking);
  NEXT_PASS (pass_free_cfg);
  NEXT_PASS (pass_machine_reorg);
  NEXT_PASS (pass_cleanup_barriers);
  NEXT_PASS (pass_delay_slots);
  NEXT_PASS (pass_split_for_shorten_branches);
  NEXT_PASS (pass_convert_to_eh_region_ranges);
  NEXT_PASS (pass_shorten_branches);
  NEXT_PASS (pass_set_nothrow_function_flags);
  NEXT_PASS (pass_dwarf2_frame);
  NEXT_PASS (pass_final);

We can see that the new pass “zero_call_used_regs” immediately follows the pass
“stack_regs”.

None of the other passes that follow “stack_regs” call “df_analyze()”.


> 
>> 
>> As I checked, the transformation in this pass “stack” is quite
>> complicated. In addition to the above register replacement,
>> new insns might be inserted, and control flow might be changed, but
>> for most of the transformations applied in this pass,
>> no corresponding df maintenance routine is added for deferred
>> df rescanning.
> But this has been the case essentially for ever with reg-stack.  So
> what's unclear to me is why it's suddenly a problem now.

Previously, without the new pass “zero_call_used_regs”, none of the passes
following “stack_regs”
called df_analyze, so this bug was never exposed.

The new pass “zero_call_used_regs” is the first pass that follows “stack_regs”
and calls “df_analyze”,
and therefore it triggers this old bug.

Instead of adding the -fzero-call-used-regs option, if I change final.c as
follows to add “df_analyze” at
the beginning of the pass “pass_compute_alignments”:

qinzhao@gcc10:~/Work/write_gcc/gcc$ git diff final.c
diff --git a/gcc/final.c b/gcc/final.c
index fc9a05e335f..4955fc3fdcb 100644
--- a/gcc/final.c
+++ b/gcc/final.c
@@ -639,6 +639,7 @@ compute_alignments (void)
   basic_block bb;
   align_flags max_alignment;
 
+  df_analyze();
   label_align.truncate (0);
 
   max_labelno = max_label_num 

Re: How to traverse all the local variables that declared in the current routine?

2020-12-02 Thread Qing Zhao via Gcc-patches



> On Dec 2, 2020, at 2:45 AM, Richard Biener  wrote:
> 
> On Tue, Dec 1, 2020 at 8:49 PM Qing Zhao  wrote:
>> 
>> Hi, Richard,
>> 
>> Could you please comment on the following approach:
>> 
>> Instead of adding the zero-initializer quite late at the pass “pass_expand”, 
>> we can add it as early as during gimplification.
>> However, we will mark these newly added zero-initializers as “artificial”, and 
>> pass this “artificial” information to
>> “pass_early_warn_uninitialized” and “pass_late_warn_uninitialized”. In these 
>> two uninitialized variable analysis passes
>> (i.e., in tree-ssa-uninit.c), we will update the check in 
>> “ssa_undefined_value_p” to consider “artificial” zero-initializers
>> (i.e., if the def_stmt is marked “artificial”, then it’s an undefined 
>> value).
>> 
>> With such an approach, we should be able to address all those conflicts.
>> 
>> Do you see any obvious issue with this approach?
> 
> Yes, DSE will happily elide an explicit zero-init following the
> artificial one leading to false uninit diagnostics.

Indeed.  This is a big issue. And other optimizations might also be impacted by
the new zero-init, resulting in changed behavior
of uninitialized analysis in the later stage.
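
The concern about an elided store can be illustrated at source level. The sketch below writes out by hand, as ordinary C, the three stages under discussion. The function names are made up for illustration, and the "artificial" store is simulated with a normal initializer, since the marking itself would only exist inside the compiler.

```c
#include <assert.h>

/* 1. What the user wrote: x is explicitly set before its use, so no
      uninitialized warning is expected.  */
int
as_written (void)
{
  int x;
  x = 0;            /* explicit initialization by the user */
  return x;
}

/* 2. After an artificial zero-init has been added early.  */
int
with_artificial_init (void)
{
  int x = 0;        /* artificial: inserted by the compiler */
  x = 0;            /* explicit: stores the same value again */
  return x;
}

/* 3. After an optimizer removes the second store as redundant.
      Only the artificial definition reaches the use; a warning pass
      that treats artificial definitions as undefined would now report
      x as uninitialized although the user did initialize it.  */
int
after_redundant_store_removal (void)
{
  int x = 0;        /* the artificial definition is all that remains */
  return x;
}

/* All three forms compute the same value; only the diagnostics that
   could be derived from them differ.  */
void
check_all_agree (void)
{
  assert (as_written () == with_artificial_init ());
  assert (as_written () == after_redundant_store_removal ());
}
```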

> 
> What's the intended purpose of the zero-init?


The purpose of this new option is: (from the original LLVM patch submission):

"Add an option to initialize automatic variables with either a pattern or with
zeroes. The default is still that automatic variables are uninitialized. Also
add attributes to request uninitialized on a per-variable basis, mainly to 
disable
initialization of large stack arrays when deemed too expensive.

This isn't meant to change the semantics of C and C++. Rather, it's meant to be
a last-resort when programmers inadvertently have some undefined behavior in
their code. This patch aims to make undefined behavior hurt less, which
security-minded people will be very happy about. Notably, this means that
there's no inadvertent information leak when:

• The compiler re-uses stack slots, and a value is used uninitialized.
• The compiler re-uses a register, and a value is used uninitialized.
• Stack structs / arrays / unions with padding are copied.
This patch only addresses stack and register information leaks. There's many
more infoleaks that we could address, and much more undefined behavior that
could be tamed. Let's keep this patch focused, and I'm happy to address related
issues elsewhere."
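
As a concrete instance of the third bullet in the quoted description, the sketch below shows a struct with padding and the manual zeroing that is needed today before such an object is copied out. The struct and function names are hypothetical; the point is only that member-wise assignment alone does not define the padding bytes, which is the kind of gap the option is meant to close.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* On most ABIs there are padding bytes between TAG and VALUE.  */
struct reply
{
  char tag;
  int value;
};

/* Zero the whole object representation, padding included.  This is
   the step that zero auto-initialization would make implicit for a
   local "struct reply r;".  */
void
reply_clear (struct reply *r)
{
  assert (r != NULL);
  memset (r, 0, sizeof *r);
}

/* Count the bytes of an object that are not zero.  */
size_t
count_nonzero_bytes (const void *p, size_t n)
{
  const unsigned char *bytes = p;
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if (bytes[i] != 0)
      count++;
  return count;
}
```

Without the call to `reply_clear`, copying a local `struct reply` with `memcpy` or writing it to a socket would also copy whatever happened to be in the padding bytes.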

For more details, please refer to the LLVM code review discussion on this patch:
https://reviews.llvm.org/D54604


I also wrote a simple writeup for this task based on my study and discussion 
with
Kees Cook (cc’ing him) as following:


thanks.

Qing

Support stack variables auto-initialization in GCC

11/19/2020

Qing Zhao

===


** Background of the task:

The corresponding GCC bugzilla RFE was created on 9/3/2018:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87210

A similar option for LLVM (around Nov, 2018)
https://lists.llvm.org/pipermail/cfe-dev/2018-November/060172.html
provoked a lot of discussion before being committed.

(The following are quoted from the comments of Alexander Potapenko in
GCC bug 87210):

Finally, on Oct, 2019, upstream Clang supports force initialization
of stack variables under the -ftrivial-auto-var-init flag.

-ftrivial-auto-var-init=pattern initializes local variables with a 0xAA pattern
(actually it's more complicated, see https://reviews.llvm.org/D54604)

-ftrivial-auto-var-init=zero provides zero-initialization of locals.
This mode isn't officially supported yet and is hidden behind an additional
-enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang flag.
This is done to avoid creating a C++ dialect where all variables are
zero-initialized.

Starting v5.2, Linux kernel has a CONFIG_INIT_STACK_ALL config that performs
the build  with -ftrivial-auto-var-init=pattern. This one isn't widely adopted
yet, partially because initializing locals with 0xAA isn't fast enough.

Linus Torvalds is quite positive about zero-initializing the locals though,
see https://lkml.org/lkml/2019/7/30/1303:

"when a compiler has an option to initialize stack variables, it
would probably _also_ be a very good idea for that compiler to then
support a variable attribute that says "don't initialize _this_
variable, I will do that manually".
I also think that the "initialize with poison" is
pointless and wrong. Yes, it can find bugs, but it doesn't really help
improve the general situation, and people see it as a debugging tool,
not a "improve code quality and improve the life of kernel developers"
tool.

So having a flag similar to -ftrivial-auto-var-init=zero in GCC will be
appreciated by the Linux ker

Re: How to traverse all the local variables that declared in the current routine?

2020-12-01 Thread Qing Zhao via Gcc-patches
Hi, Richard, 

Could you please comment on the following approach:

Instead of adding the zero-initializer quite late at the pass “pass_expand”, we 
can add it as early as during gimplification. 
However, we will mark these newly added zero-initializers as “artificial”, and
pass this “artificial” information to
“pass_early_warn_uninitialized” and “pass_late_warn_uninitialized”. In these
two uninitialized variable analysis passes
(i.e., in tree-ssa-uninit.c), we will update the check in
“ssa_undefined_value_p” to consider “artificial” zero-initializers
(i.e., if the def_stmt is marked “artificial”, then it’s an undefined
value).

With such an approach, we should be able to address all those conflicts.

Do you see any obvious issue with this approach?

Thanks a lot for your help.

Qing


> On Nov 25, 2020, at 3:11 AM, Richard Biener  
> wrote:
>> 
>> 
>> I am planning to add a new phase immediately after 
>> “pass_late_warn_uninitialized” to initialize all auto-variables that are
>> not explicitly initialized in the declaration. The basic idea is as follows:
>> 
>> ** The proposal:
>> 
>> A. add a new GCC option: (same name and meaning as CLANG)
>> -ftrivial-auto-var-init=[pattern|zero], similar pattern init as CLANG;
>> 
>> B. add a new attribute for variable:
>> __attribute__((uninitialized))
>> the marked variable is intentionally left uninitialized for performance purposes.
>> 
>> C. The implementation needs to keep the current static warning on 
>> uninitialized
>> variables untouched in order to avoid "forking the language".
>> 
>> 
>> ** The implementation:
>> 
>> There are two major requirements for the implementation:
>> 
>> 1. all auto-variables that do not have an explicit initializer should be 
>> initialized to
>> zero by this option.  (Same behavior as CLANG)
>> 
>> 2. keep the current static warning on uninitialized variables untouched.
>> 
>> In order to satisfy 1, we should check whether an auto-variable has 
>> initializer
>> or not;
>> In order to satisfy 2, we should add this new transformation after
>> "pass_late_warn_uninitialized".
>> 
>> So, we should be able to check whether an auto-variable has initializer or 
>> not after “pass_late_warn_uninitialized”,
>> If Not, then insert an initialization for it.
>> 
>> For this purpose, I guess that “FOR_EACH_LOCAL_DECL” might be better?
>> 
>> 
>> I think both as long as they are source-level auto-variables. Then which one 
>> is better?
>> 
>> 
>> Another issue is, in order to check whether an auto-variable has 
>> initializer, I plan to add a new bit in “decl_common” as:
>> /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>> unsigned decl_is_initialized :1;
>> 
>> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
>> #define DECL_IS_INITIALIZED(NODE) \
>> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>> 
>> set this bit when setting DECL_INITIAL for the variables in FE. then keep it
>> even though DECL_INITIAL might be NULLed.
>> 
>> 
>> For locals it would be more reliable to set this flag during gimplification.
>> 
>> 
>> You mean I can set the flag “DECL_IS_INITIALIZED (decl)”  inside the routine 
>> “gimplify_decl_expr” (gimplify.c) as following:
>> 
>>   if (VAR_P (decl) && !DECL_EXTERNAL (decl))
>>     {
>>       tree init = DECL_INITIAL (decl);
>>       ...
>>       if (init && init != error_mark_node)
>>         {
>>           if (!TREE_STATIC (decl))
>>             {
>>               DECL_IS_INITIALIZED (decl) = 1;
>>             }
>> 
>> Is this enough for all Frontends? Are there other places that I need to 
>> maintain this bit?
>> 
>> 
>> 
>> Do you have any comment and suggestions?
>> 
>> 
>> As said above - do you want to cover registers as well as locals?
>> 
>> 
>> All the locals from the source-code point of view should be covered.   (From 
>> my study so far,  looks like that Clang adds that phase in FE).
>> If GCC adds this phase in FE, then the following design requirement
>> 
>> C. The implementation needs to keep the current static warning on 
>> uninitialized
>> variables untouched in order to avoid "forking the language”.
>> 
>> cannot be satisfied.  Since gcc’s uninitialized variables analysis is 
>> applied quite late.
>> 
>> So, we have to add this new phase after “pass_late_warn_uninitialized”.
>> 
>> I'd do
>> the actual zeroing during RTL expansion instead since otherwise you
>> have to figure out yourself whether a local is actually used (see 
>> expand_stack_vars)
>> 
>> 
>> Adding  this new transformation during RTL expansion is okay.  I will check 
>> on this in more details to see how to add it to RTL expansion phase.
>> 
>> 
>> Note that optimization will already have made use of the "uninitialized" state
>> of locals so depending on what the actual goal is here "late" may be too 
>> late.
>> 
>> 
>> This is a really good point…
>> 
>> In order to avoid optimization  to use the “uninitialized” state of locals, 
>> we should add the zeroing phase as early as possible (adding it in FE might 
>> be best
>> for this 

Re: How to traverse all the local variables that declared in the current routine?

2020-11-30 Thread Qing Zhao via Gcc-patches
On Nov 30, 2020, at 11:18 AM, Martin Sebor  wrote:
 Does gcc provide an iterator to traverse all the local variables that 
 are declared in the current routine?
 
 If not, what’s the best way to traverse the local variables?
>>> 
>>> Depends on what for.  There's the source level view you get by walking
>>> BLOCK_VARS of the
>>> scope tree, theres cfun->local_variables (FOR_EACH_LOCAL_DECL) and
>>> there's SSA names
>>> (FOR_EACH_SSA_NAME).
>> 
>> I am planning to add a new phase immediately after 
>> “pass_late_warn_uninitialized” to initialize all auto-variables that are
>> not explicitly initialized in the declaration. The basic idea is as 
>> follows:
>> 
>> ** The proposal:
>> 
>> A. add a new GCC option: (same name and meaning as CLANG)
>> -ftrivial-auto-var-init=[pattern|zero], similar pattern init as CLANG;
>> 
>> B. add a new attribute for variable:
>> __attribute__((uninitialized))
>> the marked variable is intentionally left uninitialized for performance 
>> purposes.
>> 
>> C. The implementation needs to keep the current static warning on 
>> uninitialized
>> variables untouched in order to avoid "forking the language".
>> 
>> 
>> ** The implementation:
>> 
>> There are two major requirements for the implementation:
>> 
>> 1. all auto-variables that do not have an explicit initializer should be 
>> initialized to
>> zero by this option.  (Same behavior as CLANG)
>> 
>> 2. keep the current static warning on uninitialized variables untouched.
>> 
>> In order to satisfy 1, we should check whether an auto-variable has 
>> initializer
>> or not;
>> In order to satisfy 2, we should add this new transformation after
>> "pass_late_warn_uninitialized".
>> 
>> So, we should be able to check whether an auto-variable has initializer 
>> or not after “pass_late_warn_uninitialized”,
>> If Not, then insert an initialization for it.
>> 
>> For this purpose, I guess that “FOR_EACH_LOCAL_DECL” might be better?
> 
> Yes, but do you want to catch variables promoted to register as well
> or just variables
> on the stack?
 I think both as long as they are source-level auto-variables. Then which 
 one is better?
> 
>> Another issue is, in order to check whether an auto-variable has 
>> initializer, I plan to add a new bit in “decl_common” as:
>>  /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>>  unsigned decl_is_initialized :1;
>> 
>> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
>> #define DECL_IS_INITIALIZED(NODE) \
>>  (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>> 
>> set this bit when setting DECL_INITIAL for the variables in FE. then 
>> keep it
>> even though DECL_INITIAL might be NULLed.
> 
> For locals it would be more reliable to set this flag during 
> gimplification.
 You mean I can set the flag “DECL_IS_INITIALIZED (decl)”  inside the 
 routine “gimplify_decl_expr” (gimplify.c) as following:
   if (VAR_P (decl) && !DECL_EXTERNAL (decl))
     {
       tree init = DECL_INITIAL (decl);
       ...
       if (init && init != error_mark_node)
         {
           if (!TREE_STATIC (decl))
             {
               DECL_IS_INITIALIZED (decl) = 1;
             }
 Is this enough for all Frontends? Are there other places that I need to 
 maintain this bit?
> 
>> Do you have any comment and suggestions?
> 
> As said above - do you want to cover registers as well as locals?
 All the locals from the source-code point of view should be covered.   
 (From my study so far,  looks like that Clang adds that phase in FE).
 If GCC adds this phase in FE, then the following design requirement
 C. The implementation needs to keep the current static warning on 
 uninitialized
 variables untouched in order to avoid "forking the language”.
 cannot be satisfied.  Since gcc’s uninitialized variables analysis is 
 applied quite late.
 So, we have to add this new phase after “pass_late_warn_uninitialized”.
>  I'd do
> the actual zeroing during RTL expansion instead since otherwise you
> have to figure out yourself whether a local is actually used (see 
> expand_stack_vars)
 Adding  this new transformation during RTL expansion is okay.  I will 
 check on this in more details to see how to add it to RTL expansion phase.
> 
> Note that optimization will already have made use of the "uninitialized" state
> of locals so depending on what the actual goal is here "late" may be too 
> late.
 This is a really good point…
 In order to avoid optimization  to use the “uninitialized” state of 
 locals, we should add the zeroing phase as early as possible (adding it in 
 FE might 

Re: [patch][rtl-optimization][i386][pr97777] Fix a reg-stack df maintenance bug triggered by zero-call-used-regs pass.

2020-11-30 Thread Qing Zhao via Gcc-patches
Hi, Jeff,

Sorry for the late reply due to the Thanksgiving long weekend.

> On Nov 25, 2020, at 1:37 PM, Jeff Law  wrote:
> 
> 
> 
> On 11/19/20 8:59 AM, Qing Zhao via Gcc-patches wrote:
>> Hi, 
>> 
>> PR9 - ICE: in df_refs_verify, at df-scan.c:3991 with -O 
>> -ffinite-math-only -fzero-call-used-regs=all
>> 
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=9
>> 
>> This is a bug triggered by the new pass zero-call-used-regs; however, it’s an old 
>> bug in the pass “reg-stack”.
>> This pass does not correctly maintain the df information after 
>> transformation. 
>> 
>> Since the transformation in the reg-stack pass is quite complicated, involving 
>> both instruction changes and control
>> flow changes, I call “df_insn_rescan_all” after the transformation is done.
>> 
>> The patch has been tested with bootstrap with 
>> --enable-checking=yes,rtl,df,extra, no regression. 
>> 
>> Okay for commit?
>> 
>> Qing
>> 
>> From c2573c6c8552b7b4c2eedb0684ce48b5c11436ec Mon Sep 17 00:00:00 2001
>> From: qing zhao 
>> Date: Thu, 19 Nov 2020 16:46:50 +0100
>> Subject: [PATCH] rtl-optimization: Fix data flow maintenance bug in
>> reg-stack.c [pr9]
>> 
>> reg-stack pass does not maintain the data flow information correctly.
>> call df_insn_rescan_all after the transformation is done.
>> 
>>gcc/
>>  PR rtl-optimization/9
>>  * reg-stack.c (rest_of_handle_stack_regs): call
>>  df_insn_rescan_all if reg_to_stack return true.
>> 
>>  gcc/testsuite/
>>  PR rtl-optimization/9
>>  * gcc.target/i386/pr9.c: New test.
> I'd like to see more analysis here.
> 
> ie, precisely what data is out of date and why?

For the simple testing case, what happened is, for the insn #6:

(gdb) call debug_rtx(insn)
(insn 6 26 7 2 (set (reg:XF 8 st [84])
(reg:XF 9 st(1) [85])) "t.c":4:10 134 {*movxf_internal}
 (expr_list:REG_EQUAL (const_double:XF 0.0 [0x0.0p+0])
(nil)))

After the following statement in reg-stack.c:
   3080   control_flow_insn_deleted |= subst_stack_regs (insn, 
);

This insn # 6 becomes:
(gdb) call debug_rtx(insn)
(insn 6 26 7 2 (set (reg:XF 8 st)
(reg:XF 8 st)) "t.c":4:10 134 {*movxf_internal}
 (expr_list:REG_EQUAL (const_double:XF 0.0 [0x0.0p+0])
(nil)))

However, no df maintenance routine (for example, df_insn_rescan) is called
for this changed insn.

As far as I can tell, the transformation in this “stack” pass is quite
complicated. In addition to the register replacement above, new insns might
be inserted and control flow might be changed, but for most of the
transformations applied in this pass, no corresponding df maintenance routine
is called to schedule a deferred df rescan.

Therefore, I called “df_insn_rescan_all” after the whole transformation is
done to bring the df information up to date.

Another solution is to go through all the transformations of this pass and
add the df maintenance routine case by case for each of them.  Since the
transformation in the “stack” pass is quite extensive and complicated, I feel
that calling “df_insn_rescan_all” might be a better solution.

Let me know your suggestions.

thanks.

Qing



> 
> Jeff
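
To make the stale-df problem described above concrete, here is a minimal toy model in C. It is only a sketch under loud assumptions: the types and functions (toy_insn, toy_df_ref, toy_rescan_all, toy_verify, toy_subst_regs) are hypothetical stand-ins and do not correspond to GCC's real rtx or df data structures. It shows a cached summary going out of date when an instruction is rewritten in place, and a full rescan bringing it back in sync.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy instruction: "set dest_reg from src_reg".  Hypothetical type, not
   GCC's rtx_insn.  */
struct toy_insn
{
  int dest_reg;
  int src_reg;
};

/* Toy cached dataflow record for one instruction.  */
struct toy_df_ref
{
  int def_reg;
  int use_reg;
};

/* Rebuild every cached record from the instructions (the role that
   df_insn_rescan_all plays in the patch).  */
static void
toy_rescan_all (const struct toy_insn *insns, struct toy_df_ref *refs,
                size_t n)
{
  assert (insns != NULL && refs != NULL);
  for (size_t i = 0; i < n; i++)
    {
      refs[i].def_reg = insns[i].dest_reg;
      refs[i].use_reg = insns[i].src_reg;
    }
}

/* Return true if the cache still matches the instructions (a toy analogue
   of the consistency check that fails in df_refs_verify).  */
static bool
toy_verify (const struct toy_insn *insns, const struct toy_df_ref *refs,
            size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (refs[i].def_reg != insns[i].dest_reg
        || refs[i].use_reg != insns[i].src_reg)
      return false;
  return true;
}

/* Rewrite an instruction's registers WITHOUT updating the cache, which is
   what the register substitution described above effectively does.  */
static void
toy_subst_regs (struct toy_insn *insn, int new_dest, int new_src)
{
  insn->dest_reg = new_dest;
  insn->src_reg = new_src;
}
```

In this model, rewriting an instruction from (8 <- 9) to (8 <- 8), as happens to insn 6, makes toy_verify fail until toy_rescan_all is called again. The trade-off is the same one discussed above: a blanket rescan is simple and safe, while per-transformation updates would be cheaper but require auditing every rewrite.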



Re: How to traverse all the local variables that declared in the current routine?

2020-11-30 Thread Qing Zhao via Gcc-patches
Hi, Martin,

Thanks a lot for your suggestion.

> On Nov 25, 2020, at 6:08 PM, Martin Sebor  wrote:
> 
> On 11/24/20 9:54 AM, Qing Zhao via Gcc-patches wrote:
>>> On Nov 24, 2020, at 9:55 AM, Richard Biener  
>>> wrote:
>>> 
>>> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao  wrote:
>>>> 
>>>> 
>>>> 
>>>>> On Nov 24, 2020, at 1:32 AM, Richard Biener  
>>>>> wrote:
>>>>> 
>>>>> On Tue, Nov 24, 2020 at 12:05 AM Qing Zhao via Gcc-patches
>>>>>  wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> Does gcc provide an iterator to traverse all the local variables that 
>>>>>> are declared in the current routine?
>>>>>> 
>>>>>> If not, what’s the best way to traverse the local variables?
>>>>> 
>>>>> Depends on what for.  There's the source level view you get by walking
>>>>> BLOCK_VARS of the
>>>>> scope tree, theres cfun->local_variables (FOR_EACH_LOCAL_DECL) and
>>>>> there's SSA names
>>>>> (FOR_EACH_SSA_NAME).
>>>> 
>>>> I am planing to add a new phase immediately after 
>>>> “pass_late_warn_uninitialized” to initialize all auto-variables that are
>>>> not explicitly initialized in the declaration, the basic idea is following:
>>>> 
>>>> ** The proposal:
>>>> 
>>>> A. add a new GCC option: (same name and meaning as CLANG)
>>>> -ftrivial-auto-var-init=[pattern|zero], similar pattern init as CLANG;
>>>> 
>>>> B. add a new attribute for variable:
>>>> __attribute((uninitialized)
>>>> the marked variable is uninitialized intentionaly for performance purpose.
>>>> 
>>>> C. The implementation needs to keep the current static warning on 
>>>> uninitialized
>>>> variables untouched in order to avoid "forking the language".
>>>> 
>>>> 
>>>> ** The implementation:
>>>> 
>>>> There are two major requirements for the implementation:
>>>> 
>>>> 1. all auto-variables that do not have an explicit initializer should be 
>>>> initialized to
>>>> zero by this option.  (Same behavior as CLANG)
>>>> 
>>>> 2. keep the current static warning on uninitialized variables untouched.
>>>> 
>>>> In order to satisfy 1, we should check whether an auto-variable has 
>>>> initializer
>>>> or not;
>>>> In order to satisfy 2, we should add this new transformation after
>>>> "pass_late_warn_uninitialized".
>>>> 
>>>> So, we should be able to check whether an auto-variable has initializer or 
>>>> not after “pass_late_warn_uninitialized”,
>>>> If Not, then insert an initialization for it.
>>>> 
>>>> For this purpose, I guess that “FOR_EACH_LOCAL_DECL” might be better?
>>> 
>>> Yes, but do you want to catch variables promoted to register as well
>>> or just variables
>>> on the stack?
>> I think both as long as they are source-level auto-variables. Then which one 
>> is better?
>>> 
>>>> Another issue is, in order to check whether an auto-variable has 
>>>> initializer, I plan to add a new bit in “decl_common” as:
>>>>  /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>>>>  unsigned decl_is_initialized :1;
>>>> 
>>>> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
>>>> #define DECL_IS_INITIALIZED(NODE) \
>>>>  (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>>>> 
>>>> set this bit when setting DECL_INITIAL for the variables in FE. then keep 
>>>> it
>>>> even though DECL_INITIAL might be NULLed.
>>> 
>>> For locals it would be more reliable to set this flag during gimplification.
>> You mean I can set the flag “DECL_IS_INITIALIZED (decl)”  inside the routine 
>> “gimpley_decl_expr” (gimplify.c) as following:
>>   if (VAR_P (decl) && !DECL_EXTERNAL (decl))
>> {
>>   tree init = DECL_INITIAL (decl);
>> ...
>>   if (init && init != error_mark_node)
>> {
>>   if (!TREE_STATIC (decl))
>>  {
>>DECL_IS_INITIALIZED(decl) = 1;
>>  }
>> Is this enough for all Frontends? Are there other places that I need

Re: How to traverse all the local variables that declared in the current routine?

2020-11-25 Thread Qing Zhao via Gcc-patches



> On Nov 25, 2020, at 3:11 AM, Richard Biener  
> wrote:
>> 
>> 
>> Hi,
>> 
>> Does gcc provide an iterator to traverse all the local variables that are 
>> declared in the current routine?
>> 
>> If not, what’s the best way to traverse the local variables?
>> 
>> 
>> Depends on what for.  There's the source level view you get by walking
>> BLOCK_VARS of the
>> scope tree, theres cfun->local_variables (FOR_EACH_LOCAL_DECL) and
>> there's SSA names
>> (FOR_EACH_SSA_NAME).
>> 
>> 
>> I am planing to add a new phase immediately after 
>> “pass_late_warn_uninitialized” to initialize all auto-variables that are
>> not explicitly initialized in the declaration, the basic idea is following:
>> 
>> ** The proposal:
>> 
>> A. add a new GCC option: (same name and meaning as CLANG)
>> -ftrivial-auto-var-init=[pattern|zero], similar pattern init as CLANG;
>> 
>> B. add a new attribute for variable:
>> __attribute((uninitialized)
>> the marked variable is uninitialized intentionaly for performance purpose.
>> 
>> C. The implementation needs to keep the current static warning on 
>> uninitialized
>> variables untouched in order to avoid "forking the language".
>> 
>> 
>> ** The implementation:
>> 
>> There are two major requirements for the implementation:
>> 
>> 1. all auto-variables that do not have an explicit initializer should be 
>> initialized to
>> zero by this option.  (Same behavior as CLANG)
>> 
>> 2. keep the current static warning on uninitialized variables untouched.
>> 
>> In order to satisfy 1, we should check whether an auto-variable has 
>> initializer
>> or not;
>> In order to satisfy 2, we should add this new transformation after
>> "pass_late_warn_uninitialized".
>> 
>> So, we should be able to check whether an auto-variable has initializer or 
>> not after “pass_late_warn_uninitialized”,
>> If Not, then insert an initialization for it.
>> 
>> For this purpose, I guess that “FOR_EACH_LOCAL_DECL” might be better?
>> 
>> 
>> Yes, but do you want to catch variables promoted to register as well
>> or just variables
>> on the stack?
>> 
>> 
>> I think both as long as they are source-level auto-variables. Then which one 
>> is better?
>> 
>> 
>> Another issue is, in order to check whether an auto-variable has 
>> initializer, I plan to add a new bit in “decl_common” as:
>> /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>> unsigned decl_is_initialized :1;
>> 
>> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
>> #define DECL_IS_INITIALIZED(NODE) \
>> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>> 
>> set this bit when setting DECL_INITIAL for the variables in FE. then keep it
>> even though DECL_INITIAL might be NULLed.
>> 
>> 
>> For locals it would be more reliable to set this flag during gimplification.
>> 
>> 
>> You mean I can set the flag “DECL_IS_INITIALIZED (decl)”  inside the routine 
>> “gimpley_decl_expr” (gimplify.c) as following:
>> 
>>  if (VAR_P (decl) && !DECL_EXTERNAL (decl))
>>{
>>  tree init = DECL_INITIAL (decl);
>> ...
>>  if (init && init != error_mark_node)
>>{
>>  if (!TREE_STATIC (decl))
>>{
>>  DECL_IS_INITIALIZED(decl) = 1;
>>}
>> 
>> Is this enough for all Frontends? Are there other places that I need to 
>> maintain this bit?
>> 
>> 
>> 
>> Do you have any comment and suggestions?
>> 
>> 
>> As said above - do you want to cover registers as well as locals?
>> 
>> 
>> All the locals from the source-code point of view should be covered.   (From 
>> my study so far,  looks like that Clang adds that phase in FE).
>> If GCC adds this phase in FE, then the following design requirement
>> 
>> C. The implementation needs to keep the current static warning on 
>> uninitialized
>> variables untouched in order to avoid "forking the language”.
>> 
>> cannot be satisfied.  Since gcc’s uninitialized variables analysis is 
>> applied quite late.
>> 
>> So, we have to add this new phase after “pass_late_warn_uninitialized”.
>> 
>> I'd do
>> the actual zeroing during RTL expansion instead since otherwise you
>> have to figure youself whether a local is actually used (see 
>> expand_stack_vars)
>> 
>> 
>> Adding  this new transformation during RTL expansion is okay.  I will check 
>> on this in more details to see how to add it to RTL expansion phase.
>> 
>> 
>> Note that optimization will already made have use of "uninitialized" state
>> of locals so depending on what the actual goal is here "late" may be too 
>> late.
>> 
>> 
>> This is a really good point…
>> 
>> In order to avoid optimization  to use the “uninitialized” state of locals, 
>> we should add the zeroing phase as early as possible (adding it in FE might 
>> be best
>> for this issue). However, if we have to met the following requirement:
> 
> So is optimization supposed to pick up zero or is it supposed to act
> as if the initializer
> is unknown?

Good question!

Theoretically,  the new option -ftrivial-auto-var-init=zero is supposed to add 

Re: How to traverse all the local variables that declared in the current routine?

2020-11-24 Thread Qing Zhao via Gcc-patches



> On Nov 24, 2020, at 9:55 AM, Richard Biener  
> wrote:
> 
> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao  wrote:
>> 
>> 
>> 
>>> On Nov 24, 2020, at 1:32 AM, Richard Biener  
>>> wrote:
>>> 
>>> On Tue, Nov 24, 2020 at 12:05 AM Qing Zhao via Gcc-patches
>>>  wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> Does gcc provide an iterator to traverse all the local variables that are 
>>>> declared in the current routine?
>>>> 
>>>> If not, what’s the best way to traverse the local variables?
>>> 
>>> Depends on what for.  There's the source level view you get by walking
>>> BLOCK_VARS of the
>>> scope tree, theres cfun->local_variables (FOR_EACH_LOCAL_DECL) and
>>> there's SSA names
>>> (FOR_EACH_SSA_NAME).
>> 
>> I am planing to add a new phase immediately after 
>> “pass_late_warn_uninitialized” to initialize all auto-variables that are
>> not explicitly initialized in the declaration, the basic idea is following:
>> 
>> ** The proposal:
>> 
>> A. add a new GCC option: (same name and meaning as CLANG)
>> -ftrivial-auto-var-init=[pattern|zero], similar pattern init as CLANG;
>> 
>> B. add a new attribute for variable:
>> __attribute((uninitialized)
>> the marked variable is uninitialized intentionaly for performance purpose.
>> 
>> C. The implementation needs to keep the current static warning on 
>> uninitialized
>> variables untouched in order to avoid "forking the language".
>> 
>> 
>> ** The implementation:
>> 
>> There are two major requirements for the implementation:
>> 
>> 1. all auto-variables that do not have an explicit initializer should be 
>> initialized to
>> zero by this option.  (Same behavior as CLANG)
>> 
>> 2. keep the current static warning on uninitialized variables untouched.
>> 
>> In order to satisfy 1, we should check whether an auto-variable has 
>> initializer
>> or not;
>> In order to satisfy 2, we should add this new transformation after
>> "pass_late_warn_uninitialized".
>> 
>> So, we should be able to check whether an auto-variable has initializer or 
>> not after “pass_late_warn_uninitialized”,
>> If Not, then insert an initialization for it.
>> 
>> For this purpose, I guess that “FOR_EACH_LOCAL_DECL” might be better?
> 
> Yes, but do you want to catch variables promoted to register as well
> or just variables
> on the stack?

I think both as long as they are source-level auto-variables. Then which one is 
better?

> 
>> Another issue is, in order to check whether an auto-variable has 
>> initializer, I plan to add a new bit in “decl_common” as:
>>  /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>>  unsigned decl_is_initialized :1;
>> 
>> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
>> #define DECL_IS_INITIALIZED(NODE) \
>>  (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>> 
>> set this bit when setting DECL_INITIAL for the variables in FE. then keep it
>> even though DECL_INITIAL might be NULLed.
> 
> For locals it would be more reliable to set this flag during gimplification.

You mean I can set the flag “DECL_IS_INITIALIZED (decl)” inside the routine 
“gimplify_decl_expr” (gimplify.c) as follows:

  if (VAR_P (decl) && !DECL_EXTERNAL (decl))
    {
      tree init = DECL_INITIAL (decl);
      ...
      if (init && init != error_mark_node)
        {
          if (!TREE_STATIC (decl))
            {
              DECL_IS_INITIALIZED(decl) = 1;
            }

Is this enough for all Frontends? Are there other places that I need to 
maintain this bit? 


> 
>> Do you have any comment and suggestions?
> 
> As said above - do you want to cover registers as well as locals?

All the locals from the source-code point of view should be covered.  (From my
study so far, it looks like Clang adds that phase in the FE.)
If GCC adds this phase in the FE, then the following design requirement

C. The implementation needs to keep the current static warning on uninitialized
variables untouched in order to avoid "forking the language".

cannot be satisfied, since GCC’s uninitialized variable analysis is applied
quite late.

So, we have to add this new phase after “pass_late_warn_uninitialized”.

>  I'd do
> the actual zeroing during RTL expansion instead since otherwise you
> have to figure youself whether a local is actually used (see 
> expand_stack_vars)

Adding  this new transformation during RTL expansion is okay.  I will check on 
this in more details to se

Re: How to traverse all the local variables that declared in the current routine?

2020-11-24 Thread Qing Zhao via Gcc-patches



> On Nov 24, 2020, at 1:32 AM, Richard Biener  
> wrote:
> 
> On Tue, Nov 24, 2020 at 12:05 AM Qing Zhao via Gcc-patches
>  wrote:
>> 
>> Hi,
>> 
>> Does gcc provide an iterator to traverse all the local variables that are 
>> declared in the current routine?
>> 
>> If not, what’s the best way to traverse the local variables?
> 
> Depends on what for.  There's the source level view you get by walking
> BLOCK_VARS of the
> scope tree, theres cfun->local_variables (FOR_EACH_LOCAL_DECL) and
> there's SSA names
> (FOR_EACH_SSA_NAME).

I am planning to add a new phase immediately after 
“pass_late_warn_uninitialized” to initialize all auto-variables that are
not explicitly initialized in the declaration. The basic idea is as follows:

** The proposal:

A. add a new GCC option: (same name and meaning as CLANG)
-ftrivial-auto-var-init=[pattern|zero], similar pattern init as CLANG;

B. add a new attribute for variable:
__attribute__((uninitialized))
the marked variable is intentionally left uninitialized for performance reasons.

C. The implementation needs to keep the current static warning on uninitialized
variables untouched in order to avoid "forking the language".


** The implementation:

There are two major requirements for the implementation:

1. all auto-variables that do not have an explicit initializer should be 
initialized to
zero by this option.  (Same behavior as CLANG)

2. keep the current static warning on uninitialized variables untouched.

In order to satisfy 1, we should check whether an auto-variable has an
initializer or not;
In order to satisfy 2, we should add this new transformation after
"pass_late_warn_uninitialized".

So, we should be able to check whether an auto-variable has an initializer or
not after “pass_late_warn_uninitialized”, and if not, insert an
initialization for it.

For this purpose, I guess that “FOR_EACH_LOCAL_DECL” might be better?

Another issue is, in order to check whether an auto-variable has initializer, I 
plan to add a new bit in “decl_common” as:
  /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
  unsigned decl_is_initialized :1;

/* IN VAR_DECL, set when the decl is initialized at the declaration.  */
#define DECL_IS_INITIALIZED(NODE) \
  (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)

Set this bit when setting DECL_INITIAL for the variables in the FE, then keep
it even though DECL_INITIAL might later be set to NULL.

Do you have any comments or suggestions?

Thanks a lot for the help.

Qing

> Richard.
> 
>> 
>> Thanks.
>> 
>> Qing
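
The selection rule in the proposal above (initialize every automatic variable that has no explicit initializer and is not marked with the attribute) can be sketched as a small toy model in C. This is an illustration under stated assumptions, not GCC code: struct toy_decl and toy_mark_auto_var_init are hypothetical names, the boolean fields merely mimic TREE_STATIC, the proposed DECL_IS_INITIALIZED bit, and the proposed attribute, and the plain loop stands in for whatever iterator (for example FOR_EACH_LOCAL_DECL) the real pass would use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a local variable declaration.  This is NOT GCC's
   tree/VAR_DECL; every name here is hypothetical.  */
struct toy_decl
{
  const char *name;
  bool is_static;           /* models TREE_STATIC */
  bool is_initialized;      /* models the proposed DECL_IS_INITIALIZED bit */
  bool attr_uninitialized;  /* models the proposed "uninitialized" attribute */
  bool needs_zero_init;     /* output: set by the pass below */
};

/* Walk every local declaration and flag the automatic variables that have
   no explicit initializer and were not opted out by the attribute.
   Returns how many declarations were flagged.  */
static size_t
toy_mark_auto_var_init (struct toy_decl *decls, size_t n)
{
  assert (decls != NULL || n == 0);
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    {
      struct toy_decl *d = &decls[i];
      d->needs_zero_init = (!d->is_static
                            && !d->is_initialized
                            && !d->attr_uninitialized);
      if (d->needs_zero_init)
        count++;
    }
  return count;
}
```

Keeping the "has an initializer" fact in a separate bit, rather than re-deriving it from the initializer itself, mirrors the motivation given above: the initializer may have been cleared by the time the pass runs.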



How to traverse all the local variables that declared in the current routine?

2020-11-23 Thread Qing Zhao via Gcc-patches
Hi, 

Does gcc provide an iterator to traverse all the local variables that are 
declared in the current routine? 

If not, what’s the best way to traverse the local variables?

Thanks.

Qing

[patch][rtl-optimization][i386][pr97777] Fix a reg-stack df maintenance bug triggered by zero-call-used-regs pass.

2020-11-19 Thread Qing Zhao via Gcc-patches
Hi, 

PR97777 - ICE: in df_refs_verify, at df-scan.c:3991 with -O -ffinite-math-only 
-fzero-call-used-regs=all

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97777

This is a bug triggered by the new pass zero-call-used-regs; however, it is an
old bug in the “reg-stack” pass.
This pass does not correctly maintain the df information after its
transformation.

Since the transformation in the reg-stack pass is quite complicated, involving
both instruction changes and control flow changes, I called
“df_insn_rescan_all” after the transformation is done.

The patch has been tested with bootstrap with 
--enable-checking=yes,rtl,df,extra, no regression. 

Okay for commit?

Qing

From c2573c6c8552b7b4c2eedb0684ce48b5c11436ec Mon Sep 17 00:00:00 2001
From: qing zhao 
Date: Thu, 19 Nov 2020 16:46:50 +0100
Subject: [PATCH] rtl-optimization: Fix data flow maintenance bug in
 reg-stack.c [pr97777]

reg-stack pass does not maintain the data flow information correctly.
call df_insn_rescan_all after the transformation is done.

gcc/
PR rtl-optimization/97777
* reg-stack.c (rest_of_handle_stack_regs): call
df_insn_rescan_all if reg_to_stack return true.

gcc/testsuite/
PR rtl-optimization/97777
* gcc.target/i386/pr97777.c: New test.
---
 gcc/reg-stack.c | 3 ++-
 gcc/testsuite/gcc.target/i386/pr97777.c | 9 +++++++++
 2 files changed, 11 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr97777.c

diff --git a/gcc/reg-stack.c b/gcc/reg-stack.c
index 8f98bd85750..3dab843f803 100644
--- a/gcc/reg-stack.c
+++ b/gcc/reg-stack.c
@@ -3426,7 +3426,8 @@ static unsigned int
 rest_of_handle_stack_regs (void)
 {
 #ifdef STACK_REGS
-  reg_to_stack ();
+  if (reg_to_stack ())
+    df_insn_rescan_all ();
   regstack_completed = 1;
 #endif
   return 0;
diff --git a/gcc/testsuite/gcc.target/i386/pr97777.c b/gcc/testsuite/gcc.target/i386/pr97777.c
new file mode 100644
index 00000000000..fcefc098637
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr97777.c
@@ -0,0 +1,9 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O -fzero-call-used-regs=used -ffinite-math-only" } */
+
+float
+foo (void)
+{
+  return __builtin_fmod (0, 0);
+}
+
-- 
2.11.0




Re: [stage1][PATCH] Change semantics of -frecord-gcc-switches and add -frecord-gcc-switches-format.

2020-11-16 Thread Qing Zhao via Gcc-patches



> On Nov 16, 2020, at 4:29 AM, Martin Liška  wrote:
> 
> On 11/10/20 9:53 PM, Qing Zhao wrote:
>> The deadline for gcc11 stage 1 is approaching.  The pinged patch is one that 
>> has been sent for review 8 months ago in order to
>> Make into gcc11.
> 
> Hello.
> 
> You didn't miss the deadline as all patches sent before stage1 can be 
> reviewed even during stage3.
> Note that many upstream developers (and maintainers) were busy with feature 
> development and now
> will be more time for a review.
> 
> Thanks for patience. You can bet on having that in GCC 11.1.0

Thank you for this information.

Qing
> 
> Martin



Re: [stage1][PATCH] Change semantics of -frecord-gcc-switches and add -frecord-gcc-switches-format.

2020-11-10 Thread Qing Zhao via Gcc-patches
Jakub and Jeff,

PING^7 on the following patch proposed 8 months ago for gcc11:

https://gcc.gnu.org/pipermail/gcc-patches/2020-March/542198.html 


The deadline for gcc11 stage 1 is approaching.  The pinged patch was sent for
review 8 months ago in order to make it into gcc11.

This is an important feature that our company has been waiting on for a long
time.

Could you please take a look at this patch and let us know whether it’s ready
for commit into gcc11?

Thanks a lot.

Qing




> On Oct 27, 2020, at 5:56 AM, Martin Liška  wrote:
> 
> PING^6
> 
> The patch is in review process for more than 6 months, can please any global
> reviewer take a look at it?
> 
> Thanks,
> Martin
> 
> On 9/25/20 4:55 PM, Martin Liška wrote:
>> PING^5
> 



[Patch][i386][PR97715]: Fix a bug when adding -fzero-call-used-regs=all with -mno-80387

2020-11-04 Thread Qing Zhao via Gcc-patches
As we discussed in the bug report, we should not zero stack registers when 
there are no x87 registers available.

The following is the fix per Jakub’s suggestion. 

And I have tested it on X86.

Okay for commit?

thanks.

Qing

From 0080f104df2dc752969a1949981ba343f276e802 Mon Sep 17 00:00:00 2001
From: qing zhao 
Date: Wed, 4 Nov 2020 20:46:15 +0100
Subject: [PATCH] i386: Fix PR97715

This change fixes a bug in the i386 backend when adding
-fzero-call-used-regs=all on a target that has no x87
registers.

When there are no x87 registers available, we should not
zero stack registers.

gcc/ChangeLog:

PR target/97715
* config/i386/i386.c (zero_all_st_registers): Return
earlier when the FPU is disabled.

gcc/testsuite/ChangeLog:

PR target/97715
* gcc.target/i386/zero-scratch-regs-32.c: New test.
---
 gcc/config/i386/i386.c   |  5 +
 gcc/testsuite/gcc.target/i386/zero-scratch-regs-32.c | 11 +++
 2 files changed, 16 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-32.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 6fc6228a26e..789ef727cf8 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -3640,6 +3640,11 @@ zero_all_vector_registers (HARD_REG_SET need_zeroed_hardregs)
 static bool
 zero_all_st_registers (HARD_REG_SET need_zeroed_hardregs)
 {
+
+  /* If the FPU is disabled, no need to zero all st registers.  */
+  if (! (TARGET_80387 || TARGET_FLOAT_RETURNS_IN_80387))
+    return false;
+
   unsigned int num_of_st = 0;
   for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
 if ((STACK_REGNO_P (regno) || MMX_REGNO_P (regno))
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-32.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-32.c
new file mode 100644
index 00000000000..ca3261fe5ea
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-32.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all -mno-80387" } */
+
+int
+foo (int x)
+{
+  return (x + 1);
+}
+
+/* { dg-final { scan-assembler-not "fldz" } } */
+
-- 
2.11.0



Re: Testsuite fails on PowerPC with: Re: [PATCH][middle-end][i386][version 6]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-gpr-arg|all-arg|used-gpr|all-gpr|used|all])

2020-11-04 Thread Qing Zhao via Gcc-patches



> On Nov 4, 2020, at 1:00 PM, Segher Boessenkool  
> wrote:
> 
> On Wed, Nov 04, 2020 at 01:20:58PM +0000, Richard Sandiford wrote:
>> Tobias Burnus  writes:
>>> Three of the testcases fail on PowerPC: 
>>> gcc.target/i386/zero-scratch-regs-{9,10,11}.c
>>>   powerpc64le-linux-gnu/default/gcc.d/zero-scratch-regs-10.c:77:1: sorry, 
>>> unimplemented: '-fzero-call-used_regs' not supported on this target
>>> 
>>> Did you miss some dg-require-effective-target ?
>> 
>> No, these are a signal to target maintainers that they need
>> to decide whether to add support or accept the status quo
>> (in which case a new effective-target will be needed).  See:
>> https://gcc.gnu.org/pipermail/gcc-patches/2020-October/557595.html
>>  :
>> 
>>The new tests are likely to fail on some targets with the sorry()
>>message, but I think target maintainers are best placed to decide
>>whether (a) that's a fundamental restriction of the target and the
>>tests should just be skipped or (b) the target needs to implement
>>the new hook.
> 
> But why are tests in gcc.target/i386/ run for other targets at all?!

No,  tests in gcc.target/i386 should not run for PowerPC.

What Tobias Burnus mentioned are the following tests:

powerpc64le-linux-gnu-gcc.sum:FAIL: c-c++-common/zero-scratch-regs-10.c  -Wc++-compat  (test for excess errors)
powerpc64le-linux-gnu-gcc.sum:FAIL: c-c++-common/zero-scratch-regs-11.c  -Wc++-compat  (test for excess errors)
powerpc64le-linux-gnu-gcc.sum:FAIL: c-c++-common/zero-scratch-regs-9.c  -Wc++-compat  (test for excess errors)
powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-10.c  -std=gnu++98 (test for excess errors)
powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-10.c  -std=gnu++14 (test for excess errors)
powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-10.c  -std=gnu++17 (test for excess errors)
powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-10.c  -std=gnu++2a (test for excess errors)
powerpc64le-linux-gnu-g++.sum:FAIL: c-c++-common/zero-scratch-regs-11.c  -std=gnu++98 (test for excess errors)


They are under c-c++-common, not gcc.target/i386. 

These test cases are intentionally added on all platforms in order to check
whether the current middle-end default implementation of
-fzero-call-used-regs works on the specific platform.

If the default implementation doesn’t work on a specific platform, for
example PowerPC, it’s better for the maintainer of that platform to decide
whether to skip these test cases there or add a target-specific
implementation.

Qing
> 
> 
> Segher



Re: [PATCH][middle-end][i386][version 6]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-gpr-arg|all-arg|used-gpr|all-gpr|used|all]

2020-10-30 Thread Qing Zhao via Gcc-patches
FYI.

I just committed the patch to gcc11 as:

https://gcc.gnu.org/pipermail/gcc-cvs/2020-October/336263.html 


Qing

Re: [PATCH][middle-end][i386][version 6]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-gpr-arg|all-arg|used-gpr|all-gpr|used|all]

2020-10-30 Thread Qing Zhao via Gcc-patches



> On Oct 30, 2020, at 4:54 AM, Richard Sandiford  
> wrote:
> 
> Qing Zhao  writes:
>> @@ -3996,22 +3996,19 @@ with a named @code{target} must be @code{static}.
>> @cindex @code{zero_call_used_regs} function attribute
>> 
>> The @code{zero_call_used_regs} attribute causes the compiler to zero
>> -a subset of all call-used registers at function return according to
>> -@var{choice}.
>> -This is used to increase the program security by either mitigating
>> -Return-Oriented Programming (ROP) or preventing information leak
>> +a subset of all call-used registers@footnote{A ``call-used'' register
>> +is a register whose contents can be changed by a function call;
>> +therefore, a caller cannot assume that the register has the same contents
>> +on return from the function as it had before calling the function.  Such
>> +registers are also called ``call-clobbered'', ``caller-saved'', or
>> +``volatile''.} at function return.
>> +This is used to increase program security by either mitigating
>> +Return-Oriented Programming (ROP) attacks or preventing information leakage
>> through registers.
>> 
>> -A ``call-used'' register is a register whose contents can be changed by
>> -a function call; therefore, a caller cannot assume that the register has
>> -the same contents on return from the function as it had before calling
>> -the function.  Such registers are also called ``call-clobbered'',
>> -``caller-saved'', or ``volatile''.
>> -
>> In order to satisfy users with different security needs and control the
>> -run-time overhead at the same time, GCC provides a flexible way to choose
>> -the subset of the call-used registers to be zeroed.
>> -
>> +run-time overhead at the same time, @var{choice} parameter provides a
> 
> I suggested “the @var{choice} parameter provides” in the review yesterday.
> The “the” is needed.
My bad, added it.
> 
>> +flexible way to choose the subset of the call-used registers to be zeroed.
>> The three basic values of @var{choice} are:
>> 
>> @itemize @bullet
>> @@ -4046,42 +4043,41 @@ together, they must appear in the order above.
>> 
>> The full list of @var{choice}s is therefore:
>> 
>> -@itemize @bullet
>> -@item
>> -@samp{skip} doesn't zero any call-used register.
>> +@table @code
>> +@item skip
>> +doesn't zero any call-used register.
>> 
>> -@item
>> -@samp{used} only zeros call-used registers that are used in the function.
>> +@item used
>> +only zeros call-used registers that are used in the function.
>> 
>> -@item
>> -@samp{all} zeros all call-used registers.
>> +@item used-gpr
>> +only zeros call-used general purpose registers that are used in the 
>> function.
>> 
>> -@item
>> -@samp{used-arg} only zeros used call-used registers that pass arguments.
>> +@item used-arg
>> +only zeros call-used registers that are used in the function and pass 
>> arguments.
>> 
>> -@item
>> -@samp{used-gpr} only zeros used call-used general purpose registers.
>> +@item used-gpr-arg
>> +only zeros call-used general purpose registers that are used in the function
>> +and pass arguments.
>> 
>> -@item
>> -@samp{used-gpr-arg} only zeros used call-used general purpose registers that
>> -pass arguments.
>> +@item all
>> +zeros all call-used registers.
>> 
>> -@item
>> -@samp{all-gpr-arg} zeros all call-used general purpose registers that pass
>> -arguments.
>> +@item all-gpr
>> +zeros all call-used general purpose registers.
>> 
>> -@item
>> -@samp{all-arg} zeros all call-used registers that pass arguments.
>> +@item all-arg
>> +zeros all call-used registers that pass arguments.
>> 
>> -@item
>> -@samp{all-gpr} zeros all call-used general purpose registers.
>> -@end itemize
>> +@item all-gpr-arg
>> +zeros all call-used general purpose registers that pass
>> +arguments.
>> +@end table
> 
> TBH I also think the order I suggested yesterday is more natural
> than this one, but either's OK.  The above certainly addresses
> the original concern I had about the order being inconsistent.

You suggested:

- skip
- used
- used-arg
- used-gpr
- used-gpr-arg
- all
- all-arg
- all-gpr
- all-gpr-arg

I changed it to the following (swapping used-arg with used-gpr, and all-arg with all-gpr):

- skip
- used
- used-gpr
- used-arg
- used-gpr-arg
- all
- all-gpr
- all-arg
- all-gpr-arg

I made this change intentionally so that “gpr” comes before “arg”.
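
For readers who have not seen the attribute in use, the following is a minimal usage sketch in C. The attribute name and the "used-gpr" choice come from the documentation quoted above; the macro ZERO_USED_GPR and the function add_one are hypothetical names chosen for illustration. The sketch assumes a compiler that provides __has_attribute, and it falls back to an empty macro when the attribute is not available, so the code should still compile (without the zeroing) on older compilers.

```c
/* Pick the attribute only if the compiler reports support for it.
   ZERO_USED_GPR is a hypothetical convenience macro, not part of GCC.  */
#if defined(__has_attribute)
# if __has_attribute (zero_call_used_regs)
#  define ZERO_USED_GPR __attribute__ ((zero_call_used_regs ("used-gpr")))
# endif
#endif
#ifndef ZERO_USED_GPR
# define ZERO_USED_GPR /* attribute unavailable: expands to nothing */
#endif

/* On return, the call-used general purpose registers that this function
   used are zeroed (when the attribute is supported).  The return value
   itself is unaffected.  */
ZERO_USED_GPR
int
add_one (int x)
{
  return x + 1;
}
```

The attribute only changes what is left behind in registers at function return; the observable result of add_one is the same with or without it. The same choice strings can also be applied to a whole translation unit with the -fzero-call-used-regs= option instead of per function.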

> 
>> @@ -288,7 +288,7 @@ enum sanitize_code {
>> };
>> 
>> /* Different settings for zer

[PATCH][middle-end][i386][version 6]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-gpr-arg|all-arg|used-gpr|all-gpr|used|all]

2020-10-29 Thread Qing Zhao via Gcc-patches
Hi, 

This is the 6th version of the implementation of the -fzero-call-used-regs patch.

The x86 part is exactly the same as in the 5th version (and it has already been 
approved by Uros).

The major changes compared to the previous version (5th version) are:

1. Documentation changes per Richard’s suggestion;
2. Other minor changes;
3. General test case updates per Richard’s suggestion.


I have tested this new GCC on both x86 and arm64, with no regressions.

Richard, could you please let me know whether it’s ready for GCC 11 stage 1?

thanks.

Qing

To make the change easier for you to review, I first list the changes I made 
relative to the 5th version, followed by the whole patch.

*The patch relative to the 5th version:

From 3545cc92b327e11af0dde832a60161da92cc4262 Mon Sep 17 00:00:00 2001
From: qing zhao 
Date: Thu, 29 Oct 2020 18:40:27 +0100
Subject: [PATCH] fix all issues raised by Richard on 10/29

---
 gcc/c-family/c-attribs.c                           |  8 +-
 gcc/common.opt                                     |  2 +-
 gcc/doc/extend.texi                                | 72 -
 gcc/doc/invoke.texi                                |  6 +-
 gcc/emit-rtl.h                                     |  3 -
 gcc/flag-types.h                                   |  2 +-
 gcc/function.c                                     | 56 +++---
 gcc/opts.c                                         |  2 +-
 gcc/targhooks.c                                    |  2 +-
 gcc/testsuite/c-c++-common/zero-scratch-regs-11.c  | 90 +-
 gcc/testsuite/c-c++-common/zero-scratch-regs-2.c   | 13 +---
 gcc/testsuite/c-c++-common/zero-scratch-regs-3.c   | 13 +---
 gcc/testsuite/c-c++-common/zero-scratch-regs-4.c   | 13 +---
 gcc/testsuite/c-c++-common/zero-scratch-regs-5.c   | 13 +---
 gcc/testsuite/c-c++-common/zero-scratch-regs-6.c   | 13 +---
 gcc/testsuite/c-c++-common/zero-scratch-regs-7.c   | 13 +---
 gcc/testsuite/c-c++-common/zero-scratch-regs-8.c   | 13 +---
 gcc/testsuite/c-c++-common/zero-scratch-regs-9.c   | 13 +---
 .../c-c++-common/zero-scratch-regs-attr-usages.c   | 10 ++-
 19 files changed, 90 insertions(+), 267 deletions(-)

diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 1b05e8c..8da1dc7 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -4979,13 +4979,15 @@ handle_zero_call_used_regs_attribute (tree *node, tree 
name, tree args,
   error_at (DECL_SOURCE_LOCATION (decl),
"%qE attribute applies only to functions", name);
   *no_add_attrs = true;
+  return NULL_TREE;
 }
 
   if (TREE_CODE (id) != STRING_CST)
 {
   error_at (DECL_SOURCE_LOCATION (decl),
-   "attribute %qE arguments not a string", name);
+   "%qE argument not a string", name);
   *no_add_attrs = true;
+  return NULL_TREE;
 }
 
   bool found = false;
@@ -5000,8 +5002,8 @@ handle_zero_call_used_regs_attribute (tree *node, tree 
name, tree args,
   if (!found)
 {
   error_at (DECL_SOURCE_LOCATION (decl),
-   "unrecognized zero_call_used_regs attribute: %qs",
-   TREE_STRING_POINTER (id));
+   "unrecognized %qE attribute argument %qs",
+   name, TREE_STRING_POINTER (id));
   *no_add_attrs = true;
 }
 
diff --git a/gcc/common.opt b/gcc/common.opt
index 4a13f32..d716ea1 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -228,7 +228,7 @@ unsigned int flag_sanitize_coverage
 Variable
 bool dump_base_name_prefixed = false
 
-; What subset of registers should be zeroed
+; What subset of registers should be zeroed on function return
 Variable
 unsigned int flag_zero_call_used_regs
 
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index b011c17..25b3909 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -3996,22 +3996,19 @@ with a named @code{target} must be @code{static}.
 @cindex @code{zero_call_used_regs} function attribute
 
 The @code{zero_call_used_regs} attribute causes the compiler to zero
-a subset of all call-used registers at function return according to
-@var{choice}.
-This is used to increase the program security by either mitigating
-Return-Oriented Programming (ROP) or preventing information leak
+a subset of all call-used registers@footnote{A ``call-used'' register
+is a register whose contents can be changed by a function call;
+therefore, a caller cannot assume that the register has the same contents
+on return from the function as it had before calling the function.  Such
+registers are also called ``call-clobbered'', ``caller-saved'', or
+``volatile''.} at function return.
+This is used to increase program security by either mitigating
+Return-Oriented Programming (ROP) attacks or preventing information leakage
 through registers.
 
-A ``call-used'' register is a register whose contents can be changed by
-a function call; therefore, a caller cannot assume that 

Re: [PATCH][middle-end][i386][version 5]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-gpr-arg|all-arg|used-gpr|all-gpr|used|all]

2020-10-29 Thread Qing Zhao via Gcc-patches
Hi, Segher,

> On Oct 29, 2020, at 2:31 PM, Segher Boessenkool  
> wrote:
> 
> On Thu, Oct 29, 2020 at 06:02:58PM +, Richard Sandiford wrote:
>> Qing Zhao via Gcc-patches  writes:
>>>>> +Return-Oriented Programming (ROP) or preventing information leak
>>>> 
>>>> leakage
>>>> 
>>>> (FWIW, I'm not sure “mitigating ROP” is really correct usage, but I don't
>>>> have any better suggestions.)
>>> 
>>> Do you mean whether “mitigating ROP” is one of the major purposes of this 
>>> new feature?
>> 
>> No, I meant just the English usage.  E.g., I think you mitigate the
>> damage caused by earthquakes rather than mitigate earthquakes themselves.
>> But I could be wrong.  It's not a word I use very often ;-)
> 
> "Mitigating ROP attacks" is a phrase often used in the literature, sadly
> (what is really meant is not "mitigating the effects of ROP attacks",
> but simply "making ROP attacks less likely to succeed" -- it is almost
> always a binary thing, either it succeeds or it doesn't).

Thanks.

I will change “Mitigating ROP” to “Mitigating ROP attacks” in the 
documentation. 

Qing
> 
> 
> Segher



Re: [PATCH][middle-end][i386][version 5]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-gpr-arg|all-arg|used-gpr|all-gpr|used|all]

2020-10-29 Thread Qing Zhao via Gcc-patches



> On Oct 29, 2020, at 1:06 PM, Richard Sandiford  
> wrote:
> 
> Qing Zhao  writes:
>> The documentation (gcc.info) now reads as follows; let me know if you see any 
>> issues:
> 
> Yeah, looks good apart from merging
> 
>> In order to satisfy users with different security needs and control
>> the run-time overhead at the same time, CHOICE parameter provides a
>> flexible way to choose the subset of the call-used registers to be
>> zeroed.
>> 
>> The three basic values of CHOICE are:
> 
> this into a single paragraph.

Okay, will do.

Qing
> 
> Thanks,
> Richard



Re: [PATCH][middle-end][i386][version 5]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-gpr-arg|all-arg|used-gpr|all-gpr|used|all]

2020-10-29 Thread Qing Zhao via Gcc-patches



> On Oct 29, 2020, at 1:02 PM, Richard Sandiford  
> wrote:
> 
> Qing Zhao via Gcc-patches  writes:
>>>> +Return-Oriented Programming (ROP) or preventing information leak
>>> 
>>> leakage
>>> 
>>> (FWIW, I'm not sure “mitigating ROP” is really correct usage, but I don't
>>> have any better suggestions.)
>> 
>> Do you mean whether “mitigating ROP” is one of the major purposes of this new 
>> feature?
> 
> No, I meant just the English usage.  E.g., I think you mitigate the
> damage caused by earthquakes rather than mitigate earthquakes themselves.
> But I could be wrong.  It's not a word I use very often ;-)

Okay.
I see. 
> 
>>>> +In order to satisfy users with different security needs and control the
>>>> +run-time overhead at the same time, GCC provides a flexible way to choose
>>>> +the subset of the call-used registers to be zeroed.
>>> 
>>> Maybe s/GCC/the @var{choice} parameter/.
>> Okay.
>>> 
>>>> +
>>>> +The three basic values of @var{choice} are:
>>> 
>>> After which, I think this should be part of the previous paragraph.
>> 
>> Don’t understand here, could you explain a little bit more?
> 
> I meant:
> 
> In order to satisfy users with different security needs and control the
> run-time overhead at the same time, @var{choice} provides a flexible way
> to choose the subset of the call-used registers to be zeroed.  The three
> basic values of @var{choice} are:
> 

Oh. :-)

>>>> +  /* If gpr_only is true, only zero call-used registers that are
>>>> + general-purpose registers; if used_only is true, only zero
>>>> + call-used registers that are used in the current function;
>>>> + if arg_only is true, only zero call-used registers that pass
>>>> + parameters defined by the platform's calling convention.  */
>>>> +
>>>> +  gpr_only = crtl->zero_call_used_regs & ONLY_GPR;
>>>> +  used_only = crtl->zero_call_used_regs & ONLY_USED;
>>>> +  arg_only = crtl->zero_call_used_regs & ONLY_ARG;
>>> 
>>> Guess it would be nice to be consistent about which side the “only”
>>> goes on.  FWIW, I don't mind which way: GPR_ONLY etc. would be
>>> OK with me if you prefer that.
>> The current names are okay for me.
> 
> OK.  But I think one of them should change to match the other.
> E.g. either the local variable should be “only_gpr” or the
> flag should be “GPR_ONLY”.
Okay, I see what you mean, will make them consistent.

Qing
> 
> Thanks,
> Richard
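
The decoding discussed above, with the “only” kept on the same side in both the flag names and the local names, could look roughly like the sketch below. It is an illustration under assumptions, not the patch's code: the ZR_* constants, struct zero_regs_filter, decode_zero_regs and should_zero are all hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; "ONLY" sits on the same side in every name.  */
enum
{
  ZR_ENABLED = 1u << 0,
  ZR_ONLY_USED = 1u << 1,
  ZR_ONLY_GPR = 1u << 2,
  ZR_ONLY_ARG = 1u << 3
};

/* Decoded restrictions, named to match the flag bits.  */
struct zero_regs_filter
{
  bool only_used;
  bool only_gpr;
  bool only_arg;
};

/* Split a flag word into the three independent restrictions.  */
static struct zero_regs_filter
decode_zero_regs (unsigned int flags)
{
  struct zero_regs_filter f;

  assert (flags & ZR_ENABLED);
  f.only_used = (flags & ZR_ONLY_USED) != 0;
  f.only_gpr = (flags & ZR_ONLY_GPR) != 0;
  f.only_arg = (flags & ZR_ONLY_ARG) != 0;
  return f;
}

/* Return true if a register with the given properties passes every
   restriction that is switched on in F.  */
static bool
should_zero (struct zero_regs_filter f, bool is_used, bool is_gpr,
             bool is_arg)
{
  if (f.only_used && !is_used)
    return false;
  if (f.only_gpr && !is_gpr)
    return false;
  if (f.only_arg && !is_arg)
    return false;
  return true;
}
```

Each restriction only ever removes registers from the set, which is why the modifiers can be combined freely.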



Re: [PATCH][middle-end][i386][version 5]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-gpr-arg|all-arg|used-gpr|all-gpr|used|all]

2020-10-29 Thread Qing Zhao via Gcc-patches
Hi, Richard,


The documentation (gcc.info) now reads as follows; let me know if you see any 
issues:

thanks.

Qing
==

'zero_call_used_regs ("CHOICE")'

 The 'zero_call_used_regs' attribute causes the compiler to zero a
 subset of all call-used registers(1) at function return.  This is
 used to increase program security by either mitigating
 Return-Oriented Programming (ROP) or preventing information leakage
 through registers.

 In order to satisfy users with different security needs and control
 the run-time overhead at the same time, CHOICE parameter provides a
 flexible way to choose the subset of the call-used registers to be
 zeroed.

 The three basic values of CHOICE are:

* 'skip' doesn't zero any call-used registers.

* 'used' only zeros call-used registers that are used in the
  function.  A "used" register is one whose content has been set
  or referenced in the function.

* 'all' zeros all call-used registers.

 In addition to these three basic choices, it is possible to modify
 'used' or 'all' as follows:

* Adding '-gpr' restricts the zeroing to general-purpose
  registers.

* Adding '-arg' restricts the zeroing to registers that can
  sometimes be used to pass function arguments.  This includes
  all argument registers defined by the platform's calling
  convention, regardless of whether the function uses those
  registers for function arguments or not.

 The modifiers can be used individually or together.  If they are
 used together, they must appear in the order above.

 The full list of CHOICEs is therefore:

 'skip'
  doesn't zero any call-used register.

 'used'
  only zeros call-used registers that are used in the function.

 'used-gpr'
  only zeros call-used general purpose registers that are used
  in the function.

 'used-arg'
  only zeros call-used registers that are used in the function
  and pass arguments.

 'used-gpr-arg'
  only zeros call-used general purpose registers that are used
  in the function and pass arguments.

 'all'
  zeros all call-used registers.

 'all-gpr'
  zeros all call-used general purpose registers.

 'all-arg'
  zeros all call-used registers that pass arguments.

 'all-gpr-arg'
  zeros all call-used general purpose registers that pass
  arguments.

 Of this list, 'used-arg', 'used-gpr-arg', 'all-arg', and
 'all-gpr-arg' are mainly used for ROP mitigation.

 The default for the attribute is controlled by
 '-fzero-call-used-regs'.

   -- Footnotes --

   (1) A "call-used" register is a register whose contents can be
changed by a function call; therefore, a caller cannot assume that the
register has the same contents on return from the function as it had
before calling the function.  Such registers are also called
"call-clobbered", "caller-saved", or "volatile".


'-fzero-call-used-regs=CHOICE'
 Zero call-used registers at function return to increase program
 security by either mitigating Return-Oriented Programming (ROP) or
 preventing information leakage through registers.

 The possible values of CHOICE are the same as for the
 'zero_call_used_regs' attribute (*note Function Attributes::).  The
 default is 'skip'.

 You can control this behavior for a specific function by using the
 function attribute 'zero_call_used_regs' (*note Function
 Attributes::).
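
As a usage illustration of the attribute documented above, here is a minimal sketch. The attribute spelling and the 'used-gpr' value come from the documentation text; the ZERO_REGS wrapper macro and the check_pin function are hypothetical, and the macro falls back to nothing on compilers that do not recognize the attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Expand to the attribute only when the compiler knows it.  */
#if defined(__has_attribute)
# if __has_attribute(zero_call_used_regs)
#  define ZERO_REGS(choice) __attribute__ ((zero_call_used_regs (choice)))
# endif
#endif
#ifndef ZERO_REGS
# define ZERO_REGS(choice)
#endif

/* Hypothetical function handling sensitive data: ask the compiler to zero
   the call-used general-purpose registers it used before returning.  */
ZERO_REGS ("used-gpr") static int
check_pin (const char *candidate, const char *expected)
{
  assert (candidate != NULL && expected != NULL);
  return strcmp (candidate, expected) == 0;
}
```

The same choice could instead be applied to a whole translation unit with -fzero-call-used-regs=used-gpr, leaving the attribute for functions that need a different setting.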



Re: [PATCH][middle-end][i386][version 5]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-gpr-arg|all-arg|used-gpr|all-gpr|used|all]

2020-10-29 Thread Qing Zhao via Gcc-patches



> On Oct 29, 2020, at 6:09 AM, Richard Sandiford  
> wrote:
> 
> Qing Zhao via Gcc-patches  writes:
>> +/* Handle a "zero_call_used_regs" attribute; arguments as in
>> +   struct attribute_spec.handler.  */
>> +
>> +static tree
>> +handle_zero_call_used_regs_attribute (tree *node, tree name, tree args,
>> +  int ARG_UNUSED (flags),
>> +  bool *no_add_attrs)
>> +{
>> +  tree decl = *node;
>> +  tree id = TREE_VALUE (args);
>> +
>> +  if (TREE_CODE (decl) != FUNCTION_DECL)
>> +{
>> +  error_at (DECL_SOURCE_LOCATION (decl),
>> +"%qE attribute applies only to functions", name);
>> +  *no_add_attrs = true;
>> +}
>> +
>> +  if (TREE_CODE (id) != STRING_CST)
>> +{
>> +  error_at (DECL_SOURCE_LOCATION (decl),
>> +"attribute %qE arguments not a string", name);
> 
> The existing message for this seems to be:
> 
>  "%qE argument not a string"
> 
> (which seems a bit terse, but hey)

Okay.
> 
>> +  *no_add_attrs = true;
>> +}
>> +
>> +  bool found = false;
>> +  for (unsigned int i = 0; zero_call_used_regs_opts[i].name != NULL; ++i)
>> +if (strcmp (TREE_STRING_POINTER (id),
>> +zero_call_used_regs_opts[i].name) == 0)
>> +  {
>> +found = true;
>> +break;
>> +  }
>> +
>> +  if (!found)
>> +{
>> +  error_at (DECL_SOURCE_LOCATION (decl),
>> +"unrecognized zero_call_used_regs attribute: %qs",
>> +TREE_STRING_POINTER (id));
> 
> The attribute name needs to be quoted, and it would be good if it
> wasn't hard-coded into the string:
> 
>  error_at (DECL_SOURCE_LOCATION (decl),
>   "unrecognized %qE attribute argument %qs", name,
>   TREE_STRING_POINTER (id));
Okay.
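
The validation flow being reviewed here can be sketched outside of GCC as follows. Two points from the thread are illustrated: the lookup walks a NULL-terminated name table, and an invalid argument returns immediately instead of falling through to the string comparison (the later revision adds early returns to the handler for this reason). The table contents, struct choice_entry, check_choice and the result codes are hypothetical stand-ins, with a NULL pointer standing in for the non-string case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the NULL-terminated option table.  */
struct choice_entry
{
  const char *name;
  unsigned int flag;
};

static const struct choice_entry choice_table[] = {
  { "skip", 1u }, { "used", 2u }, { "all", 3u }, { NULL, 0u }
};

enum check_result { CHOICE_OK, CHOICE_NOT_STRING, CHOICE_UNRECOGNIZED };

/* Validate ARG against the table.  */
static enum check_result
check_choice (const char *arg)
{
  /* Return early: continuing would hand an invalid argument to strcmp.  */
  if (arg == NULL)
    return CHOICE_NOT_STRING;

  for (size_t i = 0; choice_table[i].name != NULL; ++i)
    if (strcmp (arg, choice_table[i].name) == 0)
      return CHOICE_OK;

  return CHOICE_UNRECOGNIZED;
}

/* Usage: exercise the three outcomes.  */
static void
check_choice_demo (void)
{
  assert (check_choice ("used") == CHOICE_OK);
  assert (check_choice ("bogus") == CHOICE_UNRECOGNIZED);
  assert (check_choice (NULL) == CHOICE_NOT_STRING);
}
```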
> 
>> @@ -228,6 +228,10 @@ unsigned int flag_sanitize_coverage
>> Variable
>> bool dump_base_name_prefixed = false
>> 
>> +; What subset of registers should be zeroed
> 
> Think it would be useful to add “ on function return.”.
Okay.
> 
>> +Variable
>> +unsigned int flag_zero_call_used_regs
>> +
>> ###
>> Driver
>> 
>> diff --git a/gcc/df.h b/gcc/df.h
>> index 8b6ca8c..0f098d7 100644
>> --- a/gcc/df.h
>> +++ b/gcc/df.h
>> @@ -1085,6 +1085,7 @@ extern void df_update_entry_exit_and_calls (void);
>> extern bool df_hard_reg_used_p (unsigned int);
>> extern unsigned int df_hard_reg_used_count (unsigned int);
>> extern bool df_regs_ever_live_p (unsigned int);
>> +extern bool df_epilogue_uses_p (unsigned int);
>> extern void df_set_regs_ever_live (unsigned int, bool);
>> extern void df_compute_regs_ever_live (bool);
>> extern void df_scan_verify (void);
>> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
>> index c9f7299..b011c17 100644
>> --- a/gcc/doc/extend.texi
>> +++ b/gcc/doc/extend.texi
>> @@ -3992,6 +3992,96 @@ performing a link with relocatable output (i.e.@: 
>> @code{ld -r}) on them.
>> A declaration to which @code{weakref} is attached and that is associated
>> with a named @code{target} must be @code{static}.
>> 
>> +@item zero_call_used_regs ("@var{choice}")
>> +@cindex @code{zero_call_used_regs} function attribute
>> +
>> +The @code{zero_call_used_regs} attribute causes the compiler to zero
>> +a subset of all call-used registers at function return according to
>> +@var{choice}.
> 
> Suggest dropping “according to @var{choice}” here, since it's now
> disconnected with the part that talks about what @var{choice} is.
Okay
> 
>> +This is used to increase the program security by either mitigating
> 
> s/the program security/program security/
Okay
> 
>> +Return-Oriented Programming (ROP) or preventing information leak
> 
> leakage
> 
> (FWIW, I'm not sure “mitigating ROP” is really correct usage, but I don't
> have any better suggestions.)

Do you mean whether “mitigating ROP” is one of the major purposes of this new 
feature?

The main initial motivation for the new feature is mitigating ROP. Mitigating ROP 
is also the reason for offering choices that zero only the argument-passing 
subset of the registers.

> 
>> +through registers.
>> +
>> +A ``call-used'' register is a register whose contents can be changed by
>> +a function call; therefore, a caller cannot assume that the register has
>> +the same contents on return from the function as it had before calling
>> +the function.  Such regis

[PATCH][middle-end][i386][version 5]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-gpr-arg|all-arg|used-gpr|all-gpr|used|all]

2020-10-28 Thread Qing Zhao via Gcc-patches
Hi, 

This is the 5th version of the implementation of the -fzero-call-used-regs patch.

The major changes compared to the previous version (4th version) are:

1. Documentation changes per Richard’s suggestion;
2. Use a namespace for zero_regs_code;
3. Add more general test cases per Richard’s suggestion;
4. i386 part: clear the ST/MM register sets per Uros’s suggestion;
5. Add more i386 test cases for ST/MM clearing per Uros’s suggestion;
6. Some minor style fixes.

I have tested this new GCC on both x86 and arm64, with no regressions.

Could you please let me know whether it’s ready for GCC 11 stage 1?

Thanks.

Qing

**The documentation (gcc.info):
'zero_call_used_regs ("CHOICE")'

 The 'zero_call_used_regs' attribute causes the compiler to zero a
 subset of all call-used registers at function return according to
 CHOICE.  This is used to increase the program security by either
 mitigating Return-Oriented Programming (ROP) or preventing
 information leak through registers.

 A "call-used" register is a register whose contents can be changed
 by a function call; therefore, a caller cannot assume that the
 register has the same contents on return from the function as it
 had before calling the function.  Such registers are also called
 "call-clobbered", "caller-saved", or "volatile".

 In order to satisfy users with different security needs and control
 the run-time overhead at the same time, GCC provides a flexible way
 to choose the subset of the call-used registers to be zeroed.

 The three basic values of CHOICE are:

* 'skip' doesn't zero any call-used registers.

* 'used' only zeros call-used registers that are used in the
  function.  A "used" register is one whose content has been set
  or referenced in the function.

* 'all' zeros all call-used registers.

 In addition to these three basic choices, it is possible to modify
 'used' or 'all' as follows:

* Adding '-gpr' restricts the zeroing to general-purpose
  registers.

* Adding '-arg' restricts the zeroing to registers that can
  sometimes be used to pass function arguments.  This includes
  all argument registers defined by the platform's calling
  convention, regardless of whether the function uses those
  registers for function arguments or not.

 The modifiers can be used individually or together.  If they are
 used together, they must appear in the order above.

 The full list of CHOICEs is therefore:

* 'skip' doesn't zero any call-used register.

* 'used' only zeros call-used registers that are used in the
  function.

* 'all' zeros all call-used registers.

* 'used-arg' only zeros used call-used registers that pass
  arguments.

* 'used-gpr' only zeros used call-used general purpose
  registers.

* 'used-gpr-arg' only zeros used call-used general purpose
  registers that pass arguments.

* 'all-gpr-arg' zeros all call-used general purpose registers
  that pass arguments.

* 'all-arg' zeros all call-used registers that pass arguments.

* 'all-gpr' zeros all call-used general purpose registers.

 Among this list, 'used-gpr-arg', 'used-arg', 'all-gpr-arg', and
 'all-arg' are mainly used for ROP mitigation.

 The default for the attribute is controlled by
 '-fzero-call-used-regs'.

'-fzero-call-used-regs=CHOICE'
 Zero call-used registers at function return to increase the program
 security by either mitigating Return-Oriented Programming (ROP) or
 preventing information leak through registers.

 The possible values of CHOICE are the same as for the
 'zero_call_used_regs' attribute (*note Function Attributes::).  The
 default is 'skip'.

 You can control this behavior for a specific function by using the
 function attribute 'zero_call_used_regs' (*note Function
 Attributes::).

**The changelog:

gcc/ChangeLog:

2020-10-28  Qing Zhao  
H.J.Lu  

* common.opt: Add new option -fzero-call-used-regs.
* config/i386/i386.c (zero_call_used_regno_p): New function.
(zero_call_used_regno_mode): Likewise.
(zero_all_vector_registers): Likewise.
(zero_all_st_registers): Likewise.
(zero_all_mm_registers): Likewise.
(ix86_zero_call_used_regs): Likewise.
(TARGET_ZERO_CALL_USED_REGS): Define.
* df-scan.c (df_epilogue_uses_p): New function.
(df_get_exit_block_use_set): Replace EPILOGUE_USES with
df_epilogue_uses_p.
* df.h (df_epilogue_uses_p): Declare.
* doc/extend.texi: Document the new zero_call_used_regs attribute.
* doc/invoke.texi: Document the new -fzero-call-used-regs option.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in (TARGET_Z

Re: [PATCH][middle-end][i386][Version 4] Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all]

2020-10-28 Thread Qing Zhao via Gcc-patches
Hi, Richard, 

I changed the “enum” to a “namespace”.

This causes no issues for C++ compilation. However, the flag-types.h header file is 
also included by C modules that are compiled with gcc, and there I get many 
compilation errors like the following:

make[4]: Entering directory 
'/home/qinzhao/Work/x86-build/x86_64-pc-linux-gnu/libgcc'
In file included from ../.././gcc/options.h:6,
 from ../.././gcc/tm.h:22,
 from ../../../x86-gcc/libgcc/libgcc2.c:29,
 from ../../../x86-gcc/libgcc/config/i386/64/_multc3.c:6:
../../../x86-gcc/libgcc/../gcc/flag-types.h:289:1: error: unknown type name 
‘namespace’
  289 | namespace  zero_regs_code {
  | ^

It looks like I should not put this new namespace inside “flag-types.h”. 
Which other header file should I put this namespace in?
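
One common way to let a header be included from both C and C++ translation units is to hide C++-only constructs behind __cplusplus. The sketch below shows that general technique only; it is an assumption for illustration, not necessarily how this patch resolved the problem, and zero_regs_demo, ZERO_REGS_DEMO_ONLY_GPR and demo_only_gpr_bit are hypothetical names.

```c
#include <assert.h>

#ifdef __cplusplus
/* C++ translation units get scoped constants.  */
namespace zero_regs_demo
{
  const unsigned int ENABLED = 1u << 0;
  const unsigned int ONLY_GPR = 1u << 1;
}
#define ZERO_REGS_DEMO_ONLY_GPR (zero_regs_demo::ONLY_GPR)
#else
/* C translation units never see the namespace keyword.  */
#define ZERO_REGS_DEMO_ONLY_GPR (1u << 1)
#endif

/* Works identically whether this file is compiled as C or as C++.  */
static unsigned int
demo_only_gpr_bit (void)
{
  unsigned int bit = ZERO_REGS_DEMO_ONLY_GPR;

  assert (bit != 0);
  return bit;
}
```

If the C modules never need the constants at all, the simpler variant is to guard the namespace and provide nothing in the C branch.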

thanks.

Qing

> On Oct 28, 2020, at 9:24 AM, Qing Zhao via Gcc-patches 
>  wrote:
> 
> Okay, I will change it to namespace.
> 
> Qing
> 
>> On Oct 28, 2020, at 9:19 AM, Richard Sandiford  
>> wrote:
>> 
>> Qing Zhao <qing.z...@oracle.com> writes:
>>> Hi, Richard,
>>> 
>>> In order to be consistent with other flags in flag-types.h (for example, 
>>> “sanitize_code”), I didn’t use a namespace; instead I made the names more 
>>> specific, as follows:
>>> 
>>> /* Different settings for zeroing subset of registers.  */
>>> enum  zero_regs_flags {
>>> ZERO_REGS_UNSET = 0,
>>> ZERO_REGS_SKIP = 1UL << 0,
>>> ZERO_REGS_ONLY_USED = 1UL << 1,
>>> ZERO_REGS_ONLY_GPR = 1UL << 2,
>>> ZERO_REGS_ONLY_ARG = 1UL << 3,
>>> ZERO_REGS_ENABLED = 1UL << 4,
>>> ZERO_REGS_USED_GPR_ARG = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_USED
>>>  | ZERO_REGS_ONLY_GPR | ZERO_REGS_ONLY_ARG,
>>> ZERO_REGS_USED_GPR = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_USED
>>>  | ZERO_REGS_ONLY_GPR,
>>> ZERO_REGS_USED_ARG = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_USED
>>>  | ZERO_REGS_ONLY_ARG,
>>> ZERO_REGS_USED = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_USED,
>>> ZERO_REGS_ALL_GPR_ARG = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_GPR
>>> | ZERO_REGS_ONLY_ARG,
>>> ZERO_REGS_ALL_GPR = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_GPR,
>>> ZERO_REGS_ALL_ARG = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_ARG,
>>> ZERO_REGS_ALL = ZERO_REGS_ENABLED
>>> };
>>> 
>>> Is this good?
>>> 
>>> Or do you still prefer a namespace?
>> 
>> I prefer the namespace.  I realise namespaces aren't used that much
>> in GCC yet, but they *are* used.
>> 
>> The advantage they have is that it's possible to do:
>> 
>> using namespace ...;
>> 
>> in contexts where there's no ambiguity.  They also make lines like
>> the | ones above easier to read.
>> 
>> Thanks,
>> Richard
> 



Re: [PATCH][middle-end][i386][Version 4] Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all]

2020-10-28 Thread Qing Zhao via Gcc-patches
Okay, I will change it to namespace.

Qing

> On Oct 28, 2020, at 9:19 AM, Richard Sandiford  
> wrote:
> 
> Qing Zhao <qing.z...@oracle.com> writes:
>> Hi, Richard,
>> 
>> In order to be consistent with other flags in flag-types.h (for example, 
>> “sanitize_code”), I didn’t use a namespace; instead I made the names more 
>> specific, as follows:
>> 
>> /* Different settings for zeroing subset of registers.  */
>> enum  zero_regs_flags {
>>  ZERO_REGS_UNSET = 0,
>>  ZERO_REGS_SKIP = 1UL << 0,
>>  ZERO_REGS_ONLY_USED = 1UL << 1,
>>  ZERO_REGS_ONLY_GPR = 1UL << 2,
>>  ZERO_REGS_ONLY_ARG = 1UL << 3,
>>  ZERO_REGS_ENABLED = 1UL << 4,
>>  ZERO_REGS_USED_GPR_ARG = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_USED
>>   | ZERO_REGS_ONLY_GPR | ZERO_REGS_ONLY_ARG,
>>  ZERO_REGS_USED_GPR = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_USED
>>   | ZERO_REGS_ONLY_GPR,
>>  ZERO_REGS_USED_ARG = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_USED
>>   | ZERO_REGS_ONLY_ARG,
>>  ZERO_REGS_USED = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_USED,
>>  ZERO_REGS_ALL_GPR_ARG = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_GPR
>>  | ZERO_REGS_ONLY_ARG,
>>  ZERO_REGS_ALL_GPR = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_GPR,
>>  ZERO_REGS_ALL_ARG = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_ARG,
>>  ZERO_REGS_ALL = ZERO_REGS_ENABLED
>> };
>> 
>> Is this good?
>> 
>> Or do you still prefer a namespace?
> 
> I prefer the namespace.  I realise namespaces aren't used that much
> in GCC yet, but they *are* used.
> 
> The advantage they have is that it's possible to do:
> 
>  using namespace ...;
> 
> in contexts where there's no ambiguity.  They also make lines like
> the | ones above easier to read.
> 
> Thanks,
> Richard


