Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-03 Thread Qing Zhao
Yes, after today’s discussion, I think we agreed on 

1. Passing the size field by reference to .ACCESS_WITH_SIZE as jakub suggested.
2. Then the compiler should be able to always use the latest value of size 
field for the reference to FAM.

As a result, no need to add code for pointer re-obtaining purpose in the source 
code. 

I will update the proposal one more time.

thanks.

Qing

> On Nov 2, 2023, at 8:28 PM, Bill Wendling  wrote:
> 
> On Thu, Nov 2, 2023 at 1:36 PM Qing Zhao  wrote:
>> 
>> Thanks a lot for raising these issues.
>> 
>> If I understand correctly,  the major question we need to answer is:
>> 
>> For the following example: (Jakub mentioned this  in an early message)
>> 
>>  1 struct S { int a; char b __attribute__((counted_by (a))) []; };
>>  2 struct S s;
>>  3 s.a = 5;
>>  4 char *p = [2];
>>  5 int i1 = __builtin_dynamic_object_size (p, 0);
>>  6 s.a = 3;
>>  7 int i2 = __builtin_dynamic_object_size (p, 0);
>> 
>> Should the 2nd __bdos call (line 7) get
>>A. the latest value of s.a (line 6) for it’s size?
>> Or  B. the value when the s.b was referenced (line 3, line 4)?
>> 
> I personally think it should be (A). The user is specifically
> indicating that the size has somehow changed, and the compiler should
> behave accordingly.
> 
>> A should be more convenient for the user to use the dynamic array feature.
>> With B, the user has to modify the source code (to add code to “re-obtain”
>> the pointer after the size was adjusted at line 6) as mentioned by Richard.
>> 
>> This depends on how we design the new internal function .ACCESS_WITH_SIZE
>> 
>> 1. Size is passed by value to .ACCESS_WITH_SIZE as we currently designed.
>> 
>> PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
>> 
>> 2. Size is passed by reference to .ACCESS_WITH_SIZE as Jakub suggested.
>> 
>> PTR = .ACCESS_WITH_SIZE(PTR, , TYPEOFSIZE, ACCESS_MODE)
>> 
>> With 1, We can only provide B, the user needs to modify the source code to 
>> get the full feature of dynamic array;
>> With 2, We can provide  A, the user will get full support to the dynamic 
>> array without restrictions in the source code.
>> 
> My understanding of ACCESS_WITH_SIZE is that it's there to add an
> explicit reference to SIZE so that the optimizers won't reorder the
> code incorrectly. If that's the case, then it should act as if
> ACCESS_WITH_SIZE wasn't even there (i.e. it's just a pointer
> dereference into the FAM). We get that with (2) it appears. It would
> be a major headache to make the user go throughout their code base to
> ensure that SIZE was either unmodified, or if it was that extra code
> must be added to ensure the expected behavior.
> 
>> However, We have to pay additional cost for supporting A by using 2, which 
>> includes:
>> 
>> 1. .ACCESS_WITH_SIZE will become an escape point, which will further impact 
>> the IPA optimizations, more runtime overhead.
>>Then .ACCESS_WTH_SIZE will not be CONST, right? But it will still be PURE?
>> 
>> 2. __builtin_dynamic_object_size will NOT be LEAF anymore.  This will also 
>> impact some IPA optimizations, more runtime overhead.
>> 
>> I think the following are the factors that make the decision:
>> 
>> 1. How big the performance impact?
>> 2. How important the dynamic array feature? Is adding some user restrictions 
>> as Richard mentioned feasible to support this feature?
>> 
>> Maybe we can implement 1 first, if the full support to the dynamic array is 
>> needed, we can add 2 then?
>> Or, we can implement both, and compare the performance difference, then 
>> decide?
>> 
>> Qing
>> 



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-03 Thread Qing Zhao


> On Nov 2, 2023, at 8:13 PM, Bill Wendling  wrote:
> 
> On Thu, Nov 2, 2023 at 1:00 AM Richard Biener
>  wrote:
>> 
>> On Wed, Nov 1, 2023 at 3:47 PM Qing Zhao  wrote:
>>> 
>>> 
>>> 
 On Oct 31, 2023, at 6:14 PM, Joseph Myers  wrote:
 
 On Tue, 31 Oct 2023, Qing Zhao wrote:
 
> 2.3 A new semantic requirement in the user documentation of "counted_by"
> 
> For the following structure including a FAM with a counted_by attribute:
> 
> struct A
> {
>  size_t size;
>  char buf[] __attribute__((counted_by(size)));
> };
> 
> for any object with such type:
> 
> struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> 
> The setting to the size field should be done before the first reference
> to the FAM field.
> 
> Such requirement to the user will guarantee that the first reference to
> the FAM knows the size of the FAM.
> 
> We need to add this additional requirement to the user document.
 
 Make sure the manual is very specific about exactly when size is
 considered to be an accurate representation of the space available for buf
 (given that, after malloc or realloc, it's going to be temporarily
 inaccurate).  If the intent is that inaccurate size at such a time means
 undefined behavior, say so explicitly.
>>> 
>>> Yes, good point. We need to define this clearly in the beginning.
>>> We need to explicit say that
>>> 
>>> the size of the FAM is defined by the latest “counted_by” value. And it’s 
>>> an undefined behavior when the size field is not defined when the FAM is 
>>> referenced.
>>> 
>>> Is the above good enough?
>>> 
>>> 
 
> 2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE
> 
> In C FE:
> 
> for every reference to a FAM, for example, "obj->buf" in the small 
> example,
> check whether the corresponding FIELD_DECL has a "counted_by" attribute?
> if YES, replace the reference to "obj->buf" with a call to
> .ACCESS_WITH_SIZE (obj->buf, obj->size, -1);
 
 This seems plausible - but you should also consider the case of static
 initializers - remember the GNU extension for statically allocated objects
 with flexible array members (unless you're not allowing it with
 counted_by).
 
 static struct A x = { sizeof "hello", "hello" };
 static char *y = 
 
 I'd expect that to be valid - and unless you say such a usage is invalid,
>>> 
>>> At this moment, I think that this should be valid.
>>> 
>>> I,e, the following:
>>> 
>>> struct A
>>> {
>>> size_t size;
>>> char buf[] __attribute__((counted_by(size)));
>>> };
>>> 
>>> static struct A x = {sizeof "hello", "hello”};
>>> 
>>> Should be valid, and x.size represents the number of elements of x.buf.
>>> Both x.size and x.buf are initialized statically.
>>> 
 you should avoid the replacement in such a static initializer context when
 the FAM reference is to an object with a constant address (if
 .ACCESS_WITH_SIZE would not act as an lvalue whose address is a constant
 expression; if it works fine as a constant-address lvalue, then the
 replacement would be OK).
>>> 
>>> Then if such usage for the “counted_by” is valid, we need to replace the FAM
>>> reference by a call to  .ACCESS_WITH_SIZE as well.
>>> Otherwise the “counted_by” relationship will be lost to the Middle end.
>>> 
>>> With the current definition of .ACCESS_WITH_SIZE
>>> 
>>> PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
>>> 
>>> Isn’t the PTR (return value of the call) a LVALUE?
>> 
>> You probably want to specify that when a pointer to the array is taken the
>> pointer has to be to the first array element (or do we want to mangle the
>> 'size' accordingly for the instrumentation?).  You also want to specify that
>> the 'size' associated with such pointer is assumed to be unchanging and
>> after changing the size such pointer has to be re-obtained.  Plus that
>> changes to the allocated object/size have to be performed through an
>> lvalue where the containing type and thus the 'counted_by' attribute is
>> visible.  That is,
>> 
>> size_t *s = 
>> *s = 1;
>> 
>> is invoking undefined behavior, likewise modifying 'buf' (makes it a bit
>> awkward since for example that wouldn't support using posix_memalign
>> for allocation, though aligned_alloc would be fine).
>> 
> I believe Qing's original documentation for counted_by makes the
> relationship between 'size' and the FAM very clear and that without
> agreement it'll result in undefined behavior. Though it might be best
> to state that in a strong way.

I will update the counted-by documentation with the following additions:

1. The initialization of the size field should be done before the first 
reference to the FAM;
And.
 2. A later reference to the FAM will use the latest value assigned to the size 
field before that reference;

I think adding these two on top of my current user 

Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-03 Thread Qing Zhao


> On Nov 3, 2023, at 12:30 PM, Jakub Jelinek  wrote:
> 
> On Fri, Nov 03, 2023 at 04:20:57PM +, Qing Zhao wrote:
>> So, based on the discussion so far, We will define the .ACCESS_WITH_SIZE as 
>> following:
>> 
>> .ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, ACCESS_MODE)
>> 
>> INTERNAL_FN (ACCESS_WITH_SIZE,  ECF_LEAF | ECF_NOTHROW, NULL)
>> 
>> which returns the “REF_TO_OBJ" same as the 1st argument;
>> 
>> 1st argument “REF_TO_OBJ": Reference to the object;
>> 2nd argument “REF_TO_SIZE”:  Reference to size of the object referenced by 
>> the 1st argument, 
>> if the object that the “REF_TO_OBJ” refered has a
>>   * real type, the SIZE that the “REF_TO_SIZE” referred is the number of the 
>> elements of the type;
>>   * void type, the SIZE that the “REF_TO_SIZE” referred is number of bytes; 
> 
> No, you can't do this.  Conversions between pointers are mostly useless in
> GIMPLE, , so you can't make decisions based on TREE_TYPE (TREE_TYPE (fnarg))
> as it could have some random completely unrelated type.
> So, the multiplication factor needs to be encoded in the arguments rather
> than derived from REF_TO_OBJ's type, and similarly the size of what
> REF_TO_SIZE points to needs to be encoded somewhere.

Okay, I see, so 2 more arguments to the new function.

Qing
> 
>> 3rd argument "ACCESS_MODE": 
>> -1: Unknown access semantics
>>  0: none
>>  1: read_only
>>  2: write_only
>>  3: read_write
> 
>   Jakub
> 



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-03 Thread Jakub Jelinek
On Fri, Nov 03, 2023 at 04:20:57PM +, Qing Zhao wrote:
> So, based on the discussion so far, We will define the .ACCESS_WITH_SIZE as 
> following:
> 
>  .ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, ACCESS_MODE)
> 
> INTERNAL_FN (ACCESS_WITH_SIZE,  ECF_LEAF | ECF_NOTHROW, NULL)
> 
> which returns the “REF_TO_OBJ" same as the 1st argument;
> 
> 1st argument “REF_TO_OBJ": Reference to the object;
> 2nd argument “REF_TO_SIZE”:  Reference to size of the object referenced by 
> the 1st argument, 
>  if the object that the “REF_TO_OBJ” refered has a
>* real type, the SIZE that the “REF_TO_SIZE” referred is the number of the 
> elements of the type;
>* void type, the SIZE that the “REF_TO_SIZE” referred is number of bytes; 

No, you can't do this.  Conversions between pointers are mostly useless in
GIMPLE, , so you can't make decisions based on TREE_TYPE (TREE_TYPE (fnarg))
as it could have some random completely unrelated type.
So, the multiplication factor needs to be encoded in the arguments rather
than derived from REF_TO_OBJ's type, and similarly the size of what
REF_TO_SIZE points to needs to be encoded somewhere.

> 3rd argument "ACCESS_MODE": 
>  -1: Unknown access semantics
>   0: none
>   1: read_only
>   2: write_only
>   3: read_write

Jakub



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-03 Thread Qing Zhao
So, based on the discussion so far, We will define the .ACCESS_WITH_SIZE as 
following:

 .ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, ACCESS_MODE)

INTERNAL_FN (ACCESS_WITH_SIZE,  ECF_LEAF | ECF_NOTHROW, NULL)

which returns the “REF_TO_OBJ" same as the 1st argument;

1st argument “REF_TO_OBJ": Reference to the object;
2nd argument “REF_TO_SIZE”:  Reference to size of the object referenced by the 
1st argument, 
 if the object that the “REF_TO_OBJ” refered has a
   * real type, the SIZE that the “REF_TO_SIZE” referred is the number of the 
elements of the type;
   * void type, the SIZE that the “REF_TO_SIZE” referred is number of bytes; 
3rd argument "ACCESS_MODE": 
 -1: Unknown access semantics
  0: none
  1: read_only
  2: write_only
  3: read_write

NOTEs, 
 A. This new internal function is intended for a more general use from all the 
3 attributes, "access", "alloc_size", and the new "counted_by", to encode the 
"size" and "access_mode" information to the corresponding pointer. (in order to 
resolve PR96503, etc. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503)
 B. For "counted_by" and "alloc_size" attributes, the 3rd argument will be -1.  
 
 C. In this wrieup, we focus on the implementation details for the "counted_by" 
attribute. However, this function should be ready to be used by "access" and 
"alloc_size" without issue. 

Although .ACCESS_WITH_SIZE is not PURE anymore, but it’s only read from the 2nd 
argument, and not modify anything in the pointed objects. So, we can adjust the 
IPA alias analysis phase with this details 
(ref_maybe_used_by_call_p_1/call_may_clobber_ref_p_1).

One more note: only the integer type is allowed for the SIZE, and in 
tree_object_size.cc, all the SIZE
 (in attributes “access”, “alloc_size”, etc) are converted to “sizetype”.  So, 
we don’t need to specify
The type of the size for “REF_TO_SIZE” since it’s always integer types and 
always converted to “sizetype” internally. 

Let me know any more comment or suggestion. 

Qing


On Nov 3, 2023, at 2:32 AM, Martin Uecker  wrote:
> 
> 
> Am Freitag, dem 03.11.2023 um 07:22 +0100 schrieb Jakub Jelinek:
>> On Fri, Nov 03, 2023 at 07:07:36AM +0100, Martin Uecker wrote:
>>> Am Donnerstag, dem 02.11.2023 um 17:28 -0700 schrieb Bill Wendling:
 On Thu, Nov 2, 2023 at 1:36 PM Qing Zhao  wrote:
> 
> Thanks a lot for raising these issues.
> 
> If I understand correctly,  the major question we need to answer is:
> 
> For the following example: (Jakub mentioned this  in an early message)
> 
>  1 struct S { int a; char b __attribute__((counted_by (a))) []; };
>  2 struct S s;
>  3 s.a = 5;
>  4 char *p = [2];
>  5 int i1 = __builtin_dynamic_object_size (p, 0);
>  6 s.a = 3;
>  7 int i2 = __builtin_dynamic_object_size (p, 0);
> 
> Should the 2nd __bdos call (line 7) get
>A. the latest value of s.a (line 6) for it’s size?
> Or  B. the value when the s.b was referenced (line 3, line 4)?
> 
 I personally think it should be (A). The user is specifically
 indicating that the size has somehow changed, and the compiler should
 behave accordingly.
>>> 
>>> 
>>> One potential problem for A apart from the potential impact on
>>> optimization is that the information may get lost more
>>> easily. Consider:
>>> 
>>> char *p = [2];
>>> f();
>>> int i = __bdos(p, 0);
>>> 
>>> If the compiler can not see into 'f', the information is lost
>>> because f may have changed the size.
>> 
>> Why?  It doesn't really matter.  The options are
>> A. p is at [2] associated with  and int type (or size of int
>>   or whatever); .ACCESS_WITH_SIZE can't be pure, but sure, for aliasing
>>   POV we can describe it with more detail that it doesn't modify anything
>>   in the pointed structure, just escapes the pointer; __bdos can stay
>>   leaf I believe; and when expanding __bdos later on, it would just
>>   dereference the associated pointer at that point (note, __bdos is
>>   pure, so it has vuse but not vdef and can load from memory); if
>>   f changes s.a, no problem, __bdos will load the changed value in there
> 
> Ah, I right. Because of the reload it doesn't matter. 
> Thank you for the explanation!
> 
> Martin
> 
>> B. if .ACCESS_WITH_SIZE associates the pointer with the s.a value from that
>>   point, .ACCESS_WITH_SIZE can be const, but obviously if f changes s.a,
>>   __bdos later will use s.a value from the [2] spot



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-03 Thread Qing Zhao


> On Nov 3, 2023, at 10:46 AM, Jakub Jelinek  wrote:
> 
> On Fri, Nov 03, 2023 at 02:32:04PM +, Qing Zhao wrote:
>>> Why?  It doesn't really matter.  The options are
>>> A. p is at [2] associated with  and int type (or size of int
>>>  or whatever); .ACCESS_WITH_SIZE can't be pure,
>> 
>> .ACCESS_WITH_SIZE will only load the size from its address, no any write to 
>> memory.
>> It still can be PURE, right? (It will not be CONST anymore).
> 
> No, it can't be pure.  Because for the IL purposes, it needs to be treated
> as if it saves that address of the counter into some unnamed global variable
> somewhere.

Okay. I see.

>> 
>>> but sure, for aliasing
>>>  POV we can describe it with more detail that it doesn't modify anything
>>>  in the pointed structure, just escapes the pointer;
>> 
>> If we need to do this, where in the gcc code we need to add these details?
> 
> I think ref_maybe_used_by_call_p_1/call_may_clobber_ref_p_1, but Richi is
> expert here.

Just checked these routines, looks like that some other non-pure internal 
functions are handled here too.
For example, 
  case IFN_UBSAN_BOUNDS:
  case IFN_UBSAN_VPTR:
  case IFN_UBSAN_OBJECT_SIZE:
  case IFN_UBSAN_PTR:
  case IFN_ASAN_CHECK:

Looks like the correct place to adjust the new .ACCESS_WITH_SIZE. 
> 
>>> __bdos can stay
>>>  leaf I believe;
>> 
>> That’s good!  (I thought now _bdos will call .ACCESS_WITH_SIZE?)
> 
> No, it shouldn't call it obviously.  If tree-object-size.cc discovery tracks
> something to a pointer initialized by .ACCESS_WITH_SIZE call, then it should
> I believe recurse on the first argument of that call (say if one has
>  ptr_3 = malloc (sz_1);
>  ptr_2 = .ACCESS_WITH_SIZE (ptr_3, _3[4], ...);
> then supposedly __bdos later on should e.g. for 0/1 modes take minimum from
> ptr_3 (the size actually allocated)) and the the counter.

Yes, this is the situation in my mind too. 
I thought this might eliminate the LEAF feature from __bdos. -:) if not, that’s 
good.

Qing
> 
>   Jakub
> 



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-03 Thread Jakub Jelinek
On Fri, Nov 03, 2023 at 02:32:04PM +, Qing Zhao wrote:
> > Why?  It doesn't really matter.  The options are
> > A. p is at [2] associated with  and int type (or size of int
> >   or whatever); .ACCESS_WITH_SIZE can't be pure,
> 
> .ACCESS_WITH_SIZE will only load the size from its address, no any write to 
> memory.
> It still can be PURE, right? (It will not be CONST anymore).

No, it can't be pure.  Because for the IL purposes, it needs to be treated
as if it saves that address of the counter into some unnamed global variable
somewhere.
> 
> > but sure, for aliasing
> >   POV we can describe it with more detail that it doesn't modify anything
> >   in the pointed structure, just escapes the pointer;
> 
> If we need to do this, where in the gcc code we need to add these details?

I think ref_maybe_used_by_call_p_1/call_may_clobber_ref_p_1, but Richi is
expert here.

> > __bdos can stay
> >   leaf I believe;
> 
> That’s good!  (I thought now _bdos will call .ACCESS_WITH_SIZE?)

No, it shouldn't call it obviously.  If tree-object-size.cc discovery tracks
something to a pointer initialized by .ACCESS_WITH_SIZE call, then it should
I believe recurse on the first argument of that call (say if one has
  ptr_3 = malloc (sz_1);
  ptr_2 = .ACCESS_WITH_SIZE (ptr_3, _3[4], ...);
then supposedly __bdos later on should e.g. for 0/1 modes take minimum from
ptr_3 (the size actually allocated)) and the the counter.

Jakub



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-03 Thread Qing Zhao


> On Nov 3, 2023, at 2:22 AM, Jakub Jelinek  wrote:
> 
> On Fri, Nov 03, 2023 at 07:07:36AM +0100, Martin Uecker wrote:
>> Am Donnerstag, dem 02.11.2023 um 17:28 -0700 schrieb Bill Wendling:
>>> On Thu, Nov 2, 2023 at 1:36 PM Qing Zhao  wrote:
 
 Thanks a lot for raising these issues.
 
 If I understand correctly,  the major question we need to answer is:
 
 For the following example: (Jakub mentioned this  in an early message)
 
  1 struct S { int a; char b __attribute__((counted_by (a))) []; };
  2 struct S s;
  3 s.a = 5;
  4 char *p = [2];
  5 int i1 = __builtin_dynamic_object_size (p, 0);
  6 s.a = 3;
  7 int i2 = __builtin_dynamic_object_size (p, 0);
 
 Should the 2nd __bdos call (line 7) get
A. the latest value of s.a (line 6) for it’s size?
 Or  B. the value when the s.b was referenced (line 3, line 4)?
 
>>> I personally think it should be (A). The user is specifically
>>> indicating that the size has somehow changed, and the compiler should
>>> behave accordingly.
>> 
>> 
>> One potential problem for A apart from the potential impact on
>> optimization is that the information may get lost more
>> easily. Consider:
>> 
>> char *p = [2];
>> f();
>> int i = __bdos(p, 0);
>> 
>> If the compiler can not see into 'f', the information is lost
>> because f may have changed the size.
> 
> Why?  It doesn't really matter.  The options are
> A. p is at [2] associated with  and int type (or size of int
>   or whatever); .ACCESS_WITH_SIZE can't be pure,

.ACCESS_WITH_SIZE will only load the size from its address, no any write to 
memory.
It still can be PURE, right? (It will not be CONST anymore).

> but sure, for aliasing
>   POV we can describe it with more detail that it doesn't modify anything
>   in the pointed structure, just escapes the pointer;

If we need to do this, where in the gcc code we need to add these details?

> __bdos can stay
>   leaf I believe;

That’s good!  (I thought now _bdos will call .ACCESS_WITH_SIZE?)

Qing

> and when expanding __bdos later on, it would just
>   dereference the associated pointer at that point (note, __bdos is
>   pure, so it has vuse but not vdef and can load from memory); if
>   f changes s.a, no problem, __bdos will load the changed value in there
> B. if .ACCESS_WITH_SIZE associates the pointer with the s.a value from that
>   point, .ACCESS_WITH_SIZE can be const, but obviously if f changes s.a,
>   __bdos later will use s.a value from the [2] spot
> 
>   Jakub
> 



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-03 Thread Martin Uecker
Am Freitag, dem 03.11.2023 um 07:22 +0100 schrieb Jakub Jelinek:
> On Fri, Nov 03, 2023 at 07:07:36AM +0100, Martin Uecker wrote:
> > Am Donnerstag, dem 02.11.2023 um 17:28 -0700 schrieb Bill Wendling:
> > > On Thu, Nov 2, 2023 at 1:36 PM Qing Zhao  wrote:
> > > > 
> > > > Thanks a lot for raising these issues.
> > > > 
> > > > If I understand correctly,  the major question we need to answer is:
> > > > 
> > > > For the following example: (Jakub mentioned this  in an early message)
> > > > 
> > > >   1 struct S { int a; char b __attribute__((counted_by (a))) []; };
> > > >   2 struct S s;
> > > >   3 s.a = 5;
> > > >   4 char *p = [2];
> > > >   5 int i1 = __builtin_dynamic_object_size (p, 0);
> > > >   6 s.a = 3;
> > > >   7 int i2 = __builtin_dynamic_object_size (p, 0);
> > > > 
> > > > Should the 2nd __bdos call (line 7) get
> > > > A. the latest value of s.a (line 6) for it’s size?
> > > > Or  B. the value when the s.b was referenced (line 3, line 4)?
> > > > 
> > > I personally think it should be (A). The user is specifically
> > > indicating that the size has somehow changed, and the compiler should
> > > behave accordingly.
> > 
> > 
> > One potential problem for A apart from the potential impact on
> > optimization is that the information may get lost more
> > easily. Consider:
> > 
> > char *p = [2];
> > f();
> > int i = __bdos(p, 0);
> > 
> > If the compiler can not see into 'f', the information is lost
> > because f may have changed the size.
> 
> Why?  It doesn't really matter.  The options are
> A. p is at [2] associated with  and int type (or size of int
>or whatever); .ACCESS_WITH_SIZE can't be pure, but sure, for aliasing
>POV we can describe it with more detail that it doesn't modify anything
>in the pointed structure, just escapes the pointer; __bdos can stay
>leaf I believe; and when expanding __bdos later on, it would just
>dereference the associated pointer at that point (note, __bdos is
>pure, so it has vuse but not vdef and can load from memory); if
>f changes s.a, no problem, __bdos will load the changed value in there

Ah, I right. Because of the reload it doesn't matter. 
Thank you for the explanation!

Martin

> B. if .ACCESS_WITH_SIZE associates the pointer with the s.a value from that
>point, .ACCESS_WITH_SIZE can be const, but obviously if f changes s.a,
>__bdos later will use s.a value from the [2] spot



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-03 Thread Jakub Jelinek
On Fri, Nov 03, 2023 at 07:07:36AM +0100, Martin Uecker wrote:
> Am Donnerstag, dem 02.11.2023 um 17:28 -0700 schrieb Bill Wendling:
> > On Thu, Nov 2, 2023 at 1:36 PM Qing Zhao  wrote:
> > > 
> > > Thanks a lot for raising these issues.
> > > 
> > > If I understand correctly,  the major question we need to answer is:
> > > 
> > > For the following example: (Jakub mentioned this  in an early message)
> > > 
> > >   1 struct S { int a; char b __attribute__((counted_by (a))) []; };
> > >   2 struct S s;
> > >   3 s.a = 5;
> > >   4 char *p = [2];
> > >   5 int i1 = __builtin_dynamic_object_size (p, 0);
> > >   6 s.a = 3;
> > >   7 int i2 = __builtin_dynamic_object_size (p, 0);
> > > 
> > > Should the 2nd __bdos call (line 7) get
> > > A. the latest value of s.a (line 6) for it’s size?
> > > Or  B. the value when the s.b was referenced (line 3, line 4)?
> > > 
> > I personally think it should be (A). The user is specifically
> > indicating that the size has somehow changed, and the compiler should
> > behave accordingly.
> 
> 
> One potential problem for A apart from the potential impact on
> optimization is that the information may get lost more
> easily. Consider:
> 
> char *p = [2];
> f();
> int i = __bdos(p, 0);
> 
> If the compiler can not see into 'f', the information is lost
> because f may have changed the size.

Why?  It doesn't really matter.  The options are
A. p is at [2] associated with  and int type (or size of int
   or whatever); .ACCESS_WITH_SIZE can't be pure, but sure, for aliasing
   POV we can describe it with more detail that it doesn't modify anything
   in the pointed structure, just escapes the pointer; __bdos can stay
   leaf I believe; and when expanding __bdos later on, it would just
   dereference the associated pointer at that point (note, __bdos is
   pure, so it has vuse but not vdef and can load from memory); if
   f changes s.a, no problem, __bdos will load the changed value in there
B. if .ACCESS_WITH_SIZE associates the pointer with the s.a value from that
   point, .ACCESS_WITH_SIZE can be const, but obviously if f changes s.a,
   __bdos later will use s.a value from the [2] spot

Jakub



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-03 Thread Martin Uecker
Am Donnerstag, dem 02.11.2023 um 17:28 -0700 schrieb Bill Wendling:
> On Thu, Nov 2, 2023 at 1:36 PM Qing Zhao  wrote:
> > 
> > Thanks a lot for raising these issues.
> > 
> > If I understand correctly,  the major question we need to answer is:
> > 
> > For the following example: (Jakub mentioned this  in an early message)
> > 
> >   1 struct S { int a; char b __attribute__((counted_by (a))) []; };
> >   2 struct S s;
> >   3 s.a = 5;
> >   4 char *p = [2];
> >   5 int i1 = __builtin_dynamic_object_size (p, 0);
> >   6 s.a = 3;
> >   7 int i2 = __builtin_dynamic_object_size (p, 0);
> > 
> > Should the 2nd __bdos call (line 7) get
> > A. the latest value of s.a (line 6) for it’s size?
> > Or  B. the value when the s.b was referenced (line 3, line 4)?
> > 
> I personally think it should be (A). The user is specifically
> indicating that the size has somehow changed, and the compiler should
> behave accordingly.


One potential problem for A apart from the potential impact on
optimization is that the information may get lost more
easily. Consider:

char *p = [2];
f();
int i = __bdos(p, 0);

If the compiler can not see into 'f', the information is lost
because f may have changed the size.

And if I understand it correctly, if the pointers escapes
with .ACCESS_WITH_SIZE, then this is already true for:

char *p = [2];
g();
int i = __bdos(p, 0);


If we make it UB to change the size, then I guess we could
also delay this choice.  Or we implement B but have a UBSan
option based on A that only verifies at run-time that the size 
did not change.


Martin


> 
> > A should be more convenient for the user to use the dynamic array feature.
> > With B, the user has to modify the source code (to add code to “re-obtain”
> > the pointer after the size was adjusted at line 6) as mentioned by Richard.
> > 
> > This depends on how we design the new internal function .ACCESS_WITH_SIZE
> > 
> > 1. Size is passed by value to .ACCESS_WITH_SIZE as we currently designed.
> > 
> > PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
> > 
> > 2. Size is passed by reference to .ACCESS_WITH_SIZE as Jakub suggested.
> > 
> > PTR = .ACCESS_WITH_SIZE(PTR, , TYPEOFSIZE, ACCESS_MODE)
> > 
> > With 1, We can only provide B, the user needs to modify the source code to 
> > get the full feature of dynamic array;
> > With 2, We can provide  A, the user will get full support to the dynamic 
> > array without restrictions in the source code.
> > 
> My understanding of ACCESS_WITH_SIZE is that it's there to add an
> explicit reference to SIZE so that the optimizers won't reorder the
> code incorrectly. If that's the case, then it should act as if
> ACCESS_WITH_SIZE wasn't even there (i.e. it's just a pointer
> dereference into the FAM). We get that with (2) it appears. It would
> be a major headache to make the user go throughout their code base to
> ensure that SIZE was either unmodified, or if it was that extra code
> must be added to ensure the expected behavior.
> 
> > However, We have to pay additional cost for supporting A by using 2, which 
> > includes:
> > 
> > 1. .ACCESS_WITH_SIZE will become an escape point, which will further impact 
> > the IPA optimizations, more runtime overhead.
> > Then .ACCESS_WTH_SIZE will not be CONST, right? But it will still be 
> > PURE?
> > 
> > 2. __builtin_dynamic_object_size will NOT be LEAF anymore.  This will also 
> > impact some IPA optimizations, more runtime overhead.
> > 
> > I think the following are the factors that make the decision:
> > 
> > 1. How big the performance impact?
> > 2. How important the dynamic array feature? Is adding some user 
> > restrictions as Richard mentioned feasible to support this feature?
> > 
> > Maybe we can implement 1 first, if the full support to the dynamic array is 
> > needed, we can add 2 then?
> > Or, we can implement both, and compare the performance difference, then 
> > decide?
> > 
> > Qing
> > 



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-02 Thread Bill Wendling
On Thu, Nov 2, 2023 at 1:36 PM Qing Zhao  wrote:
>
> Thanks a lot for raising these issues.
>
> If I understand correctly,  the major question we need to answer is:
>
> For the following example: (Jakub mentioned this  in an early message)
>
>   1 struct S { int a; char b __attribute__((counted_by (a))) []; };
>   2 struct S s;
>   3 s.a = 5;
>   4 char *p = [2];
>   5 int i1 = __builtin_dynamic_object_size (p, 0);
>   6 s.a = 3;
>   7 int i2 = __builtin_dynamic_object_size (p, 0);
>
> Should the 2nd __bdos call (line 7) get
> A. the latest value of s.a (line 6) for it’s size?
> Or  B. the value when the s.b was referenced (line 3, line 4)?
>
I personally think it should be (A). The user is specifically
indicating that the size has somehow changed, and the compiler should
behave accordingly.

> A should be more convenient for the user to use the dynamic array feature.
> With B, the user has to modify the source code (to add code to “re-obtain”
> the pointer after the size was adjusted at line 6) as mentioned by Richard.
>
> This depends on how we design the new internal function .ACCESS_WITH_SIZE
>
> 1. Size is passed by value to .ACCESS_WITH_SIZE as we currently designed.
>
> PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
>
> 2. Size is passed by reference to .ACCESS_WITH_SIZE as Jakub suggested.
>
> PTR = .ACCESS_WITH_SIZE(PTR, , TYPEOFSIZE, ACCESS_MODE)
>
> With 1, We can only provide B, the user needs to modify the source code to 
> get the full feature of dynamic array;
> With 2, We can provide  A, the user will get full support to the dynamic 
> array without restrictions in the source code.
>
My understanding of ACCESS_WITH_SIZE is that it's there to add an
explicit reference to SIZE so that the optimizers won't reorder the
code incorrectly. If that's the case, then it should act as if
ACCESS_WITH_SIZE wasn't even there (i.e. it's just a pointer
dereference into the FAM). We get that with (2) it appears. It would
be a major headache to make the user go throughout their code base to
ensure that SIZE was either unmodified, or if it was that extra code
must be added to ensure the expected behavior.

> However, We have to pay additional cost for supporting A by using 2, which 
> includes:
>
> 1. .ACCESS_WITH_SIZE will become an escape point, which will further impact 
> the IPA optimizations, more runtime overhead.
> Then .ACCESS_WTH_SIZE will not be CONST, right? But it will still be PURE?
>
> 2. __builtin_dynamic_object_size will NOT be LEAF anymore.  This will also 
> impact some IPA optimizations, more runtime overhead.
>
> I think the following are the factors that make the decision:
>
> 1. How big the performance impact?
> 2. How important the dynamic array feature? Is adding some user restrictions 
> as Richard mentioned feasible to support this feature?
>
> Maybe we can implement 1 first, if the full support to the dynamic array is 
> needed, we can add 2 then?
> Or, we can implement both, and compare the performance difference, then 
> decide?
>
> Qing
>


Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-02 Thread Bill Wendling
On Thu, Nov 2, 2023 at 1:00 AM Richard Biener
 wrote:
>
> On Wed, Nov 1, 2023 at 3:47 PM Qing Zhao  wrote:
> >
> >
> >
> > > On Oct 31, 2023, at 6:14 PM, Joseph Myers  wrote:
> > >
> > > On Tue, 31 Oct 2023, Qing Zhao wrote:
> > >
> > >> 2.3 A new semantic requirement in the user documentation of "counted_by"
> > >>
> > >> For the following structure including a FAM with a counted_by attribute:
> > >>
> > >>  struct A
> > >>  {
> > >>   size_t size;
> > >>   char buf[] __attribute__((counted_by(size)));
> > >>  };
> > >>
> > >> for any object with such type:
> > >>
> > >>  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> > >>
> > >> The setting to the size field should be done before the first reference
> > >> to the FAM field.
> > >>
> > >> Such requirement to the user will guarantee that the first reference to
> > >> the FAM knows the size of the FAM.
> > >>
> > >> We need to add this additional requirement to the user document.
> > >
> > > Make sure the manual is very specific about exactly when size is
> > > considered to be an accurate representation of the space available for buf
> > > (given that, after malloc or realloc, it's going to be temporarily
> > > inaccurate).  If the intent is that inaccurate size at such a time means
> > > undefined behavior, say so explicitly.
> >
> > Yes, good point. We need to define this clearly in the beginning.
> > We need to explicit say that
> >
> > the size of the FAM is defined by the latest “counted_by” value. And it’s 
> > an undefined behavior when the size field is not defined when the FAM is 
> > referenced.
> >
> > Is the above good enough?
> >
> >
> > >
> > >> 2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE
> > >>
> > >> In C FE:
> > >>
> > >> for every reference to a FAM, for example, "obj->buf" in the small 
> > >> example,
> > >>  check whether the corresponding FIELD_DECL has a "counted_by" attribute?
> > >>  if YES, replace the reference to "obj->buf" with a call to
> > >>  .ACCESS_WITH_SIZE (obj->buf, obj->size, -1);
> > >
> > > This seems plausible - but you should also consider the case of static
> > > initializers - remember the GNU extension for statically allocated objects
> > > with flexible array members (unless you're not allowing it with
> > > counted_by).
> > >
> > > static struct A x = { sizeof "hello", "hello" };
> > > static char *y = 
> > >
> > > I'd expect that to be valid - and unless you say such a usage is invalid,
> >
> > At this moment, I think that this should be valid.
> >
> > I,e, the following:
> >
> > struct A
> > {
> >  size_t size;
> >  char buf[] __attribute__((counted_by(size)));
> > };
> >
> > static struct A x = {sizeof "hello", "hello”};
> >
> > Should be valid, and x.size represents the number of elements of x.buf.
> > Both x.size and x.buf are initialized statically.
> >
> > > you should avoid the replacement in such a static initializer context when
> > > the FAM reference is to an object with a constant address (if
> > > .ACCESS_WITH_SIZE would not act as an lvalue whose address is a constant
> > > expression; if it works fine as a constant-address lvalue, then the
> > > replacement would be OK).
> >
> > Then if such usage for the “counted_by” is valid, we need to replace the FAM
> > reference by a call to  .ACCESS_WITH_SIZE as well.
> > Otherwise the “counted_by” relationship will be lost to the Middle end.
> >
> > With the current definition of .ACCESS_WITH_SIZE
> >
> > PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
> >
> > Isn’t the PTR (return value of the call) a LVALUE?
>
> You probably want to specify that when a pointer to the array is taken the
> pointer has to be to the first array element (or do we want to mangle the
> 'size' accordingly for the instrumentation?).  You also want to specify that
> the 'size' associated with such pointer is assumed to be unchanging and
> after changing the size such pointer has to be re-obtained.  Plus that
> changes to the allocated object/size have to be performed through an
> lvalue where the containing type and thus the 'counted_by' attribute is
> visible.  That is,
>
> size_t *s = 
> *s = 1;
>
> is invoking undefined behavior, likewise modifying 'buf' (makes it a bit
> awkward since for example that wouldn't support using posix_memalign
> for allocation, though aligned_alloc would be fine).
>
I believe Qing's original documentation for counted_by makes the
relationship between 'size' and the FAM very clear and that without
agreement it'll result in undefined behavior. Though it might be best
to state that in a strong way.

-bw


Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-02 Thread Qing Zhao



> On Nov 2, 2023, at 8:09 AM, Jakub Jelinek  wrote:
> 
> On Thu, Nov 02, 2023 at 12:52:50PM +0100, Richard Biener wrote:
>>> What I meant is to emit
>>> tmp_4 = .ACCESS_WITH_SIZE ([0], , (typeof ()) 0);
>>> p_5 = _4[2];
>>> i.e. don't associate the pointer with a value of the size, but with
>>> an address where to find the size (plus how large it is), basically escape
>>> pointer to the size at that point.  And __builtin_dynamic_object_size is 
>>> pure,
>>> so supposedly it can depend on what the escaped pointer points to.
>> 
>> Well, yeah - that would work but depend on .ACCESS_WITH_SIZE being an
>> escape point (quite bad IMHO)
> 
> That is why I've said we need to decide what cost we want to suffer because
> of that.
> 
>> and __builtin_dynamic_object_size being
>> non-const (that's probably not too bad).
> 
> It is already pure,leaf,nothrow (unlike __builtin_object_size which is 
> obviously
> const,leaf,nothrow).  Because under the hood, it can read memory when
> expanded.
> 
>>> We'd see that a particular pointer is size associated with  address
>>> and would use that address cast to the type of the third argument (to
>>> preserve the exact pointer type on INTEGER_CST, though not sure, wouldn't
>>> VN CSE it anyway if one has say
>>> union U { struct S { int a; char b __attribute__((counted_by (a))) []; } s;
>>>  struct T { char c, d, e, f; char g __attribute__((counted_by (c))) 
>>> []; } t; };
>>> and
>>> .ACCESS_WITH_SIZE ([0], , (int *) 0);
>>> ...
>>> .ACCESS_WITH_SIZE ([0], , (int *) 0);
>>> ?
>> 
>> We'd probably CSE that - the usual issue of address-with-same-value.
>> 
>>> It would mean though that counted_by wouldn't be allowed to be a
>>> bit-field...
>> 
>> Yup.  We could also pass a pointer to the container though, that's good 
>> enough
>> for the escape, and pass the size by value in addition to that.
> 
> I was wondering about stuff like _BitInt.  But sure, counted_by is just an
> extension, we can just refuse counting by _BitInt in addition to counting by
> floating point, pointers, aggregates, bit-fields, or we could somehow encode
> all the needed type's properties numerically into an integral constant.
> Similarly for alias set (unless it uses 0 for reads).

counted_by currently is limited to INTEGER type. This should resolve this 
issue, right?

Qing
> 
>   Jakub
> 



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-02 Thread Qing Zhao


> On Nov 2, 2023, at 7:52 AM, Richard Biener  wrote:
> 
> On Thu, Nov 2, 2023 at 11:40 AM Jakub Jelinek  wrote:
>> 
>> On Thu, Nov 02, 2023 at 11:18:09AM +0100, Richard Biener wrote:
 Or, if we want to pay further price, .ACCESS_WITH_SIZE could take as one of
 the arguments not the size value, but its address.  Then at __bdos time
 we would dereference that pointer to get the size.
 So,
 struct S { int a; char b __attribute__((counted_by (a))) []; };
 struct S s;
 s.a = 5;
 char *p = [2];
 int i1 = __builtin_dynamic_object_size (p, 0);
 s.a = 3;
 int i2 = __builtin_dynamic_object_size (p, 0);
 would then yield 3 and 1 rather than 3 and 3.
>>> 
>>> I fail to see how we can get the __builtin_dynamic_object_size call
>>> data dependent on s.a, thus avoid re-ordering or even DSE of the
>>> store.
>> 
>> If [2] is lowered as
>> sz_1 = s.a;
>> tmp_2 = .ACCESS_WITH_SIZE ([0], sz_1);
>> p_3 = _2[2];
>> then sure, there is no way, you get the size from that point.
>> tree-object-size.cc tracking then determines that in a particular
>> case the pointer is size associated with sz_1 and use that value
>> as the size (with the usual adjustments for pointer arithmetics and the
>> like).
>> 
>> What I meant is to emit
>> tmp_4 = .ACCESS_WITH_SIZE ([0], , (typeof ()) 0);
>> p_5 = _4[2];
>> i.e. don't associate the pointer with a value of the size, but with
>> an address where to find the size (plus how large it is), basically escape
>> pointer to the size at that point.  And __builtin_dynamic_object_size is 
>> pure,
>> so supposedly it can depend on what the escaped pointer points to.
> 
> Well, yeah - that would work but depend on .ACCESS_WITH_SIZE being an
> escape point (quite bad IMHO) and __builtin_dynamic_object_size being
> non-const (that's probably not too bad).
> 
>> We'd see that a particular pointer is size associated with  address
>> and would use that address cast to the type of the third argument (to
>> preserve the exact pointer type on INTEGER_CST, though not sure, wouldn't
>> VN CSE it anyway if one has say
>> union U { struct S { int a; char b __attribute__((counted_by (a))) []; } s;
>>  struct T { char c, d, e, f; char g __attribute__((counted_by (c))) 
>> []; } t; };
>> and
>> .ACCESS_WITH_SIZE ([0], , (int *) 0);
>> ...
>> .ACCESS_WITH_SIZE ([0], , (int *) 0);
>> ?
> 
> We'd probably CSE that - the usual issue of address-with-same-value.
> 
>> It would mean though that counted_by wouldn't be allowed to be a
>> bit-field...
> 
> Yup.  We could also pass a pointer to the container though, that's good enough
> for the escape, and pass the size by value in addition to that.
Could you explain a little bit more here? Then the .ACCESS_WITH_SIZE will become

PTR = .ACCESS_WITH_SIZE (PTR, ’s Container, SIZE, ACCESS_MODE)

??

> 
>>Jakub
>> 



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-02 Thread Qing Zhao
Thanks a lot for raising these issues. 

If I understand correctly,  the major question we need to answer is:

For the following example: (Jakub mentioned this  in an early message)

  1 struct S { int a; char b __attribute__((counted_by (a))) []; };
  2 struct S s;
  3 s.a = 5;
  4 char *p = [2];
  5 int i1 = __builtin_dynamic_object_size (p, 0);
  6 s.a = 3;
  7 int i2 = __builtin_dynamic_object_size (p, 0);

Should the 2nd __bdos call (line 7) get
A. the latest value of s.a (line 6) for it’s size? 
Or  B. the value when the s.b was referenced (line 3, line 4)?

A should be more convenient for the user to use the dynamic array feature.
With B, the user has to modify the source code (to add code to “re-obtain” 
the pointer after the size was adjusted at line 6) as mentioned by Richard. 

This depends on how we design the new internal function .ACCESS_WITH_SIZE

1. Size is passed by value to .ACCESS_WITH_SIZE as we currently designed. 

PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)

2. Size is passed by reference to .ACCESS_WITH_SIZE as Jakub suggested.

PTR = .ACCESS_WITH_SIZE(PTR, , TYPEOFSIZE, ACCESS_MODE)

With 1, We can only provide B, the user needs to modify the source code to get 
the full feature of dynamic array;
With 2, We can provide  A, the user will get full support to the dynamic array 
without restrictions in the source code. 

However, We have to pay additional cost for supporting A by using 2, which 
includes:

1. .ACCESS_WITH_SIZE will become an escape point, which will further impact the 
IPA optimizations, more runtime overhead. 
Then .ACCESS_WTH_SIZE will not be CONST, right? But it will still be PURE?

2. __builtin_dynamic_object_size will NOT be LEAF anymore.  This will also 
impact some IPA optimizations, more runtime overhead. 

I think the following are the factors that make the decision:

1. How big the performance impact?
2. How important the dynamic array feature? Is adding some user restrictions as 
Richard mentioned feasible to support this feature?

Maybe we can implement 1 first, if the full support to the dynamic array is 
needed, we can add 2 then? 
Or, we can implement both, and compare the performance difference, then decide?

Qing




> On Nov 2, 2023, at 8:09 AM, Jakub Jelinek  wrote:
> 
> On Thu, Nov 02, 2023 at 12:52:50PM +0100, Richard Biener wrote:
>>> What I meant is to emit
>>> tmp_4 = .ACCESS_WITH_SIZE ([0], , (typeof ()) 0);
>>> p_5 = _4[2];
>>> i.e. don't associate the pointer with a value of the size, but with
>>> an address where to find the size (plus how large it is), basically escape
>>> pointer to the size at that point.  And __builtin_dynamic_object_size is 
>>> pure,
>>> so supposedly it can depend on what the escaped pointer points to.
>> 
>> Well, yeah - that would work but depend on .ACCESS_WITH_SIZE being an
>> escape point (quite bad IMHO)
> 
> That is why I've said we need to decide what cost we want to suffer because
> of that.
> 
>> and __builtin_dynamic_object_size being
>> non-const (that's probably not too bad).
> 
> It is already pure,leaf,nothrow (unlike __builtin_object_size which is 
> obviously
> const,leaf,nothrow).  Because under the hood, it can read memory when
> expanded.
> 
>>> We'd see that a particular pointer is size associated with  address
>>> and would use that address cast to the type of the third argument (to
>>> preserve the exact pointer type on INTEGER_CST, though not sure, wouldn't
>>> VN CSE it anyway if one has say
>>> union U { struct S { int a; char b __attribute__((counted_by (a))) []; } s;
>>>  struct T { char c, d, e, f; char g __attribute__((counted_by (c))) 
>>> []; } t; };
>>> and
>>> .ACCESS_WITH_SIZE ([0], , (int *) 0);
>>> ...
>>> .ACCESS_WITH_SIZE ([0], , (int *) 0);
>>> ?
>> 
>> We'd probably CSE that - the usual issue of address-with-same-value.
>> 
>>> It would mean though that counted_by wouldn't be allowed to be a
>>> bit-field...
>> 
>> Yup.  We could also pass a pointer to the container though, that's good 
>> enough
>> for the escape, and pass the size by value in addition to that.
> 
> I was wondering about stuff like _BitInt.  But sure, counted_by is just an
> extension, we can just refuse counting by _BitInt in addition to counting by
> floating point, pointers, aggregates, bit-fields, or we could somehow encode
> all the needed type's properties numerically into an integral constant.
> Similarly for alias set (unless it uses 0 for reads).
> 
>   Jakub
> 



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-02 Thread Siddhesh Poyarekar

On 2023-11-02 10:12, Martin Uecker wrote:

This shouldn't be necessary. The object-size pass
can track pointer arithmeti if it comes after
inserting the .ACCESS_WITH_SIZE.

https://godbolt.org/z/fvc3aoPfd


The problem is dependency tracking through the pointer arithmetic, which 
Jakub suggested to work around by passing a reference to the size in 
.ACCESS_WITH_SIZE to avoid DCE/reordering.


Thanks,
Sid


Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-02 Thread Qing Zhao


> On Nov 2, 2023, at 9:54 AM, Richard Biener  wrote:
> 
> On Thu, Nov 2, 2023 at 2:50 PM Qing Zhao  wrote:
>> 
>> 
>> 
>>> On Nov 2, 2023, at 3:57 AM, Richard Biener  
>>> wrote:
>>> 
>>> On Wed, Nov 1, 2023 at 3:47 PM Qing Zhao  wrote:
 
 
 
> On Oct 31, 2023, at 6:14 PM, Joseph Myers  wrote:
> 
> On Tue, 31 Oct 2023, Qing Zhao wrote:
> 
>> 2.3 A new semantic requirement in the user documentation of "counted_by"
>> 
>> For the following structure including a FAM with a counted_by attribute:
>> 
>> struct A
>> {
>> size_t size;
>> char buf[] __attribute__((counted_by(size)));
>> };
>> 
>> for any object with such type:
>> 
>> struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>> 
>> The setting to the size field should be done before the first reference
>> to the FAM field.
>> 
>> Such requirement to the user will guarantee that the first reference to
>> the FAM knows the size of the FAM.
>> 
>> We need to add this additional requirement to the user document.
> 
> Make sure the manual is very specific about exactly when size is
> considered to be an accurate representation of the space available for buf
> (given that, after malloc or realloc, it's going to be temporarily
> inaccurate).  If the intent is that inaccurate size at such a time means
> undefined behavior, say so explicitly.
 
 Yes, good point. We need to define this clearly in the beginning.
 We need to explicit say that
 
 the size of the FAM is defined by the latest “counted_by” value. And it’s 
 an undefined behavior when the size field is not defined when the FAM is 
 referenced.
 
 Is the above good enough?
 
 
> 
>> 2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE
>> 
>> In C FE:
>> 
>> for every reference to a FAM, for example, "obj->buf" in the small 
>> example,
>> check whether the corresponding FIELD_DECL has a "counted_by" attribute?
>> if YES, replace the reference to "obj->buf" with a call to
>>.ACCESS_WITH_SIZE (obj->buf, obj->size, -1);
> 
> This seems plausible - but you should also consider the case of static
> initializers - remember the GNU extension for statically allocated objects
> with flexible array members (unless you're not allowing it with
> counted_by).
> 
> static struct A x = { sizeof "hello", "hello" };
> static char *y = 
> 
> I'd expect that to be valid - and unless you say such a usage is invalid,
 
 At this moment, I think that this should be valid.
 
 I,e, the following:
 
 struct A
 {
 size_t size;
 char buf[] __attribute__((counted_by(size)));
 };
 
 static struct A x = {sizeof "hello", "hello”};
 
 Should be valid, and x.size represents the number of elements of x.buf.
 Both x.size and x.buf are initialized statically.
 
> you should avoid the replacement in such a static initializer context when
> the FAM reference is to an object with a constant address (if
> .ACCESS_WITH_SIZE would not act as an lvalue whose address is a constant
> expression; if it works fine as a constant-address lvalue, then the
> replacement would be OK).
 
 Then if such usage for the “counted_by” is valid, we need to replace the 
 FAM
 reference by a call to  .ACCESS_WITH_SIZE as well.
 Otherwise the “counted_by” relationship will be lost to the Middle end.
 
 With the current definition of .ACCESS_WITH_SIZE
 
 PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
 
 Isn’t the PTR (return value of the call) a LVALUE?
>>> 
>>> You probably want to specify that when a pointer to the array is taken the
>>> pointer has to be to the first array element (or do we want to mangle the
>>> 'size' accordingly for the instrumentation?).
>> 
>> Yes. Will add this into the user documentation.
>> 
>>> You also want to specify that
>>> the 'size' associated with such pointer is assumed to be unchanging and
>>> after changing the size such pointer has to be re-obtained.
>> 
>> What do you mean by “re-obtained”?
> 
> do
> 
> p = [0];
> 
> after any adjustments to 'array' or 'len' again and base further accesses on
> the new 'p'.


Then for the following example form Kees:

struct foo *f;
char *p;
int i;

f = alloc(maximum_possible);
f->count = 0;
p = f->buf;

for (i; data_is_available() && i < maximum_possible; i++) {
f->count ++;
p[i] = next_data_item();
}

Will not work?

We have to change it as:

struct foo *f;
char *p;
int i;

f = alloc(maximum_possible);
f->count = 0;
p = f->buf;

for (i; data_is_available() && i < maximum_possible; i++) {
f->count ++;
   

Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-02 Thread Martin Uecker
Am Donnerstag, dem 02.11.2023 um 13:50 + schrieb Qing Zhao:
> 
> > On Nov 2, 2023, at 3:57 AM, Richard Biener  
> > wrote:
> > 
> > On Wed, Nov 1, 2023 at 3:47 PM Qing Zhao  wrote:
> > > 
> > > 
> > > 
> > > > On Oct 31, 2023, at 6:14 PM, Joseph Myers  
> > > > wrote:
> > > > 
> > > > On Tue, 31 Oct 2023, Qing Zhao wrote:
> > > > 
> > > > > 2.3 A new semantic requirement in the user documentation of 
> > > > > "counted_by"
> > > > > 
> > > > > For the following structure including a FAM with a counted_by 
> > > > > attribute:
> > > > > 
> > > > > struct A
> > > > > {
> > > > >  size_t size;
> > > > >  char buf[] __attribute__((counted_by(size)));
> > > > > };
> > > > > 
> > > > > for any object with such type:
> > > > > 
> > > > > struct A *obj = __builtin_malloc (sizeof(struct A) + sz * 
> > > > > sizeof(char));
> > > > > 
> > > > > The setting to the size field should be done before the first 
> > > > > reference
> > > > > to the FAM field.
> > > > > 
> > > > > Such requirement to the user will guarantee that the first reference 
> > > > > to
> > > > > the FAM knows the size of the FAM.
> > > > > 
> > > > > We need to add this additional requirement to the user document.
> > > > 
> > > > Make sure the manual is very specific about exactly when size is
> > > > considered to be an accurate representation of the space available for 
> > > > buf
> > > > (given that, after malloc or realloc, it's going to be temporarily
> > > > inaccurate).  If the intent is that inaccurate size at such a time means
> > > > undefined behavior, say so explicitly.
> > > 
> > > Yes, good point. We need to define this clearly in the beginning.
> > > We need to explicit say that
> > > 
> > > the size of the FAM is defined by the latest “counted_by” value. And it’s 
> > > an undefined behavior when the size field is not defined when the FAM is 
> > > referenced.
> > > 
> > > Is the above good enough?
> > > 
> > > 
> > > > 
> > > > > 2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE
> > > > > 
> > > > > In C FE:
> > > > > 
> > > > > for every reference to a FAM, for example, "obj->buf" in the small 
> > > > > example,
> > > > > check whether the corresponding FIELD_DECL has a "counted_by" 
> > > > > attribute?
> > > > > if YES, replace the reference to "obj->buf" with a call to
> > > > > .ACCESS_WITH_SIZE (obj->buf, obj->size, -1);
> > > > 
> > > > This seems plausible - but you should also consider the case of static
> > > > initializers - remember the GNU extension for statically allocated 
> > > > objects
> > > > with flexible array members (unless you're not allowing it with
> > > > counted_by).
> > > > 
> > > > static struct A x = { sizeof "hello", "hello" };
> > > > static char *y = 
> > > > 
> > > > I'd expect that to be valid - and unless you say such a usage is 
> > > > invalid,
> > > 
> > > At this moment, I think that this should be valid.
> > > 
> > > I,e, the following:
> > > 
> > > struct A
> > > {
> > > size_t size;
> > > char buf[] __attribute__((counted_by(size)));
> > > };
> > > 
> > > static struct A x = {sizeof "hello", "hello”};
> > > 
> > > Should be valid, and x.size represents the number of elements of x.buf.
> > > Both x.size and x.buf are initialized statically.
> > > 
> > > > you should avoid the replacement in such a static initializer context 
> > > > when
> > > > the FAM reference is to an object with a constant address (if
> > > > .ACCESS_WITH_SIZE would not act as an lvalue whose address is a constant
> > > > expression; if it works fine as a constant-address lvalue, then the
> > > > replacement would be OK).
> > > 
> > > Then if such usage for the “counted_by” is valid, we need to replace the 
> > > FAM
> > > reference by a call to  .ACCESS_WITH_SIZE as well.
> > > Otherwise the “counted_by” relationship will be lost to the Middle end.
> > > 
> > > With the current definition of .ACCESS_WITH_SIZE
> > > 
> > > PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
> > > 
> > > Isn’t the PTR (return value of the call) a LVALUE?
> > 
> > You probably want to specify that when a pointer to the array is taken the
> > pointer has to be to the first array element (or do we want to mangle the
> > 'size' accordingly for the instrumentation?).
> 
> Yes. Will add this into the user documentation.

This shouldn't be necessary. The object-size pass
can track pointer arithmeti if it comes after
inserting the .ACCESS_WITH_SIZE.

https://godbolt.org/z/fvc3aoPfd

> 
> >  You also want to specify that
> > the 'size' associated with such pointer is assumed to be unchanging and
> > after changing the size such pointer has to be re-obtained.
> 
> What do you mean by “re-obtained”? 
> 
> >  Plus that
> > changes to the allocated object/size have to be performed through an
> > lvalue where the containing type and thus the 'counted_by' attribute is
> > visible.
> 
> Through an lvalue with the containing type?
> 
> Yes, will add this too. 

I do not understand this.  It shouldn't matter how
it is 

Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-02 Thread Richard Biener
On Thu, Nov 2, 2023 at 2:50 PM Qing Zhao  wrote:
>
>
>
> > On Nov 2, 2023, at 3:57 AM, Richard Biener  
> > wrote:
> >
> > On Wed, Nov 1, 2023 at 3:47 PM Qing Zhao  wrote:
> >>
> >>
> >>
> >>> On Oct 31, 2023, at 6:14 PM, Joseph Myers  wrote:
> >>>
> >>> On Tue, 31 Oct 2023, Qing Zhao wrote:
> >>>
>  2.3 A new semantic requirement in the user documentation of "counted_by"
> 
>  For the following structure including a FAM with a counted_by attribute:
> 
>  struct A
>  {
>   size_t size;
>   char buf[] __attribute__((counted_by(size)));
>  };
> 
>  for any object with such type:
> 
>  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> 
>  The setting to the size field should be done before the first reference
>  to the FAM field.
> 
>  Such requirement to the user will guarantee that the first reference to
>  the FAM knows the size of the FAM.
> 
>  We need to add this additional requirement to the user document.
> >>>
> >>> Make sure the manual is very specific about exactly when size is
> >>> considered to be an accurate representation of the space available for buf
> >>> (given that, after malloc or realloc, it's going to be temporarily
> >>> inaccurate).  If the intent is that inaccurate size at such a time means
> >>> undefined behavior, say so explicitly.
> >>
> >> Yes, good point. We need to define this clearly in the beginning.
> >> We need to explicit say that
> >>
> >> the size of the FAM is defined by the latest “counted_by” value. And it’s 
> >> an undefined behavior when the size field is not defined when the FAM is 
> >> referenced.
> >>
> >> Is the above good enough?
> >>
> >>
> >>>
>  2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE
> 
>  In C FE:
> 
>  for every reference to a FAM, for example, "obj->buf" in the small 
>  example,
>  check whether the corresponding FIELD_DECL has a "counted_by" attribute?
>  if YES, replace the reference to "obj->buf" with a call to
>  .ACCESS_WITH_SIZE (obj->buf, obj->size, -1);
> >>>
> >>> This seems plausible - but you should also consider the case of static
> >>> initializers - remember the GNU extension for statically allocated objects
> >>> with flexible array members (unless you're not allowing it with
> >>> counted_by).
> >>>
> >>> static struct A x = { sizeof "hello", "hello" };
> >>> static char *y = 
> >>>
> >>> I'd expect that to be valid - and unless you say such a usage is invalid,
> >>
> >> At this moment, I think that this should be valid.
> >>
> >> I,e, the following:
> >>
> >> struct A
> >> {
> >> size_t size;
> >> char buf[] __attribute__((counted_by(size)));
> >> };
> >>
> >> static struct A x = {sizeof "hello", "hello”};
> >>
> >> Should be valid, and x.size represents the number of elements of x.buf.
> >> Both x.size and x.buf are initialized statically.
> >>
> >>> you should avoid the replacement in such a static initializer context when
> >>> the FAM reference is to an object with a constant address (if
> >>> .ACCESS_WITH_SIZE would not act as an lvalue whose address is a constant
> >>> expression; if it works fine as a constant-address lvalue, then the
> >>> replacement would be OK).
> >>
> >> Then if such usage for the “counted_by” is valid, we need to replace the 
> >> FAM
> >> reference by a call to  .ACCESS_WITH_SIZE as well.
> >> Otherwise the “counted_by” relationship will be lost to the Middle end.
> >>
> >> With the current definition of .ACCESS_WITH_SIZE
> >>
> >> PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
> >>
> >> Isn’t the PTR (return value of the call) a LVALUE?
> >
> > You probably want to specify that when a pointer to the array is taken the
> > pointer has to be to the first array element (or do we want to mangle the
> > 'size' accordingly for the instrumentation?).
>
> Yes. Will add this into the user documentation.
>
> >  You also want to specify that
> > the 'size' associated with such pointer is assumed to be unchanging and
> > after changing the size such pointer has to be re-obtained.
>
> What do you mean by “re-obtained”?

do

p = [0];

after any adjustments to 'array' or 'len' again and base further accesses on
the new 'p'.

> >  Plus that
> > changes to the allocated object/size have to be performed through an
> > lvalue where the containing type and thus the 'counted_by' attribute is
> > visible.
>
> Through an lvalue with the containing type?
>
> Yes, will add this too.
>
>
> >  That is,
> >
> > size_t *s = 
> > *s = 1;
> >
> > is invoking undefined behavior,
>
> right.
>
> > likewise modifying 'buf' (makes it a bit
> > awkward since for example that wouldn't support using posix_memalign
> > for allocation, though aligned_alloc would be fine).
> Is there a small example for the undefined behavior for this?

a.len = len;
posix_memalign (, 16, len);

we would probably have to somehow instrument this.

> Qing
> >
> > Richard.
> >
> >> 

Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-02 Thread Qing Zhao


> On Nov 2, 2023, at 3:57 AM, Richard Biener  wrote:
> 
> On Wed, Nov 1, 2023 at 3:47 PM Qing Zhao  wrote:
>> 
>> 
>> 
>>> On Oct 31, 2023, at 6:14 PM, Joseph Myers  wrote:
>>> 
>>> On Tue, 31 Oct 2023, Qing Zhao wrote:
>>> 
 2.3 A new semantic requirement in the user documentation of "counted_by"
 
 For the following structure including a FAM with a counted_by attribute:
 
 struct A
 {
  size_t size;
  char buf[] __attribute__((counted_by(size)));
 };
 
 for any object with such type:
 
 struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
 
 The setting to the size field should be done before the first reference
 to the FAM field.
 
 Such requirement to the user will guarantee that the first reference to
 the FAM knows the size of the FAM.
 
 We need to add this additional requirement to the user document.
>>> 
>>> Make sure the manual is very specific about exactly when size is
>>> considered to be an accurate representation of the space available for buf
>>> (given that, after malloc or realloc, it's going to be temporarily
>>> inaccurate).  If the intent is that inaccurate size at such a time means
>>> undefined behavior, say so explicitly.
>> 
>> Yes, good point. We need to define this clearly in the beginning.
>> We need to explicit say that
>> 
>> the size of the FAM is defined by the latest “counted_by” value. And it’s an 
>> undefined behavior when the size field is not defined when the FAM is 
>> referenced.
>> 
>> Is the above good enough?
>> 
>> 
>>> 
 2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE
 
 In C FE:
 
 for every reference to a FAM, for example, "obj->buf" in the small example,
 check whether the corresponding FIELD_DECL has a "counted_by" attribute?
 if YES, replace the reference to "obj->buf" with a call to
 .ACCESS_WITH_SIZE (obj->buf, obj->size, -1);
>>> 
>>> This seems plausible - but you should also consider the case of static
>>> initializers - remember the GNU extension for statically allocated objects
>>> with flexible array members (unless you're not allowing it with
>>> counted_by).
>>> 
>>> static struct A x = { sizeof "hello", "hello" };
>>> static char *y = 
>>> 
>>> I'd expect that to be valid - and unless you say such a usage is invalid,
>> 
>> At this moment, I think that this should be valid.
>> 
>> I,e, the following:
>> 
>> struct A
>> {
>> size_t size;
>> char buf[] __attribute__((counted_by(size)));
>> };
>> 
>> static struct A x = {sizeof "hello", "hello”};
>> 
>> Should be valid, and x.size represents the number of elements of x.buf.
>> Both x.size and x.buf are initialized statically.
>> 
>>> you should avoid the replacement in such a static initializer context when
>>> the FAM reference is to an object with a constant address (if
>>> .ACCESS_WITH_SIZE would not act as an lvalue whose address is a constant
>>> expression; if it works fine as a constant-address lvalue, then the
>>> replacement would be OK).
>> 
>> Then if such usage for the “counted_by” is valid, we need to replace the FAM
>> reference by a call to  .ACCESS_WITH_SIZE as well.
>> Otherwise the “counted_by” relationship will be lost to the Middle end.
>> 
>> With the current definition of .ACCESS_WITH_SIZE
>> 
>> PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
>> 
>> Isn’t the PTR (return value of the call) a LVALUE?
> 
> You probably want to specify that when a pointer to the array is taken the
> pointer has to be to the first array element (or do we want to mangle the
> 'size' accordingly for the instrumentation?).

Yes. Will add this into the user documentation.

>  You also want to specify that
> the 'size' associated with such pointer is assumed to be unchanging and
> after changing the size such pointer has to be re-obtained.

What do you mean by “re-obtained”? 

>  Plus that
> changes to the allocated object/size have to be performed through an
> lvalue where the containing type and thus the 'counted_by' attribute is
> visible.

Through an lvalue with the containing type?

Yes, will add this too. 


>  That is,
> 
> size_t *s = 
> *s = 1;
> 
> is invoking undefined behavior,

right.

> likewise modifying 'buf' (makes it a bit
> awkward since for example that wouldn't support using posix_memalign
> for allocation, though aligned_alloc would be fine).
Is there a small example for the undefined behavior for this?

Qing
> 
> Richard.
> 
>> Qing
>>> 
>>> --
>>> Joseph S. Myers
>>> jos...@codesourcery.com
>> 



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-02 Thread Jakub Jelinek
On Thu, Nov 02, 2023 at 12:52:50PM +0100, Richard Biener wrote:
> > What I meant is to emit
> > tmp_4 = .ACCESS_WITH_SIZE ([0], , (typeof ()) 0);
> > p_5 = _4[2];
> > i.e. don't associate the pointer with a value of the size, but with
> > an address where to find the size (plus how large it is), basically escape
> > pointer to the size at that point.  And __builtin_dynamic_object_size is 
> > pure,
> > so supposedly it can depend on what the escaped pointer points to.
> 
> Well, yeah - that would work but depend on .ACCESS_WITH_SIZE being an
> escape point (quite bad IMHO)

That is why I've said we need to decide what cost we want to suffer because
of that.

> and __builtin_dynamic_object_size being
> non-const (that's probably not too bad).

It is already pure,leaf,nothrow (unlike __builtin_object_size which is obviously
const,leaf,nothrow).  Because under the hood, it can read memory when
expanded.

> > We'd see that a particular pointer is size associated with  address
> > and would use that address cast to the type of the third argument (to
> > preserve the exact pointer type on INTEGER_CST, though not sure, wouldn't
> > VN CSE it anyway if one has say
> > union U { struct S { int a; char b __attribute__((counted_by (a))) []; } s;
> >   struct T { char c, d, e, f; char g __attribute__((counted_by 
> > (c))) []; } t; };
> > and
> > .ACCESS_WITH_SIZE ([0], , (int *) 0);
> > ...
> > .ACCESS_WITH_SIZE ([0], , (int *) 0);
> > ?
> 
> We'd probably CSE that - the usual issue of address-with-same-value.
> 
> > It would mean though that counted_by wouldn't be allowed to be a
> > bit-field...
> 
> Yup.  We could also pass a pointer to the container though, that's good enough
> for the escape, and pass the size by value in addition to that.

I was wondering about stuff like _BitInt.  But sure, counted_by is just an
extension, we can just refuse counting by _BitInt in addition to counting by
floating point, pointers, aggregates, bit-fields, or we could somehow encode
all the needed type's properties numerically into an integral constant.
Similarly for alias set (unless it uses 0 for reads).

Jakub



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-02 Thread Richard Biener
On Thu, Nov 2, 2023 at 11:40 AM Jakub Jelinek  wrote:
>
> On Thu, Nov 02, 2023 at 11:18:09AM +0100, Richard Biener wrote:
> > > Or, if we want to pay further price, .ACCESS_WITH_SIZE could take as one 
> > > of
> > > the arguments not the size value, but its address.  Then at __bdos time
> > > we would dereference that pointer to get the size.
> > > So,
> > > struct S { int a; char b __attribute__((counted_by (a))) []; };
> > > struct S s;
> > > s.a = 5;
> > > char *p = [2];
> > > int i1 = __builtin_dynamic_object_size (p, 0);
> > > s.a = 3;
> > > int i2 = __builtin_dynamic_object_size (p, 0);
> > > would then yield 3 and 1 rather than 3 and 3.
> >
> > I fail to see how we can get the __builtin_dynamic_object_size call
> > data dependent on s.a, thus avoid re-ordering or even DSE of the
> > store.
>
> If [2] is lowered as
> sz_1 = s.a;
> tmp_2 = .ACCESS_WITH_SIZE ([0], sz_1);
> p_3 = _2[2];
> then sure, there is no way, you get the size from that point.
> tree-object-size.cc tracking then determines that in a particular
> case the pointer is size associated with sz_1 and use that value
> as the size (with the usual adjustments for pointer arithmetics and the
> like).
>
> What I meant is to emit
> tmp_4 = .ACCESS_WITH_SIZE ([0], , (typeof ()) 0);
> p_5 = _4[2];
> i.e. don't associate the pointer with a value of the size, but with
> an address where to find the size (plus how large it is), basically escape
> pointer to the size at that point.  And __builtin_dynamic_object_size is pure,
> so supposedly it can depend on what the escaped pointer points to.

Well, yeah - that would work but depend on .ACCESS_WITH_SIZE being an
escape point (quite bad IMHO) and __builtin_dynamic_object_size being
non-const (that's probably not too bad).

> We'd see that a particular pointer is size associated with  address
> and would use that address cast to the type of the third argument (to
> preserve the exact pointer type on INTEGER_CST, though not sure, wouldn't
> VN CSE it anyway if one has say
> union U { struct S { int a; char b __attribute__((counted_by (a))) []; } s;
>   struct T { char c, d, e, f; char g __attribute__((counted_by (c))) 
> []; } t; };
> and
> .ACCESS_WITH_SIZE ([0], , (int *) 0);
> ...
> .ACCESS_WITH_SIZE ([0], , (int *) 0);
> ?

We'd probably CSE that - the usual issue of address-with-same-value.

> It would mean though that counted_by wouldn't be allowed to be a
> bit-field...

Yup.  We could also pass a pointer to the container though, that's good enough
for the escape, and pass the size by value in addition to that.

> Jakub
>


Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-02 Thread Jakub Jelinek
On Thu, Nov 02, 2023 at 11:18:09AM +0100, Richard Biener wrote:
> > Or, if we want to pay further price, .ACCESS_WITH_SIZE could take as one of
> > the arguments not the size value, but its address.  Then at __bdos time
> > we would dereference that pointer to get the size.
> > So,
> > struct S { int a; char b __attribute__((counted_by (a))) []; };
> > struct S s;
> > s.a = 5;
> > char *p = [2];
> > int i1 = __builtin_dynamic_object_size (p, 0);
> > s.a = 3;
> > int i2 = __builtin_dynamic_object_size (p, 0);
> > would then yield 3 and 1 rather than 3 and 3.
> 
> I fail to see how we can get the __builtin_dynamic_object_size call
> data dependent on s.a, thus avoid re-ordering or even DSE of the
> store.

If [2] is lowered as
sz_1 = s.a;
tmp_2 = .ACCESS_WITH_SIZE ([0], sz_1);
p_3 = _2[2];
then sure, there is no way, you get the size from that point.
tree-object-size.cc tracking then determines that in a particular
case the pointer is size associated with sz_1 and use that value
as the size (with the usual adjustments for pointer arithmetics and the
like).

What I meant is to emit
tmp_4 = .ACCESS_WITH_SIZE ([0], , (typeof ()) 0);
p_5 = _4[2];
i.e. don't associate the pointer with a value of the size, but with
an address where to find the size (plus how large it is), basically escape
pointer to the size at that point.  And __builtin_dynamic_object_size is pure,
so supposedly it can depend on what the escaped pointer points to.
We'd see that a particular pointer is size associated with  address
and would use that address cast to the type of the third argument (to
preserve the exact pointer type on INTEGER_CST, though not sure, wouldn't
VN CSE it anyway if one has say
union U { struct S { int a; char b __attribute__((counted_by (a))) []; } s;
  struct T { char c, d, e, f; char g __attribute__((counted_by (c))) 
[]; } t; };
and
.ACCESS_WITH_SIZE ([0], , (int *) 0);
...
.ACCESS_WITH_SIZE ([0], , (int *) 0);
?

It would mean though that counted_by wouldn't be allowed to be a
bit-field...

Jakub



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-02 Thread Richard Biener
On Thu, Nov 2, 2023 at 9:27 AM Jakub Jelinek  wrote:
>
> On Thu, Nov 02, 2023 at 08:57:36AM +0100, Richard Biener wrote:
> > You probably want to specify that when a pointer to the array is taken the
> > pointer has to be to the first array element (or do we want to mangle the
> > 'size' accordingly for the instrumentation?).  You also want to specify that
> > the 'size' associated with such pointer is assumed to be unchanging and
> > after changing the size such pointer has to be re-obtained.  Plus that
> > changes to the allocated object/size have to be performed through an
> > lvalue where the containing type and thus the 'counted_by' attribute is
> > visible.  That is,
> >
> > size_t *s = 
> > *s = 1;
> >
> > is invoking undefined behavior, likewise modifying 'buf' (makes it a bit
> > awkward since for example that wouldn't support using posix_memalign
> > for allocation, though aligned_alloc would be fine).
>
> Depends on what behavior we want to guarantee and what kind of price we want
> to pay for it.  If the size is .ACCESS_WITH_SIZE operand, the size used in
> __bdos will be whatever counted_by size an array had upon taking address of
> the array, wherever that happens in the program.  And while we can CSE
> the calls, they'd be CSEd only if they have the same size.
>
> Or, if we want to pay further price, .ACCESS_WITH_SIZE could take as one of
> the arguments not the size value, but its address.  Then at __bdos time
> we would dereference that pointer to get the size.
> So,
> struct S { int a; char b __attribute__((counted_by (a))) []; };
> struct S s;
> s.a = 5;
> char *p = [2];
> int i1 = __builtin_dynamic_object_size (p, 0);
> s.a = 3;
> int i2 = __builtin_dynamic_object_size (p, 0);
> would then yield 3 and 1 rather than 3 and 3.

I fail to see how we can get the __builtin_dynamic_object_size call
data dependent on s.a, thus avoid re-ordering or even DSE of the
store.

Basically the model is that __builtin_dynamic_object_size will get
you the size at the point 'p' was formed from something that "last"
had the container with the counted_by attribute visible (plus adjustments
to 'p' inbetween that we are able to track).

s.a = 5;
char *p = [0];

will get you '5' as size,

char *p = [0];
s.a = 7;

will get you whatever was in 's.a' at the point of the address taking,
s.a  = 7 will _not_ be honored for __builtin_dynamic_object_size
calls on 'p'.

>  But dunno if we wouldn't
> need to drop leaf attribute from __bdos to make that work, that would be
> I think a significant case against doing that, because while in all the
> current plans one just pay code performance price when using counted_by
> attribute, even when not using __bdos for it, if we had to make __bdos
> non-leaf we'd pay extra price even when nobody is using that attribute
> just in -D_FORTIFY_SOURCE=3 / -fhardened compilations, which is how
> several distros build basically everything.
>
> Jakub
>


Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-02 Thread Jakub Jelinek
On Thu, Nov 02, 2023 at 08:57:36AM +0100, Richard Biener wrote:
> You probably want to specify that when a pointer to the array is taken the
> pointer has to be to the first array element (or do we want to mangle the
> 'size' accordingly for the instrumentation?).  You also want to specify that
> the 'size' associated with such pointer is assumed to be unchanging and
> after changing the size such pointer has to be re-obtained.  Plus that
> changes to the allocated object/size have to be performed through an
> lvalue where the containing type and thus the 'counted_by' attribute is
> visible.  That is,
> 
> size_t *s = 
> *s = 1;
> 
> is invoking undefined behavior, likewise modifying 'buf' (makes it a bit
> awkward since for example that wouldn't support using posix_memalign
> for allocation, though aligned_alloc would be fine).

Depends on what behavior we want to guarantee and what kind of price we want
to pay for it.  If the size is .ACCESS_WITH_SIZE operand, the size used in
__bdos will be whatever counted_by size an array had upon taking address of
the array, wherever that happens in the program.  And while we can CSE
the calls, they'd be CSEd only if they have the same size.

Or, if we want to pay further price, .ACCESS_WITH_SIZE could take as one of
the arguments not the size value, but its address.  Then at __bdos time
we would dereference that pointer to get the size.
So,
struct S { int a; char b __attribute__((counted_by (a))) []; };
struct S s;
s.a = 5;
char *p = [2];
int i1 = __builtin_dynamic_object_size (p, 0);
s.a = 3;
int i2 = __builtin_dynamic_object_size (p, 0);
would then yield 3 and 1 rather than 3 and 3.  But dunno if we wouldn't
need to drop leaf attribute from __bdos to make that work, that would be
I think a significant case against doing that, because while in all the
current plans one just pay code performance price when using counted_by
attribute, even when not using __bdos for it, if we had to make __bdos
non-leaf we'd pay extra price even when nobody is using that attribute
just in -D_FORTIFY_SOURCE=3 / -fhardened compilations, which is how
several distros build basically everything.

Jakub



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-02 Thread Richard Biener
On Wed, Nov 1, 2023 at 3:47 PM Qing Zhao  wrote:
>
>
>
> > On Oct 31, 2023, at 6:14 PM, Joseph Myers  wrote:
> >
> > On Tue, 31 Oct 2023, Qing Zhao wrote:
> >
> >> 2.3 A new semantic requirement in the user documentation of "counted_by"
> >>
> >> For the following structure including a FAM with a counted_by attribute:
> >>
> >>  struct A
> >>  {
> >>   size_t size;
> >>   char buf[] __attribute__((counted_by(size)));
> >>  };
> >>
> >> for any object with such type:
> >>
> >>  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> >>
> >> The setting to the size field should be done before the first reference
> >> to the FAM field.
> >>
> >> Such requirement to the user will guarantee that the first reference to
> >> the FAM knows the size of the FAM.
> >>
> >> We need to add this additional requirement to the user document.
> >
> > Make sure the manual is very specific about exactly when size is
> > considered to be an accurate representation of the space available for buf
> > (given that, after malloc or realloc, it's going to be temporarily
> > inaccurate).  If the intent is that inaccurate size at such a time means
> > undefined behavior, say so explicitly.
>
> Yes, good point. We need to define this clearly in the beginning.
> We need to explicit say that
>
> the size of the FAM is defined by the latest “counted_by” value. And it’s an 
> undefined behavior when the size field is not defined when the FAM is 
> referenced.
>
> Is the above good enough?
>
>
> >
> >> 2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE
> >>
> >> In C FE:
> >>
> >> for every reference to a FAM, for example, "obj->buf" in the small example,
> >>  check whether the corresponding FIELD_DECL has a "counted_by" attribute?
> >>  if YES, replace the reference to "obj->buf" with a call to
> >>  .ACCESS_WITH_SIZE (obj->buf, obj->size, -1);
> >
> > This seems plausible - but you should also consider the case of static
> > initializers - remember the GNU extension for statically allocated objects
> > with flexible array members (unless you're not allowing it with
> > counted_by).
> >
> > static struct A x = { sizeof "hello", "hello" };
> > static char *y = 
> >
> > I'd expect that to be valid - and unless you say such a usage is invalid,
>
> At this moment, I think that this should be valid.
>
> I,e, the following:
>
> struct A
> {
>  size_t size;
>  char buf[] __attribute__((counted_by(size)));
> };
>
> static struct A x = {sizeof "hello", "hello”};
>
> Should be valid, and x.size represents the number of elements of x.buf.
> Both x.size and x.buf are initialized statically.
>
> > you should avoid the replacement in such a static initializer context when
> > the FAM reference is to an object with a constant address (if
> > .ACCESS_WITH_SIZE would not act as an lvalue whose address is a constant
> > expression; if it works fine as a constant-address lvalue, then the
> > replacement would be OK).
>
> Then if such usage for the “counted_by” is valid, we need to replace the FAM
> reference by a call to  .ACCESS_WITH_SIZE as well.
> Otherwise the “counted_by” relationship will be lost to the Middle end.
>
> With the current definition of .ACCESS_WITH_SIZE
>
> PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
>
> Isn’t the PTR (return value of the call) a LVALUE?

You probably want to specify that when a pointer to the array is taken the
pointer has to be to the first array element (or do we want to mangle the
'size' accordingly for the instrumentation?).  You also want to specify that
the 'size' associated with such pointer is assumed to be unchanging and
after changing the size such pointer has to be re-obtained.  Plus that
changes to the allocated object/size have to be performed through an
lvalue where the containing type and thus the 'counted_by' attribute is
visible.  That is,

size_t *s = 
*s = 1;

is invoking undefined behavior, likewise modifying 'buf' (makes it a bit
awkward since for example that wouldn't support using posix_memalign
for allocation, though aligned_alloc would be fine).

Richard.

> Qing
> >
> > --
> > Joseph S. Myers
> > jos...@codesourcery.com
>


Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-01 Thread Qing Zhao


> On Nov 1, 2023, at 11:00 AM, Martin Uecker  wrote:
> 
> Am Mittwoch, dem 01.11.2023 um 14:47 + schrieb Qing Zhao:
>> 
>>> On Oct 31, 2023, at 6:14 PM, Joseph Myers  wrote:
>>> 
>>> On Tue, 31 Oct 2023, Qing Zhao wrote:
>>> 
 2.3 A new semantic requirement in the user documentation of "counted_by"
 
 For the following structure including a FAM with a counted_by attribute:
 
 struct A
 {
  size_t size;
  char buf[] __attribute__((counted_by(size)));
 };
 
 for any object with such type:
 
 struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
 
 The setting to the size field should be done before the first reference 
 to the FAM field.
 
 Such requirement to the user will guarantee that the first reference to 
 the FAM knows the size of the FAM.
 
 We need to add this additional requirement to the user document.
>>> 
>>> Make sure the manual is very specific about exactly when size is 
>>> considered to be an accurate representation of the space available for buf 
>>> (given that, after malloc or realloc, it's going to be temporarily 
>>> inaccurate).  If the intent is that inaccurate size at such a time means 
>>> undefined behavior, say so explicitly.
>> 
>> Yes, good point. We need to define this clearly in the beginning. 
>> We need to explicit say that 
>> 
>> the size of the FAM is defined by the latest “counted_by” value. And it’s an 
>> undefined behavior when the size field is not defined when the FAM is 
>> referenced.
> 
> It is defined by the latest "counted_by" value before x.buf
> is referenced, but not the latest before x.buf is dereferenced.

Then:

The size of the FAM is defined by the latest “counted_by” value before the FAM 
is referenced. 
It’s an undefined behavior when the “counted_by” value is not initialized 
before the FAM is referenced. 

> 
>> 
>> Is the above good enough?
>> 
>> 
>>> 
 2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE
 
 In C FE:
 
 for every reference to a FAM, for example, "obj->buf" in the small example,
 check whether the corresponding FIELD_DECL has a "counted_by" attribute?
 if YES, replace the reference to "obj->buf" with a call to
 .ACCESS_WITH_SIZE (obj->buf, obj->size, -1); 
>>> 
>>> This seems plausible - but you should also consider the case of static 
>>> initializers - remember the GNU extension for statically allocated objects 
>>> with flexible array members (unless you're not allowing it with 
>>> counted_by).
>>> 
>>> static struct A x = { sizeof "hello", "hello" };
>>> static char *y = 
>>> 
>>> I'd expect that to be valid - and unless you say such a usage is invalid, 
>> 
>> At this moment, I think that this should be valid.
>> 
>> I,e, the following:
>> 
>> struct A
>> {
>> size_t size;
>> char buf[] __attribute__((counted_by(size)));
>> };
>> 
>> static struct A x = {sizeof "hello", "hello”};
>> 
>> Should be valid, and x.size represents the number of elements of x.buf. 
>> Both x.size and x.buf are initialized statically. 
> 
> Joseph is talking about the compile-time initialization of y.

Okay, -:) 
so, this is the point where the x.buf is referenced,
 and I think that replacing this reference to a call to .ACCESS_WITH_SIZE is 
still needed.
Otherwise, the “counted_by” relationship will NOT be seen by the middle-end 
anymore.


> 
>> 
>>> you should avoid the replacement in such a static initializer context when 
>>> the FAM reference is to an object with a constant address (if 
>>> .ACCESS_WITH_SIZE would not act as an lvalue whose address is a constant 
>>> expression; if it works fine as a constant-address lvalue, then the 
>>> replacement would be OK).
>> 
>> Then if such usage for the “counted_by” is valid, we need to replace the FAM 
>> reference by a call to  .ACCESS_WITH_SIZE as well.
>> Otherwise the “counted_by” relationship will be lost to the Middle end. 
>> 
>> With the current definition of .ACCESS_WITH_SIZE
>> 
>> PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
>> 
>> Isn’t the PTR (return value of the call) a LVALUE? 
> 
> The question is whether we get an address constant
> that can be used for compile-time initialization.

Oh, I see.

So, now, PTR is already an constant at FE, the replacement will be

.ACCESS_WITH_SIZE( CONSTANT_ADDRESS, SIZE, ACCESS_MODE)

This looks awkward….
Should we allow this?

If not allowed, then the “counted_by” attribute will not work for the static 
initialization. 

> 
> I think it would be good to collect a list of test
> cases and to include this example.

Yes, I will put this into the testing case list.

Qing
> 
> Martin
> 
>> 
>> Qing
>>> 
>>> -- 
>>> Joseph S. Myers
>>> jos...@codesourcery.com



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-01 Thread Martin Uecker
Am Mittwoch, dem 01.11.2023 um 14:47 + schrieb Qing Zhao:
> 
> > On Oct 31, 2023, at 6:14 PM, Joseph Myers  wrote:
> > 
> > On Tue, 31 Oct 2023, Qing Zhao wrote:
> > 
> > > 2.3 A new semantic requirement in the user documentation of "counted_by"
> > > 
> > > For the following structure including a FAM with a counted_by attribute:
> > > 
> > >  struct A
> > >  {
> > >   size_t size;
> > >   char buf[] __attribute__((counted_by(size)));
> > >  };
> > > 
> > > for any object with such type:
> > > 
> > >  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> > > 
> > > The setting to the size field should be done before the first reference 
> > > to the FAM field.
> > > 
> > > Such requirement to the user will guarantee that the first reference to 
> > > the FAM knows the size of the FAM.
> > > 
> > > We need to add this additional requirement to the user document.
> > 
> > Make sure the manual is very specific about exactly when size is 
> > considered to be an accurate representation of the space available for buf 
> > (given that, after malloc or realloc, it's going to be temporarily 
> > inaccurate).  If the intent is that inaccurate size at such a time means 
> > undefined behavior, say so explicitly.
> 
> Yes, good point. We need to define this clearly in the beginning. 
> We need to explicit say that 
> 
> the size of the FAM is defined by the latest “counted_by” value. And it’s an 
> undefined behavior when the size field is not defined when the FAM is 
> referenced.

It is defined by the latest "counted_by" value before x.buf
is referenced, but not the latest before x.buf is dereferenced.

> 
> Is the above good enough?
> 
> 
> > 
> > > 2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE
> > > 
> > > In C FE:
> > > 
> > > for every reference to a FAM, for example, "obj->buf" in the small 
> > > example,
> > >  check whether the corresponding FIELD_DECL has a "counted_by" attribute?
> > >  if YES, replace the reference to "obj->buf" with a call to
> > >  .ACCESS_WITH_SIZE (obj->buf, obj->size, -1); 
> > 
> > This seems plausible - but you should also consider the case of static 
> > initializers - remember the GNU extension for statically allocated objects 
> > with flexible array members (unless you're not allowing it with 
> > counted_by).
> > 
> > static struct A x = { sizeof "hello", "hello" };
> > static char *y = 
> > 
> > I'd expect that to be valid - and unless you say such a usage is invalid, 
> 
> At this moment, I think that this should be valid.
> 
> I,e, the following:
> 
> struct A
> {
>  size_t size;
>  char buf[] __attribute__((counted_by(size)));
> };
> 
> static struct A x = {sizeof "hello", "hello”};
> 
> Should be valid, and x.size represents the number of elements of x.buf. 
> Both x.size and x.buf are initialized statically. 

Joseph is talking about the compile-time initialization of y.

> 
> > you should avoid the replacement in such a static initializer context when 
> > the FAM reference is to an object with a constant address (if 
> > .ACCESS_WITH_SIZE would not act as an lvalue whose address is a constant 
> > expression; if it works fine as a constant-address lvalue, then the 
> > replacement would be OK).
> 
> Then if such usage for the “counted_by” is valid, we need to replace the FAM 
> reference by a call to  .ACCESS_WITH_SIZE as well.
> Otherwise the “counted_by” relationship will be lost to the Middle end. 
> 
> With the current definition of .ACCESS_WITH_SIZE
> 
> PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
> 
> Isn’t the PTR (return value of the call) a LVALUE? 

The question is whether we get an address constant
that can be used for compile-time initialization.

I think it would be good to collect a list of test
cases and to include this example.

Martin

> 
> Qing
> > 
> > -- 
> > Joseph S. Myers
> > jos...@codesourcery.com
> 



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-01 Thread Qing Zhao


> On Oct 31, 2023, at 6:14 PM, Joseph Myers  wrote:
> 
> On Tue, 31 Oct 2023, Qing Zhao wrote:
> 
>> 2.3 A new semantic requirement in the user documentation of "counted_by"
>> 
>> For the following structure including a FAM with a counted_by attribute:
>> 
>>  struct A
>>  {
>>   size_t size;
>>   char buf[] __attribute__((counted_by(size)));
>>  };
>> 
>> for any object with such type:
>> 
>>  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>> 
>> The setting to the size field should be done before the first reference 
>> to the FAM field.
>> 
>> Such requirement to the user will guarantee that the first reference to 
>> the FAM knows the size of the FAM.
>> 
>> We need to add this additional requirement to the user document.
> 
> Make sure the manual is very specific about exactly when size is 
> considered to be an accurate representation of the space available for buf 
> (given that, after malloc or realloc, it's going to be temporarily 
> inaccurate).  If the intent is that inaccurate size at such a time means 
> undefined behavior, say so explicitly.

Yes, good point. We need to define this clearly in the beginning. 
We need to explicit say that 

the size of the FAM is defined by the latest “counted_by” value. And it’s an 
undefined behavior when the size field is not defined when the FAM is 
referenced.

Is the above good enough?


> 
>> 2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE
>> 
>> In C FE:
>> 
>> for every reference to a FAM, for example, "obj->buf" in the small example,
>>  check whether the corresponding FIELD_DECL has a "counted_by" attribute?
>>  if YES, replace the reference to "obj->buf" with a call to
>>  .ACCESS_WITH_SIZE (obj->buf, obj->size, -1); 
> 
> This seems plausible - but you should also consider the case of static 
> initializers - remember the GNU extension for statically allocated objects 
> with flexible array members (unless you're not allowing it with 
> counted_by).
> 
> static struct A x = { sizeof "hello", "hello" };
> static char *y = 
> 
> I'd expect that to be valid - and unless you say such a usage is invalid, 

At this moment, I think that this should be valid.

I,e, the following:

struct A
{
 size_t size;
 char buf[] __attribute__((counted_by(size)));
};

static struct A x = {sizeof "hello", "hello”};

Should be valid, and x.size represents the number of elements of x.buf. 
Both x.size and x.buf are initialized statically. 

> you should avoid the replacement in such a static initializer context when 
> the FAM reference is to an object with a constant address (if 
> .ACCESS_WITH_SIZE would not act as an lvalue whose address is a constant 
> expression; if it works fine as a constant-address lvalue, then the 
> replacement would be OK).

Then if such usage for the “counted_by” is valid, we need to replace the FAM 
reference by a call to  .ACCESS_WITH_SIZE as well.
Otherwise the “counted_by” relationship will be lost to the Middle end. 

With the current definition of .ACCESS_WITH_SIZE

PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)

Isn’t the PTR (return value of the call) a LVALUE? 

Qing
> 
> -- 
> Joseph S. Myers
> jos...@codesourcery.com



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-10-31 Thread Joseph Myers
On Tue, 31 Oct 2023, Qing Zhao wrote:

> 2.3 A new semantic requirement in the user documentation of "counted_by"
> 
> For the following structure including a FAM with a counted_by attribute:
> 
>   struct A
>   {
>size_t size;
>char buf[] __attribute__((counted_by(size)));
>   };
> 
> for any object with such type:
> 
>   struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> 
> The setting to the size field should be done before the first reference 
> to the FAM field.
> 
> Such requirement to the user will guarantee that the first reference to 
> the FAM knows the size of the FAM.
> 
> We need to add this additional requirement to the user document.

Make sure the manual is very specific about exactly when size is 
considered to be an accurate representation of the space available for buf 
(given that, after malloc or realloc, it's going to be temporarily 
inaccurate).  If the intent is that inaccurate size at such a time means 
undefined behavior, say so explicitly.

> 2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE
> 
> In C FE:
> 
> for every reference to a FAM, for example, "obj->buf" in the small example,
>   check whether the corresponding FIELD_DECL has a "counted_by" attribute?
>   if YES, replace the reference to "obj->buf" with a call to
>   .ACCESS_WITH_SIZE (obj->buf, obj->size, -1); 

This seems plausible - but you should also consider the case of static 
initializers - remember the GNU extension for statically allocated objects 
with flexible array members (unless you're not allowing it with 
counted_by).

static struct A x = { sizeof "hello", "hello" };
static char *y = 

I'd expect that to be valid - and unless you say such a usage is invalid, 
you should avoid the replacement in such a static initializer context when 
the FAM reference is to an object with a constant address (if 
.ACCESS_WITH_SIZE would not act as an lvalue whose address is a constant 
expression; if it works fine as a constant-address lvalue, then the 
replacement would be OK).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-10-31 Thread Qing Zhao


> On Oct 31, 2023, at 1:35 PM, Siddhesh Poyarekar  wrote:
> 
> On 2023-10-31 12:26, Qing Zhao wrote:
>> Hi,
>> I wrote a summary based on our extensive discussion, hopefully this can be 
>> served as an informal proposal.
>> Please take a look at it and let me know any comment or suggestion.
>> There are some (???) in the section 3.2 and 3.6, those are my questions 
>> seeking for help.  -:)
>> Thanks again for all the help.
>> Qing.
>> 
>> Represent the missing dependence for the "counted_by" attribute and its 
>> consumers
>> Qing Zhao
>> 10/30/2023
>> ==
>> The whole discussion is at:
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633783.html
>> 1. The problem
>> There is a data dependency between the size assignment and the implicit use 
>> of the size information in the __builtin_dynamic_object_size that is missing 
>> in the IL (line 11 and line 13 in the below example). Such information 
>> missing will result incorrect code reordering and other code transformations.
>>   1 struct A
>>   2 {
>>   3  size_t size;
>>   4  char buf[] __attribute__((counted_by(size)));
>>   5 };
>>   6
>>   7 size_t
>>   8 foo (size_t sz)
>>   9 {
>>  10  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>>  11  obj->size = sz;
>>  12  obj->buf[0] = 2;
>>  13  return __builtin_dynamic_object_size (obj->buf, 1);
>>  14 }
>>   Please see a more complicate example in the Appendex 1.
>> We need to represent such data dependency correctly in the IL.
>> 2. The solution:
>> 2.1 Summary
>> * Add a new internal function "ACCESS_WITH_SIZE" to carry the size 
>> information for every FAM field access;
>> * In C FE, Replace every FAM field access whose TYPE has the "counted_by" 
>> attribute with the new internal function "ACCESS_WITH_SIZE";
>> * In every consumer of the size information, for example, BDOS or array 
>> bound sanitizer, query the size information or ACCESS_MODE information from 
>> the new internal function;
>> * When the size information and the "ACCESS_MODE" information are not used 
>> anymore, possibly at the 2nd object size phase, replace the internal 
>> function with the actual FAM field access;
>> * Some adjustment to inlining heuristic and some SSA passes to mitigate the 
>> impact to the optimizer and code generation.
>> 2.2 The new internal function
>>   .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
>> INTERNAL_FN (ACCESS_WITH_SIZE, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
>> which returns the "PTR" same as the 1st argument;
>> 1st argument "PTR": Pointer to the object;
>> 2nd argument "SIZE": The size of the pointed object,
>>   if the pointee of the "PTR" has a
>> * real type, it's the number of the elements of the type;
>> * void type, it's the number of bytes;
>> 3rd argument "ACCESS_MODE":
>>   -1: Unknown access semantics
>>0: none
>>1: read_only
>>2: write_only
>>3: read_write
>> NOTEs,
>>   A. This new internal function is intended for a more general use from all 
>> the 3 attributes, "access", "alloc_size", and the new "counted_by", to 
>> encode the "size" and "access_mode" information to the corresponding 
>> pointer. (in order to resolve PR96503, etc. 
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503)
>>   B. For "counted_by" and "alloc_size" attributes, the 3rd argument will be 
>> -1.
>>   C. In this wrieup, we focus on the implementation details for the 
>> "counted_by" attribute. However, this function should be ready to be used by 
>> "access" and "alloc_size" without issue.
>> 2.3 A new semantic requirement in the user documentation of "counted_by"
>> For the following structure including a FAM with a counted_by attribute:
>>   struct A
>>   {
>>size_t size;
>>char buf[] __attribute__((counted_by(size)));
>>   };
>> for any object with such type:
>>   struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>> The setting to the size field should be done before the first reference to 
>> the FAM field.
> 
> A more flexible specification could be stating that validation for a 
> reference to the FAM field will use the latest value assigned to the size 
> field before that reference.  That will allow for situations like:
> 
>  o->size = val1;
>  deref (o->buf);
>  o->size = val2;
> 
> making it clear that deref will see val1 and not val2.

Good point! Yes, with the current design, this is reasonable. 
Will update the proposal with this.
> 
>> Such requirement to the user will guarantee that the first reference to the 
>> FAM knows the size of the FAM.
>> We need to add this additional requirement to the user document.
>> 2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE
>> In C FE:
>> for every reference to a FAM, for example, "obj->buf" in the small example,
>>   check whether the corresponding FIELD_DECL has a "counted_by" attribute?
>>   if YES, replace the reference to "obj->buf" 

Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-10-31 Thread Siddhesh Poyarekar

On 2023-10-31 12:26, Qing Zhao wrote:

Hi,

I wrote a summary based on our extensive discussion, hopefully this can be 
served as an informal proposal.

Please take a look at it and let me know any comment or suggestion.

There are some (???) in the section 3.2 and 3.6, those are my questions seeking 
for help.  -:)

Thanks again for all the help.

Qing.


Represent the missing dependence for the "counted_by" attribute and its 
consumers

Qing Zhao

10/30/2023
==

The whole discussion is at:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633783.html

1. The problem

There is a data dependency between the size assignment and the implicit use of 
the size information in the __builtin_dynamic_object_size that is missing in 
the IL (line 11 and line 13 in the below example). Such information missing 
will result incorrect code reordering and other code transformations.

   1 struct A
   2 {
   3  size_t size;
   4  char buf[] __attribute__((counted_by(size)));
   5 };
   6
   7 size_t
   8 foo (size_t sz)
   9 {
  10  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
  11  obj->size = sz;
  12  obj->buf[0] = 2;
  13  return __builtin_dynamic_object_size (obj->buf, 1);
  14 }
   
Please see a more complicate example in the Appendex 1.


We need to represent such data dependency correctly in the IL.

2. The solution:

2.1 Summary

* Add a new internal function "ACCESS_WITH_SIZE" to carry the size information 
for every FAM field access;
* In C FE, Replace every FAM field access whose TYPE has the "counted_by" attribute with 
the new internal function "ACCESS_WITH_SIZE";
* In every consumer of the size information, for example, BDOS or array bound 
sanitizer, query the size information or ACCESS_MODE information from the new 
internal function;
* When the size information and the "ACCESS_MODE" information are not used 
anymore, possibly at the 2nd object size phase, replace the internal function with the 
actual FAM field access;
* Some adjustment to inlining heuristic and some SSA passes to mitigate the 
impact to the optimizer and code generation.

2.2 The new internal function

   .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)

INTERNAL_FN (ACCESS_WITH_SIZE, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)

which returns the "PTR" same as the 1st argument;

1st argument "PTR": Pointer to the object;
2nd argument "SIZE": The size of the pointed object,
   if the pointee of the "PTR" has a
 * real type, it's the number of the elements of the type;
 * void type, it's the number of bytes;
3rd argument "ACCESS_MODE":
   -1: Unknown access semantics
0: none
1: read_only
2: write_only
3: read_write

NOTEs,
   A. This new internal function is intended for a more general use from all the 3 attributes, "access", 
"alloc_size", and the new "counted_by", to encode the "size" and "access_mode" 
information to the corresponding pointer. (in order to resolve PR96503, etc. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503)
   B. For "counted_by" and "alloc_size" attributes, the 3rd argument will be -1.
   C. In this wrieup, we focus on the implementation details for the "counted_by" attribute. 
However, this function should be ready to be used by "access" and "alloc_size" without 
issue.

2.3 A new semantic requirement in the user documentation of "counted_by"

For the following structure including a FAM with a counted_by attribute:

   struct A
   {
size_t size;
char buf[] __attribute__((counted_by(size)));
   };

for any object with such type:

   struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));

The setting to the size field should be done before the first reference to the 
FAM field.


A more flexible specification could be stating that validation for a 
reference to the FAM field will use the latest value assigned to the 
size field before that reference.  That will allow for situations like:


  o->size = val1;
  deref (o->buf);
  o->size = val2;

making it clear that deref will see val1 and not val2.



Such requirement to the user will guarantee that the first reference to the FAM 
knows the size of the FAM.

We need to add this additional requirement to the user document.

2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE

In C FE:

for every reference to a FAM, for example, "obj->buf" in the small example,
   check whether the corresponding FIELD_DECL has a "counted_by" attribute?
   if YES, replace the reference to "obj->buf" with a call to
   .ACCESS_WITH_SIZE (obj->buf, obj->size, -1);

2.5 Query the size info

There are multiple consumers of the size info (and ACCESS_MODE info):

   * __builtin_dynamic_object_size;
   * array bound sanitizer;

in these consumers, get the size info from the 2nd argument of the call to
ACCESS_WITH_SIZE (PTR, SIZE, -1)

2.6 Eliminate the internal function when not useful anymore


RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-10-31 Thread Qing Zhao
Hi, 

I wrote a summary based on our extensive discussion, hopefully this can be 
served as an informal proposal. 

Please take a look at it and let me know any comment or suggestion.

There are some (???) in the section 3.2 and 3.6, those are my questions seeking 
for help.  -:)

Thanks again for all the help.

Qing.


Represent the missing dependence for the "counted_by" attribute and its 
consumers 

Qing Zhao

10/30/2023
==

The whole discussion is at:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633783.html

1. The problem

There is a data dependency between the size assignment and the implicit use of 
the size information in the __builtin_dynamic_object_size that is missing in 
the IL (line 11 and line 13 in the below example). Such information missing 
will result incorrect code reordering and other code transformations. 

  1 struct A
  2 {
  3  size_t size;
  4  char buf[] __attribute__((counted_by(size)));
  5 };
  6 
  7 size_t 
  8 foo (size_t sz)
  9 {
 10  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
 11  obj->size = sz;
 12  obj->buf[0] = 2;
 13  return __builtin_dynamic_object_size (obj->buf, 1);
 14 }
  
Please see a more complicate example in the Appendex 1.

We need to represent such data dependency correctly in the IL. 

2. The solution:

2.1 Summary

* Add a new internal function "ACCESS_WITH_SIZE" to carry the size information 
for every FAM field access;
* In C FE, Replace every FAM field access whose TYPE has the "counted_by" 
attribute with the new internal function "ACCESS_WITH_SIZE";
* In every consumer of the size information, for example, BDOS or array bound 
sanitizer, query the size information or ACCESS_MODE information from the new 
internal function;
* When the size information and the "ACCESS_MODE" information are not used 
anymore, possibly at the 2nd object size phase, replace the internal function 
with the actual FAM field access; 
* Some adjustment to inlining heuristic and some SSA passes to mitigate the 
impact to the optimizer and code generation. 

2.2 The new internal function 

  .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)

INTERNAL_FN (ACCESS_WITH_SIZE, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)

which returns the "PTR" same as the 1st argument;

1st argument "PTR": Pointer to the object;
2nd argument "SIZE": The size of the pointed object, 
  if the pointee of the "PTR" has a
* real type, it's the number of the elements of the type;
* void type, it's the number of bytes; 
3rd argument "ACCESS_MODE": 
  -1: Unknown access semantics
   0: none
   1: read_only
   2: write_only
   3: read_write

NOTEs, 
  A. This new internal function is intended for a more general use from all the 
3 attributes, "access", "alloc_size", and the new "counted_by", to encode the 
"size" and "access_mode" information to the corresponding pointer. (in order to 
resolve PR96503, etc. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503)
  B. For "counted_by" and "alloc_size" attributes, the 3rd argument will be -1. 
  
  C. In this wrieup, we focus on the implementation details for the 
"counted_by" attribute. However, this function should be ready to be used by 
"access" and "alloc_size" without issue. 

2.3 A new semantic requirement in the user documentation of "counted_by"

For the following structure including a FAM with a counted_by attribute:

  struct A
  {
   size_t size;
   char buf[] __attribute__((counted_by(size)));
  };

for any object with such type:

  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));

The setting to the size field should be done before the first reference to the 
FAM field.

Such requirement to the user will guarantee that the first reference to the FAM 
knows the size of the FAM.  

We need to add this additional requirement to the user document.

2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE

In C FE:

for every reference to a FAM, for example, "obj->buf" in the small example,
  check whether the corresponding FIELD_DECL has a "counted_by" attribute?
  if YES, replace the reference to "obj->buf" with a call to
  .ACCESS_WITH_SIZE (obj->buf, obj->size, -1); 

2.5 Query the size info 

There are multiple consumers of the size info (and ACCESS_MODE info):

  * __builtin_dynamic_object_size;
  * array bound sanitizer;

in these consumers, get the size info from the 2nd argument of the call to
ACCESS_WITH_SIZE (PTR, SIZE, -1)

2.6 Eliminate the internal function when not useful anymore

After the last consumer of the size information in the ACCESS_WITH_SIZE, We 
should replace the internal call with its first argument.

Do it in the 2nd object size phase. 

2.7 Adjustment to inlining heuristic and other IPA analysis

the FE changes:

obj->buf

to

_1 = obj->buf;
_2 = obj->size;
.ACCESS_WITH_SIZE (_1, _2, -1)

there are major two changes:

  A. the # of LOADs, the #