Re: [PATCH 1/1] sparc: support for -mmisalign in the SPARC M8

2017-08-02 Thread Qing Zhao
Hi, David,

thanks a lot for your comment.

see my reply below

> STRICT_ALIGNMENT has a lot of implications.

from the definition of STRICT_ALIGNMENT:

/* Set this nonzero if move instructions will actually fail to work
   when given unaligned data.  */
#define STRICT_ALIGNMENT 1

for MISALIGN_TARGET,  it’s clear that move instructions will NOT fail to work 
when given unaligned data.
so, it’s reasonable to set STRICT_ALIGNMENT to 0 for MISALIGN_TARGET.

> 
> I think just because we happen to have misaligned loads and stores
> available doesn't mean we want all of the side effects associated
> with STRICT_ALIGNMENT being true.

so, could you please specify what kind of side effects will have when =
set STRICT_ALIGNMENT to true on TARGET_MISALIGN?=20

thanks a lot.

Qing

Re: [PATCH 1/1] sparc: support for -mmisalign in the SPARC M8

2017-08-03 Thread Qing Zhao
To be more specified,  when reading all the codes corresponding to 
“STRICT_ALIGNMENT” and “SLOW_UNALIGNMENT_ACCESS” in gcc
(NOTE, SLOW_UNALIGNMENT_ACCESS is the same as STRICT_ALIGNMENT when it is NOT 
defined explicitly, this is the case for SPARC)

We can get the following summary: 

all the special handling on STRICT_ALIGNMENT or SLOW_UNALIGNMENT_ACCESS in 
these codes have the following common logic:

if the memory access is known to be not-aligned well during compilation time, 
if the targeted platform does NOT support faster unaligned memory
access,  the compiler will try to make the memory access aligned well. 
Otherwise, if the targeted platform supports faster unaligned memory access,
 it will leave the compiler-time known not-aligned memory access as it, later 
the hardware support will kicked in for these unaligned memory access. 

this behavior is consistent with the high level definition of STRICT_ALIGNMENT. 

And also consistent with the M8 misaligned support:

if the target is NOT TARGET_MISALIGN,  STRICT_ALIGNMENT is 1,  all the 
compiler-time known misaligned memory accesses are adjusted to
aligned memory access before RTL generation;
on the other hand, if the target is TARGET_MISALIGN, STRICT_ALIGNMENT is 0,  
the compiler-time known misaligned memory accesses are NOT
adjusted,  after RTL generation, we will have compiler-time known misaligned 
memory access, we can use the new misaligned ld/st hardware insns to 
support these compiler-time known misaligned memory access. 

hope this is clear.

thanks.

Qing


>> 
>> Why don't you read the code rather than just relying upon what
>> high level description is given by the documentation instead?
> 
> I read the codes before making the change, that’s the reason I ask you to 
> specify clearly the bad side effect that I didn’t considered yet.
> 
> thanks.
> 
> Qing



Re: [PATCH 1/1] sparc: support for -mmisalign in the SPARC M8

2017-08-03 Thread Qing Zhao

> On Aug 3, 2017, at 11:40 AM, David Miller <da...@davemloft.net> wrote:
> 
> From: Qing Zhao <qing.z...@oracle.com>
> Date: Thu, 3 Aug 2017 10:37:15 -0500
> 
>> all the special handling on STRICT_ALIGNMENT or
>> SLOW_UNALIGNMENT_ACCESS in these codes have the following common
>> logic:
>> 
>> if the memory access is known to be not-aligned well during
>> compilation time, if the targeted platform does NOT support faster
>> unaligned memory access, the compiler will try to make the memory
>> access aligned well. Otherwise, if the targeted platform supports
>> faster unaligned memory access, it will leave the compiler-time
>> known not-aligned memory access as it, later the hardware support
>> will kicked in for these unaligned memory access.
>> 
>> this behavior is consistent with the high level definition of 
>> STRICT_ALIGNMENT. 
> 
> That's exactly the problem.
> 
> What you want with this M8 feature is simply to let the compiler know
> that if it is completely impossible to make some memory object
> aligned, then the cpu can handle this with special instructions.

> 
> You still want the compiler to make the effort to align data when it
> can because the accesses will be faster than if it used the unaligned
> loads and stores.

I don’t think the above is true.

first, the compiler-time known misaligned memory access can always be emulated 
by aligned memory access ( by byte-size load/stores).  then there will be no 
compiler-time known 
misaligned memory access left for the special misaligned ld/st insns. 

second, there are always overhead cost for the compiler-time effort to make the 
compiler-time known unaligned memory access as aligned memory access. (adding 
additional
padding, or split the unaligned multi-bytes to single-byte load/store), all 
such overhead might be even bigger than the overhead of the special misaligned 
load/store itself.

to decide which is better (to use software emulation or use hardware misaligned 
load/store insns), experiments might be needed to justify the performance 
impact.

This set of change is to provide a way to use misaligned load/store insns to 
implement the compiler-time known unaligned memory access,  -mno-misalign can 
be used
to disable such behavior very easily if our performance data shows that 
misaligned load/store insns are slower than the current software emulation. 

Qing


> 
> This is incredibly important for on-stack objects.



Re: [PATCH 1/1] sparc: support for -mmisalign in the SPARC M8

2017-08-03 Thread Qing Zhao

> On Aug 2, 2017, at 6:17 PM, David Miller <da...@davemloft.net> wrote:
> 
> From: Qing Zhao <qing.z...@oracle.com>
> Date: Wed, 2 Aug 2017 14:41:51 -0500
> 
>> so, could you please specify what kind of side effects will have
>> when set STRICT_ALIGNMENT to true on TARGET_MISALIGN?
> 
> Why don't you read the code rather than just relying upon what
> high level description is given by the documentation instead?

I read the codes before making the change, that’s the reason I ask you to 
specify clearly the bad side effect that I didn’t considered yet.

thanks.

Qing
> 
> Thanks.



Re: A potential bug in lra-constraints.c for special_memory_constraint?

2017-07-11 Thread Qing Zhao
thanks for the replying.

> On Jul 11, 2017, at 2:44 AM, Eric Botcazou  wrote:
> 
>> From the above doc, the major difference between a memory_constraint and a
>> special_memory_constraint is:  whether "reload can or cannot make them match
>> by reloading the address".
> 
> Right, i.e. by just changing the form of the address (instead of the address 
> itself).
> 
>> For memory_constraint,  the reload is Okay, however, for
>> special_memory_constraint, the reload is NOT Okay.
>> 
>> I am not sure whether the RELOAD includes Spill or not, if it is, then the
>> current handling of special_memory_constraint is NOT correct:
>> (lra-constraints.c)
>> 
>> 2088 case CT_SPECIAL_MEMORY:
>> 2089   if (MEM_P (op)
>> 2090   && satisfies_memory_constraint_p (op, cn))
>> 2091 win = true;
>> 2092   else if (spilled_pseudo_p (op))
>> 2093 win = true;
>> 2094   break;
>> 
>> line 2092-2093 permits the memory spill, which seems need to be avoided for
>> SPECIAL_MEMORY_Constraint.
>> 
>> the thing I need to confirm is:
>> 
>> whether “spill” is considered as RELOAD or NOT?
> 
> Yes, spilling is part of reloading but is not reloading the address since 
> it's 
> changing the address, so I think it's OK.  Why is that problematic for you?

the problem I had is:

1. we added a new special_memory_constraint for misaligned memory access,  one 
important requirement
for this new special_memory_constraint is, the address of the memory access is 
misaligned.

2. per the current code in lra-constraints.c:

2286 case CT_SPECIAL_MEMORY:
2287   if (MEM_P (op)
2288   && satisfies_memory_constraint_p (op, cn))
2289 win = true;
2290   else if (spilled_pseudo_p (op))
2291 win = true;
2292   break;

if the op is a pseudo_p that can be spilled, then it's treated as a
PERFECT MATCH.

the issue only can be exposed by the following kind of RTL:

(insn 34 13 14 2 (set (reg:DI 122)
   (reg:DI 129)) misalign-3.c:12 125 {*movdi_insn_sp64}
(nil))

i.e.
(1). REG2 move to REG1
and. (2). REG2 is a virtual reg (> the max hard regno, on Sparc, its 103),
therefore, must be spilled to stack.

the current interpretation of special memory treat such REG2 as a perfect
match to special memory,  and then spill it.
however, such spilled memory RTL is NOT match the MISALIGN requirement, (i.e, 
the address of the memory
access for the spilled RTL is not misaligned)

therefore triggered the failure later.

That’s the reason I tracked down to this potential issue in the handling of 
“CT_SPECIAL_MEMORY”.

thanks a lot.

Qing


> -- 
> Eric Botcazou



Re: A potential bug in lra-constraints.c for special_memory_constraint?

2017-07-11 Thread Qing Zhao

> On Jul 11, 2017, at 2:00 PM, Eric Botcazou  wrote:
> 
>> the problem I had is:
>> 
>> 1. we added a new special_memory_constraint for misaligned memory access, 
>> one important requirement for this new special_memory_constraint is, the
>> address of the memory access is misaligned.
> 
> OK, it's the other way around from the usage of special_memory_constraint.
> In other words, I'm not sure that you really need to use it in this case,
> a standard memory_constraint could be sufficient.

A standard memory_constraint has the same handling on spilled code as 
special_memory_constraint:

2250 case CT_MEMORY:
2251   if (MEM_P (op)
2252   && satisfies_memory_constraint_p (op, cn))
2253 win = true;
2254   else if (spilled_pseudo_p (op))
2255 win = true;

as a result, if the new misaligned memory access was defined as a standard 
memory_constraint, will have the same issue.


> 
>> 2. per the current code in lra-constraints.c:
>> 
>> 2286 case CT_SPECIAL_MEMORY:
>> 2287   if (MEM_P (op)
>> 2288   && satisfies_memory_constraint_p (op, cn))
>> 2289 win = true;
>> 2290   else if (spilled_pseudo_p (op))
>> 2291 win = true;
>> 2292   break;
>> 
>> if the op is a pseudo_p that can be spilled, then it's treated as a
>> PERFECT MATCH.
>> 
>> the issue only can be exposed by the following kind of RTL:
>> 
>> (insn 34 13 14 2 (set (reg:DI 122)
>>   (reg:DI 129)) misalign-3.c:12 125 {*movdi_insn_sp64}
>>(nil))
>> 
>> i.e.
>>(1). REG2 move to REG1
>> and. (2). REG2 is a virtual reg (> the max hard regno, on Sparc, its 103),
>> therefore, must be spilled to stack.
>> 
>> the current interpretation of special memory treat such REG2 as a perfect
>> match to special memory,  and then spill it.
>> however, such spilled memory RTL is NOT match the MISALIGN requirement,
>> (i.e, the address of the memory access for the spilled RTL is not
>> misaligned)
> 
> Yes, spilling will automatically meet alignment requirements, that's why it's 
> allowed for special_memory_constraint.

You mean, even for the mis-alignment requirement, the spilled memory access 
will met the mis-alignment?

> 
> Why do you absolutely need to have a misaligned address?
we need to generate misaligned load/store insns ONLY for misaligned memory 
access, therefore need a new 
constraints for misaligned address.

As I checked the GCC source code, the special_memory_constraints only were 
defined in i386 and sparc target, not
used quite often.

Qing

>  Can't you just avert 
> your eyes and pretend that the address is misaligned?  This will be 
> suboptimal 
> but presumably work.  To be honest, I'm not even sure that you really need an 
> additional constraint, but I haven't investigated the subject seriously.
> 
> -- 
> Eric Botcazou



Re: A potential bug in lra-constraints.c for special_memory_constraint?

2017-07-12 Thread Qing Zhao

> On Jul 12, 2017, at 10:28 AM, David Miller <da...@davemloft.net> wrote:
> 
> From: Qing Zhao <qing.z...@oracle.com>
> Date: Wed, 12 Jul 2017 08:49:52 -0500
> 
>> and it also clearly mentioned that “specially aligned memory might
>> use this constraint”.
> 
> It guarantees the achieve the opposite of what you are trying to do.
> 
> That is, it can be used to guarantee that something is aligned to a
> multiple of X or greater.

so, the spill code can guarantee to provide a special alignment for multiple of 
X or greater?


> 
> What you want is to know that something is guaranteed to be
> aligned less strongly that X.  And that invariant is not
> provided for.

are you implying that special_memory_constraint is NOT for my current purpose?

thanks.

Qing



A potential bug in lra-constraints.c for special_memory_constraint?

2017-07-10 Thread Qing Zhao
Hi,team:


The following doc for memory_constraint and special_memory_constraint seems
imply that the handling of the special_memory_constraint in lra-constraints.c
is NOT correct:

(https://gcc.gnu.org/onlinedocs/gccint/Define-Constraints.html#Define-Constraints)

MD Expression: define_memory_constraint name docstring exp

Use this expression for constraints that match a subset of all memory
operands: that is, reload can make them match by converting the operand to
the form `(mem (reg X))¿, where X is a base register (from the register class
specified by BASE_REG_CLASS, see Register Classes).

….

The syntax and semantics are otherwise identical to define_constraint.


MD Expression: define_special_memory_constraint name docstring exp

Use this expression for constraints that match a subset of all memory
operands: that is, reload can not make them match by reloading the address as
it is described for define_memory_constraint or such address reload is
undesirable with the performance point of view.

For example, define_special_memory_constraint can be useful if
specifically aligned memory is necessary or desirable for some insn operand.

The syntax and semantics are otherwise identical to define_constraint.


>From the above doc, the major difference between a memory_constraint and a
special_memory_constraint is:  whether "reload can or cannot make them match
by reloading the address".

For memory_constraint,  the reload is Okay, however, for
special_memory_constraint, the reload is NOT Okay.

I am not sure whether the RELOAD includes Spill or not, if it is, then the
current handling of special_memory_constraint is NOT correct:
(lra-constraints.c)

2088 case CT_SPECIAL_MEMORY:
2089   if (MEM_P (op)
2090   && satisfies_memory_constraint_p (op, cn))
2091 win = true;
2092   else if (spilled_pseudo_p (op))
2093 win = true;
2094   break;

line 2092-2093 permits the memory spill, which seems need to be avoided for
SPECIAL_MEMORY_Constraint.

the thing I need to confirm is:

whether “spill” is considered as RELOAD or NOT?

if the spill IS RELOAD, then the handling of special_memory_constraint in 
lra-constraints.c is 
definitely wrong, we should fix this issue in upstream. 

thanks.

Qing

Re: A potential bug in lra-constraints.c for special_memory_constraint?

2017-07-11 Thread Qing Zhao

> On Jul 11, 2017, at 4:24 PM, Eric Botcazou  wrote:
> 
>> we need to generate misaligned load/store insns ONLY for misaligned memory
>> access, therefore need a new constraints for misaligned address.
> 
> Why?  What happens exactly if the memory access turns out to be aligned?

we add this new constraint as:

;; We need a special memory constraint for the misaligned memory access
;; This is only for TARGET_MISALIGN target
(define_special_memory_constraint "B"
 "Memory reference whose address is misaligned"
 (and (match_code "mem")
  (match_test "TARGET_MISALIGN")
  (match_test "memory_is_misaligned (op, mode)”)))

the routine “memory_is_misaligned” is a compile-time check to see whether the 
address is known to be
misaligned or not. only for compile-time KNOWN misaligned memory access, we 
will use misaligned load/store insns
provided by the new processor for the memory access. 

and then put this new constraints to sparc.md as: 

(define_insn "*movdi_insn_sp64"
  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,r,r, B, m, 
r,*e,?*e,?*e,?W,b,b")
(match_operand:DI 1 "input_operand""rI,N,B,m,rJ,rJ,*e, r, *e,  
W,*e,J,P"))]


NOTE, the 4th constraints for this insn is “B, rJ”,  if the operands match this 
constraint, then.  misaligned store insns will be generated for the
misaligned memory access instead of regular store. 

misaligned insns will NOT be used for memory access whose alignment cannot be 
decided to be misaligned during compilation time.

Hope this is clear.  If I still don’t answer your question, please let me know.

Qing
 

> 
> -- 
> Eric Botcazou



[PATCH] Fix PR81422[aarch64] internal compiler error: in update_equiv_regs, at ira.c:3425

2017-09-19 Thread Qing Zhao
Hi,

This patch fixes the aarch64 bug 81422 
https://gcc.gnu.org/PR81422

Before adding REG_EQUIV notes in the TLS symbol handling code,
we should check whether the "dest" is a REG or NOT (sometimes,
it's a SUBREG as in this bug). Only when the “dest” is a REG, the note
will be added.

a new small testing case is added. 

this change has been tested on aarch64-unknown-linux-gnu with no regression. 

Thanks!

Qing


===

gcc/ChangeLog:

   * config/aarch64/aarch64.c (aarch64_load_symref_appropriately):
   check whether the dest is REG before adding REG_EQUIV note.

gcc/testsuite/ChangeLog:

   PR target/81422
   * gcc.target/aarch64/pr81422.C: New test.

---
 gcc/config/aarch64/aarch64.c   | 12 
 gcc/testsuite/gcc.target/aarch64/pr81422.C | 15 +++
 2 files changed, 23 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr81422.C

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index a2ecd7a..6c3ef76 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1480,7 +1480,8 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
  tp = gen_lowpart (mode, tp);
 
emit_insn (gen_rtx_SET (dest, gen_rtx_PLUS (mode, tp, x0)));
-   set_unique_reg_note (get_last_insn (), REG_EQUIV, imm);
+if (REG_P (dest)) 
+ set_unique_reg_note (get_last_insn (), REG_EQUIV, imm);
return;
   }
 
@@ -1514,7 +1515,8 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
  }
 
emit_insn (gen_rtx_SET (dest, gen_rtx_PLUS (mode, tp, tmp_reg)));
-   set_unique_reg_note (get_last_insn (), REG_EQUIV, imm);
+if (REG_P (dest)) 
+ set_unique_reg_note (get_last_insn (), REG_EQUIV, imm);
return;
   }
 
@@ -1555,7 +1557,8 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
gcc_unreachable ();
  }
 
-   set_unique_reg_note (get_last_insn (), REG_EQUIV, imm);
+if (REG_P (dest)) 
+ set_unique_reg_note (get_last_insn (), REG_EQUIV, imm);
return;
   }
 
@@ -1584,7 +1587,8 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
emit_insn (gen_tlsie_tiny_sidi (dest, imm, tp));
  }
 
-   set_unique_reg_note (get_last_insn (), REG_EQUIV, imm);
+if (REG_P (dest))
+ set_unique_reg_note (get_last_insn (), REG_EQUIV, imm);
return;
   }
 
diff --git a/gcc/testsuite/gcc.target/aarch64/pr81422.C 
b/gcc/testsuite/gcc.target/aarch64/pr81422.C
new file mode 100644
index 000..5bcc948
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr81422.C
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O0" } */
+
+struct DArray
+{
+__SIZE_TYPE__ length;
+int* ptr;
+};
+
+void foo35(DArray)
+{
+static __thread int x[5];
+foo35({5, (int*)});
+}
+
-- 
1.9.1



[PATCH] Fix PR80295[aarch64] [7/8 Regression] ICE in __builtin_update_setjmp_buf expander

2017-10-06 Thread Qing Zhao
Thanks a lot for Wilco’s help on this bug. 

Yes, Aarch64 does NOT do anything wrong.  

The implementation of __builtin_update_setjmp_buf is not correct. It takes a 
pointer
as an operand and treats the Mode of the pointer as Pmode, which is not correct.
a conversion from ptr_mode to Pmode is needed for this pointer.

bootstrapped on aarch64-unknown-linux-gnu.
testing on aarch64-unknown-linux-gnu is running.

the new patch is:

=
gcc/ChangeLog

* builtins.c (expand_builtin_update_setjmp_buf): Add a
converstion to Pmode from the buf_addr.

gcc/testsuite/ChangeLog

PR middle-end/80295
* gcc.target/aarch64/pr80295.c: New test.

---
 gcc/builtins.c | 1 +
 gcc/testsuite/gcc.target/aarch64/pr80295.c | 8 
 2 files changed, 9 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr80295.c

diff --git a/gcc/builtins.c b/gcc/builtins.c
index c8a5ea6..01fb08b 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -1199,6 +1199,7 @@ void
 expand_builtin_update_setjmp_buf (rtx buf_addr)
 {
   machine_mode sa_mode = STACK_SAVEAREA_MODE (SAVE_NONLOCAL);
+  buf_addr = convert_memory_address (Pmode, buf_addr);
   rtx stack_save
 = gen_rtx_MEM (sa_mode,
   memory_address
diff --git a/gcc/testsuite/gcc.target/aarch64/pr80295.c 
b/gcc/testsuite/gcc.target/aarch64/pr80295.c
new file mode 100644
index 000..b3866d8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr80295.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-mabi=ilp32" } */
+
+void f (void *b) 
+{ 
+  __builtin_update_setjmp_buf (b); 
+}
+
-- 
1.9.1


> 
>> On Oct 5, 2017, at 11:50 AM, Richard Earnshaw (lists) 
>> <richard.earns...@arm.com> wrote:
>> 
>> On 25/09/17 17:35, Qing Zhao wrote:
>>> 
>>> --- a/gcc/config/aarch64/aarch64.h
>>> +++ b/gcc/config/aarch64/aarch64.h
>>> @@ -782,7 +782,7 @@ typedef struct
>>> /* Specify the machine mode that the hardware addresses have.
>>>   After generation of rtl, the compiler makes no further distinction
>>>   between pointers and any other objects of this machine mode.  */
>>> -#define Pmode  DImode
>>> +#define Pmode  (TARGET_ILP32 ? SImode : DImode)
>> 
>> This is wrong.  AArch64 has both ptr_mode and Pmode.  Pmode must always
>> be DImode as (internally) all addresses must be 64-bit.  ptr_mode
>> reflects the ABI choice of 32/64-bit language level addresses.  The
>> register SP must always be a 64-bit value, even when all the top 32 bits
>> are zero.
> 



Re: [PATCH][aarch64] Fix pr81356 - copy empty string with wrz, not a ldrb/strb

2017-10-03 Thread Qing Zhao
I think the change is good.  But I don’t have the permission for approval…

Qing
> On Sep 25, 2017, at 12:36 PM, Steve Ellcey  wrote:
> 
> Ping.
> 
> Steve Ellcey
> sell...@cavium.com
> 
> 
> On Fri, 2017-09-15 at 11:22 -0700, Steve Ellcey wrote:
>> PR 81356 points out that doing a __builtin_strcpy of an empty string on
>> aarch64 does a copy from memory instead of just writing out a zero byte.
>> In looking at this I found that it was because of
>> aarch64_use_by_pieces_infrastructure_p, which returns false for
>> STORE_BY_PIECES.  The comment says:
>> 
>>   /* STORE_BY_PIECES can be used when copying a constant string, but
>>  in that case each 64-bit chunk takes 5 insns instead of 2 (LDR/STR).
>>  For now we always fail this and let the move_by_pieces code copy
>>  the string from read-only memory.  */
>> 
>> But this doesn't seem to be the case anymore.  When I remove this function
>> and the TARGET_USE_BY_PIECES_INFRASTRUCTURE_P macro that uses it the code
>> for __builtin_strcpy of a constant string seems to be either better or the
>> same.  The only time I got more instructions after removing this function
>> was on an 8 byte __builtin_strcpy where we now generate a mov and 3 movk
>> instructions to create the source followed by a store instead of doing a
>> load/store of 8 bytes.  The comment may have been applicable for
>> -mstrict-align at one time but it doesn't seem to be the case now.  I still
>> get better code without this routine under that option as well.
>> 
>> Bootstrapped and tested without regressions, OK to checkin?
>> 
>> Steve Ellcey
>> sell...@cavium.com
>> 
>> 
>> 
>> 2017-09-15  Steve Ellcey  
>> 
>>  PR target/81356
>>  * config/aarch64/aarch64.c
>> (aarch64_use_by_pieces_infrastructure_p):
>>  Remove.
>>  (TARGET_USE_BY_PIECES_INFRASTRUCTURE_P): Remove define.
>> 
>> 
>> 2017-09-15  Steve Ellcey  
>> 
>>  * gcc.target/aarch64/pr81356.c: New test.



Re: [PATCH] Fix PR80295[aarch64] [7/8 Regression] ICE in __builtin_update_setjmp_buf expander

2017-10-09 Thread Qing Zhao

> On Oct 9, 2017, at 11:27 AM, Qing Zhao <qing.z...@oracle.com> wrote:
> 
>> 
>> On Oct 9, 2017, at 5:33 AM, Richard Earnshaw (lists) 
>> <richard.earns...@arm.com> wrote:
>> 
>> On 06/10/17 20:56, Qing Zhao wrote:
>>> Thanks a lot for Wilco’s help on this bug. 
>>> 
>>> Yes, Aarch64 does NOT do anything wrong.  
>>> 
>>> The implementation of __builtin_update_setjmp_buf is not correct. It takes 
>>> a pointer
>>> as an operand and treats the Mode of the pointer as Pmode, which is not 
>>> correct.
>>> a conversion from ptr_mode to Pmode is needed for this pointer.
>>> 
>>> bootstrapped on aarch64-unknown-linux-gnu.
>>> testing on aarch64-unknown-linux-gnu is running.
>>> 
>> 
>> This is more believable for AArch64, but we need to be careful here.  I
>> think this needs testing on at least one other target that has a
>> ilp32-on-64-bit ABI, eg x86_64's x32 ABI.
> 
> I just built the latest upstream GCC on x86_64 Linux. and use it to compile 
> the testing case in PR80295 with the following options:
> 
> [qinzhao@qzh-ol6-test 80295]$ cat t.c
> void f (void *b) { __builtin_update_setjmp_buf (b); }
> [qinzhao@qzh-ol6-test 80295]$ sh t
> /home/qinzhao/Install/latest/bin/gcc -m32 t.c
> 
> it compiles successfully on x86_64.  no similar issue as aarch64. 
> 
> DId  I use the correct option on X86 (i.e -m32)?

I also tried -mx32,  the compilation had no issue. 

With the latest GCC + my patch, on X86,  the testing case still has No any 
issue with -m32 or -mx32. 

my testing on aarch64-unknown-linux-gnu has finished, no any regression. 

Is my patch Okay?

thanks.

Qing



Re: [PATCH] Fix PR80295[aarch64] [7/8 Regression] ICE in __builtin_update_setjmp_buf expander

2017-10-09 Thread Qing Zhao

> On Oct 9, 2017, at 5:33 AM, Richard Earnshaw (lists) 
> <richard.earns...@arm.com> wrote:
> 
> On 06/10/17 20:56, Qing Zhao wrote:
>> Thanks a lot for Wilco’s help on this bug. 
>> 
>> Yes, Aarch64 does NOT do anything wrong.  
>> 
>> The implementation of __builtin_update_setjmp_buf is not correct. It takes a 
>> pointer
>> as an operand and treats the Mode of the pointer as Pmode, which is not 
>> correct.
>> a conversion from ptr_mode to Pmode is needed for this pointer.
>> 
>> bootstrapped on aarch64-unknown-linux-gnu.
>> testing on aarch64-unknown-linux-gnu is running.
>> 
> 
> This is more believable for AArch64, but we need to be careful here.  I
> think this needs testing on at least one other target that has a
> ilp32-on-64-bit ABI, eg x86_64's x32 ABI.

I just built the latest upstream GCC on x86_64 Linux. and use it to compile the 
testing case in PR80295 with the following options:

[qinzhao@qzh-ol6-test 80295]$ cat t.c
void f (void *b) { __builtin_update_setjmp_buf (b); }
[qinzhao@qzh-ol6-test 80295]$ sh t
/home/qinzhao/Install/latest/bin/gcc -m32 t.c

it compiles successfully on x86_64.  no similar issue as aarch64. 

DId  I use the correct option on X86 (i.e -m32)?

Qing

> 
> 
> R.
> 


Re: [PATCH] Fix PR80295[aarch64] [7/8 Regression] ICE in __builtin_update_setjmp_buf expander

2017-10-10 Thread Qing Zhao

On Oct 9, 2017, at 12:05 PM, Qing Zhao <qing.z...@oracle.com> wrote:
> 
>>>> Thanks a lot for Wilco’s help on this bug. 
>>>> 
>>>> Yes, Aarch64 does NOT do anything wrong.  
>>>> 
>>>> The implementation of __builtin_update_setjmp_buf is not correct. It takes 
>>>> a pointer
>>>> as an operand and treats the Mode of the pointer as Pmode, which is not 
>>>> correct.
>>>> a conversion from ptr_mode to Pmode is needed for this pointer.
>>>> 
>>>> bootstrapped on aarch64-unknown-linux-gnu.
>>>> testing on aarch64-unknown-linux-gnu is running.
>>>> 
>>> 
>>> This is more believable for AArch64, but we need to be careful here.  I
>>> think this needs testing on at least one other target that has a
>>> ilp32-on-64-bit ABI, eg x86_64's x32 ABI.
>> 
>> I just built the latest upstream GCC on x86_64 Linux. and use it to compile 
>> the testing case in PR80295 with the following options:
>> 
>> [qinzhao@qzh-ol6-test 80295]$ cat t.c
>> void f (void *b) { __builtin_update_setjmp_buf (b); }
>> [qinzhao@qzh-ol6-test 80295]$ sh t
>> /home/qinzhao/Install/latest/bin/gcc -m32 t.c
>> 
>> it compiles successfully on x86_64.  no similar issue as aarch64. 
>> 
>> DId  I use the correct option on X86 (i.e -m32)?
> 
> I also tried -mx32,  the compilation had no issue. 
> 
> With the latest GCC + my patch, on X86,  the testing case still has No any 
> issue with -m32 or -mx32. 
> 
> my testing on aarch64-unknown-linux-gnu has finished, no any regression. 

More testing on x86_64-pc-linux-gnu, bootstrapped and gcc testing,  No any 
regression.

> 
> Is my patch Okay?
> 
> thanks.
Qing


Re: [PATCH] Fix PR81422[aarch64] internal compiler error: in update_equiv_regs, at ira.c:3425

2017-10-05 Thread Qing Zhao
HI, Richard,

> On Oct 5, 2017, at 9:53 AM, Richard Earnshaw (lists) 
>  wrote:
> 
> Two minor nits to fix:
> 
> - ChangeLog entries should start with a capital letter (s/check/Check/).
> - Please use hard tabs rather than 8 consecutive spaces (each of your
> new 'if' statements).
> 
> OK with those changes.
> 
> R.

Per your comments, my updated patch is following:

since I don’t have the SVN write permission, if the change is good, could you 
help me to check them in?

thanks a lot.

Qing

===

gcc/ChangeLog:

  * config/aarch64/aarch64.c (aarch64_load_symref_appropriately):
  Check whether the dest is REG before adding REG_EQUIV note.

gcc/testsuite/ChangeLog:

  PR target/81422
  * gcc.target/aarch64/pr81422.C: New test.

---
 gcc/config/aarch64/aarch64.c   | 12 
 gcc/testsuite/gcc.target/aarch64/pr81422.C | 15 +++
 2 files changed, 23 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr81422.C

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index ee98a1f..181f66a 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1490,7 +1490,8 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
  tp = gen_lowpart (mode, tp);
 
emit_insn (gen_rtx_SET (dest, gen_rtx_PLUS (mode, tp, x0)));
-   set_unique_reg_note (get_last_insn (), REG_EQUIV, imm);
+   if (REG_P (dest))
+ set_unique_reg_note (get_last_insn (), REG_EQUIV, imm);
return;
   }
 
@@ -1524,7 +1525,8 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
  }
 
emit_insn (gen_rtx_SET (dest, gen_rtx_PLUS (mode, tp, tmp_reg)));
-   set_unique_reg_note (get_last_insn (), REG_EQUIV, imm);
+   if (REG_P (dest))
+ set_unique_reg_note (get_last_insn (), REG_EQUIV, imm);
return;
   }
 
@@ -1565,7 +1567,8 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
gcc_unreachable ();
  }
 
-   set_unique_reg_note (get_last_insn (), REG_EQUIV, imm);
+   if (REG_P (dest))
+ set_unique_reg_note (get_last_insn (), REG_EQUIV, imm);
return;
   }
 
@@ -1594,7 +1597,8 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
emit_insn (gen_tlsie_tiny_sidi (dest, imm, tp));
  }
 
-   set_unique_reg_note (get_last_insn (), REG_EQUIV, imm);
+   if (REG_P (dest))
+ set_unique_reg_note (get_last_insn (), REG_EQUIV, imm);
return;
   }
 
diff --git a/gcc/testsuite/gcc.target/aarch64/pr81422.C 
b/gcc/testsuite/gcc.target/aarch64/pr81422.C
new file mode 100644
index 000..5bcc948
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr81422.C
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O0" } */
+
+struct DArray
+{
+__SIZE_TYPE__ length;
+int* ptr;
+};
+
+void foo35(DArray)
+{
+static __thread int x[5];
+foo35({5, (int*)});
+}
+
-- 
1.9.1



Re: [PATCH] Fix PR80295[aarch64] [7/8 Regression] ICE in __builtin_update_setjmp_buf expander

2017-10-16 Thread Qing Zhao

> On Oct 16, 2017, at 5:52 AM, Wilco Dijkstra <wilco.dijks...@arm.com> wrote:
> 
> Qing Zhao wrote:
> 
>> Is  my patch Okay?
> 
> Given it's a mid-end patch this shouldn't be marked as AArch64 specific.
> Similarly the PR needs to be updated to say middle-end. So resending
> it making it clear it's not a target bug should help getting a review.

thanks, I will do that.

I do have a question on the new added testing case:

currently, I added it into gcc.target/aarch64/pr80295.c,  but due to it’s a 
middleend bug, shall I added it to gcc.dg?

However, the option that triggered the issue is -mabi=ilp32, which is a target 
depend option, so, I am really confused on where to put the
testing case.

thanks for the help.

Qing



[PATCH][Middle-end]Fix PR80295 [7/8 Regression] ICE in __builtin_update_setjmp_buf expander

2017-10-16 Thread Qing Zhao
resend this patch for middle-end to review. 

this patch was originally sent to aarch64 for review in the beginning:

https://gcc.gnu.org/ml/gcc-patches/2017-10/msg00404.html 
<https://gcc.gnu.org/ml/gcc-patches/2017-10/msg00404.html>
The implementation of __builtin_update_setjmp_buf is not correct. It takes a 
pointer
as an operand and treats the Mode of the pointer as Pmode, which is not correct.
a conversion from ptr_mode to Pmode is needed for this pointer.

bootstrapped and tested on both aarch64-unknown-linux-gnu and 
x86_64-pc-linux-gnu, 
no regressions.

Wilco helped me a lot during fixing this bug.

Okay for trunk?

the patch is:

gcc/ChangeLog

2017-10-16  Qing Zhao <qing.z...@oracle.com>
Wilco Dijkstra <wilco.dijks...@arm.com>

* builtins.c (expand_builtin_update_setjmp_buf): Add a
converstion to Pmode from the buf_addr.

gcc/testsuite/ChangeLog

2017-10-16  Qing Zhao <qing.z...@oracle.com>
Wilco Dijkstra <wilco.dijks...@arm.com>

PR middle-end/80295
* gcc.target/aarch64/pr80295.c: New test.

---
 gcc/builtins.c | 1 +
 gcc/testsuite/gcc.target/aarch64/pr80295.c | 8 
 2 files changed, 9 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr80295.c

diff --git a/gcc/builtins.c b/gcc/builtins.c
index c8a5ea6..01fb08b 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -1199,6 +1199,7 @@ void
 expand_builtin_update_setjmp_buf (rtx buf_addr)
 {
   machine_mode sa_mode = STACK_SAVEAREA_MODE (SAVE_NONLOCAL);
+  buf_addr = convert_memory_address (Pmode, buf_addr);
   rtx stack_save
 = gen_rtx_MEM (sa_mode,
   memory_address
diff --git a/gcc/testsuite/gcc.target/aarch64/pr80295.c 
b/gcc/testsuite/gcc.target/aarch64/pr80295.c
new file mode 100644
index 000..b3866d8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr80295.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-mabi=ilp32" } */
+
+void f (void *b) 
+{ 
+  __builtin_update_setjmp_buf (b); 
+}
+
-- 
1.9.1



Re: [PATCH 1/1] sparc: support for -mmisalign in the SPARC M8

2017-09-06 Thread Qing Zhao
Just a followup on this patch.

We did some run-time performance testing internally on this set of
change on sparc M8 machine with -mmisalign and -mno-misalign
based on the latest upstream gcc

for CPU2017 C/C++ SPEED run:

***without -O,  -mmisalign slowdown the run-time performance about 4% on
average

This is mainly due to the following workaround to misaligned support in
M8: (config/sparc/sparc.c)

+/* for misaligned ld/st provided by M8, the IMM field is 10-bit wide
+   other than the 13-bit for regular ld/st.
+   The best solution for this problem is to distinguish each ld/st
+   whether it's aligned or misaligned. However, due to the current
+   design of the common routine TARGET_LEGITIMATE_ADDRESS_P,  only
+   the ADDR of a ld/st is passed to the routine, the align info
+   carried by the corresponding MEM is NOT passed in. without changing
+   the prototype of TARGET_LEGITIMATE_ADDRESS_P, we cannot use this
+   best solution.
+   as a workaround, we have to conservatively treat ALL IMM field of
+   a ld/st insn on a MISALIGNED target is 10-bit wide.
+   the side-effect of this workaround is:  there will be additiona
+   REG<-IMM insn generated for regular ld/st when -mmisalign is ON.
+   However, such additional reload insns should be very easily to be
+   removed by a set of optimization whenever -O specified.
+*/
+#define RTX_OK_FOR_OFFSET_P(X, MODE) \
+  (CONST_INT_P (X)   \
+   && ((!TARGET_MISALIGN \
+&& INTVAL (X) >=3D3D -0x1000 \
+&& INTVAL (X) <=3D3D (0x1000 - GET_MODE_SIZE (MODE)))\
+|| (TARGET_MISALIGN  \
+&& INTVAL (X) >=3D3D -0x0400 \
+&& INTVAL (X) <=3D3D (0x0400 - GET_MODE_SIZE (MODE)

due to this run-time regression introduced by this workaround is not
trivial, We decided to hold on this
set of change at this time.

Thanks.

Qing

> 
> This set of change is to provide a way to use misaligned load/store insns to 
> implement the compiler-time known unaligned memory access,  -mno-misalign can 
> be used
> to disable such behavior very easily if our performance data shows that 
> misaligned load/store insns are slower than the current software emulation. 
> 
> Qing



[PATCH] Fix PR80295[aarch64] [7/8 Regression] ICE in __builtin_update_setjmp_buf expander

2017-09-25 Thread Qing Zhao
Hi,

This patch fixes the aarch64 bug 80295
https://gcc.gnu.org/PR80295

The aarch64 backend has multiple places that miss the handling of TARGET_ILP32.
in the patch, we added correct handling of TARGET_ILP32 into aarch64 backend. 

a new small testing case is added.

bootstrapped and tested on aarch64-unknown-linux-gnu with no regression.

thanks.

Qing

==

gcc/ChangeLog:

   * config/aarch64/aarch64.c (aarch64_expand_prologue):
   emit different modes of stack_tie insn depend on TARGET_ILP32.
   (aarch64_expand_epilogue): Likewise.
   * config/aarch64/aarch64.h: define Pmode to SImode/DImode
   depend on TARGET_ILP32.
  * config/aarch64/aarch64.md: define insn stack_tie to different
   modes (SImode/DImode) 

gcc/testsuite/ChangeLog:

   PR middle-end/80295
   * gcc.target/aarch64/pr80295.c: New test.

---
 gcc/config/aarch64/aarch64.c   | 12 +---
 gcc/config/aarch64/aarch64.h   |  2 +-
 gcc/config/aarch64/aarch64.md  |  6 +++---
 gcc/testsuite/gcc.target/aarch64/pr80295.c |  8 
 4 files changed, 21 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr80295.c

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 6c3ef76..876e9e3 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3693,7 +3693,9 @@ aarch64_expand_prologue (void)
   stack_pointer_rtx,
   GEN_INT (callee_offset)));
   RTX_FRAME_RELATED_P (insn) = 1;
-  emit_insn (gen_stack_tie (stack_pointer_rtx, hard_frame_pointer_rtx));
+  emit_insn (TARGET_ILP32 ? 
+ gen_stack_tiesi (stack_pointer_rtx, hard_frame_pointer_rtx) :
+ gen_stack_tiedi (stack_pointer_rtx, hard_frame_pointer_rtx));
 }
 
   aarch64_save_callee_saves (DImode, callee_offset, R0_REGNUM, R30_REGNUM,
@@ -3750,7 +3752,9 @@ aarch64_expand_epilogue (bool for_sibcall)
   if (final_adjust > crtl->outgoing_args_size || cfun->calls_alloca
   || crtl->calls_eh_return)
 {
-  emit_insn (gen_stack_tie (stack_pointer_rtx, stack_pointer_rtx));
+  emit_insn (TARGET_ILP32 ? 
+ gen_stack_tiesi (stack_pointer_rtx, stack_pointer_rtx) :
+ gen_stack_tiedi (stack_pointer_rtx, stack_pointer_rtx));
   need_barrier_p = false;
 }
 
@@ -3774,7 +3778,9 @@ aarch64_expand_epilogue (bool for_sibcall)
callee_adjust != 0, _ops);
 
   if (need_barrier_p)
-emit_insn (gen_stack_tie (stack_pointer_rtx, stack_pointer_rtx));
+emit_insn (TARGET_ILP32 ? 
+   gen_stack_tiesi (stack_pointer_rtx, stack_pointer_rtx) :
+   gen_stack_tiedi (stack_pointer_rtx, stack_pointer_rtx));
 
   if (callee_adjust != 0)
 aarch64_pop_regs (reg1, reg2, callee_adjust, _ops);
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 8fada9e..df58442 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -782,7 +782,7 @@ typedef struct
 /* Specify the machine mode that the hardware addresses have.
After generation of rtl, the compiler makes no further distinction
between pointers and any other objects of this machine mode.  */
-#define Pmode  DImode
+#define Pmode  (TARGET_ILP32 ? SImode : DImode)
 
 /* A C expression whose value is zero if pointers that need to be extended
from being `POINTER_SIZE' bits wide to `Pmode' are sign-extended and
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index bb7f2c0..30853b2 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5533,10 +5533,10 @@
   [(set_attr "type" "call")
(set_attr "length" "16")])
 
-(define_insn "stack_tie"
+(define_insn "stack_tie"
   [(set (mem:BLK (scratch))
-   (unspec:BLK [(match_operand:DI 0 "register_operand" "rk")
-(match_operand:DI 1 "register_operand" "rk")]
+   (unspec:BLK [(match_operand:GPI 0 "register_operand" "rk")
+(match_operand:GPI 1 "register_operand" "rk")]
UNSPEC_PRLG_STK))]
   ""
   ""
diff --git a/gcc/testsuite/gcc.target/aarch64/pr80295.c 
b/gcc/testsuite/gcc.target/aarch64/pr80295.c
new file mode 100644
index 000..b3866d8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr80295.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-mabi=ilp32" } */
+
+void f (void *b) 
+{ 
+  __builtin_update_setjmp_buf (b); 
+}
+
-- 
1.9.1



Re: [PATCH] Fix PR80295[aarch64] [7/8 Regression] ICE in __builtin_update_setjmp_buf expander

2017-09-25 Thread Qing Zhao
Hi, Andreas,

thanks for the comment. 

> GNU style is line break before the operator, not after.

updated per your comment.

Qing.

---
 gcc/config/aarch64/aarch64.c   | 12 +---
 gcc/config/aarch64/aarch64.h   |  2 +-
 gcc/config/aarch64/aarch64.md  |  6 +++---
 gcc/testsuite/gcc.target/aarch64/pr80295.c |  8 
 4 files changed, 21 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr80295.c

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 6c3ef76..ff0890d 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3693,7 +3693,9 @@ aarch64_expand_prologue (void)
   stack_pointer_rtx,
   GEN_INT (callee_offset)));
   RTX_FRAME_RELATED_P (insn) = 1;
-  emit_insn (gen_stack_tie (stack_pointer_rtx, hard_frame_pointer_rtx));
+  emit_insn (TARGET_ILP32 
+ ? gen_stack_tiesi (stack_pointer_rtx, hard_frame_pointer_rtx)
+ : gen_stack_tiedi (stack_pointer_rtx, 
hard_frame_pointer_rtx));
 }
 
   aarch64_save_callee_saves (DImode, callee_offset, R0_REGNUM, R30_REGNUM,
@@ -3750,7 +3752,9 @@ aarch64_expand_epilogue (bool for_sibcall)
   if (final_adjust > crtl->outgoing_args_size || cfun->calls_alloca
   || crtl->calls_eh_return)
 {
-  emit_insn (gen_stack_tie (stack_pointer_rtx, stack_pointer_rtx));
+  emit_insn (TARGET_ILP32 
+ ? gen_stack_tiesi (stack_pointer_rtx, stack_pointer_rtx)
+ : gen_stack_tiedi (stack_pointer_rtx, stack_pointer_rtx));
   need_barrier_p = false;
 }
 
@@ -3774,7 +3778,9 @@ aarch64_expand_epilogue (bool for_sibcall)
callee_adjust != 0, _ops);
 
   if (need_barrier_p)
-emit_insn (gen_stack_tie (stack_pointer_rtx, stack_pointer_rtx));
+emit_insn (TARGET_ILP32 
+   ? gen_stack_tiesi (stack_pointer_rtx, stack_pointer_rtx)
+   : gen_stack_tiedi (stack_pointer_rtx, stack_pointer_rtx));
 
   if (callee_adjust != 0)
 aarch64_pop_regs (reg1, reg2, callee_adjust, _ops);
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 8fada9e..df58442 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -782,7 +782,7 @@ typedef struct
 /* Specify the machine mode that the hardware addresses have.
After generation of rtl, the compiler makes no further distinction
between pointers and any other objects of this machine mode.  */
-#define Pmode  DImode
+#define Pmode  (TARGET_ILP32 ? SImode : DImode)
 
 /* A C expression whose value is zero if pointers that need to be extended
from being `POINTER_SIZE' bits wide to `Pmode' are sign-extended and
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index bb7f2c0..30853b2 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5533,10 +5533,10 @@
   [(set_attr "type" "call")
(set_attr "length" "16")])
 
-(define_insn "stack_tie"
+(define_insn "stack_tie"
   [(set (mem:BLK (scratch))
-   (unspec:BLK [(match_operand:DI 0 "register_operand" "rk")
-(match_operand:DI 1 "register_operand" "rk")]
+   (unspec:BLK [(match_operand:GPI 0 "register_operand" "rk")
+(match_operand:GPI 1 "register_operand" "rk")]
UNSPEC_PRLG_STK))]
   ""
   ""
diff --git a/gcc/testsuite/gcc.target/aarch64/pr80295.c 
b/gcc/testsuite/gcc.target/aarch64/pr80295.c
new file mode 100644
index 000..b3866d8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr80295.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-mabi=ilp32" } */
+
+void f (void *b) 
+{ 
+  __builtin_update_setjmp_buf (b); 
+}
+
-- 
1.9.1



[PING] [PATCH] Fix PR81422[aarch64] internal compiler error: in update_equiv_regs, at ira.c:3425

2017-10-03 Thread Qing Zhao
Ping

https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg173877.html

> On Sep 19, 2017, at 2:22 PM, Qing Zhao <qing.z...@oracle.com> wrote:
> 
> Hi,
> 
> This patch fixes the aarch64 bug 81422 
> https://gcc.gnu.org/PR81422
> 
> Before adding REG_EQUIV notes in the TLS symbol handling code,
> we should check whether the "dest" is a REG or NOT (sometimes,
> it's a SUBREG as in this bug). Only when the “dest” is a REG, the note
> will be added.
> 
> a new small testing case is added. 
> 
> this change has been tested on aarch64-unknown-linux-gnu with no regression. 
> 
> Thanks!
> 
> Qing
> 
> 
> ===
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64.c (aarch64_load_symref_appropriately):
>   check whether the dest is REG before adding REG_EQUIV note.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR target/81422
>   * gcc.target/aarch64/pr81422.C: New test.
> 
> ---
> gcc/config/aarch64/aarch64.c   | 12 
> gcc/testsuite/gcc.target/aarch64/pr81422.C | 15 +++
> 2 files changed, 23 insertions(+), 4 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/aarch64/pr81422.C
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index a2ecd7a..6c3ef76 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -1480,7 +1480,8 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
> tp = gen_lowpart (mode, tp);
> 
>   emit_insn (gen_rtx_SET (dest, gen_rtx_PLUS (mode, tp, x0)));
> - set_unique_reg_note (get_last_insn (), REG_EQUIV, imm);
> +if (REG_P (dest)) 
> +   set_unique_reg_note (get_last_insn (), REG_EQUIV, imm);
>   return;
>   }
> 
> @@ -1514,7 +1515,8 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
> }
> 
>   emit_insn (gen_rtx_SET (dest, gen_rtx_PLUS (mode, tp, tmp_reg)));
> - set_unique_reg_note (get_last_insn (), REG_EQUIV, imm);
> +if (REG_P (dest)) 
> +   set_unique_reg_note (get_last_insn (), REG_EQUIV, imm);
>   return;
>   }
> 
> @@ -1555,7 +1557,8 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
>   gcc_unreachable ();
> }
> 
> - set_unique_reg_note (get_last_insn (), REG_EQUIV, imm);
> +if (REG_P (dest)) 
> +   set_unique_reg_note (get_last_insn (), REG_EQUIV, imm);
>   return;
>   }
> 
> @@ -1584,7 +1587,8 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
>   emit_insn (gen_tlsie_tiny_sidi (dest, imm, tp));
> }
> 
> - set_unique_reg_note (get_last_insn (), REG_EQUIV, imm);
> +if (REG_P (dest))
> +   set_unique_reg_note (get_last_insn (), REG_EQUIV, imm);
>   return;
>   }
> 
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr81422.C 
> b/gcc/testsuite/gcc.target/aarch64/pr81422.C
> new file mode 100644
> index 000..5bcc948
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pr81422.C
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O0" } */
> +
> +struct DArray
> +{
> +__SIZE_TYPE__ length;
> +int* ptr;
> +};
> +
> +void foo35(DArray)
> +{
> +static __thread int x[5];
> +foo35({5, (int*)});
> +}
> +
> -- 
> 1.9.1
> 



Re: [PATCH] Fix PR80295[aarch64] [7/8 Regression] ICE in __builtin_update_setjmp_buf expander

2017-10-05 Thread Qing Zhao
Richard,

thanks for your review and comments.

> On Oct 5, 2017, at 11:50 AM, Richard Earnshaw (lists) 
> <richard.earns...@arm.com> wrote:
> 
> On 25/09/17 17:35, Qing Zhao wrote:
>> 
>> --- a/gcc/config/aarch64/aarch64.h
>> +++ b/gcc/config/aarch64/aarch64.h
>> @@ -782,7 +782,7 @@ typedef struct
>> /* Specify the machine mode that the hardware addresses have.
>>After generation of rtl, the compiler makes no further distinction
>>between pointers and any other objects of this machine mode.  */
>> -#define Pmode   DImode
>> +#define Pmode   (TARGET_ILP32 ? SImode : DImode)
> 
> This is wrong.  AArch64 has both ptr_mode and Pmode.  Pmode must always
> be DImode as (internally) all addresses must be 64-bit.  ptr_mode
> reflects the ABI choice of 32/64-bit language level addresses.  The
> register SP must always be a 64-bit value, even when all the top 32 bits
> are zero.


So,  in ilp32 ABI, compared to LP64,  the only difference is:

   long and pointer size are different

all the others are the same, including:

registers size are 64 bits for both ilp32 and LP64.
stack poping and pushing by 64 bits for both ilp32 and LP64.

is the above right?

what’s the other difference between ilp32 and LP64?

thanks a lot for your help.

Qing

Add myself as GCC maintainer

2017-11-29 Thread Qing Zhao
Added myself as GCC maintainer with r255248:

https://gcc.gnu.org/ml/gcc-cvs/2017-11/msg00965.html 
<https://gcc.gnu.org/ml/gcc-cvs/2017-11/msg00965.html>

thanks.

Qing

==

ChangeLog

+2017-11-29  Qing Zhao  <qing.z...@oracle.com>
+
+   * MAINTAINERS (Write After Approval): Add myself.
+

Index: MAINTAINERS
===
--- MAINTAINERS (revision 255246)
+++ MAINTAINERS (working copy)
@@ -644,6 +644,7 @@
 Kirill Yukhin  <kirill.yuk...@gmail.com>
 Adhemerval Zanella <azane...@linux.vnet.ibm.com>
 Yufeng Zhang   <yufeng.zh...@arm.com>
+Qing Zhao  <qing.z...@oracle.com>
 Shujing Zhao   <pearly.z...@oracle.com>
 Jon Ziegler<j...@apple.com>
 Roman Zippel   <zip...@linux-m68k.org>



Re: [PATCH][Middle-end]79538 missing -Wformat-overflow with %s and non-member array arguments

2017-12-12 Thread Qing Zhao
Hi, Martin,

thanks for the suggestion, this might be a good enhancement for 
get_range_strlen for a future work. 

my understanding is,  the current get_range_strlen does not use value range 
info yet, and also does not handle VLA. 
we can improve it from both aspects in a later work.

Qing

>> 
>> Per your comments, the updated gimple-fold.c is like the following:
> 
> FWIW, I suspect Richard is thinking of VLAs with the INTEGER_CST
> comment above.  This is the case we discussed in private.  It is
> handled either way but the handling could be improved by determining
> the size of the VLA from the first __builtin_alloca_with_align
> argument and using it or its range as the minimum and maximum.
> With that, the range of string lengths that fit in vla in
> the following could be determined and the sprintf call could
> be diagnosed:
> 
>  char d[30];
> 
>  void f (unsigned n)
>  {
>if (n < 32)
>  {
>char vla[n];
>__builtin_sprintf (d, "%s", vla);
>  }
>  }
> 
> I think this would be a nice enhancement to add on top of yours,
> not just for VLAs but for dynamically allocated arrays as well,
> and not just for the sprintf pass but also for the strlen pass
> to optimize cases like:
> 
>  void f (unsigned n)
>  {
>if (n < 32)
>  {
>char vla[n];
> 
>fgets (vla, n, stdin);
> 
>unsigned len = strlen (vla);
>if (len >= n)   // cannot hold
>  abort (); // can be eliminated
>   }
>}
> 
> That said, at some point, it might make more sense to change
> those passes to start tracking these things as they traverse
> the CFG rather than having get_range_strlen() do the work.
> 
> Martin
> 
>> 
>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>> index 353a46e..0500fba 100644
>> --- a/gcc/gimple-fold.c
>> +++ b/gcc/gimple-fold.c
>> @@ -1323,6 +1323,19 @@ get_range_strlen (tree arg, tree length[2], bitmap 
>> *visited, int type,
>>   the array could have zero length.  */
>>*minlen = ssize_int (0);
>>  }
>> +
>> +  if (VAR_P (arg)
>> +  && TREE_CODE (TREE_TYPE (arg)) == ARRAY_TYPE)
>> +{
>> +  val = TYPE_SIZE_UNIT (TREE_TYPE (arg));
>> +  if (!val || TREE_CODE (val) != INTEGER_CST || integer_zerop 
>> (val))
>> +return false;
>> +  val = fold_build2 (MINUS_EXPR, TREE_TYPE (val), val,
>> + build_int_cst (TREE_TYPE (val), 1));
>> +  /* Set the minimum size to zero since the string in
>> + the array could have zero length.  */
>> +  *minlen = ssize_int (0);
>> +}
>>  }
>> 
>>   if (!val)
>> 
>> let me know any further issue with the above.
>> 
>> thanks a lot.
>> 
>> Qing



Re: [PATCH][Middle-end]79538 missing -Wformat-overflow with %s and non-member array arguments

2017-12-12 Thread Qing Zhao

> On Dec 12, 2017, at 10:50 AM, Richard Biener  wrote:
>>> 
 +return false;
 +  val = fold_build2 (MINUS_EXPR, TREE_TYPE (val), val,
 + integer_one_node);
>>> 
>>>  val = wide_int_to_tree (TREE_TYPE (val), wi::minus (wi::to_wide
>> (val), 1));
> 
> I don't see this requested change. 

>> +  val = fold_build2 (MINUS_EXPR, TREE_TYPE (val), val,
>> + build_int_cst (TREE_TYPE (val), 1));

for my original code:

 +  val = fold_build2 (MINUS_EXPR, TREE_TYPE (val), val,
 + integer_one_node);

I thought that your original major concern was:

“ you pass a possibly bogus type of 1 to fold_build2 above” 

so I use “build_int_cst (TREE_TYPE (val), 1)" to replace the original 
“integer_one_node”
to make the type of the 1 exactly the same as “val”. 

do I miss anything here?

I don’t quite understand why I have to use:

 val = wide_int_to_tree (TREE_TYPE (val), wi::minus (wi::to_wide(val), 1));

to replace the original fold_build2 (MINUS_EXPR, TREE_TYPE (val), val, 
integer_one_node)? 

thanks.

Qing

>> wide-int variant
>>> is prefered.
>>> 
>>> The gimple-fold.c changes are ok with that change.
>> 
>> Per your comments, the updated gimple-fold.c is like the following:
>> 
>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>> index 353a46e..0500fba 100644
>> --- a/gcc/gimple-fold.c
>> +++ b/gcc/gimple-fold.c
>> @@ -1323,6 +1323,19 @@ get_range_strlen (tree arg, tree length[2],
>> bitmap *visited, int type,
>>   the array could have zero length.  */
>>*minlen = ssize_int (0);
>>  }
>> +
>> +  if (VAR_P (arg) 
>> +  && TREE_CODE (TREE_TYPE (arg)) == ARRAY_TYPE)
>> +{
>> +  val = TYPE_SIZE_UNIT (TREE_TYPE (arg));
>> +  if (!val || TREE_CODE (val) != INTEGER_CST ||
>> integer_zerop (val))
>> +return false;
>> +  val = fold_build2 (MINUS_EXPR, TREE_TYPE (val), val,
>> + build_int_cst (TREE_TYPE (val), 1));
>> +  /* Set the minimum size to zero since the string in
>> + the array could have zero length.  */
>> +  *minlen = ssize_int (0);
>> +}
>>  }
>> 
>>  if (!val)
>> 
>> let me know any further issue with the above.
>> 
>> thanks a lot.
>> 
>> Qing



Re: [PATCH][Middle-end]79538 missing -Wformat-overflow with %s and non-member array arguments

2017-12-14 Thread Qing Zhao

> On Dec 14, 2017, at 1:36 PM, Jeff Law <l...@redhat.com> wrote:
> 
> On 12/14/2017 12:22 PM, Qing Zhao wrote:
>> 
>>> On Dec 14, 2017, at 2:05 AM, Richard Biener <rguent...@suse.de
>>> <mailto:rguent...@suse.de>> wrote:
>>> 
>>> On Wed, 13 Dec 2017, Qing Zhao wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I updated gimple-fold.c as you suggested, bootstrapped and re-tested
>>>> on both x86 and aarch64. no any issue.
>>>> 
>>>> 
>>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>>>> index 353a46e..eb6a87a 100644
>>>> --- a/gcc/gimple-fold.c
>>>> +++ b/gcc/gimple-fold.c
>>>> @@ -1323,6 +1323,19 @@ get_range_strlen (tree arg, tree length[2],
>>>> bitmap *visited, int type,
>>>>  the array could have zero length.  */
>>>>   *minlen = ssize_int (0);
>>>> }
>>>> +
>>>> +  if (VAR_P (arg) 
>>>> +  && TREE_CODE (TREE_TYPE (arg)) == ARRAY_TYPE)
>>>> +{
>>>> +  val = TYPE_SIZE_UNIT (TREE_TYPE (arg));
>>>> +  if (!val || TREE_CODE (val) != INTEGER_CST || integer_zerop (val))
>>>> +return false;
>>>> +  val = wide_int_to_tree (TREE_TYPE (val), 
>>>> +  wi::sub(wi::to_wide (val), 1));
>>>> +  /* Set the minimum size to zero since the string in
>>>> + the array could have zero length.  */
>>>> +  *minlen = ssize_int (0);
>>>> +}
>>>> }
>>>> 
>>>> 
>>>> I plan to commit the change very soon. 
>>>> let me know if you have further comment.
>>> 
>>> Looks good to me.
>> 
>> thanks a lot for your review.
>> 
>> committed the patch as revision 255654
>> https://gcc.gnu.org/viewcvs/gcc?view=revision=255654
>> 
>> PR 79538 was filed against GCC7.0, So, I assume that this patch need to
>> be backported to GCC7 branch.
>> I will do the backporting to GCC7 later this week if there is no objection. 
> We don't try to backport all fixes to the release branches -- we tend to
> focus more on regressions that apply to those releases and codegen
> correctness issues.
> 
> I'd think a missed warning isn't that important to backport.

Okay. I see. 

then I will close PR79538 as fixed.

thanks.

Qing
> 
> Jeff



[PATCH][Middle-end]2nd patch of PR78809 and PR83026

2017-12-14 Thread Qing Zhao
Hi,

I am not sure whether it’s proper to send this patch during this late stage.
however, since the patch itself is quite straightforward, I decided to send it 
now. 

=

2nd Patch for PR78009
Patch for PR83026

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809 
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809>
Inline strcmp with small constant strings

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83026 
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83026>
missing strlen optimization for strcmp of unequal strings

The design doc for PR78809 is at:
https://www.mail-archive.com/gcc@gcc.gnu.org/msg83822.html 
<https://www.mail-archive.com/gcc@gcc.gnu.org/msg83822.html>

this patch is for the second part of change of PR78809 and PR83026:

B. for strncmp (s1, s2, n) (!)= 0 or strcmp (s1, s2) (!)= 0

  B.1. (PR83026) When both arguments are constant strings and it's a strcmp:
 * if the length of the strings are NOT equal, we can safely fold the call
   to a non-zero value.
 * otherwise, do nothing now.

  B.2. (PR78809) When one of the arguments is a constant string, try to replace
  the call with a __builtin_memcmp_eq call where possible, i.e:

  strncmp (s, STR, C) (!)= 0 in which, s is a pointer to a string, STR is a
  constant string, C is a constant.
if (C <= strlen(STR) && sizeof_array(s) > C)
  {
replace this call with
memcmp (s, STR, C) (!)= 0
  }
if (C > strlen(STR)
  {
it can be safely treated as a call to strcmp (s, STR) (!)= 0
can handled by the following strcmp.
  }

  strcmp (s, STR) (!)= 0 in which, s is a pointer to a string, STR is a
  constant string.
if  (sizeof_array(s) > strlen(STR))
  {
replace this call with
memcmp (s, STR, strlen(STR)+1) (!)= 0
  }

  (NOTE, currently, memcmp(s1, s2, N) (!)=0 has been optimized to a simple
   sequence to access all bytes and accumulate the overall result in GCC by
   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52171 
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52171>

   as a result, such str(n)cmp call would like to be replaced by simple
   sequences to access all types and accumulate the overall results.

adding test case strcmpopt_2.c into gcc.dg for part B of PR78809
adding test case strcmpopt_3.c into gcc.dg for PR83026

bootstraped and tested on both X86 and Aarch64. no regression.

Okay for trunk?

thanks.

Qing

====
gcc/ChangeLog:

2017-12-11  Qing Zhao  <qing.z...@oracle.com <mailto:qing.z...@oracle.com>>

PR middle-end/78809
PR middle-end/83026
* tree-ssa-strlen.c (compute_string_length): New function.
(handle_builtin_string_cmp): New function to handle calls to
string compare functions.
(strlen_optimize_stmt): Add handling to builtin string compare
    calls. 

gcc/testsuite/ChangeLog:

2017-12-11  Qing Zhao  <qing.z...@oracle.com <mailto:qing.z...@oracle.com>>

PR middle-end/78809
* gcc.dg/strcmpopt_2.c: New testcase.

PR middle-end/83026
* gcc.dg/strcmpopt_3.c: New testcase.


---
 gcc/testsuite/gcc.dg/strcmpopt_2.c |  67 +
 gcc/testsuite/gcc.dg/strcmpopt_3.c |  27 +
 gcc/tree-ssa-strlen.c  | 197 +
 3 files changed, 291 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/strcmpopt_2.c
 create mode 100644 gcc/testsuite/gcc.dg/strcmpopt_3.c

diff --git a/gcc/testsuite/gcc.dg/strcmpopt_2.c 
b/gcc/testsuite/gcc.dg/strcmpopt_2.c
new file mode 100644
index 000..ac8ff39
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/strcmpopt_2.c
@@ -0,0 +1,67 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fdump-tree-strlen" } */
+
+char s[100] = {'a','b','c','d'};
+typedef struct { int x; char s[8]; } S;
+
+__attribute__ ((noinline)) int 
+f1 (S *s) 
+{ 
+  return __builtin_strcmp (s->s, "abc") != 0; 
+}
+
+__attribute__ ((noinline)) int 
+f2 (void) 
+{ 
+  return __builtin_strcmp (s, "abc") != 0; 
+}
+
+__attribute__ ((noinline)) int 
+f3 (S *s) 
+{ 
+  return __builtin_strcmp ("abc", s->s) != 0; 
+}
+
+__attribute__ ((noinline)) int 
+f4 (void) 
+{ 
+  return __builtin_strcmp ("abc", s) != 0; 
+}
+
+__attribute__ ((noinline)) int 
+f5 (S *s) 
+{ 
+  return __builtin_strncmp (s->s, "abc", 3) != 0; 
+}
+
+__attribute__ ((noinline)) int 
+f6 (void) 
+{ 
+  return __builtin_strncmp (s, "abc", 2) != 0; 
+}
+
+__attribute__ ((noinline)) int 
+f7 (S *s) 
+{ 
+  return __builtin_strncmp ("abc", s->s, 3) != 0; 
+}
+
+__attribute__ ((noinline)) int 
+f8 (void) 
+{ 
+  return __builtin_strncmp ("abc", s, 2) != 0; 
+}
+
+int main (void)
+{
+  S ss = {2, {'a','b','c'}};
+
+  if (f1 () != 0 || f2 () != 1 || f3 () != 0 ||
+  f4 () != 1 || f5 () != 0 || f6 () != 0 ||
+  f7 () != 0 || f8 () != 0)
+__buil

Re: [PATCH][Middle-end]79538 missing -Wformat-overflow with %s and non-member array arguments

2017-12-14 Thread Qing Zhao

> On Dec 14, 2017, at 2:05 AM, Richard Biener <rguent...@suse.de> wrote:
> 
> On Wed, 13 Dec 2017, Qing Zhao wrote:
> 
>> Hi,
>> 
>> I updated gimple-fold.c as you suggested, bootstrapped and re-tested on both 
>> x86 and aarch64. no any issue.
>> 
>> 
>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>> index 353a46e..eb6a87a 100644
>> --- a/gcc/gimple-fold.c
>> +++ b/gcc/gimple-fold.c
>> @@ -1323,6 +1323,19 @@ get_range_strlen (tree arg, tree length[2], bitmap 
>> *visited, int type,
>>   the array could have zero length.  */
>>*minlen = ssize_int (0);
>>  }
>> +
>> +  if (VAR_P (arg) 
>> +  && TREE_CODE (TREE_TYPE (arg)) == ARRAY_TYPE)
>> +{
>> +  val = TYPE_SIZE_UNIT (TREE_TYPE (arg));
>> +  if (!val || TREE_CODE (val) != INTEGER_CST || integer_zerop (val))
>> +return false;
>> +  val = wide_int_to_tree (TREE_TYPE (val), 
>> +  wi::sub(wi::to_wide (val), 1));
>> +  /* Set the minimum size to zero since the string in
>> + the array could have zero length.  */
>> +  *minlen = ssize_int (0);
>> +}
>>  }
>> 
>> 
>> I plan to commit the change very soon. 
>> let me know if you have further comment.
> 
> Looks good to me.

thanks a lot for your review.

committed the patch as revision 255654
https://gcc.gnu.org/viewcvs/gcc?view=revision=255654 
<https://gcc.gnu.org/viewcvs/gcc?view=revision=255654>

PR 79538 was filed against GCC7.0, So, I assume that this patch need to be 
backported to GCC7 branch.
I will do the backporting to GCC7 later this week if there is no objection. 

Qing
> Richard.
> 



Re: [PATCH][Middle-end]2nd patch of PR78809 and PR83026

2017-12-15 Thread Qing Zhao
Hi, Jakub,

thanks a lot for your detailed review.

> On Dec 14, 2017, at 2:45 PM, Jakub Jelinek <ja...@redhat.com> wrote:
> 
> On Thu, Dec 14, 2017 at 01:45:21PM -0600, Qing Zhao wrote:
>> 2017-12-11  Qing Zhao  <qing.z...@oracle.com <mailto:qing.z...@oracle.com>>
> 
> No " <mailto:qing.z...@oracle.com>" in ChangeLog entries please.

this is an error when I pasted this part from my terminal to mail editor, not 
in my real code.
will double check next time when sending out email.
> 
>> --- a/gcc/tree-ssa-strlen.c
>> +++ b/gcc/tree-ssa-strlen.c
>> @@ -2541,6 +2541,198 @@ handle_builtin_memcmp (gimple_stmt_iterator *gsi)
>>   return false;
>> }
>> 
>> +/* Given an index to the strinfo vector, compute the string length for the
>> +   corresponding string. Return -1 when unknown.  */
>> + 
>> +static HOST_WIDE_INT 
>> +compute_string_length (int idx)
>> +{
>> +  HOST_WIDE_INT string_leni = -1; 
>> +  gcc_assert (idx != 0);
>> +
>> +  if (idx < 0)
>> +string_leni = ~idx;
>> +  else
>> +{
>> +  strinfo *si = get_strinfo (idx);
>> +  if (si)
>> +{
>> +  tree const_string_len = get_string_length (si);
>> +  string_leni
>> += (const_string_len && tree_fits_uhwi_p (const_string_len)
>> +   ? tree_to_uhwi(const_string_len) : -1); 
> 
> So, you are returning a signed HWI, then clearly tree_fits_uhwi_p and
> tree_to_uhwi are inappropriate, you should have used tree_fits_shwi_p
> and tree_to_shwi.  Space after function name is missing too.
> And, as you start by initializing string_leni to -1, there is no
> point to write it this way rather than
> if (const_string_len && tree_fits_shwi_p (const_string_len))
>   string_leni = tree_to_shwi (const_string_len);

originally it returned an unsigned HWI.   but later I changed it to return a 
signed one since I
need a negative value to represent the UNKNOWN state. 

I will fix this.
> 
>> +}
>> +}
> 
> Maybe also do
>  if (string_leni < 0)
>return -1;

Yes, this might be safer.
> 
>> +  return string_leni;
> 
> unless the callers just look for negative value as unusable.
> 
>> +  tree len = gimple_call_arg (stmt, 2);
>> +  if (tree_fits_uhwi_p (len))
>> +length = tree_to_uhwi (len);
> 
> Similarly to above, you are mixing signed and unsigned HWIs too much.

same reason as above :-),  I will fix this.

> 
>> +  if (gimple_code (ustmt) == GIMPLE_ASSIGN)
> 
>  if (is_gimple_assign (ustmt))
> 
> Usually we use use_stmt instead of ustmt.
Okay.
> 
>> +{
>> +  gassign *asgn = as_a  (ustmt);
> 
> No need for the gassign and ugly as_a, gimple_assign_rhs_code
> as well as gimple_assign_rhs2 can be called on gimple * too.

this part of the code I just copied from the routine “handle_builtin_memcpy” 
and no change.

I will change it as you suggested.


>> +  tree_code code = gimple_assign_rhs_code (asgn);
>> +  if ((code != EQ_EXPR && code != NE_EXPR)
>> +  || !integer_zerop (gimple_assign_rhs2 (asgn)))
>> +return true;
>> +}
>> +  else if (gimple_code (ustmt) == GIMPLE_COND)
>> +{
>> +  tree_code code = gimple_cond_code (ustmt);
>> +  if ((code != EQ_EXPR && code != NE_EXPR)
>> +  || !integer_zerop (gimple_cond_rhs (ustmt)))
>> +return true;
> 
> There is another case you are missing, assign stmt with
> gimple_assign_rhs_code COND_EXPR, where gimple_assign_rhs1 is
> tree with TREE_CODE EQ_EXPR or NE_EXPR with TREE_OPERAND (rhs1, 1)
> integer_zerop.

a little confused here:

in the current code:
. the first case is:  result = strcmp() != 0
. the second case is:if (strcmp() != 0)

so, the missing case you mentioned above is:

result = if (strcmp() != 0) 

or something else?
> 
>> +  /* When both arguments are known, and their strlens are unequal, we can 
>> + safely fold the call to a non-zero value for strcmp;
>> + othewise, do nothing now.  */
>> +  if (idx1 != 0 && idx2 != 0)
>> +{
>> +  HOST_WIDE_INT const_string_leni1 = -1;
>> +  HOST_WIDE_INT const_string_leni2 = -1;
>> +  const_string_leni1 = compute_string_length (idx1);
>> +  const_string_leni2 = compute_string_length (idx2);
> 
> Why do you initialize the vars when you immediately overwrite it?

just a habit to declare a variable with initialization :-).

> Just do
>  HOST_WIDE_INT const_string_leni1 = compute_string_length (idx1);

I can change it like this.

Re: [PATCH][Middle-end]2nd patch of PR78809 and PR83026

2017-12-15 Thread Qing Zhao

> On Dec 15, 2017, at 10:42 AM, Jakub Jelinek <ja...@redhat.com> wrote:
> 
> On Fri, Dec 15, 2017 at 10:08:03AM -0600, Qing Zhao wrote:
>> a little confused here:
>> 
>> in the current code:
>>  . the first case is:  result = strcmp() != 0
>>  . the second case is:if (strcmp() != 0)
>> 
>> so, the missing case you mentioned above is:
>> 
>>result = if (strcmp() != 0) 
>> 
>> or something else?
> 
> result = (strcmp () != 0 ? 15 : 37);
> or similar.  Though, usually COND_EXPRs are added by the tree-if-conversion
> pass, so you might need -ftree-loop-if-convert option and it probably needs
> to be within some loop which will have just a single bb after the
> if-conversion.

I see. thanks.

>> 
>>> if you just use else, you don't need to initialize const_string_leni
>>> either.
>> 
>> I think that const_string_leni still need to be initialized in this case, 
>> because when idx2 is non-zero,  
>> const_string_leni is initialized to compute_string_length (idx2). 
> 
> Sure.  But
>  type uninitialized_var;
>  if (cond1)
>uninitialized_var = foo;
>  else if (cond2)
>uninitialized_var = bar;
>  use (uninitialized_var);
> is a coding style which asks for -Wmaybe-uninitialized warnings, in order
> not to warn, the compiler has to prove that cond1 || cond2 is always true,
> which might not be always easy for the compiler.

in my case, I already initialize the “uninitialized_var” when declared it:

  HOST_WIDE_INT const_string_leni = -1;

  if (idx1)
{
  const_string_leni = compute_string_length (idx1);
  var_string = arg2;
}
  else if (idx2)
{
  const_string_leni = compute_string_length (idx2);
  var_string = arg1;
}

so, the -Wmaybe-uninitialized should NOT issue warning, right?

but anyway, I can change the above as following:

 HOST_WIDE_INT const_string_leni = -1;

  if (idx1)
{
  const_string_leni = compute_string_length (idx1);
  var_string = arg2;
}
  else
{
  gcc_assert (idx2);
  const_string_leni = compute_string_length (idx2);
  var_string = arg1;
}

is this better?

> 
>>> This is something that looks problematic to me.  get_range_strlen returns
>>> some conservative upper bound on the string length, which is fine if
>>> var_string points to say a TREE_STATIC variable where you know the allocated
>>> size, or automatic variable.  But if somebody passes you a pointer to a
>>> structure and the source doesn't contain aggregate copying for it, not sure
>>> if you can take for granted that all the bytes are readable after the '\0'
>>> in the string.  Hopefully at least for flexible array members and arrays in
>>> such positions get_range_strlen will not provide the upper bound, but even
>>> in other cases it doesn't feel safe to me.
>> 
>> this is the part that took me most of the time during the implementation. 
>> 
>> I have considered the following 3 approaches to decide the size of the 
>> variable array:
>> 
>>  A. use “compute_builtin_object_size” in tree-object-size.h to decide 
>> the size of the
>> object.   However, even with the simplest case, it cannot provide the 
>> information. 
> 
> compute_builtin_object_size with modes 0 or 1 computes upper bound, what you
> are really looking for is lower bound,

you mean: 0, 1 is for maximum object size, and 2 is for minimum object size?

yes, I am looking for minimum object size for this optimization. 

> so that would be mode 2, though that
> mode isn't actually used in real-world code and thus might be not fully
> tested.

so, using this routine with mode 2 should be the right approach to go? and we 
need fully testing on this too?

> 
>>  B. use “get_range_strlen” in gimple-fold.h to decide the size of the 
>> object.  however, 
>> it cannot provide valid info for simple array, either. 
> 
> get_range_strlen returns you a range, the minval is not what you're looking
> for, that is the minimum string length, so might be too short for your
> purposes.  And maxval is an upper bound, but you are looking for lower
> bound, you need guarantees this amount of memory can be accessed, even if
> there is 0 in the first byte.

my understanding is that: get_range_strlen returns the minimum and maximum 
length of the string pointed by the 
pointer, and the maximum length of the string is determined by the size of the 
allocated memory pointed by the
pointer, so, it should serve my purpose,   did I misunderstand it?

> 
>>> Furthermore, in the comments you say that you do it only for small strings,
>>> but in the patch I can't see any upper bound, so you c

Re: [PATCH][Middle-end]2nd patch of PR78809 and PR83026

2017-12-15 Thread Qing Zhao

> On Dec 15, 2017, at 11:47 AM, Jakub Jelinek <ja...@redhat.com> wrote:
> 
> On Fri, Dec 15, 2017 at 11:17:37AM -0600, Qing Zhao wrote:
>>  HOST_WIDE_INT const_string_leni = -1;
>> 
>>  if (idx1)
>>{
>>  const_string_leni = compute_string_length (idx1);
>>  var_string = arg2;
>>}
>>  else if (idx2)
>>{
>>  const_string_leni = compute_string_length (idx2);
>>  var_string = arg1;
>>}
>> 
>> so, the -Wmaybe-uninitialized should NOT issue warning, right?
> 
> Well, you had the var_string var uninitialized, so that is what I was
> talking about.

oh, yeah, forgot to initialize var_string.
> 
>> but anyway, I can change the above as following:
>> 
>> HOST_WIDE_INT const_string_leni = -1;
> 
> And here you don't need to initialize it.
> 
>>  if (idx1)
>>{
>>  const_string_leni = compute_string_length (idx1);
>>  var_string = arg2;
>>}
>>  else
>>{
>>  gcc_assert (idx2);
>>  const_string_leni = compute_string_length (idx2);
>>  var_string = arg1;
>>}
>> 
>> is this better?
> 
> Yes, though the gcc_assert could be just gcc_checking_assert (idx2);

what’s the major difference between these two assertion calls?

> 
>>> so that would be mode 2, though that
>>> mode isn't actually used in real-world code and thus might be not fully
>>> tested.
>> 
>> so, using this routine with mode 2 should be the right approach to go? 
>> and we need fully testing on this too?
> 
> It has been a while since I wrote it, so it would need careful analysis.
> 
>>>>B. use “get_range_strlen” in gimple-fold.h to decide the size of the 
>>>> object.  however, 
>>>> it cannot provide valid info for simple array, either. 
>>> 
>>> get_range_strlen returns you a range, the minval is not what you're looking
>>> for, that is the minimum string length, so might be too short for your
>>> purposes.  And maxval is an upper bound, but you are looking for lower
>>> bound, you need guarantees this amount of memory can be accessed, even if
>>> there is 0 in the first byte.
>> 
>> my understanding is that: get_range_strlen returns the minimum and maximum 
>> length of the string pointed by the 
>> pointer, and the maximum length of the string is determined by the size of 
>> the allocated memory pointed by the
>> pointer, so, it should serve my purpose,   did I misunderstand it?
> 
> What I'm worried about is:
> struct S { int a; char b[64]; };
> struct T { struct S c; char d; };
> int
> foo (struct T *x)
> {
>  return strcmp (x->c.b, "01234567890123456789012345678901234567890123456789") 
> == 0;
> }
> int
> bar (void)
> {
>  struct S *p = malloc (offsetof (struct S, b) + 8);
>  p->a = 123;
>  strcpy (p->b, "0123456");
>  return foo ((struct T *) p);
> }
> etc.  where if you transform that into memcmp (x->c.b, 
> "01234567890123456789012345678901234567890123456789", 51) == 0
> it will segfault, whereas strcmp would not.

thanks for the example.

is this issue only for the flexible array member? 
if so, the current get_range_strlen will distinguish whether this is for 
flexible array member or not, then we can disable such transformation
for flexible array member.


> 
>>> But if we find out during
>>> expansion we don't want to expand it inline, we should fall back to calling
>>> strcmp or strncmp.
>> 
>> under what situation we will NOT expand the memcpy_eq call inline?
> 
>  target = expand_builtin_memcmp (exp, target, fcode == 
> BUILT_IN_MEMCMP_EQ);
>  if (target)
>return target;
>  if (fcode == BUILT_IN_MEMCMP_EQ)
>{
>  tree newdecl = builtin_decl_explicit (BUILT_IN_MEMCMP);
>  TREE_OPERAND (exp, 1) = build_fold_addr_expr (newdecl);
>}
> is what builtins.c has, so it certainly counts with the possibility.
> Now, both expand_builtin_memcmp, and emit_block_cmp_hints has several cases
> when it fails.  E.g. can_do_by_pieces decides it is too expensive to do it
> inline, and emit_block_cmp_via_cmpmem fails because the target doesn't have
> cmpmemsi expander.  Various other cases.
> 
> Also, note that some target might have cmpstr*si expanders implemented, but
> not cmpmemsi, in which case trying to optimize strcmp as memcmp_eq might be a
> severe pessimization.

Okay, I see.  this is reasonable. 

if the following better:  (some details still need more work, just basic idea)

in handle_builtin_string_cmp of tree-ssa-strlen.c:


Re: [PATCH][Middle-end]2nd patch of PR78809 and PR83026

2017-12-15 Thread Qing Zhao
Hi, Wilco,

thanks a lot for your review and comments.

> On Dec 15, 2017, at 6:41 AM, Wilco Dijkstra  wrote:
> 
> Hi Qing,
> 
> Just looking at a very high level, I have a few comments:
> 
> 1. Constant folding str(n)cmp - folding is done separately in 
> fold-const-call.c
>   and gimple-fold.c.  There is already code for folding strcmp and strncmp,
>   so we shouldn't need to add new foldings.  Or do you have an example that
>   isn't folded as expected? If so, a fix should be added to the existing code.

are you referring the following:

 B.1. (PR83026) When both arguments are constant strings and it's a strcmp:
* if the length of the strings are NOT equal, we can safely fold the call
  to a non-zero value.
* otherwise, do nothing now.


the following testing case cannot be folded by the current gimple-fold or 
fold-const-call:

int f1 (void)
{
  char *s0= "abcd";
  char s[8];
  strcpy (s, s0);
  return __builtin_strcmp(s, "abc") != 0;
}

in order to fold the above call, we need the strlen information that provided 
by the tree-ssa-strlen.c.
and I don’t think that improving the gimple-fold or fold-const-call can fold 
the above call to 1. 

with the strlen info in tree-ssa-strlen.c,  the strlen of s could be determined 
as 4, which is not equal
to the strlen of the constant string “abc”, then we can safely fold 

__builtin_strcmp(s, "abc") != 0

to 1. 

(actually, your above  comments remind me that one of my testcases added has 
some issue, I will fix the testcase,
strcmpopt_3.c). 

> 
> 2. Why check for str(n)cmp == 0 / != 0? There is no need to explicitly check
>   for equality comparisons since folding into memcmp is always good.

If the result is Not used to compare with 0, then we need the exact value of 
strcmp of two strings.  negative value, zero, or positive value.

I am not sure changing str(n)cmp to memcmp under such situation will still 
return the correct negative or positive value?

> 
> 3. Why handle strncmp? There is already code to convert strncmp into strcmp,
>   so why repeat that again in a different way? It just seems to make the
>   code significantly more complex for no benefit.

if strncmp cannot be convert to strcmp, there is still call to strncmp that we 
should handle, right? 

Qing


> 
> You can achieve the same effect by just optimizing strcmp into memcmp when
> legal without checking for equality comparison.  As a result you can 
> considerably
> reduce the size of this patch while handling more useful cases.
> 
> Wilco
> 



Re: [PATCH][Middle-end]79538 missing -Wformat-overflow with %s and non-member array arguments

2017-12-13 Thread Qing Zhao
Hi,

I updated gimple-fold.c as you suggested, bootstrapped and re-tested on both 
x86 and aarch64. no any issue.


diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 353a46e..eb6a87a 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -1323,6 +1323,19 @@ get_range_strlen (tree arg, tree length[2], bitmap 
*visited, int type,
 the array could have zero length.  */
  *minlen = ssize_int (0);
}
+
+ if (VAR_P (arg) 
+ && TREE_CODE (TREE_TYPE (arg)) == ARRAY_TYPE)
+   {
+ val = TYPE_SIZE_UNIT (TREE_TYPE (arg));
+ if (!val || TREE_CODE (val) != INTEGER_CST || integer_zerop (val))
+   return false;
+ val = wide_int_to_tree (TREE_TYPE (val), 
+ wi::sub(wi::to_wide (val), 1));
+ /* Set the minimum size to zero since the string in
+the array could have zero length.  */
+ *minlen = ssize_int (0);
+   }
}


I plan to commit the change very soon. 
let me know if you have further comment.

thanks.

Qing

==

the updated full patch is as following:

gcc/ChangeLog

2017-12-13  Qing Zhao  <qing.z...@oracle.com >

 PR middle_end/79538
 * gimple-fold.c (get_range_strlen): Add the handling of non-member array.

gcc/fortran/ChangeLog

2017-12-13  Qing Zhao  <qing.z...@oracle.com >

  PR middle_end/79538
  * class.c (gfc_build_class_symbol): Replace call to
  sprintf with xasprintf to avoid format-overflow warning.
  (generate_finalization_wrapper): Likewise.
  (gfc_find_derived_vtab): Likewise.
  (find_intrinsic_vtab): Likewise.


gcc/c-family/ChangeLog

2017-12-13  Qing Zhao  <qing.z...@oracle.com >

 PR middle_end/79538 
* c-cppbuiltin.c (builtin_define_with_hex_fp_value):
 Adjust the size of buf1 and buf2, add a new buf to avoid
 format-overflow warning.

gcc/testsuite/ChangeLog

2017-12-13  Qing Zhao  <qing.z...@oracle.com >

 PR middle_end/79538
 * gcc.dg/pr79538.c: New test.

---
 gcc/c-family/c-cppbuiltin.c| 10 -
 gcc/fortran/class.c| 49 --
 gcc/gimple-fold.c  | 13 +++
 gcc/testsuite/gcc.dg/pr79538.c | 23 
 4 files changed, 69 insertions(+), 26 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr79538.c

diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index 2ac9616..9e33aed 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -1613,7 +1613,7 @@ builtin_define_with_hex_fp_value (const char *macro,
  const char *fp_cast)
 {
   REAL_VALUE_TYPE real;
-  char dec_str[64], buf1[256], buf2[256];
+  char dec_str[64], buf[256], buf1[128], buf2[64];
 
   /* This is very expensive, so if possible expand them lazily.  */
   if (lazy_hex_fp_value_count < 12
@@ -1656,11 +1656,11 @@ builtin_define_with_hex_fp_value (const char *macro,
 
   /* Assemble the macro in the following fashion
  macro = fp_cast [dec_str fp_suffix] */
-  sprintf (buf1, "%s%s", dec_str, fp_suffix);
-  sprintf (buf2, fp_cast, buf1);
-  sprintf (buf1, "%s=%s", macro, buf2);
+  sprintf (buf2, "%s%s", dec_str, fp_suffix);
+  sprintf (buf1, fp_cast, buf2);
+  sprintf (buf, "%s=%s", macro, buf1);
 
-  cpp_define (parse_in, buf1);
+  cpp_define (parse_in, buf);
 }
 
 /* Return a string constant for the suffix for a value of type TYPE
diff --git a/gcc/fortran/class.c b/gcc/fortran/class.c
index ebbd41b..a08fb8d 100644
--- a/gcc/fortran/class.c
+++ b/gcc/fortran/class.c
@@ -602,7 +602,8 @@ bool
 gfc_build_class_symbol (gfc_typespec *ts, symbol_attribute *attr,
gfc_array_spec **as)
 {
-  char name[GFC_MAX_SYMBOL_LEN+1], tname[GFC_MAX_SYMBOL_LEN+1];
+  char tname[GFC_MAX_SYMBOL_LEN+1];
+  char *name;
   gfc_symbol *fclass;
   gfc_symbol *vtab;
   gfc_component *c;
@@ -633,17 +634,17 @@ gfc_build_class_symbol (gfc_typespec *ts, 
symbol_attribute *attr,
   rank = !(*as) || (*as)->rank == -1 ? GFC_MAX_DIMENSIONS : (*as)->rank;
   get_unique_hashed_string (tname, ts->u.derived);
   if ((*as) && attr->allocatable)
-sprintf (name, "__class_%s_%d_%da", tname, rank, (*as)->corank);
+name = xasprintf ("__class_%s_%d_%da", tname, rank, (*as)->corank);
   else if ((*as) && attr->pointer)
-sprintf (name, "__class_%s_%d_%dp", tname, rank, (*as)->corank);
+name = xasprintf ("__class_%s_%d_%dp", tname, rank, (*as)->corank);
   else if ((*as))
-sprintf (name, "__class_%s_%d_%dt", tname, rank, (*as)->corank);
+name = xasprintf ("__class_%s_%d_%dt", tname, rank, (*as)->corank);
   else if (attr->pointer)
-sprintf (name, "__class_%s_p", tname);
+name = xasprintf

Re: [PATCH 1/3][middle-end]PR78809 (Inline strcmp with small constant strings)

2017-11-17 Thread Qing Zhao
thanks Jeff and Paolo. 

really appreciate  for all the help so far.

Qing
> On Nov 17, 2017, at 3:17 AM, Paolo Carlini  wrote:
> 
> Hi,
> 
> On 17/11/2017 06:29, Jeff Law wrote:
>> OK. I'll go ahead and commit for you.
> Beautiful. Thanks Jeff.
>> I think this patch is small enough to not require a copyright
>> assignment.  However further work likely will.  I don't offhand know if
>> Oracle has a blanket assignment in place.  Can you work with Paolo to
>> ensure that the proper paperwork is in place so that future
>> contributions don't get held up on legal stuff.
> Oracle is fine, has a blanket assignment, thus Qing is already covered and I 
> think this one is her third small (but useful ;) contribution so far.
> 
> While we are at it, I'm thinking that probably Qing could be added to 
> sourceware.org, if you don't disagree. Qing, if nobody objects over the next 
> day or so, please go ahead, submit this form (include my name as sponsor):
> 
> https://sourceware.org/cgi-bin/pdw/ps_form.cgi
> 
> Thanks again!
> Paolo.



Re: [PATCH 1/3][middle-end]PR78809 (Inline strcmp with small constant strings)

2017-11-16 Thread Qing Zhao

> On Nov 16, 2017, at 6:57 PM, Jeff Law <l...@redhat.com> wrote:
> 
> On 11/15/2017 08:00 AM, Qing Zhao wrote:
>> Hi,
>> 
>> this is the first patch for PR78809 (totally 3 patches)
>> 
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809
>> inline strcmp with small constant strings
>> 
>> The design doc is at:
>> https://www.mail-archive.com/gcc@gcc.gnu.org/msg83822.html
>> 
>> this patch is for the first part of change:
>> 
>> A. for strncmp (s1, s2, n)
>> if one of "s1" or "s2" is a constant string, "n" is a constant, and
>> larger than the length of the constant string:
>> change strncmp (s1, s2, n) to strcmp (s1, s2);
>> 
>> adding test case strcmpopt_1.c into gcc.dg
>> 
>> bootstraped and tested on both X86 and aarch64. no regression.
>> 
>> Okay for commit?
>> 
>> thanks.
>> 
>> Qing
>> 
>> ==
>> 
>> gcc/ChangeLog
>> 
>> 2017-11-15  Qing Zhao  <qing.z...@oracle.com>
>> 
>>   * gimple-fold.c (gimple_fold_builtin_string_compare): Add handling
>>   of replacing call to strncmp with corresponding call to strcmp when
>>   meeting conditions.
>> 
>> gcc/testsuite/ChangeLog
>> 
>> 2017-11-15  Qing Zhao  <qing.z...@oracle.com>
>> 
>>   PR middle-end/78809
>>   * gcc.dg/strcmpopt_1.c: New test.
> This is OK for the trunk.
> 
> Please put the PR middle-end/78809 marker in the main ChangeLog entry as
> well.

Okay.  (as following?)

2017-11-15  Qing Zhao <qing.z...@oracle.com>

PR middle-end/78809
* gimple-fold.c (gimple_fold_builtin_string_compare): Add handling
of replacing call to strncmp with corresponding call to strcmp when
meeting conditions.


gcc/testsuite/ChangeLog

2017-11-15  Qing Zhao  <qing.z...@oracle.com <mailto:qing.z...@oracle.com>>

  PR middle-end/78809
  * gcc.dg/strcmpopt_1.c: New test.

> 
> Do you have write access to the repository?

No, I don’t have. 
Qing


> 
> jeff



[PATCH 1/3][middle-end]PR78809 (Inline strcmp with small constant strings)

2017-11-15 Thread Qing Zhao
Hi,

this is the first patch for PR78809 (totally 3 patches)

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809
inline strcmp with small constant strings

The design doc is at:
https://www.mail-archive.com/gcc@gcc.gnu.org/msg83822.html

this patch is for the first part of change:

A. for strncmp (s1, s2, n)
 if one of "s1" or "s2" is a constant string, "n" is a constant, and
larger than the length of the constant string:
 change strncmp (s1, s2, n) to strcmp (s1, s2);

adding test case strcmpopt_1.c into gcc.dg

bootstraped and tested on both X86 and aarch64. no regression.

Okay for commit?

thanks.

Qing

==

gcc/ChangeLog

2017-11-15  Qing Zhao  <qing.z...@oracle.com>

   * gimple-fold.c (gimple_fold_builtin_string_compare): Add handling
   of replacing call to strncmp with corresponding call to strcmp when
   meeting conditions.

gcc/testsuite/ChangeLog

2017-11-15  Qing Zhao  <qing.z...@oracle.com>

   PR middle-end/78809
   * gcc.dg/strcmpopt_1.c: New test.

---
 gcc/gimple-fold.c  | 15 +++
 gcc/testsuite/gcc.dg/strcmpopt_1.c | 28 
 2 files changed, 43 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/strcmpopt_1.c

diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index adb6f3b..1ed6383 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -2258,6 +2258,21 @@ gimple_fold_builtin_string_compare (gimple_stmt_iterator 
*gsi)
   return true;
 }
 
+  /* If length is larger than the length of one constant string, 
+ replace strncmp with corresponding strcmp */ 
+  if (fcode == BUILT_IN_STRNCMP 
+  && length > 0
+  && ((p2 && (size_t) length > strlen (p2)) 
+  || (p1 && (size_t) length > strlen (p1
+{
+  tree fn = builtin_decl_implicit (BUILT_IN_STRCMP);
+  if (!fn)
+return false;
+  gimple *repl = gimple_build_call (fn, 2, str1, str2);
+  replace_call_with_call_and_fold (gsi, repl);
+  return true;
+}
+
   return false;
 }
 
diff --git a/gcc/testsuite/gcc.dg/strcmpopt_1.c 
b/gcc/testsuite/gcc.dg/strcmpopt_1.c
new file mode 100644
index 000..40596a2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/strcmpopt_1.c
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+/* { dg-options "-fdump-tree-gimple" } */
+
+#include 
+#include 
+
+int cmp1 (char *p)
+{
+  return strncmp (p, "fis", 4);
+}
+int cmp2 (char *q)
+{
+  return strncmp ("fis", q, 4);
+}
+
+int main ()
+{
+
+  char *p = "fish";
+  char *q = "fis\0";
+
+  if (cmp1 (p) == 0 || cmp2 (q) != 0)
+abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "strcmp \\(" 2 "gimple" } } */
-- 
1.9.1



Patch to fix an undefined behavior in fortran/decl.c

2017-12-01 Thread Qing Zhao
Hi,

this is a very straightforward fix for an undefined behavior in fortran/decl.c:


In the man page of sprintf, it's clearly state:

===
NOTES
  Some programs imprudently rely on code such as the following

  sprintf(buf, "%s some further text", buf);

  to append text to buf.  However, the standards explicitly note that the  
results
  are  undefined if source and destination buffers overlap when calling 
sprintf(),
  snprintf(), vsprintf(), and vsnprintf().  Depending on  the  version  of  
gcc(1)
  used,  and  the compiler options employed, calls such as the above will 
not pro‐
  duce the expected results.
===

in gcc/fortran/decl.c, there is exactly such case as following:

3361   sprintf (name, "%s_%d", name, kind_value);

fixed it in this patch.

bootstraped and tested on both X86 and Aarch64. no regression.

Okay for trunk?

thanks.

Qing


gcc/fortran/ChangeLog

2017-11-30  Qing Zhao  <qing.z...@oracle.com <mailto:qing.z...@oracle.com>>

   * fortran/decl.c (gfc_get_pdt_instance): Adjust the call to
   sprintf to avoid the same buffer being both source and
   destination.

---
gcc/fortran/decl.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c
index e57cfde..02dda24 100644
--- a/gcc/fortran/decl.c
+++ b/gcc/fortran/decl.c
@@ -3358,7 +3358,7 @@ gfc_get_pdt_instance (gfc_actual_arglist *param_list, 
gfc_symbol **sym,
   }

  gfc_extract_int (kind_expr, _value);
-  sprintf (name, "%s_%d", name, kind_value);
+  sprintf (name + strlen (name), "_%d", kind_value);

  if (!name_seen && actual_param)
   actual_param = actual_param->next;
-- 
1.9.1

[PATCH][Middle-end]79538 missing -Wformat-overflow with %s and non-member array arguments

2017-12-04 Thread Qing Zhao
Hi,

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79538 
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79538>
missing -Wformat-overflow with %s and non-member array arguments

-Wformat-overflow uses the routine "get_range_strlen" to decide the
maximum string length, however, currently "get_range_strlen" misses
the handling of non-member arrays.

Adding the handling of non-member array resolves the issue.
Adding test case pr79538.c into gcc.dg.

During gcc bootstrap, 2 source files (c-family/c-cppbuiltin.c,
fortran/class.c) were detected new warnings by -Wformat-overflow
due to the new handling of non-member array in "get_range_strlen".
in order to avoid these new warnings and continue with bootstrap,
updating these 2 files to avoid the warnings.

in c-family/c-cppbuiltin.c, the warning is following:

../../latest_gcc_2/gcc/c-family/c-cppbuiltin.c:1627:15: note:
‘sprintf’ output 2 or more bytes (assuming 257) into a destination
of size 256
  sprintf (buf1, "%s=%s", macro, buf2);
  ^~~~
in the above, buf1 and buf2 are declared as:
char buf1[256], buf2[256];

i.e, buf1 and buf2 have same size. adjusting the size of buf1 and
buf2 resolves the warning.

fortran/class.c has the similar issue as above. Instead of adjusting
size of the buffers, replacing sprintf with xasprintf.

bootstraped and tested on both X86 and aarch64. no regression.

Okay for trunk?

thanks.

Qing

===

gcc/ChangeLog

2017-11-30  Qing Zhao  <qing.z...@oracle.com <mailto:qing.z...@oracle.com>>

  PR middle_end/79538
  * gimple-fold.c (get_range_strlen): Add the handling of non-member array.

gcc/fortran/ChangeLog

2017-11-30  Qing Zhao  <qing.z...@oracle.com <mailto:qing.z...@oracle.com>>

   PR middle_end/79538
   * class.c (gfc_build_class_symbol): Replace call to
   sprintf with xasprintf to avoid format-overflow warning.
   (generate_finalization_wrapper): Likewise.
   (gfc_find_derived_vtab): Likewise.
   (find_intrinsic_vtab): Likewise.


gcc/c-family/ChangeLog

2017-11-30  Qing Zhao  <qing.z...@oracle.com <mailto:qing.z...@oracle.com>>

  PR middle_end/79538 
* c-cppbuiltin.c (builtin_define_with_hex_fp_value):
  Adjust the size of buf1 and buf2, add a new buf to avoid
  format-overflow warning.

gcc/testsuite/ChangeLog

2017-11-30  Qing Zhao  <qing.z...@oracle.com <mailto:qing.z...@oracle.com>>

  PR middle_end/79538
  * gcc.dg/pr79538.c: New test.

---
gcc/c-family/c-cppbuiltin.c| 10 -
gcc/fortran/class.c| 49 --
gcc/gimple-fold.c  | 13 +++
gcc/testsuite/gcc.dg/pr79538.c | 23 
4 files changed, 69 insertions(+), 26 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/pr79538.c

diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index 2ac9616..9e33aed 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -1613,7 +1613,7 @@ builtin_define_with_hex_fp_value (const char *macro,
  const char *fp_cast)
{
  REAL_VALUE_TYPE real;
-  char dec_str[64], buf1[256], buf2[256];
+  char dec_str[64], buf[256], buf1[128], buf2[64];

  /* This is very expensive, so if possible expand them lazily.  */
  if (lazy_hex_fp_value_count < 12
@@ -1656,11 +1656,11 @@ builtin_define_with_hex_fp_value (const char *macro,

  /* Assemble the macro in the following fashion
 macro = fp_cast [dec_str fp_suffix] */
-  sprintf (buf1, "%s%s", dec_str, fp_suffix);
-  sprintf (buf2, fp_cast, buf1);
-  sprintf (buf1, "%s=%s", macro, buf2);
+  sprintf (buf2, "%s%s", dec_str, fp_suffix);
+  sprintf (buf1, fp_cast, buf2);
+  sprintf (buf, "%s=%s", macro, buf1);

-  cpp_define (parse_in, buf1);
+  cpp_define (parse_in, buf);
}

/* Return a string constant for the suffix for a value of type TYPE
diff --git a/gcc/fortran/class.c b/gcc/fortran/class.c
index ebbd41b..a08fb8d 100644
--- a/gcc/fortran/class.c
+++ b/gcc/fortran/class.c
@@ -602,7 +602,8 @@ bool
gfc_build_class_symbol (gfc_typespec *ts, symbol_attribute *attr,
gfc_array_spec **as)
{
-  char name[GFC_MAX_SYMBOL_LEN+1], tname[GFC_MAX_SYMBOL_LEN+1];
+  char tname[GFC_MAX_SYMBOL_LEN+1];
+  char *name;
  gfc_symbol *fclass;
  gfc_symbol *vtab;
  gfc_component *c;
@@ -633,17 +634,17 @@ gfc_build_class_symbol (gfc_typespec *ts, 
symbol_attribute *attr,
  rank = !(*as) || (*as)->rank == -1 ? GFC_MAX_DIMENSIONS : (*as)->rank;
  get_unique_hashed_string (tname, ts->u.derived);
  if ((*as) && attr->allocatable)
-sprintf (name, "__class_%s_%d_%da", tname, rank, (*as)->corank);
+name = xasprintf ("__class_%s_%d_%da", tname, rank, (*as)->corank);
  else if ((*as) && att

Re: [PATCH][Middle-end]79538 missing -Wformat-overflow with %s and non-member array arguments

2017-12-12 Thread Qing Zhao
Hi, Richard,

thanks a lot for your review.

> 
>>{
>>  /* __copy is always the same for characters.
>> Check to see if copy function already exists.  */
>> - sprintf (name, "__copy_character_%d", ts->kind);
>> + name = xasprintf ("__copy_character_%d", ts->kind);
>>  contained = ns->contained;
>>  for (; contained; contained = contained->sibling)
>>if (contained->proc_name
>> @@ -2796,6 +2802,7 @@ find_intrinsic_vtab (gfc_typespec *ts)
>>  vtab->ts.u.derived = vtype;
>>  vtab->value = gfc_default_initializer (>ts);
>>}
>> +  free (name);
> 
> It looks like this might be in a performance critical lookup path
> where we'd really
> like to avoid allocating/freeing memory.  Why's GFC_MAX_SYMBOL_LEN+1 not
> enough?  Leaving for fortran folks to comment - note you should CC
> fort...@gcc.gnu.org 
> for fortran patches (done).

I have sent this patch to fort...@gcc.gnu.org  per 
Thomas’s suggestion last week, and got approval by fortran team on last Friday:

https://gcc.gnu.org/ml/fortran/2017-12/msg00027.html

> 
>>}
>> 
>>  found_sym = vtab;
>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>> index 353a46e..fef1969 100644
>> --- a/gcc/gimple-fold.c
>> +++ b/gcc/gimple-fold.c
>> @@ -1323,6 +1323,19 @@ get_range_strlen (tree arg, tree length[2], bitmap 
>> *visited, int type,
>> the array could have zero length.  */
>>  *minlen = ssize_int (0);
>>}
>> +
>> +  if (VAR_P (arg)
>> +  && TREE_CODE (TREE_TYPE (arg)) == ARRAY_TYPE)
>> +{
>> +  val = TYPE_SIZE_UNIT (TREE_TYPE (arg));
>> +  if (!val || integer_zerop (val))
> 
> val might be non-constant so you also need to check TREE_CODE (val) !=
> INTEGER_CST here.

Okay.
> 
>> +return false;
>> +  val = fold_build2 (MINUS_EXPR, TREE_TYPE (val), val,
>> + integer_one_node);
> 
>   val = wide_int_to_tree (TREE_TYPE (val), wi::minus (wi::to_wide (val), 1));
> 
> you pass a possibly bogus type of 1 to fold_build2 above so the wide-int 
> variant
> is prefered.
> 
> The gimple-fold.c changes are ok with that change.

Per your comments, the updated gimple-fold.c is like the following:

diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 353a46e..0500fba 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -1323,6 +1323,19 @@ get_range_strlen (tree arg, tree length[2], bitmap 
*visited, int type,
 the array could have zero length.  */
  *minlen = ssize_int (0);
}
+
+  if (VAR_P (arg) 
+  && TREE_CODE (TREE_TYPE (arg)) == ARRAY_TYPE)
+{
+  val = TYPE_SIZE_UNIT (TREE_TYPE (arg));
+  if (!val || TREE_CODE (val) != INTEGER_CST || integer_zerop 
(val))
+return false;
+  val = fold_build2 (MINUS_EXPR, TREE_TYPE (val), val,
+build_int_cst (TREE_TYPE (val), 1));
+  /* Set the minimum size to zero since the string in
+ the array could have zero length.  */
+  *minlen = ssize_int (0);
+}
}
 
   if (!val)

let me know any further issue with the above.

thanks a lot.

Qing





Ping: [PATCH][Middle-end][version 3]2nd patch of PR78809 and PR83026

2018-05-22 Thread Qing Zhao
Ping for the following patch sent in 3 months ago in the end of GCC8:

https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg184075.html

I have rebased the patch on the latest GCC9 thunk. 

bootstraped and tested on both X86 and Aarch64. no regression.

the following are more details:



Hi, this is the 3rd version for this patch.

the main change compared with 2nd version are:
1. do not use “compute_objsize” to get the minimum object size per Jeff 
and Richard’s
comment. Instead, add a new function “determine_min_objsize” for this purpose. 
This new
function calls “compute_builtin_object_size” to get the minimum objsize, 
however, due to 
the fact that “compute_builtin_object_size” does nothing for SSA_NAME when 
optimize > 0 (don’t
know the exactly reason for this), inside “determine_min_objsize”, I have to 
add  more code
to catch up some simple SSA_NAME cases.

2. in gimple-fold.c and tree-ssa-structalias.c, add the handling of the 
new 
BUILT_IN_STRCMP_EQ and BUILT_IN_STRNCMP_EQ in the same places where 
BUILT_IN_STRCMP and BUILT_IN_STRNCMP is checked.

3. some  format change and comments change per Jeff’s comment. 

let me know if you have any comments.

thanks a lot.

Qing

*

2nd Patch for PR78009 
Patch for PR83026

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809
Inline strcmp with small constant strings

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83026
missing strlen optimization for strcmp of unequal strings

The design doc for PR78809 is at:
https://www.mail-archive.com/gcc@gcc.gnu.org/msg83822.html

this patch is for the second part of change of PR78809 and PR83026:

B. for strncmp (s1, s2, n) (!)= 0 or strcmp (s1, s2) (!)= 0

B.1. (PR83026) When the lengths of both arguments are constant and
 it's a strcmp:
   * if the lengths are NOT equal, we can safely fold the call
 to a non-zero value.
   * otherwise, do nothing now.

B.2. (PR78809) When the length of one argument is constant, try to replace
the call with a __builtin_str(n)cmp_eq call where possible, i.e:

strncmp (s, STR, C) (!)= 0 in which, s is a pointer to a string, STR is a
string with constant length, C is a constant.
  if (C <= strlen(STR) && sizeof_array(s) > C)
{
  replace this call with
  __builtin_strncmp_eq (s, STR, C) (!)= 0
}
  if (C > strlen(STR)
{
  it can be safely treated as a call to strcmp (s, STR) (!)= 0
  can handled by the following strcmp.
}

strcmp (s, STR) (!)= 0 in which, s is a pointer to a string, STR is a
string with constant length.
  if  (sizeof_array(s) > strlen(STR))
{
  replace this call with
  __builtin_strcmp_eq (s, STR, strlen(STR)+1) (!)= 0
}

later when expanding the new __builtin_str(n)cmp_eq calls, first expand them
as __builtin_memcmp_eq, if the expansion does not succeed, change them back
to call to __builtin_str(n)cmp.

adding test case strcmpopt_2.c and strcmpopt_4.c into gcc.dg for part B of
PR78809 adding test case strcmpopt_3.c into gcc.dg for PR83026

bootstraped and tested on both X86 and Aarch64. no regression.


gcc/ChangeLog

+2018-05-21  
+   
+   PR middle-end/78809
+   PR middle-end/83026
+   * builtins.c (expand_builtin): Add the handling of BUILT_IN_STRCMP_EQ
+   and BUILT_IN_STRNCMP_EQ.
+   * builtins.def: Add new builtins BUILT_IN_STRCMP_EQ and
+   BUILT_IN_STRNCMP_EQ.
+   * gimple-fold.c (gimple_fold_builtin_string_compare): Add the 
+   handling of BUILTIN_IN_STRCMP_EQ and BUILT_IN_STRNCMP_EQ.
+   (gimple_fold_builtin): Likewise.
+   * tree-ssa-strlen.c (compute_string_length): New function.
+   (determine_min_obsize): New function.
+   (handle_builtin_string_cmp): New function to handle calls to
+   string compare functions.
+   (strlen_optimize_stmt): Add handling to builtin string compare
+   calls. 
+   * tree-ssa-structalias.c (find_func_aliases_for_builtin_call):
+   Add the handling of BUILT_IN_STRCMP_EQ and BUILT_IN_STRNCMP_EQ.
+   * tree.c (build_common_builtin_nodes): Add new defines of
+   BUILT_IN_STRNCMP_EQ and BUILT_IN_STRCMP_EQ.
+

gcc/testsuite/ChangeLog

+2018-05-21  
+
+   PR middle-end/78809
+   * gcc.dg/strcmpopt_2.c: New testcase.
+   * gcc.dg/strcmpopt_3.c: New testcase.
+
+   PR middle-end/83026
+   * gcc.dg/strcmpopt_3.c: New testcase.



PR78809_B.patch
Description: Binary data


[PATCH][Middle-end]3rd patch of PR78809

2018-06-18 Thread Qing Zhao
Hi,

this is the 3rd (and the last) patch for PR78809:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809
Inline strcmp with small constant strings

The design doc for PR78809 is at:
https://www.mail-archive.com/gcc@gcc.gnu.org/msg83822.html

this patch is for the third part of change of PR78809.

C. for strcmp (s1, s2), strncmp (s1, s2, n), and memcmp (s1, s2, n)
   if the result is NOT used to do simple equality test against zero, one of
"s1" or "s2" is a small constant string, n is a constant, and the Min value of
the length of the constant string and "n" is smaller than a predefined
threshold T,
   inline the call by a byte-to-byte comparision sequence to avoid calling
overhead.

adding test case strcmpopt_5.c into gcc.dg for part C of PR78809.

bootstraped and tested on both X86 and Aarch64. no regression.

I have done some experiments to check the run-time performance impact 
and code-size impact from such inlining with different threshold for the
length of the constant string to inline, the data were recorded in the 
attachments of 
PR78809:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809.

and finally decide that the default value of the threshold is 3. 
this value of the threshold can be adjusted by the new option:

--param builtin-string-cmp-inline-length=

The following is the patch.

thanks.

Qing

gcc/ChangeLog:

+2018-06-18  Qing Zhao  
+
+   PR middle-end/78809
+   * builtins.c (expand_builtin_memcmp): Inline the calls first
+   when result_eq is false.
+   (expand_builtin_strcmp): Inline the calls first.
+   (expand_builtin_strncmp): Likewise.
+   (inline_string_cmp): New routine. Expand a string compare 
+   call by using a sequence of char comparison.
+   (inline_expand_builtin_string_cmp): New routine. Inline expansion
+   a call to str(n)cmp/memcmp.
+   * doc/invoke.texi (--param builtin-string-cmp-inline-length): New 
option.
+   * params.def (BUILTIN_STRING_CMP_INLINE_LENGTH): New.
+

gcc/testsuite/ChangeLog:

+2018-06-18  Qing Zhao  
+
+   PR middle-end/78809
+   * gcc.dg/strcmpopt_5.c: New test.
+




0001-3nd-Patch-for-PR78009.patch
Description: Binary data


Re: [PATCH][Middle-end][version 3]2nd patch of PR78809 and PR83026

2018-05-29 Thread Qing Zhao
Hi, Jeff,

Thanks a lot for your review and comments.

I have updated my patch based on your suggestion, and retested this whole patch 
on both X86 and aarch64.

please take a look at the patch again.

thanks.

Qing

> On May 25, 2018, at 3:38 PM, Jeff Law  wrote:

> So I originally thought you had the core logic wrong in the immediate
> uses loop.  But it's actually the case that the return value is the
> exact opposite of what I expected.
> 
> ie, I expected "TRUE" to mean the call was transformed, "FALSE" if it
> was not transformed.
> 
> Can you fix that so it's not so confusing?
> 
> I think with that change we'll be good to go, but please repost for a
> final looksie.
> 
> THanks,
> Jeff



0001-2nd-Patch-for-PR78009.patch
Description: Binary data


committed: [PATCH][Middle-end][version 3]2nd patch of PR78809 and PR83026

2018-05-31 Thread Qing Zhao
Hi, 

I have committed the patch as revision 261039.

thanks.

Qing

> On May 29, 2018, at 7:08 PM, Qing Zhao  wrote:
> 
> Hi, Jeff,
> 
> Thanks a lot for your review and comments.
> 
> I have updated my patch based on your suggestion, and retested this whole 
> patch on both X86 and aarch64.
> 
> please take a look at the patch again.
> 
> thanks.
> 
> Qing
> 
>> On May 25, 2018, at 3:38 PM, Jeff Law  wrote:
> 
>> So I originally thought you had the core logic wrong in the immediate
>> uses loop.  But it's actually the case that the return value is the
>> exact opposite of what I expected.
>> 
>> ie, I expected "TRUE" to mean the call was transformed, "FALSE" if it
>> was not transformed.
>> 
>> Can you fix that so it's not so confusing?
>> 
>> I think with that change we'll be good to go, but please repost for a
>> final looksie.
>> 
>> THanks,
>> Jeff


Re: [PATCH][Middle-end][version 3]2nd patch of PR78809 and PR83026

2018-06-25 Thread Qing Zhao


> On Jun 22, 2018, at 11:49 PM, Jeff Law  wrote:
> 
> On 05/29/2018 06:08 PM, Qing Zhao wrote:
>> Hi, Jeff,
>> 
>> Thanks a lot for your review and comments.
>> 
>> I have updated my patch based on your suggestion, and retested this whole 
>> patch on both X86 and aarch64.
>> 
>> please take a look at the patch again.
>> 
>> thanks.
>> 
>> Qing
>> 
>>> On May 25, 2018, at 3:38 PM, Jeff Law  wrote:
>>> So I originally thought you had the core logic wrong in the immediate
>>> uses loop.  But it's actually the case that the return value is the
>>> exact opposite of what I expected.
>>> 
>>> ie, I expected "TRUE" to mean the call was transformed, "FALSE" if it
>>> was not transformed.
>>> 
>>> Can you fix that so it's not so confusing?
>>> 
>>> I think with that change we'll be good to go, but please repost for a
>>> final looksie.
>>> 
>>> THanks,
>>> Jeff
>> 
>> 
>> 0001-2nd-Patch-for-PR78009.patch
>> 
>> 
>> From 750f44ee0777d55b568f07e263babdedd532d315 Mon Sep 17 00:00:00 2001
>> From: qing zhao mailto:qing.z...@oracle.com>>
>> Date: Tue, 29 May 2018 16:15:21 -0400
>> Subject: [PATCH] 2nd Patch for PR78009 Patch for PR83026
>> 
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809 
>> <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809>
>> Inline strcmp with small constant strings
>> 
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83026 
>> <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83026>
>> missing strlen optimization for strcmp of unequal strings
>> 
>> The design doc for PR78809 is at:
>> https://www.mail-archive.com/gcc@gcc.gnu.org/msg83822.html 
>> <https://www.mail-archive.com/gcc@gcc.gnu.org/msg83822.html>
>> 
>> this patch is for the second part of change of PR78809 and PR83026:
>> 
>> B. for strncmp (s1, s2, n) (!)= 0 or strcmp (s1, s2) (!)= 0
>> 
>>   B.1. (PR83026) When the lengths of both arguments are constant and
>>it's a strcmp:
>>  * if the lengths are NOT equal, we can safely fold the call
>>to a non-zero value.
>>  * otherwise, do nothing now.
>> 
>>   B.2. (PR78809) When the length of one argument is constant, try to replace
>>   the call with a __builtin_str(n)cmp_eq call where possible, i.e:
>> 
>>   strncmp (s, STR, C) (!)= 0 in which, s is a pointer to a string, STR is a
>>   string with constant length, C is a constant.
>> if (C <= strlen(STR) && sizeof_array(s) > C)
>>   {
>> replace this call with
>> __builtin_strncmp_eq (s, STR, C) (!)= 0
>>   }
>> if (C > strlen(STR)
>>   {
>> it can be safely treated as a call to strcmp (s, STR) (!)= 0
>> can handled by the following strcmp.
>>   }
>> 
>>   strcmp (s, STR) (!)= 0 in which, s is a pointer to a string, STR is a
>>   string with constant length.
>> if  (sizeof_array(s) > strlen(STR))
>>   {
>> replace this call with
>> __builtin_strcmp_eq (s, STR, strlen(STR)+1) (!)= 0
>>   }
>> 
>>   later when expanding the new __builtin_str(n)cmp_eq calls, first expand 
>> them
>>   as __builtin_memcmp_eq, if the expansion does not succeed, change them back
>>   to call to __builtin_str(n)cmp.
>> 
>> adding test case strcmpopt_2.c and strcmpopt_4.c into gcc.dg for part B of
>> PR78809 adding test case strcmpopt_3.c into gcc.dg for PR83026
>> 
>> bootstraped and tested on both X86 and Aarch64. no regression.
>> ---
>> gcc/builtins.c |  33 
>> gcc/builtins.def   |   5 +
>> gcc/gimple-fold.c  |   5 +
>> gcc/testsuite/gcc.dg/strcmpopt_2.c |  67 
>> gcc/testsuite/gcc.dg/strcmpopt_3.c |  31 
>> gcc/testsuite/gcc.dg/strcmpopt_4.c |  16 ++
>> gcc/tree-ssa-strlen.c  | 304 
>> ++---
>> gcc/tree-ssa-structalias.c |   2 +
>> gcc/tree.c |   8 +
>> 9 files changed, 454 insertions(+), 17 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.dg/strcmpopt_2.c
>> create mode 100644 gcc/testsuite/gcc.dg/strcmpopt_3.c
>> create mode 100644 gcc/testsuite/gcc.dg/strcmpopt_4.c
> Sorry for the long delay.  This needs a ChangeLog.  With the ChangeLog
> it is OK for the trunk.

Hi, Jeff,

the patch had been committed with ChangeLogs as following:

https://gcc.gnu.org/viewcvs/gcc?view=revision=261039 
<https://gcc.gnu.org/viewcvs/gcc?view=revision=261039>

thanks.

Qing

> 
> Jeff



Re: [PATCH][Middle-end]3rd patch of PR78809

2018-07-02 Thread Qing Zhao
Hi, Jeff,

thanks a lot for your review and comments.

I have addressed your comments,updated the patch, retested on both
aarch64 and x86.

The major changes in this version compared to the previous version are:

1. in routine expand_builtin_memcmp:
* move the inlining transformation AFTER the warning is issues for
-Wstringop-overflow;
* only apply inlining when there is No warning is issued.
2. in the testsuite, add a new testcase strcmpopt_6.c for this case.
3. update comments to:
* capitalize the first word.
* capitalize all the arguments.

NOTE, the routine expand_builtin_strcmp and expand_builtin_strncmp are not 
changed.
the reason is:  there is NO overflow checking for these two routines currently.
if we need overflow checking for these two routines, I think that a separate 
patch is needed.
if this is needed, let me know, I can work on this separate patch for issuing 
warning for strcmp/strncmp when
-Wstringop-overflow is specified.

The new patch is as following, please take a look at it.

thanks.

Qing

gcc/ChangeLog

+2018-07-02  Qing Zhao  
+
+   PR middle-end/78809
+   * builtins.c (expand_builtin_memcmp): Inline the calls first
+   when result_eq is false.
+   (expand_builtin_strcmp): Inline the calls first.
+   (expand_builtin_strncmp): Likewise.
+   (inline_string_cmp): New routine. Expand a string compare
+   call by using a sequence of char comparison.
+   (inline_expand_builtin_string_cmp): New routine. Inline expansion
+   a call to str(n)cmp/memcmp.
+   * doc/invoke.texi (--param builtin-string-cmp-inline-length): New 
option.
+   * params.def (BUILTIN_STRING_CMP_INLINE_LENGTH): New.
+

gcc/testsuite/ChangeLog

+2018-07-02  Qing Zhao  
+
+   PR middle-end/78809
+   * gcc.dg/strcmpopt_5.c: New test.
+   * gcc.dg/strcmpopt_6.c: New test.
+



0001-3nd-Patch-for-PR78009.patch
Description: Binary data


> But leave them as lower case in code fragments like this.
> 
> 
> So this generally looks pretty good.  THe biggest technical concern is
> making sure we're doing the right thing WRT issuing warnings.  You can
> tackle that problem by deferring inlining to a later point after
> warnings have been issued or by verifying that your routines do not
> inline in cases where warnings will be issued.  It may be worth adding
> testcases for these issues.
> 
> There's a large number of comments that need capitalization fixes.
> 
> Given there was no measured runtime performance impact, but slight
> improvements on codesize for values <= 3, let's go ahead with that as
> the default.
> 
> Can you address the issues above and repost for final review?
> 
> Thanks,
> jeff



[version 2] Re: [PATCH][Middle-end]3rd patch of PR78809

2018-07-02 Thread Qing Zhao
Hi, Jeff,

thanks a lot for your review and comments.

I have addressed your comments,updated the patch, retested on both
aarch64 and x86.

The major changes in this version compared to the previous version are:

1. in routine “expand_builtin_memcmp”:
* move the inlining transformation AFTER the warning is issues for
-Wstringop-overflow;
* only apply inlining when there is No warning is issued.
2. in the testsuite, add a new testcase strcmpopt_6.c for this case.
3. update comments to:
* capitalize the first word.
* capitalize all the arguments.

NOTE, the routine ”expand_builtin_strcmp” and “expand_builtin_strncmp" are not 
changed.
the reason is:  there is NO overflow checking for these two routines currently.
if we need overflow checking for these two routines, I think that a separate 
patch is needed.
if this is needed, let me know, I can work on this separate patch for issuing 
warning for strcmp/strncmp when
-Wstringop-overflow is specified.

The new patch is as following, please take a look at it.

thanks.

Qing

gcc/ChangeLog

+2018-07-02  Qing Zhao  
+
+   PR middle-end/78809
+   * builtins.c (expand_builtin_memcmp): Inline the calls first
+   when result_eq is false.
+   (expand_builtin_strcmp): Inline the calls first.
+   (expand_builtin_strncmp): Likewise.
+   (inline_string_cmp): New routine. Expand a string compare
+   call by using a sequence of char comparison.
+   (inline_expand_builtin_string_cmp): New routine. Inline expansion
+   a call to str(n)cmp/memcmp.
+   * doc/invoke.texi (--param builtin-string-cmp-inline-length): New 
option.
+   * params.def (BUILTIN_STRING_CMP_INLINE_LENGTH): New.
+

gcc/testsuite/ChangeLog

+2018-07-02  Qing Zhao  
+
+   PR middle-end/78809
+   * gcc.dg/strcmpopt_5.c: New test.
+   * gcc.dg/strcmpopt_6.c: New test.
+



0001-3nd-Patch-for-PR78009.patch
Description: Binary data


> On Jun 28, 2018, at 12:10 AM, Jeff Law  wrote:
> So I still need to dig into this patch.  But I wanted to raise an
> potential issue and get yours and Martin's thoughts on it.
> 
> Martin (and others) have been working hard to improve GCC's ability to
> give good diagnostics for various problems with calls to mem* and str*
> functions (buffer overflows, restrict issues, etc).
> 
> One of the problems Martin has identified is early conversion of these
> calls into inlined direct operations.  If those conversions happen prior
> to the analysis for warnings we obviously can't issue any relevant warnings.
> 
> Please capitalize the first word in sentences like this.  This nit
> appears in most of your comments.
> 
> 
> So I believe you do inline expansion here prior to the checks for
> Wstringop_overflow.  I think you can safely move this code to after the
> warn_stringop_overflow checks.  Though you may want to make this code
> conditional on both calls to check_access returning true and avoiding
> your transformation if either or both calls return false.
> 
> Alternately you'd need to verify that inline_expand_builtin_string_cmp
> always returns false for cases which are going to generate a warning.
> But that seems a bit tougher to maintain over time if we were to add
> more warnings to this code.
> 
> When referring to arguments in comments, please capitalize them.  ie
> VAR_STR, CONST_STR, etc.
> 
> 
> But leave them as lower case in code fragments like this.
> 
> 
> So this generally looks pretty good.  THe biggest technical concern is
> making sure we're doing the right thing WRT issuing warnings.  You can
> tackle that problem by deferring inlining to a later point after
> warnings have been issued or by verifying that your routines do not
> inline in cases where warnings will be issued.  It may be worth adding
> testcases for these issues.
> 
> There's a large number of comments that need capitalization fixes.
> 
> Given there was no measured runtime performance impact, but slight
> improvements on codesize for values <= 3, let's go ahead with that as
> the default.
> 
> Can you address the issues above and repost for final review?
> 
> Thanks,
> jeff



Test (please ignore)

2018-07-02 Thread Qing Zhao




Ping*2 [PATCH][Middle-end][version 2]2nd patch of PR78809 and PR83026

2018-01-09 Thread Qing Zhao
Hi,

I’d like to ping the following patch the 2nd time:

https://gcc.gnu.org/ml/gcc-patches/2017-12/msg01454.html 

I already addressed all the issues raised for the first version of the patch, 
and retested on both X86 and Aarch64, without any issue.

I’d really like to make it into GCC 8. 

thanks a lot for the help.

Qing

For your reference, the first version of this patch is at:

https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00962.html 



Re: [PATCH][Middle-end][version 2]2nd patch of PR78809 and PR83026

2018-01-25 Thread Qing Zhao
Hi, Jeff,

Really sorry for my  delay. 

Your email on 1/12/2018 and Richard’s email on 1/15/2018,   were completely 
lost in my mailboxes until yesterday my colleague, Paolo Carlini, forwarded it 
to me. 

I really apologize for the late reply.

Please see my reply below:

> On Jan 12, 2018, at 4:47 PM, Jeff Law  > wrote:
>> 
>> 
>>  1. replace the candidate calls with __builtin_str(n)cmp_eq instead of 
>> __builtin_memcmp_eq;
>>in builtins.c,  when expanding the new __builtin_str(n)cmp_eq 
>> call, expand them first as
>>__builtin_memcmp_eq, if Not succeed,  change the call back to 
>> __builtin_str(n)cmp.
>>  2. change the call to “get_range_strlen” with “compute_objsize”.
> Please read the big comment before compute_objsize.  If you are going to
> use it to influence code generation or optimization, then you're most
> likely doing something wrong.
> 
> compute_objsize can return an estimate in some circumstances.

> On Jan 15, 2018, at 2:48 AM, Richard Biener  > wrote:
> 
> Yes, compute_objsize cannot be used for optimizations.
Yes, I just checked the compute_objsize again, you are right, it might return 
an estimated code size.


> On Jan 12, 2018, at 4:47 PM, Jeff Law  > wrote:

> What I don't like here is the introduction of STRCMP_EQ and STRNCMP_EQ.
> ISTM that if you're going to introduce those new builtins, then you have
> to audit all the code that runs between their introduction into the IL
> and when you expand them to ensure they're handled properly.
> 
> All you're really doing is carrying along a status bit about what
> tree-ssa-strlen did.  So you could possibly store that status bit somewhere.
> 
> The concern with both is that something later invalidates the analysis
> you've done.  I'm having a hard time coming up with a case where this
> could happen, but I'm definitely concerned about this possibility.
> Though I feel it's more likely to happen if we store a status bit vs
> using STRCMP_EQ STRNCMP_EQ.
> 
> [ For example, we have two calls with the same arguments, but one feeds
> an equality test, the other does not.  If we store a status bit that one
> could be transformed, but then we end up CSE-ing the two calls, the
> status bit would be incorrect because one of the calls did not feed an
> equality test.  Hmmm. ]

See below.

>> +static HOST_WIDE_INT 
>> +compute_string_length (int idx)
>> +{
>> +  HOST_WIDE_INT string_leni = -1; 
>> +  gcc_assert (idx != 0);
>> +
>> +  if (idx < 0)
>> +string_leni = ~idx;
> So it seems to me you should just
>  return ~idx;
> 
> Then appropriately simplify the rest of the code.

Okay, I will update my code per your suggestion. 
> 
>> +
>> +/* Handle a call to strcmp or strncmp. When the result is ONLY used to do 
>> +   equality test against zero:
>> +
>> +   A. When both arguments are constant strings and it's a strcmp:
>> +  * if the length of the strings are NOT equal, we can safely fold the 
>> call
>> +to a non-zero value.
>> +  * otherwise, do nothing now.
> I'm guessing your comment needs a bit of work.  If both arguments are
> constant strings, then we can just use the host str[n]cmp to fold the
> str[n]cmp to a constant.  Presumably this is handled earlier :-)
> 
> So what I'm guessing is you're really referring to the case where the
> lengths are known constants, even if the contents of the strings
> themselves are not.  In that case if its an equality comparison, then
> you can fold to a constant.  Otherwise we do nothing.  So I think the
> comment needs updating here.

your understanding is correct, I will update the comments to make it more 
accurate.

>> +  
>> +   B. When one of the arguments is constant string, try to replace the call 
>> with
>> +   a __builtin_str(n)cmp_eq call where possible, i.e:
>> +
>> +   strncmp (s, STR, C) (!)= 0 in which, s is a pointer to a string, STR is 
>> a 
>> +   constant string, C is a constant.
>> + if (C <= strlen(STR) && sizeof_array(s) > C)
>> +   {
>> + replace this call with
>> + strncmp_eq (s, STR, C) (!)= 0
>> +   }
>> + if (C > strlen(STR)
>> +   {
>> + it can be safely treated as a call to strcmp (s, STR) (!)= 0
>> + can handled by the following strcmp.
>> +   }
>> +
>> +   strcmp (s, STR) (!)= 0 in which, s is a pointer to a string, STR is a 
>> +   constant string.
>> + if  (sizeof_array(s) > strlen(STR))
>> +   {
>> + replace this call with
>> + strcmp_eq (s, STR, strlen(STR)+1) (!)= 0
> So I'd hazard a guess that this comment has the same problem.  It's
> mixing up the concept of a constant string and the concept of a
> nonconstant string, but which has a known constant length.
> 
> First, if one of the arguments is a string constant, then it should be
> easy to get the constant at expansion time with minimal effort.  It's

Re: [PATCH][Middle-end][version 2]2nd patch of PR78809 and PR83026

2018-01-25 Thread Qing Zhao

> 
> We're now in stage4 so please queue this patch and ping it during
> next stage1.
> 

I will update my patch based on Jeff and your comments, and send it during next 
stage 1.

thanks.

Qing



[PATCH][Middle-end][version 3]2nd patch of PR78809 and PR83026

2018-02-07 Thread Qing Zhao
Hi, this is the 3rd version for this patch.

the main change compared with 2nd version are:
1. do not use “compute_objsize” to get the minimum object size per Jeff 
and Richard’s
comment. Instead, add a new function “determine_min_objsize” for this purpose. 
This new
function calls “compute_builtin_object_size” to get the minimum objsize, 
however, due to 
the fact that “compute_builtin_object_size” does nothing for SSA_NAME when 
optimize > 0 (don’t
know the exactly reason for this), inside “determine_min_objsize”, I have to 
add  more code
to catch up some simple SSA_NAME cases.

2. in gimple-fold.c and tree-ssa-structalias.c, add the handling of the 
new 
BUILT_IN_STRCMP_EQ and BUILT_IN_STRNCMP_EQ in the same places where 
BUILT_IN_STRCMP and BUILT_IN_STRNCMP is checked.

3. some  format change and comments change per Jeff’s comment. 

let me know if you have any comments.

thanks a lot.

Qing

*

2nd Patch for PR78009 
Patch for PR83026

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809
Inline strcmp with small constant strings

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83026
missing strlen optimization for strcmp of unequal strings

The design doc for PR78809 is at:
https://www.mail-archive.com/gcc@gcc.gnu.org/msg83822.html

this patch is for the second part of change of PR78809 and PR83026:

B. for strncmp (s1, s2, n) (!)= 0 or strcmp (s1, s2) (!)= 0

 B.1. (PR83026) When the lengths of both arguments are constant and
  it's a strcmp:
* if the lengths are NOT equal, we can safely fold the call
  to a non-zero value.
* otherwise, do nothing now.

 B.2. (PR78809) When the length of one argument is constant, try to replace
 the call with a __builtin_str(n)cmp_eq call where possible, i.e:

 strncmp (s, STR, C) (!)= 0 in which, s is a pointer to a string, STR is a
 string with constant length, C is a constant.
   if (C <= strlen(STR) && sizeof_array(s) > C)
 {
   replace this call with
   __builtin_strncmp_eq (s, STR, C) (!)= 0
 }
   if (C > strlen(STR)
 {
   it can be safely treated as a call to strcmp (s, STR) (!)= 0
   can handled by the following strcmp.
 }

 strcmp (s, STR) (!)= 0 in which, s is a pointer to a string, STR is a
 string with constant length.
   if  (sizeof_array(s) > strlen(STR))
 {
   replace this call with
   __builtin_strcmp_eq (s, STR, strlen(STR)+1) (!)= 0
 }

 later when expanding the new __builtin_str(n)cmp_eq calls, first expand them
 as __builtin_memcmp_eq, if the expansion does not succeed, change them back
 to call to __builtin_str(n)cmp.

adding test case strcmpopt_2.c and strcmpopt_4.c into gcc.dg for part B of
PR78809 adding test case strcmpopt_3.c into gcc.dg for PR83026

bootstraped and tested on both X86 and Aarch64. no regression.


gcc/ChangeLog

+2018-02-02  
+   
+   PR middle-end/78809
+   PR middle-end/83026
+   * builtins.c (expand_builtin): Add the handling of BUILT_IN_STRCMP_EQ
+   and BUILT_IN_STRNCMP_EQ.
+   * builtins.def: Add new builtins BUILT_IN_STRCMP_EQ and
+   BUILT_IN_STRNCMP_EQ.
+   * gimple-fold.c (gimple_fold_builtin_string_compare): Add the 
+   handling of BUILTIN_IN_STRCMP_EQ and BUILT_IN_STRNCMP_EQ.
+   (gimple_fold_builtin): Likewise.
+   * tree-ssa-strlen.c (compute_string_length): New function.
+   (determine_min_obsize): New function.
+   (handle_builtin_string_cmp): New function to handle calls to
+   string compare functions.
+   (strlen_optimize_stmt): Add handling to builtin string compare
+   calls. 
+   * tree-ssa-structalias.c (find_func_aliases_for_builtin_call):
+   Add the handling of BUILT_IN_STRCMP_EQ and BUILT_IN_STRNCMP_EQ.
+   * tree.c (build_common_builtin_nodes): Add new defines of
+   BUILT_IN_STRNCMP_EQ and BUILT_IN_STRCMP_EQ.
+

gcc/testsuite/ChangeLog

+2018-02-02  
+
+   PR middle-end/78809
+   * gcc.dg/strcmpopt_2.c: New testcase.
+   * gcc.dg/strcmpopt_3.c: New testcase.
+
+   PR middle-end/83026
+   * gcc.dg/strcmpopt_3.c: New testcase.



PR78809_B.patch
Description: Binary data


> 
>> 
>> We're now in stage4 so please queue this patch and ping it during
>> next stage1.
>> 
> 
> I will update my patch based on Jeff and your comments, and send it during 
> next stage 1.
> 
> thanks.
> 
> Qing
> 



Ping [PATCH][Middle-end][version 2]2nd patch of PR78809 and PR83026

2018-01-02 Thread Qing Zhao
Hi,

I’d like to ping for the following patch:

https://gcc.gnu.org/ml/gcc-patches/2017-12/msg01454.html 


thanks a lot.

For your reference, the first version of this patch is at:

https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00962.html 


Qing

[PATCH][Middle-end][version 2]2nd patch of PR78809 and PR83026

2017-12-21 Thread Qing Zhao
Hi, 

I updated my patch based on all your comments. 

the major changes are the following:

1. replace the candidate calls with __builtin_str(n)cmp_eq instead of 
__builtin_memcmp_eq;
in builtins.c,  when expanding the new __builtin_str(n)cmp_eq call, 
expand them first as
__builtin_memcmp_eq, if Not succeed,  change the call back to 
__builtin_str(n)cmp.
2. change the call to “get_range_strlen” with “compute_objsize”.
3. add the missing case for equality checking with zero;
4. adjust the new testing case for PR83026; add a new testing case for 
the missing case added in 3.
5. update “uhwi” to “shwi” for where it needs;
6. some other minor format changes.

the changes are retested on x86 and aarch64, bootstrapped and regression 
tested. no issue.

Okay for trunk?

thanks.

Qing

Please see the updated patch:

gcc/ChangeLog:

+2017-12-21  Qing Zhao  <qing.z...@oracle.com>
+
+   PR middle-end/78809
+   PR middle-end/83026
+   * builtins.c (expand_builtin): Add the handling of BUILT_IN_STRCMP_EQ
+   and BUILT_IN_STRNCMP_EQ.
+   * builtins.def: Add new builtins BUILT_IN_STRCMP_EQ and
+   BUILT_IN_STRNCMP_EQ.
+   * tree-ssa-strlen.c (compute_string_length): New function.
+   (handle_builtin_string_cmp): New function to handle calls to
+   string compare functions.
+   (strlen_optimize_stmt): Add handling to builtin string compare
+   calls. 
+   * tree.c (build_common_builtin_nodes): Add new defines of
+   BUILT_IN_STRNCMP_EQ and BUILT_IN_STRCMP_EQ.
+
gcc/testsuite/ChangeLog

+2017-12-21  Qing Zhao  <qing.z...@oracle.com> 
+
+   PR middle-end/78809
+   * gcc.dg/strcmpopt_2.c: New testcase.
+   * gcc.dg/strcmpopt_3.c: New testcase.
+
+   PR middle-end/83026
+   * gcc.dg/strcmpopt_3.c: New testcase.
+

---
 gcc/builtins.c |  33 ++
 gcc/builtins.def   |   5 +
 gcc/testsuite/gcc.dg/strcmpopt_2.c |  67 
 gcc/testsuite/gcc.dg/strcmpopt_3.c |  31 ++
 gcc/testsuite/gcc.dg/strcmpopt_4.c |  16 +++
 gcc/tree-ssa-strlen.c  | 215 +
 gcc/tree.c |   8 ++
 7 files changed, 375 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/strcmpopt_2.c
 create mode 100644 gcc/testsuite/gcc.dg/strcmpopt_3.c
 create mode 100644 gcc/testsuite/gcc.dg/strcmpopt_4.c

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 6b25253..a5f6885 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -6953,12 +6953,45 @@ expand_builtin (tree exp, rtx target, rtx subtarget, 
machine_mode mode,
return target;
   break;
 
+/* expand it as BUILT_IN_MEMCMP_EQ first. If not successful, change it 
+   back to a BUILT_IN_STRCMP. Remember to delete the 3rd paramater
+   when changing it to a strcmp call.  */
+case BUILT_IN_STRCMP_EQ:
+  target = expand_builtin_memcmp (exp, target, true);
+  if (target)
+   return target;
+
+  /* change this call back to a BUILT_IN_STRCMP.  */
+  TREE_OPERAND (exp, 1) 
+   = build_fold_addr_expr (builtin_decl_explicit (BUILT_IN_STRCMP));
+
+  /* delete the last parameter.  */
+  unsigned int i;
+  vec<tree, va_gc> *arg_vec;
+  vec_alloc (arg_vec, 2);
+  for (i = 0; i < 2; i++)
+   arg_vec->quick_push (CALL_EXPR_ARG (exp, i));
+  exp = build_call_vec (TREE_TYPE (exp), CALL_EXPR_FN (exp), arg_vec);
+  /* FALLTHROUGH */
+
 case BUILT_IN_STRCMP:
   target = expand_builtin_strcmp (exp, target);
   if (target)
return target;
   break;
 
+/* expand it as BUILT_IN_MEMCMP_EQ first. If not successful, change it
+   back to a BUILT_IN_STRNCMP.  */
+case BUILT_IN_STRNCMP_EQ:
+  target = expand_builtin_memcmp (exp, target, true);
+  if (target)
+   return target;
+
+  /* change it back to a BUILT_IN_STRNCMP.  */
+  TREE_OPERAND (exp, 1) 
+   = build_fold_addr_expr (builtin_decl_explicit (BUILT_IN_STRNCMP));
+  /* FALLTHROUGH */
+
 case BUILT_IN_STRNCMP:
   target = expand_builtin_strncmp (exp, target, mode);
   if (target)
diff --git a/gcc/builtins.def b/gcc/builtins.def
index 2281021..9eb79fd 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -949,6 +949,11 @@ DEF_BUILTIN_STUB (BUILT_IN_ALLOCA_WITH_ALIGN_AND_MAX, 
"__builtin_alloca_with_ali
equality with zero.  */
 DEF_BUILTIN_STUB (BUILT_IN_MEMCMP_EQ, "__builtin_memcmp_eq")
 
+/* An internal version of strcmp/strncmp, used when the result is only 
+   tested for equality with zero.  */
+DEF_BUILTIN_STUB (BUILT_IN_STRCMP_EQ, "__builtin_strcmp_eq")
+DEF_BUILTIN_STUB (BUILT_IN_STRNCMP_EQ, "__builtin_strncmp_eq")
+
 /* Object size checking builtins.  */
 DEF_GCC_BUILTIN   (BUILT_IN_OBJECT_SIZE, "object_size", 
BT_FN_SIZE_CONST_PTR_INT, ATTR_PURE_NOTHROW_LEA

Re: [PATCH][Middle-end] disable strcmp/strncmp inlining with O2 below and Os

2018-07-26 Thread Qing Zhao


> On Jul 26, 2018, at 3:26 AM, Richard Biener  wrote:
> 
> On Wed, 25 Jul 2018, Qing Zhao wrote:
> 
>> Hi,
>> 
>> As Wilco suggested, the new added strcmp/strncmp inlining should be only 
>> enabled with O2 and above.
>> 
>> this is the simple patch for this change.
>> 
>> tested on both X86 and aarch64.
>> 
>> Okay for thunk?
> 
> You should simply use
> 
>  if (optimize_insn_for_size_p ())
>return NULL_RTX;
> 
> to be properly profile-aware.  OK with that change.

thanks for the review.

I will make the change, retest it, and then commit it.

Qing
> 
> Richard.
> 



Re: [PATCH][Middle-end]patch for fixing PR 86519

2018-08-15 Thread Qing Zhao


> On Aug 14, 2018, at 11:25 PM, Jeff Law  wrote:
> 
> On 08/14/2018 08:57 AM, Qing Zhao wrote:
>> Hi,
>> 
>> PR 86519:New test case gcc.dg/strcmpopt_6.c fails with its introduction in 
>> r262636.
>> 
>> gcc/ChangeLog:
>> 
>> +2018-08-14  Qing Zhao  
>> +
>> +   PR testsuite/86519
>> +   * builtins.c (expand_builtin_memcmp): Do not expand the call
>> +   when overflow is detected.
>> +
>> gcc/testsuite/ChangeLog:
>> 
>> +2018-08-14  Qing Zhao 
>> +
>> +   PR testsuite/86519
>> +   * gcc.dg/strcmpopt_6.c: Scan the assembly file instead of
>> +   the .expand file.
> OK.

thanks.

the change has been committed as:
https://gcc.gnu.org/viewcvs/gcc?view=revision=263563 
<https://gcc.gnu.org/viewcvs/gcc?view=revision=263563>

Qing



Re: [PATCH][Middle-end] disable strcmp/strncmp inlining with O2 below and Os

2018-08-07 Thread Qing Zhao
Hi, Christophe,

I have attached a patch in PR86519, could you please download it and test it, 
and let me know the result.

thanks.

Qing
> On Jul 30, 2018, at 8:45 AM, Christophe Lyon  
> wrote:
> 
> On Wed, 25 Jul 2018 at 19:08, Qing Zhao  <mailto:qing.z...@oracle.com>> wrote:
>> 
>> Hi,
>> 
>> As Wilco suggested, the new added strcmp/strncmp inlining should be only 
>> enabled with O2 and above.
>> 
>> this is the simple patch for this change.
>> 
>> tested on both X86 and aarch64.
>> 
>> Okay for thunk?
>> 
>> Qing
>> 
>> gcc/ChangeLog:
>> 
>> +2018-07-25  Qing Zhao  
>> +
>> +   * builtins.c (inline_expand_builtin_string_cmp): Disable inlining
>> +   when optimization level is lower than 2 or optimize for size.
>> +
>> 
>> gcc/testsuite/ChangeLog:
>> 
>> +2018-07-25  Qing Zhao  
>> +
>> +   * gcc.dg/strcmpopt_5.c: Change to O2 to enable the transformation.
>> +   * gcc.dg/strcmpopt_6.c: Likewise.
>> +
>> 
> 
> Hi,
> 
> After this change, I've noticed that:
> FAIL: gcc.dg/strcmpopt_6.c scan-rtl-dump-times expand "__builtin_memcmp" 6
> on arm-none-linux-gnueabi
> --with-mode thumb
> --with-cpu cortex-a9
> and forcing -march=armv5t via RUNTESTFLAGS
> 
> The log says:
> gcc.dg/strcmpopt_6.c: pattern found 4 times
> FAIL: gcc.dg/strcmpopt_6.c scan-rtl-dump-times expand "__builtin_memcmp" 6
> 
> Christophe



[PATCH][Middle-end]patch for fixing PR 86519

2018-08-14 Thread Qing Zhao
Hi,

PR 86519:New test case gcc.dg/strcmpopt_6.c fails with its introduction in 
r262636.

***the root cause is:

for the following call to memcmp:   __builtin_memcmp (s->s, "a", 3);
the specified length 3 is larger than the length of "a", it's clearly an 
out-of-bound access. This new testing case is try to claim that,
For such out-of-bound access, we should NOT expand this call at all. The new 
added in-lining expansion was prohibited under
such situation, However, the expansion to hardware compare insn (old code) is 
NOT prohibited under such situation. 
on powerPC, the above call to memcmp is expanded to hardware compare insn. 
therefore, the testing case failed.

***in addition to the above major issue, there is also one minor issue with the 
new testing case itself:

dg-final { scan-rtl-dump-times "__builtin_memcmp" 6 "expand” }
this is trying to scan the dumped .expand file to match the string 
“__builtin_memcmp” exactly 6 times. however, the # of times that
the string “__builtin_memcmp” appears in the .expand file varies on different 
target or optimization level, in order to avoid such
instability, instead of scanning the .expand file to match the string 
“__builtin_memcmp”,  scanning the final assembly file to match
the string “memcmp”.

please review the attached simple patch.

thanks.

Qing

gcc/ChangeLog:

+2018-08-14  Qing Zhao  
+
+   PR testsuite/86519
+   * builtins.c (expand_builtin_memcmp): Do not expand the call
+   when overflow is detected.
+
gcc/testsuite/ChangeLog:

+2018-08-14  Qing Zhao 
+
+   PR testsuite/86519
+   * gcc.dg/strcmpopt_6.c: Scan the assembly file instead of
+   the .expand file.
+



PR86519.patch
Description: Binary data







Re: [PATCH][Middle-end] disable strcmp/strncmp inlining with O2 below and Os

2018-08-06 Thread Qing Zhao
thanks for reporting this issue.

I will take a look.

Qing
> On Jul 30, 2018, at 8:45 AM, Christophe Lyon  
> wrote:
> 
> On Wed, 25 Jul 2018 at 19:08, Qing Zhao  <mailto:qing.z...@oracle.com>> wrote:
>> 
>> Hi,
>> 
>> As Wilco suggested, the new added strcmp/strncmp inlining should be only 
>> enabled with O2 and above.
>> 
>> this is the simple patch for this change.
>> 
>> tested on both X86 and aarch64.
>> 
>> Okay for thunk?
>> 
>> Qing
>> 
>> gcc/ChangeLog:
>> 
>> +2018-07-25  Qing Zhao  
>> +
>> +   * builtins.c (inline_expand_builtin_string_cmp): Disable inlining
>> +   when optimization level is lower than 2 or optimize for size.
>> +
>> 
>> gcc/testsuite/ChangeLog:
>> 
>> +2018-07-25  Qing Zhao  
>> +
>> +   * gcc.dg/strcmpopt_5.c: Change to O2 to enable the transformation.
>> +   * gcc.dg/strcmpopt_6.c: Likewise.
>> +
>> 
> 
> Hi,
> 
> After this change, I've noticed that:
> FAIL: gcc.dg/strcmpopt_6.c scan-rtl-dump-times expand "__builtin_memcmp" 6
> on arm-none-linux-gnueabi
> --with-mode thumb
> --with-cpu cortex-a9
> and forcing -march=armv5t via RUNTESTFLAGS
> 
> The log says:
> gcc.dg/strcmpopt_6.c: pattern found 4 times
> FAIL: gcc.dg/strcmpopt_6.c scan-rtl-dump-times expand "__builtin_memcmp" 6
> 
> Christophe



Re: [PATCH][Middle-end]patch for fixing PR 86519

2018-08-20 Thread Qing Zhao
Hi, Rainer,

thanks a lot to report the issues with mips and sparc platform.

Yes, looks like even on the assembly level, the string scanning still not 
reliable on different platforms.

I agree with Jeff’s suggestion to apply different search result for different 
platforms.

I will update the testcase with this approach soon.

Qing
> On Aug 20, 2018, at 3:57 AM, Rainer Orth  
> wrote:
> 
> Hi Jeff,
> 
>> On 08/17/2018 09:43 PM, Paul Hua wrote:
>>> Hi Qing:
>>> 
 
 the change has been committed as:
 https://gcc.gnu.org/viewcvs/gcc?view=revision=263563
 
 
 Qing
 
>>> 
>>> The strcmpopt_6.c test still fails on mips64el target.
>>> 
>>> gcc.dg/strcmpopt_6.c: memcmp found 4 times
>>> FAIL: gcc.dg/strcmpopt_6.c scan-assembler-times memcmp 2
>>> 
>>> 
>>> The mips asm output have ".reloc" info.
>>> 
>>> -
>>>ld  $5,%got_page(.LC0)($28)
>>>ld  $25,%call16(memcmp)($28)
>>>li  $6,3# 0x3
>>>sd  $31,8($sp)
>>>.reloc  1f,R_MIPS_JALR,memcmp
>>> 1:  jalr$25
>>>daddiu  $5,$5,%got_ofst(.LC0)
>>> 
>>> 
>>> scan-assembler find "4" times.
>> Ugh.  :(  There's probably other targets that are going to have similar
>> properties.  I'm not really sure how to write a suitable assembler test
>> for this.
>> 
>> Maybe we just need a different search result for MIPS (and potentially
>> other targets).
> 
> we certainly do: I've reported a similar issue on sparc in PR
> testsuite/86519, but powerpc and s390x are equally affected.
> 
>   Rainer
> 
> -- 
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University



Re: [PATCH][Middle-end]patch for fixing PR 86519

2018-08-20 Thread Qing Zhao
Hi, Paul,

I was trying to repeat this issue on a mips machine today, but failed…

the only mips machines I can access are those in gcc compile farm, I chose 
gcc22, but failed to build GCC on this machine.

do you know any other machine in gcc compile farm that can repeat this issue?

thanks a lot.

Qing
> On Aug 17, 2018, at 10:43 PM, Paul Hua  wrote:
> 
> Hi Qing:
> 
>> 
>> the change has been committed as:
>> https://gcc.gnu.org/viewcvs/gcc?view=revision=263563 
>> 
>> 
>> Qing
>> 
> 
> The strcmpopt_6.c test still fails on mips64el target.
> 
> gcc.dg/strcmpopt_6.c: memcmp found 4 times
> FAIL: gcc.dg/strcmpopt_6.c scan-assembler-times memcmp 2
> 
> 
> The mips asm output have ".reloc" info.
> 
> -
>ld  $5,%got_page(.LC0)($28)
>ld  $25,%call16(memcmp)($28)
>li  $6,3# 0x3
>sd  $31,8($sp)
>.reloc  1f,R_MIPS_JALR,memcmp
> 1:  jalr$25
>daddiu  $5,$5,%got_ofst(.LC0)
> 
> 
> scan-assembler find "4" times.
> 
> Thanks
> Paul Hua



Re: [PATCH][Middle-end]patch for fixing PR 86519

2018-08-23 Thread Qing Zhao


> On Aug 22, 2018, at 5:01 PM, Jeff Law  wrote:
> 
> On 08/22/2018 11:05 AM, Qing Zhao wrote:
>> 
>>> On Aug 22, 2018, at 10:50 AM, Rainer Orth  
>>> wrote:
>>> 
>>> Hi Qing,
>>> 
>>>> From the comments you put into PR86519, for SPARC, looks like that only
>>>> 32-bit sparc has the problem.
>>>> sparcv9 does NOT have the same issue.
>>>> 
>>>> I was trying to find the string to represent 32-bit sparc target, but
>>>> haven’t found it.
>>>> 
>>>> my guess is:   sparc32*-*-*,  is this correct?
>>> 
>>> no, certainly not.  You need to use something like sparc*-*-* && ilp32
>>> to catch the 32-bit multilib in both sparc-*-* and sparcv9-*-*
>>> configurations.  This is similar to { i?86-*-* x86_64-*-* } && ilp32 on x86.
>> 
>> thanks for the info.
>> 
>>> 
>>> I'm still doubtful that enumerating target after target which all fail
>>> the original test for unrelated reasons is the way to go, especially
>>> given that there are others affected besides mips and sparc.
>> 
>> I am not sure, either.
>> 
>> however, given the available directives provided in testing suite, what’s 
>> the better solution
>> to this problem?
> We could move the test into the i386 target specific test directory.
> It'll still get good coverage that way and it'll be naturally restricted
> to a target where we don't have to worry about the function name we're
> scanning for showing up in undesirable contexts.

I will do this.  is it better to add it to both i386 and aarch64 target?

Qing
> 
> Another approach might be to scan the RTL dumps.  But if there were a
> target that didn't have direct jumps and requires a hi+lo_sum style
> sequence to load a symbolic constant it would fail.
> 
> jeff



Re: [PATCH][Middle-end]patch for fixing PR 86519

2018-08-22 Thread Qing Zhao
thanks.
now, I can repeat the failure.

Qing
> On Aug 21, 2018, at 7:25 PM, Paul Hua  wrote:
> 
> On Wed, Aug 22, 2018 at 2:15 AM Qing Zhao  <mailto:qing.z...@oracle.com>> wrote:
>> 
>> 
>>> On Aug 21, 2018, at 8:07 AM, Paul Hua  wrote:
>>> 
>>> Hi, Qing,
>>> 
>>> The cfarm machine gcc23 can build mips64el successful, configure with
>>> "../configure --prefix=/opt/gcc-9 MISSING=texinfo MAKEINFO=missing
>>> --target=mips64el-linux-gnu --enable-languages=c,c++
>> 
>> I got the same failure message on gcc23 as gcc22, even with the above 
>> configure:
>> 
>> /usr/bin/ld: failed to merge target specific data of file 
>> /usr/lib32/libc.a(mremap.o)
>> /usr/bin/ld: /usr/lib32/libc.a(libc-lowlevellock.o): ABI is incompatible 
>> with that of the selected emulation
>> 
>> not sure what’s the problem?
>> 
> 
> I just build all-gcc and make check.
> 
> ./configure xxx
> make all-gcc -j 2
> make check-gcc RUNTESTFLAGS="dg.exp=strcmpopt_6.c"
> 
> It's enough to reproduce the regression.
> 
> Here is a mips port patch.
> 
> diff --git a/gcc/testsuite/gcc.dg/strcmpopt_6.c
> b/gcc/testsuite/gcc.dg/strcmpopt_6.c
> index 4c6de02824f..fa0ff9d1171 100644
> --- a/gcc/testsuite/gcc.dg/strcmpopt_6.c
> +++ b/gcc/testsuite/gcc.dg/strcmpopt_6.c
> @@ -33,4 +33,5 @@ int main (void)
> 
> }
> 
> -/* { dg-final { scan-assembler-times "memcmp" 2 } } */
> +/* { dg-final { scan-assembler-times "memcmp" 2 { target { !
> mips*-*-* } } } } */
> +/* { dg-final { scan-assembler-times "memcmp" 4 { target { mips*-*-* } } } } 
> */
> 
> Paul Hua



Re: [PATCH][Middle-end]patch for fixing PR 86519

2018-08-22 Thread Qing Zhao
Hi, Rainer,

>From the comments you put into PR86519, for SPARC, looks like that only 32-bit 
>sparc has the problem.
sparcv9 does NOT have the same issue.

I was trying to find the string to represent 32-bit sparc target,   but haven’t 
found it.

my guess is:   sparc32*-*-*,  is this correct?


> On Aug 20, 2018, at 3:57 AM, Rainer Orth  
> wrote:
> 
> Hi Jeff,
> 
>>> 
>> Ugh.  :(  There's probably other targets that are going to have similar
>> properties.  I'm not really sure how to write a suitable assembler test
>> for this.
>> 
>> Maybe we just need a different search result for MIPS (and potentially
>> other targets).
> 
> we certainly do: I've reported a similar issue on sparc in PR
> testsuite/86519, but powerpc and s390x are equally affected.

do you mean that on powerpc and s390x, there is the same issue?

I tried on powerpc machines, seems no such issue. please advice.

thanks.

Qing

> 
>   Rainer
> 
> -- 
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University



Re: [PATCH][Middle-end]patch for fixing PR 86519

2018-08-22 Thread Qing Zhao


> On Aug 22, 2018, at 10:50 AM, Rainer Orth  
> wrote:
> 
> Hi Qing,
> 
>> From the comments you put into PR86519, for SPARC, looks like that only
>> 32-bit sparc has the problem.
>> sparcv9 does NOT have the same issue.
>> 
>> I was trying to find the string to represent 32-bit sparc target, but
>> haven’t found it.
>> 
>> my guess is:   sparc32*-*-*,  is this correct?
> 
> no, certainly not.  You need to use something like sparc*-*-* && ilp32
> to catch the 32-bit multilib in both sparc-*-* and sparcv9-*-*
> configurations.  This is similar to { i?86-*-* x86_64-*-* } && ilp32 on x86.

thanks for the info.

> 
> I'm still doubtful that enumerating target after target which all fail
> the original test for unrelated reasons is the way to go, especially
> given that there are others affected besides mips and sparc.

I am not sure, either.

however, given the available directives provided in testing suite, what’s the 
better solution
to this problem?

thanks.

Qing
> 
>   Rainer
> 
> -- 
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University



Re: [PATCH][Middle-end][version 2]change char type to unsigned char type when expanding strcmp/strncmp

2018-07-20 Thread Qing Zhao


> On Jul 20, 2018, at 9:59 AM, Jakub Jelinek  wrote:
> 
> On Fri, Jul 20, 2018 at 09:53:24AM -0500, Qing Zhao wrote:
>> +2018-07-20  Qing Zhao  
>> +
>> +   * builtins.c (expand_builtin_memcmp): Delete the last parameter for
>> +   call to inline_expand_builtin_string_cmp.
>> +   (expand_builtin_strcmp): Likewise.
>> +   (expand_builtin_strncmp): Likewise.
>> +   (inline_string_cmp): Delete the last parameter, change char_type_node
>> +   to unsigned_char_type_node for strcmp/strncmp, add conversions to the
>> +   two operands.
>> +   (inline_expand_builtin_string_cmp): Delete the last parameter, give 
>> up
>> +   the inlining expansion on target where the type of the call has same 
>> or 
>> +   narrower presicion than unsigned char.
> 
> s/presicion/precision/
> 
> Also in the patch, where there is another typo, s/of/or/.

Okay.
> 
> Ok for trunk with that fixed.

thanks a lot for the review.

Qing
> 
>   Jakub



[PATCH][Middle-end][version 2]change char type to unsigned char type when expanding strcmp/strncmp

2018-07-20 Thread Qing Zhao
Hi,

this is the 2nd version of the change, mainly addressed Jakub’s comments:

1. Give up the inlining expansion for strcmp/strncmp/memcmp on a target
where the type of the call has same or narrower presicion than unsigned
char.
2.  add conversions before expand_simple_binop to the two operands.

and
3. also updated comments of routine inline_string_cmp to reflect the conversions
in the expanded code.

have tested on X86 and aarch64. No regressions.

Okay for thunk?

Qing

gcc/ChangeLog:

+2018-07-20  Qing Zhao  
+
+   * builtins.c (expand_builtin_memcmp): Delete the last parameter for
+   call to inline_expand_builtin_string_cmp.
+   (expand_builtin_strcmp): Likewise.
+   (expand_builtin_strncmp): Likewise.
+   (inline_string_cmp): Delete the last parameter, change char_type_node
+   to unsigned_char_type_node for strcmp/strncmp, add conversions to the
+   two operands.
+   (inline_expand_builtin_string_cmp): Delete the last parameter, give up
+   the inlining expansion on target where the type of the call has same or 
+   narrower presicion than unsigned char.
+



78809C_uchar.patch
Description: Binary data


> On Jul 19, 2018, at 12:31 PM, Jakub Jelinek  wrote:
> 
> If you expand it as (int) ((unsigned char *)p)[n] - (int) ((unsigned char 
> *)q)[n]
> then aren't you relying on int type to have wider precision than unsigned char
> (or unit_mode being narrower than mode)?  I don't see anywhere where you'd
> give up on doing the inline expansion on targets where e.g. lowest
> addressable unit would be 16-bit and int would be 16-bit too.
> On targets where int is as wide as char, one would need to expand it instead
> as something like:
> if (((unsigned char *)p)[n] == ((unsigned char *)q)[n]) loop;
> ret = ((unsigned char *)p)[n] < ((unsigned char *)q)[n] ? -1 : 1;
> or similar or just use the library routine.
> 
> Also:
>  var_rtx
>= adjust_address (var_rtx_array, TYPE_MODE (unit_type_node), offset);
>  const_rtx = c_readstr (const_str + offset, unit_mode);
>  rtx op0 = (const_str_n == 1) ? const_rtx : var_rtx;
>  rtx op1 = (const_str_n == 1) ? var_rtx : const_rtx;
> 
>  result = expand_simple_binop (mode, MINUS, op0, op1,
>result, is_memcmp ? 1 : 0, OPTAB_WIDEN);
> doesn't look correct to me, var_rtx and const_rtx here are in unit_mode,
> you need to convert those to mode before you can use those in
> expand_simple_binop, using
>  op0 = convert_modes (mode, unit_mode, op0, 1);
>  op1 = convert_modes (mode, unit_mode, op1, 1);
> before the expand_simple_binop.
> While expand_simple_binop is called with an unsignedp argument, that is
> meant for the cases where the expansion needs to widen it further, not for
> calling expand_simple_binop with arguments with known incorrect mode;
> furthermore, one of them being CONST_INT which has VOIDmode.



Re: [PATCH][Middle-end][version 2]change char type to unsigned char type when expanding strcmp/strncmp

2018-07-20 Thread Qing Zhao
the patch was committed as:

https://gcc.gnu.org/viewcvs/gcc?view=revision=262907 
<https://gcc.gnu.org/viewcvs/gcc?view=revision=262907>

thanks.

Qing
> On Jul 20, 2018, at 9:59 AM, Jakub Jelinek  wrote:
> 
> On Fri, Jul 20, 2018 at 09:53:24AM -0500, Qing Zhao wrote:
>> +2018-07-20  Qing Zhao  
>> +
>> +   * builtins.c (expand_builtin_memcmp): Delete the last parameter for
>> +   call to inline_expand_builtin_string_cmp.
>> +   (expand_builtin_strcmp): Likewise.
>> +   (expand_builtin_strncmp): Likewise.
>> +   (inline_string_cmp): Delete the last parameter, change char_type_node
>> +   to unsigned_char_type_node for strcmp/strncmp, add conversions to the
>> +   two operands.
>> +   (inline_expand_builtin_string_cmp): Delete the last parameter, give 
>> up
>> +   the inlining expansion on target where the type of the call has same 
>> or 
>> +   narrower presicion than unsigned char.
> 
> s/presicion/precision/
> 
> Also in the patch, where there is another typo, s/of/or/.
> 
> Ok for trunk with that fixed.
> 
>   Jakub



[PATCH][Middle-end]change char type to unsigned char type when expanding strcmp/strncmp

2018-07-19 Thread Qing Zhao
Hi, 

As Wilco mentioned in PR78809 after I checked in the last part of 
implementation of inline strcmp:

See  http://www.iso-9899.info/n1570.html
 section 7.24.4:

"The sign of a nonzero value returned by the comparison functions memcmp, 
strcmp, and strncmp is determined 
by the sign of the difference between the values of the first pair of 
characters (both interpreted as unsigned char)
 that differ in the objects being compared."

currently, in my implementation, I used char type when expanding 
strcmp/strncmp, and unsigned char when expanding
memcmp.

from the C standard, we should use unsigned char for all strcmp/strncmp/memcmp.

the change is quite simple, and I have tested it on X86, aarch64 and powerPC, 
no regressions.

Okay for trunk?

Qing

gcc/ChangeLog:

+2018-07-19  Qing Zhao  
+
+   * builtins.c (expand_builtin_memcmp): Delete the last parameter for
+   call to inline_expand_builtin_string_cmp.
+   (expand_builtin_strcmp): Likewise.
+   (expand_builtin_strncmp): Likewise.
+   (inline_string_cmp): Delete the last parameter, change char_type_node
+   to unsigned_char_type_node for strcmp/strncmp;
+   (inline_expand_builtin_string_cmp): Delete the last parameter.
+


78809C_uchar.patch
Description: Binary data


Re: [PATCH][Middle-end]change char type to unsigned char type when expanding strcmp/strncmp

2018-07-19 Thread Qing Zhao
Jakub,

thanks a lot for you review and comments.

> On Jul 19, 2018, at 12:31 PM, Jakub Jelinek  wrote:
> 
> On Thu, Jul 19, 2018 at 11:49:16AM -0500, Qing Zhao wrote:
>> As Wilco mentioned in PR78809 after I checked in the last part of 
>> implementation of inline strcmp:
>> 
>> See  http://www.iso-9899.info/n1570.html
>> section 7.24.4:
>> 
>> "The sign of a nonzero value returned by the comparison functions memcmp, 
>> strcmp, and strncmp is determined 
>> by the sign of the difference between the values of the first pair of 
>> characters (both interpreted as unsigned char)
>> that differ in the objects being compared."
>> 
>> currently, in my implementation, I used char type when expanding 
>> strcmp/strncmp, and unsigned char when expanding
>> memcmp.
>> 
>> from the C standard, we should use unsigned char for all 
>> strcmp/strncmp/memcmp.
>> 
>> the change is quite simple, and I have tested it on X86, aarch64 and 
>> powerPC, no regressions.
>> 
>> Okay for trunk?
> 
> If you expand it as (int) ((unsigned char *)p)[n] - (int) ((unsigned char 
> *)q)[n]
> then aren't you relying on int type to have wider precision than unsigned char
> (or unit_mode being narrower than mode)?

do you imply that we should only expand it as  (int) ((unsigned char *)p)[n] - 
(int) ((unsigned char *)q)[n] when we are sure
int type is wider than unsigned char? 

>  I don't see anywhere where you'd
> give up on doing the inline expansion on targets where e.g. lowest
> addressable unit would be 16-bit and int would be 16-bit too.

even on this targets, is char type still 8-bit?
then int type is still wider than char?

> On targets where int is as wide as char, one would need to expand it instead
> as something like:
> if (((unsigned char *)p)[n] == ((unsigned char *)q)[n]) loop;
> ret = ((unsigned char *)p)[n] < ((unsigned char *)q)[n] ? -1 : 1;
> or similar or just use the library routine.


even when int type is as wide as char,  expand it as (int) ((unsigned char 
*)p)[n] - (int) ((unsigned char *)q)[n]
should still be correct (even though not optimal), doesn’t it?

do I miss anything in this part?

> 
> Also:
>  var_rtx
>= adjust_address (var_rtx_array, TYPE_MODE (unit_type_node), offset);
>  const_rtx = c_readstr (const_str + offset, unit_mode);
>  rtx op0 = (const_str_n == 1) ? const_rtx : var_rtx;
>  rtx op1 = (const_str_n == 1) ? var_rtx : const_rtx;
> 
>  result = expand_simple_binop (mode, MINUS, op0, op1,
>result, is_memcmp ? 1 : 0, OPTAB_WIDEN);
> doesn't look correct to me, var_rtx and const_rtx here are in unit_mode,
> you need to convert those to mode before you can use those in
> expand_simple_binop, using
>  op0 = convert_modes (mode, unit_mode, op0, 1);
>  op1 = convert_modes (mode, unit_mode, op1, 1);
> before the expand_simple_binop.
> While expand_simple_binop is called with an unsignedp argument, that is
> meant for the cases where the expansion needs to widen it further, not for
> calling expand_simple_binop with arguments with known incorrect mode;
> furthermore, one of them being CONST_INT which has VOIDmode.

thank you for raising this issue, Yes, I will update this part of the code as 
you suggested.

Qing



Re: [PATCH][Middle-end]change char type to unsigned char type when expanding strcmp/strncmp

2018-07-19 Thread Qing Zhao


> On Jul 19, 2018, at 2:24 PM, Jakub Jelinek  wrote:
> 
> On Thu, Jul 19, 2018 at 02:06:16PM -0500, Qing Zhao wrote:
>>> If you expand it as (int) ((unsigned char *)p)[n] - (int) ((unsigned char 
>>> *)q)[n]
>>> then aren't you relying on int type to have wider precision than unsigned 
>>> char
>>> (or unit_mode being narrower than mode)?
>> 
>> do you imply that we should only expand it as  (int) ((unsigned char *)p)[n] 
>> - (int) ((unsigned char *)q)[n] when we are sure
>> int type is wider than unsigned char? 
> 
> Yes.
> 
>>> I don't see anywhere where you'd
>>> give up on doing the inline expansion on targets where e.g. lowest
>>> addressable unit would be 16-bit and int would be 16-bit too.
>> 
>> even on this targets, is char type still 8-bit?
>> then int type is still wider than char?
> 
> C requires that int is at least 16-bit wide, so the sizeof (int) == sizeof
> (char) case is only possible say with 16-bit char and 16-bit int, or 32-bit
> char and 32-bit int etc.
> 
>>> On targets where int is as wide as char, one would need to expand it instead
>>> as something like:
>>> if (((unsigned char *)p)[n] == ((unsigned char *)q)[n]) loop;
>>> ret = ((unsigned char *)p)[n] < ((unsigned char *)q)[n] ? -1 : 1;
>>> or similar or just use the library routine.
>> 
>> 
>> even when int type is as wide as char,  expand it as (int) ((unsigned char 
>> *)p)[n] - (int) ((unsigned char *)q)[n]
>> should still be correct (even though not optimal), doesn’t it?
> 
> No.  Consider p[n] being e.g. 1 and q[n] being __SCHAR_MAX__ + 3U and 16-bit
> int and 16-bit char.  Then (unsigned char) 0x0001 < (unsigned char) 0x8002,
> so it should return a negative number.  But (int) (0x0001U - 0x8002U) is
> 0x7fff, which is a positive int.  Now, if int is 17-bit and char is 16-bit,
> this works fine, because is then -0x8001 and thus negative.

Okay, I see now.
really appreciate for your detailed explanation.
> 
> The above really works only if int is at least one bit wider than unsigned
> char.

Then, I will add a check to exclude the inlining when int is NOT wider than 
unsigned char on the target.

is the following the correct check:  (exp is the call to strcmp)

 if (CHAR_TYPE_SIZE >= TYPE_PRECISION (TREE_TYPE (exp)))

 
Thanks.

Qing




[PATCH][Middle-end] disable strcmp/strncmp inlining with O2 below and Os

2018-07-25 Thread Qing Zhao
Hi,

As Wilco suggested, the new added strcmp/strncmp inlining should be only 
enabled with O2 and above.

this is the simple patch for this change.

tested on both X86 and aarch64.

Okay for thunk?

Qing

gcc/ChangeLog:

+2018-07-25  Qing Zhao  
+
+   * builtins.c (inline_expand_builtin_string_cmp): Disable inlining
+   when optimization level is lower than 2 or optimize for size.
+   

gcc/testsuite/ChangeLog:

+2018-07-25  Qing Zhao  
+
+   * gcc.dg/strcmpopt_5.c: Change to O2 to enable the transformation.
+   * gcc.dg/strcmpopt_6.c: Likewise.
+



78809_O2.patch
Description: Binary data


[PATCH][testcase]patch for fixing PR 86519

2018-08-29 Thread Qing Zhao
Hi,

this is the patch to fix PR86519.

Per Jeff’s suggestion, I removed strcmpopt_6.c from gcc.dg, and
added it to gcc.target/aarch64 and gcc.target/i386.

retested on x86 and aarch64. no issue.

Okay to commit?

thanks.

Qing.

gcc/testsuite/ChangeLog:

--- ChangeLog   (revision 263888)
+++ ChangeLog   (working copy)
@@ -1,3 +1,10 @@
+2018-08-29  Qing Zhao  
+
+   PR 86519
+   gcc.dg/strcmpopt_6.c: Remove.
+   gcc.target/aarch64/strcmpopt_6.c: New testcase.
+   gcc.target/i386/strcmpopt_6.c: Likewise.
+



86519_2.patch
Description: Binary data




Re: [PATCH][Middle-end]3rd patch of PR78809

2018-07-05 Thread Qing Zhao
Hi,

I have sent two emails with the updated patches on 7/3:

https://gcc.gnu.org/ml/gcc-patches/2018-07/msg00065.html
https://gcc.gnu.org/ml/gcc-patches/2018-07/msg00070.html

however, these 2 emails  were not successfully forwarded to the 
gcc-patches@gcc.gnu.org mailing list.

So, I am sending the same email again in this one, hopefully this time it can 
go through.
Qing

Hi, Jeff,

thanks a lot for your review and comments.

I have addressed your comments,updated the patch, retested on both
aarch64 and x86.

The major changes in this version compared to the previous version are:

1. in routine expand_builtin_memcmp:
* move the inlining transformation AFTER the warning is issues for
-Wstringop-overflow;
* only apply inlining when there is No warning is issued.
2. in the testsuite, add a new testcase strcmpopt_6.c for this case.
3. update comments to:
* capitalize the first word.
* capitalize all the arguments.

NOTE, the routine expand_builtin_strcmp and expand_builtin_strncmp are not 
changed.
the reason is:  there is NO overflow checking for these two routines currently.
if we need overflow checking for these two routines, I think that a separate 
patch is needed.
if this is needed, let me know, I can work on this separate patch for issuing 
warning for strcmp/strncmp when
-Wstringop-overflow is specified.

The new patch is as following, please take a look at it.

thanks.

Qing

gcc/ChangeLog

+2018-07-02  Qing Zhao  
+
+   PR middle-end/78809
+   * builtins.c (expand_builtin_memcmp): Inline the calls first
+   when result_eq is false.
+   (expand_builtin_strcmp): Inline the calls first.
+   (expand_builtin_strncmp): Likewise.
+   (inline_string_cmp): New routine. Expand a string compare
+   call by using a sequence of char comparison.
+   (inline_expand_builtin_string_cmp): New routine. Inline expansion
+   a call to str(n)cmp/memcmp.
+   * doc/invoke.texi (--param builtin-string-cmp-inline-length): New 
option.
+   * params.def (BUILTIN_STRING_CMP_INLINE_LENGTH): New.
+

gcc/testsuite/ChangeLog

+2018-07-02  Qing Zhao  
+
+   PR middle-end/78809
+   * gcc.dg/strcmpopt_5.c: New test.
+   * gcc.dg/strcmpopt_6.c: New test.
+



0001-3nd-Patch-for-PR78009.patch
Description: Binary data


> On Jun 28, 2018, at 12:10 AM, Jeff Law  wrote:
> 
> 
> So this generally looks pretty good.  THe biggest technical concern is
> making sure we're doing the right thing WRT issuing warnings.  You can
> tackle that problem by deferring inlining to a later point after
> warnings have been issued or by verifying that your routines do not
> inline in cases where warnings will be issued.  It may be worth adding
> testcases for these issues.
> 
> There's a large number of comments that need capitalization fixes.
> 
> Given there was no measured runtime performance impact, but slight
> improvements on codesize for values <= 3, let's go ahead with that as
> the default.
> 
> Can you address the issues above and repost for final review?
> 
> Thanks,
> jeff



Re: [PATCH][Middle-end]3rd patch of PR78809

2018-07-09 Thread Qing Zhao
Hi, Martin,

thanks a lot for your comments. 

> On Jul 5, 2018, at 11:30 AM, Martin Sebor  wrote:
> 
> One of the basic design principles that I myself have
> accidentally violated in the past is that warning options
> should not impact the emitted object code.  I don't think
> your patch actually does introduce this dependency by having
> the codegen depend on the result of check_access() -- I'm
> pretty sure the function is designed to do the validation
> irrespective of warning options and return based on
> the result of the validation and not based on whether
> a warning was issued.  But the choice of the variable name,
> no_overflow_warn, suggests that it does, in fact, have this
> effect.  So I would suggest to rename the variable and add
> a test that verifies that this dependency does not exist.

I agree that warning options should not impact the emitted object code. 
and my current change seems violate this principle:

in the following change:

+  bool no_overflow_warn = true;

   /* Diagnose calls where the specified length exceeds the size of either
  object.  */
   if (warn_stringop_overflow)
 {
   tree size = compute_objsize (arg1, 0);
-  if (check_access (exp, /*dst=*/NULL_TREE, /*src=*/NULL_TREE, len,
-   /*maxread=*/NULL_TREE, size, /*objsize=*/NULL_TREE))
+  no_overflow_warn = check_access (exp, /*dst=*/NULL_TREE, 
/*src=*/NULL_TREE,
+  len, /*maxread=*/NULL_TREE, size,
+  /*objsize=*/NULL_TREE);
+  if (no_overflow_warn) 
{
  size = compute_objsize (arg2, 0);
- check_access (exp, /*dst=*/NULL_TREE, /*src=*/NULL_TREE, len,
-   /*maxread=*/NULL_TREE, size, /*objsize=*/NULL_TREE);
+ no_overflow_warn = check_access (exp, /*dst=*/NULL_TREE, 
/*src=*/NULL_TREE,
+  len,  /*maxread=*/NULL_TREE, size,
+  /*objsize=*/NULL_TREE);
}
 }

+  /* Due to the performance benefit, always inline the calls first 
+ when result_eq is false.  */
+  rtx result = NULL_RTX;
+   
+  if (!result_eq && fcode != BUILT_IN_BCMP && no_overflow_warn) 
+{
+  result = inline_expand_builtin_string_cmp (exp, target, true);
+  if (result)

The variable no_overflow_warn DEPENDs on the warning option 
warn_stringop_overflow, and this
variable is used to control the code generation.  such behavior seems violate 
the above mentioned
principle.

However, this is not a problem that can be easily fixed based on the the 
current design, which has the following issues as my
understanding:

1. the routine check_access issues warnings by default, then it seems 
necessary to guard the call
to this routine with the warning option;
2. then the returned value of the routine check_access has to depend on 
the warning option.

in order to fix the current problem I have, an approach is to rewrite the 
routine check_access to guard the issue warning inside
the routine with the warning option passed as an additional parameter.

let me know anything I am missing so far.

> 
> Beyond that, an enhancement to this optimization that might
> be worth considering is inlining even non-constant calls
> with array arguments whose size is no greater than the limit.
> As in:
> 
>  extern char a[4], *b;
> 
>  int n = strcmp (a, b);
> 
> Because strcmp arguments are required to be nul-terminated
> strings, a's length above must be at most 3.  This is analogous
> to similar optimizations GCC performs, such as folding to zero
> calls to strlen() with one-element arrays.

Yes, I agree that this will be another good enhancement to the strcmp inlining.

however, it’s not easy to be integrated with my current patch.  The major issue 
is:

 The inlined code for the strcmp call without string constant will be 
different than the inlined code for the
strcmp call with string constant,  then:

1. the default value for the threshold that control the maximum length 
of the string length for inlining will
be different than the one for the strcmp call with string constant,  more 
experiments need to be run and a new parameter
need to be added to control this;
2. the inlined transformed code will be different than the current one. 

based on the above, I’d like to open a new PR to record this new enhancement 
and finish it with a new patch later.

what’s your opinion on this?

Qing
> 
> Martin



Re: [PATCH][Middle-end]3rd patch of PR78809

2018-07-10 Thread Qing Zhao


> On Jul 9, 2018, at 3:25 PM, Martin Sebor  wrote:
> 
> check_access() calls warning_at() to issue warnings, and that
> function only issues warnings if they are enabled, so the guard
> isn't necessary to make it work this way.

Okay I see.

then, in the current code: (for routine expand_builtin_memcmp)

  /* Diagnose calls where the specified length exceeds the size of either
 object.  */
  if (warn_stringop_overflow)
{
  tree size = compute_objsize (arg1, 0);
  if (check_access (exp, /*dst=*/NULL_TREE, /*src=*/NULL_TREE, len,
/*maxread=*/NULL_TREE, size, /*objsize=*/NULL_TREE))
{
  size = compute_objsize (arg2, 0);
  check_access (exp, /*dst=*/NULL_TREE, /*src=*/NULL_TREE, len,
/*maxread=*/NULL_TREE, size, /*objsize=*/NULL_TREE);
}
}

Is the above condition on variable warn_stringop_overflow unnecessary?
all the warnings inside check_access are controlled by OPT_Wstringop_overflow_.

can I safely delete the above condition if (warn_stringop_overflow)?

> 
>>> Beyond that, an enhancement to this optimization that might
>>> be worth considering is inlining even non-constant calls
>>> with array arguments whose size is no greater than the limit.
>>> As in:
>>> 
>>> extern char a[4], *b;
>>> 
>>> int n = strcmp (a, b);
>>> 
>>> Because strcmp arguments are required to be nul-terminated
>>> strings, a's length above must be at most 3.  This is analogous
>>> to similar optimizations GCC performs, such as folding to zero
>>> calls to strlen() with one-element arrays.
>> 
>> Yes, I agree that this will be another good enhancement to the strcmp 
>> inlining.
>> 
>> however, it’s not easy to be integrated with my current patch.  The major 
>> issue is:
>> 
>>   The inlined code for the strcmp call without string constant will be 
>> different than the inlined code for the
>> strcmp call with string constant,  then:
>> 
>>  1. the default value for the threshold that control the maximum length 
>> of the string length for inlining will
>> be different than the one for the strcmp call with string constant,  more 
>> experiments need to be run and a new parameter
>> need to be added to control this;
>>  2. the inlined transformed code will be different than the current one.
>> 
>> based on the above, I’d like to open a new PR to record this new enhancement 
>> and finish it with a new patch later.
>> 
>> what’s your opinion on this?
> 
> I'm not sure I see the issues above as problems and I would expect
> the non-constant optimization to naturally handle the constant case
> as well.  But if you prefer it that way, implementing the non-constant
> optimization in a separate step sounds reasonable to me.  It's your
> call.

the inlined code for call to strcmp with constant string will only have one 
load instruction for each byte, but for call to strcmp
without constant string, there will be  two load instructions for each byte.  
So, the run time performance impact will be different.
we need separate default values of the maximum length of the string to enable 
the transformation. 

I will create a PR on this and add a new patch after this one.

thanks.

Qing



Re: [PATCH][Middle-end]3rd patch of PR78809

2018-07-10 Thread Qing Zhao
Richard and Martin,

thanks for the info.

> On Jul 10, 2018, at 11:29 AM, Richard Biener  wrote:
>> Is the above condition on variable warn_stringop_overflow unnecessary?
>> all the warnings inside check_access are controlled by
>> OPT_Wstringop_overflow_.
> 
> Well, the condition certainly saves compile time. 



> On Jul 10, 2018, at 11:55 AM, Martin Sebor  wrote:
>> 
>> Is the above condition on variable warn_stringop_overflow unnecessary?
>> all the warnings inside check_access are controlled by 
>> OPT_Wstringop_overflow_.
>> 
>> can I safely delete the above condition if (warn_stringop_overflow)?
> 
> I think the check above is only there to avoid the overhead
> of the two calls to compute_objsize and check_access.  There
> are a few more like it in other functions in the file and
> they all should be safe to remove, but also safe to keep.
> (Some of them might make it easy to inadvertently introduce
> a dependency between the warning option and an optimization
> so that's something to consider.)

currently,  the condition is there for saving compilation time.
However, for my patch, I need the return value of check_access to control 
whether 
to invoking inlining or not,  therefore,  the call to check_access should 
always be
invoked for code generation.  The condition need to be deleted.

let me know if I still miss anything here.

 based on the above, I’d like to open a new PR to record this new 
 enhancement and finish it with a new patch later.
 
 what’s your opinion on this?
>>> 
>>> I'm not sure I see the issues above as problems and I would expect
>>> the non-constant optimization to naturally handle the constant case
>>> as well.  But if you prefer it that way, implementing the non-constant
>>> optimization in a separate step sounds reasonable to me.  It's your
>>> call.
>> 
>> the inlined code for call to strcmp with constant string will only have one 
>> load instruction for each byte, but for call to strcmp
>> without constant string, there will be  two load instructions for each byte. 
>>  So, the run time performance impact will be different.
>> we need separate default values of the maximum length of the string to 
>> enable the transformation.
> 
> You're right, that's true for builtins.c where all we have to
> work with is arrays with unknown contents and string literals.
> The strlen pass, on the other hand, has access to the lengths
> of even unknown strings.  That suggests that an even better
> place for the optimization might be the strlen pass where
> the folding could happen earlier and at a higher level, which
> might even obviate having to worry about the constant vs non-
> constant handling.

Yes, looks like the inlining of call to strcmp with all variable strings might 
need to be done at
strlen pass in order to get more necessary info. 

In addition to this, I still feel that these two inlining could be separated.  
the generated code of inlining of call to strcmp with constant string
could be more optimal than the inlining of call to strcmp without constant 
strings. the cost models are different.

I just created PR:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86467 


for this work.

> 
>> I will create a PR on this and add a new patch after this one.
> 
> Sure, one step at a time makes sense.  I don't think there is
> any harm in having the optimization in two places: builtins.c
> and strlen.

Thanks a lot for your suggestions.

Qing



Re: [PATCH][Middle-end][version 3]3rd patch of PR78809

2018-07-13 Thread Qing Zhao
thank you.

the patch was just committed into trunk as:

https://gcc.gnu.org/viewcvs/gcc?view=revision=262636 
<https://gcc.gnu.org/viewcvs/gcc?view=revision=262636>

Qing
> On Jul 12, 2018, at 12:03 PM, Jeff Law  wrote:
> 
>> 
>> gcc/ChangeLog:
>> 
>> +2018-07-11  Qing Zhao  
>> +
>> +PR middle-end/78809
>> +* builtins.c (expand_builtin_memcmp): Inline the calls first
>> +when result_eq is false.
>> +(expand_builtin_strcmp): Inline the calls first.
>> +(expand_builtin_strncmp): Likewise.
>> +(inline_string_cmp): New routine. Expand a string compare 
>> +call by using a sequence of char comparison.
>> +(inline_expand_builtin_string_cmp): New routine. Inline expansion
>> +a call to str(n)cmp/memcmp.
>> +* doc/invoke.texi (--param builtin-string-cmp-inline-length): New 
>> option.
>> +    * params.def (BUILTIN_STRING_CMP_INLINE_LENGTH): New.
>> +
>> 
>> gcc/testsuite/ChangeLog:
>> 
>> +2018-07-11  Qing Zhao  
>> +
>> +PR middle-end/78809
>> +* gcc.dg/strcmpopt_5.c: New test.
>> +* gcc.dg/strcmpopt_6.c: New test.
> OK
> THanks
> 
> Jeff



[PATCH][Middle-end][version 3]3rd patch of PR78809

2018-07-11 Thread Qing Zhao
Hi,   This is the 3rd version of the patch for the last part of PR78809.

the major change in this version is to address the following concerns raised by 
Martin:

> One of the basic design principles that I myself have
> accidentally violated in the past is that warning options
> should not impact the emitted object code.  I don't think
> your patch actually does introduce this dependency by having
> the codegen depend on the result of check_access() -- I'm
> pretty sure the function is designed to do the validation
> irrespective of warning options and return based on
> the result of the validation and not based on whether
> a warning was issued.  But the choice of the variable name,
> no_overflow_warn, suggests that it does, in fact, have this
> effect.  So I would suggest to rename the variable and add
> a test that verifies that this dependency does not exist.

I have addressed this concern as following per our discussion:

1. in routine expand_builtin_memcmp, 
* delete the condition if (warn_stringop_overflow) before check_access;
* change the name of the variable that holds the return value of check_access 
to no_overflow

2. in the testsuite, change the new testcase strcmpopt_6.c to inhibit inlining 
when check_access
detects error (Not depend on whether the warning option is ON or not).

the following is the new patch, tested on both X86 and aarch64, no regression.

Okay for thunk?

thanks.

Qing

gcc/ChangeLog:

+2018-07-11  Qing Zhao  
+
+   PR middle-end/78809
+   * builtins.c (expand_builtin_memcmp): Inline the calls first
+   when result_eq is false.
+   (expand_builtin_strcmp): Inline the calls first.
+   (expand_builtin_strncmp): Likewise.
+   (inline_string_cmp): New routine. Expand a string compare 
+   call by using a sequence of char comparison.
+   (inline_expand_builtin_string_cmp): New routine. Inline expansion
+   a call to str(n)cmp/memcmp.
+   * doc/invoke.texi (--param builtin-string-cmp-inline-length): New 
option.
+   * params.def (BUILTIN_STRING_CMP_INLINE_LENGTH): New.
+

gcc/testsuite/ChangeLog:

+2018-07-11  Qing Zhao  
+
+   PR middle-end/78809
+   * gcc.dg/strcmpopt_5.c: New test.
+   * gcc.dg/strcmpopt_6.c: New test.
+



0001-3nd-Patch-for-PR78009.patch
Description: Binary data


> On Jul 5, 2018, at 10:46 AM, Qing Zhao  wrote:
> 
> Hi,
> 
> I have sent two emails with the updated patches on 7/3:
> 
> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg00065.html
> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg00070.html
> 
> however, these 2 emails  were not successfully forwarded to the 
> gcc-patches@gcc.gnu.org mailing list.
> 
> So, I am sending the same email again in this one, hopefully this time it can 
> go through.
> Qing
> 
> Hi, Jeff,
> 
> thanks a lot for your review and comments.
> 
> I have addressed your comments,updated the patch, retested on both
> aarch64 and x86.
> 
> The major changes in this version compared to the previous version are:
> 
>   1. in routine expand_builtin_memcmp:
> * move the inlining transformation AFTER the warning is issues for
> -Wstringop-overflow;
> * only apply inlining when there is No warning is issued.
>   2. in the testsuite, add a new testcase strcmpopt_6.c for this case.
>   3. update comments to:
> * capitalize the first word.
> * capitalize all the arguments.
> 
> NOTE, the routine expand_builtin_strcmp and expand_builtin_strncmp are not 
> changed.
> the reason is:  there is NO overflow checking for these two routines 
> currently.
> if we need overflow checking for these two routines, I think that a separate 
> patch is needed.
> if this is needed, let me know, I can work on this separate patch for issuing 
> warning for strcmp/strncmp when
> -Wstringop-overflow is specified.



[PATCH][Middle-end]Add a new option to finer control inlining based on function's visibility

2018-09-11 Thread Qing Zhao
Hi, 

This is a simple patch to add a new first-class option 

-finline-visibility={all|extern|static}

to finer control inlining based on function’s visibility. 

'-finline-visibility=[all|extern|static]'
 By default, GCC inlines functions without considering their
 visibility.  This flag allows finer control of inlining based on
 their visibility.

 The value 'extern' tells the compiler to only inline functions with
 external visibility.  The value 'static' tells the compiler to only
 inline functions with static visibility.  The value 'all' tells the
 compilers to inline functions without considering their visibility.

 The default value of '-finline-visibility' is 'all'.

The major purpose of this option is to provide a way for the user 
to finer choose the inline candidates based on function’s visibility.
For example, some online patching users might want to limit the inlining
to only static functions to avoid patching the callers of global functions
in order to control the memory consumption caused by online patching. 

let me know any comments and suggestions.

thanks.

Qing

gcc/ChangeLog:

+2018-09-11  Qing Zhao  
+
+   * cif-code.def (FUNCTION_EXTERN): New CIFCODE.
+   (FUNCTION_STATIC): Likewise.
+   * common.opt (-finline-visibility=): New option.
+   * doc/invoke.texi: Document -finline-visibility=.
+   * flag-types.h: Add a new enum inline_visibility.
+   * ipa-inline.c (can_inline_edge_p): Control inlining based on
+   function's visibility. 
+

gcc/testsuite/ChangeLog:

+2018-09-11  Qing Zhao  
+
+   * gcc.dg/inline_visibility_1.c: New test.
+   * gcc.dg/inline_visibility_2.c: New test.
+



add-a-new-option-for-inline-visibility.patch
Description: Binary data


Re: [RFC] GCC support for live-patching

2018-10-22 Thread Qing Zhao
Hi, 

thanks for the comments.

> 
> thanks for the proposal. The others have already expressed some of my 
> worries and remarks, but I think it would be only right to write them 
> again. Especially since I am part of the team responsible for 
> implementation and maintenance of live patches here at SUSE, we use kGraft 
> and we prepare everything manually (compared to kpatch and ksplice).

One question here,  what’s the major benefit to prepare the patches manually? 
> 
>> 1. A study of Kernel live patching schemes.
>> 
>> Three major kernel live patching tools:  https://lwn.net/Articles/734765/
>> 
>> * ksplice:   http://www.ksplice.com/doc/ksplice.pdf
>> * kpatch:https://lwn.net/Articles/597123/
>>https://github.com/dynup/kpatch
>> * kGraft:
>> https://pdfs.semanticscholar.org/presentation/af4c/895aa3fef0cc2b501317aaec9d91ba2d704c.pdf
>> 
>> In the above, ksplice and kpatch can automatically generate binary patches 
>> as following:
>> 
>>   * a collection of tools which convert a source diff patch to a patch
>> module. They work by compiling the kernel both with and without the source
>> patch, comparing the binaries, and generating a binary patch module which 
>> includes new binary versions of the functions to be replaced.
>> 
>> on the other hand, kGraft offers a way to create patches entirely by hand. 
>> The source of the patch is a single C file, easy to review, easy to
>> maintain. 
>> 
>> In addition to kGraft, there are other live patching tools that prefer
>> creating patches by hand for the similar reason. 
>> 
>> The compiler support is mainly for the above live patching tools that create 
>> patches entirely by hand. the major purpose is:
>> 
>> * control patch code size and debug complexity;
>> * keep good run time performance;
>> 
>> 2. the major problems of compiler in live patching:
>> 
>> For the live patching schemes that create patches by hand, when patching 
>> one function, there might a list of functions that will be impacted by 
>> this patched function due to compiler optimization/analyses (mainly IPA
>> optimization/analyses), a complete patch will include the patched function
>> and all impacted functions. Usually, there are two major factors to be
>> considered in such live patching schemes:
>> 
>> * patch code size, one major factor is the length of the list 
>> of impacted functions;
>> * run time performance.
>> 
>> If we want to control the patch code size, to make the list of impacted 
>> functions minimum, we have to disable corresponding compiler optimizations 
>> as much as possible.
> 
> Andi already talked about it and I, too, do not understand your worry 
> about patch code size. First, it has never been so bad here. Yes, 
> sometimes the function closure gets bigger due to optimizations and 
> inlining. I've considered it as nothing else than a lack of better 
> tooling, because it is indeed something which could be improved a lot. 
> Nicolai (CCed) works on a potential solution. It is also one of the topics 
> at LPC miniconf in Vancouver.
> 
> Second, the idea to disable inlining would not fly at SUSE. I can't 
> imagine to even propose it. The kernel heavily relies on the feature. The 
> optimizations are a different story and some of those certainly could be 
> disabled with no harm caused.
> 
> So let me ask, what is your motivation behind this? Is there a real 
> problem you're trying to solve? It may have been mentioned somewhere and I 
> missed it.

the major functionality we want is:   to Only enable static inlining for live 
patching for one 
of our internal customers.   the major purpose is to control the patch code 
size explosion and
debugging complexity due to too much inlining of global functions for the 
specific application.

therefore, I proposed the multiple level of control for -flive-patching to 
satisfy multiple request from 
different users. 

So far, from the feedback, I see that among the 4 levels of control,   none, 
only-inline-static, inline,
and inline-clone,   “none” and “inline” are NOT needed at all.

however,  -flive-patching = [only-inline-static | inline-clone] are necessary.

> 
>> On the other hand, in order to keep good run time performance, we need to 
>> keep the compiler optimization as much as possible. 
>> 
>> So, there should be some tradeoff between these two factors. 
>> 
>> The following are two major categories of compiler optimizations 
>> we should considered:
>> 
>> A. compiler optimizations/analyses that extract ipa info from
>> a routine's body, and use such info to guide other optimization.
>> 
>> Since the body of the routine might be changed for live patching, 
>> the ipa info extracted from the body of the routine also changes,
>> as a result, all the routines that directly or indirectly utilize 
>> the ipa info from this routine are in the list of the impacted 
>> routines.  
>> 
>> Most of the IPA analyses and optimization belong to this category. 
>> 
>> Although theoretically the 

Re: [RFC] GCC support for live-patching

2018-10-23 Thread Qing Zhao


> On Oct 23, 2018, at 4:11 AM, Miroslav Benes  wrote:
>> 
>> One question here,  what’s the major benefit to prepare the patches 
>> manually? 
> 
> I could almost quote what you wrote below. It is a C file, easy to review 
> and maintain. You have everything "under control". It allows to implement 
> tricky hacks easily by hand if needed.

Okay, I see. 

another question here:

>From my understanding of the live patching creation from Nicolai’s email, the 
>patch includes:

1. initial patched functions;
2. all the callers of any patched function if it’s been 
inlined/cloned/optimized;
3. recursively copy any needed cpp macro, type, or
  functions definition and add references to data objects with static
  storage duration.

during review and maintain procedure, are all the above 3 need to be reviewed 
and maintained?

>>> 
>>> So let me ask, what is your motivation behind this? Is there a real 
>>> problem you're trying to solve? It may have been mentioned somewhere and I 
>>> missed it.
>> 
>> the major functionality we want is:   to Only enable static inlining for 
>> live patching for one 
>> of our internal customers.   the major purpose is to control the patch code 
>> size explosion and
>> debugging complexity due to too much inlining of global functions for the 
>> specific application.
> 
> I hoped for more details, but ok.
at this time, this is the details I have. I can ask more if more details are 
needed.

> 
>> therefore, I proposed the multiple level of control for -flive-patching to 
>> satisfy multiple request from 
>> different users. 
>> 
>> So far, from the feedback, I see that among the 4 levels of control,   none, 
>> only-inline-static, inline,
>> and inline-clone,   “none” and “inline” are NOT needed at all.
>> 
>> however,  -flive-patching = [only-inline-static | inline-clone] are 
>> necessary.
>> 
>>> 
 
 3. Details of the proposal:
>>> 
>>> This sounds awfully complicated. Especially when there is a dumping option 
>>> in GCC thanks to Martin. What information do you miss there? We could 
>>> improve the analysis tool. So far, it has given us all the info we need.
>> 
>> Yes, it’s TRUE that the tool Martin wrote should serve the same purpose. 
>> nothing new from this
>> new GCC option -flive-patch-list compared to Martin’s tool.
>> 
>> However,  by simply adding this new GCC’s option, we can simplify the whole 
>> procedure for helping
>> live-patching. by only running  GCC with the new added options once, we can 
>> get the impacted function list
>> at the same time. No need to run another tool anymore.   
> 
> I probably do not understand completely. I thought that using the 
> option you would "preprocess" everything during the kernel build and then 
> you'd need a tool to get the impacted function list for a given function. 
> In that case, Martin's work is more than sufficient.
> 
> Now I think you meant to run GCC with a given function, build everything 
> and the list. Iteratively for every to-be-patched function. It does not 
> sound better to me.

there might be misunderstanding among us for this part.  Let me explain my 
understanding first:

1. with martin’s tool, there are two steps to get the impacted function list 
for patched functions:

Step1,  build kernel with GCC with -fdump-ipa-clones + a bunch of options 
to disable bunch of ipa optimizations. ;
Step2,  using the tool kgraft-ipa-analysis.py to analyze the dumped file 
from step1 to report the impacted function list.

2. with the new functionality of the GCC proposed in this proposal, 
-flive-patching -flive-patch-list

Step1,  build kernel with GCC with  -flive-patching -flive-patch-list

then gcc will automatically disable the unsafe ipa optimizations and report 
the impacted function list with the safe ipa optimizations. 

compare 1 and 2,  I think that 2 is better and much more convenient than 1. 
another benefit from 2 is:

if later we want more ipa optimization to be On for live-patching for the 
runtime performance purpose, we can expand it easily to include those
ipa optimization and at the same time report the additional impacted function 
list with the new ipa optimizations. 

however, for 1,  this will be not easy to be extended. 

do I miss anything here?

> 
>> this is the major benefit from this new option.
>> 
>> anyway, if most of the people think that this new option is not necessary, I 
>> am fine to delete it. 
>>> 
>>> In the end, I'd be more than happy with what has been proposed in this 
>>> thread by the others. To have a way to guarantee that GCC would not apply 
>>> an optimization that could potentially destroy our effort to livepatch a 
>>> running system.
>> 
>> So, the major functionality you want from GCC is:
>> 
>> -flive-patching=inline-clone
>> 
>> Only enable inlining and all optimizations that internally create clone,
>> for example, cloning, ipa-sra, partial inlining, etc; disable all 
>> other IPA optimizations/analyses.
>> 
>> As a result, 

Re: GCC options for kernel live-patching (Was: Add a new option to control inlining only on static functions)

2018-11-05 Thread Qing Zhao
>>> 
  - ipa-pta (disabled by default, -fno-ipa-pta)
  - ipa-reference (list of accessed/modified global vars), disable by 
 -fno-ipa-refernece
  - stack alignment requirements (no flag to disable)
>>> 
>>> Would it be possible to add flag for it? Can you please point to a location 
>>> where
>>> the optimization happen?
> 
> In expand_call
> 
>  /* Figure out the amount to which the stack should be aligned.  */
>  preferred_stack_boundary = PREFERRED_STACK_BOUNDARY;
>  if (fndecl)
>{
>  struct cgraph_rtl_info *i = cgraph_node::rtl_info (fndecl);
>  /* Without automatic stack alignment, we can't increase preferred
> stack boundary.  With automatic stack alignment, it is
> unnecessary since unless we can guarantee that all callers will
> align the outgoing stack properly, callee has to align its
> stack anyway.  */
>  if (i
>  && i->preferred_incoming_stack_boundary
>  && i->preferred_incoming_stack_boundary < preferred_stack_boundary)
>preferred_stack_boundary = i->preferred_incoming_stack_boundary;
>}
> 

As I checked, in the above, i->preferred_incoming_stack_boundary is set to 
non-zero
when "decl_binds_to_current_def_p ()” is TRUE as the following: (in 
rest_of_clean_state()
of final.c)

  /* We can reduce stack alignment on call site only when we are sure that
 the function body just produced will be actually used in the final
 executable.  */
  if (decl_binds_to_current_def_p (current_function_decl))
{
  unsigned int pref = crtl->preferred_stack_boundary;
  if (crtl->stack_alignment_needed > crtl->preferred_stack_boundary)
pref = crtl->stack_alignment_needed;
  cgraph_node::rtl_info (current_function_decl)
->preferred_incoming_stack_boundary = pref;
}

It looks like that “decl_binds_to_current_def_p()”  will be FALSE when the 
routine
is live-patched, right? 
So, should we disable all the places that guided by 
“decl_binds_to_current_def_p()”? 

thanks.

Qing




Re: [PATCH][RFC] Come up with -flive-patching master option.

2018-11-11 Thread Qing Zhao
Hi,


> On Nov 10, 2018, at 2:51 AM, Martin Liška  wrote:
> 
> On 11/9/18 6:43 PM, Qing Zhao wrote:
>> Hi, Martin,
>> 
>> thanks a lot for the previous two new options for live-patching. 
>> 
>> 
>> I have two more questions below:
> 
> Hello.
> 
>> 
>> 1. do we still need new options to disable the following:
>>   A. unreachable code/variable removal? 
> 
> I hope it's guarded with newly added option -fipa-reference-addressable. 
> Correct me
> if I'm wrong.

The followings are some previous discussions on this:

“
>>> 
>>> We perform discovery of functions/variables with no address taken and
>>> optimizations that are not valid otherwise such as duplicating them
>>> or doing skipping them for alias analysis (no flag to disable)
>> 
>> Can you be please more verbose here? What optimizations do you mean?

See ipa_discover_readonly_nonaddressable_vars. If addressable bit is
cleared we start analyzing uses of the variable via ipa_reference or so.
If writeonly bit is set, we start removing writes to the variable and if
readonly bit is set we skip any analysis about whether vairable changed.
“

the new -fipa-reference-addressable is to control the above,  seems not the 
unreachable code/variable removal?

> 
>>   B. Visibility changes with -flto and/or -fwhole-program?
> 
> The options are not used in linux kernel, thus I didn't consider.

Okay, I see.

> 
>> 
>> 2. for this new patch, could you please explain a little bit more on the 
>> problem?
> 
> We want to enable a single option that will disable all possible (and future) 
> optimizations
> that influence live patching.

Okay, I see.

I am also working on a similar option as yours, but make the -flive-patching as 
two level control:

+flive-patching
+Common RejectNegative Alias(flive-patching=,inline-clone)
+
+flive-patching=
+Common Report Joined RejectNegative Enum(live_patching_level) 
Var(flag_live_patching) Init(LIVE_NONE)
+-flive-patching=[inline-only-static|inline-clone]  Control optimizations 
to provide a safe comp for live-patching purpose.

the implementation for -flive-patching=inline-clone (the default) is exactly as 
yours,  the new level -flive-patching=inline-only-static
is to only enable inlining of static function for live patching, which is 
important for multiple-processes live patching to control memory
consumption. 

(please see my 2nd version of the -flive-patching proposal).

I will send out my complete patch in another email.

thanks.

Qing




Re: [PATCH][RFC] Come up with -flive-patching master option.

2018-11-09 Thread Qing Zhao
Hi, Martin,

thanks a lot for the previous two new options for live-patching. 


I have two more questions below:

1. do we still need new options to disable the following:
   A. unreachable code/variable removal? 
   B. Visibility changes with -flto and/or -fwhole-program?

2. for this new patch, could you please explain a little bit more on the 
problem?

Thanks.

Qing

> On Nov 9, 2018, at 9:33 AM, Martin Liška  wrote:
> 
> Hi.
> 
> After I added 2 new options, I would like to include a new master option.
> It's minimal version which only disables optimizations that we are aware of
> and can potentially cause problems for live-patching.
> 
> Martin
> <0001-Come-up-with-fvectorized-functions.patch>



Re: [PATCH][RFC] Come up with -flive-patching master option.

2018-11-12 Thread Qing Zhao


> On Nov 12, 2018, at 2:53 AM, Martin Liška  wrote:
> 
>> 
>> Okay, I see.
>> 
>> I am also working on a similar option as yours, but make the -flive-patching 
>> as two level control:
>> 
>> +flive-patching
>> +Common RejectNegative Alias(flive-patching=,inline-clone)
>> +
>> +flive-patching=
>> +Common Report Joined RejectNegative Enum(live_patching_level) 
>> Var(flag_live_patching) Init(LIVE_NONE)
>> +-flive-patching=[inline-only-static|inline-clone]  Control 
>> optimizations to provide a safe comp for live-patching purpose.
>> 
>> the implementation for -flive-patching=inline-clone (the default) is exactly 
>> as yours,  the new level -flive-patching=inline-only-static
>> is to only enable inlining of static function for live patching, which is 
>> important for multiple-processes live patching to control memory
>> consumption. 
>> 
>> (please see my 2nd version of the -flive-patching proposal).
>> 
>> I will send out my complete patch in another email.
> 
> Hi, sure, works for me. Let's make 2 level option.

thank you.

I will send the patch tomorrow.

Qing
> 
> Martin



Re: [PATCH][RFC] Come up with -flive-patching master option.

2018-11-13 Thread Qing Zhao
Hi,

> On Nov 13, 2018, at 1:18 PM, Miroslav Benes  wrote:
> 
>> Attached is the patch for new -flive-patching=[inline-only-static | 
>> inline-clone] master option.
>> 
>> '-flive-patching=LEVEL'
>> Control GCC's optimizations to provide a safe compilation for
>> live-patching.  Provides multiple-level control on how many of the
>> optimizations are enabled by users' request.  The LEVEL argument
>> should be one of the following:
>> 
>> 'inline-only-static'
>> 
>>  Only enable inlining of static functions, disable all other
>>  ipa optimizations/analyses.  As a result, when patching a
>>  static routine, all its callers need to be patches as well.
>> 
>> 'inline-clone'
>> 
>>  Only enable inlining and all optimizations that internally
>>  create clone, for example, cloning, ipa-sra, partial inlining,
>>  etc.; disable all other ipa optimizations/analyses.  As a
>>  result, when patching a routine, all its callers and its
>>  clones' callers need to be patched as well.
> 
> Based on our previous discussion I assume that "clone" optimizations are 
> safe (for LP) and the others are not. Anyway I'd welcome a note mentioning 
> that disabled optimizations are dangerous for LP.

actually, I don’t think that those disabled optimizations are “dangerous” for 
live-patching. one of the major reasons we disable them
is because that currently the compiler does NOT provide a good way to compute 
the impacted function list for those optimizations.
therefore, we disable them at this time. 

many of them could be enabled too if the compiler can report the impacted 
function list accurately in the future.



> 
> I know it may be the same for you, but it is not for me as a GCC user. 
> "internally create clone" sounds very... well, internal. It does not 
> describe the option much for ordinary user whow has no knowledge about GCC 
> internals.
> 
> So could you rephrase it a bit, please?

I tried to make this clear. please see the following:

'-flive-patching=LEVEL'
 Control GCC's optimizations to provide a safe compilation for
 live-patching.

 If the compiler's optimization uses a function's body or
 information extracted from its body to optimize/change another
 function, the latter is called an impacted function of the former.
 If a function is patched, its impacted functions should be patched
 too.

 The impacted functions are decided by the compiler's
 interprocedural optimizations.  For example, inlining a function
 into its caller, cloning a function and changing its caller to call
 this new clone, or extracting a function's pureness/constness
 information to optimize its direct or indirect callers, etc.

 Usually, the more ipa optimizations enabled, the larger the number
 of impacted functions for each function.  In order to control the
 number of impacted functions and computed the list of impacted
 function easily, we provide control to partially enable ipa
 optimizations on two different levels.

 The LEVEL argument should be one of the following:

 'inline-only-static'

  Only enable inlining of static functions, disable all other
  interprocedural optimizations/analyses.  As a result, when
  patching a static routine, all its callers need to be patches
  as well.

 'inline-clone'

  Only enable inlining and cloning optimizations, which includes
  inlining, cloning, interprocedural scalar replacement of
  aggregates and partial inlining.  Disable all other
  interprocedural optimizations/analyses.  As a result, when
  patching a routine, all its callers and its clones' callers
  need to be patched as well.

 When -flive-patching specified without any value, the default value
 is "inline-clone".

 This flag is disabled by default.


> 
>> When -flive-patching specified without any value, the default value
>> is "inline-clone".
>> 
>> This flag is disabled by default.
>> 
>> let me know your comments and suggestions on the implementation.
> 
> I compared it to Martin's patch and ipa-icf-variables is not covered in 
> yours (I may have missed something).

Yes, you are right. I added this into my patch.

I am attaching the new patch here.



flive-patching.patch
Description: Binary data


> 
> Thanks,
> Miroslav



[PATCH][RFC] Come up with -flive-patching master option.

2018-11-13 Thread Qing Zhao
Hi,


Attached is the patch for new -flive-patching=[inline-only-static | 
inline-clone] master option.

'-flive-patching=LEVEL'
 Control GCC's optimizations to provide a safe compilation for
 live-patching.  Provides multiple-level control on how many of the
 optimizations are enabled by users' request.  The LEVEL argument
 should be one of the following:

 'inline-only-static'

  Only enable inlining of static functions, disable all other
  ipa optimizations/analyses.  As a result, when patching a
  static routine, all its callers need to be patches as well.

 'inline-clone'

  Only enable inlining and all optimizations that internally
  create clone, for example, cloning, ipa-sra, partial inlining,
  etc.; disable all other ipa optimizations/analyses.  As a
  result, when patching a routine, all its callers and its
  clones' callers need to be patched as well.

 When -flive-patching specified without any value, the default value
 is "inline-clone".

 This flag is disabled by default.

let me know your comments and suggestions on the implementation.

thanks a lot.

Qing



flive-patching.patch
Description: Binary data


> On Nov 12, 2018, at 4:29 PM, Qing Zhao  wrote:
> 
> 
>> On Nov 12, 2018, at 2:53 AM, Martin Liška  wrote:
>> 
>>> 
>>> Okay, I see.
>>> 
>>> I am also working on a similar option as yours, but make the 
>>> -flive-patching as two level control:
>>> 
>>> +flive-patching
>>> +Common RejectNegative Alias(flive-patching=,inline-clone)
>>> +
>>> +flive-patching=
>>> +Common Report Joined RejectNegative Enum(live_patching_level) 
>>> Var(flag_live_patching) Init(LIVE_NONE)
>>> +-flive-patching=[inline-only-static|inline-clone]  Control 
>>> optimizations to provide a safe comp for live-patching purpose.
>>> 
>>> the implementation for -flive-patching=inline-clone (the default) is 
>>> exactly as yours,  the new level -flive-patching=inline-only-static
>>> is to only enable inlining of static function for live patching, which is 
>>> important for multiple-processes live patching to control memory
>>> consumption. 
>>> 
>>> (please see my 2nd version of the -flive-patching proposal).
>>> 
>>> I will send out my complete patch in another email.
>> 
>> Hi, sure, works for me. Let's make 2 level option.
> 
> thank you.
> 
> I will send the patch tomorrow.
> 
> Qing
>> 
>> Martin
> 



Re: GCC options for kernel live-patching (Was: Add a new option to control inlining only on static functions)

2018-10-03 Thread Qing Zhao


> On Oct 3, 2018, at 4:04 AM, Jan Hubicka  wrote:
> 
>> 
>> That was promised to be done by Honza Hubička. He's very skilled in IPA 
>> optimizations and he's aware
>> of optimizations that cause troubles for live-patching.
> 
> :) I am not sure how skilful I am, but here is what I arrived to.
> 
> We have transformations that are modeled as clonning, which are
>  - inlining  (can't be disabled completely because of always inline, but 
> -fno-inline
>does most of stuff)
>  - cloning (disabled via -fno-ipa-cp)
>  - ipa-sra (-fno-ipa-sra)
>  - splitting (-fno-partial-inlining)
> These should play well with Martin's tracking code

In addition to the above IPA transformations that cloned a routine, the 
following RTL transformation will clone a routine, too:

reorder-blocks-and-partition

this optimization will create .cold routines based on profiling feedback.

looks like that we also need to include this transformation into this category?

> 
> We propagate info about side effects of function:
>  - function attribute discovery (pure, const, nothrow, malloc)
>Some of this can be disabled by -fno-ipa-pure-const, but not all
>of it.  Nothrow does not have flag but it is obviously not a concern
>for C++
>  - ipa-pta (disabled by default, -fno-ipa-pta)
>  - ipa-reference (list of accessed/modified global vars), disable by 
> -fno-ipa-refernece
>  - stack alignment requirements (no flag to disable)
>  - inter-procedural register allocation (-fno-ipa-ra)


> 
> We perform discovery of functions/variables with no address taken and
> optimizations that are not valid otherwise such as duplicating them
> or doing skipping them for alias analysis (no flag to disable)
> 
> Identical code folding merges function bodies that are semanticaly equivalent
> and thus one can't patch one without patching another, -fno-ipa-icf
> 
> Unreachable code/variable removal may be concern too (no flag to disable)
> 
> Write only global variable discovery (no flag to dosable)
> 
> Visibility changes with -flto and/or -fwhole-program
> 
> We also have profile propagation (discovery of cuntions used only in cold 
> regions,
> but that I guess is only performance issue not correctness)
> No flag to disable

are all the above analyzes guarded by AVAIL_INTERPOSABLE right now?
If NOT, should they be guarded by AVAIL_INTERPOSABLE?

Qing

> 
> Honza
> 
>> 
>> Martin
>> 
>>> 
>>> thanks.
>>> 
>>> Qing
>>> 
>> 



Re: GCC options for kernel live-patching (Was: Add a new option to control inlining only on static functions)

2018-10-02 Thread Qing Zhao


> On Oct 2, 2018, at 9:55 AM, Martin Liška  wrote:
 
 Affected functions: 5
   __ilog2_u64/132 (include/linux/log2.h:40:5)
   ablkcipher_request_alloc/1639 (include/linux/crypto.h:979:82)
   ablkcipher_request_alloc.constprop.8/3198 (include/linux/crypto.h:979:82)
   helper_rfc4106_decrypt/3007 (arch/x86/crypto/aesni-intel_glue.c:1016:12)
   helper_rfc4106_encrypt/3006 (arch/x86/crypto/aesni-intel_glue.c:939:12)
 [..skipped..]
 
 
 if we want to patch the function “fls64/63”,  what else functions we need 
 to patch, too? my guess is:
>>> 
>>> Hi.
>>> 
>>> Yes, 'Affected functions' is exactly the list of functions you want to 
>>> patch.
>>> 
 
 **all the callers:
 __ilog2_u64/132
 ablkcipher_request_alloc/1639
 helper_rfc4106_decrypt/3007
 helper_rfc4106_encrypt/3006 
 **and:
 ablkcipher_request_alloc.constprop.8/3198
 is the above correct?
 
 how to generate patch for ablkcipher_request_alloc.constprop.8/3198? since 
 it’s not a function in the source code?
>>> 
>>> Well, it's a 'static inline' function in a header file thus the function 
>>> will be inlined in all usages.
>>> In this particular case there's no such caller function, so you're fine.
>> 
>> So, for cloned functions, you have to analyze them case by case manually to 
>> see their callers?
> 
> No, the tool should provide complete list of affected functions.

So,  the tool will provide the callers of the cloned routine? then we will 
patch the callers of the cloned routine, Not the cloned routine itself?

> 
>> why not just disable ipa-cp or ipa-sra completely?
> 
> Because the optimizations create function clones, which are trackable with my 
> tool
> and one knows then all affected functions.
Okay. I see. 
> 
> You can disable the optimizations, but you'll miss some performance benefit 
> provide
> by compiler.
> 
> Note that as Martin Jambor mentioned in point 2) there are also IPA 
> optimizations that
> do not create clones. These should be listed and eventually disabled for 
> kernel live
> patching.

Yes, such IPA analyzes should be disabled.  we need to identify a complete list 
of such analyzes.

thanks.

Qing



Re: GCC options for kernel live-patching (Was: Add a new option to control inlining only on static functions)

2018-10-02 Thread Qing Zhao


> On Oct 2, 2018, at 3:33 AM, Martin Jambor  wrote:
> 
> Hi,
> 
> my apologies for being terse, I'm in a meeting.
> 
> On Mon, Oct 01 2018, Qing Zhao wrote:
>> Hi, Martin,
>> 
>> I have studied a little more on
>> 
>> https://github.com/marxin/kgraft-analysis-tool/blob/master/README.md 
>> <https://github.com/marxin/kgraft-analysis-tool/blob/master/README.md>
>> 
>> in the Section “Usages”, from the example, we can see:
>> 
>> the tool will report a list of affected functions for a function that will 
>> be patched.
>> In this list, it includes all callers of the patched function, and the 
>> cloned functions from the patched function due to ipa const-propogation or 
>> ipa sra. 
>> 
>> My question:
>> 
>> what’s the current action to handle the cloned functions from the
>> patched function due to ipa const-proposation or ipa sra, etc?
> 
> If we want to patch an inlined, cloned, or IPA-SRAed function, we also
> patch all of its callers.

take the example from the link:

$ gcc /home/marxin/Programming/linux/aesni-intel_glue.i -O2 -fdump-ipa-clones -c
$ ./kgraft-ipa-analysis.py aesni-intel_glue.i.000i.ipa-clones

[..skipped..]
Function: fls64/63 (./arch/x86/include/asm/bitops.h:479:90)
  inlining to: __ilog2_u64/132 (include/linux/log2.h:40:5)
inlining to: ablkcipher_request_alloc/1639 (include/linux/crypto.h:979:82)
  constprop: ablkcipher_request_alloc.constprop.8/3198 
(include/linux/crypto.h:979:82)
inlining to: helper_rfc4106_decrypt/3007 
(arch/x86/crypto/aesni-intel_glue.c:1016:12)
inlining to: helper_rfc4106_encrypt/3006 
(arch/x86/crypto/aesni-intel_glue.c:939:12)

  Affected functions: 5
__ilog2_u64/132 (include/linux/log2.h:40:5)
ablkcipher_request_alloc/1639 (include/linux/crypto.h:979:82)
ablkcipher_request_alloc.constprop.8/3198 (include/linux/crypto.h:979:82)
helper_rfc4106_decrypt/3007 (arch/x86/crypto/aesni-intel_glue.c:1016:12)
helper_rfc4106_encrypt/3006 (arch/x86/crypto/aesni-intel_glue.c:939:12)
[..skipped..]


if we want to patch the function “fls64/63”,  what else functions we need to 
patch, too? my guess is:

**all the callers:
__ilog2_u64/132
ablkcipher_request_alloc/1639
helper_rfc4106_decrypt/3007
helper_rfc4106_encrypt/3006 
**and:
ablkcipher_request_alloc.constprop.8/3198
is the above correct?

how to generate patch for ablkcipher_request_alloc.constprop.8/3198? since it’s 
not a function in the source code?

Qing

> 
>> 
>> since those cloned functions are NOT in the source code level, how to 
>> generate the patches for the cloned functions? how to guarantee that after 
>> the patched function is changed, the same ipa const-propogation or ipa
>> sra will still happened?
> 
> You don't.
> 
> Martin
> 
>> 



Re: GCC options for kernel live-patching (Was: Add a new option to control inlining only on static functions)

2018-10-02 Thread Qing Zhao


> On Oct 2, 2018, at 9:02 AM, Martin Liška  wrote:
> 
> On 10/2/18 3:28 PM, Qing Zhao wrote:
>> 
>>> On Oct 2, 2018, at 3:33 AM, Martin Jambor  wrote:
>>> 
>>> Hi,
>>> 
>>> my apologies for being terse, I'm in a meeting.
>>> 
>>> On Mon, Oct 01 2018, Qing Zhao wrote:
>>>> Hi, Martin,
>>>> 
>>>> I have studied a little more on
>>>> 
>>>> https://github.com/marxin/kgraft-analysis-tool/blob/master/README.md 
>>>> <https://github.com/marxin/kgraft-analysis-tool/blob/master/README.md>
>>>> 
>>>> in the Section “Usages”, from the example, we can see:
>>>> 
>>>> the tool will report a list of affected functions for a function that will 
>>>> be patched.
>>>> In this list, it includes all callers of the patched function, and the 
>>>> cloned functions from the patched function due to ipa const-propogation or 
>>>> ipa sra. 
>>>> 
>>>> My question:
>>>> 
>>>> what’s the current action to handle the cloned functions from the
>>>> patched function due to ipa const-proposation or ipa sra, etc?
>>> 
>>> If we want to patch an inlined, cloned, or IPA-SRAed function, we also
>>> patch all of its callers.
>> 
>> take the example from the link:
>> 
>> $ gcc /home/marxin/Programming/linux/aesni-intel_glue.i -O2 
>> -fdump-ipa-clones -c
>> $ ./kgraft-ipa-analysis.py aesni-intel_glue.i.000i.ipa-clones
>> 
>> [..skipped..]
>> Function: fls64/63 (./arch/x86/include/asm/bitops.h:479:90)
>>  inlining to: __ilog2_u64/132 (include/linux/log2.h:40:5)
>>inlining to: ablkcipher_request_alloc/1639 (include/linux/crypto.h:979:82)
>>  constprop: ablkcipher_request_alloc.constprop.8/3198 
>> (include/linux/crypto.h:979:82)
>>inlining to: helper_rfc4106_decrypt/3007 
>> (arch/x86/crypto/aesni-intel_glue.c:1016:12)
>>inlining to: helper_rfc4106_encrypt/3006 
>> (arch/x86/crypto/aesni-intel_glue.c:939:12)
>> 
>>  Affected functions: 5
>>__ilog2_u64/132 (include/linux/log2.h:40:5)
>>ablkcipher_request_alloc/1639 (include/linux/crypto.h:979:82)
>>ablkcipher_request_alloc.constprop.8/3198 (include/linux/crypto.h:979:82)
>>helper_rfc4106_decrypt/3007 (arch/x86/crypto/aesni-intel_glue.c:1016:12)
>>helper_rfc4106_encrypt/3006 (arch/x86/crypto/aesni-intel_glue.c:939:12)
>> [..skipped..]
>> 
>> 
>> if we want to patch the function “fls64/63”,  what else functions we need to 
>> patch, too? my guess is:
> 
> Hi.
> 
> Yes, 'Affected functions' is exactly the list of functions you want to patch.
> 
>> 
>> **all the callers:
>> __ilog2_u64/132
>> ablkcipher_request_alloc/1639
>> helper_rfc4106_decrypt/3007
>> helper_rfc4106_encrypt/3006 
>> **and:
>> ablkcipher_request_alloc.constprop.8/3198
>> is the above correct?
>> 
>> how to generate patch for ablkcipher_request_alloc.constprop.8/3198? since 
>> it’s not a function in the source code?
> 
> Well, it's a 'static inline' function in a header file thus the function will 
> be inlined in all usages.
> In this particular case there's no such caller function, so you're fine.

So, for cloned functions, you have to analyze them case by case manually to see 
their callers?
why not just disable ipa-cp or ipa-sra completely?

Qing
> 



Re: GCC options for kernel live-patching (Was: Add a new option to control inlining only on static functions)

2018-10-01 Thread Qing Zhao
Hi, Martin,

I have studied a little more on

https://github.com/marxin/kgraft-analysis-tool/blob/master/README.md 


in the Section “Usages”, from the example, we can see:

the tool will report a list of affected functions for a function that will be 
patched.
In this list, it includes all callers of the patched function, and the cloned 
functions from the patched function due to ipa const-propogation or ipa sra. 

My question:

what’s the current action to handle the cloned functions from the patched 
function due to ipa const-proposation or ipa sra, etc?

since those cloned functions are NOT in the source code level, how to generate 
the patches for the cloned functions? how to guarantee that after 
the patched function is changed, the same ipa const-propogation or ipa sra will 
still happened? 

a little confused here.

thanks.

Qing
> On Sep 27, 2018, at 7:19 AM, Martin Jambor  wrote:
> 
> Hi,
> 
> (this message is a part of the thread originating with
> https://gcc.gnu.org/ml/gcc-patches/2018-09/msg01018.html)
> 
> On Thu, Sep 27 2018, Jan Hubicka wrote:
 If you make this to be INTERPOSABLE (which means it can be replaced by 
 different
 implementation by linker and that is probably what we want for live 
 patching)
 then also inliner, ipa-sra and other optimization will give up on these.
>>> 
>>> do you suggest that to set the global function as AVAIL_INTERPOSABLE when 
>>> -finline-only-static 
>>> is present? then we should avoid all issues?
>> 
>> It seems to be reasonable direction I think, because it is what really 
>> happens
>> (well AVAIL_INTERPOSABLE still does not assume that the interposition will
>> happen at runtime, but it is an approximation and we may introduce something 
>> like
>> AVAIL_RUNTIME_INTERPOSABLE if there is need for better difference).
>> I wonder if -finline-only-static is good name for the flag though, because it
>> does a lot more than that.  Maybe something like -flive-patching?
>> How much is this all tied to one particular implementation of the feature?
> 
> We have just had a quick discussion with two upstream maintainers of
> Linux kernel live-patching about this and the key points were:
> 
> 1. SUSE live-patch creators (and I assume all that use the upstream
>   live-patching method) use Martin Liska's (somewhat under-documented)
>   -fdump-ipa-clones option and a utility he wrote
>   (https://github.com/marxin/kgraft-analysis-tool) to deal with all
>   kinds of inlining, IPA-CP and generally all IPA optimizations that
>   internally create a clone.  The tool tells them what happened and
>   also lists all callers that need to be live-patched.
> 
> 2. However, there is growing concern about other IPA analyses that do
>   not create a clone but still affect code generation in other
>   functions.  Kernel developers have identified and disabled IPA-RA but
>   there is more of them such as IPA-modref analysis, stack alignment
>   propagation and possibly quite a few others which extract information
>   from one function and use it a caller or perhaps even some
>   almost-unrelated functions (such as detection of read-only and
>   write-only static global variables).
> 
>   The kernel live-patching community would welcome if GCC had an option
>   that could disable all such optimizations/analyses for which it
>   cannot provide a list of all affected functions (i.e. which ones need
>   to be live-patched if a particular function is).
> 
>   I assume this is orthogonal to the proposed -finline-only-static
>   option, but the above approach seems superior in all respects.
> 
> 3. The community would also like to be involved in these discussions,
>   and therefore I am adding live-patch...@vger.kernel.org to CC.  On a
>   related note, they will also have a live-patching mini-summit at the
>   Linux Plumbers conference in Vancouver in November where they plan to
>   discuss what they would like GCC to provide.
> 
> Thanks,
> 
> Martin
> 



Re: [PATCH][Middle-end][Version 3]Add a new option to control inlining only on static functions

2018-09-19 Thread Qing Zhao
thanks, Martin.

> On Sep 18, 2018, at 5:26 PM, Martin Sebor  wrote:
>> 
>> gcc/ChangeLog
>> 
>> +2018-09-18  Qing Zhao  mailto:qing.z...@oracle.com>>
>> +
>> +* cif-code.def (FUNCTION_EXTERN): New CIFCODE.
>> +* common.opt (-finline-only-static): New option.
>> +* doc/invoke.texi: Document -finline-only-static.
>> +* ipa-inline.c (can_inline_edge_p): Control inlining based on
>> +function's visibility.
> 
> Probably "linkage" would be a more fitting term here.
Okay.

> 
>> 
>> gcc/testsuite/ChangeLog
>> 
>> +2018-09-18  Qing Zhao  mailto:qing.z...@oracle.com>>
>> +
>> +* gcc.dg/inline_only_static.c: New test.
>> +
> 
> 
> diff --git a/gcc/cif-code.def b/gcc/cif-code.def
> index 19a7621..64b2b1a 100644
> --- a/gcc/cif-code.def
> +++ b/gcc/cif-code.def
> @@ -132,6 +132,12 @@
> DEFCIFCODE(USES_COMDAT_LOCAL, CIF_FINAL_ERROR,
> DEFCIFCODE(ATTRIBUTE_MISMATCH, CIF_FINAL_ERROR,
>  N_("function attribute mismatch"))
> 
> +/* We can't inline because the user requests only inlining static function
> +   but the function is external visible.  */
> 
> I suspect you meant: "only static functions" (plural) and
> "the function has external linkage" (as defined in the C and
> C++ standards).
Okay.

> 
> +DEFCIFCODE(FUNCTION_EXTERN, CIF_FINAL_ERROR,
> +N_("function is external visible when the user requests only"
> +   " inlining static"))
> +
> 
> Here as well: either "function has external linkage" or "function
> is extern.”
Okay.
> 
> ===
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index ec12711..b6b0db5 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @ -8066,6 +8067,15 @@
> having large chains of nested wrapper functions.
> 
> Enabled by default.
> 
> +@item -finline-only-static
> +@opindex finline-only-static
> +By default, GCC inlines functions without considering whether they are static
> +or not. This flag guides inliner to only inline static functions.
> 
> Guides "the inliner" (missing article).
Okay.

> 
> +This option has any effect only when inlining itself is turned on by the
> +-finline-functions or -finline-small-fiunctions.
> 
> "by the -f... options."  (Missing "options") and
> -finline-small-functions (note the spelling of functions).

Okay.
> 
> +
> +Off by default.
> 
> I think the customary way to word it is: "Disabled by default."
> or "The finline-only-static option/flag is disabled/off by default”

Okay.

Qing
> 
> Martin



[PATCH][Middle-end][Version 4]Add a new option to control inlining only on static functions

2018-09-21 Thread Qing Zhao
Hi, this is the 4th version of the patch.

mainly address Martin’s comments on some spelling issues.

I have tested the patch on both x86 and aarch64, no issue.

Okay for commit?

thanks.

Qing.

gcc/ChangeLog

+2018-09-20  Qing Zhao  
+
+   * cif-code.def (FUNCTION_EXTERN): New CIFCODE.
+   * common.opt (-finline-only-static): New option.
+   * doc/invoke.texi: Document -finline-only-static.
+   * ipa-inline.c (can_inline_edge_p): Control inlining based on
+   function's linkage. 

gcc/testsuite/ChangeLog

+2018-09-20  Qing Zhao  
+
+   * gcc.dg/inline_only_static.c: New test.
+



New-inline-only-static.patch
Description: Binary data


Re: [PATCH][Middle-end][Version 4]Add a new option to control inlining only on static functions

2018-09-26 Thread Qing Zhao


> On Sep 26, 2018, at 8:24 AM, Richard Biener  wrote:
> 
> On Wed, 26 Sep 2018, Jason Merrill wrote:
> 
>> On Fri, Sep 21, 2018 at 11:12 AM, Qing Zhao  wrote:
>>> Hi, this is the 4th version of the patch.
>>> 
>>> mainly address Martin’s comments on some spelling issues.
>>> 
>>> I have tested the patch on both x86 and aarch64, no issue.
>>> 
>>> Okay for commit?
>>> 
>>> thanks.
>>> 
>>> Qing.
>>> 
>>> gcc/ChangeLog
>>> 
>>> +2018-09-20  Qing Zhao  
>>> +
>>> +   * cif-code.def (FUNCTION_EXTERN): New CIFCODE.
>>> +   * common.opt (-finline-only-static): New option.
>>> +   * doc/invoke.texi: Document -finline-only-static.
>>> +   * ipa-inline.c (can_inline_edge_p): Control inlining based on
>>> +   function's linkage.
>> 
>> I would suggest "internal" rather than "static" in general.  So
>> 
>> +N_("function has external linkage when the user requests only"
>> +   " inlining functions with internal linkage"))
>> 
>> +Inline functions only if they have internal linkage.
>> 
>> +@item -finline-only-internal
>> +@opindex finline-only-internal
>> +By default, GCC inlines functions without considering their linkage.
>> +This flag guides the inliner to only inline functions with internal linkage.
>> +This option has any effect only when inlining itself is turned on by the
>> +-finline-functions or -finline-small-functions options.
>> 
>> This should also mention whether it applies to functions explicitly
>> declared inline; I assume it does not.
> 
> IIRC he explicitely wanted 'static' not 'hidden' linkage.

Yes.  that’s the intention. It will be very helpful to compile the application 
with ONLY inlining
STATIC functions for online-patching purpose. 

>  Not sure
> what 'internal' would mean in this context.
> 
> But then the implementation looks at callee->externally_visible which
> matches hidden visibility... externally_visible is probably not
> the very best thing to look at depending on what we intend to do.

from the comments of callee->externally_visible in cgraph.h:

  /* Set when function is visible by other units.  */
  unsigned externally_visible : 1;

My understand of this “externally_visible” is:

this function is visible from other compilation units.

Is this correct?

> 
> What about 'static' functions with their address taken?
> 

such functions should still be inlined if -finline-only-static is specified. 

is there any issue with this?

> Note you shouldn't look at individual cgraph_node fields but use
> some of the accessors with more well-constrained semantics.
> Why are you not simply checking !TREE_PUBLIC?

Yes, looks like that TREE_PUBLIC(node->decl) might be better for this purpose.

> 
> Honza might be able to suggest one.

thanks.

Qing
> 
> Richard.



Re: [PATCH][Middle-end][Version 4]Add a new option to control inlining only on static functions

2018-09-26 Thread Qing Zhao


> On Sep 26, 2018, at 9:45 AM, Jeff Law  wrote:
> 
> On 9/26/18 7:38 AM, Jason Merrill wrote:
>> On Wed, Sep 26, 2018 at 9:24 AM, Richard Biener  wrote:
>>> IIRC he explicitely wanted 'static' not 'hidden' linkage.  Not sure
>>> what 'internal' would mean in this context.
>> 
>> I mean internal linkage as in the C and C++ standards.
> Since this is primarily for kernel hot patching, I think we're looking
> to restrict inlining to functions that have visibility limited to a
> compilation unit.

Yes. that’s the intention.

-finline-only-static will ONLY inline functions that are visible within the 
current compilation unit.

Qing
> 
> Qing, can you confirm that either way?
> 
> Jeff



Re: [PATCH][Middle-end][Version 4]Add a new option to control inlining only on static functions

2018-09-26 Thread Qing Zhao
Alexander,

thanks for the questions.

Yes, we had some discussion on the questions you raised during the review of 
the initial patch back to 9/11/2018.
please take a look at those discussions at:

https://gcc.gnu.org/ml/gcc-patches/2018-09/msg00549.html 
<https://gcc.gnu.org/ml/gcc-patches/2018-09/msg00549.html>
https://gcc.gnu.org/ml/gcc-patches/2018-09/msg00787.html 
<https://gcc.gnu.org/ml/gcc-patches/2018-09/msg00787.html>

and let me know if those discussion still does not answer your questions.

Qing

> On Sep 26, 2018, at 6:12 AM, Alexander Monakov  wrote:
> 
> On Fri, 21 Sep 2018, Qing Zhao wrote:
>> +2018-09-20  Qing Zhao  
>> +
>> +* cif-code.def (FUNCTION_EXTERN): New CIFCODE.
>> +* common.opt (-finline-only-static): New option.
>> +* doc/invoke.texi: Document -finline-only-static.
>> +* ipa-inline.c (can_inline_edge_p): Control inlining based on
>> +function's linkage. 
> 
> Note, I am not a reviewer.
> 
> In my opinion, there's a problem with the patch that it looks like an ad-hoc,
> incomplete solution. You said you need this change to help with building
> livepatching-capable kernels, but it's not clear what exactly the issue with
> inlining non-static functions is. Can you describe how the workflow looks like
> so code duplication due to inlining static functions is not an issue, but
> inlining non-static functions is a problem? Does using existing
> -fno-inline-functions flag achieve something useful for your usecase?
> 
> Please always make it clear what problem the patch is intended to solve and 
> help
> reviewers see the connection between the problem and your solution. Look how 
> the
> "XY problem" effect applies partially in this situation.
> 
> https://en.wikipedia.org/wiki/XY_problem
> http://xyproblem.info/
> 
> Alexander



Re: [PATCH][Middle-end][Version 4]Add a new option to control inlining only on static functions

2018-09-26 Thread Qing Zhao


> On Sep 26, 2018, at 10:06 AM, Jan Hubicka  wrote:
> 
>> On Wed, Sep 26, 2018 at 10:45 AM, Jeff Law  wrote:
>>> On 9/26/18 7:38 AM, Jason Merrill wrote:
 On Wed, Sep 26, 2018 at 9:24 AM, Richard Biener  wrote:
> IIRC he explicitely wanted 'static' not 'hidden' linkage.  Not sure
> what 'internal' would mean in this context.
 
 I mean internal linkage as in the C and C++ standards.
>> 
>>> Since this is primarily for kernel hot patching, I think we're looking
>>> to restrict inlining to functions that have visibility limited to a
>>> compilation unit.
>> 
>> Right, which is internal linkage.
>> 
>> C11: "Within one translation unit, each declaration of an identifier
>> with internal linkage denotes the same object or function."
>> C++17: "When a name has internal linkage, the entity it denotes can be
>> referred to by names from other scopes in the same translation unit."
>> 
>> Or perhaps we want to say "not external linkage", i.e. !TREE_PUBLIC as
>> richi suggested.
> 
> I am not quite sure if we can relate visibility flags we have at this stage
> to visibility in source languge in very coherent way.  They change a lot with
> LTO and we may want to make this option incompatible with LTO, but even w/o
> we can turn function static that previously wasn’t.

Looks like both LTO and whole_program need to be made incompatible with 
-finline-only-static. 
from my study of the function “cgraph_externally_visible_p”, comdat functions 
can ONLY be turned into
static when either “in_lto_p” or “whole_program” is true. 


> For example comdat that was cloned by IPA-SRA. See can_be_local_p and
> comdat_can_be_unshared_p predicates.  Similar problem happens to clones 
> created
> by ipa-cp.
> 
> I guess we want to disable localization and cloning in this case as well.
> I wonder what else.

Yes, I think we should make -finline-only-static incompatible with cloning and 
tree-sra too.

Qing
> 
> Honza



Re: [PATCH][Middle-end][Version 4]Add a new option to control inlining only on static functions

2018-09-26 Thread Qing Zhao


> On Sep 26, 2018, at 11:02 AM, Jan Hubicka  wrote:
> 
>>> Not sure
>>> what 'internal' would mean in this context.
>>> 
>>> But then the implementation looks at callee->externally_visible which
>>> matches hidden visibility... externally_visible is probably not
>>> the very best thing to look at depending on what we intend to do.
>> 
>> from the comments of callee->externally_visible in cgraph.h:
>> 
>>  /* Set when function is visible by other units.  */
>>  unsigned externally_visible : 1;
>> 
>> My understand of this “externally_visible” is:
>> 
>> this function is visible from other compilation units.
>> 
>> Is this correct?
> 
> Yes, but the catch is that we may "localize" previously visible function into
> invisible by clonning, see my previous email.

Yes, these are the cases that we need to take care carefully.

>>> Note you shouldn't look at individual cgraph_node fields but use
>>> some of the accessors with more well-constrained semantics.
>>> Why are you not simply checking !TREE_PUBLIC?
>> 
>> Yes, looks like that TREE_PUBLIC(node->decl) might be better for this 
>> purpose.
> 
> externally_visible is better approximation, TREE_PUBLIC is false i.e. for
> weakref aliases.  But we need to check places where externally visible 
> functions
> are turned to local.

So, do you suggest to make all the optimizations that might turn external 
visible functions into local 
as incompatible with -finline-only-static?


(a possible list of such optimizations include cloning, ipa-cp, ipa-sra, etc?)

> 
> What about comdats in general? What is the intended behaviour? If you prevent
> inlining them completely, C++ programs will be very slow.
> Also we still can analyze the body and derive some facts about them to drive
> optimization (such as discovering that they are const/pure etc). Do we want
> to disable this too?

my current understanding of comdat functions is:  they are external visible 
initially, but under
some special situation, (for example, comdat_can_be_unshared_p + in_lto_p), 
they can 
be converted to local by the compiler. 

so, for most of the comdat functions, they are NOT inlined by the current 
implementation if -finline-only-static
is specified due to they are externally_visible.  

I am not sure whether it’s necessary to change this current behavior for comdat 
functions, I assume that most
of the applications for online patching are not written by C++,  not sure 
whether this assumption is reasonable or not?

any suggestion?

thanks.

Qing



> Honza



  1   2   3   4   5   6   7   8   9   10   >