Re: [RFC][PATCH]Merge VEC_COND_EXPR into MASK_STORE after loop vectorization

2018-11-20 Thread Renlin Li

Hi Richard,

On 11/14/2018 02:59 PM, Richard Biener wrote:

On Fri, Nov 9, 2018 at 4:49 PM Renlin Li  wrote:


Hi Richard,

On 11/09/2018 11:48 AM, Richard Biener wrote:

On Thu, Nov 8, 2018 at 5:55 PM Renlin Li  wrote:


Hi Richard,



I don't see the masked load here on x86_64 btw. (I don't see
if-conversion generating a load).
I guess that's again when store-data-races are allowed that it uses a
RMW cycle and vectorization
generating the masked variants for the loop-mask.  Which means for SVE
if-conversion should
prefer the masked-store variant even when store data races are allowed?


Yes, it looks like, for SVE, the masked-store variant is preferred even when
store data races are allowed.
This decision is made in if-cvt.

mask_store needs a pointer, and if it is created from an array access, we need
to make sure the data reference analysis can properly analyse the relationship
between the array reference and the pointer reference, so that no versioned
loop is generated during loop vectorization.
(This is a general improvement, and could be done in a different patch?)







I was wondering whether we can implement

l = [masked]load;
tem = cond ? x : l;
masked-store = tem;

pattern matching in a regular pass - forwprop for example.  Note the
load doesn't need to be masked,
correct?  In fact if it is masked you need to make sure the
conditional never accesses parts that
are masked in the load, no?  Or require the mask to be the same as
that used by the store.  But then
you still cannot simply replace the store mask with a new mask
generated from the conditional?


Yes, this would require the mask for the load and the store to be the same.
This matches the pattern before loop vectorization.
The mask here is the loop mask, which ensures we stay bounded by the number of
iterations.

The new mask is (original mask & condition mask) (example shown above).
In this case, fewer lanes will be stored.
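
To make the mask combination concrete, here is a minimal scalar sketch in C. It is only a model under stated assumptions: LANES, mask_load, mask_store and the other helper names are hypothetical and are not GCC internals. It checks that, in a single thread, load + select + store under the loop mask leaves memory in the same state as one store under (loop mask & condition mask).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define LANES 4  /* hypothetical fixed vector length for the model */

/* Scalar model of .MASK_LOAD: inactive lanes are read as zero.  */
static void mask_load(int dst[LANES], const int *ptr, const bool mask[LANES])
{
  assert(ptr != NULL);
  for (int i = 0; i < LANES; i++)
    dst[i] = mask[i] ? ptr[i] : 0;
}

/* Scalar model of .MASK_STORE: only active lanes touch memory.  */
static void mask_store(int *ptr, const bool mask[LANES], const int val[LANES])
{
  assert(ptr != NULL);
  for (int i = 0; i < LANES; i++)
    if (mask[i])
      ptr[i] = val[i];
}

/* Original form: load, select, then store, all under the loop mask.  */
static void store_via_select(int *ptr, const bool loop_mask[LANES],
                             const bool cond[LANES], const int val[LANES])
{
  int x[LANES], y[LANES];
  mask_load(x, ptr, loop_mask);
  for (int i = 0; i < LANES; i++)
    y[i] = cond[i] ? val[i] : x[i];  /* Y = VEC_COND (cond, VAL, X) */
  mask_store(ptr, loop_mask, y);
}

/* Merged form: store VAL directly under (loop_mask & cond).  */
static void store_via_merged_mask(int *ptr, const bool loop_mask[LANES],
                                  const bool cond[LANES], const int val[LANES])
{
  bool merged[LANES];
  for (int i = 0; i < LANES; i++)
    merged[i] = loop_mask[i] && cond[i];
  mask_store(ptr, merged, val);
}

/* Returns 1 when both forms leave memory in the same state.  */
static int forms_agree(const int init[LANES], const bool loop_mask[LANES],
                       const bool cond[LANES], const int val[LANES])
{
  int a[LANES], b[LANES];
  memcpy(a, init, sizeof a);
  memcpy(b, init, sizeof b);
  store_via_select(a, loop_mask, cond, val);
  store_via_merged_mask(b, loop_mask, cond, val);
  return memcmp(a, b, sizeof a) == 0;
}
```

The model only compares final memory contents in one thread. The select form still writes back lanes where the condition is false, which is exactly the store-data-race difference discussed in this thread.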

It is possible to do that in forwprop.
I could try to integrate the change into it if that is the right place for it.

As the pattern is initially generated by the loop vectorizer, I made the change
right after it, before the pattern gets converted into other forms. For
example, forwprop will transform the original code into:

vect__2.4_29 = vect_cst__27 + { 1, ... };
_16 = (void *) ivtmp.13_25;
_2 = [base: _16, offset: 0B];
vect__ifc__24.7_33 = .MASK_LOAD (_2, 4B, loop_mask_32);
_28 = vect_cst__34 != { 0, ... };
_35 = .COND_ADD (_28, vect_cst__27, { 1, ... }, vect__ifc__24.7_33);
vect__ifc__26.8_36 = _35;
.MASK_STORE (_2, 4B, loop_mask_32, vect__ifc__26.8_36);
ivtmp_41 = ivtmp_40 + POLY_INT_CST [4, 4];
next_mask_43 = .WHILE_ULT (ivtmp_41, 16, { 0, ... });
ivtmp.13_15 = ivtmp.13_25 + POLY_INT_CST [16, 16];

This makes the pattern matching less straightforward.


Ah, that's because of the .COND_ADDs (I wonder about the copy that's
left - forwprop should eliminate copies).  Note the pattern-matching
could go in the

   /* Apply forward propagation to all stmts in the basic-block.
  Note we update GSI within the loop as necessary.  */

loop which comes before the match.pd pattern matching so you'd
still see the form without the .COND_ADD I think.

There _is_ precedent for some masked-store post-processing
in the vectorizer (optimize_mask_stores), not that I like that
very much either.  Eventually those can be at least combined...

Thanks for your suggestion. Indeed, .COND_ADD is generated later, in the
fold_stmt function.

I have updated the patch to use the "forward-propagation" style. It starts from
VEC_COND and forward-propagates it into MASK_STORE when a specific pattern is
found.

 X = MASK_LOAD (PTR, -, MASK)
 VAL = ...
 Y = VEC_COND (cond, VAL, X)
 MASK_STORE (PTR, -, MASK, Y)


That said, I prefer the forwprop place for any pattern matching
and the current patch needs more comments to understand
what it is doing (the DCE it does is IMHO premature).  You
should also modify the masked store in-place rather than
building a new one.  I don't like how you need to use
find_data_references_in_stmt, can't you simply compare
the address and size arguments?  


find_data_references_in_stmt is used because the two data references are created
as two new SSA_NAMEs from the same scalar pointer by the loop vectorizer.
I cannot directly compare the addresses, as they are complicated by loop
information.

By moving the functionality into forwprop, the complications are removed by the
optimizers in between.
This makes a simple comparison possible.



It should also work for
a non-masked load I guess and thus apply to non-SVE if
we manage to feed the masked store with another conditional.


You are right, a non-masked load is a load with a mask of all ones.
As long as the store mask is a subset of the load mask, and both access
the same address,
we could do this combining. (I haven't added this yet as I don't have a test case
to test it)
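
The subset condition can be sketched as a small legality check in C. This is only an illustration of the rule stated above; can_merge and mask_is_subset are hypothetical names rather than functions from the patch, and a NULL load mask stands for a non-masked load (a mask of all ones).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True when every lane active in STORE_MASK is also active in LOAD_MASK.  */
static bool mask_is_subset(const bool *store_mask, const bool *load_mask,
                           size_t lanes)
{
  for (size_t i = 0; i < lanes; i++)
    if (store_mask[i] && !load_mask[i])
      return false;
  return true;
}

/* Hypothetical check: the load can be folded into the masked store when
   both access the same address and the store never writes a lane that
   the load did not read.  LOAD_MASK == NULL models a non-masked load.  */
static bool can_merge(const int *load_ptr, const bool *load_mask,
                      const int *store_ptr, const bool *store_mask,
                      size_t lanes)
{
  assert(store_mask != NULL);
  if (load_ptr != store_ptr)
    return false;
  if (load_mask == NULL)
    return true;  /* all-ones load mask: any store mask is a subset */
  return mask_is_subset(store_mask, load_mask, lanes);
}
```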
This probably indicates there are more cases we

Re: [PATCH] Come up with gcc/testsuite/g++.target/i386/i386.dg and move there some tests.

2018-11-16 Thread Renlin Li

Hi Martin,

It seems the change has not been checked in yet?

Thanks,
Renlin

On 10/22/2018 01:22 PM, Martin Liška wrote:

On 10/22/18 12:09 PM, Jakub Jelinek wrote:

On Mon, Oct 22, 2018 at 12:04:23PM +0200, Martin Liška wrote:

I noticed that before the tests were run with all of
-std=(c|gnu)++(98|11|14), now with no explicit -std option.  I wonder if
this is an issue.

Rainer



Hello.

I guess that should not be a problem.


If we want that, it is a matter of (untested):
--- gcc/testsuite/g++.target/i386/i386.exp.jj   2018-10-10 10:50:48.352235231 +0200
+++ gcc/testsuite/g++.target/i386/i386.exp  2018-10-22 12:08:56.546807996 +0200
@@ -35,8 +35,8 @@ dg-init
 clearcap-init
 
 # Main loop.
-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.C]] \
-	"" $DEFAULT_CXXFLAGS
+g++-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.C]] \
+	"" $DEFAULT_CXXFLAGS
 
 # All done.
 clearcap-finish

Jakub



Thank you Jakub, works for me for:
$ make check -k RUNTESTFLAGS="i386.exp"

I can confirm that:
grep '^PASS' ./gcc/testsuite/g++/g++.log | wc -l


changed from 61 to 183.

I'm going to install the patch.

Martin




Re: [Patch, libstdc++.exp]Update the usage of cached result, rebuild nlocale wrapper for each variant.

2018-11-16 Thread Renlin Li




On 11/16/2018 01:20 PM, Jonathan Wakely wrote:

On 16/11/18 10:42 +, Renlin Li wrote:

Hi all,


Please remember that all patches for libstdc++ must be sent to the
libstdc++ list, as documented at https://gcc.gnu.org/lists.html
Just CCing me is not enough.


Hi Jonathan,

I knew I missed something!
Thanks, committed.

Regards,
Renlin



[Patch, libstdc++.exp]Update the usage of cached result, rebuild nlocale wrapper for each variant.

2018-11-16 Thread Renlin Li

Hi all,

Tejas noticed that libstdc++.exp currently builds the nlocale driver
(libstdc++.exp:check_v3_target_namedlocale())
only once per test run, irrespective of the number of variants in the
site.exp file.
For example, if we have more than one variant for different target profiles, i.e.

/-mthumb/-march=armv8-a/-mfpu=crypto-neon-fp-armv8/-mfloat-abi=hard
/-mthumb/-mcpu=cortex-m0

nlocale.cpp is built once and reused for all the variants.
This is incorrect, as the same binary may not work on all target profiles; e.g.
nlocale.x built for the Arm A-profile
may not work on M-profile targets. nlocale needs to be rebuilt for each
variant in site.exp. This patch fixes that.

Meanwhile, it updates all uses of cached values to the new method.
This is similar to the recent change in gcc/testsuite/lib/target-support.exp.
A global dictionary is used to store a property for a particular target,
instead of the per-procedure "check the target and update" approach.
This factors the common code out of each procedure and reduces the length of
the libstdc++.exp file.


Tested on arm-none-eabi with the following variants in site.exp:

/-marm/-march=armv7-a/-mfpu=vfpv3-d16/-mfloat-abi=softfp
/-mthumb/-march=armv8-a/-mfpu=crypto-neon-fp-armv8/-mfloat-abi=hard
/-mthumb/-mcpu=cortex-m0
/-mthumb/-mcpu=cortex-m3
/-mthumb/-mcpu=cortex-m4
/-mthumb/-mcpu=cortex-m7
/-mthumb/-mcpu=cortex-m23
/-mthumb/-march=armv8-m.main


Tested on native x86_64,

   make check-target-libstdc++-v3

with the default unix variant. There is no change in the results.


Okay to commit?

Regards,
Renlin


gcc/libstdc++-v3/:
2018-11-16  Renlin Li  
Tejas Belagod  

testsuite/lib/libstdc++.exp (check_v3_target_prop_cached): New proc.
(check_v3_target): Use the check_v3_target_prop_cached.
diff --git a/libstdc++-v3/testsuite/lib/libstdc++.exp b/libstdc++-v3/testsuite/lib/libstdc++.exp
index 79d8e0130dcefdd8ccb67ad45f81ff12a3703600..7047b8f7b2233911445abaed54337bc46b37b7e5 100644
--- a/libstdc++-v3/testsuite/lib/libstdc++.exp
+++ b/libstdc++-v3/testsuite/lib/libstdc++.exp
@@ -688,31 +688,38 @@ proc v3-build_support { } {
 }
 }
 
-proc check_v3_target_fileio { } {
-global et_fileio_saved
-global et_fileio_target_name
-global tool
-global srcdir
+# Implement an target check for property PROP by invoking
+# the Tcl command ARGS and seeing if it returns true.
 
-if { ![info exists et_fileio_target_name] } {
-	set et_fileio_target_name ""
-}
+proc check_v3_target_prop_cached { prop args } {
+global et_cache
 
-# If the target has changed since we set the cached value, clear it.
-set current_target [current_target_name]
-if { $current_target != $et_fileio_target_name } {
-	verbose "check_v3_target_fileio: `$et_fileio_target_name'" 2
-	set et_fileio_target_name $current_target
-	if [info exists et_fileio_saved] {
-	verbose "check_v3_target_fileio: removing cached result" 2
-	unset et_fileio_saved
+set target [current_target_name]
+if {![info exists et_cache($prop,$target)]} {
+	verbose "check_v3_target_prop_cached $prop: checking $target" 2
+	if {[string is true -strict $args] || [string is false -strict $args]} {
+	error {check_v3_target_prop_cached condition already evaluated; did you pass [...] instead of the expected {...}?}
+	} else {
+	set code [catch {uplevel eval $args} result]
+	if {$code != 0 && $code != 2} {
+		verbose "check_v3_target_prop_cached $prop: evaluation failed for $target" 2
+		return -code $code $result
+	}
+	set et_cache($prop,$target) $result
 	}
+} else {
+	verbose "check_v3_target_prop_cached $prop $target: using cached result" 2
 }
 
-if [info exists et_fileio_saved] {
-	verbose "check_v3_target_fileio: using cached result" 2
-} else {
-	set et_fileio_saved 0
+set value $et_cache($prop,$target)
+verbose "check_v3_target_prop_cached $prop: returning $value for $target" 2
+return $value
+}
+
+proc check_v3_target_fileio { } {
+return [check_v3_target_prop_cached et_fileio {
+	global tool
+	global srcdir
 
 	# Set up, compile, and execute a C++ test program that tries to use
 	# the file functions
@@ -766,41 +773,19 @@ proc check_v3_target_fileio { } {
 	verbose "check_v3_target_fileio: status is <$status>" 2
 
 	if { $status == "pass" } {
-		set et_fileio_saved 1
+		return 1
 	}
 	} else {
 	verbose "check_v3_target_fileio: compilation failed" 2
 	}
-}
-return $et_fileio_saved
+	return 0
+}]
 }
 
 # Eventually we want C90/C99 determining and switching from this.
 proc check_v3_target_c_std { } {
-global et_c_std_saved
-global et_c_std_target_name
-global tool
-
-if { ![info exists et_c_std_target_name] } {
-	set et_c_std_target_name ""
-}
-
-# If the target has changed since we set the 

Re: [PATCH][LRA] Fix PR87899: r264897 cause mis-compiled native arm-linux-gnueabihf toolchain

2018-11-13 Thread Renlin Li

Hi Peter,

I can confirm that your patch fixes all the ICEs I saw with the
arm-linux-gnueabihf toolchain!
There are some differences in the test results, because I am comparing the
latest results against an old baseline.

I haven't tested it on a bare-metal toolchain yet, but I will do so to ensure
all related issues are fixed.

Thanks for fixing it!

Regards,
Renlin



On 11/12/2018 08:25 PM, Peter Bergner wrote:

On 11/12/18 6:25 AM, Renlin Li wrote:

I tried to build a native arm-linux-gnueabihf toolchain with the patch.
But I got the following ICE:


Ok, the issue was a problem in handling the src reg from a register copy.
I thought I could just remove it from the dead_set, but forgot that the
updating of the program points looks at whether the pseudo is live or
not.  The change below on top of the previous patch fixes the ICE for me.
I now add the src reg back into pseudos_live before we process the insn's
input operands so it doesn't trigger a new program point being added.

Renlin and Jeff, can you apply this patch on top of the previous one
and see whether that is better?

Thanks.

Peter


--- gcc/lra-lives.c.orig    2018-11-12 14:15:18.257657911 -0600
+++ gcc/lra-lives.c 2018-11-12 14:08:55.978795092 -0600
@@ -934,6 +934,18 @@
  || sparseset_contains_pseudos_p (start_dying))
next_program_point (curr_point, freq);
  
+  /* If we removed the source reg from a simple register copy from the
+live set above, then add it back now so we don't accidentally add
+it to the start_living set below.  */
+  if (ignore_reg != NULL_RTX)
+   {
+ int ignore_regno = REGNO (ignore_reg);
+ if (HARD_REGISTER_NUM_P (ignore_regno))
+   SET_HARD_REG_BIT (hard_regs_live, ignore_regno);
+ else
+   sparseset_set_bit (pseudos_live, ignore_regno);
+   }
+
sparseset_clear (start_living);
  
/* Mark each used value as live.	*/

@@ -959,11 +971,6 @@
  
sparseset_and_compl (dead_set, start_living, start_dying);
  
-  /* If we removed the source reg from a simple register copy from the
-live set, then it will appear to be dead, but it really isn't.  */
-  if (ignore_reg != NULL_RTX)
-   sparseset_clear_bit (dead_set, REGNO (ignore_reg));
-
sparseset_clear (start_dying);
  
/* Mark early clobber outputs dead.  */




Re: [PATCH][LRA] Fix PR87899: r264897 cause mis-compiled native arm-linux-gnueabihf toolchain

2018-11-12 Thread Renlin Li

Hi Peter,

Thanks for the patch! It makes much more sense to me to split those functions
and use them separately.

I tried to build a native arm-linux-gnueabihf toolchain with the patch, but I
got the following ICE:

/home/renlin/try-new/./gcc/xgcc -B/home/renlin/try-new/./gcc/ -B/usr/local/arm-none-linux-gnueabihf/bin/ -B/usr/local/arm-none-linux-gnueabihf/lib/ 
-isystem /usr/local/arm-none-linux-gnueabihf/include -isystem /usr/local/arm-none-linux-gnueabihf/sys-include   -fno-checking -O2 -g -O0 -O2  -O2 -g 
-O0 -DIN_GCC-W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-format -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition 
-isystem ./include   -fPIC -fno-inline -g -DIN_LIBGCC2 -fbuilding-libgcc -fno-stack-protector   -fPIC -fno-inline -I. -I. -I../.././gcc 
-I../../../gcc/libgcc -I../../../gcc/libgcc/. -I../../../gcc/libgcc/../gcc -I../../../gcc/libgcc/../include  -DHAVE_CC_TLS  -o _negvdi2_s.o -MT 
_negvdi2_s.o -MD -MP -MF _negvdi2_s.dep -DSHARED -DL_negvdi2 -c ../../../gcc/libgcc/libgcc2.c

0x807eb3 lra(_IO_FILE*)
../../gcc/gcc/lra.c:2497
0x7c2755 do_reload
../../gcc/gcc/ira.c:5469
0x7c2c11 execute
../../gcc/gcc/ira.c:5653
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.
make[3]: *** [Makefile:916: _gcov_merge_icall_topn.o] Error 1
make[3]: *** Waiting for unfinished jobs
make[3]: *** [Makefile:916: _gcov_merge_single.o] Error 1
during RTL pass: reload
../../../gcc/libgcc/libgcov-driver.c: In function 
‘gcov_sort_icall_topn_counter’:
../../../gcc/libgcc/libgcov-driver.c:436:1: internal compiler error: in 
remove_some_program_points_and_update_live_ranges, at lra-lives.c:1172
436 | }
| ^
0x829189 remove_some_program_points_and_update_live_ranges
../../gcc/gcc/lra-lives.c:1172
0x829683 compress_live_ranges
../../gcc/gcc/lra-lives.c:1301
0x829d45 lra_create_live_ranges_1
../../gcc/gcc/lra-lives.c:1454
0x829d7d lra_create_live_ranges(bool, bool)
../../gcc/gcc/lra-lives.c:1466
0x807eb3 lra(_IO_FILE*)
../../gcc/gcc/lra.c:2497
0x7c2755 do_reload
../../gcc/gcc/ira.c:5469
0x7c2c11 execute
../../gcc/gcc/ira.c:5653
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.


Regards,
Renlin

On 11/12/2018 04:34 AM, Peter Bergner wrote:

Renlin, Jeff and Vlad: requests and questions for you below...

PR87899 shows another latent LRA bug exposed by my r264897 commit.
In the bugzilla report, we have the following rtl in LRA:

   (insn 1 (set (reg:SI 1 r1) (reg/f:SI 2040)))
...
   (insn 2 (set (mem/f/c:SI (pre_modify:SI (reg:SI 1 r1)
   (plus:SI (reg:SI 1 r1)
(const_int 12
(reg:SI 1048))
   (expr_list:REG_INC (reg:SI 1 r1)))
...
   

My earlier patch now sees the reg copy in insn "1" and correctly skips
adding a conflict between r1 and r2040 due to the copy.  However, insn "2"
updates r1 and r2040 is live across that update and so we should create
a conflict between them, but we currently do not and that leads to us
assigning r1 to one of r2040's reload pseudos which gets clobbered by
the r1 update in insn "2".

The reason a conflict was never added between r1 and r2040 is that LRA
skips INOUT operands when computing conflicts and so misses the definition
of r1 in insn "2" and so never adds conflicts for it.  The reason the code
skips the INOUT operands is that LRA doesn't want to create new program
points for INOUT operands, since unnecessary program points can slow down
remove_some_program_points_and_update_live_ranges.  This was all fine
before when we had conservative conflict info, but now we cannot ignore
INOUT operands.

The heart of the problem is that the {make,mark}_*_{live,dead} routines
update the liveness, conflict and program point information for operands.
My solution to the problem was to pull out the updating of the program point
info from {make,mark}_*_{live,dead} and have them only update liveness and
conflict information.  I then created a separate function that is used for
updating an operand's program points.  This allowed me to modify the insn
operand scanning to handle all operand types (IN, OUT and INOUT) and always
call the {make,mark}_*_{live,dead} functions for all operand types, while
only calling the new program point update function for IN and OUT operands.

This change then allowed me to remove the hacky handling of conflicts for
reg copies and instead use the more common method of removing the src reg
of a copy from the live set before handling the copy's definition, thereby
skipping the unwanted conflict.  Bonus! :-)
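
As an illustration of why skipping INOUT operands loses the r1/r2040 conflict, here is a toy backward liveness scan in C. It is a simplified model under my own assumptions, not the code in lra-lives.c: struct insn, scan_block and the small register numbers are hypothetical, and program points are not modelled at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NREGS 8  /* hypothetical tiny register file, one bit per register */

struct insn {
  uint32_t defs;    /* OUT operands */
  uint32_t uses;    /* IN operands */
  uint32_t inouts;  /* INOUT operands, e.g. a pre_modify address register */
  int copy_src;     /* source of a simple register copy, or -1 */
};

/* Record a symmetric conflict between REG and every register in LIVE.  */
static void add_conflicts(uint32_t conflicts[NREGS], int reg, uint32_t live)
{
  for (int r = 0; r < NREGS; r++)
    if (r != reg && (live & (1u << r))) {
      conflicts[reg] |= 1u << r;
      conflicts[r] |= 1u << reg;
    }
}

/* Scan the block backwards.  When HANDLE_INOUT is false, registers that
   are written through INOUT operands add no conflicts (the old behaviour
   described above).  CONFLICTS must be zero-initialised by the caller.  */
static void scan_block(const struct insn *insns, int n, uint32_t live_out,
                       bool handle_inout, uint32_t conflicts[NREGS])
{
  assert(n >= 0);
  uint32_t live = live_out;
  for (int i = n - 1; i >= 0; i--) {
    const struct insn *in = &insns[i];
    uint32_t written = in->defs | (handle_inout ? in->inouts : 0);
    uint32_t others = live;
    if (in->copy_src >= 0)
      others &= ~(1u << in->copy_src);  /* a copy's src does not conflict */
    for (int r = 0; r < NREGS; r++)
      if (written & (1u << r))
        add_conflicts(conflicts, r, others);
    live &= ~in->defs;              /* pure outputs are dead above the insn */
    live |= in->uses | in->inouts;  /* inputs and INOUTs are live above it */
  }
}

/* The two insns from the PR, with r1 -> 1, r2040 -> 2, r1048 -> 3.
   r2040 is assumed to be live after insn 2.  */
static bool r1_conflicts_with_r2040(bool handle_inout)
{
  enum { R1 = 1, P2040 = 2, P1048 = 3 };
  const struct insn block[] = {
    { 1u << R1, 1u << P2040, 0, P2040 },  /* insn 1: r1 = r2040 (copy) */
    { 0, 1u << P1048, 1u << R1, -1 },     /* insn 2: pre_modify of r1 */
  };
  uint32_t conflicts[NREGS] = { 0 };
  scan_block(block, 2, 1u << P2040, handle_inout, conflicts);
  return (conflicts[R1] >> P2040) & 1u;
}
```

In this model, r1_conflicts_with_r2040 (false) finds no conflict because the only definition of r1 that is looked at is the copy, while r1_conflicts_with_r2040 (true) records the conflict at the pre_modify update.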

This passes bootstrap and regtesting on powerpc64le-linux with no regressions.


Re: [RFC][PATCH]Merge VEC_COND_EXPR into MASK_STORE after loop vectorization

2018-11-09 Thread Renlin Li

Hi Richard,

On 11/09/2018 11:48 AM, Richard Biener wrote:

On Thu, Nov 8, 2018 at 5:55 PM Renlin Li  wrote:


Hi Richard,


*However*, after I rebased my patch on the latest trunk,
I got the following dump from ifcvt:
 [local count: 1006632961]:
# i_20 = PHI 
# ivtmp_18 = PHI 
a_10 = array[i_20];
_1 = a_10 & 1;
_2 = a_10 + 1;
_ifc__34 = _1 != 0 ? _2 : a_10;
array[i_20] = _ifc__34;
_4 = a_10 + 2;
_ifc__37 = _ifc__34 > 10 ? _4 : _ifc__34;
array[i_20] = _ifc__37;
i_13 = i_20 + 1;
ivtmp_5 = ivtmp_18 - 1;
if (ivtmp_5 != 0)
  goto ; [93.33%]
else
  goto ; [6.67%]

The redundant load is no longer generated, but you can still see the
unconditional store.


Yes, I fixed the redundant loads recently and indeed dead stores
remain (for the particular
testcase they would be easy to remove).


Right.




After loop vectorization, the following is generated (without my change):


Huh.  But that's not because of if-conversion but because SVE needs to
mask _all_
loop operations that are not safe to execute with the loop_mask!


vect_a_10.6_6 = .MASK_LOAD (vectp_array.4_35, 4B, loop_mask_7);
a_10 = array[i_20];
vect__1.7_39 = vect_a_10.6_6 & vect_cst__38;
_1 = a_10 & 1;
vect__2.8_41 = vect_a_10.6_6 + vect_cst__40;
_2 = a_10 + 1;
vect__ifc__34.9_43 = VEC_COND_EXPR ;
_ifc__34 = _1 != 0 ? _2 : a_10;
.MASK_STORE (vectp_array.10_45, 4B, loop_mask_7, vect__ifc__34.9_43);
vect__4.12_49 = vect_a_10.6_6 + vect_cst__48;
_4 = a_10 + 2;
vect__ifc__37.13_51 = VEC_COND_EXPR  vect_cst__50, 
vect__4.12_49, vect__ifc__34.9_43>;
_ifc__37 = _ifc__34 > 10 ? _4 : _ifc__34;
.MASK_STORE (vectp_array.14_53, 4B, loop_mask_7, vect__ifc__37.13_51);

With the old ifcvt code, my change here could improve it a little bit by
eliminating some redundant loads.
With the new code, it cannot improve it any further. I'll adjust the patch
based on the latest trunk.


So what does the patch change the above to?  The code has little to no
comments apart from a
small picture with code _before_ the transform...

It is like this:
  vect_a_10.6_6 = .MASK_LOAD (vectp_array.4_35, 4B, loop_mask_7);
  a_10 = array[i_20];
  vect__1.7_39 = vect_a_10.6_6 & vect_cst__38;
  _1 = a_10 & 1;
  vect__2.8_41 = vect_a_10.6_6 + vect_cst__40;
  _2 = a_10 + 1;
  _60 = vect__1.7_39 != vect_cst__42;
  vect__ifc__34.9_43 = VEC_COND_EXPR <_60, vect__2.8_41, vect_a_10.6_6>;
  _ifc__34 = _1 != 0 ? _2 : a_10;
  vec_mask_and_61 = _60 & loop_mask_7;
  .MASK_STORE (vectp_array.10_45, 4B, vec_mask_and_61, vect__2.8_41);
  vect__4.12_49 = vect_a_10.6_6 + vect_cst__48;
  _4 = a_10 + 2;
  vect__ifc__37.13_51 = VEC_COND_EXPR  vect_cst__50, 
vect__4.12_49, vect__ifc__34.9_43>;
  _ifc__37 = _ifc__34 > 10 ? _4 : _ifc__34;
  .MASK_STORE (vectp_array.14_53, 4B, loop_mask_7, vect__ifc__37.13_51);

As the loaded value is used later, it cannot be removed.

With the change, ideally, less data is stored.
However, it might generate more instructions:

1. The load cannot be eliminated. Apparently, your change eliminates most of
   the redundant loads.
   The rest are necessary or not easy to remove.
2. There is an additional AND instruction.

With a simpler test case like this:

static int array[100];
int test (int a, int i)
{
  for (unsigned i = 0; i < 16; i++)
    {
      if (a & 1)
        array[i] = a + 1;
    }
  return array[i];
}

The new code-gen will be:
  vect__2.4_29 = vect_cst__27 + vect_cst__28;
  _44 = vect_cst__34 != vect_cst__35;
  vec_mask_and_45 = _44 & loop_mask_32;
  .MASK_STORE (vectp_array.9_37, 4B, vec_mask_and_45, vect__2.4_29);

While the old one is:

  vect__2.4_29 = vect_cst__27 + vect_cst__28;
  vect__ifc__24.7_33 = .MASK_LOAD (vectp_array.5_30, 4B, loop_mask_32);
  vect__ifc__26.8_36 = VEC_COND_EXPR ;
  .MASK_STORE (vectp_array.9_37, 4B, loop_mask_32, vect__ifc__26.8_36);




I was wondering whether we can implement

   l = [masked]load;
   tem = cond ? x : l;
   masked-store = tem;

pattern matching in a regular pass - forwprop for example.  Note the
load doesn't need to be masked,
correct?  In fact if it is masked you need to make sure the
conditional never accesses parts that
are masked in the load, no?  Or require the mask to be the same as
that used by the store.  But then
you still cannot simply replace the store mask with a new mask
generated from the conditional?


Yes, this would require the mask for the load and the store to be the same.
This matches the pattern before loop vectorization.
The mask here is the loop mask, which ensures we stay bounded by the number of
iterations.

The new mask is (original mask & condition mask) (example shown above).
In this case, fewer lanes will be stored.

It is possible to do that in forwprop.
I could try to integrate the change into it if that is the right place for it.

As the pattern is initially generated by loop vectorizer, I did the change 
right after it before it got
converted into ot

[AARCH64][SVE]Add extract_last for mask/predicates mode register

2018-11-08 Thread Renlin Li

Hi all,

This is a follow-up to the patch described here:
https://gcc.gnu.org/ml/gcc-patches/2018-10/msg02016.html

Mask/predicate data can be used as general data.

The SVE ISA has no operation that directly extracts an element from a
predicate. The default code-gen for such a use is inefficient: it goes through
memory to reload the predicate into a scalar GPR.

Here, an EXTRACT_LAST pattern is added to support mask modes,
so that
EXTRACT_LAST (mask_1, mask_2)
expands to:
mov Z0, mask_2, 1
lastb W0, mask_1, Z0
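
The two-instruction expansion can be modelled in scalar C as follows. This is a sketch under assumptions: MAX_LANES and the function names are hypothetical, predicates are modelled as bool arrays, and the "no active lane" behaviour of lastb (taking the final element) reflects my reading of the SVE instruction description rather than anything stated in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LANES 64  /* hypothetical upper bound for the model */

/* Model of "mov Z0, mask_2, 1": active lanes become 1, inactive lanes 0.  */
static void pred_to_vector(int *z, const bool *pred, size_t lanes)
{
  for (size_t i = 0; i < lanes; i++)
    z[i] = pred[i] ? 1 : 0;
}

/* Model of lastb: return the element of Z at the last active lane of
   GOVERNING.  If no lane is active, fall back to the final element
   (assumed behaviour, see the note above).  */
static int lastb(const bool *governing, const int *z, size_t lanes)
{
  assert(lanes > 0);
  size_t last = lanes - 1;
  for (size_t i = 0; i < lanes; i++)
    if (governing[i])
      last = i;
  return z[last];
}

/* EXTRACT_LAST (mask_1, mask_2) on predicates: widen mask_2 into a data
   vector of 0/1, then extract under the control of mask_1.  */
static int extract_last_pred(const bool *mask_1, const bool *mask_2,
                             size_t lanes)
{
  int z[MAX_LANES];
  assert(lanes > 0 && lanes <= MAX_LANES);
  pred_to_vector(z, mask_2, lanes);
  return lastb(mask_1, z, lanes);
}
```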

The aarch64-sve tests are okay.
Okay to commit?

There might be more cases that need to be discovered/fixed.

Regards,
Renlin

gcc/ChangeLog:

2018-11-08  Renlin Li  

* config/aarch64/aarch64-sve.md (extract_last): Add new modes.
* config/aarch64/iterators.md (PREDV): predicate mode to vector mode 
mapping.
(predv): Likewise, lower case.

gcc/testsuite/ChangeLog:

2018-11-08  Renlin Li  

* gcc.target/aarch64/sve/pr87815.c: Update.

diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index 5cd591b94335cde2230decf632f65c0faf33c4de..0550a2bd4b1b552159fad298342876afdd34303b 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -615,6 +615,28 @@
lastb\t%0, %1, %2."
 )
 
+(define_insn_and_split "extract_last_"
+  [(set (match_operand: 0 "register_operand" "=r")
+	(unspec:
+	  [(match_operand:PRED_ALL 1 "register_operand" "Upl")
+	   (match_operand:PRED_ALL 2 "register_operand" "Upl")]
+	  UNSPEC_LASTB))
+   (clobber (match_scratch: 3 "=w"))]
+  "TARGET_SVE"
+  "#"
+  "&& true"
+  [(const_int 0)]
+  {
+if (GET_CODE (operands[3]) == SCRATCH)
+  operands[3] = gen_reg_rtx (mode);
+emit_insn (gen_aarch64_sve_dup_const (operands[3], operands[2],
+		 CONST1_RTX (mode),
+		 CONST0_RTX (mode)));
+emit_insn (gen_extract_last_ (operands[0], operands[1], operands[3]));
+DONE;
+  }
+)
+
 (define_expand "vec_duplicate"
   [(parallel
 [(set (match_operand:SVE_ALL 0 "register_operand")
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index a43956054e82aaf651fb45d0ff254b248c02c644..3f01c0f611173f9cdfcc150fc6c88141e7b7ebf8 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -698,8 +698,10 @@
 			(V4HF "HF") (V8HF  "HF") (VNx8HF  "HF")
 			(V2SF "SF") (V4SF  "SF") (VNx4SF  "SF")
 			(DF   "DF") (V2DF  "DF") (VNx2DF  "DF")
-			(SI   "SI") (HI"HI")
-			(QI   "QI")])
+			(SI   "SI")  (VNx16BI "QI")
+			(HI   "HI")  (VNx8BI  "HI")
+			(QI   "QI")  (VNx4BI  "SI")
+			 (VNx2BI  "DI")])
 
 ;; Define element mode for each vector mode (lower case).
 (define_mode_attr Vel [(V8QI "qi") (V16QI "qi") (VNx16QI "qi")
@@ -1134,6 +1136,12 @@
 			 (VNx16SI "vnx4bi") (VNx16SF "vnx4bi")
 			 (VNx8DI "vnx2bi") (VNx8DF "vnx2bi")])
 
+(define_mode_attr PREDV [(VNx16BI "VNx16QI") (VNx8BI "VNx8HI")
+			 (VNx4BI  "VNx4SI")  (VNx2BI "VNx2DI")])
+
+(define_mode_attr predv [(VNx16BI "vnx16qi") (VNx8BI "vnx8hi")
+			 (VNx4BI  "vnx4si")  (VNx2BI "vnx2di")])
+
 ;; ---
 ;; Code Iterators
 ;; ---
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr87815.c b/gcc/testsuite/gcc.target/aarch64/sve/pr87815.c
index 628cedb2acce82a86b61944eb6184d7fdbb2d656..82e73a8211a1f84d799be6f3f9137e296c59792c 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pr87815.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pr87815.c
@@ -1,5 +1,5 @@
-/* { dg-do compile { target aarch64_asm_sve_ok } } */
-/* { dg-options "-O3" } */
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-vect" } */
 int a, b, d;
 short e;
 
@@ -11,3 +11,6 @@ void f ()
   d = e && b;
 }
 }
+
+/* { dg-final { scan-tree-dump-times "EXTRACT_LAST" 1 "vect" } } */
+/* { dg-final { scan-assembler-times {lastb} 1 } } */



Re: [RFC][PATCH]Merge VEC_COND_EXPR into MASK_STORE after loop vectorization

2018-11-08 Thread Renlin Li

Hi Richard,

On 11/08/2018 12:09 PM, Richard Biener wrote:

On Thu, Nov 8, 2018 at 12:02 PM Renlin Li  wrote:


Hi all,

When allow-store-data-races is enabled, ifcvt prefers to generate a
conditional select and an unconditional store, converting certain if statements
into:

_ifc_1 = val
_ifc_2 = A[i]
val = cond? _ifc_1 : _ifc_2
A[i] = val

When the loop gets vectorized, this pattern is turned into
MASK_LOAD, VEC_COND_EXPR and MASK_STORE. This could be improved.


I'm somewhat confused - the vectorizer doesn't generate a masked store when
if-conversion didn't create one in the first place

In particular with allow-store-data-races=1 (what your testcase uses)
there are no
masked loads/stores generated at all.   So at least you need a better testcase
to motivate (one that doesn't load from array[i] so that we know the conditional
stores might trap).


Thanks for trying this. The test case is a little bit simple and artificial.
ifcvt won't generate a mask_store; instead it will generate an unconditional
store with allow-store-data-races=1.

My build is based on 25th Oct. I got the following IR from ifcvt with
aarch64-none-elf-gcc -S -march=armv8-a+sve -O2 -ftree-vectorize --param 
allow-store-data-races=1

   [local count: 1006632961]:
  # i_20 = PHI 
  # ivtmp_18 = PHI 
  a_10 = array[i_20];
  _1 = a_10 & 1;
  _2 = a_10 + 1;
  _ifc__32 = array[i_20];
  _ifc__33 = _2;
  _ifc__34 = _1 != 0 ? _ifc__33 : _ifc__32;
  array[i_20] = _ifc__34;
  prephitmp_8 = _1 != 0 ? _2 : a_10;
  _4 = a_10 + 2;
  _ifc__35 = array[i_20];
  _ifc__36 = _4;
  _ifc__37 = prephitmp_8 > 10 ? _ifc__36 : _ifc__35;
  array[i_20] = _ifc__37;
  i_13 = i_20 + 1;
  ivtmp_5 = ivtmp_18 - 1;
  if (ivtmp_5 != 0)
goto ; [93.33%]
  else
goto ; [6.67%]

*However*, after I rebased my patch on the latest trunk,
I got the following dump from ifcvt:
   [local count: 1006632961]:
  # i_20 = PHI 
  # ivtmp_18 = PHI 
  a_10 = array[i_20];
  _1 = a_10 & 1;
  _2 = a_10 + 1;
  _ifc__34 = _1 != 0 ? _2 : a_10;
  array[i_20] = _ifc__34;
  _4 = a_10 + 2;
  _ifc__37 = _ifc__34 > 10 ? _4 : _ifc__34;
  array[i_20] = _ifc__37;
  i_13 = i_20 + 1;
  ivtmp_5 = ivtmp_18 - 1;
  if (ivtmp_5 != 0)
goto ; [93.33%]
  else
goto ; [6.67%]

The redundant load is no longer generated, but you can still see the
unconditional store.
After loop vectorization, the following is generated (without my change):

  vect_a_10.6_6 = .MASK_LOAD (vectp_array.4_35, 4B, loop_mask_7);
  a_10 = array[i_20];
  vect__1.7_39 = vect_a_10.6_6 & vect_cst__38;
  _1 = a_10 & 1;
  vect__2.8_41 = vect_a_10.6_6 + vect_cst__40;
  _2 = a_10 + 1;
  vect__ifc__34.9_43 = VEC_COND_EXPR ;
  _ifc__34 = _1 != 0 ? _2 : a_10;
  .MASK_STORE (vectp_array.10_45, 4B, loop_mask_7, vect__ifc__34.9_43);
  vect__4.12_49 = vect_a_10.6_6 + vect_cst__48;
  _4 = a_10 + 2;
  vect__ifc__37.13_51 = VEC_COND_EXPR  vect_cst__50, 
vect__4.12_49, vect__ifc__34.9_43>;
  _ifc__37 = _ifc__34 > 10 ? _4 : _ifc__34;
  .MASK_STORE (vectp_array.14_53, 4B, loop_mask_7, vect__ifc__37.13_51);

With the old ifcvt code, my change here could improve it a little bit by
eliminating some redundant loads.
With the new code, it cannot improve it any further. I'll adjust the patch
based on the latest trunk.




So what I see (with store data races not allowed) from ifcvt is


When store data races are not allowed, we won't generate an unconditional
store. Instead, ifcvt generates a predicated store. That's what you showed
here.

As I mentioned, we could always make ifcvt generate mask_store, as it should
always be safe.
But I don't know the performance implications on other targets (I assume there
must be reasons why people wrote code to generate an unconditional store when
data races are allowed? My understanding is that this option allows the
compiler to optimize more aggressively).

The other reason is the data reference analysis. A versioned loop might be
created with a more complex test case.

Again, I need to rebase and check my patch against the latest trunk, and I
need to come up with a better test case.



[local count: 1006632961]:
   # i_20 = PHI 
   # ivtmp_18 = PHI 
   a_10 = array[i_20];
   _1 = a_10 & 1;
   _2 = a_10 + 1;
   _32 = _1 != 0;
   _33 = [i_20];
   .MASK_STORE (_33, 32B, _32, _2);
   prephitmp_8 = _1 != 0 ? _2 : a_10;
   _4 = a_10 + 2;
   _34 = prephitmp_8 > 10;
   .MASK_STORE (_33, 32B, _34, _4);
   i_13 = i_20 + 1;
   ivtmp_5 = ivtmp_18 - 1;
   if (ivtmp_5 != 0)

and what you want to do is merge

   prephitmp_8 = _1 != 0 ? _2 : a_10;
   _34 = prephitmp_8 > 10;

somehow?  But your patch mentions that _4 should be prephitmp_8 so
it wouldn't do anything here?


The change here adds a post-processing function to combine the VEC_COND_EXPR
expression into MASK_STORE, and delete related dead code.

I am a little bit conservative here.
I didn't change the default behavior of ifcvt to always generate MASK_STORE,
although it should be safe in all cases (allow or dis-allow store data rac

Re: [PATCH 2/2 v3][IRA,LRA] Fix PR86939, IRA incorrectly creates an interference between a pseudo register and a hard register

2018-11-08 Thread Renlin Li

Hi Peter,

On 11/08/2018 03:21 PM, Peter Bergner wrote:

On 11/8/18 4:57 AM, Renlin Li wrote:

I think I found the problem!

As described in the PR, a hard register is used in
a pre/post-modify expression. The hard register is live, but updated.
In this case, we should make it conflict with all pseudos live at
that point.  Does that make sense?

[snip]

It fixes the ICE of mis-compiled arm-linux-gnueabihf toolchain described in the
PR.

I attached the patch for discussion.  I haven't run a complete test on arm or
any other targets yet. (It probably needs more adjusting.)


Yes, this is the problem.  We see from the dump, that r2040 does not conflict 
with
hard reg r1:

;; a2040(r1597,l0) conflicts: 
;; total conflict hard regs:
;; conflict hard regs:

I think you should look for axxx(r2040, ..)?

Maybe I am wrong (I am not an RA expert), but from what I observed, it is LRA
that makes the code more complex. It decides to split the live range and spill r2040.
It creates multiple instructions to reload it.
r2944 in the LRA dump is the register where things start to go wrong. It is
assigned r1.


  Creating newreg=2944 from oldreg=2040, assigning class GENERAL_REGS to 
inheritance r2944
Original reg change 2040->2944 (bb2):
 10905: r1:SI=r2944:SI
Add inheritance<-original before:
 12868: r2944:SI=r2040:SI

The dump is the final state of LRA. I debugged it with gdb, and there are some
temporary steps
which are not observable in the final dump.



...and we have the following RTL:

(insn 10905 179 199 2 (set (reg:SI 1 r1)
 (reg/f:SI 2040)) "../../gcc/gcc/vec.h":1654 647 {*thumb2_movsi_vfp}
  (nil))

...

(insn 208 202 182 2 (set (mem/f/c:SI (pre_modify:SI (reg:SI 1 r1)
 (plus:SI (reg:SI 1 r1)
 (const_int 12 [0xc]))) [129 loop_nest.m_vec+0 S4 A32])
 (reg:SI 1048)) "../../gcc/gcc/vec.h":1654 647 {*thumb2_movsi_vfp}
  (expr_list:REG_INC (reg:SI 1 r1)
 (nil)))

...

(insn 191 189 192 2 (set (mem/f/c:SI (plus:SI (reg/f:SI 2040)
 (const_int 8 [0x8])) [367 ddrs_table+0 S4 A32])
 (reg/f:SI 1047)) "../../gcc/gcc/tree-loop-distribution.c":2741 647 
{*thumb2_movsi_vfp}
  (nil))

So my patch caused us to (correctly) skip adding a conflict between r1 and
r2040 due to the register copy in insn 10905.  However, they really should
conflict as you found due to the definition of r1 in insn 208 and the fact
we don't add one is a latent bug in LRA.  I think your patch is on the right
track, but not totally there yet.  Imagine instead that the references to r1
and r2040 were swapped, so instead we have:

(insn 10905 179 199 2 (set (reg:SI 2040)
 (reg/f:SI 1 r1)) "../../gcc/gcc/vec.h":1654 647 {*thumb2_movsi_vfp}
  (nil))

...

(insn 208 202 182 2 (set (mem/f/c:SI (pre_modify:SI (reg:SI 2040)
 (plus:SI (reg:SI 2040)
 (const_int 12 [0xc]))) [129 loop_nest.m_vec+0 S4 A32])
 (reg:SI 1048)) "../../gcc/gcc/vec.h":1654 647 {*thumb2_movsi_vfp}
  (expr_list:REG_INC (reg:SI 2040)
 (nil)))

...

(insn 191 189 192 2 (set (mem/f/c:SI (plus:SI (reg/f:SI 1 r1)
 (const_int 8 [0x8])) [367 ddrs_table+0 S4 A32])
 (reg/f:SI 1047)) "../../gcc/gcc/tree-loop-distribution.c":2741 647 
{*thumb2_movsi_vfp}
  (nil))

Even with your patch, we'd miss adding the conflict between r1 and r2040.
Let me think about how we should solve this one.


Yes, I am not confident the patch will be the ultimate fix to the problem.



And a *BIG* thank you for tracking down the problem!!!


No problem.

Regards,
Renlin

Peter



Re: [PATCH 2/2 v3][IRA,LRA] Fix PR86939, IRA incorrectly creates an interference between a pseudo register and a hard register

2018-11-08 Thread Renlin Li

Hi Peter,

On 11/08/2018 12:35 PM, Peter Bergner wrote:

On 11/8/18 4:57 AM, Renlin Li wrote:

I think I found the problem!

As described in the PR, a hard register is used in
a pre/post-modify expression. The hard register is live, but updated.
In this case, we should make it conflict with all pseudos live at
that point.  Does that make sense?


Do you have a reproducer test case I can look at?  I'd like to see the
problematical rtl to help me determine whether your patch is correct
or not.  ...and thank you for debugging this!

Peter



Sure! (I was trying to send the mail, but it failed because of the large attachment.)
I attached the dump file to the bugzilla ticket:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87899
I removed the unrelated dumps (extract with tar -xzvf xxx.tgz).

The code you want to check is the following in ira pass:
insn 10905: r1 = r2040
insn 208: use and update r1 with pre_modify
insn 191: use pseudo r2040
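
To make the failure mode concrete, here is a minimal sketch in C of those three
insns. It is only a toy model, not GCC code: the register file, the function
name address_used_by_insn_191 and the parameter pseudo_home are made up for
illustration, and the constants 12 and 8 are taken from the pre_modify and the
final memory access in the RTL. It shows that the address seen by insn 191
changes when the pseudo is placed in r1.

```c
#include <assert.h>

enum { R0, R1, R2, NUM_HARD_REGS };

/* Run the three-insn sequence with the pseudo (r2040 in the dump) assigned
   to hard register PSEUDO_HOME, and return the address that the last insn
   would access.  BASE is the value the pseudo holds on entry.  */
int
address_used_by_insn_191 (int pseudo_home, int base)
{
  assert (pseudo_home >= 0 && pseudo_home < NUM_HARD_REGS);

  int regs[NUM_HARD_REGS] = { 0 };
  regs[pseudo_home] = base;      /* The pseudo holds BASE on entry.  */

  regs[R1] = regs[pseudo_home];  /* insn 10905: r1 = pseudo.  */
  regs[R1] += 12;                /* insn 208: pre_modify updates r1.  */
  return regs[pseudo_home] + 8;  /* insn 191: access mem[pseudo + 8].  */
}
```

With the pseudo in r2 the last insn sees base + 8. With the pseudo in r1 it
sees base + 20, because the pre_modify has already changed the shared
register. That is why the two need to conflict.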

I could not create a test case. This dump was created with the stage1 compiler
compiling the next-stage compiler.
The (not helpful) command line is:


/home/renlin/try-new/./prev-gcc/cc1plus -quiet -nostdinc++ -v -I 
/home/renlin/try-new/prev-arm-none-linux-gnueabihf/libstdc++-v3/include/arm-none-linux-gnueabihf -I 
/home/renlin/try-new/prev-arm-none-linux-gnueabihf/libstdc++-v3/include -I /home/renlin/gcc/libstdc++-v3/libsupc++ -I . -I . -I ../../gcc/gcc -I 
../../gcc/gcc/. -I ../../gcc/gcc/../include -I ../../gcc/gcc/../libcpp/include -I ../../gcc/gcc/../libdecnumber -I ../../gcc/gcc/../libdecnumber/dpd 
-I ../libdecnumber -I ../../gcc/gcc/../libbacktrace -imultilib . -imultiarch arm-linux-gnueabihf -iprefix 
/home/renlin/try-new/prev-gcc/../lib/gcc/arm-none-linux-gnueabihf/9.0.0/ -isystem /home/renlin/try-new/./prev-gcc/include -isystem 
/home/renlin/try-new/./prev-gcc/include-fixed -MMD tree-loop-distribution.d -MF ./.deps/tree-loop-distribution.TPo -MP -MT tree-loop-distribution.o 
-D_GNU_SOURCE -D IN_GCC -D HAVE_CONFIG_H ../../gcc/gcc/tree-loop-distribution.c -quiet -dumpbase tree-loop-distribution.c -mfloat-abi=hard -mfpu=neon 
-mthumb -mtls-dialect=gnu -march=armv7-a+simd -auxbase-strip tree-loop-distribution.s -g -gtoggle -O2 -Wextra -Wall -Wno-narrowing -Wwrite-strings 
-Wcast-qual -Wsuggest-attribute=format -Woverloaded-virtual -Wpedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -version 
-fno-PIE -fno-checking -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -fno-common -o tree-loop-distribution.s


Regards,
Renlin


[RFC][PATCH]Merge VEC_COND_EXPR into MASK_STORE after loop vectorization

2018-11-08 Thread Renlin Li

Hi all,

When allow-store-data-races is enabled, ifcvt prefers to generate a
conditional select and an unconditional store when converting certain if
statements into:

_ifc_1 = val
_ifc_2 = A[i]
val = cond? _ifc_1 : _ifc_2
A[i] = val

When the loop got vectorized, this pattern will be turned into
MASK_LOAD, VEC_COND_EXPR and MASK_STORE. This could be improved.

The change here adds a post-processing function to combine the VEC_COND_EXPR
expression into MASK_STORE, and delete related dead code.
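
As a rough illustration of why the combination is valid, the sketch below
models both forms on plain C arrays, one array element per vector lane. It is
not the vectorizer code: mask_store, store_via_select,
store_via_combined_mask and forms_agree are hypothetical names, and the lane
count of 8 is arbitrary. The second form stores VAL under the mask
(loop mask AND select condition), which leaves memory in the same state as
the load + select + store sequence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define LANES 8

/* Toy model of a masked store: write SRC[i] to DST[i] in active lanes only.  */
static void
mask_store (int *dst, const bool *mask, const int *src)
{
  assert (dst && mask && src);
  for (size_t i = 0; i < LANES; i++)
    if (mask[i])
      dst[i] = src[i];
}

/* Before: X = MASK_LOAD; Y = VEC_COND (cond, VAL, X); MASK_STORE (MASK, Y).  */
void
store_via_select (int *mem, const bool *loop_mask, const bool *cond,
                  const int *val)
{
  int y[LANES];
  for (size_t i = 0; i < LANES; i++)
    {
      int x = loop_mask[i] ? mem[i] : 0;  /* Inactive lanes are never stored.  */
      y[i] = cond[i] ? val[i] : x;
    }
  mask_store (mem, loop_mask, y);
}

/* After: MASK_STORE (MASK & cond, VAL); the load and select are gone.  */
void
store_via_combined_mask (int *mem, const bool *loop_mask, const bool *cond,
                         const int *val)
{
  bool new_mask[LANES];
  for (size_t i = 0; i < LANES; i++)
    new_mask[i] = loop_mask[i] && cond[i];
  mask_store (mem, new_mask, val);
}

/* Return 1 when both forms leave memory in the same state.  */
int
forms_agree (void)
{
  int mem_a[LANES], mem_b[LANES], val[LANES];
  bool loop_mask[LANES], cond[LANES];

  for (size_t i = 0; i < LANES; i++)
    {
      mem_a[i] = mem_b[i] = (int) i + 1;
      val[i] = 100 + (int) i;
      loop_mask[i] = i < 6;        /* Last two lanes are past the loop bound.  */
      cond[i] = (i % 2) == 0;
    }
  store_via_select (mem_a, loop_mask, cond, val);
  store_via_combined_mask (mem_b, loop_mask, cond, val);
  return memcmp (mem_a, mem_b, sizeof mem_a) == 0;
}
```

Lanes where the condition is false are written back with the value just
loaded in the first form, so skipping them in the second form changes
nothing observable in this single-threaded model.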

I am a little bit conservative here.
I didn't change the default behavior of ifcvt to always generate MASK_STORE,
although it should be safe in all cases (whether store data races are allowed or not).

MASK_STORE might not be handled as well as conditional select in the
vectorization pass. It might be too early and too aggressive to do that in ifcvt.
And the performance of MASK_STORE might not be good on some platforms.
(We could add a --param or a target hook to differentiate this ifcvt behavior
on different platforms.)

Another reason I did not do that in ifcvt is the data reference
analysis. To create a MASK_STORE, a pointer is created as the first
argument to the internal function call. If the pointer is created out of
an array reference, e.g. x = &A[i], data reference analysis cannot properly
analyze the relationship between MEM_REF (x) and ARRAY_REF (A, i). This
will create a versioned loop beside the vectorized one.
(I have hacks to look through the MEM_REF and restore the reference back to
ARRAY_REF (A, i).  Maybe we could do the analysis on the lowered address expression?
I saw we have the gimple laddress pass to aid the vectorizer.)

The approach here comes a little bit late: it applies only once the vector
MASK_STORE has been generated by the loop vectorizer. In this case, it is definitely
beneficial to do the code transformation.

Any thoughts on the best way to fix the issue?


This patch has been tested with aarch64-none-elf, no regressions.

Regards,
Renlin

gcc/ChangeLog:

2018-11-08  Renlin Li  

* tree-vectorizer.h (combine_sel_mask_store): Declare new function.
* tree-vect-loop.c (combine_sel_mask_store): Define new function.
* tree-vectorizer.c (vectorize_loops): Call it.

gcc/testsuite/ChangeLog:

2018-11-08  Renlin Li  

* gcc.target/aarch64/sve/combine_vcond_mask_store_1.c: New.

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/combine_vcond_mask_store_1.c b/gcc/testsuite/gcc.target/aarch64/sve/combine_vcond_mask_store_1.c
new file mode 100644
index ..64f6b7b00f58ee45bd4a2f91c1a9404911f1a09f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/combine_vcond_mask_store_1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize --param allow-store-data-races=1 -fdump-tree-vect-details" } */
+
+void test ()
+{
+  static int array[100];
+  for (unsigned i = 1; i < 16; ++i)
+{
+  int a = array[i];
+  if (a & 1)
+	array[i] = a + 1;
+  if (array[i] > 10)
+	array[i] = a + 2;
+}
+}
+
+/* { dg-final { scan-tree-dump-times "Combining VEC_COND_EXPR and MASK_STORE" 1 "vect" } } */
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 177b284e9c617a41c33d1387ba5afbed51d8ed00..9e1a167d03ea5bf640e58b3426d42b4e3c74da56 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -8539,6 +8539,166 @@ vect_transform_loop (loop_vec_info loop_vinfo)
   return epilogue;
 }
 
+/*
+   When allow-store-data-races=1, if-conversion will convert certain if
+   statements into:
+   A[i] = cond ? val : A[i].
+   If the loop is successfully vectorized,
+   MASK_LOAD + VEC_COND_EXPR + MASK_STORE will be generated.
+
+   This pattern could be combined into a single MASK_STORE with new mask.
+   The new mask is the combination of original mask and the value selection mask
+   in VEC_COND_EXPR.
+
+   After the transformation, the MASK_LOAD and VEC_COND_EXPR might be dead.  */
+
+void
+combine_sel_mask_store (struct loop *loop)
+{
+  basic_block *bbs = get_loop_body (loop);
+  unsigned nbbs = loop->num_nodes;
+  unsigned i;
+  basic_block bb;
+  gimple_stmt_iterator gsi;
+
+  vect_location = find_loop_location (loop);
+  for (i = 0; i < nbbs; i++)
+{
+  bb = bbs[i];
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
+	   gsi_next (&gsi))
+	{
+	  gimple *mask_store = gsi_stmt (gsi);
+	  if (!gimple_call_internal_p (mask_store, IFN_MASK_STORE))
+	continue;
+
+	  /*
+	 X = MASK_LOAD (PTR, -, MASK)
+	 VAL = ...
+	 Y = VEC_COND (cond, VAL, X)
+	 MASK_STORE (PTR, -, MASK, Y)
+	  */
+	  tree vec_op = gimple_call_arg (mask_store, 3);
+	  tree store_mask = gimple_call_arg (mask_store, 2);
+	  if (TREE_CODE (vec_op) == SSA_NAME)
+	{
+	  gimple *def = SSA_NAME_DEF_STMT (vec_op);
+	  gassign *assign = dyn_cast <gassign *> (def);
+	  if (!assign || gimple_assign_rhs_code (assign) != VEC_COND_EXPR)
+		continue;
+
+	  tree sel_cond = gimple_assign_rhs1 (assign);
+	  

Re: [PATCH 2/2 v3][IRA,LRA] Fix PR86939, IRA incorrectly creates an interference between a pseudo register and a hard register

2018-11-08 Thread Renlin Li

Hi,

On 11/06/2018 06:58 PM, Jeff Law wrote:

On 11/6/18 3:52 AM, Renlin Li wrote:

Hi Jeff & Peter,

On 11/05/2018 07:41 PM, Jeff Law wrote:

On 11/5/18 12:36 PM, Peter Bergner wrote:

On 11/5/18 1:20 PM, Jeff Law wrote:

On 11/1/18 4:07 PM, Peter Bergner wrote:

On 11/1/18 1:50 PM, Renlin Li wrote:

Is there any update on this issue?
arm-none-linux-gnueabihf native toolchain has been mis-compiled
for a while.


  From the analysis I've done, my commit is just exposing latent issues
in LRA.  Can you try the patch I submitted here to see if it helps?

    https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01757.html

It survives on powerpc64le-linux, x86_64-linux and s390x-linux.
Jeff threw it on his testers and said he saw an arm issue and was
trying to come up with a test case for me to debug.

So I don't think the ARM issues are related to your patch, they may
have
been related to the combiner changes that went in around the same time.

Yes, there are issues related to the combiner changes.

But the IRA/LRA change does cause the arm-none-linux-gnueabihf bootstrap
native toolchain to be mis-compiled.
And the new patch does not seem to fix this problem.

That's strange.  I'm bootstrapping arm-linux-gnueabihf daily with qemu +
a suitable root filesystem using Peter's most recent testing patch.




I am trying to extract a test case, but it is a little bit hard as the
toolchain itself is mis-compiled.
And it ICEs when compiling a test case with it.

What I would suggest doing is to first start with running the testsuite
against the stage1 compiler before/after Peter's changes.  Sometimes
that'll turn up something useful and you can avoid debugging things
through stage2/stage3.


Hi Jeff,
Thanks for the suggestion! I could reproduce it with stage1 compiler.



Hi Peter,

I think I found the problem!

As described in the PR, a hard register is used in
a pre/post-modify expression. The hard register is live, but updated.
In this case, we should make it conflict with all pseudos live at
that point.  Does that make sense?





It fixes the ICE of mis-compiled arm-linux-gnueabihf toolchain described in the
PR.

I attached the patch for discussion.  I haven't run a complete test on arm or
any other targets yet. (It probably needs more adjusting.)

I will run arm and aarch64 regression test, cross and native.

Regards,
Renlin

BTW, the pre/post-modify expression is generated by the auto_inc/auto_dec pass.
Somehow, it merges with a hard register, for example a function argument register.
This optimization makes life harder for the RA. We probably don't want that pass
to be too aggressive. @Wilco.
(The IRA/LRA and combiner changes reveal a lot of issues,
forcing us to work on them and improve the compiler :) .)

gcc/ChangeLog:

2018-11-08  Renlin Li  
PR middle-end/87899
* lra-lives.c (process_bb_lives): Make hard register of INOUT
type conflict with all live pseudo.
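
The idea behind this ChangeLog entry can be sketched with plain bitmasks. This
is a simplified model, not the lra-lives.c data structures: the 32-bit masks,
MAX_PSEUDOS and both function names are assumptions made for illustration. For
a hard register that is both read and written by an insn, every pseudo live at
that point gets the hard register added to its conflict set.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PSEUDOS 8

/* CONFLICT_HARD_REGS[p] has bit H set when pseudo P must not be assigned
   hard register H.  PSEUDOS_LIVE has bit P set when pseudo P is live.  */
void
add_inout_hard_reg_conflicts (uint32_t conflict_hard_regs[MAX_PSEUDOS],
                              uint32_t pseudos_live, unsigned hard_regno)
{
  assert (hard_regno < 32);
  for (unsigned p = 0; p < MAX_PSEUDOS; p++)
    if (pseudos_live & (UINT32_C (1) << p))
      conflict_hard_regs[p] |= UINT32_C (1) << hard_regno;
}

/* Convenience wrapper: start from empty conflict sets, record one in/out
   hard register, and return the resulting conflict set of PSEUDO.  */
uint32_t
conflicts_after_update (unsigned pseudo, uint32_t pseudos_live,
                        unsigned hard_regno)
{
  assert (pseudo < MAX_PSEUDOS);
  uint32_t conflicts[MAX_PSEUDOS] = { 0 };
  add_inout_hard_reg_conflicts (conflicts, pseudos_live, hard_regno);
  return conflicts[pseudo];
}
```

A pseudo that is not live at the update point keeps an empty conflict set, so
it can still be assigned that hard register elsewhere.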










jeff

diff --git a/gcc/lra-lives.c b/gcc/lra-lives.c
index 0bf8cd06a302c8a6fcb914b94f953cdaa86597a2..370a7254cac7dbde4e320424e09274cee66c50b9 100644
--- a/gcc/lra-lives.c
+++ b/gcc/lra-lives.c
@@ -878,11 +878,25 @@ process_bb_lives (basic_block bb, int _point, bool dead_insn_p)
 
   /* See which defined values die here.  */
   for (reg = curr_id->regs; reg != NULL; reg = reg->next)
-	if (reg->type == OP_OUT
-	&& ! reg_early_clobber_p (reg, n_alt) && ! reg->subreg_p)
-	  need_curr_point_incr
-	|= mark_regno_dead (reg->regno, reg->biggest_mode,
-curr_point);
+	if (! reg_early_clobber_p (reg, n_alt) && ! reg->subreg_p)
+	  {
+	if (reg->type == OP_OUT)
+	  need_curr_point_incr
+		|= mark_regno_dead (reg->regno, reg->biggest_mode,
+curr_point);
+
+	// This is a hard register, and it must be live.  Keep it live and
+	// make it conflict with all live pseudo registers.
+	else if (reg->type == OP_INOUT && reg->regno < FIRST_PSEUDO_REGISTER)
+	  {
+		lra_assert (TEST_HARD_REG_BIT (hard_regs_live, reg->regno));
+
+		unsigned int i;
+		EXECUTE_IF_SET_IN_SPARSESET (pseudos_live, i)
+		  SET_HARD_REG_BIT (lra_reg_info[i].conflict_hard_regs,
+reg->regno);
+	  }
+	  }
 
   for (reg = curr_static_id->hard_regs; reg != NULL; reg = reg->next)
 	if (reg->type == OP_OUT


Re: [PATCH 2/2 v3][IRA,LRA] Fix PR86939, IRA incorrectly creates an interference between a pseudo register and a hard register

2018-11-06 Thread Renlin Li

Hi Ramana,

On 11/06/2018 10:57 AM, Ramana Radhakrishnan wrote:

On Tue, Nov 6, 2018 at 10:52 AM Renlin Li  wrote:


Hi Jeff & Peter,

On 11/05/2018 07:41 PM, Jeff Law wrote:

On 11/5/18 12:36 PM, Peter Bergner wrote:

On 11/5/18 1:20 PM, Jeff Law wrote:

On 11/1/18 4:07 PM, Peter Bergner wrote:

On 11/1/18 1:50 PM, Renlin Li wrote:

Is there any update on this issue?
arm-none-linux-gnueabihf native toolchain has been mis-compiled for a while.


  From the analysis I've done, my commit is just exposing latent issues
in LRA.  Can you try the patch I submitted here to see if it helps?

https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01757.html

It survives on powerpc64le-linux, x86_64-linux and s390x-linux.
Jeff threw it on his testers and said he saw an arm issue and was
trying to come up with a test case for me to debug.

So I don't think the ARM issues are related to your patch, they may have
been related to the combiner changes that went in around the same time.

Yes, there are issues related to the combiner changes.


But didn't the combiner changes come *after* these patches ? So IIUC,
Renlin has been trying to get these fixed *without* the combine
patches but just with your patch applied on top of the revision where
the problem started showing up .

Can you confirm that Renlin ?


I just did a bootstrap again with everything up to r264897, which is from Oct 6.
It produces the ICE I mentioned in PR87899.

The first combiner patch went in on Oct 22.

Regards,
Renlin




Ramana


But the IRA/LRA change does cause the arm-none-linux-gnueabihf bootstrap native
toolchain to be mis-compiled.
And the new patch does not seem to fix this problem.

I am trying to extract a test case, but it is a little bit hard as the
toolchain itself is mis-compiled.
And it ICEs when compiling a test case with it.

I created a bugzilla ticket for this, PR87899.

./gcc/cc1 ~/gcc/./gcc/testsuite/gcc.c-torture/execute/pr36034-1.c  -O3
   test main
Analyzing compilation unit
Performing interprocedural optimizations
   <*free_lang_data> 
 Streaming LTO
   
  
Assembling functions:
testduring GIMPLE pass: ldist

gcc/./gcc/testsuite/gcc.c-torture/execute/pr36034-1.c: In function ‘test’:
gcc/./gcc/testsuite/gcc.c-torture/execute/pr36034-1.c:9:1: internal compiler 
error: Segmentation fault
  9 | test (void)
| ^~~~
0x5c3a37 crash_signal
 ../../gcc/gcc/toplev.c:325
0x63ef6b inchash::hash::add(void const*, unsigned int)
 ../../gcc/gcc/inchash.h:100
0x63ef6b inchash::hash::add_ptr(void const*)
 ../../gcc/gcc/inchash.h:94
0x63ef6b ddr_hasher::hash(data_dependence_relation const*)
 ../../gcc/gcc/tree-loop-distribution.c:143
0x63ef6b hash_table::find_slot(data_dependence_relation* 
const&, insert_option)
 ../../gcc/gcc/hash-table.h:414
0x63ef6b get_data_dependence
 ../../gcc/gcc/tree-loop-distribution.c:1184
0x63f2bd data_dep_in_cycle_p
 ../../gcc/gcc/tree-loop-distribution.c:1210
0x63f2bd update_type_for_merge
 ../../gcc/gcc/tree-loop-distribution.c:1255
0x64064b build_rdg_partition_for_vertex
 ../../gcc/gcc/tree-loop-distribution.c:1302
0x64064b rdg_build_partitions
 ../../gcc/gcc/tree-loop-distribution.c:1754
0x64064b distribute_loop
 ../../gcc/gcc/tree-loop-distribution.c:2795
0x642299 execute
 ../../gcc/gcc/tree-loop-distribution.c:3133
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.



Regards
Renlin





At this point your patch appears to be DTRT across the board.  The only
fallout is the bogus s390 asm it caught in the kernel.


Cool.  I will note that I contacted the s390 kernel guys and gave them a
fix to their broken constraints in that asm and they are going to fix it.

Sounds good.  I've got a hack in my tester to "fix" that bogus asm until
the kernel folks do it right.




Is the above an approval to commit the patch mentioned above or do you
still want to wait until the ARM issues are fully resolved?

I think knowing the patch addresses all the known issues related to the
earlier IRA/LRA change unblocks the review step.  I don't think we need
to wait for the other ARM issues to be resolved -- they seem to be
unrelated to the IRA/LRA changes.

jeff



Re: [PATCH 2/2 v3][IRA,LRA] Fix PR86939, IRA incorrectly creates an interference between a pseudo register and a hard register

2018-11-06 Thread Renlin Li

Hi Jeff & Peter,

On 11/05/2018 07:41 PM, Jeff Law wrote:

On 11/5/18 12:36 PM, Peter Bergner wrote:

On 11/5/18 1:20 PM, Jeff Law wrote:

On 11/1/18 4:07 PM, Peter Bergner wrote:

On 11/1/18 1:50 PM, Renlin Li wrote:

Is there any update on this issue?
arm-none-linux-gnueabihf native toolchain has been mis-compiled for a while.


 From the analysis I've done, my commit is just exposing latent issues
in LRA.  Can you try the patch I submitted here to see if it helps?

   https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01757.html

It survives on powerpc64le-linux, x86_64-linux and s390x-linux.
Jeff threw it on his testers and said he saw an arm issue and was
trying to come up with a test case for me to debug.

So I don't think the ARM issues are related to your patch, they may have
been related to the combiner changes that went in around the same time.

Yes, there are issues related to the combiner changes.

But the IRA/LRA change does cause the arm-none-linux-gnueabihf bootstrap native
toolchain to be mis-compiled.
And the new patch does not seem to fix this problem.

I am trying to extract a test case, but it is a little bit hard as the
toolchain itself is mis-compiled.
And it ICEs when compiling a test case with it.

I created a bugzilla ticket for this, PR87899.

./gcc/cc1 ~/gcc/./gcc/testsuite/gcc.c-torture/execute/pr36034-1.c  -O3
 test main
Analyzing compilation unit
Performing interprocedural optimizations
 <*free_lang_data> 
 Streaming LTO

Assembling functions:

  testduring GIMPLE pass: ldist

gcc/./gcc/testsuite/gcc.c-torture/execute/pr36034-1.c: In function ‘test’:
gcc/./gcc/testsuite/gcc.c-torture/execute/pr36034-1.c:9:1: internal compiler 
error: Segmentation fault
9 | test (void)
  | ^~~~
0x5c3a37 crash_signal
../../gcc/gcc/toplev.c:325
0x63ef6b inchash::hash::add(void const*, unsigned int)
../../gcc/gcc/inchash.h:100
0x63ef6b inchash::hash::add_ptr(void const*)
../../gcc/gcc/inchash.h:94
0x63ef6b ddr_hasher::hash(data_dependence_relation const*)
../../gcc/gcc/tree-loop-distribution.c:143
0x63ef6b hash_table::find_slot(data_dependence_relation* 
const&, insert_option)
../../gcc/gcc/hash-table.h:414
0x63ef6b get_data_dependence
../../gcc/gcc/tree-loop-distribution.c:1184
0x63f2bd data_dep_in_cycle_p
../../gcc/gcc/tree-loop-distribution.c:1210
0x63f2bd update_type_for_merge
../../gcc/gcc/tree-loop-distribution.c:1255
0x64064b build_rdg_partition_for_vertex
../../gcc/gcc/tree-loop-distribution.c:1302
0x64064b rdg_build_partitions
../../gcc/gcc/tree-loop-distribution.c:1754
0x64064b distribute_loop
../../gcc/gcc/tree-loop-distribution.c:2795
0x642299 execute
../../gcc/gcc/tree-loop-distribution.c:3133
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.



Regards
Renlin





At this point your patch appears to be DTRT across the board.  The only
fallout is the bogus s390 asm it caught in the kernel.


Cool.  I will note that I contacted the s390 kernel guys and gave them a
fix to their broken constraints in that asm and they are going to fix it.

Sounds good.  I've got a hack in my tester to "fix" that bogus asm until
the kernel folks do it right.




Is the above an approval to commit the patch mentioned above or do you
still want to wait until the ARM issues are fully resolved?

I think knowing the patch addresses all the known issues related to the
earlier IRA/LRA change unblocks the review step.  I don't think we need
to wait for the other ARM issues to be resolved -- they seem to be
unrelated to the IRA/LRA changes.

jeff



Re: [PATCH] combine: Do not combine moves from hard registers

2018-11-05 Thread Renlin Li




On 11/05/2018 12:35 PM, Renlin Li wrote:

Hi Segher,

On 11/03/2018 02:34 AM, Jeff Law wrote:

On 11/2/18 5:54 PM, Segher Boessenkool wrote:

On Fri, Nov 02, 2018 at 06:03:20PM -0500, Segher Boessenkool wrote:

The original rtx is generated by expand_builtin_setjmp_receiver to adjust
the frame pointer.

Later, LRA will try to eliminate the frame pointer in favor of the hard frame
pointer, a pair which is
defined in ELIMINABLE_REGS.

Your change splits the insn into two.
As a result, it no longer matches the "from" and "to" regs defined in
ELIMINABLE_REGS.
The if statement that generates the adjustment insn is skipped,
and the original instruction is simply deleted!

I don't follow why, or what should have prevented it from being deleted.


Probably, we don't want to split the move rtx if they are related to
entries defined in ELIMINABLE_REGS?

One thing I can easily do is not making an intermediate pseudo when copying
*to* a fixed reg, which sfp is.  Let me try if that helps the testcase I'm
looking at (setjmp-4.c).

This indeed helps, see patch below.  Could you try that on the whole
testsuite?

Thanks,


Segher


p.s. It still is a problem in the arm backend, but this won't hurt combine,
so why not.


 From 814ca23ce05384d017b3c2bff41ab61cf5446e46 Mon Sep 17 00:00:00 2001
Message-Id: 
<814ca23ce05384d017b3c2bff41ab61cf5446e46.1541202704.git.seg...@kernel.crashing.org>
From: Segher Boessenkool 
Date: Fri, 2 Nov 2018 23:33:32 +
Subject: [PATCH] combine: Don't break up copy from hard to fixed reg

---
  gcc/combine.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/gcc/combine.c b/gcc/combine.c
index dfb0b44..15e941a 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -14998,6 +14998,8 @@ make_more_copies (void)
  continue;
    if (TEST_HARD_REG_BIT (fixed_reg_set, REGNO (src)))
  continue;
+  if (REG_P (dest) && TEST_HARD_REG_BIT (fixed_reg_set, REGNO (dest)))
+    continue;
    rtx new_reg = gen_reg_rtx (GET_MODE (dest));
    rtx_insn *new_insn = gen_move_insn (new_reg, src);
-- 1.8.3.1

It certainly helps the armeb test results.


Yes, I can also see it helps a lot with the regression test.
Thanks for working on it!


Besides the correctness issue, there are performance regressions, as other
people have also reported.

I analysed one case, gcc.c-torture/execute/builtins/memcpy-chk.c.
In this case, two additional register moves and callee saves are emitted.

The problem is that make_more_copies splits a move into two. Ideally, the RA
could figure this out and
make the best register allocation. In reality, however, the scheduler in some
cases reschedules
the instructions, which changes the live ranges of registers and thus
the interference graph
of the pseudo registers.

This forces the RA to choose a different register, which makes the move
instruction non-redundant,
or at least impossible for the RA to eliminate.

For example,

set r102, r1

After combine:
insn x: set r103, r1
insn x+1: set r102, r103

After scheduler:
insn x: set r103, r1
...
...
...
insn x+1: set r102, r103

After IRA, r1 could be assigned to operands used in instructions in between
insn x and x+1,
so r103 conflicts with r1. LRA has to assign r103 a different hard register.


Sorry, this is not correct. Instructions scheduled between x and x+1 directly 
use hard register r1.
It is not IRA/LRA assigning r1 to the operands.
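
A minimal sketch of the interference argument, under the assumption that the
instructions scheduled in between write r1 (for example to set up an
argument): the pseudo copied from r1 can stay in r1 only if nothing writes r1
while the pseudo is still live. The function name and the use of plain
instruction positions are made up for illustration; this is not how IRA
represents live ranges.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 when a pseudo copied from a hard register at COPY_POS and last
   used at LAST_USE_POS could live in that same hard register, i.e. when no
   instruction strictly between the two positions writes the hard register.
   HARD_REG_WRITES lists the positions of those writes.  */
int
pseudo_can_share_hard_reg (int copy_pos, int last_use_pos,
                           const int *hard_reg_writes, size_t n_writes)
{
  assert (copy_pos <= last_use_pos);
  for (size_t i = 0; i < n_writes; i++)
    if (hard_reg_writes[i] > copy_pos && hard_reg_writes[i] < last_use_pos)
      return 0;
  return 1;
}
```

Before scheduling, insn x and insn x+1 are adjacent, so no write can fall
between them and the copy is removable. Once the scheduler moves other
instructions in between, a write to r1 inside the range makes the pseudo and
r1 interfere, and the extra move has to stay.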


To reproduce this particular case, you could use:
cc1  -O3 -marm -march=armv7-a -mfpu=vfpv3-d16 -mfloat-abi=softfp 
gcc.c-torture/execute/builtins/memcpy-chk.c

This is the insn that gets split:

(insn 152 150 154 11 (set (mem/c:QI (plus:SI (reg/f:SI 266)
(const_int 24 [0x18])) [0 MEM[(void *) + 20B]+4 S1 A32])
(reg:QI 1 r1)) "memcpy-chk-reduce.c":48:3 189 {*arm_movqi_insn}
 (expr_list:REG_DEAD (reg:QI 1 r1)
(nil)))


Regards,
Renlin



This causes one additional move, and probably one more callee save/restore.

Nothing is obviously wrong here. But...

One simple case where it is probably not beneficial is splitting a hard register store.
According to your comment on make_more_copies, you might want to apply the
transformation only
to hard-reg-to-pseudo copies?

Regards,
Renlin






Jeff



Re: [PATCH] combine: Do not combine moves from hard registers

2018-11-05 Thread Renlin Li

Hi Segher,

On 11/03/2018 02:34 AM, Jeff Law wrote:

On 11/2/18 5:54 PM, Segher Boessenkool wrote:

On Fri, Nov 02, 2018 at 06:03:20PM -0500, Segher Boessenkool wrote:

The original rtx is generated by expand_builtin_setjmp_receiver to adjust
the frame pointer.

Later, LRA will try to eliminate the frame pointer in favor of the hard frame
pointer, a pair which is
defined in ELIMINABLE_REGS.

Your change splits the insn into two.
As a result, it no longer matches the "from" and "to" regs defined in
ELIMINABLE_REGS.
The if statement that generates the adjustment insn is skipped,
and the original instruction is simply deleted!

I don't follow why, or what should have prevented it from being deleted.


Probably, we don't want to split the move rtx if they are related to
entries defined in ELIMINABLE_REGS?

One thing I can easily do is not making an intermediate pseudo when copying
*to* a fixed reg, which sfp is.  Let me try if that helps the testcase I'm
looking at (setjmp-4.c).

This indeed helps, see patch below.  Could you try that on the whole
testsuite?

Thanks,


Segher


p.s. It still is a problem in the arm backend, but this won't hurt combine,
so why not.


 From 814ca23ce05384d017b3c2bff41ab61cf5446e46 Mon Sep 17 00:00:00 2001
Message-Id: 
<814ca23ce05384d017b3c2bff41ab61cf5446e46.1541202704.git.seg...@kernel.crashing.org>
From: Segher Boessenkool 
Date: Fri, 2 Nov 2018 23:33:32 +
Subject: [PATCH] combine: Don't break up copy from hard to fixed reg

---
  gcc/combine.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/gcc/combine.c b/gcc/combine.c
index dfb0b44..15e941a 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -14998,6 +14998,8 @@ make_more_copies (void)
continue;
  if (TEST_HARD_REG_BIT (fixed_reg_set, REGNO (src)))
continue;
+ if (REG_P (dest) && TEST_HARD_REG_BIT (fixed_reg_set, REGNO (dest)))
+   continue;
  
  	  rtx new_reg = gen_reg_rtx (GET_MODE (dest));

  rtx_insn *new_insn = gen_move_insn (new_reg, src);
-- 1.8.3.1

It certainly helps the armeb test results.


Yes, I can also see it helps a lot with the regression test.
Thanks for working on it!


Besides the correctness issue, there are performance regressions, as other
people have also reported.

I analysed one case, gcc.c-torture/execute/builtins/memcpy-chk.c.
In this case, two additional register moves and callee saves are emitted.

The problem is that make_more_copies splits a move into two. Ideally, the RA
could figure this out and
make the best register allocation. In reality, however, the scheduler in some
cases reschedules
the instructions, which changes the live ranges of registers and thus
the interference graph
of the pseudo registers.

This forces the RA to choose a different register, which makes the move
instruction non-redundant,
or at least impossible for the RA to eliminate.

For example,

set r102, r1

After combine:
insn x: set r103, r1
insn x+1: set r102, r103

After scheduler:
insn x: set r103, r1
...
...
...
insn x+1: set r102, r103

After IRA, r1 could be assigned to operands used in instructions in between
insn x and x+1,
so r103 conflicts with r1. LRA has to assign r103 a different hard register.
This causes one additional move, and probably one more callee save/restore.

Nothing is obviously wrong here. But...

One simple case where it is probably not beneficial is splitting a hard register store.
According to your comment on make_more_copies, you might want to apply the
transformation only
to hard-reg-to-pseudo copies?

Regards,
Renlin






Jeff



Re: [PATCH] combine: Do not combine moves from hard registers

2018-11-02 Thread Renlin Li

Hi Segher,

I found a problem with your change that adds make_more_copies.
I am investigating those regressions; a large number of them are wrong-code
issues.

One problem is that make_more_copies will split the assignment of fp to sfp.

From:
(insn 48 26 28 5 (set (reg/f:SI 102 sfp)
(reg/f:SI 11 fp)) -1
To:
(insn 51 32 26 5 (set (reg:SI 117)
(reg/f:SI 11 fp)) 646 {*arm_movsi_vfp}
 (expr_list:REG_EQUIV (reg/f:SI 11 fp)
(nil)))
(insn 48 26 28 5 (set (reg/f:SI 102 sfp)
(reg:SI 117)) 646 {*arm_movsi_vfp}
 (expr_list:REG_DEAD (reg:SI 117)
(nil)))

The original rtx is generated by expand_builtin_setjmp_receiver to adjust the 
frame pointer.

Later, LRA will try to eliminate the frame pointer in favour of the hard frame 
pointer, as
defined in ELIMINABLE_REGS.

Your change splits the insn into two.
As a result, it no longer matches the "from" and "to" regs defined in ELIMINABLE_REGS.
The if statement that generates the adjustment insn is skipped,
and the original instruction is simply deleted!



Probably, we don't want to split the move rtx if they are related to entries 
defined in ELIMINABLE_REGS?
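
To show the kind of guard I mean, here is a stand-alone sketch. The table is hypothetical and only in the spirit of ELIMINABLE_REGS; its single entry reuses the register numbers from the dump above (102 = sfp, 11 = fp), and the function name is made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_elimination
{
  unsigned from;
  unsigned to;
};

/* Hypothetical elimination table.  A real target lists more pairs.  */
static const struct toy_elimination toy_eliminables[] = { { 102, 11 } };

/* Return true if "dest := src" is a copy between the two registers of an
   elimination pair.  Such a move would be left unsplit.  */
bool
toy_elimination_copy_p (unsigned dest_regno, unsigned src_regno)
{
  size_t n = sizeof toy_eliminables / sizeof toy_eliminables[0];
  for (size_t i = 0; i < n; i++)
    if (toy_eliminables[i].from == dest_regno
        && toy_eliminables[i].to == src_regno)
      return true;
  return false;
}

/* Usage example.  */
void
toy_elimination_examples (void)
{
  assert (toy_elimination_copy_p (102, 11));   /* sfp := fp, keep whole */
  assert (!toy_elimination_copy_p (117, 11));  /* pseudo := fp */
}
```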


Regards,
Renlin

On 10/24/2018 09:23 AM, Christophe Lyon wrote:

On Wed, 24 Oct 2018 at 00:26, Segher Boessenkool
 wrote:


Hi Christophe,

On Tue, Oct 23, 2018 at 03:25:55PM +0200, Christophe Lyon wrote:

On Tue, 23 Oct 2018 at 14:29, Segher Boessenkool
 wrote:

On Tue, Oct 23, 2018 at 12:14:27PM +0200, Christophe Lyon wrote:

I have noticed many regressions on arm and aarch64 between 265366 and
265408 (this commit is 265398).

I bisected at least one to this commit on aarch64:
FAIL: gcc.dg/ira-shrinkwrap-prep-1.c scan-rtl-dump ira "Split
live-range of register"
The same test also regresses on arm.


Many targets also fail gcc.dg/ira-shrinkwrap-prep-2.c; these tests fail
when random things in the RTL change, apparently.


This is PR87708 now.


For a whole picture of all the regressions I noticed during these two
commits, have a look at:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/265408/report-build-info.html


No thanks.  I am not going to click on 111 links and whatever is behind
those.  Please summarise, like, what was the diff in test_summary, and
then dig down into individual tests if you want.  Or whatever else works
both for you and for me.  This doesn't work for me.


OK this is not very practical for me either. There were 25 commits between
the two validations being compared,
25-28 gcc tests regressed on aarch64, depending on the exact target
177-206 gcc tests regressed on arm*, 7-29 gfortran regressions on arm*
so I could have to run many bisects to make sure every regression is
caused by the same commit.


So many, ouch!  I didn't realise.


I've now got the results of validating your patch only, compared to the
previous revision, and it does cause all the regressions I noticed earlier.


Since these are all automated builds with everything discarded after
computing the regressions, it's quite time consuming to re-run the
tests manually on my side (probably at least as much as it is for you).


Running arm tests is very painful for me.  But you say this is on aarch64
as well, I didn't realise that either; aarch64 should be easy to test,
we have many reasonable aarch64 machines in the cfarm.


I know this doesn't answer your question, but I thought you could run aarch64
tests easily and that it would be more efficient for the project if you
did it directly,
without waiting for me to provide barely any more information.


Well, I'm not too familiar with aarch64, so if you can say "this Z is a
pretty simple test that should do X but now does Y" that would be a huge
help :-)


Maybe this will answer your question better:
List of aarch64-linux-gnu regressions:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/265408/aarch64-none-linux-gnu/diff-gcc-rh60-aarch64-none-linux-gnu-default-default-default.txt
List of arm-none-linux-gnueabihf regressions:
(gcc) 
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/265408/arm-none-linux-gnueabihf/diff-gcc-rh60-arm-none-linux-gnueabihf-arm-cortex-a9-neon-fp16.txt
(gfortran) 
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/265408/arm-none-linux-gnueabihf/diff-gfortran-rh60-arm-none-linux-gnueabihf-arm-cortex-a9-neon-fp16.txt


That may help yes, thanks!


To me it just highlights again that we need a validation system easier to
work with when we break something on a target we are not familiar with.


OTOH a patch like this is likely to break many target-specific tests, and
that should not prevent committing it imnsho.  If it actively breaks things,
then of course it shouldn't go in as-is, or if it breaks bootstrap, etc.


I run post-commit validations as finely grained as possible with the CPU
resources I have access to, that's not enough and I think having a
developer-accessible gerrit+jenkins-like system would be very valuable
to test 

Re: [PATCH 2/2 v3][IRA,LRA] Fix PR86939, IRA incorrectly creates an interference between a pseudo register and a hard register

2018-11-02 Thread Renlin Li

Hi Peter,

On 11/01/2018 10:07 PM, Peter Bergner wrote:

On 11/1/18 1:50 PM, Renlin Li wrote:

Is there any update on this issue?
arm-none-linux-gnueabihf native toolchain has been mis-compiled for a while.


 From the analysis I've done, my commit is just exposing latent issues
in LRA.  


Yes, it looks like some latent issues have been exposed.


Can you try the patch I submitted here to see if it helps?


   https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01757.html


Thanks for the patch! I'll help to test the patch and let you know the status.

Thanks,
Renlin



It survives on powerpc64le-linux, x86_64-linux and s390x-linux.
Jeff threw it on his testers and said he saw an arm issue and was
trying to come up with a test case for me to debug.

The specific issue you mentioned with the inline asm and the casp insn
is a bug in LRA where it will spill a user defined hard register and
it shouldn't do that.  My patch above stops that.  The question is
whether we've quashed the rest of the latent bugs.

Peter




Re: [PATCH 2/2 v3][IRA,LRA] Fix PR86939, IRA incorrectly creates an interference between a pseudo register and a hard register

2018-11-01 Thread Renlin Li

Hi Peter,

Is there any update on this issue?
arm-none-linux-gnueabihf native toolchain has been mis-compiled for a while.

I got the following dump from the test case.

x1 is an early clobber operand in the inline assembly statement,
r92 should conflict with x1?

;; a0(r93,l0) conflicts: a1(r92,l0)
;; total conflict hard regs: 0-4 16 17 30
;; conflict hard regs: 0-4 16 17 30


;; a1(r92,l0) conflicts: a0(r93,l0)
;; total conflict hard regs: 0 2-4 16 17 30
;; conflict hard regs: 0 2-4 16 17 30

Dump from ira.

(insn 2 8 6 2 (set (reg/v:DI 92 [ arg ])
(reg:DI 97)) "test.c":3:1 47 {*movdi_aarch64}
 (expr_list:REG_DEAD (reg:DI 97)
(nil)))
(insn 7 6 9 2 (set (reg/v:DI 1 x1 [ x1 ])
(reg/v:DI 92 [ arg ])) "test.c":13:26 47 {*movdi_aarch64}
 (nil))
(insn 11 10 14 2 (set (reg/f:DI 93)
(const_int 0 [0])) "test.c":17:3 47 {*movdi_aarch64}
 (expr_list:REG_EQUIV (const_int 0 [0])
(nil)))
(insn 14 11 21 2 (parallel [
(set (reg/v:DI 0 x0 [ x0 ])
(asm_operands/v:DI ("  casp%0, %1, %3, %4, %2
eor %0, %0, %6
eor %1, %1, %7
orr %0, %0, %1
") ("=") 0 [
(reg/v:DI 2 x2 [ x2 ])
(reg/v:DI 3 x3 [ x3 ])
(reg/v:DI 4 x4 [ x4 ])
(reg/f:DI 93)
(reg/v:DI 92 [ arg ])
(reg/v:DI 0 x0 [ x0 ])
(reg/v:DI 1 x1 [ x1 ])
(mem:DI (reg/f:DI 93) [1 MEM[(long unsigned int *)0B]+0 
S8 A128])
]
 [
(asm_input:DI ("r") test.c:17)
(asm_input:DI ("r") test.c:17)
(asm_input:DI ("r") test.c:17)
(asm_input:DI ("r") test.c:17)
(asm_input:DI ("r") test.c:17)
(asm_input:DI ("0") test.c:17)
(asm_input:DI ("1") test.c:17)
(asm_input:DI ("Q") test.c:17)
]
 [] test.c:17))
(set (reg/v:DI 1 x1 [ x1 ])
(asm_operands/v:DI ("  casp%0, %1, %3, %4, %2
eor %0, %0, %6
eor %1, %1, %7
orr %0, %0, %1
") ("=") 1 [
(reg/v:DI 2 x2 [ x2 ])
(reg/v:DI 3 x3 [ x3 ])
(reg/v:DI 4 x4 [ x4 ])
(reg/f:DI 93)
(reg/v:DI 92 [ arg ])
(reg/v:DI 0 x0 [ x0 ])
(reg/v:DI 1 x1 [ x1 ])
(mem:DI (reg/f:DI 93) [1 MEM[(long unsigned int *)0B]+0 
S8 A128])
]
 [
(asm_input:DI ("r") test.c:17)
(asm_input:DI ("r") test.c:17)
(asm_input:DI ("r") test.c:17)
(asm_input:DI ("r") test.c:17)
(asm_input:DI ("r") test.c:17)
(asm_input:DI ("0") test.c:17)
(asm_input:DI ("1") test.c:17)
(asm_input:DI ("Q") test.c:17)
]
 [] test.c:17))
(set (mem:DI (reg/f:DI 93) [1 MEM[(long unsigned int *)0B]+0 S8 
A128])
(asm_operands/v:DI ("  casp%0, %1, %3, %4, %2
eor %0, %0, %6
eor %1, %1, %7
orr %0, %0, %1
") ("=Q") 2 [
(reg/v:DI 2 x2 [ x2 ])
(reg/v:DI 3 x3 [ x3 ])
(reg/v:DI 4 x4 [ x4 ])
(reg/f:DI 93)
(reg/v:DI 92 [ arg ])
(reg/v:DI 0 x0 [ x0 ])
(reg/v:DI 1 x1 [ x1 ])
(mem:DI (reg/f:DI 93) [1 MEM[(long unsigned int *)0B]+0 
S8 A128])
]
 [
(asm_input:DI ("r") test.c:17)
(asm_input:DI ("r") test.c:17)
(asm_input:DI ("r") test.c:17)
(asm_input:DI ("r") test.c:17)
(asm_input:DI ("r") test.c:17)
(asm_input:DI ("0") test.c:17)
(asm_input:DI ("1") test.c:17)
(asm_input:DI ("Q") test.c:17)
]
 [] test.c:17))
(clobber (reg:DI 30 x30))
(clobber (reg:DI 17 x17))
(clobber (reg:DI 16 x16))
]) "test.c":17:3 -1
 (expr_list:REG_DEAD (reg/f:DI 93)
(expr_list:REG_DEAD (reg/v:DI 92 [ arg ])
(expr_list:REG_DEAD (reg/v:DI 4 x4 [ x4 ])
(expr_list:REG_DEAD (reg/v:DI 3 x3 [ x3 ])
(expr_list:REG_DEAD (reg/v:DI 2 x2 [ x2 ])
(expr_list:REG_UNUSED (reg:DI 30 x30)
 

[PR87815]Don't generate shift sequence for load replacement in DSE when the mode size is not compile-time constant

2018-10-31 Thread Renlin Li

Hi all,

The patch adds a check that the gap is a compile-time constant.

This happens when dse decides to replace the load with the previously stored value.
The problem is that the shift sequence cannot accept an operand whose mode size
is not a compile-time constant.
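
For readers less familiar with poly_int, here is a simplified stand-alone model of the guard. It assumes a two-coefficient value c0 + c1 * X, where X >= 0 is only known at run time. The struct and the toy_ functions are made up for this sketch; GCC's poly_int is a C++ template with many more operations.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a two-coefficient poly_int64: the value is c0 + c1 * X,
   where X >= 0 depends on the runtime vector length.  */
struct toy_poly
{
  long long c0;
  long long c1;
};

/* The value is a compile-time constant when it does not depend on X.  */
bool
toy_is_constant (struct toy_poly p)
{
  return p.c1 == 0;
}

/* True if P can differ from the constant K for some X.  */
bool
toy_maybe_ne_const (struct toy_poly p, long long k)
{
  return p.c1 != 0 || p.c0 != k;
}

/* The guard from the patch: only take the shift path for a non-zero gap
   that is known at compile time.  */
bool
toy_use_shift_sequence (struct toy_poly gap)
{
  return toy_is_constant (gap) && toy_maybe_ne_const (gap, 0);
}

/* Usage example.  */
void
toy_poly_examples (void)
{
  assert (toy_use_shift_sequence ((struct toy_poly) { 4, 0 }));
  assert (!toy_use_shift_sequence ((struct toy_poly) { 0, 16 }));
}
```

In this model a gap of { 0, 16 } still satisfies maybe_ne against 0, which is why the unguarded condition took the shift path for it.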

Another issue raised by this PR is the inefficient code generation for
general data manipulation on mask/predicate registers.
In SVE, some general data-processing instructions don't apply to predicate
registers directly. In the worst case (this one), a memory load/store is 
generated to reload
the value into a general-purpose register for further data processing.
We need to improve that.

The aarch64 SVE tests pass. Okay to commit?

Regards,
Renlin

gcc/ChangeLog:

2018-10-31  Renlin Li  

PR target/87815
* dse.c (get_stored_val): Add check for compile-time
  constantness of gap.

gcc/testsuite/ChangeLog:

2018-10-31  Renlin Li  

PR target/87815
* gcc.target/aarch64/sve/pr87815.c: New.
diff --git a/gcc/dse.c b/gcc/dse.c
index cfebfa0e110be56f17337dcb152984d782528889..21d166d92fcc2c2a4dd6d04bb7a7247b79b81a62 100644
--- a/gcc/dse.c
+++ b/gcc/dse.c
@@ -1841,7 +1841,7 @@ get_stored_val (store_info *store_info, machine_mode read_mode,
   else
 gap = read_offset - store_info->offset;
 
-  if (maybe_ne (gap, 0))
+  if (gap.is_constant () && maybe_ne (gap, 0))
 {
   poly_int64 shift = gap * BITS_PER_UNIT;
   poly_int64 access_size = GET_MODE_SIZE (read_mode) + gap;
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr87815.c b/gcc/testsuite/gcc.target/aarch64/sve/pr87815.c
new file mode 100644
index ..628cedb2acce82a86b61944eb6184d7fdbb2d656
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pr87815.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target aarch64_asm_sve_ok } } */
+/* { dg-options "-O3" } */
+int a, b, d;
+short e;
+
+void f ()
+{
+  for (int i = 0; i < 8; i++)
+{
+  e = b >= 2 ?: a >> b;
+  d = e && b;
+}
+}


Re: [PATCH] Initial commit of Networking TS implementation

2018-10-18 Thread Renlin Li

Hi Jonathan,

I saw those tests failed to compile on baremetal targets with the following 
error:
```
libstdc++-v3/include/experimental/io_context:45: fatal error: poll.h: No such 
file or directory
```

Should we add a check to prevent it from running on unsupported platforms?

Thanks!
Renlin

On 10/16/2018 05:15 PM, Jonathan Wakely wrote:

On 16/10/18 17:12 +0100, Jonathan Wakely wrote:

On 16/10/18 16:36 +0100, Jonathan Wakely wrote:

On 16/10/18 16:24 +0100, Jonathan Wakely wrote:

On 12/10/18 11:50 +0100, Jonathan Wakely wrote:

This implementation is very incomplete (see the various TODO comments
in the code) but rather than keeping it out of tree any longer I'm
committing it to trunk. This will allow others to experiment with it
and (I hope) work on finishing it. Either way we'll ship something for
gcc 9. It works OK for some synchronous operations, but most of the
async ops are not done yet.

* include/Makefile.am: Add new headers.
* include/Makefile.in: Regenerate.
* include/experimental/bits/net.h: New header for common
implementation details of Networking TS.
* include/experimental/buffer: New header.
* include/experimental/executor: New header.
* include/experimental/internet: New header.
* include/experimental/io_context: New header.
* include/experimental/net: New header.
* include/experimental/netfwd: New header.
* include/experimental/socket: New header.
* include/experimental/timer: New header.
* testsuite/experimental/net/buffer/arithmetic.cc: New test.
* testsuite/experimental/net/buffer/const.cc: New test.
* testsuite/experimental/net/buffer/creation.cc: New test.
* testsuite/experimental/net/buffer/mutable.cc: New test.
* testsuite/experimental/net/buffer/size.cc: New test.
* testsuite/experimental/net/buffer/traits.cc: New test.
* testsuite/experimental/net/execution_context/use_service.cc: New
test.
* testsuite/experimental/net/headers.cc: New test.
* testsuite/experimental/net/internet/address/v4/comparisons.cc: New
test.
* testsuite/experimental/net/internet/address/v4/cons.cc: New test.
* testsuite/experimental/net/internet/address/v4/creation.cc: New
test.
* testsuite/experimental/net/internet/address/v4/members.cc: New
test.
* testsuite/experimental/net/internet/resolver/base.cc: New test.
* testsuite/experimental/net/internet/resolver/ops/lookup.cc: New
test.
* testsuite/experimental/net/internet/resolver/ops/reverse.cc: New
test.
* testsuite/experimental/net/timer/waitable/cons.cc: New test.
* testsuite/experimental/net/timer/waitable/dest.cc: New test.
* testsuite/experimental/net/timer/waitable/ops.cc: New test.


A minor correction. Committed to trunk.


The tests were written three years ago, before we used effective
targets to control the C++14 dialect used for tests. This fixes them
to use the modern style.


And this makes it a bit more portable (but still a long way from
compiling for mingw).


This fixes a name collision in a test, because various systems (at
least GNU and AIX) define struct ip in .

Tested x86_64-linux and powerpc-aix, committed to trunk.



[AARCH64]Don't force symbols which referencing per-function literal pool into memory

2018-10-16 Thread Renlin Li

Hi all,

"-mcmodel=large" and "-mpc-relative-loads" are used to avoid adrp+add to 
address symbols.
When the combination is used, the original symbol is first forced into 
per-function literal pools.
And a local symbol is created to reference it.

In this case, the way to reference this local symbol is pc relative. According 
to the original logic,
the local symbol will be forced into memory, and another local symbol will be 
created again.

For example, during expand stage,

(insn 5 2 6 2 (set (reg/f:DI 92)
 (mem/u/c:DI (symbol_ref/u:DI ("*.LC1") [flags 0x2]) [0  S8 A64])) 
"imm.c":5 -1
  (expr_list:REG_EQUAL (symbol_ref/u:DI ("*.LC0") [flags 0x2])
 (nil)))
(insn 6 5 10 2 (set (reg:TI 90 [  ])
 (mem/u/c:TI (reg/f:DI 92) [0  S16 A128])) "imm.c":5 -1
  (expr_list:REG_EQUAL (const_wide_int 0x123456789abcdef0fedcba987654321)
 (nil)))

However, CSE will later replace the memory load in insn 5 with its equivalent value 
(symbol_ref/u:DI ("*.LC0")),
so there is no issue in the final code generation.
If CSE is not enabled, for example at -O0, a load will be 
generated.

The patch here simplifies the rtx at the expand stage.
Instead of forcing the local symbol into memory again, the symbol is classified 
as SYMBOL_TINY_ABSOLUTE.
The following rtx is generated directly.

(insn 5 2 6 2 (set (reg/f:DI 92)
 (symbol_ref/u:DI ("*.LC0") [flags 0x2])) "imm.c":5 -1
  (nil))
(insn 6 5 10 2 (set (reg:TI 90 [  ])
 (mem/u/c:TI (reg/f:DI 92) [0  S16 A128])) "imm.c":5 -1
  (expr_list:REG_EQUAL (const_wide_int 0x123456789abcdef0fedcba987654321)
 (nil)))

A similar change handles literal pool references in the small memory 
model when the option
"-mpc-relative-loads" is present. The symbol is classified as 
SYMBOL_TINY_ABSOLUTE instead of SYMBOL_SMALL_ABSOLUTE,
so that a single ADR instruction can be used in the final code generation 
instead of ADRP+ADD.
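
The decision for the large memory model can be summarised with a small stand-alone sketch. The enum and function are toy names that mirror, but are not, the real aarch64_classify_symbol; the two booleans stand in for CONSTANT_POOL_ADDRESS_P (x) and aarch64_pcrelative_literal_loads.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_symbol_type
{
  TOY_SYMBOL_TINY_ABSOLUTE,   /* addressed with a single ADR */
  TOY_SYMBOL_SMALL_ABSOLUTE,  /* addressed with ADRP+ADD */
  TOY_SYMBOL_FORCE_TO_MEM     /* loaded from a literal pool */
};

/* Classification for the large code model as described above.  */
enum toy_symbol_type
toy_classify_large (bool constant_pool_address, bool pcrelative_literal_loads)
{
  if (!constant_pool_address)
    return TOY_SYMBOL_FORCE_TO_MEM;
  return pcrelative_literal_loads ? TOY_SYMBOL_TINY_ABSOLUTE
                                  : TOY_SYMBOL_SMALL_ABSOLUTE;
}

/* Usage example.  */
void
toy_classify_examples (void)
{
  /* A per-function literal pool entry is addressed directly.  */
  assert (toy_classify_large (true, true) == TOY_SYMBOL_TINY_ABSOLUTE);
  /* Any other symbol still goes through the literal pool.  */
  assert (toy_classify_large (false, true) == TOY_SYMBOL_FORCE_TO_MEM);
}
```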

A test case is added, which should be applicable at O0 optimization level for 
all memory models (tiny, small and large).

Okay to commit?


gcc/ChangeLog:

2018-10-15  Renlin Li  

 * config/aarch64/aarch64.c (aarch64_classify_symbol): Direct address
 symbols referencing per function literal pool.

gcc/testsuite/ChangeLog:

2018-10-15  Renlin Li  

 * gcc.target/aarch64/pr79041-2.c: Skip when mcmodel is defined by
 dejagnu configuration.
 * gcc.target/aarch64/pr79041-3.c: New.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 2f98a21acf16297bdd7c4742cbcfc695cdc4e5f9..bd6369ca08707cca59809bf970830238b05fa2bc 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -11845,7 +11845,13 @@ aarch64_classify_symbol (rtx x, HOST_WIDE_INT offset)
 	  || !IN_RANGE (offset, HOST_WIDE_INT_C (-4294967263),
 			HOST_WIDE_INT_C (4294967264)))
 	return SYMBOL_FORCE_TO_MEM;
-	  return SYMBOL_SMALL_ABSOLUTE;
+	  /* Use ADR when addressing per-function constant pool if
+	 pcrelative_literal_loads is enabled.  */
+	  else if (CONSTANT_POOL_ADDRESS_P (x)
+		   && aarch64_pcrelative_literal_loads)
+	return SYMBOL_TINY_ABSOLUTE;
+	  else
+	return SYMBOL_SMALL_ABSOLUTE;
 
 	case AARCH64_CMODEL_TINY_PIC:
 	  if (!aarch64_symbol_binds_local_p (x))
@@ -11857,14 +11863,24 @@ aarch64_classify_symbol (rtx x, HOST_WIDE_INT offset)
 	  if (!aarch64_symbol_binds_local_p (x))
 	return (aarch64_cmodel == AARCH64_CMODEL_SMALL_SPIC
 		?  SYMBOL_SMALL_GOT_28K : SYMBOL_SMALL_GOT_4G);
-	  return SYMBOL_SMALL_ABSOLUTE;
+	  /* Use ADR when addressing per-function constant pool if
+	 pcrelative_literal_loads is enabled.  */
+	  else if (CONSTANT_POOL_ADDRESS_P (x)
+		   && aarch64_pcrelative_literal_loads)
+	return SYMBOL_TINY_ABSOLUTE;
+	  else
+	return SYMBOL_SMALL_ABSOLUTE;
 
 	case AARCH64_CMODEL_LARGE:
-	  /* This is alright even in PIC code as the constant
-	 pool reference is always PC relative and within
-	 the same translation unit.  */
-	  if (!aarch64_pcrelative_literal_loads && CONSTANT_POOL_ADDRESS_P (x))
-	return SYMBOL_SMALL_ABSOLUTE;
+	  if (CONSTANT_POOL_ADDRESS_P (x))
+	{
+	  /* Use ADR when addressing per-function constant pool if
+		 pcrelative_literal_loads is enabled.  */
+	  if (aarch64_pcrelative_literal_loads)
+		return SYMBOL_TINY_ABSOLUTE;
+	  else
+		return SYMBOL_SMALL_ABSOLUTE;
+	}
 	  else
 	return SYMBOL_FORCE_TO_MEM;
 
diff --git a/gcc/testsuite/gcc.target/aarch64/pr79041-2.c b/gcc/testsuite/gcc.target/aarch64/pr79041-2.c
index 4695b5c1b2b7c9b515995e242dd38e0519a48a2b..42695be127db454934d2791474d5f97fc5667403 100644
--- a/gcc/testsuite/gcc.target/aarch64/pr79041-2.c
+++ b/gcc/testsuite/gcc.target/aarch64/pr79041-2.c
@@ -2,6 +2,7 @@
 /* { dg-options "-O2 -mcmodel=large -mpc-relative-litera

[PR87563][AARCH64-SVE]: Don't keep ifcvt loop when COND_ ifn could not be vectorized.

2018-10-12 Thread Renlin Li

Hi all,

ifcvt will create a versioned loop, and it will permissively generate
scalar COND_ ifns.

If COND_ cannot be vectorized in the loop vectorizer pass,
the if-converted loop should be abandoned when the target doesn't support
such an ifn.
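
A stand-alone sketch of the intended check follows. The enum, the callback type and all toy_ names are made up, and the exact condition in the patch may differ; this only models "keep the if-converted loop only if it gets vectorized" for masked loads/stores and for ifns that the target cannot expand in scalar form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_ifn
{
  TOY_NOT_INTERNAL,
  TOY_IFN_MASK_LOAD,
  TOY_IFN_MASK_STORE,
  TOY_IFN_COND_DIV
};

typedef bool (*toy_supported_fn) (enum toy_ifn);

/* Return true if the if-converted loop body is only usable once it has
   been vectorized, so it must be discarded when vectorization fails.  */
bool
toy_require_loop_vectorize (const enum toy_ifn *calls, size_t n,
                            toy_supported_fn supported_p)
{
  assert (supported_p != NULL);
  for (size_t i = 0; i < n; i++)
    {
      enum toy_ifn ifn = calls[i];
      if (ifn == TOY_NOT_INTERNAL)
        continue;
      if (ifn == TOY_IFN_MASK_LOAD || ifn == TOY_IFN_MASK_STORE
          || !supported_p (ifn))
        return true;
    }
  return false;
}

/* Hypothetical target without a scalar COND_DIV expansion.  */
bool
toy_no_cond_div_p (enum toy_ifn ifn)
{
  return ifn != TOY_IFN_COND_DIV;
}

/* Hypothetical target that can expand every ifn.  */
bool
toy_all_supported_p (enum toy_ifn ifn)
{
  (void) ifn;
  return true;
}
```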

As COND_ is currently only used by the aarch64 SVE extension,
I only ran the aarch64-sve testsuite; there is no change to the results.

Okay to commit?

Regards,
Renlin


gcc/ChangeLog:

2018-10-12  Renlin Li  

PR target/87563
* tree-vectorizer.c (try_vectorize_loop_1): Don't use
if-converted loop when it contains ifn with types not
supported by backend.
* internal-fn.c (expand_direct_optab_fn): Add an assert.
(direct_internal_fn_supported_p): New helper function.
* internal-fn.h (direct_internal_fn_supported_p): Declare.

gcc/testsuite/ChangeLog:

2018-10-12  Renlin Li  

PR target/87563
* gcc.target/aarch64/sve/pr87563.c: New.
diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
index 99765cf407acc7d65356b156e91f9dc51f1dba34..ff3bace1ce643ee10e1f776efffa01af31b6bbe7 100644
--- a/gcc/internal-fn.h
+++ b/gcc/internal-fn.h
@@ -187,6 +187,7 @@ extern bool direct_internal_fn_supported_p (internal_fn, tree_pair,
 	optimization_type);
 extern bool direct_internal_fn_supported_p (internal_fn, tree,
 	optimization_type);
+extern bool direct_internal_fn_supported_p (gcall *, optimization_type);
 
 /* Return true if FN is supported for types TYPE0 and TYPE1 when the
optimization type is OPT_TYPE.  The types are those associated with
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 34d4f9efab9a45e0a9e3622f37dab0fa417b76f7..d082dd5054fa7175ffd3a53414b1ef42a1fca14e 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -2890,6 +2890,7 @@ expand_direct_optab_fn (internal_fn fn, gcall *stmt, direct_optab optab,
 
   tree_pair types = direct_internal_fn_types (fn, stmt);
   insn_code icode = direct_optab_handler (optab, TYPE_MODE (types.first));
+  gcc_assert (icode != CODE_FOR_nothing);
 
   tree lhs = gimple_call_lhs (stmt);
   rtx lhs_rtx = NULL_RTX;
@@ -3183,6 +3184,17 @@ direct_internal_fn_supported_p (internal_fn fn, tree type,
   return direct_internal_fn_supported_p (fn, tree_pair (type, type), opt_type);
 }
 
+/* Return true if the STMT is supported when the optimization type is OPT_TYPE,
+   given that STMT is a call to a direct internal function.  */
+
+bool
+direct_internal_fn_supported_p (gcall *stmt, optimization_type opt_type)
+{
+  internal_fn fn = gimple_call_internal_fn (stmt);
+  tree_pair types = direct_internal_fn_types (fn, stmt);
+  return direct_internal_fn_supported_p (fn, types, opt_type);
+}
+
 /* If FN is commutative in two consecutive arguments, return the
index of the first, otherwise return -1.  */
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr87563.c b/gcc/testsuite/gcc.target/aarch64/sve/pr87563.c
new file mode 100644
index ..83553b7ceea7199b5afb9f5adab50f15f9e41d55
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pr87563.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -fdump-tree-ifcvt-details -fdump-tree-vect" } */
+
+int a, b, c, *e;
+int d[2];
+
+void f ()
+{
+  while (c)
+{
+  d[0] = 4;
+  d[1] = 4;
+  *e = b == 0 ? 0 : a / b;
+}
+}
+
+/* { dg-final { scan-tree-dump "COND_DIV" "ifcvt" } } */
+/* { dg-final { scan-tree-dump-not "COND_DIV" "vect" } } */
diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
index 747fb67ba13e80309377e66945a40f5e48f186c5..e70bd60493b22699b6873463281cc337a888f887 100644
--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@@ -80,6 +80,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "attribs.h"
 #include "gimple-pretty-print.h"
 #include "opt-problem.h"
+#include "internal-fn.h"
 
 
 /* Loop or bb location, with hotness information.  */
@@ -898,23 +899,30 @@ try_vectorize_loop_1 (hash_table *_to_vf_htab,
 	  && ! loop->inner)
 	{
 	  basic_block bb = loop->header;
-	  bool has_mask_load_store = false;
+	  bool require_loop_vectorize = false;
 	  for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
 	   !gsi_end_p (gsi); gsi_next ())
 	{
 	  gimple *stmt = gsi_stmt (gsi);
-	  if (is_gimple_call (stmt)
-		  && gimple_call_internal_p (stmt)
-		  && (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD
-		  || gimple_call_internal_fn (stmt) == IFN_MASK_STORE))
+	  gcall *call = dyn_cast <gcall *> (stmt);
+	  if (call && gimple_call_internal_p (call))
 		{
-		  has_mask_load_store = true;
-		  break;
+		  internal_fn ifn = gimple_call_internal_fn (call);
+		  if (ifn == IFN_MASK_LOAD || ifn == IFN_MASK_STORE
+		  /* Don't keep the if-converted parts when the ifn with
+			 specifc type is not supported by the backend.  */
+		  || (direct_internal

Re: [PATCH] Redirect call within specific target attribute among MV clones (PR ipa/82625).

2018-10-08 Thread Renlin Li

Hi Martin,

pr82625.C fails on compilers that don't support the "default" and "avx" 
targets,
for example the native arm/aarch64 Linux GCC.


As I found in this gcc wiki: https://gcc.gnu.org/wiki/FunctionMultiVersioning
'''
This support is available in GCC 4.8 and later. Support is only available in 
C++ for i386 targets.
'''

Should the test be guarded with a target selector or require function 
multi-versioning instead of ifunc?

Regards,
Renlin


On 10/04/2018 02:56 PM, Martin Liška wrote:

Hi.

When having a pair of target clones where foo calls bar, if the target
attribute are equal we can redirect the call and not use ifunc dispatcher.

Patch survives regression tests on x86_64-linux-gnu.
Ready for trunk?

Martin

gcc/ChangeLog:

2018-10-04  Martin Liska  

PR ipa/82625
* multiple_target.c (redirect_to_specific_clone): New function.
(ipa_target_clone): Use it.
* tree-inline.c: Fix comment.

gcc/testsuite/ChangeLog:

2018-10-04  Martin Liska  

PR ipa/82625
* g++.dg/ext/pr82625.C: New test.
---
  gcc/multiple_target.c  | 51 ++
  gcc/testsuite/g++.dg/ext/pr82625.C | 36 +
  gcc/tree-inline.c  |  2 +-
  3 files changed, 88 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/ext/pr82625.C




Re: [PATCH][PR84877]Dynamically align the address for local parameter copy on the stack when required alignment is larger than MAX_SUPPORTED_STACK_ALIGNMENT

2018-07-23 Thread Renlin Li

Hi Jeff,

On 06/29/2018 08:34 PM, Jeff Law wrote:

On 03/22/2018 05:56 AM, Renlin Li wrote:

Hi all,

As described in PR84877. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84877
The local copy of parameter on stack is not aligned.

For BLKmode parameters, a local copy on the stack will be saved.
There are three cases:
1) arguments passed partially on the stack, partially via registers.
2) arguments passed fully on the stack.
3) arguments passed via registers.

After the change here, in all three cases, the stack slot for the local 
parameter copy is aligned by the data type.

Presumably this is only for named arguments.  If we have to deal with
stdarg/varargs there's a number of additional complications we'd need to
worry about.




The stack slot is the DECL_RTL of the parameter. All the references thereafter 
in the function will refer to this RTL.

OK.  This implies we're dealing strictly with named arguments.


Yes, only for named arguments.







To populate the local copy on the stack,
For case 1) and 2), there are operations to move data from the caller's stack 
(from incoming rtl) into callee's stack.

For case 3), the registers are directly saved into the stack slot.

In all cases, the destination address is properly aligned.
But for case 1) and case 2), the source address is not aligned by the type. It 
is defined by the PCS how the arguments are prepared.

I'm not 100% sure the destination is always aligned.  I vaguely recall
the PA being an oddball on this kind of stuff.





The block move operation is fulfilled by emit_block_move (). As far as I can 
see,

Yes.  But we may have had to flush argument registers to memory prior to
using emit_block_move.  And the flushing operation can be odd because of
things like alignment, padding, etc.  The PA in particular was an
oddball here, but I don't remember the precise details.


I might not have phrased it properly. A block move will be generated for parameters 
passed on the stack.
If the parameters are of BLKmode but passed via registers (the case targeted by this 
patch), in the general case the
register will be moved into its stack slot first.

And the stack slot is dynamically aligned with this patch.






it will use the smaller alignment of source and destination.
This looks fine as long as we don't use instructions that strictly require a 
larger alignment than the address actually has.

Right.





Here, it only changes receiving parameters.
The function assign_stack_local_1 will be called in various places.
Usually, the caller will constrain the ALIGN parameter. For example via 
STACK_SLOT_ALIGNMENT macro.

assign_parm_setup_block will call assign_stack_local () with alignment from the 
parameter type which in this case could be

larger than MAX_SUPPORTED_STACK_ALIGNMENT.

The alignment operation for parameter copy on the stack is similar to stack 
vars.

First, enough space is reserved on the stack. The size is fixed at compile time.

Instructions are emitted to dynamically get an aligned address at runtime 
within this piece of memory.

At least that's how it's supposed to work.  I have some concerns about
the existing dynamic alignment bits independent of your change.


I will double check.






This will unavoidably increase the usage of stack. However, it really depends on

how many over-aligned parameters are passed by value.

It's relatively rare in my experience, so I wouldn't let this get in the
way.




x86-linux, arm-none-eabi, aarch64-none-elf regression test Okay.
linux-armhf bootstrap Okay.
   
I assume there are other targets which will be affected by the change.

But I don't have environment to test.

I don't think my tester will help much here as over-aligned parameters
are relatively rare and likely not triggered by bootstraps.



Okay to commit?
   


Regards,
Renlin

gcc/

2018-03-22  Renlin Li  

 PR middle-end/84877
 * explow.h (get_dynamic_stack_size): Declare it as external.
 * explow.c (record_new_stack_level): Remove function static attribute.
 * function.c (assign_stack_local_1): Dynamically align the stack slot
 addr for parameter copy on the stack.

gcc/testsuite/

2018-03-22  Renlin Li  

 PR middle-end/84877
 * gcc.dg/pr84877.c: New.

OK.  Certainly keep an eye out for issues on other targets.
Jeff



Thanks! I will run the regression tests again and commit it if there are no 
regressions.
I will keep track of any related issues and take care of them.


Thanks,
Renlin




Re: Re: [GCC][PATCH][Aarch64] Exploiting BFXIL when OR-ing two AND-operations with appropriate bitmasks

2018-07-23 Thread Renlin Li

+(define_insn "*aarch64_bfxil"
+  [(set (match_operand:GPI 0 "register_operand" "=r,r")
+(ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "r,0")
+   (match_operand:GPI 3 "const_int_operand" "n, Ulc"))
+   (and:GPI (match_operand:GPI 2 "register_operand" "0,r")
+   (match_operand:GPI 4 "const_int_operand" "Ulc, n"))))]
+  "INTVAL (operands[3]) == ~INTVAL (operands[4])"
+  {
+switch (which_alternative)
+{
+  case 0:
+   operands[3] = GEN_INT (ceil_log2 (INTVAL (operands[3])));
+   return "bfxil\\t%0, %1, 0, %3";
+  case 1:
+   operands[3] = GEN_INT (ceil_log2 (INTVAL (operands[4])));
+   return "bfxil\\t%0, %2, 0, %3";
+  default:
+   gcc_unreachable ();
+}
+  }
+  [(set_attr "type" "bfm")]
+)


Hi Sam,

Is it possible that operands[3] or operands[4] is 1?

ceil_log2() could return 0 if the argument is 1.
The width field of the resulting instruction will be 0. Is it still correct?
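
To make the concern concrete, here is a small stand-alone sketch. It assumes ceil_log2 (x) is floor_log2 (x - 1) + 1 for non-zero x, which is how I understand the helper in hwint.h; the toy_ names are made up.

```c
#include <assert.h>

/* Position of the highest set bit, or -1 for zero.  */
int
toy_floor_log2 (unsigned long long x)
{
  int n = -1;
  while (x != 0)
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* Smallest n such that 2^n >= x, with 0 returned for x == 0.  */
int
toy_ceil_log2 (unsigned long long x)
{
  return x == 0 ? 0 : toy_floor_log2 (x - 1) + 1;
}

/* Usage example.  */
void
toy_ceil_log2_examples (void)
{
  /* A 32-bit low mask gives the expected width of 32.  */
  assert (toy_ceil_log2 (0xFFFFFFFFULL) == 32);
  /* A mask of 1 has one set bit, but the computed width is 0.  */
  assert (toy_ceil_log2 (1) == 0);
}
```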

Regards,
Renlin



On 07/20/2018 10:33 AM, Sam Tebbs wrote:

Please disregard the original patch and see this updated version.


On 07/20/2018 10:31 AM, Sam Tebbs wrote:

Hi all,

Here is an updated patch that does the following:

* Adds a new constraint in config/aarch64/constraints.md to check for a constant integer that is left consecutive. This addresses Richard 
Henderson's suggestion about combining the aarch64_is_left_consecutive call and the const_int match in the pattern.


* Merges the two patterns defined into one.

* Changes the pattern's type attribute to bfm.

* Improved the comment above the aarch64_is_left_consecutive implementation.

* Makes the pattern use the GPI iterator to accept smaller integer sizes (an 
extra test is added to check for this).

* Improves the tests in combine_bfxil.c to ensure they aren't optimised away 
and that they check for the pattern's correctness.

Below is a new changelog and the example given before.

Is this OK for trunk?

This patch adds an optimisation that exploits the AArch64 BFXIL instruction
when or-ing the result of two bitwise and operations with non-overlapping
bitmasks (e.g. (a & 0x) | (b & 0x)).

Example:

unsigned long long combine(unsigned long long a, unsigned long long b) {
  return (a & 0xll) | (b & 0xll);
}

void read(unsigned long long a, unsigned long long b, unsigned long long *c) {
  *c = combine(a, b);
}

When compiled with -O2, read would result in:

read:
  and   x5, x1, #0x
  and   x4, x0, #0x
  orr   x4, x4, x5
  str   x4, [x2]
  ret

But with this patch results in:

read:
  mov    x4, x0
  bfxil    x4, x1, 0, 32
  str    x4, [x2]
  ret
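The equivalence the patch relies on can be sketched in plain C. The mask values below are assumptions chosen to match the "bfxil x4, x1, 0, 32" output above (the constants in the example are elided in this copy), and bfxil_sketch and combine_sketch are hypothetical names that model the instruction; they are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Model of BFXIL: copy WIDTH bits starting at bit LSB of SRC into the
   low WIDTH bits of DST, leaving the remaining bits of DST unchanged.  */
static uint64_t
bfxil_sketch (uint64_t dst, uint64_t src, unsigned lsb, unsigned width)
{
  assert (width >= 1 && lsb < 64 && width <= 64 - lsb);
  uint64_t low_mask = width == 64 ? ~UINT64_C (0)
				  : (UINT64_C (1) << width) - 1;
  return (dst & ~low_mask) | ((src >> lsb) & low_mask);
}

/* The source-level form, with assumed complementary masks.  */
static uint64_t
combine_sketch (uint64_t a, uint64_t b)
{
  return (a & UINT64_C (0xffffffff00000000))
	 | (b & UINT64_C (0x00000000ffffffff));
}
```

Under these assumptions, combine_sketch (a, b) and bfxil_sketch (a, b, 0, 32) give the same result for any a and b, which is why the two AND operations and the ORR can collapse into a move plus one BFXIL.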



Bootstrapped and regtested on aarch64-none-linux-gnu and aarch64-none-elf with 
no regressions.


gcc/
2018-07-11  Sam Tebbs  

    PR target/85628
    * config/aarch64/aarch64.md (*aarch64_bfxil):
    Define.
    * config/aarch64/constraints.md (Ulc): Define
    * config/aarch64/aarch64-protos.h (aarch64_is_left_consecutive):
    Define.
    * config/aarch64/aarch64.c (aarch64_is_left_consecutive): New function.

gcc/testsuite
2018-07-11  Sam Tebbs  

    PR target/85628
    * gcc.target/aarch64/combine_bfxil.c: New file.
    * gcc.target/aarch64/combine_bfxil_2.c: New file.


On 07/19/2018 02:02 PM, Sam Tebbs wrote:

Hi Richard,

Thanks for the feedback. I find that using "is_left_consecutive" is more descriptive than checking for it being a power of 2 - 1, since it 
describes the requirement (having consecutive ones from the MSB) more explicitly. I would be happy to change it though if that is the consensus.


I have addressed your point about just returning the string instead of using output_asm_insn and have changed it locally. I'll send an updated 
patch soon.



On 07/17/2018 02:33 AM, Richard Henderson wrote:

On 07/16/2018 10:10 AM, Sam Tebbs wrote:

+++ b/gcc/config/aarch64/aarch64.c
@@ -1439,6 +1439,14 @@ aarch64_hard_regno_caller_save_mode (unsigned regno, 
unsigned,
  return SImode;
  }
  +/* Implement IS_LEFT_CONSECUTIVE.  Check if an integer's bits are consecutive
+   ones from the MSB.  */
+bool
+aarch64_is_left_consecutive (HOST_WIDE_INT i)
+{
+  return (i | (i - 1)) == HOST_WIDE_INT_M1;
+}
+

...

+(define_insn "*aarch64_bfxil"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+    (ior:DI (and:DI (match_operand:DI 1 "register_operand" "r")
+    (match_operand 3 "const_int_operand"))
+    (and:DI (match_operand:DI 2 "register_operand" "0")
+    (match_operand 4 "const_int_operand"))))]
+  "INTVAL (operands[3]) == ~INTVAL (operands[4])
+    && aarch64_is_left_consecutive (INTVAL (operands[4]))"

Better is to use a define_predicate to merge both that second test and the
const_int_operand.

(I'm not sure about the "left_consecutive" language either.
Isn't it more descriptive to say that op3 is a power of 2 minus 1?)
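The two descriptions can be compared with a small sketch. It uses uint64_t rather than HOST_WIDE_INT to avoid signed overflow, and both function names are hypothetical; the first mirrors the expression in the patch, the second is one common way to test for a power of 2 minus 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the patch's test: the set bits of I form one run that
   starts at the MSB.  */
static bool
is_left_consecutive_sketch (uint64_t i)
{
  return (i | (i - 1)) == ~UINT64_C (0);
}

/* True if M has the form 2^k - 1, i.e. a run of ones from bit 0.  */
static bool
is_pow2_minus1_sketch (uint64_t m)
{
  return (m & (m + 1)) == 0;
}
```

Since the pattern requires operands[3] == ~operands[4], saying that operands[4] is left consecutive is the same as saying that operands[3] is a power of 2 minus 1; in this sketch is_left_consecutive_sketch (i) and is_pow2_minus1_sketch (~i) agree for every i.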

(define_predicate "pow2m1_operand"
   (and (match_code 

Re: [PATCH]Correct comment for ADDR_EXPR tree code.

2018-03-24 Thread Renlin Li

Hi Jeff,

On 23/03/18 23:19, Jeff Law wrote:

On 03/23/2018 09:44 AM, Renlin Li wrote:

Hi all,

This is a simple patch to correct the comment for ADDR_EXPR tree code.

The resulting expression of ADDR_EXPR is a tree with POINTER_TYPE.
So the result mode should be ptr_mode instead of Pmode.

As far as I understand, Pmode is the addressing mode, not the mode
used to represent a pointer (or address?).

Okay to commit?

Regards,
Renlin

gcc/ChangeLog:

2018-03-23  Renlin Li  <renlin...@arm.com>

 * tree.def (ADDR_EXPR): Correct the comment.

I'm not sure this is strictly correct.  More importantly, I'm not sure
why we care :-0

Modes are more of a target/RTL issue.  Why a tree node needs to specify
a mode in this case vs a type seems to be the more important question.


It is a very minor issue; I just came across the comment, which doesn't seem quite 
right.

I agree, the type is more meaningful than the machine mode to describe a tree 
node.
The result of ADDR_EXPR should be an expression of POINTER_TYPE or 
REFERENCE_TYPE as the document indicates.

I can replace the sentence "Result mode is Pmode." with "The result expression will 
always have pointer or reference type."

Thanks!
Renlin




jeff





[PATCH]Correct comment for ADDR_EXPR tree code.

2018-03-23 Thread Renlin Li

Hi all,

This is a simple patch to correct the comment for ADDR_EXPR tree code.

The resulting expression of ADDR_EXPR is a tree with POINTER_TYPE.
So the result mode should be ptr_mode instead of Pmode.

As far as I understand, Pmode is the addressing mode, not the mode used to 
represent a pointer (or address?).

Okay to commit?

Regards,
Renlin

gcc/ChangeLog:

2018-03-23  Renlin Li  <renlin...@arm.com>

* tree.def (ADDR_EXPR): Correct the comment.
diff --git a/gcc/tree.def b/gcc/tree.def
index 31de6c0994de43c175b924d4ba578a131fb4d524..1e5aca811f801c54be9215a9d86028f50a4ec608 100644
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -870,7 +870,7 @@ DEFTREECODE (COMPOUND_LITERAL_EXPR, "compound_literal_expr", tcc_expression, 1)
 DEFTREECODE (SAVE_EXPR, "save_expr", tcc_expression, 1)
 
 /* & in C.  Value is the address at which the operand's value resides.
-   Operand may have any mode.  Result mode is Pmode.  */
+   Operand may have any mode.  Result mode is ptr_mode.  */
 DEFTREECODE (ADDR_EXPR, "addr_expr", tcc_expression, 1)
 
 /* Operand0 is a function constant; result is part N of a function


Re: [PATCH][PR84877]Dynamically align the address for local parameter copy on the stack when required alignment is larger than MAX_SUPPORTED_STACK_ALIGNMENT

2018-03-22 Thread Renlin Li

Hi H.J.

On 22/03/18 12:55, H.J. Lu wrote:

On Thu, Mar 22, 2018 at 5:52 AM, H.J. Lu <hjl.to...@gmail.com> wrote:

On Thu, Mar 22, 2018 at 4:56 AM, Renlin Li <renlin...@foss.arm.com> wrote:

Hi all,

As described in PR84877. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84877
The local copy of parameter on stack is not aligned.

For BLKmode parameters, a local copy on the stack will be saved.
There are three cases:
1) arguments passed partially on the stack, partially via registers.
2) arguments passed fully on the stack.
3) arguments passed via registers.

After the change here, in all three cases, the stack slot for the local
parameter copy is aligned by the data type.
The stack slot is the DECL_RTL of the parameter. All the references
thereafter in the function will refer to this RTL.

To populate the local copy on the stack,
For case 1) and 2), there are operations to move data from the caller's
stack (from incoming rtl) into callee's stack.
For case 3), the registers are directly saved into the stack slot.

In all cases, the destination address is properly aligned.
But for case 1) and case 2), the source address is not aligned by the type.
It is defined by the PCS how the arguments are prepared.
The block move operation is fulfilled by emit_block_move (). As far as I can
see,
it will use the smaller alignment of source and destination.
This looks fine as long as we don't use instructions which require a strictly
larger alignment than the address actually has.

Here, it only changes receiving parameters.
The function assign_stack_local_1 will be called in various places.
Usually, the caller will constrain the ALIGN parameter. For example via
STACK_SLOT_ALIGNMENT macro.
assign_parm_setup_block will call assign_stack_local () with alignment from
the parameter type which in this case could be
larger than MAX_SUPPORTED_STACK_ALIGNMENT.

The alignment operation for parameter copy on the stack is similar to stack
vars.
First, enough space is reserved on the stack. The size is fixed at compile
time.
Instructions are emitted to dynamically get an aligned address at runtime
within this piece of memory.

This will unavoidably increase the usage of stack. However, it really
depends on
how many over-aligned parameters are passed by value.

x86-linux, arm-none-eabi, aarch64-none-elf regression test Okay.
linux-armhf bootstrap Okay.

I assume there are other targets which will be affected by the change.
But I don't have environment to test.

Okay to commit?


Regards,
Renlin

gcc/

2018-03-22  Renlin Li  <renlin...@arm.com>

 PR middle-end/84877
 * explow.h (get_dynamic_stack_size): Declare it as external.
 * explow.c (record_new_stack_level): Remove function static
attribute.
 * function.c (assign_stack_local_1): Dynamically align the stack
slot
 addr for parameter copy on the stack.

gcc/testsuite/

2018-03-22  Renlin Li  <renlin...@arm.com>

 PR middle-end/84877
 * gcc.dg/pr84877.c: New.

^^

But your patch has

diff --git a/gcc/testsuite/gcc.target/arm/pr84877.c
b/gcc/testsuite/gcc.target/arm/pr84877.c
new file mode 100644
index 
..b2df8a954f566dea3a4f1ed70572b43de39dda82
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr84877.c

Why is this ARM specific?

I attached the wrong patch. The testcase should really be 
gcc/testsuite/gcc.dg/pr84877.c



+#include 
+
+struct U {
+int M0;
+int M1;
+} __attribute ((aligned(16)));
 Some targets align stack to 16 bytes by default.
You need 32 byte alignment to better test stack alignment.


Yes, you are right here.

IIUC, for x86 and aarch64, the STRICT_ALIGNMENT macro is false and 
STACK_BOUNDARY is 128.
For this particular test case, the patch doesn't make any difference.

As the code here indicates, for parameters passed on the stack fully or 
partially, a local copy
will be allocated when the following condition is true. Otherwise, it will use 
the stack space prepared by the caller.
And for that part, it is ABI specific.


  /* If we can't trust the parm stack slot to be aligned enough for its
 ultimate type, don't use that slot after entry.  We'll make another
 stack slot, if we need one.  */
  if (stack_parm
  && ((STRICT_ALIGNMENT
   && GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm))
  || (data->nominal_type
  && TYPE_ALIGN (data->nominal_type) > MEM_ALIGN (stack_parm)
  && MEM_ALIGN (stack_parm) < PREFERRED_STACK_BOUNDARY)))
stack_parm = NULL;


So even if I made the alignment 32 bytes, the value would in this case be passed on 
the stack, and
the stack slot from the caller would be used. No local copy would be created.
It won't show any code-gen difference on the x86 target.

The alignment test will test the stack address prepared by the caller.
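To illustrate, the condition quoted above can be restated as a standalone predicate. This is only a sketch: the function name and the plain integer parameters are hypothetical stand-ins for STRICT_ALIGNMENT, GET_MODE_ALIGNMENT, TYPE_ALIGN, MEM_ALIGN and PREFERRED_STACK_BOUNDARY (all in bits), and the sample values in the note below are assumptions rather than measurements.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true when the incoming stack slot would be dropped
   (stack_parm = NULL), so that a separate local slot is needed.  */
static bool
drops_incoming_slot_sketch (bool have_stack_parm, bool strict_alignment,
			    unsigned mode_align, bool have_type,
			    unsigned type_align, unsigned mem_align,
			    unsigned preferred_boundary)
{
  assert (mem_align != 0);
  return have_stack_parm
	 && ((strict_alignment && mode_align > mem_align)
	     || (have_type
		 && type_align > mem_align
		 && mem_align < preferred_boundary));
}
```

Assuming a non-strict-alignment target where the incoming slot is already aligned to a 128-bit preferred boundary, a 256-bit (32-byte) type alignment still returns false here, which matches the observation that the caller's slot is reused.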


Regards,
Renlin




+



[PATCH][PR84877]Dynamically align the address for local parameter copy on the stack when required alignment is larger than MAX_SUPPORTED_STACK_ALIGNMENT

2018-03-22 Thread Renlin Li

Hi all,

As described in PR84877. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84877
The local copy of parameter on stack is not aligned.

For BLKmode parameters, a local copy on the stack will be saved.
There are three cases:
1) arguments passed partially on the stack, partially via registers.
2) arguments passed fully on the stack.
3) arguments passed via registers.

After the change here, in all three cases, the stack slot for the local 
parameter copy is aligned by the data type.
The stack slot is the DECL_RTL of the parameter. All the references thereafter 
in the function will refer to this RTL.

To populate the local copy on the stack,
For case 1) and 2), there are operations to move data from the caller's stack 
(from incoming rtl) into callee's stack.
For case 3), the registers are directly saved into the stack slot.

In all cases, the destination address is properly aligned.
But for case 1) and case 2), the source address is not aligned by the type. It 
is defined by the PCS how the arguments are prepared.
The block move operation is fulfilled by emit_block_move (). As far as I can 
see,
it will use the smaller alignment of source and destination.
This looks fine as long as we don't use instructions which require a strictly 
larger alignment than the address actually has.

Here, it only changes receiving parameters.
The function assign_stack_local_1 will be called in various places.
Usually, the caller will constrain the ALIGN parameter. For example via 
STACK_SLOT_ALIGNMENT macro.
assign_parm_setup_block will call assign_stack_local () with alignment from the 
parameter type which in this case could be
larger than MAX_SUPPORTED_STACK_ALIGNMENT.

The alignment operation for parameter copy on the stack is similar to stack 
vars.
First, enough space is reserved on the stack. The size is fixed at compile time.
Instructions are emitted to dynamically get an aligned address at runtime 
within this piece of memory.
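The idea can be sketched outside the compiler as follows. This is a minimal model, not the code in the patch: the helper names are hypothetical, alignments are in bytes and assumed to be powers of two, and the real implementation emits RTL via align_dynamic_address instead of computing on integers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Space to reserve at compile time so that an ALIGN-aligned block of
   SIZE bytes always fits, whatever the start address turns out to be.  */
static size_t
padded_size_sketch (size_t size, size_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return size + align - 1;
}

/* Runtime step: round ADDR up to the next multiple of ALIGN.  */
static uintptr_t
align_up_sketch (uintptr_t addr, uintptr_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (addr + align - 1) & ~(align - 1);
}
```

For example, an 8-byte object with 64-byte alignment reserves 71 bytes in this model, and the aligned address plus the object size never runs past the reserved area. That padding is the extra stack usage mentioned below.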

This will unavoidably increase the usage of stack. However, it really depends on
how many over-aligned parameters are passed by value.

x86-linux, arm-none-eabi, aarch64-none-elf regression test Okay.
linux-armhf bootstrap Okay.

I assume there are other targets which will be affected by the change.
But I don't have environment to test.

Okay to commit?


Regards,
Renlin

gcc/

2018-03-22  Renlin Li  <renlin...@arm.com>

PR middle-end/84877
* explow.h (get_dynamic_stack_size): Declare it as external.
* explow.c (record_new_stack_level): Remove function static attribute.
* function.c (assign_stack_local_1): Dynamically align the stack slot
addr for parameter copy on the stack.

gcc/testsuite/

2018-03-22  Renlin Li  <renlin...@arm.com>

PR middle-end/84877
* gcc.dg/pr84877.c: New.
diff --git a/gcc/explow.h b/gcc/explow.h
index 18c13804b067d64dea159c0deef8e4f011be47ee..b263d353b84b9c10d04b9a8d7257c14c1c7b7ccc 100644
--- a/gcc/explow.h
+++ b/gcc/explow.h
@@ -104,6 +104,9 @@ extern void get_dynamic_stack_size (rtx *, unsigned, unsigned, HOST_WIDE_INT *);
 /* Returns the address of the dynamic stack space without allocating it.  */
 extern rtx get_dynamic_stack_base (poly_int64, unsigned);
 
+/* Return an rtx doing runtime alignment to REQUIRED_ALIGN on TARGET.  */
+extern rtx align_dynamic_address (rtx, unsigned);
+
 /* Emit one stack probe at ADDRESS, an address within the stack.  */
 extern void emit_stack_probe (rtx);
 
diff --git a/gcc/explow.c b/gcc/explow.c
index 042e71904ec897f6f8c6964119d4318dfe51bcc4..e7e20c1ec255be81e42c3d904191fa6c17c8cc77 100644
--- a/gcc/explow.c
+++ b/gcc/explow.c
@@ -1175,9 +1175,10 @@ record_new_stack_level (void)
   if (targetm_common.except_unwind_info (_options) == UI_SJLJ)
 update_sjlj_context ();
 }
-
+
 /* Return an rtx doing runtime alignment to REQUIRED_ALIGN on TARGET.  */
-static rtx
+
+rtx
 align_dynamic_address (rtx target, unsigned required_align)
 {
   /* CEIL_DIV_EXPR needs to worry about the addition overflowing,
diff --git a/gcc/function.c b/gcc/function.c
index 1a09ff0d31e2c95dec179b5cb2a9883b05fcc3d7..036be3b4912086a153939b97583e9212237b9b64 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -372,6 +372,7 @@ assign_stack_local_1 (machine_mode mode, poly_int64 size,
   poly_int64 bigend_correction = 0;
   poly_int64 slot_offset = 0, old_frame_offset;
   unsigned int alignment, alignment_in_bits;
+  bool dynamic_align_addr = false;
 
   if (align == 0)
 {
@@ -390,14 +391,20 @@ assign_stack_local_1 (machine_mode mode, poly_int64 size,
 
   alignment_in_bits = alignment * BITS_PER_UNIT;
 
-  /* Ignore alignment if it exceeds MAX_SUPPORTED_STACK_ALIGNMENT.  */
   if (alignment_in_bits > MAX_SUPPORTED_STACK_ALIGNMENT)
 {
-  alignment_in_bits = MAX_SUPPORTED_STACK_ALIGNMENT;
-  alignment = alignment_in_bits / BITS_PER_UNIT;
+  /* If the required alignment exceeds MAX_SUPPORTED_STACK_ALIGNMENT and
+	 it is not OK to reduce it.  Align the s

[PATCH][AARCH64]Fix immediate alternative of movhf_aarch64 pattern.

2018-03-07 Thread Renlin Li

Hi all,

For the immediate alternative, the constraint checks the operand with HF mode
while SImode is provided to the output template generation function.

Before the change, this inconsistency causes an ICE compiling the new test case 
in the patch.


aarch64-none-elf regression test Okay. Okay to commit the patch?

Regards,
Renlin

gcc/ChangeLog:

2018-03-07  Renlin Li  <renlin...@arm.com>

* config/aarch64/aarch64.md (movhf_aarch64): Fix mode argument to
aarch64_output_scalar_simd_mov_immediate.

gcc/testsuite/ChangeLog:

2018-03-07  Renlin Li  <renlin...@arm.com>

* gcc.target/aarch64/movi_hf.c: New.
* gcc.target/aarch64/f16_mov_immediate_1.c: Update.
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 5a2a9309a3bbbfad6fcb6db07422d774909f0ba1..391fdd07e52f4d165a0109e3baa82571bafa37de 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1145,7 +1145,7 @@
umov\\t%w0, %1.h[0]
mov\\t%0.h[0], %1.h[0]
fmov\\t%h0, %1
-   * return aarch64_output_scalar_simd_mov_immediate (operands[1], SImode);
+   * return aarch64_output_scalar_simd_mov_immediate (operands[1], HImode);
ldr\\t%h0, %1
str\\t%h1, %0
ldrh\\t%w0, %1
diff --git a/gcc/testsuite/gcc.target/aarch64/f16_mov_immediate_1.c b/gcc/testsuite/gcc.target/aarch64/f16_mov_immediate_1.c
index 1ed3831e139745227487eafa3ccfdc05c99deb34..3d22d225851af653f17e04ce7c7cc65ee1c86172 100644
--- a/gcc/testsuite/gcc.target/aarch64/f16_mov_immediate_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/f16_mov_immediate_1.c
@@ -45,5 +45,5 @@ __fp16 f5 ()
 }
 
 /* { dg-final { scan-assembler-times "mov\tw\[0-9\]+, #?19520"   3 } } */
-/* { dg-final { scan-assembler-times "movi\tv\[0-9\]+\\\.2s, 0xbc, lsl 8"  1 } } */
-/* { dg-final { scan-assembler-times "movi\tv\[0-9\]+\\\.2s, 0x4c, lsl 8"  1 } } */
+/* { dg-final { scan-assembler-times "movi\tv\[0-9\]+\\\.4h, 0xbc, lsl 8"  1 } } */
+/* { dg-final { scan-assembler-times "movi\tv\[0-9\]+\\\.4h, 0x4c, lsl 8"  1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/movi_hf.c b/gcc/testsuite/gcc.target/aarch64/movi_hf.c
new file mode 100644
index ..9521b9b09c87bd5f19cb6b62b1228bae685d8667
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movi_hf.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -std=c99" } */
+
+__fp16
+foo ()
+{
+  /* { dg-final { scan-assembler "movi\tv\[0-9\]+\.8b" } } */
+  return 0x1.544p5;
+}


Re: [PR83370][AARCH64]Use tighter register constraints for sibcall patterns.

2018-02-01 Thread Renlin Li

Hi James,

Thanks for the review! I committed it on trunk.

Is it Okay to backport this patch to release branches 5, 6 and 7?
It applies cleanly without any logic changes.

Regards,
Renlin

On 31/01/18 17:56, James Greenhalgh wrote:

On Tue, Jan 30, 2018 at 03:45:17PM +, Renlin Li wrote:

Hi Richard,

Thanks for the review!

On 29/01/18 20:23, Richard Sandiford wrote:


The patch looks good to me FWIW.  How about adding something like
the following testcase?


/* { dg-do run } */
/* { dg-options "-O2" } */

typedef void (*fun) (void);

void __attribute__ ((noipa))
f (fun x1)
{
register fun x2 asm ("x16");
int arr[5000];
int *volatile ptr = arr;
asm ("mov %0, %1" : "=r" (x2) : "r" (x1));
x2 ();
}

void g (void) {}

int
main (void)
{
f (g);
}



It was hard for me to construct a test case at that time.
Your example here exactly reflects the problem. The code-gen before the change 
is:

f:
mov x16, 20016
sub sp, sp, x16
add x0, sp, 16
mov x16, 20016
str x0, [sp, 8]
add sp, sp, x16
br  x16

After the change to the register constraint:

f:
mov x16, 20016
sub sp, sp, x16
// Start of user assembly
// 9 "indirect_tail_call_reg.c" 1
mov x16, x0
// 0 "" 2
// End of user assembly
add x0, sp, 16
str x0, [sp, 8]
mov x0, x16
mov x16, 20016
add sp, sp, x16
br  x0

I updated the patch with the new test case,
the wording about the register constraint is also updated.


This patch is OK.

Thanks,
James


gcc/ChangeLog:

2018-01-30  Renlin Li  <renlin...@arm.com>

  * config/aarch64/aarch64.c (aarch64_class_max_nregs): Handle
  TAILCALL_ADDR_REGS.
  (aarch64_register_move_cost): Likewise.
  * config/aarch64/aarch64.h (reg_class): Rename CALLER_SAVE_REGS to
  TAILCALL_ADDR_REGS.
  (REG_CLASS_NAMES): Likewise.
  (REG_CLASS_CONTENTS): Rename CALLER_SAVE_REGS to
  TAILCALL_ADDR_REGS. Remove IP registers.
  * config/aarch64/aarch64.md (Ucs): Update register constraint.

gcc/testsuite/ChangeLog:

2018-01-30  Richard Sandiford  <richard.sandif...@linaro.org>

  * gcc.target/aarch64/indirect_tail_call_reg.c: New.





Re: [PATCH] have -Wformat-overflow handle -fexec-charset (PR 80503)

2018-02-01 Thread Renlin Li

Hi Martin,


On 01/02/18 00:40, Martin Sebor wrote:

On 01/31/2018 10:36 AM, Renlin Li wrote:

Hi there,

I have a patch to fix the two regressions we observed in the armhf native
environment.

To effectively check out of range for format string, a target type
should be
used. And according to the standard, int type is used for "width" and
"precision"
field of format string.

Here target_strtol10 is renamed to target_strtoi10 and changed to use
target_int_max (), which is target dependent.

The value returned by target_strtol10 is (target_int_max () + 1) when it
exceeds the range.
This is used to indicate its value exceeds target INT_MAX for the later
warning.


Sorry for not responding to your original email. It's still
in my inbox, just buried under a pile of stuff.

Using LONG_MAX is not ideal but unless I missed something
(it's been a while) I think using INT_MAX for the target would
be even less optimal.  Unless specified by the asterisk, width
and precision can be arbitrarily large.


I cannot find more documentation about this other than here: 
http://en.cppreference.com/w/c/io/fprintf


(optional) integer value or * that specifies minimum field width.
(optional) . followed by integer number or *, or neither that specifies 
precision of the conversion.


It only mentions an integer value, with no type. I assume it could be of type 
int?
The range of width and precision is indeed very unclear.


constraining either to INT_MAX would trigger warnings on LP64
targets for safe calls like:

   // INT_MAX is 2147483647
   sprintf (d, "%.2147483648s", "");

I think we want to use the maximum value of an unsigned type
with the greatest precision on the host.

 It will still warn

for directives with precisions in excess of HOST_WIDE_INT_MAX


Should a target type be used here to test the range against, instead of one 
from the
host where the toolchain is built?

Regards,
Renlin


but short of using something like offset_int it's probably as
good as it's going to get.  (It has been suggested that
the whole pass would benefit from using offset_int to track
large numbers so if/when it's enhanced to make this change
it should also make the code behave more consistently across
different hosts.)






Martin



Is it Okay to commit?

Regards,
Renlin

gcc/ChangeLog:

2018-01-31  Renlin Li  <renlin...@arm.com>

    * gimple-ssa-sprintf.c (target_strtol10): Use target integer to check
    the range. Rename it into 
    (target_strtoi10): This.
    (parse_directive): Compare with target int max instead of LONG_MAX.


On 20/06/17 12:00, Renlin Li wrote:

Hi Martin,

I did a little investigation into this. Please correct me if I missed
anything.

I built a native arm-linux-gnueabihf toolchain on armhf hardware.
It's ILP32. So in this situation:

HOST_WIDE_INT is long, which is 32-bit.
The integer type is 32-bit as well, so target_int_max () == LONG_MAX


gimple-ssa-sprintf.c line 2887

  /* Has the likely and maximum directive output exceeded INT_MAX?  */
  bool likelyximax = *dir.beg && res->range.likely > target_int_max ();


likelyximax will be false as the latter expression is always false.
res->range.likely is truncated to LONG_MAX (in target_strtol10 function)

I have checked in a cross-build environment (host x86_64); there this variable
is true.

Regards,
Renlin


On 13/06/17 09:16, Renlin Li wrote:

Hi Martin,

On 04/06/17 23:24, Martin Sebor wrote:

On 06/02/2017 09:38 AM, Renlin Li wrote:

Hi Martin,

After r247444, I saw the following two regressions in
arm-linux-gnueabihf environment:

FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c  (test for warnings,
line 119)
PASS: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c  (test for warnings,
line 121)
FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c  (test for warnings,
line 121)

The warning message related to those two lines are:
testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:119:3: warning:
'%9223372036854775808i' directive width out of range
[-Wformat-overflow=]

testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:121:3: warning:
'%.9223372036854775808i' directive precision out of range
[-Wformat-overflow=]

testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:121:3: warning:
'%.9223372036854775808i' directive precision out of range
[-Wformat-overflow=]

Did you notice similar things from your test environment, Christophe?


Looks like you're missing a couple of warnings.  I see the following
output with both my arm-linux-gnueabihf cross compiler and my native
x86_64 GCC, both in 32-bit and 64-bit modes, as expected by the test,
so I don't see the same issue in my environment.


Yes, it happens in the arm-linux-gnueabihf native environment. The
warnings with the "INT_MAX"
line are missing. I don't know if the host environment will cause the
difference.

Regards,
Renlin




Re: Re: [PATCH] have -Wformat-overflow handle -fexec-charset (PR 80503)

2018-01-31 Thread Renlin Li

Hi there,

I have a patch to fix the two regressions we observed in the armhf native environment.

To effectively check out of range for format string, a target type should be
used. And according to the standard, int type is used for "width" and 
"precision"
field of format string.

Here target_strtol10 is renamed to target_strtoi10 and changed to use
target_int_max (), which is target dependent.

The value returned by target_strtol10 is (target_int_max () + 1) when it 
exceeds the range.
This is used to indicate its value exceeds target INT_MAX for the later warning.
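The behaviour described above can be sketched as a standalone function. It is modelled on the loop in the patch but is not the GCC code: strtoi10_sketch is a hypothetical name, it works on plain host characters rather than the execution character set, and the target's INT_MAX is passed in as a parameter instead of calling target_int_max ().

```c
#include <assert.h>
#include <stddef.h>

/* Parse decimal digits at *PS.  On overflow past TARGET_INT_MAX, return
   TARGET_INT_MAX + 1, point *ERANGE at the offending digit and skip the
   remaining digits.  *PS ends up just past the last digit.  */
static long long
strtoi10_sketch (const char **ps, const char **erange,
		 long long target_int_max)
{
  assert (target_int_max > 0);
  unsigned long long max = (unsigned long long) target_int_max;
  unsigned long long val = 0;

  for (;; ++*ps)
    {
      unsigned char ch = (unsigned char) **ps;
      if (ch < '0' || ch > '9')
	break;
      unsigned long long c = ch - '0';

      /* Would val * 10 + c exceed the target's INT_MAX?  */
      if (val > (max - c) / 10)
	{
	  val = max + 1;
	  *erange = *ps;
	  while (**ps >= '0' && **ps <= '9')
	    ++*ps;
	  break;
	}
      val = val * 10 + c;
    }
  return (long long) val;
}
```

Because the sentinel is one more than the target's INT_MAX, a later comparison of the form value > target_int_max () stays true regardless of how wide long is on the host.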

Is it Okay to commit?

Regards,
Renlin

gcc/ChangeLog:

2018-01-31  Renlin Li  <renlin...@arm.com>

* gimple-ssa-sprintf.c (target_strtol10): Use target integer to check
the range. Rename it into 
(target_strtoi10): This.
(parse_directive): Compare with target int max instead of LONG_MAX.


On 20/06/17 12:00, Renlin Li wrote:

Hi Martin,

I did a little investigation into this. Please correct me if I missed anything.

I built a native arm-linux-gnueabihf toolchain on armhf hardware.
It's ILP32. So in this situation:

HOST_WIDE_INT is long, which is 32-bit.
The integer type is 32-bit as well, so target_int_max () == LONG_MAX


gimple-ssa-sprintf.c line 2887

  /* Has the likely and maximum directive output exceeded INT_MAX?  */
  bool likelyximax = *dir.beg && res->range.likely > target_int_max ();


likelyximax will be false as the latter expression is always false.
res->range.likely is truncated to LONG_MAX (in target_strtol10 function)
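A small model of the effect, under the assumption that the parsed value is clamped to the host's LONG_MAX before being compared against the target's INT_MAX. The function names and the explicit host_long_max and target_int_max parameters are hypothetical; they stand in for the host and target properties discussed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Clamp PARSED to CAP, as a strtol-style conversion would.  */
static int64_t
saturate_sketch (uint64_t parsed, int64_t cap)
{
  assert (cap > 0);
  return parsed > (uint64_t) cap ? cap : (int64_t) parsed;
}

/* Model of the later check: does the clamped value still exceed the
   target's INT_MAX?  */
static bool
exceeds_target_int_max_sketch (uint64_t parsed, int64_t host_long_max,
			       int64_t target_int_max)
{
  return saturate_sketch (parsed, host_long_max) > target_int_max;
}
```

With a 32-bit host long the clamped value equals the target's INT_MAX, so the comparison can never be true; with a 64-bit host long it can, which would explain the difference between the native and cross builds.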

I have checked in a cross-build environment (host x86_64); there this variable is true.

Regards,
Renlin


On 13/06/17 09:16, Renlin Li wrote:

Hi Martin,

On 04/06/17 23:24, Martin Sebor wrote:

On 06/02/2017 09:38 AM, Renlin Li wrote:

Hi Martin,

After r247444, I saw the following two regressions in
arm-linux-gnueabihf environment:

FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c  (test for warnings,
line 119)
PASS: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c  (test for warnings,
line 121)
FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c  (test for warnings,
line 121)

The warning message related to those two lines are:
testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:119:3: warning:
'%9223372036854775808i' directive width out of range [-Wformat-overflow=]

testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:121:3: warning:
'%.9223372036854775808i' directive precision out of range
[-Wformat-overflow=]

testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:121:3: warning:
'%.9223372036854775808i' directive precision out of range
[-Wformat-overflow=]

Did you notice similar things from your test environment, Christophe?


Looks like you're missing a couple of warnings.  I see the following
output with both my arm-linux-gnueabihf cross compiler and my native
x86_64 GCC, both in 32-bit and 64-bit modes, as expected by the test,
so I don't see the same issue in my environment.


Yes, it happens in the arm-linux-gnueabihf native environment. The warnings with the 
"INT_MAX"
line are missing. I don't know if the host environment will cause the 
difference.

Regards,
Renlin
diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
index 14b12191d9b16699b28541cb24914fa9f8d8fea9..3a903ffad8e524c6d2fd405812ebc0869bfed7a7 100644
--- a/gcc/gimple-ssa-sprintf.c
+++ b/gcc/gimple-ssa-sprintf.c
@@ -399,12 +399,12 @@ target_to_host (char *hostr, size_t hostsz, const char *targstr)
 }
 
 /* Convert the sequence of decimal digits in the execution character
-   starting at S to a long, just like strtol does.  Return the result
-   and set *END to one past the last converted character.  On range
-   error set ERANGE to the digit that caused it.  */
+   starting at S to a int.  Return the result and set *END to one past the last
+   converted character.
+   On range error set ERANGE to the digit that caused it.  */
 
-static inline long
-target_strtol10 (const char **ps, const char **erange)
+static inline HOST_WIDE_INT
+target_strtoi10 (const char **ps, const char **erange)
 {
   unsigned HOST_WIDE_INT val = 0;
   for ( ; ; ++*ps)
@@ -415,9 +415,9 @@ target_strtol10 (const char **ps, const char **erange)
 	  c -= '0';
 
 	  /* Check for overflow.  */
-	  if (val > (LONG_MAX - c) / 10LU)
+	  if (val > (target_int_max () - c) / 10LU)
 	{
-	  val = LONG_MAX;
+	  val = target_int_max () + 1;
 	  *erange = *ps;
 
 	  /* Skip the remaining digits.  */
@@ -3082,7 +3082,7 @@ parse_directive (sprintf_dom_walker::call_info ,
 	 width and sort it out later after the next character has
 	 been seen.  */
   pwidth = pf;
-  width = target_strtol10 (, );
+  width = target_strtoi10 (, );
 }
   else if (target_to_host (*pf) == '*')
 {
@@ -3164,7 +3164,7 @@ parse_directive (sprintf_dom_walker::call_info ,
 	{
 	  werange = 0;
 	  pwidth = pf;
-	  width = target_strtol10 (, );
+	  width = target_strtoi10 (, );
 	}
   e

Re: [PR83370][AARCH64]Use tighter register constraints for sibcall patterns.

2018-01-30 Thread Renlin Li

Hi Richard,

Thanks for the review!

On 29/01/18 20:23, Richard Sandiford wrote:


The patch looks good to me FWIW.  How about adding something like
the following testcase?


/* { dg-do run } */
/* { dg-options "-O2" } */

typedef void (*fun) (void);

void __attribute__ ((noipa))
f (fun x1)
{
   register fun x2 asm ("x16");
   int arr[5000];
   int *volatile ptr = arr;
   asm ("mov %0, %1" : "=r" (x2) : "r" (x1));
   x2 ();
}

void g (void) {}

int
main (void)
{
   f (g);
}



It was hard for me to construct a test case at that time.
Your example here exactly reflects the problem. The code-gen before the change 
is:

f:
mov x16, 20016
sub sp, sp, x16
add x0, sp, 16
mov x16, 20016
str x0, [sp, 8]
add sp, sp, x16
br  x16

After the change to the register constraint:

f:
mov x16, 20016
sub sp, sp, x16
// Start of user assembly
// 9 "indirect_tail_call_reg.c" 1
mov x16, x0
// 0 "" 2
// End of user assembly
add x0, sp, 16
str x0, [sp, 8]
mov x0, x16
mov x16, 20016
add sp, sp, x16
br  x0

I updated the patch with the new test case,
the wording about the register constraint is also updated.

Thanks,
Renlin

gcc/ChangeLog:

2018-01-30  Renlin Li  <renlin...@arm.com>

* config/aarch64/aarch64.c (aarch64_class_max_nregs): Handle
TAILCALL_ADDR_REGS.
(aarch64_register_move_cost): Likewise.
* config/aarch64/aarch64.h (reg_class): Rename CALLER_SAVE_REGS to
TAILCALL_ADDR_REGS.
(REG_CLASS_NAMES): Likewise.
(REG_CLASS_CONTENTS): Rename CALLER_SAVE_REGS to
TAILCALL_ADDR_REGS. Remove IP registers.
* config/aarch64/aarch64.md (Ucs): Update register constraint.

gcc/testsuite/ChangeLog:

2018-01-30  Richard Sandiford  <richard.sandif...@linaro.org>

* gcc.target/aarch64/indirect_tail_call_reg.c: New.




[...]
diff --git a/gcc/config/aarch64/constraints.md 
b/gcc/config/aarch64/constraints.md
index 
af4143ef756464afac29d17f124b436520f90451..c3791aa89562a5d5542098d2f7951afc57901150
 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -21,8 +21,8 @@
  (define_register_constraint "k" "STACK_REG"
"@internal The stack register.")
  
-(define_register_constraint "Ucs" "CALLER_SAVE_REGS"
-  "@internal The caller save registers.")
+(define_register_constraint "Ucs" "TAILCALL_ADDR_REGS"
+  "@internal The indirect tail call address registers")
  
  (define_register_constraint "w" "FP_REGS"
"Floating point and SIMD vector registers.")


Maybe "@internal Registers suitable for an indirect tail call"?
Unlike the caller-save registers, these aren't a predefined set.

Thanks,
Richard

diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 93d29b84d47b7017661a2129d61e7d740bbf7c93..322b7f4628aa69cf331c12ff2c8df351890da9ef 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -446,7 +446,7 @@ extern unsigned aarch64_architecture_version;
 enum reg_class
 {
   NO_REGS,
-  CALLER_SAVE_REGS,
+  TAILCALL_ADDR_REGS,
   GENERAL_REGS,
   STACK_REG,
   POINTER_REGS,
@@ -462,7 +462,7 @@ enum reg_class
 #define REG_CLASS_NAMES\
 {		\
   "NO_REGS",	\
-  "CALLER_SAVE_REGS",\
+  "TAILCALL_ADDR_REGS",\
   "GENERAL_REGS",\
   "STACK_REG",	\
   "POINTER_REGS",\
@@ -475,7 +475,7 @@ enum reg_class
 #define REG_CLASS_CONTENTS		\
 {	\
   { 0x, 0x, 0x },	/* NO_REGS */		\
-  { 0x0007, 0x, 0x },	/* CALLER_SAVE_REGS */	\
+  { 0x0004, 0x, 0x },	/* TAILCALL_ADDR_REGS */\
   { 0x7fff, 0x, 0x0003 },	/* GENERAL_REGS */	\
   { 0x8000, 0x, 0x },	/* STACK_REG */		\
   { 0x, 0x, 0x0003 },	/* POINTER_REGS */	\
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1da313f57e0eed4df36dbd15aecbae9fd73f7388..59ca95019ddf0491c382e7ee2b99966694d0b36d 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -6062,7 +6062,7 @@ aarch64_class_max_nregs (reg_class_t regclass, machine_mode mode)
 {
   switch (regclass)
 {
-case CALLER_SAVE_REGS:
+case TAILCALL_ADDR_REGS:
 case POINTER_REGS:
 case GENERAL_REGS:
 case ALL_REGS:
@@ -8228,10 +8228,10 @@ aarch64_register_move_cost (machine_mode mode,
 = aarch64_tune_params.regmove_cost;
 
   /* Caller save and pointer regs are equivalent to GENERAL_REGS.  */
-  if (to == CALLER_SAVE_REGS || to == POINTER_REGS)
+  if (to == TAILCALL_A

[AARCH64]Fix ldr_got_small and ldr_got_small_28k patterns to only allow DImode address.

2018-01-03 Thread Renlin Li

Hi all,

The only address mode allowed on aarch64 is DImode, i.e. Pmode.
ptr_mode can be either SImode or DImode, depending on the ABI in use.

This patch fixes the address mode of the two patterns to DImode.
If any other mode is ever used for an address, something elsewhere in the
compiler may go wrong.

aarch64-none-elf regression tested without regressions.

Okay to commit?

Regards,
Renlin

gcc/ChangeLog:

2018-01-03  Renlin Li  <renlin...@arm.com>

* config/aarch64/aarch64.c (aarch64_load_symref_appropriately): Make
sure address is Pmode.
* config/aarch64/aarch64.md (ldr_got_small): Fix address mode as
DImode.
(ldr_got_small_28k): Likewise.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1da313f..9e9d2ea 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1406,10 +1406,6 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
 	rtx s = gen_rtx_SYMBOL_REF (Pmode, "_GLOBAL_OFFSET_TABLE_");
 	crtl->uses_pic_offset_table = 1;
 	emit_move_insn (gp_rtx, gen_rtx_HIGH (Pmode, s));
-
-	if (mode != GET_MODE (gp_rtx))
- gp_rtx = gen_lowpart (mode, gp_rtx);
-
 	  }
 
 	if (mode == ptr_mode)
@@ -1451,13 +1447,19 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
 
 	rtx insn;
 	rtx mem;
-	rtx tmp_reg = dest;
+	rtx tmp_reg;
 	machine_mode mode = GET_MODE (dest);
 
 	if (can_create_pseudo_p ())
-	  tmp_reg = gen_reg_rtx (mode);
+	  tmp_reg = gen_reg_rtx (Pmode);
+	else
+	{
+	  gcc_assert (HARD_REGISTER_P (dest));
+	  if (mode != Pmode)
+	tmp_reg = gen_rtx_REG (Pmode, REGNO (dest));
+	}
 
-	emit_move_insn (tmp_reg, gen_rtx_HIGH (mode, imm));
+	emit_move_insn (tmp_reg, gen_rtx_HIGH (Pmode, imm));
 	if (mode == ptr_mode)
 	  {
 	if (mode == DImode)
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index f1e2a07..a1f2d2d 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5407,9 +5407,9 @@
 
 (define_insn "ldr_got_small_"
   [(set (match_operand:PTR 0 "register_operand" "=r")
-	(unspec:PTR [(mem:PTR (lo_sum:PTR
-			  (match_operand:PTR 1 "register_operand" "r")
-			  (match_operand:PTR 2 "aarch64_valid_symref" "S")))]
+	(unspec:PTR [(mem:PTR (lo_sum:DI
+			  (match_operand:DI 1 "register_operand" "r")
+			  (match_operand:DI 2 "aarch64_valid_symref" "S")))]
 		UNSPEC_GOTSMALLPIC))]
   ""
   "ldr\\t%0, [%1, #:got_lo12:%c2]"
@@ -5430,9 +5430,9 @@
 
 (define_insn "ldr_got_small_28k_"
   [(set (match_operand:PTR 0 "register_operand" "=r")
-	(unspec:PTR [(mem:PTR (lo_sum:PTR
-			  (match_operand:PTR 1 "register_operand" "r")
-			  (match_operand:PTR 2 "aarch64_valid_symref" "S")))]
+	(unspec:PTR [(mem:PTR (lo_sum:DI
+			  (match_operand:DI 1 "register_operand" "r")
+			  (match_operand:DI 2 "aarch64_valid_symref" "S")))]
 		UNSPEC_GOTSMALLPIC28K))]
   ""
   "ldr\\t%0, [%1, #::%c2]"


Re: [PR83370][AARCH64]Use tighter register constraints for sibcall patterns.

2017-12-20 Thread Renlin Li

Ping ~

On 11/12/17 15:27, Renlin Li wrote:

Hi all,

In the aarch64 backend, the ip0/ip1 registers are used in the prologue/epilogue
as temporary registers.

When the compiler performs the sibcall optimization, it may use an ip0/ip1
register to hold the address of an indirect function call. However, those
two registers might be clobbered by the epilogue code, which makes the final
sibcall instruction invalid.

The following is an extreme example:
When built with -O2 -ffixed-x0 -ffixed-x1 -ffixed-x2 -ffixed-x3 -ffixed-x4 
-ffixed-x5 -ffixed-x6 -ffixed-x7
-ffixed-x8 -ffixed-x9 -ffixed-x10 -ffixed-x11 -ffixed-x12 -ffixed-x13 
-ffixed-x14 -ffixed-x15 -ffixed-x17 -ffixed-x18

void (*f)();
int xx;

void tailcall (int i)

{
    int arr[5000];
    xx = arr[i];
    f();
}


tailcall:
 mov    x16, 20016
 sub    sp, sp, x16
 adrp    x16, .LANCHOR0
 stp    x19, x30, [sp]
 add    x19, sp, 16
 ldr    s0, [x19, w0, sxtw 2]
 ldp    x19, x30, [sp]
 str    s0, [x16, #:lo12:.LANCHOR0]
 mov    x16, 20016
 add    sp, sp, x16
 br    x16   // oops


As we can see, x16 is used in the indirect sibcall instruction. It is used as
a temporary in the epilogue code as well. The register allocation is invalid.

With the change, the register allocator is only allowed to use r0-r15, r18 for
indirect sibcall instruction.

For the particular case above, the compiler will ICE, as there is no register
that could be used for this sibcall instruction.
I think it is better to fail than to generate wrong code.

test.c:10:1: error: unable to generate reloads for:
  }
  ^
(call_insn/j 16 12 17 2 (parallel [
     (call (mem:DI (reg/f:DI 84 [ f ]) [0 *f.0_2 S8 A8])
     (const_int 0 [0]))
     (return)
     ]) "test.c":9 42 {*sibcall_insn}
  (expr_list:REG_DEAD (reg/f:DI 84 [ f ])
     (expr_list:REG_CALL_DECL (nil)
     (nil)))
     (expr_list (clobber (reg:DI 17 x17))
     (expr_list (clobber (reg:DI 16 x16))
     (nil

aarch64-none-elf tested without regressions. Okay to commit?
The same issue affects gcc-6 and gcc-7 as well. Backports are needed for those
branches.

Regards,
Renlin

gcc/ChangeLog:

2017-12-11  Renlin Li  <renlin...@arm.com>

     PR target/83370
 * config/aarch64/aarch64.c (aarch64_class_max_nregs): Handle
 TAILCALL_ADDR_REGS.
 (aarch64_register_move_cost): Likewise.
 * config/aarch64/aarch64.h (reg_class): Rename CALLER_SAVE_REGS to 
TAILCALL_ADDR_REGS.
 * config/aarch64/constraints.md (Ucs): Update register constraint.


[PR83370][AARCH64]Use tighter register constraints for sibcall patterns.

2017-12-11 Thread Renlin Li

Hi all,

In the aarch64 backend, the ip0/ip1 registers are used in the prologue/epilogue
as temporary registers.

When the compiler performs the sibcall optimization, it may use an ip0/ip1
register to hold the address of an indirect function call. However, those
two registers might be clobbered by the epilogue code, which makes the final
sibcall instruction invalid.

The following is an extreme example:
When built with -O2 -ffixed-x0 -ffixed-x1 -ffixed-x2 -ffixed-x3 -ffixed-x4 
-ffixed-x5 -ffixed-x6 -ffixed-x7
-ffixed-x8 -ffixed-x9 -ffixed-x10 -ffixed-x11 -ffixed-x12 -ffixed-x13 
-ffixed-x14 -ffixed-x15 -ffixed-x17 -ffixed-x18

void (*f)();
int xx;

void tailcall (int i)

{
   int arr[5000];
   xx = arr[i];
   f();
}


tailcall:
mov x16, 20016
sub sp, sp, x16
adrp x16, .LANCHOR0
stp x19, x30, [sp]
add x19, sp, 16
ldr s0, [x19, w0, sxtw 2]
ldp x19, x30, [sp]
str s0, [x16, #:lo12:.LANCHOR0]
mov x16, 20016
add sp, sp, x16
br  x16   // oops


As we can see, x16 is used in the indirect sibcall instruction. It is used as
a temporary in the epilogue code as well. The register allocation is invalid.

With the change, the register allocator is only allowed to use r0-r15, r18 for
indirect sibcall instruction.

For the particular case above, the compiler will ICE, as there is no register
that could be used for this sibcall instruction.
I think it is better to fail than to generate wrong code.

test.c:10:1: error: unable to generate reloads for:
 }
 ^
(call_insn/j 16 12 17 2 (parallel [
(call (mem:DI (reg/f:DI 84 [ f ]) [0 *f.0_2 S8 A8])
(const_int 0 [0]))
(return)
]) "test.c":9 42 {*sibcall_insn}
 (expr_list:REG_DEAD (reg/f:DI 84 [ f ])
(expr_list:REG_CALL_DECL (nil)
(nil)))
(expr_list (clobber (reg:DI 17 x17))
(expr_list (clobber (reg:DI 16 x16))
(nil

aarch64-none-elf tested without regressions. Okay to commit?
The same issue affects gcc-6 and gcc-7 as well. Backports are needed for those
branches.

Regards,
Renlin

gcc/ChangeLog:

2017-12-11  Renlin Li  <renlin...@arm.com>

PR target/83370
* config/aarch64/aarch64.c (aarch64_class_max_nregs): Handle
TAILCALL_ADDR_REGS.
(aarch64_register_move_cost): Likewise.
* config/aarch64/aarch64.h (reg_class): Rename CALLER_SAVE_REGS to 
TAILCALL_ADDR_REGS.
* config/aarch64/constraints.md (Ucs): Update register constraint.
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 93d29b84d47b7017661a2129d61e7d740bbf7c93..322b7f4628aa69cf331c12ff2c8df351890da9ef 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -446,7 +446,7 @@ extern unsigned aarch64_architecture_version;
 enum reg_class
 {
   NO_REGS,
-  CALLER_SAVE_REGS,
+  TAILCALL_ADDR_REGS,
   GENERAL_REGS,
   STACK_REG,
   POINTER_REGS,
@@ -462,7 +462,7 @@ enum reg_class
 #define REG_CLASS_NAMES\
 {		\
   "NO_REGS",	\
-  "CALLER_SAVE_REGS",\
+  "TAILCALL_ADDR_REGS",\
   "GENERAL_REGS",\
   "STACK_REG",	\
   "POINTER_REGS",\
@@ -475,7 +475,7 @@ enum reg_class
 #define REG_CLASS_CONTENTS		\
 {	\
   { 0x, 0x, 0x },	/* NO_REGS */		\
-  { 0x0007, 0x, 0x },	/* CALLER_SAVE_REGS */	\
+  { 0x0004, 0x, 0x },	/* TAILCALL_ADDR_REGS */\
   { 0x7fff, 0x, 0x0003 },	/* GENERAL_REGS */	\
   { 0x8000, 0x, 0x },	/* STACK_REG */		\
   { 0x, 0x, 0x0003 },	/* POINTER_REGS */	\
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 75a6c0d0421354d7c0759292947eb5d407f5b703..66d503ac6edf59a1ea2fa3675fbbe03d70769833 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -6060,7 +6060,7 @@ aarch64_class_max_nregs (reg_class_t regclass, machine_mode mode)
 {
   switch (regclass)
 {
-case CALLER_SAVE_REGS:
+case TAILCALL_ADDR_REGS:
 case POINTER_REGS:
 case GENERAL_REGS:
 case ALL_REGS:
@@ -8226,10 +8226,10 @@ aarch64_register_move_cost (machine_mode mode,
 = aarch64_tune_params.regmove_cost;
 
   /* Caller save and pointer regs are equivalent to GENERAL_REGS.  */
-  if (to == CALLER_SAVE_REGS || to == POINTER_REGS)
+  if (to == TAILCALL_ADDR_REGS || to == POINTER_REGS)
 to = GENERAL_REGS;
 
-  if (from == CALLER_SAVE_REGS || from == POINTER_REGS)
+  if (from == TAILCALL_ADDR_REGS || from == POINTER_REGS)
 from = GENERAL_REGS;
 
   /* Moving between GPR and stack cost is the same as GP2GP.  */
diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index af4143ef756464afac29d17f124b436520f90451..c3791aa89562a5d5542098d2f7951afc57901150 100644
--- a/gcc/config/a

Re: Fix profile update in switch conversion

2017-10-10 Thread Renlin Li

Hi Honza,

The change here causes the following failures:


FAIL: gcc.dg/tree-prof/switch-case-1.c scan-rtl-dump-times expand ";; basic block[^\\n]*count 2000" 1
FAIL: gcc.dg/tree-prof/switch-case-2.c scan-rtl-dump-times expand ";; basic block[^\\n]*count 2000" 1



I checked that, after the change, there are two matches in the dump file.



;; basic block 8, loop depth 0, count 2000 (adjusted), freq 2000, maybe hot
;; basic block 23, loop depth 0, count 2000, freq 2000, maybe hot



And without the change, there is only one match.


;; basic block 23, loop depth 0, count 2000, freq 2000, maybe hot


Regards,
Renlin


On 06/10/17 13:18, Jan Hubicka wrote:

Hi,
this patch fixes a missing profile update that triggers during profiledbootstrap.

Honza

* tree-switch-conversion.c (do_jump_if_equal, emit_cmp_and_jump_insns):
Update edge counts.

Index: tree-switch-conversion.c
===
--- tree-switch-conversion.c(revision 253444)
+++ tree-switch-conversion.c(working copy)
@@ -2248,10 +2248,12 @@ do_jump_if_equal (basic_block bb, tree o
edge false_edge = split_block (bb, cond);
false_edge->flags = EDGE_FALSE_VALUE;
false_edge->probability = prob.invert ();
+  false_edge->count = bb->count.apply_probability (false_edge->probability);

edge true_edge = make_edge (bb, label_bb, EDGE_TRUE_VALUE);
fix_phi_operands_for_edge (true_edge, phi_mapping);
true_edge->probability = prob;
+  true_edge->count = bb->count.apply_probability (true_edge->probability);

return false_edge->dest;
  }
@@ -2291,10 +2293,12 @@ emit_cmp_and_jump_insns (basic_block bb,
edge false_edge = split_block (bb, cond);
false_edge->flags = EDGE_FALSE_VALUE;
false_edge->probability = prob.invert ();
+  false_edge->count = bb->count.apply_probability (false_edge->probability);

edge true_edge = make_edge (bb, label_bb, EDGE_TRUE_VALUE);
fix_phi_operands_for_edge (true_edge, phi_mapping);
true_edge->probability = prob;
+  true_edge->count = bb->count.apply_probability (true_edge->probability);

return false_edge->dest;
  }



Re: [RFC][AARCH64]Add 'r' integer register operand modifier. Document the common asm modifier for aarch64 target.

2017-08-31 Thread Renlin Li

Hi all,

This is a split patch from a discussion here:
https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00289.html

It contains the documentation part only.
It clarifies the behavior when no modifier is used for a register operand.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63359

The 'H' modifier is added for the TImode register pair case.

It only documents the most common cases I can think of. Any other suggestions 
are welcome.

Is this OK for trunk?

Regards,
Renlin


gcc/ChangeLog:

2017-08-31  Renlin Li  <renlin...@arm.com>

PR target/63359
* doc/extend.texi (AArch64Operandmodifiers): New section.

On 27/06/17 18:19, Renlin Li wrote:

Hi Andrew,

On 27/06/17 17:11, Andrew Pinski wrote:

On Tue, Jun 27, 2017 at 8:27 AM, Renlin Li <renlin...@foss.arm.com> wrote:

Hi Andrew,

On 25/06/17 22:38, Andrew Pinski wrote:


On Tue, Jun 6, 2017 at 3:56 AM, Renlin Li <renlin...@foss.arm.com> wrote:


Hi all,

In this patch, a new integer register operand modifier 'r' is added. This
will use the
proper register name according to the mode of corresponding operand.

'w' register for scalar integer mode smaller than DImode
'x' register for DImode

This allows more flexibility and would meet people's expectations.
It will help for ILP32 and LP64, and big-endian case.

A new section is added to document the AArch64 operand modifiers which might
be used in inline assembly. It is not an exhaustive list covering every
modifier; only the most common and useful ones are documented.

The default behavior of integer operand without modifier is clearly
documented
as well. It's not changed so that the patch shouldn't break anything.

So with this patch, it should resolve the issues in PR63359.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63359


aarch64-none-elf regression test Okay. Okay to check in?



I think 'r' modifier is very fragile and can be used incorrectly and
wrong in some cases really..



The user could always (or be encouraged to) opt to a strict register
modifier to enforce consistent behavior in all cases.

I agree the flexibility might bring unexpected behavior in corner cases.
Do you have any examples to share off the top of your head? So that we can
discuss the benefit and pitfalls, and decide to improve the patch or
withdraw it.


One thing is TImode is missing.  I have a use case of __int128_t
inside inline-asm.
For me %r and TImode would produce "x0, x1".  This is one of the
reasons why I said it is fragile.



This is true. Actually, I intended 'r' to handle only the simplest case of a
single integer register,
so that people won't believe it is a magic modifier that handles everything.
I could improve the description of 'r' to clearly explain its limitation.

For TImode integer data, if 'r' is used, it will raise the error
"invalid 'asm': invalid operand mode for register modifier 'r'"




I like the documentation though.


As an aside %H is not documented here.  Noticed it because I am using
%H with TImode.


As for the documentation, I only documented the most common modifiers which
might be used in inline assembly. It's good to know about more use cases.
I could add 'H' to the documentation.

Regards,
Renlin



Thanks,
Andrew



Thanks,
Renlin




Thanks,
Andrew



gcc/ChangeLog:

2017-06-06  Renlin Li  <renlin...@arm.com>

  PR target/63359
  * config/aarch64/aarch64.c (aarch64_print_operand): Add 'r'
modifier.
  * doc/extend.texi (AArch64Operandmodifiers): New section.
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 03ba8fc..589a6cb 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -8286,7 +8286,9 @@ is undefined if @var{a} is modified before using @var{b}.
 @code{asm} supports operand modifiers on operands (for example @samp{%k2} 
 instead of simply @samp{%2}). Typically these qualifiers are hardware 
 dependent. The list of supported modifiers for x86 is found at 
-@ref{x86Operandmodifiers,x86 Operand modifiers}.
+@ref{x86Operandmodifiers,x86 Operand modifiers}.  The list of supported
+modifiers for AArch64 is found at
+@ref{AArch64Operandmodifiers,AArch64 Operand modifiers}.
 
 If the C code that follows the @code{asm} makes no use of any of the output 
 operands, use @code{volatile} for the @code{asm} statement to prevent the 
@@ -8513,7 +8515,9 @@ optimizers may discard the @code{asm} statement as unneeded
 @code{asm} supports operand modifiers on operands (for example @samp{%k2} 
 instead of simply @samp{%2}). Typically these qualifiers are hardware 
 dependent. The list of supported modifiers for x86 is found at 
-@ref{x86Operandmodifiers,x86 Operand modifiers}.
+@ref{x86Operandmodifiers,x86 Operand modifiers}.  The list of supported
+modifiers for AArch64 is found at
+@ref{AArch64Operandmodifiers,AArch64 Operand modifiers}.
 
 In this example using the fictitious @code{combine} instruction, the 
 constraint @code{"0"} for input operand 1 says that it must occupy the same 
@@ -8681,6 +8685,71 @@ error

Re: [TESTSUITE]Use strncpy instead of strcpy in testsuite/gcc.dg/memcmp-1.c

2017-08-30 Thread Renlin Li

Hi Aaron,

On 30/08/17 15:37, Aaron Sawdey wrote:

On Wed, 2017-08-30 at 10:16 +0100, Renlin Li wrote:

Hi,



Hi,
   Renlin you are correct that it shouldn't be using strcpy because the
string may not be null terminated. However I would suggest we use
memcpy instead of strncpy. The reason is that cases where there is a
null char in the middle of the string test whether the strncmp is
properly ignoring what comes after. So how about using this:

  memcpy(a,str1,SZ);\
  memcpy(b,str2,SZ);\

as in the test_memcmp_ part of the macro?


You are right.

For strncpy, if there is a null character before size bytes have been copied,
the rest of the destination is padded with zeros, so whatever follows the null
character in the source is lost.


memcpy is better than strncpy in this case.
Here is the updated patch.

Regards,
Renlin
diff --git a/gcc/testsuite/gcc.dg/memcmp-1.c b/gcc/testsuite/gcc.dg/memcmp-1.c
index 828a0ca..d258354 100644
--- a/gcc/testsuite/gcc.dg/memcmp-1.c
+++ b/gcc/testsuite/gcc.dg/memcmp-1.c
@@ -110,8 +110,8 @@ static void test_strncmp_ ## SZ ## _ ## ALIGN (const char *str1, const char *str
 	{\
 	  a = three+i*ALIGN+j*(4096-2*i*ALIGN);\
 	  b = four+i*ALIGN+j*(4096-2*i*ALIGN);\
-	  strcpy(a,str1);		\
-	  strcpy(b,str2);		\
+	  memcpy(a,str1,SZ);		\
+	  memcpy(b,str2,SZ);		\
 	  r = strncmp(a,b,SZ);		\
 	  if ( r < 0 && !(expect < 0) ) abort();			\
 	  if ( r > 0 && !(expect > 0) )	abort();			\


[TESTSUITE]Use strncpy instead of strcpy in testsuite/gcc.dg/memcmp-1.c

2017-08-30 Thread Renlin Li

Hi,

In the test_driver_memcmp function, I found that buf1 and buf2 are not properly
terminated with a null character.

In lib_strncmp, strcpy will be called with buf1 and buf2.
A typical implementation of strcpy copies characters from source
to destination one by one until a null character is encountered.

If the string is not properly terminated, strcpy will read and write
memory beyond the buffer boundary.

Here I changed strcpy to strncpy to constrain the function to access
valid memory only.

Tested without any problems. Okay to commit?

Regards,
Renlin


gcc/testsuite/ChangeLog:

2017-08-30  Renlin Li  <renlin...@arm.com>

* gcc.dg/memcmp-1.c (test_strncmp): Use strncpy instead of strcpy.
diff --git a/gcc/testsuite/gcc.dg/memcmp-1.c b/gcc/testsuite/gcc.dg/memcmp-1.c
index 828a0ca..d258354 100644
--- a/gcc/testsuite/gcc.dg/memcmp-1.c
+++ b/gcc/testsuite/gcc.dg/memcmp-1.c
@@ -110,8 +110,8 @@ static void test_strncmp_ ## SZ ## _ ## ALIGN (const char *str1, const char *str
 	{\
 	  a = three+i*ALIGN+j*(4096-2*i*ALIGN);\
 	  b = four+i*ALIGN+j*(4096-2*i*ALIGN);\
-	  strcpy(a,str1);		\
-	  strcpy(b,str2);		\
+	  strncpy(a,str1,SZ);		\
+	  strncpy(b,str2,SZ);		\
 	  r = strncmp(a,b,SZ);		\
 	  if ( r < 0 && !(expect < 0) ) abort();			\
 	  if ( r > 0 && !(expect > 0) )	abort();			\


Re: [RFC][AARCH64]Add 'r' integer register operand modifier. Document the common asm modifier for aarch64 target.

2017-06-27 Thread Renlin Li

Hi Andrew,

On 27/06/17 17:11, Andrew Pinski wrote:

On Tue, Jun 27, 2017 at 8:27 AM, Renlin Li <renlin...@foss.arm.com> wrote:

Hi Andrew,

On 25/06/17 22:38, Andrew Pinski wrote:


On Tue, Jun 6, 2017 at 3:56 AM, Renlin Li <renlin...@foss.arm.com> wrote:


Hi all,

In this patch, a new integer register operand modifier 'r' is added. This
will use the
proper register name according to the mode of corresponding operand.

'w' register for scalar integer mode smaller than DImode
'x' register for DImode

This allows more flexibility and would meet people's expectations.
It will help for ILP32 and LP64, and big-endian case.

A new section is added to document the AArch64 operand modifiers which might
be used in inline assembly. It is not an exhaustive list covering every
modifier; only the most common and useful ones are documented.

The default behavior of integer operand without modifier is clearly
documented
as well. It's not changed so that the patch shouldn't break anything.

So with this patch, it should resolve the issues in PR63359.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63359


aarch64-none-elf regression test Okay. Okay to check in?



I think 'r' modifier is very fragile and can be used incorrectly and
wrong in some cases really..



The user could always (or be encouraged to) opt to a strict register
modifier to enforce consistent behavior in all cases.

I agree the flexibility might bring unexpected behavior in corner cases.
Do you have any examples to share off the top of your head? So that we can
discuss the benefit and pitfalls, and decide to improve the patch or
withdraw it.


One thing is TImode is missing.  I have a use case of __int128_t
inside inline-asm.
For me %r and TImode would produce "x0, x1".  This is one of the
reasons why I said it is fragile.



This is true. Actually, I intended 'r' to handle only the simplest case of a
single integer register,
so that people won't believe it is a magic modifier that handles everything.
I could improve the description of 'r' to clearly explain its limitation.

For TImode integer data, if 'r' is used, it will raise the error
"invalid 'asm': invalid operand mode for register modifier 'r'"




I like the documentation though.


As an aside %H is not documented here.  Noticed it because I am using
%H with TImode.


As for the documentation, I only documented the most common modifiers which
might be used in inline assembly. It's good to know about more use cases.

I could add 'H' to the documentation.

Regards,
Renlin



Thanks,
Andrew



Thanks,
Renlin




Thanks,
Andrew



gcc/ChangeLog:

2017-06-06  Renlin Li  <renlin...@arm.com>

  PR target/63359
  * config/aarch64/aarch64.c (aarch64_print_operand): Add 'r'
modifier.
  * doc/extend.texi (AArch64Operandmodifiers): New section.


Re: [RFC][AARCH64]Add 'r' integer register operand modifier. Document the common asm modifier for aarch64 target.

2017-06-27 Thread Renlin Li

Hi Andrew,

On 25/06/17 22:38, Andrew Pinski wrote:

On Tue, Jun 6, 2017 at 3:56 AM, Renlin Li <renlin...@foss.arm.com> wrote:

Hi all,

In this patch, a new integer register operand modifier 'r' is added. This
will use the
proper register name according to the mode of corresponding operand.

'w' register for scalar integer mode smaller than DImode
'x' register for DImode

This allows more flexibility and would meet people's expectations.
It will help for ILP32 and LP64, and big-endian case.

A new section is added to document the AArch64 operand modifiers which might
be used in inline assembly. It is not an exhaustive list covering every
modifier; only the most common and useful ones are documented.

The default behavior of integer operand without modifier is clearly
documented
as well. It's not changed so that the patch shouldn't break anything.

So with this patch, it should resolve the issues in PR63359.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63359


aarch64-none-elf regression test Okay. Okay to check in?


I think 'r' modifier is very fragile and can be used incorrectly and
wrong in some cases really..


The user could always (or be encouraged to) opt to a strict register modifier to enforce 
consistent behavior in all cases.


I agree the flexibility might bring unexpected behavior in corner cases.
Do you have any examples to share off the top of your head? So that we can discuss the 
benefit and pitfalls, and decide to improve the patch or withdraw it.



I like the documentation though.

Thanks,
Renlin



Thanks,
Andrew



gcc/ChangeLog:

2017-06-06  Renlin Li  <renlin...@arm.com>

 PR target/63359
 * config/aarch64/aarch64.c (aarch64_print_operand): Add 'r'
modifier.
 * doc/extend.texi (AArch64Operandmodifiers): New section.


Re: [PATCH][Testsuite] Use user defined memmove in gcc.c-torture/execute/builtins/memops-asm-lib.c

2017-06-23 Thread Renlin Li

Hi Martin,

On 23/06/17 16:27, Martin Sebor wrote:

On 06/23/2017 03:19 AM, Renlin Li wrote:

Hi all,

After change r249278, bcopy is folded into memmove. In the newlib aarch64
memmove implementation, memcpy is called under certain conditions.
The memcpy defined in memops-asm-lib.c aborts when the test is running.

To handle this, I defined a user memmove function which bypasses the
library one, so that memcpy won't be called accidentally.

Okay to commit?


Having memmove call memcpy when there is no overlap seems like
a valid transformation.  I don't know which test specifically
fails so the question on my mind is whether it perhaps is overly
restrictive in assuming that this transformation must never take
place.  Other than that, although I can't really approve patches,
this one looks okay to me.  Thanks for getting to the bottom of
the failure and fixing it!


Sorry I didn't mention the regressions.
They only happen on aarch64 bare-metal targets, because of the newlib memmove
implementation.
FAIL: gcc.c-torture/execute/builtins/memops-asm.c execution,  -O0
FAIL: gcc.c-torture/execute/builtins/memops-asm.c execution,  -O1
FAIL: gcc.c-torture/execute/builtins/memops-asm.c execution,  -O2
FAIL: gcc.c-torture/execute/builtins/memops-asm.c execution,  -O3 -g
FAIL: gcc.c-torture/execute/builtins/memops-asm.c execution,  -Og -g
FAIL: gcc.c-torture/execute/builtins/memops-asm.c execution,  -Os

I think the purpose of the test is to check that the original functions are not
called directly from the main_test function.

Instead, those calls are redirected to the "my_" versions. The test aborts otherwise.
I CCed Richard Sandiford as he is the original contributor of the test case.

Before r249278, bcopy had a corresponding my_bcopy function, which was the one
actually called.

Regards,
Renlin



Martin



gcc/testsuite/ChangeLog:

2017-06-22  Renlin Li  <renlin...@arm.com>
Szabolcs Nagy  <szabolcs.n...@arm.com>

* gcc.c-torture/execute/builtins/memops-asm-lib.c (my_memmove): New.
* gcc.c-torture/execute/builtins/memops-asm.c (memmove): Declare
memmove.




[PATCH][Testsuite] Use user defined memmove in gcc.c-torture/execute/builtins/memops-asm-lib.c

2017-06-23 Thread Renlin Li

Hi all,

After change r249278, bcopy is folded into memmove. In the newlib aarch64
memmove implementation, memcpy is called under certain conditions.
The memcpy defined in memops-asm-lib.c aborts when the test is running.

To handle this, I defined a user memmove function which bypasses the library one,
so that memcpy won't be called accidentally.

Okay to commit?

gcc/testsuite/ChangeLog:

2017-06-22  Renlin Li  <renlin...@arm.com>
Szabolcs Nagy  <szabolcs.n...@arm.com>

* gcc.c-torture/execute/builtins/memops-asm-lib.c (my_memmove): New.
* gcc.c-torture/execute/builtins/memops-asm.c (memmove): Declare 
memmove.
diff --git a/gcc/testsuite/gcc.c-torture/execute/builtins/memops-asm-lib.c b/gcc/testsuite/gcc.c-torture/execute/builtins/memops-asm-lib.c
index 529..25d4a40 100644
--- a/gcc/testsuite/gcc.c-torture/execute/builtins/memops-asm-lib.c
+++ b/gcc/testsuite/gcc.c-torture/execute/builtins/memops-asm-lib.c
@@ -37,6 +37,24 @@ my_bcopy (const void *s, void *d, size_t n)
 }
 }
 
+__attribute__ ((used))
+void
+my_memmove (void *d, const void *s, size_t n)
+{
+  char *dst = (char *) d;
+  const char *src = (const char *) s;
+  if (src >= dst)
+while (n--)
+  *dst++ = *src++;
+  else
+{
+  dst += n;
+  src += n;
+  while (n--)
+	*--dst = *--src;
+}
+}
+
 /* LTO code is at the present to able to track that asm alias my_bcopy on builtin
actually refers to this function.  See PR47181. */
 __attribute__ ((used))
diff --git a/gcc/testsuite/gcc.c-torture/execute/builtins/memops-asm.c b/gcc/testsuite/gcc.c-torture/execute/builtins/memops-asm.c
index ed2b06c..44e336c 100644
--- a/gcc/testsuite/gcc.c-torture/execute/builtins/memops-asm.c
+++ b/gcc/testsuite/gcc.c-torture/execute/builtins/memops-asm.c
@@ -12,6 +12,8 @@ extern void *memcpy (void *, const void *, size_t)
   __asm (ASMNAME ("my_memcpy"));
 extern void bcopy (const void *, void *, size_t)
   __asm (ASMNAME ("my_bcopy"));
+extern void *memmove (void *, const void *, size_t)
+  __asm (ASMNAME ("my_memmove"));
 extern void *memset (void *, int, size_t)
   __asm (ASMNAME ("my_memset"));
 extern void bzero (void *, size_t)


Re: [PATCH] have -Wformat-overflow handle -fexec-charset (PR 80503)

2017-06-20 Thread Renlin Li

Hi Martin,

I did a little investigation into this. Please correct me if I missed anything.

I built a native arm-linux-gnueabihf toolchain on armhf hardware.
It's ILP32, so in this situation:

HOST_WIDE_INT is long, which is 32-bit.
The integer type is 32-bit as well, so target_int_max () == LONG_MAX

gimple-ssa-sprintf.c line 2887

  /* Has the likely and maximum directive output exceeded INT_MAX?  */
  bool likelyximax = *dir.beg && res->range.likely > target_int_max ();


likelyximax will be false, as the latter expression is always false:
res->range.likely is truncated to LONG_MAX (in the target_strtol10 function).

I have checked in cross build environment (host x86_64), this variable is true.
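
To illustrate why the comparison can never be true on such a host, here is a minimal sketch. It assumes a parser that saturates at LONG_MAX the way strtol does; parse_saturating and exceeds_long_max are hypothetical names and not the code in gimple-ssa-sprintf.c.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a saturating decimal parser: a value that
   does not fit in 'long' comes back as LONG_MAX (strtol also sets errno
   to ERANGE in that case).  */
long
parse_saturating (const char *s)
{
  assert (s != NULL);
  errno = 0;
  return strtol (s, NULL, 10);
}

/* Same shape as the check quoted above: once the parsed value is capped
   at LONG_MAX, "value > LONG_MAX" cannot be true.  */
bool
exceeds_long_max (const char *s)
{
  long likely = parse_saturating (s);
  return likely > LONG_MAX;
}
```

When the limit being compared against is itself LONG_MAX, as on the ILP32 host described here, a check of this shape never fires; a wider limit type or a separate overflow flag would be needed to tell the cases apart.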

Regards,
Renlin


On 13/06/17 09:16, Renlin Li wrote:

Hi Martin,

On 04/06/17 23:24, Martin Sebor wrote:

On 06/02/2017 09:38 AM, Renlin Li wrote:

Hi Martin,

After r247444, I saw the following two regressions in
arm-linux-gnueabihf environment:

FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c  (test for warnings,
line 119)
PASS: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c  (test for warnings,
line 121)
FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c  (test for warnings,
line 121)

The warning message related to those two lines are:
testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:119:3: warning:
'%9223372036854775808i' directive width out of range [-Wformat-overflow=]

testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:121:3: warning:
'%.9223372036854775808i' directive precision out of range
[-Wformat-overflow=]

testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:121:3: warning:
'%.9223372036854775808i' directive precision out of range
[-Wformat-overflow=]

Did you notice similar things from your test environment, Christophe?


Looks like you're missing a couple of warnings.  I see the following
output with both my arm-linux-gnueabihf cross compiler and my native
x86_64 GCC, both in 32-bit and 64-bit modes, as expected by the test,
so I don't see the same issue in my environment.


Yes, it happens in the arm-linux-gnueabihf native environment. The warnings with the
"INT_MAX" line are missing. I don't know if the host environment will cause the
difference.

Regards,
Renlin



/ssd/src/gcc/git/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:119:3: warning: ‘%9223372036854775808i’ directive width out of range [-Wformat-overflow=]
T ("%9223372036854775808i", 0);/* { dg-warning "width out of range" } */
^
/ssd/src/gcc/git/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:119:3: warning: ‘%9223372036854775808i’ directive output of 9223372036854775807 bytes causes result to exceed ‘INT_MAX’ [-Wformat-overflow=]
/ssd/src/gcc/git/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:121:3: warning: ‘%.9223372036854775808i’ directive precision out of range [-Wformat-overflow=]
T ("%.9223372036854775808i", 0);   /* { dg-warning "precision out of range" } */
^
/ssd/src/gcc/git/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:121:3: warning: ‘%.9223372036854775808i’ directive output of 9223372036854775807 bytes causes result to exceed ‘INT_MAX’ [-Wformat-overflow=]

Martin


Re: [PATCH] have -Wformat-overflow handle -fexec-charset (PR 80503)

2017-06-13 Thread Renlin Li

Hi Martin,

On 04/06/17 23:24, Martin Sebor wrote:

On 06/02/2017 09:38 AM, Renlin Li wrote:

Hi Martin,

After r247444, I saw the following two regressions in
arm-linux-gnueabihf environment:

FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c  (test for warnings,
line 119)
PASS: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c  (test for warnings,
line 121)
FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c  (test for warnings,
line 121)

The warning message related to those two lines are:
testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:119:3: warning:
'%9223372036854775808i' directive width out of range [-Wformat-overflow=]

testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:121:3: warning:
'%.9223372036854775808i' directive precision out of range
[-Wformat-overflow=]

testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:121:3: warning:
'%.9223372036854775808i' directive precision out of range
[-Wformat-overflow=]

Did you notice similar things from your test environment, Christophe?


Looks like you're missing a couple of warnings.  I see the following
output with both my arm-linux-gnueabihf cross compiler and my native
x86_64 GCC, both in 32-bit and 64-bit modes, as expected by the test,
so I don't see the same issue in my environment.


Yes, it happens in the arm-linux-gnueabihf native environment. The warnings with the "INT_MAX"
line are missing. I don't know if the host environment will cause the difference.


Regards,
Renlin



/ssd/src/gcc/git/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:119:3: warning: ‘%9223372036854775808i’ directive width out of range [-Wformat-overflow=]
T ("%9223372036854775808i", 0);/* { dg-warning "width out of range" } */
^
/ssd/src/gcc/git/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:119:3: warning: ‘%9223372036854775808i’ directive output of 9223372036854775807 bytes causes result to exceed ‘INT_MAX’ [-Wformat-overflow=]
/ssd/src/gcc/git/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:121:3: warning: ‘%.9223372036854775808i’ directive precision out of range [-Wformat-overflow=]
T ("%.9223372036854775808i", 0);   /* { dg-warning "precision out of range" } */
^
/ssd/src/gcc/git/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:121:3: warning: ‘%.9223372036854775808i’ directive output of 9223372036854775807 bytes causes result to exceed ‘INT_MAX’ [-Wformat-overflow=]

Martin


Re: Statically propagate basic blocks which are likely executed 0 times

2017-06-12 Thread Renlin Li

Hi Honza & Christophe,

I have tested your suggested fix. It does fix the regression.
Here is a simple patch for it.

After r249013, die () and dump_stack () are both in the cold section. This makes
the compiler generate a bl instruction for the function call instead of
honoring the -mlong-calls option.

This patch makes the dump_stack function call conditional, which fixes the
regression.

Okay to commit?

Regards,
Renlin

gcc/testsuite/ChangeLog:

2017-06-12  Renlin Li  <renlin...@arm.com>

* gcc.target/arm/cold-lc.c: Update coding style, call dump_stack
conditionally.


On 09/06/17 10:54, Jan Hubicka wrote:

Since this commit (r249013), I've noticed a regression on arm targets:
FAIL: gcc.target/arm/cold-lc.c scan-assembler-not bl[^\n]*dump_stack


I think that is because we optimize the testcase:
/* { dg-do compile } */
/* { dg-options "-O2 -mlong-calls" } */
/* { dg-final { scan-assembler-not "bl\[^\n\]*dump_stack" } } */

extern void dump_stack (void) __attribute__ ((__cold__)) __attribute__ ((noinline));
struct thread_info {
 struct task_struct *task;
};
extern struct thread_info *current_thread_info (void);
extern int show_stack (struct task_struct *, unsigned long *);

void dump_stack (void)
{
 unsigned long stack;
 show_stack ((current_thread_info ()->task), &stack);
}

void die (char *str, void *fp, int nr)
{
 dump_stack ();
 while (1);
}

The new logic will move die () into the cold section (because it unavoidably leads
to cold code) and thus allow using the bl instruction.
I guess just modifying die to call dump_stack conditionally should fix the
testcase.

Honza




+  if (!n->analyzed
+  || n->decl == current_function_decl)
+return false;
+  return n->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED;
+}


diff --git a/gcc/testsuite/gcc.target/arm/cold-lc.c b/gcc/testsuite/gcc.target/arm/cold-lc.c
index 467a696..f0cd6df 100644
--- a/gcc/testsuite/gcc.target/arm/cold-lc.c
+++ b/gcc/testsuite/gcc.target/arm/cold-lc.c
@@ -11,13 +11,14 @@ extern int show_stack (struct task_struct *, unsigned long *);
 
 void dump_stack (void)
 {
-unsigned long stack;
-show_stack ((current_thread_info ()->task), &stack);
+  unsigned long stack;
+  show_stack ((current_thread_info ()->task), &stack);
 }
 
 void die (char *str, void *fp, int nr)
 {
+  if (nr)
 dump_stack ();
-while (1);
+  while (1);
 }
 


Re: [PING][PATCH][ARM]Use different startfile and endfile for elf target when generating shared object.

2017-06-07 Thread Renlin Li

Ping ~

On 14/12/16 15:33, Renlin Li wrote:

Ping~

Regards,
Renlin

On 16/06/16 12:04, Renlin Li wrote:

Hi all,

GCC has startfile and endfile spec strings built into it.
startfile specifies the object files to include at the start of the link
process, while endfile specifies the object files to include at the end
of the link process.

crtbegin.o is one of the object files specified by the startfile spec string. IIUC,
crtbeginS.o should be used in place of crtbegin.o when generating shared
objects. The same applies to crtend.o, which is one of the endfile objects:
crtendS.o should be used when generating shared objects.

This patch makes the elf toolchain use different crtbegin and crtend files when
creating shared and static objects. The linux toolchain already makes this
distinction.

So when the toolchain doesn't support shared objects, the following error
message will be produced:
ld: cannot find crtbeginS.o: No such file or directory

Still, the spec strings built into GCC can be overridden by using the
-specs= command-line switch to specify a spec file.
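
As a rough sketch of the selection described above: the two macros below use GCC's "%{S:X;:Y}" conditional spec syntax, but their exact contents are an assumption for illustration and are not copied from the patch. The helper functions are hypothetical and simply model which file each branch picks.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical spec fragments: pick the "S" variant under -shared,
   the plain variant otherwise.  */
#define SKETCH_STARTFILE_SPEC "%{shared:crtbeginS.o%s;:crtbegin.o%s}"
#define SKETCH_ENDFILE_SPEC   "%{shared:crtendS.o%s;:crtend.o%s}"

/* Model of the startfile branch.  */
const char *
sketch_crtbegin (bool shared)
{
  return shared ? "crtbeginS.o" : "crtbegin.o";
}

/* Model of the endfile branch.  */
const char *
sketch_crtend (bool shared)
{
  return shared ? "crtendS.o" : "crtend.o";
}

/* True if NAME is one of the shared-object variants ("...S.o").  */
bool
sketch_is_shared_variant (const char *name)
{
  assert (name != NULL);
  size_t n = strlen (name);
  return n >= 3 && strcmp (name + n - 3, "S.o") == 0;
}
```

If the "S" variants are not installed, a link with -shared fails at the point where the linker looks for crtbeginS.o, which is the error quoted above.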

The arm-none-eabi regression test shows no new issues. OK for trunk?

Regards,
Renlin Li

gcc/ChangeLog:

2016-06-16  Renlin Li  <renlin...@arm.com>

 * config/arm/unknown-elf.h (UNKNOWN_ELF_STARTFILE_SPEC): Use
 crtbeginS.o for shared object.
 (UNKNOWN_ELF_ENDFILE_SPEC): Use crtendS.o for shared object.


Re: [Patch, fortran] PR35339 Optimize implied do loops in io statements

2017-06-07 Thread Renlin Li

171.swim fails on aarch64-linux as well. I did a bisect and confirmed that r248877 causes
the miscompare.


Regards,
Renlin

On 06/06/17 12:05, Markus Trippelsdorf wrote:

On 2017.06.05 at 22:39 +0200, Nicolas Koenig wrote:

With all the style fixes committed as r248877.


171_swim fails now. I didn't bisect, but I suspect your revision.



[RFC][AARCH64]Add 'r' integer register operand modifier. Document the common asm modifier for aarch64 target.

2017-06-06 Thread Renlin Li

Hi all,

In this patch, a new integer register operand modifier 'r' is added. It uses the
proper register name according to the mode of the corresponding operand:

'w' register for scalar integer mode smaller than DImode
'x' register for DImode

This allows more flexibility and should meet people's expectations.
It will help in the ILP32, LP64, and big-endian cases.
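
A simplified model of the mapping, as a sketch only: the operand size in bytes stands in for the machine mode (1, 2, 4 for QImode, HImode, SImode; 8 for DImode), and the sketch_* helpers are hypothetical names rather than anything in aarch64.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Return the register prefix the 'r' modifier would select for an
   integer operand of the given size, or '\0' for unsupported sizes.  */
char
sketch_reg_prefix (size_t size_in_bytes)
{
  switch (size_in_bytes)
    {
    case 1:
    case 2:
    case 4:
      return 'w';
    case 8:
      return 'x';
    default:
      return '\0';
    }
}

/* Format a register name such as "w0" or "x3" into BUF.  Returns the
   number of characters written, or -1 for an unsupported size.  */
int
sketch_format_reg (char *buf, size_t len, size_t size_in_bytes,
		   unsigned regno)
{
  assert (buf != NULL && len > 0);
  char prefix = sketch_reg_prefix (size_in_bytes);
  if (prefix == '\0')
    return -1;
  return snprintf (buf, len, "%c%u", prefix, regno);
}
```

The unsupported-size branch corresponds to the case where the patch reports an invalid operand mode.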

A new section is added to document the AArch64 operand modifiers which might
be used in inline assembly. It's not an exhaustive list covering every modifier;
only the most common and useful ones are documented.

The default behavior of an integer operand without a modifier is clearly documented
as well. It's not changed, so the patch shouldn't break anything.
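
For readers less familiar with the existing modifiers, here is a small sketch of how 'w' and 'x' are written today in extended asm. It is guarded so that it also builds on non-AArch64 hosts, where it falls back to plain C; the sketch_* function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit add: %w selects the 32-bit (w) name of each register.  */
uint32_t
sketch_add32 (uint32_t a, uint32_t b)
{
#if defined (__aarch64__)
  uint32_t r;
  __asm__ ("add %w0, %w1, %w2" : "=r" (r) : "r" (a), "r" (b));
  return r;
#else
  return a + b;
#endif
}

/* 64-bit add: %x selects the 64-bit (x) name of each register.  */
uint64_t
sketch_add64 (uint64_t a, uint64_t b)
{
#if defined (__aarch64__)
  uint64_t r;
  __asm__ ("add %x0, %x1, %x2" : "=r" (r) : "r" (a), "r" (b));
  return r;
#else
  return a + b;
#endif
}

/* Tiny self-check that exercises both helpers.  */
void
sketch_self_check (void)
{
  assert (sketch_add32 (2, 3) == 5);
  assert (sketch_add64 (2, 3) == 5);
}
```

With the proposed 'r' modifier, the author of such a template would not have to choose between %w and %x by hand; the operand's mode would decide.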

So with this patch, it should resolve the issues in PR63359.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63359


aarch64-none-elf regression test Okay. Okay to check in?

gcc/ChangeLog:

2017-06-06  Renlin Li  <renlin...@arm.com>

PR target/63359
* config/aarch64/aarch64.c (aarch64_print_operand): Add 'r' modifier.
* doc/extend.texi (AArch64Operandmodifiers): New section.
commit f8725ffd1375a8347cc8f4f183262c08ce2f73c6
Author: Renlin Li <renlin...@arm.com>
Date:   Tue May 23 16:46:31 2017 +0100

[AARCH64]Add 'r' integer operand modifier. Document the be extend asm modifier
for aarch64 target.

gcc/ChangeLog:

	PR target/63359
	* config/aarch64/aarch64.c (aarch64_print_operand): Add 'r' modifier.
	* doc/extend.texi (AArch64Operandmodifiers): New section.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 5707e53..d1c400f 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5132,6 +5132,26 @@ aarch64_print_operand (FILE *f, rtx x, int code)
   asm_fprintf (f, "0x%wx", UINTVAL (x) & 0x);
   break;
 
+case 'r':
+	{
+	  machine_mode mode = GET_MODE (x);
+	  switch (mode)
+	{
+	case QImode:
+	case HImode:
+	case SImode:
+	  code = 'w';
+	  break;
+	case DImode:
+	  code = 'x';
+	  break;
+	default:
+	  output_operand_lossage
+		("invalid operand mode for register modifier 'r'");
+	}
+	}
+  /* Fall through.  */
+
 case 'w':
 case 'x':
   /* Print a general register name or the zero register (32-bit or
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 60a1a3f..d1c830d 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -8264,7 +8264,9 @@ is undefined if @var{a} is modified before using @var{b}.
 @code{asm} supports operand modifiers on operands (for example @samp{%k2} 
 instead of simply @samp{%2}). Typically these qualifiers are hardware 
 dependent. The list of supported modifiers for x86 is found at 
-@ref{x86Operandmodifiers,x86 Operand modifiers}.
+@ref{x86Operandmodifiers,x86 Operand modifiers}.  The list of supported
+modifiers for AArch64 is found at
+@ref{AArch64Operandmodifiers,AArch64 Operand modifiers}.
 
 If the C code that follows the @code{asm} makes no use of any of the output 
 operands, use @code{volatile} for the @code{asm} statement to prevent the 
@@ -8491,7 +8493,9 @@ optimizers may discard the @code{asm} statement as unneeded
 @code{asm} supports operand modifiers on operands (for example @samp{%k2} 
 instead of simply @samp{%2}). Typically these qualifiers are hardware 
 dependent. The list of supported modifiers for x86 is found at 
-@ref{x86Operandmodifiers,x86 Operand modifiers}.
+@ref{x86Operandmodifiers,x86 Operand modifiers}.  The list of supported
+modifiers for AArch64 is found at
+@ref{AArch64Operandmodifiers,AArch64 Operand modifiers}.
 
 In this example using the fictitious @code{combine} instruction, the 
 constraint @code{"0"} for input operand 1 says that it must occupy the same 
@@ -8659,6 +8663,70 @@ error:
 @}
 @end example
 
+@anchor{AArch64Operandmodifiers}
+@subsubsection AArch64 Operand Modifiers
+References to input, output, and goto operands in the assembler template
+of extended @code{asm} statements can use
+modifiers to affect the way the operands are formatted in
+the code output to the assembler.
+
+The table below describes the list of useful register operand modifiers which
+might be used in extended @code{asm}. It is not necessarily a complete list
+of modifiers supported by the AArch64 backend.
+
+@multitable {Modifier} {Print the opcode suffix for the size of th} {Operand}
+@headitem Modifier @tab Description @tab Operand
+@item @code{r}
+@tab Print the opcode suffix for the size of the current integer operand (one of @code{w}/@code{x}).
+@tab @code{%r0}
+@item @code{w}
+@tab Print the SImode name of the register.
+@tab @code{%w0}
+@item @code{x}
+@tab Print the DImode name of the register.
+@tab @code{%x0}
+@item @code{h}
+@tab Print the HFmode name of the register.
+@tab @code{%h0}
+@item @code{s}
+@tab Print the SFmode name of the register.
+@tab @code{%s0}
+@item @code{d}
+@ta

Re: [PATCH] add more detail to -Wconversion and -Woverflow (PR 80731)

2017-06-02 Thread Renlin Li

Hi Martin,

I noticed the following failures after your change r248431.
FAIL: c-c++-common/Wfloat-conversion.c  -Wc++-compat   (test for warnings, line 42)
FAIL: c-c++-common/Wfloat-conversion.c  -Wc++-compat   (test for warnings, line 43)

It happens on the arm target, which is not a large_long_double target.
The patch here adds the missing target selector. After the change, those tests
won't be checked on the arm target.
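
A minimal sketch of the property behind the selector, assuming that large_long_double means "long double is wider than double". The size comparison below is my approximation of that idea, not a copy of the testsuite's check, and the sketch_* names are hypothetical.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>

/* True when long double occupies more storage than double.  */
bool
sketch_large_long_double (void)
{
  return sizeof (long double) > sizeof (double);
}

/* A long double -> double conversion can only change the value when
   long double has more precision or a larger exponent range.  */
bool
sketch_narrowing_can_lose_info (void)
{
  return LDBL_MANT_DIG > DBL_MANT_DIG || LDBL_MAX_EXP > DBL_MAX_EXP;
}

/* Assumption: on targets where the two types have the same size they
   also share a format, so no "may change value" warning is expected.  */
bool
sketch_expect_warning (void)
{
  bool large = sketch_large_long_double ();
  assert (large || !sketch_narrowing_can_lose_info ());
  return large;
}
```

On arm, where long double and double share a format, the last function returns false, which matches the two missing warnings.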

Here I have a simple fix for it. Okay to commit?

gcc/testsuite/ChangeLog:

2017-06-02 Renlin Li <renlin...@arm.com>

* c-c++-common/Wfloat-conversion.c: Add large_long_double target
selector to related line.



And there is another failure:
FAIL: gcc.dg/utf16-4.c  (test for warnings, line 15)

The warning message is slightly different from what is expected.
utf16-4.c:10:15: warning: character constant too long for its type
utf16-4.c:15:15: warning: conversion from 'long unsigned int' to 'char16_t {aka short unsigned int}' changes value from '410401' to '17185'
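
The two numbers in that message are consistent with a plain truncation to 16 bits, which this small sketch reproduces. sketch_to_char16 is a hypothetical helper, and uint16_t stands in for char16_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Convert VALUE to a 16-bit unsigned type, storing the result in *OUT.
   Returns true if the conversion changed the value.  */
bool
sketch_to_char16 (unsigned long value, uint16_t *out)
{
  assert (out != NULL);
  *out = (uint16_t) value;	/* Keeps only the low 16 bits.  */
  return (unsigned long) *out != value;
}
```

410401 is 0x64321, and its low 16 bits are 0x4321, which is 17185.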


On 18/05/17 01:04, Martin Sebor wrote:

While working on a new warning for unsafe conversion I noticed
that the existing warnings that diagnose these kinds of problems
are missing some useful detail. For example, given declarations
of an integer Type and an integer Constant defined in some header,
a C programmer who writes this declaration:

   Type x = Constant;

might see the following:

   warning: overflow in implicit constant conversion [-Woverflow]

To help the programmer better understand the problem and its impact
it would be helpful to mention the types of the operands, and if
available, also the values of the expressions.  For instance, like
so:

   warning: overflow in conversion from ‘int’ to ‘T {aka signed char}’ changes value from ‘123456789’ to ‘21’ [-Woverflow]

The attached simple patch does just that.  In making the changes
I tried to make the text of all the warnings follow the same
consistent wording pattern without losing any essential information
(e.g., I dropped "implicit" or "constant" because the implicit part
is evident from the code (no cast) and explicit conversions aren't
diagnosed, and because constant is apparent from the rest of the
diagnostic that includes its value).

Besides adding more detail and tweaking the wording the patch
makes no functional changes (i.e., doesn't add new or remove
existing warnings).

Martin

PS While adjusting the tests (a painstaking process) it occurred
to me that these kinds of changes would be a whole lot easier if
dg-warning directives simply checked for "-Woption-name" rather
than some (often arbitrary) part of the warning text.  It might
even be more accurate if the pattern happens to match the text
of two or more warnings controlled by different options.

It's of course important to also exercise the full text of
the warnings, especially where additional detail is included
(like in this patch), but that can be done in a small subset
of tests.  All the others that just verify the presence of
a warning controlled by a given option could use the simpler
approach.
diff --git a/gcc/testsuite/c-c++-common/Wfloat-conversion.c b/gcc/testsuite/c-c++-common/Wfloat-conversion.c
index e9899bc..c33a2a6 100644
--- a/gcc/testsuite/c-c++-common/Wfloat-conversion.c
+++ b/gcc/testsuite/c-c++-common/Wfloat-conversion.c
@@ -39,8 +39,8 @@ void h (void)
   vfloat = vdouble; /* { dg-warning "conversion from .double. to .float. may change value" } */
   ffloat (vlongdouble); /* { dg-warning "conversion from .long double. to .float. may change value" } */
   vfloat = vlongdouble; /* { dg-warning "conversion from .long double. to .float. may change value" } */
-  fdouble (vlongdouble); /* { dg-warning "conversion from .long double. to .double. may change value" } */
-  vdouble = vlongdouble; /* { dg-warning "conversion from .long double. to .double. may change value" } */
+  fdouble (vlongdouble); /* { dg-warning "conversion from .long double. to .double. may change value" "" { target large_long_double } } */
+  vdouble = vlongdouble; /* { dg-warning "conversion from .long double. to .double. may change value" "" { target large_long_double } } */
 
   fsi (3.1f); /* { dg-warning "conversion from .float. to .int. changes value" } */
   si = 3.1f; /* { dg-warning "conversion from .float. to .int. changes value" } */


Re: [PATCH] have -Wformat-overflow handle -fexec-charset (PR 80503)

2017-06-02 Thread Renlin Li

Hi Martin,

After r247444, I saw the following two regressions in the arm-linux-gnueabihf
environment:

FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c  (test for warnings, line 119)
PASS: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c  (test for warnings, line 121)
FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c  (test for warnings, line 121)

The warning messages related to those two lines are:
testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:119:3: warning: '%9223372036854775808i' directive width out of range [-Wformat-overflow=]


testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:121:3: warning: '%.9223372036854775808i' directive precision out of range [-Wformat-overflow=]


testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:121:3: warning: '%.9223372036854775808i' directive precision out of range [-Wformat-overflow=]


Did you notice similar things from your test environment, Christophe?

Regards,
Renlin

On 03/05/17 16:02, Christophe Lyon wrote:

On 3 May 2017 at 16:54, Martin Sebor  wrote:

On 05/03/2017 08:22 AM, Christophe Lyon wrote:


Hi,


On 29 April 2017 at 19:56, Andreas Schwab  wrote:


On Apr 28 2017, Martin Sebor  wrote:


+void test_width_and_precision_out_of_range (char *d)
+{
+#if __LONG_MAX__ == 2147483647
+#  define   MAX_P1_STR "2147483648"
+#elif __LONG_MAX__ == 9223372036854775807
+#  define MAX_P1_STR "9223372036854775808"
+#endif
+
+  T ("%" MAX_P1_STR "i", 0);/* { dg-warning "width out of range" } */
+  /* { dg-warning "result to exceed .INT_MAX. " "" { target *-*-* } .-1 } */
+  T ("%." MAX_P1_STR "i", 0);   /* { dg-warning "precision out of range" } */
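
A standalone sketch of how the test builds its format string, for anyone reproducing this outside the testsuite. It uses LONG_MAX from <limits.h> rather than the predefined __LONG_MAX__ so that it is not tied to one compiler; the SKETCH_ and sketch_ names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* LONG_MAX + 1 as a decimal string, chosen by the width of 'long'.  */
#if LONG_MAX == 2147483647
#  define SKETCH_MAX_P1_STR "2147483648"
#elif LONG_MAX == 9223372036854775807
#  define SKETCH_MAX_P1_STR "9223372036854775808"
#else
#  error "unexpected width of long"
#endif

/* Format string whose field width is one past LONG_MAX.  */
const char *
sketch_width_format (void)
{
  static const char fmt[] = "%" SKETCH_MAX_P1_STR "i";
  assert (fmt[0] == '%');
  return fmt;
}

/* Length of that format string, excluding the terminating NUL.  */
size_t
sketch_width_format_length (void)
{
  return strlen (sketch_width_format ());
}
```

Because the string depends on the target's long, an ILP32 target sees '%2147483648i' while an LP64 target sees '%9223372036854775808i', as in the messages quoted in this thread.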



FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c  (test for warnings, line
123)
FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c  (test for warnings, line
125)
FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-18.c (test for excess errors)
Excess errors:

/daten/aranym/gcc/gcc-20170429/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-18.c:125:3:
warning: '%.2147483648i' directive output of 2147483648 bytes causes result
to exceed 'INT_MAX' [-Wformat-overflow=]

Andreas.



I've noticed the same errors on arm* targets, if it's easier to reproduce.



Thanks.  I committed a trivial fix for this on Monday
(https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00036.html).
I don't see the failures in recent test results for the few
ILP32 targets I've checked so I'm hoping they're gone but if
they persist on some others please let me know.


Indeed, I confirm your commit r247444 fixed the error.
I didn't notice your message because it was in a different thread.

Thanks,

Christophe


Martin


Re: [PATCH,testsuite] Add check_effective_target_rdynamic and use it in g++.dg/lto/pr69589_0.C.

2017-06-02 Thread Renlin Li

Hi Toma,

Thanks for fixing this! Do you plan to backport the fix to the gcc-6 branch?


Regards,
Renlin

On 09/03/17 15:08, Toma Tabacu wrote:


Ok for mainline with that fixed.

Thanks.
 Rainer



Committed as r246004.

Thanks,
Toma



Re: [PATCH][AARCH64]Simplify call, call_value, sibcall, sibcall_value patterns.

2017-05-15 Thread Renlin Li

Hi Richard,

Thanks! Committed with all the comments resolved.

Regards,
Renlin

On 02/05/17 13:53, Richard Earnshaw (lists) wrote:

On 01/12/16 15:39, Renlin Li wrote:

Hi all,

This patch refactors the code used in call, call_value, sibcall,
sibcall_value expanders.

Before the change, the logic is as follows:

call expander          --> call_internal          --> call_reg/call_symbol
call_value expander    --> call_value_internal    --> call_value_reg/call_value_symbol

sibcall expander       --> sibcall_internal       --> sibcall_insn
sibcall_value expander --> sibcall_value_internal --> sibcall_value_insn

After the change, the logic is simplified into:

call expander          --> aarch64_expand_call () --> call_insn
call_value expander    --> aarch64_expand_call () --> call_value_insn

sibcall expander       --> aarch64_expand_call () --> sibcall_insn
sibcall_value expander --> aarch64_expand_call () --> sibcall_value_insn

The code is factored out from each expander into aarch64_expand_call ().
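
A toy model of the two decisions the shared function makes, as a sketch only: plain C types stand in for rtx, and every sketch_/SKETCH_ name is hypothetical. It mirrors the condition used in the patch for forcing the callee into a register and the choice of the second element of the PARALLEL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_callee_kind { SKETCH_SYMBOL, SKETCH_REG, SKETCH_OTHER };

struct sketch_callee
{
  enum sketch_callee_kind kind;
  bool long_call;	/* Models aarch64_is_long_call_p.  */
  bool noplt;		/* Models aarch64_is_noplt_call_p.  */
};

/* A symbol is loaded into a register only for long or no-PLT calls;
   anything else that is not already a register is always loaded.  */
bool
sketch_needs_force_reg (const struct sketch_callee *c)
{
  assert (c != NULL);
  return c->kind == SKETCH_SYMBOL
	 ? (c->long_call || c->noplt)
	 : c->kind != SKETCH_REG;
}

enum sketch_second_elt { SKETCH_RETURN, SKETCH_CLOBBER_LR };

/* Sibling calls pair the call with a return; normal calls pair it
   with a clobber of the link register.  */
enum sketch_second_elt
sketch_second_element (bool sibcall)
{
  return sibcall ? SKETCH_RETURN : SKETCH_CLOBBER_LR;
}
```

Having both decisions in one place is what lets the four expanders collapse into calls to a single function.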

This also fixes the two issues Richard Henderson suggests in comment 8:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64971

aarch64-none-elf regression test Okay, aarch64-linux bootstrap Okay.
Okay for trunk?

Regards,
Renlin Li


gcc/ChangeLog:

2016-12-01  Renlin Li  <renlin...@arm.com>

 * config/aarch64/aarch64-protos.h (aarch64_expand_call): Declare.
 * config/aarch64/aarch64.c (aarch64_expand_call): Define.
 * config/aarch64/constraints.md (Usf): Add long call check.
 * config/aarch64/aarch64.md (call): Use aarch64_expand_call.
 (call_value): Likewise.
 (sibcall): Likewise.
 (sibcall_value): Likewise.
 (call_insn): New.
 (call_value_insn): New.
 (sibcall_insn): Update rtx pattern.
 (sibcall_value_insn): Likewise.
 (call_internal): Remove.
 (call_value_internal): Likewise.
 (sibcall_internal): Likewise.
 (sibcall_value_internal): Likewise.
 (call_reg): Likewise.
 (call_symbol): Likewise.
 (call_value_reg): Likewise.
 (call_value_symbol): Likewise.


new.diff


diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 7f67f14..3a5babb 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -305,6 +305,7 @@ bool aarch64_const_vec_all_same_int_p (rtx, HOST_WIDE_INT);
  bool aarch64_constant_address_p (rtx);
  bool aarch64_emit_approx_div (rtx, rtx, rtx);
  bool aarch64_emit_approx_sqrt (rtx, rtx, bool);
+void aarch64_expand_call (rtx, rtx, bool);
  bool aarch64_expand_movmem (rtx *);
  bool aarch64_float_const_zero_rtx_p (rtx);
  bool aarch64_function_arg_regno_p (unsigned);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 68a3380..c313cf5 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4343,6 +4343,51 @@ aarch64_fixed_condition_code_regs (unsigned int *p1, unsigned int *p2)
return true;
  }

+/* This function is used by the call expanders of the machine description.
+   RESULT is the register in which the result is returned.  It's NULL for
+   "call" and "sibcall".
+   MEM is the location of the function call.
+   SIBCALL indicates whether this function call is normal call or sibling call.
+   It will generate different pattern accordingly.  */
+
+void
+aarch64_expand_call (rtx result, rtx mem, bool sibcall)
+{
+  rtx call, callee, tmp;
+  rtvec vec;
+  machine_mode mode;
+
+  gcc_assert (MEM_P (mem));
+  callee = XEXP (mem, 0);
+  mode = GET_MODE (callee);
+  gcc_assert (mode == Pmode);
+
+  /* Decide if we should generate indirect calls by loading the
+ 64-bit address of the callee into a register before performing


Drop '64-bit'.  This code should also work for ILP32, where the
addresses are 32-bit.


+ the branch-and-link.  */
+
+  if (GET_CODE (callee) == SYMBOL_REF


Use SYMBOL_REF_P.

OK with those changes.

R.



+  ? (aarch64_is_long_call_p (callee)
+|| aarch64_is_noplt_call_p (callee))
+  : !REG_P (callee))
+  XEXP (mem, 0) = force_reg (mode, callee);
+
+  call = gen_rtx_CALL (VOIDmode, mem, const0_rtx);
+
+  if (result != NULL_RTX)
+call = gen_rtx_SET (result, call);
+
+  if (sibcall)
+tmp = ret_rtx;
+  else
+tmp = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (Pmode, LR_REGNUM));
+
+  vec = gen_rtvec (2, call, tmp);
+  call = gen_rtx_PARALLEL (VOIDmode, vec);
+
+  aarch64_emit_call_insn (call);
+}
+
  /* Emit call insn with PAT and do aarch64-specific handling.  */

  void
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index bc6d8a2..5682686 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -718,12 +718,6 @@
  ;; Subroutine calls and sibcalls
  ;; ---

-(define_expand "call_internal"
-  [(parallel [(call (match_operand 0 "memory_op

Re: [PATCH][ARM]Use different startfile and endfile for elf target when generating shared object.

2017-01-13 Thread Renlin Li

Hi Christophe,

On 13/01/17 11:14, Christophe Lyon wrote:

On 13 January 2017 at 11:22, Renlin Li <renlin...@foss.arm.com> wrote:

Hi Christophe,

Thanks for testing the patch!
I checked the test case gcc.dg/lto/pr54709; it seems the test case is not
properly written.

It adds the extra ld option -shared without checking the target support for that.
After the change, this compilation will fail, which shows up as a regression.
IIUC, if the '-shared' option is required, it should be gated with the corresponding
target selector.

"g++.dg/ipa/devirt-28a.C" is now skipped because of the target selector
there.
// { dg-do link { target { { gld && fpic } && shared } } }

Perhaps "gcc.dg/lto/pr54709" should do something similar, like this:
// { dg-do link { target { shared } } }

Quite likely, indeed.




As far as I know, with different cpu/arch configurations, different
relocations are generated in the library, some of the relocations are not
allowed to be used in shared
object.

With -march=armv7-a (and the --with-cpu=cortex-a9 option you mentioned), the
linking stage of the test will fail because of this error:
"relocation XXX against external symbol `YYY' can not be used when making a
shared object"
for instance: crtbegin.o: relocation R_ARM_MOVW_ABS_NC against `a local
symbol` can not be used when making a shared object; recompile with -fPIC

If you are luck enough, for example with arm7tdmi cpu, no such relocation is
generated in startup files. The "shared" target support check will pass for
simple and naive code.
"--with-cpu=cortex-m3" should be this case. But the test cases which require
shared object support will fail.


So this "shared" target checking mechanism is not reliable. The patch is to
change this.


Shouldn't your patch imply that several tests move from "fail" to
"unsupported" on armv7-a ? I'm surprised not to see any difference in the
results.





Oops, I reordered the explanation paragraphs in my last email, which made it
obscure.

The "shared" target check will fail on the armv7-a architecture for the reason
mentioned below, so those tests have already been skipped. After the change, they are
still marked as "unsupported", but for a different reason: "crtbeginS.o cannot be
found".

The DejaGnu test framework will compose a small program to test whether the
toolchain supports the "shared" option.

>> With -march=armv7-a (and the --with-cpu=cortex-a9 option you mentioned), the
>> linking stage of the test will fail because of this error:
>> "relocation XXX against external symbol `YYY' can not be used when making a
>> shared object"
>> for instance: crtbegin.o: relocation R_ARM_MOVW_ABS_NC against `a local
>> symbol` can not be used when making a shared object; recompile with -fPIC
>>

So the check won't pass, and the test case is marked as "unsupported".

>> If you are luck enough, for example with arm7tdmi cpu, no such relocation is
>> generated in startup files. The "shared" target support check will pass for
>> simple and naive code.
>> "--with-cpu=cortex-m3" should be this case. But the test cases which require
>> shared object support will fail.

So for the same test case:
With "--with-cpu=cortex-m3",
the "shared" target support check will pass. The test is marked as supported, but fails to
produce a binary.

With "--with-cpu=cortex-a9",
the "shared" target support check will fail. The test is marked as "unsupported" and
skipped.

After the change, the test case will be marked as "unsupported" regardless of the
cpu/arch configuration.

Regards,
Renlin


Re: [PATCH][ARM]Use different startfile and endfile for elf target when generating shared object.

2017-01-13 Thread Renlin Li

Hi Christophe,

Thanks for testing the patch!
I checked the test case gcc.dg/lto/pr54709; it seems the test case is not
properly written.

It adds the extra ld option -shared without checking the target support for that. After the
change, this compilation will fail, which shows up as a regression.

IIUC, if the '-shared' option is required, it should be gated with the corresponding
target selector.

"g++.dg/ipa/devirt-28a.C" is now skipped because of the target selector there.
// { dg-do link { target { { gld && fpic } && shared } } }

Perhaps "gcc.dg/lto/pr54709" should do something similar, like this:
// { dg-do link { target { shared } } }


As far as I know, with different cpu/arch configurations, different relocations are
generated in the library, and some of the relocations are not allowed in a shared
object.

With -march=armv7-a (and the --with-cpu=cortex-a9 option you mentioned), the linking stage
of the test will fail because of this error:

"relocation XXX against external symbol `YYY' can not be used when making a shared
object"
For instance: crtbegin.o: relocation R_ARM_MOVW_ABS_NC against `a local symbol` can not be
used when making a shared object; recompile with -fPIC


If you are lucky enough, for example with the arm7tdmi cpu, no such relocation is generated in
the startup files. The "shared" target support check will pass for simple and naive code.
"--with-cpu=cortex-m3" should be this case. But the test cases which require shared object
support will fail.



So this "shared" target checking mechanism is not reliable. The patch is to
change this.



Regards,
Renlin



On 13/01/17 08:48, Christophe Lyon wrote:

Hi Renlin,


On 12 January 2017 at 16:50, Renlin Li <renlin...@foss.arm.com> wrote:

Hi Kugan,

Some of the targets do include pie, and use the same crtbegin file as for a shared
object.
For example, alpha/elf.h

And there are targets which don't do that.
For example, sh/elf.h

Most of the elf targets seem to consider only the simple case.

The purpose of this patch is to make it possible to correctly check whether
the current toolchain supports shared objects.

The current dejagnu target selector "shared" tries to compile a simple source
file with the "-shared -fpic" options to check whether "-shared" is
supported.

For arm baremetal targets, this is not sufficient.

arm-none-eabi is built with multilib. When this test program is compiled with
"-march=armv7-a", the crtbegin.o for this architecture version contains
relocations which cannot be used in a shared object, so the check fails.

If no cpu or architecture is specified, the default cpu will be arm7tdmi.
The check will pass, as crtbegin.o for this version doesn't contain any
relocations that are disallowed in a shared object.

To make this "shared" target selector work for the arm baremetal toolchain, the
correct way is to use different startup files for shared and non-shared
links.

If the toolchain doesn't support the "-shared" option, and someone attempts to
use it to create a shared object, it will complain that "crtbeginS.o" cannot be
found.


I have run validations with your patch, and noticed regressions on arm-none-eabi
when using default cpu or --with-cpu=cortex-m3:

   - PASS now FAIL [PASS => FAIL]:

   gcc.dg/lto/pr54709 c_lto_pr54709_0.o-c_lto_pr54709_1.o link,  -fPIC
-fvisibility=hidden -flto
   gcc.dg/lto/pr61526 c_lto_pr61526_0.o-c_lto_pr61526_1.o link,  -fPIC
-flto -flto-partition=1to1
   gcc.dg/lto/pr64415 c_lto_pr64415_0.o-c_lto_pr64415_1.o link,  -O -flto -fpic

on the same configurations, I've noticed these improvements:

   g++.dg/ipa/devirt-28a.C  -std=gnu++11 (test for excess errors)
   g++.dg/ipa/devirt-28a.C  -std=gnu++14 (test for excess errors)
   g++.dg/ipa/devirt-28a.C  -std=gnu++98 (test for excess errors)
are now unsupported rather than fail.

Why is it different when the toolchain is configured --with-cpu=cortex-a9
for instance? Are the tests involving -shared already skipped in this case?


A full history of discussion is here:
https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00322.html


Regards,
Renlin



On 12/01/17 11:47, kugan wrote:


Hi,

On 16/06/16 21:04, Renlin Li wrote:


  /* Now we define the strings used to build the spec file.  */
-#define UNKNOWN_ELF_STARTFILE_SPEC" crti%O%s crtbegin%O%s crt0%O%s"
+#define UNKNOWN_ELF_STARTFILE_SPEC\
+  "crti%O%s \
+  %{!shared:crtbegin%O%s} %{shared:crtbeginS%O%s} \
+  crt0%O%s"



Some targets seem to use shared|pie. When you change it, shouldn't it
also cover pie?

Thanks,
Kugan


Re: [PATCH][ARM]Use different startfile and endfile for elf target when generating shared object.

2017-01-12 Thread Renlin Li

Hi Kugan,

Some of the targets do include pie, and use the same crtbegin file as for a shared
object.
For example, alpha/elf.h

And there are targets which don't do that.
For example, sh/elf.h

Most of the elf targets seem to consider only the simple case.

The purpose of this patch is to make it possible to correctly check whether the current
toolchain supports shared objects.

The current dejagnu target selector "shared" tries to compile a simple source file with the
"-shared -fpic" options to check whether "-shared" is supported.

For arm baremetal targets, this is not sufficient.

arm-none-eabi is built with multilib. When this test program is compiled with
"-march=armv7-a", the crtbegin.o for this architecture version contains
relocations which cannot be used in a shared object, so the check fails.

If no cpu or architecture is specified, the default cpu will be arm7tdmi.
The check will pass, as crtbegin.o for this version doesn't contain any
relocations that are disallowed in a shared object.

To make this "shared" target selector work for the arm baremetal toolchain, the correct way is
to use different startup files for shared and non-shared links.

If the toolchain doesn't support the "-shared" option, and someone attempts to use
it to create a shared object, it will complain that "crtbeginS.o" cannot be
found.

A full history of discussion is here:
https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00322.html


Regards,
Renlin


On 12/01/17 11:47, kugan wrote:

Hi,

On 16/06/16 21:04, Renlin Li wrote:

 /* Now we define the strings used to build the spec file.  */
-#define UNKNOWN_ELF_STARTFILE_SPEC" crti%O%s crtbegin%O%s crt0%O%s"
+#define UNKNOWN_ELF_STARTFILE_SPEC\
+  "crti%O%s \
+  %{!shared:crtbegin%O%s} %{shared:crtbeginS%O%s} \
+  crt0%O%s"


Some targets seem to use shared|pie. When you change it, shouldn't it also
cover pie?

Thanks,
Kugan


Re: [PING][PATCH][ARM]Use different startfile and endfile for elf target when generating shared object.

2017-01-12 Thread Renlin Li

~ Ping

https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01227.html

Regards,
Renlin

On 14/12/16 15:33, Renlin Li wrote:

Ping~

Regards,
Renlin

On 16/06/16 12:04, Renlin Li wrote:

Hi all,

GCC has startfile and endfile spec strings built into it.
startfile specifies the object files to include at the start of the link
process, while endfile specifies the object files to include at the end
of the link process.

crtbegin.o is one of the object files specified by the startfile spec string. IIUC,
crtbeginS.o should be used in place of crtbegin.o when generating shared
objects.
The same applies to crtend.o, which is one of the endfile objects: crtendS.o should be
used when generating shared objects.

This patch makes the change to use different crtbegin and crtend files when
creating shared and static objects for the elf toolchain. The linux toolchain already
makes this distinction.

So when the toolchain doesn't support shared objects, the following error
message will be produced:
ld: cannot find crtbeginS.o: No such file or directory

Still, the spec strings built into GCC can be overridden by using the
-specs= command-line switch to specify a spec file.

arm-none-eabi regression tested without new issues. OK for trunk?

Regards,
Renlin Li

gcc/ChangeLog:

2016-06-16  Renlin Li  <renlin...@arm.com>

 * config/arm/unknown-elf.h (UNKNOWN_ELF_STARTFILE_SPEC): Use
 crtbeginS.o for shared object.
 (UNKNOWN_ELF_ENDFILE_SPEC): Use crtendS.o for shared object.


[PING][PATCH][ARM]Use different startfile and endfile for elf target when generating shared object.

2016-12-14 Thread Renlin Li

Ping~

Regards,
Renlin

On 16/06/16 12:04, Renlin Li wrote:

Hi all,

GCC has startfile and endfile spec strings built into it.
startfile specifies the object files to include at the start of the link
process, while endfile specifies the object files to include at the end
of the link process.

crtbegin.o is one of the object files specified by the startfile spec string. IIUC,
crtbeginS.o should be used in place of crtbegin.o when generating shared
objects.
The same applies to crtend.o, which is one of the endfile objects: crtendS.o should be
used when generating shared objects.

This patch makes the change to use different crtbegin and crtend files when
creating shared and static objects for the elf toolchain. The linux toolchain already
makes this distinction.

So when the toolchain doesn't support shared objects, the following error
message will be produced:
ld: cannot find crtbeginS.o: No such file or directory

Still, the spec strings built into GCC can be overridden by using the
-specs= command-line switch to specify a spec file.

arm-none-eabi regression tested without new issues. OK for trunk?

Regards,
Renlin Li

gcc/ChangeLog:

2016-06-16  Renlin Li  <renlin...@arm.com>

 * config/arm/unknown-elf.h (UNKNOWN_ELF_STARTFILE_SPEC): Use
 crtbeginS.o for shared object.
 (UNKNOWN_ELF_ENDFILE_SPEC): Use crtendS.o for shared object.
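
For illustration only, the effect of a spec fragment such as
"%{!shared:crtbegin%O%s} %{shared:crtbeginS%O%s}" can be modelled in plain C.
This is a sketch and not GCC's spec machinery; the enum link_kind and the
functions startfile_for and endfile_for are hypothetical names introduced here.

```c
#include <assert.h>

/* Hypothetical model of the link being performed.  */
enum link_kind { LINK_EXECUTABLE, LINK_SHARED };

/* Models "%{!shared:crtbegin%O%s} %{shared:crtbeginS%O%s}": the crtbegin
   variant depends on whether -shared was given.  */
static const char *
startfile_for (enum link_kind kind)
{
  assert (kind == LINK_EXECUTABLE || kind == LINK_SHARED);
  return kind == LINK_SHARED ? "crtbeginS.o" : "crtbegin.o";
}

/* Same idea for the endfile spec: crtendS.o when linking a shared object.  */
static const char *
endfile_for (enum link_kind kind)
{
  assert (kind == LINK_EXECUTABLE || kind == LINK_SHARED);
  return kind == LINK_SHARED ? "crtendS.o" : "crtend.o";
}
```

With this selection in place, a toolchain that ships no crtbeginS.o fails at
link time with "cannot find crtbeginS.o", which is what lets the "shared"
target check give a reliable answer.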


[PATCH][AARCH64]Simplify call, call_value, sibcall, sibcall_value patterns.

2016-12-01 Thread Renlin Li

Hi all,

This patch refactors the code used in call, call_value, sibcall,
sibcall_value expanders.

Before the change, the logic is as follows:

call expander          --> call_internal          --> call_reg/call_symbol
call_value expander    --> call_value_internal    --> call_value_reg/call_value_symbol

sibcall expander       --> sibcall_internal       --> sibcall_insn
sibcall_value expander --> sibcall_value_internal --> sibcall_value_insn

After the change, the logic is simplified into:

call expander          --> aarch64_expand_call() --> call_insn
call_value expander    --> aarch64_expand_call() --> call_value_insn

sibcall expander       --> aarch64_expand_call() --> sibcall_insn
sibcall_value expander --> aarch64_expand_call() --> sibcall_value_insn

The code is factored out from each expander into aarch64_expand_call ().
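
As a rough illustration of the decision aarch64_expand_call makes before
emitting the call (the nested conditional in the patch), the following sketch
models it outside GCC. The enum callee_kind and the function must_force_reg are
hypothetical; only the shape of the condition is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the rtx code of the callee.  */
enum callee_kind { CALLEE_SYMBOL_REF, CALLEE_REG, CALLEE_OTHER };

/* Mirrors the condition in the patch: a SYMBOL_REF is forced into a
   register only for long calls or no-PLT calls; any other callee is
   forced unless it already is a register.  */
static bool
must_force_reg (enum callee_kind kind, bool long_call, bool noplt_call)
{
  assert (kind == CALLEE_SYMBOL_REF || kind == CALLEE_REG
	  || kind == CALLEE_OTHER);
  return kind == CALLEE_SYMBOL_REF
	 ? (long_call || noplt_call)
	 : kind != CALLEE_REG;
}
```

So a plain symbol is called directly, while a long-call or no-PLT symbol, or
any address that is not already a register, goes through a register first.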

This also fixes the two issues Richard Henderson raises in comment 8:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64971

aarch64-none-elf regression test Okay, aarch64-linux bootstrap Okay.
Okay for trunk?

Regards,
Renlin Li


gcc/ChangeLog:

2016-12-01  Renlin Li  <renlin...@arm.com>

* config/aarch64/aarch64-protos.h (aarch64_expand_call): Declare.
* config/aarch64/aarch64.c (aarch64_expand_call): Define.
* config/aarch64/constraints.md (Usf): Add long call check.
* config/aarch64/aarch64.md (call): Use aarch64_expand_call.
(call_value): Likewise.
(sibcall): Likewise.
(sibcall_value): Likewise.
(call_insn): New.
(call_value_insn): New.
(sibcall_insn): Update rtx pattern.
(sibcall_value_insn): Likewise.
(call_internal): Remove.
(call_value_internal): Likewise.
(sibcall_internal): Likewise.
(sibcall_value_internal): Likewise.
(call_reg): Likewise.
(call_symbol): Likewise.
(call_value_reg): Likewise.
(call_value_symbol): Likewise.

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 7f67f14..3a5babb 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -305,6 +305,7 @@ bool aarch64_const_vec_all_same_int_p (rtx, HOST_WIDE_INT);
 bool aarch64_constant_address_p (rtx);
 bool aarch64_emit_approx_div (rtx, rtx, rtx);
 bool aarch64_emit_approx_sqrt (rtx, rtx, bool);
+void aarch64_expand_call (rtx, rtx, bool);
 bool aarch64_expand_movmem (rtx *);
 bool aarch64_float_const_zero_rtx_p (rtx);
 bool aarch64_function_arg_regno_p (unsigned);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 68a3380..c313cf5 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4343,6 +4343,51 @@ aarch64_fixed_condition_code_regs (unsigned int *p1, unsigned int *p2)
   return true;
 }
 
+/* This function is used by the call expanders of the machine description.
+   RESULT is the register in which the result is returned.  It's NULL for
+   "call" and "sibcall".
+   MEM is the location of the function call.
+   SIBCALL indicates whether this function call is normal call or sibling call.
+   It will generate different pattern accordingly.  */
+
+void
+aarch64_expand_call (rtx result, rtx mem, bool sibcall)
+{
+  rtx call, callee, tmp;
+  rtvec vec;
+  machine_mode mode;
+
+  gcc_assert (MEM_P (mem));
+  callee = XEXP (mem, 0);
+  mode = GET_MODE (callee);
+  gcc_assert (mode == Pmode);
+
+  /* Decide if we should generate indirect calls by loading the
+ 64-bit address of the callee into a register before performing
+ the branch-and-link.  */
+
+  if (GET_CODE (callee) == SYMBOL_REF
+  ? (aarch64_is_long_call_p (callee)
+	 || aarch64_is_noplt_call_p (callee))
+  : !REG_P (callee))
+  XEXP (mem, 0) = force_reg (mode, callee);
+
+  call = gen_rtx_CALL (VOIDmode, mem, const0_rtx);
+
+  if (result != NULL_RTX)
+call = gen_rtx_SET (result, call);
+
+  if (sibcall)
+tmp = ret_rtx;
+  else
+tmp = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (Pmode, LR_REGNUM));
+
+  vec = gen_rtvec (2, call, tmp);
+  call = gen_rtx_PARALLEL (VOIDmode, vec);
+
+  aarch64_emit_call_insn (call);
+}
+
 /* Emit call insn with PAT and do aarch64-specific handling.  */
 
 void
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index bc6d8a2..5682686 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -718,12 +718,6 @@
 ;; Subroutine calls and sibcalls
 ;; ---
 
-(define_expand "call_internal"
-  [(parallel [(call (match_operand 0 "memory_operand" "")
-		(match_operand 1 "general_operand" ""))
-	  (use (match_operand 2 "" ""))
-	  (clobber (reg:DI LR_REGNUM))])])
-
 (define_expand "call"
   [(parallel [(call (match_operand 0 "memory_operan

Re: [PATCH][AARCH64]Skip gcc.target/aarch64/pr66912.c in tiny or large memory model.

2016-10-27 Thread Renlin Li



On 27/10/16 16:28, Andrew Pinski wrote:

On Thu, Oct 27, 2016 at 4:24 AM, Renlin Li <renlin...@foss.arm.com> wrote:

Hi,

On 27/10/16 11:48, Szabolcs Nagy wrote:


On 27/10/16 11:25, Renlin Li wrote:


Hi all,

This is a simple patch to fix gcc.target/aarch64/pr66912.c.
It's a test case only applicable to the small memory model, which is the
default one.



/* { dg-final { scan-assembler ":got(page_lo15)?:n_common" } } */

i think this is supposed to work on tiny and small model as well.
(:got:var vs :gotpage_lo15:var)


Sorry, I wasn't aware it's a regex which will match both.


it will have to be updated for large model when we have support for that.



yes, the large memory model will have a different relocation for this case, which
will not be caught by this pattern.



It also fails for ILP32.  I have not looked into the assembler output there yet.


Hi Andrew,

For ILP32, the relocation will be R_AARCH64_LD32_GOTPAGE_LO14 in small memory 
model.
So the string modifier would be "gotpage_lo14"
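
To make the matching behaviour concrete, here is a small sketch that runs the
scan-assembler pattern against the three modifiers discussed above. It uses
POSIX extended regular expressions as a stand-in for the Tcl regexps dejagnu
actually uses (for a pattern this simple I would expect the same result, but
that is an assumption), and the helper name matches is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if TEXT contains a match for the extended regex PATTERN.  */
static bool
matches (const char *pattern, const char *text)
{
  regex_t re;
  int rc = regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB);
  assert (rc == 0);	/* The pattern is expected to compile.  */
  bool found = regexec (&re, text, 0, NULL, 0) == 0;
  regfree (&re);
  return found;
}
```

With the pattern ":got(page_lo15)?:n_common", both ":got:n_common" and
":gotpage_lo15:n_common" match, while ":gotpage_lo14:n_common" does not, which
is consistent with the test failing for ILP32.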

Regards,
Renlin


Thanks,
Andrew



Regards,
Renlin





It has been tested to run only when the memory model is small.
Okay to commit?

Regards,
Renlin Li

gcc/testsuite/ChangeLog:

2016-10-27  Renlin Li  <renlin...@arm.com>

  * gcc.target/aarch64/pr66912.c: Skip tiny and large memory model.







Re: [PATCH][AARCH64]Skip gcc.target/aarch64/pr66912.c in tiny or large memory model.

2016-10-27 Thread Renlin Li

Hi,

On 27/10/16 11:48, Szabolcs Nagy wrote:

On 27/10/16 11:25, Renlin Li wrote:

Hi all,

This is a simple patch to fix gcc.target/aarch64/pr66912.c.
It's a test case only applicable to the small memory model, which is the default
one.



   /* { dg-final { scan-assembler ":got(page_lo15)?:n_common" } } */

i think this is supposed to work on tiny and small model as well.
(:got:var vs :gotpage_lo15:var)


Sorry, I wasn't aware it's a regex which will match both.


it will have to be updated for large model when we have support for that.


yes, the large memory model will have a different relocation for this case, which will
not be caught by this pattern.

Regards,
Renlin




It has been tested to run only when the memory model is small.
Okay to commit?

Regards,
Renlin Li

gcc/testsuite/ChangeLog:

2016-10-27  Renlin Li  <renlin...@arm.com>

 * gcc.target/aarch64/pr66912.c: Skip tiny and large memory model.




[PATCH][AARCH64]Skip gcc.target/aarch64/pr66912.c in tiny or large memory model.

2016-10-27 Thread Renlin Li

Hi all,

This is a simple patch to fix gcc.target/aarch64/pr66912.c.
It's a test case only applicable to the small memory model, which is the default
one.

It has been tested to run only when the memory model is small.
Okay to commit?

Regards,
Renlin Li

gcc/testsuite/ChangeLog:

2016-10-27  Renlin Li  <renlin...@arm.com>

* gcc.target/aarch64/pr66912.c: Skip tiny and large memory model.
commit 364538b449d62c9a411b31021bdd9f355d36edf1
Author: Renlin Li <renlin...@arm.com>
Date:   Wed Jan 6 14:00:16 2016 +

fix pr66912.c

diff --git a/gcc/testsuite/gcc.target/aarch64/pr66912.c b/gcc/testsuite/gcc.target/aarch64/pr66912.c
index b8aabcd..be07641 100644
--- a/gcc/testsuite/gcc.target/aarch64/pr66912.c
+++ b/gcc/testsuite/gcc.target/aarch64/pr66912.c
@@ -1,5 +1,7 @@
 /* { dg-do compile { target *-*-linux* } } */
+/* { dg-require-effective-target aarch64_small_fpic } */
 /* { dg-options "-O2 -fpic" } */
+/* { dg-skip-if "small memory model" { aarch64*-*-* }  { "-mcmodel=tiny" "-mcmodel=large" } { "" } } */
 
 __attribute__((visibility("protected")))
 int n_common;


[RFC][IRA]Initialize ira_use_lra_p early by moving the initialization into ira_init_once ().

2016-09-21 Thread Renlin Li

Hi,

ira_use_lra_p is a global variable used in IRA as well as in
backend_init_target ().
It's fine to use it in IRA, as it will be initialized at the beginning
of the IRA pass.


However, early in backend_init_target (), this variable may not be 
initialized yet. There is a check in backend_init_target ():


'''
if (!ira_use_lra_p)
  init_reload ();
'''

In this case, init_reload () will always be called if ira_use_lra_p is 
not initialized.


ira_init_once () is a better place for the initialization.
It's called early in initialize_rtl (), just before
backend_init_target ().
And as the name suggests, it's called once to initialize function-independent
data structures.
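
The ordering problem can be shown with a toy model. Everything below is a
stand-in written for this mail, not the real GCC code: target_lra_p is assumed
to return true (an LRA target), and reload_inits merely counts how often the
init_reload stand-in would run.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins; none of these are the real GCC definitions.  */
static bool ira_use_lra_p;	/* Zero-initialized, i.e. false, until set.  */
static int reload_inits;	/* Counts calls to the init_reload stand-in.  */

static bool target_lra_p (void) { return true; }  /* Assume an LRA target.  */

static void
backend_init_target (void)
{
  if (!ira_use_lra_p)
    reload_inits++;		/* Stands in for init_reload ().  */
}

/* Run the start-up sequence.  INIT_EARLY selects the proposed ordering,
   where the flag is set before backend_init_target runs.  */
static int
run_startup (bool init_early)
{
  ira_use_lra_p = false;
  reload_inits = 0;
  if (init_early)
    ira_use_lra_p = target_lra_p ();	/* As in the patched ira_init_once.  */
  backend_init_target ();
  if (!init_early)
    ira_use_lra_p = target_lra_p ();	/* As at the start of the ira pass.  */
  assert (ira_use_lra_p == target_lra_p ());
  return reload_inits;
}
```

In this model the old ordering runs the reload initialization once even though
the target uses LRA, and the new ordering skips it.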


aarch64-none-elf regression test Okay, x86-64-linux bootstrap Okay.

Regards,
Renlin

gcc/ChangeLog:

2016-09-21  Renlin Li  <renlin...@arm.com>

* ira.c (ira): Move ira_use_lra_p initialization code to ...
(ira_init_once): Here.
diff --git a/gcc/ira.c b/gcc/ira.c
index f8a59e3..9e7ba52 100644
--- a/gcc/ira.c
+++ b/gcc/ira.c
@@ -1665,6 +1665,8 @@ ira_init_once (void)
 {
   ira_init_costs_once ();
   lra_init_once ();
+
+  ira_use_lra_p = targetm.lra_p ();
 }
 
 /* Free ira_max_register_move_cost, ira_may_move_in_cost and
@@ -5082,7 +5084,6 @@ ira (FILE *f)
 
   ira_conflicts_p = optimize > 0;
 
-  ira_use_lra_p = targetm.lra_p ();
   /* If there are too many pseudos and/or basic blocks (e.g. 10K
  pseudos and 10K blocks or 100K pseudos and 1K blocks), we will
  use simplified and faster algorithms in LRA.  */


Re: [PATCH][COMMITTED] Revert r238497 because of PR 71961.

2016-08-05 Thread Renlin Li

Hi Joost,

I am not familiar with fortran code.
Maybe Thomas can do something in his new patch?

Regards,
Renlin

On 28/07/16 12:34, VandeVondele  Joost wrote:

Thanks.. I wonder if you could add the testcase in

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71961#c11

to the testsuite, as it catches the underlying issue.

Regards,

Joost VandeVondele



[PATCH][PR64971]Convert function pointer to Pmode when emit call

2016-08-04 Thread Renlin Li

Hi all,

In the case of PR64971 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64971),
the compiler ICEs when compiling gcc.c-torture/compile/pr37433.c with the
ilp32 abi.


As we know, in aarch64 ilp32, ptr_mode is SImode while Pmode is
still DImode. It means all addresses should be DImode, and the backend
defines the patterns with this assumption.


The generic expand_expr_addr_expr () function, however, generates a
SYMBOL_REF with SImode, which is later used as the address of a MEM rtx
pattern in a call expression. There is no matching pattern for this
SImode address, which is why gcc ICEs.

(symbol_ref/f:SI ("*.LC0") [flags 0x82] )

But here, I think what expand_expr_addr_expr does is correct. In this
particular case, expand_expr_addr_expr is not generating an address. 
According to the source code, it's generating a function pointer, and 
later this pointer is used in a call expression. So SImode should be 
right in this case.


The test case gets the address of a piece of memory,
casts it into a function pointer, and calls the function. IIUC, the flow
is like this:

CALL_EXPR ( NOP_EXPR (ADDR_EXPR ()))

NOP_EXPR here converts the address into a function pointer, which
should be ptr_mode (SImode). So it's the responsibility of the call expander
to convert the pointer into Pmode to create a legal call rtx pattern.

In the test case, there are two functions. The first function generates
function calls with a SYMBOL_REF as address, the second one generates a
REG as address. They are both of ptr_mode.
However, prepare_call_address () converts the REG into Pmode to make
it a legal address, while the SYMBOL_REF case is missed. That's why I add the
code there.
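
A minimal sketch of the conversion step, assuming a 32-bit ptr_mode, a 64-bit
Pmode and zero-extension of pointers. The types toy_mode and toy_addr and the
function to_address_mode are made up for illustration; the real code works on
rtx and uses convert_memory_address.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-ins for machine modes; not the real GCC enum.  */
enum toy_mode { TOY_SImode = 32, TOY_DImode = 64 };

struct toy_addr
{
  enum toy_mode mode;
  uint64_t value;
};

/* Model of the added step: an address that is not already in the 64-bit
   address mode is widened to it, whatever kind of expression it is.
   Zero-extension is assumed here.  */
static struct toy_addr
to_address_mode (struct toy_addr funexp)
{
  if (funexp.mode != TOY_DImode)
    {
      assert (funexp.mode == TOY_SImode);
      /* Keep the low 32 bits, zero-extended to 64 bits.  */
      funexp.value = (uint32_t) funexp.value;
      funexp.mode = TOY_DImode;
    }
  return funexp;
}
```

The point is that the check is on the mode only, so a SYMBOL_REF and a REG are
treated the same way.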


And I want to change PR64971 into a middle-end issue. The ICE
manifests on the aarch64 target, but I believe this should be a generic
problem for targets which define ptr_mode different from Pmode.


There is a test case already, so I didn't add one.
aarch64-none-elf regression test Okay, aarch64-linux bootstrap Okay.
But I believe this may not help as the default abi is LP64.

Andrew, it would be great if you could help run the regression tests in your
aarch64 ilp32 environment.

And I double checked that the backend fix can be removed without any
problem. It's good to expose middle-end bugs.

Okay for trunk and backport to branch 6?

gcc/ChangeLog:

2016-08-04  Renlin Li  <renlin...@arm.com>

PR middle-end/64971
* calls.c (prepare_call_address): Convert funexp to Pmode when
necessary.
* config/aarch64/aarch64.md (sibcall): Remove fix for PR 64971.
(sibcall_value): Likewise.
diff --git a/gcc/calls.c b/gcc/calls.c
index c04d00f..b00c153 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -194,10 +194,19 @@ prepare_call_address (tree fndecl_or_type, rtx funexp, rtx static_chain_value,
 	   && targetm.small_register_classes_for_mode_p (FUNCTION_MODE))
 	  ? force_not_mem (memory_address (FUNCTION_MODE, funexp))
 	  : memory_address (FUNCTION_MODE, funexp));
-  else if (! sibcallp)
+  else
 {
-  if (!NO_FUNCTION_CSE && optimize && ! flag_no_function_cse)
-	funexp = force_reg (Pmode, funexp);
+  /* funexp could be a SYMBOL_REF represents a function pointer which is
+	 of ptr_mode.  In this case, it should be converted into address mode
+	 to be a valid address for memory rtx pattern.  See PR 64971.  */
+  if (GET_MODE (funexp) != Pmode)
+	funexp = convert_memory_address (Pmode, funexp);
+
+  if (! sibcallp)
+	{
+	  if (!NO_FUNCTION_CSE && optimize && ! flag_no_function_cse)
+	funexp = force_reg (Pmode, funexp);
+	}
 }
 
   if (static_chain_value != 0
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index f15dd8d..c95258b 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -859,13 +859,6 @@
 	   || aarch64_is_noplt_call_p (callee)))
   XEXP (operands[0], 0) = force_reg (Pmode, callee);
 
-/* FIXME: This is a band-aid.  Need to analyze why expand_expr_addr_expr
-   is generating an SImode symbol reference.  See PR 64971.  */
-if (TARGET_ILP32
-	&& GET_CODE (XEXP (operands[0], 0)) == SYMBOL_REF
-	&& GET_MODE (XEXP (operands[0], 0)) == SImode)
-  XEXP (operands[0], 0) = convert_memory_address (Pmode,
-		  XEXP (operands[0], 0));
 if (operands[2] == NULL_RTX)
   operands[2] = const0_rtx;
 
@@ -897,14 +890,6 @@
 	   || aarch64_is_noplt_call_p (callee)))
   XEXP (operands[1], 0) = force_reg (Pmode, callee);
 
-/* FIXME: This is a band-aid.  Need to analyze why expand_expr_addr_expr
-   is generating an SImode symbol reference.  See PR 64971.  */
-if (TARGET_ILP32
-	&& GET_CODE (XEXP (operands[1], 0)) == SYMBOL_REF
-	&& GET_MODE (XEXP (operands[1], 0)) == SImode)
-  XEXP (operands[1], 0) = convert_memory_address (Pmode,
-		  XEXP (operands[1], 0));
-
 if (operands[3] == NULL_RTX)
   operands[3] = const0_rtx;
 


[PATCH][COMMITTED] Revert r238497 because of PR 71961.

2016-07-28 Thread Renlin Li

Hi all,

This patch reverts the change for PR 71902, since it causes 178.galgel
to be miscompiled in spec2000, as reported in PR 71961, which was observed on
x86_64, aarch64 and powerpc64.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71961

As a consequence, I will reopen PR 71902: 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71902


Regards,
Renlin Li


gcc/fortran/ChangeLog:
2016-07-28  Renlin Li  <renlin...@arm.com>

Revert
2016-07-19  Thomas Koenig  <tkoe...@gcc.gnu.org>

PR fortran/71902
* dependency.c (gfc_check_dependency): Use dep_ref.  Handle case
if identical is true and two array element references differ.
(gfc_dep_resovler):  Move most of the code to dep_ref.
(dep_ref):  New function.
* frontend-passes.c (realloc_string_callback):  Name temporary
variable "realloc_string".

gcc/testsuite/ChangeLog:

2016-07-28  Renlin Li  <renlin...@arm.com>

Revert
2016-07-19  Thomas Koenig  <tkoe...@gcc.gnu.org>

PR fortran/71902
* gfortran.dg/dependency_47.f90:  New test.
diff --git a/gcc/fortran/dependency.c b/gcc/fortran/dependency.c
index a873dbe..f117de0 100644
--- a/gcc/fortran/dependency.c
+++ b/gcc/fortran/dependency.c
@@ -54,8 +54,6 @@ enum gfc_dependency
 static gfc_dependency check_section_vs_section (gfc_array_ref *,
 		gfc_array_ref *, int);
 
-static gfc_dependency dep_ref (gfc_ref *, gfc_ref *, gfc_reverse *);
-
 /* Returns 1 if the expr is an integer constant value 1, 0 if it is not or
def if the value could not be determined.  */
 
@@ -1318,33 +1316,13 @@ gfc_check_dependency (gfc_expr *expr1, gfc_expr *expr2, bool identical)
 	  return 0;
 	}
 
+  if (identical)
+	return 1;
+
   /* Identical and disjoint ranges return 0,
 	 overlapping ranges return 1.  */
   if (expr1->ref && expr2->ref)
-	{
-	  gfc_dependency dep;
-	  dep = dep_ref (expr1->ref, expr2->ref, NULL);
-	  switch (dep)
-	{
-	case GFC_DEP_EQUAL:
-	  return identical;
-
-	case GFC_DEP_FORWARD:
-	  return 0;
-
-	case GFC_DEP_BACKWARD:
-	  return 1;
-
-	case GFC_DEP_OVERLAP:
-	  return 1;
-
-	case GFC_DEP_NODEP:
-	  return 0;
-
-	default:
-	  gcc_unreachable();
-	}
-	}
+	return gfc_dep_resolver (expr1->ref, expr2->ref, NULL);
 
   return 1;
 
@@ -2074,39 +2052,11 @@ ref_same_as_full_array (gfc_ref *full_ref, gfc_ref *ref)
	2 : array references are overlapping but reversal of one or
 	more dimensions will clear the dependency.
	1 : array references are overlapping.
-   	0 : array references are identical or can be handled in a forward loop.  */
+   	0 : array references are identical or not overlapping.  */
 
 int
 gfc_dep_resolver (gfc_ref *lref, gfc_ref *rref, gfc_reverse *reverse)
 {
-  enum gfc_dependency dep;
-  dep = dep_ref (lref, rref, reverse);
-  switch (dep)
-{
-case GFC_DEP_EQUAL:
-  return 0;
-
-case GFC_DEP_FORWARD:
-  return 0;
-
-case GFC_DEP_BACKWARD:
-  return 2;
-
-case GFC_DEP_OVERLAP:
-  return 1;
-
-case GFC_DEP_NODEP:
-  return 0;
-
-default:
-  gcc_unreachable();
-}
-}
-
-
-static gfc_dependency
-dep_ref (gfc_ref *lref, gfc_ref *rref, gfc_reverse *reverse)
-{
   int n;
   int m;
   gfc_dependency fin_dep;
@@ -2129,22 +2079,21 @@ dep_ref (gfc_ref *lref, gfc_ref *rref, gfc_reverse *reverse)
 	  /* The two ranges can't overlap if they are from different
 	 components.  */
 	  if (lref->u.c.component != rref->u.c.component)
-	return GFC_DEP_NODEP;
+	return 0;
 	  break;
 
 	case REF_SUBSTRING:
 	  /* Substring overlaps are handled by the string assignment code
 	 if there is not an underlying dependency.  */
-
-	  return fin_dep == GFC_DEP_ERROR ? GFC_DEP_NODEP : fin_dep;
+	  return (fin_dep == GFC_DEP_OVERLAP) ? 1 : 0;
 
 	case REF_ARRAY:
 
 	  if (ref_same_as_full_array (lref, rref))
-	return GFC_DEP_EQUAL;
+	return 0;
 
 	  if (ref_same_as_full_array (rref, lref))
-	return GFC_DEP_EQUAL;
+	return 0;
 
 	  if (lref->u.ar.dimen != rref->u.ar.dimen)
 	{
@@ -2155,7 +2104,7 @@ dep_ref (gfc_ref *lref, gfc_ref *rref, gfc_reverse *reverse)
 		fin_dep = gfc_full_array_ref_p (lref, NULL) ? GFC_DEP_EQUAL
 			: GFC_DEP_OVERLAP;
 	  else
-		return GFC_DEP_OVERLAP;
+		return 1;
 	  break;
 	}
 
@@ -2199,7 +2148,7 @@ dep_ref (gfc_ref *lref, gfc_ref *rref, gfc_reverse *reverse)
 
 	  /* If any dimension doesn't overlap, we have no dependency.  */
 	  if (this_dep == GFC_DEP_NODEP)
-		return GFC_DEP_NODEP;
+		return 0;
 
 	  /* Now deal with the loop reversal logic:  This only works on
 		 ranges and is activated by setting
@@ -2266,7 +2215,7 @@ dep_ref (gfc_ref *lref, gfc_ref *rref, gfc_reverse *reverse)
 	  /* Exactly matching and forward overlapping ranges don't cause a
 	 dependency.  */
 	  if (fin_dep < GFC_DEP_BACKWARD)
-	  

Re: C++ PATCH for c++/71913 (copy elision choices)

2016-07-25 Thread Renlin Li

Hi Jason,

On 22/07/16 04:01, Jason Merrill wrote:

71913 is a case where unsafe_copy_elision_p was being too
conservative. We can allow copy elision in a new expression; the only
way we could end up initializing a base subobject without knowing it
would be through a placement new, in which case we would already be
using the wrong (complete object) constructor, so copy elision doesn't
make it any worse.




diff --git a/gcc/testsuite/g++.dg/init/elide5.C 
b/gcc/testsuite/g++.dg/init/elide5.C
new file mode 100644
index 000..0a9978c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/init/elide5.C
@@ -0,0 +1,27 @@
+// PR c++/71913
+// { dg-do link { target c++11 } }
+
+void* operator new(unsigned long, void* p) { return p; }


g++.dg/init/elide5.C fails on targets whose SIZE_TYPE is not "long
unsigned int".


testsuite/g++.dg/init/elide5.C:4:42: error: 'operator new' takes type 
'size_t' ('unsigned int') as first parameter [-fpermissive]


I have checked: for most 32-bit architectures or ABIs, the SIZE_TYPE is
"unsigned int". arm is one of them.


To make this test case portable, would __SIZE_TYPE__ be better in this
case, instead of "unsigned long", as the first argument of operator new?
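
As a quick way to see the difference, the sketch below (plain C11, GCC or a
compatible compiler assumed, since __SIZE_TYPE__ is a predefined macro of the
compiler) checks which spelling agrees with size_t on the current target. The
typedef and function names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* __SIZE_TYPE__ is predefined by GCC (and compatible compilers) as the
   type underlying size_t on the current target.  */
typedef __SIZE_TYPE__ target_size_t;

static_assert (sizeof (target_size_t) == sizeof (size_t),
	       "__SIZE_TYPE__ and size_t have the same width");

/* 1 if target_size_t and size_t are the same type, 0 otherwise.  */
static int
size_type_matches (void)
{
  return _Generic ((target_size_t) 0, size_t: 1, default: 0);
}

/* 1 if a hard-coded "unsigned long" happens to be size_t on this target.  */
static int
unsigned_long_is_size_t (void)
{
  return _Generic ((unsigned long) 0, size_t: 1, default: 0);
}
```

size_type_matches should hold on every target, while
unsigned_long_is_size_t depends on the ABI, which is the portability problem
with the hard-coded type in the test.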


(sorry for the duplicate reply in the bugzilla, I just found the email here)

Regards,
Renlin


Re: [PATCH] correct atomic_compare_exchange_n return type (c++/71675)

2016-07-25 Thread Renlin Li

Hi Martin,

I observed the following error:

ERROR: gcc.dg/atomic/pr71675.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects : syntax error in target selector "target c11" for 
" dg-do 3 compile { target c11 } "


It seems we don't have a c11 effective target check available
in dejagnu target-supports.exp.

Thanks,
Renlin


diff --git a/gcc/testsuite/gcc.dg/atomic/pr71675.c 
b/gcc/testsuite/gcc.dg/atomic/pr71675.c
new file mode 100644
index 000..0e344ac
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/atomic/pr71675.c
@@ -0,0 +1,32 @@
+/* PR c++/71675 - __atomic_compare_exchange_n returns wrong type for typed enum
+ */
+/* { dg-do compile { target c11 } } */


Re: [PATCH PR71734] Add missed check that reference defined inside loop.

2016-07-19 Thread Renlin Li

Hi Yuri,

I see this test case being run on arm platforms, and maybe on other platforms as
well.


testsuite/g++.dg/vect/pr70729.cc:7:10: fatal error: xmmintrin.h: No such 
file or directory


Before the change here, it was gated by the vect_simd_clones target selector,
which limits it to i?86/x86_64 platforms only.


Regards,
Renlin Li



On 08/07/16 15:07, Yuri Rumyantsev wrote:

Hi Richard,

Thanks for your help - your patch looks much better.
Here is a new patch in which an additional argument was added to determine
the source loop of the reference.

Bootstrap and regression testing did not show any new failures.

Is it OK for trunk?
ChangeLog:
2016-07-08  Yuri Rumyantsev  <ysrum...@gmail.com>

PR tree-optimization/71734
* tree-ssa-loop-im.c (ref_indep_loop_p_1): Add REF_LOOP argument which
contains REF, use it to check safelen, assume that safelen value
must be greater 1, fix style.
(ref_indep_loop_p_2): Add REF_LOOP argument.
(ref_indep_loop_p): Pass LOOP as additional argument to
ref_indep_loop_p_2.
gcc/testsuite/ChangeLog:
 * g++.dg/vect/pr70729.cc: Delete redundant dg options, fix style.

2016-07-08 11:18 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:

On Thu, Jul 7, 2016 at 5:38 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:

I checked simd3.f90 and found out that my additional check rejects
independence of references

REF is independent in loop#3
.istart0.19, .iend0.20
which are defined in loop#1 which is outer for loop#3.
Note that these references are defined by
_103 = __builtin_GOMP_loop_dynamic_next (&.istart0.19, &.iend0.20);
which is in loop#1.
It is clear that both these references can not be independent for loop#3.


Ok, so we end up calling ref_indep_loop for ref in LOOP also for inner loops
of LOOP to catch memory references in those as well.  So the issue is really
that we look at the wrong loop for safelen and we _do_ want to apply safelen
to inner loops as well.

So better track the loop we are ultimately asking the question for, like in the
attached patch (fixes the testcase for me).

Richard.




2016-07-07 17:11 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:

On Thu, Jul 7, 2016 at 4:04 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:

I added this check because of new failures in the libgomp.fortran suite.
Here is a copy of Jakub's message:
--- Comment #29 from Jakub Jelinek  ---
The #c27 r237844 change looks bogus to me.
First of all, IMNSHO you can argue this way only if ref is a reference seen in
loop LOOP,


or inner loops of LOOP I guess.  I _think_ we never call ref_indep_loop_p_1 with
a REF whose loop is not a sub-loop of LOOP or LOOP itself (as it would not make
sense to do that, it would be a waste of time).

So only if "or inner loops of LOOP" is not correct the check would be needed
but then my issue with unrolling an inner loop and turning a ref that safelen
does not apply to into a ref that it now applies to arises.

I don't fully get what Jakub is hinting at.

Can you install the safelen > 0 -> safelen > 1 fix please?  Jakub, can you
explain that bitmap check with a simple testcase?

Thanks,
Richard.


which is the case of e.g. *.omp_data_i_23(D).a ref in simd3.f90 -O2
-fopenmp -msse2, but not the D.3815[0] case tested during can_sm_ref_p - the
D.3815[0] = 0; as well as something = D.3815[0]; stmt found in the outer loop
obviously can be dependent on many of the loads and/or stores in the loop, be
it "omp simd array" or not.
Say for
void
foo (int *p, int *q)
{
   #pragma omp simd
   for (int i = 0; i < 1024; i++)
 p[i] += q[0];
}
sure, q[0] can't alias p[0] ... p[1022], the earlier iterations could write
something that changes its value, and then it would behave differently from
using VF = 1024, where everything is performed in parallel.
Though, actually, it can alias, just it would have to write the same value as
was there.  So, if this is used to determine if it is safe to hoist the load
before the loop, it is fine, if it is used to determine if &q[0] >= &p[0] &&
&q[0] <= &p[1023], then it is not fine.

For aliasing of q[0] and p[1023], I don't see why they couldn't alias in a
valid program.  #pragma omp simd I think guarantees that the last iteration is
executed last, it isn't necessarily executed last alone, it could be, or
together with one before last iteration, or (for simdlen INT_MAX) even all
iterations can be done concurrently, in hw or sw, so it is fine if it is
transformed into:
   int temp[1024], temp2[1024], temp3[1024];
   for (int i = 0; i < 1024; i++)
 temp[i] = p[i];
   for (int i = 0; i < 1024; i++)
 temp2[i] = q[0];
   /* The above two loops can be also swapped, or intermixed.  */
   for (int i = 0; i < 1024; i++)
 temp3[i] = temp[i] + temp2[i];
   for (int i = 0; i < 1024; i++)
 p[i] = temp3[i];
   /* Or the above loop reversed etc. */
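
The aliasing argument above can be checked with a small sketch. This is only an illustration under stated assumptions: the names N, foo_sequential, foo_all_at_once and variants_agree are made up here and do not come from GCC or the PR. The first function has the scalar semantics of the loop; the second performs every load before any store, as in the transformation above.

```cpp
#include <cstddef>
#include <vector>

// Illustrative only: N and the function names below are hypothetical.
constexpr std::size_t N = 1024;

// Scalar semantics: iteration i sees the stores of iterations 0..i-1.
void foo_sequential (int *p, const int *q)
{
  for (std::size_t i = 0; i < N; i++)
    p[i] += q[0];
}

// "VF = 1024" semantics: all loads are done before any store.
void foo_all_at_once (int *p, const int *q)
{
  std::vector<int> temp (N), temp2 (N);
  for (std::size_t i = 0; i < N; i++)
    temp[i] = p[i];
  for (std::size_t i = 0; i < N; i++)
    temp2[i] = q[0];
  for (std::size_t i = 0; i < N; i++)
    p[i] = temp[i] + temp2[i];
}

// Run both variants on identical data, with q pointing at p[alias_index],
// and report whether they leave the same contents in the array.
bool variants_agree (std::size_t alias_index)
{
  std::vector<int> a (N, 1), b (N, 1);
  foo_sequential (a.data (), a.data () + alias_index);
  foo_all_at_once (b.data (), b.data () + alias_index);
  return a == b;
}
```

With q aliasing the last element (alias_index N - 1) both variants agree, because the only store to q[0] happens in the final iteration. With q aliasing p[0] they differ, because the sequential loop reads the updated value from the second iteration on.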

If you have:
int
bar (int *p, int *q)
{
   q[0] = 0;
   #pragma omp simd
   for (int i = 0; i < 1024; i++)
  

Re: Update probabilities in predict.def to match reality

2016-06-20 Thread Renlin Li

Hi,

On 08/06/16 11:21, Andreas Schwab wrote:

Jan Hubicka <hubi...@ucw.cz> writes:


Bootstrapped/regtested x86_64-linux, will commit it later today.


FAIL: gcc.dg/tree-ssa/slsr-8.c scan-tree-dump-times optimized " w?* " 7


This fails for all arm and aarch64 targets as well since the commit.

Regards,
Renlin Li



Andreas.



[PATCH]Fix scan-tree-dump-times syntax errors in gcc.dg/tree-ssa/attr-hotcold-2.c

2016-06-20 Thread Renlin Li

Hi,

This is a simple patch to fix the syntax errors in dg-final directive 
lines within this test case.


According to the documentation, the syntax of this directive should be:
'''scan-tree-dump-times regex num suffix [{ target/xfail selector }]'''


Now the test case compiles OK in an arm environment. However, the last
two checks still seem to fail. That is a separate issue.


Okay to commit?

Regards,
Renlin

gcc/testsuite/ChangeLog:

2016-06-20  Renlin Li  <renlin...@arm.com>

* gcc.dg/tree-ssa/attr-hotcold-2.c: Fix syntax errors.

On 13/06/16 17:35, Kyrill Tkachov wrote:

Hi Honza,

On 07/06/16 20:27, Jan Hubicka wrote:

Hello,
Martin Liska measured branch predictor hitrates on current tree and




In the testsuite I'm seeing:
ERROR: gcc.dg/tree-ssa/attr-hotcold-2.c: error executing dg-final:
syntax error in target selector "profile_estimate"

on aarch64-none-elf.
I think the hunk:
-/* { dg-final { scan-ipa-dump-times "block 4, loop depth 0, count 0,
freq 1\[^0-9\]" 1 "profile_estimate" } } */
+/* { dg-final { scan-tree-dump-times 1 "hot label heuristics" 1
"profile_estimate" } } */
+/* { dg-final { scan-tree-dump-times 1 "cold label heuristics" 1
"profile_estimate" } } */
+/* { dg-final { scan-tree-dump-times "block 4, loop depth 0, count 0,
freq \[1-4\]\[^0-9\]" 1 "profile_estimate" } } */

is buggy, should it be
-/* { dg-final { scan-ipa-dump-times "block 4, loop depth 0, count 0,
freq 1\[^0-9\]" 1 "profile_estimate" } } */
+/* { dg-final { scan-tree-dump-times "hot label heuristics" 1
"profile_estimate" } } */
+/* { dg-final { scan-tree-dump-times "cold label heuristics" 1
"profile_estimate" } } */
+/* { dg-final { scan-tree-dump-times "block 4, loop depth 0, count 0,
freq \[1-4\]\[^0-9\]" 1 "profile_estimate" } } */
?

With that change the test runs but still FAILs:
FAIL: gcc.dg/tree-ssa/attr-hotcold-2.c scan-tree-dump-times
profile_estimate "block 4, loop depth 0, count 0, freq [1-4][^0-9]" 1
FAIL: gcc.dg/tree-ssa/attr-hotcold-2.c scan-tree-dump-times
profile_estimate "block 5, loop depth 0, count 0, freq
[6-9][0-9][0-9][0-9]" 1

Thanks,
Kyrill
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/attr-hotcold-2.c b/gcc/testsuite/gcc.dg/tree-ssa/attr-hotcold-2.c
index 6623d9e..e2e8143 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/attr-hotcold-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/attr-hotcold-2.c
@@ -18,8 +18,8 @@ void f(int x, int y)
   return;
 }
 
-/* { dg-final { scan-tree-dump-times 1 "hot label heuristics" 1 "profile_estimate" } } */
-/* { dg-final { scan-tree-dump-times 1 "cold label heuristics" 1 "profile_estimate" } } */
+/* { dg-final { scan-tree-dump-times "hot label heuristics" 1 "profile_estimate" } } */
+/* { dg-final { scan-tree-dump-times "cold label heuristics" 1 "profile_estimate" } } */
 /* { dg-final { scan-tree-dump-times "block 4, loop depth 0, count 0, freq \[1-4\]\[^0-9\]" 1 "profile_estimate" } } */
 
 /* Note: we're attempting to match some number > 6000, i.e. > 60%.


[PATCH][ARM]Use different startfile and endfile for elf target when generating shared object.

2016-06-16 Thread Renlin Li

Hi all,

GCC has startfile and endfile spec strings built into it.
startfile specifies the object files to include at the start of
the link process, while endfile specifies the object files to include
at the end of the link process.


crtbegin.o is one of the object files specified by the startfile spec
string. IIUC, crtbeginS.o should be used in place of crtbegin.o when
generating shared objects.
The same applies to crtend.o, which is one of the endfile objects: crtendS.o
should be used when generating shared objects.


This patch uses different crtbegin and crtend files
when creating shared and static objects with the elf toolchain. The linux
toolchain already makes this distinction.
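
As a rough model of what the patched spec strings select, here is a minimal sketch. It makes some assumptions: startfiles and endfiles are hypothetical helpers, not GCC's spec processing; ".o" stands in for the %O object suffix; and the directory search that %s performs is not modelled.

```cpp
#include <string>

// Hypothetical model of the patched UNKNOWN_ELF_STARTFILE_SPEC.
std::string startfiles (bool shared)
{
  // %{!shared:crtbegin%O%s} %{shared:crtbeginS%O%s}
  const char *crtbegin = shared ? "crtbeginS.o" : "crtbegin.o";
  return std::string ("crti.o ") + crtbegin + " crt0.o";
}

// Hypothetical model of the patched UNKNOWN_ELF_ENDFILE_SPEC.
std::string endfiles (bool shared)
{
  // %{!shared:crtend%O%s} %{shared:crtendS%O%s}
  const char *crtend = shared ? "crtendS.o" : "crtend.o";
  return std::string (crtend) + " crtn.o";
}
```

In this model, linking with -shared corresponds to startfiles (true), which names crtbeginS.o. If the toolchain does not provide that file, the link fails.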


So when the toolchain doesn't support shared objects, the following error
message will be produced:

ld: cannot find crtbeginS.o: No such file or directory

Still, those spec strings built into GCC can be overridden by using the
-specs= command-line switch to specify a spec file.

arm-none-eabi regression tested without new issues. OK for trunk?

Regards,
Renlin Li

gcc/ChangeLog:

2016-06-16  Renlin Li  <renlin...@arm.com>

* config/arm/unknown-elf.h (UNKNOWN_ELF_STARTFILE_SPEC): Use
crtbeginS.o for shared object.
(UNKNOWN_ELF_ENDFILE_SPEC): Use crtendS.o for shared object.
diff --git a/gcc/config/arm/unknown-elf.h b/gcc/config/arm/unknown-elf.h
index fafe057..12ef497 100644
--- a/gcc/config/arm/unknown-elf.h
+++ b/gcc/config/arm/unknown-elf.h
@@ -29,14 +29,19 @@
 #endif
 
 /* Now we define the strings used to build the spec file.  */
-#define UNKNOWN_ELF_STARTFILE_SPEC	" crti%O%s crtbegin%O%s crt0%O%s"
+#define UNKNOWN_ELF_STARTFILE_SPEC	\
+  "crti%O%s \
+  %{!shared:crtbegin%O%s} %{shared:crtbeginS%O%s} \
+  crt0%O%s"
 
 #undef  STARTFILE_SPEC
 #define STARTFILE_SPEC	\
   "%{Ofast|ffast-math|funsafe-math-optimizations:crtfastmath.o%s} "	\
   UNKNOWN_ELF_STARTFILE_SPEC
 
-#define UNKNOWN_ELF_ENDFILE_SPEC	"crtend%O%s crtn%O%s"
+#define UNKNOWN_ELF_ENDFILE_SPEC	\
+  "%{!shared:crtend%O%s} %{shared:crtendS%O%s} \
+  crtn%O%s"
 
 #undef  ENDFILE_SPEC
 #define ENDFILE_SPEC	UNKNOWN_ELF_ENDFILE_SPEC


Re: [PATCH, aarch64] Fix 70048

2016-06-07 Thread Renlin Li

Hi Richard,

On 16/03/16 21:25, Richard Henderson wrote:

This fixes only the regression described in the PR.

There was quite a bit of follow-up that points to new work that ought to
be done during the gcc7 cycle, but isn't really appropriate now.

Tested on aarch64-linux; committed as reviewed in the PR.


r~



@@ -4953,74 +4963,43 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  */, 
machine_mode mode)

   if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1)))
 {
-  HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
-  HOST_WIDE_INT base_offset;
+  rtx base = XEXP (x, 0);
+  rtx offset_rtx XEXP (x, 1);




I recently read the aarch64_legitimize_address function and found a
suspicious line of code in the above change.


>> + rtx offset_rtx XEXP (x, 1);

It was committed by you. It looks like a typo: an assignment seems to be
missing?


James pointed out to me that this is a C++ initialization. Ah, yes it is! But I
believe this is a coincidence, as you use a different initialization style
in the line above?

I made an obvious patch to make it look more intuitive. Is it OK?
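
For reference, here is a minimal sketch of why the line compiles without the "=". It assumes an accessor macro that, like XEXP, expands to a parenthesized expression. The struct node and the FIELD macro are hypothetical stand-ins, not GCC's rtx or XEXP.

```cpp
// Hypothetical stand-in for an rtx-like object and an XEXP-style accessor.
struct node { int fields[2]; };
#define FIELD(N, I) ((N).fields[I])

int read_without_equals (const node &x)
{
  // After preprocessing this is `int offset ((x).fields[1]);`,
  // which C++ parses as direct-initialization of `offset`.
  int offset FIELD (x, 1);
  return offset;
}

int read_with_equals (const node &x)
{
  // Copy-initialization: the same value, but the intent is obvious.
  int offset = FIELD (x, 1);
  return offset;
}
```

Both functions return the same value, so the patch is a readability fix rather than a behaviour change.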


Regards,
Renlin Li




gcc/changelog:

2016-06-06  renlin li  <renlin...@arm.com>

* config/aarch64/aarch64.c (aarch64_legitimize_address): Add assignment.
commit 1fd77baf4ca918ed25dbce4678d7be7b7cd51be2
Author: Renlin Li <renlin...@arm.com>
Date:   Mon Jun 6 11:24:39 2016 +0100

fix type

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index ad07fe1..54e6813 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4949,7 +4949,7 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  */, machine_mode mode)
   if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1)))
 {
   rtx base = XEXP (x, 0);
-  rtx offset_rtx XEXP (x, 1);
+  rtx offset_rtx = XEXP (x, 1);
   HOST_WIDE_INT offset = INTVAL (offset_rtx);
 
   if (GET_CODE (base) == PLUS)


Re: [PATCH]Replace -shared with -r -nostdlib in gcc.dg/lto/pr61526 pr54709 pr64415 test cases.

2016-03-03 Thread Renlin Li

Hi Richard,

On 03/03/16 12:47, Richard Biener wrote:

On Thu, Mar 3, 2016 at 1:07 PM, Renlin Li <renlin...@foss.arm.com> wrote:

Hi Richard,


On 03/03/16 10:13, Richard Biener wrote:

On Wed, Mar 2, 2016 at 5:12 PM, Renlin Li <renlin...@foss.arm.com> wrote:

Hi Richard,


On 02/03/16 13:35, Richard Biener wrote:

On Tue, Mar 1, 2016 at 4:56 PM, Renlin Li <renlin...@foss.arm.com>
wrote:

Hi Richard,


On 01/03/16 09:16, Richard Biener wrote:

On Mon, Feb 29, 2016 at 5:13 PM, Renlin Li <renlin...@foss.arm.com>
wrote:

Hi all,

The gcc.dg/lto/pr54709, pr61526, pr64415 linking testcases keep
failing
on
arm/aarch64 bare-metal target.

It's because statically built newlib library is used to link with
shared
object.
And the linker complains about relocations which cannot be used in
shared object.

For example, the following errors are produced:

crtbegin.o: relocation R_ARM_MOVW_ABS_NC against `a local symbol' can
not
be
used when making a shared object; recompile with -fPIC

crtbegin.o: relocation R_ARM_THM_MOVW_ABS_NC against `a local symbol'
can
not
be used when making a shared object; recompile with -fPIC

librdimon.a(rdimon-syscalls.o): relocation R_AARCH64_ADR_PREL_PG_HI21
against
external symbol `_impure_ptr' can not be used when making a shared
object;
recompile with -fPIC

Presumably, bare-metal toolchain for other architecture have those
test
case
failures as well?

In this patch, the -shared option is replaced by -r -nostdlib, so that the
standard system startup files or libraries are not used when linking.

Note that -shared is not equivalent to -r -nostdlib so please verify
that
the
original issue can be still reproduced with its fix reverted but -r
-nostdlib
used with the new -r -nostdlib handling on trunk.


pr54709_0.c: Cannot be reproduced with even -shared. The error message
is
the same as shown above.
pr64415_0.c: Reproduced with "-r -nostdlib".
pr61526_0.c: Reproduced with "-r -nostdlib".

By the way, those linking test cases all pass for linux toolchain. Only
fail
for aarch64/arm baremetal toolchain.

Andrew, I saw you have done similar things in r153555
https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=153555

Do you have any thoughts?

And also here, the last comments in this ticket suggests to add
check_effective_target_shared to the exp file to limit it to linux
targets
only:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61526

As said LTO testcases tend to be somewhat fragile so limiting them to
targets known to work might be the best option.

Richard.


I forgot to mention that purely adding the "-nostdlib" option (instead of
replacing -shared) fixes the failures as well.

I tested those test cases again with the fix reverted, keeping the "-shared"
option and adding the "-nostdlib" option.
Ok, so I discovered we have a "shared" target which means if a target
doesn't
support shared libs we can guard against it with using

/* { dg-require-effective-target shared } */

does adding that to the three testcases fix the issue for you?

By adding this target check
/* { dg-require-effective-target shared } */

Those test cases are deemed to be unsupported, and thus skipped, for the
aarch64-none-elf target.

However, it's a little bit tricky for the arm bare-metal target.

The shared option check actually succeeds for the arm-none-eabi toolchain.
This is because the default cpu for the arm-none-eabi toolchain is arm7tdmi, and
the start file crtbegin.o for it doesn't contain anything that is not allowed in
a shared object.

arm-none-eabi is built with multilib. When running this testcase, it's
compiled with "-march=armv7-a".
The crtbegin.o for this architecture version contains relocations which
cannot be used in a shared object.
That's why they fail the link test.

For -shared it should provide a crtbeginS.o then.  Why not fix it properly?

Richard.


That's the case for the linux toolchain. Multiple versions of the startfile are
generated.

crtbegin.o, crtbeginS.o, crtbeginT.o etc.

If I understand it correctly, this is not applicable to a bare-metal
toolchain, because normally a bare-metal toolchain will not be used to
create shared objects?


I have double checked: almost all bare-metal toolchains only require
crtbegin.o.

The targets define STARTFILE_SPEC in a simple way.

The failures here are complaints about creating a shared object that
includes statically generated objects.

The code in the start files is not used by, and does not interact with, the
test cases. So I think it's reasonable to use "-nostdlib" to exclude the
standard startup files or libraries when testing the linking.


Certainly, we can skip the test cases for bare-metal toolchains.
However, as explained above, it seems this support check is not fully
able to do this.

/* { dg-require-effective-target shared } */

Regards,
Renlin






Will adding "-nostdlib" (instead of replacing -shared) be a reasonable
fix given my previous check?

Regards,
Renlin




Thanks,
Richard.


pr54709_0.c: 

Re: [PATCH]Replace -shared with -r -nostdlib in gcc.dg/lto/pr61526 pr54709 pr64415 test cases.

2016-03-03 Thread Renlin Li

Hi Richard,

On 03/03/16 10:13, Richard Biener wrote:

On Wed, Mar 2, 2016 at 5:12 PM, Renlin Li <renlin...@foss.arm.com> wrote:

Hi Richard,


On 02/03/16 13:35, Richard Biener wrote:

On Tue, Mar 1, 2016 at 4:56 PM, Renlin Li <renlin...@foss.arm.com> wrote:

Hi Richard,


On 01/03/16 09:16, Richard Biener wrote:

On Mon, Feb 29, 2016 at 5:13 PM, Renlin Li <renlin...@foss.arm.com>
wrote:

Hi all,

The gcc.dg/lto/pr54709, pr61526, pr64415 linking testcases keep failing
on
arm/aarch64 bare-metal target.

It's because statically built newlib library is used to link with
shared
object.
And the linker complains about relocations which cannot be used in
shared object.

For example, the following errors are produced:

crtbegin.o: relocation R_ARM_MOVW_ABS_NC against `a local symbol' can
not
be
used when making a shared object; recompile with -fPIC

crtbegin.o: relocation R_ARM_THM_MOVW_ABS_NC against `a local symbol'
can
not
be used when making a shared object; recompile with -fPIC

librdimon.a(rdimon-syscalls.o): relocation R_AARCH64_ADR_PREL_PG_HI21
against
external symbol `_impure_ptr' can not be used when making a shared
object;
recompile with -fPIC

Presumably, bare-metal toolchain for other architecture have those test
case
failures as well?

In this patch, the -shared option is replaced by -r -nostdlib, so that the
standard system startup files or libraries are not used when linking.

Note that -shared is not equivalent to -r -nostdlib so please verify
that
the
original issue can be still reproduced with its fix reverted but -r
-nostdlib
used with the new -r -nostdlib handling on trunk.


pr54709_0.c: Cannot be reproduced with even -shared. The error message is
the same as shown above.
pr64415_0.c: Reproduced with "-r -nostdlib".
pr61526_0.c: Reproduced with "-r -nostdlib".

By the way, those linking test cases all pass for linux toolchain. Only
fail
for aarch64/arm baremetal toolchain.

Andrew, I saw you have done similar things in r153555
https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=153555

Do you have any thoughts?

And also here, the last comments in this ticket suggests to add
check_effective_target_shared to the exp file to limit it to linux
targets
only:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61526

As said LTO testcases tend to be somewhat fragile so limiting them to
targets known to work might be the best option.

Richard.


I forgot to mention that purely adding the "-nostdlib" option (instead of
replacing -shared) fixes the failures as well.

I tested those test cases again with the fix reverted, keeping the "-shared"
option and adding the "-nostdlib" option.

Ok, so I discovered we have a "shared" target which means if a target doesn't
support shared libs we can guard against it with using

/* { dg-require-effective-target shared } */

does adding that to the three testcases fix the issue for you?

By adding this target check
/* { dg-require-effective-target shared } */

Those test cases are deemed to be unsupported, and thus skipped, for the
aarch64-none-elf target.


However, it's a little bit tricky for the arm bare-metal target.

The shared option check actually succeeds for the arm-none-eabi toolchain.
This is because the default cpu for the arm-none-eabi toolchain is arm7tdmi, and
the start file crtbegin.o for it doesn't contain anything that is not allowed
in a shared object.


arm-none-eabi is built with multilib. When running this testcase, it's
compiled with "-march=armv7-a".
The crtbegin.o for this architecture version contains relocations which
cannot be used in a shared object.

That's why they fail the link test.

Will adding "-nostdlib" (instead of replacing -shared) be a reasonable
fix given my previous check?

Regards,
Renlin




Thanks,
Richard.


pr54709_0.c: Cannot be reproduced even with the test case unmodified.
The error message is the same as shown above. With "-nostdlib", there is no failure.

pr64415_0.c: Reproduced.
pr61526_0.c: Reproduced.

Regards,
Renlin




Regards,
Renlin



Otherwise simply dg-skip for aarch64.

Richard.


arm-none-eabi, aarch64-none-elf regression test OK, OK for trunk?

Regards,
Renlin Li

gcc/testsuite/ChangeLog:

2016-02-29  Renlin Li<renlin...@arm.com>

   * gcc.dg/lto/pr54709_0.c: Replace -shared with -r -nostdlib.
   * gcc.dg/lto/pr61526_0.c: Ditto.
   * gcc.dg/lto/pr64415_0.c: Ditto.





Re: [PATCH]Replace -shared with -r -nostdlib in gcc.dg/lto/pr61526 pr54709 pr64415 test cases.

2016-03-02 Thread Renlin Li

Hi Richard,

On 02/03/16 13:35, Richard Biener wrote:

On Tue, Mar 1, 2016 at 4:56 PM, Renlin Li <renlin...@foss.arm.com> wrote:

Hi Richard,


On 01/03/16 09:16, Richard Biener wrote:

On Mon, Feb 29, 2016 at 5:13 PM, Renlin Li <renlin...@foss.arm.com> wrote:

Hi all,

The gcc.dg/lto/pr54709, pr61526, pr64415 linking testcases keep failing
on
arm/aarch64 bare-metal target.

It's because statically built newlib library is used to link with shared
object.
And the linker complains about relocations which cannot be used in
shared object.

For example, the following errors are produced:

crtbegin.o: relocation R_ARM_MOVW_ABS_NC against `a local symbol' can not
be
used when making a shared object; recompile with -fPIC

crtbegin.o: relocation R_ARM_THM_MOVW_ABS_NC against `a local symbol' can
not
be used when making a shared object; recompile with -fPIC

librdimon.a(rdimon-syscalls.o): relocation R_AARCH64_ADR_PREL_PG_HI21
against
external symbol `_impure_ptr' can not be used when making a shared
object;
recompile with -fPIC

Presumably, bare-metal toolchain for other architecture have those test
case
failures as well?

In this patch, the -shared option is replaced by -r -nostdlib, so that the
standard system startup files or libraries are not used when linking.

Note that -shared is not equivalent to -r -nostdlib so please verify that
the
original issue can be still reproduced with its fix reverted but -r
-nostdlib
used with the new -r -nostdlib handling on trunk.


pr54709_0.c: Cannot be reproduced with even -shared. The error message is
the same as shown above.
pr64415_0.c: Reproduced with "-r -nostdlib".
pr61526_0.c: Reproduced with "-r -nostdlib".

By the way, those linking test cases all pass for linux toolchain. Only fail
for aarch64/arm baremetal toolchain.

Andrew, I saw you have done similar things in r153555
https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=153555

Do you have any thoughts?

And also here, the last comments in this ticket suggests to add
check_effective_target_shared to the exp file to limit it to linux targets
only:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61526

As said LTO testcases tend to be somewhat fragile so limiting them to
targets known to work might be the best option.

Richard.


I forgot to mention that purely adding the "-nostdlib" option (instead of
replacing -shared) fixes the failures as well.

I tested those test cases again with the fix reverted, keeping the "-shared"
option and adding the "-nostdlib" option.


pr54709_0.c: Cannot be reproduced even with the test case unmodified.
The error message is the same as shown above. With "-nostdlib", there is no failure.

pr64415_0.c: Reproduced.
pr61526_0.c: Reproduced.

Regards,
Renlin





Regards,
Renlin



Otherwise simply dg-skip for aarch64.

Richard.


arm-none-eabi, aarch64-none-elf regression test OK, OK for trunk?

Regards,
Renlin Li

gcc/testsuite/ChangeLog:

2016-02-29  Renlin Li<renlin...@arm.com>

  * gcc.dg/lto/pr54709_0.c: Replace -shared with -r -nostdlib.
  * gcc.dg/lto/pr61526_0.c: Ditto.
  * gcc.dg/lto/pr64415_0.c: Ditto.





Re: [PATCH]Replace -shared with -r -nostdlib in gcc.dg/lto/pr61526 pr54709 pr64415 test cases.

2016-03-01 Thread Renlin Li

Hi Richard,

On 01/03/16 09:16, Richard Biener wrote:

On Mon, Feb 29, 2016 at 5:13 PM, Renlin Li <renlin...@foss.arm.com> wrote:

Hi all,

The gcc.dg/lto/pr54709, pr61526, pr64415 linking testcases keep failing on
arm/aarch64 bare-metal target.

It's because statically built newlib library is used to link with shared
object.
And the linker complains about relocations which cannot be used in
shared object.

For example, the following errors are produced:

crtbegin.o: relocation R_ARM_MOVW_ABS_NC against `a local symbol' can not be
used when making a shared object; recompile with -fPIC

crtbegin.o: relocation R_ARM_THM_MOVW_ABS_NC against `a local symbol' can
not
be used when making a shared object; recompile with -fPIC

librdimon.a(rdimon-syscalls.o): relocation R_AARCH64_ADR_PREL_PG_HI21
against
external symbol `_impure_ptr' can not be used when making a shared object;
recompile with -fPIC

Presumably, bare-metal toolchain for other architecture have those test case
failures as well?

In this patch, the -shared option is replaced by -r -nostdlib, so that the
standard system startup files or libraries are not used when linking.

Note that -shared is not equivalent to -r -nostdlib so please verify that the
original issue can be still reproduced with its fix reverted but -r -nostdlib
used with the new -r -nostdlib handling on trunk.


pr54709_0.c: Cannot be reproduced even with -shared. The error message
is the same as shown above.

pr64415_0.c: Reproduced with "-r -nostdlib".
pr61526_0.c: Reproduced with "-r -nostdlib".

By the way, those linking test cases all pass for the linux toolchain. They
only fail for the aarch64/arm bare-metal toolchains.


Andrew, I saw you have done similar things in r153555
https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=153555

Do you have any thoughts?

Also, the last comments in this ticket suggest adding
check_effective_target_shared to the exp file to limit it to linux
targets only:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61526

Regards,
Renlin



Otherwise simply dg-skip for aarch64.

Richard.


arm-none-eabi, aarch64-none-elf regression test OK, OK for trunk?

Regards,
Renlin Li

gcc/testsuite/ChangeLog:

2016-02-29  Renlin Li<renlin...@arm.com>

 * gcc.dg/lto/pr54709_0.c: Replace -shared with -r -nostdlib.
 * gcc.dg/lto/pr61526_0.c: Ditto.
 * gcc.dg/lto/pr64415_0.c: Ditto.





[PATCH]Replace -shared with -r -nostdlib in gcc.dg/lto/pr61526 pr54709 pr64415 test cases.

2016-02-29 Thread Renlin Li

Hi all,

The gcc.dg/lto/pr54709, pr61526, pr64415 linking testcases keep failing on
arm/aarch64 bare-metal target.

It's because the statically built newlib library is linked into the shared object,
and the linker complains about relocations which cannot be used in
a shared object.

For example, the following errors are produced:

crtbegin.o: relocation R_ARM_MOVW_ABS_NC against `a local symbol' can not be
used when making a shared object; recompile with -fPIC

crtbegin.o: relocation R_ARM_THM_MOVW_ABS_NC against `a local symbol' can not
be used when making a shared object; recompile with -fPIC

librdimon.a(rdimon-syscalls.o): relocation R_AARCH64_ADR_PREL_PG_HI21 against
external symbol `_impure_ptr' can not be used when making a shared object;
recompile with -fPIC

Presumably, bare-metal toolchains for other architectures have those test case
failures as well?

In this patch, the -shared option is replaced by -r -nostdlib, so that the standard
system startup files or libraries are not used when linking.


arm-none-eabi, aarch64-none-elf regression test OK, OK for trunk?

Regards,
Renlin Li

gcc/testsuite/ChangeLog:

2016-02-29  Renlin Li<renlin...@arm.com>

* gcc.dg/lto/pr54709_0.c: Replace -shared with -r -nostdlib.
* gcc.dg/lto/pr61526_0.c: Ditto.
* gcc.dg/lto/pr64415_0.c: Ditto.

diff --git a/gcc/testsuite/gcc.dg/lto/pr54709_0.c b/gcc/testsuite/gcc.dg/lto/pr54709_0.c
index f3db5dc..12a10e0 100644
--- a/gcc/testsuite/gcc.dg/lto/pr54709_0.c
+++ b/gcc/testsuite/gcc.dg/lto/pr54709_0.c
@@ -1,7 +1,7 @@
 /* { dg-lto-do link } */
 /* { dg-require-visibility "hidden" } */
 /* { dg-require-effective-target fpic } */
-/* { dg-extra-ld-options { -shared } } */
+/* { dg-extra-ld-options { -r -nostdlib } } */
 /* { dg-lto-options { { -fPIC -fvisibility=hidden -flto } } } */
 
 void foo (void *p, void *q, unsigned s)
diff --git a/gcc/testsuite/gcc.dg/lto/pr61526_0.c b/gcc/testsuite/gcc.dg/lto/pr61526_0.c
index 8a631f0..5e2f7acf 100644
--- a/gcc/testsuite/gcc.dg/lto/pr61526_0.c
+++ b/gcc/testsuite/gcc.dg/lto/pr61526_0.c
@@ -1,7 +1,7 @@
 /* { dg-require-effective-target fpic } */
 /* { dg-lto-do link } */
 /* { dg-lto-options { { -fPIC -flto -flto-partition=1to1 } } } */
-/* { dg-extra-ld-options { -shared } } */
+/* { dg-extra-ld-options { -r -nostdlib } } */
 
 static void *master;
 void *foo () { return master; }
diff --git a/gcc/testsuite/gcc.dg/lto/pr64415_0.c b/gcc/testsuite/gcc.dg/lto/pr64415_0.c
index 4faab2b..0f583a5 100644
--- a/gcc/testsuite/gcc.dg/lto/pr64415_0.c
+++ b/gcc/testsuite/gcc.dg/lto/pr64415_0.c
@@ -1,7 +1,7 @@
 /* { dg-lto-do link } */
 /* { dg-require-effective-target fpic } */
 /* { dg-lto-options { { -O -flto -fpic } } } */
-/* { dg-extra-ld-options { -shared } } */
+/* { dg-extra-ld-options { -r -nostdlib } } */
 /* { dg-extra-ld-options "-Wl,-undefined,dynamic_lookup" { target *-*-darwin* } } */
 
 extern void bar(char *, int);




Re: [PATCH][LRA]Don't generate reload for output scratch operand from reload instruction.

2016-02-29 Thread Renlin Li

Hi Vladimir,

Thank you for explaining it.
I have a few comments inline.

On 26/02/16 23:54, Vladimir Makarov wrote:


Thanks for working on this and providing a good description of the
problem.  Could you file a PR and provide a test even if you cannot
reduce it?


I will file a PR and try to reduce a test case. As it's triggered by my
local change to gcc, I cannot guarantee that I will succeed.

Anyway, I am quite happy to test your fix when you have one.

As for the scratch.  As I understand the scratch was introduced for 
operands which will not require any resources (memory or a new 
register) for some insn alternatives.  If we use pseudo for this, it 
will always need memory or a register.  The typical constraint for 
scratch is "r,X" or "0r".  So I guess using just "" for scratch is a 
bad practice.  Still for compatibility I think we should implement the 
same reload behaviour for this case too.


Actually, (clobber (match_scratch:MODE x "=r")) also triggers this ICE.
The early clobber modifier here doesn't really matter.
The purpose of this pattern is to reserve a pseudo register for use as a
temporary.
The "=" modifier is required for a MATCH_SCRATCH expression. Otherwise,
the error "missing output reload" is reported.

That's why (set scratch, RXX) is generated.



I believe we should use the same technique -- changing scratches to 
pseudo and back at the end of LRA if they don't need a register.  It 
will solve also a possible problem for correct scratch generation 
during LRA.


I am going to work on this problem on the next week.  A test case 
would be a help for me.


gcc/ChangeLog:

2016-02-26  Renlin Li<renlin...@arm.com>

* lra-constraints.c (curr_insn_transform): Don't generate reload for
output scratch operand.

Sorry, I can not accept the patch as I'd like to provide a better 
solution I described above.  The patch is also wrong for unused 
non-scratch operands.  They still should be reloaded if they do not 
satisfy their constraints even if they are not used later.




I think it will still be reloaded, according to the code logic here:

if (get_reload_reg (type, mode, old, goal_alt[i],
  loc != curr_id->operand_loc[i], "", &new_reg)
  && type != OP_OUT)
{
  push_to_sequence (before);
  lra_emit_move (new_reg, old);
  before = get_insns ();
  end_sequence ();
}
  *loc = new_reg;  /* <-- the original operand will be replaced by a reload reg.  */

  if (type != OP_IN
  /* Don't generate reload for output scratch operand.  */
  && GET_CODE (old) != SCRATCH
  && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX)


A reload register will be generated to replace the old operand in the
original rtx pattern to satisfy its constraints.
Later, it checks whether this operand is an output operand which will be
used later. If so, another insn is generated to move the newly generated
pseudo into the old operand.

The patch adds one more condition to this final insn generation.
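
To spell out that extra condition, here is a simplified model of the decision. It is only a sketch under stated assumptions: operand_model and needs_output_reload_move are hypothetical names, and the two flags stand in for the GET_CODE (old) and find_reg_note checks quoted above rather than LRA's real data structures.

```cpp
// Simplified model of the final "move reload reg back" decision.
enum op_type { OP_IN, OP_OUT, OP_INOUT };

struct operand_model
{
  op_type type;
  bool is_scratch;       // models GET_CODE (old) == SCRATCH
  bool has_unused_note;  // models find_reg_note (curr_insn, REG_UNUSED, old) != NULL_RTX
};

// True when an insn moving the reload register back into the old operand
// would be emitted after the current insn.
bool needs_output_reload_move (const operand_model &op)
{
  return op.type != OP_IN      // pure inputs are never written back
         && !op.is_scratch     // the condition added by the patch
         && !op.has_unused_note;
}
```

With the added condition, an output scratch operand still gets its reload register substituted in place, but no (set scratch, RXX) insn is generated afterwards.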

Regards,
Renlin






Re: [PATCH][LRA]Don't generate reload for output scratch operand from reload instruction.

2016-02-26 Thread Renlin Li

Hi Richard,

On 26/02/16 12:57, Richard Biener wrote:

On Fri, Feb 26, 2016 at 1:54 PM, Renlin Li <renlin...@foss.arm.com> wrote:


I have checked, x86, arm, aarch64, mips, arc all have such patterns. But
it's not triggered. In my case, it's triggered by compiling glibc with
local change.



Please extract a testcase from your modified glibc sources.

Richard.


The change is to the compiler. I changed the compiler and tried to build 
a Linux toolchain.


I tried to create a test case, but had no luck.

It needs to make LRA generate a reload pattern with a non-input scratch 
register operand.

This type of pattern is actually quite common in the target backends.

I also searched GCC Bugzilla; no similar bug has been reported.

Regards,
Renlin



Regards,
Renlin Li


gcc/ChangeLog:

2016-02-26  Renlin Li<renlin...@arm.com>

 * lra-constraints.c (curr_insn_transform): Don't generate reload for
 output scratch operand.





[PATCH][LRA]Don't generate reload for output scratch operand from reload instruction.

2016-02-26 Thread Renlin Li

Hi all,

I admit that the title looks a little confusing.

The situation is as follows.
To make insn_1 strict, LRA generates a new insn, insn_1_reload.
In insn_1_reload, there is a scratch operand of this form:
clobber (match_scratch:MODE x "=")

When LRA tries to reload insn_1_reload in a later iteration, a new pseudo
register (let's say RXX) is created to replace this scratch operand in place.
Additionally, a new insn is generated and inserted after insn_1_reload to
finish the reload. It has this form:
(set scratch, RXX)

This instruction is invalid: no target implements this kind of pattern,
so LRA ICEs with:
"internal compiler error: in lra_set_insn_recog_data, at lra.c:964"

Indeed, this pattern has no side effect. The scratch operand should
stay inside the pattern.

Normally, at the very beginning of LRA, all scratch operands are
replaced by newly created pseudo registers. However, this is a problem when
a generated reload insn has an output scratch operand.

I have checked: x86, arm, aarch64, mips and arc all have such patterns, but
the problem is not triggered there. In my case, it is triggered by compiling
glibc with a local change.

So a simple change is made in this patch. The output operand is reloaded only
when it is not a scratch operand and it is not marked as unused.
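
The condition described above can be sketched as a small standalone
predicate. This is only an illustrative model, not the LRA code itself: the
enums and the function name needs_output_reload are hypothetical stand-ins
for the operand type, GET_CODE (old) and the REG_UNUSED note lookup.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for LRA's operand types and the RTL code of the
   old operand; these are not the real GCC definitions.  */
enum op_type { OP_IN, OP_OUT, OP_INOUT };
enum rtx_kind { KIND_REG, KIND_SCRATCH };

/* Return true if an output reload (a move from the reload register back
   into the old operand) should be emitted after the insn.  */
bool
needs_output_reload (enum op_type type, enum rtx_kind old_kind,
                     bool has_reg_unused_note)
{
  return type != OP_IN
         && old_kind != KIND_SCRATCH   /* the condition added by the patch */
         && !has_reg_unused_note;
}
```

With this model, an output scratch operand never gets the trailing
(set scratch, RXX) insn, while an ordinary output register that is used
later still does.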


aarch64-none-linux-gnu bootstrap and regression test OK.
x86_64-linux bootstrap and regression test OK.
OK for trunk?

Regards,
Renlin Li


gcc/ChangeLog:

2016-02-26  Renlin Li<renlin...@arm.com>

* lra-constraints.c (curr_insn_transform): Don't generate reload for
output scratch operand.

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index 08cf0aa6c4208bb60ba5071bad1255d587f1cb4a..ef5809ff226cca69bb711bfc5dab55e24caba01a 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -3882,6 +3882,8 @@ curr_insn_transform (bool check_only_p)
 	}
 	  *loc = new_reg;
 	  if (type != OP_IN
+	  /* Don't generate reload for output scratch operand.  */
+	  && GET_CODE (old) != SCRATCH
 	  && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX)
 	{
 	  start_sequence ();



[4.9][PR69082]Backport "[PATCH][ARM]Tighten the conditions for arm_movw, arm_movt"

2016-01-12 Thread Renlin Li

Hi all,

Here I backport r227129 to branch 4.9 to fix exactly the same issue reported in 
PR69082.
It has already been committed on trunk and backported to branch 5.


I have quoted the original message for the explanation.
The patch applies to branch 4.9 without any modifications.
A test case is not added, as the one provided in the Bugzilla ticket is too big 
and complex.

arm-none-linux-gnueabihf regression tested without any issues.

Is it okay to backport to branch 4.9?

Renlin Li


gcc/ChangeLog

2016-01-08  Renlin Li  <renlin...@arm.com>

PR target/69082
Backport from mainline:
2015-08-24  Renlin Li  <renlin...@arm.com>

* config/arm/arm-protos.h (arm_valid_symbolic_address_p): Declare.
* config/arm/arm.c (arm_valid_symbolic_address_p): Define.
* config/arm/arm.md (arm_movt): Use arm_valid_symbolic_address_p.
* config/arm/constraints.md ("j"): Add check for high code


On 19/08/15 15:37, Renlin Li wrote:



On 19/08/15 12:49, Renlin Li wrote:

Hi all,

This simple patch tightens the conditions for matching the arm_movw and
arm_movt rtx patterns.
Those two patterns generate the following assembly:

movw w1, #:lower16: dummy + addend
movt w1, #:upper16: dummy + addend

The addend here is optional. However, it should be a 16-bit signed
value within the range -32768 <= A < 32768.

Imposing this restriction explicitly prevents LRA/reload from
generating invalid high/lo_sum code for the arm target.
In process_address_1(), if the address is not legitimate, it tries to
generate a high/lo_sum pair to put the address into a register. It
checks whether the target supports those newly generated reload instructions.
With this condition on the two patterns, arm rejects them if the
conditions are not met.

Otherwise, it might generate movw/movt instructions with an addend outside
this range, which causes a GAS error. GAS produces an "offset out
of range" error message when the addend for a MOVW/MOVT REL relocation is
too large.
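
As a rough illustration of the range restriction, the check on the addend
alone can be written as below. This is a sketch under the assumption that
only the numeric range matters; the helper name movw_movt_addend_ok is
hypothetical, and the real check in the patch also inspects the RTL shape
of the address.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if ADDEND fits the 16-bit signed literal
   field, i.e. -32768 <= addend < 32768.  */
bool
movw_movt_addend_ok (int64_t addend)
{
  return addend >= -0x8000 && addend <= 0x7fff;
}
```

Note that 32768 itself is rejected: the upper bound is exclusive, matching
a 16-bit signed field.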


arm-none-eabi regression tests are okay. Okay to commit to the trunk and
backport to 5.0?

Regards,
Renlin

gcc/ChangeLog:

2015-08-19  Renlin Li  <renlin...@arm.com>

   * config/arm/arm-protos.h (arm_valid_symbolic_address_p): Declare.
   * config/arm/arm.c (arm_valid_symbolic_address_p): Define.
   * config/arm/arm.md (arm_movt): Use arm_valid_symbolic_address_p.
   * config/arm/constraints.md ("j"): Add check for high code.


Thank you,
Renlin



diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index cef9eec..ff168bf 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -319,6 +319,7 @@ extern int vfp3_const_double_for_bits (rtx);
 
 extern void arm_emit_coreregs_64bit_shift (enum rtx_code, rtx, rtx, rtx, rtx,
 	   rtx);
+extern bool arm_valid_symbolic_address_p (rtx);
 extern bool arm_validize_comparison (rtx *, rtx *, rtx *);
 #endif /* RTX_CODE */
 
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index c2095a3..7cc4d93 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -28664,6 +28664,38 @@ arm_emit_coreregs_64bit_shift (enum rtx_code code, rtx out, rtx in,
   #undef BRANCH
 }
 
+/* Returns true if the pattern is a valid symbolic address, which is either a
+   symbol_ref or (symbol_ref + addend).
+
+   According to the ARM ELF ABI, the initial addend of REL-type relocations
+   processing MOVW and MOVT instructions is formed by interpreting the 16-bit
+   literal field of the instruction as a 16-bit signed value in the range
+   -32768 <= A < 32768.  */
+
+bool
+arm_valid_symbolic_address_p (rtx addr)
+{
+  rtx xop0, xop1 = NULL_RTX;
+  rtx tmp = addr;
+
+  if (GET_CODE (tmp) == SYMBOL_REF || GET_CODE (tmp) == LABEL_REF)
+    return true;
+
+  /* (const (plus: symbol_ref const_int))  */
+  if (GET_CODE (addr) == CONST)
+    tmp = XEXP (addr, 0);
+
+  if (GET_CODE (tmp) == PLUS)
+    {
+      xop0 = XEXP (tmp, 0);
+      xop1 = XEXP (tmp, 1);
+
+      if (GET_CODE (xop0) == SYMBOL_REF && CONST_INT_P (xop1))
+	  return IN_RANGE (INTVAL (xop1), -0x8000, 0x7fff);
+    }
+
+  return false;
+}
 
 /* Returns true if a valid comparison operation and makes
the operands in a form that is valid.  */
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 288bbb9..eefb7fa 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -5774,7 +5774,7 @@
   [(set (match_operand:SI 0 "nonimmediate_operand" "=r")
 	(lo_sum:SI (match_operand:SI 1 "nonimmediate_operand" "0")
 		   (match_operand:SI 2 "general_operand"  "i")))]
-  "arm_arch_thumb2"
+  "arm_arch_thumb2 && arm_valid_symbolic_address_p (operands[2])"
   "movt%?\t%0, #:upper16:%c2"
   [(set_attr "predicable" "yes")
(set_attr "predicable_short_it" "no")
diff --git a/gcc/conf

Re: [PR67383][ARM][4.9]Backport of "Allow any register for DImode values in Thumb2"

2015-11-27 Thread Renlin Li

Hi Ramana,

On 16/10/15 14:54, Renlin Li wrote:



The command line implies we remove r7 (frame pointer in Thumb2 - 
historical accident, fno-omit-frame-pointer), r9 (ffixed-r9), r10 
(-mpic-register) which leaves us with:

* r0, r1
* r2, r3
* r4, r5

as the only free registers available for DImode values for the whole 
compilation.


We then have r0, r1 and r2 live across the insn which means that 
there are no free registers to handle DImode values
under the constraints provided unless LRA / reload can spill the 
argument registers which it doesn't seem to be able to do
in this particular testcase. Vlad, is that correct ?
According to the logic, conflicting hard registers are excluded from the 
spill candidates. That's why, in this case, r0, r1, r2 cannot be used.



In the test case, there is a code structure like this:


uint64_t callee (int a, int b, int c, int d);
uint64_t caller (int a, int b, int c, int d)
{
  uint64_t res;
  /* A single BB containing complicated data processing which requires
     register pairs; it computes tmp.  */

  res = callee (tmp, b, c, d);
  return res;
}

The CSE pass in this case extends the hard registers' live ranges across 
the whole BB up to the call to callee. In this case, r1, r2, r3 are excluded 
from the allocatable registers.


There are places in CSE which prevent extending a hard register's 
live range, for example for hard registers which fulfil 
small_register_classes_for_mode_p() or class_likely_spilled_p(). However, 
argument registers satisfy neither of them.


I tried to stop CSE from extending the argument registers' live ranges. 
However, the scheduler later jumps in and reorders the instructions to 
reduce pseudo register pressure, which in effect extends the argument 
registers' live ranges again.


Regards,

Renlin Li





Re: [PATCH] g++.dg/init/vbase1.C and g++.dg/cpp/ucn-1.C

2015-11-16 Thread Renlin Li

Hi David,

On 14/11/15 00:33, David Edelsohn wrote:

No RISC architecture can store directly to MEM, so the expected RTL in
g++.dg/init/vbase1.C is wrong.  I am adding XFAIL for PowerPC.  This
probably should be disabled for ARM and other RISC architectures.


I observed the same problem on arm.

This passes for aarch64 and mips as they have a zero register to do that. 
However, other RISC targets might not have that feature, for example arm and 
RS6000 in this case.


https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03239.html

Regards,
Renlin Li


Dollar sign is not a valid identifier on AIX, so g++.dg/cpp/ucn-1.C
will produce an additional error on AIX.

* g++.dg/init/vbase1.C: XFAIL powerpc*-*-*.
* g++.dg/cpp/ucn-1.C: Expect error for dollar sign identifier on AIX.

Thanks, David

Index: init/vbase1.C
===
--- init/vbase1.C   (revision 230366)
+++ init/vbase1.C   (working copy)
@@ -42,4 +42,4 @@
  // Verify that the SubB() mem-initializer is storing 0 directly into
  // this->D.whatever rather than into a stack temp that is then copied into the
  // base field.
-// { dg-final { scan-rtl-dump "set \[^\n\]*\n\[^\n\]*this\[^\n\]*\n\[^\n\]*const_int 0" "expand" } }
+// { dg-final { scan-rtl-dump "set \[^\n\]*\n\[^\n\]*this\[^\n\]*\n\[^\n\]*const_int 0" "expand" { xfail { powerpc*-*-* } } } }
Index: cpp/ucn-1.C
===
--- cpp/ucn-1.C (revision 230366)
+++ cpp/ucn-1.C (working copy)
@@ -7,8 +7,9 @@
"\u0041";// 'A' UCN is OK in string literal
'\u0041';// also OK in character literal

-  int c\u0041c;  // { dg-error "not valid in an identifier" }
-  int c\u0024c;  // $ is OK; not part of basic source char set
+  int c\u0041c;// { dg-error "not valid in an identifier" }
+   // $ is OK on most targets; not part of basic source char set
+  int c\u0024c;// { dg-error "not valid in an identifier" { target { powerpc-ibm-aix* } } }

U"\uD800"; // { dg-error "not a valid universal character" }
  }





[PATCH][ARM]Fix addsi3_compare_op2 pattern.

2015-11-12 Thread Renlin Li

Hi all,

This is a simple patch to adjust the assembly output for the 
addsi3_compare_op2 rtx pattern in the ARM backend.


According to the constraints, it is the second alternative that allows 
the second operand to be a constant.
The original pattern triggers an ICE when the third alternative is 
chosen, because it tries to output a constant while the second operand is a 
register.
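
The mismatch can be illustrated with a small standalone model. This is only
a sketch: the "type" values and output templates are copied from the
pattern in the diff, while the array layout and the helper
templates_consistent are hypothetical and not part of GCC.

```c
#include <stdbool.h>
#include <string.h>

#define N_ALTERNATIVES 3

/* "type" attribute values, one per alternative, copied from the pattern.  */
static const char *const alt_type[N_ALTERNATIVES] = {
  "alus_imm", "alus_imm", "alus_sreg"
};

/* Output templates in the order used before and after the patch.  */
static const char *const old_template[N_ALTERNATIVES] = {
  "adds%?\\t%0, %1, %2", "adds%?\\t%0, %1, %2", "subs%?\\t%0, %1, #%n2"
};
static const char *const new_template[N_ALTERNATIVES] = {
  "adds%?\\t%0, %1, %2", "subs%?\\t%0, %1, #%n2", "adds%?\\t%0, %1, %2"
};

/* True if no register alternative is paired with a template that prints
   operand 2 as a negated immediate ("#%n2").  */
bool
templates_consistent (const char *const templates[N_ALTERNATIVES])
{
  for (int i = 0; i < N_ALTERNATIVES; i++)
    if (strcmp (alt_type[i], "alus_sreg") == 0
        && strstr (templates[i], "#%n2") != NULL)
      return false;
  return true;
}
```

In this model the old ordering pairs the register alternative with the
"#%n2" template, which is the inconsistency the patch removes.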


This is triggered by my experimental backend changes. Branches 5 and 4.9 
both have this problem.


arm-none-linux-gnueabihf bootstrap Okay, arm-none-eabi regression test Okay.

Okay to commit into trunk and backport to branches 5 and 4.9?

Regards,
Renlin Li

gcc/ChangeLog:

2015-11-12  Renlin Li  <renlin...@arm.com>

* config/arm/arm.md (addsi3_compare_op2): Make the order of
assembly pattern consistent with constraint order.
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 8ebb1bf..73c3088 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -747,8 +747,8 @@
   "TARGET_32BIT"
   "@
adds%?\\t%0, %1, %2
-   adds%?\\t%0, %1, %2
-   subs%?\\t%0, %1, #%n2"
+   subs%?\\t%0, %1, #%n2
+   adds%?\\t%0, %1, %2"
   [(set_attr "conds" "set")
(set_attr "type" "alus_imm,alus_imm,alus_sreg")]
 )


Re: [PING][PATCH][4.9]Backport "Fix register corruption bug in ree"

2015-11-06 Thread Renlin Li

Hi Richard,

I am trying to come up with a simple test case, but have failed to do so.

I have isolated the function where the bug is triggered. However, in order 
to make it a useful test case, the silently corrupted register should be 
used somewhere, and should affect the correctness of the program. I 
cannot create such a context.



libstdc++-v3/testsuite/tr1/8_c_compatibility/complex/50880.cc is a 
concrete example which triggers this bug.


Here is a short description of why the test case fails:

wrong code-gen in ree pass:

(insn 279 278 531 2 (set (reg:TF 32 v0)
        (zero_extend:TF (reg:DI 19 x19 [orig:150 __x+8 ] [150]))) /libstdc++-v3/include/complex:861 691 {aarch64_movtflow_di}
     (nil))
(insn 531 279 275 2 (set (reg:TF 3 x3)
        (reg:TF 32 v0)) /libstdc++-v3/include/complex:861 -1
     (nil))
(insn 275 531 510 2 (set (reg:DI 23 x23 [orig:151 __y ] [151])
        (mem:DI (plus:DI (reg/v/f:DI 0 x0 [orig:104 __z ] [104])
                (const_int 16 [0x10])) [9 MEM[(const long double &)__z_4(D) + 16]+0 S8 A128])) /libstdc++-v3/include/complex:859 34 {*movdi_aarch64}
     (nil))
(insn 510 275 19 2 (set (zero_extract:TF (reg:TF 32 v0)
            (const_int 64 [0x40])
            (const_int 64 [0x40]))
        (zero_extend:TF (reg:DI 2 x2 [ __x+-8 ]))) /libstdc++-v3/include/complex:861 689 {aarch64_movtfhigh_di}
     (nil))


insn 531 sets x3 in TFmode, so it clobbers x4 as well. That's exactly the 
same problem described in Wilco's email. The following is what the correct 
code should look like. It's generated after applying Wilco's fix.
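
To see why x4 is affected, it may help to count the hard registers a mode
occupies. The sketch below is an illustration only; hard_regs_needed is a
hypothetical helper, and it assumes a 16-byte TFmode value held in 8-byte
general registers.

```c
/* Number of consecutive hard registers needed to hold a value of
   MODE_BYTES bytes in registers of REG_BYTES bytes each.  */
unsigned
hard_regs_needed (unsigned mode_bytes, unsigned reg_bytes)
{
  return (mode_bytes + reg_bytes - 1) / reg_bytes;
}
```

Under those assumptions a TFmode value starting at x3 needs two registers
(x3 and x4), whereas the DImode copy in the corrected code needs only x3.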



(insn 279 278 275 2 (set (reg:DI 3 x3 [orig:185 __x ] [185])
        (reg:DI 19 x19 [orig:150 __x+8 ] [150])) /libstdc++-v3/include/complex:861 34 {*movdi_aarch64}
     (nil))
(insn 275 279 509 2 (set (reg:DI 23 x23 [orig:151 __y ] [151])
        (mem:DI (plus:DI (reg/v/f:DI 0 x0 [orig:104 __z ] [104])
                (const_int 16 [0x10])) [9 MEM[(const long double &)__z_4(D) + 16]+0 S8 A128])) /libstdc++-v3/include/complex:859 34 {*movdi_aarch64}
     (nil))
(insn 509 275 510 2 (set (reg:TF 32 v0)
        (zero_extend:TF (reg:DI 3 x3 [orig:185 __x ] [185]))) /libstdc++-v3/include/complex:861 691 {aarch64_movtflow_di}
     (nil))
(insn 510 509 19 2 (set (zero_extract:TF (reg:TF 32 v0)
            (const_int 64 [0x40])
            (const_int 64 [0x40]))
        (zero_extend:TF (reg:DI 2 x2 [ __x+-8 ]))) /libstdc++-v3/include/complex:861 689 {aarch64_movtfhigh_di}
     (nil))



Regards,
Renlin Li

On 29/10/15 16:33, Richard Biener wrote:

On October 29, 2015 4:37:08 PM GMT+01:00, Ramana Radhakrishnan 
<ramana@googlemail.com> wrote:

On Thu, Jun 4, 2015 at 2:16 PM, Renlin Li <renlin...@arm.com> wrote:

Ping ~

Can somebody review it? Thank you!

Regards,
Renlin Li


On 16/04/15 10:32, Renlin Li wrote:

Ping~

Regards,
Renlin Li

On 16/04/15 10:09, wrote:

Ping~
Anybody has time to review it?


Regards,
Renlin Li

On 06/02/15 17:48, Renlin Li wrote:

Hi all,

This is a backport patch for branch 4.9. You can find the original
patch here: https://gcc.gnu.org/ml/gcc-patches/2014-09/msg00356.html

It has been committed on the trunk as r215205.

This fixes a few libstdc++-v3 test suite failures.
x86_64 bootstraps Okay, aarch64_be-none-elf libstdc++-v3 tested Okay.

Okay to commit on branch 4.9?

Regards,
Renlin Li

2015-02-06  Renlin Li<renlin...@arm.com>

  Backport from mainline
  2014-09-12  Wilco Dijkstra<wilco.dijks...@arm.com>

  * gcc/ree.c (combine_reaching_defs): Ensure inserted copy don't change
  the number of hard registers.

richi - an RM question -

Is this something that can be pulled back to GCC 4.9 branch assuming
testing still shows no regressions  - it breaks aarch64 be on GCC 4.9

If it is a regression or wrong-code fix its OK with a test case.

Richard.


...


regards
Ramana






Re: RFA: PATCH to store_field for storing a CONSTRUCTOR into a base subobject

2015-10-29 Thread Renlin Li

Hi Jason,

On 08/10/15 03:42, Jason Merrill wrote:
While looking at another issue I noticed that in g++.dg/init/vbase1.C 
the Diamond(int) constructor was unnecessarily storing a CONSTRUCTOR 
into a stack temporary and then copying it into the SubB base 
subobject rather than directly storing the CONSTRUCTOR. It was doing 
this because the base subobject is smaller than a complete object of 
SubB, but we already know how to store a CONSTRUCTOR into a space 
smaller than it would normally take without needing to introduce an 
extra temporary. This patch fixes that.


Tested x86_64-pc-linux-gnu.  OK for trunk?


The new test added to g++.dg/init/vbase1.C fails on the arm target.

It assumes that the target provides a pattern to store 0 into memory, at 
least during the expand stage. This is fine on aarch64 and mips, as they 
provide a zero register to do that.


But for targets which only allow register stores and don't have a zero 
register, this check is inapplicable. During the expand stage, the 
immediate might be forced into a register. This is what happens on the arm 
and rs6000 targets, AFAIK.



Regards,
Renlin



Re: [PATCH]Add -fprofile-use option for check_effective_target_freorder.

2015-10-27 Thread Renlin Li



On 26/10/15 13:24, Bernd Schmidt wrote:

On 10/26/2015 02:17 PM, Teresa Johnson wrote:

On Mon, Oct 26, 2015 at 2:00 AM, Renlin Li <renlin...@arm.com> wrote:

 * lib/target-supports.exp (check_effective_target_freorder): Add
 -fprofile-use flag.


Hmmm, the testcases themselves which use this predicate only use 
-freorder-blocks-and-partition, so maybe this requires more thought.


Yes. In all of the related testcases, only the -freorder-blocks-and-partition 
flag is provided explicitly.

How about creating a new dg-add-options for freorder?

proc add_options_for_freorder { flags } {
    return "$flags -freorder-blocks-and-partition -fprofile-use"
}


proc check_effective_target_freorder {} {
    return [check_no_compiler_messages freorder object {
        void foo (void) { }
    } [add_options_for_freorder ""]]
}

And in all test cases which require freorder support, "{ dg-add-options 
freorder }" is used to add the necessary compiler flags?



Regards,
Renlin



Bernd





[PATCH]Add -fprofile-use option for check_effective_target_freorder.

2015-10-26 Thread Renlin Li

Hi all,

After r228136, flag_reorder_blocks_and_partition is cancelled when 
-fprofile-use is not specified.


In this case check_effective_target_freorder() is not able to check for 
proper target support.
This is a simple patch to add the "-fprofile-use" option to that effective 
target check.

Okay to commit on the trunk?

Regards,
Renlin Li

gcc/testsuite/ChangeLog:

2015-10-26  Renlin Li  <renlin...@arm.com>

* lib/target-supports.exp (check_effective_target_freorder): Add
-fprofile-use flag.
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index b543519..0dc13be 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -960,11 +960,12 @@ proc check_effective_target_fstack_protector {} {
 
 # Return 1 if compilation with -freorder-blocks-and-partition is error-free
 # for trivial code, 0 otherwise.
+# -freorder-blocks-and-partition has no effect if given without -fprofile-use.
 
 proc check_effective_target_freorder {} {
 return [check_no_compiler_messages freorder object {
 	void foo (void) { }
-} "-freorder-blocks-and-partition"]
+} "-freorder-blocks-and-partition -fprofile-use"]
 }
 
 # Return 1 if -fpic and -fPIC are supported, as in no warnings or errors


Re: [PR67383][ARM][4.9]Backport of "Allow any register for DImode values in Thumb2"

2015-10-16 Thread Renlin Li

Hi Ramana,

On 16/10/15 11:52, Ramana Radhakrishnan wrote:

On Thu, Oct 15, 2015 at 03:01:24PM +0100, Renlin Li wrote:

Hi all,

This is a backport patch to loosen restrictions on core registers
for DImode values in Thumb2.

It fixes PR67383. In this particular case, reload tries to spill a
hard register, and use next register together as a pair to reload a
DImode pseudo register. However, the spilled register number is
odd. This is rejected by arm_hard_regno_mode_ok(). There is no other
register available, so the compiler throws an ICE.

I was not convinced enough by the reasoning provided in the description
because this patch was intended to be a bit of an optimization
rather than a correctness fix.


True, it's not a fix. It just allows more flexibility in register 
allocation.



The command line implies we remove r7 (frame pointer in Thumb2 - historical 
accident, fno-omit-frame-pointer), r9 (ffixed-r9), r10 (-mpic-register) which
leaves us with:

* r0, r1
* r2, r3
* r4, r5

as the only free registers available for DImode values for the whole 
compilation.

We then have r0, r1 and r2 live across the insn which means that there are no 
free registers to handle DImode values
under the constraints provided unless LRA / reload can spill the argument 
registers which it doesn't seem to be able to do
in this particular testcase. Vlad, is that correct ?
According to the logic, conflicting hard registers are excluded from the 
spill candidates. That's why, in this case, r0, r1, r2 cannot be used.



Then I wondered why the same problem did not occur in ARM state given that has 
the same restriction.
In ARM state life is a bit better because the Frame pointer is r11 which means 
you pretty much have r6 and r7
as well available in addition to the above list, which means that theoretically 
you can
get away with this in ARM state.

Can you do some more comparison with ARM state as to why we don't have the same 
issue there ?


Presumably, ARM state should suffer from the same issue. I will have a look.

Regards,
Renlin

The test case in PR67383 is too big, so I didn't include it as part
of the patch.

I've put up a reduced testcase on the bz, the one I was using to debug.


arm-none-eabi regression test Okay without any new issues. Okay to
backport to 4.9?

For changes of this nature please bootstrap and regression test this in arm and 
thumb2 state as well please.

regards
Ramana





[PR67383][ARM][4.9]Backport of "Allow any register for DImode values in Thumb2"

2015-10-15 Thread Renlin Li

Hi all,

This is a backport patch to loosen restrictions on core registers for 
DImode values in Thumb2.


It fixes PR67383. In this particular case, reload tries to spill a hard 
register, and to use it together with the next register as a pair to reload 
a DImode pseudo register. However, the spilled register number is odd. This 
is rejected by arm_hard_regno_mode_ok(). There is no other register 
available, so the compiler throws an ICE.
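
The effect of the change on core registers can be sketched with a
simplified model. This is not the real arm_hard_regno_mode_ok: the function
names are hypothetical, the target flags are passed as plain booleans, and
a core register is assumed to hold 4 bytes.

```c
#include <stdbool.h>

/* Simplified, hypothetical stand-in for ARM_NUM_REGS: number of 4-byte
   core registers needed for a mode of MODE_BYTES bytes.  */
static unsigned
num_regs (unsigned mode_bytes)
{
  return (mode_bytes + 3) / 4;
}

/* Before the patch: with LDRD, doubleword values need an even register.  */
bool
core_reg_ok_old (unsigned regno, unsigned mode_bytes, bool target_ldrd)
{
  return !(target_ldrd && mode_bytes > 4 && (regno & 1) != 0)
         && num_regs (mode_bytes) <= 4;
}

/* After the patch: Thumb2 accepts any starting register.  */
bool
core_reg_ok_new (unsigned regno, unsigned mode_bytes, bool target_ldrd,
                 bool target_thumb2)
{
  if (num_regs (mode_bytes) > 4)
    return false;
  if (target_thumb2)
    return true;
  return !(target_ldrd && mode_bytes > 4 && (regno & 1) != 0);
}
```

In this model an 8-byte (DImode) value starting at an odd register such as
r3 is rejected by the old check but accepted by the new one in Thumb2,
while ARM state keeps the even-pair restriction.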



The test case in PR67383 is too big, so I didn't include it as part of 
the patch.
arm-none-eabi regression test Okay without any new issues. Okay to 
backport to 4.9?


Regards,
Renlin Li

gcc/ChangeLog:

2015-10-15  Renlin Li  <renlin...@arm.com>

PR target/67383
Backport from mainline.
2014-04-22  Ramana Radhakrishnan <ramana.radhakrish...@arm.com>

* config/arm/arm.c (arm_hard_regno_mode_ok): Loosen
restrictions on core registers for DImode values in Thumb2.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 08b5255..88d957a 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -22646,12 +22646,19 @@ arm_hard_regno_mode_ok (unsigned int regno, enum machine_mode mode)
 }
 
   /* We allow almost any value to be stored in the general registers.
- Restrict doubleword quantities to even register pairs so that we can
- use ldrd.  Do not allow very large Neon structure opaque modes in
- general registers; they would use too many.  */
+ Restrict doubleword quantities to even register pairs in ARM state
+ so that we can use ldrd.  Do not allow very large Neon structure
+ opaque modes in general registers; they would use too many.  */
   if (regno <= LAST_ARM_REGNUM)
-return !(TARGET_LDRD && GET_MODE_SIZE (mode) > 4 && (regno & 1) != 0)
-  && ARM_NUM_REGS (mode) <= 4;
+{
+  if (ARM_NUM_REGS (mode) > 4)
+	  return FALSE;
+
+  if (TARGET_THUMB2)
+	return TRUE;
+
+  return !(TARGET_LDRD && GET_MODE_SIZE (mode) > 4 && (regno & 1) != 0);
+}
 
   if (regno == FRAME_POINTER_REGNUM
   || regno == ARG_POINTER_REGNUM)


[PATCH][ARM]Add earlyclobber modifier for neon_(vtrn, vuzp, vzip)_insn rtx pattern.

2015-10-06 Thread Renlin Li

Hi all,

Previously, the compiler would generate the following pattern, which 
causes an ICE during the postreload pass. Moreover, according to the ARM 
instruction manual, the instruction itself produces an UNKNOWN result when 
the source and destination registers are the same. The same rule applies 
to the vtrn and vzip patterns.


(insn 50 71 106 3 (parallel [
(set (reg:V2SI 48 d16 [172])
(unspec:V2SI [
(reg:V2SI 48 d16 [172])
(reg:V2SI 48 d16 [172])
] UNSPEC_VUZP1))
(set (reg:V2SI 48 d16 [172])
(unspec:V2SI [
(reg:V2SI 48 d16 [172])
(reg:V2SI 48 d16 [172])
] UNSPEC_VUZP2))
]) /src/gcc/gcc/testsuite/gcc.dg/vect/pr37474.c:21 1991 {*neon_vuzpv2si_insn}
 (nil))

The ICE is triggered when compiling gcc.dg/vect/pr37474.c using 
arm-none-linux-gnueabihf toolchain.


Adding the "&" modifier to operands[0] and operands[2] explicitly 
prevents those two register operands from being assigned the same register.
I made the same changes to the neon_vtrn_insn and neon_vzip_insn 
patterns as well.


arm-none-linux-gnueabihf regression tests Okay. Okay to commit?

Regards,
Renlin Li

gcc/ChangeLog:

2015-10-06  Renlin Li  <renlin...@arm.com>

* config/arm/neon.md (neon_vuzp_insn): Add & modifier for
operands[0] and operands[2].
(neon_vtrn_insn): Likewise.
(neon_vzip_insn): Likewise.

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 2667866..e5a2b0f 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -4074,11 +4074,11 @@ if (BYTES_BIG_ENDIAN)
 
 ;; Note: Different operand numbering to handle tied registers correctly.
 (define_insn "*neon_vtrn_insn"
-  [(set (match_operand:VDQW 0 "s_register_operand" "=w")
+  [(set (match_operand:VDQW 0 "s_register_operand" "=&w")
 (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0")
   (match_operand:VDQW 3 "s_register_operand" "2")]
  UNSPEC_VTRN1))
-   (set (match_operand:VDQW 2 "s_register_operand" "=w")
+   (set (match_operand:VDQW 2 "s_register_operand" "=&w")
  (unspec:VDQW [(match_dup 1) (match_dup 3)]
  UNSPEC_VTRN2))]
   "TARGET_NEON"
@@ -4100,11 +4100,11 @@ if (BYTES_BIG_ENDIAN)
 
 ;; Note: Different operand numbering to handle tied registers correctly.
 (define_insn "*neon_vzip_insn"
-  [(set (match_operand:VDQW 0 "s_register_operand" "=w")
+  [(set (match_operand:VDQW 0 "s_register_operand" "=&w")
 (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0")
   (match_operand:VDQW 3 "s_register_operand" "2")]
  UNSPEC_VZIP1))
-   (set (match_operand:VDQW 2 "s_register_operand" "=w")
+   (set (match_operand:VDQW 2 "s_register_operand" "=&w")
 (unspec:VDQW [(match_dup 1) (match_dup 3)]
  UNSPEC_VZIP2))]
   "TARGET_NEON"
@@ -4126,11 +4126,11 @@ if (BYTES_BIG_ENDIAN)
 
 ;; Note: Different operand numbering to handle tied registers correctly.
 (define_insn "*neon_vuzp_insn"
-  [(set (match_operand:VDQW 0 "s_register_operand" "=w")
+  [(set (match_operand:VDQW 0 "s_register_operand" "=&w")
 (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0")
   (match_operand:VDQW 3 "s_register_operand" "2")]
  UNSPEC_VUZP1))
-   (set (match_operand:VDQW 2 "s_register_operand" "=w")
+   (set (match_operand:VDQW 2 "s_register_operand" "=&w")
 (unspec:VDQW [(match_dup 1) (match_dup 3)]
  UNSPEC_VUZP2))]
   "TARGET_NEON"


[PR66776][PATCH][AARCH64] Add cmovdi_insn_uxtw pattern.

2015-10-02 Thread Renlin Li

Hi all,

This is a simple patch to add a new cmovdi_insn_uxtw rtx pattern to 
aarch64 backend.


For the following simple test case:

unsigned long long
foo (unsigned int a, unsigned int b, unsigned int c)
{
  return a ? b : c;
}

With this new pattern, the new code generation will be:

        cmp     w0, wzr
        csel    w0, w1, w2, ne
        ret

Without the patch, the old code generation is like this:

        uxtw    x2, w2
        uxtw    x1, w1
        cmp     w0, wzr
        csel    x0, x2, x1, eq
        ret
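
The two sequences compute the same value, which can be checked with a small
C model. This is only a sketch of the arithmetic, not compiler code; the
function names are hypothetical, and unsigned int is assumed to be 32 bits
wide as on aarch64.

```c
#include <stdint.h>

/* New sequence: select in 32 bits, then rely on the implicit
   zero-extension of a write to a w-register.  */
uint64_t
select_then_extend (unsigned int a, unsigned int b, unsigned int c)
{
  uint32_t narrow = a ? b : c;   /* csel w0, w1, w2, ne */
  return (uint64_t) narrow;      /* upper 32 bits are zero */
}

/* Old sequence: zero-extend both inputs, then select in 64 bits.  */
uint64_t
extend_then_select (unsigned int a, unsigned int b, unsigned int c)
{
  uint64_t wide_b = b;           /* uxtw */
  uint64_t wide_c = c;           /* uxtw */
  return a ? wide_b : wide_c;    /* csel on x-registers */
}
```

Since zero-extension commutes with the select, the two uxtw instructions
are redundant.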


aarch64-none-elf regression test Okay. Okay to commit?

Regards,
Renlin Li

gcc/ChangeLog:

2015-10-02  Renlin Li  <renlin...@arm.com>

PR target/66776
* config/aarch64/aarch64.md (cmovdi_insn_uxtw): New pattern.

gcc/testsuite/ChangeLog:

2015-10-02  Renlin Li  <renlin...@arm.com>

PR target/66776
* gcc.target/aarch64/pr66776.c: New.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c3cd58d..20681cd 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3010,6 +3010,18 @@
   [(set_attr "type" "csel")]
 )
 
+(define_insn "*cmovdi_insn_uxtw"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	(if_then_else:DI
+	 (match_operator 1 "aarch64_comparison_operator"
+	  [(match_operand 2 "cc_register" "") (const_int 0)])
+	 (zero_extend:DI (match_operand:SI 3 "register_operand" "r"))
+	 (zero_extend:DI (match_operand:SI 4 "register_operand" "r"))))]
+  ""
+  "csel\\t%w0, %w3, %w4, %m1"
+  [(set_attr "type" "csel")]
+)
+
 (define_insn "*cmov_insn"
   [(set (match_operand:GPF 0 "register_operand" "=w")
 	(if_then_else:GPF
diff --git a/gcc/testsuite/gcc.target/aarch64/pr66776.c b/gcc/testsuite/gcc.target/aarch64/pr66776.c
new file mode 100644
index 000..a5c83b4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr66776.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 --save-temps" } */
+
+unsigned long long
+foo (unsigned int a, unsigned int b, unsigned int c)
+{
+  return a ? b : c;
+}
+
+/* { dg-final { scan-assembler-not "uxtw" } } */


Re: [PATCH] Improve DOM's optimization of control statements

2015-10-02 Thread Renlin Li

Hi Jeff,

Your patch causes an ICE regression.
The test case is "gcc.c-torture/compile/pr27087.c"; I observed it on the 
aarch64-none-elf target when compiling the test case with the '-Os' flag.


A quick check shows that the CFG has been changed, but the loop information 
is not updated. Thus the information about the number of basic blocks in 
a loop is not reliable.


Could you please have a look?

Regards,
Renlin

On 30/09/15 21:28, Jeff Law wrote:

Until now DOM has had to be very conservative with handling control
statements with known conditions.  This has been an unfortunate side
effect of the interaction between removing edges and recycling names via
the SSA_NAME manager.

Essentially DOM would have to leave control statements alone.  So you'd
see stuff like

if (0 == 0)

left around by DOM.  The jump threader would thankfully come along and
optimize that as a jump thread.  But that's terribly inefficient, not to
mention it creates unnecessary churn in the CFG and SSA_NAMEs.

By optimizing that directly in DOM, including removing whatever edges
are not executable, we no longer have to rely on jump threading to
handle that case.  Less churn in the CFG & SSA_NAMEs.   There's also
some chance for secondary optimizations with fewer edges left in the CFG
for DOM to consider.

Unfortunately, the churn caused by jump threading made it excessively
difficult to analyze before/after dumps.  Sadly, you can have the same
code, but if the SSA_NAMEs have changed, that impacts coalescing as we
leave SSA.  Churn in the CFG changes labels/jumps, often without
changing the actual structure, etc.

I did some tests with valgrind to evaluate branching behaviour
before/after effects on the resulting code and those effects were tiny,
in the "I doubt you could measure them" range.  That was expected since
what we're really doing here is just capturing the optimization earlier.

I had a couple more tests, but they were lost in a bit of idiocy.  The
test included is the one I had a second copy of lying around.

Bootstrapped and regression tested on x86_64-linux-gnu.  Installed on
the trunk.

Jeff




[PATCH][AARCH64]Add csneg3_uxtw_insn pattern

2015-10-02 Thread Renlin Li

Hi all,

This is a simple patch to add csneg3_uxtw_insn to the aarch64 backend. It 
saves one uxtw instruction, as a write to a 32-bit w-register implicitly 
zero-extends the value to the full 64 bits of the x-register.
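
To illustrate the semantics the pattern relies on, here is a minimal C sketch (not part of the patch). The helpers csneg32 and write_w_register are hypothetical models of the instruction and of the register write; they are not GCC or ACLE APIs.

```c
#include <stdint.h>

/* Hypothetical model of "csneg Wd, Wn, Wm, cond":
   Wd = cond ? Wn : -Wm, computed in 32 bits.  */
static uint32_t
csneg32 (int cond, uint32_t wn, uint32_t wm)
{
  return cond ? wn : (uint32_t) (0u - wm);
}

/* Hypothetical model of writing a w-register: the upper 32 bits of the
   corresponding x-register become zero, so no separate uxtw is needed.  */
static uint64_t
write_w_register (uint32_t value)
{
  return (uint64_t) value;
}

/* Same shape as the new test: a 32-bit conditional negate whose result
   is returned as a 64-bit value.  */
uint64_t
csneg_then_extend (uint32_t a, uint32_t b, uint32_t c)
{
  return write_w_register (csneg32 (a != 0, b, c));
}
```

In this model the zero extension costs nothing beyond the 32-bit write itself, which is the saving the new pattern is meant to expose.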

The aarch64-none-elf regression test passes without any issues. Okay to commit?

Regards,
Renlin Li


gcc/ChangeLog:

2015-10-02  Renlin Li <renlin...@arm.com>

* config/aarch64/aarch64.md (csneg3_uxtw_insn): New pattern.

gcc/testsuite/ChangeLog:

2015-10-02  Renlin Li <renlin...@arm.com>

* gcc.target/aarch64/csneg-1.c: Update test.
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c3cd58d..373f2d5 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3132,6 +3132,18 @@
   [(set_attr "type" "csel")]
 )
 
+(define_insn "csneg3_uxtw_insn"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	(zero_extend:DI
+	  (if_then_else:SI
+	(match_operand 1 "aarch64_comparison_operation" "")
+	(neg:SI (match_operand:SI 2 "register_operand" "r"))
+	(match_operand:SI 3 "aarch64_reg_or_zero" "rZ"))))]
+  ""
+  "csneg\\t%w0, %w3, %w2, %M1"
+  [(set_attr "type" "csel")]
+)
+
 (define_insn "csneg3_insn"
   [(set (match_operand:GPI 0 "register_operand" "=r")
 (if_then_else:GPI
diff --git a/gcc/testsuite/gcc.target/aarch64/csneg-1.c b/gcc/testsuite/gcc.target/aarch64/csneg-1.c
index 29d4e4e..4860d64 100644
--- a/gcc/testsuite/gcc.target/aarch64/csneg-1.c
+++ b/gcc/testsuite/gcc.target/aarch64/csneg-1.c
@@ -56,3 +56,15 @@ int test_csneg_cmp(int x)
 x = -x;
   return x;
 }
+
+unsigned long long
+test_csneg_uxtw (unsigned int a,
+		 unsigned int b,
+		 unsigned int c)
+{
+  /* { dg-final { scan-assembler "csneg\tw\[0-9\]*.*ne" } } */
+  /* { dg-final { scan-assembler-not "uxtw\tw\[0-9\]*.*" } } */
+  unsigned int val;
+  val = a ? b: -c;
+  return val;
+}



Re: [0/7] Type promotion pass and elimination of zext/sext

2015-09-08 Thread Renlin Li

Hi Andrew,

There was an earlier discussion thread on the binutils mailing list:

https://sourceware.org/ml/binutils/2015-04/msg00032.html

Nick proposed a way to fix it; Richard Henderson holds a similar opinion to yours.

Regards,
Renlin

On 07/09/15 12:45, pins...@gmail.com wrote:





On Sep 7, 2015, at 7:22 PM, Kugan  wrote:



On 07/09/15 20:46, Wilco Dijkstra wrote:

Kugan wrote:
2. vector-compare-1.c from c-c++-common/torture fails to assemble with
-O3 -g: "Error: unaligned opcodes detected in executable segment". It works
fine if I remove the -g. I am looking into it; it needs to be fixed as well.

This is a known assembler bug I found a while back, Renlin is looking into it.
Basically when debug tables are inserted at the end of a code section the
assembler doesn't align to the alignment required by the debug tables.

This is precisely what seems to be happening. Renlin, could you please
let me know when you have a patch (even if it is a prototype or a hack).


I had noticed that, but I read through the assembler code and it sounded very 
much like it was designed this way and that the compiler was not supposed to 
emit assembly like this and fix up the alignment.

Thanks,
Andrew


Thanks,
Kugan



