Re: One more path to fix PR70478

2017-04-12 Thread Christophe Lyon
On 11 April 2017 at 21:43, Vladimir Makarov  wrote:
>
>
> On 04/11/2017 03:30 AM, Christophe Lyon wrote:
>>
>> Hi Vladimir,
>>
>> On 10 April 2017 at 17:05, Vladimir Makarov  wrote:
>>>
>>>This is the second try to fix
>>>
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70478
>>>
>>>The first try patch triggered a latent bug and broke one Fortran
>>> testcase
>>> on x86-64.
>>>
>>>The patch was successfully bootstrapped on x86-64 and tested on
>>> x86-64,
>>> ppc64, and aarch64.
>>>
>>>Committed as rev. 246808.
>>>
>>>
>> This patch causes regression on arm*hf configurations:
>>Executed from: gcc.target/arm/arm.exp
>>  gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times
>> ldrh\\tr[0-9]+ 2
>>  gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times
>> strh\\tr[0-9]+ 2
>>  gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times
>> vld1\\.16\\t{d[0-9]+\\[[0-9]+\\]}, \\[r[0-9]+\\] 2
>>  gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times
>> vmov\\.f16\\tr[0-9]+, s[0-9]+ 4
>>  gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times
>> vmov\\.f16\\ts[0-9]+, r[0-9]+ 4
>>  gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times
>> vst1\\.16\\t{d[0-9]+\\[[0-9]+\\]}, \\[r[0-9]+\\] 2
>>
>>
> I've committed a patch which is supposed to fix the regression.
>

I confirm it's now OK. Thanks for the prompt fix!

Christophe


Re: One more path to fix PR70478

2017-04-11 Thread Vladimir Makarov



On 04/11/2017 03:30 AM, Christophe Lyon wrote:

Hi Vladimir,

On 10 April 2017 at 17:05, Vladimir Makarov  wrote:

   This is the second try to fix

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70478

   The first try patch triggered a latent bug and broke one Fortran testcase
on x86-64.

   The patch was successfully bootstrapped on x86-64 and tested on x86-64,
ppc64, and aarch64.

   Committed as rev. 246808.



This patch causes regression on arm*hf configurations:
   Executed from: gcc.target/arm/arm.exp
 gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times ldrh\\tr[0-9]+ 2
 gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times strh\\tr[0-9]+ 2
 gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times
vld1\\.16\\t{d[0-9]+\\[[0-9]+\\]}, \\[r[0-9]+\\] 2
 gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times
vmov\\.f16\\tr[0-9]+, s[0-9]+ 4
 gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times
vmov\\.f16\\ts[0-9]+, r[0-9]+ 4
 gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times
vst1\\.16\\t{d[0-9]+\\[[0-9]+\\]}, \\[r[0-9]+\\] 2



I've committed a patch which is supposed to fix the regression.



Re: One more path to fix PR70478

2017-04-11 Thread Vladimir Makarov



On 04/11/2017 11:42 AM, Vladimir Makarov wrote:



Yes, Christophe.  It would be helpful.  I've tried to reproduce it but 
I don't see the difference in the generated code.



Never mind.  I've reproduced it.  Thanks.


Re: One more path to fix PR70478

2017-04-11 Thread Ramana Radhakrishnan
On Tue, Apr 11, 2017 at 5:26 PM, Christophe Lyon
 wrote:
> On 11 April 2017 at 17:42, Vladimir Makarov  wrote:
>>
>>
>> On 04/11/2017 03:30 AM, Christophe Lyon wrote:
>>>
>>> Hi Vladimir,
>>>
>>> On 10 April 2017 at 17:05, Vladimir Makarov  wrote:

This is the second try to fix

 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70478

The first try patch triggered a latent bug and broke one Fortran
 testcase
 on x86-64.

The patch was successfully bootstrapped on x86-64 and tested on
 x86-64,
 ppc64, and aarch64.

Committed as rev. 246808.


>>> I would have to re--run the build/test manually to get the generated
>>> code, let me know if it's needed.
>>
>> Yes, Christophe.  It would be helpful.  I've tried to reproduce it but I
>> don't see the difference in the generated code.
>>
>
> Here is what I observed (the "with-patch file is with your commit r246808,
> the other is r246807)
>
> --- armv8_2-fp16-move-1.s   2017-04-11 16:23:46.795264234 +
> +++ armv8_2-fp16-move-1.s.with-patch2017-04-11 15:54:52.563210963 +
> @@ -37,8 +37,8 @@
> @ frame_needed = 0, uses_anonymous_args = 0
> @ link register save eliminated.
> lsl r1, r1, #1
> -   add r3, r0, r1
> -   vld1.16 {d0[0]}, [r3]
> +   ldrhr3, [r0, r1]@ __fp16
> +   vmov.f16s0, r3  @ __fp16
> bx  lr
> .size   test_load_2, .-test_load_2
> .align  2
> @@ -64,9 +64,9 @@
> @ args = 0, pretend = 0, frame = 0
> @ frame_needed = 0, uses_anonymous_args = 0
> @ link register save eliminated.
> +   vmov.f16r3, s0  @ __fp16
> lsl r1, r1, #1
> -   add r3, r0, r1
> -   vst1.16 {d0[0]}, [r3]
> +   strhr3, [r0, r1]@ __fp16
> bx  lr
> .size   test_store_2, .-test_store_2
> .align  2
>

That's actually bad because we've now introduced additional moves
between the integer and FP register files. It could be something in
the backend but this is worth investigating further

Ramana

>
> Christophe


Re: One more path to fix PR70478

2017-04-11 Thread Christophe Lyon
On 11 April 2017 at 17:42, Vladimir Makarov  wrote:
>
>
> On 04/11/2017 03:30 AM, Christophe Lyon wrote:
>>
>> Hi Vladimir,
>>
>> On 10 April 2017 at 17:05, Vladimir Makarov  wrote:
>>>
>>>This is the second try to fix
>>>
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70478
>>>
>>>The first try patch triggered a latent bug and broke one Fortran
>>> testcase
>>> on x86-64.
>>>
>>>The patch was successfully bootstrapped on x86-64 and tested on
>>> x86-64,
>>> ppc64, and aarch64.
>>>
>>>Committed as rev. 246808.
>>>
>>>
>> I would have to re--run the build/test manually to get the generated
>> code, let me know if it's needed.
>
> Yes, Christophe.  It would be helpful.  I've tried to reproduce it but I
> don't see the difference in the generated code.
>

Here is what I observed (the "with-patch file is with your commit r246808,
the other is r246807)

--- armv8_2-fp16-move-1.s   2017-04-11 16:23:46.795264234 +
+++ armv8_2-fp16-move-1.s.with-patch2017-04-11 15:54:52.563210963 +
@@ -37,8 +37,8 @@
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
lsl r1, r1, #1
-   add r3, r0, r1
-   vld1.16 {d0[0]}, [r3]
+   ldrhr3, [r0, r1]@ __fp16
+   vmov.f16s0, r3  @ __fp16
bx  lr
.size   test_load_2, .-test_load_2
.align  2
@@ -64,9 +64,9 @@
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
+   vmov.f16r3, s0  @ __fp16
lsl r1, r1, #1
-   add r3, r0, r1
-   vst1.16 {d0[0]}, [r3]
+   strhr3, [r0, r1]@ __fp16
bx  lr
.size   test_store_2, .-test_store_2
.align  2


Christophe


Re: One more path to fix PR70478

2017-04-11 Thread Vladimir Makarov



On 04/11/2017 03:30 AM, Christophe Lyon wrote:

Hi Vladimir,

On 10 April 2017 at 17:05, Vladimir Makarov  wrote:

   This is the second try to fix

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70478

   The first try patch triggered a latent bug and broke one Fortran testcase
on x86-64.

   The patch was successfully bootstrapped on x86-64 and tested on x86-64,
ppc64, and aarch64.

   Committed as rev. 246808.



I would have to re--run the build/test manually to get the generated
code, let me know if it's needed.
Yes, Christophe.  It would be helpful.  I've tried to reproduce it but I 
don't see the difference in the generated code.




Re: One more path to fix PR70478

2017-04-11 Thread Christophe Lyon
Hi Vladimir,

On 10 April 2017 at 17:05, Vladimir Makarov  wrote:
>   This is the second try to fix
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70478
>
>   The first try patch triggered a latent bug and broke one Fortran testcase
> on x86-64.
>
>   The patch was successfully bootstrapped on x86-64 and tested on x86-64,
> ppc64, and aarch64.
>
>   Committed as rev. 246808.
>
>

This patch causes regression on arm*hf configurations:
  Executed from: gcc.target/arm/arm.exp
gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times ldrh\\tr[0-9]+ 2
gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times strh\\tr[0-9]+ 2
gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times
vld1\\.16\\t{d[0-9]+\\[[0-9]+\\]}, \\[r[0-9]+\\] 2
gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times
vmov\\.f16\\tr[0-9]+, s[0-9]+ 4
gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times
vmov\\.f16\\ts[0-9]+, r[0-9]+ 4
gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times
vst1\\.16\\t{d[0-9]+\\[[0-9]+\\]}, \\[r[0-9]+\\] 2

See 
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/246809/report-build-info.html

Is it just a matter of adjusting the testcases? (note that there is no
regression when forcing either:
-march=armv5t
-mthumb/-march=armv8-a/-mfpu=crypto-neon-fp-armv8/-mfloat-abi=hard
in the runtestflags.

I would have to re--run the build/test manually to get the generated
code, let me know if it's needed.

Thanks,

Christophe


One more path to fix PR70478

2017-04-10 Thread Vladimir Makarov

  This is the second try to fix

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70478

  The first try patch triggered a latent bug and broke one Fortran 
testcase on x86-64.


  The patch was successfully bootstrapped on x86-64 and tested on 
x86-64, ppc64, and aarch64.


  Committed as rev. 246808.


Index: ChangeLog
===
--- ChangeLog	(revision 246807)
+++ ChangeLog	(working copy)
@@ -1,3 +1,12 @@
+2017-04-10  Vladimir Makarov  
+
+	PR rtl-optimization/70478
+	* lra-constraints.c (curr_small_class_check): New.
+	(update_and_check_small_class_inputs): New.
+	(process_alt_operands): Update curr_small_class_check.  Disfavor
+	alternative insn memory operands.  Check available regs for small
+	class operands.
+
 2017-03-31  Matthew Fortune  
 
 	PR target/80057
Index: lra-constraints.c
===
--- lra-constraints.c	(revision 246789)
+++ lra-constraints.c	(working copy)
@@ -1852,6 +1852,42 @@ prohibited_class_reg_set_mode_p (enum re
 	  (temp, ira_prohibited_class_mode_regs[rclass][mode]));
 }
 
+
+/* Used to check validity info about small class input operands.  It
+   should be incremented at start of processing an insn
+   alternative.  */
+static unsigned int curr_small_class_check = 0;
+
+/* Update number of used inputs of class OP_CLASS for operand NOP.
+   Return true if we have more such class operands than the number of
+   available regs.  */
+static bool
+update_and_check_small_class_inputs (int nop, enum reg_class op_class)
+{
+  static unsigned int small_class_check[LIM_REG_CLASSES];
+  static int small_class_input_nums[LIM_REG_CLASSES];
+  
+  if (SMALL_REGISTER_CLASS_P (op_class)
+  /* We are interesting in classes became small because of fixing
+	 some hard regs, e.g. by an user through GCC options.  */
+  && hard_reg_set_intersect_p (reg_class_contents[op_class],
+   ira_no_alloc_regs)
+  && (curr_static_id->operand[nop].type != OP_OUT
+	  || curr_static_id->operand[nop].early_clobber))
+{
+  if (small_class_check[op_class] == curr_small_class_check)
+	small_class_input_nums[op_class]++;
+  else
+	{
+	  small_class_check[op_class] = curr_small_class_check;
+	  small_class_input_nums[op_class] = 1;
+	}
+  if (small_class_input_nums[op_class] > ira_class_hard_regs_num[op_class])
+	return true;
+}
+  return false;
+}
+
 /* Major function to choose the current insn alternative and what
operands should be reloaded and how.	 If ONLY_ALTERNATIVE is not
negative we should consider only this alternative.  Return false if
@@ -1952,6 +1988,7 @@ process_alt_operands (int only_alternati
   if (!TEST_BIT (preferred, nalt))
 	continue;
 
+  curr_small_class_check++;
   overall = losers = addr_losers = 0;
   static_reject = reject = reload_nregs = reload_sum = 0;
   for (nop = 0; nop < n_operands; nop++)
@@ -2685,6 +2722,21 @@ process_alt_operands (int only_alternati
 		}
 		}
 
+	  /* When we use memory operand, the insn should read the
+		 value from memory and even if we just wrote a value
+		 into the memory it is costly in comparison with an
+		 insn alternative which does not use memory
+		 (e.g. register or immediate operand).  */
+	  if (no_regs_p && offmemok)
+		{
+		  if (lra_dump_file != NULL)
+		fprintf
+		  (lra_dump_file,
+		   "Using memory insn operand %d: reject+=3\n",
+		   nop);
+		  reject += 3;
+		}
+	  
 #ifdef SECONDARY_MEMORY_NEEDED
 	  /* If reload requires moving value through secondary
 		 memory, it will need one more insn at least.  */
@@ -2747,6 +2799,14 @@ process_alt_operands (int only_alternati
   goto fail;
 }
 
+	  if (update_and_check_small_class_inputs (nop, this_alternative))
+	{
+	  if (lra_dump_file != NULL)
+		fprintf (lra_dump_file,
+			 "alt=%d, not enough small class regs -- refuse\n",
+			 nalt);
+	  goto fail;
+	}
 	  curr_alt[nop] = this_alternative;
 	  COPY_HARD_REG_SET (curr_alt_set[nop], this_alternative_set);
 	  curr_alt_win[nop] = this_alternative_win;