Re: Re : add tsv110 pipeline scheduling

2019-04-08 Thread James Greenhalgh
Thank you for the ChangeLog entry for your patch.

I have applied it to trunk as revision 270212.

We're very late in GCC 9 development, but this patch only impacts TSV
scheduling.

Thanks,
James


On Thu, Apr 04, 2019 at 02:11:12AM +0100, wuyuan (E) wrote:
> Hi James,
> Thank you for your review. Please attach the following author
> information to the patch.
> 
> 2019-04-04  wu yuan 
> 
>   * config/aarch64/aarch64-cores.def (tsv1100): Change scheduling model.
>   * config/aarch64/aarch64.md : Add "tsv110.md"
>   * config/aarch64/tsv110.md: New file.
>   
> Thanks,
>   wuyuan
> 
> -----Original Message-----
> From: James Greenhalgh [mailto:james.greenha...@arm.com]
> Sent: 2019-04-04 1:58
> To: wuyuan (E)
> Cc: Kyrill Tkachov ; gcc-patches@gcc.gnu.org;
> Zhangyichao (AB) ; Zhanghaijian (A)
> ; nd
> Subject: Re: Re : add tsv110 pipeline scheduling
> 
> On Tue, Apr 02, 2019 at 03:26:22PM +0100, wuyuan (E) wrote:
> > Hi James,
> > Has the submitted patch been merged into trunk? I look forward to
> > your reply. Thank you very much!
> > 
> > 
> > 
> > Best Regards,
> > 
> > wuyuan
> 
> Hi Wuyuan,
> 
> This patch is OK for trunk. Thank you for your many clarifications.
> 
> Will you need one of us to apply this to trunk on your behalf?
> 
> If you would like me to apply your patch, please provide the full ChangeLog 
> with author information, like so:
> 
> 2019-04-03  James Greenhalgh  
>   Second Author  
>   Third Author  
> 
>   * config/aarch64/aarch64-cores.def (tsv1100): Change scheduling model.
>   * config/aarch64/aarch64.md : Add "tsv110.md"
>   * config/aarch64/tsv110.md: New file.
> 
> Thanks,
> James
> 
> 
> > -----Original Message-----
> > From: wuyuan (E)
> > Sent: 2019-03-15 21:57
> > To: 'James Greenhalgh'
> > Cc: Kyrill Tkachov ;
> > gcc-patches@gcc.gnu.org; Zhangyichao (AB)
> > ; Zhanghaijian (A)
> > ; nd ; wufeng (O)
> > ; Yangfei (Felix)
> > Subject: Re : add tsv110 pipeline scheduling
> > 
> > Hi James,
> > Thank you very much for your meticulous review. My answers to your two
> > questions are as follows.
> > The first problem was caused by my negligence; the type should be changed
> > to "crypto_sha256_fast".
> > For the second question, I have checked with the hardware engineer. Only
> > ALU2/ALU3 support PSTATE register updates, so any instruction that updates
> > NZCV is issued to ALU2/ALU3. MDU provides better pipeline efficiency for
> > multi-cycle ALU instructions, so we issue 2-cycle ALU instructions without
> > a PSTATE update to the MDU unit. The current pipeline description is OK,
> > except that the reservation "tsv110_alu2" should be replaced with
> > "tsv110_alu2|tsv110_alu3".
> > 
> > 
> >  
> > 
> > The detailed patches are as follows:
> > 
> >   * config/aarch64/aarch64-cores.def (tsv1100): Change scheduling model.
> >   * config/aarch64/aarch64.md : Add "tsv110.md"
> >   * config/aarch64/tsv110.md: New file.
> > 
> > 
> > diff --git a/gcc/config/aarch64/aarch64-cores.def 
> > b/gcc/config/aarch64/aarch64-cores.def
> > index ed56e5e..82d91d6
> > --- a/gcc/config/aarch64/aarch64-cores.def
> > +++ b/gcc/config/aarch64/aarch64-cores.def
> > @@ -105,7 +105,7 @@ AARCH64_CORE("neoverse-n1",  neoversen1, 
> > cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_  AARCH64_CORE("neoverse-e1",  
> > neoversee1, cortexa53, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 
> > | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, cortexa53, 
> > 0x41, 0xd4a, -1)
> >  
> >  /* HiSilicon ('H') cores. */
> > -AARCH64_CORE("tsv110",  tsv110, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | 
> > AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, 
> > tsv110,   0x48, 0xd01, -1)
> >

Re : add tsv110 pipeline scheduling

2019-04-03 Thread wuyuan (E)
Hi James,
Thank you for your review. Please attach the following author
information to the patch.

2019-04-04  wu yuan 

* config/aarch64/aarch64-cores.def (tsv1100): Change scheduling model.
* config/aarch64/aarch64.md : Add "tsv110.md"
* config/aarch64/tsv110.md: New file.

Thanks,
wuyuan

-----Original Message-----
From: James Greenhalgh [mailto:james.greenha...@arm.com]
Sent: 2019-04-04 1:58
To: wuyuan (E)
Cc: Kyrill Tkachov ; gcc-patches@gcc.gnu.org;
Zhangyichao (AB) ; Zhanghaijian (A)
; nd
Subject: Re: Re : add tsv110 pipeline scheduling

On Tue, Apr 02, 2019 at 03:26:22PM +0100, wuyuan (E) wrote:
> Hi James,
> Has the submitted patch been merged into trunk? I look forward to your
> reply. Thank you very much!
>   
>  
>   
> Best Regards,
>   
> wuyuan

Hi Wuyuan,

This patch is OK for trunk. Thank you for your many clarifications.

Will you need one of us to apply this to trunk on your behalf?

If you would like me to apply your patch, please provide the full ChangeLog 
with author information, like so:

2019-04-03  James Greenhalgh  
Second Author  
Third Author  

* config/aarch64/aarch64-cores.def (tsv1100): Change scheduling model.
* config/aarch64/aarch64.md : Add "tsv110.md"
* config/aarch64/tsv110.md: New file.

Thanks,
James


> -----Original Message-----
> From: wuyuan (E)
> Sent: 2019-03-15 21:57
> To: 'James Greenhalgh'
> Cc: Kyrill Tkachov ;
> gcc-patches@gcc.gnu.org; Zhangyichao (AB)
> ; Zhanghaijian (A)
> ; nd ; wufeng (O)
> ; Yangfei (Felix)
> Subject: Re : add tsv110 pipeline scheduling
> 
> Hi James,
> Thank you very much for your meticulous review. My answers to your two
> questions are as follows.
> The first problem was caused by my negligence; the type should be changed to
> "crypto_sha256_fast".
> For the second question, I have checked with the hardware engineer. Only
> ALU2/ALU3 support PSTATE register updates, so any instruction that updates
> NZCV is issued to ALU2/ALU3. MDU provides better pipeline efficiency for
> multi-cycle ALU instructions, so we issue 2-cycle ALU instructions without a
> PSTATE update to the MDU unit. The current pipeline description is OK, except
> that the reservation "tsv110_alu2" should be replaced with
> "tsv110_alu2|tsv110_alu3".
>   
>   
>
> 
> The detailed patches are as follows:
> 
>   * config/aarch64/aarch64-cores.def (tsv1100): Change scheduling model.
>   * config/aarch64/aarch64.md : Add "tsv110.md"
>   * config/aarch64/tsv110.md: New file.
> 
> 
> diff --git a/gcc/config/aarch64/aarch64-cores.def 
> b/gcc/config/aarch64/aarch64-cores.def
> index ed56e5e..82d91d6
> --- a/gcc/config/aarch64/aarch64-cores.def
> +++ b/gcc/config/aarch64/aarch64-cores.def
> @@ -105,7 +105,7 @@ AARCH64_CORE("neoverse-n1",  neoversen1, 
> cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_  AARCH64_CORE("neoverse-e1",  
> neoversee1, cortexa53, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 
> | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, cortexa53, 
> 0x41, 0xd4a, -1)
>  
>  /* HiSilicon ('H') cores. */
> -AARCH64_CORE("tsv110",  tsv110, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | 
> AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, 
> tsv110,   0x48, 0xd01, -1)
> +AARCH64_CORE("tsv110",  tsv110, tsv110, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | 
> AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, 
> tsv110,   0x48, 0xd01, -1)
>  
>  /* ARMv8.4-A Architecture Processors.  */
>  
> diff --git a/gcc/config/aarch64/aarch64.md 
> b/gcc/config/aarch64/aarch64.md index b7cd9fc..861f059 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -361,6 +361,7 @@
>  (include "thunderx.md")
>  (include "../arm/xgene1.md")
>  (include "thunderx2t99.md")
> +(include "tsv110.md")
>  
>  ;; 
> ---
>  ;; Jumps and other miscellaneous insns diff --

Re: Re : add tsv110 pipeline scheduling

2019-04-03 Thread James Greenhalgh
On Tue, Apr 02, 2019 at 03:26:22PM +0100, wuyuan (E) wrote:
> Hi James,
> Has the submitted patch been merged into trunk? I look forward to your
> reply. Thank you very much!
>   
>  
>   
> Best Regards,
>   
> wuyuan

Hi Wuyuan,

This patch is OK for trunk. Thank you for your many clarifications.

Will you need one of us to apply this to trunk on your behalf?

If you would like me to apply your patch, please provide the full ChangeLog
with author information, like so:

2019-04-03  James Greenhalgh  
Second Author  
Third Author  

* config/aarch64/aarch64-cores.def (tsv1100): Change scheduling model.
* config/aarch64/aarch64.md : Add "tsv110.md"
* config/aarch64/tsv110.md: New file.

Thanks,
James


> -----Original Message-----
> From: wuyuan (E)
> Sent: 2019-03-15 21:57
> To: 'James Greenhalgh'
> Cc: Kyrill Tkachov ; gcc-patches@gcc.gnu.org;
> Zhangyichao (AB) ; Zhanghaijian (A)
> ; nd ; wufeng (O)
> ; Yangfei (Felix)
> Subject: Re : add tsv110 pipeline scheduling
> 
> Hi James,
> Thank you very much for your meticulous review. My answers to your two
> questions are as follows.
> The first problem was caused by my negligence; the type should be changed to
> "crypto_sha256_fast".
> For the second question, I have checked with the hardware engineer. Only
> ALU2/ALU3 support PSTATE register updates, so any instruction that updates
> NZCV is issued to ALU2/ALU3. MDU provides better pipeline efficiency for
> multi-cycle ALU instructions, so we issue 2-cycle ALU instructions without a
> PSTATE update to the MDU unit. The current pipeline description is OK, except
> that the reservation "tsv110_alu2" should be replaced with
> "tsv110_alu2|tsv110_alu3".
>   
>   
>
> 
> The detailed patches are as follows:
> 
>   * config/aarch64/aarch64-cores.def (tsv1100): Change scheduling model.
>   * config/aarch64/aarch64.md : Add "tsv110.md"
>   * config/aarch64/tsv110.md: New file.
> 
> 
> diff --git a/gcc/config/aarch64/aarch64-cores.def 
> b/gcc/config/aarch64/aarch64-cores.def
> index ed56e5e..82d91d6
> --- a/gcc/config/aarch64/aarch64-cores.def
> +++ b/gcc/config/aarch64/aarch64-cores.def
> @@ -105,7 +105,7 @@ AARCH64_CORE("neoverse-n1",  neoversen1, cortexa57, 8_2A, 
>  AARCH64_FL_FOR_ARCH8_  AARCH64_CORE("neoverse-e1",  neoversee1, cortexa53, 
> 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
> AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, cortexa53, 0x41, 0xd4a, -1)
>  
>  /* HiSilicon ('H') cores. */
> -AARCH64_CORE("tsv110",  tsv110, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | 
> AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, 
> tsv110,   0x48, 0xd01, -1)
> +AARCH64_CORE("tsv110",  tsv110, tsv110, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | 
> AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, 
> tsv110,   0x48, 0xd01, -1)
>  
>  /* ARMv8.4-A Architecture Processors.  */
>  
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md 
> index b7cd9fc..861f059 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -361,6 +361,7 @@
>  (include "thunderx.md")
>  (include "../arm/xgene1.md")
>  (include "thunderx2t99.md")
> +(include "tsv110.md")
>  
>  ;; ---
>  ;; Jumps and other miscellaneous insns
> diff --git a/gcc/config/aarch64/tsv110.md b/gcc/config/aarch64/tsv110.md new 
> file mode 100644 index 000..9d12839
> --- /dev/null
> +++ b/gcc/config/aarch64/tsv110.md
> @@ -0,0 +1,708 @@
> +;; tsv110 pipeline description
> +;; Copyright (C) 2018 Free Software Foundation, Inc.
> +;;
> +;; This file is part of GCC.
> +;;
> +;; GCC is free software; you can redistribute it and/or modify it
> +;; under the terms of the GNU General Public License as published by
> +;; the Free Software Foundation; either version 3, or (at your option)
> +;; any later version.
> +;;
> +;; GCC is distributed in the hope that it will be useful, but
> +;; WITHOUT ANY WARRANTY; without even the implied

Re : add tsv110 pipeline scheduling

2019-04-02 Thread wuyuan (E)
Hi James,
Has the submitted patch been merged into trunk? I look forward to your
reply. Thank you very much!

   

Best Regards,

wuyuan
-----Original Message-----
From: wuyuan (E)
Sent: 2019-03-15 21:57
To: 'James Greenhalgh'
Cc: Kyrill Tkachov ; gcc-patches@gcc.gnu.org;
Zhangyichao (AB) ; Zhanghaijian (A)
; nd ; wufeng (O)
; Yangfei (Felix)
Subject: Re : add tsv110 pipeline scheduling

Hi James,
Thank you very much for your meticulous review. My answers to your two
questions are as follows.
The first problem was caused by my negligence; the type should be changed to
"crypto_sha256_fast".
For the second question, I have checked with the hardware engineer. Only
ALU2/ALU3 support PSTATE register updates, so any instruction that updates
NZCV is issued to ALU2/ALU3. MDU provides better pipeline efficiency for
multi-cycle ALU instructions, so we issue 2-cycle ALU instructions without a
PSTATE update to the MDU unit. The current pipeline description is OK, except
that the reservation "tsv110_alu2" should be replaced with
"tsv110_alu2|tsv110_alu3".
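
For reference, the two reservations discussed above might look roughly like
this once the change is made. This is only a sketch: it assumes the pattern
names, latencies and type lists quoted in the review stay unchanged, and the
patch as committed may differ.

```
;; Sketch only; names and type lists are taken from the review mail.
;; Shifted ALU ops that do not set flags: 2-cycle latency, issued to MDU.
(define_insn_reservation "tsv110_alu_shift" 2
  (and (eq_attr "tune" "tsv110")
       (eq_attr "type" "extend,\
                        alu_shift_imm,alu_shift_reg,\
                        crc,logic_shift_imm,logic_shift_reg,\
                        mov_shift,mvn_shift,\
                        mov_shift_reg,mvn_shift_reg"))
  "tsv110_mdu")

;; Flag-setting variants update NZCV, so they may go to ALU2 or ALU3.
(define_insn_reservation "tsv110_alus_shift" 2
  (and (eq_attr "tune" "tsv110")
       (eq_attr "type" "alus_shift_imm,alus_shift_reg,\
                        logics_shift_imm,logics_shift_reg"))
  "tsv110_alu2|tsv110_alu3")
```

In a reservation string, "|" separates alternatives, so the scheduler is free
to place a flag-setting instruction on whichever of the two units is available.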


 

The detailed patches are as follows:

  * config/aarch64/aarch64-cores.def (tsv1100): Change scheduling model.
  * config/aarch64/aarch64.md : Add "tsv110.md"
  * config/aarch64/tsv110.md: New file.


diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index ed56e5e..82d91d6
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -105,7 +105,7 @@ AARCH64_CORE("neoverse-n1",  neoversen1, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_
 AARCH64_CORE("neoverse-e1",  neoversee1, cortexa53, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, cortexa53, 0x41, 0xd4a, -1)
 
 /* HiSilicon ('H') cores. */
-AARCH64_CORE("tsv110",  tsv110, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, tsv110,   0x48, 0xd01, -1)
+AARCH64_CORE("tsv110",  tsv110, tsv110, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, tsv110,   0x48, 0xd01, -1)
 
 /* ARMv8.4-A Architecture Processors.  */
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md 
index b7cd9fc..861f059 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -361,6 +361,7 @@
 (include "thunderx.md")
 (include "../arm/xgene1.md")
 (include "thunderx2t99.md")
+(include "tsv110.md")
 
 ;; ---
 ;; Jumps and other miscellaneous insns
diff --git a/gcc/config/aarch64/tsv110.md b/gcc/config/aarch64/tsv110.md
new file mode 100644
index 000..9d12839
--- /dev/null
+++ b/gcc/config/aarch64/tsv110.md
@@ -0,0 +1,708 @@
+;; tsv110 pipeline description
+;; Copyright (C) 2018 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_automaton "tsv110")
+
+(define_attr "tsv110_neon_type"
+  "neon_arith_acc, neon_arith_acc_q,
+   neon_arith_basic, neon_arith_complex,
+   neon_reduc_add_acc, neon_multiply, neon_multiply_q,
+   neon_multiply_long, neon_mla, neon_mla_q, neon_mla_long,
+   neon_sat_mla_long, neon_shift_acc, neon_shift_imm_basic,
+   neon_shift_imm_complex,
+   neon_shift_reg_basic, neon_shift_reg_basic_q, neon_shift_reg_complex,
+   neon_shift_reg_complex_q, neon_fp_negabs, neon_fp_arith,
+   neon_fp_arith_q, neon_fp_reductions_q, neon_fp_cvt_int,
+   neon_fp_cvt_int_q, neon_fp_cvt16, neon_fp_minmax, neon_fp_mul,
+   neon_fp_mul_q, neon_fp_mla, neon_fp_mla_q, neon_fp_recpe_rsqrte,
+   neon_fp_recpe_rsqrte_q, neon_fp_recps_rsqrts, neon_

Re : add tsv110 pipeline scheduling

2019-03-25 Thread wuyuan (E)
Hi James,
The revised patch was posted ten days ago. I look forward to your
reply. Thank you very much!
 

Best Regards,

wuyuan
-----Original Message-----
From: wuyuan (E)
Sent: 2019-03-15 21:57
To: 'James Greenhalgh'
Cc: Kyrill Tkachov ; gcc-patches@gcc.gnu.org;
Zhangyichao (AB) ; Zhanghaijian (A)
; nd ; wufeng (O)
; Yangfei (Felix)
Subject: Re : add tsv110 pipeline scheduling

Hi James,
Thank you very much for your meticulous review. My answers to your two
questions are as follows.
The first problem was caused by my negligence; the type should be changed to
"crypto_sha256_fast".
For the second question, I have checked with the hardware engineer. Only
ALU2/ALU3 support PSTATE register updates, so any instruction that updates
NZCV is issued to ALU2/ALU3. MDU provides better pipeline efficiency for
multi-cycle ALU instructions, so we issue 2-cycle ALU instructions without a
PSTATE update to the MDU unit. The current pipeline description is OK, except
that the reservation "tsv110_alu2" should be replaced with
"tsv110_alu2|tsv110_alu3".


 

The detailed patches are as follows:

  * config/aarch64/aarch64-cores.def (tsv1100): Change scheduling model.
  * config/aarch64/aarch64.md : Add "tsv110.md"
  * config/aarch64/tsv110.md: New file.


diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index ed56e5e..82d91d6
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -105,7 +105,7 @@ AARCH64_CORE("neoverse-n1",  neoversen1, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_
 AARCH64_CORE("neoverse-e1",  neoversee1, cortexa53, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, cortexa53, 0x41, 0xd4a, -1)
 
 /* HiSilicon ('H') cores. */
-AARCH64_CORE("tsv110",  tsv110, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, tsv110,   0x48, 0xd01, -1)
+AARCH64_CORE("tsv110",  tsv110, tsv110, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, tsv110,   0x48, 0xd01, -1)
 
 /* ARMv8.4-A Architecture Processors.  */
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md 
index b7cd9fc..861f059 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -361,6 +361,7 @@
 (include "thunderx.md")
 (include "../arm/xgene1.md")
 (include "thunderx2t99.md")
+(include "tsv110.md")
 
 ;; ---
 ;; Jumps and other miscellaneous insns
diff --git a/gcc/config/aarch64/tsv110.md b/gcc/config/aarch64/tsv110.md
new file mode 100644
index 000..9d12839
--- /dev/null
+++ b/gcc/config/aarch64/tsv110.md
@@ -0,0 +1,708 @@
+;; tsv110 pipeline description
+;; Copyright (C) 2018 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_automaton "tsv110")
+
+(define_attr "tsv110_neon_type"
+  "neon_arith_acc, neon_arith_acc_q,
+   neon_arith_basic, neon_arith_complex,
+   neon_reduc_add_acc, neon_multiply, neon_multiply_q,
+   neon_multiply_long, neon_mla, neon_mla_q, neon_mla_long,
+   neon_sat_mla_long, neon_shift_acc, neon_shift_imm_basic,
+   neon_shift_imm_complex,
+   neon_shift_reg_basic, neon_shift_reg_basic_q, neon_shift_reg_complex,
+   neon_shift_reg_complex_q, neon_fp_negabs, neon_fp_arith,
+   neon_fp_arith_q, neon_fp_reductions_q, neon_fp_cvt_int,
+   neon_fp_cvt_int_q, neon_fp_cvt16, neon_fp_minmax, neon_fp_mul,
+   neon_fp_mul_q, neon_fp_mla, neon_fp_mla_q, neon_fp_recpe_rsqrte,
+   neon_fp_recpe_rsqrte_q, neon_fp_recps_rsqrts, neon_fp_recps_rsqr

Re : add tsv110 pipeline scheduling

2019-03-15 Thread wuyuan (E)
eq_attr "type" "neon_load1_2reg,neon_load1_2reg_q,\
+  neon_load2_2reg,neon_load2_2reg_q,neon_load2_all_lanes,\
+  neon_load2_all_lanes_q,neon_load2_one_lane,neon_load2_one_lane_q"))
+  "(tsv110_ls1 + tsv110_fsu1)|(tsv110_ls1 + tsv110_fsu2)|(tsv110_ls2 + tsv110_fsu1)|(tsv110_ls2 + tsv110_fsu2)")
+
+(define_insn_reservation
+  "tsv110_neon_ld3" 9
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "neon_load3_3reg,neon_load3_3reg_q,\
+  neon_load3_one_lane,neon_load3_one_lane_q,\
+  neon_load3_all_lanes,neon_load3_all_lanes_q"))
+  "(tsv110_ls1 + tsv110_fsu1)|(tsv110_ls1 + tsv110_fsu2)|(tsv110_ls2 + tsv110_fsu1)|(tsv110_ls2 + tsv110_fsu2)")
+
+(define_insn_reservation
+  "tsv110_neon_ld4_lane" 9
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "neon_load4_all_lanes,neon_load4_all_lanes_q,\
+  neon_load4_one_lane,neon_load4_one_lane_q"))
+  "(tsv110_ls1 + tsv110_fsu1)|(tsv110_ls1 + tsv110_fsu2)|(tsv110_ls2 + tsv110_fsu1)|(tsv110_ls2 + tsv110_fsu2)")
+
+(define_insn_reservation
+  "tsv110_neon_ld4_reg" 11
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "neon_load4_all_lanes,neon_load4_all_lanes_q,\
+  neon_load4_one_lane,neon_load4_one_lane_q"))
+  "(tsv110_ls1 + tsv110_fsu1)|(tsv110_ls1 + tsv110_fsu2)|(tsv110_ls2 + tsv110_fsu1)|(tsv110_ls2 + tsv110_fsu2)")
+
+;; Store Instructions.
+
+(define_insn_reservation
+  "tsv110_neon_store_a" 0
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "tsv110_neon_type" "neon_store_a"))
+  "tsv110_fsu1|tsv110_fsu2")
+
+(define_insn_reservation
+  "tsv110_neon_store_b" 0
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "tsv110_neon_type" "neon_store_b"))
+  "tsv110_fsu1|tsv110_fsu2")
+
+;; These block issue for a number of cycles proportional to the number
+;; of 64-bit chunks they will store, we don't attempt to model that
+;; precisely, treat them as blocking execution for two cycles when
+;; issued.
+(define_insn_reservation
+  "tsv110_neon_store_complex" 0
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "tsv110_neon_type" "neon_store_complex"))
+  "tsv110_block*2")
+
+;; Floating-Point Operations.
+
+(define_insn_reservation "tsv110_fp_const" 2
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "fconsts,fconstd,fmov"))
+  "tsv110_fsu1|tsv110_fsu2")
+
+(define_insn_reservation "tsv110_fp_add_sub" 5
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "fadds,faddd,fmuls,fmuld"))
+  "tsv110_fsu1|tsv110_fsu2")
+
+(define_insn_reservation "tsv110_fp_mac" 7
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "fmacs,ffmas,fmacd,ffmad"))
+  "tsv110_fsu1|tsv110_fsu2")
+
+(define_insn_reservation "tsv110_fp_cvt" 3
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "f_cvt"))
+  "tsv110_fsu1|tsv110_fsu2")
+
+(define_insn_reservation "tsv110_fp_cvtf2i" 4
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "f_cvtf2i"))
+  "tsv110_fsu1")
+
+(define_insn_reservation "tsv110_fp_cvti2f" 5
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "f_cvti2f"))
+  "(tsv110_alu1+tsv110_fsu1)|(tsv110_alu1+tsv110_fsu2)")
+
+(define_insn_reservation "tsv110_fp_cmp" 4
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "fcmps,fcmpd"))
+  "tsv110_fsu1|tsv110_fsu2")
+
+(define_insn_reservation "tsv110_fp_arith" 2
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "ffariths,ffarithd"))
+  "tsv110_fsu1|tsv110_fsu2")
+
+(define_insn_reservation "tsv110_fp_divs" 12
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "fdivs,neon_fp_div_s,fdivd,neon_fp_div_d,\
+  neon_fp_div_s_q,neon_fp_div_d_q"))
+  "tsv110_fsu1")
+
+(define_insn_reservation "tsv110_fp_sqrts" 24
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "fsqrts,neon_fp_sqrt_s,fsqrtd,neon_fp_sqrt_d,\
+  neon_fp_sqrt_s_q,neon_fp_sqrt_d_q"))
+  "tsv110_fsu2")
+
+(define_insn_reservation "tsv110_crypto_aes" 3
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "crypto_aese,crypto_aesmc"))
+  "tsv110_fsu1")
+  
+(define_insn_reservation "tsv110_crypto_sha1_fast" 2
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "crypto_sha1_fast,crypto_sha1_xor"))
+  "(tsv110_fsu1|tsv110_fsu2)")
+
+(define_insn_reservation "tsv110_crypto_sha256_fast" 2
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "crypto_sha256_fast"))
+  "tsv110_fsu1")
+
+(define_insn_reservation "tsv110_crypto_complex" 5
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "crypto_sha1_slow,crypto_sha256_slow"))
+  "tsv110_fsu1")
+
+;; We lie with calls.  They take up all issue slots, but are otherwise
+;; not harmful.
+(define_insn_reservation "tsv110_call" 1
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "call"))
+  "tsv110_alu1_issue+tsv110_alu2_issue+tsv110_alu3_issue+tsv110_fsu1_issue+tsv110_fsu2_issue\
++tsv110_mdu_issue+tsv110_ls1_issue+tsv110_ls2_issue"
+)
+
+;; Simple execution unit bypasses
+(define_bypass 1 "tsv110_alu"
+"tsv110_alu,tsv110_alu_shift")
+(define_bypass 2 "tsv110_alu_shift"
+"tsv110_alu,tsv110_alu_shift")
+
+;; An MLA or a MUL can feed a dependent MLA.
+(define_bypass 3 "tsv110_neon_*mla*,tsv110_neon_*mul*"
+"tsv110_neon_*mla*")
+
+;; We don't need to care about control hazards, either the branch is
+;; predicted in which case we pay no penalty, or the branch is
+;; mispredicted in which case instruction scheduling will be unlikely to
+;; help.
+(define_bypass 1 "tsv110_*"
+"tsv110_call,tsv110_branch")



 



-----Original Message-----
From: James Greenhalgh [mailto:james.greenha...@arm.com]
Sent: 2019-03-14 20:36
To: wuyuan (E)
Cc: Kyrill Tkachov ; gcc-patches@gcc.gnu.org;
Zhangyichao (AB) ; Zhanghaijian (A)
; nd ; wufeng (O)
; Yangfei (Felix)
Subject: Re: Re : add tsv110 pipeline scheduling

On Sat, Feb 23, 2019 at 01:28:22PM +, wuyuan (E) wrote:
> Hi James,
> Sorry for not responding to your email sooner; I was delayed by the Chinese
> New Year holiday and urgent work. The three issues you raised in your last
> email were due to my misunderstanding of the pipeline.
> On the first question: these instructions occupy both a tsv110_ls* pipeline
> and a tsv110_fsu* pipeline at the same time.

Hi Wuyuan,

Please accept my apologies for how long it has taken me to revisit your patch 
and review it.

I have two questions:

> +(define_insn_reservation "tsv110_crypto_sha256_fast" 2
> +  (and (eq_attr "tune" "tsv110")
> +   (eq_attr "type" "crypto_sha1_fast"))
> +  "tsv110_fsu1")

I think you intended to check for type crypto_sha256_fast here.

> +;; ALU ops with shift
> +(define_insn_reservation "tsv110_alu_shift" 2
> +  (and (eq_attr "tune" "tsv110")
> +   (eq_attr "type" "extend,\
> + alu_shift_imm,alu_shift_reg,\
> + crc,logic_shift_imm,logic_shift_reg,\
> + mov_shift,mvn_shift,\
> + mov_shift_reg,mvn_shift_reg"))
> +  "tsv110_mdu")
> +  
> +(define_insn_reservation "tsv110_alus_shift" 2
> +  (and (eq_attr "tune" "tsv110")
> +   (eq_attr "type" "alus_shift_imm,alus_shift_reg,\
> + logics_shift_imm,logics_shift_reg"))
> +  "tsv110_alu2")

Is this the correct description? This code says that ALU operations with shift 
are executed in MDU, but ALU operations with shift that are also flag setting 
are executed in ALU2?

Otherwise, this patch is OK for trunk. Thank you for your patience.

Best Regards,
James

> rewritten as follows:
> (define_insn_reservation
>   "tsv110_neon_ld4_lane" 9
>   (and (eq_attr "tune" "tsv110")
>(eq_attr "type" "neon_load4_all_lanes,neon_load4_all_lanes_q,\
>  neon_load4_one_lane,neon_load4_one_lane_q"))
>   "(tsv110_ls1 + tsv110_fsu1)|(tsv110_ls1 + tsv110_fsu2)|(tsv110_ls2 + 
> tsv110_fsu1)|(tsv110_ls2 + tsv110_fsu2)")
> 
> the second question, These instructions will use tsv110_fsu1 Pipeline or 
> tsv110_fsu2 Pipeline.
> rewritten as follows:
> (define_insn_reservation  "tsv110_neon_abd_aba" 4
>   (and (eq_attr "tune" "tsv110")
>(eq_attr "type" "neon_abd,neon_arith_acc"))
>   "tsv110_fsu1|tsv110_fsu2")
> 
> the third question, These instructions will use tsv110_fsu1 Pipeline or 
> tsv110_fsu2 Pipeline.
> rewritten as follows:
> (define_insn_reservation  "tsv110_neon_abd_aba_q" 4
>   (and (eq_attr "tune" "tsv110")
>(eq_attr "type" "neon_arith_acc_q"))
>   "tsv110_fsu1|tsv110_fsu2")
> 
> In addition to the above changes, I asked hardware engineers and colleagues 
> to review my  patch and modify some of the errors. The detailed patches are 
> as follows:
> 
>   * config/aarch64/aarch64-cores.def (tsv1100): Change scheduling model.
>   * config/aarch64/aarch64.md : Add "tsv110.md"
>   * config/aarch64/tsv110.md: New file.
> 


tsv110_pipeline.patch
Description: tsv110_pipeline.patch


Re: Re : add tsv110 pipeline scheduling

2019-03-14 Thread James Greenhalgh
On Sat, Feb 23, 2019 at 01:28:22PM +, wuyuan (E) wrote:
> Hi James,
> Sorry for not responding to your email sooner; I was delayed by the Chinese
> New Year holiday and urgent work. The three issues you raised in your last
> email were due to my misunderstanding of the pipeline.
> On the first question: these instructions occupy both a tsv110_ls* pipeline
> and a tsv110_fsu* pipeline at the same time.

Hi Wuyuan,

Please accept my apologies for how long it has taken me to revisit your
patch and review it.

I have two questions:

> +(define_insn_reservation "tsv110_crypto_sha256_fast" 2
> +  (and (eq_attr "tune" "tsv110")
> +   (eq_attr "type" "crypto_sha1_fast"))
> +  "tsv110_fsu1")

I think you intended to check for type crypto_sha256_fast here.

> +;; ALU ops with shift
> +(define_insn_reservation "tsv110_alu_shift" 2
> +  (and (eq_attr "tune" "tsv110")
> +   (eq_attr "type" "extend,\
> + alu_shift_imm,alu_shift_reg,\
> + crc,logic_shift_imm,logic_shift_reg,\
> + mov_shift,mvn_shift,\
> + mov_shift_reg,mvn_shift_reg"))
> +  "tsv110_mdu")
> +  
> +(define_insn_reservation "tsv110_alus_shift" 2
> +  (and (eq_attr "tune" "tsv110")
> +   (eq_attr "type" "alus_shift_imm,alus_shift_reg,\
> + logics_shift_imm,logics_shift_reg"))
> +  "tsv110_alu2")

Is this the correct description? This code says that ALU operations with
shift are executed in MDU, but ALU operations with shift that are also
flag setting are executed in ALU2?

Otherwise, this patch is OK for trunk. Thank you for your patience.

Best Regards,
James

> rewritten as follows:
> (define_insn_reservation
>   "tsv110_neon_ld4_lane" 9
>   (and (eq_attr "tune" "tsv110")
>(eq_attr "type" "neon_load4_all_lanes,neon_load4_all_lanes_q,\
>  neon_load4_one_lane,neon_load4_one_lane_q"))
>   "(tsv110_ls1 + tsv110_fsu1)|(tsv110_ls1 + tsv110_fsu2)|(tsv110_ls2 + 
> tsv110_fsu1)|(tsv110_ls2 + tsv110_fsu2)")
> 
> the second question, These instructions will use tsv110_fsu1 Pipeline or 
> tsv110_fsu2 Pipeline.
> rewritten as follows:
> (define_insn_reservation  "tsv110_neon_abd_aba" 4
>   (and (eq_attr "tune" "tsv110")
>(eq_attr "type" "neon_abd,neon_arith_acc"))
>   "tsv110_fsu1|tsv110_fsu2")
> 
> the third question, These instructions will use tsv110_fsu1 Pipeline or 
> tsv110_fsu2 Pipeline.
> rewritten as follows:
> (define_insn_reservation  "tsv110_neon_abd_aba_q" 4
>   (and (eq_attr "tune" "tsv110")
>(eq_attr "type" "neon_arith_acc_q"))
>   "tsv110_fsu1|tsv110_fsu2")
> 
> In addition to the above changes, I asked hardware engineers and colleagues 
> to review my  patch and modify some of the errors. The detailed patches are 
> as follows:
> 
>   * config/aarch64/aarch64-cores.def (tsv1100): Change scheduling model.
>   * config/aarch64/aarch64.md : Add "tsv110.md"
>   * config/aarch64/tsv110.md: New file.
> 


re: add tsv110 pipeline scheduling

2019-03-07 Thread wuyuan (E)
Hi James,
The revised patch was posted ten days ago. If you have time, I would
appreciate your comments soon. Thank you very much!

 Best Regards,

wuyuan

-----Original Message-----
From: wuyuan (E)
Sent: 2019-03-04 21:46
To: 'James Greenhalgh'
Cc: 'Kyrill Tkachov' ; 'gcc-patches@gcc.gnu.org'
; Zhangyichao (AB) ;
Zhanghaijian (A) ; 'n...@arm.com' ;
wufeng (O) ; Yangfei (Felix)
Subject: re: add tsv110 pipeline scheduling
Hi James,
Have you seen the patch I submitted last week? If the problems with the patch
have been fixed, I hope it can go into trunk soon. I look forward to your
reply. Thank you.


Best Regards,

wuyuan 
-----Original Message-----
From: wuyuan (E)
Sent: 2019-02-23 21:28
To: 'James Greenhalgh' 
Cc: Kyrill Tkachov ; gcc-patches@gcc.gnu.org; 
Zhangyichao (AB) ; Zhanghaijian (A) 
; n...@arm.com; wufeng (O) ; 
Yangfei (Felix) 
Subject: Re : add tsv110 pipeline scheduling

Hi James,
Sorry for the late reply; I was delayed by the Chinese New Year holiday and 
urgent work. The three issues you raised in your last email were due to my 
misunderstanding of the pipeline.
On the first question: these instructions occupy both a tsv110_ls* pipeline and 
a tsv110_fsu* pipeline at the same time.
Rewritten as follows:
(define_insn_reservation
  "tsv110_neon_ld4_lane" 9
  (and (eq_attr "tune" "tsv110")
   (eq_attr "type" "neon_load4_all_lanes,neon_load4_all_lanes_q,\
   neon_load4_one_lane,neon_load4_one_lane_q"))
  "(tsv110_ls1 + tsv110_fsu1)|(tsv110_ls1 + tsv110_fsu2)|(tsv110_ls2 + 
tsv110_fsu1)|(tsv110_ls2 + tsv110_fsu2)")

On the second question: these instructions use either the tsv110_fsu1 pipeline 
or the tsv110_fsu2 pipeline.
Rewritten as follows:
(define_insn_reservation  "tsv110_neon_abd_aba" 4
  (and (eq_attr "tune" "tsv110")
   (eq_attr "type" "neon_abd,neon_arith_acc"))
  "tsv110_fsu1|tsv110_fsu2")

On the third question: these instructions use either the tsv110_fsu1 pipeline 
or the tsv110_fsu2 pipeline.
Rewritten as follows:
(define_insn_reservation  "tsv110_neon_abd_aba_q" 4
  (and (eq_attr "tune" "tsv110")
   (eq_attr "type" "neon_arith_acc_q"))
  "tsv110_fsu1|tsv110_fsu2")

In addition to the above changes, I asked hardware engineers and colleagues to 
review my patch and corrected some errors. The detailed patch is as follows:

  * config/aarch64/aarch64-cores.def (tsv1100): Change scheduling model.
  * config/aarch64/aarch64.md : Add "tsv110.md"
  * config/aarch64/tsv110.md: New file.

diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index ed56e5e..82d91d6
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -105,7 +105,7 @@ AARCH64_CORE("neoverse-n1",  neoversen1, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_  AARCH64_CORE("neoverse-e1",  neoversee1, cortexa53, 
8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, cortexa53, 0x41, 0xd4a, -1)
 
 /* HiSilicon ('H') cores. */
-AARCH64_CORE("tsv110",  tsv110, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | 
AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, tsv110,  
 0x48, 0xd01, -1)
+AARCH64_CORE("tsv110",  tsv110, tsv110, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | 
AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, tsv110,  
 0x48, 0xd01, -1)
 
 /* ARMv8.4-A Architecture Processors.  */
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md 
index b7cd9fc..861f059 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -361,6 +361,7 @@
 (include "thunderx.md")
 (include "../arm/xgene1.md")
 (include "thunderx2t99.md")
+(include "tsv110.md")
 
 ;; ---
 ;; Jumps and other miscellaneous insns
diff --git a/gcc/config/aarch64/tsv110.md b/gcc/config/aarch64/tsv110.md
new file mode 100644
index 000..9d12839
--- /dev/null
+++ b/gcc/config/aarch64/tsv110.md
@@ -0,0 +1,708 @@
+;; tsv110 pipeline description
+;; Copyright (C) 2018 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or 

re: add tsv110 pipeline scheduling

2019-03-04 Thread wuyuan (E)
Hi James,
Have you seen the patch I submitted last week? If the problems with the patch 
have been fixed, I hope it can go into trunk soon. I look forward to your reply. 
Thank you.


Best Regards,

wuyuan 


-----Original Message-----
From: wuyuan (E) 
Sent: 2019-02-23 21:28
To: 'James Greenhalgh' 
Cc: Kyrill Tkachov ; gcc-patches@gcc.gnu.org; 
Zhangyichao (AB) ; Zhanghaijian (A) 
; n...@arm.com; wufeng (O) ; 
Yangfei (Felix) 
Subject: Re : add tsv110 pipeline scheduling

Hi James,
Sorry for the late reply; I was delayed by the Chinese New Year holiday and 
urgent work. The three issues you raised in your last email were due to my 
misunderstanding of the pipeline.
On the first question: these instructions occupy both a tsv110_ls* pipeline and 
a tsv110_fsu* pipeline at the same time.
Rewritten as follows:
(define_insn_reservation
  "tsv110_neon_ld4_lane" 9
  (and (eq_attr "tune" "tsv110")
   (eq_attr "type" "neon_load4_all_lanes,neon_load4_all_lanes_q,\
   neon_load4_one_lane,neon_load4_one_lane_q"))
  "(tsv110_ls1 + tsv110_fsu1)|(tsv110_ls1 + tsv110_fsu2)|(tsv110_ls2 + 
tsv110_fsu1)|(tsv110_ls2 + tsv110_fsu2)")

On the second question: these instructions use either the tsv110_fsu1 pipeline 
or the tsv110_fsu2 pipeline.
Rewritten as follows:
(define_insn_reservation  "tsv110_neon_abd_aba" 4
  (and (eq_attr "tune" "tsv110")
   (eq_attr "type" "neon_abd,neon_arith_acc"))
  "tsv110_fsu1|tsv110_fsu2")

On the third question: these instructions use either the tsv110_fsu1 pipeline 
or the tsv110_fsu2 pipeline.
Rewritten as follows:
(define_insn_reservation  "tsv110_neon_abd_aba_q" 4
  (and (eq_attr "tune" "tsv110")
   (eq_attr "type" "neon_arith_acc_q"))
  "tsv110_fsu1|tsv110_fsu2")

In addition to the above changes, I asked hardware engineers and colleagues to 
review my patch and corrected some errors. The detailed patch is as follows:

  * config/aarch64/aarch64-cores.def (tsv1100): Change scheduling model.
  * config/aarch64/aarch64.md : Add "tsv110.md"
  * config/aarch64/tsv110.md: New file.

diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index ed56e5e..82d91d6
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -105,7 +105,7 @@ AARCH64_CORE("neoverse-n1",  neoversen1, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_  AARCH64_CORE("neoverse-e1",  neoversee1, cortexa53, 
8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, cortexa53, 0x41, 0xd4a, -1)
 
 /* HiSilicon ('H') cores. */
-AARCH64_CORE("tsv110",  tsv110, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | 
AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, tsv110,  
 0x48, 0xd01, -1)
+AARCH64_CORE("tsv110",  tsv110, tsv110, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | 
AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, tsv110,  
 0x48, 0xd01, -1)
 
 /* ARMv8.4-A Architecture Processors.  */
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md 
index b7cd9fc..861f059 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -361,6 +361,7 @@
 (include "thunderx.md")
 (include "../arm/xgene1.md")
 (include "thunderx2t99.md")
+(include "tsv110.md")
 
 ;; ---
 ;; Jumps and other miscellaneous insns
diff --git a/gcc/config/aarch64/tsv110.md b/gcc/config/aarch64/tsv110.md
new file mode 100644
index 000..9d12839
--- /dev/null
+++ b/gcc/config/aarch64/tsv110.md
@@ -0,0 +1,708 @@
+;; tsv110 pipeline description
+;; Copyright (C) 2018 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_automaton "tsv110")
+
+(d

Re : add tsv110 pipeline scheduling

2019-02-23 Thread wuyuan (E)
fine_insn_reservation
+  "tsv110_neon_ld1_reg3" 7
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "neon_load1_3reg,neon_load1_3reg_q"))
+  "tsv110_ls1|tsv110_ls2")
+
+(define_insn_reservation
+  "tsv110_neon_ld1_reg4" 7
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "neon_load1_4reg,neon_load1_4reg_q"))
+  "tsv110_ls1|tsv110_ls2")
+
+(define_insn_reservation
+  "tsv110_neon_ld2" 8
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "neon_load1_2reg,neon_load1_2reg_q,\
+  neon_load2_2reg,neon_load2_2reg_q,neon_load2_all_lanes,\
+  neon_load2_all_lanes_q,neon_load2_one_lane,neon_load2_one_lane_q"))
+  "(tsv110_ls1 + tsv110_fsu1)|(tsv110_ls1 + tsv110_fsu2)|(tsv110_ls2 + 
tsv110_fsu1)|(tsv110_ls2 + tsv110_fsu2)")
+
+(define_insn_reservation
+  "tsv110_neon_ld3" 9
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "neon_load3_3reg,neon_load3_3reg_q,\
+  neon_load3_one_lane,neon_load3_one_lane_q,\
+  neon_load3_all_lanes,neon_load3_all_lanes_q"))
+  "(tsv110_ls1 + tsv110_fsu1)|(tsv110_ls1 + tsv110_fsu2)|(tsv110_ls2 + 
tsv110_fsu1)|(tsv110_ls2 + tsv110_fsu2)")
+
+(define_insn_reservation
+  "tsv110_neon_ld4_lane" 9
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "neon_load4_all_lanes,neon_load4_all_lanes_q,\
+  neon_load4_one_lane,neon_load4_one_lane_q"))
+  "(tsv110_ls1 + tsv110_fsu1)|(tsv110_ls1 + tsv110_fsu2)|(tsv110_ls2 + 
tsv110_fsu1)|(tsv110_ls2 + tsv110_fsu2)")
+
+(define_insn_reservation
+  "tsv110_neon_ld4_reg" 11
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "neon_load4_all_lanes,neon_load4_all_lanes_q,\
+  neon_load4_one_lane,neon_load4_one_lane_q"))
+  "(tsv110_ls1 + tsv110_fsu1)|(tsv110_ls1 + tsv110_fsu2)|(tsv110_ls2 + 
tsv110_fsu1)|(tsv110_ls2 + tsv110_fsu2)")
+
+;; Store Instructions.
+
+(define_insn_reservation
+  "tsv110_neon_store_a" 0
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "tsv110_neon_type" "neon_store_a"))
+  "tsv110_fsu1|tsv110_fsu2")
+
+(define_insn_reservation
+  "tsv110_neon_store_b" 0
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "tsv110_neon_type" "neon_store_b"))
+  "tsv110_fsu1|tsv110_fsu2")
+
+;; These block issue for a number of cycles proportional to the number
+;; of 64-bit chunks they will store, we don't attempt to model that
+;; precisely, treat them as blocking execution for two cycles when
+;; issued.
+(define_insn_reservation
+  "tsv110_neon_store_complex" 0
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "tsv110_neon_type" "neon_store_complex"))
+  "tsv110_block*2")
+
+;; Floating-Point Operations.
+
+(define_insn_reservation "tsv110_fp_const" 2
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "fconsts,fconstd,fmov"))
+  "tsv110_fsu1|tsv110_fsu2")
+
+(define_insn_reservation "tsv110_fp_add_sub" 5
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "fadds,faddd,fmuls,fmuld"))
+  "tsv110_fsu1|tsv110_fsu2")
+
+(define_insn_reservation "tsv110_fp_mac" 7
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "fmacs,ffmas,fmacd,ffmad"))
+  "tsv110_fsu1|tsv110_fsu2")
+
+(define_insn_reservation "tsv110_fp_cvt" 3
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "f_cvt"))
+  "tsv110_fsu1|tsv110_fsu2")
+
+(define_insn_reservation "tsv110_fp_cvtf2i" 4
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "f_cvtf2i"))
+  "tsv110_fsu1")
+
+(define_insn_reservation "tsv110_fp_cvti2f" 5
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "f_cvti2f"))
+  "(tsv110_alu1+tsv110_fsu1)|(tsv110_alu1+tsv110_fsu2)")
+
+(define_insn_reservation "tsv110_fp_cmp" 4
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "fcmps,fcmpd"))
+  "tsv110_fsu1|tsv110_fsu2")
+
+(define_insn_reservation "tsv110_fp_arith" 2
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "ffariths,ffarithd"))
+  "tsv110_fsu1|tsv110_fsu2")
+
+(define_insn_reservation "tsv110_fp_divs" 12
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "fdivs,neon_fp_div_s,fdivd,neon_fp_div_d,\
+  neon_fp_div_s_q,neon_fp_div_d_q"))
+  "tsv110_fsu1")
+
+(define_insn_reservation "tsv110_fp_sqrts" 24
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "fsqrts,neon_fp_sqrt_s,fsqrtd,neon_fp_sqrt_d,\
+  neon_fp_sqrt_s_q,neon_fp_sqrt_d_q"))
+  "tsv110_fsu2")
+
+(define_insn_reservation "tsv110_crypto_aes" 3
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "crypto_aese,crypto_aesmc"))
+  "tsv110_fsu1")
+  
+(define_insn_reservation "tsv110_crypto_sha1_fast" 2
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "crypto_sha1_fast,crypto_sha1_xor"))
+  "(tsv110_fsu1|tsv110_fsu2)")
+
+(define_insn_reservation "tsv110_crypto_sha256_fast" 2
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "crypto_sha1_fast"))
+  "tsv110_fsu1")
+
+(define_insn_reservation "tsv110_crypto_complex" 5
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "crypto_sha1_slow,crypto_sha256_slow"))
+  "tsv110_fsu1")
+
+;; We lie with calls.  They take up all issue slots, but are otherwise
+;; not harmful.
+(define_insn_reservation "tsv110_call" 1
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "call"))
+  "tsv110_alu1_issue+tsv110_alu2_issue+tsv110_alu3_issue+tsv110_fsu1_issue+tsv110_fsu2_issue\
++tsv110_mdu_issue+tsv110_ls1_issue+tsv110_ls2_issue"
+)
+
+;; Simple execution unit bypasses
+(define_bypass 1 "tsv110_alu"
+"tsv110_alu,tsv110_alu_shift")
+(define_bypass 2 "tsv110_alu_shift"
+"tsv110_alu,tsv110_alu_shift")
+
+;; An MLA or a MUL can feed a dependent MLA.
+(define_bypass 3 "tsv110_neon_*mla*,tsv110_neon_*mul*"
+"tsv110_neon_*mla*")
+
+;; We don't need to care about control hazards, either the branch is
+;; predicted in which case we pay no penalty, or the branch is
+;; mispredicted in which case instruction scheduling will be unlikely to
+;; help.
+(define_bypass 1 "tsv110_*"
+"tsv110_call,tsv110_branch")



-----Original Message-----
From: James Greenhalgh [mailto:james.greenha...@arm.com] 
Sent: 2019-01-18 7:47
To: wuyuan (E) 
Cc: Kyrill Tkachov ; gcc-patches@gcc.gnu.org; 
Zhangyichao (AB) ; Zhanghaijian (A) 
; n...@arm.com
Subject: Re: add tsv110 pipeline scheduling

On Mon, Jan 14, 2019 at 08:02:45AM -0600, wuyuan (E) wrote:
> Hi  Kyrill:
>  The gcc 7.3.0 does not discard the store1 and load1 command; I did 
> not expect the community's latest gcc changes so large .   
>  now I downloaded the latest GCC code, put the patch into GCC source 
> code, the compiler can pass, thank you very much for your work!
>   
> Best Regards,
>   
> wuyuan

Please check your modeling of Advanced SIMD operations.

> +(define_insn_reservation
> +  "tsv110_neon_ld4_lane" 9
> +  (and (eq_attr "tune" "tsv110")
> +   (eq_attr "type" "neon_load4_all_lanes,neon_load4_all_lanes_q,\
> +neon_load4_one_lane,neon_load4_one_lane_q"))
> +  "((tsv110_ls1*8)|(tsv110_ls2*8)|(tsv110_fsu1*8)|(tsv110_fsu2*8))")
> +

This model says you will reserve
 LS1 for 8 cycles,
  OR LS2 for 8 cycles,
  OR FSU1 for 8 cycles,
  OR FSU2 for 8 cycles.

> +(define_insn_reservation  "tsv110_neon_abd_aba" 4
> +  (and (eq_attr "tune" "tsv110")
> +   (eq_attr "type" "neon_abd,neon_arith_acc"))
> +  "tsv110_fsu1,tsv110_fsu2")

This model says you will reserve
   FSU1 for 1 cycle,
  THEN FSU2 for 1 cycle.

> +(define_insn_reservation  "tsv110_neon_abd_aba_q" 4
> +  (and (eq_attr "tune" "tsv110")
> +   (eq_attr "type" "neon_arith_acc_q"))
> +  "(tsv110_fsu1,tsv110_fsu2)+(tsv110_fsu1,tsv110_fsu2)")
> +

This model says you will reserve:
  FSU1 for 1 cycle,
 THEN FSU2 for 1 cycle
  AND
  FSU1 for 1 cycle,
 THEN FSU2 for 1 cycle

Which would be a redundant AND.

Is that how you intend to model these operations?

Remember,

If you are looking to model a 'THEN' relationship you can use the ',' operator, 
If you are looking to model an 'AND' relationship you can use the '+' operator, 
If you are looking to model an 'OR' relationship you can use the '|' operator.

Taking Cortex-A57 as an example:

> (define_insn_reservation
>   "cortex_a57_neon_load_d" 11
>   (and (eq_attr "tune" "cortexa57")
>(eq_attr "cortex_a57_neon_type" "neon_load_d"))
>   "ca57_cx1_issue+ca57_cx2_issue,
>ca57_ls_issue+ca57_ls_issue,ca57_ldr*2")

This model says you will reserve:

   CX1_ISSUE AND CX2_ISSUE,
  THEN LS_ISSUE AND LS_ISSUE,
  THEN LDR for 2 cycles.

Please let me know if you plan to update the model. If I have misunderstood 
your intentions, please accept my apologies.

Best Regards,
James Greenhalgh


> 
> 
>   * config/aarch64/aarch64-cores.def (tsv1100): Change scheduling model.
>   * config/aarch64/aarch64.md : Add "tsv110.md"
>   * config/aarch64/tsv110.md: New file.


tsv110_pipeline.patch
Description: tsv110_pipeline.patch


Re: add tsv110 pipeline scheduling

2019-01-17 Thread James Greenhalgh
On Mon, Jan 14, 2019 at 08:02:45AM -0600, wuyuan (E) wrote:
> Hi  Kyrill:
>  The gcc 7.3.0 does not discard the store1 and load1 command; I did 
> not expect the community's latest gcc changes so large .   
>  now I downloaded the latest GCC code, put the patch into GCC source 
> code, the compiler can pass, thank you very much for your work!
>   
> Best Regards,
>   
> wuyuan

Please check your modeling of Advanced SIMD operations.

> +(define_insn_reservation
> +  "tsv110_neon_ld4_lane" 9
> +  (and (eq_attr "tune" "tsv110")
> +   (eq_attr "type" "neon_load4_all_lanes,neon_load4_all_lanes_q,\
> +neon_load4_one_lane,neon_load4_one_lane_q"))
> +  "((tsv110_ls1*8)|(tsv110_ls2*8)|(tsv110_fsu1*8)|(tsv110_fsu2*8))")
> +

This model says you will reserve
 LS1 for 8 cycles,
  OR LS2 for 8 cycles,
  OR FSU1 for 8 cycles,
  OR FSU2 for 8 cycles.

> +(define_insn_reservation  "tsv110_neon_abd_aba" 4
> +  (and (eq_attr "tune" "tsv110")
> +   (eq_attr "type" "neon_abd,neon_arith_acc"))
> +  "tsv110_fsu1,tsv110_fsu2")

This model says you will reserve
   FSU1 for 1 cycle,
  THEN FSU2 for 1 cycle.

> +(define_insn_reservation  "tsv110_neon_abd_aba_q" 4
> +  (and (eq_attr "tune" "tsv110")
> +   (eq_attr "type" "neon_arith_acc_q"))
> +  "(tsv110_fsu1,tsv110_fsu2)+(tsv110_fsu1,tsv110_fsu2)")
> +

This model says you will reserve:
  FSU1 for 1 cycle,
 THEN FSU2 for 1 cycle
  AND
  FSU1 for 1 cycle,
 THEN FSU2 for 1 cycle

Which would be a redundant AND.

Is that how you intend to model these operations?

Remember,

If you are looking to model a 'THEN' relationship you can use the ',' operator,
If you are looking to model an 'AND' relationship you can use the '+' operator,
If you are looking to model an 'OR' relationship you can use the '|' operator.

Taking Cortex-A57 as an example:

> (define_insn_reservation
>   "cortex_a57_neon_load_d" 11
>   (and (eq_attr "tune" "cortexa57")
>(eq_attr "cortex_a57_neon_type" "neon_load_d"))
>   "ca57_cx1_issue+ca57_cx2_issue,
>ca57_ls_issue+ca57_ls_issue,ca57_ldr*2")

This model says you will reserve:

   CX1_ISSUE AND CX2_ISSUE,
  THEN LS_ISSUE AND LS_ISSUE,
  THEN LDR for 2 cycles.
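
To see the three operators (plus the '*' repeat used in the patch) side by
side, here is a minimal sketch. The demo_* reservation names are hypothetical
and not part of the patch; the unit names are the ones used by the tsv110
description in this thread, and the fragment assumes they have already been
declared with define_cpu_unit.

```lisp
;; Illustration only: demo_* names are hypothetical, not from the patch.

;; ',' (THEN): FSU1 in the first cycle, then FSU2 in the following cycle.
(define_reservation "demo_then" "tsv110_fsu1,tsv110_fsu2")

;; '+' (AND): LS1 and FSU1 are both reserved in the same cycle.
(define_reservation "demo_and" "tsv110_ls1+tsv110_fsu1")

;; '|' (OR): either FSU1 or FSU2 is reserved for one cycle.
(define_reservation "demo_or" "tsv110_fsu1|tsv110_fsu2")

;; '*N' (repeat): LS1 is reserved for 2 consecutive cycles.
(define_reservation "demo_repeat" "tsv110_ls1*2")

;; Mixed: one load/store pipe AND one FSU pipe in the same cycle, with
;; parentheses making each alternative explicit.
(define_reservation "demo_ls_and_fsu"
  "(tsv110_ls1+tsv110_fsu1)|(tsv110_ls1+tsv110_fsu2)\
   |(tsv110_ls2+tsv110_fsu1)|(tsv110_ls2+tsv110_fsu2)")
```

Parenthesising each alternative keeps the grouping explicit when operators are
mixed in one reservation string.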

Please let me know if you plan to update the model. If I have misunderstood
your intentions, please accept my apologies.

Best Regards,
James Greenhalgh


> 
> 
>   * config/aarch64/aarch64-cores.def (tsv1100): Change scheduling model.
>   * config/aarch64/aarch64.md : Add "tsv110.md"
>   * config/aarch64/tsv110.md: New file.


Re: add tsv110 pipeline scheduling

2019-01-14 Thread Kyrill Tkachov

Hi Wuyuan,

On 14/01/19 14:02, wuyuan (E) wrote:

Hi  Kyrill:
  The gcc 7.3.0 does not discard the store1 and load1 command; I did 
not expect the community's latest gcc changes so large .
  now I downloaded the latest GCC code, put the patch into GCC source 
code, the compiler can pass, thank you very much for your work!



For the future, please test the patches against the branch you plan to apply 
them to.
In this case, since you're submitting a trunk patch it needs to be applied and 
tested on trunk.

This latest version builds on trunk and looks ok to me but you'll need approval 
from the aarch64 maintainers to commit.
I've cc'ed them for you.

Thanks,
Kyrill



Best Regards,

wuyuan


   * config/aarch64/aarch64-cores.def (tsv1100): Change scheduling model.
   * config/aarch64/aarch64.md : Add "tsv110.md"
   * config/aarch64/tsv110.md: New file.

diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 70b0766..085c40f 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -103,7 +103,7 @@ AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2
  AARCH64_CORE("ares",  ares, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | 
AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_PROFILE, cortexa72, 
0x41, 0xd0c, -1)
  
  /* HiSilicon ('H') cores. */

-AARCH64_CORE("tsv110",  tsv110, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | 
AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, tsv110,   0x48, 
0xd01, -1)
+AARCH64_CORE("tsv110",  tsv110, tsv110, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | 
AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, tsv110,   0x48, 
0xd01, -1)
  
  /* ARMv8.4-A Architecture Processors.  */
  
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md

index 513aec1..97e0703 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -356,6 +356,7 @@
  (include "thunderx.md")
  (include "../arm/xgene1.md")
  (include "thunderx2t99.md")
+(include "tsv110.md")
  
  ;; ---

  ;; Jumps and other miscellaneous insns
diff --git a/gcc/config/aarch64/tsv110.md b/gcc/config/aarch64/tsv110.md
new file mode 100644
index 000..e33c5cc
--- /dev/null
+++ b/gcc/config/aarch64/tsv110.md
@@ -0,0 +1,708 @@
+;; tsv110 pipeline description
+;; Copyright (C) 2018 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_automaton "tsv110")
+
+(define_attr "tsv110_neon_type"
+  "neon_arith_acc, neon_arith_acc_q,
+   neon_arith_basic, neon_arith_complex,
+   neon_reduc_add_acc, neon_multiply, neon_multiply_q,
+   neon_multiply_long, neon_mla, neon_mla_q, neon_mla_long,
+   neon_sat_mla_long, neon_shift_acc, neon_shift_imm_basic,
+   neon_shift_imm_complex,
+   neon_shift_reg_basic, neon_shift_reg_basic_q, neon_shift_reg_complex,
+   neon_shift_reg_complex_q, neon_fp_negabs, neon_fp_arith,
+   neon_fp_arith_q, neon_fp_reductions_q, neon_fp_cvt_int,
+   neon_fp_cvt_int_q, neon_fp_cvt16, neon_fp_minmax, neon_fp_mul,
+   neon_fp_mul_q, neon_fp_mla, neon_fp_mla_q, neon_fp_recpe_rsqrte,
+   neon_fp_recpe_rsqrte_q, neon_fp_recps_rsqrts, neon_fp_recps_rsqrts_q,
+   neon_bitops, neon_bitops_q, neon_from_gp,
+   neon_from_gp_q, neon_move, neon_tbl3_tbl4, neon_zip_q, neon_to_gp,
+   neon_load_a, neon_load_b, neon_load_c, neon_load_d, neon_load_e,
+   neon_load_f, neon_store_a, neon_store_b, neon_store_complex,
+   unknown"
+  (cond [
+ (eq_attr "type" "neon_arith_acc, neon_reduc_add_acc,\
+  neon_reduc_add_acc_q")
+   (const_string "neon_arith_acc")
+ (eq_attr "type" "neon_arith_acc_q")
+   (const_string "neon_arith_acc_q")
+ (eq_attr "type" "neon_abs,neon_abs_q,neon_add, neon_add_q, 
neon_add_long,\
+  neon_add_widen, neon_neg, neon_neg_q,\
+  neon_reduc_add, neon_reduc_add_q,\
+  neon_reduc_add_long, neon_sub, 

Re: add tsv110 pipeline scheduling

2019-01-14 Thread wuyuan (E)
Hi Kyrill,
 GCC 7.3.0 still has store1 and load1; I did not expect the changes in the 
community's latest GCC to be so large.
 I have now downloaded the latest GCC code and applied the patch to it, and 
the compiler builds successfully. Thank you very much for your work!

Best Regards,

wuyuan


  * config/aarch64/aarch64-cores.def (tsv1100): Change scheduling model.
  * config/aarch64/aarch64.md : Add "tsv110.md"
  * config/aarch64/tsv110.md: New file.

diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 70b0766..085c40f 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -103,7 +103,7 @@ AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2
 AARCH64_CORE("ares",  ares, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | 
AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_PROFILE, 
cortexa72, 0x41, 0xd0c, -1)
 
 /* HiSilicon ('H') cores. */
-AARCH64_CORE("tsv110",  tsv110, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | 
AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, tsv110,  
 0x48, 0xd01, -1)
+AARCH64_CORE("tsv110",  tsv110, tsv110, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | 
AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, tsv110,  
 0x48, 0xd01, -1)
 
 /* ARMv8.4-A Architecture Processors.  */
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 513aec1..97e0703 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -356,6 +356,7 @@
 (include "thunderx.md")
 (include "../arm/xgene1.md")
 (include "thunderx2t99.md")
+(include "tsv110.md")
 
 ;; ---
 ;; Jumps and other miscellaneous insns
diff --git a/gcc/config/aarch64/tsv110.md b/gcc/config/aarch64/tsv110.md
new file mode 100644
index 000..e33c5cc
--- /dev/null
+++ b/gcc/config/aarch64/tsv110.md
@@ -0,0 +1,708 @@
+;; tsv110 pipeline description
+;; Copyright (C) 2018 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_automaton "tsv110")
+
+(define_attr "tsv110_neon_type"
+  "neon_arith_acc, neon_arith_acc_q,
+   neon_arith_basic, neon_arith_complex,
+   neon_reduc_add_acc, neon_multiply, neon_multiply_q,
+   neon_multiply_long, neon_mla, neon_mla_q, neon_mla_long,
+   neon_sat_mla_long, neon_shift_acc, neon_shift_imm_basic,
+   neon_shift_imm_complex,
+   neon_shift_reg_basic, neon_shift_reg_basic_q, neon_shift_reg_complex,
+   neon_shift_reg_complex_q, neon_fp_negabs, neon_fp_arith,
+   neon_fp_arith_q, neon_fp_reductions_q, neon_fp_cvt_int,
+   neon_fp_cvt_int_q, neon_fp_cvt16, neon_fp_minmax, neon_fp_mul,
+   neon_fp_mul_q, neon_fp_mla, neon_fp_mla_q, neon_fp_recpe_rsqrte,
+   neon_fp_recpe_rsqrte_q, neon_fp_recps_rsqrts, neon_fp_recps_rsqrts_q,
+   neon_bitops, neon_bitops_q, neon_from_gp,
+   neon_from_gp_q, neon_move, neon_tbl3_tbl4, neon_zip_q, neon_to_gp,
+   neon_load_a, neon_load_b, neon_load_c, neon_load_d, neon_load_e,
+   neon_load_f, neon_store_a, neon_store_b, neon_store_complex,
+   unknown"
+  (cond [
+ (eq_attr "type" "neon_arith_acc, neon_reduc_add_acc,\
+  neon_reduc_add_acc_q")
+   (const_string "neon_arith_acc")
+ (eq_attr "type" "neon_arith_acc_q")
+   (const_string "neon_arith_acc_q")
+ (eq_attr "type" "neon_abs,neon_abs_q,neon_add, neon_add_q, 
neon_add_long,\
+  neon_add_widen, neon_neg, neon_neg_q,\
+  neon_reduc_add, neon_reduc_add_q,\
+  neon_reduc_add_long, neon_sub, neon_sub_q,\
+  neon_sub_long, neon_sub_widen, neon_logic,\
+  neon_logic_q, neon_tst, neon_tst_q,\
+  neon_compare, neon_compare_q,\
+  neon_compare_zero, neon_compare_zero_q,\
+  neon_minmax, neon_minmax_q, neon_reduc_minmax,\
+  neon_reduc_minmax_q")
+   (const_string 

Re: add tsv110 pipeline scheduling

2019-01-14 Thread Kyrill Tkachov
+(define_insn_reservation
+  "tsv110_neon_ld2" 8
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "neon_load1_2reg,neon_load1_2reg_q,\
+  neon_load2_2reg,neon_load2_2reg_q,neon_load2_all_lanes,\
+  neon_load2_all_lanes_q,neon_load2_one_lane,neon_load2_one_lane_q"))
+  "((tsv110_ls1*4)|(tsv110_ls2*4)|(tsv110_fsu1*4)|(tsv110_fsu2*4))")
+
+(define_insn_reservation
+  "tsv110_neon_ld3" 9
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "neon_load3_3reg,neon_load3_3reg_q,\
+  neon_load3_one_lane,neon_load3_one_lane_q,\
+  neon_load3_all_lanes,neon_load3_all_lanes_q"))
+  "((tsv110_ls1*6)|(tsv110_ls2*6)|(tsv110_fsu1*6)|(tsv110_fsu2*6))")
+
+(define_insn_reservation
+  "tsv110_neon_ld4_lane" 9
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "neon_load4_all_lanes,neon_load4_all_lanes_q,\
+  neon_load4_one_lane,neon_load4_one_lane_q"))
+  "((tsv110_ls1*8)|(tsv110_ls2*8)|(tsv110_fsu1*8)|(tsv110_fsu2*8))")
+
+(define_insn_reservation
+  "tsv110_neon_ld4_reg" 11
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "neon_load4_all_lanes,neon_load4_all_lanes_q,\
+  neon_load4_one_lane,neon_load4_one_lane_q"))
+  "((tsv110_ls1*8)|(tsv110_ls2*8)|(tsv110_fsu1*8)|(tsv110_fsu2*8))")
+
+;; Store Instructions.
+
+(define_insn_reservation
+  "tsv110_neon_store_a" 0
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "tsv110_neon_type" "neon_store_a"))
+  "tsv110_fsu1|tsv110_fsu2")
+
+(define_insn_reservation
+  "tsv110_neon_store_b" 0
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "tsv110_neon_type" "neon_store_b"))
+  "(tsv110_fsu1|tsv110_fsu2)")
+
+;; These block issue for a number of cycles proportional to the number
+;; of 64-bit chunks they will store, we don't attempt to model that
+;; precisely, treat them as blocking execution for two cycles when
+;; issued.
+(define_insn_reservation
+  "tsv110_neon_store_complex" 0
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "tsv110_neon_type" "neon_store_complex"))
+  "tsv110_block*2")
+
+;; Floating-Point Operations.
+
+(define_insn_reservation "tsv110_fp_const" 2
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "fconsts,fconstd,fmov"))
+  "(tsv110_fsu1|tsv110_fsu2)")
+
+(define_insn_reservation "tsv110_fp_add_sub" 5
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "fadds,faddd,fmuls,fmuld"))
+  "(tsv110_fsu1|tsv110_fsu2)")
+
+(define_insn_reservation "tsv110_fp_mac" 7
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "fmacs,ffmas,fmacd,ffmad"))
+  "(tsv110_fsu1|tsv110_fsu2)")
+
+(define_insn_reservation "tsv110_fp_cvt" 3
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "f_cvt"))
+  "(tsv110_fsu1|tsv110_fsu2)")
+
+(define_insn_reservation "tsv110_fp_cvtf2i" 4
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "f_cvtf2i"))
+  "(tsv110_fsu1)")
+
+(define_insn_reservation "tsv110_fp_cvti2f" 4
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "f_cvti2f"))
+  "((tsv110_alu1*3)|(tsv110_fsu1*3)|(tsv110_fsu2*3))")
+
+(define_insn_reservation "tsv110_fp_cmp" 4
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "fcmps,fcmpd"))
+  "(tsv110_fsu1|tsv110_fsu2)")
+
+(define_insn_reservation "tsv110_fp_arith" 2
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "ffariths,ffarithd"))
+  "(tsv110_fsu1|tsv110_fsu2)")
+
+(define_insn_reservation "tsv110_fp_divs" 12
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "fdivs,neon_fp_div_s,fdivd,neon_fp_div_d,\
+  neon_fp_div_s_q,neon_fp_div_d_q"))
+  "(tsv110_fsu1*8)")
+
+(define_insn_reservation "tsv110_fp_sqrts" 12
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "fsqrts,neon_fp_sqrt_s,fsqrtd,neon_fp_sqrt_d,\
+  neon_fp_sqrt_s_q,neon_fp_sqrt_d_q"))
+  "(tsv110_fsu2*8)")
+
+(define_insn_reservation "tsv110_crypto_aes" 3
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "crypto_aese,crypto_aesmc"))
+  "tsv110_fsu1"

Re: add tsv110 pipeline scheduling

2019-01-08 Thread Kyrill Tkachov
ad1_2reg_q,\
+ neon_load2_2reg,neon_load2_2reg_q,neon_load2_all_lanes,\
+ neon_load2_all_lanes_q,neon_load2_one_lane,neon_load2_one_lane_q"))
+ "((tsv110_ls1*4)|(tsv110_ls2*4)|(tsv110_fsu1*4)|(tsv110_fsu2*4))")
+
+(define_insn_reservation
+  "tsv110_neon_ld3" 9
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "neon_load3_3reg,neon_load3_3reg_q,\
+  neon_load3_one_lane,neon_load3_one_lane_q,\
+  neon_load3_all_lanes,neon_load3_all_lanes_q"))
+ "((tsv110_ls1*6)|(tsv110_ls2*6)|(tsv110_fsu1*6)|(tsv110_fsu2*6))")
+
+(define_insn_reservation
+  "tsv110_neon_ld4_lane" 9
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "neon_load4_all_lanes,neon_load4_all_lanes_q,\
+  neon_load4_one_lane,neon_load4_one_lane_q"))
+ "((tsv110_ls1*8)|(tsv110_ls2*8)|(tsv110_fsu1*8)|(tsv110_fsu2*8))")
+
+(define_insn_reservation
+  "tsv110_neon_ld4_reg" 11
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "neon_load4_all_lanes,neon_load4_all_lanes_q,\
+  neon_load4_one_lane,neon_load4_one_lane_q"))
+
+"((tsv110_ls1*16)|(tsv110_ls2*16)|(tsv110_fsu1*16)|(tsv110_fsu2*16))")


With the above two bugs fixed, I am concerned that this automaton is much
larger than other automata in config/aarch64. This hurts GCC compile time and
memory requirements. We've had bug reports in the past where people were not
able to build GCC on memory-constrained systems due to these issues.
You can check the size of the generated automata during build time by adding
(automata_option "stats") to your .md file. With this, the tsv110 automaton
size is 38017 states, more than 5x the size of the next largest automaton
(cortex_a53_advsimd).

This is usually due to unnecessarily large reservation durations (the *16 part
above) on long-running instructions such as divisions (integer and
floating-point) and ld4 instructions, such as this one. If you use only a
maximum of 8 in the reservation duration here and in the division
instructions, you get a much smaller automaton size (I see 7681 states if I
change it to 8 here and in tsv110_div, tsv110_fp_sqrts and tsv110_fp_divs).

Because 8 cycles is such a large scheduling window anyway, it is unlikely that
modelling the full 16 cycles will give any benefit in real-world code.
That has been our experience in the past.

So I recommend you modify the model to use only a maximum of 8 in its
reservation durations.
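
As a rough sketch of that change (an illustration only, not necessarily the
form that was committed), the ld4 reservation quoted above could be capped
like this. The latency of 11, the type list and the unit names are copied
unchanged from the quoted hunk; only the *16 factors become *8. The
automata_option line is assumed to be a temporary addition, kept only while
measuring the automaton size:

```
;; Temporary: ask genautomata to print automaton statistics during the build.
(automata_option "stats")

;; Same latency (11) as in the quoted hunk; only the reservation
;; duration is reduced from 16 cycles to 8.
(define_insn_reservation
  "tsv110_neon_ld4_reg" 11
  (and (eq_attr "tune" "tsv110")
       (eq_attr "type" "neon_load4_all_lanes,neon_load4_all_lanes_q,\
                        neon_load4_one_lane,neon_load4_one_lane_q"))
  "((tsv110_ls1*8)|(tsv110_ls2*8)|(tsv110_fsu1*8)|(tsv110_fsu2*8))")
```

The insn latency still tells the scheduler when the result is available; the
shorter reservation only limits how long the units are modelled as busy.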

Hope this helps,
Kyrill


+
+;; Store Instructions.
+
+(define_insn_reservation
+  "tsv110_neon_store_a" 0
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "tsv110_neon_type" "neon_store_a"))
+  "tsv110_fsu1|tsv110_fsu2")
+
+(define_insn_reservation
+  "tsv110_neon_store_b" 0
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "tsv110_neon_type" "neon_store_b"))
+  "(tsv110_fsu1|tsv110_fsu2)")
+
+;; These block issue for a number of cycles proportional to the number
+;; of 64-bit chunks they will store, we don't attempt to model that
+;; precisely, treat them as blocking execution for two cycles when
+;; issued.
+(define_insn_reservation
+  "tsv110_neon_store_complex" 0
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "tsv110_neon_type" "neon_store_complex"))
+  "tsv110_block*2")
+
+;; Floating-Point Operations.
+
+(define_insn_reservation "tsv110_fp_const" 2
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "fconsts,fconstd,fmov"))
+  "(tsv110_fsu1|tsv110_fsu2)")
+
+(define_insn_reservation "tsv110_fp_add_sub" 5
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "fadds,faddd,fmuls,fmuld"))
+  "(tsv110_fsu1|tsv110_fsu2)")
+
+(define_insn_reservation "tsv110_fp_mac" 7
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "fmacs,ffmas,fmacd,ffmad"))
+  "(tsv110_fsu1|tsv110_fsu2)")
+
+(define_insn_reservation "tsv110_fp_cvt" 3
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "f_cvt"))
+  "(tsv110_fsu1|tsv110_fsu2)")
+
+(define_insn_reservation "tsv110_fp_cvtf2i" 4
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "f_cvtf2i"))
+  "(tsv110_fsu1)")
+
+(define_insn_reservation "tsv110_fp_cvti2f" 4
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "f_cvti2f"))
+  "((tsv110_alu1*3)|(tsv110_fsu1*3)|(tsv110_fsu2*3))")
+
+(define_insn_reservation "tsv110_fp_cmp" 4
+  (and (eq_attr "tune"

Re: add tsv110 pipeline scheduling

2019-01-08 Thread wuyuan (E)
q,neon_load2_all_lanes,\
+  neon_load2_all_lanes_q,neon_load2_one_lane,neon_load2_one_lane_q"))
+  "((tsv110_ls1*4)|(tsv110_ls2*4)|(tsv110_fsu1*4)|(tsv110_fsu2*4))")
+
+(define_insn_reservation
+  "tsv110_neon_ld3" 9
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "neon_load3_3reg,neon_load3_3reg_q,\
+  neon_load3_one_lane,neon_load3_one_lane_q,\
+  neon_load3_all_lanes,neon_load3_all_lanes_q"))
+  "((tsv110_ls1*6)|(tsv110_ls2*6)|(tsv110_fsu1*6)|(tsv110_fsu2*6))")
+
+(define_insn_reservation
+  "tsv110_neon_ld4_lane" 9
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "neon_load4_all_lanes,neon_load4_all_lanes_q,\
+  neon_load4_one_lane,neon_load4_one_lane_q"))
+  "((tsv110_ls1*8)|(tsv110_ls2*8)|(tsv110_fsu1*8)|(tsv110_fsu2*8))")
+
+(define_insn_reservation
+  "tsv110_neon_ld4_reg" 11
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "neon_load4_all_lanes,neon_load4_all_lanes_q,\
+  neon_load4_one_lane,neon_load4_one_lane_q"))
+  
+"((tsv110_ls1*16)|(tsv110_ls2*16)|(tsv110_fsu1*16)|(tsv110_fsu2*16))")
+
+;; Store Instructions.
+
+(define_insn_reservation
+  "tsv110_neon_store_a" 0
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "tsv110_neon_type" "neon_store_a"))
+  "tsv110_fsu1|tsv110_fsu2")
+
+(define_insn_reservation
+  "tsv110_neon_store_b" 0
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "tsv110_neon_type" "neon_store_b"))
+  "(tsv110_fsu1|tsv110_fsu2)")
+
+;; These block issue for a number of cycles proportional to the number
+;; of 64-bit chunks they will store, we don't attempt to model that
+;; precisely, treat them as blocking execution for two cycles when
+;; issued.
+(define_insn_reservation
+  "tsv110_neon_store_complex" 0
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "tsv110_neon_type" "neon_store_complex"))
+  "tsv110_block*2")
+
+;; Floating-Point Operations.
+
+(define_insn_reservation "tsv110_fp_const" 2
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "fconsts,fconstd,fmov"))
+  "(tsv110_fsu1|tsv110_fsu2)")
+
+(define_insn_reservation "tsv110_fp_add_sub" 5
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "fadds,faddd,fmuls,fmuld"))
+  "(tsv110_fsu1|tsv110_fsu2)")
+
+(define_insn_reservation "tsv110_fp_mac" 7
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "fmacs,ffmas,fmacd,ffmad"))
+  "(tsv110_fsu1|tsv110_fsu2)")
+
+(define_insn_reservation "tsv110_fp_cvt" 3
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "f_cvt"))
+  "(tsv110_fsu1|tsv110_fsu2)")
+
+(define_insn_reservation "tsv110_fp_cvtf2i" 4
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "f_cvtf2i"))
+  "(tsv110_fsu1)")
+
+(define_insn_reservation "tsv110_fp_cvti2f" 4
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "f_cvti2f"))
+  "((tsv110_alu1*3)|(tsv110_fsu1*3)|(tsv110_fsu2*3))")
+
+(define_insn_reservation "tsv110_fp_cmp" 4
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "fcmps,fcmpd"))
+  "(tsv110_fsu1|tsv110_fsu2)")
+
+(define_insn_reservation "tsv110_fp_arith" 2
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "ffariths,ffarithd"))
+  "(tsv110_fsu1|tsv110_fsu2)")
+
+(define_insn_reservation "tsv110_fp_divs" 12
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "fdivs,neon_fp_div_s,fdivd,neon_fp_div_d,\
+  neon_fp_div_s_q,neon_fp_div_d_q"))
+  "(tsv110_fsu1*12)")
+
+(define_insn_reservation "tsv110_fp_sqrts" 12
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "fsqrts,neon_fp_sqrt_s,fsqrtd,neon_fp_sqrt_d,\
+  neon_fp_sqrt_s_q,neon_fp_sqrt_d_q"))
+  "(tsv110_fsu2*12)")
+
+(define_insn_reservation "tsv110_crypto_aes" 3
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "crypto_aese,crypto_aesmc"))
+  "tsv110_fsu1")
+  
+(define_insn_reservation "tsv110_crypto_sha1_fast" 2
+  (and (eq_attr "tune" "tsv110")
+   (eq_attr "type" "crypto_sha1_fast,crypto_sha1_xor"

Re: add tsv110 pipeline scheduling

2019-01-02 Thread wuyuan (E)
Hi guys,
  Happy new year!
  On the 20th of last month, I submitted a tsv110 pipeline patch. I
would like to know whether you have received it. Looking forward to your reply.

Best Regards,
wuyuan

-Original Message-
From: wuyuan (E) 
Sent: 20 December 2018 14:06
To: 'Ramana Radhakrishnan' ; 
'gcc-patches@gcc.gnu.org' 
Cc: Zhanghaijian (A) ; Zhangyichao (AB) 
; Yangfei (Felix) ; 
'ni...@redhat.com' ; 'Richard Earnshaw' 
; 'Kyrylo Tkachov' ; 'nd' 
; Zhangshaokun 
Subject: Re: add tsv110 pipeline scheduling


Hi Ramana,
 Please ignore the patch attached to my previous email (the ChangeLog
change has been removed from this patch). I have already talked to Shao Kun,
and he has fixed the problem with the previous patch, so I am resubmitting the
tsv110 pipeline patch. Please review.
 The patch is as follows:



2018-12-20   wuyuan  

* config/aarch64/aarch64-cores.def: New CPU.
* config/aarch64/aarch64.md : Add "tsv110.md"
* config/aarch64/tsv110.md : tsv110.md   new file






diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
old mode 100644
new mode 100755
index 20f4924..ea9b7c5
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -97,7 +97,7 @@ AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2
 AARCH64_CORE("ares",  ares, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | 
AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_PROFILE, 
cortexa72, 0x41, 0xd0c, -1)
 
 /* HiSilicon ('H') cores. */
-AARCH64_CORE("tsv110",  tsv110, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | 
AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, tsv110,  
 0x48, 0xd01, -1)
+AARCH64_CORE("tsv110",  tsv110, tsv110, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | 
AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, tsv110,  
 0x48, 0xd01, -1)
 
 /* ARMv8.4-A Architecture Processors.  */
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
old mode 100644
new mode 100755
index cf2732e..7f7673a
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -349,6 +349,7 @@
 (include "thunderx.md")
 (include "../arm/xgene1.md")
 (include "thunderx2t99.md")
+(include "tsv110.md")
 
 ;; ---
 ;; Jumps and other miscellaneous insns
diff --git a/gcc/config/aarch64/tsv110.md b/gcc/config/aarch64/tsv110.md
new file mode 100644
index 000..758ab95
--- /dev/null
+++ b/gcc/config/aarch64/tsv110.md
@@ -0,0 +1,708 @@
+;; tsv110 pipeline description
+;; Copyright (C) 2018 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_automaton "tsv110")
+
+(define_attr "tsv110_neon_type"
+  "neon_arith_acc, neon_arith_acc_q,
+   neon_arith_basic, neon_arith_complex,
+   neon_reduc_add_acc, neon_multiply, neon_multiply_q,
+   neon_multiply_long, neon_mla, neon_mla_q, neon_mla_long,
+   neon_sat_mla_long, neon_shift_acc, neon_shift_imm_basic,
+   neon_shift_imm_complex,
+   neon_shift_reg_basic, neon_shift_reg_basic_q, neon_shift_reg_complex,
+   neon_shift_reg_complex_q, neon_fp_negabs, neon_fp_arith,
+   neon_fp_arith_q, neon_fp_reductions_q, neon_fp_cvt_int,
+   neon_fp_cvt_int_q, neon_fp_cvt16, neon_fp_minmax, neon_fp_mul,
+   neon_fp_mul_q, neon_fp_mla, neon_fp_mla_q, neon_fp_recpe_rsqrte,
+   neon_fp_recpe_rsqrte_q, neon_fp_recps_rsqrts, neon_fp_recps_rsqrts_q,
+   neon_bitops, neon_bi

Re: add tsv110 pipeline scheduling

2018-12-19 Thread wuyuan (E)

Hi Ramana,
 Please ignore the patch attached to my previous email (the ChangeLog
change has been removed from this patch). I have already talked to Shao Kun,
and he has fixed the problem with the previous patch, so I am resubmitting the
tsv110 pipeline patch. Please review.
 The patch is as follows:



2018-12-20   wuyuan  

* config/aarch64/aarch64-cores.def: New CPU.
* config/aarch64/aarch64.md : Add "tsv110.md"
* config/aarch64/tsv110.md : tsv110.md   new file






diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
old mode 100644
new mode 100755
index 20f4924..ea9b7c5
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -97,7 +97,7 @@ AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2
 AARCH64_CORE("ares",  ares, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | 
AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_PROFILE, 
cortexa72, 0x41, 0xd0c, -1)
 
 /* HiSilicon ('H') cores. */
-AARCH64_CORE("tsv110",  tsv110, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | 
AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, tsv110,  
 0x48, 0xd01, -1)
+AARCH64_CORE("tsv110",  tsv110, tsv110, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | 
AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, tsv110,  
 0x48, 0xd01, -1)
 
 /* ARMv8.4-A Architecture Processors.  */
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
old mode 100644
new mode 100755
index cf2732e..7f7673a
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -349,6 +349,7 @@
 (include "thunderx.md")
 (include "../arm/xgene1.md")
 (include "thunderx2t99.md")
+(include "tsv110.md")
 
 ;; ---
 ;; Jumps and other miscellaneous insns
diff --git a/gcc/config/aarch64/tsv110.md b/gcc/config/aarch64/tsv110.md
new file mode 100644
index 000..758ab95
--- /dev/null
+++ b/gcc/config/aarch64/tsv110.md
@@ -0,0 +1,708 @@
+;; tsv110 pipeline description
+;; Copyright (C) 2018 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_automaton "tsv110")
+
+(define_attr "tsv110_neon_type"
+  "neon_arith_acc, neon_arith_acc_q,
+   neon_arith_basic, neon_arith_complex,
+   neon_reduc_add_acc, neon_multiply, neon_multiply_q,
+   neon_multiply_long, neon_mla, neon_mla_q, neon_mla_long,
+   neon_sat_mla_long, neon_shift_acc, neon_shift_imm_basic,
+   neon_shift_imm_complex,
+   neon_shift_reg_basic, neon_shift_reg_basic_q, neon_shift_reg_complex,
+   neon_shift_reg_complex_q, neon_fp_negabs, neon_fp_arith,
+   neon_fp_arith_q, neon_fp_reductions_q, neon_fp_cvt_int,
+   neon_fp_cvt_int_q, neon_fp_cvt16, neon_fp_minmax, neon_fp_mul,
+   neon_fp_mul_q, neon_fp_mla, neon_fp_mla_q, neon_fp_recpe_rsqrte,
+   neon_fp_recpe_rsqrte_q, neon_fp_recps_rsqrts, neon_fp_recps_rsqrts_q,
+   neon_bitops, neon_bitops_q, neon_from_gp,
+   neon_from_gp_q, neon_move, neon_tbl3_tbl4, neon_zip_q, neon_to_gp,
+   neon_load_a, neon_load_b, neon_load_c, neon_load_d, neon_load_e,
+   neon_load_f, neon_store_a, neon_store_b, neon_store_complex,
+   unknown"
+  (cond [
+ (eq_attr "type" "neon_arith_acc, neon_reduc_add_acc,\
+  neon_reduc_add_acc_q")
+   (const_string "neon_arith_acc")
+ (eq_attr "type" "neon_arith_acc_q")
+   (const_string "neon_arith_acc_q")
+ (eq_attr "type" "neon_abs,neon_abs_q,neon_add, neon_add_q, neon_add_long,\
+  neon_add_widen, neon_neg, neon_neg_q,\
+  neon_reduc_add, neon_reduc_add_q,\
+  neon_reduc_add_long, neon_sub, neon_sub_q,\
+  neon_sub_long, neon_sub_widen, neon_logic,\
+  neon_logic_q, neon_tst, neon_tst_q,\
+  neon_compare, neon_compare_q,\
+  neon_compare_zero, neon_compare_zero_q,\
+  neon_minmax, neon_minmax_q, neon_reduc_minmax,\
+  neon_reduc_minmax_q")
+   (const_string "neon_arith_basic")
+ (eq_attr "type" "neon_add_halve_narrow_q,\
+  neon_add_halve, neon_add_halve_q,\
+  neon_sub_halve, 

Re: add tsv110 pipeline scheduling

2018-12-19 Thread wuyuan (E)
Hi Ramana,
  I have already talked to Shao Kun, and he has fixed the problem with
the previous patch, so I am resubmitting the tsv110 pipeline patch. Please
review.
 The patch is as follows:

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
old mode 100644
new mode 100755
index b1eed3b..5611dd0
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2018-12-20  wuyuan
+
+   * config/aarch64/aarch64-cores.def: New CPU.
+   * config/aarch64/aarch64.md : Add "tsv110.md"
+   * config/aarch64/tsv110.md : tsv110.md   new file
+
 2018-12-20  Alan Modra  
 
* config/rs6000/sysv4.h (GNU_USER_DYNAMIC_LINKER): Define.
diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
old mode 100644
new mode 100755
index 20f4924..ea9b7c5
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -97,7 +97,7 @@ AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2
 AARCH64_CORE("ares",  ares, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | 
AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_PROFILE, 
cortexa72, 0x41, 0xd0c, -1)
 
 /* HiSilicon ('H') cores. */
-AARCH64_CORE("tsv110",  tsv110, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | 
AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, tsv110,  
 0x48, 0xd01, -1)
+AARCH64_CORE("tsv110",  tsv110, tsv110, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | 
AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, tsv110,  
 0x48, 0xd01, -1)
 
 /* ARMv8.4-A Architecture Processors.  */
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
old mode 100644
new mode 100755
index cf2732e..7f7673a
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -349,6 +349,7 @@
 (include "thunderx.md")
 (include "../arm/xgene1.md")
 (include "thunderx2t99.md")
+(include "tsv110.md")
 
 ;; ---
 ;; Jumps and other miscellaneous insns
diff --git a/gcc/config/aarch64/tsv110.md b/gcc/config/aarch64/tsv110.md
new file mode 100644
index 000..758ab95
--- /dev/null
+++ b/gcc/config/aarch64/tsv110.md
@@ -0,0 +1,708 @@
+;; tsv110 pipeline description
+;; Copyright (C) 2018 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_automaton "tsv110")
+
+(define_attr "tsv110_neon_type"
+  "neon_arith_acc, neon_arith_acc_q,
+   neon_arith_basic, neon_arith_complex,
+   neon_reduc_add_acc, neon_multiply, neon_multiply_q,
+   neon_multiply_long, neon_mla, neon_mla_q, neon_mla_long,
+   neon_sat_mla_long, neon_shift_acc, neon_shift_imm_basic,
+   neon_shift_imm_complex,
+   neon_shift_reg_basic, neon_shift_reg_basic_q, neon_shift_reg_complex,
+   neon_shift_reg_complex_q, neon_fp_negabs, neon_fp_arith,
+   neon_fp_arith_q, neon_fp_reductions_q, neon_fp_cvt_int,
+   neon_fp_cvt_int_q, neon_fp_cvt16, neon_fp_minmax, neon_fp_mul,
+   neon_fp_mul_q, neon_fp_mla, neon_fp_mla_q, neon_fp_recpe_rsqrte,
+   neon_fp_recpe_rsqrte_q, neon_fp_recps_rsqrts, neon_fp_recps_rsqrts_q,
+   neon_bitops, neon_bitops_q, neon_from_gp,
+   neon_from_gp_q, neon_move, neon_tbl3_tbl4, neon_zip_q, neon_to_gp,
+   neon_load_a, neon_load_b, neon_load_c, neon_load_d, neon_load_e,
+   neon_load_f, neon_store_a, neon_store_b, neon_store_complex,
+   unknown"
+  (cond [
+ (eq_attr "type" "neon_arith_acc, neon_reduc_add_acc,\
+  neon_reduc_add_acc_q")
+   (const_string "neon_arith_acc")
+ (eq_attr "type" "neon_arith_acc_q")
+   (const_string "neon_arith_acc_q")
+ (eq_attr "type" "neon_abs,neon_abs_q,neon_add, neon_add_q, neon_add_long,\
+  neon_add_widen, neon_neg, neon_neg_q,\
+  neon_reduc_add, neon_reduc_add_q,\
+  neon_reduc_add_long, neon_sub, neon_sub_q,\
+  neon_sub_long, neon_sub_widen, neon_logic,\
+  neon_logic_q, neon_tst, neon_tst_q,\
+  neon_compare, neon_compare_q,\
+  neon_compare_zero, neon_compare_zero_q,\
+  neon_minmax, neon_minmax_q, neon_reduc_minmax,\
+  neon_reduc_minmax_q")
+   (const_string "neon_arith_basic")
+ (eq_attr "type"