https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79742
            Bug ID: 79742
           Summary: ARM sched pipeline selection problems
           Product: gcc
           Version: 7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wilson at gcc dot gnu.org
  Target Milestone: ---

I noticed an arm port scheduler problem. I believe it was introduced by Richard Earnshaw's patch to add the arm-cpus.in file. There are also some apparent latent problems exposed by that patch.

The main problem is that the "tune for" lines for cpu entries are not working. In the parsecpu.awk script, cpu_tune_for is set, but never used. This means that cpu entries that use "tune for" lines are not using the desired scheduler pipeline.

With the testcase

int
sub1 (int i, int j)
{
  return i + j;
}

float
sub2 (float i, float j)
{
  return i + j;
}

Compiling with

./xgcc -B./ -O2 -mcpu=cortex-a12 -fsched-verbose=9 -fdump-rtl-sched1-all -S tmp.c

and looking at the sched1 dump file, I see for the int add

;;      |   12 |    0 | r0=r0+r1                       nothing

and for the float add I see

;;      |   12 |    0 | s0=s0+s1                       fmac

So this is using no pipeline for integer code, and the vfp11.md pipeline for float code. This should be using the cortex-a17 pipeline.

If I compile instead with -mcpu=cortex-a72, I see for the int add

;;      |   12 |    0 | r0=r0+r1                       core

and for the float add I see

;;      |   12 |    0 | s0=s0+s1                       core*32

So this is using the arm-generic pipeline for both the int and float add. This should be using the cortex-a57 pipeline.

Both of these work correctly in GCC 6.

I see two apparent latent issues here.

The generic_sched and generic_vfp attributes in the arm.md file haven't been updated properly, and some targets may be using them accidentally. All targets with their own pipeline should be added to the exclusion lists for these attributes.

Also, there is a conflict between the generic_sched and generic_vfp pipelines. arm-generic.md has a rule to match multi-cycle instructions, which also happens to match FP instructions.
So it is not possible to use both generic_sched and generic_vfp, as FP instructions will use the multi-cycle rule in arm-generic.md instead of the appropriate rule in vfp11.md. This can be seen in the above results: compiling for cortex-a12 uses the fmac rule from the vfp11.md file for a float add, because cortex-a12 is in the exclusion list for generic_sched, while compiling for cortex-a72 uses the core*32 rule from the arm-generic.md file for a float add, because cortex-a72 is not in the exclusion list for generic_sched. Of course, cortex-a12 and cortex-a72 should not be using the generic rules at all, but it does appear that there are some targets that use both generic_sched and generic_vfp, such as the armv6 parts like mpcore.
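
To make the interaction between the two attributes concrete, here is a rough C model of the behaviour described above. It is only a sketch, not the actual machine description logic: the struct, the enum, and the reservation() function are hypothetical names invented for illustration, and the precedence (the arm-generic.md multi-cycle rule winning over vfp11.md whenever generic_sched is enabled) is an assumption chosen to match the dump output shown earlier.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the generic_sched and generic_vfp
   attributes in arm.md.  Each is assumed to be true unless the tune
   target appears on that attribute's exclusion list.  */
struct tune_attrs
{
  bool generic_sched;
  bool generic_vfp;
};

/* The two instruction kinds from the testcase.  */
enum insn_kind
{
  INSN_INT_ADD,
  INSN_FLOAT_ADD
};

/* Return the reservation name that the sched1 dump would show under
   this simplified model.  It covers only the generic pipelines and
   therefore models the buggy case, where no dedicated pipeline
   (cortex-a17, cortex-a57, ...) has been selected.  */
const char *
reservation (struct tune_attrs attrs, enum insn_kind kind)
{
  if (attrs.generic_sched)
    {
      /* Assumption: the multi-cycle rule in arm-generic.md also matches
         FP instructions, so it shadows the vfp11.md rule.  */
      return kind == INSN_FLOAT_ADD ? "core*32" : "core";
    }

  if (attrs.generic_vfp && kind == INSN_FLOAT_ADD)
    return "fmac";	/* The vfp11.md rule.  */

  return "nothing";	/* No pipeline description applies.  */
}
```

Under this model, a target excluded from generic_sched but not from generic_vfp (as cortex-a12 appears to be) gets "nothing" for the int add and "fmac" for the float add. A target with generic_sched enabled (as cortex-a72 appears to be) gets "core" and "core*32". A target with both attributes enabled, such as mpcore, would also get "core*32" for the float add, which is the conflict described above.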