[Bug target/99092] Using -O3 and -fprefetch-loop-arrays to compile BLAS on Apple M1 fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99092

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://github.com/llvm/llvm-project/issues/83226

--- Comment #16 from Andrew Pinski ---
Filed it as https://github.com/llvm/llvm-project/issues/83226 .
[Bug target/99092] Using -O3 and -fprefetch-loop-arrays to compile BLAS on Apple M1 fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99092

--- Comment #15 from Andrew Pinski ---
(In reply to Iain Sandoe from comment #14)
> (In reply to Andrew Pinski from comment #13)
> > Did the LLVM assembler get fixed?
>
> not as of xcode 13.0 (I don't know if anyone filed a radar tho) - since the
> problem was fixed on the branch, I guess no-one was motivated.

and it is still a bug in the upstream LLVM too; just checked. Will file a bug there soon.
[Bug target/99092] Using -O3 and -fprefetch-loop-arrays to compile BLAS on Apple M1 fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99092

--- Comment #14 from Iain Sandoe ---
(In reply to Andrew Pinski from comment #13)
> Did the LLVM assembler get fixed?

not as of xcode 13.0 (I don't know if anyone filed a radar tho) - since the problem was fixed on the branch, I guess no-one was motivated.
[Bug target/99092] Using -O3 and -fprefetch-loop-arrays to compile BLAS on Apple M1 fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99092

--- Comment #13 from Andrew Pinski ---
Did the LLVM assembler get fixed?
[Bug target/99092] Using -O3 and -fprefetch-loop-arrays to compile BLAS on Apple M1 fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99092

Iain Sandoe changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |SUSPENDED
   Last reconfirmed|                            |2021-02-20

--- Comment #12 from Iain Sandoe ---
I added an issue to the experimental branch:
https://github.com/iains/gcc-darwin-arm64/issues/43

And produced two patches to work around the issue (although the first should tighten up the constraint on prf*m for all targets).

The first patch is a conservative fix: it just prevents the generation of prfm insns when the offset is out of range (and when it would require prfum for Darwin):
https://github.com/iains/gcc-darwin-arm64/commit/2fbd9a7f9cddc7e243c0025713841e0bc1465c41

The second patch adds a predicate, constraint and patterns for the prfum insn, which means that Darwin now generates:

  prfum [X0, -8]

which is accepted by the LLVM backend:
https://github.com/iains/gcc-darwin-arm64/commit/881a59f2258a5a7a9c2c862420c4e93e9df17f2c

Given some more time, I expect that the two could be combined in some way; at least unless/until LLVM gets a fix and that percolates through to Xcode.

So the bug is "fixed on the experimental branch". Given that it cannot be fixed on GCC 'upstream' until we have a chance to submit the port (which isn't ready yet!), I suggest that "SUSPENDED" is a reasonable state for this bug.
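The effect of the two patches on the emitted assembly can be sketched roughly as follows. This is only an illustration in Python, not the actual GCC machine-description change: the function name `emit_prefetch` and the `allow_prfum` switch are hypothetical, and the offset ranges are the PRFM/PRFUM immediate ranges from the Arm ARM.

```python
from typing import Optional


def emit_prefetch(base: str, offset: int, allow_prfum: bool,
                  prfop: str = "PLDL1KEEP") -> Optional[str]:
    """Return the assembly text for a prefetch, or None to skip it.

    Hypothetical model only:
      allow_prfum=False ~ first patch alone (no prfm for bad offsets)
      allow_prfum=True  ~ second patch (prfum for small unscaled offsets)
    """
    # Scaled form: multiple of 8 in [0, 32760].
    if 0 <= offset <= 32760 and offset % 8 == 0:
        return f"prfm\t{prfop}, [{base}, {offset}]"
    # Unscaled form: signed 9-bit byte offset in [-256, 255].
    if allow_prfum and -256 <= offset <= 255:
        return f"prfum\t{prfop}, [{base}, {offset}]"
    # Neither form is available: emit no prefetch at all.
    return None


if __name__ == "__main__":
    print(emit_prefetch("x0", -8, allow_prfum=False))
    print(emit_prefetch("x0", -8, allow_prfum=True))
```

With `allow_prfum=False` the `[x0, -8]` prefetch is simply dropped; with `allow_prfum=True` it is written with the explicit prfum mnemonic, which a strict assembler accepts.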
[Bug target/99092] Using -O3 and -fprefetch-loop-arrays to compile BLAS on Apple M1 fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99092

--- Comment #11 from Iain Sandoe ---
(In reply to Andrew Pinski from comment #10)
> From the ARM ARM:
> An assembler program translating a Load/Store instruction, for example LDR,
> is required to encode an unambiguous offset using the unscaled 9-bit offset
> form, and to encode an ambiguous offset using the scaled 12-bit offset form.
> A programmer might force the generation of the unscaled 9-bit form by using
> one of the mnemonics in Table C3-17. Arm recommends that a disassembler
> outputs all unscaled 9-bit offset forms using one of these mnemonics, but
> unambiguous offsets can be output using a Load/Store single register
> mnemonic, for example, LDR.

it would be nice if that applied to a 'generic' version of the insn (one might read the advice as so):

  prf   PLDL1KEEP, [x0, 200]   ===> assembler chooses prfm/prfum as it likes
  prfm  PLDL1KEEP, [x0, 200]   --> use the insn I wrote!
  prfm  PLDL1KEEP, [x0, -8]    --> .. or error if I'm dumb
  prfum PLDL1KEEP, [x0, 200]   --> use the insn I wrote!
  prfum PLDL1KEEP, [x0, 4096]  --> .. or error if I'm dumb

but I guess we have to live with the status quo.
[Bug target/99092] Using -O3 and -fprefetch-loop-arrays to compile BLAS on Apple M1 fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99092

--- Comment #10 from Andrew Pinski ---
From the ARM ARM:

An assembler program translating a Load/Store instruction, for example LDR, is required to encode an unambiguous offset using the unscaled 9-bit offset form, and to encode an ambiguous offset using the scaled 12-bit offset form. A programmer might force the generation of the unscaled 9-bit form by using one of the mnemonics in Table C3-17. Arm recommends that a disassembler outputs all unscaled 9-bit offset forms using one of these mnemonics, but unambiguous offsets can be output using a Load/Store single register mnemonic, for example, LDR.
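One way to read the quoted rule, applied to the prefetch case, is sketched below in Python. The function name `choose_encoding` is hypothetical, and the interpretation is an assumption: an offset that fits both forms is treated as "ambiguous" and gets the scaled encoding, while an offset that fits only the unscaled form gets the unscaled encoding. The ranges used are those of PRFM/PRFUM (8-byte scale); other Load/Store instructions scale by their own access size.

```python
def choose_encoding(offset: int) -> str:
    """Pick the encoding an assembler might select for `prfm ..., [Xn, offset]`.

    Assumed reading of the Arm ARM rule, not a definitive statement of it.
    """
    scaled_ok = 0 <= offset <= 32760 and offset % 8 == 0   # 12-bit, scaled by 8
    unscaled_ok = -256 <= offset <= 255                    # signed 9-bit
    if scaled_ok:
        # Covers both "only scaled fits" and the ambiguous "both fit" case.
        return "scaled"
    if unscaled_ok:
        # Only the unscaled form can represent this offset.
        return "unscaled"
    raise ValueError(f"offset {offset} fits neither form")


if __name__ == "__main__":
    for off in (200, -8, 4, 4096):
        print(off, choose_encoding(off))
```

Under this reading, `[x0, -8]` written with the prfm mnemonic would be encoded in the unscaled form rather than rejected.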
[Bug target/99092] Using -O3 and -fprefetch-loop-arrays to compile BLAS on Apple M1 fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99092

--- Comment #9 from Andrew Pinski ---
hmmm, see https://gcc.gnu.org/legacy-ml/gcc-patches/2014-07/msg00612.html :

"When it comes to emitting the pattern, always use "prfm" -- the prfum form can be generated from the prfm mnemonic when the offset implies this is necessary."

From reading the ARM ARM, it does look like the prfm mnemonic should accept the unscaled 9-bit signed value, just as ldr does relative to ldur.

So the bug is in the LLVM assembler, I think.
[Bug target/99092] Using -O3 and -fprefetch-loop-arrays to compile BLAS on Apple M1 fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99092

--- Comment #8 from Iain Sandoe ---
it seems that GAS is accepting an encoding that's not specified in at least version DDI0487Fc_armv8_arm.

that says that C6.2.212 PRFM (immediate) takes

"<pimm> Is the optional positive immediate byte offset, a multiple of 8 in the range 0 to 32760, defaulting to 0 and encoded in the "imm12" field as <pimm>/8."

and C6.2.215 PRFUM

"<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in the "imm9" field."

so probably the bug is present for all targets, not just Darwin - it just happens to show there.

FWIW, the encoding is shown thus:

  PRFM (<prfop>|#<imm5>), [<Xn|SP>{, #<pimm>}]

So LLVM might well also reject it without the '#' (I have encountered at least one case before where that happened).
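The two immediate ranges quoted above can be written down as a small Python sketch. The function names are hypothetical; the field computations follow the quoted text (imm12 holds the offset divided by 8, imm9 holds the signed offset as a 9-bit two's-complement value).

```python
def fits_prfm(offset: int) -> bool:
    """Scaled PRFM (immediate): multiple of 8 in the range 0 to 32760."""
    return 0 <= offset <= 32760 and offset % 8 == 0


def fits_prfum(offset: int) -> bool:
    """Unscaled PRFUM: signed byte offset in the range -256 to 255."""
    return -256 <= offset <= 255


def imm12_field(offset: int) -> int:
    """Value stored in the imm12 field for a valid PRFM offset."""
    if not fits_prfm(offset):
        raise ValueError(f"{offset} is not a valid PRFM offset")
    return offset // 8


def imm9_field(offset: int) -> int:
    """Value stored in the imm9 field (9-bit two's complement)."""
    if not fits_prfum(offset):
        raise ValueError(f"{offset} is not a valid PRFUM offset")
    return offset & 0x1FF


if __name__ == "__main__":
    # The offset from this bug: only the unscaled form can hold -8.
    print(fits_prfm(-8), fits_prfum(-8), hex(imm9_field(-8)))
```

For the offset in this report, -8 fails the scaled check and passes the unscaled one, which matches the diagnostic from the LLVM assembler.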
[Bug target/99092] Using -O3 and -fprefetch-loop-arrays to compile BLAS on Apple M1 fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99092

--- Comment #7 from Jeff Hammond ---
@Martin

% gfortran -O3 -fprefetch-loop-arrays --verbose -c ctrsm.f && echo OKAY
Using built-in specs.
COLLECT_GCC=gfortran
Target: aarch64-apple-darwin20
Configured with: ../configure --build=aarch64-apple-darwin20 --prefix=/opt/homebrew/Cellar/gcc/10.2.0_3 --libdir=/opt/homebrew/Cellar/gcc/10.2.0_3/lib/gcc/10 --disable-nls --enable-checking=release --enable-languages=c,c++,objc,obj-c++,fortran --program-suffix=-10 --with-gmp=/opt/homebrew/opt/gmp --with-mpfr=/opt/homebrew/opt/mpfr --with-mpc=/opt/homebrew/opt/libmpc --with-isl=/opt/homebrew/opt/isl --with-system-zlib --with-pkgversion='Homebrew GCC 10.2.0_3' --with-bugurl=https://github.com/Homebrew/homebrew-core/issues --disable-multilib --with-native-system-header-dir=/usr/include --with-sysroot=/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk SED=/usr/bin/sed
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 10.2.1 20201220 (Homebrew GCC 10.2.0_3)
COLLECT_GCC_OPTIONS='-O3' '-fprefetch-loop-arrays' '-v' '-c' '-mmacosx-version-min=11.2.0' '-asm_macosx_version_min=11.2' '-mlittle-endian' '-mabi=lp64'
 /opt/homebrew/Cellar/gcc/10.2.0_3/libexec/gcc/aarch64-apple-darwin20/10.2.1/f951 ctrsm.f -ffixed-form -fPIC -quiet -dumpbase ctrsm.f -mmacosx-version-min=11.2.0 -mlittle-endian -mabi=lp64 -auxbase ctrsm -O3 -version -fprefetch-loop-arrays -fintrinsic-modules-path /opt/homebrew/Cellar/gcc/10.2.0_3/lib/gcc/10/gcc/aarch64-apple-darwin20/10.2.1/finclude -o /var/folders/8n/llwp7zmd4jx697g8sw5w46p0gn/T//ccR79V1w.s
GNU Fortran (Homebrew GCC 10.2.0_3) version 10.2.1 20201220 (aarch64-apple-darwin20)
        compiled by GNU C version 10.2.1 20201220, GMP version 6.2.1, MPFR version 4.1.0, MPC version 1.2.1, isl version isl-0.23-GMP
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
GNU Fortran2008 (Homebrew GCC 10.2.0_3) version 10.2.1 20201220 (aarch64-apple-darwin20)
        compiled by GNU C version 10.2.1 20201220, GMP version 6.2.1, MPFR version 4.1.0, MPC version 1.2.1, isl version isl-0.23-GMP
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
COLLECT_GCC_OPTIONS='-O3' '-fprefetch-loop-arrays' '-v' '-c' '-mmacosx-version-min=11.2.0' '-mlittle-endian' '-mabi=lp64'
 as -arch arm64 -v -mmacosx-version-min=11.2 -o ctrsm.o /var/folders/8n/llwp7zmd4jx697g8sw5w46p0gn/T//ccR79V1w.s
Apple clang version 12.0.0 (clang-1200.0.32.29)
Target: aarch64-apple-darwin20.3.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
 "/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang" -cc1as -triple arm64-apple-macosx11.2.0 -filetype obj -main-file-name ccR79V1w.s -target-cpu vortex -target-feature +v8.3a -target-feature +fp-armv8 -target-feature +neon -target-feature +crc -target-feature +crypto -target-feature +fullfp16 -target-feature +ras -target-feature +lse -target-feature +rdm -target-feature +rcpc -target-feature +zcm -target-feature +zcz -target-feature +sha2 -target-feature +aes -fdebug-compilation-dir /tmp -dwarf-debug-producer "Apple clang version 12.0.0 (clang-1200.0.32.29)" -dwarf-version=4 -mrelocation-model pic -o ctrsm.o /var/folders/8n/llwp7zmd4jx697g8sw5w46p0gn/T//ccR79V1w.s
/var/folders/8n/llwp7zmd4jx697g8sw5w46p0gn/T//ccR79V1w.s:362:23: error: index must be a multiple of 8 in range [0, 32760].
        prfm    PLDL1KEEP, [x0, -8]
                                ^
[Bug target/99092] Using -O3 and -fprefetch-loop-arrays to compile BLAS on Apple M1 fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99092

--- Comment #6 from Iain Sandoe ---
(In reply to ktkachov from comment #5)
> I do think it's one of those LLVM assembler issues.
> Maybe it's due to the fact that "prfm PLDL1KEEP, [x0, -8]"
> is just the alias to the:
> prfum pldl1keep, [x0, #-8]
>
> architectural instruction.
> Or it could be that the lack of '#' confuses the assembler

likely the latter - I have one fix for that already approved for master (but not applied) but that only affected parenthesised expressions e.g. #(a - b).
[Bug target/99092] Using -O3 and -fprefetch-loop-arrays to compile BLAS on Apple M1 fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99092

ktkachov at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ktkachov at gcc dot gnu.org

--- Comment #5 from ktkachov at gcc dot gnu.org ---
I do think it's one of those LLVM assembler issues.
Maybe it's due to the fact that "prfm PLDL1KEEP, [x0, -8]" is just the alias to the:

  prfum pldl1keep, [x0, #-8]

architectural instruction.
Or it could be that the lack of '#' confuses the assembler
[Bug target/99092] Using -O3 and -fprefetch-loop-arrays to compile BLAS on Apple M1 fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99092

Iain Sandoe changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fxcoudert at gcc dot gnu.org,
                   |                            |iains at gcc dot gnu.org

--- Comment #4 from Iain Sandoe ---
please note:

The Apple M1 compiler is 'experimental' on master, the back port to 10.2 is 'even more experimental' (and local to Homebrew) - the sources are not yet part of GCC "upstream" so hard for folks here to fix.

The bug could well be genuine, but please report it either on Homebrew, or on https://github.com/iains/gcc-darwin-arm64/issues - so that we can try to fix it there (or propose a fix for it here if it's a generic issue).

thanks.

It's not practical (with the resources available) to do a GAS port for aarch64/mach-o, so we will have to fix either the llvm back end (and then wait for that to be included in Xcode) or fix the asm emitted for the Darwin/Mach-O back end.
[Bug target/99092] Using -O3 and -fprefetch-loop-arrays to compile BLAS on Apple M1 fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99092

Martin Liška changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |marxin at gcc dot gnu.org

--- Comment #3 from Martin Liška ---
$ aarch64-suse-linux-as --version
GNU assembler (GNU Binutils; openSUSE Tumbleweed) 2.35.1.20201112-1

$ grep 'prfm.*-8' ctrsm.s && aarch64-suse-linux-as ctrsm.s && echo OK
        prfm    PLDL1KEEP, [x0, -8]
OK
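The grep above finds one specific offset. A slightly more general check, sketched here in Python, lists every prfm line whose immediate offset falls outside the scaled range that the LLVM assembler enforces. The function name `find_strict_rejects` and the regular expression are illustrative assumptions; the pattern only handles a plain decimal immediate (with or without '#'), not symbolic expressions or register offsets.

```python
import re

# Matches e.g. "prfm PLDL1KEEP, [x0, -8]" or "prfm pldl1keep, [x0, #16]".
PRFM_RE = re.compile(
    r"^\s*prfm\s+(\w+)\s*,\s*\[\s*(\w+)\s*,\s*#?(-?\d+)\s*\]",
    re.IGNORECASE,
)


def find_strict_rejects(asm_text: str):
    """Return (line_number, line) pairs for prfm insns with an offset
    that is not a multiple of 8 in the range [0, 32760]."""
    hits = []
    for lineno, line in enumerate(asm_text.splitlines(), start=1):
        m = PRFM_RE.match(line)
        if m is None:
            continue
        offset = int(m.group(3))
        if not (0 <= offset <= 32760 and offset % 8 == 0):
            hits.append((lineno, line.strip()))
    return hits


if __name__ == "__main__":
    sample = (
        "\tprfm\tPLDL1KEEP, [x0, -8]\n"
        "\tprfm\tPLDL1KEEP, [x0, 16]\n"
        "\tprfum\tPLDL1KEEP, [x1, -8]\n"
    )
    for lineno, text in find_strict_rejects(sample):
        print(lineno, text)
```

Running it over a compiler-generated .s file gives a quick list of the lines a strict assembler is likely to refuse, without needing that assembler installed.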
[Bug target/99092] Using -O3 and -fprefetch-loop-arrays to compile BLAS on Apple M1 fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99092

--- Comment #2 from Martin Liška ---
The problem is very likely in LLVM assembler, GAS works fine.
Please take a look here:
https://reviews.llvm.org/D40011

Can you please paste the output of GCC invocation with --verbose argument?
[Bug target/99092] Using -O3 and -fprefetch-loop-arrays to compile BLAS on Apple M1 fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99092

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|fortran                     |target
             Target|                            |aarch64
           Keywords|                            |wrong-code

--- Comment #1 from Richard Biener ---
sounds familiar, maybe you can try more recent GCC 10 snapshots. target bug in printing the asm I guess or an assembler bug in rejecting the constant.