[Bug target/99092] Using -O3 and -fprefetch-loop-arrays to compile BLAS on Apple M1 fails

2024-02-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99092

Andrew Pinski  changed:

           What    |Removed     |Added
----------------------------------------------------------------------------
           See Also|            |https://github.com/llvm/llvm-project/issues/83226

--- Comment #16 from Andrew Pinski  ---
Filed it as https://github.com/llvm/llvm-project/issues/83226 .

2024-02-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99092

--- Comment #15 from Andrew Pinski  ---
(In reply to Iain Sandoe from comment #14)
> (In reply to Andrew Pinski from comment #13)
> > Did the LLVM assembler get fixed?
> 
> not as of xcode 13.0 (I don't know if anyone filed a radar tho) - since the
> problem was fixed on the branch, I guess no-one was motivated.

and it is still a bug in the upstream LLVM too; just checked. Will file a bug
there soon.

2021-11-08 Thread iains at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99092

--- Comment #14 from Iain Sandoe  ---
(In reply to Andrew Pinski from comment #13)
> Did the LLVM assembler get fixed?

not as of xcode 13.0 (I don't know if anyone filed a radar tho) - since the
problem was fixed on the branch, I guess no-one was motivated.

2021-11-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99092

--- Comment #13 from Andrew Pinski  ---
Did the LLVM assembler get fixed?

2021-02-20 Thread iains at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99092

Iain Sandoe  changed:

              What|Removed     |Added
----------------------------------------------------------------------------
    Ever confirmed|0           |1
            Status|UNCONFIRMED |SUSPENDED
  Last reconfirmed|            |2021-02-20

--- Comment #12 from Iain Sandoe  ---
I added an issue to the experimental branch :

https://github.com/iains/gcc-darwin-arm64/issues/43

And produced two patches to work around the issue (although the first should
tighten up the constraint on prf*m for all targets).

--

The first patch is a conservative fix: it just prevents the generation of prfm
insns when the offset is out of range (and when it would require prfum for
Darwin)

https://github.com/iains/gcc-darwin-arm64/commit/2fbd9a7f9cddc7e243c0025713841e0bc1465c41

The second patch adds predicate, constraint and patterns for the prfum insn,
which means that Darwin now generates:

prfum [X0, -8]

which is accepted by the LLVM backend,

https://github.com/iains/gcc-darwin-arm64/commit/881a59f2258a5a7a9c2c862420c4e93e9df17f2c



Given some more time, I expect that the two could be combined in some way; at
least unless/until LLVM gets a fix and that percolates through to Xcode.

So the bug is "fixed on the experimental branch".

Given that it cannot be fixed in GCC 'upstream' until we have a chance to
submit the port (which isn't ready yet!), I suggest that "SUSPENDED" is a
reasonable state for this bug.

2021-02-20 Thread iains at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99092

--- Comment #11 from Iain Sandoe  ---
(In reply to Andrew Pinski from comment #10)
> From the ARM ARM:
> An assembler program translating a Load/Store instruction, for example LDR,
> is required to encode an unambiguous offset using the unscaled 9-bit offset
> form, and to encode an ambiguous offset using the scaled 12-bit offset form.
> A programmer might force the generation of the unscaled 9-bit form by using
> one of the mnemonics in Table C3-17. Arm recommends that a disassembler
> outputs all unscaled 9-bit offset forms using one of these mnemonics, but
> unambiguous offsets can be output using a Load/Store single register
> mnemonic, for example, LDR.

it would be nice if that applied to a 'generic' version of the insn (one might
read the advice as so):

prf PLDL1KEEP, [x0, 200]  ===> assembler chooses prfm/prfum as it likes

prfm  PLDL1KEEP, [x0, 200] --> use the insn I wrote! 
prfm  PLDL1KEEP, [x0, -8] --> .. or error if I'm dumb

prfum PLDL1KEEP, [x0, 200] --> use the insn I wrote! 
prfum PLDL1KEEP, [x0, 4096] --> .. or error if I'm dumb

 but I guess we have to live with the status quo.
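
The semantics wished for above can be sketched in a few lines of Python. This is only an illustration, not how any real assembler is written: the function name assemble_prefetch is made up, 'prf' is the hypothetical generic mnemonic from the example above, and preferring the scaled form when both forms fit is an assumption taken from the ARM ARM wording about ambiguous offsets.

```python
def assemble_prefetch(mnemonic: str, offset: int) -> str:
    """Return the instruction form that would be encoded, or raise ValueError.

    Hypothetical strict semantics: 'prfm' and 'prfum' are taken literally,
    while the made-up generic 'prf' lets the assembler choose.
    """
    # PRFM (immediate): multiple of 8 in [0, 32760], scaled 12-bit field.
    fits_scaled = 0 <= offset <= 32760 and offset % 8 == 0
    # PRFUM: signed byte offset in [-256, 255], unscaled 9-bit field.
    fits_unscaled = -256 <= offset <= 255

    if mnemonic == "prf":
        # Assumption: prefer the scaled form when both encodings fit.
        if fits_scaled:
            return "prfm"
        if fits_unscaled:
            return "prfum"
        raise ValueError(f"offset {offset} fits neither prfm nor prfum")
    if mnemonic == "prfm":
        if fits_scaled:
            return "prfm"
        raise ValueError(f"offset {offset} is not valid for prfm")
    if mnemonic == "prfum":
        if fits_unscaled:
            return "prfum"
        raise ValueError(f"offset {offset} is not valid for prfum")
    raise ValueError(f"unknown mnemonic {mnemonic!r}")


if __name__ == "__main__":
    for mnemonic, offset in [("prf", 200), ("prf", -8), ("prfm", 200),
                             ("prfm", -8), ("prfum", 200), ("prfum", 4096)]:
        try:
            print(mnemonic, offset, "->", assemble_prefetch(mnemonic, offset))
        except ValueError as err:
            print(mnemonic, offset, "-> error:", err)
```

Under this sketch, the case from this bug (prfm with offset -8) is an error, which is what the LLVM assembler does today, while GAS quietly behaves like the generic form.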

2021-02-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99092

--- Comment #10 from Andrew Pinski  ---
From the ARM ARM:
An assembler program translating a Load/Store instruction, for example LDR, is
required to encode an unambiguous offset using the unscaled 9-bit offset
form, and to encode an ambiguous offset using the scaled 12-bit offset form. A
programmer might force the generation of the unscaled 9-bit form by using one
of the mnemonics in Table C3-17. Arm recommends that a disassembler outputs all
unscaled 9-bit offset forms using one of these mnemonics, but unambiguous
offsets can be output using a Load/Store single register mnemonic, for example,
LDR.

2021-02-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99092

--- Comment #9 from Andrew Pinski  ---
hmmm, see https://gcc.gnu.org/legacy-ml/gcc-patches/2014-07/msg00612.html :
"When it comes to emitting the pattern, always use "prfm" -- the prfum
form can be generated from the prfm mnemonic when the offset implies
this is necessary."

From reading the ARM ARM, it does look like the prfm mnemonic should accept the
unscaled 9-bit signed value, just as ldr does in place of ldur.
So the bug is in the LLVM assembler, I think.

2021-02-17 Thread iains at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99092

--- Comment #8 from Iain Sandoe  ---
it seems that GAS is accepting an encoding that's not specified in at least
version DDI0487Fc_armv8_arm.

that says that 
C6.2.212 PRFM (immediate) takes 

"<pimm> Is the optional positive immediate byte offset, a multiple of 8 in the
range 0 to 32760, defaulting to 0 and encoded in the "imm12" field as
<pimm>/8."

= and

C6.2.215 PRFUM 

"<simm> Is the optional signed immediate byte offset, in the range -256 to 255,
defaulting to 0 and encoded in the "imm9" field."

===

so probably the bug is present for all targets, not just Darwin - it just
happens to show there.  FWIW, the encoding is shown thus:

PRFM (<prfop>|#<imm5>), [<Xn|SP>{, #<pimm>}]

So LLVM might well also reject it without the '#' (I have encountered at least
one case before where that happened).
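
To make the two ranges concrete, here is a small Python sketch of the checks as quoted from the manual. The function names are made up for illustration; only the numeric limits and the "divide by 8" rule for imm12 come from the text above.

```python
def is_prfm_offset(offset: int) -> bool:
    """PRFM (immediate): a multiple of 8 in the range 0 to 32760."""
    return 0 <= offset <= 32760 and offset % 8 == 0


def is_prfum_offset(offset: int) -> bool:
    """PRFUM: a signed byte offset in the range -256 to 255."""
    return -256 <= offset <= 255


def prfm_imm12(offset: int) -> int:
    """Value placed in the imm12 field for PRFM: the byte offset divided by 8."""
    if not is_prfm_offset(offset):
        raise ValueError(f"offset {offset} is not encodable as PRFM (immediate)")
    return offset // 8


if __name__ == "__main__":
    # -8 is the offset from the failing ctrsm.f build.
    for offset in (-8, 0, 200, 4096):
        print(offset, is_prfm_offset(offset), is_prfum_offset(offset))
```

With these checks, -8 is valid only for PRFUM, 200 is valid for both forms, and 4096 is valid only for PRFM, which matches the error message printed by the LLVM assembler.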

2021-02-17 Thread jeff.science at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99092

--- Comment #7 from Jeff Hammond  ---
@Martin

% gfortran -O3 -fprefetch-loop-arrays --verbose -c ctrsm.f && echo OKAY

Using built-in specs.
COLLECT_GCC=gfortran
Target: aarch64-apple-darwin20
Configured with: ../configure --build=aarch64-apple-darwin20
--prefix=/opt/homebrew/Cellar/gcc/10.2.0_3
--libdir=/opt/homebrew/Cellar/gcc/10.2.0_3/lib/gcc/10 --disable-nls
--enable-checking=release --enable-languages=c,c++,objc,obj-c++,fortran
--program-suffix=-10 --with-gmp=/opt/homebrew/opt/gmp
--with-mpfr=/opt/homebrew/opt/mpfr --with-mpc=/opt/homebrew/opt/libmpc
--with-isl=/opt/homebrew/opt/isl --with-system-zlib --with-pkgversion='Homebrew
GCC 10.2.0_3' --with-bugurl=https://github.com/Homebrew/homebrew-core/issues
--disable-multilib --with-native-system-header-dir=/usr/include
--with-sysroot=/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk
SED=/usr/bin/sed
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 10.2.1 20201220 (Homebrew GCC 10.2.0_3) 
COLLECT_GCC_OPTIONS='-O3' '-fprefetch-loop-arrays' '-v' '-c'
'-mmacosx-version-min=11.2.0' '-asm_macosx_version_min=11.2' '-mlittle-endian'
'-mabi=lp64'

/opt/homebrew/Cellar/gcc/10.2.0_3/libexec/gcc/aarch64-apple-darwin20/10.2.1/f951
ctrsm.f -ffixed-form -fPIC -quiet -dumpbase ctrsm.f -mmacosx-version-min=11.2.0
-mlittle-endian -mabi=lp64 -auxbase ctrsm -O3 -version -fprefetch-loop-arrays
-fintrinsic-modules-path
/opt/homebrew/Cellar/gcc/10.2.0_3/lib/gcc/10/gcc/aarch64-apple-darwin20/10.2.1/finclude
-o /var/folders/8n/llwp7zmd4jx697g8sw5w46p0gn/T//ccR79V1w.s
GNU Fortran (Homebrew GCC 10.2.0_3) version 10.2.1 20201220
(aarch64-apple-darwin20)
compiled by GNU C version 10.2.1 20201220, GMP version 6.2.1, MPFR
version 4.1.0, MPC version 1.2.1, isl version isl-0.23-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
GNU Fortran2008 (Homebrew GCC 10.2.0_3) version 10.2.1 20201220
(aarch64-apple-darwin20)
compiled by GNU C version 10.2.1 20201220, GMP version 6.2.1, MPFR
version 4.1.0, MPC version 1.2.1, isl version isl-0.23-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
COLLECT_GCC_OPTIONS='-O3' '-fprefetch-loop-arrays' '-v' '-c'
'-mmacosx-version-min=11.2.0'  '-mlittle-endian' '-mabi=lp64'
 as -arch arm64 -v -mmacosx-version-min=11.2 -o ctrsm.o
/var/folders/8n/llwp7zmd4jx697g8sw5w46p0gn/T//ccR79V1w.s
Apple clang version 12.0.0 (clang-1200.0.32.29)
Target: aarch64-apple-darwin20.3.0
Thread model: posix
InstalledDir:
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

"/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang"
-cc1as -triple arm64-apple-macosx11.2.0 -filetype obj -main-file-name
ccR79V1w.s -target-cpu vortex -target-feature +v8.3a -target-feature +fp-armv8
-target-feature +neon -target-feature +crc -target-feature +crypto
-target-feature +fullfp16 -target-feature +ras -target-feature +lse
-target-feature +rdm -target-feature +rcpc -target-feature +zcm -target-feature
+zcz -target-feature +sha2 -target-feature +aes -fdebug-compilation-dir /tmp
-dwarf-debug-producer "Apple clang version 12.0.0 (clang-1200.0.32.29)"
-dwarf-version=4 -mrelocation-model pic -o ctrsm.o
/var/folders/8n/llwp7zmd4jx697g8sw5w46p0gn/T//ccR79V1w.s
/var/folders/8n/llwp7zmd4jx697g8sw5w46p0gn/T//ccR79V1w.s:362:23: error:
index must be a multiple of 8 in range [0, 32760].
        prfm    PLDL1KEEP, [x0, -8]
^

2021-02-15 Thread iains at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99092

--- Comment #6 from Iain Sandoe  ---
(In reply to ktkachov from comment #5)
> I do think it's one of those LLVM assembler issues.
> Maybe it's due to the fact that "prfm PLDL1KEEP, [x0, -8]"
> is just the alias to the:
> prfum   pldl1keep, [x0, #-8]
> 
> architectural instruction.
> Or it could be that the lack of '#' confuses the assembler

likely the latter - I have one fix for that already approved for master (but
not applied) but that only affected parenthesised expressions e.g. #(a - b).

2021-02-15 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99092

ktkachov at gcc dot gnu.org changed:

           What    |Removed     |Added
----------------------------------------------------------------------------
                 CC|            |ktkachov at gcc dot gnu.org

--- Comment #5 from ktkachov at gcc dot gnu.org ---
I do think it's one of those LLVM assembler issues.
Maybe it's due to the fact that "prfm PLDL1KEEP, [x0, -8]"
is just the alias to the:
prfum   pldl1keep, [x0, #-8]

architectural instruction.
Or it could be that the lack of '#' confuses the assembler

2021-02-15 Thread iains at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99092

Iain Sandoe  changed:

           What    |Removed     |Added
----------------------------------------------------------------------------
                 CC|            |fxcoudert at gcc dot gnu.org,
                   |            |iains at gcc dot gnu.org

--- Comment #4 from Iain Sandoe  ---
please note:

The Apple M1 compiler is 'experimental' on master; the backport to 10.2 is
'even more experimental' (and local to Homebrew). The sources are not yet
part of GCC "upstream", so it is hard for folks here to fix.

The bug could well be genuine, but please report it either on Homebrew, or on
https://github.com/iains/gcc-darwin-arm64/issues - so that we can try to fix it
there (or propose a fix for it here if it's a generic issue).

thanks.


It's not practical (with the resources available) to do a GAS port for
aarch64/mach-o, so we will have to fix either the LLVM back end (and then wait
for that to be included in Xcode) or fix the asm emitted for the Darwin/Mach-O
back end.


2021-02-15 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99092

Martin Liška  changed:

           What    |Removed     |Added
----------------------------------------------------------------------------
                 CC|            |marxin at gcc dot gnu.org

--- Comment #3 from Martin Liška  ---
$ aarch64-suse-linux-as --version
GNU assembler (GNU Binutils; openSUSE Tumbleweed) 2.35.1.20201112-1

$ grep 'prfm.*-8' ctrsm.s && aarch64-suse-linux-as ctrsm.s  && echo OK
        prfm    PLDL1KEEP, [x0, -8]
OK

2021-02-15 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99092

--- Comment #2 from Martin Liška  ---
The problem is very likely in the LLVM assembler; GAS works fine.
Please take a look here:
https://reviews.llvm.org/D40011

Can you please paste the output of GCC invocation with --verbose argument?

2021-02-15 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99092

Richard Biener  changed:

           What    |Removed     |Added
----------------------------------------------------------------------------
          Component|fortran     |target
             Target|            |aarch64
           Keywords|            |wrong-code

--- Comment #1 from Richard Biener  ---
sounds familiar, maybe you can try more recent GCC 10 snapshots.  target bug in
printing the asm I guess or an assembler bug in rejecting the constant.