Re: GDC for ARM MacOS / OSX
On Thursday, 29 June 2023 at 09:18:19 UTC, Iain Buclaw wrote:
> On Thursday, 29 June 2023 at 05:27:57 UTC, Cecil Ward wrote:
>> I tried getting GCC on my ARM M2 Mac using the homebrew package manager, which is how I got LDC. It gave me C++ and C and FORTRAN, but no sign of any GDC.
>
> How is GCC available when it has no support for aarch64-darwin2x? (Experimental support is available in [iains fork on github](https://github.com/iains/gcc-darwin-arm64) targeting the 14.x development branch)
>
> Iain and I have been going through arm64-darwin support in D run-time, mostly it's just getting cross bootstrap from gcc-11 done cleanly. Nothing that I expect to be made concretely available just yet, though I am hoping that GCC will finally get support for M1/M2 by the time 14.1 is released in May.

Thank you for your good work, Iain. I don’t have a working GDC, as the one on my Raspberry Pi AArch64 Debian Buster dies with an error message every time you try to compile. I have run GDC on 32-bit ARM on the Raspberry Pi and in godbolt.org on x86-64. I will get hold of an x86-64 box with Linux on it and get round the problems that way.
GDC for ARM MacOS / OSX
I tried getting GCC on my ARM M2 Mac using the homebrew package manager, which is how I got LDC. It gave me C++ and C and FORTRAN, but no sign of any GDC.
Re: bug report x86-64 code: je / jbe
On Friday, 16 June 2023 at 13:12:06 UTC, Iain Buclaw wrote:
> On Wednesday, 14 June 2023 at 12:35:43 UTC, Cecil Ward wrote:
>> I have just noticed a bug in the latest release of GDC that targets x86-64. For example GDC 12.3 and above versions too, running on X86-64, targeting self. This was built with: -O3 -frelease -march=alderlake
>
> What leads you to believe that it is buggy?
>
>> Generated code is:
>>
>>     je L1
>>     jbe L1
>
> What I see is the first instruction is going to relate to this condition.
>
>     if ( unlikely( p == 1 ) ) return x;
>
> Then the next instruction is the condition in the following for-loop.
>
>     for ( exp_ui_t i = p; i > 1; i >>= 1 )
>
> Redundant jump? Yes, arguably. Leads to wrong runtime? Doesn't look that way.

Completely agree with Iain: it’s not incorrect code, and I wasn’t intending to suggest that. I’d just say it is suboptimal, and not the very best code generation possible.
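To make the point about the two tests concrete, here is a minimal sketch, assuming a simplified non-template version of the posted routine (the names powWithFastPath and powLoopOnly are hypothetical, not from the original module). It shows that the loop guard `i > 1` already covers the `p == 1` case, which is why a `je` followed by a `jbe` to the same label can look redundant without being wrong.

```d
// Hypothetical, simplified versions of the posted pow(); not the original code.

/// With the explicit p == 1 fast path (the test Iain associates with "je").
ulong powWithFastPath(ulong x, uint p) pure nothrow @nogc @safe
{
    if (p == 0) return 1;
    if (p == 1) return x;
    ulong s = x, v = 1;
    for (uint i = p; i > 1; i >>= 1)   // loop entry test (associated with "jbe")
    {
        if (i & 1) v *= s;
        s *= s;
    }
    return v * s;
}

/// Without the fast path: for p == 1 the loop body never runs,
/// so v == 1 and s == x, and the result is still x.
ulong powLoopOnly(ulong x, uint p) pure nothrow @nogc @safe
{
    if (p == 0) return 1;
    ulong s = x, v = 1;
    for (uint i = p; i > 1; i >>= 1)
    {
        if (i & 1) v *= s;
        s *= s;
    }
    return v * s;
}

void main()
{
    // Both versions agree for every exponent tried, including p == 1.
    foreach (uint p; 0 .. 12)
        assert(powWithFastPath(3, p) == powLoopOnly(3, p));
    assert(powLoopOnly(3, 1) == 3);
}
```

Whether dropping the fast path actually changes what GDC emits is something to check in the compiler explorer rather than assume.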
bug report x86-64 code: je / jbe
I have just noticed a bug in the latest release of GDC that targets x86-64. For example GDC 12.3 and later versions too, running on x86-64, targeting self. This was built with: -O3 -frelease -march=alderlake

Generated code is:

    je L1
    jbe L1
    …
    …
    L1: ret

The probable reason for this is my use of the GDC built-in for indicating whether conditional jumps are likely or unlikely to be taken. I wrote trivial routines likely() and unlikely() and used them as follows:

    public
    T pow(T, exp_ui_t )( in T x, in exp_ui_t p ) pure @safe nothrow @nogc
    if ( is ( exp_ui_t == ulong ) || is ( exp_ui_t == uint ) )
    in
    {
        static assert( is ( typeof( x * x ) ) );
        assert( p >= 0 );
    }
    out ( ret )
    {
        assert( ( p == 0 && ret == 1 ) || !( p == 0 ) );
    }
    do
    {
        if ( unlikely( p == 0 ) ) return 1;
        if ( unlikely( p == 1 ) ) return x;

        /*
        if ( unlikely( x == 0 ) ) // fast-path opt, unnecessary
            return x;
        if ( unlikely( x == 1 ) ) // fast-path opt, unnecessary
            return x;
        */

        T s = x;
        T v = 1;

        for ( exp_ui_t i = p; i > 1; i >>= 1 )
        {
            v = ( i & 0x1 ) ? s * v : v;
            s = s * s;
        }
        //assert( p > 1 && pow( x, p ) == ( p > 1 ? x * pow( x, p-1) : 1) );
        return v * s;
    }

    pragma( inline, true )
    private
    bool builtin_expect()( in bool test_cond, in bool expected_cond ) pure nothrow @safe @nogc
    {
        version ( LDC )
        {   // ldc.intrinsics.llvm_expect - did not seem to work when tested in LDC 1.22
            import ldc.intrinsics : llvm_expect;
            return cast(bool) llvm_expect( test_cond, expected_cond );
        }
        version ( GDC )
        {
            import gcc.builtins : __builtin_expect;
            return cast(bool) __builtin_expect( test_cond, expected_cond );
        }
        return test_cond;
    }

    pragma( inline, true )
    public
    bool likely()( in bool test_cond ) pure nothrow @safe @nogc
    /* Returns test_cond which makes it convenient to do assert( unlikely() )
     * Also emulates builtin_expect's return behaviour, by returning the argument
     */
    {
        return builtin_expect( test_cond, true );
    }

    pragma( inline, true )
    public
    bool unlikely()( in bool test_cond ) pure nothrow @safe @nogc
    /* Returns test_cond which makes it convenient to do assert( unlikely() )
     * Also emulates builtin_expect's return behaviour, by returning the argument
     */
    {
        return builtin_expect( test_cond, false );
    }
    // ~~~ module likely - end.

This is not the whole of this .d file; I can of course give you the whole lot if you desire. I inspected the result in Matt Godbolt’s compiler explorer website godbolt.org.

An aside on LDC: I need to look at LDC’s llvm_expect to see if it is controlling the branches the way I wish. Does anyone know if llvm_expect has any problems?
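One thing worth checking with wrappers like these is which branch of the `version` blocks is actually compiled. The D language specification lists `GNU` as the predefined version identifier for GDC, so the sketch below tests `version (GNU)`. It is only a minimal sketch: the helper name expect is hypothetical, the attributes are left to template inference rather than spelled out, and the only compiler-provided name used is `__builtin_expect` from `gcc.builtins`, as in the post above.

```d
// Hypothetical wrapper names; a sketch, not the module from the post.
version (GNU)
{
    import gcc.builtins : __builtin_expect;

    /// Passes the hint to GCC and returns the condition unchanged.
    bool expect()(bool cond, bool expected)
    {
        return __builtin_expect(cond, expected) != 0;
    }
}
else
{
    /// Fallback for other compilers: no hint, same return value.
    bool expect()(bool cond, bool expected)
    {
        return cond;
    }
}

bool likely()(bool cond)   { return expect(cond, true); }
bool unlikely()(bool cond) { return expect(cond, false); }

void main()
{
    // The hint must never change the value of the condition.
    assert(likely(true) && !likely(false));
    assert(unlikely(true) && !unlikely(false));
}
```

Comparing the generated asm with and without the hint is the only reliable way to see whether it influences branch layout for a given compiler version.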
Regression - quality of generated x86-64 code between GDC v12.3 and v13.1
I wrote a very small procedure in D, and the x86-64 asm code generated by GDC 12.3 was excellent, whereas that from 13.1 was insanely bloated and totally different. Note: the badness is independent of the -On optimisation level (-O3 used initially).

Here’s the D code and, following it, two asm code snippets:

    public
    pragma( inline, true )
    cpuid_abcd_t cpuid_insn( in uint32_t eax ) pure nothrow @nogc @trusted
    {
        /* ecx arg omitted; absolutely minimal variant wrapper */
        assert( ! is_ecx_needed( eax ) ); // since we are not providing an ecx, we had better not be needing to supply one

        static assert( eax.sizeof * 8 == 32 ); // optional, exact
        static assert( eax.sizeof * 8 >= 32 ); // essential min
        const uint32_t in_eax = eax; // really just for type-checking, and constness-assertion
        static assert( in_eax.sizeof * 8 == 32 );

        cpuid_abcd_t ret = void; /* undefined until the cpuid insn writes it */
        static assert( ret.eax.sizeof * 8 == 32 && ret.ebx.sizeof * 8 == 32
                    && ret.ecx.sizeof * 8 == 32 && ret.edx.sizeof * 8 == 32 );

        asm pure nothrow @nogc
        {
            ".intel_syntax " ~ "\n\t" ~
            "cpuid" ~ "\n\t" ~
            ".att_syntax \n"
            : /* outputs : it is guaranteed that all bits 63…32 of rax/rbx/rcx/rdx etc are zeroed in output. */
              "=a" ( ret.eax ), // an lhs ref, write-only; and only bits 31…0 are significant
              "=b" ( ret.ebx ), // .. ..
              "=c" ( ret.ecx ),
              "=d" ( ret.edx )
            : /* inputs : */
              "a" ( in_eax ) // read.
              // /* no ecx input - this is the variant with input ecx omitted */
            : /* no clobbers apart from the outputs already listed */
              /* does cpuid set flags? - think not, so no "cc" clobber reqd */
            ;
        }
        return ret;
    }
    /* */

GDC 12.3: -O3 -frelease -march=native

    push rbx
    mov eax, edi
    cpuid
    mov rsi, rdx
    sal rbx, 32
    mov eax, eax
    mov edx, ecx
    sal rsi, 32
    or rax, rbx
    pop rbx
    or rdx, rsi
    ret

GDC 13.1 (very bad), same switches: -O3 -frelease -march=native

    push rbp
    mov eax, edi
    mov rbp, rsp
    push rbx
    and rsp, -32
    cpuid
    vmovd xmm3, eax
    vmovd xmm2, ecx
    vpinsrd xmm1, xmm2, edx, 1
    vpinsrd xmm0, xmm3, rbx, 1
    vpunpcklqdq xmm4, xmm0, xmm1
    vmovdqa xmmword ptr [rsp-80], xmm4
    mov rax, qword ptr [rsp-80]
    mov rdx, qword ptr [rsp-72]
    mov rbx, qword ptr [rbp-8]
    leave
    ret
    /* */
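For anyone reading the two listings, it may help to see what both are computing. Assuming cpuid_abcd_t is a plain struct of four 32-bit members in the order eax, ebx, ecx, edx (the post shows only the static asserts on the member sizes, so the layout is an assumption), the struct is 16 bytes and the System V x86-64 calling convention returns it in the RAX:RDX pair. The 12.3 output packs the pair with shifts and ORs; the 13.1 output assembles the same 16 bytes in an XMM register, stores them to the stack and reloads RAX and RDX. The sketch below uses hypothetical names (CpuidAbcd, packForReturn) and simply mirrors the 12.3 packing in plain D.

```d
// Assumed layout: four 32-bit members in this order. The real cpuid_abcd_t
// is not shown in the post, so this struct is a stand-in.
struct CpuidAbcd
{
    uint eax, ebx, ecx, edx;
}

static assert(CpuidAbcd.sizeof == 16);

/// Packs the four members into the two 64-bit words that a register pair
/// would carry: word 0 = eax | ebx << 32, word 1 = ecx | edx << 32.
ulong[2] packForReturn(CpuidAbcd r) pure nothrow @nogc @safe
{
    ulong[2] words;
    words[0] = r.eax | (cast(ulong) r.ebx << 32);
    words[1] = r.ecx | (cast(ulong) r.edx << 32);
    return words;
}

void main()
{
    immutable w = packForReturn(CpuidAbcd(1, 2, 3, 4));
    assert(w[0] == 0x0000_0002_0000_0001UL);
    assert(w[1] == 0x0000_0004_0000_0003UL);
}
```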
Re: Hello world in AArch64 Debian Buster
On Thursday, 8 August 2019 at 01:17:53 UTC, Cecil Ward wrote:
> On Wednesday, 7 August 2019 at 05:48:49 UTC, Iain Buclaw wrote:
>> You could raise a Debian bug report, saying that aarch64 is in the libphobos supported list.
>
> Thanks Iain, I went through the prompts in the Debian bug report program and I hope that that has emailed a report to them which makes some kind of sense.
>
> I’m assuming that someone somewhere just needs to rerun the make properly, or whatever. Is it a lack of error checking in the makefile or associated tools, which does not ring a bell when things go seriously wrong and allows a bad object file to be created even though the build went wrong internally and became a twisted manky thing before birth?
>
> Building it all myself from sources is certainly a nightmare, with so much missing and things left unautomated wrt provision of (non-source) files associated with dependencies.

Any suggestions as to where I should go from here for a bit (actually a lot) of hand-holding, as I am totally out of my depth? I am just trying to find a human who could build gdc arm64 here for me.
Re: Hello world in AArch64 Debian Buster
On Wednesday, 7 August 2019 at 05:48:49 UTC, Iain Buclaw wrote:
> You could raise a Debian bug report, saying that aarch64 is in the libphobos supported list.

Thanks Iain, I went through the prompts in the Debian bug report program and I hope that that has emailed a report to them which makes some kind of sense.

I’m assuming that someone somewhere just needs to rerun the make properly, or whatever. Is it a lack of error checking in the makefile or associated tools, which does not ring a bell when things go seriously wrong and allows a bad object file to be created even though the build went wrong internally and became a twisted manky thing before birth?

Building it all myself from sources is certainly a nightmare, with so much missing and things left unautomated wrt provision of (non-source) files associated with dependencies.
Re: Hello world in AArch64 Debian Buster
On Tuesday, 6 August 2019 at 16:35:21 UTC, Johannes Pfau wrote:
> On Tue, 06 Aug 2019 05:13:11 +, Cecil Ward wrote:
>> I have a Raspberry Pi 3B+ running Raspbian Stretch 32-bit with a containerised guest o/s inside it using systemd-nspawn, the guest o/s being AArch64 Debian Buster.
>>
>> Inside AArch64 Debian Buster, I run the following from the shell and get an error from the gdc compiler:
>>
>>     root@debian-buster-64:~# gdc -O3 -frelease -S test.d
>>     cc1d: error: cannot find source code for runtime library file 'object.d'
>>     cc1d: note: dmd might not be correctly installed. Run 'dmd -man' for installation instructions.
>>     (null):0: confused by earlier errors, bailing out
>>     root@debian-buster-64:~#
>>
>> Any clues as to where I should head from here?
>
> You're probably missing libgphobos-dev. However, I think on debian buster there is no arm64 port of libgphobos-dev yet. Testing seems to have libgphobos-9-dev with arm64 support.
>
> If you want to use buster though, you probably have to build gcc by yourself. Just get the gcc 9 sources and use ./configure --enable-languages=d when configuring gcc.

Thank you very much, Johannes. I have never done this before but I thought here goes, so I started out trying to build the whole of GCC including the D language from the sources.

I made a bit of a mess of this, because having written a bash script to set things going, I ran it from the wrong shell: the shell in the host o/s, not the one in the guest o/s. So this started off building the wrong architecture variant. I then realised I don’t know how to get my few files into the guest o/s Debian Buster’s filesystem (inside its chroot jail), so that’s more fun to work out.

But worse than that, the make came up with an error saying there are a few dependencies that are not part of the download, GMP for example. So I’m going to have to build all of those from the sources as well, and find out how to download them. I could easily get into a circular dependency thing here. I am so far out of my depth here.

Some person who has done this before will have had binaries / object files / targets for those dependencies. Unless the files are just in the wrong place and I need to tell it where they are, which is one suggestion from the error msg.

To recap: I was trying to simply get hold of gdc for aarch64. I didn’t really want to build anything from sources. I suspect this will create additional problems faster than it solves the original ones. Perhaps I should look around for gdc AArch64 Debian prebuilt binaries, or ask someone who knows what on earth they are doing to kindly bootstrap this problem for me.
Hello world in AArch64 Debian Buster
I have a Raspberry Pi 3B+ running Raspbian Stretch 32-bit with a containerised guest o/s inside it using systemd-nspawn, the guest o/s being AArch64 Debian Buster.

Inside AArch64 Debian Buster, I run the following from the shell and get an error from the gdc compiler:

    root@debian-buster-64:~# gdc -O3 -frelease -S test.d
    cc1d: error: cannot find source code for runtime library file 'object.d'
    cc1d: note: dmd might not be correctly installed. Run 'dmd -man' for installation instructions.
    (null):0: confused by earlier errors, bailing out
    root@debian-buster-64:~#

Any clues as to where I should head from here?
Trying to build GDC from sources (New fool - please be kind)
Got as far as downloading a huge .tar.xz file and extracting it, but I just guessed at a version of the gcc sources to ftp-fetch in the first place. I have a gcc-7.3.0 folder now. Is that the correct version number?

Next, the script file setup-gcc.sh comes up with the error message

    found gcc version 7
    This version of GCC (7) is not supported.

It's looking for a patch file with a different name, based on version number. I currently have patch files patch-*8.patch, which mismatch against the check for 7. Not sure where to go from here.
Re: [Bug 288] strange nonsensical x86-64 code generation with -O3 - rats' nest of useless conditional jumps
On Friday, 23 March 2018 at 22:14:35 UTC, Iain Buclaw wrote:
> On Friday, 23 March 2018 at 00:39:13 UTC, Cecil Ward wrote:
>> On Thursday, 22 March 2018 at 22:16:16 UTC, Iain Buclaw wrote:
>>> https://bugzilla.gdcproject.org/show_bug.cgi?id=288
>>>
>>> --- Comment #1 from Iain Buclaw ---
>>>> See the long list of useless conditional jumps towards the end of the first function in the asm output (whose demangled name is test.t1(uint))
>>>
>>> Well, you'd never use -O3 if you care about speed anyway. :-)
>>>
>>> And they are not useless jumps, it's just the foreach loop unrolled in its entirety. You can see that it's a feature of the gcc-7 series and later, regardless of the target, they all produce the same unrolled loop.
>>>
>>> https://explore.dgnu.org/g/vD3N4Y
>>>
>>> It might be a nice experiment to add pragma(ivdep) and pragma(unroll) support to give you more control.
>>>
>>> https://gcc.gnu.org/onlinedocs/gcc/Loop-Specific-Pragmas.html
>>>
>>> I wouldn't hold my breath though (this is not strictly a bug).
>>
>> Agreed. It is possibly not a bug, because I don't see that the code is dysfunctional, though I haven't looked through it all. But since the backend is unrolling here in the name of optimisation, producing this weird, sub-optimal code is imho a bug, in that the goal of _optimisation_ is not attained.
>>
>> I understand that this is nothing to do with D, and I understand that this is unrolling. But notice that the jumps all go to the same location, and the sequence finishes off with an unconditional jump to that same location.
>
> Not quite, if you look a little closer, some jump to other branches hidden in between.
>
>> I feel this is just a quirk of unrolling, in part, but that's not all, as the jumps don't make sense. What we have is
>>
>>     cmp #n / jxx L3
>>     cmp #m / jxx L3
>>     jmp L3
>>
>> so it all basically does absolutely nothing, unless cmp 1, cmp 2, cmp 3, cmp 4 is an incredibly bad way of testing ( x>=1 && x<=4 ), but with 30-odd tests it isn't very funny.
>
> If you compile with -fdump-tree-optimized=stdout you will see that it's the middle-end that has lowered the code to a series of if jumps. The backend consumer doesn't really have any chance for improving it.
>
>> I know this is merely debug-only code, but am wondering what else might happen if you are misguided enough to use the crazy -O3 with unrolled loops that have conditionals in them.
>>
>> My other complaint about the GCC back-end's code generation is that it (sometimes) doesn't go for jump-less movcc-style operations when it can. For x86/x64, LDC sometimes generates jump-free code using conditional instructions where GCC does not. I can fix the problem with GDC by using a single & instead of a &&, which happens to be legal here. Sometimes in the past I have needed to take steps to make sure that I can do such an operator substitution trick in order to get jump-free, far faster code: faster where the alternatives are extremely short (and side-effect-free) and branch prediction failure is a certainty.
>
> You could also try compiling with -O2. I couldn't really see this in your given example, but honestly, if you want to optimize really aggressively you must be willing to coax the compiler in strange ways anyway.
>
>> I don't know if there are ways in which the backend could try to ascertain whether the results of certain unrolling are really bad. In some cases they could be bad because the code is too long and generates problems with code cache size or won't fit into a loop buffer. A highly per-cpu sub-variant check would need to be carried out on the generated code size, at least in all cases where there is still a loop left (as opposed to full unrolling of known size), as every kind of AMD and Intel processor is different, as Agner Fog warns us.
>>
>> Here though I didn't even explicitly ask for unrolling, so you might harshly say that it is the compiler's job to work out whether it is actually an anti-optimisation, regardless of the possible reasons why the result may be bad news, never mind just based on total generated code size not fitting into some per-CPU limit.
>
> Well again, from past experience -O3 doesn't really care about code size or cache line so much. All optimization passes which lower the code this way do so during SSA transformations, so irrespective of what is being targeted.
>
>> My reason for reporting this was to inquire about how loop unrolling behaves in later versions of the back end, to ask about jump generation vs jump-free alternatives (LDC showing the correct way to do things), and to ask if there are any suboptimality nasties lurking in code that does not merely come down to driving an assert.
>>
>> I would hope for an optimisation that handles the case of dense-packed cmp #1 | cmp #2 | cmp #3 | cmp #4 etc, especially with no holes, in the case where _all the jumps go to the same target_, so this can get reduced down into a two-test range check and a huge optimisation. I would also hope that conditional
Re: [Bug 288] strange nonsensical x86-64 code generation with -O3 - rats' nest of useless conditional jumps
On Thursday, 22 March 2018 at 22:16:16 UTC, Iain Buclaw wrote:
> https://bugzilla.gdcproject.org/show_bug.cgi?id=288
>
> --- Comment #1 from Iain Buclaw ---
>> See the long list of useless conditional jumps towards the end of the first function in the asm output (whose demangled name is test.t1(uint))
>
> Well, you'd never use -O3 if you care about speed anyway. :-)
>
> And they are not useless jumps, it's just the foreach loop unrolled in its entirety. You can see that it's a feature of the gcc-7 series and later, regardless of the target, they all produce the same unrolled loop.
>
> https://explore.dgnu.org/g/vD3N4Y
>
> It might be a nice experiment to add pragma(ivdep) and pragma(unroll) support to give you more control.
>
> https://gcc.gnu.org/onlinedocs/gcc/Loop-Specific-Pragmas.html
>
> I wouldn't hold my breath though (this is not strictly a bug).

Agreed. It is possibly not a bug, because I don't see that the code is dysfunctional, though I haven't looked through it all. But since the backend is unrolling here in the name of optimisation, producing this weird, sub-optimal code is imho a bug, in that the goal of _optimisation_ is not attained.

I understand that this is nothing to do with D, and I understand that this is unrolling. But notice that the jumps all go to the same location, and the sequence finishes off with an unconditional jump to that same location.

I feel this is just a quirk of unrolling, in part, but that's not all, as the jumps don't make sense. What we have is

    cmp #n / jxx L3
    cmp #m / jxx L3
    jmp L3

so it all basically does absolutely nothing, unless cmp 1, cmp 2, cmp 3, cmp 4 is an incredibly bad way of testing ( x>=1 && x<=4 ), but with 30-odd tests it isn't very funny.

I know this is merely debug-only code, but am wondering what else might happen if you are misguided enough to use the crazy -O3 with unrolled loops that have conditionals in them.

My other complaint about the GCC back-end's code generation is that it (sometimes) doesn't go for jump-less movcc-style operations when it can. For x86/x64, LDC sometimes generates jump-free code using conditional instructions where GCC does not. I can fix the problem with GDC by using a single & instead of a &&, which happens to be legal here. Sometimes in the past I have needed to take steps to make sure that I can do such an operator substitution trick in order to get jump-free, far faster code: faster where the alternatives are extremely short (and side-effect-free) and branch prediction failure is a certainty.

I don't know if there are ways in which the backend could try to ascertain whether the results of certain unrolling are really bad. In some cases they could be bad because the code is too long and generates problems with code cache size or won't fit into a loop buffer. A highly per-cpu sub-variant check would need to be carried out on the generated code size, at least in all cases where there is still a loop left (as opposed to full unrolling of known size), as every kind of AMD and Intel processor is different, as Agner Fog warns us.

Here though I didn't even explicitly ask for unrolling, so you might harshly say that it is the compiler's job to work out whether it is actually an anti-optimisation, regardless of the possible reasons why the result may be bad news, never mind just based on total generated code size not fitting into some per-CPU limit.

My reason for reporting this was to inquire about how loop unrolling behaves in later versions of the back end, to ask about jump generation vs jump-free alternatives (LDC showing the correct way to do things), and to ask if there are any suboptimality nasties lurking in code that does not merely come down to driving an assert.

I would hope for an optimisation that handles the case of dense-packed cmp #1 | cmp #2 | cmp #3 | cmp #4 etc, especially with no holes, in the case where _all the jumps go to the same target_, so this can get reduced down into a two-test range check and a huge optimisation. I would also hope that conditional jumps followed by an unconditional jump could be spotted and handled too. (Peephole general low-level optimisation then? i.e. jxx L1 / jmp L1 = jmp L1)

Perhaps this is all being generated too late, the optimisations have already happened, and the opportunities those optimisers provide have been and gone. Would it be possible for the backend to include a number of repeat optimisation passes of certain kinds after unrolled code is generated, or doesn't it work like that?

Anyway, this is not for you but for that particular backend, I suspect. I was wondering if someone could have a word and pass it on to the relevant people. I think it's worth making -O3 more generally usable rather than crazy because it features good ideas gone bad.
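The range-check reduction hoped for above can be written by hand. The following is a minimal sketch with hypothetical function names, covering only the dense 1..4 case used as the example in the post. It compares the chain of equality tests, a version using a single & so that both comparisons are always evaluated, and the classic fold into one unsigned comparison.

```d
// Hypothetical names; a hand-written illustration, not compiler output.

/// Chain of equality tests, the shape a fully unrolled loop might leave behind.
bool inRangeChain(uint x) pure nothrow @nogc @safe
{
    return x == 1 || x == 2 || x == 3 || x == 4;
}

/// Two comparisons combined with a single &, so neither one is skipped
/// by short-circuit evaluation.
bool inRangeNoShortCircuit(uint x) pure nothrow @nogc @safe
{
    return ((x >= 1) & (x <= 4)) != 0;
}

/// One unsigned comparison: x == 0 wraps round to uint.max and fails the test.
bool inRangeFolded(uint x) pure nothrow @nogc @safe
{
    return x - 1 <= 3;
}

void main()
{
    foreach (uint x; 0 .. 16)
    {
        assert(inRangeChain(x) == inRangeNoShortCircuit(x));
        assert(inRangeChain(x) == inRangeFolded(x));
    }
    assert(!inRangeFolded(uint.max));
}
```

Whether a given GDC or LDC version emits jump-free code for each form is something to confirm in the asm output; the sketch only shows that the three forms are equivalent.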
How to report code generation weirdness
How do I report some extremely weird (useless) code generated by GDC when the -O3 option is used (a bizarre rats’ nest of conditional jumps)? [I am an experienced professional asm programmer, now retired.]

The D source is short, fortunately. The asm output I am looking at is seen through the telescope that is Matt Godbolt’s Compiler Explorer at http://explore.dgnu.org so I would possibly want to pull the asm text from that site (somehow).