Re: How to defeat the optimizer in GDC
On Monday, 20 May 2019 at 08:21:19 UTC, Iain Buclaw wrote:
> Looks like you've done a typo to me. Memory should be a clobber, not an input operand.

Yes, that too.
Re: How to defeat the optimizer in GDC
On Monday, 20 May 2019 at 08:11:19 UTC, Mike Franklin wrote:
> But I can't get GDC to do the same: https://explore.dgnu.org/z/quCjhU
>
> Is this currently possible in GDC?

Gah!! Ignore that. `version (GNU)`, not `version(GDC)`.

This works:

```
void use(void* p)
{
    version(LDC)
    {
        import ldc.llvmasm;
        __asm("", "r,~{memory}", p);
    }
    version(GNU)
    {
        asm { "" : : "r" p : "memory"; };
    }
}
```
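For comparison, here is a minimal sketch of the same trick in GNU C, since GDC shares GCC's backend and extended-asm syntax. Extended asm has the shape `asm("template" : outputs : inputs : clobbers)`, so `p` belongs in the third (input) section and `"memory"` in the fourth (clobber) section, which is the point Iain made above. The function `fill_local` is a hypothetical benchmark body, not code from this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Chandler Carruth's escape(): an empty asm statement that takes p as an
   input operand and lists "memory" as a clobber. GCC must assume the asm
   may read or write memory reachable through p, so earlier stores to that
   memory cannot be discarded as dead. No instructions are emitted. */
static inline void escape(void *p)
{
    __asm__ volatile("" : /* outputs */ : /* inputs */ "g"(p) : /* clobbers */ "memory");
}

/* Hypothetical benchmark body: buf is a local that is never read, so
   without escape() the optimizer would be free to drop the stores. */
unsigned fill_local(unsigned short value)
{
    unsigned short buf[256];
    for (size_t i = 0; i < 256; i++)
        buf[i] = value;
    escape(buf);      /* the stores above must now actually happen */
    return buf[255];  /* reloaded from memory after the asm "clobber" */
}
```

The GDC version in the post uses the `"r"` (register) constraint instead of `"g"` (register, memory or immediate); for an empty template either one is enough to make the pointer "used".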
How to defeat the optimizer in GDC
I'm trying to benchmark some code, but the optimizer is basically removing all of it, so I'm benchmarking nothing. I'd like to do something like what Chandler Carruth does here to defeat the optimizer: https://www.youtube.com/watch?v=nXaxk27zwlk=youtu.be=2446

He presents the following inline asm function to tell the optimizer that `p` is being used (at least that's how I understand it):

```
void escape(void* p)
{
    asm volatile("" : : "g"(p) : "memory");
}
```

I tried to do the same thing in D with this function:

```
void use(void* p)
{
    version(LDC)
    {
        import ldc.llvmasm;
        __asm("", "r,~{memory}", p);
    }
    version(GDC)
    {
        asm { "" : : "g"(p), "memory"; }
    }
}
```

The LDC version seems to work fine: https://d.godbolt.org/z/qbg54J

But I can't get GDC to do the same: https://explore.dgnu.org/z/quCjhU

Is this currently possible in GDC?

Mike
Re: GDC 9 and ARM Cortex-M
On Sunday, 19 May 2019 at 06:54:14 UTC, Timo Sintonen wrote:
> I am updating my toolset and libraries to GCC/GDC 9.1 release. First impression is that druntime needs more work than with previous versions. Many places to change and even compiler crashes when compiling some files. Before I look further I want to ask if there has been any testing with this target (cross compiler linux->arm-eabi). Is it expected to work, not to work or not tested at all.

Several months ago, I used this script (https://github.com/JinShil/native-gdc/blob/master/native-gdc.sh) to build a native GDC compiler. I then used that compiler by way of this script (https://github.com/JinShil/arm-none-eabi-gdc/blob/master/arm-none-eabi-gdc.sh) to build an arm-none-eabi cross-compiler from head. I then used that cross-compiler to build this ARM Cortex-M project (https://github.com/JinShil/stm32f42_discovery_demo). Everything worked fine.

I don't know if that helps, but that's my experience for whatever it's worth.

Mike
Re: Now that GDC has been officially released with GCC 9.1...
On Saturday, 4 May 2019 at 11:34:16 UTC, Iain Buclaw wrote:
>> 1. Where is development taking place? Where is HEAD?
> It's happening in SVN, there are a few official git mirrors however.

Is that also where I can find the latest GDC with the D frontend?
Now that GDC has been officially released with GCC 9.1...
Now that GDC has been officially released with GCC 9.1...

1. Where is development taking place? Where is HEAD?
2. Where do we file bugs?

Thanks,
Mike
Re: -ffreestanding option
On Wednesday, 25 July 2018 at 10:32:40 UTC, Zheng (Vic) Luo wrote:
> Instead of forcing developers to avoid memset-like access pattern in a freestanding environment and increasing their mental burden, a universal flags to disable these the generation of these calls will probably be a better choice.

It doesn't need to be avoided. As long as you provide a proper implementation of `memset`, you can use memset-like patterns as you wish.

Mike
Re: -ffreestanding option
On Wednesday, 25 July 2018 at 08:37:28 UTC, Zheng (Vic) Luo wrote:
> Current implementation of compilers assumes the existence of some symbols from libc, which leads to an infinite loop if we want to implement primitives like "memset" with our own code because the compiler will optimize consecutive set with "memset". This suggests that we cannot write a freestanding program without supports from compiler. With "-betterC" flag, dmd/gdc/ldc also come into this issue[5], which also applies to C/C++[1] and rust [2][3][4].

GDC doesn't seem to be affected. See https://explore.dgnu.org/g/ZJVjAu (i.e. no recursive calls to `memset`), but I don't know if I just got lucky with my implementation.

> It would be better to provide a standard flag like "-ffreestanding" (or -fno-builtin?) to disable such optimizations to facilitate freestanding programming instead of forcing the developers to hack around different compiler implementations, so I was wondering is there any progress on this problem?

According to https://wiki.dlang.org/Using_GDC, `-fno-builtin` is already there.

From my experience, I haven't yet found a need for `-ffreestanding`, as GDC always seems to do the right thing for me. It does generate calls to `memset`, `memcmp`, etc., but as long as I provide my own implementation with the symbol name it expects (i.e. `memset` with no name mangling, a.k.a. `extern(C) void* memset(void*, int, size_t)`), it seems to work fine.

Mike
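A sketch in C of what "providing your own implementation" looks like, assuming a GCC-based toolchain. GCC's own documentation says that even a freestanding environment is expected to supply `memcpy`, `memmove`, `memset` and `memcmp`, which matches the behaviour described above. In a real freestanding build the function below would be named `memset` (C linkage, unmangled, matching `extern(C) void* memset(void*, int, size_t)` on the D side); it is called `my_memset` here, a hypothetical name, only so it can run next to a hosted libc. The recursion Zheng describes comes from GCC recognizing the loop as a memset pattern; as far as I know, compiling the file with `-fno-tree-loop-distribute-patterns` (and `-fno-builtin`) prevents that.

```c
#include <assert.h>
#include <stddef.h>

/* Byte-wise fill. In the freestanding build this would be `memset` itself,
   compiled with -fno-tree-loop-distribute-patterns so GCC does not turn
   this very loop back into a call to memset (infinite recursion). */
void *my_memset(void *dest, int c, size_t n)
{
    unsigned char *p = dest;
    while (n--)
        *p++ = (unsigned char)c;
    return dest;  /* memset returns its first argument */
}
```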
Using GCC's builtin alloca without the C standard library
I'd like to be able to use GCC's `__builtin_alloca` without the C standard library. This seems to work:

```
// core/stdc/stdlib.d
module core.stdc.stdlib;

extern(C) void* alloca(size_t n) pure;
```

...but, since I'm not actually using the C standard library, I'd prefer to avoid creating that module hierarchy. I tried simply adding...

```
// {anyfile}.d
extern extern(C) void* __builtin_alloca(size_t size) pure;
```

...to my existing files, but I get an undefined reference to `__builtin_alloca`.

LDC was pretty straightforward with:

```
// {anyfile}.d
pragma(LDC_alloca) void* alloca(size_t size) pure;
```

Is there a way to do something like that in GDC? I don't care if I have to use `__builtin_alloca` or some other identifier; I just don't want to create the C standard library module hierarchy.

Thanks,
Mike
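For reference, this is how the builtin is used from GNU C: GCC expands `__builtin_alloca` inline into a stack-pointer adjustment, so no `alloca` symbol from a C library is involved. This sketch only shows the C side; it does not answer how GDC expects the builtin to be declared in D. The helper `sum_of_squares` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: uses a stack scratch buffer obtained with
   __builtin_alloca. The memory is released automatically when the
   function returns, so it must not be returned or stored elsewhere. */
static unsigned sum_of_squares(size_t n)
{
    unsigned *tmp = __builtin_alloca(n * sizeof *tmp);
    unsigned total = 0;
    for (size_t i = 0; i < n; i++)
        tmp[i] = (unsigned)(i * i);
    for (size_t i = 0; i < n; i++)
        total += tmp[i];
    return total;
}
```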
Re: Improving codegen for ARM Cortex-M
On Friday, 20 July 2018 at 11:11:12 UTC, Mike Franklin wrote:
> I ask for any insight you might have, should you wish to give this your attention. Regardless, I'll keep investigating.

Just to follow up: after I enabled `-funroll-loops` for GDC, it was almost twice as fast as LDC, though the code size was a little larger.

Bottom line is: I just need to learn the compilers better (both of them) and learn how to tune them for the application.

Mike
Re: Improving codegen for ARM Cortex-M
On Friday, 20 July 2018 at 12:49:59 UTC, Mike Franklin wrote:
> GDC
> ```
> arm-none-eabi-gdc -c -O2 -nophoboslib -nostdinc -nodefaultlibs -nostdlib -mthumb -mcpu=cortex-m4 -mtune=cortex-m4 -mfloat-abi=hard -Isource/runtime -fno-bounds-check -ffunction-sections -fdata-sections -fno-weak
>
> _D5board3lcd8fillRectFiikktZv:
>     .fnstart
> .LFB4:
>     @ args = 4, pretend = 0, frame = 0
>     @ frame_needed = 0, uses_anonymous_args = 0
>     @ link register save eliminated.
>     push    {r4, r5, r6}
>     add     r3, r3, r1
>     cmp     r1, r3
>     ldrh    r5, [sp, #12]
>     bgt     .L47
>     rsb     r4, r1, r1, lsl #4
>     add     r0, r0, r4, lsl #4
>     ldr     r4, .L58
>     add     r0, r0, r2
>     rsb     r6, r2, r2, lsl #31
>     add     r4, r4, r0, lsl #1
>     lsls    r6, r6, #1
> .L51:
>     cbz     r2, .L49
>     adds    r0, r4, r6
> .L50:
>     strh    r5, [r0], #2    @ movhi
>     cmp     r0, r4
>     bne     .L50
> .L49:
>     adds    r1, r1, #1
>     cmp     r3, r1
>     add     r4, r4, #480
>     bge     .L51
> .L47:
>     pop     {r4, r5, r6}
>     bx      lr
> ```

Gah. Sorry folks. I keep screwing up. I can see above that the `fillSpan` function is not being inlined. I must be doing something wrong. Please ignore this thread.

Sorry,
Mike
Re: Improving codegen for ARM Cortex-M
Actually, the assembly output from objdump isn't quite accurate. Here's the generated assembly from the compiler.

LDC
```
ldc2 -conf= -disable-simplify-libcalls -c -Os -mtriple=thumb-none-eabi -float-abi=hard -mcpu=cortex-m4 -Isource/runtime -boundscheck=off

_D5board3lcd8fillRectFiikktZv:
    .fnstart
    .save   {r4, r5, r6, r7, r8, r9, lr}
    push.w  {r4, r5, r6, r7, r8, r9, lr}
    add.w   lr, r3, r1
    cmp     lr, r3
    it      lt
    poplt.w {r4, r5, r6, r7, r8, r9, pc}
    ldr.w   r12, [sp, #28]
    rsb     r5, r3, r3, lsl #4
    movw    r8, :lower16:_D5board4ltdc11frameBufferG76800t
    and     r1, r2, #3
    sub.w   r9, r2, #1
    movt    r8, :upper16:_D5board4ltdc11frameBufferG76800t
    subs    r4, r2, r1
    add.w   r5, r12, r5, lsl #4
    add.w   r5, r8, r5, lsl #1
    adds    r7, r5, #4
.LBB1_1:
    cbz     r2, .LBB1_8
    movs    r5, #0
    cmp.w   r9, #3
    blo     .LBB1_5
    mov     r6, r7
.LBB1_4:
    adds    r5, #4
    strh    r0, [r6, #-2]
    strh    r0, [r6, #-4]
    strh    r0, [r6]
    strh    r0, [r6, #2]
    adds    r6, #8
    cmp     r4, r5
    bne     .LBB1_4
.LBB1_5:
    cbz     r1, .LBB1_8
    rsb     r6, r3, r3, lsl #4
    cmp     r1, #1
    add.w   r6, r12, r6, lsl #4
    add     r5, r6
    strh.w  r0, [r8, r5, lsl #1]
    beq     .LBB1_8
    add.w   r5, r8, r5, lsl #1
    cmp     r1, #2
    strh    r0, [r5, #2]
    it      ne
    strhne  r0, [r5, #4]
.LBB1_8:
    adds    r3, #1
    add.w   r7, r7, #480
    cmp     r3, lr
    ble     .LBB1_1
    pop.w   {r4, r5, r6, r7, r8, r9, pc}
```

GDC
```
arm-none-eabi-gdc -c -O2 -nophoboslib -nostdinc -nodefaultlibs -nostdlib -mthumb -mcpu=cortex-m4 -mtune=cortex-m4 -mfloat-abi=hard -Isource/runtime -fno-bounds-check -ffunction-sections -fdata-sections -fno-weak

_D5board3lcd8fillRectFiikktZv:
    .fnstart
.LFB4:
    @ args = 4, pretend = 0, frame = 0
    @ frame_needed = 0, uses_anonymous_args = 0
    @ link register save eliminated.
    push    {r4, r5, r6}
    add     r3, r3, r1
    cmp     r1, r3
    ldrh    r5, [sp, #12]
    bgt     .L47
    rsb     r4, r1, r1, lsl #4
    add     r0, r0, r4, lsl #4
    ldr     r4, .L58
    add     r0, r0, r2
    rsb     r6, r2, r2, lsl #31
    add     r4, r4, r0, lsl #1
    lsls    r6, r6, #1
.L51:
    cbz     r2, .L49
    adds    r0, r4, r6
.L50:
    strh    r5, [r0], #2    @ movhi
    cmp     r0, r4
    bne     .L50
.L49:
    adds    r1, r1, #1
    cmp     r3, r1
    add     r4, r4, #480
    bge     .L51
.L47:
    pop     {r4, r5, r6}
    bx      lr
```

Mike
Improving codegen for ARM Cortex-M
I've finally succeeded in getting a build of my STM32 ARM Cortex-M proof of concept in LDC and GDC, thanks to the recent changes in both compilers. So, I now have a way to compare code generation between the two compilers.

The project is extremely simple; it just generates a bunch of random rectangles on its small LCD screen. This is done by simply writing to memory in a frame buffer.

Unfortunately, GDC's code executes quite a bit slower than LDC's code. The difference is quite noticeable, as I can see the status LED blinking much more slowly with GDC than with LDC.

The code to do this is below (I simplified it for this discussion, but tested to ensure it still reproduces the symptoms. I also did away with the random behavior to remove that variable).

A block of code in main.d:
```
uint i = 0;
while(true)
{
    lcd.fillRect(x, y, width, height, color);
    if ((i % 1000) == 0)
    {
        statusLED.toggle();
    }
    i++;
}
```

In lcd.d:
```
@noinline pragma(inline, false)
void fillRect(int x, int y, uint width, uint height, ushort color)
{
    int y2 = y + height;
    for(int _y = y; _y <= y2; _y++)
    {
        ltdc.fillSpan(x, _y, width, color);
    }
}
```

From ltdc.d:
```
void fillSpan(int x, int y, uint spanWidth, ushort color)
{
    int start = y * width + x;
    for(int i = 0; i < spanWidth; i++)
    {
        frameBuffer[start + i] = color;
    }
}
```

LDC disassembly:
```
ldc2 -conf= -disable-simplify-libcalls -c -Os -mtriple=thumb-none-eabi -float-abi=hard -mcpu=cortex-m4 -Isource/runtime -boundscheck=off

<_D5board3lcd8fillRectFiikktZv>:
 80000b8:  e92d 43f0  stmdb    sp!, {r4, r5, r6, r7, r8, r9, lr}
 80000bc:  eb03 0e01  add.w    lr, r3, r1
 80000c0:  459e       cmp      lr, r3
 80000c2:  bfb8       it       lt
 80000c4:  e8bd 83f0  ldmialt.w sp!, {r4, r5, r6, r7, r8, r9, pc}
 80000c8:  f8dd c01c  ldr.w    ip, [sp, #28]
 80000cc:  ebc3 1503  rsb      r5, r3, r3, lsl #4
 80000d0:  f240 0800  movw     r8, #0
 80000d4:  f002 0103  and.w    r1, r2, #3
 80000d8:  f1a2 0901  sub.w    r9, r2, #1
 80000dc:  f2c2 0800  movt     r8, #8192      ; 0x2000
 80000e0:  1a54       subs     r4, r2, r1
 80000e2:  eb0c 1505  add.w    r5, ip, r5, lsl #4
 80000e6:  eb08 0545  add.w    r5, r8, r5, lsl #1
 80000ea:  1d2f       adds     r7, r5, #4
 80000ec:  b1f2       cbz      r2, 800012c <_D5board3lcd8fillRectFiikktZv+0x74>
 80000ee:  2500       movs     r5, #0
 80000f0:  f1b9 0f03  cmp.w    r9, #3
 80000f4:  d30a       bcc.n    800010c <_D5board3lcd8fillRectFiikktZv+0x54>
 80000f6:  463e       mov      r6, r7
 80000f8:  3504       adds     r5, #4
 80000fa:  f826 0c02  strh.w   r0, [r6, #-2]
 80000fe:  f826 0c04  strh.w   r0, [r6, #-4]
 8000102:  8030       strh     r0, [r6, #0]
 8000104:  8070       strh     r0, [r6, #2]
 8000106:  3608       adds     r6, #8
 8000108:  42ac       cmp      r4, r5
 800010a:  d1f5       bne.n    80000f8 <_D5board3lcd8fillRectFiikktZv+0x40>
 800010c:  b171       cbz      r1, 800012c <_D5board3lcd8fillRectFiikktZv+0x74>
 800010e:  ebc3 1603  rsb      r6, r3, r3, lsl #4
 8000112:  2901       cmp      r1, #1
 8000114:  eb0c 1606  add.w    r6, ip, r6, lsl #4
 8000118:  4435       add      r5, r6
 800011a:  f828 0015  strh.w   r0, [r8, r5, lsl #1]
 800011e:  d005       beq.n    800012c <_D5board3lcd8fillRectFiikktZv+0x74>
 8000120:  eb08 0545  add.w    r5, r8, r5, lsl #1
 8000124:  2902       cmp      r1, #2
 8000126:  8068       strh     r0, [r5, #2]
 8000128:  bf18       it       ne
 800012a:  80a8       strhne   r0, [r5, #4]
 800012c:  3301       adds     r3, #1
 800012e:  f507 77f0  add.w    r7, r7, #480   ; 0x1e0
 8000132:  4573       cmp      r3, lr
 8000134:  ddda       ble.n    80000ec <_D5board3lcd8fillRectFiikktZv+0x34>
 8000136:  e8bd 83f0  ldmia.w  sp!, {r4, r5, r6, r7, r8, r9, pc}
```

GDC disassembly:
```
arm-none-eabi-gdc -c -O2 -nophoboslib -nostdinc -nodefaultlibs -nostdlib -mthumb -mcpu=cortex-m4 -mtune=cortex-m4 -mfloat-abi=hard -Isource/runtime -fno-bounds-check -ffunction-sections -fdata-sections -fno-weak

<_D5board3lcd8fillRectFiikktZv>:
 800049c:  b470       push     {r4, r5, r6}
 800049e:  440b       add      r3, r1
 80004a0:  4299       cmp      r1, r3
 80004a2:  f8bd 500c  ldrh.w   r5, [sp, #12]
 80004a6:  dc15       bgt.n    80004d4 <_D5board3lcd8fillRectFiikktZv+0x38>
 80004a8:  ebc1 1401  rsb      r4, r1, r1, lsl #4
 80004ac:  eb00 1004  add.w    r0, r0, r4, lsl #4
 80004b0:  4c09       ldr      r4, [pc, #36]  ; (80004d8 <_D5board3lcd8fillRectFiikktZv+0x3c>)
 80004b2:  4410       add      r0, r2
 80004b4:  ebc2 76c2  rsb      r6, r2, r2, lsl #31
 80004b8:  eb04 0440  add.w    r4, r4, r0, lsl #1
 80004bc:  0076       lsls     r6, r6, #1
 80004be:  b122       cbz      r2, 80004ca <_D5board3lcd8fillRectFiikktZv+0x2e>
 80004c0:  19a0       adds     r0, r4, r6
 80004c2:  f820 5b02  strh.w   r5, [r0], #2
 80004c6:  42a0       cmp      r0, r4
 80004c8:  d1fb       bne.n    80004c2 <_D5board3lcd8fillRectFiikktZv+0x26>
 80004ca:  3101       adds     r1, #1
 80004cc:  428b       cmp      r3, r1
 80004ce:  f504 74f0  add.w    r4, r4, #480   ; 0x1e0
 80004d2:  daf4       bge.n
```
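For anyone who wants to reproduce the comparison with GCC's C front end (the same backend GDC uses), here is a C transcription of `fillRect`/`fillSpan`. It is a sketch under assumptions: the 240 x 320 panel size is inferred from the listings (each row advances 480 bytes, i.e. 240 `ushort` pixels, and the frame buffer symbol mangles as `ushort[76800]`), and `LCD_WIDTH`/`LCD_HEIGHT` are hypothetical names. Note that, like the D original, `fillRect` loops with `<=` and so touches `height + 1` rows. One could compile it with something like `arm-none-eabi-gcc -O2 -mthumb -mcpu=cortex-m4 -S`, with and without `-funroll-loops`, and check whether `fillSpan` is inlined; I haven't verified that the output matches GDC's.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed panel geometry, inferred from the listings (hypothetical names). */
enum { LCD_WIDTH = 240, LCD_HEIGHT = 320 };

static uint16_t frameBuffer[LCD_WIDTH * LCD_HEIGHT];

/* Port of ltdc.fillSpan: write spanWidth pixels of color starting at (x, y).
   `static inline` invites GCC to inline it into fillRect. */
static inline void fillSpan(int x, int y, unsigned spanWidth, uint16_t color)
{
    int start = y * LCD_WIDTH + x;
    for (unsigned i = 0; i < spanWidth; i++)
        frameBuffer[start + i] = color;
}

/* Port of lcd.fillRect; noinline mirrors pragma(inline, false).
   Like the D code, `<=` makes it fill rows y .. y + height inclusive. */
__attribute__((noinline))
void fillRect(int x, int y, unsigned width, unsigned height, uint16_t color)
{
    int y2 = y + (int)height;
    for (int row = y; row <= y2; row++)
        fillSpan(x, row, width, color);
}
```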
Re: Bug after local import statement?
On Wednesday, 7 March 2018 at 12:49:30 UTC, berni wrote:
> The following code compiles with ldc/dmd but not with gdc:
> ```
> cat test.d
> void main()
> {
>     int[] count;
>     {
>         import std.string;
>         ++count[3];
>     }
> }
>
> gdc test.d
> test.d:8:16: error: only one index allowed to index void
>     ++count[3];
>               ^
>
> gdc --version
> gdc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
> ```
> Is this a bug?

This seems to work fine in GDC 7. See https://explore.dgnu.org/g/1pfiY2

Mike
Re: [Bug 126] Add support for attribute to mark data as volatile.
On Tuesday, 24 June 2014 at 14:14:18 UTC, Johannes Pfau wrote:
> On Tue, 24 Jun 2014 10:46:11 +0000, Timo Sintonen t.sinto...@luukku.com wrote:
>> To keep this thread going, I had a quick look at the reference material of the dip and picked some thoughts. In some languages volatile has a stronger meaning, like guaranteeing an atomic access. In some languages it may not guarantee anything. In this proposal volatile is only for optimization, not for protection. It does not add any code, it just prevents the optimizer removing some code.
> Agreed.
>> In fact if this proposal reaches the main forum, we may hear proposals to introduce a don't optimize pragma or some other workaround. I considered it myself :( Walter has been against this and now also Martin. I think there is no use to bring this to the main forum. I understand the point that it is not very good to have something in the language specs that can not be guaranteed.
> I think we should at least try to bring this to the main newsgroup, however I got distracted with other things and I want to extend the DIP a little (Only rationale stuff, no technical changes). The newsgroups have been quite busy lately and there've been discussions about two new DIPs. Neither Andrei nor Walter even posted a response in these threads so now is probably not a good time to start the discussion on this DIP.

Agreed, it would be a shame not to try. And waiting for a lull in controversial topics on the main forum would probably be a good idea. It would be bad for the passions of one discussion to spill over into this one.

Walter told me at DConf that he was in favor of compiler intrinsic peek/poke functions as proposed in DIP20, and he said that, in the meantime, I could get by with Martin's volatileGet/volatileSet assembly workaround. This tells me two things that are working against DIP62:

1) Walter believes volatile is a property of the load/store operation.
2) He doesn't consider volatile semantics a big priority.
DIP20 has Walter's support, but has been collecting mold on the DIP list for a year and a half. Getting DIP62 approved will be a challenge, to put it mildly, and even after approval it will still need to be implemented and accepted into DMD.

I was in favor of Walter's suggestion (DIP20) until DIP62. What sold me on DIP62 was the fact that one would never want to access volatile memory with non-volatile semantics. This is an irrefutable truth, and, as far as I can tell, only a type qualifier can provide this enforcement.

DIP62 will also be a difficult sell given the simple assembly workaround. I concede that the workaround will solve the problem and is trivial to implement, but as Iain said, it is "...an excuse to *not* implement a feature that is rather essential for a systems language."

I define a systems programming language as one that is generally accepted and used to implement operating systems (kernels and hardware drivers), and I think this is the traditional definition, before the language marketers redefined it. D currently requires the help of C and/or other techniques to do this, and it lacks a runtime suitable for such development, so it is not a systems programming language IMO. But of all the languages I have encountered, D has the most potential to be the systems programming language of choice in the future, and DIP62 is a step in that direction. The work of all those currently participating in this thread also shows some encouraging momentum in that direction.

Unfortunately, there just don't seem to be very many people using D for systems programming, likely for the reasons I gave above, and this poses another hurdle for DIP62: volatile has very little use outside of systems programming.

DIP62 has my support, but I know that doesn't mean much. It will likely need the support of those with some clout in the D community.

Mike
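To make the two designs concrete, here is how they look in C, which happens to support both styles. This is an illustrative sketch only: `volatile_load`, `volatile_store`, `reg32` and `poll_twice` are hypothetical names, and none of it is Martin's assembly workaround or the actual API proposed by DIP20 or DIP62. The first pair makes volatility a property of each individual access (the peek/poke view); the typedef makes it a property of the type, so a plain, optimizable access through that pointer cannot slip through (the view that sold me on DIP62).

```c
#include <assert.h>
#include <stdint.h>

/* Approach 1 (peek/poke style): volatility is chosen per access.
   Nothing stops other code from touching *addr non-volatilely. */
static inline uint32_t volatile_load(const uint32_t *addr)
{
    return *(const volatile uint32_t *)addr;
}

static inline void volatile_store(uint32_t *addr, uint32_t value)
{
    *(volatile uint32_t *)addr = value;
}

/* Approach 2 (type qualifier): every access through a reg32 lvalue is
   volatile, so the compiler may not merge, reorder or drop these reads. */
typedef volatile uint32_t reg32;

static void poll_twice(reg32 *status, uint32_t out[2])
{
    out[0] = *status;  /* first read is emitted */
    out[1] = *status;  /* second read is emitted too, not reused */
}
```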