Re: How to defeat the optimizer in GDC

2019-05-20 Thread Mike Franklin via D.gnu

On Monday, 20 May 2019 at 08:21:19 UTC, Iain Buclaw wrote:

Looks like you've done a typo to me. Memory should be a 
clobber, not an input operand.


Yes, that too.




Re: How to defeat the optimizer in GDC

2019-05-20 Thread Mike Franklin via D.gnu

On Monday, 20 May 2019 at 08:11:19 UTC, Mike Franklin wrote:

But I can't get GDC to do the same:  
https://explore.dgnu.org/z/quCjhU


Is this currently possible in GDC?


Gah!! Ignore that.  `version (GNU)`, not `version(GDC)`.

This works:

void use(void* p)
{
    version(LDC)
    {
        import ldc.llvmasm;
        __asm("", "r,~{memory}", p);
    }
    version(GNU)
    {
        asm { "" : : "r"(p) : "memory"; }
    }
}
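
For anyone landing here later, this is roughly how I'd expect such a barrier to be used in a timing loop. It is only a sketch: `timeFill` and the 64-byte buffer are made up for illustration, and on compilers other than LDC and GDC the `use` call below is just an ordinary function call with no barrier effect.

```d
import core.time : Duration, MonoTime;

// Optimizer barrier, same shape as `use` above: the empty asm statement
// claims to read `p` and to clobber memory, so earlier stores through `p`
// cannot be discarded.
void use(void* p)
{
    version (LDC)
    {
        import ldc.llvmasm;
        __asm("", "r,~{memory}", p);
    }
    else version (GNU)
    {
        asm { "" : : "r"(p) : "memory"; }
    }
    // Assumption: on any other compiler this body is empty (no barrier).
}

// Hypothetical workload: fill a small buffer `iterations` times and
// return the elapsed time.
Duration timeFill(ref ubyte[64] buffer, uint iterations)
{
    immutable start = MonoTime.currTime;
    foreach (i; 0 .. iterations)
    {
        buffer[] = cast(ubyte) i; // the work being measured
        use(buffer.ptr);          // tell the optimizer the stores are observed
    }
    return MonoTime.currTime - start;
}
```

Without the `use` call inside the loop, an optimizing build is free to keep only the last fill, or none at all if the buffer is never read afterwards.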



How to defeat the optimizer in GDC

2019-05-20 Thread Mike Franklin via D.gnu
I'm trying to benchmark some code, but the optimizer is basically 
removing all of it, so I'm benchmarking nothing.


I'd like to do something like what Chandler Carruth does here to 
defeat the optimizer:  
https://www.youtube.com/watch?v=nXaxk27zwlk&feature=youtu.be&t=2446


He presents the following inline asm function to tell the 
optimizer that `p` is being used (at least that's how I 
understand it):

```
void escape(void* p)
{
    asm volatile("" : : "g"(p) : "memory");
}
```

I tried to do the same thing in D with this function:
```
void use(void* p)
{
    version(LDC)
    {
        import ldc.llvmasm;
        __asm("", "r,~{memory}", p);
    }
    version(GDC)
    {
        asm { "" : : "g"(p), "memory"; }
    }
}
```

The LDC version seems to work fine:  
https://d.godbolt.org/z/qbg54J


But I can't get GDC to do the same:  
https://explore.dgnu.org/z/quCjhU


Is this currently possible in GDC?

Mike


Re: GDC 9 and ARM Cortex-M

2019-05-19 Thread Mike Franklin via D.gnu

On Sunday, 19 May 2019 at 06:54:14 UTC, Timo Sintonen wrote:
I am updating my toolset and libraries to GCC/GDC 9.1 release. 
First impression is that druntime needs more work than with 
previous versions. Many places to change and even compiler 
crashes when compiling some files.


Before I look further I want to ask if there has been any 
testing with this target (cross compiler linux->arm-eabi). Is 
it expected to work, not to work or not tested at all.


Several months ago, I used this script 
(https://github.com/JinShil/native-gdc/blob/master/native-gdc.sh) 
to build a native GDC compiler.  I then used that compiler by way 
of this script 
(https://github.com/JinShil/arm-none-eabi-gdc/blob/master/arm-none-eabi-gdc.sh) to build an arm-none-eabi cross-compiler from head.


I then used that cross-compiler to build this ARM Cortex-M 
project (https://github.com/JinShil/stm32f42_discovery_demo). 
Everything worked fine.


I don't know if that helps, but that's my experience for whatever 
it's worth.


Mike


Re: Now that GDC has been officially released with GCC 9.1...

2019-05-04 Thread Mike Franklin via D.gnu

On Saturday, 4 May 2019 at 11:34:16 UTC, Iain Buclaw wrote:


1.  Where is development taking place? Where is HEAD?


It's happening in SVN, there are a few official git mirrors 
however.


Is that also where I can find the latest GDC with the D frontend?





Now that GDC has been officially released with GCC 9.1...

2019-05-04 Thread Mike Franklin via D.gnu

Now that GDC has been officially released with GCC 9.1...

1.  Where is development taking place? Where is HEAD?
2.  Where do we file bugs?

Thanks,
Mike


Re: -ffreestanding option

2018-07-25 Thread Mike Franklin via D.gnu

On Wednesday, 25 July 2018 at 10:32:40 UTC, Zheng (Vic) Luo wrote:

Instead of forcing developers to avoid memset-like access 
patterns in a freestanding environment and increasing their 
mental burden, a universal flag to disable the generation of 
these calls will probably be a better choice.


It doesn't need to be avoided.  As long as you provide a proper 
implementation of `memset` you can use memset-like patterns as 
you wish.


Mike




Re: -ffreestanding option

2018-07-25 Thread Mike Franklin via D.gnu

On Wednesday, 25 July 2018 at 08:37:28 UTC, Zheng (Vic) Luo wrote:

Current compiler implementations assume the existence of 
some symbols from libc, which leads to an infinite loop if we 
want to implement primitives like "memset" with our own code, 
because the compiler will optimize consecutive sets into a call 
to "memset". This suggests that we cannot write a freestanding 
program without support from the compiler. With the "-betterC" 
flag, dmd/gdc/ldc also run into this issue[5], which also applies 
to C/C++[1] and Rust [2][3][4].


GDC doesn't seem to be affected.  See 
https://explore.dgnu.org/g/ZJVjAu  i.e. no recursive calls to 
`memset`, but I don't know if I just got lucky with my 
implementation.


It would be better to provide a standard flag like 
"-ffreestanding" (or -fno-builtin?) to disable such 
optimizations to facilitate freestanding programming instead of 
forcing the developers to hack around different compiler 
implementations, so I was wondering is there any progress on 
this problem?


According to https://wiki.dlang.org/Using_GDC, `-fno-builtin` is 
already there.


From my experience I haven't yet found a need for 
`-ffreestanding`, as GDC always seems to do the right thing for 
me.  It does generate calls for `memset`, `memcmp`, etc..., but 
as long as I provide my own implementation with the correct 
symbol name as it expects (i.e. `memset` with no name mangling, 
a.k.a `extern(C) void* memset(void*, int, size_t)`) it seems to 
work fine.
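
For illustration, here is a minimal sketch of the kind of `memset` I mean. It assumes a freestanding build where this is the only `memset` the linker sees. Whether a given compiler and optimization level turns the loop back into a `memset` call is exactly the concern raised above, so check the generated code for your own toolchain.

```d
// Unmangled C symbol, matching what the compiler emits calls to.
extern(C) void* memset(void* dest, int value, size_t count)
{
    auto bytes = cast(ubyte*) dest;
    // C semantics: the fill value is converted to an unsigned char.
    immutable fill = cast(ubyte) value;

    // Plain byte-by-byte fill. Assumption: the optimizer does not
    // rewrite this loop into a recursive call to memset.
    foreach (i; 0 .. count)
    {
        bytes[i] = fill;
    }
    return dest;
}
```

If the loop does get rewritten into a call to itself, `-fno-builtin` is the first thing I would try.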


Mike





Using GCC's builtin alloca without the C standard library

2018-07-22 Thread Mike Franklin via D.gnu
I'd like to be able to use GCC's `__builtin_alloca` without the C 
standard library.


This seems to work:

--- core/stdc/stdlib.d
module core.stdc.stdlib;
extern(C) void* alloca(size_t n) pure;
---

...but, since I'm not actually using the C standard library, I'd 
prefer to avoid creating that module hierarchy.



I tried simply adding...

--- {anyfile}.d
extern extern(C) void* __builtin_alloca(size_t size) pure;
---

... to my existing files, but I get an undefined reference for 
`__builtin_alloca`.



LDC was pretty straightforward with:
--- {anyfile}.d
pragma(LDC_alloca)
void* alloca(size_t size) pure;
---

Is there a way to do something like that in GDC?  I don't care if 
I have to use `__builtin_alloca` or some other identifier, I just 
don't want to create the C standard library module hierarchy.


Thanks,
Mike


Re: Improving codegen for ARM Cortex-M

2018-07-20 Thread Mike Franklin via D.gnu

On Friday, 20 July 2018 at 11:11:12 UTC, Mike Franklin wrote:

I ask for any insight you might have, should you wish to give 
this your attention.  Regardless, I'll keep investigating.


Just to follow up, after I enabled `-funroll-loops` for GDC, it 
was almost twice as fast as LDC, though the code size was a 
little larger.


Bottom line is:  I just need to learn the compilers better (both 
of them) and learn how to tune them for the application.
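
To make "unrolling" concrete, here is a hand-written sketch of roughly what the LDC listing earlier in this thread does: four halfword stores per iteration, then a remainder loop. `fillSpanUnrolled` is a made-up name. The `width` of 240 is inferred from the 480-byte row stride in the listings, and the buffer size from the mangled `frameBufferG76800t` symbol, so treat both as assumptions about my project, not as its real source.

```d
enum width = 240;                      // assumed: 480 bytes per row / 2 bytes per pixel
__gshared ushort[76_800] frameBuffer;  // assumed: 240 x 320 pixels

void fillSpanUnrolled(int x, int y, uint spanWidth, ushort color)
{
    ushort* p = frameBuffer.ptr + (y * width + x);
    uint i = 0;

    // Main body: four stores per loop iteration.
    for (; i + 4 <= spanWidth; i += 4)
    {
        p[i]     = color;
        p[i + 1] = color;
        p[i + 2] = color;
        p[i + 3] = color;
    }

    // Remainder: zero to three leftover pixels.
    for (; i < spanWidth; i++)
    {
        p[i] = color;
    }
}
```

The trade-off is the one I saw in practice: fewer compare-and-branch instructions per pixel, at the cost of more code.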


Mike




Re: Improving codegen for ARM Cortex-M

2018-07-20 Thread Mike Franklin via D.gnu

On Friday, 20 July 2018 at 12:49:59 UTC, Mike Franklin wrote:


GDC
---
arm-none-eabi-gdc -c -O2 -nophoboslib -nostdinc -nodefaultlibs 
-nostdlib -mthumb -mcpu=cortex-m4 -mtune=cortex-m4 
-mfloat-abi=hard -Isource/runtime -fno-bounds-check 
-ffunction-sections -fdata-sections -fno-weak


_D5board3lcd8fillRectFiikktZv:
    .fnstart
.LFB4:
    @ args = 4, pretend = 0, frame = 0
    @ frame_needed = 0, uses_anonymous_args = 0
    @ link register save eliminated.
    push    {r4, r5, r6}
    add     r3, r3, r1
    cmp     r1, r3
    ldrh    r5, [sp, #12]
    bgt     .L47
    rsb     r4, r1, r1, lsl #4
    add     r0, r0, r4, lsl #4
    ldr     r4, .L58
    add     r0, r0, r2
    rsb     r6, r2, r2, lsl #31
    add     r4, r4, r0, lsl #1
    lsls    r6, r6, #1
.L51:
    cbz     r2, .L49
    adds    r0, r4, r6
.L50:
    strh    r5, [r0], #2    @ movhi
    cmp     r0, r4
    bne     .L50
.L49:
    adds    r1, r1, #1
    cmp     r3, r1
    add     r4, r4, #480
    bge     .L51
.L47:
    pop     {r4, r5, r6}
    bx      lr


Gah.  Sorry folks.  I keep screwing up.  I can see above that the 
`fillSpan` function is not being inlined.  I must be doing 
something wrong.


Please ignore this thread.

Sorry,
Mike



Re: Improving codegen for ARM Cortex-M

2018-07-20 Thread Mike Franklin via D.gnu
Actually the assembly output from objdump isn't quite accurate.  
Here's the generated assembly from the compiler.


LDC
---
ldc2 -conf= -disable-simplify-libcalls -c -Os  
-mtriple=thumb-none-eabi -float-abi=hard -mcpu=cortex-m4 
-Isource/runtime -boundscheck=off


_D5board3lcd8fillRectFiikktZv:
    .fnstart
    .save   {r4, r5, r6, r7, r8, r9, lr}
    push.w  {r4, r5, r6, r7, r8, r9, lr}
    add.w   lr, r3, r1
    cmp     lr, r3
    it      lt
    poplt.w {r4, r5, r6, r7, r8, r9, pc}
    ldr.w   r12, [sp, #28]
    rsb     r5, r3, r3, lsl #4
    movw    r8, :lower16:_D5board4ltdc11frameBufferG76800t
    and     r1, r2, #3
    sub.w   r9, r2, #1
    movt    r8, :upper16:_D5board4ltdc11frameBufferG76800t
    subs    r4, r2, r1
    add.w   r5, r12, r5, lsl #4
    add.w   r5, r8, r5, lsl #1
    adds    r7, r5, #4
.LBB1_1:
    cbz     r2, .LBB1_8
    movs    r5, #0
    cmp.w   r9, #3
    blo     .LBB1_5
    mov     r6, r7
.LBB1_4:
    adds    r5, #4
    strh    r0, [r6, #-2]
    strh    r0, [r6, #-4]
    strh    r0, [r6]
    strh    r0, [r6, #2]
    adds    r6, #8
    cmp     r4, r5
    bne     .LBB1_4
.LBB1_5:
    cbz     r1, .LBB1_8
    rsb     r6, r3, r3, lsl #4
    cmp     r1, #1
    add.w   r6, r12, r6, lsl #4
    add     r5, r6
    strh.w  r0, [r8, r5, lsl #1]
    beq     .LBB1_8
    add.w   r5, r8, r5, lsl #1
    cmp     r1, #2
    strh    r0, [r5, #2]
    it      ne
    strhne  r0, [r5, #4]
.LBB1_8:
    adds    r3, #1
    add.w   r7, r7, #480
    cmp     r3, lr
    ble     .LBB1_1
    pop.w   {r4, r5, r6, r7, r8, r9, pc}

GDC
---
arm-none-eabi-gdc -c -O2 -nophoboslib -nostdinc -nodefaultlibs 
-nostdlib -mthumb -mcpu=cortex-m4 -mtune=cortex-m4 
-mfloat-abi=hard -Isource/runtime -fno-bounds-check 
-ffunction-sections -fdata-sections -fno-weak


_D5board3lcd8fillRectFiikktZv:
    .fnstart
.LFB4:
    @ args = 4, pretend = 0, frame = 0
    @ frame_needed = 0, uses_anonymous_args = 0
    @ link register save eliminated.
    push    {r4, r5, r6}
    add     r3, r3, r1
    cmp     r1, r3
    ldrh    r5, [sp, #12]
    bgt     .L47
    rsb     r4, r1, r1, lsl #4
    add     r0, r0, r4, lsl #4
    ldr     r4, .L58
    add     r0, r0, r2
    rsb     r6, r2, r2, lsl #31
    add     r4, r4, r0, lsl #1
    lsls    r6, r6, #1
.L51:
    cbz     r2, .L49
    adds    r0, r4, r6
.L50:
    strh    r5, [r0], #2    @ movhi
    cmp     r0, r4
    bne     .L50
.L49:
    adds    r1, r1, #1
    cmp     r3, r1
    add     r4, r4, #480
    bge     .L51
.L47:
    pop     {r4, r5, r6}
    bx      lr

Mike


Improving codegen for ARM Cortex-M

2018-07-20 Thread Mike Franklin via D.gnu
I've finally succeeded in getting a build of my STM32 ARM 
Cortex-M proof of concept in LDC and GDC, thanks to the recent 
changes in both compilers.  So, I now have a way to compare code 
generation between the two compilers.


The project is extremely simple; it just generates a bunch of 
random rectangles on its small LCD screen.  This is done by 
simply writing to memory in a frame buffer.


Unfortunately, GDC's code executes quite a bit slower than LDC's 
code.  The difference is quite noticeable, as I can see the rate 
of the status LED blinking much slower with GDC than with LDC.


The code to do this is below (I simplified it for this 
discussion, but tested to ensure reproduction of the symptoms.  I 
also did away with the random behavior to remove that variable).


a block of code in main.d
---
uint i = 0;
while(true)
{
    lcd.fillRect(x, y, width, height, color);
    if ((i % 1000) == 0)
    {
        statusLED.toggle();
    }

    i++;
}

in lcd.d
---
@noinline pragma(inline, false)
void fillRect(int x, int y, uint width, uint height, ushort color)
{
    int y2 = y + height;
    for(int _y = y; _y <= y2; _y++)
    {
        ltdc.fillSpan(x, _y, width, color);
    }
}

from ltdc.d
---
void fillSpan(int x, int y, uint spanWidth, ushort color)
{
    int start = y * width + x;
    for(int i = 0; i < spanWidth; i++)
    {
        frameBuffer[start + i] = color;
    }
}

LDC disassembly
---
ldc2 -conf= -disable-simplify-libcalls -c -Os  
-mtriple=thumb-none-eabi -float-abi=hard -mcpu=cortex-m4 
-Isource/runtime -boundscheck=off


<_D5board3lcd8fillRectFiikktZv>:
80000b8:  e92d 43f0   stmdb  sp!, {r4, r5, r6, r7, r8, r9, lr}
80000bc:  eb03 0e01   add.w  lr, r3, r1
80000c0:  459e        cmp  lr, r3
80000c2:  bfb8        it  lt
80000c4:  e8bd 83f0   ldmialt.w  sp!, {r4, r5, r6, r7, r8, r9, pc}
80000c8:  f8dd c01c   ldr.w  ip, [sp, #28]
80000cc:  ebc3 1503   rsb  r5, r3, r3, lsl #4
80000d0:  f240 0800   movw  r8, #0
80000d4:  f002 0103   and.w  r1, r2, #3
80000d8:  f1a2 0901   sub.w  r9, r2, #1
80000dc:  f2c2 0800   movt  r8, #8192  ; 0x2000
80000e0:  1a54        subs  r4, r2, r1
80000e2:  eb0c 1505   add.w  r5, ip, r5, lsl #4
80000e6:  eb08 0545   add.w  r5, r8, r5, lsl #1
80000ea:  1d2f        adds  r7, r5, #4
80000ec:  b1f2        cbz  r2, 800012c <_D5board3lcd8fillRectFiikktZv+0x74>
80000ee:  2500        movs  r5, #0
80000f0:  f1b9 0f03   cmp.w  r9, #3
80000f4:  d30a        bcc.n  800010c <_D5board3lcd8fillRectFiikktZv+0x54>
80000f6:  463e        mov  r6, r7
80000f8:  3504        adds  r5, #4
80000fa:  f826 0c02   strh.w  r0, [r6, #-2]
80000fe:  f826 0c04   strh.w  r0, [r6, #-4]
8000102:  8030        strh  r0, [r6, #0]
8000104:  8070        strh  r0, [r6, #2]
8000106:  3608        adds  r6, #8
8000108:  42ac        cmp  r4, r5
800010a:  d1f5        bne.n  80000f8 <_D5board3lcd8fillRectFiikktZv+0x40>
800010c:  b171        cbz  r1, 800012c <_D5board3lcd8fillRectFiikktZv+0x74>
800010e:  ebc3 1603   rsb  r6, r3, r3, lsl #4
8000112:  2901        cmp  r1, #1
8000114:  eb0c 1606   add.w  r6, ip, r6, lsl #4
8000118:  4435        add  r5, r6
800011a:  f828 0015   strh.w  r0, [r8, r5, lsl #1]
800011e:  d005        beq.n  800012c <_D5board3lcd8fillRectFiikktZv+0x74>
8000120:  eb08 0545   add.w  r5, r8, r5, lsl #1
8000124:  2902        cmp  r1, #2
8000126:  8068        strh  r0, [r5, #2]
8000128:  bf18        it  ne
800012a:  80a8        strhne  r0, [r5, #4]
800012c:  3301        adds  r3, #1
800012e:  f507 77f0   add.w  r7, r7, #480  ; 0x1e0
8000132:  4573        cmp  r3, lr
8000134:  ddda        ble.n  80000ec <_D5board3lcd8fillRectFiikktZv+0x34>
8000136:  e8bd 83f0   ldmia.w  sp!, {r4, r5, r6, r7, r8, r9, pc}

GDC disassembly
---
arm-none-eabi-gdc -c -O2 -nophoboslib -nostdinc -nodefaultlibs 
-nostdlib -mthumb -mcpu=cortex-m4 -mtune=cortex-m4 
-mfloat-abi=hard -Isource/runtime -fno-bounds-check 
-ffunction-sections -fdata-sections -fno-weak


<_D5board3lcd8fillRectFiikktZv>:
800049c:  b470        push  {r4, r5, r6}
800049e:  440b        add  r3, r1
80004a0:  4299        cmp  r1, r3
80004a2:  f8bd 500c   ldrh.w  r5, [sp, #12]
80004a6:  dc15        bgt.n  80004d4 <_D5board3lcd8fillRectFiikktZv+0x38>
80004a8:  ebc1 1401   rsb  r4, r1, r1, lsl #4
80004ac:  eb00 1004   add.w  r0, r0, r4, lsl #4
80004b0:  4c09        ldr  r4, [pc, #36]  ; (80004d8 <_D5board3lcd8fillRectFiikktZv+0x3c>)
80004b2:  4410        add  r0, r2
80004b4:  ebc2 76c2   rsb  r6, r2, r2, lsl #31
80004b8:  eb04 0440   add.w  r4, r4, r0, lsl #1
80004bc:  0076        lsls  r6, r6, #1
80004be:  b122        cbz  r2, 80004ca <_D5board3lcd8fillRectFiikktZv+0x2e>
80004c0:  19a0        adds  r0, r4, r6
80004c2:  f820 5b02   strh.w  r5, [r0], #2
80004c6:  42a0        cmp  r0, r4
80004c8:  d1fb        bne.n  80004c2 <_D5board3lcd8fillRectFiikktZv+0x26>
80004ca:  3101        adds  r1, #1
80004cc:  428b        cmp  r3, r1
80004ce:  f504 74f0   add.w  r4, r4, #480  ; 0x1e0
80004d2:  daf4        bge.n  

Re: Bug after local import statement?

2018-03-07 Thread Mike Franklin via D.gnu

On Wednesday, 7 March 2018 at 12:49:30 UTC, berni wrote:

The following code compiles with ldc/dmd but not with gdc:


cat test.d

void main()
{
int[] count;
{
import std.string;
++count[3];
}
}


gdc test.d

test.d:8:16: error: only one index allowed to index void
 ++count[3];
   ^


gdc --version

gdc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516



Is this a bug?


This seems to work fine in GDC 7.  See 
https://explore.dgnu.org/g/1pfiY2


Mike




Re: [Bug 126] Add support for attribute to mark data as volatile.

2014-06-24 Thread Mike Franklin via D.gnu

On Tuesday, 24 June 2014 at 14:14:18 UTC, Johannes Pfau wrote:

On Tue, 24 Jun 2014 10:46:11 +,
Timo Sintonen t.sinto...@luukku.com wrote:

To keep this thread going, I had a quick look at the reference 
material of the dip and picked some thoughts.


In some languages volatile has a stronger meaning, like 
guaranteeing an atomic access. In some languages it may not 
guarantee anything.


In this proposal volatile is only for optimization, not for 
protection. It does not add any code, it just prevents the 
optimizer removing some code.


Agreed.  In fact if this proposal reaches the main forum, we may 
hear proposals to introduce a don't optimize pragma or some 
other workaround.  I considered it myself :(




Walter has been against this and now also Martin. I think 
there is no use to bring this to the main forum. I understand 
the point that it is not very good to have something in the 
language specs that can not be guaranteed.


I think we should at least try to bring this to the main 
newsgroup, however I got distracted with other things and I 
want to extend the DIP a little (Only rationale stuff, no 
technical changes). The newsgroups have been quite busy lately 
and there've been discussions about two new DIPs. Neither 
Andrei nor Walter even posted a response in these threads so 
now is probably not a good time to start the discussion on 
this DIP.


Agreed, it would be a shame to not try. And looking for a lull in 
controversial topics on the main forum would probably be a good 
idea.  It would be bad for the passions of one discussion to 
spill over into this one.


Walter told me at DConf that he was in favor of compiler 
intrinsic peek/poke functions as proposed in DIP20 and he said, 
in the meantime, I could get by with Martin's 
volatileGet/volatileSet assembly workaround.


This tells me two things that are working against DIP62:
1) Walter believes volatile is a property of the load/store 
operation.

2) He doesn't consider volatile semantics a big priority.
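
To show what "volatile as a property of the load/store operation" looks like in code, here is a small sketch. It assumes a druntime that ships `core.volatile` with `volatileLoad`/`volatileStore`, which is newer than this discussion. The `Register` wrapper and its member names are invented for the example, and a plain variable stands in for a memory-mapped register.

```d
import core.volatile : volatileLoad, volatileStore;

// Hypothetical wrapper: every access it offers goes through an explicit
// volatile load or store, so the compiler may not drop or merge them.
struct Register
{
    uint* address; // assumed to point at a memory-mapped register

    uint read()
    {
        return volatileLoad(address);
    }

    void write(uint value)
    {
        volatileStore(address, value);
    }

    // Read-modify-write built from one volatile load and one volatile store.
    void setBits(uint mask)
    {
        write(read() | mask);
    }
}
```

Nothing here stops someone from writing `*reg.address = 1;` and getting an ordinary, optimizable store. The discipline lives in the call sites, not in the type.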

DIP20 has Walter's support, but has been collecting mold on the 
DIP list for a year and a half. Getting DIP62 approved will be a 
challenge, to put it mildly, but it will still need to be 
implemented and accepted into DMD.


I was in favor of Walter's suggestion (DIP20) until DIP62. What 
sold me on DIP62 was the fact that one would never want to access 
volatile memory with non-volatile semantics. This is an 
irrefutable truth, and, as far as I can tell, only a type 
qualifier can provide this enforcement.


DIP62 will also be a difficult sell given the simple assembly 
workaround.  I concede, the workaround will solve the problem and 
is a trivial implementation, but as Iain said, it is "...an 
excuse to *not* implement a feature that is rather essential for 
a systems language."


I define a systems programming language as one that is generally 
accepted and used to implement operating systems (kernels and 
hardware drivers), and I think this is the traditional definition 
before the language marketers redefined it.  D currently requires 
the help of C and/or other techniques to do this and lacks a 
runtime suitable for such development, so it is not a systems 
programming language IMO.  But, out of all the other languages I 
have encountered, D has the most potential of any to be the 
systems programming language of choice in the future, and DIP62 
is a step in that direction.  The work of all those currently 
participating on this thread also shows some encouraging momentum 
in that direction.


Unfortunately, there just don't seem to be very many people 
using D for systems programming, likely because of the reasons I 
gave above, and this poses another hurdle for DIP62:  volatile 
has very little use outside of systems programming.  DIP62 has my 
support, but I know that doesn't mean much.  It will likely need 
the support of those with some clout in the D community.


Mike