Re: Tricky DMD bug, but I have no idea how to report

2019-02-08 Thread H. S. Teoh via Digitalmars-d-learn
On Fri, Feb 08, 2019 at 11:36:03PM +, JN via Digitalmars-d-learn wrote:
> On Friday, 8 February 2019 at 23:30:44 UTC, H. S. Teoh wrote:
[...]
> > Pity we couldn't get rid of std.stdio.
[...]
> I can replace it with core.stdc.stdio if it's any better. Looks like
> any attempt to do a check for "x is null" hides the bug. I tried
> assert(), also tried if (x is null) throw new Exception(...)

Aha!  That's an important insight.  It's almost certain that it's caused
by a backend bug now.  So testing the value perturbs the codegen code
path enough to mask the bug / avoid the bug.  I think from this point
somebody who's familiar with the dmd backend ought to be able to track
it down reasonably easily.  (Unfortunately I'm completely unfamiliar
with that part of the dmd code.)


T

-- 
In order to understand recursion you must first understand recursion.


Re: Tricky DMD bug, but I have no idea how to report

2019-02-08 Thread JN via Digitalmars-d-learn

On Friday, 8 February 2019 at 23:30:44 UTC, H. S. Teoh wrote:
> On Fri, Feb 08, 2019 at 10:45:39PM +, JN via Digitalmars-d-learn wrote:
> [...]
> > Anyway, I reduced the code further manually. It's very hard to
> > reduce it any further. For example, removing the assignments
> > in fromEulerAngles static method hides the bug.  Likewise,
> > replacing writeln with assert makes it work properly too.
>
> Pity we couldn't get rid of std.stdio.  It's a pretty big piece
> of code, and there are plenty of places where it may go wrong
> inside, even though we generally expect that the bug lies
> elsewhere.  Oh well.  Hopefully somebody else can dig into this
> and figure out what's going on.
>
> Hmm. I just glanced over the std.stdio code... it appears that
> somebody has added @trusted all over the place, probably just
> to get it to compile with @safe.  That's kinda scary...
> somebody needs to vet this code carefully to make sure nothing
> fishy's going on in there!
>
>
> T


I can replace it with core.stdc.stdio if it's any better. Looks
like any attempt to do a check for "x is null" hides the bug. I
tried assert(), and also tried if (x is null) throw new Exception(...).


Re: Tricky DMD bug, but I have no idea how to report

2019-02-08 Thread H. S. Teoh via Digitalmars-d-learn
On Fri, Feb 08, 2019 at 10:45:39PM +, JN via Digitalmars-d-learn wrote:
[...]
> Anyway, I reduced the code further manually. It's very hard to reduce
> it any further. For example, removing the assignments in
> fromEulerAngles static method hides the bug.  Likewise, replacing
> writeln with assert makes it work properly too.

Pity we couldn't get rid of std.stdio.  It's a pretty big piece of code,
and there are plenty of places where it may go wrong inside, even though
we generally expect that the bug lies elsewhere.  Oh well.  Hopefully
somebody else can dig into this and figure out what's going on.

Hmm. I just glanced over the std.stdio code... it appears that somebody
has added @trusted all over the place, probably just to get it to
compile with @safe.  That's kinda scary... somebody needs to vet this
code carefully to make sure nothing fishy's going on in there!


T

-- 
For every argument for something, there is always an equal and opposite 
argument against it. Debates don't give answers, only wounded or inflated egos.


Re: Tricky DMD bug, but I have no idea how to report

2019-02-08 Thread JN via Digitalmars-d-learn

On Friday, 8 February 2019 at 22:11:31 UTC, H. S. Teoh wrote:
> Pity I still can't reproduce the problem locally. Otherwise I
> would reduce it even more -- e.g., eliminate std.stdio
> dependency and have the program fail on assert(obj !is null),
> and a bunch of other things to make it easier for compiler devs
> to analyze -- and perhaps look at the generated assembly to see
> what went wrong.  If you have the time (and patience) to do
> that, it would greatly increase the chances of this being fixed
> in a timely way, since it would narrow down the bug even more
> so that it's easier to find in the dmd source code.
>
>
> T


It seems to be a Windows 64-bit only thing. Anyway, I reduced the 
code further manually. It's very hard to reduce it any further. 
For example, removing the assignments in fromEulerAngles static 
method hides the bug. Likewise, replacing writeln with assert 
makes it work properly too.


Re: Tricky DMD bug, but I have no idea how to report

2019-02-08 Thread H. S. Teoh via Digitalmars-d-learn
On Fri, Feb 08, 2019 at 09:42:11PM +, JN via Digitalmars-d-learn wrote:
> On Friday, 8 February 2019 at 21:35:34 UTC, H. S. Teoh wrote:
> > On Fri, Feb 08, 2019 at 09:23:40PM +, JN via Digitalmars-d-learn
> > wrote: [...]
> > > I managed to greatly reduce the source code. I have filed a bug
> > > with the reduced testcase
> > > https://issues.dlang.org/show_bug.cgi?id=19662 .
> > 
> > Haha, you were right!  It's a compiler bug, another one of those
> > nasty -O -inline bugs.  Probably a backend codegen bug.  Ran into
> > one of those before; was pretty nasty.  Fortunately it got fixed
> > soon(ish) after I made noise about it in the forum. :-P
[...]
> Luckily it's not a blocker for me, because it doesn't trigger on debug
> builds, and for release builds I can always use LDC, but still it's
> bugging me (pun intended).

Pity I still can't reproduce the problem locally. Otherwise I would
reduce it even more -- e.g., eliminate std.stdio dependency and have the
program fail on assert(obj !is null), and a bunch of other things to make
it easier for compiler devs to analyze -- and perhaps look at the
generated assembly to see what went wrong.  If you have the time (and
patience) to do that, it would greatly increase the chances of this
being fixed in a timely way, since it would narrow down the bug even
more so that it's easier to find in the dmd source code.
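A minimal sketch of that kind of reduction, assuming a stand-in `Texture2D` class and an associative array like the ones in the original report (the helper `makeBindings` is hypothetical). It prints through `core.stdc.stdio.printf` instead of `writeln`, and checks with `!is` rather than `!=`, since DMD rejects `!=` comparisons against `null` for class references. Whether this particular shape still triggers the bug is untested.

```d
import core.stdc.stdio : printf;

class Texture2D {}

// Hypothetical helper: bind three fresh textures to slots 0..2.
Texture2D[int] makeBindings()
{
    auto a = new Texture2D();
    auto b = new Texture2D();
    auto c = new Texture2D();
    Texture2D[int] textureBindings;
    textureBindings[0] = a;
    textureBindings[1] = b;
    textureBindings[2] = c;
    return textureBindings;
}

void main()
{
    auto bindings = makeBindings();
    foreach (key, tex; bindings)
    {
        // Print the raw reference; a miscompiled build would show a null pointer here.
        printf("%d: %p\n", key, cast(void*) tex);
    }
    // JN reports that adding a null check like this can itself mask the bug.
    assert(bindings[0] !is null);
}
```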


T

-- 
I see that you JS got Bach.


Re: Tricky DMD bug, but I have no idea how to report

2019-02-08 Thread JN via Digitalmars-d-learn

On Friday, 8 February 2019 at 21:35:34 UTC, H. S. Teoh wrote:
> On Fri, Feb 08, 2019 at 09:23:40PM +, JN via Digitalmars-d-learn wrote:
> [...]
> > I managed to greatly reduce the source code. I have filed a
> > bug with the reduced testcase
> > https://issues.dlang.org/show_bug.cgi?id=19662 .
>
> Haha, you were right!  It's a compiler bug, another one of
> those nasty -O -inline bugs.  Probably a backend codegen bug.
> Ran into one of those before; was pretty nasty.  Fortunately it
> got fixed soon(ish) after I made noise about it in the forum.
> :-P
>
>
> T


Luckily it's not a blocker for me, because it doesn't trigger on 
debug builds, and for release builds I can always use LDC, but 
still it's bugging me (pun intended).


Re: Tricky DMD bug, but I have no idea how to report

2019-02-08 Thread H. S. Teoh via Digitalmars-d-learn
On Fri, Feb 08, 2019 at 09:23:40PM +, JN via Digitalmars-d-learn wrote:
[...]
> I managed to greatly reduce the source code. I have filed a bug with
> the reduced testcase https://issues.dlang.org/show_bug.cgi?id=19662 .

Haha, you were right!  It's a compiler bug, another one of those nasty
-O -inline bugs.  Probably a backend codegen bug.  Ran into one of those
before; was pretty nasty.  Fortunately it got fixed soon(ish) after I
made noise about it in the forum. :-P


T

-- 
Don't drink and derive. Alcohol and algebra don't mix.


Re: Tricky DMD bug, but I have no idea how to report

2019-02-08 Thread JN via Digitalmars-d-learn
On Friday, 8 February 2019 at 09:30:12 UTC, Vladimir Panteleev wrote:
> On Friday, 8 February 2019 at 09:28:48 UTC, JN wrote:
> > I will try. However, one last thing - in the example test
> > scripts, it runs first with one compiler setting (or D
> > version) and the second time with the other compiler setting
> > (or D version). But it looks like the exit code of the first
> > run is ignored anyway, so why run it?
>
> With "set -e", the shell interpreter will exit the script with
> any command that fails (returns with non-zero status), unless
> it's in an "if" condition or such. I'll update the article to
> clarify it.


I see. DustMite helped. I had to convert it to a Windows batch file,
so my test script ended up being:


dmd -O -inline -release -boundscheck=on -i app.d -m64
@IF %ERRORLEVEL% EQU 0 (ECHO No error found) ELSE (EXIT /B 1)
@app | FINDSTR /C:"Object"
@IF %ERRORLEVEL% EQU 0 (ECHO No error found) ELSE (EXIT /B 1)
dmd -O -inline -release -boundscheck=off -i app.d -m64
@IF %ERRORLEVEL% EQU 0 (ECHO No error found) ELSE (EXIT /B 1)
@app | FINDSTR /C:"null"
@IF %ERRORLEVEL% EQU 0 (EXIT /B 0) ELSE (EXIT /B 1)

I managed to greatly reduce the source code. I have filed a bug 
with the reduced testcase 
https://issues.dlang.org/show_bug.cgi?id=19662 .


Re: Tricky DMD bug, but I have no idea how to report

2019-02-08 Thread Vladimir Panteleev via Digitalmars-d-learn

On Friday, 8 February 2019 at 09:28:48 UTC, JN wrote:
> I will try. However, one last thing - in the example test
> scripts, it runs first with one compiler setting (or D version)
> and the second time with the other compiler setting (or D
> version). But it looks like the exit code of the first run is
> ignored anyway, so why run it?


With "set -e", the shell interpreter will exit the script with 
any command that fails (returns with non-zero status), unless 
it's in an "if" condition or such. I'll update the article to 
clarify it.




Re: Tricky DMD bug, but I have no idea how to report

2019-02-08 Thread JN via Digitalmars-d-learn
On Friday, 8 February 2019 at 07:30:41 UTC, Vladimir Panteleev wrote:
> On Thursday, 7 February 2019 at 22:16:19 UTC, JN wrote:
> > Does it also work for dub projects?
>
> It will work if you can put all the relevant D code in one
> directory, which is harder for Dub, as it likes to pull
> dependencies from all over the place. When "dub dustmite" is
> insufficient (as in this case), the safest way to proceed would
> be to build with dub in verbose mode, take note of the compiler
> command lines it's using, then put them in a shell script and
> all mentioned D files in one directory, then pass that to
> Dustmite.


I will try. However, one last thing - in the example test 
scripts, it runs first with one compiler setting (or D version) 
and the second time with the other compiler setting (or D 
version). But it looks like the exit code of the first run is 
ignored anyway, so why run it?


Re: Tricky DMD bug, but I have no idea how to report

2019-02-07 Thread Vladimir Panteleev via Digitalmars-d-learn

On Thursday, 7 February 2019 at 22:16:19 UTC, JN wrote:
> Does it also work for dub projects?


It will work if you can put all the relevant D code in one 
directory, which is harder for Dub, as it likes to pull 
dependencies from all over the place. When "dub dustmite" is 
insufficient (as in this case), the safest way to proceed would 
be to build with dub in verbose mode, take note of the compiler 
command lines it's using, then put them in a shell script and all 
mentioned D files in one directory, then pass that to Dustmite.




Re: Tricky DMD bug, but I have no idea how to report

2019-02-07 Thread H. S. Teoh via Digitalmars-d-learn
On Thu, Feb 07, 2019 at 10:16:19PM +, JN via Digitalmars-d-learn wrote:
[...]
> Anyway, I managed to reduce the source code greatly manually:
> 
> https://github.com/helikopterodaktyl/repro_d_release/
> 
> unfortunately I can't get rid of the dlib dependency. When built with
> debug, test outputs [0: Object], with release it outputs [0: null].
> 
> commenting this line out:
> f.rotation = Quaternionf.fromEulerAngles(Vector3f(0.0f, 0.0f, 0.0f));
> or changing it to:
> f.rotation = Quaternionf.identity();
> 
> is enough to make release output [0: Object] as well. I guess dlib is
> doing something dodgy with memory layout, but I can't see anything
> suspicious :(

Hmm. I can't seem to reproduce this in my environment (Linux/x86_64).
Tried it with various combinations of `dub -b release|debug|etc.`,
manually compiling with `dmd -I~/.dub/packages/dlib-0.15.0/dlib` with
various combinations of -release, -debug, etc.

I wonder if you somehow have an ABI mismatch caused by stale cached
objects in dub?  Perhaps try `dub --force` to force a rebuild of
everything?  Or, if you're daring, delete the entire dub cache and
rebuild, just to be sure there are no stray stale files lying around
somewhere.


Barring that, one way to narrow this down further is to copy the
relevant dlib sources into your own source tree, remove the dub
dependency, and then reduce the dlib sources as well.  I did a quick and
crude test, and discovered that you only need the following files:

dlib/math/matrix.d
dlib/math/linsolve.d
dlib/math/quaternion.d
dlib/math/decomposition.d
dlib/math/package.d
dlib/math/vector.d
dlib/math/utils.d
dlib/core/package.d
dlib/core/tuple.d

Replace dlib/core/package.d with an empty file, and edit
dlib/math/package.d to import only dlib.math.quaternion and
dlib.math.vector.

Since you're only using a very small number of functions, you can
probably quickly eliminate most of the above files too. Just edit the
files directly (since they're your own copy) and delete everything that
isn't directly needed by your code.  Of course, at the same time also check
that deleting doesn't change the bug behaviour. If it does, then
whatever you just deleted may possibly be (part of) the cause of the
problem.

Sorry I can't help you with reproducing the problem, as the bug doesn't
seem to show up in my environment.  (I suspect it's still there, just
that subtle differences in my environment may be masking it somehow.)


T

-- 
Political correctness: socially-sanctioned hypocrisy.


Re: Tricky DMD bug, but I have no idea how to report

2019-02-07 Thread JN via Digitalmars-d-learn
On Thursday, 7 February 2019 at 03:50:32 UTC, Vladimir Panteleev wrote:
> On Monday, 17 December 2018 at 21:59:59 UTC, JN wrote:
> > while working on my game engine project, I encountered a DMD
> > codegen bug. It occurs only when compiling in release mode,
> > debug works.
>
> Old thread, but FWIW, such bugs can be easily and precisely
> reduced with DustMite. In your test script, just compile with
> and without the compiler option which causes the bug to
> manifest, and check that one works and the other doesn't.
>
> I put together a short article on the DustMite wiki describing
> how to do this:
>
> https://github.com/CyberShadow/DustMite/wiki/Reducing-a-bug-with-a-specific-compiler-option


Does it also work for dub projects?

Anyway, I managed to reduce the source code greatly manually:

https://github.com/helikopterodaktyl/repro_d_release/

Unfortunately, I can't get rid of the dlib dependency. When built
with debug, test outputs [0: Object]; with release it outputs
[0: null].


Commenting this line out:
f.rotation = Quaternionf.fromEulerAngles(Vector3f(0.0f, 0.0f, 0.0f));

or changing it to:
f.rotation = Quaternionf.identity();

is enough to make release output [0: Object] as well. I guess
dlib is doing something dodgy with memory layout, but I can't see
anything suspicious :(


Re: Tricky DMD bug, but I have no idea how to report

2019-02-06 Thread Vladimir Panteleev via Digitalmars-d-learn

On Monday, 17 December 2018 at 21:59:59 UTC, JN wrote:
> while working on my game engine project, I encountered a DMD
> codegen bug. It occurs only when compiling in release mode,
> debug works.


Old thread, but FWIW, such bugs can be easily and precisely 
reduced with DustMite. In your test script, just compile with and 
without the compiler option which causes the bug to manifest, and 
check that one works and the other doesn't.


I put together a short article on the DustMite wiki describing 
how to do this:

https://github.com/CyberShadow/DustMite/wiki/Reducing-a-bug-with-a-specific-compiler-option



Re: Tricky DMD bug, but I have no idea how to report

2019-02-06 Thread H. S. Teoh via Digitalmars-d-learn
On Wed, Feb 06, 2019 at 10:37:27PM +, JN via Digitalmars-d-learn wrote:
[...]
> I am not sure if it's a pointer bug. What worries me is that it breaks
> at the start of the program, but uncommenting code at the end of the
> program influences it. Unless there's some crazy reordering going on,
> this shouldn't normally have an effect.

As I've said before, this kind of "spooky" action-at-a-distance symptom
is exactly the kind of behaviour you'd expect from a pointer bug.  Of
course, it doesn't mean that it *must* be a pointer bug, but it does
look awfully similar to one.


> I still believe the bug is on the compiler side, but it's a bit of
> code in my case, and if I try to minimize the case, the issue
> disappears. Oh well.

That's another typical symptom of a pointer bug.  It seems less likely
to be a codegen bug, because I'd expect a codegen bug to exhibit more
consistent symptoms: if a particular piece of code is triggering a compiler
codegen bug, then it shouldn't matter what other code is being compiled,
the bug should show up in all cases.  This kind of sensitivity to
minute, unrelated changes is closer to how pointer bugs tend to behave.

Of course, it's possible that there's a pointer bug in the *compiler*,
so there's that.  It's hard to tell either way at this point.  Given
how heavily the compiler is used by so many people every day, that's
also less likely, though not impossible, unless your code just
happens to contain a particularly rare combination of language features
that causes the compiler to go down a rarely-tested code path that
contains the bug.

Anyway, given what you said about how moving (or minimizing)
seemingly-unrelated code around seems to affect the symptoms, we could
do a little educated guesswork to try to narrow it down a little more.
You said commenting out code at the end of the program affects whether
it crashes at the beginning.  Is this in the same function (presumably
main()), or is it in different functions?

If it's in the same function, one possibility is that you have some
local variables that are being overrun by a buffer overflow or some bad
pointer.  Commenting out code at the end of the function changes the
layout of variables on the stack, so it would change what gets
overwritten.  Possibly, the bug gets hidden by the bad pointer being
redirected to some innocuous variable whose value is no longer used, or
some such, so the presence of the bug is masked.

If the commented-out code is in a different function from the location
of the crash, and you're sure that the commented out code is not being
run before the crash, then it would appear to be something related to
the layout of global variables.  Perhaps there's some module static ctor
that's being triggered / not triggered, that changes the global state in
some way that affects the code at the beginning of the program?  If
there's a bad pointer that points to some heap location, the action of
module ctors running vs. not running could alter the heap state enough
to mask the bug in some cases.

Another possibility is if you're interfacing with C code and have a
non-null-terminated D string that's being cast to char*, and the presence of
more code in the executable may perturb the data/code segment layout
just enough to push the string somewhere that happens to contain a null
shortly afterwards.
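To illustrate that last case, a minimal sketch: a D string slice carries its own length and is not guaranteed to be followed by a `'\0'` in memory, so handing `s.ptr` to a C function can read past the end. `std.string.toStringz` returns a pointer to a null-terminated string, copying the data if necessary. The helper name `toC` is hypothetical.

```d
import core.stdc.stdio : printf;
import core.stdc.string : strlen;
import std.string : toStringz;

// Hypothetical helper: convert a D string into something safe to hand to C.
immutable(char)* toC(string s)
{
    // Unsafe alternative: s.ptr, which is only correct if a '\0'
    // happens to follow the last character in memory.
    return toStringz(s);
}

void main()
{
    string full = "hello world";
    string word = full[0 .. 5]; // "hello"; the next byte in memory is ' ', not '\0'
    auto c = toC(word);
    assert(strlen(c) == 5);
    printf("%s\n", c);
}
```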

Just some guesses based on my experience with pointer bugs.


T

-- 
Written on the window of a clothing store: No shirt, no shoes, no service.


Re: Tricky DMD bug, but I have no idea how to report

2019-02-06 Thread JN via Digitalmars-d-learn

On Wednesday, 6 February 2019 at 22:22:26 UTC, H. S. Teoh wrote:
> Of course, I've no clue whether this is the cause of your
> problems -- it's just one of many possibilities.  Pointer bugs
> are nasty things to debug, regardless of whether or not they've
> been abstracted away in nicer clothing.  I still remember
> pointer bugs that took literally months just to get a clue on,
> because it was nigh impossible to track down where they
> happened -- the symptoms are too far removed from the cause.
> You pretty much have to take a wild guess and get lucky.
>
> They are just as bad as race condition bugs. (Once, a race
> condition bug took me almost half a year to fix, because it
> only showed up in the customer's live environment and we could
> never reproduce it locally. We knew there was a race somewhere,
> but it was impossible to locate it. Eventually, by pure
> accident, an unrelated code change subtly altered the timings
> of certain things that made the bug more likely to manifest
> under certain conditions -- and only then were we finally able
> to reliably reproduce the problem and track down its root
> cause.)
>
>
> T


I am not sure if it's a pointer bug. What worries me is that it 
breaks at the start of the program, but uncommenting code at the 
end of the program influences it. Unless there's some crazy 
reordering going on, this shouldn't normally have an effect. I 
still believe the bug is on the compiler side, but it's a bit of 
code in my case, and if I try to minimize the case, the issue 
disappears. Oh well.


Re: Tricky DMD bug, but I have no idea how to report

2019-02-06 Thread H. S. Teoh via Digitalmars-d-learn
On Wed, Feb 06, 2019 at 09:50:44PM +, JN via Digitalmars-d-learn wrote:
> On Tuesday, 18 December 2018 at 22:56:19 UTC, H. S. Teoh wrote:
> > Since no explicit slicing was done, there was no compiler error /
> > warning of any sort, and it wasn't obvious from the code what had
> > happened. By the time doSomething() was called, it was already long
> > past the source of the problem in buggyCode(), and it was almost
> > impossible to trace the problem back to its source.
> > 
> > Theoretically, -dip25 and -dip1000 are supposed to prevent this sort
> > of problem, but I don't know how fully-implemented they are, whether
> > they would catch the specific instance in your code, or whether your
> > code even compiles with these options.
[...]
> No luck. Actually, I avoid in my code pointers in general, I write my
> code very "Java-like" with objects everywhere etc.
[...]

The nasty thing about the implicit static array -> slice conversion is
that your code can have no bare pointers in sight, yet you still end up
with an invalid reference to an out-of-scope local variable.

Some of us have argued that this conversion ought to be prohibited.
But we haven't actually tried going in that direction yet, because it
*will* break existing code (though IMO such code is suspect to begin
with, and besides, all you have to do is to explicitly slice the static
array to get around the newly-introduced compile error).
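A small sketch of why the implicit conversion is easy to miss: the slice it produces is a view into the static array's own storage (here, a local in `main`), not a copy. Writing `arr[]` makes the conversion visible, and `.dup` makes an independent GC-heap copy; the variable names are made up for illustration.

```d
void main()
{
    int[4] arr = [1, 2, 3, 4];

    int[] implicitView = arr;   // implicit static-array-to-slice conversion
    int[] explicitView = arr[]; // same result, but visible in the source
    int[] copy = arr.dup;       // independent copy on the GC heap

    // Both views point into arr's own storage (main's stack frame).
    assert(implicitView.ptr == arr.ptr);
    assert(explicitView.ptr == arr.ptr);
    // The copy does not.
    assert(copy.ptr != arr.ptr);

    implicitView[0] = 99;       // writes through to arr
    assert(arr[0] == 99);
    assert(copy[0] == 1);       // the copy keeps the original value
}
```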

Of course, I've no clue whether this is the cause of your problems --
it's just one of many possibilities.  Pointer bugs are nasty things to
debug, regardless of whether or not they've been abstracted away in
nicer clothing.  I still remember pointer bugs that took literally
months just to get a clue on, because it was nigh impossible to track
down where they happened -- the symptoms are too far removed from the
cause.  You pretty much have to take a wild guess and get lucky.

They are just as bad as race condition bugs. (Once, a race condition bug
took me almost half a year to fix, because it only showed up in the
customer's live environment and we could never reproduce it locally. We
knew there was a race somewhere, but it was impossible to locate it.
Eventually, by pure accident, an unrelated code change subtly altered
the timings of certain things that made the bug more likely to manifest
under certain conditions -- and only then were we finally able to
reliably reproduce the problem and track down its root cause.)


T

-- 
"I suspect the best way to deal with procrastination is to put off the
procrastination itself until later. I've been meaning to try this, but
haven't gotten around to it yet. " -- swr


Re: Tricky DMD bug, but I have no idea how to report

2019-02-06 Thread JN via Digitalmars-d-learn

On Tuesday, 18 December 2018 at 22:56:19 UTC, H. S. Teoh wrote:
> Since no explicit slicing was done, there was no compiler error
> / warning of any sort, and it wasn't obvious from the code what
> had happened. By the time doSomething() was called, it was
> already long past the source of the problem in buggyCode(), and
> it was almost impossible to trace the problem back to its
> source.
>
> Theoretically, -dip25 and -dip1000 are supposed to prevent this
> sort of problem, but I don't know how fully-implemented they
> are, whether they would catch the specific instance in your
> code, or whether your code even compiles with these options.
>
>
> T


No luck. Actually, I avoid pointers in my code in general; I
write my code very "Java-like", with objects everywhere etc. I
actually gave up on the issue; perhaps I am encountering this bug
https://issues.dlang.org/show_bug.cgi?id=16511 in my own code.
Anyway, 32-bit and 64-bit debug builds work, and so does LDC.
That's good enough for me.


Re: Tricky DMD bug, but I have no idea how to report

2018-12-18 Thread H. S. Teoh via Digitalmars-d-learn
On Tue, Dec 18, 2018 at 10:29:07PM +, JN via Digitalmars-d-learn wrote:
> On Monday, 17 December 2018 at 22:22:05 UTC, H. S. Teoh wrote:
> > A less likely possibility might be an optimizer bug -- do you get
> > different results if you add / remove '-O' (and/or '-inline') from
> > your dmd command-line?  If some combination of -O and -inline (or
> > their removal thereof) "fixes" the problem, it could be an optimizer
> > bug. But those are rare, and usually only show up when you use an
> > obscure D feature combined with another obscure corner case, in a
> > way that people haven't thought of.  My bet is still on a pointer
> > bug somewhere in your code.
> > 
> 
> I played around with dmd commandline. It works with -O. Works with -O
> -inline. As soon as I add -boundscheck=off it breaks.
> 
> As I understand it, out of bounds access is UB. Which would fit my
> problems because they look like UB. But if I run without
> boundscheck=off, shouldn't I get a RangeError somewhere?

In theory, yes.  But I wonder if there's some corner case where some
combination of -O or -inline may cause a bounds check to be elided, but
still hit UB. Perhaps the optimizer skipped a bounds check even though
it shouldn't have.  What about compiling with -boundscheck=off but
without -O -inline?  Does that make a difference?
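For reference, a minimal sketch of what a bounds-checked out-of-range read looks like; `readAt` and `indexThrows` are hypothetical helpers. With the default `-boundscheck=on` the read throws `core.exception.RangeError` (newer compilers may throw a more specific subclass of it). `-release` on its own only keeps bounds checks in `@safe` code, since it implies `-boundscheck=safeonly`, and `-boundscheck=off` removes them everywhere, so in those builds the same read is undefined behaviour rather than an error. Note that `-release` also disables `assert`, so the checks in `main` are only meaningful in a non-release build.

```d
import core.exception : RangeError;

// Reads arr[i]. This function is @system, so the index is only checked
// when bounds checks are enabled for all code (e.g. -boundscheck=on).
int readAt(int[] arr, size_t i)
{
    return arr[i];
}

// Returns true if readAt raised a RangeError. Catching an Error is done
// here purely to make the check observable; it is not a recommended pattern.
bool indexThrows(int[] arr, size_t i)
{
    try
    {
        readAt(arr, i);
        return false;
    }
    catch (RangeError)
    {
        return true;
    }
}

void main()
{
    int[] arr = [1, 2, 3];
    assert(!indexThrows(arr, 2));
    // Holds with the default -boundscheck=on. With -release alone or
    // -boundscheck=off, readAt is unchecked and this read is UB instead.
    assert(indexThrows(arr, 3));
}
```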

Barring that, it might be one of those really evil pointer bugs where
the problem has already happened far away from the site where the
symptoms first appear, usually an undetected memory corruption that only
shows up as invalid data long after the actual corruption happened. Very
hard to trace.

Are you sure you didn't accidentally do something like escape a pointer
to a local variable, or a slice of a local static array that has since
gone out of scope?  Because that's what your symptoms most closely
resemble.  The last time I ran into this in my own D code, it was caused
by D's really evil implicit conversion of static arrays to slices, where
passing a local static array implicitly passes a slice instead, e.g.:

    SomeObject persistentStorage;

    auto someFunc(int[] data)
    {
        ... // stuff
        persistentStorage.insert(data); // retains reference to data
        ...
    }

    void buggyCode()
    {
        int[16] arr = ...;
        ...
        someFunc(arr);  // <--- implicit conversion happens here
        ...
        // uh oh, arr is going out of scope, but
        // persistentStorage holds a reference to it
    }

    void main()
    {
        ...
        buggyCode(); // escaped reference to local variable
        ...

        // Crash when it tries to access the slice to
        // out-of-scope data:
        doSomething(persistentStorage);
        ...
    }

Since no explicit slicing was done, there was no compiler error /
warning of any sort, and it wasn't obvious from the code what had
happened. By the time doSomething() was called, it was already long past
the source of the problem in buggyCode(), and it was almost impossible
to trace the problem back to its source.

Theoretically, -dip25 and -dip1000 are supposed to prevent this sort of
problem, but I don't know how fully-implemented they are, whether they
would catch the specific instance in your code, or whether your code
even compiles with these options.
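One way to avoid the escape in the example above, sketched under the assumption that the retained data is a plain `int[]`: have the callee keep a `.dup` of whatever it intends to store, so the retained slice lives on the GC heap instead of in the caller's stack frame. `storage` and `fixedCode` are hypothetical stand-ins for `persistentStorage` and `buggyCode()`.

```d
// Hypothetical stand-in for persistentStorage: retained int slices.
int[][] storage;

// Keeps a GC-heap copy of data, never the caller's own memory.
void someFunc(const(int)[] data)
{
    storage ~= data.dup;
}

// Hypothetical fixed counterpart of buggyCode().
void fixedCode()
{
    int[16] arr = 7;  // all 16 elements set to 7
    someFunc(arr[]);  // explicit slice: the conversion is visible here
}                     // arr goes out of scope; storage holds its own copy

void main()
{
    fixedCode();
    assert(storage.length == 1);
    assert(storage[0].length == 16);
    assert(storage[0][15] == 7);
}
```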


T

-- 
There's light at the end of the tunnel. It's the oncoming train.


Re: Tricky DMD bug, but I have no idea how to report

2018-12-18 Thread JN via Digitalmars-d-learn

On Monday, 17 December 2018 at 22:22:05 UTC, H. S. Teoh wrote:
> A less likely possibility might be an optimizer bug -- do you
> get different results if you add / remove '-O' (and/or
> '-inline') from your dmd command-line?  If some combination of
> -O and -inline (or their removal thereof) "fixes" the problem,
> it could be an optimizer bug. But those are rare, and usually
> only show up when you use an obscure D feature combined with
> another obscure corner case, in a way that people haven't
> thought of.  My bet is still on a pointer bug somewhere in your
> code.




I played around with dmd commandline. It works with -O. Works 
with -O -inline. As soon as I add -boundscheck=off it breaks.


As I understand it, out-of-bounds access is UB, which would fit
my problems, because they look like UB. But if I run without
-boundscheck=off, shouldn't I get a RangeError somewhere?


Re: Tricky DMD bug, but I have no idea how to report

2018-12-17 Thread Aliak via Digitalmars-d-learn

On Monday, 17 December 2018 at 21:59:59 UTC, JN wrote:
> Hey guys,
>
> while working on my game engine project, I encountered a DMD
> codegen bug. It occurs only when compiling in release mode,
> debug works. Unfortunately I am unable to minimize the code,
> since it's quite a bit of code, and changing the code changes
> the bug occurrence. Basically my faulty piece of code looks
> like this
>
> [...]


I remember a couple of months ago someone complaining about 
similar issues when switching to a newer dmd. I tried looking for 
the thread but can’t find it. Think it was on the general list.


Have you tried previous compiler versions yet?


Re: Tricky DMD bug, but I have no idea how to report

2018-12-17 Thread H. S. Teoh via Digitalmars-d-learn
On Mon, Dec 17, 2018 at 09:59:59PM +, JN via Digitalmars-d-learn wrote:
[...]
> class Texture2D {}
> 
> auto a = new Texture2D();
> auto b = new Texture2D();
> auto c = new Texture2D();
> Texture2D[int] textureBindings;
> writeln(a, b, c);
> textureBindings[0] = a;
> textureBindings[1] = b;
> textureBindings[2] = c;
> writeln(textureBindings);
> 
> and the output is:
> 
> Texture2DTexture2DTexture2D
> [0:null, 2:null, 1:null]
> 
> I'd expect it to output:
> 
> Texture2DTexture2DTexture2D
> [0:Texture2D, 2:Texture2D, 1:Texture2D]
> 
> depending on what I change around this code, for example changing it to
> 
> writeln(a, " ", b, " ", c);
> 
> results in output of:
> 
> Texture2D Texture2D Texture2D
> [0:Texture2D, 2:null, 1:null]

Ah, a pointer bug.  Lovely. :-/

My first guess is that you have a bunch of references to local variables
that have gone out of scope.


> It feels completely random. Removing, adding calls completely
> unrelated to these changes the result.

Typical symptoms of a pointer bug of some kind.  Could be an
uninitialized pointer, if you have used `T* p = void;` anywhere.


> My guess is that the compiler somehow reorders the calls incorrectly,
> changing the semantics.

Possible, but unlikely.  My bet is that you have dangling pointers, most
likely to local variables that have gone out of scope.  Perhaps
somewhere in the code you ran into the evil implicit conversion of
static arrays into slices, which results in dangling pointers if said
slice persists beyond the lifetime of the static array.

Another likely candidate is that if you're calling C/C++ libraries
somewhere in your code, you may have passed in a wrong size, perhaps a
byte count where an array length ought to be used, or vice versa, and as
a result you got a buffer overrun.  I ran into similar bugs when writing
OpenGL code.
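A minimal sketch of the byte-count vs. element-count mix-up, using `memcpy` from `core.stdc.string` as a stand-in for a C API (OpenGL calls such as `glBufferData` likewise take a size in bytes). The helper `copyFloats` is hypothetical. It also shows a related D pitfall: `.sizeof` on a slice gives the size of the slice itself, not of the data it refers to.

```d
import core.stdc.string : memcpy;

// Hypothetical helper: copies src into dst through a C API that takes BYTES.
void copyFloats(float[] dst, const(float)[] src)
{
    assert(dst.length >= src.length);
    // Element count times element size. Passing src.length alone would
    // copy only a quarter of the data (float.sizeof == 4); passing more
    // bytes than dst holds would overrun dst.
    memcpy(dst.ptr, src.ptr, src.length * float.sizeof);
}

void main()
{
    float[4] src = [1.0f, 2.0f, 3.0f, 4.0f];
    float[4] dst = 0;
    copyFloats(dst[], src[]);
    assert(dst == src);

    // Pitfall: .sizeof on a slice is the size of its (length, pointer)
    // pair; on a static array it is the size of the data.
    const(float)[] view = src[];
    assert(view.sizeof == 2 * size_t.sizeof);
    assert(src.sizeof == 4 * float.sizeof);
}
```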


> Trick is, LDC works correctly and produces the expected result, both
> when compiling in debug and release mode.
[...]

I bet the bug is still there, just latent because of the slightly
different memory layout when compiling with LDC.  You probably want to
be absolutely sure it's a compiler bug before moving on, as it could
very well be a bug in your code.

A less likely possibility might be an optimizer bug -- do you get
different results if you add / remove '-O' (and/or '-inline') from your
dmd command-line?  If some combination of -O and -inline (or their
removal) "fixes" the problem, it could be an optimizer bug. But
those are rare, and usually only show up when you use an obscure D
feature combined with another obscure corner case, in a way that people
haven't thought of.  My bet is still on a pointer bug somewhere in your
code.


T

-- 
If the comments and the code disagree, it's likely that *both* are wrong. -- 
Christopher