Hello The attached patch (my apologies for gzipping a <1kB file, but mailman keeps invalidating my signatures if I attach diffs) fixes a small but rather annoying segfault in droidSetBits(). If you don't care about the details, you can skip the rest of the mail and apply it, but since the effects were rather interesting, I recommend you start reading _without looking at the patch_.
The first symptom I got was a segfault at the very end of the loading
screen (i.e. just before the 3d display would appear) in the following
loop, at the marked line:
for (inc = 0;inc < psDroid->numWeaps;inc++)
{
psDroid->turretRotation[inc] = 0; /* <-- droid.c:4610 */
psDroid->turretPitch[inc] = 0;
}
At first, the problem seemed obvious: gdb told me that 'inc' was on
the order of 1'000'000, far larger than it should be. But printing
psDroid->numWeaps reveals that it is 0! Furthermore, this happens
_only if compiling without debugging_. Inserting a debugging print
like so:
for (inc = 0;inc < psDroid->numWeaps;inc++)
{
fprintf(stderr, "%d", inc);
psDroid->turretRotation[inc] = 0;
psDroid->turretPitch[inc] = 0;
}
(or any variation on the theme) also lets it run normally. Even more
obscurely, the output will be
0
1
2
3
4
5
6
7
which is still quite invalid, as it's not supposed to run past 2 (at
most).
[If you can, without reading any further, come up with an explanation
for this seemingly totally illogical and erratic behaviour, please
tell me your mail address as you would have saved me quite some grief
;-)]
After quite some time of debugging, experimenting and staring at the
code (and playing rev 452, which I consider stable ;-) ) it turned out
that the loop termination condition was not initialized at this point,
thus explaining how 'inc' could get so large and trigger a segfault:
psDroid->numWeaps = pTemplate->numWeaps; /* moved before the loop */
for (inc = 0;inc < psDroid->numWeaps;inc++)
{
psDroid->turretRotation[inc] = 0;
psDroid->turretPitch[inc] = 0;
}
That's the change the patch does, and it fixes the bug. However, I
couldn't stop at this point; I had to find out why its behaviour was
so erratic. The crucial hint is in the following print, and its
output:
psDroid->numWeaps = pTemplate->numWeaps;
for (inc = 0;inc < psDroid->numWeaps;inc++)
{
fprintf(stderr, "%x, %u", psDroid->numWeaps, inc);
psDroid->turretRotation[inc] = 0;
psDroid->turretPitch[inc] = 0;
}
This results in
51f95e69, 0
51f95e69, 1
51f95e69, 2
51f95e69, 3
51f95e69, 4
51f95e69, 5
51f95e69, 6
51f90000, 7
^^^^
Looking hard at the zeroes, I realised that in a _debugging_ compile,
since 'turretRotation' and 'turretPitch' is before 'numWeaps' in the
DROID structure, as 'inc' gets larger, the loop will reset 'numWeaps'
to 0 after a few iterations and then stop. However, in a _release_
compile, with compiler optimisations, the compiler (GCC 4.0.3) moves
psDroid->numWeaps to a register (a valid optimisation, as referencing
psDroid->turretRotation effectively promises it does not modify
anything out of the array bounds), thus the termination condition is
never corrected and we can't know how long the loop runs. And
finally, if the 'fprintf' is inserted in a release build, the crash
goes away, as the compiler now runs out of registers and has to reload
psDroid->numWeaps from memory, where it has been changed in the
meantime.
I've learned:
1. Check your initialisers. Three times over.
2. Never trust GDB. It tells the truth.
3. There's been a deeper reason all along for this vague feeling that
I hate C (and C++ too).
- Thomas
PS: Off topic, but one of my patches (charon-tabs-segfault.diff) seems
to have been lost in the noise. Could someone commit that?
--
GPG key ID 'BB66CCFD' available from hkp://subkeys.pgp.net or:
http://n.ethz.ch/student/trast/public-key.asc
charon-droidSetBits-segfault.diff.gz
Description: GNU Zip compressed data
pgp2YHsXFqISR.pgp
Description: PGP signature
_______________________________________________ Warzone-dev mailing list [email protected] https://mail.gna.org/listinfo/warzone-dev
