Am Dienstag, 7. November 2006 14:05 schrieb Thomas Rast:
> Hello
>
> The attached patch (my apologies for gzipping a <1kB file, but mailman
> keeps invalidating my signatures if I attach diffs)
Doesn't seem to help. :
(
> fixes a small but 
> rather annoying segfault in droidSetBits().  If you don't care about
> the details, you can skip the rest of the mail and apply it, but since
> the effects were rather interesting, I recommend you start reading
> _without looking at the patch_.
>
> The first symptom I got was a segfault at the very end of the loading
> screen (i.e. just before the 3d display would appear) in the following
> loop, at the marked line:
>
> for (inc = 0;inc < psDroid->numWeaps;inc++)
> {
>         psDroid->turretRotation[inc] = 0;     /* <-- droid.c:4610 */
>         psDroid->turretPitch[inc] = 0;
> }
>
> At first, the problem seemed obvious: gdb told me that 'inc' was on
> the order of 1'000'000, far larger than it should be.  But printing
> psDroid->numWeaps reveals that it is 0!  Furthermore, this happens
> _only if compiling without debugging_.  Inserting a debugging print
> like so:
>
> for (inc = 0;inc < psDroid->numWeaps;inc++)
> {
>         fprintf(stderr, "%d", inc);
>         psDroid->turretRotation[inc] = 0;
>         psDroid->turretPitch[inc] = 0;
> }
>
> (or any variation on the theme) also lets it run normally.  Even more
> obscurely, the output will be
>
>   0
>   1
>   2
>   3
>   4
>   5
>   6
>   7
>
> which is still quite invalid, as it's not supposed to run past 2 (at
> most).
>
> [If you can, without reading any further, come up with an explanation
> for this seemingly totally illogical and erratic behaviour, please
> tell me your mail address as you would have saved me quite some grief
> ;-)]
>
> After quite some time of debugging, experimenting and staring at the
> code (and playing rev 452, which I consider stable ;-) ) it turned out
> that the loop termination condition was not initialized at this point,
> thus explaining how 'inc' could get so large and trigger a segfault:
>
> psDroid->numWeaps = pTemplate->numWeaps;   /* moved before the loop */
> for (inc = 0;inc < psDroid->numWeaps;inc++)
> {
>         psDroid->turretRotation[inc] = 0;
>         psDroid->turretPitch[inc] = 0;
> }
>
> That's the change the patch does, and it fixes the bug.  However, I
> couldn't stop at this point; I had to find out why its behaviour was
> so erratic.  The crucial hint is in the following print, and its
> output:
>
> psDroid->numWeaps = pTemplate->numWeaps;
> for (inc = 0;inc < psDroid->numWeaps;inc++)
> {
>       fprintf(stderr, "%x, %u", psDroid->numWeaps, inc);
>         psDroid->turretRotation[inc] = 0;
>         psDroid->turretPitch[inc] = 0;
> }
>
> This results in
>
>   51f95e69, 0
>   51f95e69, 1
>   51f95e69, 2
>   51f95e69, 3
>   51f95e69, 4
>   51f95e69, 5
>   51f95e69, 6
>   51f90000, 7
>       ^^^^
>
> Looking hard at the zeroes, I realised that in a _debugging_ compile,
> since 'turretRotation' and 'turretPitch' is before 'numWeaps' in the
> DROID structure, as 'inc' gets larger, the loop will reset 'numWeaps'
> to 0 after a few iterations and then stop.  However, in a _release_
> compile, with compiler optimisations, the compiler (GCC 4.0.3) moves
> psDroid->numWeaps to a register (a valid optimisation, as referencing
> psDroid->turretRotation effectively promises it does not modify
> anything out of the array bounds), thus the termination condition is
> never corrected and we can't know how long the loop runs.  And
> finally, if the 'fprintf' is inserted in a release build, the crash
> goes away, as the compiler now runs out of registers and has to reload
> psDroid->numWeaps from memory, where it has been changed in the
> meantime.
>
> I've learned:
>
> 1. Check your initialisers.  Three times over.
And pay attention to the compiler's warnings, telling you about those. ;)

> 2. Never trust GDB.  It tells the truth.
>
> 3. There's been a deeper reason all along for this vague feeling that
>    I hate C (and C++ too).
>
> - Thomas
>
> PS: Off topic, but one of my patches (charon-tabs-segfault.diff) seems
> to have been lost in the noise.  Could someone commit that?
Ah yes, I didn't commit that, because it may hide a bug. You experienced a 
segfault, and now fix it by reseting the vars. Better would be to find out 
why the vars got corrupted. And at least do an ASSERT() (Will just print a 
error message in NDEBUG mode.)

--Dennis

Attachment: pgpnBcdeJB5kk.pgp
Description: PGP signature

_______________________________________________
Warzone-dev mailing list
[email protected]
https://mail.gna.org/listinfo/warzone-dev

Reply via email to