Am Dienstag, 7. November 2006 14:05 schrieb Thomas Rast:
> Hello
>
> The attached patch (my apologies for gzipping a <1kB file, but mailman
> keeps invalidating my signatures if I attach diffs)
Doesn't seem to help. :
(
> fixes a small but
> rather annoying segfault in droidSetBits(). If you don't care about
> the details, you can skip the rest of the mail and apply it, but since
> the effects were rather interesting, I recommend you start reading
> _without looking at the patch_.
>
> The first symptom I got was a segfault at the very end of the loading
> screen (i.e. just before the 3d display would appear) in the following
> loop, at the marked line:
>
> for (inc = 0;inc < psDroid->numWeaps;inc++)
> {
> psDroid->turretRotation[inc] = 0; /* <-- droid.c:4610 */
> psDroid->turretPitch[inc] = 0;
> }
>
> At first, the problem seemed obvious: gdb told me that 'inc' was on
> the order of 1'000'000, far larger than it should be. But printing
> psDroid->numWeaps reveals that it is 0! Furthermore, this happens
> _only if compiling without debugging_. Inserting a debugging print
> like so:
>
> for (inc = 0;inc < psDroid->numWeaps;inc++)
> {
> fprintf(stderr, "%d", inc);
> psDroid->turretRotation[inc] = 0;
> psDroid->turretPitch[inc] = 0;
> }
>
> (or any variation on the theme) also lets it run normally. Even more
> obscurely, the output will be
>
> 0
> 1
> 2
> 3
> 4
> 5
> 6
> 7
>
> which is still quite invalid, as it's not supposed to run past 2 (at
> most).
>
> [If you can, without reading any further, come up with an explanation
> for this seemingly totally illogical and erratic behaviour, please
> tell me your mail address as you would have saved me quite some grief
> ;-)]
>
> After quite some time of debugging, experimenting and staring at the
> code (and playing rev 452, which I consider stable ;-) ) it turned out
> that the loop termination condition was not initialized at this point,
> thus explaining how 'inc' could get so large and trigger a segfault:
>
> psDroid->numWeaps = pTemplate->numWeaps; /* moved before the loop */
> for (inc = 0;inc < psDroid->numWeaps;inc++)
> {
> psDroid->turretRotation[inc] = 0;
> psDroid->turretPitch[inc] = 0;
> }
>
> That's the change the patch does, and it fixes the bug. However, I
> couldn't stop at this point; I had to find out why its behaviour was
> so erratic. The crucial hint is in the following print, and its
> output:
>
> psDroid->numWeaps = pTemplate->numWeaps;
> for (inc = 0;inc < psDroid->numWeaps;inc++)
> {
> fprintf(stderr, "%x, %u", psDroid->numWeaps, inc);
> psDroid->turretRotation[inc] = 0;
> psDroid->turretPitch[inc] = 0;
> }
>
> This results in
>
> 51f95e69, 0
> 51f95e69, 1
> 51f95e69, 2
> 51f95e69, 3
> 51f95e69, 4
> 51f95e69, 5
> 51f95e69, 6
> 51f90000, 7
> ^^^^
>
> Looking hard at the zeroes, I realised that in a _debugging_ compile,
> since 'turretRotation' and 'turretPitch' is before 'numWeaps' in the
> DROID structure, as 'inc' gets larger, the loop will reset 'numWeaps'
> to 0 after a few iterations and then stop. However, in a _release_
> compile, with compiler optimisations, the compiler (GCC 4.0.3) moves
> psDroid->numWeaps to a register (a valid optimisation, as referencing
> psDroid->turretRotation effectively promises it does not modify
> anything out of the array bounds), thus the termination condition is
> never corrected and we can't know how long the loop runs. And
> finally, if the 'fprintf' is inserted in a release build, the crash
> goes away, as the compiler now runs out of registers and has to reload
> psDroid->numWeaps from memory, where it has been changed in the
> meantime.
>
> I've learned:
>
> 1. Check your initialisers. Three times over.
And pay attention to the compiler's warnings, telling you about those. ;)> 2. Never trust GDB. It tells the truth. > > 3. There's been a deeper reason all along for this vague feeling that > I hate C (and C++ too). > > - Thomas > > PS: Off topic, but one of my patches (charon-tabs-segfault.diff) seems > to have been lost in the noise. Could someone commit that? Ah yes, I didn't commit that, because it may hide a bug. You experienced a segfault, and now fix it by reseting the vars. Better would be to find out why the vars got corrupted. And at least do an ASSERT() (Will just print a error message in NDEBUG mode.) --Dennis
pgpnBcdeJB5kk.pgp
Description: PGP signature
_______________________________________________ Warzone-dev mailing list [email protected] https://mail.gna.org/listinfo/warzone-dev
