Re: Memory corruption (?) I don't understand
Senthil Kumar a écrit : > https://gcc.gnu.org/bugzilla/ > > You would need a reduced testcase/preprocessed file (xxx.i obtained by > compiling with -save-temps) to debug the issue - the .S file isn't > useful to debug the compiler. Done. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101188
Re: Memory corruption (?) I don't understand
https://gcc.gnu.org/bugzilla/ You would need a reduced testcase/preprocessed file (xxx.i obtained by compiling with -save-temps) to debug the issue - the .S file isn't useful to debug the compiler. Regards Senthil On Thu, Jun 24, 2021 at 11:36 AM BERTRAND Joël wrote: > > Senthil Kumar a écrit : > > This looks like a miscompilation to me. > > ... > > > What version of the compiler are you using? Can you make a reduced > > testcase containing just the LDL_MAC_otaa function? > > Just a question. Should I fill a bug report ? And if yes, where ? > > Best regards, > > JKB
Re: Memory corruption (?) I don't understand
Senthil Kumar a écrit : > This looks like a miscompilation to me. ... > What version of the compiler are you using? Can you make a reduced > testcase containing just the LDL_MAC_otaa function? Just a question. Should I fill a bug report ? And if yes, where ? Best regards, JKB
Re: Memory corruption (?) I don't understand
Senthil Kumar a écrit : > What version of the compiler are you using? I have seen this bug with Debian compiler : hilbert:[~] > avr-gcc -v Using built-in specs. Reading specs from /usr/lib/gcc/avr/5.4.0/device-specs/specs-avr2 COLLECT_GCC=avr-gcc COLLECT_LTO_WRAPPER=/usr/lib/gcc/avr/5.4.0/lto-wrapper Target: avr Configured with: ../gcc/configure -v --enable-languages=c,c++ --prefix=/usr/lib --infodir=/usr/share/info --mandir=/usr/share/man --bindir=/usr/bin --libexecdir=/usr/lib --libdir=/usr/lib --enable-shared --with-system-zlib --enable-long-long --enable-nls --without-included-gettext --disable-libssp --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=avr CFLAGS='-g -O2 -fdebug-prefix-map=/build/gcc-avr-Q3AFj7/gcc-avr-5.4.0+Atmel3.6.2=. -fstack-protector-strong -Wformat ' CPPFLAGS='-Wdate-time -D_FORTIFY_SOURCE=2' CXXFLAGS='-g -O2 -fdebug-prefix-map=/build/gcc-avr-Q3AFj7/gcc-avr-5.4.0+Atmel3.6.2=. -fstack-protector-strong -Wformat ' FCFLAGS='-g -O2 -fdebug-prefix-map=/build/gcc-avr-Q3AFj7/gcc-avr-5.4.0+Atmel3.6.2=. -fstack-protector-strong' FFLAGS='-g -O2 -fdebug-prefix-map=/build/gcc-avr-Q3AFj7/gcc-avr-5.4.0+Atmel3.6.2=. -fstack-protector-strong' GCJFLAGS='-g -O2 -fdebug-prefix-map=/build/gcc-avr-Q3AFj7/gcc-avr-5.4.0+Atmel3.6.2=. -fstack-protector-strong' LDFLAGS=-Wl,-z,relro OBJCFLAGS='-g -O2 -fdebug-prefix-map=/build/gcc-avr-Q3AFj7/gcc-avr-5.4.0+Atmel3.6.2=. -fstack-protector-strong -Wformat ' OBJCXXFLAGS='-g -O2 -fdebug-prefix-map=/build/gcc-avr-Q3AFj7/gcc-avr-5.4.0+Atmel3.6.2=. -fstack-protector-strong -Wformat ' Thread model: single gcc version 5.4.0 (GCC) I have tried to build another compiler from scratch : hilbert:[~] > /usr/local/cross/avr/bin/avr-gcc -v Using built-in specs. Reading specs from /usr/local/cross/avr/lib/gcc/avr/11.1.0/device-specs/specs-avr2 COLLECT_GCC=/usr/local/cross/avr/bin/avr-gcc COLLECT_LTO_WRAPPER=/usr/local/cross/avr/libexec/gcc/avr/11.1.0/lto-wrapper Target: avr Configured with: /usr/local/cross/build/avr/src/gcc/configure --build=x86_64-build_pc-linux-gnu --host=x86_64-build_pc-linux-gnu --target=avr --prefix=/usr/local/cross/avr --exec_prefix=/usr/local/cross/avr --with-local-prefix=/usr/local/cross/avr/avr --with-headers=/usr/local/cross/avr/avr/include --with-newlib --enable-threads=no --disable-shared --with-pkgversion='crosstool-NG 1.24.0.349_6d00833' --disable-__cxa_atexit --disable-libgomp --disable-libmudflap --disable-libmpx --disable-libssp --disable-libquadmath --disable-libquadmath-support --with-gmp=/usr/local/cross/build/avr/buildtools --with-mpfr=/usr/local/cross/build/avr/buildtools --with-mpc=/usr/local/cross/build/avr/buildtools --with-isl=/usr/local/cross/build/avr/buildtools --enable-lto --with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm' --enable-target-optspace --disable-nls --enable-multiarch --enable-languages=c Thread model: single Supported LTO compression algorithms: zlib zstd gcc version 11.1.0 (crosstool-NG 1.24.0.349_6d00833) .S I have uploaded were compiled with 11.1.0. > Can you make a reduced > testcase containing just the LDL_MAC_otaa function? Firmware I have sent is a minimal example. Of course, a lot of code is compiled but is not executed. This code only initializes real board and sents a packet to my LoRaWAN base station. Regards, JKB
Re: Memory corruption (?) I don't understand
This looks like a miscompilation to me. Specifically, in good_2.S, you have a318: f6 01 movw r30, r12 ... a33c: e3 50 subi r30, 0x03 ; 3 a33e: ff 4f sbci r31, 0xFF ; 255 a340: 01 90 ld r0, Z+ a342: f0 81 ld r31, Z a344: e0 2d mov r30, r0 This loads Z with R13:R12, and then later offsets it by -0xFF03 to obtain the address of self->handler, which is then used by the icall instruction. a31c: f6 01 movw r30, r12 ... a340: 68 01 movw r12, r16 a342: ff e4 ldi r31, 0x4F ; 79 a344: cf 1a sub r12, r31 a346: fe ef ldi r31, 0xFE ; 254 a348: df 0a sbc r13, r31 a34a: e3 50 subi r30, 0x03 ; 3 a34c: ff 4f sbci r31, 0xFF ; 255 a34e: 01 90 ld r0, Z+ a350: f0 81 ld r31, Z a352: e0 2d mov r30, r0 Note the clobbering of R31:R30 with immediate values *before* the offsetting is done. I think this is a codegen bug - the compiler should have either picked a different set of regs to subtract R13:R12 from, or should have restored Z with a movw r30, r12 before 0xa34a. What version of the compiler are you using? Can you make a reduced testcase containing just the LDL_MAC_otaa function? Regards Senthil On Wed, Jun 23, 2021 at 3:12 PM BERTRAND Joël wrote: > > David Brown a écrit : > > On 23/06/2021 10:37, BERTRAND Joël wrote: > > > >> In bad.S, self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ) gives : > >> a340: 68 01 movwr12, r16 > >> a342: ff e4 ldi r31, 0x4F ; 79 > >> a344: cf 1a sub r12, r31 > >> a346: fe ef ldi r31, 0xFE ; 254 > >> a348: df 0a sbc r13, r31 > >> a34a: e3 50 subir30, 0x03 ; 3 > >> a34c: ff 4f sbcir31, 0xFF ; 255 > >> a34e: 01 90 ld r0, Z+ > >> a350: f0 81 ld r31, Z > >> a352: e0 2d mov r30, r0 > >> a354: ae 01 movwr20, r28 > >> a356: 4f 5a subir20, 0xAF ; 175 > >> a358: 5f 4f sbcir21, 0xFF ; 255 > >> a35a: 65 e0 ldi r22, 0x05 ; 5 > >> a35c: 70 e0 ldi r23, 0x00 ; 0 > >> a35e: d6 01 movwr26, r12 > >> a360: 8d 91 ld r24, X+ > >> a362: 9c 91 ld r25, X > >> a364: 09 95 icall > >> > >> I have rebuilt a second firmware that works as expected with: > >> > >> if (self->handler == NULL) for(;;); > >> self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ); > >> > >> Asmb output contains : > >> > >> if (self->handler == NULL) for(;;); > >> a33c: e3 50 subir30, 0x03 ; 3 > >> a33e: ff 4f sbcir31, 0xFF ; 255 > >> a340: 01 90 ld r0, Z+ > >> a342: f0 81 ld r31, Z > >> a344: e0 2d mov r30, r0 > >> a346: 30 97 sbiwr30, 0x00 ; 0 > >> a348: 09 f4 brne.+2 ; 0xa34c > >> a34a: 4a c0 rjmp.+148 ; 0xa3e0 > >> self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ); > >> a34c: 68 01 movwr12, r16 > >> a34e: 2f e4 ldi r18, 0x4F ; 79 > >> a350: c2 1a sub r12, r18 > >> a352: 2e ef ldi r18, 0xFE ; 254 > >> a354: d2 0a sbc r13, r18 > >> a356: ae 01 movwr20, r28 > >> a358: 4f 5a subir20, 0xAF ; 175 > >> a35a: 5f 4f sbcir21, 0xFF ; 255 > >> a35c: 65 e0 ldi r22, 0x05 ; 5 > >> a35e: 70 e0 ldi r23, 0x00 ; 0 > >> a360: d6 01 movwr26, r12 > >> a362: 8d 91 ld r24, X+ > >> a364: 9c 91 ld r25, X > >> a366: 09 95 icall > >> > >> I'm not a AVR ASMB specialist, but I don't understand why > >> self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ) produces a > >> different code (with of course the same compilation options) when it was > >> preceded by if (self->handler == NULL) for(;;); > >> > > > > I haven't looked at any of the rest of your code. But here I think the > > differences are just that after the NULL check, the compiler knows that > > self->handler is already in Z, and this can be re-used later in the > > indirect call. So the registers used for getting the other parameters, > > and perhaps the order they are collected, can change accordingly. > > Optimisations and register allocation algorithms can often make the > > assembly appear significantly differently when in fact the changes are > > typically cosmetic. > > Maybe, but how do you can explain that with a NULL test, code runs as > expected and without, it crashes ? > > I have tried another test : > > if (self->handler != app_handler) for(;;); > self->handler(self->app,
Re: Memory corruption (?) I don't understand
David Brown a écrit : > On 23/06/2021 10:37, BERTRAND Joël wrote: > >> In bad.S, self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ) gives : >> a340: 68 01 movwr12, r16 >> a342: ff e4 ldi r31, 0x4F ; 79 >> a344: cf 1a sub r12, r31 >> a346: fe ef ldi r31, 0xFE ; 254 >> a348: df 0a sbc r13, r31 >> a34a: e3 50 subir30, 0x03 ; 3 >> a34c: ff 4f sbcir31, 0xFF ; 255 >> a34e: 01 90 ld r0, Z+ >> a350: f0 81 ld r31, Z >> a352: e0 2d mov r30, r0 >> a354: ae 01 movwr20, r28 >> a356: 4f 5a subir20, 0xAF ; 175 >> a358: 5f 4f sbcir21, 0xFF ; 255 >> a35a: 65 e0 ldi r22, 0x05 ; 5 >> a35c: 70 e0 ldi r23, 0x00 ; 0 >> a35e: d6 01 movwr26, r12 >> a360: 8d 91 ld r24, X+ >> a362: 9c 91 ld r25, X >> a364: 09 95 icall >> >> I have rebuilt a second firmware that works as expected with: >> >> if (self->handler == NULL) for(;;); >> self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ); >> >> Asmb output contains : >> >> if (self->handler == NULL) for(;;); >> a33c: e3 50 subir30, 0x03 ; 3 >> a33e: ff 4f sbcir31, 0xFF ; 255 >> a340: 01 90 ld r0, Z+ >> a342: f0 81 ld r31, Z >> a344: e0 2d mov r30, r0 >> a346: 30 97 sbiwr30, 0x00 ; 0 >> a348: 09 f4 brne.+2 ; 0xa34c >> a34a: 4a c0 rjmp.+148 ; 0xa3e0 >> self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ); >> a34c: 68 01 movwr12, r16 >> a34e: 2f e4 ldi r18, 0x4F ; 79 >> a350: c2 1a sub r12, r18 >> a352: 2e ef ldi r18, 0xFE ; 254 >> a354: d2 0a sbc r13, r18 >> a356: ae 01 movwr20, r28 >> a358: 4f 5a subir20, 0xAF ; 175 >> a35a: 5f 4f sbcir21, 0xFF ; 255 >> a35c: 65 e0 ldi r22, 0x05 ; 5 >> a35e: 70 e0 ldi r23, 0x00 ; 0 >> a360: d6 01 movwr26, r12 >> a362: 8d 91 ld r24, X+ >> a364: 9c 91 ld r25, X >> a366: 09 95 icall >> >> I'm not a AVR ASMB specialist, but I don't understand why >> self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ) produces a >> different code (with of course the same compilation options) when it was >> preceded by if (self->handler == NULL) for(;;); >> > > I haven't looked at any of the rest of your code. But here I think the > differences are just that after the NULL check, the compiler knows that > self->handler is already in Z, and this can be re-used later in the > indirect call. So the registers used for getting the other parameters, > and perhaps the order they are collected, can change accordingly. > Optimisations and register allocation algorithms can often make the > assembly appear significantly differently when in fact the changes are > typically cosmetic. Maybe, but how do you can explain that with a NULL test, code runs as expected and without, it crashes ? I have tried another test : if (self->handler != app_handler) for(;;); self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ); and firmware runs as expected. But this one crashes : if (self->handler != app_handler) {} self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ); I have uploded three firmwares (built with -Os) : https://hilbert.systella.fr/public/good.S https://hilbert.systella.fr/public/good_2.S https://hilbert.systella.fr/public/bad.S Best regards, JKB
Re: Memory corruption (?) I don't understand
This part > a340: 68 01 movwr12, r16 > a342: ff e4 ldi r31, 0x4F ; 79 > a344: cf 1a sub r12, r31 > a346: fe ef ldi r31, 0xFE ; 254 > a348: df 0a sbc r13, r31 > a34a: e3 50 subir30, 0x03 ; 3 > a34c: ff 4f sbcir31, 0xFF ; 255 > a34e: 01 90 ld r0, Z+ > a350: f0 81 ld r31, Z > a352: e0 2d mov r30, r0 looks suspicious - the Z register is loaded with 0xFE4F and then just subtracted with 0xFF03 to obtain self->handler, whereas self->app is obtained by offsetting R13:R12. Explicitly checking if self->handler is NULL also has the sub, but I presume R31:R30 are loaded with the address of self beforehand? > a33c: e3 50 subir30, 0x03 ; 3 > a33e: ff 4f sbcir31, 0xFF ; 255 > a340: 01 90 ld r0, Z+ > a342: f0 81 ld r31, Z > a344: e0 2d mov r30, r0 Regards Senthil On Wed, Jun 23, 2021 at 2:07 PM BERTRAND Joël wrote: > > Comparaison between good.S and bad.S. > > Good.elf is compiled with : > > app_handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ); > > instead of bad.elf that is compiled with : > > self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ); > > in enum ldl_mac_status LDL_MAC_otaa(struct ldl_mac *self) > > In both .S files: > - app_handler is written @22f2. > - argmac.handler = app_handler gives following code : > > 2886: 89 e7 ldi r24, 0x79 ; 121 > 2888: 91 e1 ldi r25, 0x11 ; 17 > 288a: 9c 87 std Y+12, r25 ; 0x0c > 288c: 8b 87 std Y+11, r24 ; 0x0b > > In good.S, app_handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ) is > built as : > a340: 68 01 movwr12, r16 > a342: ff e4 ldi r31, 0x4F ; 79 > a344: cf 1a sub r12, r31 > a346: fe ef ldi r31, 0xFE ; 254 > a348: df 0a sbc r13, r31 > a34a: ae 01 movwr20, r28 > a34c: 4f 5a subir20, 0xAF ; 175 > a34e: 5f 4f sbcir21, 0xFF ; 255 > // 0x05 0x00 = LDL_MAC_DEV_NONCE_UPDATED > a350: 65 e0 ldi r22, 0x05 ; 5 > a352: 70 e0 ldi r23, 0x00 ; 0 > a354: d6 01 movwr26, r12 > a356: 8d 91 ld r24, X+ > a358: 9c 91 ld r25, X > a35a: 0e 94 79 11 call0x22f2 ; 0x22f2 > > In bad.S, self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ) gives : > a340: 68 01 movwr12, r16 > a342: ff e4 ldi r31, 0x4F ; 79 > a344: cf 1a sub r12, r31 > a346: fe ef ldi r31, 0xFE ; 254 > a348: df 0a sbc r13, r31 > a34a: e3 50 subir30, 0x03 ; 3 > a34c: ff 4f sbcir31, 0xFF ; 255 > a34e: 01 90 ld r0, Z+ > a350: f0 81 ld r31, Z > a352: e0 2d mov r30, r0 > a354: ae 01 movwr20, r28 > a356: 4f 5a subir20, 0xAF ; 175 > a358: 5f 4f sbcir21, 0xFF ; 255 > a35a: 65 e0 ldi r22, 0x05 ; 5 > a35c: 70 e0 ldi r23, 0x00 ; 0 > a35e: d6 01 movwr26, r12 > a360: 8d 91 ld r24, X+ > a362: 9c 91 ld r25, X > a364: 09 95 icall > > I have rebuilt a second firmware that works as expected with: > > if (self->handler == NULL) for(;;); > self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ); > > Asmb output contains : > > if (self->handler == NULL) for(;;); > a33c: e3 50 subir30, 0x03 ; 3 > a33e: ff 4f sbcir31, 0xFF ; 255 > a340: 01 90 ld r0, Z+ > a342: f0 81 ld r31, Z > a344: e0 2d mov r30, r0 > a346: 30 97 sbiwr30, 0x00 ; 0 > a348: 09 f4 brne.+2 ; 0xa34c > a34a: 4a c0 rjmp.+148 ; 0xa3e0 > self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ); > a34c: 68 01 movwr12, r16 > a34e: 2f e4 ldi r18, 0x4F ; 79 > a350: c2 1a sub r12, r18 > a352: 2e ef ldi r18, 0xFE ; 254 > a354: d2 0a sbc r13, r18 > a356: ae 01 movwr20, r28 > a358: 4f 5a subir20, 0xAF ; 175 > a35a: 5f 4f sbcir21, 0xFF ; 255 > a35c: 65 e0 ldi r22, 0x05 ; 5 > a35e: 70 e0 ldi r23, 0x00 ; 0 > a360: d6 01 movwr26, r12 > a362: 8d 91 ld r24, X+ > a364: 9c 91 ld r25, X > a366: 09 95 icall > > I'm not a AVR ASMB specialist, but I don't understand why >
Re: Memory corruption (?) I don't understand
On 23/06/2021 10:37, BERTRAND Joël wrote: > In bad.S, self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ) gives : > a340: 68 01 movwr12, r16 > a342: ff e4 ldi r31, 0x4F ; 79 > a344: cf 1a sub r12, r31 > a346: fe ef ldi r31, 0xFE ; 254 > a348: df 0a sbc r13, r31 > a34a: e3 50 subir30, 0x03 ; 3 > a34c: ff 4f sbcir31, 0xFF ; 255 > a34e: 01 90 ld r0, Z+ > a350: f0 81 ld r31, Z > a352: e0 2d mov r30, r0 > a354: ae 01 movwr20, r28 > a356: 4f 5a subir20, 0xAF ; 175 > a358: 5f 4f sbcir21, 0xFF ; 255 > a35a: 65 e0 ldi r22, 0x05 ; 5 > a35c: 70 e0 ldi r23, 0x00 ; 0 > a35e: d6 01 movwr26, r12 > a360: 8d 91 ld r24, X+ > a362: 9c 91 ld r25, X > a364: 09 95 icall > > I have rebuilt a second firmware that works as expected with: > > if (self->handler == NULL) for(;;); > self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ); > > Asmb output contains : > > if (self->handler == NULL) for(;;); > a33c: e3 50 subir30, 0x03 ; 3 > a33e: ff 4f sbcir31, 0xFF ; 255 > a340: 01 90 ld r0, Z+ > a342: f0 81 ld r31, Z > a344: e0 2d mov r30, r0 > a346: 30 97 sbiwr30, 0x00 ; 0 > a348: 09 f4 brne.+2 ; 0xa34c > a34a: 4a c0 rjmp.+148 ; 0xa3e0 > self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ); > a34c: 68 01 movwr12, r16 > a34e: 2f e4 ldi r18, 0x4F ; 79 > a350: c2 1a sub r12, r18 > a352: 2e ef ldi r18, 0xFE ; 254 > a354: d2 0a sbc r13, r18 > a356: ae 01 movwr20, r28 > a358: 4f 5a subir20, 0xAF ; 175 > a35a: 5f 4f sbcir21, 0xFF ; 255 > a35c: 65 e0 ldi r22, 0x05 ; 5 > a35e: 70 e0 ldi r23, 0x00 ; 0 > a360: d6 01 movwr26, r12 > a362: 8d 91 ld r24, X+ > a364: 9c 91 ld r25, X > a366: 09 95 icall > > I'm not a AVR ASMB specialist, but I don't understand why > self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ) produces a > different code (with of course the same compilation options) when it was > preceded by if (self->handler == NULL) for(;;); > I haven't looked at any of the rest of your code. But here I think the differences are just that after the NULL check, the compiler knows that self->handler is already in Z, and this can be re-used later in the indirect call. So the registers used for getting the other parameters, and perhaps the order they are collected, can change accordingly. Optimisations and register allocation algorithms can often make the assembly appear significantly differently when in fact the changes are typically cosmetic.
Re: Memory corruption (?) I don't understand
Comparaison between good.S and bad.S. Good.elf is compiled with : app_handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ); instead of bad.elf that is compiled with : self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ); in enum ldl_mac_status LDL_MAC_otaa(struct ldl_mac *self) In both .S files: - app_handler is written @22f2. - argmac.handler = app_handler gives following code : 2886: 89 e7 ldi r24, 0x79 ; 121 2888: 91 e1 ldi r25, 0x11 ; 17 288a: 9c 87 std Y+12, r25 ; 0x0c 288c: 8b 87 std Y+11, r24 ; 0x0b In good.S, app_handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ) is built as : a340: 68 01 movwr12, r16 a342: ff e4 ldi r31, 0x4F ; 79 a344: cf 1a sub r12, r31 a346: fe ef ldi r31, 0xFE ; 254 a348: df 0a sbc r13, r31 a34a: ae 01 movwr20, r28 a34c: 4f 5a subir20, 0xAF ; 175 a34e: 5f 4f sbcir21, 0xFF ; 255 // 0x05 0x00 = LDL_MAC_DEV_NONCE_UPDATED a350: 65 e0 ldi r22, 0x05 ; 5 a352: 70 e0 ldi r23, 0x00 ; 0 a354: d6 01 movwr26, r12 a356: 8d 91 ld r24, X+ a358: 9c 91 ld r25, X a35a: 0e 94 79 11 call0x22f2 ; 0x22f2 In bad.S, self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ) gives : a340: 68 01 movwr12, r16 a342: ff e4 ldi r31, 0x4F ; 79 a344: cf 1a sub r12, r31 a346: fe ef ldi r31, 0xFE ; 254 a348: df 0a sbc r13, r31 a34a: e3 50 subir30, 0x03 ; 3 a34c: ff 4f sbcir31, 0xFF ; 255 a34e: 01 90 ld r0, Z+ a350: f0 81 ld r31, Z a352: e0 2d mov r30, r0 a354: ae 01 movwr20, r28 a356: 4f 5a subir20, 0xAF ; 175 a358: 5f 4f sbcir21, 0xFF ; 255 a35a: 65 e0 ldi r22, 0x05 ; 5 a35c: 70 e0 ldi r23, 0x00 ; 0 a35e: d6 01 movwr26, r12 a360: 8d 91 ld r24, X+ a362: 9c 91 ld r25, X a364: 09 95 icall I have rebuilt a second firmware that works as expected with: if (self->handler == NULL) for(;;); self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ); Asmb output contains : if (self->handler == NULL) for(;;); a33c: e3 50 subir30, 0x03 ; 3 a33e: ff 4f sbcir31, 0xFF ; 255 a340: 01 90 ld r0, Z+ a342: f0 81 ld r31, Z a344: e0 2d mov r30, r0 a346: 30 97 sbiwr30, 0x00 ; 0 a348: 09 f4 brne.+2 ; 0xa34c a34a: 4a c0 rjmp.+148 ; 0xa3e0 self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ); a34c: 68 01 movwr12, r16 a34e: 2f e4 ldi r18, 0x4F ; 79 a350: c2 1a sub r12, r18 a352: 2e ef ldi r18, 0xFE ; 254 a354: d2 0a sbc r13, r18 a356: ae 01 movwr20, r28 a358: 4f 5a subir20, 0xAF ; 175 a35a: 5f 4f sbcir21, 0xFF ; 255 a35c: 65 e0 ldi r22, 0x05 ; 5 a35e: 70 e0 ldi r23, 0x00 ; 0 a360: d6 01 movwr26, r12 a362: 8d 91 ld r24, X+ a364: 9c 91 ld r25, X a366: 09 95 icall I'm not a AVR ASMB specialist, but I don't understand why self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ) produces a different code (with of course the same compilation options) when it was preceded by if (self->handler == NULL) for(;;); Best regards, JKB
Re: Memory corruption (?) I don't understand
Some news. In lorawan/ldl_mac.c is defined LDL_MAC_otaa : enum ldl_mac_status LDL_MAC_otaa(struct ldl_mac *self) { enum ldl_mac_status retval; union ldl_mac_response_arg arg; LDL_PEDANTIC(self != NULL) if(self->ctx.joined){ retval = LDL_STATUS_JOINED; } else if(self->op == LDL_OP_NONE){ if(self->devNonce <= U32(UINT16_MAX)){ forgetNetwork(self); self->trials = 0; self->day = U32(60) * U32(60) * U32(24) * timeTPS; #if defined(LDL_ENABLE_L2_1_1) LDL_OPS_deriveJoinKeys(self); #endif fillJoinBuffer(self, U16(self->devNonce)); self->devNonce++; arg.dev_nonce_updated.nextDevNonce = self->devNonce; self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ); self->tx.power = 0; self->op = LDL_OP_JOINING; if(self->state == LDL_STATE_IDLE){ self->state = LDL_STATE_WAIT_OTAA; LDL_MAC_timerSet(self, LDL_TIMER_WAITA, 0); } retval = LDL_STATUS_OK; LDL_DEBUG("OTAA is pending") } else{ /* need to re-init with a different JoinEUI */ retval = LDL_STATUS_DEVNONCE; } } else{ retval = LDL_STATUS_BUSY; } return retval; } In this function, there is a call to self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ). If I comment out this line, firmware doesn't crash anymore. I think in a first time that self->handler is nullified by something. OK, I've checked and I have added just before self->handler() : if (self->handler == NULL) for(;;); After recompilation, firmware runs as expected (!) and doesn't enter in loop (!). I don't understand why this test fix this bug. I have tried to replace in a second time : self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ); by a direct call to app_handler function (and of course without previous test): app_handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ); Thus : - self->handler() triggers the bug ; - I'm not sure that self->handler is really nullified as execution doesn't enter in if (self->handler == NULL) for(;;); - why and how does the line if (self->handler == NULL) for(;;); fix this issue ? self->handler() seems to be modified by something or CPU is not able to jump to specified address. Is there a difference in generated asm code between a call to a function pointer and a direct call ? I can post three firmware.elf if required : - original that crashes ; - modified with if (self->handler == NULL) for(;;); that runs as expected : - modified without test but with direct call to app_handler() that runs. Best regards, JKB
Re: Memory corruption (?) I don't understand
Trampas Stern a écrit : > Also when you print out the size in the make file, how much SRAM does it > indicate your program is using? AVR Memory Usage Device: atmega1284 Program: 101324 bytes (77.3% Full) (.text + .data + .bootloader) Data: 6890 bytes (42.1% Full) (.data + .bss + .noinit) EEPROM: 2276 bytes (55.6% Full) (.eeprom)
Re: Memory corruption (?) I don't understand
Also when you print out the size in the make file, how much SRAM does it indicate your program is using? On Tue, Jun 22, 2021 at 2:10 PM Trampas Stern wrote: > So I would recommend that you put in the ASSERT for the LDL_PEDANTIC() > macro. > > Secondly, you need to be careful with PROGMEM, that is with AVR being a > Harvard machine you can burn up SRAM quickly by not using PROGMEM for > constants. For example with the LDL library you need to be sure > that LDL_ENABLE_AVR is defined. > > Note the AVR is a great processor, however I have personally not used one > in about 15 years. The price and ease of programming ARM Cortex M have > gotten such AVR are no longer viable for new development as I can develop > products so much faster with ARM Cortex M parts, and their power and price > is comparable to AVR. > > Another trick you can do is inside a function do this: > > void printMem(void) { >uint8_t *ptr; >printf("stack 0x%X\n", ); //this will be address on stack where ptr > is. >ptr=malloc(10); >printf("heap 0x%X\n", ptr); //this will be address as to heap >free(ptr); > } > > This will help as you can do some rough tracking of stack and heap usage, > assuming the AVR linker script has stack growing from top of memory to end > of heap it will help you check for stack overflow. > > > > > > On Tue, Jun 22, 2021 at 1:39 PM BERTRAND Joël > wrote: > >> Strange. Following function runs as expected. >> >> enum ldl_mac_status LDL_MAC_otaa(struct ldl_mac *self) >> { >> enum ldl_mac_status retval; >> union ldl_mac_response_arg arg; >> >> LDL_PEDANTIC(self != NULL) >> >> if(self->ctx.joined){ >> >> retval = LDL_STATUS_JOINED; >> } >> else if(self->op == LDL_OP_NONE){ >> >> if(self->devNonce <= U32(UINT16_MAX)){ >> >> forgetNetwork(self); >> >> self->trials = 0; >> >> self->day = U32(60) * U32(60) * U32(24) * timeTPS; >> >> #if defined(LDL_ENABLE_L2_1_1) >> LDL_OPS_deriveJoinKeys(self); >> #endif >> fillJoinBuffer(self, U16(self->devNonce)); >> >> self->devNonce++; >> >> arg.dev_nonce_updated.nextDevNonce = self->devNonce; >> >> unsigned char t[80]; >> sprintf(t, "self->handler=%p\r\n", self->handler); >> self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ); >> >> self->tx.power = 0; >> >> self->op = LDL_OP_JOINING; >> >> if(self->state == LDL_STATE_IDLE){ >> >> self->state = LDL_STATE_WAIT_OTAA; >> LDL_MAC_timerSet(self, LDL_TIMER_WAITA, 0); >> } >> >> retval = LDL_STATUS_OK; >> >> LDL_DEBUG("OTAA is pending") >> } >> else{ >> >> /* need to re-init with a different JoinEUI */ >> retval = LDL_STATUS_DEVNONCE; >> } >> } >> else{ >> >> retval = LDL_STATUS_BUSY; >> } >> >> return retval; >> } >> >> If I comment out sprintf(), it crashes. If I deplace this debug >> trace >> before or _after_ self->handler call, firmware runs as expected. I don't >> understand. If there is a memory corruption somewhere, I could >> understand that a debug trace _before_ the line that triggers the bug >> can change something. But I don't understand why the following function >> runs as expected : >> >> enum ldl_mac_status LDL_MAC_otaa(struct ldl_mac *self) >> { >> enum ldl_mac_status retval; >> union ldl_mac_response_arg arg; >> >> LDL_PEDANTIC(self != NULL) >> >> if(self->ctx.joined){ >> >> retval = LDL_STATUS_JOINED; >> } >> else if(self->op == LDL_OP_NONE){ >> >> if(self->devNonce <= U32(UINT16_MAX)){ >> >> forgetNetwork(self); >> >> self->trials = 0; >> >> self->day = U32(60) * U32(60) * U32(24) * timeTPS; >> >> #if defined(LDL_ENABLE_L2_1_1) >> LDL_OPS_deriveJoinKeys(self); >> #endif >> fillJoinBuffer(self, U16(self->devNonce)); >> >> self->devNonce++; >> >> arg.dev_nonce_updated.nextDevNonce = self->devNonce; >> >> self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ); >> >> self->tx.power = 0; >> >> self->op = LDL_OP_JOINING; >> >> if(self->state == LDL_STATE_IDLE){ >> >> self->state = LDL_STATE_WAIT_OTAA; >> LDL_MAC_timerSet(self, LDL_TIMER_WAITA, 0); >> } >> >> retval = LDL_STATUS_OK; >> >> LDL_DEBUG("OTAA is pending") >> } >> else{ >> >> /* need to re-init with a different JoinEUI */ >> retval = LDL_STATUS_DEVNONCE; >> } >> } >> else{ >> >> retval = LDL_STATUS_BUSY; >> } >> >> unsigned char t[80]; >> sprintf(t, "self->handler=%p\r\n", self->handler); >> return retval; >> } >> >> Of course, if I comment out : >> >> unsigned char t[80]; >> sprintf(t,
Re: Memory corruption (?) I don't understand
So I would recommend that you put in the ASSERT for the LDL_PEDANTIC() macro. Secondly, you need to be careful with PROGMEM, that is with AVR being a Harvard machine you can burn up SRAM quickly by not using PROGMEM for constants. For example with the LDL library you need to be sure that LDL_ENABLE_AVR is defined. Note the AVR is a great processor, however I have personally not used one in about 15 years. The price and ease of programming ARM Cortex M have gotten such AVR are no longer viable for new development as I can develop products so much faster with ARM Cortex M parts, and their power and price is comparable to AVR. Another trick you can do is inside a function do this: void printMem(void) { uint8_t *ptr; printf("stack 0x%X\n", ); //this will be address on stack where ptr is. ptr=malloc(10); printf("heap 0x%X\n", ptr); //this will be address as to heap free(ptr); } This will help as you can do some rough tracking of stack and heap usage, assuming the AVR linker script has stack growing from top of memory to end of heap it will help you check for stack overflow. On Tue, Jun 22, 2021 at 1:39 PM BERTRAND Joël wrote: > Strange. Following function runs as expected. > > enum ldl_mac_status LDL_MAC_otaa(struct ldl_mac *self) > { > enum ldl_mac_status retval; > union ldl_mac_response_arg arg; > > LDL_PEDANTIC(self != NULL) > > if(self->ctx.joined){ > > retval = LDL_STATUS_JOINED; > } > else if(self->op == LDL_OP_NONE){ > > if(self->devNonce <= U32(UINT16_MAX)){ > > forgetNetwork(self); > > self->trials = 0; > > self->day = U32(60) * U32(60) * U32(24) * timeTPS; > > #if defined(LDL_ENABLE_L2_1_1) > LDL_OPS_deriveJoinKeys(self); > #endif > fillJoinBuffer(self, U16(self->devNonce)); > > self->devNonce++; > > arg.dev_nonce_updated.nextDevNonce = self->devNonce; > > unsigned char t[80]; > sprintf(t, "self->handler=%p\r\n", self->handler); > self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ); > > self->tx.power = 0; > > self->op = LDL_OP_JOINING; > > if(self->state == LDL_STATE_IDLE){ > > self->state = LDL_STATE_WAIT_OTAA; > LDL_MAC_timerSet(self, LDL_TIMER_WAITA, 0); > } > > retval = LDL_STATUS_OK; > > LDL_DEBUG("OTAA is pending") > } > else{ > > /* need to re-init with a different JoinEUI */ > retval = LDL_STATUS_DEVNONCE; > } > } > else{ > > retval = LDL_STATUS_BUSY; > } > > return retval; > } > > If I comment out sprintf(), it crashes. If I deplace this debug > trace > before or _after_ self->handler call, firmware runs as expected. I don't > understand. If there is a memory corruption somewhere, I could > understand that a debug trace _before_ the line that triggers the bug > can change something. But I don't understand why the following function > runs as expected : > > enum ldl_mac_status LDL_MAC_otaa(struct ldl_mac *self) > { > enum ldl_mac_status retval; > union ldl_mac_response_arg arg; > > LDL_PEDANTIC(self != NULL) > > if(self->ctx.joined){ > > retval = LDL_STATUS_JOINED; > } > else if(self->op == LDL_OP_NONE){ > > if(self->devNonce <= U32(UINT16_MAX)){ > > forgetNetwork(self); > > self->trials = 0; > > self->day = U32(60) * U32(60) * U32(24) * timeTPS; > > #if defined(LDL_ENABLE_L2_1_1) > LDL_OPS_deriveJoinKeys(self); > #endif > fillJoinBuffer(self, U16(self->devNonce)); > > self->devNonce++; > > arg.dev_nonce_updated.nextDevNonce = self->devNonce; > > self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ); > > self->tx.power = 0; > > self->op = LDL_OP_JOINING; > > if(self->state == LDL_STATE_IDLE){ > > self->state = LDL_STATE_WAIT_OTAA; > LDL_MAC_timerSet(self, LDL_TIMER_WAITA, 0); > } > > retval = LDL_STATUS_OK; > > LDL_DEBUG("OTAA is pending") > } > else{ > > /* need to re-init with a different JoinEUI */ > retval = LDL_STATUS_DEVNONCE; > } > } > else{ > > retval = LDL_STATUS_BUSY; > } > > unsigned char t[80]; > sprintf(t, "self->handler=%p\r\n", self->handler); > return retval; > } > > Of course, if I comment out : > > unsigned char t[80]; > sprintf(t, "self->handler=%p\r\n", self->handler); > > it crashes again : > > hilbert:[~/cvs/firmware-antivol] > simavr -t -vvv -m atmega1284 -f > 1600 firmware.elf > Loaded 95670 .text at address 0x0 > Loaded 5654 .data > Loaded 2276 .eeprom > 01.. > .. > =.. > Systella L100-A.. > =.. > .. > Booting firmware 2021062218.. > SPI initialized.. > Reset
Re: Memory corruption (?) I don't understand
Strange. Following function runs as expected. enum ldl_mac_status LDL_MAC_otaa(struct ldl_mac *self) { enum ldl_mac_status retval; union ldl_mac_response_arg arg; LDL_PEDANTIC(self != NULL) if(self->ctx.joined){ retval = LDL_STATUS_JOINED; } else if(self->op == LDL_OP_NONE){ if(self->devNonce <= U32(UINT16_MAX)){ forgetNetwork(self); self->trials = 0; self->day = U32(60) * U32(60) * U32(24) * timeTPS; #if defined(LDL_ENABLE_L2_1_1) LDL_OPS_deriveJoinKeys(self); #endif fillJoinBuffer(self, U16(self->devNonce)); self->devNonce++; arg.dev_nonce_updated.nextDevNonce = self->devNonce; unsigned char t[80]; sprintf(t, "self->handler=%p\r\n", self->handler); self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ); self->tx.power = 0; self->op = LDL_OP_JOINING; if(self->state == LDL_STATE_IDLE){ self->state = LDL_STATE_WAIT_OTAA; LDL_MAC_timerSet(self, LDL_TIMER_WAITA, 0); } retval = LDL_STATUS_OK; LDL_DEBUG("OTAA is pending") } else{ /* need to re-init with a different JoinEUI */ retval = LDL_STATUS_DEVNONCE; } } else{ retval = LDL_STATUS_BUSY; } return retval; } If I comment out sprintf(), it crashes. If I deplace this debug trace before or _after_ self->handler call, firmware runs as expected. I don't understand. If there is a memory corruption somewhere, I could understand that a debug trace _before_ the line that triggers the bug can change something. But I don't understand why the following function runs as expected : enum ldl_mac_status LDL_MAC_otaa(struct ldl_mac *self) { enum ldl_mac_status retval; union ldl_mac_response_arg arg; LDL_PEDANTIC(self != NULL) if(self->ctx.joined){ retval = LDL_STATUS_JOINED; } else if(self->op == LDL_OP_NONE){ if(self->devNonce <= U32(UINT16_MAX)){ forgetNetwork(self); self->trials = 0; self->day = U32(60) * U32(60) * U32(24) * timeTPS; #if defined(LDL_ENABLE_L2_1_1) LDL_OPS_deriveJoinKeys(self); #endif fillJoinBuffer(self, U16(self->devNonce)); self->devNonce++; arg.dev_nonce_updated.nextDevNonce = self->devNonce; self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ); self->tx.power = 0; self->op = LDL_OP_JOINING; if(self->state == LDL_STATE_IDLE){ self->state = LDL_STATE_WAIT_OTAA; LDL_MAC_timerSet(self, LDL_TIMER_WAITA, 0); } retval = LDL_STATUS_OK; LDL_DEBUG("OTAA is pending") } else{ /* need to re-init with a different JoinEUI */ retval = LDL_STATUS_DEVNONCE; } } else{ retval = LDL_STATUS_BUSY; } unsigned char t[80]; sprintf(t, "self->handler=%p\r\n", self->handler); return retval; } Of course, if I comment out : unsigned char t[80]; sprintf(t, "self->handler=%p\r\n", self->handler); it crashes again : hilbert:[~/cvs/firmware-antivol] > simavr -t -vvv -m atmega1284 -f 1600 firmware.elf Loaded 95670 .text at address 0x0 Loaded 5654 .data Loaded 2276 .eeprom 01.. .. =.. Systella L100-A.. =.. .. Booting firmware 2021062218.. SPI initialized.. Reset LORA.. Reset LORA done.. LoRaWAN 1.1.. Initialization SX1262.. Initialization SX1262 done.. .. MAC initialization.. LDL_MAC_addChannel:790>chIndex=0 freq=86810 minRate=0 maxRate=5.. LDL_MAC_addChannel:790>chIndex=1 freq=86830 minRate=0 maxRate=5.. LDL_MAC_addChannel:790>chIndex=2 freq=86850 minRate=0 maxRate=5.. cb type=11.. processInit:994>set radio reset: ticks=151.. processRadioReset:1009>clear radio reset: ticks=151.. MAC initialization done.. lora_send.. processStartRadioForEntropy:1061>listen for entropy: ticks=152.. processEntropy:1078>read entropy: ticks=152 entropy=0.. cb type=0.. LDL_MAC_ready.. LDL_MAC_otaa.. LDL_MAC_addChannel:790>chIndex=0 freq=86810 minRate=0 maxRate=5.. LDL_MAC_addChannel:790>chIndex=1 freq=86830 minRate=0 maxRate=5.. LDL_MAC_addChannel:790>chIndex=2 freq=86850 minRate=0 maxRate=5.. avr_gdb_init listening on port 1234
Re: Memory corruption (?) I don't understand
BERTRAND Joël a écrit : > Trampas Stern a écrit : >> Does the code run if you comment out 'lora_send("coucou", 7); > > Yes. To be honnest, if I comment out LDL_MAC_otaa(), it runs. > >> Does it still crash with simple main() { _delay_ms(1000); >> stty_print("hello");} >> >> That is start removing code until it works, when you find the line in >> main that causes a crash, go into that function and start removing code >> until it works. Slowly you will find the problem. > > I have tried (and I continue...) but I'm not sure that this bug is in > LoRaWAN library. Indeed, LDL_MAC_otaa() shows my bug but, sometimes, > just with a debug trace in main(), LDL_MAC_otaa() doesn't crash. In lorawan/ldl_mac.c, you have : enum ldl_mac_status LDL_MAC_otaa(struct ldl_mac *self) { ... fillJoinBuffer(self, U16(self->devNonce)); self->devNonce++; arg.dev_nonce_updated.nextDevNonce = self->devNonce; self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ); self->tx.power = 0; self->op = LDL_OP_JOINING; ... } If I comment out : self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ); Firmware doesn't crash. This pointer is a pointer to app_handler() in ./lorawan.c. Now, I suppose that this pointer is modified. Thus, I have add a debug trace : enum ldl_mac_status LDL_MAC_otaa(struct ldl_mac *self) { enum ldl_mac_status retval; union ldl_mac_response_arg arg; ... unsigned char t[80]; snprintf(t, 80, "self->handler=%p\r\n", self->handler); stty_print(t); self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ); self->tx.power = 0; self->op = LDL_OP_JOINING; ... } and with this simple debug trace, it doesn't crash anymore !...
Re: Memory corruption (?) I don't understand
Trampas Stern a écrit : > Does the code run if you comment out 'lora_send("coucou", 7); Yes. To be honnest, if I comment out LDL_MAC_otaa(), it runs. > Does it still crash with simple main() { _delay_ms(1000); > stty_print("hello");} > > That is start removing code until it works, when you find the line in > main that causes a crash, go into that function and start removing code > until it works. Slowly you will find the problem. I have tried (and I continue...) but I'm not sure that this bug is in LoRaWAN library. Indeed, LDL_MAC_otaa() shows my bug but, sometimes, just with a debug trace in main(), LDL_MAC_otaa() doesn't crash. Regards, JB
Re: Memory corruption (?) I don't understand
David Brown a écrit : > On 22/06/2021 18:00, Ian Molton wrote: >> On 22/06/2021 12:59, BERTRAND Joël wrote: >>> Hello, >>> >>> I'm writing a firmware for a board that uses a ATMega 1284. Firmware >>> continuously restarts >> >> This is likely a branch through zero. >> >> Common causes on AVR are bad function pointers, and stack damage. >> > > A good way to check this is to track the reset cause. At startup, look > for the reset cause (and print it out on debugging, record it in logs, > or whatever is appropriate for the system), then clear the reset cause > flags. If you find you are starting up and there are no reset cause > flags set, then you haven't actually reset - you've just jumped to > address 0. > > MCUSR=02 (after external reset) = Systella L100-A = Booting firmware 2021062218 SPI initialized Reset LORA Reset LORA done LoRaWAN 1.1 Initialization SX1262 Initialization SX1262 done 0004A30B0013F046 MAC initialization LDL_MAC_addChannel:786>chIndex=0 freq=86810 minRate=0 maxRate=5 LDL_MAC_addChannel:786>chIndex=1 freq=86830 minRate=0 maxRate=5 LDL_MAC_addChannel:786>chIndex=2 freq=86850 minRate=0 maxRate=5 cb type=11 processInit:990>set radio reset: ticks=126 processRadioReset:1005>clear radio reset: ticks=126 MAC initialization done lora_send processStartRadioForEntropy:1057>listen for entropy: ticks=127 processEntropy:1074>read entropy: ticks=127 entropy=1385813498 cb type=0 LDL_MAC_ready LDL_MAC_otaa LDL_MAC_addChannel:786>chIndex=0 freq=86810 minRate=0 maxRate=5 LDL_MAC_addChannel:786>chIndex=1 freq=86830 minRate=0 maxRate=5 LDL_MAC_addChannel:786>chIndex=2 freq=86850 minRate=0 maxRate=5 MCUSR=00 = Systella L100-A = Booting firmware 2021062218 SPI initialized ... RESET flags are always 0 when firmwware restarts...
Re: Memory corruption (?) I don't understand
Does the code run if you comment out 'lora_send("coucou", 7); Does it still crash with simple main() { _delay_ms(1000); stty_print("hello");} That is start removing code until it works, when you find the line in main that causes a crash, go into that function and start removing code until it works. Slowly you will find the problem. On Tue, Jun 22, 2021 at 12:00 PM Ian Molton wrote: > On 22/06/2021 12:59, BERTRAND Joël wrote: > > Hello, > > > > I'm writing a firmware for a board that uses a ATMega 1284. > Firmware > > continuously restarts > > This is likely a branch through zero. > > Common causes on AVR are bad function pointers, and stack damage. > > -Ian > >
Re: Memory corruption (?) I don't understand
On 22/06/2021 18:00, Ian Molton wrote: > On 22/06/2021 12:59, BERTRAND Joël wrote: >> Hello, >> >> I'm writing a firmware for a board that uses a ATMega 1284. Firmware >> continuously restarts > > This is likely a branch through zero. > > Common causes on AVR are bad function pointers, and stack damage. > A good way to check this is to track the reset cause. At startup, look for the reset cause (and print it out on debugging, record it in logs, or whatever is appropriate for the system), then clear the reset cause flags. If you find you are starting up and there are no reset cause flags set, then you haven't actually reset - you've just jumped to address 0.
Re: Memory corruption (?) I don't understand
On 22/06/2021 12:59, BERTRAND Joël wrote: > Hello, > > I'm writing a firmware for a board that uses a ATMega 1284. Firmware > continuously restarts This is likely a branch through zero. Common causes on AVR are bad function pointers, and stack damage. -Ian
Re: Memory corruption (?) I don't understand
Trampas Stern a écrit : > Can you comment out lines and have firmware run? I don't understand. I have tried to comment out some lines. Sometimes firmware runs as expected, sometimes, it continuously reboots. When I comment out some lines or add some debug trace, firmware desn't crash at the same place. I have deleted some piece of code to only initialize my board and to try to access to SX1262... > Could it be that your power supply is dropping and browning out? Nope, unfortunately. I have done tests with in-board uninterruptible power supply and with an external 150W power supply. Please note that simavr shows exactly the same result : hilbert:[~/cvs/firmware-antivol] > simavr -t -vvv -m atmega1284 -f 1600 firmware.elf Loaded 95692 .text at address 0x0 Loaded 5646 .data Loaded 2276 .eeprom .. =.. Systella L100-A.. =.. .. Booting firmware 2021062213.. SPI initialized.. Reset LORA.. Reset LORA done.. LoRaWAN 1.1.. Initialization SX1262.. Initialization SX1262 done.. .. MAC initialization.. LDL_MAC_addChannel:786>chIndex=0 freq=86810 minRate=0 maxRate=5.. LDL_MAC_addChannel:786>chIndex=1 freq=86830 minRate=0 maxRate=5.. LDL_MAC_addChannel:786>chIndex=2 freq=86850 minRate=0 maxRate=5.. cb type=11.. processInit:990>set radio reset: ticks=126.. processRadioReset:1005>clear radio reset: ticks=126.. MAC initialization done.. lora_send.. processStartRadioForEntropy:1057>listen for entropy: ticks=127.. processEntropy:1074>read entropy: ticks=127 entropy=0.. cb type=0.. LDL_MAC_ready.. LDL_MAC_otaa.. LDL_MAC_addChannel:786>chIndex=0 freq=86810 minRate=0 maxRate=5.. LDL_MAC_addChannel:786>chIndex=1 freq=86830 minRate=0 maxRate=5.. LDL_MAC_addChannel:786>chIndex=2 freq=86850 minRate=0 maxRate=5.. avr_gdb_init listening on port 1234 Real board reboots (or stalls) when simavr enters in gdb. Regards, JB
Re: Memory corruption (?) I don't understand
Can you comment out lines and have firmware run? Could it be that your power supply is dropping and browning out? On Tue, Jun 22, 2021 at 7:59 AM BERTRAND Joël wrote: > Hello, > > I'm writing a firmware for a board that uses a ATMega 1284. > Firmware > continuously restarts and I'm not able to fix this bug. It looks like a > memory corruption, but after several days without finding this bug, I > doubt. Hardware seems to be bug free and runs as expected. > > I have uploaded this firmware at > https://hilbert.systella.fr/public/firmware.tar.gz > > In this tarball, firmware.elf was compiled with gcc 11.1.0 (and > crashes). > > I have removed all multitasking capabilities, mutex_* functions > don't > nothing. Firmware crashes in lora_send(). There is no memory allocation > (only one malloc() in this code but in function gnss() and this function > is not called). Firmware starts in main(), initialises some chips and > LoRa module (SX1262) and enters in a loop. > > simavr gives the same result, but avr-gdb doesn't give usable > information. I have tried to build firmware with gcc 11.1.0 (instead of > 5.4.0). Sams result, but avr-gdb now returns "internal error". Thus, I > have built a new avr-gdb (9.2). Problem : > > hilbert:[/usr/local/cross/avr/bin] > ./avr-gdb -v > Erreur de segmentation > hilbert:[/usr/local/cross/avr/bin] > > > I have tried to bissect, to add debug traces (with some debug > traces, > firmware runs as expected !) without result. I have tried -Os and -O2 > also. I no longer know how to correct the problem. > > Help will be welcome. > > Best regards, > > JB > >