Re: Memory corruption (?) I don't understand

2021-06-24 Thread BERTRAND Joël
Senthil Kumar a écrit :
> https://gcc.gnu.org/bugzilla/
> 
> You would need a reduced testcase/preprocessed file (xxx.i obtained by
> compiling with -save-temps) to debug the issue -  the .S file isn't
> useful to debug the compiler.

Done.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101188



Re: Memory corruption (?) I don't understand

2021-06-24 Thread Senthil Kumar
https://gcc.gnu.org/bugzilla/

You would need a reduced testcase/preprocessed file (xxx.i obtained by
compiling with -save-temps) to debug the issue -  the .S file isn't
useful to debug the compiler.

Regards
Senthil

On Thu, Jun 24, 2021 at 11:36 AM BERTRAND Joël
 wrote:
>
> Senthil Kumar a écrit :
> > This looks like a miscompilation to me.
>
> ...
>
> > What version of the compiler are you using? Can you make a reduced
> > testcase containing just the LDL_MAC_otaa function?
>
> Just a question. Should I fill a bug report ? And if yes, where  ?
>
> Best regards,
>
> JKB



Re: Memory corruption (?) I don't understand

2021-06-24 Thread BERTRAND Joël
Senthil Kumar a écrit :
> This looks like a miscompilation to me.

...

> What version of the compiler are you using? Can you make a reduced
> testcase containing just the LDL_MAC_otaa function?

Just a question. Should I fill a bug report ? And if yes, where  ?

Best regards,

JKB



Re: Memory corruption (?) I don't understand

2021-06-23 Thread BERTRAND Joël
Senthil Kumar a écrit :
> What version of the compiler are you using?

I have seen this bug with Debian compiler :

hilbert:[~] > avr-gcc -v
Using built-in specs.
Reading specs from /usr/lib/gcc/avr/5.4.0/device-specs/specs-avr2
COLLECT_GCC=avr-gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/avr/5.4.0/lto-wrapper
Target: avr
Configured with: ../gcc/configure -v --enable-languages=c,c++
--prefix=/usr/lib --infodir=/usr/share/info --mandir=/usr/share/man
--bindir=/usr/bin --libexecdir=/usr/lib --libdir=/usr/lib
--enable-shared --with-system-zlib --enable-long-long --enable-nls
--without-included-gettext --disable-libssp --build=x86_64-linux-gnu
--host=x86_64-linux-gnu --target=avr CFLAGS='-g -O2
-fdebug-prefix-map=/build/gcc-avr-Q3AFj7/gcc-avr-5.4.0+Atmel3.6.2=.
-fstack-protector-strong -Wformat ' CPPFLAGS='-Wdate-time
-D_FORTIFY_SOURCE=2' CXXFLAGS='-g -O2
-fdebug-prefix-map=/build/gcc-avr-Q3AFj7/gcc-avr-5.4.0+Atmel3.6.2=.
-fstack-protector-strong -Wformat ' FCFLAGS='-g -O2
-fdebug-prefix-map=/build/gcc-avr-Q3AFj7/gcc-avr-5.4.0+Atmel3.6.2=.
-fstack-protector-strong' FFLAGS='-g -O2
-fdebug-prefix-map=/build/gcc-avr-Q3AFj7/gcc-avr-5.4.0+Atmel3.6.2=.
-fstack-protector-strong' GCJFLAGS='-g -O2
-fdebug-prefix-map=/build/gcc-avr-Q3AFj7/gcc-avr-5.4.0+Atmel3.6.2=.
-fstack-protector-strong' LDFLAGS=-Wl,-z,relro OBJCFLAGS='-g -O2
-fdebug-prefix-map=/build/gcc-avr-Q3AFj7/gcc-avr-5.4.0+Atmel3.6.2=.
-fstack-protector-strong -Wformat ' OBJCXXFLAGS='-g -O2
-fdebug-prefix-map=/build/gcc-avr-Q3AFj7/gcc-avr-5.4.0+Atmel3.6.2=.
-fstack-protector-strong -Wformat '
Thread model: single
gcc version 5.4.0 (GCC)

I have tried to build another compiler from scratch :
hilbert:[~] > /usr/local/cross/avr/bin/avr-gcc -v
Using built-in specs.
Reading specs from
/usr/local/cross/avr/lib/gcc/avr/11.1.0/device-specs/specs-avr2
COLLECT_GCC=/usr/local/cross/avr/bin/avr-gcc
COLLECT_LTO_WRAPPER=/usr/local/cross/avr/libexec/gcc/avr/11.1.0/lto-wrapper
Target: avr
Configured with: /usr/local/cross/build/avr/src/gcc/configure
--build=x86_64-build_pc-linux-gnu --host=x86_64-build_pc-linux-gnu
--target=avr --prefix=/usr/local/cross/avr
--exec_prefix=/usr/local/cross/avr
--with-local-prefix=/usr/local/cross/avr/avr
--with-headers=/usr/local/cross/avr/avr/include --with-newlib
--enable-threads=no --disable-shared --with-pkgversion='crosstool-NG
1.24.0.349_6d00833' --disable-__cxa_atexit --disable-libgomp
--disable-libmudflap --disable-libmpx --disable-libssp
--disable-libquadmath --disable-libquadmath-support
--with-gmp=/usr/local/cross/build/avr/buildtools
--with-mpfr=/usr/local/cross/build/avr/buildtools
--with-mpc=/usr/local/cross/build/avr/buildtools
--with-isl=/usr/local/cross/build/avr/buildtools --enable-lto
--with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic
-lm' --enable-target-optspace --disable-nls --enable-multiarch
--enable-languages=c
Thread model: single
Supported LTO compression algorithms: zlib zstd
gcc version 11.1.0 (crosstool-NG 1.24.0.349_6d00833)

.S I have uploaded were compiled with 11.1.0.

> Can you make a reduced
> testcase containing just the LDL_MAC_otaa function?

Firmware I have sent is a minimal example. Of course, a lot of code is
compiled but is not executed. This code only initializes real board and
sents a packet to my LoRaWAN base station.

Regards,

JKB



Re: Memory corruption (?) I don't understand

2021-06-23 Thread Senthil Kumar
This looks like a miscompilation to me. Specifically, in good_2.S, you have

a318: f6 01   movw r30, r12
...

a33c: e3 50   subi r30, 0x03 ; 3
a33e: ff 4f   sbci r31, 0xFF ; 255
a340: 01 90   ld r0, Z+
a342: f0 81   ld r31, Z
a344: e0 2d   mov r30, r0

This loads Z with R13:R12, and then later offsets it by -0xFF03 to
obtain the address of self->handler, which is then used by the icall
instruction.

a31c: f6 01   movw r30, r12
...

 a340: 68 01   movw r12, r16
a342: ff e4   ldi r31, 0x4F ; 79
a344: cf 1a   sub r12, r31
a346: fe ef   ldi r31, 0xFE ; 254
a348: df 0a   sbc r13, r31
a34a: e3 50   subi r30, 0x03 ; 3
a34c: ff 4f   sbci r31, 0xFF ; 255
a34e: 01 90   ld r0, Z+
a350: f0 81   ld r31, Z
a352: e0 2d   mov r30, r0

Note the clobbering of R31:R30 with immediate values *before* the
offsetting is done. I think this is a codegen bug - the compiler
should have either picked a different set of regs to subtract R13:R12
from, or should have restored Z with a movw r30, r12 before 0xa34a.

What version of the compiler are you using? Can you make a reduced
testcase containing just the LDL_MAC_otaa function?

Regards
Senthil


On Wed, Jun 23, 2021 at 3:12 PM BERTRAND Joël  wrote:
>
> David Brown a écrit :
> > On 23/06/2021 10:37, BERTRAND Joël wrote:
> >
> >> In bad.S, self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ) gives :
> >> a340:   68 01   movwr12, r16
> >> a342:   ff e4   ldi r31, 0x4F   ; 79
> >> a344:   cf 1a   sub r12, r31
> >> a346:   fe ef   ldi r31, 0xFE   ; 254
> >> a348:   df 0a   sbc r13, r31
> >> a34a:   e3 50   subir30, 0x03   ; 3
> >> a34c:   ff 4f   sbcir31, 0xFF   ; 255
> >> a34e:   01 90   ld  r0, Z+
> >> a350:   f0 81   ld  r31, Z
> >> a352:   e0 2d   mov r30, r0
> >> a354:   ae 01   movwr20, r28
> >> a356:   4f 5a   subir20, 0xAF   ; 175
> >> a358:   5f 4f   sbcir21, 0xFF   ; 255
> >> a35a:   65 e0   ldi r22, 0x05   ; 5
> >> a35c:   70 e0   ldi r23, 0x00   ; 0
> >> a35e:   d6 01   movwr26, r12
> >> a360:   8d 91   ld  r24, X+
> >> a362:   9c 91   ld  r25, X
> >> a364:   09 95   icall
> >>
> >>  I have rebuilt a second firmware that works as expected with:
> >>
> >> if (self->handler == NULL) for(;;);
> >> self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, );
> >>
> >>  Asmb output contains :
> >>
> >> if (self->handler == NULL) for(;;);
> >> a33c:   e3 50   subir30, 0x03   ; 3
> >> a33e:   ff 4f   sbcir31, 0xFF   ; 255
> >> a340:   01 90   ld  r0, Z+
> >> a342:   f0 81   ld  r31, Z
> >> a344:   e0 2d   mov r30, r0
> >> a346:   30 97   sbiwr30, 0x00   ; 0
> >> a348:   09 f4   brne.+2 ; 0xa34c
> >> a34a:   4a c0   rjmp.+148   ; 0xa3e0
> >> self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, );
> >> a34c:   68 01   movwr12, r16
> >> a34e:   2f e4   ldi r18, 0x4F   ; 79
> >> a350:   c2 1a   sub r12, r18
> >> a352:   2e ef   ldi r18, 0xFE   ; 254
> >> a354:   d2 0a   sbc r13, r18
> >> a356:   ae 01   movwr20, r28
> >> a358:   4f 5a   subir20, 0xAF   ; 175
> >> a35a:   5f 4f   sbcir21, 0xFF   ; 255
> >> a35c:   65 e0   ldi r22, 0x05   ; 5
> >> a35e:   70 e0   ldi r23, 0x00   ; 0
> >> a360:   d6 01   movwr26, r12
> >> a362:   8d 91   ld  r24, X+
> >> a364:   9c 91   ld  r25, X
> >> a366:   09 95   icall
> >>
> >>  I'm not a AVR ASMB specialist, but I don't understand why
> >> self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ) produces a
> >> different code (with of course the same compilation options) when it was
> >> preceded by if (self->handler == NULL) for(;;);
> >>
> >
> > I haven't looked at any of the rest of your code.  But here I think the
> > differences are just that after the NULL check, the compiler knows that
> > self->handler is already in Z, and this can be re-used later in the
> > indirect call.  So the registers used for getting the other parameters,
> > and perhaps the order they are collected, can change accordingly.
> > Optimisations and register allocation algorithms can often make the
> > assembly appear significantly differently when in fact the changes are
> > typically cosmetic.
>
> Maybe, but how do you can explain that with a NULL test, code runs as
> expected and without, it crashes ?
>
> I have tried another test :
>
> if (self->handler != app_handler) for(;;);
> self->handler(self->app, 

Re: Memory corruption (?) I don't understand

2021-06-23 Thread BERTRAND Joël
David Brown a écrit :
> On 23/06/2021 10:37, BERTRAND Joël wrote:
> 
>> In bad.S, self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ) gives :
>> a340:   68 01   movwr12, r16
>> a342:   ff e4   ldi r31, 0x4F   ; 79
>> a344:   cf 1a   sub r12, r31
>> a346:   fe ef   ldi r31, 0xFE   ; 254
>> a348:   df 0a   sbc r13, r31
>> a34a:   e3 50   subir30, 0x03   ; 3
>> a34c:   ff 4f   sbcir31, 0xFF   ; 255
>> a34e:   01 90   ld  r0, Z+
>> a350:   f0 81   ld  r31, Z
>> a352:   e0 2d   mov r30, r0
>> a354:   ae 01   movwr20, r28
>> a356:   4f 5a   subir20, 0xAF   ; 175
>> a358:   5f 4f   sbcir21, 0xFF   ; 255
>> a35a:   65 e0   ldi r22, 0x05   ; 5
>> a35c:   70 e0   ldi r23, 0x00   ; 0
>> a35e:   d6 01   movwr26, r12
>> a360:   8d 91   ld  r24, X+
>> a362:   9c 91   ld  r25, X
>> a364:   09 95   icall
>>
>>  I have rebuilt a second firmware that works as expected with:
>>
>> if (self->handler == NULL) for(;;);
>> self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, );
>>
>>  Asmb output contains :
>>
>> if (self->handler == NULL) for(;;);
>> a33c:   e3 50   subir30, 0x03   ; 3
>> a33e:   ff 4f   sbcir31, 0xFF   ; 255
>> a340:   01 90   ld  r0, Z+
>> a342:   f0 81   ld  r31, Z
>> a344:   e0 2d   mov r30, r0
>> a346:   30 97   sbiwr30, 0x00   ; 0
>> a348:   09 f4   brne.+2 ; 0xa34c
>> a34a:   4a c0   rjmp.+148   ; 0xa3e0
>> self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, );
>> a34c:   68 01   movwr12, r16
>> a34e:   2f e4   ldi r18, 0x4F   ; 79
>> a350:   c2 1a   sub r12, r18
>> a352:   2e ef   ldi r18, 0xFE   ; 254
>> a354:   d2 0a   sbc r13, r18
>> a356:   ae 01   movwr20, r28
>> a358:   4f 5a   subir20, 0xAF   ; 175
>> a35a:   5f 4f   sbcir21, 0xFF   ; 255
>> a35c:   65 e0   ldi r22, 0x05   ; 5
>> a35e:   70 e0   ldi r23, 0x00   ; 0
>> a360:   d6 01   movwr26, r12
>> a362:   8d 91   ld  r24, X+
>> a364:   9c 91   ld  r25, X
>> a366:   09 95   icall
>>
>>  I'm not a AVR ASMB specialist, but I don't understand why
>> self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ) produces a
>> different code (with of course the same compilation options) when it was
>> preceded by if (self->handler == NULL) for(;;);
>>
> 
> I haven't looked at any of the rest of your code.  But here I think the
> differences are just that after the NULL check, the compiler knows that
> self->handler is already in Z, and this can be re-used later in the
> indirect call.  So the registers used for getting the other parameters,
> and perhaps the order they are collected, can change accordingly.
> Optimisations and register allocation algorithms can often make the
> assembly appear significantly differently when in fact the changes are
> typically cosmetic.

Maybe, but how do you can explain that with a NULL test, code runs as
expected and without, it crashes ?

I have tried another test :

if (self->handler != app_handler) for(;;);
self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, );

and firmware runs as expected. But this one crashes :

if (self->handler != app_handler) {}
self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, );

I have uploded three firmwares (built with -Os) :

https://hilbert.systella.fr/public/good.S
https://hilbert.systella.fr/public/good_2.S
https://hilbert.systella.fr/public/bad.S

Best regards,

JKB



Re: Memory corruption (?) I don't understand

2021-06-23 Thread Senthil Kumar
This part

> a340:   68 01   movwr12, r16
> a342:   ff e4   ldi r31, 0x4F   ; 79
> a344:   cf 1a   sub r12, r31
> a346:   fe ef   ldi r31, 0xFE   ; 254
> a348:   df 0a   sbc r13, r31
> a34a:   e3 50   subir30, 0x03   ; 3
> a34c:   ff 4f   sbcir31, 0xFF   ; 255
> a34e:   01 90   ld  r0, Z+
> a350:   f0 81   ld  r31, Z
> a352:   e0 2d   mov r30, r0

looks suspicious - the Z register is loaded with 0xFE4F and then just
subtracted with 0xFF03 to obtain self->handler, whereas self->app is
obtained by offsetting R13:R12.

Explicitly checking if self->handler is NULL also has the sub, but I
presume R31:R30 are loaded with the address of self beforehand?

> a33c:   e3 50   subir30, 0x03   ; 3
> a33e:   ff 4f   sbcir31, 0xFF   ; 255
> a340:   01 90   ld  r0, Z+
> a342:   f0 81   ld  r31, Z
> a344:   e0 2d   mov r30, r0

Regards
Senthil

On Wed, Jun 23, 2021 at 2:07 PM BERTRAND Joël  wrote:
>
> Comparaison between good.S and bad.S.
>
> Good.elf is compiled with :
>
> app_handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, );
>
> instead of bad.elf that is compiled with :
>
> self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, );
>
> in enum ldl_mac_status LDL_MAC_otaa(struct ldl_mac *self)
>
> In both .S files:
> - app_handler is written @22f2.
> - argmac.handler = app_handler gives following code :
>
> 2886:   89 e7   ldi r24, 0x79   ; 121
> 2888:   91 e1   ldi r25, 0x11   ; 17
> 288a:   9c 87   std Y+12, r25   ; 0x0c
> 288c:   8b 87   std Y+11, r24   ; 0x0b
>
> In good.S, app_handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ) is
> built as :
> a340:   68 01   movwr12, r16
> a342:   ff e4   ldi r31, 0x4F   ; 79
> a344:   cf 1a   sub r12, r31
> a346:   fe ef   ldi r31, 0xFE   ; 254
> a348:   df 0a   sbc r13, r31
> a34a:   ae 01   movwr20, r28
> a34c:   4f 5a   subir20, 0xAF   ; 175
> a34e:   5f 4f   sbcir21, 0xFF   ; 255
> // 0x05 0x00 = LDL_MAC_DEV_NONCE_UPDATED
> a350:   65 e0   ldi r22, 0x05   ; 5
> a352:   70 e0   ldi r23, 0x00   ; 0
> a354:   d6 01   movwr26, r12
> a356:   8d 91   ld  r24, X+
> a358:   9c 91   ld  r25, X
> a35a:   0e 94 79 11 call0x22f2  ; 0x22f2 
>
> In bad.S, self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ) gives :
> a340:   68 01   movwr12, r16
> a342:   ff e4   ldi r31, 0x4F   ; 79
> a344:   cf 1a   sub r12, r31
> a346:   fe ef   ldi r31, 0xFE   ; 254
> a348:   df 0a   sbc r13, r31
> a34a:   e3 50   subir30, 0x03   ; 3
> a34c:   ff 4f   sbcir31, 0xFF   ; 255
> a34e:   01 90   ld  r0, Z+
> a350:   f0 81   ld  r31, Z
> a352:   e0 2d   mov r30, r0
> a354:   ae 01   movwr20, r28
> a356:   4f 5a   subir20, 0xAF   ; 175
> a358:   5f 4f   sbcir21, 0xFF   ; 255
> a35a:   65 e0   ldi r22, 0x05   ; 5
> a35c:   70 e0   ldi r23, 0x00   ; 0
> a35e:   d6 01   movwr26, r12
> a360:   8d 91   ld  r24, X+
> a362:   9c 91   ld  r25, X
> a364:   09 95   icall
>
> I have rebuilt a second firmware that works as expected with:
>
> if (self->handler == NULL) for(;;);
> self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, );
>
> Asmb output contains :
>
> if (self->handler == NULL) for(;;);
> a33c:   e3 50   subir30, 0x03   ; 3
> a33e:   ff 4f   sbcir31, 0xFF   ; 255
> a340:   01 90   ld  r0, Z+
> a342:   f0 81   ld  r31, Z
> a344:   e0 2d   mov r30, r0
> a346:   30 97   sbiwr30, 0x00   ; 0
> a348:   09 f4   brne.+2 ; 0xa34c
> a34a:   4a c0   rjmp.+148   ; 0xa3e0
> self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, );
> a34c:   68 01   movwr12, r16
> a34e:   2f e4   ldi r18, 0x4F   ; 79
> a350:   c2 1a   sub r12, r18
> a352:   2e ef   ldi r18, 0xFE   ; 254
> a354:   d2 0a   sbc r13, r18
> a356:   ae 01   movwr20, r28
> a358:   4f 5a   subir20, 0xAF   ; 175
> a35a:   5f 4f   sbcir21, 0xFF   ; 255
> a35c:   65 e0   ldi r22, 0x05   ; 5
> a35e:   70 e0   ldi r23, 0x00   ; 0
> a360:   d6 01   movwr26, r12
> a362:   8d 91   ld  r24, X+
> a364:   9c 91   ld  r25, X
> a366:   09 95   icall
>
> I'm not a AVR ASMB specialist, but I don't understand why
> 

Re: Memory corruption (?) I don't understand

2021-06-23 Thread David Brown
On 23/06/2021 10:37, BERTRAND Joël wrote:

> In bad.S, self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ) gives :
> a340:   68 01   movwr12, r16
> a342:   ff e4   ldi r31, 0x4F   ; 79
> a344:   cf 1a   sub r12, r31
> a346:   fe ef   ldi r31, 0xFE   ; 254
> a348:   df 0a   sbc r13, r31
> a34a:   e3 50   subir30, 0x03   ; 3
> a34c:   ff 4f   sbcir31, 0xFF   ; 255
> a34e:   01 90   ld  r0, Z+
> a350:   f0 81   ld  r31, Z
> a352:   e0 2d   mov r30, r0
> a354:   ae 01   movwr20, r28
> a356:   4f 5a   subir20, 0xAF   ; 175
> a358:   5f 4f   sbcir21, 0xFF   ; 255
> a35a:   65 e0   ldi r22, 0x05   ; 5
> a35c:   70 e0   ldi r23, 0x00   ; 0
> a35e:   d6 01   movwr26, r12
> a360:   8d 91   ld  r24, X+
> a362:   9c 91   ld  r25, X
> a364:   09 95   icall
> 
>   I have rebuilt a second firmware that works as expected with:
> 
> if (self->handler == NULL) for(;;);
> self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, );
> 
>   Asmb output contains :
> 
> if (self->handler == NULL) for(;;);
> a33c:   e3 50   subir30, 0x03   ; 3
> a33e:   ff 4f   sbcir31, 0xFF   ; 255
> a340:   01 90   ld  r0, Z+
> a342:   f0 81   ld  r31, Z
> a344:   e0 2d   mov r30, r0
> a346:   30 97   sbiwr30, 0x00   ; 0
> a348:   09 f4   brne.+2 ; 0xa34c
> a34a:   4a c0   rjmp.+148   ; 0xa3e0
> self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, );
> a34c:   68 01   movwr12, r16
> a34e:   2f e4   ldi r18, 0x4F   ; 79
> a350:   c2 1a   sub r12, r18
> a352:   2e ef   ldi r18, 0xFE   ; 254
> a354:   d2 0a   sbc r13, r18
> a356:   ae 01   movwr20, r28
> a358:   4f 5a   subir20, 0xAF   ; 175
> a35a:   5f 4f   sbcir21, 0xFF   ; 255
> a35c:   65 e0   ldi r22, 0x05   ; 5
> a35e:   70 e0   ldi r23, 0x00   ; 0
> a360:   d6 01   movwr26, r12
> a362:   8d 91   ld  r24, X+
> a364:   9c 91   ld  r25, X
> a366:   09 95   icall
> 
>   I'm not a AVR ASMB specialist, but I don't understand why
> self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ) produces a
> different code (with of course the same compilation options) when it was
> preceded by if (self->handler == NULL) for(;;);
> 

I haven't looked at any of the rest of your code.  But here I think the
differences are just that after the NULL check, the compiler knows that
self->handler is already in Z, and this can be re-used later in the
indirect call.  So the registers used for getting the other parameters,
and perhaps the order they are collected, can change accordingly.
Optimisations and register allocation algorithms can often make the
assembly appear significantly differently when in fact the changes are
typically cosmetic.




Re: Memory corruption (?) I don't understand

2021-06-23 Thread BERTRAND Joël
Comparaison between good.S and bad.S.

Good.elf is compiled with :

app_handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, );

instead of bad.elf that is compiled with :

self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, );

in enum ldl_mac_status LDL_MAC_otaa(struct ldl_mac *self)

In both .S files:
- app_handler is written @22f2.
- argmac.handler = app_handler gives following code :

2886:   89 e7   ldi r24, 0x79   ; 121
2888:   91 e1   ldi r25, 0x11   ; 17
288a:   9c 87   std Y+12, r25   ; 0x0c
288c:   8b 87   std Y+11, r24   ; 0x0b

In good.S, app_handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ) is
built as :
a340:   68 01   movwr12, r16
a342:   ff e4   ldi r31, 0x4F   ; 79
a344:   cf 1a   sub r12, r31
a346:   fe ef   ldi r31, 0xFE   ; 254
a348:   df 0a   sbc r13, r31
a34a:   ae 01   movwr20, r28
a34c:   4f 5a   subir20, 0xAF   ; 175
a34e:   5f 4f   sbcir21, 0xFF   ; 255
// 0x05 0x00 = LDL_MAC_DEV_NONCE_UPDATED
a350:   65 e0   ldi r22, 0x05   ; 5
a352:   70 e0   ldi r23, 0x00   ; 0
a354:   d6 01   movwr26, r12
a356:   8d 91   ld  r24, X+
a358:   9c 91   ld  r25, X
a35a:   0e 94 79 11 call0x22f2  ; 0x22f2 

In bad.S, self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ) gives :
a340:   68 01   movwr12, r16
a342:   ff e4   ldi r31, 0x4F   ; 79
a344:   cf 1a   sub r12, r31
a346:   fe ef   ldi r31, 0xFE   ; 254
a348:   df 0a   sbc r13, r31
a34a:   e3 50   subir30, 0x03   ; 3
a34c:   ff 4f   sbcir31, 0xFF   ; 255
a34e:   01 90   ld  r0, Z+
a350:   f0 81   ld  r31, Z
a352:   e0 2d   mov r30, r0
a354:   ae 01   movwr20, r28
a356:   4f 5a   subir20, 0xAF   ; 175
a358:   5f 4f   sbcir21, 0xFF   ; 255
a35a:   65 e0   ldi r22, 0x05   ; 5
a35c:   70 e0   ldi r23, 0x00   ; 0
a35e:   d6 01   movwr26, r12
a360:   8d 91   ld  r24, X+
a362:   9c 91   ld  r25, X
a364:   09 95   icall

I have rebuilt a second firmware that works as expected with:

if (self->handler == NULL) for(;;);
self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, );

Asmb output contains :

if (self->handler == NULL) for(;;);
a33c:   e3 50   subir30, 0x03   ; 3
a33e:   ff 4f   sbcir31, 0xFF   ; 255
a340:   01 90   ld  r0, Z+
a342:   f0 81   ld  r31, Z
a344:   e0 2d   mov r30, r0
a346:   30 97   sbiwr30, 0x00   ; 0
a348:   09 f4   brne.+2 ; 0xa34c
a34a:   4a c0   rjmp.+148   ; 0xa3e0
self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, );
a34c:   68 01   movwr12, r16
a34e:   2f e4   ldi r18, 0x4F   ; 79
a350:   c2 1a   sub r12, r18
a352:   2e ef   ldi r18, 0xFE   ; 254
a354:   d2 0a   sbc r13, r18
a356:   ae 01   movwr20, r28
a358:   4f 5a   subir20, 0xAF   ; 175
a35a:   5f 4f   sbcir21, 0xFF   ; 255
a35c:   65 e0   ldi r22, 0x05   ; 5
a35e:   70 e0   ldi r23, 0x00   ; 0
a360:   d6 01   movwr26, r12
a362:   8d 91   ld  r24, X+
a364:   9c 91   ld  r25, X
a366:   09 95   icall

I'm not a AVR ASMB specialist, but I don't understand why
self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, ) produces a
different code (with of course the same compilation options) when it was
preceded by if (self->handler == NULL) for(;;);

Best regards,

JKB



Re: Memory corruption (?) I don't understand

2021-06-23 Thread BERTRAND Joël
Some news.

In lorawan/ldl_mac.c is defined LDL_MAC_otaa :

enum ldl_mac_status LDL_MAC_otaa(struct ldl_mac *self)
{
enum ldl_mac_status retval;
union ldl_mac_response_arg arg;

LDL_PEDANTIC(self != NULL)

if(self->ctx.joined){

retval = LDL_STATUS_JOINED;
}
else if(self->op == LDL_OP_NONE){

if(self->devNonce <= U32(UINT16_MAX)){

forgetNetwork(self);

self->trials = 0;

self->day = U32(60) * U32(60) * U32(24) * timeTPS;

#if defined(LDL_ENABLE_L2_1_1)
LDL_OPS_deriveJoinKeys(self);
#endif
fillJoinBuffer(self, U16(self->devNonce));

self->devNonce++;

arg.dev_nonce_updated.nextDevNonce = self->devNonce;

self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, );

self->tx.power = 0;

self->op = LDL_OP_JOINING;

if(self->state == LDL_STATE_IDLE){

self->state = LDL_STATE_WAIT_OTAA;
LDL_MAC_timerSet(self, LDL_TIMER_WAITA, 0);
}

retval = LDL_STATUS_OK;

LDL_DEBUG("OTAA is pending")
}
else{

/* need to re-init with a different JoinEUI */
retval = LDL_STATUS_DEVNONCE;
}
}
else{

retval = LDL_STATUS_BUSY;
}

return retval;
}

In this function, there is a call to self->handler(self->app,
LDL_MAC_DEV_NONCE_UPDATED, ). If I comment out this line, firmware
doesn't crash anymore. I think in a first time that self->handler is
nullified by something. OK, I've checked and I have added just before
self->handler() :

if (self->handler == NULL) for(;;);

After recompilation, firmware runs as expected (!) and doesn't enter in
loop (!). I don't understand why this test fix this bug. I have tried to
replace in a second time :

self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, );

by a direct call to app_handler function (and of course without previous
test):

app_handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, );

Thus :
- self->handler() triggers the bug ;
- I'm not sure that self->handler is really nullified as execution
doesn't enter in if (self->handler == NULL) for(;;);
- why and how does the line if (self->handler == NULL) for(;;); fix this
issue ?

self->handler() seems to be modified by something or CPU is not able to
jump to specified address. Is there a difference in generated asm code
between a call to a function pointer and a direct call ?

I can post three firmware.elf if required :
- original that crashes ;
- modified with if (self->handler == NULL) for(;;); that runs as expected :
- modified without test but with direct call to app_handler() that runs.

Best regards,

JKB



Re: Memory corruption (?) I don't understand

2021-06-22 Thread BERTRAND Joël
Trampas Stern a écrit :
> Also when you print out the size in the make file, how much SRAM does it
> indicate your program is using? 

AVR Memory Usage

Device: atmega1284

Program:  101324 bytes (77.3% Full)
(.text + .data + .bootloader)

Data:   6890 bytes (42.1% Full)
(.data + .bss + .noinit)

EEPROM: 2276 bytes (55.6% Full)
(.eeprom)



Re: Memory corruption (?) I don't understand

2021-06-22 Thread Trampas Stern
Also when you print out the size in the make file, how much SRAM does it
indicate your program is using?

On Tue, Jun 22, 2021 at 2:10 PM Trampas Stern  wrote:

> So I would recommend that you put in the ASSERT for the LDL_PEDANTIC()
> macro.
>
> Secondly, you need to be careful with PROGMEM, that is with AVR being a
> Harvard machine you can burn up SRAM quickly by not using PROGMEM  for
> constants.  For example with the LDL library you need to be sure
> that LDL_ENABLE_AVR is defined.
>
> Note the AVR is a great processor, however I have personally not used one
> in about 15 years.  The price and ease of programming ARM Cortex M have
> gotten such AVR are no longer viable for new development as I can develop
> products so much faster with ARM Cortex M parts, and their power and price
> is comparable to AVR.
>
> Another trick you can do is inside a function do this:
>
> void printMem(void) {
>uint8_t *ptr;
>printf("stack 0x%X\n", ); //this will be address on stack where ptr
> is.
>ptr=malloc(10);
>printf("heap 0x%X\n", ptr); //this will be address as to heap
>free(ptr);
> }
>
> This will help as you can do some rough tracking of stack and heap usage,
> assuming the AVR linker script has stack growing from top of memory to end
> of heap it will help you check for stack overflow.
>
>
>
>
>
> On Tue, Jun 22, 2021 at 1:39 PM BERTRAND Joël 
> wrote:
>
>> Strange. Following function runs as expected.
>>
>> enum ldl_mac_status LDL_MAC_otaa(struct ldl_mac *self)
>> {
>> enum ldl_mac_status retval;
>> union ldl_mac_response_arg arg;
>>
>> LDL_PEDANTIC(self != NULL)
>>
>> if(self->ctx.joined){
>>
>> retval = LDL_STATUS_JOINED;
>> }
>> else if(self->op == LDL_OP_NONE){
>>
>> if(self->devNonce <= U32(UINT16_MAX)){
>>
>> forgetNetwork(self);
>>
>> self->trials = 0;
>>
>> self->day = U32(60) * U32(60) * U32(24) * timeTPS;
>>
>> #if defined(LDL_ENABLE_L2_1_1)
>> LDL_OPS_deriveJoinKeys(self);
>> #endif
>> fillJoinBuffer(self, U16(self->devNonce));
>>
>> self->devNonce++;
>>
>> arg.dev_nonce_updated.nextDevNonce = self->devNonce;
>>
>> unsigned char t[80];
>> sprintf(t, "self->handler=%p\r\n", self->handler);
>> self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, );
>>
>> self->tx.power = 0;
>>
>> self->op = LDL_OP_JOINING;
>>
>> if(self->state == LDL_STATE_IDLE){
>>
>> self->state = LDL_STATE_WAIT_OTAA;
>> LDL_MAC_timerSet(self, LDL_TIMER_WAITA, 0);
>> }
>>
>> retval = LDL_STATUS_OK;
>>
>> LDL_DEBUG("OTAA is pending")
>> }
>> else{
>>
>> /* need to re-init with a different JoinEUI */
>> retval = LDL_STATUS_DEVNONCE;
>> }
>> }
>> else{
>>
>> retval = LDL_STATUS_BUSY;
>> }
>>
>> return retval;
>> }
>>
>> If I comment out sprintf(), it crashes. If I deplace this debug
>> trace
>> before or _after_ self->handler call, firmware runs as expected. I don't
>> understand. If there is a memory corruption somewhere, I could
>> understand that a debug trace _before_ the line that triggers the bug
>> can change something. But I don't understand why the following function
>> runs as expected :
>>
>> enum ldl_mac_status LDL_MAC_otaa(struct ldl_mac *self)
>> {
>> enum ldl_mac_status retval;
>> union ldl_mac_response_arg arg;
>>
>> LDL_PEDANTIC(self != NULL)
>>
>> if(self->ctx.joined){
>>
>> retval = LDL_STATUS_JOINED;
>> }
>> else if(self->op == LDL_OP_NONE){
>>
>> if(self->devNonce <= U32(UINT16_MAX)){
>>
>> forgetNetwork(self);
>>
>> self->trials = 0;
>>
>> self->day = U32(60) * U32(60) * U32(24) * timeTPS;
>>
>> #if defined(LDL_ENABLE_L2_1_1)
>> LDL_OPS_deriveJoinKeys(self);
>> #endif
>> fillJoinBuffer(self, U16(self->devNonce));
>>
>> self->devNonce++;
>>
>> arg.dev_nonce_updated.nextDevNonce = self->devNonce;
>>
>> self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, );
>>
>> self->tx.power = 0;
>>
>> self->op = LDL_OP_JOINING;
>>
>> if(self->state == LDL_STATE_IDLE){
>>
>> self->state = LDL_STATE_WAIT_OTAA;
>> LDL_MAC_timerSet(self, LDL_TIMER_WAITA, 0);
>> }
>>
>> retval = LDL_STATUS_OK;
>>
>> LDL_DEBUG("OTAA is pending")
>> }
>> else{
>>
>> /* need to re-init with a different JoinEUI */
>> retval = LDL_STATUS_DEVNONCE;
>> }
>> }
>> else{
>>
>> retval = LDL_STATUS_BUSY;
>> }
>>
>> unsigned char t[80];
>> sprintf(t, "self->handler=%p\r\n", self->handler);
>> return retval;
>> }
>>
>> Of course, if I comment out :
>>
>> unsigned char t[80];
>> sprintf(t, 

Re: Memory corruption (?) I don't understand

2021-06-22 Thread Trampas Stern
So I would recommend that you put in the ASSERT for the LDL_PEDANTIC()
macro.

Secondly, you need to be careful with PROGMEM, that is with AVR being a
Harvard machine you can burn up SRAM quickly by not using PROGMEM  for
constants.  For example with the LDL library you need to be sure
that LDL_ENABLE_AVR is defined.

Note the AVR is a great processor, however I have personally not used one
in about 15 years.  The price and ease of programming ARM Cortex M have
gotten such AVR are no longer viable for new development as I can develop
products so much faster with ARM Cortex M parts, and their power and price
is comparable to AVR.

Another trick you can do is inside a function do this:

void printMem(void) {
   uint8_t *ptr;
   printf("stack 0x%X\n", ); //this will be address on stack where ptr
is.
   ptr=malloc(10);
   printf("heap 0x%X\n", ptr); //this will be address as to heap
   free(ptr);
}

This will help as you can do some rough tracking of stack and heap usage,
assuming the AVR linker script has stack growing from top of memory to end
of heap it will help you check for stack overflow.





On Tue, Jun 22, 2021 at 1:39 PM BERTRAND Joël 
wrote:

> Strange. Following function runs as expected.
>
> enum ldl_mac_status LDL_MAC_otaa(struct ldl_mac *self)
> {
> enum ldl_mac_status retval;
> union ldl_mac_response_arg arg;
>
> LDL_PEDANTIC(self != NULL)
>
> if(self->ctx.joined){
>
> retval = LDL_STATUS_JOINED;
> }
> else if(self->op == LDL_OP_NONE){
>
> if(self->devNonce <= U32(UINT16_MAX)){
>
> forgetNetwork(self);
>
> self->trials = 0;
>
> self->day = U32(60) * U32(60) * U32(24) * timeTPS;
>
> #if defined(LDL_ENABLE_L2_1_1)
> LDL_OPS_deriveJoinKeys(self);
> #endif
> fillJoinBuffer(self, U16(self->devNonce));
>
> self->devNonce++;
>
> arg.dev_nonce_updated.nextDevNonce = self->devNonce;
>
> unsigned char t[80];
> sprintf(t, "self->handler=%p\r\n", self->handler);
> self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, );
>
> self->tx.power = 0;
>
> self->op = LDL_OP_JOINING;
>
> if(self->state == LDL_STATE_IDLE){
>
> self->state = LDL_STATE_WAIT_OTAA;
> LDL_MAC_timerSet(self, LDL_TIMER_WAITA, 0);
> }
>
> retval = LDL_STATUS_OK;
>
> LDL_DEBUG("OTAA is pending")
> }
> else{
>
> /* need to re-init with a different JoinEUI */
> retval = LDL_STATUS_DEVNONCE;
> }
> }
> else{
>
> retval = LDL_STATUS_BUSY;
> }
>
> return retval;
> }
>
> If I comment out sprintf(), it crashes. If I deplace this debug
> trace
> before or _after_ self->handler call, firmware runs as expected. I don't
> understand. If there is a memory corruption somewhere, I could
> understand that a debug trace _before_ the line that triggers the bug
> can change something. But I don't understand why the following function
> runs as expected :
>
> enum ldl_mac_status LDL_MAC_otaa(struct ldl_mac *self)
> {
> enum ldl_mac_status retval;
> union ldl_mac_response_arg arg;
>
> LDL_PEDANTIC(self != NULL)
>
> if(self->ctx.joined){
>
> retval = LDL_STATUS_JOINED;
> }
> else if(self->op == LDL_OP_NONE){
>
> if(self->devNonce <= U32(UINT16_MAX)){
>
> forgetNetwork(self);
>
> self->trials = 0;
>
> self->day = U32(60) * U32(60) * U32(24) * timeTPS;
>
> #if defined(LDL_ENABLE_L2_1_1)
> LDL_OPS_deriveJoinKeys(self);
> #endif
> fillJoinBuffer(self, U16(self->devNonce));
>
> self->devNonce++;
>
> arg.dev_nonce_updated.nextDevNonce = self->devNonce;
>
> self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, );
>
> self->tx.power = 0;
>
> self->op = LDL_OP_JOINING;
>
> if(self->state == LDL_STATE_IDLE){
>
> self->state = LDL_STATE_WAIT_OTAA;
> LDL_MAC_timerSet(self, LDL_TIMER_WAITA, 0);
> }
>
> retval = LDL_STATUS_OK;
>
> LDL_DEBUG("OTAA is pending")
> }
> else{
>
> /* need to re-init with a different JoinEUI */
> retval = LDL_STATUS_DEVNONCE;
> }
> }
> else{
>
> retval = LDL_STATUS_BUSY;
> }
>
> unsigned char t[80];
> sprintf(t, "self->handler=%p\r\n", self->handler);
> return retval;
> }
>
> Of course, if I comment out :
>
> unsigned char t[80];
> sprintf(t, "self->handler=%p\r\n", self->handler);
>
> it crashes again :
>
> hilbert:[~/cvs/firmware-antivol] > simavr -t -vvv -m atmega1284 -f
> 1600 firmware.elf
> Loaded 95670 .text at address 0x0
> Loaded 5654 .data
> Loaded 2276 .eeprom
> 01..
> ..
> =..
>  Systella L100-A..
> =..
> ..
> Booting firmware 2021062218..
> SPI initialized..
> Reset 

Re: Memory corruption (?) I don't understand

2021-06-22 Thread BERTRAND Joël
Strange. Following function runs as expected.

enum ldl_mac_status LDL_MAC_otaa(struct ldl_mac *self)
{
enum ldl_mac_status retval;
union ldl_mac_response_arg arg;

LDL_PEDANTIC(self != NULL)

if(self->ctx.joined){

retval = LDL_STATUS_JOINED;
}
else if(self->op == LDL_OP_NONE){

if(self->devNonce <= U32(UINT16_MAX)){

forgetNetwork(self);

self->trials = 0;

self->day = U32(60) * U32(60) * U32(24) * timeTPS;

#if defined(LDL_ENABLE_L2_1_1)
LDL_OPS_deriveJoinKeys(self);
#endif
fillJoinBuffer(self, U16(self->devNonce));

self->devNonce++;

arg.dev_nonce_updated.nextDevNonce = self->devNonce;

unsigned char t[80];
sprintf(t, "self->handler=%p\r\n", self->handler);
self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, );

self->tx.power = 0;

self->op = LDL_OP_JOINING;

if(self->state == LDL_STATE_IDLE){

self->state = LDL_STATE_WAIT_OTAA;
LDL_MAC_timerSet(self, LDL_TIMER_WAITA, 0);
}

retval = LDL_STATUS_OK;

LDL_DEBUG("OTAA is pending")
}
else{

/* need to re-init with a different JoinEUI */
retval = LDL_STATUS_DEVNONCE;
}
}
else{

retval = LDL_STATUS_BUSY;
}

return retval;
}

If I comment out sprintf(), it crashes. If I deplace this debug trace
before or _after_ self->handler call, firmware runs as expected. I don't
understand. If there is a memory corruption somewhere, I could
understand that a debug trace _before_ the line that triggers the bug
can change something. But I don't understand why the following function
runs as expected :

enum ldl_mac_status LDL_MAC_otaa(struct ldl_mac *self)
{
enum ldl_mac_status retval;
union ldl_mac_response_arg arg;

LDL_PEDANTIC(self != NULL)

if(self->ctx.joined){

retval = LDL_STATUS_JOINED;
}
else if(self->op == LDL_OP_NONE){

if(self->devNonce <= U32(UINT16_MAX)){

forgetNetwork(self);

self->trials = 0;

self->day = U32(60) * U32(60) * U32(24) * timeTPS;

#if defined(LDL_ENABLE_L2_1_1)
LDL_OPS_deriveJoinKeys(self);
#endif
fillJoinBuffer(self, U16(self->devNonce));

self->devNonce++;

arg.dev_nonce_updated.nextDevNonce = self->devNonce;

self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, );

self->tx.power = 0;

self->op = LDL_OP_JOINING;

if(self->state == LDL_STATE_IDLE){

self->state = LDL_STATE_WAIT_OTAA;
LDL_MAC_timerSet(self, LDL_TIMER_WAITA, 0);
}

retval = LDL_STATUS_OK;

LDL_DEBUG("OTAA is pending")
}
else{

/* need to re-init with a different JoinEUI */
retval = LDL_STATUS_DEVNONCE;
}
}
else{

retval = LDL_STATUS_BUSY;
}

unsigned char t[80];
sprintf(t, "self->handler=%p\r\n", self->handler);
return retval;
}

Of course, if I comment out :

unsigned char t[80];
sprintf(t, "self->handler=%p\r\n", self->handler);

it crashes again :

hilbert:[~/cvs/firmware-antivol] > simavr -t -vvv -m atmega1284 -f
1600 firmware.elf
Loaded 95670 .text at address 0x0
Loaded 5654 .data
Loaded 2276 .eeprom
01..
..
=..
 Systella L100-A..
=..
..
Booting firmware 2021062218..
SPI initialized..
Reset LORA..
Reset LORA done..
LoRaWAN 1.1..
Initialization SX1262..
Initialization SX1262 done..
..
MAC initialization..
LDL_MAC_addChannel:790>chIndex=0 freq=86810 minRate=0 maxRate=5..
LDL_MAC_addChannel:790>chIndex=1 freq=86830 minRate=0 maxRate=5..
LDL_MAC_addChannel:790>chIndex=2 freq=86850 minRate=0 maxRate=5..
cb type=11..
processInit:994>set radio reset: ticks=151..
processRadioReset:1009>clear radio reset: ticks=151..
MAC initialization done..
lora_send..
processStartRadioForEntropy:1061>listen for entropy: ticks=152..
processEntropy:1078>read entropy: ticks=152 entropy=0..
cb type=0..
LDL_MAC_ready..
LDL_MAC_otaa..
LDL_MAC_addChannel:790>chIndex=0 freq=86810 minRate=0 maxRate=5..
LDL_MAC_addChannel:790>chIndex=1 freq=86830 minRate=0 maxRate=5..
LDL_MAC_addChannel:790>chIndex=2 freq=86850 minRate=0 maxRate=5..
avr_gdb_init listening on port 1234





Re: Memory corruption (?) I don't understand

2021-06-22 Thread BERTRAND Joël
BERTRAND Joël a écrit :
> Trampas Stern a écrit :
>> Does the code run if you comment out 'lora_send("coucou", 7);
> 
>   Yes. To be honnest, if I comment out LDL_MAC_otaa(), it runs.
> 
>> Does it still crash with simple main() { _delay_ms(1000);
>> stty_print("hello");}
>>
>> That is start removing code until it works, when you find the line in
>> main that causes a crash, go into that function and start removing code
>> until it works.  Slowly you will find the problem. 
> 
>   I have tried (and I continue...) but I'm not sure that this bug is in
> LoRaWAN library. Indeed, LDL_MAC_otaa() shows my bug but, sometimes,
> just with a debug trace in main(), LDL_MAC_otaa() doesn't crash.

In lorawan/ldl_mac.c, you have :


enum ldl_mac_status LDL_MAC_otaa(struct ldl_mac *self)
{
...
fillJoinBuffer(self, U16(self->devNonce));

self->devNonce++;

arg.dev_nonce_updated.nextDevNonce = self->devNonce;

self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, );

self->tx.power = 0;

self->op = LDL_OP_JOINING;
...
}


If I comment out :

self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, );

Firmware doesn't crash. This pointer is a pointer to app_handler() in
./lorawan.c.

Now, I suppose that this pointer is modified. Thus, I have add a debug
trace :

enum ldl_mac_status LDL_MAC_otaa(struct ldl_mac *self)
{
enum ldl_mac_status retval;
union ldl_mac_response_arg arg;
...
unsigned char t[80];
snprintf(t, 80, "self->handler=%p\r\n", self->handler);
stty_print(t);
self->handler(self->app, LDL_MAC_DEV_NONCE_UPDATED, );

self->tx.power = 0;

self->op = LDL_OP_JOINING;
...
}

and with this simple debug trace, it doesn't crash anymore !...




Re: Memory corruption (?) I don't understand

2021-06-22 Thread BERTRAND Joël
Trampas Stern a écrit :
> Does the code run if you comment out 'lora_send("coucou", 7);

Yes. To be honnest, if I comment out LDL_MAC_otaa(), it runs.

> Does it still crash with simple main() { _delay_ms(1000);
> stty_print("hello");}
> 
> That is start removing code until it works, when you find the line in
> main that causes a crash, go into that function and start removing code
> until it works.  Slowly you will find the problem. 

I have tried (and I continue...) but I'm not sure that this bug is in
LoRaWAN library. Indeed, LDL_MAC_otaa() shows my bug but, sometimes,
just with a debug trace in main(), LDL_MAC_otaa() doesn't crash.

Regards,

JB



Re: Memory corruption (?) I don't understand

2021-06-22 Thread BERTRAND Joël
David Brown a écrit :
> On 22/06/2021 18:00, Ian Molton wrote:
>> On 22/06/2021 12:59, BERTRAND Joël wrote:
>>> Hello,
>>>
>>> I'm writing a firmware for a board that uses a ATMega 1284. Firmware
>>> continuously restarts
>>
>> This is likely a branch through zero.
>>
>> Common causes on AVR are bad function pointers, and stack damage.
>>
> 
> A good way to check this is to track the reset cause.  At startup, look
> for the reset cause (and print it out on debugging, record it in logs,
> or whatever is appropriate for the system), then clear the reset cause
> flags.  If you find you are starting up and there are no reset cause
> flags set, then you haven't actually reset - you've just jumped to
> address 0.
> 
> 

MCUSR=02 (after external reset)
=
 Systella L100-A
=

Booting firmware 2021062218
SPI initialized
Reset LORA
Reset LORA done
LoRaWAN 1.1
Initialization SX1262
Initialization SX1262 done
0004A30B0013F046
MAC initialization
LDL_MAC_addChannel:786>chIndex=0 freq=86810 minRate=0 maxRate=5
LDL_MAC_addChannel:786>chIndex=1 freq=86830 minRate=0 maxRate=5
LDL_MAC_addChannel:786>chIndex=2 freq=86850 minRate=0 maxRate=5
cb type=11
processInit:990>set radio reset: ticks=126
processRadioReset:1005>clear radio reset: ticks=126
MAC initialization done
lora_send
processStartRadioForEntropy:1057>listen for entropy: ticks=127
processEntropy:1074>read entropy: ticks=127 entropy=1385813498
cb type=0
LDL_MAC_ready
LDL_MAC_otaa
LDL_MAC_addChannel:786>chIndex=0 freq=86810 minRate=0 maxRate=5
LDL_MAC_addChannel:786>chIndex=1 freq=86830 minRate=0 maxRate=5
LDL_MAC_addChannel:786>chIndex=2 freq=86850 minRate=0 maxRate=5
MCUSR=00
=
 Systella L100-A
=

Booting firmware 2021062218
SPI initialized
...


RESET flags are always 0 when firmwware restarts...



Re: Memory corruption (?) I don't understand

2021-06-22 Thread Trampas Stern
Does the code run if you comment out 'lora_send("coucou", 7);
Does it still crash with simple main() { _delay_ms(1000);
stty_print("hello");}

That is start removing code until it works, when you find the line in main
that causes a crash, go into that function and start removing code until it
works.  Slowly you will find the problem.

On Tue, Jun 22, 2021 at 12:00 PM Ian Molton 
wrote:

> On 22/06/2021 12:59, BERTRAND Joël wrote:
> >   Hello,
> >
> >   I'm writing a firmware for a board that uses a ATMega 1284.
> Firmware
> > continuously restarts
>
> This is likely a branch through zero.
>
> Common causes on AVR are bad function pointers, and stack damage.
>
> -Ian
>
>


Re: Memory corruption (?) I don't understand

2021-06-22 Thread David Brown
On 22/06/2021 18:00, Ian Molton wrote:
> On 22/06/2021 12:59, BERTRAND Joël wrote:
>>  Hello,
>>
>>  I'm writing a firmware for a board that uses a ATMega 1284. Firmware
>> continuously restarts
> 
> This is likely a branch through zero.
> 
> Common causes on AVR are bad function pointers, and stack damage.
> 

A good way to check this is to track the reset cause.  At startup, look
for the reset cause (and print it out on debugging, record it in logs,
or whatever is appropriate for the system), then clear the reset cause
flags.  If you find you are starting up and there are no reset cause
flags set, then you haven't actually reset - you've just jumped to
address 0.




Re: Memory corruption (?) I don't understand

2021-06-22 Thread Ian Molton
On 22/06/2021 12:59, BERTRAND Joël wrote:
>   Hello,
> 
>   I'm writing a firmware for a board that uses a ATMega 1284. Firmware
> continuously restarts

This is likely a branch through zero.

Common causes on AVR are bad function pointers, and stack damage.

-Ian



Re: Memory corruption (?) I don't understand

2021-06-22 Thread BERTRAND Joël
Trampas Stern a écrit :
> Can you comment out lines and have firmware run?  

I don't understand. I have tried to comment out some lines. Sometimes
firmware runs as expected, sometimes, it continuously reboots.

When I comment out some lines or add some debug trace, firmware desn't
crash at the same place. I have deleted some piece of code to only
initialize my board and to try to access to SX1262...

> Could it be that your power supply is dropping and browning out? 

Nope, unfortunately. I have done tests with in-board uninterruptible
power supply and with an external 150W power supply.

Please note that simavr shows exactly the same result :

hilbert:[~/cvs/firmware-antivol] > simavr -t -vvv -m atmega1284 -f
1600 firmware.elf
Loaded 95692 .text at address 0x0
Loaded 5646 .data
Loaded 2276 .eeprom
..
=..
 Systella L100-A..
=..
..
Booting firmware 2021062213..
SPI initialized..
Reset LORA..
Reset LORA done..
LoRaWAN 1.1..
Initialization SX1262..
Initialization SX1262 done..
..
MAC initialization..
LDL_MAC_addChannel:786>chIndex=0 freq=86810 minRate=0 maxRate=5..
LDL_MAC_addChannel:786>chIndex=1 freq=86830 minRate=0 maxRate=5..
LDL_MAC_addChannel:786>chIndex=2 freq=86850 minRate=0 maxRate=5..
cb type=11..
processInit:990>set radio reset: ticks=126..
processRadioReset:1005>clear radio reset: ticks=126..
MAC initialization done..
lora_send..
processStartRadioForEntropy:1057>listen for entropy: ticks=127..
processEntropy:1074>read entropy: ticks=127 entropy=0..
cb type=0..
LDL_MAC_ready..
LDL_MAC_otaa..
LDL_MAC_addChannel:786>chIndex=0 freq=86810 minRate=0 maxRate=5..
LDL_MAC_addChannel:786>chIndex=1 freq=86830 minRate=0 maxRate=5..
LDL_MAC_addChannel:786>chIndex=2 freq=86850 minRate=0 maxRate=5..
avr_gdb_init listening on port 1234

Real board reboots (or stalls) when simavr enters in gdb.

Regards,

JB



Re: Memory corruption (?) I don't understand

2021-06-22 Thread Trampas Stern
Can you comment out lines and have firmware run?

Could it be that your power supply is dropping and browning out?




On Tue, Jun 22, 2021 at 7:59 AM BERTRAND Joël 
wrote:

> Hello,
>
> I'm writing a firmware for a board that uses a ATMega 1284.
> Firmware
> continuously restarts and I'm not able to fix this bug. It looks like a
> memory corruption, but after several days without finding this bug, I
> doubt. Hardware seems to be bug free and runs as expected.
>
> I have uploaded this firmware at
> https://hilbert.systella.fr/public/firmware.tar.gz
>
> In this tarball, firmware.elf was compiled with gcc 11.1.0 (and
> crashes).
>
> I have removed all multitasking capabilities, mutex_* functions
> don't
> nothing. Firmware crashes in lora_send(). There is no memory allocation
> (only one malloc() in this code but in function gnss() and this function
> is not called). Firmware starts in main(), initialises some chips and
> LoRa module (SX1262) and enters in a loop.
>
> simavr gives the same result, but avr-gdb doesn't give usable
> information. I have tried to build firmware with gcc 11.1.0 (instead of
> 5.4.0). Sams result, but avr-gdb now returns "internal error". Thus, I
> have built a new avr-gdb (9.2). Problem :
>
> hilbert:[/usr/local/cross/avr/bin] > ./avr-gdb -v
> Erreur de segmentation
> hilbert:[/usr/local/cross/avr/bin] >
>
> I have tried to bissect, to add debug traces (with some debug
> traces,
> firmware runs as expected !) without result. I have tried -Os and -O2
> also. I no longer know how to correct the problem.
>
> Help will be welcome.
>
> Best regards,
>
> JB
>
>