cat /proc/cpuinfo
  [[snip]]
processor    : 1
model name    : ARMv7 Processor rev 10 (v7l)
BogoMIPS    : 132.00
Features    : half thumb fastmult vfp edsp neon vfpv3 tls vfpd32
CPU implementer    : 0x41
CPU architecture: 7
CPU variant    : 0x2
CPU part    : 0xc09
CPU revision    : 10

Hardware    : Freescale i.MX6 Quad/DualLite (Device Tree)
Revision    : 0000
Serial        : 0000000000000000

Operating system info:
cat /etc/openwrt_release
DISTRIB_ID='OpenWrt'
DISTRIB_RELEASE='15.05'
DISTRIB_REVISION='r48153'
DISTRIB_CODENAME='chaos_calmer'
DISTRIB_TARGET='imx6/generic'
DISTRIB_DESCRIPTION='OpenWrt Chaos Calmer 15.05'
DISTRIB_TAINTS='no-all busybox'

cat /proc/version
Linux version 3.18.23 (gcc version 5.3.0 (OpenWrt GCC 5.3.0 r48153) ) #6 SMP Tue Jul 11 16:35:20 CEST 2017

The source code can be downloaded from:
https://uclibc.org/downloads/uClibc-0.9.33.2.tar.bz2

Extracted chunks of the trace output can be downloaded from:
https://github.com/KKoovalsky/Valgrind-problems

The file is called vgtrace-shortened.txt; the full trace is available in
vgtrace.txt. I have also included the compiled uClibc library in this repo.

Thank you for the detailed information, particularly the vgtrace*.txt.

It's a compiler "bug", and a "bug" in the memcheck implementation,
and a definite bug in the memcheck error reporting.

The workaround is to invoke valgrind (memcheck) with
"--ignore-range-below-sp=0x0-0x14".

The problem can be seen here:
===== vgtrace-shortened.txt line 8308
         (arm) 0x4817678:  mov r12, r13   ## copy r12 from r13(==sp)

              ------ IMark(0x4817678, 4, 0) ------
              t1 = GET:I32(60)
              t0 = t1
              t2 = t0
              PUT(56) = t2
              PUT(68) = 0x481767C:I32

       (arm) 0x481767C:  stmdb r13!, {0xDFF0}  ## push r15(==pc),r14(==lr),r12,r11-r4 onto stack (sp==r13) *in that order*
              ------ IMark(0x481767C, 4, 0) ------
              t3 = GET:I32(60)
              t4 = t3
              PUT(60) = Sub32(t3,0x2C:I32)
              STle(Sub32(t4,0x4:I32)) = 0x4817684:I32
              STle(Sub32(t4,0x8:I32)) = GET:I32(64)
              STle(Sub32(t4,0xC:I32)) = GET:I32(56)
              STle(Sub32(t4,0x10:I32)) = GET:I32(52)
              STle(Sub32(t4,0x14:I32)) = GET:I32(48)
              STle(Sub32(t4,0x18:I32)) = GET:I32(44)
              STle(Sub32(t4,0x1C:I32)) = GET:I32(40)
              STle(Sub32(t4,0x20:I32)) = GET:I32(36)
              STle(Sub32(t4,0x24:I32)) = GET:I32(32)
              STle(Sub32(t4,0x28:I32)) = GET:I32(28)
              STle(Sub32(t4,0x2C:I32)) = GET:I32(24)
              PUT(68) = 0x4817680:I32

        (arm) 0x4817680:  sub r11, r12, #0x4   ## r12 has the same value as sp before the 'stmdb'

              ------ IMark(0x4817680, 4, 0) ------
              t5 = GET:I32(56)
              t6 = 0x4:I32
              t7 = Sub32(t5,t6)
              PUT(52) = t7
              PUT(68) = 0x4817684:I32

        (arm) 0x4817684:  ldmdb r11, {0xAFF0}   ## load r15(==pc) from stored lr, r13(==sp) from stored r12, r11-r4 from stored original values

              ------ IMark(0x4817684, 4, 0) ------
              t8 = GET:I32(52)
              t9 = t8
              PUT(68) = LDle:I32(Sub32(t9,0x4:I32))
              PUT(60) = LDle:I32(Sub32(t9,0x8:I32))
              PUT(48) = LDle:I32(Sub32(t9,0x10:I32))
              PUT(44) = LDle:I32(Sub32(t9,0x14:I32))
              PUT(40) = LDle:I32(Sub32(t9,0x18:I32))
              PUT(36) = LDle:I32(Sub32(t9,0x1C:I32))
              PUT(32) = LDle:I32(Sub32(t9,0x20:I32))
              PUT(28) = LDle:I32(Sub32(t9,0x24:I32))
              PUT(24) = LDle:I32(Sub32(t9,0x28:I32))
              PUT(52) = LDle:I32(Sub32(t9,0xC:I32))
              PUT(68) = GET:I32(68)
              PUT(68) = GET:I32(68); exit-Boring

GuestBytes 4817678 16  0D C0 A0 E1 F0 DF 2D E9 04 B0 4C E2 F0 AF 1B E9  0028F343

VexExpansionRatio 16 952   595 :10

==26904== Invalid read of size 4
==26904==    at 0x4000E54: ??? (in /lib/ld-uClibc-0.9.33.2.so)
==26904==  Address 0x7dad09fc is on thread 1's stack
==26904==  20 bytes below stack pointer
=====
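To double-check the decoding in the trace, here is a small Python sketch (not part of the original analysis) that splits the GuestBytes dump into little-endian ARM instruction words and expands the 16-bit register-list field (bits 0-15 in the ARM-state LDM/STM encoding) of the 'stmdb' and 'ldmdb'. The helper names words_le and reg_list are hypothetical.

```python
# Byte dump copied from the "GuestBytes 4817678 16 ..." line of the trace.
GUEST_BYTES = "0D C0 A0 E1 F0 DF 2D E9 04 B0 4C E2 F0 AF 1B E9"
BASE_ADDR = 0x4817678

# Bit i of an LDM/STM register list selects register i.
REG_NAMES = [f"r{i}" for i in range(13)] + ["sp", "lr", "pc"]


def words_le(hex_bytes):
    """Split the dump into 32-bit little-endian ARM instruction words."""
    raw = bytes.fromhex(hex_bytes)
    return [int.from_bytes(raw[i:i + 4], "little") for i in range(0, len(raw), 4)]


def reg_list(mask):
    """Expand the 16-bit register-list field (bits 0-15) of an LDM/STM."""
    return [REG_NAMES[i] for i in range(16) if mask & (1 << i)]


if __name__ == "__main__":
    words = words_le(GUEST_BYTES)
    for n, word in enumerate(words):
        print(f"0x{BASE_ADDR + 4 * n:X}: {word:08X}")
    # words[1] is the stmdb and words[3] the ldmdb; their low 16 bits are the masks.
    for label, word in (("stmdb", words[1]), ("ldmdb", words[3])):
        regs = reg_list(word & 0xFFFF)
        print(f"{label} {{0x{word & 0xFFFF:04X}}}: {','.join(regs)} "
              f"-> {len(regs)} words = {4 * len(regs)} bytes")
```

The masks give 11 registers (44 bytes) for the 'stmdb' and 10 registers for the 'ldmdb', which matches the Sub32 offsets in the IR.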

The net effect of those 4 instructions is:
    r0-r3 do not change; none of them was written
    r4-r11 do not change; each value is stored and fetched to/from the same
        (corresponding) address (r11 is set to r12 - 4 by the 'sub', but the
        'ldmdb' then reloads its original value)
    r12 gets the original (and final) value of r13(==sp)
    r13(==sp) does not change.  It was decremented by 44 (11 registers times 4
        bytes per register) but then loaded from the location which was written
        with the value of r12, which is the same as the original sp
    r14(==lr) does not change; it is never written
    r15(==pc) is loaded from the original value in r14(==lr), which is the
        return address

memcheck's bug is that it reports the location "at 0x..." using the new value
that was loaded into pc (here the return address 0x4000E54), instead of the
address of the instruction that triggered the complaint (the 'ldmdb' at
0x4817684).

The compiler's bug is that it relies on a particular implementation of
poorly-specified hardware.  The "ldmdb r11, {0xAFF0}" reads 10 words from
memory, and changes the value of r13(==sp) among other registers.  The compiler
assumes that the change to r13 does not become visible until the entire
instruction has completed, yet this is not explicitly guaranteed.
It is conceivable that the 'ldmdb' could be interrupted immediately after
writing r13(==sp), save its internal state while the interrupt is serviced, and
resume from that state upon return.  If so, then the fetches that load the
remaining registers are outside the boundary of the stack (namely, below the
downward-growing sp), and that is a memcheck error.
On the other hand, no known hardware allows such an interrupt (all side effects
are "atomic"), so the memcheck implementation is not faithful: it uses the new
value of r13(==sp) to check the subsequent memory fetches for other registers
before the 'ldmdb' instruction ends.
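The discrepancy can be reproduced with a small Python sketch that replays the 'ldmdb' loads in the order the VEX IR above performs them (PUT(68), PUT(60), PUT(48) ... PUT(24), PUT(52)) and flags every load below the current sp. Assumptions, all hypothetical: the original sp 0x7DAD0A10 is inferred from the report (0x7dad09fc plus 20 bytes), 0x4000E54 is taken to be the return address in lr, and the "reported" pc is simply the guest pc at the moment of the check. The function name below_sp_reads is illustrative only.

```python
# Loads of "ldmdb r11, {0xAFF0}" in the order the VEX IR emits them:
# (destination register, byte offset subtracted from r11).
VEX_LOAD_ORDER = [
    ("pc", 0x4), ("sp", 0x8), ("r10", 0x10), ("r9", 0x14), ("r8", 0x18),
    ("r7", 0x1C), ("r6", 0x20), ("r5", 0x24), ("r4", 0x28), ("r11", 0xC),
]
LDMDB_ADDR = 0x4817684  # address of the ldmdb instruction itself


def below_sp_reads(orig_sp, return_addr, atomic=False):
    """Replay the ldmdb loads one by one and return every load whose address
    lies below sp, as (dest, address, bytes_below_sp, pc_at_check).

    atomic=False mimics the IR: sp and pc change as soon as they are loaded.
    atomic=True keeps the pre-instruction sp and pc for every check.
    """
    r11 = orig_sp - 4            # "sub r11, r12, #0x4" with r12 == original sp
    sp = orig_sp - 0x2C          # after "stmdb r13!, {0xDFF0}" (11 registers)
    pc = LDMDB_ADDR
    # The two stack slots (written by the stmdb) that feed pc and sp.
    mem = {orig_sp - 0x8: return_addr,   # stored lr
           orig_sp - 0xC: orig_sp}       # stored r12 == original sp
    flagged = []
    for dest, off in VEX_LOAD_ORDER:
        addr = r11 - off
        if addr < sp:
            flagged.append((dest, addr, sp - addr, pc))
        if not atomic and dest == "pc":
            pc = mem[addr]
        elif not atomic and dest == "sp":
            sp = mem[addr]
    return flagged


if __name__ == "__main__":
    for dest, addr, gap, pc in below_sp_reads(0x7DAD0A10, 0x4000E54):
        print(f"load {dest:>3} from {addr:#x}: {gap} bytes below sp, pc={pc:#x}")
    print(below_sp_reads(0x7DAD0A10, 0x4000E54, atomic=True))
```

In the sequential replay the first flagged load is r10 from 0x7dad09fc, 20 bytes below sp, with pc already holding 0x4000E54, which matches the memcheck report; with atomic semantics nothing is flagged.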

The compiler's choice of storing and re-loading r4-r11 is horribly inefficient:
8 writes and 8 reads that only waste time.  The value stored from r15(==pc) via
the 'stmdb' is never read.  The entire sequence could be replaced by "bx lr" or
"mov pc,lr" (possibly preceded by "mov r12,sp"), except that 'bx' is not
implemented in some early hardware, and "mov pc,lr" is frowned upon in hardware
that does have 'bx'.  One possible solution that works everywhere is to use
"blx lr" and just ignore the value that is written to lr.

--

_______________________________________________
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users
