> Here is the sync pattern the code normally achieves, once the parent has
> successfully spawned a child thread, which has to wait for a start signal
> before it may run application code:
>
> 1. parent calls threadobj_start(child)
> 1.1 child->status |= __THREAD_S_STARTED
> 1.2 wait for child->status & __THREAD_S_ACTIVE
>
> 2. child calls threadobj_wait_start(self)
> 2.1 wait for self->status & __THREAD_S_STARTED
> 2.2 raise self->status |= __THREAD_S_ACTIVE
>
> All accesses to the status bits are serialized by a per-thread mutex,
> operated by the threadobj_lock/unlock accessors, which also covers the
> condvar signaling/waiting as one would expect.
>
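For reference, the handshake described above can be sketched with plain pthreads. This is only a minimal model under stated assumptions, not the copperplate code: struct thread_desc, wait_on_bits(), raise_bits(), run_handshake() and the S_* values are hypothetical names picked for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical flag values, for illustration only. */
#define S_STARTED 0x01
#define S_ACTIVE  0x10

struct thread_desc {
	pthread_mutex_t lock;    /* serializes every access to status */
	pthread_cond_t barrier;  /* broadcast whenever status changes */
	int status;
};

/* Sleep until any bit of mask is raised. Caller holds td->lock. */
static int wait_on_bits(struct thread_desc *td, int mask)
{
	while (!(td->status & mask))
		pthread_cond_wait(&td->barrier, &td->lock);
	return td->status;
}

/* Raise bits and wake up all waiters. Caller holds td->lock. */
static void raise_bits(struct thread_desc *td, int bits)
{
	td->status |= bits;
	pthread_cond_broadcast(&td->barrier);
}

static void *child_main(void *arg)
{
	struct thread_desc *td = arg;

	pthread_mutex_lock(&td->lock);
	wait_on_bits(td, S_STARTED);  /* step 2.1 */
	raise_bits(td, S_ACTIVE);     /* step 2.2 */
	pthread_mutex_unlock(&td->lock);
	return NULL;
}

/* Runs one handshake; returns the status word the parent ends up with. */
static int run_handshake(void)
{
	struct thread_desc td = { .status = 0 };
	pthread_t child;
	int ret, status;

	pthread_mutex_init(&td.lock, NULL);
	pthread_cond_init(&td.barrier, NULL);
	ret = pthread_create(&child, NULL, child_main, &td);
	assert(ret == 0);
	(void)ret;

	pthread_mutex_lock(&td.lock);
	raise_bits(&td, S_STARTED);            /* step 1.1 */
	status = wait_on_bits(&td, S_ACTIVE);  /* step 1.2 */
	pthread_mutex_unlock(&td.lock);

	pthread_join(child, NULL);
	pthread_cond_destroy(&td.barrier);
	pthread_mutex_destroy(&td.lock);
	return status;
}
```

Calling run_handshake() from a main() linked with -pthread should return with both bits set. Since each wait re-tests the status word under the lock, the order in which parent and child reach the condvar should not matter in this model.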
> When running in pshared mode, thread descriptors (holding ->status, mutex
> and barrier sync) are obtained from /dev/shm. With --disable-pshared, only
> process-private memory is used.
>
> Case 1: a race when manipulating the thread status due to inconsistent
> locking. I could not find any so far.
>
> Case 2: a cache coherence issue in SMP, also caused by improper locking.
> Otherwise, the locking should enforce memory barriers as expected.
>
> Case 3: anything not mentioned in other cases...
>
> - Could you paste/copy the disassembly (objdump -dl rather than gdb's
> disass) of the wait_on_barrier() function?
>

I have attached the disassembly as wait_on_barrier_disas.txt.

> - Does running both programs with --cpu-affinity=0/1 change the outcome?
>

Changing the CPU affinity makes no difference: the hang occurs with every
combination I tried, with both the "task-1" alchemy test and my own event
test applications.

> - Without specifying any affinity this time, could you run the current
> test with the debug patch below applied (this is clearly not a fix)? The
> patch forces the code to read the value of the ->status field before
> waiting on the barrier. With that code in and a backtrace showing locals,
> we should be able to check the status word when threadobj_wait_start() is
> entered.
>
> diff --git a/lib/copperplate/threadobj.c b/lib/copperplate/threadobj.c
> index cc64caa..ed85a12 100644
> --- a/lib/copperplate/threadobj.c
> +++ b/lib/copperplate/threadobj.c
> @@ -1273,7 +1273,9 @@ void threadobj_wait_start(void) /* current->lock free. */
>  	int status;
>
>  	threadobj_lock(current);
> -	status = wait_on_barrier(current, __THREAD_S_STARTED|__THREAD_S_ABORTED);
> +	status = current->status;
> +	if (!(status & __THREAD_S_STARTED))
> +		status = wait_on_barrier(current, __THREAD_S_STARTED|__THREAD_S_ABORTED);
>  	threadobj_unlock(current);
>
>  	/*
>
> --
> Philippe.
>
I applied the debug patch and have attached the full backtraces of threads 1
and 3 of the "task-1" alchemy test.

At the time of the hang:
- the parent (thread 1) sees status = 73, which matches the flags set during
  threadobj_start()
- the child (thread 3) sees status = 8 (locked?)

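The two status snapshots can be checked against the wait masks shown in frame #3 of each backtrace (mask=16 for the parent, mask=5 for the child). The sketch below only does that arithmetic. would_block() and format_bits() are hypothetical helpers, and no assumption is made about which flag each individual bit stands for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum {
	WAIT_MASK_PARENT = 16, /* mask= argument in thread 1, frame #3 */
	WAIT_MASK_CHILD  = 5,  /* mask= argument in thread 3, frame #3 */
};

/* Nonzero when a wait on mask would sleep for this status snapshot. */
static int would_block(int status, int mask)
{
	return (status & mask) == 0;
}

/*
 * Write the set bits of status into buf, e.g. "0x1 0x8 0x40".
 * Returns the number of set bits (bit 31 is ignored).
 */
static int format_bits(int status, char *buf, size_t len)
{
	int count = 0, shift;
	size_t used = 0;

	assert(buf != NULL);
	if (len == 0)
		return 0;
	buf[0] = '\0';
	for (shift = 0; shift < 31; shift++) {
		int n;

		if (!(status & (1 << shift)))
			continue;
		count++;
		if (used >= len - 1)
			continue; /* buffer full: keep counting, stop writing */
		n = snprintf(buf + used, len - used, "%s0x%x",
			     used ? " " : "", 1 << shift);
		if (n > 0)
			used += (size_t)n < len - used ? (size_t)n : len - used - 1;
	}
	return count;
}

/* Print one snapshot together with the verdict for its wait mask. */
static void report(const char *who, int status, int mask)
{
	char bits[64];

	format_bits(status, bits, sizeof(bits));
	printf("%s: status=%d (%s) mask=%d -> %s\n", who, status, bits, mask,
	       would_block(status, mask) ? "sleeps" : "proceeds");
}
```

With these values, would_block(73, 16) and would_block(8, 5) are both true, so each thread had a reason to sleep when it sampled the status word. would_block(73, 5) is false: the parent's snapshot already carries a bit the child is waiting for, while the child's own snapshot does not.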
- Charles
-------------- next part --------------
00001868 <wait_on_barrier>:
wait_on_barrier():
/home/debian/Development/build/lib/copperplate/../../../xenomai-3/lib/copperplate/threadobj.c:1200
1868: b580 push {r7, lr}
186a: b0ce sub sp, #312 ; 0x138
186c: af00 add r7, sp, #0
186e: 1d3b adds r3, r7, #4
1870: 6018 str r0, [r3, #0]
1872: 463b mov r3, r7
1874: 6019 str r1, [r3, #0]
/home/debian/Development/build/lib/copperplate/../../../xenomai-3/lib/copperplate/threadobj.c:1204
1876: 1d3b adds r3, r7, #4
1878: 681b ldr r3, [r3, #0]
187a: 6a9b ldr r3, [r3, #40] ; 0x28
187c: f8c7 3134 str.w r3, [r7, #308] ; 0x134
/home/debian/Development/build/lib/copperplate/../../../xenomai-3/lib/copperplate/threadobj.c:1205
1880: 463b mov r3, r7
1882: f8d7 2134 ldr.w r2, [r7, #308] ; 0x134
1886: 681b ldr r3, [r3, #0]
1888: 4013 ands r3, r2
188a: 2b00 cmp r3, #0
188c: d148 bne.n 1920 <wait_on_barrier+0xb8>
/home/debian/Development/build/lib/copperplate/../../../xenomai-3/lib/copperplate/threadobj.c:1207
188e: 1d3b adds r3, r7, #4
1890: 681b ldr r3, [r3, #0]
1892: 6a5b ldr r3, [r3, #36] ; 0x24
1894: f8c7 3130 str.w r3, [r7, #304] ; 0x130
/home/debian/Development/build/lib/copperplate/../../../xenomai-3/lib/copperplate/threadobj.c:1208
1898: f240 0300 movw r3, #0
189c: f2c0 0300 movt r3, #0
18a0: f8c7 312c str.w r3, [r7, #300] ; 0x12c
18a4: 1d3b adds r3, r7, #4
18a6: 681b ldr r3, [r3, #0]
18a8: 3308 adds r3, #8
18aa: f8c7 3128 str.w r3, [r7, #296] ; 0x128
18ae: f107 0308 add.w r3, r7, #8
18b2: 2100 movs r1, #0
18b4: 4618 mov r0, r3
18b6: f7ff fffe bl 0 <__sigsetjmp>
18ba: f8c7 0124 str.w r0, [r7, #292] ; 0x124
18be: f8d7 3124 ldr.w r3, [r7, #292] ; 0x124
18c2: 2b00 cmp r3, #0
18c4: d009 beq.n 18da <wait_on_barrier+0x72>
/home/debian/Development/build/lib/copperplate/../../../xenomai-3/lib/copperplate/threadobj.c:1208 (discriminator 2)
18c6: f8d7 312c ldr.w r3, [r7, #300] ; 0x12c
18ca: f8d7 0128 ldr.w r0, [r7, #296] ; 0x128
18ce: 4798 blx r3
18d0: f107 0308 add.w r3, r7, #8
18d4: 4618 mov r0, r3
18d6: f7ff fffe bl 0 <__pthread_unwind_next>
/home/debian/Development/build/lib/copperplate/../../../xenomai-3/lib/copperplate/threadobj.c:1208 (discriminator 3)
18da: f107 0308 add.w r3, r7, #8
18de: 4618 mov r0, r3
18e0: f7ff fffe bl 0 <__pthread_register_cancel>
/home/debian/Development/build/lib/copperplate/../../../xenomai-3/lib/copperplate/threadobj.c:1209 (discriminator 3)
18e4: 1d3b adds r3, r7, #4
18e6: 6818 ldr r0, [r3, #0]
18e8: f7fe fd80 bl 3ec <__threadobj_tag_unlocked>
/home/debian/Development/build/lib/copperplate/../../../xenomai-3/lib/copperplate/threadobj.c:1210 (discriminator 3)
18ec: 1d3b adds r3, r7, #4
18ee: 681b ldr r3, [r3, #0]
18f0: f503 7280 add.w r2, r3, #256 ; 0x100
18f4: 1d3b adds r3, r7, #4
18f6: 681b ldr r3, [r3, #0]
18f8: 3308 adds r3, #8
18fa: 4619 mov r1, r3
18fc: 4610 mov r0, r2
18fe: f7ff fffe bl 1364 <threadobj_cond_wait>
/home/debian/Development/build/lib/copperplate/../../../xenomai-3/lib/copperplate/threadobj.c:1211 (discriminator 3)
1902: 1d3b adds r3, r7, #4
1904: 6818 ldr r0, [r3, #0]
1906: f7fe fd61 bl 3cc <__threadobj_tag_locked>
/home/debian/Development/build/lib/copperplate/../../../xenomai-3/lib/copperplate/threadobj.c:1212 (discriminator 3)
190a: f107 0308 add.w r3, r7, #8
190e: 4618 mov r0, r3
1910: f7ff fffe bl 0 <__pthread_unregister_cancel>
/home/debian/Development/build/lib/copperplate/../../../xenomai-3/lib/copperplate/threadobj.c:1213 (discriminator 3)
1914: 1d3b adds r3, r7, #4
1916: 681b ldr r3, [r3, #0]
1918: f8d7 2130 ldr.w r2, [r7, #304] ; 0x130
191c: 625a str r2, [r3, #36] ; 0x24
/home/debian/Development/build/lib/copperplate/../../../xenomai-3/lib/copperplate/threadobj.c:1214 (discriminator 3)
191e: e7aa b.n 1876 <wait_on_barrier+0xe>
/home/debian/Development/build/lib/copperplate/../../../xenomai-3/lib/copperplate/threadobj.c:1206
1920: bf00 nop
/home/debian/Development/build/lib/copperplate/../../../xenomai-3/lib/copperplate/threadobj.c:1216
1922: f8d7 3134 ldr.w r3, [r7, #308] ; 0x134
/home/debian/Development/build/lib/copperplate/../../../xenomai-3/lib/copperplate/threadobj.c:1217
1926: 4618 mov r0, r3
1928: f507 779c add.w r7, r7, #312 ; 0x138
192c: 46bd mov sp, r7
192e: bd80 pop {r7, pc}
-------------- next part --------------
(gdb) thread 1
[Switching to thread 1 (Thread 0xb6ff0000 (LWP 4007))]
#0 __libc_do_syscall () at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
46 in ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S
(gdb) bt full
#0 __libc_do_syscall () at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
No locals.
#1 0xb6f5f6f2 in __pthread_cond_wait (cond=0xb6caa194, mutex=0xb6caa09c) at pthread_cond_wait.c:177
_a2tmp = 11
_a2 = 11
_nametmp = 240
_a3tmp = 3
_a3 = 3
_a1 = -1228234344
_v1tmp = -1228234596
_a4tmp = 0
_a1tmp = -1228234344
_a4 = 0
_v1 = -1228234596
_name = 240
__ret = <optimized out>
futex_val = 3
buffer = {__routine = 0xb6f5f3e1 <__condvar_cleanup>, __arg = 0xbefffa00, __canceltype = -1225038976, __prev = 0x0}
cbuffer = {oldtype = 1, cond = 0xb6caa194, mutex = 0xb6caa09c, bc_seq = 0}
err = <optimized out>
pshared = <optimized out>
pi_flag = 1
val = <optimized out>
seq = <optimized out>
#2 0xb6f9cfa6 in threadobj_cond_wait (cond=0xb6caa194, lock=0xb6caa09c) at ../../../xenomai-3/lib/copperplate/threadobj.c:980
ret = -1228234604
#3 0xb6f9d554 in wait_on_barrier (thobj=0xb6caa094, mask=16) at ../../../xenomai-3/lib/copperplate/threadobj.c:1231
__cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {-619902981,
-754243412, -1090520024, 0, 0, -1090520496, 0, 0, -1224740864, 0 <repeats 17
times>,
2, 0, 5, 0, 1, -1224758200, -1659757782, -1224874387, 0,
-1224738124, -1224757760, -1090520312, -1224757760, 0, -1, -1224796312,
-1225176048,
-1224794112, -1225021852, -1224802304, -1224801088, 1, 0,
-1224859353, -1224796312, 1, 5, 0, 0, 1, -1225038976, 0, -1090520024, 0,
-1090520232,
-1090520024, 0, 0}, __mask_was_saved = 0}}, __pad =
{0xbefffc28, 0x0, 0x0, 0xb6ca496c}}
__cancel_routine = 0xb6f5e225 <__GI___pthread_mutex_unlock>
__cancel_arg = 0xb6caa09c
__not_first_call = 0
oldstate = 0
status = 73
#4 0xb6f9d606 in threadobj_start (thobj=0xb6caa094) at ../../../xenomai-3/lib/copperplate/threadobj.c:1268
current = 0xb6ca496c
ret = 0
oldstate = 1
#5 0xb6fc2d10 in rt_task_start (task=0x20ed0 <t_main>, entry=0x10985 <main_task>, arg=0xdeadbeef) at ../../../xenomai-3/lib/alchemy/task.c:634
tcb = 0xb6ca9f6c
svc = {cancel_type = -559038737}
ret = 67973
#6 0x00010a62 in main (argc=1, argv=0x21548) at task-1.c:26
ret = 0
(gdb)
-------------- next part --------------
(gdb) thread 3
[Switching to thread 3 (Thread 0xb6c6f460 (LWP 4016))]
#0 __libc_do_syscall () at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
46 in ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S
(gdb) bt full
#0 __libc_do_syscall () at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
No locals.
#1 0xb6f5f6f2 in __pthread_cond_wait (cond=0xb6caa194, mutex=0xb6caa09c) at pthread_cond_wait.c:177
_a2tmp = 11
_a2 = 11
_nametmp = 240
_a3tmp = 1
_a3 = 1
_a1 = -1228234344
_v1tmp = -1228234596
_a4tmp = 0
_a1tmp = -1228234344
_a4 = 0
_v1 = -1228234596
_name = 240
__ret = <optimized out>
futex_val = 1
buffer = {__routine = 0xb6f5f3e1 <__condvar_cleanup>, __arg = 0xb6c6ec18, __canceltype = -1225038976, __prev = 0x0}
cbuffer = {oldtype = 1, cond = 0xb6caa194, mutex = 0xb6caa09c, bc_seq = 0}
err = <optimized out>
pshared = <optimized out>
pi_flag = 1
val = <optimized out>
seq = <optimized out>
#2 0xb6f9cfa6 in threadobj_cond_wait (cond=0xb6caa194, lock=0xb6caa09c) at ../../../xenomai-3/lib/copperplate/threadobj.c:980
ret = -1228234596
#3 0xb6f9d554 in wait_on_barrier (thobj=0xb6caa094, mask=5) at ../../../xenomai-3/lib/copperplate/threadobj.c:1231
__cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {-751562301,
-754243412, -1228476860, -1090520124, 0, -1228477336, -1228476512, -1224802304,
0,
-1228474732, 0 <repeats 16 times>, 1, -1224758200, -1228477152,
-1228477004, 0, -1228477120, -1224757760, -1228477168, -1224757760,
-1224738124,
-1, 0, -1225177664, -1224794112, 1, -1224794112, 0, 0, 0, 0,
-1224793672, -1228477112, -1224793672, 0, -1, 0, -1225027584, 119632,
-1225021852,
0, -1228234596, -1090520124, 0, -1228477040, -1228476512,
-1224802304, 0, -1225404053}, __mask_was_saved = 0}}, __pad = {0xb6c6ee80, 0x0,
0x0,
0xb6c6ed90}}
__cancel_routine = 0xb6f5e225 <__GI___pthread_mutex_unlock>
__cancel_arg = 0xb6caa09c
__not_first_call = 0
oldstate = 0
status = 8
#4 0xb6f9d67a in threadobj_wait_start () at ../../../xenomai-3/lib/copperplate/threadobj.c:1299
current = 0xb6caa094
status = 8
#5 0xb6fc2770 in task_prologue_2 (tcb=0xb6ca9f6c) at ../../../xenomai-3/lib/alchemy/task.c:211
ret = 0
#6 0xb6fc27b6 in task_entry (arg=0xb6ca9f6c) at ../../../xenomai-3/lib/alchemy/task.c:227
__ret = -1228476896
tcb = 0xb6ca9f6c
svc = {cancel_type = 1}
ret = -1228475296
__FUNCTION__ = "task_entry"
#7 0xb6f9a2d8 in thread_trampoline (arg=0xbefffb94) at ../../../xenomai-3/lib/copperplate/internal.c:251
cta = 0xbefffb94
_cta = {stacksize = 0, detachstate = 1, policy = 1, param_ex =
{__sched_priority = 99, sched_u = {rr = {__sched_rr_quantum = {tv_sec =
-1225254836,
tv_nsec = 0}}}}, prologue = 0xb6fc2719 <task_prologue_1>, run
= 0xb6fc27a5 <task_entry>, arg = 0xb6ca9f6c, __reserved = {status = -38, warm =
{
__size =
"\001\000\000\000\000\000\000\000\060m\376\266\001\000\000", __align = 1},
released = 0x10cac}}
released = {__size = '\000' <repeats 15 times>, __align = 0}
ret = 0
__FUNCTION__ = "thread_trampoline"
#8 0xb6f5b424 in start_thread (arg=0x0) at pthread_create.c:335
pd = 0x0
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {-751561765, -754497110,
-1228475296, -1090520280, 0, -1228476816, -1228476512, -1224802304, 0,
-1228474732,
0 <repeats 54 times>}, mask_was_saved = 0}}, priv = {pad =
{0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
pagesize_m1 = <optimized out>
sp = <optimized out>
freesize = <optimized out>
__PRETTY_FUNCTION__ = "start_thread"
#9 0xb6eb243c in ?? () at ../sysdeps/unix/sysv/linux/arm/clone.S:89 from /lib/arm-linux-gnueabihf/libc.so.6
No locals.
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)