On 02/12/2016 03:08 PM, Charles Kiorpes wrote:
>
>
> On Fri, Feb 12, 2016 at 5:43 AM, Philippe Gerum wrote:
>
> On 02/11/2016 01:57 PM, Charles Kiorpes wrote:
> >
> > I attempted to run several tests: 'task-1', 'event-1', and 'mutex-1'.
> > Each of these hung indefinitely. A gdb trace indicated that they were
> > hanging on __libc_do_syscall() within __pthread_cond_wait() within
> > threadobj_cond_wait().
> >
> > I have attached the full backtrace from mutex-1 as mutex-1_bt.txt
> >
>
> Ok, if the test suite does not pass, something is badly wrong, so we
> should investigate that hang issue before anything else.
>
> The backtrace reveals that copperplate cannot handshake with a newly
> spawned task, this is the purpose of the wait_on_barrier() call over the
> context of rt_task_start(). That barrier should be signaled by a call to
> threadobj_notify_entry() from the internal trampoline code of the
> emerging thread (task_entry() in alchemy/task.c).
>
> - maybe task_prologue_2() (alchemy/task.c) which is called earlier hangs
> indefinitely, and therefore prevents threadobj_notify_entry() from
> running?
>
> - maybe the new thread does not even start for some reason, are we sure
> task_entry() is reached (e.g. do we hit a breakpoint there?)
>
> Could you inspect the current thread list under gdb when the program
> hangs?
>
> Also, I would recommend to enable full debugging for now
> (--enable-debug=full) to get accurate line information, assuming the
> issue should still show up with a non-optimized code. Hopefully.
>
> --
> Philippe.
>
>
> I ran the task-1 test under gdb with this Xenomai configuration:
> --with-core=mercury \
> --enable-debug=full \
> --enable-registry \
> --enable-smp \
> --enable-pshared \
> --enable-condvar-workaround
>
> It appears that the new thread is being launched, and getting stuck in
> threadobj_wait_start() within task_prologue_2(), as you indicated might
> be the case.
> I have attached the thread list and a full backtrace for each thread (in
> separate files by thread id).
>
> As per your other message, my kernel configs all include CONFIG_FUTEX.
>
> I have tried glibc 2.19 and 2.21, as well as RT patched and vanilla kernels.
>
> Interestingly, when I removed --enable-pshared from my configuration,
> the task-1 test passed.
>

Here is the sync pattern the code normally implements once the parent has
successfully spawned a child thread, which has to wait for a start signal
before it may run application code:

1. parent calls threadobj_start(child)
   1.1 child->status |= __THREAD_S_STARTED
   1.2 wait for child->status & __THREAD_S_ACTIVE

2. child calls threadobj_wait_start(self)
   2.1 wait for self->status & __THREAD_S_STARTED
   2.2 raise self->status |= __THREAD_S_ACTIVE

All accesses to the status bits are serialized by a per-thread mutex, operated
by the threadobj_lock/unlock accessors, which also cover the condvar
signaling/waiting as one would expect.
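
For illustration only, the handshake described above can be sketched with
plain pthreads. This is not the copperplate code: struct thread_desc,
wait_for_bits(), run_handshake() and the S_STARTED/S_ACTIVE values are
hypothetical stand-ins for the thread descriptor, wait_on_barrier() and the
__THREAD_S_* bits, and a single condvar is assumed to serve both directions.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for __THREAD_S_STARTED / __THREAD_S_ACTIVE. */
#define S_STARTED 0x1
#define S_ACTIVE  0x2

/* Hypothetical descriptor: status word, mutex and barrier condvar. */
struct thread_desc {
	pthread_mutex_t lock;
	pthread_cond_t barrier;
	int status;
};

/* Sleep until any bit of mask is set. The caller must hold td->lock. */
static int wait_for_bits(struct thread_desc *td, int mask)
{
	while (!(td->status & mask))
		pthread_cond_wait(&td->barrier, &td->lock);
	return td->status;
}

/* Step 2: the child waits for the start signal, then reports activity. */
static void *child_entry(void *arg)
{
	struct thread_desc *td = arg;

	pthread_mutex_lock(&td->lock);
	wait_for_bits(td, S_STARTED);		/* 2.1 */
	td->status |= S_ACTIVE;			/* 2.2 */
	pthread_cond_broadcast(&td->barrier);
	pthread_mutex_unlock(&td->lock);
	return NULL;
}

/* Step 1: the parent starts the child and waits for it to become active. */
int run_handshake(void)
{
	struct thread_desc td;
	pthread_t child;
	int ret, status;

	td.status = 0;
	ret = pthread_mutex_init(&td.lock, NULL);
	assert(ret == 0);
	ret = pthread_cond_init(&td.barrier, NULL);
	assert(ret == 0);
	ret = pthread_create(&child, NULL, child_entry, &td);
	assert(ret == 0);

	pthread_mutex_lock(&td.lock);
	td.status |= S_STARTED;			/* 1.1 */
	pthread_cond_broadcast(&td.barrier);
	status = wait_for_bits(&td, S_ACTIVE);	/* 1.2 */
	pthread_mutex_unlock(&td.lock);

	pthread_join(child, NULL);
	pthread_cond_destroy(&td.barrier);
	pthread_mutex_destroy(&td.lock);
	return status;
}
```

Both sides test the status word under the lock before sleeping, so the order
in which parent and child reach the handshake should not matter. A hang in
such a scheme means either the bit never becomes visible to the waiter or the
wakeup never arrives.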

When running in pshared mode, thread descriptors (holding ->status, the mutex
and the barrier sync) are obtained from /dev/shm. With --disable-pshared, we
use 100% process-private memory.
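
As a point of comparison, here is a minimal sketch of how POSIX expects a
mutex and a condvar living in shared memory to be set up. It only shows the
generic pthread requirement (the PTHREAD_PROCESS_SHARED attribute); it makes
no claim about how copperplate initializes its own objects. struct
shared_sync, alloc_shared_sync() and init_shared_sync() are hypothetical
names, and an anonymous shared mapping stands in for the /dev/shm area.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical pair of sync objects meant to live in shared memory. */
struct shared_sync {
	pthread_mutex_t lock;
	pthread_cond_t cond;
};

/* Map a shared, anonymous area standing in for a /dev/shm segment. */
struct shared_sync *alloc_shared_sync(void)
{
	void *p = mmap(NULL, sizeof(struct shared_sync),
		       PROT_READ | PROT_WRITE,
		       MAP_SHARED | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}

/* Initialize both objects as process-shared. Returns 0 or an errno value. */
int init_shared_sync(struct shared_sync *s)
{
	pthread_mutexattr_t mattr;
	pthread_condattr_t cattr;
	int ret;

	ret = pthread_mutexattr_init(&mattr);
	if (ret)
		return ret;
	ret = pthread_mutexattr_setpshared(&mattr, PTHREAD_PROCESS_SHARED);
	if (ret == 0)
		ret = pthread_mutex_init(&s->lock, &mattr);
	pthread_mutexattr_destroy(&mattr);
	if (ret)
		return ret;

	ret = pthread_condattr_init(&cattr);
	if (ret)
		goto fail_mutex;
	ret = pthread_condattr_setpshared(&cattr, PTHREAD_PROCESS_SHARED);
	if (ret == 0)
		ret = pthread_cond_init(&s->cond, &cattr);
	pthread_condattr_destroy(&cattr);
	if (ret)
		goto fail_mutex;

	return 0;

fail_mutex:
	pthread_mutex_destroy(&s->lock);
	return ret;
}
```

Running a small lock/wait/signal loop on objects set up this way, on the
affected board, might help tell whether the problem sits in the C library's
process-shared primitives or higher up in the stack.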

Possible explanations for the hang:

Case 1: a race when manipulating the thread status due to inconsistent locking.
I could not find any so far.

Case 2: a cache coherence issue on SMP, also caused by improper locking.
If the locking is correct, it should enforce memory barriers as expected.

Case 3: anything not mentioned in the other cases...

- Could you copy/paste the disassembly (objdump -dl rather than gdb's disass)
  of the wait_on_barrier() function?

- Does running both programs with --cpu-affinity=0/1 change the outcome?

- Without specifying any affinity this time, could you run the current test
  with the debug patch below applied (this is clearly not a fix)? The patch
  forces the code to read the value of the ->status field before waiting on
  the barrier. With that code in and a backtrace showing locals, we should be
  able to check the status word when threadobj_wait_start() is entered.

diff --git a/lib/copperplate/threadobj.c b/lib/copperplate/threadobj.c
index cc64caa..ed85a12 100644
--- a/lib/copperplate/threadobj.c
+++ b/lib/copperplate/threadobj.c
@@ -1273,7 +1273,9 @@ void threadobj_wait_start(void) /* current->lock free. */
 	int status;
 
 	threadobj_lock(current);
-	status = wait_on_barrier(current, __THREAD_S_STARTED|__THREAD_S_ABORTED);
+	status = current->status;
+	if (!(status & __THREAD_S_STARTED))
+		status = wait_on_barrier(current, __THREAD_S_STARTED|__THREAD_S_ABORTED);
 	threadobj_unlock(current);
 
 	/*
--
Philippe.