On Mon, 25 Oct 2010, Vladimir Kirillov wrote:
> I've been trying to test rthreads and have hit some weird races
> using simple tests:
...
> I get this segfault almost always:
>
> #0  pthread_exit (retval=0x0) at /usr/src/lib/librthread/rthread.c:223
> 223         for (clfn = thread->cleanup_fns; clfn; ) {
...
> Looks like some stupid race with two threads exiting at almost same
> time? Any ideas on tracking it down?

The problem was actually introduced during the c2k10 hackathon, where I
changed getthrid() to always add THREAD_PID_OFFSET to the proc's real pid
(which closes a race for pthread_kill)...but failed to teach fork1() to do
that too. The patch at bottom fixes this in my testing.

(I wasn't seeing this myself because I'm normally running a severely
hacked librthread that uses the platform's per-thread register to
implement pthread_self() instead of having to walk the thread list.
Sorry folks. Time to get this stuff committed...)

> By the way, is such gdb behaviour normal? Does it need any additional
> patching before being useful to debug software using rthreads?

Oh yes, it needs lots of work. I don't know if anyone has really looked
closely at this yet.


Philip Guenther


Index: sys/kern/kern_fork.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_fork.c,v
retrieving revision 1.122
diff -u -p -r1.122 kern_fork.c
--- sys/kern/kern_fork.c	26 Jul 2010 01:56:27 -0000	1.122
+++ sys/kern/kern_fork.c	30 Oct 2010 22:52:39 -0000
@@ -480,7 +480,8 @@ fork1(struct proc *p1, int exitsig, int
 	 * marking us as parent via retval[1].
 	 */
 	if (retval != NULL) {
-		retval[0] = p2->p_pid;
+		retval[0] = p2->p_pid +
+		    (flags & FORK_THREAD ? THREAD_PID_OFFSET : 0);
 		retval[1] = 0;
 	}
 	return (0);
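
A minimal user-space sketch of the mismatch described above, under stated
assumptions. It is not the librthread or kernel code: the struct, the sim_*
function names, and the numeric value of THREAD_PID_OFFSET are all
hypothetical and chosen only for illustration. The reading it illustrates is
that the id stored at thread creation (from the fork1() return value) lacked
the offset that getthrid() adds, so a list-walking pthread_self() presumably
found no match and pthread_exit() then dereferenced a NULL thread pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative value only (assumption); the real constant is in the kernel headers. */
#define THREAD_PID_OFFSET 1000000

/* Hypothetical stand-in for a librthread thread descriptor. */
struct thread {
	int tid;		/* id recorded when the thread was created */
	struct thread *next;
};

/* Models getthrid() after the c2k10 change: real pid plus the offset. */
static int
sim_getthrid(int real_pid)
{
	assert(real_pid > 0);
	return real_pid + THREAD_PID_OFFSET;
}

/*
 * Models the id handed back to the creator (retval[0] in the diff).
 * patched == 0: raw pid, as before the diff.
 * patched != 0: offset added for threads only, as in the diff.
 */
static int
sim_fork_retval(int real_pid, int is_thread, int patched)
{
	if (patched && is_thread)
		return real_pid + THREAD_PID_OFFSET;
	return real_pid;
}

/* Walks the list for the calling thread, the way a list-based pthread_self() might. */
static struct thread *
sim_self(struct thread *list, int real_pid)
{
	int tid = sim_getthrid(real_pid);
	struct thread *t;

	for (t = list; t != NULL; t = t->next)
		if (t->tid == tid)
			return t;
	return NULL;	/* a caller reading t->cleanup_fns would fault here */
}
```

With the unpatched return value the lookup comes back NULL for a thread whose
real pid is, say, 1234; with the patched value the stored id and the
getthrid() result agree and the descriptor is found.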