Bug#370060: Segmentation fault if linked with the pthread library

2006-06-18 Thread Damián Viano
On Tue, Jun 06, 2006 at 11:59:14PM +0100, David Given wrote:
 Damián Viano wrote:
 [...]
  I'm writing just to let you know that your bug report didn't fall
  into the void. I'm currently looking into this, and will report back to you
  and the bug when I have something more useful.
 
 Excellent --- thanks!
 
 I've actually found a possibly related problem, which I haven't managed to
 isolate yet; what I appear to be seeing is that an exception thrown by (some
 C++ running in) one coroutine is ignoring the coroutine's own try{} block and
 being caught by the one in co_main instead. Needless to say, this is causing
 my app to go wrong. I have managed to get identical code to behave differently
 when compiled on i386 and on ARM, but unfortunately this involves
 running the entire app, which isn't convenient. Does this ring any bells?

Yes, this is probably the same sort of error. Maybe you can reduce it
to a simpler test case? 

  For the moment I have been able to reproduce the bug on an arm4 machine
  through a friend. I hope to get direct or semi-direct access to this
  machine soon so that I can investigate further.

I've been investigating this for a while now and have some ideas and
pointers, but no solution or clear identification of the exact problem
yet. I'll keep digging.

Meanwhile, I thought I'd send you and the bug this note about my
findings.

It seems that arm lacks support for the {make,get,swap}context() calls, so
an alternative way of creating the context for the threads is used. This
uses a very clever hack with sigaltstack and a little trickery with
setjmp/longjmp to get a stack for the threads. pthread seems to use
the same trick, so that may be where the problem lies.

However, forcing the alternative stack (by commenting out
HAVE_{MAKE,GET,SWAP}CONTEXT in config.h) didn't allow me to reproduce
the bug on i386. This may be because my guess is wrong, because pthread
uses the {get,make,swap}context calls instead of this trick on i386 so there's
no clash, or because there is something specific to arm that makes
this use case fail.
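
For comparison, the {get,make,swap}context path that is available on
i386 looks roughly like the sketch below. This is only an illustration
of the ucontext calls, not pcl's implementation: demo_run_ucontext,
demo_entry and DEMO_STACK_SIZE are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

#define DEMO_STACK_SIZE (64 * 1024)  /* hypothetical size, for illustration */

static ucontext_t demo_main_ctx;     /* saved context of the caller */
static ucontext_t demo_co_ctx;       /* context of the coroutine */
static int demo_counter;

/* Coroutine body: do one step, yield, do another step, then finish. */
static void demo_entry(void)
{
    demo_counter++;
    swapcontext(&demo_co_ctx, &demo_main_ctx);   /* yield to the caller */
    demo_counter++;
    /* Returning from here resumes the context named in uc_link. */
}

/* Runs the coroutine to completion on its own stack and returns the
 * number of steps it executed, or -1 on error. */
int demo_run_ucontext(void)
{
    void *stack = malloc(DEMO_STACK_SIZE);
    assert(stack != NULL);

    demo_counter = 0;
    if (getcontext(&demo_co_ctx) != 0) {
        free(stack);
        return -1;
    }

    /* Point the new context at our own stack and entry function. */
    demo_co_ctx.uc_stack.ss_sp = stack;
    demo_co_ctx.uc_stack.ss_size = DEMO_STACK_SIZE;
    demo_co_ctx.uc_link = &demo_main_ctx;
    makecontext(&demo_co_ctx, demo_entry, 0);

    swapcontext(&demo_main_ctx, &demo_co_ctx);   /* run until the yield */
    swapcontext(&demo_main_ctx, &demo_co_ctx);   /* resume until it returns */

    free(stack);
    return demo_counter;
}
```

No signals are involved on this path, which fits the guess that a
library using these calls would not clash with another one using the
sigaltstack approach.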

I'll keep looking, because I'm still not fully convinced that this
combination cannot work even if both libraries use the same trick.

Btw, here is the backtrace from the proposed test case:

Program received signal SIGSEGV, Segmentation fault.
0x4003fc08 in __pthread_sighandler () from /lib/libpthread.so.0
(gdb) bt
#0  0x4003fc08 in __pthread_sighandler () from /lib/libpthread.so.0
#1  <signal handler called>
#2  0x400ba968 in sigsuspend () from /lib/libc.so.6
#3  0x4001edf0 in co_set_context (ctx=0x11050, func=0x4001f14c,
stkbase=0x11150 , stksiz=<value optimized out>) at pcl.c:281
#4  0x4001eea4 in co_create (func=0x8768 <switch_bench>, data=0x0,
stack=0x11050, size=<value optimized out>) at pcl.c:401
#5  0x8820 in main (argc=1, argv=0xbec555f4) at cobench.c:70

Damián Viano(Des).





Bug#370060: Segmentation fault if linked with the pthread library

2006-06-02 Thread David Given
Package: libpcl1
Version: 1.6-1

I have a situation on the ARM platform where if I compile an executable to use
both libpcl and libpthread, then even if I'm not actually using threads,
trying to do anything to a coroutine causes an immediate segmentation fault.

To replicate:

$ gcc /usr/share/doc/libpcl1-dev/examples/cobench.c -lpcl
-- produces working executable.
$ ldd ./a.out
libpcl.so.1 => /usr/lib/libpcl.so.1 (0x4001e000)
libc.so.6 => /lib/libc.so.6 (0x40029000)
/lib/ld-linux.so.2 (0x4000)

$ gcc /usr/share/doc/libpcl1-dev/examples/cobench.c -lpcl -lpthread
-- produces failing executable.
$ ldd ./a.out
libpcl.so.1 => /usr/lib/libpcl.so.1 (0x4001e000)
libpthread.so.0 => /lib/libpthread.so.0 (0x40029000)
libc.so.6 => /lib/libc.so.6 (0x40083000)
/lib/ld-linux.so.2 (0x4000)

The reason why this is an issue is that I have an application which uses
coroutines, but not pthreads, and which needs to link to the sqlite library.
sqlite was built with its threading support enabled, which means that it
tries to call mutex primitives in the pthreads library. Linking an
application against libsqlite therefore implicitly pulls in libpthread,
which then causes the application to fail.

This actually sounds suspiciously similar to a problem I previously
encountered on the i386, Debian bug 339827. To summarise: on i386 when running
a sufficiently old kernel that does not have thread-local storage, the
threading library figures out which thread is running by the location of the
stack pointer. If the application has allocated a user stack, it fails to cope
and crashes. This means that calling any thread-safe function which needs to
know what the calling thread is will fail when it is called from any user stack.
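
As a rough illustration of why a user-allocated stack confuses that
kind of lookup, here is a sketch that finds the current thread from a
stack address. It is not the actual threading library code: struct
demo_thread and demo_thread_for_sp are hypothetical, and the real
library may compute this differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one thread and the stack range it owns. */
struct demo_thread {
    int id;
    uintptr_t stack_lo;   /* lowest address of the stack, inclusive */
    uintptr_t stack_hi;   /* one past the highest address */
};

/* Returns the thread whose stack contains sp, or NULL if sp does not
 * lie in any known stack (for example a coroutine stack that the
 * application allocated itself). */
const struct demo_thread *
demo_thread_for_sp(const struct demo_thread *threads, size_t count,
                   uintptr_t sp)
{
    size_t i;

    assert(threads != NULL || count == 0);
    for (i = 0; i < count; i++) {
        if (sp >= threads[i].stack_lo && sp < threads[i].stack_hi)
            return &threads[i];
    }
    return NULL;   /* unknown stack: the caller has to cope with this */
}
```

A stack pointer inside a coroutine's private stack matches none of the
registered ranges, so any code that assumes the lookup always succeeds
would end up working with an invalid thread descriptor.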

The only workaround I've got for this is to recompile sqlite without threading
support, which is painful.

Architecture:
Linux pyanfar 2.6.15-1-nslu2 #2 Tue Mar 7 17:36:32 UTC 2006 armv5tel GNU/Linux

Debian version:
Etch (current)



