Hey there, I'm not sure this is the correct list considering the level of internal Python knowledge it likely requires. If I should take this to another list, please let me know.
I have written an application that links against libpython and starts multiple interpreters within one thread. Beyond that, it actually links against libpython2.7 *and* libpython3.4 (though I doubt that makes a difference here). In addition to launching my own interpreters, I created a custom C module that can be called from Python (and is loaded during my own init function).

The problem is a bit weird and not easy to reproduce. I launch two types of instances; both run the same C code but diverge once the interpreter is running. The problem happens only if I start 8 instances of type A and at least one of type B. All instances of type A start before any instance of type B starts. On the first start of type B the program segfaults. Decreasing the number of type A instances to 7 yields normal execution.

I could now go along and post a lot of code here and explain the complex setup, but I think it is much more helpful if I can get hints on how to debug this myself.

An excerpt of the traceback (the full one can be found here: http://pastebin.com/unAaG6qi):

#0 0x00007f922096cd6a in visit_decref (op=0x7f921d801518 <__hoisted_globals+216>, data=0x0) at Modules/gcmodule.c:360
#1 0x00007f92208aadf2 in meth_traverse (m=0x7f921d4f0d40, visit=0x7f922096cd52 <visit_decref>, arg=0x0) at Objects/methodobject.c:166
#2 0x00007f922096ce2c in subtract_refs (containers=0x7f9220c108e0 <generations+96>) at Modules/gcmodule.c:385
#3 0x00007f922096dac9 in collect (generation=0x2) at Modules/gcmodule.c:925
#4 0x00007f922096de0a in collect_generations () at Modules/gcmodule.c:1050
#5 0x00007f922096e939 in _PyObject_GC_Malloc (basicsize=0x20) at Modules/gcmodule.c:1511
#6 0x00007f922096e9d9 in _PyObject_GC_NewVar (tp=0x7f9220bf1540 <PyTuple_Type>, nitems=0x1) at Modules/gcmodule.c:1531
#7 0x00007f92208c7681 in PyTuple_New (size=0x1) at Objects/tupleobject.c:90
...
#58 0x00007f9220953ff7 in initsite () at Python/pythonrun.c:726
#59 0x00007f9220953d94 in Py_NewInterpreter () at Python/pythonrun.c:621
#60 0x00007f921d60024b in prepare_interpreter (argc=0x9, argv=0x387ee70, m=0x3861e80) at /home/javex/Thesis/src/shadow-plugin-extras/python/src/python.c:19
#61 0x00007f921d5ff925 in python_new (argc=0x9, argv=0x387ee70, log=0x421420 <_thread_interface_log>) at /home/javex/Thesis/src/shadow-plugin-extras/python/src/python.c:160

The exact point of the segfault is this:

355 /* A traversal callback for subtract_refs. */
356 static int
357 visit_decref(PyObject *op, void *data)
358 {
359     assert(op != NULL);
360     if (PyObject_IS_GC(op)) {    <---- segfault here
361         PyGC_Head *gc = AS_GC(op);
362         /* We're only interested in gc_refs for objects in the
363          * generation being collected, which can be recognized
364          * because only they have positive gc_refs.

With "op" being:

gdb$ print *op
$4 = {ob_refcnt = 0x1, ob_type = 0x0}

Now I vaguely remember having seen this error before. It *might* have been that I was returning Py_None without incrementing its reference count (while it was later being decref'ed). That case is now fixed, and I don't return Py_None anywhere else.

So the big question is: how do I find out what is happening here? From what I can gather, it looks like the GC is cleaning up, so I guess I have a refcount wrong somewhere. But which one? Where? Can I debug this?

In case it helps, my source code is also available: https://github.com/Javex/shadow-plugin-extras/tree/master/python/src

The files python-logging.c, python.c and python-plugin.c are in use, with python.c having most of the magic and python-logging.c being my own module. Beyond that there's only Python code, no more C of my own.

Thanks in advance for any help you can provide.

Kind regards,
Florian
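
The refcount suspicion above can be checked from the Python side. The following is only a minimal sketch and not code from the repository: the helper name refcount_drift and its parameters are made up for illustration. It calls a function many times and reports how much the reference count of a watched object (None by default) changed. A steadily negative result would point at C code that hands back a reference it never owned, such as returning Py_None without Py_INCREF. Optionally it forces a collection after every call, so that a corrupted object would crash the collector close to the call that damaged it rather than many allocations later.

```python
import gc
import sys


def refcount_drift(func, watched=None, calls=1000, collect_each_call=False):
    """Call func() repeatedly and return the change in watched's refcount.

    Hypothetical helper for illustration. A negative result suggests that
    func() (or the C code behind it) drops a reference it does not own.
    """
    before = sys.getrefcount(watched)
    for _ in range(calls):
        func()  # the return value is discarded immediately
        if collect_each_call:
            # Run the cyclic collector right away so that damage done by
            # func() surfaces near this call instead of much later.
            gc.collect()
    after = sys.getrefcount(watched)
    return after - before
```

It would be called with a function from the extension module, for example refcount_drift(some_module.some_function), where some_module.some_function is a placeholder. Note that from CPython 3.12 on None is immortal, so a drift on None will not show up there; on 2.7 and 3.4 the count is a real one.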