Nathaniel, Thanks for the feedback. Investigations made where I acquire
GIL in spots where I already hold it lead me to suspect that the issue
is caused by use of Py_BEGIN_ALLOW_THREADS or another macro
NPY_BEGIN_ALLOW_THREADS.
to put this point to rest, I have a high degree of confidence in our
code. I have run the code in valgrind, and it runs cleanly when I use
Python's valgrind suppression file. It is certainly the case the the
error is harder to reproduce with valgrind, and it runs so slowly under
valgrind that running in a bash loop (the usual way I reproduce) is not
practical. I have examined crashes with gdb and I have verified that
they occur while I hold the GIL.
However, based on your suggestion, I experimented with acquiring GIL in
certain spots in my code where it should not be necessary either because
those code pieces are invoked by the main thread when there is only one
thread running, or because the code in question is invoked while I'm
already holding the GIL. I also added trace output to stderr in those
spots as well.
Acquiring the GIL caused deadlocks as expected, except when I used it in
new_object template (the only spot we pass data to numpy) below.
Aquiring the GIL here fixes the issues!
206 //
207 template
208 PyArrayObject *new_object(teca_variant_array_impl *varrt)
209 {
*210 PyGILState_STATE gstate;**// experimental, I already hold
the GIL higher up in stack
**211 gstate = PyGILState_Ensure();*
212 TECA_STATUS("teca_py_array::new_object");
213
214 // allocate a buffer
215 npy_intp n_elem = varrt->size();
216 size_t n_bytes = n_elem*sizeof(NT);
217 NT *mem = static_cast(malloc(n_bytes));
218 if (!mem)
219 {
220 PyErr_Format(PyExc_RuntimeError,
221 "failed to allocate %lu bytes", n_bytes);
222 return nullptr;
223 }
224
225 // copy the data
226 memcpy(mem, varrt->get(), n_bytes);
227
228 // put the buffer in to a new numpy object
229 PyArrayObject *arr = reinterpret_cast(
230 PyArray_SimpleNewFromData(1, &n_elem,
numpy_tt::code, mem));
231 PyArray_ENABLEFLAGS(arr, NPY_ARRAY_OWNDATA);
232
*233 PyGILState_Release(gstate);*
234 return arr;
235 }
now, this function is only used from within the user provided Python
callback which is invoked only while I'm holding the GIL. A fact that
I've verified via gdb stack traces. This function should be running
serially due to the fact that I hold the GIL. However, it's running
concurrently, as evidenced from the random crashes when it's used, and
the garbled stderr output, both of which go away with the above
addition. I don't think that I should have to acquire the GIL here, but
the evidence is against me!
In Python docs on the GIL, I noticed that there's a macro
Py_BEGIN_ALLOW_THREADS that Python uses internally around blocking I/O
and heavy computations. much to my surprise grep reveals
Py_BEGIN_ALLOW_THREADS is used in numpy. I think this can explain the
issues I'm experiencing. where numpy uses Py_BEGIN_ALLOW_THREADS it
would let my threads, who utilize the numpy C-API via new_object
template above, run concurrently despite the fact that I hold the GIL.
It seems to me that in order for numpy to be thread safe it should not
use this macro at all. At least this is my theory, I haven't yet had a
chance to modify numpy build to verify. It's possible that
Py_BEGIN_ALLOW_THREADS maybe used elsewhere.
here's my question: given Py_BEGIN_ALLOW_THREADS is used by numpy how
can numpy be thread safe? and how can someone using the C-API know where
it's necessary to acquire the GIL? Maybe someone can explain this?
$grep BEGIN_ALLOW_THREADS ./ -rIn
./doc/source/f2py/signature-file.rst:250: Use
``Py_BEGIN_ALLOW_THREADS .. Py_END_ALLOW_THREADS`` block
./doc/source/reference/c-api.array.rst:3092:.. c:macro::
NPY_BEGIN_ALLOW_THREADS
./doc/source/reference/c-api.array.rst:3094:Equivalent to
:c:macro:`Py_BEGIN_ALLOW_THREADS` except it uses
./build/src.linux-x86_64-2.7/numpy/core/src/multiarray/scalartypes.c:2109:
Py_BEGIN_ALLOW_THREADS
./build/src.linux-x86_64-2.7/numpy/core/src/multiarray/scalartypes.c:2125:
Py_BEGIN_ALLOW_THREADS
./build/src.linux-x86_64-2.7/numpy/core/src/multiarray/scalartypes.c:2143:
Py_BEGIN_ALLOW_THREADS
./build/src.linux-x86_64-2.7/numpy/core/src/multiarray/scalartypes.c:2159:
Py_BEGIN_ALLOW_THREADS
./build/src.linux-x86_64-2.7/numpy/core/src/multiarray/scalartypes.c:2177:
Py_BEGIN_ALLOW_THREADS
./build/src.linux-x86_64-2.7/numpy/core/src/multiarray/scalartypes.c:2193:
Py_BEGIN_ALLOW_THREADS
./build/src.linux-x86_64-2.7/numpy/core/src/multiarray/scalartypes.c:2211:
Py_BEGIN_ALLOW_THREADS
./build/src.linux-x86_64-2.7/numpy/core/src/