When I build a manylinux wheel for nupic.core
(https://github.com/numenta/nupic.core/pull/1001), all nupic.core and
nupic (https://github.com/numenta/nupic) tests pass on Ubuntu 14.04.
However, when I run the nupic unit tests on Ubuntu 16.04, I always get a
futex lock hang at
https://github.com/sandstorm-io/capnproto/blob/v0.5.3/c%2B%2B/src/kj/mutex.c%2B%2B#L87
(in a statically linked copy of capnproto embedded in the Python
extension .so that's part of the nupic.bindings manylinux wheel built by
nupic.core).

The extension build uses shared libs: libc.so.6, libstdc++.so.6, and
libgcc_s.so.1. It is built and run against Python 2.7.11. I use a custom
manylinux docker image that's created from a fork of manylinux that
replaces centos5 with centos6.8
(https://github.com/numenta/manylinux/pull/1) as suggested in
https://mail.python.org/pipermail/wheel-builders/2016-July/000175.html.
This image has been pushed to quay.io/numenta/manylinux1_x86_64_centos6.

The traceback to the hang looks like this:

(gdb) bt
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x00007f2e042d7d77 in kj::_::Mutex::lock (this=0x42b6610, exclusivity=<optimized out>)
    at /nupic.core/build/scripts/ThirdParty/Source/CapnProto/src/kj/mutex.c++:87
#2  0x00007f2e042a658e in kj::MutexGuarded<kj::Own<capnp::SchemaLoader::Impl> >::lockExclusive (this=0x42b6610)
    at /nupic.core/build/scripts/ThirdParty/Source/CapnProto/src/kj/mutex.h:300
#3  capnp::SchemaLoader::loadNative (this=0x42b6610, nativeSchema=0x7f2e045c1f40 <capnp::schemas::s_b414112f4b6b1b45>)
    at /nupic.core/build/scripts/ThirdParty/Source/CapnProto/src/capnp/schema-loader.c++:2069
#4  0x00007f2e04074761 in capnp::SchemaLoader::loadCompiledTypeAndDependencies<NetworkProto> (this=<optimized out>)
    at /nupic.core/build/scripts/ThirdParty/Install/include/capnp/schema-loader.h:168
#5  capnp::SchemaParser::loadCompiledTypeAndDependencies<NetworkProto> (this=<optimized out>)
    at /nupic.core/build/scripts/ThirdParty/Install/include/capnp/schema-parser.h:83
#6  nupic::getBuilder<NetworkProto> (pyBuilder=0x7f2e0a0a55f0)
    at /nupic.core/src/nupic/py_support/PyCapnp.hpp:77
#7  0x00007f2e03fcacd3 in nupic_Network_write__SWIG_2 (self=0x3253090, pyBuilder=<optimized out>)
    at /nupic.core/build/scripts/src/nupic/bindings/engine_internalPYTHON_wrap.cxx:5287
#8  0x00007f2e03ff878f in _wrap_Network_write__SWIG_2 (nobjs=nobjs@entry=2, swig_obj=swig_obj@entry=0x7ffc743cf0d0)
    at /nupic.core/build/scripts/src/nupic/bindings/engine_internalPYTHON_wrap.cxx:27690
#9  0x00007f2e03ff8c05 in _wrap_Network_write (self=0x0, args=<optimized out>)
    at /nupic.core/build/scripts/src/nupic/bindings/engine_internalPYTHON_wrap.cxx:27812
#10 0x00000000004cb26d in PyEval_EvalFrameEx ()
#11 0x00000000004c22e5 in PyEval_EvalCodeEx ()


I am going to put in additional effort to isolate the issue to a small
code footprint from the vast body of code that it's in now. However, in
the meantime, I was hoping that someone might have run into something
similar and might share some helpful clues about the issue or possibly how
to debug it efficiently.
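
As a starting point for that isolation, a harness along the following
lines might help. It is only a sketch under stated assumptions: the
function name run_with_timeout and the on_hang callback are hypothetical,
the harness itself needs Python 3.3+ for the wait() timeout (the command
it launches can still be a Python 2.7 test run), and the gdb invocation in
the comment assumes gdb is installed and allowed to attach to the process.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s, on_hang=None):
    """Run cmd and report whether it hung.

    Returns (hung, returncode). hung is True when the process was still
    running after timeout_s seconds and had to be killed.
    """
    proc = subprocess.Popen(cmd)
    try:
        proc.wait(timeout=timeout_s)
        return False, proc.returncode
    except subprocess.TimeoutExpired:
        if on_hang is not None:
            # Hypothetical hook: called with the pid while the process is
            # still alive, e.g. to run
            #   gdb -p <pid> -batch -ex "thread apply all bt"
            # and capture the backtrace of every thread.
            on_hang(proc.pid)
        proc.kill()
        proc.wait()
        return True, proc.returncode


if __name__ == "__main__":
    # Example: treat a command that sleeps too long as a "hang".
    hung, rc = run_with_timeout(
        [sys.executable, "-c", "import time; time.sleep(30)"], 1.0
    )
    print("hung" if hung else "finished", rc)
```

Running each suspect test through such a wrapper would at least separate
the tests that hang from those that complete, which narrows down the code
that has to be examined.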

Many thanks,
Vitaly

