[GitHub] [incubator-mxnet] eric-haibin-lin commented on issue #18772: horovod seg-fault with mxnet pip wheels

2020-07-23 Thread GitBox


eric-haibin-lin commented on issue #18772:
URL: 
https://github.com/apache/incubator-mxnet/issues/18772#issuecomment-663264152


   Thanks for the investigation, and good catch about the C++ headers. I agree: 
we need to rewrite the integration code using only the C APIs to avoid ABI 
compatibility issues.
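
   To illustrate what "only the C APIs" could mean in practice, here is a minimal sketch of a C-style boundary. It is not MXNet or Horovod code: every name in it (`DemoEngineHandle`, `DemoCallback`, `DemoEngineCreate`, `DemoEnginePush`, `DemoEngineFree`, `RunDemo`) is hypothetical. The assumption is that the boundary passes only opaque handles, plain function pointers, and integer status codes, so the caller never depends on the memory layout of C++ standard library types such as `std::function` or `std::vector`.

```cpp
// Hypothetical sketch of a C-only boundary. None of these names exist in
// MXNet or Horovod; they only illustrate the pattern.
#include <cassert>
#include <new>

extern "C" {

// Opaque handle: the caller never sees the layout of the object behind it.
typedef void* DemoEngineHandle;

// Plain function pointer plus user data, instead of std::function.
typedef void (*DemoCallback)(void* user_data);

// Status is a plain int (0 = success); no exceptions cross the boundary.
int DemoEngineCreate(DemoEngineHandle* out);
int DemoEnginePush(DemoEngineHandle handle, DemoCallback fn, void* user_data);
int DemoEngineFree(DemoEngineHandle handle);

}  // extern "C"

// ---- Library side: free to use any C++ internally. ----
namespace {
struct DemoEngine {
  int pushed = 0;  // number of callbacks executed so far
};
}  // namespace

extern "C" int DemoEngineCreate(DemoEngineHandle* out) {
  if (out == nullptr) return -1;
  DemoEngine* engine = new (std::nothrow) DemoEngine();
  if (engine == nullptr) return -1;
  *out = engine;
  return 0;
}

extern "C" int DemoEnginePush(DemoEngineHandle handle, DemoCallback fn,
                              void* user_data) {
  if (handle == nullptr || fn == nullptr) return -1;
  DemoEngine* engine = static_cast<DemoEngine*>(handle);
  fn(user_data);  // executed synchronously to keep the sketch small
  ++engine->pushed;
  return 0;
}

extern "C" int DemoEngineFree(DemoEngineHandle handle) {
  delete static_cast<DemoEngine*>(handle);
  return 0;
}

// ---- Client side: depends only on the extern "C" declarations. ----
static void AddOne(void* user_data) { ++*static_cast<int*>(user_data); }

// Pushes three callbacks through the C boundary and returns the counter.
int RunDemo() {
  DemoEngineHandle engine = nullptr;
  int status = DemoEngineCreate(&engine);
  assert(status == 0);
  int counter = 0;
  for (int i = 0; i < 3; ++i) {
    status = DemoEnginePush(engine, AddOne, &counter);
    assert(status == 0);
  }
  DemoEngineFree(engine);
  return counter;
}
```

   With this shape, the two sides can be built with different compilers or standard library settings, because nothing whose layout depends on the C++ standard library crosses the call.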



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-mxnet] eric-haibin-lin commented on issue #18772: horovod seg-fault with mxnet pip wheels

2020-07-22 Thread GitBox


eric-haibin-lin commented on issue #18772:
URL: 
https://github.com/apache/incubator-mxnet/issues/18772#issuecomment-662772210


   ```
   [1,0]:(gdb) bt
   [1,0]:#0  0x77419b80 in pthread_mutex_lock () from /lib64/libpthread.so.0
   [1,0]:#1  0x7fff68a1b81d in mxnet::engine::ThreadedVar::AppendWriteDependency(mxnet::engine::OprBlock*) ()
   [1,0]:   from /home/ec2-user/.local/lib/python3.7/site-packages/mxnet/libmxnet.so
   [1,0]:#2  0x7fff68a176ff in mxnet::engine::ThreadedEngine::Push(mxnet::engine::Opr*, mxnet::Context, int, bool) ()
   [1,0]:   from /home/ec2-user/.local/lib/python3.7/site-packages/mxnet/libmxnet.so
   [1,0]:#3  0x7fff68a147a7 in mxnet::engine::ThreadedEngine::PushAsync(std::function, mxnet::Context, std::vector > const&, std::vector > const&, mxnet::FnProperty, int, char const*, bool) ()
   [1,0]:   from /home/ec2-user/.local/lib/python3.7/site-packages/mxnet/libmxnet.so
   [1,0]:#4  0x7fff688f5f42 in MXEnginePushAsync ()
   [1,0]:   from /home/ec2-user/.local/lib/python3.7/site-packages/mxnet/libmxnet.so
   [1,0]:#5  0x7ffdcc11ace9 in horovod::mxnet::PushHorovodOperation (
   [1,0]:op_type=op_type@entry=horovod::common::Request::BROADCAST,
   [1,0]:input=input@entry=0x182fb90, output=output@entry=0x182fb90,
   [1,0]:name=name@entry=0x7ffdd5e63f20 "0.bias", priority=priority@entry=0,
   [1,0]:root_rank=root_rank@entry=0) at horovod/mxnet/mpi_ops.cc:138
   [1,0]:#6  0x7ffdcc116010 in horovod::mxnet::horovod_mxnet_broadcast_async (
   [1,0]:input=0x182fb90, output=0x182fb90, name=0x7ffdd5e63f20 "0.bias",
   [1,0]:root_rank=0, priority=0) at horovod/mxnet/mpi_ops.cc:301
   ```


