Speed. All those `std::string` and `std::unordered_map` objects don't come
cheap.
I compared an integrated fork with a custom operator.
https://github.com/kpuatamazon/incubator-mxnet/tree/intgemm integrated version
end-to-end Sockeye performance (based on 1.6.0):
```
real    2m57.962s
```
Custom ops should be able to set the inplace property.
--
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/17006#issuecomment-616807857
@larroy Users may need matrix operators and DNN ops (e.g. ReLU, Conv) when
writing a custom op. Although they can implement these with third-party
libraries, it is more convenient to use the built-in functions in MXNet.
--
We should create a namespace for the stuff in the lib_api.h file as suggested
by @larroy:
https://github.com/apache/incubator-mxnet/pull/15760/files#r311756416
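A minimal sketch of what this could look like, assuming a nested namespace called `mxnet::ext`. The namespace name, the `Tensor` struct, and the `rank` helper are all hypothetical placeholders chosen for illustration, not the actual contents of lib_api.h:

```cpp
#include <cassert>

// Hypothetical: everything the header exports lives inside one namespace,
// so its names cannot collide with names in the user's library.
namespace mxnet {
namespace ext {

// Placeholder type standing in for whatever the header defines.
struct Tensor {
  int ndim;
};

// Placeholder helper; `inline` keeps a header-only definition ODR-safe.
inline int rank(const Tensor& t) { return t.ndim; }

}  // namespace ext
}  // namespace mxnet

// User code is now free to define its own type with the same name.
struct Tensor {
  int dims;
};
```

With this layout a custom-op library refers to the header's types as `mxnet::ext::Tensor`, or pulls them in with a `using` declaration where that is convenient.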
--
@wkcn could you explain your suggestion? Do you mean calling gemm back into the
framework, which then gets dispatched to GPU or CPU?
--
Need to include a fix for the test error
https://github.com/apache/incubator-mxnet/pull/15921#pullrequestreview-328686634
--
Hi @samskalicky, thank you for the contribution!
I have several suggestions.
- custom GPU operators
  1. Provide CUDA stream in `OpResource`.
  2. Share the same function on CPU and GPU.
     Users can distinguish the context by `MXTensor::dltensor::ctx`
- Call framework-specific math helper
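
The idea of sharing one function across CPU and GPU could be sketched roughly as follows. This is only an illustration under stated assumptions: `DeviceType`, `Context`, `Tensor`, `OpResource`, and `Status` here are simplified, hypothetical stand-ins rather than the real lib_api.h or DLPack definitions, and the GPU branch is a stub because no CUDA code is included.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the context and tensor types (not the real API).
enum class DeviceType { kCPU, kGPU };

struct Context {
  DeviceType device_type;
  int device_id;
};

struct Tensor {
  float* data;
  std::size_t size;
  Context ctx;
};

// Hypothetical resource handle; a real one would expose the CUDA stream.
struct OpResource {
  void* cuda_stream = nullptr;
};

enum class Status { kOk, kUnsupportedDevice };

// Plain CPU implementation of ReLU.
void relu_cpu(const Tensor& in, Tensor& out) {
  for (std::size_t i = 0; i < in.size; ++i) {
    out.data[i] = in.data[i] > 0.0f ? in.data[i] : 0.0f;
  }
}

// Single entry point registered for both devices; it branches on the
// context carried by the input tensor.
Status relu_forward(const Tensor& in, Tensor& out, const OpResource& res) {
  switch (in.ctx.device_type) {
    case DeviceType::kCPU:
      relu_cpu(in, out);
      return Status::kOk;
    case DeviceType::kGPU:
      // A GPU build would launch a kernel on res.cuda_stream here.
      (void)res;
      return Status::kUnsupportedDevice;
  }
  return Status::kUnsupportedDevice;
}
```

The trade-off is that the operator author writes the device branch by hand, while the framework only has to pass the stream through the resource object.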
## Description
Request for comments on the next PR for enhancing custom operator support.
Here are some suggestions from the initial PR (Part 1):
- custom GPU operators
- Random number generator resource request
- sparse data types
- migrate lambda functions in MXLoadLib in src/c_api/c_api.cc to