[GitHub] [incubator-mxnet] wkcn commented on a change in pull request #17486: Update CustomOp doc with changes for GPU support

2020-02-05 Thread GitBox
wkcn commented on a change in pull request #17486: Update CustomOp doc with 
changes for GPU support
URL: https://github.com/apache/incubator-mxnet/pull/17486#discussion_r375598920
 
 

 ##
 File path: example/extensions/lib_custom_op/README.md
 ##
 @@ -146,21 +186,160 @@ Let’s take a closer look at those registry functions:
 
 * **inferShape**: This function is similar to the `inferType` function, except 
it is used for populating the output data shapes. You need to figure out the 
shape of each output tensor for this computation. For example, if the inputs 
are images with shape (224,224,3) and you write a padding operator that adds a 
5px border on each side of the image, then your output shape will be 
(234,234,3). A sketch of such a shape function follows this item.
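+A minimal sketch of a shape function for this hypothetical padding operator. It 
assumes the same shape-inference signature used by the `gemm_lib.cc` example (an 
attribute map plus input/output shape vectors); the operator itself and the 5px 
border are illustrative:
+```c++
+#include <map>
+#include <string>
+#include <vector>
+#include "lib_api.h"  // headers shared by the sketches in this document
+
+MXReturnValue padShape(std::map<std::string, std::string> attrs,
+                       std::vector<std::vector<unsigned int>> &inshapes,
+                       std::vector<std::vector<unsigned int>> &outshapes) {
+  // One image input of shape (H, W, C); the output grows by 5px on every side.
+  outshapes[0] = {inshapes[0][0] + 10, inshapes[0][1] + 10, inshapes[0][2]};
+  return MX_SUCCESS;
+}
+```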
 
-* **forward**: This function executes the main forward computation. It takes 
four arguments. The 1st argument is the attributes. The 2nd argument is the 
input `MXTensors` which stores all data and info of input ndarrays. The 3rd 
argument is the output `MXTensors`. The 4th argument is the `OpResource` object 
for memory allocation and other utilities. Additionally, you can use a 
`dltensor` tensor structure stored in the `MXTensor` as a more standardized 
data structure for computing.
+* **forward**: This function executes the main forward computation. It takes 
four arguments. The 1st argument is the attributes. The 2nd argument is the 
vector of input `MXTensor`s, which holds the data and metadata of the input 
ndarrays. The 3rd argument is the vector of output `MXTensor`s. The 4th argument 
is the `OpResource` object used for memory allocation and other utilities; its 
details are covered in a later section.
+Additionally, you can use the `dltensor` structure stored in each `MXTensor` as 
a more standardized data structure for computing. A hedged sketch of a forward 
function follows this item.
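+As a hedged sketch only (the elementwise ReLU body is illustrative, and the 
exact `MXTensor` accessors may differ across MXNet versions), a CPU `forward` 
with this four-argument layout could look like:
+```c++
+MXReturnValue forward(std::map<std::string, std::string> attrs,
+                      std::vector<MXTensor> inputs,
+                      std::vector<MXTensor> outputs,
+                      OpResource res) {
+  // Elementwise ReLU: one input tensor in, one output tensor out.
+  float* in_data = inputs[0].data<float>();
+  float* out_data = outputs[0].data<float>();
+  for (int64_t i = 0; i < inputs[0].size(); i++)
+    out_data[i] = in_data[i] > 0 ? in_data[i] : 0.0f;
+  return MX_SUCCESS;
+}
+```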
 
 * **backward**: This function performs the backward gradient computation. It 
is structured like the forward function, except that you need to work out the 
formula of the backward gradient yourself; a hedged sketch follows this item.
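+For illustration only, a backward function for the elementwise ReLU above might 
look as follows. It assumes the same four-argument signature as `forward` and 
that the framework passes the output gradient and the original input through the 
`inputs` vector, with the input gradient in `outputs`; the exact tensor layout 
depends on the operator, so treat this as a sketch:
+```c++
+MXReturnValue backward(std::map<std::string, std::string> attrs,
+                       std::vector<MXTensor> inputs,
+                       std::vector<MXTensor> outputs,
+                       OpResource res) {
+  // Assumed layout: inputs[0] = output gradient, inputs[1] = original input data.
+  float* out_grad = inputs[0].data<float>();
+  float* in_data = inputs[1].data<float>();
+  float* in_grad = outputs[0].data<float>();
+  for (int64_t i = 0; i < inputs[1].size(); i++)
+    // dL/dx = dL/dy where the input was positive, 0 elsewhere.
+    in_grad[i] = in_data[i] > 0 ? out_grad[i] : 0.0f;
+  return MX_SUCCESS;
+}
+```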
 
 * **mutateInputs**: This function is for marking mutable inputs. It takes two 
arguments. The 1st argument is the attributes. The 2nd argument is a list of 
indices of the input tensors that are mutable. It is useful when some inputs 
are auxiliary model parameters that might be altered during forward/backward 
computation. Remember that the indices added to `input_indices` must not exceed 
the number of inputs; see the short sketch below.
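+A short sketch, assuming the mutable indices are reported through a vector 
reference in the same style as the other registry callbacks (the choice of which 
input is an auxiliary state is illustrative):
+```c++
+MXReturnValue mutateInputs(std::map<std::string, std::string> attrs,
+                           std::vector<int> &input_indices) {
+  // Illustrative layout: input 0 is data, input 1 is an auxiliary running
+  // statistic that the operator updates in place, so mark it as mutable.
+  input_indices.push_back(1);
+  return MX_SUCCESS;
+}
+```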
 
-### Writing Stateful Custom Operator:
+### Writing Stateful Custom Operator
 
 A stateful custom operator is useful when a forward/backward call needs data 
or ‘state’ from previous forward/backward calls. Normally we create a class and 
let its instance variables hold the state used for computing or caching.
 
 Most of the building blocks for making a stateful custom operator are the same 
as for a regular custom operator, except that you register a `createOpState` 
function instead of a `forward` function for the computation.
 
 * [createOpState](./gemm_lib.cc#L204) - Create stateful operator instance:
 * This function takes two arguments. The 1st argument is the attributes. 
The 2nd argument is a placeholder for the `CustomStatefulOp` object. You must 
[define a class that inherits CustomStatefulOp](./gemm_lib.cc#L178) and override 
the forward function (and optionally the backward function). Then you need to 
create an instance of your class and assign it to the placeholder. In this way, 
all forward/backward calls will use the methods of that same instance, so the 
instance can keep the operator's state across calls. A sketch of such a class 
follows the signature below.
+```c++
+MXReturnValue createOpState(std::map<std::string, std::string> attrs,
+                            CustomStatefulOp** op_inst)
+```
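+A minimal sketch of such a stateful class and its `createOpState`, loosely 
following the gemm example; the class name and the counter it stores are 
illustrative:
+```c++
+class MyStatefulOp : public CustomStatefulOp {
+ public:
+  explicit MyStatefulOp(int count) : count(count) {}
+  MXReturnValue Forward(std::vector<MXTensor> inputs,
+                        std::vector<MXTensor> outputs,
+                        OpResource res) {
+    ++count;  // state carried across forward calls
+    // ... the actual computation goes here ...
+    return MX_SUCCESS;
+  }
+ private:
+  int count;  // illustrative piece of state
+};
+
+MXReturnValue createOpState(std::map<std::string, std::string> attrs,
+                            CustomStatefulOp** op_inst) {
+  *op_inst = new MyStatefulOp(0);  // hand the instance back through the placeholder
+  return MX_SUCCESS;
+}
+```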
+
+* The operator registering function will look like this:
+```c++
+REGISTER_OP(my_state_op)
+...
+.setCreateOpState(createOpState, "cpu");
+```
+
+## Writing A Custom GPU Operator Library
+
+Most of the building blocks for registering a GPU custom operator are exactly 
the same as for a CPU one, except that you need to specify the `"gpu"` context 
name when registering the `forward`, `backward`, or `createOpState` function.
+
+### Run A GPU Example
+
+For illustration purposes, we provide a `ReLU` (Rectified Linear Unit) 
activation operator that can run on a GPU. Make sure you have installed a 
CUDA-compatible MXNet build, then go to the `lib_custom_op` directory and 
follow these steps:
+
+1. Run `make relu_lib`. The Makefile invokes the `nvcc` compiler to compile the 
CUDA kernel along with the regular custom operator functions in `relu_lib.cu` 
and generates the `librelu_lib.so` library.
+2. Run `python test_relu.py`. It registers the GPU `ReLU` operator with the 
MXNet backend, invokes the operator on an `NDArray` input in the GPU context, 
and outputs the result tensor, also in the GPU context.
+
+### Writing A Regular GPU Custom Operator
+
+Since most of the building blocks for registering a GPU custom operator are 
exactly the same as for a CPU one, the registration for an operator that 
supports both GPU and CPU will look like this (the CPU and GPU function names 
below are illustrative):
+
+```c++
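+// Sketch: register one implementation per context; forwardCPU/forwardGPU and
+// backwardCPU/backwardGPU are illustrative function names.
+REGISTER_OP(my_op)
+...
+.setForward(forwardCPU, "cpu")
+.setForward(forwardGPU, "gpu")
+.setBackward(backwardCPU, "cpu")
+.setBackward(backwardGPU, "gpu");
+```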
