[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #15109: [DOC] refine autograd docs

2019-07-01 Thread GitBox
apeforest commented on a change in pull request #15109: [DOC] refine autograd docs
URL: https://github.com/apache/incubator-mxnet/pull/15109#discussion_r299238908
 
 

 ##
 File path: docs/api/python/autograd/autograd.md
 ##
 @@ -76,7 +82,63 @@ Detailed tutorials are available in Part 1 of
 [the MXNet gluon book](http://gluon.mxnet.io/).
 
 
+# Higher order gradient
+
+Some operators support higher order gradients. Meaning that you calculate the gradient of the
+gradient. For this the operator's backward must be differentiable as well. Some operators support
+differentiating multiple times, and others two, most just once.
+
+For calculating higher order gradients, we can use the `mx.autograd.grad` function while recording
+and then call backward, or call `mx.autograd.grad` two times. If we do the latter, is important that
+the first call uses `create_graph=True` and `retain_graph=True` and the second call uses
+`create_graph=False` and `retain_graph=True`. Otherwise we will not get the results that we want. If
+we would be to recreate the graph in the second call, we would end up with a graph of just the
+backward nodes, not the full initial graph that includes the forward nodes.
+
+The pattern to calculate higher order gradients is the following:
+
+```python
+from mxnet import ndarray as nd
+from mxnet import autograd as ag
+x = nd.array([1,2,3])
+x.attach_grad()
+def f(x):
+    # Any function which supports higher oder gradient
+    return nd.log(x)
+```
+
+If the operators used in `f` don't support higher order gradients you will get an error like
+`operator ... is non-differentiable because it didn't register FGradient attribute.`. This means
+that it doesn't support getting the gradient of the gradient. Which is, running backward on
+the backward graph.
+
+Using mxnet.autograd.grad multiple times:
+
+```python
+with ag.record():
+    y = f(x)
+    x_grad = ag.grad(heads=y, variables=x, create_graph=True, retain_graph=True)[0]
+x_grad_grad = ag.grad(heads=x_grad, variables=x, create_graph=False, retain_graph=False)[0]
+print(f"dL/dx: {x_grad}")
+print(f"d2L/dx2: {x_grad_grad}")
 
 Review comment:
  As we discussed with @sxjscience, this may not be `d2L/dx2`, because the function being differentiated does not necessarily have to be the loss function `L` from the first order.
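For reference, a minimal sketch of what the quoted snippet computes for `f = nd.log` when no downstream loss is attached (assuming `mx.autograd.grad`'s default implicit head gradient of ones): the printed values are then the elementwise first and second derivatives of `y`, so the `dL/dx` / `d2L/dx2` labels only hold when `y` itself plays the role of the loss `L`.

```python
from mxnet import ndarray as nd
from mxnet import autograd as ag

x = nd.array([1, 2, 3])
x.attach_grad()
with ag.record():
    y = nd.log(x)
    # No head_grads given, so an implicit head gradient of ones is used:
    # x_grad = d(sum(y))/dx = 1/x elementwise.
    x_grad = ag.grad(heads=y, variables=x, create_graph=True, retain_graph=True)[0]
# Differentiates sum(x_grad) w.r.t. x, giving -1/x**2 elementwise.
x_grad_grad = ag.grad(heads=x_grad, variables=x, create_graph=False, retain_graph=False)[0]
print(x_grad)       # [1.   0.5   0.333...]
print(x_grad_grad)  # [-1.  -0.25  -0.111...]
```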




[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #15109: [DOC] refine autograd docs

2019-07-01 Thread GitBox
apeforest commented on a change in pull request #15109: [DOC] refine autograd docs
URL: https://github.com/apache/incubator-mxnet/pull/15109#discussion_r299238471
 
 

 ##
 File path: docs/api/python/autograd/autograd.md
 ##
 @@ -76,7 +82,63 @@ Detailed tutorials are available in Part 1 of
 [the MXNet gluon book](http://gluon.mxnet.io/).
 
 
+# Higher order gradient
+
+Some operators support higher order gradients. Meaning that you calculate the gradient of the
+gradient. For this the operator's backward must be as well differentiable. Some operators support
+differentiating multiple times, and others two, most just once.
+
+For calculating higher order gradients, we can use the `mx.autograd.grad` function while recording
+and then call backward, or call `mx.autograd.grad` two times. If we do the later is important that
+the first call uses `create_graph=True` and `retain_graph=True` and the second call uses
+`create_graph=False` and `retain_graph=True`. Otherwise we will not get the results that we want. If
 
 Review comment:
   Please paste a computation graph here for better clarity.




[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #15109: [DOC] refine autograd docs

2019-06-13 Thread GitBox
apeforest commented on a change in pull request #15109: [DOC] refine autograd docs
URL: https://github.com/apache/incubator-mxnet/pull/15109#discussion_r293483770
 
 

 ##
 File path: docs/api/python/autograd/autograd.md
 ##
 @@ -76,7 +82,63 @@ Detailed tutorials are available in Part 1 of
 [the MXNet gluon book](http://gluon.mxnet.io/).
 
 
+# Higher order gradient
+
+Some operators support higher order gradients. Meaning that you calculate the gradient of the
+gradient. For this the operator's backward must be differentiable as well. Some operators support
+differentiating multiple times, and others two, most just once.
+
+For calculating higher order gradients, we can use the `mx.autograd.grad` function while recording
+and then call backward, or call `mx.autograd.grad` two times. If we do the latter, is important that
+the first call uses `create_graph=True` and `retain_graph=True` and the second call uses
+`create_graph=False` and `retain_graph=True`. Otherwise we will not get the results that we want. If
+we would be to recreate the graph in the second call, we would end up with a graph of just the
+backward nodes, not the full initial graph that includes the forward nodes.
+
+The pattern to calculate higher order gradients is the following:
+
+```python
+from mxnet import ndarray as nd
+from mxnet import autograd as ag
+x=nd.array([1,2,3])
+x.attach_grad()
+def f(x):
+    # A function which supports higher oder gradients
+    return nd.log(x)
+```
+
+If the operators used in `f` don't support higher order gradients you will get an error like
+`operator ... is non-differentiable because it didn't register FGradient attribute.`. This means
+that it doesn't support getting the gradient of the gradient. Which is, running backward on
+the backward graph.
+
+Using mxnet.autograd.grad multiple times:
+
+```python
+with ag.record():
 
 Review comment:
  We should also give an example with just calling `x_grad.backward()` to be consistent with the first order gradient example.
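For reference, the variant this comment asks for could look like the sketch below (assuming the same `f`, `x`, and `ag` as in the quoted hunk; the same pattern also appears later in this thread as the "Running backward on the backward graph" example):

```python
with ag.record():
    y = f(x)
    # First order gradient, recorded so that it can be differentiated again.
    x_grad = ag.grad(heads=y, variables=x, create_graph=True, retain_graph=True)[0]
# Run backward on the backward graph; the second order gradient lands in x.grad,
# just like the first order gradient does in the basic example.
x_grad.backward()
x_grad_grad = x.grad
```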




[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #15109: [DOC] refine autograd docs

2019-06-13 Thread GitBox
apeforest commented on a change in pull request #15109: [DOC] refine autograd docs
URL: https://github.com/apache/incubator-mxnet/pull/15109#discussion_r293483269
 
 

 ##
 File path: docs/api/python/autograd/autograd.md
 ##
 @@ -76,7 +82,63 @@ Detailed tutorials are available in Part 1 of
 [the MXNet gluon book](http://gluon.mxnet.io/).
 
 
+# Higher order gradient
+
+Some operators support higher order gradients. Meaning that you calculate the gradient of the
+gradient. For this the operator's backward must be differentiable as well. Some operators support
+differentiating multiple times, and others two, most just once.
+
+For calculating higher order gradients, we can use the `mx.autograd.grad` function while recording
+and then call backward, or call `mx.autograd.grad` two times. If we do the latter, is important that
+the first call uses `create_graph=True` and `retain_graph=True` and the second call uses
+`create_graph=False` and `retain_graph=True`. Otherwise we will not get the results that we want. If
+we would be to recreate the graph in the second call, we would end up with a graph of just the
+backward nodes, not the full initial graph that includes the forward nodes.
+
+The pattern to calculate higher order gradients is the following:
+
+```python
+from mxnet import ndarray as nd
+from mxnet import autograd as ag
+x=nd.array([1,2,3])
+x.attach_grad()
+def f(x):
+    # A function which supports higher oder gradients
+    return nd.log(x)
+```
+
+If the operators used in `f` don't support higher order gradients you will get an error like
+`operator ... is non-differentiable because it didn't register FGradient attribute.`. This means
+that it doesn't support getting the gradient of the gradient. Which is, running backward on
+the backward graph.
+
+Using mxnet.autograd.grad multiple times:
+
+```python
+with ag.record():
+    y = f(x)
+    x_grad = ag.grad(y, x, create_graph=True, retain_graph=True)[0]
+x_grad_grad = ag.grad(x_grad, x, create_graph=False, retain_graph=True)[0]
 
 Review comment:
  Explicitly call out `heads=x_grad, variables=x`.
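That is, the second call spelled out with explicit keyword arguments (same names as in the quoted hunk) would read:

```python
x_grad_grad = ag.grad(heads=x_grad, variables=x, create_graph=False, retain_graph=True)[0]
```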




[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #15109: [DOC] refine autograd docs

2019-06-13 Thread GitBox
apeforest commented on a change in pull request #15109: [DOC] refine autograd docs
URL: https://github.com/apache/incubator-mxnet/pull/15109#discussion_r293482974
 
 

 ##
 File path: docs/api/python/autograd/autograd.md
 ##
 @@ -76,7 +82,63 @@ Detailed tutorials are available in Part 1 of
 [the MXNet gluon book](http://gluon.mxnet.io/).
 
 
+# Higher order gradient
+
+Some operators support higher order gradients. Meaning that you calculate the gradient of the
+gradient. For this the operator's backward must be differentiable as well. Some operators support
+differentiating multiple times, and others two, most just once.
+
+For calculating higher order gradients, we can use the `mx.autograd.grad` function while recording
+and then call backward, or call `mx.autograd.grad` two times. If we do the latter, is important that
+the first call uses `create_graph=True` and `retain_graph=True` and the second call uses
+`create_graph=False` and `retain_graph=True`. Otherwise we will not get the results that we want. If
+we would be to recreate the graph in the second call, we would end up with a graph of just the
+backward nodes, not the full initial graph that includes the forward nodes.
+
+The pattern to calculate higher order gradients is the following:
+
+```python
+from mxnet import ndarray as nd
+from mxnet import autograd as ag
+x=nd.array([1,2,3])
+x.attach_grad()
 
 Review comment:
   Add one line y_grad = nd.array([2.0, 2.0, 2.0])
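Combined with the later suggestion to pass `head_grads=y_grad`, the setup might look like this sketch (the value `[2.0, 2.0, 2.0]` is the reviewer's example; with a head gradient the first result becomes the vector-Jacobian product `y_grad * dy/dx` rather than plain `dy/dx`):

```python
x = nd.array([1, 2, 3])
x.attach_grad()
y_grad = nd.array([2.0, 2.0, 2.0])  # head gradient, e.g. dL/dy from a downstream loss

with ag.record():
    y = f(x)
    # The backward pass is weighted by y_grad: for an elementwise f this gives
    # x_grad = y_grad * f'(x).
    x_grad = ag.grad(heads=y, variables=x, head_grads=y_grad,
                     create_graph=True, retain_graph=True)[0]
```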




[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #15109: [DOC] refine autograd docs

2019-06-13 Thread GitBox
apeforest commented on a change in pull request #15109: [DOC] refine autograd docs
URL: https://github.com/apache/incubator-mxnet/pull/15109#discussion_r293482529
 
 

 ##
 File path: docs/api/python/autograd/autograd.md
 ##
 @@ -76,7 +82,63 @@ Detailed tutorials are available in Part 1 of
 [the MXNet gluon book](http://gluon.mxnet.io/).
 
 
+# Higher order gradient
+
+Some operators support higher order gradients. Meaning that you calculate the gradient of the
+gradient. For this the operator's backward must be differentiable as well. Some operators support
+differentiating multiple times, and others two, most just once.
+
+For calculating higher order gradients, we can use the `mx.autograd.grad` function while recording
+and then call backward, or call `mx.autograd.grad` two times. If we do the latter, is important that
+the first call uses `create_graph=True` and `retain_graph=True` and the second call uses
+`create_graph=False` and `retain_graph=True`. Otherwise we will not get the results that we want. If
+we would be to recreate the graph in the second call, we would end up with a graph of just the
+backward nodes, not the full initial graph that includes the forward nodes.
+
+The pattern to calculate higher order gradients is the following:
+
+```python
+from mxnet import ndarray as nd
+from mxnet import autograd as ag
+x=nd.array([1,2,3])
 
 Review comment:
   Add space between "="




[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #15109: [DOC] refine autograd docs

2019-06-13 Thread GitBox
apeforest commented on a change in pull request #15109: [DOC] refine autograd docs
URL: https://github.com/apache/incubator-mxnet/pull/15109#discussion_r293482399
 
 

 ##
 File path: docs/api/python/autograd/autograd.md
 ##
 @@ -76,7 +82,63 @@ Detailed tutorials are available in Part 1 of
 [the MXNet gluon book](http://gluon.mxnet.io/).
 
 
+# Higher order gradient
+
+Some operators support higher order gradients. Meaning that you calculate the gradient of the
+gradient. For this the operator's backward must be differentiable as well. Some operators support
+differentiating multiple times, and others two, most just once.
+
+For calculating higher order gradients, we can use the `mx.autograd.grad` function while recording
+and then call backward, or call `mx.autograd.grad` two times. If we do the latter, is important that
+the first call uses `create_graph=True` and `retain_graph=True` and the second call uses
+`create_graph=False` and `retain_graph=True`. Otherwise we will not get the results that we want. If
+we would be to recreate the graph in the second call, we would end up with a graph of just the
+backward nodes, not the full initial graph that includes the forward nodes.
+
+The pattern to calculate higher order gradients is the following:
+
+```python
+from mxnet import ndarray as nd
+from mxnet import autograd as ag
+x=nd.array([1,2,3])
+x.attach_grad()
+def f(x):
+    # A function which supports higher oder gradients
+    return nd.log(x)
+```
+
+If the operators used in `f` don't support higher order gradients you will get an error like
+`operator ... is non-differentiable because it didn't register FGradient attribute.`. This means
+that it doesn't support getting the gradient of the gradient. Which is, running backward on
+the backward graph.
+
+Using mxnet.autograd.grad multiple times:
+
+```python
+with ag.record():
+    y = f(x)
+    x_grad = ag.grad(y, x, create_graph=True, retain_graph=True)[0]
 
 Review comment:
  Better to specify the arguments explicitly:
  
```suggestion
   x_grad = ag.grad(heads=y, variables=x, head_grads=y_grad, create_graph=True, retain_graph=True)[0]
   ```




[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #15109: [DOC] refine autograd docs

2019-06-13 Thread GitBox
apeforest commented on a change in pull request #15109: [DOC] refine autograd docs
URL: https://github.com/apache/incubator-mxnet/pull/15109#discussion_r293481338
 
 

 ##
 File path: docs/api/python/autograd/autograd.md
 ##
 @@ -76,7 +82,63 @@ Detailed tutorials are available in Part 1 of
 [the MXNet gluon book](http://gluon.mxnet.io/).
 
 
+# Higher order gradient
+
+Some operators support higher order gradients. Meaning that you calculate the gradient of the
+gradient. For this the operator's backward must be differentiable as well. Some operators support
+differentiating multiple times, and others two, most just once.
+
+For calculating higher order gradients, we can use the `mx.autograd.grad` function while recording
+and then call backward, or call `mx.autograd.grad` two times. If we do the latter, is important that
+the first call uses `create_graph=True` and `retain_graph=True` and the second call uses
+`create_graph=False` and `retain_graph=True`. Otherwise we will not get the results that we want. If
+we would be to recreate the graph in the second call, we would end up with a graph of just the
+backward nodes, not the full initial graph that includes the forward nodes.
+
+The pattern to calculate higher order gradients is the following:
+
+```python
+from mxnet import ndarray as nd
+from mxnet import autograd as ag
+x=nd.array([1,2,3])
+x.attach_grad()
+def f(x):
+    # A function which supports higher oder gradients
 
 Review comment:
   Rephrase to "Any function that supports higher order gradient" ?




[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #15109: [DOC] refine autograd docs

2019-06-13 Thread GitBox
apeforest commented on a change in pull request #15109: [DOC] refine autograd docs
URL: https://github.com/apache/incubator-mxnet/pull/15109#discussion_r293480796
 
 

 ##
 File path: docs/api/python/autograd/autograd.md
 ##
 @@ -76,7 +82,63 @@ Detailed tutorials are available in Part 1 of
 [the MXNet gluon book](http://gluon.mxnet.io/).
 
 
+# Higher order gradient
+
+Some operators support higher order gradients. Meaning that you calculate the gradient of the
+gradient. For this the operator's backward must be differentiable as well. Some operators support
+differentiating multiple times, and others two, most just once.
+
+For calculating higher order gradients, we can use the `mx.autograd.grad` function while recording
+and then call backward, or call `mx.autograd.grad` two times. If we do the latter, is important that
+the first call uses `create_graph=True` and `retain_graph=True` and the second call uses
+`create_graph=False` and `retain_graph=True`. Otherwise we will not get the results that we want. If
+we would be to recreate the graph in the second call, we would end up with a graph of just the
+backward nodes, not the full initial graph that includes the forward nodes.
+
+The pattern to calculate higher order gradients is the following:
+
+```python
+from mxnet import ndarray as nd
+from mxnet import autograd as ag
+x=nd.array([1,2,3])
+x.attach_grad()
+def f(x):
+    # A function which supports higher oder gradients
+    return nd.log(x)
+```
+
+If the operators used in `f` don't support higher order gradients you will get an error like
+`operator ... is non-differentiable because it didn't register FGradient attribute.`. This means
+that it doesn't support getting the gradient of the gradient. Which is, running backward on
+the backward graph.
+
+Using mxnet.autograd.grad multiple times:
+
+```python
+with ag.record():
+    y = f(x)
+    x_grad = ag.grad(y, x, create_graph=True, retain_graph=True)[0]
+x_grad_grad = ag.grad(x_grad, x, create_graph=False, retain_graph=True)[0]
+print(f"dy/dx: {x_grad}")
+print(f"d2y/dx2: {x_grad_grad}")
+```
+
+Running backward on the backward graph:
+
+```python
+with ag.record():
+    y = f(x)
+    x_grad = ag.grad(y, x, create_graph=True, retain_graph=True)[0]
+x_grad.backward()
+x_grad_grad = x.grad
+print(f"dy/dx: {x_grad}")
+print(f"d2y/dx2: {x_grad_grad}")
+
+```
 
+Both methods are equivalent, except that in the second case, retain_graph on running backward is set
 
 Review comment:
  I feel using `backward()` as in the first order gradient example is simpler.




[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #15109: [DOC] refine autograd docs

2019-06-13 Thread GitBox
apeforest commented on a change in pull request #15109: [DOC] refine autograd docs
URL: https://github.com/apache/incubator-mxnet/pull/15109#discussion_r293480408
 
 

 ##
 File path: docs/api/python/autograd/autograd.md
 ##
 @@ -76,7 +82,63 @@ Detailed tutorials are available in Part 1 of
 [the MXNet gluon book](http://gluon.mxnet.io/).
 
 
+# Higher order gradient
+
+Some operators support higher order gradients. Meaning that you calculate the gradient of the
+gradient. For this the operator's backward must be differentiable as well. Some operators support
+differentiating multiple times, and others two, most just once.
+
+For calculating higher order gradients, we can use the `mx.autograd.grad` function while recording
+and then call backward, or call `mx.autograd.grad` two times. If we do the latter, is important that
+the first call uses `create_graph=True` and `retain_graph=True` and the second call uses
+`create_graph=False` and `retain_graph=True`. Otherwise we will not get the results that we want. If
+we would be to recreate the graph in the second call, we would end up with a graph of just the
+backward nodes, not the full initial graph that includes the forward nodes.
+
+The pattern to calculate higher order gradients is the following:
+
+```python
+from mxnet import ndarray as nd
+from mxnet import autograd as ag
+x=nd.array([1,2,3])
+x.attach_grad()
+def f(x):
+    # A function which supports higher oder gradients
+    return nd.log(x)
+```
+
+If the operators used in `f` don't support higher order gradients you will get an error like
+`operator ... is non-differentiable because it didn't register FGradient attribute.`. This means
+that it doesn't support getting the gradient of the gradient. Which is, running backward on
+the backward graph.
+
+Using mxnet.autograd.grad multiple times:
+
+```python
+with ag.record():
+    y = f(x)
+    x_grad = ag.grad(y, x, create_graph=True, retain_graph=True)[0]
+x_grad_grad = ag.grad(x_grad, x, create_graph=False, retain_graph=True)[0]
+print(f"dy/dx: {x_grad}")
 
 Review comment:
   x.grad is not dy/dx. It is dL/dx
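In other words, when `y` feeds into a downstream scalar loss `L`, backward propagates the head gradient `dL/dy`, so by the chain rule the quantity stored in `x.grad` is

$$\frac{\partial L}{\partial x} = \frac{\partial L}{\partial y}\,\frac{\partial y}{\partial x},$$

which reduces to `dy/dx` only when the head gradient is identically one, i.e. when `y` itself is treated as the loss.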




[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #15109: [DOC] refine autograd docs

2019-06-13 Thread GitBox
apeforest commented on a change in pull request #15109: [DOC] refine autograd docs
URL: https://github.com/apache/incubator-mxnet/pull/15109#discussion_r293480578
 
 

 ##
 File path: docs/api/python/autograd/autograd.md
 ##
 @@ -76,7 +82,63 @@ Detailed tutorials are available in Part 1 of
 [the MXNet gluon book](http://gluon.mxnet.io/).
 
 
+# Higher order gradient
+
+Some operators support higher order gradients. Meaning that you calculate the gradient of the
+gradient. For this the operator's backward must be differentiable as well. Some operators support
+differentiating multiple times, and others two, most just once.
+
+For calculating higher order gradients, we can use the `mx.autograd.grad` function while recording
+and then call backward, or call `mx.autograd.grad` two times. If we do the latter, is important that
+the first call uses `create_graph=True` and `retain_graph=True` and the second call uses
+`create_graph=False` and `retain_graph=True`. Otherwise we will not get the results that we want. If
+we would be to recreate the graph in the second call, we would end up with a graph of just the
+backward nodes, not the full initial graph that includes the forward nodes.
+
+The pattern to calculate higher order gradients is the following:
+
+```python
+from mxnet import ndarray as nd
+from mxnet import autograd as ag
+x=nd.array([1,2,3])
+x.attach_grad()
+def f(x):
+    # A function which supports higher oder gradients
+    return nd.log(x)
+```
+
+If the operators used in `f` don't support higher order gradients you will get an error like
+`operator ... is non-differentiable because it didn't register FGradient attribute.`. This means
+that it doesn't support getting the gradient of the gradient. Which is, running backward on
+the backward graph.
+
+Using mxnet.autograd.grad multiple times:
+
+```python
+with ag.record():
+    y = f(x)
+    x_grad = ag.grad(y, x, create_graph=True, retain_graph=True)[0]
+x_grad_grad = ag.grad(x_grad, x, create_graph=False, retain_graph=True)[0]
+print(f"dy/dx: {x_grad}")
+print(f"d2y/dx2: {x_grad_grad}")
+```
+
+Running backward on the backward graph:
+
+```python
+with ag.record():
+    y = f(x)
+    x_grad = ag.grad(y, x, create_graph=True, retain_graph=True)[0]
+x_grad.backward()
+x_grad_grad = x.grad
+print(f"dy/dx: {x_grad}")
+print(f"d2y/dx2: {x_grad_grad}")
 
 Review comment:
   This is not d2y/dx2, it is d2L/dx2




[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #15109: [DOC] refine autograd docs

2019-06-13 Thread GitBox
apeforest commented on a change in pull request #15109: [DOC] refine autograd docs
URL: https://github.com/apache/incubator-mxnet/pull/15109#discussion_r293479860
 
 

 ##
 File path: docs/api/python/autograd/autograd.md
 ##
 @@ -76,7 +82,63 @@ Detailed tutorials are available in Part 1 of
 [the MXNet gluon book](http://gluon.mxnet.io/).
 
 
+# Higher order gradient
+
+Some operators support higher order gradients. Meaning that you calculate the gradient of the
+gradient. For this the operator's backward must be differentiable as well. Some operators support
+differentiating multiple times, and others two, most just once.
+
+For calculating higher order gradients, we can use the `mx.autograd.grad` function while recording
 
 Review comment:
   They could just call backward() instead of `mx.autograd.grad`, right?
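For the first order case this comment alludes to, the plain `backward()` pattern would be something like the sketch below (same `f`, `x`, and `ag` as in the quoted hunk); for the higher order case the examples in this thread still use `ag.grad` with `create_graph=True` for the first step:

```python
with ag.record():
    y = f(x)
y.backward()      # first order gradient via backward()
x_grad = x.grad   # d(sum(y))/dx, i.e. 1/x elementwise for f = nd.log
```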




[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #15109: [DOC] refine autograd docs

2019-06-13 Thread GitBox
apeforest commented on a change in pull request #15109: [DOC] refine autograd docs
URL: https://github.com/apache/incubator-mxnet/pull/15109#discussion_r293479509
 
 

 ##
 File path: docs/api/python/autograd/autograd.md
 ##
 @@ -76,7 +82,63 @@ Detailed tutorials are available in Part 1 of
 [the MXNet gluon book](http://gluon.mxnet.io/).
 
 
+# Higher order gradient
+
+Some operators support higher order gradients. Meaning that you calculate the gradient of the
+gradient. For this the operator's backward must be differentiable as well. Some operators support
 
 Review comment:
   "For this the operator's backward must be differentiable as well". This sentence also seems neither accurate nor necessary.




[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #15109: [DOC] refine autograd docs

2019-06-13 Thread GitBox
apeforest commented on a change in pull request #15109: [DOC] refine autograd docs
URL: https://github.com/apache/incubator-mxnet/pull/15109#discussion_r293479052
 
 

 ##
 File path: docs/api/python/autograd/autograd.md
 ##
 @@ -76,7 +82,63 @@ Detailed tutorials are available in Part 1 of
 [the MXNet gluon book](http://gluon.mxnet.io/).
 
 
+# Higher order gradient
+
+Some operators support higher order gradients. Meaning that you calculate the gradient of the
 
 Review comment:
  I think "Meaning that you calculate the gradient of the gradient" is redundant. People who use this package should understand what a second order gradient is. Technically speaking, it is the second order gradient of a loss function with respect to the input variables.
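In symbols, for a scalar loss `L` of the input variables `x`, the quantity in question is

$$\frac{\partial^2 L}{\partial x^2} = \frac{\partial}{\partial x}\!\left(\frac{\partial L}{\partial x}\right).$$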

