[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #15109: [DOC] refine autograd docs
URL: https://github.com/apache/incubator-mxnet/pull/15109#discussion_r299238908
File path: docs/api/python/autograd/autograd.md

@@ -76,7 +82,63 @@ Detailed tutorials are available in Part 1 of [the MXNet gluon book](http://gluon.mxnet.io/).
+# Higher order gradient
+
+Some operators support higher order gradients. Meaning that you calculate the gradient of the
+gradient. For this the operator's backward must be differentiable as well. Some operators support
+differentiating multiple times, and others two, most just once.
+
+For calculating higher order gradients, we can use the `mx.autograd.grad` function while recording
+and then call backward, or call `mx.autograd.grad` two times. If we do the latter, is important that
+the first call uses `create_graph=True` and `retain_graph=True` and the second call uses
+`create_graph=False` and `retain_graph=True`. Otherwise we will not get the results that we want. If
+we would be to recreate the graph in the second call, we would end up with a graph of just the
+backward nodes, not the full initial graph that includes the forward nodes.
+
+The pattern to calculate higher order gradients is the following:
+
+```python
+from mxnet import ndarray as nd
+from mxnet import autograd as ag
+x = nd.array([1,2,3])
+x.attach_grad()
+def f(x):
+# Any function which supports higher oder gradient
+return nd.log(x)
+```
+
+If the operators used in `f` don't support higher order gradients you will get an error like
+`operator ... is non-differentiable because it didn't register FGradient attribute.`. This means
+that it doesn't support getting the gradient of the gradient. Which is, running backward on
+the backward graph.
+
+Using mxnet.autograd.grad multiple times:
+
+```python
+with ag.record():
+y = f(x)
+x_grad = ag.grad(heads=y, variables=x, create_graph=True, retain_graph=True)[0]
+x_grad_grad = ag.grad(heads=x_grad, variables=x, create_graph=False, retain_graph=False)[0]
+print(f"dL/dx: {x_grad}")
+print(f"d2L/dx2: {x_grad_grad}")

Review comment: As we discussed with @sxjscience, this may not be `d2L/dx2`, because the function whose gradient is taken in the second call does not necessarily have to be the loss function `L` from the first order.
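For illustration, a minimal sketch (not part of the PR) of how the labels become exact when an explicit scalar loss is differentiated; `L`, `dL_dx`, and `d2L_dx2` are illustrative names, and `nd.log` stands in for any operator with higher order gradient support:

```python
from mxnet import ndarray as nd
from mxnet import autograd as ag

x = nd.array([1.0, 2.0, 3.0])
x.attach_grad()

with ag.record():
    y = nd.log(x)
    L = y.sum()  # explicit scalar loss, so dL/dx and d2L/dx2 are well defined
    dL_dx = ag.grad(heads=L, variables=x, create_graph=True, retain_graph=True)[0]
    d2L_dx2 = ag.grad(heads=dL_dx, variables=x, create_graph=False, retain_graph=True)[0]

print(dL_dx)    # 1/x, the first order gradient of L = sum(log(x))
print(d2L_dx2)  # -1/x**2, the diagonal of the Hessian (L is element-wise separable here)
```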
[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #15109: [DOC] refine autograd docs
URL: https://github.com/apache/incubator-mxnet/pull/15109#discussion_r299238471
File path: docs/api/python/autograd/autograd.md

@@ -76,7 +82,63 @@ Detailed tutorials are available in Part 1 of [the MXNet gluon book](http://gluon.mxnet.io/).
[...]
+For calculating higher order gradients, we can use the `mx.autograd.grad` function while recording
+and then call backward, or call `mx.autograd.grad` two times. If we do the later is important that
+the first call uses `create_graph=True` and `retain_graph=True` and the second call uses
+`create_graph=False` and `retain_graph=True`. Otherwise we will not get the results that we want. If

Review comment: Please paste a computation graph here for better clarity.
[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #15109: [DOC] refine autograd docs
URL: https://github.com/apache/incubator-mxnet/pull/15109#discussion_r293483770
File path: docs/api/python/autograd/autograd.md

@@ -76,7 +82,63 @@ Detailed tutorials are available in Part 1 of [the MXNet gluon book](http://gluon.mxnet.io/).
[...]
+Using mxnet.autograd.grad multiple times:
+
+```python
+with ag.record():

Review comment: We should also give an example with just calling `x_grad.backward()` to be consistent with the first order gradient example.
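For illustration, a minimal sketch of the `x_grad.backward()` variant this comment asks for, assuming the same `x` and `f` as in the quoted snippet (so `f(x)` is `nd.log(x)`):

```python
from mxnet import ndarray as nd
from mxnet import autograd as ag

x = nd.array([1.0, 2.0, 3.0])
x.attach_grad()

with ag.record():
    y = nd.log(x)
    # create_graph=True records the backward pass so it can be differentiated again
    x_grad = ag.grad(heads=y, variables=x, create_graph=True, retain_graph=True)[0]

# Run backward on the backward graph, mirroring the first order example
x_grad.backward()
print(x_grad)  # 1/x
print(x.grad)  # -1/x**2, the second order gradient
```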
[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #15109: [DOC] refine autograd docs
URL: https://github.com/apache/incubator-mxnet/pull/15109#discussion_r293483269
File path: docs/api/python/autograd/autograd.md

@@ -76,7 +82,63 @@ Detailed tutorials are available in Part 1 of [the MXNet gluon book](http://gluon.mxnet.io/).
[...]
+```python
+with ag.record():
+y = f(x)
+x_grad = ag.grad(y, x, create_graph=True, retain_graph=True)[0]
+x_grad_grad = ag.grad(x_grad, x, create_graph=False, retain_graph=True)[0]

Review comment: Explicitly call out `heads=x_grad, variables=x`.
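For illustration, a sketch of the second call with the keyword arguments spelled out as suggested, again assuming `f` is `nd.log` and the same `x` as in the quoted snippet:

```python
from mxnet import ndarray as nd
from mxnet import autograd as ag

x = nd.array([1.0, 2.0, 3.0])
x.attach_grad()

with ag.record():
    y = nd.log(x)
    x_grad = ag.grad(heads=y, variables=x, create_graph=True, retain_graph=True)[0]
    # heads is the first order gradient itself; variables is still x
    x_grad_grad = ag.grad(heads=x_grad, variables=x, create_graph=False, retain_graph=True)[0]

print(x_grad)       # 1/x
print(x_grad_grad)  # -1/x**2
```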
[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #15109: [DOC] refine autograd docs
URL: https://github.com/apache/incubator-mxnet/pull/15109#discussion_r293482974
File path: docs/api/python/autograd/autograd.md

@@ -76,7 +82,63 @@ Detailed tutorials are available in Part 1 of [the MXNet gluon book](http://gluon.mxnet.io/).
[...]
+```python
+from mxnet import ndarray as nd
+from mxnet import autograd as ag
+x=nd.array([1,2,3])
+x.attach_grad()

Review comment: Add one line: `y_grad = nd.array([2.0, 2.0, 2.0])`
[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #15109: [DOC] refine autograd docs
URL: https://github.com/apache/incubator-mxnet/pull/15109#discussion_r293482529
File path: docs/api/python/autograd/autograd.md

@@ -76,7 +82,63 @@ Detailed tutorials are available in Part 1 of [the MXNet gluon book](http://gluon.mxnet.io/).
[...]
+```python
+from mxnet import ndarray as nd
+from mxnet import autograd as ag
+x=nd.array([1,2,3])

Review comment: Add spaces around `=`.
[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #15109: [DOC] refine autograd docs
URL: https://github.com/apache/incubator-mxnet/pull/15109#discussion_r293482399
File path: docs/api/python/autograd/autograd.md

@@ -76,7 +82,63 @@ Detailed tutorials are available in Part 1 of [the MXNet gluon book](http://gluon.mxnet.io/).
[...]
+```python
+with ag.record():
+y = f(x)
+x_grad = ag.grad(y, x, create_graph=True, retain_graph=True)[0]

Review comment: Better to specify the arguments explicitly:

```suggestion
x_grad = ag.grad(heads=y, variables=x, head_grads=y_grad, create_graph=True, retain_graph=True)[0]
```
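For illustration, a runnable sketch of what this suggestion amounts to, assuming `y_grad` is the head gradient proposed in the earlier comment (`nd.array([2.0, 2.0, 2.0])`) and `f` is `nd.log` as in the quoted snippet:

```python
from mxnet import ndarray as nd
from mxnet import autograd as ag

x = nd.array([1.0, 2.0, 3.0])
x.attach_grad()
y_grad = nd.array([2.0, 2.0, 2.0])  # head gradient dL/dy supplied by the caller

with ag.record():
    y = nd.log(x)
    x_grad = ag.grad(heads=y, variables=x, head_grads=y_grad,
                     create_graph=True, retain_graph=True)[0]

print(x_grad)  # 2/x, i.e. dy/dx scaled element-wise by the head gradient
```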
[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #15109: [DOC] refine autograd docs
URL: https://github.com/apache/incubator-mxnet/pull/15109#discussion_r293481338
File path: docs/api/python/autograd/autograd.md

@@ -76,7 +82,63 @@ Detailed tutorials are available in Part 1 of [the MXNet gluon book](http://gluon.mxnet.io/).
[...]
+x.attach_grad()
+def f(x):
+# A function which supports higher oder gradients

Review comment: Rephrase to "Any function that supports higher order gradient"?
[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #15109: [DOC] refine autograd docs
URL: https://github.com/apache/incubator-mxnet/pull/15109#discussion_r293480796
File path: docs/api/python/autograd/autograd.md

@@ -76,7 +82,63 @@ Detailed tutorials are available in Part 1 of [the MXNet gluon book](http://gluon.mxnet.io/).
[...]
+Running backward on the backward graph:
+
+```python
+with ag.record():
+y = f(x)
+x_grad = ag.grad(y, x, create_graph=True, retain_graph=True)[0]
+x_grad.backward()
+x_grad_grad = x.grad
+print(f"dy/dx: {x_grad}")
+print(f"d2y/dx2: {x_grad_grad}")
+
+```
+Both methods are equivalent, except that in the second case, retain_graph on running backward is set

Review comment: I feel using `backward()`, like in the first order gradient example, is simpler.
[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #15109: [DOC] refine autograd docs
URL: https://github.com/apache/incubator-mxnet/pull/15109#discussion_r293480408
File path: docs/api/python/autograd/autograd.md

@@ -76,7 +82,63 @@ Detailed tutorials are available in Part 1 of [the MXNet gluon book](http://gluon.mxnet.io/).
[...]
+x_grad = ag.grad(y, x, create_graph=True, retain_graph=True)[0]
+x_grad_grad = ag.grad(x_grad, x, create_graph=False, retain_graph=True)[0]
+print(f"dy/dx: {x_grad}")

Review comment: `x.grad` is not `dy/dx`. It is `dL/dx`.
[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #15109: [DOC] refine autograd docs
URL: https://github.com/apache/incubator-mxnet/pull/15109#discussion_r293480578
File path: docs/api/python/autograd/autograd.md

@@ -76,7 +82,63 @@ Detailed tutorials are available in Part 1 of [the MXNet gluon book](http://gluon.mxnet.io/).
[...]
+x_grad.backward()
+x_grad_grad = x.grad
+print(f"dy/dx: {x_grad}")
+print(f"d2y/dx2: {x_grad_grad}")

Review comment: This is not `d2y/dx2`, it is `d2L/dx2`.
[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #15109: [DOC] refine autograd docs
URL: https://github.com/apache/incubator-mxnet/pull/15109#discussion_r293479860
File path: docs/api/python/autograd/autograd.md

@@ -76,7 +82,63 @@ Detailed tutorials are available in Part 1 of [the MXNet gluon book](http://gluon.mxnet.io/).
[...]
+For calculating higher order gradients, we can use the `mx.autograd.grad` function while recording

Review comment: They could just call `backward()` instead of `mx.autograd.grad`, right?
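For illustration, a sketch of what the question points at, using the plain first-order workflow; note that, as far as I can tell, `NDArray.backward()` does not expose a `create_graph` flag, which is why `autograd.grad` is still used here for the first differentiation when a second one follows:

```python
from mxnet import ndarray as nd
from mxnet import autograd as ag

x = nd.array([1.0, 2.0, 3.0])
x.attach_grad()

# First order only: backward() is enough
with ag.record():
    y = nd.log(x)
y.backward()
print(x.grad)  # 1/x

# Second order: produce the first gradient with create_graph=True
# (backward() does not appear to take such a flag), then differentiate it
with ag.record():
    y = nd.log(x)
    x_grad = ag.grad(heads=y, variables=x, create_graph=True, retain_graph=True)[0]
x_grad.backward()
print(x.grad)  # -1/x**2
```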
[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #15109: [DOC] refine autograd docs
URL: https://github.com/apache/incubator-mxnet/pull/15109#discussion_r293479509
File path: docs/api/python/autograd/autograd.md

@@ -76,7 +82,63 @@ Detailed tutorials are available in Part 1 of [the MXNet gluon book](http://gluon.mxnet.io/).
[...]
+Some operators support higher order gradients. Meaning that you calculate the gradient of the
+gradient. For this the operator's backward must be differentiable as well. Some operators support

Review comment: The sentence "For this the operator's backward must be differentiable as well" also seems neither accurate nor necessary.
[GitHub] [incubator-mxnet] apeforest commented on a change in pull request #15109: [DOC] refine autograd docs
URL: https://github.com/apache/incubator-mxnet/pull/15109#discussion_r293479052
File path: docs/api/python/autograd/autograd.md

@@ -76,7 +82,63 @@ Detailed tutorials are available in Part 1 of [the MXNet gluon book](http://gluon.mxnet.io/).
[...]
+# Higher order gradient
+
+Some operators support higher order gradients. Meaning that you calculate the gradient of the

Review comment: I think "Meaning that you calculate the gradient of the gradient" is redundant. People who use this package should understand what a second order gradient is. Technically speaking, it is the second order gradient of a loss function with respect to the input variables.