You can try OpFromGraph with inline=True, and specify
override_gradients=... with the right expression.
This is still experimental.
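
Roughly like this, as an untested sketch built from your forward-pass snippet
(in the current development version the keyword is spelled grad_overrides
rather than override_gradients, the exact signature of the override callable
may differ, and I have not checked how the RandomStreams state behaves inside
an OpFromGraph, so please double-check against the docs):

import numpy as np
import theano
import theano.tensor as T

# Inner graph of the op: the same forward pass as in your snippet,
# but built on a symbolic input instead of the shared logits.
x = T.vector('x')
srng = T.shared_randomstreams.RandomStreams()
probs = T.nnet.softmax(x)                        # (1, 3) row matrix
index = srng.choice(size=(1,), a=3, p=probs[0])
one_hot = T.extra_ops.to_one_hot(index, x.shape[-1])

# Gradient override ("straight-through"): pretend the op is a plain softmax.
# It receives the op inputs and the gradients on the outputs and must return
# the gradients with respect to the inputs.
def straight_through_grad(inputs, output_grads):
    x_, = inputs
    g_out, = output_grads
    sm = T.nnet.softmax(x_)
    # softmax_grad works on 2-D arguments; take row 0 to get a gradient
    # with the same shape as the vector input.
    return [T.nnet.softmax_grad(g_out, sm)[0]]

stochastic_softmax = theano.OpFromGraph(
    [x], [one_hot],
    inline=True,
    grad_overrides=straight_through_grad)

# Applied to the shared logits from your snippet:
logits = theano.shared(np.random.uniform(-1, 1, size=3), 'logits')
samples = stochastic_softmax(logits)
samples_grad = T.grad(samples[0][0], logits)     # now uses the softmax gradient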

On Wed, Mar 29, 2017, nokunok...@gmail.com wrote:
> Hi guys!
> 
> I recently started using Theano and am struggling to implement a custom 
> gradient for a stochastic node. Can anyone help me?
> 
> What I want is an op that produces a one-hot vector whose hot element is 
> sampled from a softmax distribution.
> The op itself is not differentiable, but I want to "fake" its gradient as if 
> it were the softmax's gradient (a "straight-through estimator").
> Below is the minimal code for the forward pass; it raises 
> DisconnectedInputError because there is no gradient path from the samples 
> back to the logits.
> 
> 
> import theano
> import theano.tensor as T
> import numpy as np
> 
> logits_values = np.random.uniform(-1, 1, size=3)
> logits = theano.shared(logits_values, 'logits')
> probabilities = T.nnet.softmax(logits)
> print('probabilities', probabilities.eval())
> # result: probabilities [[ 0.55155489  0.290773    0.15767211]]
> 
> random_streams = T.shared_randomstreams.RandomStreams()
> index = random_streams.choice(size=(1,), a=3, p=probabilities[0])
> samples = T.extra_ops.to_one_hot(index, logits.shape[-1])
> print('samples', samples.eval())
> # result: samples [[ 1.  0.  0.]]
> 
> # We want to use the gradient of probabilities instead of samples!
> samples_grad = T.grad(samples[0][0], logits)
> # result: raises DisconnectedInputError
> 
> 
> The node is not the final layer, so I can't use a categorical cross-entropy 
> loss to train it.
> 
> I am trying to implement a custom op (see the attached stochastic_softmax.py), 
> but it is not working in practice.
> 
> Since I already have a working expression for the forward pass, can I simply 
> override the gradient of an existing expression?
> 

> import numpy as np
> import theano
> import theano.tensor as T
> 
> 
> class StochasticSoftmax(theano.Op):
>     """Sample a one-hot vector from softmax(x) in the forward pass, but
>     backpropagate the softmax gradient (straight-through estimator)."""
> 
>     def __init__(self, random_state=np.random.RandomState()):
>         self.random_state = random_state
> 
>     def make_node(self, x):
>         x = T.as_tensor_variable(x)
>         # The output has the same type (dtype and ndim) as the input.
>         return theano.Apply(self, [x], [x.type()])
> 
>     def perform(self, node, inputs, output_storage):
>         # Gumbel-max trick: argmax(x + Gumbel noise) is a sample from the
>         # categorical distribution with probabilities softmax(x).
>         x, = inputs
>         z = self.random_state.gumbel(loc=0, scale=1, size=x.shape)
>         indices = (x + z).argmax(axis=-1)
>         # Use the input dtype so the result matches the declared output type
>         # (a hard-coded np.float32 would clash with float64 inputs).
>         y = np.eye(x.shape[-1], dtype=x.dtype)[indices]
>         output_storage[0][0] = y
> 
>     def grad(self, inp, grads):
>         # Straight-through estimator: return the gradient a plain softmax
>         # would have. Note that softmax/softmax_grad operate on 2-D
>         # (row-per-example) input, so x is assumed to be a matrix here.
>         x, = inp
>         g_sm, = grads
>         sm = T.nnet.softmax(x)
>         return [T.nnet.softmax_grad(g_sm, sm)]
> 
>     def infer_shape(self, node, i0_shapes):
>         # The output has the same shape as the input.
>         return i0_shapes
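> 
> 
> # --- Hypothetical usage sketch (not part of the original attachment) ---
> # Shows how the op above is meant to be wired up; assumes a (1, n) logits
> # matrix so that softmax/softmax_grad in grad() see 2-D input.
> if __name__ == '__main__':
>     logits = theano.shared(np.random.uniform(-1, 1, size=(1, 3)), 'logits')
>     samples = StochasticSoftmax()(logits)
>     print('samples', samples.eval())
>     # With the straight-through grad() above, this no longer raises
>     # DisconnectedInputError:
>     samples_grad = T.grad(samples[0][0], logits)
>     print('samples_grad', samples_grad.eval())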


-- 
Pascal
