Hi all, I keep running into NaN gradients when I use the following dropout code in my GRU network:
    import theano
    import theano.tensor as T

    def dropout_standard(is_train, input, p, rng):
        # Random stream seeded from the supplied NumPy RandomState
        srng = T.shared_randomstreams.RandomStreams(rng.randint(999999))
        # Binary mask: keep each entry with probability 1 - p
        mask = srng.binomial(n=1, p=1 - p, size=input.shape,
                             dtype=theano.config.floatX)
        # Training: apply the mask; test: rescale by the keep probability
        return T.switch(T.eq(is_train, 1), input * mask, input * (1 - p))

From what I have found, this is the standard formulation, right? I use it in a GRU that builds sentence representations: each sentence starts as a matrix whose columns are word embeddings, and I apply dropout to zero out entries of that matrix with probability p. Google turns up some similar questions, but I haven't found a good solution. Thanks for any help!
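To make the intended behaviour concrete, here is a plain NumPy sketch of the same mask/rescale logic (illustration only, not the Theano graph; the function name `dropout_numpy` is mine): during training each entry is kept with probability 1 - p, and at test time the input is scaled by the keep probability so the expected activation matches.

```python
import numpy as np

def dropout_numpy(is_train, x, p, rng):
    """Mirror of the Theano dropout above, in plain NumPy (for illustration)."""
    if is_train:
        # Keep each entry with probability 1 - p, zero it otherwise
        mask = rng.binomial(n=1, p=1 - p, size=x.shape).astype(x.dtype)
        return x * mask
    # Test time: rescale by the keep probability
    return x * (1 - p)

rng = np.random.RandomState(0)
x = np.ones((4, 3), dtype="float32")   # toy "sentence matrix"
test_out = dropout_numpy(False, x, 0.5, rng)   # every entry scaled by 0.5
train_out = dropout_numpy(True, x, 0.5, rng)   # entries are 0 or 1
```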
