Hi all, I keep running into NaN gradients when I use the following dropout code in my GRU network:
    import theano
    import theano.tensor as T

    def dropout_standard(is_train, input, p, rng):
        # Random stream seeded from the supplied NumPy RandomState
        srng = T.shared_randomstreams.RandomStreams(rng.randint(999999))
        # Binary mask: keep each entry with probability 1 - p
        mask = srng.binomial(n=1, p=1 - p, size=input.shape,
                             dtype=theano.config.floatX)
        # Training: apply the mask; test: rescale by the keep probability
        return T.switch(T.eq(is_train, 1), input * mask, input * (1 - p))

From what I have found, this is the standard formulation, right? I use it in a GRU that builds sentence representations: each sentence starts as a matrix whose columns are word embeddings, and I apply dropout to zero out entries of that matrix with probability p. Google turns up some similar questions, but I haven't found a good solution. Thanks for any help!
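To make the intended behaviour concrete, here is a plain NumPy sketch of the same mask/rescale logic (illustration only, not the Theano graph; the function name `dropout_numpy` is mine): during training each entry is kept with probability 1 - p, and at test time the input is scaled by the keep probability so the expected activation matches.

```python
import numpy as np

def dropout_numpy(is_train, x, p, rng):
    """Mirror of the Theano dropout above, in plain NumPy (for illustration)."""
    if is_train:
        # Keep each entry with probability 1 - p, zero it otherwise
        mask = rng.binomial(n=1, p=1 - p, size=x.shape).astype(x.dtype)
        return x * mask
    # Test time: rescale by the keep probability
    return x * (1 - p)

rng = np.random.RandomState(0)
x = np.ones((4, 3), dtype="float32")   # toy "sentence matrix"
test_out = dropout_numpy(False, x, 0.5, rng)   # every entry scaled by 0.5
train_out = dropout_numpy(True, x, 0.5, rng)   # entries are 0 or 1
```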
