I'm trying to verify a vanilla RNN model using Theano, but I'm seeing an
error that I can't understand. Any help would be much appreciated.
Here is the code snippet for the RNN:
-----------------------------------------------------
W_xh, W_hy, W_hh, b_h, b_y = self.params
x = T.vector('x')
y = T.vector('y')

def forward_prop_step(x_t, h_t_prev, W_xh, W_hy, W_hh, b_h, b_y):
    h_t = T.tanh(W_xh[:, x_t] + T.dot(W_hh, h_t_prev) + b_h)  # (1)
    # h_t = T.tanh(T.dot(W_xh, x_t) + T.dot(W_hh, h_t_prev) + b_h)  # (2)
    o_t = T.nnet.softmax(T.dot(W_hy, h_t) + b_y)
    return [o_t[0], h_t]

h_0 = T.zeros(self.n_hidden)
[o, h], _ = theano.scan(
    forward_prop_step,
    sequences=x,
    outputs_info=[None, h_0],
    non_sequences=[W_xh, W_hy, W_hh, b_h, b_y],
    truncate_gradient=self.bptt_truncate,
    strict=True)
---------------------------------------------------
The only difference between lines (1) and (2) is that W_xh[:, x_t] is
replaced with the actual dot product T.dot(W_xh, x_t). When x_t represents a
one-hot encoded vector, the column selection in (1) works fine. But if the
input is a vector of floating-point numbers, we need to use the form in (2).
However, using (2) throws the following error:
ValueError: When compiling the inner function of scan (the function called
by scan in each of its iterations) the following error has been
encountered: The initial state (`outputs_info` in scan nomenclature) of
variable IncSubtensor{Set;:int64:}.0 (argument number 1) has 2
dimension(s), while the corresponding variable in the result of the inner
function of scan (`fn`) has 2 dimension(s) (it should be one less than the
initial state). For example, if the inner function of scan returns a vector
of size d and scan uses the values of the previous time-step, then the
initial state in scan should be a matrix of shape (1, d). The first
dimension of this matrix corresponds to the number of previous time-steps
that scan uses in each of its iterations. In order to solve this issue if
the two varialbe currently have the same dimensionality, you can increase
the dimensionality of the variable in the initial state of scan by using
dimshuffle or shape_padleft.
It seems there is some problem with the initialization of 'h_0'.
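
To make the relationship between (1) and (2) concrete, here is a minimal sketch in NumPy rather than Theano. The sizes and the index are made up for illustration and are not from my model. It checks that selecting a column matches a dot product with the corresponding one-hot vector, and it also shows what NumPy does when the second operand of the dot product is a scalar, which is what each x_t would be if x is a 1-D vector. I am assuming T.dot treats a 0-d operand in a similar way, which might be connected to the dimension complaint above.

```python
import numpy as np

n_hidden, n_in = 3, 4  # illustrative sizes only
W_xh = np.arange(n_hidden * n_in, dtype=float).reshape(n_hidden, n_in)

# (1) column selection with an integer index
idx = 2
col = W_xh[:, idx]

# (2) dot product with the matching one-hot vector
one_hot = np.zeros(n_in)
one_hot[idx] = 1.0
dot_vec = np.dot(W_xh, one_hot)

# Both expressions pick out the same column of W_xh
same_column = np.allclose(col, dot_vec)

# With a scalar operand, np.dot scales the whole matrix instead of
# returning a vector, so the result keeps two dimensions
dot_scalar = np.dot(W_xh, 0.5)

print(same_column, col.ndim, dot_vec.ndim, dot_scalar.ndim)
```

If the same holds in Theano, the hidden state computed by (2) would be a matrix rather than a vector whenever x_t is a scalar.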
My questions are:
1. Is there a workaround for this issue, for example a different way to
initialize 'h_0'?
2. Why does it work for the one-hot encoded vector in line (1) but not in
line (2)?
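
For reference, this is the recurrence I expect (2) to compute, written as a plain NumPy loop instead of theano.scan. It is only a sketch under my own assumptions: the sizes and random weights are made up, and the input sequence is stored as a matrix with one real-valued vector per time step, so each x_t is a vector and the hidden state stays one-dimensional.

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps, n_in, n_hidden = 5, 4, 3  # illustrative sizes only

W_xh = rng.normal(size=(n_hidden, n_in))
W_hh = rng.normal(size=(n_hidden, n_hidden))
b_h = np.zeros(n_hidden)

# One row per time step; each row is a real-valued input vector
x = rng.normal(size=(n_steps, n_in))

h = np.zeros(n_hidden)  # plays the role of h_0, a 1-D vector
hs = []
for x_t in x:
    # Same update as line (2): every term is a vector of size n_hidden
    h = np.tanh(np.dot(W_xh, x_t) + np.dot(W_hh, h) + b_h)
    hs.append(h)

# Stacking the per-step states adds the time dimension in front
hs = np.stack(hs)
print(hs.shape)
```

Here the stacked states have one more dimension than a single state, which matches the rule quoted in the error message. The analogous change in Theano would presumably be to declare x with T.matrix('x') instead of T.vector('x'), but I have not confirmed that this resolves the error.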