Hi all,
I am trying to implement a vanilla seq2seq model with GRUs. The code works
fine without mini-batching, but after incorporating mini-batches the model
never converges, so something is definitely wrong. I suspect it is either
the part where I try to ignore padded tokens or the way I compute the cost.
Here is the relevant decoder code (the encoder is straightforward):
def recurrence(msk, h_tm_prev, y_tm_prev):
    x_z = T.dot(self.emb[y_tm_prev], self.W_z) + self.b_z
    x_r = T.dot(self.emb[y_tm_prev], self.W_r) + self.b_r
    x_h = T.dot(self.emb[y_tm_prev], self.W) + T.dot(self.enc_h, self.c_h) + self.bh
    z_t = self.inner_activation(x_z + T.dot(h_tm_prev, self.U_z))
    r_t = self.inner_activation(x_r + T.dot(h_tm_prev, self.U_r))
    hh_t = self.activation(x_h + T.dot(r_t * h_tm_prev, self.U))
    h_t = (T.ones_like(z_t) - z_t) * hh_t + z_t * h_tm_prev
    # needed to back-propagate errors
    y_d_t = T.dot(h_t, self.V) + T.dot(self.enc_h, self.c_y) + \
            T.dot(self.emb[y_tm_prev], self.y_t1) + self.by
    # ignore padded tokens -- is this correct?
    y_d_t = T.batched_dot(y_d_t, msk)
    # y_d_t = y_d_t * msk.dimshuffle(0, 'x')
    y_d = T.clip(T.nnet.softmax(y_d_t), 0.0001, 0.9999)
    y_t = T.argmax(y_d, axis=1)
    return h_t, y_d, T.cast(y_t.flatten(), 'int32')
[_, y_dist, y], _ = theano.scan(
    fn=recurrence,
    # ugly, but we have to go till the end (will go till max_len)
    sequences=mask.dimshuffle(1, 0),
    outputs_info=[T.alloc(self.h0, self.enc_h.shape[0], hidden_dim),
                  None,
                  T.alloc(self.y0, self.enc_h.shape[0])]
)
self.y = y.dimshuffle(1, 0)
self.y_dist = y_dist.dimshuffle(1, 0, 2)
def negative_log_likelihood(self, y):
    def compute_cost(y_dist, target):
        return T.sum(T.nnet.categorical_crossentropy(y_dist, target))
    batched_cost, _ = theano.scan(
        fn=compute_cost,
        sequences=[self.y_dist, y],
        outputs_info=None
    )
    return T.mean(batched_cost)
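For reference, here is what a single step of the recurrence above looks like in plain NumPy, as a shape-checking sketch only: the weights are random stand-ins, I'm assuming inner_activation = sigmoid and activation = tanh, and I use the elementwise-mask variant (the commented-out dimshuffle line), since batched_dot has no direct NumPy analogue here:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

batch, hid, vocab = 4, 8, 10
rng = np.random.RandomState(0)

# stand-ins for the shared variables of the decoder
emb = rng.randn(vocab, hid)
W_z, W_r, W = (rng.randn(hid, hid) for _ in range(3))
U_z, U_r, U = (rng.randn(hid, hid) for _ in range(3))
c_h = rng.randn(hid, hid)
V, c_y, y_t1 = (rng.randn(hid, vocab) for _ in range(3))
b_z = b_r = bh = np.zeros(hid)
by = np.zeros(vocab)

h_tm_prev = np.zeros((batch, hid))            # previous hidden state
y_tm_prev = np.zeros(batch, dtype=np.int64)   # previous output tokens
enc_h = rng.randn(batch, hid)                 # encoder summary, (batch, hid)
msk = np.array([1.0, 1.0, 1.0, 0.0])          # last sequence is padding here

# one GRU step, mirroring recurrence() above
x_z = emb[y_tm_prev] @ W_z + b_z
x_r = emb[y_tm_prev] @ W_r + b_r
x_h = emb[y_tm_prev] @ W + enc_h @ c_h + bh
z_t = sigmoid(x_z + h_tm_prev @ U_z)
r_t = sigmoid(x_r + h_tm_prev @ U_r)
hh_t = np.tanh(x_h + (r_t * h_tm_prev) @ U)
h_t = (1.0 - z_t) * hh_t + z_t * h_tm_prev

y_d_t = h_t @ V + enc_h @ c_y + emb[y_tm_prev] @ y_t1 + by
# zeroing the logits of the padded row makes its softmax uniform (1/vocab)
y_d = softmax(y_d_t * msk[:, None])

assert h_t.shape == (batch, hid) and y_d.shape == (batch, vocab)
```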
Dimensions of the relevant variables: mask -> (batch_size, max_len),
enc_h -> (batch_size, hidden_dim), X, Y -> (batch_size, max_len).
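To make the masking question concrete, here is a tiny NumPy sketch of the two options I am weighing: zeroing the logits before the softmax (what the code above does via the mask) versus masking the per-token cross-entropy inside the cost instead. All values are toy stand-ins:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

vocab = 5
logits = np.array([[2.0, -1.0, 0.5, 0.3, -0.7],   # real token
                   [1.2,  0.4, -2.0, 0.0, 0.9]])  # padded position
msk = np.array([1.0, 0.0])
targets = np.array([0, 0])

# option 1: zero the logits before the softmax
p1 = softmax(logits * msk[:, None])
# the padded row collapses to a uniform distribution ...
assert np.allclose(p1[1], 1.0 / vocab)
# ... so it still contributes -log(1/vocab) > 0 to the summed cost
ce1 = -np.log(p1[np.arange(2), targets])

# option 2: leave the logits alone, mask the per-token cost instead
p2 = softmax(logits)
ce2 = -np.log(p2[np.arange(2), targets]) * msk
assert ce2[1] == 0.0  # padded term contributes exactly nothing
```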
Any clues would be highly appreciated. Thanks!
If someone is interested, the complete code is here:
<https://github.com/uyaseen/neural-converse>
--
---
You received this message because you are subscribed to the Google Groups
"theano-users" group.