gamma and beta should be shared variables. They should not be reinitialized
on every call; instead, update them through the `updates` argument of your
theano function (see the documentation or examples on using updates).
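For concreteness, a minimal sketch of creating them once, outside the
training loop (n_features is a placeholder for the width of your layer):

import numpy as np
import theano

n_features = 256  # placeholder: width of the layer being normalized
# Created once and reused across updates; ones for gamma and zeros for
# beta is the usual batch-norm initialization.
gamma = theano.shared(np.ones(n_features, dtype=theano.config.floatX), name='gamma')
beta = theano.shared(np.zeros(n_features, dtype=theano.config.floatX), name='beta')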
Here is how I use batch_normalization:
(assuming X is the input to a particular layer, X.ndim == 2, and gamma and
beta are shared variable vectors of length X.shape[1])
import theano.tensor as T
from theano.tensor.nnet.bn import batch_normalization

# per-feature statistics over the batch dimension
mean = X.mean(0, keepdims=True)
std = T.sqrt(X.var(0, keepdims=True) + 1e-6)  # epsilon for numerical stability
X_bn = batch_normalization(inputs=X, gamma=gamma, beta=beta,
                           mean=mean, std=std)
Typically, gamma and beta are learned through backprop / gradient descent,
though it probably wouldn't be incorrect to update them toward the
statistics of that particular layer instead, e.g. `updates = [(gamma, 0.9 *
gamma + 0.1 * X.mean(0))]`.
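For the gradient-descent route, a minimal sketch continuing from the
snippet above (assuming X is the symbolic input; cost is just a stand-in
for your real training loss and lr is a hypothetical learning rate):

cost = X_bn.sum()  # stand-in for an actual loss
lr = 0.01  # hypothetical learning rate
grads = T.grad(cost, [gamma, beta])
# pairs of (shared variable, new value); each call to train_fn
# updates gamma and beta in place
updates = [(p, p - lr * g) for p, g in zip([gamma, beta], grads)]
train_fn = theano.function([X], cost, updates=updates)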