I figured it out. I was not passing the 'axes' parameter, so I had been computing a 'per-activation' mean and variance, which works poorly for convolutions. Once I set axes='spatial' for the conv layers, I started getting good results.
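For reference, here is a minimal sketch of the pattern that works for me now, rather than my exact layer code (channels, conv_out and the initial values are placeholders I made up for illustration):

    import numpy
    import theano
    import theano.tensor as T
    from theano.tensor.nnet.bn import (batch_normalization_train,
                                       batch_normalization_test)

    floatX = theano.config.floatX
    channels = 16                      # example number of feature maps
    conv_out = T.tensor4('conv_out')   # (batch, channels, rows, cols) pre-activation

    # One parameter per feature map, stored as (1, channels, 1, 1) and marked
    # broadcastable so it broadcasts over the batch and spatial axes.
    bcast = (True, False, True, True)
    gamma = theano.shared(numpy.ones((1, channels, 1, 1), dtype=floatX),
                          name='gamma', broadcastable=bcast)
    beta = theano.shared(numpy.zeros((1, channels, 1, 1), dtype=floatX),
                         name='beta', broadcastable=bcast)
    mean = theano.shared(numpy.zeros((1, channels, 1, 1), dtype=floatX),
                         name='population_mean', broadcastable=bcast)
    var = theano.shared(numpy.ones((1, channels, 1, 1), dtype=floatX),
                        name='population_var', broadcastable=bcast)

    # Training graph: 'spatial' gives one mean/variance per channel, computed
    # over the batch and spatial axes (0, 2, 3), instead of one per activation.
    train_out, _, _, new_mean, new_var = batch_normalization_train(
        inputs=conv_out, gamma=gamma, beta=beta, axes='spatial',
        running_mean=mean, running_var=var)

    # Inference graph: normalize with the stored population statistics.
    test_out = batch_normalization_test(
        inputs=conv_out, gamma=gamma, beta=beta, mean=mean, var=var,
        axes='spatial')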
Perhaps Theano should make 'spatial' the default for 4D tensor inputs and 'per-activation' the default for 2D tensor inputs.

On Saturday, February 25, 2017 at 6:47:03 PM UTC-7, Ragav Venkatesan wrote:
>
> There still seems to be a bug. It works reasonably for dot-product layers, but it fails for conv layers. All I get is NaNs.
>
> On Monday, February 20, 2017 at 9:55:27 AM UTC-7, nouiz wrote:
>>
>> BN had a bug that we fixed Friday. Can you update Theano and try again? Maybe it is already fixed.
>>
>> Fred
>>
>> On Mon, Feb 20, 2017 at 12:28 AM Ragav Venkatesan <[email protected]> wrote:
>>>
>>> My previous comments had some bugs. Here is how I use it:
>>>
>>> self.gamma = theano.shared(value=numpy.ones((1, channels, width, height), dtype=theano.config.floatX),
>>>                            name='gamma', borrow=borrow)
>>> self.beta = theano.shared(value=numpy.ones((1, channels, width, height), dtype=theano.config.floatX),
>>>                           name='beta', borrow=borrow)
>>> self.mean = theano.shared(value=numpy.ones((1, channels, width, height), dtype=theano.config.floatX),
>>>                           name='population_mean', borrow=borrow)
>>> self.var = theano.shared(value=numpy.ones((1, channels, width, height), dtype=theano.config.floatX),
>>>                          name='population_var', borrow=borrow)
>>>
>>> batch_norm_out, _, _, self.mean, self.var = batch_normalization_train(
>>>     inputs=pool_out + self.b.dimshuffle('x', 0, 'x', 'x'),
>>>     gamma=self.gamma, beta=self.beta,
>>>     running_mean=self.mean, running_var=self.var)
>>>
>>> batch_norm_inference = batch_normalization_test(
>>>     inputs=pool_out + self.b.dimshuffle('x', 0, 'x', 'x'),
>>>     gamma=self.gamma, beta=self.beta,
>>>     mean=self.mean, var=self.var)
>>>
>>> I use batch_norm_out while training and batch_norm_inference while testing.
>>>
>>> The question I still have, though, is about the running mean and variance returned by the train method. Is it all right to overwrite them the way I have done? If not, should I create an automatic update for the mean and variance, such as
>>>
>>> updates[self.mean] = (running mean returned by the train method)
>>>
>>> On Sunday, February 19, 2017 at 9:19:57 PM UTC-7, Ragav Venkatesan wrote:
>>>>
>>>> I also have a question on this. This is how I am using it at the moment for my convolutional layer (conv + pool; the input is the pre-activation pool_out):
>>>>
>>>> self.mean = theano.shared(value=numpy.zeros((channels,), dtype=theano.config.floatX),
>>>>                           name='population_mean', borrow=borrow)
>>>> self.var = theano.shared(value=numpy.zeros((nkerns,), dtype=theano.config.floatX),
>>>>                          name='population_var', borrow=borrow)
>>>>
>>>> batch_norm_out, _, _, self.mean, self.var = batch_normalization_train(
>>>>     inputs=pool_out + self.b.dimshuffle('x', 0, 'x', 'x'),
>>>>     gamma=self.gamma, beta=self.beta,
>>>>     running_mean=self.mean, running_var=self.var)
>>>>
>>>> And for inference time, I use the following:
>>>>
>>>> batch_norm_inference = batch_normalization_test(
>>>>     inputs=pool_out + self.b.dimshuffle('x', 0, 'x', 'x'),
>>>>     gamma=self.gamma, beta=self.beta,
>>>>     mean=self.mean, var=self.var)
>>>>
>>>> The question I have, though, is about the running mean and variance returned by the train method. Is it all right to overwrite them the way I have done? If not, should I create an automatic update for the mean and variance, such as
>>>>
>>>> updates[self.mean] = (running mean returned by the train method)
>>>>
>>>> On Thursday, February 16, 2017 at 8:17:24 PM UTC-7, David Leon wrote:
>>>>>
>>>>> I'm using nnet.bn.batch_normalization_train() and nnet.bn.batch_normalization_test() for batch normalization; however, during the test phase, nnet.bn.batch_normalization_test() produces wrong results. For the time being, I just use nnet.bn.batch_normalization_train() with running_average_factor set to zero for the test phase:
>>>>>
>>>>> if deterministic is False:  # train phase
>>>>>     normalized, input_mean, input_inv_std, self.mean, self.var = \
>>>>>         T.nnet.bn.batch_normalization_train(input, self.gamma, self.beta, self.axes,
>>>>>                                             self.epsilon, self.alpha, self.mean, self.var)
>>>>> else:  # test phase
>>>>>     # normalized = T.nnet.bn.batch_normalization_test(input, self.gamma, self.beta,
>>>>>     #                                                 self.mean, self.var, self.axes, self.epsilon)
>>>>>     normalized, _, _, _, _ = T.nnet.bn.batch_normalization_train(input, self.gamma, self.beta,
>>>>>                                                                  self.axes, self.epsilon, 0.0,
>>>>>                                                                  self.mean, self.var)
>>>>> return normalized
>>>>>
>>>>> My Theano version is '0.9.0beta1.dev-b2afa088d1cb416b4507348019af34adae908b73', with CUDA 8.0 and cuDNN 5.1.
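On the running mean/variance question in the quoted messages: instead of re-binding self.mean and self.var in Python, the "automatic update" option would look roughly like the following, reusing the names from the sketch at the top of this post (cost, learning_rate and params are placeholders for whatever loss and optimizer you actually use):

    # Minimal sketch: the last two outputs of batch_normalization_train are
    # ordinary symbolic expressions, so theano.function can write them back
    # into the shared variables alongside the parameter updates.
    cost = train_out.mean()        # placeholder loss, just for illustration
    learning_rate = 0.01           # placeholder step size
    params = [gamma, beta]

    updates = [(p, p - learning_rate * theano.grad(cost, p)) for p in params]
    updates += [(mean, new_mean), (var, new_var)]

    train_fn = theano.function([conv_out], cost, updates=updates)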
