Do you have the same problem with float64? If not, this is a problem of numerical stability.
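
A quick way to check (a minimal sketch; your_script.py and the shapes below are just placeholders): force double precision before any part of the graph is built, either from the shell with

THEANO_FLAGS=floatX=float64 python your_script.py

or at the top of the script:

import numpy
import theano

theano.config.floatX = 'float64'

# Shared variables and constants should follow floatX so nothing
# silently stays float32 (shape here is a placeholder):
V = theano.shared(numpy.zeros((1000, 50), dtype=theano.config.floatX),
                  name='V')

If the NaNs go away in float64, that points to a precision problem rather than a bug in the graph.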
On 18 Sept 2016 17:49, <[email protected]> wrote:

> An update: this apparently has nothing to do with softplus. I've
> drastically simplified my network and removed many components (reducing
> the loss function to a simple quadratic) and I still get the same error.
> The issue seems to come from how I'm doing sub-tensor indexing (which
> obviously should not introduce a NaN). I'll try to generate some simple
> sample code that can reproduce the error.
>
> On Sunday, September 18, 2016 at 10:14:31 PM UTC+2, [email protected]
> wrote:
>>
>> I've been banging my head against this problem for several hours, so I
>> wanted to make sure one of my assumptions is not flawed (and hopefully
>> get some advice).
>>
>> I have a relatively simple network that is entirely linear except for
>> the loss function, which is a sum of many softpluses. A snippet of the
>> code is (here a_emb is a matrix and bias a scalar):
>>
>> z = bias - (a_emb - b_emb).norm(2, axis=1)
>> if clip:
>>     z = z.clip(-bound, bound)
>> L0 = -T.nnet.softplus(-z)
>>
>> L0 is one contribution to the loss. There are a few more from
>> higher-rank tensors that look like this (zn is a 3-tensor):
>>
>> L1 = T.sum(-T.nnet.softplus(zn), axis=1)
>>
>> The only real complexity in the network is the use of subtensor
>> indexing. Basically, I'm training a very large embedding model, so to
>> avoid updating the whole matrix I take all inputs (e.g. indices
>> corresponding to "a_emb" and "b_emb" above), put them in a subtensor,
>> and then extract them out again (by subindexing). I then only update
>> the subtensor via something like this:
>>
>> updates = [(self.V, T.set_subtensor(subV, subV - lr * grads))]
>>
>> If it helps, I could post code showing how I set up the subtensor (but
>> all that stuff is just indexing; there's no non-linear operation there).
>>
>> There's also a non-linear L2 loss function on the subtensor, but I
>> can't imagine that's causing the problem:
>>
>> L2lossV = subV.norm(2, axis=1)
>>
>> I'm not sure all of the above is relevant, but the issue is that I'm
>> getting NaNs very consistently and I'm having a hard time figuring out
>> which operation is causing them (using NanGuard just made the code too
>> slow to ever get to the NaN).
>>
>> As you can see from the above, I tried to fight the NaNs by clipping
>> the input to softplus, but this doesn't seem to work. I clip the inputs
>> to the range -10 to 10 but I still get NaNs.
>>
>> My understanding, from reading around a little, was that softplus was
>> supposed to help avoid NaNs, so I'm a bit confused that they're still
>> cropping up (and I can't see where else they could come from). I would
>> appreciate any advice on how to figure out the problem or even code
>> around it.
>>
>> This is all with Theano 0.8.2 on Ubuntu 16.04, and I'm using a CPU (but
>> with float32).
>>
>> Thanks in advance for any help.
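
Also, since NanGuardMode was too slow for you: a lighter-weight way to localize the first op that produces a NaN is Theano's MonitorMode with a post_func that only checks op outputs. This is a minimal sketch adapted from the Theano debugging docs; the toy input x and loss below are placeholders for your own graph and updates:

import numpy
import theano
import theano.tensor as T

def detect_nan(i, node, fn):
    # Runs after every op in the compiled function; report the first
    # node whose output contains a NaN, then stop checking.
    for output in fn.outputs:
        if (not isinstance(output[0], numpy.random.RandomState)
                and numpy.isnan(output[0]).any()):
            print('*** NaN detected ***')
            theano.printing.debugprint(node)
            print('Inputs : %s' % [input[0] for input in fn.inputs])
            print('Outputs: %s' % [output[0] for output in fn.outputs])
            break

# Toy graph; substitute your own inputs, loss and updates.
x = T.matrix('x')
loss = T.sum(-T.nnet.softplus(-x))
f = theano.function([x], loss,
                    mode=theano.compile.MonitorMode(post_func=detect_nan))

It only inspects each op's outputs for NaN, which is less work per op than NanGuardMode does, so it may be cheap enough to reach the failure.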
