Do you have the same problem with float64? If not, this is a problem of
numerical stability.
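
For example, a quick way to test that (just a sketch; "script.py" is a placeholder, and it
assumes the model takes its dtypes from theano.config.floatX) is to override floatX from
the shell:

        THEANO_FLAGS=floatX=float64 python script.py

or at the top of the script, before any shared variables are created:

        import theano
        theano.config.floatX = 'float64'  # new tensors and shared variables default to float64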

On 18 Sept 2016 at 17:49, <my.m...@gmail.com> wrote:

> An update: this apparently has nothing to do with softplus.  I've
> drastically simplified my network and removed many components (reducing the
> loss function to a simple quadratic) and I still get the same error.  The
> issue seems to come from how I'm doing sub-tensor indexing (which obviously
> should not introduce a NaN).  I'll try to generate some simple sample code
> that can reproduce the error.
>
>
> On Sunday, September 18, 2016 at 10:14:31 PM UTC+2, my....@gmail.com
> wrote:
>>
>> I've been banging my head against this problem for several hours, so I
>> wanted to make sure one of my assumptions is not flawed (and hopefully get
>> some advice).
>>
>> I have a relatively simple network that is entirely linear except for the
>> loss function, which is a sum of many softpluses.  A snippet of the code
>> (here a_emb and b_emb are matrices and bias is a scalar):
>>
>>         z = bias - (a_emb - b_emb).norm(2, axis=1)
>>         if clip:
>>             z = z.clip(-bound, bound)
>>         L0 = -T.nnet.softplus(-z)
>>
>> L0 is one contribution to the loss.  There are a few more from higher-rank
>> tensors that look like this (zn is a 3-tensor):
>>
>>         L1 = T.sum(-T.nnet.softplus(zn), axis=1)
>>
>> The only real complexity in the network is the use of subtensor
>> indexing.  Basically, I'm training a very large embedding model, so to avoid
>> updating the whole matrix I take all the inputs (e.g. the indices
>> corresponding to "a_emb" and "b_emb" above), put them in a subtensor, and
>> then extract them out again (by subindexing).  I then only update the
>> subtensor via something like this:
>>
>>         updates = [(self.V, T.set_subtensor(subV, subV - lr * grads))]
>>
>> If it helps, I could post code showing how I set up the subtensor (but all
>> that stuff is just indexing; there's no non-linear operation there).
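>>
>> The pattern looks roughly like this (a simplified sketch rather than the
>> exact code; the sizes, the names and the stand-in loss are placeholders):
>>
>>         import numpy
>>         import theano
>>         import theano.tensor as T
>>
>>         n_items, dim, lr = 1000, 50, 0.01            # placeholder sizes / learning rate
>>         V = theano.shared(numpy.zeros((n_items, dim),
>>                                       dtype=theano.config.floatX), name='V')
>>         idx = T.ivector('idx')                       # indices touched in this minibatch
>>         subV = V[idx]                                # the subtensor that actually gets used
>>         loss = subV.norm(2, axis=1).sum()            # stand-in for the real loss
>>         grads = T.grad(loss, wrt=subV)               # gradient w.r.t. the subtensor only
>>         updates = [(V, T.set_subtensor(subV, subV - lr * grads))]
>>         train = theano.function([idx], loss, updates=updates)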
>>
>> There's also a non-linear L2 loss term on the subtensor, but I can't
>> imagine that's causing the problem:
>>
>>         L2lossV = subV.norm(2, axis=1)
>>
>> I'm not sure all of the above is relevant, but the issue is that I'm getting
>> NaNs very consistently and I'm having a hard time figuring out which
>> operation is causing them (using NanGuardMode just made the code too slow to
>> ever get to the NaN).
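>>
>> For reference, enabling it looks roughly like this (a sketch that reuses the
>> placeholder names from the pattern above):
>>
>>         from theano.compile.nanguardmode import NanGuardMode
>>
>>         guarded = theano.function([idx], loss, updates=updates,
>>                                    mode=NanGuardMode(nan_is_error=True,
>>                                                      inf_is_error=True,
>>                                                      big_is_error=True))
>>
>> It checks the inputs and outputs of every node in the graph, which is where
>> the slowdown comes from.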
>>
>> As you can see from the above, I tried to fight the NaNs by clipping the
>> input to softplus, but this doesn't seem to work.  I clip the inputs to the
>> range [-10, 10] but I still get NaNs.
>>
>> My understanding, from reading around a little, was that softplus was
>> supposed to help avoid NaNs, so I'm a bit confused that they're still
>> cropping up (and I can't see where else they could come from).  I would
>> appreciate any advice on how to track down the problem, or even on how to
>> code around it.
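>>
>> (For context, the identity behind that expectation: softplus can be evaluated
>> in the numerically stable form
>>
>>         softplus(x) = log(1 + exp(x)) = max(x, 0) + log1p(exp(-abs(x)))
>>
>> whose right-hand side never overflows for finite x, so a finite z should give
>> a finite -softplus(-z).  Whether Theano computes it exactly this way
>> internally, I don't know.)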
>>
>> This is all with Theano 0.8.2 on Ubuntu 16.04, and I'm running on the CPU
>> (but with float32).
>>
>> Thanks in advance for any help.
>>
>>
