There is nothing wrong with using T.eq, but its derivatives with respect 
to the inputs will be zero, so your cost function is not useful for 
training.
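
To make the distinction concrete, here is a minimal, self-contained sketch 
(the variable names are made up, not taken from your code): a cost built 
directly from T.eq is piecewise constant, so its gradient is zero, whereas 
T.eq used only through .nonzero() to build an index set still lets the 
gradient flow through the values you select with it.

import numpy as np
import theano
import theano.tensor as T

x = T.dvector('x')
t = T.dvector('t')

# A cost built directly from T.eq is piecewise constant in x, so the
# gradient with respect to x is zero everywhere it is defined.
cost_eq = T.sum(T.cast(T.eq(x, t), 'float64'))
grad_eq = theano.function([x, t], theano.grad(cost_eq, x))
print(grad_eq(np.array([1., 2., 3.]), np.array([1., 0., 3.])))  # all zeros

# T.eq used only to pick indices: the comparison itself still carries no
# gradient, but the gradient flows through the values that get selected.
idx = T.eq(t, 0).nonzero()
cost_idx = T.sum(x[idx] ** 2)
grad_idx = theano.function([x, t], theano.grad(cost_idx, x))
print(grad_idx(np.array([1., 2., 3.]), np.array([1., 0., 3.])))  # nonzero at the selected entry

So using T.eq to compute array indices (as in your pred_up line below) is 
fine, but any quantity that enters the cost only through a comparison 
contributes nothing to the gradient; in your loss, num_win and num_lose are 
both of that kind.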

On Sunday, March 5, 2017 at 8:03:12 PM UTC-8, 
tarom...@alum.northwestern.edu wrote:
>
> Thanks Jesse, so are there operations that are "safe" to use and others 
> that aren't? Where can I find this information? Also, I've used T.eq before 
> in another custom loss function which works correctly and doesn't return 0 
> gradients, but my use case there is in computing array indices, such as the 
> way I'm using it in this line:
>
> pred_up = T.flatten(pred_one_hot[T.eq(unchanged_col, 0).nonzero(), preprocess.up_index])
>
> Is T.eq ok to use in some contexts and not others?
>
> On Sunday, March 5, 2017 at 9:14:20 PM UTC-6, Jesse Livezey wrote:
>>
>> The gradient of T.eq will be zero (almost) everywhere and you're using it 
>> to compute num_win and num_lose.
>>
>> On Sunday, March 5, 2017 at 2:42:14 PM UTC-8, 
>> tarom...@alum.northwestern.edu wrote:
>>>
>>> Also, the return values of this loss function are small compared to 
>>> cross-entropy; some sample values after random initialization were around 
>>> +/- 0.01. There is an LSTM layer and the input sequences are thousands of 
>>> elements long, so I suspected vanishing gradients. However, I'm printing 
>>> out the min, max, and mean of the gradients w.r.t. each parameter, and 
>>> they are all exactly equal to 0, which seems to indicate a different 
>>> problem.
>>>
>>> On Sunday, March 5, 2017 at 3:59:42 PM UTC-6, 
>>> tarom...@alum.northwestern.edu wrote:
>>>>
>>>> I have defined a custom loss function, and despite the loss function 
>>>> returning correct values given the inputs, the gradients are always 0 
>>>> w.r.t. each of my parameters. I am not suppressing any theano errors, 
>>>> including the disconnected input error, so I can't explain what is 
>>>> causing this. I have copied the loss function below; in words, I first 
>>>> convert a 3-class softmax output into a one-hot representation, then I 
>>>> compare a subset of it to the response and compute a quantity of 
>>>> interest. More generally, I was under the impression that if one could 
>>>> express a function using theano ops, it could be used as a loss 
>>>> function. Is this not the case?
>>>>
>>>> def calc_one_hot_loss(pred, y, mask):
>>>>     # keep only the time steps where the mask is nonzero
>>>>     mask_flat = T.flatten(mask)
>>>>     length = T.sum(mask_flat, dtype='int32')
>>>>     pred_unmasked = pred[mask_flat.nonzero()]
>>>>     # convert the softmax output into a one-hot representation
>>>>     max_indices = T.argmax(pred_unmasked, axis=1)
>>>>     pred_zero = T.set_subtensor(pred_unmasked[:], 0)
>>>>     pred_one_hot = T.set_subtensor(pred_zero[T.arange(length), max_indices], 1)
>>>>     y_unmasked = y[mask_flat.nonzero()]
>>>>     # look at the up/down columns only where "unchanged" was not predicted
>>>>     unchanged_col = pred_one_hot[:, preprocess.unchanged_index]
>>>>     pred_up = T.flatten(pred_one_hot[T.eq(unchanged_col, 0).nonzero(), preprocess.up_index])
>>>>     pred_down = T.flatten(pred_one_hot[T.eq(unchanged_col, 0).nonzero(), preprocess.down_index])
>>>>     y_up = T.flatten(y_unmasked[T.eq(unchanged_col, 0).nonzero(), preprocess.up_index])
>>>>     y_down = T.flatten(y_unmasked[T.eq(unchanged_col, 0).nonzero(), preprocess.down_index])
>>>>     diff_up = T.abs_(pred_up - y_up)
>>>>     diff_down = T.abs_(pred_down - y_down)
>>>>     diff_sum = diff_up + diff_down
>>>>     # count exact matches (wins) and complete mismatches (losses)
>>>>     num_win = T.sum(T.eq(diff_sum, 0))
>>>>     num_lose = T.sum(T.eq(diff_sum, 2))
>>>>     loss = -1 * (num_win - num_lose) / length
>>>>     return loss
