Thanks, that is a helpful heuristic.

On Monday, March 6, 2017 at 11:41:58 PM UTC-6, Jesse Livezey wrote:
>
> I'm not sure that such a list exists.
>
> One heuristic is that if your function returns an int or boolean (binary) 
> value, then the derivatives are probably going to be zero.
>
> set_subtensor returns a modified tensor (potentially a float) and so the 
> derivative with respect to the original tensor and new subtensor will 
> generally be non-zero.
>
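> For example, a rough sketch of both behaviours (the names here are just for 
> illustration, and this assumes plain floatX vectors rather than your actual 
> graph):
>
> import numpy as np
> import theano
> import theano.tensor as T
>
> x = T.vector('x')
> target = T.vector('target')
>
> # T.eq returns an integer tensor; its gradient is defined to be zero.
> eq_cost = T.mean(T.eq(x, target))
> g_eq = T.grad(eq_cost, x)
>
> # set_subtensor returns a float tensor; gradients reach both the original
> # tensor (outside the written slice) and the new values written into it.
> updated = T.set_subtensor(x[:2], target[:2])
> sub_cost = T.sum(updated ** 2)
> g_x, g_target = T.grad(sub_cost, [x, target])
>
> f = theano.function([x, target], [g_eq, g_x, g_target])
> vals = [np.asarray(v, dtype=theano.config.floatX)
>         for v in ([1., 2., 3.], [1., 0., 3.])]
> print(f(*vals))  # g_eq is all zeros; g_x and g_target are not
>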
> On Monday, March 6, 2017 at 3:20:45 PM UTC-8, 
> [email protected] wrote:
>>
>> Thanks. Is there a way to know which operations are usable in the context 
>> of building a loss function? I can see that T.eq would have a 0 gradient 
>> everywhere except at the discontinuity where the function jumps to 1, but 
>> I'm having trouble imagining what the gradient would be for something like 
>> T.set_subtensor, which also seems to have a 0 gradient.
>>
>> On Monday, March 6, 2017 at 11:38:59 AM UTC-6, Jesse Livezey wrote:
>>>
>>> There is nothing wrong with using T.eq. But the derivatives with respect 
>>> to the inputs will be zero, so your cost function is not useful for 
>>> training.
>>>
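>>> In the indexing case the zero gradient of T.eq itself doesn't matter: the 
>>> gradient flows through the values you gather, not through the comparison. 
>>> Roughly, with made-up names:
>>>
>>> import theano.tensor as T
>>>
>>> x = T.matrix('x')
>>> flag_col = T.vector('flag_col')
>>>
>>> # T.eq is only used to pick rows; the comparison gets no gradient,
>>> # but the selected values of x do.
>>> rows = x[T.eq(flag_col, 0).nonzero()]
>>> cost = T.sum(rows ** 2)
>>> g = T.grad(cost, x)  # non-zero in the selected rows, zero elsewhere
>>>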
>>> On Sunday, March 5, 2017 at 8:03:12 PM UTC-8, 
>>> [email protected] wrote:
>>>>
>>>> Thanks Jesse, so are there operations that are "safe" to use and others 
>>>> that aren't? Where can I find this information? Also, I've used T.eq 
>>>> before in another custom loss function that works correctly and doesn't 
>>>> return 0 gradients, but my use case there is computing array indices, 
>>>> such as the way I'm using it in this line:
>>>>
>>>> pred_up = T.flatten(pred_one_hot[T.eq(unchanged_col, 0).nonzero(), preprocess.up_index])
>>>>
>>>> Is T.eq ok to use in some contexts and not others?
>>>>
>>>> On Sunday, March 5, 2017 at 9:14:20 PM UTC-6, Jesse Livezey wrote:
>>>>>
>>>>> The gradient of T.eq will be zero (almost) everywhere and you're using 
>>>>> it to compute num_win and num_lose.
>>>>>
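>>>>> Isolating just that part of the loss makes the zero gradient easy to 
>>>>> see (diff_sum here is a stand-in vector, not your actual graph):
>>>>>
>>>>> import theano.tensor as T
>>>>>
>>>>> diff_sum = T.vector('diff_sum')
>>>>> length = diff_sum.shape[0]
>>>>> num_win = T.sum(T.eq(diff_sum, 0))
>>>>> num_lose = T.sum(T.eq(diff_sum, 2))
>>>>> loss = -1.0 * (num_win - num_lose) / length
>>>>> # the gradient is zero regardless of what diff_sum contains
>>>>> g = T.grad(loss, diff_sum)
>>>>>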
>>>>> On Sunday, March 5, 2017 at 2:42:14 PM UTC-8, 
>>>>> [email protected] wrote:
>>>>>>
>>>>>> Also, the return values of this loss function are small compared to 
>>>>>> cross-entropy; some sample values after random initialization were 
>>>>>> around +/- 0.01. There is an LSTM layer and the input sequences are 
>>>>>> thousands of elements long, so I suspected vanishing gradients. 
>>>>>> However, I'm printing out the min, max, and mean of the gradients 
>>>>>> w.r.t. each parameter, and they are all exactly equal to 0, which seems 
>>>>>> to indicate a different problem.
>>>>>>
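>>>>>> The check itself looks roughly like this (the loss and parameter below 
>>>>>> are a toy stand-in for my actual graph):
>>>>>>
>>>>>> import numpy as np
>>>>>> import theano
>>>>>> import theano.tensor as T
>>>>>>
>>>>>> x = T.matrix('x')
>>>>>> w = theano.shared(np.ones((3, 3), dtype=theano.config.floatX), name='w')
>>>>>> loss = T.mean(T.eq(T.dot(x, w), 0.0))  # eq-based loss, as above
>>>>>> params = [w]
>>>>>>
>>>>>> grads = T.grad(loss, params)
>>>>>> stats = [T.stack([g.min(), g.max(), g.mean()]) for g in grads]
>>>>>> grad_stats = theano.function([x], stats)
>>>>>> print(grad_stats(np.ones((4, 3), dtype=theano.config.floatX)))
>>>>>> # -> [array([ 0.,  0.,  0.])]  min, max, and mean are all exactly zero
>>>>>>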
>>>>>> On Sunday, March 5, 2017 at 3:59:42 PM UTC-6, 
>>>>>> [email protected] wrote:
>>>>>>>
>>>>>>> I have defined a custom loss function, and despite the loss function 
>>>>>>> returning correct values given the inputs, the gradients are always 
>>>>>>> exactly 0 w.r.t. each of my parameters. I am not suppressing any 
>>>>>>> Theano errors, including the disconnected input error, so I can't 
>>>>>>> explain what is causing this. I have copied the loss function below; 
>>>>>>> in words, I first convert a 3-class softmax output into a one-hot 
>>>>>>> representation, then I compare a subset of it to the response and 
>>>>>>> compute a quantity of interest. More generally, I was under the 
>>>>>>> impression that if one could express a function using Theano ops, it 
>>>>>>> could be used as a loss function. Is this not the case?
>>>>>>>
>>>>>>> def calc_one_hot_loss(pred, y, mask):
>>>>>>>     mask_flat = T.flatten(mask)
>>>>>>>     length = T.sum(mask_flat, dtype='int32')
>>>>>>>     pred_unmasked = pred[mask_flat.nonzero()]
>>>>>>>     max_indices = T.argmax(pred_unmasked, axis=1)
>>>>>>>     pred_zero = T.set_subtensor(pred_unmasked[:], 0)
>>>>>>>     pred_one_hot = T.set_subtensor(pred_zero[T.arange(length), max_indices], 1)
>>>>>>>     y_unmasked = y[mask_flat.nonzero()]
>>>>>>>     unchanged_col = pred_one_hot[:, preprocess.unchanged_index]
>>>>>>>     pred_up = T.flatten(pred_one_hot[T.eq(unchanged_col, 0).nonzero(), preprocess.up_index])
>>>>>>>     pred_down = T.flatten(pred_one_hot[T.eq(unchanged_col, 0).nonzero(), preprocess.down_index])
>>>>>>>     y_up = T.flatten(y_unmasked[T.eq(unchanged_col, 0).nonzero(), preprocess.up_index])
>>>>>>>     y_down = T.flatten(y_unmasked[T.eq(unchanged_col, 0).nonzero(), preprocess.down_index])
>>>>>>>     diff_up = T.abs_(pred_up - y_up)
>>>>>>>     diff_down = T.abs_(pred_down - y_down)
>>>>>>>     diff_sum = diff_up + diff_down
>>>>>>>     num_win = T.sum(T.eq(diff_sum, 0))
>>>>>>>     num_lose = T.sum(T.eq(diff_sum, 2))
>>>>>>>     loss = -1 * (num_win - num_lose) / length
>>>>>>>     return loss
>>>>>>>
