Thanks Jesse. So are there operations that are "safe" to use and others 
that aren't? Where can I find this information? Also, I've used T.eq before 
in another custom loss function that works correctly and doesn't return zero 
gradients, but there I use it only to compute array indices, in the same way 
as in this line:

pred_up = T.flatten(pred_one_hot[T.eq(unchanged_col, 0).nonzero(), preprocess.up_index])

Is T.eq ok to use in some contexts and not others?
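
To make the two contexts concrete, here is a minimal sketch in plain Python rather than Theano. Central finite differences stand in for T.grad, and the names numeric_grad, eq, loss_eq_as_value, and loss_eq_as_index, along with the sample data, are made up for illustration. It only shows the general behaviour: when the 0/1 output of an equality test is the value being summed, the result is piecewise constant in the inputs; when the equality test only picks which elements to keep, the kept values can still carry a gradient.

```python
# Plain-Python sketch (no Theano). Finite differences stand in for T.grad.

def numeric_grad(f, x, h=1e-4):
    """Central-difference estimate of df/dx[i] for each element of the list x."""
    grads = []
    for i in range(len(x)):
        hi = list(x)
        lo = list(x)
        hi[i] += h
        lo[i] -= h
        grads.append((f(hi) - f(lo)) / (2 * h))
    return grads

def eq(a, b):
    """Scalar equality returned as 0.0 or 1.0, loosely analogous to T.eq."""
    return 1.0 if a == b else 0.0

# Hypothetical data, chosen so no prediction sits near the 0.5 rounding boundary.
pred = [0.2, 0.9, 0.4]
target = [0.0, 1.0, 0.0]
select = [0.0, 1.0, 0.0]  # constant column, playing the role of unchanged_col

def loss_eq_as_value(p):
    # The eq output itself is summed, so the loss is a count of matches.
    # A count does not move under a tiny change in p.
    return sum(eq(round(pi), ti) for pi, ti in zip(p, target))

def loss_eq_as_index(p):
    # eq only decides WHICH positions are kept (like .nonzero() indexing).
    # The kept values still depend smoothly on p.
    keep = [i for i, s in enumerate(select) if eq(s, 0.0) == 1.0]
    return sum((p[i] - target[i]) ** 2 for i in keep)

grad_value = numeric_grad(loss_eq_as_value, pred)  # every entry is 0.0
grad_index = numeric_grad(loss_eq_as_index, pred)  # nonzero at kept positions
print(grad_value)
print(grad_index)
```

Note that the indexing case only helps if the values being selected depend smoothly on the parameters. In the loss function quoted below, the selected values appear to come from a one-hot encoding built with argmax and set_subtensor, which is itself piecewise constant, so using T.eq for indexing there would not by itself give a nonzero gradient.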

On Sunday, March 5, 2017 at 9:14:20 PM UTC-6, Jesse Livezey wrote:
>
> The gradient of T.eq will be zero (almost) everywhere and you're using it 
> to compute num_win and num_lose.
>
> On Sunday, March 5, 2017 at 2:42:14 PM UTC-8, 
> [email protected] wrote:
>>
>> Also, the return values of this loss function are small compared to 
>> cross-entropy; some sample values after random initialization were around 
>> +/- 0.01. There is an LSTM layer and the input sequences are thousands of 
>> elements long, so I suspected vanishing gradients. However, I'm printing 
>> out the min, max, and mean of the gradients w.r.t. each parameter, and they 
>> are all exactly equal to 0, which seems to indicate a different problem.
>>
>> On Sunday, March 5, 2017 at 3:59:42 PM UTC-6, 
>> [email protected] wrote:
>>>
>>> I have defined a custom loss function, and although it returns correct 
>>> values for the given inputs, the gradients w.r.t. each of my parameters are 
>>> always 0. I am not suppressing any Theano errors, including the 
>>> disconnected input error, so I can't explain what is causing this. I have 
>>> copied the loss function below; in words, I first convert a 3-class 
>>> softmax output into a one-hot representation, then I compare a subset of 
>>> it to the response and compute a quantity of interest. More generally, I 
>>> was under the impression that if one could express a function using Theano 
>>> ops, it could be used as a loss function. Is this not the case?
>>>
>>> def calc_one_hot_loss(pred, y, mask):
>>>     mask_flat = T.flatten(mask)
>>>     length = T.sum(mask_flat, dtype='int32')
>>>     pred_unmasked = pred[mask_flat.nonzero()]
>>>     max_indices = T.argmax(pred_unmasked, axis=1)
>>>     pred_zero = T.set_subtensor(pred_unmasked[:], 0)
>>>     pred_one_hot = T.set_subtensor(pred_zero[T.arange(length), max_indices], 1)
>>>     y_unmasked = y[mask_flat.nonzero()]
>>>     unchanged_col = pred_one_hot[:, preprocess.unchanged_index]
>>>     pred_up = T.flatten(pred_one_hot[T.eq(unchanged_col, 0).nonzero(), preprocess.up_index])
>>>     pred_down = T.flatten(pred_one_hot[T.eq(unchanged_col, 0).nonzero(), preprocess.down_index])
>>>     y_up = T.flatten(y_unmasked[T.eq(unchanged_col, 0).nonzero(), preprocess.up_index])
>>>     y_down = T.flatten(y_unmasked[T.eq(unchanged_col, 0).nonzero(), preprocess.down_index])
>>>     diff_up = T.abs_(pred_up - y_up)
>>>     diff_down = T.abs_(pred_down - y_down)
>>>     diff_sum = diff_up + diff_down
>>>     num_win = T.sum(T.eq(diff_sum, 0))
>>>     num_lose = T.sum(T.eq(diff_sum, 2))
>>>     loss = -1 * (num_win - num_lose) / length
>>>     return loss
>>>