I'm not sure that such a list exists. One heuristic is that if your function returns an int or boolean (binary) value, then the derivatives are probably going to be zero.
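To make the heuristic concrete, here is a minimal sketch in plain Python (not Theano), using central finite differences as a stand-in for the symbolic gradient. The names hard_loss, soft_loss and finite_diff_grad are made up for illustration, and the logits are arbitrary. It compares a loss built from argmax plus an equality test against ordinary cross-entropy:

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hard_loss(logits, target):
    """0/1 loss built from argmax and an equality test (binary-valued)."""
    probs = softmax(logits)
    predicted = probs.index(max(probs))  # argmax
    return 0.0 if predicted == target else 1.0


def soft_loss(logits, target):
    """Cross-entropy: a smooth, float-valued function of the logits."""
    return -math.log(softmax(logits)[target])


def finite_diff_grad(loss, logits, target, eps=1e-4):
    """Central-difference estimate of d(loss)/d(logits)."""
    grad = []
    for i in range(len(logits)):
        up = list(logits)
        up[i] += eps
        down = list(logits)
        down[i] -= eps
        grad.append((loss(up, target) - loss(down, target)) / (2 * eps))
    return grad


logits = [0.2, 1.5, -0.3]  # arbitrary example values
target = 0

# A small perturbation never changes the argmax here, so the hard loss is
# locally constant and every component of its gradient is exactly zero.
hard_grad = finite_diff_grad(hard_loss, logits, target)

# Cross-entropy changes smoothly with the logits, so its gradient is non-zero.
soft_grad = finite_diff_grad(soft_loss, logits, target)

print("hard loss gradient:", hard_grad)
print("soft loss gradient:", soft_grad)
```

The hard loss is piecewise constant in the logits, which is the same situation as a loss assembled from T.argmax and T.eq: the value can be correct while the gradient carries no training signal.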
set_subtensor returns a modified tensor (potentially a float) and so the derivative with respect to the original tensor and new subtensor will generally be non-zero.

On Monday, March 6, 2017 at 3:20:45 PM UTC-8, [email protected] wrote:
>
> Thanks, is there a way to know what operations are allowed in the context
> of building a loss function? I can see that T.eq would have 0 gradients
> everywhere except the discontinuous point at which the function equals 1,
> but I'm having trouble imagining what the gradient would be for something
> like T.set_subtensor, which also seems to have a 0 gradient.
>
> On Monday, March 6, 2017 at 11:38:59 AM UTC-6, Jesse Livezey wrote:
>>
>> There is nothing wrong with using T.eq. But, the derivatives with respect
>> to the inputs will be zero, so your cost function is not useful for
>> training.
>>
>> On Sunday, March 5, 2017 at 8:03:12 PM UTC-8,
>> [email protected] wrote:
>>>
>>> Thanks Jesse, so are there operations that are "safe" to use and others
>>> that aren't? Where can I find this information? Also, I've used T.eq before
>>> in another custom loss function which works correctly and doesn't return 0
>>> gradients, but my use case there is in computing array indices, such as the
>>> way I'm using it in this line:
>>>
>>> pred_up = T.flatten(pred_one_hot[T.eq(unchanged_col, 0).nonzero(),
>>>                                  preprocess.up_index])
>>>
>>> Is T.eq ok to use in some contexts and not others?
>>>
>>> On Sunday, March 5, 2017 at 9:14:20 PM UTC-6, Jesse Livezey wrote:
>>>>
>>>> The gradient of T.eq will be zero (almost) everywhere and you're using
>>>> it to compute num_win and num_lose.
>>>>
>>>> On Sunday, March 5, 2017 at 2:42:14 PM UTC-8,
>>>> [email protected] wrote:
>>>>>
>>>>> Also, the return values of this loss function are small compared to
>>>>> cross-entropy, some sample values after random initialization were around
>>>>> +/- 0.01.
>>>>> There is a LSTM layer and the input sequences are thousands of
>>>>> elements long, so I suspected vanishing gradients. However, I'm printing
>>>>> out the min, max, and mean of the gradients w.r.t each parameter, and
>>>>> they are all exactly equal to 0, which seems to indicate a different
>>>>> problem.
>>>>>
>>>>> On Sunday, March 5, 2017 at 3:59:42 PM UTC-6,
>>>>> [email protected] wrote:
>>>>>>
>>>>>> I have defined a custom loss function, and despite the loss function
>>>>>> returning correct values given the inputs, the gradients are all always
>>>>>> 0 w.r.t each of my parameters. I am not suppressing any theano errors
>>>>>> including the disconnected input error, so I can't explain what is
>>>>>> causing this. I have copied the loss function below; in words, I first
>>>>>> convert a 3 class softmax output into a one hot representation, then I
>>>>>> compare a subset of it to the response and compute a quantity of
>>>>>> interest. More generally, I was under the impression that if one could
>>>>>> express a function using theano ops, it could be used as a loss
>>>>>> function. Is this not the case?
>>>>>>
>>>>>> def calc_one_hot_loss(pred, y, mask):
>>>>>>     mask_flat = T.flatten(mask)
>>>>>>     length = T.sum(mask_flat, dtype='int32')
>>>>>>     pred_unmasked = pred[mask_flat.nonzero()]
>>>>>>     max_indices = T.argmax(pred_unmasked, axis=1)
>>>>>>     pred_zero = T.set_subtensor(pred_unmasked[:], 0)
>>>>>>     pred_one_hot = T.set_subtensor(pred_zero[T.arange(length),
>>>>>>                                              max_indices], 1)
>>>>>>     y_unmasked = y[mask_flat.nonzero()]
>>>>>>     unchanged_col = pred_one_hot[:, preprocess.unchanged_index]
>>>>>>     pred_up = T.flatten(pred_one_hot[T.eq(unchanged_col, 0).nonzero(),
>>>>>>                                      preprocess.up_index])
>>>>>>     pred_down = T.flatten(pred_one_hot[T.eq(unchanged_col, 0).nonzero(),
>>>>>>                                        preprocess.down_index])
>>>>>>     y_up = T.flatten(y_unmasked[T.eq(unchanged_col, 0).nonzero(),
>>>>>>                                 preprocess.up_index])
>>>>>>     y_down = T.flatten(y_unmasked[T.eq(unchanged_col, 0).nonzero(),
>>>>>>                                   preprocess.down_index])
>>>>>>     diff_up = T.abs_(pred_up - y_up)
>>>>>>     diff_down = T.abs_(pred_down - y_down)
>>>>>>     diff_sum = diff_up + diff_down
>>>>>>     num_win = T.sum(T.eq(diff_sum, 0))
>>>>>>     num_lose = T.sum(T.eq(diff_sum, 2))
>>>>>>     loss = -1 * (num_win - num_lose) / length
>>>>>>     return loss
