Thanks, that is a helpful heuristic. A small sketch that makes it concrete follows the quoted code below.

On Monday, March 6, 2017 at 11:41:58 PM UTC-6, Jesse Livezey wrote:
>
> I'm not sure that such a list exists.
>
> One heuristic is that if your function returns an int or boolean (binary) value, then the derivatives are probably going to be zero.
>
> set_subtensor returns a modified tensor (potentially a float), so the derivatives with respect to the original tensor and the new subtensor will generally be non-zero.
>
> On Monday, March 6, 2017 at 3:20:45 PM UTC-8, [email protected] wrote:
>>
>> Thanks, is there a way to know which operations are allowed in the context of building a loss function? I can see that T.eq has a zero gradient everywhere except at the discontinuity where the function jumps to 1, but I'm having trouble imagining what the gradient would be for something like T.set_subtensor, which also seems to have a zero gradient.
>>
>> On Monday, March 6, 2017 at 11:38:59 AM UTC-6, Jesse Livezey wrote:
>>>
>>> There is nothing wrong with using T.eq, but the derivatives with respect to its inputs will be zero, so your cost function is not useful for training.
>>>
>>> On Sunday, March 5, 2017 at 8:03:12 PM UTC-8, [email protected] wrote:
>>>>
>>>> Thanks Jesse, so are there operations that are "safe" to use and others that aren't? Where can I find this information? Also, I've used T.eq before in another custom loss function that works correctly and does not return zero gradients, but there I use it only to compute array indices, as in this line:
>>>>
>>>> pred_up = T.flatten(pred_one_hot[T.eq(unchanged_col, 0).nonzero(), preprocess.up_index])
>>>>
>>>> Is T.eq OK to use in some contexts and not others?
>>>>
>>>> On Sunday, March 5, 2017 at 9:14:20 PM UTC-6, Jesse Livezey wrote:
>>>>>
>>>>> The gradient of T.eq will be zero (almost) everywhere, and you're using it to compute num_win and num_lose.
>>>>>
>>>>> On Sunday, March 5, 2017 at 2:42:14 PM UTC-8, [email protected] wrote:
>>>>>>
>>>>>> Also, the return values of this loss function are small compared to cross-entropy; some sample values after random initialization were around +/- 0.01. There is an LSTM layer and the input sequences are thousands of elements long, so I suspected vanishing gradients. However, I'm printing the min, max, and mean of the gradients w.r.t. each parameter, and they are all exactly equal to 0, which seems to indicate a different problem.
>>>>>>
>>>>>> On Sunday, March 5, 2017 at 3:59:42 PM UTC-6, [email protected] wrote:
>>>>>>>
>>>>>>> I have defined a custom loss function, and although it returns correct values given its inputs, the gradients w.r.t. each of my parameters are always exactly 0. I am not suppressing any Theano errors, including the disconnected input error, so I can't explain what is causing this. I have copied the loss function below; in words, I first convert a 3-class softmax output into a one-hot representation, then compare a subset of it to the response and compute a quantity of interest. More generally, I was under the impression that any function expressed with Theano ops could be used as a loss function. Is this not the case?
>>>>>>>
>>>>>>> def calc_one_hot_loss(pred, y, mask):
>>>>>>>     mask_flat = T.flatten(mask)
>>>>>>>     length = T.sum(mask_flat, dtype='int32')
>>>>>>>     pred_unmasked = pred[mask_flat.nonzero()]
>>>>>>>     max_indices = T.argmax(pred_unmasked, axis=1)
>>>>>>>     pred_zero = T.set_subtensor(pred_unmasked[:], 0)
>>>>>>>     pred_one_hot = T.set_subtensor(pred_zero[T.arange(length), max_indices], 1)
>>>>>>>     y_unmasked = y[mask_flat.nonzero()]
>>>>>>>     unchanged_col = pred_one_hot[:, preprocess.unchanged_index]
>>>>>>>     pred_up = T.flatten(pred_one_hot[T.eq(unchanged_col, 0).nonzero(), preprocess.up_index])
>>>>>>>     pred_down = T.flatten(pred_one_hot[T.eq(unchanged_col, 0).nonzero(), preprocess.down_index])
>>>>>>>     y_up = T.flatten(y_unmasked[T.eq(unchanged_col, 0).nonzero(), preprocess.up_index])
>>>>>>>     y_down = T.flatten(y_unmasked[T.eq(unchanged_col, 0).nonzero(), preprocess.down_index])
>>>>>>>     diff_up = T.abs_(pred_up - y_up)
>>>>>>>     diff_down = T.abs_(pred_down - y_down)
>>>>>>>     diff_sum = diff_up + diff_down
>>>>>>>     num_win = T.sum(T.eq(diff_sum, 0))
>>>>>>>     num_lose = T.sum(T.eq(diff_sum, 2))
>>>>>>>     loss = -1 * (num_win - num_lose) / length
>>>>>>>     return loss
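To make the heuristic above concrete, here is a minimal sketch that reproduces the symptom in miniature. It is not the poster's model: the shapes, the shared weight W, and the toy inputs are made up for illustration. It builds a 3-class softmax, snaps it to one-hot with T.argmax / T.set_subtensor, scores it with T.eq in the style of calc_one_hot_loss, and compares the resulting gradient against a differentiable surrogate (categorical cross-entropy on the probabilities).

    import numpy as np
    import theano
    import theano.tensor as T

    floatX = theano.config.floatX

    x = T.matrix('x')        # (n_samples, n_features), hypothetical inputs
    y = T.matrix('y')        # one-hot targets, (n_samples, 3)
    W = theano.shared(np.random.randn(5, 3).astype(floatX), name='W')  # toy weights

    probs = T.nnet.softmax(T.dot(x, W))      # 3-class softmax output

    # "Hard" loss in the style of calc_one_hot_loss: snap probs to one-hot, then compare.
    max_idx = T.argmax(probs, axis=1)
    one_hot = T.set_subtensor(T.zeros_like(probs)[T.arange(probs.shape[0]), max_idx], 1)
    hard_loss = -T.mean(T.eq(one_hot, y))    # comparison is int-valued -> no useful gradient

    # Differentiable surrogate: use the probabilities themselves.
    soft_loss = T.mean(T.nnet.categorical_crossentropy(probs, y))

    # 'ignore' only guards against a possible DisconnectedInputError; as reported in the
    # thread, the hard loss typically just yields gradients that are exactly zero.
    g_hard = T.grad(hard_loss, W, disconnected_inputs='ignore')
    g_soft = T.grad(soft_loss, W)

    f = theano.function([x, y], [g_hard, g_soft])
    xv = np.random.randn(4, 5).astype(floatX)
    yv = np.eye(3, dtype=floatX)[np.random.randint(0, 3, size=4)]
    gh, gs = f(xv, yv)
    print(np.abs(gh).max())   # 0.0 -- every parameter gradient of the hard loss is exactly zero
    print(np.abs(gs).max())   # > 0 -- the surrogate can actually train the weights

The usual workaround is exactly this split: train on a differentiable surrogate and report the hard win/lose count only as an evaluation metric.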
