[theano-users] Re: Gradients are always 0 for custom loss function

2017-03-06 Thread Jesse Livezey
I'm not sure that such a list exists.

One heuristic is that if your function returns an int or boolean (binary) 
value, then the derivatives are probably going to be zero.

set_subtensor returns a modified tensor (potentially a float), and so the 
derivatives with respect to the original tensor and the new subtensor values 
will generally be non-zero.
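
For example, a quick way to check this (a minimal, untested sketch) is to ask 
Theano for the gradients directly and compare a cost built from T.eq with one 
built through set_subtensor:

import numpy as np
import theano
import theano.tensor as T

x = T.vector('x')

# Cost built from a comparison op: T.eq has a zero derivative everywhere,
# so the gradient w.r.t. x comes back as all zeros.
cost_eq = T.sum(T.cast(T.eq(x, 0.5), theano.config.floatX))
g_eq = T.grad(cost_eq, x, disconnected_inputs='ignore')

# Cost built through set_subtensor: gradients flow both to the original
# tensor and to the new values, so they are generally non-zero.
y = T.set_subtensor(x[:2], 2.0 * x[:2])
g_set = T.grad(T.sum(y), x)

f = theano.function([x], [g_eq, g_set])
print(f(np.arange(4, dtype=theano.config.floatX)))
# expected: all zeros for g_eq, non-zero entries for g_set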

On Monday, March 6, 2017 at 3:20:45 PM UTC-8, 
tarom...@alum.northwestern.edu wrote:
>
> Thanks, is there a way to know what operations are allowed in the context 
> of building a loss function? I can see that T.eq would have 0 gradients 
> everywhere except the discontinuous point at which the function equals 1, 
> but I'm having trouble imagining what the gradient would be for something 
> like T.set_subtensor, which also seems to have a 0 gradient.
>
> On Monday, March 6, 2017 at 11:38:59 AM UTC-6, Jesse Livezey wrote:
>>
>> There is nothing wrong with using T.eq. But, the derivatives with respect 
>> to the inputs will be zero, so your cost function is not useful for 
>> training.
>>
>> On Sunday, March 5, 2017 at 8:03:12 PM UTC-8, 
>> tarom...@alum.northwestern.edu wrote:
>>>
>>> Thanks Jesse, so are there operations that are "safe" to use and others 
>>> that aren't? Where can I find this information? Also, I've used T.eq before 
>>> in another custom loss function which works correctly and doesn't return 0 
>>> gradients, but my use case there is in computing array indices, such as the 
>>> way I'm using it in this line:
>>>
>>> pred_up = T.flatten(pred_one_hot[T.eq(unchanged_col, 0).nonzero(), 
>>> preprocess.up_index])
>>>
>>> Is T.eq ok to use in some contexts and not others?
>>>
>>> On Sunday, March 5, 2017 at 9:14:20 PM UTC-6, Jesse Livezey wrote:

 The gradient of T.eq will be zero (almost) everywhere and you're using 
 it to compute num_win and num_lose.

 On Sunday, March 5, 2017 at 2:42:14 PM UTC-8, 
 tarom...@alum.northwestern.edu wrote:
>
> Also, the return values of this loss function are small compared to 
> cross-entropy; some sample values after random initialization were around 
> +/- 0.01. There is an LSTM layer and the input sequences are thousands of 
> elements long, so I suspected vanishing gradients. However, I'm printing 
> out the min, max, and mean of the gradients w.r.t. each parameter, and they 
> are all exactly equal to 0, which seems to indicate a different problem.
>
> On Sunday, March 5, 2017 at 3:59:42 PM UTC-6, 
> tarom...@alum.northwestern.edu wrote:
>>
>> I have defined a custom loss function, and despite the loss function 
>> returning correct values given the inputs, the gradients are always 0 
>> w.r.t. each of my parameters. I am not suppressing any theano errors, 
>> including the disconnected input error, so I can't explain what is causing 
>> this. I have copied the loss function below; in words, I first convert a 
>> 3-class softmax output into a one-hot representation, then I compare a 
>> subset of it to the response and compute a quantity of interest. More 
>> generally, I was under the impression that if one could express a function 
>> using theano ops, it could be used as a loss function. Is this not the case?
>>
>> def calc_one_hot_loss(pred, y, mask):
>>     mask_flat = T.flatten(mask)
>>     length = T.sum(mask_flat, dtype='int32')
>>     pred_unmasked = pred[mask_flat.nonzero()]
>>     max_indices = T.argmax(pred_unmasked, axis=1)
>>     pred_zero = T.set_subtensor(pred_unmasked[:], 0)
>>     pred_one_hot = T.set_subtensor(pred_zero[T.arange(length), max_indices], 1)
>>     y_unmasked = y[mask_flat.nonzero()]
>>     unchanged_col = pred_one_hot[:, preprocess.unchanged_index]
>>     pred_up = T.flatten(pred_one_hot[T.eq(unchanged_col, 0).nonzero(), preprocess.up_index])
>>     pred_down = T.flatten(pred_one_hot[T.eq(unchanged_col, 0).nonzero(), preprocess.down_index])
>>     y_up = T.flatten(y_unmasked[T.eq(unchanged_col, 0).nonzero(), preprocess.up_index])
>>     y_down = T.flatten(y_unmasked[T.eq(unchanged_col, 0).nonzero(), preprocess.down_index])
>>     diff_up = T.abs_(pred_up - y_up)
>>     diff_down = T.abs_(pred_down - y_down)
>>     diff_sum = diff_up + diff_down
>>     num_win = T.sum(T.eq(diff_sum, 0))
>>     num_lose = T.sum(T.eq(diff_sum, 2))
>>     loss = -1 * (num_win - num_lose) / length
>>     return loss
>>
>>
>>
>>
>>
>>
>>
>>

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"theano-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to theano-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[theano-users] Re: Gradients are always 0 for custom loss function

2017-03-06 Thread taromakino
Thanks, is there a way to know what operations are allowed in the context 
of building a loss function? I can see that T.eq would have 0 gradients 
everywhere except the discontinuous point at which the function equals 1, 
but I'm having trouble imagining what the gradient would be for something 
like T.set_subtensor, which also seems to have a 0 gradient.

On Monday, March 6, 2017 at 11:38:59 AM UTC-6, Jesse Livezey wrote:
>
> There is nothing wrong with using T.eq. But, the derivatives with respect 
> to the inputs will be zero, so your cost function is not useful for 
> training.
>
> On Sunday, March 5, 2017 at 8:03:12 PM UTC-8, 
> tarom...@alum.northwestern.edu wrote:
>>
>> Thanks Jesse, so are there operations that are "safe" to use and others 
>> that aren't? Where can I find this information? Also, I've used T.eq before 
>> in another custom loss function which works correctly and doesn't return 0 
>> gradients, but my use case there is in computing array indices, such as the 
>> way I'm using it in this line:
>>
>> pred_up = T.flatten(pred_one_hot[T.eq(unchanged_col, 0).nonzero(), 
>> preprocess.up_index])
>>
>> Is T.eq ok to use in some contexts and not others?
>>
>> On Sunday, March 5, 2017 at 9:14:20 PM UTC-6, Jesse Livezey wrote:
>>>
>>> The gradient of T.eq will be zero (almost) everywhere and you're using 
>>> it to compute num_win and num_lose.
>>>
>>> On Sunday, March 5, 2017 at 2:42:14 PM UTC-8, 
>>> tarom...@alum.northwestern.edu wrote:

 Also, the return values of this loss function are small compared to 
 cross-entropy; some sample values after random initialization were around 
 +/- 0.01. There is an LSTM layer and the input sequences are thousands of 
 elements long, so I suspected vanishing gradients. However, I'm printing 
 out the min, max, and mean of the gradients w.r.t. each parameter, and they 
 are all exactly equal to 0, which seems to indicate a different problem.

 On Sunday, March 5, 2017 at 3:59:42 PM UTC-6, 
 tarom...@alum.northwestern.edu wrote:
>
> I have defined a custom loss function, and despite the loss function 
> returning correct values given the inputs, the gradients are always 0 
> w.r.t. each of my parameters. I am not suppressing any theano errors, 
> including the disconnected input error, so I can't explain what is causing 
> this. I have copied the loss function below; in words, I first convert a 
> 3-class softmax output into a one-hot representation, then I compare a 
> subset of it to the response and compute a quantity of interest. More 
> generally, I was under the impression that if one could express a function 
> using theano ops, it could be used as a loss function. Is this not the case?
>
> def calc_one_hot_loss(pred, y, mask):
>     mask_flat = T.flatten(mask)
>     length = T.sum(mask_flat, dtype='int32')
>     pred_unmasked = pred[mask_flat.nonzero()]
>     max_indices = T.argmax(pred_unmasked, axis=1)
>     pred_zero = T.set_subtensor(pred_unmasked[:], 0)
>     pred_one_hot = T.set_subtensor(pred_zero[T.arange(length), max_indices], 1)
>     y_unmasked = y[mask_flat.nonzero()]
>     unchanged_col = pred_one_hot[:, preprocess.unchanged_index]
>     pred_up = T.flatten(pred_one_hot[T.eq(unchanged_col, 0).nonzero(), preprocess.up_index])
>     pred_down = T.flatten(pred_one_hot[T.eq(unchanged_col, 0).nonzero(), preprocess.down_index])
>     y_up = T.flatten(y_unmasked[T.eq(unchanged_col, 0).nonzero(), preprocess.up_index])
>     y_down = T.flatten(y_unmasked[T.eq(unchanged_col, 0).nonzero(), preprocess.down_index])
>     diff_up = T.abs_(pred_up - y_up)
>     diff_down = T.abs_(pred_down - y_down)
>     diff_sum = diff_up + diff_down
>     num_win = T.sum(T.eq(diff_sum, 0))
>     num_lose = T.sum(T.eq(diff_sum, 2))
>     loss = -1 * (num_win - num_lose) / length
>     return loss
>
>
>
>
>
>
>
>



[theano-users] Re: Gradients are always 0 for custom loss function

2017-03-06 Thread Jesse Livezey
There is nothing wrong with using T.eq. But, the derivatives with respect 
to the inputs will be zero, so your cost function is not useful for 
training.
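
If it helps, here is a rough, untested sketch (my own simplification, not code 
from this thread) of one way to make the counting differentiable: work with the 
softmax probabilities directly instead of the argmax/one-hot comparison, so the 
"win minus lose" quantity becomes an expectation with non-zero gradients. The 
shapes and names below (pred as an (n, 3) softmax output, y as one-hot targets, 
mask as a 0/1 vector) are assumptions:

import theano.tensor as T

def calc_soft_loss(pred, y, mask):
    # pred: (n, 3) softmax probabilities, y: (n, 3) one-hot targets,
    # mask: (n,) 0/1 mask -- hypothetical shapes for illustration.
    mask_flat = T.flatten(mask)
    length = T.sum(mask_flat)
    p = pred[mask_flat.nonzero()]
    t = y[mask_flat.nonzero()]
    # Probability mass on the correct class (a "soft win") and on the wrong
    # classes (a "soft lose"); both stay differentiable, unlike T.eq counts.
    win = T.sum(p * t)
    lose = T.sum(p * (1 - t))
    return -(win - lose) / length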

On Sunday, March 5, 2017 at 8:03:12 PM UTC-8, 
tarom...@alum.northwestern.edu wrote:
>
> Thanks Jesse, so are there operations that are "safe" to use and others 
> that aren't? Where can I find this information? Also, I've used T.eq before 
> in another custom loss function which works correctly and doesn't return 0 
> gradients, but my use case there is in computing array indices, such as the 
> way I'm using it in this line:
>
> pred_up = T.flatten(pred_one_hot[T.eq(unchanged_col, 0).nonzero(), 
> preprocess.up_index])
>
> Is T.eq ok to use in some contexts and not others?
>
> On Sunday, March 5, 2017 at 9:14:20 PM UTC-6, Jesse Livezey wrote:
>>
>> The gradient of T.eq will be zero (almost) everywhere and you're using it 
>> to compute num_win and num_lose.
>>
>> On Sunday, March 5, 2017 at 2:42:14 PM UTC-8, 
>> tarom...@alum.northwestern.edu wrote:
>>>
>>> Also, the return values of this loss function are small compared to 
>>> cross-entropy; some sample values after random initialization were around 
>>> +/- 0.01. There is an LSTM layer and the input sequences are thousands of 
>>> elements long, so I suspected vanishing gradients. However, I'm printing 
>>> out the min, max, and mean of the gradients w.r.t. each parameter, and they 
>>> are all exactly equal to 0, which seems to indicate a different problem.
>>>
>>> On Sunday, March 5, 2017 at 3:59:42 PM UTC-6, 
>>> tarom...@alum.northwestern.edu wrote:

 I have defined a custom loss function, and despite the loss function 
 returning correct values given the inputs, the gradients are always 0 
 w.r.t. each of my parameters. I am not suppressing any theano errors, 
 including the disconnected input error, so I can't explain what is causing 
 this. I have copied the loss function below; in words, I first convert a 
 3-class softmax output into a one-hot representation, then I compare a 
 subset of it to the response and compute a quantity of interest. More 
 generally, I was under the impression that if one could express a function 
 using theano ops, it could be used as a loss function. Is this not the case?

 def calc_one_hot_loss(pred, y, mask):
     mask_flat = T.flatten(mask)
     length = T.sum(mask_flat, dtype='int32')
     pred_unmasked = pred[mask_flat.nonzero()]
     max_indices = T.argmax(pred_unmasked, axis=1)
     pred_zero = T.set_subtensor(pred_unmasked[:], 0)
     pred_one_hot = T.set_subtensor(pred_zero[T.arange(length), max_indices], 1)
     y_unmasked = y[mask_flat.nonzero()]
     unchanged_col = pred_one_hot[:, preprocess.unchanged_index]
     pred_up = T.flatten(pred_one_hot[T.eq(unchanged_col, 0).nonzero(), preprocess.up_index])
     pred_down = T.flatten(pred_one_hot[T.eq(unchanged_col, 0).nonzero(), preprocess.down_index])
     y_up = T.flatten(y_unmasked[T.eq(unchanged_col, 0).nonzero(), preprocess.up_index])
     y_down = T.flatten(y_unmasked[T.eq(unchanged_col, 0).nonzero(), preprocess.down_index])
     diff_up = T.abs_(pred_up - y_up)
     diff_down = T.abs_(pred_down - y_down)
     diff_sum = diff_up + diff_down
     num_win = T.sum(T.eq(diff_sum, 0))
     num_lose = T.sum(T.eq(diff_sum, 2))
     loss = -1 * (num_win - num_lose) / length
     return loss











[theano-users] Re: custom binary_crossentropy gives value nan / inf

2017-03-06 Thread taromakino
You may be trying to compute log(0) or something close to it.
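
For what it's worth, a minimal, untested sketch of the usual fix, applied to 
the binary_ce quoted below (the epsilon value is an arbitrary choice):

from theano import tensor

def binary_ce_clipped(output, target, penalty=50, eps=1e-7):
    # Keep the output strictly inside (0, 1) so log() never sees 0 or 1.
    output = tensor.clip(output, eps, 1.0 - eps)
    return -(target * tensor.log(output) * penalty
             + (1.0 - target) * tensor.log(1.0 - output))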

On Monday, March 6, 2017 at 8:36:58 AM UTC-6, Onkar Pandit wrote:
>
> Hi All,
>
> I want to modify the loss function to tackle a class imbalance problem. To 
> do this I am trying to penalize the loss function in the following way:
>
> def binary_ce(output, target):
>     penalty = 50
>     return -(target * tensor.log(output) * penalty
>              + (1.0 - target) * tensor.log(1 - output))
>
>
> But I am getting the loss value as nan.
>
>
> Can anyone please explain what I am missing?
>
>
> Thanks for your patience.
>
>
> Thanks and Regards,
>
> Onkar
>
>



[theano-users] custom binary_crossentropy gives value nan / inf

2017-03-06 Thread Onkar Pandit
Hi All,

I want to modify the loss function to tackle a class imbalance problem. To do 
this I am trying to penalize the loss function in the following way:

from theano import tensor

def binary_ce(output, target):
    penalty = 50
    return -(target * tensor.log(output) * penalty
             + (1.0 - target) * tensor.log(1 - output))


But I am getting the loss value as nan.


Can anyone please explain what I am missing?


Thanks for your patience.


Thanks and Regards,

Onkar



Re: [theano-users] Re: CNMeM is disabled, CuDNN not available

2017-03-06 Thread Kiuhnm Mnhuik
You should use gpuarray.preallocate=0.7 (choose the fraction of GPU memory you want).
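
For example, something like this (untested) -- either on the command line:

THEANO_FLAGS='device=cuda0,gpuarray.preallocate=0.7' python file.py

or in ~/.theanorc:

[global]
device = cuda0

[gpuarray]
preallocate = 0.7

Note that gpuarray.preallocate only applies to the new backend (device=cuda*), 
not to the old device=gpu* backend.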

On Monday, March 6, 2017 at 8:31:58 AM UTC+1, Ani wrote:
>
> Sir, I think gpuarray will itself control GPU memory allocation; there is 
> no need for cnmem now, as it is deprecated (it was controlled via the old 
> CUDA backend).
>
> On Mon, Mar 6, 2017 at 12:20 PM, anishi gupta  > wrote:
>
>> Hello Sir,
>>
>> When I used the command "device=gpuN python file.py" the code ran 
>> successfully on the gpu. As you said, THEANO_FLAGS will override .rc file 
>> settings, so I think there is no need to make a .theanorc file. Kindly tell 
>> me how to make cuda memory allocation faster if I don't use cnmem... I want 
>> to gain a speedup on the gpu compared to the cpu.
>>
>>
>>
>> On Fri, Mar 3, 2017 at 7:08 PM, Ramana Subramanyam > > wrote:
>>
>>> Hi, 
>>> You don't set the device parameter. Set your device to gpuN and try 
>>> running again. cnmem is from the old backend, which will be deprecated 
>>> soon. You should start using device=cudaN and 
>>> config.gpuarray.preallocate instead of lib.cnmem. Also, you seem to 
>>> override the config file with THEANO_FLAGS. Use either of them (remember 
>>> FLAGS can override .rc file settings).
>>>
>>> Ramana
>>>
>>> On Friday, March 3, 2017 at 7:06:24 PM UTC+5:30, Ani wrote:

 No sir, I have put
 [lib]
 cnmem=1
 but it shows an error of "lib command not found"... what should I do now, sir?
 On 3 Mar 2017 5:07 p.m., "Kiuhnm Mnhuik"  wrote:

 You probably forgot [lib]:

 [lib]
 cnmem = 1

 Hello Sir...
>
> I have successfully installed CuDNN and it is also available... but the 
> problem is with cnmem. When I wrote cnmem=1 in my .theanorc file, which is 
> in the home directory, it gives an error on running any code with cpu or 
> gpu. Please have a look at the error (screenshot is attached).
>
> Thanks in advance
> Regards
> Anishi Gupta
>
> On Saturday, 28 May 2016 11:05:29 UTC+5:30, Poornachandra Sandur wrote:
>>
>> Hi,
>> CNMeM is a fast CUDA memory allocator. If you want to enable it, you can 
>> do it by writing the lines below in your .theanorc file:
>>
>> [lib]
>> cnmem=1
>>
>>
>> ---
>>
>> For CUDNN you can do it with the following steps:
>>
>> First download the file: cudnn-7.0-linux-x64-v3.0-prod.tgz
>>
>> Extract it to your home directory
>>
>> and set LD_LIBRARY_PATH to the above extracted directory
>>
>> and then follow the steps below:
>>
>> sudo cp /home/poornachandra/cuda/include/cudnn.h /usr/local/cuda-7.5/include/
>>
>> sudo cp /home/poornachandra/cuda/lib64/libcudnn* /usr/local/cuda-7.5/lib64/
>>
>>
>> On Fri, May 27, 2016 at 1:22 PM, Ramana Subramanyam <
>> vxrr...@gmail.com> wrote:
>>
>>> Hello there, 
>>> If anyone is running theano on OSX, you need to make a few changes 
>>> to Poornachandra's answer. Instead of setting LD_LIBRARY_PATH, 
>>> set DYLD_LIBRARY_PATH to the extracted directory of CUDNN. I 
>>> have extracted them into `/home/packages/` and I am using CUDA 7.5. Copy 
>>> those files to the following destination:
>>>
>>> 1) sudo cp ~/packages/cuda/lib/libcudnn* /Developer/NVIDIA/CUDA-7.5/lib/
>>>
>>> 2) sudo cp ~/packages/cuda/include/cudnn.h /Developer/NVIDIA/CUDA-7.5/include/
>>>
>>> Hope this helps! 
>>>
>>> On Tuesday, March 15, 2016 at 7:52:50 AM UTC+5:30, Ashutosh Modi 
>>> wrote:

 Hi,

 I am running the latest version of Theano 0.8 on linux server using 
 the GPUs. When I run my code I get the following message :

 Using gpu device 0: GeForce GTX TITAN X (CNMeM is disabled, CuDNN 
 not available)


 Is it normal, is something lacking, or is it some bug? I ran my 
 code after clearing the .theano directory.


 Please help me in resolving this.


 Thanks,

 Ashutosh

>>>
>>
>>
>>
>> -- 
>> warm regards,
>> Poornachandra Sandur,
>> Mob.No.+91-9901913049
>>