This is a bug in one Theano optimization: local_dimshuffle_subtensor. Thanks for the report. I made an issue so that we don't forget it:
https://github.com/Theano/Theano/issues/6288

Frédéric

On Wed, Aug 9, 2017 at 4:50 AM 佐藤優 <[email protected]> wrote:
> I wonder why the code below fails:
>
> from numpy import *
> import theano.tensor as T
> x = T.dmatrix("x")
> mx = x[...,None,:]
> a = T.ones((1,3))
> T.grad(mx[...,0].dot(a).sum(), a).eval({x:ones((5,10)).astype(float32)})
>
> The following error is raised:
>
> ---------------------------------------------------------------------------
> ValueError                                Traceback (most recent call last)
> /home/yu/anaconda3/lib/python3.5/site-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
>     883             outputs =\
> --> 884                 self.fn() if output_subset is None else\
>     885                 self.fn(output_subset=output_subset)
>
> ValueError: Shape mismatch: A.shape[1] != x.shape[0]
>
> During handling of the above exception, another exception occurred:
>
> ValueError                                Traceback (most recent call last)
> <ipython-input-74-52410617594a> in <module>()
>       3 mx = x[...,None,:]
>       4 a = T.ones((1,3))
> ----> 5 T.grad(mx[...,0].dot(a).sum(), a).eval({x:ones((5,10)).astype(float32)})
>
> /home/yu/anaconda3/lib/python3.5/site-packages/theano/gof/graph.py in eval(self, inputs_to_values)
>     517         args = [inputs_to_values[param] for param in inputs]
>     518
> --> 519         rval = self._fn_cache[inputs](*args)
>     520
>     521         return rval
>
> /home/yu/anaconda3/lib/python3.5/site-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
>     896                     node=self.fn.nodes[self.fn.position_of_error],
>     897                     thunk=thunk,
> --> 898                     storage_map=getattr(self.fn, 'storage_map', None))
>     899             else:
>     900                 # old-style linkers raise their own exceptions
>
> /home/yu/anaconda3/lib/python3.5/site-packages/theano/gof/link.py in raise_with_op(node, thunk, exc_info, storage_map)
>     323         # extra long error message in that case.
>     324             pass
> --> 325         reraise(exc_type, exc_value, exc_trace)
>     326
>     327
>
> /home/yu/anaconda3/lib/python3.5/site-packages/six.py in reraise(tp, value, tb)
>     683             value = tp()
>     684         if value.__traceback__ is not tb:
> --> 685             raise value.with_traceback(tb)
>     686         raise value
>     687
>
> /home/yu/anaconda3/lib/python3.5/site-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
>     882         try:
>     883             outputs =\
> --> 884                 self.fn() if output_subset is None else\
>     885                 self.fn(output_subset=output_subset)
>     886         except Exception:
>
> ValueError: Shape mismatch: A.shape[1] != x.shape[0]
> Apply node that caused the error: CGemv{inplace}(AllocEmpty{dtype='float64'}.0, TensorConstant{1.0}, InplaceDimShuffle{1,0}.0, Rebroadcast{0}.0, TensorConstant{0.0})
> Toposort index: 7
> Inputs types: [TensorType(float64, vector), TensorType(float64, scalar), TensorType(float64, matrix), TensorType(float64, vector), TensorType(float64, scalar)]
> Inputs shapes: [(3,), (), (3, 5), (1,), ()]
> Inputs strides: [(8,), (), (8, 24), (80,), ()]
> Inputs values: [array([ 0.00000000e+000, 4.94065646e-324, 9.88131292e-324]), array(1.0), 'not shown', array([ 1.]), array(0.0)]
> Inputs type_num: [12, 12, 12, 12, 12]
> Outputs clients: [[InplaceDimShuffle{x,0}(CGemv{inplace}.0)]]
>
> Debugprint of the apply node:
> CGemv{inplace} [id A] <TensorType(float64, vector)> ''
>  |AllocEmpty{dtype='float64'} [id B] <TensorType(float64, vector)> ''
>  | |TensorConstant{3} [id C] <TensorType(int64, scalar)>
>  |TensorConstant{1.0} [id D] <TensorType(float64, scalar)>
>  |InplaceDimShuffle{1,0} [id E] <TensorType(float64, matrix)> ''
>  | |Alloc [id F] <TensorType(float64, matrix)> ''
>  | |TensorConstant{(1, 1) of 1.0} [id G] <TensorType(float64, (True, True))>
>  | |Shape_i{0} [id H] <TensorType(int64, scalar)> ''
>  | | |x [id I] <TensorType(float64, matrix)>
>  | |TensorConstant{3} [id C] <TensorType(int64, scalar)>
>  |Rebroadcast{0} [id J] <TensorType(float64, vector)> ''
>  | |Subtensor{int8, ::, int64} [id K] <TensorType(float64, (True,))> ''
>  | |InplaceDimShuffle{0,x,1} [id L] <TensorType(float64, (False, True, False))> ''
>  | | |x [id I] <TensorType(float64, matrix)>
>  | |Constant{0} [id M] <int8>
>  | |Constant{0} [id N] <int64>
>  |TensorConstant{0.0} [id O] <TensorType(float64, scalar)>
>
> Storage map footprint:
>  - x, Input, Shape: (5, 10), ElemSize: 8 Byte(s), TotalSize: 400 Byte(s)
>  - InplaceDimShuffle{0,x,1}.0, Shape: (5, 1, 10), ElemSize: 8 Byte(s), TotalSize: 400 Byte(s)
>  - Alloc.0, Shape: (5, 3), ElemSize: 8 Byte(s), TotalSize: 120 Byte(s)
>  - InplaceDimShuffle{1,0}.0, Shape: (3, 5), ElemSize: 8 Byte(s), TotalSize: 120 Byte(s)
>  - AllocEmpty{dtype='float64'}.0, Shape: (3,), ElemSize: 8 Byte(s), TotalSize: 24 Byte(s)
>  - Subtensor{int8, ::, int64}.0, Shape: (1,), ElemSize: 8 Byte(s), TotalSize: 8 Byte(s)
>  - Shape_i{0}.0, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
>  - TensorConstant{1.0}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
>  - TensorConstant{0.0}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
>  - Constant{0}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
>  - Rebroadcast{0}.0, Shape: (1,), ElemSize: 8 Byte(s), TotalSize: 8 Byte(s)
>  - TensorConstant{3}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
>  - TensorConstant{(1, 1) of 1.0}, Shape: (1, 1), ElemSize: 8 Byte(s), TotalSize: 8 Byte(s)
>  - Constant{0}, Shape: (), ElemSize: 1 Byte(s), TotalSize: 1.0 Byte(s)
>  TotalSize: 593.0 Byte(s) 0.000 GB
>  TotalSize inputs: 441.0 Byte(s) 0.000 GB
>
> HINT:
> Re-running with most Theano optimization disabled could give you a
> back-trace of when this node was created. This can be done by setting
> the Theano flag 'optimizer=fast_compile'. If that does not work, Theano
> optimizations can be disabled with 'optimizer=None'.
>
>
> I thought the broadcasted operation in the script above might be the problem,
> so I removed the broadcasting before the gradient operation as follows:
>
> x = T.tensor3("x")
> mx = x
> a = T.ones((1,3))
> T.grad(mx[...,0].dot(a).sum(), a).eval({x:ones((5,1,10)).astype(float32)})
>
> This runs successfully and prints the result below:
>
> array([[ 5.,  5.,  5.]], dtype=float32)
>
>
> But why is the former case invalid?
>
> Is the gradient with broadcasting mathematically invalid?
>
> Why does the shape mismatch happen in the gradient?
>
> Could you help me with these questions?
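
For anyone who hits this before the issue is fixed, below is a minimal sketch of a possible workaround: compile the gradient with the suspect rewrite excluded instead of calling .eval(). This assumes local_dimshuffle_subtensor is registered in the optimization database under its own name, so Mode.excluding() can filter it out; the optimizer=fast_compile / optimizer=None flags mentioned in the HINT are the blunter alternative.

from numpy import ones
import theano
import theano.tensor as T

x = T.dmatrix("x")
mx = x[..., None, :]                     # same broadcasted indexing as the report
a = T.ones((1, 3))
g = T.grad(mx[..., 0].dot(a).sum(), a)

# Exclude only the rewrite named in the reply (assumption: it is registered
# under this name, so excluding() can drop it from the default mode).
mode = theano.compile.mode.get_default_mode().excluding("local_dimshuffle_subtensor")
f = theano.function([x], g, mode=mode)

# float64 input to match dmatrix; if the exclusion avoids the bug, this should
# print [[ 5.  5.  5.]], matching the tensor3 variant from the thread.
print(f(ones((5, 10))))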
