Dear all,

I have two vectors:
    *group* with shape (1000, 1), dtype uint8
    *value* with shape (1000, 1), dtype float32

I need to group the rows of *value* by the index in *group* and then take the
average for each group. Currently, I'm doing this by introducing a matrix
    *conversion* of shape (n_groups, 1000), where
    conversion[row, col] = int(group[col] == row)
and then my result is computed with
    result = T.dot(conversion, value)
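For concreteness, here is a minimal sketch of the computation described above, written in NumPy rather than Theano so that it runs standalone. The random data and the names n_rows, n_groups, sums and counts are made up for illustration. The dot product by itself yields per-group sums, so the sketch assumes the average is then obtained by dividing by the group sizes.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_groups = 1000, 500  # sizes taken from the description above

# Made-up data. int32 is used for the group ids in this sketch because
# uint8 can only represent 256 distinct values.
group = rng.integers(0, n_groups, size=(n_rows, 1)).astype(np.int32)
value = rng.random((n_rows, 1), dtype=np.float32)

# conversion[row, col] = int(group[col] == row), shape (n_groups, n_rows)
rows = np.arange(n_groups).reshape(-1, 1)
conversion = (group.T == rows).astype(np.float32)

# Per-group sums, shape (n_groups, 1).
# This costs n_groups * n_rows multiplications, most of them by 0.
sums = conversion.dot(value)

# Group sizes; empty groups are clamped to 1 to avoid division by zero.
counts = np.maximum(conversion.sum(axis=1, keepdims=True), 1.0)
result = sums / counts
```

Each column of conversion contains exactly one 1, so only 1000 of the 500,000 entries are non-zero.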

There are usually around 500 groups, so that leads to around 500,000
multiplications. The groups are pretty small, so there is a lot of
multiplication by 0 happening, and I would expect that this could be sped
up by ignoring those 0s.
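To illustrate what "ignoring those 0s" could look like, here is a hedged NumPy sketch that accumulates each value directly into its group's slot, so each of the 1000 values is touched once instead of going through the full dense product. This is NumPy, not Theano, and it says nothing about how an equivalent scatter-add would perform in Theano on the GPU. The data and the names n_rows, n_groups, idx, sums, counts and means are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_groups = 1000, 500

# Made-up data; int32 group ids are an assumption of this sketch.
group = rng.integers(0, n_groups, size=(n_rows, 1)).astype(np.int32)
value = rng.random((n_rows, 1), dtype=np.float32)

idx = group.ravel()

# One accumulation per input row, no multiplications by zero.
sums = np.bincount(idx, weights=value.ravel(), minlength=n_groups)
counts = np.bincount(idx, minlength=n_groups)

# bincount accumulates in float64; cast back and restore the column shape.
# Empty groups are clamped to a count of 1 to avoid division by zero.
means = (sums / np.maximum(counts, 1)).astype(np.float32).reshape(-1, 1)

# Dense reference (the conversion-matrix approach) for comparison.
conversion = (idx[None, :] == np.arange(n_groups)[:, None]).astype(np.float32)
dense_sums = conversion.dot(value)
```

Both routes produce the same per-group sums up to floating-point rounding; the difference is only in how much work is spent on zero entries.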

However, after profiling and many, many optimizations, I ended up with this
operation taking 80% of my runtime. Thus, I am searching for ways to speed
this up.

As far as I know, I could use *scan* together with *typed_list* and slice
*value* in each scan iteration. My questions are:
    1. Is typed_list supported on the GPU?
    2. Does scan parallelise efficiently on the GPU if the scan iterations
       are independent?
    3. Are there any other ways to do this?
