Dear all,
I have two vectors:
*group* with shape (1000, 1) - uint8
*value* with shape (1000, 1) - float32
I need to group the rows of *value* by the indices in *group* and then take
the average of each group. Currently I do this by building a matrix
*conversion* of shape (n_groups, 1000) where
*conversion[row, col] = int(group[col] == row)*
and then computing
*result = T.dot(conversion, value)*
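In case it helps, this is roughly how I build it at the moment (a simplified
sketch; n_groups is assumed to be known when the graph is built):

import theano
import theano.tensor as T

n_groups = 500

group = T.matrix('group', dtype='uint8')    # shape (1000, 1)
value = T.matrix('value', dtype='float32')  # shape (1000, 1)

# conversion[row, col] = 1 if group[col] == row else 0
group_row = group.flatten().dimshuffle('x', 0)          # (1, 1000) broadcastable row
rows = T.arange(n_groups).dimshuffle(0, 'x')            # (n_groups, 1) broadcastable column
conversion = T.cast(T.eq(rows, group_row), 'float32')   # (n_groups, 1000)

# per-group sums; dividing by the group sizes gives the averages
result = T.dot(conversion, value)                       # (n_groups, 1)

f = theano.function([group, value], result)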
There are usually around 500 groups, so the dot product involves around
500,000 multiplications. Each group is small, so most entries of *conversion*
are 0, most of those multiplications are by 0, and I would expect the
operation could be sped up by skipping them.
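By "ignoring those 0s" I mean something like the following rough sketch, where
the conversion matrix is built once as a scipy.sparse CSR matrix and the
product uses theano.sparse (this is just an idea, not code I am running, and I
have not checked whether the sparse ops can stay on the GPU):

import numpy as np
import scipy.sparse as sp
import theano
import theano.sparse as sparse
import theano.tensor as T

n_rows, n_groups = 1000, 500
group_np = np.random.randint(0, n_groups, size=n_rows)    # example group ids
value_np = np.random.rand(n_rows, 1).astype('float32')    # example values

# one-hot conversion matrix with a single nonzero per column, stored sparsely
conversion_np = sp.csr_matrix(
    (np.ones(n_rows, dtype='float32'), (group_np, np.arange(n_rows))),
    shape=(n_groups, n_rows))

conversion = sparse.csr_matrix('conversion', dtype='float32')
value = T.matrix('value', dtype='float32')
result = sparse.structured_dot(conversion, value)   # multiplies only the nonzeros

f = theano.function([conversion, value], result)
print(f(conversion_np, value_np).shape)             # (500, 1)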
However, even after profiling and many rounds of optimization, this operation
still accounts for about 80% of my runtime, so I am looking for ways to speed
it up.
As far as I know, I could use *scan* together with *typed_list* and slice
*value* inside each scan step (a rough sketch of what I mean is below the
questions). My questions are:
1. Is typed_list supported on the GPU?
2. Does scan parallelise efficiently on the GPU when the iterations are
independent?
3. Are there any other ways to do this?
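For reference, the scan variant I have in mind would look roughly like the
sketch below (using an eq-mask per group in place of the typed_list slicing,
and with both vectors flattened to 1-D):

import theano
import theano.tensor as T

group = T.vector('group', dtype='int64')    # group ids, flattened to shape (1000,)
value = T.vector('value', dtype='float32')  # values, flattened to shape (1000,)
n_groups = T.iscalar('n_groups')

def group_mean(g, value, group):
    # keep only the entries belonging to group g, then average them
    mask = T.cast(T.eq(group, g), 'float32')
    return T.sum(value * mask) / T.maximum(T.sum(mask), 1)

means, _ = theano.scan(group_mean,
                       sequences=T.arange(n_groups),
                       non_sequences=[value, group])

f = theano.function([group, value, n_groups], means)   # means has shape (n_groups,)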