Thanks a lot!
You are completely right. With these changes the code is ~20% faster.
Thanks,
Fran.
On 05/04/2012, at 23:19, pierre castellani wrote:
Hi Francisco,
Just my 2 cents on your kernel: I've learned that pow should be avoided
(in
my old days ;-) ). Just try:
That is for a Windows system. I have Linux.
From the PyCUDA documentation, section Just-in-time Compilation:
If keep is True, the compiler output directory is kept, and a line
indicating its location in the file system is printed for debugging
purposes.
There is nothing printed when I set
That is for a Windows system. I have Linux.
This doesn't make a difference; the paths probably just point to /tmp/,
like on my system.
Ok, the location is only printed when you compile code that has not been
compiled before.
I have the file now.
Michiel.
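
For future reference, here is a minimal sketch of the behaviour described above. It assumes PyCUDA's SourceModule accepts keep=True as quoted from the documentation, and that passing cache_dir=False disables the compiler cache, so the code is compiled again (and the location printed again) on every call. The kernel and the name compile_and_keep are made up for illustration; they are not from this thread.

```python
# Sketch only: actually compiling needs PyCUDA and a CUDA-capable GPU.
KERNEL_SOURCE = """
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}
"""


def compile_and_keep(source=KERNEL_SOURCE):
    """Compile `source` and keep the compiler output directory around."""
    # Imported lazily so this file can be loaded on a machine without a GPU.
    import pycuda.autoinit  # noqa: F401  (sets up a CUDA context)
    from pycuda.compiler import SourceModule

    # keep=True: the compiler output directory is kept and its location
    # is printed.
    # cache_dir=False: assumed to switch off the compiler cache, so the
    # compile really happens each time instead of being served from cache.
    return SourceModule(source, keep=True, cache_dir=False)


# Usage, on a machine with a GPU:
#     module = compile_and_keep()
#     scale = module.get_function("scale")
```

Changing the kernel source in any way should have the same effect as disabling the cache, since the code then counts as not compiled before.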
Tomi Pieviläinen tomi.pievilai...@iki.fi 4/6/2012 11:51 AM
That is for a Windows system. I have Linux.
This doesn't make a difference, just the paths probably point to
Do you mind posting the final code here for future reference (as a
gist perhaps)?
Also, another optimization might be to remove the (slow) sqrt() in
each distance calculation and then do sqrt() of the bin labels in the
reduction step.
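
A rough sketch of that idea in plain Python, just to make the logic concrete (this is not Francisco's kernel; the function name and arguments are hypothetical). Instead of moving the sqrt() to the reduction step, this variant squares the bin edges once and compares squared distances against them, which is one common way to avoid the per-pair sqrt():

```python
import bisect


def pair_distance_histogram(points, edges):
    """Count point pairs per distance bin without a sqrt per pair.

    points: list of (x, y, z) tuples
    edges:  increasing bin edges, in ordinary (not squared) distance units
    """
    # Square the bin edges once, up front.
    edges_sq = [e * e for e in edges]
    counts = [0] * (len(edges) - 1)
    for i in range(len(points)):
        xi, yi, zi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj, zj = points[j]
            dx, dy, dz = xi - xj, yi - yj, zi - zj
            d2 = dx * dx + dy * dy + dz * dz  # plain multiplies, no pow()
            # Bin k holds pairs with edges[k] <= d < edges[k + 1].
            k = bisect.bisect_right(edges_sq, d2) - 1
            if 0 <= k < len(counts):
                counts[k] += 1
    return counts
```

With uniform bins one would normally get the index as int(d / width), which needs d itself; the search over squared edges is what replaces that step here, so the bins can stay linear in d at the cost of a lookup.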
On Fri, Apr 6, 2012 at 3:56 AM, Francisco Villaescusa Navarro
Hi Francisco,
Good to see that it is useful. I was thinking about other ways to speed
it up.
Do you really need the L2 norm? You could use some other distance
calculation that could be faster.
Did you look at CUDA-specific functions (for example sqrtf)?
Thanks,
Pierre.
On Friday, 06 April 2012 at
Thanks for all the suggestions!
Regarding removing sqrt: it seems that the code only gains about 1%,
and you lose the capacity to easily define linear intervals...
I have tried with sqrt and sqrtf, but there is no difference in the
total time (or it is very small).
The code to find the