Hi,
Oh, I understand better now. I am not entirely convinced, though ;)
From my experience, the overhead of the JIT launch is negligible compared
to the compilation of one kernel. I'm not sure whether compiling two
kernels in the same program or in two different programs makes a big
difference. Plus,
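(In OpenCL API terms, the comparison is between one clBuildProgram call for
a program holding both kernels and one call per single-kernel program. A
minimal sketch of the two variants, assuming an already-created context ctx
and device dev; the toy kernel sources are made up for illustration:)

    #include <CL/cl.h>

    /* Variant 1: two kernels in one program -- the JIT runs once. */
    const char *src =
        "__kernel void ax(__global float *x, float a)"
        "{ x[get_global_id(0)] *= a; }\n"
        "__kernel void axpy(__global float *x, __global const float *y, float a)"
        "{ size_t i = get_global_id(0); x[i] += a * y[i]; }";
    cl_int err;
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, &err);
    err = clBuildProgram(prog, 1, &dev, "", NULL, NULL);  /* one compilation */
    cl_kernel k_ax   = clCreateKernel(prog, "ax", &err);
    cl_kernel k_axpy = clCreateKernel(prog, "axpy", &err);

    /* Variant 2: one program per kernel -- clBuildProgram (and hence the
       JIT front-end) runs twice for the same two kernels. */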
Hi Philippe,
> I don't understand why this would go through more than one compilation...
> This kernel is compiled only once; the values of flip_sign and reciprocal
> only change the dynamic values of the arguments, not the source code.
>
> This would eventually result in:
>
> if(alpha_reciprocal)
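(A sketch of the mechanism on the host side: since flip_sign and reciprocal
are ordinary kernel arguments, switching variants between launches is just a
clSetKernelArg call on the same compiled kernel. The argument indices 3/4,
and kernel/queue/gws being set up already, are assumptions:)

    /* Kernel compiled once; only argument values change between launches. */
    cl_int flip_sign = 1, reciprocal = 0;
    clSetKernelArg(kernel, 3, sizeof(cl_int), &flip_sign);
    clSetKernelArg(kernel, 4, sizeof(cl_int), &reciprocal);
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &gws, NULL, 0, NULL, NULL);

    /* A different variant of the same operation -- still no recompilation: */
    flip_sign = 0; reciprocal = 1;
    clSetKernelArg(kernel, 3, sizeof(cl_int), &flip_sign);
    clSetKernelArg(kernel, 4, sizeof(cl_int), &reciprocal);
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &gws, NULL, 0, NULL, NULL);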
Hey,
2014/1/25 Karl Rupp
> Hey hey hey,
>
>> Convergence depends on what is inside generate_execute() ;-) How is
>> the problem with alpha and beta residing on the GPU addressed? What
>> will the batch-compilation look like? The important point is that
>> for the default axpy
Hey hey,
2014/1/25 Karl Rupp
> Hi,
>
>> I prefer option 3. This would allow for something like :
>>
>> if(size(x)>1e5 && stride==1 && start==0){
>>
> Here we also need to check that the internal_size fits the vector width
>
>> //The following steps are costly for small vectors
>> NumericT
Hey hey hey,
> Convergence depends on what is inside generate_execute() ;-) How is
> the problem with alpha and beta residing on the GPU addressed? What
> will the batch-compilation look like? The important point is that
> for the default axpy kernels we really don't want to go through
Hi,
> I prefer option 3. This would allow for something like :
>
> if(size(x)>1e5 && stride==1 && start==0){
Here we also need to check that the internal_size fits the vector width
>
> //The following steps are costly for small vectors
> NumericT cpu_alpha = alpha //copy back to host when the s
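(A rough C++ sketch of such a dispatch under option 3; size(), start(),
stride(), and internal_size() mirror ViennaCL-style vector metadata, while
launch_fast_kernel and launch_generic_kernel are hypothetical stand-ins for
the actual kernel launches:)

    #include <cstddef>

    template <typename NumericT, typename VectorT, typename ScalarT>
    void av_dispatch(VectorT & x, VectorT const & y, ScalarT const & alpha)
    {
      std::size_t const vector_width = 4;  // e.g. float4 loads in the fast path
      if (x.size() > 100000 && x.stride() == 1 && x.start() == 0
          && x.internal_size() % vector_width == 0)
      {
        // The following steps are costly for small vectors:
        NumericT cpu_alpha = alpha;           // copy back to host if on the GPU
        launch_fast_kernel(x, y, cpu_alpha);  // vectorized, no flag arguments
      }
      else
        launch_generic_kernel(x, y, alpha);   // flip_sign/reciprocal as arguments
    }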
Hey,
2014/1/24 Karl Rupp
> Hi,
>
>> I was in fact wondering why one passed reciprocal_alpha and flip_sign
>> into the kernel. After thinking more about it, I have noticed that this
>> permits us to do the corresponding inversion/multiplication within the
>> kernel, and therefore avoid some
Hi,
> I was in fact wondering why one passed reciprocal_alpha and flip_sign
> into the kernel. After thinking more about it, I have noticed that this
> permits us to do the corresponding inversion/multiplication within the
> kernel, and therefore avoid some latency penalty / kernel launch
> overhead
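(For contrast, what would otherwise be needed when alpha lives on the
device: either a blocking read-back or an extra helper kernel. A sketch
with the plain OpenCL API; queue and the scalar buffer d_alpha are assumed
to exist:)

    /* Alternative 1: read the device scalar back (blocking PCIe round trip),
       then negate/invert on the host and pass it by value. */
    float host_alpha;
    clEnqueueReadBuffer(queue, d_alpha, CL_TRUE, 0, sizeof(float),
                        &host_alpha, 0, NULL, NULL);

    /* Alternative 2: an extra tiny kernel just to prepare the scalar on the
       device -- one more kernel launch before the actual operation:
       __kernel void invert(__global float *a) { a[0] = 1.0f / a[0]; }
       With flip_sign/reciprocal as kernel arguments, the main kernel does
       this itself and neither extra step is needed. */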
Hi Karl,
2014/1/24 Karl Rupp
> Hey,
>
>> I am a bit confused, is there any reason for using "reciprocal" and
>> "flip_sign", instead of just changing the scalar accordingly?
>
> yes (with a drawback I'll discuss at the end): Consider the family of
> operations
>
> x = +- y OP1 a +- z OP2 b
Hey,
> I am a bit confused, is there any reason for using "reciprocal" and
> "flip_sign", instead of just changing the scalar accordingly?
yes (with a drawback I'll discuss at the end): Consider the family of
operations
x = +- y OP1 a +- z OP2 b
where x, y, and z are vectors, OP1 and OP2 are
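(Counting the variants makes the motivation concrete: two signs and two
choices of OP per term give 16 combinations of the operation above, and
per-scalar flip_sign/reciprocal arguments let a single compiled kernel
cover all of them. A hypothetical kernel along these lines, not the actual
ViennaCL source:)

    // x = (+-) y OP1 a (+-) z OP2 b, OP in {*, /}: 16 variants, one kernel.
    __kernel void avbv(__global float *x, unsigned int size,
                       __global const float *y, float a, int flip_a, int recip_a,
                       __global const float *z, float b, int flip_b, int recip_b)
    {
      if (recip_a) a = 1.0f / a;   // OP1 is division
      if (flip_a)  a = -a;         // minus sign on the y term
      if (recip_b) b = 1.0f / b;   // OP2 is division
      if (flip_b)  b = -b;         // minus sign on the z term
      for (unsigned int i = get_global_id(0); i < size; i += get_global_size(0))
        x[i] = a * y[i] + b * z[i];
    }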
Hello,
I am a bit confused, is there any reason for using "reciprocal" and
"flip_sign", instead of just changing the scalar accordingly?
Best regards,
Philippe