Re: [ViennaCL-devel] AXPY and "reciprocal", "flip_sign" parameters

2014-01-24 Thread Philippe Tillet
Hi, Oh, I get it better now. I am not entirely convinced, though ;) From my experience, the overhead of the JIT launch is negligible compared to the compilation of one kernel. I'm not sure whether compiling two kernels in the same program or in two different programs makes a big difference. Plus,

Re: [ViennaCL-devel] AXPY and "reciprocal", "flip_sign" parameters

2014-01-24 Thread Karl Rupp
Hi Philippe, > I don't understand why this would go through more than one compilation... > This kernel is compiled only once; the values of flip_sign and reciprocal > only change the dynamic value of the argument, not the source code. > > This would eventually result in: > > if(alpha_reciprocal)
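The distinction debated in this message (flags as runtime arguments versus flags baked into the kernel source) can be sketched in plain C++ as an analogy. `av_runtime` takes `reciprocal_alpha` and `flip_sign_alpha` as ordinary arguments, so one function body serves all four combinations, much like one kernel compiled once. `av_specialized` bakes the flags in as template parameters, so every combination is a separate instantiation, much like generating and compiling separate kernel sources. Both functions are hypothetical illustrations, not ViennaCL code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Runtime flags: a single compiled body handles every combination,
// analogous to one kernel whose behavior depends only on its arguments.
void av_runtime(std::vector<double>& x, const std::vector<double>& y,
                double alpha, bool reciprocal_alpha, bool flip_sign_alpha)
{
  assert(x.size() == y.size());
  if (flip_sign_alpha) alpha = -alpha;
  if (reciprocal_alpha) {
    assert(alpha != 0.0);
    alpha = 1.0 / alpha;
  }
  for (std::size_t i = 0; i < x.size(); ++i)
    x[i] = y[i] * alpha;
}

// Baked-in flags: each <Reciprocal, FlipSign> pair is instantiated separately,
// analogous to emitting and compiling a distinct kernel source per variant.
// The conditions are compile-time constants, so the dead branch can be dropped.
template <bool Reciprocal, bool FlipSign>
void av_specialized(std::vector<double>& x, const std::vector<double>& y, double alpha)
{
  assert(x.size() == y.size());
  if (FlipSign) alpha = -alpha;
  if (Reciprocal) {
    assert(alpha != 0.0);
    alpha = 1.0 / alpha;
  }
  for (std::size_t i = 0; i < x.size(); ++i)
    x[i] = y[i] * alpha;
}
```

With two scalars, as in x = +- y OP1 a +- z OP2 b, the baked-in approach would need 16 specializations, whereas the runtime-flag version still needs a single body.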

Re: [ViennaCL-devel] AXPY and "reciprocal", "flip_sign" parameters

2014-01-24 Thread Philippe Tillet
Hey, 2014/1/25 Karl Rupp > Hey hey hey, > > > > Convergence depends on what is inside generate_execute() ;-) How is > >> the problem with alpha and beta residing on the GPU addressed? What >> will the batch compilation look like? The important point is that >> for the default axp

Re: [ViennaCL-devel] AXPY and "reciprocal", "flip_sign" parameters

2014-01-24 Thread Philippe Tillet
Hey hey, 2014/1/25 Karl Rupp > Hi, > > > I prefer option 3. This would allow for something like : >> >> if(size(x)>1e5 && stride==1 && start==0){ >> > > Here we also need to check that internal_size fits the vector width > > > >> //The following steps are costly for small vectors >> Numer

Re: [ViennaCL-devel] AXPY and "reciprocal", "flip_sign" parameters

2014-01-24 Thread Karl Rupp
Hey hey hey, > Convergence depends on what is inside generate_execute() ;-) How is > the problem with alpha and beta residing on the GPU addressed? What > will the batch compilation look like? The important point is that > for the default axpy kernels we really don't want to go thr

Re: [ViennaCL-devel] AXPY and "reciprocal", "flip_sign" parameters

2014-01-24 Thread Karl Rupp
Hi, > I prefer option 3. This would allow for something like : > > if(size(x)>1e5 && stride==1 && start==0){ Here we also need to check that internal_size fits the vector width > > //The following steps are costly for small vectors > NumericT cpu_alpha = alpha //copy back to host when the s
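The "option 3" dispatch condition, including the extra internal_size check requested here, might be expressed as a small predicate. This is a sketch under assumptions: the `VectorLayout` struct, its field names, `use_generated_kernel`, and the reading of "internal_size fits the vector width" as "internal_size is a multiple of the vector width" are all hypothetical, not ViennaCL API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of a vector (or vector range) in device memory.
struct VectorLayout {
  std::size_t size;           // logical number of entries
  std::size_t internal_size;  // padded length of the underlying buffer
  std::size_t start;          // offset of the first entry
  std::size_t stride;         // distance between consecutive entries
};

// Returns true if the generated (specialized) kernel may be used.
// Otherwise the caller would fall back to the default kernel that takes
// the reciprocal/flip_sign flags, avoiding a host read-back of alpha.
bool use_generated_kernel(const VectorLayout& v, std::size_t vector_width,
                          std::size_t size_threshold = 100000)  // 1e5
{
  assert(vector_width > 0);
  return v.size > size_threshold            // read-back of alpha pays off only for large vectors
      && v.stride == 1                      // contiguous access
      && v.start == 0                       // aligned to the buffer start
      && v.internal_size % vector_width == 0;  // assumed meaning of "fits the vector width"
}
```

The threshold is kept as a parameter because 1e5 is the value proposed in the thread, not a measured crossover point.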

Re: [ViennaCL-devel] AXPY and "reciprocal", "flip_sign" parameters

2014-01-24 Thread Philippe Tillet
Hey, 2014/1/24 Karl Rupp > Hi, > > > > I was in fact wondering why one passed reciprocal_alpha and flip_sign > >> into the kernel. After thinking more about it, I have noticed that this >> permits us to do the corresponding inversion/multiplication within the >> kernel, and therefore avoid one

Re: [ViennaCL-devel] AXPY and "reciprocal", "flip_sign" parameters

2014-01-24 Thread Karl Rupp
Hi, > I was in fact wondering why one passed reciprocal_alpha and flip_sign > into the kernel. After thinking more about it, I have noticed that this > permits us to do the corresponding inversion/multiplication within the > kernel, and therefore avoid some latency penalty / kernel launch > o
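The trade-off can be illustrated by splitting on where the scalar lives. A host scalar can simply be changed before the call, which is what the opening question proposes. A scalar residing in GPU memory would need a blocking read-back or an extra kernel launch to be changed on the host, so the flags are forwarded and the kernel applies them. A minimal sketch, assuming the hypothetical names `ScalarArg`, `make_scalar_arg` and `effective_alpha` (none of them ViennaCL API):

```cpp
#include <cassert>

// Hypothetical description of the scalar argument handed to an AXPY-style kernel.
struct ScalarArg {
  bool on_device;     // true: the kernel loads the scalar from GPU memory
  double host_value;  // meaningful only when on_device == false
  bool reciprocal;    // kernel should use 1/alpha
  bool flip_sign;     // kernel should use -alpha
};

// Host scalar: fold the modifiers now, so the kernel sees a plain factor.
// Device scalar: do not touch it; forward the flags instead.
ScalarArg make_scalar_arg(bool on_device, double host_value,
                          bool reciprocal, bool flip_sign)
{
  if (on_device)
    return {true, 0.0, reciprocal, flip_sign};
  double a = flip_sign ? -host_value : host_value;
  if (reciprocal) {
    assert(a != 0.0);
    a = 1.0 / a;
  }
  return {false, a, false, false};
}

// What the kernel would compute as its scaling factor; device_value simulates
// the scalar stored in GPU memory and is ignored for host scalars.
double effective_alpha(const ScalarArg& arg, double device_value)
{
  double a = arg.on_device ? device_value : arg.host_value;
  if (arg.flip_sign) a = -a;
  if (arg.reciprocal) {
    assert(a != 0.0);
    a = 1.0 / a;
  }
  return a;
}
```

Either path yields the same factor; the difference is only whether the host must know the scalar's value before the launch.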

Re: [ViennaCL-devel] AXPY and "reciprocal", "flip_sign" parameters

2014-01-24 Thread Philippe Tillet
Hi Karl, 2014/1/24 Karl Rupp > Hey, > > > I am a bit confused, is there any reason for using "reciprocal" and > > "flip_sign", instead of just changing the scalar accordingly? > > yes (with a drawback I'll discuss at the end): Consider the family of > operations > > x = +- y OP1 a +- z OP2

Re: [ViennaCL-devel] AXPY and "reciprocal", "flip_sign" parameters

2014-01-24 Thread Karl Rupp
Hey, > I am a bit confused, is there any reason for using "reciprocal" and > "flip_sign", instead of just changing the scalar accordingly? yes (with a drawback I'll discuss at the end): Consider the family of operations x = +- y OP1 a +- z OP2 b where x, y, and z are vectors, OP1 and OP2 ar
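A host-side reference loop for the family x = +- y OP1 a +- z OP2 b may help, assuming, as the "reciprocal" parameter suggests, that OP1 and OP2 each stand for either multiplication or division. A subtraction is expressed as flip_sign on the scalar and a division as reciprocal, so one loop body covers every combination without modifying a or b themselves. `ScalarMods`, `apply_mods` and `avbv_reference` are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Modifiers applied to one scalar inside the "kernel" (hypothetical names).
struct ScalarMods {
  bool reciprocal;  // OP is division: use 1/s
  bool flip_sign;   // term is subtracted: use -s
};

// Effective factor (+-1) * s^(+-1); s itself is passed by value and left untouched.
double apply_mods(double s, ScalarMods m)
{
  if (m.flip_sign) s = -s;
  if (m.reciprocal) {
    assert(s != 0.0);
    s = 1.0 / s;
  }
  return s;
}

// Reference for x = +-y OP1 a +-z OP2 b with OP1, OP2 in {*, /}.
void avbv_reference(std::vector<double>& x,
                    const std::vector<double>& y, double a, ScalarMods ma,
                    const std::vector<double>& z, double b, ScalarMods mb)
{
  assert(x.size() == y.size() && y.size() == z.size());
  const double alpha = apply_mods(a, ma);
  const double beta  = apply_mods(b, mb);
  for (std::size_t i = 0; i < x.size(); ++i)
    x[i] = y[i] * alpha + z[i] * beta;
}
```

For example, x = y / a - z / b corresponds to `ma = {true, false}` and `mb = {true, true}`. Note that in floating point, y * (1/a) is not always bit-identical to y / a, because the reciprocal is rounded before the multiplication.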

[ViennaCL-devel] AXPY and "reciprocal", "flip_sign" parameters

2014-01-24 Thread Philippe Tillet
Hello, I am a bit confused, is there any reason for using "reciprocal" and "flip_sign", instead of just changing the scalar accordingly? Best regards, Philippe