Hi again,

> While we're at it, let's discuss the dynamic dispatching mechanism we'd
> ideally want. I see two options:
>
> (1) A global function pointer table. So, one could for example set:
> viennacl::internal_blas::sgemv_ptr = &viennacl::cblas_wrapper;
> where cblas_wrapper essentially checks for the stride in the non-leading
> dimension and forwards to cblas if this stride is one. Of course, if the
> current backend is different, cblas_wrapper is not defined, and
> cublas_wrapper can be defined instead.

I'd prefer to have this function table per object or per memory backend
rather than global; otherwise it will sooner or later bite us in a
multi-threaded setting. We (or a user) might also want to use one
implementation of a certain operation for small or skinny matrices and
another for large or square matrices. That is much easier if the table
is tied to the particular object.
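
To make the per-object idea concrete, here is a minimal sketch. All names
(gemv_fn, blas_table, matrix, prod) are hypothetical and only illustrate
the design; they are not existing ViennaCL types, and a real table would
hold many more entries than a single gemv pointer.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical signature: y = A * x for a dense row-major float matrix.
typedef void (*gemv_fn)(const float* A, const float* x, float* y,
                        std::size_t rows, std::size_t cols);

// Plain reference kernel, standing in for a built-in implementation.
inline void gemv_reference(const float* A, const float* x, float* y,
                           std::size_t rows, std::size_t cols)
{
  for (std::size_t i = 0; i < rows; ++i)
  {
    float sum = 0;
    for (std::size_t j = 0; j < cols; ++j)
      sum += A[i * cols + j] * x[j];
    y[i] = sum;
  }
}

// Hypothetical per-object table; defaults to the reference kernel.
struct blas_table
{
  gemv_fn gemv;
  blas_table() : gemv(&gemv_reference) {}
};

// Toy matrix type (assumption): each object carries its own table,
// so there is no global dispatch state shared between threads.
struct matrix
{
  std::size_t rows, cols;
  std::vector<float> data;
  blas_table blas;
  matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c) {}
};

// Dispatches through the table of the matrix it is called with.
inline void prod(const matrix& A, const std::vector<float>& x,
                 std::vector<float>& y)
{
  y.resize(A.rows);
  A.blas.gemv(A.data.data(), x.data(), y.data(), A.rows, A.cols);
}
```

A user could then assign a different kernel to one object only, e.g.
A.blas.gemv = &my_wrapper; (my_wrapper being a user-supplied function
with the gemv_fn signature), without affecting any other matrix.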


> I like this solution a lot, since this allows one to mix multiple BLAS
> implementations in the same program. This can be useful in some cases
> (OpenBLAS is faster than MKL for BLAS3, but MKL is supposedly faster for
> all the rest). HOWEVER, this requires linkage if we want to avoid
> multiple definitions of that global pointer table.

That's another reason why it shouldn't be global ;-)

> Since we now provide
> a libviennacl.so, though, we could include the global table therein, and
> one would link with it if he wants to use the additional
> functionality. Plus, if one has his own blas function he wants to
> benchmark against ours, for example, then this solution is very convenient.

The shared library is available in addition to the header-only
implementation; it is not compulsory. We might change that for ViennaCL
2.0.0, but 1.x.y will stay header-only.


> (2) A template parameter. So that one would write:
> viennacl::prod<CBlasBackend>(), similarly to how I did with UMinTL.
> However, I am not very fond of this solution for ViennaCL, because it
> will create a huge bloat in the code, since templates essentially need
> to propagate, and it might screw up a bit the template deduction
> mechanism of some compilers (since prod<> is already templated with the
> underlying ScalarType...)

Same here, I consider this to be a misuse of templates for the
reasons you mentioned. Fortunately, we don't have to worry about
performance for something tiny like 3x3 matrices, so a bit of runtime
logic is not an issue.
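
For illustration, a small sketch of how a backend template parameter has
to propagate through every layer. ReferenceBackend, norm2_squared and
residual_norm_squared are made-up names for this example only, not
ViennaCL or UMinTL functions.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical backend tag providing the lowest-level kernel.
struct ReferenceBackend
{
  static float dot(const float* a, const float* b, std::size_t n)
  {
    float sum = 0;
    for (std::size_t i = 0; i < n; ++i)
      sum += a[i] * b[i];
    return sum;
  }
};

// First layer: needs Backend only to forward it to dot().
template <typename Backend>
float norm2_squared(const std::vector<float>& v)
{
  return Backend::dot(v.data(), v.data(), v.size());
}

// Second layer: has no use for Backend itself, but must still take it
// as a template parameter so that it can pass it down.
// Assumes a and b have the same size.
template <typename Backend>
float residual_norm_squared(const std::vector<float>& a,
                            const std::vector<float>& b)
{
  std::vector<float> r(a.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    r[i] = a[i] - b[i];
  return norm2_squared<Backend>(r);
}
```

Every function between the user call and the kernel gets instantiated once
per backend, which is where the code bloat comes from.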

Best regards,
Karli


_______________________________________________
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel
