Hi Matthew,
We'll soon hold a meeting about the 1.6 release plan; feel free to register
on the Doodle poll at http://doodle.com/fbdespb57uq3r7y2, and let us know if
none of the proposed times are convenient!
I had released a short patch adding BLAS3 support via OpenBLAS and cuBLAS,
but it turned out to require more work than expected to integrate cleanly
into the current branch. Since then, the kernel generator has improved to
match the performance of MAGMA/cuBLAS, which led me to postpone my plans for
external BLAS linking. Since there is clear demand for that feature, it might
still make it partially into ViennaCL 1.6. Could you tell us more about which
BLAS functionality you'd like to see wrapped?
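For reference, "external BLAS linking" here means dispatching, say, a
matrix-matrix product to a routine such as CBLAS sgemm instead of our own
kernels. The snippet below is a plain CBLAS example (not ViennaCL code, and
not the final integration API), just to illustrate the kind of call an
OpenBLAS backend would forward to:
---------------------------------------------------------------------------------------------------
#include <cblas.h>   // CBLAS interface, shipped e.g. with OpenBLAS
#include <vector>

int main()
{
  int M = 64, N = 64, K = 64;
  std::vector<float> A(M * K, 1.0f), B(K * N, 1.0f), C(M * N, 0.0f);

  // C = 1.0 * A * B + 0.0 * C, all matrices stored row-major
  cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
              M, N, K,
              1.0f, A.data(), K,
                    B.data(), N,
              0.0f, C.data(), N);
  return 0;
}
---------------------------------------------------------------------------------------------------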
Anyway, below are my ideas for ViennaCL 1.6. We'll discuss them during the
upcoming IRC meeting and see what we do with them.
There are big plans for performance tuning and benchmarking. In the
mid-term, there will be input-specific BLAS calls, meaning that each kernel
will be tailored to the sizes of its input arguments (using machine-learning
techniques). However, this might not be entirely robust/ready for the 1.6
release (I'm not sure; it will depend on the upcoming IRC discussion).
There will, however, be more mechanisms for removing temporaries in ViennaCL
1.6 *for the OpenCL backend*, and potentially more operators. Assuming the
proper data structures are already created, I'm confident that the following
will make it into the 1.6 release:
---------------------------------------------------------------------------------------------------
// Transparently calls one single auto-tuned, dynamically generated kernel
x = y + element_exp(z) + element_prod(x,y);
---------------------------------------------------------------------------------------------------
// Calls a single custom auto-tuned kernel computing "x = y + z; z = x + y".
// x, y and z are each accessed only once.
std::vector<statement> packed_operations{ statement(x, op_assign(), y + z),
                                          statement(z, op_assign(), x + y) };
viennacl::device_specific::execute(profiles::get(VECTOR_AXPY_TYPE, FLOAT_TYPE),
                                   packed_operations);
---------------------------------------------------------------------------------------------------
// Calls a single custom auto-tuned kernel computing "s0 = max(x); s1 = min(x)".
// x is accessed only once.
std::vector<statement> packed_operations{ statement(s0, op_assign(), reduce<max>(x)),
                                          statement(s1, op_assign(), reduce<min>(x)) };
viennacl::device_specific::execute(profiles::get(REDUCTION_TYPE, FLOAT_TYPE),
                                   packed_operations);
---------------------------------------------------------------------------------------------------
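To make the first snippet above concrete, here is a minimal, self-contained
sketch using the existing public API (headers and namespaces are from memory,
so adjust them if needed; sizes and values are arbitrary):
---------------------------------------------------------------------------------------------------
#include <vector>

#include "viennacl/vector.hpp"
#include "viennacl/linalg/vector_operations.hpp"

int main()
{
  std::size_t n = 1024;
  std::vector<float> host(n, 1.0f);   // host data, all ones just for illustration

  viennacl::vector<float> x(n), y(n), z(n);
  viennacl::copy(host, x);
  viennacl::copy(host, y);
  viennacl::copy(host, z);

  // With the 1.6 generator, this whole expression is meant to be fused into
  // one dynamically generated kernel instead of several separate BLAS1 calls.
  using namespace viennacl;
  using namespace viennacl::linalg;
  x = y + element_exp(z) + element_prod(x, y);

  viennacl::copy(x, host);   // bring the result back to the host
  return 0;
}
---------------------------------------------------------------------------------------------------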
This could be great news if you're in need of more performance on an OpenCL
device.
In case you're interested, I'd also like to point out that there will be an
external template library for nonlinear gradient-based optimization built on
ViennaCL. It will support gradient descent, nonlinear conjugate gradient,
BFGS, L-BFGS, truncated Newton, and Newton's method.
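To give a rough idea of what such a library's building blocks look like on
top of ViennaCL, here is a minimal, purely hypothetical gradient-descent
sketch for the quadratic objective f(x) = 0.5 * x^T A x - b^T x (the
library's actual interface will of course differ):
---------------------------------------------------------------------------------------------------
#include <cstddef>

#include "viennacl/vector.hpp"
#include "viennacl/matrix.hpp"
#include "viennacl/linalg/prod.hpp"
#include "viennacl/linalg/norm_2.hpp"

// Plain gradient descent on f(x) = 0.5 * x^T A x - b^T x, whose gradient is
// A*x - b.  Illustrative only; not the optimization library's real API.
viennacl::vector<float> gradient_descent(viennacl::matrix<float> const & A,
                                         viennacl::vector<float> const & b,
                                         float step_size,
                                         std::size_t max_iters,
                                         float tol)
{
  viennacl::vector<float> x(b.size());
  x.clear();                                   // start from x = 0
  viennacl::vector<float> grad(b.size());

  for (std::size_t i = 0; i < max_iters; ++i)
  {
    grad  = viennacl::linalg::prod(A, x);      // grad = A*x
    grad -= b;                                 // grad = A*x - b

    float grad_norm = viennacl::linalg::norm_2(grad);
    if (grad_norm < tol)                       // converged?
      break;

    x -= step_size * grad;                     // descent step
  }
  return x;
}
---------------------------------------------------------------------------------------------------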
Of course, these are just the things I'm personally working on for the 1.6
release; other developers may have additional features to propose.
Best regards,
Philippe
2014-05-13 19:31 GMT+02:00 Matthew Musto <matthew.mu...@gmail.com>:
> Karl Et al.,
>
> I was looking at the roadmap for 1.6 and was most interested in the
> ability to leverage external BLAS libraries. ACML 6 is out in beta and
> it's the first release geared toward heterogeneous compute, so I suspect
> major performance improvements may be possible. While thinking about this,
> though, it led me to wonder about the performance tuning and benchmarking,
> since it introduces another layer of parameters. Is this feature still
> planned for 1.6, and if so, when should a beta be expected?
>
> I also noticed the continued push to rely on Boost less and less. I think
> that is a great idea. Can you shed any light on what we can expect in 1.6?
>
> Thanks,
> -Matt
> On Jan 21, 2014 4:53 PM, "Karl Rupp" <r...@iue.tuwien.ac.at> wrote:
>
>> Hi Philippe,
>>
>> > I'm slowly getting back to ViennaCL.
>> > I have added one bullet point to the roadmap:
>> > * Full integration of the micro-scheduler and the generator
>>
>> Yep, definitely. See issue #8; it's already on the TODO list for the
>> 1.5.x branch. The nice thing is that this is completely internal work, so
>> we can switch over without doing anything harmful to the public API.
>>
>>
>> > I will be working on cleaning up GEMM (i.e., better integration of the
>> > multiple BLAS backends, and harmonizing the kernels using the
>> > "column-major & trans <-> row-major & no-trans" identity) until I go
>> > back to France in one week.
>>
>> I'll comment on this on the other thread you started.
>>
>>
>> > I have also noticed that the size checking could be moved upwards in
>> > the dispatching mechanism; for now, the checks are duplicated between
>> > the OpenCL/CUDA/OpenMP backends.
>>
>> My initial intention was to check the sizes only in the common layer,
>> but I failed to apply this consistently and hence used double checking
>> at some point. Feel free to move this to the generic dispatcher routines.
>>
>>
>> > Once this is done, I will probably work towards the full integration
>> > of the micro-scheduler. Can we get rid of op_executor<>?
>>
>> As soon as the micro-scheduler is working, op_executor is obsolete. I
>> think we will need to have both around for a very short time frame to do
>> all the testing and verification.
>>
>> Best regards,
>> Karli
>>
>>
>>
>