Hi Philippe,

Philippe Tillet <phil.til...@gmail.com> writes:
> Well, it really depends on the application. 2**15 is actually still "fairly
> small", since it is not quite big enough to be bandwidth-limited (as
> opposed to limited by kernel-launch overhead). Considering the low
> bandwidth of PCI-E compared to GDDR5, I'd say temporaries should hurt
> before bandwidth does. A quick pen-and-paper calculation:
>
> 2**15 doubles = 262 kB. The bandwidth usually obtained over PCI Express
> is ~4 GB/s, so each transfer should cost 262 kB / 4 GB/s ~ 65 microseconds,
> which is of the order of the kernel-launch overhead. As you can see (if I'm
> not mistaken), you're right at the transition point.
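Philippe's back-of-the-envelope estimate, written out as a few lines of Python. The 4 GB/s figure is his assumed effective PCI-E throughput (real numbers vary by bus generation and driver), and sizes use decimal units (1 kB = 1000 bytes) to match the figures quoted above.

```python
# Rough PCI-E transfer estimate for one 2**15-element vector of doubles.
N_ELEMENTS = 2**15       # vector length discussed in the thread
BYTES_PER_DOUBLE = 8     # IEEE 754 double precision
PCIE_BANDWIDTH = 4e9     # assumption: ~4 GB/s effective host<->device throughput


def transfer_time_us(n_elements, bytes_per_elem=BYTES_PER_DOUBLE,
                     bandwidth=PCIE_BANDWIDTH):
    """Return the estimated time for one host<->device copy, in microseconds."""
    n_bytes = n_elements * bytes_per_elem
    return n_bytes / bandwidth * 1e6


size_kb = N_ELEMENTS * BYTES_PER_DOUBLE / 1e3
print(f"{size_kb:.0f} kB -> {transfer_time_us(N_ELEMENTS):.1f} us")
# prints: 262 kB -> 65.5 us
```

Since the estimate scales linearly with size, doubling the vector length doubles the transfer cost, while the kernel-launch overhead stays roughly fixed; that is why 2**15 sits near the crossover.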

Thanks, that's helpful; I should have thought about it in terms of
bandwidth! I re-ran the benchmark with OpenCL re-enabled, and 1000
iterations of "y = x1 + x2" on 2**15 doubles take ~0.370983s. So, given
that there are two x variables to fill, 1000 iterations, and 65us of data
transfer per variable per iteration, that means about 92us of
unaccounted overhead per operation. [With four operands, the overhead on
this basis is very nearly double.]
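For a CPU-side point of comparison, a minimal NumPy timing sketch of the same loop. It assumes 2**15 float64 elements and 1000 iterations, and it is host-only: no PCI-E transfer or kernel launch is involved, so it is a baseline rather than a like-for-like comparison with the OpenCL figure. The second variant writes into a preallocated `y`, which hints at how much of the per-iteration cost is allocating the temporary result.

```python
import timeit

import numpy as np

N = 2**15          # vector length from the thread
ITERATIONS = 1000  # same iteration count as the benchmark described above


def time_add_us(n=N, iterations=ITERATIONS):
    """Return mean microseconds per iteration for two variants of y = x1 + x2."""
    x1 = np.random.rand(n)
    x2 = np.random.rand(n)
    y = np.empty_like(x1)

    # Variant 1: plain expression; NumPy allocates a fresh result array each time.
    with_temp = timeit.timeit(lambda: x1 + x2, number=iterations)
    # Variant 2: write into a preallocated buffer, so no temporary is allocated.
    in_place = timeit.timeit(lambda: np.add(x1, x2, out=y), number=iterations)

    to_us = 1e6 / iterations
    return with_temp * to_us, in_place * to_us


if __name__ == "__main__":
    temp_us, out_us = time_add_us()
    print(f"x1 + x2: {temp_us:.1f} us/iter, "
          f"np.add(..., out=y): {out_us:.1f} us/iter")
```

Single runs at this scale are noisy; taking the minimum over `timeit.repeat` gives steadier numbers.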

Interestingly, I get about 87us of overhead *without* the pure Python
representation of type deductions (i.e., _viennacl vs pyviennacl),
suggesting that the Python layer only eats around 5us. But I'm sceptical
of these figures, since I don't know how NumPy manages to be faster than
ViennaCL on both CPU and GPU... Nonetheless, at 2**15 elements, on my
system, the ViennaCL OpenCL GPU implementation is about twice as fast as
the CPU implementation.
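One way to sanity-check a figure like the ~5us above is to time the same operation with and without a pure-Python layer on top. The sketch below uses NumPy as a stand-in compiled backend and a hypothetical `Vector` class (not part of pyviennacl) whose `__add__` does a Python-level type check before delegating; the difference between the two timings approximates what such a layer costs per operation.

```python
import timeit

import numpy as np


class Vector:
    """Hypothetical thin wrapper: Python-level dispatch over a NumPy array.

    Stands in for a pure-Python layer sitting on top of a compiled backend
    (NumPy here; _viennacl in the thread).
    """
    __slots__ = ("data",)

    def __init__(self, data):
        self.data = data

    def __add__(self, other):
        # Python-level type check before delegating to the compiled kernel.
        if not isinstance(other, Vector):
            return NotImplemented
        return Vector(self.data + other.data)


def wrapper_overhead_us(n_elements=2**15, iterations=1000):
    """Return (raw_us, wrapped_us): mean time per `x1 + x2`, bare vs wrapped."""
    a = np.random.rand(n_elements)
    b = np.random.rand(n_elements)
    va, vb = Vector(a), Vector(b)
    raw = timeit.timeit(lambda: a + b, number=iterations)
    wrapped = timeit.timeit(lambda: va + vb, number=iterations)
    return raw / iterations * 1e6, wrapped / iterations * 1e6


if __name__ == "__main__":
    raw_us, wrapped_us = wrapper_overhead_us()
    # The difference can come out slightly negative on a noisy machine.
    print(f"raw: {raw_us:.1f} us, wrapped: {wrapped_us:.1f} us, "
          f"layer cost ~ {wrapped_us - raw_us:.1f} us")
```

Because the wrapper adds a fixed per-call cost, its relative weight shrinks as the vectors grow, which is one argument for keeping the Python layer thin rather than moving everything to C++ up front.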

This does suggest that I don't need to worry right away about
re-implementing my Python stuff in C++. But the NumPy speed mystery
remains, and I fear that NumPy is that fast because more of it is done
in C (though I have not investigated). I'm concerned about this now
because I don't want to implement a load more types and operations in
the Python version, only to decide to scrap all that work and redo it
in C++ later. At some point, of course, there's going to have to be a
(thin) Python code layer, just to make things nicer for users.

For comparison, the numbers on GPU: http://paste.ubuntu.com/5934320/
                        and on CPU: http://paste.ubuntu.com/5933347/


But that's enough of that for now!

Toby


_______________________________________________
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel
