OK, sorry, please scratch the last question; it was the mapping's fault, namely
not passing a proper context to the wrapper.
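
A minimal sketch of wrapping a preexisting row-major host buffer with an
explicit MAIN_MEMORY context (assuming the matrix(ptr, memory_type, rows, cols)
wrapping constructor of recent ViennaCL 1.x; check the exact signature against
your version):

#include <vector>
#include "viennacl/matrix.hpp"

int main()
{
  std::size_t rows = 4, cols = 4;
  std::vector<double> buf(rows * cols, 1.0);  // preexisting row-major data

  // Wrap the existing buffer instead of copying it; the memory type
  // (here MAIN_MEMORY) selects the context backing the matrix.
  viennacl::matrix<double> A(&buf[0], viennacl::MAIN_MEMORY, rows, cols);

  A += A;  // operations now run directly on the wrapped host data
  return 0;
}
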
On Wed, Jul 13, 2016 at 3:37 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> Also: when I use a wrapping constructor to initialize a MAIN_MEMORY matrix
> around a preexisting row-major buffer and then try to use this matrix, I get
> the message:
>
> ViennaCL: Internal memory error: not initialised!
>
> why?
>
>
> On Wed, Jul 13, 2016 at 2:01 PM, Dmitriy Lyubimov <dlie...@gmail.com>
> wrote:
>
>> So fast_copy() still copies the memory and has copying overhead, even with a
>> MAIN_MEMORY context?
>>
>> Is there a way to do a shallow copy (i.e. just pointer initialization) of
>> the matrix data buffer? Isn't that what some constructors of matrix or
>> matrix_base do?
>>
>> What I am getting at: it looks like I am paying a significant overhead just
>> for copying. Actually, it seems I am paying it twice: once when I prepare
>> the padded buffer as required by internal_size1()/internal_size2(), and then
>> again when I pass it into fast_copy(), which apparently copies once more,
>> even when we are using host-memory matrices.
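
A sketch of the two-step path described above (assuming a row_major
viennacl::matrix and the fast_copy(begin, end, matrix) overload; the helper
name to_viennacl is made up):

#include <vector>
#include "viennacl/matrix.hpp"

// Copy unpadded row-major host data into a ViennaCL matrix. Note the two
// copies: one to repack into the padded layout, one inside fast_copy().
void to_viennacl(double const * data, std::size_t rows, std::size_t cols,
                 viennacl::matrix<double> & A)
{
  // staging buffer sized to the internal (padded) dimensions of A
  std::vector<double> staging(A.internal_size1() * A.internal_size2(), 0.0);
  for (std::size_t i = 0; i < rows; ++i)
    for (std::size_t j = 0; j < cols; ++j)
      staging[i * A.internal_size2() + j] = data[i * cols + j];

  // fast_copy() then transfers the whole padded buffer into A
  viennacl::fast_copy(&staging[0], &staging[0] + staging.size(), A);
}
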
>>
>> All in all, by my estimates this copying back and forth (which, granted, is
>> not greatly optimized on our side) takes ~15..17 seconds out of 60 seconds
>> total when multiplying 10k x 10k dense arguments via ViennaCL. I also compile
>> with -march=haswell and -ffast-math; without those I seem to fall too far
>> behind what R + OpenBLAS can do in this test, and my processing time swells
>> to about 2 minutes without the relaxed (non-IEEE-compliant) arithmetic.
>>
>> If I can wrap the buffer and avoid copying for the MAIN_MEMORY context, I'd
>> shave off another 10% or so of the execution time. That would make me
>> happier, as I could probably beat OpenBLAS given custom CPU architecture
>> flags.
>>
>> On the other hand, BIDMat (which allegedly uses MKL) does the same test, in
>> double precision, in under 10 seconds. I can't fathom how, but it does. I
>> have a Haswell-E platform.
>>
>> thank you.
>> dmitriy
>>
>> On Tue, Jul 12, 2016 at 9:27 AM, Karl Rupp <r...@iue.tuwien.ac.at> wrote:
>>
>>> Hi,
>>>
>>>> One question: you mentioned padding for the `matrix` type. When I
>>>> initialize the `matrix` instance, I only specify dimensions. How do I
>>>> know the padding values?
>>>>
>>>
>>> If you want to provide your own padded dimensions, consider using
>>> matrix_base directly. If you want to query the padded dimensions, use
>>> internal_size1() and internal_size2() for the internal number of rows and
>>> columns.
>>>
>>> http://viennacl.sourceforge.net/doc/manual-types.html#manual-types-matrix
>>>
>>> Best regards,
>>> Karli
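
A quick illustration of the query route (a sketch; the padded sizes are only
known once the matrix object exists):

#include <cstddef>
#include "viennacl/matrix.hpp"

int main()
{
  std::size_t rows = 1000, cols = 1000;
  viennacl::matrix<double> A(rows, cols);        // the library chooses the padding
  std::size_t padded_rows = A.internal_size1();  // internal (padded) row count
  std::size_t padded_cols = A.internal_size2();  // internal (padded) column count
  // A flat buffer handed to fast_copy() must hold padded_rows * padded_cols
  // entries laid out in the matrix's internal layout.
  (void)padded_rows; (void)padded_cols;
  return 0;
}
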
>>>
>>>
>>>
>>>
>>>> On Tue, Jul 12, 2016 at 5:53 AM, Karl Rupp <r...@iue.tuwien.ac.at
>>>> <mailto:r...@iue.tuwien.ac.at>> wrote:
>>>>
>>>> Hi Dmitriy,
>>>>
>>>> On 07/12/2016 07:17 AM, Dmitriy Lyubimov wrote:
>>>>
>>>> Hi,
>>>>
>>>> I am trying to create some elementary wrappers for VCL in
>>>> javacpp.
>>>>
>>>> Everything goes fine, except that I would rather not use those "cpu"
>>>> types (std::map, std::vector) and instead initialize matrices directly
>>>> from row-major or CCS-format data.
>>>>
>>>> I see that the matrix() constructor accepts this form of initialization,
>>>> but it states that the "wrapping" is done for device memory.
>>>>
>>>>
>>>> Yes, the constructors either create their own memory buffer
>>>> (zero-initialized) or wrap an existing buffer. These are the only
>>>> reasonable options.
>>>>
>>>>
>>>> Now, I can create a host matrix() using host memory and row-major
>>>> packing. This seems to work OK.
>>>>
>>>> However, these are still host instances. Can I copy host instances to
>>>> instances in an OpenCL context?
>>>>
>>>>
>>>> Did you look at viennacl::copy() or viennacl::fast_copy()?
>>>>
>>>>
>>>> That might be one way of bypassing the (in my case unnecessary)
>>>> complexity of working with the std::vector and std::map classes from
>>>> the Java side.
>>>>
>>>> But it looks like there's no copy() variant that accepts a matrix-on-host
>>>> and a matrix-on-OpenCL argument (or rather, the compiler of course
>>>> declares the call ambiguous since two overloads fit).
>>>>
>>>>
>>>> If you want to copy your OpenCL data into a viennacl::matrix, you
>>>> may wrap the memory handle (obtained with .elements()) into a vector
>>>> and copy that. If you have plain host data, use
>>>> viennacl::fast_copy() and mind the data layout (padding of
>>>> rows/columns!)
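
To make the two host-side paths concrete, a sketch (assumes the
std::vector<std::vector<double> > overload of viennacl::copy() and a default
row_major target matrix):

#include <vector>
#include "viennacl/matrix.hpp"

int main()
{
  std::size_t rows = 3, cols = 3;
  viennacl::matrix<double> vcl_A(rows, cols);

  // Path 1: viennacl::copy(): convenient, takes care of padding internally.
  std::vector<std::vector<double> > host_A(rows, std::vector<double>(cols, 1.0));
  viennacl::copy(host_A, vcl_A);

  // Path 2: viennacl::fast_copy(): faster, but the flat buffer must already
  // match the internal (padded) layout of vcl_A.
  std::vector<double> flat(vcl_A.internal_size1() * vcl_A.internal_size2(), 2.0);
  viennacl::fast_copy(&flat[0], &flat[0] + flat.size(), vcl_A);
  return 0;
}
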
>>>>
>>>>
>>>> For compressed_matrix, there seems to be a set() method, but I guess
>>>> this also requires the CCS arrays to be in device memory if I use it.
>>>> Same question: is there a way to send-and-wrap CCS arrays into an OpenCL
>>>> device instance of compressed_matrix without using std::map?
>>>>
>>>>
>>>> Currently you have to use .set() if you want to bypass
>>>> viennacl::copy() and std::map.
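
A sketch of the .set() route (note that compressed_matrix stores CSR, i.e.
row-oriented, data; the integer type expected for the index arrays depends on
the backend, so verify against your ViennaCL version):

#include "viennacl/compressed_matrix.hpp"

int main()
{
  // CSR data for a hypothetical 3x3 matrix with 4 nonzeros
  unsigned int row_jumper[]  = {0, 1, 3, 4};
  unsigned int col_indices[] = {0, 0, 2, 1};
  double       values[]      = {1.0, 2.0, 3.0, 4.0};

  viennacl::compressed_matrix<double> A;
  // set(row_jumper, col_buffer, elements, rows, cols, nonzeros)
  A.set(row_jumper, col_indices, values, 3, 3, 4);
  return 0;
}
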
>>>>
>>>> I acknowledge that the C++ type system is a pain when interfacing
>>>> from other languages. We will make this much more convenient in
>>>> ViennaCL 2.0. The existing interface in ViennaCL 1.x is too hard to
>>>> fix without breaking lots of user code, so we won't invest time in
>>>> that (contributions welcome, though :-) )
>>>>
>>>> Best regards,
>>>> Karli
>>>>
>>>>
>>>>
>>>>
>>>
>>
>