Re: [ViennaCL-devel] shared_ptr and compressed_matrix

2018-06-15 Thread Karl Rupp

Hi,


I have discovered something rather odd.

If I run a minimal example (as shown in eigen-with-viennacl.cpp), everything 
runs fine when I pass a compressed_matrix directly to the copy, e.g.


Eigen::SparseMatrix spAm;

... code to fill spAm;

viennacl::matrix A = viennacl::compressed_matrix(K, M)


I assume you mean
  viennacl::compressed_matrix<T> A =
    viennacl::compressed_matrix<T>(K, M);
here? This may result in an unnecessary temporary object, so I'd recommend
constructing A directly as viennacl::compressed_matrix<T> A(K, M); instead.



viennacl::copy(spAm, A);


However, if my compressed_matrix is wrapped up in a std::shared_ptr 
(C++11) I don't seem to be able to copy even when dereferencing.


std::shared_ptr<viennacl::compressed_matrix<T> > shptr =
    std::make_shared<viennacl::compressed_matrix<T> >(
        viennacl::compressed_matrix<T>(K, M));

viennacl::copy(spAm, *shptr);


T == float?

Strangely, this results in a series of memory errors; the most relevant 
stack frames are:


0x701CF5F6 (0x18C5A670 0x1502E060
0x043EB20B 0x043EB230),
_ZN8viennacl7backend13memory_createERNS0_10mem_handleEyRKNS_7contextEPKv()
+ 0x1C6 bytes(s)
0x701C00C0 (0x 0x04627FD0
0x0010 0x0004),

_ZN8viennacl6detail9copy_implINS_5tools27const_sparse_matrix_adapterIdjEEdLj1EEEvRKT_RNS_17compressed_matrixIT0_XT1_EEEy()
+ 0x3A0 bytes(s)
0x701BD721 (0x043EB350 0x04627FD0
0x04530620 0x07474070),

_ZN8viennacl4copyIdLi1ELj1EEEvRKN5Eigen12SparseMatrixIT_XT0_EiEERNS_17compressed_matrixIS3_XT1_EEE()
+ 0x321 bytes(s)


which, after running them through c++filt, demangle to the following:

viennacl::backend::memory_create(viennacl::backend::mem_handle&,
    unsigned long long, viennacl::context const&, void const*)

void viennacl::detail::copy_impl<viennacl::tools::const_sparse_matrix_adapter<double, unsigned int>, double, 1u>(
    viennacl::tools::const_sparse_matrix_adapter<double, unsigned int> const&,
    viennacl::compressed_matrix<double, 1u>&, unsigned long long)

void viennacl::copy<double, 1, 1u>(
    Eigen::SparseMatrix<double, 1, int> const&,
    viennacl::compressed_matrix<double, 1u>&)


Any insight as to why this would be would be appreciated.


compressed_matrix has the copy-constructor implemented, so that should 
be okay. Maybe it doesn't copy *all* internal members. I'll try to 
reproduce the problem so that I can debug it.


Best regards,
Karli


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel


Re: [ViennaCL-devel] Failing to find platforms with pocl

2017-08-11 Thread Karl Rupp

Hi Charles,


I have another curious situation.  I have installed pocl 0.14 on an 
Ubuntu 14.04 system.  I can install and run clinfo without any 
problems.  However, when I compile and run my context.cpp file 
(https://github.com/cdeterman/gpuR/blob/develop/src/context.cpp) and try 
to run the initContexts function, I keep getting a -1001 error from the 
get_platforms call.


Any idea why this script would be failing to find the platforms whereas 
even a basic query file like this 
(https://github.com/cdeterman/pocl_test/blob/master/clDeviceQuery.cpp) 
can be compiled simply with


g++ -o clDeviceQuery clDeviceQuery.cpp -lOpenCL

and run without a problem.


Whenever I've seen problems with querying platform information, it was 
due to a problem with the OpenCL environment: either the GPU driver was 
not correctly installed (unlikely in your case) or the wrong 
libOpenCL.so was picked up (more likely here; note that -1001 is 
CL_PLATFORM_NOT_FOUND_KHR, i.e. the ICD loader found no platforms). Can 
you verify with ldd that the correct libOpenCL.so is picked up? 
clDeviceQuery and gpuR should pick up the same libOpenCL.so.
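One way to run that check (the gpuR path below is an example, not taken from the original report):

```shell
# Every dynamically linked binary reports which shared libraries it resolves;
# demonstrated here on a system binary:
ldd /bin/ls | grep '=>'

# For the actual diagnosis, compare the libOpenCL.so each side resolves
# (example paths -- substitute your own build artifacts):
#   ldd ./clDeviceQuery | grep -i 'libOpenCL'
#   ldd /usr/local/lib/R/site-library/gpuR/libs/gpuR.so | grep -i 'libOpenCL'
# If the two lines point at different files, the two programs are loading
# different ICD loaders, which would explain the differing behavior.
```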


Best regards,
Karli



Re: [ViennaCL-devel] Efficient unary operation

2016-11-23 Thread Karl Rupp
Hi Charles,

> Right now, if I want to take the negative of every element in a matrix I
> end up doing the following:
>
> // previously assigned
> viennacl::matrix vcl_A;
>
> // matrix of zeros to subtract from
> viennacl::matrix vcl_Z =
> viennacl::zero_matrix(vcl_A.size1(),vcl_A.size2());
>
> // subtract in-place
> vcl_Z -= vcl_A;
> vcl_A = vcl_Z;
>
> Is there a more efficient way to approach this?  Allocating an
> additional entire matrix is proving quite wasteful in some of my benchmarks.

What about just

vcl_A = T(-1) * vcl_A;

? This is 'inplace' as requested :-)

Best regards,
Karli





Re: [ViennaCL-devel] vector vs vector_base classes

2016-11-10 Thread Karl Rupp
Hi,

> What exactly is the distinction between vector and vector_base classes?

vector_base is the base class for dense vectors (viennacl::vector) as 
well as vector proxy objects (vector_range, vector_slice). Since some of 
the constructors of vector_base are tricky, the recommendation is to use 
viennacl::vector. Also, this allows for an API very similar to Boost.uBLAS.



> I assume that users would mostly use 'vector' but it appears
> 'vector_base' has useful features such as casting a matrix to be
> interpreted as a vector while sharing the memory (this would be very
> useful for 'vector' class).  Is the same true of 'vector' where there
> are features it has but not in 'vector_base'?

You can create a vector from scalar_vector, unit_vector, or zero_vector, 
which is not possible for a vector_base object.
Everything else is the same.

Best regards,
Karli

PS: Similar statements hold true for matrix vs. matrix_base.



>
>




Re: [ViennaCL-devel] Fw: Matrix * Vector CL_OUT_OF_RESOURCES error

2016-10-21 Thread Karl Rupp

Hi Andy,

apparently I was completely bananas and couldn't see the more 
fundamental problem with the code. I could reproduce the issue on my 
laptop (I'm on travel) and after finding the cause, I wonder why it 
didn't fail on my other machine.


The reason for the problem is the initialization of the matrix A and the 
vector B: If you pass in a pointer to host memory, you *must* specify 
the memory type as viennacl::MAIN_MEMORY. To convert the data to OpenCL, 
switch to the OpenCL context via the member function 
.switch_memory_context(). It is not possible to use host data directly 
with OpenCL in general, because host memory and device memory can be 
physically distinct. (ignoring some possible pointer sharing on certain 
OpenCL devices here)


The corrected code is attached. Please have a look at how A and B are 
initialized and how their memory context is changed to OpenCL after 
creation. Use viennacl::copy() or viennacl::fast_copy() to bring the 
data back to one of your host buffers.


Sorry for not spotting this earlier...

Best regards,
Karli



On 10/12/2016 06:25 AM, Andrew Palumbo wrote:

Hi Karl,

As I mentioned before, I'm using libviennacl-dev version 1.7.1 installed
from the Ubuntu repo.



When I run your attached code, I do get an error:


andrew@michael:~/Downloads$ g++ DenseVectorMmul.cpp
-I/usr/include/viennacl/ -lOpenCL -o denseVec

andrew@michael:~/Downloads$ ./denseVec
terminate called after throwing an instance of 'viennacl::memory_exception'
  what():  ViennaCL: Internal memory error: not initialised!
Aborted (core dumped)

andrew@michael:~/Downloads$ g++ --version
g++ (Ubuntu 5.4.0-6ubuntu1~16.04.2) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Thanks,


Andy






*From:* Karl Rupp <r...@iue.tuwien.ac.at>
*Sent:* Tuesday, October 11, 2016 10:31 AM
*To:* Andrew Palumbo; viennacl-devel@lists.sourceforge.net
*Subject:* Re: [ViennaCL-devel] Fw: Matrix * Vector CL_OUT_OF_RESOURCES
error

Hi Andy,

thanks for the reminder and sorry for not getting back to you sooner.

After replacing the dynamically-sized arrays with std::vector, I
compiled the code you provided and can execute it without problems. The
code is also valgrind-clean, so I don't know what could possibly be the
problem.

Can you please verify two things:
  a) you use the latest code from the master branch?
  b) does the error show up with the attached code? It contains the
fixes for std::vector<>. The reason for the change is that my compiler
(GCC 4.6) did error out with the following:

DenseVectorMmul.cpp: In function ‘int main()’:
DenseVectorMmul.cpp:32:30: error: variable-sized object ‘A_values’ may
not be initialized
DenseVectorMmul.cpp:45:26: error: variable-sized object ‘B_values’ may
not be initialized


Best regards,
Karli


#define VIENNACL_WITH_OPENCL
#include "viennacl/context.hpp"
#include "viennacl/matrix.hpp"
#include "viennacl/tools/random.hpp"

#include <iostream>

// C_vec = A_dense_matrix %*% B_vec.

// compile line: g++ DenseVectorMmul.cpp  -I/usr/include/viennacl/ -lOpenCL -o denseVec



int main()
{
  // trying to recreate javacpp wrapper functionality as closely as possible
  // so not using typedef, unsigned ints, etc, and defining templates as doubles
  // creating buffers as int/double arrays and then setting pointers to them.
  // (not 100% sure that this is how javacpp passes pointers but should be close.)  


  //typedef double   ScalarType;

  // using unsigned ints here to suppress warnings/errors w/o using -fpermissive
  // in actuality, we cast `int`s from jni/javacpp.
  unsigned int m = 200;
  unsigned int n = 100;

  // create an OpenCL context which we will pass directly to the constructors 
  // of the Matrix and vector
  viennacl::context oclCtx(viennacl::OPENCL_MEMORY);
 
  std::vector<double> A_values(m * n);

  viennacl::tools::uniform_random_numbers<double> randomNumber;
  for (int i = 0; i < m * n; i++) { 
A_values[i] = randomNumber();
  }

  double* A_values_ptr = &(A_values[0]);

  // this is currently the constructor that we're using through scala/javacpp.
  viennacl::matrix<double,viennacl::row_major> A_dense_matrix(A_values_ptr, viennacl::MAIN_MEMORY , m, n);
  A_dense_matrix.switch_memory_context(oclCtx);

  std::vector<double> B_values(n);

  for (int i = 0; i < n; i++) { 
B_values[i] = randomNumber();
  }

  double* B_values_ptr = &(B_values[0]);
   
  // this is currently the constructor that we're using through scala/javacpp.
  viennacl::vector<double> B_vec(B_values_ptr, viennacl::MAIN_MEMORY, n, 0, 1);
  B_vec.switch_memory_context(oclCtx);

  // perform multiplication and pass result to a vector constructor
  viennacl::vector<double> C_vec(viennacl::linalg::prod(A_dense_matrix, B_vec));

  // print out vec
  std::cout << "ViennaCL: " << C_vec << std::endl;

  // just exit with success for now if there are no runtime errors.
  return EXIT_SUCCESS;
}

Re: [ViennaCL-devel] Fw: Matrix * Vector CL_OUT_OF_RESOURCES error

2016-10-12 Thread Karl Rupp
Hi Andy,

> As I mentioned before, I'm using libviennacl-dev version 1.7.1 installed
> from the Ubuntu repo.
>
>
>
> When I run your attached code, I do get an error:
>
>
> andrew@michael:~/Downloads$ g++ DenseVectorMmul.cpp
> -I/usr/include/viennacl/ -lOpenCL -o denseVec
>
> andrew@michael:~/Downloads$ ./denseVec
> terminate called after throwing an instance of 'viennacl::memory_exception'
>   what():  ViennaCL: Internal memory error: not initialised!
> Aborted (core dumped)
>
> andrew@michael:~/Downloads$ g++ --version
> g++ (Ubuntu 5.4.0-6ubuntu1~16.04.2) 5.4.0 20160609
> Copyright (C) 2015 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Now things get weird: I can't reproduce the problem with either the 
current developer tip, the 1.7.1 release, or 1.7.0, using GCC 4.6 and 
GCC 4.8. I'll replicate your environment (Ubuntu 16.04) and try it there.

Best regards,
Karli




Re: [ViennaCL-devel] Fw: Matrix * Vector CL_OUT_OF_RESOURCES error

2016-09-15 Thread Karl Rupp
> for (int i = 0; i < n; i++) {
> B_values[i] = randomNumber();
>   }
>
>   double* B_values_ptr = B_values;
>
>   // this is currently the constructor that we're using through
> scala/javacpp.
>   viennacl::vector B_vec(B_values_ptr, oclCtx.memory_type(), n,
> 0, 1);
>
>   // perform multiplication and pass result to a vector constructor
>   viennacl::vector C_vec(viennacl::linalg::prod(A_dense_matrix ,
> B_vec));
>
>   // print out vec
>   std::cout << "ViennaCL: " << C_vec << std::endl;
>
>
>   // just exit with success for now if there are no runtime errors.
>
>   return EXIT_SUCCESS;
> }
>
> 
> *From:* Karl Rupp <r...@iue.tuwien.ac.at>
> *Sent:* Thursday, September 15, 2016 5:14:36 PM
> *To:* Andrew Palumbo; ViennaCL-devel@lists.sourceforge.net
> *Subject:* Re: [ViennaCL-devel] Matrix * Vector CL_OUT_OF_RESOURCES error
>
> Hi,
>
>> Attached and below is the matrix and vector setup that I'm using
>> from scala.
>>
>> I've also attached it as DenseVectorMmul.cpp.
>>
>>
>> When I run the snippet below, I get the following error:
>>
>> terminate called after throwing an instance of 'viennacl::memory_exception'
>>   what():  ViennaCL: Internal memory error: not initialised!
>> Aborted (core dumped)
>
> you need to either
>   #define VIENNACL_WITH_OPENCL
> at the very top or add
>   -DVIENNACL_WITH_OPENCL
> to your compiler call.
>
> Best regards,
> Karli
>
>
>
>
>>
>>
>>
>>
>>
>>
>> #include "viennacl/matrix.hpp"
>> #include "viennacl/compressed_matrix.hpp"
>> #include "viennacl/vector.hpp"
>> #include "viennacl/tools/random.hpp"
>> #include "viennacl/context.hpp"
>>
>>
>> // C_vec = A_dense_matrix %*% B_vec.
>>
>> // compile line w/o OpenMP: g++ DenseVectorMmul.cpp
>> -I/usr/include/viennacl/ -o denseVec
>>
>>
>>
>> int main()
>> {
>>   // trying to recreate javacpp wrapper functionality as closely as possible
>>   // so not using typedef, unsigned ints, etc, and defining templates as
>> doubles
>>   // creating buffers as int/double arrays and then setting pointers to
>> them.
>>   // (not 100% sure that this is how javacpp passes pointers but should
>> be close.)
>>
>>
>>   //typedef double   ScalarType;
>>
>>   // using unsigned ints here to suppress warnings/errors w/o using
>> -fpermissive
>>   // in actuality, we cast `int`s from jni/javacpp.
>>   unsigned int m = 200;
>>   unsigned int n = 100;
>>
>>   // create an OpenCL context which we will pass directly to the
>> constructors
>>   // of the Matrix and vector
>>   viennacl::context oclCtx(viennacl::OPENCL_MEMORY);
>>
>>   double A_values[m * n] = {0};
>>
>>   viennacl::tools::uniform_random_numbers randomNumber;
>>   for (int i = 0; i < m * n; i++) {
>> A_values[i] = randomNumber();
>>   }
>>
>>   double* A_values_ptr = A_values;
>>
>>   // this is currently the constructor that we're using through
>> scala/javacpp.
>>   const viennacl::matrix<double,viennacl::row_major>
>>A_dense_matrix(A_values_ptr, oclCtx.memory_type()
>> , m, n);
>>
>>   double B_values[n] = {0};
>>
>>   for (int i = 0; i < n; i++) {
>> B_values[i] = randomNumber();
>>   }
>>
>>   double* B_values_ptr = B_values;
>>
>>   // this is currently the constructor that we're using through
>> scala/javacpp.
>>   viennacl::vector B_vec(B_values_ptr, oclCtx.memory_type(), n,
>> 0, 1);
>>
>>   // perform multiplication and pass result to a vector constructor
>>   viennacl::vector C_vec(viennacl::linalg::prod(A_dense_matrix ,
>> B_vec));
>>
>>   // print out vec
>>   std::cout << "ViennaCL: " << C_vec << std::endl;
>>
>>
>>   // just exit with success for now if there are no runtime errors.
>>
>>   return EXIT_SUCCESS;
>> }
>>
>> 
>> *From:* Andrew Palumbo <ap@outlook.com>
>> *Sent:* Wednesday, September 14, 2016 9:49:32 PM
>> *To:* Karl Rupp; ViennaCL-devel@lists.sourceforge.net
>> *Subject:* Re: [ViennaCL-devel] Matrix * Vector CL_OUT_OF_RESOURCES error
>>
>>
>> Hi Karl,
>>
>>
>> Thanks, Yeah I'll try to mock one 

Re: [ViennaCL-devel] Matrix * Vector CL_OUT_OF_RESOURCES error

2016-09-15 Thread Karl Rupp
Hi,

> Attached and below is the matrix and vector setup that I'm using
> from scala.
>
> I've also attached it as DenseVectorMmul.cpp.
>
>
> When I run the snippet below, I get the following error:
>
> terminate called after throwing an instance of 'viennacl::memory_exception'
>   what():  ViennaCL: Internal memory error: not initialised!
> Aborted (core dumped)

you need to either
  #define VIENNACL_WITH_OPENCL
at the very top or add
  -DVIENNACL_WITH_OPENCL
to your compiler call.

Best regards,
Karli




>
>
>
>
>
>
> #include "viennacl/matrix.hpp"
> #include "viennacl/compressed_matrix.hpp"
> #include "viennacl/vector.hpp"
> #include "viennacl/tools/random.hpp"
> #include "viennacl/context.hpp"
>
>
> // C_vec = A_dense_matrix %*% B_vec.
>
> // compile line w/o OpenMP: g++ DenseVectorMmul.cpp
> -I/usr/include/viennacl/ -o denseVec
>
>
>
> int main()
> {
>   // trying to recreate javacpp wrapper functionality as closely as possible
>   // so not using typedef, unsigned ints, etc, and defining templates as
> doubles
>   // creating buffers as int/double arrays and then setting pointers to
> them.
>   // (not 100% sure that this is how javacpp passes pointers but should
> be close.)
>
>
>   //typedef double   ScalarType;
>
>   // using unsigned ints here to suppress warnings/errors w/o using
> -fpermissive
>   // in actuality, we cast `int`s from jni/javacpp.
>   unsigned int m = 200;
>   unsigned int n = 100;
>
>   // create an OpenCL context which we will pass directly to the
> constructors
>   // of the Matrix and vector
>   viennacl::context oclCtx(viennacl::OPENCL_MEMORY);
>
>   double A_values[m * n] = {0};
>
>   viennacl::tools::uniform_random_numbers randomNumber;
>   for (int i = 0; i < m * n; i++) {
> A_values[i] = randomNumber();
>   }
>
>   double* A_values_ptr = A_values;
>
>   // this is currently the constructor that we're using through
> scala/javacpp.
>   const viennacl::matrix<double,viennacl::row_major>
>A_dense_matrix(A_values_ptr, oclCtx.memory_type()
> , m, n);
>
>   double B_values[n] = {0};
>
>   for (int i = 0; i < n; i++) {
> B_values[i] = randomNumber();
>   }
>
>   double* B_values_ptr = B_values;
>
>   // this is currently the constructor that we're using through
> scala/javacpp.
>   viennacl::vector B_vec(B_values_ptr, oclCtx.memory_type(), n,
> 0, 1);
>
>   // perform multiplication and pass result to a vector constructor
>   viennacl::vector C_vec(viennacl::linalg::prod(A_dense_matrix ,
> B_vec));
>
>   // print out vec
>   std::cout << "ViennaCL: " << C_vec << std::endl;
>
>
>   // just exit with success for now if there are no runtime errors.
>
>   return EXIT_SUCCESS;
> }
>
> 
> *From:* Andrew Palumbo <ap@outlook.com>
> *Sent:* Wednesday, September 14, 2016 9:49:32 PM
> *To:* Karl Rupp; ViennaCL-devel@lists.sourceforge.net
> *Subject:* Re: [ViennaCL-devel] Matrix * Vector CL_OUT_OF_RESOURCES error
>
>
> Hi Karl,
>
>
> Thanks, yeah, I'll try to mock one up in C++ and see if I can reproduce
> it.  (Still working in Java via javacpp, so it can be tough to debug on
> my end.)  Will send you a C++ snippet soon.
>
>
> Thanks,
>
>
> Andy
>
> 
> *From:* Karl Rupp <r...@iue.tuwien.ac.at>
> *Sent:* Wednesday, September 14, 2016 7:09:13 PM
> *To:* Andrew Palumbo; ViennaCL-devel@lists.sourceforge.net
> *Subject:* Re: [ViennaCL-devel] Matrix * Vector CL_OUT_OF_RESOURCES error
>
> Hi Andrew,
>
>> I've been getting a CL_OUT_OF_RESOURCES error when I try to do (something
>> like) the following with an OpenCL context in a unit test:
>>
>>
>> viennacl::matrix<double,viennacl::row_major> mxA
>>
>> viennacl::vector vecB
>>
>>
>> // add some data to both mxA and vecB
>>
>>
>> viennacl::vector vecB = viennacl::linalg::prod(mxA, vecB)
>>
>> This seems right and everything works when using an OpenMP context, but
>> when I try to read the data off of the GPU (within an OpenCL
>> context) using backend::memory_read, I get the CL_OUT_OF_RESOURCES error.
>
> You get a CL_OUT_OF_RESOURCES error if one of the previous kernels or
> data manipulations seg-faulted. Although unlikely, it may also be a
> problem with the matrix-vector product kernel. Is there any chance you
> can send a working c

Re: [ViennaCL-devel] Calculating matrix inverse

2016-09-08 Thread Karl Rupp
Hi Charles,

lu_factorize factors the matrix A into a lower triangular matrix L (with 
unit diagonal) and an upper triangular matrix U; the entries of A are 
overwritten with these factors.

If you want to obtain the inverse, you have to call
  viennacl::linalg::lu_substitute(vcl_A, vcl_B);
where vcl_B is the unit matrix. The inverse will then be stored in vcl_B.

Best regards,
Karli


On 09/08/2016 07:45 PM, Charles Determan wrote:
> I am trying to calculate the inverse of a matrix taking the advice from
> a previous post
> (https://sourceforge.net/p/viennacl/discussion/1143678/thread/ba394d35/)
> suggesting the use of LU factorization.  So I do the following:
>
> I have vcl_A matrix
>
> viennacl::vector vcl_lu_rhs(vcl_A.size1());
>
> // solution of a full system right into the load vector vcl_rhs:
> viennacl::linalg::lu_factorize(vcl_A);
> viennacl::linalg::lu_substitute(vcl_A, vcl_lu_rhs);
>
> std::cout << "matrix A" << std::endl;
> std::cout << vcl_A << std::endl;
>
> std::cout << "vector" << std::endl;
> std::cout << vcl_lu_rhs << std::endl;
>
> However, neither of these outputs is remotely close to the output I
> expect to see for the inverse of a matrix.  In R the output would be:
>
>># mat = vcl_A
>> mat
>[,1][,2][,3]   [,4]
> [1,] -1.0099356  0.19566691  0.47349181  2.2673060
> [2,] -0.7398383  0.81302435  0.34390506  0.4029221
> [3,]  1.0020811 -0.06548085 -0.09373213  0.3257177
> [4,]  1.1549178  0.87441621  1.53483119 -0.5862660
>> solve(mat)
> [,1] [,2]   [,3][,4]
> [1,] -0.09714492  0.004242602  0.8125165  0.07863873
> [2,] -0.45207626  1.583809306  0.8987234 -0.16052995
> [3,]  0.46074427 -0.886299611 -0.9398235  0.65059443
> [4,]  0.34057494  0.050298145  0.4806311 -0.08698459
>
> but with the above viennacl code I see:
>
> matrix A
> [4,4]((-1.00994,0.195667,0.473492,2.26731),(0.73256,0.669687,-0.00295601,-1.25802),(-0.992223,0.192126,0.376645,2.81709),(-1.14356,1.63983,5.52547,-11.4963))
> vector
> [4](-0,0,0,-0)
>
> Did I miss something here?
>
> Thanks,
> Charles
>
>
>




Re: [ViennaCL-devel] compressed_matrix %*% matrix_Base

2016-08-17 Thread Karl Rupp
Hi Charles,

> Just adding my opinion here as I have been following this thread.  Would
> it be possible to have both the .so library and header-only options
> available, or is it a strictly 'this-or-that' scenario?

a header-only version should be possible, but exposes the user to all 
kinds of compiler flags. One example: In order to provide AVX-enabled 
code, the user has to pass the respective compilation flags when using a 
header-only model. Many users don't want that, or may not even be in the 
position to do that (e.g. if ViennaCL is part of a larger software 
stack). Ideally, ViennaCL contains AVX- and non-AVX code in the same 
binary, selecting the appropriate code path based on the actual CPU 
features *available* rather than relying on the user passing the correct 
optimization flags.

Best regards,
Karli



Re: [ViennaCL-devel] Interfacing with clMAGMA?

2016-08-17 Thread Karl Rupp
Hi Charles,

> There is a fair amount of output, hopefully something here provides a
> clue that you can understand.

Ok, so let me explain the relevant messages:

> ViennaCL: Setting handle kernel argument 0xbac6690 at pos 0 for kernel
> assign_cpu
(...)
> ViennaCL: Setting handle kernel argument 0xbb2d3d0 at pos 0 for kernel
> assign_cpu
(...)
> ViennaCL: Setting handle kernel argument 0xbb2e120 at pos 0 for kernel
> assign_cpu

The buffers 0xbac6690, 0xbb2d3d0, and 0xbb2e120 (of type cl_mem) are the 
relevant matrix buffers for clMAGMA. These are the ones you should pass 
to the GEMM routines. You can verify this by printing the values of 
'bufA' and the like.

(Of course the buffer addresses change in each run)

Best regards,
Karli




Re: [ViennaCL-devel] compressed_matrix %*% matrix_Base

2016-08-17 Thread Karl Rupp
Hi Dmitriy,

> We could (and probably should?) add such a convenience header file
> at the expense of increased compilation times (and reduced
> encapsulation of source code against compiler issues).
>
>
> +1 on single header! :)

thanks for the feedback:
https://github.com/viennacl/viennacl-dev/issues/196


> Ultimately, this all boils down to fighting limitations of the
> current header-only source code distribution model.
>
>
> FWIW, if our opinion matters, actually, header-only is one of the things
> we like very much. It means we don't have to redistribute any
> executables, everything already is included in our jars, everything that
> we use and need (and only it) is already generated for us by javacpp.
> This is one of the most valuable features about ViennaCL in my opinion.
> It is very hard to get customers to install yet-another libX.so on their
> clusters.

I agree that additional libraries on clusters can be tricky at times...


> But header-only, template-based code solves
>
> (1) we include everything we need in jar (no extra infra requirement)
> (2) we include only that we actually support/use (lightweight, slim
> application size requirement)
>
> these are very valuable for flink/spark type of applications. Which is
> what we are.
>
> I know that you have plans to generate a .so lib with apparently
> non-object API, but for apache mahout the OAA api with header-only
> requirement is super optimal. (at least I have a high hope you won't
> _force_ us to redistribute an .so(s) in the future releases :) )

Will a static library suffice for your purposes? I'm not an expert on 
releasing .jar packages, but I'd expect that a static library could 
offer similar advantages to a header-only approach.

Best regards,
Karli



Re: [ViennaCL-devel] Interfacing with clMAGMA?

2016-08-12 Thread Karl Rupp
Hi Charles,

call .handle()/.handle1()/.handle2() to get the abstract memory buffers, 
and call .opencl_handle() on them to get the cl_mem handles:

  A.handle().opencl_handle()

Similarly, the command queue is obtained with
  viennacl::ocl::get_queue().handle().get()

Unfortunately it's not explicitly written in the manual :-/

Best regards,
Karli


On 08/12/2016 09:39 PM, Charles Determan wrote:
> I also would need to access the command queue handle (cl_command_queue)
> object to pass to clBLAS and clMAGMA functions.  Is this easily
> accessible as well?
>
> Thanks,
> Charles
>
> On Fri, Aug 12, 2016 at 11:45 AM, Charles Determan
> > wrote:
>
> Thanks Karl,
>
> I have been looking through the docs and I can't find an example for
> how to pull the OpenCL handles from a matrix.  I saw a couple I
> think from a context but not sure that is what I need.  Is this in
> the documentation somewhere?  The closest I could fine is this page
> (http://viennacl.sourceforge.net/doc/manual-memory.html
> ).
>
> Regards,
> Charles
>
> On Wed, Aug 10, 2016 at 12:09 PM,  > wrote:
>
> Hi Charles,
>
>
> I have recently expressed some interest in different
> factorizations such as
> QR and SVD.  I am aware that these or currently experimental
> within
> ViennaCL.  Until such a time that these factorizations are
> fully supported
> (I hope to contribute but the algorithms are quite complex)
> would it be
> feasible to interface with a library like clMAGMA?  I'm not
> sure of any
> other library offhand that does implement these methods.  I
> thought perhaps
> VexCL but I couldn't find anything to that effect in the
> documentation.
>
>
> Sure, you can always grab the OpenCL handles from the matrices
> and plug that into clMAGMA.
> I don't think there is any value in ViennaCL wrapping the
> clMAGMA interfaces, though.
>
> Best regards,
> Karli
>
>
>
>




Re: [ViennaCL-devel] Does ViennaCL supports opencl 1.1 EP

2016-08-11 Thread Karl Rupp
Hi Tanguy,

> we are working with an arm 7 on an imx6q board and wonder if ViennaCL
> supports the opencl 1.1 Embedded Profile.

No, the embedded profile is not supported. Since the embedded profile is 
not part of OpenCL 2.0 and since newer mobile GPUs tend to have full 
OpenCL support, it is unlikely that we will add support for OpenCL 1.1 
EP in the future.

Best regards,
Karli





Re: [ViennaCL-devel] compressed_matrix %*% matrix_Base

2016-08-07 Thread Karl Rupp
Hi Andy,

the relevant tests for sparse matrices times dense matrices are in 
tests/spmdm.cpp. In particular, I recreated a test case based on your 
description and couldn't find any issues:

  viennacl::compressed_matrix<NumericT> compressed_A;
  viennacl::matrix<NumericT> B1(std_A.size(), cols_rhs);
  viennacl::matrix_base<NumericT> B1_ref(B1);
  viennacl::matrix_base<NumericT>
C2(viennacl::linalg::prod(compressed_A, B1_ref));

compiles cleanly. Could you please provide a code snippet demonstrating 
the problem you are encountering?

Thanks and best regards,
Karli



On 08/05/2016 09:04 PM, Andrew Palumbo wrote:
> Hi Karl,
>
>
> I've been trying to implement tests for:
>
>
>  matrix_base C = compressed_matrix A %*%
>
>  matrix_base B.
>
>
> I cant find in the code or the documentation any constructor for
> matrix_base(
>
> matrix_expression, const
> viennacl::matrix_base, viennacl::op_prod>)
>
> ie. a mixed expression of compressed_matrix and matrix_base
>
> and get a compilation error when I try to instantiate a:
>
>  matrix_base(matrix_expression  viennacl::compressed_matrix, const
> viennacl::matrix_base,
>  viennacl::op_prod>)
>
> Is there a transformation that I need to do from this
>
>  matrix_expression op_prod>
>
> to something else so that I may be able to initialize a matrix_base (or
> possibly even a compressed_matrix) from it?
>
> The compilation error that i get is below.
>
> Thanks,
>
> Andy
>




Re: [ViennaCL-devel] compressed_matrix %*% matrix_Base

2016-08-04 Thread Karl Rupp
Hi Andrew,



On 08/04/2016 01:33 AM, Andrew Palumbo wrote:
> Oops sorry - wrong class in the last post.  Too many things going on at
> once.
>
>
> @Properties(inherit = Array(classOf[Context]),
>value = Array(new Platform(
>  include =Array("matrix.hpp"),
>  library ="jniViennaCL")
>))
> @Namespace("viennacl")
> @Name(Array("matrix_expression<const viennacl::compressed_matrix<double>, " +
>"const viennacl::matrix_base<double>, " +
>"viennacl::op_prod>"))

yes, this is the right result expression template type.

Regarding trans: Currently the functionality isn't fully exposed through 
the API, i.e. you cannot write A = trans(B) for sparse matrices A and B. 
However, the functionality is implemented in 
viennacl::linalg::detail::amg::amg_transpose(B, A) and will be properly 
exposed soon.

Best regards,
Karli





> ----
> *From:* Andrew Palumbo <ap@outlook.com>
> *Sent:* Wednesday, August 3, 2016 6:44:10 PM
> *To:* Karl Rupp; viennacl-devel@lists.sourceforge.net
> *Subject:* Re: [ViennaCL-devel] compressed_matrix %*% matrix_Base
>
> Hi Karl, as always thanks for the quick response.
>
> I Just needed a point in the right direction, and have it compiling
> now.  (Tests up next).
>
> Just FYI, I needed a new class for the product result:
>
> @Properties(inherit = Array(classOf[Context]),
>value = Array(new Platform(
>  include =Array("matrix.hpp"),
>  library  ="jniViennaCL")
>))
> @Namespace("viennacl")
> @Name(Array("vector_expression<const viennacl::compressed_matrix<double>, " +
>"const viennacl::vector_base<double>, " +
>"viennacl::op_prod>"))
> class MatVecProdExpression extends Pointer {
>
> }
>
> Wanted to make sure that I wasn't grinding my wheels.
>
> Thanks a lot for your time.
>
> One more question: there is no `trans(compressed_matrix cm)` function,
> correct?  This should just be done by taking the transpose first of the
> matrix before converting it to CSR, etc.?  Curious, as we may be able to
> shave a small amount of time if so.
>
> Thanks!
>
> Andy
>
>
> 
> *From:* Karl Rupp <r...@iue.tuwien.ac.at>
> *Sent:* Wednesday, August 3, 2016 5:28:58 PM
> *To:* Andrew Palumbo; viennacl-devel@lists.sourceforge.net
> *Subject:* Re: [ViennaCL-devel] compressed_matrix %*% matrix_Base
> Hi Andrew,
>
>   > I'm having some trouble with sparse `compressed_matrix` `matrix`(base)
>> matrix multiplication.  This is supported, correct?
>
> Yes. Could you please let us know what you have tried already?
> It shouldn't be any more code to write than
>
>viennacl::compressed_matrix<double> A(...);
>viennacl::matrix<double> B(...);
>viennacl::matrix<double> C = viennacl::linalg::prod(A, B);
>
> Make sure to
>#include "viennacl/matrix.hpp"
>#include "viennacl/compressed_matrix.hpp"
>#include "viennacl/linalg/prod.hpp"
> at the beginning; otherwise you get incomprehensible C++ compiler output.
>
> Best regards,
> Karli
>
>
>
>
>>
>>
>> I've been trying to use the:
>>
>>
>> template<typename SparseMatrixType, typename SCALARTYPE>
>> typename viennacl::enable_if< viennacl::is_any_sparse_matrix<SparseMatrixType>::value,
>>   viennacl::matrix_expression<const SparseMatrixType,
>>   const matrix_base<SCALARTYPE>,
>>   op_prod> >::type
>> prod(const SparseMatrixType & sp_mat,
>>  const viennacl::matrix_base<SCALARTYPE> & d_mat)
>> {
>>   return viennacl::matrix_expression<const SparseMatrixType,
>>  const viennacl::matrix_base<SCALARTYPE>,
>>  op_prod>(sp_mat, d_mat);
>> }
>>
>>

Re: [ViennaCL-devel] Removing Boost elements

2016-08-03 Thread Karl Rupp
Hi Charles,

 > I know the intent is to remove boost from viennacl.  As such, I was
> looking at some files I could contribute to that removal.  That said,
> before I start blindly making changes, is there a plan for how to
> replace elements such as `boost::numeric::ublas::prod`?

Well, the ViennaCL core is mostly free from uBLAS already. By 'mostly' I 
mean all the parts that are fully supported for all three backends. The 
experimental eigenvalue routines that are only available for the OpenCL 
backend are the major exception. I don't recommend that anybody start 
there, because migrating those requires the most knowledge of ViennaCL's 
internals.


> Is the intent to just use std objects such as
> std::vector and doing a manual matrix
> multiplication?
>
> Or do you intend to use another library?

Given that the stable core is already uBLAS free (or at least 95%), the 
major work is to ditch uBLAS from the examples and tests. Many of the 
examples no longer need uBLAS, because the functionality is available in 
ViennaCL. This is the major effort in terms of lines of code touched, 
but it shouldn't be too hard. If something cannot be expressed directly 
with ViennaCL, an STL type should be used (std::vector<double>, 
std::vector<std::vector<double> >, etc.).

Hope this helps a bit :-)

Thanks and best regards,
Karli





Re: [ViennaCL-devel] boost/ublass requirements

2016-07-26 Thread Karl Rupp
Hi Andy,

 > I was just looking at svd.hpp, qr.hpp and fspai.hpp, and I
> notice that there are boost requirements and includes in these files.
>
>
> Is there any way of running svd, qr or cholesky_solve without
> installing boost?

currently this is tied to uBLAS, but we want to get rid of all 
Boost-dependencies as soon as possible (i.e. with ViennaCL 1.8.0):
https://github.com/viennacl/viennacl-dev/wiki/ViennaCL-Roadmap

Best regards,
Karli



Re: [ViennaCL-devel] Copying Values out of a compressed_matrix

2016-07-23 Thread Karl Rupp
Hi,

 > Yes, this seems to be the case. If I force out-of-order CSR into
> in-order CSR everything seems to work. I can't see the documentation
> explicitly mentioning this, if this is indeed the case.
>
> Karl, can you please confirm only in-order CSRs are supported? Thanks!

out-of-order CSR works for SpMVs, but not for sparse matrix-matrix 
multiplies.

Parallel algorithms usually work better for in-order data layouts. The 
performance penalty for out-of-order data is almost always too high to 
justify any extra kernels for out-of-order data.

Best regards,
Karli

PS: Yes, the documentation should be more explicit about this.



Re: [ViennaCL-devel] Copying Values out of a compressed_matrix

2016-07-23 Thread Karl Rupp
Hi,

 > PS
> (4) column indices admit out-of-order placements of elements within each
> row.

Column indices *have* to be in ascending order for sparse matrix-matrix 
multiplication.

Best regards,
Karli


>
> Thank you.
> -Dmitriy
>
> On Fri, Jul 22, 2016 at 12:56 PM, Dmitriy Lyubimov  > wrote:
>
> I think I still am getting seg faults on attempt to multiply
> matrices even without conversion back (larger arguments, 3k x 1k)
>
> I re-wrote another alternative transformation procedure and see
> nothing wrong with it. Both Andrew's code and mine fail with the
> same symptoms.
>
> Karl, can we verify assumptions about the format:
>
> (1) the compressed_matrix.set method expects host memory pointers.
> (2) the format is compressed row storage (CSR). Documentation never
> says explicitly that, and actually seems to have errors in size of
> elements and jumper arrays (it says the jumper array has to be cols+1
> long whereas in CSR it should actually be rows + 1 long, right?)
> (3) the element sizes of jumper and column indices arrays are 32 bit
> and are in little endian order (at least for the open MP backend).
>
> Right now I can't even get OpenMP sparse multiplication to work
> although CSR format is not rocket science at all. Don't see a
> problem anywhere. Tried to read ViennaCL's code to confirm the
> assumptions above, but this seems to be pretty elusive for the time
> being.
>




Re: [ViennaCL-devel] Copying Values out of a compressed_matrix

2016-07-23 Thread Karl Rupp
Hi Dmitriy,

 > Karl, can we verify assumptions about the format:
>
> (1) the compressed_matrix.set method expects host memory pointers.

yes

> (2) the format is compressed row storage (CSR). Documentation never says
> explicitly that, and actually seems to have errors in size of elements
> and jumper arrays (it says the jumper array has to be cols+1 long whereas in
> CSR it should actually be rows + 1 long, right?)

yes

> (3) the element sizes of jumper and column indices arrays are 32 bit and
> are in little endian order (at least for the open MP backend).

elements are stored in whatever byte order your machine natively uses.

Best regards,
Karli


> Right now I can't even get OpenMP sparse multiplication to work although
> CSR format is not rocket science at all. Don't see a problem anywhere.
> Tried to read ViennaCL's code to confirm the assumptions above, but this
> seems to be pretty elusive for the time being.
>
>
> On Fri, Jul 22, 2016 at 10:26 AM, Andrew Palumbo <ap@outlook.com
> <mailto:ap@outlook.com>> wrote:
>
> Yep, that's it.  Oh wow - well, that's just embarrassing.
>
>
> Thanks very much for your time, Karl- much appreciated.
>
>
> Andy
>
>     ----
> *From:* Karl Rupp <r...@iue.tuwien.ac.at <mailto:r...@iue.tuwien.ac.at>>
> *Sent:* Friday, July 22, 2016 12:39:20 PM
> *To:* Andrew Palumbo; viennacl-devel
> *Subject:* Re: [ViennaCL-devel] Copying Values out of a
> compressed_matrix
> Hi,
>
> your second and third arguments to memory_read() are incorrect:
> The second argument is the offset from the beginning, the third
> argument
> is the number of bytes to be read. Shifting the zero to the second
> position fixes the snippet (plus correcting the loop bounds when
> printing at the end) :-)
>
> Best regards,
> Karli
>
>
>
> On 07/22/2016 08:51 AM, Andrew Palumbo wrote:
> > a couple of small mistakes in the previous c++ file:
> >
> >
> > The memory_read(..) call should be:
> >
> >
> >// read data back into our product buffers
> >viennacl::backend::memory_read(handle1, product_size_row * 4, 0,
> > product_row_ptr, false);
> >viennacl::backend::memory_read(handle2, product_NNz * 4, 0,
> > product_col_ptr, false);
> >viennacl::backend::memory_read(handle, product_NNz * 8, 0,
> > product_values_ptr, false);
> >
> >
> > (read product_NNz * x bytes instead of product_size_row * x)
> >
> >
> > I've attached the corrected file.
> >
>     >
> > Thanks
> >
> >
> > Andy
> >
> > 
> > *From:* Andrew Palumbo <ap@outlook.com <mailto:ap@outlook.com>>
> > *Sent:* Thursday, July 21, 2016 11:03:59 PM
> > *To:* Karl Rupp; viennacl-devel
> > *Subject:* Re: [ViennaCL-devel] Copying Values out of a 
> compressed_matrix
> >
> > Hello,
> >
> >
> > I've mocked up a sample of the compressed_matrix multiplication that
> > I've been working with javacpp on in C++.  I am seeing the same type of
> > memory errors when I try to read the data out of product, and into the
> > output buffers as I was with javacpp.  By printing the matrix to stdout
> > as in the compressed_matrix example we can see that there are values
> > there, and they seem reasonable,  but when i use
> > backend::memory_read(...)  to retrive the buffers, I'm getting values
> > consistent with a memory error, and similar to what i was seeing in the
> > javacpp code.  Maybe I am not using the handles correctly?  Admittedly
> > my C++ is more than rusty, but I believe I am referencing the buffers
> > correctly in the output.
> >
> >
> > Below is the output of the attached file: sparse.cpp
> >
> >
> > Thanks very much,
> >
> >
> > Andy
> >
> >
> >
> > ViennaCL: compressed_matrix of size (10, 10) with 24 nonzeros:
> >(1, 2)0.329908
> >(1, 3)0.0110522
> >(1, 4)0.336839
> >(2, 5)0.0150778
> >(2, 7)0.0143518
> >(3, 3)0.217256
> >(3, 6)0.346854
> >(3, 9)0.45353
> >(4, 3)0.407954
> >(4, 6)0.651308
> >

Re: [ViennaCL-devel] Copying Values out of a compressed_matrix

2016-07-22 Thread Karl Rupp
Hi,

your second and third arguments to memory_read() are incorrect:
The second argument is the offset from the beginning, the third argument 
is the number of bytes to be read. Shifting the zero to the second 
position fixes the snippet (plus correcting the loop bounds when 
printing at the end) :-)

Best regards,
Karli



On 07/22/2016 08:51 AM, Andrew Palumbo wrote:
> a couple of small mistakes in the previous c++ file:
>
>
> The memory_read(..) call should be:
>
>
>// read data back into our product buffers
>viennacl::backend::memory_read(handle1, product_size_row * 4, 0,
> product_row_ptr, false);
>viennacl::backend::memory_read(handle2, product_NNz * 4, 0,
> product_col_ptr, false);
>viennacl::backend::memory_read(handle, product_NNz * 8, 0,
> product_values_ptr, false);
>
>
> (read product_NNz * x bytes instead of product_size_row * x)
>
>
> I've attached the corrected file.
>
>
> Thanks
>
>
> Andy
>
> 
> *From:* Andrew Palumbo <ap@outlook.com>
> *Sent:* Thursday, July 21, 2016 11:03:59 PM
> *To:* Karl Rupp; viennacl-devel
> *Subject:* Re: [ViennaCL-devel] Copying Values out of a compressed_matrix
>
> Hello,
>
>
> I've mocked up a sample of the compressed_matrix multiplication that
> I've been working with javacpp on in C++.  I am seeing the same type of
> memory errors when I try to read the data out of product, and into the
> output buffers as I was with javacpp.  By printing the matrix to stdout
> as in the compressed_matrix example we can see that there are values
> there, and they seem reasonable, but when I use
> backend::memory_read(...) to retrieve the buffers, I'm getting values
> consistent with a memory error, similar to what I was seeing in the
> javacpp code.  Maybe I am not using the handles correctly?  Admittedly
> my C++ is more than rusty, but I believe I am referencing the buffers
> correctly in the output.
>
>
> Below is the output of the attached file: sparse.cpp
>
>
> Thanks very much,
>
>
> Andy
>
>
>
> ViennaCL: compressed_matrix of size (10, 10) with 24 nonzeros:
>(1, 2)0.329908
>(1, 3)0.0110522
>(1, 4)0.336839
>(2, 5)0.0150778
>(2, 7)0.0143518
>(3, 3)0.217256
>(3, 6)0.346854
>(3, 9)0.45353
>(4, 3)0.407954
>(4, 6)0.651308
>(5, 2)0.676061
>(5, 3)0.0226486
>(5, 4)0.690264
>(6, 5)0.0998838
>(6, 7)0.0950744
>(7, 2)0.346173
>(7, 3)0.0115971
>(7, 4)0.353446
>(7, 9)0.684458
>(8, 5)0.0448123
>(8, 7)0.0426546
>(8, 9)0.82782
>(9, 5)0.295356
>(9, 7)0.281134
>
> row jumpers: [
> -36207072,32642,-39708721,32642,6390336,0,2012467744,32767,2012467968,32767,4203729,]
> col ptrs: [
> 0,0,-39655605,32642,-36207072,32642,6390336,0,10,0,-39672717,32642,2012466352,32767,-32892691,32642,1,0,6390336,0,2012466344,32767,60002304,2059362829,]
> elements: [
> 0.289516,0.304161,0.795779,0.334456,0.935264,0.585813,0.871237,0.811508,0.828558,0.0271863,6.92683e-310,6.92683e-310,1.061e-313,1.061e-313,6.36599e-314,4.24399e-314,6.36599e-314,6.92683e-310,4.24399e-314,1.2732e-313,2.122e-313,6.95324e-310,0.406537,0.0495716,0.370862,]
>
>
> and similarly for multiplication of 2 1x1 matrices:
>
> Result:
>
> ViennaCL: compressed_matrix of size (1, 1) with 1 nonzeros:
>(0, 0)    0.117699
>
> row jumpers: [
> -717571424,32767,]
> col ptrs: [
> 6386240,]
> elements: [
> 0.289516,6.9479e-310,]
>
>
>
>
> 
> *From:* Andrew Palumbo <ap@outlook.com>
> *Sent:* Wednesday, July 20, 2016 5:40:31 PM
> *To:* Karl Rupp; viennacl-devel
> *Subject:* Re: [ViennaCL-devel] Copying Values out of a compressed_matrix
>
> Oops, sorry about not cc'ing all.
>
>
> I do not get correct data back for a (Random.nextDouble() populated) 1 x
> 1 Matrix.
>
>
> A:
>
>Row Pointer: [0, 1 ]
>
>Col Pointer: [0 ]
>element Pointer: [0.6465821602909256 ]
>
>
> B:
>
>
>Row Pointer: [0, 1 ]
>Col Pointer: [0 ]
>element Pointer: [0.9513577109193919 ]
>
>
> C = A %*% B
>
>Row Pointer: [469762248, 32632]
>Col Pointer: [469762248 ]
>element Pointer: [6.9245198744523E-310 ]
>
>
> ouch.
>
>
> It looks like I'm not copying the Buffers correctly at all.  I'm may be
> using the javacpp buffers incorrectly here, or I have possibly wrapped
> the viennacl::backend::memory_handle class incorrectly, so I'm using a

Re: [ViennaCL-devel] Copying Values out of a compressed_matrix

2016-07-20 Thread Karl Rupp
Hi,

please keep viennacl-devel in CC:

Just to clarify: Do you get incorrect values for a 1-by-1 matrix as 
indicated in your sample data? In your previous email you mentioned that 
results are fine for small matrices...

I'm afraid I can only guess at the source of the error with the 
informations provided. Any chance that you can provide a standalone code 
to reproduce the problem with reasonable effort?

Best regards,
Karli



On 07/20/2016 10:16 PM, Andrew Palumbo wrote:
> Thanks so much for your quick answer!
>
>
> I actually am sorry to say that I made a mistake when writing the last
> email, I copied the wrong signature from the VCL documentation, and then
> the mistake propagated through the rest of the e-mail.
>
>
> I am actually using viennacl::backend::memory_read().
>
>
> Eg, for the row_jumpers and column_idx  I read use:
>
> @Name("backend::memory_read")
> public static native void memoryReadInt(@Const @ByRef MemHandle src_buffer,
>int bytes_to_read,
>int offset,
>IntPointer ptr,
>boolean async);
>
> and for the Values:
>
>
> @Name("backend::memory_read")
> public static native void memoryReadDouble(@Const @ByRef MemHandle src_buffer,
>  int bytes_to_read,
>  int offset,
>  DoublePointer ptr,
>  boolean async);
>
> And then call:
>
>
> memoryReadInt(row_ptr_handle, (m +1) *4,0, row_ptr,false)
> memoryReadInt(col_idx_handle, NNz *4,0,col_idx,false)
> memoryReadDouble(element_handle, NNz *8,0, values,false)
>
>
> and after converting them to java.nio.Buffers, I am getting results like:
>
>
> rowBuff.get(1): 0    colBuff(1): 402653448    valBuff(1): 6.91730177312166E-310
>
>
> Have also tried reading into BytePointers similarly with the same type
> of results.  I know that the use of JavaCPP obfuscates what the problem
> may be, but I believe the memory is properly allocated.
>
>
>
> Sorry for the mistake.
>
>
> Thanks,
>
>
> Andy
>
>
> 
> *From:* Karl Rupp <r...@iue.tuwien.ac.at>
> *Sent:* Wednesday, July 20, 2016 3:50:07 PM
> *To:* Andrew Palumbo; ViennaCL-devel@lists.sourceforge.net
> *Subject:* Re: [ViennaCL-devel] Copying Values out of a compressed_matrix
> Hi Andy,
>
> instead of viennacl::backend::memory_copy(), you want to use
> viennacl::backend::memory_read(), which directly transfers the data into
> your buffer(s).
>
> If you *know* that your handles are in host memory, you can even grab
> the values directly via
>viennacl::linalg::host_based::detail::extract_raw_pointer();
> defined in viennacl/linalg/host_based/common.hpp, around line 40.
>
> Please let me know if you still get errors after using that.
>
> Best regards,
> Karli
>
>
>
>
> On 07/20/2016 09:05 PM, Andrew Palumbo wrote:
>> Hello,
>>
>>
>> I'm having some difficulties with compressed_matrix multiplication.
>>
>>
>> Essentially I am copying three buffers (the CSR conversion of an Apache
>> Mahout SparseMatrix) into two compressed_matrices and performing matrix
>> multiplication. I am doing this in Scala and Java using JavaCPP.
>>
>>
>> For example, I have a 5 x 10 matrix of ~20% non-zero values which in CSR
>> format looks like this:
>>
>>
>> NNz: 12
>>
>> Row Pointer: [0, 1, 4, 6, 9, 12, ]
>>
>> Col Pointer: [9, 0, 8, 7, 2, 9, 0, 8, 9, 0, 3, 5, ]
>>
>> element Pointer: [0.4065367203992265, 0.04957158909682802,
>> 0.5205586068847993, 0.3708618354358446, 0.6963900565931678,
>> 0.8330915529787706, 0.32839112750638844, 0.7856168903297948,
>> 0.4265801782090245, 0.14733066454561583, 0.9501663495824946,
>> 0.9710498974366047, ]
>>
>> Multiplied by a similarly Sparse 10 x 5 compressed_matrix
>>
>> I use a CompressedMatrix wrapper which essentially wraps the
>>
>>  viennacl:: compressed_matrix (vcl_size_t rows, vcl_size_t cols,
>> vcl_size_t nonzeros=0, viennacl::context ctx=viennacl::context())
>>
>> constructor as well as the
>>
>>  compressed_matrix (matrix_expression< const compressed_matrix,
>> const compressed_matrix, op_prod > const ).
>>
>> I have a helper function, /toVclCompressedMatrix/(..) which essentially
>> does the CSR conversion from a Mahout src matrix, calls the constructor
>> and uses viennacl::compressed_matrix::set(...) 

Re: [ViennaCL-devel] Copying Values out of a compressed_matrix

2016-07-20 Thread Karl Rupp
Hi Andy,

instead of viennacl::backend::memory_copy(), you want to use
viennacl::backend::memory_read(), which directly transfers the data into 
your buffer(s).

If you *know* that your handles are in host memory, you can even grab 
the values directly via
  viennacl::linalg::host_based::detail::extract_raw_pointer();
defined in viennacl/linalg/host_based/common.hpp, around line 40.

Please let me know if you still get errors after using that.

Best regards,
Karli




On 07/20/2016 09:05 PM, Andrew Palumbo wrote:
> Hello,
>
>
> I'm having some difficulties with compressed_matrix multiplication.
>
>
> Essentially I am copying three buffers (the CSR conversion of an Apache
> Mahout SparseMatrix) into two compressed_matrices and performing matrix
> multiplication. I am doing this in Scala and Java using JavaCPP.
>
>
> For example, I have a 5 x 10 matrix of ~20% non-zero values which in CSR
> format looks like this:
>
>
> NNz: 12
>
> Row Pointer: [0, 1, 4, 6, 9, 12, ]
>
> Col Pointer: [9, 0, 8, 7, 2, 9, 0, 8, 9, 0, 3, 5, ]
>
> element Pointer: [0.4065367203992265, 0.04957158909682802,
> 0.5205586068847993, 0.3708618354358446, 0.6963900565931678,
> 0.8330915529787706, 0.32839112750638844, 0.7856168903297948,
> 0.4265801782090245, 0.14733066454561583, 0.9501663495824946,
> 0.9710498974366047, ]
>
> Multiplied by a similarly Sparse 10 x 5 compressed_matrix
>
> I use a CompressedMatrix wrapper which essentially wraps the
>
>  viennacl:: compressed_matrix (vcl_size_t rows, vcl_size_t cols,
> vcl_size_t nonzeros=0, viennacl::context ctx=viennacl::context())
>
> constructor as well as the
>
>  compressed_matrix (matrix_expression< const compressed_matrix,
> const compressed_matrix, op_prod > const ).
>
> I have a helper function, /toVclCompressedMatrix/(..) which essentially
> does the CSR conversion from a Mahout src matrix, calls the constructor
> and uses viennacl::compressed_matrix::set(...) to set the buffers:
>
> val ompA =toVclCompressedMatrix(src = mxA, ompCtx)
> val ompB =toVclCompressedMatrix(src = mxB, ompCtx)
>
>
> and then create a new viennacl::compressed_matrix from the
> viennacl::linalg::prod of the 2 matrices i.e.:
>
> val ompC =new CompressedMatrix(prod(ompA, ompB))
>
> The context in the above case is either the Host or OpenMP (I know that
> there is some special casting of the row_jumpers and col_idxs that needs
> to be done in the OpenCL version)
>
> The Matrix multiplication completes without error on small Matrices eg.
> < 300 x 300
> but seems to overwrite the resulting buffers on larger Matrices.
>
> My real problem, though is getting the memory back out of the
> resulting`ompC` compresed_matrix so that i can write it back to a mahout
> SparseMatrix.
>
> currently I am using:
>
> void viennacl::backend::memory_copy (mem_handle const &  src_buffer,
>  mem_handle &  dst_buffer,
>  vcl_size_t  src_offset,
>  vcl_size_t  dst_offset,
>  vcl_size_t  bytes_to_copy
>  )
>
> on the ompC.handle1, ompC.handle2 and ompC.handle source handles
>
> to copy into pre-allocated row_jumper, col_index and element buffers
> (of size ompC.size1() + 1, ompC.nnz and ompC.nnz, respectively).
>
> I am getting nonsensical values back that one would expect from memory
> errors. eg:
>
> the Matrix geometry of the result: ompC.size1(), and omp.size2() are
> correct and ompC.nnz is a reasonable value.
>
> It is possible that I have mis-allocated some of the memory on my side,
> but I am pretty sure that most of the Buffers are allocated correctly
> (usually JavaCPP does a pretty good job of this).
>
>
> I guess, long story short, my question is: am I using the correct method
> of copying the memory out of a compressed_matrix?  Is there something
> glaringly incorrect that I am doing here?  Should I be using
> viennacl::backend::memory_copy, or is there a different method that I
> should be using?
>
>
> Thanks very much,
>
> Andy
>
>
>
>
>
>
>
>
>
>
>



Re: [ViennaCL-devel] Initializing matrices via direct serialization (row major or CCS)

2016-07-18 Thread Karl Rupp
Hi,

 > Is DGEMM your performance-critical operation? Are there any other
> performance-critical operations?
>
>
> For now we are only looking at (especially sparse) blas3 and
> decompositions. Basically, your normal R base functionality for
> in-memory sparse algebra.

Sparse factorizations (LU, QR, etc.) are very hard to parallelize for 
many-core architectures (GPUs in particular).


> One more question i had:
>
> do you guys handle low resource cases? like transfer optimization for
> blockwise multiplication in case operands do not fit -- out-of-core
> algorithms?

out-of-core has gone out of fashion. The reason is that the differences 
in memory speed have become so large that falling back to a slower memory 
type almost never pays off.


> Did you look at gpu+cpu combined balanced algorithms (as i guess MAGMA
> did for some)?

yes, a couple of algorithms in ViennaCL use GPUs for the main work (i.e. 
GEMM) and CPUs for the sequential parts of the algorithm.

Best regards,
Karli




Re: [ViennaCL-devel] Initializing matrices via direct serialization (row major or CCS)

2016-07-14 Thread Karl Rupp
Hi again,

 > So fast_copy still copies the memory and has copying overhead, even with
> MAIN_MEMORY context?

Yes. It's a copy() operation, so it just does what the name suggests.

> Is there a way to do shallow copying  (i.e. just pointer initialization)
> to the matrix data buffer? Isn't it what some constructors of matrix or
> matrix_base do?

Yes, you can pass your pointer via the constructors, e.g.
https://github.com/viennacl/viennacl-dev/blob/master/viennacl/matrix.hpp#L721


> What i am getting at, it looks like i am getting a significant overhead
> for just copying -- actually, it seems i am getting double overhead --
> once when i prepare padding and all as required by the internal_size?(),
> and then i pass it into the fast_copy() which apparently does copying
> again, even if we are using host memory matrices.

If you want to 'wrap' your data in a ViennaCL matrix, pass the pointer 
to the constructors. If you want to quickly copy your data over to 
memory managed by a ViennaCL matrix, use copy() or fast_copy(). From 
your description it looks like you are now looking for the constructor 
calls, but from your earlier email I thought that you are looking for a 
fast_copy().



> all in all, by my estimates this copying back and forth (which is,
> granted, is not greatly optimized on our side) takes ~15..17 seconds out
> of 60 seconds total when multiplying 10k x 10k dense arguments via
> ViennaCL. I also optimize to -march=haswell  and use -ffast-math,
> without those i seem to fall too far behind what R + openblas can do in
> this test. Then, my processing time swells up to 2 minutes without
> optimizing for non-compliant arithmetics.

15 seconds of copying for a 10k-by-10k matrix looks way too much. 
10k-by-10k is 800 MB of data for double precision, so this should not 
take much more than 100 ms on a low-range laptop (10 GB/sec memory 
bandwidth). Even with multiple matrices and copies you should stay in 
the 1 second regime.


> If i can wrap the buffer and avoid copying for MAIN_MEMORY context, i'd
> be shaving off another 10% or so of the execution time. Which would make
> me happier, as i probably would be able to beat openblas given custom
> cpu architecture flags.

Why do you expect to beat OpenBLAS? Their kernels are really well 
optimized, and for large dense matrix-matrix products you are always FLOP-limited.


> On the other hand, bidmat (which allegedly uses mkl) does the same test,
> double precision, in under 10 seconds. I can't fathom how, but it does.
> I have a haswell-E platform.

Multiplication of two 10k-by-10k matrices amounts to 2000 GFLOP (2n^3) of 
compute in double precision. A Haswell-E machine provides that within 
seconds, depending on the number of cores (2.4 GHz * 4 doubles with AVX 
* 2 for FMA = 19.2 GFLOP/sec per core; MKL achieves about 15 GFLOP/sec 
per core).

ViennaCL's host-backend is not strong on dense matrix-matrix multiplies 
(even though we've got some improvements in a pull request), so for this 
particular operation you will get better performance from MKL, OpenBLAS, 
or libflame.

Best regards,
Karli





> On Tue, Jul 12, 2016 at 9:27 AM, Karl Rupp <r...@iue.tuwien.ac.at
> <mailto:r...@iue.tuwien.ac.at>> wrote:
>
> Hi,
>
> > One question: you mentioned padding for the `matrix` type. When i
>
> initialize the `matrix` instance, i only specify dimensions. how
> do I
> know padding values?
>
>
> if you want to provide your own padded dimensions, consider using
> matrix_base directly. If you want to query the padded dimensions,
> use internal_size1() and internal_size2() for the internal number of
> rows and columns.
>
> http://viennacl.sourceforge.net/doc/manual-types.html#manual-types-matrix
>
> Best regards,
> Karli
>

Re: [ViennaCL-devel] Initializing matrices via direct serialization (row major or CCS)

2016-07-12 Thread Karl Rupp
Hi,

 > One question: you mentioned padding for the `matrix` type. When i
> initialize the `matrix` instance, i only specify dimensions. how do I
> know padding values?

if you want to provide your own padded dimensions, consider using 
matrix_base directly. If you want to query the padded dimensions, use 
internal_size1() and internal_size2() for the internal number of rows 
and columns.

http://viennacl.sourceforge.net/doc/manual-types.html#manual-types-matrix

Best regards,
Karli



>
> On Tue, Jul 12, 2016 at 5:53 AM, Karl Rupp <r...@iue.tuwien.ac.at
> <mailto:r...@iue.tuwien.ac.at>> wrote:
>
> Hi Dmitriy,
>
> On 07/12/2016 07:17 AM, Dmitriy Lyubimov wrote:
>
> Hi,
>
> I am trying to create some elementary wrappers for VCL in javacpp.
>
> Everything goes fine, except i really would rather not use those
> "cpu"
> types (std::map,
> std::vector) and rather initialize matrices directly by feeding
> row-major or CCS formats.
>
> I see that matrix () constructor accepts this form of
> initialization;
> but it really states that
> it does "wrapping" for the device memory.
>
>
> Yes, the constructors either create their own memory buffer
> (zero-initialized) or wrap an existing buffer. These are the only
> reasonable options.
>
>
> Now, i can create a host matrix() using host memory and row-major
> packing. This works ok it seems.
>
> However, these are still host instances. Can i copy host
> instances to
> instances on opencl context?
>
>
> Did you look at viennacl::copy() or viennacl::fast_copy()?
>
>
> That might be one way bypassing unnecessary (in my case)
> complexities of
> working with std::vector and std::map classes from java side.
>
> But it looks like there's no copy() variation that would accept a
> matrix-on-host and matrix-on-opencl arguments (or rather, it of
> course
> declares those to be ambiguous since two methods fit).
>
>
> If you want to copy your OpenCL data into a viennacl::matrix, you
> may wrap the memory handle (obtained with .elements()) into a vector
> and copy that. If you have plain host data, use
> viennacl::fast_copy() and mind the data layout (padding of
> rows/columns!)
>
>
> For compressed_matrix, there seems to be a set() method, but i guess
> this also requires CCS arrays in the device memory if I use it. Same
> question, is there a way to send-and-wrap CCS arrays to an
> opencl device
> instance of compressed matrix without using std::map?
>
>
> Currently you have to use .set() if you want to bypass
> viennacl::copy() and std::map.
>
> I acknowledge that the C++ type system is a pain when interfacing
> from other languages. We will make this much more convenient in
> ViennaCL 2.0. The existing interface in ViennaCL 1.x is too hard to
> fix without breaking lots of user code, so we won't invest time in
> that (contributions welcome, though :-) )
>
> Best regards,
> Karli
>
>
>


--
What NetFlow Analyzer can do for you? Monitors network bandwidth and traffic
patterns at an interface-level. Reveals which users, apps, and protocols are 
consuming the most bandwidth. Provides multi-vendor support for NetFlow, 
J-Flow, sFlow and other flows. Make informed decisions using capacity planning
reports.http://sdm.link/zohodev2dev
___
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel


Re: [ViennaCL-devel] Initializing matrices via direct serialization (row major or CCS)

2016-07-12 Thread Karl Rupp
Hi Dmitriy,

On 07/12/2016 07:17 AM, Dmitriy Lyubimov wrote:
> Hi,
>
> I am trying to create some elementary wrappers for VCL in javacpp.
>
> Everything goes fine, except i really would rather not use those "cpu"
> types (std::map,
> std::vector) and rather initialize matrices directly by feeding
> row-major or CCS formats.
>
> I see that matrix () constructor accepts this form of initialization;
> but it really states that
> it does "wrapping" for the device memory.

Yes, the constructors either create their own memory buffer 
(zero-initialized) or wrap an existing buffer. These are the only 
reasonable options.


> Now, i can create a host matrix() using host memory and row-major
> packing. This works ok it seems.
>
> However, these are still host instances. Can i copy host instances to
> instances on opencl context?

Did you look at viennacl::copy() or viennacl::fast_copy()?


> That might be one way bypassing unnecessary (in my case) complexities of
> working with std::vector and std::map classes from java side.
>
> But it looks like there's no copy() variation that would accept a
> matrix-on-host and matrix-on-opencl arguments (or rather, it of course
> declares those to be ambiguous since two methods fit).

If you want to copy your OpenCL data into a viennacl::matrix, you may 
wrap the memory handle (obtained with .elements()) into a vector and 
copy that. If you have plain host data, use viennacl::fast_copy() and 
mind the data layout (padding of rows/columns!)


> For compressed_matrix, there seems to be a set() method, but i guess
> this also requires CCS arrays in the device memory if I use it. Same
> question, is there a way to send-and-wrap CCS arrays to an opencl device
> instance of compressed matrix without using std::map?

Currently you have to use .set() if you want to bypass viennacl::copy() 
and std::map.

I acknowledge that the C++ type system is a pain when interfacing from 
other languages. We will make this much more convenient in ViennaCL 2.0. 
The existing interface in ViennaCL 1.x is too hard to fix without 
breaking lots of user code, so we won't invest time in that 
(contributions welcome, though :-) )

Best regards,
Karli





Re: [ViennaCL-devel] Custom OpenCL kernel confusion

2016-06-10 Thread Karl Rupp
Hi,

 > I believe I figured it out, your comment about the global sizes allowed
> me to realize that the defaults don't account for a second dimension.
> Once I set that I am able to get the kernel to work properly.  Thank you
> for listening and directing me to different points to check.

ah, great, I'm glad it's now working! :-)

Best regards,
Karli




> On Fri, Jun 10, 2016 at 7:42 AM, Charles Determan <cdeterma...@gmail.com
> <mailto:cdeterma...@gmail.com>> wrote:
>
> I neglected one further question you had
>
> Which local and global work sizes do you use?
>
> I am not setting any local/global work sizes as I thought the
> defaults specified by ViennaCL were supposed to be sufficient as
> noted in the documentation
> (http://viennacl.sourceforge.net/doc/manual-custom-kernels.html) -
> 'The default work sizes suffice for most cases'.
>
> Regards,
> Charles
>
> On Fri, Jun 10, 2016 at 7:35 AM, Charles Determan
> <cdeterma...@gmail.com <mailto:cdeterma...@gmail.com>> wrote:
>
> Karl,
>
> I am trying to adapt from a previous kernel I knew worked on an
> unpadded matrix.
>
> __kernel void iMatMult(const int Mdim, const int Pdim,
> __global const int *A, __global const
> int *B, __global int *C) {
>
>  int k;
>
>  // Get the index of the elements to be processed
>  const int globalRow = get_global_id(0); // C Row ID
>  const int globalCol = get_global_id(1); // C Col ID
>  int tmp = 0;
>
>  // Do the operation
>  for(k=0; k < Pdim; k++){
>  tmp += A[k*Mdim+globalRow] * B[globalCol*Pdim+k];
>  }
>  C[globalCol*Mdim+globalRow] = tmp;
> }
>
> So when you ask - "where is the third dimension? Are you
> assuming C to be M-by-M?"
>
> I haven't passed a third dimension as Mdim is the number of
> columns and Pdim is the number of rows in matrix 'A'.
>
> Which values do you pass to the kernel? Which local and global
> work sizes do you use?
>
> Right now I am passing Mdim, Pdim, MdimPad (padded number of
> columns), PdimPad (padded number of rows), and three matrices.
>
> I'm confused with your use of MdimPad and PdimPad here. As
> currently written, A has Mdim columns, and B has Pdim columns.
> But this doesn't agree with the if-check above, where C is
> assumed Mdim-by-Mdim.
>
> I am using MdimPad and PdimPad to index the matrix elements
> because they are padded (this is new to me for writing OpenCL
> kernels).  C is intended to be square but I can't even get it to
> work with a square matrix.  That line actually looks like I
> intended to have:
>
> if (globalRow > MdimPad || globalCol > PdimPad)
>  return;
>
> but that still doesn't fix the problem for me.
>
> The last line assumes C to be M-by-M. Is this the case?
>
> Again, I am trying to base this off the previous kernel which I
> thought worked for non-square matrices but I could very well be
> mistaken.  The entire goal here is to just get a basic working
> integer gemm kernel for square or rectangular matrices.  I
> really didn't think it would be difficult but I think I have
> fallen in a rabbit hole at this point and likely just confusing
> myself.
>
> Regards,
> Charles
>
>
> On Fri, Jun 10, 2016 at 3:40 AM, Karl Rupp
> <r...@iue.tuwien.ac.at <mailto:r...@iue.tuwien.ac.at>> wrote:
>
> Hi Charles,
>
> Here is the current kernel
> with all the different attempts commented out (where
> MdimPad and PdimPad
> or the padded dimensions).
>
>
> where is the third dimension? Are you assuming C to be M-by-M?
>
>
>
> If I don't have a size condition check, the
> device quickly runs out of resources (Error: ViennaCL:
> FATAL ERROR:
> CL_OUT_OF_RESOURCES ).  Any thoughts?  I feel like I
> must be missing
> something simple at this point.
>
>
> Which values do you pass to the kernel? Which local and
> global work sizes do you use?
>
>
>
>

Re: [ViennaCL-devel] Custom OpenCL kernel confusion

2016-06-10 Thread Karl Rupp
Hi Charles,

> Here is the current kernel
> with all the different attempts commented out (where MdimPad and PdimPad
> or the padded dimensions).

where is the third dimension? Are you assuming C to be M-by-M?



> If I don't have a size condition check, the
> device quickly runs out of resources (Error: ViennaCL: FATAL ERROR:
> CL_OUT_OF_RESOURCES ).  Any thoughts?  I feel like I must be missing
> something simple at this point.

Which values do you pass to the kernel? Which local and global work 
sizes do you use?




> __kernel void iMatMult(const int Mdim, const int MdimPad,
> const int Pdim, const int PdimPad,
> __global const int *A, __global const int *B,
> __global int *C) {
>
>  // Get the index of the elements to be processed
>  const int globalRow = get_global_id(0); // C Row ID
>  const int globalCol = get_global_id(1); // C Col ID
>  int tmp = 0;
>
>  if (globalRow > MdimPad || globalCol > MdimPad)
>  return;

Here it should be enough to check against Mdim.

>  printf("globalCol = %d\n", globalCol);
>  printf("globalRow = %d\n", globalRow);
>
>  // Do the operation
>  for(int k=0; k < Pdim; k++){
>  tmp += A[globalRow * MdimPad + k] * B[globalCol+PdimPad*k];

I'm confused with your use of MdimPad and PdimPad here. As currently 
written, A has Mdim columns, and B has Pdim columns. But this doesn't 
agree with the if-check above, where C is assumed Mdim-by-Mdim.

>  }
>
>  C[globalCol+MdimPad*globalRow] = tmp;

The last line assumes C to be M-by-M. Is this the case?

Best regards,
Karli




Re: [ViennaCL-devel] Custom OpenCL kernel confusion

2016-05-23 Thread Karl Rupp
Hey,

 > Ah yes, thanks Karl.  I remember that now.  With that said, are there
> recommendations on how kernels should be written to address the padded
> columns?  I am imagining some if/else or loop limits on indices but
> thought I would ask here before I start trying to do that.  I am trying
> to look through the kernels and I am seeing things along the lines of
> 'global_size(0) < size' where I assume size refers to one of the dimensions?

It depends on the respective assumptions and guarantees you make on the 
underlying data. The 'safest' way to deal with it is with something like 
the following (based on the kernel code you provided):

  const int globalRow = get_global_id(0); // C Row ID
  const int globalCol = get_global_id(1); // C Col ID
  int tmp = 0;

  if (globalRow >= rowsC || globalCol >= colsC)
return;

  for(int k=0; k < Msize; k++) // Msize instead of Mdim here!
tmp += A[k*Mdim+globalRow] * B[globalCol*Pdim+k];

  C[globalCol*Mdim+globalRow] = tmp;

Note that this code assumes column-major (Fortran) data layout, whereas 
the standard layout in ViennaCL is row-major (C).

> If so, I humbly recommend that although the padding is mentioned with
> respect to the matrix types either an example or explanation would be
> valuable in the custom kernel section (at the very least another
> friendly reminder).  Not all repetition is bad :)

Agreed. :-)

Best regards,
Karli


--
Mobile security can be enabling, not merely restricting. Employees who
bring their own devices (BYOD) to work are irked by the imposition of MDM
restrictions. Mobile Device Manager Plus allows you to control only the
apps on BYO-devices by containerizing them, leaving personal data untouched!
https://ad.doubleclick.net/ddm/clk/304595813;131938128;j
___
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel


Re: [ViennaCL-devel] Custom OpenCL kernel confusion

2016-05-23 Thread Karl Rupp
Hi,

On 05/23/2016 05:38 PM, Charles Determan wrote:
> I am experimenting with the custom OpenCL kernel functionality,
> specifically a naive matrix multiplication as an example.
>
> My OpenCL Kernel:
> __kernel void iMatMult(const int Mdim, const int Pdim,
> __global const int *A, __global const int *B,
> __global int *C) {
>
>  // Get the index of the elements to be processed
>  const int globalRow = get_global_id(0); // C Row ID
>  const int globalCol = get_global_id(1); // C Col ID
>  int tmp = 0;
>
>  // Do the operation
>  for(int k=0; k < Pdim; k++){
>  tmp += A[k*Mdim+globalRow] * B[globalCol*Pdim+k];
>  }
>  C[globalCol*Mdim+globalRow] = tmp;
> }
>
>
> Relevant C++ code
> where vcl_* refer to viennacl::matrix
> and my_kernel is a string referring to the kernel above:
>
>  int M = vcl_A.size2();
>  int P = vcl_A.size1();
>
>  // add kernel to program
>  viennacl::ocl::program & my_prog =
> viennacl::ocl::current_context().add_program(my_kernel, "my_kernel");
>
>  // get compiled kernel function
>  viennacl::ocl::kernel & my_kernel_mul = my_prog.get_kernel("iMatMult");
>
>  // execute kernel
>  viennacl::ocl::enqueue(my_kernel_mul(M, P, vcl_A, vcl_B, vcl_C));
>
>
> Oddly, the results in the vcl_C object are incorrect.  But if I manually
> go through the OpenCL using the C++ API the results are correct (which
> you can see the API code here
> https://github.com/cdeterman/gpuR/blob/develop/src/gpuMatrix_igemm.cpp).  Did
> I miss something?

yes, you missed the internal data layout: rows/columns may be padded 
with zeros:
http://viennacl.sourceforge.net/doc/manual-types.html#manual-types-matrix
(this has performance reasons, but it is - unfortunately - often 
overlooked by users)

Best regards,
Karli




>
>
>
>




Re: [ViennaCL-devel] gpuR support

2016-05-16 Thread Karl Rupp
Hi Charles,

 > In light of the recent question coming here regarding gpuR I was
> wondering if I should alter the error function within RViennaCL.  Do you
> mind the users may come here if a function throws a ViennaCL error?

As long as the ViennaCL mailing list don't get swamped by gpuR-related 
issues, I see no problem.

>  I
> am trying to implement many error checks from the R end and direct users
> of the R packages to file them within the github or contacting me but it
> is likely that users will sometimes see that same message and go
> immediately here.

You can fairly generically catch the exceptions thrown and add/replace 
whichever error gets thrown. That may already suffice. :-)

Best regards,
Karli





Re: [ViennaCL-devel] Forking multiple OpenCL contexts

2016-05-11 Thread Karl Rupp
Hi,

 > One last followup question on this line of thought.  I seem to be able
> to use the direct context in most of my functions but I am having
> trouble when I try to assign a previously declared class variable.
>
> In my class dynVCLMat
> (https://github.com/cdeterman/gpuR/blob/develop/inst/include/gpuR/dynVCLMat.hpp)
> I have 'A' declared as a viennacl::matrix
>
> I can assign this object with a new matrix:
>
>
>  viennacl::context
> ctx(viennacl::ocl::get_context(static_cast<long>(ctx_id)));
>  A = viennacl::matrix<T>(nr_in, nc_in, ctx);
>
> This normally works fine as long as the 'ctx_id' is the current
> context.  If it isn't (say the ctx_id is '1' when the current context is
> '0'), then I get the error:
>
>  >Error: ViennaCL: FATAL ERROR: CL_INVALID_MEM_OBJECT
>
> I'm trying to purge away my 'setContext()' calls within functions so
> they can fork safely but this doesn't appear to behave the way I was
> expecting.

It seems to me that your matrix A got created in the 'other' context (in 
this example: with ID 0) first. Automatic context migration is not 
supported, because that can cause quite a couple of undesired side effects.

If you want to make sure that 'A' resides on the correct context, call 
.switch_memory_context(new_ctx) first. For your example:

   viennacl::context ctx(viennacl::ocl::get_context(ctx_id));
   A.switch_memory_context(ctx);
   A = viennacl::matrix<T>(nr_in, nc_in, ctx);

Best regards,
Karli



>
> On Tue, May 10, 2016 at 10:54 AM, Charles Determan
> <cdeterma...@gmail.com <mailto:cdeterma...@gmail.com>> wrote:
>
> That does it, didn't think to look for an explicit context.hpp
> file.  Makes sense that the it is with the matrix.hpp file now.
>
> Thanks again,
> Charles
>
> On Tue, May 10, 2016 at 9:07 AM, Karl Rupp <r...@iue.tuwien.ac.at
> <mailto:r...@iue.tuwien.ac.at>> wrote:
>
> Hi,
>
> > After poking around with different headers it appears I need one of 
> the
>
> object type files such as "viennacl/matrix.hpp" in order to
> declare
> viennacl::context objects.  Shouldn't this be possible from just
> "viennacl/ocl/backend.hpp"?  Just curious on the design at
> this point.
>
>
> have you tried to include viennacl/context.hpp directly?
>
> Best regards,
> Karli
>
>
>
> On Tue, May 10, 2016 at 8:40 AM, Charles Determan
> <cdeterma...@gmail.com <mailto:cdeterma...@gmail.com>
> <mailto:cdeterma...@gmail.com
> <mailto:cdeterma...@gmail.com>>> wrote:
>
>  Excellent, thank you for explaining that.  One followup
> thing, I am
>  trying a trivial function to test this:
>
>  int findContext(int x){
>   viennacl::context
>  ctx(viennacl::ocl::get_context(static_cast<long>(x)));
>   return x;
>  }
>
>  but I keep getting a compiler error:
>
>   >error: variable ‘viennacl::context ctx’ has
> initializer but
>  incomplete type
>
>  Did I miss something?
>
>  Thanks,
>  Charles
>
>
>  On Mon, May 9, 2016 at 4:09 PM, Karl Rupp
> <r...@iue.tuwien.ac.at <mailto:r...@iue.tuwien.ac.at>
>  <mailto:r...@iue.tuwien.ac.at
> <mailto:r...@iue.tuwien.ac.at>>> wrote:
>
>  Hi Charles,
>
>  setContext() is not thread-safe, so if mclapply()
> is executing
>  in parallel, there will be a race. MPI works across
> processes,
>  so globals are not shared (and hence setContext()
> has no problems).
>
>  If you want to run multithreaded, you should manage
> the contexts
>  explicitly. That is, pass a viennacl::context
> object to the
>  respective constructors of viennacl::vector,
> viennacl::matrix,
>  etc. You can find examples here:
> 
> https://github.com/viennacl/viennacl-dev/blob/master/examples/tutorial/multithreaded.cpp
> 
> https://github.com/viennacl/viennacl-dev/blob/master/examples/tutorial/multithreaded_cg.cpp
>
>  Best regards,
>  Karli
>
>
>
>

Re: [ViennaCL-devel] Forking multiple OpenCL contexts

2016-05-09 Thread Karl Rupp
Hi Charles,

setContext() is not thread-safe, so if mclapply() is executing in 
parallel, there will be a race. MPI works across processes, so globals 
are not shared (and hence setContext() has no problems).

If you want to run multithreaded, you should manage the contexts 
explicitly. That is, pass a viennacl::context object to the respective 
constructors of viennacl::vector, viennacl::matrix, etc. You can find 
examples here:
https://github.com/viennacl/viennacl-dev/blob/master/examples/tutorial/multithreaded.cpp
https://github.com/viennacl/viennacl-dev/blob/master/examples/tutorial/multithreaded_cg.cpp

Best regards,
Karli



On 05/09/2016 10:32 PM, Charles Determan wrote:
> I am trying to use multiple GPU's in parallel using ViennaCL through my
> gpuR package.  In essence the context is selected at the R level (one
> context per device already initialized).  Then a matrix is created on
> that device.  The code looks like this:
>
> create_matrix <- function(id){
> # sets context
> setContext(id)
> # create matrix
> mat <- vclMatrix(rnorm(16), nrow=4, ncol=4)
> return(mat)
> }
>
> # a fork over context 1 and 2
> out <- mclapply(1:2, function(x) create_matrix(x))
>
> Yet, strangely this just hangs.  It doesn't return anything.  Perhaps I
> am missing something in how OpenCL contexts are handed in parallel?  I
> ask as I recall that PETSC had some multi GPU functionality with MPI.
> The above makes sense to me without MPI but again I may be missing
> something.
>
> The actual copying happens in this file at line 36
> (https://github.com/cdeterman/gpuR/blob/develop/src/dynVCLMat.cpp) just
> after "starting to copy".  It seems almost like it a problem with the
> class objects (defined in
> https://github.com/cdeterman/gpuR/blob/develop/inst/include/gpuR/dynVCLMat.hpp).
>
> Any insight would be appreciated.
>
> Regards,
> Charles
>
>
>




Re: [ViennaCL-devel] Setting context platforms and devices

2016-04-28 Thread Karl Rupp
Hi Charles,

 > Thanks for getting back, there isn't anything else at the moment.  The
> issue was needing to use setup_context to explicitly assign the devices.

ok, thanks for reporting.

> However, since I have you attention, is there any way for using multiple
> GPUs on a single job (e.g. GEMM)?
>
> Let's say I have two matrices that are too large for one GPU but will
> fit on both my GPUs.  If I create a ONE context with BOTH GPUs and I
> assign each matrix to one GPU would the normal GEMM workflow function
> normally?

No, currently this is not implemented. You would have to take care of 
all the data decomposition yourself. The OpenCL semantics of how memory 
buffers are managed don't help either; in the end, one is left with all 
the data juggling.

I don't have a long-term view of whether it is worth to implement such 
multi-GPU support. The (imho) best way to deal with multiple GPUs is via 
MPI, where one has to deal with distributed memory anyway. As such, 
using ViennaCL via PETSc gives you just that for sparse matrices, 
solvers, etc. At the same time, I understand that MPI is not always an 
option, for example in your context. It would be very helpful to know 
how important (vs. just "nice to have") such 'native' GPU support is for 
you and users of gpuR.

Best regards,
Karli




> On Thu, Apr 28, 2016 at 11:12 AM, Karl Rupp <r...@iue.tuwien.ac.at
> <mailto:r...@iue.tuwien.ac.at>> wrote:
>
> Hi Charles,
>
> > I am trying to set up a list of contexts whereby each context represents
>
> one platform and one device.  I was thinking the function found here
> (https://github.com/cdeterman/gpuR/blob/develop/src/context.cpp)
> starting at line 108 (listContexts) would work.
>
> However, as seen in this issue
> (https://github.com/cdeterman/gpuR/issues/9) this is not the
> case.  For
> some reason a user with two AMD GPUs, iterating through the
> platforms
> and devices with each iterative context id (using
> 'set_context_platform_index' and
> 'get_context(id).switch_device(gpu_idx)') results in only the
> first gpu
> as recognizable.
>
> Perhaps I have made some mistake in how these context functions
> work?  I
> can confirm access to the second gpu if referring directly to the
> 'platforms' object but not with respect to a context.  Any
> insight would
> be appreciated.
>
>
> I see that the referenced issue on GitHub seems to be resolved
> (sorry for not being able to answer sooner). Is there anything left
> that I should look into?
>
> Best regards,
> Karli
>
>


--
Find and fix application performance issues faster with Applications Manager
Applications Manager provides deep performance insights into multiple tiers of
your business applications. It resolves application problems quickly and
reduces your MTTR. Get your free trial!
https://ad.doubleclick.net/ddm/clk/302982198;130105516;z
___
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel


Re: [ViennaCL-devel] Handling a matrix from OpenCL

2016-03-30 Thread Karl Rupp
Hi Jose,

 > (Is it Karl or Karli?)

feel free to pick your favorite (just like Jose/Pepe) :-)


> Thanks for your quick answer! I should further investigate. However, I
> could not find how to execute an exclusive_scan() without downloading
> the number of elements per row from the GPU to the host.

exclusive_scan() is also available for vectors (more precisely: the base 
class vector_base<>):
https://github.com/viennacl/viennacl-dev/blob/master/viennacl/linalg/vector_operations.hpp#L1272

Here is an example of how to wrap a host buffer in a vector_base object 
in order to call exclusive_scan:
https://github.com/viennacl/viennacl-dev/blob/master/viennacl/linalg/host_based/amg_operations.hpp#L1444

(the OpenCL version works similarly - either provide your own OpenCL 
cl_mem buffer, or use the memory handle from the compressed_matrix).

Best regards,
Karli






> 2016-03-29 22:24 GMT+02:00 Karl Rupp <r...@iue.tuwien.ac.at
> <mailto:r...@iue.tuwien.ac.at>>:
>
> Hi Jose,
>
> > I have been looking for documentation regarding how to handle a matrix
>
> (sparse) in an OpenCL kernel. My main objective is to be able to
> fill a sparse matrix using an OpenCL kernel.
>
> Would it be possible??
>
> P.S. I can eventually precompute the matrix and indexes arrays
> sizes.
>
>
> Yes, that's possible. The typical procedure for doing so is:
>
> 1. Compute number of elements per row, store in a buffer
> 2. Run exclusive_scan() on the buffer to obtain the row-array for
> the three-array CSR format.
> 3. Allocate the column index and value array (CSR format)
> 4. Populate these two arrays with values
> 5. Pass the arrays to a viennacl::compressed_matrix<>.
>
> Instead of 5. you may already start with a
> viennacl::compressed_matrix<> and operate on the OpenCL buffers
> directly.
>
> There is no ready-to-go example, but the pattern shows up in e.g.
> the sparse matrix-matrix product
> (viennacl/linalg/opencl/sparse_matrix_operations.hpp).
>
> Best regards,
> Karli
>
>
>


--
Transform Data into Opportunity.
Accelerate data analysis in your applications with
Intel Data Analytics Acceleration Library.
Click to learn more.
http://pubads.g.doubleclick.net/gampad/clk?id=278785471=/4140
___
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel


Re: [ViennaCL-devel] Handling a matrix from OpenCL

2016-03-29 Thread Karl Rupp
Hi Jose,

 > I have been looking for documentation regarding how to handle a matrix
> (sparse) in an OpenCL kernel. My main objective is to be able to fill a sparse
> matrix using an OpenCL kernel.
>
> Would it be possible??
>
> P.S. I can eventually precompute the matrix and indexes arrays sizes.

Yes, that's possible. The typical procedure for doing so is:

1. Compute number of elements per row, store in a buffer
2. Run exclusive_scan() on the buffer to obtain the row-array for the 
three-array CSR format.
3. Allocate the column index and value array (CSR format)
4. Populate these two arrays with values
5. Pass the arrays to a viennacl::compressed_matrix<>.

Instead of 5. you may already start with a viennacl::compressed_matrix<> 
and operate on the OpenCL buffers directly.

There is no ready-to-go example, but the pattern shows up in e.g. the 
sparse matrix-matrix product 
(viennacl/linalg/opencl/sparse_matrix_operations.hpp).

Best regards,
Karli





Re: [ViennaCL-devel] FFT-2D

2016-03-02 Thread Karl Rupp
Hi Sumit,

 > I will try out the method you suggested. I have a second question
> regarding FFT. Is there any example in VCL to compute a 2D FFT? Can you
> share some pointers to that?

There are tests in tests/src which you can have a look at.


> Also, when computing the FFT do we have to ensure that the data is
> dyadic (or to be zero-padded to make it dyadic)?

You can pass any data size, but you will only see good performance with 
a power of 2.

Best regards,
Karli




>
> ----
> *From:* Karl Rupp <r...@iue.tuwien.ac.at>
> *To:* Sumit Kumar <dost_4_e...@yahoo.com>; viennacl-devel
> <viennacl-devel@lists.sourceforge.net>
> *Sent:* Tuesday, March 1, 2016 7:24 PM
> *Subject:* Re: [ViennaCL-devel] Examples of importing raw buffers
>
> Hi Sumit,
>
> sorry, I overlooked this email in my mailer's threaded view.
>
>
>  > I have a question here:
>  > a.) Suppose I have three raw buffers (let us assume float); I would like
>  > to do something like this:
>  > v = (I- a) / (W-a+EPS), where I, W and a are the three buffers and EPS
>  > is a scalar to prevent a division by 0 operation.
>  >
>  > b.) I know I can wrap these buffers to Eigen matrices and then copy them
>  > to VCL using the APIs. However, is there a way to do this step in a
>  > one-shot operation without actually wrapping them to Eigen? Essentially,
>  > this is an element-wise subtraction and a division happening
>  > simultaneously, and both are embarrassingly parallel operations.
>
> You can just use vector operations. Something like
>    viennacl::vector<float> eps = viennacl::scalar_vector<float>(N, EPS);
>    v = viennacl::linalg::element_div(I - a, W - a + eps);
> should do it.
>
>
>
>  > The second question I have is w.r.t OpenCL SDK selection:
>  > a.) Let us assume I have an Intel CPU with an AMD GPU. VCL supports
>  > OpenCL backends for both CPU and GPU and I have the liberty to choose
>  > any device and any backend.
>  > b.) If I were to have this combination, how would I go about building
>  > the code? Do I need to download SDK's for both combinations (so that I
>  > can distribute both binaries?). I had posed this question on the OpenCL
>  > users group in LinkedIn but got no response so thought of asking you
>  > instead.
>
> The Intel OpenCL SDK will only support CPUs. The only way to get access
> to the AMD GPU is to install the AMD GPU driver (formerly this was
> bundled in the AMD APP SDK). You have to query the OpenCL platform
> properties (e.g. vendor name) at runtime to select the correct platform.
> In ViennaCL you can, for example, check the id() of the platform, which
> is usually the same for a given vendor across different machines.
>
>
> Best regards,
> Karli
>
>
>
>




Re: [ViennaCL-devel] Examples of importing raw buffers

2016-03-01 Thread Karl Rupp
Hi Sumit,

sorry, I overlooked this email in my mailer's threaded view.


 > I have a question here:
> a.) Suppose I have three raw buffers (let us assume float); I would like
> to do something like this:
> v = (I- a) / (W-a+EPS), where I, W and a are the three buffers and EPS
> is a scalar to prevent a division by 0 operation.
>
> b.) I know I can wrap these buffers to Eigen matrices and then copy them
> to VCL using the APIs. However, is there a way to do this step in a
> one-shot operation without actually wrapping them to Eigen? Essentially,
> this is an element-wise subtraction and a division happening
> simultaneously, and both are embarrassingly parallel operations.

You can just use vector operations. Something like

  viennacl::vector<float> eps = viennacl::scalar_vector<float>(N, EPS);
  v = viennacl::linalg::element_div(I - a, W - a + eps);

should do it.



> The second question I have is w.r.t OpenCL SDK selection:
> a.) Let us assume I have an Intel CPU with an AMD GPU. VCL supports
> OpenCL backends for both CPU and GPU and I have the liberty to choose
> any device and any backend.
> b.) If I were to have this combination, how would I go about building
> the code? Do I need to download SDK's for both combinations (so that I
> can distribute both binaries?). I had posed this question on the OpenCL
> users group in LinkedIn but got no response so thought of asking you
> instead.

The Intel OpenCL SDK will only support CPUs. The only way to get access 
to the AMD GPU is to install the AMD GPU driver (formerly this was 
bundled in the AMD APP SDK). You have to query the OpenCL platform 
properties (e.g. vendor name) at runtime to select the correct platform. 
In ViennaCL you can, for example, check the id() of the platform, which 
is usually the same for a given vendor across different machines.

Best regards,
Karli





Re: [ViennaCL-devel] Rank/Sort Functionality?

2016-02-24 Thread Karl Rupp
Hi Charles,

 > I read on a prior post on Karl Rupp's website regarding project ideas
> for GSoC 2014
> (https://www.karlrupp.net/2014/02/mentored-project-ideas-for-gsoc-2014/).
> One of those was to implement sort() methods.  Was this ever
> accomplished/implemented?  I haven't found anything in the documentation
> but again it is possible I have overlooked it.  This would be very
> valuable for rank based methods such as simple correlation methods like
> Kendall and Spearman.

no, sort() was never implemented (other projects were run in that GSoC). 
Sorting will be implemented sooner or later, but it won't make it for 1.8.0.

Best regards,
Karli



Re: [ViennaCL-devel] Current context index?

2016-02-01 Thread Karl Rupp
Hi Charles,

 > Sorry to continue this thread but I am very confused as I am trying to
> add the public method you have suggested.  I thought the following
> simple function would be sufficient in the backend.hpp file.
>
>/** @brief Returns the current context index */
>long current_context_id()

please add the 'static' keyword before 'long'. This should render
  viennacl::ocl::backend<>::current_context_id()
valid.

Best regards,
Karli


>{
>  return current_context_id_;
>}
>
> However, I am unable to call this function.  I have tried:
>
> long context_id = viennacl::ocl::backend<>::current_context_id();
>
> but I just get an error:
>
> error: cannot call member function ‘long int
> viennacl::ocl::backend<dummy>::current_context_id() [with bool dummy =
> false]’ without object
>
> Have I missed something with the structure of these files that is
> preventing this from working?
>
> Regards,
> Charles
>
> On Fri, Jan 29, 2016 at 2:04 PM, Karl Rupp <r...@iue.tuwien.ac.at> wrote:
>
> Hi Charles,
>
> sorry for the late response. Currently there is no way of getting
> the internal index. You can, however, edit viennacl/ocl/backend.hpp
> and add a public member function in viennacl::ocl::backend returning
> the index.
>
> Best regards,
> Karli
>
>
>
> On 01/29/2016 04:00 PM, Charles Determan wrote:
>
> While trying to figure out the device index I thought the following
> would work:
>
> viennacl::ocl::current_context().current_device_id_
>
> but apparently the `current_device_id_` is private (which didn't
> appear
> to be the case in the context.hpp file).  Perhaps a simple accessor
> method could be added?  If this functionality doesn't exist I can
> likely write
> this.
>
> I still have made no headway on the context index.
>
> Regards,
> Charles
>
> On Thu, Jan 28, 2016 at 1:30 PM, Charles Determan
> <cdeterma...@gmail.com> wrote:
>
>  A user can switch contexts easily with
>
>  long id = 1;
>  viennacl::ocl::switch_context(id);
>
>  Is there a way to determine the current context index?  In
> this case
>  it would return 1?
>
>  The corollary for platform is:
>
>  viennacl::ocl::current_context().platform_index()
>
>
>  I also don't see a method for device_index() either.  But it is
>  possible I am simply overlooking this somewhere in the
> documentation.
>
>  Regards,
>  Charles
>
>
>
>
>
>




Re: [ViennaCL-devel] Current context index?

2016-01-29 Thread Karl Rupp
Hi Charles,

sorry for the late response. Currently there is no way of getting the 
internal index. You can, however, edit viennacl/ocl/backend.hpp and add 
a public member function in viennacl::ocl::backend returning the index.

Best regards,
Karli



On 01/29/2016 04:00 PM, Charles Determan wrote:
> While trying to figure out the device index I thought the following
> would work:
>
> viennacl::ocl::current_context().current_device_id_
>
> but apparently the `current_device_id_` is private (which didn't appear
> to be the case in the context.hpp file).  Perhaps a simple accessor
> method could be added?  If this functionality doesn't exist I can likely write
> this.
>
> I still have made no headway on the context index.
>
> Regards,
> Charles
>
> On Thu, Jan 28, 2016 at 1:30 PM, Charles Determan <cdeterma...@gmail.com> wrote:
>
> A user can switch contexts easily with
>
> long id = 1;
> viennacl::ocl::switch_context(id);
>
> Is there a way to determine the current context index?  In this case
> it would return 1?
>
> The corollary for platform is:
>
> viennacl::ocl::current_context().platform_index()
>
>
> I also don't see a method for device_index() either.  But it is
> possible I am simply overlooking this somewhere in the documentation.
>
> Regards,
> Charles
>
>
>
>
>




Re: [ViennaCL-devel] resizing columns to front of matrix?

2016-01-20 Thread Karl Rupp
Hi Charles,

 > Okay, I thought that would be a workaround.  Does the current `resize`
> method create a copy of the matrix internally as well?  Ideally one
> could resize without the copying overhead.

The resize() method allocates a new buffer and copies over the old 
entries. Pretty much the same as in the workaround solution I suggested.

> Also, would you consider this as new functionality to be added in the
> future?  If so I can create a new github issue but if not, this
> workaround should be sufficient unless there is a way to avoid the
> aforementioned copy overhead.

Do you happen to have a suggestion for an intuitive interface? I can't 
think of a good one, so I'd rather leave it as-is with the explicit 
copying as suggested.

Best regards,
Karli



> On Wed, Jan 20, 2016 at 11:08 AM, Karl Rupp <r...@iue.tuwien.ac.at> wrote:
>
> Hi Charles,
>
> > I can easily resize a matrix with the `resize` function
>
>
> viennacl::matrix<T> vcl_A(3,3);
> vcl_A.resize(3,4);
>
> but I notice that this adds the new column on the end of the
> matrix.  Is
> there a way to append it on to the front of the matrix?
>
>
> Not directly. But you can create a new matrix and then use
> viennacl::project() to place the old entries at the correct spot:
>
> matrix<T> B(3,4);
> matrix_range<matrix<T> > B_right(B, range(0, 3), range(1, 4));
> B_right = A;
>
>
>
> Best regards,
> Karli
>
>




[ViennaCL-devel] ViennaCL 1.7.1 released!

2016-01-20 Thread Karl Rupp
Dear ViennaCL users,

ViennaCL 1.7.1 has just been released and is available for download at
http://viennacl.sourceforge.net/

The highlights of this new bugfix release are:
  * Fixed performance regression with newer AMD drivers for dense 
matrix-matrix products on AMD GPUs.
  * trans() now takes arbitrary matrix expressions as input
  * Improved performance of y += Ax and y -= Ax for sparse matrix A
  * Better support for systems with maximum OpenCL workgroup size 1
  * Runtime selection of best SpMV kernel on NVIDIA hardware
  * Improved OpenMP performance for BLAS level 1 and 2 operations

A full list of changes is available here:
  http://viennacl.sourceforge.net/doc/changelog.html

Many thanks to the community for the precious input!

Best regards,
Karl Rupp




Re: [ViennaCL-devel] Switching context device types

2016-01-15 Thread Karl Rupp
Hi Charles,

 > I know I can set a device type with:
>
> viennacl::ocl::set_context_device_type(id, viennacl::ocl::gpu_tag());
>
> but is there a way for me to subsequently change that to `cpu_tag()`
> within the same main() function?

Depends on what you want to achieve. You can call 
set_context_device_type() multiple times and reset the device type. 
However, once you allocate a matrix or vector, the context gets 
initialized and hence you can no longer change the device type.


>  I see I can change devices with:
>
> viennacl::ocl::current_context().switch_device(1);
>
> but, if I understand correctly, that just switches between devices of the
> same type.

You can place all available devices into the same context and then 
switch between them via current_context().switch_device().
If you want different devices to work on the same OpenCL memory (e.g. 
two devices working on the same viennacl::vector), then this is what you 
should do. Keep in mind, however, that moving memory between devices is 
fairly expensive.

It is, however, not possible to reinitialize the same context with a new 
device, because this requires to deallocate all memory buffers, destruct 
all OpenCL programs, etc.

Best regards,
Karli




Re: [ViennaCL-devel] tests failed

2015-12-15 Thread Karl Rupp
Hi,

> I installed pocl-0.12 in Fedora-23, but tests are still failing. (...)

Ok, this looks like a bug on our side, as the generator fails to pick a 
proper fallback. Is upgrading to ViennaCL 1.7.1 (which I'll roll out in 
the next few days and which will contain a fix) an option for you?

Best regards,
Karli




Re: [ViennaCL-devel] tests failed

2015-12-08 Thread Karl Rupp
Hi,

these errors look a lot like your GPU driver isn't installed properly. 
Could you please try to reinstall the GPU driver?

Best regards,
Karli


On 12/08/2015 08:01 PM, Ilya Gradina wrote:
> Hi all, I am building ViennaCL for Fedora-Rawhide:
> https://github.com/neurofedora/ViennaCL/raw/master/viennacl.spec
>
> result build use mock:
>
> 49% tests passed, 37 tests failed out of 73
>
> Total Test time (real) = 116.79 sec
>
> The following tests FAILED:
>   37 - bisect-opencl (OTHER_FAULT)
>   38 - matrix_product_float-opencl (OTHER_FAULT)
>   39 - matrix_product_double-opencl (OTHER_FAULT)
>   40 - blas3_solve-opencl (OTHER_FAULT)
>   41 - fft_1d-opencl (OTHER_FAULT)
>   42 - fft_2d-opencl (OTHER_FAULT)
>   43 - iterators-opencl (OTHER_FAULT)
>   44 - global_variables-opencl (OTHER_FAULT)
>   45 - matrix_convert-opencl (OTHER_FAULT)
>   46 - matrix_vector-opencl (OTHER_FAULT)
>   47 - matrix_vector_int-opencl (OTHER_FAULT)
>   48 - matrix_row_float-opencl (OTHER_FAULT)
>   49 - matrix_row_double-opencl (OTHER_FAULT)
>   50 - matrix_row_int-opencl (OTHER_FAULT)
>   51 - matrix_col_float-opencl (OTHER_FAULT)
>   52 - matrix_col_double-opencl (OTHER_FAULT)
>   53 - matrix_col_int-opencl (OTHER_FAULT)
>   54 - nmf-opencl (OTHER_FAULT)
>   55 - qr_method-opencl (OTHER_FAULT)
>   56 - qr_method_func-opencl (OTHER_FAULT)
>   57 - scan-opencl (OTHER_FAULT)
>   58 - scalar-opencl (OTHER_FAULT)
>   59 - self_assign-opencl (OTHER_FAULT)
>   60 - sparse-opencl (OTHER_FAULT)
>   61 - sparse_prod-opencl (OTHER_FAULT)
>   62 - structured-matrices-opencl (OTHER_FAULT)
>   63 - svd-opencl (OTHER_FAULT)
>   64 - tql-opencl (OTHER_FAULT)
>   65 - vector_convert-opencl (OTHER_FAULT)
>   66 - vector_float_double-opencl (OTHER_FAULT)
>   67 - vector_int-opencl (OTHER_FAULT)
>   68 - vector_uint-opencl (OTHER_FAULT)
>   69 - vector_multi_inner_prod-opencl (OTHER_FAULT)
>   70 - spmdm-opencl (OTHER_FAULT)
>   71 - libviennacl-blas1 (OTHER_FAULT)
>   72 - libviennacl-blas2 (OTHER_FAULT)
>   73 - libviennacl-blas3 (OTHER_FAULT)
>
> How can I fix these errors?
> Any ideas?
>
> Thanks.
>
>




Re: [ViennaCL-devel] Help in Interfacing another BLAS library?

2015-11-02 Thread Karl Rupp
Hi Oswin,

 >> FYI: ViennaCL 2.x will provide all functionality via a shared C
>> library in order to make cases such as the one you describe much
>> easier. This means that there will be stable function names for all
>> operations. An optional C++ layer will sit on top, rather than the
>> other way round as it is the case now...
>
> Oh this sounds great! I am in no hurry, I can wait for this. Is there
> some way I can help to make this happen? I am not good in OpenCL/CUDA
> but I am pretty decent when it comes to C++, so if I can help speed up
> the process...

thanks for the offer. C++-wise things are pretty much settled, most of 
the work to come is actually in low-level optimizations in each of the 
three backends. Still, any kind of feedback (improvements to API, 
performance regressions, bugreports, etc.) is highly appreciated and 
equally important.

Most of the updates to ViennaCL in the near future relate to better 
performance (ViennaCL 1.7.x) and extending some functionality only 
available in the OpenCL-backend to the other backends (1.8.x). I think 
this will take until about the end of the year.

2.0.0 will be quite a leap, so it may take until around April 2016 to 
have it released. There will be a separate development branch for early 
feedback, though. :-)

Best regards,
Karli




Re: [ViennaCL-devel] Help in Interfacing another BLAS library?

2015-10-31 Thread Karl Rupp
Hi Oswin,

 >> Unfortunately, C++ template metaprogramming is not very friendly to
>> exchanging library interfaces at this point, free functions would
>> be a lot easier.
> I know, I have looked into the interfaces and figured that there might
> be some rough edges to get around. My biggest concern are naming
> collisions which might make 1:1 replacements hard, e.g.
> ublas/shark::blas::matrix_expression vs viennacl::matrix_expression,
> which mean different things. Also, my library supports mixing of
> row/column major in expressions which is not supported by viennacl yet,
> therefore I expect some clashes there.

yeah, I bet that any attempt of trying to 'exchange' libraries based on 
such expression template magic will not work satisfactorily. This is too 
much an 'accident' of C++... ;-)


> My current plan would be to use my expression system as frontend and map
> that to the computational kernels of viennaCL in the same way I did it
> with the ATLAS backend - just with a few more bells and whistles to
> ensure that cpu stuff only interacts with cpu stuff and gpu-stuff only
> with gpu.
>
> this means i would use the exposed interface in e.g.
> viennacl/linalg/opencl/matrix_operations.hpp and more or less ignore the
> expression templates.

If you want to have backend-independence, use the functions in 
viennacl/linalg/X_operations.hpp instead. These functions dispatch 
into the respective backend. At the same time, once you are down to 
mapping your expression system to the computational kernels, you can map 
them either way. For example, once you have identified a vector 
addition, you can either call 'x = y + z;' or 
'viennacl::linalg::avbv(..)'. The former is more stable API-wise.


> Is this okay or is this somehow "unstable" or
> "might change randomly in future releases"?

The backend function names don't have any guarantee on 
forward-compatibility, but it is very unlikely to change in the ViennaCL 
1.x.y release series.

FYI: ViennaCL 2.x will provide all functionality via a shared C library 
in order to make cases such as the one you describe much easier. This 
means that there will be stable function names for all operations. An 
optional C++ layer will sit on top, rather than the other way round as 
it is the case now...

Best regards,
Karli






Re: [ViennaCL-devel] Help in Interfacing another BLAS library?

2015-10-30 Thread Karl Rupp
Hi Oswin,

 > I am currently investigating whether ViennaCL can be used as a possible
> GPU backend for our linear Algebra as part of the Machine Learning
> Library Shark. We started by using boost::ublas but then decided to
> reimplement it in a better way(e.g. we dropped about 70% of ublas code,
> redesigned the internal interfaced an now we have an ATLAS backend for
> CPU). Therefore, large parts of our library can be used as drop-in
> replacement for ublas. This makes ViennaCL very interesting for us.
> Unfortunately we will have to add a few operators and other functions
> as well
> (e.g. we drop the "element_" in element_cos or have operator* as
> elementwise multiplication, just because we deal with a lot of nonlinear
> computations.).

Unfortunately, C++ template metaprogramming is not very friendly to 
exchanging library interfaces at this point, free functions would be a 
lot easier.


> Therefore my question:
> In case i run into trouble at some point, can I ask here for some help
> on how to do it in a ViennaCL
> compatible way?

Sure, that's what the mailinglists are for :-)
If you have a bunch of smaller questions which require back-and-forth, we
can also meet in the #viennaCL IRC channel at freenode if you want. Just 
let me know when to turn up.

Best regards,
Karli




Re: [ViennaCL-devel] HSA backend

2015-10-05 Thread Karl Rupp
Hi Vladimir,

 > As part of the masters thesis I've developed a HSA (hsafoundation.com
> ) backend for viennacl library -
> https://github.com/vpa1977/viennacl-dev/ . It is still work in progess,
> but i was wondering if it would prove useful enough to be integrated
> into mainline some time in the future?

great, this is definitely of interest.

As far as I can see, your HSA-backend follows the OpenCL backend very 
closely. Ideally all the common code paths can be shared among the 
OpenCL and HSA backends, so it would be helpful if you can keep that in 
mind during your further development.

If you have any questions, don't hesitate to ask.

Best regards,
Karli




Re: [ViennaCL-devel] Non-contiguous matrix subset?

2015-08-18 Thread Karl Rupp
Hi Charles,

this is currently not implemented. I thought about gather/scatter 
routines such that you can extract arbitrary patterns from a matrix, but 
this is (again) not yet available.

Best regards,
Karli


On 08/18/2015 09:13 PM, Charles Determan wrote:
 I have read in the documentation that I can use viennacl::range  or
 viennacl::slice to subset a matrix.  However, is there a way to subset a
 matrix with indices that aren't in a pattern?

 Lets say I have a matrix with dimensions 10x10 and I want to subset the
 1,2,5,8, and 9 columns so the resulting matrix would be 10x5.  Is there
 a way to accomplish this with a viennacl::matrix?

 Regards,
 Charles







Re: [ViennaCL-devel] Matrix column or row index?

2015-08-17 Thread Karl Rupp
Hi,
  Fair enough, thank you Karl.  One follow-up question, where is this in
 the documentation?  I am trying to get a feel of where everything is so
 I don't have to ask these simple questions on here.

This is listed in the 'Basic Operations'-Chapter:
http://viennacl.sourceforge.net/doc/manual-operations.html#manual-operations-row-column-diagonal

All supported operations are supposed to be described in the 'User 
Manual'. With the operator-overloading in C++ it is not easy to list all 
available operations in e.g. a single large table of function names. If 
you have suggestions for further improvements, please let us know. :-)

Best regards,
Karli



 On Mon, Aug 17, 2015 at 8:15 AM, Karl Rupp <r...@iue.tuwien.ac.at> wrote:

 Hi Charles,

 you can extract the row or column of a matrix using viennacl::row()
 and viennacl::column(), e.g.

 viennacl::matrix<T> A(N, N);
 viennacl::vector<T> x(N);

 x = viennacl::row(A, 2);
 x = viennacl::column(A, 3);

 The assignment of a vector to a matrix row or matrix column is more
 involved, though...

 In either case you really want to iterate over elements and use
 per-element access. This is because each access entails a
 host-device communication, which is very costly with CUDA and OpenCL.

 Best regards,
 Karli




 On 08/17/2015 03:08 PM, Charles Determan wrote:

 I have seen in the documentation that it is simple to access an
 individual element in a viennacl::matrix with

 mat(i,j)

 but how could I access an entire row or column?  Is there
 similar syntax
 or would I need to iterate over each element of a row/column?

 Thanks,
 Charles


 







Re: [ViennaCL-devel] Matrix column or row index?

2015-08-17 Thread Karl Rupp
Hi Charles,

you can extract the row or column of a matrix using viennacl::row() and 
viennacl::column(), e.g.

viennacl::matrix<T> A(N, N);
viennacl::vector<T> x(N);

x = viennacl::row(A, 2);
x = viennacl::column(A, 3);

The assignment of a vector to a matrix row or matrix column is more 
involved, though...

In either case you really don't want to iterate over elements and use 
per-element access: each access entails a host-device 
communication, which is very costly with CUDA and OpenCL.

Best regards,
Karli



On 08/17/2015 03:08 PM, Charles Determan wrote:
 I have seen in the documentation that it is simple to access an
 individual element in a viennacl::matrix with

 mat(i,j)

 but how could I access an entire row or column?  Is there similar syntax
 or would I need to iterate over each element of a row/column?

 Thanks,
 Charles





--
___
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel


Re: [ViennaCL-devel] ViennaCL reductions

2015-08-05 Thread Karl Rupp
Hi Sumit,

  Bah :) Looks like I am the first one to ask about this. Yeah, that would
 be helpful (at least int and long)

Charles opened up an issue for tracking purposes:
https://github.com/viennacl/viennacl-dev/issues/153


 The second one is moot. So, if CSR is not supported for column-major
 matrices, then the only way to go about it would be to implicitly cast away
 the column-major order and make it row-major. Thus, something like this in VCL
 Eigen: Storage orders
 http://eigen.tuxfamily.org/dox/group__TopicStorageOrders.html
   
 Would be helpful without having to create intermediates.

Well, this is a matter of data-structures, so implicit casts will not be 
enough. Explicit transpositions are, unfortunately, costly.


 Also, I encountered another problem: One of my matrix multiplications
 returned all zeros. When I tried to copy an all-zero matrix to the host
 VCL threw an exception in the copy sequence. I think the correct way to
 do this would be:
 a.) Not do any matrix multiplication if all elements are zeros
 b.) If done, then it should fill up the matrix with zeros and return it
 without throwing an assertion.

This sounds like this could also be a bug. What was the matrix size?

Best regards,
Karli



 
 *From:* Karl Rupp r...@iue.tuwien.ac.at
 *To:* Sumit Kumar dost_4_e...@yahoo.com
 *Cc:* viennacl-devel@lists.sourceforge.net
 *Sent:* Monday, August 3, 2015 1:23 AM
 *Subject:* Re: [ViennaCL-devel] ViennaCL reductions

 Hi Sumit,

   I was trying to run sparse matrix multiplication, but one of my explicit
   template typedefs had Int. After some digging, I found out that CSR only
   supported Float or double. Is there any reason for this? Can we also
   have support for other templates? (like int ?)

 It is technically possible, but we haven't implemented it yet. After 5
 years you are the first to even ask for it ;-)


   Another thing would be the alignment order. Suppose I have a Row-major
   Sparse Eigen matrix, then I can copy it to a (Row-Major ?) VCL
   compressed matrix. What about a column-major sparse Eigen matrix?

 If you can point me to a fast, massively parallel column-major
 matrix-vector multiplication routine, I'll look into it. However, as far as
 I know, there is no such routine for general sparse matrices, hence it
 does not make sense for us to support it.

 Best regards,
 Karli






   
   *From:* Karl Rupp r...@iue.tuwien.ac.at
   *To:* Sumit Kumar dost_4_e...@yahoo.com
   *Cc:* viennacl-devel@lists.sourceforge.net
   *Sent:* Friday, July 31, 2015 9:04 PM
   *Subject:* Re: [ViennaCL-devel] ViennaCL reductions
  
   Hi Sumit,
  
 I am aware that Eigen can do it for its matrices and I am also aware
 that VCL cannot do it natively. My question was this:
 In your example of interfacing with Eigen, you have shown a VCL dense
 matrix interfacing with an Eigen dense matrix. Do you have any example
 of interfacing an Eigen Sparse matrix with a VCL dense matrix?
  
   No, we don't have this.
  
  
  
  
   Best regards,
   Karli
  
  
  
  





--
___
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel


Re: [ViennaCL-devel] ViennaCL Matrix multiplication

2015-08-05 Thread Karl Rupp
On 08/04/2015 08:21 AM, Sumit Kumar wrote:
 Hi Karl
 Based on all your suggestions,
 I came up with this implementation:

// Copy sparse Eigen matrices to dense ViennaCL matrices
typedef Eigen::Matrix<ScalarType, Eigen::Dynamic, Eigen::Dynamic,
                      Eigen::RowMajor> RMMatrix;
viennacl::matrix<ScalarType, viennacl::row_major>
  vcl_A(source->rows(), source->cols());
viennacl::copy(RMMatrix(*source), vcl_A);
viennacl::matrix<ScalarType, viennacl::row_major>
  vcl_B(target->rows(), target->cols());
viennacl::copy(RMMatrix(*target), vcl_B);
viennacl::matrix<ScalarType, viennacl::row_major>
  vcl_C(result->rows(), result->cols());
// Perform the matrix multiplication on the GPU.
vcl_C = viennacl::linalg::prod(vcl_A, vcl_B);
// Copy the result back to the host matrix
RMMatrix temp = RMMatrix(*result);
viennacl::copy(vcl_C, temp);
(*result) = temp.sparseView();

You should really double-check your uses of RMMatrix. Each of the calls 
to copy() creates a temporary Eigen matrix from the buffer, resulting in 
another copy. fast_copy() is *much* more appropriate here. async_copy() 
is not needed here, because you don't have other computations for 
overlapping host-device-host transfers.

The sparse-to-dense conversion depends on your use case. If you have 
more than ~10 percent nonzeros in your matrix, a dense matrix-matrix 
product may pay off. It depends a lot on the sparsity pattern of 
the result matrix, which can be fairly hard to predict.

Best regards,
Karli


--
___
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel


Re: [ViennaCL-devel] ViennaCL reductions

2015-08-05 Thread Karl Rupp

 http://arxiv.org/pdf/math/0307321v2.pdf
 Can this be implemented? I am not aware of Group theory :)
 Apparently, this has a complexity of O(n^2) !

Well, O(n^2) does not have any practical value without specifying the 
constants. For example, 100*n^2 won't be faster than 2*n^3 for 
any practical matrix sizes ;-)

Either way, we are happy about any contributions. :-)

Best regards,
Karli



 
 *From:* Karl Rupp r...@iue.tuwien.ac.at
 *To:* Sumit Kumar dost_4_e...@yahoo.com
 *Cc:* viennacl-devel@lists.sourceforge.net
 viennacl-devel@lists.sourceforge.net
 *Sent:* Monday, August 3, 2015 1:23 AM
 *Subject:* Re: [ViennaCL-devel] ViennaCL reductions

 Hi Sumit,

   I was trying to run sparse matrix multiplication, but one of my explicit
   template typedefs had Int. After some digging, I found out that CSR only
   supported Float or double. Is there any reason for this? Can we also
   have support for other templates? (like int ?)

 It is technically possible, but we haven't implemented it yet. After 5
 years you are the first to even ask for it ;-)


   Another thing would be the alignment order. Suppose I have a Row-major
   Sparse Eigen matrix, then I can copy it to a (Row-Major ?) VCL
   compressed matrix. What about a column-major sparse Eigen matrix?

 If you can point me to a fast, massively parallel column-major
 matrix-vector multiplication routine, I'll look into it. However, as far as
 I know, there is no such routine for general sparse matrices, hence it
 does not make sense for us to support it.

 Best regards,
 Karli






   
   *From:* Karl Rupp r...@iue.tuwien.ac.at
   *To:* Sumit Kumar dost_4_e...@yahoo.com
   *Cc:* viennacl-devel@lists.sourceforge.net
   *Sent:* Friday, July 31, 2015 9:04 PM
   *Subject:* Re: [ViennaCL-devel] ViennaCL reductions
  
   Hi Sumit,
  
 I am aware that Eigen can do it for its matrices and I am also aware
 that VCL cannot do it natively. My question was this:
 In your example of interfacing with Eigen, you have shown a VCL dense
 matrix interfacing with an Eigen dense matrix. Do you have any example
 of interfacing an Eigen Sparse matrix with a VCL dense matrix?
  
   No, we don't have this.
  
  
  
  
   Best regards,
   Karli
  
  
  
  





--
___
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel


Re: [ViennaCL-devel] ViennaCL fast_copy() matrices

2015-08-05 Thread Karl Rupp

 I am aware of the methodology. However, is there any example / can you
 share some examples of interfacing an Eigen matrix with fast_copy ? As
 per the API, if we detect an Eigen Matrix then something specific for
 VCL & Eigen comes up.

Sumit, don't get me wrong, but this is basic C/C++. Just grab a pointer 
to the internal data from an Eigen matrix with .data() and pass it to 
fast_copy().

Best regards,
Karli



 
 *From:* Karl Rupp r...@iue.tuwien.ac.at
 *To:* Sumit Kumar dost_4_e...@yahoo.com
 *Cc:* viennacl-devel@lists.sourceforge.net
 *Sent:* Wednesday, August 5, 2015 7:38 PM
 *Subject:* Re: [ViennaCL-devel] ViennaCL fast_copy() matrices

 Hi,

   I have a dense matrix (2048 x 2048) on the GPU and I want to copy it
   back to a host matrix on the CPU (2048 x 2048). Is there any example of
   using fast_copy() ? 2048 is a multiple of 128.

 examples/benchmarks/dense_blas.cpp

 http://viennacl.sourceforge.net/doc/namespaceviennacl.html#a92695ea6fa06cb0b9d940bb14080b62a

 Best regards,
 Karli

   
   *From:* Karl Rupp r...@iue.tuwien.ac.at
   *To:* Sumit Kumar dost_4_e...@yahoo.com
   *Cc:* viennacl-devel@lists.sourceforge.net
   *Sent:* Monday, August 3, 2015 1:23 AM
   *Subject:* Re: [ViennaCL-devel] ViennaCL reductions
  
   Hi Sumit,
  
 I was trying to run sparse matrix multiplication, but one of my
 explicit
 template typedefs had Int. After some digging, I found out that
 CSR only
 supported Float or double. Is there any reason for this? Can we also
 have support for other templates? (like int ?)
  
   It is technically possible, but we haven't implemented it yet. After 5
   years you are the first to even ask for it ;-)
  
  
 Another thing would be the alignment order. Suppose I have a Row-major
 Sparse Eigen matrix, then I can copy it to a (Row-Major ?) VCL
 compressed matrix. What about a column-major sparse Eigen matrix?
  
   If you can point me to a fast, massively parallel column-major
   matrix-vector multiplication routine, I'll look into it. However, as far as
   I know, there is no such routine for general sparse matrices, hence it
   does not make sense for us to support it.
  
   Best regards,
   Karli
  
  
  
  
  
  

 
  *From:* Karl Rupp r...@iue.tuwien.ac.at
  *To:* Sumit Kumar dost_4_e...@yahoo.com
  *Cc:* viennacl-devel@lists.sourceforge.net



 *Sent:* Friday, July 31, 2015 9:04 PM
 *Subject:* Re: [ViennaCL-devel] ViennaCL reductions

 Hi Sumit,

   I am aware that Eigen can do it for its matrices and I am also
 aware
   that VCL cannot do it natively. My question was this:
   In your example of interfacing with Eigen, you have shown a VCL
 dense
   matrix interfacing with an Eigen dense matrix. Do you have any
 example
   of interfacing an Eigen Sparse matrix with a VCL dense matrix?

 No, we don't have this.




 Best regards,
 Karli




  
  
  





--
___
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel


Re: [ViennaCL-devel] CUDA slower than OpenCL in new R implementation?

2015-08-03 Thread Karl Rupp

 I am glad that I can at least understand why I am seeing this
 difference.  I absolutely think the CUDA 'port' should be added to
 ViennaCL.  It certainly may be preferable to some to call the direct
 cuBLAS routines but I am in favor of trying to find a balance between
 speed and 'ease-of-use'.  From my point of view, having both optimized
 OpenCL and CUDA kernels would be a great selling point for ViennaCL.

well, we would actually call the cuBLAS routines internally, so a user 
would not get in touch with it at all. Performance *and* ease-of-use so 
to say ;-)

Best regards,
Karli




  On Mon, Aug 3, 2015 at 7:37 AM, Karl Rupp r...@iue.tuwien.ac.at wrote:

 Hi Charles,

  I was benchmarking 4096x4096 matrices (again, with my R bindings).  By

 'slower' I mean that I am observing OpenCL at this size beating the
 OpenBLAS CPU implementation by over 2X but the CUDA
 implementation is
 nearly 5X slower than the CPU.  This seemed odd to me that the CUDA
 would be so much slower than the OpenCL, hence my initial thought to
 invite others to review my code if I am making some sort of silly
 mistake.  Otherwise I was intending to begin trying to pursue direct
 cublas methods but I would very much prefer to use ViennaCL.


  okay, in this case what Philippe said was just the full answer. Our
 OpenCL kernels are highly GPU-specific and generate a 'good' kernel
 at runtime. We haven't 'ported' (i.e. a one-to-one translation from
 OpenCL to CUDA) these kernels to the CUDA backend yet, so only a
 fallback kernel is used for the CUDA backend. It should be possible
 to carry these over with not too much effort, but in such case it
 makes more sense to just call the cuBLAS routines instead. Adding
 this for ViennaCL 1.7.1 is certainly possible if that is what you
 would be happy with.

 Best regards,
 Karli



  On Sat, Aug 1, 2015 at 3:56 AM, Karl Rupp r...@iue.tuwien.ac.at wrote:

  Hi Charles,

  can you please quantify what you mean by 'slower'? How does
 'slower'
  change as you increase the problem size? I would not be
 surprised if
  you see no performance gains below matrices of size
 500-by-500. With
  the extra back-and-forth through PCI-Express you may even need
  matrices of at least 1000-by-1000.

  Best regards,
  Karli



  On 07/31/2015 09:04 PM, Charles Determan wrote:

  Greetings,

  Brief background, I am developing a series of R
 packages to bring
  ViennaCL to the R community.  I have had success with the
  development of
  my gpuR package (https://github.com/cdeterman/gpuR)
 which relies
  on the
  OpenCL backend of ViennaCL (which is housed in the package
  RViennaCL).
  I am hoping to submit to CRAN in the coming weeks now
 that the
  latest
  stable ViennaCL version has just been released.

  Naturally, I wanted a companion package for a CUDA backend.
  This is now
  the gpuRcuda package
 (https://github.com/cdeterman/gpuRcuda).
  This has
  appeared to work successfully as most of the code is
 the same.
  However,
  my initial benchmarks are showing very dismal
 performance with
  the CUDA
  backend.

  I was wondering if someone from this list would be
 willing to have a
  look at my code to see why the CUDA code would be so much
  worse.  I had
  thought, given that I am working with an NVIDIA card (GeForce GTX 970),
 CUDA would
  provide improved speed, but the benchmarks are showing
 performance at
  least 5-fold slower than the CPU based R
 multiplication.  Even the
  'float' type matrix multiplication is slower than R
 (which only has
  double type support!).

  The sgemm CUDA file is

 (https://github.com/cdeterman/gpuRcuda/blob/master/src/vcl_sgemm.cu)
  and
  the associated C++ file is

 
 (https://github.com/cdeterman/gpuRcuda/blob/master/src/vcl_cudaMatrix_gemm.cpp).

  Other note, I have tried making the two packages completely
  independent
  and the performance is still very poor with CUDA.

  I really appreciate any help others could provide

Re: [ViennaCL-devel] ViennaCL reductions

2015-08-02 Thread Karl Rupp
Hi Sumit,

  I was trying to run sparse matrix multiplication, but one of my explicit
 template typedefs had Int. After some digging, I found out that CSR only
 supported Float or double. Is there any reason for this? Can we also
 have support for other templates? (like int ?)

It is technically possible, but we haven't implemented it yet. After 5 
years you are the first to even ask for it ;-)


 Another thing would be the alignment order. Suppose I have a Row-major
 Sparse Eigen matrix, then I can copy it to a (Row-Major ?) VCL
 compressed matrix. What about a column-major sparse Eigen matrix?

If you can point me to a fast, massively parallel column-major 
matrix-vector multiplication routine, I'll look into it. However, as far as 
I know, there is no such routine for general sparse matrices, hence it 
does not make sense for us to support it.

Best regards,
Karli



 
 *From:* Karl Rupp r...@iue.tuwien.ac.at
 *To:* Sumit Kumar dost_4_e...@yahoo.com
 *Cc:* viennacl-devel@lists.sourceforge.net
 *Sent:* Friday, July 31, 2015 9:04 PM
 *Subject:* Re: [ViennaCL-devel] ViennaCL reductions

 Hi Sumit,

   I am aware that Eigen can do it for its matrices and I am also aware
   that VCL cannot do it natively. My question was this:
   In your example of interfacing with Eigen, you have shown a VCL dense
   matrix interfacing with an Eigen dense matrix. Do you have any example
   of interfacing an Eigen Sparse matrix with a VCL dense matrix?

 No, we don't have this.




 Best regards,
 Karli






--
___
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel


Re: [ViennaCL-devel] CUDA slower than OpenCL in new R implementation?

2015-08-01 Thread Karl Rupp
Hi Charles,

can you please quantify what you mean by 'slower'? How does 'slower' 
change as you increase the problem size? I would not be surprised if you 
see no performance gains below matrices of size 500-by-500. With the 
extra back-and-forth through PCI-Express you may even need matrices of 
at least 1000-by-1000.

Best regards,
Karli


On 07/31/2015 09:04 PM, Charles Determan wrote:
 Greetings,

 Brief background, I am developing a series of R packages to bring
 ViennaCL to the R community.  I have had success with the development of
 my gpuR package (https://github.com/cdeterman/gpuR) which relies on the
 OpenCL backend of ViennaCL (which is housed in the package RViennaCL).
 I am hoping to submit to CRAN in the coming weeks now that the latest
 stable ViennaCL version has just been released.

 Naturally, I wanted a companion package for a CUDA backend.  This is now
 the gpuRcuda package (https://github.com/cdeterman/gpuRcuda).  This has
 appeared to work successfully as most of the code is the same.  However,
 my initial benchmarks are showing very dismal performance with the CUDA
 backend.

 I was wondering if someone from this list would be willing to have a
 look at my code to see why the CUDA code would be so much worse.  I had
 thought, given that I am working with an NVIDIA card (GeForce GTX 970), CUDA would
 provide improved speed, but the benchmarks are showing performance at
 least 5-fold slower than the CPU based R multiplication.  Even the
 'float' type matrix multiplication is slower than R (which only has
 double type support!).

 The sgemm CUDA file is
 (https://github.com/cdeterman/gpuRcuda/blob/master/src/vcl_sgemm.cu) and
 the associated C++ file is
 (https://github.com/cdeterman/gpuRcuda/blob/master/src/vcl_cudaMatrix_gemm.cpp).

 Other note, I have tried making the two packages completely independent
 and the performance is still very poor with CUDA.

 I really appreciate any help others could provide troubleshooting this.
 I have truly run out of ideas as to why the code has such poor performance.

 Regards,
 Charles


 --



 ___
 ViennaCL-devel mailing list
 ViennaCL-devel@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/viennacl-devel



--
___
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel


Re: [ViennaCL-devel] ViennaCL reductions

2015-07-31 Thread Karl Rupp
Hi Sumit,

  I am aware that Eigen can do it for its matrices and I am also aware
 that VCL cannot do it natively. My question was this:
 In your example of interfacing with Eigen, you have shown a VCL dense
 matrix interfacing with an Eigen dense matrix. Do you have any example
 of interfacing an Eigen Sparse matrix with a VCL dense matrix?

No, we don't have this.

Best regards,
Karli



--
___
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel


[ViennaCL-devel] ViennaCL 1.7.0 released!

2015-07-31 Thread Karl Rupp
Dear ViennaCL users,

ViennaCL 1.7.0 has just been released.

Due to a storage fault at Sourceforge [1], a download from the 
ViennaCL-Sourceforge page [2,3] is not yet possible. An alternative for 
the next few days is to download a release tarball from the developer 
repository on GitHub: https://github.com/viennacl/viennacl-dev/releases

The highlights of the 1.7.0 release are:
  * Fine-grained parallel incomplete LU factorization preconditioners
(based on recent paper by Chow and Patel [4])
  * Fast sparse matrix-matrix multiplication
(based on recent paper by Gremse et al. [5])
  * Improved performance of sparse matrix-vector products
  * Fine-grained parallel algebraic multigrid preconditioners for CUDA, 
OpenCL, and OpenMP.
  * Conversion between vectors and matrices of different numeric type
  * Lanczos eigenvalue solver now optionally also returns eigenvectors.
  * Interface to/from the Armadillo library [6]

A full list of changes is available here:
  http://viennacl.sourceforge.net/doc/changelog.html

Updates to the benchmark section on the ViennaCL webpage in order to 
showcase the good performance will follow shortly.

Best regards,
Karl Rupp

[1] 
https://sourceforge.net/blog/sourceforge-infrastructure-and-service-restoration-update-for-724/
[2] https://sourceforge.net/projects/viennacl/
[3] http://viennacl.sourceforge.net/
[4] http://epubs.siam.org/doi/abs/10.1137/140968896
[5] http://epubs.siam.org/doi/abs/10.1137/130948811
[6] http://arma.sourceforge.net/


--
___
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel


Re: [ViennaCL-devel] ViennaCL reductions

2015-07-31 Thread Karl Rupp
Hi Sumit,

  I know that we can copy a dense eigen matrix to a dense VCL matrix and
 the same thing for sparse matrices also.
 Is it possible to copy a sparse eigen matrix to a dense eigen matrix?
 and vice-versa ?

I don't know for Eigen, you have to ask the Eigen developers.
It is not possible out-of-the-box in ViennaCL to assign a ViennaCL 
sparse matrix to a ViennaCL dense matrix (and vice versa).

Best regards,
Karli





 
 *From:* Karl Rupp r...@iue.tuwien.ac.at
 *To:* Sumit Kumar dost_4_e...@yahoo.com
 *Cc:* viennacl-devel@lists.sourceforge.net
 *Sent:* Tuesday, July 28, 2015 4:01 PM
 *Subject:* Re: [ViennaCL-devel] ViennaCL reductions

 Hi,

   That worked. So, the final question I have would be as follows:
   viennacl::copy(vcl_C, eigen_mat);
   is a valid statement
   However, if I want to copy vcl_C to some particular region of a host
   matrix, then, do I have to first create a temporary host matrix and then
   copy that host matrix into a larger matrix or is there anything direct?
   Something like this:
   viennacl::copy(vcl_C, prod.block(0, j, source.rows(), bTemp.cols()));
  
   Otherwise the only way I can think of is this:
   RMMatrix_Float temp(rows, cols);
   viennacl::copy(vcl_C, temp);
   biggerMatrix.block(0, j, rows, cols) = temp;

 you can either use temporaries, or create an Eigen::Map from your
 Eigen matrix for wrapping your existing matrix accordingly:
 http://eigen.tuxfamily.org/dox/classEigen_1_1Map.html

 Best regards,
 Karli


   
   *From:* Karl Rupp r...@iue.tuwien.ac.at
   *To:* Sumit Kumar dost_4_e...@yahoo.com
   *Cc:* viennacl-devel@lists.sourceforge.net
   *Sent:* Tuesday, July 28, 2015 2:06 AM
   *Subject:* Re: [ViennaCL-devel] ViennaCL reductions
  
   Hi,
  
 That's why I explicitly stated in my previous mail Optional :) I am
 one of those folks who likes the coziness of compile time
 decisions and
 (if possible) would like to make it available in VCL. However, I
 am not
 sure if doing this is possible via the current API structure of VCL. I
 will look into this as I first need to understand the intricacies of
 OpenCL.

 BTW, a previous question of mine was unanswered (or I may have
 overlooked). Are there any equivalent functions in VCL to do the
   following:
 a.) Two matrices, when transferred to the GPU should undergo
 element_wise multiplication ?
 b.) A unary operation on a matrix that has been transferred to the
 GPU?
  
   a.) and b.) sound a lot like you are looking for the element_*-functions:
   http://viennacl.sourceforge.net/doc/manual-operations.html#manual-operations-blas1
   which work for vectors and matrices alike.
  
   Best regards,
   Karli
  
  
  

 
  *From:* Karl Rupp r...@iue.tuwien.ac.at
  *To:* Sumit Kumar dost_4_e...@yahoo.com
  *Cc:* viennacl-devel@lists.sourceforge.net
 *Sent:* Monday, July 27, 2015 9:13 PM
 *Subject:* Re: [ViennaCL-devel] ViennaCL reductions

 Hi Sumit,

   Agreed, the names can be returned in any order. As you are
 using CMake
   would it be possible to:
   a.) Write a small helper script using CMake that lists what
   devices the
   user has on his/her machine that are OpenCL compliant?

 Exactly this is provided by running viennacl-info:
$ mkdir build && cd build
$ cmake ..
$ make viennacl-info
$ examples/tutorial/viennacl-info


   b.) Make VCL select the appropriate device so that these issues of
   context selection etc can be avoided?

 In most cases the first OpenCL device returned is the most appropriate
 device for running the computations. If you know of any better
 strategy
 for picking the default device, please let us know and we are happy to
 adopt it. :-)


   c.) Of course, this would be purely optional and only available
 if the
   user wants to pre-select the device before writing any VCL code

Re: [ViennaCL-devel] ViennaCL reductions

2015-07-29 Thread Karl Rupp
Hi Sumit,

  ViennaCL: Sparse Matrix-Matrix Multiplication
 http://www.iue.tuwien.ac.at/cse/index.php/gsoc/2014/ideas/167-viennacl-sparse-matrix-matrix-multiplication-2.html

 Do you have any example in ViennaCL that actually does this? Can you
 please point this out? I was rethinking my strategy and was wondering if
 I formulate my Dense matrix multiplication as a sparse matrix
 multiplication and,

1.7.0 will provide a sparse matrix-matrix product for 
compressed_matrix. Some code for the nightly tests is in
  tests/src/sparse_prod.cpp
The API is the same as for dense matrices:
  A = viennacl::linalg::prod(B, C);
for A, B, and C being of type compressed_matrix<T>.


 a.) Use The modified Strassen algorithm to reduce the workload
 Strassen's Matrix Multiplication Relabeled
 http://src.acm.org/loos/loos.html

I don't understand what you are trying to achieve with Strassen for 
sparse matrices. Strassen is (sometimes) useful for dense matrices, but 
not for sparse matrices. The algorithmic complexities involved in sparse 
and dense matrix-matrix multiplications are totally different.

Best regards,
Karli


--
___
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel


Re: [ViennaCL-devel] ViennaCL reductions

2015-07-27 Thread Karl Rupp
Hi Sumit,

  Thanks for the update. Indeed, support for RM matrices would be helpful.

Added right here:
https://github.com/viennacl/viennacl-dev/commit/7d29adf3970d5eaefef448742101c5a5960d773f


 I did take a look at the page and tried all the methods there. My
 architecture contains the following:
 a.) AMD CPU
 b.) AMD Devastator (with the CPU)
 c.) AMD Hainan (discrete GPU card)

 No matter whatever I do, it ends up picking up b.) whereas I want to
 pick up c.)
 Can anyone share an example of how to go about doing this?

Try the following at the begin of your program (i.e. before creating any 
ViennaCL-objects):

viennacl::ocl::setup_context(0, viennacl::ocl::platform().devices()[2]);

Best regards,
Karli


 
 *From:* Karl Rupp r...@iue.tuwien.ac.at
 *To:* Sumit Kumar dost_4_e...@yahoo.com
 *Cc:* viennacl-devel@lists.sourceforge.net
 *Sent:* Sunday, July 26, 2015 11:37 PM
 *Subject:* Re: [ViennaCL-devel] ViennaCL reductions

 Hi Sumit,

   Thanks for the information. I have a second problem that has cropped up!
   a.) Apparently, I cannot copy a Row-Major Eigen matrix to a vcl matrix!
   It comes up with weird errors. I was using the default statements from
   the examples folder. The moment I changed it to Column Major eigen
   matrices the program compiled without any problem!

 Our interface currently supports Eigen::MatrixXf and Eigen::MatrixXd,
 which seems like it does not cover the row-major case. I'll look into it
 and extend the interface if needed.


   b.) I have an AMD APU (CPU + GPU) and a discrete GPU. How do I force
   ViennaCl to default to the discrete GPU and not the GPU from the APU? I
   tried a kludge:
   // get all available devices
   viennacl::ocl::platform pf;
   std::cout << "Platform info: " << pf.info() << std::endl;
   std::vector<viennacl::ocl::device> devices =
     pf.devices(CL_DEVICE_TYPE_GPU); // CL_DEVICE_TYPE_DEFAULT
   std::cout << devices[1].name() << std::endl;
   std::cout << "Number of devices for custom context: "
             << devices.size() << std::endl;

   // set up context using all found devices:
   int gSize = (int)devices.size();
   for (int i = gSize - 1; i >= 0; --i)
   {
     device_id_array.push_back(devices[i].id());
   }

   std::cout << "Creating context..." << std::endl;
   cl_int err;
   cl_context my_context = clCreateContext(0,
     cl_uint(device_id_array.size()), &(device_id_array[0]), NULL, NULL,
     &err);
  
   and this made it work. However, is there a cleaner way of doing this?
   This code snippet is from custom-context.cpp. If I tried
   &(device_id_array[1]), the program crashes!

 http://viennacl.sourceforge.net/doc/manual-multi-device.html




 Best regards,
 Karli




--
___
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel


Re: [ViennaCL-devel] ViennaCL reductions

2015-07-27 Thread Karl Rupp
Hi Sumit,

  Thanks for the update. I will check it out. As for the second one,
 indeed that's what I ended up doing:

// Get some context information for OpenCL compatible GPU devices
viennacl::ocl::platform pf;
std::vector<viennacl::ocl::device> devices =
 pf.devices(CL_DEVICE_TYPE_GPU);
// If no GPU devices are found, we select the CPU device
if (devices.size() > 0)
{
  // Now, often we may have an integrated GPU with the CPU. We would
  // like to avoid using that GPU. Instead, we search for a discrete
  // GPU.
  viennacl::ocl::setup_context(0, devices[1]);
}
else
{
  devices = pf.devices(CL_DEVICE_TYPE_CPU);
  viennacl::ocl::setup_context(0, devices[0]);
}

Keep in mind that devices may be returned in arbitrary order. I've 
already seen cases where the discrete GPU is returned as the first 
device. (Unfortunately I don't know of a robust and general way of 
dealing with such cases).


  Here is my kludge. Is there any real example of data partitioning and
  using multiple GPU's?

No. PCI-Express latencies greatly narrow down the sweet spot of real
performance gains for most algorithms in ViennaCL. Only algorithms relying
heavily on matrix-matrix multiplications are likely to show good benefit
from multiple GPUs. As a consequence, we are currently keeping our focus
on single GPUs. There is some multi-GPU support through ViennaCL as a 
plugin for PETSc available, but that focuses on iterative solvers and 
does not cover any eigenvalue routines yet.

Best regards,
Karli


--
___
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel


Re: [ViennaCL-devel] ViennaCL reductions

2015-07-27 Thread Karl Rupp
Hi Sumit,

  Agreed, the names can be returned in any order. As you are using CMake
 would it be possible to:
 a.) Write a small helper script using CMake that lists what devices the
 user has on his/her machine that are OpenCL compliant?

Exactly this is provided by running viennacl-info:
  $ mkdir build && cd build
  $ cmake ..
  $ make viennacl-info
  $ examples/tutorial/viennacl-info


 b.) Make VCL select the appropriate device so that these issues of
 context selection etc can be avoided?

In most cases the first OpenCL device returned is the most appropriate 
device for running the computations. If you know of any better strategy 
for picking the default device, please let us know and we are happy to 
adopt it. :-)


 c.) Of course, this would be purely optional and only available if the
 user wants to pre-select the device before writing any VCL code!

How is that different from what is possible now? What is the value of 
making a runtime-decision (current ViennaCL code) a compile-time 
decision (CMake)?

Imagine you are building a big application with GUI and other bells and 
whistles. In such a scenario you would like the user to select the 
compute device through a dialog at runtime rather than asking the user 
to recompile the whole application just to change the default device. 
Our current mindset is to incrementally move away from compile time 
decisions in cases where they are not needed.

Best regards,
Karli




 
 *From:* Karl Rupp r...@iue.tuwien.ac.at
 *To:* Sumit Kumar dost_4_e...@yahoo.com
 *Cc:* viennacl-devel@lists.sourceforge.net
 viennacl-devel@lists.sourceforge.net
 *Sent:* Monday, July 27, 2015 7:03 PM
 *Subject:* Re: [ViennaCL-devel] ViennaCL reductions

 Hi Sumit,

   Thanks for the update. I will check it out. As for the second one,
   indeed thats what I ended up doing:
  
  // Get some context information for OpenCL compatible GPU devices
  viennacl::ocl::platform pf;
  std::vector<viennacl::ocl::device> devices =
   pf.devices(CL_DEVICE_TYPE_GPU);
  // If no GPU devices are found, we select the CPU device
  if (devices.size() > 0)
  {
// Now, often we may have an integrated GPU with the CPU. We would
// like to avoid using that GPU. Instead, we search for a discrete
// GPU.
viennacl::ocl::setup_context(0, devices[1]);
  }
  else
  {
devices = pf.devices(CL_DEVICE_TYPE_CPU);
viennacl::ocl::setup_context(0, devices[0]);
  }

 Keep in mind that devices may be returned in arbitrary order. I've
 already seen cases where the discrete GPU is returned as the first
 device. (Unfortunately I don't know of a robust and general way of
 dealing with such cases).


   Here is my kludge. Is there any real example of data partitioning and
   using multiple GPU's?

 No. PCI-Express latencies narrow down to sweet spot of real performance
 gains for most algorithms in ViennaCL a lot. Only algorithms relying
 heavily on matrix-matrix multiplications are likely to show good benefit
 from multiple GPUs. As a consequence, we are currently keeping our focus
 on single GPUs. There is some multi-GPU support through ViennaCL as a
 plugin for PETSc available, but that focuses on iterative solvers and
 does not cover any eigenvalue routines yet.




 Best regards,
 Karli





--
___
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel


Re: [ViennaCL-devel] Column-wise kernels?

2015-07-27 Thread Karl Rupp
Hi Charles,

  I am working on writing some additional opencl kernels (potentially to
 incorporate in to viennacl) which involve column-wise reductions.  A
 simple case would simply be the sum of each column of a matrix.
 However, I am having an extremely difficult time getting my kernel
 correct (reductions are tricky to me).  That said, after searching for
 some resources I came across an old post on sourceforge referring to
 column-wise kernels
 (http://sourceforge.net/p/viennacl/mailman/message/27542552/) with
 viennacl.  This leads me to my primary question.

 Are there such kernels already in ViennaCL that I have overlooked?

Yes ;-) Have a look here at how row-wise sums reduce to a standard 
matrix-vector product:
https://sourceforge.net/p/viennacl/discussion/1143678/thread/38e942a0/

That is, in order to compute a row-sum and a column-sum you can use
  row_sum = prod(A, ones);
  col_sum = prod(trans(A), ones);

In an hour or two I will push convenience functions for summation fixing 
the only remaining issue for the 1.7.0 release:
  https://github.com/viennacl/viennacl-dev/issues/127


 If not, are there any examples or resources you would recommend to help
 learn this topic?  I have tried searching further but the only thing I
 can really find is a reduction of an entire matrix (which is relatively
 simple) as opposed to by column or row.

At this point I can only recommend to think about how such operations 
can be recast in terms of (standard) linear algebra. For example, row- 
and column-wise updates to a matrix are special cases of the more general
  A += outer_prod(u, v);
operation (rank-1 updates). I'll improve the documentation in that 
direction.

Best regards,
Karli


--
___
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel


Re: [ViennaCL-devel] Column-wise kernels?

2015-07-27 Thread Karl Rupp

 Excellent, thank you.  I thought that would be the way to go initially
 but I hesitated because of concerns about having additional temporary
 objects taking up memory when matrices begin to get larger but it
 certainly is simpler this way.

Just pushed:
https://github.com/viennacl/viennacl-dev/commit/4063c941235d46804cd448db7ddecf0c3238548f

Yeah, it's a bit of a trade-off: Sure, one could optimize the summation 
kernel, but this also implies more code to maintain. On the other hand, 
I'm not aware (which, of course, does not deny a possible existence) of 
a scenario where such summation routines are the performance bottleneck.

 Glad to hear that 1.7.0 is nearly completed.  Does that mean we should
 expect a formal release soon?

Yep. Expect the release on Wednesday.

Best regards,
Karli



 On Mon, Jul 27, 2015 at 9:57 AM, Karl Rupp r...@iue.tuwien.ac.at
 mailto:r...@iue.tuwien.ac.at wrote:

 Hi Charles,

I am working on writing some additional opencl kernels
 (potentially to
  incorporate in to viennacl) which involve column-wise reductions.  A
  simple case would simply be the sum of each column of a matrix.
  However, I am having an extremely difficult time getting my kernel
  correct (reductions are tricky to me).  That said, after searching for
  some resources I came across an old post on sourceforge referring to
  column-wise kernels
   (http://sourceforge.net/p/viennacl/mailman/message/27542552/) with
  viennacl.  This leads me to my primary question.
 
  Are there such kernels already in ViennaCL that I have overlooked?

 Yes ;-) Have a look here at how row-wise sums reduce to a standard
 matrix-vector product:
 https://sourceforge.net/p/viennacl/discussion/1143678/thread/38e942a0/

 That is, in order to compute a row-sum and a column-sum you can use
row_sum = prod(A, ones);
col_sum = prod(trans(A), ones);

 In an hour or two I will push convenience functions for summation fixing
 the only remaining issue for the 1.7.0 release:
 https://github.com/viennacl/viennacl-dev/issues/127


  If not, are there any examples or resources you would recommend to help
  learn this topic?  I have tried searching further but the only thing I
  can really find is a reduction of an entire matrix (which is relatively
  simple) as opposed to by column or row.

 At this point I can only recommend to think about how such operations
 can be recast in terms of (standard) linear algebra. For example, row-
 and column-wise updates to a matrix are special cases of the more
 general
A += outer_prod(u, v);
 operation (rank-1 updates). I'll improve the documentation in that
 direction.

 Best regards,
 Karli


 
 --
 ___
 ViennaCL-devel mailing list
 ViennaCL-devel@lists.sourceforge.net
 mailto:ViennaCL-devel@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/viennacl-devel




--
___
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel


Re: [ViennaCL-devel] Column-wise kernels?

2015-07-27 Thread Karl Rupp
Hi,

in addition to what Philippe said, let me give you a short code snippet 
used as a prototype for a couple of reductions in ViennaCL. It certainly 
takes a little while to get your head around, but once you have figured 
it out it's like a Swiss army knife ;-)

The overall workflow is always the same: You decompose the full data 
into large chunks upon which all the workgroups operate (e.g. individual 
rows for a matrix-vector product). Within each work group you further 
decompose the work for each thread. Then you need to sum (alternatives: 
min/max/xor/etc.) all those values:

  // place thread results in __local array:
  shared_array[get_local_id(0)] = value_computed_by_respective_thread();

  // reduction loop:
  for (uint stride = get_local_size(0)/2; stride > 0; stride /= 2)
  {
    barrier(CLK_LOCAL_MEM_FENCE);
    if (get_local_id(0) < stride)
      shared_array[get_local_id(0)] += shared_array[get_local_id(0) + stride];
  }

  // process result in shared_array[0], e.g. write to global memory:
  if (get_local_id(0) == 0)
do_something_with_result(shared_array[0]);

The barrier in the body of the for-loop is required to avoid data races. 
The last if-statement is merely for processing the results and may also 
involve all threads rather than just the first thread in the workgroup.

Hope this helps :-)

Best regards,
Karli


On 07/27/2015 07:07 PM, Charles Determan wrote:
 Philippe,

 I definitely understand and support such a solution for ViennaCL.  I
 don't mean to say it should be included, I was just curious what the
 current approach was.  However, I am interested in additional OpenCL
 development outside of the framework.  Do you have any recommendations
 in learning more about coding OpenCL reductions?  As I mentioned above,
 I have only found very basic reduction approaches so far, nothing for
 slightly more complex scenarios like the column and row sum examples.

 If this is transitioning too far from the mailing list focus I would
 certainly appreciate a reply off list.

 Thank you,
 Charles

 On Mon, Jul 27, 2015 at 11:46 AM, Philippe Tillet phil.til...@gmail.com
 mailto:phil.til...@gmail.com wrote:

 Hi,

 Such row-wise / column-wise reductions could be generated by the
 OpenCL backend, but this won't work on the host or CUDA backend.
 Plus, this is not really maintained at the moment. I would recommend
 Karl's solution, even though it won't be optimal when the vector
 does not fit in the L2 cache of the OpenCL device (Maxwell for
 example has 2MB of L2 cache), as the current algorithm for GEMV
 accesses the entire vector get_num_groups(0) times.

 Philippe

 2015-07-27 9:40 GMT-07:00 Karl Rupp r...@iue.tuwien.ac.at
 mailto:r...@iue.tuwien.ac.at:


  Excellent, thank you.  I thought that would be the way to go 
 initially
  but I hesitated because of concerns about having additional 
 temporary
  objects taking up memory when matrices begin to get larger but it
  certainly is simpler this way.

 Just pushed:
 
 https://github.com/viennacl/viennacl-dev/commit/4063c941235d46804cd448db7ddecf0c3238548f

 Yeah, it's a bit of a trade-off: Sure, one could optimize the
 summation
 kernel, but this also implies more code to maintain. On the
 other hand,
 I'm not aware (which, of course, does not deny a possible
 existence) of
 a scenario where such summation routines are the performance
 bottleneck.

  Glad to hear that 1.7.0 is nearly completed.  Does that mean we 
 should
  expect a formal release soon?

 Yep. Expect the release on Wednesday.

 Best regards,
 Karli



  On Mon, Jul 27, 2015 at 9:57 AM, Karl Rupp r...@iue.tuwien.ac.at 
 mailto:r...@iue.tuwien.ac.at
  mailto:r...@iue.tuwien.ac.at mailto:r...@iue.tuwien.ac.at 
 wrote:
 
  Hi Charles,
 
 I am working on writing some additional opencl kernels
  (potentially to
   incorporate in to viennacl) which involve column-wise 
 reductions.  A
   simple case would simply be the sum of each column of a 
 matrix.
   However, I am having an extremely difficult time getting my 
 kernel
   correct (reductions are tricky to me).  That said, after 
 searching for
   some resources I came across an old post on sourceforge 
 referring to
   column-wise kernels

 (http://sourceforge.net/p/viennacl/mailman/message/27542552/) with
   viennacl.  This leads me to my primary question.
  
   Are there such kernels already in ViennaCL that I have 
 overlooked?
 
  Yes ;-) Have a look here at how row-wise sums reduce to a 
 standard
  matrix-vector product:
 
 https://sourceforge.net/p/viennacl/discussion/1143678/thread/38e942a0

Re: [ViennaCL-devel] ViennaCL reductions

2015-07-27 Thread Karl Rupp
Hi,

  That's why I explicitly stated in my previous mail "Optional" :) I am
 one of those folks who likes the coziness of compile time decisions and
 (if possible) would like to make it available in VCL. However, I am not
 sure if doing this is possible via the current API structure of VCL. I
 will look into this as I first need to understand the intricacies of
 OpenCL.

 BTW, a previous question of mine was unanswered (or I may have
 overlooked). Are there any equivalent functions in VCL to do the following:
 a.) Two matrices, when transferred to the GPU should undergo
 element_wise multiplication ?
 b.) A unary operation on a matrix that has been transferred to the GPU?

a.) and b.) sound a lot like you are looking for element_*-functions
http://viennacl.sourceforge.net/doc/manual-operations.html#manual-operations-blas1,
 
which work for vectors and matrices alike.

Best regards,
Karli



 
 *From:* Karl Rupp r...@iue.tuwien.ac.at
 *To:* Sumit Kumar dost_4_e...@yahoo.com
 *Cc:* viennacl-devel@lists.sourceforge.net
 viennacl-devel@lists.sourceforge.net
 *Sent:* Monday, July 27, 2015 9:13 PM
 *Subject:* Re: [ViennaCL-devel] ViennaCL reductions

 Hi Sumit,

   Agreed, the names can be returned in any order. As you are using CMake
   would it be possible to:
   a.) Write a small helper script using CMake that lists what devices the
   user has on his/her machine that are OpenCL compliant?

 Exactly this is provided by running viennacl-info:
$ mkdir build && cd build
$ cmake ..
$ make viennacl-info
$ examples/tutorial/viennacl-info


   b.) Make VCL select the appropriate device so that these issues of
   context selection etc can be avoided?

 In most cases the first OpenCL device returned is the most appropriate
 device for running the computations. If you know of any better strategy
 for picking the default device, please let us know and we are happy to
 adopt it. :-)


   c.) Of course, this would be purely optional and only available if the
   user wants to pre-select the device before writing any VCL code!

 How is that different from what is possible now? What is the value of
 making a runtime-decision (current ViennaCL code) a compile-time
 decision (CMake)?

 Imagine you are building a big application with GUI and other bells and
 whistles. In such a scenario you would like the user to select the
 compute device through a dialog at runtime rather than asking the user
 to recompile the whole application just to change the default device.
 Our current mindset is to incrementally move away from compile time
 decisions in cases where they are not needed.

 Best regards,
 Karli







   
   *From:* Karl Rupp r...@iue.tuwien.ac.at mailto:r...@iue.tuwien.ac.at
   *To:* Sumit Kumar dost_4_e...@yahoo.com mailto:dost_4_e...@yahoo.com
   *Cc:* viennacl-devel@lists.sourceforge.net
 mailto:viennacl-devel@lists.sourceforge.net
   viennacl-devel@lists.sourceforge.net
 mailto:viennacl-devel@lists.sourceforge.net
   *Sent:* Monday, July 27, 2015 7:03 PM
   *Subject:* Re: [ViennaCL-devel] ViennaCL reductions
  
   Hi Sumit,
  
 Thanks for the update. I will check it out. As for the second one,
 indeed thats what I ended up doing:

// Get some context information for OpenCL compatible GPU devices
viennacl::ocl::platform pf;
 std::vector<viennacl::ocl::device> devices =
  pf.devices(CL_DEVICE_TYPE_GPU);
 // If no GPU devices are found, we select the CPU device
 if (devices.size() > 0)
{
  // Now, often we may have an integrated GPU with the CPU. We
 would
  // like to avoid using that GPU. Instead, we search for a
 discrete
  // GPU.
  viennacl::ocl::setup_context(0, devices[1]);
}
else
{
  devices = pf.devices(CL_DEVICE_TYPE_CPU);
  viennacl::ocl::setup_context(0, devices[0]);
}
  
   Keep in mind that devices may be returned in arbitrary order. I've
   already seen cases where the discrete GPU is returned as the first
   device. (Unfortunately I don't know of a robust and general way of
   dealing with such cases).
  
  
 Here is my kludge. Is there any real example of data
 partitioning and
 using multiple GPU's?
  
   No. PCI-Express latencies narrow down to sweet spot of real performance
   gains for most algorithms in ViennaCL a lot. Only algorithms relying
   heavily on matrix-matrix multiplications are likely to show good benefit
   from multiple GPUs. As a consequence, we are currently keeping our focus
   on single GPUs. There is some multi-GPU support through ViennaCL as a
   plugin for PETSc available, but that focuses on iterative solvers and
   does not cover any eigenvalue routines yet.
  
  
  
  
   Best regards,
   Karli

Re: [ViennaCL-devel] ViennaCL reductions

2015-07-26 Thread Karl Rupp
Hi Sumit,

  I have the following reduction that I can do in Eigen. Is there
 something similar that I can do in ViennaCL?
 RMMatrix_Float v = (prod.block(0,0,rows,cols).rowwise() -
 aVector.transpose());

you can use outer_prod() for this. First argument is a vector of all 
ones, the second argument is your row vector.

 RMSparseMatrix_Float v1 =
 v.unaryExpr(std::ptr_fun(clip_lower)).sparseView();

We don't provide anything like this natively at the moment, so you would 
need to write a custom kernel for it (most likely you would need one 
custom kernel to count the number of nonzeros per row, then an exclusive 
scan, and finally another loop over the elements to fill the sparse matrix).

Best regards,
Karli




 Essentially, I am subtracting a row vector from every rows of a matrix.
 Then, I am applying a unary operation (soft threshold for example) on
 every pixel and making the matrix sparse.

 RMMatrix_Float is a typedef for a row-major Eigen dense matrix;
 RMSparseMatrix_Float is its sparse equivalent.

 Thanks and Regards
 Sumit

 
 *From:* Karl Rupp r...@iue.tuwien.ac.at
 *To:* Charles Determan cdeterma...@gmail.com
 *Cc:* viennacl-devel@lists.sourceforge.net
 *Sent:* Wednesday, July 15, 2015 1:53 AM
 *Subject:* Re: [ViennaCL-devel] ViennaCL eigenvectors?

 Hi Charles,

   Everything appears to be working correctly with the power method.  A
   followup though, is it possible to return 'all' eigenvalues and
   eigenvectors with the power algorithm?

 No, it only provides the largest eigenvalue in modulus.

The Lanczos example (in the
   feature-improve-lanczos branch) code looks like it would iterate through
   all eigenvalues and eigenvectors.  For example, with a trivial matrix of
   size 4x4 I would expect to get 4 eigenvalues and 4 eigenvectors (of
   length 4, i.e. a matrix of 16 elements).

 Lanczos is typically used for obtaining estimates on the largest
 eigenvalues of a huge matrix (say, the largest 10). There is actually an
 implementation of the QR method for finding all eigenvalues of a
 symmetric matrix available, too, cf.
 https://github.com/viennacl/viennacl-dev/blob/master/examples/tutorial/qr_method.cpp
 (I need to check why it does not show up in the manual)


   This leads me to another question, I didn't see a merge between the
   Lanczos algorithm.  Is that still under development or is it now
   implemented in the master branch?

 Some parts currently experience a final polishing (for example,
 yesterday I eliminated the Boost dependency), so it will migrate to
 master soon. :-)

 Best regards,
 Karli



   On Fri, Jul 10, 2015 at 4:20 AM, Karl Rupp r...@iue.tuwien.ac.at
 mailto:r...@iue.tuwien.ac.at
   mailto:r...@iue.tuwien.ac.at mailto:r...@iue.tuwien.ac.at wrote:
  
  Hi Charles,
  
  the interface for the power iteration has been updated to also
  return eigenvectors:
  
 https://github.com/viennacl/viennacl-dev/commit/e80cc2141f266eb9b279dd45b7c4075b557bf558
  
  Please let me know if you run into issues.
  
  Best regards,
  Karli
  
  
  
  On 07/08/2015 06:58 PM, Charles Determan wrote:
  
  Greetings,
  
  I have seen that I can get all the eigenvalues with the lanczos
  algorithm in the lanczos.cpp example file but I don't see any
  documentation on eigenvectors.  The only thing I have found is
  on the
  feature-improve-lanczos branch
  
 https://github.com/viennacl/viennacl-dev/blob/karlrupp/feature-improve-lanczos/examples/tutorial/lanczos.cpp.
  Is this intended to be implemented or is there existing
 support for
  determining eigen vectors?
  
  A secondary question, is the same possible with the power method?
  
  Thanks,
  Charles
  
  
  
 --
  Don't Limit Your Business. Reach for the Cloud.
  GigeNET's Cloud Solutions provide you with the tools and support
  that
  you need to offload your IT needs and focus on growing your
  business.
  Configured For All Businesses. Start Your Cloud Today.
   https://www.gigenetcloud.com/
  
  
  
  ___
  ViennaCL-devel mailing list
   ViennaCL-devel@lists.sourceforge.net
 mailto:ViennaCL-devel@lists.sourceforge.net
  mailto:ViennaCL-devel@lists.sourceforge.net
 mailto:ViennaCL-devel@lists.sourceforge.net



   https://lists.sourceforge.net/lists/listinfo/viennacl-devel
  
  
  


 --
 Don't Limit Your Business. Reach for the Cloud.
 GigeNET's Cloud Solutions provide you with the tools and support that
 you need to offload your IT needs and focus on growing your business.
 Configured For All Businesses. Start Your Cloud Today.
 https://www.gigenetcloud.com

Re: [ViennaCL-devel] ViennaCL reductions

2015-07-26 Thread Karl Rupp
Hi Sumit,

  Thanks for the information. I have a second problem that has cropped up!
 a.) Apparently, I cannot copy a Row-Major Eigen matrix to a vcl matrix!
 It comes up with weird errors. I was using the default statements from
 the examples folder. The moment I changed it to Column Major eigen
 matrices the program compiled without any problem!

Our interface currently supports Eigen::MatrixXf and Eigen::MatrixXd, 
which seems like it does not cover the row-major case. I'll look into it 
and extend the interface if needed.


 b.) I have an AMD APU (CPU + GPU) and a discrete GPU. How do I force
 ViennaCl to default to the discrete GPU and not the GPU from the APU? I
 tried a kludge:
 // get all available devices
 viennacl::ocl::platform pf;
 std::cout << "Platform info: " << pf.info() << std::endl;
 std::vector<viennacl::ocl::device> devices =
  pf.devices(CL_DEVICE_TYPE_GPU); // CL_DEVICE_TYPE_DEFAULT
 std::cout << devices[1].name() << std::endl;
 std::cout << "Number of devices for custom context: "
           << devices.size() << std::endl;

 // set up context using all found devices:
 int gSize = (int)devices.size();
 for (int i = gSize-1; i >= 0; --i)
 {
   device_id_array.push_back(devices[i].id());
 }

 std::cout << "Creating context..." << std::endl;
 cl_int err;
 cl_context my_context = clCreateContext(0,
  cl_uint(device_id_array.size()), &(device_id_array[0]), NULL, NULL, &err);

 and this made it work. However, is there a cleaner way of doing this?
 This code snippet is from custom-context.cpp. If I tried
 &(device_id_array[1]), the program crashes!

http://viennacl.sourceforge.net/doc/manual-multi-device.html

Best regards,
Karli

--
___
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel


Re: [ViennaCL-devel] ViennaCL eigenvectors?

2015-07-14 Thread Karl Rupp
Hi Charles,

  Everything appears to be working correctly with the power method.  A
 followup though, is it possible to return 'all' eigenvalues and
 eigenvectors with the power algorithm?

No, it only provides the largest eigenvalue in modulus.

  The Lanczos example (in the
 feature-improve-lanczos branch) code looks like it would iterate through
 all eigenvalues and eigenvectors.  For example, with a trivial matrix of
 size 4x4 I would expect to get 4 eigenvalues and 4 eigenvectors (of
 length 4, i.e. a matrix of 16 elements).

Lanczos is typically used for obtaining estimates on the largest 
eigenvalues of a huge matrix (say, the largest 10). There is actually an 
implementation of the QR method for finding all eigenvalues of a 
symmetric matrix available, too, cf. 
https://github.com/viennacl/viennacl-dev/blob/master/examples/tutorial/qr_method.cpp
(I need to check why it does not show up in the manual)


 This leads me to another question, I didn't see a merge between the
 Lanczos algorithm.  Is that still under development or is it now
 implemented in the master branch?

Some parts currently experience a final polishing (for example, 
yesterday I eliminated the Boost dependency), so it will migrate to 
master soon. :-)

Best regards,
Karli



 On Fri, Jul 10, 2015 at 4:20 AM, Karl Rupp r...@iue.tuwien.ac.at
 mailto:r...@iue.tuwien.ac.at wrote:

 Hi Charles,

 the interface for the power iteration has been updated to also
 return eigenvectors:
 
 https://github.com/viennacl/viennacl-dev/commit/e80cc2141f266eb9b279dd45b7c4075b557bf558

 Please let me know if you run into issues.

 Best regards,
 Karli



 On 07/08/2015 06:58 PM, Charles Determan wrote:

 Greetings,

 I have seen that I can get all the eigenvalues with the lanczos
 algorithm in the lanczos.cpp example file but I don't see any
 documentation on eigenvectors.  The only thing I have found is
 on the
 feature-improve-lanczos branch
 
 https://github.com/viennacl/viennacl-dev/blob/karlrupp/feature-improve-lanczos/examples/tutorial/lanczos.cpp.
 Is this intended to be implemented or is there existing support for
 determining eigen vectors?

 A secondary question, is the same possible with the power method?

 Thanks,
 Charles


 
 --
 Don't Limit Your Business. Reach for the Cloud.
 GigeNET's Cloud Solutions provide you with the tools and support
 that
 you need to offload your IT needs and focus on growing your
 business.
 Configured For All Businesses. Start Your Cloud Today.
 https://www.gigenetcloud.com/



 ___
 ViennaCL-devel mailing list
 ViennaCL-devel@lists.sourceforge.net
 mailto:ViennaCL-devel@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/viennacl-devel





--
Don't Limit Your Business. Reach for the Cloud.
GigeNET's Cloud Solutions provide you with the tools and support that
you need to offload your IT needs and focus on growing your business.
Configured For All Businesses. Start Your Cloud Today.
https://www.gigenetcloud.com/
___
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel


Re: [ViennaCL-devel] ViennaCL eigenvectors?

2015-07-10 Thread Karl Rupp
Hi Charles,

the interface for the power iteration has been updated to also return 
eigenvectors:
https://github.com/viennacl/viennacl-dev/commit/e80cc2141f266eb9b279dd45b7c4075b557bf558

Please let me know if you run into issues.

Best regards,
Karli


On 07/08/2015 06:58 PM, Charles Determan wrote:
 Greetings,

 I have seen that I can get all the eigenvalues with the lanczos
 algorithm in the lanczos.cpp example file but I don't see any
 documentation on eigenvectors.  The only thing I have found is on the
 feature-improve-lanczos branch
 https://github.com/viennacl/viennacl-dev/blob/karlrupp/feature-improve-lanczos/examples/tutorial/lanczos.cpp.
 Is this intended to be implemented or is there existing support for
 determining eigen vectors?

 A secondary question, is the same possible with the power method?

 Thanks,
 Charles


 --
 Don't Limit Your Business. Reach for the Cloud.
 GigeNET's Cloud Solutions provide you with the tools and support that
 you need to offload your IT needs and focus on growing your business.
 Configured For All Businesses. Start Your Cloud Today.
 https://www.gigenetcloud.com/



 ___
 ViennaCL-devel mailing list
 ViennaCL-devel@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/viennacl-devel



--
Don't Limit Your Business. Reach for the Cloud.
GigeNET's Cloud Solutions provide you with the tools and support that
you need to offload your IT needs and focus on growing your business.
Configured For All Businesses. Start Your Cloud Today.
https://www.gigenetcloud.com/
___
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel


Re: [ViennaCL-devel] ViennaCL eigenvectors?

2015-07-08 Thread Karl Rupp
Hi Charles,

  I have seen that I can get all the eigenvalues with the lanczos
 algorithm in the lanczos.cpp example file but I don't see any
 documentation on eigenvectors.  The only thing I have found is on the
 feature-improve-lanczos branch
 https://github.com/viennacl/viennacl-dev/blob/karlrupp/feature-improve-lanczos/examples/tutorial/lanczos.cpp.
 Is this intended to be implemented or is there existing support for
 determining eigen vectors?

The Lanczos-code in the feature-branch is work-in-progress and will soon 
be merged to the master branch (and thus be included in 1.7.0). As you 
correctly noticed, documentation is one of the remaining todos.


 A secondary question, is the same possible with the power method?

The eigenvector is not returned yet. It certainly should allow that. 
I'll update the interface tomorrow and let you know.

Best regards,
Karli





Re: [ViennaCL-devel] NVCC compiler requirement for CUDA backend?

2015-05-26 Thread Karl Rupp
Hi Charles,

  I am new to the ViennaCL library; after going through some of the
 documentation, there is something I would like a bit of clarification on.
 Could you please confirm whether the NVCC compiler is indeed required to
 use the CUDA backend, or just for the makefiles of the examples?  I ask
 this as I have compiled CUDA programs simply with g++ previously.  If
 ViennaCL does require NVCC to use the CUDA backend, could someone kindly
 explain why this is so?

ViennaCL is a C++ header-only library, which has several implications:
  a) you can just copy & paste the source tree and get going. No 
configuration is needed, there's no shared library you need to build first.
  b) ViennaCL can provide neat operator overloads using expression templates
  c) each of your compilation units will see most (if not all) of 
ViennaCL's sources. If you enable the CUDA backend, your compiler will 
see all the kernels written in CUDA, hence it requires you to use NVCC. 
This applies not only to the examples, but also to your applications. We do 
not ship PTX code and use the CUDA driver API to get around NVCC, as 
this would be too much of a burden to maintain.

If you don't want to use NVCC, you can still interface to libviennacl, 
which compiles part of ViennaCL's functionality into a shared library. 
It is pretty much BLAS-like and focuses on dense linear algebra for now, 
so sparse linear algebra is not covered yet. Such an interface to a 
shared library is most likely how you've called CUDA-functionality in 
the past.
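To make the two routes concrete, here is a hedged sketch of the corresponding build invocations (all paths, file names, and the exact library name are illustrative placeholders, not verified commands):

```shell
# Header-only route with the CUDA backend: every translation unit that
# includes ViennaCL headers must then be compiled by nvcc.
nvcc -DVIENNACL_WITH_CUDA -I/path/to/viennacl my_app.cu -o my_app

# Alternative without nvcc: compile with g++ and link against the
# libviennacl shared library (BLAS-like interface, dense only for now).
g++ my_app.cpp -I/path/to/viennacl -L/path/to/libviennacl -lviennacl -o my_app
```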

I hope this clarifies things. Please let me know if you have further 
questions. :-)

Best regards,
Karli




Re: [ViennaCL-devel] NVCC compiler requirement for CUDA backend?

2015-05-26 Thread Karl Rupp
Hi Charles,

On 05/26/2015 03:08 PM, Charles Determan wrote:
 Thank you Karli,

 The ViennaCL library does appear to be impressive and I am excited to
 begin using it.  Just so I know I understand you clearly.  Unless I
 compile the libviennacl shared library (with dense linear algebra
 focus) and link against it, I would need to use nvcc to compile a
 program I write with viennacl with the intent to use the CUDA backend.

 Am I understanding you correctly?

Yes, this is correct.

Out of curiosity: Would you prefer to have all functionality available 
via libviennacl (using C-functions rather than C++ template stuff) instead?

Best regards,
Karli


 On Tue, May 26, 2015 at 7:47 AM, Karl Rupp r...@iue.tuwien.ac.at wrote:

 Hi Charles,


   I am new to the viennacl library after going through some of the

 documentation there was something I would like a bit of
 clarification.
 Could you please confirm whether the NVCC compiler is indeed required to
 use the
 CUDA backend or just for the makefiles of the examples?  I ask
 this as I
 have compiled CUDA programs simply with g++ previously.  If viennacl
 does require NVCC to use CUDA backend, could someone kindly
 explain why
 this is so?


 ViennaCL is a C++ header-only library, which has several implications:
   a) you can just copy & paste the source tree and get going. No
 configuration is needed, there's no shared library you need to build
 first.
   b) ViennaCL can provide neat operator overloads using expression
 templates
   c) each of your compilation units will see most (if not all) of
 ViennaCL's sources. If you enable the CUDA backend, your compiler
 will see all the kernels written in CUDA, hence it requires you to
 use NVCC. This not only includes the examples, but also your
 applications. We do not ship PTX-code and use the CUDA driver-API to
 get around NVCC, as this is too much of a burden to maintain.

 If you don't want to use NVCC, you can still interface to
 libviennacl, which compiles part of ViennaCL's functionality into a
 shared library. It is pretty much BLAS-like and focuses on dense
 linear algebra for now, so sparse linear algebra is not covered yet.
 Such an interface to a shared library is most likely how you've
 called CUDA-functionality in the past.

 I hope this clarifies things. Please let me know if you have further
 questions. :-)

 Best regards,
 Karli






Re: [ViennaCL-devel] NVCC compiler requirement for CUDA backend?

2015-05-26 Thread Karl Rupp

 My interest comes in my development of additional programs that would
 utilize ViennaCL.  I would like them to be platform independent so the
 prospect of a header-only library is very exciting to me.  I definitely
 prefer the C++ template format to the C-functions so that is fine.  The
 less linking and the less shared objects to be created the easier the
 program will be for future users to install and begin using the new
 program (future users will likely not be heavy OpenCL or even C++
 programmers).  In a perfect world, I would just like to include the
 header for the respective function (OpenCL or CUDA) and compile the program.

I see - our imaginations of a 'perfect world' are pretty similar when it 
comes to computing. ;-)

(Note that with ViennaCL you don't have to include different headers for 
the different compute backends yourself: You just
  #include <viennacl/vector.hpp>
and you get all the operations in a backend-agnostic way.)


 That said, I realize this scenario is very unlikely.  However, if a user
 has a NVIDIA GPU, they also would need to have the driver and SDK
 installed to use it appropriately.  So perhaps my next best hope would
 be to just include the header for a given CUDA function and then just
 link to the original CUDA shared library (which would/should already be
 installed in a standard location on the given OS).

 What are your thoughts on this scenario?

Currently you have to specify at compile time which backends should be 
available. That is, you have to explicitly enable OpenMP, CUDA, or 
OpenCL via the respective switches. A common way to do this is to pass, 
e.g., -DVIENNACL_WITH_OPENCL to your compiler.

While such a static selection at compile time is common practice in 
scientific computing, it doesn't follow the 'perfect world' idea: 
Depending on the enabled computing backends, different libraries (like 
libOpenCL.so or the CUDA runtime) need to be available at the client 
machine. This imposes limitations when deploying binaries, for example 
if you want to provide ready-to-run binaries for your application built 
on top of ViennaCL.

A possible improvement for ViennaCL is to switch to a plugin 
architecture similar to web browsers: ViennaCL could then be shipped and 
run with a minimum set of dependencies and dynamically load additional 
compute backends (OpenCL, CUDA, etc.) at runtime (either from a shared 
library on the file system, or just enable/disable internal backends). 
This could be even interactive through a plugin loader. Such a plugin 
system might be too 'radical' with respect to the way libraries are used 
for scientific computing, so I'm still hesitant whether this would be 
worth the effort for the future. Any input on this is of course 
appreciated ;-)

Best regards,
Karli



 On Tue, May 26, 2015 at 8:14 AM, Karl Rupp r...@iue.tuwien.ac.at wrote:

 Hi Charles,

 On 05/26/2015 03:08 PM, Charles Determan wrote:

 Thank you Karli,

 The ViennaCL library does appear to be impressive and I am
 excited to
 begin using it.  Just so I know I understand you clearly.  Unless I
 compile the libviennacl shared library (with dense linear algebra
 focus) and link against it, I would need to use nvcc to compile a
 program I write with viennacl with the intent to use the CUDA
 backend.

 Am I understanding you correctly?


 Yes, this is correct.

 Out of curiosity: Would you prefer to have all functionality
 available via libviennacl (using C-functions rather than C++
 template stuff) instead?

 Best regards,
 Karli


  On Tue, May 26, 2015 at 7:47 AM, Karl Rupp r...@iue.tuwien.ac.at wrote:

  Hi Charles,


I am new to the viennacl library after going through
 some of the

  documentation there was something I would like a bit of
  clarification.
   Could you please confirm whether the NVCC compiler is indeed
 required to
  use the
  CUDA backend or just for the makefiles of the
 examples?  I ask
  this as I
  have compiled CUDA programs simply with g++
 previously.  If viennacl
  does require NVCC to use CUDA backend, could someone kindly
  explain why
  this is so?


  ViennaCL is a C++ header-only library, which has several
 implications:
    a) you can just copy & paste the source tree and get going. No
  configuration is needed, there's no shared library you need
 to build
  first.
b) ViennaCL can provide neat operator overloads using
 expression
  templates
c) each of your compilation

[ViennaCL-devel] IWOCL 2015

2015-03-31 Thread Karl Rupp
Hi,

(slightly off-topic, but certainly of interest to users of the 
OpenCL-backend of ViennaCL)

I'd like to take the opportunity to promote the 2015 edition of the 
Intl. Workshop on OpenCL (IWOCL): http://www.iwocl.org/

The big theme for this year is SYCL, a royalty-free, cross-platform C++ 
abstraction layer for OpenCL. (I consider it to be the OpenCL response 
to NVIDIA's CUDA compiler ;-) ). Almost all major vendors will showcase 
their OpenCL-enabled hardware, resulting in an overall program with a 
nice balance of OpenCL tutorials, vendor presentations, and research 
papers. ViennaCL is actively looking into extending its support for 
mobile (smartphone, etc.) hardware, so you can certainly find some good 
future applications there. :-)

Best regards,
Karli



Re: [ViennaCL-devel] SIAM Conference - March 14-15, 2015

2015-03-03 Thread Karl Rupp
Hi Matt,

On 03/03/2015 06:26 PM, Matthew Musto wrote:
 Folks,

 I see the conference is coming up very soon and I wanted to wish you all
 good luck on your presentation.

Thanks. Since it's a poster presentation, it's more a one-to-one 
discussion rather than a 20-minute one-man show with extensive 
preparation. But hey, we even have a demo-table! :-)

 I know in past years there was a
 flurry of activity right before, preparing new benchmarks and
 highlighting recent progress.

Usually people get really into panic before supercomputing (SC) in 
November. SIAM CSE is a much more laid-back type of event - but still 
with excellent quality :-)


 This development list has been very quiet
 lately, so I thought I would prod you all for some information and/or
 some details of your presentation.

We will primarily focus on features available in the library already, 
which is more than enough to fill a poster. If we place too much 
emphasis on upcoming features, nobody is interested in the available 
functionality anymore, which would be sad.

Either way, our pipelined solvers will be featured, since we collected 
quite a lot of benchmark data for this new feature. Here's a preprint of 
our paper: http://arxiv.org/abs/1410.4054
The poster will also mention the device database, PyViennaCL and 
ViennaCLBench. Details will be covered in oral conversation :-)

Best regards,
Karli






[ViennaCL-devel] ViennaCL 1.6.2 released!

2014-12-11 Thread Karl Rupp
Dear ViennaCL users,

ViennaCL 1.6.2 is now available for download at
http://viennacl.sourceforge.net/

Most notably, this latest release provides full compatibility with the 
OpenMP 2.0 standard so that also specialized compilers on supercomputers 
can be used, resolves compilation problems with CUDA on Visual Studio, 
and further enhances the performance of pipelined iterative solvers.

A full list of changes is available here:
http://viennacl.sourceforge.net/doc/changelog.html

Best regards,
Karl Rupp



[ViennaCL-devel] ViennaCLBench 1.0.0 released

2014-12-01 Thread Karl Rupp
Dear ViennaCL users,

it is my pleasure to announce the first release of ViennaCLBench,
a benchmark GUI developed primarily by our Google Summer of Code student 
Namik Karovic. Chapeau!

The GUI extensively benchmarks the following operations:

  - Dense matrix-matrix products (GEMM)
  - Sparse matrix-vector products (SpMV - with Matrix Market browser)
  - Vector operations (AXPY)
  - Host-Device bandwidth (PCI-Express, etc.)

This provides a quick quantification of the performance achievable on a 
given machine. More operations are likely to be added in the future, 
subject to user feedback and contributions.

Webpage: http://viennaclbench.sourceforge.net/
Download & Screenshots: https://sourceforge.net/projects/viennaclbench/
Developer Repository: https://github.com/viennacl/viennaclbench-dev
License: MIT/X11

Precompiled binaries are available for Windows, Linux, and Mac OS. 
Builds from source are also reasonably convenient and documented, but 
more involved than just downloading the binaries.

With best regards,
Karl Rupp



Re: [ViennaCL-devel] PyViennaCL 1.1.0 release?

2014-11-28 Thread Karl Rupp
Hi Toby,

  could you please provide a short update on your anticipated PyViennaCL
 1.1.0 release? I'd like to know whether I should go ahead with promoting
 ViennaCL 1.6.1 plus the GUI, or whether it's worth waiting a few (less
 than 7) more days for PyViennaCL.

 I'm sorry, I've been quite snowed under recently. I won't have any time
 this weekend, but I might be able to snatch some next week, and things
 should lighten up a little thereafter. You should probably go ahead with
 ViennaCL + GUI for now. In the worst case, PyViennaCL 1.1.0 will be a
 Christmas present to the numerical world..

Ok, thanks for the update. Looks like the best path is to go ahead with 
the advertising now...

I've been holding back the GUI a little in the hope that sourceforge 
staff will change the Unix name to 'viennaclbench' as requested, but it 
seems like this has become a random process which might get fixed any 
time - or never... :-(

Best regards,
Karli



[ViennaCL-devel] PyViennaCL 1.1.0 release?

2014-11-26 Thread Karl Rupp
Hi Toby,

could you please provide a short update on your anticipated PyViennaCL 
1.1.0 release? I'd like to know whether I should go ahead with promoting 
ViennaCL 1.6.1 plus the GUI, or whether it's worth waiting a few (less 
than 7) more days for PyViennaCL.

Best regards,
Karli



Re: [ViennaCL-devel] ViennaCL Benchmark GUI 1.0.0 Release Candidate

2014-11-24 Thread Karl Rupp
Hi Phil,

  Worked well on my laptop :-)

Good to know, thanks! :-)

   A couple of suggestions:

 - Maybe use layout N-T for GEMM, or perhaps it is already possible to
 choose? In my experience, NT col-major (TN row-major) always leads to
 higher performance on GEMM.

Ah, good point. We can use all four combinations (NN, TN, NT, TT) to 
emphasize the importance of proper memory layout. That's fairly easy to add.


 - The plots were hard to read because rather small on my laptop. I would
 love to be able to make the plot fullscreen, or to display the data as a
 table when I click on a curve. I don't know how easy this is to do with
 qt-creator, though...

Yeah, this is a resolution issue. At full-HD resolution it's much better 
to view/navigate, but still it's hard to extract the x/y values. At some 
later point we definitely have to offer a data export. For the time 
being I'll try to display a label if the mouse pointer is over a data item.

 Apart from this, I'm impressed! This is very user-friendly and detailed.

Cool - congrats to Namik! :-)

Best regards,
Karli




Re: [ViennaCL-devel] ViennaCL Benchmark GUI 1.0.0 Release Candidate

2014-11-24 Thread Karl Rupp
Hi Namik,

  It worked fine on my Windows 7 desktop machine; later I will test it on
 my Windows 8 laptop and report if anything is missing.
 I guess linking Qt dynamically is ok for the time being. But we should
 think about a static build for the long run.

A static build is unfortunately not possible because WebKit+Qt cannot be 
built statically. A static build of QWebKit is officially *disabled*:
https://qt.gitorious.org/qt/qtwebkit/commit/2573bb654e49a0bfb00ced6446cacae3a41fd776


 It would reduce the program
 size significantly.

Most of the package size stems from the WebKit engine (the .dll is 30 
MB). If we really want to cut down the program size, we have to replace 
the WebKit-based MatrixMarket Browser with a static list of matrices 
(after 1.0.0 of course... ;-) ).


 A few smaller issues are still left and will be addressed tomorrow.
 @Namik: Do you have some time to fix the layout on the start screen
 for multiple OpenCL devices (see screenshot)? Adding a 'scrollable'
 property might do the trick already...


 I thought I fixed that already... Damn, this cross-platform widget
 customizing is so annoying. I think I'll just revert back to using
 native widget styles...

Be careful with global changes at this point ;-) I think that it will 
suffice if you add a scrollbar to the widget on overflow.


  - The plots were hard to read because rather small on my laptop. I would
  love to be able to make the plot fullscreen, or to display the data as a
  table when I click on a curve. I don't know how easy this is to do with
  qt-creator, though...
 Yeah, this is a resolution issue. At full-HD resolution it's much better
 to view/navigate, but still it's hard to extract the x/y values. At some
 later point we definitely have to offer a data export. For the time
 being I'll try to display a label if the mouse pointer is over a
 data item.


 A fullscreen option is a good idea indeed. And not too hard to implement.
 I'll see what I can do.
 As for displaying a label on mouseover, it's not as easy as it may seem.
 The plotting library we use doesn't provide that functionality
 out-of-the-box, so we'll pretty much have to make our own solution. It's
 not impossible, but not easy either. I've tried to do it during GSoC,
 but there were more important things to do back then. Here's a nice
 explanation on how it can be achieved:

 http://www.qcustomplot.com/index.php/support/forum/54

I found a similar discussion yesterday on the web, including some more 
code snippets. If there's time left this afternoon, I'll look into it.

Best regards,
Karli




Re: [ViennaCL-devel] ViennaCL Benchmark GUI 1.0.0 Release Candidate

2014-11-24 Thread Karl Rupp
Hi Namik,

  Alright, I'll try with a scrollbar then.


 I've added the scrollbar, and went back to using native widget styles in
 the home screen. Here's a screenshot:

 http://pokit.org/get/img/e1f2f00e2c54701eba99811744a37369.jpg

Thanks, looks good (also on Linux).


 I really don't like those barely visible groupbox borders, which is why
 I tried customizing them. Unfortunately, my customization does not work
 so well on other platforms. I think it would be better to go with native
 looks if we want to avoid compatibility troubles. At least for now.

Yep. I find the native look more appealing than the customized look on 
Linux, and it also fixed layout issues :-) Seems like optimizing the 
look for one platform will make things worse on others...

Best regards,
Karli



 On Mon, Nov 24, 2014 at 11:45 AM, Namik Karovic namik.karo...@gmail.com wrote:

 Hi Karl,

 A static build is unfortunately not possible because WebKit+Qt
 cannot be built statically. A static build of QWebKit is
 officially *disabled*:

 
  https://qt.gitorious.org/qt/qtwebkit/commit/2573bb654e49a0bfb00ced6446cacae3a41fd776

 I did not know that.

 Most of the package size stems from the WebKit engine (the .dll
 is 30 MB). If we really want to cut down the program size, we
 have to replace the WebKit-based MatrixMarket Browser with a
 static list of matrices (after 1.0.0 of course... ;-) ).


 Yeah, I guess that the best thing to do in this case.

 Be careful with global changes at this point ;-) I think that it
 will suffice if you add a scrollbar to the widget on overflow.


 Alright, I'll try with a scrollbar then.

 Regards, Namik

 On Mon, Nov 24, 2014 at 11:35 AM, Karl Rupp r...@iue.tuwien.ac.at wrote:

 Hi Namik,

  It worked fine on my Windows 7 desktop machine; later I will test 
 it on

 my Windows 8 laptop and report if anything is missing.
 I guess linking Qt dynamically is ok for the time being. But
 we should
 think about a static build for the long run.


 A static build is unfortunately not possible because WebKit+Qt
 cannot be built statically. A static build of QWebKit is
 officially *disabled*:
 
  https://qt.gitorious.org/qt/qtwebkit/commit/2573bb654e49a0bfb00ced6446cacae3a41fd776


 It would reduce the program
 size significantly.


 Most of the package size stems from the WebKit engine (the .dll
 is 30 MB). If we really want to cut down the program size, we
 have to replace the WebKit-based MatrixMarket Browser with a
 static list of matrices (after 1.0.0 of course... ;-) ).


  A few smaller issues are still left and will be
 addressed tomorrow.
  @Namik: Do you have some time to fix the layout on the
 start screen
  for multiple OpenCL devices (see screenshot)? Adding a
 'scrollable'
  property might do the trick already...


 I thought I fixed that already... Damn, this cross-platform
 widget
 customizing is so annoying. I think I'll just revert back to
 using
 native widget styles...


 Be careful with global changes at this point ;-) I think that it
 will suffice if you add a scrollbar to the widget on overflow.


   - The plots were hard to read because rather small on
 my laptop. I would
   love to be able to make the plot fullscreen, or to
 display the data as a
   table when I click on a curve. I don't know how easy
 this is to do with
   qt-creator, though...
  Yeah, this is a resolution issue. At full-HD resolution
 it's much better
  to view/navigate, but still it's hard to extract the
 x/y values. At some
  later point we definitely have to offer a data export.
 For the time
  being I'll try to display a label if the mouse pointer
 is over a
  data item.


  A fullscreen option is a good idea indeed. And not too hard to
 implement.
 I'll see what I can do.
 As for displaying a label on mouseover, it's not as easy as
 it may seem.
 The plotting library we use doesn't provide that functionality
 out-of-the-box, so we'll pretty much have to make our own
 solution. It's
 not impossible

Re: [ViennaCL-devel] ViennaCL Benchmark GUI 1.0.0 Release Candidate

2014-11-24 Thread Karl Rupp
Hi everybody,

just a quick status update:

- All the feedback on the portability of the prebuilt binaries has been 
positive (various Linux distros, Win 7, Win 8), so it seems like the 
deployment works out okay.

- I requested a change of http://viennaclbenchmark.sourceforge.net/ to
   http://viennaclbench.sourceforge.net/ (shorter project name).
   It's also more in line with e.g. 'Luxmark' or 'CineBench'... ;-)

- Pimping the splash-screen and the logo is almost done.

- Possibly we will also start with a release for MacOS X. Need to play 
with it.

- Namik completed the Download & Run feature in the MatrixMarket browser :-)

Looks like we are good for a nice release tomorrow. Further testing 
still appreciated ;-)

Best regards,
Karli


On 11/23/2014 09:51 PM, Karl Rupp wrote:
 Hi guys,

 a release candidate for the benchmark GUI is available for download, I'd
 appreciate any testing - particularly the Windows version:

 ** Windows **
 http://viennaclbenchmark.sourceforge.net/ViennaCLBenchmark-1.0.0-RC.zip

 This is a self-contained package which is ready to launch after
 unzipping. It only requires OpenCL to be installed system-wide or to be
 available in the PATH environment variable.

 ** Linux **
 http://viennaclbenchmark.sourceforge.net/ViennaCLBenchmark-1.0.0-Linux-RC.gz


 Requires qt4 to be available on the system (Ubuntu: apt-get install
 libqt4, Arch Linux: pacman -Su qt4). On some distributions the webkit
 component needs to be installed separately (Ubuntu: apt-get install
 libqtwebkit4, Arch Linux: pacman -Su qtwebkit). Make sure libOpenCL.so
 can be found system-wide, or run ldconfig accordingly.


 A few smaller issues are still left and will be addressed tomorrow.
 @Namik: Do you have some time to fix the layout on the start screen for
 multiple OpenCL devices (see screenshot)? Adding a 'scrollable' property
 might do the trick already...

 Best regards,
 Karli







[ViennaCL-devel] ViennaCL Benchmark GUI 1.0.0 Release Candidate

2014-11-23 Thread Karl Rupp

Hi guys,

a release candidate for the benchmark GUI is available for download, I'd 
appreciate any testing - particularly the Windows version:


** Windows **
http://viennaclbenchmark.sourceforge.net/ViennaCLBenchmark-1.0.0-RC.zip

This is a self-contained package which is ready to launch after 
unzipping. It only requires OpenCL to be installed system-wide or to be 
available in the PATH environment variable.


** Linux **
http://viennaclbenchmark.sourceforge.net/ViennaCLBenchmark-1.0.0-Linux-RC.gz

Requires qt4 to be available on the system (Ubuntu: apt-get install 
libqt4, Arch Linux: pacman -Su qt4). On some distributions the webkit 
component needs to be installed separately (Ubuntu: apt-get install 
libqtwebkit4, Arch Linux: pacman -Su qtwebkit). Make sure libOpenCL.so 
can be found system-wide, or run ldconfig accordingly.



A few smaller issues are still left and will be addressed tomorrow. 
@Namik: Do you have some time to fix the layout on the start screen for 
multiple OpenCL devices (see screenshot)? Adding a 'scrollable' property 
might do the trick already...


Best regards,
Karli


Re: [ViennaCL-devel] Benchmark GUI - 1.0.0 TODOs

2014-11-22 Thread Karl Rupp
/index.html. I've
 never used it before, but considering the good documentation I don't think
 it'll be a problem.


 Regards,
 Namik

 On Tue, Nov 18, 2014 at 10:46 AM, Karl Rupp r...@iue.tuwien.ac.at
 mailto:r...@iue.tuwien.ac.at wrote:

 Hi Namik,

 thanks for the list. I'll comment on the items inline, particularly
 with having the aim of a release in the near future in mind.

  -advanced mode: reset (load default) benchmark settings button

 nice to have, but not a blocker :-)


 -advanced mode: it would be cool if we could show how much video
 memory
 is required to complete the benchmark. This could be very useful
 when
 playing with custom settings as we would know in advance if we
 set the
 matrix size too high, thus preventing the program from crashing.


 this is not as easy as it may seem. Even if the memory consumed by
 the benchmark is below the physical memory provided, the GPU memory
 may be occupied otherwise. This is particularly an issue with mobile
 GPUs attached to 4K displays, where the screen buffer occupies a
 decent amount of GPU RAM. I think the best we can do is to warn a
 user, e.g.
 The benchmark configuration you provided exceeds XXX (256?) MB of
 GPU RAM. This may exhaust your available GPU RAM. Do you want to
 continue? [Continue]/[Cancel]
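
 Such a pre-flight check could be sketched as follows (all names and the
 dense-GEMM assumption are hypothetical, not ViennaCL API): estimate the
 footprint of the benchmark's matrices and compare it against the warning
 threshold suggested above.

```cpp
// Hypothetical sketch: a dense GEMM benchmark needs three NxN matrices;
// estimate their combined footprint in MB.
double gemm_memory_mb(long n, int bytes_per_entry = 4, int num_matrices = 3) {
    return double(num_matrices) * n * n * bytes_per_entry / (1024.0 * 1024.0);
}

// Warn the user when the estimate exceeds the suggested 256 MB threshold.
bool should_warn(long n, double threshold_mb = 256.0) {
    return gemm_memory_mb(n) > threshold_mb;
}
```

 For single precision this warns at roughly N = 4800 and above, which would
 trigger the [Continue]/[Cancel] dialog before the benchmark starts.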


 -which makes me think a certain safety switch could also be a nice
 addition, one that would prevent users from starting a benchmark
 if the
 settings are set too high.


 I think the warning dialog above would take care of this. :-)


 -vector benchmark progress update is slacking; it should be
 updated with
 each completed step instead of going from 0% to 100% in one step
 (the
 benchmarking loop needs to be altered a little to fix this)


 Fine with me. Can you fix this?
 Also, I noticed that a repeated reset of the x-axis leads to ugly
 fonts for the labels, so the axis should only be set once when
 starting the benchmark. See screenshot attached, including some
 other things...
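
 A minimal sketch of the per-step progress fix (names hypothetical; the real
 loop lives in the GUI's benchmark controller): emit a progress percentage
 after every completed step instead of jumping from 0% to 100% at the end.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical sketch: run one timed step per vector size and report
// progress after each step via a callback (e.g. a Qt signal in the GUI).
std::vector<double> run_vector_benchmark(std::vector<long> const & sizes,
                                         std::function<void(int)> const & emit_progress) {
    std::vector<double> results;
    for (std::size_t i = 0; i < sizes.size(); ++i) {
        results.push_back(2.0 * sizes[i]);  // stand-in for the timed kernel run
        emit_progress(static_cast<int>(100 * (i + 1) / sizes.size()));  // per-step update
    }
    return results;
}
```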


 -benchmarking with a custom sparse matrix now works with some
 matrices,
 but not all; at least now it doesn't crash the program...


 This is presumably due to the matrix market format. Matrices with
 scalar type 'complex' or with pattern type 'pattern' are not
 supported. Can you add an error message stating this? Something like
 The matrix market reader cannot read the provided file. Only
 real-valued matrices in coordinate-format are supported.
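
 The check behind such an error message could be sketched as follows
 (hypothetical helper, not the actual reader code): parse the MatrixMarket
 banner line and reject the 'complex' and 'pattern' fields.

```cpp
#include <sstream>
#include <string>

// Hypothetical sketch: inspect the banner line of a matrix market file,
// e.g. "%%MatrixMarket matrix coordinate real general", and return an
// error message for the unsupported cases described above.
std::string check_mtx_header(std::string const & banner_line) {
    std::istringstream iss(banner_line);
    std::string banner, object, format, field;
    if (!(iss >> banner >> object >> format >> field) || banner != "%%MatrixMarket")
        return "The provided file is not a valid matrix market file.";
    if (format != "coordinate" || field == "complex" || field == "pattern")
        return "The matrix market reader cannot read the provided file. "
               "Only real-valued matrices in coordinate format are supported.";
    return "";  // empty string: header is acceptable
}
```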


 -my results screen needs significant improvements: more
 benchmark data
 should be shown, auto-refresh when a result gets saved,
 enable/disable
 result saving, delete selected/all results, add user notes, maybe a
 dynamic filter to search through result notes...


 Can we disable the 'my results' screen for the first release? I
 think it requires too much work *now*, but can be picked up any time
 later.


 -result uploading: it's completely missing


 Disable that for the 1.0.0 release. The infrastructure is not yet
 ready - in particular, result uploads are only interesting once we
 have autotuning features in the GUI.


 -result database screen: also completely missing


 Disable :-)


 -system info screen feels kinda weird. Is it just me or maybe
 it could
 use some improvements?


 Well, at least it is informative. ;-)
 It would be great if you can make it a bit more appealing, but
 visual appearance is not a top priority here I'd say...


 -matrixmarket: reconnect the download & run functionality now that
 benchmarking with custom sparse matrices does not crash the program.
 Also, show the achieved result inside the matrixmarket table.
 The idea
 is to make it work like this: you find a nice matrix, hit download &
 run, the program downloads the matrix, & runs a benchmark with your
 matrix, and shows the result next to the download & run button. It
 would be
 good if we could avoid switching to the benchmark screen during
 execution, and make it so that users don't need to leave the
 matrixmarket screen at all.


 Hmm, I think it is interesting for the user to see which sparse
 matrix type works best for the particular sparse matrix. Unless we
 can provide the full results from the 'sparse' tab in the
 MatrixMarket, I'd rather refrain from it for now.


 -some minor face-lifting procedures: Qt isn't very friendly
 when it
 comes to customizing the looks of widgets. That's why some
 buttons and
 tabs might look a bit wrong (depending on your platform). This
 isn't a
 significant issue, but still

Re: [ViennaCL-devel] Release schedule and project hosting

2014-11-16 Thread Karl Rupp
Hi Namik,

  2.) We currently use viennacl.sourceforge.net
 http://viennacl.sourceforge.net/ as a common domain.
 However, I wonder whether it would be better to use three separate domains for
 better managing the projects? What about using e.g.
 pyviennacl.sourceforge.net
 viennaclbench.sourceforge.net
 for PyViennaCL and the benchmark GUI? Any thoughts?


 This is fine by me.

 Before continuing work on the benchmark GUI, I'll have to go through the
 project to see what's missing. Tomorrow I'll report on the missing parts
 and things that need to be fixed. We can then decide what should be done
 for the first release so I can get to work.

great to hear from you again ;-)
FYI: I've got a few changes to the benchmark set in the pipe; I hope to 
be able to push them before tomorrow afternoon. Otherwise, this sounds 
great.

Best regards,
Karli


--


Re: [ViennaCL-devel] Release schedule and project hosting

2014-11-16 Thread Karl Rupp
Hi Toby,

  1.) Toby, how's the status with PyViennaCL? What is missing for a 
release?

 No code is missing, but some documentation is, mostly finishing off the
 new parts added during GSoC. I'm slowly making progress, but I will
 probably need another 7-10 days for this, as busy as I currently am! I
 wonder if it might be better to release PyViennaCL with ViennaCL 1.6.1,
 which I can imagine being ready before I am.

yeah, having pyViennaCL based on ViennaCL 1.6.1 makes sense, as we found 
(and eliminated ;-) ) a couple of nasty bugs since the 1.6.0 release.


 2.) We currently use viennacl.sourceforge.net as a common domain.
 However, I wonder whether it would be better to use three separate domains for
 better managing the projects? What about using e.g.
pyviennacl.sourceforge.net
viennaclbench.sourceforge.net
 for PyViennaCL and the benchmark GUI? Any thoughts?

 I'm quite happy with this. Philippe was talking recently in terms of a
 'ViennaCL ecosystem', and I think this follows that pattern.

Fine - so it seems like we have a majority here :-)

Best regards,
Karli


--


[ViennaCL-devel] Release schedule and project hosting

2014-11-14 Thread Karl Rupp
Hi,

we should push our next wave of releases out next week and have some 
coordinated promotion. The respective releases are:

* PyViennaCL 1.1.0 (am I correct?)
* ViennaCL 1.6.1
* ViennaCL Benchmark GUI 1.0.0

This brings up two important questions:

1.) Toby, how's the status with PyViennaCL? What is missing for a release?

2.) We currently use viennacl.sourceforge.net as a common domain. 
However, I wonder whether it would be better to use three separate domains for 
better managing the projects? What about using e.g.
  pyviennacl.sourceforge.net
  viennaclbench.sourceforge.net
for PyViennaCL and the benchmark GUI? Any thoughts?

Best regards,
Karli

--


Re: [ViennaCL-devel] Error on ViennaCL build with CUDA

2014-11-12 Thread Karl Rupp
Hi Aanchan,

yes, this is a Boost-related problem. Since the error occurs only in the 
tests, I recommend disabling the tests by disabling ENABLE_TESTING in 
CMake (e.g. through the command line: cmake .. -DENABLE_TESTING=Off)

A student once suggested to replace assert.h in Boost with the version 
from 1.51:
https://github.com/imvu/boost/blob/master/boost/assert.hpp
I haven't tried this myself, but it may work.

Best regards,
Karli



On 11/12/2014 07:03 PM, Aanchan mohan wrote:
 Hi everyone,

 I tried compiling ViennaCL(checked out of the Github dev repo this
 morning EST) with ENABLE_CUDA (and disabling OpenCL). A compile error
 gets thrown when compiling blas3_solve.cu under
 the test directory. The screen output of just that chunk is
 here: http://pastebin.com/TN4dmcTB. Is the error coming from Boost's
 assert.hpp? Any suggestions to rectify the error would be appreciated.

 Regards,
 Aanchan


--


Re: [ViennaCL-devel] Error on ViennaCL build with CUDA

2014-11-12 Thread Karl Rupp
Hi Aanchan,

  Thanks for that e-mail. To report further investigation:

 As you mentioned the problem was coming from array.hpp in Boost. Take a
 look here: https://svn.boost.org/trac/boost/ticket/9392. The issue has
 apparently been rectified in CUDA 6.5.

 I did something super-hacky. The issue is with the preprocessor
 BOOST_NOINLINE defined in boost/config/suffix.hpp. I did a
 #undef BOOST_NOINLINE, and did a #define BOOST_NOINLINE not followed by
 anything. I got a few warnings, but the build compiled. Not sure if that
 was the best thing to do for forward compatibility.
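
 The mechanism of that hack can be illustrated with a self-contained
 stand-in (LIB_NOINLINE here plays the role of BOOST_NOINLINE; this is not
 the Boost code itself):

```cpp
// A library-style macro that expands to a compiler attribute which some
// device compilers (e.g. older nvcc) reject in certain positions:
#define LIB_NOINLINE __attribute__((noinline))

// The workaround described above: undefine it, then redefine it as empty,
// so declarations annotated with it still parse (losing the attribute).
#undef LIB_NOINLINE
#define LIB_NOINLINE

LIB_NOINLINE int answer() { return 42; }
```

 With Boost, the same two preprocessor lines would have to take effect
 before the offending Boost headers are included, which is exactly why the
 approach is fragile with respect to forward compatibility.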

Thanks for the hint.

 It might be safer to
 disable testing with -DENABLE_TESTING=Off.

ENABLE_TESTING is disabled by default in our release tarballs. If we 
disable the test targets in our developer repository, compilation 
problems won't get caught quickly anymore, so I'd rather not change that.

Best regards,
Karli



 On Wed, Nov 12, 2014 at 3:00 PM, Karl Rupp r...@iue.tuwien.ac.at
 mailto:r...@iue.tuwien.ac.at wrote:

 Hi Aanchan,

 yes, this is a Boost-related problem. Since the error occurs only in
 the tests, I recommend disabling the tests by disabling
 ENABLE_TESTING in CMake (e.g. through the command line: cmake ..
 -DENABLE_TESTING=Off)

 A student once suggested to replace assert.h in Boost with the
 version from 1.51:
 https://github.com/imvu/boost/blob/master/boost/assert.hpp
 I haven't tried this myself, but it may work.

 Best regards,
 Karli



 On 11/12/2014 07:03 PM, Aanchan mohan wrote:

 Hi everyone,

 I tried compiling ViennaCL(checked out of the Github dev repo this
 morning EST) with ENABLE_CUDA (and disabling OpenCL). A compile
 error
 gets thrown when compiling blas3_solve.cu under
 the test directory. The screen output of just that chunk is
 here: http://pastebin.com/TN4dmcTB. Is the error coming from Boost's
 assert.hpp? Any suggestions to rectify the error would be
 appreciated.

 Regards,
 Aanchan





--


Re: [ViennaCL-devel] Roadmap update

2014-11-10 Thread Karl Rupp
Hey,

  I've updated our roadmap taking into account the latest release:
 https://github.com/viennacl/viennacl-dev/wiki/ViennaCL-Roadmap
 Feel free to add your topics and post your wishes :-)


 Awesome! Is it like a Christmas present list? Can we post any wish? I'd
 like a pony, actually. :D

Haha - send me your address and I'll order a hairdresser for you :-P



 The 1.6.1 release is scheduled for the week November 17-21, for which we
 will provide a new fast kernel right when it is presented at the
 Supercomputing conference.


 I had the hope I could get my hands on a GTX970 or GTX980, but I
 wasn't able to. If any developer has access to such hardware, it would be
 great to let us know, so that we can get optimized kernels for this
 hardware, and possibly compare against cuBLAS, before SC14.

The GTX 970 and 980 are again limited by firmware for double precision, 
so these GPUs are not that interesting for GPGPU. Also, we already have 
a profile for the GTX 750 Ti, which is Maxwell and detected as such 
(hence the profile gets reused for the 970 and 980). What we don't have 
at the moment is a profile for the GTX Titan (with no good fallback), so 
I consider this more urgent.



 My personal main goal for 1.7.0 is to reduce the use of Boost.uBLAS as
 much as possible and to have a fast, entirely GPU-based AMG
 preconditioner (similar to what is in CUSP). At the same time, I'd like
 to promote shorter release cycles: 1.6.0 was released about a year after
 1.5.0, which keeps quite a number of completed features stuck in the
 pipeline for too long.


 I've added mine. Rather modest: better auto-tuning and more devices
 supported. I am directing my efforts towards my specialization for
 dense BLAS on OpenCL, which will hopefully get integrated in the 2.0.0
 release.

Makes sense, thanks!


 Maybe there will be a 1.8.0 release as well, which will still follow the
 current header-only model. However, we may also switch to ViennaCL 2.0.0
 right after the 1.7.x series in order to better target languages other
 than C++ (most notably C and Fortran due to their widespread use in
 HPC).


 I will post what I think is reasonable, although most of my thoughts go
 towards ViennaCL 2.0. As I said I have started today to rewrite the
 OpenCL layer of ViennaCL using CL/cl.hpp and dynamic layout + datatype
 (the rationale behind this choice is that OpenCL is already not
 type-safe anyway, and so clAmdBlas is not type-safe either). It will be
 interesting to see the influence it will have on the compilation time.

Btw: Please be considerate with MacOS and don't forget the necessary 
#ifdef when including the OpenCL headers... :-)
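
The guard in question usually looks like this (the real includes are shown 
as comments so the sketch compiles without an OpenCL SDK installed):

```cpp
// On MacOS the OpenCL header lives under a different path than elsewhere:
#ifdef __APPLE__
// #include <OpenCL/cl.h>
static const char * opencl_header = "OpenCL/cl.h";
#else
// #include <CL/cl.h>
static const char * opencl_header = "CL/cl.h";
#endif
```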

Best regards,
Karli


--


Re: [ViennaCL-devel] More weird problems

2014-11-08 Thread Karl Rupp
Hi Toby,

  Thanks! The numerical errors with element-wise operations such as tan()
 or sin() look okay, that's just numerical noise. The following test
 cases deserve a closer look, though:

 test_matrix_matrix_trans_isub_C_float32
 test_matrix_matrix_slice_trans_isub_C_float32
 test_matrix_matrix_trans_isub_F_float32
 test_matrix_range_matrix_slice_trans_iadd_C_float32
 test_matrix_slice_matrix_slice_trans_iadd_C_float32
 test_matrix_slice_matrix_trans_iadd_F_float32
 test_matrix_matrix_range_trans_isub_C_float32
 test_matrix_slice_matrix_trans_iadd_C_float32
 test_matrix_matrix_slice_trans_isub_F_float32
 test_matrix_matrix_range_trans_iadd_C_float32
 test_matrix_range_matrix_trans_isub_C_float32
 test_matrix_slice_matrix_slice_trans_isub_C_float32
 test_matrix_slice_matrix_trans_isub_F_float32
 test_matrix_matrix_range_trans_iadd_F_float32
 test_matrix_slice_matrix_trans_isub_C_float32
 test_matrix_matrix_slice_trans_iadd_C_float32
 test_matrix_range_matrix_range_trans_isub_C_float32
 test_matrix_range_matrix_trans_isub_F_float32
 test_matrix_range_matrix_slice_trans_isub_C_float32
 test_matrix_slice_matrix_range_trans_iadd_F_float32
 test_matrix_range_matrix_range_trans_iadd_C_float32
 test_matrix_slice_matrix_range_trans_isub_F_float32
 test_matrix_range_matrix_trans_iadd_C_float32
 test_matrix_slice_matrix_slice_trans_iadd_F_float32
 test_matrix_range_matrix_trans_iadd_F_float32
 test_matrix_slice_matrix_range_trans_iadd_C_float32
 test_matrix_matrix_trans_iadd_F_float32
 test_matrix_matrix_trans_iadd_C_float32
 test_matrix_slice_matrix_slice_trans_isub_F_float32
 test_matrix_range_matrix_slice_trans_isub_F_float32
 test_matrix_matrix_range_trans_isub_F_float32
 test_matrix_range_matrix_range_trans_isub_F_float32
 test_matrix_slice_matrix_range_trans_isub_C_float32
 test_matrix_range_matrix_range_trans_iadd_F_float32

 Apparently they all belong to the same family of operations. Can you
 please help me with the deciphering? Which operations correspond to the
 test cases above? (I could guess, but I may be wrong...) iadd and isub
 refer to += and -=?

 Yep. So 'C' and 'F' mean C (row-major) layout or Fortran (col-major) layout.
 Your guess was right about 'iadd' and 'isub': these are simply A += B
 and A -= B. The values of A and B are given by the _matrix_ bits: the
 first one describes A, and the second describes B. So
 test_matrix_slice_matrix_range_trans_isub_C_float32 means

matrix_slice -= matrix_range.T

 where both are C-layout and single precision.
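
 For concreteness, a small plain-C++ sketch (stand-in arrays, not ViennaCL
 types) of the operation encoded by
 test_matrix_slice_matrix_range_trans_isub_C_float32, i.e.
 matrix_slice -= matrix_range.T in row-major single precision:

```cpp
// A_slice -= B_range^T, where the slice takes every second row and column
// of a 6x6 row-major float matrix A, and B stands in for a 3x3 range.
void slice_trans_isub(float A[6][6], const float B[3][3]) {
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            A[2 * i][2 * j] -= B[j][i];  // B[j][i] rather than B[i][j]: the transpose
}
```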

okay, these should be working now, I've pushed a fix. :-)
Still need to look into the GMRES issue...

Best regards,
Karli


--

