Hi Karl, the kernel is below. Regarding your second point, I would
love to process all columns in one kernel, but I want to avoid allocating
a second matrix of the same size. Instead, I allocate only a vector whose
length equals the number of rows, fill it with one reordered column, and
assign it back to the source matrix. Ideally I would like to do the
reordering 'in place', so that I only have to pass the row indices (as
ints) in the order I want them.
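
To make the column-at-a-time idea concrete, here is a minimal host-side sketch in plain C++ (not ViennaCL code). The function name `reorder_rows_by_column`, the row-major layout, and the `stride` parameter (padded row length) are assumptions for illustration only; the point is just that the extra storage is one column, not a second matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: reorder the rows of a row-major matrix one column
// at a time. `stride` is the padded row length (assumed >= cols), so
// element (r, c) lives at A[r * stride + c]. Only a scratch vector of
// `rows` elements is allocated, never a second matrix.
void reorder_rows_by_column(std::vector<double>& A,
                            std::size_t rows, std::size_t cols,
                            std::size_t stride,
                            const std::vector<int>& indices) {
    assert(indices.size() == rows);
    assert(stride >= cols && A.size() >= rows * stride);

    std::vector<double> scratch(rows);  // one column's worth of storage
    for (std::size_t c = 0; c < cols; ++c) {
        // Gather: new row r takes its value from old row indices[r].
        for (std::size_t r = 0; r < rows; ++r) {
            scratch[r] = A[static_cast<std::size_t>(indices[r]) * stride + c];
        }
        // Write the column back only after the gather is complete, so no
        // element is overwritten before it has been read.
        for (std::size_t r = 0; r < rows; ++r) {
            A[r * stride + c] = scratch[r];
        }
    }
}
```

The trade-off is one pass (on the device, one kernel launch plus one assignment) per column in exchange for O(rows) rather than O(rows * cols) extra memory.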

__kernel void set_row_order(
    __global const double *A, __global double *B,
    __global const int *indices,
    const int Mdim, const int globalCol, const int MdimPad) {

    // Row handled by this work item
    const int globalRow = get_global_id(0);
    // The column is passed in as an argument rather than taken from a
    // second work dimension
    // const int globalCol = get_global_id(1);

    // Gather one element of the reordered column. Valid rows are
    // 0 .. Mdim-1, so work items in the padded range do nothing.
    if (globalRow < Mdim) {
        B[globalRow] = A[indices[globalRow] * MdimPad + globalCol];
    }
}
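
The guard in the kernel matters because the global work size is the padded row count, which can be larger than the number of real rows. Below is a small CPU emulation of one single-column launch, as a sketch only: `emulate_column_gather` and its parameters are hypothetical names, and row-major storage with a padded row stride is assumed. Note that the bound is strict, since the output holds exactly `rows` elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU stand-in for one launch of a single-column gather.
// `global_size` plays the role of the (padded) global work size and may
// be larger than `rows`; `stride` is the padded row length of row-major A.
std::vector<double> emulate_column_gather(const std::vector<double>& A,
                                          const std::vector<int>& indices,
                                          std::size_t rows,
                                          std::size_t col,
                                          std::size_t stride,
                                          std::size_t global_size) {
    assert(indices.size() == rows);
    std::vector<double> B(rows);
    // One loop iteration per emulated work item.
    for (std::size_t gid = 0; gid < global_size; ++gid) {
        // Strict bound: valid rows are 0 .. rows-1, so work items in the
        // padded tail (gid >= rows) must do nothing.
        if (gid < rows) {
            // .at() throws std::out_of_range on a bad index instead of
            // silently touching memory outside the buffers.
            const std::size_t src =
                static_cast<std::size_t>(indices.at(gid)) * stride + col;
            B.at(gid) = A.at(src);
        }
    }
    return B;
}
```

With `gid <= rows` instead, the work item at `gid == rows` would index one element past the end of both `indices` and `B`; in this emulation `.at()` would throw, whereas on a device the access would simply be out of bounds.
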
On Thu, Dec 15, 2016 at 5:18 AM, Karl Rupp <r...@iue.tuwien.ac.at> wrote:
> Hi Charles,
>
> can you please send us the kernel? Maybe there's something wrong with the
> thread assignment there.
>
> Also, rather than looping from 0 to P-1, it would make much more sense to
> process all columns in parallel in a single kernel.
>
> Best regards,
> Karli
>
>
> On 12/14/2016 06:01 PM, Charles Determan wrote:
>
>> A quick addition: it also only seems to crash when the number of rows in
>> the input matrix matches or exceeds 1000 (i.e. it works with the trivial
>> example with 100 rows).
>>
>> Charles
>>
>> On Wed, Dec 14, 2016 at 10:55 AM, Charles Determan
>> <cdeterma...@gmail.com> wrote:
>>
>> I have a function where I use a custom OpenCL kernel. The function
>> is below. It runs without problem and provides the correct result
>> the *first time* I call it. However, if I call the function again,
>> it crashes right after the 'initialized' output, where it is trying
>> to add the kernel program. Any idea why it would be crashing here
>> on subsequent calls? Is there some cleanup I should be doing at the
>> end of this function?
>>
>> Thanks,
>> Charles
>>
>>
>> template<typename T>
>> void
>> cpp_vclMatrix_set_row_order(
>>     SEXP ptrA_,
>>     const bool AisVCL,
>>     Eigen::VectorXi indices,
>>     SEXP sourceCode_,
>>     const int max_local_size,
>>     const int ctx_id)
>> {
>>
>>     std::cout << "called" << std::endl;
>>
>>     std::string my_kernel = as<std::string>(sourceCode_);
>>
>>     viennacl::ocl::context ctx(viennacl::ocl::get_context(ctx_id));
>>
>>     viennacl::matrix<T> *vcl_A;
>>     // viennacl::matrix<T> *vcl_B;
>>
>>     std::cout << "getting matrix" << std::endl;
>>     vcl_A = getVCLptr<T>(ptrA_, AisVCL, ctx_id);
>>     // vcl_B = getVCLptr<T>(ptrB_, BisVCL, ctx_id);
>>
>>     unsigned int M = vcl_A->size1();
>>     // // int N = vcl_B.size1();
>>     unsigned int P = vcl_A->size2();
>>     unsigned int M_internal = vcl_A->internal_size1();
>>     unsigned int P_internal = vcl_A->internal_size2();
>>
>>     std::cout << "initialized" << std::endl;
>>
>>     // add kernel to program
>>     viennacl::ocl::program & my_prog = ctx.add_program(my_kernel,
>>                                                        "my_kernel");
>>
>>     std::cout << "program added" << std::endl;
>>
>>     // get compiled kernel function
>>     viennacl::ocl::kernel & set_row_order =
>>         my_prog.get_kernel("set_row_order");
>>
>>     std::cout << "got kernel" << std::endl;
>>
>>     // set global work sizes
>>     set_row_order.global_work_size(0, M_internal);
>>     set_row_order.global_work_size(1, P_internal);
>>
>>     std::cout << "set global" << std::endl;
>>
>>     // set local work sizes
>>     set_row_order.local_work_size(0, max_local_size);
>>     set_row_order.local_work_size(1, max_local_size);
>>
>>     std::cout << "begin enqueue" << std::endl;
>>
>>     {
>>
>>         std::cout << "moving indexes" << std::endl;
>>         viennacl::vector<int> vcl_I(indices.size());
>>         viennacl::copy(indices, vcl_I);
>>
>>         std::cout << "creating dummy vector" << std::endl;
>>         viennacl::vector<T> vcl_V = viennacl::zero_vector<T>(M);
>>
>>         viennacl::matrix_base<T> vcl_B(vcl_V.handle(),
>>                                        M, 0, 1, M,   // row layout
>>                                        1, 0, 1, 1,   // column layout
>>                                        true);        // row-major
>>
>>         viennacl::range r(0, M);
>>
>>         for (unsigned int i = 0; i < P; i++) {
>>
>>             viennacl::range c(i, i+1);
>>
>>             viennacl::matrix_range<viennacl::matrix<T> > tmp(*vcl_A,
>>                                                              r, c);
>>
>>             // std::cout << tmp << std::endl;
>>
>>             viennacl::ocl::enqueue(set_row_order(tmp, vcl_B, vcl_I,
>>                                                  M, i, M_internal));
>>
>>             tmp = vcl_B;
>>         }
>>     }
>> }
>>
>> _______________________________________________
>> ViennaCL-devel mailing list
>> ViennaCL-devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/viennacl-devel
>>
>>
>