Re: [petsc-dev] Not possible to do a VecPlaceArray for veccusp

2016-03-10 Thread Jose E. Roman

> On 10 Mar 2016, at 11:21, Karl Rupp wrote:
> 
> Great! I'm looking forward to reviewing your pull request. Let me know if you 
> need support with the Mat part.
> 
> Best regards,
> Karli

The pull request:
https://bitbucket.org/petsc/petsc/pull-requests/421/



Re: [petsc-dev] Not possible to do a VecPlaceArray for veccusp

2016-03-10 Thread Karl Rupp




Hi Jose and Alejandro,

how's your current progress/status? It looks like I'm able to spend some time 
on this and can get this done by early next week. On the other hand, if you've 
finished all the relevant parts you required, I will refrain from duplicating the 
work.

Best regards,
Karli


We are done with the Vec part. We will create a pull request today to start 
discussion, and then continue with the part related to Mat.


Great! I'm looking forward to reviewing your pull request. Let me know 
if you need support with the Mat part.


Best regards,
Karli



Re: [petsc-dev] Not possible to do a VecPlaceArray for veccusp

2016-03-10 Thread Jose E. Roman

> On 10 Mar 2016, at 10:10, Karl Rupp wrote:
> 
> Hi Jose and Alejandro,
> 
> how's your current progress/status? It looks like I'm able to spend some time 
> on this and can get this done by early next week. On the other hand, if 
> you've finished all the relevant parts you required, I will refrain from 
> duplicating the work.
> 
> Best regards,
> Karli

We are done with the Vec part. We will create a pull request today to start 
discussion, and then continue with the part related to Mat.

Jose



Re: [petsc-dev] Not possible to do a VecPlaceArray for veccusp

2016-03-10 Thread Karl Rupp

Hi Jose and Alejandro,

how's your current progress/status? It looks like I'm able to spend some 
time on this and can get this done by early next week. On the other 
hand, if you've finished all the relevant parts you required, I will 
refrain from duplicating the work.


Best regards,
Karli



On 02/28/2016 05:54 PM, Jose E. Roman wrote:



On 28 Feb 2016, at 10:45, Karl Rupp wrote:

Hi,


I like the idea of having separate VECCUDA and VECVIENNACL, because it is 
possible to implement VECCUDA without dependence on a C++ compiler (only the 
CUDA compiler).


I don't understand this part. NVCC also requires a C++ host compiler and is 
fairly picky about the supported compilers.


You are right. I was thinking of the case where one has pure C code and wants 
to use a --with-language=C PETSc configuration.





If you want, we can prepare a rough initial implementation of VECCUDA in the 
next days, and we can later discuss what to keep/discard.


Any contributions are welcome :-)



Karl: regarding the time constraints, our idea is to present something at a 
conference this summer, and deadlines are approaching.


Ok, this is on fairly short notice considering the changes required. I 
recommend starting by copying the CUSP sources and migrating them over to 
VECCUDA, replacing any use of cusp::array1d with a raw CUDA handle. Operations 
from CUSP should be replaced by CUBLAS calls.


Ok. Will start work on this.

Jose



Best regards,
Karli
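To make the suggested migration concrete, here is a minimal structural sketch. It is a host-memory stand-in, not the actual PETSc code: `ToyVecCUDA` and `toy_axpy` are hypothetical names, the `double *` plays the role of a raw device pointer replacing `cusp::array1d`, and the plain loop stands in for the `cublasDaxpy` call that would be used on a real GPU.

```c
#include <assert.h>
#include <stddef.h>

/* Structural sketch of the suggested migration: the vector's GPU storage
 * becomes a raw handle (here a plain double*, standing in for a device
 * pointer), and operations that CUSP used to provide are routed to
 * BLAS-style calls (here a host loop standing in for cublasDaxpy).
 * All names are illustrative; this is not the actual PETSc code. */
typedef struct {
    double *GPUarray;  /* raw handle instead of a cusp::array1d object */
    size_t  n;         /* local length */
} ToyVecCUDA;

/* y := alpha*x + y, the shape a cublasDaxpy call would take. */
static void toy_axpy(size_t n, double alpha, const double *x, double *y) {
    for (size_t i = 0; i < n; ++i)
        y[i] += alpha * x[i];
}
```

Because the struct holds only a raw pointer, a PlaceArray-style operation reduces to swapping that pointer, which is exactly what Thrust's container semantics prevented.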







Re: [petsc-dev] Not possible to do a VecPlaceArray for veccusp

2016-02-29 Thread Karl Rupp



>> This is not about performance, but about providing the ability for
>> users to 'implant' their own memory buffers. CUSP doesn't allow it
>> (which was the initial point of this thread).
>
> Thanks. Sorry I missed that. Given that CPU memcpy offers an order of
> magnitude more bandwidth than PCI offload, I still don't get the point,
> but I don't need to.

User-provided GPU memory buffers. CUDA buffers. Avoiding PCI Express ;-)

Best regards,
Karli



Re: [petsc-dev] Not possible to do a VecPlaceArray for veccusp

2016-02-29 Thread Jeff Hammond
On Monday, February 29, 2016, Karl Rupp  wrote:

> Hi Jeff,
>
> > Ok, this is on fairly short notice considering the changes required.
>
>> I recommend starting by copying the CUSP sources and migrating them
>> over to VECCUDA, replacing any use of cusp::array1d with a raw CUDA
>> handle. Operations from CUSP should be replaced by CUBLAS calls.
>>
>>
>>
>> It's hard to imagine any performance benefit from this unless CUSP
>> sucks. What am I missing?
>>
>
> This is not about performance, but about providing the ability for users
> to 'implant' their own memory buffers. CUSP doesn't allow it (which was the
> initial point of this thread).
>
>
Thanks. Sorry I missed that. Given that CPU memcpy offers an order of magnitude
more bandwidth than PCI offload, I still don't get the point, but I don't need
to.

Jeff


> Best regards,
> Karli
>


-- 
Jeff Hammond
jeff.scie...@gmail.com
http://jeffhammond.github.io/


Re: [petsc-dev] Not possible to do a VecPlaceArray for veccusp

2016-02-29 Thread Karl Rupp

Hi Jeff,

>> Ok, this is on fairly short notice considering the changes required.
>> I recommend starting by copying the CUSP sources and migrating them
>> over to VECCUDA, replacing any use of cusp::array1d with a raw CUDA
>> handle. Operations from CUSP should be replaced by CUBLAS calls.
>
> It's hard to imagine any performance benefit from this unless CUSP
> sucks. What am I missing?


This is not about performance, but about providing the ability for users 
to 'implant' their own memory buffers. CUSP doesn't allow it (which was 
the initial point of this thread).


Best regards,
Karli


Re: [petsc-dev] Not possible to do a VecPlaceArray for veccusp

2016-02-28 Thread Jeff Hammond
On Sunday, February 28, 2016, Karl Rupp  wrote:

> Hi,
>
> I like the idea of having separate VECCUDA and VECVIENNACL, because it is
>> possible to implement VECCUDA without dependence on a C++ compiler (only
>> the CUDA compiler).
>>
>
> I don't understand this part. NVCC also requires a C++ host compiler and
> is fairly picky about the supported compilers.
>
>
> If you want, we can prepare a rough initial implementation of VECCUDA in
>> the next days, and we can later discuss what to keep/discard.
>>
>
> Any contributions are welcome :-)
>
>
> Karl: regarding the time constraints, our idea is to present something at
>> a conference this summer, and deadlines are approaching.
>>
>
> Ok, this is on fairly short notice considering the changes required. I
> recommend starting by copying the CUSP sources and migrating them over to
> VECCUDA, replacing any use of cusp::array1d with a raw CUDA handle.
> Operations from CUSP should be replaced by CUBLAS calls.
>
>
It's hard to imagine any performance benefit from this unless CUSP sucks.
What am I missing?

Jeff


> Best regards,
> Karli
>
>
>
>
> If there is interest we can help in adding this stuff.
>

 What are your time constraints?

 Best regards,
 Karli



>>> --
>>> Dominic Meiser
>>> Tech-X Corporation - 5621 Arapahoe Avenue - Boulder, CO 80303
>>>
>>
>>
>

-- 
Jeff Hammond
jeff.scie...@gmail.com
http://jeffhammond.github.io/


Re: [petsc-dev] Not possible to do a VecPlaceArray for veccusp

2016-02-28 Thread Jose E. Roman

> On 28 Feb 2016, at 10:45, Karl Rupp wrote:
> 
> Hi,
> 
>> I like the idea of having separate VECCUDA and VECVIENNACL, because it is 
>> possible to implement VECCUDA without dependence on a C++ compiler (only the 
>> CUDA compiler).
> 
> I don't understand this part. NVCC also requires a C++ host compiler and is 
> fairly picky about the supported compilers.

You are right. I was thinking of the case where one has pure C code and wants 
to use a --with-language=C PETSc configuration.

> 
> 
>> If you want, we can prepare a rough initial implementation of VECCUDA in the 
>> next days, and we can later discuss what to keep/discard.
> 
> Any contributions are welcome :-)
> 
> 
>> Karl: regarding the time constraints, our idea is to present something at a 
>> conference this summer, and deadlines are approaching.
> 
> Ok, this is on fairly short notice considering the changes required. I 
> recommend starting by copying the CUSP sources and migrating them over to 
> VECCUDA, replacing any use of cusp::array1d with a raw CUDA handle. 
> Operations from CUSP should be replaced by CUBLAS calls.

Ok. Will start work on this.

Jose

> 
> Best regards,
> Karli
> 



Re: [petsc-dev] Not possible to do a VecPlaceArray for veccusp

2016-02-28 Thread Karl Rupp

Hi,


I like the idea of having separate VECCUDA and VECVIENNACL, because it is 
possible to implement VECCUDA without dependence on a C++ compiler (only the 
CUDA compiler).


I don't understand this part. NVCC also requires a C++ host compiler and 
is fairly picky about the supported compilers.




If you want, we can prepare a rough initial implementation of VECCUDA in the 
next days, and we can later discuss what to keep/discard.


Any contributions are welcome :-)



Karl: regarding the time constraints, our idea is to present something at a 
conference this summer, and deadlines are approaching.


Ok, this is on fairly short notice considering the changes required. I 
recommend starting by copying the CUSP sources and migrating them over to 
VECCUDA, replacing any use of cusp::array1d with a raw CUDA handle. 
Operations from CUSP should be replaced by CUBLAS calls.


Best regards,
Karli





If there is interest we can help in adding this stuff.


What are your time constraints?

Best regards,
Karli




--
Dominic Meiser
Tech-X Corporation - 5621 Arapahoe Avenue - Boulder, CO 80303






Re: [petsc-dev] Not possible to do a VecPlaceArray for veccusp

2016-02-26 Thread Jose E. Roman

> On 26 Feb 2016, at 18:31, Dominic Meiser wrote:
> 
> On Fri, Feb 26, 2016 at 02:49:39PM +0100, Karl Rupp wrote:
>> 
 The alternative would be to use raw cuda pointers instead of cusp
 arrays for GPU memory in VecCUSP.  That would be a fairly
 significant undertaking (certainly more than the 2-3 weeks Karli
 is estimating for getting the ViennaCL cuda backend in).
>>> 
>>> Do you mean creating a new class VECCUDA in addition to VECCUSP and 
>>> VECVIENNACL? This could be a solution for us. It would mean maybe 
>>> refactoring MATAIJCUSPARSE to work with these new Vecs?
>> 
>> I prefer to replace VECCUSP with e.g. VECCUDA (and eventually also
>> rename VECCUSPARSE to VECCUDA to have a unified naming for all the
>> things provided natively with the CUDA SDK) in order to reduce
>> external dependencies. CUSP will provide matrices, preconditioners,
>> etc. as before, but is only optional and thus less likely to cause
>> installation troubles. Supporting VECCUSP and VECCUDA next to each
>> other is going to be too much code duplication without any benefit.
> 
> That makes sense.  At the Vec level we should be using a low
> level construct (i.e. cuda raw pointers) because clients can
> always provide raw pointers and they know how to consume them
> (e.g. if they want to use cusp vectors on their end).
> 
>> 
>> Even if we do provide VECCUDA, I still dislike the fact that we
>> would have to maintain essentially the same code twice: One for
>> CUDA, one for OpenCL/ViennaCL. With the ViennaCL bindings providing
>> OpenMP and CUDA support soon, this also duplicates functionality. A
>> possible 'fix' is to just use ViennaCL for CUDA+OpenCL+OpenMP and
>> thus only maintain a single PETSc plugin for all three. However, I'm
>> certainly too biased to be taken seriously here.
> 
> I agree with this in principle.  Perhaps it's time to consolidate
> the cuda/cusp/cusparse/opencl efforts.  Note however that
> MATAIJCUSPARSE provides capabilities that won't be available
> right away with ViennaCL (e.g. multi-GPU block Jacobi and ASM
> preconditioners).

I like the idea of having separate VECCUDA and VECVIENNACL, because it is 
possible to implement VECCUDA without dependence on a C++ compiler (only the 
CUDA compiler).

If you want, we can prepare a rough initial implementation of VECCUDA in the 
next days, and we can later discuss what to keep/discard.

Karl: regarding the time constraints, our idea is to present something at a 
conference this summer, and deadlines are approaching.


> 
> Cheers,
> Dominic
> 
> 
>> 
>>> 
>>> If there is interest we can help in adding this stuff.
>> 
>> What are your time constraints?
>> 
>> Best regards,
>> Karli
>> 
>> 
> 
> -- 
> Dominic Meiser
> Tech-X Corporation - 5621 Arapahoe Avenue - Boulder, CO 80303



Re: [petsc-dev] Not possible to do a VecPlaceArray for veccusp

2016-02-26 Thread Dominic Meiser
On Fri, Feb 26, 2016 at 02:49:39PM +0100, Karl Rupp wrote:
> 
> >>The alternative would be to use raw cuda pointers instead of cusp
> >>arrays for GPU memory in VecCUSP.  That would be a fairly
> >>significant undertaking (certainly more than the 2-3 weeks Karli
> >>is estimating for getting the ViennaCL cuda backend in).
> >
> >Do you mean creating a new class VECCUDA in addition to VECCUSP and 
> >VECVIENNACL? This could be a solution for us. It would mean maybe 
> >refactoring MATAIJCUSPARSE to work with these new Vecs?
> 
> I prefer to replace VECCUSP with e.g. VECCUDA (and eventually also
> rename VECCUSPARSE to VECCUDA to have a unified naming for all the
> things provided natively with the CUDA SDK) in order to reduce
> external dependencies. CUSP will provide matrices, preconditioners,
> etc. as before, but is only optional and thus less likely to cause
> installation troubles. Supporting VECCUSP and VECCUDA next to each
> other is going to be too much code duplication without any benefit.

That makes sense.  At the Vec level we should be using a low
level construct (i.e. cuda raw pointers) because clients can
always provide raw pointers and they know how to consume them
(e.g. if they want to use cusp vectors on their end).

> 
> Even if we do provide VECCUDA, I still dislike the fact that we
> would have to maintain essentially the same code twice: One for
> CUDA, one for OpenCL/ViennaCL. With the ViennaCL bindings providing
> OpenMP and CUDA support soon, this also duplicates functionality. A
> possible 'fix' is to just use ViennaCL for CUDA+OpenCL+OpenMP and
> thus only maintain a single PETSc plugin for all three. However, I'm
> certainly too biased to be taken seriously here.

I agree with this in principle.  Perhaps it's time to consolidate
the cuda/cusp/cusparse/opencl efforts.  Note however that
MATAIJCUSPARSE provides capabilities that won't be available
right away with ViennaCL (e.g. multi-GPU block Jacobi and ASM
preconditioners).

Cheers,
Dominic


> 
> >
> > If there is interest we can help in adding this stuff.
> 
> What are your time constraints?
> 
> Best regards,
> Karli
> 
> 

-- 
Dominic Meiser
Tech-X Corporation - 5621 Arapahoe Avenue - Boulder, CO 80303


Re: [petsc-dev] Not possible to do a VecPlaceArray for veccusp

2016-02-26 Thread Karl Rupp



The alternative would be to use raw cuda pointers instead of cusp
arrays for GPU memory in VecCUSP.  That would be a fairly
significant undertaking (certainly more than the 2-3 weeks Karli
is estimating for getting the ViennaCL cuda backend in).


Do you mean creating a new class VECCUDA in addition to VECCUSP and 
VECVIENNACL? This could be a solution for us. It would mean maybe refactoring 
MATAIJCUSPARSE to work with these new Vecs?


I prefer to replace VECCUSP with e.g. VECCUDA (and eventually also 
rename VECCUSPARSE to VECCUDA to have a unified naming for all the 
things provided natively with the CUDA SDK) in order to reduce external 
dependencies. CUSP will provide matrices, preconditioners, etc. as 
before, but is only optional and thus less likely to cause installation 
troubles. Supporting VECCUSP and VECCUDA next to each other is going to 
be too much code duplication without any benefit.


Even if we do provide VECCUDA, I still dislike the fact that we would 
have to maintain essentially the same code twice: One for CUDA, one for 
OpenCL/ViennaCL. With the ViennaCL bindings providing OpenMP and CUDA 
support soon, this also duplicates functionality. A possible 'fix' is to 
just use ViennaCL for CUDA+OpenCL+OpenMP and thus only maintain a single 
PETSc plugin for all three. However, I'm certainly too biased to be 
taken seriously here.


>
> If there is interest we can help in adding this stuff.

What are your time constraints?

Best regards,
Karli





Re: [petsc-dev] Not possible to do a VecPlaceArray for veccusp

2016-02-26 Thread Jose E. Roman

> On 25 Feb 2016, at 17:19, Dominic Meiser wrote:
> 
> On Thu, Feb 25, 2016 at 01:13:01PM +0100, Jose E. Roman wrote:
>> We are trying to do some GPU developments on the SLEPc side, and we would 
>> need a way of placing the array of a VECCUSP vector, providing the GPU 
>> address. Specifically, what we want to do is have a large Vec on GPU and 
>> slice it in several smaller Vecs.
>> 
>> For the GetArray/RestoreArray we have all possibilities:
>> - VecGetArray: gets the pointer to the buffer stored in CPU memory
>> - VecCUSPGetArray*: returns a CUSPARRAY object that contains some info, 
>> including the buffer allocated in GPU memory
>> - VecCUSPGetCUDAArray*: returns a raw pointer of the GPU buffer
>> 
>> The problem comes with PlaceArray equivalents. Using VecPlaceArray we can 
>> provide a new pointer to CPU memory. We wanted to implement the equivalent 
>> thing for GPU, but we found difficulties due to Thrust. If we wanted to 
>> provide a VecCUSPPlaceCUDAArray the problem is that Thrust does not allow 
>> wrapping an existing GPU buffer with a CUSPARRAY object (when creating a 
>> CUSPARRAY it always allocates new memory). On the other hand, 
>> VecCUSPPlaceArray is possible to implement, but the problem is that one 
>> should provide a CUSPARRAY obtained from a VecCUSPGetArray* without 
>> modification (it is not possible to do pointer arithmetic with a CUSPARRAY).
>> 
>> Any thoughts?
>> 
> 
> I think your and Karli's analysis is correct, this is currently
> not supported.  Besides Karli's proposal to use ViennaCL's cuda
> backend a different option might be to use cusp's array views.
> These have a constructor for sub-ranges of other cusp arrays:
> 
> https://github.com/cusplibrary/cusplibrary/blob/master/cusp/array1d.h#L409
> 
> However, enabling cusp array views in something like
> VecCUSPPlaceArray is not immediately possible.  The CUSPARRAY
> type, which is currently hardwired to be
> cusp::array1d, would have to
> become a template parameter.  I'm not sure if we want to go down
> that path.

Yes, we do not like this.

> 
> The alternative would be to use raw cuda pointers instead of cusp
> arrays for GPU memory in VecCUSP.  That would be a fairly
> significant undertaking (certainly more than the 2-3 weeks Karli
> is estimating for getting the ViennaCL cuda backend in).

Do you mean creating a new class VECCUDA in addition to VECCUSP and 
VECVIENNACL? This could be a solution for us. It would mean maybe refactoring 
MATAIJCUSPARSE to work with these new Vecs?

If there is interest we can help in adding this stuff.


> 
> Cheers,
> Dominic
> 
> -- 
> Dominic Meiser
> Tech-X Corporation - 5621 Arapahoe Avenue - Boulder, CO 80303



Re: [petsc-dev] Not possible to do a VecPlaceArray for veccusp

2016-02-25 Thread Dominic Meiser
On Thu, Feb 25, 2016 at 01:13:01PM +0100, Jose E. Roman wrote:
> We are trying to do some GPU developments on the SLEPc side, and we would 
> need a way of placing the array of a VECCUSP vector, providing the GPU 
> address. Specifically, what we want to do is have a large Vec on GPU and 
> slice it in several smaller Vecs.
> 
> For the GetArray/RestoreArray we have all possibilities:
> - VecGetArray: gets the pointer to the buffer stored in CPU memory
> - VecCUSPGetArray*: returns a CUSPARRAY object that contains some info, 
> including the buffer allocated in GPU memory
> - VecCUSPGetCUDAArray*: returns a raw pointer of the GPU buffer
> 
> The problem comes with PlaceArray equivalents. Using VecPlaceArray we can 
> provide a new pointer to CPU memory. We wanted to implement the equivalent 
> thing for GPU, but we found difficulties due to Thrust. If we wanted to 
> provide a VecCUSPPlaceCUDAArray the problem is that Thrust does not allow 
> wrapping an existing GPU buffer with a CUSPARRAY object (when creating a 
> CUSPARRAY it always allocates new memory). On the other hand, 
> VecCUSPPlaceArray is possible to implement, but the problem is that one 
> should provide a CUSPARRAY obtained from a VecCUSPGetArray* without 
> modification (it is not possible to do pointer arithmetic with a CUSPARRAY).
> 
> Any thoughts?
> 

I think your and Karli's analysis is correct, this is currently
not supported.  Besides Karli's proposal to use ViennaCL's cuda
backend a different option might be to use cusp's array views.
These have a constructor for sub-ranges of other cusp arrays:

https://github.com/cusplibrary/cusplibrary/blob/master/cusp/array1d.h#L409

However, enabling cusp array views in something like
VecCUSPPlaceArray is not immediately possible.  The CUSPARRAY
type, which is currently hardwired to be
cusp::array1d, would have to
become a template parameter.  I'm not sure if we want to go down
that path.

The alternative would be to use raw cuda pointers instead of cusp
arrays for GPU memory in VecCUSP.  That would be a fairly
significant undertaking (certainly more than the 2-3 weeks Karli
is estimating for getting the ViennaCL cuda backend in).

Cheers,
Dominic

-- 
Dominic Meiser
Tech-X Corporation - 5621 Arapahoe Avenue - Boulder, CO 80303
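The array-view idea mentioned above, a constructor over a sub-range of an existing array, can be sketched in plain C as a (pointer, length) pair. This is a host-memory illustration of the concept behind `cusp::array1d` views; `ArrayView` and `make_view` are hypothetical names, not part of any PETSc or CUSP API.

```c
#include <assert.h>
#include <stddef.h>

/* Plain-C sketch of the "array view" idea: a view is just a (pointer,
 * length) pair into a parent buffer, so slicing a large vector into
 * smaller ones needs no allocation and no copy. Names are illustrative. */
typedef struct {
    double *begin;  /* first element of the sub-range */
    size_t  n;      /* number of elements in the view */
} ArrayView;

/* View of the half-open sub-range [first, last) of a parent buffer. */
static ArrayView make_view(double *parent, size_t first, size_t last) {
    ArrayView v = { parent + first, last - first };
    return v;
}
```

Writes through the view are visible in the parent buffer, which is exactly the sharing behavior slicing a large Vec requires; the difficulty described above is that PETSc's CUSPARRAY is hardwired to the owning container type rather than to such a view.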


Re: [petsc-dev] Not possible to do a VecPlaceArray for veccusp

2016-02-25 Thread Karl Rupp

Hi Jose,

> We are trying to do some GPU developments on the SLEPc side, and we 
> would need a way of placing the array of a VECCUSP vector, providing the 
> GPU address. Specifically, what we want to do is have a large Vec on GPU 
> and slice it in several smaller Vecs.
> 
> For the GetArray/RestoreArray we have all possibilities:
> - VecGetArray: gets the pointer to the buffer stored in CPU memory
> - VecCUSPGetArray*: returns a CUSPARRAY object that contains some info, 
> including the buffer allocated in GPU memory
> - VecCUSPGetCUDAArray*: returns a raw pointer of the GPU buffer
> 
> The problem comes with PlaceArray equivalents. Using VecPlaceArray we can 
> provide a new pointer to CPU memory. We wanted to implement the equivalent 
> thing for GPU, but we found difficulties due to Thrust. If we wanted to provide 
> a VecCUSPPlaceCUDAArray the problem is that Thrust does not allow wrapping an 
> existing GPU buffer with a CUSPARRAY object (when creating a CUSPARRAY it 
> always allocates new memory). On the other hand, VecCUSPPlaceArray is possible 
> to implement, but the problem is that one should provide a CUSPARRAY obtained 
> from a VecCUSPGetArray* without modification (it is not possible to do pointer 
> arithmetic with a CUSPARRAY).

As far as I can see from browsing the documentation and the web there is 
indeed no such option. Ouch.

> Any thoughts?


I'll soon expose the CUDA backend of ViennaCL in PETSc, which will offer 
such functionality. Is this an option? It will take 2-3 weeks, though.


Best regards,
Karli



[petsc-dev] Not possible to do a VecPlaceArray for veccusp

2016-02-25 Thread Jose E. Roman
We are trying to do some GPU developments on the SLEPc side, and we would need 
a way of placing the array of a VECCUSP vector, providing the GPU address. 
Specifically, what we want to do is have a large Vec on GPU and slice it in 
several smaller Vecs.

For the GetArray/RestoreArray we have all possibilities:
- VecGetArray: gets the pointer to the buffer stored in CPU memory
- VecCUSPGetArray*: returns a CUSPARRAY object that contains some info, 
including the buffer allocated in GPU memory
- VecCUSPGetCUDAArray*: returns a raw pointer of the GPU buffer

The problem comes with PlaceArray equivalents. Using VecPlaceArray we can 
provide a new pointer to CPU memory. We wanted to implement the equivalent 
thing for GPU, but we found difficulties due to Thrust. If we wanted to provide 
a VecCUSPPlaceCUDAArray the problem is that Thrust does not allow wrapping an 
existing GPU buffer with a CUSPARRAY object (when creating a CUSPARRAY it 
always allocates new memory). On the other hand, VecCUSPPlaceArray is possible 
to implement, but the problem is that one should provide a CUSPARRAY obtained 
from a VecCUSPGetArray* without modification (it is not possible to do pointer 
arithmetic with a CUSPARRAY).

Any thoughts?
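The place/reset semantics requested here can be sketched with raw pointers. The following is a host-memory analogue of the hypothetical VecCUSPPlaceCUDAArray / VecCUSPResetCUDAArray pair (mirroring VecPlaceArray/VecResetArray); `ToyVec` and its functions are invented for illustration and use an ordinary `double *` where the real code would hold a device pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Host-memory analogue of the desired Place/Reset pair: the vector keeps
 * a raw data pointer, and "placing" an array swaps in a user-provided
 * buffer while remembering the original one so it can be restored later.
 * With raw pointers this is trivial; with Thrust-backed CUSPARRAY
 * containers it is not, since they always allocate their own memory. */
typedef struct {
    double *data;      /* currently active buffer                 */
    double *unplaced;  /* original buffer saved by PlaceArray     */
    size_t  n;         /* local length                            */
} ToyVec;

static void ToyVecPlaceArray(ToyVec *v, double *user_buf) {
    v->unplaced = v->data;   /* remember our own storage          */
    v->data     = user_buf;  /* operate on the user's buffer now  */
}

static void ToyVecResetArray(ToyVec *v) {
    v->data     = v->unplaced;  /* restore the original storage   */
    v->unplaced = NULL;
}
```

Placing offsets of one large buffer into several such vectors is exactly the "slice a large Vec into smaller Vecs" use case: each small vector aliases a sub-range of the big allocation without any copy.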