Re: [petsc-dev] Not possible to do a VecPlaceArray for veccusp
> On 10 Mar 2016, at 11:21, Karl Rupp wrote:
>
> Great! I'm looking forward to reviewing your pull request. Let me know if you need support with the Mat part.
>
> Best regards,
> Karli

The pull request: https://bitbucket.org/petsc/petsc/pull-requests/421/
Re: [petsc-dev] Not possible to do a VecPlaceArray for veccusp
>> Hi Jose and Alejandro,
>>
>> how's your current progress/status? It looks like I'm able to spend some time on this and can get this done by early next week. On the other hand, if you've finished all the relevant parts you required, I will refrain from duplicating the work.
>>
>> Best regards,
>> Karli
>
> We are done with the Vec part. We will create a pull request today to start discussion, and then continue with the part related to Mat.

Great! I'm looking forward to reviewing your pull request. Let me know if you need support with the Mat part.

Best regards,
Karli
Re: [petsc-dev] Not possible to do a VecPlaceArray for veccusp
> On 10 Mar 2016, at 10:10, Karl Rupp wrote:
>
> Hi Jose and Alejandro,
>
> how's your current progress/status? It looks like I'm able to spend some time on this and can get this done by early next week. On the other hand, if you've finished all the relevant parts you required, I will refrain from duplicating the work.
>
> Best regards,
> Karli

We are done with the Vec part. We will create a pull request today to start discussion, and then continue with the part related to Mat.

Jose
Re: [petsc-dev] Not possible to do a VecPlaceArray for veccusp
Hi Jose and Alejandro,

how's your current progress/status? It looks like I'm able to spend some time on this and can get this done by early next week. On the other hand, if you've finished all the relevant parts you required, I will refrain from duplicating the work.

Best regards,
Karli

On 02/28/2016 05:54 PM, Jose E. Roman wrote:
>> On 28 Feb 2016, at 10:45, Karl Rupp wrote:
>>
>> Hi,
>>
>>> I like the idea of having separate VECCUDA and VECVIENNACL, because it is possible to implement VECCUDA without dependence on a C++ compiler (only the CUDA compiler).
>>
>> I don't understand this part. NVCC also requires a C++ host compiler and is fairly picky about the supported compilers.
>
> You are right. I was thinking of the case when one has a pure C code and wants to use a --with-language=C PETSc configuration.
>
>>> If you want, we can prepare a rough initial implementation of VECCUDA in the next days, and we can later discuss what to keep/discard.
>>
>> Any contributions are welcome :-)
>>
>>> Karl: regarding the time constraints, our idea is to present something at a conference this summer, and deadlines are approaching.
>>
>> Ok, this is on fairly short notice considering the changes required. I recommend starting by copying the CUSP sources and migrating them over to VECCUDA, replacing any use of cusp::array1d with a raw CUDA handle. Operations from CUSP should be replaced by CUBLAS calls.
>
> Ok. Will start work on this.
> Jose
>
>> Best regards,
>> Karli
Re: [petsc-dev] Not possible to do a VecPlaceArray for veccusp
>> This is not about performance, but about providing the ability for users to 'implant' their own memory buffers. CUSP doesn't allow it (which was the initial point of this thread).
>
> Thanks. Sorry I missed that. Given that CPU memcpy has an order of magnitude more bandwidth than PCI offload, I still don't get the point, but I don't need to.

User-provided GPU memory buffers. CUDA buffers. Avoiding PCI Express ;-)

Best regards,
Karli
Re: [petsc-dev] Not possible to do a VecPlaceArray for veccusp
On Monday, February 29, 2016, Karl Rupp wrote:
> Hi Jeff,
>
>>> Ok, this is on fairly short notice considering the changes required. I recommend starting by copying the CUSP sources and migrating them over to VECCUDA, replacing any use of cusp::array1d with a raw CUDA handle. Operations from CUSP should be replaced by CUBLAS calls.
>>
>> It's hard to imagine any performance benefit from this unless CUSP sucks. What am I missing?
>
> This is not about performance, but about providing the ability for users to 'implant' their own memory buffers. CUSP doesn't allow it (which was the initial point of this thread).

Thanks. Sorry I missed that. Given that CPU memcpy has an order of magnitude more bandwidth than PCI offload, I still don't get the point, but I don't need to.

Jeff

> Best regards,
> Karli

--
Jeff Hammond
jeff.scie...@gmail.com
http://jeffhammond.github.io/
Re: [petsc-dev] Not possible to do a VecPlaceArray for veccusp
Hi Jeff,

>> Ok, this is on fairly short notice considering the changes required. I recommend starting by copying the CUSP sources and migrating them over to VECCUDA, replacing any use of cusp::array1d with a raw CUDA handle. Operations from CUSP should be replaced by CUBLAS calls.
>
> It's hard to imagine any performance benefit from this unless CUSP sucks. What am I missing?

This is not about performance, but about providing the ability for users to 'implant' their own memory buffers. CUSP doesn't allow it (which was the initial point of this thread).

Best regards,
Karli
Re: [petsc-dev] Not possible to do a VecPlaceArray for veccusp
On Sunday, February 28, 2016, Karl Rupp wrote:
> Hi,
>
>> I like the idea of having separate VECCUDA and VECVIENNACL, because it is possible to implement VECCUDA without dependence on a C++ compiler (only the CUDA compiler).
>
> I don't understand this part. NVCC also requires a C++ host compiler and is fairly picky about the supported compilers.
>
>> If you want, we can prepare a rough initial implementation of VECCUDA in the next days, and we can later discuss what to keep/discard.
>
> Any contributions are welcome :-)
>
>> Karl: regarding the time constraints, our idea is to present something at a conference this summer, and deadlines are approaching.
>
> Ok, this is on fairly short notice considering the changes required. I recommend starting by copying the CUSP sources and migrating them over to VECCUDA, replacing any use of cusp::array1d with a raw CUDA handle. Operations from CUSP should be replaced by CUBLAS calls.

It's hard to imagine any performance benefit from this unless CUSP sucks. What am I missing?

Jeff

> Best regards,
> Karli
>
>>>> If there is interest we can help in adding this stuff.
>>>
>>> What are your time constraints?
>>>
>>> Best regards,
>>> Karli
>>
>> --
>> Dominic Meiser
>> Tech-X Corporation - 5621 Arapahoe Avenue - Boulder, CO 80303

--
Jeff Hammond
jeff.scie...@gmail.com
http://jeffhammond.github.io/
Re: [petsc-dev] Not possible to do a VecPlaceArray for veccusp
> On 28 Feb 2016, at 10:45, Karl Rupp wrote:
>
> Hi,
>
>> I like the idea of having separate VECCUDA and VECVIENNACL, because it is possible to implement VECCUDA without dependence on a C++ compiler (only the CUDA compiler).
>
> I don't understand this part. NVCC also requires a C++ host compiler and is fairly picky about the supported compilers.

You are right. I was thinking of the case when one has a pure C code and wants to use a --with-language=C PETSc configuration.

>> If you want, we can prepare a rough initial implementation of VECCUDA in the next days, and we can later discuss what to keep/discard.
>
> Any contributions are welcome :-)
>
>> Karl: regarding the time constraints, our idea is to present something at a conference this summer, and deadlines are approaching.
>
> Ok, this is on fairly short notice considering the changes required. I recommend starting by copying the CUSP sources and migrating them over to VECCUDA, replacing any use of cusp::array1d with a raw CUDA handle. Operations from CUSP should be replaced by CUBLAS calls.

Ok. Will start work on this.

Jose

> Best regards,
> Karli
Re: [petsc-dev] Not possible to do a VecPlaceArray for veccusp
Hi,

> I like the idea of having separate VECCUDA and VECVIENNACL, because it is possible to implement VECCUDA without dependence on a C++ compiler (only the CUDA compiler).

I don't understand this part. NVCC also requires a C++ host compiler and is fairly picky about the supported compilers.

> If you want, we can prepare a rough initial implementation of VECCUDA in the next days, and we can later discuss what to keep/discard.

Any contributions are welcome :-)

> Karl: regarding the time constraints, our idea is to present something at a conference this summer, and deadlines are approaching.

Ok, this is on fairly short notice considering the changes required. I recommend starting by copying the CUSP sources and migrating them over to VECCUDA, replacing any use of cusp::array1d with a raw CUDA handle. Operations from CUSP should be replaced by CUBLAS calls.

Best regards,
Karli

>>> If there is interest we can help in adding this stuff.
>>
>> What are your time constraints?
>>
>> Best regards,
>> Karli
>
> --
> Dominic Meiser
> Tech-X Corporation - 5621 Arapahoe Avenue - Boulder, CO 80303
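To illustrate the "replace operations from CUSP by CUBLAS calls" step, a VecAXPY-style update on raw device pointers could look roughly as follows. This is a sketch only (double precision, error checking and handle management omitted) and is not taken from any actual PETSc source:

  #include <cublas_v2.h>

  /* y <- alpha*x + y on raw CUDA buffers: what a VECCUDA VecAXPY would do with
     CUBLAS instead of cusp::blas::axpy(x, y, alpha) on cusp::array1d objects. */
  static cublasStatus_t axpy_on_device(cublasHandle_t handle, int n, double alpha,
                                       const double *d_x, double *d_y)
  {
    /* d_x and d_y are the raw device pointers that replace the cusp arrays */
    return cublasDaxpy(handle, n, &alpha, d_x, 1, d_y, 1);
  }

The cublasHandle_t would be created once with cublasCreate() and reused across calls.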
Re: [petsc-dev] Not possible to do a VecPlaceArray for veccusp
> On 26 Feb 2016, at 18:31, Dominic Meiser wrote:
>
> On Fri, Feb 26, 2016 at 02:49:39PM +0100, Karl Rupp wrote:
>>>> The alternative would be to use raw cuda pointers instead of cusp arrays for GPU memory in VecCUSP. That would be a fairly significant undertaking (certainly more than the 2-3 weeks Karli is estimating for getting the ViennaCL cuda backend in).
>>>
>>> Do you mean creating a new class VECCUDA in addition to VECCUSP and VECVIENNACL? This could be a solution for us. It would mean maybe refactoring MATAIJCUSPARSE to work with these new Vecs?
>>
>> I prefer to replace VECCUSP with e.g. VECCUDA (and eventually also rename VECCUSPARSE to VECCUDA to have a unified naming for all the things provided natively with the CUDA SDK) in order to reduce external dependencies. CUSP will provide matrices, preconditioners, etc. as before, but is only optional and thus less likely to cause installation troubles. Supporting VECCUSP and VECCUDA next to each other is going to be too much code duplication without any benefit.
>
> That makes sense. At the Vec level we should be using a low level construct (i.e. cuda raw pointers) because clients can always provide raw pointers and they know how to consume them (e.g. if they want to use cusp vectors on their end).
>
>> Even if we do provide VECCUDA, I still dislike the fact that we would have to maintain essentially the same code twice: One for CUDA, one for OpenCL/ViennaCL. With the ViennaCL bindings providing OpenMP and CUDA support soon, this also duplicates functionality. A possible 'fix' is to just use ViennaCL for CUDA+OpenCL+OpenMP and thus only maintain a single PETSc plugin for all three. However, I'm certainly too biased to be taken seriously here.
>
> I agree with this in principle. Perhaps it's time to consolidate the cuda/cusp/cusparse/opencl efforts. Note however that MATAIJCUSPARSE provides capabilities that won't be available right away with ViennaCL (e.g. multi-GPU block Jacobi and ASM preconditioners).

I like the idea of having separate VECCUDA and VECVIENNACL, because it is possible to implement VECCUDA without dependence on a C++ compiler (only the CUDA compiler).

If you want, we can prepare a rough initial implementation of VECCUDA in the next days, and we can later discuss what to keep/discard.

Karl: regarding the time constraints, our idea is to present something at a conference this summer, and deadlines are approaching.

> Cheers,
> Dominic
>
>>> If there is interest we can help in adding this stuff.
>>
>> What are your time constraints?
>>
>> Best regards,
>> Karli
>
> --
> Dominic Meiser
> Tech-X Corporation - 5621 Arapahoe Avenue - Boulder, CO 80303
Re: [petsc-dev] Not possible to do a VecPlaceArray for veccusp
On Fri, Feb 26, 2016 at 02:49:39PM +0100, Karl Rupp wrote:
>>> The alternative would be to use raw cuda pointers instead of cusp arrays for GPU memory in VecCUSP. That would be a fairly significant undertaking (certainly more than the 2-3 weeks Karli is estimating for getting the ViennaCL cuda backend in).
>>
>> Do you mean creating a new class VECCUDA in addition to VECCUSP and VECVIENNACL? This could be a solution for us. It would mean maybe refactoring MATAIJCUSPARSE to work with these new Vecs?
>
> I prefer to replace VECCUSP with e.g. VECCUDA (and eventually also rename VECCUSPARSE to VECCUDA to have a unified naming for all the things provided natively with the CUDA SDK) in order to reduce external dependencies. CUSP will provide matrices, preconditioners, etc. as before, but is only optional and thus less likely to cause installation troubles. Supporting VECCUSP and VECCUDA next to each other is going to be too much code duplication without any benefit.

That makes sense. At the Vec level we should be using a low level construct (i.e. cuda raw pointers) because clients can always provide raw pointers and they know how to consume them (e.g. if they want to use cusp vectors on their end).

> Even if we do provide VECCUDA, I still dislike the fact that we would have to maintain essentially the same code twice: One for CUDA, one for OpenCL/ViennaCL. With the ViennaCL bindings providing OpenMP and CUDA support soon, this also duplicates functionality. A possible 'fix' is to just use ViennaCL for CUDA+OpenCL+OpenMP and thus only maintain a single PETSc plugin for all three. However, I'm certainly too biased to be taken seriously here.

I agree with this in principle. Perhaps it's time to consolidate the cuda/cusp/cusparse/opencl efforts. Note however that MATAIJCUSPARSE provides capabilities that won't be available right away with ViennaCL (e.g. multi-GPU block Jacobi and ASM preconditioners).

Cheers,
Dominic

>>> If there is interest we can help in adding this stuff.
>>
>> What are your time constraints?
>>
>> Best regards,
>> Karli

--
Dominic Meiser
Tech-X Corporation - 5621 Arapahoe Avenue - Boulder, CO 80303
Re: [petsc-dev] Not possible to do a VecPlaceArray for veccusp
>> The alternative would be to use raw cuda pointers instead of cusp arrays for GPU memory in VecCUSP. That would be a fairly significant undertaking (certainly more than the 2-3 weeks Karli is estimating for getting the ViennaCL cuda backend in).
>
> Do you mean creating a new class VECCUDA in addition to VECCUSP and VECVIENNACL? This could be a solution for us. It would mean maybe refactoring MATAIJCUSPARSE to work with these new Vecs?

I prefer to replace VECCUSP with e.g. VECCUDA (and eventually also rename VECCUSPARSE to VECCUDA to have a unified naming for all the things provided natively with the CUDA SDK) in order to reduce external dependencies. CUSP will provide matrices, preconditioners, etc. as before, but is only optional and thus less likely to cause installation troubles. Supporting VECCUSP and VECCUDA next to each other is going to be too much code duplication without any benefit.

Even if we do provide VECCUDA, I still dislike the fact that we would have to maintain essentially the same code twice: One for CUDA, one for OpenCL/ViennaCL. With the ViennaCL bindings providing OpenMP and CUDA support soon, this also duplicates functionality. A possible 'fix' is to just use ViennaCL for CUDA+OpenCL+OpenMP and thus only maintain a single PETSc plugin for all three. However, I'm certainly too biased to be taken seriously here.

> If there is interest we can help in adding this stuff.

What are your time constraints?

Best regards,
Karli
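For illustration, the "raw cuda pointers instead of cusp arrays" idea could look roughly like this on the data-structure side. All names below are made up for the sketch and are not actual PETSc symbols:

  #include <cuda_runtime.h>
  #include <petscvec.h>

  /* Rough sketch of the per-vector GPU data a raw-pointer-based VECCUDA could
     carry. The key point: the buffer in use can either be allocated here or
     'placed' by the user, which cusp::array1d does not permit.                */
  typedef struct {
    PetscScalar *GPUarray;            /* device buffer currently in use           */
    PetscScalar *GPUarray_allocated;  /* buffer owned by the Vec (NULL if placed) */
  } Vec_CUDA_sketch;

  static PetscErrorCode VecCUDAAllocate_sketch(Vec v, Vec_CUDA_sketch *vc)
  {
    PetscErrorCode ierr;
    PetscInt       n;
    cudaError_t    err;

    PetscFunctionBegin;
    ierr = VecGetLocalSize(v, &n);CHKERRQ(ierr);
    err  = cudaMalloc((void**)&vc->GPUarray_allocated, n*sizeof(PetscScalar));
    if (err != cudaSuccess) SETERRQ(PETSC_COMM_SELF, PETSC_ERR_LIB, "cudaMalloc() failed");
    vc->GPUarray = vc->GPUarray_allocated;
    PetscFunctionReturn(0);
  }

Placing a user-provided device buffer then amounts to pointing GPUarray at the user's pointer and restoring the owned buffer afterwards.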
Re: [petsc-dev] Not possible to do a VecPlaceArray for veccusp
> On 25 Feb 2016, at 17:19, Dominic Meiser wrote:
>
> On Thu, Feb 25, 2016 at 01:13:01PM +0100, Jose E. Roman wrote:
>> We are trying to do some GPU developments on the SLEPc side, and we would need a way of placing the array of a VECCUSP vector, providing the GPU address. Specifically, what we want to do is have a large Vec on GPU and slice it into several smaller Vecs.
>>
>> For the GetArray/RestoreArray we have all possibilities:
>> - VecGetArray: gets the pointer to the buffer stored in CPU memory
>> - VecCUSPGetArray*: returns a CUSPARRAY object that contains some info, including the buffer allocated in GPU memory
>> - VecCUSPGetCUDAArray*: returns a raw pointer to the GPU buffer
>>
>> The problem comes with PlaceArray equivalents. Using VecPlaceArray we can provide a new pointer to CPU memory. We wanted to implement the equivalent thing for GPU, but we found difficulties due to Thrust. If we wanted to provide a VecCUSPPlaceCUDAArray, the problem is that Thrust does not allow wrapping an existing GPU buffer with a CUSPARRAY object (when creating a CUSPARRAY it always allocates new memory). On the other hand, VecCUSPPlaceArray is possible to implement, but the problem is that one should provide a CUSPARRAY obtained from a VecCUSPGetArray* without modification (it is not possible to do pointer arithmetic with a CUSPARRAY).
>>
>> Any thoughts?
>
> I think your and Karli's analysis is correct, this is currently not supported. Besides Karli's proposal to use ViennaCL's cuda backend, a different option might be to use cusp's array views. These have a constructor for sub-ranges of other cusp arrays:
>
> https://github.com/cusplibrary/cusplibrary/blob/master/cusp/array1d.h#L409
>
> However, enabling cusp array views in something like VecCUSPPlaceArray is not immediately possible. The CUSPARRAY type, which is currently hardwired to be cusp::array1d, would have to become a template parameter. I'm not sure if we want to go down that path.

Yes, we do not like this.

> The alternative would be to use raw cuda pointers instead of cusp arrays for GPU memory in VecCUSP. That would be a fairly significant undertaking (certainly more than the 2-3 weeks Karli is estimating for getting the ViennaCL cuda backend in).

Do you mean creating a new class VECCUDA in addition to VECCUSP and VECVIENNACL? This could be a solution for us. It would mean maybe refactoring MATAIJCUSPARSE to work with these new Vecs?

If there is interest we can help in adding this stuff.

> Cheers,
> Dominic
>
> --
> Dominic Meiser
> Tech-X Corporation - 5621 Arapahoe Avenue - Boulder, CO 80303
Re: [petsc-dev] Not possible to do a VecPlaceArray for veccusp
On Thu, Feb 25, 2016 at 01:13:01PM +0100, Jose E. Roman wrote:
> We are trying to do some GPU developments on the SLEPc side, and we would need a way of placing the array of a VECCUSP vector, providing the GPU address. Specifically, what we want to do is have a large Vec on GPU and slice it into several smaller Vecs.
>
> For the GetArray/RestoreArray we have all possibilities:
> - VecGetArray: gets the pointer to the buffer stored in CPU memory
> - VecCUSPGetArray*: returns a CUSPARRAY object that contains some info, including the buffer allocated in GPU memory
> - VecCUSPGetCUDAArray*: returns a raw pointer to the GPU buffer
>
> The problem comes with PlaceArray equivalents. Using VecPlaceArray we can provide a new pointer to CPU memory. We wanted to implement the equivalent thing for GPU, but we found difficulties due to Thrust. If we wanted to provide a VecCUSPPlaceCUDAArray, the problem is that Thrust does not allow wrapping an existing GPU buffer with a CUSPARRAY object (when creating a CUSPARRAY it always allocates new memory). On the other hand, VecCUSPPlaceArray is possible to implement, but the problem is that one should provide a CUSPARRAY obtained from a VecCUSPGetArray* without modification (it is not possible to do pointer arithmetic with a CUSPARRAY).
>
> Any thoughts?

I think your and Karli's analysis is correct, this is currently not supported. Besides Karli's proposal to use ViennaCL's cuda backend, a different option might be to use cusp's array views. These have a constructor for sub-ranges of other cusp arrays:

https://github.com/cusplibrary/cusplibrary/blob/master/cusp/array1d.h#L409

However, enabling cusp array views in something like VecCUSPPlaceArray is not immediately possible. The CUSPARRAY type, which is currently hardwired to be cusp::array1d, would have to become a template parameter. I'm not sure if we want to go down that path.

The alternative would be to use raw cuda pointers instead of cusp arrays for GPU memory in VecCUSP. That would be a fairly significant undertaking (certainly more than the 2-3 weeks Karli is estimating for getting the ViennaCL cuda backend in).

Cheers,
Dominic

--
Dominic Meiser
Tech-X Corporation - 5621 Arapahoe Avenue - Boulder, CO 80303
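For reference, plain CUSP usage of such a view looks roughly like the following. This is a sketch assuming the usual cusp/thrust headers; no PETSc integration is implied:

  #include <thrust/fill.h>
  #include <cusp/array1d.h>

  // Non-owning view of a sub-range of an existing device array: no allocation,
  // no copy. How to wire this into VecCUSPPlaceArray is the open question above.
  typedef cusp::array1d<double, cusp::device_memory> DeviceArray;

  void use_slice(DeviceArray &big, int offset, int m)
  {
    // view of elements [offset, offset + m) of 'big'
    DeviceArray::view slice(big.begin() + offset, big.begin() + offset + m);

    thrust::fill(slice.begin(), slice.end(), 0.0);   // operate on the slice in place
  }

The view does not own the memory, so 'big' has to outlive any slice built on top of it.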
Re: [petsc-dev] Not possible to do a VecPlaceArray for veccusp
Hi Jose,

> We are trying to do some GPU developments on the SLEPc side, and we would need a way of placing the array of a VECCUSP vector, providing the GPU address. Specifically, what we want to do is have a large Vec on GPU and slice it into several smaller Vecs.
>
> For the GetArray/RestoreArray we have all possibilities:
> - VecGetArray: gets the pointer to the buffer stored in CPU memory
> - VecCUSPGetArray*: returns a CUSPARRAY object that contains some info, including the buffer allocated in GPU memory
> - VecCUSPGetCUDAArray*: returns a raw pointer to the GPU buffer
>
> The problem comes with PlaceArray equivalents. Using VecPlaceArray we can provide a new pointer to CPU memory. We wanted to implement the equivalent thing for GPU, but we found difficulties due to Thrust. If we wanted to provide a VecCUSPPlaceCUDAArray, the problem is that Thrust does not allow wrapping an existing GPU buffer with a CUSPARRAY object (when creating a CUSPARRAY it always allocates new memory). On the other hand, VecCUSPPlaceArray is possible to implement, but the problem is that one should provide a CUSPARRAY obtained from a VecCUSPGetArray* without modification (it is not possible to do pointer arithmetic with a CUSPARRAY).

As far as I can see from browsing the documentation and the web, there is indeed no such option. Ouch.

> Any thoughts?

I'll soon expose the CUDA backend of ViennaCL in PETSc, which will offer such functionality. Is this an option? It will take 2-3 weeks, though.

Best regards,
Karli
[petsc-dev] Not possible to do a VecPlaceArray for veccusp
We are trying to do some GPU developments on the SLEPc side, and we would need a way of placing the array of a VECCUSP vector, providing the GPU address. Specifically, what we want to do is have a large Vec on GPU and slice it into several smaller Vecs.

For the GetArray/RestoreArray we have all possibilities:
- VecGetArray: gets the pointer to the buffer stored in CPU memory
- VecCUSPGetArray*: returns a CUSPARRAY object that contains some info, including the buffer allocated in GPU memory
- VecCUSPGetCUDAArray*: returns a raw pointer to the GPU buffer

The problem comes with PlaceArray equivalents. Using VecPlaceArray we can provide a new pointer to CPU memory. We wanted to implement the equivalent thing for GPU, but we found difficulties due to Thrust. If we wanted to provide a VecCUSPPlaceCUDAArray, the problem is that Thrust does not allow wrapping an existing GPU buffer with a CUSPARRAY object (when creating a CUSPARRAY it always allocates new memory). On the other hand, VecCUSPPlaceArray is possible to implement, but the problem is that one should provide a CUSPARRAY obtained from a VecCUSPGetArray* without modification (it is not possible to do pointer arithmetic with a CUSPARRAY).

Any thoughts?
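To make the missing piece concrete, the calling sequence in question would look roughly like this. VecCUSPPlaceCUDAArray() and VecCUSPResetCUDAArray() are hypothetical names for the GPU analogues of VecPlaceArray()/VecResetArray(); the remaining calls are the existing PETSc API:

  #include <petscvec.h>

  /* Sketch only: wrap a slice of an existing GPU buffer as a small Vec.
     VecCUSPPlaceCUDAArray()/VecCUSPResetCUDAArray() do not exist; they are the
     hypothetical GPU analogues of VecPlaceArray()/VecResetArray().             */
  PetscErrorCode SliceGPUVec(PetscScalar *d_big, PetscInt offset, PetscInt m, Vec *sub)
  {
    PetscErrorCode ierr;

    PetscFunctionBegin;
    ierr = VecCreate(PETSC_COMM_SELF, sub);CHKERRQ(ierr);
    ierr = VecSetSizes(*sub, m, m);CHKERRQ(ierr);
    ierr = VecSetType(*sub, VECSEQCUSP);CHKERRQ(ierr);
    /* d_big is a raw device pointer, e.g. obtained via VecCUSPGetCUDAArray* */
    ierr = VecCUSPPlaceCUDAArray(*sub, d_big + offset);CHKERRQ(ierr); /* hypothetical */
    /* ... use *sub (kernels, BLAS, solvers) on big[offset .. offset+m-1] ... */
    ierr = VecCUSPResetCUDAArray(*sub);CHKERRQ(ierr);                 /* hypothetical */
    PetscFunctionReturn(0);
  }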