I understand the issue that some uses of the blas are too small to justify shipping to the gpu. So I was thinking maybe petsc could choose at runtime, based on problem size, whether to use a cpu implementation of the blas or a gpu implementation. That seems feasible, though more complex than just having a petsc interface that substitutes cublas for everything.
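Something like the sketch below is what I have in mind, using a dot product as the simplest case. It is only a sketch under assumptions: the cutoff value is invented and would need tuning per machine, ddot_ assumes lowercase-plus-underscore Fortran name mangling, and the cublas handle is assumed to be created elsewhere with cublasCreate. The per-call copies are only there to keep the example self-contained; they cost more than the dot product itself, which is exactly why small sizes belong on the CPU and why real code would keep the vectors resident on the device.

#include <cublas_v2.h>
#include <cuda_runtime.h>

/* Host BLAS ddot; lowercase-plus-underscore Fortran mangling assumed. */
extern double ddot_(const int *n, const double *x, const int *incx,
                    const double *y, const int *incy);

/* Hypothetical crossover point, not a measured value. */
#define DOT_GPU_CUTOFF 10000

static double dot_dispatch(cublasHandle_t handle, int n,
                           const double *x, const double *y)
{
  double result;
  if (n < DOT_GPU_CUTOFF) {
    /* Small vectors: kernel launch and transfer overhead would swamp
       the O(n) work, so call the host BLAS. */
    int one = 1;
    result = ddot_(&n, x, &one, y, &one);
  } else {
    /* Large vectors: run on the GPU.  The copies are illustrative
       only; a real implementation would keep the data device-resident. */
    double *dx, *dy;
    cudaMalloc((void **)&dx, n * sizeof(double));
    cudaMalloc((void **)&dy, n * sizeof(double));
    cudaMemcpy(dx, x, n * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, y, n * sizeof(double), cudaMemcpyHostToDevice);
    cublasDdot(handle, n, dx, 1, dy, 1, &result);
    cudaFree(dx);
    cudaFree(dy);
  }
  return result;
}

The point is that the caller gets one entry point and never has to know where the work runs; the library picks the implementation from the problem size.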
Yes, I will back off tonight and try to work with a petsc example problem. I had to add an additional library to CHOLMOD.py to resolve an unsatisfied external. I think it was named libsuitesparseconfig.a and was needed to resolve the SuiteSparse_time symbol. It would probably be good to have a SuiteSparse.py module. I'll send more info if I can reproduce the problem with a petsc example.

Thanks,

Dave

----- Original Message -----
From: "Jed Brown" <[email protected]>
To: "Dave Nystrom" <dnystrom1 at comcast.net>
Cc: "For users of the development version of PETSc" <petsc-dev at mcs.anl.gov>
Sent: Wednesday, June 13, 2012 7:41:05 AM
Subject: Re: [petsc-dev] SuiteSparse 4.0

You can't use cublas everywhere because it's better to do small sizes on the CPU and different threads need to be able to operate independently (without multiple devices).

Yes, try a PETSc example before your own code. Did petsc build without any changes to the cholmod interface?

ALWAYS send the error message/stack trace---even if we don't have an answer, it usually tells us how difficult the problem is likely to be to fix.

On Jun 13, 2012 8:09 AM, "Dave Nystrom" <dnystrom1 at comcast.net> wrote:

Well, I tried it last night but without success. I got a memory corruption error when I tried running my app. I suppose I should try first to get it working with a petsc example.

I'm building SuiteSparse but only using cholmod out of that package. Several of the SuiteSparse packages use blas, but cholmod is the only one so far that has the option to use cublas. On an example problem that spends a lot of time in blas, Tim gets about a 10x speedup over the single-core result when using cublas, and about a 20+ percent speedup over multi-threaded Goto blas on the same problem, so it seems worth trying. Anyway, I think I must be doing something wrong so far.

Probably a better solution for petsc would be support for using cublas for all of the petsc blas needs - why should just one petsc blas client have access to cublas? I'm not sure how much work is involved in that, but I see it is on the petsc ToDo list.

I will try again tonight, but would welcome advice or experiences from anyone else who has tried the new cholmod.

Dave

Jed Brown writes:
> Nope, why don't you try it and send us a patch if you get it working.
>
> On Wed, Jun 13, 2012 at 12:49 AM, Dave Nystrom <dnystrom1 at comcast.net>
> wrote:
>
> > Has anyone tried building petsc-dev to use cholmod-2.0, which is part of
> > SuiteSparse-4.0, with the cholmod support for using cublas enabled? I am
> > interested in trying the cublas support for cholmod to see how it compares
> > with mkl and goto for my solves.
> >
> > Thanks,
> >
> > Dave
