Dominic,

I second a request for such a branch.

Thanks,
Ashwin

On Mon, Sep 22, 2014 at 3:38 PM, Dominic Meiser <[email protected]> wrote:

> On 09/22/2014 12:57 PM, Chung Shen wrote:
>
>> Dear PETSc Users,
>>
>> I am new to PETSc and trying to determine if GPU speedup is possible with
>> the 3D Poisson solvers. I configured 2 copies of 'petsc-master' on a
>> standalone machine, one with CUDA toolkit 5.0 and one without (both without
>> MPI):
>> Machine: HP Z820 Workstation, Redhat Enterprise Linux 5.0
>> CPU: (x2) 8-core Xeon E5-2650 2.0GHz, 128GB Memory
>> GPU: (x2) Tesla K20c (706MHz, 5.12GB Memory, Cuda Compatibility: 3.5,
>> Driver: 313.09)
>>
>> I used 'src/ksp/ksp/examples/tests/ex32.c' as a test and was getting
>> about 20% speedup with GPU. Is this reasonable or did I miss something?
>>
>> Attached is a comparison chart with two sample logs. The y-axis is the
>> elapsed time in seconds and the x-axis corresponds to the size of the
>> problem. In particular, I wonder if the numbers of calls to 'VecCUSPCopyTo'
>> and 'VecCUSPCopyFrom' shown in the GPU log are excessive?
>>
>> Thanks in advance for your reply.
>>
>> Best Regards,
>>
>> Chung Shen
>>
> A few comments:
>
> - To get reliable timing you should configure PETSc without debugging
> (i.e. --with-debugging=no)
> - The ILU preconditioning in your GPU benchmark is done on the CPU. The
> host-device data transfers are killing performance. Can you try to run with
> the additional option -pc_factor_mat_solver_package cusparse? This will
> perform the preconditioning on the GPU.
> - If you're interested in running benchmarks in parallel you will need a
> few patches that are not yet in petsc/master. I can put together a branch
> that has the needed fixes.
>
> Cheers,
> Dominic
>
> --
> Dominic Meiser
> Tech-X Corporation
> 5621 Arapahoe Avenue
> Boulder, CO 80303
> USA
> Telephone: 303-996-2036
> Fax: 303-448-7756
> www.txcorp.com
