Mark, even with multiple CPU cores (either with OpenMP or MPI), you can still use the GPU. The current model uses the GPU as an offload device; each MPI rank drives one GPU.
Since Summit has a lot more memory on the GPU (compared to the older GPUs), we can give a larger portion of the work to the GPU. I'll write later about setting up some environment variables.

Sherry

On Tue, Apr 21, 2020 at 6:09 AM Mark Adams <[email protected]> wrote:

>> Odd, but it seems to work fine for me now. E.g., I get a speedup of 6x on
>> a ~50K equation 3D system (Q3 elements with 2 dof per vertex).
>>
>> Mark, is this speedup w.r.t. the CPU version of SUPERLU_DIST? Or just the
>> PETSc factorizations?
>
> I was not clear. This is on Summit, one CPU and one GPU. SUPERLU_DIST-GPU
> vs. PETSc-CPU was about 6x faster. I have compared SUPERLU vs. PETSc on
> the CPU on smaller problems, and PETSc was a little faster.
>
> Note, Summit has 7 cores per GPU, so it would be reasonable to run
> SUPERLU_DIST-CPU on 7 cores, in which case the speedup would clearly be
> gone, but that is not how I run this app.
>
>>> I just updated the master branch with this fix. It will be absorbed in a
>>> future release.
>>>
>>> As for PRNTlevel >= 2, perhaps check your cmake build script. It should
>>> be set to 0 for a production build.
>>
>> I don't see where that gets set. PRNTlevel does not seem to be in our
>> repo. I see it in 'MAKE_INC/make.cuda_gpu: -DDEBUGlevel=0 -DPRNTlevel=1
>> -DPROFlevel=0', but I think it is set at >= 2. I have manually disabled
>> the print statements (~5 places).
>>
>> Thanks,
>> Mark
>>
>>> Sherry
>>>
>>> On Sun, Apr 19, 2020 at 6:32 PM Mark Adams <[email protected]> wrote:
>>>
>>>> Also, we have PRNTlevel >= 2 in SuperLU_dist. This is causing a lot of
>>>> output. It's not clear where that is set (it's a #define).
>>>>
>>>> On Sun, Apr 19, 2020 at 9:28 PM Mark Adams <[email protected]> wrote:
>>>>
>>>>> Sherry, I found the problem.
>>>>>
>>>>> I added this print statement to dDestroy_LU:
>>>>>
>>>>>     nb = CEILING(nsupers, grid->npcol);
>>>>>     for (i = 0; i < nb; ++i)
>>>>>         if ( Llu->Lrowind_bc_ptr[i] ) {
>>>>>             fprintf(stderr, "dDestroy_LU: GPU free Llu->Lnzval_bc_ptr[%d/%d] = %p, "
>>>>>                     "CPU free Llu->Lrowind_bc_ptr = %p\n",
>>>>>                     i, nb, Llu->Lnzval_bc_ptr[i], Llu->Lrowind_bc_ptr[i]);
>>>>>             SUPERLU_FREE (Llu->Lrowind_bc_ptr[i]);
>>>>> #ifdef GPU_ACC
>>>>>             checkCuda(cudaFreeHost(Llu->Lnzval_bc_ptr[i]));
>>>>> #else
>>>>>             SUPERLU_FREE (Llu->Lnzval_bc_ptr[i]);
>>>>> #endif
>>>>>         }
>>>>>
>>>>> And I see:
>>>>>
>>>>>     1 SNES Function norm 1.245977692562e-04
>>>>>     dDestroy_LU: GPU free Llu->Lnzval_bc_ptr[0/134] = 0x4ff9b000, CPU free Llu->Lrowind_bc_ptr = 0x4ff9a000
>>>>>     ex112d: cudahook.cc:762: CUresult host_free_callback(void*): Assertion `cacheNode != __null' failed.
>>>>>
>>>>> This looks like Lnzval_bc_ptr is on the CPU, so I removed the GPU_ACC
>>>>> stuff and it works now.
>>>>>
>>>>> I see this in the distribution code. Perhaps this is a serial-run bug?
>>>>>
>>>>> On Sun, Apr 19, 2020 at 5:58 PM Xiaoye S. Li <[email protected]> wrote:
>>>>>
>>>>>> Mark,
>>>>>> You should fork a branch of your own to do this.
>>>>>>
>>>>>> Sherry
>>>>>>
>>>>>> On Sun, Apr 19, 2020 at 2:54 PM Stefano Zampini <[email protected]> wrote:
>>>>>>
>>>>>>> First, commit your changes to the superlu_dist branch, then rerun
>>>>>>> configure with
>>>>>>>
>>>>>>>     --download-superlu_dist-commit=HEAD
>>>>>>>
>>>>>>> > On Apr 20, 2020, at 12:50 AM, Mark Adams <[email protected]> wrote:
>>>>>>> >
>>>>>>> > I would like to modify SuperLU_dist, but if I change the source and
>>>>>>> > run configure, it says there is no need to reconfigure, use --force.
>>>>>>> > I use --force and it seems to clobber my changes. Can I tell
>>>>>>> > configure to build but not download SuperLU?
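[Editor's note appended to the thread] The crash Mark traced in dDestroy_LU comes down to an allocate/free pairing rule: memory that was allocated with plain malloc (SUPERLU_MALLOC) must not be handed to the pinned-memory free path (cudaFreeHost), and vice versa. The sketch below is a hypothetical, CUDA-free illustration of that rule; the names `block`, `alloc_block`, and `free_block` are invented for this example and are not SuperLU_DIST APIs.

```c
#include <stdio.h>
#include <stdlib.h>

/* HEAP stands in for SUPERLU_MALLOC/free; PINNED stands in for
 * cudaMallocHost/cudaFreeHost.  Tagging each block with its origin lets
 * us detect the mismatch that tripped the cudahook assertion on Summit:
 * a CPU-allocated Lnzval_bc_ptr[] freed through the GPU_ACC path. */
typedef enum { HEAP, PINNED } alloc_kind;
typedef struct { alloc_kind kind; void *p; } block;

static block alloc_block(alloc_kind kind, size_t n) {
    /* In real code a PINNED block would come from cudaMallocHost. */
    block b = { kind, malloc(n) };
    return b;
}

static int free_block(block b, alloc_kind assumed) {
    if (b.kind != assumed)
        return -1;   /* wrong free path: the cudahook assertion, in effect */
    free(b.p);
    return 0;
}

int main(void) {
    /* Buggy path: buffer allocated on the CPU, freed as pinned memory. */
    block lnzval = alloc_block(HEAP, 1024);
    if (free_block(lnzval, PINNED) != 0)
        printf("mismatch: CPU buffer handed to the pinned-memory free path\n");
    free(lnzval.p);  /* clean up after the rejected free */

    /* Correct path after the fix: free with the matching routine. */
    block again = alloc_block(HEAP, 1024);
    if (free_block(again, HEAP) == 0)
        printf("matched free succeeds\n");
    return 0;
}
```

The same discipline applies symmetrically: if the distribution phase ever does allocate Lnzval_bc_ptr with cudaMallocHost, only then is the cudaFreeHost branch correct.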
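On the PRNTlevel question: it is a compile-time define passed on the compiler command line, not a setting in the source tree, which is why it does not show up in the repo. A hedged sketch of a quiet production build, reusing the flags from the quoted make.cuda_gpu (the exact CMake variable your build honors may differ):

```shell
# Pass the defines through the C flags when configuring SuperLU_DIST;
# PRNTlevel=0 silences the per-factorization prints Mark disabled by hand.
cmake .. \
  -DCMAKE_C_FLAGS="-DPRNTlevel=0 -DDEBUGlevel=0 -DPROFlevel=0"
```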
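Stefano's and Sherry's advice on rebuilding a locally modified SuperLU_DIST can be sketched as the workflow below. The externalpackages path and the commit message are illustrative assumptions; the key point is that `--download-superlu_dist-commit=HEAD` makes PETSc's configure rebuild from your committed state instead of re-downloading and clobbering the edits.

```shell
# Work on your own branch inside the source PETSc downloaded, as Sherry
# suggests (directory name is illustrative of the usual PETSc layout).
cd $PETSC_DIR/$PETSC_ARCH/externalpackages/git.superlu_dist
git checkout -b my-fix
git commit -am "free Lnzval_bc_ptr with the matching allocator"

# Rerun configure pointing at the committed HEAD of that checkout.
cd $PETSC_DIR
./configure --download-superlu_dist --download-superlu_dist-commit=HEAD
```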
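Finally, the "each MPI rank drives one GPU" model from the top of the thread maps naturally onto Summit's node layout (6 GPUs and 42 usable cores per node). A hypothetical launch line, with a placeholder executable name:

```shell
# One resource set per GPU: 1 MPI rank, 7 cores, 1 GPU each, 6 per node.
jsrun -n 6 -a 1 -c 7 -g 1 ./my_superlu_app
```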
