Mark, thanks for debugging this! Indeed, I confirm: that particular free should be a regular free() instead of cudaFreeHost(), because that data structure is not allocated by cudaMallocHost(). I have been running this CUDA code on Summit, and somehow the bug didn't show up.
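A minimal sketch of the rule behind this fix: every buffer must be released by the deallocator that matches the allocator that produced it, so cudaFreeHost() only for cudaMallocHost() memory and free() for malloc() memory. The names tracked_buf, tracked_alloc and tracked_free are hypothetical illustrations, not part of SuperLU_DIST. The pinned branch is compiled only when GPU_ACC is defined, mirroring the #ifdef in dDestroy_LU. Without it, every buffer comes from malloc().

```c
#include <stdlib.h>
#ifdef GPU_ACC
#include <cuda_runtime.h>
#endif

/* Hypothetical tag: which allocator produced a buffer. */
typedef enum { ALLOC_MALLOC, ALLOC_PINNED } alloc_kind;

/* Hypothetical buffer record; not a SuperLU_DIST type. */
typedef struct {
    void *ptr;
    alloc_kind kind;
} tracked_buf;

/* Allocate n bytes. Pinned host memory is requested only when built with
 * GPU_ACC and want_pinned is nonzero; otherwise plain malloc() is used.
 * Returns 0 on success, -1 on failure. */
static int tracked_alloc(tracked_buf *b, size_t n, int want_pinned)
{
#ifdef GPU_ACC
    if (want_pinned) {
        if (cudaMallocHost(&b->ptr, n) != cudaSuccess) {
            b->ptr = NULL;
            return -1;
        }
        b->kind = ALLOC_PINNED;
        return 0;
    }
#else
    (void)want_pinned; /* no CUDA: pinned requests fall back to malloc() */
#endif
    b->ptr = malloc(n);
    b->kind = ALLOC_MALLOC;
    return b->ptr ? 0 : -1;
}

/* Release the buffer with the deallocator that matches its allocator,
 * so cudaFreeHost() never receives a malloc()'d pointer. */
static void tracked_free(tracked_buf *b)
{
    if (b->ptr == NULL)
        return;
#ifdef GPU_ACC
    if (b->kind == ALLOC_PINNED) {
        cudaFreeHost(b->ptr);
        b->ptr = NULL;
        return;
    }
#endif
    free(b->ptr);
    b->ptr = NULL;
}
```

The design choice here is to carry the allocator tag with the pointer instead of choosing the free routine from a build flag at free time. Choosing from the build flag is what let the GPU_ACC branch call cudaFreeHost() on memory that had been malloc()'d. Clearing the pointer also makes a second tracked_free() call harmless.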
I just updated the master branch with this fix. It will be absorbed in a future release. As for PRNTlevel>=2, perhaps check your cmake build script. It should be set to 0 for a production build.

Sherry

On Sun, Apr 19, 2020 at 6:32 PM Mark Adams wrote:

> Also, we have PRNTlevel>=2 in SuperLU_dist. This is causing a lot of
> output. It's not clear where that is set (it's a #define).
>
> On Sun, Apr 19, 2020 at 9:28 PM Mark Adams wrote:
>
>> Sherry, I found the problem.
>>
>> I added this print statement to dDestroy_LU:
>>
>>     nb = CEILING(nsupers, grid->npcol);
>>     for (i = 0; i < nb; ++i)
>>         if ( Llu->Lrowind_bc_ptr[i] ) {
>>             fprintf(stderr, "dDestroy_LU: GPU free Llu->Lnzval_bc_ptr[%d/%d] = %p, "
>>                     "CPU free Llu->Lrowind_bc_ptr = %p\n",
>>                     i, nb, Llu->Lnzval_bc_ptr[i], Llu->Lrowind_bc_ptr[i]);
>>             SUPERLU_FREE (Llu->Lrowind_bc_ptr[i]);
>> #ifdef GPU_ACC
>>             checkCuda(cudaFreeHost(Llu->Lnzval_bc_ptr[i]));
>> #else
>>             SUPERLU_FREE (Llu->Lnzval_bc_ptr[i]);
>> #endif
>>         }
>>
>> And I see:
>>
>>   1 SNES Function norm 1.245977692562e-04
>> dDestroy_LU: GPU free Llu->Lnzval_bc_ptr[0/134] = 0x4ff9b000, CPU free Llu->Lrowind_bc_ptr = 0x4ff9a000
>> ex112d: cudahook.cc:762: CUresult host_free_callback(void*): Assertion `cacheNode != __null' failed.
>>
>> This looks like Lnzval_bc_ptr is on the CPU, so I removed the GPU_ACC
>> stuff and it works now.
>>
>> I see this in distribution. Perhaps this is a serial-run bug?
>>
>> On Sun, Apr 19, 2020 at 5:58 PM Xiaoye S. Li wrote:
>>
>>> Mark,
>>> you should fork a branch of your own to do this.
>>>
>>> Sherry
>>>
>>> On Sun, Apr 19, 2020 at 2:54 PM Stefano Zampini wrote:
>>>
>>>> First, commit your changes to the superlu_dist branch, then rerun
>>>> configure with
>>>>
>>>> --download-superlu_dist-commit=HEAD
>>>>
>>>> > On Apr 20, 2020, at 12:50 AM, Mark Adams wrote:
>>>> >
>>>> > I would like to modify SuperLU_dist, but if I change the source and
>>>> > configure, it says no need to reconfigure, use --force. I use --force and it
>>>> > seems to clobber my changes. Can I tell configure to build but not
>>>> > download SuperLU?
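This relates to the PRNTlevel question in the thread. A compile-time switch like this is typically injected by the build, for example as a -DPRNTlevel=0 compiler flag set in a CMake script. Code then tests it with #if. The sketch below assumes that pattern. report_step is a hypothetical function, and the #ifndef default of 0 is an assumption. Neither is taken from SuperLU_DIST's own headers.

```c
#include <stdio.h>

/* Assumed pattern: honor -DPRNTlevel=N from the build; default to 0 (quiet). */
#ifndef PRNTlevel
#define PRNTlevel 0
#endif

/* Hypothetical diagnostic hook: prints only when PRNTlevel >= 2.
 * Returns 1 if it printed, 0 if the diagnostic was compiled out. */
static int report_step(int step)
{
#if PRNTlevel >= 2
    fprintf(stderr, "factorization step %d\n", step);
    return 1;
#else
    (void)step;
    return 0;
#endif
}
```

Built with -DPRNTlevel=2 or higher, the fprintf is compiled in. Built without the flag, or with -DPRNTlevel=0, the diagnostic is compiled out and report_step() just returns 0. That is why the fix is in the build script rather than at run time.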
