On Mon, Apr 20, 2020 at 10:28 PM Xiaoye S. Li <[email protected]> wrote:

> Mark,
> thanks for debugging this! Indeed, I confirm -- that particular "free"
> should be regular free instead of cudaFreeHost(), because that data
> structure is not allocated by cudaHostAlloc(). I have been running this
> cuda code on Summit, somehow the bug didn't show up.
>

Odd, but it seems to work fine for me now. E.g., I get a 6x speedup on a
~50K-equation 3D system (Q3 elements with 2 dof per vertex).

>
> I just updated the master branch with this fix. Will be absorbed in a
> future release.
>
> As for PRNTlevel>=2, perhaps check your cmake build script. It should be
> set to 0 for production build.
>

I don't see where that gets set. PRNTlevel does not seem to be in our repo.
I see it in 'MAKE_INC/make.cuda_gpu: -DDEBUGlevel=0 -DPRNTlevel=1
-DPROFlevel=0', but I think it is actually being set to >= 2.

I have manually disabled the print statements (~5 places).

Thanks,
Mark

> Sherry
>
>
> On Sun, Apr 19, 2020 at 6:32 PM Mark Adams <[email protected]> wrote:
>
>> Also, we have PRNTlevel>=2 in SuperLU_dist. This is causing a lot of
>> output. It's not clear where that is set (it's a #define).
>>
>> On Sun, Apr 19, 2020 at 9:28 PM Mark Adams <[email protected]> wrote:
>>
>>> Sherry, I found the problem.
>>>
>>> I added this print statement to dDestroy_LU:
>>>
>>>   nb = CEILING(nsupers, grid->npcol);
>>>   for (i = 0; i < nb; ++i)
>>>     if ( Llu->Lrowind_bc_ptr[i] ) {
>>>       fprintf(stderr, "dDestroy_LU: GPU free Llu->Lnzval_bc_ptr[%d/%d] = %p, "
>>>               "CPU free Llu->Lrowind_bc_ptr = %p\n",
>>>               i, nb, Llu->Lnzval_bc_ptr[i], Llu->Lrowind_bc_ptr[i]);
>>>       SUPERLU_FREE (Llu->Lrowind_bc_ptr[i]);
>>> #ifdef GPU_ACC
>>>       checkCuda(cudaFreeHost(Llu->Lnzval_bc_ptr[i]));
>>> #else
>>>       SUPERLU_FREE (Llu->Lnzval_bc_ptr[i]);
>>> #endif
>>>     }
>>>
>>> And I see:
>>>
>>>   1 SNES Function norm 1.245977692562e-04
>>>   dDestroy_LU: GPU free Llu->Lnzval_bc_ptr[0/134] = 0x4ff9b000, CPU free Llu->Lrowind_bc_ptr = 0x4ff9a000
>>>   ex112d: cudahook.cc:762: CUresult host_free_callback(void*): Assertion `cacheNode != __null' failed.
>>>
>>> This looks like Lnzval_bc_ptr is on the CPU, so I removed the GPU_ACC
>>> stuff and it works now.
>>>
>>> I see this in distribution. Perhaps this is a serial-run bug?
>>>
>>> On Sun, Apr 19, 2020 at 5:58 PM Xiaoye S. Li <[email protected]> wrote:
>>>
>>>> Mark,
>>>> you should fork a branch of your own to do this.
>>>>
>>>> Sherry
>>>>
>>>> On Sun, Apr 19, 2020 at 2:54 PM Stefano Zampini <[email protected]> wrote:
>>>>
>>>>> First, commit your changes to the superlu_dist branch, then rerun
>>>>> configure with
>>>>>
>>>>>   --download-superlu_dist-commit=HEAD
>>>>>
>>>>>
>>>>> > On Apr 20, 2020, at 12:50 AM, Mark Adams <[email protected]> wrote:
>>>>> >
>>>>> > I would like to modify SuperLU_dist, but if I change the source and
>>>>> > configure, it says no need to reconfigure, use --force. I use --force
>>>>> > and it seems to clobber my changes. Can I tell configure to build but
>>>>> > not download SuperLU?
>>>>>
>>>>>
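The crash discussed in this thread comes from passing a buffer to
cudaFreeHost() when it was not allocated as pinned host memory. One way to
make that mismatch harder to write is to store the allocator kind next to the
pointer and let a single free routine dispatch on it. The following is a
minimal C sketch under that assumption, not the fix that went into SuperLU_DIST:
tracked_buf_t, tracked_alloc, and tracked_free are hypothetical names
introduced only for illustration. The GPU_ACC macro is taken from the code
quoted above, and cudaHostAlloc()/cudaFreeHost() are the CUDA runtime calls.
Without GPU_ACC defined, the sketch compiles as plain C and always uses
malloc/free.

```c
#include <stdlib.h>

#ifdef GPU_ACC
#include <cuda_runtime.h>
#endif

/* Hypothetical tag recording which allocator produced a buffer. */
typedef enum { MEM_HEAP, MEM_PINNED } mem_kind_t;

/* Hypothetical wrapper: the pointer travels together with its origin. */
typedef struct {
    void      *ptr;
    mem_kind_t kind;
} tracked_buf_t;

/* Allocate `bytes`; request pinned host memory only when built with
 * GPU_ACC and want_pinned is nonzero. Falls back to malloc otherwise.
 * On total failure, ptr is NULL and kind is MEM_HEAP. */
static tracked_buf_t tracked_alloc(size_t bytes, int want_pinned)
{
    tracked_buf_t b = { NULL, MEM_HEAP };
#ifdef GPU_ACC
    if (want_pinned &&
        cudaHostAlloc(&b.ptr, bytes, cudaHostAllocDefault) == cudaSuccess) {
        b.kind = MEM_PINNED;
        return b;
    }
    b.ptr = NULL; /* pinning not requested or failed: use the heap */
#else
    (void)want_pinned;
#endif
    b.ptr = malloc(bytes);
    return b;
}

/* Release the buffer with the routine that matches its allocator.
 * Returns 0 on success (including an already-empty buffer), -1 if
 * cudaFreeHost reports an error. */
static int tracked_free(tracked_buf_t *b)
{
    if (b->ptr == NULL)
        return 0;
#ifdef GPU_ACC
    if (b->kind == MEM_PINNED) {
        if (cudaFreeHost(b->ptr) != cudaSuccess)
            return -1;
        b->ptr = NULL;
        return 0;
    }
#endif
    free(b->ptr);
    b->ptr = NULL;
    return 0;
}
```

With something like this, a destroy routine would call tracked_free() on each
entry instead of choosing between SUPERLU_FREE and cudaFreeHost() with an
#ifdef, so a buffer that was allocated on the regular heap is never handed to
the CUDA runtime. The cost is one extra field per buffer.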
