FYI, I was running the test incorrectly:

03:38 cgpu12 ~/petsc_install$ srun -n 1 -G 1 ./a.out
70
70

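On Junchao's point (quoted below) about checking the return codes, here is a minimal sketch in plain C of the kind of guard that could sit between the CUDA queries and the printf calls in the test program. arch_from_capability is a hypothetical helper, not part of PETSc or CUDA. It assumes the caller passes in the integer status returned by the query (cudaSuccess and CUDA_SUCCESS are both 0), and the upper bound on major is an arbitrary assumption, chosen only so that a value like 1120 is rejected instead of being handed to nvcc.

```c
/* Hypothetical helper for illustration only (not a PETSc or CUDA function).
 * status: integer return code of the CUDA query; 0 means success.
 * major, minor: compute capability as reported by the query.
 * Returns 10*major+minor, or -1 if the query failed or the values look wrong. */
int arch_from_capability(int status, int major, int minor)
{
  if (status != 0) return -1;               /* the query itself failed */
  if (major < 1 || major >= 100) return -1; /* assumed bound, see note above */
  if (minor < 0 || minor > 9) return -1;    /* minor must be a single digit */
  return 10 * major + minor;
}
```

With a successful query reporting major 7 and minor 0 this returns 70, matching the output above. A nonzero status, or a major of 112, returns -1, so configure could stop with a clear message rather than generate compute_1120.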
On Wed, May 26, 2021 at 10:21 PM Mark Adams <[email protected]> wrote:

> I had git bisect working and was 4 steps away when I got a new crash.
> configure.log is empty.
>
> 19:15 1 cgpu02 (a531cba26b...)|BISECTING ~/petsc$ git bisect bad
> Bisecting: 19 revisions left to test after this (roughly 4 steps)
> [149e269f455574fbe8ce3ebaf42121ae7fdf0635] Merge branch 'tisaac/feature-spqr' into 'main'
> 19:16 cgpu02 (149e269f45...)|BISECTING ~/petsc$ ../arch-cori-gpu-opt-gcc.py PETSC_DIR=$PWD
> ===============================================================================
>              Configuring PETSc to compile on your system
> ===============================================================================
> *******************************************************************************
>     CONFIGURATION CRASH  (Please send configure.log to [email protected])
> *******************************************************************************
>
> EOL while scanning string literal (cuda.py, line 176)
>   File "/global/u2/m/madams/petsc/config/configure.py", line 455, in petsc_configure
>     framework = config.framework.Framework(['--configModules=PETSc.Configure','--optionsModule=config.compilerOptions']+sys.argv[1:], loadArgDB = 0)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 107, in __init__
>     self.createChildren()
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 344, in createChildren
>     self.getChild(moduleName)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 329, in getChild
>     config.setupDependencies(self)
>   File "/global/u2/m/madams/petsc/config/PETSc/Configure.py", line 80, in setupDependencies
>     self.blasLapack = framework.require('config.packages.BlasLapack',self)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 349, in require
>     config = self.getChild(moduleName, keywordArgs)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 329, in getChild
>     config.setupDependencies(self)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/packages/BlasLapack.py", line 21, in setupDependencies
>     config.package.Package.setupDependencies(self, framework)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/package.py", line 151, in setupDependencies
>     self.mpi = framework.require('config.packages.MPI',self)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 349, in require
>     config = self.getChild(moduleName, keywordArgs)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 329, in getChild
>     config.setupDependencies(self)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/packages/MPI.py", line 73, in setupDependencies
>     self.mpich = framework.require('config.packages.MPICH', self)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 349, in require
>     config = self.getChild(moduleName, keywordArgs)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 329, in getChild
>     config.setupDependencies(self)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/packages/MPICH.py", line 16, in setupDependencies
>     self.cuda = framework.require('config.packages.cuda',self)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 349, in require
>     config = self.getChild(moduleName, keywordArgs)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", line 302, in getChild
>     type = __import__(moduleName, globals(), locals(), ['Configure']).Configure
> 19:16 cgpu02 (149e269f45...)|BISECTING ~/petsc$ ../arch-cori-gpu-opt-gcc.py PETSC_DIR=$PWD
>
> On Wed, May 26, 2021 at 10:10 PM Junchao Zhang <[email protected]> wrote:
>
>> On Wed, May 26, 2021 at 6:13 PM Barry Smith <[email protected]> wrote:
>>
>>> What is HOST=cori09? Does it have GPUs?
>>>
>>> https://docs.nvidia.com/cuda/cuda-runtime-api/structcudaDeviceProp.html#structcudaDeviceProp_164490976c8e07e028a8f1ce1f5cd42d6
>>>
>>> Seems to clearly state
>>>
>>>   int cudaDeviceProp::major [inherited]
>>>
>>>   Major compute capability
>>>
>>> Mark, please compile and run this program on the machine you are running
>>> configure on
>>>
>>> #include <stdio.h>
>>> #include <cuda.h>
>>> #include <cuda_runtime.h>
>>> #include <cuda_runtime_api.h>
>>> #include <cuda_device_runtime_api.h>
>>> int main(int arg,char **args)
>>> {
>>>   struct cudaDeviceProp dp;
>>>   cudaGetDeviceProperties(&dp, 0);
>>>   printf("%d\n",10*dp.major+dp.minor);
>>>
>>>   int major,minor;
>>>   cuDeviceGetAttribute(&major, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, 0);
>>>   cuDeviceGetAttribute(&minor, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR, 0);
>>>   printf("%d\n",10*major+minor);
>>>   return(0);
>>>
>> Probably, you need to check the return code of these two function calls
>> to make sure they are correct.
>>
>>> }
>>>
>>> This is what I get
>>>
>>> $ nvcc mytest.c -lcuda
>>> ~/petsc* (main=)* arch-main
>>> $ ./a.out
>>> 70
>>> 70
>>>
>>> Which is exactly what it is supposed to do.
>>>
>>> Barry
>>>
>>> On May 26, 2021, at 5:31 PM, Barry Smith <[email protected]> wrote:
>>>
>>> Yes, this code which I guess never got hit before
>>>
>>>   cudaDeviceProp dp; cudaGetDeviceProperties(&dp, 0);
>>>   printf("%d\n",10*dp.major+dp.minor);
>>>   return(0);;
>>>
>>> is using the wrong property for the generation.
>>>
>>> Back to the CUDA documentation for the correct information.
>>>
>>> On May 26, 2021, at 3:47 PM, Jacob Faibussowitsch <[email protected]> wrote:
>>>
>>> 1120 sounds suspiciously like some CUDA version rather than architecture
>>> or compute capability…
>>>
>>> Best regards,
>>>
>>> Jacob Faibussowitsch
>>> (Jacob Fai - booss - oh - vitch)
>>> Cell: +1 (312) 694-3391
>>>
>>> On May 26, 2021, at 22:29, Mark Adams <[email protected]> wrote:
>>>
>>> I started to get this error today on Cori.
>>>
>>> nvcc fatal : Unsupported gpu architecture 'compute_1120'
>>>
>>> I am pretty sure I had a clean build but I can redo it if you don't know
>>> where this is from.
>>>
>>> Thanks,
>>> Mark
>>> <configure.log>
