Mark,

   

    Where did you run the little test program I sent you

1) when it produced 1120 and the negative number? (Was this on the compile 
server or on a compute node?)

2) when it produced the correct answer? (Compile server or compute node?)

Do you run configure on a compile server (that has no GPUs) or on a compute 
server that has GPUs?

 Don't spend your time bisecting PETSc; we know exactly where the problem is, 
we just don't see how it happens.

   cuda.py, if it cannot find deviceQuery and you did not provide a 
generation arch with -with-cuda-gencodearch=70, runs a version of the little 
code I sent you to get the number. Apparently that code is either producing 
garbage or not running at all on the compile server, so configure ends up 
with the wrong number, 1120. 

   Just use the option -with-cuda-gencodearch=70 (you no longer need to pass 
this information through any other flags; give just this option and configure 
will use it). 

  Barry

Ideally we want it to figure it out automatically, and this little test program 
in configure is supposed to do this, but since that is not always working yet 
you should just use -with-cuda-gencodearch=70
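
One way to keep a garbage value such as 1120 from reaching nvcc would be to 
sanity-check whatever the test program prints before using it. The following 
is only a sketch of that idea, not the code in cuda.py: the helper name 
parse_gencodearch and the bounds 30 and 200 are assumptions made up for 
illustration.

```python
def parse_gencodearch(probe_stdout, lowest=30, highest=200):
    """Turn the probe's printed output into a compute-capability number.

    Hypothetical helper (not PETSc code). The bounds are illustrative
    assumptions; they only need to exclude values like 1120 or negatives.
    """
    fields = probe_stdout.split()
    if not fields:
        raise ValueError("probe printed nothing")
    try:
        value = int(fields[0])  # first printed number, e.g. "70"
    except ValueError:
        raise ValueError(f"probe output is not an integer: {fields[0]!r}") from None
    if not lowest <= value <= highest:
        raise ValueError(
            f"implausible compute capability {value}; "
            "pass -with-cuda-gencodearch explicitly"
        )
    return value
```

With a check like this, the output "70\n70\n" would yield 70, while "1120" or 
a negative number would stop configure with a message instead of producing 
'compute_1120' later in the build.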



> On May 27, 2021, at 5:45 AM, Mark Adams <[email protected]> wrote:
> 
> FYI, I was running the test incorrectly:
> 03:38 cgpu12  ~/petsc_install$ srun -n 1 -G 1 ./a.out 
> 70
> 70
> 
> On Wed, May 26, 2021 at 10:21 PM Mark Adams <[email protected]> wrote:
> I had git bisect working and was 4 steps away when I got a new crash.
> configure.log is empty.
> 
> 19:15 1 cgpu02 (a531cba26b...)|BISECTING ~/petsc$ git bisect bad
> Bisecting: 19 revisions left to test after this (roughly 4 steps)
> [149e269f455574fbe8ce3ebaf42121ae7fdf0635] Merge branch 'tisaac/feature-spqr' 
> into 'main'
> 19:16 cgpu02 (149e269f45...)|BISECTING ~/petsc$ ../arch-cori-gpu-opt-gcc.py 
> PETSC_DIR=$PWD
> ===============================================================================
>              Configuring PETSc to compile on your system                      
>  
> ===============================================================================
> *******************************************************************************
>         CONFIGURATION CRASH  (Please send configure.log to [email protected])
> *******************************************************************************
> 
> EOL while scanning string literal (cuda.py, line 176)
>   File "/global/u2/m/madams/petsc/config/configure.py", line 455, in 
> petsc_configure
>     framework = 
> config.framework.Framework(['--configModules=PETSc.Configure','--optionsModule=config.compilerOptions']+sys.argv[1:],
>  loadArgDB = 0)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", 
> line 107, in __init__
>     self.createChildren()
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", 
> line 344, in createChildren
>     self.getChild(moduleName)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", 
> line 329, in getChild
>     config.setupDependencies(self)
>   File "/global/u2/m/madams/petsc/config/PETSc/Configure.py", line 80, in 
> setupDependencies
>     self.blasLapack    = framework.require('config.packages.BlasLapack',self)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", 
> line 349, in require
>     config = self.getChild(moduleName, keywordArgs)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", 
> line 329, in getChild
>     config.setupDependencies(self)
>   File 
> "/global/u2/m/madams/petsc/config/BuildSystem/config/packages/BlasLapack.py", 
> line 21, in setupDependencies
>     config.package.Package.setupDependencies(self, framework)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/package.py", line 
> 151, in setupDependencies
>     self.mpi         = framework.require('config.packages.MPI',self)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", 
> line 349, in require
>     config = self.getChild(moduleName, keywordArgs)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", 
> line 329, in getChild
>     config.setupDependencies(self)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/packages/MPI.py", 
> line 73, in setupDependencies
>     self.mpich   = framework.require('config.packages.MPICH', self)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", 
> line 349, in require
>     config = self.getChild(moduleName, keywordArgs)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", 
> line 329, in getChild
>     config.setupDependencies(self)
>   File 
> "/global/u2/m/madams/petsc/config/BuildSystem/config/packages/MPICH.py", line 
> 16, in setupDependencies
>     self.cuda            = framework.require('config.packages.cuda',self)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", 
> line 349, in require
>     config = self.getChild(moduleName, keywordArgs)
>   File "/global/u2/m/madams/petsc/config/BuildSystem/config/framework.py", 
> line 302, in getChild
>     type   = __import__(moduleName, globals(), locals(), 
> ['Configure']).Configure
> 19:16 cgpu02 (149e269f45...)|BISECTING ~/petsc$ ../arch-cori-gpu-opt-gcc.py 
> PETSC_DIR=$PWD
> 
> On Wed, May 26, 2021 at 10:10 PM Junchao Zhang <[email protected]> wrote:
> 
> 
> 
> On Wed, May 26, 2021 at 6:13 PM Barry Smith <[email protected]> wrote:
> 
>   What is HOST=cori09? Does it have GPUs?
> 
>   
> https://docs.nvidia.com/cuda/cuda-runtime-api/structcudaDeviceProp.html#structcudaDeviceProp_164490976c8e07e028a8f1ce1f5cd42d6
> 
>   Seems to clearly state
> 
> int cudaDeviceProp::major [inherited]
> Major compute capability
> 
> 
> 
> Mark, please compile and run this program on the machine you are running 
> configure on
> 
> #include <stdio.h>
> #include <cuda.h>
> #include <cuda_runtime.h>
> #include <cuda_runtime_api.h>
> #include <cuda_device_runtime_api.h>
> int main(int arg,char **args)
> {
>   struct cudaDeviceProp dp;
>   cudaGetDeviceProperties(&dp, 0);
>   printf("%d\n",10*dp.major+dp.minor);
> 
>   int major,minor;
>   cuDeviceGetAttribute(&major, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, 0);
>   cuDeviceGetAttribute(&minor, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR, 0);
>   printf("%d\n",10*major+minor);
>   return(0);
> Probably, you need to check the return code of these two function calls to 
> make sure they are correct.
>  
> }
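
The return-code check suggested above could look roughly like the sketch 
below, written in Python with ctypes so it can be tried without compiling 
anything. It is an illustration only, not what configure does. The function 
and exception names are hypothetical, and the library is passed in as a 
parameter so the logic can be exercised on a machine without a GPU. The 
attribute numbers 75 and 76 are the values of 
cudaDevAttrComputeCapabilityMajor and cudaDevAttrComputeCapabilityMinor in 
the CUDA runtime headers.

```python
import ctypes

# cudaDevAttrComputeCapabilityMajor / Minor in the CUDA runtime headers.
ATTR_CC_MAJOR = 75
ATTR_CC_MINOR = 76


class CudaProbeError(RuntimeError):
    """Raised when a CUDA runtime call reports a non-zero status."""


def query_compute_capability(cudart, device=0):
    """Return 10*major+minor, raising if any runtime call fails.

    `cudart` is any object exposing cudaDeviceGetAttribute(int*, int, int),
    for example a ctypes.CDLL handle to the CUDA runtime library.
    """
    values = []
    for attr in (ATTR_CC_MAJOR, ATTR_CC_MINOR):
        out = ctypes.c_int(-1)
        status = cudart.cudaDeviceGetAttribute(ctypes.pointer(out), attr, device)
        if status != 0:  # 0 is cudaSuccess
            raise CudaProbeError(
                f"cudaDeviceGetAttribute({attr}) failed with status {status}"
            )
        values.append(out.value)
    major, minor = values
    return 10 * major + minor
```

On a machine with CUDA installed this might be called as 
query_compute_capability(ctypes.CDLL("libcudart.so")), where the library file 
name is an assumption about the local installation. On a node with no GPU, 
the call would raise instead of printing whatever happened to be in 
uninitialized memory.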
> 
> This is what I get 
> 
> $ nvcc mytest.c -lcuda
> ~/petsc (main=) arch-main
> $ ./a.out
> 70
> 70
> 
> Which is exactly what it is supposed to do.
> 
> Barry
> 
>> On May 26, 2021, at 5:31 PM, Barry Smith <[email protected]> wrote:
>> 
>> 
>>   Yes, this code which I guess never got hit before 
>> 
>>   cudaDeviceProp dp;
>>   cudaGetDeviceProperties(&dp, 0);
>>   printf("%d\n",10*dp.major+dp.minor);
>>   return(0);;
>> 
>> is using the wrong property for the generation. 
>> 
>>  Back to the CUDA documentation for the correct information. 
>> 
>> 
>> 
>>> On May 26, 2021, at 3:47 PM, Jacob Faibussowitsch <[email protected]> wrote:
>>> 
>>> 1120 sounds suspiciously like some CUDA version rather than architecture or 
>>> compute capability…
>>> 
>>> Best regards,
>>> 
>>> Jacob Faibussowitsch
>>> (Jacob Fai - booss - oh - vitch)
>>> Cell: +1 (312) 694-3391
>>> 
>>>> On May 26, 2021, at 22:29, Mark Adams <[email protected]> wrote:
>>>> 
>>>> I started to get this error today on Cori. 
>>>> 
>>>> nvcc fatal   : Unsupported gpu architecture 'compute_1120'
>>>> 
>>>> I am pretty sure I had a clean build but I can redo it if you don't know 
>>>> where this is from.
>>>> 
>>>> Thanks,
>>>> Mark
>>>> <configure.log>
>> 
> 
