One of the engineers here is having problems with any job that tries to use more than 1024 cores. His csh script is getting a 'Too many open files' error, so I tried raising the descriptors limit in the shell from 1024 to 65535. That seems to have worked for interactive logins, but not for gridengine jobs.
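To compare what the two environments actually see, a small script along the following lines could be run once over ssh and once through qsub. It is only a sketch under a few assumptions: the nodes have /bin/sh, the cluster honours the embedded `#$ -S` option, and /proc is available (Linux). The function name show_nofile is made up for illustration. It prints the hard limit as well as the soft one, since a hard limit of 1024 inside the job would explain why the shell cannot raise the value itself.

```shell
#!/bin/sh
#$ -S /bin/sh
# Hypothetical helper (not part of gridengine): report the soft and hard
# open-file limits of the shell that is running this script.
show_nofile() {
    echo "host:             $(uname -n)"
    echo "soft descriptors: $(ulimit -Sn)"
    echo "hard descriptors: $(ulimit -Hn)"
}

show_nofile

# Linux only: the kernel's own view of the limits this process inherited.
if [ -r /proc/self/limits ]; then
    grep 'Max open files' /proc/self/limits
fi
```

In a csh job script the equivalent checks would be `limit descriptors` and `limit -h descriptors`.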
If I ssh to one of the client nodes and issue a 'limit' command, I get:

    % ssh compute-1-6 limit
    cputime      unlimited
    filesize     unlimited
    datasize     unlimited
    stacksize    10240 kbytes
    coredumpsize 0 kbytes
    memoryuse    unlimited
    vmemoryuse   unlimited
    descriptors  65535
    memorylocked 32 kbytes
    maxproc      131072

But if I submit a script that contains:

    #
    limit
    #
    echo 'cat /proc/sys/fs/file-max'
    cat /proc/sys/fs/file-max
    #

I get (from the same client as above) in the logfile:

    cputime      unlimited
    filesize     unlimited
    datasize     unlimited
    stacksize    unlimited
    coredumpsize 0 kbytes
    memoryuse    unlimited
    vmemoryuse   unlimited
    descriptors  1024
    memorylocked 32 kbytes
    maxproc      524288
    cat /proc/sys/fs/file-max
    6448170

Please note that 'descriptors' is still showing 1024 instead of 65535. Any idea where that is coming from? Why is gridengine using a different value than the one that I get when I just ssh into a node? Any suggestions?

JY
