Brian Smith <[email protected]> writes:

> We found h_vmem to be highly unpredictable, especially with java-based
> applications.  Stack settings were screwed up, certain applications
> wouldn't launch (segfaults), and hard limits were hard to determine
> for things like MPI applications.  When your master has to launch 1024
> MPI sub-tasks (qrsh), it generally eats up more VMEM than the slave
> tasks do.  It was just hard to get right.

I've never really understood this business.  To start with, why does the
master need to launch so many qrshs?  It seems it would be worth moving
to an MPI which supports tree spawning for such cases.  Also, I get the
impression that other people see higher qrsh usage than I do.  I know
this is a much smaller job -- we envy you 1024 nodes, if they're
reliable! -- but the overhead on this 32-node job is in the noise; you
wouldn't guess which is the master.

  $ qhost -h `nodes-in-job 154667`|tail -15
  node224     lx-amd64    4    2    4    4  4.01    7.7G    210.6M    3.7G     0.0
  node228     lx-amd64    4    2    4    4  4.02    7.6G    218.1M    3.7G  288.0K
  node229     lx-amd64    4    2    4    4  4.04    7.7G    204.7M    3.7G     0.0
  node230     lx-amd64    4    2    4    4  5.19    7.6G    251.5M    3.7G   24.0K
  node232     lx-amd64    4    2    4    4  4.14    7.3G    205.9M    3.7G     0.0
  node233     lx-amd64    4    2    4    4  4.10    7.3G    208.8M    3.7G     0.0
  node234     lx-amd64    4    2    4    4  4.13    7.3G    212.6M    3.7G     0.0
  node235     lx-amd64    4    2    4    4  4.03    7.3G    246.1M    3.7G   25.9M
  node236     lx-amd64    4    2    4    4  4.19    7.3G    230.2M    3.7G    3.7M
  node237     lx-amd64    4    2    4    4  4.14    7.3G    210.8M    3.7G     0.0
  node239     lx-amd64    4    2    4    4  4.03    7.3G    206.2M    3.7G     0.0
  node241     lx-amd64    4    2    4    4  4.21    7.3G    282.0M    3.7G   70.3M
  node242     lx-amd64    4    2    4    4  4.11    7.3G    205.0M    3.7G     0.0
  node243     lx-amd64    4    2    4    4  4.10    7.3G    210.7M    3.7G     0.0
  node244     lx-amd64    4    2    4    4  4.20    7.3G    206.7M    3.7G     0.0
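
For what it's worth, nodes-in-job there is just a small local convenience
script.  I won't reproduce the real one, but something along these lines
would do the same job -- only a sketch, assuming the default qstat -g t
layout with the job id in the first column and a queue@host token somewhere
on each task line:

  #!/bin/sh
  # nodes-in-job JOBID -- hypothetical reconstruction of the local helper:
  # print the unique hosts on which tasks of the given job are running.
  qstat -g t -u '*' | awk -v job="$1" '
      $1 == job {
          # scan for the queue instance field, which looks like queue@host
          for (i = 2; i <= NF; i++)
              if (index($i, "@")) { split($i, q, "@"); print q[2] }
      }' | sort -u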

That doesn't mean we shouldn't support different limits on the master,
of course, and I'll look at excluding particular processes from the
cpuset/cgroup-based accounting.  [It looks as if that's potentially
duplicating two other lots of work, sigh, though I'm not sure anyone
else is interested in the Red Hat 5-ish systems that are required for
the largest scientific computing effort.]
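
To give an idea of the sort of exclusion I mean: roughly, the execd (or a
wrapper) would re-parent a process out of the job's cgroup so its further
memory use stops being charged there.  Purely a sketch under cgroup v1 with
made-up paths -- the mount point and helper name below are assumptions, not
what any existing code does:

  # exclude_pid_from_job PID
  # Hypothetical: move a single task out of the job's memory cgroup and back
  # into the root cgroup, so later allocations are no longer accounted to
  # the job.  Assumes the v1 memory controller is mounted at
  # /sys/fs/cgroup/memory (on RHEL 6 it's typically /cgroup/memory).
  exclude_pid_from_job () {
      echo "$1" > /sys/fs/cgroup/memory/tasks
      # Pages already charged stay with the old cgroup unless
      # memory.move_charge_at_immigrate was enabled on it beforehand, so
      # this wants doing before the process allocates much.
  }

  # e.g. pull the qrsh clients the MPI master spawns out of its cgroup:
  #   for p in $(pgrep -f 'qrsh -inherit'); do exclude_pid_from_job "$p"; done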

-- 
Community Grid Engine:  http://arc.liv.ac.uk/SGE/