Hello,

we're running an ageing cluster, which was initially built a few years ago with Myrinet as its high-performance interconnect. The cluster has recently acquired some new "fat" nodes with 32 cores, and things have started to break: apparently the Myrinet MX kernel module allows only 16 endpoints, but each MPI process allocates one MX endpoint. So on a fat node, 16 of the 32 processes cannot communicate over Myrinet, and die with an error.

Is there a way to tell SGE that a node has only 16 endpoints, so that it does not allocate more than 16 MPI processes to a single node? (This seems to call for a per-node consumable, which AFAIK does not exist.)

Thanks for any suggestions!

Riccardo

--
Riccardo Murri
http://www.s3it.uzh.ch/about/team/

S3IT: Services and Support for Science IT
University of Zurich
Winterthurerstrasse 190, CH-8057 Zürich (Switzerland)
Tel: +41 44 635 4222
Fax: +41 44 635 6888
