John,

Thanks for the response. We use PropagateResourceLimits=NONE and also set both the hard and soft memlock limits to unlimited on all compute nodes via a file in /etc/security/limits.d.
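For anyone following along, the setup described above would look roughly like this; only the PropagateResourceLimits=NONE value and the unlimited memlock entries come from this thread, and the limits.d file name is illustrative:

```
# slurm.conf
PropagateResourceLimits=NONE

# /etc/security/limits.d/99-memlock.conf  (file name is an assumption)
*   soft   memlock   unlimited
*   hard   memlock   unlimited
```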
Does your site limit virtual memory or use VSizeFactor? Based on other responses I've gotten, that seems to be the one thing we do that no one else is doing. The INCAR values have been my fear as the cause. I have no understanding of those values, so I really have no way to help my users if their INCAR is wrong. These users tell me they run the same jobs just fine on the other cluster at our University, and that group does not use SLURM and also doesn't enforce virtual memory limits.

Thanks,
- Trey

=============================

Trey Dockendorf
Systems Analyst I
Texas A&M University
Academy for Advanced Telecommunications and Learning Technologies
Phone: (979)458-2396
Email: treyd...@tamu.edu
Jabber: treyd...@tamu.edu

On Wed, Jan 28, 2015 at 4:08 PM, John Desantis <desan...@mail.usf.edu> wrote:
>
> Trey,
>
> We use different VASP versions, both compiled with OpenMPI and IntelMPI,
> without an issue.
>
> I'd check that you're propagating limits (or not, depending on your
> environment) such as RSS_MEMLOCK via slurm.conf; that you're starting the
> slurm daemon with the appropriate ulimit value; and that you're not
> enforcing memory limits improperly.
>
> Also, this is more related to VASP than Slurm, but I've seen VASP
> segfault right from the start if the input isn't correct in terms of the
> CPUs requested versus what's in INCAR (NPAR/KPAR).
>
> Thanks,
> John DeSantis
>
> 2015-01-28 16:40 GMT-05:00 Ben Roberts <b...@roberts.geek.nz>:
> >
> > Hi Trey,
> >
> > We use VASP with SLURM at our facility (the NZ eScience Infrastructure).
> > My colleague who sorted this out for us when we had similar problems is
> > on vacation right now, but I’ve included him on Cc so that he can
> > respond when he has a moment. My recollection is that it has something
> > to do with locked memory limits (ulimit -l should be set to unlimited
> > on all compute nodes).
> >
> > Jordi, would you be so kind as to respond to Trey if there’s something
> > else as well?
> > Because this does sound eerily familiar to the problem that you fixed
> > in December.
> >
> > Best regards,
> > Ben Roberts
> >
> >> On 29/01/2015, at 10:10 am, Trey Dockendorf <treyd...@tamu.edu> wrote:
> >>
> >> This is a long shot, but does anyone who's running SLURM know of users
> >> that run VASP? In our Torque environment we had users that ran VASP
> >> using OpenMPI, and once we migrated to SLURM those users were unable
> >> to run VASP. The application segfaults very early on when it starts.
> >> This occurs with both OpenMPI and MVAPICH2. At first I thought this
> >> was due to cgroups, but our test environment had cgroups disabled and
> >> VASP still segfaulted. My hunch is that VASP is doing something it
> >> shouldn't and the ulimits for a SLURM job cause it problems. Like I
> >> said, it's a long shot, but I'd be interested to know if anyone out
> >> there has users successfully running VASP under SLURM. Feel free to
> >> ping me off-list if you wish, since this isn't really a SLURM issue
> >> (at least I don't have evidence to think the issue is SLURM).
> >>
> >> Thanks,
> >> - Trey
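[A note for the archive: John's point upthread about VASP segfaulting when the requested CPU count doesn't match INCAR (NPAR/KPAR) can be sanity-checked before submitting. Below is a minimal sketch, assuming the usual VASP rule that the MPI rank count must split evenly across KPAR k-point groups and then NPAR band groups within each group; the helper name is hypothetical, not part of any VASP tooling.]

```python
# Hypothetical pre-flight check for INCAR parallelism settings.
# Assumed rule: ranks must be divisible by KPAR, and the ranks within
# each k-point group must in turn be divisible by NPAR.
def incar_parallelism_ok(ranks: int, npar: int, kpar: int = 1) -> bool:
    if ranks % kpar != 0:
        return False                        # uneven k-point groups
    return (ranks // kpar) % npar == 0      # uneven band groups

print(incar_parallelism_ok(16, 4, 2))  # -> True: 2 k-groups of 8, NPAR=4 divides 8
print(incar_parallelism_ok(16, 3))     # -> False: 3 does not divide 16
```

Running a check like this in the job script before launching mpirun would turn an opaque early segfault into a readable error message.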