Re: [slurm-users] Strange error, submission denied

2019-02-13 Thread Henkel, Andreas
Hi Marcus, More ideas: CPUs don’t always count as cores but may take the meaning of one thread, hence makes different Maybe the behavior of CR_ONE_TASK is still not solid nor properly documented, and ntasks and ntasks-per-node are honored differently internally. If so solely using ntasks can

Re: [slurm-users] Strange error, submission denied

2019-02-13 Thread Marcus Wagner
Hi Chris, these are 96-thread nodes with 48 cores. You are right that if we set it to 24, the job will get scheduled. But then only half of the node is used. On the other hand, if I only use --ntasks=48, Slurm schedules all tasks onto the same node. The hyperthread of each core is included

Re: [slurm-users] Strange error, submission denied

2019-02-13 Thread Marcus Wagner
Hi Andreas, I get the same result if I set --ntasks-per-node=48 and --ntasks=48, or 96, or whatever. What we wanted to achieve is that exactly ntasks-per-node tasks get scheduled onto one host. Best Marcus On 2/14/19 7:09 AM, Henkel, Andreas wrote: Hi Marcus, What just came to my

Re: [slurm-users] Strange error, submission denied

2019-02-13 Thread Chris Samuel
On Wednesday, 13 February 2019 4:48:05 AM PST Marcus Wagner wrote: > #SBATCH --ntasks-per-node=48 I wouldn't mind betting that if you set that to 24 it will work, and each thread will be assigned a single core with the 2 thread units on it. All the best, Chris -- Chris Samuel :

Re: [slurm-users] Strange error, submission denied

2019-02-13 Thread Marcus Wagner
Hi all, I have narrowed this down a little bit. The really astonishing thing is that if I use --ntasks=48 I can submit the job, and it will be scheduled onto one host: NumNodes=1 NumCPUs=48 NumTasks=48 CPUs/Task=1 ReqB:S:C:T=0:0:*:* TRES=cpu=48,mem=182400M,node=1,billing=48 but as soon

[slurm-users] How to request ONLY one CPU instead of one socket or one node?

2019-02-13 Thread Wang, Liaoyuan
Dear all, I wrote an analysis program to analyze my data. The analysis takes around twenty days to process all data for one species. When I submit my job to the cluster, it always requests one node instead of one CPU. I am wondering how I can request ONLY one CPU using the "sbatch" command? Below

Re: [slurm-users] New Bright Cluster Slurm issue for AD users

2019-02-13 Thread John Hearns
Matthew, that deserves an explanation. Bright Computing Proof of Concept causes nightmares? That is a pretty strong assertion. Please give more details. On Wed, 13 Feb 2019 at 16:01, Matthew BETTINGER < matthew.bettin...@external.total.com> wrote: > One of the main guys, Panos, left Bright so no

Re: [slurm-users] Error in job_submit.lua conditional?

2019-02-13 Thread Prentice Bisbal
I managed to figure out why this conditional wasn't working: shortly before this conditional, I had another conditional that checked for my user_id. If I was submitting a job, it would skip the rest of the job_submit.lua file. I had added this so I could test some new features out that would

Re: [slurm-users] New Bright Cluster Slurm issue for AD users

2019-02-13 Thread Matthew BETTINGER
One of the main guys, Panos, left Bright, so no answer to your specific question, but I hope you can get some support with it. We dumped our BC PoC; the sysadmin working on the PoC still has nightmares. On 2/13/19, 6:54 AM, "slurm-users on behalf of John Hearns" wrote: Yugendra, the

[slurm-users] [Slurm 18.08.5-2] Fixing Slurm Bug DB Entries

2019-02-13 Thread Chance Bryce Carl Nelson
Hi All, Due to the CPU time bug in the last release, we currently have a large number of malformed entries in our database. I have attempted to follow the instructions from a user on this bug report to fix the db entries, specifically: 1. Integer

Re: [slurm-users] Slurmd not starting

2019-02-13 Thread Nathalie Gocht
Both folders exist under /var/spool/: drwxr-xr-x 4 slurm slurm 4096 Feb 13 15:34 slurmctld drwxr-xr-x 2 slurm slurm 4096 Feb 13 14:05 slurmd Thank you for the tip, thinking about setting the slurm user to root. From: slurm-users On behalf of Antony Cleave Sent: Wednesday, 13 February

Re: [slurm-users] Slurmd not starting

2019-02-13 Thread Nathalie Gocht
-Original Message- From: slurm-users On behalf of Ole Holm Nielsen Sent: Wednesday, 13 February 2019 15:10 To: slurm-users@lists.schedmd.com Subject: Re: [slurm-users] Slurmd not starting Hi Nathalie, Which Slurm version and which OS version are you using? > I use version

Re: [slurm-users] Slurmd not starting

2019-02-13 Thread Gennaro Oliva
Hi Nathalie, On Wed, Feb 13, 2019 at 01:58:51PM +, Nathalie Gocht wrote: > I am building up a one node cluster. Master and node are on the same machine. > My slurm.conf: If you are using the Debian or Ubuntu package you may want to use the default locations for these systems: >

Re: [slurm-users] Slurmd not starting

2019-02-13 Thread Antony Cleave
There is a very strong likelihood that you have configured SlurmdUser=slurm and one of the following: 1) there is no /var/spool/slurmd folder 2) the /var/spool/slurmd folder exists but is owned by root. Make sure it exists and is owned by whatever SlurmdUser is set to, or change your
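The two failure cases named in this message (spool folder missing, or owned by the wrong user) can be checked with a short script. This is only a sketch: the function names `owner_name` and `check_spool_dir` are made up for illustration, and the path and user in the usage line are simply the values quoted in the message, so adjust them to whatever your slurm.conf actually sets.

```python
import os
import pwd


def owner_name(uid):
    """Map a numeric UID to a user name, falling back to the number itself."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def check_spool_dir(path, expected_user):
    """Hypothetical helper: report whether a spool directory exists
    and is owned by the expected user."""
    if not os.path.isdir(path):
        return "missing"
    owner = owner_name(os.stat(path).st_uid)
    if owner != expected_user:
        return f"owned by {owner}, expected {expected_user}"
    return "ok"


if __name__ == "__main__":
    # Values taken from the message above; they are assumptions about your setup.
    print(check_spool_dir("/var/spool/slurmd", "slurm"))
```

A result other than "ok" points at one of the two cases listed above.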

Re: [slurm-users] Slurmd not starting

2019-02-13 Thread Ole Holm Nielsen
Hi Nathalie, Which Slurm version and which OS version are you using? FYI: My Slurm Wiki contains all the details of setting up Slurm on CentOS 7: https://wiki.fysik.dtu.dk/niflheim/SLURM Best regards, Ole On 2/13/19 2:58 PM, Nathalie Gocht wrote: Hey, I am building up a one node cluster.

[slurm-users] Slurmd not starting

2019-02-13 Thread Nathalie Gocht
Hey, I am building up a one node cluster. Master and node are on the same machine. My slurm.conf: ControlMachine=bayes # MpiDefault=none ProctrackType=proctrack/pgid ReturnToService=1 SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid SlurmctldPort=6817

Re: [slurm-users] New Bright Cluster Slurm issue for AD users

2019-02-13 Thread Antony Cleave
One method I've used many times in Bright is to integrate a compute node in the same way as the master and logins (I normally use realm join...) and then grab the changes back into the image in cmsh. If you are worried, you can clone into a new image. Then you can make sure your compute

Re: [slurm-users] New Bright Cluster Slurm issue for AD users

2019-02-13 Thread Yugendra Guvvala
Thanks Guys. I will go through all resources and report back how it goes. Thanks, Yugi > On Feb 13, 2019, at 7:58 AM, John Hearns wrote: > > please have a look at section 6.3 of the Bright Admin Manual > You have run updateprovisioners then rebooted the nodes? > > > Configuring The Cluster

Re: [slurm-users] New Bright Cluster Slurm issue for AD users

2019-02-13 Thread John Hearns
please have a look at section 6.3 of the Bright Admin Manual You have run updateprovisioners then rebooted the nodes? Configuring The Cluster To Authenticate Against An External LDAP Server The cluster can be configured in different ways to authenticate against an external LDAP server. For smaller

Re: [slurm-users] New Bright Cluster Slurm issue for AD users

2019-02-13 Thread John Hearns
Yugendra, the Bright support guys are excellent. Slurm is their default choice. I would ask again. Yes, Slurm is technically out of scope for them, but they should help a bit. By the way, I think your problem is that you have configured authentication using AD on your head node. BUT you have not

Re: [slurm-users] New Bright Cluster Slurm issue for AD users

2019-02-13 Thread Antony Cleave
Can you ssh in as root and then su to the AD user to make sure that the node is integrated correctly? If you cannot su to an AD user on the node then Slurm will not be able to resolve the UID either as they use the same methods. On Wed, 13 Feb 2019, 12:35 Yugendra Guvvala, wrote: > No, we can’t
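As a rough complement to the su test suggested here, the following sketch asks the node's own account database whether a user name resolves to a UID. It assumes the node looks up accounts through the system name service, which is what Python's standard `pwd` module queries; `resolve_uid` and the user name `aduser` are hypothetical names used only for illustration.

```python
import pwd


def resolve_uid(username):
    """Hypothetical helper: return the numeric UID for a user name,
    or None if the node cannot resolve that name."""
    try:
        return pwd.getpwnam(username).pw_uid
    except KeyError:
        return None


if __name__ == "__main__":
    # "aduser" is a placeholder; substitute a real AD account name.
    uid = resolve_uid("aduser")
    print("not resolvable on this node" if uid is None else f"uid={uid}")
```

Run on the compute node as root; a name that does not resolve there suggests the node image has not been integrated with the directory.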

Re: [slurm-users] New Bright Cluster Slurm issue for AD users

2019-02-13 Thread Loris Bennett
Yugendra Guvvala writes: >> On Feb 13, 2019, at 1:50 AM, Loris Bennett >> wrote: >> >> Yugendra Guvvala writes: >> >>> Hi, >>> >>> We are bringing a new cluster online. We installed SLURM through Bright >>> Cluster Manager however we are running into an issue here. >>> >>> We are

[slurm-users] Strange error, submission denied

2019-02-13 Thread Marcus Wagner
Hi all, I have a strange behaviour here. We are using slurm 18.08.5-2 on CentOS 7.6. Let me first describe our compute nodes: NodeName=ncm[0001-1032] CPUs=48 Sockets=4 CoresPerSocket=12 ThreadsPerCore=2 RealMemory=185000 Feature=skx8160,hostok,hpcwork Weight=10541
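The arithmetic behind this node definition can be spelled out in a few lines. This is only an illustrative calculation from the Sockets, CoresPerSocket and ThreadsPerCore values quoted above; `node_counts` is a made-up helper name, not part of any Slurm tool.

```python
def node_counts(sockets, cores_per_socket, threads_per_core):
    """Return (physical cores, hardware threads) for one node."""
    cores = sockets * cores_per_socket
    threads = cores * threads_per_core
    return cores, threads


if __name__ == "__main__":
    # Values copied from the NodeName line above.
    cores, threads = node_counts(sockets=4, cores_per_socket=12, threads_per_core=2)
    print(f"cores={cores} threads={threads}")
```

With these values the node has 48 cores and 96 hardware threads, so the configured CPUs=48 matches the core count, not the thread count.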

Re: [slurm-users] New Bright Cluster Slurm issue for AD users

2019-02-13 Thread Ole Holm Nielsen
On 2/13/19 1:14 PM, Yugendra Guvvala wrote: Thank you, this is strange. Is there a way to integrate AD authentication with SLURM or Munge? Or allow all users who log in to run jobs without any restrictions? Integration of Slurm with LDAP accounts has been implemented at EPFL in

Re: [slurm-users] New Bright Cluster Slurm issue for AD users

2019-02-13 Thread Yugendra Guvvala
Also reached out to Bright Computing support and they say Slurm is out of scope for them. Thanks, Yugi > On Feb 13, 2019, at 7:27 AM, Antony Cleave wrote: > > can you ssh to the compute node that job was trying to run on as the AD > user in question? > > I've seen similar issues on AD

Re: [slurm-users] New Bright Cluster Slurm issue for AD users

2019-02-13 Thread Yugendra Guvvala
No, we can’t ssh to compute nodes. And this is by design: no one should be able to ssh to compute nodes other than root. I figure that munge is not configured for AD. We have configured our login image for AD, and the slurm and munge configurations are on the head node. Not sure how to integrate

Re: [slurm-users] New Bright Cluster Slurm issue for AD users

2019-02-13 Thread Antony Cleave
Can you ssh to the compute node that job was trying to run on as the AD user in question? I've seen similar issues on AD integrated systems where some nodes boot from a different image that has not yet been joined to the domain. Antony On Wed, 13 Feb 2019 at 04:58, Yugendra Guvvala <

Re: [slurm-users] New Bright Cluster Slurm issue for AD users

2019-02-13 Thread Yugendra Guvvala
Hi Loris, Thank you, this is strange. Is there a way to integrate AD authentication with SLURM or Munge? Or allow all users who log in to run jobs without any restrictions? Thanks, Yugi > On Feb 13, 2019, at 1:50 AM, Loris Bennett wrote: > > Yugendra Guvvala writes: > >> Hi, >> >> We