[slurm-dev] Re: I am confused about slurm user
Thanks for your help. I installed Slurm and started it successfully. Now I need to read the scheduling code and think about how to plug in plan-based scheduling.

Have a good night.

Best Regards,
Peixin

On Tue, Nov 15, 2016 at 1:57 AM, Alexandre Strube wrote:
> If you are installing from Ubuntu packages, it creates the user for you.
>
> If you are installing from source, you must create a user with useradd or
> adduser.
>
> Then you chown the file to this user. Remove the colon, like
>
> chown slurm /var/spool/slurmctld /var/log/slurm
>
> []s
> Alexandre Strube
>
> On 14 Nov 2016, at 23:44, Peixin Qiao wrote:
>
> Hi experts,
>
> When I install and start Slurm on Ubuntu as described at
> http://slurm.schedmd.com/quickstart_admin.html, I am confused about step 7:
>
> NOTE: The SlurmUser must exist prior to starting Slurm and must exist
> on all nodes of the cluster.
> NOTE: The parent directories for Slurm's log files, process ID files,
> state save directories, etc. are not created by Slurm. They must be created
> and made writable by SlurmUser as needed prior to starting Slurm daemons.
>
> How do I create the SlurmUser before creating those directories for Slurm's
> log files?
>
> When I create those directories as described at
> https://wiki.fysik.dtu.dk/niflheim/SLURM:
>
> mkdir /var/spool/slurmctld /var/log/slurm
> chown slurm: /var/spool/slurmctld /var/log/slurm
> chmod 755 /var/spool/slurmctld /var/log/slurm
>
> The command line shows:
>
> chown: invalid spec: "slurm:"
>
> Could any expert explain the meaning of step 7?
>
> Best Regards,
> Peixin
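Putting Alexandre's answer together, the order in step 7 is: create the account first, then the directories. A minimal sketch follows (paths from the quickstart guide; the --system/nologin options are one common convention for service accounts, not something Slurm mandates):

```shell
# Must run as root; on a non-root shell this only prints a note.
if [ "$(id -u)" -ne 0 ]; then
    echo "skipping: run as root"
else
    # 1. Create the SlurmUser service account first -- it must exist on
    #    every node of the cluster before any Slurm daemon starts.
    useradd --system --no-create-home --shell /usr/sbin/nologin slurm

    # 2. Then create the state/log directories and hand them to that user.
    #    Note: no trailing colon after "slurm" -- "chown slurm:" also sets
    #    the group, and "chown: invalid spec" is what you see when the
    #    account (and its group) do not exist yet.
    mkdir -p /var/spool/slurmctld /var/log/slurm
    chown slurm /var/spool/slurmctld /var/log/slurm
    chmod 755 /var/spool/slurmctld /var/log/slurm
fi
```

The same account (same name, ideally the same UID) has to exist on every node, which is why configuration-management tools or the distribution packages usually handle this step.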
[slurm-dev] Re: Gres issue
On 17/11/16 11:31, Christopher Samuel wrote:
> It depends on the library used to pass options,

Oops - that should be "parse", not "pass". Need more caffeine...

--
Christopher Samuel
Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au
Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/
http://twitter.com/vlsci
[slurm-dev] Re: Gres issue
On 17/11/16 00:04, Michael Di Domenico wrote:
> this might be nothing, but i usually call --gres with an equals
>
> srun --gres=gpu:k10:8
>
> i'm not sure if the equals is optional or not

It depends on the library used to pass options. I'm used to it being mandatory, but apparently with Slurm it's not - just tested it out, and using:

--gres mic

results in my job being scheduled on a Phi node with OFFLOAD_DEVICES=0 set in its environment.

All the best,
Chris

--
Christopher Samuel
Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au
Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/
http://twitter.com/vlsci
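On the option-parsing point: for GNU-style long options, `--gres=value` and `--gres value` are two spellings of the same thing. A quick illustration with util-linux getopt(1) (illustrative only - Slurm has its own parser internally, which, as tested above, also accepts both forms):

```shell
# getopt(1) canonicalises both spellings to the same token list.
getopt -o '' --long gres: -- --gres=gpu:k10:8
getopt -o '' --long gres: -- --gres gpu:k10:8
```

Both invocations print the same canonicalised argument list, which is why tools built on GNU-style parsers generally treat the equals sign as optional for long options.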
[slurm-dev] Re: Gres issue
this might be nothing, but i usually call --gres with an equals

srun --gres=gpu:k10:8

i'm not sure if the equals is optional or not

On Wed, Nov 16, 2016 at 4:34 AM, Dmitrij S. Kryzhevich wrote:
>
> Hi,
>
> I have some issues with gres usage. I'm running Slurm version 16.05.4 and
> I have a small stand with 4 nodes + a master. The best description of it
> would be to paste the confs:
> slurm.conf: http://paste.org.ru/?m8v7ca
> gres.conf: http://paste.org.ru/?ouspnz
> They are populated on each node.
>
> The problem is the following:
>
> [dkryzhevich@gpu ~]$ srun -N 1 --gres gpu:c2050
> srun: error: Unable to allocate resources: Requested node configuration is
> not available
> [dkryzhevich@gpu ~]$
>
> Relevant logs: http://paste.org.ru/?mj4dfs
> Whatever I do with the --gres flag, it just does not start. What am I
> missing here?
>
> I tried to remove the Type column from gres.conf, and all nodes went into
> the "drain" state. I also tried to remove all details from the Gres column
> in slurm.conf (i.e. "NodeName=node2 Gres=gpu:1 CoresPerSocket=2
> ThreadsPerCore=2 State=UNKNOWN"), and the task was submitted, but I want
> the ability to specify the type of card in case I really need it.
>
> And two small unrelated questions:
> 1. Is it possible to submit a job from any node, or only from the master?
> Perhaps by starting a secondary slurmctld daemon on each node, I don't know.
> 2. Is it possible to start a job on two separate nodes with NVIDIA cards
> with something like
> $ srun --gres gpu:2
> ? The point is to use 2-3-4 cards installed on different nodes with some
> MPI connection between the threads.
>
> BR,
> Dmitrij
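For anyone hitting the same "Requested node configuration is not available" error with typed GRES (the paste links above have since expired), the usual checklist is that the Type= in gres.conf, the Gres= string in slurm.conf, and the type in the srun request all agree. An illustrative pair of fragments - the device path and counts here are assumptions, not taken from the expired pastes:

```
# gres.conf on the node
Name=gpu Type=c2050 File=/dev/nvidia0

# matching node line in slurm.conf
NodeName=node2 Gres=gpu:c2050:1 CoresPerSocket=2 ThreadsPerCore=2 State=UNKNOWN
```

With those in agreement, a typed request such as `srun -N 1 --gres gpu:c2050:1` should be satisfiable. On question 2: --gres counts are per node, so something like `srun -N 2 --gres gpu:1` asks for one GPU on each of two nodes rather than two GPUs total.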
[slurm-dev] Re: Using slurm to control container images?
Lachlan,

I am sure it has been mentioned on this thread, but look at Singularity: http://singularity.lbl.gov/

From: Lachlan Musicman [mailto:data...@gmail.com]
Sent: 16 November 2016 01:45
To: slurm-dev
Subject: [slurm-dev] Re: Using slurm to control container images?

Yes, rkt was probably my preferred option. The researchers I work with aren't necessarily up to date with best practice in this area, so Docker is what they know best by virtue of branding/promotion. I don't mind which is used in a solution, if any. But yes, rkt would be my preference.

Cheers
L.

--
The most dangerous phrase in the language is, "We've always done it this way."
- Grace Hopper

On 16 November 2016 at 12:29, Jean Chassoul wrote:
> Hi,
>
> Just wondering, have you considered rkt? I wonder if you run pip inside
> virtualenvs; if that is the case, the switch to a container with rkt seems
> "normal", instead of the more intrusive one-almighty-process-to-rule-everything
> approach that Docker had the last time I checked. It's probably better now.
>
> Saludos,
> Jean
>
> On Tue, Nov 15, 2016 at 8:21 PM, Lachlan Musicman wrote:
>> Hola,
>>
>> We were looking for the ability to make jobs perfectly reproducible.
>> While the system is set up with environment modules, with the increasing
>> number of package management tools - pip/conda, npm, CRAN/Bioconductor -
>> and people building increasingly complex software stacks, our users have
>> started asking about containerization and Slurm.
>>
>> I have found a discussion on this list from about a year ago,
>> https://groups.google.com/d/msg/slurm-devel/oPmz5em5tAA/BYlDDfRDzTgJ
>> which mentioned a tool that hasn't been updated since, and one called
>> Shifter by NERSC, which is Cray-specific(?). Has anyone tried Shifter out,
>> and has there been any movement on this? I presume the licensing issues
>> remain.
>>
>> Cheers
>> L.
>>
>> --
>> The most dangerous phrase in the language is, "We've always done it this way."
>> - Grace Hopper
[slurm-dev] Gres issue
Hi,

I have some issues with gres usage. I'm running Slurm version 16.05.4 and I have a small stand with 4 nodes + a master. The best description of it would be to paste the confs:
slurm.conf: http://paste.org.ru/?m8v7ca
gres.conf: http://paste.org.ru/?ouspnz
They are populated on each node.

The problem is the following:

[dkryzhevich@gpu ~]$ srun -N 1 --gres gpu:c2050
srun: error: Unable to allocate resources: Requested node configuration is not available
[dkryzhevich@gpu ~]$

Relevant logs: http://paste.org.ru/?mj4dfs
Whatever I do with the --gres flag, it just does not start. What am I missing here?

I tried to remove the Type column from gres.conf, and all nodes went into the "drain" state. I also tried to remove all details from the Gres column in slurm.conf (i.e. "NodeName=node2 Gres=gpu:1 CoresPerSocket=2 ThreadsPerCore=2 State=UNKNOWN"), and the task was submitted, but I want the ability to specify the type of card in case I really need it.

And two small unrelated questions:
1. Is it possible to submit a job from any node, or only from the master? Perhaps by starting a secondary slurmctld daemon on each node, I don't know.
2. Is it possible to start a job on two separate nodes with NVIDIA cards with something like
$ srun --gres gpu:2
? The point is to use 2-3-4 cards installed on different nodes with some MPI connection between the threads.

BR,
Dmitrij