[slurm-dev] Re: I am confused about slurm user

2016-11-16 Thread Peixin Qiao
Thanks for your help. I installed Slurm and started it successfully.

Now I need to read the scheduling code and think about how to plug in
plan-based scheduling.

Have a good night.

Best Regards,
Peixin

On Tue, Nov 15, 2016 at 1:57 AM, Alexandre Strube wrote:

> If you are installing from Ubuntu packages, it creates the user for you.
>
> If you are installing from source, you must create a user with useradd or
> adduser.
>
> Then you chown the directories to this user. Remove the colon, like
>
> chown slurm /var/spool/slurmctld /var/log/slurm
>
>
> []s
> Alexandre Strube
>
> On 14 Nov 2016, at 23:44, Peixin Qiao  wrote:
>
> Hi experts,
>
> When I install and start Slurm on Ubuntu following
> http://slurm.schedmd.com/quickstart_admin.html, I am confused about step 7:
>
> NOTE: The *SlurmUser* must exist prior to starting Slurm and must exist
> on all nodes of the cluster.
> NOTE: The parent directories for Slurm's log files, process ID files,
> state save directories, etc. are not created by Slurm. They must be created
> and made writable by *SlurmUser* as needed prior to starting Slurm
> daemons.
>
> How do I create the SlurmUser before creating those directories for Slurm's
> log files?
>
> When I create those directories as described at
> https://wiki.fysik.dtu.dk/niflheim/SLURM:
>
> mkdir /var/spool/slurmctld /var/log/slurm
> chown slurm: /var/spool/slurmctld /var/log/slurm
> chmod 755 /var/spool/slurmctld /var/log/slurm
>
>
> The command line shows:
> chown: invalid spec: "slurm:"
>
> Could any expert explain the meaning of step 7?
>
> Best Regards,
> Peixin
>
>
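
Putting the advice in this thread together, a minimal sketch of the whole
sequence (run as root on every node; the useradd flags and the paths are
typical defaults and may need adjusting to match your slurm.conf):

# create the unprivileged account Slurm runs as; it must exist on all nodes
useradd -r -M -s /usr/sbin/nologin slurm

# create the state and log directories referenced in slurm.conf,
# then hand them to that user (note: no trailing colon on the owner)
mkdir -p /var/spool/slurmctld /var/log/slurm
chown slurm /var/spool/slurmctld /var/log/slurm
chmod 755 /var/spool/slurmctld /var/log/slurm

The "chown: invalid spec" error quoted above is what chown typically reports
when the named user does not exist yet; once the slurm account has been
created, the chown succeeds with or without the trailing colon.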


[slurm-dev] Re: Gres issue

2016-11-16 Thread Christopher Samuel

On 17/11/16 11:31, Christopher Samuel wrote:

> It depends on the library used to pass options,

Oops - that should be parse, not pass.

Need more caffeine..

-- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: Gres issue

2016-11-16 Thread Christopher Samuel

On 17/11/16 00:04, Michael Di Domenico wrote:

> This might be nothing, but I usually call --gres with an equals sign:
> 
> srun --gres=gpu:k10:8
> 
> I'm not sure whether the equals sign is optional or not.

It depends on the library used to pass options; I'm used to it being
mandatory, but apparently with Slurm it's not - I just tested it, and using:

--gres mic

results in my job being scheduled on a Phi node with OFFLOAD_DEVICES=0
set in its environment.
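
A quick way to check the same thing for a gpu GRES, assuming one is
configured on your cluster, is to compare the two spellings directly:

srun --gres=gpu:1 env | grep CUDA_VISIBLE_DEVICES
srun --gres gpu:1 env | grep CUDA_VISIBLE_DEVICES

If both print a device index, both forms are being accepted.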

All the best,
Chris
-- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: Gres issue

2016-11-16 Thread Michael Di Domenico

This might be nothing, but I usually call --gres with an equals sign:

srun --gres=gpu:k10:8

I'm not sure whether the equals sign is optional or not.



On Wed, Nov 16, 2016 at 4:34 AM, Dmitrij S. Kryzhevich  wrote:
>
> Hi,
>
> I have some issues with GRES usage. I'm running Slurm version 16.05.4 and
> I have a small setup with 4 nodes plus a master. The best description of it
> would be to paste the configs:
> slurm.conf: http://paste.org.ru/?m8v7ca
> gres.conf: http://paste.org.ru/?ouspnz
> The same files are present on each node.
>
> And the problem is the following:
>
> [dkryzhevich@gpu ~]$ srun -N 1 --gres gpu:c2050 
> srun: error: Unable to allocate resources: Requested node configuration is
> not available
> [dkryzhevich@gpu ~]$
>
> Relevant logs: http://paste.org.ru/?mj4dfs
> Whatever I do with the --gres flag, the job just does not start. What am I
> missing here?
>
> I tried removing the Type column from gres.conf, and all nodes went into the
> "drain" state. I also tried removing all details from the Gres column in
> slurm.conf (i.e. "NodeName=node2 Gres=gpu:1 CoresPerSocket=2
> ThreadsPerCore=2 State=UNKNOWN"), and the job was submitted, but I want the
> ability to specify the type of card in case I really need it.
>
> And two small unrelated questions.
> 1. Is it possible to submit a job from any node, or only from the master?
> Maybe by starting a secondary slurmctld daemon on each node, I don't know.
> 2. Is it possible to start a job on two separate nodes with NVIDIA cards in
> a way something like
> $ srun --gres gpu:2
> ? The point is to use 2-4 cards installed on different nodes, with some MPI
> communication between the processes.
>
> BR,
> Dmitrij


[slurm-dev] Re: Using slurm to control container images?

2016-11-16 Thread John Hearns
Lachlan,
I am sure it has been mentioned on this thread, but look at Singularity:
http://singularity.lbl.gov/
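
For anyone wanting to see what that looks like in practice, a minimal sketch
of a batch script that runs a containerized tool through Singularity (the
image path and analysis script are made-up names; it assumes the singularity
binary is installed on the compute nodes):

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=01:00:00
# run the command inside the container image rather than on the bare host
srun singularity exec /shared/images/pipeline.img python analysis.py

Because Singularity containers run as the submitting user, no changes to the
Slurm configuration are needed for this to work.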

From: Lachlan Musicman [mailto:data...@gmail.com]
Sent: 16 November 2016 01:45
To: slurm-dev 
Subject: [slurm-dev] Re: Using slurm to control container images?

Yes, rkt was probably my preferred option. The researchers I work with aren't 
necessarily up to date with best practice in this area, so Docker is what they 
know best by virtue of branding/promotion. I don't mind which is used in a 
solution, if any. But yes, rkt would be my preference.
Cheers
L.

--
The most dangerous phrase in the language is, "We've always done it this way."

- Grace Hopper

On 16 November 2016 at 12:29, Jean Chassoul wrote:
Hi,

Just wondering, have you considered rkt? I wonder if you run pip inside 
virtualenvs; if that is the case, the switch to a container with rkt seems 
"normal", rather than the more intrusive one-almighty-process-to-rule-everything 
approach that Docker had the last time I checked. It's probably better now.

Saludos.
Jean

On Tue, Nov 15, 2016 at 8:21 PM, Lachlan Musicman wrote:
Hola,
We were looking for the ability to make jobs perfectly reproducible. The 
system is set up with environment modules, but with the increasing number of 
package management tools - pip/conda, npm, CRAN/Bioconductor - and people 
building increasingly complex software stacks, our users have started asking 
about containerization and Slurm.
I have found a discussion on this list from about a year ago,

https://groups.google.com/d/msg/slurm-devel/oPmz5em5tAA/BYlDDfRDzTgJ

which mentioned one tool that has not been updated since and another, called 
Shifter, by NERSC, which appears to be Cray-specific.
Has anyone tried Shifter out, and has there been any movement on this? I 
presume the licensing issues remain.
Cheers
L.
--
The most dangerous phrase in the language is, "We've always done it this way."

- Grace Hopper




[slurm-dev] Gres issue

2016-11-16 Thread Dmitrij S. Kryzhevich


Hi,

I have some issues with GRES usage. I'm running Slurm version 16.05.4 and I 
have a small setup with 4 nodes plus a master. The best description of it 
would be to paste the configs:

slurm.conf: http://paste.org.ru/?m8v7ca
gres.conf: http://paste.org.ru/?ouspnz
The same files are present on each node.

And the problem is the following:

[dkryzhevich@gpu ~]$ srun -N 1 --gres gpu:c2050 
srun: error: Unable to allocate resources: Requested node configuration 
is not available

[dkryzhevich@gpu ~]$

Relevant logs: http://paste.org.ru/?mj4dfs
Whatever I do with the --gres flag, the job just does not start. What am I 
missing here?


I tried removing the Type column from gres.conf, and all nodes went into the 
"drain" state. I also tried removing all details from the Gres column in 
slurm.conf (i.e. "NodeName=node2 Gres=gpu:1 CoresPerSocket=2 
ThreadsPerCore=2 State=UNKNOWN"), and the job was submitted, but I want the 
ability to specify the type of card in case I really need it.
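
A common cause of "Requested node configuration is not available" with typed
GRES is a mismatch between the two files: when gres.conf declares a Type, the
node's Gres= entry in slurm.conf has to advertise that same type. A minimal
sketch of a consistent pair, assuming a single c2050 per node (the device
file here is a guess; the real one is in the pasted confs):

# gres.conf on node2
Name=gpu Type=c2050 File=/dev/nvidia0

# slurm.conf node definition
NodeName=node2 Gres=gpu:c2050:1 CoresPerSocket=2 ThreadsPerCore=2 State=UNKNOWN

With those two in agreement, a typed request such as
"srun -N 1 --gres gpu:c2050:1 hostname" should be satisfiable.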


And two small unrelated questions.
1. Is it possible to submit a job from any node, or only from the master? 
Maybe by starting a secondary slurmctld daemon on each node, I don't know.
2. Is it possible to start a job on two separate nodes with NVIDIA cards in a 
way something like

$ srun --gres gpu:2
? The point is to use 2-4 cards installed on different nodes, with some MPI 
communication between the processes.
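
A sketch of the kind of request meant in question 2, assuming one NVIDIA card
per node and an MPI-enabled binary (./mpi_app is a placeholder for the real
program):

srun -N 2 --ntasks-per-node=1 --gres=gpu:1 ./mpi_app

--gres counts are per node, so this allocates one GPU on each of the two
nodes, giving the MPI job two GPUs in total.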


BR,
Dmitrij