[slurm-dev] Re: Jobs submitted simultaneously go on the same GPU

2017-04-11 Thread Christopher Samuel
On 10/04/17 21:08, Oliver Grant wrote: > We did not have a gres.conf file. I've created one: > cat /cm/shared/apps/slurm/var/etc/gres.conf > # Configure support for our four GPU > NodeName=node[001-018] Name=gpu File=/dev/nvidia[0-3] > > I've read about "global" and "per-node" gres.conf, but I

[slurm-dev] Distinguishing past jobs that waited due to dependencies vs resources?

2017-04-11 Thread Christopher Samuel
Hi folks, We're looking at wait times on our clusters historically but would like to be able to distinguish jobs that had long wait times due to dependencies rather than just waiting for resources (or because the user had too many other jobs in the queue at that time). A quick 'git grep' of the

[slurm-dev] Re: Randomly jobs failures

2017-04-11 Thread Christopher Samuel
On 11/04/17 17:42, Andrea del Monaco wrote: > [2017-04-11T08:22:03+02:00] error: Error opening file > /cm/shared/apps/slurm/var/cm/statesave/job.830332/script, No such file > or directory > [2017-04-11T08:22:03+02:00] error: Error opening file >

[slurm-dev] Re: Re:Best Way to Schedule Jobs based on predetermined Lists

2017-04-11 Thread maviko . wagner
Hello Thomas and others, thanks again for the feedback. I agree, I don't actually need Slurm for my small-scale cluster. However, it's part of the baseline assignment I'm working on to use as much HPC-established software as possible. For now I settled with expanding the standard

[slurm-dev] Re: LDAP required?

2017-04-11 Thread Uwe Sauter
On modern systems, nscd or nslcd should have been replaced by sssd. sssd has much better caching than the older services. Am 11.04.2017 um 17:17 schrieb Benjamin Redling: > > AFAIK most requests never hit LDAP servers. > In production there is always a cache on the client side -- nscd might >

[slurm-dev] Re: LDAP required?

2017-04-11 Thread Benjamin Redling
AFAIK most requests never hit LDAP servers. In production there is always a cache on the client side -- nscd might have issues, but that's another story. Regards, Benjamin On 2017-04-11 15:32, Grigory Shamov wrote: > On a larger cluster, deploying NIS, LDAP etc. might require some > thought,

[slurm-dev] Re: LDAP required?

2017-04-11 Thread Daniel Kidger
Are you sure you need /etc/passwd, NIS, LDAP or whatever? For the simple case you can often get away with users only having UIDs on compute nodes with no matching username. By the way, if using /etc/passwd then I would suggest you use clush useradd or equivalent rather than copy /etc/passwd,

[slurm-dev] Slurm license management

2017-04-11 Thread mercanca
Hi; We are using slurm-16.05.5. I am trying to set dynamic licenses according to "Licenses Guide" as follows: sacctmgr add resource name=matlab count=10 server=flex5 servertype=flexlm type=license percentallowed=100 sacctmgr shows license: sacctmgr show resource Name Server

[slurm-dev] Re: LDAP required?

2017-04-11 Thread Markus Koeberl
On Tuesday 11 April 2017 08:17:00 Raymond Wan wrote: > > Dear all, > > Thank you all of you for the many helpful alternatives! > > Unfortunately, system administration isn't my main responsibility so > I'm (regrettably) not very good at it and have found LDAP on Ubuntu to > be very unfriendly

[slurm-dev] Re: LDAP required?

2017-04-11 Thread Benjamin Redling
Am 11. April 2017 08:21:31 MESZ, schrieb Uwe Sauter : > >Ray, > >if you're going with the easy "copy" method just be sure that the nodes >are all in the same state (user management-wise) before >you do your first copy. Otherwise you might accidentally delete already

[slurm-dev] Randomly jobs failures

2017-04-11 Thread Andrea del Monaco
Hello there, Some of the jobs crash without any apparent valid reason. The logs are as follows: Controller: [2017-04-11T08:22:03+02:00] debug2: Processing RPC: MESSAGE_EPILOG_COMPLETE uid=0 [2017-04-11T08:22:03+02:00] debug2: _slurm_rpc_epilog_complete JobId=830468 Node=cnode001 usec=60

[slurm-dev] Re: LDAP required?

2017-04-11 Thread Uwe Sauter
Ray, if you're going with the easy "copy" method just be sure that the nodes are all in the same state (user management-wise) before you do your first copy. Otherwise you might accidentally delete already existing users. I also encourage you to have a look into Ansible which makes it easy to

[slurm-dev] Re: LDAP required?

2017-04-11 Thread Raymond Wan
Dear all, Thank you all of you for the many helpful alternatives! Unfortunately, system administration isn't my main responsibility so I'm (regrettably) not very good at it and have found LDAP on Ubuntu to be very unfriendly to set up. I do understand that it must be a good solution for a

[slurm-dev] Re: LDAP required?

2017-04-11 Thread Lachlan Musicman
On 11 April 2017 at 02:36, Raymond Wan wrote: > > For SLURM to work, I understand from web pages such as > https://slurm.schedmd.com/accounting.html that UIDs need to be shared > across nodes. Based on this web page, it seems sharing /etc/passwd > between nodes appears