On 10/04/17 21:08, Oliver Grant wrote:
> We did not have a gres.conf file. I've created one:
> cat /cm/shared/apps/slurm/var/etc/gres.conf
# Configure support for our four GPUs
> NodeName=node[001-018] Name=gpu File=/dev/nvidia[0-3]
>
> I've read about "global" and "per-node" gres.conf, but I
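For comparison, a per-node gres.conf (a copy living on each node) usually omits the NodeName field; a minimal sketch, assuming four GPUs per node, would be:

Name=gpu File=/dev/nvidia[0-3]

Either way, slurm.conf also has to declare the GRES, i.e. GresTypes=gpu plus Gres=gpu:4 on the matching NodeName lines.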
Hi folks,
We're looking at historical wait times on our clusters, but would like
to be able to distinguish jobs that had long wait times due to
dependencies from jobs that were simply waiting for resources (or whose
user had too many other jobs in the queue at the time).
A quick 'git grep' of the
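One possible angle: sacct records both Submit and Eligible timestamps, and a job held by a dependency only becomes eligible once the dependency clears, so roughly Eligible minus Submit approximates dependency/hold wait, while Start minus Eligible approximates the wait for resources. A sketch, with the time window as a placeholder:

sacct -a -X -P -S 2017-01-01 -E 2017-04-01 \
 -o JobID,Submit,Eligible,Start,State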
On 11/04/17 17:42, Andrea del Monaco wrote:
> [2017-04-11T08:22:03+02:00] error: Error opening file
> /cm/shared/apps/slurm/var/cm/statesave/job.830332/script, No such file
> or directory
> [2017-04-11T08:22:03+02:00] error: Error opening file
>
Hello Thomas and others,
Thanks again for the feedback. I agree, I don't actually need Slurm for
my small-scale cluster.
However, it's part of the baseline assignment I'm working on to use as
much HPC-established software as possible.
For now I settled on expanding the standard
On modern systems, nscd or nslcd should have been replaced by sssd. sssd has
much better caching than the older services.
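A minimal sketch of an LDAP-backed sssd.conf with client-side caching enabled (the server URI, domain name, and timeout are placeholders):

[sssd]
services = nss, pam
domains = default

[domain/default]
id_provider = ldap
ldap_uri = ldap://ldap.example.com
cache_credentials = true
entry_cache_timeout = 5400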
On 11.04.2017 at 17:17, Benjamin Redling wrote:
>
> AFAIK most requests never hit LDAP servers.
> In production there is always a cache on the client side -- nscd might
>
AFAIK most requests never hit LDAP servers.
In production there is always a cache on the client side -- nscd might
have issues, but that's another story.
Regards,
Benjamin
On 2017-04-11 15:32, Grigory Shamov wrote:
> On a larger cluster, deploying NIS, LDAP etc. might require some
> thought,
Are you sure you need /etc/passwd, NIS, LDAP, or whatever?
For the simple case you can often get away with users only having UIDs on
compute nodes with no matching username.
By the way, if using /etc/passwd then I would suggest you use clush useradd
or equivalent rather than copying /etc/passwd.
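For example (the user name and UID are hypothetical), running the same useradd on every node keeps UIDs consistent:

clush -w node[001-018] 'useradd -u 5001 -M alice'

(-M skips creating a home directory, which typically lives on shared storage anyway.)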
Hi,
We are using slurm-16.05.5. I am trying to set up dynamic licenses
according to the "Licenses Guide" as follows:
sacctmgr add resource name=matlab count=10 server=flex5
servertype=flexlm type=license percentallowed=100
sacctmgr shows the license:
sacctmgr show resource
Name Server
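Assuming the resource shows up there, jobs would then request it via the licenses option; if I read the Licenses Guide correctly, the remote-license syntax is name@server, e.g. (the count and script name are placeholders):

sbatch -L matlab@flex5:2 job.sh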
On Tuesday 11 April 2017 08:17:00 Raymond Wan wrote:
>
> Dear all,
>
> Thank you to all of you for the many helpful alternatives!
>
> Unfortunately, system administration isn't my main responsibility so
> I'm (regrettably) not very good at it and have found LDAP on Ubuntu to
> be very unfriendly
On 11 April 2017 08:21:31 MESZ, Uwe Sauter wrote:
>
>Ray,
>
>if you're going with the easy "copy" method just be sure that the nodes
>are all in the same state (user management-wise) before
>you do your first copy. Otherwise you might accidentally delete already
Hello there,
Some of the jobs crash without any apparent valid reason:
The logs are as follows:
Controller:
[2017-04-11T08:22:03+02:00] debug2: Processing RPC: MESSAGE_EPILOG_COMPLETE
uid=0
[2017-04-11T08:22:03+02:00] debug2: _slurm_rpc_epilog_complete JobId=830468
Node=cnode001 usec=60
Ray,
if you're going with the easy "copy" method just be sure that the nodes are all
in the same state (user management-wise) before
you do your first copy. Otherwise you might accidentally delete already
existing users.
I also encourage you to have a look at Ansible, which makes it easy to
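As a rough sketch (the hosts group, user names, and UIDs are placeholders), an Ansible task keeping users and UIDs consistent across nodes could look like:

- hosts: compute
  become: true
  tasks:
    - name: ensure cluster users exist with fixed UIDs
      user:
        name: "{{ item.name }}"
        uid: "{{ item.uid }}"
      with_items:
        - { name: alice, uid: 5001 }
        - { name: bob, uid: 5002 }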
Dear all,
Thank you to all of you for the many helpful alternatives!
Unfortunately, system administration isn't my main responsibility so
I'm (regrettably) not very good at it and have found LDAP on Ubuntu to
be very unfriendly to set up. I do understand that it must be a good
solution for a
On 11 April 2017 at 02:36, Raymond Wan wrote:
>
> For SLURM to work, I understand from web pages such as
> https://slurm.schedmd.com/accounting.html that UIDs need to be shared
> across nodes. Based on this web page, sharing /etc/passwd
> between nodes appears
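Whichever method ends up distributing the accounts, a quick way to sanity-check that a UID resolves identically everywhere (the node list and user are placeholders):

clush -w node[001-018] -b 'id -u alice'

(-b folds identical output together, so any mismatched node stands out.)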