Re: [slurm-users] howto list/get all scripts run by a job?

2020-06-19 Thread Adrian Sevcenco
On 6/19/20 12:35 PM, mercan wrote: Hi; For running jobs, you can get the running script using: scontrol write batch_script "$SLURM_JOBID" - command. The - parameter is required for screen output. Ahmet M. wow, thanks a lot!!! Adrian On 19.06.2020 12:25, Adrian Sevcenco wrote: On

Re: [slurm-users] Slurm and shared file systems

2020-06-19 Thread Steven Dick
Condor's original premise was to have long-running compute jobs on distributed nodes with no shared filesystem. Of course, they played all kinds of dirty tricks to make this work, including intercepting libc and system calls. I see no reason cleverly wrapped slurm jobs couldn't do the same, either

Re: [slurm-users] howto list/get all scripts run by a job?

2020-06-19 Thread Adrian Sevcenco
On 6/18/20 9:35 AM, Loris Bennett wrote: Hi Adrian, Hi Adrian Sevcenco writes: Hi! I'm trying to retrieve the actual executable of jobs but I did not find how to do it. I would like to find this for both cases, when the job is started with sbatch or with srun. For running jobs:

Re: [slurm-users] Slurm and shared file systems

2020-06-19 Thread Riebs, Andy
David, I've been using Slurm for nearly 20 years, and while I can imagine some clever work-arounds, like staging your job in /var/tmp on all of the nodes before trying to run it, it's hard to imagine a cluster serving a useful purpose without a shared user file system, whether or not Slurm is

Re: [slurm-users] howto list/get all scripts run by a job?

2020-06-19 Thread mercan
Hi; For running jobs, you can get the running script using: scontrol write batch_script "$SLURM_JOBID" - command. The - parameter is required for screen output. Ahmet M. On 19.06.2020 12:25, Adrian Sevcenco wrote: On 6/18/20 9:35 AM, Loris Bennett wrote: Hi Adrian, Hi Adrian Sevcenco
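A minimal sketch of the command described above (the job ID 12345 is a placeholder; inside a job, the SLURM_JOB_ID environment variable holds the job's own ID):

```shell
# Dump the batch script of running job 12345 to a file
# (scontrol picks a default filename based on the job ID)
scontrol write batch_script 12345

# Pass "-" as the filename to write the script to stdout instead
scontrol write batch_script 12345 -

# From inside a running job, use the job's own ID
scontrol write batch_script "$SLURM_JOB_ID" -
```

Note this only works for jobs submitted with sbatch; as Ahmet points out below, jobs without a batch script (e.g. salloc sessions) have nothing to retrieve.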

Re: [slurm-users] howto list/get all scripts run by a job?

2020-06-19 Thread mercan
But don't forget: if there isn't a script, you cannot get the running script, as with salloc jobs. Ahmet M. On 19.06.2020 12:39, Adrian Sevcenco wrote: On 6/19/20 12:35 PM, mercan wrote: Hi; For running jobs, you can get the running script with using: scontrol write batch_script 

[slurm-users] Slurm and shared file systems

2020-06-19 Thread David Baker
Hello, We are currently helping a research group to set up their own Slurm cluster. They have asked a very interesting question about Slurm and file systems. That is, they are posing the question -- do you need a shared user file store on a Slurm cluster? So, in the extreme case where this is

Re: [slurm-users] ssh-keys on compute nodes?

2020-06-19 Thread Brian Andrus
Nice write-up Ole! I especially like the statement (emphasis added): For security reasons it is strongly recommended *not* to include the Slurm servers slurmctld and slurmdbd hosts in the Host-based_Authentication

Re: [slurm-users] ssh-keys on compute nodes?

2020-06-19 Thread Mark Hahn
The host-based SSH authentication is a good idea, but only inside the cluster's security perimeter, and one should not trust computers external to the cluster nodes in this way. Even more than that! Hostbased allows you to define intersecting sets of asymmetric trust. For instance, usually

Re: [slurm-users] Slurm and shared file systems

2020-06-19 Thread Brian Andrus
It sounds like you are asking if there should be a shared /home, which you do not need. You do need to ensure a user can access the environment for the node (a home directory, ssh keys, etc). If you are asking about the job binary and the data it will be processing, again, you do not. You
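One common way to stage the job binary and data onto node-local storage without a shared filesystem is Slurm's sbcast command, which copies a file from the submit environment to every allocated node. A sketch of a batch script doing this (all paths and the program name are hypothetical):

```shell
#!/bin/bash
#SBATCH --job-name=no-shared-fs
#SBATCH --nodes=4

# Hypothetical paths: broadcast the binary and its input from the
# submit host to node-local /tmp on every node in the allocation.
sbcast ./myprog /tmp/myprog
sbcast ./data.in /tmp/data.in

# Run the node-local copy on all nodes; each task reads its local input.
srun /tmp/myprog --input /tmp/data.in
```

Each task still needs somewhere to write output that you can collect afterwards, which is where the lack of a shared filesystem starts to hurt.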

Re: [slurm-users] [EXT] Set a per-cluster default limit of the number of active cores per user at a time

2020-06-19 Thread Sean Crosby
Hi Paddy, Why don't you add new QoSes and set them as the partition QoS for each partition, and then set the defaults on those partition QoSes? Like: sacctmgr add qos cloud, and then in slurm.conf: PartitionName=cloud Nodes=node[1-6] Default=YES MaxTime=30-0 DefaultTime=0:10:0 State=DOWN QoS=cloud. That way you could have
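The suggestion above combines an sacctmgr command with a slurm.conf partition line; a sketch of the two pieces separated out (the QoS name "cloud" and the cpu=100 limit are taken from or illustrative of the example, not prescriptive):

```shell
# 1. Create the QoS in the accounting database
sacctmgr add qos cloud

# 2. In slurm.conf, attach it to the partition so its limits apply
#    to every job that runs there:
#    PartitionName=cloud Nodes=node[1-6] Default=YES MaxTime=30-0 \
#        DefaultTime=0:10:0 State=DOWN QoS=cloud

# 3. Set limits on the partition QoS, e.g. a per-user CPU cap
sacctmgr modify qos cloud set MaxTRESPerUser=cpu=100
```

A partition QoS applies to all users of the partition without having to touch each association individually.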

Re: [slurm-users] Slurm and shared file systems

2020-06-19 Thread Alex Chekholko
Hi David, There are several approaches to have a shared filesystem namespace without an actual shared filesystem. One issue you will have to contend with is how to handle any kind of filesystem caching (how much room to allocate for local cache, how to handle cache inconsistencies). examples:

[slurm-users] Set a per-cluster default limit of the number of active cores per user at a time

2020-06-19 Thread Paddy Doyle
Hi all, I've been trying to understand how to properly set a limit on the number of cores a user (or an association, either is fine) can have in use at any one time. Ideally, I'd like to be able to set a default value once for the cluster, and then have it inherit down to lots of associations and
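One way to get the "set once, inherit everywhere" behaviour described above is to put the per-user limit on a QoS and make it the cluster's default QoS; it then applies to every association that does not override it. A sketch (the QoS name, the cluster name "mycluster", and the cpu=64 limit are all assumptions for illustration):

```shell
# Create a QoS carrying a per-user core limit
sacctmgr add qos usercap MaxTRESPerUser=cpu=64

# Make it the default QoS for the cluster so associations inherit it
sacctmgr modify cluster mycluster set DefaultQOS=usercap

# Verify the limit is in place
sacctmgr show qos usercap format=Name,MaxTRESPerUser
```

Individual associations can still be given their own QoS to raise or lower the cap where needed.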

Re: [slurm-users] ssh-keys on compute nodes?

2020-06-19 Thread Ole Holm Nielsen
On 19-06-2020 18:55, Mark Hahn wrote: The host-based SSH authentication is a good idea, but only inside the cluster's security perimeter, and one should not trust computers external to the cluster nodes in this way. Even more than that!  Hostbased allows you to define intersecting sets of