2015-09-08 12:55 GMT+02:00 Raymond Wan <[email protected]>:
>
> Dear all,
>
> I'm trying to figure out how to configure a "cluster" with a single
> computer (i.e., the execution and master node are the same). After I
> figure this out, I hope that setting up a cluster with multiple nodes
> will not be too difficult.
>
> In particular, I think the default setting permits only a single job
> per node at a time. However, I'd like to set things up so that more
> than one job can run at a time.
>
> I'm looking at the CPU Management User and Administrator Guide [1],
> and in particular, the Consumable Resources in Slurm page [2]. I hope
> I'm on the right track?
>
> In the examples, I understand the memory (CR_Memory) example. But I
> don't quite understand the CR_CPU_Memory example. What are -N and -n?
> The man page says -N is the number of nodes... so with only one node,
> that is meaningless in my case. -n is the "number of tasks". Is
> "number of tasks" the same as "number of CPUs"?
>
> Is there a reason why the example used both -N and -n and not just -n?
> Do the two parameters interact somehow?
>
> If I have a computer with 2 cores and 10 threads each, that is 20
> CPUs. So -n can range from 1 to 20?
>
> And under SelectTypeParameters, if I set CR_CPU_Memory, then a job
> enters the running state only if both CPU and memory are available.
>
> So far, I hope I'm correct? If so, then my "real" question is that
> the jobs I would like to run are mainly I/O intensive. CPU and memory
> usage is important, but the bottleneck is probably disk I/O. If I've
> set up k disk partitions using an object store, I'd like no more than
> k jobs to run at a time, and I'd like each one to write to a different
> partition.
>
> I *think* this is "impossible" to do, since it would be hard to force
> users to write to one partition and not any others. But I thought
> I'd ask anyway in case there is something within SLURM that I've
> missed. Any suggestions?
>
You can use something like this:

https://github.com/fafik23/slurm_plugins/blob/master/unshare/unshare.c

It uses the unshare syscall (Linux namespaces) to unmount specified
filesystems for a job.

You can use licenses to achieve a kind of limit on the number of jobs
using a specified mountpoint, but... that's not a real IOPS threshold.
Currently I don't know of any Linux mechanism that allows limiting a
process to a specified number of I/O operations per second. On our
side, we've been considering writing our own FUSE filesystem with this
functionality.

If you are using local disks, gres may fit better than licenses...

cheers,
marcin
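[To illustrate the licenses suggestion above: a minimal sketch, assuming k = 4
partitions and made-up license names "part0".."part3" -- adjust to your setup.
In slurm.conf, define one license per disk partition:

    # slurm.conf -- one cluster-wide license token per partition (names are examples)
    Licenses=part0:1,part1:1,part2:1,part3:1

Each job then requests the license for the partition it intends to write to:

    $ sbatch -L part2:1 job.sh

Since each license has a count of 1, at most one job per partition (and at most
k jobs total) will run at a time. As noted above, this only throttles
concurrency; Slurm cannot force the job to actually confine its writes to that
partition.]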
