Very good, this seems to work, thanks. One additional question:

As the GRES discussed below refer to actual disk space, I wonder what will 
happen to the job if it tries to write more data to the respective directory 
than it initially allocated in the job script. Or in other words: I do not 
quite see how the number in --gres=fast:30 is translated into actually used 
disk space (as this number could represent anything). So is this configuration 
exclusively a means for scheduling? 

greetings

Jan



On Jun 23, 2015, at 2:17 PM, Aaron Knister wrote:

> 
> Hi Jan,
> 
> Apologies for the delay. It looks like SLURM
> versions below 15.08 only support 32-bit GRES values, as you noticed (I 
> took a peek at the code). Perhaps a workaround would be to do away with the 
> suffixes and append "_gb" to the GRES name (e.g. disk_gb). Once you have 
> support for 64-bit counters you could always change this later and use a 
> submission filter to provide backwards compatibility.
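
A minimal gres.conf sketch of that workaround (names and counts here are placeholders for illustration, matching the sizes discussed below):

```
# gres.conf -- counts in whole gigabytes, small enough for a 32-bit field
Name=disk_gb Type=fast Count=48
Name=disk_gb Type=data Count=147
```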
> 
> Hope that helps!
> 
> -Aaron
> 
> Sent from my iPhone
> 
>> On Jun 23, 2015, at 5:08 AM, Jan Schulze <[email protected]> wrote:
>> 
>> 
>> Dear all,
>> I still have the problem mentioned below. Has any of you experienced 
>> similar problems with disk-related GRES? Is there a trivial point that I 
>> have missed so far?
>> 
>> Thanks in advance.
>> 
>> greetings
>> 
>> Jan
>> 
>> 
>> 
>> 
>>> On Jun 15, 2015, at 10:06 AM,  wrote:
>>> 
>>> Hi Aaron,
>>> thanks for the quick response. You are right, I'd like to provide some 
>>> scratch space by means of a filesystem. So I guess your 'recipe' should 
>>> work perfectly. I'm currently playing around with a test configuration and 
>>> have adjusted gres.conf accordingly:
>>> 
>>> 
>>>       cat gres.conf
>>>       Name=disk Type=fast Count=48G
>>>       Name=disk Type=data Count=147G
>>> 
>>>       cat nodenames.conf
>>>       NodeName=compute-0-0 Gres=disk:fast:48G,disk:data:147G 
>>> NodeAddr=192.168.255.253 CPUs=4 Weight=20484100 Feature=rack-0,4CPUs
>>> 
>>> 
>>> Unfortunately I already got stuck when trying to restart slurmd; it doesn't 
>>> come up and complains in the log file:
>>> 
>>>       fatal: Gres disk has invalid count value 51539607552
>>> 
>>> (slurmctld comes up without any troubles)
>>> 
>>> As both slurmd and slurmctld come up properly when I change the Count 
>>> field to Count=1G (up to 3G), I figured that it is a problem with the 32-bit 
>>> nature of the count field. However, I thought that this issue would be 
>>> circumvented by the suffixes K, M and G. 
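
For what it's worth, the overflow is easy to check by hand: the G suffix multiplies the count by 1024^3 before it is stored, so 48G lands well past the 32-bit limit. A quick shell check of the arithmetic (outside SLURM entirely):

```shell
# The G suffix expands to *1024^3 before storage, so 48G overflows
# an unsigned 32-bit counter (max 4294967295).
count=$((48 * 1024 * 1024 * 1024))
max32=$((2 ** 32 - 1))
echo "$count"   # 51539607552 -- the same value slurmd reports as invalid
[ "$count" -gt "$max32" ] && echo "overflows a 32-bit counter"
```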
>>> 
>>> 
>>> 
>>> What am I missing?
>>> 
>>> 
>>> Thanks.
>>> 
>>> greetings
>>> 
>>> 
>>> Jan
>>> 
>>> 
>>> 
>>> 
>>>> On Jun 12, 2015, at 2:44 PM, Aaron Knister wrote:
>>>> 
>>>> 
>>>> Hi Jan,
>>>> 
>>>> Are you looking to make raw block devices accessible to jobs, or a file 
>>>> system?
>>>> 
>>>> The term "running on" can mean different things -- it could be where the 
>>>> application binary lives, or where input and/or output files live, or 
>>>> maybe something else entirely. I'll assume you're looking to provide scratch 
>>>> space on the node by means of a filesystem. 
>>>> 
>>>> If you'd like to hand out filesystem access -- say each disk is mounted 
>>>> at /local_disk/sata and /local_disk/sas, respectively -- you could define 
>>>> the GRES as:
>>>> 
>>>> Name=local_disk Type=sata Count=3800G
>>>> Name=local_disk Type=sas Count=580G
>>>> 
>>>> (You'll probably want to adjust the value of Count depending on what size 
>>>> the drives format out to). 
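
A job requesting one of those GRES types could then look something like this (the count unit is whatever convention you pick in gres.conf; the script body is just a placeholder):

```
#!/bin/bash
#SBATCH --gres=local_disk:sata:100
# ... job commands using the allocated scratch space ...
```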
>>>> 
>>>> You could then write some prolog magic to actually allocate that space on 
>>>> the nodes (if you're sharing nodes between jobs) via quotas (or maybe 
>>>> something fancier if you have, say, ZFS or btrfs) and to create a 
>>>> job-specific directory under the mount point. In addition, you could set 
>>>> an environment variable via the prolog that points to the path of the 
>>>> storage, so users can reference it in their jobs regardless of disk type. A 
>>>> single SLURM_LOCAL_DISK variable might do the job. The last piece is an 
>>>> epilog script to delete the job-specific directory and unset any quotas, along 
>>>> with a cron job that periodically checks that the directories and quotas have 
>>>> been cleaned up on each node in case there's an issue with the SLURM 
>>>> epilog (e.g. a node reboots during the job).
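
A minimal sketch of such a prolog, assuming the TaskProlog convention that lines printed as "export NAME=value" are injected into the job's environment; the base path, fallback values, and names are assumptions for illustration, not a tested production setup:

```shell
#!/bin/bash
# Hypothetical TaskProlog sketch -- paths and variable names are assumptions.
# Under SLURM, SLURM_JOB_ID is set; the fallback lets the sketch run standalone.
SLURM_JOB_ID=${SLURM_JOB_ID:-12345}
BASE=${LOCAL_DISK_BASE:-$(mktemp -d)}   # in production: /local_disk/sata or /local_disk/sas
JOBDIR="$BASE/job_${SLURM_JOB_ID}"
mkdir -p "$JOBDIR"
# TaskProlog convention: printing "export NAME=value" makes the variable
# visible inside the job's environment.
echo "export SLURM_LOCAL_DISK=$JOBDIR"
```

The matching epilog would remove the job directory and drop the quota, and the cron sweep could simply compare existing job_* directories against the current job list on the node.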
>>>> 
>>>> I hope that helps and isn't overwhelming. If you have questions about any 
>>>> of the parts I'm happy to explain more. 
>>>> 
>>>> Best,
>>>> Aaron
>>>> 
>>>> 
>>>> Sent from my iPhone
>>>> 
>>>>> On Jun 12, 2015, at 8:18 AM, Jan Schulze <[email protected]> 
>>>>> wrote:
>>>>> 
>>>>> 
>>>>> Dear all,
>>>>> 
>>>>> this is slurm 14.11.6 on a ROCKS 6.2 cluster. 
>>>>> 
>>>>> We're currently planning to build a cluster out of compute nodes, each 
>>>>> having one SAS (600 GB) and one SATA (4 TB) hard drive. Is there a way to 
>>>>> configure the nodes such that the user can specify which kind 
>>>>> of disk the job is supposed to run on? So in the gres.conf file something 
>>>>> like 
>>>>> 
>>>>> Name=storage Type=SATA File=/dev/sda1 Count=4000G
>>>>> Name=fast Type=SAS File=/dev/sdb1 Count=600G
>>>>> 
>>>>> ?
>>>>> 
>>>>> 
>>>>> Thanks in advance.
>>>>> 
>>>>> 
>>>>> greetings
>>>>> 
>>>>> Jan Schulze
