Moe,

Thank you!

John DeSantis

2015-02-03 15:28 GMT-05:00  <[email protected]>:
>
> I believe that is OK, but don't know off the top of my head. Either test for
> yourself or
> 1. Add the new partition
> 2. Move pending jobs to the new partition
> 3. Delete the other partitions once their jobs end
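>
> A rough sketch of those three steps with scontrol, assuming hypothetical
> partition names (ib1, ib2, global), node names, and job ID. Note that a
> partition created with scontrol is not written to slurm.conf, so add it
> there too if it should survive a restart.
>
> ```
> # 1. Add the new partition (names and node ranges are placeholders)
> scontrol create PartitionName=global Nodes=nodea[01-16],nodeb[01-16] Default=YES State=UP
>
> # 2. List pending jobs in the old partitions, then move each one
> squeue -h -t PENDING -p ib1,ib2 -o %i
> scontrol update JobId=1234 Partition=global
>
> # 3. Once the running jobs have ended, delete the old partitions
> scontrol delete PartitionName=ib1
> scontrol delete PartitionName=ib2
> ```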
>
>
> Quoting John Desantis <[email protected]>:
>
>> Moe,
>>
>> One last question when you have a chance.
>>
>> If there are jobs running in the currently active partitions and we
>> switch to one partition with a topology, would those running jobs be lost?
>>
>> Thank you!
>> John DeSantis
>>
>> 2015-02-03 12:42 GMT-05:00  <[email protected]>:
>>>
>>>
>>> You can configure a single queue and use the topology/tree plugin to
>>> identify the nodes on separate fabrics.
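>>>
>>> A minimal sketch of what that could look like, assuming two fabrics and
>>> hypothetical switch, node, and partition names. Leaving the two leaf
>>> switches without a common parent switch is intended to keep a job from
>>> spanning both fabrics; check the topology documentation for your Slurm
>>> version to confirm that behavior.
>>>
>>> ```
>>> # slurm.conf (excerpt; names are placeholders)
>>> TopologyPlugin=topology/tree
>>> PartitionName=global Nodes=nodea[01-16],nodeb[01-16] Default=YES State=UP
>>>
>>> # topology.conf: one leaf switch per InfiniBand fabric, no shared parent
>>> SwitchName=fabric_a Nodes=nodea[01-16]
>>> SwitchName=fabric_b Nodes=nodeb[01-16]
>>> ```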
>>>
>>>
>>> Quoting John Desantis <[email protected]>:
>>>>
>>>>
>>>> Hello all,
>>>>
>>>> Unfortunately, I have some confusion regarding how to achieve a global
>>>> and single partition for our users with several separate host groups
>>>> after reading the man pages and various documentation.
>>>>
>>>> When I say host groups, I mean separate sets of hardware which utilize
>>>> different infiniband fabrics and/or are accessible in different data
>>>> centers, different CPU architectures, etc.
>>>>
>>>> During initial testing, I was able to use a default partition with
>>>> all of the nodes allocated via the "Nodes=" value.
>>>> All was well until a later set of nodes was added which had a
>>>> separate infiniband fabric.  Testing proved that applications were
>>>> attempting to span nodes across the separate fabrics, which
>>>> failed miserably, and as a result we're using separate partitions -
>>>> which most users don't mind.
>>>>
>>>> Now that we're getting more users converted to Slurm, we're realizing
>>>> that some users don't know how to check for free partitions and
>>>> available hardware (boo!) and have grown used to our previous
>>>> scheduler configuration of 1 global queue.
>>>>
>>>> I'm looking into how to emulate this and I'm not quite clear whether
>>>> it can be done using multiple partition definitions with a DEFAULT
>>>> clause.  I've also looked at the topology/tree plugin and, seeing
>>>> that you can specify either switches or nodes, I wonder if this would
>>>> be the preferred method to achieve 1 "global" partition which utilizes
>>>> all of the separate hardware pools and respects the separate host groups.
>>>>
>>>> Thank you,
>>>> John DeSantis
>>>
>>>
>>>
>>>
>>> --
>>> Morris "Moe" Jette
>>> CTO, SchedMD LLC
>>> Commercial Slurm Development and Support
>
>
>
> --
> Morris "Moe" Jette
> CTO, SchedMD LLC
> Commercial Slurm Development and Support
