Hi Adam,

No, obviously that would be too much, and perhaps of no benefit. I was
envisioning using a relatively small number of WCKeys in combination with
something in the name of the jobs. That way you would need to query/parse
only a limited number of jobs to see if the one(s) you want to resubmit are
already running (or have completed), as opposed to the whole database.
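
To make that concrete, here is a rough, untested sketch in Python of the
kind of check I have in mind. Everything specific in it is my own
assumption and not something Toil or the multitenant app actually does: the
WCKey value "toil", the "wf-<hash>" naming scheme, the helper names, and the
set of states that count as "do not resubmit". It also assumes WCKey
tracking is enabled on the cluster and that accounting is reasonably up to
date.

```python
import hashlib
import subprocess

# Hypothetical WCKey shared by all jobs of one workflow engine (assumption).
WCKEY = "toil"

# States that this sketch treats as "a copy exists, do not resubmit".
ACTIVE_OR_DONE = {"PENDING", "RUNNING", "SUSPENDED", "COMPLETING", "COMPLETED"}


def job_name(workflow_id: str, task_id: str) -> str:
    """Derive a deterministic name, so a resubmission maps to the same name."""
    digest = hashlib.sha256(f"{workflow_id}/{task_id}".encode()).hexdigest()
    return f"wf-{digest[:16]}"


def already_submitted(sacct_output: str, name: str) -> bool:
    """Scan 'JobID|JobName|State' lines (sacct -X -n -P) for a live copy."""
    for line in sacct_output.splitlines():
        fields = line.split("|")
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        _job_id, found_name, state = fields
        # A state may carry a suffix, e.g. "CANCELLED by 1000"; keep word one.
        words = state.split()
        if found_name == name and words and words[0] in ACTIVE_OR_DONE:
            return True
    return False


def submit_once(script: str, workflow_id: str, task_id: str) -> None:
    """Submit the script unless accounting already shows a job by that name."""
    name = job_name(workflow_id, task_id)
    # Only jobs carrying our WCKey and name are returned, not the whole
    # database. A real version would likely also pass --starttime, since
    # sacct only looks at a limited time window by default.
    query = ["sacct", "-X", "-n", "-P", f"--wckeys={WCKEY}",
             f"--name={name}", "--format=JobID,JobName,State"]
    result = subprocess.run(query, check=True, capture_output=True, text=True)
    if not already_submitted(result.stdout, name):
        # singleton keeps two same-named jobs from running at once if a
        # duplicate slips in between the check and the submission.
        subprocess.run(["sbatch", f"--wckey={WCKEY}", f"--job-name={name}",
                        "--dependency=singleton", script], check=True)
```

This does not make sbatch truly idempotent: there is still a window between
the query and the submission, which is why I left the singleton dependency
in as a backstop.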

Something similar to what the multitenant app does, as described on pages
38-60 of
https://github.com/WFU-HPC/OOD-MultitenantApps/blob/main/presentation.pdf
You might take inspiration from them also about how to cram information
into the job name!

Disclaimer: I have not done this myself, but I've seen their presentation
and spoken with them, and it seemed very interesting.

HTH,
Davide

On Thu, Feb 19, 2026 at 11:35 AM Adam Novak <[email protected]> wrote:

> Davide, how do you envision WCKeys being used here? I can imagine
> assigning a globally unique WCKey to every job, to allow retrieving or
> identifying a job later, but it doesn't seem like the WCKeys system is
> intended to be used with thousands of distinct WCKey values. It looks like
> the multitenant setup uses just one WCKey value of "multitenant".
>
> Thanks,
> -Adam
>
> On Wed, Feb 18, 2026 at 11:27 PM Davide DelVento <[email protected]>
> wrote:
>
>> Another option, probably better, would be to use WCKeys. See for example
>> how https://github.com/WFU-HPC/OOD-MultitenantApps solved a very
>> similar problem by exploiting WCKeys (and other things).
>>
>> On Wed, Feb 18, 2026 at 9:08 AM Adam Novak via slurm-users <
>> [email protected]> wrote:
>>
>>> That could probably help; I'd still want to make the job names unique to
>>> prevent multiple workflows under one user from delaying each other, but I'd
>>> be able to have something much closer to correct without a lot of
>>> second-guessing the submission return code.
>>>
>>> On Tue, Feb 17, 2026 at 9:12 PM Kevin Buckley via slurm-users <
>>> [email protected]> wrote:
>>>
>>>> On 2026/02/18 01:56, Adam Novak via slurm-users wrote:
>>>> > ...
>>>> > Toil can't handle multiple copies of the same job running at once
>>>> > ...
>>>> > Is it possible to write an idempotent sbatch command, where it can be run
>>>> > any number of times but will only actually submit one copy of the job?
>>>>
>>>> Could you not make use of the
>>>>
>>>>    --dependency=singleton
>>>>
>>>> constraint, to achieve something close to what your meta-scheduler
>>>> needs?
>>>>
>>>>  From the sbatch manpage:
>>>>
>>>>      singleton
>>>>          This job can begin execution after any previously launched jobs
>>>>          sharing the same job name and user have terminated. In other words,
>>>>          only one job by that name and owned by that user can be running or
>>>>          suspended at any point in time. In a federation, a singleton
>>>>          dependency must be fulfilled on all clusters unless
>>>>          DependencyParameters=disable_remote_singleton is used in slurm.conf.
>>>>
>>>> You would still need to catch any queued dupe(s) that your meta-scheduler
>>>> created, but there wouldn't be two running at once.
>>>>
>>>>
>>>>
>>>> --
>>>> slurm-users mailing list -- [email protected]
>>>> To unsubscribe send an email to [email protected]
>>>>
>>>
>>>
>>> --
>>> Adam Novak (He/Him)
>>> Senior Software Engineer
>>> Computational Genomics Lab
>>> UC Santa Cruz Genomics Institute
>>> "Revealing life’s code."
>>>
>>> Personal Feedback: https://forms.gle/UXZhZc123knF65Dw5
>>>
>>>
>>
>
