So I think tools like Slurm put more of an emphasis on system resource
management: you manage system resources using job queues, time limits, and
so on. My use case is related, which is why Slurm and the like come to
mind. However, the management I need here is data oriented: i.e. I want to
use data to determine the jobs, as opposed to using job queues, quotas, and
other such system resources. That is, I want jobs determined by what kind
of data I have, and not so much jobs determined or directly limited by the
kind of system resources available (CPUs, GPUs, time/disk quotas, etc.);
the system resources are only used indirectly. The tool I need is one that
helps me synthesize jobs based on rules I describe using the tool. Once the
jobs are described, then the system resources begin to matter. The jobs I
describe here are dynamic - they can change depending on what data is
coming into the tool. As data comes in, jobs are created according to the
rules provided to the tool, but some data can arrive that causes existing
jobs to be cancelled, or changed so they are processed differently than
was planned at initial setup, etc.
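To make "rules that synthesize jobs from data" concrete, here is a rough
Bash sketch of the idea. Everything in it is hypothetical - the rule
patterns and the `clean_csv`/`validate_json` commands are made-up
placeholders, not real programs:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: a rule table maps a file-name pattern to a
# command template ({} stands for the file). An incoming file is
# turned into a job line, or into no job at all if nothing matches.

declare -A rules=(
  ['*.csv']='clean_csv {}'
  ['*.json']='validate_json {}'
)

synthesize_job() {
  local file=$1 pat
  for pat in "${!rules[@]}"; do
    case $file in
      $pat) printf '%s\n' "${rules[$pat]/\{\}/$file}"; return 0 ;;
    esac
  done
  return 1   # no matching rule: this file produces no job
}

synthesize_job data.csv    # -> clean_csv data.csv
```

The point is that the job list is a function of the data, and re-running
the rules when new data arrives is what makes the list dynamic.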

It sounds like something you could probably do with a combination of Bash
and Parallel, but expressing this in Bash can be very hard to get right -
in the same sense that writing your code in 0s and 1s is hard, even though
not impossible.
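For what it's worth, GNU Parallel's manual does document a minimal
file-based queue pattern (`tail -n+0 -f jobqueue | parallel`), which gets
part of the way there: appending lines adds jobs dynamically. A sketch,
with the runner line commented out so the snippet stands alone, and with
made-up file and command names:

```shell
# A minimal dynamic job queue, following the pattern in GNU Parallel's
# manual. Jobs appended to the file are started as the runner reads
# them; what this lacks is exactly the interactivity described above
# (priorities, cancelling or rewriting jobs already queued).
true > jobqueue            # start with an empty queue

# Runner (would be started once, e.g. in another terminal):
#   tail -n+0 -f jobqueue | parallel -j4

# As data arrives, the rules translate it into appended job lines:
echo "echo processing new_data.csv"  >> jobqueue
echo "echo processing new_data.json" >> jobqueue
```

So the queue side is cheap to get; it's the cancel/reprioritize/rewrite
part that Bash makes painful.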

Regards,
Prince

On Mon, Feb 13, 2017 at 6:38 PM, Rob Sargent <[email protected]> wrote:

>
>
> On 02/13/2017 05:45 AM, Ole Tange wrote:
>
>> On Mon, Feb 13, 2017 at 11:11 AM, Prince Sibanda
>> <[email protected]> wrote:
>>
>> However, once one of these two cases starts running, i want to be able to
>>> issue interactively a command to stop feeding certain types of files from
>>> the joblist. I also want to be able to prioritise the jobs in joblist so
>>> that those are run first. I would also like to be able to insert new jobs
>>> into the joblist with a certain priority level, so that if the inserted
>>> is a
>>> high priority job for example, it is run next as soon as any of the
>>> currently running jobs has finished. I would like to be able to say skip
>>> a
>>> certain job, or repeat a certain job, take a certain job out of joblist,
>>> etc. All this i want to be able to do when one of those two cases has
>>> already started running.
>>>
>> You are describing a job queue system.
>>
>> GNU Parallel was not built as a job queue system, but can be used as a
>> very minimal queue.
>>
>> GNU Parallel is not designed for interactivity - it has very few
>> interactive features. It is not designed for removing jobs from the
>> queue, and it has no concept of a priority level.
>>
>> Extending GNU Parallel to a proper job queue system is outside the
>> scope of GNU Parallel, and even if someone made a patch for this, I
>> would probably be reluctant to include it - as it would have to
>> re-write huge sections of GNU Parallel.
>>
>> Some GNU Parallel users use Slurm. I would imagine GNU Parallel is
>> useful for generating and submitting jobs to Slurm, and I would be
>> open to making a few changes to make GNU Parallel interface better
>> with Slurm, if there are obvious improvement ideas.
>>
>> Slurm already has the concept of priority, and it is possible to
>> remove jobs from the queue, so my guess is that it will be easier for
>> you to extend Slurm to meet your needs, and I encourage you to see if
>> Slurm or some of the alternatives meet your needs already.
>>
>> Other alternatives include Torque and Rocks.
>>
>>
>> /Ole
>>
>>
> We use Slurm and Parallel, but perhaps in reverse to what Ole has in
> mind.  Here Slurm is the queue and job manager for clusters of anonymous
> machines.  Our Slurm jobs start Parallel jobs on the available hardware,
> which in turn occupy all available processors.
>
>
>
