On Thu, 15 Sep 2011 10:16:26 -0700, "Mark A. Grondona" <[email protected]> wrote:
>
> I'm not sure if this will be useful or not, but your use case reminded
> me of a project by Jim Garlick awhile back called "industrial strength
> pipes" (ISP). This project allows you to set up a chain of dependent
> tasks much like a UNIX pipeline, and it has some kind of support for
> spawning the tasks in the pipeline with srun(1). It might not exactly
> map to your use case, but I thought I'd mention it nonetheless.
I did mean to send the URL to ISP:

http://isp.sourceforge.net/report.pdf

mark

> Another project that this discussion reminded me of was a set of
> scripts I wrote awhile back to run a personal instance of SLURM
> as a SLURM job. When this nested SLURM instance was launched, it
> then appeared to commands running within the job that a full SLURM
> cluster of however many nodes were in the job was available. You
> could then submit multiple batch jobs to this nested instance (even
> another request for a nested SLURM).
>
> The solution was kind of kludgy though, and a proper implementation
> was never accepted into SLURM proper, so unfortunately no such
> support exists today.
>
> mark
>
> On Wed, 14 Sep 2011 15:09:47 -0700, Yuri D'Elia <[email protected]> wrote:
> > On Wed, 14 Sep 2011 10:44:36 -0700, Danny Auble wrote:
> > > Have you had a look at the HTC documentation?
> > >
> > > http://schedmd.com/slurmdocs/high_throughput.html
> >
> > Yes, I have. I was able to improve the scheduling speed by tuning the
> > configuration (before that, I couldn't even queue 65k jobs before
> > getting timeouts and abysmal performance). Meanwhile, I will update to
> > 2.2 to get larger job counts, but still that doesn't address all my
> > concerns. Please be patient :)
> >
> > > Without knowing what your real objective is it is hard to prescribe a
> > > real solution.
> > >
> > > From your description it seems strange you would have the script
> > > sbatch is calling call sbatch once again. What are you trying to
> > > accomplish there?
> > > Wouldn't it just be easier to run this script outside of an
> > > allocation?
> >
> > Ok, I will restate my problem in a more practical manner. Please ask if
> > there's any question or any idea on how to improve the behavior.
> >
> > I'm running bioinformatic batches of various kinds on genetic data.
> > A typical analysis will involve running a short batch (~10 minutes)
> > multiplied for each polymorphism we have (roughly 100k times in the
> > smallest case). A perfect candidate for distribution, since every
> > step in a single stage is independent.
> >
> > Analyses are usually multi-stage:
> >
> > - we run "stage 1" (first 100k jobs)
> > - collect and aggregate data (a single job depending on "stage 1")
> > - run "stage 2" using collected data (another 100k jobs)
> > - (repeat)
> >
> > Let's assume queuing ~200k jobs is not a problem with 2.2.
> >
> > First issue: "squeue" takes forever with more than 5000 jobs. If more
> > than one user is scheduling a workflow like this, it becomes
> > impossible to use at all. Managing the queue itself is also
> > impossible (e.g. killing just "stage 1"). I would like to group the
> > first 100k jobs under a single "id", so that I know that jobs 1-100k
> > belong to "stage 1".
> >
> > My impression from reading the docs is that I can create an
> > allocation and run "steps" to achieve this behavior. sbatch or salloc
> > is the easiest way, but since queuing that many jobs is also
> > time-consuming, running the queuing script on the queue itself seemed
> > a perfect solution (hence sbatch --jobid within sbatch). This method
> > (using salloc or sbatch) also seems to work fine if I put in a fat
> > "sleep" to keep the allocation alive.
> >
> > Also, consider that eventually I will need to queue jobs from within
> > a script anyway (the ending step of "stage 1" might be scheduling
> > "stage 2" itself).
> >
> > Second issue: job dependencies. If I can use a single job with steps,
> > I can easily make "stage 2" depend on a single id and schedule
> > everything "outside" of slurm. If this is not possible, then I need a
> > barrier (like "wait" in a script, as you suggested) so that as soon
> > as "stage 1" finishes I can schedule the next stages from within the
> > batch itself.
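The dependency barrier described above can also be expressed with sbatch's --dependency=afterok option, collecting the stage-1 job ids and handing them to the aggregation job. A minimal dry-run sketch (the script names stage1.sh/collect.sh and the three-job loop are illustrative; the "echo" prefix prints the commands instead of submitting them; --parsable, which makes sbatch print just the job id, is only available in newer releases, so older ones need to parse "Submitted batch job N"):

```shell
#!/bin/sh
# Dry-run sketch: SBATCH prints each command instead of submitting it.
# Drop the "echo" to submit for real.
SBATCH="echo sbatch"

submit_pipeline() {
    deps=""
    for i in 1 2 3; do              # stand-in for the ~100k stage-1 jobs
        # real use: jobid=$(sbatch --parsable stage1.sh "$i")
        jobid=$i                    # fake id for the dry run
        $SBATCH stage1.sh "$i"
        deps="$deps:$jobid"
    done
    # the aggregation job starts only after every listed job succeeds
    $SBATCH --dependency=afterok$deps collect.sh
}

submit_pipeline
```

This schedules everything up front, so the barrier lives in the scheduler rather than in a script that has to stay running; it does not, however, group the 100k jobs under one id, which is the part steps (or, in later SLURM releases, job arrays) would address.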
> > Right now, to work around these issues, I'm artificially limiting
> > the jobs by scheduling N/Z jobs, where each resulting job runs Z
> > steps sequentially. This limits parallelism, however. To work around
> > the dependency issues, I'm looping with a script around "squeue" to
> > see if a pre-determined stage has finished. Ugly, but having people
> > wait to schedule more jobs (and thus letting the machines idle) is
> > worse.
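For readers finding this thread in the archives, the two halves of that workaround can be sketched as shell functions. The task command, job name, chunk size, and poll interval are all illustrative; squeue's -h (--noheader) and -n (--name) filters are standard options:

```shell
#!/bin/sh
# (1) Chunking: instead of one job per task, each submitted job runs
#     Z consecutive tasks, cutting the queue from N jobs to N/Z.
run_chunk() {                       # usage: run_chunk <first_task> <Z>
    first=$1; z=$2
    i=0
    while [ "$i" -lt "$z" ]; do
        task=$((first + i))
        echo "running task $task"   # real use: ./analyze_snp "$task"
        i=$((i + 1))
    done
}

# (2) Polling barrier: block until no jobs named "$1" remain in the
#     queue, then the caller can submit the next stage.
wait_for_stage() {
    while [ -n "$(squeue -h -n "$1" 2>/dev/null)" ]; do
        sleep 60
    done
}

# usage: wait_for_stage stage1 && sbatch stage2.sh
```

Filtering squeue by job name is what makes "a pre-determined stage" detectable: every stage-1 job is submitted with the same --job-name, so an empty listing means the stage has drained.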
