Re: [easybuild] gc3pie on heterogeneous cluster

On Tue, 5 Dec 2017 12:36:23 +0100, Jure Pečar wrote:

> What would be the best approach to understand what's going on here?

Reading through slurm.py in the GC3Pie backends made me aware that the
usernames inside the Docker containers must be in sync with the ones known to
Slurm. Now I see jobs being submitted.

--
Jurij Pečar
HPC Engineer, IT Operations, IT Services
EMBL Heidelberg, Meyerhofstraße 1, 69117 Heidelberg, Germany
Room 13-401
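A minimal sketch of such a consistency check, assuming a hypothetical username
(jpecar) and container image name (my-eb-image); the point is that the same
username must resolve on the submit host, inside the build container, and on
the Slurm nodes:

    # on the submit host: note the username/UID
    id jpecar
    # inside the build container: the same name must exist and match
    docker run --rm my-eb-image id jpecar
    # on a compute node: must match as well
    srun -N1 id jpecar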
Re: [easybuild] gc3pie on heterogeneous cluster

On Tue, 5 Dec 2017 11:36:31 +0100, Jure Pečar wrote:

> However I would like to use something other than $HOME/.gc3pie_jobs to keep
> track of jobs. I don't see an easy way to redefine it. Am I missing something
> or is it really not configurable?

But even if I set HOME to a shared folder, /g/easybuild/tmp, I see lines like
these in the log:

== 2017-12-05 11:07:06,103 batch.py:385 INFO Creating remote temporary folder: command 'mkdir -p $HOME/.gc3pie_jobs; mktemp -d $HOME/.gc3pie_jobs/lrms_job.XX'
== 2017-12-05 11:07:06,200 core.py:368 INFO Successfully submitted Application@34c2050 to: haswell
== 2017-12-05 11:07:06,201 build_log.py:232 INFO GC3Pie job overview: 1 submitted (total: 1)
== GC3Pie job overview: 1 submitted (total: 1)
== 2017-12-05 11:07:36,413 slurm.py:514 INFO Updated resource 'haswell' status: free slots: -1, total running: 1108, own running jobs: 0, own queued jobs: 0, total queued jobs: 1138
== 2017-12-05 11:07:36,439 batch.py:740 WARNING Failed removing remote folder '/g/easybuild/tmp/.gc3pie_jobs/lrms_job.5X5tj7zdWU': : Could not remove directory tree '/g/easybuild/tmp/.gc3pie_jobs/lrms_job.5X5tj7zdWU': OSError: [Errno 2] No such file or directory: '/g/easybuild/tmp/.gc3pie_jobs/lrms_job.5X5tj7zdWU'
== 2017-12-05 11:07:36,439 build_log.py:232 INFO GC3Pie job overview: 1 terminated, 1 ok (total: 1)
== GC3Pie job overview: 1 terminated, 1 ok (total: 1)

Should I worry about that warning? The thing is, I never see any jobs
submitted to the queue, and I don't really understand why. What would be the
best approach to understand what's going on here?

--
Jurij Pečar
HPC Engineer, IT Operations, IT Services
EMBL Heidelberg, Meyerhofstraße 1, 69117 Heidelberg, Germany
Room 13-401
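One way to dig into this (a sketch; foo.eb is a placeholder easyconfig):
raising EasyBuild's log level shows what GC3Pie actually runs, and the Slurm
accounting tools show whether a job ever reached the controller:

    # re-run the submission with verbose logging
    eb --job --job-backend gc3pie --debug foo.eb
    # did anything reach the Slurm controller?
    squeue -u $USER
    sacct -u $USER -S 2017-12-05 --format=JobID,JobName,State,ExitCode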
Re: [easybuild] gc3pie on heterogeneous cluster

On Fri, 1 Dec 2017 13:26:54 +0100, Riccardo Murri wrote:

> and then:
>
>     eb --job-backend gc3pie --job-target-resource nehalem ...
>
> Hope this helps,

This appears to work fine. However I would like to use something other than
$HOME/.gc3pie_jobs to keep track of jobs. I don't see an easy way to redefine
it. Am I missing something or is it really not configurable?

--
Jurij Pečar
HPC Engineer, IT Operations, IT Services
EMBL Heidelberg, Meyerhofstraße 1, 69117 Heidelberg, Germany
Room 13-401
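One workaround sketch, untested: if the job-tracking location is derived only
from $HOME, overriding HOME for the eb invocation alone would move it to a
shared filesystem (foo.eb is a placeholder):

    # keep $HOME/.gc3pie_jobs on a shared filesystem for this run only
    HOME=/g/easybuild/tmp eb --job --job-backend gc3pie --job-target-resource nehalem foo.eb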
Re: [easybuild] gc3pie on heterogeneous cluster

Ciao Riccardo, all,

Thanks, this is indeed a step forward; this is a very common need. If you
recall, I brought this up years ago. Is there any way this could be somewhat
more automated, e.g. "all goolfc builds are to go on resource/gpunodes"? I am
now eyeing the following list and wondering which concepts seen in other tools
may be relevant here:

https://github.com/common-workflow-language/common-workflow-language/wiki/Existing-Workflow-systems

In short, we should remove human choice from the build process and, to the
extent possible, cluster related builds together (for instance, I often prefer
that each toolchain is built on a single dedicated node).

tia!
F.

On Dec 1, 2017, at 12:26 PM, Riccardo Murri wrote:

> (Jure Pečar, Fri, Dec 01, 2017 at 01:17:05PM +0100:)
>> On Fri, 1 Dec 2017 12:34:42 +0100 Riccardo Murri wrote:
>>
>>> Do I understand correctly that you want to generate programmatically a
>>> number of different values for `-C`?
>>
>> Yes, like nehalem, sandybridge, haswell, skylake, epyc ...
>>
>> So multiple backend definitions in gc3pie.conf would be an option. How do I
>> then tell eb --job which backend to use? --job-backend is a choice between
>> gc3pie and pbspython. I assume I can play with --job-backend-config and have
>> one backend per gc3pie.conf.arch file, right? I'll try that ...
>
> That's one option. Another one is to define different GC3Pie resources
> in the same configuration file and use EB's `--job-target-resource`:
>
>     ### gc3pie.conf
>     [resource/nehalem]
>     # ... generic SLURM config here
>     sbatch = sbatch -C nehalem
>
>     [resource/sandybridge]
>     # ... (copy config from `nehalem`)
>     sbatch = sbatch -C sandybridge
>
> and then:
>
>     eb --job-backend gc3pie --job-target-resource nehalem ...
>
> Hope this helps,
> Riccardo
>
> --
> Riccardo Murri
> S3IT: Services and Support for Science IT
> University of Zurich

cheers,
Fotis

--
echo "sysadmin know better bash than english" | sed s/min/mins/ \
| sed 's/better bash/bash better/' # signal detected in a CERN forum
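As of this thread, nothing like this appears to exist in EasyBuild itself. A
minimal sketch of such a toolchain-to-resource mapping as a shell wrapper; the
mapping rules and the assumption that the easyconfig filename encodes the
toolchain are both hypothetical:

    #!/bin/sh
    # pick-resource.sh -- route a build to a GC3Pie resource based on the
    # toolchain encoded in the easyconfig filename
    ec="$1"
    case "$ec" in
        *goolfc*) resource=gpunodes ;;  # GPU toolchains go to the GPU nodes
        *)        resource=haswell ;;   # everything else to a default resource
    esac
    exec eb --job --job-backend gc3pie --job-target-resource "$resource" "$ec"

Invoked as ./pick-resource.sh foo-goolfc-2017b.eb, this would route the build
to resource/gpunodes without any human choice.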
Re: [easybuild] gc3pie on heterogeneous cluster

(Jure Pečar, Fri, Dec 01, 2017 at 01:17:05PM +0100:)
> On Fri, 1 Dec 2017 12:34:42 +0100 Riccardo Murri wrote:
>
> > Do I understand correctly that you want to generate programmatically a
> > number of different values for `-C`?
>
> Yes, like nehalem, sandybridge, haswell, skylake, epyc ...
>
> So multiple backend definitions in gc3pie.conf would be an option. How do I
> then tell eb --job which backend to use? --job-backend is a choice between
> gc3pie and pbspython. I assume I can play with --job-backend-config and have
> one backend per gc3pie.conf.arch file, right? I'll try that ...

That's one option. Another one is to define different GC3Pie resources
in the same configuration file and use EB's `--job-target-resource`:

    ### gc3pie.conf
    [resource/nehalem]
    # ... generic SLURM config here
    sbatch = sbatch -C nehalem

    [resource/sandybridge]
    # ... (copy config from `nehalem`)
    sbatch = sbatch -C sandybridge

and then:

    eb --job-backend gc3pie --job-target-resource nehalem ...

Hope this helps,
Riccardo

--
Riccardo Murri
S3IT: Services and Support for Science IT
University of Zurich
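Once one resource section exists per CPU generation, the same build can be
fanned out over all of them with a simple loop (a sketch; foo.eb is a
placeholder easyconfig):

    # submit the same build once per architecture-specific resource
    for arch in nehalem sandybridge haswell skylake epyc; do
        eb --job --job-backend gc3pie --job-target-resource "$arch" foo.eb
    done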
Re: [easybuild] gc3pie on heterogeneous cluster

On Fri, 1 Dec 2017 12:34:42 +0100, Riccardo Murri wrote:

> Do I understand correctly that you want to generate programmatically a number
> of different values for `-C`?

Yes, like nehalem, sandybridge, haswell, skylake, epyc ...

So multiple backend definitions in gc3pie.conf would be an option. How do I
then tell eb --job which backend to use? --job-backend is a choice between
gc3pie and pbspython. I assume I can play with --job-backend-config and have
one backend per gc3pie.conf.arch file, right? I'll try that ...

--
Jurij Pečar
HPC Engineer, IT Operations, IT Services
EMBL Heidelberg, Meyerhofstraße 1, 69117 Heidelberg, Germany
Room 13-401
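Since per-architecture sections would differ only in the -C value, they could
also be generated instead of maintained by hand; a sketch, assuming a
hypothetical template file gc3pie.conf.in that holds the generic SLURM
settings shared by all resources:

    # append one resource section per CPU generation to gc3pie.conf
    for arch in nehalem sandybridge haswell skylake epyc; do
        {
            echo "[resource/$arch]"
            cat gc3pie.conf.in                # generic SLURM settings
            echo "sbatch = sbatch -C $arch"
            echo
        } >> gc3pie.conf
    done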
Re: [easybuild] gc3pie on heterogeneous cluster

Hi Jurij,

> I'm looking at possible easybuild integrations to our slurm. Since we have a
> zoo of machines in the cluster, I would need to submit easybuild --job with
> the use of sbatch -C flags so that each piece of software gets built for all
> cpu generations we have.
>
> I don't see support for -C in gc3pie/slurm.

If you were to use a fixed value for `-C` (e.g., `-C lustre`), this could be
supported by setting

    sbatch = sbatch -C lustre

in `gc3pie.conf`.

Do I understand correctly that you want to generate programmatically a number
of different values for `-C`? Or do you want to re-run the same `eb ...`
command-line over a number of given `-C` values?

Ciao,
R

--
Riccardo Murri
S3IT: Services and Support for Science IT
University of Zurich
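For context, such an `sbatch = ...` override lives inside an ordinary GC3Pie
resource section. A minimal sketch of what a SLURM resource section might look
like; host name, limits, and auth name are placeholders, and the exact key set
should be checked against the GC3Pie configuration documentation:

    [resource/mycluster]
    enabled = yes
    type = slurm
    frontend = login.example.org
    transport = ssh
    # refers to a matching [auth/ssh_user] section elsewhere in the file
    auth = ssh_user
    max_cores = 1024
    max_cores_per_job = 16
    max_memory_per_core = 2 GiB
    max_walltime = 8 hours
    architecture = x86_64
    # the override discussed above: pass a fixed constraint to every submission
    sbatch = sbatch -C lustre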