Re: [easybuild] gc3pie on heterogenous cluster

2017-12-05 Thread Jure Pečar
On Tue, 5 Dec 2017 12:36:23 +0100
Jure Pečar  wrote:

> What would be the best approach to understand what's going on here?

Reading through slurm.py in the gc3pie backends made me aware that the usernames
inside the Docker container have to be in sync with the ones SLURM knows about.
Now I see jobs being submitted.
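For illustration, a minimal sketch of keeping the two in sync (the container image
name, the mounts and the eb arguments are placeholders; bind-mounting /etc/passwd
read-only is just one way to make the submitting user known inside the container):

  # run EasyBuild in the container under the same user/UID that SLURM sees,
  # so the username checks in gc3pie's slurm.py line up
  docker run --rm \
    --user "$(id -u):$(id -g)" \
    -v /etc/passwd:/etc/passwd:ro \
    -v /g/easybuild:/g/easybuild \
    easybuild-container \
    eb --job --job-backend gc3pie ...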


-- 

  Jurij Pečar
  HPC Engineer, IT Operations, IT Services
  EMBL Heidelberg, Meyerhofstraße 1, 69117, Heidelberg, Germany
  Room 13-401


Re: [easybuild] gc3pie on heterogenous cluster

2017-12-05 Thread Jure Pečar
On Tue, 5 Dec 2017 11:36:31 +0100
Jure Pečar  wrote:

> However I would like to use something other than $HOME/.gc3pie_jobs to keep
> track of jobs. I don't see an easy way to redefine it. Am I missing something,
> or is it really not configurable?

Even after setting HOME to a shared folder, /g/easybuild/tmp, I still see lines
like these in the log:

== 2017-12-05 11:07:06,103 batch.py:385 INFO Creating remote temporary folder: 
command 'mkdir -p $HOME/.gc3pie_jobs; mktemp -d 
$HOME/.gc3pie_jobs/lrms_job.XX' 
== 2017-12-05 11:07:06,200 core.py:368 INFO Successfully submitted 
Application@34c2050 to: haswell
== 2017-12-05 11:07:06,201 build_log.py:232 INFO GC3Pie job overview: 1 
submitted (total: 1)
== GC3Pie job overview: 1 submitted (total: 1)
== 2017-12-05 11:07:36,413 slurm.py:514 INFO Updated resource 'haswell' status: 
free slots: -1, total running: 1108, own running jobs: 0, own queued jobs: 0, 
total queued jobs: 1138
== 2017-12-05 11:07:36,439 batch.py:740 WARNING Failed removing remote folder 
'/g/easybuild/tmp/.gc3pie_jobs/lrms_job.5X5tj7zdWU': : Could not remove directory tree 
'/g/easybuild/tmp/.gc3pie_jobs/lrms_job.5X5tj7zdWU': OSError: [Errno 2] No such 
file or directory: '/g/easybuild/tmp/.gc3pie_jobs/lrms_job.5X5tj7zdWU'
== 2017-12-05 11:07:36,439 build_log.py:232 INFO GC3Pie job overview: 1 
terminated, 1 ok (total: 1)
== GC3Pie job overview: 1 terminated, 1 ok (total: 1)

Should I worry about that warning?

The thing is that I never see any jobs actually show up in the queue, and I don't
really understand why.

What would be the best approach to understand what's going on here?


-- 

  Jurij Pečar
  HPC Engineer, IT Operations, IT Services
  EMBL Heidelberg, Meyerhofstraße 1, 69117, Heidelberg, Germany
  Room 13-401


Re: [easybuild] gc3pie on heterogenous cluster

2017-12-05 Thread Jure Pečar
On Fri, 1 Dec 2017 13:26:54 +0100
Riccardo Murri  wrote:

> and then:
> 
> eb --job-backend gc3pie --job-target-resource nehalem ...
> 
> Hope this helps,

This appears to work fine.

However I would like to use something other than $HOME/.gc3pie_jobs to keep
track of jobs. I don't see an easy way to redefine it. Am I missing something,
or is it really not configurable?
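A possible workaround, sketched and untested: override HOME only for the eb
invocation, so that the hard-coded $HOME/.gc3pie_jobs lands on shared storage
(the path and resource name below are just examples):

  # sketch: make $HOME/.gc3pie_jobs end up on a shared filesystem
  HOME=/g/easybuild/tmp eb --job --job-backend gc3pie --job-target-resource haswell ...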


-- 

  Jurij Pečar
  HPC Engineer, IT Operations, IT Services
  EMBL Heidelberg, Meyerhofstraße 1, 69117, Heidelberg, Germany
  Room 13-401


Re: [easybuild] gc3pie on heterogenous cluster

2017-12-02 Thread Fotis Georgatos

Ciao Riccardo, all,

thanks, this is indeed a step forward; it addresses a very common need. If you
recall, I brought this up years ago.
Is there any way this could be made somewhat more automated, e.g. “all goolfc builds
are to go on resource/gpunodes”?

I am now eyeing the following list and wondering which concepts seen in other
tools may be relevant here:
https://github.com/common-workflow-language/common-workflow-language/wiki/Existing-Workflow-systems
In short, we should remove human choice from the build process and, to the extent
possible, cluster related builds together (e.g. I often prefer that each toolchain
be built on a single dedicated node).
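As a strawman, a small wrapper along these lines (purely hypothetical; the
toolchain-to-resource mapping and the default resource are made up) would take
that choice away from the human:

  #!/bin/bash
  # hypothetical wrapper: pick the GC3Pie resource from the toolchain
  # named in the easyconfig, instead of choosing by hand
  ec="$1"; shift
  case "$ec" in
    *goolfc*) resource=gpunodes ;;   # GPU toolchains go to the GPU nodes
    *)        resource=haswell  ;;   # assumed default resource
  esac
  exec eb --job --job-backend gc3pie --job-target-resource "$resource" "$ec" "$@"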

tia!
F.

On Dec 1, 2017, at 12:26 PM, Riccardo Murri  wrote:
> (Jure Pečar, Fri, Dec 01, 2017 at 01:17:05PM +0100:)
>> On Fri, 1 Dec 2017 12:34:42 +0100
>> Riccardo Murri  wrote:
>> 
>>> Do I understand correctly that you want to generate programmatically a number
>>> of different values for `-C`?
>> 
>> Yes, like nehalem, sandybridge, haswell, skylake, epyc ...
>> 
>> So multiple backend definitions in gc3pie.conf would be an option. How do I 
>> then tell eb --job which backend to use? --job-backend is a choice between 
>> gc3pie and pbspython. I assume I can play with --job-backend-config and have 
>> one backend per gc3pie.conf.arch file, right? I'll try that ...
> 
> That's one option.  Another one is to define different GC3Pie resources
> in the same configuration file and use EB's `--job-target-resource`::
> 
>  ### gc3pie.conf
>  [resource/nehalem]
>  # ... generic SLURM config here
>  sbatch = sbatch -C nehalem
> 
>  [resource/sandybridge]
>  # ... (copy config from `nehalem`)
>  sbatch = sbatch -C sandybridge
> 
> and then:
> 
>eb --job-backend gc3pie --job-target-resource nehalem ...
> 
> Hope this helps,
> Riccardo
> 
> --
> Riccardo Murri
> 
> S3IT: Services and Support for Science IT
> University of Zurich

cheers,
Fotis


-- 
echo "sysadmin know better bash than english" | sed s/min/mins/ \
  | sed 's/better bash/bash better/' # signal detected in a CERN forum


Re: [easybuild] gc3pie on heterogenous cluster

2017-12-01 Thread Riccardo Murri
(Jure Pečar, Fri, Dec 01, 2017 at 01:17:05PM +0100:)
> On Fri, 1 Dec 2017 12:34:42 +0100
> Riccardo Murri  wrote:
>
> > Do I understand correctly that you want to generate programmatically a number
> > of different values for `-C`?
>
> Yes, like nehalem, sandybridge, haswell, skylake, epyc ...
>
> So multiple backend definitions in gc3pie.conf would be an option. How do I 
> then tell eb --job which backend to use? --job-backend is a choice between 
> gc3pie and pbspython. I assume I can play with --job-backend-config and have 
> one backend per gc3pie.conf.arch file, right? I'll try that ...

That's one option.  Another one is to define different GC3Pie resources
in the same configuration file and use EB's `--job-target-resource`::

  ### gc3pie.conf
  [resource/nehalem]
  # ... generic SLURM config here
  sbatch = sbatch -C nehalem

  [resource/sandybridge]
  # ... (copy config from `nehalem`)
  sbatch = sbatch -C sandybridge

and then:

eb --job-backend gc3pie --job-target-resource nehalem ...
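If you want to cover every CPU generation, a plain shell loop over the defined
resources would do it (just a sketch; the resource names are the ones from your
list and the remaining eb arguments are elided):

  # sketch: re-submit the same build once per GC3Pie resource
  for arch in nehalem sandybridge haswell skylake epyc; do
      eb --job --job-backend gc3pie --job-target-resource "$arch" ...
  done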

Hope this helps,
Riccardo

--
Riccardo Murri

S3IT: Services and Support for Science IT
University of Zurich


Re: [easybuild] gc3pie on heterogenous cluster

2017-12-01 Thread Jure Pečar
On Fri, 1 Dec 2017 12:34:42 +0100
Riccardo Murri  wrote:

> Do I understand correctly that you want to generate programmatically a number 
> of different values for `-C`? 

Yes, like nehalem, sandybridge, haswell, skylake, epyc ...

So multiple backend definitions in gc3pie.conf would be an option. How do I 
then tell eb --job which backend to use? --job-backend is a choice between 
gc3pie and pbspython. I assume I can play with --job-backend-config and have 
one backend per gc3pie.conf.arch file, right? I'll try that ...
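For the record, a sketch of the per-file route I have in mind (the file name and
path are made up):

  # sketch: one GC3Pie configuration file per architecture,
  # selected at submit time via --job-backend-config
  eb --job --job-backend gc3pie --job-backend-config /g/easybuild/gc3pie.conf.haswell ...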


-- 

  Jurij Pečar
  HPC Engineer, IT Operations, IT Services
  EMBL Heidelberg, Meyerhofstraße 1, 69117, Heidelberg, Germany
  Room 13-401


Re: [easybuild] gc3pie on heterogenous cluster

2017-12-01 Thread Riccardo Murri
Hi Jurij,

> I'm looking at possible easybuild integrations with our slurm. Since we have a
> zoo of machines in the cluster, I would need to submit easybuild --job with
> sbatch -C flags so that each piece of software gets built for all the CPU
> generations we have.
>
> I don't see support for -C in gc3pie/slurm.

If you were to use a fixed value for `-C` (e.g., `-C lustre`), this
could be supported by setting

sbatch = sbatch -C lustre

in `gc3pie.conf`.
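For context, a minimal sketch of where that line lives (the resource name is a
placeholder, and the exact set of required keys depends on your GC3Pie setup):

  ### gc3pie.conf
  [resource/mycluster]
  type = slurm
  transport = local
  auth = none
  # ... the usual resource limits (max_cores, max_walltime, ...) here
  sbatch = sbatch -C lustre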

Do I understand correctly that you want to generate programmatically a number of
different values for `-C`?  Or do you want to re-run the same `eb ...`
command-line over a number of given `-C` values?

Ciao,
R
--
Riccardo Murri

S3IT: Services and Support for Science IT
University of Zurich