When I've been in that situation I have solved the problem with a lock
on a temporary file.
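
A minimal sketch of that idea, assuming a POSIX system and an advisory flock() lock. The helper names acquire_lock and release_lock and the lock file path are hypothetical, not taken from any Slurm plugin:

```c
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Try to take an exclusive, non-blocking advisory lock on `path`.
 * Returns an open file descriptor that holds the lock, or -1 if the
 * file cannot be opened or the lock is already held elsewhere. */
int acquire_lock(const char *path)
{
    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Drop the lock and close the descriptor returned by acquire_lock(). */
void release_lock(int fd)
{
    flock(fd, LOCK_UN);
    close(fd);
}
```

A caller would wrap its critical section between acquire_lock("/tmp/myplugin.lock") and release_lock(fd), and retry or skip the work when -1 is returned. The lock is advisory, so it only protects against other processes that use the same convention.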
If you need any more help, please let me know. I probably still have
some examples around.
best,
Manuel
2017-09-03 15:02 GMT+02:00 Jordi A. Gómez :
> Hello,
>
> I am developing a
on:
> Can't I use backfill and multifactor priority together?
> First goes backfill, depending on the job length and then it is sorted
> by priority?
> At least those were my first thoughts about it :D
>
> Best regards,
> Dennis
>
> On 08.08.2017 at 14:53, Manuel Rodr
Hi Dennis,
If I understand you correctly, what you need is to use the Multifactor Plugin:
https://slurm.schedmd.com/priority_multifactor.html
In particular, I guess this is relevant for your installation:
Note: Computing the fair-share factor requires the installation and
operation of the Slurm
Hi all,
I'm trying to provide support to users demanding different versions of
MPI, namely mvapich2 and OpenMPI. In particular, I'd like to support
"srun -n X ./my_parallel_app", maybe with "--with-mpi=X" flag.
Has anybody got any suggestion on how to do that? I am kind of new
to cluster
anuel
2017-07-04 15:55 GMT+02:00 Manuel Rodríguez Pascual
<manuel.rodriguez.pasc...@gmail.com>:
>
> Hi all,
>
> Developing a Slurm plugin I've come across a funny problem. I guess it is not
> strictly related to Slurm but just system administration, but maybe someone
> can
Hi all,
Developing a Slurm plugin I've come across a funny problem. I guess it is not
strictly related to Slurm but just system administration, but maybe someone
can point me in the right direction.
I have 2 machines, one with CentOS 7 and one with BullX (based on CentOS6).
When I send a signal to
It's maybe a bit of a hack, but I guess it could be done with a Spank
plugin. Just put whatever you want to print in the spank_init method,
as it is called every time a job is submitted. As a drawback, it would
also be printed with srun, and maybe when starting slurmctld (not a
big deal anyway).
Hi all,
After working with the developers of the DMTCP checkpoint library, we have a
nice working version of Slurm+DMTCP. We are able to checkpoint any batch
job (well, most of them) and restart it anywhere else in the cluster. We
are testing it thoroughly, and will let you know in a few weeks in
Hi all,
I am experiencing a strange behavior in a plugin that I created. I
don't know if this is what I should expect, so maybe you can provide
me with some insights.
I want to have some stuff done in slurm_spank_task_init, but ONLY if
sbatch/srun was executed with "--myplugin". To do so, I
Hi Andy,
I was facing the same issue, but I was worried that it was because
something was misconfigured or whatever. Did you receive an
answer/explanation from anybody?
Cheers,
Manuel
2017-01-05 12:00 GMT-05:00 Andy Riebs :
>
> Hi Y'all,
>
> Historically, our users often
ault value is "/var/spool/slurmd". Any "%h" within the name is
> replaced with the hostname on which the *slurmd* is running. Any "%n"
> within the name is replaced with the Slurm node name on which the *slurmd*
> is running.
>
> """
>
Hi all,
I keep having some issues using Slurm + mvapich2. It seems that I cannot
correctly configure Slurm and mvapich2 to work together. In particular,
sbatch works correctly but srun does not. Maybe someone here can provide
me some guidance, as I suspect that the error is an obvious one, but I
Hi,
After some searching into the code, I may have a clue of what is going on.
I have seen that the commit that triggers the error is this one:
72ed146cd2a6facb76e854919fb887faf3fc0c25 (dated May 11th)
I have modified the newest version of Slurm, src/srun/libsrun/opt.c (line
2279) to print values
Hi all,
I am having the weirdest error ever. I am pretty sure this is a bug. I
have reproduced the error in the latest Slurm commit (slurm 17.02.0-0pre2,
commit 406d3fe429ef6b694f30e19f69acf989e65d7509 ) and in slurm 16.05.5
branch. It does NOT happen in slurm 15.08.12 .
My cluster is composed of
Hi Jose,
I don't know if it's the case, but this error tends to arise after changing
the configuration in slurmctld without rebooting the compute nodes, or when
the nodes have a different configuration. Have you double-checked this?
Best regards,
Manuel
On Tuesday, 4 October 2016, Jose Antonio
30/08/16 22:11, Manuel Rodríguez Pascual wrote:
>
>> We hope that this can be useful for the Slurm community.
>
> That's really pretty neat!
>
> I can't test myself as we're stuck on RHEL6 for the moment but I do
> wonder if you've considered doing the same for Open-MPI so t
Hi all,
After working together with CRIU ( https://criu.org/Main_Page )
developers, my team at CIEMAT has developed a CRIU plugin for Slurm.
This way, Slurm can employ this checkpoint/restart library to perform
these operations.
It is stored in my personal github account,
Hi Omer,
As a first step, I would update Slurm to the latest version. 2.6 is kind of
old, so maybe your problem was a bug that has been solved by now.
Besides, could you post a bit more about your system (MPI library?) and
slurm.conf relevant information?
Cheers,
Manuel
2016-06-22 0:15 GMT+02:00
Hi all,
After looking at the problem with Sergio in a separate mail, we found
that the problem was "slurm_init_job_desc_msg" call. It sets default
values to all the objects of job_desc_msg_t, so it was overwriting
previous values. It has been moved just after the variable
declaration, and now
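
The ordering issue described above can be sketched without the Slurm headers. Everything in this block is a hypothetical stand-in: demo_job_desc_t plays the role of job_desc_msg_t, demo_init_job_desc() plays the role of slurm_init_job_desc_msg(), and DEMO_UNSET is an arbitrary placeholder default, not Slurm's actual value:

```c
#include <stdint.h>

/* Placeholder "unset" marker (assumption, not Slurm's real default). */
#define DEMO_UNSET 0xffffffffu

/* Hypothetical stand-in for job_desc_msg_t, reduced to two fields. */
typedef struct {
    uint32_t job_id;
    uint32_t num_tasks;
} demo_job_desc_t;

/* Stand-in for slurm_init_job_desc_msg(): resets every field. */
void demo_init_job_desc(demo_job_desc_t *d)
{
    d->job_id = DEMO_UNSET;
    d->num_tasks = DEMO_UNSET;
}

/* Wrong order: the value is assigned first and then wiped by the init call. */
demo_job_desc_t build_wrong(uint32_t job_id)
{
    demo_job_desc_t d;
    d.job_id = job_id;
    demo_init_job_desc(&d);
    return d;
}

/* Right order: initialise right after the declaration, then assign. */
demo_job_desc_t build_right(uint32_t job_id)
{
    demo_job_desc_t d;
    demo_init_job_desc(&d);
    d.job_id = job_id;
    return d;
}
```

With this stand-in, build_wrong() returns a descriptor whose job_id is back at the default, while build_right() keeps the value the caller passed in.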
Hi all,
Working on a Slurm plugin, I've run into this error:
"slurm_auth_unpack error: packed by slurmctld_p unpack by auth/munge"
It appears when I execute a simple Slurm API call from my plugin:
job_info_msg_t * job_ptr;
uint16_t show_flags = 0;
if ((error = slurm_load_job (&job_ptr, job_id,
There is a good tutorial on how to use DMTCP on their github page,
https://github.com/dmtcp/dmtcp/blob/master/QUICK-START.md
I would start there. Anyway, probably this Slurm mailing list is not
the best place to ask for that information.
Best regards,
Manuel
2016-04-14 11:01 GMT+02:00 Husen
Hi Danny, all,
As far as I know, unfortunately BLCR does not have MPI support.
At least I haven't been able to achieve it.
On the other hand, DMTCP ( http://dmtcp.sourceforge.net/ ) does work
with MPI. My team is very interested in having a reliable
checkpoint/restart mechanism in
the best solution either. All together, I keep having
the feeling that there is an obvious solution that I am not considering.
Any ideas or suggestions?
Thanks for your help,
Manuel
--
Dr. Manuel Rodríguez-Pascual
skype: manuel.rodriguez.pascual
phone: (+34) 913466173 // (+34) 679925108
x86_64.rpm slurm-perlapi-15.08.4-1.el6.x86_64.rpm
> slurm-sjobexit-15.08.4-1.el6.x86_64.rpm
> slurm-sjstat-15.08.4-1.el6.x86_64.rpm slurm-torque-15.08.4-1.el6.x86_64.rpm
> slurm-blcr-15.08.4-1.el6.x86_64.rpm
>
>
>
> If I untar slurm-15.08.4.tar.bz2, I cannot find any file named scch
your reply.
>
>
>
> David
>
--
Dr. Manuel Rodríguez-Pascual
skype: manuel.rodriguez.pascual
phone: (+34) 913466173 // (+34) 679925108
CIEMAT-Moncloa
Edificio 22, desp. 1.25
Avenida Complutense, 40
28040- MADRID
SPAIN
> *ii blcr-testsuite 0.8.5-2.2 i386 Userspace tools to Checkpoint and Restart Linux processes*
> *ii blcr-util 0.8.5-2.2 i386 Userspace tools to Checkpoint and Restart Linux processes*
>
> Also *libcr-dev*, *l
vironment, slurm 16.05.0-0pre1, sbatch works, srun DOES NOT work.
Any hints?
Thanks for your help,
Manuel
lobal variable accessible by all the methods storing a value and so.
Thanks for your help,
Manuel
rnative? I am new to Slurm development, so
there is probably a simpler and better way of doing things.
Thanks for your help. Best regards,
Manuel
> #SBATCH -n 3
>
> #SBATCH --output=test-srun.out
>
> #SBATCH --error=test-srun.err
>
> source /etc/profile
>
> module load openmpi-x86_64
>
>
>
> srun mpi
>
>
>
> Thanks,
>
> Ing.Fany Pagés Díaz
-support-services/services/slurm-simulator
Thanks in advance!
//Gonzalo
will keep updating in case someone else is trying to bring the simulator to
life.
Thanks!
//Gonzalo
On Wed, Aug 5, 2015 at 7:25 AM, Manuel Rodríguez Pascual
manuel.rodriguez.pasc...@gmail.com
wrote:
Hi Gonzalo,
I am
That's exactly what I was looking for, thanks very much.
2015-06-02 16:30 GMT+02:00 Moe Jette je...@schedmd.com:
See the MinJobAge configuration option:
http://slurm.schedmd.com/slurm.conf.html
Quoting Manuel Rodríguez Pascual manuel.rodriguez.pasc...@gmail.com:
Hi all,
I have been
regards,
Manuel
2015-05-29 18:00 GMT+02:00 Manuel Rodríguez Pascual
manuel.rodriguez.pasc...@gmail.com:
Hi all,
I have been messing around a little bit with task checkpoint/restart.
I am employing BLCR to checkpoint a fairly small application with
slurm_checkpoint_vacate, what
? Or as an alternative, is there any API call
that can be employed to trigger that purge?
Thanks for your help,
Manuel
by the function. Can
anyone enlighten me?
Thanks for your attention. Best regards,
Manuel
some
sleep/retry logic should do what you want.
Quoting Manuel Rodríguez Pascual manuel.rodriguez.pasc...@gmail.com:
Good morning all,
I am facing a problem when using the slurm.h API to manage checkpoints.
What I want to do is to checkpoint a running task, shut it down, and then
restore
Good morning all,
I am facing a problem when using the slurm.h API to manage checkpoints.
What I want to do is to checkpoint a running task, shut it down, and then
restore it somewhere (on the same node or another one).
slurm.conf is configured with:
CheckpointType=checkpoint/blcr
.
Best regards,
Manuel
2014-10-27 19:10 GMT+01:00 r...@q-leap.de:
Manuel == Manuel Rodríguez Pascual
manuel.rodriguez.pasc...@gmail.com writes:
Hi Manuel,
Manuel Hi all, I have the intention of working on Slurm, modifying
Manuel it to satisfy my needs and (hopefully) include some new
= 0
TreeWidth = 50
UsePam = 0
UnkillableStepProgram = (null)
UnkillableStepTimeout = 60 sec
VSizeFactor = 0 percent
WaitTime= 0 sec
---
---