[slurm-dev] Re: Running a privileged process from a Spank plugin

2017-09-04 Thread Manuel Rodríguez Pascual
When I've been in that situation I have solved the problem with a lock on a temporary file. If you need any more help, please let me know. I probably still have some examples around. best, Manuel 2017-09-03 15:02 GMT+02:00 Jordi A. Gómez : > Hello, > > I am developing a
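For reference, a lock on a temporary file can be sketched in C with flock(2). This is only a minimal illustration and not the code from the original message, which is not shown here; the lock path and the demo_* names are hypothetical, and it assumes the cooperating processes run on the same node with a local /tmp.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical lock path: any local file that every cooperating
 * process is allowed to open will do. */
#define DEMO_LOCK_PATH "/tmp/demo_spank_plugin.lock"

/* Open the lock file (creating it if needed) and take an exclusive lock.
 * Returns the open file descriptor on success, -1 on failure.
 * When blocking is 0 the call fails at once if the lock is already held. */
int demo_lock_acquire(const char *path, int blocking)
{
    assert(path != NULL);

    /* flock() works on a read-only descriptor, so O_RDONLY is enough and
     * lets processes running as other users share the same lock file. */
    int fd = open(path, O_CREAT | O_RDONLY, 0644);
    if (fd < 0)
        return -1;

    int op = LOCK_EX | (blocking ? 0 : LOCK_NB);
    if (flock(fd, op) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Drop the lock and close the descriptor returned by demo_lock_acquire(). */
void demo_lock_release(int fd)
{
    flock(fd, LOCK_UN);
    close(fd);
}
```

A caller would wrap the critical part between demo_lock_acquire(DEMO_LOCK_PATH, 1) and demo_lock_release(fd), so that only one process at a time runs it. Note that flock() locks are advisory: they only protect against other processes that take the same lock.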

[slurm-dev] Re: Account preferation

2017-08-08 Thread Manuel Rodríguez Pascual
on: > Can't I use backfill and multifactor priority together? > First goes backfill, depending on the job length and then it is sorted > by priority? > At least those were my first thoughts about it :D > > Best regards, > Dennis > > On 08.08.2017 at 14:53, Manuel Rodr

[slurm-dev] Re: Account preferation

2017-08-08 Thread Manuel Rodríguez Pascual
Hi Dennis, If I understand you correctly, what you need is to use the Multifactor Plugin https://slurm.schedmd.com/priority_multifactor.html In particular, I guess this is relevant for your installation: Note: Computing the fair-share factor requires the installation and operation of the Slurm

[slurm-dev] multiple MPI versions with slurm

2017-07-21 Thread Manuel Rodríguez Pascual
Hi all, I'm trying to provide support to users demanding different versions of MPI, namely mvapich2 and OpenMPI. In particular, I'd like to support "srun -n X ./my_parallel_app", maybe with "--with-mpi=X" flag. Has anybody got any suggestion on how to do that? I am kind of new to cluster

[slurm-dev] Re: different behaviour of signals with sbatch in different machines

2017-07-06 Thread Manuel Rodríguez Pascual
anuel 2017-07-04 15:55 GMT+02:00 Manuel Rodríguez Pascual <manuel.rodriguez.pasc...@gmail.com>: > > Hi all, > > Developing a Slurm plugin I've come to a funny problem. I guess it is not > strictly related to Slurm but just system administration, but maybe someone > can

[slurm-dev] different behaviour of signals with sbatch in different machines

2017-07-04 Thread Manuel Rodríguez Pascual
Hi all, While developing a Slurm plugin I've run into a funny problem. I guess it is not strictly related to Slurm but just system administration, but maybe someone can point me in the right direction. I have 2 machines, one with CentOS 7 and one with BullX (based on CentOS6). When I send a signal to

[slurm-dev] Re: How to print information message at submission time.

2017-06-20 Thread Manuel Rodríguez Pascual
It's maybe a bit of a hack, but I guess it could be done with a Spank plugin. Just put whatever you want to print in the spank_init method, as it is called every time a job is submitted. As a drawback, it would also be printed with srun, and maybe when starting slurmctld (not a big deal anyway).

[slurm-dev] thoughts on task preemption

2017-05-22 Thread Manuel Rodríguez Pascual
Hi all, After working with the developers of the DMTCP checkpoint library, we have a nice working version of Slurm+DMTCP. We are able to checkpoint any batch job (well, most of them) and restart it anywhere else in the cluster. We are testing it thoroughly, and will let you know in a few weeks in

[slurm-dev] weird integration bewteen Spank and "--export"

2017-02-16 Thread Manuel Rodríguez Pascual
Hi all, I am experiencing a strange behavior in a plugin that I created. I don't know if this is what I should expect, so maybe you can provide me with some insights. I want to have some stuff done in slurm_spank_task_init, but ONLY if sbatch/srun was executed with "--myplugin". To do so, I

[slurm-dev] Re: Change in srun buffered output?

2017-01-10 Thread Manuel Rodríguez Pascual
Hi Andy, I was facing the same issue, but I was worried that it was because something was misconfigured or whatever. Did you receive an answer/explanation from anybody? Cheers, Manuel 2017-01-05 12:00 GMT-05:00 Andy Riebs : > > Hi Y'all, > > Historically, our users often

[slurm-dev] Re: problem configuring mvapich + slurm: "error: mpi/pmi2: failed to send temp kvs to compute nodes"

2016-11-18 Thread Manuel Rodríguez Pascual
ault value is "/var/spool/slurmd". Any "%h" within the name is > replaced with the hostname on which the *slurmd* is running. Any "%n" > within the name is replaced with the Slurm node name on which the *slurmd* > is running. > > """ >

[slurm-dev] problem configuring mvapich + slurm: "error: mpi/pmi2: failed to send temp kvs to compute nodes"

2016-11-17 Thread Manuel Rodríguez Pascual
Hi all, I keep having some issues using Slurm + mvapich2. It seems that I cannot correctly configure Slurm and mvapich2 to work together. In particular, sbatch works correctly but srun does not. Maybe someone here can provide me some guidance, as I suspect that the error is an obvious one, but I

[slurm-dev] RE: Wrong behaviour of "--tasks-per-node" flag

2016-10-28 Thread Manuel Rodríguez Pascual
Hi, After some searching through the code, I may have a clue about what is going on. I have seen that the commit that introduces the error is this one: 72ed146cd2a6facb76e854919fb887faf3fc0c25 (date May. 11th) I have modified the newest version of slurm, src/srun/libsrun/opt.c (line 2279) to print values

[slurm-dev] Wrong behaviour of "--tasks-per-node" flag

2016-10-21 Thread Manuel Rodríguez Pascual
Hi all, I am having the weirdest error ever. I am pretty sure this is a bug. I have reproduced the error in the latest slurm commit (slurm 17.02.0-0pre2, commit 406d3fe429ef6b694f30e19f69acf989e65d7509 ) and in the slurm 16.05.5 branch. It does NOT happen in slurm 15.08.12 . My cluster is composed of

[slurm-dev] Re: cons_res / CR_CPU - we don't have select plugin type 102

2016-10-04 Thread Manuel Rodríguez Pascual
Hi Jose, I don't know if it's the case, but this error tends to arise after changing the configuration in slurmctld without rebooting the compute nodes, or when they have a different configuration. Have you double-checked this? Best regards, Manuel On Tuesday, 4 October 2016, Jose Antonio

[slurm-dev] Re: new CRIU plugin

2016-08-31 Thread Manuel Rodríguez Pascual
30/08/16 22:11, Manuel Rodríguez Pascual wrote: > >> We hope that this can be useful for the Slurm community. > > That's really pretty neat! > > I can't test myself as we're stuck on RHEL6 for the moment but I do > wonder if you've considered doing the same for Open-MPI so t

[slurm-dev] new CRIU plugin

2016-08-30 Thread Manuel Rodríguez Pascual
Hi all, After working together with CRIU ( https://criu.org/Main_Page ) developers, my team at CIEMAT has developed a CRIU plugin for Slurm. This way, Slurm can employ this checkpoint/restart library to perform these operations. It is stored in my personal github account,

[slurm-dev] Re: jobs assigned to different cores than what Slurm thinks

2016-06-22 Thread Manuel Rodríguez Pascual
Hi Omer, As a first step, I would update Slurm to the latest version. 2.6 is kind of old, so maybe your problem was a bug that has been solved by now. Besides, could you post a bit more about your system (MPI library?) and the relevant slurm.conf information? Cheers, Manuel 2016-06-22 0:15 GMT+02:00

[slurm-dev] Re: Invalid user id when using Slurm API

2016-05-05 Thread Manuel Rodríguez Pascual
Hi all, After looking at the problem with Sergio in a separate mail, we found that the problem was the "slurm_init_job_desc_msg" call. It sets default values for all the fields of job_desc_msg_t, so it was overwriting previous values. It has been moved just after the variable declaration, and now

[slurm-dev] slurm_auth_unpack error: packed by slurmctld_p unpack by auth/munge

2016-04-20 Thread Manuel Rodríguez Pascual
Hi all, working on a slurm plugin, I've come to this error. "slurm_auth_unpack error: packed by slurmctld_p unpack by auth/munge" It appears when I execute a simple slurm API call from my plugin, job_info_msg_t * job_ptr; uint16_t show_flags = 0; if ((error = slurm_load_job (&job_ptr, job_id,

[slurm-dev] Re: Slurm Checkpoint/Restart example

2016-04-14 Thread Manuel Rodríguez Pascual
There is a good tutorial on how to use DMTCP on their github page, https://github.com/dmtcp/dmtcp/blob/master/QUICK-START.md I would start there. Anyway, probably this Slurm mailing list is not the best place to ask for that information. Best regards, Manuel 2016-04-14 11:01 GMT+02:00 Husen

[slurm-dev] Re: Slurm Checkpoint/Restart example

2016-04-14 Thread Manuel Rodríguez Pascual
Hi Danny, all, As far as I know, unfortunately BLCR does not have MPI support. At least I haven't been able to get it working. On the other hand, DMTCP ( http://dmtcp.sourceforge.net/ ) does work with MPI. My team is very interested in having a reliable checkpoint/restart mechanism in

[slurm-dev] Concurrence with Slurm: How can I force a job to be executed on a given node?

2016-04-12 Thread Manuel Rodríguez Pascual
the best solution either. All together, I keep having the feeling that there is an obvious solution that I am not considering. Any ideas or suggestions? Thanks for your help, Manuel -- Dr. Manuel Rodríguez-Pascual skype: manuel.rodriguez.pascual phone: (+34) 913466173 // (+34) 679925108

[slurm-dev] Re: scch not found !!

2016-02-10 Thread Manuel Rodríguez Pascual
x86_64.rpm slurm-perlapi-15.08.4-1.el6.x86_64.rpm > slurm-sjobexit-15.08.4-1.el6.x86_64.rpm > slurm-sjstat-15.08.4-1.el6.x86_64.rpm slurm-torque-15.08.4-1.el6.x86_64.rpm > slurm-blcr-15.08.4-1.el6.x86_64.rpm > > > > If I untar slurm-15.08.4.tar.bz2, I not found any file named scch &

[slurm-dev] Re: scch not found !!

2016-02-10 Thread Manuel Rodríguez Pascual
your reply. > > > > David > -- Dr. Manuel Rodríguez-Pascual skype: manuel.rodriguez.pascual phone: (+34) 913466173 // (+34) 679925108 CIEMAT-Moncloa Edificio 22, desp. 1.25 Avenida Complutense, 40 28040- MADRID SPAIN

[slurm-dev] Re: Slurm with BLCR and slurmctld.log error

2015-12-14 Thread Manuel Rodríguez Pascual
> *ii blcr-testsuite0.8.5-2.2 > i386 Userspace tools to Checkpoint and Restart Linux processes* > *ii blcr-util 0.8.5-2.2 > i386 Userspace tools to Checkpoint and Restart Linux processes* > > Also *libcr-dev*, *l

[slurm-dev] weird error (bug?) on srun (16.05.0-0pre1)

2015-11-24 Thread Manuel Rodríguez Pascual
vironment, slurm 16.05.0-0pre1, sbatch works, srun DOES NOT work. Any hints? Thanks for your help, Manuel

[slurm-dev] SPANK behavior and functionalities

2015-10-14 Thread Manuel Rodríguez Pascual
lobal variable accessible by all the methods storing a value and so. Thanks for your help, Manuel

[slurm-dev] Implementing slurm integration with DMTCP: some general questions

2015-09-24 Thread Manuel Rodríguez Pascual
rnative? I am new to Slurm development, so there is probably a simpler and better way of doing things. Thanks for your help. Best regards, Manuel

[slurm-dev] Re: Now I have this error with libltdl.so.7

2015-09-18 Thread Manuel Rodríguez Pascual
; #SBATCH -n 3 > > #SBATCH --output=test-srun.out > > #SBATCH --error=test-srun.err > > source /etc/profile > > module load openmpi-x86_64 > > > > srun mpi > > > > Thanks, > > Ing.Fany Pagés Díaz

[slurm-dev] Re: Simulator sync issues

2015-08-05 Thread Manuel Rodríguez Pascual
-support-services/services/slurm-simulator Thanks in advance! //Gonzalo

[slurm-dev] Re: Simulator sync issues

2015-08-05 Thread Manuel Rodríguez Pascual
will keep updating in case someone else is trying to bring the simulator to life. Thanks! //Gonzalo On Wed, Aug 5, 2015 at 7:25 AM, Manuel Rodríguez Pascual manuel.rodriguez.pasc...@gmail.com wrote: Hi Gonzalo, I am

[slurm-dev] Re: Messing with job checkpointing

2015-06-08 Thread Manuel Rodríguez Pascual
That's exactly what I was looking for, thanks very much. 2015-06-02 16:30 GMT+02:00 Moe Jette je...@schedmd.com: See the MinJobAge configuration option: http://slurm.schedmd.com/slurm.conf.html Quoting Manuel Rodríguez Pascual manuel.rodriguez.pasc...@gmail.com: Hi all, I have been

[slurm-dev] Re: Messing with job checkpointing

2015-06-02 Thread Manuel Rodríguez Pascual
regards, Manuel 2015-05-29 18:00 GMT+02:00 Manuel Rodríguez Pascual manuel.rodriguez.pasc...@gmail.com: Hi all, I have been messing around a little bit with task checkpoint/restart. I am employing BLCR to checkpoint a fairly small application with slurm_checkpoint_vacate, what

[slurm-dev] Messing with job checkpointing

2015-05-29 Thread Manuel Rodríguez Pascual
? Or as an alternative, is there any API call that can be employed to trigger that purge? Thanks for your help, Manuel

[slurm-dev] error code in slurm_checkpoint_complete

2015-03-31 Thread Manuel Rodríguez Pascual
by the function. Can anyone enlighten me? Thanks for your attention. Best regards, Manuel

[slurm-dev] Re: restarting checkpoint after slurm_checkpoint_vacate API call

2015-01-30 Thread Manuel Rodríguez Pascual
some sleep/retry logic should do what you want. Quoting Manuel Rodríguez Pascual manuel.rodriguez.pasc...@gmail.com: Good morning all, I am facing a problem when using slurm.h API to manage checkpoints. What I want to do is to checkpoint a running task, shut it down, and then restore

[slurm-dev] restarting checkpoint after slurm_checkpoint_vacate API call

2015-01-29 Thread Manuel Rodríguez Pascual
Good morning all, I am facing a problem when using slurm.h API to manage checkpoints. What I want to do is to checkpoint a running task, shut it down, and then restore it somewhere (in the same node or another one). slurm.conf is configured with: CheckpointType=checkpoint/blcr

[slurm-dev] Re: reccomended software stack for development?

2014-10-28 Thread Manuel Rodríguez Pascual
. Best regards, Manuel 2014-10-27 19:10 GMT+01:00 r...@q-leap.de: Manuel == Manuel Rodríguez Pascual manuel.rodriguez.pasc...@gmail.com writes: Hi Manuel, Manuel Hi all, I have the intention of working on Slurm, modifying Manuel it to satisfy my needs and (hopefully) include some new

[slurm-dev] Slurm script not writing to stdout in computing node

2014-10-20 Thread Manuel Rodríguez Pascual
= 0 TreeWidth = 50 UsePam = 0 UnkillableStepProgram = (null) UnkillableStepTimeout = 60 sec VSizeFactor = 0 percent WaitTime= 0 sec --- ---